Supervised classification of astronomical data

Advisor: Dr. L. Sarro

Advances in computer science and space technology have given us access to enormous amounts of data. Although this offers an excellent opportunity to generate new knowledge, volumes of this size can only be handled with automatic techniques such as data mining. For several years I have been working on applying and developing supervised classification algorithms for periodic variable stars. The results of this work have been integrated into the classification pipeline for data from Gaia, a mission funded by the European Space Agency (ESA) that is gathering information on 10^9 stars, of which about 10^8 are estimated to be variable. I belong to the international team, with members from Switzerland, Belgium and Spain, that is in charge of the entire supervised classification process. The main advances developed for this kind of classification include the study and implementation of a multistage approach, an automatic system for building the multistage tree that best represents the problem, a solution for imbalanced datasets, methods for selecting relevant features, and a method for analyzing the output probability vectors in order to optimize the classification. This work has led to several articles in journals with high impact factors and to presentations at workshops, and the techniques have been applied to other projects, such as the classification of central stars of planetary nebulae (article in progress).

Mauro López del Fresno, technician at CAB, INTA-CSIC