Using clustering and Friedman tests to improve feature selection in airborne pollen forecasts.
Supervisor: Dr. J.L. Aznarte
Predicting future pollen concentrations is of great importance both for patients and for public health institutions. We present a data-driven forecasting approach which makes no assumptions about the underlying phenomena affecting the plants and the pollination process. Machine learning is used both to build the model and to select the most important variables for prediction.
Through non-parametric hypothesis testing, we show that some variables are indeed more important than others, and that a careful combination of these variables can lead to more accurate and more parsimonious models. These models avoid the long computation times of more complex models while outperforming them in terms of forecast precision.
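As an illustration of the kind of non-parametric comparison mentioned above, the following minimal sketch applies a Friedman test to the forecast errors of several candidate feature subsets. It is not the pipeline used in this work: the function name compare_feature_subsets, the layout of the error matrix (one row per evaluation block, one column per feature subset) and the numbers themselves are assumptions made purely for the example. The data are synthetic, not pollen measurements.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata


def compare_feature_subsets(errors):
    """Friedman test on a (n_blocks, n_subsets) matrix of forecast errors.

    Hypothetical helper: each row is one evaluation block (for example a
    cross-validation fold) and each column is one candidate feature subset.
    Returns the test statistic, the p-value and the mean rank of each subset
    (a lower mean rank means a lower error on average).
    """
    errors = np.asarray(errors, dtype=float)
    # friedmanchisquare expects one sequence per treatment (here: per subset)
    # and needs at least three of them.
    statistic, p_value = friedmanchisquare(*errors.T)
    # Rank the subsets within each block, then average the ranks per subset.
    ranks = np.apply_along_axis(rankdata, 1, errors)
    return statistic, p_value, ranks.mean(axis=0)


# Synthetic errors (assumption, for illustration only): 20 blocks whose
# difficulty varies, and 3 feature subsets with a fixed error offset each.
block_difficulty = np.linspace(1.0, 3.0, 20).reshape(-1, 1)
subset_offset = np.array([0.0, 0.5, 1.0])
synthetic_errors = block_difficulty + subset_offset

statistic, p_value, mean_ranks = compare_feature_subsets(synthetic_errors)
print(f"Friedman statistic = {statistic:.2f}, p-value = {p_value:.3g}")
print("Mean rank per feature subset:", mean_ranks)
```

A small p-value only indicates that at least one subset behaves differently from the others. Identifying which subsets differ would require a post-hoc pairwise procedure, which this sketch does not include.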