CosmoClub: Julie Josse (06/04/2017)

Date: April 6th 2017

Speaker: Julie Josse (Ecole Polytechnique)

Title: Missing data imputation using principal component methods


Missing values are ubiquitous and can occur for many reasons: machines that fail, survey participants who do not answer all questions, etc. The problem is exacerbated by the growing amount of available data: data are often multi-source (several projects aim to build large repositories by compiling data from preexisting databases), and because of the wide heterogeneity of measurement methods and research objectives, these large databases often exhibit an extraordinarily high number of missing values. Missing values are problematic because most statistical methods cannot be applied directly to incomplete data.
In this talk, I will present recent tools developed to handle missing values in a practical way. Notable among them are the treatment of heterogeneous data (both quantitative and categorical) and the possibility of going well beyond single estimates by offering refined ways of assessing uncertainty. I will discuss imputation methods based on (regularized) singular value decomposition, which have caught the attention of the community because of their ability to handle large matrices with a large number of missing entries. Then, I will show how to extend these methods to multiple imputation in order to obtain confidence intervals that indicate how much credit should be given to analyses obtained from an incomplete data set. Such multiple imputation methods also offer new ways to visualize the variability of the results due to missing values.
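
To make the idea of SVD-based imputation concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the speaker's implementation: the function name `svd_impute`, its parameters, and the choice of plain soft-thresholding of the singular values as the regularization are all assumptions made for this example. The missing cells are initialized with column means, then repeatedly replaced by the values of a low-rank reconstruction until the imputed values stabilize.

```python
import numpy as np

def svd_impute(X, rank=2, shrink=0.0, n_iter=200, tol=1e-8):
    """Hypothetical helper: iterative low-rank SVD imputation.

    X      : 2-D array with np.nan marking missing entries
    rank   : number of singular components kept (assumed known here)
    shrink : amount subtracted from each kept singular value
             (soft-thresholding, one simple form of regularization)
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start by filling each missing cell with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        # Center the current completed matrix and take its SVD.
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)

        # Keep the leading components, shrinking their singular values.
        s_reg = np.maximum(s[:rank] - shrink, 0.0)
        low_rank = (U[:, :rank] * s_reg) @ Vt[:rank] + mean

        # Observed cells are never modified; only missing cells are updated.
        updated = np.where(missing, low_rank, X)
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol:
            break

    return filled

# Small usage example on a toy matrix with two missing cells.
X_demo = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 6.0],
    [3.0, 6.0, np.nan],
    [4.0, 8.0, 12.0],
])
X_completed = svd_impute(X_demo, rank=1)
```

This produces a single completed matrix. A multiple imputation procedure would instead generate several plausible completed matrices that reflect the uncertainty of the imputed values, which this sketch does not attempt.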