Recent advances in machine learning, and in deep learning in particular, have introduced highly sophisticated data analysis tools that are promising candidates for building unsupervised, data-driven representations. These statistical methods have already proved their efficiency in solving supervised classification tasks in applications as diverse as computer vision, speech recognition and natural language processing, to name only a few. Machine learning techniques have recently been advocated as a powerful tool for deriving useful features straight from the data. This aspect of learning, called representation learning, is expected to provide an efficient re-parameterization of the data, composed of more salient features. The main interest of the methods studied by the CosmoStat team is their ability to extract information in an unsupervised manner. In other words, the aim is to design features that efficiently unfold complex underlying structures in the data without relying on prior information or labelled examples.

Machine learning techniques have been applied to a variety of topics within the CosmoStat group including:

Simulation-Based Inference


Deep learning has paved the way for new inference methods, which are of particular interest in cosmology. Indeed, the current methodology for inferring cosmological parameters uses MCMC sampling under the assumption of a Gaussian likelihood. This assumption requires substantial effort to estimate the covariance matrix of the likelihood. Moreover, at certain scales or for certain summary statistics, the likelihood might deviate from a Gaussian.
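To make the Gaussian-likelihood assumption concrete, here is a minimal sketch in Python (NumPy). The covariance is estimated from a set of mock data vectors, which is the costly step mentioned above. The function names, the two-dimensional toy data vector and the number of mocks are illustrative assumptions, not the pipeline used by the group.

```python
import numpy as np

def estimate_covariance(mocks):
    """Sample covariance of mock data vectors, shape (n_mocks, n_data)."""
    return np.cov(mocks, rowvar=False)

def gaussian_log_likelihood(data, model, cov):
    """Log-density of a multivariate Gaussian with mean `model` and covariance `cov`."""
    residual = data - model
    # Solve cov @ y = residual instead of forming the inverse explicitly.
    chi2 = residual @ np.linalg.solve(cov, residual)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (chi2 + logdet + data.size * np.log(2.0 * np.pi))

# Hypothetical toy setup: 5000 mock realisations of a 2-element data vector.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.3], [0.3, 0.5]])
mocks = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=5000)
cov_hat = estimate_covariance(mocks)

data = np.array([0.2, -0.1])
print(gaussian_log_likelihood(data, np.zeros(2), cov_hat))
```

In a real analysis the data vector is much larger, so many more mocks are needed for a stable covariance estimate. This is part of the effort the paragraph above refers to.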

Normalizing Flows are a class of transformations that learn a distribution by transporting a simple distribution (e.g. a multivariate normal distribution) to a complex one through one-to-one mappings in parameter space. These mappings can be parametrized using a neural network. This class of methods is called Neural Density Estimation (NDE).
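As an illustration of such a one-to-one mapping, the sketch below implements a single RealNVP-style affine coupling layer in NumPy. It is a toy with random, untrained weights standing in for a trained network, and the class and function names are assumptions made for this example. Here `forward` maps data to the simple base space and `inverse` maps base samples back to data space. The density follows from the change-of-variables formula log p(x) = log p_base(f(x)) + log |det J_f(x)|.

```python
import numpy as np

class AffineCoupling:
    """One RealNVP-style affine coupling layer acting on 2-D inputs.

    The first coordinate passes through unchanged and determines the
    scale and shift applied to the second coordinate.
    """

    def __init__(self, rng, hidden=8):
        # Random weights stand in for trained network parameters.
        self.w1 = 0.5 * rng.standard_normal((1, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = 0.5 * rng.standard_normal((hidden, 2))
        self.b2 = np.zeros(2)

    def _scale_shift(self, x1):
        h = np.tanh(x1 @ self.w1 + self.b1)
        out = h @ self.w2 + self.b2
        # tanh keeps the log-scale bounded for numerical stability.
        return np.tanh(out[:, :1]), out[:, 1:]

    def forward(self, x):
        """Data -> base space; also returns log|det Jacobian| per sample."""
        x1, x2 = x[:, :1], x[:, 1:]
        log_s, t = self._scale_shift(x1)
        z2 = x2 * np.exp(log_s) + t
        return np.concatenate([x1, z2], axis=1), log_s.sum(axis=1)

    def inverse(self, z):
        """Base space -> data; exact inverse of `forward`."""
        z1, z2 = z[:, :1], z[:, 1:]
        log_s, t = self._scale_shift(z1)
        x2 = (z2 - t) * np.exp(-log_s)
        return np.concatenate([z1, x2], axis=1)

def log_prob(layer, x):
    """Log-density of x under the flow with a 2-D standard normal base."""
    z, log_det = layer.forward(x)
    log_base = -0.5 * (z ** 2).sum(axis=1) - np.log(2.0 * np.pi)
    return log_base + log_det

rng = np.random.default_rng(0)
layer = AffineCoupling(rng)
x = rng.standard_normal((5, 2))
z, log_det = layer.forward(x)
print(np.allclose(layer.inverse(z), x))
print(log_prob(layer, x))
```

A practical flow stacks several such layers, alternates which coordinate is left unchanged, and trains the weights by maximising `log_prob` on samples.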

An example of Neural Density Estimation using RealNVP (c) S. Guerrini


Simulation-Based Inference relies on the idea that the model is mechanistic and can therefore be simulated. In cosmology, this can be done with N-body simulations, whose outputs are used to learn a distribution of interest with NDE, i.e. either the posterior of the cosmological parameters or the likelihood. However, this requires a substantial number of simulations, and a trade-off must be found between the precision of the simulations and their computation time.
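The workflow described above can be sketched on a toy problem. In this example, a one-line hypothetical simulator replaces the N-body code, and a linear-Gaussian conditional density q(theta | x) = N(a x + b, s2) replaces the neural density estimator. Both substitutions are assumptions made to keep the example short. For this particular simulator and prior the true posterior is itself Gaussian with a mean linear in x, so the simple family is sufficient.

```python
import numpy as np

def simulator(theta, rng, noise_std=0.5):
    """Hypothetical toy simulator: the observable is theta plus Gaussian noise."""
    return theta + noise_std * rng.standard_normal(theta.shape)

def fit_linear_gaussian_posterior(theta, x):
    """Maximum-likelihood fit of q(theta | x) = N(a * x + b, s2).

    For a Gaussian with constant variance this reduces to least squares.
    """
    design = np.stack([x, np.ones_like(x)], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, theta, rcond=None)
    s2 = np.mean((theta - (a * x + b)) ** 2)
    return a, b, s2

rng = np.random.default_rng(1)

# 1. Draw parameters from the prior, here a standard normal.
theta = rng.standard_normal(20000)
# 2. Run the simulator once per parameter draw.
x = simulator(theta, rng)
# 3. Learn the posterior from the (theta, x) pairs.
a, b, s2 = fit_linear_gaussian_posterior(theta, x)

# 4. Evaluate the learned posterior at a hypothetical observation.
x_obs = 1.0
print("posterior mean:", a * x_obs + b)
print("posterior std :", np.sqrt(s2))
```

With a unit-variance prior and noise of standard deviation 0.5, the analytic posterior has mean 0.8 x and variance 0.2, which the fit should approach as the number of simulations grows. The need for many simulator calls in step 2 is the cost that the text refers to.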


Figure: Outputs of N-body simulations using different methods. Left panel: standard N-body simulation. Middle left panel: particle mesh simulation (Modi et al. 2020). Middle right panel: particle mesh simulation with neural network correction. Right panel: particle mesh simulation with Potential Gradient Descent correction. The particle mesh simulations are faster than N-body codes but require corrections to resolve small scales accurately. (c) Lanzieri et al. 2022


These methods are currently being investigated for application to real data and are a promising alternative for accurate inference using higher-order statistics. Ongoing work investigates applications of these techniques to stage-III surveys and explores methodologies to exploit the wealth of data that will become available in the era of stage-IV surveys such as Euclid or LSST.


Application to galaxy Spectral Energy Distributions


We have investigated the use of a recently introduced machine learning method, the denoising autoencoder (DAE), for unsupervised feature learning from galaxy SEDs. In the spirit of SED colour diagrams, the proposed approach yields a new representation of galaxy SEDs. We have evaluated how well the resulting DAE diagram recovers the standard star-forming/quiescent galaxy bimodality. We also show that, according to the current understanding of autoencoders, the DAE yields a diagram that extracts astrophysically relevant information from the data that standard SED colour diagrams do not exhibit. This work therefore illustrates the interest of these methods for representing galaxy SEDs and paves the way for the design of more sophisticated models.

Frontera-Pons et al. 2017
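For readers unfamiliar with denoising autoencoders, the following NumPy sketch trains a very small one on synthetic data. It is not the architecture or the data of the work cited above. The "SEDs" are hypothetical straight-line spectra that differ only in slope, and the layer sizes, noise level and learning rate are arbitrary choices. The defining ingredient is that the network receives a corrupted input and is trained to reconstruct the clean one. The hidden-layer activations then serve as the learned low-dimensional representation.

```python
import numpy as np

def train_dae(data, n_hidden=2, noise_std=0.1, lr=0.05, n_epochs=2000, seed=0):
    """Train a one-hidden-layer denoising autoencoder with gradient descent."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    w_enc = 0.1 * rng.standard_normal((n_features, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = 0.1 * rng.standard_normal((n_hidden, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(n_epochs):
        # Corrupt the input; the reconstruction target stays clean.
        noisy = data + noise_std * rng.standard_normal(data.shape)
        code = np.tanh(noisy @ w_enc + b_enc)
        recon = code @ w_dec + b_dec
        err = recon - data
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error.
        g_recon = 2.0 * err / err.size
        g_pre = (g_recon @ w_dec.T) * (1.0 - code ** 2)
        w_dec -= lr * (code.T @ g_recon)
        b_dec -= lr * g_recon.sum(axis=0)
        w_enc -= lr * (noisy.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)
    return (w_enc, b_enc, w_dec, b_dec), losses

def encode(params, x):
    """Map inputs to the learned low-dimensional representation."""
    w_enc, b_enc, _, _ = params
    return np.tanh(x @ w_enc + b_enc)

# Hypothetical data: 500 linear "spectra" on 20 wavelength bins.
rng = np.random.default_rng(4)
grid = np.linspace(0.0, 1.0, 20)
slopes = rng.uniform(-1.0, 1.0, size=500)
seds = 1.0 + 2.0 * slopes[:, None] * (grid[None, :] - 0.5)

params, losses = train_dae(seds)
codes = encode(params, seds)  # one 2-D point per spectrum
print(losses[0], losses[-1])
```

Plotting the two columns of `codes` against each other gives the kind of two-dimensional diagram the text describes, here for toy data only.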

Deblending of galaxy images


Ongoing work is being carried out to investigate the possibility of identifying blended sources in survey images using machine learning techniques. Current methods often employ fixed thresholds to determine whether or not a given patch of the sky contains contributions from multiple sources. Machine learning may offer a more flexible approach that accounts for the diversity of objects in the field.

Shear Bias Calibration


One of the main challenges in Weak Gravitational Lensing is the correct measurement of the shear signal obtained from the shapes of the galaxies. This signal is usually biased by many factors, such as the shape estimation method, pixelization, model bias and image noise. The dependencies of shear bias are very complex and cannot be modelled with simple analytical approaches. We use denoising autoencoders to recover the dependencies of shear bias on many properties simultaneously in order to infer the shear bias coming from individual galaxy images.
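As background, shear bias is commonly summarised in the weak-lensing literature by a multiplicative term m and an additive term c, with g_obs = (1 + m) g_true + c. The text above does not spell out this convention, so it is an assumption of this sketch. The code recovers m and c from simulated shears by a straight-line fit. It illustrates what is being calibrated, not the autoencoder-based method itself, and the input values and noise level are arbitrary.

```python
import numpy as np

def fit_linear_shear_bias(g_true, g_obs):
    """Least-squares fit of g_obs = (1 + m) * g_true + c; returns (m, c)."""
    slope, intercept = np.polyfit(g_true, g_obs, deg=1)
    return slope - 1.0, intercept

# Hypothetical simulation: known input shears and biased, noisy measurements.
rng = np.random.default_rng(2)
g_true = rng.uniform(-0.05, 0.05, size=50000)
m_in, c_in = 0.02, 1e-3
noise = 0.002 * rng.standard_normal(g_true.size)
g_obs = (1.0 + m_in) * g_true + c_in + noise

m_hat, c_hat = fit_linear_shear_bias(g_true, g_obs)
print(m_hat, c_hat)
```

In practice m and c depend on many galaxy and image properties at once, which is why a single global fit like this one is not sufficient.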


Mass Mapping


Mapping the dark content of the Universe is a challenging problem. CosmoStat members have produced the first reconstruction of dark matter maps from weak lensing observational data using deep learning. DeepMass has been used to reconstruct a mass map from DES data and was shown to be substantially more accurate than existing mass-mapping methods.


Distinguishing Cosmological Models


Deep learning tools such as neural networks, which are modelled after the neurological processes of a human brain, have proven to be very useful in solving complex problems with multiple features. One example of their application is distinguishing modified gravity models from ΛCDM using convolutional neural networks.

Peel et al. (2019)


Magnetic Resonance Imaging (MRI) Reconstruction


MRI reconstruction is a challenging inverse problem in which an anatomical image (such as a knee or a brain) must be inferred from its under-sampled Fourier coefficients. We are investigating deep learning methods such as unrolled networks, score-based generative modelling and implicit deep learning to tackle this problem. These approaches, often inspired by well-established methods, allow us to push the limits of MRI scan acceleration, potentially enabling even faster MRI exams or MR images with very high resolution.

Muckley et al. (2020)
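The inverse problem itself can be illustrated with a few lines of NumPy. The sketch below builds the forward model (a 2-D Fourier transform followed by a sampling mask) and the naive zero-filled reconstruction that learned methods aim to improve upon. The rectangular phantom, the random mask and the sampling fraction of about 30% are assumptions for illustration. Real acquisitions use structured trajectories and multiple coils.

```python
import numpy as np

def undersample_kspace(image, mask):
    """Forward model: keep only the Fourier coefficients selected by `mask`."""
    return mask * np.fft.fft2(image, norm="ortho")

def zero_filled_reconstruction(kspace):
    """Baseline: inverse FFT with the missing coefficients left at zero."""
    return np.fft.ifft2(kspace, norm="ortho")

# Hypothetical phantom: a bright rectangle on a dark background.
image = np.zeros((64, 64))
image[20:44, 24:40] = 1.0

# Hypothetical random sampling mask keeping roughly 30% of k-space.
rng = np.random.default_rng(3)
mask = rng.random(image.shape) < 0.3
mask[0, 0] = True  # always keep the zero-frequency coefficient

kspace = undersample_kspace(image, mask)
recon = zero_filled_reconstruction(kspace)

relative_error = np.linalg.norm(recon - image) / np.linalg.norm(image)
print(relative_error)
```

Unrolled networks can be seen as iterating between a data-consistency step built from `undersample_kspace` and a learned correction of the current image estimate.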