Cosmostat Day on Machine Learning in Astrophysics

Date: January the 17th, 2020

Organizer:  Joana Frontera-Pons  <joana.frontera-pons@cea.fr>

Venue:

Local information

CEA Saclay is around 23 km South of Paris. The astrophysics division (DAp) is located at the CEA site at Orme des Merisiers, which is around 1 km South of the main CEA campus. See http://www.cosmostat.org/contact  for detailed information on how to arrive.


On January the 17th, 2020, we organize the fifth day on machine learning in astrophysics at DAp, CEA Saclay.

Program:

All talks are taking place at DAp, Salle Galilée (Building 713)

10:00 - 10:15h. Welcome and coffee
10:15 - 10:45h. Parameter inference using neural networks - Tom Charnock (Institut d'Astrophysique de Paris)
10:45 - 11:15h. Detection and characterisation of solar-type stars with machine learning - Lisa Bugnet (DAp, CEA Paris-Saclay)
11:15 - 11:45h. DeepMass: Deep learning dark matter map reconstructions with Dark Energy Survey data - Niall Jeffrey (ENS)

12:00 - 13:30h. Lunch

13:30 - 14:00h. Hybrid physical-deep learning models for astronomical image processing - François Lanusse (Berkeley Center for Cosmological Physics and CosmoStat CEA Paris Saclay)
14:00 - 14:30h. A flexible EM-like clustering algorithm for noisy data - Violeta Roizman (L2S, CentraleSupélec)
14:30 - 15:00h. Regularizing Optimal Transport Using Regularity Theory - François-Pierre Paty (CREST, ENSAE)
15:00 - 15:30h. Deep Learning @ Safran for Image Processing - Arnaud Woiselle (Safran Electronics and Defense)

15:30 - 16:00h. End of the day


Parameter inference using neural networks

Tom Charnock (Institut d'Astrophysique de Paris)

Neural networks with large training sets are currently providing tighter constraints on cosmological and astrophysical parameters than ever before. However, in their current form, these neural networks are unable to give true Bayesian inference of such model parameters. I will describe why this is true and present two methods by which the information extracting power of neural networks can be built into the necessary robust statistical framework to perform trustworthy inference, whilst at the same time massively reducing the quantity of training data required.


Detection and characterisation of solar-type stars with machine learning

Lisa Bugnet (DAp, CEA Paris-Saclay)

Stellar astrophysics was strengthened in the 1970s by the discovery of stellar oscillations due to acoustic waves inside the Sun. These waves, which propagate inside solar-type stars, carry information about the composition and dynamics of the surrounding plasma and are therefore valuable for understanding internal and surface physical processes in stars. With classical asteroseismology we are able to extract very precise and accurate masses, radii, and ages of oscillating stars, which are key parameters for understanding stellar evolution.
However, classical asteroseismic methods are time-consuming and can only be applied to stars showing a large enough oscillation signal. In the context of the hundreds of thousands of stars observed by the Transiting Exoplanet Survey Satellite (TESS), the stellar community has to adapt the methodologies previously built for the study of the few tens of thousands of stars observed with much better resolution by the Kepler satellite. Our method uses Random Forest machine learning algorithms to automatically 1) classify and 2) characterize any stellar pulsators from global non-seismic parameters. We also present a recent result, based on neural networks, on the automatic detection of peculiar solar-type pulsators that have a surprisingly low dipolar-oscillation amplitude, the signature of an unknown physical process affecting oscillation modes inside the core.


DeepMass: Deep learning dark matter map reconstructions with Dark Energy Survey data

Niall Jeffrey (ENS)

I will present the first reconstruction of dark matter maps from weak lensing observational data using deep learning. We train a convolutional neural network (CNN) with a U-Net-based architecture on over 3.6×10^5 simulated data realisations with non-Gaussian shape noise and with cosmological parameters varying over a broad prior distribution. We interpret our newly created DES SV map as an approximation of the mean of the posterior P(κ|γ) of the convergence given the observed shear. The DeepMass method is substantially more accurate than existing mass-mapping methods on a validation set of 8000 simulated DES SV data realisations. With higher galaxy density in future weak lensing data unveiling more non-linear scales, it is likely that deep learning will be a leading approach for mass mapping with Euclid and LSST.


Hybrid physical-deep learning models for astronomical image processing

François Lanusse (Berkeley Center for Cosmological Physics and CosmoStat CEA Paris Saclay)

The upcoming generation of wide-field optical surveys, which includes LSST, will aim to shed some much-needed light on the physical nature of dark energy and dark matter by mapping the Universe in great detail and on an unprecedented scale. However, the increase in data quality also brings a significant increase in data complexity, raising new and outstanding challenges at all levels of the scientific analysis.
In this talk, I will illustrate how deep generative models, combined with physical modeling, can be used to address some of these challenges at the image processing level, specifically by providing data-driven priors of galaxy morphology.
I will first describe how to build such generative models from corrupted and heterogeneous data, i.e. when the training set contains varying observing conditions (in terms of noise, seeing, or even instruments). This is a necessary step for practical applications, made possible by a hybrid modeling of the generation process, using deep neural networks to model the underlying distribution of galaxy morphologies, complemented by a physical model of the noise and instrumental response. Sampling from these models produces realistic galaxy light profiles, which can then be used in survey emulation, for the purpose of validating and/or calibrating data reduction pipelines.

Even more interestingly, these models can serve as priors on galaxy morphologies within standard Bayesian inference techniques to solve astronomical inverse problems ranging from deconvolution to deblending of galaxy images. I will present how combining these deep morphology priors with a physical forward model of observed blended scenes allows us to address the galaxy deblending problem in a physically motivated and interpretable way.


A flexible EM-like clustering algorithm for noisy data

Violeta Roizman (L2S, CentraleSupélec)

Though the EM algorithm is very popular, it is well known to perform poorly in the presence of non-Gaussian distribution shapes and outliers. This talk will present a flexible EM-like clustering algorithm that can deal with noise and outliers in diverse data sets. This flexibility is due to extra scale parameters that allow us to accommodate heavier-tailed distributions and outliers without significantly losing efficiency in various classical scenarios. I will show experiments where we compare it to other clustering methods such as k-means, EM and spectral clustering when applied to both synthetic data and real data sets. I will conclude with an application example of our algorithm used for image segmentation.


Regularizing Optimal Transport Using Regularity Theory

François-Pierre Paty (CREST, ENSAE)

Optimal transport (OT) dates back to the end of the 18th century, when French mathematician Gaspard Monge proposed to solve the problem of déblais and remblais. In the last few years, OT has also found new applications in statistics and machine learning as a way to analyze and compare data. Both in practice and for statistical reasons, OT needs to be regularized. In this talk, I will present a new regularization of OT leveraging regularity of the Monge map. Instead of considering regularity as a property that can be proved under suitable assumptions, we consider regularity as a condition that must be enforced when estimating OT. This further allows us to transport out-of-sample points, as well as define a new estimator of the 2-Wasserstein distance between arbitrary measures. (Based on joint work with Alexandre d'Aspremont and Marco Cuturi.)
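
For readers unfamiliar with the terminology, the two standard objects named in this abstract can be written as below. This is textbook background with the squared Euclidean cost, not the regularized estimator presented in the talk; the symbols μ, ν, T, π and Π(μ, ν) (the set of couplings with marginals μ and ν) are notation introduced here for illustration.

```latex
% Monge problem: find a map T that pushes \mu onto \nu at minimal cost
\min_{T \,:\, T_{\#}\mu = \nu} \int \lVert x - T(x) \rVert^2 \, d\mu(x)

% 2-Wasserstein distance (Kantorovich formulation over couplings \pi)
W_2(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int \lVert x - y \rVert^2 \, d\pi(x, y) \right)^{1/2}
```

When a minimizing map T exists it is called the Monge map, which is the object whose regularity the abstract refers to.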


Deep Learning @ Safran for Image Processing

Arnaud Woiselle (Safran Electronics and Defense)

Deep learning has for many years been the natural tool in computer vision for nearly all high-level tasks, such as object detection and classification, and is now state of the art in most image processing (restoration) tasks, such as deblurring or super-resolution. Safran looked into these methods for a large variety of problems, focusing on the use of a small number of network structures, due to electronics constraints for future implementation, and transferred them to real-life noisy and blurry data, both in the visible and the infrared. I will show the results in many applications, and conclude with some tips and take-away messages on what seems important when applying deep learning to a given task.


Previous Cosmostat Days on Machine Learning in Astrophysics:

Cosmostat Day on Machine Learning in Astrophysics

Date: January the 24th, 2019

Organizer:  Joana Frontera-Pons  <joana.frontera-pons@cea.fr>

Venue:

Local information

CEA Saclay is around 23 km South of Paris. The astrophysics division (DAp) is located at the CEA site at Orme des Merisiers, which is around 1 km South of the main CEA campus. See http://www.cosmostat.org/contact  for detailed information on how to arrive.


On January the 24th, 2019, we organize the fourth day on machine learning in astrophysics at DAp, CEA Saclay. 

Program:

All talks are taking place at DAp, Salle Galilée (Building 713)

14:00 - 14:30h. Machine Learning in High Energy Physics: trends and successes - David Rousseau (LAL)
14:30 - 15:00h. Learning recurring patterns in large signals with convolutional dictionary learning - Thomas Moreau (Parietal team - INRIA Saclay)
15:00 - 15:30h. Distinguishing standard and modified gravity cosmologies with machine learning -  Austin Peel (CEA Saclay - CosmoStat)

15:30 - 16:00h. Coffee break

16:00 - 16:30h. The ASAP algorithm for nonsmooth nonconvex optimization. Applications in imagery - Pauline Tan (LJLL - Sorbonne Université)
16:30 - 17:00h. Deep Learning for Blended Source Identification in Galaxy Survey Data - Samuel Farrens (CEA Saclay - CosmoStat)


Machine Learning in High Energy Physics: trends and successes

David Rousseau (LAL)

Machine Learning was used to some extent in HEP in the nineties, then at the Tevatron and recently at the LHC (mostly Boosted Decision Trees). However, with the birth of internet giants at the turn of the century, there has been an explosion of Machine Learning tools in industry. Over the last few years, a collective effort has been under way to bring state-of-the-art Machine Learning tools to High Energy Physics.
This talk will give a tour d'horizon of Machine Learning in HEP: a review of tools; examples of applications, some used currently, some in a (possibly distant) future (e.g. deep learning, image vision, GAN); recent and future HEP ML Kaggle competitions. I will conclude with the key points for setting up frameworks for High Energy Physics and Machine Learning collaborations.


Learning recurring patterns in large signals with convolutional dictionary learning

Thomas Moreau (Parietal team - INRIA Saclay)

Convolutional dictionary learning has become a popular tool in image processing for denoising or inpainting. This technique extends dictionary learning to learn adapted bases that are shift-invariant. This talk will discuss how this technique can also be used in the context of large multivariate signals to learn and localize recurring patterns. I will discuss both computational aspects, with efficient iterative and distributed convolutional sparse coding algorithms, as well as a novel rank-1 constraint for the learned atoms. This constraint, inspired by the underlying physical model for neurological signals, is then used to highlight relevant structure in MEG signals.


Distinguishing standard and modified gravity cosmologies with machine learning

Austin Peel (CEA Saclay - CosmoStat)

Modified gravity models that include massive neutrinos can mimic the standard concordance model in terms of weak-lensing observables. The inability to distinguish between these cases could limit our understanding of the origin of cosmic acceleration and of the fundamental nature of gravity. I will present a neural network we have designed to classify such cosmological scenarios based on the weak-lensing maps they generate. I will discuss the network's performance on both clean and noisy data, as well as how results compare to conventional statistical approaches.


The ASAP algorithm for nonsmooth nonconvex optimization. Applications in imagery

Pauline Tan (LJLL - Sorbonne Université)

In this talk, I will address the challenging problem of optimizing nonsmooth and nonconvex objective functions. Such problems are increasingly encountered in applications, especially when tackling joint estimation problems. I will propose a novel algorithm and demonstrate its convergence properties. Finally, three real applications to industrial imaging problems will be presented.


Deep Learning for Blended Source Identification in Galaxy Survey Data

Samuel Farrens (CEA Saclay - CosmoStat)

Weak gravitational lensing is a powerful probe of cosmology that will be employed by upcoming surveys such as Euclid and LSST to map the distribution of dark matter in the Universe. The technique, however, requires precise measurements of galaxy shapes over large areas. The chance alignment of galaxies along the line of sight, i.e. blending of images, can lead to biased shape measurements that propagate to parameter estimates. Machine learning techniques can provide an automated and robust way of dealing with these blended sources. In this talk I will discuss how machine learning can be used to classify sources identified in survey data as blended or not and show some preliminary results for CFIS simulations. I will then present some plans for future developments making use of multi-class classification and segmentation.


Previous Cosmostat Days on Machine Learning in Astrophysics:

Cosmostat Day on Machine Learning in Astrophysics

Date: January the 26th, 2018

Organizer:  Joana Frontera-Pons  <joana.frontera-pons@cea.fr>

Venue:

Local information

CEA Saclay is around 23 km South of Paris. The astrophysics division (DAp) is located at the CEA site at Orme des Merisiers, which is around 1 km South of the main CEA campus. See http://www.cosmostat.org/link/how-to-get-to-sap/ for detailed information on how to arrive.


On January the 26th, 2018, we organize the third day on machine learning in astrophysics at DAp, CEA Saclay.

Program:

All talks are taking place at DAp, Salle Galilée (Building 713)

10:00 - 10:45h. Artificial Intelligence: Past, present and future - Marc Duranton (CEA Saclay)
10:45 - 11:15h. Astronomical image reconstruction with convolutional neural networks - Rémi Flamary (Université Nice-Sophia Antipolis)
11:15 - 11:45h. CNN based strong gravitational Lens finder for the Euclid pipeline - Christoph Ernst René Schäfer (Laboratory of Astrophysics, EPFL)

12:00 - 13:30h. Lunch

13:30 - 14:00h. Optimize training samples for future supernova surveys using Active Learning - Emille Ishida (Laboratoire de Physique de Clermont)
14:00 - 14:30h. Regularization via proximal methods - Silvia Villa (Politecnico di Milano)
14:30 - 15:00h. Deep Learning for Physical Processes: Incorporating Prior Scientific Knowledge - Arthur Pajot (LIP6)
15:00 - 15:30h. Wasserstein dictionary Learning - Morgan Schmitz (CEA Saclay - CosmoStat)

15:30 - 16:00h. Coffee break

16:00 - 17:00h. Round table

 


Artificial Intelligence: Past, present and future

Marc Duranton (CEA Saclay)

There is a lot of hype today about Deep Learning and its applications. This technology originated in the 1950s from a simplification of the observations made by neurophysiologists and vision specialists who tried to understand how neurons interact with each other and how the brain is structured for vision. This talk will revisit the history of the connectionist approach and will give a quick overview of how it works and of the current applications in various domains. It will also open discussions on how bio-inspiration could lead to a new approach in computing science.


Astronomical image reconstruction with convolutional neural networks

Rémi Flamary (Université Nice-Sophia Antipolis)

State-of-the-art methods in astronomical image reconstruction rely on the resolution of a regularized or constrained optimization problem. Solving this problem can be computationally intensive, especially with large images. We investigate in this work the use of convolutional neural networks for image reconstruction in astronomy. With neural networks, the computationally intensive task is the training step, but the prediction step has a fixed complexity per pixel, i.e. a linear complexity. Numerical experiments for fixed PSF and varying PSF in large fields of view show that CNNs are computationally efficient and competitive with optimization-based methods in addition to being interpretable.


CNN based strong gravitational Lens finder for the Euclid pipeline

Christoph Ernst René Schäfer (Laboratory of Astrophysics, EPFL) 

Within the Euclid survey, 10^5 new strong gravitational lenses are expected to be found within 35% of the observable sky. Identifying these objects in a reasonable amount of time necessitates the development of powerful machine-learning-based classifiers. One option for the Euclid pipeline is CNN-based classifiers, which performed admirably during the last Galaxy-Galaxy Strong Lensing Challenge. This talk will first showcase the potential of CNNs for this particular task and then expose some of the issues that CNNs still have to overcome.


Optimize training samples for future supernova surveys using Active Learning

 Emille Ishida (Laboratoire de Physique de Clermont)

The full exploitation of the next generation of large-scale photometric supernova surveys depends heavily on our ability to provide a reliable early-epoch classification based solely on photometric data. In preparation for this scenario, there have been many attempts to apply different machine learning algorithms to the supernova photometric classification problem. Although different methods achieve different degrees of success, textbook machine learning methods fail to address the crucial issue of lack of representativeness between spectroscopic (training) and photometric (target) samples. In this talk I will show how Active Learning (or optimal experiment design) can be used as a tool for optimizing the construction of spectroscopic samples for classification purposes. I will present results on how the design of spectroscopic samples from the beginning of the survey can achieve optimal classification results with a much lower number of spectra than the currently adopted strategy.


Regularization via proximal methods

Silvia Villa (Politecnico di Milano) 

In the context of linear inverse problems, I will discuss iterative regularization methods that accommodate large classes of data-fit terms and regularizers. In particular, I will investigate regularization properties of first-order proximal splitting optimization techniques. Such methods are appealing since their computational complexity is tailored to the estimation accuracy allowed by the data, as I will show theoretically and numerically.
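
As background on what a first-order proximal splitting scheme looks like, here is a minimal sketch of the standard forward-backward (ISTA) iteration for an l1-regularized least-squares problem. It is a textbook illustration and not the iterative regularization method of the talk; the function names `soft_threshold`, `ista` and `objective`, the squared-error data-fit term and the l1 regularizer are all assumptions chosen for the example.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (component-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))


def ista(A, y, lam, n_iter=200):
    """Forward-backward splitting for the objective above (hypothetical helper)."""
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # of the data-fit term, i.e. the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                    # forward (gradient) step
        x = soft_threshold(x - grad / L, lam / L)   # backward (proximal) step
    return x
```

Each iteration costs two matrix-vector products plus a component-wise shrinkage, so the number of iterations directly controls the computational budget.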


Deep Learning for Physical Processes:  Incorporating Prior Scientific Knowledge 

Arthur Pajot (LIP6)

We consider the use of Deep Learning methods for modeling complex phenomena like those occurring in natural physical processes. With the large amount of data gathered on these phenomena, the data-intensive paradigm could begin to challenge more traditional approaches elaborated over the years in fields like maths or physics. However, despite considerable successes in a variety of application domains, the machine learning field is not yet ready to handle the level of complexity required by such problems. Using an example application, namely Sea Surface Temperature Prediction, we show how general background knowledge gained from physics could be used as a guideline for designing efficient Deep Learning models.


Wasserstein dictionary Learning

Morgan Schmitz (CEA Saclay - CosmoStat)

Optimal Transport theory enables the definition of a distance across the set of measures on any given space. This Wasserstein distance naturally accounts for geometric warping between measures (including, but not limited to, images). We introduce a new, Optimal Transport-based representation learning method in close analogy with the usual Dictionary Learning problem. Usual Dictionary Learning typically relies on a matrix product between the learned dictionary and the codes making up the new representation. The relationship between atoms and data is thus ultimately linear.

We instead use automatic differentiation to derive gradients of the Wasserstein barycenter operator, and we learn a set of atoms and barycentric weights from the data in an unsupervised fashion. Since our data is reconstructed as Wasserstein barycenters of our learned atoms, we can make full use of the attractive properties of the Optimal Transport geometry. In particular, our representation allows for non-linear relationships between atoms and data.
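
The contrast drawn in this abstract between the linear and the barycentric representation can be sketched as follows. The notation is introduced here for illustration and is an assumption: x_i is a datum, d_1, ..., d_K are the atoms, λ_i the weights, and W stands for the transport cost used to define the barycenter (for example a squared Wasserstein distance, possibly regularized).

```latex
% Usual (linear) dictionary learning: each datum is a linear combination of atoms
x_i \approx D \lambda_i = \sum_{k=1}^{K} \lambda_{ik} \, d_k

% Wasserstein dictionary learning: each datum is a Wasserstein barycenter of the atoms
x_i \approx \operatorname*{arg\,min}_{\mu} \sum_{k=1}^{K} \lambda_{ik} \, W(\mu, d_k),
\qquad \lambda_{ik} \ge 0, \quad \sum_{k=1}^{K} \lambda_{ik} = 1
```

In the second form the atoms enter through a minimization rather than a sum, which is why the relationship between atoms and data is no longer linear.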

 


 Previous Cosmostat Days on Machine Learning in Astrophysics :

 

Unsupervised feature learning for galaxy SEDs with denoising autoencoders

 

Authors: Frontera-Pons, J., Sureau, F., Bobin, J. and Le Floc'h, E.
Journal: Astronomy & Astrophysics
Year: 2017


Abstract

With the increasing number of deep multi-wavelength galaxy surveys, the spectral energy distribution (SED) of galaxies has become an invaluable tool for studying the formation of their structures and their evolution. In this context, standard analysis relies on simple spectro-photometric selection criteria based on a few SED colors. Although this fully supervised classification has already yielded clear achievements, it is not optimal for extracting relevant information from the data. In this article, we propose to employ very recent advances in machine learning, and more precisely in feature learning, to derive a data-driven diagram. We show that the proposed approach based on denoising autoencoders recovers the bi-modality in the galaxy population in an unsupervised manner, without using any prior knowledge on galaxy SED classification. This technique has been compared to principal component analysis (PCA) and to standard color/color representations. In addition, preliminary results illustrate that this enables the capturing of extra physically meaningful information, such as redshift dependence, galaxy mass evolution and variation over the specific star formation rate. PCA also results in an unsupervised representation with physical properties, such as mass and sSFR, although this representation separates out other characteristics (bimodality, redshift evolution) less well than denoising autoencoders.