Cosmostat Day on Machine Learning in Astrophysics

Date: March the 5th, 2021

Organizer: Joana Frontera-Pons

Venue: Remote conference. Zoom link to come

On March the 5th, 2021, we organize the 6th day on machine learning in astrophysics at DAp, CEA Saclay. 


All talks are taking place remotely

13:30 - 13:40h. Welcome message                                                   
13:40 - 14:20h. Data-driven detection of multi-messenger transients - Iftach Sadeh (Deutsches Elektronen-Synchrotron)
14:20 - 15:00h. Deep Learning in Radio Astronomy - Vesna Lukic (Vrije Universiteit Brussel)   
15:00 - 15:40h. Machine Learning for Galaxy Image Reconstruction with Problem Specific Loss - Fadi Nammour (CosmoStat - CEA Saclay)   

15:40 - 16:00h. Coffee break with virtual croissants

16:00 - 16:40h. Anomaly detection with generative methods - Coloma Ballester (Universitat Pompeu Fabra)
16:40 - 17:20h. Deep learning for environmental sciences - Jan Dirk Wegner (ETH Zurich)
17:20 - 18:00h. Graph Neural Networks - Fernando Gama (University of California, Berkeley)

18:00 - 18:05h. End of the day

Data-driven detection of multi-messenger transients

Iftach Sadeh (Deutsches Elektronen-Synchrotron)

The primary challenge in the study of explosive astrophysical transients is their detection and characterisation using multiple messengers. For this purpose, we have developed a new data-driven discovery framework, based on deep learning. We demonstrate its use for searches involving neutrinos, optical supernovae, and gamma rays. We show that we can match or substantially improve upon the performance of state-of-the-art techniques, while significantly minimising the dependence on modelling and on instrument characterisation. Particularly, our approach is intended for near- and real-time analyses, which are essential for effective follow-up of detections. Our algorithm is designed to combine a range of instruments and types of input data, representing different messengers, physical regimes, and temporal scales. The methodology is optimised for agnostic searches of unexpected phenomena, and has the potential to substantially enhance their discovery prospects.

Deep Learning in Radio Astronomy

Vesna Lukic (Vrije Universiteit Brussel)

Machine learning techniques have proven to be increasingly useful in astronomical applications over the last few years, for example in image classification and time series analysis. A topic of current interest is the classification of radio galaxy morphologies, as it gives us insight into the nature of Active Galactic Nuclei and structure formation. Future surveys such as the Square Kilometre Array (SKA) will detect many millions of sources and will require the use of automated techniques. Convolutional neural networks are a machine learning technique that has been very successful in image classification, due to their ability to capture high-dimensional features in the data. We show the performance of simple convolutional network architectures in classifying radio sources from the Radio Galaxy Zoo. The use of pooling in such networks results in information losses which adversely affect the classification performance; Capsule networks, however, preserve this information through the use of dynamic routing. We compare a couple of convolutional neural network architectures against variations of Capsule network setups and evaluate their performance in replicating the classifications of radio galaxies detected by the Low Frequency Array (LOFAR). Finally, we also show how it is possible to use convolutional neural networks to find sources in radio surveys.

Machine Learning for Galaxy Image Reconstruction with Problem Specific Loss

Fadi Nammour (CosmoStat - CEA Saclay)

Telescope images are corrupted with blur and noise. Generally, blur is represented by a convolution with a Point Spread Function and noise is modelled as Additive Gaussian Noise. Restoring galaxy images from the observations is an inverse problem that is ill-posed and specifically ill-conditioned. The majority of the standard reconstruction methods minimise the Mean Square Error to reconstruct images, without any guarantee that the shape of the objects contained in the data (e.g. galaxies) is preserved. Here we introduce a shape constraint, exhibit its properties and show how it preserves galaxy shapes when combined with Machine Learning reconstruction algorithms.
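
As a rough sketch of the setting described above, the observation model and the usual Mean Square Error criterion can be written as follows. The symbols are assumptions introduced here for illustration and are not taken from the talk: y is the observed image, x the true galaxy image, h the Point Spread Function, n the noise with standard deviation σ, x̂ a reconstruction, and N the number of pixels.

```latex
\begin{align}
  y &= h \ast x + n, \qquad n \sim \mathcal{N}(0, \sigma^2 I), \\
  \mathcal{L}_{\mathrm{MSE}}(\hat{x}) &= \frac{1}{N} \lVert \hat{x} - x \rVert_2^2 .
\end{align}
```

The second line only measures pixel-wise error, which is why minimising it alone gives no guarantee on the shape of the reconstructed object.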

Anomaly detection with generative methods

Coloma Ballester (Universitat Pompeu Fabra)

Anomaly detection is frequently approached as out-of-distribution or outlier detection. In this talk, a method for out-of-distribution detection will be discussed. It leverages the learning of the probability distribution of normal data through generative adversarial networks while simultaneously keeping track of the states of the learning to finally estimate an efficient anomaly detector.

Deep learning for environmental sciences

Jan Dirk Wegner (ETH Zurich)

A multitude of different sensors is capturing massive amounts of geo-coded data with different spatial resolution, temporal frequency, viewpoint, and quality every day. Modelling functional relationships for applications is often hard and loses predictive power due to the high variance in sensor modality. Data-driven approaches, especially modern deep learning, come to the rescue and learn expressive models directly from (labeled) input data. In this talk, I will present deep learning methods to analyze geospatial data at large scale for two specific applications in the environmental sciences: biodiversity estimation and global vegetation height mapping.

Graph Neural Networks

Fernando Gama (University of California, Berkeley)

Graphs are generic models of signal structure that can help to learn in several practical problems. To learn from graph data, we need scalable architectures that can be trained on moderate dataset sizes and that can be implemented in a distributed manner. In this talk, I will draw from graph signal processing to define graph convolutions, and use them to introduce graph neural networks (GNNs). I will prove that GNNs are permutation equivariant and stable to perturbations of the graph, properties that explain their scalability and transferability. I will also use these results to explain the advantages of GNNs over linear graph filters. I will then discuss the problem of learning decentralized controllers, and how GNNs naturally leverage the partial information structure inherent to distributed systems. Using flocking as an illustrative example, I will show that GNNs not only successfully learn distributed actions that coordinate the team but also transfer and scale to larger teams.
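
To make the permutation equivariance property concrete, here is a minimal sketch. It assumes one standard form of graph convolution from graph signal processing, a polynomial in a graph shift operator S, y = sum_k h[k] S^k x; the abstract does not state which form the talk uses. The names `graph_convolution`, `S`, `x`, `h` and `p` are hypothetical, and the graph is a random toy example.

```python
import random


def matvec(S, x):
    """Multiply the square matrix S (list of rows) by the vector x."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def graph_convolution(S, x, h):
    """Graph filter y = sum_k h[k] * S^k x (assumed polynomial form)."""
    y = [0.0] * len(x)
    z = list(x)  # holds S^k x, starting with k = 0
    for hk in h:
        y = [yi + hk * zi for yi, zi in zip(y, z)]
        z = matvec(S, z)
    return y


random.seed(0)
n = 6

# Toy symmetric weighted adjacency matrix used as the shift operator.
S = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        S[i][j] = S[j][i] = random.random()

x = [random.gauss(0.0, 1.0) for _ in range(n)]  # signal on the nodes
h = [0.5, 0.3, 0.2]                             # filter coefficients

# Relabel the nodes with a random permutation p.
p = list(range(n))
random.shuffle(p)
S_perm = [[S[p[i]][p[j]] for j in range(n)] for i in range(n)]
x_perm = [x[p[i]] for i in range(n)]

y = graph_convolution(S, x, h)
y_perm = graph_convolution(S_perm, x_perm, h)

# Equivariance: filtering the relabelled graph equals relabelling the output.
max_gap = max(abs(y_perm[i] - y[p[i]]) for i in range(n))
print(max_gap < 1e-9)
```

Because the filter depends on the graph only through S, relabelling the nodes simply relabels the output, which is the property the abstract refers to for the linear layer of a GNN.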

 Previous Cosmostat Days on Machine Learning in Astrophysics :

CosmosClub: Ariel Sánchez (09/07/20)

Date: July 9th 2020, 10.00 a.m.

Speaker: Ariel Sánchez (MPE Garching / Max-Planck-Institut für extraterrestrische Physik)

Title: Let us bury the prehistoric h: arguments against using h^-1 Mpc units in observational cosmology

Room: Zoom Meeting (connection details will be updated soon)


It is common to express cosmological measurements in units of h^-1 Mpc. Here, we review some of the complications that originate from this practice. A crucial problem caused by these units is related to the normalization of the matter power spectrum, which is commonly characterized in terms of the linear-theory rms mass fluctuation in spheres of radius 8 h^-1 Mpc, σ8. This parameter does not correctly capture the impact of h on the amplitude of density fluctuations. We show that the use of σ8 has caused critical misconceptions for both the so-called σ8 tension regarding the consistency between low-redshift probes and cosmic microwave background data, and the way in which growth-rate estimates inferred from redshift-space distortions are commonly expressed. We propose to abandon the use of h^-1 Mpc units in cosmology and to characterize the amplitude of the matter power spectrum in terms of σ12, defined as the mass fluctuation in spheres of radius 12 Mpc, whose value is similar to the standard σ8 for h ~ 0.67.
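
The choice of 12 Mpc can be checked with a short calculation: a radius of 8 h^-1 Mpc corresponds to 8 / h Mpc, which is about 11.9 Mpc for h = 0.67. The sketch below is only an illustration of that unit conversion; the function name `radius_in_mpc` and the sample values of h other than 0.67 are assumptions, not taken from the talk.

```python
def radius_in_mpc(radius_h_units: float, h: float) -> float:
    """Convert a length quoted in h^-1 Mpc into Mpc for a given value of h."""
    return radius_h_units / h


# The physical radius behind "8 h^-1 Mpc" changes with the assumed h.
for h in (0.60, 0.67, 0.74):
    print(f"h = {h:.2f}: 8 h^-1 Mpc = {radius_in_mpc(8.0, h):.2f} Mpc")
```

The physical scale probed by σ8 therefore shifts with h, whereas a radius fixed at 12 Mpc does not.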


CosmosClub: Erwan Allys (02/07/20)

Date: July 2nd 2020, 10.00 a.m.

Speaker: Erwan Allys (ENS Paris / École Normale Supérieure, Laboratoire de Radioastronomie)

Title: The Wavelet Phase Harmonics, a new interpretable statistical description for analysis and synthesis of the LSS

Room: Zoom Meeting (connection details will be updated soon)


The statistical characterization of non-Gaussian fields is a major problem in current astrophysics, and no method has clearly emerged so far to address it. In this presentation, I will introduce the Wavelet Phase Harmonics (WPH), a low-dimensional and interpretable set of statistics that efficiently characterizes the couplings between scales in non-linear processes. This description, which was recently introduced in data science, is inspired by neural networks. Applying the WPH to projected matter density fields from the Quijote N-body Large Scale Structure (LSS) simulations, I will show that they provide better constraints on five cosmological parameters than the joint power spectrum and bispectrum, and that they can produce new realistic statistical syntheses from a maximum-entropy model. These results open the path to the use of a new type of statistical description for non-Gaussian fields in astrophysics.


CosmosClub: Florent Mertens (10/03/20)

Date: March 18th 2020, 10.30am

Speaker: Florent Mertens (LERMA / Kapteyn Astronomical Institute)

Title: The challenges of observing the Epoch of Reionization and Cosmic Dawn

Room: Cassini 


Low-frequency observations of the redshifted 21-cm line promise to open a new window onto the first billion years of cosmic history, allowing us to directly study the astrophysical processes occurring during the Epoch of Reionization (EoR) and the Cosmic Dawn (CD). This exciting goal is challenged by the difficulty of extracting the feeble 21-cm signal buried under astrophysical foregrounds orders of magnitude brighter and contaminated by numerous instrumental systematics. Several experiments such as LOFAR, MWA, HERA, and NenuFAR are currently underway, aiming at statistically detecting the 21-cm brightness temperature fluctuations from the EoR and CD. While no detection is yet in sight, considerable progress has been made recently. In this talk, I will review the many challenges faced by these difficult experiments and I will share the latest developments of the LOFAR Epoch of Reionization and NenuFAR Cosmic Dawn key science projects.


CosmosClub: Céline Gouin (20/02/20)

Date: February 20th 2020, 10.00 am

Room: Kepler

Speaker: Céline Gouin (IAS, COSMIX)

Title: Probing the azimuthal environment of galaxies around clusters. From cluster core to cosmic filaments


Galaxy clusters are connected at their peripheries to the large scale structures by cosmic filaments that funnel accreting material. Therefore, the vicinities of galaxy clusters are ideal places to quantify the geometry and topology of the cosmic web. These filamentary structures are studied to investigate both environment-driven galaxy evolution and the growth of massive structures. In this presentation, I probe angular features in the distribution of galaxies around clusters by performing harmonic decompositions in large photometric galaxy catalogues around low-z clusters. In the clusters’ outskirts, filamentary patterns are detected in harmonic space: massive clusters seem to have a larger number of connected filaments than low-mass ones. Our results also suggest a gradient of galaxy activity in filaments around clusters.

Euclid joint meeting: WL + GC + CG SWG + OU-LE3




February 3 - 7, 2020


IAP - Institut d'Astrophysique de Paris, 98 bis, bd Arago, 75014 Paris


The preliminary schedule can be found here:

Slides (password-protected) are on redmine.

The meeting starts on Monday 3 February at 9:30.


Participant list

Please add your name to the following list if you intend to participate. To access IAP, external people are required to indicate their name in advance of the meeting, and might have to show identification at the IAP front desk. There is no conference fee.

Practical information

How to get to IAP.

Hotel list.

Restaurant list.


Martin Kilbinger

Sandrine Codis


CosmosClub: Irène Waldspurger (04/12/19)

Date: December 4th 2019, 10.30am

Speaker: Irène Waldspurger (CEREMADE,  Université Paris-Dauphine)

Title: Convex and non-convex algorithms for phase retrieval

Room: Cassini 


Phase retrieval problems consist in recovering elements of a complex vector space from the modulus of their scalar product with a fixed family of measurement vectors. Traditional reconstruction algorithms rely on simple local optimization heuristics. Although they can in principle get stuck in local optima, because of the non-convexity of the problem, they are observed to work well in many situations.

In this talk, we will see which theoretical correctness guarantees one can establish, in a particular setting, for the best-known such algorithm. We will also present a different family of algorithms, based on so-called convexification techniques, and describe its advantages and limitations.



Cosmostat Day on Machine Learning in Astrophysics

Date: January the 17th, 2020

Organizer: Joana Frontera-Pons


Local information

CEA Saclay is around 23 km South of Paris. The astrophysics division (DAp) is located at the CEA site at Orme des Merisiers, which is around 1 km South of the main CEA campus. See  for detailed information on how to arrive.

On January the 17th, 2020, we organize the 5th day on machine learning in astrophysics at DAp, CEA Saclay. 


All talks are taking place at DAp, Salle Galilée (Building 713)

10:00 - 10:15h. Welcome and coffee
10:15 - 10:45h. Parameter inference using neural networks - Tom Charnock (Institut d'Astrophysique de Paris)
10:45 - 11:15h. Detection and characterisation of solar-type stars with machine learning - Lisa Bugnet (DAp, CEA Paris-Saclay)
11:15 - 11:45h. DeepMass: Deep learning dark matter map reconstructions with Dark Energy Survey data - Niall Jeffrey (ENS)

12:00 - 13:30h. Lunch

13:30 - 14:00h. Hybrid physical-deep learning models for astronomical image processing - François Lanusse (Berkeley Center for Cosmological Physics and CosmoStat CEA Paris Saclay)
14:00 - 14:30h. A flexible EM-like clustering algorithm for noisy data - Violeta Roizman (L2S, CentraleSupélec)
14:30 - 15:00h. Regularizing Optimal Transport Using Regularity Theory - François-Pierre Paty (CREST, ENSAE)
15:00 - 15:30h. Deep Learning @ Safran for Image Processing - Arnaud Woiselle (Safran Electronics and Defense)

15:30 - 16:00h. End of the day

Parameter inference using neural networks

Tom Charnock (Institut d'Astrophysique de Paris)

Neural networks with large training sets are currently providing tighter constraints on cosmological and astrophysical parameters than ever before. However, in their current form, these neural networks are unable to give true Bayesian inference of such model parameters. I will describe why this is true and present two methods by which the information extracting power of neural networks can be built into the necessary robust statistical framework to perform trustworthy inference, whilst at the same time massively reducing the quantity of training data required.

Detection and characterisation of solar-type stars with machine learning

Lisa Bugnet (DAp, CEA Paris-Saclay)

Stellar astrophysics was strengthened in the 1970s by the discovery of stellar oscillations due to acoustic waves inside the Sun. These waves evolving inside solar-type stars contain information about the composition and dynamics of the surrounding plasma, and are thus very interesting for the understanding of stellar internal and surface physical processes. With classical asteroseismology we are able to extract very precise and accurate masses, radii, and ages of oscillating stars, which are key parameters for the understanding of stellar evolution.
However, classical asteroseismic methods are time-consuming and can only be applied to stars showing a large enough oscillation signal. In the context of the hundreds of thousands of stars observed by the Transiting Exoplanet Survey Satellite (TESS), the stellar community has to adapt the methodologies previously built for the study of the few tens of thousands of stars observed with much better resolution by the Kepler satellite. Our method exploits Random Forest machine learning algorithms that aim at automatically 1) classifying and 2) characterizing any stellar pulsators from global non-seismic parameters. We also present a recent result based on neural networks on the automatic detection of peculiar solar-type pulsators that have a surprisingly low dipolar-oscillation amplitude, the signature of an unknown physical process affecting oscillation modes inside the core.

DeepMass: Deep learning dark matter map reconstructions with Dark Energy Survey data

Niall Jeffrey (ENS)

I will present the first reconstruction of dark matter maps from weak lensing observational data using deep learning. We train a convolutional neural network (CNN) with a U-Net-based architecture on over 3.6×10^5 simulated data realisations with non-Gaussian shape noise and with cosmological parameters varying over a broad prior distribution. We interpret our newly created DES SV map as an approximation of the posterior mean P(κ|γ) of the convergence given observed shear. The DeepMass method is substantially more accurate than existing mass-mapping methods on a validation set of 8000 simulated DES SV data realisations. With higher galaxy density in future weak lensing data unveiling more non-linear scales, it is likely that deep learning will be a leading approach for mass mapping with Euclid and LSST.

Hybrid physical-deep learning models for astronomical image processing

François Lanusse (Berkeley Center for Cosmological Physics and CosmoStat CEA Paris Saclay)

The upcoming generation of wide-field optical surveys which includes LSST will aim to shed some much needed light on the physical nature of dark energy and dark matter by mapping the Universe in great detail and on an unprecedented scale. However, with the increase in data quality also comes a significant increase in data complexity, bringing new and outstanding challenges at all levels of the scientific analysis.
In this talk, I will illustrate how deep generative models, combined with physical modeling, can be used to address some of these challenges at the image processing level, specifically by providing data-driven priors of galaxy morphology.
I will first describe how to build such generative models from corrupted and heterogeneous data, i.e. when the training set contains varying observing conditions (in terms of noise, seeing, or even instruments). This is a necessary step for practical applications, made possible by a hybrid modeling of the generation process, using deep neural networks to model the underlying distribution of galaxy morphologies, complemented by a physical model of the noise and instrumental response. Sampling from these models produces realistic galaxy light profiles, which can then be used in survey emulation, for the purpose of validating and/or calibrating data reduction pipelines.

Even more interestingly, these models can serve as priors on galaxy morphologies within standard Bayesian inference techniques to solve astronomical inverse problems ranging from deconvolution to deblending galaxy images. I will present how combining these deep morphology priors with a physical forward model of observed blended scenes allows us to address the galaxy deblending problem in a physically motivated and interpretable way.

A flexible EM-like clustering algorithm for noisy data

Violeta Roizman (L2S, CentraleSupélec)

Though very popular, it is well known that the EM algorithm suffers from non-Gaussian distribution shapes and outliers. This talk will present a flexible EM-like clustering algorithm that can deal with noise and outliers in diverse data sets. This flexibility is due to extra scale parameters that allow us to accommodate heavier-tailed distributions and outliers without significantly losing efficiency in various classical scenarios. I will show experiments where we compare it to other clustering methods such as k-means, EM and spectral clustering when applied to both synthetic data and real data sets. I will conclude with an application example of our algorithm used for image segmentation.

Regularizing Optimal Transport Using Regularity Theory

François-Pierre Paty (CREST, ENSAE)

Optimal transport (OT) dates back to the end of the 18th century, when French mathematician Gaspard Monge proposed to solve the problem of déblais and remblais. In the last few years, OT has also found new applications in statistics and machine learning as a way to analyze and compare data. Both in practice and for statistical reasons, OT needs to be regularized. In this talk, I will present a new regularization of OT leveraging regularity of the Monge map. Instead of considering regularity as a property that can be proved under suitable assumptions, we consider regularity as a condition that must be enforced when estimating OT. This further allows us to transport out-of-sample points, as well as define a new estimator of the 2-Wasserstein distance between arbitrary measures. (Based on joint work with Alexandre d'Aspremont and Marco Cuturi.)

Deep Learning @ Safran for Image Processing

Arnaud Woiselle (Safran Electronics and Defense)

Deep learning has for many years been the natural tool in computer vision for nearly all high-level tasks, such as object detection and classification, and is now state of the art in most image processing (restoration) tasks, such as deblurring or super-resolution. Safran looked into these methods for a large variety of problems, focusing on the use of a low number of network structures, due to electronics constraints for future implementation, and transferred them to real-life noisy and blurry data, both in the visible and the infrared. I will show the results in many applications, and conclude with some tips and take-away messages on what seems important for applying deep learning to a given task.

CosmosClub: Miguel Zumalacarregui (06/11/19)

Date: November 06th 2019, 15h30

Speaker: Miguel Zumalacarregui (UC Berkeley & IPhT Saclay)

Title: Testing Gravity and Dark Energy with Cosmology and Gravitational Waves

Room: Cassini


Alternative theories of gravity may provide viable models of cosmic acceleration with the possibility of alleviating shortcomings of the standard paradigm such as discrepant measurements of the Hubble parameter. I will present recent progress in constructing viable, yet extremely predictive theories of gravity and dark energy, extracting their cosmological implications and testing them with data, current and forthcoming. I will also present how most of these theories affect the propagation of gravitational waves. In particular, the speed of gravitational waves provides the most stringent test for a large class of theories, which have been recently ruled out by the GW speed measurement following the neutron star merger GW170817. Other effects on gravitational wave propagation (damping, modified dispersion and oscillations) can be used to test the landscape of gravitational theories.

CosmosClub: Fangchen Feng (10/10/19)

Date: October 10th 2019, 15h00

Speaker: Fangchen Feng (Laboratoire Astroparticule & Cosmologie)

Title: Reconstruction and characterisation of polarisations of a gravitational-wave signal

Room: Kepler


Polarisation properties of gravitational waves carry crucial information about the physics of gravitational sources (binary compact systems of black holes or neutron stars, etc.) such as precession effects. In practice, the reconstruction of the two polarisations h+(t) and h×(t) is made possible by the use of at least two non-aligned detectors. To this aim, we propose a complete analysis procedure of gravitational-wave signals. Starting from measurements, this procedure estimates the sky position of the source, reconstructs the two components h+(t) and h×(t) and estimates instantaneous Stokes parameters of the wave. This set of non-parametric observables encodes many fine properties of the astrophysical source without being closely bound to a specific dynamical model, making them particularly suited to decipher precession effects.