## Cosmostat Day on Machine Learning in Astrophysics

Date: January the 26th, 2018

Organizer:  Joana Frontera-Pons  <joana.frontera-pons@cea.fr>

Venue:

Local information

CEA Saclay is around 23 km south of Paris. The astrophysics division (DAp) is located at the CEA site at Orme des Merisiers, which is around 1 km south of the main CEA campus. See http://www.cosmostat.org/link/how-to-get-to-sap/ for detailed information on how to get there.

On January the 26th, 2018, we are organizing the third day on machine learning in astrophysics at DAp, CEA Saclay.

## Program:

All talks take place at DAp, Salle Galilée (Building 713).

10:00 - 10:45h. Artificial Intelligence: Past, present and future - Marc Duranton (CEA Saclay)
10:45 - 11:15h. Astronomical image reconstruction with convolutional neural networks - Rémi Flamary (Université Nice-Sophia Antipolis)
11:15 - 11:45h. CNN based strong gravitational Lens finder for the Euclid pipeline - Christoph Ernst René Schäfer (Laboratory of Astrophysics, EPFL)

12:00 - 13:30h. Lunch

13:30 - 14:00h. Optimize training samples for future supernova surveys using Active Learning - Emille Ishida (Laboratoire de Physique de Clermont)
14:00 - 14:30h. Regularization via proximal methods - Silvia Villa (Politecnico di Milano)
14:30 - 15:00h. Deep Learning for Physical Processes: Incorporating Prior Scientific Knowledge - Arthur Pajot (LIP6)
15:00 - 15:30h. Wasserstein dictionary Learning - Morgan Schmitz (CEA Saclay - CosmoStat)

15:30 - 16:00h. Coffee break

16:00 - 17:00h. Round table

### Artificial Intelligence: Past, present and future

Marc Duranton (CEA Saclay)

There is a lot of hype today about Deep Learning and its applications. This technology originated in the 1950s from a simplification of the observations made by neurophysiologists and vision specialists who tried to understand how neurons interact with each other and how the brain is structured for vision. This talk will look back at the history of the connectionist approach and will give a quick overview of how it works and of its current applications in various domains. It will also open discussions on how bio-inspiration could lead to a new approach in computing science.

### Astronomical image reconstruction with convolutional neural networks

Rémi Flamary (Université Nice-Sophia Antipolis)

State-of-the-art methods in astronomical image reconstruction rely on the resolution of a regularized or constrained optimization problem. Solving this problem can be computationally intensive, especially with large images. In this work we investigate the use of convolutional neural networks for image reconstruction in astronomy. With neural networks, the computationally intensive task is the training step, while the prediction step has a fixed complexity per pixel, i.e. a complexity that is linear in the number of pixels. Numerical experiments for fixed PSF and varying PSF in large fields of view show that CNNs are computationally efficient and competitive with optimization-based methods, in addition to being interpretable.
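The "fixed complexity per pixel" argument can be illustrated with a single convolutional layer. The following is a minimal sketch in plain Python, not the network used in the talk: the function name `conv2d_same` and the zero-padding choice are assumptions made for illustration. It counts the kernel taps visited, which is exactly height × width × k² and therefore linear in the number of pixels for a fixed kernel size.

```python
def conv2d_same(image, kernel):
    """Zero-padded 2D correlation with a square, odd-sized kernel.

    Returns the filtered image and the number of kernel taps visited,
    counting taps that fall in the zero padding as well.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2  # kernel radius
    out = [[0.0] * w for _ in range(h)]
    taps = 0
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    taps += 1
                    ii, jj = i + di - r, j + dj - r
                    # Pixels outside the image contribute zero.
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += image[ii][jj] * kernel[di][dj]
            out[i][j] = acc
    return out, taps


if __name__ == "__main__":
    kernel = [[1.0] * 3 for _ in range(3)]
    for size in (4, 8, 16):
        image = [[1.0] * size for _ in range(size)]
        _, taps = conv2d_same(image, kernel)
        print(size * size, "pixels ->", taps, "taps")
```

Doubling the image side multiplies the pixel count and the tap count by the same factor of four, whereas an iterative optimization method would repeat comparable work at every iteration.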

### CNN based strong gravitational Lens finder for the Euclid pipeline

Christoph Ernst René Schäfer (Laboratory of Astrophysics, EPFL)

Within the Euclid survey, 10^5 new strong gravitational lenses are expected to be found within 35% of the observable sky. Identifying these objects in a reasonable amount of time necessitates the development of powerful machine-learning-based classifiers. One option for the Euclid pipeline is to use CNN-based classifiers, which performed admirably during the last Galaxy-Galaxy Strong Lensing Challenge. This talk will first showcase the potential of CNNs for this particular task and then expose some of the issues that CNNs still have to overcome.

### Optimize training samples for future supernova surveys using Active Learning

Emille Ishida (Laboratoire de Physique de Clermont)

The full exploitation of the next generation of large-scale photometric supernova surveys depends heavily on our ability to provide a reliable early-epoch classification based solely on photometric data. In preparation for this scenario, there have been many attempts to apply different machine learning algorithms to the supernova photometric classification problem. Although different methods achieve different degrees of success, textbook machine learning methods fail to address the crucial issue of lack of representativeness between spectroscopic (training) and photometric (target) samples. In this talk I will show how Active Learning (or optimal experiment design) can be used as a tool for optimizing the construction of spectroscopic samples for classification purposes. I will present results on how designing the spectroscopic samples from the beginning of the survey can achieve optimal classification results with a much lower number of spectra than the currently adopted strategy.
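As an illustration of the general idea, here is a minimal sketch of uncertainty sampling, one common Active Learning query strategy. It is not necessarily the strategy presented in the talk; the function name `query_order`, the binary-classification setting, and the example probabilities are assumptions made for illustration. Given a classifier's predicted probabilities for unlabelled photometric objects, it ranks the objects whose classification is least certain, which are the candidates for spectroscopic follow-up.

```python
def query_order(probabilities, budget):
    """Return the indices of the `budget` most uncertain objects.

    `probabilities` holds, for each unlabelled object, the predicted
    probability of belonging to the positive class (binary case).
    Uncertainty is measured as closeness to 0.5.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical predicted probabilities for four light curves.
    pool = [0.9, 0.55, 0.1, 0.48]
    # With two spectra available, target the two least certain objects.
    print(query_order(pool, budget=2))
```

In a full loop, the queried objects would be labelled, added to the training sample, and the classifier retrained before the next query.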

### Regularization via proximal methods

Silvia Villa (Politecnico di Milano)

In the context of linear inverse problems, I will discuss iterative regularization methods that accommodate large classes of data-fit terms and regularizers. In particular, I will investigate the regularization properties of first-order proximal splitting optimization techniques. Such methods are appealing since their computational complexity is tailored to the estimation accuracy allowed by the data, as I will show theoretically and numerically.
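To make the notion of a first-order proximal method concrete, the following is a minimal sketch of the proximal gradient iteration for a least-squares data-fit term with an l1 regularizer. It is a textbook instance chosen for illustration, not the specific algorithms analysed in the talk; the function names and the choice of regularizer are assumptions. The step size is assumed to be at most 1/||A||² so that the iteration converges, and in an iterative regularization setting the number of iterations `n_iter` would itself act as a regularization parameter.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def proximal_gradient(A, y, lam, step, n_iter):
    """Iterate x <- prox_{step*lam*||.||_1}(x - step * A^T (A x - y))."""
    At = [list(col) for col in zip(*A)]  # transpose of A
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [ri - yi for ri, yi in zip(matvec(A, x), y)]
        grad = matvec(At, residual)  # gradient of 0.5 * ||A x - y||^2
        x = [
            soft_threshold(xi - step * gi, step * lam)
            for xi, gi in zip(x, grad)
        ]
    return x


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # With A = I the minimizer is the soft-thresholded data.
    print(proximal_gradient(identity, [3.0, -0.5], lam=1.0, step=1.0, n_iter=5))
```

Each iteration costs two matrix-vector products, so stopping early trades accuracy for computation.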

### Deep Learning for Physical Processes:  Incorporating Prior Scientific Knowledge

Arthur Pajot (LIP6)

We consider the use of Deep Learning methods for modeling complex phenomena like those occurring in natural physical processes. With the large amount of data gathered on these phenomena the data intensive paradigm could begin to challenge more traditional approaches elaborated over the years in fields like maths or physics. However, despite considerable successes in a variety of application domains, the machine learning field is not yet ready to handle the level of complexity required by such problems. Using an example application, namely Sea Surface Temperature Prediction, we show how general background knowledge gained from physics could be used as a guideline for designing efficient Deep Learning models.

### Wasserstein dictionary Learning

Morgan Schmitz (CEA Saclay - CosmoStat)

Optimal Transport theory enables the definition of a distance across the set of measures on any given space. This Wasserstein distance naturally accounts for geometric warping between measures (including, but not limited to, images). We introduce a new, Optimal Transport-based representation learning method in close analogy with the usual Dictionary Learning problem. Usual Dictionary Learning typically relies on a matrix dot-product between the learned dictionary and the codes making up the new representation. The relationship between atoms and data is thus ultimately linear.

We instead use automatic differentiation to derive gradients of the Wasserstein barycenter operator, and we learn a set of atoms and barycentric weights from the data in an unsupervised fashion. Since our data is reconstructed as Wasserstein barycenters of our learned atoms, we can make full use of the attractive properties of the Optimal Transport geometry. In particular, our representation allows for non-linear relationships between atoms and data.
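The difference between a linear combination and a Wasserstein barycenter is easiest to see in one dimension. The sketch below is a simplified illustration and not the method of the talk, which learns atoms and weights via automatic differentiation on general measures; the function name and the restriction to equally weighted point clouds of the same size are assumptions. In 1D, the Wasserstein-2 barycenter of such point clouds is obtained by averaging their sorted values, i.e. their quantiles.

```python
def wasserstein_barycenter_1d(samples, weights):
    """W2 barycenter of 1D point clouds with equal size and uniform mass.

    `samples` is a list of point clouds (lists of floats) and `weights`
    are barycentric weights, assumed non-negative and summing to one.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all point clouds must have the same size")
    sorted_samples = [sorted(s) for s in samples]
    # Average the i-th smallest points (the empirical quantiles).
    return [
        sum(w * s[i] for w, s in zip(weights, sorted_samples))
        for i in range(n)
    ]


if __name__ == "__main__":
    left = [2.0, 0.0, 1.0]       # a "bump" around 1
    right = [10.0, 12.0, 11.0]   # the same bump shifted to around 11
    print(wasserstein_barycenter_1d([left, right], [0.5, 0.5]))
```

The barycenter is a single bump located halfway between the two inputs, whereas a linear mixture of the two measures would keep two half-mass bumps at their original positions. This displacement behaviour is the non-linear relationship between atoms and data referred to above.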

## Ming Jiang PhD Defense

Event: Ming Jiang's Thesis Defence

Date: November 10, 2017

Venue: Salle Galilée, Bât: 713C (CEA-Saclay)

My thesis is approaching its final destination after 3 years of work! I am pleased to announce that my defense will be held at 2 pm on November 10th in the Galilée room. You are welcome to attend!

Multichannel Compressed Sensing and its Applications in Radioastronomy

The new generation of radio interferometer instruments, such as LOFAR and SKA, will allow us to build radio images with very high angular resolution and sensitivity. One of the major problems in interferometry imaging is that it involves an ill-posed inverse problem because only a few Fourier components (visibility points) can be acquired by a radio interferometer. Compressed Sensing (CS) theory is a paradigm to solve many underdetermined inverse problems and has shown its strength in radio astronomy.

This thesis focuses on the methodology of Multichannel Compressed Sensing data reconstruction and its application in radio astronomy. For instance, radio transients are an active research field in radio astronomy but their detection is a challenging problem because of low angular resolution and low signal-to-noise observations. To address this issue, we investigated the sparsity of temporal information of radio transients and proposed a spatial-temporal sparse reconstruction method to efficiently detect radio sources. Experiments have shown the strength of this sparse recovery method compared to the state-of-the-art methods.

A second application is concerned with multi-wavelength radio interferometry imaging, in which the data are degraded differently at each wavelength because the instrumental beam varies with wavelength. Based on a source mixture model, a novel Deconvolution Blind Source Separation (DBSS) model is proposed. The DBSS problem is not only non-convex but also ill-conditioned due to the convolution kernels. Our proposed DecGMCA method, which benefits from a sparsity prior and leverages an alternating projected least-squares scheme, is an efficient algorithm to tackle the deconvolution and BSS problems simultaneously. Experiments have shown that performing deconvolution and BSS jointly gives much better results than applying them sequentially.

## École Euclid de cosmologie 2017

Date: June 27 - July 8 2017

Venue: Fréjus, France

Lecture "Weak gravitational lensing" (Le lentillage gravitationnel), Martin Kilbinger.

Find here links to the lecture notes, TD exercises, "tables rondes" topics, and other information.

• Resources.
• A great and detailed introduction to (weak) gravitational lensing is given by the 2005 Saas Fee lecture notes by Peter Schneider. Download Part I (Introduction to lensing) and Part III (Weak lensing) from my homepage.
• Check out Sarah Bridle's video lectures on WL from 2014.
• TD cycle 1+2, Data analysis.
1. We will work on a rather large (150 MB) weak-lensing catalogue from the public CFHTLenS web page. During the TD I will show instructions on how to create and download this catalogue. For faster access, it will be available on the server during the school, and I will bring a few USB sticks. If you prefer, you can download the catalogue to your laptop at home beforehand. Please have a look at the instructions (available soon).
2. If you want to do the TD on your laptop, you'll need to download and install athena (the newest version 1.7).
3. For one of the bonus TDs you'll need a new version of pallas.py (v 1.8beta). Download it here.
• Lecture notes and exercise classes: