Big Bang and Big Data

New international projects, such as the Euclid space telescope, are ushering in the era of Big Data for cosmologists. Our questions about dark matter and dark energy, which together account for 95% of the content of the Universe, raise new algorithmic, computational and theoretical challenges. A fourth challenge concerns reproducible research, a concept that is fundamental to the verification and credibility of published results.

Astrophysics and MRI, a pairing that makes sense

The Fundamental Research Division at CEA is launching the COSMIC project, born from bringing together two data-processing teams: one at the Frédéric Joliot Institute for Life Sciences (NeuroSpin) and one at CEA-Irfu (CosmoStat). The data-acquisition mechanisms used in radio astronomy and in MRI share strong similarities: the underlying mathematical models are based on the principles of sparsity and compressed sensing, which derive from harmonic analysis.

Unsupervised feature learning for galaxy SEDs with denoising autoencoders


Authors: Frontera-Pons, J., Sureau, F., Bobin, J. and Le Floc'h E.
Journal: Astronomy & Astrophysics
Year: 2017
Download: ADS | arXiv


Abstract

With the increasing number of deep multi-wavelength galaxy surveys, the spectral energy distribution (SED) of galaxies has become an invaluable tool for studying the formation of their structures and their evolution. In this context, standard analysis relies on simple spectro-photometric selection criteria based on a few SED colors. While this fully supervised classification has already yielded clear achievements, it is not optimal for extracting the relevant information from the data. In this article, we propose to employ very recent advances in machine learning, and more precisely in feature learning, to derive a data-driven diagram. We show that the proposed approach, based on denoising autoencoders, recovers the bi-modality in the galaxy population in an unsupervised manner, without using any prior knowledge of galaxy SED classification. This technique is compared to principal component analysis (PCA) and to standard color/color representations. In addition, preliminary results illustrate that the method captures extra physically meaningful information, such as redshift dependence, galaxy mass evolution and variation in the specific star formation rate (sSFR). PCA also yields an unsupervised representation with physical properties, such as mass and sSFR, although it separates out other characteristics (bimodality, redshift evolution) less clearly than denoising autoencoders.

PSF field learning based on Optimal Transport Distances


Authors: F. Ngolè Mboula, J-L. Starck
Journal: arXiv
Year: 2017
Download: ADS | arXiv



Abstract

Context: in astronomy, observing large fractions of the sky within a reasonable amount of time implies using large field-of-view (FOV) optical instruments, which typically have a spatially varying Point Spread Function (PSF). Depending on the scientific goals, galaxy images need to be corrected for the PSF, even though no direct measurement of the PSF is available at the galaxy positions. Aims: given a set of PSFs observed at random locations, we want to estimate the PSFs at the galaxy locations in order to correct shape measurements. Contributions: we propose an interpolation framework based on Sliced Optimal Transport. A non-linear dimension reduction is first performed, based on local pairwise approximated Wasserstein distances. A low-dimensional representation of the unknown PSFs is then estimated, which in turn is used to derive representations of those PSFs in the Wasserstein metric. Finally, the interpolated PSFs are calculated as approximated Wasserstein barycenters. Results: the proposed method was tested on simulated monochromatic PSFs of the Euclid space mission telescope (to be launched in 2020). It achieves remarkable accuracy in terms of pixel values and shape compared to standard methods such as Inverse Distance Weighting or Radial Basis Function based interpolation.

Joint Multichannel Deconvolution and Blind Source Separation


Authors: M. Jiang, J. Bobin, J-L. Starck
Journal: SIAM J. Imaging Sci.
Year: 2017
Download: ADS | arXiv | SIIMS



Abstract

Blind Source Separation (BSS) is a challenging matrix factorization problem that plays a central role in multichannel imaging science. In a large number of applications, such as astrophysics, current unmixing methods are limited because real-world mixtures are generally affected by extra instrumental effects such as blurring. Therefore, BSS has to be solved jointly with a deconvolution problem, which requires tackling a new inverse problem: deconvolution BSS (DBSS). In this article, we introduce an innovative DBSS approach, called DecGMCA, based on sparse signal modeling and an efficient alternating projected least squares algorithm. Numerical results demonstrate that the DecGMCA algorithm performs very well on simulations and highlight the importance of jointly solving BSS and deconvolution instead of considering these two problems independently. Furthermore, the performance of the proposed DecGMCA algorithm is demonstrated on simulated radio-interferometric data.

Space variant deconvolution of galaxy survey images


Authors: S. Farrens, J-L. Starck, F. Ngolè Mboula
Journal: A&A
Year: 2017
Download: ADS | arXiv


Abstract

Removing the aberrations introduced by the Point Spread Function (PSF) is a fundamental aspect of astronomical image processing. The presence of noise in observed images makes deconvolution a nontrivial task that necessitates the use of regularisation. This task is particularly difficult when the PSF varies spatially, as is the case for the Euclid telescope. New surveys will provide images containing thousands of galaxies, and the deconvolution regularisation problem can be considered from a completely new perspective. In fact, one can assume that galaxy images belong to a low-dimensional space. This work introduces the use of the low-rank matrix approximation as a regularisation prior for galaxy image deconvolution and compares its performance with a standard sparse regularisation technique. This new approach leads to a natural way to handle a spatially variant PSF. Deconvolution is performed using a Python code that implements a primal-dual splitting algorithm. The data set considered is a sample of 10 000 space-based galaxy images convolved with a known spatially varying Euclid-like PSF and including various levels of additive Gaussian noise. Performance is assessed by examining the deconvolved galaxy image pixels and shapes. The results demonstrate that for small samples of galaxies sparsity performs better in terms of pixel and shape recovery, while for larger samples it is possible to obtain more accurate estimates of the galaxy shapes using the low-rank approximation.


Summary

Point Spread Function

The Point Spread Function (PSF) of an imaging system, also referred to as its impulse response, describes how the system responds to a point (unextended) source. In astrophysics, stars or quasars are often used to measure the PSF of an instrument since, in ideal conditions, their light would occupy a single pixel on the CCD. Telescopes, however, diffract the incoming photons, which limits the maximum achievable resolution. In reality, the images obtained from telescopes include aberrations from various sources such as:

  • The atmosphere (for ground based instruments)
  • Jitter (for space based instruments)
  • Imperfections in the optical system
  • Charge spread of the detectors
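
As a concrete illustration of why point sources sample the PSF, here is a minimal NumPy sketch (a toy example, not code from any of the papers above): a single-pixel "star" is convolved with an assumed Gaussian PSF, and the observed star image is simply the PSF itself.

import numpy as np

n = 64
yy, xx = np.mgrid[:n, :n]

# A single "star" occupying one pixel of an otherwise empty scene.
true_sky = np.zeros((n, n))
true_sky[n // 2, n // 2] = 1.0

# A toy Gaussian PSF (in a real instrument this is what we want to measure).
psf = np.exp(-((xx - n // 2) ** 2 + (yy - n // 2) ** 2) / (2 * 2.0 ** 2))
psf /= psf.sum()

# Circular convolution of the scene with the PSF, implemented with FFTs.
psf_hat = np.fft.fft2(np.fft.ifftshift(psf))
observed = np.real(np.fft.ifft2(np.fft.fft2(true_sky) * psf_hat))

# The observed star image reproduces the PSF (up to numerical round-off).
print(np.allclose(observed, psf))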

Deconvolution

In order to recover the true image properties, it is necessary to remove the PSF effects from the observations. If the PSF is known (which is itself a nontrivial task), one can attempt to deconvolve it from the image. In the absence of noise this is simple. We can model the observed image \mathbf{y} as follows

\mathbf{y}=\mathbf{Hx}

where \mathbf{x} is the true image and \mathbf{H} is an operator that represents the convolution with the PSF. Thus, to recover the true image, one would simply invert \mathbf{H} as follows

\mathbf{x}=\mathbf{H}^{-1}\mathbf{y}

Unfortunately, the images we observe also contain noise (e.g. from the CCD readout) and this complicates the problem.

\mathbf{y}=\mathbf{Hx} + \mathbf{n}

This problem is ill-posed: even the tiniest amount of noise has a large impact on the result of the inversion, as the sketch below illustrates. Therefore, to obtain a stable and unique solution, it is necessary to regularise the problem by adding prior knowledge of the properties of the true images.
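
The following toy NumPy sketch (an illustration, not code from any of the papers above) makes this concrete: for a Gaussian PSF the operator \mathbf{H} becomes a pointwise product in Fourier space, so naive deconvolution is a pointwise division. Without noise the division recovers the image essentially exactly; adding a small amount of noise makes the result useless. The scene, PSF and noise level are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n = 64
yy, xx = np.mgrid[:n, :n]

# Toy scene: a smooth blob; toy Gaussian PSF normalised to unit flux.
x_true = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / 50.0)
psf = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / 2.0)
psf /= psf.sum()
psf_hat = np.fft.fft2(np.fft.ifftshift(psf))   # H as a pointwise product in Fourier space

y_clean = np.real(np.fft.ifft2(np.fft.fft2(x_true) * psf_hat))
y_noisy = y_clean + 1e-3 * rng.standard_normal((n, n))   # small readout-like noise

# Naive inversion x = H^{-1} y, i.e. division by the PSF in Fourier space.
x_from_clean = np.real(np.fft.ifft2(np.fft.fft2(y_clean) / psf_hat))
x_from_noisy = np.real(np.fft.ifft2(np.fft.fft2(y_noisy) / psf_hat))

print("max error, no noise:  ", np.abs(x_from_clean - x_true).max())  # tiny
print("max error, with noise:", np.abs(x_from_noisy - x_true).max())  # the noise is amplified by orders of magnitude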

Sparsity

One way to regularise the problem is to use sparsity. The concept is quite simple: if we know that there is a representation of \mathbf{x} that is sparse (i.e. most of its coefficients are zero), then we can force our deconvolved estimate \mathbf{\hat{x}} to be sparse in the same domain. In practice, we aim to solve a minimisation problem of the following form

\begin{aligned} & \underset{\mathbf{x}}{\text{argmin}} & \frac{1}{2}\|\mathbf{y}-\mathbf{H}\mathbf{x}\|_2^2 + \lambda\|\Phi(\mathbf{x})\|_1 & & \text{s.t.} & & \mathbf{x} \ge 0 \end{aligned}

where \Phi is a matrix that transforms \mathbf{x} to the sparse domain and \lambda is a regularisation control parameter.
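
Neither the paper's exact algorithm nor its dictionary is reproduced here: the paper uses a primal-dual splitting code, while the hedged sketch below uses a plain iterative soft-thresholding (ISTA) loop with \Phi taken as the identity (sparsity in the direct domain), purely to show how the \ell_1 proximal step and the positivity constraint enter. The point-source scene, Gaussian PSF, noise level and parameter values are illustrative assumptions.

import numpy as np

def fft_conv(img, kernel_hat):
    # Apply the (fixed) convolution operator H, or its adjoint, via FFTs.
    return np.real(np.fft.ifft2(np.fft.fft2(img) * kernel_hat))

rng = np.random.default_rng(1)
n = 64
yy, xx = np.mgrid[:n, :n]

# Toy scene that is sparse in the direct domain (so Phi = identity here;
# in practice Phi would be something like a wavelet transform).
x_true = np.zeros((n, n))
x_true[rng.integers(0, n, 20), rng.integers(0, n, 20)] = rng.uniform(1, 5, 20)

psf = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / 8.0)
psf /= psf.sum()
psf_hat = np.fft.fft2(np.fft.ifftshift(psf))

y = fft_conv(x_true, psf_hat) + 0.01 * rng.standard_normal((n, n))

# ISTA: gradient step on the data term, then the prox of
# lambda * ||x||_1 plus positivity, i.e. soft-thresholding clipped at zero.
lam, step = 0.01, 1.0          # step <= 1 / max|psf_hat|^2 = 1 here
x = np.zeros((n, n))
for _ in range(200):
    grad = fft_conv(fft_conv(x, psf_hat) - y, np.conj(psf_hat))
    x = np.maximum(x - step * grad - step * lam, 0.0)

print("data residual:", np.linalg.norm(fft_conv(x, psf_hat) - y))
print("relative recovery error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))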

Low-Rank Approximation

Another way to regularise the problem is to assume that all of the images one aims to deconvolve live on an underlying low-dimensional manifold. In other words, if we have a sample of galaxy images we wish to deconvolve, we can construct a matrix \mathbf{X} in which each column is a vector of the pixel coefficients of one galaxy. If many of these galaxies have similar properties, then \mathbf{X} will have a lower rank than if the images were all very different. We can use this knowledge to regularise the deconvolution problem in the following way

\begin{aligned} & \underset{\mathbf{X}}{\text{argmin}} & \frac{1}{2}\|\mathbf{Y}-\mathcal{H}(\mathbf{X})\|_2^2 + \lambda\|\mathbf{X}\|_* & & \text{s.t.} & & \mathbf{X} \ge 0 \end{aligned}
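
As with the sparse prior, here is a hedged toy sketch rather than the paper's implementation: the proximal operator of the nuclear norm is singular-value thresholding of the stacked matrix \mathbf{X}, and the sketch plugs that into a simple proximal-gradient loop. For simplicity a single fixed PSF is applied to every galaxy, whereas the paper's operator \mathcal{H} handles a PSF that varies from galaxy to galaxy; the templates, noise level and parameters are made up for illustration.

import numpy as np

def svt(matrix, threshold):
    # Proximal operator of the nuclear norm: soft-threshold the singular values.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u @ np.diag(np.maximum(s - threshold, 0.0)) @ vt

rng = np.random.default_rng(2)
n, n_gal = 32, 50
yy, xx = np.mgrid[:n, :n]
psf = np.exp(-((xx - 16) ** 2 + (yy - 16) ** 2) / 8.0)
psf /= psf.sum()
psf_hat = np.fft.fft2(np.fft.ifftshift(psf))

def H(stack):
    # Convolve every (vectorised) galaxy column with the same toy PSF.
    imgs = stack.T.reshape(n_gal, n, n)
    out = np.real(np.fft.ifft2(np.fft.fft2(imgs) * psf_hat))
    return out.reshape(n_gal, n * n).T

# Low-rank toy data: each galaxy is a mix of two elliptical Gaussian "templates".
t1 = np.exp(-((xx - 16) ** 2 / 40 + (yy - 16) ** 2 / 10)).ravel()
t2 = np.exp(-((xx - 16) ** 2 / 10 + (yy - 16) ** 2 / 40)).ravel()
X_true = np.outer(t1, rng.uniform(0, 1, n_gal)) + np.outer(t2, rng.uniform(0, 1, n_gal))

Y = H(X_true) + 0.01 * rng.standard_normal((n * n, n_gal))

# Proximal gradient: data-fidelity step, then singular-value thresholding
# followed by a positivity clip (an approximation to the joint prox).
lam, step = 0.05, 1.0
X = np.zeros_like(Y)
for _ in range(100):
    X = X - step * H(H(X) - Y)              # H is self-adjoint for this symmetric PSF
    X = np.maximum(svt(X, step * lam), 0.0)

print("largest singular values:", np.round(np.linalg.svd(X, compute_uv=False)[:5], 3))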

Results

In the paper I implement both of these regularisation techniques and compare how well they perform at deconvolving a sample of 10,000 Euclid-like galaxy images. The results show that, for the data used, sparsity does a better job of recovering the image pixels, while the low-rank approximation does a better job of recovering the galaxy shapes (provided enough galaxies are used).


Code

SF_DECONVOLVE is a Python code designed for PSF deconvolution using a low-rank approximation and sparsity. The code can handle a fixed PSF for the entire field or a stack of PSFs, one for each galaxy position.



DEDALE Provides Analysis Methods to Find the Right Data

A key challenge in cosmological research is how to extract the most important information from satellite imagery and radio signals. The difficulty lies in the systematic processing of extremely noisy data for studying how stars and galaxies evolve through time. This is critical for astrophysicists in their effort to gain insights into cosmological processes such as the characterisation of dark matter in the Universe. Helping scientists find their way through this data maze is DEDALE, an interdisciplinary project that aims to develop the next generation of data analysis methods for the new era of big data, in astrophysics and compressed sensing.

Unravelling the Cosmic Web: Survey Gives Insights into the Universe's Structure

Today marks the release of the first papers to result from the XXL survey, the largest survey of galaxy clusters ever undertaken with ESA's XMM-Newton X-ray observatory. The gargantuan clusters of galaxies surveyed are key features of the large-scale structure of the Universe and to better understand them is to better understand this structure and the circumstances that led to its evolution. The first results from the survey, published in a special issue of Astronomy and Astrophysics, hint at the answers and surprises that are captured in this unique bank of data and reveal the true potential of the survey.

Decoding the Universe from gravitational distortions

In a review article in "Reports on Progress in Physics", Martin Kilbinger of the Astrophysics Department (AIM Laboratory) at CEA-Irfu presents a comprehensive assessment of the results obtained from observations of cosmic shear over the last 15 years. The cosmic shear effect, first measured in 2000, is a distortion of the images of galaxies caused by the gravity of intervening clumps of matter. It makes it possible to map dark matter and also to determine how dark energy affects the cosmic web. The article highlights the most important challenges in turning cosmic shear into an accurate tool for cosmology. So far, dark matter has been mapped for only a tiny fraction of the sky. Future observations, such as those of the upcoming Euclid space mission, will cover most of the accessible regions of the sky. The review presents the progress expected from these future missions for our understanding of the cosmos.

The Primordial Glow of the Universe Comes into Focus (Les Défis du CEA)

The Universe was born 13.8 billion years ago, as a singularity that instantly evolved into a hot, opaque fog of hydrogen nuclei and electrons. For more than 300,000 years this plasma expanded, through inflation, but the grains of light it emitted, the photons, were immediately reabsorbed by the particles of matter. The Universe was then a genuine pea-souper. Then came the moment, in the year 380,000 after the Big Bang, when it had become sufficiently dilated and cooled for the photons to break free: the cosmos became transparent and the first light burst forth. It is an image of this very first light, called the cosmic microwave background (see box), that researchers from the École polytechnique fédérale de Lausanne (EPFL) and CEA-Irfu have published. Exceptionally precise, it was reconstructed from the data recorded by the WMAP and Planck space telescopes, using highly advanced mathematical methods.