DEDALE: Mathematical Tools to Help Navigate the Big Data Maze

Managing the huge volumes and varying streams of Big Data digital information presents formidable analytical challenges to anyone wanting to make sense of it. Consider the mapping of space, where scientists collect, process and transmit giga-scale data sets to generate accurate visual representations of millions of galaxies. Or consider the vast information being generated by genomics and bioinformatics as genomes are mapped and new drugs discovered. And soon the Internet of Things will bring millions of interconnected information-sensing and transmitting devices.

Improving Weak Lensing Mass Map Reconstructions using Gaussian and Sparsity Priors: Application to DES SV

 Authors: N. Jeffrey, F. B. Abdalla, O. Lahav, F. Lanusse, J.-L. Starck, et al Journal: Year: 01/2018 Download: ADS| Arxiv

Abstract

Mapping the underlying density field, including non-visible dark matter, using weak gravitational lensing measurements is now a standard tool in cosmology. Due to its importance to the science results of current and upcoming surveys, the quality of the convergence reconstruction methods should be well understood. We compare three different mass map reconstruction methods: Kaiser-Squires (KS), Wiener filter, and GLIMPSE. KS is a direct inversion method, taking no account of survey masks or noise. The Wiener filter is well motivated for Gaussian density fields in a Bayesian framework. The GLIMPSE method uses sparsity, with the aim of reconstructing non-linearities in the density field. We compare these methods with a series of tests on the public Dark Energy Survey (DES) Science Verification (SV) data and on realistic DES simulations. The Wiener filter and GLIMPSE methods offer substantial improvement on the standard smoothed KS with a range of metrics. For both the Wiener filter and GLIMPSE convergence reconstructions we present a 12% improvement in Pearson correlation with the underlying truth from simulations. To compare the mapping methods' abilities to find mass peaks, we measure the difference between peak counts from simulated {\Lambda}CDM shear catalogues and catalogues with no mass fluctuations. This is a standard data vector when inferring cosmology from peak statistics. The maximum signal-to-noise value of these peak statistic data vectors was increased by a factor of 3.5 for the Wiener filter and by a factor of 9 using GLIMPSE. With simulations we measure the reconstruction of the harmonic phases, showing that the concentration of the phase residuals is improved 17% by GLIMPSE and 18% by the Wiener filter. We show that the correlation between the reconstructions from data and the foreground redMaPPer clusters is increased 18% by the Wiener filter and 32% by GLIMPSE.

Kostas Themelis – Eurotalent!

CosmoStat postdoc Kostas Themelis has been awarded a Enhanced Eurotalents grant to work on “PRobabilistic cOmponent separaTion tEchniqUes with application to aStrophysical data analysis - PROTEUS”!

Be sure to keep an eye out for exciting new developments in this field!

Big Bang and Big Data

The new international projects, such as the Euclid space telescope, are ushering in the era of Big Data for cosmologists. Our questions about dark matter and dark energy, which on their own account for 95% of the content of our Universe, throw up new algorithmic, computational and theoretical challenges. The fourth concerns reproducible research, a fundamental concept for the verification and credibility of the published results.

Astrophysique et IRM, un mariage qui a du sens

La Direction de la recherche fondamentale au CEA lance le projet COSMIC, né du rapprochement de deux compétences en traitement des données localisées à l’Institut des sciences du vivant Frédéric-Joliot (NeuroSpin) et au CEA-Irfu (CosmoStat). Les mécanismes d’acquisition de données en radio-astronomie et en IRM présentent des similarités. Les modèles mathématiques utilisés sont en effet basés sur les principes de parcimonie et d’acquisition comprimée, dérivés de l’analyse harmonique.

Unsupervised feature learning for galaxy SEDs with denoising autoencoders

 Authors: Frontera-Pons, J., Sureau, F., Bobin, J. and Le Floc'h E. Journal: Astronomy & Astrophysics Year: 2017 Download: ADS | arXiv

Abstract

With the increasing number of deep multi-wavelength galaxy surveys, the spectral energy distribution (SED) of galaxies has become an invaluable tool for studying the formation of their structures and their evolution. In this context, standard analysis relies on simple spectro-photometric selection criteria based on a few SED colors. If this fully supervised classification already yielded clear achievements, it is not optimal to extract relevant information from the data. In this article, we propose to employ very recent advances in machine learning, and more precisely in feature learning, to derive a data-driven diagram. We show that the proposed approach based on denoising autoencoders recovers the bi-modality in the galaxy population in an unsupervised manner, without using any prior knowledge on galaxy SED classification. This technique has been compared to principal component analysis (PCA) and to standard color/color representations. In addition, preliminary results illustrate that this enables the capturing of extra physically meaningful information, such as redshift dependence, galaxy mass evolution and variation over the specific star formation rate. PCA also results in an unsupervised representation with physical properties, such as mass and sSFR, although this representation separates out less other characteristics (bimodality, redshift evolution) than denoising autoencoders.

Abstract

Context: in astronomy, observing large fractions of the sky within a reasonable amount of time implies using large field-of-view (fov) optical instruments that typically have a spatially varying Point Spread Function (PSF). Depending on the scientific goals, galaxies images need to be corrected for the PSF whereas no direct measurement of the PSF is available. Aims: given a set of PSFs observed at random locations, we want to estimate the PSFs at galaxies locations for shapes measurements correction. Contributions: we propose an interpolation framework based on Sliced Optimal Transport. A non-linear dimension reduction is first performed based on local pairwise approximated Wasserstein distances. A low dimensional representation of the unknown PSFs is then estimated, which in turn is used to derive representations of those PSFs in the Wasserstein metric. Finally, the interpolated PSFs are calculated as approximated Wasserstein barycenters. Results: the proposed method was tested on simulated monochromatic PSFs of the Euclid space mission telescope (to be launched in 2020). It achieves a remarkable accuracy in terms of pixels values and shape compared to standard methods such as Inverse Distance Weighting or Radial Basis Function based interpolation methods.

Joint Multichannel Deconvolution and Blind Source Separation

 Authors: M. Jiang, J. Bobin, J-L. Starck Journal: SIAM J. Imaging Sci. Year: 2017 Download: ADS | arXiv | SIIMS

Abstract

Blind Source Separation (BSS) is a challenging matrix factorization problem that plays a central role in multichannel imaging science. In a large number of applications, such as astrophysics, current unmixing methods are limited since real-world mixtures are generally affected by extra instrumental effects like blurring. Therefore, BSS has to be solved jointly with a deconvolution problem, which requires tackling a new inverse problem: deconvolution BSS (DBSS). In this article, we introduce an innovative DBSS approach, called DecGMCA, based on sparse signal modeling and an efficient alternative projected least square algorithm. Numerical results demonstrate that the DecGMCA algorithm performs very well on simulations. It further highlights the importance of jointly solving BSS and deconvolution instead of considering these two problems independently. Furthermore, the performance of the proposed DecGMCA algorithm is demonstrated on simulated radio-interferometric data.

Abstract

Removing the aberrations introduced by the Point Spread Function (PSF) is a fundamental aspect of astronomical image processing. The presence of noise in observed images makes deconvolution a nontrivial task that necessitates the use of regularisation. This task is particularly difficult when the PSF varies spatially as is the case for the Euclid telescope. New surveys will provide images containing thousand of galaxies and the deconvolution regularisation problem can be considered from a completely new perspective. In fact, one can assume that galaxies belong to a low-rank dimensional space. This work introduces the use of the low-rank matrix approximation as a regularisation prior for galaxy image deconvolution and compares its performance with a standard sparse regularisation technique. This new approach leads to a natural way to handle a space variant PSF. Deconvolution is performed using a Python code that implements a primal-dual splitting algorithm. The data set considered is a sample of 10 000 space-based galaxy images convolved with a known spatially varying Euclid-like PSF and including various levels of Gaussian additive noise. Performance is assessed by examining the deconvolved galaxy image pixels and shapes. The results demonstrate that for small samples of galaxies sparsity performs better in terms of pixel and shape recovery, while for larger samples of galaxies it is possible to obtain more accurate estimates of the galaxy shapes using the low-rank approximation.

Summary

The Point Spread Function or PSF of an imaging system (also referred to as the impulse response) describes how the system responds to a point (unextended) source. In astrophysics, stars or quasars are often used to measure the PSF of an instrument as in ideal conditions their light would occupy a single pixel on a CCD. Telescopes, however, diffract the incoming photons which limits the maximum resolution achievable. In reality, the images obtained from telescopes include aberrations from various sources such as:

• The atmosphere (for ground based instruments)
• Jitter (for space based instruments)
• Imperfections in the optical system
• Charge spread of the detectors

Deconvolution

In order to recover the true image properties it is necessary to remove PSF effects from observations. If the PSF is known (which is certainly not trivial) one can attempt to deconvolve the PSF from the image. In the absence of noise this is simple. We can model the observed image $$\mathbf{y}$$ as follows

$$\mathbf{y}=\mathbf{Hx}$$

where $$\mathbf{x}$$ is the true image and $$\mathbf{H}$$ is an operator that represents the convolution with the PSF. Thus, to recover the true image, one would simply invert $$\mathbf{H}$$ as follows

$$\mathbf{x}=\mathbf{H}^{-1}\mathbf{y}$$

Unfortunately, the images we observe also contain noise (e.g. from the CCD readout) and this complicates the problem.

$$\mathbf{y}=\mathbf{Hx} + \mathbf{n}$$

This problem is ill-posed as even the tiniest amount of noise will have a large impact on the result of the operation. Therefore, to obtain a stable and unique solution, it is necessary to regularise the problem by adding additional prior knowledge of the true images.

Sparsity

One way to regularise the problem is using sparsity. The concept of sparsity is quite simple. If we know that there is a representation of $$\mathbf{x}$$ that is sparse (i.e. most of the coefficients are zeros) then we can force our deconvolved observation $$\mathbf{\hat{x}}$$ to be sparse in the same domain. In practice we aim to minimise a problem of the following form

\begin{aligned} & \underset{\mathbf{x}}{\text{argmin}} & \frac{1}{2}\|\mathbf{y}-\mathbf{H}\mathbf{x}\|_2^2 + \lambda\|\Phi(\mathbf{x})\|_1 & & \text{s.t.} & & \mathbf{x} \ge 0 \end{aligned}

where $$\Phi$$ is a matrix that transforms $$\mathbf{x}$$ to the sparse domain and $$\lambda$$ is a regularisation control parameter.

Low-Rank Approximation

Another way to regularise the problem is assume that all of the images one aims to deconvolve live on a underlying low-rank manifold. In other words, if we have a sample of galaxy images we wish to deconvolve then we can construct a matrix X X where each column is a vector of galaxy pixel coefficients. If many of these galaxies have similar properties then we know that X X will have a smaller rank than if images were all very different. We can use this knowledge to regularise the deconvolution problem in the following way

\begin{aligned} & \underset{\mathbf{X}}{\text{argmin}} & \frac{1}{2}\|\mathbf{Y}-\mathcal{H}(\mathbf{X})\|_2^2 + \lambda|\mathbf{X}\|_* & & \text{s.t.} & & \mathbf{X} \ge 0 \end{aligned}

Results

In the paper I implement both of these regularisation techniques and compare how well they perform at deconvolving a sample of 10,000 Euclid-like galaxy images. The results show that, for the data used, sparsity does a better job at recovering the image pixels, while the low-rank approximation does a better job a recovering the galaxy shapes (provided enough galaxies are used).

Code

SF_DECONVOLVE is a Python code designed for PSF deconvolution using a low-rank approximation and sparsity. The code can handle a fixed PSF for the entire field or a stack of PSFs for each galaxy position.