A Distributed Learning Architecture for Scientific Imaging Problems

 

Authors: A. Panousopoulou, S. Farrens, K. Fotiadou, A. Woiselle, G. Tsagkatakis, J-L. Starck, P. Tsakalides
Journal: arXiv
Year: 2018
Download: ADS | arXiv


Abstract

Current trends in scientific imaging are challenged by the emerging need to integrate sophisticated machine learning with Big Data analytics platforms. This work proposes an in-memory distributed learning architecture that enables sophisticated learning and optimization techniques on scientific imaging problems, which are characterized by the combination of varied information from different sources. We apply the resulting Spark-compliant architecture to two emerging use cases from the scientific imaging domain, namely: (a) the space-variant deconvolution of galaxy imaging surveys (astrophysics), and (b) super-resolution based on coupled dictionary training (remote sensing). We conduct evaluation studies on relevant datasets, and the results report at least a 60% improvement in time response over conventional computing solutions. Finally, the discussion provides practical insights into the impact of key Spark tuning parameters on the achieved speedup and on the memory/disk footprint.

DEDALE: Mathematical Tools to Help Navigate the Big Data Maze

Managing the huge volumes and varying streams of Big Data digital information presents formidable analytical challenges to anyone wanting to make sense of it. Consider the mapping of space, where scientists collect, process and transmit giga-scale data sets to generate accurate visual representations of millions of galaxies. Or consider the vast information being generated by genomics and bioinformatics as genomes are mapped and new drugs discovered. And soon the Internet of Things will bring millions of interconnected information-sensing and transmitting devices.

Improving Weak Lensing Mass Map Reconstructions using Gaussian and Sparsity Priors: Application to DES SV

 

Authors: N. Jeffrey, F. B. Abdalla, O. Lahav, F. Lanusse, J.-L. Starck, et al.
Journal:  
Year: 01/2018
Download: ADS | arXiv


Abstract

Mapping the underlying density field, including non-visible dark matter, using weak gravitational lensing measurements is now a standard tool in cosmology. Due to its importance to the science results of current and upcoming surveys, the quality of the convergence reconstruction methods should be well understood. We compare three different mass map reconstruction methods: Kaiser-Squires (KS), Wiener filter, and GLIMPSE. KS is a direct inversion method, taking no account of survey masks or noise. The Wiener filter is well motivated for Gaussian density fields in a Bayesian framework. The GLIMPSE method uses sparsity, with the aim of reconstructing non-linearities in the density field. We compare these methods with a series of tests on the public Dark Energy Survey (DES) Science Verification (SV) data and on realistic DES simulations. The Wiener filter and GLIMPSE methods offer substantial improvement over the standard smoothed KS across a range of metrics. For both the Wiener filter and GLIMPSE convergence reconstructions, we present a 12% improvement in Pearson correlation with the underlying truth from simulations. To compare the mapping methods' abilities to find mass peaks, we measure the difference between peak counts from simulated ΛCDM shear catalogues and catalogues with no mass fluctuations. This is a standard data vector when inferring cosmology from peak statistics. The maximum signal-to-noise value of these peak statistic data vectors was increased by a factor of 3.5 for the Wiener filter and by a factor of 9 using GLIMPSE. With simulations we measure the reconstruction of the harmonic phases, showing that the concentration of the phase residuals is improved by 17% by GLIMPSE and by 18% by the Wiener filter. We show that the correlation between the reconstructions from data and the foreground redMaPPer clusters is increased by 18% with the Wiener filter and by 32% with GLIMPSE.
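The Kaiser-Squires inversion mentioned above can be written compactly in Fourier space. The convergence and shear are related by γ̂(k) = D(k) κ̂(k), with D(k) = (k1² − k2² + 2i·k1·k2)/|k|². Because |D(k)| = 1 for every k ≠ 0, the inverse is simply κ̂ = D*(k) γ̂.

What follows is a minimal flat-sky sketch under several assumptions: a regular square-pixel grid, periodic boundaries (implied by the FFT), no survey mask and no noise model. The function name `kaiser_squires` is hypothetical. Its `smooth_sigma` parameter, a Gaussian width in pixels standing in for the "standard smoothed KS", is also illustrative. This is not the DES SV analysis code.

```python
import numpy as np


def kaiser_squires(gamma1, gamma2, smooth_sigma=0.0):
    """Flat-sky Kaiser-Squires inversion of a gridded shear field.

    gamma1, gamma2 : 2D arrays of the same shape (shear components per pixel).
    smooth_sigma   : optional Gaussian smoothing width in pixels (0 = none).
    Returns (kappa_E, kappa_B), the E-mode and B-mode convergence maps.
    Assumes periodic boundaries and ignores masks and noise.
    """
    ny, nx = gamma1.shape
    # Wavenumbers in cycles per pixel; k1 varies along x (axis 1), k2 along y (axis 0).
    k1, k2 = np.meshgrid(np.fft.fftfreq(nx), np.fft.fftfreq(ny))
    k_sq = k1**2 + k2**2
    # Fourier transform of a Gaussian kernel with standard deviation smooth_sigma pixels.
    window = np.exp(-2.0 * np.pi**2 * smooth_sigma**2 * k_sq)
    k_sq[0, 0] = 1.0  # avoid 0/0; the mean convergence is unconstrained anyway

    # Conjugate of D(k) = (k1^2 - k2^2 + 2i k1 k2) / |k|^2, which has unit modulus.
    d_conj = ((k1**2 - k2**2) - 2j * k1 * k2) / k_sq
    kappa_hat = d_conj * np.fft.fft2(gamma1 + 1j * gamma2) * window
    kappa_hat[0, 0] = 0.0  # set the undetermined mean to zero

    kappa = np.fft.ifft2(kappa_hat)
    return kappa.real, kappa.imag
```

The real part is the E-mode convergence map. For real data the imaginary (B-mode) part should be consistent with noise, which makes it a convenient null test. Because this sketch has no mask handling, it exhibits exactly the limitation of KS described above.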

Big Bang and Big Data

New international projects, such as the Euclid space telescope, are ushering in the era of Big Data for cosmologists. Our questions about dark matter and dark energy, which together account for 95% of the content of our Universe, raise new algorithmic, computational and theoretical challenges. A fourth challenge concerns reproducible research, a fundamental concept for the verification and credibility of published results.

Astrophysics and MRI, a marriage that makes sense

The CEA Fundamental Research Division (Direction de la recherche fondamentale) is launching the COSMIC project. The project brings together two data-processing teams: one at the Frédéric-Joliot Institute for Life Sciences (NeuroSpin) and one at CEA-Irfu (CosmoStat). Data acquisition in radio astronomy and in MRI relies on similar mechanisms. In both fields, the mathematical models are based on the principles of sparsity and compressed sensing, which derive from harmonic analysis.

Unsupervised feature learning for galaxy SEDs with denoising autoencoders

 

Authors: Frontera-Pons, J., Sureau, F., Bobin, J. and Le Floc'h, E.
Journal: Astronomy & Astrophysics
Year: 2017
Download: ADS | arXiv


Abstract

With the increasing number of deep multi-wavelength galaxy surveys, the spectral energy distribution (SED) of galaxies has become an invaluable tool for studying the formation of their structures and their evolution. In this context, standard analysis relies on simple spectro-photometric selection criteria based on a few SED colors. Although this fully supervised classification has already yielded clear achievements, it is not optimal for extracting relevant information from the data. In this article, we propose to employ very recent advances in machine learning, and more precisely in feature learning, to derive a data-driven diagram. We show that the proposed approach, based on denoising autoencoders, recovers the bi-modality in the galaxy population in an unsupervised manner, without using any prior knowledge of galaxy SED classification. This technique has been compared to principal component analysis (PCA) and to standard color/color representations. In addition, preliminary results illustrate that this approach captures extra physically meaningful information, such as redshift dependence, galaxy mass evolution and variation over the specific star formation rate (sSFR). PCA also yields an unsupervised representation with physical properties, such as mass and sSFR. However, this representation separates out other characteristics (bimodality, redshift evolution) less clearly than denoising autoencoders do.
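A denoising autoencoder corrupts its input and is trained to reconstruct the clean version. That core idea fits in a few lines of NumPy.

This is a hypothetical sketch, not the architecture, loss or training schedule used in the article. It assumes a single tanh hidden layer, additive Gaussian corruption and full-batch gradient descent. The function names `train_denoising_autoencoder` and `encode` are illustrative. The hidden-layer activations play the role of the learned low-dimensional, data-driven diagram.

```python
import numpy as np


def train_denoising_autoencoder(x, n_hidden=2, noise_std=0.1, lr=0.5,
                                n_epochs=300, seed=0):
    """Train a one-hidden-layer denoising autoencoder by full-batch gradient descent.

    x : (n_samples, n_features) array, e.g. normalised SED fluxes or colours.
    Encoder h = tanh(x_noisy @ W1 + b1); decoder x_hat = h @ W2 + b2.
    The loss compares x_hat with the *clean* input x.
    Returns (params, losses) where losses holds the per-epoch mean squared error.
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(n_epochs):
        # Corrupt the input with additive Gaussian noise.
        x_noisy = x + rng.normal(scale=noise_std, size=x.shape)
        h = np.tanh(x_noisy @ W1 + b1)
        x_hat = h @ W2 + b2
        err = x_hat - x  # reconstruction error against the clean input
        losses.append(float(np.mean(err**2)))

        # Backpropagate the mean squared error.
        g_out = 2.0 * err / err.size
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h**2)  # tanh'(z) = 1 - tanh(z)^2
        g_W1 = x_noisy.T @ g_h
        g_b1 = g_h.sum(axis=0)

        # Gradient descent step (in-place updates).
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses


def encode(params, x):
    """Map (clean) inputs to the learned low-dimensional representation."""
    W1, b1, _, _ = params
    return np.tanh(x @ W1 + b1)
```

With `n_hidden=2`, plotting `encode(params, x)` gives a two-dimensional diagram. It can be set side by side with a two-component PCA projection or a color/color plot, the comparison the abstract describes.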

PSF field learning based on Optimal Transport Distances

 

Authors: F. Ngolè Mboula, J-L. Starck
Journal: arXiv
Year: 2017
Download: ADS | arXiv

 


Abstract

Context: in astronomy, observing large fractions of the sky within a reasonable amount of time implies using large field-of-view (FoV) optical instruments that typically have a spatially varying Point Spread Function (PSF). Depending on the scientific goals, galaxy images need to be corrected for the PSF, whereas no direct measurement of the PSF is available. Aims: given a set of PSFs observed at random locations, we want to estimate the PSFs at galaxy locations in order to correct shape measurements. Contributions: we propose an interpolation framework based on Sliced Optimal Transport. A non-linear dimension reduction is first performed based on local pairwise approximated Wasserstein distances. A low-dimensional representation of the unknown PSFs is then estimated, which in turn is used to derive representations of those PSFs in the Wasserstein metric. Finally, the interpolated PSFs are calculated as approximated Wasserstein barycenters. Results: the proposed method was tested on simulated monochromatic PSFs of the Euclid space mission telescope (to be launched in 2020). It achieves remarkable accuracy in terms of pixel values and shape compared to standard methods such as Inverse Distance Weighting or Radial Basis Function-based interpolation.
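Sliced approaches are tractable because optimal transport in one dimension reduces to sorting. The sketch below estimates a sliced 2-Wasserstein distance between two equally weighted point clouds by averaging 1D Wasserstein distances over random projection directions.

It illustrates only the sliced-OT building block. It assumes each distribution is represented by the same number of sample points, whereas the paper works on PSF images. It does not implement the paper's non-linear dimension reduction or barycenter steps, and the name `sliced_wasserstein` is hypothetical.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    x, y : (n_points, dim) arrays of equally weighted samples, same shape.
    """
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    rng = np.random.default_rng(seed)
    # Random directions drawn uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project onto each direction; in 1D the optimal coupling matches sorted values.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # Mean over samples gives W2^2 per direction; mean over directions slices it.
    return float(np.sqrt(np.mean((px - py) ** 2)))
```

As a sanity check, consider a pure 2D translation by a vector t. Each projection sees a constant shift θ·t, and the average of (θ·t)² over unit directions is |t|²/2. The estimate should therefore approach |t|/√2.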