Faster and better sparse blind source separation through mini-batch optimization

Sparse Blind Source Separation (sBSS) plays a key role in scientific domains as different as biomedical imaging, remote sensing and astrophysics, which require the development of increasingly fast and scalable BSS methods without sacrificing separation performance. To that end, a new distributed sparse BSS algorithm is introduced, based on a mini-batch extension of the Generalized Morphological Component Analysis (GMCA) algorithm. Precisely, it combines a robust projected alternating least-squares method with mini-batch optimization. The originality further lies in the use of a manifold-based aggregation of asynchronously estimated mixing matrices. Numerical experiments are carried out on realistic spectroscopic spectra, and highlight the ability of the proposed distributed GMCA (dGMCA) to provide very good separation results even when very small mini-batches are used. Quite unexpectedly, it can further outperform the (non-distributed) state-of-the-art methods for highly sparse sources.
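As a rough illustration of the aggregation idea, the sketch below combines several mini-batch estimates of one mixing-matrix column. It is not the dGMCA implementation: it assumes columns are normalized to unit norm and swaps in a simple stand-in for a manifold-based mean, namely the Euclidean average projected back onto the unit sphere. The function names and the numerical values are hypothetical.

```python
import math

def normalize(col):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in col))
    return [x / norm for x in col]

def aggregate_columns(estimates):
    """Average unit-norm estimates of one column, then project back on the sphere.

    Assumption: the estimates are close to each other (not antipodal),
    so their Euclidean mean is non-zero.
    """
    dim = len(estimates[0])
    mean = [sum(est[i] for est in estimates) / len(estimates) for i in range(dim)]
    return normalize(mean)

# Three made-up estimates of the same column, one per mini-batch.
batch_estimates = [
    normalize(v)
    for v in ([1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, -0.1, 0.05])
]
aggregated = aggregate_columns(batch_estimates)
```

The projection step keeps the aggregated column on the same unit sphere as the individual estimates, which a plain average would not.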

Reference: Christophe Kervazo, Tobias Liaudat and Jérôme Bobin.
“Faster and better sparse blind source separation through mini-batch optimization”, Digital Signal Processing, Elsevier, 2020.

DSP Elsevier, HAL.

Recovery of 21-cm intensity maps with sparse component separation

 

Authors: I.P. Carucci, M.O. Irfan, J.Bobin
Journal: MNRAS
Year: 2020
Download: ADS | arXiv

21 cm intensity mapping has emerged as a promising technique to map the large-scale structure of the Universe. However, the presence of foregrounds with amplitudes orders of magnitude larger than the cosmological signal constitutes a critical challenge. In this work, we test the sparsity-based algorithm Generalised Morphological Component Analysis (GMCA) as a blind component separation technique for this class of experiments. We test the GMCA performance against realistic full-sky mock temperature maps that include, besides astrophysical foregrounds, a fraction of the polarized part of the signal leaked into the unpolarized one, a very troublesome foreground to subtract, usually referred to as polarization leakage. To our knowledge, this is the first time the removal of such a component is performed with no prior assumption. We assess the success of the cleaning by comparing the true and recovered power spectra, in the angular and radial directions. In the best scenario looked at, GMCA is able to recover the input angular (radial) power spectrum with an average bias of 5% for ℓ > 25 (20-30% for k_∥ ≳ 0.02 Mpc/h), in the presence of polarization leakage. Our results are also robust when up to 40% of channels are missing, mimicking a Radio Frequency Interference (RFI) flagging of the data. Having quantified the notable effect of polarization leakage on our results, we advocate the use of more realistic simulations when testing 21 cm intensity mapping capabilities.

Code and demonstrative notebooks are available here, and the data set to reproduce our results is available here.

Determining thermal dust emission from Planck HFI data using a sparse, parametric technique

 

Authors: M.O. Irfan, J.Bobin, M-A.Miville-Deschenes, I.Grenier 
Journal: A&A
Year: 2018
Download: ADS | arXiv


Abstract

Context: The Planck data releases have provided the community with sub-millimetre and radio observations of the full sky at unprecedented resolutions. We make use of the Planck 353, 545 and 857 GHz maps alongside the IRAS 3000 GHz map. These maps contain information on the cosmic microwave background (CMB), cosmic infrared background (CIB), extragalactic point sources and diffuse thermal dust emission. Aims: We aim to determine the modified black body (MBB) model parameters of thermal dust emission in total intensity and produce all-sky maps of pure thermal dust, having separated this Galactic component from the CMB and CIB. Methods: This separation is completed using a new, sparsity-based, parametric method which we refer to as premise. The method comprises three main stages: 1) filtering the raw data to reduce the effect of the CIB on the MBB fit; 2) fitting an MBB model to the filtered data across super-pixels of various sizes determined by the algorithm itself; and 3) refining these super-pixel estimates into full-resolution maps of the MBB parameters. Results: We present our maps of MBB temperature, spectral index and optical depth at 5 arcmin resolution and compare our estimates to those of GNILC as well as the two-step MBB fit presented by the Planck collaboration in 2013. Conclusions: By exploiting sparsity we avoid the need for smoothing, enabling us to produce the first full-resolution MBB parameter maps from intensity measurements of thermal dust emission. We consider the premise parameter estimates to be competitive with the existing state-of-the-art solutions, outperforming these methods within low signal-to-noise regions as we account for the CIB without removing thermal dust emission through over-smoothing.
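For readers unfamiliar with the MBB model, a minimal sketch of the standard textbook form is given below: the intensity is the optical depth at a reference frequency, times a power law in frequency with the spectral index as exponent, times the Planck function at the dust temperature. This is not the premise code. The 353 GHz reference frequency is an assumption of this sketch, the function names are hypothetical, and the parameter values are illustrative only.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
K_B = 1.380649e-23   # Boltzmann constant [J/K]
C = 299792458.0      # speed of light [m/s]

def planck(nu_hz, temp_k):
    """Planck function B_nu(T) in W m^-2 Hz^-1 sr^-1."""
    x = H * nu_hz / (K_B * temp_k)
    return 2.0 * H * nu_hz ** 3 / C ** 2 / math.expm1(x)

def modified_black_body(nu_hz, tau0, beta, temp_k, nu0_hz=353e9):
    """MBB intensity: tau0 * (nu / nu0)**beta * B_nu(T).

    tau0 is the optical depth at the reference frequency nu0 (assumed 353 GHz),
    beta the spectral index and temp_k the dust temperature.
    """
    return tau0 * (nu_hz / nu0_hz) ** beta * planck(nu_hz, temp_k)

# Evaluate the model at the four map frequencies with made-up parameters.
frequencies_ghz = [353, 545, 857, 3000]
spectrum = [
    modified_black_body(f * 1e9, tau0=1e-5, beta=1.6, temp_k=19.0)
    for f in frequencies_ghz
]
```

Fitting the three parameters per pixel or super-pixel then amounts to matching this curve to the observed intensities at the available frequencies.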

A highly precise shape-noise-free shear bias estimator

 

Authors: A. Pujol, M. Kilbinger, F. Sureau & J. Bobin
Journal:  
Year: 06/2018
Download: ADS | arXiv


Abstract

We present a new method to estimate shear measurement bias in image simulations that significantly improves its precision with respect to the state-of-the-art methods. This method is based on measuring the shear response for individual images. We generate sheared versions of the same image to measure how the shape measurement changes with the changes in the shear, so that we obtain a shear response for each original image, as well as its additive bias. Using the exact same noise realizations for each sheared version allows us to obtain an exact estimation of its shear response. The estimated shear bias of a sample of galaxies comes from the measured averages of the shear response and individual additive bias. The precision of this method represents an improvement with respect to previous methods since our method is not affected by shape noise. As a consequence, the method does not require shape noise cancellation for a precise estimation of shear bias. The method can easily be used in many applications such as shear measurement validation and calibration, reducing the number of necessary simulated images by a few orders of magnitude to achieve the same precision requirements.
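The idea of measuring a per-image shear response with a shared noise realization can be sketched on a toy problem. The example below is only an illustration under strong assumptions: the "shape estimator" is a hypothetical linear function with a built-in multiplicative bias m and additive bias c, not a real shape measurement method, and the response is taken as a central finite difference between two sheared copies.

```python
import random

def measure_shape(intrinsic_e, shear, noise, m=0.02, c=0.001):
    """Toy shape estimator (hypothetical): linear response with biases m and c."""
    return (1.0 + m) * (intrinsic_e + shear) + c + noise

def shear_response(intrinsic_e, noise, dg=0.02):
    """Finite-difference response; both sheared copies share the same noise."""
    e_plus = measure_shape(intrinsic_e, +dg, noise)
    e_minus = measure_shape(intrinsic_e, -dg, noise)
    return (e_plus - e_minus) / (2.0 * dg)

# Made-up sample: (intrinsic ellipticity, noise realization) per galaxy.
rng = random.Random(0)
galaxies = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.05)) for _ in range(100)]

responses = [shear_response(e, n) for e, n in galaxies]
mean_response = sum(responses) / len(responses)
m_estimate = mean_response - 1.0  # multiplicative bias from the mean response
```

Because the intrinsic ellipticity and the noise are identical in both sheared copies, they cancel in the difference, so in this toy model the multiplicative bias is recovered without any averaging over shape noise.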

Sparse estimation of model-based diffuse thermal dust emission

 

Authors: M.O. Irfan, J.Bobin 
Journal: MNRAS
Year: 2017
Download: ADS | arXiv


Abstract

Component separation for the Planck HFI data is primarily concerned with the estimation of thermal dust emission, which requires the separation of thermal dust from the cosmic infrared background (CIB). For that purpose, current estimation methods rely on filtering techniques to decouple thermal dust emission from CIB anisotropies, which tend to yield a smooth, low-resolution estimation of the dust emission. In this paper we present a new parameter estimation method, premise: Parameter Recovery Exploiting Model Informed Sparse Estimates. This method exploits the sparse nature of thermal dust emission to calculate all-sky maps of thermal dust temperature, spectral index and optical depth at 353 GHz. premise is evaluated and validated on full-sky simulated data. We find the percentage difference between the premise results and the true values to be 2.8, 5.7 and 7.2 per cent at the 1 sigma level across the full sky for thermal dust temperature, spectral index and optical depth at 353 GHz, respectively. Comparison between premise and a GNILC-like method over selected regions of our sky simulation reveals that both methods perform comparably within high signal-to-noise regions. However, outside of the Galactic plane premise is seen to outperform the GNILC-like method with increasing success as the signal-to-noise ratio worsens.

Shear measurement bias: dependencies on methods, simulation parameters and measured parameters

 

Authors: A. Pujol, F. Sureau, J. Bobin et al.
Journal: A&A
Year: 06/2017
Download: ADS | arXiv


Abstract

We present a study of the dependencies of shear and ellipticity bias on simulation (input) and measured (output) parameters, noise, PSF anisotropy, pixel size and the model bias coming from two different and independent shape estimators. We use simulated images from Galsim based on the GREAT3 control-space-constant branch and we measure ellipticity and shear bias from a model-fitting method (gFIT) and a moment-based method (KSB). We show the bias dependencies found on input and output parameters for both methods and we identify the main dependencies and causes. We find consistent results between the two methods (given the precision of the analysis) and important dependencies on orientation and morphology properties such as flux, size and ellipticity. We show cases where shear bias and ellipticity bias behave very differently for the two methods due to the different nature of these measurements. We also show that noise and pixelization play an important role in the bias dependencies on the output properties. We find a large model bias for galaxies consisting of a bulge and a disk with different ellipticities or orientations. We also see an important coupling between several properties in the bias dependencies. Because of this, we need to study several properties simultaneously in order to properly understand the nature of shear bias.

Unsupervised feature learning for galaxy SEDs with denoising autoencoders

 

Authors: Frontera-Pons, J., Sureau, F., Bobin, J. and Le Floc'h E.
Journal: Astronomy & Astrophysics
Year: 2017
Download: ADS | arXiv


Abstract

With the increasing number of deep multi-wavelength galaxy surveys, the spectral energy distribution (SED) of galaxies has become an invaluable tool for studying the formation of their structures and their evolution. In this context, standard analysis relies on simple spectro-photometric selection criteria based on a few SED colors. Although this fully supervised classification has already yielded clear achievements, it is not optimal for extracting relevant information from the data. In this article, we propose to employ very recent advances in machine learning, and more precisely in feature learning, to derive a data-driven diagram. We show that the proposed approach based on denoising autoencoders recovers the bi-modality in the galaxy population in an unsupervised manner, without using any prior knowledge on galaxy SED classification. This technique has been compared to principal component analysis (PCA) and to standard color/color representations. In addition, preliminary results illustrate that this enables the capturing of extra physically meaningful information, such as redshift dependence, galaxy mass evolution and variation over the specific star formation rate (sSFR). PCA also results in an unsupervised representation with physical properties, such as mass and sSFR, although this representation separates out other characteristics (bimodality, redshift evolution) less well than denoising autoencoders.

Joint Multichannel Deconvolution and Blind Source Separation

 

Authors: M. Jiang, J. Bobin, J-L. Starck
Journal: SIAM J. Imaging Sci.
Year: 2017
Download: ADS | arXiv | SIIMS

 


Abstract

Blind Source Separation (BSS) is a challenging matrix factorization problem that plays a central role in multichannel imaging science. In a large number of applications, such as astrophysics, current unmixing methods are limited since real-world mixtures are generally affected by extra instrumental effects like blurring. Therefore, BSS has to be solved jointly with a deconvolution problem, which requires tackling a new inverse problem: deconvolution BSS (DBSS). In this article, we introduce an innovative DBSS approach, called DecGMCA, based on sparse signal modeling and an efficient alternating projected least-squares algorithm. Numerical results demonstrate that the DecGMCA algorithm performs very well on simulations. They further highlight the importance of jointly solving BSS and deconvolution instead of considering these two problems independently. Furthermore, the performance of the proposed DecGMCA algorithm is demonstrated on simulated radio-interferometric data.

Blind separation of sparse sources in the presence of outliers

 

Authors: C.Chenot, J.Bobin
Journal: Signal Processing, Elsevier
Year: 2016
Download: Elsevier / Preprint

 


 

Abstract

 

Blind Source Separation (BSS) plays a key role in analyzing multichannel data since it aims at recovering unknown underlying elementary sources from observed linear mixtures in an unsupervised way. In a large number of applications, multichannel measurements contain corrupted entries, which are highly detrimental for most BSS techniques. In this article, we introduce a new robust BSS technique coined robust Adaptive Morphological Component Analysis (rAMCA). Based on sparse signal modeling, it takes advantage of an alternating reweighting minimization technique that yields a robust estimation of the sources and the mixing matrix simultaneously with the removal of the spurious outliers. Numerical experiments are provided that illustrate the robustness of this new algorithm with respect to aberrant outliers on a wide range of blind separation instances. In contrast to current robust BSS methods, the rAMCA algorithm is shown to perform very well when the number of observations is close or equal to the number of sources.

CMB reconstruction from the WMAP and Planck PR2 data

 

Authors:  J. Bobin, F. Sureau and J. -L. Starck
Journal: A&A
Year: 2015
Download: ADS | arXiv


Abstract

In this article, we describe a new estimate of the Cosmic Microwave Background (CMB) intensity map reconstructed by a joint analysis of the full Planck 2015 data (PR2) and the WMAP nine-year data. It provides more than a mere update of the CMB map introduced in (Bobin et al. 2014b) since it benefits from an improvement of the component separation method L-GMCA (Local-Generalized Morphological Component Analysis) that allows the efficient separation of correlated components (Bobin et al. 2015). Based on the most recent CMB data, we further confirm previous results (Bobin et al. 2014b) showing that the proposed CMB map estimate exhibits appealing characteristics for astrophysical and cosmological applications: i) it is a full-sky map that did not require any inpainting or interpolation post-processing, ii) foreground contamination is shown to be very low even at the Galactic center, iii) it does not exhibit any detectable trace of thermal SZ contamination. We show that its power spectrum is in good agreement with the Planck PR2 official theoretical best-fit power spectrum. Finally, following the principle of reproducible research, we provide the codes to reproduce the L-GMCA, which makes it the only reproducible CMB map.