Semi-supervised dictionary learning with graph regularization and active points


Authors: Khanh-Hung Tran, Fred-Maurice Ngole-Mboula, J-L. Starck
Journal: SIAM Journal on Imaging Sciences
Year: 2020
DOI: 10.1137/19M1285469
Download: arXiv


Supervised Dictionary Learning has gained much interest in the recent decade and has shown significant performance improvements in image classification. However, in general, supervised learning needs a large number of labelled samples per class to achieve an acceptable result. In order to deal with databases which have just a few labelled samples per class, semi-supervised learning, which also exploits unlabelled samples in the training phase, is used. Indeed, unlabelled samples can help to regularize the learning model, yielding an improvement of classification accuracy. In this paper, we propose a new semi-supervised dictionary learning method based on two pillars: on one hand, we enforce manifold structure preservation from the original data into sparse code space using Locally Linear Embedding, which can be considered a regularization of sparse code; on the other hand, we train a semi-supervised classifier in sparse code space. We show that our approach provides an improvement over state-of-the-art semi-supervised dictionary learning methods.

Deep Learning for space-variant deconvolution in galaxy surveys


Authors: Florent Sureau, Alexis Lechat, J-L. Starck
Journal: Astronomy and Astrophysics
Year: 2020
DOI: 10.1051/0004-6361/201937039
Download: ADS | arXiv


The deconvolution of large survey images with millions of galaxies requires developing a new generation of methods that can take a space-variant point spread function into account. These methods have also to be accurate and fast. We investigate how deep learning might be used to perform this task. We employed a U-net deep neural network architecture to learn parameters that were adapted for galaxy image processing in a supervised setting and studied two deconvolution strategies. The first approach is a post-processing of a mere Tikhonov deconvolution with closed-form solution, and the second approach is an iterative deconvolution framework based on the alternating direction method of multipliers (ADMM). Our numerical results based on GREAT3 simulations with realistic galaxy images and point spread functions show that our two approaches outperform standard techniques that are based on convex optimization, whether assessed in galaxy image reconstruction or shape recovery. The approach based on a Tikhonov deconvolution leads to the most accurate results, except for ellipticity errors at high signal-to-noise ratio. The ADMM approach performs slightly better in this case. Considering that the Tikhonov approach is also more computation-time efficient in processing a large number of galaxies, we recommend this approach in this scenario.

In the spirit of reproducible research, the codes will be made freely available on the CosmoStat website. The testing datasets will also be provided to repeat the experiments performed in this paper.

PySAP: Python Sparse Data Analysis Package for Multidisciplinary Image Processing


Authors: S. Farrens, A. Grigis, L. El Gueddari, Z. Ramzi, Chaithya G. R., S. Starck, B. Sarthou, H. Cherkaoui, P. Ciuciu, J-L. Starck
Journal: Astronomy and Computing
Year: 2020
DOI: 10.1016/j.ascom.2020.100402
Download: ADS | arXiv


We present the open-source image processing software package PySAP (Python Sparse data Analysis Package) developed for the COmpressed Sensing for Magnetic resonance Imaging and Cosmology (COSMIC) project. This package provides a set of flexible tools that can be applied to a variety of compressed sensing and image reconstruction problems in various research domains. In particular, PySAP offers fast wavelet transforms and a range of integrated optimisation algorithms. In this paper we present the features available in PySAP and provide practical demonstrations on astrophysical and magnetic resonance imaging data.


PySAP Code

Euclid preparation III. Galaxy cluster detection in the wide photometric survey, performance and algorithm selection


Authors: Euclid Collaboration, R. Adam, ..., S. Farrens, et al.
Journal: A&A
Year: 2019
DOI: 10.1051/0004-6361/201935088
Download: ADS | arXiv


Galaxy cluster counts in bins of mass and redshift have been shown to be a competitive probe to test cosmological models. This method requires an efficient blind detection of clusters from surveys with a well-known selection function and robust mass estimates. The Euclid wide survey will cover 15000 deg\(^2\) of the sky in the optical and near-infrared bands, down to magnitude 24 in the H-band. The resulting data will make it possible to detect a large number of galaxy clusters spanning a wide range of masses up to redshift ∼2. This paper presents the final results of the Euclid Cluster Finder Challenge (CFC). The objective of these challenges was to select the cluster detection algorithms that best meet the requirements of the Euclid mission. The final CFC included six independent detection algorithms, based on different techniques, such as photometric redshift tomography, optimal filtering, hierarchical approach, wavelet and friend-of-friends algorithms. These algorithms were blindly applied to a mock galaxy catalog with representative Euclid-like properties. The relative performance of the algorithms was assessed by matching the resulting detections to known clusters in the simulations. Several matching procedures were tested, thus making it possible to estimate the associated systematic effects on completeness to <3%. All the tested algorithms are very competitive in terms of performance, with three of them reaching >80% completeness for a mean purity of 80% down to masses of \(10^{14}\) M⊙ and up to redshift z=2. Based on these results, two algorithms were selected to be implemented in the Euclid pipeline, the AMICO code, based on matched filtering, and the PZWav code, based on an adaptive wavelet approach.

A Distributed Learning Architecture for Scientific Imaging Problems


Authors: A. Panousopoulou, S. Farrens, K. Fotiadou, A. Woiselle, G. Tsagkatakis, J-L. Starck, P. Tsakalides
Journal: arXiv
Year: 2018
Download: ADS | arXiv


Current trends in scientific imaging are challenged by the emerging need of integrating sophisticated machine learning with Big Data analytics platforms. This work proposes an in-memory distributed learning architecture for enabling sophisticated learning and optimization techniques on scientific imaging problems, which are characterized by the combination of variant information from different origins. We apply the resulting, Spark-compliant, architecture on two emerging use cases from the scientific imaging domain, namely: (a) the space variant deconvolution of galaxy imaging surveys (astrophysics), (b) the super-resolution based on coupled dictionary training (remote sensing). We conduct evaluation studies considering relevant datasets, and the results report at least 60% improvement in time response against the conventional computing solutions. Ultimately, the offered discussion provides useful practical insights on the impact of key Spark tuning parameters on the speedup achieved, and the memory/disk footprint.

Space variant deconvolution of galaxy survey images


Authors: S. Farrens, J-L. Starck, F. Ngolè Mboula
Journal: A&A
Year: 2017
Download: ADS | arXiv


Removing the aberrations introduced by the Point Spread Function (PSF) is a fundamental aspect of astronomical image processing. The presence of noise in observed images makes deconvolution a nontrivial task that necessitates the use of regularisation. This task is particularly difficult when the PSF varies spatially as is the case for the Euclid telescope. New surveys will provide images containing thousands of galaxies and the deconvolution regularisation problem can be considered from a completely new perspective. In fact, one can assume that galaxies belong to a low-rank dimensional space. This work introduces the use of the low-rank matrix approximation as a regularisation prior for galaxy image deconvolution and compares its performance with a standard sparse regularisation technique. This new approach leads to a natural way to handle a space variant PSF. Deconvolution is performed using a Python code that implements a primal-dual splitting algorithm. The data set considered is a sample of 10 000 space-based galaxy images convolved with a known spatially varying Euclid-like PSF and including various levels of Gaussian additive noise. Performance is assessed by examining the deconvolved galaxy image pixels and shapes. The results demonstrate that for small samples of galaxies sparsity performs better in terms of pixel and shape recovery, while for larger samples of galaxies it is possible to obtain more accurate estimates of the galaxy shapes using the low-rank approximation.


Point Spread Function

The Point Spread Function or PSF of an imaging system (also referred to as the impulse response) describes how the system responds to a point (unextended) source. In astrophysics, stars or quasars are often used to measure the PSF of an instrument, as in ideal conditions their light would occupy a single pixel on a CCD. Telescopes, however, diffract the incoming photons, which limits the maximum resolution achievable. In reality, the images obtained from telescopes include aberrations from various sources such as:

  • The atmosphere (for ground-based instruments)
  • Jitter (for space-based instruments)
  • Imperfections in the optical system
  • Charge spread of the detectors


In order to recover the true image properties it is necessary to remove PSF effects from observations. If the PSF is known (which is certainly not trivial) one can attempt to deconvolve the PSF from the image. In the absence of noise this is simple. We can model the observed image \(\mathbf{y}\) as follows

\(\mathbf{y}=\mathbf{Hx}\)

where \(\mathbf{x}\) is the true image and \(\mathbf{H}\) is an operator that represents the convolution with the PSF. Thus, to recover the true image, one would simply invert \(\mathbf{H}\) as follows

\(\mathbf{x}=\mathbf{H}^{-1}\mathbf{y}\)

Unfortunately, the images we observe also contain noise (e.g. from the CCD readout) and this complicates the problem.

\(\mathbf{y}=\mathbf{Hx} + \mathbf{n}\)

This problem is ill-posed, as even the tiniest amount of noise will have a large impact on the result of the inversion. Therefore, to obtain a stable and unique solution, it is necessary to regularise the problem by incorporating prior knowledge of the true images.
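The noise amplification described above can be illustrated with a small numerical experiment. The sketch below is only a toy example under stated assumptions: a 1D signal, a Gaussian PSF, and a circular convolution so that \(\mathbf{H}\) becomes a simple multiplication in Fourier space. All names (`convolve`, `naive_inverse`, the signal size, the noise level) are illustrative choices and are not taken from any code mentioned on this page.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 128
# Toy "true image": two point-like sources on an empty background.
x = np.zeros(n)
x[40] = 1.0
x[80] = 0.5

# Gaussian PSF (assumed width of 2 pixels), centred on index 0 so that the
# circular convolution does not shift the signal.
grid = np.arange(n)
dist = np.minimum(grid, n - grid)
psf = np.exp(-0.5 * (dist / 2.0) ** 2)
psf /= psf.sum()

# Transfer function: with circular convolution, H is diagonal in Fourier space.
H = np.fft.fft(psf)


def convolve(signal):
    """Apply H, i.e. circular convolution with the PSF."""
    return np.real(np.fft.ifft(np.fft.fft(signal) * H))


def naive_inverse(observed):
    """Apply H^{-1} directly, with no regularisation."""
    return np.real(np.fft.ifft(np.fft.fft(observed) / H))


y_clean = convolve(x)
y_noisy = y_clean + 1e-3 * rng.standard_normal(n)  # small additive Gaussian noise

err_clean = np.linalg.norm(naive_inverse(y_clean) - x)
err_noisy = np.linalg.norm(naive_inverse(y_noisy) - x)
print(f"error without noise: {err_clean:.2e}")
print(f"error with noise:    {err_noisy:.2e}")
```

Without noise the direct inversion recovers the signal almost exactly. With noise, the high-frequency coefficients, where the transfer function is close to zero, are divided by very small numbers, so the error grows by many orders of magnitude.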


One way to regularise the problem is to use sparsity. The concept of sparsity is quite simple. If we know that there is a representation of \(\mathbf{x}\) that is sparse (i.e. most of the coefficients are zero), then we can force our deconvolved observation \(\mathbf{\hat{x}}\) to be sparse in the same domain. In practice we aim to solve a minimisation problem of the following form

\(\begin{aligned} & \underset{\mathbf{x}}{\text{argmin}} & \frac{1}{2}\|\mathbf{y}-\mathbf{H}\mathbf{x}\|_2^2 + \lambda\|\Phi(\mathbf{x})\|_1 & & \text{s.t.} & & \mathbf{x} \ge 0 \end{aligned}\)

where \(\Phi\) is a matrix that transforms \(\mathbf{x}\) to the sparse domain and \(\lambda\) is a regularisation control parameter.
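To make the minimisation problem more concrete, here is a minimal sketch of one standard way to solve it: forward-backward splitting (ISTA). It rests on simplifying assumptions that are not part of the description above: \(\Phi\) is taken to be the identity (the signal is sparse directly in pixel space), the data are 1D, and the convolution is circular. The paper itself reports using a primal-dual splitting algorithm, so this is a swapped-in, simpler technique for illustration, and the function names are hypothetical.

```python
import numpy as np


def soft_threshold_positive(v, threshold):
    """Proximal operator of threshold * ||v||_1 combined with v >= 0."""
    return np.maximum(v - threshold, 0.0)


def objective(x, y, H, lam):
    """0.5 * ||y - Hx||_2^2 + lam * ||x||_1 for a circular convolution H."""
    residual = y - np.real(np.fft.ifft(np.fft.fft(x) * H))
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(x))


def sparse_deconvolve(y, psf, lam=1e-3, n_iter=500):
    """Forward-backward (ISTA) iterations, assuming Phi is the identity."""
    H = np.fft.fft(psf)
    step = 1.0 / np.max(np.abs(H)) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros_like(y)
    for _ in range(n_iter):
        residual = np.real(np.fft.ifft(np.fft.fft(x) * H)) - y
        # Gradient of the data fidelity term: H^T (Hx - y).
        grad = np.real(np.fft.ifft(np.fft.fft(residual) * np.conj(H)))
        x = soft_threshold_positive(x - step * grad, step * lam)
    return x


# Toy data: two point sources blurred by a Gaussian PSF, plus noise.
rng = np.random.default_rng(0)
n = 128
x_true = np.zeros(n)
x_true[40] = 1.0
x_true[80] = 0.5
grid = np.arange(n)
dist = np.minimum(grid, n - grid)
psf = np.exp(-0.5 * (dist / 2.0) ** 2)
psf /= psf.sum()
H = np.fft.fft(psf)
y = np.real(np.fft.ifft(np.fft.fft(x_true) * H)) + 1e-3 * rng.standard_normal(n)

lam = 1e-3
x_hat = sparse_deconvolve(y, psf, lam=lam)
print("objective at zero: ", objective(np.zeros(n), y, H, lam))
print("objective at x_hat:", objective(x_hat, y, H, lam))
```

Each iteration takes a gradient step on the data fidelity term and then applies a soft threshold, which sets small coefficients to zero and enforces positivity. A larger \(\lambda\) gives a sparser and more strongly regularised solution.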

Low-Rank Approximation

Another way to regularise the problem is to assume that all of the images one aims to deconvolve live on an underlying low-rank manifold. In other words, if we have a sample of galaxy images we wish to deconvolve, then we can construct a matrix \(\mathbf{X}\) where each column is a vector of galaxy pixel coefficients. If many of these galaxies have similar properties, then we know that \(\mathbf{X}\) will have a smaller rank than if the images were all very different. We can use this knowledge to regularise the deconvolution problem in the following way

\(\begin{aligned} & \underset{\mathbf{X}}{\text{argmin}} & \frac{1}{2}\|\mathbf{Y}-\mathcal{H}(\mathbf{X})\|_2^2 + \lambda\|\mathbf{X}\|_* & & \text{s.t.} & & \mathbf{X} \ge 0 \end{aligned} \)
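Here \(\|\cdot\|_*\) is the nuclear norm, i.e. the sum of the singular values of a matrix, which acts as a convex surrogate for the rank. Its proximal operator is a soft threshold applied to the singular values, and this is the basic building block of proximal algorithms for problems of this form. The sketch below shows that single step on synthetic data. The matrix sizes, the noise level, the threshold and the function names are assumptions made for illustration, and the random matrix merely stands in for a stack of galaxy images.

```python
import numpy as np


def singular_value_threshold(X, threshold):
    """Proximal operator of threshold * ||X||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - threshold, 0.0)
    return (U * s_shrunk) @ Vt


def numerical_rank(M, tol=1e-8):
    """Number of singular values larger than tol."""
    return int(np.sum(np.linalg.svd(M, compute_uv=False) > tol))


rng = np.random.default_rng(0)
n_pixels, n_objects, true_rank = 64, 30, 3

# Each column plays the role of one vectorised image; all columns are
# combinations of only three underlying components, so the matrix has rank 3.
X_low = rng.standard_normal((n_pixels, true_rank)) @ rng.standard_normal((true_rank, n_objects))

# Additive noise makes the matrix full rank.
X_noisy = X_low + 0.01 * rng.standard_normal(X_low.shape)

X_hat = singular_value_threshold(X_noisy, threshold=1.0)

print("rank of noisy matrix:      ", numerical_rank(X_noisy))
print("rank after thresholding:   ", numerical_rank(X_hat))
```

The threshold removes the small singular values contributed by the noise while keeping the few dominant ones, which is how the low-rank prior shares information across all of the images in the sample.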


In the paper I implement both of these regularisation techniques and compare how well they perform at deconvolving a sample of 10,000 Euclid-like galaxy images. The results show that, for the data used, sparsity does a better job at recovering the image pixels, while the low-rank approximation does a better job at recovering the galaxy shapes (provided enough galaxies are used).


SF_DECONVOLVE is a Python code designed for PSF deconvolution using a low-rank approximation and sparsity. The code can handle a fixed PSF for the entire field or a stack of PSFs, one for each galaxy position.