================================================
athena
Version 1.7 (03/2014)
Tree code for two-point correlation functions
power spectra, and smooth second-order functions
Authors:
Martin Kilbinger
Christopher Bonnett
Jean Coupon
2007-2014
================================================
=====================
Table of content:
1. Compiling the code
2. Running the code
3. Configuration file
4. File formats
5. Weights
6. Open angle
7. Angular correlation function w(theta)
8. Projections
9. Convergence power spectrum and smooth 2nd-order functions
10. Testing the code
11. Known bugs
12. Acknowledgments
13. Contact
=====================
=====================
1. Compiling the code
=====================
Download the code, and (optional) the test data, from
http://cosmostat.org/athena (see also http://ascl.net/athena), and un-tar it.
Two options to compile the code exists:
1) Using cmake to create a Makefile and to find libraries e.g. cfitsio.
Compile the code in a build directory. E.g. in the athena root directory
athena_1.7, type
cd build
cmake ..
make && make install
'make install' will just copy the executables athena and venice to ../bin.
2) Using make.
cd src
make -f Makefile.athena
No external packages or libraries are compulsory. However, venice (see Sect.
7.1), needed to obtain random catalogues for spatial correlation functions,
requires GSL (GNU Scientific Library, www.gls.org). Further, for (optional)
FITS file format support, the cfitsio library is used. cmake automatically
finds those libraries. For option 2, edit the variables 'GSL', 'FITS', and
'LIBFITS' in src/Makefile.athena .
===================
2. Running the code
===================
The command
athena_1.7/bin/athena [-c config_file]
runs the tree code. A configuration file has to be present, see Sect. 3.
Type 'athena -h' for command line options.
Test data sets and config files can be found in the directories test_xi
(shear-shear), test_g (shear-position), and test_w (position-position), all
subdirectories of test. See Sect. 7 to calculate the full (position-position)
angular correlation function, including the generation and use of random
catalogues. A test script exists that runs athena and pallas.py, which produces
output that can be compared to reference output (in test/test_*/results). To
run the tests, type in athena_1.7:
bin/test_suite_athena.py
=====================
3. Configuration file
=====================
The configuration file (first command line argument; default "config_tree") has
to contain the following entries, in this order:
GALCAT1 string First galaxy catalogue, ascii or fits format.
For shear-position, this is the background (shear)
catalogue
GALCAT2 string Second catalogue (for cross-correlation.
Can be same as GALCAT1 or "-" for
auto-correlation). Same file format as first catalogue.
For shear-position, this is the foreground (position)
catalogue
WCORR integer Type of correlation:
1: shear-shear (aka cosmic shear)
2: shear-position (aka galaxy-lensing)
4: position-position (aka w of theta)
# If WCORR is nn:
SWCORR_SUBTYPE string Correlation sub-type:
nn_2d: 2D (angular) correlation
nn_3d: 3D correlation (NOT WELL TESTED)
nn_rp_pi: 3D correlation, separated in parallel and
tangential coordinates (NOT WELL TESTED)
# endif
SFORMAT string Format of the catalogue, see gal_cat.c. One
of "standard", "position", "position_jack_num",
"lensing_jack_num", "hamana", and "fits", see below
# If SFORMAT is fits:
NCOL integer Number of columns to be read from fits file
COL_NAMES NCOL strings Names of the NCOL columns. Each string has the format
"type:name", where type can be
x (x-coordinate), y (y-coordinate), z (z-coordinate),
e1 (first shear component), e2 (second shear component),
w (weight), njk (resample number, e.g. Jackknife sample
number). For example, for a shear input catalogue:
x:RA y:DEC e1:E1 e2:E2 w:WEIGHT
#endif
SCOORD_INPUT string Coordinate units of input catalogue(s). One of
"arcsec", "arcmin", "deg", "rad"
SCOORD_OUTPUT string Coordinate units of output file(s), i.e. correlation data.
One of "arcsec", "arcmin", "deg", "rad"
THMIN double Minimum angular separation, in unit as specified
by "SCOORD_OUTPUT"
THMAX double Maximum angular separation, in unit as specified
by "SCOORD_OUTPUT"
NTH integer Number of angular bins
BINTYPE string "LIN" or "LOG" for linear or logarithmic bins
RADEC integer 0 (Cartesian coordinates) or > 0 (spherical
coordinates)
OATH double Open angle [rad], the smaller the slower
but more precise is the result. OATH=0 is
equivalent to brute-force
SERROR string Error type, 'none', 'bootstrap' or 'jackknife'
NRESAMPLE int int Number of samples for error calculations, one number for
each catalogue. Ignored if SERROR = 'none'
===============
4. File formats
===============
--------------------------
4.1 Input catalogue format
--------------------------
The following formats are supported, which can be chosen by the "SFORMAT" key
in the config file. For all but "fits", the input catalogue is in ascii format.
SFORMAT Columns
---------------------------------
standard x y e1 e2 w
position x y [z ...] w
position_jack_num x y w njk
lensing_jack_num x y e1 e2 w njk
hamana x y X X e1 e2
fits Column types and names specified by COL_NAMES key
x, y [z ...]: position in n dimensions (n=2, 3)
njk: Number of Jackknife sample
e1, e2: ellipticity or shear (see 4.3.1 for the case of a scalar)
w: galaxy weight
X: unused, column is ignored
The position coordinate units have to be specified in the config file (key
"INPUT_COORD").
The first line of the catalogue can but need not contain the total number of
objects. If absent, the number of lines is counted before reading the catalogue
(which might be slow for long files).
--------------------
4.2 Multiple objects
--------------------
The input catalogue can contain multiple objects, that is, objects at the
identical positions. In that case, these objects are merged, which means that
the average properties (shear, weight etc.) will be used in the correlation.
(In practice, nodes which multiple objects are not spawned further into child
nodes, which ensures that parent node containing the average properties
represents the lowest level.)
The command line option '-t' switches off merging. In case of multiple objects,
the code will stop with an error message.
-------------------------
4.3 Output files (athena)
-------------------------
Diagnostic output is written to the file "log" and, in case the random number
generator is initialised, the random seed is written to the file "init". The
output correlation function files are in the same format as the input catalogue
(ascii or fits). The default names are:
shear-shear "xi"
(scalar-scalar: see 4.3.1)
shear-shear diagnostics "xiref" (with option '--out_xiref')
shear-position "wgl"
position-position "w"
In case of resampling (SERROR = "bootstrap" or "jackknife"), and ascii input
catalogues, files with resampling error (suffix ".resample") and resampling
covariance (suffix ".resample.cov") are created. In case of fits input and
output, the resampling errors and covariance are written as additional hdu's.
To display the fits output files in ascii format, use bin/fits2ascii.py .
The output file names can be set with command line options of 'athena'. The
file columns are as follows:
"xi"
1 theta Angular separation in SCOORD_OUTPUT units (bin center)
2 xi_p xi_+ = +
3 xi_m xi_- = -
4 xi_x xi_x =
5 w Total weight
6 sqrt_D Poisson error
7 sqrt_Dcor Corrected Poisson error (see below)
8 n_pair Number of pairs
"xiref" (with option '--out_xiref')
1 theta Angular separation in SCOORD_OUTPUT units (bin center)
2 g11
3 g22
4 g12
5 g21
6 w Total weight
7 sqrt(D) Poisson error
8 sqrt(Dcor) Corrected Poisson error (see below)
9 npair Number of pairs
"wgl"
1 theta Angular separation in SCOORD_OUTPUT units (bin center)
2 g_t Tangential shear
3 g_x Cross-shear
4 w Total weight
5 sqrt_D Poisson error
6 sqrt_Dcor Corrected Poisson error (see below)
7 n_pair Number of pairs.
"w"
1 theta Angular separation in SCOORD_OUTPUT units (bin center)
2 n_pair Number of pairs. For a single input catalogue, each
pair of galaxies is counted only once.
3 n_pair_resample_1 Number of pairs for re-sample #1
... ...
B+2 n_pair_resample_B Number of pairs for re-sample #B
The Poisson error sqrt(D) is the square root of the term D from eq. (27) of
Schneider, van Waerbeke, Kilbinger & Mellier (2002) for shear-shear, and using
an analogous expression for shear-position. The summation for the number of
pairs is carried out over the pairs of nodes that are correlated, and not over
all galaxy pairs. Therefore, the Poisson error depends on the open-angle
threshold (OATH in config file): it increases with increasing OATH. A
"corrected" Poisson error "sqrt(Dcor)" is therefore calculated, with is Dcor =
D / (Npair/Nnode), that is, in each bin, D is down-weighted by the ratio of the
number of galaxy pairs by the number of nodes.
----------------------------
4.3 Output files (pallas.py)
----------------------------
pallas.py reads an athena output file ('xi', 'wgl', or 'w', ascii or fits
format) and produces various output files according to the option '-w'. The
files with their default names are as follows, without extension ('txt' or
'fits'):
File Option ('-w') Description
------------------------------------------------------------------------------------
output_pkappa_ell Pl (biased) power spectrum
output_pkappa_band Pb (de-biased) band-power spectrum
output_map2_poly Mp aperture-mass dispersion with polynomial filter
output_map2_gauss Mg aperture-mass dispersion with Gaussian filter
------------------------------------------------------------------------------------
The columns in the output files are:
ell 2D wave number (centre in case of band-power)
P_E E-mode power spectrum
P_B B-mode power spectrum
P_EB (parity-violating) mixed E-/B-mode power spectrum
ell_lower lower band limit
ell_upper upper band limit
theta angular scale (same units as input correlation file)
E-mode aperture-mass dispersion
B-mode aperture-mass dispersion
(parity-violating) mixed E-/B-mode aperture-mass dispersion
-------------------------------
4.4 Scalar correlation function
-------------------------------
Instead of the shear-shear correlation function xi_p, one can easily obtain the
correlation function of a scalar k (e.g. kappa, magnification). Replace both e1
and e2 in the input catalogue with k/sqrt(2) and the 'xi_p' column in the file
'xi' will contain . The columns 'xi_m' and 'xi_x' are not defined in a
meaningful sense in that case.
---------------------
4.5 Jackknife formats
---------------------
Jackknife errors are calculated for SERROR = jackknife. This is supported for
all input formats (SFORMAT key). For the formats 'position_jack_num' and
'lensing_jack_num', the catalogue's format has to be 'Jackknife sample number',
where each galaxy comes with its Jackknife sample number, an integer ranging
from 0 to NRESAMPLE - 1. According to this number, the Jackknife samples are
created. For all other formats, no Jackknife sample number need to be present
in the input catalogue. In these cases, athena creates Jackknife samples
according to the number of resamples (NRESAMPLE key) and the field dimension.
For display purposes, athena prints to file the input catalogue together with
the list of Jackknife indices (0 or 1). This is the so-called 'Jackknife sample
list' format. An plot of galaxy positions with color-coded Jackknife sample can
be created with 'plot_cat_jack_list.pl'. To transform the two Jackknife
catalogue formats into each other, use 'jack2jack.pl'.
Jackknife sample numbers can be created from unique names via the script
jack_name2numer.pl . In the input catalogue, one column for each object has to
contain a string. The script replaces each string with a Jackknife sample
number. The string can for example be the pointing or chip name of a Mosaic
survey.
With the command line options:
--out_ALL_xip_resample NAME_XIP
--out_ALL_xim_resample NAME_XIM
athena puts out a file named NAME_XIP and or NAME_XIM, respectively.
These files contain:
1 theta Angular separation in SCOORD_OUTPUT units (bin center)
2 (xi+ or xi-)_1 xi+ or xi- for bootstrap sample #1
...
B+1 (xi+ or xi-)_B xi+ or xi- for bootstrap sample #B
==========
5. Weights
==========
For shear input catalogues, the galaxy weights are used to calculate the
barycenter b of each node. The ellipticities e_1, e_2 for the shear-correlation
are weighted by the galaxy weights. For the position-position correlation
(angular clustering), all galaxies have an equal weight (of unity), the code
does not calculate weighted angular correlation functions.
=============
6. Open angle
=============
If two nodes see each other under angles which are smaller than the open-angle
threshold (OATH in config file), the tree is not further descended and those
nodes are correlated directly. The mean, weighted galaxy ellipticities of the
both nodes are multiplied, the angular separation is the binned barycenter
distance. This is of course very time-saving, which is the great advantage of
a tree code over a brute-force approach. It introduces however errors in the
correlation function, in particular on large scales, because the galaxy
ellipticities are smeared out. One should play around with the value of OATH to
see its influence on the results. As a guideline, OATH = 0.03 (1.7 degrees) can
be used in most cases, although we recommend to run the code with OATH = 0.02
and compare the results.
========================================
7. Angular correlation function w(theta)
========================================
The calculation of the angular correlation function w(theta) requires the use
of random catalogues, to which the galaxy clustering is compared. Auto- and
cross- correlations are required for estimators of w(theta). All this is
implemented in the perl script
bin/woftheta_xcorr.pl
This script performs several steps. It creates a random catalogue with
(optionally) reading a mask file, by calling 'venice'. It makes coordinate
transformation of the catalogues, e.g. from spherical to Cartesian coordinates.
It calls 'athena' to obtain data-data, data-random and random-random spatial
correlations, for various redshift bins. Finally, the angular correlation
function is calculated using the Landy&Szalay (1993) and the Hamilton (1993)
estimators. Bootstrap and Poisson errors are added.
For N catalogues the auto- and cross-correlation functions are calculated.
Including all combinations of data and random catalogues, this makes N(N+1)/2 +
N + 1 = N^2/2 + 3N/2 + 1 calls to athena. These runs of athena can be performed
in parallel (use 'woftheta_xcorr.pl -p NPROC').
----------
7.1 Set-up
----------
To run the script, athena and venice have to be compiled (see Sect. 1). The
path to these programs is determined automatically at the top of
bin/woftheta_xcorr.pl. It is set to the directory 'bin' where
'woftheta_xcorr.pl' is located.
-----------
7.2 Running
-----------
When running woftheta_xcorr.pl without the flag "-n", files from a previous run
are deleted on query. If a file is not deleted, the corresponding task will
read this file instead of redo the calculation. This can be time-saving, e.g.
when one decides to increase the size of the random catalogue, the data-data
correlations do not have to be recalculated. Use the flag '-p NPROC' to run
the angular correlation program ('athena') in parallel with NPROC processes.
---------------
7.2 Config file
---------------
A configuration file is read by the script. See test_w/config_woftheta for an
example. All parameters for the tree code are set as well in this config file.
---------------
7.3 Input files
---------------
The input catalogue names have to be of the form
"path/data*"
where {i} is a list of indices specifying the catalogues. These can be
integers, names, redshift ranges, etc. The variables "path" and "data", and the
list of redshifts are read from the config file.
The input file format has to be ascii (fits is supported in athena, but not yet
in woftheta_xcorr.pl).
The columns of the position x, y, and (optional) a jackknife sample number
(integer) are specified in the config file. The coordinates are either
spherical [rad] or Cartesian [arcsec]. If the jackknife sample number is not
given and error = "jackknife", the code automatically creates jackknife samples
by dividing the data area into Nx times Ny squares. The closest integer number
Nx * Ny to nresample is chosen.
The input catalogues may contain comment lines, starting with "#".
The mask file (optional) has to be a ds9 region file containing polygons
indicating the masked (unused) region.
----------------
7.4 Output files
----------------
Apart from a number of log files, woftheta_xcorr.pl produces the following main
output files:
w.DiDj, w.DjR, w.DpjR, w.RR Binned number of pairs for data-data, data-random,
random-random. The first is obtained for all
redshift combinations i<=j. The files also contain
the resampled pairs. In case of two input
catalogues (config_woftheta:catbase2), the w.DpjR are
the pairs of data2-random1.
w_theta_i_j_{LS|Ham}.dat Binned angular correlation function for the Landy-
Szalay/Hamilton estimators, with Poisson and
resampled errors.
cov_i_j_{LS|Ham}.dat Resample covariance for the Landy-Szalay/Hamilton
estimators.
With option '-r':
w_theta_i_j_resample_{LS|Ham}.dat
All resampled correlation functions.
There is no double counting of identical pairs if the two catalogues are the
same, i.e. for w.DiDi and w.RR. These files contain therefore half the number
of total pairs. Note that two identical files with different file names are
treated as two different catalogues, and therefore, each pair is counted twice.
This might lead to unexpected results.
==============
8. Projections
==============
The scripts 'bin/cat2gal.pl' and bin/center_gal.pl' can be used to project an
input catalogue from spherical to Cartesian coordinates, and to produce an
output catalogue which is in athena-input format. Two projections are
implemented:
1. Gnomonic or 'tan' (recommended):
This projection is described in Calabretta&Greisen (2002). It is a zenithal
perspective projection of the sphere onto a plane using the sphere's center
as reference point. Great circles are transformed into straight lines.
2.'cos(delta_c)'
The spherical coordinates ra and dec (alpha, delta) are projected to (x, y)
as:
x = alpha * cos(delta_c)
y = delta
where delta_c is the barycenter of the galaxy input catalogue.
Note that only the position coordinates are projected, not the ellipticities for
a lensing catalogue.
============================================================
9. Convergence power spectrum and smooth 2nd-order functions
============================================================
The python script
bin/pallas.py
computes various shear second-order functions as integrals over the shear
two-point correlation function. These are given in Schneider, van Waerbeke,
Kilbinger & Mellier (2002), A&A 396, 1 (SvWKM02)
- Convergence band power spectrum, SvWKM02 (49).
- Convergence power spectrum (biased estimator if the 2PCF is given on a
finite angular interval), SvWKM02 (45).
- Aperture-mass dispersion for the Gaussian and polynomial filter, SvWKM02 (16).
Type 'pallas.py -h' for help. Python version 3.x is required.
The name 'pallas' stands for 'Power-spectrum for kAppa of eLL bAnd eStimator'.
(In greek mythology, Pallas was Athena's companion, and was accidentally killed
by her.)
====================
10. Testing the code
====================
The directories 'test_xi', 'test_w' and 'test_g' contain sample configuration
files and test data. The test data files have to be downloaded as a separate
tar ball from the athena page. Run 'athena' in 'test_xi' and 'test_g', and
'woftheta_xcorr.pl' in 'test_w'. The outcome can be checked by comparing with
the files in the subdirectories 'results'.
The script
bin/test_suite_athena.py
runs all predefined tests autoamtically and writes the results into a text file.
==============
11. Known bugs
==============
* If the input catalogue contains non-numerical/non-ascii signs the tree code
will most probably die with a segmentation fault.
* Very large catalogues causes memory allocation to fail during the tree
construction phase.
* For the angular cross-correlation between two catalogues (config_woftheta:catbase2),
if two random catalogues are used (config_woftheta:random2), projection onto Cartesian
coordinates can introduce unwanted correlations. This is because each random
catalogue is projected to the centre of the associated data catalogue, which can
be slightly different. To avoid this, we recommend to either not project but use
spherical coordinates (config_woftheta:project=none), or to provide properly
projected catalogues and not let woftheta_xcorr.pl perform the correlation.
* Constructing the tree for nodes at large absolute declination, near the poles,
can be significantly slower than for low-delta nodes, since the node-splitting does
not (yet) account for spherical coordinates, making it inefficient but most likely
not biased.
===================
12. Acknowledgments
===================
We thank Mike Jarvis and Jan Hartlap for sharing their respective tree codes,
and Pablo Fosalba, Eric Jullo, Barnaby Rowe, Peter Schneider, and Lorne
Whiteway for valuable discussions. Jonathan Benjamin is thanked for his
thorough testing, bug-hunting and documenting the flaws and features of athena.
His findings have been encorporated in versions >= 1.3 of athena. H.J.
McCracken is thanked for sharing the idea of fast bootstrap calculations. We
thank L. Miller for clarifications concerning angles on spheres. We thank
Matthieu Béthermin, Ami Choi, Jörg Dietrich, and Oliver Friedrich for pointing
out various bugs in the code.
===========
13. Contact
===========
Questions, feedback, criticism, bug reports, suggestions, actually anything
related to the code is very welcome at martin.kilbinger@cea.fr .
*