================================================ athena Version 1.7 (03/2014) Tree code for two-point correlation functions power spectra, and smooth second-order functions Authors: Martin Kilbinger Christopher Bonnett Jean Coupon 2007-2014 ================================================ ===================== Table of content: 1. Compiling the code 2. Running the code 3. Configuration file 4. File formats 5. Weights 6. Open angle 7. Angular correlation function w(theta) 8. Projections 9. Convergence power spectrum and smooth 2nd-order functions 10. Testing the code 11. Known bugs 12. Acknowledgments 13. Contact ===================== ===================== 1. Compiling the code ===================== Download the code, and (optional) the test data, from http://cosmostat.org/athena (see also http://ascl.net/athena), and un-tar it. Two options to compile the code exists: 1) Using cmake to create a Makefile and to find libraries e.g. cfitsio. Compile the code in a build directory. E.g. in the athena root directory athena_1.7, type cd build cmake .. make && make install 'make install' will just copy the executables athena and venice to ../bin. 2) Using make. cd src make -f Makefile.athena No external packages or libraries are compulsory. However, venice (see Sect. 7.1), needed to obtain random catalogues for spatial correlation functions, requires GSL (GNU Scientific Library, www.gls.org). Further, for (optional) FITS file format support, the cfitsio library is used. cmake automatically finds those libraries. For option 2, edit the variables 'GSL', 'FITS', and 'LIBFITS' in src/Makefile.athena . =================== 2. Running the code =================== The command athena_1.7/bin/athena [-c config_file] runs the tree code. A configuration file has to be present, see Sect. 3. Type 'athena -h' for command line options. Test data sets and config files can be found in the directories test_xi (shear-shear), test_g (shear-position), and test_w (position-position), all subdirectories of test. See Sect. 7 to calculate the full (position-position) angular correlation function, including the generation and use of random catalogues. A test script exists that runs athena and pallas.py, which produces output that can be compared to reference output (in test/test_*/results). To run the tests, type in athena_1.7: bin/test_suite_athena.py ===================== 3. Configuration file ===================== The configuration file (first command line argument; default "config_tree") has to contain the following entries, in this order: GALCAT1 string First galaxy catalogue, ascii or fits format. For shear-position, this is the background (shear) catalogue GALCAT2 string Second catalogue (for cross-correlation. Can be same as GALCAT1 or "-" for auto-correlation). Same file format as first catalogue. For shear-position, this is the foreground (position) catalogue WCORR integer Type of correlation: 1: shear-shear (aka cosmic shear) 2: shear-position (aka galaxy-lensing) 4: position-position (aka w of theta) # If WCORR is nn: SWCORR_SUBTYPE string Correlation sub-type: nn_2d: 2D (angular) correlation nn_3d: 3D correlation (NOT WELL TESTED) nn_rp_pi: 3D correlation, separated in parallel and tangential coordinates (NOT WELL TESTED) # endif SFORMAT string Format of the catalogue, see gal_cat.c. One of "standard", "position", "position_jack_num", "lensing_jack_num", "hamana", and "fits", see below # If SFORMAT is fits: NCOL integer Number of columns to be read from fits file COL_NAMES NCOL strings Names of the NCOL columns. Each string has the format "type:name", where type can be x (x-coordinate), y (y-coordinate), z (z-coordinate), e1 (first shear component), e2 (second shear component), w (weight), njk (resample number, e.g. Jackknife sample number). For example, for a shear input catalogue: x:RA y:DEC e1:E1 e2:E2 w:WEIGHT #endif SCOORD_INPUT string Coordinate units of input catalogue(s). One of "arcsec", "arcmin", "deg", "rad" SCOORD_OUTPUT string Coordinate units of output file(s), i.e. correlation data. One of "arcsec", "arcmin", "deg", "rad" THMIN double Minimum angular separation, in unit as specified by "SCOORD_OUTPUT" THMAX double Maximum angular separation, in unit as specified by "SCOORD_OUTPUT" NTH integer Number of angular bins BINTYPE string "LIN" or "LOG" for linear or logarithmic bins RADEC integer 0 (Cartesian coordinates) or > 0 (spherical coordinates) OATH double Open angle [rad], the smaller the slower but more precise is the result. OATH=0 is equivalent to brute-force SERROR string Error type, 'none', 'bootstrap' or 'jackknife' NRESAMPLE int int Number of samples for error calculations, one number for each catalogue. Ignored if SERROR = 'none' =============== 4. File formats =============== -------------------------- 4.1 Input catalogue format -------------------------- The following formats are supported, which can be chosen by the "SFORMAT" key in the config file. For all but "fits", the input catalogue is in ascii format. SFORMAT Columns --------------------------------- standard x y e1 e2 w position x y [z ...] w position_jack_num x y w njk lensing_jack_num x y e1 e2 w njk hamana x y X X e1 e2 fits Column types and names specified by COL_NAMES key x, y [z ...]: position in n dimensions (n=2, 3) njk: Number of Jackknife sample e1, e2: ellipticity or shear (see 4.3.1 for the case of a scalar) w: galaxy weight X: unused, column is ignored The position coordinate units have to be specified in the config file (key "INPUT_COORD"). The first line of the catalogue can but need not contain the total number of objects. If absent, the number of lines is counted before reading the catalogue (which might be slow for long files). -------------------- 4.2 Multiple objects -------------------- The input catalogue can contain multiple objects, that is, objects at the identical positions. In that case, these objects are merged, which means that the average properties (shear, weight etc.) will be used in the correlation. (In practice, nodes which multiple objects are not spawned further into child nodes, which ensures that parent node containing the average properties represents the lowest level.) The command line option '-t' switches off merging. In case of multiple objects, the code will stop with an error message. ------------------------- 4.3 Output files (athena) ------------------------- Diagnostic output is written to the file "log" and, in case the random number generator is initialised, the random seed is written to the file "init". The output correlation function files are in the same format as the input catalogue (ascii or fits). The default names are: shear-shear "xi" (scalar-scalar: see 4.3.1) shear-shear diagnostics "xiref" (with option '--out_xiref') shear-position "wgl" position-position "w" In case of resampling (SERROR = "bootstrap" or "jackknife"), and ascii input catalogues, files with resampling error (suffix ".resample") and resampling covariance (suffix ".resample.cov") are created. In case of fits input and output, the resampling errors and covariance are written as additional hdu's. To display the fits output files in ascii format, use bin/fits2ascii.py . The output file names can be set with command line options of 'athena'. The file columns are as follows: "xi" 1 theta Angular separation in SCOORD_OUTPUT units (bin center) 2 xi_p xi_+ = + 3 xi_m xi_- = - 4 xi_x xi_x = 5 w Total weight 6 sqrt_D Poisson error 7 sqrt_Dcor Corrected Poisson error (see below) 8 n_pair Number of pairs "xiref" (with option '--out_xiref') 1 theta Angular separation in SCOORD_OUTPUT units (bin center) 2 g11 3 g22 4 g12 5 g21 6 w Total weight 7 sqrt(D) Poisson error 8 sqrt(Dcor) Corrected Poisson error (see below) 9 npair Number of pairs "wgl" 1 theta Angular separation in SCOORD_OUTPUT units (bin center) 2 g_t Tangential shear 3 g_x Cross-shear 4 w Total weight 5 sqrt_D Poisson error 6 sqrt_Dcor Corrected Poisson error (see below) 7 n_pair Number of pairs. "w" 1 theta Angular separation in SCOORD_OUTPUT units (bin center) 2 n_pair Number of pairs. For a single input catalogue, each pair of galaxies is counted only once. 3 n_pair_resample_1 Number of pairs for re-sample #1 ... ... B+2 n_pair_resample_B Number of pairs for re-sample #B The Poisson error sqrt(D) is the square root of the term D from eq. (27) of Schneider, van Waerbeke, Kilbinger & Mellier (2002) for shear-shear, and using an analogous expression for shear-position. The summation for the number of pairs is carried out over the pairs of nodes that are correlated, and not over all galaxy pairs. Therefore, the Poisson error depends on the open-angle threshold (OATH in config file): it increases with increasing OATH. A "corrected" Poisson error "sqrt(Dcor)" is therefore calculated, with is Dcor = D / (Npair/Nnode), that is, in each bin, D is down-weighted by the ratio of the number of galaxy pairs by the number of nodes. ---------------------------- 4.3 Output files (pallas.py) ---------------------------- pallas.py reads an athena output file ('xi', 'wgl', or 'w', ascii or fits format) and produces various output files according to the option '-w'. The files with their default names are as follows, without extension ('txt' or 'fits'): File Option ('-w') Description ------------------------------------------------------------------------------------ output_pkappa_ell Pl (biased) power spectrum output_pkappa_band Pb (de-biased) band-power spectrum output_map2_poly Mp aperture-mass dispersion with polynomial filter output_map2_gauss Mg aperture-mass dispersion with Gaussian filter ------------------------------------------------------------------------------------ The columns in the output files are: ell 2D wave number (centre in case of band-power) P_E E-mode power spectrum P_B B-mode power spectrum P_EB (parity-violating) mixed E-/B-mode power spectrum ell_lower lower band limit ell_upper upper band limit theta angular scale (same units as input correlation file) E-mode aperture-mass dispersion B-mode aperture-mass dispersion (parity-violating) mixed E-/B-mode aperture-mass dispersion ------------------------------- 4.4 Scalar correlation function ------------------------------- Instead of the shear-shear correlation function xi_p, one can easily obtain the correlation function of a scalar k (e.g. kappa, magnification). Replace both e1 and e2 in the input catalogue with k/sqrt(2) and the 'xi_p' column in the file 'xi' will contain . The columns 'xi_m' and 'xi_x' are not defined in a meaningful sense in that case. --------------------- 4.5 Jackknife formats --------------------- Jackknife errors are calculated for SERROR = jackknife. This is supported for all input formats (SFORMAT key). For the formats 'position_jack_num' and 'lensing_jack_num', the catalogue's format has to be 'Jackknife sample number', where each galaxy comes with its Jackknife sample number, an integer ranging from 0 to NRESAMPLE - 1. According to this number, the Jackknife samples are created. For all other formats, no Jackknife sample number need to be present in the input catalogue. In these cases, athena creates Jackknife samples according to the number of resamples (NRESAMPLE key) and the field dimension. For display purposes, athena prints to file the input catalogue together with the list of Jackknife indices (0 or 1). This is the so-called 'Jackknife sample list' format. An plot of galaxy positions with color-coded Jackknife sample can be created with 'plot_cat_jack_list.pl'. To transform the two Jackknife catalogue formats into each other, use 'jack2jack.pl'. Jackknife sample numbers can be created from unique names via the script jack_name2numer.pl . In the input catalogue, one column for each object has to contain a string. The script replaces each string with a Jackknife sample number. The string can for example be the pointing or chip name of a Mosaic survey. With the command line options: --out_ALL_xip_resample NAME_XIP --out_ALL_xim_resample NAME_XIM athena puts out a file named NAME_XIP and or NAME_XIM, respectively. These files contain: 1 theta Angular separation in SCOORD_OUTPUT units (bin center) 2 (xi+ or xi-)_1 xi+ or xi- for bootstrap sample #1 ... B+1 (xi+ or xi-)_B xi+ or xi- for bootstrap sample #B ========== 5. Weights ========== For shear input catalogues, the galaxy weights are used to calculate the barycenter b of each node. The ellipticities e_1, e_2 for the shear-correlation are weighted by the galaxy weights. For the position-position correlation (angular clustering), all galaxies have an equal weight (of unity), the code does not calculate weighted angular correlation functions. ============= 6. Open angle ============= If two nodes see each other under angles which are smaller than the open-angle threshold (OATH in config file), the tree is not further descended and those nodes are correlated directly. The mean, weighted galaxy ellipticities of the both nodes are multiplied, the angular separation is the binned barycenter distance. This is of course very time-saving, which is the great advantage of a tree code over a brute-force approach. It introduces however errors in the correlation function, in particular on large scales, because the galaxy ellipticities are smeared out. One should play around with the value of OATH to see its influence on the results. As a guideline, OATH = 0.03 (1.7 degrees) can be used in most cases, although we recommend to run the code with OATH = 0.02 and compare the results. ======================================== 7. Angular correlation function w(theta) ======================================== The calculation of the angular correlation function w(theta) requires the use of random catalogues, to which the galaxy clustering is compared. Auto- and cross- correlations are required for estimators of w(theta). All this is implemented in the perl script bin/woftheta_xcorr.pl This script performs several steps. It creates a random catalogue with (optionally) reading a mask file, by calling 'venice'. It makes coordinate transformation of the catalogues, e.g. from spherical to Cartesian coordinates. It calls 'athena' to obtain data-data, data-random and random-random spatial correlations, for various redshift bins. Finally, the angular correlation function is calculated using the Landy&Szalay (1993) and the Hamilton (1993) estimators. Bootstrap and Poisson errors are added. For N catalogues the auto- and cross-correlation functions are calculated. Including all combinations of data and random catalogues, this makes N(N+1)/2 + N + 1 = N^2/2 + 3N/2 + 1 calls to athena. These runs of athena can be performed in parallel (use 'woftheta_xcorr.pl -p NPROC'). ---------- 7.1 Set-up ---------- To run the script, athena and venice have to be compiled (see Sect. 1). The path to these programs is determined automatically at the top of bin/woftheta_xcorr.pl. It is set to the directory 'bin' where 'woftheta_xcorr.pl' is located. ----------- 7.2 Running ----------- When running woftheta_xcorr.pl without the flag "-n", files from a previous run are deleted on query. If a file is not deleted, the corresponding task will read this file instead of redo the calculation. This can be time-saving, e.g. when one decides to increase the size of the random catalogue, the data-data correlations do not have to be recalculated. Use the flag '-p NPROC' to run the angular correlation program ('athena') in parallel with NPROC processes. --------------- 7.2 Config file --------------- A configuration file is read by the script. See test_w/config_woftheta for an example. All parameters for the tree code are set as well in this config file. --------------- 7.3 Input files --------------- The input catalogue names have to be of the form "path/data" where {i} is a list of indices specifying the catalogues. These can be integers, names, redshift ranges, etc. The variables "path" and "data", and the list of redshifts are read from the config file. The input file format has to be ascii (fits is supported in athena, but not yet in woftheta_xcorr.pl). The columns of the position x, y, and (optional) a jackknife sample number (integer) are specified in the config file. The coordinates are either spherical [rad] or Cartesian [arcsec]. If the jackknife sample number is not given and error = "jackknife", the code automatically creates jackknife samples by dividing the data area into Nx times Ny squares. The closest integer number Nx * Ny to nresample is chosen. The input catalogues may contain comment lines, starting with "#". The mask file (optional) has to be a ds9 region file containing polygons indicating the masked (unused) region. ---------------- 7.4 Output files ---------------- Apart from a number of log files, woftheta_xcorr.pl produces the following main output files: w.DiDj, w.DjR, w.DpjR, w.RR Binned number of pairs for data-data, data-random, random-random. The first is obtained for all redshift combinations i<=j. The files also contain the resampled pairs. In case of two input catalogues (config_woftheta:catbase2), the w.DpjR are the pairs of data2-random1. w_theta_i_j_{LS|Ham}.dat Binned angular correlation function for the Landy- Szalay/Hamilton estimators, with Poisson and resampled errors. cov_i_j_{LS|Ham}.dat Resample covariance for the Landy-Szalay/Hamilton estimators. With option '-r': w_theta_i_j_resample_{LS|Ham}.dat All resampled correlation functions. There is no double counting of identical pairs if the two catalogues are the same, i.e. for w.DiDi and w.RR. These files contain therefore half the number of total pairs. Note that two identical files with different file names are treated as two different catalogues, and therefore, each pair is counted twice. This might lead to unexpected results. ============== 8. Projections ============== The scripts 'bin/cat2gal.pl' and bin/center_gal.pl' can be used to project an input catalogue from spherical to Cartesian coordinates, and to produce an output catalogue which is in athena-input format. Two projections are implemented: 1. Gnomonic or 'tan' (recommended): This projection is described in Calabretta&Greisen (2002). It is a zenithal perspective projection of the sphere onto a plane using the sphere's center as reference point. Great circles are transformed into straight lines. 2.'cos(delta_c)' The spherical coordinates ra and dec (alpha, delta) are projected to (x, y) as: x = alpha * cos(delta_c) y = delta where delta_c is the barycenter of the galaxy input catalogue. Note that only the position coordinates are projected, not the ellipticities for a lensing catalogue. ============================================================ 9. Convergence power spectrum and smooth 2nd-order functions ============================================================ The python script bin/pallas.py computes various shear second-order functions as integrals over the shear two-point correlation function. These are given in Schneider, van Waerbeke, Kilbinger & Mellier (2002), A&A 396, 1 (SvWKM02) - Convergence band power spectrum, SvWKM02 (49). - Convergence power spectrum (biased estimator if the 2PCF is given on a finite angular interval), SvWKM02 (45). - Aperture-mass dispersion for the Gaussian and polynomial filter, SvWKM02 (16). Type 'pallas.py -h' for help. Python version 3.x is required. The name 'pallas' stands for 'Power-spectrum for kAppa of eLL bAnd eStimator'. (In greek mythology, Pallas was Athena's companion, and was accidentally killed by her.) ==================== 10. Testing the code ==================== The directories 'test_xi', 'test_w' and 'test_g' contain sample configuration files and test data. The test data files have to be downloaded as a separate tar ball from the athena page. Run 'athena' in 'test_xi' and 'test_g', and 'woftheta_xcorr.pl' in 'test_w'. The outcome can be checked by comparing with the files in the subdirectories 'results'. The script bin/test_suite_athena.py runs all predefined tests autoamtically and writes the results into a text file. ============== 11. Known bugs ============== * If the input catalogue contains non-numerical/non-ascii signs the tree code will most probably die with a segmentation fault. * Very large catalogues causes memory allocation to fail during the tree construction phase. * For the angular cross-correlation between two catalogues (config_woftheta:catbase2), if two random catalogues are used (config_woftheta:random2), projection onto Cartesian coordinates can introduce unwanted correlations. This is because each random catalogue is projected to the centre of the associated data catalogue, which can be slightly different. To avoid this, we recommend to either not project but use spherical coordinates (config_woftheta:project=none), or to provide properly projected catalogues and not let woftheta_xcorr.pl perform the correlation. * Constructing the tree for nodes at large absolute declination, near the poles, can be significantly slower than for low-delta nodes, since the node-splitting does not (yet) account for spherical coordinates, making it inefficient but most likely not biased. =================== 12. Acknowledgments =================== We thank Mike Jarvis and Jan Hartlap for sharing their respective tree codes, and Pablo Fosalba, Eric Jullo, Barnaby Rowe, Peter Schneider, and Lorne Whiteway for valuable discussions. Jonathan Benjamin is thanked for his thorough testing, bug-hunting and documenting the flaws and features of athena. His findings have been encorporated in versions >= 1.3 of athena. H.J. McCracken is thanked for sharing the idea of fast bootstrap calculations. We thank L. Miller for clarifications concerning angles on spheres. We thank Matthieu Béthermin, Ami Choi, Jörg Dietrich, and Oliver Friedrich for pointing out various bugs in the code. =========== 13. Contact =========== Questions, feedback, criticism, bug reports, suggestions, actually anything related to the code is very welcome at martin.kilbinger@cea.fr .