The Canadian Advanced Network for Astronomy Research (CANFAR) is a consortium that serves data-intensive storage, access, and processing needs of university groups and centres engaged in astronomy research.
To get started on CANFAR you will need to do the following:
- Request an account on CADC
- Send an email to CANFAR support with your CADC username, the resources you need and a few sentences explaining what you are working on.
The virtual machine (VM) is a space where we can install software under a given Linux distribution with given CPU, RAM and storage limits. Once we are happy with a given set-up we can freeze these conditions (i.e. all the software versions etc. currently installed) by creating a snapshot that acts like a container for the VM. Jobs can then be submitted through the batch system using a given snapshot.
Note: All processing (except very minor tests) should be done through the batch system and not run directly on the VM.
These are the recommended steps to follow in order to set up a VM on CANFAR:
Create a VM:
Follow the instructions on CANFAR quick start.
VMs can be managed on OpenStack.
Note: An IP address has to be assigned to the VM in order to be able to log in and there are a limited number of IPs per workspace.
SSH to VM:
Run the following to connect to a given VM:
This will connect you to a generic ubuntu user space, shared between all users. Once connected, you can install and test software.
Note: You should only connect to the VM with the intention of creating a new snapshot or running tests with the current set-up. Avoid making any software changes not intended for a new snapshot.
Note: The person who creates the VM will have to manually add the SSH keys of any other potential user.
Install the following tools:
sudo apt update
sudo apt install git
sudo apt install make
sudo apt install autoconf
sudo apt install libtool
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
bash
Install VOSPACE client:
pip install vos
The VOSPACE client is needed to transfer data to/from the VOSPACE.
Generate certificate to access VOSPACE:
cadc-get-cert -u USERNAME
When asked, enter your CADC password. A CADC certificate is also needed to transfer data to/from the VOSPACE.
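Certificates expire, so it can be useful to check the certificate before starting a transfer. The following is a minimal sketch, assuming the certificate is stored in `$HOME/.ssl/cadcproxy.pem` (the path used with `vcp` later in this guide) and that `openssl` is available on the VM. The function name `cert_is_valid` and the one-hour default are illustrative choices, not part of the CADC tools.

```shell
#!/bin/bash
# Sketch: check that a CADC proxy certificate exists and is still valid.
# cert_is_valid is a hypothetical helper; the default path is an assumption.

cert_is_valid() {
    local cert="${1:-$HOME/.ssl/cadcproxy.pem}"
    local min_seconds="${2:-3600}"   # required remaining lifetime (assumed default: 1 hour)

    # A missing or unreadable file cannot be a valid certificate.
    [ -r "$cert" ] || return 1

    # openssl exits with 0 only if the certificate does not expire
    # within the next min_seconds seconds.
    openssl x509 -in "$cert" -noout -checkend "$min_seconds" > /dev/null 2>&1
}

if cert_is_valid; then
    echo "certificate OK"
else
    echo "certificate missing or expiring soon: run cadc-get-cert -u USERNAME"
fi
```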
Create a snapshot of the VM status:
On OpenStack under "Instances" click the "Create Snapshot" button for the corresponding VM. Be sure to follow the snapshot naming scheme defined for the VM above.
Update VM and create a new snapshot:
The VM set-up only needs to be done once. Afterwards, the VM can simply be modified for new snapshots, e.g. pull the latest changes to a given software package, install any new dependencies, and then create a new snapshot as described above.
The batch system is a server where jobs can be submitted to the CANFAR cluster using a previously defined snapshot.
Note: You will have to request access to the batch system before you can connect.
SSH to batch system:
You can connect to the batch system as follows:
You will be connected to a personal user space.
Source OpenStack environment variables, e.g. for the lensing project:
When asked, enter your CADC password. This is a necessary step before submitting jobs.
Create a bash script, for example:
The bash script defines the command lines to be run on the snapshot. The following example script demonstrates how to:
- activate the ShapePipe environment,
- create an output directory,
- copy a configuration file to the snapshot from the VOSPACE,
- run ShapePipe,
- and copy the output back to the VOSPACE
#!/bin/bash
export VM_HOME=/home/ubuntu
source $VM_HOME/miniconda3/bin/activate <MY_CONDA_ENV>
mkdir output
<MY_SCRIPT>.py -o output
vcp --certfile=$VM_HOME/.ssl/cadcproxy.pem output vos:cfis/cosmostat/<USERNAME>
Note: The default path for a snapshot is not the /home/ubuntu directory, hence the definition of the VM_HOME variable in the script.
Create a job file, for example:
The job file defines the script to be run (i.e. the bash script previously defined), the corresponding outputs and the computational requirements for the job.
executable = <MY_SCRIPT>.bash
output = <MY_SCRIPT>.out
error = <MY_SCRIPT>.err
log = <MY_SCRIPT>.log
# Make sure the requested resources do not exceed what was
# specified for the VM
request_cpus = 1
request_memory = 8G
request_disk = 10G
queue
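When several scripts need job files, writing them by hand gets repetitive. The sketch below generates a job file with the same fields as the example above. The helper name `write_job_file`, the `.job` extension and the default resource values are assumptions made for illustration; they are not CANFAR conventions.

```shell
#!/bin/bash
# Sketch: write an HTCondor job file for a given bash script.
# write_job_file is a hypothetical helper, not a CANFAR tool.

write_job_file() {
    local script="$1"              # e.g. my_run.bash
    local cpus="${2:-1}"           # defaults mirror the example job file
    local memory="${3:-8G}"
    local disk="${4:-10G}"
    local base="${script%.bash}"   # strip the extension to name the outputs

    cat > "${base}.job" <<EOF
executable     = ${script}
output         = ${base}.out
error          = ${base}.err
log            = ${base}.log
request_cpus   = ${cpus}
request_memory = ${memory}
request_disk   = ${disk}
queue
EOF
    # Print the name of the file that was written.
    echo "${base}.job"
}

# Usage: write_job_file my_run.bash 2 16G 20G
```

As with a hand-written job file, the requested resources should not exceed what was specified for the VM.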
Submit a job:
Jobs are submitted using the canfar_submit command followed by the previously defined job file, the name of the desired snapshot and the flavour of the corresponding VM.
canfar_submit JOB_FILE SNAP_SHOT FLAVOUR
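A job that is submitted with a broken job file can sit in the queue without an obvious error, so a quick local check before submitting may save time. The following sketch only inspects the job file; it does not contact the batch system. The helper `check_job_file` is hypothetical, and it assumes that keywords are written in lower case and that the executable path is relative to the current directory.

```shell
#!/bin/bash
# Sketch: sanity-check a job file before handing it to canfar_submit.
# check_job_file is a hypothetical helper; it only reads local files.

check_job_file() {
    local job_file="$1"
    local exe

    [ -f "$job_file" ] || { echo "missing job file: $job_file" >&2; return 1; }

    # Extract the value of the first "executable = ..." line.
    exe="$(sed -n 's/^[[:space:]]*executable[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$job_file" | head -n 1)"
    [ -n "$exe" ] || { echo "no executable line in $job_file" >&2; return 1; }
    [ -f "$exe" ] || { echo "executable not found: $exe" >&2; return 1; }

    # Without a queue statement nothing is submitted.
    grep -Eq '^[[:space:]]*queue([[:space:]]|$)' "$job_file" \
        || { echo "no queue statement in $job_file" >&2; return 1; }
}

# Usage: check_job_file JOB_FILE && canfar_submit JOB_FILE SNAP_SHOT FLAVOUR
```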
This command tells you running, idle, and held jobs for you and other users.
Information for your own jobs only:
From there you can get the job ID, which lets you examine your job more closely:
condor_q -better-analyse <ID>
You can ssh to the VM that is (or will be) running your job to check on it:
condor_ssh_to_job -auto-retry <ID>
For multi-job submissions, the job ID has subnumbers, e.g. 1883.0-9. You can ssh to each of those VMs with, e.g.:
condor_ssh_to_job -auto-retry 1883.6
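To inspect every job of a multi-job submission in turn, the range notation can be expanded into individual IDs. This is a small sketch; `expand_job_ids` is a hypothetical helper, not an HTCondor command, and it assumes the input has the form `CLUSTER.FIRST-LAST` or `CLUSTER.N`.

```shell
#!/bin/bash
# Sketch: expand a job ID range such as 1883.0-9 into 1883.0 ... 1883.9.
# expand_job_ids is a hypothetical helper, not an HTCondor command.

expand_job_ids() {
    local spec="$1"
    local cluster="${spec%%.*}"   # part before the dot, e.g. 1883
    local range="${spec#*.}"      # part after the dot, e.g. 0-9 or 6
    local first="${range%-*}"     # lower bound (the whole value if there is no dash)
    local last="${range#*-}"      # upper bound (the whole value if there is no dash)
    local i="$first"

    while [ "$i" -le "$last" ]; do
        echo "${cluster}.${i}"
        i=$((i + 1))
    done
}

# Usage:
#   for id in $(expand_job_ids 1883.0-9); do
#       condor_q -better-analyse "$id"
#   done
```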
If the above condor commands do not help, try:
to check the status of all VMs.
Sometimes a snapshot image is not (yet) active and shared, since its creation can take a long time. Check the status with:
openstack image show -c visibility -c status <SnapShotName>
When status = active, the job can be started. The field visibility has the value private before first use, which afterwards changes to shared.
In general, a job should start within 5 to 10 minutes. This time will increase if the queue is full. If the job is launched before the snapshot status is active, it might be stuck in the queue for a long time, possibly indefinitely.
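To avoid submitting too early, the status check can be wrapped in a polling loop. The sketch below assumes the OpenStack environment variables have already been sourced. The function names and the default of 30 checks at 60-second intervals are illustrative assumptions; `-f value` asks the `openstack` client to print the bare field value instead of a table.

```shell
#!/bin/bash
# Sketch: poll until a snapshot image reports status "active".
# image_status and wait_for_active are hypothetical helpers.

# Print the current status of an image as a bare value.
image_status() {
    openstack image show -c status -f value "$1"
}

wait_for_active() {
    local snapshot="$1"
    local max_tries="${2:-30}"   # assumed default: 30 checks
    local delay="${3:-60}"       # assumed default: 60 s between checks
    local try=1
    local status

    while [ "$try" -le "$max_tries" ]; do
        status="$(image_status "$snapshot")"
        if [ "$status" = "active" ]; then
            return 0
        fi
        # Report progress on stderr, then wait before the next check.
        echo "[$try/$max_tries] $snapshot: status=${status:-unknown}" >&2
        sleep "$delay"
        try=$((try + 1))
    done
    return 1   # the snapshot never became active
}

# Usage: wait_for_active SNAP_SHOT && canfar_submit JOB_FILE SNAP_SHOT FLAVOUR
```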
Contact Seb on the CANFAR Slack channel. He usually replies quickly, and some issues can only be fixed by him.