Code Block
language: bash
module load EasyBuild
wget https://raw.githubusercontent.com/easybuilders/easybuild-easyconfigs/master/easybuild/easyconfigs/m/Miniconda3/Miniconda3-4.5.12.eb
eb Miniconda3-4.5.12.eb
module avail
module load Miniconda3
conda --version

Submit a Multistage Job

The hpc_tut git repository serves as an example of many of the steps involved in submitting a more complex job. To begin, clone the repository into your home directory on the HPC.

Code Block
language: bash
git clone https://github.com/utecht/hpc_tut
cd hpc_tut

To provide a non-trivial amount of work, this example follows the Moving Pictures tutorial of the QIIME2 microbiome toolset. This requires installing and setting up QIIME2 on the HPC. To accomplish this we will use EasyBuild and a set of EasyBuild scripts from the EasyBuild online repository: https://github.com/easybuilders/easybuild-easyconfigs

Code Block
language: bash
cd dependencies
module load EasyBuild
eb Miniconda3-4.4.10.eb
eb QIIME2-2019.1.eb
cd ..

Now, while still on the login node, we can check that QIIME2 was installed properly. If loading the module fails, read the output of the module command for advice on how to proceed.

Code Block
language: bash
module avail
module load QIIME2
qiime --version
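
The same check can be wrapped in a small helper so that a missing tool produces a clear message. This is only a sketch: the function name require_command is made up for illustration and is not part of the hpc_tut repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on PATH.
require_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found; check the output of 'module load' for hints" >&2
        return 1
    fi
}
```

After `module load QIIME2`, running `require_command qiime` should print the path of the qiime executable, or return a non-zero status if the module did not load.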

Next we need to download the sample data for the tutorial and edit the step scripts so that they contain your email address and home directory path. This is all done by the initialize.sh script, which can also serve as a reference for downloading data from the internet or for find-and-replace across many files.

Code Block
language: bash
./initialize.sh
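
For reference, a find-and-replace across several files can be sketched with sed as shown here. This is not the actual content of initialize.sh: the placeholder tokens YOUR_EMAIL and YOUR_HOME and the function name fill_placeholders are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: substitute placeholder tokens in every *.script file
# of a directory. The tokens YOUR_EMAIL and YOUR_HOME are assumed names.
fill_placeholders() {
    local dir=$1 email=$2 home=$3
    local f
    for f in "$dir"/*.script; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        # "|" is the sed delimiter because the home path contains "/".
        # -i.bak edits in place with a backup, which is removed on success.
        sed -i.bak \
            -e "s|YOUR_EMAIL|${email}|g" \
            -e "s|YOUR_HOME|${home}|g" "$f" && rm -f "$f.bak"
    done
}
```

A call such as `fill_placeholders steps "you@example.com" "$HOME"` would then rewrite every matching file in the steps directory. Values containing `&` or `|` would need escaping before being passed to sed.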

Now that the data is in place and the module has been installed, it is time to submit the actual work to the job queue and wait for the emails to roll in. The Moving Pictures tutorial has roughly 20 QIIME2 commands to run, which produce various outputs. Some of these commands are computationally intensive, and others rely on the output of earlier commands before they can run. On the HPC there is a fixed overhead to starting a job on a node, so you will often want to chain many commands together in one job rather than submitting a separate job for each command. For the Moving Pictures tutorial I have broken the 20 commands into 3 sections: an initial, very CPU-heavy import/demultiplex/align job, followed by 2 low-CPU jobs that perform smaller calculations on the output of the first job. Looking at the top of import.script with the head command, you can see that it requests 1 node with 36 CPUs, while the other two scripts just take the queue default of a single CPU.

Code Block
language: bash
cd steps
head -n20 import.script
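
As a rough illustration of what such a header can look like, the sketch below assumes a Torque-style PBS scheduler. The job name, email address, and directive values are placeholders rather than the contents of the real import.script, and the exact resource syntax may differ on your cluster.

```bash
#!/bin/bash
# Hypothetical job header, assuming Torque-style PBS directives.
#PBS -N import_sketch
#PBS -l nodes=1:ppn=36
#PBS -M you@example.com
#PBS -m ae

# PBS starts jobs in the home directory; move to where qsub was invoked.
# The fallback to "." lets the script also run outside the scheduler.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Assumption: Torque exports PBS_NUM_PPN; default to 1 when it is unset.
threads=${PBS_NUM_PPN:-1}
echo "running with ${threads} thread(s)"
```

Lines beginning with `#PBS` are read by qsub as options and ignored by bash as comments, which is why the resource request can live at the top of the script itself.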

Each of these scripts could be queued just by running the qsub command. However, the other two scripts cannot run until the first job has finished. The qsub command takes a -W depend=afterok:<jobid> argument, which places a job on the queue with a hold until the job with that ID has exited successfully. The run_all.sh script shows how to schedule multiple jobs at once while capturing each job ID to feed into the later jobs that depend on it.

Code Block
language: bash
cat run_all.sh
./run_all.sh
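
The general pattern of capturing a job ID and passing it to dependent jobs can be sketched as follows. This is a minimal illustration, not a copy of run_all.sh: the function names and the QSUB override variable are assumptions, and it relies on qsub printing the new job ID on standard output.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of chaining jobs with a PBS-style qsub.

# Thin wrapper so the submit command can be swapped out for a dry run
# by setting QSUB to another command.
submit() {
    "${QSUB:-qsub}" "$@"
}

# Submit the first script, then submit every remaining script with a
# dependency on the successful exit of the first job.
submit_chain() {
    local first_script=$1
    shift
    local first_id
    first_id=$(submit "$first_script") || return 1
    echo "submitted ${first_script} as ${first_id}"
    local script
    for script in "$@"; do
        submit -W "depend=afterok:${first_id}" "$script" || return 1
    done
}
```

A call would look like `submit_chain import.script second.script third.script`, where the last two file names are placeholders for the two low-CPU scripts. Because both later jobs depend only on the first, the scheduler is free to run them side by side once the first has finished.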

Now you can sit back and wait for the emails to come pouring in. The first step takes roughly 10 minutes to complete and each of the later ones about half as long, although the two later jobs will run at the same time as each other. Use the commands from the "Monitoring Jobs" section above to watch their progress.

Once all of the jobs have finished, it is time to pack up your results and use scp to copy them back to your local machine, where you can analyze the various visualizations and result tables.

Code Block
language: bash
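cd ..
# Pack the results directory into a compressed archive
tar cfz results.tar.gz results/
# Log out of the HPC; the remaining commands run on your local machine
exit
scp <username>@login.hpc.uams.edu:hpc_tut/results.tar.gz .
tar xfz results.tar.gz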
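
Before copying a large archive, it can be worth confirming that it is readable. The helper below is an optional sketch; the function name pack_and_verify is made up for illustration and is not part of the tutorial repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a compressed archive and confirm it lists cleanly.
pack_and_verify() {
    local dir=$1 archive=$2
    tar czf "$archive" "$dir" || return 1
    # Listing reads the archive end to end, so a truncated or corrupt
    # file makes tar exit with an error here.
    tar tzf "$archive" >/dev/null || return 1
    echo "archive ${archive} is readable"
}
```

On the HPC this would be called as `pack_and_verify results results.tar.gz` from inside the hpc_tut directory, in place of the plain tar command.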