Logging In
Code Block |
---|
|
ssh <username>@login.hpc.uams.edu |
...
You are now on the HPC login node. From here you can stage your data and jobs to be submitted to the computational nodes in the cluster. You can view the current load of the overall system from the login node with the showq command.
Submit a Simple Job
While the login node is a relatively powerful server, it should not be used to do any actual work, as that could impede others' ability to use the system. We use Slurm to manage jobs and resources on the cluster. The srun and sbatch programs will be your primary interface for submitting jobs to the cluster. In its simplest form, you can give srun a command and it will schedule and run it as a job. Here we will schedule a single command, lscpu, to run using all of the defaults.
Code Block |
---|
|
srun lscpu |
The output from this job will print directly to your terminal. This can be useful for very simple commands or testing; however, normally you will submit more complex jobs as a batch file.
Submit a Scripted Job
The sbatch program takes many arguments to control where the job will be scheduled, and it can be given a script of commands and arguments to run instead of a single command on the command line. We will now create a script that contains both the arguments and the actual commands to be run.
...
Code Block |
---|
|
#!/bin/bash
#SBATCH --mail-user=<YOUR_EMAIL>@uams.edu #<---- Email address to notify
#SBATCH --mail-type=ALL #<---- Events that trigger an email (ALL includes begin, end, and fail)
#SBATCH --job-name=CPUinfo #<---- Name of this job
#<---- Commands below this point will be run on the assigned node
echo "Hello HPC"
lscpu
echo "Goodbye HPC" |
Once this script is created, it can be run by passing it to the sbatch program. After the job has finished, there will be a file named slurm-#####.out in the directory you submitted it from (your home directory here), which will contain the output.
Code Block |
---|
|
sbatch cpuinfo.script |
When submitting a script you can also pass arguments on the command line to sbatch. Here we submit the lscpu script again, except this time we ask for a node with a Xeon processor. Compare the outputs of the two jobs, or experiment with different constraints that can be requested.
Code Block |
---|
|
sbatch --constraint=xeon cpuinfo.script |
Monitoring Jobs
The jobs so far have been quick to run, but you will often want to monitor longer-running jobs. Remember that the showq program will display the state of the entire cluster. There are many other programs which can help you monitor your own jobs.
This program will display your current and recent jobs submitted to the cluster. The S column contains the current status of your jobs.
This option will print the full status of current jobs and is useful for finding the exec_host of a running job. Knowing the host will allow you to peek in a few ways at what the node is currently doing.
...
Code Block |
---|
|
pdsh -w <nodename> free -h
pdsh -w <nodename> uptime
pdsh -w <nodename> top -b -n1 |
Installing Software
The HPC has some software packages already installed; however, they will need to be activated using Lmod. You can browse available modules or search for them and see descriptions with these commands.
...
One of the most useful modules is EasyBuild. This is a build and installation framework designed for HPCs. Many scientific tool sets can be installed using it. Once they are installed, they can be activated using the module commands above. However, EasyBuild always has to be loaded first, before anything installed with it can be loaded. The module spider <search> command will explain this if you forget.
Code Block |
---|
|
module load EasyBuild
wget https://raw.githubusercontent.com/easybuilders/easybuild-easyconfigs/master/easybuild/easyconfigs/m/Miniconda3/Miniconda3-4.5.12.eb
eb Miniconda3-4.5.12.eb
module avail
module load Miniconda3
conda --version |
Submit a Multistage Job
This git repository serves as an example of many of the steps involved in submitting a more complex job. To begin, clone this git repository into your home directory on the HPC.
Code Block |
---|
|
git clone https://github.com/utecht/hpc_tut
cd hpc_tut |
To find a non-trivial amount of work to do, this example will follow the Moving Pictures tutorial of the QIIME2 microbiome toolset. This will require installing and setting up the QIIME2 tool on the HPC. To accomplish this we will use EasyBuild and a set of EasyBuild scripts found in the EasyBuild online repository: https://github.com/easybuilders/easybuild-easyconfigs
Code Block |
---|
|
cd dependencies
module load EasyBuild
eb Miniconda3-4.4.10.eb
eb QIIME2-2019.1.eb
cd .. |
Now, while on the login node, we can check that the QIIME2 tool has installed properly. If loading the module fails, read the output from module for advice on how to proceed.
Code Block |
---|
|
module avail
module load QIIME2
qiime --version |
Next we need to download the sample data for the tutorial and edit the step scripts to have your email and home path. This is all done with the initialize.sh script, which can serve as a reference for downloading data from the internet or mass find and replace of files.
Code Block |
---|
|
./initialize.sh |
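As a rough illustration of the mass find-and-replace idea mentioned above, the sketch below substitutes an email address and a home path into a set of script files. It is not the contents of initialize.sh: the placeholder tokens (`__EMAIL__`, `__HOME__`), the file names, and the values are all hypothetical, and it works in a scratch directory so no real files are touched.

```shell
#!/bin/bash
# Sketch of a mass find-and-replace across job scripts.
# Placeholder tokens, file names, and values are hypothetical,
# not taken from initialize.sh.

workdir=$(mktemp -d)    # scratch directory so nothing real is modified
printf '#SBATCH --mail-user=__EMAIL__\ncd __HOME__/hpc_tut\n' > "$workdir/a.script"
printf '#SBATCH --mail-user=__EMAIL__\n' > "$workdir/b.script"

email="user@example.edu"    # values to substitute in
home_path="/home/user"

for f in "$workdir"/*.script; do
    # -i.bak edits in place and keeps a backup copy;
    # '|' is used as the delimiter so '/' in paths needs no escaping
    sed -i.bak -e "s|__EMAIL__|$email|g" -e "s|__HOME__|$home_path|g" "$f"
    rm -f "$f.bak"
done

cat "$workdir/a.script"
```

Using a delimiter other than `/` in the sed expression is the main design point here, since home paths contain slashes.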
Now that the data is in place and the module has been installed, it is time to submit the actual work to the job queue and wait for the emails to roll in. The Moving Pictures tutorial has roughly 20 QIIME2 commands to run, which produce various outputs. Some of these commands are quite intensive, and others rely on previous output before they can run. On the HPC there is a fixed cost to spinning up a job on a node, so you will often want to chain many commands together rather than submitting a job for each command. For the Moving Pictures tutorial I have broken the 20 commands into 3 sections: an initial, very CPU-heavy import/demultiplex/align job, and then 2 low-CPU jobs in which smaller calculations are performed with the output from the first job. Looking at the top of import.script with the head command, you can see that it requests 1 node with 36 CPUs, while the other two scripts just take the queue default of a single CPU.
Code Block |
---|
|
cd steps
head -n20 import.script |
Each of these scripts could be queued just by running the qsub command. However, the second two scripts cannot run until the first job has finished. The qsub command takes a depend=afterok:<jobid> argument, which submits those jobs onto the queue with a hold until the first has exited successfully. The run_all.sh script shows how to schedule multiple jobs at a time while capturing their job IDs to feed into the later jobs that depend on them.
Code Block |
---|
|
cat run_all.sh
./run_all.sh |
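As a rough sketch of the same hold-until-success pattern using the sbatch program from the earlier sections, the script below captures the first job ID and passes it to the later submissions. It is not the contents of run_all.sh: the step script names, the `submit` helper, and the `DRY_RUN` switch are hypothetical, and it assumes the cluster's sbatch accepts `--parsable` and `--dependency=afterok:<jobid>`.

```shell
#!/bin/bash
# Sketch of chaining jobs with Slurm dependencies.
# Script names and the DRY_RUN switch are hypothetical.
# With DRY_RUN=1 (the default here) the sbatch commands are only printed,
# and a placeholder job ID is returned so the chain can be inspected safely.
DRY_RUN="${DRY_RUN:-1}"

submit() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "sbatch --parsable $*" >&2    # show what would be run
        echo "12345"                       # placeholder job ID
    else
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID
        sbatch --parsable "$@" | cut -d';' -f1
    fi
}

# Submit the heavy first step and remember its job ID
first_id=$(submit step1.script)

# The later steps are held until the first job exits successfully
submit --dependency=afterok:"$first_id" step2.script >/dev/null
submit --dependency=afterok:"$first_id" step3.script >/dev/null
```

Running it with `DRY_RUN=0` on the login node would perform the real submissions.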
Now you can sit back and wait for the emails to come pouring in. The first step takes roughly 10 minutes to complete and the later ones about half as long each, although they will run at the same time. Use the commands from the "Monitoring Jobs" section above to watch their progress.
Once all of the jobs have finished it is time to pack up your results and scp them back to your local machine to analyze the various visualizations and result tables.
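One possible way to pack up results is to bundle the output directory into a single compressed archive before copying it. The sketch below only demonstrates the archiving step in a scratch directory. The `results` directory and the file inside it are hypothetical stand-ins for whatever your jobs actually produced.

```shell
#!/bin/bash
# Sketch: bundle a results directory into one compressed archive for transfer.
# "results" and the file inside it are hypothetical stand-ins.
workdir=$(mktemp -d)
mkdir -p "$workdir/results"
echo "demo" > "$workdir/results/table.txt"

# -C changes into the parent directory so the archive holds relative paths
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# List the archive contents to confirm what will be transferred
tar -tzf "$workdir/results.tar.gz"

# Then, from your local machine (not the login node), something like:
#   scp <username>@login.hpc.uams.edu:results.tar.gz .
```

Copying one archive is usually simpler than copying many small files individually.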
...
...