Logging In
ssh <username>@login.hpc.uams.edu
If this is your first time logging into the system, now is when you should change your password:
passwd
You are now on the HPC login node. From here you can stage your data and jobs to be submitted to the computational nodes in the cluster. You can view the current load of the overall system from the login node with the showq command.
Submit a Simple Job
While the login node is a relatively powerful server, it should not be used to do any actual work, as that could impede others' ability to use the system. We use Slurm to manage jobs and resources on the cluster. The srun and sbatch programs will be your primary interface for submitting jobs to the cluster. In its simplest form, you can pass srun a command and it will schedule and run it as a job. Here we will schedule a single command, lscpu, to run using all of the defaults:
srun lscpu
The output from this job will print directly to your terminal. This can be useful for very simple commands or testing; normally, however, you will submit more complex jobs as a batch file.
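srun also accepts scheduling options on the command line. A minimal sketch (the task count and time limit here are illustrative values, not site requirements):

```
# Run lscpu as a single task, with an assumed 5-minute time limit
srun --ntasks=1 --time=00:05:00 lscpu
```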
Submit a Scripted Job
The sbatch program takes many arguments to control where the job will be scheduled, and it can be fed a script of commands and arguments to run instead of just feeding them in through a pipe. We will now create a script which contains both the arguments and the actual commands to be run:
nano cpuinfo.script
Below is a simple script which performs the same lscpu command as above, except this time it also sets a few options.
#!/bin/bash
#SBATCH --mail-user=<YOUR_EMAIL>@uams.edu  #<---- Email address to notify
#SBATCH --mail-type=ALL                    #<---- Status to notify the email
#SBATCH --job-name=CPUinfo                 #<---- Name of this job

#<---- Commands below this point will be run on the assigned node
echo "Hello HPC"
lscpu
echo "Goodbye HPC"
Once this script is created, it can be run by passing it to the sbatch program. After the job has finished, there will be a file named slurm-#####.out in your home directory which will contain the output.
sbatch cpuinfo.script
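Once the job completes, the output file can be inspected like any other text file; for example:

```
# List the Slurm output files in the current directory
ls slurm-*.out

# Print the output of a finished job (replace ##### with the job ID)
cat slurm-#####.out
```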
When submitting a script you can also pass arguments on the command line to sbatch. Here we submit the lscpu script again, except this time we ask for a node with a Xeon processor. Compare the outputs of the two jobs, or experiment with different constraints that can be requested.
sbatch --constraint=xeon cpuinfo.script
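Other commonly requested resources can be combined on the same command line. A sketch with illustrative values (these are not site defaults):

```
# Ask for 4 CPUs and 8 GB of memory on a single node (illustrative values)
sbatch --cpus-per-task=4 --mem=8G --nodes=1 cpuinfo.script
```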
Monitoring Jobs
The jobs so far have been quick to run; often, though, you will want to monitor longer-running jobs. Remember that the showq program will display the state of the entire cluster. There are many other programs which can help you monitor your own jobs.
squeue
This command prints the full status of current jobs and is useful for finding the exec_host of a running job. Knowing the host will allow you to peek in a few ways at what the node is currently doing.
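On a busy cluster it is often more convenient to narrow squeue to your own jobs, or to a single job. The format string below uses standard squeue output fields (job ID, name, state, node list):

```
# Show only your own jobs
squeue -u $USER

# Show the node list for a specific job (replace ##### with the job ID)
squeue -j ##### -o "%i %j %T %N"
```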
pbsnodes <nodename>
This shows the configuration of an individual node, as well as its current status.
With the node name, we can use the Parallel Shell program pdsh to execute commands directly on a node. This should only ever be used to run short, non-intensive commands, as it will take CPU time from any jobs executing on that node. Here are some possibly useful commands:
pdsh -w <nodename> free -h
pdsh -w <nodename> uptime
pdsh -w <nodename> top -b -n1
Installing Software
The HPC has some software packages already installed; however, they will need to be activated using Lmod. You can browse the available modules, or search for them and see their descriptions, with these commands:
module avail
module spider <search>
If the module you need is already available, you simply need to load it. Note, however, that this only changes your local environment variables. If you plan on making use of anything inside a module during a job, you must run module load in the job script before you try to use the commands that it enables.
module load <module_name>
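In a job script, the load goes before the first command that needs it. A minimal sketch, assuming a hypothetical module named samtools (substitute a module name from module avail on your system):

```
#!/bin/bash
#SBATCH --job-name=ModuleDemo  #<---- Name of this job

#<---- Load the module before using the commands it enables
module load samtools           # hypothetical module name
samtools --version
```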
One of the most useful modules is EasyBuild, a build and installation framework designed for HPC systems. Many scientific tool sets can be installed using it; once they are, they can be activated using the module commands above. However, EasyBuild itself will always have to be loaded before anything installed with it can be loaded; the module spider <search> command will explain this if you forget.
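A typical EasyBuild session might look like the following. The package name and easyconfig file are hypothetical examples; the easyconfigs actually available vary by site:

```
# EasyBuild itself must be loaded first
module load EasyBuild

# Search for easyconfigs matching a name (hypothetical example package)
eb --search SAMtools

# Build and install one, resolving its dependencies automatically
# (the .eb filename here is a hypothetical example)
eb SAMtools-1.9-GCC-8.2.0.eb --robot
```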