The Lookout cluster consists of four compute nodes based on the IBM POWER9 (ppc64le) architecture and a head node based on the x86_64 architecture. Each compute node has four NVIDIA Tesla V100 GPUs.
The typical way to use a compute cluster is to develop code elsewhere, then to compile and execute the code on the compute cluster.
To log into the Lookout cluster, use the following command:
Users should not execute jobs on the head node. The head node is intended for launching jobs and for other lightweight tasks.
Jobs are launched with a job submission script. An example script is as follows:
#!/bin/bash
# execute in the general partition
#SBATCH --partition=general
# execute with 40 processes/tasks
#SBATCH --ntasks=40
# execute on 4 nodes
#SBATCH --nodes=4
# allocate 4 CPUs per task (e.g. for 4 threads per process)
#SBATCH --cpus-per-task=4
# maximum time is 30 minutes
#SBATCH --time=00:30:00
# job name is my_job
#SBATCH --job-name=my_job
# load environment
module load openmpi
module load ...
# application execution
mpiexec <application> <arguments>
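When a job combines MPI processes with threads, as the example does with --cpus-per-task=4, each process usually has to be told how many threads to start. A minimal sketch, assuming the application uses OpenMP (the original script does not say so), derives OMP_NUM_THREADS from SLURM_CPUS_PER_TASK, which Slurm sets inside the job only when --cpus-per-task is given. The function name set_thread_count is hypothetical.

```bash
#!/bin/bash
# Sketch: derive the per-process thread count from the Slurm allocation.
# Assumes the application reads OMP_NUM_THREADS (i.e. uses OpenMP).

set_thread_count() {
    # SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested;
    # fall back to one thread per process otherwise (e.g. outside a job).
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_thread_count

# Record the job layout in the job output for later reference.
echo "tasks=${SLURM_NTASKS:-unset} nodes=${SLURM_JOB_NUM_NODES:-unset} threads/task=${OMP_NUM_THREADS}"

# The application would then be launched as in the script above, e.g.:
# mpiexec <application> <arguments>
```

These lines would sit in the submission script between the module loads and the mpiexec line.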
For non-MPI applications, the Slurm srun process launcher can be used instead:
...
srun <application> <arguments>
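srun starts one copy of the program per task and gives each copy its own SLURM_PROCID, numbered from 0 to SLURM_NTASKS-1. For a non-MPI program that processes many independent inputs, a minimal sketch of splitting the work across tasks with these variables follows; the script name worker.sh, the helper pick_inputs and the input file names are all hypothetical.

```bash
#!/bin/bash
# worker.sh (hypothetical name), launched from the submission script as:
#   srun ./worker.sh input_*.dat
# Each srun task handles every N-th input file, where N is the task count.

pick_inputs() {
    # Slurm sets these for every task started by srun; the defaults let the
    # script also run as a single task outside of Slurm.
    local rank="${SLURM_PROCID:-0}"
    local ntasks="${SLURM_NTASKS:-1}"
    local i=0 f
    for f in "$@"; do
        if (( i % ntasks == rank )); then
            echo "$f"
        fi
        i=$(( i + 1 ))
    done
}

# Process this task's share of the inputs (placeholder action).
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    pick_inputs "$@" | while IFS= read -r f; do
        echo "task ${SLURM_PROCID:-0} processing $f"
    done
fi
```

Round-robin assignment keeps the tasks balanced when the inputs are of similar size; no communication between tasks is needed.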
The environment can be loaded either in the submission script or on the head node. Loading it in the submission script is preferred, since the script then records the environment the executable requires for later reference. The available software modules can be listed on the Lookout head node with the “module avail” command.
To submit the script for execution on the compute nodes, use the following command:
> sbatch <submission script>
A complete description of the “sbatch” command and its options can be found in the official Slurm documentation.
GPU resources on the compute nodes must be requested at job submission. This is done by adding a single line to the job submission script:
...
#SBATCH --gres=gpu:GPUS
...
Replace GPUS with the number of GPUs the job requires on each node. The valid range is 0-4.
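Inside a job, Slurm usually exposes the allocated GPUs through the CUDA_VISIBLE_DEVICES environment variable as a comma-separated list of indices; whether this holds on Lookout depends on its GPU configuration, so treat it as an assumption. A minimal sketch of a check that the job can see the number of GPUs it requested follows; the helper names and the EXPECTED_GPUS variable are hypothetical.

```bash
#!/bin/bash
# Sketch: verify that the job can see the GPUs it requested.
# Assumes Slurm sets CUDA_VISIBLE_DEVICES to the allocated GPU indices,
# e.g. "0,1". EXPECTED_GPUS is a hypothetical variable.

count_visible_gpus() {
    # An unset or empty variable is treated here as "no GPUs allocated".
    # Note that CUDA itself treats an UNSET variable as "all GPUs visible".
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    if [[ -z "$devices" ]]; then
        echo 0
        return
    fi
    local -a ids
    IFS=',' read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

check_gpus() {
    local expected="$1"
    # The valid request range is 0-4 GPUs per node.
    if (( expected < 0 || expected > 4 )); then
        echo "invalid GPU count: $expected (valid range is 0-4)" >&2
        return 2
    fi
    local visible
    visible="$(count_visible_gpus)"
    if (( visible != expected )); then
        echo "expected $expected GPU(s) but $visible visible" >&2
        return 1
    fi
}

# Example, placed in the submission script before the application starts:
# check_gpus "${EXPECTED_GPUS:-0}" || exit 1
```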
Some jobs may need exclusive access to a compute node. Exclusive access should not be necessary for a job to run correctly, but it may be preferable when benchmarking applications.
To request exclusive access to the compute nodes allocated to a job, add this option:
...
#SBATCH --exclusive
...
With this option, processes belonging to the same job can share a compute node, but no other jobs are placed on it.
Sometimes users want several jobs to run in a specific order, each depending on the previous one, and would rather submit the whole set at once than launch each job by hand. This can be done with job dependencies.
To set up a job dependency within a job script, use this format:
...
#SBATCH --dependency=<dependency list>
...
The most common dependency has the following format:
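One common way to use dependencies is to submit a chain of scripts from the head node, where each job may start only after the previous one has completed successfully (Slurm's afterok dependency type). A minimal sketch, assuming the scripts exist and using sbatch's --parsable option to capture each job ID; the function name submit_chain and the script names are hypothetical.

```bash
#!/bin/bash
# Sketch: submit job scripts so that each one starts only after the previous
# job finished successfully. submit_chain is a hypothetical helper name.

submit_chain() {
    local prev="" out jobid script
    for script in "$@"; do
        if [[ -z "$prev" ]]; then
            out="$(sbatch --parsable "$script")" || return 1
        else
            out="$(sbatch --parsable --dependency=afterok:"$prev" "$script")" || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
        jobid="${out%%;*}"
        echo "submitted $script as job $jobid"
        prev="$jobid"
    done
}

# Usage on the head node (script names are placeholders):
# submit_chain preprocess.sh simulate.sh postprocess.sh
```

If a job in the chain fails, the jobs after it can never satisfy afterok and remain pending, so they may need to be cancelled by hand.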
Compiling for the back-end compute nodes requires special care on Lookout, because the head node uses a different architecture from the compute nodes.
To compile an application, users need to do one of the following:
- Compile in their submission script
- Write and launch a compile script
- Use an interactive session to compile the application on a back-end compute node
- Cross-compile for the compute node architecture with a local compiler
Note: Compiling is disabled on the Lookout head node to avoid confusion between the different architectures.
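Because the head node (x86_64) and the compute nodes (POWER9, which Linux normally reports as ppc64le) differ, a compile script or submission script can stop early when it runs on the wrong machine. A minimal sketch, assuming the compute nodes report ppc64le from uname -m; the function name require_arch is hypothetical.

```bash
#!/bin/bash
# Sketch: abort a compile step if it is not running on the expected
# architecture. "ppc64le" for the compute nodes is an assumption.

require_arch() {
    local wanted="$1"
    local actual
    actual="$(uname -m)"
    if [[ "$actual" != "$wanted" ]]; then
        echo "this build must run on $wanted, but this machine is $actual" >&2
        return 1
    fi
}

# Example use at the top of a compile script or submission script:
# require_arch ppc64le || exit 1
# module load openmpi
# make
```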
An interactive session is typically used for compiling on the back-end compute nodes. Interactive sessions have other uses as well; however, be aware that, with budgets in place, a long-running interactive session will consume a lot of compute budget.
An interactive session is launched from the Lookout head node with the command:
> salloc srun --pty bash
Once the session has been allocated, a normal user shell running on the remote compute node is presented. At that point the user can do what they need, for example:
> module load openmpi
> module load cuda
>
> make