Lookout Cluster

The lookout cluster consists of four compute nodes based on the IBM POWER9 architecture.

Each compute node has four NVIDIA Tesla V100 GPUs. The cluster's head node is a virtual machine.

  • Head node FQDN: lookout.simcenter.utc.edu
  • Other nodes: lookout{01-03}

The typical way to use a compute cluster is to develop code elsewhere, then to compile and execute the code on the compute cluster.

Application installation requests can be directed to the helpdesk or sent by email to simcenter-help@utc.edu.

Login procedure

To log into the lookout cluster, use the following command:

ssh lookout.simcenter.utc.edu
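If your cluster account name differs from your local login name, it can be supplied explicitly with standard ssh syntax (`username` below is a placeholder for your own account name):

```shell
# log in as a specific user; replace "username" with your cluster account
ssh username@lookout.simcenter.utc.edu
```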

Submitting Jobs

Users SHOULD NOT execute jobs on the head node. The head node is intended for launching jobs and other lightweight tasks.

To launch a job, a job submission script is used. An example script follows:

#!/bin/bash
 
# execute in the general partition
#SBATCH --partition=general
 
# execute with 40 processes/tasks
#SBATCH --ntasks=40
 
# execute on 4 nodes
#SBATCH --nodes=4
 
# execute 4 threads per task
#SBATCH --cpus-per-task=4
 
# maximum time is 30 minutes
#SBATCH --time=00:30:00
 
# job name is my_job
#SBATCH --job-name=my_job
 
# load environment
module load openmpi
module load ...
 
# application execution
mpiexec application command line arguments

For non-MPI applications, the srun process launcher is available:

...
 
srun application command line arguments
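Putting the pieces together, a minimal non-MPI submission script might look like the following sketch. The application name `my_app`, its arguments, and the `gcc` module are placeholders; substitute the modules your application actually needs:

```shell
#!/bin/bash

# execute in the general partition
#SBATCH --partition=general

# a single task on a single node
#SBATCH --nodes=1
#SBATCH --ntasks=1

# maximum time is 10 minutes
#SBATCH --time=00:10:00

# job name is my_serial_job
#SBATCH --job-name=my_serial_job

# load environment (placeholder; adjust to your application)
module load gcc

# launch the application through srun
srun my_app --input data.txt
```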

The environment can be loaded in the submission script or on the head node. Loading it in the submission script is preferred, since the required environment is then recorded alongside the executable for later reference.

The available applications can be viewed on the lookout head node via the command:

module avail

To submit the script for execution on the compute nodes use the following command:

sbatch script.sh

An exhaustive description of the “sbatch” command can be found in the official documentation.
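After submission, the standard Slurm utilities can be used to monitor and manage the job. The job ID `12345` below is a placeholder for the ID that sbatch reports:

```shell
# submit the script; sbatch reports the assigned job ID
sbatch script.sh

# list your own pending and running jobs
squeue -u $USER

# show details of a specific job (replace 12345 with your job ID)
scontrol show job 12345

# cancel a job that is no longer needed
scancel 12345
```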

GPU Access

GPU resources on the compute nodes must be requested at job submission. This is done by adding a single line to the job submission script:

...
#SBATCH --gres=gpu:GPUS
...

Replace GPUS with the number of GPUs your job requires. The valid range is 0-4.
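As a sketch, a submission script requesting two GPUs on a single node might look like this. The `cuda` module name matches the one used later in this document; the `nvidia-smi` step is only an illustrative way to confirm which GPUs were allocated:

```shell
#!/bin/bash

# execute in the general partition
#SBATCH --partition=general

# a single task on a single node
#SBATCH --nodes=1
#SBATCH --ntasks=1

# request two GPUs on the node
#SBATCH --gres=gpu:2

# maximum time is 10 minutes
#SBATCH --time=00:10:00

# job name is gpu_job
#SBATCH --job-name=gpu_job

# load the GPU toolchain
module load cuda

# list the GPUs actually visible to the job
srun nvidia-smi
```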

Exclusive Nodes

Some jobs may require exclusive access to a compute node. Exclusive access should not be required for correct execution of a job, but it may be desirable when benchmarking applications.

To request exclusive access to the compute nodes allocated to a job, use this flag:

...
#SBATCH --exclusive
...

With this flag, processes from the same job may share a compute node, but no other job will be scheduled on it.

Job Dependencies

Sometimes users want several jobs to run in a certain order without launching each one by hand, instead simply submitting the whole set at once. This can be done with job dependencies.

To set up a job dependency inside a job script, follow this format:

...
#SBATCH --dependency=<dependency list>
...

The most common dependency takes the form:

after:job_id[:jobid...]
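Dependencies can also be chained from the command line. The sketch below uses sbatch's `--parsable` flag, which prints only the job ID, to capture each ID in a shell variable; the script names are placeholders. Note that `after:` releases the dependent job once the listed job has *started*, while Slurm's `afterok:` waits for it to *complete successfully*:

```shell
# submit the first job and capture its ID (--parsable prints only the ID)
first=$(sbatch --parsable first_step.sh)

# this job may start once the first job has started
sbatch --dependency=after:$first second_step.sh

# this job starts only after the first job completes successfully
sbatch --dependency=afterok:$first postprocess.sh
```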

Compiling

Compiling for the back-end compute nodes is a special case on the lookout cluster because the head node has a different architecture.

To compile an application users need to do one of the following:

  • Compile in their submission script
  • Write and launch a separate compile script
  • Use an interactive session to compile the application on a back-end compute node
  • Cross-compile for the back-end architecture with a local compiler

Note: Compiling is disabled on the lookout head node to avoid confusion between the different architectures.
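A compile script is simply a submission script whose work is the build itself, so the compilation runs on a compute node with the target architecture. The sketch below assumes a make-based project; the loaded modules match those used elsewhere in this document:

```shell
#!/bin/bash

# execute in the general partition
#SBATCH --partition=general

# a single build task on a single node
#SBATCH --nodes=1
#SBATCH --ntasks=1

# maximum time is 10 minutes
#SBATCH --time=00:10:00

# job name is build
#SBATCH --job-name=build

# load the toolchain on the compute node, where the
# architecture matches the nodes the binary will run on
module load openmpi
module load cuda

# build the application (assumes a Makefile in the submit directory)
make
```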

Interactive Session

An interactive session is typically used for compiling on the back-end compute nodes. Other uses exist; be aware, however, that with budgets in place a long-running interactive session will consume a lot of computing budget.

An interactive session is launched from the lookout head node with the command:

> salloc srun --pty bash

Once the session has been allocated, a normal user shell is presented, executing on the remote compute node. At this point the user can do whatever they need:

> module load openmpi
> module load cuda
>
> make