How to run jobs - SSH into a login node

There is a single login node used to access all cluster nodes. All HPC jobs must be started from this node.

Login procedure

Use SSH to log in to the cluster login node:

ssh epyc.simcenter.utc.edu
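If your cluster username differs from your local username, include it in the SSH command; the username below is only a placeholder:

ssh <username>@epyc.simcenter.utc.edu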

Submitting Slurm jobs

The best way to start a job is through a job submission script. This script defines all the parameters needed for the job, including run time, number of CPUs, number of GPUs, partition name, etc. Submitting jobs in this manner allows the resources used to be automatically made available to the next user as soon as the code finishes running.

Here is an example submission script:

#!/bin/bash
 
#SBATCH --job-name=my_job             # Job name
#SBATCH --output=output.txt           # Output text file
#SBATCH --error=error.txt             # Error text file
#SBATCH --partition=partition_name    # Partition name
#SBATCH --nodes=1                     # Number of nodes
#SBATCH --ntasks-per-node=1           # Number of tasks per node
#SBATCH --cpus-per-task=1             # Number of CPU cores per task
#SBATCH --gpus-per-node=1             # Number of GPUs per node
#SBATCH --time=0-2:00:00              # Maximum runtime (D-HH:MM:SS)
#SBATCH --mail-type=END               # Send email at job completion
#SBATCH --mail-user=email-addr        # Email address for notifications
 
 
# load environment modules, if needed
 
module load openmpi
 
# Application execution 
# You can run jobs directly, with srun, or with MPI. Below are examples of each; use only one.
 
# direct example
python example.py
 
# srun example
srun <application> <command line arguments>
 
# MPI example (for MPI-enabled programs)
mpiexec <application> <command line arguments>

Submit the job to the cluster scheduler with sbatch job_script.sh.
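After submission, sbatch prints the job ID assigned to your job. A minimal sketch of the submit-and-monitor workflow, using the standard Slurm utilities squeue and scancel (not otherwise covered on this page):

sbatch job_script.sh    # submit the script; prints the assigned job ID
squeue -u $USER         # list your queued and running jobs
scancel <jobid>         # cancel a job, using the ID printed by sbatch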

An exhaustive description of the “sbatch” command can be found in the Official Documentation.

Interactive Slurm Jobs

It is possible to launch a shell on a compute node and then run your code interactively on that node. This method is only recommended if you are actively watching the job's progress, so that you can release the resources by exiting the shell as soon as it completes. If you walk away, the idle shell will continue to hold resources and keep others from being able to use them.

To launch a shell on a compute node, the “srun” command is used. Example interactive job requests are below.

To run an interactive job using GPUs:

srun --x11 --time=1-00:00:00 --partition=gpu --gres=gpu:1 --ntasks=4 --pty /bin/bash -l

To run an interactive job without GPUs:

srun --x11 --time=1-00:00:00 --partition=general --ntasks=120 --pty /bin/bash -l
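Once the interactive shell starts on the compute node, load any modules you need and run your code as usual, then exit to release the resources. A brief sketch, reusing the module and script names from the batch example above:

module load openmpi    # load environment modules, if needed
python example.py      # run your code and watch its progress
exit                   # leave the shell so the resources are released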
An exhaustive description of the “srun” command can be found in the Official Documentation.
