Useful Tools

SLURM CLI

pestat is a popular tool for getting a quick, cluster-wide overview, developed by Ole Holm Nielsen at the Technical University of Denmark. You can find it in his Slurm_tools project on GitHub.

Status of each node on the cluster (the -G option also shows GPU usage)

pestat -G

Status of each node within a partition

pestat -p mypartition -G

Status of a specific node

pestat -n mynode -G

List nodes that have a job owned by a specific user

pestat -u myuser -G

You can also use standard Slurm commands. To view all jobs queued in a specific partition

squeue -p mypartition

You can view detailed information about a specific job

scontrol show job jobid

To cancel a job you started

scancel jobid
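The commands above can be combined, e.g. to cancel only your pending jobs. A minimal sketch using squeue's -h (no header) and -o (output format) options; the sample output below is hypothetical, and on a real cluster you would pipe the result to scancel (or simply run `scancel -u $USER --state=PENDING`):

```shell
#!/bin/sh
# Hypothetical output of: squeue -u "$USER" -h -o "%i %T"
# (job ID and job state, one job per line)
sample='10679973 PENDING
10679974 RUNNING
10679975 PENDING'

# Select only the PENDING job IDs; on the cluster these could be fed
# to scancel with:  ... | xargs scancel
pending=$(echo "$sample" | awk '$2 == "PENDING" {print $1}')
echo "$pending"
```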

SC-Specific Tools

showaccount

showaccount lists the Slurm accounts your user can charge jobs to

jimmyw@sc:~$ showaccount

   Cluster                 User    Account
---------- -------------------- ----------
       sc2               jimmyw       miso
       sc2               jimmyw        mkt
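An account from this list can be selected at submission time with Slurm's standard -A/--account option. A minimal batch-script sketch, assuming the miso account and a hypothetical job.sh submitted with `sbatch job.sh`:

```shell
#!/bin/bash
# Hypothetical batch script (job.sh); submit with: sbatch job.sh
#SBATCH --account=miso       # one of the accounts listed by showaccount
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=1

echo "running under account miso"
```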

showjob

showjob displays the status of a single job, given its job ID

jimmyw@sc:~$ showjob 10679973
Job 10679973 was submitted by user jimmyw in account viscam on 2025-07-22T18:35:51
Job has state=PENDING
Job requests 1 CPUs and has a time limit of 6:00:00 (days-hh:mm:ss) = 360 min.
Job TRESRunMin: 360

Job is in state PENDING with reason=Priority

Queued job information:
JobId=10679973 JobName=bash
   UserId=jimmyw(10865) GroupId=users(100) MCS_label=N/A
   Priority=1 Nice=0 Account=viscam QOS=normal
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=06:00:00 TimeMin=N/A
   SubmitTime=2025-07-22T18:35:51 EligibleTime=2025-07-22T18:35:51
   AccrueTime=2025-07-22T18:35:51
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-07-22T18:35:51 Scheduler=Main
   Partition=viscam-interactive AllocNode:Sid=sc:3488786
   ReqNodeList=viscam-hgx-1 ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=4G,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=4G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/sailhome/jimmyw
   Power=
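Note how showjob reports the time limit both ways: "6:00:00 ... = 360 min". Slurm time limits use the format [days-]hh:mm:ss, with the days- part omitted for limits under a day. A small sketch of that conversion (not part of showjob itself):

```shell
#!/bin/sh
# Convert a Slurm time limit ([days-]hh:mm:ss) to whole minutes,
# matching showjob's "6:00:00 ... = 360 min" line above.
slurm_minutes() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4) print $1 * 1440 + $2 * 60 + $3   # days-hh:mm:ss
        else         print $1 * 60   + $2             # hh:mm:ss
    }'
}

slurm_minutes 6:00:00      # prints 360
slurm_minutes 2-12:30:00   # prints 3630
```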

showalloc

showalloc shows per-node memory, CPU, and GPU allocation for a partition; in the CPUS(A/I/O/T) column the counts are Allocated/Idle/Other/Total

jimmyw@sc:~$ showalloc sphinx
NODELIST       MEMORY    ALLOCMEM  CPUS  CPUS(A/I/O/T)  GRES                   GRES_USED
sphinx1        1031000   200704    252   28/224/0/252   gpu:a100:8             gpu:a100:7(IDX:0-6)
sphinx2        1031000   516096    252   44/208/0/252   gpu:a100:7             gpu:a100:7(IDX:0-6)
sphinx3        1031000   335872    252   16/236/0/252   gpu:a100:8             gpu:a100:7(IDX:0-6)
sphinx4        1031000   540672    252   16/236/0/252   gpu:a100:8             gpu:a100:8(IDX:0-7)
sphinx5        1031000   714752    252   22/230/0/252   gpu:a100:8             gpu:a100:8(IDX:0-7)
sphinx6        1031000   311296    252   52/200/0/252   gpu:a100:8             gpu:a100:8(IDX:0-7)
sphinx7        1031000   210944    252   14/238/0/252   gpu:a100:8             gpu:a100:7(IDX:0-6)
sphinx8        1031000   518144    252   22/230/0/252   gpu:a100:8             gpu:a100:8(IDX:0-7)
sphinx9        2050000   1949696   224   148/76/0/224   gpu:h100:8             gpu:h100:8(IDX:0-7)
sphinx10       3090000   2887680   224   166/58/0/224   gpu:h200:8             gpu:h200:8(IDX:0-7)
sphinx11       3090000   458752    224   32/192/0/224   gpu:h200:8             gpu:h200:8(IDX:0-7)

In this example, 84 of the 87 GPUs in the sphinx partition are in use.
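Those totals can be recomputed from the GRES (configured) and GRES_USED (allocated) columns: the GPU count is the last :-separated field, with GRES_USED adding an (IDX:...) suffix. A sketch in awk over the two columns copied from the sample above:

```shell
#!/bin/sh
# GRES and GRES_USED columns copied from the showalloc output above.
sample='gpu:a100:8 gpu:a100:7(IDX:0-6)
gpu:a100:7 gpu:a100:7(IDX:0-6)
gpu:a100:8 gpu:a100:7(IDX:0-6)
gpu:a100:8 gpu:a100:8(IDX:0-7)
gpu:a100:8 gpu:a100:8(IDX:0-7)
gpu:a100:8 gpu:a100:8(IDX:0-7)
gpu:a100:8 gpu:a100:7(IDX:0-6)
gpu:a100:8 gpu:a100:8(IDX:0-7)
gpu:h100:8 gpu:h100:8(IDX:0-7)
gpu:h200:8 gpu:h200:8(IDX:0-7)
gpu:h200:8 gpu:h200:8(IDX:0-7)'

result=$(echo "$sample" | awk '{
    split($1, t, ":"); total += t[3]    # gpu:model:count
    sub(/\(.*/, "", $2)                 # drop the (IDX:...) suffix
    split($2, u, ":"); used += u[3]
} END { printf "%d out of %d GPUs in use", used, total }')
echo "$result"   # prints: 84 out of 87 GPUs in use
```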

sgpu

sgpu summarizes GPU usage across a partition group

jimmyw@sc:~$ sgpu -g sphinx
--------------------------------------------------------------------
sphinx,sphinx-hi,sphinx-lo,sphinx-hazy GPU Status
--------------------------------------------------------------------
There are a total of 87 gpus [up]
8 h100 gpus
16 h200 gpus
63 a100 gpus
--------------------------------------------------------------------
Current GPU Utilization
• GPU utilization: 53.57%
• GPU memory usage: 53.52%
--------------------------------------------------------------------
Usage by user:
hij        [total: 0  (interactive: 0 )] a100: 0
xmohri     [total: 1  (interactive: 0 )] h200: 1
kotha      [total: 1  (interactive: 1 )] h200: 1
drfein     [total: 1  (interactive: 1 )] a100: 1
shgwu      [total: 2  (interactive: 2 )] h100: 2
salzhu     [total: 2  (interactive: 2 )] a100: 1, h200: 1
rcsordas   [total: 2  (interactive: 0 )] a100: 2
yjruan     [total: 2  (interactive: 0 )] a100: 2
laya       [total: 3  (interactive: 0 )] a100: 1, h200: 2
kelvinkn   [total: 3  (interactive: 0 )] a100: 3
syu03      [total: 4  (interactive: 0 )] a100: 4
qinanyu    [total: 4  (interactive: 0 )] h200: 4
esui       [total: 6  (interactive: 0 )] h200: 2, a100: 4
wanjiazh   [total: 12 (interactive: 0 )] a100: 8, h200: 4
--------------------------------------------------------------------
There are 43 gpus available:
a100: 37 available
h100: 6 available
h200: 0 available
--------------------------------------------------------------------
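The per-user breakdown can be cross-checked by summing the total: fields, which gives the number of GPUs currently allocated to users. A sketch over the "Usage by user" lines copied from the sample above (the field positions are an assumption based on that output format):

```shell
#!/bin/sh
# "Usage by user" lines copied from the sgpu output above.
sample='hij        [total: 0  (interactive: 0 )] a100: 0
xmohri     [total: 1  (interactive: 0 )] h200: 1
kotha      [total: 1  (interactive: 1 )] h200: 1
drfein     [total: 1  (interactive: 1 )] a100: 1
shgwu      [total: 2  (interactive: 2 )] h100: 2
salzhu     [total: 2  (interactive: 2 )] a100: 1, h200: 1
rcsordas   [total: 2  (interactive: 0 )] a100: 2
yjruan     [total: 2  (interactive: 0 )] a100: 2
laya       [total: 3  (interactive: 0 )] a100: 1, h200: 2
kelvinkn   [total: 3  (interactive: 0 )] a100: 3
syu03      [total: 4  (interactive: 0 )] a100: 4
qinanyu    [total: 4  (interactive: 0 )] h200: 4
esui       [total: 6  (interactive: 0 )] h200: 2, a100: 4
wanjiazh   [total: 12 (interactive: 0 )] a100: 8, h200: 4'

# Field 3 is the number after "[total:"; sum it across all users.
result=$(echo "$sample" | awk '{ sum += $3 } END { print sum }')
echo "$result"   # prints 43 (GPUs allocated to users)
```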