Useful Tools¶
SLURM CLI¶
pestat is a popular tool for a quick, overall view of the cluster, developed by Ole Holm Nielsen at the Technical University of Denmark. You can find its GitHub project page here
Status of each node on the cluster; note that -G also lists GPU usage:
pestat -G
Status of each node within a partition:
pestat -p mypartition -G
Status of a specific node:
pestat -n mynode -G
List nodes that have a job owned by a specific user:
pestat -u myuser -G
You can also use standard Slurm commands. To view all jobs queued in a specific partition:
squeue -p mypartition
To view detailed information about a specific job:
scontrol show job jobid
To cancel a job you started:
scancel jobid
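For scripting, squeue's -h (suppress header) and -o (output format) flags produce parseable output. A minimal sketch that extracts pending job IDs from a captured listing — the sample job IDs here are hypothetical, and in real use the here-string would be replaced by a live `squeue` call:

```shell
# Sample output in the shape produced by: squeue -u "$USER" -h -o "%i %t"
# (%i = job id, %t = compact state; PD = pending, R = running).
sample='10679973 PD
10679974 R
10679980 PD'

# Collect only the pending job ids; against a live cluster these could be
# fed straight to scancel, e.g.:
#   squeue -u "$USER" -h -t PD -o "%i" | xargs -r scancel
pending=$(printf '%s\n' "$sample" | awk '$2 == "PD" {print $1}')
echo "$pending"
```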
SC-Specific Tools¶
showaccount¶
Lists the Slurm accounts your user can submit jobs under:
jimmyw@sc:~$ showaccount
Cluster User Account
---------- -------------------- ----------
sc2 jimmyw miso
sc2 jimmyw mkt
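The account names shown can be passed to sbatch via --account (or an #SBATCH directive). A minimal batch-script sketch, assuming the miso account from the listing above; the job name, time limit, and resource values are illustrative placeholders:

```shell
#!/usr/bin/env bash
#SBATCH --job-name=example      # hypothetical job name
#SBATCH --account=miso          # an account listed by showaccount
#SBATCH --time=01:00:00         # 1 hour wall-clock limit
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G

msg="job running under account miso"
echo "$msg"
```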
showjob¶
Prints a short summary plus the full queued-job record for a given job ID:
jimmyw@sc:~$ showjob 10679973
Job 10679973 was submitted by user jimmyw in account viscam on 2025-07-22T18:35:51
Job has state=PENDING
Job requests 1 CPUs and has a time limit of 6:00:00 (days-hh:mm:ss) = 360 min.
Job TRESRunMin: 360
Job is in state PENDING with reason=Priority
Queued job information:
JobId=10679973 JobName=bash
UserId=jimmyw(10865) GroupId=users(100) MCS_label=N/A
Priority=1 Nice=0 Account=viscam QOS=normal
JobState=PENDING Reason=Priority Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:00:00 TimeLimit=06:00:00 TimeMin=N/A
SubmitTime=2025-07-22T18:35:51 EligibleTime=2025-07-22T18:35:51
AccrueTime=2025-07-22T18:35:51
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-07-22T18:35:51 Scheduler=Main
Partition=viscam-interactive AllocNode:Sid=sc:3488786
ReqNodeList=viscam-hgx-1 ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=4G,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=4G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=bash
WorkDir=/sailhome/jimmyw
Power=
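Because the queued-job block uses the standard scontrol key=value format, individual fields can be pulled out in a pipeline. A minimal sketch over one captured line from the output above — in real use, the here-string would be replaced by `scontrol show job jobid`:

```shell
# One line of scontrol-style key=value output, as shown above.
line='JobState=PENDING Reason=Priority Dependency=(null)'

# Word-split the space-separated pairs onto their own lines (the
# unquoted expansion is deliberate), then strip the key we want.
state=$(printf '%s\n' $line | sed -n 's/^JobState=//p')
reason=$(printf '%s\n' $line | sed -n 's/^Reason=//p')
echo "state=$state reason=$reason"
```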
showalloc¶
Shows per-node memory, CPU, and GPU allocation for a partition, plus a GPU usage summary:
jimmyw@sc:~$ showalloc sphinx
NODELIST MEMORY ALLOCMEM CPUS CPUS(A/I/O/T) GRES GRES_USED
sphinx1 1031000 200704 252 28/224/0/252 gpu:a100:8 gpu:a100:7(IDX:0-6)
sphinx2 1031000 516096 252 44/208/0/252 gpu:a100:7 gpu:a100:7(IDX:0-6)
sphinx3 1031000 335872 252 16/236/0/252 gpu:a100:8 gpu:a100:7(IDX:0-6)
sphinx4 1031000 540672 252 16/236/0/252 gpu:a100:8 gpu:a100:8(IDX:0-7)
sphinx5 1031000 714752 252 22/230/0/252 gpu:a100:8 gpu:a100:8(IDX:0-7)
sphinx6 1031000 311296 252 52/200/0/252 gpu:a100:8 gpu:a100:8(IDX:0-7)
sphinx7 1031000 210944 252 14/238/0/252 gpu:a100:8 gpu:a100:7(IDX:0-6)
sphinx8 1031000 518144 252 22/230/0/252 gpu:a100:8 gpu:a100:8(IDX:0-7)
sphinx9 2050000 1949696 224 148/76/0/224 gpu:h100:8 gpu:h100:8(IDX:0-7)
sphinx10 3090000 2887680 224 166/58/0/224 gpu:h200:8 gpu:h200:8(IDX:0-7)
sphinx11 3090000 458752 224 32/192/0/224 gpu:h200:8 gpu:h200:8(IDX:0-7)
84 out of 87 GPUs are currently being used in sphinx partition
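The GRES and GRES_USED columns can be totaled directly to reproduce the summary line. A sketch over a few rows copied from the table above; the awk field numbers assume the column layout shown:

```shell
# showalloc-style rows: GRES (column 6) holds the configured GPU count,
# GRES_USED (column 7) the allocated count, e.g. gpu:a100:7(IDX:0-6).
sample='sphinx1 1031000 200704 252 28/224/0/252 gpu:a100:8 gpu:a100:7(IDX:0-6)
sphinx2 1031000 516096 252 44/208/0/252 gpu:a100:7 gpu:a100:7(IDX:0-6)
sphinx9 2050000 1949696 224 148/76/0/224 gpu:h100:8 gpu:h100:8(IDX:0-7)'

# The third ':'-separated piece of each GRES field is the GPU count;
# awk's numeric coercion drops the trailing "(IDX:...)" suffix.
summary=$(printf '%s\n' "$sample" | awk '{
  split($6, t, ":"); total += t[3] + 0
  split($7, u, ":"); used  += u[3] + 0
} END { printf "%d out of %d GPUs in use\n", used, total }')
echo "$summary"
```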
sgpu¶
Summarizes GPU counts, current utilization, per-user usage, and availability across a partition group:
jimmyw@sc:~$ sgpu -g sphinx
--------------------------------------------------------------------
sphinx,sphinx-hi,sphinx-lo,sphinx-hazy GPU Status
--------------------------------------------------------------------
There are a total of 87 gpus [up]
8 h100 gpus
16 h200 gpus
63 a100 gpus
--------------------------------------------------------------------
Current GPU Utilization
• GPU utilization: 53.57%
• GPU memory usage: 53.52%
--------------------------------------------------------------------
Usage by user:
hij [total: 0 (interactive: 0 )] a100: 0
xmohri [total: 1 (interactive: 0 )] h200: 1
kotha [total: 1 (interactive: 1 )] h200: 1
drfein [total: 1 (interactive: 1 )] a100: 1
shgwu [total: 2 (interactive: 2 )] h100: 2
salzhu [total: 2 (interactive: 2 )] a100: 1, h200: 1
rcsordas [total: 2 (interactive: 0 )] a100: 2
yjruan [total: 2 (interactive: 0 )] a100: 2
laya [total: 3 (interactive: 0 )] a100: 1, h200: 2
kelvinkn [total: 3 (interactive: 0 )] a100: 3
syu03 [total: 4 (interactive: 0 )] a100: 4
qinanyu [total: 4 (interactive: 0 )] h200: 4
esui [total: 6 (interactive: 0 )] h200: 2, a100: 4
wanjiazh [total: 12 (interactive: 0 )] a100: 8, h200: 4
--------------------------------------------------------------------
There are 43 gpus available:
a100: 37 available
h100: 6 available
h200: 0 available
--------------------------------------------------------------------
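The availability summary makes it easy to pick a GPU type before submitting. A sketch that selects the type with the most free GPUs from the listing above; the srun line in the trailing comment is illustrative, with the partition name assumed from the output above:

```shell
# The availability lines from the sgpu output above.
avail='a100: 37 available
h100: 6 available
h200: 0 available'

# Pick the GPU type with the most free units.
best=$(printf '%s\n' "$avail" | awk '
  { n = $2 + 0; sub(":", "", $1); if (n > max) { max = n; type = $1 } }
  END { print type }')
echo "$best"

# One might then request such a GPU interactively, e.g.:
#   srun -p sphinx --gres=gpu:a100:1 --pty bash
```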