Slurm Batch System
Slurm User Interfaces (UIs)
Please log in to slurm-ui.twgrid.org to use the Slurm batch job system.
User Interface Nodes | OS | Purpose | Note
---|---|---|---
slurm-ui.twgrid.org | CentOS 7 | Job submission, file download/upload | General portal; DNS round robin balances load among slurm-ui01, slurm-ui02, and slurm-ui03
slurm-ui-asiop.twgrid.org | CentOS 7 | Job submission, file download/upload | Dedicated to AS-IOP users; online since 2022-09-19
Note
Do not run computing jobs on the UI nodes, whose resources are limited; such jobs will be killed without notice.
Your IP will be banned if your failed password logins exceed 5 attempts; this guards against brute-force attacks.
To preview files with the GUI software installed on the UIs, log in with X11 forwarding enabled:
ssh -XY <your_account>@slurm-ui.twgrid.org
If you are using a Windows™ system, install and run Xming before connecting to the UI. On macOS™, install XQuartz to provide X11 capability.
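Once connected, a quick sanity check (a minimal sketch; the viewer programs actually installed on the UI may differ) confirms that X11 forwarding is working before you launch any GUI software:

```shell
# Check whether X11 forwarding is active before launching GUI software.
if [ -n "$DISPLAY" ]; then
    echo "X11 forwarding active: DISPLAY=$DISPLAY"
else
    echo "No DISPLAY set; reconnect with: ssh -XY <your_account>@slurm-ui.twgrid.org"
fi
```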
Software | Filetype | Type
---|---|---
 | text editor | CLI
 | images | GUI
 | pdf files | GUI
Slurm Resources and Queues
Slurm Resources
Cluster | Worker Nodes | Total CPU cores | CPU/node | CPU model | Memory/node | Disk space/node | Network | GPU model | GPU/node
---|---|---|---|---|---|---|---|---|---
HPC_EDR1 | 10 | 1920 | 192 | AMD EPYC 9654 (Genoa) @ 2.4GHz | 1.5TB | 1.92TB (System: 20GB) | 100GbE | N/A | N/A
HPC_HDR1 | 6 | 768 | 128 | AMD EPYC 7662 64-Core Processor | 1520GB | 1TB (System: 20GB) | 100GbE | N/A | N/A
HPC_FDR5 | 22 | 528 | 24 | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 125GB | 2TB (System: 400GB) | 10GbE | N/A | N/A
HPC_FDR5-reserved | 70 | 1680 | 24 | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 125GB | 2TB (System: 400GB) | 10GbE | N/A | N/A
GPU_V100 | 1 | 48 | 48 | Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz | 768GB | 1TB (System: 20GB) | 10GbE | V100 | 8
GPU_A100 | 1 | 64 | 64 | AMD EPYC 7302 16-Core Processor | 1024GB | 1TB (System: 20GB) | 100GbE | A100 | 8
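You can check the live view of these resources yourself from a UI node. The sketch below uses standard Slurm format specifiers (partition, time limit, node count, CPUs per node, and GRES/GPUs) and guards against `sinfo` being absent on non-cluster machines:

```shell
# Inspect partitions and their limits from a UI node.
# sinfo is only available where the Slurm client tools are installed.
if command -v sinfo >/dev/null 2>&1; then
    sinfo -o "%P %l %D %c %G"
else
    echo "sinfo not found: run this on a Slurm UI node"
fi
```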
Slurm Partitions (Queues)
Partition | Timelimit | Nodes | Total CPU Cores | Total GPU Boards | Priority | QoS | Resource | Note
---|---|---|---|---|---|---|---|---
large | 14-00:00:00 | 22 | 528 | N/A | 10 | N/A | FDR5 | UNLIMITED
reserv | 14-00:00:00 | 70 | 1680 | N/A | 10 | N/A | FDR5 | UNLIMITED
long_serial | 14-00:00:00 | 22 | 528 | N/A | 10 | N/A | FDR5 | 1
short | 03:00:00 | 22 | 528 | N/A | 1 | N/A | FDR5 | UNLIMITED
moderate_serial | 2-00:00:00 | 22 | 528 | N/A | 100 | cpu_single_moderate | FDR5 | 70
short_serial | 04:00:00 | 22 | 528 | N/A | 500 | cpu_single_short | FDR5 | 80
development | 01:00:00 | 2 | 48 | N/A | 1000 | N/A | FDR5 | 1
v100 | 5-00:00:00 | 5 | 240 | 40 | 10 | gpu_v100_general | NVIDIA V100 | UNLIMITED
v100_short | 06:00:00 | 5 | 240 | 40 | 500 | gpu_v100_short | NVIDIA V100 | UNLIMITED
v100_long | 7-00:00:00 | 5 | 240 | 40 | 1 | gpu_v100_general | NVIDIA V100 | UNLIMITED
a100 | 5-00:00:00 | 2 | 128 | 16 | 10 | gpu_a100_general | NVIDIA A100 | UNLIMITED
a100_long | 7-00:00:00 | 2 | 128 | 16 | 1 | gpu_a100_general | NVIDIA A100 | UNLIMITED
a100_short | 06:00:00 | 2 | 128 | 16 | 500 | gpu_a100_short | NVIDIA A100 | UNLIMITED
amd | 5-00:00:00 | 4 | 512 | N/A | 1 | N/A | HDR1 | UNLIMITED
amd_short | 04:00:00 | 4 | 512 | N/A | 500 | cpu_single_short | HDR1 | UNLIMITED
amd_devel | 01:00:00 | 4 | 512 | N/A | 1000 | N/A | HDR1 | 1
edr1_large | 14-00:00:00 | 10 | 1920 | N/A | 10 | N/A | EDR1 | UNLIMITED
edr1_long_serial | 14-00:00:00 | 10 | 1920 | N/A | 10 | N/A | EDR1 | 1
edr1_short | 03:00:00 | 10 | 1920 | N/A | 1 | N/A | EDR1 | UNLIMITED
edr1_moderate_serial | 2-00:00:00 | 10 | 1920 | N/A | 100 | cpu_single_moderate | EDR1 | UNLIMITED
edr1_short_serial | 04:00:00 | 10 | 1920 | N/A | 500 | cpu_single_short | EDR1 | UNLIMITED
almalinux9-amd | 5-00:00:00 | 2 | 256 | N/A | 1 | N/A | HDR1 | UNLIMITED
Note
The partitions draw on shared pools of nodes, so resources consumed through one queue reduce what is available to the others.
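A minimal batch script targeting the short partition can look like the sketch below (the job name, task count, and time limit are example values; pick a partition from the table above that matches your workload):

```shell
# Write a minimal CPU batch script for the short partition.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
srun hostname
EOF
# Submit from a UI node with: sbatch hello_job.sh
```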
Slurm Quality of Service (QoS)
QoS Name | Flags | MaxTRES | MaxTRESPerUser | MinTRES
---|---|---|---|---
gpu_general | DenyOnLimit | gres/gpu=8 | gres/gpu=1 | 
gpu_short | DenyOnLimit | gres/gpu=8 | gres/gpu=1 | 
gputest | DenyOnLimit | gres/gpu=8 | cpu=4 | 
cpu_single_short | DenyOnLimit | cpu=24 | | 
cpu_single_moderate | DenyOnLimit | cpu=24 | | 
gpu_a100_general | DenyOnLimit | gres/gpu=8 | gres/gpu=1 | 
gpu_a100_short | DenyOnLimit | gres/gpu=8 | gres/gpu=1 | 
gpu_v100_short | DenyOnLimit | gres/gpu=8 | gres/gpu=1 | 
gpu_v100_general | DenyOnLimit | gres/gpu=8 | gres/gpu=1 | 
gpu_a100_devel | | gres/gpu=1 | | 
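A GPU job combines a GPU partition with its matching QoS. The sketch below requests one V100 under gpu_v100_short (per the table above, that QoS caps each user at gres/gpu=1); the job name and time limit are example values:

```shell
# Write a batch script requesting one V100 GPU under the short GPU QoS.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --partition=v100_short
#SBATCH --qos=gpu_v100_short
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
nvidia-smi
EOF
# Submit with: sbatch gpu_job.sh
```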
System Topography
The system scheme is shown in the following image. Network connections are mostly 10G Ethernet.
Slurm Software Repository
environment-modules
Set up your software environment easily with environment-modules (the module command). List the available software modules with:
module avail
Load MPICH + GCC 4.8:
module load gcc/4.8.5
module load mpich
Unload all loaded modules:
module purge
Load OpenMPI + Intel 2018:
module load intel/2018
module load openmpi
Load OpenMPI + GCC 4.8:
module load gcc/4.8.5
module load openmpi
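Module loads go inside the batch script itself, so the job's compute environment is set up on the worker node. The sketch below reuses the GCC + OpenMPI combination from the examples above; ./my_mpi_program is a placeholder for your own binary:

```shell
# Write a batch script that loads modules before running an MPI program.
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=short
#SBATCH --ntasks=4
module purge
module load gcc/4.8.5
module load openmpi
srun ./my_mpi_program
EOF
# Submit with: sbatch mpi_job.sh
```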
See also
Slurm Tutorials
On Site Slurm Documents
User documents for Slurm are located in
/ceph/sharedfs/software/tutorial/user_document/
Specify a working directory and copy the scripts into your HOME space to run the examples:
/ceph/sharedfs/software/user_document/scripts/*
/ceph/sharedfs/software/
/ceph/sharedfs/pkg/
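Copying the example scripts into a working directory might look like the sketch below (the destination directory name is an arbitrary choice; the source path only exists on the cluster filesystem):

```shell
# Copy the tutorial scripts into a fresh working directory under $HOME.
workdir="$HOME/slurm-examples"
mkdir -p "$workdir"
if [ -d /ceph/sharedfs/software/user_document/scripts ]; then
    cp -r /ceph/sharedfs/software/user_document/scripts/. "$workdir/"
fi
echo "working directory: $workdir"
```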
Request for Specific Software Installation
For specific software installation requests, please contact DiCOS-Support@twgrid.org.