Slurm Batch System
Slurm User Interfaces (UIs)
Please log in to slurm-ui.twgrid.org to use the Slurm batch job system.
| User Interface Nodes | OS | Purpose | Note |
|---|---|---|---|
| slurm-ui.twgrid.org | CentOS 7 | Job submission, file download/upload | General portal; DNS round robin load-balances among slurm-ui01, slurm-ui02 and slurm-ui03 |
| slurm-ui04.twgrid.org | AlmaLinux 9 | Job submission, file download/upload | UI node for the AlmaLinux partitions |
| slurm-ui-asiop.twgrid.org | CentOS 7 | Job submission, file download/upload | Dedicated to AS-IOP users; online since 2022-09-19 |
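For example, to log in to the general portal from a terminal (replace <your_account> with your DiCOS account name):
ssh <your_account>@slurm-ui.twgrid.org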
Note
Do not run computing jobs on the UI nodes; their resources are limited, and such jobs will be killed without notice.
Your IP will be banned if your password login failures exceed 5 times; this protects the service against brute-force attacks.
To preview files with the GUI software installed on the UIs, follow the example below:
ssh -XY <your_account>@slurm-ui.twgrid.org
to log in to the UI with X11 forwarding enabled. If you are using a Windows™ system, please install and run Xming before connecting to the UI. If you are using macOS™, you will need to install XQuartz to get X11 capability on macOS™.
| Software | Filetype | Type |
|---|---|---|
|  | text editor | CLI |
|  | images | GUI |
|  | pdf files | GUI |
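As a sketch of previewing a file through X11 forwarding, assuming a GUI PDF viewer such as evince is installed on the UI (the viewer name and file path below are only illustrative; check which tools are actually available first):
ssh -XY <your_account>@slurm-ui.twgrid.org
evince ~/my_results/report.pdf   # hypothetical path; the viewer window opens on your local display via X11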
Slurm Resources and Queues
Slurm Resources
| Cluster | Worker Nodes | Total CPU cores | CPU/node | CPU model | Memory/node | Disk space/node | Network | GPU model | GPU/node |
|---|---|---|---|---|---|---|---|---|---|
| HPC_EDR1 | 10 | 1920 | 192 | AMD EPYC 9654 (Genoa) @ 2.4GHz | 1.5TB | 1.92TB (System: 20GB) | 100GbE | N/A | N/A |
| HPC_HDR1 | 6 | 768 | 128 | AMD EPYC 7662 64-Core Processor | 1520GB | 1TB (System: 20GB) | Eth: 100Gbps | N/A | N/A |
| Intel-G4 | 2 | 256 | 128 | Intel(R) Xeon(R) Gold 6448H | 1520GB | 1.06TB | Eth: 100Gbps | N/A | N/A |
| GPU_V100 | 1 | 48 | 48 | Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz | 768GB | 1TB (System: 20GB) | 10GbE | V100 | 8 |
| GPU_A100 | 1 | 64 | 64 | AMD EPYC 7302 16-Core Processor | 1024GB | 1TB (System: 20GB) | 100GbE | A100 | 8 |
| HPC_FDR5-reserved | 92 | 2208 | 24 | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 125GB | 2TB (System: 400GB) | 10GbE | N/A | N/A |
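You can query the current node layout yourself from a UI node with sinfo; the format fields below are standard Slurm ones (partition, time limit, node count, CPUs per node, memory per node, and GRES such as GPUs):
sinfo -o "%P %l %D %c %m %G"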
Slurm Partitions (Queues)
| Partition | Cluster | Timelimit | Nodes | Total CPU Cores | Total GPU Boards |
|---|---|---|---|---|---|
| hdr1-al9_large | AMD HDR1 | 14-00:00:00 | 6 | 768 | N/A |
| hdr1-al9_short | AMD HDR1 | 03:00:00 | 6 | 768 | N/A |
| hdr1-al9_long_serial | AMD HDR1 | 14-00:00:00 | 6 | 768 | N/A |
| hdr1-al9_moderate_serial | AMD HDR1 | 2-00:00:00 | 6 | 768 | N/A |
| hdr1-al9_short_serial | AMD HDR1 | 04:00:00 | 6 | 768 | N/A |
| edr1_large | AMD EDR1 | 14-00:00:00 | 10 | 1920 | N/A |
| edr1_short* | AMD EDR1 | 03:00:00 | 10 | 1920 | N/A |
| edr1_long_serial | AMD EDR1 | 14-00:00:00 | 10 | 1920 | N/A |
| edr1_moderate_serial | AMD EDR1 | 2-00:00:00 | 10 | 1920 | N/A |
| edr1_short_serial | AMD EDR1 | 04:00:00 | 10 | 1920 | N/A |
| intel-g4-al9_large | Intel | 14-00:00:00 | 2 | 256 | N/A |
| intel-g4-al9_short | Intel | 03:00:00 | 2 | 256 | N/A |
| intel-g4-al9_long_serial | Intel | 14-00:00:00 | 2 | 256 | N/A |
| intel-g4-al9_moderate_serial | Intel | 2-00:00:00 | 2 | 256 | N/A |
| intel-g4-al9_short_serial | Intel | 04:00:00 | 2 | 256 | N/A |
| v100 | NVIDIA V100 | 5-00:00:00 | 5 | 240 | 40 |
| v100_long | NVIDIA V100 | 7-00:00:00 | 5 | 240 | 40 |
| v100_short | NVIDIA V100 | 06:00:00 | 5 | 240 | 40 |
| a100 | NVIDIA A100 | 5-00:00:00 | 2 | 128 | 16 |
| a100_long | NVIDIA A100 | 7-00:00:00 | 2 | 128 | 16 |
| a100_short | NVIDIA A100 | 06:00:00 | 2 | 128 | 16 |
| a100_devel | NVIDIA A100 | 00:20:00 | 1 | 64 | 8 |
Note
The same worker nodes back several partitions, so the resources listed per partition overlap rather than add up: a node that is busy in one partition is unavailable to the others. The asterisk after edr1_short marks the default partition, following the sinfo convention.
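A minimal CPU batch script sketch targeting one of the partitions above (job name, resource counts, and the executable are placeholders to adapt):
#!/bin/bash
#SBATCH --job-name=cpu_test
#SBATCH --partition=edr1_short
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=01:00:00          # must stay within the 03:00:00 limit of edr1_short
#SBATCH --output=%x-%j.out

srun hostname                    # replace with your own program
Submit it with sbatch job.sh and monitor it with squeue -u $USER.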
Slurm Quality of Service (QoS)
| Partition | Priority Tier | MaxNodes | QoS Name | MaxTRES | MaxTRESPerUser | MinTRES | Flags |
|---|---|---|---|---|---|---|---|
| hdr1-al9_large | 10 | UNLIMITED | N/A | N/A | N/A | N/A | N/A |
| hdr1-al9_short | 1 | UNLIMITED | N/A | N/A | N/A | N/A | N/A |
| hdr1-al9_long_serial | 10 | 1 | N/A | N/A | N/A | N/A | N/A |
| hdr1-al9_moderate_serial | 100 | UNLIMITED | cpu_single_moderate | CPU=24 | N/A | N/A | DenyOnLimit |
| hdr1-al9_short_serial | 500 | UNLIMITED | cpu_single_short | CPU=24 | N/A | N/A | DenyOnLimit |
| edr1_large | 10 | UNLIMITED | N/A | N/A | N/A | N/A | N/A |
| edr1_short* | 1 | UNLIMITED | N/A | N/A | N/A | N/A | N/A |
| edr1_long_serial | 10 | 1 | N/A | N/A | N/A | N/A | N/A |
| edr1_moderate_serial | 100 | UNLIMITED | cpu_single_moderate | CPU=24 | N/A | N/A | DenyOnLimit |
| edr1_short_serial | 500 | UNLIMITED | cpu_single_moderate | CPU=24 | N/A | N/A | DenyOnLimit |
| intel-g4-al9_large | 10 | UNLIMITED | N/A | N/A | N/A | N/A | N/A |
| intel-g4-al9_short | 1 | UNLIMITED | N/A | N/A | N/A | N/A | N/A |
| intel-g4-al9_long_serial | 10 | 1 | N/A | N/A | N/A | N/A | N/A |
| intel-g4-al9_moderate_serial | 100 | UNLIMITED | cpu_single_moderate | CPU=24 | N/A | N/A | DenyOnLimit |
| intel-g4-al9_short_serial | 500 | UNLIMITED | cpu_single_short | CPU=24 | N/A | N/A | DenyOnLimit |
| v100 | 10 | UNLIMITED | gpu_v100_general | gres/gpu=8 | N/A | gres/gpu=1 | DenyOnLimit |
| v100_long | 1 | UNLIMITED | gpu_v100_general | gres/gpu=8 | N/A | gres/gpu=1 | DenyOnLimit |
| v100_short | 500 | UNLIMITED | gpu_v100_short | gres/gpu=8 | N/A | gres/gpu=1 | DenyOnLimit |
| a100 | 10 | UNLIMITED | gpu_a100_general | gres/gpu=8 | N/A | gres/gpu=1 | DenyOnLimit |
| a100_long | 1 | UNLIMITED | gpu_a100_general | gres/gpu=8 | N/A | gres/gpu=1 | DenyOnLimit |
| a100_short | 500 | UNLIMITED | gpu_a100_short | gres/gpu=8 | N/A | gres/gpu=1 | DenyOnLimit |
| a100_devel | 10000 | 1 | gpu_a100_devel | gres/gpu=1 | N/A | N/A | N/A |
Note
Brief Explanation of QoS
Priority Tier: scheduling priority of the partition; jobs in a higher tier are considered before jobs in lower tiers.
MaxTRES: maximum trackable resources (CPUs, GPUs, ...) a single job may request under the QoS.
MaxTRESPerUser: maximum trackable resources a single user may use at once under the QoS.
MinTRES: minimum trackable resources a job must request to run under the QoS.
MaxNodes: maximum number of nodes a single job may use.
Flags: QoS behaviour flags; DenyOnLimit rejects jobs that exceed the QoS limits at submission instead of leaving them pending.
More information at https://slurm.schedmd.com/qos.html
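Tying this to the table: a job in a100_short must request at least one GPU (MinTRES gres/gpu=1) and no more than eight (MaxTRES gres/gpu=8), otherwise DenyOnLimit rejects it at submission. A hedged batch script sketch (CPU count and application are placeholders):
#!/bin/bash
#SBATCH --partition=a100_short
#SBATCH --gres=gpu:1             # the QoS requires at least 1 GPU and allows up to 8
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00          # within the 06:00:00 limit of a100_short

nvidia-smi                       # replace with your GPU application
You can also inspect the configured limits yourself with sacctmgr show qos and scontrol show partition a100_short.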
Slurm Software Repository
environment-modules
Set up your software environment with environment-modules (the module command). List the available software modules with:
module avail
Load MPICH2 + gcc48:
module load gcc/4.8.5
module load mpich
Unload all loaded modules:
module purge
Load OpenMPI + Intel 2018:
module load intel/2018
module load openmpi
Load OpenMPI + gcc48:
module load gcc/4.8.5
module load openmpi
Load Intel MPI + Intel compilers (AlmaLinux 9):
module load intel_mpi/2021.6.0
module load icc/2022.1.0
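Combining modules with a batch job, a sketch of an MPI run that loads one of the toolchains listed above before executing (the executable and task counts are placeholders; pick a partition whose OS matches the modules you load):
#!/bin/bash
#SBATCH --partition=edr1_short
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --time=02:00:00

module purge
module load gcc/4.8.5
module load openmpi

srun ./my_mpi_program            # placeholder; build it with the same loaded toolchain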
See also
Slurm Tutorials
On Site Slurm Documents
User documents for Slurm are located in
/ceph/sharedfs/software/tutorial/user_document/
To run the examples, create a working directory in your HOME space and copy the scripts from
/ceph/sharedfs/software/user_document/scripts/*
Shared software and packages are installed under
/ceph/sharedfs/software/
/ceph/sharedfs/pkg/
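For example, to copy the example scripts into your own working directory (the directory name is arbitrary):
mkdir -p ~/slurm_examples
cp /ceph/sharedfs/software/user_document/scripts/* ~/slurm_examples/
cd ~/slurm_examples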
Request for Specific Software Installation
For specific software requirements, please contact DiCOS-Support@twgrid.org.