Slurm Batch System

Slurm User Interfaces (UIs)

Please log in to slurm-ui.twgrid.org to use the Slurm batch job system.
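
For example, assuming your DiCOS account name is <your_account>:

ssh <your_account>@slurm-ui.twgrid.org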

User Interface Nodes      | OS       | Purpose                              | Note
slurm-ui.twgrid.org       | CentOS 7 | Job submission, file download/upload | General portal; DNS round robin for load balancing among slurm-ui01, slurm-ui02 and slurm-ui03
slurm-ui-asiop.twgrid.org | CentOS 7 | Job submission, file download/upload | Dedicated to AS-IOP users; online since 2022-09-19

Note

  • Do not run computing jobs on the UI nodes; their resources are limited, and such jobs will be killed without notice.

  • Your IP will be banned if your password login failures exceed 5 times, to protect the service against brute-force attacks.

  • To preview files with the GUI software installed on the UIs, please follow the example below:

    ssh -XY <your_account>@slurm-ui.twgrid.org
    

    to log in to the UI with X11 forwarding enabled. If you are using a Windows™ system, please install and run Xming before connecting to the UI. If you are using macOS™, install XQuartz to provide X11 support on macOS™.

    Software  | Filetype  | Type
    vim, nano | text      | CLI
    eog       | images    | GUI
    xpdf      | PDF files | GUI
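
    For example, after logging in with X11 forwarding enabled, you can preview an image over the forwarded display with eog (result.png is just a placeholder filename):

    eog result.png &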

Slurm Resources and Queues

Slurm Resources

Slurm Resources 2024-05-10

Cluster           | Worker Nodes | Total CPU cores | CPU/node | CPU model                                 | Memory/node | Disk space/node       | Network | GPU model | GPU/node
HPC_EDR1          | 10           | 1920            | 192      | AMD EPYC 9654 (Genoa) @ 2.4GHz            | 1.5TB       | 1.92TB (System: 20GB) | 100GbE  | N/A       | N/A
HPC_HDR1          | 6            | 768             | 128      | AMD EPYC 7662 64-Core Processor           | 1520GB      | 1TB (System: 20GB)    | 100GbE  | N/A       | N/A
HPC_FDR5          | 22           | 528             | 24       | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 125GB       | 2TB (System: 400GB)   | 10GbE   | N/A       | N/A
HPC_FDR5-reserved | 70           | 1680            | 24       | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 125GB       | 2TB (System: 400GB)   | 10GbE   | N/A       | N/A
GPU_V100          | 1            | 48              | 48       | Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz  | 768GB       | 1TB (System: 20GB)    | 10GbE   | V100      | 8
GPU_A100          | 1            | 64              | 64       | AMD EPYC 7302 16-Core Processor           | 1024GB      | 1TB (System: 20GB)    | 100GbE  | A100      | 8
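
Cluster configuration may change over time; you can query the live partition and node configuration from the UI with standard Slurm commands, for example:

# Summary view of all partitions and their node states
sinfo -s

# Node-oriented view with CPUs, memory and state per node
sinfo -N -l

# Full definition of one partition, e.g. "short"
scontrol show partition short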

Slurm Partitions (Queues)

Slurm partitions updated as of 2024-05-10. (Slurm partitions may be updated on demand; please check slurm-ui.twgrid.org for up-to-date information.)

Partition            | Timelimit   | Nodes | Total CPU Cores | Total GPU Boards | Priority | QoS                 | Resource    | Note
large                | 14-00:00:00 | 22    | 528             | N/A              | 10       | N/A                 | FDR5        | UNLIMITED
reserv               | 14-00:00:00 | 70    | 1680            | N/A              | 10       | N/A                 | FDR5        | UNLIMITED
long_serial          | 14-00:00:00 | 22    | 528             | N/A              | 10       | N/A                 | FDR5        | 1
short                | 03:00:00    | 22    | 528             | N/A              | 1        | N/A                 | FDR5        | UNLIMITED
moderate_serial      | 2-00:00:00  | 22    | 528             | N/A              | 100      | cpu_single_moderate | FDR5        | 70
short_serial         | 04:00:00    | 22    | 528             | N/A              | 500      | cpu_single_short    | FDR5        | 80
development          | 01:00:00    | 2     | 48              | N/A              | 1000     | N/A                 | FDR5        | 1
v100                 | 5-00:00:00  | 5     | 240             | 40               | 10       | gpu_v100_general    | NVIDIA V100 | UNLIMITED
v100_short           | 06:00:00    | 5     | 240             | 40               | 500      | gpu_v100_short      | NVIDIA V100 | UNLIMITED
v100_long            | 7-00:00:00  | 5     | 240             | 40               | 1        | gpu_v100_general    | NVIDIA V100 | UNLIMITED
a100                 | 5-00:00:00  | 2     | 128             | 16               | 10       | gpu_a100_general    | NVIDIA A100 | UNLIMITED
a100_long            | 7-00:00:00  | 2     | 128             | 16               | 1        | gpu_a100_general    | NVIDIA A100 | UNLIMITED
a100_short           | 06:00:00    | 2     | 128             | 16               | 500      | gpu_a100_short      | NVIDIA A100 | UNLIMITED
amd                  | 5-00:00:00  | 4     | 512             | N/A              | 1        | N/A                 | HDR1        | UNLIMITED
amd_short            | 04:00:00    | 4     | 512             | N/A              | 500      | cpu_single_short    | HDR1        | UNLIMITED
amd_devel            | 01:00:00    | 4     | 512             | N/A              | 1000     | N/A                 | HDR1        | 1
edr1_large           | 14-00:00:00 | 10    | 1920            | N/A              | 10       | N/A                 | EDR1        | UNLIMITED
edr1_long_serial     | 14-00:00:00 | 10    | 1920            | N/A              | 10       | N/A                 | EDR1        | 1
edr1_short           | 03:00:00    | 10    | 1920            | N/A              | 1        | N/A                 | EDR1        | UNLIMITED
edr1_moderate_serial | 2-00:00:00  | 10    | 1920            | N/A              | 100      | cpu_single_moderate | EDR1        | UNLIMITED
edr1_short_serial    | 04:00:00    | 10    | 1920            | N/A              | 500      | cpu_single_short    | EDR1        | UNLIMITED
almalinux9-amd       | 5-00:00:00  | 2     | 256             | N/A              | 1        | N/A                 | HDR1        | UNLIMITED

Note

  • The listed resources are shared among different queues, so some queues draw on the same, mutually exclusive pool of resources.

  • Historical Resource Allocation
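
As a sketch of how a partition and its QoS are used together, the following minimal batch script (the job name, output file and core count are placeholders) requests cores in the short_serial partition with its cpu_single_short QoS:

#!/bin/bash
#SBATCH --job-name=cpu_test          # job name shown by squeue
#SBATCH --partition=short_serial     # CPU partition from the table above
#SBATCH --qos=cpu_single_short       # QoS listed for this partition
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24           # stay within the cpu=24 limit of this QoS
#SBATCH --time=04:00:00              # must not exceed the partition Timelimit
#SBATCH --output=%x-%j.out           # %x = job name, %j = job ID

srun hostname                        # placeholder workload

Submit the script with sbatch and monitor it with squeue -u $USER.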

Slurm Quality of Service (QoS)

Slurm QoS since 2024-05-10 (UTC)

QoS Name            | Flags       | Limits (MaxTRES / MaxTRESPerUser / MinTRES)
gpu_general         | DenyOnLimit | gres/gpu=8, gres/gpu=1
gpu_short           | DenyOnLimit | gres/gpu=8, gres/gpu=1
gputest             | DenyOnLimit | gres/gpu=8, cpu=4
cpu_single_short    | DenyOnLimit | cpu=24
cpu_single_moderate | DenyOnLimit | cpu=24
gpu_a100_general    | DenyOnLimit | gres/gpu=8, gres/gpu=1
gpu_a100_short      | DenyOnLimit | gres/gpu=8, gres/gpu=1
gpu_v100_short      | DenyOnLimit | gres/gpu=8, gres/gpu=1
gpu_v100_general    | DenyOnLimit | gres/gpu=8, gres/gpu=1
gpu_a100_devel      |             | gres/gpu=1
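
For GPU jobs, the partition's QoS and a --gres request are specified together. As a minimal sketch (the job name, output file and workload are placeholders), a single-GPU job on the v100 partition could look like:

#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --partition=v100             # GPU partition (see the partition table)
#SBATCH --qos=gpu_v100_general       # QoS attached to the v100 partition
#SBATCH --gres=gpu:1                 # request one V100 GPU
#SBATCH --cpus-per-task=4
#SBATCH --time=1-00:00:00            # within the 5-00:00:00 limit of v100
#SBATCH --output=%x-%j.out

nvidia-smi                           # placeholder workload: show the allocated GPU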

System Topography

  • The system layout is shown in the image below. Most of the network connections are 10G Ethernet.

    _images/slurm_scheme_v5.png

Slurm Software Repository

environment-modules

Set up your software environment with environment-modules (the module command). Use the following command to list the available software modules:

module avail

Load MPICH2 + gcc48:

module load gcc/4.8.5
module load mpich

Unload all loaded modules:

module purge

Load OpenMPI + Intel 2018:

module load intel/2018
module load openmpi

Load OpenMPI + gcc48:

module load gcc/4.8.5
module load openmpi
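
As an illustrative sketch (the program name, partition and core count are placeholders), a batch script can load a toolchain with module before launching an MPI program; whether srun or mpirun is appropriate depends on how the MPI module was built:

#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --partition=short            # CPU partition (see the partition table)
#SBATCH --ntasks=24                  # one MPI rank per core
#SBATCH --time=03:00:00
#SBATCH --output=%x-%j.out

module purge                         # start from a clean environment
module load gcc/4.8.5                # toolchain modules shown above
module load openmpi

srun ./my_mpi_program                # placeholder: your MPI executable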

Slurm Tutorials

On Site Slurm Documents

User documents for SLURM are located in

/ceph/sharedfs/software/tutorial/user_document/

To run the examples, copy the scripts into a working directory under your HOME space.

/ceph/sharedfs/software/user_document/scripts/*
/ceph/sharedfs/software/
/ceph/sharedfs/pkg/
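
For example (work_dir is just a placeholder directory name), using the scripts path listed above:

mkdir -p $HOME/work_dir
cp -r /ceph/sharedfs/software/user_document/scripts/* $HOME/work_dir/
cd $HOME/work_dir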

Request for Specific Software Installation

For software installation requests, please contact DiCOS-Support@twgrid.org.