Slurm Batch System

Slurm User Interfaces (UIs)

Please log in to slurm-ui.twgrid.org to use the Slurm batch job system (a login example follows the table below).

User Interface Nodes      | OS          | Purpose                              | Note
--------------------------|-------------|--------------------------------------|----------------------------------------------------------------------------------------
slurm-ui.twgrid.org       | CentOS 7    | Job submission, file download/upload | General portal; DNS round robin balances load among slurm-ui01, slurm-ui02 and slurm-ui03
slurm-ui04.twgrid.org     | AlmaLinux 9 | Job submission, file download/upload | UI node for the AlmaLinux partitions
slurm-ui-asiop.twgrid.org | CentOS 7    | Job submission, file download/upload | Dedicated to AS-IOP users; online since 2022-09-19
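
For example, to log in to the general portal with a standard SSH client (replace <your_account> with your DiCOS account name):

    ssh <your_account>@slurm-ui.twgrid.org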

Note

  • Do not run computing jobs on the UI nodes; their resources are limited, and such jobs will be killed without notice.

  • To protect against brute-force attacks, your IP address will be banned if you fail password login more than 5 times.

  • To preview files with the GUI software installed on the UIs, log in with X11 forwarding enabled:

    ssh -XY <your_account>@slurm-ui.twgrid.org
    

    This logs you in to the UI with X11 forwarding enabled. If you are using Windows™, install and run Xming before connecting to the UI; on macOS™, install XQuartz to get X11 support. The GUI tools available on the UIs include (a usage example follows the table):

    Software  | Filetype  | Type
    ----------|-----------|-----
    vim, nano | text      | CLI
    eog       | images    | GUI
    xpdf      | pdf files | GUI
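
    For example, once logged in with X11 forwarding you can open files with the GUI tools above (the filenames here are illustrative):

    eog ~/results/plot.png &
    xpdf ~/papers/draft.pdf &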

Slurm Resources and Queues

Slurm Resources

Slurm resources as of 2024-05-10:

Cluster           | Worker Nodes | Total CPU cores | CPU/node | CPU model                                 | Memory/node | Disk space/node       | Network | GPU model | GPU/node
------------------|--------------|-----------------|----------|-------------------------------------------|-------------|-----------------------|---------|-----------|---------
HPC_EDR1          | 10           | 1920            | 192      | AMD EPYC 9654 (Genoa) @ 2.4GHz            | 1.5TB       | 1.92TB (System: 20GB) | 100GbE  | N/A       | N/A
HPC_HDR1          | 6            | 768             | 128      | AMD EPYC 7662 64-Core Processor           | 1520GB      | 1TB (System: 20GB)    | 100GbE  | N/A       | N/A
Intel-G4          | 2            | 256             | 128      | Intel(R) Xeon(R) Gold 6448H               | 1520GB      | 1.06TB                | 100GbE  | N/A       | N/A
GPU_V100          | 1            | 48              | 48       | Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz  | 768GB       | 1TB (System: 20GB)    | 10GbE   | V100      | 8
GPU_A100          | 1            | 64              | 64       | AMD EPYC 7302 16-Core Processor           | 1024GB      | 1TB (System: 20GB)    | 100GbE  | A100      | 8
HPC_FDR5-reserved | 92           | 2208            | 24       | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 125GB       | 2TB (System: 400GB)   | 10GbE   | N/A       | N/A
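
The tables here are a snapshot; you can query the live configuration from any UI node with standard Slurm commands, for example:

# partitions with time limits, node counts and CPU states (allocated/idle/other/total)
sinfo -o "%20P %12l %6D %12C"

# full configuration of a single partition
scontrol show partition edr1_short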

Slurm Partitions (Queues)

Slurm partitions as of 2024-05-10. (Partitions may be updated on demand; please check slurm-ui.twgrid.org for up-to-date information, e.g. with the sinfo query shown above.)

Partition                    | Cluster     | Timelimit   | Nodes | Total CPU Cores | Total GPU Boards
-----------------------------|-------------|-------------|-------|-----------------|-----------------
hdr1-al9_large               | AMD HDR1    | 14-00:00:00 | 6     | 768             | N/A
hdr1-al9_short               | AMD HDR1    | 03:00:00    | 6     | 768             | N/A
hdr1-al9_long_serial         | AMD HDR1    | 14-00:00:00 | 6     | 768             | N/A
hdr1-al9_moderate_serial     | AMD HDR1    | 2-00:00:00  | 6     | 768             | N/A
hdr1-al9_short_serial        | AMD HDR1    | 04:00:00    | 6     | 768             | N/A
edr1_large                   | AMD EDR1    | 14-00:00:00 | 10    | 1920            | N/A
edr1_short*                  | AMD EDR1    | 03:00:00    | 10    | 1920            | N/A
edr1_long_serial             | AMD EDR1    | 14-00:00:00 | 10    | 1920            | N/A
edr1_moderate_serial         | AMD EDR1    | 2-00:00:00  | 10    | 1920            | N/A
edr1_short_serial            | AMD EDR1    | 04:00:00    | 10    | 1920            | N/A
intel-g4-al9_large           | Intel       | 14-00:00:00 | 2     | 256             | N/A
intel-g4-al9_short           | Intel       | 03:00:00    | 2     | 256             | N/A
intel-g4-al9_long_serial     | Intel       | 14-00:00:00 | 2     | 256             | N/A
intel-g4-al9_moderate_serial | Intel       | 2-00:00:00  | 2     | 256             | N/A
intel-g4-al9_short_serial    | Intel       | 04:00:00    | 2     | 256             | N/A
v100                         | NVIDIA V100 | 5-00:00:00  | 5     | 240             | 40
v100_long                    | NVIDIA V100 | 7-00:00:00  | 5     | 240             | 40
v100_short                   | NVIDIA V100 | 06:00:00    | 5     | 240             | 40
a100                         | NVIDIA A100 | 5-00:00:00  | 2     | 128             | 16
a100_long                    | NVIDIA A100 | 7-00:00:00  | 2     | 128             | 16
a100_short                   | NVIDIA A100 | 06:00:00    | 2     | 128             | 16
a100_devel                   | NVIDIA A100 | 00:20:00    | 1     | 64              | 8

Note

  • The same physical nodes back several queues, so queues that share nodes are mutually exclusive: capacity in use by one queue is unavailable to the others.

  • The partition marked with an asterisk (edr1_short*) is the default partition.

  • Historical Resource Allocation
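
A minimal CPU batch script might look like the sketch below (the job name, core count and workload are illustrative, not a site-provided template):

#!/bin/bash
#SBATCH --job-name=cpu-test        # illustrative job name
#SBATCH --partition=edr1_short     # default partition, 03:00:00 time limit
#SBATCH --nodes=1
#SBATCH --ntasks=4                 # request 4 CPU cores
#SBATCH --time=01:00:00            # must stay within the partition time limit
#SBATCH --output=%x-%j.out         # output file named <jobname>-<jobid>.out

srun hostname                      # replace with your real workload

Submit it with sbatch cpu-test.sh and monitor it with squeue -u $USER.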

Slurm Quality of Service (QoS)

Slurm QoS as of 2024-05-10 (UTC):

Partition                    | Priority Tier | MaxNodes  | QoS Name            | MaxTRES    | MaxTRESPerUser | MinTRES    | Flags
-----------------------------|---------------|-----------|---------------------|------------|----------------|------------|------------
hdr1-al9_large               | 10            | UNLIMITED | N/A                 | N/A        | N/A            | N/A        | N/A
hdr1-al9_short               | 1             | UNLIMITED | N/A                 | N/A        | N/A            | N/A        | N/A
hdr1-al9_long_serial         | 10            | 1         | N/A                 | N/A        | N/A            | N/A        | N/A
hdr1-al9_moderate_serial     | 100           | UNLIMITED | cpu_single_moderate | CPU=24     | N/A            | N/A        | DenyOnLimit
hdr1-al9_short_serial        | 500           | UNLIMITED | cpu_single_short    | CPU=24     | N/A            | N/A        | DenyOnLimit
edr1_large                   | 10            | UNLIMITED | N/A                 | N/A        | N/A            | N/A        | N/A
edr1_short*                  | 1             | UNLIMITED | N/A                 | N/A        | N/A            | N/A        | N/A
edr1_long_serial             | 10            | 1         | N/A                 | N/A        | N/A            | N/A        | N/A
edr1_moderate_serial         | 100           | UNLIMITED | cpu_single_moderate | CPU=24     | N/A            | N/A        | DenyOnLimit
edr1_short_serial            | 500           | UNLIMITED | cpu_single_moderate | CPU=24     | N/A            | N/A        | DenyOnLimit
intel-g4-al9_large           | 10            | UNLIMITED | N/A                 | N/A        | N/A            | N/A        | N/A
intel-g4-al9_short           | 1             | UNLIMITED | N/A                 | N/A        | N/A            | N/A        | N/A
intel-g4-al9_long_serial     | 10            | 1         | N/A                 | N/A        | N/A            | N/A        | N/A
intel-g4-al9_moderate_serial | 100           | UNLIMITED | cpu_single_moderate | CPU=24     | N/A            | N/A        | DenyOnLimit
intel-g4-al9_short_serial    | 500           | UNLIMITED | cpu_single_short    | CPU=24     | N/A            | N/A        | DenyOnLimit
v100                         | 10            | UNLIMITED | gpu_v100_general    | gres/gpu=8 | N/A            | gres/gpu=1 | DenyOnLimit
v100_long                    | 1             | UNLIMITED | gpu_v100_general    | gres/gpu=8 | N/A            | gres/gpu=1 | DenyOnLimit
v100_short                   | 500           | UNLIMITED | gpu_v100_short      | gres/gpu=8 | N/A            | gres/gpu=1 | DenyOnLimit
a100                         | 10            | UNLIMITED | gpu_a100_general    | gres/gpu=8 | N/A            | gres/gpu=1 | DenyOnLimit
a100_long                    | 1             | UNLIMITED | gpu_a100_general    | gres/gpu=8 | N/A            | gres/gpu=1 | DenyOnLimit
a100_short                   | 500           | UNLIMITED | gpu_a100_short      | gres/gpu=8 | N/A            | gres/gpu=1 | DenyOnLimit
a100_devel                   | 10000         | 1         | gpu_a100_devel      | gres/gpu=1 | N/A            | N/A        | N/A

Note

  • Brief explanation of the QoS columns:

  • Priority Tier: jobs in a partition with a higher priority tier are scheduled ahead of jobs in lower-tier partitions on the same resources.

  • MaxTRES: the maximum trackable resources (TRES, e.g. CPUs or GPUs) a single job can request under the QoS.

  • MaxTRESPerUser: the maximum total TRES that one user's running jobs can consume under the QoS.

  • MinTRES: the minimum TRES a job must request under the QoS (e.g. at least one GPU on the GPU partitions).

  • MaxNodes: the maximum number of nodes a single job may use in the partition.

  • Flags: QoS option flags; DenyOnLimit rejects a job at submission time if it requests more than the QoS limits allow, instead of leaving it pending.

  • More information at https://slurm.schedmd.com/qos.html
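
For example, the rules above mean a job on the v100 or a100 partitions must request at least one GPU (MinTRES gres/gpu=1) and at most eight (MaxTRES gres/gpu=8). A minimal GPU job sketch (job name and parameters are illustrative):

#!/bin/bash
#SBATCH --job-name=gpu-test        # illustrative job name
#SBATCH --partition=a100_devel     # 20-minute development partition
#SBATCH --gres=gpu:1               # a100_devel allows at most one GPU (MaxTRES gres/gpu=1)
#SBATCH --time=00:15:00            # within the 00:20:00 partition limit

nvidia-smi                         # confirm which GPU was allocated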

Slurm Software Repository

environment-modules

You can easily set up your software environment with environment-modules (the module command). List the available software modules with:

module avail

Load MPICH + GCC 4.8.5:

module load gcc/4.8.5
module load mpich

Unload all loaded modules:

module purge

Load OpenMPI + Intel 2018:

module load intel/2018
module load openmpi

Load OpenMPI + GCC 4.8.5:

module load gcc/4.8.5
module load openmpi

Load the Intel MPI library and Intel compilers on AlmaLinux 9:

module load intel_mpi/2021.6.0
module load icc/2022.1.0
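
As a sketch of how these modules are typically combined (hello.c and the launch step are illustrative, not a site-provided example), compile an MPI program and run it inside a batch allocation:

# on the UI, compile with the toolchain loaded above
module load gcc/4.8.5
module load openmpi
mpicc -o hello_mpi hello.c

# inside an sbatch script: launch one MPI rank per allocated task
srun ./hello_mpi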

Slurm Tutorials

On Site Slurm Documents

User documents for Slurm are located in:

/ceph/sharedfs/software/tutorial/user_document/

To run the examples, copy the scripts into a working directory under your HOME space. The example scripts and software areas are:

/ceph/sharedfs/software/user_document/scripts/*
/ceph/sharedfs/software/
/ceph/sharedfs/pkg/
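
For example (the working-directory name is illustrative):

mkdir -p ~/slurm-examples
cp -r /ceph/sharedfs/software/user_document/scripts/* ~/slurm-examples/
cd ~/slurm-examples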

Request for Specific Software Installation

For specific software installation requests, please contact DiCOS-Support@twgrid.org.