History of Slurm Resources and Queues

Slurm Resources

Slurm Resources 2022-08-17

| Cluster | Worker Nodes | Total CPU cores | CPU/node | CPU model | Memory/node | Disk space/node | Network | GPU model | GPU/node |
|---|---|---|---|---|---|---|---|---|---|
| HPC_FDR5 | 92 | 2208 | 24 | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 125GB | 2TB (System: 400GB) | 10GbE | N/A | N/A |
| HPC_HDR1 | 6 | 768 | 128 | AMD EPYC 7662 64-Core Processor | 1520GB | 1TB (System: 20GB) | 100GbE | N/A | N/A |
| GPU_V100 | 1 | 48 | 48 | Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz | 768GB | 1TB (System: 20GB) | 10GbE | V100 | 8 |
| GPU_A100 | 1 | 64 | 64 | AMD EPYC 7302 16-Core Processor | 1024GB | 1TB (System: 20GB) | 100GbE | A100 | 8 |

Slurm Resources 2022-07-27 (Corrected: 2022-07-29)

| Cluster | Worker Nodes | Total CPU cores | CPU/node | CPU model | Memory/node | Disk space/node | Network | GPU model | GPU/node |
|---|---|---|---|---|---|---|---|---|---|
| HPC_QDR4 | 50 | 1000 | 20 | Intel Xeon E5-2650L v2 @ 1.7GHz | 128GB | 1TB (System: 50GB) | 10GbE | N/A | N/A |
| HPC_FDR5 | 92 | 2208 | 24 | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 125GB | 2TB (System: 400GB) | 10GbE | N/A | N/A |
| HPC_HDR1 | 6 | 768 | 128 | AMD EPYC 7662 64-Core Processor | 1520GB | 1TB (System: 20GB) | 100GbE | N/A | N/A |
| GPU_V100 | 1 | 48 | 48 | Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz | 768GB | 1TB (System: 20GB) | 10GbE | V100 | 8 |
| GPU_A100 | 1 | 64 | 64 | AMD EPYC 7302 16-Core Processor | 1024GB | 1TB (System: 20GB) | 100GbE | A100 | 8 |

Slurm Queues

Slurm Partitions since 2022-09-15 13:00 (UTC)

| Partition | Timelimit | Total CPU Cores | Total GPU Boards | Nodes | Priority | QoS | Resource | Note |
|---|---|---|---|---|---|---|---|---|
| large | 14-00:00:00 | 1920 | N/A | 80 | 10 | N/A | FDR5 | |
| long_serial | 14-00:00:00 | 240 | N/A | 10 | 10 | N/A | FDR5 | MaxNodes=1 |
| short | 3-00:00:00 | 2208 | N/A | 92 | 1 | N/A | FDR5 | default |
| moderate_serial | 4-00:00:00 | 2208 | N/A | 92 | 100 | cpu_single_moderate | FDR5 | MaxNodes=60, new since 2022-09-14 |
| short_serial | 04:00:00 | 2208 | N/A | 92 | 500 | cpu_single_short | FDR5 | MaxNodes=60, new since 2022-09-14 |
| development | 1:00:00 | 48 | N/A | 2 | 1000 | N/A | FDR5 | MaxNodes=1 |
| a100 | 5-00:00:00 | 64 | 8×A100 | 1 | 1 | gpu_a100_general | A100 | new QoS since 2022-09-14 |
| a100_short | 06:00:00 | 64 | 8×A100 | 1 | 500 | gpu_a100_short | A100 | new since 2022-09-14 |
| v100 | 5-00:00:00 | 48 | 8×V100 | 1 | 1 | gpu_v100_general | V100 | new QoS since 2022-09-14 |
| v100_short | 06:00:00 | 48 | 8×V100 | 1 | 500 | gpu_v100_short | V100 | new since 2022-09-14 |
| amd | 5-00:00:00 | 768 | N/A | 6 | 1 | N/A | HDR1 | |
| amd_short | 04:00:00 | 768 | N/A | 6 | 500 | cpu_single_short | HDR1 | new since 2022-09-14 |
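As an illustration of how the partition and QoS columns above combine in practice, the following is a minimal sketch of a batch script for the short_serial partition. The script name, executable, and job name are hypothetical; the partition, QoS, and time limit are taken from the table above.

```shell
#!/bin/bash
# Hypothetical single-core job on the short_serial partition (4-hour limit).
#SBATCH --partition=short_serial
#SBATCH --qos=cpu_single_short   # QoS attached to this partition (see table above)
#SBATCH --ntasks=1               # serial partitions limit each job to one CPU
#SBATCH --time=04:00:00          # must not exceed the partition Timelimit

srun ./my_program                # placeholder for the actual workload
```

A GPU job would instead target, for example, `--partition=a100_short --qos=gpu_a100_short` together with a GRES request such as `--gres=gpu:1`; the GRES name `gpu` is an assumption about the local Slurm configuration.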

Slurm Partitions since 2022-09-14 09:00 (UTC)

| Partition | Timelimit | Total CPU Cores | Total GPU Boards | Nodes | Priority | QoS | Resource | Note |
|---|---|---|---|---|---|---|---|---|
| large | 14-00:00:00 | 1920 | N/A | 80 | 10 | N/A | FDR5 | |
| long_serial | 14-00:00:00 | 240 | N/A | 10 | 10 | N/A | FDR5 | MaxNodes=1 |
| short | 3-00:00:00 | 2208 | N/A | 92 | 1 | N/A | FDR5 | default |
| moderate_serial | 4-00:00:00 | 2208 | N/A | 92 | 100 | cpu_single_moderate | FDR5 | MaxNodes=60, new since 2022-09-14 |
| short_serial | 04:00:00 | 2208 | N/A | 92 | 500 | cpu_single_short | FDR5 | MaxNodes=60, new since 2022-09-14 |
| development | 1:00:00 | 48 | N/A | 2 | 1000 | N/A | FDR5 | MaxNodes=1 |
| a100 | 5-00:00:00 | 64 | 8×A100 | 1 | 1 | gpu_general | A100 | new QoS since 2022-09-14 |
| a100_short | 06:00:00 | 64 | 8×A100 | 1 | 500 | gpu_short | A100 | new since 2022-09-14 |
| v100 | 5-00:00:00 | 48 | 8×V100 | 1 | 1 | gpu_general | V100 | new QoS since 2022-09-14 |
| v100_short | 06:00:00 | 48 | 8×V100 | 1 | 500 | gpu_short | V100 | new since 2022-09-14 |
| amd | 5-00:00:00 | 768 | N/A | 6 | 1 | N/A | HDR1 | |
| amd_short | 04:00:00 | 768 | N/A | 6 | 500 | cpu_single_short | HDR1 | new since 2022-09-14 |

Slurm Partitions after 2022-08-15 (Corrected: 2022-07-29)

| Partition | Timelimit | CPU Cores | GPU Boards | Nodes | Resource |
|---|---|---|---|---|---|
| large | 14-00:00:00 | 1920 | N/A | 80 | FDR5 |
| long_serial | 14-00:00:00 | 240 | N/A | 10 | FDR5 |
| short | 3-00:00:00 | 2208 | N/A | 92 | FDR5 |
| development | 1:00:00 | 48 | N/A | 2 | FDR5 |
| a100 | 5-00:00:00 | 64 | 8×A100 | 1 | A100 |
| v100 | 5-00:00:00 | 48 | 8×V100 | 1 | V100 |
| amd | 5-00:00:00 | 768 | N/A | 6 | HDR1 |

Slurm Partitions 2022-07-27 (Corrected: 2022-07-29)

| Partition | Timelimit | CPU Cores | GPU Boards | Nodes | Resource |
|---|---|---|---|---|---|
| large | 14-00:00:00 | 840+1920 | N/A | 42+80 | QDR4+FDR5 |
| long_serial | 14-00:00:00 | 100+240 | N/A | 5+10 | QDR4+FDR5 |
| short | 3-00:00:00 | 1000+2208 | N/A | 50+92 | QDR4+FDR5 |
| development | 1:00:00 | 20+48 | N/A | 1+2 | QDR4+FDR5 |
| a100 | 5-00:00:00 | 64 | 8×A100 | 1 | A100 |
| v100 | 5-00:00:00 | 48 | 8×V100 | 1 | V100 |
| amd | 5-00:00:00 | 768 | N/A | 6 | HDR1 |

Slurm Partitions 2022-06-27 (Corrected: 2022-07-29)

| Partition | Timelimit | CPU Cores | GPU Boards | Nodes | Resource |
|---|---|---|---|---|---|
| large | 14-00:00:00 | 840 | N/A | 42 | QDR4 |
| long_serial | 14-00:00:00 | 100 | N/A | 5 | QDR4 |
| short | 3-00:00:00 | 1000 | N/A | 50 | QDR4 |
| development | 1:00:00 | 20 | N/A | 1 | QDR4 |
| a100 | 5-00:00:00 | 64 | 8×A100 | 1 | A100 |
| v100 | 5-00:00:00 | 48 | 8×V100 | 1 | V100 |
| amd | 5-00:00:00 | 768 | N/A | 6 | HDR1 |

Note

The underlying nodes are shared among several queues, so capacity consumed through one queue is unavailable to the others.

Slurm Quality of Service (QoS)

Slurm QoS since 2022-09-15 13:00 (UTC)

| QoS Name | Flags | MaxTRES | MaxTRESPerUser | MinTRES |
|---|---|---|---|---|
| gpu_a100_general | DenyOnLimit | gres/gpu=4 | gres/gpu=4 | gres/gpu=1 |
| gpu_a100_short | DenyOnLimit | gres/gpu=4 | gres/gpu=4 | gres/gpu=1 |
| gpu_v100_general | DenyOnLimit | gres/gpu=4 | gres/gpu=4 | gres/gpu=1 |
| gpu_v100_short | DenyOnLimit | gres/gpu=4 | gres/gpu=4 | gres/gpu=1 |
| cpu_single_short | DenyOnLimit | cpu=1 | | |
| cpu_single_moderate | DenyOnLimit | cpu=1 | | |
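The limits above can be checked against the live accounting database with the standard `sacctmgr` listing command; this is a sketch that assumes read access to the cluster's slurmdbd.

```shell
# List the configured QoS entries with the columns shown in the table above
sacctmgr show qos format=Name,Flags,MaxTRES,MaxTRESPerUser,MinTRES
```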

Slurm QoS since 2022-09-14 09:00 (UTC)

| QoS Name | Flags | MaxTRES | MaxTRESPerUser | MinTRES |
|---|---|---|---|---|
| gpu_general | DenyOnLimit | gres/gpu=4 | gres/gpu=4 | gres/gpu=1 |
| gpu_short | DenyOnLimit | gres/gpu=4 | gres/gpu=4 | gres/gpu=1 |
| cpu_single_short | DenyOnLimit | cpu=1 | | |
| cpu_single_moderate | DenyOnLimit | cpu=1 | | |