Triton Inference Server (DiCOSApp)
Triton Scheme
Triton Server
Using DiCOSAPP to start the Triton server
Ports will be revealed when the container is started
http port
grpc port
metrics port (not opened)
Specifications of the image:
Triton inference server 2.18
P100 GPU x 2
CPU x 4
Memory: 96 GB
Usage:
Start the Triton DiCOSAPP from DiCOS web
When the server starts running, you will see the corresponding status boxes on the DiCOSAPP web page
Get the API ports from the DiCOSAPP web page by pressing the Open button; the ports will be listed:
HTTP
gRPC
Run your Triton client (see next section) to communicate with the server
Note:
For security reasons, the Triton server DiCOSAPP is only accessible from within DiCOS resources
The API server will be: k8s-master01.twgrid.org (a quick connectivity check is sketched below)
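Once the ports are known, it may help to verify that the server is reachable from a DiCOS node before submitting any jobs. A minimal sketch using Triton's standard HTTP endpoints (the port number is only an example; use the HTTP port shown after pressing Open):
server=k8s-master01.twgrid.org
http_port=31443 # example published HTTP port; check the DiCOSAPP web page for the actual value
# /v2/health/ready returns HTTP 200 when the server is ready for inference requests
curl -s -o /dev/null -w "%{http_code}\n" http://$server:$http_port/v2/health/ready
# /v2 returns the server metadata (name, version, extensions) as JSON
curl -s http://$server:$http_port/v2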
Upload Your Model
Currently, we have designated a Ceph path as the model_repository path of the Triton inference server for users:
You could put your file in /ceph/sharedfs/groups/KAGRA/model_repository
Note:
The space used will be accounted as KAGRA group user space
If you are using DiCOS submit, at this stage only the QDR2 and FDR5 clusters have access to the /ceph partition
You can place your customized models in the model repository whether or not the Triton server is running (see the layout sketch below)
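Triton expects each model in the repository to follow its standard layout: one directory per model, containing a config.pbtxt and numbered version subdirectories with the model file. A minimal sketch for a hypothetical ONNX model named my_model (the model name and file names are placeholders):
model_dir=/ceph/sharedfs/groups/KAGRA/model_repository
# create the model directory and a version-1 subdirectory
mkdir -p $model_dir/my_model/1
# copy the model file and its Triton configuration into place
cp model.onnx $model_dir/my_model/1/model.onnx
cp config.pbtxt $model_dir/my_model/config.pbtxt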
Triton Client
There are two different ways to submit your Triton client to our worker nodes:
DiCOS submit (from dicos-ui05.grid.sinica.edu.tw or dicos-ui06.grid.sinica.edu.tw)
Because the client only requests CPU resources, there is no need to specify a queue with GPU resources
Slurm submit (from slurm-ui01.twgrid.org)
Singularity Container
If you are using Python as your programming language for API access, a Singularity image has been built for your use. Location: /ceph/astro_phys/singularity_image/python_tritonclient_slim-buster.sif
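As a quick sanity check of the image, you can run Python inside it directly, assuming the image provides both the HTTP and gRPC client modules:
img=/ceph/astro_phys/singularity_image/python_tritonclient_slim-buster.sif
# confirm that the tritonclient package is importable inside the container
singularity exec $img python3 -c "import tritonclient.http, tritonclient.grpc; print('tritonclient OK')"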
Test Programs
You may get the following test programs from the Triton client GitHub repository (https://github.com/triton-inference-server/client/tree/main/src/python/examples) and download them as shown below:
simple_grpc_keepalive_client.py
simple_http_health_metadata.py
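For example, the two scripts can be downloaded directly (the URLs follow the raw.githubusercontent.com layout of the repository path above):
base=https://raw.githubusercontent.com/triton-inference-server/client/main/src/python/examples
wget $base/simple_http_health_metadata.py
wget $base/simple_grpc_keepalive_client.py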
A simple test program in shell could be written as (test.sh):
#!/bin/bash
server=k8s-master01.twgrid.org
wd=$PWD
http_port=31443 # published port mapped to port 8000 (HTTP) of the original Triton server
grpc_port=30457 # published port mapped to port 8001 (gRPC) of the original Triton server
echo "TEST HTTP"
python3 $wd/simple_http_health_metadata.py -u $server:$http_port
echo "----------------------"
echo "TEST gRPC"
python3 $wd/simple_grpc_keepalive_client.py -u $server:$grpc_port
echo "----------------------"
A customized script utilizing the Singularity container could be written as (start_singularity.sh):
#!/bin/bash
# Start a named instance from the prebuilt tritonclient image
singularity instance start /ceph/astro_phys/singularity_image/python_tritonclient_slim-buster.sif triton_client
# Run the test script inside the instance, then stop it so it does not linger on the node
singularity exec instance://triton_client bash $PWD/test.sh
singularity instance stop triton_client
DiCOS Submit
dicos job submit -i . -c "bash start_singularity.sh" -N triton -j 1
Slurm Submit
sbatch start_singularity.sh
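If you need to request specific resources or a particular partition, standard Slurm directives can be added at the top of start_singularity.sh before submitting; a minimal sketch (the partition name is a placeholder, list the available ones with sinfo):
#!/bin/bash
#SBATCH --job-name=triton_client
#SBATCH --partition=<partition> # placeholder; check available partitions with sinfo
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --output=triton_client_%j.log
# ...followed by the singularity commands from start_singularity.sh above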
Accounting
The DiCOSAPP will be accounted for its GPU and CPU resources
DiCOS jobs and Slurm jobs will be accounted for their CPU resources