To give users a fair share of the cluster and to account for differences in the resources they use, Vega uses Slurm’s accounting and fairshare system. Slurm’s Trackable RESources (TRES) let the scheduler charge users according to the resources they consume. On Vega, we set TRES weights for CPU, GPU, and memory usage.
To view the TRES weights configured for a partition, run:
scontrol show partition <name>
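The weights appear in the `TRESBillingWeights` field of that command's output. As a minimal sketch, the field could be extracted in Python as shown here. The helper name `parse_tres_weights` and the sample text are hypothetical, and the sample weights are placeholders rather than Vega's configured values.

```python
import re


def parse_tres_weights(scontrol_output: str) -> dict[str, str]:
    """Return the TRESBillingWeights entries found in the given text.

    The text is expected to look like `scontrol show partition` output.
    Values are kept as strings because memory weights may carry a unit
    suffix such as "G".
    """
    match = re.search(r"TRESBillingWeights=(\S+)", scontrol_output)
    if match is None:
        return {}
    weights = {}
    for item in match.group(1).split(","):
        # Split on the first "=" only, e.g. "CPU=1.0" -> ("CPU", "1.0").
        name, _, value = item.partition("=")
        weights[name] = value
    return weights


# Made-up sample text: the weights are placeholders, not Vega's real values.
sample = "PartitionName=example State=UP\n   TRESBillingWeights=CPU=1.0,Mem=0.5G\n"
print(parse_tres_weights(sample))
```

On a login node, the real text could be obtained by capturing the output of `scontrol show partition <name>` (for example with `subprocess.run`) and passing it to the same function.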
For the CPU partition, the weights are:
For the GPU partition, the weights are:
Since all CPUs are of the same type, we normalise the CPU TRES weight to 1.0. The theoretical peak performance of the AMD EPYC 7H12 processor is 2 TFLOPS per socket, and the theoretical double-precision peak performance of the NVIDIA A100 is 9.5 TFLOPS.
For memory, we set the TRES weight using the following formula:
NumCore * CoreTRES / TotalMem
where NumCore is the number of cores per node, CoreTRES is the TRES weight of a single core, and TotalMem is the total memory available on the node.
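The formula can be checked with a short calculation. The following is only a sketch: the function name `memory_tres_weight` is hypothetical, the node figures (128 cores, 256 GB) are illustrative assumptions rather than values taken from the Vega configuration, and expressing memory in GB is likewise an assumption.

```python
def memory_tres_weight(num_cores: int, core_tres: float, total_mem_gb: float) -> float:
    """Memory TRES weight per GB: NumCore * CoreTRES / TotalMem."""
    if total_mem_gb <= 0:
        raise ValueError("total_mem_gb must be positive")
    return num_cores * core_tres / total_mem_gb


# Hypothetical node: 128 cores, CoreTRES of 1.0, 256 GB of memory.
weight = memory_tres_weight(num_cores=128, core_tres=1.0, total_mem_gb=256)
print(weight)  # 0.5
```

With a weight chosen this way, a job that requests all of a node's memory is charged the same as a job that requests all of its cores.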