Recommendations

Allocation of whole nodes

On Vega, the option --exclusive is not recommended, and multithreading is enabled (the examples below disable it with --hint=nomultithread). Users can allocate whole nodes by requesting either all CPUs or all memory. Requesting all memory is not recommended, especially for jobs that need a large number of nodes but only a small number of CPUs.

Example of how to allocate all 128 CPUs per node:

#SBATCH --job-name=test         #name of the job
#SBATCH --nodes=2               #number of required nodes
#SBATCH --ntasks-per-node=128   #number of tasks per node
#SBATCH --cpus-per-task=1       #number of CPUs per task
#SBATCH --hint=nomultithread    #disable multithreading, so the job uses physical cores
#SBATCH --partition=cpu         #partition name
#SBATCH --output=foo-out.%j     #output file of the job
#SBATCH --error=foo-err.%j      #error file of the job
#SBATCH --time=48:00:00         #maximum run time

In this example, memory is not specified, which means that the job is allocated all of the memory on each node. This does not mean that the job needs or uses all of that memory. In terms of CPU usage, this job has maximum efficiency. On all HPC Vega login nodes, users can check the resource-usage efficiency of a completed job with the command seff. Check the efficiency of your job.
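As a minimal sketch of the check described above, the helper below wraps seff with basic argument handling. The function name report_efficiency and the job ID in the usage note are hypothetical; only the seff command itself comes from this page, and it is assumed to take the job ID as its single argument.

```shell
#!/bin/bash
# report_efficiency JOBID
# Hypothetical helper: prints the seff report for a completed job.
report_efficiency() {
    local jobid="$1"
    if [ -z "$jobid" ]; then
        echo "usage: report_efficiency <jobid>" >&2
        return 2
    fi
    # seff is only expected to exist on the login nodes.
    if ! command -v seff >/dev/null 2>&1; then
        echo "seff not found; run this on a login node" >&2
        return 1
    fi
    seff "$jobid"
}
```

On a login node, calling report_efficiency 1234567 (a placeholder job ID; use the one printed when the job was submitted) should print the CPU and memory efficiency of that job.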

Not recommended example

#SBATCH --job-name=test       #name of the job
#SBATCH --nodes=2             #number of required nodes
#SBATCH --ntasks-per-node=64  #number of tasks per node
#SBATCH --cpus-per-task=1     #number of CPUs per task
#SBATCH --hint=nomultithread  #disable multithreading, so the job uses physical cores
#SBATCH --mem=256GB           #memory per node for the job
#SBATCH --partition=cpu       #partition name
#SBATCH --output=foo-out.%j   #output file of the job
#SBATCH --error=foo-err.%j    #error file of the job
#SBATCH --time=48:00:00       #maximum run time

In this example, the job needs 64 CPUs and 256 GB of memory per node. Because of the memory request, the scheduler allocates the whole node, and no other job can run on it. In terms of CPU usage, this job has 50% efficiency, which means that half of the CPUs on each node are unusable. In terms of memory usage, this job has 100% efficiency. However, if the job requires 100 nodes instead of 2, the system will have 100 x 64 = 6,400 unusable CPUs. With 256 nodes, the number of unusable CPUs is 16,384.
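The arithmetic in the paragraph above can be reproduced with a short shell sketch. The function name idle_cpus is hypothetical, and the calculation assumes 128 CPUs per node (as in the first example) and one CPU per task.

```shell
#!/bin/bash
# idle_cpus NODES CPUS_PER_NODE USED_PER_NODE
# Hypothetical helper: number of CPUs that are blocked but unused
# when a job occupies whole nodes and runs on only part of each node.
idle_cpus() {
    local nodes="$1" cpus_per_node="$2" used_per_node="$3"
    echo $(( nodes * (cpus_per_node - used_per_node) ))
}

idle_cpus 2 128 64      # prints 128
idle_cpus 100 128 64    # prints 6400
idle_cpus 256 128 64    # prints 16384
```

The idle count grows linearly with the number of nodes, which is why the memory-based allocation is most costly for large jobs.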
