Job Management

Job management commands

  • sacct: accounting data for current and completed jobs (sacct -j <Job ID>)
  • sstat: resource-usage statistics for running jobs (sstat -j <Job ID> --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize)
  • scontrol show: display details of an entity, e.g. scontrol show job or scontrol show partition
  • scontrol update: change the attributes of a job
  • scontrol hold: hold a pending job so that it is not scheduled
  • scontrol release: release a held job
  • sprio: displays job priority
  • scancel: cancel the job
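
As a rough illustration of how these commands fit together, the sketch below uses sstat for a job that is still running and falls back to sacct otherwise. The function name job_stats is hypothetical, the sacct field list is only one reasonable choice, and the sketch assumes a single non-array job. On some clusters sstat may need an explicit step such as <Job ID>.batch.

```shell
# Hypothetical helper: print usage statistics for one job.
# Assumes a plain (non-array) job ID is given as the first argument.
job_stats() {
    id="$1"
    # -h drops the header, -o "%t" prints only the compact state code (R, PD, ...).
    # squeue reports an error for jobs that already left the queue, so ignore failures.
    state=$(squeue -j "$id" -h -o "%t" 2>/dev/null || true)
    if [ "$state" = "R" ]; then
        # Job is running: live statistics come from sstat
        sstat -j "$id" --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize
    else
        # Job is pending or finished: query the accounting records instead
        sacct -j "$id" --format=JobID,JobName,State,Elapsed,MaxRSS
    fi
}
```

Called as job_stats <Job ID>, it prints whatever table the selected command produces.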

Cancelling jobs

You may want to terminate running jobs or remove waiting jobs from the queue. The command for both is scancel; see "man scancel" for more information. To cancel specific jobs, for example two of them, pass their job IDs:

$ scancel <Job ID> <Job ID>

The following command

$ scancel -i -u your_account_name

cancels all your jobs, but asks for confirmation before terminating each one. Similarly,

$ scancel -u your_account_name --state=pending

cancels all your pending (waiting) jobs.
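
If you would rather review the pending jobs once instead of confirming each one, the two steps can be combined. The following is a minimal sketch, not a site-provided tool: the function name cancel_pending is hypothetical, and it assumes squeue and scancel are on your PATH.

```shell
# Hypothetical helper: list a user's pending jobs, then cancel them
# after a single confirmation. Defaults to the current user.
cancel_pending() {
    user="${1:-$USER}"
    # -t PENDING filters by state, -h drops the header, -o "%i" prints only job IDs
    ids=$(squeue -u "$user" -t PENDING -h -o "%i")
    if [ -z "$ids" ]; then
        echo "no pending jobs for $user"
        return 0
    fi
    echo "pending jobs:"
    echo "$ids"
    printf 'cancel all of them? [y/N] '
    read -r answer || true
    if [ "$answer" = "y" ]; then
        scancel -u "$user" --state=pending
    else
        echo "nothing cancelled"
    fi
}
```

Anything other than a literal "y" leaves the queue untouched, which makes the default answer the safe one.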

Monitoring jobs

To check the status of your jobs, you can run commands such as:

  • jobinfo
  • jobinfo -u your_account_name
  • squeue

For example, to monitor your jobs with squeue while they are queued or running:

$ squeue -u your_account_name

The output of the squeue command consists of several columns including job ID, partition, job name, username, job state, elapsed time, number of nodes, node list, etc.

  JOBID  PARTITION    NAME     USER  ST       TIME  NODES NODELIST(REASON)
  499980   longcpu  vega208t   user  PD       0:00      1         (Resources)
  499981   longcpu  vega192t   user  PD       0:00      1         (Priority)
  449911   longcpu  bxe_t280   user  R   1-01:23:39     1 cn0402
  499889   longcpu  vega256t   user  R      3:29:24     1 cn0011
  449133   longcpu  bxe_t240   user  R   1-03:38:21     1 cn0401
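
When the list gets long, a per-state tally can be easier to read than the full table. The sketch below counts jobs by the value in the ST column. The function name count_states is hypothetical, and it assumes the default column layout shown above (ST as the fifth field) and job names without spaces.

```shell
# Hypothetical helper: count jobs per state code from squeue-style
# output read on standard input.
count_states() {
    # NR > 1 skips the header line; $5 is the ST column in the default layout.
    # awk's array order is unspecified, so sort the result for stable output.
    awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Typical use on a cluster (requires Slurm):
#   squeue -u your_account_name | count_states
```

For the sample output above this would print "PD 2" and "R 3" on separate lines.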

Job state is listed in the ST column of the output of the squeue command. The most common job state codes are:

  • R: Running
  • PD: Pending
  • CG: Completing
  • CA: Cancelled
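
Because a job disappears from the squeue listing shortly after it completes or is cancelled, a script can wait for it by polling until squeue reports no state for that job ID. This is only a sketch: the function name wait_for_job and the 30-second default interval are assumptions, and a long interval is kinder to the scheduler than a tight loop.

```shell
# Hypothetical helper: block until a job is no longer listed by squeue.
# Usage: wait_for_job <Job ID> [poll interval in seconds]
wait_for_job() {
    id="$1"
    interval="${2:-30}"
    # -h drops the header, -o "%t" prints only the compact state code.
    # An empty result means the job is no longer in the queue.
    while [ -n "$(squeue -j "$id" -h -o "%t" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "job $id is no longer in the queue"
}
```

Once the function returns, sacct -j <Job ID> can be used to check how the job ended.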