Job tracking¶
Job info by qstat¶
The current state of a job can be checked with the qstat command.
Example:
qstat job_ID # display status of selected job (short format)
qstat -f job_ID # display status of job (long format)
qstat -u user123 # list all running or waiting jobs of user123 on the PBS server
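When only the state is of interest, the long format can be filtered. The following is a minimal sketch, assuming the long format prints the state on a line of the form `job_state = R`; the function name `job_state_from_qstat` is hypothetical and not part of PBS.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print the state letter.
job_state_from_qstat() {
    # Assumption: attributes are printed as "    name = value", one per line.
    # Strip everything up to and including "job_state = " and print the rest.
    sed -n 's/^[[:space:]]*job_state[[:space:]]*=[[:space:]]*//p' | head -n 1
}

# Possible usage (job_ID is a placeholder for a real job identifier):
#   qstat -f job_ID | job_state_from_qstat
```

The helper reads from standard input, so it can be tried on saved qstat output without access to the PBS server.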
Job states¶
PBS Pro uses single-letter codes to mark the state of a job.
| State | Description |
|---|---|
| Q | Queued |
| H | Held. Job is put into a held state by the server, user or administrator. Job stays in a held state until it is released by a user or administrator. |
| R | Running |
| S | Suspended (substate of R) |
| E | Exiting after having run |
| F | Finished |
| X | Finished (subjobs only) |
| W | Waiting. Job is waiting for its requested execution time to be reached, or job is delayed due to stagein failure. |
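In scripts, the codes from the table can be translated back into readable text. This is only an illustrative sketch: the function name `describe_job_state` is hypothetical, and the descriptions are shortened from the table above.

```shell
# Hypothetical helper: print a short description for a PBS job state code.
describe_job_state() {
    case "$1" in
        Q) echo "Queued" ;;
        H) echo "Held" ;;
        R) echo "Running" ;;
        S) echo "Suspended" ;;
        E) echo "Exiting after having run" ;;
        F) echo "Finished" ;;
        X) echo "Finished (subjobs only)" ;;
        W) echo "Waiting" ;;
        # Any other code is reported and signalled with a non-zero status.
        *) echo "Unknown state: $1"; return 1 ;;
    esac
}

# Possible usage:
#   describe_job_state R
```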
Output of running jobs¶
Although the input and temporary files for the calculation are located in $SCRATCHDIR, the standard output (STDOUT) and standard error output (STDERR) are written elsewhere, to a spool directory on the host where the job runs.
To see the current state of these files for a running job, follow these steps:
- find the host on which the job runs by
qstat -f job_ID | grep exec_host2
- ssh to this host
- on the host, navigate to the /var/spool/pbs/spool/ directory and examine the files
  - $PBS_JOBID.OU for STDOUT, e.g. 13031539.pbs-m1.metacentrum.cz.OU
  - $PBS_JOBID.ER for STDERR, e.g. 13031539.pbs-m1.metacentrum.cz.ER
- To watch a file continuously, you can also use the command
tail -f
For example:
(BULLSEYE)user123@tarkil:~$ qstat -f 13031539.pbs-m1.metacentrum.cz | grep exec_host2
exec_host2 = zenon41.cerit-sc.cz:15002/12
(BULLSEYE)user123@tarkil:~$ ssh zenon41.cerit-sc.cz
user123@zenon41.cerit-sc.cz:/var/spool/pbs/spool$ tail -f 13031539.pbs-m1.metacentrum.cz.OU
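The manual steps above could also be wrapped into small shell helpers. The sketch below rests on several assumptions: the function names `exec_host_from_qstat` and `spool_files_for_job` are hypothetical, the `exec_host2` value is assumed to fit on a single line of the qstat output, and for a job spanning several hosts the first listed host is assumed to be the one holding the spool files.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print the execution host.
exec_host_from_qstat() {
    # Keep the text after "exec_host2 = ", then cut at the first ":" so that
    # "zenon41.cerit-sc.cz:15002/12" becomes "zenon41.cerit-sc.cz".
    sed -n 's/^[[:space:]]*exec_host2[[:space:]]*=[[:space:]]*//p' | head -n 1 | cut -d: -f1
}

# Hypothetical helper: print the spool paths of STDOUT and STDERR for a job ID.
spool_files_for_job() {
    printf '%s\n' "/var/spool/pbs/spool/$1.OU" "/var/spool/pbs/spool/$1.ER"
}

# Possible usage (the job ID is the one from the example above):
#   job=13031539.pbs-m1.metacentrum.cz
#   host=$(qstat -f "$job" | exec_host_from_qstat)
#   ssh "$host" tail -f "/var/spool/pbs/spool/$job.OU"
```

Both helpers only process text, so they can be checked against saved qstat output before being used on a real job.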
Last update:
December 31, 2024