Why isn't my job running?

From FarmShare

Latest revision as of 11:06, 12 April 2012

Why is my job still in the queue and not running?

There are not enough resources available for the resource manager to schedule your job on one of the queue instances on the execution hosts.

But there are available processors!

Which output are you looking at? 'qstat -g c' shows you only "slots". While "slots" typically map to "CPU cores", we actually oversubscribe a bit on the barleys (28 slots on a 24-core machine). But a job needs other resources besides "slots", and those may not be available.

Checking memory resources

We currently configure a mem_free "complex attribute" for the execution hosts. GE tracks this attribute as a consumable and also tracks the actual mem_free (free physical RAM). Each job requests a default of 4GB of mem_free. So a host can have many free slots while its memory is already fully committed. Use 'qstat -f -F' and look for the mem_free attribute of each host.
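
The full 'qstat -f -F' listing is long, so it can help to reduce it to one line per queue instance. The following is a minimal sketch, not a supported tool: the function name mem_free_by_host is hypothetical, and it assumes the usual Grid Engine layout in which each queue instance line starts in the first column as "queue@host" and each resource line is indented in the form "hc:mem_free=4.000G". If your output looks different, the patterns need adjusting.

```shell
# Hypothetical helper: reads `qstat -f -F mem_free` output on stdin and
# prints "queue@host value" for every mem_free line it finds.
# Assumes the usual layout: queue instance lines are unindented and
# contain "@"; resource lines are indented, e.g. "hc:mem_free=4.000G".
mem_free_by_host() {
    awk '
        # Remember the most recent queue instance name.
        /^[^ \t]/ && $1 ~ /@/ { qi = $1 }
        # Print the value after "=" for each mem_free resource line.
        $1 ~ /^[a-zA-Z]+:mem_free=/ {
            value = $1
            sub(/^.*=/, "", value)
            print qi, value
        }
    '
}
```

On a login node this would be used as 'qstat -f -F mem_free | mem_free_by_host'. A host that shows free slots but little or no mem_free here cannot accept a job that asks for the default 4GB.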

Asking for help

This stuff is not so straightforward, so if you need a specific explanation, file a ticket and make sure to include your job id. It also helps to include the output of 'qstat -f -j JOBID'.
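
To have that output ready when you file the ticket, you can save it to a file first. This is only a sketch: the function name collect_job_info and the default file name are made up for illustration, and the only scheduler command it runs is the 'qstat -f -j JOBID' mentioned above.

```shell
# Hypothetical helper: save the scheduler's view of one job to a file
# so it can be attached to a ticket.
# Usage: collect_job_info JOBID [OUTPUT_FILE]
collect_job_info() {
    job_id=$1
    # Default file name is an arbitrary choice for this sketch.
    out_file=${2:-job_${job_id}_info.txt}
    {
        echo "== qstat -f -j $job_id =="
        qstat -f -j "$job_id"
    } > "$out_file" 2>&1
    # Print the file name so the caller knows where the output went.
    echo "$out_file"
}
```

For example, 'collect_job_info 12345' would write job_12345_info.txt in the current directory, where 12345 stands in for your own job id.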
