GridEngine
From FarmShare
Revision as of 12:48, 9 April 2012
We're using the Debian packages of "Sun Grid Engine", which isn't quite "Sun" anymore since Oracle bought Sun. The Debian packages are a bit behind the current forks: Open Grid Engine, Son of Grid Engine and Univa Grid Engine.
documentation
Start with 'man sge_intro'. Move on to 'man qsub'. Try submitting a simple job with 'echo "sleep 3600" | qsub', then run 'qstat' to see it in the queue and 'qdel' to delete it.
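The submit step can be wrapped so that the job ID is easy to reuse with qstat and qdel. This is only a sketch: the function name submit_sleep_job is made up for illustration, and it assumes the installed qsub supports the -terse option (which prints just the job ID).

```shell
# Submit the one-hour sleep job from the text and print its job ID.
# submit_sleep_job is a hypothetical helper name, not part of Grid Engine.
submit_sleep_job() {
    # qsub reads the job script from stdin when no script file is given;
    # -terse limits qsub's output to the job ID.
    echo "sleep 3600" | qsub -terse
}
```

Typical use would be job_id=$(submit_sleep_job), followed by 'qstat -j "$job_id"' to inspect the job and 'qdel "$job_id"' to remove it.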
shell mode
GE can run in "POSIX mode" or "Unix mode". See the "shell_start_mode" section of 'man sge_conf'.
# qconf -sconf|grep shell
shell_start_mode unix_behavior
login_shells bash,sh,ksh,csh,tcsh
So you'll want to explicitly specify your shell with the -S flag to qsub. E.g.
# get rid of spurious messages about tty/terminal types
#$ -S /bin/sh
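For context, here is roughly what a complete job script with that directive might look like. It is a sketch, not a site template: the job name shell_check and the body of the script are invented for illustration, and -N and -cwd are ordinary qsub options added only to make the example self-contained.

```shell
#!/bin/sh
# Hypothetical job script; the job name and the commands are illustrative.
# Lines starting with "#$" are read by qsub as embedded options.
#$ -S /bin/sh
#$ -N shell_check
#$ -cwd
# Record which host ran the job and print it to the job's output file.
job_host=$(uname -n)
echo "job ran on ${job_host}"
```

Because the "#$" lines are plain comments to the shell, the same file can also be run directly for a quick check before submitting it with qsub.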
useful commands
- see all hosts: qhost
- see which jobs are on which hosts: qhost -j
- see all jobs: qstat -f -u "*"
- see all host attributes: qstat -f -F -u "*"
- explain state 'a': qstat -explain a
- summary of slots: qstat -g c
queues
Under SGE, a 'queue' is a set of settings that get applied to jobs that are assigned to that queue.
We are using pretty much the default settings. This page should contain a description of all of the settings we've changed away from the defaults along with a reason for doing so.
We created four queues: bigmem.q, long.q, main.q and test.q. Use 'qconf -sql' to list queues, 'qconf -sq queue_name' to see the queue settings.
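The two qconf commands combine naturally into a loop that prints one setting per queue. The sketch below assumes the time limits are set through the h_rt (hard run-time) queue attribute, which this page does not confirm; if the limits are set through a different attribute, change the field name. The function name show_queue_limits is hypothetical.

```shell
# Print "queue_name limit" for every queue known to the cluster.
# show_queue_limits is a hypothetical helper name.
# Assumption: the time limit lives in the h_rt attribute.
show_queue_limits() {
    for q in $(qconf -sql); do
        # qconf -sq prints one "attribute value" pair per line.
        limit=$(qconf -sq "$q" | awk '$1 == "h_rt" { print $2 }')
        echo "$q ${limit:-unknown}"
    done
}
```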
main.q
Most jobs end up here. Time limit is 48hrs.
bigmem.q
This queue is explicitly tied to the machine with more physical memory. It has the same settings as main.q.
long.q
2-week time limit instead of 48hrs.
test.q
15-minute time limit instead of 48hrs.
If you want to submit a job to this test queue, you must use '-l testq=1' flag to qsub.
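If you use the test queue often, a small wrapper saves retyping the flag. This is a sketch; submit_to_testq is a made-up name, and everything after the flag is passed through to qsub unchanged.

```shell
# Submit a job to test.q by requesting the testq resource.
# submit_to_testq is a hypothetical wrapper name.
submit_to_testq() {
    # "$@" forwards the script name and any extra qsub options as given.
    qsub -l testq=1 "$@"
}
```

For example, 'submit_to_testq job.script' runs 'qsub -l testq=1 job.script'.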
starter method
There is a file /usr/local/libexec/gridengine/job_start.sh which sets up the checkpointing environment if you specified checkpointing for your job.
making the test.q
- add testq to the global complex attributes list (qconf -mc)
senpai1:/root# qconf -sc |grep testq
testq testq BOOL == FORCED NO 0 0
"requestable" is set to FORCED to make that attribute required (per 'man 5 complex')
- add testq to the complex_values of the queue (edit with 'qconf -mq test.q', then check with 'qconf -sq test.q')
senpai1:/root# qconf -sq test.q|grep testq
hostlist barley-testq.stanford.edu
complex_values testq=1
- Users now need to use the qsub parameter "-l testq=1" to have the job go to that queue instance. Jobs submitted without testq=1 will never run in that queue.
relevant thread: http://gridengine.org/pipermail/users/2012-March/002972.html
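To double-check the "requestable" setting without reading the whole complex list by eye, the fifth column of the 'qconf -sc' output can be pulled out with awk. A sketch, with a hypothetical function name (complex_requestable), relying on the column order shown in the qconf -sc output above:

```shell
# Read "qconf -sc" output on stdin and print the "requestable" column
# (fifth field) for the named complex attribute.
# complex_requestable is a hypothetical helper name.
complex_requestable() {
    awk -v name="$1" '$1 == name { print $5 }'
}
```

On this cluster, 'qconf -sc | complex_requestable testq' should print FORCED, matching the line shown above.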
mem_free
Each node has a "load value" named "mem_free" that tracks actual free memory available.
Each node has a requestable and consumable "complex value" named "mem_free" that is set to 95G (190G for barley05).
Each job requests 4G of mem_free by default (unless the user specifies a different value). You can check the default with 'qconf -sc'.
You can see current values for the execution hosts with qstat -F, e.g. 'qstat -F -f -u "*"' and then search for mem_free.
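The "search for mem_free" step can be done with grep instead of a pager. A minimal sketch; show_mem_free is a hypothetical name:

```shell
# List only the mem_free lines from the full qstat attribute dump.
# show_mem_free is a hypothetical helper name.
show_mem_free() {
    qstat -F -f -u "*" | grep mem_free
}
```

Note that grep also drops the queue-instance header lines, so widen the pattern if you need to see which host each value belongs to.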
#request 37GB RAM for this one-slot job
qsub -l mem_free=37G job.script
When scheduling the job, Grid Engine compares the request against the lower of the two mem_free values on the host: the load value (actual free memory) and the consumable complex value.
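The "lower of the two values" rule can be written out as a tiny calculation. This is an illustration of the rule only, not how the scheduler is implemented: it works in whole gigabytes, and the function names min_gb and job_fits are invented for the example.

```shell
# Print the smaller of two whole-number gigabyte values.
min_gb() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Succeed if a request fits within the lower of the host's mem_free
# load value and its mem_free consumable value (all in whole GB).
# Arguments: request_gb load_value_gb consumable_gb
job_fits() {
    effective=$(min_gb "$2" "$3")
    [ "$1" -le "$effective" ]
}
```

For example, a 37G request fits on a host reporting 80G actually free with a 95G consumable (the lower value is 80G), but not on a host where only 30G is actually free, even though the consumable still says 95G.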