GridEngine

From FarmShare


Revision as of 11:32, 12 March 2012

We're using the Debian packages of "Sun Grid Engine", which isn't quite "Sun" anymore since Oracle bought Sun. The Debian packages are a bit behind the current forks: Open Grid Engine, Son of Grid Engine and Univa Grid Engine.

documentation

Start with 'man sge_intro'. Move on to 'man qsub'. Try submitting a simple job with 'echo "sleep 3600" | qsub', then run 'qstat' to see it and 'qdel' to remove it.
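The submit, inspect and delete cycle above can be sketched as a small shell function. This is only an illustration: the function name submit_and_cancel is made up for this page, and it assumes the installed qsub supports the -terse option (which prints just the job ID).

```shell
# Hypothetical helper: submit a sleep job, show its details, then delete it.
submit_and_cancel() {
    # -terse makes qsub print only the job ID instead of a full sentence.
    job_id=$(echo "sleep 3600" | qsub -terse) || return 1
    qstat -j "$job_id"   # detailed status of this one job
    qdel "$job_id"       # remove the job again
}

# Only run on a machine where Grid Engine is installed.
if command -v qsub >/dev/null 2>&1; then
    submit_and_cancel
fi
```

Running plain 'qstat' between the two steps shows the job in the normal listing as well.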

shell mode

GE can start job shells in "POSIX mode" or "Unix mode" (the posix_compliant and unix_behavior values of shell_start_mode). See the "shell_start_mode" section of 'man sge_conf'.

 # qconf -sconf|grep shell
 shell_start_mode             unix_behavior
 login_shells                 bash,sh,ksh,csh,tcsh

So you'll want to explicitly specify your shell with the -S flag to qsub. E.g.

 # get rid of spurious messages about tty/terminal types
 #$ -S /bin/sh
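A complete job script using that directive might look like the sketch below. The file name hello_job.sh and the job name are placeholders, and the extra options (-N, -cwd, -j y) are just common choices, not settings this page requires.

```shell
# Write a minimal job script; qsub reads the "#$" lines as options.
cat > hello_job.sh <<'EOF'
#!/bin/sh
# run the job under /bin/sh
#$ -S /bin/sh
# job name shown in qstat (placeholder)
#$ -N hello_job
# run in the directory the job was submitted from
#$ -cwd
# merge stderr into stdout
#$ -j y
echo "hello from $(uname -n)"
EOF

# Submit only where Grid Engine is installed.
if command -v qsub >/dev/null 2>&1; then
    qsub hello_job.sh
fi
```

The same options can be given on the qsub command line instead of inside the script.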

useful commands

  • see all hosts: qhost
  • see which jobs are on which hosts: qhost -j
  • see all jobs: qstat -f -u "*"
  • see all host attributes: qstat -f -F -u "*"

queues

Under SGE, a 'queue' is a set of settings that get applied to jobs that are assigned to that queue.


Grid Engine settings on farmshare

We are using pretty much the default settings. This page should contain a description of all of the settings we've changed away from the defaults along with a reason for doing so.

We created four queues: bigmem.q, long.q, main.q and test.q. Use 'qconf -sql' to list queues, 'qconf -sq queue_name' to see the queue settings.
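To compare the queues at a glance, the two qconf commands can be combined in a loop. This sketch assumes the time limits are set through the standard h_rt and s_rt queue attributes, which this page does not confirm; the function name show_queue_limits is made up.

```shell
# Hypothetical helper: print the hard/soft run-time limits of every queue.
show_queue_limits() {
    for q in $(qconf -sql); do
        echo "== $q"
        # h_rt = hard run-time limit, s_rt = soft run-time limit
        qconf -sq "$q" | grep -E '^(h_rt|s_rt)[[:space:]]'
    done
}

# Only run on a machine where Grid Engine is installed.
if command -v qconf >/dev/null 2>&1; then
    show_queue_limits
fi
```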

main.q

Most jobs end up here. The time limit is 48 hours.

bigmem.q

This queue is explicitly tied to the machine with more physical memory. It otherwise has the same settings as main.q.

long.q

Two-week time limit instead of 48 hours.

test.q

15-minute time limit instead of 48 hours.

starter method

There is a file /usr/local/libexec/gridengine/job_start.sh which sets up the checkpointing environment if you specified checkpointing for your job.

making the test.q

  • add testq to the global complex attributes list (qconf -mc), then check it:
 senpai1:/root# qconf -sc |grep testq
 testq               testq      BOOL        ==    YES         NO         0        0
  • add testq to the complex attributes of the queue (qconf -mq test.q), then check it with qconf -sq:
 senpai1:/root# qconf -sq test.q|grep testq
 hostlist              barley-testq.stanford.edu
 complex_values        testq=1
  • users now need to use the qsub parameter "-l testq=1" to have the job go to that queue instance.
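From the user's side, requesting the test queue could look like the sketch below. The script name quick_test.sh and its contents are placeholders; only the "-l testq=1" request comes from the steps above.

```shell
# Write a short job script that requests the testq resource.
cat > quick_test.sh <<'EOF'
#!/bin/sh
#$ -S /bin/sh
# request the testq complex so the job is scheduled on test.q
#$ -l testq=1
echo "short test run"
EOF

# Submit only where Grid Engine is installed.
if command -v qsub >/dev/null 2>&1; then
    qsub quick_test.sh
fi
```

The request can also be passed on the command line, as in 'qsub -l testq=1 quick_test.sh'.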