Proclus


First off, read the page on Sun Grid Engine. Most of that information applies to the proclus system so it is good to be familiar with it.

General notes on proclus

  • The first thing you need to do is get an account on proclus. For that, see http://proclus.stanford.edu. The process may involve having your UID on BIAC changed.
  • Once you have an account, to get to proclus, simply ssh to proclus.stanford.edu using your SUNET id as the login.
  • proclus.stanford.edu is the login node and this is where you can submit SGE jobs from. The proclus system actually consists of a large number of nodes --- to see all the wonderful nodes, type 'qhost'.
  • Data access is a tricky issue when dealing with proclus. Thanks to the work of Michael, proclus directly mounts BIAC (the RAID system). Thus, proclus can read and write to BIAC.
  • BIAC is visible from proclus through /biac4/wandell. Note that your home directory on proclus is its own distinct entity and is not on BIAC; your home directory on white is also not on BIAC.
  • NO other machines are visible to proclus. This means that anything on white, azure, peach, etc. that you link to will not be available while you're logged into proclus.
  • There are different strategies you can use to manage your files on BIAC and proclus. One strategy is to maintain a minimal footprint on proclus and to access all of your code and data directly from BIAC. (This is the strategy KNK likes) A different strategy is to maintain a nice code repository (e.g. through git or svn) on proclus and then use that code to run computations on data that is stored on BIAC. Or do whatever you like.
  • Due to historical reasons, the directory structure of /biac4/wandell is messy. For example, my directory on /biac4/wandell is /biac4/wandell/biac3/wandell7/knk/, which is long and hard to type. A solution is to use softlinks. For example, on white (the local system), I made a softlink to my directory on /biac4/wandell using "ln -s /biac4/wandell/biac3/wandell7/knk ~/ext". Now all I have to do is type something like ~/ext/file. I then created the exact same softlink on proclus, so no matter which system I am on (white or proclus), I can access the same underlying files through the same softlink (see the sketch after this list).
  • Additionally, for code and other user-specific things you would like to be visible on proclus you can use /biac4/wandell/users/YOURUSERNAME to keep things organized.
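
For instance, here is a minimal sketch of the matching-softlink setup described above (the /biac4/wandell/biac3/wandell7/knk path is KNK's example; substitute your own directory):

 # On white (the local system): give your BIAC directory a short name.
 ln -s /biac4/wandell/biac3/wandell7/knk ~/ext

 # On proclus: create the identical link so the same path works on both systems
 # (run remotely via ssh; you could equally log in and type the ln command there).
 ssh proclus.stanford.edu "ln -s /biac4/wandell/biac3/wandell7/knk ~/ext"

 # Now ~/ext refers to the same underlying directory on either system.
 ls ~/ext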

Notes on SGE on proclus

  • proclus has SGE installed (just like the SGE on our local system). The SGE system on proclus is distinct from the SGE on our local system. To submit jobs on the proclus SGE, you must ssh into proclus and do your job submission from there.
  • I have made a new simplified version of the sgerun.m script for use with the proclus SGE. The new script is called sgerun2.m.
  • qhost and qstat are your friends. qmon doesn't work currently (we need to bug the sysadmins about installing the requisite libraries). sgestat.m is deprecated.
  • Try to minimize your MATLAB path for SGE jobs. For instance, if each job takes 2 minutes to set up the MATLAB path, that's a lot of wasted overhead.
  • The SGE system on proclus has no MATLAB licensing issues, so we no longer have to worry about that.
  • Currently, you can ssh to any of the nodes and run an interactive MATLAB session (instead of using the SGE system); see the sketch after this list.
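
For example, here is a minimal sketch of such an interactive session (cn35 is just an example node name, taken from the qhost output further down this page; pick any node that qhost lists):

 # From the proclus login node, list the compute nodes and pick one.
 qhost

 # ssh to a node (with X-forwarding) and start MATLAB interactively.
 ssh -XY cn35
 matlab

Note that the 'matlab' alias used here is defined in the setup section below.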

How to start using the SGE on proclus

The following steps are all to be done on proclus.

Initial setup:

1. Add the following lines to your .bash_profile:

 # Make the MATLAB software available ('module avail' just lists what is installed)
 module load MATLAB-R2012b
 module avail

 # User-specific aliases
 alias matlab='/hsgs/software/MATLAB-R2012b/bin/matlab -nosplash -nodesktop'
 

The "module" commands make the MATLAB software available to you. The "alias" command specifies exactly what MATLAB version gets run when you type in 'matlab'.

2. Decide how you will organize your code and data. In particular, be aware of your MATLAB directory (~/matlab) and your MATLAB startup file (~/matlab/startup.m). Will you have two copies or one? (Personally, I just make ~/matlab a softlink to a directory on BIAC, so I have to manage only one matlab folder.)
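
For example, here is a sketch of the one-copy approach (the BIAC path is a placeholder; use your own directory, and note that this assumes ~/matlab does not already exist):

 # Keep the one true matlab folder on BIAC...
 mkdir -p /biac4/wandell/users/YOURUSERNAME/matlab
 # ...and make the home-directory version a softlink to it.
 ln -s /biac4/wandell/users/YOURUSERNAME/matlab ~/matlab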

3. Clone the "knkutils" repository and add it to your MATLAB path [NOTE: this repository assumes at least MATLAB 7.6]. (If you already have the repository, do a git pull to ensure that you have the latest version!):

 cd ~
 git clone https://github.com/kendrickkay/knkutils
 echo "addpath('~/knkutils');" >> ~/matlab/startup.m

4. Make the directory that will contain SGE output:

 mkdir ~/sgeoutput

5. Make a matlabsge.sh script in your home directory (note that the script copied here is different from the one we use on our own local SGE system):

 cp /biac4/wandell/biac3/wandell7/knk/proclusmatlabsge.sh ~/matlabsge.sh

6. Edit the script with the specific MATLAB call that you want:

 nano ~/matlabsge.sh

7. Ensure that the script is executable:

 chmod +x ~/matlabsge.sh

8. Add the SGE output directory to your MATLAB path at startup. (Note that I prefer to just change into the directory instead of actually adding it to the path, since having a directory with lots of files in it on the path may slow MATLAB down.)

 echo "cd('~/sgeoutput');" >> ~/matlab/startup.m

9. Set the queue name. This needs to be done only once (the preference persists across MATLAB sessions); "batch.q" is the name of the proclus queue. In MATLAB:

 setpref('kendrick','sgequeue','batch.q');
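
To confirm that the preference was saved, here is a quick sketch of a check run from the shell:

 matlab -r "disp(getpref('kendrick','sgequeue')); exit"   # should print batch.q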

Try it out:

1. Log onto proclus (since that is an execution host):

 ssh -XY proclus
  • The '-XY' will allow 'X-forwarding' and should always be used when logging in to proclus and all compute nodes.

2. Start MATLAB:

 matlab

3. Submit a job called test1 that simply issues the 'ls' command:

 sgerun2('ls','test1',0);

4. Monitor your job:

 qstat

5. When the job is completed, inspect the results:

 !cat ~/sgeoutput/job_test1.o*

6. Submit a job called test2 that makes use of the <wantworkspace> functionality (passing 1 as the third argument makes variables in your current workspace, such as somedata, available to the job):

 somedata = [0 1 2];
 sgerun2('somedata2 = somedata + 1','test2',1);

7. When the job is completed, inspect the results:

 !cat ~/sgeoutput/job_test2.o*

8. Submit a job called test3 that farms out five different jobs (each job gets its own value of the variable jobindex, here 1 through 5) and also saves output. Passing [] as the first argument makes sgerun2 prompt you for the command; type the code at the prompt and end with a line containing only a period:

 sgerun2([],'test3',0,1:5);
 result = jobindex.^2;
 save(sprintf('~/test%d.mat',jobindex));
 .

9. When the jobs are completed, inspect the results:

 loadmulti('~/test*.mat','result',2)

(loadmulti comes with knkutils; it loads the variable 'result' from each file matching the pattern and concatenates the results, with the third argument giving the concatenation dimension.)

10. Now you are ready to do anything!

Monitoring jobs on proclus

If you want to see only your own jobs (replace {user_name} with your SUNET id):

  qstat -u "{user_name}"

If you want to see everyone's jobs:

  qstat -u "*"

You can see how much memory a specific job requests by looking at the "full" output of a particular job:

   qstat -f -j JOBID
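
To pull out just the memory request, a one-liner along these lines should work (in SGE's qstat -j output the request appears on the "hard resource_list" line):

   qstat -f -j JOBID | grep h_vmem   # e.g. hard resource_list: h_vmem=40G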

Here is example output of `qhost -j -F h_vmem` for one node:

   cn35                    linux-x64      16  4.89   63.0G   11.0G 1000.0M   27.7M
       Host Resource(s):      hc:h_vmem=2.747G
      4048470 0.50062 sfm_mb_sta klchan13     r     08/17/2013 14:18:03 batch.q@cn MASTER
      4077623 0.50007 qsubScript ngarud       r     08/17/2013 10:53:03 batch.q@cn MASTER
      4077624 0.50007 qsubScript ngarud       r     08/17/2013 10:53:03 batch.q@cn MASTER
      4077647 0.50007 qsubScript ngarud       r     08/17/2013 10:53:33 batch.q@cn MASTER
      4077654 0.50007 qsubScript ngarud       r     08/17/2013 12:06:03 batch.q@cn MASTER

To interpret: this node is named cn35; it has 16 slots and 63G of physical RAM. Although the load right now is only 4.89 and only 11G of RAM is actually in use, the slots and/or RAM are all allocated to the five jobs running on that node: the "hc:h_vmem=2.747G" line says that only about 2.7G of allocatable memory (h_vmem) remains, so the five jobs together have reserved roughly 60G. For example, job 4048470 requested 40G of RAM (out of the 63G).


If you want an overview of the cluster's state right now, use `qstat -g c` (USED, AVAIL, and TOTAL count job slots in each queue; aoACDS counts slots in alarm or suspended states, and cdsuE counts slots that are disabled, unreachable, or in an error state):

   CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
   --------------------------------------------------------------------------------
   batch.q                           0.63    301      0   1200   1696     32    176
   cuda.q                            0.01      3      0     93    128      0     32
   interactive.q                     0.08     16      0     80     96      0      0