Main Page

From FarmShare


This wiki is intended for the users of FarmShare, the Stanford shared research computing environment: the "cardinal", "corn", and "barley" machines. For a general description of this service, and Stanford's shared computing policies, see the main service catalog page.

Most useful pages: Special:AllPages, Special:RecentChanges, the User Guide, the FAQ, and the FarmShare tutorial



    How to connect

    The machines are available to anyone with a SUNetID. Simply "ssh corn.stanford.edu" with your SUNetID credentials. The DNS name "corn.stanford.edu" actually points to a load balancer, which will connect you to a particular corn machine (e.g. corn21) that has relatively low load.
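
    For example, a first connection might look like this (the SUNetID "jdoe" is a hypothetical placeholder):

        ssh jdoe@corn.stanford.edu    # log in with your SUNetID credentials
        hostname                      # shows which corn the load balancer picked, e.g. corn21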

    The "barley" machines are designed to be used for high performance computing (HPC) and only accessible via a resource manager (currently Open Grid Engine). You cannot log in directly, but you can submit jobs from any corn.  Storage dedicated for jobs running on the barley cluster is available via /mnt/glusterfs on all corn and barley nodes.  Login to senpai1.stanford.edu and a directory will be created for you as /mnt/glusterfs/<your user name> (can take up to 5 minutes).  Sign up and email the farmshare-discuss mailing list if you have any questions or would like any info not listed here.

    cardinal info

    The "cardinal" machines are small VMs intended for long-running processes (on the order of days) that are not resource intensive, e.g. mail/chat clients. You could log in to a cardinal and run a screen/tmux session there to do things on other machines.

    Simply "ssh cardinal.stanford.edu" with your SUNetID credentials.

    There are currently 3 cardinal machines: cardinal1, cardinal2 and cardinal3, load-balanced via cardinal.stanford.edu.
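
    For example, a long-running session might be set up like this (tmux shown; screen works similarly, and "jdoe" is a hypothetical SUNetID):

        ssh jdoe@cardinal.stanford.edu   # load-balanced onto one of cardinal1-3
        tmux new -s longrun              # start a named session for your long-running process
        # ... start your mail/chat client, then detach with Ctrl-b d ...
        tmux attach -t longrun           # reattach after logging back in later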

    corn info

    The "corn" machines are general-purpose Ubuntu boxes and you can run whatever you want on them (so long as you don't negatively impact other users). Please read the policies and the motd first.

    Each of the 30 corn machines has 8 cores, 32GB RAM and ~70GB of local disk in /tmp.

    barley info

    The "barley" machines are general-purpose newer Ubuntu boxes that can run jobs that you submit via the resource manager software. You should not log in to any barley directly, but can do so to troubleshoot your jobs.

    current barley policies

    • 1000 max jobs per user (look for max_u_jobs in output of 'qconf -sconf')
    • 3000 max jobs in the system (look for max_jobs in output of 'qconf -sconf')
    • 48hr max runtime for any job in regular queue (look for h_rt in output of 'qconf -sq precisw.q')
    • one week max runtime for the long queue (look for h_rt in output of 'qconf -sq precise-long.q')
    • 15min max runtime in test.q
    • 4GB default mem_free request per job
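
    To check these limits yourself from a corn, commands along these lines should work (the grep filters are just illustrative):

        qconf -sconf | grep max_u_jobs        # per-user job limit
        qconf -sconf | grep max_jobs          # system-wide job limit
        qconf -sq precisw.q | grep h_rt       # max runtime in the regular queue
        qconf -sq precise-long.q | grep h_rt  # max runtime in the long queue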

    Technical details

    • 19 new machines, AMD Magny-Cours, 24 cores each, 96GB RAM
    • 1 new machine, AMD Magny-Cours, 24 cores, 192GB RAM
    • ~450GB local scratch on each
    • ~7TB in /mnt/glusterfs shared across all barley and corn systems
    • Grid Engine v6.2u5 (via standard Debian package)
    • 10GbE interconnect (Juniper QFX3500 switch)

    how to use the barley machines

    To start using these new machines, you can check out the man page for 'sge_intro' or the 'qhost', 'qstat', 'qsub' and 'qdel' commands.
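
    For instance, a quick first look at the cluster and at your own jobs (illustrative only):

        man sge_intro      # overview of the Grid Engine command set
        qhost              # list execution hosts with their load and memory
        qstat -u $USER     # show your pending and running jobs
        qdel JOBID         # delete a job, using the ID reported by qsub/qstat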

    Initial issues:

    • You are limited in space to your AFS homedir ($HOME) and local scratch disk on each node ($TMPDIR)
    • The execution hosts don't accept interactive jobs, only batch jobs for now.
    • You'll want to make sure you have your Kerberos TGT and your AFS token (see the commands sketched below).
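
    A sketch of checking, and if necessary renewing, your credentials before submitting jobs:

        klist            # confirm you have a valid Kerberos TGT
        tokens           # confirm you have an AFS token
        kinit && aklog   # renew both if either has expired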

    If you want to use the newer bigger storage:

    1. log in to senpai1: "ssh sunetid@senpai1.stanford.edu"
    2. cd to /mnt/glusterfs/<your username> (or wait up to 5 minutes if it doesn't exist yet)
    3. write a job script: "$EDITOR test_job.script" (a minimal example script is sketched after this list)
      1. see 'man qsub' for more info
      2. use env var $TMPDIR for local scratch
      3. use /mnt/glusterfs/<your username> for shared data directory
    4. submit the job for processing: "qsub -cwd test_job.script"
    5. monitor the jobs with "qstat -f -j JOBID"
      1. see 'man qstat' for more info
    6. check the output files that you specified in your job script (the input and output files must be in /mnt/glusterfs/)
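
    A minimal example of test_job.script, under the assumptions above (hypothetical SUNetID "jdoe"; the input file, output file, and program name are placeholders):

        #!/bin/bash
        #$ -N test_job          # job name shown by qstat
        #$ -cwd                 # run from the directory the job was submitted from
        #$ -l h_rt=01:00:00     # request 1 hour of runtime (must stay under the queue limit)
        #$ -l mem_free=4G       # memory request (4GB is the default)

        # stage input from shared storage to fast local scratch
        cp /mnt/glusterfs/jdoe/input.dat "$TMPDIR/"

        # do the actual work against local scratch (placeholder command)
        ./my_program "$TMPDIR/input.dat" > "$TMPDIR/output.dat"

        # copy results back to shared storage before the job ends
        cp "$TMPDIR/output.dat" /mnt/glusterfs/jdoe/

    Submit and monitor it exactly as in steps 4 and 5: "qsub -cwd test_job.script", then "qstat -f -j JOBID" with the job ID that qsub prints.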

    If you have any questions, please email 'farmshare-discuss@lists.stanford.edu'. Some good introductory usage examples are here: http://gridscheduler.sourceforge.net/howto/basic_usage.html

    Examples of using the barley cluster

    1. Introductory examples: Examples Ready to Make
    2. R
    3. MATLAB
    4. Access Mysql from Matlab
    5. Rmpi
    6. Gaussian
    7. Gaussview: Automated Submission Script Creation & Submission

    barley software

    stock software

    The barley machines are running Ubuntu 11.04, and the software is from the Ubuntu repositories, e.g. do 'dpkg -l' to see the list of installed packages.
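
    For example, to see whether a particular package is already installed (the package name here is just illustrative):

        dpkg -l                    # full list of installed packages
        dpkg -l | grep -i python   # filter the list for a package you care about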

    licensed software

    There is a group on campus called "SSDS" that can provide support for R, SAS, and Stata: http://www.stanford.edu/group/ssds/cgi-bin/drupal/content/who-we-are-what-we-do

    Monitoring / Status

    For important announcements, we plan to:

    • post them on this wiki
    • modify /etc/motd on the corn machines
    • send mail to farmshare-announce

    Mailing Lists

    We have mailing lists at @lists.stanford.edu; see https://itservices.stanford.edu/service/mailinglists/tools for the mailing-list tools.

    Links

    Want to learn HPC? Free education materials available:


    GPUs! We don't have any GPUs as part of FarmShare, but there are other campus resources available:


    Other similar wikis/clusters on campus (you might not have access to these):

    Vision

    The Farmshare resources are being made available to students, faculty, and staff with fully sponsored SUNetIDs to facilitate research at Stanford University.  This resource is designed so that those doing research have a place to experiment and learn about technical solutions that help them reach their research goals without needing to write a grant for a cluster.  The Farmshare resources are focused on making it easier to learn how to parallelize research computing tasks and how to use research software, including a "scheduler" or "distributed resource management system", to submit compute jobs.

    By using Farmshare, new researchers can more easily adapt to larger clusters when they have big projects that involve federally funded resources, shared Stanford clusters, or a small grant-funded cluster.
