Main Page
From FarmShare
Revision as of 11:30, 4 October 2012
This wiki is intended for the users of FarmShare, the Stanford shared research computing environment: the "cardinal", "corn", and "barley" machines. For a general description of this service, and Stanford's shared computing policies, see the main service catalog page.
Most useful pages: Special:AllPages, Special:RecentChanges, the User Guide, the FAQ, and the FarmShare tutorial
How to connect
The machines are available to anyone with a SUNetID. Simply "ssh corn.stanford.edu" with your SUNetID credentials. The DNS name "corn.stanford.edu" actually points to a load balancer, which will connect you to a particular corn machine (e.g. corn21) with relatively low load.
The "barley" machines are designed for high-performance computing (HPC) and are only accessible via a resource manager (currently Open Grid Engine). You cannot log in to them directly, but you can submit jobs from any corn. Storage dedicated to jobs running on the barley cluster is available via /mnt/glusterfs on all corn and barley nodes. Log in to senpai1.stanford.edu and a directory will be created for you as /mnt/glusterfs/<your user name> (this can take up to 5 minutes). Sign up for and email the farmshare-discuss mailing list if you have any questions or would like information not listed here.
cardinal info
The "cardinal" machines are small VMs intended for long-running processes (on the order of days) that are not resource intensive, e.g. mail/chat clients. You can log in to a cardinal and run a screen/tmux session there to keep work going on other machines.
Simply "ssh cardinal.stanford.edu" with your SUNetID credentials.
There are currently 3 cardinal machines: cardinal1, cardinal2 and cardinal3, load-balanced via cardinal.stanford.edu.
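A typical persistent-session workflow on a cardinal might look like the following sketch (the session name "mysession" is just an example; these are standard tmux commands, not FarmShare-specific tooling):

```shell
# Connect to a cardinal (load-balanced) with your SUNetID credentials
ssh sunetid@cardinal.stanford.edu

# Start a named tmux session; anything launched inside it
# keeps running after you disconnect
tmux new -s mysession

# Detach with Ctrl-b d, log out, and later reattach from a fresh login:
tmux attach -t mysession
```

screen works the same way ("screen -S mysession" to start, "screen -r mysession" to reattach).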
corn info
The "corn" machines are general-purpose Ubuntu boxes and you can run whatever you want on them (so long as you don't negatively impact other users). Please read the policies and the motd first.
- Policies: http://itservices.stanford.edu/service/sharedcomputing/policies
- IT services page: https://itservices.stanford.edu/service/sharedcomputing
- VNC help: https://itservices.stanford.edu/service/sharedcomputing/vnc
- Q? File HelpSU: http://helpsu.stanford.edu/?pcat=farmshare
- Future vision as of summer 2010: http://itservices.stanford.edu/strategy/sysadmin/timeshare
Each of the 30 corn machines has 8 cores, 32GB RAM and ~70GB of local disk in /tmp.
barley info
The "barley" machines are newer general-purpose Ubuntu boxes that run jobs you submit via the resource manager software. You should not normally log in to a barley directly, although you may do so to troubleshoot your jobs.
current barley policies
- 480 max jobs per user (look for max_u_jobs in output of 'qconf -sconf')
- 3000 max jobs in the system (look for max_jobs in output of 'qconf -sconf')
- 48hr max runtime for any job in regular queue (look for h_rt in output of 'qconf -sq precise.q')
- 30 days max runtime for the long queue (look for h_rt in output of 'qconf -sq precise-long.q')
- 15min max runtime in test.q
- 4GB default mem_free request per job
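The policies above map directly onto qsub resource requests. A sketch using standard Grid Engine flags (the job script name "my_job.script" is a placeholder):

```shell
# Regular queue: request a 24-hour runtime and 8GB of memory
# (must stay within the 48hr h_rt limit above)
qsub -l h_rt=24:00:00 -l mem_free=8G my_job.script

# Long queue: jobs up to 30 days, e.g. a 10-day runtime
qsub -q precise-long.q -l h_rt=240:00:00 my_job.script

# Test queue: quick checks, 15 minutes max
qsub -q test.q my_job.script
```

If you omit mem_free, the 4GB default request applies.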
Technical details
- 19 new machines, AMD Magny Cours 24 cores each, 96GB RAM
- 1 new machine, AMD Magny Cours 24 cores, 192GB RAM
- ~450GB local scratch on each
- ~7TB in /mnt/glusterfs shared across all barley and corn systems
- Grid Engine v6.2u5 (via standard Debian package)
- 10GbE interconnect (Juniper QFX3500 switch)
how to use the barley machines
To start using these new machines, see the man page for 'sge_intro' and the man pages for the 'qhost', 'qstat', 'qsub' and 'qdel' commands.
Initial issues:
- You are limited in space to your AFS homedir ($HOME) and local scratch disk on each node ($TMPDIR)
- The execution hosts don't accept interactive jobs, only batch jobs for now.
- You'll want to make sure you have your Kerberos TGT and your AFS token.
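To check and renew your credentials before submitting, the standard Kerberos/OpenAFS commands apply:

```shell
# Inspect your current Kerberos tickets and AFS tokens
klist
tokens

# Renew if they have expired
kinit    # obtain a new Kerberos TGT (prompts for your SUNetID password)
aklog    # derive an AFS token from the TGT
```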
If you want to use the newer bigger storage:
- log into senpai1: "ssh <sunetid>@senpai1.stanford.edu"
- cd to /mnt/glusterfs/<your username> (or wait 5mins if it doesn't exist yet)
- write a job script: "$EDITOR test_job.script"
- see 'man qsub' for more info
- use env var $TMPDIR for local scratch
- use /mnt/glusterfs/<your username> for shared data directory
- submit the job for processing: "qsub -cwd test_job.script"
- monitor the jobs with "qstat -f -j JOBID"
- see 'man qstat' for more info
- check the output files that you specified in your job script (the input and output files must be in /mnt/glusterfs/)
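The steps above can be sketched as a minimal batch job script. This is an illustrative example only: "my_program" and the data file names are placeholders, and the #$ lines are standard Grid Engine directives, not a FarmShare-mandated template.

```shell
#!/bin/bash
# test_job.script -- minimal Grid Engine batch job (placeholder program/files)
#$ -cwd                 # run from the submission directory
#$ -l h_rt=1:00:00      # request one hour of runtime
#$ -o output.log        # stdout goes here
#$ -e error.log         # stderr goes here

SHARED=/mnt/glusterfs/$USER    # shared storage for input and output

# Do heavy I/O in fast local scratch, then copy results back to shared storage
cp "$SHARED/input.dat" "$TMPDIR/"
cd "$TMPDIR"
./my_program input.dat > results.out    # placeholder program
cp results.out "$SHARED/"
```

Submit it with "qsub -cwd test_job.script" and watch it with "qstat -f -j JOBID" as described above.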
If you have any questions, please email 'farmshare-discuss@lists.stanford.edu'. Some good introductory usage examples are here: http://gridscheduler.sourceforge.net/howto/basic_usage.html
Examples of using the barley cluster
- Introductory examples: Examples Ready to Make
- R
- MATLAB
- Access Mysql from Matlab
- Rmpi
- Gaussian
- Gaussview: Automated Submission Script Creation & Submission
Please note that we provide support for the installation and availability of software packages, but we generally do not provide support for the usage of the software. If you need help with usage, please email the FarmShare user community or submit a support ticket with the appropriate vendor. Alternatively, the SSDS campus group can provide guidance for R, SAS and Stata. Please contact them directly if you have any questions about using those particular software packages.
stock software
The FarmShare machines run Ubuntu, and the software comes from the Ubuntu repositories; run dpkg -l | grep ^ii to see the list of installed packages.
If the package you're looking for isn't installed, search the Ubuntu Packages page and submit a HelpSU with the package name(s) you want.
licensed software
As of April 2012, we're transitioning the location of licensed software into /mnt/glusterfs/software/ and using modules. As of July 2, 2012, the software that is available is:
# module avail
------------------------ /mnt/glusterfs/software/free/modules/tcl/modulefiles -------------------------
CPLEX_Studio-12.4      MATLAB-R2010b  NAG-gfortran-23    StataMP-12.1
GAMS-23.8.1-AFS        MATLAB-R2011b  R-2.15.0           StataSE-12.1
GAMS-23.8.2-GlusterFS  MATLAB-R2012a  R-2.15.1-notyet
IMSL                   NAG-C-23       StatTransfer-11.2
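Loading one of these packages follows the standard Environment Modules commands (the MATLAB version here is just one of the modules from the listing above):

```shell
# See what is available
module avail

# Load a package into your environment (adds it to PATH, etc.)
module load MATLAB-R2012a

# Check what is currently loaded, and unload when done
module list
module unload MATLAB-R2012a
```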
Licensed software that hasn't been transitioned yet is still available in /usr/sweet/bin. Older versions of some of the same programs as above may also be available here.
# ls /usr/sweet/bin/
MathKernel@   cplexamp@      g03@        hlm2@      lmutil@       mex@        sicstus@  stata@
Mathematica@  cplexconvert@  g09@        hlm3@      maple@        mint@       spdet@    stata-se@
Splus@        dbmscopy@      gams@       hmlm@      math@         rats@       spld@     tracker@
ampl@         dbmsnox@       gamsbatch@  hmlm2@     mathematica@  ratsgraph@  splfr@    xdisplay@
anshelp@      eqs@           gamslib@    launcher@  matlab@       rcomp@      splm@     xmaple@
ansys@        f95@           gview@      limdep@    mbuild@       rgf2pst@    splus@    xstata@
cplex@        f95mcheck@     hcm2@       lmgrd@     mcc@          sas@        spxref@   xstata-se@
Monitoring / Status
- Current status of farmshare machines: http://barley-monitor.stanford.edu/ganglia/
- More detailed graphs: http://barley-monitor.stanford.edu/munin/
- File a help ticket for farmshare problems: http://helpsu.stanford.edu/?pcat=farmshare
For important announcements, we plan to:
- post them on this wiki
- modify /etc/motd on the corn machines
- send a mail to farmshare-announce
Mailing Lists
We have mailing lists at @lists.stanford.edu (see https://itservices.stanford.edu/service/mailinglists/tools):
- farmshare-announce - announcements list - Archives
- farmshare-discuss - user discussion - Archives
Links
Want to learn HPC? Free education materials available:
GPUs! We don't have any GPUs as part of FarmShare, but there are other campus resources available:
- http://icme.stanford.edu/Computer%20Resources/gpu.php
- http://classx.stanford.edu/ClassX/system/users/web/pg/view_subject.php?subject=NVIDIA_ICME_SPRING_2010_2011
- Engineering / Computer Science computer labs: myth20 through myth32 have nVidia GPU modules for development: http://cs.stanford.edu/computing-guide/overview/computer-systems/myth
Other similar wikis/clusters on campus (you might not have access to these):
- Engineering / Computer Science computer labs (myth.stanford.edu), open to all fully sponsored SunetIDs: http://cs.stanford.edu/computing-guide/overview/computer-systems/myth
- Statistics cluster: https://www.stanford.edu/dept/statistics/cgi-bin/projects/stat-sysadminwiki/index.php/Cluster_Help
- Genetics cluster: https://www.stanford.edu/group/scgpm/cgi-bin/informatics/wiki/index.php/Main_Page
- SU-HPC group: https://www.stanford.edu/group/su-hpc/cgi-bin/mediawiki/index.php/Special:Recentchanges
- HPCC wiki: https://www.stanford.edu/group/hpcc/cgi-bin/mediawiki/index.php/Main_Page
- Proclus (H&S cluster): https://www.stanford.edu/group/proclus/cgi-bin/mediawiki/index.php/Main_Page
Vision
The FarmShare resources are made available to students, faculty and staff with fully sponsored SUNetIDs to facilitate research at Stanford University. The service gives researchers a place to experiment and learn about technical solutions for reaching their research goals without needing to write a grant for a cluster. FarmShare focuses on making it easier to learn how to parallelize research computing tasks and to use research software, including a "scheduler" or "distributed resource management system", to submit compute jobs.
By using FarmShare, new researchers can more easily adapt to larger clusters when they have big projects that involve federally funded resources, shared Stanford clusters, or a small grant-funded cluster.