Main Page

From FarmShare


This wiki is intended for the users of FarmShare, the Stanford shared research computing environment: the "cardinal", "corn", and "barley" machines. For a general description of this service, and Stanford's shared computing policies, see the main service catalog page.

Most useful pages: Special:AllPages, Special:RecentChanges, the User Guide, the FAQ, and the FarmShare tutorial



    How to connect

    The machines are available to anyone with a SUNetID. Simply "ssh corn.stanford.edu" with your SUNetID credentials. The DNS name "corn.stanford.edu" actually points to a load balancer, which will connect you to a particular corn machine (e.g. corn21) that has relatively low load.
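
    For example, a first connection might look like this (the SUNetID "jdoe" is a hypothetical placeholder):

        ssh jdoe@corn.stanford.edu    # log in with your SUNetID credentials
        hostname                      # shows which corn the load balancer picked, e.g. corn21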

    The "barley" machines are designed to be used for high performance computing (HPC) and only accessible via a resource manager (currently Open Grid Engine). You cannot log in directly, but you can submit jobs from any corn.  Storage dedicated for jobs running on the barley cluster is available via /mnt/glusterfs on all corn and barley nodes.  Login to senpai1.stanford.edu and a directory will be created for you as /mnt/glusterfs/<your user name> (can take up to 5 minutes).  Sign up and email the farmshare-discuss mailing list if you have any questions or would like any info not listed here.

    cardinal info

    The "cardinal" machines are small VMs intended for long-running processes (on the order of days) that are not resource intensive, e.g. mail/chat clients. You could log in to a cardinal and run a screen/tmux session there to do things on other machines.

    Simply "ssh cardinal.stanford.edu" with your SUNetID credentials.

    There are currently 3 cardinal machines: cardinal1, cardinal2 and cardinal3, load-balanced via cardinal.stanford.edu.
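
    For example, a long-running session might be set up like this (tmux shown; screen works similarly, and "jdoe" is a hypothetical SUNetID):

        ssh jdoe@cardinal.stanford.edu   # load-balanced onto one of cardinal1-3
        tmux new -s longrun              # start a named session for your long-running process
        # ... start your mail/chat client, then detach with Ctrl-b d ...
        tmux attach -t longrun           # reattach after logging back in later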

    corn info

    The "corn" machines are general-purpose Ubuntu boxes and you can run whatever you want on them (so long as you don't negatively impact other users). Please read the policies and the motd first.

    Each of the 30 corn machines has 8 cores, 32GB RAM and ~70GB of local disk in /tmp.

    barley info

    The "barley" machines are general-purpose newer Ubuntu boxes that can run jobs that you submit via the resource manager software. You should not log in to any barley directly, but can do so to troubleshoot your jobs.

    current barley policies

    • 1000 max jobs per user (look for max_u_jobs in output of 'qconf -sconf')
    • 3000 max jobs in the system (look for max_jobs in output of 'qconf -sconf')
    • 48hr max runtime for any job in regular queue (look for h_rt in output of 'qconf -sq precisw.q')
    • one week max runtime for the long queue (look for h_rt in output of 'qconf -sq precise-long.q')
    • 15min max runtime in test.q
    • 4GB default mem_free request per job
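
    To check these limits yourself from a corn, commands along these lines should work (the grep filters are just illustrative):

        qconf -sconf | grep max_u_jobs        # per-user job limit
        qconf -sconf | grep max_jobs          # system-wide job limit
        qconf -sq precisw.q | grep h_rt       # max runtime in the regular queue
        qconf -sq precise-long.q | grep h_rt  # max runtime in the long queue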

    Technical details

    • 19 new machines, AMD Magny-Cours, 24 cores each, 96GB RAM
    • 1 new machine, AMD Magny-Cours, 24 cores, 192GB RAM
    • ~450GB local scratch on each
    • ~7TB in /mnt/glusterfs shared across all barley and corn systems
    • Grid Engine v6.2u5 (via standard Debian package)
    • 10GbE interconnect (Juniper QFX3500 switch)

    how to use the barley machines

    To start using these new machines, you can check out the man page for 'sge_intro' or the 'qhost', 'qstat', 'qsub' and 'qdel' commands.
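
    For instance, a quick first look at the cluster and at your own jobs (illustrative only):

        man sge_intro      # overview of the Grid Engine command set
        qhost              # list execution hosts with their load and memory
        qstat -u $USER     # show your pending and running jobs
        qdel JOBID         # delete a job, using the ID reported by qsub/qstat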

    Initial issues:

    • You are limited in space to your AFS homedir ($HOME) and local scratch disk on each node ($TMPDIR)
    • The execution hosts don't accept interactive jobs, only batch jobs for now.
    • You'll want to make sure you have your Kerberos TGT and your AFS token (see the commands sketched below).
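
    A sketch of checking, and if necessary renewing, your credentials before submitting jobs:

        klist            # confirm you have a valid Kerberos TGT
        tokens           # confirm you have an AFS token
        kinit && aklog   # renew both if either has expired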

    If you want to use the newer bigger storage:

    1. log in to senpai1: "ssh sunetid@senpai1.stanford.edu"
    2. cd to /mnt/glusterfs/<your username> (or wait up to 5 minutes if it doesn't exist yet)
    3. write a job script: "$EDITOR test_job.script" (a minimal example script is sketched after this list)
      1. see 'man qsub' for more info
      2. use env var $TMPDIR for local scratch
      3. use /mnt/glusterfs/<your username> for shared data directory
    4. submit the job for processing: "qsub -cwd test_job.script"
    5. monitor the jobs with "qstat -f -j JOBID"
      1. see 'man qstat' for more info
    6. check the output files that you specified in your job script (the input and output files must be in /mnt/glusterfs/)
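
    A minimal example of test_job.script, under the assumptions above (hypothetical SUNetID "jdoe"; the input file, output file, and program name are placeholders):

        #!/bin/bash
        #$ -N test_job          # job name shown by qstat
        #$ -cwd                 # run from the directory the job was submitted from
        #$ -l h_rt=01:00:00     # request 1 hour of runtime (must stay under the queue limit)
        #$ -l mem_free=4G       # memory request (4GB is the default)

        # stage input from shared storage to fast local scratch
        cp /mnt/glusterfs/jdoe/input.dat "$TMPDIR/"

        # do the actual work against local scratch (placeholder command)
        ./my_program "$TMPDIR/input.dat" > "$TMPDIR/output.dat"

        # copy results back to shared storage before the job ends
        cp "$TMPDIR/output.dat" /mnt/glusterfs/jdoe/

    Submit and monitor it exactly as in steps 4 and 5: "qsub -cwd test_job.script", then "qstat -f -j JOBID" with the job ID that qsub prints.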

    If you have any questions, please email 'farmshare-discuss@lists.stanford.edu'. Some good introductory usage examples are here: http://gridscheduler.sourceforge.net/howto/basic_usage.html

    Examples of using the barley cluster

    1. Introductory examples: Examples Ready to Make
    2. R
    3. MATLAB
    4. Access Mysql from Matlab
    5. Rmpi
    6. Gaussian
    7. Gaussview: Automated Submission Script Creation & Submission

    barley software

    stock software

    The barley machines are running Ubuntu 11.04, and the software is from the Ubuntu repositories, e.g. do 'dpkg -l' to see the list of installed packages.
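
    For example, to see whether a particular package is already installed (the package name here is just illustrative):

        dpkg -l                    # full list of installed packages
        dpkg -l | grep -i python   # filter the list for a package you care about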

    licensed software

    There is a group on campus called "SSDS" that can provide support for R, SAS, and Stata: http://www.stanford.edu/group/ssds/cgi-bin/drupal/content/who-we-are-what-we-do

    Monitoring / Status

    For important announcements, we plan to:

    • post them on this wiki
    • modify /etc/motd on the corn machines
    • send mail to farmshare-announce

    Mailing Lists

    We have mailing lists at @lists.stanford.edu; see https://itservices.stanford.edu/service/mailinglists/tools for the mailing-list tools.

    Links

    Want to learn HPC? Free education materials available:


    GPUs! We don't have any GPUs as part of FarmShare, but there are other campus resources available:


    Other similar wikis/clusters on campus (you might not have access to these):

    Vision

    The Farmshare resources are being made available to students, faculty, and staff with fully sponsored SUNetIDs to facilitate research at Stanford University.  This resource is designed so that those doing research have a place to experiment and learn about technical solutions that help them reach their research goals without needing to write a grant for a cluster.  The Farmshare resources are focused on making it easier to learn how to parallelize research computing tasks and how to use research software, including a "scheduler" or "distributed resource management system", to submit compute jobs.

    By using Farmshare, new researchers can more easily adapt to larger clusters when they have big projects that involve federally funded resources, shared Stanford clusters, or a small grant-funded cluster.
