User Guide

If you have any questions, file a HelpSU ticket (https://remedyweb.stanford.edu/helpsu/helpsu?pcat=farmshare) or ask on farmshare-discuss@lists.stanford.edu.

Connecting

  • The public-facing hostname is corn.stanford.edu
  • Only SSH connections are allowed; this includes SFTP, which runs over SSH.
    • Only SSH protocol v2 is supported.
    • The SSH fingerprint for corn is 0b:e7:b4:95:03:c1:1e:07:df:04:ca:a2:3d:8e:e3:37
  • If you're behind a firewall, you may want to add "ServerAliveInterval 60" (or your client's equivalent keep-alive setting) to your SSH client's configuration; see the example below.
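
For OpenSSH (the client included with OS X and Linux), a minimal sketch of a ~/.ssh/config entry with that keep-alive setting looks like this; the 60-second interval is just a reasonable starting point:

 Host corn.stanford.edu
     ServerAliveInterval 60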

Connecting from Windows

You will want to use an SSH client like one of these:

  • SecureCRT: https://itservices.stanford.edu/service/ess/pc

Connecting from OS X / Linux / other OS

You should probably just use the included SSH client. Stanford does provide an SSH GUI for OS X to help you track connection settings: http://itservices.stanford.edu/service/ess/mac/lelandssh

Logging In

You can log in via SSH using your SUNet ID credentials.
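
From a terminal, a login looks like this (replace sunetid with your own SUNet ID):

 ssh sunetid@corn.stanford.edu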

Moving files to/from the cluster

Since you can connect via SSH, you can use the sftp or scp commands to upload/download files to FarmShare. On Windows, you can use software like FileZilla or WinSCP. On OS X, you can use Cyberduck or similar. On Linux and other Unix-like OSes, just use the included sftp or scp commands.
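
For example, from an OS X or Linux terminal (sunetid and the file names are placeholders):

 # upload a local file to your cluster home directory
 scp results.tar.gz sunetid@corn.stanford.edu:~/
 # download it back to the current directory on your workstation
 scp sunetid@corn.stanford.edu:~/results.tar.gz .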

Directory paths

Your AFS home directory is something like '/afs/ir/users/c/h/chekh/'.

A shared directory is /mnt/glusterfs, and you can get a /mnt/glusterfs/username directory there.

There is also local scratch storage available on the compute nodes. The amount varies from ~70GB to ~500GB, depending on the node hardware. SGE will set the env vars $TMPDIR and $TMP to point to a directory like /tmp/$JOBID. Depending on your workload, it may be a good idea to copy input or reference data to the local scratch space on the node, and then copy the results back to your homedir.
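
A minimal sketch of that pattern inside a batch job script, assuming your input lives in ~/data and your program is called myprog (both are placeholder names):

 #!/bin/bash
 # copy the input to the node's local scratch space ($TMPDIR is set by SGE)
 cp ~/data/input.dat "$TMPDIR"/
 cd "$TMPDIR"
 # run against the local copy
 myprog input.dat > output.dat
 # copy the results back to the persistent home directory
 cp output.dat ~/data/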

Data Limits

Your AFS homedir is limited to 2GB of quota. You can request more space here: https://itservices.stanford.edu/service/storage/getmore

Use 'fs quota' to see your utilization. You may want to do something like the following command, which will generate a timestamped file in your homedir with the sizes of your directories. This command may take a while to run, as it will stat every single file in your homedir.

 du --exclude .backup -sm * | sort -n | tee ~/`pwd | tr '/' '_'`.du.`date +%Y-%m-%d`

Mounting your files elsewhere

Your AFS files can be accessed globally (literally); you just need the OpenAFS software installed. More info here: https://itservices.stanford.edu/service/afs/intro/mounting

You can make your cluster home directory accessible directly from your workstation. Again, access is only allowed over SSH, so you can use something like SSHFS (via FUSE). One caveat is that SSHFS doesn't handle concurrent/parallel access, so this solution is only appropriate if you're not accessing files from several places at once. For example, don't have cluster jobs write files while you access the same files via SSHFS.

Windows

You can try ExpanDrive (formerly SFTPDrive, $39, https://secure.expandrive.com/store), WebDrive ($60, http://www.webdrive.com/products/webdrive/index.html), or Dokan (free software, http://dokan-dev.net/en/download/).

OS X

Try OSXFUSE (http://osxfuse.github.com/) or ExpanDrive (above).

Linux

You can use sshfs, e.g. on Debian (and derivatives):

  • Install: apt-get install sshfs
  • Mount: sshfs sunetid@corn.stanford.edu:/path/on/cluster /local/mount/point
  • Unmount: fusermount -u /mount/point

Running jobs on the cluster

We use Open Grid Engine. There are three types of jobs: interactive, batch, and parallel. You can start by reading the man page for 'sge_intro', then the man page for 'qsub'. We currently have a limit of 3000 jobs (running and/or queued) per user.

Running interactive jobs

Use 'qlogin' or 'qrsh'. This will allocate one slot on the cluster and request 2GB RAM by default. In general, for any memory- or computationally-intensive one-off job, open an interactive session on a cluster node and run the command there. See HPC:Using qlogin.
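
For example, to get an interactive session with the defaults (one slot, 2GB RAM) and run a command on the compute node (the command itself is just a placeholder):

 qlogin
 # once the session starts on a compute node:
 ./my_analysis.sh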

Running batch jobs

Use 'qsub'. This will allocate one slot on the cluster and request 2GB RAM by default. See the bottom of the qsub man page for an example. Google 'SGE qsub' for more help.
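
A minimal sketch of a batch script (myjob.sh and its contents are placeholders); the '#$' lines are qsub options embedded in the script:

 #!/bin/bash
 #$ -cwd                # run the job from the directory it was submitted from
 #$ -o myjob.out        # file for the job's stdout
 #$ -e myjob.err        # file for the job's stderr
 echo "Running on $(hostname)"

Submit it with:

 qsub myjob.sh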

Check how much memory your job uses. You can try running it in an interactive session first, or run just one job and see its peak memory usage after it's done. We have different size nodes, so for a 4GB node with 4 slots, you don't want to use more than 1GB per slot. For an 8GB node with 4 slots, you don't want to use more than 2GB per slot, and so on. Make sure your submitted job doesn't use too much memory or it can crash the node. See HPC:Large memory jobs
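
If you know the job's peak memory, you can also tell the scheduler about it at submission time. On Grid Engine that is usually a '-l' resource request along these lines, though the exact resource name (h_vmem here) depends on how the cluster is configured, so treat this as a sketch:

 qsub -l h_vmem=2G myjob.sh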

Running parallel jobs

Use 'qsub' with the '-pe' parameter. Using the '-pe' parameter allows you to request more than one slot per job. We have several different "parallel environments" defined; they differ in how the slots are allocated. If you want your slots on the same node, use '-pe DJ'. If you want your slots spread across nodes, use '-pe make'.
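
For example, to request 4 slots on a single node for a placeholder script myjob.sh:

 qsub -pe DJ 4 myjob.sh

or, to let the 4 slots be spread across nodes:

 qsub -pe make 4 myjob.sh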

Running OpenMPI jobs

See HPC:OpenMPI; contact manager@genomics.upenn.edu with any questions.

Job duration

We don't currently have any limit on job duration, but we recommend keeping jobs to under a day or two. Jobs that run longer than that are more likely to be affected by intermittent problems.

When jobs fail, you typically have to re-run them, so try to split your work into many small chunks (but not too many).
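
One common way to split work into chunks on Grid Engine is an array job, where each task processes one piece of the input. This is a sketch; the program and file names are placeholders:

 #!/bin/bash
 #$ -t 1-100            # run 100 tasks; each one gets its own $SGE_TASK_ID (1..100)
 myprog input.$SGE_TASK_ID > output.$SGE_TASK_ID

Submit the script once with 'qsub' and the scheduler runs the 100 tasks as resources allow.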
