User Guide

From FarmShare

Contact SRCC staff for support at: [mailto:srcc-support@stanford.edu srcc-support@stanford.edu], or post questions and concerns to the community [https://mailman.stanford.edu/mailman/listinfo/farmshare-discuss discussion list] at: [mailto:farmshare-discuss@lists.stanford.edu farmshare-discuss@lists.stanford.edu].
== Connecting ==
Log into <code>rice.stanford.edu</code>. Authentication is by SUNet ID and password (or GSSAPI), and [https://uit.stanford.edu/service/webauth/twostep two-step] authentication is required. A suggested configuration for OpenSSH and recommendations for two popular SSH clients for Windows can be found in [[Advanced Connection Options]].
== Storage ==
FarmShare is ''not'' approved for use with [https://dataclass.stanford.edu/ high-risk] data, including protected health information and personally identifiable information.
=== Home ===
Home directories are served (via NFS 4) from a dedicated file server, and per-user quota is currently 48 GB. Users may exceed this soft limit for up to 7 days, up to a hard limit of 64 GB.
=== AFS ===
[https://uit.stanford.edu/service/afs AFS] is accessible from <code>rice</code> systems ''only''. A link to each user's AFS home directory, <code>~/afs-home</code>, is provided as a convenience, but should only be used to access files in the legacy environment, and for transferring data. It should ''not'' be used as a working directory when submitting batch jobs, as AFS is not accessible from compute nodes. Please note that a valid Kerberos ticket and an [[AFS#Authentication|AFS token]] are required to access locations in AFS; run <code>kinit && aklog</code> to re-authenticate if you have trouble accessing any AFS directory.
The default, per-user quota for AFS home directories is 5 GB, but you may have additional quota due to your enrollment in certain courses, and you can [https://tools.stanford.edu/cgi-bin/afs-request request] additional quota (up to 20 GB total) with faculty sponsorship. AFS is backed up every night, and backups are kept for 30 days. The most recent snapshot of your AFS home directory is available in the <code>.backup</code> subdirectory, and you can request recovery from older backups by submitting a [https://helpsu.stanford.edu HelpSU] ticket.
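
To check your current AFS usage against quota, you can run the OpenAFS <code>fs</code> command from a <code>rice</code> login session (with a valid Kerberos ticket and AFS token):

<source lang="sh">
# Show quota, usage, and percent used for the volume holding your AFS home
fs listquota ~/afs-home
</source>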
=== Scratch ===
Scratch storage is available in <code>/farmshare/user_data</code>, and each user is provided with a personal scratch directory, <code>/farmshare/user_data/$USER</code>. The total volume size is currently 126 TB; quotas are not currently enforced, but old files may be purged without warning. The scratch volume is ''not'' backed up, and is ''not'' suitable for long-term storage, but can be used as working storage for batch jobs, and as a short-term staging area for data waiting to be archived to permanent storage.
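
For example, to work from your personal scratch directory and check remaining space on the shared volume:

<source lang="sh">
# Change to your personal scratch directory
cd /farmshare/user_data/$USER

# Check how full the scratch volume is before staging large data sets
df -h /farmshare/user_data
</source>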
=== Temp ===
Local <code>/tmp</code> storage is available on most nodes, but size varies from node to node. On <code>rice</code> systems, <code>/tmp</code> is 512 GB, with a per-user quota of 128 GB. Users may exceed this soft limit for up to 7 days, up to a hard limit of 192 GB, and space is regularly reclaimed from files older than 7 days.
== File Transfer ==
=== Using SSH ===
FarmShare supports any file-transfer method using SSH as a transport, including standard tools like <code>scp</code>, <code>sftp</code>, and <code>rsync</code> on Linux and macOS systems, and SFTP clients like [https://uit.stanford.edu/software/fetch Fetch] for macOS and [https://uit.stanford.edu/software/scrt_sfx SecureFX] for Windows. Because two-step authentication is required, you may need to enable keep-alive in your preferred SFTP client to avoid repeated authentications. For Fetch, in the Preferences dialog, select <code>General</code> → <code>FTP compatibility</code> → <code>Keep connections alive</code>; for SecureFX, in Global Options, select <code>File Transfer</code> → <code>Options</code> → <code>Advanced</code> → <code>Options</code> → <code>Keep connections alive</code>.
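
For example, from a Linux or macOS machine (replace <code>sunetid</code> with your own SUNet ID; the paths are illustrative):

<source lang="sh">
# Copy a local directory to your FarmShare home directory
rsync -av --progress ./results/ sunetid@rice.stanford.edu:~/results/

# Retrieve a single file with scp
scp sunetid@rice.stanford.edu:~/results/output.csv .
</source>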
You can also use FUSE and [[SSHFS]] to mount your FarmShare home and scratch directories. Most Linux distributions provide a standard <code>sshfs</code> package. On macOS you can use [https://brew.sh Homebrew] to install the <code>osxfuse</code> and <code>sshfs</code> packages, or download FUSE and SSHFS installers from the [https://osxfuse.github.io FUSE for macOS] project. Support for this option on Windows typically requires commercial software (like [https://www.expandrive.com ExpanDrive]).
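
A typical SSHFS session looks like the following (replace <code>sunetid</code> with your own SUNet ID; the mount point is illustrative):

<source lang="sh">
# Create a local mount point and mount your FarmShare home directory
mkdir -p ~/farmshare
sshfs sunetid@rice.stanford.edu: ~/farmshare

# Unmount when finished (Linux; on macOS use: umount ~/farmshare)
fusermount -u ~/farmshare
</source>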
=== Using AFS ===
You can use the native OpenAFS client to access files in AFS, including your AFS home directory. Most Linux distributions provide standard <code>openafs</code> packages. The University provides [https://uit.stanford.edu/software/afs installers] for the macOS and Windows clients.
You can also use [https://afs.stanford.edu WebAFS] to transfer files between your computer and locations in AFS using a web browser.
== Installed Software ==
FarmShare systems run Ubuntu 16.04 LTS, and most software is sourced from standard repositories. Additional software, including licensed software, is organized using environment modules and can be accessed using the <code>module</code> command. Users can build and/or install their own software in their home directories, either manually, or using a local package manager. FarmShare supports running software packaged as Singularity containers.
== Running Jobs ==
FarmShare uses [https://slurm.schedmd.com Slurm] for job management. Full [https://slurm.schedmd.com/documentation.html documentation] is available from the vendor, and detailed usage information is provided in the <code>man</code> pages for the <code>srun</code>, <code>sbatch</code>, <code>squeue</code>, <code>scancel</code>, <code>sinfo</code>, and <code>scontrol</code> commands.
Jobs are scheduled according to a priority which depends on a number of factors, including how long a job has been waiting, its size, and a fair-share value that tracks recent per-user utilization of cluster resources. Lower-priority jobs, and jobs requiring access to resources not currently available, may wait some time before starting to run. The scheduler may reserve resources so that pending jobs can start; while it will try to backfill these resources with smaller, shorter jobs (even those at lower priorities), this behavior can sometimes cause nodes to appear to be idle even when there are jobs that are ready to run. You can use <code>squeue --start</code> to get an ''estimate'' of when pending jobs will start.
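
For example, to list only your own pending jobs with their estimated start times:

<source lang="sh">
# Estimated (not guaranteed) start times for your pending jobs
squeue --start --user=$USER
</source>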
=== Interactive Jobs ===
Interactive sessions that require resources in excess of limits on the login nodes, exclusive access to resources, or access to a feature not available on the login nodes (e.g., a GPU), can be submitted to a compute node.
<source lang="sh">srun --pty --qos=interactive $SHELL -l</source>
Interactive jobs receive a modest priority boost compared to batch jobs, but when contention for resources is high interactive jobs may wait a long time before starting. Each user is allowed one interactive job, which may run for at most one day.
=== Batch Jobs ===
The <code>sbatch</code> command is used to submit a batch job, and takes a batch script as an argument. Options are used to request specific resources (including runtime), and can be provided either on the command line or, using a special syntax, in the script file itself. <code>sbatch</code> can also be used to submit many similar jobs, each perhaps varying in only one or two parameters, in a single invocation using the <code>--array</code> option; each job in an array has access to environment variables identifying its rank.
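
A minimal batch script might look like the following (the job name, output pattern, and resource requests are illustrative):

<source lang="sh">
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example-%j.out   # %j expands to the job ID
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
#SBATCH --time=01:00:00

srun hostname
</source>

Submit it with <code>sbatch example.sbatch</code>. For an array job, add an option like <code>--array=1-10</code> and read <code>$SLURM_ARRAY_TASK_ID</code> inside the script to vary the parameter for each task.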
=== MPI jobs ===
[[OpenMPI]] is installed, both as a package, and (in a more recent version) as a module (<code>openmpi</code>). [https://software.intel.com/en-us/intel-mpi-library/ Intel MPI] is also installed, as part of the [https://software.intel.com/en-us/parallel-studio-xe/ Intel Parallel Studio] module (<code>intel</code>). Because security concerns restrict allowed authentication methods, SSH cannot be used to launch MPI tasks; use <code>srun</code> instead.
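
Since <code>srun</code> is the task launcher, an MPI batch script might look like this sketch (the task count and program name are illustrative):

<source lang="sh">
#!/bin/bash
#SBATCH --ntasks=16
#SBATCH --time=02:00:00

module load openmpi

# Launch MPI tasks with srun; do not use mpirun over SSH
srun ./my_mpi_program
</source>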
=== Default Allocations ===
Default allocations vary by partition and quality-of-service, but in general a job will have access to 1 physical core (2 threads) and 8 GB of memory, and may run for up to 2 hours by default; interactive jobs may run for up to 1 hour by default. The default allocation on the <code>bigmem</code> partition is 1 core (2 threads) and 48 GB of memory.
If your job needs more resources than are provided by default, or access to a special feature (like large memory or a GPU), you ''must'' run on the appropriate partition (or quality-of-service) ''and'' request those resources explicitly. Common <code>sbatch</code> options include <code>--partition</code>, <code>--qos</code>, <code>--cpus-per-task</code>, <code>--mem</code>, <code>--mem-per-cpu</code>, <code>--gres</code>, and <code>--time</code>.
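
For example, to request more CPU, memory, and runtime than the defaults provide (the values here are illustrative):

<source lang="sh">
# 4 CPUs per task, 16 GB of memory, and an 8-hour runtime limit
sbatch --cpus-per-task=4 --mem=16G --time=08:00:00 job.sbatch
</source>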
=== Limits ===
Maximum runtime is 2 days unless jobs are scheduled using the <code>long</code> quality-of-service, which has a 7-day maximum runtime; interactive jobs have a maximum runtime of 1 day.
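
For example, to request the 7-day maximum on the <code>long</code> quality-of-service (the script name is illustrative):

<source lang="sh">
# Slurm accepts days-hours:minutes:seconds for --time
sbatch --qos=long --time=7-00:00:00 job.sbatch
</source>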
The <code>gpu</code> quality-of-service has a minimum GPU requirement (1), so you must request access to a GPU explicitly when submitting a job.
<source lang="sh">sbatch --partition=gpu --qos=gpu --gres=gpu:1</source>
The <code>bigmem</code> quality-of-service has a minimum memory requirement; you must request at least 96 GB when submitting a job.
<source lang="sh">sbatch --partition=bigmem --qos=bigmem --mem=96G</source>
=== Monitoring your Jobs ===
You can use the <code>squeue</code> and <code>sacct</code> commands to monitor the current state of the scheduler and of your jobs. The <code>sprio</code> command can provide some information on how priority was determined for particular jobs, and the <code>sshare</code> command on how current fair-share was calculated. Use the <code>scontrol</code> and <code>sacctmgr</code> commands to examine the configuration of hosts, partitions, and qualities-of-service.
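
For example, to review a finished job and your current fair-share standing (replace <code>JOBID</code> with an actual job ID):

<source lang="sh">
# State, elapsed time, and peak memory for a completed job
sacct --jobs=JOBID --format=JobID,JobName,State,Elapsed,MaxRSS

# Your recent usage and fair-share factor
sshare --user=$USER
</source>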

Revision as of 16:05, 12 April 2018
