FAQ

From FarmShare

Latest revision as of 15:05, 12 April 2018

Policy

Can I use FarmShare with high-risk data?

No. FarmShare is not approved for use with high-risk data, including protected health information and personally identifiable information. Do not use FarmShare resources to store or process protected information.

Shell and Environment

How do I change my shell?

bash is the default shell for most users, and should be the default for all new accounts; older accounts may still default to tcsh. To change your shell, send e-mail to srcc-support@stanford.edu. The bash, zsh, fish, mksh, and tcsh shells are installed, but not all are equally well supported.

FarmShare uses Stanford's central account infrastructure, so changing your shell on FarmShare will affect all other systems that use this infrastructure (for example, myth.stanford.edu). Please acknowledge your understanding of this by including something like the following in the text of your request.

Please change my default shell to $SHELL. I understand that this is a global change and will affect not only FarmShare systems, but all other systems at Stanford that use the University's central account infrastructure.
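
Before sending a request, it can help to confirm which shell is currently recorded as your default. The following is a minimal sketch, assuming the standard getent and id utilities are available; the function name current_login_shell is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Print the login shell recorded for the current account.
current_login_shell() {
    # getent queries the account database; the shell is the last
    # colon-separated field of the passwd entry.
    entry=$(getent passwd "$(id -un)" 2>/dev/null)
    shell=${entry##*:}
    # Fall back to $SHELL if no entry (or no shell field) was found.
    printf '%s\n' "${shell:-${SHELL:-unknown}}"
}

current_login_shell
```

The value printed is what a new login session would start, which may differ from the shell you are typing in if you launched another one by hand.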

Why does my shell exit when running the module command?

An early version of the default tcsh configuration set an option, printexitvalue, that conflicts with the Lmod configuration. This has been fixed for new users, but existing users may still have set printexitvalue in ~/.tcshrc.set. You can edit this file to remove the statement, comment it out (by prepending #), or copy over a corrected version of the default file from /etc/skel.

cp /etc/skel/.tcshrc.set ~

You can either log out and back in again, or run unset printexitvalue once, to make the change take effect.
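
If you would rather keep the rest of your configuration and only comment out the offending statement, the edit can be scripted. This is a sketch, not an official tool: the function name comment_out_printexitvalue is hypothetical, and it assumes a sed that accepts the -i.bak form (GNU and BSD sed both do).

```shell
#!/bin/sh
# Comment out any active "set printexitvalue" line in the given file,
# keeping the original as a backup with a .bak suffix.
comment_out_printexitvalue() {
    file=$1
    [ -f "$file" ] || return 1
    # Lines that are already commented do not match, because the pattern
    # only allows whitespace before "set".
    sed -i.bak 's/^\([[:space:]]*\)\(set[[:space:]][[:space:]]*printexitvalue\)/\1# \2/' "$file"
}

# Usage (path taken from the text above):
#   comment_out_printexitvalue ~/.tcshrc.set
```

As with the other fixes, the change applies to new sessions; in a shell that is already running you still need to run unset printexitvalue once.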

Storage

Where are my files?

FarmShare no longer uses AFS for users' home directories. AFS is still accessible on rice systems, and you can access your AFS home directory using the convenience link, ~/afs-home.

Why can't I access files in ~/afs-home or /afs?

AFS access requires valid Kerberos credentials and an AFS token. You can use the klist and tokens commands to view your existing credentials, if any. If you're having trouble accessing files in AFS, try re-authenticating:

kinit && aklog

See Advanced Connection Options for a suggested SSH configuration that can help reduce the occurrence of token issues at login.
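
The re-authentication step can be wrapped so that you are only prompted for a password when the Kerberos ticket has actually expired. The sketch below assumes MIT Kerberos, whose klist -s runs silently and exits non-zero when the credential cache is missing or expired; the function name ensure_afs_access is hypothetical.

```shell
#!/bin/sh
# Obtain a fresh AFS token, asking for a password only if needed.
ensure_afs_access() {
    # klist -s prints nothing; its exit status says whether a valid
    # Kerberos ticket exists.
    if ! klist -s; then
        # No valid ticket: kinit prompts for your password.
        kinit || return 1
    fi
    # aklog derives an AFS token from the current Kerberos ticket.
    aklog
}

# Usage:
#   ensure_afs_access && ls ~/afs-home
```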

Are my data backed up?

We take regular snapshots of data in your home directory (/home/$USER) and may be able to recover lost or damaged files in some cases. Data in your AFS home directory (~/afs-home) are backed up every night, and backups are kept for 30 days. The most recent backup is mounted at ~/afs-home/.backup; if you need to recover data from an older backup you should submit a HelpSU request. Data stored on the scratch volume (/farmshare/user_data/$USER) are not backed up and may be purged without warning.
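
Recovering a file from the most recent AFS backup is an ordinary copy out of ~/afs-home/.backup. The sketch below assumes the backup mirrors the directory layout of your AFS home directory; the function name restore_from_afs_backup and the .restored suffix are choices made for this example, not part of FarmShare.

```shell
#!/bin/sh
# Copy one file from the most recent AFS backup into the live AFS home.
# The argument is a path relative to ~/afs-home. The restored copy gets
# a ".restored" suffix so the current file is never overwritten.
restore_from_afs_backup() {
    rel=$1
    src="$HOME/afs-home/.backup/$rel"
    dest="$HOME/afs-home/$rel.restored"
    [ -f "$src" ] || { echo "not found in backup: $rel" >&2; return 1; }
    # -p preserves the modification time and permissions of the backup copy.
    cp -p "$src" "$dest"
}

# Usage (hypothetical path):
#   restore_from_afs_backup notes/draft.txt
```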

Slurm

Why can't I submit jobs to the gpu partition?

You must explicitly request GPU resources using the --gres option when you submit a job to the gpu partition.

sbatch --partition=gpu --qos=gpu --gres=gpu:1

See the man page for sbatch for more information.
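
The same options can be placed in the job script itself as #SBATCH directives, so they do not have to be repeated on the command line. This is a minimal sketch: the partition, QOS, and --gres values come from the command above, while the body is only a placeholder. It relies on Slurm exporting CUDA_VISIBLE_DEVICES for jobs that were granted a GPU, which is typical but depends on how the cluster is configured.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --qos=gpu
#SBATCH --gres=gpu:1

# Report which GPUs the job can see. Outside of a GPU job the variable
# is usually unset, in which case "none" is printed.
gpu_report="GPUs visible to this job: ${CUDA_VISIBLE_DEVICES:-none}"
echo "$gpu_report"
```

Saved under a name of your choosing, the script is submitted by passing that file name to sbatch with no further options.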

Why can't I submit jobs to the bigmem partition?

You must request at least 96 GB of memory using the --mem option when you submit a job to the bigmem partition.

sbatch --partition=bigmem --qos=bigmem --mem=96G

See the man page for sbatch for more information.
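
As a job script, the bigmem request might look like the sketch below. The directive values are taken from the command above; the body is a placeholder that reports the memory granted. It assumes Slurm's usual behaviour of treating the G suffix as multiples of 1024 MB and of exporting SLURM_MEM_PER_NODE (in MB) when --mem is used.

```shell
#!/bin/bash
#SBATCH --partition=bigmem
#SBATCH --qos=bigmem
#SBATCH --mem=96G

# 96G expressed in MB, the unit Slurm uses when no suffix is given.
requested_mb=$((96 * 1024))

# Inside a job, SLURM_MEM_PER_NODE holds the per-node memory in MB.
# Outside of a job it is unset, so fall back to the requested value.
mem_report="Memory for this job: ${SLURM_MEM_PER_NODE:-$requested_mb} MB"
echo "$mem_report"
```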

Why are my jobs killed after 2 hours?

While the maximum runtime for jobs is 2 days (unless you are submitting a job using the long quality-of-service), the default runtime is 2 hours. If your job requires more time to run, you must request the additional time explicitly using --time. You can also request less time than the default.
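
A time request can also be written as a directive in the job script. The sketch below asks for one day, which is an arbitrary value chosen for illustration and stays within the 2-day maximum; Slurm accepts the days-hours:minutes:seconds form used here, among others. The body is a placeholder.

```shell
#!/bin/bash
# Request 1 day of runtime (format: days-hours:minutes:seconds).
#SBATCH --time=1-00:00:00

# SLURM_JOB_ID is set by Slurm inside a job; the fallback text is only
# printed when the script is run directly.
job_report="Running as job ${SLURM_JOB_ID:-(not under Slurm)}"
echo "$job_report"
```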

Applications

Why does Gaussian 16 fail with error: "illegal instruction (core dumped)"?

Gaussian 16 requires a more recent CPU than is available on some FarmShare systems. You'll need to request a node with a compatible CPU when submitting Gaussian 16 jobs, or fall back to Gaussian 09 to run on any node. See module help gaussian/g16-a.03 for more information.

Why does the gview command fail to start GaussView?

The gview command starts the graphical version of Vim, not GaussView. To start GaussView, load the gaussview module and then run the gv command instead.
