R

From FarmShare


Revision as of 13:33, 19 December 2012

Use 'module load R-2.15.1' if you're on Ubuntu 12.04, and check which library directories you're loading.

Currently R 2.15.2 is installed from the standard repos (package names start with 'r-cran-'), and R 2.15.1 is installed in FarmShare modules (try 'module avail'). You can just type "R" to start it, but you may need to set your R library path (see .libPaths() below) if the package you need is not in the default library directories.


Looking at installed packages

library();
.libPaths(c("/mnt/glusterfs/software/free/R-2.15.1/lib/R/library", "/usr/lib/R/library"))
library();

Installing CRAN Packages

Most CRAN packages can be installed per-user by running install.packages() in an interactive session:

install.packages("package_name", dependencies = TRUE)

R initially attempts to install to /usr/local/lib/R. When that attempt fails, it prompts for the creation of a library subdirectory in ~/R (if necessary) and falls back to installing there. If your package requires dependencies available from the standard Ubuntu repositories, you can submit a HelpSU ticket requesting installation. We can install from the Debian/Ubuntu package repositories or into the shared FarmShare filesystem.

You can, of course, install R libraries into any arbitrary path and add that path to your R library search path, for example with .libPaths().
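One way to make such a path permanent for your shell sessions is the R_LIBS_USER environment variable, which R reads at startup. The sketch below is only an illustration: the directory name my-library is a hypothetical choice, not a FarmShare convention, and R only adds the directory to its search path if it already exists.

```shell
# Hypothetical location for a personal R library; any writable path works.
my_r_lib="$HOME/R/my-library"
mkdir -p "$my_r_lib"

# R reads R_LIBS_USER at startup and puts the directory (if it exists)
# at the front of .libPaths().
export R_LIBS_USER="$my_r_lib"

# Print the resulting search path, but only if Rscript is on the PATH.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e '.libPaths()'
fi
```

Putting the export line in your shell startup file would make the setting apply to every new session.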

NOTE: a package that you install on corn will also be available to you on barley.

R Sample Job

Here's an example R file that generates a large array, fills it with some random numbers, then sleeps for 5 minutes. This happens to use up almost exactly 8GB of RAM.

$ cat 8GB.R 
x <- array(1:1073741824, dim=c(1024,1024,1024)) 
x <- gaussian()
Sys.sleep(300)
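As a rough sanity check on the memory figure, the size of the array can be worked out with shell arithmetic. This is a back-of-the-envelope sketch, not a measurement: it assumes R stores each integer in 4 bytes, and the idea that two copies of the data exist at once (the sequence and the array built from it) is an assumption offered to explain the 8GB figure.

```shell
# Number of values in 1:1073741824 (1024 x 1024 x 1024).
elements=$((1024 * 1024 * 1024))

# Assumption: each R integer occupies 4 bytes.
bytes_per_int=4

# Size of one copy of the data, in GiB.
one_copy_gib=$((elements * bytes_per_int / 1024 / 1024 / 1024))

echo "one copy:   ${one_copy_gib} GiB"
# Assumption: the sequence and the array are both in memory at the same time.
echo "two copies: $((2 * one_copy_gib)) GiB"
```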

Here's an example SGE submit script, saved as r_test.script, that runs that R file.

#!/bin/bash

# use the current directory
#$ -cwd
# mail this address
#$ -M chekh@stanford.edu
# send mail on begin, end, suspend
#$ -m bes
# get rid of spurious messages about tty/terminal types
#$ -S /bin/sh

R --vanilla --no-save < 8GB.R

You can submit it with just

 qsub r_test.script

Here are the output files that I get, one from stderr, one from stdout

$ cat r_test.script.e497 
tset: standard error: Function not implemented

Undefined tty
stdin: is not a tty
$ cat r_test.script.o497 

Warning: no access to tty (Bad file descriptor). Thus no job control in this shell.

R version 2.12.1 (2010-12-16)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> x <- array(1:1073741824, dim=c(1024,1024,1024))
> x <- gaussian()
> Sys.sleep(300)
>

Those errors about tty and job control are related to shell startup and terminal settings, and are normal. You can ignore them, or specify the 'sh' shell in your job script:

# get rid of spurious messages about tty/terminal types
#$ -S /bin/sh

In the mail that you get when the job ends, the maxvmem number is incorrect; this is a known bug in this version of SGE. The R script on this page actually uses 8GB of vmem.

Another R Sample Job

R script, let's call it R-jags.R

print("Hello World")
library(rjags)
#this just loaded some settings from that library
print("Finished")

Job script, let's call it R-jags.submit.script

#!/bin/bash

# use the current directory
#$ -cwd
# mail this address
#$ -M chekh@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

R --vanilla --no-save < R-jags.R

Submit it to the test queue with a small memory requirement:

 qsub -l mem_free=200M -l testq=1 R-jags.submit.script


The output files show that the job failed because R can't find the package rjags. You have two alternatives:

  • include the R library from /mnt/glusterfs/software
  • use modules to specify the full R install from /mnt/glusterfs/software

For the first, add this line to your R script, before the library(rjags) call:

 .libPaths(c("/mnt/glusterfs/software/free/R-2.15.0/lib/R/library", "/usr/lib/R/library"))
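Before editing the R script, it can help to confirm which library directory actually holds the package. The helper below is a hypothetical sketch (find_r_package is not an existing FarmShare or R command); it relies on the fact that an installed R package is a subdirectory of the library directory containing a DESCRIPTION file.

```shell
# find_r_package PKG DIR...
# Print every DIR that contains an installed copy of PKG.
# (Hypothetical helper, written for this page.)
find_r_package() {
    pkg="$1"
    shift
    for dir in "$@"; do
        # An installed package is a directory with a DESCRIPTION file.
        if [ -f "$dir/$pkg/DESCRIPTION" ]; then
            echo "$dir"
        fi
    done
}

# Check the two library directories used on this page.
find_r_package rjags \
    /mnt/glusterfs/software/free/R-2.15.0/lib/R/library \
    /usr/lib/R/library
```

If nothing is printed, neither directory contains rjags and the .libPaths() line will not help on its own.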

For the second, your job script will look like this:

$ cat R-jags.submit.script
#!/bin/bash

# use the current directory
#$ -cwd
# mail this address
#$ -M chekh@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

eval `tclsh /mnt/glusterfs/software/free/modules/tcl/modulecmd.tcl sh autoinit`
module load R-2.15.0
R --vanilla --no-save < R-jags.R

Links

Some other departments have some other more detailed examples:

building our local R

Here's how I usually do it.


lapack issues

If you see messages like:

  unable to load shared object '/usr/lib/R/modules//lapack.so':

most likely you're mixing R versions and libraries.

Use the module load command at the top of the page, and check whether at some point you set your R library path to point to directories with older libraries.
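A quick way to look for a stale library path from the shell is sketched below. It is only a starting point: show_r_lib_settings is a hypothetical helper name, and it only checks the standard R_LIBS and R_LIBS_USER environment variables and the per-user ~/.Rprofile and ~/.Renviron startup files, so settings made elsewhere will not show up.

```shell
# Hypothetical helper: report where R library paths might be overridden.
show_r_lib_settings() {
    # Which R binary would the shell run?
    command -v R || echo "R is not on the PATH"

    # Library-path overrides inherited from the environment.
    echo "R_LIBS=${R_LIBS:-<unset>}"
    echo "R_LIBS_USER=${R_LIBS_USER:-<unset>}"

    # Library-path settings left behind in per-user startup files.
    for f in "$HOME/.Rprofile" "$HOME/.Renviron"; do
        if [ -f "$f" ]; then
            # No match is fine, so ignore grep's exit status.
            grep -n -E 'libPaths|R_LIBS' "$f" || true
        fi
    done
}

show_r_lib_settings
```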

Test script:

$ cat lapack.r 
data(iris)
zz = lm(Sepal.Length ~., data = iris) 
summary(zz)

$ R --no-save < lapack.r 