R

From FarmShare

Revision as of 18:20, 12 June 2014 by Bishopj (Talk | contribs)

Looking at installed packages

You can list the installed R packages by calling library() with no arguments:

library();

For example, these packages are currently installed on FarmShare:

$ R

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> library()
Packages in library '/usr/lib/R/site-library':

AMORE                   A MORE flexible neural network package
Biobase                 Biobase: Base functions for Bioconductor
DBI                     R Database Interface
GenABEL                 genome-wide SNP association analysis
HilbertVis              Hilbert curve visualization
Hmisc                   Harrell Miscellaneous
MCMCpack                Markov chain Monte Carlo (MCMC) Package
MNP                     R Package for Fitting the Multinomial Probit
                        Model
MatchIt                 MatchIt
RColorBrewer            ColorBrewer palettes
RGtk2                   R bindings for Gtk 2.8.0 and above
RMySQL                  R interface to the MySQL database
RODBC                   ODBC Database Access
RQuantLib               R interface to the QuantLib library
RSQLite                 SQLite interface for R
Rcmdr                   R Commander
Rcpp                    Seamless R and C++ Integration
Rglpk                   R/GNU Linear Programming Kit Interface
Rmpi                    Interface (Wrapper) to MPI (Message-Passing
                        Interface)
Rserve                  Binary R server
TeachingDemos           Demonstrations for teaching and learning
VGAM                    Vector Generalized Linear and Additive Models
XML                     Tools for parsing and generating XML within R
                        and S-Plus.
Zelig                   Everyones Statistical Software
abind                   Combine multi-dimensional arrays
bayesm                  Bayesian Inference for
                        Marketing/Micro-econometrics
bio3d                   Biological Structure Analysis
bitops                  Functions for Bitwise operations
caTools                 Tools: moving window statistics, GIF, Base64,
                        ROC AUC, etc.
cairoDevice             Cairo-based cross-platform antialiased graphics
                        device driver.
car                     Companion to Applied Regression
chron                   Chronological objects which can handle dates
                        and times
coda                    Output analysis and diagnostics for MCMC
colorspace              Color Space Manipulation
combinat                combinatorics utilities
cummeRbund              Analysis, exploration, manipulation, and
                        visualization of Cufflinks high-throughput
                        sequencing data.
date                    Functions for handling dates
digest                  Create cryptographic hash digests of R objects
eco                     R Package for Ecological Inference in 2x2
                        Tables
edgeR                   Empirical analysis of digital gene expression
                        data in R
effects                 Effect Displays for Linear, Generalized Linear,
                        Multinomial-Logit, Proportional-Odds Logit
                        Models and Mixed-Effects Models
fAssets                 Rmetrics - Assets Selection and Modelling
fBasics                 Rmetrics - Markets and Basic Statistics
fCopulae                Rmetrics - Dependence Structures with Copulas
fExtremes               Rmetrics - Extreme Financial Market Data
fGarch                  Rmetrics - Autoregressive Conditional
                        Heteroskedastic Modelling
fMultivar               Multivariate Market Analysis
fOptions                Basics of Option Valuation
fPortfolio              Rmetrics - Portfolio Selection and Optimization
                        - ebook available at www.rmetrics.org
fTrading                Technical Trading Analysis
g.data                  Delayed-Data Packages
gdata                   Various R programming tools for data
                        manipulation
genetics                Population Genetics
ggplot2                 An implementation of the Grammar of Graphics
gmodels                 Various R programming tools for model fitting
gplots                  Various R programming tools for plotting data
gregmisc                Gregs Miscellaneous Functions
gtools                  Various R programming tools
haplo.stats             Statistical Analysis of Haplotypes with Traits
                        and Covariates when Linkage Phase is Ambiguous
happy                   Quantitative Trait Locus genetic analysis in
                        Heterogeneous Stocks
hdf5                    HDF5
its                     Irregular Time Series
latticeExtra            Extra Graphical Utilities Based on Lattice
limma                   Linear Models for Microarray Data
lme4                    Linear mixed-effects models using S4 classes
lmtest                  Testing Linear Regression Models
mapdata                 Extra Map Databases
mapproj                 Map Projections
maps                    Draw Geographical Maps
misc3d                  Miscellaneous 3D Plots
mnormt                  The multivariate normal and t distributions
msm                     Multi-state Markov and hidden Markov models in
                        continuous time
multcomp                Simultaneous Inference in General Parametric
                        Models
multicore               Parallel processing of R code on machines with
                        multiple cores or CPUs
mvtnorm                 Multivariate Normal and t Distributions
plyr                    Tools for splitting, applying and combining
                        data
proto                   Prototype object-based programming
psy                     Various procedures used in psychometry
pvclust                 Hierarchical Clustering with P-Values via
                        Multiscale Bootstrap Resampling
qtl                     Tools for analyzing QTL experiments
quadprog                Functions to solve Quadratic Programming
                        Problems.
qvalue                  Q-value estimation for false discovery rate
                        control
randomForest            Breiman and Cutlers random forests for
                        classification and regression
relimp                  Relative Contribution of Effects in a
                        Regression Model
reshape                 Flexibly reshape data.
reshape2                Flexibly reshape data: a reboot of the reshape
                        package.
rggobi                  Interface between R and GGobi
rgl                     3D visualization device system (OpenGL)
rkward                  Provides functions related to the RKWard GUI
rkwardtests             RKWard Plugin Test Suite Framework
rms                     Regression Modeling Strategies
robustbase              Basic Robust Statistics
rotRPackage             Statistical functions needed by the OpenTURNS
                        project, see www.openturns.org
rsprng                  R interface to SPRNG (Scalable Parallel Random
                        Number Generators)
sandwich                Robust Covariance Matrix Estimators
slam                    Sparse Lightweight Arrays and Matrices
sm                      Smoothing methods for nonparametric regression
                        and density estimation
sn                      The skew-normal and skew-t distributions
snow                    Simple Network of Workstations
sp                      classes and methods for spatial data
stabledist              Stable Distribution Functions
stringr                 Make it easier to work with strings.
strucchange             Testing, Monitoring, and Dating Structural
                        Changes
timeDate                Rmetrics - Chronological and Calendar Objects
timeSeries              Rmetrics - Financial Time Series Objects
tkrplot                 TK Rplot
tseries                 Time series analysis and computational finance
vcd                     Visualizing Categorical Data
zoo                     S3 Infrastructure for Regular and Irregular
                        Time Series (Zs ordered observations)

Packages in library '/usr/lib/R/library':

KernSmooth              Functions for kernel smoothing for Wand & Jones
                        (1995)
MASS                    Support Functions and Datasets for Venables and
                        Ripleys MASS
Matrix                  Sparse and Dense Matrix Classes and Methods
base                    The R Base Package
boot                    Bootstrap Functions (originally by Angelo Canty
                        for S)
class                   Functions for Classification
cluster                 Cluster Analysis Extended Rousseeuw et al.
codetools               Code Analysis Tools for R
compiler                The R Compiler Package
datasets                The R Datasets Package
foreign                 Read Data Stored by Minitab, S, SAS, SPSS,
                        Stata, Systat, dBase, ...
grDevices               The R Graphics Devices and Support for Colours
                        and Fonts
graphics                The R Graphics Package
grid                    The Grid Graphics Package
lattice                 Lattice Graphics
methods                 Formal Methods and Classes
mgcv                    Mixed GAM Computation Vehicle with GCV/AIC/REML
                        smoothness estimation
nlme                    Linear and Nonlinear Mixed Effects Models
nnet                    Feed-forward Neural Networks and Multinomial
                        Log-Linear Models
parallel                Support for Parallel computation in R
rpart                   Recursive Partitioning
spatial                 Functions for Kriging and Point Pattern
                        Analysis
splines                 Regression Spline Functions and Classes
stats                   The R Stats Package
stats4                  Statistical Functions using S4 Classes
survival                Survival Analysis
tcltk                   Tcl/Tk Interface
tools                   Tools for Package Development
utils                   The R Utils Package


Which R are you using?

To see which R executable is first on your PATH, run:

 which R

To see which version of R that is, run:

 R --version
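
When more than one R is installed (for example the system R under /usr/lib/R alongside one under /mnt/glusterfs/software), which R shows only the first match. The following is a minimal sketch that lists every R executable on your PATH in search order. The function name list_r_on_path is made up for illustration, and a POSIX-style shell such as bash is assumed.

```shell
# List every executable named R on PATH, in the order the shell searches.
# The first line printed is the one a bare "R" command would run.
list_r_on_path() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        candidate="${dir:-.}/R"
        if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
            echo "$candidate"
        fi
    done
    IFS=$old_ifs
}

list_r_on_path
```

If this prints more than one line, check that the first entry is the R you intend to use.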

Installing CRAN Packages

Most CRAN packages can be installed per-user by running install.packages() in an interactive session:

install.packages("package_name", dependencies = TRUE)

R first attempts to install to /usr/local/lib/R. When that attempt fails, it prompts to create a library subdirectory under ~/R (if one does not already exist) and installs there instead. If your package requires dependencies that are available from the standard Ubuntu repositories, you can submit a HelpSU ticket requesting installation. We can install from the Debian/Ubuntu package repositories or into the shared FarmShare filesystem.

You can, of course, install R packages into any path you like and add that path to R's library search path (for example with .libPaths()). That will probably break the next time R is upgraded to a new version, since your packages were built with the older version.

NOTE: when you install a package on corn, it is also available to you on barley.
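
If you would rather choose the per-user library location yourself than rely on the prompt, one option is to create the directory up front and point R at it with the R_LIBS_USER environment variable. This is a minimal sketch: the helper name setup_r_lib and the default directory ~/R/library are arbitrary choices for illustration, not FarmShare conventions.

```shell
# Create a personal R library directory and export R_LIBS_USER so that
# R sessions started from this shell pick it up.
# Usage: setup_r_lib [directory]   (default: $HOME/R/library, an assumption)
setup_r_lib() {
    lib_dir="${1:-$HOME/R/library}"
    mkdir -p "$lib_dir" || return 1
    R_LIBS_USER="$lib_dir"
    export R_LIBS_USER
    echo "R_LIBS_USER=$R_LIBS_USER"
}
```

After calling setup_r_lib in your shell (or from your shell startup file), R should list that directory in .libPaths(), and install.packages() should be able to write there.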

R Sample Job

Here's an example R file that generates a large array, fills it with some random numbers, then sleeps for 5 minutes. It happens to use almost exactly 8GB of RAM.

Save this as 8GB.R:

x <- array(1:1073741824, dim=c(1024,1024,1024)) 
x <- gaussian()
Sys.sleep(300)

Here's an example SGE submit script that runs that R file. Save it as r_test.script:

#!/bin/bash

# use the current directory
#$ -cwd
#$ -S /bin/bash

# mail this address
#$ -M chekh@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

R --vanilla --no-save < 8GB.R

You can submit it with just

 qsub r_test.script
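
If you run many R files this way, the submit script can be generated instead of edited by hand. The following is a minimal sketch, not part of FarmShare: the helper name make_submit_script is hypothetical, the mail options are left out, and the optional memory request assumes that the mem_free resource may be given as an embedded #$ -l line as well as on the qsub command line.

```shell
# Write an SGE submit script that runs the given R file.
# Usage: make_submit_script R_FILE OUTPUT_SCRIPT [MEM_FREE]
make_submit_script() {
    r_file=$1
    out=$2
    mem=${3:-}
    {
        echo '#!/bin/bash'
        echo '#$ -cwd'            # run in the current directory
        echo '#$ -S /bin/bash'
        if [ -n "$mem" ]; then
            # Optional memory request, e.g. 8G (assumed syntax, see above).
            echo "#\$ -l mem_free=$mem"
        fi
        echo "R --vanilla --no-save < $r_file"
    } > "$out"
}
```

For example, make_submit_script 8GB.R r_test.script 8G writes a script that can then be submitted with qsub r_test.script. Whether 8G is an appropriate request for your queue is something to check locally.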

Here are the output files that I get, one from stderr and one from stdout. The stdout file looks like this:

$ cat r_test.script.o497 
R version 2.12.1 (2010-12-16)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


> x <- array(1:1073741824, dim=c(1024,1024,1024)) 
> x <- gaussian()
> Sys.sleep(300)
>


In the mail that you get when the job ends, the maxvmem number is incorrect; this is a known bug in this version of SGE. The R script on this page actually uses 8GB of vmem.

Another R Sample Job

R script, let's call it R-jags.R

print("Hello World")
library(rjags)
#this just loaded some settings from that library
print("Finished")

Job script, let's call it R-jags.submit.script

#!/bin/bash

# use the current directory
#$ -cwd
#$ -S /bin/bash

# mail this address
#$ -M chekh@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

R --vanilla --no-save < R-jags.R

Submit it to the test queue with a small memory requirement:

 qsub -l mem_free=200M -l testq=1 R-jags.submit.script


The output files show that the job failed because R can't find the rjags package. You have two alternatives:

  • include the R library from /mnt/glusterfs/software
  • use modules to specify the full R install from /mnt/glusterfs/software

The first way, you would add this line to your R script:

 .libPaths(c("/mnt/glusterfs/software/free/R-2.15.0/lib/R/library", "/usr/lib/R/library"))
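
A variation on the first way leaves the R script untouched and sets the R_LIBS environment variable in the job script before R starts. This is a sketch only and is not exactly equivalent to the .libPaths() call: R_LIBS adds a directory to R's default search path rather than replacing the path. The variable name shared_lib is just for illustration.

```shell
# Prepend the shared library directory (path taken from the text above)
# to R_LIBS, keeping anything that was already set.
shared_lib="/mnt/glusterfs/software/free/R-2.15.0/lib/R/library"
R_LIBS="$shared_lib${R_LIBS:+:$R_LIBS}"
export R_LIBS
echo "R_LIBS=$R_LIBS"
```

These lines would go in the job script above the R --vanilla line. R ignores R_LIBS entries that do not exist, so a typo in the path fails silently.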

For the second way, your job script will look like this:

$ cat R-jags.submit.script
#!/bin/bash

# use the current directory
#$ -cwd
#$ -S /bin/bash

# mail this address
#$ -M chekh@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

eval `tclsh /mnt/glusterfs/software/free/modules/tcl/modulecmd.tcl sh autoinit`
module load R-2.15.0
R --vanilla --no-save < R-jags.R

Links

Other departments have more detailed examples:

building our local R

Here's how I usually do it.


lapack issues

If you see messages like:

  unable to load shared object '/usr/lib/R/modules//lapack.so':

most likely you're mixing R versions and libraries.

Double check that your R library path does not point to directories containing libraries built for an older version of R.
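
One way to do that check is to print every place a library path is commonly set. This is a rough sketch, assuming the usual R startup files ~/.Renviron and ~/.Rprofile and the standard R_LIBS, R_LIBS_USER and R_LIBS_SITE environment variables. The function name check_r_lib_settings is made up for illustration.

```shell
# Show the environment variables and startup-file lines that can change
# R's library search path.
check_r_lib_settings() {
    for var in R_LIBS R_LIBS_USER R_LIBS_SITE; do
        eval "val=\${$var:-}"
        echo "$var=${val:-<unset>}"
    done
    for f in "$HOME/.Renviron" "$HOME/.Rprofile"; do
        if [ -f "$f" ]; then
            # /dev/null as a second file makes grep print the file name.
            grep -n -E 'R_LIBS|libPaths' "$f" /dev/null || true
        fi
    done
}
```

Run check_r_lib_settings in the same shell (or job script) that starts R, and look for any path that names a different R version from the one you are running.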

This test script should run fine if everything is set up correctly:

$ cat lapack.r 
data(iris)
zz = lm(Sepal.Length ~., data = iris) 
summary(zz)

$ R --no-save < lapack.r 