# R

### From FarmShare


## Revision as of 12:39, 10 July 2014


## Looking at installed packages

You can list the installed R packages by calling library() in an R session:

library();

For example, these packages are currently installed on FarmShare:

    $ R

    R version 2.15.2 (2012-10-26) -- "Trick or Treat"
    Copyright (C) 2012 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-pc-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    [Previously saved workspace restored]

    > library()
    Packages in library '/usr/lib/R/site-library':

    AMORE           A MORE flexible neural network package
    Biobase         Biobase: Base functions for Bioconductor
    DBI             R Database Interface
    GenABEL         genome-wide SNP association analysis
    HilbertVis      Hilbert curve visualization
    Hmisc           Harrell Miscellaneous
    MCMCpack        Markov chain Monte Carlo (MCMC) Package
    MNP             R Package for Fitting the Multinomial Probit Model
    MatchIt         MatchIt
    RColorBrewer    ColorBrewer palettes
    RGtk2           R bindings for Gtk 2.8.0 and above
    RMySQL          R interface to the MySQL database
    RODBC           ODBC Database Access
    RQuantLib       R interface to the QuantLib library
    RSQLite         SQLite interface for R
    Rcmdr           R Commander
    Rcpp            Seamless R and C++ Integration
    Rglpk           R/GNU Linear Programming Kit Interface
    Rmpi            Interface (Wrapper) to MPI (Message-Passing Interface)
    Rserve          Binary R server
    TeachingDemos   Demonstrations for teaching and learning
    VGAM            Vector Generalized Linear and Additive Models
    XML             Tools for parsing and generating XML within R and S-Plus.
    Zelig           Everyone's Statistical Software
    abind           Combine multi-dimensional arrays
    bayesm          Bayesian Inference for Marketing/Micro-econometrics
    bio3d           Biological Structure Analysis
    bitops          Functions for Bitwise operations
    caTools         Tools: moving window statistics, GIF, Base64, ROC AUC, etc.
    cairoDevice     Cairo-based cross-platform antialiased graphics device driver.
    car             Companion to Applied Regression
    chron           Chronological objects which can handle dates and times
    coda            Output analysis and diagnostics for MCMC
    colorspace      Color Space Manipulation
    combinat        combinatorics utilities
    cummeRbund      Analysis, exploration, manipulation, and visualization of Cufflinks high-throughput sequencing data.
    date            Functions for handling dates
    digest          Create cryptographic hash digests of R objects
    eco             R Package for Ecological Inference in 2x2 Tables
    edgeR           Empirical analysis of digital gene expression data in R
    effects         Effect Displays for Linear, Generalized Linear, Multinomial-Logit, Proportional-Odds Logit Models and Mixed-Effects Models
    fAssets         Rmetrics - Assets Selection and Modelling
    fBasics         Rmetrics - Markets and Basic Statistics
    fCopulae        Rmetrics - Dependence Structures with Copulas
    fExtremes       Rmetrics - Extreme Financial Market Data
    fGarch          Rmetrics - Autoregressive Conditional Heteroskedastic Modelling
    fMultivar       Multivariate Market Analysis
    fOptions        Basics of Option Valuation
    fPortfolio      Rmetrics - Portfolio Selection and Optimization - ebook available at www.rmetrics.org
    fTrading        Technical Trading Analysis
    g.data          Delayed-Data Packages
    gdata           Various R programming tools for data manipulation
    genetics        Population Genetics
    ggplot2         An implementation of the Grammar of Graphics
    gmodels         Various R programming tools for model fitting
    gplots          Various R programming tools for plotting data
    gregmisc        Greg's Miscellaneous Functions
    gtools          Various R programming tools
    haplo.stats     Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous
    happy           Quantitative Trait Locus genetic analysis in Heterogeneous Stocks
    hdf5            HDF5
    its             Irregular Time Series
    latticeExtra    Extra Graphical Utilities Based on Lattice
    limma           Linear Models for Microarray Data
    lme4            Linear mixed-effects models using S4 classes
    lmtest          Testing Linear Regression Models
    mapdata         Extra Map Databases
    mapproj         Map Projections
    maps            Draw Geographical Maps
    misc3d          Miscellaneous 3D Plots
    mnormt          The multivariate normal and t distributions
    msm             Multi-state Markov and hidden Markov models in continuous time
    multcomp        Simultaneous Inference in General Parametric Models
    multicore       Parallel processing of R code on machines with multiple cores or CPUs
    mvtnorm         Multivariate Normal and t Distributions
    plyr            Tools for splitting, applying and combining data
    proto           Prototype object-based programming
    psy             Various procedures used in psychometry
    pvclust         Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling
    qtl             Tools for analyzing QTL experiments
    quadprog        Functions to solve Quadratic Programming Problems.
    qvalue          Q-value estimation for false discovery rate control
    randomForest    Breiman and Cutler's random forests for classification and regression
    relimp          Relative Contribution of Effects in a Regression Model
    reshape         Flexibly reshape data.
    reshape2        Flexibly reshape data: a reboot of the reshape package.
    rggobi          Interface between R and GGobi
    rgl             3D visualization device system (OpenGL)
    rkward          Provides functions related to the RKWard GUI
    rkwardtests     RKWard Plugin Test Suite Framework
    rms             Regression Modeling Strategies
    robustbase      Basic Robust Statistics
    rotRPackage     Statistical functions needed by the OpenTURNS project, see www.openturns.org
    rsprng          R interface to SPRNG (Scalable Parallel Random Number Generators)
    sandwich        Robust Covariance Matrix Estimators
    slam            Sparse Lightweight Arrays and Matrices
    sm              Smoothing methods for nonparametric regression and density estimation
    sn              The skew-normal and skew-t distributions
    snow            Simple Network of Workstations
    sp              classes and methods for spatial data
    stabledist      Stable Distribution Functions
    stringr         Make it easier to work with strings.
    strucchange     Testing, Monitoring, and Dating Structural Changes
    timeDate        Rmetrics - Chronological and Calendar Objects
    timeSeries      Rmetrics - Financial Time Series Objects
    tkrplot         TK Rplot
    tseries         Time series analysis and computational finance
    vcd             Visualizing Categorical Data
    zoo             S3 Infrastructure for Regular and Irregular Time Series (Z's ordered observations)

    Packages in library '/usr/lib/R/library':

    KernSmooth      Functions for kernel smoothing for Wand & Jones (1995)
    MASS            Support Functions and Datasets for Venables and Ripley's MASS
    Matrix          Sparse and Dense Matrix Classes and Methods
    base            The R Base Package
    boot            Bootstrap Functions (originally by Angelo Canty for S)
    class           Functions for Classification
    cluster         Cluster Analysis Extended Rousseeuw et al.
    codetools       Code Analysis Tools for R
    compiler        The R Compiler Package
    datasets        The R Datasets Package
    foreign         Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, dBase, ...
    grDevices       The R Graphics Devices and Support for Colours and Fonts
    graphics        The R Graphics Package
    grid            The Grid Graphics Package
    lattice         Lattice Graphics
    methods         Formal Methods and Classes
    mgcv            Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness estimation
    nlme            Linear and Nonlinear Mixed Effects Models
    nnet            Feed-forward Neural Networks and Multinomial Log-Linear Models
    parallel        Support for Parallel computation in R
    rpart           Recursive Partitioning
    spatial         Functions for Kriging and Point Pattern Analysis
    splines         Regression Spline Functions and Classes
    stats           The R Stats Package
    stats4          Statistical Functions using S4 Classes
    survival        Survival Analysis
    tcltk           Tcl/Tk Interface
    tools           Tools for Package Development
    utils           The R Utils Package

## Which R are you using?

To see which R executable is first in your PATH, run:

which R

To check the version of R, run:

R --version

## Installing CRAN Packages

Most CRAN packages can be installed per-user by running install.packages() in an interactive session:

install.packages("package_name", dependencies = TRUE)

R first attempts to install to /usr/local/lib/R. When that fails, it offers to create a library subdirectory in ~/R (if necessary) and installs there instead. If your package requires dependencies that are available from the standard Ubuntu repositories, you can submit a HelpSU ticket requesting installation. We can install from the Debian/Ubuntu package repositories or into the shared FarmShare filesystem.

You can, of course, install R packages into any arbitrary path and add that path to your R environment (for example with .libPaths() or the R_LIBS_USER environment variable). That will probably break the next time R is upgraded to a new version, since your packages were built with the older version.
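One way to limit that breakage is to name the personal library after the R version, so an upgraded R starts with an empty library instead of picking up stale builds. The following is a minimal sketch under that assumption; the helper name `r_user_lib` and the `library-X.Y` directory naming are hypothetical, not a FarmShare convention. `R_LIBS_USER` is the standard environment variable R reads for the personal library.

```shell
# Sketch: keep personal R packages in a directory named after the R version.
# r_user_lib and the "library-X.Y" naming are illustrative assumptions.

# Print the library directory for a given R version,
# e.g. 2.15.2 -> <root>/library-2.15
r_user_lib() {
    version="$1"              # full version string, e.g. 2.15.2
    root="${2:-$HOME/R}"      # base directory; defaults to ~/R
    echo "$root/library-${version%.*}"   # drop the patch level
}

# Example: point R at the per-version library through R_LIBS_USER.
R_LIBS_USER="$(r_user_lib 2.15.2)"
export R_LIBS_USER
echo "R_LIBS_USER=$R_LIBS_USER"
```

R only adds library directories that already exist, so create the directory (for example with `mkdir -p "$R_LIBS_USER"`) before calling install.packages().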

NOTE: when you install a package on corn, it will also be available to you on barley.

## R Sample Job

Here's an example R file that generates a large array, reassigns x, then sleeps for 5 minutes (gaussian() returns a model family object, not random numbers). Running it happens to use up almost exactly 8 GB of RAM.

Save this as 8GB.R:

    x <- array(1:1073741824, dim=c(1024,1024,1024))
    x <- gaussian()
    Sys.sleep(300)

Here's an example SGE submit script that runs that R file.

    #!/bin/bash
    # use the current directory
    #$ -cwd
    #$ -S /bin/bash
    # mail this address
    #$ -M chekh@stanford.edu
    # send mail on begin, end, suspend
    #$ -m bes
    R --vanilla --no-save < 8GB.R

You can submit it with just

qsub r_test.script
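If you run several R files this way, the submit script can be generated instead of copied by hand. Here is a minimal sketch that prints a script with the same SGE directives as the example above; `make_r_job` is a hypothetical helper name and the mail address is a placeholder.

```shell
# Sketch: print an SGE submit script for an R file, using the same
# directives as the example above. make_r_job is a hypothetical helper.
make_r_job() {
    r_file="$1"      # R script to run, e.g. 8GB.R
    mail_to="$2"     # address for job notifications
    printf '%s\n' \
        '#!/bin/bash' \
        '# use the current directory' \
        '#$ -cwd' \
        '#$ -S /bin/bash' \
        '# mail this address' \
        "#\$ -M $mail_to" \
        '# send mail on begin, end, suspend' \
        '#$ -m bes' \
        "R --vanilla --no-save < $r_file"
}

# Example: show the job script for 8GB.R (the address is a placeholder).
make_r_job 8GB.R you@example.edu
```

Redirect the output to a file, for example `make_r_job 8GB.R you@example.edu > r_test.script`, and submit that file with qsub as shown.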

Here are the output files that I get, one from stderr, one from stdout

    $ cat r_test.script.o497

    R version 2.12.1 (2010-12-16)
    Copyright (C) 2010 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-pc-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    > x <- array(1:1073741824, dim=c(1024,1024,1024))
    > x <- gaussian()
    > Sys.sleep(300)
    >

In the mail that you get when the job ends, the maxvmem number is incorrect; this is a known bug in this version of SGE. The R script on this page actually uses 8 GB of vmem.

## Another R Sample Job

R script, let's call it R-jags.R:

    print("Hello World")
    library(rjags)  # this just loaded some settings from that library
    print("Finished")

Job script, let's call it R-jags.submit.script

    #!/bin/bash
    # use the current directory
    #$ -cwd
    #$ -S /bin/bash
    # mail this address
    #$ -M chekh@stanford.edu
    # send mail on begin, end, suspend
    #$ -m bes
    R --vanilla --no-save < R-jags.R

Submit it to the test queue with a small memory requirement:

qsub -l mem_free=200M -l testq=1 R-jags.submit.script

Looking at the output files, it errored out because R can't find the package rjags. You have two alternatives:

- include the **R** library from /mnt/glusterfs/software
- use modules to specify the full **R** install from /mnt/glusterfs/software

The first way, you would add this line to your R script:

.libPaths(c("/mnt/glusterfs/software/free/R-2.15.0/lib/R/library", "/usr/lib/R/library"))
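Before adding a library path, it can help to confirm that the directory really contains the package. An installed R package is a directory with a DESCRIPTION file inside a library directory, so a quick shell check is possible. This is a minimal sketch; `find_r_package` is a hypothetical helper name.

```shell
# Sketch: list which library directories contain an installed copy of a
# package. find_r_package is a hypothetical helper name.
find_r_package() {
    pkg="$1"
    shift
    for lib in "$@"; do
        # An installed package is a directory holding a DESCRIPTION file.
        if [ -f "$lib/$pkg/DESCRIPTION" ]; then
            echo "$lib"
        fi
    done
}

# Example: check the two library paths from the .libPaths() call above.
find_r_package rjags \
    /mnt/glusterfs/software/free/R-2.15.0/lib/R/library \
    /usr/lib/R/library
```

Each library directory that holds the package is printed on its own line; no output means none of the listed directories has it.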

The second way, your script will look like this:

    $ cat R-jags.submit.script
    #!/bin/bash
    # use the current directory
    #$ -cwd
    #$ -S /bin/bash
    # mail this address
    #$ -M chekh@stanford.edu
    # send mail on begin, end, suspend
    #$ -m bes
    eval `tclsh /mnt/glusterfs/software/free/modules/tcl/modulecmd.tcl sh autoinit`
    module load R-2.15.0
    R --vanilla --no-save < R-jags.R

## Links

Other departments have more detailed examples:

- http://wiki.genomics.upenn.edu/index.php/HPC:ExamplesR
- http://me.eng.uab.edu/wiki/index.php?title=R-userinfo
- https://www.stanford.edu/dept/statistics/cgi-bin/projects/stat-sysadminwiki/index.php/R_Jobs
- http://www.glennklockwood.com/di/R-para.php

## building our local R

Here's how I usually do it.

- cd /mnt/glusterfs/software/free
- wget http://cran.cnr.berkeley.edu/src/base/R-2/R-2.15.1.tar.gz
- tar zxvf R-2.15.1.tar.gz
- cd R-2.15.1
- ./configure --enable-R-shlib
- make
- don't "make install"
- write new FarmShare module, e.g. /mnt/glusterfs/software/free/modules/tcl/modulefiles/R-2.15.1
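The download and compile steps above can be sketched as shell functions. The function names are illustrative, and nothing is downloaded or built until `build_r` is called. Writing the module file is left out because its contents are not shown on this page.

```shell
# Sketch of the build steps above. The function names are illustrative;
# nothing runs until build_r is called.

# Print the source tarball URL for a version,
# e.g. 2.15.1 -> .../src/base/R-2/R-2.15.1.tar.gz
r_tarball_url() {
    version="$1"
    echo "http://cran.cnr.berkeley.edu/src/base/R-${version%%.*}/R-$version.tar.gz"
}

# Download, unpack, configure and compile R under the given directory.
# The body runs in a subshell so the caller's working directory is unchanged.
# As in the list above, "make install" is deliberately not run.
build_r() (
    version="$1"
    dest="$2"        # e.g. /mnt/glusterfs/software/free
    cd "$dest" || exit 1
    wget "$(r_tarball_url "$version")" || exit 1
    tar zxvf "R-$version.tar.gz" || exit 1
    cd "R-$version" || exit 1
    ./configure --enable-R-shlib || exit 1
    make
)

# Usage (not run here): build_r 2.15.1 /mnt/glusterfs/software/free
```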

### 2014-07-10

R 3.1.1 was released today. I compiled it as chekh on corn40 (Ubuntu 13.10):

- cd /farmshare/software/free/r
- wget http://cran.cnr.berkeley.edu/src/base/R-3/R-3.1.1.tar.gz
- cd R-3.1.1
- ./configure --enable-R-shlib
- write /farmshare/software/free

## lapack issues

If you see messages like:

unable to load shared object '/usr/lib/R/modules//lapack.so':

most likely you're mixing R versions and libraries.

Double-check that you are not setting the R library path to point to directories with older libraries.
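One way to spot a mismatch is to compare the R version each package was built under with the R you are running. Installed packages record this in the `Built:` field of their DESCRIPTION file. The sketch below reads that field; the helper names `pkg_built_with` and `list_built_versions` are hypothetical.

```shell
# Sketch: report the R version an installed package was built under, read
# from the "Built:" field of its DESCRIPTION file. Helper names are
# hypothetical.
pkg_built_with() {
    lib="$1"    # library directory, e.g. /usr/lib/R/site-library
    pkg="$2"    # package name, e.g. lme4
    sed -n 's/^Built: R \([0-9.]*\);.*/\1/p' "$lib/$pkg/DESCRIPTION"
}

# Print "package version" for every package in a library directory.
list_built_versions() {
    lib="$1"
    for desc in "$lib"/*/DESCRIPTION; do
        [ -f "$desc" ] || continue    # skip when the glob matches nothing
        pkg=$(basename "$(dirname "$desc")")
        echo "$pkg $(pkg_built_with "$lib" "$pkg")"
    done
}
```

For example, `list_built_versions /usr/lib/R/site-library` lists each package with its build version, which you can compare against the output of `R --version`.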

This test script should run fine if everything is set up correctly:

    $ cat lapack.r
    data(iris)
    zz = lm(Sepal.Length ~., data = iris)
    summary(zz)
    $ R --no-save < lapack.r