
Gamsel: fit regularization path for generalized additive models.
Gamsel fits a regularization path for generalized additive models with many variables. It uses an overlapped group lasso penalty to create sticking points at constant, linear and non-linear terms.

Written by Alexandra Chouldechova and Trevor Hastie, and maintained by Trevor Hastie.

gamsel package, on CRAN


Glinternet: fit a linear model with hierarchical interaction via group-lasso regularization.
Glinternet fits a regularization path to include main effects and interactions in linear and logistic regression models. Can deal with quantitative and factor predictors. Glinternet uses the overlap group-lasso to enforce strict hierarchy, which also encourages interactions among variables with strong main effects. The code is efficient, and can handle problems with many thousands of variables.

Written by Michael Lim and Trevor Hastie, and maintained by Michael Lim.

glinternet package, on CRAN


Glmnet: fit the elastic-net regularization path for some generalized linear models.
Glmnet fits the entire regularization path for an elastic-net regularized glm. The models included are Gaussian, binomial, multinomial, Poisson, and the Cox model. Glmnet solves the following problem $$\min_{\beta_0,\beta} \frac{1}{N}\sum_{i=1}^N w_il(y_i,\beta_0+\beta^Tx_i)+\lambda \left[(1-\alpha) ||\beta||_2^2/2+\alpha||\beta||_1\right],$$ over a grid of values of \(\lambda\) covering the entire range. Here \( l(y,\eta)\) is the negative log-likelihood contribution for observation \(i\); e.g. for the Gaussian case it is \(\mbox{$\frac12$}(y-\eta)^2\). The parameter \(\alpha\) bridges the gap between lasso (\(\alpha=1\), the default) and ridge (\(\alpha=0\)). The package includes methods for prediction and plotting, and functions for performing K-fold cross-validation. The code can handle sparse input-matrix formats, as well as range constraints on coefficients. Glmnet also makes use of the strong rules for efficient restriction of the active set. Glmnet has many bells and whistles, which are illustrated in the vignette below. The core of Glmnet is a set of Fortran subroutines, which make for very fast execution. The algorithms use coordinate descent with warm starts and active set iterations.
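
For readers who want to see the mechanics, here is a minimal sketch in Python (not the package's R/Fortran code) of cyclic coordinate descent with warm starts for the Gaussian case of the objective above, with loss \(\frac12(y-\eta)^2\) and unit weights. The function names `soft_threshold`, `elastic_net_cd` and `elastic_net_path` are hypothetical and chosen for illustration. The sketch does no standardization, and omits the active-set iterations and strong rules that the real package uses.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(x, y, lam, alpha=1.0, beta0=0.0, beta=None,
                   n_iter=200, tol=1e-10):
    """Cyclic coordinate descent for the Gaussian elastic net, unit weights.

    x is a list of rows, y a list of responses. beta0/beta give an
    optional warm start. Returns (intercept, coefficient list).
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p if beta is None else list(beta)
    # Residuals at the starting coefficients.
    resid = [y[i] - beta0 - sum(x[i][j] * beta[j] for j in range(p))
             for i in range(n)]
    for _ in range(n_iter):
        # The intercept is unpenalized: shift it by the mean residual.
        shift = sum(resid) / n
        beta0 += shift
        resid = [r - shift for r in resid]
        max_change = abs(shift)
        for j in range(p):
            col = [x[i][j] for i in range(n)]
            sq = sum(c * c for c in col) / n
            if sq == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the partial residual, over N.
            z = sum(c * r for c, r in zip(col, resid)) / n + sq * beta[j]
            new = soft_threshold(z, lam * alpha) / (sq + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta0, beta


def elastic_net_path(x, y, lambdas, alpha=1.0):
    """Fit over a decreasing lambda grid, warm-starting each fit."""
    beta0, beta, path = 0.0, None, []
    for lam in sorted(lambdas, reverse=True):
        beta0, beta = elastic_net_cd(x, y, lam, alpha, beta0, beta)
        path.append((lam, beta0, list(beta)))
    return path
```

Each fit on the grid starts from the solution at the previous, larger value of \(\lambda\), which is the warm-start idea: neighboring solutions on the path are close, so few sweeps are needed per grid point.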

Written by Jerome Friedman, Trevor Hastie, Rob Tibshirani and Noah Simon.

Glmnet in R: This package is actively maintained by Trevor Hastie on CRAN. The R code interfaces to Fortran code written by Jerome Friedman.

YouTube webinar on glmnet (the sound is slightly out of sync with the video)

Glmnet vignette (html) published (2/18/2015), also in pdf format.
Here is a link to the directory containing the data objects used in the vignette, or else a compressed zip archive of the lot.

Glmnet in Matlab: ported and maintained by Junyang Qian. The original port was by Hui Jiang (2009), and was updated and expanded by Junyang Qian in September 2013.


softImpute: impute missing values for a matrix via nuclear-norm regularization
SoftImpute fits a low-rank matrix approximation to a matrix with missing values via nuclear-norm regularization. The algorithm works like EM, filling in the missing values with the current guess, and then solving the optimization problem on the complete matrix using a soft-thresholded SVD. Special sparse-matrix classes are available for very large matrices.
Written by Trevor Hastie and Rahul Mazumder, and maintained by Trevor Hastie.

softImpute package, on CRAN

softImpute vignette (html) published (9/10/2014).
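
The fill-in and soft-thresholded SVD iteration described for softImpute can be sketched as follows. This is an illustrative Python/NumPy sketch under simple assumptions (dense matrices, missing entries coded as NaN, a full SVD at every step), not the package's implementation. The function name `soft_impute` and its arguments are hypothetical.

```python
import numpy as np


def soft_impute(x, lam, n_iter=200, tol=1e-8):
    """Low-rank completion by iterated soft-thresholded SVD.

    x: 2-D array with np.nan marking missing entries.
    lam: amount subtracted from each singular value.
    """
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(x)
    z = np.zeros_like(x)  # current low-rank guess, started at zero
    for _ in range(n_iter):
        # Fill the missing entries with the current guess.
        filled = np.where(observed, x, z)
        # Soft-threshold the singular values of the completed matrix.
        u, d, vt = np.linalg.svd(filled, full_matrices=False)
        d = np.maximum(d - lam, 0.0)
        z_new = (u * d) @ vt
        # Relative squared change in the fit, used as a stopping rule.
        change = (np.linalg.norm(z_new - z) ** 2
                  / max(np.linalg.norm(z) ** 2, 1e-12))
        z = z_new
        if change < tol:
            break
    return z
```

Larger values of `lam` zero out more singular values and so give lower-rank fits. A full SVD per iteration is only practical for small matrices, which is one reason a real implementation would work differently on large sparse inputs.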


Sparsenet: fit a linear model regularized by the nonconvex MC+ sparsity penalty
Sparsenet uses coordinate descent on the MC+ nonconvex penalty family, and fits a surface of solutions over the two-dimensional parameter space. This penalty family is indexed by an overall strength parameter \(\lambda\) (like lasso), and a convexity parameter \(\gamma\), with \(\gamma = \infty\) corresponding to the lasso, and \(\gamma = 1\) corresponding to best subset selection.
Written by Rahul Mazumder, Jerome Friedman and Trevor Hastie, and maintained by Trevor Hastie.

Sparsenet package, on CRAN
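
To illustrate how \(\gamma\) moves between the lasso and best subset selection, here is a short Python sketch of the univariate MC+ thresholding rule that a coordinate-descent update would apply to a single standardized coefficient. The function name `mcplus_threshold` is hypothetical, and the rule is written for \(\gamma > 1\), treating \(\gamma = 1\) as the limiting case.

```python
import math


def mcplus_threshold(z, lam, gamma):
    """Univariate MC+ threshold of z, for lam >= 0 and gamma > 1.

    gamma = infinity gives soft thresholding (the lasso update);
    gamma close to 1 approaches hard thresholding.
    """
    if gamma <= 1:
        raise ValueError("gamma must exceed 1")
    a = abs(z)
    if a <= lam:
        return 0.0  # small inputs are set exactly to zero
    if a <= lam * gamma:
        # Shrunken region: soft threshold, then rescale by 1 / (1 - 1/gamma).
        return math.copysign((a - lam) / (1.0 - 1.0 / gamma), z)
    return z  # large inputs are left unshrunk


# The same input under three settings of gamma (lam = 1):
for g in (float("inf"), 2.0, 1.0001):
    print(g, mcplus_threshold(1.5, 1.0, g))
```

As \(\gamma\) decreases, the region in which coefficients are shrunk narrows, so large coefficients escape the bias that the lasso imposes on them.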


SvmPath: fit the entire regularization path for the SVM
The software, written in the S language for R, computes the entire solution path for the two-class SVM model. The solution is calculated for every value of the cost parameter C, at essentially the same computing cost as a single SVM solution.
Written by Trevor Hastie.
Go to Webpage


glmpath: fit the entire L1 regularization path for generalized linear models.
This algorithm uses a predictor-corrector method to compute the entire regularization path for generalized linear models with an L1 penalty. Somewhat superseded by the package glmnet above, but not entirely. Glmpath is able to estimate the knots or entry points for each variable as it enters the path.
Written by Mee-Young Park and Trevor Hastie, and maintained by Mee-Young Park.
glmpath package, on CRAN


LARS: Least Angle Regression software
The software, written in the S language, computes the entire LAR, Lasso, or (epsilon) forward stagewise coefficient path in the same order of computations as a single least-squares fit.
Written by Brad Efron and Trevor Hastie.
Go to LARS Webpage


R routines for fitting generalized additive models. This package corresponds to the gam models described in Chapter 7 of the "white" book Statistical Models in S (Chambers and Hastie, eds., Wadsworth, 1992).
The formula terms s() and lo() allow for smoothing splines and local regression smoothers. Any family is accommodated, using the same family functions as glm(). Generic functions are provided for plotting, anova, summary, predict, etc. The function step.gam() was improved in 2013.
Written and maintained by Trevor Hastie.
gam package, on CRAN.


R routines for Flexible Discriminant Analysis, Penalized Discriminant Analysis and Nonparametric Mixture Discriminant Analysis models. These tools are enhancements of the lda function in R, and allow linear, polynomial, and nonparametric versions of discriminant analysis and mixture models. There are easy-to-use predict methods. These methods are described in Elements of Statistical Learning (chapter 12), as well as the original references.
Written by Trevor Hastie and Rob Tibshirani, and maintained by Trevor Hastie.
mda package, on CRAN.


Imputation of missing data, intended for microarray and expression arrays. Impute uses k-nearest neighbors (knn) to impute the missing values for a gene, taking the average values from the k nearest neighbors in the space of the non-missing elements. The algorithm is Fortran-based, and uses an adaptive combination of recursive 2-means clustering and nearest neighbors.
Written by Trevor Hastie (Fortran code and algorithm), Robert Tibshirani, Balasubramanian Narasimhan (maintainer) and Gilbert Chu.
impute (on Bioconductor)
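
The nearest-neighbor averaging idea can be sketched in a few lines of Python. This is a simplified illustration, not the package's Fortran algorithm: it omits the recursive 2-means clustering step, codes missing values as `None`, and makes its own assumptions about how distances and neighbors with missing values are handled. The function name `knn_impute` is hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries in each row with the mean of the k nearest rows.

    rows: list of equal-length lists (one per gene), None for missing.
    Returns a new list of rows; the input is not modified.
    """
    def distance(a, b):
        # Mean squared difference over coordinates observed in both rows
        # (an assumption of this sketch).
        shared = [(u - v) ** 2 for u, v in zip(a, b)
                  if u is not None and v is not None]
        return sum(shared) / len(shared) if shared else math.inf

    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # All other rows, ordered from nearest to farthest.
        neighbors = sorted((distance(row, other), idx)
                           for idx, other in enumerate(rows) if idx != i)
        for j in missing:
            # Use the k nearest rows that actually observe column j.
            vals = [rows[idx][j] for d, idx in neighbors
                    if d < math.inf and rows[idx][j] is not None][:k]
            if vals:
                out[i][j] = sum(vals) / len(vals)
    return out
```

Computing all pairwise distances costs time quadratic in the number of genes, which suggests why a clustering step that first splits the genes into smaller groups helps on large arrays.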



Gene Shaving
A method for finding small clusters of highly correlated genes with large variance across the samples. See the online version of the gene shaving paper at Genome Biology by Trevor Hastie, Rob Tibshirani and coauthors.
The code is part of the GeneClust software written by Kim Anh Do and colleagues, and is based on the original code by Trevor Hastie and Rob Tibshirani.
Go to Geneclust homepage


Smart Prediction
Routines in Splus and R for making predictions "smarter" in the context of the formula language for statistical models such as lm() and glm().
Written by Thomas Yee and Trevor Hastie.
Go to Webpage


FORTRAN program for fitting generalized additive models.
Written by Trevor Hastie and Rob Tibshirani.
Shell archive


S functions for fitting principal curves.
Written by Trevor Hastie.
Shell archive

princurve package in R. Contains original principal curves code, ported and maintained by Andreas Weingessel.


Modified versions of bs() and ns() that allow safe predictions, especially in the context of the S modelling functions. New predict() methods as well.
Written by Trevor Hastie.
Shell archive


Tools for converting S help files and S code to latex, and a corresponding latex .sty file.
Written by John Chambers and Trevor Hastie.
Shell archive

 © copyright 2003 Trevor Hastie - All rights reserved.