

Glinternet: fit a linear model with hierarchical interactions via group-lasso regularization.
Glinternet fits a regularization path to include main effects and interactions in linear and logistic regression models.
It can handle both quantitative and factor predictors. Glinternet uses the overlap group-lasso to enforce strict hierarchy, which also encourages interactions among variables with strong main effects. The code is efficient, and can handle problems with many thousands of variables.
Written by Michael Lim and Trevor Hastie, and maintained by Michael Lim.
glinternet package, on CRAN


Glmnet: fit the elastic-net regularization path for some generalized linear models.
Glmnet fits the entire regularization path for an elastic-net regularized glm. The models included are Gaussian, binomial, multinomial, Poisson, and the Cox model. Glmnet solves the following problem
$$\min_{\beta_0,\beta} \frac{1}{N}\sum_{i=1}^N w_i l(y_i,\beta_0+\beta^Tx_i)+\lambda \left[(1-\alpha) \|\beta\|_2^2/2+\alpha\|\beta\|_1\right],$$
over a grid of values of \(\lambda\) covering the entire range. Here \( l(y,\eta)\) is the negative log-likelihood contribution for observation \(i\); e.g. for the Gaussian case it is \(\frac{1}{2}(y-\eta)^2\). The parameter \(\alpha\) bridges the gap between lasso (\(\alpha=1\), the default) and ridge (\(\alpha=0\)).
The package includes methods for prediction and plotting, and functions for performing K-fold cross-validation. The code can handle sparse input-matrix formats, as well as range constraints on coefficients. Glmnet also makes use of the strong rules for efficient restriction of the active set. Glmnet has many bells and whistles, which are illustrated in the vignette below. The core of Glmnet is a set of Fortran subroutines, which make for very fast execution. The algorithms use coordinate descent with warm starts and active-set iterations.
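To make the idea of coordinate descent with warm starts concrete, here is a minimal pure-Python sketch for the Gaussian case of the problem above. It assumes unit observation weights, and the names `soft_threshold`, `enet_coordinate_descent` and `enet_path` are hypothetical helpers written for this illustration. It is not the Fortran code in Glmnet, and it omits the strong rules and active-set iterations.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def enet_coordinate_descent(X, y, lam, alpha=1.0, beta=None,
                            n_sweeps=200, tol=1e-10):
    """Cyclic coordinate descent for the Gaussian elastic net (unit weights).

    X is a list of rows, y a list of responses. `beta` is an optional
    warm start. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    b0 = 0.0
    # Residuals for the starting coefficients (intercept handled in the sweep).
    r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    # Mean squared value of each column, used in every coordinate update.
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        # Unpenalized intercept: absorb the mean residual.
        shift = sum(r) / n
        b0 += shift
        r = [ri - shift for ri in r]
        max_change = abs(shift)
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Inner product of column j with the partial residual, over n.
            z = sum(X[i][j] * r[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(z, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                r = [r[i] - X[i][j] * delta for i in range(n)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b0, beta


def enet_path(X, y, lambdas, alpha=1.0):
    """Fit along a decreasing lambda grid, warm-starting each fit."""
    path, beta = [], None
    for lam in sorted(lambdas, reverse=True):
        b0, beta = enet_coordinate_descent(X, y, lam, alpha, beta)
        path.append((lam, b0, beta))
    return path


# Tiny illustrative data set (made up for this example).
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.5, -0.5]
path = enet_path(X, y, [1.0, 0.5, 0.0])
for lam, b0, beta in path:
    print(lam, b0, beta)
```

Each fit starts from the solution at the previous, larger value of \(\lambda\), which is what makes computing the whole path cheap relative to fitting each value from scratch.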
Written by Jerome Friedman, Trevor Hastie, Rob Tibshirani and Noah Simon.
Glmnet in R: This package is actively maintained by Trevor Hastie on CRAN. The R code interfaces to Fortran code written by Jerome Friedman.
YouTube webinar on glmnet (the sound lags slightly behind the video)
Glmnet vignette (html) published (11/24/2013), also in pdf format.
Here is a link to the directory containing the data objects used in the vignette, or else a compressed zip archive of the lot.
Glmnet in Matlab: ported and maintained by Junyang Qian. The original port was by Hui Jiang (2009), and was updated and expanded by Junyang Qian in September 2013.


softImpute: impute missing values for a matrix via nuclear-norm regularization
SoftImpute fits a low-rank matrix approximation to a matrix with missing values via nuclear-norm regularization. The algorithm works like EM, filling in the missing values with the current guess, and then solving the optimization problem on the complete matrix using a soft-thresholded SVD. Special sparse-matrix classes are available for very large matrices.
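The fill-in-and-threshold iteration described above can be written compactly. The notation is introduced here for illustration and does not appear in the original text: \(\Omega\) is the set of observed entries of the matrix \(X\), \(P_\Omega\) keeps the observed entries and zeros the rest, and \(P_\Omega^\perp\) does the opposite. Under those assumptions, the problem and one iteration can be sketched as

$$\min_{M}\ \frac{1}{2}\left\|P_\Omega(X)-P_\Omega(M)\right\|_F^2+\lambda\|M\|_*,$$

$$M^{\text{new}} \leftarrow S_\lambda\left(P_\Omega(X)+P_\Omega^\perp(M^{\text{old}})\right),\qquad S_\lambda(Z)=U\,\mathrm{diag}\left[(d_1-\lambda)_+,\ldots,(d_r-\lambda)_+\right]V^T,$$

where \(Z=U\,\mathrm{diag}(d_1,\ldots,d_r)\,V^T\) is the SVD of the filled-in matrix and \((t)_+=\max(t,0)\). The argument of \(S_\lambda\) is the data matrix with its missing entries replaced by the current guess, and soft-thresholding the singular values is what produces a low-rank solution.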
Written by Trevor Hastie and Rahul Mazumder, and maintained by Trevor Hastie.
softImpute package, on CRAN
softImpute vignette (html) published (9/10/2014).


Sparsenet: fit a linear model regularized by the nonconvex MC+ sparsity penalty
Sparsenet uses coordinate descent on the MC+ nonconvex penalty family, and fits a surface of solutions over the two-dimensional parameter space. This penalty family is indexed by an overall strength parameter \(\lambda\) (like lasso), and a convexity parameter \(\gamma\), with \(\gamma = \infty\) corresponding to the lasso, and \(\gamma = 1\) to best-subset selection.
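The way \(\gamma\) interpolates between the two extremes is easiest to see in the one-dimensional threshold function that a coordinate update applies. The sketch below assumes a single standardized predictor and \(\gamma > 1\); the function name `mcplus_threshold` is a hypothetical helper for this illustration, not part of the Sparsenet package.

```python
def mcplus_threshold(z, lam, gamma):
    """MC+ threshold for one standardized coordinate.

    z is the univariate least-squares coefficient, lam >= 0 the penalty
    strength and gamma > 1 the convexity parameter.
    """
    if gamma <= 1.0:
        raise ValueError("this sketch requires gamma > 1")
    a = abs(z)
    if a <= lam:
        # Small coefficients are set exactly to zero.
        return 0.0
    if a <= lam * gamma:
        # Intermediate coefficients are shrunk, but less than by the lasso.
        sign = 1.0 if z > 0 else -1.0
        return sign * (a - lam) / (1.0 - 1.0 / gamma)
    # Large coefficients are left unshrunk.
    return float(z)


# Made-up values to show the three regimes and the two limits.
print(mcplus_threshold(0.5, 1.0, 3.0))        # below lam
print(mcplus_threshold(2.0, 1.0, 3.0))        # between lam and lam * gamma
print(mcplus_threshold(4.0, 1.0, 3.0))        # above lam * gamma
print(mcplus_threshold(2.0, 1.0, 1e9))        # large gamma: close to soft thresholding
print(mcplus_threshold(2.0, 1.0, 1.000001))   # gamma near 1: close to hard thresholding
```

As \(\gamma\) grows the middle regime covers everything above \(\lambda\) and its slope tends to 1, which recovers the lasso's soft thresholding. As \(\gamma\) approaches 1 the middle regime vanishes and the function approaches hard thresholding.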
Written by Rahul Mazumder, Jerome Friedman and Trevor Hastie, and maintained by Trevor Hastie.
Sparsenet package, on CRAN


SvmPath: fit the entire regularization path for the SVM
The software, written in the S language for R, computes the entire
solution path for the two-class SVM model. The solution is
calculated for every value of the cost parameter C,
at essentially the same computing cost as a single SVM solution.
Written by Trevor Hastie.
Go to
Webpage


glmpath: fit the entire L1 regularization path for generalized linear models.
This algorithm uses a predictor-corrector method to compute the entire regularization path for generalized linear models with an L1 penalty. Somewhat superseded by the package glmnet above, but not entirely. Glmpath is able to estimate the knots or entry points for each variable as it enters the path.
Written by Mee Young Park and Trevor Hastie, and maintained by Mee Young Park.
glmpath package, on CRAN


LARS: Least Angle Regression software
The software, written in the S language, computes the entire LAR, Lasso, or (epsilon) forward stagewise coefficient path with the same order of computation as a single least-squares fit.
Written by Brad Efron and Trevor Hastie.
Go to
LARS Webpage


gam
R routines for fitting generalized additive models. This package corresponds to the gam models described in Chapter 7 of the "white" book, Statistical Models in S, Chambers and Hastie (eds), Wadsworth (1992).
The formula terms s() and lo() allow for smoothing splines and local regression smoothers. Any family is accommodated, using the same family functions as glm(). There are generic functions for plotting, anova, summary, predict, etc. The function step.gam() received improvements in 2013.
Written and maintained by Trevor Hastie.
gam package, on CRAN.


mda
R routines for Flexible Discriminant Analysis, Penalized
Discriminant Analysis and Nonparametric Mixture Discriminant Analysis
models.
These tools are
enhancements on the lda function in R, and allow
linear, polynomial, and nonparametric versions of
discriminant analysis and mixture models. There are easy-to-use predict
methods. These methods are described in Elements of Statistical Learning (Chapter 12), as well as the original references.
Written by Trevor Hastie and Rob Tibshirani, and maintained by Trevor Hastie.
mda package, on CRAN.


impute
Imputation of missing data, intended for microarray and expression arrays. Impute uses k-nearest neighbors (knn) to impute the missing values for a gene, by using the average values from the k nearest neighbors in the space of the non-missing elements. The algorithm is Fortran-based, and uses an adaptive combination of recursive 2-means clustering and nearest neighbors.
Written by Trevor Hastie (Fortran code and algorithm), Robert Tibshirani, Balasubramanian Narasimhan (maintainer) and Gilbert Chu.
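The nearest-neighbor averaging step can be illustrated with a short pure-Python sketch. It assumes rows are genes, `None` marks a missing value, and distance is the root mean squared difference over the columns two rows both have observed. The function `knn_impute` is a hypothetical helper for this illustration; it is not the Fortran code in the package, and it leaves out the recursive 2-means clustering.

```python
import math


def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows.

    Neighbors are ranked by root mean squared difference over the columns
    observed in both rows. Only original (non-imputed) values are used.
    """
    p = len(rows[0])
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j in range(p) if row[j] is None]
        if not missing:
            continue
        # Distance from row i to every other row with shared observed columns.
        dists = []
        for m, other in enumerate(rows):
            if m == i:
                continue
            shared = [j for j in range(p)
                      if row[j] is not None and other[j] is not None]
            if not shared:
                continue
            d = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared)
                          / len(shared))
            dists.append((d, m))
        dists.sort()
        for j in missing:
            # Take the k closest rows that actually observe column j.
            vals = [rows[m][j] for _, m in dists if rows[m][j] is not None][:k]
            if vals:
                out[i][j] = sum(vals) / len(vals)
    return out


# Made-up expression values; the first gene is missing its third sample.
genes = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 5.0],
    [10.0, 10.0, 100.0],
]
imputed = knn_impute(genes, k=2)
print(imputed[0])
```

In this toy example the two closest genes supply the values 3.0 and 5.0, so the missing entry is filled with their average, while the distant fourth gene is ignored.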
impute (on Bioconductor)

OLDER SOFTWARE

Gene Shaving
A method for finding small clusters of highly correlated genes with large
variance across the samples. See the online version of the gene shaving
paper at Genome Biology, by Trevor Hastie, Rob Tibshirani and co-authors.
The code is part of the GeneClust software, written by Kim-Anh Do and
colleagues and based on the original code by Trevor Hastie and Rob Tibshirani.
Go to
Geneclust homepage


Smart Prediction
Routines in S-PLUS and R for making predictions "smarter" in the
context of the formula language for statistical models such as lm()
and glm().
Written by Thomas Yee and Trevor Hastie.
Go to
Webpage


gamfit
FORTRAN program for fitting generalized additive models.
Written by Trevor Hastie and Rob Tibshirani.
Shell archive


principal.curve
S functions for fitting principal curves.
Written by Trevor Hastie.
Shell archive
princurve package in R. Contains original principal curves code, ported and maintained by Andreas Weingessel.


safe.predict
Modified versions of bs() and ns() that allow safe
predictions, especially in the context of the S modelling
functions. New predict() methods as well.
Written by Trevor Hastie.
Shell archive


s.to.latex
Tools for converting S help files and S code to latex, and
a corresponding latex .sty file.
Written by John Chambers and Trevor Hastie.
Shell archive

