Glmnet - Introduction

This is a brief introduction to the package. For more details and examples, run help glmnet or help cvglmnet in Matlab.

Description

Suppose X is the input matrix, with N rows x_i and p columns, and Y is the response vector with entries y_i. For the Gaussian family, glmnet minimizes the penalized residual sum of squares,

 $$\min_{(\beta_0, \beta) \in \mathbb{R}^{p+1}} \frac{1}{2N} \sum_{i=1}^N (y_i - \beta_0 - x_i^T \beta)^2 + \lambda \big[ (1-\alpha)\|\beta\|_2^2/2 + \alpha\|\beta\|_1 \big],$$

where $\lambda \geq 0$ is a complexity parameter and $0 \leq \alpha \leq 1$ sets the compromise between the ridge and lasso penalties. The problem reduces to the lasso when $\alpha = 1$ and to ridge regression when $\alpha = 0$.
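
As a quick check on the notation, the objective above can be evaluated directly. The following is a minimal sketch, not code from the package: enetPenalty and enetObjective are hypothetical names, and the data are made up so the values can be verified by hand.

```matlab
% Elastic-net penalty and Gaussian objective (illustrative sketch, not glmnet code).
% enetPenalty and enetObjective are hypothetical names introduced here.
enetPenalty = @(beta, alpha) (1 - alpha) * sum(beta.^2) / 2 + alpha * sum(abs(beta));
enetObjective = @(X, y, beta0, beta, lambda, alpha) ...
    sum((y - beta0 - X * beta).^2) / (2 * size(X, 1)) + lambda * enetPenalty(beta, alpha);

% Tiny hand-checkable data set (made up for illustration).
X = [1 0; 0 1; 1 1];
y = [1; 2; 3];
beta = [1; -2];

lassoPen = enetPenalty(beta, 1);   % alpha = 1 leaves only the L1 norm
ridgePen = enetPenalty(beta, 0);   % alpha = 0 leaves only half the squared L2 norm
objValue = enetObjective(X, y, 0, beta, 0.1, 0.5);
```

Setting alpha to 1 or 0 in enetPenalty recovers the pure lasso and pure ridge penalties described above.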

For other families, glmnet maximizes the appropriate penalized log-likelihood (the partial likelihood for the Cox model) or, equivalently, minimizes the penalized negative log-likelihood. For the binomial model, for example, it solves

 $$\min_{(\beta_0, \beta) \in \mathbb{R}^{p+1}} -\frac{1}{N} \sum_{i=1}^N \Big[ y_i \cdot (\beta_0 + x_i^T \beta) - \log \big(1+e^{\beta_0+x_i^T \beta}\big) \Big] + \lambda \big[ (1-\alpha)\|\beta\|_2^2/2 + \alpha\|\beta\|_1 \big].$$
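
The binomial objective can be evaluated the same way. This sketch assumes the sum is read as the average negative log-likelihood over all N observations, with y coded as 0/1; binomialObjective is a hypothetical name, not a package function.

```matlab
% Penalized binomial objective (illustrative sketch; binomialObjective is a
% hypothetical name). y holds 0/1 responses and beta0 + X * beta is the
% linear predictor. The direct use of exp is fine for small examples but
% can overflow for very large linear predictors.
binomialObjective = @(X, y, beta0, beta, lambda, alpha) ...
    mean(log(1 + exp(beta0 + X * beta)) - y .* (beta0 + X * beta)) + ...
    lambda * ((1 - alpha) * sum(beta.^2) / 2 + alpha * sum(abs(beta)));

% Made-up data for illustration.
X = [1 0; 0 1; 1 1; -1 2];
y = [1; 0; 1; 0];

% With every coefficient at zero each fitted probability is 1/2, so the
% objective equals log(2) regardless of lambda and alpha.
nullValue = binomialObjective(X, y, 0, zeros(2, 1), 0.1, 0.5);
```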

The algorithm uses cyclical coordinate descent in a pathwise fashion. Beyond the basic settings, many more options are available, including observation weights, the choice of lambda sequence, and grouping. For more information, see the reference papers, the help files or the documentation (in progress).
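
To make "cyclical coordinate descent in a pathwise fashion" concrete, here is a minimal sketch for the Gaussian family. It is not the package's implementation: it uses the standard soft-thresholding update for the elastic-net objective, assumes centered data so the intercept can be recovered at the end, and all variable names are hypothetical. The response is set equal to the first column of the input so the result is easy to verify.

```matlab
% Pathwise cyclical coordinate descent for the Gaussian elastic net.
% Illustrative sketch only, not the glmnet implementation.
softThreshold = @(z, g) sign(z) .* max(abs(z) - g, 0);

% Made-up data: y equals the first column, so the lasso path is simple.
X = [1 2; 2 1; 3 5; 4 3; 5 8; 6 4];
y = [1; 2; 3; 4; 5; 6];
[N, p] = size(X);
alpha = 1;                                   % pure lasso penalty

% Center the data so the intercept separates from the coefficients.
xMean = mean(X, 1);
Xc = bsxfun(@minus, X, xMean);
yc = y - mean(y);

% Smallest lambda that zeroes every coefficient, then a decreasing grid.
lambdaMax = max(abs(Xc' * yc)) / (N * alpha);
lambdas = lambdaMax * [1, 0.5, 0.1, 0.01];

beta = zeros(p, 1);
path = zeros(p, numel(lambdas));             % one coefficient column per lambda

for k = 1:numel(lambdas)
    lambda = lambdas(k);
    % Warm start: beta carries over from the previous (larger) lambda.
    for sweep = 1:1000
        betaOld = beta;
        for j = 1:p
            % Partial residual that leaves out feature j.
            r = yc - Xc * beta + Xc(:, j) * beta(j);
            z = Xc(:, j)' * r / N;
            beta(j) = softThreshold(z, lambda * alpha) / ...
                (Xc(:, j)' * Xc(:, j) / N + lambda * (1 - alpha));
        end
        if max(abs(beta - betaOld)) < 1e-10
            break;                           % coefficients stopped changing
        end
    end
    path(:, k) = beta;
end

% Intercept on the original (uncentered) scale for the last lambda.
beta0 = mean(y) - xMean * beta;
```

Each inner loop cycles once through the coordinates, and the outer loop walks down the lambda grid, reusing the previous solution as the starting point. That reuse is what makes the pathwise strategy cheap.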

Two central functions of the package are:

  • glmnet.m - basic function that returns a structure containing all essential information for further use, such as printing, plotting and prediction.

  • cvglmnet.m - a more commonly used function that returns a structure after selecting the tuning parameter by cross-validation.

Example

The simple example here shows the basic workflow. For further exploration, refer to the help files or the illustrative documentation.

Suppose x is the input matrix and y the response vector. Then,

  • fit = glmnet(x, y) – fits the model under all default settings; the structure fit stores all necessary information.

  • glmnetPrint(fit) – prints relevant information about the fitted object fit.

  • glmnetPlot(fit) – plots the coefficients from the fitted object.

  • cvfit = cvglmnet(x, y) – fits the model by cross-validation under all default settings, with results saved in cvfit.

  • cvglmnetPlot(cvfit) – plots the cross-validation curve.
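
The calls listed above can be collected into one short script. This is a sketch under stated assumptions: the data are synthetic, the sizes and coefficients are arbitrary, and the glmnet calls are wrapped in a check so that they run only when the package is on the Matlab path.

```matlab
% Synthetic regression data; the sizes and true coefficients are arbitrary
% choices for illustration.
N = 100;
p = 20;
x = randn(N, p);
betaTrue = [3; -2; 1.5; zeros(p - 3, 1)];
y = x * betaTrue + 0.5 * randn(N, 1);

% Run the package calls only if glmnet.m and cvglmnet.m can be found.
hasGlmnet = exist('glmnet', 'file') == 2 && exist('cvglmnet', 'file') == 2;
if hasGlmnet
    fit = glmnet(x, y);          % fit under all default settings
    glmnetPrint(fit);            % print a summary of the fitted object
    figure;
    glmnetPlot(fit);             % plot the coefficients

    cvfit = cvglmnet(x, y);      % cross-validated fit under default settings
    figure;
    cvglmnetPlot(cvfit);         % plot the cross-validation curve
end
```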

List of Major Functions

  • cvglmnet.m

    • cross-validation for glmnet

  • cvglmnetCoef.m

    • extract the coefficients from a 'cv.glmnet' object

  • cvglmnetPlot.m

    • plot the cross-validation curve produced by cvglmnet.m

  • cvglmnetPredict.m

    • make predictions from a 'cv.glmnet' object

  • glmnet.m

    • fit a GLM with lasso or elastic-net regularization

  • glmnetCoef.m

    • extract the coefficients from a 'glmnet' object

  • glmnetControl.m

    • internal glmnet parameters

  • glmnetPlot.m

    • plot coefficients from a 'glmnet' object

  • glmnetPredict.m

    • make predictions from a 'glmnet' object

  • glmnetPrint.m

    • print a 'glmnet' object

  • glmnetSet.m

    • create or alter an options structure for glmnet.m