l1_logregis an implementation of the interior-point method for l1-regularized logistic regression described in the paper, An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression. This implementation consists of three main functions:
l1_logreg_regpathfor (approximate) regularization path computation
l1_logreg concerns the logistic model that has the form
where denotes a vector of feature variables, and denotes the associated binary outcome (class). Here, is the conditional probability of , given . The logistic model has parameters (the intercept) and (the weight vector).
With a given set of training examples,
l1_logreg_train finds the logistic model by solving an optimization problem of the form
where the variables are , , and the problem data are , and . We refer to the problem as a -regularized logistic regression problem (l1-regularized LRP).
Once we find maximum likelihood values of and , i.e., a solution of the l1-regularized LRP, we can predict the probability of the two possible outcomes, , given a new feature vector , using the associated logistic model.
l1_logreg_classify performs classification over a (test) data set by computing
which picks the more likely outcome, given , according to the logistic model found by
The regularization parameter roughly controls the number of nonzero components in , with larger typically yielding sparser . The family of solutions, as varies over is called the regularization path.
l1_logreg_regpath finds an approximate regularization path efficiently using a warmstart technique.
To solve the l1-regularized LRP,
l1_logreg uses Preconditioned Conjugate Gradient (PCG) method if the feature matrix is stored in sparse format, and the direct method if the feature matrix is stored in dense format. For more information on file format, see file format page.
l1_logreg_regpathin the command line. See using l1_logreg in shell section.
l1_logreg, you need to get executables compatible with your machine. You may download precompiled executables or download the source code to build your own.
To install precompiled executables, you can simply download executables compatible with your OS and CPU.
If your cannot find appropriate binaries for your machine, follow the instructions in build-your-own section.
l1_logregis distributed under the terms of the GNU General Public License 2.0. For commercial applications that may be incompatible with this license, please do not hesitate to contact us to discuss alternatives.
To obtain the
l1_logreg distribution, which includes the source code, the manual, and example data sets, use the following links:
If you just want binaries, you can download them from the following links:
If you want to run examples, download and untar the source package first, and put the executables in the binary package at
src_c directory. For test, run
test_script at the top-build directory of source package after putting/making the binaries in
NOTE: The pre-compiled binaries would be slow. To achieve maximum performance you might want to compile the source code using BLAS appropriate to your own machine.
Before installation, be sure that BLAS and LAPACK libraries are installed in your machine. If not (or not sure), please check the libraries page.
l1_logreg_train -s train_x train_b 0.01 modelIt performs training, that is learning a logistic model from training examples of feature matrix
train_xand class vector
train_bwith = 0.01. The model learned from example data are stored in
Typical usage of
l1_logreg_classify model test_x resultIt classifies the test data using the logistic model parameters stored in
modeland store the classification result to
resultfile. If you want to not only classify the test data, but also compare the result with a known class vector
test_b, use the
l1_logreg_classify -t test_b model test_x results
l1_logregis callable from Matlab. You may write the problem data (feature matrix and class vector) first using
mmwritescript; see writing matrices in Matrix Market (MM) format using Matlab page. You may call stand-alone executables using Matlab
systemcall. For example,
mmwrite('ex_X',X); mmwrite('ex_b',b); system('l1_logreg_train -s ex_X ex_b 0.01 model_iono') ... model_iono = mmread('model_iono');You may assign the result file to a Matlab variable using
mmreadscript. For example,
system('l1_logreg_classify -t ex_b model_iono ex_X result_iono') ... model_iono = mmread('result_iono');
l1_logreg_train [options] feature_file class_file lambda model_fileArguments are
feature_file - feature matrix class_file - output vector lambda - regularization parameter model_file - store model data to file model_fileOptions are
-q - quiet mode -v [0..3] - set verbosity level (default 1) 0 : show one line summary 1 : show simple log 3 : show detailed log -r - use relative lambda if used, lambda := lambda*lambda_max if not used, lambda := lambda -s - standardize dataAdvanced options are
-h - show coefficients histogram and trim -k <double> - set tolerance for zero coefficients from KKT condition -t <double> - set tolerance for duality gap
See file format page for the formats of
l1_logreg_classify [options] model_file feature_file result_fileArguments are:
model_file - model data(coefficients and intercept) found by either l1_logreg_train or l1_logreg_regpath feature_file - feature matrix result_file - store classification results to result_file if model_file is generated from 1) l1_logreg_train, then predicted outcomes 2) l1_logreg_regpath, then the number of errors will be stored.Options are:
-q - quiet mode -p - store probability instead of predicted outcome -t <class_file> - test classification result against real class vector in <class_file>
See file format page for the formats of
l1_logreg_regpath [options] feature_file class_file lambda_min num_lambda model_fileArguments are:
feature_file - feature matrix class_file - output vector lambda_min - minimum value of regularization parameter num_lambda - number of lambdas (sample) model_file - store results to file model_fileOptions are:
-c - only coefficients (to know real coefficients). The model_file generated with -c option cannot be used for classification. Use this option only to plot regularization path. -q - quiet mode -v [0..3] - set verbosity level 0 : show one line summary 1 : show simple log 3 : show detailed log -r - use relative lambda if used, lambda := lambda*lambda_max if not used, lambda := lambda -s - standardize dataAdvanced options are
-k <double> - set tolerance for zero coefficients from KKT condition -t <double> - set tolerance for duality gap
See file format page for the formats of
feature 1 feature 2 feature 3 feature 4 class example 1 3 0 1 -2 1 example 2 0 0 2 5 -1 example 3 7 1 -4 0 1
To solve associated l1-regularized LRP, a user stores a feature matrix and a class vector in MM format. Feature matrix can be stored in either array (dense) format and coordinate (sparse) format. If the problem is stored in dense format,
l1_logreg_train uses direct methods. If stored in sparse format,
l1_logreg_train uses the PCG method in solving the problem.
The associated feature matrix can be stored in the file
train_x in dense format.
%%MatrixMarket matrix array real general 3 4 3 0 7 0 0 1 1 2 -4 -2 5 0The associated feature matrix can be stored in the file
train_xin sparse format.
%%MatrixMarket matrix coordinate real general 3 4 8 1 1 3 3 1 7 3 2 1 1 3 1 2 3 2 3 3 -4 1 4 -2 2 4 5
A class file must be stored in array (dense) format for both direct and PCG methods.
%%MatrixMarket matrix array real general 3 1 1 -1 1
To standardize training data
train_x and to learn a model with from the standardized training data, execute the following command:
l1_logreg_train -s train_x train_b 0.01 modelThis will generate the following file
%MatrixMarket matrix array real general % % This is a model file of train_x, % generated by l1_logreg_train ver.0.8.2 % Contents of model: % entry 1: intercept % entries 2..m: coefficients % 5 1 1.568009374292901e+00 4.043495151532541e-01 0.000000000000000e+00 0.000000000000000e+00 -1.103257200261706e+00
feature 1 feature 2 feature 3 feature 4 example 1 5 1 0 0 example 2 -3 0 2 3
The test data set can be stored in the file
test_x in dense format.
%MatrixMarket matrix array real general 2 4 5 -3 1 0 0 2 0 3The test data set can be stored in the file
test_xin sparse format.
%MatrixMarket matrix coordinate real general 2 4 5 1 1 5 2 1 -3 1 2 1 2 3 2 2 4 3
To classify the test data set using the logistic model stored in
model, execute the following command:
l1_logreg_classify model test_x result
The classification result will be saved in the file
%MatrixMarket matrix array real general % % This is the file that stores the test result of test_x, % generated by l1_logreg_classify ver 0.8.2 % Each row contains a predicted class of corresponding example. % 2 1 1 -1
To compare the predicted outcome which will be stored in the result file
result with the known outcome
test_b, write the known outcome (class) vector file
test_b in MM format and execute the following command:
l1_logreg_classify -t test_b model test_x result
l1_logreg_regpath -r train_x train_b 0.01 100 path_modelIt generates two files:
path_model_lambda. The former contains a matrix of models whose column corresponds to the logistic model for each value. The latter contains a vector of values. The option
-ris used to set value relative to .
To apply classification to this family of models
path_model, execute the following command:
l1_logreg_classify -t train_b path_model train_x path_resultThe number of examples whose prediction is wrong will be stored to a result file
path_result. Note that the result of classification using
l1_logreg_regpathis different from that of
l1_logregcontains some example data sets as well as their shell scripts in the
examplesdirectory. You can download more examples (including large-scale data sets) from the website of
If you have a problem in using
l1_logreg, try FAQ.
If you do send a bug report, please include the following information if possible:
l1_logregand platform (OS, CPU) you are using.
Your help will be greatly appreciated.