Thursday and Friday, November 2-3, 2017

Trevor Hastie and Robert Tibshirani

both of Stanford University

In this course we emphasize the tools useful for tackling modern-day data analysis problems. Many of these are essential building blocks, but we also include techniques at the cutting edge of technology for handling big-data problems. From the vast array of tools available, we have selected what we consider to be the most relevant and exciting. Our list of topics includes:

- Linear methods: regression, logistic regression (binary and multiclass), Cox model.
- Bootstrap, cross-validation, and permutation methods.
- Regularized linear models: ridge, lasso, elastic net. Post-selection inference. Glmnet package in R, and other software.
- Trees, random forests, and boosting.
- Unsupervised methods: clustering (prototype, hierarchical, spectral,...), principal components and other low-rank methods, sparse decompositions.
- Support-vector machines and kernel methods.
- Deep learning and neural networks.

Our earlier courses are not a prerequisite for this new course. Although there is overlap with past courses, our new course contains topics not covered by us before. We illustrate many of the methods using examples developed in R.

The material is based on recent papers by the authors and other researchers, as well as our best-selling book:

- Elements of Statistical Learning: data mining, inference and prediction (2nd Edition) (with J. Friedman, Springer-Verlag, 2009).

The authors have two other popular books that are also relevant to this course:

- An Introduction to Statistical Learning, with applications in R (with Gareth James and Daniela Witten, Springer-Verlag, 2013).
- Statistical Learning with Sparsity: the Lasso and Generalizations (with Martin Wainwright, Chapman and Hall, 2015).

Professors Hastie and Tibshirani are both members of the Statistics and Biomedical Data Science
Departments at Stanford University. They have collaborated on research projects over their entire careers, and have coauthored several books: *Generalized Additive Models* (1990), *Elements of Statistical Learning* (2001, second edition 2009, also with J. Friedman), *An Introduction to Statistical Learning* (2013, also with G. James and D. Witten), and *Statistical Learning with Sparsity* (2015, also with M. Wainwright).

Professor Hastie spent his first eight years post-PhD with the
Statistics and Data Analysis Research group, AT&T Bell
Laboratories, where he gained valuable experience with
prediction problems in industry and
manufacturing. He has published extensively in the area of
nonparametric regression and classification. He co-edited the
Wadsworth book
*Statistical Models in S* (1991) with John Chambers. His
Ph.D. thesis *Principal Curves* introduced one of the first
nonlinear versions of principal components analysis. During his ten
years at Bell Laboratories

Professor Tibshirani is a recipient of the COPSS award, given jointly by all the leading statistical societies to the most outstanding statistician under the age of 40. He also has many research articles on nonparametric regression and classification. With Bradley Efron he co-authored the best-selling text *An Introduction to the Bootstrap* in 1993, and has been an active researcher on bootstrap technology over the years. His 1984 Ph.D. thesis spawned the currently lively research area known as Local Likelihood. He has more than thirty-five years of experience in consulting on biostatistical problems.

Professors
Hastie and Tibshirani published "The Elements of Statistical Learning:
Data mining, inference and prediction", with Jerome Friedman
(Springer, 2001, second edition 2009). This book has received a terrific reception,
with over 45,000 copies sold. Both presenters are actively involved in research in
statistical learning methods, and are well-known not
only in the statistics community but in the machine-learning,
neural network and bioinformatics fields as well.
Their newer book "An Introduction to Statistical Learning, with Applications in R" (with Gareth James and Daniela Witten, 2013) is also a best-seller, and has remained consistently in the top 10 in the Amazon categories "Mathematics and Statistics" and "Artificial Intelligence", with a five-star rating based on 84 customer reviews.
Over the
years they have become leaders in the statistical analysis of DNA
microarrays, working with leading-edge biologists such as Patrick
Brown of Stanford University, and David Botstein of
Princeton. They have given many short courses together over the
past 20 years, to academic, government and industrial
audiences. They are both actively involved with consulting in data
analysis and modeling, for the Stanford medical community as well
as local biotech and web-related industries. They have a
reputation for being good instructors who interact well with the
needs of the audience.

- **8:00am-9:00am:** Check-in and coffee/tea + pastries.
- **9:00am-10:20am:** Technical sessions begin.
- **10:20am-10:35am:** Coffee break.
- **10:35am-noon:** Technical session.
- **Noon-1:30pm:** Lunch.
- **1:30pm-2:30pm:** Technical session.
- **2:30pm-2:40pm:** Break.
- **2:40pm-3:45pm:** Technical session.
- **3:45pm-5pm:** Technical session + discussion.

Read here for more details on who should attend, and our policy not to sell our course notes.

*http://www.stanford.edu/~hastie/sldmIV.html*