Syllabus#

  • Videos: Every lecture will be recorded by SCPD

  • Email policy: Please use the Piazza site for most questions. For administrative issues that concern only you, email the course staff mailing list: stats202-aut2223-staff@lists.stanford.edu

  • Website: stats202.stanford.edu

  • If you are auditing the class (not registered on Axess), email us your SUNet ID to gain access to the lectures and homework on Canvas.

Description#

Stats 202 is an introduction to statistical / machine learning. By the end of the quarter, students will:

  • Understand the distinction between supervised and unsupervised learning and be able to identify appropriate tools to answer different research questions.

  • Become familiar with basic unsupervised procedures including clustering and principal components analysis.

  • Become familiar with the following regression and classification algorithms: linear regression, ridge regression, the lasso, logistic regression, linear discriminant analysis, K-nearest neighbors, splines, generalized additive models, tree-based methods, and support vector machines.

  • Gain a practical appreciation of the bias-variance tradeoff and apply model selection methods based on cross-validation and bootstrapping to a prediction challenge.

  • Analyze a real dataset of moderate size using R.

  • Develop the computational skills for data wrangling, collaboration, and reproducible research.

  • Be exposed to other topics in machine learning, possibly including missing data, prediction using time series and relational data, non-linear dimensionality reduction techniques, web-based data visualizations, anomaly detection, and representation learning.

Textbook#

Introduction to Statistical Learning (with applications in R), 2nd edition

Prerequisites#

Introductory courses in statistics or probability (e.g., Stats 60), linear algebra (e.g., Math 51), and computer programming (e.g., CS 105).

Slides#

Notes on these pages are available as HTML slides:

Labs#

Source code for the labs is available to download as Jupyter notebooks.

Instructions for using Jupyter notebook for labs#

  • These instructions assume you have already installed R.

  • Install conda

  • Make a new environment and install jupyterlab:

conda create -n stats202_aut2022 python=3.9 -y
conda activate stats202_aut2022
pip install jupyterlab

  • In R:

# Install the R kernel for Jupyter, then register it for the current user
install.packages('IRkernel', repos='https://cloud.r-project.org')
IRkernel::installspec()

  • Remember to remove the .txt extension when you save a notebook. If it was saved in your Downloads directory (common with Chrome), rename it:

mv ~/Downloads/Ch2-statlearn-lab.ipynb.txt ~/Downloads/Ch2-statlearn-lab.ipynb

  • Open a downloaded notebook:

jupyter lab ~/Downloads/Ch2-statlearn-lab.ipynb
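
If several labs were downloaded with a trailing .txt, the rename step above can be applied to all of them at once. The following is a minimal sketch, not part of the official instructions: the function name strip_txt_suffix is hypothetical, and it assumes a POSIX shell and that every file matching *.ipynb.txt in the given directory is a notebook that should be renamed.

```shell
# Hypothetical helper: rename every NAME.ipynb.txt in a directory to NAME.ipynb.
# Usage: strip_txt_suffix DIRECTORY
strip_txt_suffix() {
  for nb_file in "$1"/*.ipynb.txt; do
    # If nothing matches, the pattern is left unexpanded; skip it.
    [ -e "$nb_file" ] || continue
    # ${nb_file%.txt} drops the trailing ".txt" from the file name.
    mv -- "$nb_file" "${nb_file%.txt}"
  done
}
```

For example, strip_txt_suffix ~/Downloads would rename all such files in the Downloads directory. Note that mv replaces an existing .ipynb file of the same name, so check for duplicates first.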

Where to find files#

  • The links above will get you to .ipynb versions of the labs through the Download option at the top right of each page.

  • Alternatively, .Rmd and .ipynb versions of the labs can be downloaded at statlearning.com

  • The R markdown files (.Rmd) can be used within RStudio

Evaluation#

  • 5 assignments (60%)

  • Midterm (10%) (Tentative date: 11/7 in class)

  • Final exam (30%): 12/16/2022 @ 8:30 AM according to exam schedule

  • All work is to be submitted on Gradescope. Use entry code BB55NN.
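
The weights above combine into an overall score as a weighted average. A minimal sketch of the arithmetic follows, assuming each component is expressed as a percentage from 0 to 100 and that the five assignments have already been averaged into a single assignment percentage (the syllabus does not say how individual assignments are weighted). The function name course_score is hypothetical.

```shell
# Hypothetical helper: overall score from component percentages (0-100).
# Weights: assignments 60%, midterm 10%, final exam 30%.
# Usage: course_score ASSIGNMENTS MIDTERM FINAL
course_score() {
  awk -v hw="$1" -v mid="$2" -v fin="$3" \
    'BEGIN { printf "%.1f\n", (60 * hw + 10 * mid + 30 * fin) / 100 }'
}

course_score 90 80 85   # prints 87.5
```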

Late policy#

  • No assignments will be graded if submitted more than three days after the due date.

  • Each 24 hours or part thereof that a homework is late will be treated as one full day.
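
The two rules above amount to rounding lateness up to whole days and not grading anything submitted more than three days late. A minimal sketch of that arithmetic follows, assuming lateness is measured in whole seconds past the deadline. The function names late_days and will_be_graded are hypothetical, and the sketch does not model any per-day penalty, since none is stated here.

```shell
# Hypothetical helper: number of late days counted for a submission.
# Usage: late_days SECONDS_PAST_DEADLINE
late_days() {
  if [ "$1" -le 0 ]; then
    echo 0
  else
    # Integer ceiling of seconds / 86400: any part of a day counts as a full day.
    echo $(( ($1 + 86399) / 86400 ))
  fi
}

# Hypothetical helper: succeeds (exit status 0) if the submission is
# no more than three days late.
# Usage: will_be_graded SECONDS_PAST_DEADLINE
will_be_graded() {
  [ "$(late_days "$1")" -le 3 ]
}

late_days 90000   # 25 hours late: prints 2
```

Under this reading, a submission exactly 72 hours late counts as three late days and is still graded, while one that is a second later is not.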

Piazza#

Gradescope, use entry BB55NN.#

Office hours#

Instructor#

  • Jonathan Taylor: Friday 1-3pm, Sequoia Hall #137

TAs#

  • Sophia Lu: M 3:30-4:30pm, Zoom, Th 9-10am, both in Fishbowl (Sequoia Hall)

  • Aditya Ghosh: M, F 3-4pm, both in Bowker (Sequoia Hall)

  • Rex Shen: T 3-5pm, 380-380D

  • David Fager: W 9:30-11:30am, Fishbowl (Sequoia Hall)

  • Zitong Yang: Th 4-5pm, F 3-4pm, both in Bowker (Sequoia Hall)

  • Kevin Fry: Th 12-2pm, Zoom

  • Debolina Paul: T 1-3pm, Bowker (Sequoia Hall)