Statistics 315a: Modern Statistical Learning

Brian Trippe, Stanford University, Winter 2026

Tentative Syllabus and Readings

Lecture	Date	Topics	Reading(s)
1	6 Jan	Decision theory and the hold-out method	ESL: Chapters 1 and 2; Bach: Chapter 2.1–2.3
2	8 Jan	Linear models, ridge, the bias-variance trade-off	ESL: Chapters 3.1, 3.2, 3.4, 4.4, and 7.1–7.4; Bach: Chapter 3.1–3.6
3	13 Jan	Cross-validation	ESL: Chapter 7.1–7.5 and 7.10; Stefan Wager: “Cross-Validation, Risk Estimation, and Model Selection”
4	15 Jan	Learning theory, generalization	Bach: Chapter 4.6; PPA: Chapter 8
5	20 Jan	Calibration and proper scoring rules	Tibshirani: “Forecast Scoring and Calibration”
6	22 Jan	Conformal prediction	Tibshirani: “Conformal Prediction”; Angelopoulos and Bates: “A Gentle Introduction to Conformal Prediction” (optional)
7	27 Jan	Finish calibration, decision trees	ESL: Chapter 9.2 (trees); CS229 lecture notes: Decision Trees
8	29 Jan	Bagging, random forests	ESL: Chapter 8.7 (bagging) and Chapter 15 (random forests); CS229 lecture notes: Decision Trees
9	3 Feb	Convex optimization	Duchi: Chapters 1–3
10	5 Feb	Stochastic optimization, adaptive metrics	Duchi: Chapters 3 and 4
	10 Feb	Midterm
11	12 Feb	Deep learning: automatic differentiation, gradient checkpointing	Baydin et al.: “Automatic Differentiation in Machine Learning: a Survey”; Andrej Karpathy: micrograd repository; Andrej Karpathy: micrograd tutorial
12	17 Feb	Deep learning: universal approximation, ResNets, layer norm, Transformers	Bach: Chapters 9.3.1 and 9.3.3; Turner: “An Introduction to Transformers”; Optional: Murphy (Book 1): Chapters 13–14 (Neural Networks for Structured Data; Neural Networks for Images)
13	19 Feb	Graphical models	Notes
14	24 Feb	State-space models	Notes
	26 Feb	Prediction competition winners – talks by high-scorers
15	3 Mar	Variational autoencoders	Murphy (Book 2): Chapter 21 (Variational Autoencoders); Shakir Mohamed et al.: “Monte Carlo Gradient Estimation in Machine Learning”, Sections 1–3, 5, 7, and 8
16	5 Mar	Diffusion generative models	Turner: “Denoising Diffusion Probabilistic Models in Six Simple Steps”
17	10 Mar	Large language models	Andrej Karpathy: nanoGPT repository; Andrej Karpathy: YouTube tutorial
18	12 Mar	Reinforcement learning, reward fine-tuning, score-based gradients	PPA: Chapter 12 (including MDPs, bandits, REINFORCE); Shakir Mohamed et al.: “Monte Carlo Gradient Estimation in Machine Learning”, Sections 4 and 7

Bibliography and reading key

ESL: The Elements of Statistical Learning; Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Bach: Learning Theory from First Principles; Francis Bach
Duchi: Introductory Lectures on Stochastic Optimization; John C. Duchi
PPA: Patterns, Predictions, and Actions; Moritz Hardt and Benjamin Recht
Tibshirani: Advanced Topics in Statistical Learning: Spring 2023; Ryan Tibshirani
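Turner: Tutorials; Richard E. Turner
Murphy: Probabilistic Machine Learning: An Introduction (Book 1) and Probabilistic Machine Learning: Advanced Topics (Book 2); Kevin P. Murphy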

Additional resources on optimization by John Duchi

Background on matrices and optimization: a short(ish) note covering the principles of matrices and optimization that we use throughout the course.
John Duchi's 2025 lecture recordings and slides on optimization:
- Lectures: Convex analysis 1, Convex analysis 2, Subgradient methods, AdaGrad, Momentum and proximal point methods
- Slides: Convex Analysis, Subgradient methods, Advanced subgradient methods