| Lecture | Date | Topics | Reading(s) |
| 1 | 6 Jan | Decision theory and the hold-out method | ESL: Chapters 1 and 2; Bach: Chapter 2.1–2.3 |
| 2 | 8 Jan | Linear models, ridge, the bias-variance trade-off | ESL: Chapters 3.1, 3.2, 3.4, 4.4, and 7.1–7.4; Bach: Chapter 3.1–3.6 |
| 3 | 13 Jan | Cross-validation | ESL: Chapter 7.1–7.5 and 7.10; Stefan Wager: “Cross-Validation, Risk Estimation, and Model Selection” |
| 4 | 15 Jan | Learning theory, generalization | Bach: Chapter 4.6; PPA: Chapter 8 |
| 5 | 20 Jan | Calibration and proper scoring rules | Tibshirani: “Forecast Scoring and Calibration” |
| 6 | 22 Jan | Conformal prediction | Tibshirani: “Conformal Prediction”; Angelopoulos and Bates: “A Gentle Introduction to Conformal Prediction” (optional) |
| 7 | 27 Jan | Finish calibration, decision trees | ESL: Chapter 9.2 (trees); CS229 lecture notes: Decision Trees |
| 8 | 29 Jan | Bagging, random forests | ESL: Chapter 8.7 (bagging) and Chapter 15 (random forests); CS229 lecture notes: Decision Trees |
| 9 | 3 Feb | Convex optimization | Duchi: Chapters 1–3 |
| 10 | 5 Feb | Stochastic optimization, adaptive metrics | Duchi: Chapters 3 and 4 |
| | 10 Feb | Midterm | |
| 11 | 12 Feb | Deep learning: automatic differentiation, gradient checkpointing | Baydin et al.: “Automatic Differentiation in Machine Learning: a Survey”; Andrej Karpathy: Micrograd repository; Andrej Karpathy: Micrograd tutorial |
| 12 | 17 Feb | Deep learning: universal approximation, ResNets, layer norm, Transformers | Bach: Chapter 9.3.1 and 9.3.3; Turner: “An Introduction to Transformers”; Optional: Murphy (Book 1): Chapters 13–14 (Neural networks for structured data; Neural Networks for Images) |
| 13 | 19 Feb | Graphical models | Notes |
| 14 | 24 Feb | State-space models | Notes |
| | 26 Feb | Prediction competition winners – talks by high-scorers | |
| 15 | 3 March | Variational autoencoders | Murphy (Book 2): Chapter 21 (Variational Autoencoders); Shakir Mohamed: “Gradient estimation in machine learning”, Sections 1–3, 5, 7, and 8 |
| 16 | 5 March | Diffusion generative models | Turner: “Denoising Diffusion Probabilistic Models in Six Simple Steps” |
| 17 | 10 March | Large language models | Andrej Karpathy: NanoGPT repository; Andrej Karpathy: YouTube tutorial |
| 18 | 12 March | Reinforcement learning, reward fine-tuning, score-based gradients | PPA: Chapter 12 (including MDPs, Bandits, REINFORCE); Shakir Mohamed: “Gradient estimation in machine learning”, Sections 4 and 7 |