CS229T/STATS231: Statistical Learning Theory

Stanford / Autumn 2018-2019

Announcements

- 11/08: Homework 2 Solutions have been posted!
- 11/02: Homework 3 is out! It's due on Wednesday, 11/28, 11pm.
- 11/01: exam2013-solution, exam2014-solution, exam2015-solution, exam2016-solution have been posted!
- 10/17: Homework 1 Solutions have been posted!
- 10/14: Homework 2 is out! It's due on Thursday, 11/1, 11pm.
- 10/08: Homework 0 Solutions have been posted!
- 10/02: Homework 1 is out! It's due on Wednesday, 10/10, 11pm.
- 9/29: some backgrounds on linear algebra, optimization, and probability. Note: all the mathematical statements in the document can be cited in the homework solutions without proofs.
- 9/22: Homework 0 is out! It's due on Wednesday, 10/03, 11pm.
- 9/8: Welcome to CS229T/STATS231! Previous years' home pages are here and here for reference. (Currently this page is still under construction.)

Administrative information
**Time/location:**
**Instructor:** Tengyu Ma
**Course assistants:**
**Contact:**
Please use Piazza for questions and discussions.

- Lectures: Mon/Wed 3-4:20pm in Pigott Hall 113

- Yu Bai (head CA)
- Tum Chaturapruek
- Jim Zhiyuan Li
- Luigi Nardi
- Colin Wei

Course content
**Description:**
When do machine learning algorithms work and why? How do we formalize what it means for an algorithm to learn from data? How do we use mathematical thinking to design better machine learning methods?
This course focuses on developing a theoretical
understanding of the statistical properties of learning algorithms.
**Topics**:
**Prerequisites:**

- Uniform convergence (VC dimension, Rademacher complexity, etc)
- Implicit/algorithmic regularization, generalization theory for neural networks
- Kernel methods
- Online learning and bandits problems
- Unsupervised learning: exponential family, method of moments, statistical theory of GANs

- A solid background in
linear algebra,
real analysis,
probability theory,
and
**general ability to do mathematical proofs** - Machine learning (CS229) or statistics (STATS315A)
- Convex optimization (EE364A) is recommended

Grading
**Coursework**:
**Late policy:** Two total late days are accepted for homeworks, paper review, or projects, but not scribe notes. Late work done in pairs with x late days will require x late days from each of the contributors. Under extentuating circumstances, you may request an extension by contacting the course staff.
**Collaboration policy:** we encourage you to form study groups and
discuss homeworks. However, you must write up all homeworks from scratch independently without referring to any notes from the joint session.
Please follow the honor code.

**Homeworks (40%)**: there will be three homeworks (plus a warmup which does not count towards your grade), centered around proving properties of statistical procedures. Each homework must be submitted through Gradescope. Sign up for the course using entry code**M4V34N**. You are encouraged to use LaTeX to writeup your homeworks (here's a template), but this is not a requirement. You will receive**one (1) bonus point**for submitting a typed written assignment (e.g. LaTeX, Microsoft Word). We will accept scanned handwritten assignments but they will not receive the bonus point.**Exam (25%)**: open-book, open-notes. Problems will be like the homeworks, but simpler. You can use laptops as long as you turn off the wireless.**Date: Wed Nov 14, 6-10 PM, Bishop Auditorium, Lathrop296****Paper review (30%)**: you will write a 2-4 page review of papers. The goal is to learn to read technically demanding papers critically, and hopefully in the process, generate novel research ideas. Your review should not only summarize the main result of the paper, but critique it, instantiate it on examples, discuss its overall significance, and suggest possible future directions. See this Google doc for detailed guidelines and a list of papers. The paper reviews can be done in pairs. Paper reviews that are done in pairs will be evaluated with a slightly higher bar, and they ideally should contain reviews for two closely-related papers and are allowed two additional pages. Appendix or references beyond the page limit are allowed, but you will not be graded based on them.

Instead of doing the paper review, with approval from the course staff on the project topic, you can do a final project. Please come to the Tengyu Ma or Yu Bai's office hours to request the approval by briefly describing the project plan. We don't encourage you to do the project unless you own research area is closely related to machine learning theory. The project can be done in pairs. The page limit for project report is 8 pages, not including reference or appendix.

The review and the project should be submitted electronically by**11pm**.**Scribe notes (5%)**: Because there is no textbook or set of readings that perfectly fits this course, you will be asked to scribe a note for a lecture in LaTeX. The course staff will select one note for each lecture and share it with other students.**1% bonus credit**will be given if your note is selected for posting. See this Google doc for the detailed guidelines. The scribe notes are due 2 days after the lecture (11pm Wed for Mon lecture, and Fri 11pm for Wed lecture). Please sign up here before**Sept 29th**and plan the time ahead. Extra credits will be given to the notes that are selected for posting. The scribe notes can be done in pairs.

Texts and References

There is no required text for the course. A number of useful references:

Schedule (subject to change)
**Week 1** **Week 2** **Week 3** **Week 4** **Week 5** **Week 6** **Week 7** **Week 8** **Week 9** **Week 10**

- Mon 09/24: Lecture 1: overview, formulation of prediction problems, error decomposition [Scribe notes]
- Wed 09/26: Lecture 2: asymptotics of maximum likelihood estimators (MLE) [Scribe notes]

- Mon 10/01: Lecture 3: uniform convergence overview, finite hypothesis class [Scribe notes]
- Mon 10/01: Homework 1 out
- Wed 10/03: Lecture 4: naive epsilon-cover argument, concentration inequalities [Scribe notes] [Advanced Reading]
- Wed 10/03: Homework 0 (warmup) due

- Mon 10/08: Lecture 5: Sub-Gaussian random variables, Rademacher complexity [Scribe notes]
- Wed 10/10: Lecture 6: Rademacher complexity, margin theory [Scribe notes]
- Wed 10/10: Homework 1 due
- Thu 10/11: Homework 2 out

- Mon 10/15: Lecture 7: Rademacher complexity, neural networks [Scribe notes]
- Wed 10/17: Lecture 8: Margin-based generalization error of two-layer neural networks [Scribe notes]

- Mon 10/22: Lecture 9: VC dimension, covering techniques [Scribe notes]
- Wed 10/24: Lecture 10: Covering techniques, overview of GANs [Please refer to Percy's notes (page 88-95) for the covering part and Lecture 11 for the overview of GANs]

- Mon 10/29: Lecture 11: Total variation distance, Wasserstein distance, Wasserstein GANs [Scribe notes]
- Wed 10/31: Lecture 12: Generalization and approximation in Wassersetin GANs [Scribe notes]
- Thu 11/01: Homework 2 (uniform convergence) due
- Thu 11/01: Homework 3 out

- Mon 11/05: Lecture 13: Restricted Approximability, overview of online learning [Scribe notes]
- Wed 11/07: Lecture 14: Online learning, online convex optimization, Follow the Leader (FTL) algorithm [Scribe notes] [Additional Reading (Notes by Haipeng Luo)]

- Mon 11/12: Lecture 15: Follow the Regularized Leader (FTRL) algorithm [Scribe notes]
- Wed 11/14: Lecture 16: FTRL in concrete problems: online regression & expert problem, convex to linear reduction [Scribe notes]
- Wed 11/14: Exam (6-10pm, classroom: Bishop Auditorium, Lathrop296)

- Mon 11/26: Lecture 17
- Wed 11/28: Lecture 18
- Wed 11/28: Homework 3 due

- Mon 12/03: Lecture 19
- Wed 12/05: Lecture 20
- Fri 12/07: Paper review due
- Fri 12/07: Final project due (if you didn't do the paper review)