Statistics 231 / CS229T: Statistical Learning Theory
Prerequisites
There are no formal prerequisites to this class. But we will assume
a significant level of mathematical maturity. This means an understanding
of the following.
Machine learning: at least at the level of CS229
Linear algebra: a working knowledge at the level of
EE263 or
Math 104
Probability: this course will have substantial probabilistic content
and require nontrivial command of probabilistic techniques.
The absolute bare minimum is probability at the level of
Stats116
Convex optimization will be extremely helpful, but is not
strictly necessary. Simultaneously taking the course
EE364a is recommended because
you will find its content extraordinarily useful more broadly
Texts
There is no required text for the course, though we will post lecture
notes for each of the lectures. There are a number of useful other
references. Feel free to contact us with more if you find useful ones.
Grading
Your grade will be determined by homework (40%), two paper reviews you
will write (20%), and a final project (40%).
Homework: There will be (approximately) one homework assignment
every two to three weeks throughout the course, which will count for
40% of the grade. In effort to speed grading and homework return to
you, we will grade homework problems and their subparts on a {0, 1,
2}scale: 0 indicates a completely incorrect answer, 1 indicating
approximately halfway correct, 2 indicating more or less
correct.
Paper reviews: we will post more information shortly, but
you will (at some point in the quarter) be required to read and
write a short review, of about 4 pages, of two papers.
The final project will be on a topic plausibly related to the theory
of machine learning, statistics, or optimization.
Course Overview
When do machine learning algorithms work and why? This course focuses
on developing a theoretical understanding of the statistical
properties of learning algorithms.
Potential Topics
Concentration inequalities: subGaussian and subexponential random
variables, randomized embeddings
Uniform convergence: uniform laws of large numbers, VC dimension,
entropy integrals, Rademacher complexity
Online learning: online convex optimization, mirror descent, multiarmed
bandits, concentration
Kernel methods: reproducing kernel Hilbert spaces, Fourier
properties and analysis, randomized and lowrank approximations
Asymptotics: quadratic expansions, central limit theorems, asymptotic
normality, moment methods
