
Empirical Process Seminar

In probability theory, an empirical process is a stochastic process that describes the proportion of objects in a system in a given state. In this seminar we are interested in the supremum of the empirical process and its applications in statistical machine learning. (Besides reading the textbooks, we will also go through recent papers that use these theorems.)
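
Concretely (a standard way to write it), given i.i.d. samples X_1, ..., X_n ~ P and a class of functions F, the quantities the seminar revolves around are, in the notation of the van der Vaart-Wellner text listed below,

\[
  P_n f = \frac{1}{n}\sum_{i=1}^n f(X_i), \qquad
  \|P_n - P\|_{\mathcal{F}} = \sup_{f\in\mathcal{F}} |P_n f - P f|, \qquad
  \mathbb{G}_n f = \sqrt{n}\,(P_n f - P f).
\]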

Topics:

  • Uniform convergence (VC dimension, Rademacher complexity, etc)
  • Non-parametric Regression
  • Empirical Process beyond supervised learning
  • Implicit/algorithmic regularization, generalization theory for neural networks

Detailed Syllabus: here (including references and readings)

Zoom Link: here

Contact: yplu [*at*] stanford [*dot*] edu


Texts and References

Textbook

van der Vaart, A. W., Wellner, J. A. Weak Convergence and Empirical Processes. Springer, New York, NY, 1996.

van de Geer, S. A. Empirical Processes in M-Estimation. Cambridge University Press, 2000.

References

Stats300B: Theory of Statistics II by John Duchi

Seminar on High dimensional probability on bilibili

News

  • [2021/03/01] Homepage online.
  • [2021/03/15] We started the seminar.

Syllabus

Week1: Concentration Inequality

  • Asymptotic normality, delta method, moment method
  • Maximal Inequality
  • Covering numbers, Dudley's entropy integral
  • Basic concentration: Hoeffding's inequality, Chernoff's inequality, inequalities for sub-Gaussian/sub-exponential random variables, Bernstein's inequality (Hoeffding's inequality and Dudley's bound are recalled after this list)
  • Efficiency of the models
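
For quick reference, here are standard statements of two of the tools above; constants follow common textbook conventions and may differ from the slides.

Hoeffding's inequality: if X_1, ..., X_n are independent with a_i <= X_i <= b_i, then for every t > 0

\[
  \mathbb{P}\Big( \Big| \sum_{i=1}^n \big(X_i - \mathbb{E}X_i\big) \Big| \ge t \Big)
  \le 2 \exp\Big( - \frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \Big).
\]

Dudley's entropy integral: if (X_t)_{t \in T} is a zero-mean process with sub-Gaussian increments with respect to a metric d, then for a universal constant C

\[
  \mathbb{E} \sup_{t \in T} X_t \le C \int_0^{\operatorname{diam}(T)} \sqrt{\log N(\varepsilon, T, d)}\, d\varepsilon .
\]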

Note: slide1

Note: Sections 1.1-1.3 of the pdf

Week2: Uniform Law of Large Numbers

  • Symmetrization, Gaussian averages (the symmetrization bound is recalled after this list)
  • Examples of function classes
  • Empirical Bernstein argument
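
The symmetrization step referenced above, stated in its standard form (measurability issues ignored): for i.i.d. Rademacher signs ε_1, ..., ε_n independent of the sample,

\[
  \mathbb{E} \sup_{f \in \mathcal{F}} |P_n f - P f|
  \;\le\; 2\, \mathbb{E} \sup_{f \in \mathcal{F}} \Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i f(X_i) \Big| ,
\]

and the right-hand side is twice the Rademacher average of F; replacing the signs by i.i.d. standard Gaussians gives the Gaussian average, which is comparable up to constants and logarithmic factors.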

Note: Chapter 1 of pdf

Reference: Chapter 2 of the Weak Convergence and Empirical Processes book.

Exercise:

Week3: Localization and Fast Rate

  • Peeling (the localization template is sketched after this list)
  • Rate of convergence
  • Chapter 1 of pdf.
  • Chapter 3 of pdf
  • Xu Y, Zeevi A. Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory. arXiv preprint arXiv:2011.06186, 2020
  • Fast rate example: Classification.
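
A rough template for how peeling/localization produces fast rates (the precise conditions, e.g. star-shapedness and a Bernstein/low-noise condition, are in the references above): define the local complexity and its fixed point

\[
  \psi(r) = \mathbb{E} \sup_{f \in \mathcal{F},\; P f^2 \le r} |P_n f - P f|, \qquad
  r^* = \inf\{ r > 0 : \psi(r) \le r \};
\]

under the appropriate conditions the excess risk of ERM is of order r^* (up to constants and confidence terms), which can be as small as 1/n rather than the 1/\sqrt{n} given by a global uniform bound.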

Note: pdf

Week4: Applications in Deep Learning

  • Explaining Neural Scaling Laws by Yasaman Bahri et al. arXiv:2102.06701
  • J. Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function. ArXiv e-prints, August 2017.
  • Suzuki T. Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality. ICLR 2019. (The benchmark nonparametric rate behind these papers is recalled after this list.)
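
For orientation (exact network classes and assumptions are in the papers above), the benchmark is the classical minimax rate for regression with a β-smooth (Hölder) target on [0,1]^d,

\[
  \inf_{\hat f} \; \sup_{f_0 \in \mathcal{C}^{\beta}([0,1]^d)} \mathbb{E} \| \hat f - f_0 \|_{L_2}^2 \;\asymp\; n^{-\frac{2\beta}{2\beta + d}},
\]

which suitably sparse deep ReLU networks attain up to logarithmic factors; under compositional or Besov-type structure the effective dimension in the exponent shrinks, which is the point of the results listed above.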

Slide: Neural Scaling Law (TO DO).

Note: Learning rate of the ERM problem.

Further Reading:

Week5: Non-P-Donsker Classes

  • Kur, Gil, Yuval Dagan, and Alexander Rakhlin. "Optimality of maximum likelihood for log-concave density estimation and bounded convex regression." arXiv preprint arXiv:1903.05315 (2019).
  • Kur G, Rakhlin A. On the Minimal Error of Empirical Risk Minimization. arXiv preprint arXiv:2102.12066, 2021.

Examples

  • Log-concave density estimation (see the entropy heuristic below)
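
A rough entropy heuristic for why these classes sit outside the classical theory (precise statements are in the papers above): if

\[
  \log N\big(\varepsilon, \mathcal{F}, L_2(P)\big) \asymp \varepsilon^{-p},
\]

then F is P-Donsker when p < 2, while for p > 2 (the regime of bounded convex regression and log-concave density estimation in higher dimensions) the class is typically non-Donsker, the classical entropy-based guarantees for ERM fall short of the minimax rate, and whether ERM/MLE is in fact optimal there is exactly the question studied by Kur-Dagan-Rakhlin and Kur-Rakhlin.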

Notes: TO DO

Week6: Application 1: Learning PDEs

Examples

  • Poisson equation (its variational form is sketched after this list)
  • Monge-Ampère equation
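
As context for the Poisson example (a sketch only; the seminar's precise setting may differ), the PDE and the energy whose minimizer it characterizes:

\[
  -\Delta u = f \ \text{in } \Omega, \quad u = 0 \ \text{on } \partial\Omega,
  \qquad
  u^{*} = \operatorname*{arg\,min}_{u \in H_0^1(\Omega)} \int_{\Omega} \Big( \tfrac{1}{2} |\nabla u|^2 - f u \Big)\, dx ;
\]

replacing the integral by a Monte Carlo average over sampled points turns this into an empirical risk minimization problem, to which the empirical process tools from the earlier weeks apply.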

Notes: TO DO

Week7: Application 2: Fast rate for robust learning

Notes: TO DO

Week8: Application 3: Reinforcement Learning

  • Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning. ICML2021 link
  • Sequential Rademacher complexity (its definition is recalled after this list):
    • Alexander Rakhlin, Karthik Sridharan, and Ambuj Tewari. Online learning via sequential complexities. Journal of Machine Learning Research, 16(6):155–186, 2015a.
    • Alexander Rakhlin, Karthik Sridharan, and Ambuj Tewari. Sequential complexities and uniform martingale laws of large numbers. Probability Theory and Related Fields, 161(1-2):111–153, 2015b.
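
For reference, the sequential Rademacher complexity of Rakhlin-Sridharan-Tewari replaces the i.i.d. sample by a predictable tree; up to normalization conventions it reads

\[
  \mathfrak{R}^{\mathrm{seq}}_n(\mathcal{F})
  = \sup_{\mathbf{x}} \; \mathbb{E}_{\varepsilon} \Big[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{t=1}^n \varepsilon_t \, f\big( \mathbf{x}_t(\varepsilon_{1:t-1}) \big) \Big],
\]

where the outer supremum is over X-valued binary trees x of depth n and ε_1, ..., ε_n are i.i.d. Rademacher signs; it plays the same role for uniform martingale laws of large numbers (and hence for batch/online RL bounds) that the classical Rademacher average plays in the i.i.d. case.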

Notes: TO DO

Week9: Semi-parametric Statistics

  • Foster D J, Syrgkanis V. Orthogonal statistical learning. arXiv preprint arXiv:1901.09036, 2019. (The orthogonality condition is sketched below.)
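
Schematically, the key structural assumption in orthogonal statistical learning is Neyman orthogonality of the population risk L(θ, g) = E[ℓ(Z; θ, g)] with respect to the nuisance g (the notation here is only indicative; see Foster-Syrgkanis for the precise statement):

\[
  \frac{d}{dr}\Big|_{r=0} \nabla_{\theta} L\big(\theta_0,\; g_0 + r\,(g - g_0)\big) = 0
  \qquad \text{for all admissible } g ,
\]

i.e. the gradient in the target parameter is first-order insensitive to perturbations of the nuisance at the truth; this is what makes nuisance estimation error enter the target excess risk only at second order and lets empirical process arguments deliver fast rates for the target.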

Examples

Notes: TO DO