
Empirical Process Seminar

In probability theory, an empirical process is a stochastic process that describes the proportion of objects in a system in a given state. In this seminar we are interested in the supremum of the empirical process and its applications in statistical machine learning. (Besides reading the textbooks, we will also go through recent papers that use these theorems.)
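
Concretely (a standard way to write it), given i.i.d. samples X_1, ..., X_n ~ P and a class of functions F, the quantities the seminar revolves around are, in the notation of the van der Vaart-Wellner text listed below,

\[
  P_n f = \frac{1}{n}\sum_{i=1}^n f(X_i), \qquad
  \|P_n - P\|_{\mathcal{F}} = \sup_{f\in\mathcal{F}} |P_n f - P f|, \qquad
  \mathbb{G}_n f = \sqrt{n}\,(P_n f - P f).
\]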

Topics:

  • Uniform convergence (VC dimension, Rademacher complexity, etc)
  • Non-parametric Regression
  • Empirical Process beyond supervised learning
  • Implicit/algorithmic regularization, generalization theory for neural networks

Detailed Syllabus: here (including references and readings)

Zoom Link: here

Contact: yplu [*at*] stanford [*dot*] edu


Texts and References

Textbook

van der Vaart, A. W., Wellner, J. A. Weak Convergence and Empirical Processes. Springer, New York, NY, 1996.

van de Geer, S. A. Empirical Processes in M-Estimation. Cambridge University Press, 2000.

References

Stats300B: Theory of Statistics II by John Duchi

Seminar on High dimensional probability on bilibili

News

  • [2021/03/01] Homepage online.
  • [2021/03/15] We started the seminar.

Syllabus

Week1: Concentration Inequality

  • Asymptotic normality, delta method, moment method
  • Maximal Inequality
  • Covering numbers, Dudley's entropy integral
  • Basic concentration: Hoeffding's inequality, Chernoff's inequality, inequalities for sub-Gaussian/sub-exponential random variables, Bernstein's inequality (Hoeffding's inequality and Dudley's bound are recalled after this list)
  • Efficiency of the models
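
For quick reference, here are standard statements of two of the tools above; constants follow common textbook conventions and may differ from the slides.

Hoeffding's inequality: if X_1, ..., X_n are independent with a_i <= X_i <= b_i, then for every t > 0

\[
  \mathbb{P}\Big( \Big| \sum_{i=1}^n \big(X_i - \mathbb{E}X_i\big) \Big| \ge t \Big)
  \le 2 \exp\Big( - \frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \Big).
\]

Dudley's entropy integral: if (X_t)_{t \in T} is a zero-mean process with sub-Gaussian increments with respect to a metric d, then for a universal constant C

\[
  \mathbb{E} \sup_{t \in T} X_t \le C \int_0^{\operatorname{diam}(T)} \sqrt{\log N(\varepsilon, T, d)}\, d\varepsilon .
\]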

Note: slide1

Note: Sections 1.1-1.3 of the pdf

Week2: Uniform Law of Large Numbers

  • Symmetrization, Gaussian averages (the symmetrization bound is recalled after this list)
  • Examples of function classes
  • Empirical Bernstein argument
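
The symmetrization step referenced above, stated in its standard form (measurability issues ignored): for i.i.d. Rademacher signs ε_1, ..., ε_n independent of the sample,

\[
  \mathbb{E} \sup_{f \in \mathcal{F}} |P_n f - P f|
  \;\le\; 2\, \mathbb{E} \sup_{f \in \mathcal{F}} \Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i f(X_i) \Big| ,
\]

and the right-hand side is twice the Rademacher average of F; replacing the signs by i.i.d. standard Gaussians gives the Gaussian average, which is comparable up to constants and logarithmic factors.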

Note: Chapter 1 of pdf

Reference: Chapter 2 of the Weak Convergence and Empirical Processes book.

Exercise:

Week3: Localization and Fast Rate

  • Peeling (the localization template is sketched after this list)
  • Rate of convergence
  • Chapter 1 of pdf.
  • Chapter 3 of pdf
  • Xu Y, Zeevi A. Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory. arXiv preprint arXiv:2011.06186, 2020
  • Fast rate example: Classification.
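
A rough template for how peeling/localization produces fast rates (the precise conditions, e.g. star-shapedness and a Bernstein/low-noise condition, are in the references above): define the local complexity and its fixed point

\[
  \psi(r) = \mathbb{E} \sup_{f \in \mathcal{F},\; P f^2 \le r} |P_n f - P f|, \qquad
  r^* = \inf\{ r > 0 : \psi(r) \le r \};
\]

under the appropriate conditions the excess risk of ERM is of order r^* (up to constants and confidence terms), which can be as small as 1/n rather than the 1/\sqrt{n} given by a global uniform bound.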

Note: pdf

Week4: Applications in Deep Learning

  • Explaining Neural Scaling Laws by Yasaman Bahri et al. arXiv:2102.06701
  • J. Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function. ArXiv e-prints, August 2017.
  • Suzuki T. Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality. ICLR 2019. (The benchmark nonparametric rate behind these papers is recalled after this list.)
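
For orientation (exact network classes and assumptions are in the papers above), the benchmark is the classical minimax rate for regression with a β-smooth (Hölder) target on [0,1]^d,

\[
  \inf_{\hat f} \; \sup_{f_0 \in \mathcal{C}^{\beta}([0,1]^d)} \mathbb{E} \| \hat f - f_0 \|_{L_2}^2 \;\asymp\; n^{-\frac{2\beta}{2\beta + d}},
\]

which suitably sparse deep ReLU networks attain up to logarithmic factors; under compositional or Besov-type structure the effective dimension in the exponent shrinks, which is the point of the results listed above.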

Slide: Neural Scaling Law (TO DO).

Note: Learning rate of the ERM problem.

Further Reading:

Week5: Non-P-Donsker Classes

  • Kur, Gil, Yuval Dagan, and Alexander Rakhlin. "Optimality of maximum likelihood for log-concave density estimation and bounded convex regression." arXiv preprint arXiv:1903.05315 (2019).
  • Kur G, Rakhlin A. On the Minimal Error of Empirical Risk Minimization. arXiv preprint arXiv:2102.12066, 2021.

Examples

  • Log-concave density estimation (see the entropy heuristic below)
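
A rough entropy heuristic for why these classes sit outside the classical theory (precise statements are in the papers above): if

\[
  \log N\big(\varepsilon, \mathcal{F}, L_2(P)\big) \asymp \varepsilon^{-p},
\]

then F is P-Donsker when p < 2, while for p > 2 (the regime of bounded convex regression and log-concave density estimation in higher dimensions) the class is typically non-Donsker, the classical entropy-based guarantees for ERM fall short of the minimax rate, and whether ERM/MLE is in fact optimal there is exactly the question studied by Kur-Dagan-Rakhlin and Kur-Rakhlin.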

Notes: TO DO

Week6: Application 1: Learning PDEs

Examples

  • Poisson equation (its variational form is sketched after this list)
  • Monge-Ampère equation
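
As context for the Poisson example (a sketch only; the seminar's precise setting may differ), the PDE and the energy whose minimizer it characterizes:

\[
  -\Delta u = f \ \text{in } \Omega, \quad u = 0 \ \text{on } \partial\Omega,
  \qquad
  u^{*} = \operatorname*{arg\,min}_{u \in H_0^1(\Omega)} \int_{\Omega} \Big( \tfrac{1}{2} |\nabla u|^2 - f u \Big)\, dx ;
\]

replacing the integral by a Monte Carlo average over sampled points turns this into an empirical risk minimization problem, to which the empirical process tools from the earlier weeks apply.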

Notes: TO DO

Week7: Application 2: Fast rate for robust learning

Notes: TO DO

Week8: Application 3: Reinforcement Learning

  • Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning. ICML2021 link
  • Sequential Rademacher complexity (its definition is recalled after this list):
    • Alexander Rakhlin, Karthik Sridharan, and Ambuj Tewari. Online learning via sequential complexities. Journal of Machine Learning Research, 16(6):155–186, 2015a.
    • Alexander Rakhlin, Karthik Sridharan, and Ambuj Tewari. Sequential complexities and uniform martingale laws of large numbers. Probability Theory and Related Fields, 161(1-2):111–153, 2015b.
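
For reference, the sequential Rademacher complexity of Rakhlin-Sridharan-Tewari replaces the i.i.d. sample by a predictable tree; up to normalization conventions it reads

\[
  \mathfrak{R}^{\mathrm{seq}}_n(\mathcal{F})
  = \sup_{\mathbf{x}} \; \mathbb{E}_{\varepsilon} \Big[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{t=1}^n \varepsilon_t \, f\big( \mathbf{x}_t(\varepsilon_{1:t-1}) \big) \Big],
\]

where the outer supremum is over X-valued binary trees x of depth n and ε_1, ..., ε_n are i.i.d. Rademacher signs; it plays the same role for uniform martingale laws of large numbers (and hence for batch/online RL bounds) that the classical Rademacher average plays in the i.i.d. case.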

Notes: TO DO

Week9: Semi-parametric Statistics

  • Foster D J, Syrgkanis V. Orthogonal statistical learning. arXiv preprint arXiv:1901.09036, 2019. (The orthogonality condition is sketched below.)
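
Schematically, the key structural assumption in orthogonal statistical learning is Neyman orthogonality of the population risk L(θ, g) = E[ℓ(Z; θ, g)] with respect to the nuisance g (the notation here is only indicative; see Foster-Syrgkanis for the precise statement):

\[
  \frac{d}{dr}\Big|_{r=0} \nabla_{\theta} L\big(\theta_0,\; g_0 + r\,(g - g_0)\big) = 0
  \qquad \text{for all admissible } g ,
\]

i.e. the gradient in the target parameter is first-order insensitive to perturbations of the nuisance at the truth; this is what makes nuisance estimation error enter the target excess risk only at second order and lets empirical process arguments deliver fast rates for the target.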

Examples

Notes: TO DO