EE 375: Syllabus

We will mainly follow this review and the references therein.

Here is a rough syllabus/schedule (the precise schedule will depend on progress in class, and suggestions/feedback are welcome).

Lectures will be prerecorded and class time will be devoted to discussion.

March 30, April 1

Overview of the class. Background on uniform convergence theory.

  1. Bartlett, P.L., 1998. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2), pp.525-536.

  2. Bach, F., 2017. Breaking the curse of dimensionality with convex neural networks. The Journal of Machine Learning Research, 18(1), pp.629-681.

April 6, 8

Complexity bounds for shallow and deep networks.

  1. Bartlett, P.L., Foster, D.J. and Telgarsky, M., 2017, December. Spectrally-normalized margin bounds for neural networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 6241-6250).

April 13, 15

Overparametrized models and implicit bias.

  1. Gunasekar, S., Lee, J., Soudry, D. and Srebro, N., 2018, July. Characterizing implicit bias in terms of optimization geometry. In International Conference on Machine Learning (pp. 1832-1841). PMLR.

  2. Soudry, D., Hoffer, E., Nacson, M.S., Gunasekar, S. and Srebro, N., 2018. The implicit bias of gradient descent on separable data. The Journal of Machine Learning Research, 19(1), pp.2822-2878.
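As a small numerical companion to the readings above (not part of the reading list): for logistic loss on linearly separable data, gradient descent drives the iterate's direction toward a separator with positive margin, per Soudry et al. The toy data, step size, and iteration count below are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: labels given by the sign of the first coordinate,
# with the two classes pushed apart to guarantee a positive margin.
X = rng.normal(size=(60, 2))
X[:, 0] += 3.0 * np.sign(X[:, 0])
y = np.sign(X[:, 0])

def logistic_loss_grad(w, X, y):
    # Gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)).
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))
    return (X * coeff[:, None]).mean(axis=0)

w = np.zeros(2)
for _ in range(5000):
    w -= 0.1 * logistic_loss_grad(w, X, y)

# ||w|| grows without bound on separable data, but the direction stabilizes;
# the theory says it aligns with the max-margin (hard-SVM) direction.
direction = w / np.linalg.norm(w)
min_margin = np.min(y * (X @ direction))
print(direction, min_margin)  # min_margin > 0: the direction separates the data
```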

April 20, 22

Benign overfitting in linear regression.

  1. Tsigler, A. and Bartlett, P.L., 2020. Benign overfitting in ridge regression. arXiv preprint arXiv:2009.14286.

  2. Hastie, T., Montanari, A., Rosset, S. and Tibshirani, R.J., 2019. Surprises in high-dimensional ridgeless least squares interpolation. arXiv preprint arXiv:1903.08560.
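The ridgeless interpolator studied in these papers is just the minimum-ℓ2-norm solution, computable with a pseudoinverse. A minimal sketch in an overparametrized linear model; the dimensions, signal, and noise level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n, d = 50, 200            # overparametrized: d > n
beta = np.zeros(d)
beta[0] = 1.0             # simple one-sparse signal (illustrative)

X = rng.normal(size=(n, d))
y = X @ beta + 0.1 * rng.normal(size=n)

# Minimum-norm interpolator: beta_hat = X^+ y, the ridgeless limit of ridge.
beta_hat = np.linalg.pinv(X) @ y

train_resid = np.linalg.norm(X @ beta_hat - y)
print(train_resid)  # ~0: the fit interpolates the training data exactly
```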

April 27, 29

Revisiting kernel ridge regression.

  1. Liang, T. and Rakhlin, A., 2020. Just interpolate: Kernel “ridgeless” regression can generalize. Annals of Statistics, 48(3), pp.1329-1347.

  2. Rakhlin, A. and Zhai, X., 2019. Consistency of interpolation with Laplace kernels is a high-dimensional phenomenon. In Conference on Learning Theory. PMLR.

  3. Ghorbani, B., Mei, S., Misiakiewicz, T. and Montanari, A., 2019. Linearized two-layers neural networks in high dimension. Annals of Statistics (to appear).
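Kernel "ridgeless" regression, as in the Liang-Rakhlin and Rakhlin-Zhai papers, solves the kernel system with zero regularization so the predictor interpolates the training data. A bare-bones sketch with a Laplace kernel; the data and dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

n, d = 30, 3
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                     # illustrative target

# Laplace kernel Gram matrix K_ij = exp(-||x_i - x_j||); strictly positive
# definite for distinct points, so K is invertible and we can set lambda = 0.
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
K = np.exp(-D)
c = np.linalg.solve(K, y)               # ridgeless: solve K c = y exactly

train_resid = np.linalg.norm(K @ c - y)
print(train_resid)  # ~0: the kernel fit interpolates the training labels

# Prediction at a new point uses the same kernel weights.
x_new = rng.normal(size=d)
f_new = np.exp(-np.linalg.norm(X - x_new, axis=1)) @ c
```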

May 4, 6

Optimization in the linear regime. Neural tangent kernel.

  1. Chizat, L., Oyallon, E. and Bach, F., 2019. On Lazy Training in Differentiable Programming. Advances in Neural Information Processing Systems, 32, pp.2937-2947.

  2. Oymak, S. and Soltanolkotabi, M., 2020. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks. IEEE Journal on Selected Areas in Information Theory, 1(1), pp.84-105.
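The "lazy" (linear) regime can be seen numerically: scaling the network output by a large factor alpha, with the step size rescaled accordingly, makes the weights move less and less relative to their initialization, in the spirit of Chizat, Oyallon and Bach. This is a toy sketch, not any paper's experiment; the architecture, symmetric initialization (so the initial output is exactly zero), and all constants are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy two-layer net f(x) = alpha * a^T relu(W x) / sqrt(m); only W is trained.
n, d, m = 20, 5, 100
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
# Symmetric (doubled) init: paired neurons with opposite signs give f = 0 at init.
V = rng.normal(size=(m // 2, d))
W0 = np.vstack([V, V])
a = np.concatenate([np.ones(m // 2), -np.ones(m // 2)])

def train(alpha, steps=200):
    W = W0.copy()
    lr = 0.1 / alpha**2              # step size rescaled as in lazy-training analyses
    for _ in range(steps):
        H = X @ W.T                  # (n, m) preactivations
        Phi = np.maximum(H, 0.0)
        f = alpha * Phi @ a / np.sqrt(m)
        r = f - y                    # residual of the loss (1/2n) ||f - y||^2
        # Gradient of the loss with respect to W (relu' = indicator H > 0).
        G = ((r[:, None] * a[None, :]) * (H > 0)).T @ X * (alpha / (n * np.sqrt(m)))
        W -= lr * G
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

small, large = train(1.0), train(100.0)
print(small, large)  # relative weight movement shrinks as alpha grows
```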

May 11, 13

Generalization in neural tangent and random features models.

  1. Mei, S. and Montanari, A., 2019. The generalization error of random features regression: Precise asymptotics and double descent curve. Communications on Pure and Applied Mathematics (to appear).

  2. Montanari, A. and Zhong, Y., 2020. The interpolation phase transition in neural networks: Memorization and generalization under lazy training. arXiv preprint arXiv:2007.12826.

  3. Mei, S., Misiakiewicz, T. and Montanari, A., 2021. Generalization error of random features and kernel methods: hypercontractivity and kernel matrix concentration. arXiv preprint arXiv:2101.10588.
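A bare-bones random features regression in the spirit of these papers: random first-layer weights are drawn once and frozen, and only the second-layer coefficients are fit by ridge regression. The feature map (ReLU), the target, the sizes, and the ridge parameter below are all arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

n, d, N = 100, 10, 500    # samples, input dimension, number of random features
lam = 1e-3                # ridge penalty (arbitrary)

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])       # illustrative nonlinear target

# Random first-layer weights, drawn once and kept fixed.
W = rng.normal(size=(d, N)) / np.sqrt(d)
Phi = np.maximum(X @ W, 0.0)          # ReLU random features

# Train only the second-layer coefficients via ridge regression.
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)

train_mse = np.mean((Phi @ a - y) ** 2)
print(train_mse)  # near zero here: with N > n the model can interpolate
```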

May 18, 20

Beyond the linear regime.

May 25, 27; June 1, 3

Discussion of projects.