Yiping Lu.

Ph.D. student

Institute for Computational and Mathematical Engineering
School of Engineering
Stanford University

Email: yplu [at] stanford [dot] edu

Continuous Depth Neural Network

Overview

To understand and improve the success of deep Residual Networks, my research takes the depth of a Residual Network to infinity, which yields an ODE model.
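To make the depth-to-infinity limit concrete, here is a minimal sketch. It assumes a toy residual branch `f(x) = -x` and a step size `h = T / depth`, both chosen only for illustration. A residual block `x + h * f(x)` is one forward Euler step of `dx/dt = f(x)`, so stacking more blocks approaches the ODE solution.

```python
import numpy as np


def resnet_forward(x0, f, depth, T=1.0):
    """Apply `depth` residual blocks x <- x + h * f(x) with step h = T / depth.

    This is exactly the forward Euler scheme for dx/dt = f(x) on [0, T].
    """
    h = T / depth
    x = np.asarray(x0, dtype=float)
    for _ in range(depth):
        x = x + h * f(x)
    return x


# Toy residual branch f(x) = -x (an assumption for illustration); the ODE
# dx/dt = -x has the exact solution x(1) = exp(-1) * x0.
def f(x):
    return -x


x0 = np.array([1.0, 2.0])
exact = np.exp(-1.0) * x0
errors = {d: float(np.max(np.abs(resnet_forward(x0, f, d) - exact)))
          for d in (4, 16, 64, 256)}
print(errors)  # the gap shrinks roughly like 1 / depth
```

The error decays at the first-order rate of forward Euler; other numerical schemes correspond to other block designs.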

Deep learning can be formulated as a discrete-time optimal control problem, and our research bridges optimal control and the optimization of deep neural networks.
Using tools from numerical analysis, one can design a neural network that enjoys a desired property. Examples include:
- Enforcing stability to obtain a robust model.
- Enforcing the model to follow an optimal transport path.
- Enforcing physical constraints.
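One way to enforce stability by construction is to parameterize the weight matrix as `W - W^T - gamma I`. This idea is borrowed here from AntisymmetricRNN (Chang et al., ICLR 2019, listed among the related works on this page), not from our own papers. The sketch uses illustrative names (`antisymmetric_step`, `gamma`, `eps`) and checks the key numerical-analysis fact: every eigenvalue of that matrix has real part exactly `-gamma`. As a result, the linearized continuous dynamics neither blow up nor decay quickly.

```python
import numpy as np


def antisymmetric_step(h, x, W, V, b, gamma=0.01, eps=0.1):
    """One residual (forward Euler) step of an AntisymmetricRNN-style cell:
    h <- h + eps * tanh((W - W^T - gamma I) h + V x + b)."""
    A = W - W.T - gamma * np.eye(W.shape[0])
    return h + eps * np.tanh(A @ h + V @ x + b)


rng = np.random.default_rng(0)
n, d, gamma = 8, 3, 0.01
W = rng.standard_normal((n, n))
V = rng.standard_normal((n, d))
b = np.zeros(n)

# W - W^T is antisymmetric, so its eigenvalues are purely imaginary; the
# damping term shifts every real part to exactly -gamma.
real_parts = np.linalg.eigvals(W - W.T - gamma * np.eye(n)).real

# Run the cell on a random input sequence.
h = np.zeros(n)
for _ in range(100):
    h = antisymmetric_step(h, rng.standard_normal(d), W, V, b, gamma=gamma)
print(real_parts, np.linalg.norm(h))
```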

Optimization

How Continuous-Depth Models Help Us Understand the Optimization of Deep Networks

Using the ODE model, one can exploit the structure of neural networks when analyzing their optimization. This line of work is based on the Mean-field ResNet paper published at ICML 2020 and the YOPO paper published at NeurIPS 2019.

Theory

Yiping Lu*, Chao Ma, Yulong Lu, Jianfeng Lu, Lexing Ying. "A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth" Thirty-seventh International Conference on Machine Learning (ICML), 2020

Short version presented at ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations. (Oral)

[paper] [arXiv] [slide] [video]

By combining the ODE model with the mean-field analysis of two-layer neural networks, we provide a convergence proof for training ResNets beyond the lazy-training regime. This is the first landscape result for deep neural networks in the mean-field regime.

A full analysis of the Wasserstein gradient flow is still an open problem.
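Schematically, the mean-field view works as follows. A ResNet whose L layers each average M residual units is replaced by an ODE driven by a parameter distribution rho(theta, t) for each layer t. Training then becomes a Wasserstein gradient flow of rho in training time s. The notation here (f, rho, ell, s) is chosen for illustration and may differ from the paper's, which states the precise assumptions.

```latex
% Discrete ResNet: layer l averages M residual units with parameters theta_{l,m}
x_{l+1} = x_l + \frac{1}{LM}\sum_{m=1}^{M} f\bigl(x_l, \theta_{l,m}\bigr), \qquad l = 0, \dots, L-1

% Mean-field limit (L, M \to \infty): continuous depth t \in [0, 1]
\frac{\mathrm{d}X(t)}{\mathrm{d}t} = \int f\bigl(X(t), \theta\bigr)\, \rho(\theta, t)\, \mathrm{d}\theta, \qquad X(0) = x

% Loss and its Wasserstein gradient flow in training time s
E(\rho) = \mathbb{E}_{(x, y)}\Bigl[\ell\bigl(X_\rho(1; x), y\bigr)\Bigr], \qquad
\partial_s \rho = \nabla_\theta \cdot \Bigl(\rho\, \nabla_\theta \frac{\delta E}{\delta \rho}\Bigr)
```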

Algorithm Design

Dinghuai Zhang*, Tianyuan Zhang*, Yiping Lu*, Zhanxing Zhu, Bin Dong. "You Only Propagate Once: Painless Adversarial Training Using Maximal Principle." (*equal contribution) 33rd Annual Conference on Neural Information Processing Systems (NeurIPS), 2019.

[paper] [arXiv] [slide] [code] [poster]

The ODE view can help accelerate adversarial training, so adversarial training need not consume excessive computational resources. We fully exploit the structure of deep neural networks by recasting adversarial training for neural networks as a differential game. We then propose a novel strategy that decouples the adversary update from gradient backpropagation.

Roughly 4-5 times faster than PGD-based adversarial training, with comparable robust accuracy.
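Here is a simplified sketch of the decoupling idea. It assumes a hypothetical two-layer toy network (`W0`, `W1`) and a sign-gradient adversary in an l_inf ball of radius `eps`. The costly backpropagation through the deep part runs once to get the adjoint `p` at the first layer's output. The perturbation `eta` is then updated several times through the first layer alone. This is not the released YOPO code: the simultaneous weight update and the outer loop that refreshes `p` are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, n_cls = 4, 6, 3
W0 = 0.5 * rng.standard_normal((d_hid, d_in))   # first layer: the only layer the adversary touches
W1 = 0.5 * rng.standard_normal((n_cls, d_hid))  # stands in for "the rest of the network"
x, label = rng.standard_normal(d_in), 1


def first_layer(z):
    return np.tanh(W0 @ z)


def adjoint_at_first_layer(z1):
    """dL/dz1 for softmax cross-entropy: the expensive backprop through the deep part."""
    logits = W1 @ z1
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    probs[label] -= 1.0            # dL/dlogits = softmax - one_hot
    return W1.T @ probs


def full_grad_eta(eta):
    """Exact dL/deta, which costs one full backward pass."""
    z1 = first_layer(x + eta)
    return W0.T @ (adjoint_at_first_layer(z1) * (1.0 - z1 ** 2))


def yopo_inner(eta, p, n_steps, alpha, eps):
    """n_steps sign-gradient ascent steps on eta with the adjoint p frozen.

    Only the cheap first layer is re-evaluated; no full backprop is repeated.
    """
    for _ in range(n_steps):
        z1 = first_layer(x + eta)
        g = W0.T @ (p * (1.0 - z1 ** 2))   # chain rule through the first layer only
        eta = np.clip(eta + alpha * np.sign(g), -eps, eps)
    return eta


eta0 = np.zeros(d_in)
p = adjoint_at_first_layer(first_layer(x + eta0))   # one full backprop
eta = yopo_inner(eta0, p, n_steps=5, alpha=0.01, eps=0.03)
print(eta)
```

At the starting point the frozen-adjoint gradient coincides with the exact one. Later inner steps trade exactness for skipping the expensive backward pass.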

Related Works

Here are the related papers by other groups.

- Li Q, Hao S. An optimal control approach to deep learning and applications to discrete-weight neural networks. ICML 2018.
- Liu G H, Chen T, Theodorou E A. Differential Dynamic Programming Neural Optimizer. arXiv preprint arXiv:2002.08809, 2020.
- Li X, Wong T K L, Chen R T Q, et al. Scalable Gradients for Stochastic Differential Equations. AISTATS 2020.
- Chen R T Q, Rubanova Y, Bettencourt J, et al. Neural ordinary differential equations. NeurIPS 2018.
- Chang B, Meng L, Haber E, et al. Multi-level residual networks from dynamical systems view. ICLR 2018.
- Günther S, Ruthotto L, Schroder J B, et al. Layer-parallel training of deep residual neural networks. SIMODS 2020.

Principled DL Model Design

Our Finding:

1. Network Structure = Numerical Schemes

Yiping Lu, Aoxiao Zhong, Quanzheng Li, Bin Dong. "Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations." Thirty-fifth International Conference on Machine Learning (ICML), 2018.

[paper] [arXiv] [project page] [slide] [bibtex] [poster]
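The ICML 2018 paper reads architectures as numerical schemes. For example, its linear multistep architecture uses the update `u_{n+1} = (1 - k_n) u_n + k_n u_{n-1} + f_n(u_n)` with a trainable scalar `k_n`. The forward-pass sketch uses random `tanh` branches and fixed `k_n` values in place of trained ones (both assumptions). It shows that `k_n = 0` collapses the architecture back to a ResNet, i.e. forward Euler.

```python
import numpy as np


def lm_forward(x0, blocks, ks):
    """Linear-multistep forward pass u_{n+1} = (1 - k_n) u_n + k_n u_{n-1} + f_n(u_n).

    Assumption for the first step: u_{-1} = u_0, which makes it a plain residual step.
    """
    prev, cur = x0, x0
    for f, k in zip(blocks, ks):
        prev, cur = cur, (1.0 - k) * cur + k * prev + f(cur)
    return cur


def resnet_forward(x0, blocks):
    """Plain ResNet u_{n+1} = u_n + f_n(u_n), i.e. forward Euler."""
    u = x0
    for f in blocks:
        u = u + f(u)
    return u


rng = np.random.default_rng(0)
Ws = [0.1 * rng.standard_normal((4, 4)) for _ in range(6)]
blocks = [lambda u, W=W: np.tanh(W @ u) for W in Ws]   # hypothetical residual branches
x0 = rng.standard_normal(4)

out_lm = lm_forward(x0, blocks, ks=[0.3] * 6)
out_res = lm_forward(x0, blocks, ks=[0.0] * 6)  # k_n = 0 recovers the ResNet
```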

Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Liwei Wang, Tie-Yan Liu. "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View." (*equal contribution) Submitted. arXiv preprint arXiv:1906.02762

[paper] [arXiv] [slide] [code]
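In the multi-particle view, each token is a particle. Self-attention moves particles through their interactions, while the feed-forward network (FFN) acts on each particle separately. A standard Transformer layer (attention, then FFN) corresponds to a Lie-Trotter splitting step. The paper's Macaron Net instead uses a Strang-Marchuk splitting: a half FFN step, attention, then another half FFN step. The toy single-head attention and `tanh` FFN are assumptions that keep the sketch self-contained; layer normalization and multi-head structure are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 5, 4
Wq, Wk, Wv = (0.3 * rng.standard_normal((d, d)) for _ in range(3))
W1 = 0.3 * rng.standard_normal((d, d))


def attention(X):
    """Interaction term: each token (particle) is moved by all others (single head, no mask)."""
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return A @ (X @ Wv)


def ffn(X):
    """Per-particle term: acts on each token independently."""
    return np.tanh(X @ W1)


def lie_step(X, h=1.0):
    """Lie-Trotter splitting: full attention sub-step, then full FFN sub-step."""
    X = X + h * attention(X)
    return X + h * ffn(X)


def strang_step(X, h=1.0):
    """Strang-Marchuk splitting (Macaron layout): half FFN, attention, half FFN."""
    X = X + 0.5 * h * ffn(X)
    X = X + h * attention(X)
    return X + 0.5 * h * ffn(X)


X0 = rng.standard_normal((n_tokens, d))
out_lie, out_strang = lie_step(X0), strang_step(X0)
```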

2. Networks Should Adapt to Task Physics

We use inverse problems as our application, where the goal is to recover data from different levels of degradation. We proposed an approach that follows the regularization path. This path is governed by a time-varying ODE, so its discretization should be a depth-varying network.
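The regularization-path idea can be illustrated, under simplifying assumptions, with a generic ridge-regularized linear inverse problem rather than the model from our paper. The path `x(lam) = argmin |Ax - y|^2 + lam |x|^2` satisfies the ODE `dx/dlam = -(A^T A + lam I)^{-1} x`. This follows from differentiating `(A^T A + lam I) x = A^T y` with respect to `lam`. Discretizing this ODE gives a network whose depth depends on how far along the path the task needs to go. `A`, `y`, and the step counts below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # hypothetical degradation operator
y = rng.standard_normal(20)        # hypothetical observations
AtA, Aty, eye = A.T @ A, A.T @ y, np.eye(5)


def ridge(lam):
    """Closed-form point on the path x(lam) = argmin |Ax - y|^2 + lam |x|^2."""
    return np.linalg.solve(AtA + lam * eye, Aty)


def path_by_euler(lam0, lam1, n_steps):
    """Follow dx/dlam = -(A^T A + lam I)^{-1} x with forward Euler.

    n_steps plays the role of network depth.
    """
    h = (lam1 - lam0) / n_steps
    lam, x = lam0, ridge(lam0)
    for _ in range(n_steps):
        x = x - h * np.linalg.solve(AtA + lam * eye, x)
        lam += h
    return x


err_coarse = np.max(np.abs(path_by_euler(0.1, 5.0, 10) - ridge(5.0)))
err_fine = np.max(np.abs(path_by_euler(0.1, 5.0, 1000) - ridge(5.0)))
print(err_coarse, err_fine)
```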

Related Works

Here are the related papers by other groups.

- Behrmann J, Grathwohl W, Chen R T Q, et al. Invertible residual networks. ICML 2019.
- Zhang L, Wang L. Monge-Ampère flow for generative modeling. arXiv preprint arXiv:1809.10188, 2018.
- Finlay C, Jacobsen J H, Nurbekyan L, et al. How to train your neural ODE. ICML 2020.
- Tong A, Huang J, Wolf G, et al. TrajectoryNet: A Dynamic Optimal Transport Network for Modeling Cellular Dynamics. arXiv preprint arXiv:2002.04461, 2020.
- Zhang J, Han B, Wynter L, et al. Towards robust ResNet: A small step but a giant leap. IJCAI 2019.
- Yang Z, Liu Y, Bao C, et al. Interpolation between Residual and Non-Residual Networks. ICML 2020.
- Li M, He L, Lin Z. Implicit Euler Skip Connections: Enhancing Adversarial Robustness via Numerical Stability. ICML 2020.
- Chang B, Chen M, Haber E, et al. AntisymmetricRNN: A dynamical system view on recurrent neural networks. ICLR 2019.
- Kag A, Zhang Z, Saligrama V. RNNs Incrementally Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients? ICLR 2019.
- Chen Z, Zhang J, Arjovsky M, et al. Symplectic recurrent neural networks. ICLR 2019.
- De Brouwer E, Simm J, Arany A, et al. GRU-ODE-Bayes: Continuous modeling of sporadically-observed time series. NeurIPS 2019.
- Rubanova Y, Chen R T Q, Duvenaud D K. Latent ordinary differential equations for irregularly-sampled time series. NeurIPS 2019.
- Cranmer M, Greydanus S, Hoyer S, et al. Lagrangian neural networks. arXiv preprint arXiv:2003.04630, 2020.

Physics Applications

We made an initial attempt to learn evolution PDEs from data using neural networks.

Inspired by the latest development of neural network designs in deep learning, we propose a new feed-forward deep network, called PDE-Net, to fulfill two objectives at the same time: to accurately predict dynamics of complex systems and to uncover the underlying hidden PDE models. The basic idea of the proposed PDE-Net is to learn differential operators by learning convolution kernels (filters), and apply neural networks or other machine learning methods to approximate the unknown nonlinear responses.

Zichao Long*, Yiping Lu*, Xianzhong Ma*, Bin Dong. "PDE-Net: Learning PDEs From Data." (*equal contribution) Thirty-fifth International Conference on Machine Learning (ICML), 2018.

[paper] [arXiv] [code] [supplementary materials] [bibtex]
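The link between convolution filters and differential operators can be checked with moment matrices. For a 3x3 filter `q` indexed by offsets `(k, l)`, define `M[i, j] = sum_{k,l} k^i l^j q[k, l] / (i! j!)`. A Taylor expansion shows that applying `q` on a grid with spacing `delta` approximates `sum_{i,j} M[i, j] delta^(i+j) d^(i+j)u / dx^i dy^j`. PDE-Net constrains such moments so that learned filters approximate prescribed derivatives. The sketch makes two assumptions: a hand-picked central-difference filter and periodic boundaries. It confirms that `M` has a single nonzero entry, `M[1, 0] = 1`, and that the filter approximates `d/dx`.

```python
import numpy as np
from math import factorial

offsets = (-1, 0, 1)


def moment_matrix(q, order=3):
    """M[i, j] = 1/(i! j!) * sum_{k,l} k^i l^j q[k, l] for a 3x3 filter indexed by offsets."""
    M = np.zeros((order, order))
    for i in range(order):
        for j in range(order):
            total = sum(k ** i * l ** j * q[k + 1, l + 1] for k in offsets for l in offsets)
            M[i, j] = total / (factorial(i) * factorial(j))
    return M


def apply_filter(q, u, delta):
    """(q * u)[i, j] = sum_{k,l} q[k, l] u[i + k, j + l] / delta, with periodic boundaries."""
    out = np.zeros_like(u)
    for k in offsets:
        for l in offsets:
            out += q[k + 1, l + 1] * np.roll(u, shift=(-k, -l), axis=(0, 1))
    return out / delta


# Central-difference filter for d/dx (axis 0 = x offset k, axis 1 = y offset l).
q = np.zeros((3, 3))
q[2, 1], q[0, 1] = 0.5, -0.5
M = moment_matrix(q)

N = 64
delta = 2 * np.pi / N
xs = np.arange(N) * delta
u = np.sin(xs)[:, None] * np.ones((1, N))      # u(x, y) = sin(x)
err = np.max(np.abs(apply_filter(q, u, delta) - np.cos(xs)[:, None]))
print(M, err)   # err is second order in delta
```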

Zichao Long, Yiping Lu, Bin Dong. "PDE-Net 2.0: Learning PDEs from Data with A Numeric-Symbolic Hybrid Deep Network." Journal of Computational Physics, 399, 108925, 2019. (arXiv preprint arXiv:1812.04426)

[paper] [arXiv] [code] [slide] [proceeding]
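PDE-Net 2.0 pairs the learned differential operators with a symbolic network (SymNet) for the nonlinear response. The sketch is an illustrative reconstruction of that idea in simplified form, not the released code. Each hidden layer forms two affine combinations of the current features and appends their product, so polynomial terms such as `u * u_x` can be represented exactly. The feature ordering, the weights, and `nu` are hypothetical. They are set by hand to reproduce a Burgers'-type right-hand side `nu * u_xx - u * u_x` instead of being learned.

```python
import numpy as np


def symnet(features, layers, w_out, b_out):
    """SymNet-style forward pass (simplified, illustrative).

    Each hidden layer maps the current feature vector z to two scalars (a, b)
    by an affine map and appends a * b to z; a final affine map gives the output.
    """
    z = np.asarray(features, dtype=float)
    for W, c in layers:            # W has shape (2, len(z)), c has shape (2,)
        a, b = W @ z + c
        z = np.append(z, a * b)
    return float(w_out @ z + b_out)


# Features (u, u_x, u_xx) at one grid point.
nu = 0.05
W_hidden = np.array([[1.0, 0.0, 0.0],     # a = u
                     [0.0, 1.0, 0.0]])    # b = u_x
layers = [(W_hidden, np.zeros(2))]
w_out = np.array([0.0, 0.0, nu, -1.0])    # nu * u_xx - (u * u_x)

u, ux, uxx = 0.7, -1.2, 0.4
out = symnet([u, ux, uxx], layers, w_out, 0.0)
print(out)
```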

Related Works

- Raissi M, Karniadakis G E. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 2018, 357: 125-141.
- Brunton S L, Proctor J L, Kutz J N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 2016, 113(15): 3932-3937.
- Han J, Jentzen A, E W. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 2018, 115(34): 8505-8510.

Contact Me

Stanford, CA, US

Phone: +86 18001847803

Email: yplu@stanford.edu
