Stanislav Fort

I am currently a PhD student at Stanford University, working with Prof. Surya Ganguli at the Neural Dynamics and Computation Lab, and an OpenPhil AI Fellow. Before that, I was a Google AI Resident working on understanding deep learning, and a Research Scientist Intern at DeepMind working on scaling up theoretical insights.

During my year at Google Research, I started and led 5 research projects collaborating with scientists from Google Brain and DeepMind.

My research spans machine learning, AI, and physics. I focus on 1) developing the Science of Deep Learning (a principled, scientific understanding of deep learning), and 2) applying ML to quantum and astrophysics problems. I am excited about applications of artificial intelligence and machine learning in physics, emergent phenomena, and the role of complexity and open-endedness.

I completed my Bachelor's and Master's (Part III of the Tripos) at Trinity College, University of Cambridge, and another Master's at Stanford University.

In the past, I worked at the Institute of Astronomy on galaxy clusters in X-ray, the Albert Einstein Institute on large-scale data mining for pulsar discovery, the Perimeter Institute for Theoretical Physics on perturbative approaches to black hole formation in AdS-like geometries, and DAMTP on cross-correlations of gamma rays and the CMB in the sky.

I actively co-organize and coach at the Czech Astronomy Olympiad, setting problems and preparing students for the IOAA (where I ranked #1 worldwide in 2011). I sometimes lecture at the Czech Physics Olympiad and prepare students for IPhO. I co-organized the 1st and 2nd International Workshop on Astronomy and Astrophysics in Estonia and the Czech Republic. I am also an amateur astrophotographer.

On top of my research, I work on a number of side projects in mathematics, physics, and CS.

Twitter  /  Google Scholar /  GitHub  /  LinkedIn /  Blog

Research

I'm interested in emergence, AI, and physics. My current focus is on 1) (empirical) theories of deep learning & deep learning understanding, and 2) applying deep learning methods to the physical sciences, especially astrophysics and quantum. I'm especially keen on neural network scaling & its benefits.

18. A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection
Jie Ren, Stanislav Fort, Jeremiah Liu, Abhijit Guha Roy, Shreyas Padhy, Balaji Lakshminarayanan

We analyze the failure modes of the Mahalanobis distance method for near-OOD detection and propose a simple fix called relative Mahalanobis distance (RMD) which improves performance and is more robust to hyperparameter choice.

17. Exploring the Limits of Out-of-Distribution Detection
Stanislav Fort, Jie Ren, Balaji Lakshminarayanan

We improve the AUROC from 85% (current SOTA) to more than 96% using Vision Transformers pre-trained on ImageNet-21k. On a challenging genomics OOD detection benchmark, we improve the AUROC from 66% to 77% using transformers and unsupervised pre-training. For multi-modal image-text pre-trained transformers such as CLIP, we explore a new way of using just the names of outlier classes as a sole source of information without any accompanying images.

16. Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error
Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith

By applying augmentation multiplicity to the recently proposed NFNet model family, we achieve a new ImageNet SotA of 86.8% top-1 accuracy without extra data after just 34 epochs of training with an NFNet-F5 using the SAM optimizer.

15. Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

Linear interpolation from initial to final neural net params typically decreases the loss monotonically. We investigate this phenomenon empirically and theoretically.

14. Identifying charged particle background events in X-ray imaging detectors with novel machine learning algorithms
D. R. Wilkins, S. W. Allen, E. D. Miller, M. Bautz, T. Chattopadhyay, S. Fort, C. E. Grant, S. Herrmann, R. Kraft, R. G. Morris, P. Nulsen

Using machine learning algorithms to identify background (noise) charged particles in X-ray imaging detectors, with a particular emphasis on the proposed Athena X-ray observatory's WFI science products module.

Accepted for publication in the Proceedings of the SPIE, Astronomical Telescopes and Instrumentation, Space Telescopes and Instrumentation 2020: Ultraviolet to Gamma Ray.

13. Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli

We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures characterizing loss landscape geometry and NTK dynamics.

Accepted for publication at NeurIPS 2020 in Vancouver as a poster.

12. Training independent subnetworks for robust prediction
Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran

Using a multi-input multi-output (MIMO) configuration, we can utilize a single model's capacity to train multiple subnetworks that independently learn the task at hand.

11. The Break-Even Point on the Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras

In the early phase of training of deep neural networks there exists a "break-even point" which determines properties of the entire optimization trajectory.

Accepted as a spotlight talk at the International Conference on Learning Representations 2020 (ICLR) in Addis Ababa, Ethiopia.

10. Deep Ensembles: A Loss Landscape Perspective
Stanislav Fort, Huiyi Hu, Balaji Lakshminarayanan

Exploring the consequences of the neural network loss landscape structure for ensembling, Bayesian methods, and calibration.

Accepted as a contributed talk at the Bayesian Deep Learning workshop at NeurIPS 2019 in Vancouver.

9. Emergent properties of the local geometry of neural loss landscapes
Stanislav Fort, Surya Ganguli

By modelling logit gradient clustering and the effect of training as logit scale growth, we constructed a simple analytical model of the gradient and Hessian of neural networks in classification problems. From this minimal model, we successfully recovered 4 previously observed surprising empirical phenomena related to the local structure of neural network loss landscapes, demonstrating that their origin is likely very generic in nature and not specific to the natural data distributions, neural networks, or gradient descent, as previously conjectured.

8. Large Scale Structure of Neural Network Loss Landscapes
Stanislav Fort, Stanislaw Jastrzebski

Building a unified phenomenological model of the low-loss manifold in neural network loss landscapes that incorporates 1) mode connectivity, 2) the surprising ease of optimizing on low-dimensional cuts through the weight space, and 3) the existence of long directions in the loss landscape into a single model. Using this model, we made new predictions about the loss landscape and verified them empirically.

Accepted for publication at NeurIPS 2019 in Vancouver as a poster.

A subset accepted at the Understanding and Improving Generalization in Deep Learning workshop at ICML 2019 as a spotlight talk and a poster, and at the Theoretical Physics for Deep Learning workshop at ICML 2019 as a poster. I also delivered invited talks at Uber AI Labs and Google Brain.

7. Stiffness: A New Perspective on Generalization in Neural Networks
Stanislav Fort, Paweł Krzysztof Nowak, Stanislaw Jastrzebski, Srini Narayanan

We defined the concept of stiffness, showed its utility in providing a perspective to better understand generalization in neural networks, observed its variation with learning rate, and defined the concept of dynamical critical length using it.

6. Adaptive Quantum State Tomography with Neural Networks
Stanislav Fort (equal contributions), Yihui Quek (equal contributions), Hui Khoon Ng

Learning to learn about quantum states using neural networks, swarm optimization and particle filters. We develop a new algorithm for quantum state tomography that learns to perform the state reconstruction directly from data and achieves orders of magnitude computational speedup while retaining state-of-the-art reconstruction accuracy.

A subset accepted at the 4th Seefeld Workshop on Quantum Information, the 22nd Annual Conference on Quantum Information Processing (QIP 2019) as a poster, the 3rd Quantum Techniques in Machine Learning 2019 (QTML) in Korea as a talk, and the McGill Physics-AI conference in Montreal as a talk.

5. The Goldilocks zone: Towards better understanding of neural network loss landscapes
Stanislav Fort, Adam Scherlis

A connection between optimization on random low-dimensional hypersurfaces and local convexity in the neural network loss landscape.

Accepted for publication at AAAI 2019 in Hawaii as an oral presentation and a poster.

A subset accepted at the Modern Trends in Nonconvex Optimization for Machine Learning workshop at ICML 2018 and BayLearn 2018 as "The Goldilocks zone: Empirical exploration of the structure of the neural network loss landscapes". Accepted as an oral presentation at the Theoretical Physics for Machine Learning Aspen winter conference.

4. The ATHENA WFI science products module
David N Burrows, Steven Allen, Marshall Bautz, Esra Bulbul, Julia Erdley, Abraham D Falcone, Stanislav Fort, Catherine E Grant, Sven Herrmann, Jamie Kennea, Robert Klar, Ralph Kraft, Adam Mantz, Eric D Miller, Paul Nulsen, Steve Persyn, Pragati Pradhan, Dan Wilkins

A paper on the proposed Athena X-ray observatory's WFI science products module. My part involved exploring the use of AI techniques on board.

Published in Proceedings Volume 10699, Space Telescopes and Instrumentation 2018: Ultraviolet to Gamma Ray.

3. Towards understanding feedback from supermassive black holes using convolutional neural networks
Stanislav Fort

A novel approach to detection of X-ray cavities in clusters of galaxies using convolutional neural architectures.

Accepted at the Deep Learning for Physical Sciences workshop at NIPS 2017.

2. Gaussian Prototypical Networks for Few-Shot Learning on Omniglot
Stanislav Fort

An architecture capable of dealing with uncertainties for few-shot learning on the Omniglot dataset.

Accepted and presented at BayLearn 2017.
Accepted at the Bayesian Deep Learning workshop at NIPS 2017.

Essential code available on GitHub.

1. Discovery of Gamma-ray Pulsations from the Transitional Redback PSR J1227-4853
T. J. Johnson, P. S. Ray, J. Roy, C. C. Cheung, A. K. Harding, H. J. Pletsch, S. Fort, F. Camilo, J. Deneva, B. Bhattacharyya, B. W. Stappers, M. Kerr

A pulsar detection in gamma-ray.

Class projects

At Stanford, I worked on the following class projects:

Fun side projects

I work on a number of side projects and fun problems in mathematics, physics, and CS. Some of them are shown here.

Drawing an envelope/barn without lifting one's pen - all 88 (44 unique and their mirrors) solutions at once.

