Stanislav Fort
I am currently a PhD student at Stanford University with Prof. Surya Ganguli at the Neural Dynamics and Computation Lab and an OpenPhil AI Fellow. Before that, I was a Google AI Resident working on understanding deep learning, and a Research Scientist Intern at DeepMind working on scaling up theoretical insights.
During my year at Google Research, I started and led 5 research projects collaborating with scientists from Google Brain and DeepMind.
My research spans machine learning, AI, and physics. I focus on 1) developing the Science of Deep Learning (a principled, scientific understanding of deep learning), and 2) applying ML to quantum and astrophysics problems. I am excited about applications of artificial intelligence and machine learning in physics, emergent phenomena, and the role of complexity and open-endedness.
I completed my Bachelor's and Master's (Part III of the Tripos) at Trinity College, University of Cambridge, and another Master's at Stanford University.
In the past I worked at the Institute of Astronomy on galaxy clusters in X-ray, the Albert Einstein Institute on large-scale data mining for pulsar discovery, the Perimeter Institute for Theoretical Physics on perturbative approaches to black hole formation in AdS-like geometries, and DAMTP on cross-correlations of gamma rays and the CMB in the sky.
I actively co-organize and coach at the Czech Astronomy Olympiad, setting problems and preparing students for the IOAA (where I ranked #1 worldwide in 2011). I sometimes lecture at the Czech Physics Olympiad and prepare students for IPhO. I co-organized the 1st and 2nd International Workshop on Astronomy and Astrophysics in Estonia and the Czech Republic. I am also an amateur astrophotographer.
On top of my research, I work on a number of side projects in mathematics, physics, and CS.
Twitter / Google Scholar / GitHub / LinkedIn / Blog


Research
I'm interested in emergence, AI, and physics. My current focus is on 1) (empirical) theories of deep learning & deep learning understanding, and 2) applying deep learning methods to the physical sciences, especially astrophysics and quantum. I'm especially keen on neural network scaling & its benefits.


18. A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection
Jie Ren, Stanislav Fort, Jeremiah Liu, Abhijit Guha Roy, Shreyas Padhy, Balaji Lakshminarayanan
We analyze the failure modes of the Mahalanobis distance method for near-OOD detection and propose a simple fix called relative Mahalanobis distance (RMD) which improves performance and is more robust to hyperparameter choice.


17. Exploring the Limits of Out-of-Distribution Detection
Stanislav Fort, Jie Ren, Balaji Lakshminarayanan
We improve the AUROC from 85% (current SOTA) to more than 96% using Vision Transformers pretrained on ImageNet-21k. On a challenging genomics OOD detection benchmark, we improve the AUROC from 66% to 77% using transformers and unsupervised pretraining. For multimodal image-text pretrained transformers such as CLIP, we explore a new way of using just the names of outlier classes as a sole source of information without any accompanying images.


16. Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error
Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith
By applying augmentation multiplicity to the recently proposed NFNet model family, we achieve a new ImageNet SotA of 86.8% top-1 accuracy without extra data after just 34 epochs of training with an NFNet-F5 using the SAM optimizer.


15. Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse
Linear interpolation from initial to final neural net params typically decreases the loss monotonically. We investigate this phenomenon empirically and theoretically.


14. Identifying charged particle background events in X-ray imaging detectors with novel machine learning algorithms
D. R. Wilkins, S. W. Allen, E. D. Miller, M. Bautz, T. Chattopadhyay, S. Fort, C. E. Grant, S. Herrmann, R. Kraft, R. G. Morris, P. Nulsen
Using machine learning algorithms to identify background (noise) charged particles in X-ray imaging detectors, with a particular emphasis on the proposed Athena X-ray observatory's WFI science products module.
Accepted for publication in Proceedings of the SPIE, Astronomical Telescopes and Instrumentation, Space Telescopes and Instrumentation 2020: Ultraviolet to Gamma Ray.


13. Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli
We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures characterizing loss landscape geometry and NTK dynamics.
Accepted for publication at NeurIPS 2020 in Vancouver as a poster.


12. Training independent subnetworks for robust prediction
Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran
Using a multi-input multi-output (MIMO) configuration, we can utilize a single model's capacity to train multiple subnetworks that independently learn the task at hand.


11. The Break-Even Point on the Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras
In the early phase of training of deep neural networks there exists a "break-even point" which determines properties of the entire optimization trajectory.
Accepted as a spotlight talk at the International Conference on Learning Representations 2020 (ICLR) in Addis Ababa, Ethiopia.


10. Deep Ensembles: A Loss Landscape Perspective
Stanislav Fort, Huiyi Hu, Balaji Lakshminarayanan
Exploring the consequences of the neural network loss landscape structure for ensembling, Bayesian methods, and calibration.
Accepted as a contributed talk at Bayesian Deep Learning workshop at NeurIPS 2019 in Vancouver.


9. Emergent properties of the local geometry of neural loss landscapes
Stanislav Fort, Surya Ganguli
By modelling logit gradient clustering and the effect of training as logit scale growth, we constructed a simple analytical model of the gradient and Hessian of neural networks in classification problems. From this minimal model, we successfully recovered 4 previously observed surprising empirical phenomena related to the local structure of neural network loss landscapes, demonstrating that their origin is likely very generic in nature and not specific to the natural data distributions, neural networks, or gradient descent, as previously conjectured.


8. Large Scale Structure of Neural Network Loss Landscapes
Stanislav Fort, Stanislaw Jastrzebski
Building a unified phenomenological model of the low-loss manifold in neural network loss landscapes that incorporates 1) mode connectivity, 2) the surprising ease of optimizing on low-dimensional cuts through the weight space, and 3) the existence of long directions in the loss landscape. Using this model, we made new predictions about the loss landscape and verified them empirically.
Accepted for publication at NeurIPS 2019 in Vancouver as a poster.
A subset accepted at the Understanding and Improving Generalization in Deep Learning workshop at ICML 2019 as a spotlight talk and a poster, and at the Theoretical Physics for Deep Learning workshop at ICML 2019 as a poster. I also delivered invited talks at Uber AI Labs and Google Brain.


7. Stiffness: A New Perspective on Generalization in Neural Networks
Stanislav Fort, Paweł Krzysztof Nowak, Stanislaw Jastrzebski, Srini Narayanan
We defined the concept of stiffness, showed its utility in providing a perspective to better understand generalization in neural networks, observed its variation with learning rate, and defined the concept of dynamical critical length using it.


6. Adaptive Quantum State Tomography with Neural Networks
Stanislav Fort (equal contributions), Yihui Quek (equal contributions), Hui Khoon Ng
Learning to learn about quantum states using neural networks, swarm optimization and particle filters. We develop a new algorithm for quantum state tomography that learns to perform the state reconstruction directly from data and achieves orders of magnitude computational speedup while retaining state-of-the-art reconstruction accuracy.
A subset accepted at the 4th Seefeld Workshop on Quantum Information, 22nd Annual Conference on Quantum Information Processing (QIP 2019) as a poster, 3rd Quantum Techniques in Machine Learning 2019 (QTML) in Korea as a talk, and McGill PhysicsAI conference in Montreal as a talk.


5. The Goldilocks zone: Towards better understanding of neural network loss landscapes
Stanislav Fort, Adam Scherlis
A connection between optimization on random low-dimensional hypersurfaces and local convexity in the neural network loss landscape.
Accepted for publication at AAAI 2019 in Hawaii as an oral presentation and a poster.
A subset accepted at the Modern Trends in Nonconvex Optimization for Machine Learning workshop at ICML 2018 and BayLearn 2018 as "The Goldilocks zone: Empirical exploration of the structure of the neural network loss landscapes". Accepted as an oral presentation at the Theoretical Physics for Machine Learning Aspen winter conference.


4. The ATHENA WFI science products module
David N Burrows, Steven Allen, Marshall Bautz, Esra Bulbul, Julia Erdley, Abraham D Falcone, Stanislav Fort, Catherine E Grant, Sven Herrmann, Jamie Kennea, Robert Klar, Ralph Kraft, Adam Mantz, Eric D Miller, Paul Nulsen, Steve Persyn, Pragati Pradhan, Dan Wilkins
A paper on the proposed Athena X-ray observatory's WFI science products module. My part involved exploring the use of AI techniques on board.
Published in Proceedings Volume 10699, Space Telescopes and Instrumentation 2018: Ultraviolet to Gamma Ray.


3. Towards understanding feedback from supermassive black holes using convolutional neural networks
Stanislav Fort
A novel approach to detection of X-ray cavities in clusters of galaxies using convolutional neural architectures.
Accepted at the Deep Learning for Physical Sciences workshop at NIPS 2017.


2. Gaussian Prototypical Networks for Few-Shot Learning on Omniglot
Stanislav Fort
An architecture capable of dealing with uncertainties for few-shot learning on the Omniglot dataset.
Accepted and presented at BayLearn 2017. Accepted at the Bayesian Deep Learning workshop at NIPS 2017.
Essential code available on GitHub.


1. Discovery of Gamma-ray Pulsations from the Transitional Redback PSR J1227-4853
T. J. Johnson, P. S. Ray, J. Roy, C. C. Cheung, A. K. Harding, H. J. Pletsch, S. Fort, F. Camilo, J. Deneva, B. Bhattacharyya, B. W. Stappers, M. Kerr
A pulsar detection in gamma rays.

Class projects
At Stanford, I worked on the following class projects:

Fun side projects
I work on a number of side projects and fun problems in mathematics, physics, and CS. Some of them are shown here.


Drawing an envelope/barn without lifting one's pen: all 88 (44 unique and their mirrors) solutions at once.
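
The count can be checked by brute force. The sketch below is only an illustration, not the code behind the original project. It assumes the figure is the usual five-vertex house shape: a square, both of its diagonals, and a triangular roof, with the diagonals' crossing point not treated as a vertex. The vertex labels and function names are hypothetical. A drawing is then a walk that uses each of the 8 edges exactly once, and a depth-first search enumerates all such walks.

```python
# Assumed figure: a square with both diagonals and a triangular roof.
# Hypothetical labels: A, B = bottom corners; C, D = top corners; E = roof apex.
EDGES = [
    ("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"),  # sides of the square
    ("A", "C"), ("B", "D"),                          # diagonals
    ("D", "E"), ("E", "C"),                          # roof
]


def count_trails(vertex, used):
    """Count ways to finish the drawing from `vertex`.

    `used` is the frozenset of indices of edges already drawn.
    """
    if len(used) == len(EDGES):
        return 1  # every edge has been drawn exactly once
    total = 0
    for i, (u, v) in enumerate(EDGES):
        if i in used:
            continue
        # Follow the edge in whichever direction leaves the current vertex.
        if u == vertex:
            total += count_trails(v, used | {i})
        elif v == vertex:
            total += count_trails(u, used | {i})
    return total


def count_all_drawings():
    """Return the number of complete drawings for each starting vertex."""
    vertices = sorted({v for edge in EDGES for v in edge})
    return {v: count_trails(v, frozenset()) for v in vertices}


counts = count_all_drawings()
print(counts)
print(sum(counts.values()))
```

Under these assumptions the search finds 44 drawings starting from each bottom corner and none from the other vertices, 88 in total, which agrees with the count quoted above. Only the two bottom corners have an odd number of edges, so every drawing must start at one of them and end at the other.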

