Since their introduction in 2017, transformers have revolutionized Natural Language Processing (NLP). They are now finding applications across deep learning, from computer vision (CV) and reinforcement learning (RL) to Generative Adversarial Networks (GANs), speech, and even biology. Among other things, transformers have enabled the creation of powerful language models like GPT-3 and were instrumental in DeepMind's recent AlphaFold2, which tackles protein folding.

In this seminar, we examine the details of how transformers work and dive deep into the different kinds of transformers and how they are applied across fields. We do this through a combination of instructor lectures, guest lectures, and classroom discussions, inviting people at the forefront of transformers research in different domains as guest speakers.

Prerequisites: Basic knowledge of deep learning (you must understand attention), or having taken CS224N / CS231N / CS230.
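As a quick self-check on the attention prerequisite, the core operation of the transformer is scaled dot-product attention. The sketch below is illustrative only (variable names and shapes are our own, not from any course assignment):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # weighted average of the values

# Tiny example: 2 queries attending over 3 key/value pairs
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

If you can follow why the output is a convex combination of the rows of V, with mixing weights determined by query-key similarity, you have the background this seminar assumes.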

Please send all questions to and not to any other email.

Faculty Advisor



The bulk of this class will consist of talks from researchers discussing the latest breakthroughs with transformers and explaining how they apply them in their fields of research. The objective of the course is to bring together ideas on transformers from the ML, NLP, CV, biology, and other communities, understand their broad implications, and spark cross-collaborative research.

The current class schedule is below (subject to change).


Mon Sep 20: Introduction to Transformers
Recommended Readings:
  1. Attention Is All You Need
  2. The Illustrated Transformer
  3. The Annotated Transformer (Assignment)

Mon Sep 27: Transformers in Language: GPT-3, Codex
Speaker: Mark Chen (OpenAI)
Recommended Readings:
  1. Language Models are Few-Shot Learners
  2. Evaluating Large Language Models Trained on Code

Mon Oct 4: Applications in Vision
Speaker: Lucas Beyer (Google Brain)
Recommended Readings:
  1. An Image is Worth 16x16 Words (Vision Transformer)
Additional Readings:
  1. How to train your ViT?

Mon Oct 11: Transformers in RL & Universal Compute Engines
Speaker: Aditya Grover (FAIR)
Recommended Readings:
  1. Pretrained Transformers as Universal Computation Engines
  2. Decision Transformer: Reinforcement Learning via Sequence Modeling

Mon Oct 18: Scaling Transformers
Speaker: Barret Zoph (Google Brain)
Recommended Readings:
  1. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Mon Oct 25: Perceiver: Arbitrary IO with Transformers
Speaker: Andrew Jaegle (DeepMind)
Recommended Readings:
  1. Perceiver: General Perception with Iterative Attention
  2. Perceiver IO: A General Architecture for Structured Inputs & Outputs

Mon Nov 1: Self-Attention & Non-Parametric Transformers
Speaker: Aidan Gomez (University of Oxford)
Recommended Readings:
  1. Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

Mon Nov 8: GLOM: Representing Part-Whole Hierarchies in a Neural Network
Speaker: Geoffrey Hinton (University of Toronto)
Recommended Readings:
  1. How to represent part-whole hierarchies in a neural network

Mon Nov 15: Interpretability with Transformers
Speaker: Chris Olah (Anthropic)
Recommended Readings:
  1. Multimodal Neurons in Artificial Neural Networks
Additional Readings:
  1. The Building Blocks of Interpretability

Mon Nov 29: Class Discussion