Since their introduction in 2017, transformers have revolutionized Natural Language Processing (NLP). Transformers are now finding applications across deep learning, including computer vision (CV), reinforcement learning (RL), generative adversarial networks (GANs), speech, and even biology. Among other things, transformers have enabled the creation of powerful language models like GPT-3 and were instrumental in DeepMind's recent AlphaFold2, which tackles protein folding.

In this seminar, we examine the details of how transformers work, and dive deep into the different kinds of transformers and how they're applied in different fields. We do this through a combination of instructor lectures, guest lectures, and classroom discussions. We will invite people at the forefront of transformers research across different domains for guest lectures.

The bulk of this class will consist of talks from researchers discussing the latest breakthroughs with transformers and explaining how they apply them to their fields of research. The objective of the course is to bring together ideas on transformers from the ML, NLP, CV, biology, and other communities, understand their broad implications, and spark cross-collaborative research.


Faculty Advisor


The current class schedule is below (subject to change):

Date Description Course Materials
Jan 10 Introduction to Transformers
Speaker: Andrej Karpathy
Recommended Readings:
  1. Attention Is All You Need
  2. The Illustrated Transformer
  3. The Annotated Transformer
Additional Readings:
Jan 17 Language and Human Alignment
Speaker: Jan Leike (OpenAI)
Recommended Readings:
  1. ChatGPT
  2. InstructGPT
  3. Language Models are Few-Shot Learners (GPT-3)
Additional Readings:
Jan 24 Emergent Abilities and Scaling in LLMs
Speaker: Jason Wei (Google Brain)
Recommended Readings:
  1. Emergent Abilities of Large Language Models
  2. Chain of Thought Prompting Elicits Reasoning in Large Language Models
  3. Scaling Instruction-Finetuned Language Models
Additional Readings:
Jan 31 Strategic Games
Speaker: Noam Brown (FAIR)
Recommended Readings:
  1. Human-level play in the game of Diplomacy by combining language models with strategic reasoning
  2. Modeling Strong and Human-Like Gameplay with KL-Regularized Search
  3. No-Press Diplomacy from Scratch
Additional Readings:
Feb 7 Robotics and Imitation Learning
Speaker: Ted Xiao (Google Brain)
Recommended Readings:
  1. RT-1: Robotics Transformer for Real-World Control at Scale
  2. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
  3. Inner Monologue: Embodied Reasoning through Planning with Language Models
Additional Readings:
Feb 14 Common Sense Reasoning
Speaker: Yejin Choi (U. Washington / Allen Institute for AI)
Recommended Readings:
  1. Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
  2. Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
  3. Can Machines Learn Morality? The Delphi Experiment
Additional Readings:
Feb 21 Biomedical Transformers
Speaker: Vivek Natarajan (Google Health AI)
Recommended Readings:
  1. Large Language Models Encode Clinical Knowledge
  2. ProtNLM: Model-based Natural Language Protein Annotation
  3. Effective gene expression prediction from sequence by integrating long-range interactions
Additional Readings:
Feb 28 In-Context Learning & Faithful Reasoning
Speakers: Stephanie Chan (DeepMind) & Antonia Creswell (DeepMind)
Recommended Readings:
  1. Data Distributional Properties Drive Emergent In-Context Learning in Transformers
  2. Faithful Reasoning Using Large Language Models
  3. Language models show human-like content effects on reasoning
Additional Readings:
Mar 7 Neuroscience-Inspired Artificial Intelligence
Speakers: Trenton Bricken (Harvard/Redwood Center for Theoretical Neuroscience/Anthropic) & Will Dorrell (UCL Gatsby Computational Neuroscience Unit/Stanford)
Recommended Readings:
  1. Attention Approximates Sparse Distributed Memory
  2. The Tolman-Eichenbaum Machine: Unifying Space and Relational Memory through Generalization in the Hippocampal Formation
  3. Relating transformers to models and neural representations of the hippocampal formation
Additional Readings:
  1. Sparse Distributed Memory is a Continual Learner
  2. Sparse Distributed Memory and Related Models
  3. How to build a cognitive map
Mar 14 Wrap Up
Speaker: TBA
Recommended Readings:
Additional Readings: