Natural language processing (NLP) is a crucial part of artificial intelligence (AI), modeling how people share information. In recent years, deep learning approaches have obtained very high performance on many NLP tasks. In this course, students gain a thorough introduction to cutting-edge neural networks for NLP.



What is this course about?

Natural language processing (NLP) or computational linguistics is one of the most important technologies of the information age. Applications of NLP are everywhere because people communicate almost everything in language: web search, advertising, emails, customer service, language translation, virtual agents, medical reports, etc. In recent years, deep learning (or neural network) approaches have obtained very high performance across many different NLP tasks, using single end-to-end neural models that do not require traditional, task-specific feature engineering. In this course, students will gain a thorough introduction to cutting-edge research in Deep Learning for NLP. Through lectures, assignments and a final project, students will learn the necessary skills to design, implement, and understand their own neural network models. As piloted last year, CS224n will be taught using PyTorch this year.

Previous offerings

This course was formed in 2017 as a merger of the earlier CS224n (Natural Language Processing) and CS224d (Natural Language Processing with Deep Learning) courses. Below you can find archived websites and student project reports.

CS224n Websites: Winter 2019 / Winter 2018 / Winter 2017 / Autumn 2015 / Autumn 2014 / Autumn 2013 / Autumn 2012 / Autumn 2011 / Winter 2011 / Spring 2010 / Spring 2009 / Spring 2008 / Spring 2007 / Spring 2006 / Spring 2005 / Spring 2004 / Spring 2003 / Spring 2002 / Spring 2000
CS224n Lecture Videos: Winter 2019 / Winter 2017
CS224n Reports: Winter 2019 / Winter 2018 / Winter 2017 / Autumn 2015 and earlier
CS224d Reports: Spring 2016 / Spring 2015


Reference Texts

The following texts are useful, but none are required. All of them can be read free online.

If you have no background in neural networks but would like to take the course anyway, you might well find one of these books helpful to give you more background:


Assignments (54%)

There are five weekly assignments, which will improve both your theoretical understanding and your practical skills. All assignments contain both written questions and programming parts.

Final Project (43%)

The Final Project offers you the chance to apply your newly acquired skills towards an in-depth application. Students have two options: the Default Final Project (in which students tackle a predefined task, namely textual Question Answering) or a Custom Final Project (in which students choose their own project involving human language and deep learning). Examples of both can be seen on last year's website.
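
As a rough illustration of how the grading weights on this page combine (Assignments 54%, Final Project 43%, and Participation capped at 3%, described further down), here is a minimal Python sketch. It assumes each component is expressed as the fraction of available credit earned. The function name `course_score` and its parameters are hypothetical and not part of any official grading tool, and the actual grade computation may differ.

```python
# Weights taken from the section headings on this page.
WEIGHTS = {"assignments": 0.54, "final_project": 0.43, "participation": 0.03}


def course_score(assignments, final_project, participation):
    """Return a weighted course score as a percentage (0-100).

    Each argument is the fraction of available credit earned in that
    component (0.0 = none, 1.0 = all). Hypothetical helper for
    illustration only.
    """
    fractions = {
        "assignments": assignments,
        "final_project": final_project,
        # Participation credit is capped, so anything above full credit
        # is clamped (assumption: the cap means no more than 3% overall).
        "participation": min(participation, 1.0),
    }
    for name, value in fractions.items():
        if value < 0.0:
            raise ValueError(f"{name} fraction must be non-negative")
    if assignments > 1.0 or final_project > 1.0:
        raise ValueError("assignment and project fractions must be at most 1.0")
    return 100.0 * sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Example: 90% on assignments, 80% on the project, full participation.
    print(round(course_score(0.9, 0.8, 1.0), 2))
```

Keeping the weights in a single dictionary makes it easy to check that they sum to 100% and to adjust them if the breakdown changes.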

Important information


Participation (3%)

We appreciate everyone being actively involved in the class! There are several ways of earning participation credit, which is capped at 3%:

Late Days

Regrade Requests

If you feel you deserved a better grade on an assignment, you may submit a regrade request on Gradescope within 3 days after the grades are released. Your request should briefly summarize why you feel the original grade was unfair. Your TA will reevaluate your assignment as soon as possible, and then issue a decision. If you are still not happy, you can ask for your assignment to be regraded by an instructor.

Credit/No credit enrollment

If you take the class credit/no credit, you are graded in the same way as those registered for a letter grade. The only difference is that, provided you reach a C- standard in your work, it will simply be graded as CR.

Sexual violence

Academic accommodations are available for students who have experienced or are recovering from sexual violence. If you would like to talk to a confidential resource, you can schedule a meeting with the Confidential Support Team or call their 24/7 hotline at: 650-725-9955. Counseling and Psychological Services also offers confidential counseling services. Non-confidential resources include the Title IX Office, for investigation and accommodations, and the SARA Office, for healing programs. Students can also speak directly with the teaching staff to arrange accommodations. Note that university employees – including professors and TAs – are required to report what they know about incidents of sexual or relationship violence, stalking and sexual harassment to the Title IX Office. Students can learn more at


Updated lecture slides will be posted here shortly before each lecture. Other links contain last year's slides, which are mostly similar.

Lecture notes will be uploaded a few days after most lectures. The notes (which cover approximately the first half of the course content) give supplementary detail beyond the lectures.

Date Description Course Materials Events Deadlines
Tue Jan 7 Introduction and Word Vectors
[slides] [video] [notes]

Gensim word vectors example:
[code] [preview]
Suggested Readings:
  1. Word2Vec Tutorial - The Skip-Gram Model
  2. Efficient Estimation of Word Representations in Vector Space (original word2vec paper)
  3. Distributed Representations of Words and Phrases and their Compositionality (negative sampling paper)
Assignment 1 out
[code] [preview]
Thu Jan 9 Word Vectors 2 and Word Senses
[slides] [video] [notes]
Suggested Readings:
  1. GloVe: Global Vectors for Word Representation (original GloVe paper)
  2. Improving Distributional Similarity with Lessons Learned from Word Embeddings
  3. Evaluation methods for unsupervised word embeddings
Additional Readings:
  1. A Latent Variable Model Approach to PMI-based Word Embeddings
  2. Linear Algebraic Structure of Word Senses, with Applications to Polysemy
  3. On the Dimensionality of Word Embedding
Fri Jan 10 Python review session
[slides] [video] [code]
2:30 - 4:20pm
160-124 [map]
Tue Jan 14 Word Window Classification, Neural Networks, and PyTorch
[slides] [video] [code (notebook)] [code (html)]
[matrix calculus notes]
[notes (lectures 3 and 4)]
Suggested Readings:
  1. Review of differential calculus
Additional Readings:
  1. Natural Language Processing (Almost) from Scratch
Assignment 2 out
[code] [handout]
Assignment 1 due
Thu Jan 16 Matrix Calculus and Backpropagation
[slides] [video]
[notes (lectures 3 and 4)]
Suggested Readings:
  1. CS231n notes on network architectures
  2. CS231n notes on backprop
  3. Learning Representations by Backpropagating Errors
  4. Derivatives, Backpropagation, and Vectorization
  5. Yes you should understand backprop
Tue Jan 21 Linguistic Structure: Dependency Parsing
[video] [notes]
Suggested Readings:
  1. Incrementality in Deterministic Dependency Parsing
  2. A Fast and Accurate Dependency Parser using Neural Networks
  3. Dependency Parsing
  4. Globally Normalized Transition-Based Neural Networks
  5. Universal Stanford Dependencies: A cross-linguistic typology
  6. Universal Dependencies website
Assignment 3 out
[code] [handout]
Assignment 2 due
Thu Jan 23 The probability of a sentence? Recurrent Neural Networks and Language Models
[slides] [video]
[notes (lectures 6 and 7)]
Suggested Readings:
  1. N-gram Language Models (textbook chapter)
  2. The Unreasonable Effectiveness of Recurrent Neural Networks (blog post overview)
  3. Sequence Modeling: Recurrent and Recursive Neural Nets (Sections 10.1 and 10.2)
  4. On Chomsky and the Two Cultures of Statistical Learning
Tue Jan 28 Vanishing Gradients and Fancy RNNs
[slides] [video]
[notes (lectures 6 and 7)]
Suggested Readings:
  1. Sequence Modeling: Recurrent and Recursive Neural Nets (Sections 10.3, 10.5, 10.7-10.12)
  2. Learning long-term dependencies with gradient descent is difficult (one of the original vanishing gradient papers)
  3. On the difficulty of training Recurrent Neural Networks (proof of vanishing gradient problem)
  4. Vanishing Gradients Jupyter Notebook (demo for feedforward networks)
  5. Understanding LSTM Networks (blog post overview)
Assignment 4 out
[code] [handout] [Azure Guide] [Practical Guide to VMs]
Assignment 3 due
Thu Jan 30 Machine Translation, Seq2Seq and Attention
[slides] [video] [notes]
Suggested Readings:
  1. Statistical Machine Translation slides, CS224n 2015 (lectures 2/3/4)
  2. Statistical Machine Translation (book by Philipp Koehn)
  3. BLEU (original paper)
  4. Sequence to Sequence Learning with Neural Networks (original seq2seq NMT paper)
  5. Sequence Transduction with Recurrent Neural Networks (early seq2seq speech recognition paper)
  6. Neural Machine Translation by Jointly Learning to Align and Translate (original seq2seq+attention paper)
  7. Attention and Augmented Recurrent Neural Networks (blog post overview)
  8. Massive Exploration of Neural Machine Translation Architectures (practical advice for hyperparameter choices)
Tue Feb 4 Practical Tips for Final Projects
[slides] [video] [notes]
Suggested Readings:
  1. Practical Methodology (Deep Learning book chapter)
Project Proposal out

Default Final Project out [handout] [code]
Thu Feb 6 Question Answering, the Default Final Project, and an introduction to Transformer architectures
[slides] [video] [notes]
Suggested Readings:
  1. Project Handout
  2. Attention Is All You Need
  3. The Illustrated Transformer
  4. Transformer (Google AI blog post)
  5. Layer Normalization
  6. Image Transformer
  7. Music Transformer: Generating music with long-term structure
Assignment 4 due
Tue Feb 11 ConvNets for NLP
[slides] [video] [notes]
Suggested Readings:
  1. Convolutional Neural Networks for Sentence Classification
  2. Improving neural networks by preventing co-adaptation of feature detectors
  3. A Convolutional Neural Network for Modelling Sentences
Assignment 5 out
[original code (requires Stanford login) / public version] [handout]
Thu Feb 13 Information from parts of words (Subword Models)
[slides] [video]
Suggested Readings:
  1. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
  2. Revisiting Character-Based Neural Machine Translation with Capacity and Compression
Project Proposal due
Tue Feb 18 Contextual Word Representations: BERT (guest lecture by Jacob Devlin)
[slides] [video]
Suggested Readings:
  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Thu Feb 20 Modeling contexts of use: Contextual Representations and Pretraining. ELMo and BERT.
[slides] [video]
Suggested Readings:
  1. Contextual Word Representations: A Contextual Introduction
  2. The Illustrated BERT, ELMo, and co.
Project Milestone out [instructions]
Assignment 5 due
Tue Feb 25 Natural Language Generation
[slides] [video]
Suggested Readings:
  1. The Curious Case of Neural Text Degeneration
  2. Get To The Point: Summarization with Pointer-Generator Networks
  3. Hierarchical Neural Story Generation
  4. How NOT To Evaluate Your Dialogue System
Thu Feb 27 Reference in Language and Coreference Resolution
[slides] [video]
Suggested Readings:
  1. Coreference Resolution chapter of Jurafsky and Martin
  2. End-to-end Neural Coreference Resolution
Tue Mar 3 Fairness and Inclusion in AI (guest lecture by Vinodkumar Prabhakaran)
[slides] [video]
Project Milestone due
Thu Mar 5 Constituency Parsing and Tree Recursive Neural Networks
[slides] [video] [notes]
Suggested Readings:
  1. Parsing with Compositional Vector Grammars
  2. Constituency Parsing with a Self-Attentive Encoder
Fri Mar 6 Virtual Office Hours with HuggingFace
[video] (requires Stanford login)
Tue Mar 10 Recent Advances in Low Resource Machine Translation (guest lecture by Marc'Aurelio Ranzato)
[slides] [video]
Thu Mar 12 Analysis and Interpretability of Neural NLP
[slides] [video]
Final project poster session (canceled)
5:30 - 10pm
McCaw Hall at the Alumni Center [map]
Project Poster/Video due [instructions]
March 23 Final project report submission (OPTIONAL)
[instructions] [select grading option]
Final Project Report (OPTIONAL) due