5/17/05 New FAQ: HW4 parent annotation, speed targets
5/4/05 Final project proposals due Monday — see handout
5/4/05 No section this Friday
5/3/05 New FAQ: Memory issues for HW3
5/1/05 New FAQ: "stepSize underflow" on HW3
5/1/05 New FAQ: More questions on HW3
4/29/05 New FAQ: Typo in HW3 formula for G(λ)
4/26/05 New FAQ: IndexLinearizer, features, etc.
4/19/05 New FAQ: More data for HW2?
4/19/05 New FAQ: Questions about Knight section 31
4/18/05 HW2 due date pushed back to Friday 4/22
4/17/05 New FAQ: Starter code combines test and training sentences
4/13/05 New FAQ: Do I have to do my final project in Java?
4/10/05 Need more memory in Java? Use the -Xmx flag, e.g. java -Xmx1000m HelloWorld
4/9/05 New FAQ: Closing streams in readSpeechNBestLists()
4/7/05 Looking for a homework partner? Try posting to the class newsgroup.
4/5/05 New FAQ: Smoothing and unknown words
4/2/05 New FAQ: Larger dataset for HW1
3/30/05 New FAQs: How is Java 1.5 different from 1.4?, Using Java 1.5 on the Mac
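
For the 4/10/05 note on the -Xmx flag, here is a minimal sketch of one way to check how much heap the JVM actually received. The class name HeapCheck and the helper maxHeapMegabytes are hypothetical names used only for illustration; Runtime.getRuntime().maxMemory() is the standard library call.

```java
// HeapCheck is a hypothetical class name used only for this illustration.
class HeapCheck {
    // Returns the maximum amount of memory the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Print the limit so it can be compared against the -Xmx value passed on the command line.
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as java -Xmx1000m HeapCheck should report a figure in the neighborhood of 1000 MB, though the exact number depends on the JVM.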

Course Description

This course is designed to introduce students to the fundamental concepts and ideas in natural language processing (NLP), and to get them up to speed with current research in the area. It develops an in-depth understanding of both the algorithms available for the processing of linguistic information and the underlying computational properties of natural languages. Word-level, syntactic, and semantic processing are considered from both a linguistic and an algorithmic perspective. The focus is on modern quantitative techniques in NLP: using large corpora and statistical models for acquisition, disambiguation, and parsing. The course also examines and constructs representative systems.


Prerequisites

  • Adequate experience with programming and formal structures (e.g., CS106 and CS103X).
  • Programming projects will be written in Java, so knowledge of Java (or a willingness to learn on your own) is required.
  • Knowledge of standard concepts in artificial intelligence and/or computational linguistics (e.g., CS121/221 or Ling 138/238).
  • Basic familiarity with logic, vector spaces, and probability.

Intended Audience

Graduate students and advanced undergraduates specializing in computer science, linguistics, or symbolic systems.

Textbook and Readings

The book we will use most is:

  • Christopher Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.

We will distribute the most vital parts of the book, which is referred to as M&S below. Supplementary information about the text, including errata and pointers to online resources, is available online.

Other useful reference texts for NLP are:

  • James Allen. 1995. Natural Language Understanding. 2nd ed. Benjamin/Cummings.
  • Gerald Gazdar and Chris Mellish. 1989. Natural Language Processing in X. Addison-Wesley.
  • Dan Jurafsky and James Martin. 2000. Speech and Language Processing. Prentice Hall. (Referred to as J&M below.)

Papers will occasionally be distributed and discussed during the course of the class.

Copies of in-class hand-outs, such as readings and homework assignments, will be posted on the syllabus, and hard copies will also be available outside Gates 158 (in front of Prof. Manning's office) while supplies last.

Homework and Grading

There will be four homeworks centered around substantial programming assignments, each exploring a core NLP task.

In addition, there will be a final programming project on a topic of your own choosing. A short, ungraded project proposal will be due on Monday 5/9/05. Final project write-ups will be due on the last day of class, Wednesday 6/1/05. Students will give short project presentations during the time slot allocated for the final exam, on Tuesday 6/7/05. You may find it helpful to look at final projects from previous years.

Course grades will be based 2/3 on homeworks (1/6 each) and 1/3 on the final project.

Be sure to read the policies on late days and collaboration.


Section

Sections will be held most weeks to go over background material or to address issues related to the programming assignments. Sections are optional, but students are encouraged to attend for a better understanding of background material and the assignments.


Syllabus

Introduction [slides: ppt, pdf]
Overview of NLP. Statistical machine translation. Language models and their role in speech processing. Course introduction and administration.
Good background reading: M&S 1.0-1.4, 4.1-4.2, Homework Collaboration Policy
Optional reading: Ken Church's tutorial Unix for Poets [ps, pdf]
(If your knowledge of probability theory is limited, also read M&S 2.0-2.1.7. If that's too condensed, read the probability chapter of an intro statistics textbook, e.g. Rice, Mathematical Statistics and Data Analysis, ch. 1.)
Homework 1 distributed today
N-gram Language Models and Information Theory [slides: ps, MegaHal]
n-gram models. Entropy, relative entropy, cross entropy, mutual information, perplexity. Statistical estimation and smoothing for language models.
Assigned reading: M&S 2.2
Optional reading: Joshua Goodman (2001), A Bit of Progress in Language Modeling, Extended Version [pdf, ps]
Optional reading: Stanley Chen and Joshua Goodman (1998), An empirical study of smoothing techniques for language modeling [pdf, ps]
Statistical Machine Translation (MT), Alignment Models [slides: pdf, ps]
Assigned reading: Kevin Knight, A Statistical MT Tutorial Workbook [rtf]. MS., August 1999.
Further reading: M&S 13
Section 1 [notes: xls, pdf]
Smoothing: absolute discounting, proving you have a proper probability distribution, Good-Turing implementation. Information theory examples and intuitions. Java implementation issues.
Statistical Alignment Models and Expectation Maximization (EM) [slides: pdf, spreadsheet: xls]
EM and its use in statistical MT alignment models.
Reference reading: Geoffrey J. McLachlan and Thriyambakam Krishnan. 1997. The EM Algorithm and Extensions. Wiley
Homework 2 distributed today
Homework 1 due today
Putting together a complete statistical MT system. [slides: pdf]
Decoding and A* Search. Recent work in statistical MT.
Further reading: Brown, Della Pietra, Della Pietra, and Mercer, The Mathematics of Statistical Machine Translation: Parameter Estimation [pdf, pdf]. Computational Linguistics.
Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. 2001. Fast Decoding and Optimal Decoding for Machine Translation. ACL.
K. Yamada and K. Knight. 2002. A Decoder for Syntax-Based Statistical MT. ACL.
Section 2 [notes: xls]
The EM algorithm.
Word Sense Disambiguation (WSD) and Naïve Bayes (NB) Models [slides: pdf]
Information sources, performance bounds, dictionary methods, supervised machine learning methods, Naïve Bayes classifiers.
Assigned Reading: M&S Ch. 7.
Reference: Computational Linguistics 24(1), 1998. Special issue on Word Sense Disambiguation.
Proceedings of Senseval-3: The Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text
Maximum Entropy Classifiers [slides: pdf]
Assigned Reading: class slides.
Other references: Adwait Ratnaparkhi. A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
M&S section 16.2
Section 3 [notes: txt]
Corpora and other resources.
Homework 2 due today
Maximum Entropy Classifiers, Part II [slides: pdf]
Assigned Reading: class slides.
Other references: Adwait Ratnaparkhi. A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
M&S section 16.2
Adam Berger, A Brief Maxent Tutorial
Homework 3 distributed today
Part of Speech Tagging and Sequence Inference [slides: pdf]
Parts of speech and the tagging problem: sources of evidence; easy and difficult cases. Probabilistic sequence inference: Hidden Markov Models (HMMs), Conditional Markov Models (CMMs), and the Viterbi algorithm.
Assigned reading: M&S Ch. 10, pp. 341-356.
Further reading on HMMs: M&S Ch. 9.
HMM POS tagger: Thorsten Brants, TnT - A Statistical Part-of-Speech Tagger, ANLP 2000.
CMM POS tagger: Kristina Toutanova and Christopher D. Manning. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. EMNLP 2000.
Section 4
Maximum entropy models, HMMs
Named Entity Recognition (NER) and Information Extraction (IE) [slides: pdf]
Evaluation reading: M&S 8.1
HMMs for IE reading: Dayne Freitag and Andrew McCallum (2000), Information Extraction with HMM Structures Learned by Stochastic Optimization, AAAI-2000
Maxent NER reading: Jenny Finkel et al., 2005. Exploring the Boundaries: Gene and Protein Identification in Biomedical Text
Background IE reading: Ion Muslea (1999), Extraction Patterns for Information Extraction Tasks: A Survey [pdf, ps], AAAI-99 Workshop on Machine Learning for Information Extraction.
Background IE reading: Douglas E. Appelt. 1999. Introduction to Information Extraction Technology
Parsing for Context-Free Grammars (CFGs) [slides: pdf]
Top-down parsing, bottom-up parsing, empty constituents, left recursion.
Background reading: M&S 3 (if you haven't done any linguistics courses) or J&M ch. 9
Optional reading: J&M ch. 10
Final project guide distributed today
no section today
Homework 3 due today
Dynamic Programming for Parsing [handout: pdf]
Dynamic programming methods, chart parsing, the CKY algorithm.
Optional reading: J&M ch. 10
Homework 4 distributed today
Final project proposals due today
Probabilistic Context-Free Grammars (PCFGs) [slides: pdf (probparse), pdf (search), pdf (unlexicalized)]
PCFGs, finding the most likely parse, refining PCFGs. Other questions for PCFGs: the inside-outside algorithm and learning PCFGs.
Assigned reading: M&S Ch. 11
Section 5
Parsing, PCFGs
Modern Statistical Parsers [slides: see last time]
Parsing for disambiguation, weakening independence assumptions, lexicalization, search methods, Charniak's parser, probabilistic left corner grammars, parser evaluation.
Assigned reading: M&S 8.3, 12
Optional readings:
  • Eugene Charniak (2000), A Maximum-Entropy-Inspired Parser, Proceedings of NAACL-2000.
  • Eugene Charniak (1997), Statistical techniques for natural language parsing, AI Magazine.
  • Eugene Charniak (1997), Statistical parsing with a context-free grammar and word statistics, Proceedings of the Fourteenth National Conference on Artificial Intelligence. AAAI Press/MIT Press, Menlo Park (1997).
Question Answering (QA) [handout: pdf]
TREC-style robust QA, natural language database interfaces
Assigned reading: Marius Pasca, Sanda M. Harabagiu. High Performance Question/Answering. SIGIR 2001: 366-374.
no section today
Homework 4 due today
Compositional Semantics
Semantic representations, lambda calculus, compositionality, syntax/semantics interfaces, logical reasoning.
Assigned reading: An Informal but Respectable Approach to Computational Semantics [pdf, ps]
Compositional Semantics
Assigned reading: I. Androutsopoulos et al., Language Interfaces to Databases
Memorial Day
no class
Dialog & Discourse Systems
Rhetorical structure, planning and requests.
Assigned reading: handout
Optional reading: Gazdar & Mellish ch. 10
Final projects due today
Final Project Presentations

    Course Information

    Lectures: MW 11:00-12:15
    Location: Gates B12
    Section: F 11:00-12:15
    Location: Bldg. 200-217
    Professor: Chris Manning

    Electronic Communications


    Newsgroup: su.class.cs224n

    Questions mailing list:
    Send your questions here!

    Announcements mailing list:

    Enrolled students are automatically subscribed. Others wishing to receive announcements should send an email to with message body "subscribe cs224n-spr0405-guests".


    Homework 1 (due 4/11/05)
    Homework 2 (due 4/22/05; pushed back from 4/20/05)
    Homework 3 (due 5/4/05)
    Homework 4 (due 5/18/05)
    Final project

    Late Day Policy
    Regrading Policy
    Homework Collaboration Policy


    Lecture slides: intro [ppt, pdf]
    Lecture slides: n-grams [ps]
    M&S Chapters 1 & 4 [ps]
    M&S Chapter 3 [pdf]
    M&S Chapter 6 [ps]
    M&S Chapter 10 [pdf]
    M&S Chapters 11 & 12 [pdf]
    Section notes: smoothing [xls, pdf]
    Section notes: EM [xls]
    Lecture slides: WSD [pdf]
    Lecture slides: MaxEnt [pdf]
    Section notes: corpora etc. [txt]


    Professor: Chris Manning
    Office: Gates 158
    Office Hours: M 4-5, W 2-3
    Phone: 650-723-7683
    Fax: 650-725-2588

    TA: Bill MacCartney
    Office: Gates 114
    Office Hours: Tu Th 11-12, W 1-2
    Phone: 650-723-3796
    Email: my domain is and my username is wcmac

    TA: Guy Isely
    Office: Gates B24A
    Office Hours: M F 10-11

    Admin: Colleen Scott-Fields
    Office: Gates 150
    Phone: 650-723-0748


    The Stanford NLP Group
    Linguistic Corpora at Stanford
    Statistical NLP links
    Probabilistic parser links
    Java 1.5 Overview
    Java 1.5 New Features