Announcements

4/29/06 New FAQ: "stepSize underflow" in PA2 Part II
4/27/06 New FAQ: For PA2 Part I, why is the test data part of the training data?
4/25/06 New FAQ: Questions on Knight, section 31
4/18/06 New FAQ: Increasing perplexity with more training data?
4/17/06 Room change: effective immediately, lectures will be in Bldg. 200-205.
4/15/06 New FAQ: Do we smooth at train time or test time?
4/14/06 Need more memory in Java? Use -mx flag: java -mx1000m HelloWorld.
4/13/06 Pi-chuan's office hours: Wed 2-3, Fri 12:30-1:30
4/13/06 Bill's office hours: MW 10-11
4/13/06 Chris's office hours: Tue 3-4, Thu 10-11
4/12/06 We've added another fabulous TA: Pi-chuan Chang
4/5/06 Looking for a programming partner? Try posting to the class newsgroup.
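
Regarding the 4/14/06 memory note above: the following is a minimal sketch, not part of any assignment, for checking that a larger heap setting actually took effect. The class name HeapCheck is hypothetical. It reports the maximum heap the JVM says it will use, via the standard Runtime.maxMemory() call. The value reported is usually somewhat below the number passed on the command line.

```java
// Hypothetical helper (not course-provided): prints the JVM's maximum heap size.
public class HeapCheck {

    // Maximum memory the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it once with no flag and once with the flag from the announcement (java -mx1000m HeapCheck) and compare the two numbers. On current JVMs the documented spelling of this option is -Xmx, as in java -Xmx1000m HeapCheck.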


Course Description

This course introduces students to the fundamental concepts and ideas in natural language processing (NLP) and gets them up to speed with current research in the area. It develops an in-depth understanding of both the algorithms available for processing linguistic information and the underlying computational properties of natural languages. Word-level, syntactic, and semantic processing are considered from both a linguistic and an algorithmic perspective. The focus is on modern quantitative techniques in NLP: using large corpora and statistical models for acquisition, disambiguation, and parsing. The course also examines and constructs representative systems.

Prerequisites

  • Adequate experience with programming and formal structures (e.g., CS106 and CS103X).
  • Programming projects will be written in Java 1.5, so knowledge of Java (or a willingness to learn on your own) is required.
  • Knowledge of standard concepts in artificial intelligence and/or computational linguistics (e.g., CS121/221 or Ling 180).
  • Basic familiarity with logic, vector spaces, and probability.

Intended Audience

Graduate students and advanced undergraduates specializing in computer science, linguistics, or symbolic systems.

Textbook and Readings

The primary textbook will be:

  • Christopher Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.
    Buy it at the Stanford Bookstore or Amazon ($77 new)
    Read the text online!

It's referred to as M&S below. Please see http://nlp.stanford.edu/fsnlp/ for supplementary information about the text, including errata, and pointers to online resources.

Other useful reference texts for NLP are:

  • James Allen. 1995. Natural Language Understanding. Benjamin/Cummings, 2nd ed.
  • Gerald Gazdar and Chris Mellish. 1989. Natural Language Processing in X. Addison-Wesley.
  • Dan Jurafsky and James Martin. 2000. Speech and Language Processing. Prentice Hall.
  • Frederick Jelinek. 1998. Statistical Methods for Speech Recognition. MIT Press.

Papers will occasionally be distributed and discussed during the course of the class.

Copies of in-class hand-outs, such as readings and programming assignments, will be posted on the syllabus, and hard copies will also be available outside Gates 158 (in front of Prof. Manning's office) while supplies last.

Assignments and Grading

There will be three substantial programming assignments, each exploring a core NLP task.

In addition, there will be a final programming project on a topic of your own choosing.

Course grades will be based 60% on programming assignments (20% each) and 40% on the final project.
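
As an illustration of the weighting above, here is a minimal sketch of the arithmetic. It assumes each component is scored on a 0-100 scale, which is not stated on this page, and it does not model late days or regrades. The class and method names are hypothetical.

```java
// Hypothetical illustration of the stated weights: 3 assignments at 20% each, project at 40%.
public class GradeSketch {
    static final double ASSIGNMENT_WEIGHT = 0.20; // per assignment
    static final double PROJECT_WEIGHT = 0.40;    // final project

    // Combines three assignment scores and one project score (assumed 0-100 each).
    static double courseScore(double[] assignmentScores, double projectScore) {
        if (assignmentScores.length != 3) {
            throw new IllegalArgumentException("expected exactly three assignment scores");
        }
        double total = PROJECT_WEIGHT * projectScore;
        for (int i = 0; i < assignmentScores.length; i++) {
            total += ASSIGNMENT_WEIGHT * assignmentScores[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] assignments = {90.0, 80.0, 70.0}; // made-up example scores
        System.out.println("Weighted score: " + courseScore(assignments, 85.0));
    }
}
```

Because the final project carries twice the weight of any single assignment, a ten-point change in the project score moves the total by four points, versus two points for an assignment.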

Be sure to read the policies on late days and collaboration.

Section

Sections will be held most weeks to go over background material, or to address issues related to the programming assignments. Sections are optional, but students are encouraged to attend for a better understanding of background material and the assignments.

Course Information


Lectures: MW 11:00-12:15
Location: Gates B12
Section: F 11:00-12:15
Location: Gates B12
Professor: Chris Manning

Electronic Communications

Web: http://cs224n.stanford.edu

Newsgroup: su.class.cs224n
(best option for questions)

Staff mailing list:
cs224n-spr0506-staff@lists.stanford.edu

Announcements mailing list:
cs224n-spr0506-students@lists.stanford.edu

Enrolled students are automatically subscribed. Others wishing to receive announcements should send an email to majordomo@lists.stanford.edu with message body "subscribe cs224n-spr0506-guests".

Assignments

Assignment 1 (due 4/19/06)
Assignment 2 (due 5/3/06)
Assignment 3 (due 5/17/06)
Final project

Collaboration Policy
Late Day Policy
Regrading Policy

Links

The Stanford NLP Group
Linguistic Corpora at Stanford
Statistical NLP links
Probabilistic parser links
Java 1.5 Overview
Java 1.5 New Features