Announcements

6/06/07 **Final Project Presentations are in 380-380C.** Make sure to send your PowerPoint slides to Jenny in advance.
5/08/07 How do I interpret the output of the scoring script?
5/07/07 Ah! I don't understand this whole feature index thing in PA3.
4/28/07 What's the deal with distortion for Model 2?
4/24/07 What numbers should I be getting for PA2?
4/24/07 What should getAlignmentProb() return?
4/18/07 PA2 is now available. Enjoy!
4/18/07 Check out the new FAQ on Kevin Knight's tutorial, as well as chapter 24 of J&M.
4/13/07 Do I have to create my own validation set for PA1?
4/13/07 PA1 and PA2 deadlines have been pushed back to Wed, 4/18 and 5/2 respectively. Thank the Professor for having a kind heart.
4/13/07 Make sure to read the handouts for lectures and section, posted on the syllabus page.
4/13/07 Check out the new FAQs.
4/9/07 What numbers should I be getting for PA1?
4/9/07 Having trouble compiling/running PA1? Check out the FAQ.
4/1/07 Looking for a programming partner? Try posting to the class newsgroup.


Course Description

This course introduces students to the fundamental concepts and ideas in natural language processing (NLP) and gets them up to speed with current research in the area. It develops an in-depth understanding of both the algorithms available for processing linguistic information and the underlying computational properties of natural languages. It considers word-level, syntactic, and semantic processing from both a linguistic and an algorithmic perspective. The focus is on modern quantitative techniques in NLP: the use of large corpora and of statistical models for acquisition, disambiguation, and parsing. The course also examines and constructs representative systems.

Prerequisites

  • Adequate experience with programming and formal structures (e.g., CS106B/X and CS103B/X).
  • Programming projects will be written in Java 1.5, so knowledge of Java (or a willingness to learn on your own) is required.
  • Knowledge of standard concepts in artificial intelligence and/or computational linguistics (e.g., CS121/221 or Ling 180).
  • Basic familiarity with logic, vector spaces, and probability.

Intended Audience

Graduate students and advanced undergraduates specializing in computer science, linguistics, or symbolic systems.

Textbook and Readings

The main textbook will be:

  • Christopher Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.
    Buy it at the Stanford Bookstore or Amazon ($77 new)
    Read the text online!

It's referred to as M&S below. Please see http://nlp.stanford.edu/fsnlp/ for supplementary information about the text, including errata, and pointers to online resources.

Other useful reference texts for NLP are:

  • James Allen. 1995. Natural Language Understanding. Benjamin/Cummings, 2nd ed.
  • Gerald Gazdar and Chris Mellish. 1989. Natural Language Processing in X. Addison-Wesley.
  • Dan Jurafsky and James Martin. 2000. Speech and Language Processing. Prentice Hall.
      (selected chapters from the 2nd ed. available at http://www.cs.colorado.edu/~martin/slp2.html)
  • Frederick Jelinek. 1998. Statistical Methods for Speech Recognition. MIT Press.

Papers will occasionally be distributed and discussed in class.

Copies of in-class handouts, such as readings and programming assignments, will be posted on the syllabus page. Hard copies will also be available outside Gates 158 (in front of Prof. Manning's office) while supplies last.

Assignments and Grading

There will be three substantial programming assignments, each exploring a core NLP task.

In addition, there will be a final programming project on a topic of your own choosing.

Course grades will be based 60% on programming assignments (20% each) and 40% on the final project.
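As a rough illustration of how these weights combine, here is a minimal sketch. The function name, the 0-100 score scale, and the example scores are assumptions made for illustration only; the sketch ignores late days and any other adjustments, and it is not the staff's actual grading procedure.

```python
# Weights come from the grading statement above.
# Everything else (names, 0-100 scale, sample scores) is hypothetical.
ASSIGNMENT_WEIGHT = 0.20  # each of the three programming assignments
PROJECT_WEIGHT = 0.40     # final project


def course_grade(assignment_scores, project_score):
    """Combine three assignment scores and one project score (0-100 scale)."""
    if len(assignment_scores) != 3:
        raise ValueError("expected exactly three assignment scores")
    return ASSIGNMENT_WEIGHT * sum(assignment_scores) + PROJECT_WEIGHT * project_score


if __name__ == "__main__":
    # Made-up scores: 90, 80, and 85 on the assignments, 88 on the project.
    print(round(course_grade([90, 80, 85], 88), 1))
```

With these made-up scores, the assignments contribute 0.20 × (90 + 80 + 85) = 51 points and the project contributes 0.40 × 88 = 35.2 points, for a total of about 86.2.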

Be sure to read the policies on late days and collaboration.

Section

Sections will be held most weeks to go over background material, or to address issues related to the programming assignments. Sections are optional, but students are encouraged to attend for a better understanding of background material and the assignments.

Course Information


Lectures: MW 11:00-12:15
Location: 200-203
Section: F 11:00-12:15
Location: Gates B12
Professor: Chris Manning

Electronic Communications

Web: http://cs224n.stanford.edu

Newsgroup: su.class.cs224n

Staff mailing list:
cs224n-spr0607-staff@lists.stanford.edu

Announcements mailing list:
cs224n-spr0607-students@lists.stanford.edu

Enrolled students are automatically subscribed.
Others wishing to receive announcements should
go to mailman.stanford.edu, and subscribe to
"cs224n-spr0607-guests".

Assignments

Assignment 1 (due 4/18/07)
Assignment 2 (due 5/2/07)
Assignment 3 (due 5/16/07)
Final project

Collaboration Policy
Late Day Policy
Regrading Policy

Links

The Stanford NLP Group
Linguistic Corpora at Stanford
Statistical NLP links
Probabilistic parser links
Java 1.5 Overview
Java 1.5 New Features