Announcements

8/23/11 CS 224N will be taught in the Fall for 2011-12. It's back to being taught by Christopher Manning. This website has not yet been updated for the Fall 2011 edition. Links to the previous year's notes are still available, possibly shown with a strikethrough; the strikethrough will be removed once a link has been updated. I'm hoping to have final project presentations on Tue Dec 13, 3:30-6:30.


Course Description

This course introduces the fundamental concepts and ideas in natural language processing (NLP), otherwise known as computational linguistics. Ever wondered how Google Translate works, or how companies do automated resume processing? Want to build a computer that understands language? This course is for you. It develops an in-depth understanding of both algorithms for processing linguistic information and the underlying computational properties of natural languages. We consider word-level, syntactic, and semantic processing from both a linguistic and an algorithmic perspective, aiming to get up to speed with current research in the area. The course focuses on modern quantitative techniques in NLP (using large corpora, statistical models for acquisition, disambiguation, and parsing) and the construction of representative systems.

Prerequisites

  • Adequate experience with programming and formal structures (e.g., CS106B/X and CS103B/X).
  • Programming projects will be written in Java, so knowledge of Java (or a willingness to learn on your own) is required.
  • Knowledge of standard concepts in artificial intelligence and/or computational linguistics (e.g., CS121/221 or CS124/Ling 180).
  • Basic familiarity with logic, vector spaces, and probability.

Intended Audience

Graduate students and advanced undergraduates specializing in computer science, linguistics, or symbolic systems.

Textbook and Readings

The required text is:

  • Daniel Jurafsky and James H. Martin. 2008. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Second Edition. Prentice Hall.

It's at the bookstore (and other purveyors of fine books). Of course, we're also fond of:

  • Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
    Buy it at the Stanford Bookstore or Amazon ($64 new).
    You can read the text online on a Stanford network computer! It's referred to as M&S in the syllabus. While a bit older, it also has good and often distinct coverage of many topics. Please see http://nlp.stanford.edu/fsnlp/ for supplementary information about the text, including errata, and pointers to online resources.

Other useful reference texts for NLP are:

  • James Allen. 1995. Natural Language Understanding. Second Edition. Benjamin/Cummings.
  • Gerald Gazdar and Chris Mellish. 1989. Natural Language Processing in X. Addison-Wesley. [Where X = Prolog, Lisp, or, I think, Snobol.]
  • Frederick Jelinek. 1998. Statistical Methods for Speech Recognition. MIT Press.

Other papers with relevant material will be posted on the syllabus, as will lecture slides.

Assignments and Grading

There will be three substantial programming assignments, each exploring a core NLP task. They are a chance to see real, close to state-of-the-art tools and techniques in action, and they are where students learn much of the material of the class.

There will be a final programming project on a topic of your own choosing.

Finally, there will be simple in-class quizzes based on the day's lecture, which will aim to check that you are paying attention to what you hear/read.

Course grades will be based 60% on programming assignments (20% each), 6% on the quizzes, and 34% on the final project.

Be sure to read the policies on late days and collaboration.

Section

Sections will be held most weeks to go over background material, or to address issues related to the programming assignments. Sections are optional, but students are encouraged to attend for a better understanding of background material and the assignments.

Course Information


Lectures: MW 11:00-12:15
Location: Gates B03
Section: TBA
Location: TBA
Instructor: Chris Manning

Electronic Communications

Web: http://cs224n.stanford.edu/

Piazza: CS224N forum
Post questions, find project partners, etc.

Staff mailing list:
For the moment, contact manning@stanford.edu

Announcements mailing list:
cs224n-win1011-students@lists.stanford.edu

Enrolled students are automatically subscribed.
Others wishing to receive announcements should go to mailman.stanford.edu, and subscribe to
"cs224n-win1011-guests".

Assignments

Assignment 1 (due 1/19/11)
Assignment 2 (due 2/2/11)
Assignment 3 (due 2/16/11)
Final project (due 3/9/11)

Quiz answer submission form
Collaboration Policy
Late Day Policy
Regrading Policy

Links

The Stanford NLP Group
Linguistic Corpora at Stanford
Statistical NLP links
Probabilistic parser links
Java 1.5 Overview
Java 1.5 New Features