STANFORD

CS 224N -- Ling 237
Natural Language Processing
Spring 2003 - Handout #2


Course Syllabus

(updated 4/03/2003)

 

Week 1

Wednesday, 2 Apr 03: What is NLP? History; current applications and topics. Why does Chris pronounce 'parsing' funny?

Topics: Course introduction and administration. What is NLP? Brief history and discussion of current topics, approaches, and applications. The need for language understanding.
Rule-based approaches to linguistic structure. How to find sentence structure: parsing as search.
Reading (optional): M&S Sec. 1.0-1.3 for an introduction.

Week 2

Monday, 7 Apr 03: Parsing as search; dynamic programming approaches to parsing
Out: HW #1

Readings: handout; Gazdar and Mellish (1989), pp. 143-155; M&S Ch. 3 [if you haven't taken any linguistics courses] or J&M Ch. 9
References: J&M Ch. 10
Topics: Top-down parsing and bottom-up parsing; handling empty constituents and left-recursive rules.
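For concreteness, here is a minimal sketch of top-down parsing as depth-first search (Python; the toy grammar and the name `parse` are illustrative, not course code), including a note on why left-recursive rules are a problem:

```python
# Top-down (recursive-descent) recognition: parsing as depth-first search
# over rule expansions. Toy grammar; all names are illustrative.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "Det":  {"the", "a"},
    "N":    {"dog", "cat"},
    "Name": {"Kim"},
    "V":    {"saw", "barks"},
}

def parse(symbols, words):
    """Can this list of grammar symbols derive exactly these words?"""
    if not symbols:
        return not words                  # succeed iff all input is consumed
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:                  # preterminal: must match next word
        return (bool(words) and words[0] in LEXICON[first]
                and parse(rest, words[1:]))
    # Nonterminal: try each expansion in turn (the search's branch points).
    # Note: a left-recursive rule such as NP -> NP PP would make this loop
    # forever, which is why left recursion is a problem for top-down parsers.
    return any(parse(expansion + rest, words)
               for expansion in GRAMMAR.get(first, []))

print(parse(["S"], "the dog saw Kim".split()))  # True
print(parse(["S"], "the Kim saw".split()))      # False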

 

 

Wednesday, 9 Apr 03: Dynamic programming methods for parsing; parsing with weighted grammar rules
Out: PP #1

Readings: handout; Gazdar and Mellish (1989), pp. 179-199
References: J&M Ch. 10
Topics: Tabular/memoized/chart parsing methods. The Earley algorithm. The CKY algorithm. Active chart parsing.

 

 

Section: Parsing algorithms

Week 3

Monday, 14 Apr 03: N-gram models of language
Due: HW #1

Readings: M&S Sec. 1.4.0-1.4.3 and Chapter 6 [really it'd be good to glance through all of it, but pay particular attention to things we covered in class!]. If you are rusty or have little knowledge of probability theory, also read Ch. 2, Sec. 2.0-2.1.7. If that's too condensed, read the probability chapter of an intro statistics textbook, for instance Rice, Mathematical Statistics and Data Analysis, Ch. 1. Your dormmate probably has a copy.
References: Joshua Goodman. 2001. A Bit of Progress in Language Modeling. Computer Speech and Language, October 2001, pp. 403-434.
Stanley Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical report TR-10-98, Harvard University, August 1998.
Topics: Relative frequency estimation from corpora; n-gram models of English as Markov models; relative entropy, cross entropy, and perplexity; smoothing techniques to deal with unseen or insufficiently seen contexts.
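A small sketch of these estimation ideas (Python; toy corpus, and add-one smoothing chosen only for brevity; the readings cover much better smoothing methods):

```python
# Bigram language model with add-one (Laplace) smoothing and perplexity.
# Toy corpus; add-one is the simplest smoothing method, not the best one.
import math
from collections import Counter

train = [["<s>", "the", "dog", "barks", "</s>"],
         ["<s>", "the", "cat", "sleeps", "</s>"]]

unigrams, bigrams = Counter(), Counter()
for sent in train:
    unigrams.update(sent)
    bigrams.update(zip(sent, sent[1:]))
V = len(unigrams)                          # vocabulary size for smoothing

def prob(w, prev):
    """Add-one smoothed estimate of P(w | prev)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

def perplexity(sent):
    logp = sum(math.log2(prob(w, prev)) for prev, w in zip(sent, sent[1:]))
    return 2 ** (-logp / (len(sent) - 1))  # normalized per predicted word

print(perplexity(["<s>", "the", "dog", "sleeps", "</s>"]))
```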

 

 

Wednesday, 16 Apr 03: Word sense disambiguation: Naïve Bayes methods

Readings: Tom Mitchell, Machine Learning, pp. 177-184; M&S Sec. 7.0-7.3 and Sec. 7.5
Topics: The general problem of word sense disambiguation, information sources, performance bounds, dictionary and supervised machine learning approaches. Naive Bayes classifiers. System evaluation: accuracy, precision and recall, F measure.
References: J&M pp. 636-640; Computational Linguistics Vol. 24, No. 1, 1998, Special Issue on Word Sense Disambiguation (particularly the Introduction)
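A toy Naive Bayes disambiguator sketch (Python; the training sentences and sense labels for "bank" are invented for illustration):

```python
# Naive Bayes WSD: pick the sense s maximizing
# log P(s) + sum over context words w of log P(w | s). Toy data only.
import math
from collections import Counter

train = [("FINANCE", "the bank approved my loan".split()),
         ("FINANCE", "deposit money at the bank".split()),
         ("RIVER",   "the river bank was muddy".split()),
         ("RIVER",   "fishing from the bank of the river".split())]

sense_counts = Counter(s for s, _ in train)
word_counts = {s: Counter() for s in sense_counts}
for s, ctx in train:
    word_counts[s].update(ctx)
vocab = {w for _, ctx in train for w in ctx}

def classify(context):
    def score(s):
        total = sum(word_counts[s].values())
        return (math.log(sense_counts[s] / len(train)) +
                sum(math.log((word_counts[s][w] + 1) / (total + len(vocab)))
                    for w in context))        # add-one smoothing
    return max(sense_counts, key=score)

print(classify("money in the bank".split()))  # FINANCE
```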

 

 

Section: Accessing corpora at Stanford; linguistic annotation; Unix text tools

Week 4

Monday, 21 Apr 03: POS tagging and hidden Markov models
Due: PP #1

Readings: M&S Sec. 10.0-10.2; Sec. 9.0-9.3.2
Reference: M&S Ch. 3 through Sec. 3.1; Sec. 4.3.2
Topics: Part-of-speech tagging. Available information sources. Markov models. Fundamental algorithms for hidden Markov models: determining the probability of an observed sequence, and the maximum probability state sequence (the Viterbi algorithm).
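A compact Viterbi sketch (Python; the two-tag HMM and its probabilities are made-up toy numbers):

```python
# Viterbi: the most probable tag sequence under an HMM, via dynamic
# programming with back-pointers. Toy model; all numbers are invented.

states = ["N", "V"]
start = {"N": 0.7, "V": 0.3}                 # P(tag_1)
trans = {"N": {"N": 0.3, "V": 0.7},          # P(tag_t | tag_{t-1})
         "V": {"N": 0.8, "V": 0.2}}
emit  = {"N": {"fish": 0.6, "sleep": 0.4},   # P(word | tag)
         "V": {"fish": 0.4, "sleep": 0.6}}

def viterbi(words):
    # v[t][s]: probability of the best path ending in state s at time t
    v = [{s: start[s] * emit[s][words[0]] for s in states}]
    back = []
    for w in words[1:]:
        scores, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: v[-1][r] * trans[r][s])
            ptr[s] = best_prev
            scores[s] = v[-1][best_prev] * trans[best_prev][s] * emit[s][w]
        v.append(scores)
        back.append(ptr)
    # Recover the best path by following back-pointers from the end.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["fish", "sleep"]))   # ['N', 'V'] with these toy numbers
```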

 

 

Wednesday, 23 Apr 03: Named entity recognition, information extraction, and hidden Markov models
Out: HW #2

Readings: Dayne Freitag and Andrew McCallum. 2000. Information Extraction with HMM Structures Learned by Stochastic Optimization. AAAI-2000; M&S Sec. 8.1
Topics: Extracting semantic tokens (names of people, companies, prices, times, etc.) from text; use of cascades; identifying collocations and terminological phrases. Machine learning methods for IE over annotated data: AutoSlog and HMM-based techniques.
Reference: Ion Muslea. "Extraction Patterns for Information Extraction Tasks: A Survey". AAAI-99 Workshop on Machine Learning for Information Extraction.

 

 

Section: Hidden Markov Models workshop
Topics: Working through HMMs

Week 5

Monday, 28 Apr 03: POS tagging and similar sequence problems, continued
Out: PP #2

Readings: M&S Sec. 9.3.3-9.5
Topics: Other approaches to part-of-speech tagging and issues that arise: unknown words; different tagsets. Baum-Welch reestimation of HMM parameters; its limited usefulness for part-of-speech tagging, and its successful use in IE. EM as data clustering.
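Baum-Welch is EM with forward-backward expected counts over an HMM. As a smaller illustration of the same E-step/M-step loop, and of EM as soft clustering, here is a sketch fitting a mixture of two biased coins (Python; the data and initial values are invented, and this stands in for, rather than implements, Baum-Welch itself):

```python
# EM for a mixture of two biased coins: softly assign each flip sequence
# to a coin (E-step, a soft clustering), then reestimate the parameters
# from expected counts (M-step). Toy data; all numbers are invented.

data = ["HHHHT", "HHHHH", "TTHTT", "HTTTT", "HHHTH"]   # flip sequences
pA, pB, wA = 0.6, 0.4, 0.5     # initial P(heads) per coin, P(coin A)

def likelihood(seq, p):
    h = seq.count("H")
    return p ** h * (1 - p) ** (len(seq) - h)

for _ in range(20):
    # E-step: posterior responsibility of coin A for each sequence
    resp = [wA * likelihood(s, pA) /
            (wA * likelihood(s, pA) + (1 - wA) * likelihood(s, pB))
            for s in data]
    # M-step: reestimate parameters from expected head counts
    heads = [s.count("H") for s in data]
    n = len(data[0])
    pA = sum(r * h for r, h in zip(resp, heads)) / (n * sum(resp))
    pB = (sum((1 - r) * h for r, h in zip(resp, heads)) /
          (n * sum(1 - r for r in resp)))
    wA = sum(resp) / len(data)

print(round(pA, 2), round(pB, 2), round(wA, 2))
```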

 

 

Wednesday, 30 Apr 03: Conditional/discriminative models applied to sequence tasks
Due: HW #2

Topics: Conditional Markov model, maximum entropy model, and other discriminative sequence model techniques applied to part-of-speech tagging and named entity recognition.

 

 

Section: Information extraction for the web: wrapper induction and related techniques

Week 6

Monday, 5 May 03: Probabilistic context-free grammars
Out: FinalP

Readings: M&S Ch. 11 through Sec. 11.3.3
Topics: Probabilistic grammars. Calculating the probability of a string from a structured model. Choosing the highest-probability parse.
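A probabilistic CKY sketch (Python; toy PCFG in Chomsky normal form, probabilities invented) computing the probability of the most likely parse:

```python
# Probabilistic CKY: for each span and nonterminal, keep the probability
# of the best subtree. Toy PCFG; all rule probabilities are invented.

UNARY  = {("Det", "the"): 1.0, ("N", "dog"): 0.5, ("N", "cat"): 0.5,
          ("V", "chased"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 1.0,
          ("VP", "V", "NP"): 1.0}

def best_parse_prob(words):
    n = len(words)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (A, word), p in UNARY.items():
            if word == w:
                best[i][i + 1][A] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in BINARY.items():
                    if B in best[i][k] and C in best[k][j]:
                        cand = p * best[i][k][B] * best[k][j][C]
                        if cand > best[i][j].get(A, 0.0):
                            best[i][j][A] = cand
    return best[0][n].get("S", 0.0)

print(best_parse_prob("the dog chased the cat".split()))  # 0.25
```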

 

 

Wednesday, 7 May 03: Probabilistic parsing and attachment ambiguities
Due: PP #2

Readings: M&S Ch. 11 from Sec. 11.3.4; Ch. 12 through Sec. 12.1.7; Sec. 8.3
Topics: Probabilistic parsing; attachment ambiguities: prepositional phrases, conjunctions, noun compounds
References:
Eugene Charniak. 2000. A Maximum-Entropy-Inspired Parser. Proceedings of NAACL-2000.
Eugene Charniak. 1997. Statistical techniques for natural language parsing. AI Magazine.
Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. Proceedings of the Fourteenth National Conference on Artificial Intelligence. AAAI Press/MIT Press, Menlo Park.

 

 

Section: Project discussion

Week 7

Monday, 12 May 03: Building semantic representations (1)
Out: HW #3
Due: FinalP abstract

Readings: handout
Reference: J&M Ch. 15
Topics: (Typed) lambda calculus; rule-to-rule semantic translation. Term and attribute-value unification; feature grammars and unification-based parsing.
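A sketch of rule-to-rule semantic translation using Python lambdas as a stand-in for the typed lambda calculus (the tiny model and lexicon are invented for illustration):

```python
# Rule-to-rule semantics: each word gets a semantic value, and each syntax
# rule combines its daughters' values by function application. Toy fragment.

entities = {"Kim", "Sandy"}
walkers = {"Kim"}                 # denotation of "walks": the set of walkers

lexicon = {
    "Kim":    "Kim",                                  # NP: an entity
    "walks":  lambda x: x in walkers,                 # VP: entity -> bool
    "person": lambda x: x in entities,                # N: entity -> bool
    "every":  lambda restr: lambda scope:             # Det: builds a quantifier
                  all(scope(x) for x in entities if restr(x)),
}

# S -> NP VP pairs the subject and predicate by function application:
# apply [VP'] to [NP'] for a name, or [NP'] to [VP'] for a quantifier.
print(lexicon["walks"](lexicon["Kim"]))            # True
every_person = lexicon["every"](lexicon["person"]) # a generalized quantifier
print(every_person(lexicon["walks"]))              # False: Sandy doesn't walk
```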

 

 

 

Wednesday, 14 May 03: Building semantic representations (2)

Readings: handout
Reference: I. Androutsopoulos et al. Language Interfaces to Databases. http://citeseer.nj.nec.com/androutsopoulos95natural.html
Topics: Unification; rule-to-rule semantic translation; syntax-semantics interfaces; using semantic forms.
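A sketch of term unification, the operation behind feature grammars and unification-based parsing (Python; the representation and names are illustrative, and the occurs check is omitted for brevity):

```python
# First-order term unification. Variables are strings starting with '?';
# complex terms are (functor, args...) tuples; everything else is an atom.

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    while is_var(t) and t in subst:      # follow variable bindings
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Return a substitution unifying a and b, or None on failure."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None                          # clash of distinct atoms or arities

# Agreement example: unify two person/number feature bundles.
print(unify(("agr", "?Person", "sg"), ("agr", "3", "?Num")))
# {'?Person': '3', '?Num': 'sg'}
print(unify(("agr", "3", "sg"), ("agr", "3", "pl")))   # None: number clash
```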

 

 

Section: Semantic representations and logical reasoning

Week 8

Monday, 19 May 03: Building semantic representations (3)
Due: HW #3

Topics: Interfaces to knowledge representations. Lexical semantics: WordNet.

 

 

Wednesday, 21 May 03: Dialogue and discourse systems; planning and requests

Readings: handout
Reference: Gazdar and Mellish (1989), Ch. 10

 

 

Section: none

Week 9

Monday, 26 May 03: Memorial Day holiday (no class)

 

 

 

Wednesday, 28 May 03: Machine translation: rule-based and statistical approaches; sentence alignment

Readings: M&S Sec. 13.1-13.2

 

 

Section: none

Week 10

Monday, 2 Jun 03: Statistical machine translation
Due: FinalP

Readings: M&S Sec. 13.3; Kevin Knight. A Statistical MT Tutorial Workbook. ms., August 1999.
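A sketch of IBM Model 1 EM training in the spirit of Knight's workbook (Python; the toy sentence pairs are invented):

```python
# IBM Model 1 EM training: learn word translation probabilities t(f|e)
# from unaligned sentence pairs. Toy data; all names are illustrative.
from collections import defaultdict

pairs = [("the house".split(), "la maison".split()),
         ("the book".split(),  "le livre".split()),
         ("a book".split(),    "un livre".split())]

e_vocab = {e for es, _ in pairs for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))    # uniform initial t(f|e)

for _ in range(10):
    count = defaultdict(float)                 # expected counts c(f, e)
    total = defaultdict(float)
    for es, fs in pairs:
        for f in fs:
            z = sum(t[(f, e)] for e in es)     # normalize over possible e's
            for e in es:
                c = t[(f, e)] / z              # E-step: fractional alignment
                count[(f, e)] += c
                total[e] += c
    for (f, e), c in count.items():            # M-step: reestimate t(f|e)
        t[(f, e)] = c / total[e]

print(round(t[("livre", "book")], 2))          # tends toward 1.0
```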

 

 

Wednesday, 4 Jun 03: Project mini-presentations

 

 

 

Finals Period - time to visit the beach!