CS 224N / Ling 284 — Natural Language Processing

Calendar

Mon	Tue	Wed	Thu	Fri
Sep 26 Lecture 1: Introduction	Sep 27	Sep 28 Lecture 2: N-gram Models	Sep 29	Sep 30
Oct 3 Lecture 3: Statistical MT: Word Alignment	Oct 4	Oct 5 Lecture 4: EM (and StatMT)	Oct 6	Oct 7 Section 1: Smoothing
Oct 10 Lecture 5: StatMT Systems	Oct 11	Oct 12 PA1 due Lecture 6: Phrase-based & syntactic MT	Oct 13	Oct 14 Section 2: PA2 & EM
Oct 17 Lecture 7: IE/NER & NB Models	Oct 18	Oct 19 Lecture 8: MaxEnt Classifiers	Oct 20	Oct 21 Section 3: Corpora
Oct 24 Lecture 9: Sequence classifiers & IE	Oct 25	Oct 26 PA2 due Lecture 10: POS tagging and chunking	Oct 27	Oct 28 Section 4: MaxEnt
Oct 31 Lecture 11: Syntax & Parsing	Nov 1	Nov 2 Final project proposal due Lecture 12: DPs for Parsing	Nov 3	Nov 4 Section 5: Parsing & PCFGs
Nov 7 Lecture 13: Lexicalized PCFGs	Nov 8	Nov 9 PA3 due Lecture 14: Statistical Parsers	Nov 10	Nov 11
Nov 14 Lecture 15: Lexical Semantics	Nov 15	Nov 16 Lecture 16: Coreference	Nov 17	Nov 18
Nov 21 Thanksgiving	Nov 22	Nov 23 Thanksgiving	Nov 24	Nov 25
Nov 28 Lecture 17: Computational Semantics	Nov 29	Nov 30 Lecture 18: Computational Semantics II	Dec 1	Dec 2
Dec 5 Lecture 19: Semantic role labeling	Dec 6	Dec 7 Final project / PA4 due Lecture 20: QA & Inference	Dec 8	Dec 9
Dec 12	Dec 13 3:30pm - 6:30pm Final project presentations	Dec 14	Dec 15	Dec 16

Syllabus

Lecture 1
Mon
1/3/11 Introduction [slides: pdf; pdf1up] Overview of NLP. Statistical machine translation. Language models and their role in speech processing. Course introduction and administration.
No required reading.
Optional good background reading: J&M Ch. 1; M&S 1.0-1.3, 4.1-4.2, Collaboration Policy
Optional reading on Unix text manipulation (useful skill!): Ken Church's tutorial Unix for Poets [ps, pdf]
Background for MT video [fun read!]: The IBM 701 translator (1954)
(If your knowledge of probability theory is limited, also read M&S 2.0-2.1.7. If that's too condensed, read the probability chapter of an intro statistics textbook, e.g. Rice, Mathematical Statistics and Data Analysis, ch. 1.)
Distributed today: Programming Assignment 1

Lecture 2
Wed
1/5/11

N-gram Language Models and Information Theory [slides: pdf; pdf1up; MegaHal: html]
n-gram models. Statistical estimation and smoothing for language models. Entropy, cross entropy, mutual information, perplexity.
Assigned reading: J&M ch. 4
Alternative reading: M&S 1.4, 2.2, ch. 6.
Tutorial reading: Kevin Knight. A Statistical MT Tutorial Workbook [pdf] [rtf]. MS., August 1999. Sections 1-14.
Optional advanced reading: Joshua Goodman (2001), A Bit of Progress in Language Modeling, Extended Version [pdf, ps]
Optional advanced reading: (older but shorter) Stanley Chen and Joshua Goodman (1998), An empirical study of smoothing techniques for language modeling [pdf, ps]
Optional very advanced reading: Teh, Yee Whye. 2006. A Hierarchical Bayesian Language Model based on Pitman-Yor Processes. EMNLP 2006. [pdf]

Lecture 3
Mon
1/10/11

Statistical Machine Translation (MT): Word Alignment Models [slides: pdf]
Assigned reading: J&M ch. 25, sections 25.0-25.6, 25.11.

Lecture 4
Wed
1/12/11

Expectation Maximization (EM) and Statistical Alignment Models [slides: pdf, pdf-1up, spreadsheet: Google Docs]
EM and its use in statistical MT alignment models.
Assigned reading: Kevin Knight. A Statistical MT Tutorial Workbook [pdf] [rtf]. MS., August 1999. Sections 15-37 (get the free beer!).
(read also the relevant Knight Workbook FAQ)
Reference reading: Geoffrey J. McLachlan and Thriyambakam Krishnan. 1997. The EM Algorithm and Extensions. Wiley
Optional further reading: M&S 13.
Moore, Robert C. 2005. Association-Based Bilingual Word Alignment. In Proceedings, Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, Ann Arbor, Michigan , pp. 1-8.
Moore, Robert C. 2004. Improving IBM Word Alignment Model 1. In Proceedings, 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 519-526.

Section 1
Fri
1/14/11

Smoothing [notes: ppt used in the section; original xls ]
Smoothing: absolute discounting, proving you have a proper probability distribution, Good-Turing implementation. Information theory examples and intuitions. Java implementation issues.

Mon
1/17/11

Martin Luther King Day
no class

Lecture 5
Wed
1/19/11

Putting together a complete statistical MT system [slides: pdf]
IBM Word alignment models. MT evaluation. Decoding and Search.
Required reading: J&M, secs 25.7-10, 25.12.
Reference: "Seminal" background reading: Brown, Della Pietra, Della Pietra, and Mercer, 2003, The Mathematics of Statistical Machine Translation: Parameter Estimation [pdf, pdf]. Computational Linguistics.
[After their work in speech and language technology, the team turned to finance.... (the original article from Bloomberg has long since disappeared...)]
Further references:
Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. 2001. Fast Decoding and Optimal Decoding for Machine Translation. ACL.
Due today: Programming Assignment 1
Distributed today: Programming Assignment 2

Section 2
Fri
1/21/11

PA2 & EM algorithm [notes: ppt used in the section]

Lecture 6
Mon
1/24/11

MT systems. Decoding. Phrased-based and syntactic MT. Real world MT. [6-up slides: pdf] [1-up slides: pdf]
Decoding. Recent work in statistical MT: statistical phrase based systems and syntax in MT. MT in practice.
Required reading: J&M, secs 25.7-10, 25.12.
Further references:
Franz Josef Och, Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics 30(4): 417-449.
K. Yamada and K. Knight. 2002. A Decoder for Syntax-Based Statistical MT. ACL.
David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. ACL 2005, pages 263-270.

Lecture 7
Wed
1/26/11

Information Extraction (IE) and Named Entity Recognition (NER). [slides: pdf]
Information sources, rule-based methods, evaluation (recall, precision). Introduction to supervised machine learning methods. Naïve Bayes (NB) classifiers for entity classification.
Assigned reading:
J&M secs 22.0-22.1 (intro to IE and NER).
J&M secs. 5.5 and 5.7 (introduce HMMs, Viterbi algorithm, and experimental technique). If you're not familiar with supervised classification and Naive Bayes, read J&M sec 20.2 before the parts of ch. 5.
Alternative reading: M&S 8.1 (evaluation), 7.1 (experimental metholdology), 7.2.1 (Naive Bayes), 10.2-10.3 (HMMs and Viterbi)
Background IE reading:
Recent Wired article on Google's search result ranking (but don't completely swallow the hype: click through on the mike siwek lawyer mi query, and read a couple of the top hits in the search results).
Sunita Sarawagi. 2008. Information Extraction. Foundations and Trends in Databases 1(3): 261-377. http:/dx.doi.org/10.1561/1900000003
Peter Jackson and Isabelle Moulinier. 2007. Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization. John Benjamins. 2nd edition. Ch. 3.
Ion Muslea (1999), Extraction Patterns for Information Extraction Tasks: A Survey [pdf, ps], AAAI-99 Workshop on Machine Learning for Information Extraction.
Douglas E. Appelt. 1999. Introduction to Information Extraction Technology

Section 3
Fri
1/28/11

Final Project, Corpora and other resources [notes: ppt, Project Descriptions]

Lecture 8
Mon
1/31/11

Maximum Entropy Classifiers [slides: pdf, pdf1up]
Assigned Reading:
class slides.
J&M secs 6.6-7 (maximum entropy models)
Additional references:
M&S section 16.2
Adwait Ratnaparkhi. A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.

Lecture 9
Wed
2/2/11

Maximum Entropy Sequence Classifiers and Information Extraction [slides: pdf]
Assigned Reading:
class slides.
J&M secs. 6.0-6.4 and 6.8-6.9 (HMMs in detail and then MEMMs), and 22.2, 22.4 (IE).
Other references: Adwait Ratnaparkhi. A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
Adam Berger, A Brief Maxent Tutorial
HMMs for IE reading: Dayne Freitag and Andrew McCallum (2000), Information Extraction with HMM Structures Learned by Stochastic Optimization, AAAI-2000
Maxent NER reading: Jenny Finkel et al., 2005. Exploring the Boundaries: Gene and Protein Identification in Biomedical Text
Due today: Programming Assignment 2
Distributed today: Programming Assignment 3
Distributed today: Final project guide

Section 4
Fri
2/4/11

Maximum entropy sequence models [notes: pdf, xls]

Lecture 10
Mon
2/7/11

Syntax and Parsing for Context-Free Grammars (CFGs) [slides: 6-up pdf] [slides: 1-up pdf]
Parsing, treebanks, attachment ambiguities. Context-free grammars. Top-down and bottom-up parsing, empty constituents, left recursion, and repeated work. Probabilistic CFGs.
Assigned reading: J&M ch. 13, secs. 13.0-13.3.
Background reading: J&M ch. 9 (or M&S ch. 3). This is especially if you haven't done any linguistics courses, but even if you have, there's useful information on treebanks and part-of-speech tag sets used in NLP.

Lecture 11
Wed
2/9/11

Dynamic Programming for Parsing [slides: 6-up pdf] [slides: 1-up pdf] Dynamic programming for parsing. The CKY algorithm. Accurate unlexicalized PCFG parsing.
Assigned reading: J&M sec. 13.4
Additional information: Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. ACL 2003, pp. 423-430.
Due today: final project proposals

Section 5
Fri
2/11/11

Parsing, PCFGs [notes: pdf]

Lecture 12
Mon
2/14/11

Lexicalized Probabilistic Context-Free Grammars (LPCFGs) [6-up slides: pdf] [1-up slides: pdf]
Lexicalization and lexicalized parsing. The Charniak, Collins/Bikel, and Petrov & Klein parsers.
Assigned reading: J&M ch. 14 (you can stop at the end of sec. 14.7, if you'd like!)
Alternative reading: M&S Ch. 11
Optional readings:

Eugene Charniak (1997), Statistical techniques for natural language parsing, AI Magazine.
Eugene Charniak (1997), Statistical parsing with a context-free grammar and word statistics, Proceedings of the Fourteenth National Conference on Artificial Intelligence. AAAI Press/MIT Press, Menlo Park (1997).
Eugene Charniak (2000), A Maximum-Entropy-Inspired Parser, Proceedings of NAACL-2000.

Lecture 13
Wed
2/16/11

Modern Statistical Parsers [6-up slides: pdf] [1-up slides: pdf]
Search methods in parsing: Agenda-based chart, A*, and "best-first" parsing. Dependency parsing. Discriminative parsing. Assigned reading: J&M ch. 14 (you can stop at the end of sec. 14.7, if you'd like!)
Alternative, less up-to-date reading: M&S 8.3, 12

Dan Klein and Christopher D. Manning. 2003. Factored A* Search for Models over Sequences and Trees. IJCAI 2003.
Dan Klein and Christopher D. Manning. 2003. A* Parsing: Fast Exact Viterbi Parse Selection. HLT-NAACL 2003.
Kristina Toutanova, Christopher D. Manning, Stuart M. Shieber, Dan Flickinger, and Stephan Oepen. 2002. Parse Disambiguation for a Rich HPSG Grammar. First Workshop on Treebanks and Linguistic Theories (TLT2002), pp. 253-263. Sozopol, Bulgaria.
Kristina Toutanova, Christopher D. Manning, Dan Flickinger, and Stephan Oepen. 2005. Stochastic HPSG Parse Disambiguation using the Redwoods Corpus. Research in Language and Computation 2005.
B. Taskar, D. Klein, M. Collins, D. Koller and C. Manning. Max-Margin Parsing. Empirical Methods in Natural Language Processing (EMNLP04), Barcelona, Spain, July 2004. Received best paper award.
Eugene Charniak and Mark Johnson (2005). Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking, Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)
Ryan McDonald, Koby Crammer and Fernando Pereira (2005). Online Large-Margin Training of Dependency Parsers. 43rd Annual Meeting of the Association for Computational Linguistics, ACL 2005.