Calendar

Week of 4/3:   Wed 4/5 - Lecture 1: Intro
Week of 4/10:  Mon 4/10 - Lecture 2: N-gram Models;  Wed 4/12 - Lecture 3: StatMT;  Fri 4/14 - Section 1: Smoothing
Week of 4/17:  Mon 4/17 - Lecture 4: StatMT & EM;  Wed 4/19 - Lecture 5: StatMT Systems (PA1 due);  Fri 4/21 - Section 2: EM
Week of 4/24:  Mon 4/24 - Lecture 6: WSD & NB Models;  Wed 4/26 - Lecture 7: MaxEnt Classifiers;  Fri 4/28 - Section 3: MaxEnt
Week of 5/1:   Mon 5/1 - Lecture 8: MaxEnt Classifiers II;  Wed 5/3 - Lecture 9: CFG Parsing (PA2 due);  Fri 5/5 - Section 4: Corpora
Week of 5/8:   Mon 5/8 - Lecture 10: DPs for Parsing;  Wed 5/10 - Lecture 11: PCFGs;  Fri 5/12 - Section 5: Parsing & PCFGs
Week of 5/15:  Mon 5/15 - Lecture 12: StatParsers;  Wed 5/17 - Lecture 13: POS tagging (PA3 due)
Week of 5/22:  Mon 5/22 - Lecture 14: NER & IE;  Wed 5/24 - Lecture 15: ComSem
Week of 5/29:  Mon 5/29 - Memorial Day (no class);  Wed 5/31 - Lecture 16: ComSem II
Week of 6/5:   Mon 6/5 - Lecture 17: QA Systems;  Wed 6/7 - Lecture 18: Dialog & Discourse (Final project due)
Week of 6/12:  Wed 6/14, 8:30am - 11:30am - Final project presentations


Syllabus

Lecture 1
Wed
4/5/06
Introduction [slides: pdf, ps]
Overview of NLP. Statistical machine translation. Language models and their role in speech processing. Course introduction and administration.
Good background reading: M&S 1.0-1.3, 4.1-4.2; see also the Collaboration Policy
Optional reading: Ken Church's tutorial Unix for Poets [ps, pdf]
(If your knowledge of probability theory is limited, also read M&S 2.0-2.1.7. If that's too condensed, read the probability chapter of an intro statistics textbook, e.g. Rice, Mathematical Statistics and Data Analysis, ch. 1.)
Distributed today: Programming Assignment 1
Lecture 2
Mon
4/10/06
N-gram Language Models and Information Theory [slides: ps, pdf] [MegaHal]
n-gram models. Entropy, relative entropy, cross entropy, mutual information, perplexity. Statistical estimation and smoothing for language models.
Assigned reading: M&S 1.4, 2.2, ch. 6.
Optional reading: Joshua Goodman (2001), A Bit of Progress in Language Modeling, Extended Version [pdf, ps]
Optional reading: Stanley Chen and Joshua Goodman (1998), An empirical study of smoothing techniques for language modeling [pdf, ps]
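For concreteness, here is a minimal sketch (not course code) of the kind of model this lecture introduces: a bigram language model with add-one (Laplace) smoothing, evaluated by perplexity on a held-out string. The class name, toy corpus, and test sentence are invented; the smoothing methods compared in the Chen & Goodman reading work much better than add-one.

    import java.util.*;

    // Toy add-one-smoothed bigram model with perplexity; illustration only.
    public class BigramDemo {
        static Map<String, Integer> unigram = new HashMap<>();
        static Map<String, Integer> bigram = new HashMap<>();
        static Set<String> vocab = new HashSet<>();

        static void train(String[] tokens) {
            for (int i = 0; i < tokens.length; i++) {
                vocab.add(tokens[i]);
                unigram.merge(tokens[i], 1, Integer::sum);
                if (i + 1 < tokens.length)
                    bigram.merge(tokens[i] + " " + tokens[i + 1], 1, Integer::sum);
            }
        }

        // P(w2 | w1) with add-one smoothing: (c(w1 w2) + 1) / (c(w1) + |V|)
        static double prob(String w1, String w2) {
            int b = bigram.getOrDefault(w1 + " " + w2, 0);
            return (b + 1.0) / (unigram.getOrDefault(w1, 0) + vocab.size());
        }

        public static void main(String[] args) {
            train("<s> the cat sat on the mat </s> <s> the dog sat </s>".split(" "));
            String[] test = "<s> the dog sat on the mat </s>".split(" ");
            double logProb = 0.0;
            for (int i = 1; i < test.length; i++)        // score each test bigram
                logProb += Math.log(prob(test[i - 1], test[i]));
            // Perplexity = exp(-(1/N) * sum of log probabilities)
            System.out.println("Perplexity: " + Math.exp(-logProb / (test.length - 1)));
        }
    }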
Lecture 3
Wed
4/12/06
Statistical Machine Translation (MT), Alignment Models [slides: ppt, pdf, ps]
Assigned reading: Kevin Knight, A Statistical MT Tutorial Workbook [rtf]. MS., August 1999. (see also the relevant FAQ)
Further reading: M&S 13
Section 1
Fri
4/14/06
Smoothing [notes: xls]
Smoothing: absolute discounting, proving you have a proper probability distribution, Good-Turing implementation. Information theory examples and intuitions. Java implementation issues.
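As a rough illustration of absolute discounting (not the section's actual notes): subtract a fixed discount D from every observed bigram count and hand the freed mass to a unigram back-off distribution. The counts, discount value, and class name below are invented; note that the resulting estimates sum to one, which is exactly the "proper probability distribution" check mentioned above.

    import java.util.*;

    // Illustrative absolute discounting for a bigram model: subtract a fixed
    // discount D from every seen bigram count and back off to the unigram
    // distribution with the leftover mass.  Toy data, invented names.
    public class AbsoluteDiscountDemo {
        public static void main(String[] args) {
            double D = 0.75;
            // Counts for the context "the": c(the, cat)=2, c(the, dog)=1
            Map<String, Integer> counts = Map.of("cat", 2, "dog", 1);
            int contextCount = 3;                      // c(the) = sum of bigram counts
            Map<String, Double> unigram = Map.of("cat", 0.4, "dog", 0.4, "fish", 0.2);

            // Mass freed by discounting = D * (distinct continuations) / c(the)
            double alpha = D * counts.size() / contextCount;

            for (String w : unigram.keySet()) {
                double discounted = Math.max(counts.getOrDefault(w, 0) - D, 0) / contextCount;
                double p = discounted + alpha * unigram.get(w);   // interpolated estimate
                System.out.printf("P(%s | the) = %.4f%n", w, p);
            }
            // The three probabilities sum to 1 because the unigram distribution does.
        }
    }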
Lecture 4
Mon
4/17/06
Statistical Alignment Models and Expectation Maximization (EM) [slides: pdf, spreadsheet: xls]
EM and its use in statistical MT alignment models.
Reference reading: Geoffrey J. McLachlan and Thriyambakam Krishnan. 1997. The EM Algorithm and Extensions. Wiley.
Further reading: Moore, Robert C. 2005. Association-Based Bilingual Word Alignment. In Proceedings, Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, Ann Arbor, Michigan, pp. 1-8.
Moore, Robert C. 2004. Improving IBM Word Alignment Model 1. In Proceedings, 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 519-526.
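A stripped-down sketch of EM for word alignment in the spirit of IBM Model 1, as walked through in Knight's tutorial workbook: the E-step collects fractional co-occurrence counts under the current t(f|e) estimates, and the M-step renormalizes them. The two-sentence corpus, uniform initialization, and omission of the NULL word are simplifying assumptions.

    import java.util.*;

    // EM for IBM-Model-1-style word translation probabilities t(f|e),
    // on a two-sentence toy corpus.  Heavily simplified: no NULL word.
    public class Model1EMDemo {
        public static void main(String[] args) {
            String[][] eSents = {{"the", "house"}, {"the", "book"}};
            String[][] fSents = {{"la", "maison"}, {"le", "livre"}};

            Map<String, Double> t = new HashMap<>();          // key: f + "|" + e
            for (int s = 0; s < eSents.length; s++)
                for (String e : eSents[s]) for (String f : fSents[s])
                    t.put(f + "|" + e, 0.25);                 // uniform initialization

            for (int iter = 0; iter < 10; iter++) {
                Map<String, Double> count = new HashMap<>();  // expected counts c(f,e)
                Map<String, Double> total = new HashMap<>();  // expected counts c(e)
                // E-step: fractional counts from current t(f|e)
                for (int s = 0; s < eSents.length; s++) {
                    for (String f : fSents[s]) {
                        double z = 0;
                        for (String e : eSents[s]) z += t.get(f + "|" + e);
                        for (String e : eSents[s]) {
                            double c = t.get(f + "|" + e) / z;
                            count.merge(f + "|" + e, c, Double::sum);
                            total.merge(e, c, Double::sum);
                        }
                    }
                }
                // M-step: renormalize to get new t(f|e)
                for (String key : t.keySet())
                    t.put(key, count.getOrDefault(key, 0.0) / total.get(key.split("\\|")[1]));
            }
            t.forEach((k, v) -> System.out.printf("t(%s) = %.3f%n", k, v));
        }
    }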
Lecture 5
Wed
4/19/06
Putting together a complete statistical MT system [slides: pdf]
Decoding and A* Search. Recent work in statistical MT.
Further reading: Brown, Della Pietra, Della Pietra, and Mercer, The Mathematics of Statistical Machine Translation: Parameter Estimation [pdf, pdf]. Computational Linguistics.
Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. 2001. Fast Decoding and Optimal Decoding for Machine Translation. ACL.
K. Yamada and K. Knight. 2002. A Decoder for Syntax-Based Statistical MT. ACL.
David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. ACL 2005, pages 263-270.
Due today: Programming Assignment 1
Distributed today: Programming Assignment 2
Section 2
Fri
4/21/06
The EM algorithm [notes: xls]
Lecture 6
Mon
4/24/06
Word Sense Disambiguation (WSD) and Naïve Bayes (NB) Models [slides: pdf]
Information sources, performance bounds, dictionary methods, supervised machine learning methods, Naïve Bayes classifiers.
Assigned Reading: M&S Ch. 7.
Reference: Computational Linguistics 24(1), 1998. Special issue on Word Sense Disambiguation.
Proceedings of Senseval-3: The Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text
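To make the model concrete, a bare-bones Naive Bayes sense classifier over bag-of-context-words features, with add-one smoothing and log-space scoring. The two senses of "bass" and all of the counts are invented for illustration.

    import java.util.*;

    // Toy Naive Bayes disambiguator for "bass" (FISH vs. MUSIC) from context words.
    // Invented counts; add-one smoothing; decisions made in log space.
    public class NaiveBayesWSDDemo {
        public static void main(String[] args) {
            String[] senses = {"FISH", "MUSIC"};
            double[] prior = {0.5, 0.5};
            // Context-word counts per sense (made up for illustration)
            Map<String, int[]> counts = new HashMap<>();
            counts.put("river",  new int[]{10, 0});
            counts.put("guitar", new int[]{0, 12});
            counts.put("play",   new int[]{1, 8});
            counts.put("catch",  new int[]{7, 1});
            int[] senseTotals = {18, 21};      // total context tokens seen per sense
            int vocabSize = counts.size();

            String[] context = {"play", "river", "catch"};
            for (int s = 0; s < senses.length; s++) {
                double score = Math.log(prior[s]);
                for (String w : context) {
                    int c = counts.containsKey(w) ? counts.get(w)[s] : 0;
                    score += Math.log((c + 1.0) / (senseTotals[s] + vocabSize));   // add-one
                }
                System.out.printf("log P(%s, context) = %.3f%n", senses[s], score);
            }
            // The sense with the higher score wins; here the counts favor FISH.
        }
    }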
Lecture 7
Wed
4/26/06
Maximum Entropy Classifiers [slides: pdf]
Assigned Reading: class slides.
Other references: Adwait Ratnaparkhi. A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
M&S section 16.2
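The classification rule itself is simple once the model is trained: a maximum entropy classifier sets P(c | x) proportional to exp of the summed weights of the features active for class c. The sketch below does only this scoring step, with invented feature weights; estimating the weights is what the lecture and readings cover.

    import java.util.*;

    // Scoring with a (pretend, already-trained) maximum entropy classifier:
    // P(c | x) = exp(sum_i lambda_i * f_i(x, c)) / Z(x).  Weights are invented.
    public class MaxentScoreDemo {
        public static void main(String[] args) {
            String[] classes = {"PERSON", "LOCATION"};
            // Feature weights lambda_i, keyed by "featureName&class"
            Map<String, Double> lambda = Map.of(
                "word=Washington&PERSON",   0.3,
                "word=Washington&LOCATION", 0.9,
                "prevWord=in&LOCATION",     1.2,
                "prevWord=in&PERSON",      -0.4);

            // Active (binary) features for one datum: the word and its left neighbor
            String[] activeFeatures = {"word=Washington", "prevWord=in"};

            double[] score = new double[classes.length];
            for (int c = 0; c < classes.length; c++)
                for (String f : activeFeatures)
                    score[c] += lambda.getOrDefault(f + "&" + classes[c], 0.0);

            double z = 0;                                    // normalizer Z(x)
            for (double s : score) z += Math.exp(s);
            for (int c = 0; c < classes.length; c++)
                System.out.printf("P(%s | x) = %.3f%n", classes[c], Math.exp(score[c]) / z);
        }
    }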
Section 3
Fri
4/28/06
Maximum entropy models [notes: pdf, xls]
Lecture 8
Mon
5/1/06
Maximum Entropy Classifiers, Part II [slides: pdf]
Assigned Reading: class slides.
Other references: Adwait Ratnaparkhi. A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
M&S section 16.2
Adam Berger, A Brief Maxent Tutorial
Distributed today: Final project guide
Lecture 9
Wed
5/3/06
Parsing for Context-Free Grammars (CFGs) [slides: pdf]
Top-down parsing, bottom-up parsing, empty constituents, left recursion.
Background reading: M&S 3 (if you haven't done any linguistics courses) or J&M ch. 9
Optional reading: J&M ch. 10
Due today: Programming Assignment 2
Distributed today: Programming Assignment 3
Section 4
Fri
5/5/06
Corpora and other resources [notes: txt]
Lecture 10
Mon
5/8/06
Dynamic Programming for Parsing [handout: pdf]
Dynamic programming methods, chart parsing, the CKY algorithm.
Optional reading: J&M ch. 10
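A minimal CKY recognizer for a toy grammar in Chomsky normal form, to make the chart concrete: each cell holds the nonterminals derivable over one span of the sentence. The grammar, sentence, and class name are invented, and no probabilities or backpointers are kept.

    import java.util.*;

    // CKY recognition for a toy CNF grammar.  chart[i][j] holds the set of
    // nonterminals that can derive words i..j-1 (0-based, half-open spans).
    public class CKYDemo {
        public static void main(String[] args) {
            // Binary rules A -> B C, stored as "B C" -> parents
            Map<String, List<String>> binary = Map.of(
                "NP VP", List.of("S"),
                "V NP",  List.of("VP"),
                "Det N", List.of("NP"));
            // Lexical rules A -> word
            Map<String, List<String>> lexical = Map.of(
                "the", List.of("Det"), "dog", List.of("N"),
                "cat", List.of("N"),   "saw", List.of("V"));

            String[] words = {"the", "dog", "saw", "the", "cat"};
            int n = words.length;
            Set<String>[][] chart = new HashSet[n + 1][n + 1];
            for (int i = 0; i <= n; i++)
                for (int j = 0; j <= n; j++) chart[i][j] = new HashSet<>();

            for (int i = 0; i < n; i++)                      // fill in width-1 spans
                chart[i][i + 1].addAll(lexical.getOrDefault(words[i], List.of()));

            for (int width = 2; width <= n; width++)         // longer spans, bottom up
                for (int i = 0; i + width <= n; i++)
                    for (int k = i + 1; k < i + width; k++)  // split point
                        for (String b : chart[i][k])
                            for (String c : chart[k][i + width])
                                chart[i][i + width].addAll(
                                    binary.getOrDefault(b + " " + c, List.of()));

            System.out.println("Sentence grammatical? " + chart[0][n].contains("S"));
        }
    }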
Lecture 11
Wed
5/10/06
Probabilistic Context-Free Grammars (PCFGs) [slides: pdf (probparse), pdf (search), pdf (unlexicalized)]
PCFGs, finding the most likely parse, refining PCFGs. Other questions for PCFGs: the inside-outside algorithm, and learning PCFGs.
Assigned reading: M&S Ch. 11
Due today: final project proposals
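One concrete point from the lecture: under a PCFG, a tree's probability is the product of the probabilities of the rules it uses, so finding the most likely parse means maximizing that product over derivations. The sketch below compares two hand-written derivations of a PP-attachment ambiguity; the grammar and all numbers are invented.

    import java.util.*;

    // Probability of a parse tree under a PCFG = product of its rule probabilities.
    // Two hand-written derivations of a PP-attachment ambiguity are compared.
    public class PCFGTreeProbDemo {
        static Map<String, Double> ruleProb = Map.of(
            "S -> NP VP",  1.0,
            "VP -> V NP",  0.6,
            "VP -> VP PP", 0.4,
            "NP -> NP PP", 0.2,
            "NP -> Det N", 0.8,
            "PP -> P NP",  1.0);

        static double treeProb(String[] rulesUsed) {
            double p = 1.0;
            for (String r : rulesUsed) p *= ruleProb.get(r);
            return p;
        }

        public static void main(String[] args) {
            // "saw the dog with the telescope": attach the PP to the VP or to the object NP
            String[] vpAttach = {"S -> NP VP", "VP -> VP PP", "VP -> V NP",
                                 "NP -> Det N", "PP -> P NP", "NP -> Det N"};
            String[] npAttach = {"S -> NP VP", "VP -> V NP", "NP -> NP PP",
                                 "NP -> Det N", "PP -> P NP", "NP -> Det N"};
            System.out.printf("P(VP attachment) = %.4f%n", treeProb(vpAttach));
            System.out.printf("P(NP attachment) = %.4f%n", treeProb(npAttach));
            // The most likely parse is simply the derivation with the higher product
            // (lexical rule probabilities are omitted here for brevity).
        }
    }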
Section 5
Fri
5/12/06
Parsing, PCFGs [notes: pdf]
Lecture 12
Mon
5/15/06
Modern Statistical Parsers [slides: see last time, and pdf]
Parsing for disambiguation, weakening independence assumptions, lexicalization, search methods, Charniak's parser, probabilistic left corner grammars, parser evaluation.
Assigned reading: M&S 8.3, 12
Optional readings:
Lecture 13
Wed
5/17/06
Part of Speech Tagging and Sequence Inference [slides: pdf]
Parts of speech and the tagging problem: sources of evidence; easy and difficult cases. Probabilistic sequence inference: Hidden Markov Models (HMMs), Conditional Markov Models (CMMs), and the Viterbi algorithm.
Assigned reading: M&S Ch. 10, pp. 341-356.
Further reading on HMMs: M&S Ch. 9.
HMM POS tagger: Thorsten Brants, TnT - A Statistical Part-of-Speech Tagger, ANLP 2000.
CMM POS tagger: Kristina Toutanova and Christopher D. Manning. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. EMNLP 2000.
Due today: Programming Assignment 3
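A compact sketch of the Viterbi algorithm for a two-tag HMM: delta[i][j] is the best score of any tag sequence ending in tag j at word i, and backpointers recover the argmax sequence. All probabilities are invented, and a real tagger would estimate them from a treebank and compute in log space.

    // Viterbi decoding for a tiny two-tag HMM POS tagger.  All probabilities are
    // invented; real taggers estimate them from a treebank and work in log space.
    public class ViterbiDemo {
        public static void main(String[] args) {
            String[] tags = {"DT", "NN"};
            String[] words = {"the", "dog", "barks"};   // pretend "barks" is a noun here
            double[] start = {0.8, 0.2};                 // P(tag at position 0)
            double[][] trans = {{0.1, 0.9},              // P(next tag | DT)
                                {0.4, 0.6}};             // P(next tag | NN)
            // Emission P(words[i] | tag j), indexed [tag][position] for this sentence
            double[][] emit = {{0.7, 0.0, 0.0},
                               {0.0, 0.5, 0.3}};

            int n = words.length, k = tags.length;
            double[][] delta = new double[n][k];         // best score ending in tag j at i
            int[][] back = new int[n][k];                // backpointers

            for (int j = 0; j < k; j++) delta[0][j] = start[j] * emit[j][0];
            for (int i = 1; i < n; i++)
                for (int j = 0; j < k; j++)
                    for (int p = 0; p < k; p++) {
                        double score = delta[i - 1][p] * trans[p][j] * emit[j][i];
                        if (score > delta[i][j]) { delta[i][j] = score; back[i][j] = p; }
                    }

            // Follow backpointers from the best final state (two tags, so a simple compare)
            int best = delta[n - 1][0] > delta[n - 1][1] ? 0 : 1;
            String[] out = new String[n];
            for (int i = n - 1; i >= 0; i--) { out[i] = tags[best]; best = back[i][best]; }
            System.out.println(String.join(" ", out));   // expected: DT NN NN
        }
    }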
Lecture 14
Mon
5/22/06
Named Entity Recognition (NER) and Information Extraction (IE) [slides: pdf]
Evaluation reading: M&S 8.1
HMMs for IE reading: Dayne Freitag and Andrew McCallum (2000), Information Extraction with HMM Structures Learned by Stochastic Optimization, AAAI-2000
Maxent NER reading: Jenny Finkel et al., 2005. Exploring the Boundaries: Gene and Protein Identification in Biomedical Text
Background IE reading: Ion Muslea (1999), Extraction Patterns for Information Extraction Tasks: A Survey [pdf, ps], AAAI-99 Workshop on Machine Learning for Information Extraction.
Background IE reading: Douglas E. Appelt. 1999. Introduction to Information Extraction Technology
Lecture 15
Wed
5/24/06
Compositional Semantics [slides: pdf]
Semantic representations, lambda calculus, compositionality, syntax/semantics interfaces, logical reasoning.
Assigned reading: An Informal but Respectable Approach to Computational Semantics [pdf, ps]
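A small worked derivation of the sort this lecture discusses, in standard lambda-calculus notation; the lexical entries are simplified assumptions:

    [[Mary]]            = mary
    [[sleeps]]          = λx. sleeps(x)
    [[loves]]           = λy. λx. loves(x, y)

    [[loves Mary]]      = (λy. λx. loves(x, y))(mary)   beta-reduces to   λx. loves(x, mary)
    [[John loves Mary]] = (λx. loves(x, mary))(john)    beta-reduces to   loves(john, mary)

Each syntactic combination step is function application followed by beta-reduction, which is what makes the semantics compositional.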
Mon
5/29/06
Memorial Day
no class
Lecture 16
Wed
5/31/06
Compositional Semantics, Part II [slides: see last time]
Further reading: I. Androutsopoulos et al., Language Interfaces to Databases
Luke S. Zettlemoyer and Michael Collins. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the Twenty First Conference on Uncertainty in Artificial Intelligence (UAI-05), 2005.
Lecture 17
Mon
6/5/06
Question Answering (QA) [handout: pdf]
TREC-style robust QA, natural language database interfaces
Assigned reading: Marius Pasca, Sanda M. Harabagiu. High Performance Question/Answering. SIGIR 2001: 366-374.
Lecture 18
Wed
6/7/06
Dialog & Discourse Systems [handout: pdf]
Rhetorical structure, planning and requests.
Assigned reading: handout
Optional reading: Gazdar & Mellish ch. 10
Due today: Final project reports
Wednesday
6/14/06
8:30am - 11:30am
Final Project Presentations
Students will give short (~5 min) presentations on their final projects during the time slot allocated for a final exam.