| Mon | Tue | Wed | Thu | Fri |
| 3/30 | 3/31 | 4/1 Lecture 1: Intro | 4/2 | 4/3 |
| 4/6 Lecture 2: N-gram Models | 4/7 | 4/8 Lecture 3: StatMT | 4/9 | 4/10 Section 1: Smoothing |
| 4/13 Lecture 4: StatMT & EM | 4/14 | 4/15 PA1 due; PA2 out; Lecture 5: StatMT Systems | 4/16 | 4/17 Section 2: EM |
| 4/20 Lecture 6: IE/NER & NB Models | 4/21 | 4/22 Lecture 7: MaxEnt Classifiers | 4/23 | 4/24 Section 3: Corpora |
| 4/27 Lecture 8: MaxEnt Sequence Classifiers | 4/28 | 4/29 PA2 due; PA3 out; Lecture 9: IE and text mining | 4/30 | 5/1 Section 4: MaxEnt |
| 5/4 Lecture 10: Syntax & Parsing | 5/5 | 5/6 Final project proposal due; Lecture 11: DPs for Parsing | 5/7 | 5/8 Section 5: Parsing & PCFGs |
| 5/11 Lecture 12: PCFGs | 5/12 | 5/13 PA3 due; Lecture 13: StatParsers | 5/14 | 5/15 |
| 5/18 Lecture 14: Semantic Role Labeling | 5/19 | 5/20 Lecture 15: ComSem | 5/21 | 5/22 |
| 5/25 Memorial Day | 5/26 | 5/27 Lecture 16: ComSem II | 5/28 | 5/29 |
| 6/1 Lecture 17: Lexical Semantics | 6/2 | 6/3 Final project due; Lecture 18: QA & Inference | 6/4 | 6/5 |
| 6/8 | 6/9 Final project presentations | 6/10 | 6/11 | 6/12 |
| Lecture 1 Wed 4/1/09 |
Introduction
[slides: pdf]
Overview of NLP. Statistical machine translation. Language models and their role in speech processing. Course introduction and administration.
No required reading.
Optional good background reading: J&M ch. 1; M&S 1.0-1.3, 4.1-4.2; Collaboration Policy.
Optional reading on Unix text manipulation (a useful skill!): Ken Church's tutorial Unix for Poets [ps, pdf].
Background for MT video: The IBM 701 translator (1954).
(If your knowledge of probability theory is limited, also read M&S 2.0-2.1.7. If that's too condensed, read the probability chapter of an intro statistics textbook, e.g. Rice, Mathematical Statistics and Data Analysis, ch. 1.)
Distributed today: Programming Assignment 1 |
| Lecture 2 Mon 4/6/09 |
N-gram Language Models and Information Theory
[slides: pdf; pdf1up; MegaHal: html]
n-gram models. Entropy, relative entropy, cross entropy, mutual information, perplexity. Statistical estimation and smoothing for language models.
Assigned reading: J&M ch. 4.
Alternative reading: M&S 1.4, 2.2, ch. 6.
Tutorial reading: Kevin Knight, A Statistical MT Tutorial Workbook [pdf] [rtf]. Ms., August 1999. Sections 1-14.
Optional reading: Joshua Goodman (2001), A Bit of Progress in Language Modeling, Extended Version [pdf, ps].
Optional reading: Stanley Chen and Joshua Goodman (1998), An Empirical Study of Smoothing Techniques for Language Modeling [pdf, ps].
Optional reading: Yee Whye Teh (2006), A Hierarchical Bayesian Language Model Based on Pitman-Yor Processes. EMNLP 2006. [pdf] |
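For intuition about how estimation, smoothing, and perplexity fit together, here is a minimal illustrative sketch (not course code; the toy corpus and function names are invented for this example). It trains an add-one-smoothed bigram model and scores sentences by per-token perplexity:

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with <s>/</s>."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def prob(w_prev, w, uni, bi, vocab_size):
    """Add-one (Laplace) smoothed bigram probability P(w | w_prev)."""
    return (bi[(w_prev, w)] + 1) / (uni[w_prev] + vocab_size)

def perplexity(sentence, uni, bi, vocab_size):
    """Per-token perplexity: exp of the average negative log-probability."""
    toks = ["<s>"] + sentence + ["</s>"]
    logp = sum(math.log(prob(a, b, uni, bi, vocab_size))
               for a, b in zip(toks, toks[1:]))
    return math.exp(-logp / (len(toks) - 1))

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(corpus)
V = len(uni)  # vocabulary size, including <s> and </s>
pp_seen = perplexity(["the", "cat", "sat"], uni, bi, V)  # in-domain
pp_odd = perplexity(["sat", "the", "cat"], uni, bi, V)   # scrambled
```

A sentence whose bigrams were seen in training gets lower perplexity than a scrambled one, which is the basic idea behind using perplexity for model evaluation.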
| Lecture 3 Wed 4/8/09 |
Statistical Machine Translation (MT), Alignment Models
[slides: pdf, pdf-1up]
Assigned reading: J&M ch. 25, secs. 25.0-25.5, 25.11. |
| Section 1 Fri 4/10/09 |
Smoothing
[notes: ppt used in the section; original xls]
Smoothing: absolute discounting, proving you have a proper probability distribution, Good-Turing implementation. Information theory examples and intuitions. Java implementation issues. |
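As a concrete illustration of absolute discounting, and of checking that the result is a proper probability distribution, here is a small sketch (not the section's Java code; the closed-vocabulary setup and names are invented for this example):

```python
def absolute_discount(counts, vocab, d=0.75):
    """Absolute discounting over a closed vocabulary: subtract d from
    every observed count, and redistribute the freed probability mass
    uniformly over unseen words, so the result still sums to 1."""
    total = sum(counts.values())
    probs = {w: (c - d) / total for w, c in counts.items()}
    unseen = [w for w in vocab if w not in counts]
    if unseen:
        # d was freed from each of the len(counts) seen types
        share = d * len(counts) / total / len(unseen)
        for w in unseen:
            probs[w] = share
    return probs

counts = {"the": 3, "cat": 1}
probs = absolute_discount(counts, vocab={"the", "cat", "dog", "a"})
```

The "proper distribution" check is exactly that the probabilities sum to one: the mass subtracted from seen words equals the mass handed to unseen words.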
| Lecture 4 Mon 4/13/09 |
Statistical Alignment Models and Expectation Maximization
(EM)
[quiz question: pdf; slides: pdf; spreadsheet: xls]
EM and its use in statistical MT alignment models.
Assigned reading: Kevin Knight, A Statistical MT Tutorial Workbook [pdf] [rtf]. Ms., August 1999. Sections 15-37 (get the free beer!). (Read also the relevant Knight Workbook FAQ.)
Reference reading: Geoffrey J. McLachlan and Thriyambakam Krishnan (1997), The EM Algorithm and Extensions. Wiley.
Optional further reading: M&S ch. 13.
Robert C. Moore (2005), Association-Based Bilingual Word Alignment. In Proceedings, Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, Ann Arbor, Michigan, pp. 1-8.
Robert C. Moore (2004), Improving IBM Word Alignment Model 1. In Proceedings, 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 519-526. |
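The EM loop for IBM Model 1 is short enough to sketch in full. This is an illustrative toy (the three-sentence bitext is invented, and real systems add null alignment and many refinements): the E-step computes expected alignment counts under the current translation table t(f|e), and the M-step renormalizes them.

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """EM for IBM Model 1 word-translation probabilities t(f|e).
    bitext: list of (english_words, foreign_words) sentence pairs."""
    f_vocab = {f for _, fs in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for es, fs in bitext:
            for f in fs:
                z = sum(t[(f, e)] for e in es)    # E-step normalizer
                for e in es:
                    c = t[(f, e)] / z             # expected alignment count
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():           # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t

bitext = [(["the", "house"], ["la", "maison"]),
          (["the", "book"], ["le", "livre"]),
          (["a", "book"], ["un", "livre"])]
t = ibm_model1(bitext)
```

Even on this tiny bitext, co-occurrence statistics disambiguate: "livre" appears with both "the" and "book", but only "book" co-occurs with it consistently, so EM shifts the probability mass there.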
| Lecture 5 Wed 4/15/09 |
Putting together a complete statistical MT system
[6-up slides: pdf] [1-up slides: pdf]
MT evaluation. Decoding and search. Recent work in statistical MT: statistical phrase-based systems and syntax in MT.
Required reading: J&M secs. 25.7-25.10, 25.12.
"Seminal" background reading: Brown, Della Pietra, Della Pietra, and Mercer, The Mathematics of Statistical Machine Translation: Parameter Estimation [pdf, pdf]. Computational Linguistics. [After their work in speech and language technology, the team turned to finance....]
Further references:
Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada (2001), Fast Decoding and Optimal Decoding for Machine Translation. ACL.
K. Yamada and K. Knight (2002), A Decoder for Syntax-Based Statistical MT. ACL.
David Chiang (2005), A Hierarchical Phrase-Based Model for Statistical Machine Translation. ACL 2005, pp. 263-270.
Due today: Programming Assignment 1
Distributed today: Programming Assignment 2 |
| Section 2 Fri 4/16/09 |
The EM algorithm
[notes: ppt, xls] |
| Lecture 6 Mon 4/20/09 |
Information Extraction (IE) and Named Entity Recognition (NER).
Information sources, rule-based methods, evaluation (recall, precision). Introduction to supervised machine learning methods. Naïve Bayes (NB) classifiers for entity classification.
Assigned reading: J&M secs. 22.0-22.1 (intro to IE and NER); J&M secs. 5.5 and 5.7 (introduce HMMs, the Viterbi algorithm, and experimental technique). If you're not familiar with supervised classification and Naive Bayes, read J&M sec. 20.2 before the parts of ch. 5.
Alternative reading: M&S 8.1 (evaluation), 7.1 (experimental methodology), 7.2.1 (Naive Bayes), 10.2-10.3 (HMMs and Viterbi).
Background and older IE reading:
Peter Jackson and Isabelle Moulinier (2007), Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization. John Benjamins, 2nd edition. Ch. 3.
Ion Muslea (1999), Extraction Patterns for Information Extraction Tasks: A Survey [pdf, ps]. AAAI-99 Workshop on Machine Learning for Information Extraction.
Douglas E. Appelt (1999), Introduction to Information Extraction Technology. |
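A Naive Bayes entity classifier fits in a few lines. The sketch below is purely illustrative (the features, labels, and tiny training set are invented): it picks the label maximizing log P(label) plus the sum of add-one-smoothed log P(feature | label).

```python
import math
from collections import Counter

def train_nb(examples):
    """examples: list of (feature_list, label). Returns model counts."""
    label_counts = Counter()
    feat_counts = {}          # label -> Counter over features
    vocab = set()
    for feats, label in examples:
        label_counts[label] += 1
        feat_counts.setdefault(label, Counter()).update(feats)
        vocab.update(feats)
    return label_counts, feat_counts, vocab

def classify(feats, label_counts, feat_counts, vocab):
    """argmax over labels of log-prior + smoothed log-likelihoods."""
    n = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label, lc in label_counts.items():
        total = sum(feat_counts[label].values())
        score = math.log(lc / n)
        for f in feats:
            score += math.log((feat_counts[label][f] + 1) /
                              (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

train = [(["mr", "capitalized"], "PERSON"),
         (["capitalized", "inc"], "ORG"),
         (["mr", "capitalized", "suffix-son"], "PERSON")]
label_counts, feat_counts, vocab = train_nb(train)
```

The same independence assumption that makes training trivial (just counting) is what makes NB "naive".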
| Lecture 7 Wed 4/22/09 |
Maximum Entropy Classifiers
[slides: pdf, pdf1up]
Assigned reading: class slides; J&M secs. 6.6-6.7 (maximum entropy models).
Additional references: M&S section 16.2.
Adwait Ratnaparkhi, A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania. |
| Section 3 Fri 4/24/09 |
Corpora and other resources
[notes: ppt] |
| Lecture 8 Mon 4/27/09 |
Maximum Entropy Sequence Classifiers
[slides: 6-up pdf] [slides: 1-up pdf]
Assigned reading: class slides; J&M secs. 6.0-6.4 and 6.8-6.9 (HMMs in detail, then MEMMs).
Other references:
Adwait Ratnaparkhi, A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
Adam Berger, A Brief Maxent Tutorial.
Distributed today: Final project guide |
| Lecture 9 Wed 4/29/09 |
IE and text mining
[slides: 6-up pdf] [slides: 1-up pdf]
Assigned reading: J&M secs. 22.2, 22.4.
HMMs for IE reading: Dayne Freitag and Andrew McCallum (2000), Information Extraction with HMM Structures Learned by Stochastic Optimization. AAAI-2000.
Maxent NER reading: Jenny Finkel et al. (2005), Exploring the Boundaries: Gene and Protein Identification in Biomedical Text.
Due today: Programming Assignment 2
Distributed today: Programming Assignment 3 |
| Section 4 Fri 5/1/09 |
Maximum entropy sequence models
[notes: pdf, xls] |
| Lecture 10 Mon 5/4/09 |
Syntax and Parsing for Context-Free Grammars (CFGs)
[1-up slides: pdf]
Parsing, treebanks, attachment ambiguities. Context-free grammars. Top-down and bottom-up parsing, empty constituents, left recursion, and repeated work. Probabilistic CFGs.
Assigned reading: J&M ch. 13, secs. 13.0-13.3.
Background reading: J&M ch. 9 (or M&S ch. 3). This is especially useful if you haven't taken any linguistics courses, but even if you have, there's useful information on treebanks and the part-of-speech tag sets used in NLP. |
| Lecture 11 Wed 5/6/09 |
Dynamic Programming for Parsing
[1-up slides: pdf]
Dynamic programming for parsing. The CKY algorithm. Accurate unlexicalized PCFG parsing.
Assigned reading: J&M sec. 13.4.
Additional information: Dan Klein and Christopher D. Manning (2003), Accurate Unlexicalized Parsing. ACL 2003, pp. 423-430.
Due today: Final project proposals |
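The dynamic-programming idea behind CKY can be sketched as a recognizer over a chart indexed by spans. This is an illustrative toy, not a full parser (the micro-grammar is invented, the grammar must already be in Chomsky Normal Form, and a real parser would also store backpointers and, for PCFGs, probabilities):

```python
def cky_recognize(words, lexicon, rules, start="S"):
    """CKY recognizer for a CFG in Chomsky Normal Form.
    lexicon: word -> set of preterminals.
    rules: (B, C) -> set of parents A, for binary rules A -> B C."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                 # width-1 spans
        chart[i][i + 1] |= lexicon.get(w, set())
    for span in range(2, n + 1):                  # widths, shortest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # every split point
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= rules.get((B, C), set())
    return start in chart[0][n]

lexicon = {"the": {"Det"}, "dog": {"N"}, "barks": {"VP"}}
rules = {("Det", "N"): {"NP"}, ("NP", "VP"): {"S"}}
```

Because each span is built once from smaller spans, the "repeated work" of naive top-down parsing disappears; the loop structure also makes the O(n^3) running time visible.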
| Section 5 Fri 5/8/09 |
Parsing, PCFGs |
| Lecture 12 Mon 5/11/09 |
Lexicalized Probabilistic Context-Free Grammars (LPCFGs)
[6-up slides: pdf] [1-up slides: pdf]
Lexicalization and lexicalized parsing. The Charniak, Collins/Bikel, and Petrov & Klein parsers.
Assigned reading: J&M ch. 14 (you can stop at the end of sec. 14.7, if you'd like!).
Alternative reading: M&S ch. 11.
Optional readings: |
| Lecture 13 Wed 5/13/09 |
Modern Statistical Parsers
[6-up slides: pdf] [1-up slides: pdf]
[quiz submission guide: txt]
Search methods in parsing: agenda-based chart parsing, A*, and "best-first" parsing. Dependency parsing. Discriminative parsing.
Assigned reading: J&M ch. 14 (you can stop at the end of sec. 14.7, if you'd like!).
Alternative, less up-to-date reading: M&S 8.3, ch. 12.
Optional readings: |
| Lecture 14 Mon 5/18/09 |
Semantic Role Labeling
[slides: pdf, 1up-pdf]
Assigned reading: J&M secs. 19.4, 20.9.
Further reading:
Daniel Gildea and Daniel Jurafsky (2002), Automatic Labeling of Semantic Roles. Computational Linguistics 28:3, 245-288.
Kristina Toutanova, Aria Haghighi, and Christopher D. Manning (2005), Joint Learning Improves Semantic Role Labeling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 589-596.
S. Pradhan, W. Ward, K. Hacioglu, J. Martin, and D. Jurafsky (2005), Semantic Role Labeling Using Different Syntactic Views. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), Ann Arbor, MI, June 25-30, 2005.
V. Punyakanok, D. Roth, and W. Yih (2005), The Necessity of Syntactic Parsing for Semantic Role Labeling. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1117-1123. |
| Lecture 15 Wed 5/20/09 |
Computational Semantics
[slides: pdf] [1-up slides: pdf]
Semantic representations, lambda calculus, compositionality, syntax/semantics interfaces, logical reasoning.
Assigned reading: An Informal but Respectable Approach to Computational Semantics [pdf, ps]; J&M ch. 18 (you can skip secs. 18.4 and 18.6-end, if you wish). |
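Compositionality is easy to demonstrate executably: if denotations are functions, then composing meanings is just function application (beta reduction). The sketch below is a toy model-theoretic illustration (the entities and the generalized-quantifier encoding of "every" are invented for this example):

```python
# A toy model: a domain of entities and some extensions
dogs = {"rex", "fido"}
cats = {"tom"}
barkers = {"rex", "fido"}
domain = dogs | cats

# Denotations as Python functions
DOG = lambda x: x in dogs        # [[dog]]: entity -> bool
CAT = lambda x: x in cats
BARKS = lambda x: x in barkers   # [[barks]]: entity -> bool
# [[every]] = lambda P. lambda Q. for all x, P(x) implies Q(x)
EVERY = lambda noun: lambda vp: all(vp(x) for x in domain if noun(x))

# Syntax-driven composition: "every dog barks" = EVERY(DOG)(BARKS)
every_dog_barks = EVERY(DOG)(BARKS)
every_cat_barks = EVERY(CAT)(BARKS)
```

Each application step mirrors a node in the syntax tree, which is the syntax/semantics interface idea in miniature.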
| Mon 5/25/09 |
Memorial Day (no class) |
| Lecture 16 Wed 5/27/09 |
Compositional Semantics II
[6-up pdf slides: main, suppl]
[1-up pdf slides: main, suppl]
Semantic representations, lambda calculus, compositionality, syntax/semantics interfaces, logical reasoning (continued).
Assigned reading: An Informal but Respectable Approach to Computational Semantics [pdf, ps]; J&M ch. 18 (you can skip secs. 18.4 and 18.6-end, if you wish). |
| Lecture 17 Mon 6/1/09 |
Lexical Semantics
[1-up slides: pdf]
Reading: (Okay, I'm not so naive as to think that you'll actually read this in week 9 of the quarter....) J&M secs. 19.0-19.3.
Further reading: J&M secs. 20.0-20.8 (not included in reader, I'm afraid). |
| Lecture 18 Wed 6/3/09 |
Question Answering (QA)
[1-up slides: pdf]
TREC-style robust QA; textual inference.
Assigned reading: J&M secs. 23.0, 23.2.
Further reading: Marius Pasca and Sanda M. Harabagiu, High Performance Question/Answering. SIGIR 2001, pp. 366-374.
Due today: Final project reports |
| Tuesday 6/9/09 |
Final Project Presentations Students will give short (~5 min) presentations on their final projects during the time slot allocated for a final exam. |
|
Site design by Bill MacCartney |