Calendar

Mon	Tue	Wed	Thu	Fri
3/29 PA1 out Lecture 1: Introduction	3/30	3/31 Lecture 2: N-gram Models	4/1	4/2
4/5 Lecture 3: LMs and StatMT	4/6	4/7 Lecture 4: EM (and StatMT)	4/8	4/9 Section 1: Smoothing
4/12 PA1 due; PA2 out Lecture 5: StatMT Systems	4/13	4/14 Lecture 6: Phrase-based & syntactic MT	4/15	4/16 Section 2: EM
4/19 Lecture 7: IE/NER & NB Models	4/20	4/21 Lecture 8: MaxEnt Classifiers	4/22	4/23 Section 3: Corpora
4/26 PA2 due; PA3 out Lecture 9: Sequence classifiers & IE	4/27	4/28 Lecture 10: Syntax & Parsing	4/29	4/30 Section 4: MaxEnt
5/3 Final project proposal due Lecture 11: DPs for Parsing	5/4	5/5 Lecture 12: LPCFGs	5/6	5/7 Section 5: Parsing & PCFGs
5/10 Lecture 13: Statistical Parsers	5/11 PA3 due	5/12 Lecture 14: Grammar induction	5/13	5/14
5/17 Lecture 15: Semantic Role Labeling	5/18	5/19 Lecture 16: ComSem	5/20	5/21
5/24 Lecture 17: ComSem II	5/25	5/26 Lecture 18: Lexical Semantics	5/27	5/28
5/31 Memorial Day	6/1	6/2 Final project due Lecture 19: QA & Inference	6/3	6/4
6/7	6/8	6/9 9:00am - 12:00pm Final project presentations	6/10	6/11

Syllabus

Lecture 1
Mon
3/29/10

Introduction [slides: pdf; pdf1up]
Overview of NLP. Statistical machine translation. Language models and their role in speech processing. Course introduction and administration.
No required reading.
Optional good background reading: J&M Ch. 1; M&S 1.0-1.3, 4.1-4.2, Collaboration Policy
Optional reading on Unix text manipulation (useful skill!): Ken Church's tutorial Unix for Poets [ps, pdf]
Background for MT video [fun read!]: The IBM 701 translator (1954)
(If your knowledge of probability theory is limited, also read M&S 2.0-2.1.7. If that's too condensed, read the probability chapter of an intro statistics textbook, e.g. Rice, Mathematical Statistics and Data Analysis, ch. 1.)
Distributed today: Programming Assignment 1

Lecture 2
Wed
3/31/10

N-gram Language Models and Information Theory [slides: pdf; pdf1up; MegaHal: html]
n-gram models. Statistical estimation and smoothing for language models. Entropy, cross entropy, mutual information, perplexity.
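As a rough illustration of these quantities, the Python sketch below estimates a bigram model with add-one (Laplace) smoothing and reports perplexity as 2 raised to the per-token cross entropy. It is not course code: the `<s>`/`</s>` padding symbols, the function names, and the closed-vocabulary assumption (every test word was seen in training) are choices made only for this example.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their histories; each sentence is padded with <s> and </s>."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = {"</s>"}  # predictable symbols: all words plus end-of-sentence (never <s>)
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(words)
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab

def prob_add_one(prev, cur, history_counts, bigram_counts, vocab):
    """Laplace estimate P(cur | prev) = (c(prev, cur) + 1) / (c(prev) + |V|)."""
    return (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + len(vocab))

def perplexity(sentences, model):
    """2 ** cross entropy, i.e. 2 ** (average negative log2 probability per predicted token).
    Assumption: closed vocabulary, so every test word also occurred in training."""
    history_counts, bigram_counts, vocab = model
    log_prob, n_predictions = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            log_prob += math.log2(prob_add_one(prev, cur, history_counts, bigram_counts, vocab))
            n_predictions += 1
    return 2 ** (-log_prob / n_predictions)
```

Add-one smoothing is used here only because it is the shortest correct estimator; it is known to give too much probability mass to unseen bigrams, and the Chen and Goodman reading compares better alternatives.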
Assigned reading: J&M ch. 4
Alternative reading: M&S 1.4, 2.2, ch. 6.
Tutorial reading: Kevin Knight. A Statistical MT Tutorial Workbook [pdf] [rtf]. MS., August 1999. Sections 1-14.
Optional advanced reading: Joshua Goodman (2001), A Bit of Progress in Language Modeling, Extended Version [pdf, ps]
Optional advanced reading: (older but shorter) Stanley Chen and Joshua Goodman (1998), An empirical study of smoothing techniques for language modeling [pdf, ps]
Optional very advanced reading: Teh, Yee Whye. 2006. A Hierarchical Bayesian Language Model based on Pitman-Yor Processes. COLING/ACL 2006. [pdf]

Lecture 3
Mon
4/5/10

Statistical Machine Translation (MT) and Alignment Models (and LMs, continued) [slides: pdf; pdf-1up]
Assigned reading: J&M ch. 25, sections 25.0-25.5, 25.11.

Lecture 4
Wed
4/7/10

Expectation Maximization (EM) and Statistical Alignment Models [quiz question: pdf, slides: pdf, pdf-1up, spreadsheet: xls]
EM and its use in statistical MT alignment models.
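For concreteness, here is a minimal sketch of EM for IBM Model 1, the simplest word alignment model, which learns translation probabilities t(f|e) from sentence pairs alone. It assumes a uniform initialization and a NULL English token prepended to every sentence; the function name and the data layout are hypothetical.

```python
from collections import defaultdict

def ibm_model1(corpus, iterations=10):
    """EM for IBM Model 1 translation probabilities t(f | e).

    corpus: list of (foreign_words, english_words) sentence pairs.
    Returns a dict mapping (f, e) to t(f | e).
    Assumption: a NULL English token is prepended to every sentence, and t starts uniform.
    """
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned to anything
        for fs, es in corpus:
            es = ["NULL"] + es
            for f in fs:
                # E-step: posterior probability that f was generated by each e in the sentence
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize expected counts into conditional probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On a two-pair toy corpus such as ("la maison", "the house") and ("la fleur", "the flower"), the shared word "la" quickly attracts most of the mass of t(·|the), which in turn frees "maison" to explain "house".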
Assigned reading: Kevin Knight. A Statistical MT Tutorial Workbook [pdf] [rtf]. MS., August 1999. Sections 15-37 (get the free beer!).
(read also the relevant Knight Workbook FAQ)
Reference reading: Geoffrey J. McLachlan and Thriyambakam Krishnan. 1997. The EM Algorithm and Extensions. Wiley.
Optional further reading: M&S 13.
Moore, Robert C. 2005. Association-Based Bilingual Word Alignment. In Proceedings, Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, Ann Arbor, Michigan, pp. 1-8.
Moore, Robert C. 2004. Improving IBM Word Alignment Model 1. In Proceedings, 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 519-526.

Section 1
Fri
4/9/10

Smoothing [notes: ppt used in the section; original xls]
Smoothing: absolute discounting, proving you have a proper probability distribution, Good-Turing implementation. Information theory examples and intuitions. Java implementation issues.
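A small sketch of interpolated absolute discounting for bigrams, including a numerical check that each conditional distribution sums to one. The argument behind the check: subtracting d from every seen bigram count frees exactly d · N1+(v·) / c(v) of the mass, where N1+(v·) is the number of distinct words seen after v, and that freed mass is exactly the weight given to the unigram distribution. The discount d = 0.75 and the fall-back to the unigram for unseen histories are illustrative assumptions, not the section's code.

```python
from collections import Counter

def absolute_discount_model(tokens, d=0.75):
    """Interpolated absolute discounting for bigrams over a token list.
    Assumption: 0 < d <= 1, so every seen bigram keeps a nonnegative discounted count."""
    unigram = Counter(tokens)
    total = sum(unigram.values())
    bigram = Counter(zip(tokens, tokens[1:]))
    history = Counter(tokens[:-1])  # c(v): number of bigrams that start with v
    distinct_followers = Counter(v for (v, _w) in bigram)  # N1+(v .)

    def prob(w, v):
        p_uni = unigram[w] / total
        c_v = history[v]
        if c_v == 0:
            return p_uni  # assumption: unseen history backs off entirely to the unigram
        discounted = max(bigram[(v, w)] - d, 0) / c_v
        backoff_weight = d * distinct_followers[v] / c_v  # exactly the mass removed above
        return discounted + backoff_weight * p_uni

    return prob, set(unigram)
```

For example, on the tokens "the cat sat on the mat the cat ran", looping over every history v and summing prob(w, v) over the vocabulary gives 1 (up to rounding) each time.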

Lecture 5
Mon
4/12/10

Putting together a complete statistical MT system [6-up slides: pdf] [1-up slides: pdf]
IBM word alignment models. MT evaluation. Decoding and search.
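One widely used automatic MT evaluation metric is BLEU (Papineni et al., 2002). The sketch below simplifies it to a single sentence and a single reference with no smoothing: it takes the geometric mean of clipped n-gram precisions and multiplies it by a brevity penalty. Standard BLEU aggregates counts over a whole test corpus and may use several references; the function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(words, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU. Assumption: no smoothing, so any zero precision gives 0."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference
        clipped = sum(min(count, ref[g]) for g, count in cand.items())
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    # Penalize candidates shorter than the reference
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

Clipping is what stops a degenerate output like "the the the the the the the" from scoring well: with unigrams only and the reference "the cat is on the mat", its precision is 2/7 rather than 7/7.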
Required reading: J&M, secs 25.7-10, 25.12.
Reference: "Seminal" background reading: Brown, Della Pietra, Della Pietra, and Mercer, 1993, The Mathematics of Statistical Machine Translation: Parameter Estimation [pdf, pdf]. Computational Linguistics.
[After their work in speech and language technology, the team turned to finance. (The original Bloomberg article has long since disappeared.)]
Further references:
Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. 2001. Fast Decoding and Optimal Decoding for Machine Translation. ACL.
Due today: Programming Assignment 1
Distributed today: Programming Assignment 2

Lecture 6
Wed
4/14/10

MT systems. Decoding. Phrase-based and syntactic MT. Real world MT. [6-up slides: pdf] [1-up slides: pdf]
Decoding. Recent work in statistical MT: statistical phrase based systems and syntax in MT. MT in practice.
Required reading: J&M, secs 25.7-10, 25.12.
Further references:
Franz Josef Och, Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics 30(4): 417-449.
K. Yamada and K. Knight. 2002. A Decoder for Syntax-Based Statistical MT. ACL.
David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. ACL 2005, pages 263-270.

Section 2
Fri
4/16/10

The EM algorithm [notes: ppt, xls, ~~k-means example, soft k-means example~~]

Lecture 7
Mon
4/19/10

Information Extraction (IE) and Named Entity Recognition (NER). [6-up slides: pdf] [1-up slides: pdf]
Information sources, rule-based methods, evaluation (recall, precision). Introduction to supervised machine learning methods. Naïve Bayes (NB) classifiers for entity classification.
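A minimal multinomial Naive Bayes classifier with add-one smoothing, of the kind that could label an entity string as PER, LOC, or ORG from the words it contains. The feature representation (bare tokens), the smoothing, and the choice to ignore features never seen in training are assumptions for this sketch; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (feature_list, label). Collects counts for multinomial Naive Bayes."""
    class_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in examples:
        feature_counts[label].update(features)
        vocab.update(features)
    return class_counts, feature_counts, vocab

def classify_nb(features, model):
    """Return argmax_c log P(c) + sum_f log P(f | c), with add-one smoothed P(f | c)."""
    class_counts, feature_counts, vocab = model
    n_examples = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_examples)  # log prior
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in features:
            if f in vocab:  # assumption: features unseen in training are ignored
                score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids underflow when many features are multiplied together; the argmax is unchanged because log is monotonic.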
Assigned reading:
J&M secs 22.0-22.1 (intro to IE and NER).
J&M secs. 5.5 and 5.7 (introduce HMMs, Viterbi algorithm, and experimental technique). If you're not familiar with supervised classification and Naive Bayes, read J&M sec 20.2 before the parts of ch. 5.
Alternative reading: M&S 8.1 (evaluation), 7.1 (experimental methodology), 7.2.1 (Naive Bayes), 10.2-10.3 (HMMs and Viterbi)
Background IE reading:
Recent Wired article on Google's search result ranking (but don't completely swallow the hype: click through on the "mike siwek lawyer mi" query, and read a couple of the top hits in the search results).
Sunita Sarawagi. 2008. Information Extraction. Foundations and Trends in Databases 1(3): 261-377. http://dx.doi.org/10.1561/1900000003
Peter Jackson and Isabelle Moulinier. 2007. Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization. John Benjamins. 2nd edition. Ch. 3.
Ion Muslea (1999), Extraction Patterns for Information Extraction Tasks: A Survey [pdf, ps], AAAI-99 Workshop on Machine Learning for Information Extraction.
Douglas E. Appelt. 1999. Introduction to Information Extraction Technology
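Since the readings introduce HMMs and the Viterbi algorithm, here is a compact log-space Viterbi decoder that returns the most probable state (tag) sequence. It assumes the HMM is given as plain dictionaries of start, transition, and emission probabilities, all nonzero for the observations being decoded (math.log(0) raises an error, so a real tagger would smooth first); the function name and argument layout are hypothetical.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable HMM state sequence, computed with log probabilities.
    Assumption: start_p[s], trans_p[p][s], emit_p[s][o] are all > 0 for the inputs used."""
    # best[t][s]: log probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}]
    backpointer = [{}]
    for t in range(1, len(observations)):
        best.append({})
        backpointer.append({})
        for s in states:
            # Best predecessor state for s at time t
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            backpointer[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backpointer[t][path[-1]])
    return list(reversed(path))
```

The dynamic program keeps only the best path into each state at each position, so decoding costs O(T · |S|²) rather than enumerating all |S|^T tag sequences.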

Lecture 8
Wed
4/21/10

Maximum Entropy Classifiers [slides: pdf, pdf1up]
Assigned Reading:
class slides.
J&M secs 6.6-7 (maximum entropy models)
Additional references:
M&S section 16.2
Adwait Ratnaparkhi. A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
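To make the model form concrete: a conditional maxent classifier scores each class by summing the weights of the (feature, class) pairs that fire, then normalizes with a softmax. The sketch below trains it by plain batch gradient ascent, where the gradient for each weight is the empirical feature count minus the model's expected count, minus a Gaussian-prior term. The learning rate, prior variance, iteration count, and function names are arbitrary assumptions; practical implementations usually use a quasi-Newton optimizer such as L-BFGS instead.

```python
import math
from collections import defaultdict

def class_probs(features, weights, classes):
    """P(c | x) proportional to exp(sum of weights of the (feature, c) pairs that fire)."""
    scores = {c: sum(weights[(f, c)] for f in features) for c in classes}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - max_score) for c, s in scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}

def train_maxent(examples, classes, iterations=200, learning_rate=0.1, sigma2=10.0):
    """Batch gradient ascent on conditional log-likelihood with a Gaussian prior (assumed settings).
    Gradient for weight (f, c) = empirical count - expected count - weight / sigma2."""
    weights = defaultdict(float)
    for _ in range(iterations):
        gradient = defaultdict(float)
        for features, gold in examples:
            probs = class_probs(features, weights, classes)
            for f in features:
                gradient[(f, gold)] += 1.0  # empirical feature count
                for c in classes:
                    gradient[(f, c)] -= probs[c]  # feature count expected under the model
        for key in set(gradient) | set(weights):
            weights[key] += learning_rate * (gradient[key] - weights[key] / sigma2)
    return weights
```

At the optimum the empirical and expected counts match up to the prior term, which is the maximum entropy constraint that gives these models their name.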

Section 3
Fri
4/23/10

Corpora and other resources [notes: ppt, ~~pdf(2008), txt(2006)~~]

Lecture 9
Mon
4/26/10

Maximum Entropy Sequence Classifiers and Information Extraction [slides: 6-up pdf] [slides: 1-up pdf]
Assigned Reading:
class slides.
J&M secs. 6.0-6.4 and 6.8-6.9 (HMMs in detail and then MEMMs), and 22.2, 22.4 (IE).
Other references: Adwait Ratnaparkhi. A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.
Adam Berger, A Brief Maxent Tutorial
HMMs for IE reading: Dayne Freitag and Andrew McCallum (2000), Information Extraction with HMM Structures Learned by Stochastic Optimization, AAAI-2000
Maxent NER reading: Jenny Finkel et al., 2005. Exploring the Boundaries: Gene and Protein Identification in Biomedical Text
Distributed today: Final project guide
Due today: Programming Assignment 2
Distributed today: Programming Assignment 3

Lecture 10
Wed
4/28/10

Syntax and Parsing for Context-Free Grammars (CFGs) [slides: 6-up pdf] [slides: 1-up pdf]
Parsing, treebanks, attachment ambiguities. Context-free grammars. Top-down and bottom-up parsing, empty constituents, left recursion, and repeated work. Probabilistic CFGs.
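As a small illustration of probabilistic CFGs, the sketch reads rules off a toy treebank, estimates rule probabilities by relative frequency, P(A → β) = count(A → β) / count(A), and scores a tree as the product of the probabilities of the rules it uses. Encoding trees as nested tuples, and the function names, are assumptions of this example.

```python
from collections import Counter

def rules(tree):
    """Yield (lhs, rhs) rules from a tree given as nested tuples (label, child, ...); leaves are strings."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

def estimate_pcfg(treebank):
    """Relative-frequency (maximum likelihood) estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(r for t in treebank for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

def tree_prob(tree, pcfg):
    """P(tree) is the product of the probabilities of the rules used in it."""
    p = 1.0
    for r in rules(tree):
        p *= pcfg.get(r, 0.0)  # a rule never seen in the treebank gets probability 0
    return p
```

Because each rule is scored independently of its context, two trees with the same rules get the same probability regardless of which words they attach; this weakness motivates the lexicalized and unlexicalized refinements covered in later lectures.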
Assigned reading: J&M ch. 13, secs. 13.0-13.3.
Background reading: J&M ch. 9 (or M&S ch. 3). This is especially useful if you haven't taken any linguistics courses, but even if you have, it has useful information on the treebanks and part-of-speech tag sets used in NLP.

Section 4
Fri
4/30/10

Maximum entropy sequence models [notes: pdf, xls]

Lecture 11
Mon
5/3/10

Dynamic Programming for Parsing [slides: 6-up pdf] [slides: 1-up pdf]
Dynamic programming for parsing. The CKY algorithm. Accurate unlexicalized PCFG parsing.
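The CKY algorithm fills a chart of spans bottom-up, shortest spans first, so every binary rule combines two already-finished sub-spans. Below is a sketch of its probabilistic (Viterbi) variant for a PCFG assumed to be in Chomsky normal form (binary rules plus word rules, no unary rules), with backpointers to recover the best tree. It multiplies raw probabilities, which is fine for toy sentences; longer inputs would need log probabilities to avoid underflow. The grammar encoding and function name are hypothetical.

```python
from collections import defaultdict

def cky_parse(words, lexical, binary, start="S"):
    """Viterbi CKY for a PCFG in Chomsky normal form (assumed: no unary rules).
    lexical: {(A, word): prob}; binary: {(A, B, C): prob}.
    Returns (probability, tree) of the best parse rooted in `start`, or (0.0, None)."""
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][A] = (prob, backpointer) for span words[i:j]
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                best[(i, i + 1)][a] = (p, w)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for (a, b, c), p in binary.items():
                    if b in best[(i, k)] and c in best[(k, j)]:
                        score = p * best[(i, k)][b][0] * best[(k, j)][c][0]
                        if score > best[(i, j)].get(a, (0.0, None))[0]:
                            best[(i, j)][a] = (score, (k, b, c))

    def build(i, j, a):
        """Follow backpointers to rebuild the best subtree for label a over words[i:j]."""
        back = best[(i, j)][a][1]
        if isinstance(back, str):
            return (a, back)
        k, b, c = back
        return (a, build(i, k, b), build(k, j, c))

    if start not in best[(0, n)]:
        return 0.0, None
    return best[(0, n)][start][0], build(0, n, start)
```

With a toy grammar where VP → VP PP and NP → NP PP compete for "she eats fish with sticks", the chart keeps only the higher-scoring analysis of each span, so the attachment decision is made locally at the span covering "eats fish with sticks" and reused by every larger span.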
Assigned reading: J&M sec. 13.4
Additional information: Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. ACL 2003, pp. 423-430.
Due today: final project proposals

Lecture 12
Wed
5/5/10

Lexicalized Probabilistic Context-Free Grammars (LPCFGs) [6-up slides: pdf] [1-up slides: pdf]
Lexicalization and lexicalized parsing. The Charniak, Collins/Bikel, and Petrov & Klein parsers.
Assigned reading: J&M ch. 14 (you can stop at the end of sec. 14.7, if you'd like!)
Alternative reading: M&S Ch. 11
Optional readings:

Eugene Charniak (1997), Statistical techniques for natural language parsing, AI Magazine.
Eugene Charniak (1997), Statistical parsing with a context-free grammar and word statistics, Proceedings of the Fourteenth National Conference on Artificial Intelligence. AAAI Press/MIT Press, Menlo Park (1997).
Eugene Charniak (2000), A Maximum-Entropy-Inspired Parser, Proceedings of NAACL-2000.

Section 5
Fri
5/7/10

Parsing, PCFGs [~~notes: pdf~~]

Lecture 13
Mon
5/10/10

Modern Statistical Parsers [6-up slides: pdf] [1-up slides: pdf] [quiz submission guide: txt]
Search methods in parsing: agenda-based chart, A*, and "best-first" parsing. Dependency parsing. Discriminative parsing.
Assigned reading: J&M ch. 14 (you can stop at the end of sec. 14.7, if you'd like!)
Alternative, less up-to-date reading: M&S 8.3, 12

Dan Klein and Christopher D. Manning. 2003. Factored A* Search for Models over Sequences and Trees. IJCAI 2003.
Dan Klein and Christopher D. Manning. 2003. A* Parsing: Fast Exact Viterbi Parse Selection. HLT-NAACL 2003.
Kristina Toutanova, Christopher D. Manning, Stuart M. Shieber, Dan Flickinger, and Stephan Oepen. 2002. Parse Disambiguation for a Rich HPSG Grammar. First Workshop on Treebanks and Linguistic Theories (TLT2002), pp. 253-263. Sozopol, Bulgaria.
Kristina Toutanova, Christopher D. Manning, Dan Flickinger, and Stephan Oepen. 2005. Stochastic HPSG Parse Disambiguation using the Redwoods Corpus. Research in Language and Computation 2005.
B. Taskar, D. Klein, M. Collins, D. Koller and C. Manning. Max-Margin Parsing. Empirical Methods in Natural Language Processing (EMNLP04), Barcelona, Spain, July 2004. Received best paper award.
Eugene Charniak and Mark Johnson (2005). Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking, Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)
Ryan McDonald, Koby Crammer and Fernando Pereira (2005). Online Large-Margin Training of Dependency Parsers. 43rd Annual Meeting of the Association for Computational Linguistics, ACL 2005.