|
|
CS 224N -- Ling 237 |
|
Course Syllabus |
(updated 2004/05/11) |
|
Date |
Topic |
Out |
Due |
|
Week 1 |
|
|
|
|
What is NLP? History; current applications and topics. Need for precise language understanding. Course introduction and administration.
|
HW #1: Text |
|
|
|
Rule based approaches to linguistic structure. How to find sentence structure: parsing as search. |
|
|
|
|
Week 2 |
|
|
|
|
Monday, 5 Apr 04 Lecture Slides: ps pdf |
NLP Parsing: top-down parsing, bottom-up parsing; empty constituents and left-recursive rules. Intro to dynamic programming for parsing.
|
|
|
|
Word classes (word clustering), lexical semantics, syntactic ambiguities. References: J&M Ch. 10 |
|
|
|
|
Wednesday, |
Dynamic programming methods of parsing: Tabular/memoized/chart parsing methods. Well-formed substring tables. The CKY algorithm. The Earley algorithm. Active chart parsing. References: J&M Ch. 10 |
HW #1 |
|
|
Information theory: entropy, cross entropy, mutual information. |
|
|
|
|
Section |
Parsing algorithms |
|
|
|
|
|
|
|
|
Week 3 |
|
|
|
|
Monday, 12 Apr 04 Lecture Slides: speech recog ps pdf; n-grams ppt |
n-gram models of language: Relative Frequency estimation from corpora, n-gram models of English - Markov models, relative entropy, cross entropy, and perplexity. Smoothing techniques to deal with unseen or insufficiently seen contexts. References: Joshua Goodman. 2001. A Bit of Progress in Language Modeling. Computer Speech and Language, October 2001, pages 403-434. Jason Hutchens. MegaHAL site. |
|
|
|
Multinomial distributions, and smoothing them (for NLP) |
|
|
|
|
Wednesday, |
Word Sense Disambiguation: The general problem of word sense disambiguation, information sources, performance bounds, dictionary and supervised machine learning approaches. Naive Bayes classifiers. References: Computational Linguistics Vol 24 No 1, 1998 Special Issue on Word Sense Disambiguation (particularly the Introduction) |
HW #3: n-grams ps pdf; HW #4: WSD rtf |
HW #2 |
|
Naïve Bayes models. |
|
|
|
|
Section |
Accessing corpora at Stanford; linguistic annotation Corpora Handout.doc |
|
|
|
|
|
|
|
|
Week 4 |
|
|
|
|
Monday, 19 Apr 04 Lecture slides: ps pdf; Paying attention (XLS) |
POS tagging: Part of speech tagging. Available information sources.
|
|
|
|
Markov Models and Hidden Markov Models: Fundamental algorithms for hidden Markov models: determining the probability of an observed sequence, and the maximum probability state sequence (the Viterbi algorithm). |
|
|
|
|
Wednesday, ps pdf Comes across (XLS) |
Named Entity Recognition and Information Extraction: extracting semantic tokens (names of people, companies, prices, times, etc.) from text, use of cascades, identifying collocations and terminological phrases. Machine learning methods for IE over annotated data. AutoSlog and HMM-based techniques. System evaluation: accuracy, precision and recall, F measure. Reference: Ion Muslea: "Extraction Patterns for Information Extraction Tasks: A Survey", AAAI-99 Workshop on Machine Learning for Information Extraction. |
|
HW #3 |
|
Hidden Markov Models topics: Baum-Welch re-estimation of HMM parameters. The limited usefulness of this in part-of-speech tagging. Successful use in IE. EM as data clustering. |
|
|
|
|
Section |
Hidden Markov Models workshop: working through HMMs with Jason Eisner's spreadsheet on EM for HMMs. More of Jason Eisner's HMMs |
|
|
|
|
|
|
|
|
Week 5 |
|
|
|
|
Monday, |
POS Tagging and other sequence problems continued: Other approaches to part-of-speech tagging and issues that arise in it. Unknown words. Different tagsets. |
|
|
|
Discriminative methods: Logistic regression/Maxent classifiers. |
|
|
|
|
Wednesday, |
Conditional/discriminative sequence models applied to NLP tasks. Chunking and segmentation. (Midquarter eval) |
HW #5: POS ps pdf |
HW #4 |
|
Linguistic discrimination: designing features for discriminative classifiers |
|
|
|
|
Section |
Information extraction for the web: wrapper induction and related techniques, ppt |
|
|
|
|
|
|
|
|
Week 6 |
|
|
|
|
Monday, 3 May 04 Lecture slides: ppt |
Probabilistic parsing |
|
|
|
Probabilistic Context-Free Grammars: probabilistic grammars. Calculating the probability of a string from a structured model. Choosing the highest probability parse. |
|
|
|
|
Wednesday, 5 May 04 Lecture slides: ppt |
Modern probabilistic parsing. Reading: M&S chapter 12 and M&S section 8.3. References: Eugene Charniak. A Maximum-Entropy-Inspired Parser. Proceedings of NAACL-2000. Eugene Charniak. Statistical techniques for natural language parsing. AI Magazine (1997). Eugene Charniak. Statistical parsing with a context-free grammar and word statistics. Proceedings of the Fourteenth National Conference on Artificial Intelligence, AAAI Press/MIT Press, Menlo Park (1997). |
HW #6: PCFG ps pdf |
HW #5 ps pdf |
|
Attachment ambiguities: prepositional phrases, conjunctions, noun compounds; psycholinguistic models, linguistic features |
|
|
|
|
Section |
PCFGs and probabilistic parsing, ppt |
|
|
|
|
|
|
|
|
Week 7 |
|
|
|
|
Monday, |
Term and attribute-value unification; feature grammars and unification-based parsing |
|
FinalP Abstract |
|
Semantic representations for NLP: (Typed) lambda calculus, compositionality |
|
|
|
|
Wednesday, |
Building semantic representations (2): rule-to-rule semantic translation. Manipulating semantic forms. Syntax-semantics interfaces. |
HW #6 |
|
|
|
|
|
|
|
Section |
Semantic representations and logical reasoning, ppt |
|
|
|
|
|
|
|
|
Week 8 |
|
|
|
|
Monday, |
Question answering: TREC-style robust QA, natural language database interfaces. Interface to knowledge representations. Reference: Marius Pasca, Sanda M. Harabagiu: High Performance Question/Answering. SIGIR 2001: 366-374. |
|
|
|
|
|
|
|
|
Dialogue and discourse systems; rhetorical structure; planning and requests. Reference: Gazdar & Mellish, ch. 10 |
|
HW #7 |
|
|
|
|
|
|
|
Section |
|
|
|
|
|
|
|
|
|
Week 9 |
|
|
|
|
Monday, |
Machine translation: rule-based and statistical approaches; sentence alignment.
|
|
|
|
|
|
|
|
|
Grammar Induction: can one do unsupervised learning of linguistic structure? (And why is it hard?) |
|
|
|
|
|
|
|
|
|
Section |
|
|
|
|
|
|
|
|
|
Week 10 |
|
|
|
|
Monday, 31 May 04 |
Memorial Day holiday - no class |
|
|
|
|
|
|
|
|
Wednesday, |
Project Mini Presentations. |
|
FinalP |
|
|
|
|
|
|
Finals Period - time to visit the beach! |
|
|
|