Ling 235: Quantitative and Probabilistic Explanation in Linguistics
Course Syllabus 
(updated 2005/01/04) 
Date 
Topic 
Out 
Due 
Week 1 



Wednesday, 5 Jan 05 
An Introduction and an Example. 


Linguistics: What motivates probabilistic approaches and statistical methodology in linguistics? Problems of categoricity. The greater explanatory power of probabilistic models. Some examples. Statistics: Exploratory Data Analysis (EDA). Introduction to SPSS. A case study on a sociolinguistics dataset. Supplemental readings: Steven Abney. 1996. Statistical Methods and Linguistics. In: Judith Klavans and Philip Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language. The MIT Press. Christopher D. Manning. 2002. Probabilistic Syntax. In Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.), Probabilistic Linguistics, MIT Press, 2003.



Week 2 



Monday, 10 Jan 05 
Basic concepts in probability and the idea of building probabilistic models for linguistic explanation. 
HW #1 

Statistics: The sociolinguistics example continued: model building in SPSS (building a VARBRUL/logistic regression model of the data). Probability intro: counting, basic probability laws, maximum likelihood; discrete distributions; joint and conditional probability. Hypothesis tests. Supplemental readings: John Goldsmith. 2001. Probability for linguists (available as Microsoft Word or converted to HTML). Or: Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Chapter 2, pp. 39-54, 60-68, 72-76. Or: Rice, John A. 1995. Mathematical Statistics and Data Analysis. 2nd edition. Duxbury Press.
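For discrete outcomes, the maximum likelihood estimates of joint and conditional probabilities are relative frequencies: P(x, y) = C(x, y) / N and P(x | y) = C(x, y) / C(y). A minimal Python sketch, assuming a small hypothetical sample of (voice, agent expressed) observations; the data and the function names are illustrative and not course materials:

```python
from collections import Counter

# Hypothetical (voice, agent_expressed) observations, invented for illustration.
observations = [
    ("passive", False), ("passive", False), ("passive", True),
    ("active", True), ("active", True), ("active", True),
    ("active", True), ("passive", False),
]

def mle_joint(obs):
    """MLE of each joint probability P(voice, agent): count / total."""
    counts = Counter(obs)
    n = len(obs)
    return {outcome: c / n for outcome, c in counts.items()}

def mle_conditional(obs, given):
    """MLE of P(voice | agent_expressed == given) = C(voice, given) / C(given)."""
    subset = [voice for voice, agent in obs if agent == given]
    counts = Counter(subset)
    return {voice: c / len(subset) for voice, c in counts.items()}

joint = mle_joint(observations)
print(joint[("passive", False)])                   # 0.375 (3 of 8)
print(mle_conditional(observations, given=True))   # {'passive': 0.2, 'active': 0.8}
```

The relative-frequency estimates are exactly the values that maximize the multinomial likelihood of the observed counts.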



Wednesday, 12 Jan 05 
Active vs. passive variation. Modeling the choice with logistic regression (a.k.a. Varbrul) 


Linguistics: E. Judith Weiner and William Labov. 1983. Constraints on the agentless passive. Journal of Linguistics 19: 29-58. Statistics: Finish up the intro to model building in SPSS. Data visualization: scatterplots. A tiny bit about logistic regression models.



Week 3 



Monday, 17 Jan 05 
Martin Luther King Day: no class






Wednesday, 19 Jan 05 
Domain minimization, contingency table statistics. Grammatical weight and ambiguity avoidance. 
HW #2 
HW #1 
Linguistics: Wasow, Thomas. 2002. Postverbal Behavior. CSLI Publications. Chapter 2. Statistics: More on contingency tables. Independence. Hypothesis testing redux. The chi-squared test.
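Under independence, the expected count for cell (i, j) of a contingency table is row_i × col_j / N, and Pearson's statistic sums (O - E)² / E over the cells. A minimal Python sketch for the 2x2 case, assuming no continuity correction; the counts (e.g. heavy vs. light NPs by shifted vs. unshifted order) are hypothetical:

```python
import math

def chi_squared_2x2(table):
    """Pearson's X^2 for a 2x2 table [[a, b], [c, d]] under independence.

    Returns (X^2, p), with p from the chi-squared distribution on 1 df.
    No Yates continuity correction is applied.
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # E_ij = row_i * col_j / N
            x2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X^2 >= x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(x2 / 2))
    return x2, p

# Hypothetical counts: rows = heavy/light NP, columns = shifted/unshifted.
table = [[30, 10], [10, 30]]
x2, p = chi_squared_2x2(table)
print(x2)  # 20.0 (every expected count is 20)
```

Here p is far below 0.001, so independence of weight and order would be rejected for these made-up counts.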



Week 4 (and 5)



Monday, 24 Jan 05 
Statistics on contingency tables and linguistic parallelism 


Linguistics: Parallelism in coordination (Roger’s handout). Statistics: Fisher's exact test. Likelihood ratios: log odds ratios and the G² test. Samples and statistical inference, estimating parameters, the method of maximum likelihood, maximum likelihood for multinomial cell probabilities. Supplemental readings: Frazier, Lyn, Alan Munn, and Charles Clifton. 2000. "Processing coordinate structures". Journal of Psycholinguistic Research.
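Fisher's exact test conditions on the table margins, so the top-left cell follows a hypergeometric distribution; the log odds ratio and the likelihood-ratio statistic G² = 2 Σ O ln(O / E) summarize the same 2x2 table. A minimal Python sketch; the two-sided p-value uses one common convention (sum the probabilities of all tables no more probable than the observed one), and the function names and example counts are illustrative:

```python
import math

def hypergeom_pmf(a, row1, col1, n):
    """P(top-left cell = a) with margins fixed (hypergeometric distribution)."""
    return math.comb(col1, a) * math.comb(n - col1, row1 - a) / math.comb(n, row1)

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p: sum over tables at most as probable as observed."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1)]
    # Small relative tolerance so ties with p_obs are not lost to rounding.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-9)))

def log_odds_ratio(table):
    """Natural log of (a*d)/(b*c); assumes no zero cells."""
    (a, b), (c, d) = table
    return math.log((a * d) / (b * c))

def g_squared(table):
    """Likelihood-ratio statistic G^2 = 2 * sum O ln(O / E); zero cells add 0."""
    rows = [sum(r) for r in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:
                total += obs * math.log(obs / (rows[i] * cols[j] / n))
    return 2 * total

table = [[3, 1], [1, 3]]          # hypothetical small counts
print(fisher_exact_2x2(table))     # approx. 0.486 (34/70)
print(log_odds_ratio(table))       # ln 9, approx. 2.197
print(g_squared(table))            # approx. 2.093
```

With counts this small the chi-squared approximation is unreliable, which is the usual motivation for the exact test.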



Wednesday, 26 Jan 05 
Probabilistic grammars. Constructing models and examining their goodness of fit. Comparing models. 
HW #3 
HW #2 
Linguistics: Suppes, Patrick. 1970. Probabilistic grammars for natural languages. Synthese 22: 95-116. Supplemental readings: Roland and Jurafsky. Verb Sense and Verb Subcategorization Probabilities. CUNY 1998.



Week 6 



Monday, 7 Feb 05 
Rest of Suppes discussion; linear regression models.
HW #4 
HW #3 
Statistics: Mean, median, and variance. Linear regression: simple and multiple linear regression.
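For simple linear regression y = b0 + b1·x, the least-squares slope is b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and the intercept is b0 = ȳ - b1·x̄. A minimal Python sketch using the standard `statistics` module for the descriptive measures; the length/rating data are hypothetical:

```python
import statistics

def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data: constituent length in words (x) vs. an invented rating (y).
lengths = [1, 2, 3, 4, 5]
ratings = [2.0, 2.9, 4.1, 5.0, 6.0]

print(statistics.mean(ratings))      # 4.0
print(statistics.median(ratings))    # 4.1
print(statistics.variance(ratings))  # sample variance (n - 1 denominator), approx. 2.555
b0, b1 = simple_linear_regression(lengths, ratings)  # b0 approx. 0.97, b1 approx. 1.01
```

Multiple regression generalizes the same least-squares criterion to several predictors, usually solved with matrix methods rather than these closed-form sums.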



Wednesday, 9 Feb 05 
Gradience in grammaticality. Magnitude estimation. 


Linguistics: Magnitude estimation for linguistic data: Bard, Ellen Gurman, Dan Robertson, and Antonella Sorace. 1996. Magnitude Estimation of Linguistic Acceptability. Language 72: 32-68. Supplemental readings: Sorace, A. 2000. "Gradients in auxiliary selection with intransitive verbs". Language 76: 859-890. Keller, Frank and Antonella Sorace. 2003. Gradient Auxiliary Selection and Impersonal Passivization in German: An Experimental Investigation. Journal of Linguistics 39:1, 57-108. Keller, Frank and Ash Asudeh. 2001. Constraints on Linguistic Coreference: Structural vs. Pragmatic Factors. In Johanna D. Moore and Keith Stenning (eds.), Proceedings of the 23rd Annual Conference of the Cognitive Science Society, 483-488.

Week 7 



Monday, 7 Feb 05 
Conditional probabilistic syntax. Determining systemic choices: Optimality Theory and Stochastic Optimality Theory 
HW #5 
HW #4 
Linguistics: Christopher D. Manning. 2002. Probabilistic Syntax. In Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.), Probabilistic Linguistics, MIT Press, 2003. Section 8.5. Statistics: Stochastic Optimality Theory. Boersma and Hayes intro. Boersma, How we learn. Paul Boersma. 1999. Optimality-Theoretic learning in the Praat program. IFA Proceedings 23: 17-35 (ROA 380).
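In Stochastic OT, each constraint has a ranking value on a continuous scale, and at every evaluation independent Gaussian noise is added to each value; the constraint with the higher resulting selection point dominates for that evaluation. For two constraints with noise standard deviation σ each, the difference of selection points has standard deviation σ√2, so P(A >> B) = Φ((μA - μB) / (σ√2)). A minimal Python sketch comparing that closed form with a simulation; σ = 2.0 is an assumed evaluation-noise value, and the function names and ranking values are hypothetical:

```python
import math
import random

def p_outranks_analytic(mu_a, mu_b, noise_sd=2.0):
    """P(constraint A outranks B at evaluation) under independent Gaussian noise."""
    z = (mu_a - mu_b) / (noise_sd * math.sqrt(2))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z

def p_outranks_simulated(mu_a, mu_b, noise_sd=2.0, trials=100_000, seed=0):
    """Monte Carlo estimate: draw noisy selection points and count A's wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        if rng.gauss(mu_a, noise_sd) > rng.gauss(mu_b, noise_sd):
            wins += 1
    return wins / trials

# Hypothetical ranking values two units apart.
print(p_outranks_analytic(100.0, 98.0))   # approx. 0.76
print(p_outranks_simulated(100.0, 98.0))  # close to the analytic value
```

If A penalizes candidate 1 and B penalizes candidate 2 (and nothing else distinguishes them), candidate 2 is the output roughly 76% of the time under these assumed values: closely ranked constraints yield variation, widely separated ones behave almost categorically.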



Wednesday, 9 Feb 05 
Argument realization and Stochastic Optimality Theory. 


Linguistics: Joan Bresnan, Shipra Dingare, and Christopher D. Manning. Soft Constraints Mirror Hard Constraints: Voice and Person in English and Lummi. Proceedings of the LFG01 Conference, pp. 13-32. Supplemental readings: Joan Bresnan and Tatiana Nikitina. 2003. "On the Gradience of the Dative Alternation". Draft of May 7, 2003.



Week 8 



Monday, 14 Feb 05 
Logistic regression models of systemic choice 
HW #6 
HW #5 
Statistics: Logistic regression. Sankoff, D. 1988. Variable rules. In U. Ammon, [...]. Fred L. Ramsey and Daniel W. Schafer. 1997. The Statistical Sleuth: A Course in Methods of Data Analysis. Belmont, CA: Duxbury Press, chapter 20, pp. 564-583. Supplemental readings: Labov, William. 1969. Contraction, deletion and inherent variability of the English copula. Language 45: 715-762, extract.
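Logistic regression models the log odds of a binary choice as a linear function of the predictors, and its coefficients are usually found by maximum likelihood with Newton-Raphson iterations. Varbrul-style analyses fit the same model family, though they report factor weights rather than raw coefficients. A minimal Python sketch for one predictor; the data are hypothetical, and a real analysis would use SPSS or a statistics library:

```python
import math

def fit_logistic_1(xs, ys, iters=25):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns (b0, b1). Assumes the data are not perfectly separable.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p              # gradient of the log-likelihood
            g1 += (y - p) * x
            w = p * (1 - p)          # entries of the information matrix
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, inverting the 2x2 information matrix.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Hypothetical binary factor: 3 of 10 tokens show the variant when x = 0,
# 8 of 10 when x = 1.
xs = [0] * 10 + [1] * 10
ys = [1] * 3 + [0] * 7 + [1] * 8 + [0] * 2
b0, b1 = fit_logistic_1(xs, ys)
print(b0, b1)  # b0 = ln(3/7), b1 = ln(4) - ln(3/7), i.e. the log odds ratio
```

With a single binary predictor the fitted intercept is the log odds in the x = 0 group and the slope is the log odds ratio between the groups, which ties this model back to the contingency-table statistics from earlier weeks.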



Wednesday, 16 Feb 05 
Logistic regression models in linguistics reprise 


Linguistics: Arnold, Jennifer, Thomas Wasow, Ash Asudeh, and Peter Alrenga. 2004. Avoiding Attachment Ambiguities: The Role of Constituent Ordering. Journal of Memory and Language 51.1: 55-70. Supplemental readings: Lohse, Barbara, John Hawkins, and Thomas Wasow. 2004. Processing Domains in English Verb-Particle Constructions. Language 80.2: 238-261.



Week 9 



Monday, 21 Feb 05 
More on logistic regression. Interaction effects. 

HW #6 
Linguistics: Roland, Douglas, Jeffrey L. Elman, and Victor S. Ferreira. In press. Why is that? Structural prediction and ambiguity resolution in a very large corpus of English sentences. Cognition. Supplemental readings: Temperley, David. 2003. Ambiguity Avoidance in English Relative Clauses. Language 79: 464-484. Race, D. S. and M. C. MacDonald. 2003. The use of "that" in the production and comprehension of object relative clauses. Proceedings of the 25th Annual Meeting of the Cognitive Science Society.



Wednesday, 23 Feb 05 
Constraint interactions. Classification accuracy. Evaluating model fit.


Project outline 
Linguistics and statistics: Robert Sigley. 2003. The importance of interaction effects. Language Variation and Change.
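Two simple ways to evaluate a fitted model of a binary choice are classification accuracy (predict the outcome whose fitted probability is at least 0.5), judged against the majority-class baseline, and the log-likelihood of the observed outcomes under the model. A minimal Python sketch with hypothetical fitted probabilities and outcomes; the 0.5 threshold and the function names are assumptions for illustration:

```python
import math

def classification_accuracy(probs, ys, threshold=0.5):
    """Share of tokens whose observed outcome matches the thresholded prediction."""
    predictions = [1 if p >= threshold else 0 for p in probs]
    return sum(pred == y for pred, y in zip(predictions, ys)) / len(ys)

def majority_baseline(ys):
    """Accuracy of always predicting the more frequent outcome."""
    ones = sum(ys)
    return max(ones, len(ys) - ones) / len(ys)

def log_likelihood(probs, ys):
    """Sum of log P(observed outcome); higher (closer to 0) means better fit."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for p, y in zip(probs, ys))

# Hypothetical fitted probabilities P(y = 1) and observed outcomes.
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
ys = [1, 1, 0, 0, 0, 1]
print(classification_accuracy(probs, ys))  # 4 of 6 correct
print(majority_baseline(ys))               # 0.5
print(log_likelihood(probs, ys))
```

Accuracy ignores how confident each prediction was, which is one reason model comparisons usually also look at likelihood-based measures.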



Week 10 



Monday, 28 Feb 05 
Model comparisons: stochastic OT and logistic regression.



Linguistics: Gerhard Jäger and Anette Rosenbach. 2004. The winner takes it all - almost. Cumulativity in grammatical variation. Manuscript, University of [...]. Supplemental readings: Anette Rosenbach. Aspects of iconicity and economy in the choice between the s-genitive and the of-genitive in English. To appear in B. Mondorf and G. Rohdenburg (eds.), Determinants of Grammatical Variation in English. Mouton de Gruyter. Altenberg. The Genitive v. the of-Construction.



Wednesday, 2 Mar 05 
Model comparisons: stochastic OT and logistic regression. Decision tree or so-called "analogic" models.



Linguistics: Ernestus, Mirjam Theresia Constantia, and Harald R.
Baayen. 2004. Predicting the Unpredictable:
Interpreting Neutralized Segments in Dutch. Language 79(1). 



Week 11 (i.e., we won't get to this!) 



Monday, 7 Mar 05 
More model comparisons.



Linguistics: Sarah Benor and Roger Levy. 2004. The Chicken or the Egg?
A probabilistic analysis of English binomials. Draft. http://www.stanford.edu/~rog/papers/binomials.pdf 



Wednesday, 9 Mar 05 
Wrap up. 






The End 

Final paper 