Ling 235: Quantitative and Probabilistic Explanation in Linguistics
Handout #2: Winter 2005 Syllabus

Course Syllabus

(updated 2005/01/04)

This is a tentative syllabus and is subject to change.





Week 1




Wednesday, 5 Jan 05

An Introduction and an Example. 



Linguistics: What motivates probabilistic approaches and statistical methodology in linguistics? Problems of categoricity. The greater explanatory power of probabilistic models. Some examples.

Statistics: Exploratory Data Analysis (EDA). Introduction to SPSS. A case study on a sociolinguistics dataset. 

Supplemental readings:

Steven Abney. 1996. Statistical Methods and Linguistics. In: Judith Klavans and Philip Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language. The MIT Press, Cambridge, MA.

Christopher D. Manning. 2002. Probabilistic Syntax. In Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.), Probabilistic Linguistics, MIT Press, 2003.



Week 2



Monday, 10 Jan 05

Basic concepts in probability and the idea of building probabilistic models for linguistic explanation.

HW #1


Statistics: The sociolinguistics example continued: model building in SPSS (building a VARBRUL/logistic regression model of the data). Probability intro: counting, basic probability laws, maximum likelihood; discrete distributions; joint and conditional probability. Hypothesis tests.

Supplemental readings:

John Goldsmith. 2001. Probability for linguists. (Available as Microsoft Word or converted to HTML.) or

Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Chapter 2, pp. 39-54, 60-68, 72-76. or

Rice, John A. Mathematical Statistics and Data Analysis. 2nd edition. Duxbury Press, 1995.



Wednesday, 12 Jan 05

Active vs. passive variation. Modeling the choice with logistic regression (a.k.a. VARBRUL).




E. Judith Weiner and William Labov. 1983. Constraints on the agentless passive. Journal of Linguistics 19: 29-58.


Finish up intro on model building in SPSS.  Data visualization: scatterplots.  A tiny bit about logistic regression models.



Week 3



Monday, 17 Jan 05

Martin Luther King Day - no class






Wednesday, 19 Jan 05

Domain minimization, contingency table statistics.  Grammatical weight and ambiguity avoidance.

 HW #2

 HW #1


Wasow, Thomas. Postverbal Behavior. CSLI Publications. 2002. Chapter 2.


More on contingency tables. Independence.  Hypothesis testing redux.  The chi-squared test.



Week 4 (and 5)



Monday, 24 Jan 05

Statistics on contingency tables and linguistic parallelism



Linguistics: Parallelism in coordination (Roger’s handout).

Statistics: Fisher's exact test. Likelihood ratios: log odds ratios and the G² test. Samples and statistical inference, estimating parameters, the method of maximum likelihood, maximum likelihood for multinomial cell probabilities.

Supplemental Readings:

Frazier, Lyn, Alan Munn, and Charles Clifton. 2000. "Processing coordinate structures". Journal of Psycholinguistic Research.



Wednesday, 26 Jan 05

Probabilistic grammars.  Constructing models and examining their goodness of fit.  Comparing models.

 HW #3

 HW #2


Suppes, Patrick. 1970. Probabilistic grammars for natural languages. Synthèse 22: 95-116.

Supplemental readings:

Roland and Jurafsky. Verb Sense and Verb Subcategorization Probabilities. CUNY 1998. 



Week 6



Monday, 7 Feb 05

Rest of Suppes discussion and linear regression models.

 HW #4

 HW #3

Statistics: Mean, median, and variance. Linear regression: simple and multiple linear regression.



Wednesday, 9 Feb 05

Gradience in grammaticality.  Magnitude estimation. 



Linguistics: Magnitude Estimation for linguistic data

Bard, Ellen Gurman, Robertson, Dan, and Sorace, Antonella. 1996. Magnitude Estimation of Linguistic Acceptability. Language 72: 32-68.

Supplemental Readings:

Sorace, A. (2000). "Gradients in auxiliary selection with intransitive verbs". Language 76: 859-890.

Keller, Frank and Antonella Sorace. 2003. Gradient Auxiliary Selection and Impersonal Passivization in German: An Experimental Investigation. Journal of Linguistics 39:1, 57-108.

Keller, Frank and Ash Asudeh. 2001. Constraints on Linguistic Coreference: Structural vs. Pragmatic Factors. In Johanna D. Moore and Keith Stenning, eds., Proceedings of the 23rd Annual Conference of the Cognitive Science Society, 483-488. Mahwah, NJ: Lawrence Erlbaum.


Talk to Roger or Chris about final project!

Week 7



Monday, 7 Feb 05

Conditional probabilistic syntax. Determining systemic choices: Optimality Theory and Stochastic Optimality Theory

HW #5

HW #4 


Christopher D. Manning. 2002. Probabilistic Syntax. In Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.), Probabilistic Linguistics, MIT Press, 2003. Section 8.5.

Statistics: Stochastic optimality theory. Boersma and Hayes intro. Boersma, How we learn.

Paul Boersma. 1999. Optimality-Theoretic learning in the Praat program. IFA Proceedings 23: 17-35 (ROA 380).



Wednesday, 9 Feb 05

Argument realization and Stochastic Optimality Theory.




Joan Bresnan, Shipra Dingare, and Christopher D. Manning. Soft Constraints Mirror Hard Constraints: Voice and Person in English and Lummi. Proceedings of the LFG01 Conference, pp. 13-32, Hong Kong.

Supplemental readings:

Joan Bresnan and Tatiana Nikitina. 2003. "On the Gradience of the Dative Alternation".   Draft of May 7, 2003.  



Week 8



Monday, 14 Feb 05

Logistic regression models of systemic choice

HW #6

HW #5

Statistics: Logistic regression.

Sankoff, D. 1988. Variable rules. In U. Ammon, N. Dittmar, and K. J. Mattheier (eds.), Sociolinguistics: An International Handbook of the Science of Language and Society. Vol.2, pp. 984-997. Berlin: Walter de Gruyter.

Fred L. Ramsey and Daniel W. Schafer. 1997. The Statistical Sleuth: A Course in Methods of Data Analysis. Belmont, CA: Duxbury Press, chapter 20, pp. 564-583.

Supplemental readings:

Labov, William. 1969. Contraction, deletion and inherent variability of the English copula. Language 45, 715-62, extract.



Wednesday, 16 Feb 05

Logistic regression models in linguistics reprise




Arnold, Jennifer, Thomas Wasow, Ash Asudeh, and Peter Alrenga. Avoiding Attachment Ambiguities: the role of Constituent Ordering. Journal of Memory and Language 51.1: 55-70. 2004.

Supplemental readings:

Lohse, Barbara, John Hawkins, and Thomas Wasow. Processing Domains in English Verb-Particle Constructions. Language 80.2: 238-261. 2004.



Week 9



Monday, 21 Feb 05

More on logistic regression.  Interaction effects.


HW #6


Roland, Douglas, Jeffrey L. Elman, Victor S. Ferreira (in press). Why is that? Structural prediction and ambiguity resolution in a very large corpus of English sentences. Cognition. 

Supplemental readings:

Temperley, David. 2003. Ambiguity Avoidance in English Relative Clauses. Language 79: 464-84.

Race, D. S. & MacDonald, M.C. (2003). The use of "that" in the production and comprehension of object relative clauses. Proceedings of the 25th Annual Meeting of the Cognitive Science Society.



Wednesday, 23 Feb 05

Constraint interactions. Classification accuracy. Evaluating model fit.


 Project outline

Linguistics and statistics:

Robert Sigley. 2003. The importance of interaction effects.  Language Variation and Change.



Week 10



Monday, 28 Feb 05

Model comparisons: stochastic OT and logistic regression.




Gerhard Jäger and Anette Rosenbach. 2004.  The winner takes it all - almost. Cumulativity in grammatical variation, manuscript, University of Potsdam and University of Düsseldorf.

Supplemental Readings:

Anette Rosenbach: Aspects of iconicity and economy in the choice between the s-genitive and the of-genitive in English. To appear in B. Mondorf and G. Rohdenburg (eds), Determinants of Grammatical Variation in English. Mouton de Gruyter.

Altenberg. The Genitive v. the of-Construction.



Wednesday, 2 Mar 05

Model comparisons: stochastic OT and logistic regression. Decision tree or so-called "analogic" models.




Ernestus, Mirjam Theresia Constantia, and R. Harald Baayen. 2003. Predicting the Unpredictable: Interpreting Neutralized Segments in Dutch. Language 79(1).



Week 11 (i.e., we won't get to this!)



Monday, 7 Mar 05

More model comparisons.




Sarah Benor and Roger Levy.  2004. The Chicken or the Egg?  A probabilistic analysis of English binomials.  Draft.



Wednesday, 9 Mar 05

Wrap up.






Final paper