Stefan Th. Gries Colloquium

Stanford Linguistics is pleased to announce the following colloquium:

Stefan Th. Gries
University of California, Santa Barbara

Data-driven approaches in corpus linguistics: The role of granularity for register variation, temporal stages, and temporal change

Friday, January 15, 3:30 pm, Margaret Jacks 126

Reception immediately afterwards in the department lounge.

Thanks Arto!

Abstract

Corpus linguistics is inherently a distributional discipline: corpora contain nothing but things to count, namely frequencies of occurrence (of morphemes, words, lemmas, n-grams, utterances, texts, etc.), frequencies of co-occurrence (of words, words and patterns, patterns and patterns, etc.), and distributions of elements within and across files/texts/registers. Thus, any subject studied corpus-linguistically must be operationalized in terms of counts and dispersions.
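As a rough illustration of what "counts and dispersions" can mean in practice, the following minimal Python sketch computes a frequency of occurrence, frequencies of co-occurrence (adjacent word pairs), and two simple dispersion values over a tiny corpus. The toy texts and all function names are invented for this illustration, and the choice of measures (range and deviation of proportions) is an assumption; the abstract does not say which measures the talk uses.

```python
from collections import Counter

# Hypothetical toy corpus: three "texts" (corpus parts), already tokenized.
corpus = {
    "text_a": "the cat sat on the mat".split(),
    "text_b": "the dog sat".split(),
    "text_c": "a cat and a dog".split(),
}


def frequency(word):
    """Frequency of occurrence: total hits of `word` across all texts."""
    return sum(tokens.count(word) for tokens in corpus.values())


def bigram_counts():
    """Frequencies of co-occurrence: counts of adjacent word pairs."""
    counts = Counter()
    for tokens in corpus.values():
        counts.update(zip(tokens, tokens[1:]))
    return counts


def text_range(word):
    """Dispersion as range: number of texts containing `word`."""
    return sum(1 for tokens in corpus.values() if word in tokens)


def deviation_of_proportions(word):
    """Dispersion as deviation of proportions (assumed measure).

    Half the summed absolute difference between each text's share of the
    word's hits and that text's share of the corpus size. Values near 0
    mean the word is spread in proportion to text sizes; values near 1
    mean it is concentrated in a small part of the corpus.
    """
    total_tokens = sum(len(tokens) for tokens in corpus.values())
    total_hits = frequency(word)
    if total_hits == 0:
        raise ValueError(f"{word!r} does not occur in the corpus")
    return 0.5 * sum(
        abs(tokens.count(word) / total_hits - len(tokens) / total_tokens)
        for tokens in corpus.values()
    )


if __name__ == "__main__":
    print(frequency("the"))
    print(bigram_counts()[("the", "cat")])
    print(text_range("the"))
    print(round(deviation_of_proportions("the"), 3))
```

Even in this small sketch, the results depend on how the corpus is divided into parts: treating each text as one part, or merging texts into registers, changes both dispersion values.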

However, choosing a particular operationalization requires potentially treacherous decisions regarding the desired/required level of granularity. In many contemporary corpus-linguistic studies, such decisions are made arbitrarily and top-down/a priori. In this talk, I will argue in favor of (i) a more widespread use of different kinds of bottom-up approaches and (ii) more frequent and more thorough exploration as well as combination of different levels of granularity in corpus-linguistic studies. To exemplify these arguments, I will use studies of register variation as well as diachronic change.