Expressive content and the semantics of contexts
NSF Grant No. BCS-0642752

NSF Research Highlights 2008

Expressives — swears, honorifics, epithets, exclamatives — are reliable indicators of the speaker's attitudes and emotions. They are among our clearest windows into the conversational dynamics and, in turn, into the social dynamics of the conversation's participants. At the same time, their meanings are highly variable, shifting from exuberant to aggressive, approving to disapproving, as the context of utterance changes. On the project Expressive Content and the Semantics of Contexts, Christopher Potts and his research team are mapping this linguistic terrain using the tools of theoretical linguistics, experimental statistics, and information extraction.

The research group has collected a number of multi-million-word, expressive-rich corpora in Chinese, English, German, and Japanese. The individual texts in these corpora — product reviews, interviews, newspaper articles — are annotated with information that reflects important properties of the context in which the author was writing: who her audience was, what she was responding to, how she felt about the topic, and so forth. Together with a deep understanding of the linguistic structures, these annotations help reveal what expressives convey and how vital they are to linguistic communication.

The expressive damn is a useful illustration. Like the more charged and controversial F-word, it is syntactically flexible and conveys a wide range of messages. Unlike the F-word, though, it is only mildly taboo, so it is easy to get data on how it is used in different discourse situations. The group's corpus of informal product reviews yields a nuanced picture of the heightened emotion it conveys. Each text in this corpus is annotated with a star rating, one through five stars, reflecting the author's overall assessment of the product under discussion. As one might expect, the language in the extreme rating categories (one star and five stars) is significantly more emotional than the language in the middle-of-the-road reviews; authors who write extreme reviews either love or loathe what they are writing about.

Figure 1 is a basic statistical analysis of the distribution of damn in these reviews. The ratings are along the x-axis, centered around 0 for statistical reasons. The y-axis gives the log-odds of damn in each rating category. The quadratic logistic regression (red) is a good fit, whereas the linear regression (blue) is not. This helps validate the visual impression that the distribution is U-shaped: damn appears primarily in extreme reviews and is thus a reliable marker of heightened emotion, though it provides little evidence about whether that emotion is positive or negative. In this sense, damn contrasts with the German expressive der hammer (Figure 2) and the Chinese swear tama (Figure 3), both of which encode their emotional polarity to an important extent.

Figure 1: damn
Figure 2: (der) hammer
Figure 3: tama
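
The comparison between a linear and a quadratic logistic fit described for Figure 1 can be illustrated with a small sketch. The counts below are invented for illustration and are not the project's data, and the fitting routine (Newton's method on grouped binomial counts, written in plain Python) is an assumption about how such a fit might be done, not the research group's own code. The names fit_logistic, log_likelihood, hits, and totals are hypothetical.

```python
import math

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_logistic(xs, hits, totals, degree, steps=50):
    """Fit log-odds = polynomial in x (of the given degree) by Newton's method."""
    rows = [[x ** d for d in range(degree + 1)] for x in xs]
    size = degree + 1
    # Start from the overall rate so that the first Newton step is well behaved.
    p0 = sum(hits) / sum(totals)
    beta = [math.log(p0 / (1 - p0))] + [0.0] * degree
    for _ in range(steps):
        ps = [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, row))))
              for row in rows]
        grad = [sum(row[j] * (k - n * p)
                    for row, k, n, p in zip(rows, hits, totals, ps))
                for j in range(size)]
        hess = [[sum(row[i] * row[j] * n * p * (1 - p)
                     for row, n, p in zip(rows, totals, ps))
                 for j in range(size)] for i in range(size)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta

def log_likelihood(beta, xs, hits, totals):
    """Binomial log-likelihood, omitting the constant binomial coefficients."""
    ll = 0.0
    for x, k, n in zip(xs, hits, totals):
        eta = sum(b * x ** d for d, b in enumerate(beta))
        ll += k * eta - n * math.log1p(math.exp(eta))
    return ll

# Hypothetical data: occurrences of the word and total tokens per star rating.
ratings = [1, 2, 3, 4, 5]
hits = [40, 15, 10, 18, 45]
totals = [100000] * 5
xs = [r - 3 for r in ratings]  # center the ratings around 0

linear = fit_logistic(xs, hits, totals, degree=1)
quad = fit_logistic(xs, hits, totals, degree=2)
ll_lin = log_likelihood(linear, xs, hits, totals)
ll_quad = log_likelihood(quad, xs, hits, totals)
lr_stat = 2 * (ll_quad - ll_lin)  # likelihood-ratio statistic, 1 degree of freedom

print("linear coefficients:", linear)
print("quadratic coefficients:", quad)
print("likelihood-ratio statistic:", lr_stat)
```

In a sketch like this, a positive coefficient on the squared term, together with a likelihood-ratio statistic well above the conventional chi-square cutoff of 3.84 (one degree of freedom, 5% level), is what a U-shaped distribution would look like numerically. In practice one would more likely use a standard statistics package than hand-written Newton steps.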

Figures 1-3 exemplify just some of the robust statistical shapes that project researchers have identified. These shapes reflect hearer expectations about what emotional (and nonemotional) language signals, and they provide (probabilistic) information about the relationship between the speaker's emotional state and the language she uses. The project has developed similar methods for probing the nature of Japanese honorific markers, German politeness markers, and Chinese particles, among others.

Potts has also found applications of these results outside of linguistics. For example, with John Kingston, he recently investigated the relationship between the qualitative information conveyed by news headlines like "The markets soared (plunged)!" and the quantitative information they purport to reflect. He also commented in the Wall Street Journal on the linguistic issues surrounding the ongoing U.S. Supreme Court case FCC v. Fox Television Stations, the case of the "fleeting expletive", and he has written about these issues for a general audience at the award-winning weblog Language Log. Potts regards these efforts as indicative of the general importance of these results; he sees many applications in the areas of media studies, the law, artificial intelligence, and cognitive psychology.