Research

Some themes we have focused on:

1. Reduced speech is processed just as well as clear speech

Since the 1960s, researchers have consistently reported a listener bias toward “clear speech”: because clear speech is slower, has a more dispersed vowel space, and has more release bursts for final segments (as examples), these productions are said to benefit the auditory system.  I showed that the bulk of this research was biased by its stimuli: clear speech was recorded, and “reduced” speech was created by manipulating that recording to arrive at a reduced version.  The studies were designed, in effect, to show a clear speech benefit.  Using spoken language at different speech rates, I showed that naturally produced (within-accent) variation is processed equally well by listeners.  Importantly, I also showed that variation in the speech signal is helpful, not harmful, to successful recognition.  This finding was replicated multiple times and impacted our theories of language processing and representation, shifting the field away from seeking differences between clear speech and manipulated speech and toward investigating speech as talkers actually produce it.  That shift led to the more interesting questions about processing, representation, and variation that I detail below.

Sumner, M., and Samuel, A. G. (2005). Perception and representation of regular variation: The case of final /t/. Journal of Memory and Language, 52, 322-338.

Sumner, M. (2011). The role of variation in the perception of accented speech. Cognition, 119, 131-136.

de Marneffe, M.-C., Tomlinson, J., Tice, M., and Sumner, M. (2011). The interaction of lexical frequency and phonetic variability in the perception of accented speech. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

Sumner, M. (2013). A phonetic explanation of phonological variant effects. Journal of the Acoustical Society of America, 134, EL26-EL32.

2. Equivalence in online processing tasks does not imply an equivalence in memory

For years, psycholinguists and linguists alike have investigated the nature of language representations.  To do this, researchers typically use an immediate processing task (semantic priming, word naming, shadowing, etc.) to assess participants’ responses to variable speech patterns.  Similarities and differences across conditions are then taken as evidence about representation (e.g., a word like CAT might facilitate recognition of the word DOG, and the same word produced slightly differently, with the final T similar to the medial sound in mitten, might also facilitate recognition of DOG).  Before my work, this type of equivalence would have been used to support a traditional, decades-old theory in linguistics that language representations are abstract and devoid of detail.  My work called out this way of making implicit assumptions about representations, arguing that we need to conduct memory experiments in order to make claims about representation.  I showed quite clearly, within and across speakers of various social groups, that listeners understand variably produced words equally well in immediate tasks, but that these equivalences in processing map onto clear inequivalences in memory.  Specifically, listeners recall less frequent “clear speech” forms better than they recall reduced speech forms, even though the two are processed equivalently in the immediate term.  This work highlighted interactions in memory specificity and the effects of language use patterns on the system as a whole, and it challenged our notions of representation in linguistics.

Sumner, M., and Samuel, A. G. (2009). The effect of experience on the perception and representation of dialect variants. Journal of Memory and Language, 60, 487-501.

Sumner, M. (2013). A phonetic explanation of phonological variant effects. Journal of the Acoustical Society of America, 134, EL26-EL32.

Sumner, M., and Kim, S. K. (2020). Some thoughts on the phonetics-psycholinguistics interface. In J. Setter & R.-A. Knight (Eds.), The Cambridge Handbook of Phonetics. Cambridge: Cambridge University Press, to appear.

3. Spoken words are socially weighted

Research on spoken word recognition typically asks questions like those articulated above: How do we move from a variable speech signal to representations, and how do we take any number of physical instantiations of a given spoken word and quickly map it to meaning, without conversational breakdowns?  In this work, speech strings are treated as strings of linguistic units.  But speech is not just a string of linguistic units.  Speech carries phonetic information, in the form of predictable acoustic variation, that tells listeners not only what was said, but also who said it.  Through various immediate and long-term spoken word recognition tasks, and across various populations whose speech carries social baggage, I have found that words carry social weight.  Specifically, the inequality of words in memory described in (2) is predictable from how often a word is said, how the word is uttered, and the social context in which the listener population has experienced the word.  This finding limited the explanatory power of purely frequency-based theories of lexical representation and access, and it suggested that talker-based characteristics cannot be disentangled from speech; we can never investigate “linguistic” processing without the influence of the other information inherent in the signal.  We offered a mechanism through which this weighting occurs, which has since been tested and supported by a variety of researchers.

Sumner, M., Kim, S. K., King, E., and McGowan, K. (2014). The socially-weighted encoding of spoken words: A dual-route approach to speech perception. Frontiers in Psychology, 4, 1-13.

Sumner, M. (2015). The social weight of spoken words. Trends in Cognitive Sciences, 19, 238-239.

4. Phonetically-cued social information affects immediate processing early

Building logically on the findings in (3), I followed up on the notion that talker information influences foundational speech processing behaviors that had previously been investigated from a purely linguistic standpoint.  For example, we found that the voice of a talker influences responses in word association tasks.  Specifically, when given a word like academy produced by a man and asked for the first word that comes to mind, most listeners say school.  When the same word is produced by a woman, the top associate is awards.  Going one step further, these voice-specific associates are also predictive of online processing: given prime-target pairs like academy – awards, listeners are faster to recognize the target when the pair is produced by a woman than when it is produced by a man.  This is not exclusive to gender.  Hearing a word in an angry emotional prosody (TABLE!!!) facilitates recognition not only of semantically related words like chair, but also of the word mad.  To be clear, there is no transparent, meaning-based relationship between the words table and mad.  It is purely the prosody, not the lexical meaning, that activates this word, and it does so early enough to show semantic facilitation.  These results have impacted the field in various ways, from a broader shift toward understanding social influences in speech processing to ongoing work on how these effects fit into a single system.

Kim, S. K., and Sumner, M. (2017). The effect of emotional prosody on spoken word recognition. Journal of the Acoustical Society of America, 142, EL49-EL55.

Kim, S. K., and Sumner, M. (2015). Effects of emotional prosody on semantic priming. Proceedings of the 37th Annual Conference of the Cognitive Science Society.

King, E., and Sumner, M. (2015). Voice-specific effects in semantic association. Proceedings of the 37th Annual Conference of the Cognitive Science Society.

5. Social activation from speech introduces biases that modulate the encoding of linguistic events

Historically, linguistic processing has been considered independent of other aspects of the cognitive architecture.  By linking speech processing to attention and memory, I have challenged this notion, showing that we modulate our cognitive resources based on social information gleaned from speech.  This happens subjectively: we may rate a woman’s voice as reliable when it is presented on its own, but the rating shifts downward when it is presented in the context of a man’s voice.  This happens objectively: as attention to a particular talker increases, memory for that talker becomes more accurate (e.g., American listeners have better recall for a British speaker than for a NYC speaker).  These differences influence the subsequent processing and recall of speakers from different social categories (e.g., listeners process speech produced by black speakers quite differently from that produced by white speakers, nearly always along stereotypical lines).  This work has had an impact not only on theories of linguistics, shifting from a language-independent view to a complex-systems view, but also on society more broadly, showing that bias permeates processes as automatic as spoken word recognition.

King, S., and Sumner, M. (2014). Voices and variants: Effects of voice on the form-based processing of words with different phonological variants. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 2913-2918). Austin, TX: Cognitive Science Society.

Sumner, M., and Kataoka, R. (2013). Effects of phonetically-cued talker variation on semantic encoding. Journal of the Acoustical Society of America, 134, EL485-EL491.

What are gender barriers made of? Freakonomics radio broadcast, July 2016, http://freakonomics.com/podcast/gender-barriers/