Speech Recognition and Understanding Research in Dan Jurafsky's Lab
The lab studies a number of areas in speech recognition,
understanding, and synthesis. Our high-level focus is on
the use of linguistic knowledge (phonetic, phonological, prosodic,
syntactic, semantic, pragmatic) in machine speech processing.
-
Prosody in Speech Recognition and Synthesis:
We are working on various projects in prosody;
Jason Brenier
is working on the automatic detection of prosodic phenomena
like emphatic pitch accents. A new project, joint with
Simon King and Mark Steedman
at the University of Edinburgh, focuses on the use of prosody
in speech synthesis.
The postdoctoral position on this project has been filled!
-
Pronunciation Modeling:
A key ASR problem, especially in recognizing human-to-human
conversational speech, is predicting how words are likely to be pronounced
in context.
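As a toy sketch of what such a model stores, consider a lexicon mapping each word to weighted surface variants; the phone strings and probabilities below are invented for illustration, not real corpus statistics:

```python
# Hypothetical probabilistic pronunciation lexicon: each word maps to
# surface variants with made-up probabilities (not real corpus data).
PRON_LEXICON = {
    "and": [("ae n d", 0.30), ("ax n", 0.45), ("ax n d", 0.25)],
    "the": [("dh ax", 0.80), ("dh iy", 0.20)],
}

def most_likely_pron(word):
    """Return the highest-probability surface variant for a word."""
    return max(PRON_LEXICON[word], key=lambda v: v[1])[0]
```

The real modeling problem is choosing among these variants given the context (speaking rate, neighboring words, and so on) rather than always taking the most frequent one.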
-
Recognition of Dialect-Accented Speech:
In a
JHU Summer Workshop 2004 project directed by Thomas Zheng of Tsinghua University and
Richard Sproat of the University of Illinois,
we are working on robust speech recognition of Mandarin Chinese spoken
by speakers with southern (Shanghainese) accents.
Related to this project, Stanford linguistics student Rebecca Starr is working on sociolinguistic and
phonological causes of variation in southern Mandarin.
-
PMLA Workshop:
With Eric Fosler-Lussier and Bill Byrne, I recently co-organized
PMLA-2002 (the Pronunciation Modeling/Lexicon Adaptation Workshop), a satellite conference to the ICSLP-2002 conference.
-
What Kinds of Pronunciation Variation Are Already Modeled by Triphones:
We have been trying to understand
why improvements in ASR due to pronunciation modeling have proven so elusive.
We show that many of the
kinds of variation which previous pronunciation models attempt to capture,
including phone substitution or phone reduction,
are in fact already well captured by triphones.
Our analysis suggests new areas where future pronunciation
models should focus instead, including syllable deletion.
Jurafsky, Ward, Zhang, Herold, Yu, and Zhang (2001).
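The triphone point can be sketched in a few lines: a triphone model conditions each phone on its neighbors, so contextual substitution and reduction get folded into the unit inventory itself. (The `#` boundary symbol and the left-center+right notation below are one common convention, assumed here for illustration.)

```python
def to_triphones(phones):
    """Expand a phone sequence into context-dependent triphone units,
    written left-center+right, with '#' marking utterance boundaries."""
    padded = ["#"] + list(phones) + ["#"]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]
```

Because `ae` in `k-ae+t` and `ae` in `b-ae+g` are separate units with separately trained acoustics, variation that is predictable from the immediate phone context is learned per unit, with no explicit pronunciation rules needed.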
-
The Effect of Disfluencies on Pronunciation Reduction:
In a number of recent papers, Alan Bell,
Eric Fosler-Lussier,
Dan Gildea,
Cynthia Girand,
Michelle Gregory,
Bill Raymond and I have studied what factors cause
the pronunciation of words to be reduced
or alternatively
what causes words to have full or longer pronunciations.
One result is that words are longer when they are in disfluent
contexts: either preceded or followed by pauses, filled pauses,
or repetitions. See, most recently, our JASA paper,
Bell et al. (2003).
-
The Effect of Word Frequency and Probability on Pronunciation Reduction:
Our lab has also been working on the effect of word frequency
and word predictability or probability on pronunciation variation.
We have found that words
are more likely to have full pronunciations when they are surprising
or unpredictable.
See
Jurafsky, Bell, Gregory, and Raymond (2000),
Bell et al. (2003),
Gregory, Raymond, Bell, Fosler-Lussier, and Jurafsky (1999) (ps),
and
Jurafsky, Bell, Fosler, Girand, and Raymond (1998) (ps).
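One common way to make "surprising or unpredictable" precise is bigram surprisal, -log2 P(word | previous word). A minimal sketch, with invented counts standing in for real corpus statistics:

```python
import math

# Invented bigram and unigram counts, purely for illustration
bigram_counts = {("of", "the"): 900, ("of", "course"): 100}
unigram_counts = {"of": 1000}

def surprisal(prev, word):
    """Surprisal in bits: -log2 P(word | prev) under an MLE bigram model."""
    p = bigram_counts[(prev, word)] / unigram_counts[prev]
    return -math.log2(p)
```

Under these toy counts, "the" after "of" is far more predictable (lower surprisal) than "course", so on this account it would be the better candidate for phonetic reduction.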
-
The Effect of Word Sense or Part of Speech on Pronunciation Reduction:
We are also working on studying whether the different
senses or parts of speech of ambiguous words have different pronunciations.
See
Jurafsky, Bell, and Girand (2002).
-
Recognition of Foreign-Accented Speech:
Together with
Wayne Ward
and other collaborators at Boulder,
we have been working on better recognition of foreign-accented
English.
Here's a paper
on recognition of Spanish-accented spontaneous English.
-
Probabilistic Phonological Rules: Gary Tajchman, Eric Fosler-Lussier,
and I have looked at various ways that hand-written phonological rules can
be trained probabilistically and then used to augment an ASR lexicon.
See
Tajchman, Fosler, and Jurafsky (1995) (ps).
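A minimal sketch of the approach, using a hypothetical flapping rule (intervocalic /t/ surfacing as the flap [dx]) with an invented application probability:

```python
def apply_optional_rule(pron, rule_prob, target, replacement):
    """Apply one optional phonological rule to a base pronunciation.
    Returns (variant, probability) pairs: if the rule's target occurs,
    both the applied and unapplied surface forms are generated."""
    if target in pron:
        return [(pron.replace(target, replacement), rule_prob),
                (pron, 1.0 - rule_prob)]
    return [(pron, 1.0)]

# Hypothetical flapping rule with a made-up probability of 0.75
variants = apply_optional_rule("b ah t er", 0.75, " t ", " dx ")
```

Chaining several such rules yields a weighted set of surface pronunciations that can be added to the recognizer's lexicon, with the rule probabilities estimated from data rather than set by hand.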
-
Language Modeling:
One of the most important problems in ASR is predicting the
next word the user is likely to say. Among our areas of interest are:
-
Latent Semantic Analysis:
Noah Coccaro and I are exploring the use of Latent Semantic Analysis (LSA),
a topic-based or word-association-based model of word-document similarity,
as a language model. See for example
Coccaro and Jurafsky (1998) (ps).
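A back-of-the-envelope LSA sketch: factor a toy term-document count matrix with SVD, keep the top-k latent dimensions, and compare words by cosine in the reduced space. All counts below are invented:

```python
import numpy as np

# Toy term-document count matrix (rows = words, columns = documents);
# the values are invented for illustration.
words = ["ship", "boat", "ocean", "tree"]
X = np.array([
    [2., 0., 1., 0.],   # ship
    [1., 0., 2., 0.],   # boat
    [1., 1., 1., 0.],   # ocean
    [0., 2., 0., 3.],   # tree
])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                    # keep the top-k latent dimensions
W = U[:, :k] * s[:k]     # word vectors in the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ship, boat, tree = W[0], W[1], W[3]
```

In the reduced space `ship` and `boat` end up near each other because they occur in the same documents, while `tree` lands elsewhere; an LSA-based language model exploits this kind of similarity to boost topically related words beyond what an n-gram window can see.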
-
Stochastic Context-Free Grammars:
We have tried various experiments over the years with language models
based on
stochastic context-free grammars. A typical paper:
Jurafsky et al. (1995) (ps).
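The core computation can be illustrated with a toy grammar: in a stochastic CFG, each rule carries a probability (summing to 1 over rules sharing a left-hand side), and a derivation's probability is the product of its rule probabilities. Grammar and numbers below are invented:

```python
# Toy stochastic context-free grammar (rule -> probability);
# probabilities for rules with the same left-hand side sum to 1.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("i",)): 0.6,
    ("NP", ("food",)): 0.4,
    ("VP", ("V", "NP")): 1.0,
    ("V", ("want",)): 1.0,
}

def derivation_prob(rules):
    """Probability of a derivation: the product of its rule probabilities."""
    p = 1.0
    for rule in rules:
        p *= PCFG[rule]
    return p

# Leftmost derivation of "i want food"
p = derivation_prob([
    ("S", ("NP", "VP")),
    ("NP", ("i",)),
    ("VP", ("V", "NP")),
    ("V", ("want",)),
    ("NP", ("food",)),
])
```

Here p = 1.0 × 0.6 × 1.0 × 1.0 × 0.4 = 0.24; using an SCFG as a language model means summing such derivation probabilities over all parses of each candidate word string.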
-
Dialogue Modeling:
We also work on
probabilistic models of dialogue, especially together with our team from the
1997 Johns Hopkins Workshop on Innovative Techniques in LVCSR (Becky Bates,
Noah Coccaro, Rachel Martin, Marie Meteer, Klaus Ries, Liz Shriberg,
Andreas Stolcke, Paul Taylor, Carol Van Ess-Dykema and me).
We are especially interested in the automatic detection of dialogue
structure, such as automatic labeling of speech acts or dialogue acts.
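As a caricature of the labeling task (the actual systems combine word n-grams, prosodic features, and a discourse grammar over act sequences), a cue-based tagger with hand-picked lexical cues might look like:

```python
# Illustrative only: hand-picked lexical cues mapped to dialogue acts.
# The cue lists and act labels here are invented for the sketch.
CUES = [
    (("?",), "QUESTION"),
    (("yeah", "yep", "uh-huh"), "BACKCHANNEL"),
]

def tag_dialogue_act(utterance):
    """Assign a dialogue act label from the first matching lexical cue."""
    text = utterance.lower()
    for cues, act in CUES:
        if any(c in text for c in cues):
            return act
    return "STATEMENT"
```

A statistical version replaces the hand-picked cues with per-act language models and a Markov model over act sequences, which is the direction the workshop systems took.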
See the publications page for various
results from this work, including for example: