Beyond Voice Recognition, to a Computer That Reads Lips
From: New York Times - September 11, 2003
By: Anne Eisenberg

Enabling a computer to read lip movements could significantly improve the
accuracy of automatic speech recognition, even in noisy environments, and
researchers at IBM, Intel, and elsewhere are working on such a capability.
IBM's Chalapathy Neti says a computer can be taught to integrate audio and
visual input to determine what is being said, with the help of cameras,
statistical models, and vision algorithms. The camera picks up skin-tone
pixels, the statistical models look for face-like objects, and the algorithms
concentrate on the mouth area and ascertain where specific physical features
- the center and corners of the lips, for instance - are located. Statistical
models are also employed to combine visual and audio features and predict the
speaker's words.

Neti and colleagues are working on systems designed to handle variables that
can affect the camera-based system's accuracy, such as inconstant lighting.
One prototype is an audiovisual headset with a small camera mounted on a boom,
so that the mouth region remains visible even when the wearer is walking or
moving his head. Neti says the research group has also developed a feedback
system that monitors confidence levels.

Meanwhile, Intel researcher Ara V. Nefian says his company has created
audiovisual analysis software and released it to the public through the Open
Source Computer Vision Library. The system, which recognizes four out of five
words in noisy environments, can "extract visual features and then acoustic
features, and combine them using a model that analyzes them jointly," Nefian
explains. An audiovisual speech recognition system under development by
Northwestern University's Aggelos Katsaggelos could also be used to boost
security.

http://www.nytimes.com/2003/09/11/technology/circuits/11next.html
(Access to this site is free; however, first-time visitors must register.)

