Two-step queries bridge search and speech

By Kimberly Patch, Technology Research News

After 30-odd years of computer speech recognition development, researchers are still looking for ways to make it easier for a computer to sift individual words from the constant stream of syllables that is spoken language.

What is most difficult for speech software is recognizing the borders of a word that's not in its dictionary. Researchers from the University of Library and Information Science and the National Institute of Advanced Industrial Science and Technology, both in Japan, have made this a little easier.

The researchers have found a way to help speech recognition programs used to retrieve information from data collections like the Web identify out-of-vocabulary sequences of syllables. In a sense, the researchers have given computers a faster way to sound out words they don't already know.

State-of-the-art information retrieval systems allow users to input keywords drawn from a very large vocabulary. "It is often the case that a couple million terms are indexed for a single information retrieval system," said Atsushi Fujii, a research assistant at the University of Library and Information Science in Japan.

State-of-the-art speech recognition systems, by contrast, have to limit vocabulary size to a few tens of thousands of words in order to match each syllable sequence to a word in real time.

Because speech recognition vocabularies are so much smaller, some of the words in a spoken query to an information retrieval system may not be in the speech recognition vocabulary.

The trick to finding these words is knowing where to look. When someone uses speech recognition as an interface to search a collection of data, he naturally utters words related to the unrecognized query term, said Fujii.

To take advantage of this, the system carries out the query using the words the computer does recognize, then looks in the retrieved documents for words that are phonetically identical or similar to the unrecognized syllable sequences. The system then queries the collection again using the newly found words. This two-step process makes it possible for the computer to match an unrecognized syllable sequence to a real word relatively quickly, according to Fujii.
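The two-step process can be sketched in a few lines of code. This is a toy illustration, not the researchers' implementation: the document collection, the term-overlap ranking, and the use of string similarity as a stand-in for comparing phone sequences are all simplifying assumptions made here for clarity.

```python
from difflib import SequenceMatcher

# Hypothetical toy collection: doc id -> text. The real system indexes
# a large archive, such as newspaper articles.
DOCS = {
    1: "the new telescope observed a distant quasar near the galaxy",
    2: "stock prices rose after the central bank cut interest rates",
}

def retrieve(terms):
    """Step one: rank documents using only the in-vocabulary query terms."""
    scored = []
    for doc_id, text in DOCS.items():
        words = text.split()
        score = sum(words.count(t) for t in terms)
        if score:
            scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

def phonetic_similarity(a, b):
    # Stand-in for a real phone-sequence comparison; the actual method
    # compares recognized syllable sequences with word pronunciations.
    return SequenceMatcher(None, a, b).ratio()

def resolve_oov(oov_sequence, known_terms, threshold=0.7):
    """Step two: scan the retrieved documents for a word phonetically
    close to the out-of-vocabulary syllable sequence."""
    best_word, best_score = None, threshold
    for doc_id in retrieve(known_terms):
        for word in DOCS[doc_id].split():
            score = phonetic_similarity(oov_sequence, word)
            if score > best_score:
                best_word, best_score = word, score
    return best_word

# The recognizer hears something like "quaser" but has no entry for it;
# the recognized words "telescope" and "galaxy" steer retrieval to the
# right document, where the similar word "quasar" is found.
print(resolve_oov("quaser", ["telescope", "galaxy"]))  # prints "quasar"
```

The recovered word would then be fed back into a second, final query against the collection, which is what lets the method improve retrieval without the recognizer ever having known the word.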

The researchers tested their method by dictating queries to archives of newspaper articles. The method improved the information retrieval system's accuracy and did not increase the search time, according to Fujii.

The researchers also used their data retrieval method to beef up a speech recognition system's vocabulary with appropriate new words. "We used a target collection to recover speech recognition errors so as to improve the quality of [both] speech recognition and information retrieval," Fujii said.

The method is a way to improve speech-driven information retrieval systems, which could lead to interactive dialogue and question-answering systems that allow users to control computers by speech, according to Fujii. These include car navigation systems, and Web search using telephones and mobile computers, he said.

The researchers have come up with a "clever trick" for turning sequences of syllables that are not in a speech recognizer's vocabulary into words, said Brian Roark, a senior technical staff member at AT&T Research. "This takes a step toward solving the problem of turning... syllable sequences into [correctly spelled] words," he said.

The method is potentially useful for speech recognition in general, Roark said. "If you can somehow leverage a particular task to give an indication of likely [out-of-vocabulary] words in a particular context, it might be possible to exploit this," he said.

But because large-vocabulary recognition programs don't come across a lot of out-of-vocabulary sequences, the total possible gain in recognition from this method would probably be fairly small, Roark added.

The researchers' next step is to do larger-scale experiments using different types of document collections, such as technical papers and Web pages, said Fujii.

The researchers' current experiments use Japanese speech that is dictated directly to the computer, said Fujii. Ultimately, the researchers are aiming to be able to process spontaneous speech in different languages, he said.

Practical applications using dictated speech are technically possible within two years, said Fujii. Applications that can handle spontaneous speech will take more than three years, he added.

Fujii's research colleagues were Katunobu Itou of the National Institute of Advanced Industrial Science and Technology in Japan, and Tetsuya Ishikawa of the University of Library and Information Science. The research was funded by the University of Library and Information Science and the Japan Science and Technology Corporation (JST).

Timeline:   < 2 years, > 3 years
Funding:   University, Corporate
TRN Categories:  Databases and Information Retrieval; Human-Computer Interaction
Story Type:   News
Related Elements:  Technical paper, "A Method for Open-Vocabulary Speech-Driven Text Retrieval," posted in the arXiv physics archive at


July 24/31, 2002


© Copyright Technology Research News, LLC 2000-2006. All rights reserved.