Software
sorts video soundtracks
By Chhavi Sachdev, Technology Research News
The theme music for the nightly news and
the newscaster’s voice sound inherently different to us, but distinguishing
between the two is not an easy trick to teach a computer.
When it comes to cataloguing and indexing, however, computers are much
faster than humans. To use this indexing speed for audio in a way that’s
practical, computers must be able to distinguish a human voice from a
saxophone, and the timpani in the fourth movement of Beethoven's Sixth
Symphony from real thunder.
Scientists at Microsoft Research have come up with an algorithm that allows
computers to differentiate among speech, music, environmental sounds,
and silence in video soundtracks by mapping and comparing the characteristics
of each type of sound.
Classifying sound as noise, speech, or music is an important key to coding
audio data, said Hong-Jiang Zhang, a senior researcher and assistant managing
director at Microsoft Research in China. "Audio segmentation and classification
is [the] first step in audio content analysis and content-based [searching]
in large audio or music databases," he said. It can also help group together
segments of speech by a particular person.
To identify sound, the algorithm goes through a two-step process, Zhang
said. It analyzes each audio clip in one-second sections, then divides
each one-second window into forty 25-millisecond snippets. Using pattern
classification, it labels each 25-millisecond frame as either speech
or non-speech, then further classifies non-speech sounds as silence,
music, or environmental sound.
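The windowing described above can be sketched in a few lines. This is a toy illustration, not the researchers' implementation: the sample rate, the energy-based classifier, and all function names are assumptions made for the example (the real system uses trained pattern classifiers on acoustic features).

```python
# Sketch of the two-step segmentation: a one-second window is split
# into forty 25-millisecond frames, and each frame gets a first-pass
# label. SAMPLE_RATE and the energy threshold are assumed values.

SAMPLE_RATE = 16000                           # samples per second (assumed)
FRAME_MS = 25                                 # frame length in milliseconds
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000    # 400 samples per frame

def split_into_frames(window):
    """Split a one-second window (a list of samples) into 25 ms frames."""
    return [window[i:i + FRAME_LEN]
            for i in range(0, len(window), FRAME_LEN)]

def classify_frame(frame, energy_threshold=0.01):
    """Toy stand-in for the real pattern classifier: label a frame by
    its average energy. Distinguishing speech from music in the real
    system requires trained models, not a single energy measure."""
    energy = sum(s * s for s in frame) / len(frame)
    return "silence" if energy < energy_threshold else "non-silence"

window = [0.0] * SAMPLE_RATE                  # one second of pure silence
frames = split_into_frames(window)
print(len(frames))                            # forty 25 ms frames
print(classify_frame(frames[0]))              # silence
```

A real implementation would replace `classify_frame` with a classifier over spectral features, but the framing arithmetic (forty 25-millisecond frames per one-second window) is exactly as the article describes.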
The researchers set thresholds that let the algorithm distinguish the
sound of a slamming door from a drum roll: below a certain level, a sound
is classified as environmental noise; above it, as music.
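The thresholding step amounts to a simple decision rule. The sketch below is hypothetical: the article does not say what quantity is thresholded, so the `music_score` measure and its cutoff value are assumptions standing in for whatever feature the researchers actually used.

```python
def label_non_speech(music_score, threshold=0.5):
    """Label a non-speech segment using a single threshold.
    music_score is a hypothetical measure of how music-like the
    segment is (e.g., harmonicity or rhythm regularity); the
    threshold value 0.5 is an assumption for illustration."""
    return "music" if music_score > threshold else "environmental sound"

# A sustained drum roll would score high; a single door slam, low.
print(label_non_speech(0.8))   # music
print(label_non_speech(0.2))   # environmental sound
```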
The most difficult task the computer faces is sorting mixed sounds, said
Zhang. Speech over a noisy background, for instance, is frequently
misclassified as music, and music clips with prominent drum sounds are
falsely pegged as speech, he said.
Despite these overlaps, the algorithm classifies audio data with 96 percent
accuracy, according to Zhang. When it skips the second step of breaking
sounds into 25-millisecond frames, its accuracy on 2- to 8-second clips
drops to 91.47 percent, he said.
This whole area of research is very exciting, said Dan Ellis, an assistant
professor of electrical engineering at Columbia University. "Automatic
classification and segmentation of [sound] could make searching for a
particular example or instance orders of magnitude easier," he said.
Segmentation algorithms like Microsoft’s "could be used as a basis for
more complex classifications, for instance to distinguish between news
and sitcoms by the patterns of alternation between speech and music, or
to detect commercials based on their particular soundtrack signatures,"
Ellis said.
There are generally several things going on at once in soundtracks, said
Ellis. The sound in most video "doesn't fit very comfortably into the
black-and-white classification paradigm assumed by this and similar work.
Ultimately, we will need computer soundtrack analysis systems that can
describe segments with more nuance, for instance as ‘male speech with
orchestral music in the background’," he said.
The system can be used now for simple classification, but the researchers
plan to expand the classification categories to make it more accurate,
said Zhang. It should be in practical use within three years, he said.
Zhang’s research colleagues were Lie Lu and Hao Jiang at Microsoft Research
in Beijing. They presented the research at the International Conference
on Multimedia held between September 30 and October 5 in Ottawa, Canada.
The research was funded by Microsoft.
Timeline: 3 years
Funding: Corporate
TRN Categories: Multimedia
Story Type: News
Related Elements: Technical paper, "A Robust Audio Classification
and Segmentation Method," at International Conference on Multimedia of
the Association for Computing Machinery (ACM) in Ottawa, Canada, September
30 - October 5, 2001.
November 28, 2001