VR tool aims high

By Kimberly Patch, Technology Research News

We've known for a very long time that when it comes to learning some things, there's no substitute for experience. Researchers working to make virtual environments stand in for the real thing have spent the past few decades learning that actual experience is very hard to mimic.

A large group of researchers from the University of Southern California is working on a virtual environment that coordinates a raft of technologies -- and some Hollywood craft -- to take a step toward this end.

The project, funded by the U.S. Army, simulates a war-torn village where an officer on a peacekeeping mission has to make a difficult choice between joining a unit that is under attack and slowing down to get medical help for a boy hurt by an Army truck.

The technology could ultimately be used to represent any type of environment, allowing people to travel to and learn by experience in places and times where they couldn't ordinarily go, said Jeff Rickel, a project leader at the University of Southern California's Information Sciences Institute. "If you want to teach kids about ancient Greece... there's probably no substitute for just putting them in ancient Greece, [and] letting them experience the sights and the sounds... and interact with the people," he said.

Making an environment that includes believable human characters, however, is a difficult task involving a lot of interconnecting technology.

On the output side -- technologies that allow a human to experience the environment -- the researchers used 3-D computer modeling to project life-size figures on an 8- by 130-foot curved screen, a 12-speaker surround sound system, software models of emotion, a flexible storyline, speech generation software, and computer agent software that allows characters to gesture as they speak.

On the input side -- technologies that allow the computer-generated characters to react to the human using the system -- the researchers are tackling several speech recognition challenges and are using computer vision to understand pointing gestures.

The researchers also used Hollywood scriptwriters to make the scenario emotionally convincing. "You get engrossed in [a movie] because the writers and directors have done such a nice job of making it compelling and gripping. We want to create stories that are so emotionally gripping that people forget their training situation and react the way they would as if they were really in the field," Rickel said.

The difficult part is coordinating everything, because the various technologies have many interdependencies that the individual components don't address, said Rickel. For example, adding a model of emotion to a character means coordinating it with the character's natural language understanding, speech synthesis and decision-making software, he said.

"One of the challenges is to assemble this team of experts in all these different areas... and not just plug the stuff together, but have them hash out how these modules actually work together to create one virtual human," said Rickel.

The characters were built from intelligent agent software developed in an earlier project. The original agent was a humanlike knowledge base that was able to show the user around an environment, point to things, and explain them.

The project's aim is to make more realistic characters that have emotions, finely tuned gestures, an understanding of users' gestures, an ability to recognize words in noisy environments and eventually an ability to interpret word meanings.

In their pursuit of more realistic characters, the researchers also subtracted something from the original agent: its omniscient knowledge of its environment, like how much gas is in a certain tank. Now the characters have to look at gauges in order to gain that knowledge, as a human would.

In the Army scenario, the researchers are trying to create a realistically stressful environment, said Rickel. "One of the ways we do that is to create characters in the story that are getting emotional and... affecting the user through these emotions," he said.

In the simulation, the lieutenant played by the user is tugged in two different directions by emotional displays from various characters. Radio calls from the user's unit downtown get increasingly angry, "saying 'where are you, we're taking fire, we need your help, you're letting us down'," Rickel said. At the same time, right in front of the user is an injured boy and the boy's mother, who gets increasingly emotional if the user starts to send his troops to help the unit downtown.

The Army characters on the radio and the mother character display these emotions by reacting dynamically to the user's actions.

The models of emotion have drawn heavily from psychology literature, and in particular have been influenced by "models of how emotions come out of plans and goals that we have... and when somebody takes action that thwarts our plans and goals... we may get angry," he said.

The emotion software recognizes that the unit's moving out may conflict with the mother's goals, displays the appropriate emotions, and coordinates the effect those emotions have with the character's further actions.
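The goal-based idea Rickel describes can be illustrated with a minimal sketch. It is not the researchers' software: the `Goal` and `Character` classes, the 0-to-1 importance scale and the simple additive update are all assumptions made for illustration. A character holds goals, and an action that thwarts a goal raises the character's anger in proportion to how much that goal matters.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    name: str
    importance: float  # hypothetical scale: 0.0 (trivial) to 1.0 (vital)


@dataclass
class Character:
    name: str
    goals: list
    anger: float = 0.0  # clamped to the range 0.0 to 1.0

    def appraise(self, action_effects):
        """Update anger from an action's effect on this character's goals.

        action_effects maps a goal name to "thwarts" or "helps";
        goals the action does not mention leave the anger unchanged.
        """
        for goal in self.goals:
            effect = action_effects.get(goal.name)
            if effect == "thwarts":
                self.anger = min(1.0, self.anger + goal.importance)
            elif effect == "helps":
                self.anger = max(0.0, self.anger - goal.importance)
        return self.anger


mother = Character("mother", [Goal("get medical help for boy", 0.5)])
# The user orders the troops downtown, which conflicts with the mother's goal.
mother.appraise({"get medical help for boy": "thwarts"})
print(mother.anger)
```

In a fuller system the resulting anger level would also have to drive the character's speech, gestures and decisions, which is the coordination problem Rickel describes.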

In order to communicate with the user realistically, the characters must be able to convey emotion with their voices. This is something the researchers are still working on. "One of the biggest challenges that we face is emotionally expressive speech," said Rickel.

"When you have the character like the mother who is supposed to be getting very emotional [and] whose emotions are supposed to help draw the user in... if she sounds robotic like most speech synthesizers these days, it's going to completely destroy the effect," said Rickel.

The researchers are using recordings of a voice actor for the Army project, but they're working on being able to do the same thing using speech synthesis, which will allow for a more flexible storyline. "The research we're doing in voice synthesis is to study acoustic patterns that you see in emotionally expressive speech and try to reproduce that on-the-fly in a character like the mother so that we don't have to script her lines ahead of time," he said.

To do this, the researchers are tapping a relatively old technique called concatenative speech. "It's basically splicing together bits of text from lots of speech samples, in contrast to the other main approach, which is [building speech from] sounds for individual phonemes," Rickel said. With a large enough database, a speech synthesizer can theoretically pull out enough pieces to, for instance, allow a sergeant to convincingly bark an order, he said.
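The splicing idea can be shown with a deliberately simplified sketch. Real concatenative synthesizers select sub-word audio units and weigh how smoothly neighboring units join; the version below works on whole words and phrases, and the `plan_clips` function, the clip database and the clip names are hypothetical. It covers a target sentence with the longest recorded phrases available.

```python
def plan_clips(words, clip_db):
    """Cover `words` with recorded phrases, preferring the longest match.

    clip_db maps a recorded phrase (words joined by spaces) to a clip ID.
    Returns the clip IDs in playback order; raises KeyError if some word
    is not covered by any recording.
    """
    plan = []
    i = 0
    while i < len(words):
        # Try the longest remaining phrase first, then shorter ones.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in clip_db:
                plan.append(clip_db[phrase])
                i = j
                break
        else:
            raise KeyError(f"no recorded clip covers {words[i]!r}")
    return plan


# Hypothetical database of a sergeant's recorded phrases.
clip_db = {"move out": "clip_07", "move": "clip_01", "now": "clip_02"}
print(plan_clips("move out now".split(), clip_db))
```

Preferring longer recorded phrases means fewer splice points, which is one reason a larger database tends to give more natural-sounding results.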

Humans use both speech and gestures to communicate. The researchers are working on incorporating gestures into the characters' communications to make them more realistic. "We're using... movements of the hands that accompany speech [and] probably don't have any meaning by themselves but... serve to emphasize words and convey the impact of those words," Rickel said.

The researchers are also using computer vision technology to give the characters the ability to react to the user's gestures. The challenge in gestural understanding is to allow the character to recognize that the total message is a combination of speech and gesture when the user is, for instance, pointing at an object and saying 'Go get that,' Rickel said.
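One way to picture the combination of speech and gesture is the sketch below. It assumes, purely for illustration, that the speech recognizer supplies a timestamp for each word and that the vision system reports pointing events with a time and a target; `PointingEvent`, `resolve_command` and the one-second window are hypothetical, not the project's design. A word like "that" is replaced by whatever the user pointed at closest in time.

```python
from dataclasses import dataclass

# Words whose meaning depends on an accompanying gesture.
DEICTIC = {"this", "that", "there"}


@dataclass
class PointingEvent:
    time: float   # seconds since the start of the utterance
    target: str   # object the vision system believes was pointed at


def resolve_command(words, word_times, gestures, max_gap=1.0):
    """Replace deictic words with the target of the nearest pointing gesture.

    A gesture is used only if it occurred within max_gap seconds of the
    word; otherwise the word is left unchanged.
    """
    resolved = []
    for word, t in zip(words, word_times):
        if word.lower() in DEICTIC and gestures:
            nearest = min(gestures, key=lambda g: abs(g.time - t))
            if abs(nearest.time - t) <= max_gap:
                resolved.append(nearest.target)
                continue
        resolved.append(word)
    return resolved


gestures = [PointingEvent(0.7, "stretcher"), PointingEvent(5.0, "truck")]
print(resolve_command(["Go", "get", "that"], [0.0, 0.3, 0.6], gestures))
```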

It's also important that the characters understand the user by recognizing the words the user says even in noisy situations, recognizing the emotion attached to those words and, eventually, being able to discern the meaning of those words. These tasks are beyond the capabilities of today's standard dictation software, which simply recognizes words, and only when conditions are good.

"One challenge is to actually recognize the emotional content of the message... and still recognize the words despite the fact that the person is emotional," Rickel said.

The system's sound environment is also a difficult challenge for the speech recognizer. The 12-speaker surround sound system brings the user 3600 watts of power through 64 channels. For comparison, the 20,000-seat Hollywood Bowl uses a 3500-watt speaker system. "If you've got helicopters buzzing over[head] and an angry crowd yelling at the guy and a distraught mother screaming for help for her boy, then it's a pretty noisy environment," said Rickel. This is very different from what today's speech recognizers can generally handle, he said. "So one of the challenges for speech recognition is how do you -- like a person -- pick out the speech that this person is saying in the midst of all this other noise," he said.

A future challenge is natural language understanding, which will allow the characters to not only recognize words, but interpret their meaning. This would allow for a much more flexible story. Currently, the human participants have to stay close to a script.

The researchers are also working toward making the characters more automated by giving them a wider range of movements. Currently the characters' motions are generated by capturing the motion of real humans. Motion capture is realistic, but, like using voice actors to script emotional speech, limits the storyline's flexibility, said Rickel. "We're looking at how we can generate animations on-the-fly that retain the realistic look of motion capture [but] give you more flexibility," he said.

Pulling off convincing characters for any length of time or for scenarios that aren't closely scripted is an incredibly difficult artificial intelligence problem that's not likely to be solved anytime soon, said Terry Winograd, a professor of computer science at Stanford University.

The question to ask about the Army demo research is whether "it's going to build scientific knowledge which in the future will let us build realistic things," said Winograd, adding, "that's a hard question to resolve."

It's not clear whether demos like the Army project really delve into the hard intelligence problems that humans solve instinctively, he said. "There are some mechanisms that we just have no understanding of, yet they go on inside your brain, inside your body, inside your hormones. [It is] relatively easy to observe the behaviors of people who are having emotions and... mimic these behaviors in a simple pattern-driven kind of way. [But] it's not going to be consistently like a person... it'll only be when you happen to hit the right pieces of pattern," he said.

Rickel's research colleagues were William Swartout, Randy Hill, John Gratch, W. Lewis Johnson, Chris Kyriakakis, Catherine LaBore, Richard Lindheim, Stacey Marsella, David Miraglia, Ben Moore, Jackie Morie, Marcus Thiébaux, Larry Tuch, Richard Whitney and Jay Douglas of the University of Southern California. They are scheduled to present their research at the Agents 2001 conference in Montreal, May 28-June 1, 2001. The research was funded by the Army Research Office.

Timeline: Now
Funding: Government
TRN Categories: Human-Computer Interaction; Artificial Intelligence
Story Type: News
Related Elements: Technical paper, "Toward the Holodeck: Integrating Graphics, Sound, Character and Story," scheduled for presentation at the Agents 2001 conference in Montreal, May 28-June 1, 2001. Technical paper, "Steve Goes to Bosnia: Towards a New Generation of Virtual Humans for Interactive Experiences," presented at the AAAI Spring Symposium on Artificial Intelligence and Interactive Entertainment at Stanford University, March 26-28, 2001.

May 30, 2001
