InfoWiz: An Animated Voice Interactive Information System

InfoWiz Project

The InfoWiz project is centered around the idea of putting an interactive kiosk into the lobby of SRI. People who have a few minutes to spend should be able to learn something about SRI, enjoy themselves, and walk away with a good feeling of having seen something interesting and unusual.

One of the design decisions of the project has been to use speech recognition as the main form of user input to the system. Using a mouse to navigate through scrollbars and web links might be a concept unfamiliar to many visitors to SRI. Using a touchscreen is possible, but, hey, let's aim high. If we are successful in building a natural spoken language interface, this should contribute to the user's overall impression of ease of use, advanced technology, and general intelligence of the system. If we are not successful, we can always put back a touchscreen.

In order to encourage spoken interaction with the system, we added an animated character, a cartoon wizard, who attempts to engage the user in conversations about SRI. As Don Nielson (to whom this system is dedicated) used to say, talking to a computer without an animated persona who can speak back is like talking to a wall. Initially, we did our own animations based on a popular clip-art figure. In later incarnations of the interface, we adopted Microsoft Agent's Merlin character to provide the graphics for the system.

The InfoWiz Kiosk

Interactions with the Wizard

After an audio and visual welcome from the InfoWiz Wizard, the user is presented with a screen containing a web browser, and the Wizard himself. The Wizard's job is to instruct, advise, suggest and demonstrate; he will take you on tours of the InfoWiz information space or the complete set of SRI's web pages, and he can answer questions you might have along the way. The Wizard is able to cross window boundaries, and interact with any part of the display, flying and pointing with his magic wand to various places on the screen.

All audio interactions with the system occur through a telephone next to the kiosk display. Users can make natural spoken requests about information presented on the current web page or about general topics related to SRI. The Wizard is capable of answering questions and providing supplementary information (videos, etc.) about the InfoSpace. When the user is looking at web pages outside of its own domain (the InfoSpace), the InfoWiz can scroll pages or help navigate links but, of course, does not have any special supplementary information about these pages.
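
The distinction between pages inside and outside the InfoSpace can be sketched roughly as follows. This is only an illustration, not the InfoWiz code: the host name, the intent names, and the functions `in_infospace` and `available_help` are all hypothetical, and the real system may decide what counts as the InfoSpace in a different way.

```python
from urllib.parse import urlparse

# Hypothetical: hosts whose pages the Wizard has extra knowledge about.
INFOSPACE_HOSTS = {"infowiz.example.org"}

# Hypothetical intent names for requests that work on any web page.
NAVIGATION_INTENTS = {"scroll_up", "scroll_down", "follow_link"}


def in_infospace(url: str) -> bool:
    """Return True if the page belongs to the (assumed) InfoSpace hosts."""
    return urlparse(url).hostname in INFOSPACE_HOSTS


def available_help(url: str, intent: str) -> str:
    """Decide what kind of help the Wizard can offer for a request."""
    if intent in NAVIGATION_INTENTS:
        # Scrolling and link navigation are offered on every page.
        return "navigate"
    if in_infospace(url):
        # Questions can be answered only where the Wizard has knowledge.
        return "answer_with_supplementary_info"
    return "no_special_knowledge"


inside = available_help("http://infowiz.example.org/labs.html", "question")
outside = available_help("http://www.example.com/news.html", "question")
scroll = available_help("http://www.example.com/news.html", "scroll_down")
```

In this sketch a question about an outside page falls through to `no_special_knowledge`, while a scroll request is handled the same way everywhere.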

InfoWiz Kiosk Screen

The Challenge

The biggest challenge of this project is to create a system that appears intelligent, easy to use, and not overly constraining to an untrained, not necessarily technically aware person who walks in off the street. A potential user may have only a few minutes to spend with the system, so it must be intuitively and immediately obvious what to do.

The Approach

Our approach is as follows:

Provide interaction styles that are familiar even to people who have never seen Microsoft Windows or a web browser: a telephone as an input device, and a human character with whom to converse.
Combine numerous state-of-the-art artificial intelligence technologies in a flexible and incremental fashion. Allow the system to be workable soon, but easily expandable in its capabilities.
Find a way to allow easy updating of the content, vocabulary and knowledge required by the system to appear intelligent.

The system is implemented on top of the Open Agent Architecture™ (OAA™), which provides the means for bringing together the component technologies in a plug-and-play manner. The OAA is an open and distributed system; components can be written in different programming languages, and be distributed over multiple computers. In addition, technologies can be added incrementally -- we can start out with a simple natural language processing component, and then, as the system is used and user input is analyzed in more detail, we can replace the prototype natural language agent with our most capable system.
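
The idea of swapping one natural language agent for another without touching the rest of the system can be illustrated with a toy, single-process broker. This sketch is not the OAA API: the `Facilitator` class, the `parse_utterance` capability name, and both parser functions are hypothetical stand-ins, and a real OAA system would distribute its agents over processes and machines.

```python
from typing import Callable, Dict


class Facilitator:
    """Toy broker: routes a request to whichever agent provides a capability."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[str], str]] = {}

    def register(self, capability: str, handler: Callable[[str], str]) -> None:
        # Registering again under the same capability replaces the provider.
        self._providers[capability] = handler

    def solve(self, capability: str, request: str) -> str:
        if capability not in self._providers:
            raise LookupError(f"no agent provides {capability!r}")
        return self._providers[capability](request)


def prototype_nl(utterance: str) -> str:
    # Hypothetical keyword spotter standing in for a simple first NL agent.
    return "scroll_down" if "scroll" in utterance.lower() else "unknown"


def capable_nl(utterance: str) -> str:
    # Hypothetical richer agent that understands more phrasings.
    text = utterance.lower()
    if "scroll" in text or "further down" in text:
        return "scroll_down"
    if "tell me about" in text:
        return "describe_topic"
    return "unknown"


facilitator = Facilitator()
facilitator.register("parse_utterance", prototype_nl)
before = facilitator.solve("parse_utterance", "Tell me about SRI")

# Upgrade the NL component; callers of solve() do not change.
facilitator.register("parse_utterance", capable_nl)
after = facilitator.solve("parse_utterance", "Tell me about SRI")
```

Callers ask for a capability rather than a specific component, so the upgrade is a single re-registration.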

The Technologies

A number of different technologies are integrated using the OAA to create the InfoWiz Kiosk. The plug-and-play nature of OAA has allowed us to experiment with various versions and vendors for the component technologies. The basic elements, as pictured below, are animation graphics, speech recognition, natural language interpretation, dialog management, text-to-speech, and knowledge about the InfoSpace.

InfoWiz Architecture

Speech Recognition is provided by technology created in SRI's STAR Laboratory and commercialized by Nuance Communications, a spinoff company. We have also tried IBM's ViaVoice product in its stead.
Natural Language processing is handled by a mixture of Nuance's NL API and SRI's DCG-NL parser. Other more powerful systems may be brought in later as needed.
Wizard animation is handled by Microsoft Agent. Under UNIX, we used DFKI's PPP Animated Presentation Agent, which provided both graphics and a powerful multimedia generation system based on planning technology. We also used for a time a Java-based animation package called the Gamelet toolkit, written by Mark Tachi.
We have written agents enabling us to incorporate either Netscape or Internet Explorer as the content viewer. A previous incarnation of the InfoWiz project made use of a modified variation of NCSA's Mosaic version 2.6.
Currently, for text output, we use text-to-speech provided as part of Microsoft Agent. In previous versions, the Wizard's voice was composed either of sequenced audio files played by the PlayWav agent, or of text-to-speech generated by Entropic's TrueTalk system. Combining recorded utterances limits what the Wizard can say, but sounds much better. The system adapts dynamically to the current set of agents connected to the network, using whichever output modalities are available.
To author, test and manage the knowledge that the InfoWiz has about the InfoSpace, we created our own knowledge management tools and a dialog processor to help guide the animations and verbal responses of the character.
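
The way voice output can adapt to whichever agents are connected might look like the sketch below. It is an assumption-laden illustration, not the InfoWiz implementation: the agent names `playwav` and `tts`, the `choose_output` function, the file name, and the rule of preferring a recording when one exists are all hypothetical choices, motivated only by the remark above that recorded utterances sound better but limit what can be said.

```python
from typing import Dict, Optional, Set, Tuple


def choose_output(
    connected: Set[str],
    utterance: str,
    recordings: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Pick an output agent and payload for one Wizard utterance.

    connected  -- names of output agents currently available (assumed names)
    recordings -- maps an utterance to a pre-recorded audio file (hypothetical)
    """
    if "playwav" in connected and utterance in recordings:
        # Recorded audio sounds better, but exists only for known phrases.
        return ("playwav", recordings[utterance])
    if "tts" in connected:
        # Text-to-speech can say anything, at lower audio quality.
        return ("tts", utterance)
    # No voice output agent is connected.
    return None


recordings = {"Welcome to SRI": "welcome.wav"}  # hypothetical file name

both = choose_output({"playwav", "tts"}, "Welcome to SRI", recordings)
novel = choose_output({"playwav", "tts"}, "Let me call him for you", recordings)
tts_only = choose_output({"tts"}, "Welcome to SRI", recordings)
silent = choose_output(set(), "Welcome to SRI", recordings)
```

Because the choice is made per utterance from the current set of connected agents, adding or removing an output agent changes the behavior without any change to the caller.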

Since these technologies are brought together by the Open Agent Architecture, it is probable that other OAA-enabled agents will be brought to bear on this project. Existing agents which may be considered for this project include:

Small Vision Module (SVM) stereo camera and algorithms for detecting whether people are near the kiosk.
Gemini Natural Language Understanding System, our most robust NL parser for building spoken language systems. CommandTalk is an example of a project that first started using the DCG-NL agent and later upgraded to Gemini as the domain became better understood.
Telephone control, for example: User: "I'm here to see Adam Cheyer", Wizard: "Let me call him for you..."
FASTUS: a system for extracting information from free text, which might be applied to extracting knowledge from ever-changing webpages.
Robots: SRI's family of mobile robots has been integrated into the OAA (an OAA-coordinated multiple-robot approach recently won the Office Navigation task of the AAAI Robot Contest). How about: User: "I'm here to see Adam Cheyer", Wizard: "Let me send someone to show you the way there..."

If you are interested in the evolution of this project, see the InfoWiz Project Timeline.