InfoWiz: An Animated Voice Interactive Information System
InfoWiz Project
The InfoWiz project is centered around the idea of
putting an interactive kiosk into the lobby of SRI. People who have a few minutes
to spend should be able to learn something about SRI, enjoy themselves, and walk
away with a good feeling of having seen something interesting and unusual.
One of the design decisions of the project has been to use speech recognition
as the main form of user input to the system. Using a mouse to navigate through
scrollbars and web links might be a concept unfamiliar to many visitors to
SRI. Using a touchscreen is possible, but, hey, let's aim high. If we are
successful in building a natural spoken language interface, this should
contribute to the user's overall impression of ease of use, advanced technology,
and general intelligence of the system. If we are not successful, we can always
fall back to a touchscreen.
In order to encourage spoken interaction with the system, we added an
animated character, a cartoon wizard, who attempts to engage the user in
conversations about SRI. As Don Nielson (to whom this system is dedicated) used
to say, talking to a computer without an animated persona who can speak back is
like talking to a wall. Initially, we did our own animations based on a popular
clip-art figure. In later incarnations of the interface, we adopted Microsoft
Agent's Merlin character to provide the graphics for the system.
The InfoWiz Kiosk
Interactions with the Wizard
After an audio and visual welcome from the
InfoWiz Wizard, the user is presented with a screen containing a web browser and the Wizard himself. The Wizard's job
is to instruct, advise, suggest and demonstrate; he will take you on tours of
the InfoWiz information space or the complete set of SRI's web pages, and he can
answer questions you might have along the way. The Wizard is able to cross
window boundaries, and interact with any part of the display, flying and
pointing with his magic wand to various places on the screen.
All audio interactions with the system occur through a telephone next to the
kiosk display. Users can make natural spoken requests about information
presented on the current web page or about general topics related to SRI. The Wizard is
capable of answering questions and providing supplementary information (videos,
etc.) about the InfoSpace. When the user is looking at web pages outside of the
Wizard's own domain (the InfoSpace), the Wizard can scroll pages or help the user
navigate links but, of course, does not have any special supplementary
information about these pages.
InfoWiz Kiosk Screen
The Challenge
The biggest challenge of this project is to create a
system that appears intelligent, easy to use, and not overly constraining, to an
untrained, not necessarily technically aware person who walks in off the street.
A potential user may have only a few minutes to spend with the system, so it
must be intuitively and immediately obvious what to do.
The Approach
Our approach is as follows:
- Provide interaction styles that are familiar even to people who have never
seen Microsoft Windows or a web browser: a telephone as an input device, and a
human character with whom to converse.
- Combine numerous state-of-the-art artificial intelligence technologies in
a flexible and incremental fashion. Allow the system to be workable soon, but
easily expandable in its capabilities.
- Find a way to allow easy updating of the content, vocabulary and knowledge
required by the system to appear intelligent.
The system is
implemented on top of the Open Agent
Architecture™ (OAA™), which provides the means for
bringing together the component technologies in a plug-and-play manner. The OAA
is an open and distributed system; components can be written in different
programming languages, and be distributed over multiple computers. In addition,
technologies can be added incrementally -- we can start out with a
simple natural language processing component, and then as the system is used and
user input analyzed in more detail, we can replace the prototype natural
language agent with our most capable system.
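The incremental replacement described above can be sketched in a few lines. This is a toy illustration only, not the actual OAA interface: the `Facilitator` class, the `interpret` capability name, and both natural language agents are hypothetical names invented for this example. It shows a simple agent being registered for a capability and later replaced by a more capable one, without any change to the code that issues requests.

```python
from typing import Callable, Dict


class Facilitator:
    """Toy stand-in for a facilitator that routes requests by capability name.

    Hypothetical: this is not the real OAA API.
    """

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[str], dict]] = {}

    def register(self, capability: str, handler: Callable[[str], dict]) -> None:
        # A later registration for the same capability replaces the earlier one.
        self._providers[capability] = handler

    def solve(self, capability: str, payload: str) -> dict:
        if capability not in self._providers:
            raise LookupError(f"no agent offers {capability!r}")
        return self._providers[capability](payload)


def keyword_nl_agent(utterance: str) -> dict:
    """Prototype NL agent: keyword spotting only."""
    words = utterance.lower().split()
    if "scroll" in words:
        return {"action": "scroll", "agent": "keyword"}
    return {"action": "unknown", "agent": "keyword"}


def pattern_nl_agent(utterance: str) -> dict:
    """Hypothetical replacement agent: also extracts a scroll direction."""
    words = utterance.lower().split()
    if "scroll" in words:
        direction = "up" if "up" in words else "down"
        return {"action": "scroll", "direction": direction, "agent": "pattern"}
    return {"action": "unknown", "agent": "pattern"}


facilitator = Facilitator()

# Start with the simple agent.
facilitator.register("interpret", keyword_nl_agent)
first = facilitator.solve("interpret", "please scroll up")

# Later, plug in the more capable agent; the caller's request is unchanged.
facilitator.register("interpret", pattern_nl_agent)
second = facilitator.solve("interpret", "please scroll up")
```

The caller only names the capability it needs, so swapping the agent behind that capability does not affect the rest of the system.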
The Technologies
A number of different technologies are integrated using
the OAA to create the InfoWiz Kiosk. The plug-and-play nature of OAA has allowed
us to experiment with various versions and vendors for the component
technologies. The basic elements, as pictured below, are animation graphics,
speech recognition, natural language interpretation, dialog management,
text-to-speech, and knowledge about the InfoSpace.
InfoWiz Architecture
- Speech Recognition is provided by technology created in SRI's STAR Laboratory and commercialized by
Nuance Communications, a spinoff company. We have also tried IBM's ViaVoice
product in its stead.
- Natural Language processing is handled by a mixture of Nuance's NL API
and SRI's DCG-NL parser. Other more powerful systems may be brought in later
as needed.
- Wizard animation is handled by Microsoft Agent. Under UNIX,
we used DFKI's PPP Animated Presentation Agent,
which provided both graphics and a powerful multimedia generation system
based on planning technology. We also used for a time a Java-based animation
package called the Gamelet
toolkit written by Mark Tachi.
- We have written agents enabling us to incorporate either Netscape or
Internet Explorer as the content viewer. A previous incarnation of the InfoWiz
project made use of a modified variation of NCSA's
Mosaic version 2.6.
- Currently, for text output, we use text-to-speech provided as part of
Microsoft Agent. In previous versions, the Wizard's voice was composed either
of sequenced audio files played by the PlayWav agent, or text-to-speech
generated by Entropic's TrueTalk system. Combining recorded utterances limits
what the Wizard can say, but sounds much better. The system adapts dynamically
to the current set of agents connected to the network, using whichever output
modalities are available.
- To author, test and manage the knowledge that the InfoWiz has about the
InfoSpace, we created our own knowledge management tools and a dialog
processor to help guide the animations and verbal responses of the character.
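The dynamic choice of output modality mentioned in the text-to-speech item above might be sketched as follows. This is a minimal illustration under stated assumptions: the agent names `play_wav` and `text_to_speech`, the function `choose_output`, and the rule of preferring recorded audio whenever a recording exists are all assumptions made for this example, not details of the InfoWiz implementation.

```python
from typing import Optional, Set


def choose_output(text: str, connected: Set[str], recorded: Set[str]) -> Optional[str]:
    """Pick an output agent for `text` from the agents currently connected.

    `connected` holds the (hypothetical) names of connected output agents.
    `recorded` holds the utterances for which recorded audio exists.
    """
    # Assumption: recorded audio is preferred because it sounds better,
    # but it only covers utterances that were recorded in advance.
    if "play_wav" in connected and text in recorded:
        return "play_wav"
    # Text-to-speech can say anything, so it covers the remaining utterances.
    if "text_to_speech" in connected:
        return "text_to_speech"
    # No connected agent can speak this utterance.
    return None


recorded_utterances = {"Welcome to SRI."}

both = {"play_wav", "text_to_speech"}
choice_recorded = choose_output("Welcome to SRI.", both, recorded_utterances)
choice_novel = choose_output("Hello there.", both, recorded_utterances)
choice_tts_only = choose_output("Welcome to SRI.", {"text_to_speech"}, recorded_utterances)
choice_none = choose_output("Hello there.", {"play_wav"}, recorded_utterances)
```

Because the decision is made per utterance against the set of agents connected at that moment, an output agent can join or leave the network without the rest of the system being reconfigured.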
Since these technologies are brought together by the Open Agent Architecture, it is probable
that other OAA-enabled agents will be brought to bear on this project. Existing
agents which may be considered for this project include:
- Small Vision Module (SVM) stereo camera and algorithms for detecting
whether people are near the kiosk.
- Gemini
Natural Language Understanding System, our most robust NL parser for
building spoken language systems. CommandTalk is an
example of a project that first started using the DCG-NL agent and later
upgraded to Gemini as the domain became better understood.
- Telephone control. For example: User: "I'm here to see Adam Cheyer", Wizard: "Let
me call him for you..."
- FASTUS:
a system for extracting information from free text, which might be applied to
extracting knowledge from ever-changing web pages.
- Robots: SRI's family of
mobile robots has been integrated into the OAA (an OAA-coordinated
multiple-robot approach recently won the Office Navigation task of the AAAI
Robot Contest). How about: User: "I'm here to see Adam Cheyer", Wizard: "Let
me send someone to show you the way there..."
If you are interested in the evolution of this project, see the InfoWiz Project
Timeline.