Versatile Multimodal Solutions
By: T.V. Raman
User interaction is about creating an effective man-machine conversation that leads to rapid task completion. With this in mind, we can factor the typical application into three parts: the data model that holds the current interaction state, the user interface components that render this state, and the interaction behavior, which is determined by the active event handlers.
Today, Web interaction is authored in markup languages like XHTML, with the host document providing the data model and user interface components, and the host browser implementing the DOM Level 2 event loop that brings the Web interaction to life. The W3C architecture exemplifies this separation in its current developments:
Notice that in this architecture, the richness of end-user interaction is a function of the available user interface events and of the event handlers that can respond to a given event. As the next evolution in user interfaces, multimodal interaction can be integrated into this evolving framework by enabling the Web author to attach rich voice handlers that implement spoken dialogs to aid in rapid task completion. A means to achieve this end was first detailed in XHTML+Voice (X+V). This article describes X+V 1.1, an update to X+V that incorporates more than two years of experience gained from implementing multimodal solutions with this framework. The remaining sections summarize the additions to X+V and illustrate their use in creating multimodal interaction that leverages mixed-initiative VoiceXML dialogs. Formal descriptions of these additions can be found in the X+V 1.1 specification; here we focus on motivating these additions and explaining their use.
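To make the three-way factoring concrete, here is a minimal sketch of an X+V page. The HTML control named city holds the data, the input element renders it, and a VoiceXML form attached through XML Events supplies the voice handler for the control's focus event. The ids, prompt text, and the grammar file city.grxml are hypothetical placeholders, not taken from the article's listings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>X+V skeleton (sketch)</title>
    <!-- Interaction behavior: a voice handler written in VoiceXML. -->
    <vxml:form id="voice_city">
      <vxml:field name="city">
        <vxml:prompt>Which city?</vxml:prompt>
        <!-- Hypothetical SRGS grammar file. -->
        <vxml:grammar src="city.grxml" type="application/srgs+xml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <form action="">
      <!-- Data model and UI component in one: the control named "city"
           both stores and renders the user's input. XML Events attaches
           the voice handler to this control's focus event. -->
      <input type="text" name="city" id="city"
             ev:event="focus" ev:handler="#voice_city"/>
    </form>
  </body>
</html>
```

Nothing in this sketch copies the recognized value back into the visual control; that gap is what the xv:sync element discussed later is meant to close.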
Aural CSS - Speaking in Style
p.romeo { voice-family: male; }

Create the content in style. Identify p elements in the XHTML document using class romeo or juliet, as shown below.
<body ev:event="load" ev:handler="#sayHello">

Finally, define the voice handler invoked in the above fragment to speak the contents of the p elements. Contents of the XHTML document are accessed from within the vxml:prompt element using the new X+V attribute xv:src. Using attribute src in this manner is consistent with the forthcoming W3C XHTML 2.0, which uses this attribute to let authors specify the contents of elements indirectly; X+V 1.1 extends element vxml:prompt with attribute xv:src to provide equivalent functionality when authoring multimodal interaction.
<vxml:form id="sayHello">

See Listing 1 for the complete example.
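The three steps (style, content, voice handler) can be pulled together into one page. The sketch below is not Listing 1: it assumes a female voice for class juliet, invents the paragraph ids romeo_line and juliet_line, and assumes that xv:src takes a URI fragment naming the XHTML element whose content should be spoken. Because element names in CSS selectors are case-sensitive when XHTML is parsed as XML, the selectors use p rather than P.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
  <head>
    <title>Speaking in style (sketch)</title>
    <!-- Step 1: CSS2 aural properties pick a voice per speaker.
         The juliet rule is an assumption. -->
    <style type="text/css">
      p.romeo  { voice-family: male; }
      p.juliet { voice-family: female; }
    </style>
    <!-- Step 3: the voice handler speaks the styled paragraphs,
         pulling their text in through xv:src (hypothetical ids). -->
    <vxml:form id="sayHello">
      <vxml:block>
        <vxml:prompt xv:src="#romeo_line"/>
        <vxml:prompt xv:src="#juliet_line"/>
      </vxml:block>
    </vxml:form>
  </head>
  <!-- Step 2: invoke the handler when the page loads. -->
  <body ev:event="load" ev:handler="#sayHello">
    <p class="romeo" id="romeo_line">(Romeo's line)</p>
    <p class="juliet" id="juliet_line">(Juliet's line)</p>
  </body>
</html>
```

Keeping the spoken text in the XHTML body, rather than duplicating it inside the prompts, means the visual and aural renderings cannot drift apart.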
Two-Way Synchronization - One Handler to Bind Them

Element sync can be thought of as a declarative event handler that synchronizes the two interaction state components being linked. We describe the use of element sync in the remainder of this section, with illustrative code fragments taken from the complete example shown in Listing 2. Element sync uses attributes input and field to specify the two interaction state components to be synchronized. The attribute names were chosen to match the interaction components most commonly used in the visual and aural modalities, respectively. The value of attribute input is the value of the name attribute of the HTML form control being synchronized; because HTML forms lack a separate data model, this name attribute also identifies the location where the control stores user input. Attribute field holds the id of the VoiceXML field to be synchronized. In the mixed-initiative example shown in Listing 2, we've declared that the visual input controls that collect the hotel and city should be synchronized with the corresponding VoiceXML fields via the following statements:
<xv:sync input="city" field="#field_city"/>

Notice that the value of attribute field is a URI: as in the rest of X+V, the VoiceXML elements may appear either within the XHTML document or in a separate XML file. Using this addition to X+V, the author can enable several interaction metaphors, described in the remainder of this section.
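A sketch of both synchronized pairs in one page may help. It is not Listing 2. It assumes, as the text describes, that the VoiceXML fields carry id attributes that xv:sync can point at, and it places the xv:sync elements in the head, which is an assumption about placement. The form id voice_reservation and the grammar files are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
  <head>
    <title>Two-way synchronization (sketch)</title>
    <!-- input: name of the HTML control; field: URI of the VoiceXML field. -->
    <xv:sync input="hotel" field="#field_hotel"/>
    <xv:sync input="city" field="#field_city"/>
    <vxml:form id="voice_reservation">
      <vxml:field name="hotel" id="field_hotel">
        <vxml:prompt>Which hotel?</vxml:prompt>
        <vxml:grammar src="hotel.grxml" type="application/srgs+xml"/>
      </vxml:field>
      <vxml:field name="city" id="field_city">
        <vxml:prompt>In which city?</vxml:prompt>
        <vxml:grammar src="city.grxml" type="application/srgs+xml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body ev:event="load" ev:handler="#voice_reservation">
    <form action="">
      <!-- Typing here or speaking into the matching field updates both sides. -->
      <input type="text" name="hotel"/>
      <input type="text" name="city"/>
    </form>
  </body>
</html>
```

The design choice worth noting is that no filled handler or script appears: each xv:sync line replaces the pair of one-way copy operations an author would otherwise write by hand.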
Spoken Input with Visual Confirmation

The user says "I would like to stay at the Chicago Airport Hilton," and the recognized values are synchronized with the visual interaction components. The user gets immediate visual confirmation as a result of the values being synchronized.

Now suppose that, for the same utterance, the voice browser recognizes the hotel but fails to identify the city. Tapered prompts within the VoiceXML handler lead the user through the rest of the task, and synchronizing the fields gives the user immediate feedback on the portion of the task that remains to be completed.
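The mixed-initiative handler and its tapered prompts use ordinary VoiceXML 2.0. A form-level grammar lets one utterance fill several fields. Each prompt carries a count, and the interpreter plays the prompt with the highest count not exceeding the number of times the field has been attempted. In the sketch below, the form and field ids, the prompt wording, and the grammar files are hypothetical. reservation.grxml is assumed to return slots named hotel and city.

```xml
<vxml:form id="voice_reservation"
           xmlns:vxml="http://www.w3.org/2001/vxml">
  <!-- Form-level grammar: a single utterance such as
       "I would like to stay at the Chicago Airport Hilton"
       may fill both the hotel and the city slot. -->
  <vxml:grammar src="reservation.grxml" type="application/srgs+xml"/>
  <!-- Open prompt, visited until the form-level grammar matches. -->
  <vxml:initial name="start">
    <vxml:prompt>Where would you like to stay?</vxml:prompt>
  </vxml:initial>
  <!-- Fields left empty by the first utterance are visited next;
       their prompts taper on repeated attempts. -->
  <vxml:field name="hotel" id="field_hotel">
    <vxml:grammar src="hotel.grxml" type="application/srgs+xml"/>
    <vxml:prompt count="1">Which hotel?</vxml:prompt>
    <vxml:prompt count="2">Please say the name of the hotel.</vxml:prompt>
  </vxml:field>
  <vxml:field name="city" id="field_city">
    <vxml:grammar src="city.grxml" type="application/srgs+xml"/>
    <vxml:prompt count="1">Which city?</vxml:prompt>
    <vxml:prompt count="2">Please say the city where the hotel is located.</vxml:prompt>
  </vxml:field>
</vxml:form>
```

In the partial-recognition case described above, the hotel field is already filled after the opening utterance, so the interpreter skips it and goes straight to the city field's first prompt.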
Talk and Type
Talk or Type
Cancel - Speech Is Silver, but Silence Is Golden
Deploying X+V Multimodal Solutions
PDA
Mixed-mode client

This feature is a consequence of reusing XML Events within X+V to create event bindings. XML Events was designed to operate in the hypertext environment of the Web, and a key design feature is that event handlers can be referred to via a URI. We emphasize this point because the X+V specification itself does not highlight it: we got the feature for free by reusing XML Events. Since the publication of X+V 1.0, how to author remote voice handlers has been one of the questions X+V partners ask most frequently, which is why we highlight the solution here.
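Because ev:handler takes a URI, pointing it at a separate document is all that is needed to use a remote voice handler. The sketch below adapts the load binding from the Aural CSS example. The host voice.example.com and the file handlers.vxml are hypothetical, and how a particular X+V browser fetches or caches that document is not covered here.

```xml
<!-- Fragment of an XHTML page: namespaces are declared on body so the
     fragment is well-formed on its own. -->
<body xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      ev:event="load"
      ev:handler="http://voice.example.com/handlers.vxml#sayHello">
  <!-- handlers.vxml is a hypothetical VoiceXML document on the network
       that contains a form with id="sayHello". -->
  <p>The visual content is unchanged; only the handler reference is remote.</p>
</body>
```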
Thin client

This has the advantage that only the visual markup is transmitted to the thin client; the deployment respects the separation of concerns between visual and voice markup by not sending the voice markup to the device. This can save valuable bandwidth and processing cycles on the thin client. The voice handlers are preloaded into the voice server, which can consequently respond quickly when it receives a request from the thin client to invoke one of the specified voice handlers. XML Events bridges the two execution environments: the visual browser on the thin client and the voice browser on the network. Finally, notice that because the X+V document is completely declarative, the multimodal application can be deployed reliably to devices such as cellphones, which typically lack support for a full scripting environment.
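For the thin-client case, the page shipped to the device can contain no VoiceXML at all. One way to express the bindings is the XML Events listener element, which names the event, the observing element, and a handler URI. In the sketch below, the voice server host, reservation.vxml, and the form ids voice_hotel and voice_city are hypothetical. How the device forwards a handler invocation to the voice server is outside the markup and is not shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Thin-client page (sketch)</title>
    <!-- Only visual markup ships to the device. Each listener binds a
         control's focus event to a voice handler preloaded on the
         (hypothetical) voice server. -->
    <ev:listener event="focus" observer="hotel"
                 handler="http://voice.example.com/reservation.vxml#voice_hotel"/>
    <ev:listener event="focus" observer="city"
                 handler="http://voice.example.com/reservation.vxml#voice_city"/>
  </head>
  <body>
    <form action="">
      <input type="text" name="hotel" id="hotel"/>
      <input type="text" name="city" id="city"/>
    </form>
  </body>
</html>
```

The page contains no script, which matches the point above about devices that lack a full scripting environment.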