Modular Speech Application Development Using VoiceXML


One thing we've learned from Web-based application development is that tools are useful only if they can reuse components and third-party libraries and make it easy to assemble applications. This article reviews how to build modular speech applications using VoiceXML. The focus is on the language constructs VoiceXML provides for modularization and reuse, and on vendor-specific approaches to creating a library of reusable dialogs for speech applications.

As a language, VoiceXML is designed for reusability and modularity. As on the Web, VoiceXML supports modularizing application components through document-to-document navigation (menus, form submits, subdialogs, etc.). The src attribute, available on a number of VoiceXML elements, allows loosely coupled, URL-based integration.

Since we're used to a hyperlinked world today, we take this for granted, but compared with earlier speech applications, it's like comparing one huge compiled application with a loosely coupled yet integrated one. We're used to separating Web application functionality into multiple sets of scripts (Perl scripts, PHP pages, ASP.NET pages, JavaServer Pages), images, and other media. VoiceXML leverages the same capability: the language supports dividing an application into VoiceXML documents (which can be generated dynamically through server-side programming), grammars, prompts, and subdialogs.
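To make the URL-based loose coupling concrete, here is a minimal Python sketch (not part of any VoiceXML toolkit) that lists every external resource a VoiceXML document pulls in through src and next attributes. The sample document and all file names in it are hypothetical, and the VoiceXML namespace is omitted so that element names stay unqualified.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML document assembled from separately hosted pieces.
# The namespace declaration is omitted so ElementTree reports plain tag names.
DOC = """<?xml version="1.0"?>
<vxml version="2.0">
  <form id="main">
    <subdialog name="user" src="GetUserInfo.vxml"/>
    <field name="city">
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <prompt><audio src="welcome.wav"/></prompt>
    </field>
    <block><goto next="trade.vxml"/></block>
  </form>
</vxml>"""


def referenced_urls(vxml_text):
    """Return (tag, URL) pairs for every src or next attribute, in document order."""
    root = ET.fromstring(vxml_text)
    refs = []
    for elem in root.iter():
        for attr in ("src", "next"):
            if attr in elem.attrib:
                refs.append((elem.tag, elem.attrib[attr]))
    return refs


for tag, url in referenced_urls(DOC):
    print(f"{tag:10} -> {url}")
```

Each of those URLs could point at a static file or at a server-side page that generates the markup on request; the calling document doesn't need to know which.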

However, we'd be kidding ourselves if we didn't acknowledge that developing rich, high-quality speech applications is complex. VoiceXML makes it simple to assemble a speech application, but we still need sophisticated grammars and dialogs that can handle the conversations the application must support.

This is where we can draw on the dialog and grammar experts in the speech recognition industry, and where reusable components come to the rescue. Reusable components capture best practices and frequently used dialogs/grammars that an application developer (or, rather, an assembler) can use to create a high-quality speech application.

For instance, if we were to develop a stock-trading application, we'd need the ability to recognize the various companies traded on the stock exchange. If application developers had to build this capability into their own applications, they couldn't focus on the end application and would get bogged down in recognizing every possible company. If, on the other hand, a third party has already built such a dialog/grammar, it can be used as is, letting developers focus on the value-added business scenarios rather than on the complexity of recognizing all the companies.

Reusability Requirements
As speech application developers, what we need is a set of reusable dialogs and grammars that we can incorporate into our applications. We need grammars to recognize city names, streets, companies, businesses, airlines, and airports, as well as dialogs that use these complex grammars to recognize addresses, driving directions, credit card information, and the like.

In a nutshell, we need a set of published, reusable voice components. Apart from the components themselves, we need a methodology for creating and utilizing reusable components. Also required is the integration of reusable components with development tools and VoiceXML gateways and/or hosting providers so that applications built on top of the components can be readily built/assembled and deployed.

VoiceXML Subdialogs
Probably the most widely adopted approach for creating reusable dialogs is VoiceXML's <subdialog> element. A <subdialog> element in the calling dialog invokes a called dialog identified by its src attribute. The called dialog is executed in its own temporary execution context and runs until it encounters a <return> element, at which point any requested information is returned to the calling dialog. Subdialogs can be parameterized through <param> elements and are fundamental to building a set of reusable components for VoiceXML applications.
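The calling convention can be modeled in a few lines of Python. This is a toy model, not how any VoiceXML interpreter is implemented: a dialog is a function that receives a fresh scope seeded only with its <param> values and names the variables it hands back, as <return namelist="..."> would. The dialog body, variable names, and values are all hypothetical.

```python
from typing import Callable, Dict, List

Scope = Dict[str, object]


def invoke_subdialog(caller: Scope, name: str,
                     dialog: Callable[[Scope], List[str]],
                     params: Scope) -> Scope:
    """Mimic <subdialog name="..." src="..."> with <param> children."""
    # Temporary execution context: only the params, never the caller's variables.
    callee: Scope = dict(params)
    namelist = dialog(callee)
    # Like <return namelist="...">: copy only the listed variables back.
    caller[name] = {var: callee[var] for var in namelist}
    return caller


def get_user_info(scope: Scope) -> List[str]:
    """Hypothetical stand-in for GetUserInfo.vxml; real code would prompt the caller."""
    scope["phone"] = str(scope["areaCode"]) + "5551234"
    scope["pin"] = "1234"
    scope["attempts"] = 1           # local bookkeeping, discarded on return
    return ["phone", "pin"]         # corresponds to <return namelist="phone pin"/>


app: Scope = {"greeting": "welcome"}
invoke_subdialog(app, "GetUserInfo", get_user_info, {"areaCode": "212"})
print(app["GetUserInfo"])
```

The caller sees only what was explicitly returned, under the subdialog's name, which mirrors how VoiceXML exposes results as GetUserInfo.phone and so on.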

The following code snippet defines a subdialog that collects users' phone numbers and PINs and returns the information to the calling environment:



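<vxml version="2.0">
  <form id="getUserInfo">
    <field name="phone" type="phone">
      <prompt>What is your phone number?</prompt>
    </field>
    ...
    <block>
      <return namelist="phone pin"/>
    </block>
    ...
  </form>
</vxml>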


The subdialog can then be used by another application to authenticate a user, as shown by the following snippet:




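<form id="authenticate">
  <var name="phone"/>
  <subdialog name="GetUserInfo" src="GetUserInfo.vxml">
    <filled>
      <assign name="phone" expr="GetUserInfo.phone"/>
      ...
    </filled>
  </subdialog>
</form>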
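Because the reusable piece is referenced only by URL, an assembler tool or a server-side page can emit the calling markup programmatically. The Python sketch below builds an equivalent calling form with the standard-library ElementTree module; the function name build_caller, the form id, and its parameters are illustrative assumptions rather than any vendor's API.

```python
import xml.etree.ElementTree as ET


def build_caller(component_url: str, subdialog_name: str, field: str) -> str:
    """Build a form that calls a reusable subdialog and copies one returned value."""
    vxml = ET.Element("vxml", version="2.0")
    form = ET.SubElement(vxml, "form", id="authenticate")
    # Caller-side variable that will receive the subdialog's result.
    ET.SubElement(form, "var", name=field)
    sub = ET.SubElement(form, "subdialog", name=subdialog_name, src=component_url)
    filled = ET.SubElement(sub, "filled")
    # Returned values are reachable as <subdialog name>.<variable>.
    ET.SubElement(filled, "assign", name=field, expr=f"{subdialog_name}.{field}")
    return ET.tostring(vxml, encoding="unicode")


print(build_caller("GetUserInfo.vxml", "GetUserInfo", "phone"))
```

Swapping in a different component is then a matter of changing one URL, which is the assembler's view of reuse described earlier.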



VoiceXML subdialogs behave somewhat differently from subroutines in traditional programming languages. Subdialogs execute in a "temporary" execution context. They can take parameters, but they don't follow the event percolation model: events must be handled within the subdialog itself or be explicitly returned to the caller. Unlike components invoked through <object>, subdialogs must themselves be written in VoiceXML, which limits what can be built as a component. However, subdialogs are also the only portable mechanism available today for creating reusable dialog components for VoiceXML applications. As we'll see later, subdialogs are also the basis for the majority of vendor-specific reusable components.
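The no-percolation rule can be illustrated with another toy Python model (again, not how any interpreter works). An event raised inside the subdialog is either handled by the subdialog's own catch handlers, which may hand an event back the way <return event="..."/> does, or falls to a placeholder for the platform's default handling. The caller's handlers are consulted only for explicitly returned events. The event name getuserinfo.failed and the "unhandled" result are hypothetical; real platforms, for example, reprompt on noinput by default.

```python
from typing import Callable, Dict


class VxmlEvent(Exception):
    """An event thrown while a dialog runs, e.g. 'noinput'."""


def call_subdialog(body: Callable[[], dict],
                   local_catch: Dict[str, Callable[[], str]],
                   caller_catch: Dict[str, Callable[[], dict]]) -> dict:
    """Run a subdialog whose events never percolate to the caller on their own."""
    try:
        return body()
    except VxmlEvent as ev:
        name = str(ev)
        if name not in local_catch:
            # Simplification: stands in for the platform's default handling,
            # which happens in the subdialog's context, not the caller's.
            return {"unhandled": name}
        # A local <catch> may end with <return event="..."/>; only then
        # does the caller get a chance to handle something.
        returned = local_catch[name]()
        handler = caller_catch.get(returned)
        return handler() if handler else {"unhandled": returned}


def ask_phone() -> dict:
    raise VxmlEvent("noinput")   # the user said nothing


caller = {
    "noinput": lambda: {"handled_by": "caller noinput"},
    "getuserinfo.failed": lambda: {"handled_by": "caller failed"},
}

# The caller's own noinput handler is never consulted for the subdialog's event.
print(call_subdialog(ask_phone, {}, caller))
# The subdialog converts the event and explicitly returns it to the caller.
print(call_subdialog(ask_phone, {"noinput": lambda: "getuserinfo.failed"}, caller))
```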

VoiceXML <object>
The motivation behind the <object> tag was to keep VoiceXML implementations extensible and to package advanced functionality that hasn't yet been introduced into the standard. A VoiceXML implementation platform (tool, gateway, and/or service provider) may expose additional functionality that isn't currently available through the VoiceXML language itself.

A good example is a detailed caller ID service, which can be used to screen, identify, or initialize callers. Another example from the mobile world is a cell phone location service for emergency services. Boiling it down, components available through the <object> tag are entirely platform-dependent. The <object> element can pass parameters to the underlying component through <param> elements, and the values the component returns are exposed as an ECMAScript object.


<object name="location"
        classid="component://co.location/Location/GetDetails"
        data="LocationComponent.jar"/>

Due to its component-oriented nature, the <object> tag lends itself to a methodology for creating reusable components. The key benefit of <object>-based components is that, unlike subdialogs, they aren't limited to being implemented in VoiceXML and can therefore be written in languages such as Java and C++. Another inherent benefit is that the <object> tag can help keep a component's intellectual property in the hands of its provider, since components can be delivered in binary form without exposing the source code to the application developer.
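From the platform's side, an <object> call amounts to looking up a native implementation by its classid, passing it the <param> values, and binding the result to the object's name. The Python sketch below models that with a dictionary. The registry, the decorator, and the returned fields are hypothetical; a real platform would load the implementation from something like the data attribute (LocationComponent.jar in the snippet above) using its own classid scheme.

```python
from typing import Callable, Dict

# Hypothetical platform-side registry mapping classid URIs to implementations.
# Real platforms define their own classid schemes and loading (e.g. from a JAR).
REGISTRY: Dict[str, Callable[..., dict]] = {}


def component(classid: str):
    """Decorator that registers a native implementation under a classid."""
    def register(fn: Callable[..., dict]) -> Callable[..., dict]:
        REGISTRY[classid] = fn
        return fn
    return register


@component("component://co.location/Location/GetDetails")
def get_location(caller_id: str) -> dict:
    # Stand-in for a carrier lookup; a real component would query the network.
    return {"caller": caller_id, "cell": "cell-042"}


def execute_object(scope: dict, name: str, classid: str, params: dict) -> dict:
    """Mimic <object name="..." classid="..."> with <param> children."""
    impl = REGISTRY.get(classid)
    if impl is None:
        # VoiceXML 2.0 throws an error.unsupported.objectname event here.
        raise LookupError("error.unsupported.objectname: " + classid)
    # The result plays the role of the ECMAScript object bound to `name`.
    scope[name] = impl(**params)
    return scope


scope: dict = {}
execute_object(scope, "location",
               "component://co.location/Location/GetDetails",
               {"caller_id": "2125551234"})
print(scope["location"]["cell"])
```

The VoiceXML author sees only the classid and the returned properties, which is why the implementation can stay in a compiled, closed-source form.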

Modularized VoiceXML
Complementary to creating a set of reusable components is modularizing the language itself. XHTML+Voice, acknowledged by the W3C, attempts to split the core VoiceXML language into a set of modules, which are then combined with XHTML 1.1 to build applications that support a combination of visual and speech interactions. The specification divides VoiceXML 2.0 into about 19 modules, such as events, executable statements, flow control, dialogs, menus, objects, telephony control, audio output and speech synthesis, and resources. This lets different applications and environments support specific scenarios that don't need the complete language. For instance, a multimodal PDA without a telephony interface could support a VoiceXML browser with simplified speech recognition and speech synthesis/audio playback without supporting the telephony control module.
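A platform that implements only some modules needs to know whether a given document stays within its profile. The Python sketch below checks that for the PDA case. The element-to-module mapping is illustrative and partial, and the module names are assumptions for the sketch; the real boundaries come from the XHTML+Voice profile.

```python
import xml.etree.ElementTree as ET

# Illustrative, partial mapping from VoiceXML elements to module names.
# The real module boundaries come from the XHTML+Voice profile; these
# assignments are assumptions made for the sketch.
MODULE_OF = {
    "form": "dialogs",
    "field": "dialogs",
    "menu": "menus",
    "prompt": "audio output",
    "object": "objects",
    "transfer": "telephony",
    "disconnect": "telephony",
}


def missing_modules(vxml_text: str, supported: set) -> list:
    """Return, sorted, the modules a document uses that a platform lacks."""
    root = ET.fromstring(vxml_text)
    needed = {MODULE_OF[e.tag] for e in root.iter() if e.tag in MODULE_OF}
    return sorted(needed - supported)


# Hypothetical profile for a multimodal PDA without a telephony interface.
PDA = {"dialogs", "menus", "audio output"}

DOC = """<vxml version="2.0"><form>
  <field name="city"><prompt>Which city?</prompt></field>
  <transfer name="agent" dest="tel:+12125551234"/>
</form></vxml>"""

print(missing_modules(DOC, PDA))
```

Here the only gap is the telephony module, because of the <transfer> element; everything else the document uses falls within the PDA profile.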

As an example, Listing 1 shows a simple XHTML document that uses the proposed Voice Profile modules. A traditional telephony-based VoiceXML browser would handle it by playing the "Hello World" prompt, while a multimodal device would render a page containing the paragraph. The user can also click the paragraph to have it read aloud.

It will be interesting to see how the modularization proposed by XHTML+Voice works out, as it will be instrumental in letting the language be applied in multiple scenarios.

The remainder of this article will focus on understanding a couple of vendor-specific reusable component initiatives.

IBM Reusable Dialog Components
IBM Reusable Dialog Components are based on the VoiceXML <subdialog> element. They're built with ECMAScript, VoiceXML, and JSGF (Java Speech Grammar Format) grammars and are freely available at http://www-3.ibm.com/software/speech/enterprise/wvs-rdc.html. The package includes the complete source code of the dialog components, including the JSGF grammars, the VoiceXML dialog code, and samples showing how to use the dialogs. Because the full source is available, developers can customize the dialogs to their own requirements. For a regional application that needs to recognize the cities in your region, for example, you can customize just the city grammars and reuse the "US Major City" dialog component. IBM provides reusable dialog components for confirmation dialogs, selection lists, alphanumeric string input, credit card information, currency, dates, directions, durations, e-mail addresses, numbers, SSNs, addresses, telephone numbers, URLs, major cities/states, and so on.
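The regional-cities customization comes down to swapping in a different grammar. As a rough illustration, the Python function below generates a minimal JSGF grammar from a list of city names. The grammar name, the <city> rule name, and the city list are hypothetical; this article doesn't describe the IBM package's actual rule names or file layout.

```python
def city_grammar(grammar_name: str, cities: list) -> str:
    """Build a JSGF grammar whose public <city> rule matches one listed city.

    Assumes plain names made of words only; JSGF reserved symbols such as
    | ; < > would need quoting, which this sketch does not attempt.
    """
    alternatives = " | ".join(cities)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <city> = {alternatives};\n"
    )


print(city_grammar("regionalcities", ["Boston", "Worcester", "New Bedford"]))
```

Multi-word names such as "New Bedford" are simply token sequences in JSGF, so they need no special handling here.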

IBM also provides a methodology for creating custom reusable dialog components. Reusable dialog components are based on VoiceXML 1.0 and have been tested with IBM voice products, including IBM's WebSphere Voice Toolkit and Voice Server. A key highlight of reusable dialog components is that they're tightly integrated with IBM's VoiceXML development toolkit (see "Tools for Developing VoiceXML Applications," XML-J, Vol. 3, issue 2, February 2002). A couple of simple wizards (see Figures 1 and 2) let the application developer select a dialog component, customize it through its defined parameters, and integrate it with the VoiceXML application. Listing 2 shows how a VoiceXML application generated through the toolkit uses the reusable dialog component "uspostalcode".

Nuance SpeechObjects
Unlike IBM Reusable Dialog Components, SpeechObjects from Nuance use the <object> tag to provide reusable functionality. SpeechObjects are self-contained: each includes its own prompts, dialog logic, and grammars. They are written as JavaBeans, and Nuance provides a methodology and a set of interfaces (Java classes and APIs) for developers to write custom SpeechObjects. Currently available SpeechObjects from Nuance include confirmation dialogs; menus; browsable and actionable lists; an audio recorder; dialogs for getting dates, times, U.S. currency, time zones, SSNs, zip codes, telephone numbers, alphanumeric strings, credit card information, and the like; and SpeechObjects that integrate with Nuance products for speaker verification and enrollment and voice phrase enrollment. In addition, BeVocal, a VoiceXML service provider, has built SpeechObjects to recognize airlines, equities traded on the stock exchange, and addresses. Listing 3 shows the BeVocal SpeechObject "PickStock" being used to identify companies whose stock is traded on the stock exchange. SpeechObjects are also integrated with Nuance's development tool, V-Builder (see Figure 3).
In addition, BeVocal Café, a hosted VoiceXML development environment, lets developers test and use SpeechObjects as part of their VoiceXML applications.

SpeechWorks OpenSpeech DialogModules
Like IBM Reusable Dialog Components, DialogModules also use the subdialog approach to create a set of reusable speech components. However, they aren't shipped as static vxml, script, or grammar source files but as server-side J2EE/JSP application modules. Available DialogModules include confirmation dialogs and dialogs for getting phone numbers, currency, SSN, digits, zip codes, alphanumeric strings, credit card numbers, and USPS addresses as well as dialogs for integration with other SpeechWorks products - SpeechSecure Authentication and Voice-Activated Enrollment.

VoiceGenie, a VoiceXML gateway and development tools provider, supports a subset of the DialogModules. These are available on VoiceGenie's SpeechWorks-centric developer site, SpeechGenie Developer Workshop (http://speechgenie.voicegenie.com), a hosted VoiceXML development and testing environment. The following code snippet shows the Yes/No DialogModule used as part of a VoiceXML application.




<form>
  <subdialog src="http://dm.voicegenie.com/osdm-core/jsp/yesno/wrapper.jsp"
             name="yesno_1" method="post"
             enctype="application/x-www-form-urlencoded">
    ...
  </subdialog>
</form>
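With method="post" and that enctype, the interpreter sends any variables listed in the subdialog's namelist attribute to the server-side module as an ordinary HTML-style form body. The snippet above elides the subdialog's contents and doesn't show a namelist, so the variable names below are hypothetical. The sketch uses Python's urllib.parse to show the wire format and how a server-side handler would decode it.

```python
from urllib.parse import parse_qs, urlencode


def subdialog_post_body(namelist: str, scope: dict) -> str:
    """Encode namelist variables the way a form-urlencoded POST carries them."""
    return urlencode([(name, str(scope[name])) for name in namelist.split()])


# Hypothetical caller variables; the DialogModule's real parameters aren't shown here.
scope = {"confirm_prompt": "Did you say Boston?", "max_retries": 2}
body = subdialog_post_body("confirm_prompt max_retries", scope)
print(body)

# On the server, a JSP (or any CGI-style handler) decodes the same pairs.
print(parse_qs(body))
```

This is what lets DialogModules ship as server-side JSP modules: the subdialog call is just an HTTP request, and the JSP answers with generated VoiceXML.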

Conclusion
Let's face it, interactive speech-based application development is complex. As an open standard, VoiceXML truly did leverage developments in Interactive Voice Response (IVR), Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and telephony integration, and made it simpler for application developers to build speech applications. However, this doesn't change the fact that we need dialog and speech recognition experts to create reusable, modular components that application developers can reuse and build upon. Reusability and modularity are key to VoiceXML adoption by enterprises and the independent software vendor community.

VoiceXML today does provide basic mechanisms for reusability through the <object> tag and subdialogs. However, there is ample room for an established standard approach to reusable dialogs and other speech components. Also needed is a published library of such dialog components, so that application and software developers enrich the library rather than rebuild it (for instance, the three vendors reviewed here offer a number of the same dialog components). Until a standard approach and methodology is established and adopted, however, we have vendor-specific approaches and reusable components that we can use today.


About Hitesh Seth
Hitesh Seth is chief technology officer of ikigo, Inc., a provider of XML-based web-services monitoring and management software. A freelance writer and well-known speaker, he regularly writes for technology publications on VoiceXML, Web Services, J2EE and Microsoft .NET, Wireless Computing & Enterprise/B2B Integration. He is the conference chair for VoiceXML Planet Conference & Expo.
