Introduction to SALT

Speech Application Language Tags (SALT) is a set of XML-based tags that can be added to existing Web-based applications, enhancing the user interface through interactive speech recognition. In addition, SALT can be used to extend Web-based applications to the telephony world, thereby providing an opportunity to unleash the potential of a huge user community, users of normal touch-tone telephones.

The SALT Forum, an organization founded by Microsoft, Cisco, SpeechWorks, Philips, Comverse, and Intel, has spearheaded development of the SALT specification, now in its 1.0 release.

Multimodality: Beyond Standalone Web and Speech Applications
On one hand, we have the ubiquitous World Wide Web, which provides the core standards-based networking infrastructure and connects a large number of users to consumer and business applications. On the other hand, telephony applications based on Interactive Voice Response (IVR) and touch-tone systems have also been around for some time, providing basic connectivity (typically through touch-tone and prerecorded speech interfaces) to telephone users. In both scenarios what we're really using is a single modality. In the telephony scenario, it is touch-tone or speech input and speech output, while in the Web scenario, it is a basic graphical user interface.

Multimodality means that we can use more than one mode of user interface with an application, much as we do in normal human communication. For instance, consider an application that gives driving directions. It's typically easier to speak the start and destination addresses (or, even better, shortcuts like "my home," "my office," or "my doctor's office," based on a previously established profile), whereas the directions themselves are best viewed as a map with a turn-by-turn list, similar to what we're used to seeing on MapQuest. In essence, a multimodal application running on a desktop device would look very much like MapQuest but would also let the user talk and listen to the system for parts of the application's input and output - for example, the starting and destination addresses. That's multimodal. Imagine the same application using the same interface on a wirelessly connected PDA. Now we're talking about a true mobile, multimodal application. If we let our imaginations run a little wilder, we can easily extend the same application to the dashboard of our car or any other device we can imagine working with. That's really the vision, and given the current state of technology, it isn't far away. Another modality that could be added to the example application is a pointing device that zooms the map in on a particular location.

So how does SALT fit in with all this? SALT was designed to supply the technology needed to deploy an application in a telephony context, a multimodal context, or both.

SALT Application Model
Because the SALT specification and the platforms and tools around it are still evolving, the exact architecture of SALT-based interactive, speech-driven applications has yet to settle.

Figure 1 presents an abstract representation of an application architecture for deploying and using SALT-based applications. As expected, it's very similar to that of a Web application, with two major differences. First, the application is also capable of delivering dynamic speech interactions (provided the browser can handle SALT, e.g., through an add-in or natively). Second, a stack is present that broadly represents the integration of speech recognition/synthesis and telephony platforms.

A note of caution: this diagram is really a conceptual representation. Where exactly the SALT browser/interpreter and speech recognition/synthesis components fit in depends on the capabilities of the end-user device/browser - actual implementation of the SALT stack may vary based on vendor implementations.

The Automatic Speech Recognition (ASR) component recognizes spoken user utterances based on speech grammars, whereas the Text-to-Speech (TTS) synthesis component dynamically converts text messages into voice output.

When SALT applications are used in the telephony world, the telephony integration component connects the speech platform with the world of telephones, the Public Switched Telephone Network (PSTN). This is typically achieved by integrating telephony cards with the analog/digital telephone lines of your telephony provider (your phone company).

When SALT is used to add multimodality to a Web-based application, a typical Web application delivery framework (based on TCP/IP, HTTP, HTML, JavaScript, etc.) delivers the Web application, and the speech/telephony platform handles the speech/voice side of the interaction. How the two are connected depends on the nature of the connection and the location of the speech recognition/synthesis components. Both kinds of interaction can happen together seamlessly, as part of the same user session, at the user's choice.

SALT & VoiceXML
Both SALT and VoiceXML use the Web application delivery model as an open platform for delivering telephony applications. However, they have different technical goals - whereas VoiceXML focuses on the development of telephony-based applications (applications used through a phone), SALT focuses on adding speech interaction and telephony integration to existing Web-based applications, enabling true multimodality. In this context, I would also like to highlight another emerging standardization initiative called X+V (XHTML+Voice), an effort to combine VoiceXML with XHTML to develop multimodal applications.

Another difference between SALT and VoiceXML is the overall approach to developing applications. Whereas VoiceXML is largely declarative, relying on its extensive set of tags, SALT is procedural and script oriented, with a very small set of core tags.

Also, it's important to understand that SALT reuses key components of the standardization effort carried out by the W3C Voice Browser Activity, including the XML-based Speech Recognition Grammar Specification (SRGS) and the XML-based Speech Synthesis Markup Language (SSML). Both specifications are used by the VoiceXML 2.0 specification as well.
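To make the reuse of the W3C formats concrete, here is a minimal sketch of a SALT page that embeds an XML-form speech grammar inside a listen element and uses a synthesis markup tag inside a prompt. The element ids, the wording, and the yes/no grammar are hypothetical, and the namespace URI is assumed to be the one used by SALT 1.0. How a given SALT browser expects inline synthesis markup to be namespaced is also an assumption here.

```html
<!-- Sketch only: ids, wording, and the yes/no grammar are hypothetical. -->
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body onload="confirmPrompt.Start()">
    <!-- Prompt text with inline synthesis markup; listening starts when it ends. -->
    <salt:prompt id="confirmPrompt" oncomplete="confirmListen.Start()">
      Do you <emphasis>really</emphasis> want to cancel the order?
    </salt:prompt>

    <salt:listen id="confirmListen">
      <!-- Inline W3C XML-form grammar that accepts "yes" or "no". -->
      <salt:grammar>
        <grammar xmlns="http://www.w3.org/2001/06/grammar"
                 version="1.0" xml:lang="en-US" root="answer">
          <rule id="answer">
            <one-of>
              <item>yes</item>
              <item>no</item>
            </one-of>
          </rule>
        </grammar>
      </salt:grammar>
    </salt:listen>
  </body>
</html>
```

The same grammar could live in an external file and be referenced through the grammar element's src attribute, which keeps the page small and lets several pages share one grammar.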

Hello SALT
The best way to introduce a language is to show what it actually looks like. Following the tradition of the famous "Hello World" program, let's see how a SALT application says "Hello World."



Hello World
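A page of this kind can be sketched as follows. This is a minimal illustration rather than a definitive listing: the namespace URI is assumed to be the one used by SALT 1.0, and the prompt id "hello" is a hypothetical name.

```html
<!-- Sketch only: the id "hello" is a hypothetical name. -->
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <!-- When the page loads, start playback of the prompt below. -->
  <body onload="hello.Start()">
    <salt:prompt id="hello">Hello World</salt:prompt>
  </body>
</html>
```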

As you can see, SALT tags (here, the prompt element) have been added to an existing XHTML document and are activated through methods such as Start(). For instance, when the above document is loaded in a SALT 1.0-compatible browser (it could be a telephony browser or a multimodal desktop/handheld browser), it would say "Hello World" using a Text-to-Speech (TTS) engine.

Going further with our exploration of SALT, let's look at how SALT applications provide speech recognition capability. The code shown in Listing 1 can be used as the basic template for an interactive order information system.

To understand the various components of this multimodal application, let's look at a snapshot of the sequence of actions performed when this application is launched within a SALT-compatible browser.

  • When the application is started, the RunApp() JavaScript function initiates the prompt "Welcome" to be played.
  • It is followed by the playing of the prompt "AskOrderNumber".
  • Speech recognition is then initiated for the form parameter "number" using the external XML grammar "OrderInfo.xml".
  • Once the recognition is over, the function processOrderStatus() submits the value ("number") to the server-side script "GetOrderStatus.aspx", which would ultimately look up the sales application and provide order information.
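The sequence above can be sketched as a single page. This is an illustration under stated assumptions, not the article's Listing 1: the names RunApp(), processOrderStatus(), Welcome, AskOrderNumber, number, OrderInfo.xml, and GetOrderStatus.aspx come from the steps above, while the helper functions, the listen id, the form name, the prompt wording, and the XPath in the bind element are hypothetical. The XPath in particular depends on the result structure that the grammar produces.

```html
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <head>
    <script type="text/javascript">
      // Entry point: play the welcome prompt first.
      function RunApp() {
        Welcome.Start();
      }
      // Hypothetical helper: runs when the Welcome prompt has finished.
      function askForOrder() {
        AskOrderNumber.Start();
      }
      // Hypothetical helper: runs when the question has been played.
      function startListening() {
        RecoOrderNumber.Start();
      }
      // Runs after successful recognition: send "number" to the server.
      function processOrderStatus() {
        document.forms["OrderForm"].submit();
      }
    </script>
  </head>
  <body onload="RunApp()">
    <form name="OrderForm" action="GetOrderStatus.aspx" method="post">
      <input type="text" id="number" name="number" />
    </form>

    <salt:prompt id="Welcome" oncomplete="askForOrder()">
      Welcome to the order information system.
    </salt:prompt>
    <salt:prompt id="AskOrderNumber" oncomplete="startListening()">
      Please say your order number.
    </salt:prompt>

    <salt:listen id="RecoOrderNumber" onreco="processOrderStatus()">
      <!-- External XML grammar describing valid order numbers. -->
      <salt:grammar src="OrderInfo.xml" />
      <!-- Copy the recognized value into the "number" form field. -->
      <salt:bind targetelement="number" value="//number" />
    </salt:listen>
  </body>
</html>
```

Chaining the steps through the prompts' oncomplete handlers keeps the flow explicit in script, which matches SALT's procedural style. A production page would also handle the cases where nothing is recognized or the caller stays silent.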

Elements of SALT
Since SALT uses the underlying XHTML (or similar) markup and JavaScript as the core application delivery model, the core language defined by the SALT 1.0 specification is really a small collection of tags - prompt, listen, grammar, record, dtmf, and smex. Table 1 shows technical highlights and example usage of the elements of the SALT language.
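As one more illustration of how these elements combine, here is a minimal sketch of a page fragment that collects touch-tone input with dtmf instead of speech. The ids, the grammar file FourDigits.xml, the handler pinEntered(), and the XPath are all hypothetical, and the fragment assumes it sits inside a page that declares the salt namespace prefix.

```html
<!-- Fragment of a SALT page; ids, grammar file, handler, and XPath are hypothetical. -->
<input type="text" id="pin" name="pin" />

<!-- Ask for the PIN, then start collecting keypresses when the prompt ends. -->
<salt:prompt id="askPin" oncomplete="pinKeys.Start()">
  Please enter your four digit PIN.
</salt:prompt>

<!-- dtmf collects touch-tone keypresses and matches them against a grammar. -->
<salt:dtmf id="pinKeys" onreco="pinEntered()">
  <salt:grammar src="FourDigits.xml" />
  <!-- bind copies the matched digits into the "pin" input field. -->
  <salt:bind targetelement="pin" value="//pin" />
</salt:dtmf>
```

The structure mirrors a listen element on purpose: a grammar constrains the input and a bind copies the result into the page, so the same form can be filled in by voice or by keypad.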

Microsoft .NET Speech SDK
Support for SALT-based application development is available from Microsoft through the .NET Speech SDK. A Beta 2 of the SDK is available from Microsoft's Web site. The SDK has three key components:
1.   An add-in for Microsoft Internet Explorer that recognizes SALT tags and allows users to interact with the application using the desktop's microphone and speakers/headphones.
2.   A set of ASP.NET-based speech controls, which allow developers using Microsoft Visual Studio .NET to create multimodal/telephony applications and/or add speech interactivity to existing Web applications developed using the Microsoft .NET and ASP.NET frameworks.
3.   Development and debugging tools, including grammar builder, prompt builder, and speech debugger tools, to aid in the development of grammars and prompts.

It's important to understand that a SALT-based application can also be delivered using a non-ASP.NET Web application framework (e.g., Perl or JavaServer Pages). What the .NET Speech SDK provides is ease of development when adding speech to your existing Web applications or creating new ones.

Conclusion
With the emergence of SALT, there has clearly been some confusion about a single standard for developing open, standards-based telephony applications. Even with different key technology objectives and implementation models, both VoiceXML and SALT can be used to develop telephony applications. However, I believe that the SALT 1.0 specification and Microsoft's participation in the speech application world are very positive developments. In this article we took a quick look at the technical highlights and capabilities that the SALT 1.0 specification brings to existing Web-based applications, giving them multimodal capability and/or extending them to the telephony world. I'll continue my exploration of SALT in another article, in which I'll explain how to develop SALT-based multimodal and telephony applications using the Microsoft .NET Speech SDK.

References

  • SALT Forum: www.saltforum.org
  • SALT 1.0 Specification: www.saltforum.org/downloads/SALT1.0.pdf
  • Microsoft .NET Speech SDK: www.microsoft.com/speech

About Hitesh Seth
Hitesh Seth is chief technology officer of ikigo, Inc., a provider of XML-based Web services monitoring and management software. A freelance writer and well-known speaker, he regularly writes for technology publications on VoiceXML, Web Services, J2EE and Microsoft .NET, Wireless Computing & Enterprise/B2B Integration. He is the conference chair for VoiceXML Planet Conference & Expo.
