Introduction to SALT


Speech Application Language Tags (SALT) is a set of XML-based tags that can be added to existing Web-based applications to enhance the user interface with interactive speech recognition. SALT can also extend Web-based applications to the telephony world, opening them up to a huge user community: users of ordinary touch-tone telephones.

The SALT Forum, an organization founded by Microsoft, Cisco, SpeechWorks, Philips, Comverse, and Intel, has spearheaded development of the SALT specification, now in its 1.0 release.

Multimodality: Beyond Standalone Web and Speech Applications
On one hand, we have the ubiquitous World Wide Web, which provides the core standards-based networking infrastructure and connects a large number of users to consumer and business applications. On the other hand, Interactive Voice Response (IVR) systems and touch-tone telephony applications have also been around for some time, providing basic connectivity (typically through touch-tone and prerecorded speech interfaces) to telephony users. In both scenarios what we're really using is a single modality. In the telephony scenario, it is touch-tone or speech input and speech output; in the Web scenario, it is a basic graphical user interface.

Multimodality means that we can use more than one mode of user interface with the application, much as we do when communicating with each other. For instance, consider an application that provides driving directions. It's typically easier to speak the start and destination addresses (or even better, shortcuts like "my home," "my office," or "my doctor's office," based on a previously established profile), whereas the route itself is best viewed as a map with turn-by-turn directions, similar to what we're used to seeing on MapQuest. In essence, a multimodal application running on a desktop device would be very similar to MapQuest but would also let the user talk and listen to the system for parts of the application's input and output - for example, the starting and destination addresses.

Imagine the same application using the same interface on a wirelessly connected PDA: now we have a truly mobile, multimodal application. With a little more imagination, we can extend the same application to the dashboard of a car or to any other device we work with. That's the vision, and given the current state of technology, it isn't far away. Another modality that could be added to the example application is a pointing device that zooms the map to focus on a particular location.

So how does SALT fit in with all this? SALT is designed so that an application built with it can be deployed in a telephony context, a multimodal context, or both.

SALT Application Model
Because the SALT specification and the platforms and tools around it are still evolving, the exact architecture of SALT-based interactive, speech-driven applications has yet to settle.

Figure 1 presents an abstract representation of an application architecture for deploying and using SALT-based applications. As expected, it's very similar to that of a Web application, with two major differences. First, the application is also capable of delivering dynamic speech interactions, provided the browser can handle SALT (natively or through an add-in). Second, an additional stack is present, broadly representing the integration of speech recognition/synthesis and telephony platforms.

A note of caution: this diagram is really a conceptual representation. Where exactly the SALT browser/interpreter and speech recognition/synthesis components fit in depends on the capabilities of the end-user device/browser - actual implementation of the SALT stack may vary based on vendor implementations.

The Automatic Speech Recognition (ASR) component recognizes spoken user utterances based on speech grammars, whereas the Text-to-Speech synthesis component dynamically converts text messages into voice output.

When SALT applications are used in the telephony world, the telephony integration component connects the speech platform with the world of telephones, the Public Switched Telephone Network (PSTN). This is typically achieved by integrating telephony cards with the analog/digital telephony lines of your telephony provider (your phone company).

When SALT applications are used to add multimodality to Web-based applications, a typical Web application delivery framework (based on TCP/IP, HTTP, HTML, JavaScript, etc.) delivers the Web application, and the speech/telephony platform handles the "speech/voice" aspect of the interaction; how the two are connected depends on the nature of the connection and the location of the speech recognition/synthesis components. Both interactions can happen together seamlessly, as part of the same user session, at the user's choice.

SALT & VoiceXML
Both SALT and VoiceXML use the Web application delivery model as an open platform for delivering telephony applications. However, they have different technical goals: whereas VoiceXML focuses on the development of telephony-based applications (applications used through a phone), SALT focuses on adding speech interaction and telephony integration to existing Web-based applications and on enabling true multimodality. In this context, I would also like to highlight another upcoming standardization initiative called X+V (which stands for XHTML+Voice), an effort to combine VoiceXML with XHTML to develop multimodal applications.

Another difference between SALT and VoiceXML is the overall approach to developing applications. Whereas VoiceXML is largely declarative, relying on its extensive set of tags, SALT is procedural and script oriented, with a very small set of core tags.

Also, it's important to understand that SALT uses key components of the standardization effort carried out by the W3C Voice Browser Activity, including the XML-based Speech Recognition Grammar Specification and the XML-based Speech Synthesis Markup Language. Both specifications are used by the VoiceXML 2.0 specification as well.

Hello SALT
The best way to introduce a language is to show what it actually looks like. Following the tradition of the famous "Hello World" program, let's see how a SALT application says "Hello World."



[Listing: Hello World in SALT; markup not preserved]

In the Hello World page, SALT tags (here, a prompt element) are added to an existing XHTML document and are started through methods such as Start(). For instance, when the document is loaded in a SALT 1.0-compatible browser (it could be a telephony browser or a multimodal desktop/handheld browser), it says "Hello World" using a Text-to-Speech (TTS) engine.
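A minimal sketch of what such a Hello World page might look like follows. It is an illustration, not the original listing: the namespace URI is the one I understand the SALT 1.0 specification to use and should be checked against it, the id Hello is an illustrative name, and any vendor-specific markup a desktop browser add-in may need to activate SALT tags is omitted.

```html
<!-- Assumption: namespace URI as given in the SALT 1.0 specification -->
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <!-- When the page loads, start the prompt whose id is "Hello" (illustrative id) -->
  <body onload="Hello.Start()">
    <!-- The text content of the prompt is spoken by the Text-to-Speech engine -->
    <salt:prompt id="Hello">
      Hello World
    </salt:prompt>
  </body>
</html>
```

The prompt stays silent until script calls its Start() method, which is why the body's onload handler is needed even for this simplest case.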

Going further with our exploration of SALT, let's look at how SALT applications provide speech recognition capability. The code shown in Listing 1 can be used as the basic template for an interactive order information system.

To understand the various components of this multimodal application, let's look at a snapshot of the sequence of actions performed when this application is launched within a SALT-compatible browser.

  • When the application is started, the RunApp() JavaScript function initiates the prompt "Welcome" to be played.
  • It is followed by the playing of the prompt "AskOrderNumber".
  • Speech recognition is then initiated for the form parameter "number" using the external XML grammar "OrderInfo.xml".
  • Once the recognition is over, the function processOrderStatus() submits the value ("number") to the server-side script "GetOrderStatus.aspx", which would ultimately look up the sales application and provide order information.
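The sequence above can be sketched as the following page. This is an illustrative reconstruction from the steps, not the original Listing 1. The names RunApp(), processOrderStatus(), Welcome, AskOrderNumber, number, OrderInfo.xml, and GetOrderStatus.aspx come from the text; the listen id (RecoOrderNumber), the form name (OrderForm), the prompt wording, the namespace URI, the chaining of steps through oncomplete handlers, and the //number XPath in the bind are assumptions.

```html
<!-- Assumption: namespace URI as given in the SALT 1.0 specification -->
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <head>
    <script type="text/javascript">
      // Entry point: play the welcome prompt first.
      function RunApp() {
        Welcome.Start();
      }

      // Called after a successful recognition. By this point the bind
      // element below has copied the recognized value into the "number" field.
      function processOrderStatus() {
        document.forms["OrderForm"].submit();
      }
    </script>
  </head>
  <body onload="RunApp()">
    <!-- Form posted to the server-side script named in the text -->
    <form name="OrderForm" action="GetOrderStatus.aspx" method="post">
      <input type="text" id="number" name="number" />
    </form>

    <!-- Each prompt starts the next step when it finishes (assumed design) -->
    <salt:prompt id="Welcome" oncomplete="AskOrderNumber.Start()">
      Welcome to the order information system.
    </salt:prompt>
    <salt:prompt id="AskOrderNumber" oncomplete="RecoOrderNumber.Start()">
      Please say your order number.
    </salt:prompt>

    <!-- Recognition against the external grammar; onreco fires on success -->
    <salt:listen id="RecoOrderNumber" onreco="processOrderStatus()">
      <salt:grammar src="OrderInfo.xml" />
      <!-- Hypothetical XPath: depends on the result structure the grammar produces -->
      <salt:bind targetelement="number" value="//number" />
    </salt:listen>
  </body>
</html>
```

Chaining through event handlers keeps each step explicit; a real application would also handle the failure events (for example, no recognition or silence) rather than only the success path shown here.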

    Elements of SALT
    Since SALT uses the underlying XHTML (or similar) markup and JavaScript as the core application delivery model, the core language defined by the SALT 1.0 specification is really a small collection of tags - prompt, listen, grammar, record, dtmf, and smex. Table 1 shows technical highlights and example usage of the elements of the SALT language.
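As a small illustration of one of the telephony-oriented tags, the fragment below sketches how dtmf might collect an order number from the telephone keypad, mirroring the way listen is used for speech. It reflects my reading of the SALT 1.0 specification and should be checked against it; the id DtmfOrderNumber, the grammar file OrderDigits.xml, and the //number XPath are hypothetical, and processOrderStatus() is the handler named earlier in this article.

```html
<!-- Fragment: assumes the enclosing page declares the "salt" namespace prefix -->
<input type="text" id="number" name="number" />

<!-- Keypad input; started from script with DtmfOrderNumber.Start() -->
<salt:dtmf id="DtmfOrderNumber" onreco="processOrderStatus()">
  <!-- Grammar describing the accepted key sequences (hypothetical file) -->
  <salt:grammar src="OrderDigits.xml" />
  <!-- Copy the recognized digits into the form field -->
  <salt:bind targetelement="number" value="//number" />
</salt:dtmf>
```

Because dtmf and listen share the grammar and bind pattern, the same form field can be filled from either keypad or speech input.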

    Microsoft .NET Speech SDK
    Support for SALT-based application development is available from Microsoft through the .NET Speech SDK. A Beta 2 of the SDK is available from Microsoft's Web site. The SDK has three key components:
    1.   An add-in for Microsoft Internet Explorer that recognizes SALT tags and allows users to interact with the application using the desktop's microphone and speakers/headphones.
    2.   A set of ASP.NET-based speech controls, which allow developers using Microsoft Visual Studio .NET to create multimodal/telephony applications and/or add speech interactivity to existing Web applications developed using Microsoft .NET and ASP.NET frameworks.
    3.   Development and debugging tools, including grammar builder, prompt builder, and speech debugger tools to aid in the development of grammars and prompts.

    It's important to understand that a SALT-based application can be delivered using a non-ASP.NET Web application framework (e.g., Perl or JavaServer Pages). What the .NET Speech SDK provides is ease of development in adding speech to your existing Web applications or creating new applications.

    Conclusion
    With the emergence of SALT, there is clearly some confusion about which single standard to use for developing open standards-based telephony applications. Even with different key technology objectives and implementation models, both VoiceXML and SALT can be used to develop telephony applications. However, I believe that the SALT 1.0 specification and Microsoft's participation in the speech application world are very positive developments. In this article we took a quick look at the technical highlights and capabilities that the SALT 1.0 specification brings to existing Web-based applications, extending them with multimodal capability and/or to the telephony world. I'll continue my exploration of SALT in another article, in which I'll explain how to develop SALT-based multimodal and telephony applications using the Microsoft .NET Speech SDK.

    References

  • SALT Forum: www.saltforum.org
  • SALT 1.0 Specification: www.saltforum.org/downloads/SALT1.0.pdf
  • Microsoft .NET Speech SDK: www.microsoft.com/speech
    About Hitesh Seth
    Hitesh Seth is chief technology officer of ikigo, Inc., a provider of XML-based web-services monitoring and management software. A freelance writer and well-known speaker, he regularly writes for technology publications on VoiceXML, Web Services, J2EE and Microsoft .NET, Wireless Computing & Enterprise/B2B Integration. He is the conference chair for VoiceXML Planet Conference & Expo.
