Telecommunications Meets XML
By: Louis Marascio
Dec. 21, 2000 12:00 AM
Traditional telecommunications reside in the realm of time division multiplexed (TDM) circuit switches. Million-dollar pieces of equipment are king, and their thrones demand room-sized real estate. Beyond their myriad physical requirements, these switches present an entirely different set of hurdles when telecommunications providers try to deploy services on top of them. You see, stubbornness seems to run in the family for TDM circuit switches, and writing software to utilize their services is a daunting task. It should come as no surprise, then, that programmers for these (somewhat) esoteric beasts are quite rare. Now, I've never written software for a TDM telephony switch, but I've met people who have and listened to their stories. More important, I've looked into their eyes and seen their pain - no one wants to do what they did. As always in this type of situation, the price increases, reusability decreases, and flexibility seems to disappear.
Enter Voice-over-IP (VoIP), the art of chopping voice up into little segments and shoving them into IP packets that travel across standard data networks (see Figure 1). Convergence, they call it. Why have two networks, telephone and data, when you can sneak by with just one? An entirely new subset of telecommunications has begun to emerge around this concept, now often referred to as next-generation telecommunications. Protocols like H.323, SIP, and MGCP have emerged to handle the growing set of technological problems VoIP introduces. There's a difference, however, in how the problems are being solved today versus how our predecessors approached them. VoIP proponents have the luxury of the Internet perspective. Open standards are hot. Protocols developed "the Internet way" are being adopted and put into use throughout the new telecommunications industry.
Putting voice traffic over data networks becomes extremely interesting when we reapproach the subject that circuit-based networks have so many problems with: services. There's always a lot of buzz around new technologies with the potential to reshape the world, and VoIP is certainly no exception. One common statement goes something like this: "When I pick up a telephone to make a call, I couldn't care less about which network it travels over. What I really care about are the services offered to me during that call." Indeed, it's a valid statement and often comes from the mouth of some old-school telecommunications executive in an attempt to discredit VoIP providers. Ironically, it's this reality the next-generation VoIP provider must embrace. The user's experience is impacted by the services, not the network - we accept this. Enhanced services take advantage of the underlying data network to more effectively impact the user.
You might be thinking to yourself that traditional telecommunications companies have to deploy new services, too. You're right, but remember, those circuit switches are a major pain in the butt, so deploying new services effectively, en masse, and with reduced cost is almost impossible. On top of that, in the circuit-switched world there's no data network to leverage in producing services.
VoIP providers use data networks, which give us all kinds of flexibility (see Figure 2). We can do things that traditional telecommunications companies can't. We can provide services to users over our data network from machines placed anywhere on that network. We can cluster. We can scale. We can use application servers to host any next-generation enhanced service. Part of the solution already exists in the protocols I described earlier, like SIP and MGCP. So, the landscape is painted. Much to the chagrin of old-school telecom, VoIP has the opportunity and means to do exactly what they say we must do to survive: provide enhanced services.
Of course, now that you know we can provide enhanced services, how exactly do you describe the details of one? There are all sorts of issues, ranging from call flow to voice interaction. It could be a true mess and, done improperly, suffer from the same problems that those nasty TDM switches present. Once again, however, next-generation telecommunications benefit from the Internet age - enter XML. Everything that goes into a service for telecommunications users can be elegantly described using XML. A couple of companies have seen this and are positioning themselves to capitalize on the intersection of the need and the means of next-generation telecommunications providers. Open standards such as XTML, VoiceXML, and CallXML have been defined and are being used in VoIP networks. The rest of this article focuses on these new schemas, how they can be used, and their drawbacks. At the end, for those of you interested in getting your feet wet in the telecommunications industry, I'll provide a series of resources for more information.
VoiceXML
VoiceXML has one significant drawback: it makes no solid attempt to define call flow. Call flow is just that - the way the interaction with the user on the other end of the telephone flows as the call progresses. VoiceXML simply defines the prompts, menus, and options that a user is presented with. The order in which they're presented can be toothpicked together but, in general, is not VoiceXML's problem.
Listing 1 shows the age-old tradition in full effect. "Hello, World" in VoiceXML might not appear very exciting, but a few elements are worth noting. The <vxml> element sits at the top level of the document. It has a version attribute to let VoiceXML parsers know what to expect. The <vxml> tag is primarily used as a container for dialogs, which can be either forms or menus. Next, you see, as expected, the <form> element. In VoiceXML, forms present information to the user and gather input. In our example you would hear "Hello, XML-Journal reader." Finally, the <block> element houses what we want to say to the user. Formally, <block> elements are containers of noninteractive executable code. True to form, the <block> in our example holds the text we want spoken to the user.
Let's move on to a more interesting example. Suppose you have a friend who wants to let people call a phone number to hear jokes. The user will be presented with a menu and, depending on his or her selection, hear a random joke from that category. Listing 2 shows the VoiceXML representation of this service. First, we start off with the familiar <vxml> tag. The next tag is something new. The <menu> element defines a dialog with categories that users can choose from. In this case we'll let users pick knock-knock, Aggie, or bad jokes. The next element is <prompt>, which holds the data to be played to the user when requesting input. Finally, the three possible choices are presented using the <choice> element, which is of particular interest in this example.
Our joke hotline allows users to navigate and make choices by pressing keys on their telephones, producing DTMF tones. The <choice> element has two attributes in this example: dtmf and next. The dtmf attribute identifies the digit to expect for that choice. The next attribute signifies where to go when that input is received. As you can see, the target in our example names a machine, then a jokes path, and finally a program to execute.
These two VoiceXML examples are simplified to demonstrate the basic structure of an IVR system. It's possible to define rudimentary flow for your IVR application using VoiceXML, but VoiceXML is not particularly suited to the task. Many more elements can go into a VoiceXML document, and as their number grows, the application becomes extremely complex and hard to visualize.
A more generic approach is necessary for defining call flow. CallXML from Voxeo Corporation defines a basic block/action/event interface model to represent the different paths a call might take.
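Before moving on, the VoiceXML structure described above can be sketched roughly as follows. These are not the article's listings: the two documents are assembled only from the elements named in the text, and the host name, paths, and prompt wording are placeholder assumptions. The Python wrapper is just a convenient way to show that the markup is well formed and to pull out the digit-to-target mapping that the dtmf and next attributes express.

```python
import xml.etree.ElementTree as ET

# Hypothetical "Hello, World" document using only <vxml>, <form>, and <block>.
HELLO_VXML = """\
<vxml version="1.0">
  <form>
    <block>Hello, XML-Journal reader.</block>
  </form>
</vxml>
"""

# Hypothetical joke-hotline menu. jokes.example.com and the paths are placeholders.
MENU_VXML = """\
<vxml version="1.0">
  <menu>
    <prompt>Press 1 for knock-knock jokes, 2 for Aggie jokes, or 3 for bad jokes.</prompt>
    <choice dtmf="1" next="http://jokes.example.com/jokes/knockknock">knock-knock</choice>
    <choice dtmf="2" next="http://jokes.example.com/jokes/aggie">Aggie</choice>
    <choice dtmf="3" next="http://jokes.example.com/jokes/bad">bad</choice>
  </menu>
</vxml>
"""


def spoken_text(vxml_source):
    """Return the text of every <block> directly inside a <form>, in document order."""
    root = ET.fromstring(vxml_source)
    return [b.text.strip() for b in root.findall("./form/block") if b.text]


def dtmf_routes(vxml_source):
    """Map each expected DTMF digit to the target named by that choice's next attribute."""
    root = ET.fromstring(vxml_source)
    return {c.get("dtmf"): c.get("next") for c in root.findall("./menu/choice")}


if __name__ == "__main__":
    print(spoken_text(HELLO_VXML))
    for digit, target in dtmf_routes(MENU_VXML).items():
        print(digit, "->", target)
```

Notice that nothing in the menu says what happens after the joke is told; each choice simply hands control to another location, which is the call-flow gap discussed above.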
CallXML
Revisiting the simple "Hello, World" example, Listing 3 shows how this might be implemented using CallXML. The uppermost element in Listing 3 is <callxml>. As in VoiceXML, this is a container for the rest of the elements to come. If you're familiar with HTML, the <callxml> tag can be considered the same as the <html> tag. It serves only a semantic role. Next, a <block> defines a group of elements to be bound together. To facilitate call flow, <block> elements have a label attribute, which other CallXML elements can use to refer to particular blocks of CallXML.
Next, we encounter an element, <answer/>, that illustrates an important difference between CallXML and VoiceXML. CallXML is event driven, meaning certain blocks of CallXML are executed based on the occurrence of different events. Here, we're telling the CallXML gateway to pick up the line and answer an incoming call. But what happens if an error occurs when the call is answered? The <answer/> element will generate an event, onError, that can be handled elsewhere in the CallXML. Finally, the <playAudio> element tells the gateway to play an audio file, found somewhere on the network, to the user.
I'd like to modify our old joke hotline so I can spread the joy to friends who are feeling down. I'd want to call the joke line, enter my friends' phone numbers, and have the joke hotline call them to tell a joke. Listing 4 illustrates how this can be accomplished using CallXML. The CallXML begins as you might expect with <callxml> and <block> elements. Then we see an <answer> element telling the gateway to pick up the line. Because I want to play the joke for my friend, I first need to prompt the user for some input, namely my friend's phone number, which is done with a <playAudio> tag. Next, <getDigits> is used to retrieve input from the user. This is an element we haven't encountered before. The two attributes used within the element are var and maxDigits. The var attribute specifies the name of the variable to store the input in. This can be referenced later by using the syntax "$varname;". Next, maxDigits defines the maximum number of input digits to retrieve from the user. When the maximum number is reached, an event is generated that can be handled to continue the call flow. This is exactly what's happening with the <onMaxDigits> element. Inside this element we can make an outbound call to the phone number entered with <call>, then, when my friend picks up the line, play the joke of the day.
This example only touches on the features provided by the CallXML specification. CallXML gives us total call control capability, leverages the power of the Web, and is extremely easy to use.
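To make the event-driven flow a little more concrete, here is a rough sketch, again not the article's Listing 4. The CallXML below uses only the elements and attributes named in the text, plus two assumptions: a value attribute on <call> and <playAudio>, and placeholder audio URLs. The small Python simulation is purely illustrative; it mimics what a gateway might do when <getDigits> reaches maxDigits and the "$varname;" reference is expanded.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical CallXML document. The "value" attributes and the
# jokes.example.com URLs are assumptions made for illustration.
FORWARD_JOKE = """\
<callxml>
  <block label="forwardJoke">
    <answer/>
    <playAudio value="http://jokes.example.com/audio/enter-number.wav"/>
    <getDigits var="friendNumber" maxDigits="10"/>
    <onMaxDigits>
      <call value="$friendNumber;"/>
      <playAudio value="http://jokes.example.com/audio/joke-of-the-day.wav"/>
    </onMaxDigits>
  </block>
</callxml>
"""

# Matches variable references written as "$varname;".
VARIABLE = re.compile(r"\$(\w+);")


def expand_variables(text, variables):
    """Replace each "$name;" reference with the value stored under name."""
    return VARIABLE.sub(lambda m: variables[m.group(1)], text)


def number_dialed(callxml_source, keyed_digits):
    """Simulate the block for a caller who keys in keyed_digits.

    Returns the number the <call> inside <onMaxDigits> would dial, or None
    if too few digits were entered for the onMaxDigits event to fire.
    """
    block = ET.fromstring(callxml_source).find("block")
    get_digits = block.find("getDigits")
    limit = int(get_digits.get("maxDigits"))
    if len(keyed_digits) < limit:
        return None
    # Store the collected input under the name given by the var attribute.
    variables = {get_digits.get("var"): keyed_digits[:limit]}
    call = block.find("onMaxDigits/call")
    return expand_variables(call.get("value"), variables)


if __name__ == "__main__":
    print(number_dialed(FORWARD_JOKE, "2145550123"))
```

A real gateway would also deal with timeouts, hang-ups, and the onError event mentioned earlier; the sketch leaves those out to keep the single event path visible.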
The eXtensible Telephony Markup Language - XTML
Next-generation telecommunications is a dynamic industry. New protocols are always being proposed, different hardware is installed, and new ideas are discussed. The creators of XTML recognized this dynamic and built the schema around a core set of functionality that can be extended as the requirements of the problem change. The previous schemas, VoiceXML and CallXML, lack this flexibility. The flexibility does increase the complexity of the language slightly, but that is quickly overcome with use.
Let's define a simple "Hello, World" telephony application using XTML. Before we get to the example, let me describe the basic structure of an XTML document. Like CallXML, XTML is based around an event model. Events that are interesting to a particular XTML document are identified, then bits of XTML within that document called functions are created to handle those events. Functions can accept parameters and return values as you might expect, and follow a sequence of actions. These actions are defined and results are mapped to further actions. It's really a very elegant and powerful way to describe a call flow.
Now back to the example. Listing 5 shows what the XTML for our "Hello, World" looks like. Whoa! I can hear you screaming "Slow down already!" XTML's version of "Hello, World" is much more complex and quite a bit more obscure. As I said, this is due to the extreme flexibility and extensibility of XTML. Let's take it step by step and I'm sure it will become clearer to you.
The <xtml> tag declares the XTML document's root element. Remember when I said that XTML is made up of core functionality with extensions? Well, the <xtml> tag is where we identify what schema defines this core functionality. This is accomplished through the xmlns attribute. Next, we declare the events we're interested in and identify the functions responsible for handling them. The <event> element specifies the event name, represented by the name attribute, and the handler associated with it. It's important to note the naming convention for the event: Vendor.Event.Version. This is followed throughout XTML.
Let me stop here for a moment to point out the way XTML can be extended. Anyone can implement and deploy new events to the application server and then reference these new events in their XTML. This model recurs within different elements of XTML. For example, there's a standard set of actions, and this set can be expanded by developing new actions and referencing them from XTML.
Now, back to the example. After declaring the events that are interesting to us, we define a set of reusable functions. The <functions> element declares the start of this block of XTML and then each function is defined using a <function> element. In our example there are three attributes of interest to a function. First, we give it a label using the name attribute, then we identify the first action in this function's sequence of execution using the start attribute. Later, we'll define the actions that make up this function, and one of them must have an ID that matches the function's start attribute. A return type is then identified with the returns attribute. The function definition continues by declaring parameters and local variables. This is very similar to the semantics of functions in regular programming languages.
Finally, the <actions> element signifies the beginning of the action declarations. Each action has an ID and a plug-in attribute. Of particular interest here is the plug-in. As I said before, XTML is flexible, and this is another area where the functionality can be extended. Third parties can develop new plug-ins that can be referenced from XTML documents. The first action uses an <announcements> element, which is contained in a separate schema, notably one that defines call extensions to XTML, to play the audio for our "Hello, World" example. Each action has a series of results that specify what actions should follow. The final two actions will terminate the call via a hang-up, then end the session with the application server.
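The function/action/result model is easier to see when it's stripped of markup. The sketch below is not XTML syntax and does not reproduce Listing 5; it's a small Python model of the ideas described above, and every name in it (the Acme vendor, the CallArrived event, the sayHello function, the action IDs, plug-in names, and result names) is a hypothetical stand-in. It shows a function whose start attribute names its first action, actions whose results point at the next action, and an event name that follows the Vendor.Event.Version convention.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    id: str
    plugin: str                                   # plug-in that performs the work
    results: dict = field(default_factory=dict)   # result name -> next action id


@dataclass
class Function:
    name: str
    start: str       # id of the first action to run
    actions: dict    # action id -> Action


def parse_event_name(qualified):
    """Split a Vendor.Event.Version name into (vendor, event, version)."""
    vendor, event, version = qualified.split(".", 2)
    return vendor, event, version


def run(function, outcomes, max_steps=100):
    """Walk the action graph starting at function.start.

    outcomes maps an action id to the result that action produces and stands
    in for real plug-in execution; actions not listed produce "default".
    Returns the ids of the actions visited, in order.
    """
    visited = []
    current = function.start
    while current is not None:
        if len(visited) >= max_steps:
            raise RuntimeError("action graph did not terminate")
        action = function.actions[current]
        visited.append(action.id)
        result = outcomes.get(action.id, "default")
        current = action.results.get(result)  # no mapping ends the function
    return visited


# Hypothetical "Hello, World": announce, hang up, end the session.
HELLO = Function(
    name="sayHello",
    start="greet",
    actions={
        "greet": Action("greet", "announcements", {"default": "hangup"}),
        "hangup": Action("hangup", "hangup", {"default": "end"}),
        "end": Action("end", "endSession"),
    },
)

# Hypothetical event declaration: event name -> handler function name.
HANDLERS = {"Acme.CallArrived.1.0": HELLO.name}

if __name__ == "__main__":
    print(parse_event_name("Acme.CallArrived.1.0"))
    print(run(HELLO, {}))
```

Because results are looked up by name, adding an error path is just another entry in an action's results mapping, which is the kind of extension point the text attributes to XTML.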
Visual Tools
Other Standards and Initiatives
A large part of the problem is that within several months multiple ways of defining enhanced services have begun to pop up. XTML, CallXML, and VoiceXML each have their own high and low points. VoiceXML is great for defining IVR interaction but lacks real call control characteristics - you can't make an outbound call, for example. CallXML is very straightforward and easy to use, but it lacks real flexibility and extensibility. XTML is arguably the most powerful of all the approaches, but it's a bit more complex and hard to grasp. Unfortunately, no standards bodies are overseeing the development of these efforts.
I'm tempted to draw a comparison with the Java Application Server and J2EE situation. Sun found it necessary to have a common way to write enterprise applications but knew wide-scale acceptance would come from implementation options. They defined the interfaces and left it to the application server vendors to provide the service. The same approach might work well in the enhanced-services arena. Perhaps a standards body made up of a cross-section of industry leaders could define a comprehensive XML schema for enhanced-service definition. It would then be left to the Voxeos and Pactoluses of the world to implement the standard and provide an application server.
Conclusion and Acknowledgments
If you'd like to discover what else XML brings to the telecommunications industry, I refer you to the specifications for VoiceXML (www.voicexml.org), CallXML (community.voxeo.com), and XTML (www.pactolus.com). Moreover, I urge you to e-mail me at lmarascio@pointone.com with any questions or feedback you might have. Finally, I'd like to thank Jasson Casey, Dave Horton, and Robby Slaughter for reading and providing feedback for improvements on the drafts of this article.