Scripting in XML: A New Standard
By: Richard Lanyon
Dec. 21, 2000 12:00 AM
As XML becomes accepted as the format for document markup, the demand for XML tools is also increasing. In particular, there's a need for a tool that can be easily programmed to perform general reporting and editing operations on XML files. In the world of ASCII text, this role is fulfilled by scripting languages such as Perl and Python; such scripts sit behind many Web servers and are used wherever business requirements change so quickly that ease of modification is of prime importance. It's entirely possible to use add-ons or modules that enable the use of Perl or Python with XML, but the fact remains that the native syntax of these languages has been carefully designed for use with plain text. Also, if XML really is the ideal format for markup, the advantages of using XML syntax ought to apply just as much to a programming language as to documents.

One programming language that's written in XML, for XML, is XSLT (Extensible Stylesheet Language Transformations). It was originally designed as part of the XSL project, specifically for rearranging XML documents prior to their conversion into presentation formats such as PDF, HTML, and DVI. However, because it's written in XML and there's such a demand for a general-purpose XML scripting language, XSLT has been proposed for this role.

There are, though, many aspects of XSLT that make it awkward to use for those who don't have a background in publishing and stylesheets. In particular, XSLT is based on a model in which a series of templates describes rules for converting a read-only input document into an output document; XSLT does not contain variables in the traditional programming sense of the word. This is difficult for programmers accustomed to reading a document into a set of variables, modifying the variables as necessary, and then serializing them to a new document. However, there's no reason why alternative languages should not be developed and used.
One such language is XML Script, designed by DecisionSoft Ltd. XML Script was designed as an XML-based alternative to Web-based scripting languages, such as Perl, PHP, and ColdFusion, and as such its "feel" is much more familiar to programmers used to a Web programming environment. However, the usefulness of the original version of XML Script (embodied in products such as X-Tract v1) was limited by the fact that it was designed before the advent of Namespaces, XPath, and many of the other technologies now associated with XML. This article seeks to show how a language built on XML Script - we call it XML Script 2 - could be developed that combines ease of use with XML technology.

In many respects XML Script 2 strives to be like a "normal" programming language, using the convention that elements in a script are analogous to function calls, and that the attributes and content of each element provide its parameters. Listing 1 shows the ubiquitous "Hello, World!" program written in XML Script 2. Exactly where the string Hello, World! goes will depend on the processor. It's important to note that an XML Script processor could be a stand-alone command-line processor or a module in some distributed network-based system using DCOM, HTTP/CGI, or some other mechanism. For the purposes of this article it's probably easiest to think of output being sent to stdout.

XML is also used for constructs that control program flow, such as if tests and for/foreach/while loops. The code shown in Listing 2 will print "Hello, World!" four times (the loop counter is inclusive).

While these examples demonstrate how a script is processed, what's particularly important is how an XML Script 2 processor stores its data. The processor maintains an internal representation of an XML document as a DOM-like tree or something similar, and this initially consists of just a single empty element. Other XML documents, or even potentially fragments of documents, can then be pasted into this dummy data document.
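The data-tree idea can be sketched in Python. This is only an illustration, not DecisionSoft's implementation: the root element name `data`, the helper names, and the two sample documents are all assumptions, and the standard-library ElementTree module (which supports only a subset of XPath) stands in for the processor's DOM-like tree.

```python
import xml.etree.ElementTree as ET

def new_data_tree():
    # The tree starts as a single empty element; the name "data" is assumed.
    return ET.Element("data")

def paste(tree, xml_text):
    # Parse a document (or well-formed fragment) and attach it under the root.
    node = ET.fromstring(xml_text)
    tree.append(node)
    return node

tree = new_data_tree()
paste(tree, '<foo first="1"><item/></foo>')  # hypothetical input document
paste(tree, '<bar last="20"/>')              # hypothetical input document

# Everything now lives in one tree, so a single path syntax reaches all of it.
start = int(tree.find("foo").get("first"))
end = int(tree.find("bar").get("last"))

# Inclusive loop counter: both end points are visited.
greetings = ["Hello, World!" for _ in range(start, end + 1)]
print(len(greetings))
```

With these sample values the counter runs from 1 to 20 inclusive, so the list holds 20 greetings. The point of the single shared tree is that loop bounds, and any other value, can be read with the same path syntax regardless of which input document supplied them.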
The advantage of copying the content of all XML input files into the same internal document is that all the content can then be referred to using a tree navigation syntax such as XPath (XML Script 1 actually used its own syntax, but the precise details are not important here). So, given some files called foo.xml and bar.xml and a script similar to that shown in Listing 3, "Hello, World!" will be printed 20 times. The script reads the files foo.xml and bar.xml and pastes them into the internal data tree. The for command then runs a loop, first setting the counter equal to the value of the first attribute of the foo element and running until the counter is equal to the value of the last attribute of the bar element. This example illustrates not only how XML Script 2 uses a tree to store programming variables, but also how certain commands (such as "for") can refer to those variables.

Note: These are not read-only variables as they are in XSLT. Thus the script in Listing 4 reads the contents of foo.xml into the internal data tree and then alters the value of all attributes named attr1 that are part of the foo element or its children, setting them all equal to the text newvalue. Commands such as set are said to act in creative mode. What's interesting is that creative mode commands will not only change XML content that's already present, but will also create new content where none previously existed. Thus the previous example will actually set an attr1 attribute on foo and all its descendant elements. This is analogous to the way Perl will automatically create variables whenever assignments are made.

There obviously need to be some restrictions on what can be created, to answer questions like "What happens if you use the XPath '//' as the target of a creative mode command?". Some simple rules that can be applied include:
The example in Listing 4 doesn't actually do anything with the newly created data on the data tree, though; some general mechanism is needed to refer to the data tree from within a script. One way would be to use a command such as expr, which indicates that its contents are to be interpreted as an expression. This technique is used in Listing 5. It reads in data from foo.xml, alters all attr1 attributes, then outputs the foo element.

While the expr command will be the solution to most of these problems, expr elements cannot be included everywhere - in particular they can't be included in attributes. This may not seem like a problem; for example, in Listings 2 and 3 the from and to attributes of the for command are automatically treated as expressions. However, it would be a mistake if this were always the case. The data command shown in Listings 3-5 shouldn't treat the value of its src attribute as an expression; instead the value should be interpreted as a literal URI. Sometimes, though, an XML Script 2 programmer will want to decide the value of this string at runtime. This is possible using interpolations, as shown in Listing 6. The script first reads the contents of list.xml onto the data tree, "foreach"es over each of the file elements that have been placed on the data tree, then reads each of the listed files onto the data tree. It does this because the src attribute of the data element contains the special hash characters. In XML Script hashes mark the start and end of interpolations - expressions that are evaluated at runtime to give text strings.

One important aspect of XML Script 2, included in both previous examples, is what happens when commands are nested within one another. Elements that are children of output elements are themselves executed as commands before any output is produced. It's the result of processing all the children of an output element that's actually sent to the output of the processor.
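The creative-mode set and the hash-delimited interpolations described above can be sketched in Python. This is a rough illustration under stated assumptions, not XML Script 2 itself: the function names, the sample tree, the `docs/` prefix, and the choice of ElementTree paths as the expression syntax between the hashes are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def set_creative(element, attr, value):
    # Overwrite attr where present and create it where absent,
    # on the element itself and on every descendant.
    for node in element.iter():
        node.set(attr, value)

def interpolate(text, context):
    # Replace each #path# span with the text of the first node that the
    # path selects relative to context; unmatched paths give "".
    def evaluate(match):
        node = context.find(match.group(1))
        return "" if node is None else (node.text or "")
    return re.sub(r"#([^#]+)#", evaluate, text)

# Hypothetical data tree holding the pasted input documents.
tree = ET.fromstring(
    "<data>"
    "<foo attr1='old'><child/></foo>"
    "<list><file>a.xml</file><file>b.xml</file></list>"
    "</data>"
)

# Creative mode: <child> had no attr1 before this call and gains one.
set_creative(tree.find("foo"), "attr1", "newvalue")
serialized = ET.tostring(tree.find("foo"), encoding="unicode")

# A foreach-style loop: the src string is resolved afresh for each file node.
sources = [interpolate("docs/#.#", node) for node in tree.findall("list/file")]
print(serialized)
print(sources)
```

The string passed to `interpolate` stays a literal except for the spans between hashes, which mirrors the distinction drawn above between attributes that are always expressions and attributes that are literal unless the programmer opts in.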
All commands have a defined result - in the case of the expr command the result is a list of nodes, whereas in the case of the data command the result is empty. On the other hand, the content of for or foreach elements isn't processed in quite this way, as a new result is calculated for each iteration of the loop. Exactly when, or whether, the contents of a command are executed depends entirely on the command - this may seem inexact, but in fact it leads to a much more flexible and intuitive language.

While all of this is enough to make XML Script 2 useful as a tool for small scripts, for even medium-size projects it's essential to allow code to be modularized. In traditional programming languages this is accomplished by allowing programmers to write their own functions. XML Script 2 essentially allows the same thing, albeit in a slightly different way. All elements that are part of the script are treated as potential commands - the name (and namespace) of the element is looked up in an internal table called the method table to determine what to do when that command is encountered. It's entirely possible to create new entries in the method table using the method command. This associates a particular element name with a particular block of XML Script code, which must have previously been created with a template command. The example shown in Listing 7 should print "Hello, World!" twice - once for each greeting element in the www.fred.com/ namespace.

Note: It's also possible to use the process command to apply methods to XML content that's on the data tree (see Listing 8).

An alternative to adding user-defined methods from within a script is to add customized built-in methods to a particular XML Script processor. This is possible as long as the custom methods aren't in the www.xmlscript.org/2.0/ namespace, allowing people to use the commands they want from the XML Script 2 standard in combination with their own purpose-written code.
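The method-table lookup can be illustrated with a short Python sketch. It's an assumption-laden model and not the XML Script 2 processor: the names `method_table`, `register_method`, `split_tag`, and `process` are hypothetical, handlers are plain Python functions and not template blocks, and the rule that unrecognized elements simply have their children processed is a guess at reasonable default behavior.

```python
import xml.etree.ElementTree as ET

FRED_NS = "www.fred.com/"

# Maps (namespace, local name) to the code to run for that element.
method_table = {}

def register_method(namespace, name, handler):
    method_table[(namespace, name)] = handler

def split_tag(tag):
    # ElementTree writes namespaced tags as "{namespace}local".
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag

def process(element, output):
    handler = method_table.get(split_tag(element.tag))
    if handler is None:
        # Assumed default: an unknown element just has its children processed.
        for child in element:
            process(child, output)
    else:
        # The handler decides whether and when to process the element's content.
        handler(element, output)

def greet(element, output):
    output.append("Hello, World!")

register_method(FRED_NS, "greeting", greet)

script = ET.fromstring(
    '<script xmlns:f="www.fred.com/">'
    "<f:greeting/><note><f:greeting/></note>"
    "</script>"
)
output = []
process(script, output)
print(output)
```

Keying the table on both namespace and local name is what lets custom methods coexist with standard commands: an element named greeting in a different namespace would not match the entry registered here.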
As the basic framework of XML Script 2 is simple and straightforward, adding further methods should be easy. The overall intention is that XML Script 2 should be both easy to use and very flexible, which is of vital importance in a marketplace that's becoming swamped with competing standards. The syntax presented in this article is still under development and probably not perfect, but we believe that ease of use is an important consideration in an XML world dominated by complex specifications.

Note: A family of XML Script 2 processors is under development by DecisionSoft Ltd. While DecisionSoft primarily creates server software for commercial use, the XML Script 2 specification isn't intended to be proprietary, and namespacing techniques can be used to separate extensions from core commands. At the time of writing, the specification is still under development, and any suggestions or comments should be addressed to xmlscript-editor@decisionsoft.com.