What Is XQuery?

The World Wide Web Consortium is working on finalizing the specification for XQuery, aiming for a final release in mid to late 2002. XQuery is a powerful and convenient language that's designed for processing XML data - not just files in XML format, but other data (including databases) whose structure (nested named trees with attributes) is similar to XML.

XQuery is an interesting language with some unusual ideas. This article is intended to give you a hawk's-eye view of XQuery, introducing the main ideas you should understand before you go deeper...or actually try to use it!

An Expression Language
The first thing to notice is that in XQuery everything is an expression that evaluates to a value. An XQuery program or script is just an expression, together with (optionally) some function and other definitions. So the following:

3+4

is a complete and valid XQuery program that evaluates to the integer 7.

There are no side effects or updates in the XQuery standard, though those will probably be added at a future date. The standard specifies the result value of an expression or program, but it doesn't specify how it's to be evaluated. Therefore an implementation has considerable freedom in how it evaluates an XQuery program and what optimizations it does.

Here is a conditional expression that evaluates to a string:

if (3 < 4) then "yes!" else "no!"

You can define local variables using a let expression:

let $x := 5 let $y := 6 return 10*$x+$y

This evaluates to 56.

Primitive Data Types
The primitive data types in XQuery are the same as for XML Schema:

  • Numbers, including integers and floating-point numbers
  • The boolean values true and false
  • Strings of characters (for example, "Hello world!" [these are immutable, i.e., you can't modify a character in a string])
  • Various types to represent dates, times, and durations
  • A few XML-related types (for example, a QName is a pair consisting of a local name [like template] and a namespace URI, which is used to represent a tag name like xsl:template after it has been namespace-resolved)

    Derived types are variations or restrictions of other types, for example, range types. Primitive types and the types derived from them are known as atomic types, because an atomic value doesn't contain other values. Thus a string is considered atomic because XQuery doesn't have character values.

    Node Values and Expressions
    XQuery, of course, also has the data types needed to represent XML values. It does this using node values, of which there are seven kinds: element, attribute, namespace, text, comment, processing-instruction, and document (root) nodes. These are very similar to the corresponding DOM classes such as Node, Element, and so on. Some XQuery implementations use DOM objects to implement node values, though implementations may use other representations.

    Various standard XQuery functions create or return nodes. For example, the document function reads an XML file specified by a URL argument, and returns a document root node. (The root element is a child of the root node.)

    You can also create new node objects directly in the program. The most convenient way to do that is to use an element constructor expression, which looks just like regular XML data:

    <p>See <a href="index.html"><i>here</i></a> for info.</p>

    You can use {curly braces} to embed an XQuery expression inside element constructors:

    let $i := 2 return
    let $r := <em>Value </em> return
    <p>{$r} of 10*{$i} is {10*$i}.</p>

    creates:

    <p><em>Value </em> of 10*2 is 20.</p>

    Popular template processors such as JSP, ASP, and PHP allow you to embed expressions in a programming language into HTML content. XQuery gives you that ability, plus the ability to embed XML/HTML forms inside expressions and to have them be the value of variables and parameters.

    XQuery node values are immutable (you can't modify a node after it has been created).

    Sequences
    We've seen atomic values (numbers, strings, etc.) and node values (elements, attributes, etc.). Together these are known as simple values. XQuery expressions actually evaluate to sequences of simple values. The comma operator can be used to concatenate two values or sequences. For example:

    3,4,5

    is a sequence consisting of three integers. Note that a sequence containing just a single value is the same as that value by itself, and you cannot nest sequences. To illustrate this, we'll use the count function, which takes one argument and returns the number of values in that sequence. So this expression:

    let $a := (3,4)
    let $b := ($a, $a)
    let $c := 99
    let $d := ()
    return (count($a), count($b), count($c), count($d))

    evaluates to (2, 4, 1, 0), because $b is the same as (3,4,3,4).
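
For readers more at home in Python, the flattening rule can be modelled roughly as follows. This is only a sketch of the semantics described above: the helpers `seq` and `count` are invented for illustration and are not part of any XQuery implementation.

```python
# Rough Python model of XQuery sequence semantics (illustrative only).
# `seq` and `count` are hypothetical helpers, not an XQuery API.

def seq(*items):
    """Concatenate items into one flat list; nested lists are spliced in."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(seq(*item))   # sequences never nest
        else:
            flat.append(item)         # a single value acts as a sequence of length one
    return flat

def count(value):
    """Number of items in a sequence; a bare value counts as one item."""
    return len(seq(value))

a = seq(3, 4)
b = seq(a, a)
c = 99
d = seq()

print(count(a), count(b), count(c), count(d))  # prints: 2 4 1 0
```

The point of the model is that `seq(a, a)` yields one list of four integers rather than a list of two lists.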

    Many of the standard functions for working with nodes return sequences. For example, the children function returns a sequence of the child nodes of the argument:

    children(<p>This is <em>very</em> cool.</p>)

    and returns this sequence of three values:

    "This is ", <em>very</em>, " cool."
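
Because node values resemble DOM nodes, a comparable view is available from Python's standard xml.dom.minidom module. The sketch below is an analogy, not XQuery: it assumes that minidom's childNodes list is a fair stand-in for the children function.

```python
from xml.dom import minidom

doc = minidom.parseString("<p>This is <em>very</em> cool.</p>")
p = doc.documentElement

# childNodes holds text nodes and element nodes alike, in document order.
children = [node.toxml() for node in p.childNodes]
print(children)  # prints: ['This is ', '<em>very</em>', ' cool.']
```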

    Path Expressions and Relationship to XPath
    XQuery borrows path expressions from XPath. In fact, XQuery can be viewed as a generalization of XPath: except for some obscure forms (mostly unusual "axis specifiers"), all XPath expressions are also XQuery expressions. For this reason the XPath specification is also being revised by the XQuery committee, with the plan that XQuery 1.0 and XPath 2.0 will be released about the same time.

    The following simple example assumes an XML file "mybook.xml" whose root element is a <book>, and it contains some <chapter> children:

    let $book := document("mybook.xml")/book
    return $book/chapter

    The document function returns the root node of a document. The /book step selects the child elements of the root that are named book, so $book is bound to the single root element. The expression $book/chapter then selects the chapter children of that book element, which yields a sequence of the second-level chapter nodes, in document order.

    The next example includes a predicate:

    $book//para[@class="warning"]

    The double slash is a convenience syntax to select all descendants (rather than just children) of $book, selecting only <para> element nodes that have an attribute node named class whose value is "warning".
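
To experiment with the two path expressions without an XQuery processor, the limited XPath subset in Python's xml.etree.ElementTree is enough. In this sketch the document content is invented (a stand-in for mybook.xml), and ElementTree returns plain Python lists rather than XQuery sequences.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for mybook.xml; the content is invented for illustration.
BOOK_XML = """
<book>
  <chapter><title>One</title><para class="warning">Hot!</para><para>Plain.</para></chapter>
  <chapter><title>Two</title><section><para class="warning">Sharp!</para></section></chapter>
</book>
"""

book = ET.fromstring(BOOK_XML)        # the root <book> element

# Like $book/chapter: direct children only.
chapters = book.findall("chapter")

# Like $book//para[@class="warning"]: descendants at any depth, filtered by attribute.
warnings = book.findall(".//para[@class='warning']")

print(len(chapters))                  # prints: 2
print([p.text for p in warnings])     # prints: ['Hot!', 'Sharp!']
```

Note that the second warning paragraph is nested inside a section element, so a child-only step would miss it.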

    One difference to note between XPath and XQuery is that XPath expressions may return a node set, whereas the same XQuery expression will return a node sequence. For compatibility, these sequences will be in document order and with duplicates removed, which makes them equivalent to sets.

    By the way, XPath expressions are mostly used as patterns in XSLT stylesheets. XSLT (XSL Transformations, where XSL stands for Extensible Stylesheet Language) is a rule-based language for transforming an input XML document into a result XML document. XSLT is very useful for expressing very simple transformations, but more complicated stylesheets (especially anything with nontrivial logic or programming) can often be written more compactly and readably using XQuery.

    My article, "Generating XML and HTML Using XQuery" (www.gnu.org/software/qexo/XQ-Gen-XML.html), explains further how to generate XML documents and HTML Web pages using XQuery.

    Iterating over Sequences
    A for expression lets you "loop" over the elements of a sequence:

    for $x in (1 to 3) return ($x, 10+$x)

    The for expression first evaluates the expression following the in. For each value of the resulting sequence, the variable (in this case $x) is then bound to the value, and the return expression evaluated using that variable binding. The value of the entire for expression is the concatenation of all values of the return expression, in order. So the example evaluates to this six-element sequence:

    1,11,2,12,3,13
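
A nested list comprehension in Python behaves in much the same way, and may help as a mental model (it is an analogy only, not a translation of XQuery semantics):

```python
# Each iteration yields a two-item sequence; the results are concatenated in order.
result = [item for x in range(1, 4) for item in (x, 10 + x)]
print(result)  # prints: [1, 11, 2, 12, 3, 13]
```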

    Here's a more useful example. Assume again that mybook.xml is a <book> that contains some <chapter> elements. Each <chapter> has a <title>. The following will create a simple Web page that just lists the titles:

    <html>{
    let $book := document("mybook.xml")/book
    for $ch in $book/chapter
    return <h2>{$ch/title}</h2>
    }</html>
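
One detail worth seeing concretely is that the embedded path expression yields the title element nodes themselves, so each title element ends up nested inside its h2. The Python sketch below imitates that with xml.etree.ElementTree; the sample book content is invented, and the code is an approximation of the query rather than a port of it.

```python
import xml.etree.ElementTree as ET

# Invented sample standing in for mybook.xml.
BOOK_XML = (
    "<book>"
    "<chapter><title>Intro</title></chapter>"
    "<chapter><title>Details</title></chapter>"
    "</book>"
)
book = ET.fromstring(BOOK_XML)

html = ET.Element("html")
for ch in book.findall("chapter"):
    h2 = ET.SubElement(html, "h2")
    h2.append(ch.find("title"))   # embed the <title> element itself, not just its text

page = ET.tostring(html, encoding="unicode")
print(page)
# prints: <html><h2><title>Intro</title></h2><h2><title>Details</title></h2></html>
```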

    The term FLWR expression covers both for and let expressions. The acronym stands for for, let, where, and return: such an expression consists of one or more for and/or let clauses, an optional where clause, and a return clause. A where clause causes the return clause to be evaluated only when the where expression is true.

    The following example illustrates the where clause. It has a nested loop, which lets us combine two sequences: one of customer elements, assumed here to be bound to $customers, and one of order elements, assumed to be bound to $orders. We want to find the name(s) of customers who have ordered the part whose part_id is "xx".

    for $c in $customers
    for $o in $orders
    where $c/cust_id = $o/cust_id and $o/part_id = "xx"
    return $c/name

    This is essentially a join of two tables, as commonly performed using relational databases. An important goal for XQuery is that it should be usable as a query language for "XML databases." Compare the corresponding SQL statement:

    select customers.name
    from customers, orders
    where customers.cust_id=orders.cust_id
    and orders.part_id='xx'
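
The same nested-loop join can be sketched in Python, which makes the role of the where clause as a filter explicit. The sample records below are invented for illustration, and plain dictionaries stand in for the customer and order elements.

```python
# Invented sample data; dictionaries stand in for customer and order elements.
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
]
orders = [
    {"cust_id": 1, "part_id": "xx"},
    {"cust_id": 2, "part_id": "yy"},
    {"cust_id": 3, "part_id": "xx"},
]

# Two "for" clauses, one "where" filter, one "return" expression.
names = [
    c["name"]
    for c in customers
    for o in orders
    if c["cust_id"] == o["cust_id"] and o["part_id"] == "xx"
]
print(names)  # prints: ['Ann', 'Cy']
```

A naive evaluation like this visits every customer/order pair; since the standard leaves the evaluation strategy open, an implementation may well choose something smarter.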

    Functions
    XQuery wouldn't be much of a programming language without user-defined functions. Such function definitions appear in the query prologue of an XQuery program. It's worth noting that function parameters and function results can be primitive values, nodes, or sequences of either.

    Below we define a recursive utility function. It returns all the descendant nodes of the argument, including the argument node itself. It does a depth-first traversal of the argument, returning the argument and then looping over the argument node's children, recursively calling itself for each child.

    define function descendant-or-self ($x)
    {
    $x,
    for $y in children($x)
    return descendant-or-self($y)
    }
    descendant-or-self(<a>X<b>Y</b></a>)

    evaluates to this sequence of length 4:

    <a>X<b>Y</b></a>, "X", <b>Y</b>, "Y"
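
The same depth-first traversal can be tried out in Python against DOM nodes. This is a sketch using the standard xml.dom.minidom module; the function name is simply borrowed from the XQuery definition above.

```python
from xml.dom import minidom

def descendant_or_self(node):
    """Return node followed by all of its descendants, depth first."""
    result = [node]
    for child in node.childNodes:     # text nodes have no children, so recursion stops there
        result.extend(descendant_or_self(child))
    return result

root = minidom.parseString("<a>X<b>Y</b></a>").documentElement
nodes = [n.toxml() for n in descendant_or_self(root)]
print(nodes)  # prints: ['<a>X<b>Y</b></a>', 'X', '<b>Y</b>', 'Y']
```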

    Sorting and Context
    If you want to sort a sequence, you can use a sortby expression. For example, to sort a sequence of books in order of author name you can do:

    $books sortby (author/name)

    The sortby takes an input sequence (in this case $books) and one or more ordering expressions. During sorting the implementation needs to compare two values from the input sequence to determine which comes first. It does that by evaluating the ordering expression(s) in the context of a value from the input sequence. So the path expression author/name is evaluated many times, each time relative to a different book as the context (or current) item.

    Path expressions also use and set the context. For example, in author/name the name children that are returned are those of the context item, which is an author item.
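
The idea of evaluating an ordering expression relative to each item has a close cousin in Python's key functions. In the sketch below the book data is invented, and ElementTree's findtext stands in for the path expression author/name.

```python
import xml.etree.ElementTree as ET

# Invented sample data for illustration.
BOOKS_XML = """
<books>
  <book><title>B1</title><author><name>Knuth</name></author></book>
  <book><title>B2</title><author><name>Abelson</name></author></book>
  <book><title>B3</title><author><name>Friedman</name></author></book>
</books>
"""
books = ET.fromstring(BOOKS_XML).findall("book")

# The key function plays the role of the ordering expression: it is evaluated
# once per book, with that book as the context for the path author/name.
sorted_books = sorted(books, key=lambda book: book.findtext("author/name"))
titles = [b.findtext("title") for b in sorted_books]
print(titles)  # prints: ['B2', 'B3', 'B1']
```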

    Type Specification
    XQuery is a strongly typed programming language. Like Java, C#, and other languages, it is a mix of static typing (type consistency checked at compile time) and dynamic typing (runtime type tests). However, the types in XQuery are different from the classes familiar from object-oriented programming. XQuery has types to match its data model, and allows you to import types from XML Schema. The instance of operator tests at run time whether a value matches a type:

    if ($child instance of element section)
    then process-section($child)
    else ( ) {--nothing--}

    This invokes the process-section function if the value of $child is an element whose tag name is section. XQuery has a convenient typeswitch shorthand for matching a value against a number of types. Here's an example of converting a set of tag names to a different set. It's a simple example of the kind of transformations that XSLT does well.

    define function convert($x) {
      typeswitch ($x)
        case element para return <p>{process-children($x)}</p>
        case element emph return <em>{process-children($x)}</em>
        default return process-children($x)
    }
    define function process-children($x) {
      for $ch in children($x) return convert($ch)
    }
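
For comparison, here is roughly how the same pair of mutually recursive functions might look in Python over DOM nodes. Treat it as a sketch under stated assumptions: the TAG_MAP table and the sample input are invented, and the branch that copies text nodes through unchanged is an addition of this sketch, not something taken from the XQuery code.

```python
from xml.dom import minidom

TAG_MAP = {"para": "p", "emph": "em"}   # old tag name -> new tag name

def convert(node, out):
    """Convert one node; the result is a list because XQuery results are sequences."""
    if node.nodeType == node.TEXT_NODE:
        # Assumption added for this sketch: text is copied through unchanged.
        return [out.createTextNode(node.data)]
    if node.nodeType != node.ELEMENT_NODE:
        return []                       # comments and the like are ignored here
    children = process_children(node, out)
    new_tag = TAG_MAP.get(node.tagName)
    if new_tag is None:
        return children                 # default case: drop the tag, keep converted children
    elem = out.createElement(new_tag)
    for child in children:
        elem.appendChild(child)
    return [elem]

def process_children(node, out):
    """Convert every child of node and concatenate the results."""
    result = []
    for child in node.childNodes:
        result.extend(convert(child, out))
    return result

src = minidom.parseString("<para>Some <emph>bold</emph> and <x>plain</x> text.</para>")
out = minidom.Document()
html = "".join(n.toxml() for n in convert(src.documentElement, out))
print(html)  # prints: <p>Some <em>bold</em> and plain text.</p>
```

The chain of isinstance-style tests is what typeswitch abbreviates: each case matches on the kind and name of the node, and the default case handles everything else.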

    Resources
    The primary XQuery resource is www.w3.org/XML/Query at the World Wide Web Consortium. This has links to the draft standards, mailing lists, and implementations. The main documents follow:

  • The actual XQuery specification (www.w3.org/TR/xquery) isn't difficult to read, and is probably where you should go next.

  • The Data Model specification (www.w3.org/TR/query-datamodel) goes into nodes and the functions you can use to manipulate them.

  • The Functions and Operators specification (www.w3.org/TR/xquery-operators) defines the other (non-node) functions, including string and date/time functions.

  • The Use Cases document (www.w3.org/TR/xmlquery-use-cases) contains lots of useful examples of XQuery programs to solve specific problems.

  • The Formal Semantics specification (www.w3.org/TR/query-semantics) uses formal mathematical notation and is not for the faint of heart. Most people should skip it.

    SIDEBAR
    WHERE TO GO NEXT

    There aren't many books on XQuery yet, mainly because there are significant loose ends in the specification. At the time of this writing, there is one book: Early Adopter XQuery from Wrox. I'm working with other authors on a book for Sams, which we hope will be ready soon after the standard is finalized.

    Obviously, there are no complete standards-conforming implementations either, but the XQuery site lists known implementations, some of which have executable demos. The only open-source implementation currently available seems to be my own Qexo implementation (see www.gnu.org/software/qexo/). The Qexo implementation is interesting in that it compiles XQuery programs on-the-fly directly to Java bytecodes. I welcome you to experiment with it. But in any case, I do recommend considering XQuery when you need a powerful and convenient tool for analyzing or generating XML.

    About Per Bothner
    In 1996 Per Bothner started the GCJ (GNU Compiler for the Java[tm] platform) project, which is the most active open-source Java implementation project. He implemented the Kawa framework for compiling high-level languages (including Scheme and XQuery) to Java bytecodes. He is currently an independent consultant.
