Matters of Syntax
ConciseXML builds upon the important qualities of XML and S-Expressions


ConciseXML is a new syntax I co-developed with Christopher Fry that builds on the best features of XML and S-Expressions while eliminating their constraints.

XML originated in the document-markup world of SGML and has become the leading syntax of the Internet. It has been used for documents and data, not for programming logic.

S-Expressions, or symbolic expressions, are the syntax behind Lisp-like languages, including Scheme. Basically, S-Expressions are nested lists of symbols. S-Expressions are used with languages that support the notion that code is data.

The many discussion postings on the Web regarding these topics would indicate a disconnect between the S-Expression and XML camps. The purpose of this article is to bridge the two worlds and offer a solution that addresses everyone's needs.

Syntax vs Language
A syntax is structural only. It defines the construction of valid expressions, not whether those expressions have a semantically valid meaning. For example, XML is a syntax that does not associate meanings with symbols or tagnames.

A language uses symbols, which are assigned specific meanings. For example, HTML is a language because the tags have associated meanings. XHTML is a language that uses XML syntax.

To illustrate the differences among XML, S-Expressions, and ConciseXML, let's start with a data structure expressed in all three, and also in the semicolon-delimited syntax used by many programming languages, including Java. While there are multiple ways of encoding the data in XML and S-Expressions, only one is shown for the purposes of the first comparison.

  • XML

    <book isbn="0764525360" title="Water Language" copyright="2002"/>

  • S-Expressions

    (book "0764525360" "Water Language" 2002)

  • ConciseXML

    <book "0764525360" title="Water Language" copyright=2002/>

  • Semicolon-Delimited Syntax

    new Book("0764525360", "Water Language", 2002);

    There are a number of similarities among the first three syntaxes:

    • The expressions are delimited by characters that indicate the start and end of the expression. S-Expressions are delimited by a set of parentheses "( )", while XML and ConciseXML expressions use angle brackets "< />".
    • The expression name occurs immediately after the initial delimiter.
    • White space is used to separate arguments.
    • The syntax is independent of a specific language.
    The semicolon-delimited syntax is different from the others in several ways. A comma is used between arguments, and there is no set of delimiter characters that always begin or end an expression. Parsing a semicolon-delimited syntax requires that the parser have language-specific knowledge. The first three syntaxes can be easily parsed, independent of any language.
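To make the point about language-independent parsing concrete, here is a minimal sketch of an S-Expression reader in Python. It is an illustration only, not part of any Lisp implementation: the names `TOKEN` and `parse_sexpr` are made up for this example, it handles only parentheses, double-quoted strings, integers, and bare symbols, and it does no error handling for unbalanced input.

```python
import re

# Hypothetical tokenizer: a parenthesis, a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_sexpr(text):
    """Parse one S-Expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing parenthesis
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # string literal with the quotes removed
        try:
            return int(tok)  # integer atom
        except ValueError:
            return tok  # any other atom is kept as a symbol name

    return read()


parsed = parse_sexpr('(book "0764525360" "Water Language" 2002)')
print(parsed)
```

Nothing in the reader knows what `book` means. The delimiters and the white space between arguments are enough to recover the structure, which is the property the first three syntaxes share.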

    Now let's look at some of the differences among the four syntaxes.

    Representing Arguments
    With XML, the arguments (XML attributes) must include a key to identify the argument name.

    Basic S-Expression syntax does not support argument keys. People have tried several different ways of adding keys to S-Expressions, but there is no one standard.

    With ConciseXML, arguments have an optional key. Therefore the arguments can be either keyed arguments or unkeyed arguments. The previous example shows an unkeyed isbn argument, while the title and copyright arguments are keyed.
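The optional-key rule can be sketched as a small reader for flat, self-closing expressions. This is not the ConciseXML reference parser. The names `parse_flat` and `atom` are hypothetical, nested expressions and content areas are not handled, and the conversion of bare tokens to numbers is an assumption made for the illustration.

```python
import re

# Hypothetical tokenizer: a quoted string, an equals sign, or a bare token.
TOKEN = re.compile(r'"([^"]*)"|(=)|([^\s="]+)')


def atom(token):
    """Convert a (kind, text) token into a Python value."""
    kind, text = token
    if kind == "str":
        return text
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # a bare word that is not a number stays a name


def parse_flat(expr):
    """Parse '<name arg arg .../>' into (name, [(key_or_None, value), ...])."""
    expr = expr.strip()
    if not (expr.startswith("<") and expr.endswith("/>")):
        raise ValueError("expected a self-closing expression")
    tokens = []
    for m in TOKEN.finditer(expr[1:-2]):
        if m.group(1) is not None:
            tokens.append(("str", m.group(1)))
        elif m.group(2):
            tokens.append(("eq", "="))
        else:
            tokens.append(("bare", m.group(3)))
    name = tokens[0][1]
    args = []
    i = 1
    while i < len(tokens):
        # A token followed by '=' is a key; otherwise it is an unkeyed value.
        if i + 1 < len(tokens) and tokens[i + 1][0] == "eq":
            args.append((atom(tokens[i]), atom(tokens[i + 2])))
            i += 3
        else:
            args.append((None, atom(tokens[i])))
            i += 1
    return name, args


name, args = parse_flat('<book "0764525360" title="Water Language" copyright=2002/>')
print(name, args)
```

In the result, the isbn value carries the key `None` because it was passed by position, while title and copyright keep their keys, and the copyright value is an integer rather than a string.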

    Unkeyed or By-Position Arguments
    While XML requires the use of keyed arguments, S-Expressions and the syntax of most programming languages do not support keyed arguments. All arguments must be passed by position. The example below compares a call to a createBook method in the semicolon-delimited syntax used by languages such as Java, C#, or C++ with the same call in S-Expression syntax.

  • Semicolon-Delimited Syntax

    createBook("0764525360", "Water Language", 2002);

  • S-Expressions

    (createBook "0764525360" "Water Language" 2002)

    Definition of Content Argument
    XML can contain an empty element or an element with a content area. The term "empty element" refers to a call expression without content, such as <STUFF/>. The content argument is delimited with a special syntax familiar to HTML users: <STUFF> this is the content </STUFF>.

    XML requires a key for every argument. The workaround for this restriction is to use the content area of an XML expression for argument values. Here is a typical example of that encoding technique in XML:

    <books>
    <book isbn="0764525360" title="Water Language"/>
    <book isbn="0972006702" title="Water Programming"/>
    </books>

    Unlike most programming languages, XML does not have one standard method for representing an array or a generic ordered collection. Consider an ordered data structure consisting of an integer number, a decimal number, and a string. An array representing that data structure might appear as the following in several languages:

    [5, 10.3, "stuff"]

    There is no standard form to express this in XML because XML does not have a standard "array" or "vector" element and by-position arguments are not supported. In ConciseXML syntax, this example would be represented with unkeyed arguments and "vector" for the expression name in a similar style as S-Expressions.

  • ConciseXML

    <vector 5 10.3 "stuff"/>

  • S-Expressions

    (vector 5 10.3 "stuff")

    By allowing both keyed and unkeyed arguments, ConciseXML supports the by-position passing of arguments used in the syntax of almost every programming language, as well as the keyed arguments required by XML and HTML.

    Ordered Collections
    Most programming languages have a data structure that associates a key with a value. This structure is typically named a hash table, dictionary, or associative array. The value of the data can be any object, and typically the key can also be any object. With XML, argument keys are restricted to strings, and therefore many data structures are difficult to represent. For example, integer keys are not permitted in XML. In ConciseXML, however, integer keys are represented as shown in the following expression:

    <vector 0=5 1=10.3 2="stuff"/>
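The restriction on integer keys is easy to check with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`. The helper name `is_well_formed` and the `k0`, `k1`, `k2` attribute names are invented for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a standard XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# In a programming language, integer keys are unremarkable.
as_dict = {0: 5, 1: 10.3, 2: "stuff"}

# The same structure with integer attribute names is rejected by XML,
# even when every value is quoted.
integer_keys = is_well_formed('<vector 0="5" 1="10.3" 2="stuff"/>')

# A common workaround is to invent string names for the positions.
renamed_keys = is_well_formed('<vector k0="5" k1="10.3" k2="stuff"/>')

print(integer_keys, renamed_keys)
```

The workaround parses, but the consumer now has to know that `k0` means position 0, and that knowledge lives outside the document.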

    Argument Value
    An argument value is the value of the argument, and does not include the argument key. Let's take another look at the first set of examples:

  • XML

    <book isbn="0764525360" title="Water Language" copyright="2002"/>

  • S-Expressions

    (book "0764525360" "Water Language" 2002)

  • ConciseXML

    <book "0764525360" title="Water Language" copyright=2002/>

  • Semicolon-Delimited Syntax

    new Book("0764525360", "Water Language", 2002);

    In all four syntaxes, the value of the isbn argument is the string "0764525360". The value of the "copyright" argument, though, is the string "2002" in XML and the integer 2002 in the other three syntaxes.
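The XML half of this comparison can be seen with any standard parser. Here is a short sketch using Python's `xml.etree.ElementTree`. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book isbn="0764525360" title="Water Language" copyright="2002"/>'
)

# The parser hands back text for every attribute.
copyright_value = book.attrib["copyright"]

# Turning that text into a number is a decision the consumer must make,
# based on knowledge that is not in the document itself.
copyright_year = int(copyright_value)

print(type(copyright_value).__name__, copyright_year)
```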

    Definitions
    One of the difficulties in comparing various syntaxes is that they use different terminology to refer to similar concepts. This article uses the following core definitions.

    • Expression: A syntactically valid chunk of text.
    • Call expression: An element or tag in XML, an expression in S-Expressions.
    • Expression name: The area beginning a call expression. In the book examples above, the expression name is "book".
    • Argument: XML calls these attributes. Arguments have a value and may have a key. S-Expressions do not have argument keys, while XML requires keys for every argument.

    In XML, every argument value must be delimited by quotes, which implies that the value must be a string. S-Expressions and ConciseXML permit argument values to be any expression, not just strings. For example, the value of the copyright argument is the integer 2002, not the string "2002". For representing data, it is important that the value of an argument can be anything, not just a string. The integer value 2002 is very different from the string "2002", which contains four characters.

    The ability for an argument to hold any value becomes even more important when representing nested data. One of the advantages of S-Expressions and ConciseXML is that they use a single standard method for representing nested data. Because the argument values may be any expression, an argument such as "next_edition" could have a value that is another call expression. This makes it easy to have a book as the value of a field within another book:

  • ConciseXML

    <book isbn="0764525360" next_edition=<book isbn="0764525361"/> />

  • S-Expression

    (book "0764525360" (book "0764525361"))
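One way to picture "an argument value can be any expression" is as a small data model. The sketch below is a hypothetical in-memory representation, not the data model of any ConciseXML implementation. The names `Expr`, `get`, and `render` are assumptions, and the renderer does not escape quotes inside strings.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Expr:
    """A call expression: a name plus arguments whose keys are optional."""
    name: str
    args: List[Tuple[Optional[Any], Any]] = field(default_factory=list)

    def get(self, key):
        """Return the value of the first argument with the given key."""
        for k, v in self.args:
            if k == key:
                return v
        raise KeyError(key)


def render(value):
    """Write a value back out in ConciseXML-style text."""
    if isinstance(value, Expr):
        parts = [value.name]
        for key, arg in value.args:
            text = render(arg)
            parts.append(text if key is None else f"{key}={text}")
        return "<" + " ".join(parts) + "/>"
    if isinstance(value, str):
        return '"' + value + '"'
    return str(value)


next_book = Expr("book", [("isbn", "0764525361")])
book = Expr("book", [("isbn", "0764525360"), ("next_edition", next_book)])

print(render(book))
```

Because a value may itself be an `Expr`, nesting needs no extra convention: the next edition is simply the value of the `next_edition` argument.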

    Although this data can also be represented in XML, there are multiple ways to encode the same data. Here are three valid XML encoding styles:

  • Style 1

    <book isbn="0764525360">
    <next_edition><book isbn="0764525361"/></next_edition>
    </book>

  • Style 2

    <book isbn="0764525360">
    <field key="next_edition"><book isbn="0764525361"/></field>
    </book>

  • Style 3

    <book isbn="0764525360">
    <attr><key>next_edition</key>
    <value><book isbn="0764525361"/></value>
    </attr>
    </book>
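The cost of having several valid encodings shows up in the code that reads them. The sketch below uses Python's `xml.etree.ElementTree` to pull the next edition's isbn out of each of the three styles. The function names are invented for the illustration. The point is that each style needs its own extraction logic, and a reader must be told which style a document uses.

```python
import xml.etree.ElementTree as ET

STYLE_1 = """<book isbn="0764525360">
<next_edition><book isbn="0764525361"/></next_edition>
</book>"""

STYLE_2 = """<book isbn="0764525360">
<field key="next_edition"><book isbn="0764525361"/></field>
</book>"""

STYLE_3 = """<book isbn="0764525360">
<attr><key>next_edition</key>
<value><book isbn="0764525361"/></value>
</attr>
</book>"""


def next_isbn_style_1(book):
    # The field name is the tag name of a child element.
    return book.find("next_edition/book").attrib["isbn"]


def next_isbn_style_2(book):
    # The field name is the value of a "key" attribute.
    return book.find("field[@key='next_edition']/book").attrib["isbn"]


def next_isbn_style_3(book):
    # The field name is the text of a nested <key> element.
    for attr in book.findall("attr"):
        if attr.findtext("key") == "next_edition":
            return attr.find("value/book").attrib["isbn"]
    return None


results = [
    next_isbn_style_1(ET.fromstring(STYLE_1)),
    next_isbn_style_2(ET.fromstring(STYLE_2)),
    next_isbn_style_3(ET.fromstring(STYLE_3)),
]
print(results)
```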

    ConciseXML Makes Eight Extensions to XML 1.0

    1.  Attribute values can be any expression:

    <input size=3/> or
    <person birth=<date year=2002 month=10 day=2/>/>

    XML requires all attribute values to be quoted, effectively requiring all values to be of type string. Elements are often used to work around this limitation, presenting another set of problems.

    2.  Attribute keys can be any object:

    <thing 0="foo" <date 2004 10 10/>="mplusch"/>

    XML does not let an attribute key start with a digit or contain angle brackets, which effectively limits attribute keys to strings. ConciseXML makes it easy to represent array-like fields with integer keys, and to use any object as a key by writing it in ConciseXML call syntax.

    3.  Tagname of an element can be any expression:

    <foo.bar/>

    XML namespaces are a step in this direction, but ConciseXML makes it possible to use any expression as the tagname of an element. The tagname may be a path or a call/tag.

    4.  Attribute keys are optional:

    <date 2004 month=10 day=28/>

    In the CSV (comma separated value) syntax and in all major programming languages, field or argument values are given by position, not by keyword.

    5.  Closing tagname is optional:
    Not only does this remove unnecessary clutter, but when ConciseXML is used as the syntax for dynamic languages, the tagname may not be known until runtime, therefore the closing tagname must be optional.

    6.  Top-level can be any expression, not just an element:
    For example, true. It is surprisingly difficult in XML 1.0 to create a document whose value is a simple type such as a string, number, or boolean value.

    7.  Multiple top-level expressions:
    The CSV file format and most programming languages allow multiple top level expressions. XML 1.0 allows only a single root element in a file. ConciseXML permits any number of expressions at the top level.

    8.  Attribute type:
    In addition to a key and a value, attributes can also have an optional type that is delimited by an equals sign:

    <thing some_key="some_value"=some_type/>
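Several of the XML 1.0 constraints behind these extensions can be confirmed with a standard parser. The sketch below uses Python's `xml.etree.ElementTree`. The helper name `parses_as_xml` is hypothetical, and the check covers only extensions 1, 6, and 7.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(text):
    """Return True if the text is accepted as an XML 1.0 document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


checks = {
    # Baseline: one root element with quoted attribute values.
    "quoted attribute": parses_as_xml('<input size="3"/>'),
    # Extension 1: an unquoted attribute value is not well-formed XML.
    "unquoted attribute": parses_as_xml("<input size=3/>"),
    # Extension 6: a bare value cannot be an XML document.
    "bare top-level value": parses_as_xml("true"),
    # Extension 7: XML allows only one root element.
    "two top-level elements": parses_as_xml("<a/><b/>"),
}

for label, accepted in checks.items():
    print(label, accepted)
```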

    Because XML argument values must be strings, the content area of an XML expression must be used to represent non-string argument values. XML does not have a standardized representation for nested data, which introduces significant ambiguity when interpreting the precise meaning of an XML expression. This ambiguity causes major problems when XML syntax is used to transport data between entities without an explicit agreement on a particular encoding style for XML syntax. XML is sometimes referred to as self-describing. However, because of the ambiguity in representing non-string data, XML documents require additional information about the encoding style used. XML, therefore, is not a self-describing syntax.

    On the other hand, S-Expressions and ConciseXML support argument values that can be any expression. Complex data structures can be easily represented and do not require the use of a content argument. The content argument offers an important feature, though, for representing content that contains both text and other expressions.

    Let's take a simple example from HTML:

    <H1> A heading with <B>bold</B> text </H1>

    The content of the H1 expression in this example is a sequence of three expressions: the string "A heading with", the expression <B>bold</B>, and the string "text". This use is extremely common in HTML and very useful for mixing text and data.
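A standard parser exposes this mixed content directly. The sketch below uses Python's `xml.etree.ElementTree`, which stores the text before the first child in `text` and the text after each child in that child's `tail`. The helper name `content_sequence` is invented for the illustration.

```python
import xml.etree.ElementTree as ET


def content_sequence(element):
    """Return the content of an element as an ordered list of strings and child elements."""
    parts = []
    if element.text:
        parts.append(element.text)
    for child in element:
        parts.append(child)
        if child.tail:
            parts.append(child.tail)
    return parts


h1 = ET.fromstring("<H1> A heading with <B>bold</B> text </H1>")
parts = content_sequence(h1)

for part in parts:
    if isinstance(part, str):
        print("text:", repr(part))
    else:
        print("element:", part.tag, repr(part.text))
```

The parser keeps the surrounding white space, so the two strings come back as " A heading with " and " text ".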

    S-Expressions do not have a convenient way to represent mixed text and data; therefore, S-Expressions do not have the ability to handle a very common feature of markup languages.

    S-Expressions can easily handle nested data structures, but do not support keyed arguments or the content argument for integrating text and data. XML supports the content argument and keyed arguments, but does not easily represent complex data structures or unkeyed arguments. ConciseXML represents an important milestone in the development of a common syntax because it supports the best features of both S-Expressions and XML.

    . . .

    This article has introduced a few of the features of ConciseXML that build upon the important qualities of XML and S-Expressions, namely optional keys, argument values that can be any expression, and a convenient representation for mixed text and data. To learn more about ConciseXML and how it extends XML by eliminating eight constraints of XML, please visit www.ConciseXML.org. For a free trial of Steam XML software, a platform that makes use of the ConciseXML syntax, visit www.clearmethods.com.

  • About Mike Plusch
    Mike Plusch has over 10 years of experience building platforms and complex Web applications for organizations including Digitas, Harlequin, and Bowstreet, a pioneer in Web services. He is the co-inventor of the Water language and ConciseXML. An accomplished author, Plusch has written two books and contributed to numerous books and articles on Web services, including Water: Simplified Web Services and XML Programming, and Water Programming. Plusch holds two degrees from MIT, one in management and one in computer science.
