Matters of Syntax
ConciseXML builds upon the important qualities of XML and S-Expressions
By: Mike Plusch
ConciseXML is a new syntax I co-developed with Christopher Fry that builds on the best features of XML and S-Expressions while eliminating their constraints.

XML originated in the document-markup world of SGML and has become the leading syntax of the Internet. It has been used for documents and data, not for programming logic.

S-Expressions, or symbolic expressions, are the syntax behind Lisp-like languages, including Scheme. Basically, S-Expressions are nested lists of symbols. They are used with languages that support the notion that code is data.

The many discussion postings on the Web about these topics indicate a disconnect between the S-Expression and XML camps. The purpose of this article is to bridge the two worlds and offer a solution that addresses everyone's needs.

Syntax vs Language

A language uses symbols, which are assigned specific meanings. For example, HTML is a language because the tags have associated meanings. XHTML is a language that uses XML syntax.

To illustrate the differences among the three syntaxes, let's start with a data structure expressed in XML, S-Expressions, ConciseXML, and also a semicolon-delimited syntax commonly used by many programming languages, including Java. While there are multiple ways of encoding the data in XML and S-Expressions, only one is shown for the purposes of the first comparison.
<book isbn="0764525360" title="Water Language" copyright="2002"/>
(book "0764525360" "Water Language" 2002)
<book "0764525360" title="Water Language" copyright=2002/>
new Book("0764525360", "Water Language", 2002);

There are a number of similarities among the first three syntaxes:
Now let's look at some of the differences among the four syntaxes.

Representing Arguments

Basic S-Expression syntax does not support argument keys, and there is no single standard for adding them, although people have tried several different ways of adding keys to S-Expressions. With ConciseXML, arguments have an optional key, so an argument can be either keyed or unkeyed. The previous example shows an unkeyed isbn argument, while the title and copyright arguments are keyed.

Unkeyed or By-Position Arguments

createBook("0764525360", "Water Language", 2002);
(createBook "0764525360" "Water Language" 2002)

Definition of Content Argument

XML requires a key for every argument. The workaround for this restriction is to use the content area of an XML expression for argument values. Here is a typical example of that encoding technique in XML:

<books>

Unlike most programming languages, XML does not have one standard method for representing an array or a generic ordered collection. Consider an ordered data structure consisting of an integer number, a decimal number, and a string. An array representing that data structure might appear as the following in several languages:

[5, 10.3, "stuff"]

There is no standard form to express this in XML because XML does not have a standard "array" or "vector" element and by-position arguments are not supported. In ConciseXML syntax, this example would be represented with unkeyed arguments and "vector" as the expression name, in a style similar to S-Expressions.

<vector 5 10.3 "stuff"/>
(vector 5 10.3 "stuff")

By allowing both keyed and unkeyed arguments, ConciseXML supports the by-position passing of arguments used in the syntax of almost every programming language, as well as the keyed arguments required by XML and HTML.
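The distinction between keyed and unkeyed arguments can be sketched in Python, whose function calls also accept both by-position and by-keyword arguments. This is only an illustrative model, not a ConciseXML parser: the CallExpression class and the call helper are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CallExpression:
    """Hypothetical model of a call expression with optional argument keys."""
    name: str
    unkeyed: tuple = ()                        # by-position arguments, in order
    keyed: dict = field(default_factory=dict)  # arguments that carry a key


def call(name, *unkeyed, **keyed):
    """Build a CallExpression from by-position and keyed arguments."""
    return CallExpression(name, unkeyed, keyed)


# Mirrors <book "0764525360" title="Water Language" copyright=2002/>
book = call("book", "0764525360", title="Water Language", copyright=2002)

# Mirrors <vector 5 10.3 "stuff"/>
vector = call("vector", 5, 10.3, "stuff")
```

Here the isbn value is stored by position while title and copyright keep their keys, and the copyright value stays an integer rather than becoming a string.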
Ordered Collections

<vector 0=5 1=10.3 2="stuff"/>

Argument Value

<book isbn="0764525360" title="Water Language" copyright="2002"/>
(book "0764525360" "Water Language" 2002)
<book "0764525360" title="Water Language" copyright=2002/>
new Book("0764525360", "Water Language", 2002);

In all four syntaxes, the value of the isbn argument is the string "0764525360". The value of the "copyright" argument, though, is the string "2002" in XML and the integer 2002 in the other three syntaxes.

In XML, every argument value must be delimited by quotes, which implies that the value must be a string. S-Expressions and ConciseXML permit argument values to be any expression, not just strings. For example, the value of the copyright argument is the integer 2002, not the string "2002". For representing data, it is important that the value of an argument can be anything, not just a string. The integer value 2002 is very different from the string "2002", which contains four characters.

The ability of an argument to hold any value becomes even more important when representing nested data. One of the advantages of S-Expressions and ConciseXML is that they use a single standard method for representing nested data. Because an argument value may be any expression, an argument such as "next_edition" could have a value that is another call expression. This makes it easy to have a book as the value of a field within another book:
<book isbn="0764525360" next_edition=<book isbn="0764525361"/> />
(book "0764525360" (book isbn="0764525361") )

Although this data can also be represented in XML, there are multiple ways to encode the same data. Here are three valid XML encoding styles:

<book isbn="0764525360">
<book isbn="0764525360">
<book isbn="0764525360">

ConciseXML Makes Eight Extensions to XML 1.0

1. Attribute values can be any expression: <input size=3/> or
   XML requires all attribute values to be quoted, effectively requiring all values to be of type string. Elements are often used to work around this limitation, presenting another set of problems.
2. Attribute keys can be any object: <thing 0="foo" <date 2004 10 10/>="mplusch"/>
   XML does not let an attribute key start with a digit or contain angle brackets. That effectively limits attribute keys to strings. By using ConciseXML call syntax for a key, ConciseXML makes it easy to represent array-like fields with integer keys, as well as keys that are any other object.
3. Tagname of an element can be any expression: <foo.bar/>
   XML namespaces are a step in this direction, but ConciseXML makes it possible to use any expression as the tagname of an element. The tagname may be a path or a call/tag.
4. Attribute keys are optional: <date 2004 month=10 day=28/>
   In the CSV (comma-separated value) syntax and in all major programming languages, field or argument values are given by position, not by keyword.
5. Closing tagname is optional:
6. Top-level can be any expression, not just an element:
7. Multiple top-level expressions:
8. Attribute type: <thing some_key="some_value"=some_type/>

Because XML argument values must be strings, the content area of an XML expression must be used to represent non-string argument values. XML does not have a standardized representation for nested data, which introduces significant ambiguity when interpreting the precise meaning of an XML expression.
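The string-only nature of XML attribute values can be observed with a standard XML parser. The following is a minimal sketch using Python's xml.etree.ElementTree; the variable names are chosen for this illustration only, and the int() conversion stands in for whatever convention a consumer would have to agree on separately.

```python
import xml.etree.ElementTree as ET

# The XML form of the book example from the article.
book = ET.fromstring(
    '<book isbn="0764525360" title="Water Language" copyright="2002"/>'
)

# Every attribute value comes back as a string; the parser has no way
# to know that copyright was meant to be a number.
copyright_value = book.get("copyright")

# The consumer must apply its own convention to recover the integer.
copyright_year = int(copyright_value)

# An unquoted attribute value, as in the ConciseXML form <input size=3/>,
# is not well-formed XML 1.0, so a standard XML parser rejects it.
try:
    ET.fromstring("<input size=3/>")
    unquoted_rejected = False
except ET.ParseError:
    unquoted_rejected = True
```

The conversion step is where the out-of-band agreement lives: nothing in the XML itself says that copyright should be read as an integer.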
This ambiguity causes major problems when XML syntax is used to transport data between entities without an explicit agreement on a particular encoding style for XML syntax. XML is sometimes referred to as self-describing. However, because of the ambiguity in representing non-string data, XML documents require additional information about the encoding style used. XML, therefore, is not a self-describing syntax.

On the other hand, S-Expressions and ConciseXML support argument values that can be any expression. Complex data structures can be easily represented and do not require the use of a content argument. The content argument offers an important feature, though, for representing content that contains both text and other expressions. Let's take a simple example from HTML:

<H1> A heading with <B>bold</B> text </H1>

In this example, the content of the H1 expression is a sequence of three expressions: the string "A heading with", the expression <B>bold</B>, and the string "text". This use is extremely common in HTML and very useful for mixing text and data. S-Expressions do not have a convenient way to represent mixed text and data, so they cannot easily handle a very common feature of markup languages.

S-Expressions can easily handle nested data structures, but do not support keyed arguments or the content argument for integrating text and data. XML supports the content argument and keyed arguments, but does not easily represent complex data structures or unkeyed arguments. ConciseXML represents an important milestone in the development of a common syntax because it supports the best features of both S-Expressions and XML.
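The mixed content in the H1 example can be inspected with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree, which stores the text before a child element in the parent's text field and the text after it in the child's tail field. The content_sequence helper is a hypothetical name introduced here, and stripping the surrounding whitespace is a simplification made for this illustration.

```python
import xml.etree.ElementTree as ET


def content_sequence(element):
    """Return the element's content as an ordered list of strings and child elements."""
    items = []
    # Text that appears before the first child element.
    if element.text and element.text.strip():
        items.append(element.text.strip())
    for child in element:
        items.append(child)
        # Text that follows this child, up to the next child or the closing tag.
        if child.tail and child.tail.strip():
            items.append(child.tail.strip())
    return items


h1 = ET.fromstring("<H1> A heading with <B>bold</B> text </H1>")
parts = content_sequence(h1)
```

The resulting list holds a string, the B element, and another string, in document order, which matches the three expressions described for the H1 content.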
This article has introduced a few of the features of ConciseXML that build upon the important qualities of XML and S-Expressions, namely optional keys, argument values that can be any expression, and a convenient representation of mixed text and data. To learn more about ConciseXML and how it extends XML by eliminating eight constraints of XML, please visit www.ConciseXML.org. For a free trial of Steam XML software, a platform that makes use of the ConciseXML syntax, visit www.clearmethods.com.