Processing XML with C# and .NET - A solution that's simpler than you might expect
By: Andrew Solymosi
Jan. 7, 2004 12:00 AM
Microsoft's counterpart to Java, the new C# programming language with its rich .NET library, uses XML as a core technology. This article presents some basic ideas, for example creating and manipulating a DOM tree, and reading and writing XML streams. I also compare .NET's solution with the SAX model, and finally I show how a complex XSLT algorithm can be more simply implemented in C#. The source code can be downloaded from www.sys-con.com/xml/sourcec.cfm.
XML Processors
Which processor to use for a given task is not a trivial question. There are a number of ready-to-use XML processors on the market - like Cocoon and AxKit - and any XML-enabled browser (like Netscape Navigator 6 or Microsoft Internet Explorer 6.0) has a built-in XML processor. They require processing instructions, which can be contained in an XSLT document for complex operations. XSLT is, however, a special style and not a very convenient programming language. Many would prefer to fiddle around with well-known, conventional programming structures from C++ or Java. For them, a real alternative is to write their own XML processor. Modern programming languages like C# and Java offer great support for this task in the form of class libraries; in the case of C#, these are .NET classes.
Processing Methods
Online processing (reading the document piece by piece) can be slow, but it works with less memory. It is advantageous if only certain parts of the document will be processed, or if the processing is sequentially straightforward. C#'s built-in .NET library provides all the classes necessary for processing an XML document both ways. They are placed in the System.Xml namespace (with its nested namespaces System.Xml.Schema, System.Xml.Xsl, etc.). The abstract classes XmlReader/XmlWriter provide the basis for online processing; XmlReader represents a fast, read-only forward cursor in an XML stream, while XmlWriter provides an interface for producing XML streams. The basis for offline processing is the class System.Xml.XmlDocument (with its superclass XmlNode) representing an XML document as a DOM tree (i.e., memory intensive). We'll investigate this option first.
Creating an XML Document
XmlDocument doc = new XmlDocument();
The content of the text file data.xml (after putting this program segment into a Main method and running it) is:
<NewAndOnlyElement>Data of the element
In this way an XML tree of any complexity can be built step by step:
XmlNode node = doc.CreateComment
Since XmlDocument's AppendChild method is inherited from XmlNode, nodes can be appended not only to doc but also to any of the created nodes.
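The step-by-step construction described above can be sketched as follows. This is a minimal illustration, not the article's original listing: the element names, the text values, and the output file name family.xml are assumptions chosen for the example.

```csharp
using System;
using System.Xml;

class CreateDocumentSketch
{
    // Builds a small DOM tree in memory.
    // All element names and text values here are illustrative assumptions.
    public static XmlDocument Build()
    {
        XmlDocument doc = new XmlDocument();

        // The root element is appended to the document itself.
        XmlElement root = doc.CreateElement("family");
        doc.AppendChild(root);

        // Further nodes are appended to an already created node,
        // because AppendChild is inherited from XmlNode.
        root.AppendChild(doc.CreateComment("illustrative comment"));

        XmlElement father = doc.CreateElement("father");
        father.InnerText = "John";
        root.AppendChild(father);

        return doc;
    }

    static void Main()
    {
        XmlDocument doc = Build();
        Console.WriteLine(doc.OuterXml);
        doc.Save("family.xml"); // hypothetical output file name
    }
}
```

Every node is created through the document (CreateElement, CreateComment) and then attached with AppendChild, so the same two calls are enough to grow a tree of any depth.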
Navigating in the DOM Tree
There are additional ways to fill an XmlDocument object with data. The most important is its Load method with a string, Stream, TextReader, or XmlReader parameter. It assumes that the XML document exists in text form in a stream, e.g., in a text file. The string parameter can be any URI. The difference between the overloaded versions is the navigation over the document before loading (because there is no need to load the whole document, only parts of it). Stream or TextReader allows sequential navigation; XmlReader can navigate to any component of the XML tree before reading it (see online processing in the next section). If we have the XML data as a string object, we can use the LoadXml method:
const string xmlData =
Most void methods of XmlDocument (many inherited from XmlNode) are suited to reading and writing the content of the DOM tree to and from a stream.
The abstract classes XmlReader and XmlWriter define an infrastructure for online processing, i.e., without building the DOM tree in memory. XmlReader is, however, not an implementation of the SAX model, but rather a compromise between DOM (with the simple programming interface) and SAX (non-cached forward-only reader allowing the user to skip data). While a SAX reader activates the application's callback methods (push model), the methods of an XmlReader class must be called by the application (pull model). Some of the advantages of the latter are:
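As a hedged illustration of loading a document from a string, the sketch below fills an XmlDocument with LoadXml and reads one value back. The content of xmlData and the helper name LastName are assumptions made for this example; the original value of xmlData is not shown in the article.

```csharp
using System;
using System.Xml;

class LoadXmlSketch
{
    // Parses the given XML text into a DOM tree and returns the text of
    // /family/lastname, or null if no such element exists.
    public static string LastName(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // parses a string that holds the XML text itself
        XmlNode node = doc.SelectSingleNode("/family/lastname");
        return node == null ? null : node.InnerText;
    }

    static void Main()
    {
        // Illustrative document content (an assumption, not the article's data).
        const string xmlData =
            "<family location='Orlando'>" +
            "<lastname>Smith</lastname><father>John</father>" +
            "</family>";
        Console.WriteLine(LastName(xmlData)); // prints "Smith"
    }
}
```

Load, by contrast, expects a URI, Stream, TextReader, or XmlReader; passing XML text to Load would be treated as a file name or URI.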
The following method shows how XmlReader moves along the document stream and displays the name of each element:
static void ReadMyDocument
This method can be called with an XmlReader implementation as argument, e.g., with XmlTextReader:
public static void Main
If the command-line parameter contained the name of a text file with the content
<family location='Orlando'
the output would be
family
lastname
/lastname
father
/father
/family
This shows how XmlReader sequentially goes through the input document. A stop can be made at a node, which can be inspected with XmlReader's methods and properties:
string qName = reader.Name;
XmlReader properties (all read-only, i.e., without set) deliver information about the current node. The most important are:
As you can see from Listing 1, XmlReader requires the application to know the current node type before calling the appropriate method, whereas SAX itself calls the appropriate method according to the current node type.
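The pull model can be sketched as below. This is an independent illustration rather than the article's Listing 1: the class name, the helper ElementNames, and the sample document are assumptions. The application drives the reader by calling Read and then decides what to do based on NodeType.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

class PullReaderSketch
{
    // Walks the stream once and collects start tags as "name"
    // and end tags as "/name"; all other node types are skipped.
    public static List<string> ElementNames(XmlReader reader)
    {
        var names = new List<string>();
        while (reader.Read()) // the application pulls the next node
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    names.Add(reader.Name);
                    break;
                case XmlNodeType.EndElement:
                    names.Add("/" + reader.Name);
                    break;
            }
        }
        return names;
    }

    static void Main()
    {
        // Illustrative input (an assumption, modeled on the family example).
        string xml = "<family location='Orlando'>" +
                     "<lastname>Smith</lastname><father>John</father>" +
                     "</family>";
        using (XmlReader reader = new XmlTextReader(new StringReader(xml)))
        {
            foreach (string name in ElementNames(reader))
                Console.WriteLine(name);
        }
    }
}
```

Because the loop belongs to the application, it can stop early or skip nodes at will, which a SAX-style callback interface does not allow as directly.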
Navigation
XmlNode root = doc.DocumentElement;
The XmlNodeList object contains references to the underlying document's nodes. By modifying the value of a node, that node is also updated in the document and vice versa. For more sophisticated navigation (e.g., over any XML-enabled data store like a database, or also for XSLT), the .NET namespace System.Xml.XPath contains a number of classes and interfaces. Its core is the abstract class XPathNavigator defining the common functionality of all navigators. The classes XmlNode (because it implements the interface IXPathNavigable) and XmlDocument export the method CreateNavigator for creating an XPathNavigator object:
XmlDocument doc = new XmlDocument();
It's interesting that XPathNavigator is an abstract class with no public subclass in .NET. CreateNavigator still delivers an object; its type is unknown to the user. The last line reveals it: System.Xml.DocumentXPathNavigator, a class not published in .NET. It might be an inner type of the class XmlNode, or perhaps its publication has been forgotten. This object supports general navigation methods: selecting nodes, iterating over the selection, copying, moving, removing, and so on. The methods of XPathNavigator accept an XPath expression in the form of a string or a precompiled expression object; they are evaluated to identify the matching set of nodes. The class XPathDocument is an optimized version of XmlDocument for XSLT processing and XPath queries; it provides a read-only, high-performance cache.
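A minimal sketch of navigator-based selection follows. The sample document, the XPath expression, and the helper name CountMatches are assumptions made for illustration; only CreateNavigator, Select, and the iterator members are taken from System.Xml.XPath.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

class NavigatorSketch
{
    // Selects nodes with an XPath expression, prints each match,
    // and returns how many nodes matched.
    public static int CountMatches(XmlDocument doc, string xpath)
    {
        XPathNavigator nav = doc.CreateNavigator();
        XPathNodeIterator it = nav.Select(xpath);
        int count = 0;
        while (it.MoveNext())
        {
            // it.Current is a navigator positioned on the matched node.
            Console.WriteLine(it.Current.Name + ": " + it.Current.Value);
            count++;
        }
        return count;
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        // Illustrative content (an assumption).
        doc.LoadXml("<family><father>John</father>" +
                    "<child>Ann</child><child>Bob</child></family>");
        Console.WriteLine(CountMatches(doc, "/family/child"));

        // Prints the runtime type of the navigator returned by CreateNavigator.
        Console.WriteLine(doc.CreateNavigator().GetType());
    }
}
```

The same Select call also accepts an expression precompiled with the navigator's Compile method, which may be worthwhile when one query is evaluated many times.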
Writing XML
XmlTextWriter provides the WriteXxx methods for writing out typed elements and attributes, where Xxx = StartDocument, EndDocument, DocType, StartElement, EndElement, FullEndElement, ElementString, StartAttribute, EndAttribute, Attributes, AttributeString, Comment, ProcessingInstruction, String, Base64, BinHex, CData, CharEntity, Chars, EntityRef, Name, NmToken, Node, QualifiedName, Raw, SurrogateCharEntity, or Whitespace. It's possible, for example, to write out the whole current node in an XmlReader (with or without attributes) by calling WriteNode:
static void Serialize(XmlReader
XmlTextWriter supports different output stream types (its constructor takes a string [file or URI], a Stream, or a TextWriter) and is configurable. Its properties specify whether to provide namespace support (bool property Namespaces), indentation options (int property Indentation, char property IndentChar), quote character for attribute values (char property QuoteChar), lexical representation for typed values, and so on. Listing 2 illustrates how some of these properties can be configured:
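As an independent sketch (not the article's Listing 2), the following shows one way these properties might be set before writing. The element and attribute names and the chosen property values are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

class WriterSketch
{
    // Writes a small document into a string using a configured XmlTextWriter.
    public static string Write()
    {
        StringWriter target = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(target);

        writer.Formatting = Formatting.Indented; // switch indentation on
        writer.Indentation = 2;                  // two indent characters per level
        writer.IndentChar = ' ';                 // indent with spaces
        writer.QuoteChar = '\'';                 // single quotes around attribute values

        // Illustrative content (an assumption).
        writer.WriteStartElement("family");
        writer.WriteAttributeString("location", "Orlando");
        writer.WriteElementString("lastname", "Smith");
        writer.WriteEndElement();
        writer.Close();

        return target.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Write());
    }
}
```

Indentation and IndentChar only take effect when Formatting is set to Formatting.Indented, so that property is set first here.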
XSL Transformations
To perform a transformation, an XslTransform object must be loaded with the XSLT document (as a DOM tree): the Load method requires a string (URL) parameter. Next an XPathDocument object (instead of XmlDocument - it offers better performance for XSLT processing) must be created and initialized. Finally, the Transform method executes the transformation:
static void TransformXML(string xml, // name of file to transform
The last call of Transform could take some additional XSLT parameters (XsltArgumentList and XmlResolver) - here they are null.
Replacing XSLT Functions by C#
Most transformations can be solved elegantly in XSLT. Nevertheless, there are some tasks that need special tricks. Let's assume an XML document whose root element <matrix> contains a number of (let's say n) <row> elements, each of which contains the same number n of <column> elements with the text 1_1, 1_2, ..., 1_n, 2_1, 2_2, ..., 2_n, ..., n_1, n_2, ... n_n:
1_1 1_2 ... 1_n
The task is to exchange the row and column elements (to mirror the table along its diagonal):
1_1 2_1 ... n_1
This transformation is called transposing a matrix. Because XSLT is a functional language with no variables, this task can be solved only recursively (see Listing 3). The template recursive is called here inside the template recursive; this recursion implements the for loop of a procedural programming language (like C#), which doesn't exist in XSLT (for-each can iterate only over the children of a node; it's not the for loop of C# or Java!). The same algorithm can be expressed in C# as shown in Listing 4 (Listings 4-6 are available at www.sys-con.com/xml/sourcec.cfm). It's remarkable that this C# program doesn't use variables; the different values (between 1 and n) of the (constant) parameter COLUMN have been stored on the stack by the repeated calls of the method Recursive. So recursion has been "misused" for a variable - in XSLT (as in any other functional language) this is the only way to solve the problem. If we use C# variables, recursion is avoidable (see Listing 5). Here the counter variable column takes the values 1 through n instead of the parameter COLUMN in the XSLT program. The opulence of the .NET library and the object-oriented nature of C# allow another, even simpler solution: handing over the InnerText values through the references into the original XmlDocument (i.e., offline) (see Listing 6).
As you can see, sometimes a self-programmed XSLT processor can solve a problem in a much simpler manner than a prefabricated one.
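The loop-based, in-place variant can be sketched roughly as follows. This is an independent illustration, not a reproduction of Listings 4-6: the class and method names are assumptions, and the code assumes a square matrix with the <matrix>/<row>/<column> structure described above.

```csharp
using System;
using System.Xml;

class TransposeSketch
{
    // Transposes a square <matrix> in place by swapping the InnerText
    // of cell (i, j) with that of cell (j, i) for every j > i.
    public static void Transpose(XmlDocument doc)
    {
        XmlNodeList rows = doc.DocumentElement.SelectNodes("row");
        int n = rows.Count;

        // Collect references to all cells before changing anything.
        XmlNode[][] cells = new XmlNode[n][];
        for (int i = 0; i < n; i++)
        {
            XmlNodeList cols = rows[i].SelectNodes("column");
            cells[i] = new XmlNode[cols.Count];
            for (int j = 0; j < cols.Count; j++)
                cells[i][j] = cols[j];
        }

        // Ordinary loop variables replace the recursion needed in XSLT.
        for (int i = 0; i < n; i++)
        {
            for (int j = i + 1; j < n; j++)
            {
                string tmp = cells[i][j].InnerText;
                cells[i][j].InnerText = cells[j][i].InnerText;
                cells[j][i].InnerText = tmp;
            }
        }
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<matrix>" +
            "<row><column>1_1</column><column>1_2</column></row>" +
            "<row><column>2_1</column><column>2_2</column></row>" +
            "</matrix>");
        Transpose(doc);
        Console.WriteLine(doc.OuterXml);
    }
}
```

Because the cells are references into the loaded document, assigning to InnerText changes the document itself; no second tree has to be built.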