Processing XML with C# and .NET

A solution that's simpler than you might expect

Microsoft's counterpart to Java, the new C# programming language with its rich .NET library, uses XML as a core technology. This article presents some basic ideas, for example creating and manipulating a DOM tree, and reading and writing XML streams.

I also compare .NET's solution with the SAX model, and finally I show how a complex XSLT algorithm can be more simply implemented in C#. The source code can be downloaded from www.sys-con.com/xml/sourcec.cfm.

XML Processors
To process an XML document means to extract information from it. Often the extracted information is output to a new XML (or perhaps HTML) document that's similar to the original one; in that case we speak of a transformation.

Which processor to use for a given task is not a trivial question. There are a number of ready-to-use XML processors on the market - like Cocoon and Axkit - and any XML-enabled browser (like Netscape Navigator 6 or Microsoft Internet Explorer 6.0) has a built-in XML processor. They require processing instructions, which can be contained in an XSLT document for complex operations. XSLT is, however, a special style and not a very convenient programming language. Many would prefer to fiddle around with well-known, conventional programming structures from C++ or Java. For them, a real alternative is to write their own XML processor. Modern programming languages like C# and Java offer great support for this task in the form of class libraries; in the case of C#, these are .NET classes.

Processing Methods
There are two ways to process an XML document: online and offline. Offline processing means not being connected to the XML source, so the document has to be loaded (typically as a DOM tree) into memory beforehand. In the space-time trade-off, you lose space (memory) but gain time (speed) for the processing. This is the best method if most parts of the document are going to be processed, especially if they're going to be processed repeatedly.

Online processing (reading the document piece by piece) can be slow, but it works with less memory. It is advantageous if only certain parts of the document will be processed, or if the processing is sequentially straightforward.

C#'s built-in .NET library provides all the classes necessary for processing an XML document both ways. They are placed in the System.Xml namespace (with its nested namespaces System.Xml.Schema, System.Xml.Xsl, etc.).

The abstract classes XmlReader/XmlWriter provide the basis for online processing; XmlReader represents a fast, read-only forward cursor in an XML stream, while XmlWriter provides an interface for producing XML streams. The basis for offline processing is the class System.Xml.XmlDocument (with its superclass XmlNode) representing an XML document as a DOM tree (i.e., memory intensive). We'll investigate this option first.
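To make the contrast concrete, here is a minimal sketch that counts the elements of a small document both ways. The class name ProcessingModes, its method names, and the sample string are hypothetical and chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class ProcessingModes
{
    // Hypothetical sample input, modeled on the family example used later on.
    const string Sample =
        "<family><lastname>Solymosi</lastname><father>Andrew</father></family>";

    // Offline: load the whole document into a DOM tree, then query the tree.
    public static int CountElementsOffline(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementsByTagName("*").Count; // "*" matches every element
    }

    // Online: pull node after node from the stream; no tree is built.
    public static int CountElementsOnline(string xml)
    {
        int count = 0;
        using (XmlReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
                if (reader.NodeType == XmlNodeType.Element)
                    count++;
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountElementsOffline(Sample)); // prints 3
        Console.WriteLine(CountElementsOnline(Sample));  // prints 3
    }
}
```

Both methods give the same answer; they differ only in whether the document is held in memory while it is examined.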

Creating an XML Document
The XmlDocument constructor (usually called without parameters) creates an empty in-memory XML document. Its methods, like InsertBefore, InsertAfter, and AppendChild, build the DOM tree in memory. They need XmlNode parameters, which can be created by the XmlDocument's method CreateXxx, where Xxx stands for Element, Attribute, Node, Comment, ProcessingInstruction, CDataSection, DocumentFragment, XmlDeclaration, DocumentType, or EntityReference. They create a node in the context of the document but don't attach it to the tree:

XmlDocument doc = new XmlDocument(); // empty
XmlElement elem = doc.CreateElement("NewAndOnlyElement"); // XmlElement is a special XmlNode
elem.InnerText = "Data of the element";
... // many other properties to configure the element
doc.AppendChild(elem); // appends node to document
doc.Save("data.xml"); // save the document to a file

The content of the text file data.xml (after putting this program segment into a Main method and running it) is:

<NewAndOnlyElement>Data of the element</NewAndOnlyElement>

In this way an XML tree of any complexity can be built step by step:

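XmlNode node = doc.CreateComment("This is a comment"); // XmlComment is subclass of XmlNode
doc.AppendChild(node); // comment at document level
node = doc.CreateProcessingInstruction(
  "xml-stylesheet", // target
  "type='text/xsl' href='data.xsl'"); // data
doc.AppendChild(node); // processing instruction at document level
node = doc.CreateCDataSection("<p>");
elem.AppendChild(node); // a CDATA section must live inside an element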

Since XmlDocument's AppendChild method is inherited from XmlNode, nodes can be appended not only to doc but also to any of the created nodes.
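As a small illustration of appending to nodes other than the document itself, the following sketch builds a two-level tree. The class name TreeBuilder and the element names are assumptions made for this example only.

```csharp
using System;
using System.Xml;

static class TreeBuilder
{
    public static XmlDocument BuildFamily()
    {
        XmlDocument doc = new XmlDocument();

        // The root element is appended to the document itself.
        XmlElement family = doc.CreateElement("family");
        family.SetAttribute("location", "Orlando");
        doc.AppendChild(family);

        // Child nodes are appended to an element, not to doc.
        XmlElement father = doc.CreateElement("father");
        father.InnerText = "Andrew";
        family.AppendChild(father);

        // InsertBefore places a new node in front of an existing child.
        family.InsertBefore(doc.CreateComment("head of family"), father);
        return doc;
    }

    static void Main()
    {
        Console.WriteLine(BuildFamily().OuterXml);
    }
}
```

Every node is created by the document (so it belongs to that document's context) and only becomes part of the tree once AppendChild or InsertBefore attaches it.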

Navigating in the DOM Tree
After building (or loading) an XML tree, you can navigate over it and manipulate it with XmlDocument's properties (most of them are inherited from XmlNode). They have intelligible names:

  • DocumentElement delivers the root XmlElement of the tree.
  • Indexer [ ] delivers the first child XmlElement with the specified name.
  • Attributes delivers an XmlAttributeCollection object, a collection of XmlAttribute (a special XmlNode) objects.
  • ParentNode, PreviousSibling, NextSibling, FirstChild, and LastChild navigate over the neighboring XmlNode objects for further manipulation or evaluation.
  • ChildNodes delivers all child nodes as XmlNodeList, a collection of XmlNode objects - it can be read, for example, with foreach.
  • NamespaceURI, Prefix, BaseURI, LocalName, and (qualified) Name deliver string objects describing the node's name, namespace, and origin.
  • OuterXml delivers a string containing the markup (XML text) of the node (with all its children).
  • InnerXml gets or sets a string containing the markup of all the children.
  • InnerText gets or sets a string, the concatenated text values (without XML markups) of the node and all its children.
  • PreserveWhitespace (with set), HasChildNodes, and IsReadOnly are bool properties.
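A few of these properties in action, as a minimal sketch. The class name Navigation and the layout of the returned string are hypothetical, and the code assumes the root element carries a location attribute and has at least two children.

```csharp
using System;
using System.Text;
using System.Xml;

static class Navigation
{
    public static string Describe(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlElement root = doc.DocumentElement;

        StringBuilder sb = new StringBuilder();
        sb.Append(root.Name).Append('@').Append(root.Attributes["location"].Value);

        // ChildNodes can be read with foreach.
        foreach (XmlNode child in root.ChildNodes)
            sb.Append(' ').Append(child.Name).Append('=').Append(child.InnerText);

        // Neighbor navigation and the name-based indexer.
        sb.Append(" first=").Append(root.FirstChild.Name);
        sb.Append(" next=").Append(root.FirstChild.NextSibling.Name);
        sb.Append(" byName=").Append(root["father"].InnerText);
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Describe(
            "<family location='Orlando'>" +
            "<lastname>Solymosi</lastname><father>Andrew</father></family>"));
        // prints: family@Orlando lastname=Solymosi father=Andrew first=lastname next=father byName=Andrew
    }
}
```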

Reading XML Data
There are additional ways to fill an XmlDocument object with data. The most important is its Load method with a string, Stream, TextReader, or XmlReader parameter. It assumes that the XML document exists in text form in a stream, e.g., in a text file. The string parameter can be any URI. The difference between the overloaded versions is the navigation over the document before loading (because there is no need to load the whole document, only parts of it). Stream or TextReader allows sequential navigation; XmlReader can navigate to any component of the XML tree before reading it (see online processing in the next section).

If we have the XML data as a string object, we can use the LoadXml method:

const string xmlData =
  "<family location='Orlando' class='middle'>" +
  "<lastname>Solymosi</lastname>" +
  "<father>Andrew</father>" +
  "</family>";
doc.LoadXml(xmlData);
doc.Load(new StringReader(xmlData)); // equivalent alternative

Most void methods of XmlDocument (many inherited from XmlNode) serve to read the content of the DOM tree from a stream or to write it to one.

  • Load (with parameter Stream or string or TextReader or XmlReader) and LoadXml(string) fill an XmlDocument object with data.
  • Save (with parameter Stream or string or TextWriter or XmlWriter) writes the content of an XmlDocument object to a stream.
  • WriteTo writes OuterXml (markup of the root node); WriteContentTo writes InnerXml (the markup of all children) to an XmlWriter stream.
  • Normalize puts the tree into a "normal" form in which only markup separates XmlText nodes (i.e., there are no adjacent XmlText nodes).
  • RemoveChild and RemoveAll delete child element(s).
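The effect of Normalize and WriteTo can be seen in a short sketch. The class name SaveAndNormalize, the note element, and the format of the returned string are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class SaveAndNormalize
{
    public static string Roundtrip()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<note/>");
        XmlElement note = doc.DocumentElement;

        // Two separate appends leave two adjacent XmlText nodes.
        note.AppendChild(doc.CreateTextNode("Hello, "));
        note.AppendChild(doc.CreateTextNode("world"));
        int before = note.ChildNodes.Count;

        note.Normalize(); // merges adjacent text nodes into one
        int after = note.ChildNodes.Count;

        // WriteTo sends the markup of the document to an XmlWriter.
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);
        doc.WriteTo(writer);
        writer.Flush();
        return before + "->" + after + " " + sw.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Roundtrip()); // prints: 2->1 <note>Hello, world</note>
    }
}
```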

Online Processing
The abstract classes XmlReader and XmlWriter define an infrastructure for online processing, i.e., without building the DOM tree in memory. XmlReader is, however, not an implementation of the SAX model, rather a compromise between DOM (with the simple programming interface) and SAX (non-cached forward-only reader allowing the user to skip data). While a SAX reader activates the application's callback methods (push model), the methods of an XmlReader class must be called by the application (pull model). Some of the advantages of the latter are:

  • SAX requires the user to build complex state machines. A client for XmlReader builds a top-down procedural refinement.
  • XmlReader allows the client to read multiple streams (e.g., for aggregation); with SAX it's hard work.
  • A SAX-style push parser can be built on top of XmlReader (layering); the reverse is not true.
  • SAX copies the data from the parser buffer into the client's string object; XmlReader can use the client's string parameter as parser buffer so a string copy can be avoided.
  • SAX calls the client for each item (including attributes, processing instructions, and white space), while XmlReader's client can skip items (selective processing).
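Selective processing in the pull model might look like the following sketch, in which the client decides to jump over whole subtrees. The class name SelectiveReader and the element names history and father are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class SelectiveReader
{
    // Collects the text of <father> elements and skips every <history> subtree.
    public static string ReadFathers(string xml)
    {
        StringBuilder sb = new StringBuilder();
        using (XmlReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "history")
                    reader.Skip(); // jumps over the element and all its children
                else if (reader.NodeType == XmlNodeType.Element && reader.Name == "father")
                    sb.Append(reader.ReadElementString()).Append(';'); // text-only element
                else
                    reader.Read(); // any other node: just move on
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ReadFathers(
            "<family><history><father>ignored</father></history>" +
            "<father>Andrew</father></family>")); // prints: Andrew;
    }
}
```

With a push parser the client would receive a callback for every node inside history and would have to track state to ignore them; here the application simply never asks for them.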

XmlReader has three implementations in .NET 1.1: XmlTextReader (this fast reader checks only well formedness and throws XmlException on failure), XmlValidatingReader (additionally validates against a DTD or a schema), and XmlNodeReader (for reading a DOM subtree). XmlWriter has only one implementation in .NET 1.1: XmlTextWriter. The counterpart to XmlNodeReader (the class XmlNodeWriter) has not yet been implemented. However, people are welcome to write their own custom XmlReader and XmlWriter, extending the standard functionality.

The following method shows how XmlReader moves along the document stream and displays the name of each element:

static void ReadMyDocument(XmlReader reader) {
  while (reader.Read()) {
    // print start tags as "name" and end tags as "/name"
    if (reader.NodeType == XmlNodeType.Element ||
        reader.NodeType == XmlNodeType.EndElement)
      Console.Write(
        (reader.NodeType == XmlNodeType.EndElement ? "/" : "") + reader.Name + " ");
  }
}

This method can be called with an XmlReader implementation as argument, e.g., with XmlTextReader:

public static void Main(String[] args) {
  XmlReader r = new XmlTextReader(args[0]); // URL
  ReadMyDocument(r);
}

If the command-line parameter contained the name of a text file with the content

<family location='Orlando' class='middle'>
  <lastname>Solymosi</lastname>
  <father>Andrew</father>
</family>

the output would be

family lastname /lastname father /father /family

This shows how XmlReader sequentially goes through the input document. A stop can be made at a node, which can be inspected with XmlReader's methods and properties:

string qName = reader.Name;
string localName = reader.LocalName;
string namespaceURI = reader.NamespaceURI;
XmlNodeType nodeType = reader.NodeType;
string value = reader.Value;
bool hasAttributes = reader.HasAttributes;
int numberOfAttributes = reader.AttributeCount;

XmlReader properties (all read-only, i.e., without set) deliver information about the current node. The most important are:

  • EOF, HasAttributes, HasValue, IsDefault, IsEmptyElement deliver bool.
  • AttributeCount and Depth deliver int.
  • BaseURI, indexer [ ], LocalName, Name, NamespaceURI, Prefix, Value and XmlLang deliver string.
  • NodeType delivers an XmlNodeType enumeration value: one of Document, Text, XmlDeclaration, Element, EndElement, Entity, EndEntity, EntityReference, Attribute, Comment, ProcessingInstruction, CDATA, DocumentFragment, DocumentType, Notation, Whitespace, SignificantWhitespace, or None. It can be used in a switch.

XmlReader's methods navigate forward through the document:

  • Read delivers the bool value true if the next node is read successfully and throws XmlException if it detects an error in well formedness.
  • ReadXxx reads the next required node (otherwise throws XmlException), with Xxx = AttributeValue (which parses an attribute value into Text, EntityReference, or EndEntity nodes), ElementString (simple text-only element), StartElement, EndElement, InnerXml, OuterXml, or String (contents of an element or text node).
  • MoveToXxx (with Xxx = Content, Element, Attribute, FirstAttribute, or NextAttribute) moves to the required node and skips everything else.

The ReadXxx and MoveToXxx methods perform a depth-first traversal of the tree, i.e., in the order that the sequential document stores it. The classic method for processing a document is to read it in a while loop and switch according to XmlNodeType (see Listing 1).

As you can see from Listing 1, XmlReader requires the application to know the current node type before calling the appropriate method, whereas SAX itself calls the appropriate method according to the current node type.
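The read-and-switch pattern can be sketched as follows. This is not Listing 1 itself; the class name NodeTypeSwitch and the output format are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class NodeTypeSwitch
{
    public static string Dump(string xml)
    {
        StringBuilder sb = new StringBuilder();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                {
                    // Remember this before moving onto the attributes.
                    bool empty = reader.IsEmptyElement;
                    sb.Append('<').Append(reader.Name);
                    while (reader.MoveToNextAttribute())
                        sb.Append(' ').Append(reader.Name).Append('=').Append(reader.Value);
                    sb.Append(empty ? "/>" : ">");
                    break;
                }
                case XmlNodeType.Text:
                    sb.Append(reader.Value);
                    break;
                case XmlNodeType.EndElement:
                    sb.Append("</").Append(reader.Name).Append('>');
                    break;
                // comments, white space, processing instructions, etc. are ignored
            }
        }
        reader.Close();
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Dump(
            "<family location='Orlando'><father>Andrew</father><pet/></family>"));
        // prints: <family location=Orlando><father>Andrew</father><pet/></family>
    }
}
```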

XmlNode provides properties for navigating the DOM tree horizontally and vertically to the neighboring nodes (like ChildNodes, FirstChild, LastChild, ParentNode, NextSibling, and PreviousSibling). Jumping to an arbitrary node requires an XPath expression. XmlNode's methods SelectXxx can be used to find a single node or all nodes that match given criteria. The method SelectSingleNode returns the first node matching the search, from the top of the tree down, in document order. The method SelectNodes returns an XmlNodeList object, a container for XmlNode objects that can be read by a foreach statement:

XmlNode root = doc.DocumentElement;
XmlNodeList nodeList = root.SelectNodes("..."); // XPath expression
foreach (XmlNode node in nodeList) {
  ... // work with the single nodes
}

The XmlNodeList object contains references to the underlying document's nodes. By modifying the value of a node, that node is also updated in the document and vice versa.
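That the list holds references rather than copies can be checked with a small sketch. The class name XPathSelection, the families document, and the XPath expression are hypothetical examples.

```csharp
using System;
using System.Xml;

static class XPathSelection
{
    // Upper-cases the <father> text of all families located in Orlando.
    public static string UppercaseOrlandoFathers(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNodeList fathers =
            doc.DocumentElement.SelectNodes("family[@location='Orlando']/father");

        // Changing a node reached through the list changes the document itself.
        for (int i = 0; i < fathers.Count; i++)
            fathers[i].InnerText = fathers[i].InnerText.ToUpperInvariant();
        return doc.OuterXml;
    }

    static void Main()
    {
        Console.WriteLine(UppercaseOrlandoFathers(
            "<families>" +
            "<family location='Orlando'><father>Andrew</father></family>" +
            "<family location='Berlin'><father>Klaus</father></family>" +
            "</families>"));
    }
}
```

Only the Orlando entry comes back as ANDREW in the serialized document; the Berlin entry is untouched because the XPath expression never selected it.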

For more sophisticated navigation (e.g., over any XML-enabled data store like a database, or also for XSLT), the .NET namespace System.Xml.XPath contains a number of classes and interfaces. Its core is the abstract class XPathNavigator defining the common functionality of all navigators. The classes XmlNode (because it implements the interface IXPathNavigable) and XmlDocument export the method CreateNavigator for creating an XPathNavigator object:

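XmlDocument doc = new XmlDocument();
... // load or build the document
XmlElement root = doc.DocumentElement; // or any other node
XPathNavigator navigator = root.CreateNavigator();
Console.WriteLine("" + navigator.GetType()); // prints the run-time type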

It's interesting that XPathNavigator is an abstract class with no public subclass in .NET. CreateNavigator still delivers an object; its type is unknown to the user. The last line reveals it: System.Xml.DocumentXPathNavigator, a class not published in .NET. It might be an inner type of the class XmlNode, or perhaps its publication has been forgotten.

This object supports general navigation methods: selecting nodes, iterating over the selection, copying, moving, removing, and so on. The methods of XPathNavigator accept an XPath expression in the form of a string or a precompiled expression object; they are evaluated to identify the matching set of nodes.

The class XPathDocument is an optimized version of XmlDocument for XSLT processing and XPath queries; it provides a read-only, high-performance cache.
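A navigator over an XPathDocument, queried with a precompiled expression, might be used as in this sketch. The class name NavigatorQuery and the sample document structure are assumptions.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

static class NavigatorQuery
{
    public static string FatherNames(string xml)
    {
        // Read-only, cached representation of the document.
        XPathDocument document = new XPathDocument(new StringReader(xml));
        XPathNavigator navigator = document.CreateNavigator();

        // Compile once; the expression object can be reused for many queries.
        XPathExpression expression = navigator.Compile("/families/family/father");
        XPathNodeIterator iterator = navigator.Select(expression);

        StringBuilder sb = new StringBuilder();
        while (iterator.MoveNext())
        {
            if (sb.Length > 0) sb.Append(',');
            sb.Append(iterator.Current.Value); // text value of the matched node
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(FatherNames(
            "<families>" +
            "<family><father>Andrew</father></family>" +
            "<family><father>Klaus</father></family>" +
            "</families>")); // prints: Andrew,Klaus
    }
}
```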

Writing XML
XmlWriter is an abstract class (like XmlReader) defining the base functionality for producing an XML document. The concept of .NET's XmlWriter is very similar to SAX. XmlTextWriter is currently its only implementation in the .NET class library; XmlNodeWriter (the counterpart to XmlNodeReader) is hopefully coming in a future release. The writers work just like the reader versions but in the opposite direction.

XmlTextWriter provides the WriteXxx methods for writing out typed elements and attributes, where Xxx = StartDocument, EndDocument, DocType, StartElement, EndElement, FullEndElement, ElementString, StartAttribute, EndAttribute, Attributes, AttributeString, Comment, ProcessingInstruction, String, Base64, BinHex, CData, CharEntity, Chars, EntityRef, Name, NmToken, Node, QualifiedName, Raw, SurrogateCharEntity, or Whitespace. It's possible, for example, to write out the whole current node in an XmlReader (with or without attributes) by calling WriteNode:

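static void Serialize(XmlReader reader, XmlWriter writer) {
  reader.Read(); // move to the first node
  while (!reader.EOF)
    // WriteNode copies the current node with its subtree and
    // itself advances the reader to the next sibling
    writer.WriteNode(reader, true); // true = with default attributes
}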

XmlTextWriter supports different output stream types (its constructor takes a string [file or URI], a Stream, or a TextWriter) and is configurable. Its properties specify whether to provide namespace support (bool property Namespaces), indentation options (int property Indentation, char property IndentChar), quote character for attribute values (char property QuoteChar), lexical representation for typed values, and so on. Listing 2 illustrates how some of these properties can be configured.
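A possible configuration, shown as a sketch rather than as Listing 2 itself. The class name WriterConfig and the element names are hypothetical; the Formatting property is set as well because the indentation settings only take effect when it is Formatting.Indented.

```csharp
using System;
using System.IO;
using System.Xml;

static class WriterConfig
{
    public static string WriteFamily()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);

        writer.Formatting = Formatting.Indented; // switch indentation on
        writer.Indentation = 2;                  // two indent characters per level
        writer.IndentChar = ' ';
        writer.QuoteChar = '\'';                 // single quotes around attribute values

        writer.WriteStartElement("family");
        writer.WriteAttributeString("location", "Orlando");
        writer.WriteElementString("father", "Andrew");
        writer.WriteEndElement();
        writer.Close();
        return sw.ToString();
    }

    static void Main()
    {
        Console.WriteLine(WriteFamily());
    }
}
```

The result is a three-line document in which the father element is indented by two spaces and the location attribute is quoted with apostrophes.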

XSL Transformations
The namespace System.Xml.Xsl contains the class XslTransform, which manages XSLT transformations. It uses XPathNavigator during the transformation process. XslTransform reads as input an XML document, an XSLT document, and some optional parameters (of type XsltArgumentList). It can produce any text-based output.

To perform a transformation, an XslTransform object must be loaded with the XSLT document (as a DOM tree): the Load method requires a string (URL) parameter. Next an XPathDocument object (instead of XmlDocument - it offers better performance for XSLT processing) must be created and initialized. Finally, the Transform method executes the transformation:

static void TransformXML(string xml, // name of file to transform
    string xslt, // file name of stylesheet or XSLT document
    XmlTextWriter writer) {
  XslTransform transformator = new XslTransform();
  transformator.Load(xslt); // stylesheet loaded
  XPathDocument document = new XPathDocument(xml); // DOM tree created
  transformator.Transform(document, null, writer, null);
}
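For readers on later .NET versions, the same three steps can be sketched with XslCompiledTransform, the class that superseded XslTransform from .NET 2.0 on; it is swapped in here and is not what the code above uses. The class name TransformDemo and the tiny stylesheet are hypothetical, and everything is kept in memory so that the sketch needs no files.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

static class TransformDemo
{
    // Hypothetical stylesheet that greets the father of a family.
    const string Stylesheet =
        "<xsl:stylesheet version='1.0' " +
        "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/family'>" +
        "Hello, <xsl:value-of select='father'/>!" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Run(string xml)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(XmlReader.Create(new StringReader(Stylesheet))); // stylesheet loaded

        XPathDocument document = new XPathDocument(new StringReader(xml)); // input cached

        StringWriter output = new StringWriter();
        transform.Transform(document, null, output); // null = no XSLT arguments
        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Run("<family><father>Andrew</father></family>"));
        // prints: Hello, Andrew!
    }
}
```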

The last call of Transform could take some additional XSLT parameters (XsltArgumentList and XmlResolver) - here they are null.

Replacing XSLT Functions by C#
Most transformations can be elegantly solved in XSLT. Nevertheless there are some tasks that need special tricks.

Let's assume an XML document with a root element <matrix> contains a number of (let's say n) <row> elements, each of which contains the same number n of <column> elements with the text 1_1, 1_2, ..., 1_n, 2_1, 2_2, ..., 2_n, ..., n_1, n_2, ... n_n:

1_1   1_2   ...   1_n
2_1   2_2   ...   2_n
...
n_1   n_2   ...   n_n

The task is to exchange the row and column elements (to mirror the table along its diagonal):

1_1   2_1   ...   n_1
1_2   2_2   ...   n_2
...
1_n   2_n   ...   n_n

This transformation is called transposing a matrix. Because XSLT is a functional language whose variables cannot be reassigned, this task can be solved only recursively (see Listing 3).

The template recursive is called here inside the template recursive; this recursion implements the for loop of a procedural programming language (like C#), which doesn't exist in XSLT (for-each can iterate only over a set of nodes such as the children of a node; it's not the for loop of C# or Java!).

The same algorithm can be expressed in C# as shown in Listing 4 (Listings 4-6 are available at www.sys-con.com/xml/sourcec.cfm).

It's remarkable that this C# program doesn't use variables; the different values (between 1 and n) of the (constant) parameter COLUMN have been stored on the stack by the repeated calls of the method Recursive. So recursion has been "misused" for a variable - in XSLT (as in any other functional language) this is the only way to solve the problem. If we use C# variables, recursion can be avoided (see Listing 5).

Here the counter variable column takes the values 1 through n instead of the parameter COLUMN in the XSLT program.
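A loop-based transposition might look like the following sketch. It is not Listing 5 itself: the class name Transpose and its structure are assumptions, and the counters run from 0 to n-1 because node lists are zero-based.

```csharp
using System;
using System.Xml;

static class Transpose
{
    // Builds a new document whose rows are the columns of the source matrix.
    public static string Run(string xml)
    {
        XmlDocument source = new XmlDocument();
        source.LoadXml(xml);
        XmlNodeList rows = source.DocumentElement.SelectNodes("row");
        int n = rows.Count;

        XmlDocument target = new XmlDocument();
        XmlElement matrix = target.CreateElement("matrix");
        target.AppendChild(matrix);

        for (int column = 0; column < n; column++)
        {
            XmlElement newRow = target.CreateElement("row");
            matrix.AppendChild(newRow);
            for (int row = 0; row < n; row++)
            {
                // Cell (row, column) of the source becomes cell (column, row).
                XmlNode cell = rows[row].SelectNodes("column")[column];
                XmlElement newCell = target.CreateElement("column");
                newCell.InnerText = cell.InnerText;
                newRow.AppendChild(newCell);
            }
        }
        return target.OuterXml;
    }

    static void Main()
    {
        Console.WriteLine(Run(
            "<matrix>" +
            "<row><column>1_1</column><column>1_2</column></row>" +
            "<row><column>2_1</column><column>2_2</column></row>" +
            "</matrix>"));
    }
}
```

For the 2 x 2 input in Main, the first output row holds 1_1 and 2_1 and the second holds 1_2 and 2_2.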

The richness of the .NET library and the object-oriented nature of C# allow another, even simpler solution: exchanging the InnerText values through the node references of the original XmlDocument (i.e., offline) (see Listing 6).
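One way to read this idea is an in-place swap across the diagonal, sketched below. It is not Listing 6 itself; the class name InPlaceTranspose is hypothetical, and the code assumes every row has exactly n column children with no other nodes between them.

```csharp
using System;
using System.Xml;

static class InPlaceTranspose
{
    // Swaps the text of cell (i, j) with cell (j, i) inside the loaded document.
    public static string Run(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNodeList rows = doc.DocumentElement.SelectNodes("row");
        int n = rows.Count;

        for (int i = 0; i < n; i++)
        {
            for (int j = i + 1; j < n; j++) // only the cells above the diagonal
            {
                XmlNode upper = rows[i].ChildNodes[j];
                XmlNode lower = rows[j].ChildNodes[i];
                string temp = upper.InnerText;
                upper.InnerText = lower.InnerText;
                lower.InnerText = temp;
            }
        }
        return doc.OuterXml;
    }

    static void Main()
    {
        Console.WriteLine(Run(
            "<matrix>" +
            "<row><column>1_1</column><column>1_2</column></row>" +
            "<row><column>2_1</column><column>2_2</column></row>" +
            "</matrix>"));
    }
}
```

No second document is built; the existing nodes keep their places in the tree and only their text changes.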

.  .  .

As you can see, sometimes a self-programmed XML processor can solve a problem in a much simpler manner than a prefabricated XSLT processor.
