Managing Your XML Documents with Schemas

The XML Schema Definition Language solves a number of problems posed by Document Type Definitions (DTDs). Because DTDs caused much confusion and complaint among XML developers, the W3C set out to create a new standard for defining a document's structure.

What the W3C created is something even more complex and flexible than DTDs: the XML Schema Definition Language. In this article we'll look at many aspects of schemas and how you can build and use them.

A Little Background
Schemas, while more complex than DTDs, give schema authors much more power and control over how XML documents are validated. For instance, with the new W3C standard a document definition can specify the data type of an element's contents, the range of values an element may hold, the minimum and maximum number of times an element may occur, annotations to the schema, and much more.

In May of 2001 the W3C published the XML Schema Definition Language as a Recommendation. The standard allows an author to define simple and complex elements as well as the rules governing how those elements and their attributes may appear within an instance document. The author can apply various restrictions to the elements and attributes, from specifying a value's length to enumerating the acceptable values for an element or attribute. With the XML Schema Definition Language, a schema author has fine-grained control over what counts as a conforming XML document.

Sample XML Document
The remainder of this article is devoted to creating and understanding the XML schema for the XML document shown in Listing 1, which details a purchase order for various items that can commonly be found in a grocery store. This document allows one individual to receive the shipment of the goods and an entirely different individual to pay for the purchase. This document also contains specific information about the products ordered, such as how much each product cost, how many were ordered, and so on.

As you can see, the listing represents a fairly small and simple order that could be placed online. It contains the necessary information regarding how payment is to be made, how the order is to be shipped, and on what day it should be delivered. The listing should by no means be construed as an all-inclusive document for an online grocery store order; it has been constructed only for use as an example.

For the listing, an author might construct a DTD to describe the XML document. While such a DTD might require only 30 lines or so, it would provide a relatively inflexible definition of the XML document.

A Sample Schema
Creating an XML schema to describe this document is somewhat more complex than building a DTD. However, in exchange for the extra complexity, the schema gives the author virtually limitless control over how an XML document can be validated against it.

Authoring an XML schema consists of declaring elements and attributes as well as the "properties" of those elements and attributes. We will begin our look at authoring XML schemas by working our way from the least complex to the most complex example. Because attributes may not contain other attributes or elements, we will start there.

Declaring attributes
Attributes in an XML document are contained by elements. To indicate that a complex element has an attribute, use the <attribute> element of the XML Schema Definition Language. For instance, Listing 2 is from a hypothetical PurchaseOrder schema based on the XML document shown in Listing 1. You can see the basics for declaring an attribute.

From this you can see that, when declaring an attribute, you specify its type. The type must be a simple type, such as one of the built-in types: anyURI, base64Binary, boolean, byte, date, dateTime, decimal, double, duration, ENTITIES, ENTITY, float, gDay, gMonth, gMonthDay, gYear, gYearMonth, hexBinary, ID, IDREF, IDREFS, int, integer, language, long, Name, NCName, negativeInteger, NMTOKEN, NMTOKENS, nonNegativeInteger, nonPositiveInteger, normalizedString, NOTATION, positiveInteger, QName, short, string, time, token, unsignedByte, unsignedInt, unsignedLong, unsignedShort. Each type can be further categorized as a "primitive" data type or a "derived" data type. Derived data types are primitive or other derived data types with restrictions placed on them; integer, positiveInteger, and byte are examples.

From the simple types you may notice what appears to be a group of duplicate or unnecessary types, such as nonNegativeInteger and positiveInteger. If you look closely, you'll see that nonNegativeInteger is an integer whose value is greater than or equal to zero, whereas the positiveInteger type is an integer whose value is greater than zero, which means a positiveInteger type cannot be zero. Keep this in mind when deciding on the base data type for your elements and attributes - these small details can greatly influence their acceptable value ranges.
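A minimal sketch of the difference, assuming a hypothetical Item element with Quantity and Discount attributes (these names are illustrative and do not come from Listing 1):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Item">
    <xsd:complexType>
      <!-- positiveInteger: 1, 2, 3, ... (zero is rejected) -->
      <xsd:attribute name="Quantity" type="xsd:positiveInteger"/>
      <!-- nonNegativeInteger: 0, 1, 2, ... (zero is accepted) -->
      <xsd:attribute name="Discount" type="xsd:nonNegativeInteger"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Against this schema, <Item Quantity="3" Discount="0"/> is valid, while <Item Quantity="0"/> is not, because zero lies outside the value space of positiveInteger.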

Aside from defining the type of an attribute, the <attribute> element within the XML Schema Definition Language contains attributes to assist in defining when an attribute is optional, whether its value is fixed, what its default value is, and so on. Here's the basic syntax for the <attribute> element:

<attribute name="" type="" [use=""] [fixed=""] [default=""] [ref=""]/>

The use attribute can contain one of the following possible values:

  • optional
  • prohibited
  • required
If the use attribute is set to required, the parent element must have the attribute; otherwise the document is invalid. A value of optional, which is also the default when use is omitted, indicates that the attribute may or may not appear; when it does appear, its value must still be valid for its type. Assigning a value of prohibited indicates that the attribute may not appear at all within the parent element. (These values are case-sensitive and must be written in lowercase.)

Specifying a value for the default attribute indicates that if the attribute does not appear on the element in the XML document, it is treated as having that default value. A value in the fixed attribute indicates that the attribute has a constant value.

It's important to remember that if you specify a value for the fixed attribute of the <attribute> element, the attribute, when present, must have exactly that value for the document to be valid. If you mean to give the attribute a default value, use the default attribute instead. The default and fixed attributes are mutually exclusive, and default may be used only when use is optional.
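The following sketch shows use, default, and fixed side by side. The Payment element and its attributes are hypothetical names chosen for illustration:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Payment">
    <xsd:complexType>
      <!-- Must appear on every Payment element -->
      <xsd:attribute name="Amount" type="xsd:decimal" use="required"/>
      <!-- Optional; treated as "USD" when absent -->
      <xsd:attribute name="Currency" type="xsd:string" default="USD"/>
      <!-- Optional; if present, its value must be exactly "1.0" -->
      <xsd:attribute name="Version" type="xsd:string" fixed="1.0"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Here <Payment Amount="12.50"/> is valid, <Payment Amount="12.50" Version="2.0"/> is not (it violates the fixed value), and <Payment Currency="EUR"/> is not (the required Amount is missing).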

The ref attribute for the <attribute> element indicates that the attribute declaration exists elsewhere in the schema, as a global declaration (a direct child of the <schema> element). This allows an attribute to be declared once and referenced wherever it is needed. For instance, let's say you've "inherited" elements and attributes from another schema and would simply like to reuse one of its attribute declarations within the current schema; this is the perfect opportunity to take advantage of the ref attribute.

Just as attributes can be defined based on the simple data types included in the XML Schema Definition Language, they can also be defined based on <simpleType> elements. This can easily be accomplished by declaring an attribute that contains a <simpleType> element, as the following example demonstrates:

<xsd:attribute name="exampleattribute">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:length value="2"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:attribute>

<xsd:complexType name="exampleelement">
  <xsd:attribute ref="exampleattribute"/>
</xsd:complexType>

From this example you can see that the XML Schema Definition Language gives the schema author a great deal of control over how attributes are validated. One welcome side effect of the XML Schema Definition Language is its similarity to object-oriented programming. Think of each attribute and element declaration as a class definition. Just as a class, whether simple or complex, encapsulates everything it needs to do its job, each attribute and element declaration within an XML schema completely describes itself.

Declaring elements
Elements within an XML schema can be declared using the <element> element from the XML Schema Definition Language. The example in Listing 3 shows a simple element declaration using the XML Schema Definition Language.

From the example you can see that an element's type may be defined elsewhere within the schema. Where a declaration or definition appears determines where it can be used. For instance, an element or type declared as a direct child of the <schema> element (a global declaration) can be referenced anywhere within the schema document, whereas a type defined inline, inside the element declaration that uses it, applies only to that one element.
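A minimal sketch of global versus local definitions; ContactType, ShipTo, BillTo, and GiftMessage are hypothetical names used only for illustration:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Global (named) type: a direct child of <schema>, reusable by name -->
  <xsd:complexType name="ContactType">
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element name="Phone" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Two global elements share the same named type -->
  <xsd:element name="ShipTo" type="ContactType"/>
  <xsd:element name="BillTo" type="ContactType"/>

  <!-- Local (anonymous) type: defined inline, usable only by this element -->
  <xsd:element name="GiftMessage">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:maxLength value="200"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

ContactType is written once and used twice; the anonymous type inside GiftMessage has no name, so nothing else in the schema can refer to it.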

An element's type is defined with a <complexType> element or a <simpleType> element; within a <complexType>, the <complexContent> and <simpleContent> elements let you derive a new type from an existing one. The validation requirements for the document will influence the choice of an element's type. For instance, going back to our object-oriented analogy, let's say you define a high-level abstract class and then need to refine its definition for certain situations. In that case you would create a new class based on the existing one and change its definition as needed. The <complexContent> and <simpleContent> elements work much the same way: they provide a way to extend or restrict an existing type definition as a particular element declaration requires.
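To make the class analogy concrete, here is a sketch of derivation by extension with <complexContent>. AddressType, ShippingAddressType, and ShipToAddress are hypothetical names, not taken from the article's listings:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Base "class" -->
  <xsd:complexType name="AddressType">
    <xsd:sequence>
      <xsd:element name="Street" type="xsd:string"/>
      <xsd:element name="City" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Derived "class": inherits Street and City, then appends Phone -->
  <xsd:complexType name="ShippingAddressType">
    <xsd:complexContent>
      <xsd:extension base="AddressType">
        <xsd:sequence>
          <xsd:element name="Phone" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ShipToAddress" type="ShippingAddressType"/>
</xsd:schema>
```

A ShipToAddress element must contain Street, City, and Phone, in that order: extension appends the new content after the inherited content.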

The basic construction of an element declaration using the <element> element within the XML Schema Definition Language is as follows:

<element name="" [type=""] [abstract=""] [block=""]
[default=""] [final=""] [fixed=""] [minOccurs=""]
[maxOccurs=""] [nillable=""] [ref=""] [substitutionGroup=""]/>
From this you can see that element declarations offer the author a wide range of options. For instance, the abstract attribute indicates whether the declared element may appear directly within an XML document. If this attribute is true, the element may not appear directly; instead, other elements name it as the head of a substitution group through their substitutionGroup attribute, and those elements appear in its place. This substitution works only if the element using the substitutionGroup attribute is declared directly beneath the <schema> element.

In other words, for one element declaration to be substituted for another, the element using the substitutionGroup attribute must be a top-level element. Why would anyone in his right mind declare an element as abstract? The answer is really quite simple. Let's say you need to have multiple elements that have the same basic values specified for the attributes on the <element> element. A <complexType> element definition does not allow for those attributes. So, rather than define and set those attribute values for each element, you could make an "abstract" element declaration, set the values once, and substitute the abstract element definition as needed.
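A minimal sketch of an abstract head element and two substitutes. Payment, CashPayment, CheckPayment, and Order are hypothetical names used only for illustration:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Abstract head: never appears in an instance document itself -->
  <xsd:element name="Payment" type="xsd:decimal" abstract="true"/>

  <!-- Global members of the substitution group; they take the head's type -->
  <xsd:element name="CashPayment" substitutionGroup="Payment"/>
  <xsd:element name="CheckPayment" substitutionGroup="Payment"/>

  <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Any member of the Payment substitution group may appear here -->
        <xsd:element ref="Payment"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

<Order><CashPayment>25.00</CashPayment></Order> is valid, whereas <Order><Payment>25.00</Payment></Order> is rejected because Payment is abstract.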

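You may omit the type attribute from the <element> element if the declaration uses the ref attribute, defines its type inline with a child <complexType> or <simpleType> element, or uses the substitutionGroup attribute (in which case the element takes the type of the element it substitutes for). A declaration with none of these has the type anyType, which accepts any content.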

The type attribute names the simple or complex type definition, declared with a <simpleType> or <complexType> element, on which the element is based. By defining an element's structure through one of these type definitions, the author gains a great deal of control over the element's content. We will cover these definitions in the "Declaring complex elements" and "Declaring simple types" sections later in this article.

The block attribute prevents elements derived in the specified way from being used in place of the element. The block attribute may contain #all or a space-separated combination of the following values:

  • extension
  • restriction
  • substitution

If the value #all is specified within the block attribute, no elements derived from this element declaration may appear in place of this element. A value of extension prevents any element whose type has been derived by extension from appearing in place of this element. If a value of restriction is assigned, an element whose type has been derived by restriction is prevented from appearing in place of this element. Finally, a value of substitution indicates that members of this element's substitution group cannot be used in place of this element.
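As a sketch of block="substitution", assuming hypothetical Amount, AdjustedAmount, and Invoice elements:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- block="substitution": members of this element's substitution group
       may not appear where Amount is expected -->
  <xsd:element name="Amount" type="xsd:decimal" block="substitution"/>

  <!-- Declaring the member is still legal; block only affects instance
       documents. AdjustedAmount takes the head's type, xsd:decimal -->
  <xsd:element name="AdjustedAmount" substitutionGroup="Amount"/>

  <xsd:element name="Invoice">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Amount"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

<Invoice><Amount>42.10</Amount></Invoice> is valid, but <Invoice><AdjustedAmount>42.10</AdjustedAmount></Invoice> is rejected. Without the block attribute, both would be accepted.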

The default attribute may be specified only for an element based on a simpleType or whose content is text only. This attribute assigns a default value to an element.

You cannot specify both a default attribute and a fixed attribute; they are mutually exclusive. Also, if the element is based on a simple type, the value must be valid for that type.
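The sketch below contrasts default and fixed on hypothetical Priority and Currency elements. Note one difference from attributes: an element default is applied when the element appears but is empty, not when the element is missing altogether.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <!-- An empty <Priority/> is treated as having the value "Normal" -->
        <xsd:element name="Priority" type="xsd:string" default="Normal"/>
        <!-- Optional; if it has content, that content must be exactly "USD" -->
        <xsd:element name="Currency" type="xsd:string" fixed="USD" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

<Order><Priority/><Currency>USD</Currency></Order> is valid, with Priority taking the value Normal; <Order><Priority>Rush</Priority><Currency>EUR</Currency></Order> is rejected because Currency violates its fixed value.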

The minOccurs and maxOccurs attributes specify the minimum and maximum number of times the element may appear within its parent element. Both attributes are optional and default to 1. To indicate that an element's appearance within the parent element is optional, set the minOccurs attribute to 0. To indicate that the element may occur an unlimited number of times within the parent element, set the maxOccurs attribute to the string "unbounded". However, you may not specify minOccurs or maxOccurs on an element whose parent element is the <schema> element.
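A minimal sketch of occurrence constraints, using hypothetical Order, Comments, and Product elements:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Global declaration: minOccurs/maxOccurs are not allowed here -->
  <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Optional: may appear zero times or once -->
        <xsd:element name="Comments" type="xsd:string" minOccurs="0"/>
        <!-- At least one, with no upper limit -->
        <xsd:element name="Product" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

An Order with no Comments and three Product elements is valid; an Order with no Product element is not.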

The nillable attribute indicates whether an element may be explicitly marked as having a null value. If this attribute is omitted, it is assumed to be false. If it is true, an instance of the element may carry the attribute xsi:nil="true", where the xsi prefix is bound to the namespace http://www.w3.org/2001/XMLSchema-instance; an element marked this way must have no content. So what exactly does the nillable attribute do for you? Let's say you are writing an application that uses a database whose fields may hold NULL values, and you are representing your data as XML. When you request data from the database and convert it into some XML grammar, how do you tell the difference between elements that are empty and elements that are NULL? That's where the nillable attribute comes into play: by adding xsi:nil="true" to an element, you mark it as NULL rather than merely empty. Remember, nil applies only to an element's contents, not to the attributes of the element.
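A minimal sketch, assuming a hypothetical ShipDate element whose value may be unknown:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- nillable="true" lets an instance mark ShipDate as explicitly null -->
  <xsd:element name="ShipDate" type="xsd:date" nillable="true"/>
</xsd:schema>
```

With this schema, <ShipDate xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/> is valid and means "no value", while a plain <ShipDate/> is invalid, because an empty string is not a valid date. That is exactly the empty-versus-NULL distinction described above.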

The fixed attribute specifies that the element has a constant, predetermined value. This attribute applies only to those elements whose type definitions are based on simpleType or whose content is text only.

Declaring complex elements
Many times within an XML document an element may contain child elements and/or attributes. To indicate this within the XML Schema Definition Language, you'll use the <complexType> element. If you examine the sample section from Listing 4, you'll see the basics used to define a complex element within an XML schema.

The sample section specifies the definition of PurchaseOrderType. This type contains three child elements (ShippingInformation, BillingInformation, and Order) as well as two attributes (Tax and Total). Notice also the minOccurs and maxOccurs attributes on the element declarations: with a value of 1 for both, each of these elements must occur exactly once within an element of type PurchaseOrderType.

The basic syntax for the <complexType> element is as follows:

<xsd:complexType name='' [abstract=''] [block='']
[final=''] [mixed='']/>

The abstract attribute indicates whether an element may define its content directly from this type definition or from a type derived from this type definition. If this attribute is true, an element must define its content from a derived type definition. If this attribute is omitted or its value is false, an element may define its content directly based on this type definition.

In the W3C Recommendation, <complexType> has no base attribute. Instead, a complex type derived from another type names its base type through the base attribute of an <extension> or <restriction> element placed inside <simpleContent> or <complexContent>. With <simpleContent>, that base may be one of the built-in simple data types.

The block attribute indicates which kinds of derivation are blocked for this type definition. This attribute can contain #all or a space-separated combination of the following values:

  • extension
  • restriction

A value of #all prevents all complex types derived from this type definition from being used in place of this type definition. A value of extension prevents complex type definitions derived through extension from being used in place of this type definition. Assigning a value of restriction prevents a complex type definition derived through restriction from being used in place of this type definition. If this attribute is omitted, any type definition derived from this type definition may be used in place of this type definition.

The mixed attribute indicates whether character data is permitted to appear between the child elements of this type definition. If this attribute is false or is omitted, no character data may appear. If the type definition contains a <simpleContent> element, this value must be false. If the <complexContent> element appears as a child element, the mixed attribute on the <complexContent> element can override the value specified in the current type definition.
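A minimal sketch of mixed content, assuming a hypothetical Note element that interleaves free text with ProductName elements:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Note">
    <!-- mixed="true": text may appear between the child elements -->
    <xsd:complexType mixed="true">
      <xsd:sequence>
        <xsd:element name="ProductName" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

<Note>Please substitute <ProductName>skim milk</ProductName> if unavailable.</Note> is valid; without mixed="true", the surrounding text would be rejected.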

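Apart from an optional <annotation>, a <complexType> element in the XML Schema Definition Language may contain at most one of the following:

  • <simpleContent>
  • <complexContent>
  • a model group: <group>, <all>, <choice>, or <sequence>

When neither <simpleContent> nor <complexContent> is used, attribute declarations (<attribute>, <attributeGroup>, or <anyAttribute>) may follow the model group or appear on their own.
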
Declaring simple types
Sometimes it's not necessary to declare a complex type within an XML schema. In these cases you can use the <simpleType> element of the XML Schema Definition Language. A simple type definition is based on one of the built-in simple data types or on another simpleType declared within the current schema. Consider the following example:

<xsd:simpleType name="PaymentMethodType">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="Check"/>
    <xsd:enumeration value="Cash"/>
    <xsd:enumeration value="Credit Card"/>
    <xsd:enumeration value="Debit Card"/>
    <xsd:enumeration value="Other"/>
  </xsd:restriction>
</xsd:simpleType>

This definition declares the PaymentMethodType simple type, which is based on the string data type built into the XML Schema Definition Language. Notice the use of the <enumeration> element. This element is referred to as a facet: it narrows the base type, here limiting the acceptable values to the five strings listed.

The basic syntax for defining a simple type is as follows:

<xsd:simpleType name=''>
  <xsd:restriction base=''/>
</xsd:simpleType>

The base attribute of the <restriction> element may name any built-in simple data type or any simpleType declared within the schema; this choice determines the kind of data the new type may hold. A simple type holds only a value: an element or attribute of a simple type cannot contain child elements or attributes.

You may also notice the <restriction> element. Restriction is probably the most common way to derive a simple type, and it sets tighter boundaries on the values that an element or attribute based on the type definition may hold. So, to indicate that a type definition's value may hold only string values, you would declare a type definition as follows:

<xsd:simpleType name='mySimpleType'>
  <xsd:restriction base='xsd:string'/>
</xsd:simpleType>

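Restriction becomes more useful once you add facets other than <enumeration>. A minimal sketch, assuming hypothetical PriceType and ProductCodeType types:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- A price: a decimal of at least 0, with at most two digits after the point -->
  <xsd:simpleType name="PriceType">
    <xsd:restriction base="xsd:decimal">
      <xsd:minInclusive value="0"/>
      <xsd:fractionDigits value="2"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- A product code: three capital letters, a hyphen, four digits.
       Schema patterns always match the whole value. -->
  <xsd:simpleType name="ProductCodeType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Z]{3}-[0-9]{4}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="Price" type="PriceType"/>
  <xsd:element name="ProductCode" type="ProductCodeType"/>
</xsd:schema>
```

Under this sketch, 2.49 and 0 are valid prices while -1 and 2.499 are not; ABC-1234 is a valid product code while abc-1234 is not.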
Two other methods are available to an XML schema author to "refine" a simple type definition: <list> and <union>. The <list> element allows an element or attribute based on the type definition to contain a whitespace-separated list of values of a specified simple data type. The <union> element combines two or more simple type definitions into a single type that accepts a value valid for any one of them.
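A minimal sketch of both, with hypothetical type and element names:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- list: a whitespace-separated sequence of positive integers -->
  <xsd:simpleType name="QuantityListType">
    <xsd:list itemType="xsd:positiveInteger"/>
  </xsd:simpleType>

  <!-- A small keyword type used as one member of the union below -->
  <xsd:simpleType name="DeliveryKeywordType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="ASAP"/>
      <xsd:enumeration value="Unscheduled"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- union: accepts either a date or one of the keywords -->
  <xsd:simpleType name="DeliveryWhenType">
    <xsd:union memberTypes="xsd:date DeliveryKeywordType"/>
  </xsd:simpleType>

  <xsd:element name="Quantities" type="QuantityListType"/>
  <xsd:element name="DeliveryWhen" type="DeliveryWhenType"/>
</xsd:schema>
```

<Quantities>3 5 8</Quantities>, <DeliveryWhen>2002-06-14</DeliveryWhen>, and <DeliveryWhen>ASAP</DeliveryWhen> are all valid against this sketch.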

Putting It All Together
Now let's look at Listing 5, a complete schema for the document shown in Listing 1. Notice the use of the <xsd:choice> element, which indicates that only one element from a group may appear, as is the case with DeliveryDate and BillingDate. Also notice the xsd namespace prefix. The prefix itself can be anything, as long as it is bound to the XML Schema namespace, http://www.w3.org/2001/XMLSchema; we use xsd to mark elements of the XML Schema Definition Language.
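Listing 5 itself is not reproduced here, so the following sketch of <xsd:choice> uses hypothetical names (Fulfillment, PickupTime, DeliveryAddress):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Fulfillment">
    <xsd:complexType>
      <!-- Exactly one of the two children must appear -->
      <xsd:choice>
        <xsd:element name="PickupTime" type="xsd:dateTime"/>
        <xsd:element name="DeliveryAddress" type="xsd:string"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A Fulfillment element containing both children, or neither, is invalid.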

As we indicated earlier, the listing is substantially more complex than a DTD would be, but it provides much better control over your XML document. There are many additional facets to an XML schema, but the information and examples here should be enough to get your feet wet.

The XML Schema Definition Language provides a very powerful and flexible way in which to validate XML documents. It includes everything from declaring elements and attributes to "inheriting" elements from other schemas, from defining complex element definitions to defining restrictions for even the simplest of data types. This gives the XML schema author such control over specifying a valid construction for an XML document that there is almost nothing that cannot be defined with an XML schema.

Ron Schmelzer is founder and senior analyst of ZapThink. A well-known expert in the field of XML and XML-based standards and initiatives, Ron has been featured in and written for periodicals and has spoken on the subject of XML at numerous industry conferences.

Travis Vandersypen, a programmer with EPS Software Corporation, has five years' development experience in XML, UML, XSLT, FoxPro, HTML, and other tools. He has authored a number of articles and is a frequent speaker at conferences.
