Managing Your XML Documents with Schemas

The XML Schema Definition Language solves a number of problems posed by Document Type Definitions. Because DTDs prompted much confusion and complaining among XML developers, the W3C set about creating a new standard for defining a document's structure.

What the W3C created is something even more complex and flexible than DTDs: the XML Schema Definition Language. In this article we'll look at many aspects of schemas and how you can build and use them.

A Little Background
Schemas, while more complex than DTDs, give an individual much more power and control over how XML documents are validated. For instance, with the new W3C standard a document definition can specify the data type of an element's contents, the range of values for elements, the minimum as well as maximum number of times an element may occur, annotations to schemas, and much more.

In May of 2001 the W3C finalized its recommendation for the XML Schema Definition Language. This standard allows an author to define simple and complex elements as well as the rules governing how those elements and their attributes may show up within an instance document. The author can also apply various restrictions to the elements and attributes within the document, from specifying the length of a value to specifying an enumerated set of acceptable values. The result is a great deal of control over how a conforming XML document must be structured.

Sample XML Document
The remainder of this article is devoted to creating and understanding the XML schema for the XML document shown in Listing 1, which details a purchase order for various items that can commonly be found in a grocery store. This document allows one individual to receive the shipment of the goods and an entirely different individual to pay for the purchase. This document also contains specific information about the products ordered, such as how much each product cost, how many were ordered, and so on.

As you can see, the listing represents a fairly small and simple order that could be placed online. It contains the necessary information regarding how payment is to be made, how the order is to be shipped, and what day delivery should be. The listing should by no means be construed as an all-inclusive document for an online grocery store order; it has been constructed only for use as an example.

For the listing, an author might construct a DTD to describe the XML document. While such a DTD might require only 30 lines or so, it would provide a relatively inflexible definition of the XML document.

A Sample Schema
Creating an XML schema to describe this document is somewhat more complex than building a DTD. However, in exchange for the extra complexity, the schema gives the author virtually limitless control over how an XML document can be validated against it.

Authoring an XML schema consists of declaring elements and attributes as well as the "properties" of those elements and attributes. We will begin our look at authoring XML schemas by working our way from the least complex to the most complex example. Because attributes may not contain other attributes or elements, we will start there.

Declaring attributes
Attributes in an XML document are contained by elements. To indicate that a complex element has an attribute, use the <attribute> element of the XML Schema Definition Language. For instance, Listing 2 is from a hypothetical PurchaseOrder schema based on the XML document shown in Listing 1. You can see the basics for declaring an attribute.

From this you can see that, when declaring an attribute, you must specify a type. This type must be one of the simple types: anyURI, base64Binary, boolean, byte, date, dateTime, decimal, double, duration, ENTITIES, ENTITY, float, gDay, gMonth, gMonthDay, gYear, gYearMonth, hexBinary, ID, IDREF, IDREFS, int, integer, language, long, Name, NCName, negativeInteger, NMTOKEN, NMTOKENS, nonNegativeInteger, nonPositiveInteger, normalizedString, NOTATION, positiveInteger, QName, short, string, time, token, unsignedByte, unsignedInt, unsignedLong, unsignedShort. Each type can be further categorized as a "primitive" data type or a "derived" data type. The derived data types are "primitive" or other "derived" data types with restrictions placed on them, such as integer, positiveInteger, and byte.

From the simple types you may notice what appears to be a group of duplicate or unnecessary types, such as nonNegativeInteger and positiveInteger. If you look closely, you'll see that nonNegativeInteger is an integer whose value is greater than or equal to zero, whereas the positiveInteger type is an integer whose value is greater than zero, which means a positiveInteger type cannot be zero. Keep this in mind when deciding on the base data type for your elements and attributes - these small details can greatly influence their acceptable value ranges.
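To make the difference concrete, here is a minimal sketch. The type name ProductLineType and the attribute names Quantity and BackOrdered are hypothetical and do not come from the listings in this article.

```xml
<xsd:complexType name="ProductLineType">
  <!-- Must be 1 or greater: ordering zero items makes no sense -->
  <xsd:attribute name="Quantity" type="xsd:positiveInteger"/>
  <!-- May be 0 or greater: zero back-ordered items is a legitimate value -->
  <xsd:attribute name="BackOrdered" type="xsd:nonNegativeInteger"/>
</xsd:complexType>
```

With these declarations, Quantity="0" would be rejected by a validating parser, while BackOrdered="0" would be accepted.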

Aside from defining the type of an attribute, the <attribute> element within the XML Schema Definition Language contains attributes to assist in defining when an attribute is optional, whether its value is fixed, what its default value is, and so on. Here's the basic syntax for the <attribute> element:

<attribute name="" type="" [use=""] [fixed=""] [default=""] [ref=""]/>

The use attribute can contain one of the following possible values:

  • optional
  • prohibited
  • required

If the use attribute is set to required, the parent element must have the attribute; otherwise the document will be considered invalid. A value of optional (the default) indicates the attribute may or may not occur in the document; when it does occur, it may contain any value that is valid for its type. By assigning a value of prohibited to the use attribute, you can indicate that the attribute may not appear at all within the parent element.
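As an illustration, the three values might be used as follows. This is only a sketch: ProductType and its attribute names are hypothetical. Note that use is written on attribute declarations inside a type definition, not on top-level attribute declarations.

```xml
<xsd:complexType name="ProductType">
  <!-- Every <Product> must carry a Name attribute -->
  <xsd:attribute name="Name" type="xsd:string" use="required"/>
  <!-- Description may be present or absent -->
  <xsd:attribute name="Description" type="xsd:string" use="optional"/>
  <!-- InternalCode may not appear on elements of this type -->
  <xsd:attribute name="InternalCode" type="xsd:string" use="prohibited"/>
</xsd:complexType>
```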

Specifying a value for the default attribute indicates that if the attribute does not appear within the specified element of the XML document, it is assumed to have that value. A value within the fixed attribute indicates the attribute has a constant value.

It's important to remember that if you specify a value for the fixed attribute of the <attribute> element, the resulting attribute must have the value specified for the attribute to be valid. If you mean to indicate that the attribute should have a default value of some sort, use the default attribute instead. It should be noted that the default and fixed attributes are mutually exclusive.
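The difference between the two can be sketched as follows. The type name TotalsType and the attribute names Currency and SchemaVersion are hypothetical.

```xml
<xsd:complexType name="TotalsType">
  <!-- If Currency is omitted, it is treated as "USD";
       any other string may be supplied explicitly -->
  <xsd:attribute name="Currency" type="xsd:string" default="USD"/>
  <!-- If SchemaVersion is present, it must be exactly "1.0" -->
  <xsd:attribute name="SchemaVersion" type="xsd:string" fixed="1.0"/>
</xsd:complexType>
```

Here Currency="EUR" would be valid, whereas SchemaVersion="2.0" would not.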

The ref attribute for the <attribute> element indicates that the attribute declaration exists somewhere else within the schema. This allows complex attribute declarations to be defined once and referenced when necessary. For instance, let's say you've "inherited" elements and attributes from another schema and would simply like to reuse one of the attribute declarations within the current schema; this would provide the perfect opportunity to take advantage of the ref attribute.

Just as attributes can be defined based on the simple data types included in the XML Schema Definition Language, they can also be defined based on <simpleType> elements. This can easily be accomplished by declaring an attribute that contains a <simpleType> element, as the following example demonstrates:

<xsd:attribute name="exampleattribute">
  <xsd:simpleType>
    <!-- Restrict the built-in string type to exactly two characters -->
    <xsd:restriction base="xsd:string">
      <xsd:length value="2"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:attribute>

<xsd:complexType name="exampleelement">
  <xsd:attribute ref="exampleattribute"/>
</xsd:complexType>

From this example you can see that the XML Schema Definition Language gives the schema author a great deal of control over how attributes are validated. One of the wonderful side effects of the XML Schema Definition Language is its similarity to object-oriented programming. Consider each attribute definition and element definition to be a class definition. These class definitions describe complex structures and behaviors among different classes, so each individual class definition, whether it's a simple class or complex class, encapsulates everything necessary to perform its job. The same holds true for the declaration of attributes and elements within an XML document. Each item completely describes itself.

Declaring elements
Elements within an XML schema can be declared using the <element> element from the XML Schema Definition Language. The example in Listing 3 shows a simple element declaration using the XML Schema Definition Language.

From the example you can see that an element's type may be defined elsewhere within the schema. The location at which an element is defined determines certain characteristics about its availability within the schema. For instance, an element defined as a child of the <schema> element can be referenced anywhere within the schema document, whereas an element that is defined when it's declared can have that definition used only once.
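The distinction between a global and a local declaration can be sketched like this. The element names Comment, Order, and Quantity are illustrative assumptions rather than excerpts from the listings.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Global declaration: a child of <schema>, so it can be
       referenced from anywhere in the schema -->
  <xsd:element name="Comment" type="xsd:string"/>

  <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Reuse of the global declaration -->
        <xsd:element ref="Comment" minOccurs="0"/>
        <!-- Local declaration whose type is defined inline;
             this definition cannot be reused elsewhere -->
        <xsd:element name="Quantity">
          <xsd:simpleType>
            <xsd:restriction base="xsd:positiveInteger">
              <xsd:maxInclusive value="100"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```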

An element's type can be defined with a <complexType> element or a <simpleType> element; within a <complexType>, a <complexContent> or <simpleContent> element can be used to derive the new type from an existing one. The validation requirements for the document will influence the choice of an element's type. For instance, going back to our object-oriented analogy, let's say you define a high-level abstract class and then need to refine its definition for certain situations. In that case you would create a new class based on the existing one and change its definition as needed. The <complexContent> and <simpleContent> elements work much the same way: they provide a way to extend or restrict the existing simple or complex type definition as needed by the specific instance of the element declaration.
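In schema terms, the "subclassing" described above might look like the following sketch. AddressType and USAddressType are hypothetical names, and the schema is assumed to have no target namespace so that the base type can be referenced without a prefix.

```xml
<!-- The "base class": a general address -->
<xsd:complexType name="AddressType">
  <xsd:sequence>
    <xsd:element name="Street" type="xsd:string"/>
    <xsd:element name="City" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

<!-- The "derived class": everything in AddressType, plus a Zip element -->
<xsd:complexType name="USAddressType">
  <xsd:complexContent>
    <xsd:extension base="AddressType">
      <xsd:sequence>
        <xsd:element name="Zip" type="xsd:string"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
```

An element of type USAddressType would contain Street, City, and Zip, in that order.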

The basic construction of an element declaration using the <element> element within the XML Schema Definition Language is as follows:

<element name="" [type=""] [abstract=""] [block=""]
    [default=""] [final=""] [fixed=""] [minOccurs=""]
    [maxOccurs=""] [nillable=""] [ref=""] [substitutionGroup=""]/>

From this you can see that element declarations offer a myriad of possibilities to the author. For instance, the abstract attribute indicates whether the element being declared may show up directly within the XML document. If this attribute is true, the declared element may not show up directly. Instead, this element must be referenced by another element using the substitutionGroup attribute. This substitution works only if the element utilizing the substitutionGroup attribute occurs directly beneath the <schema> element.

In other words, for one element declaration to be substituted for another, the element using the substitutionGroup attribute must be a top-level element. Why would anyone in his right mind declare an element as abstract? The answer is really quite simple. Let's say you need to have multiple elements that have the same basic values specified for the attributes on the <element> element. A <complexType> element definition does not allow for those attributes. So, rather than define and set those attribute values for each element, you could make an "abstract" element declaration, set the values once, and substitute the abstract element definition as needed.
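A minimal sketch of an abstract element and its substitutes follows. The element names Payment, CheckPayment, and CashPayment are hypothetical, and all three are assumed to be top-level declarations in a schema without a target namespace.

```xml
<!-- Abstract head element: <Payment> itself may never appear in a document -->
<xsd:element name="Payment" type="xsd:string" abstract="true"/>

<!-- Top-level elements that may appear wherever <Payment> is expected -->
<xsd:element name="CheckPayment" type="xsd:string" substitutionGroup="Payment"/>
<xsd:element name="CashPayment" type="xsd:string" substitutionGroup="Payment"/>

<xsd:element name="BillingInformation">
  <xsd:complexType>
    <xsd:sequence>
      <!-- In an instance document this slot is filled by
           <CheckPayment> or <CashPayment> -->
      <xsd:element ref="Payment"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```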

You may omit the type attribute from the <element> element, but you should then specify either the ref attribute or the substitutionGroup attribute, or define the element's type inline within the declaration.

The type attribute indicates that the element should be based on a complexType, simpleType, complexContent, or simpleContent element definition. By defining an element's structure using one of these other elements, the author can gain an incredible amount of control over the element's definition. We will cover these various element definitions in the "Declaring Complex Elements" and "Declaring Simple Types" sections later in this article.

The block attribute prevents any element with the specified derivation type from being used in place of the element. The block attribute may contain any of the following values:

#all
extension
restriction
substitution

If the value #all is specified within the block attribute, no elements derived from this element declaration may appear in place of this element. A value of extension prevents any element whose definition has been derived by extension from appearing in place of this element. If a value of restriction is assigned, an element derived by restriction from this element declaration is prevented from appearing in place of this element. Finally, a value of substitution indicates that an element derived through substitution cannot be used in place of this element.
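As a small illustration of the substitution value, consider this sketch. The element names Payment and CardPayment are hypothetical.

```xml
<!-- block="substitution": members of this element's substitution group
     may not stand in for <Payment> in an instance document -->
<xsd:element name="Payment" type="xsd:string" block="substitution"/>

<!-- The declaration itself is still legal, but <CardPayment> will be
     rejected wherever the schema calls for <Payment> -->
<xsd:element name="CardPayment" type="xsd:string" substitutionGroup="Payment"/>
```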

The default attribute may be specified only for an element based on a simpleType or whose content is text only. This attribute assigns a default value to an element.

You cannot specify a value for both a default attribute and a fixed attribute; they are mutually exclusive. Also, if the element definition is based on a simpleType, the value must be a valid value of that data type.

The minOccurs and maxOccurs attributes specify the minimum and maximum number of times this element may appear within a valid XML document. Although you may explicitly set these attributes, they are not required; when omitted, each defaults to 1. To indicate that an element's appearance within the parent element is optional, set the minOccurs attribute to 0. To indicate that the element may occur an unlimited number of times within the parent element, set the maxOccurs attribute to the string "unbounded". However, you may not specify the minOccurs or maxOccurs attribute for an element whose parent element is the <schema> element.
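The occurrence rules above could be combined as in the following sketch. OrderType and its child element names are hypothetical.

```xml
<xsd:complexType name="OrderType">
  <xsd:sequence>
    <!-- At least one Product, with no upper limit -->
    <xsd:element name="Product" type="xsd:string"
                 minOccurs="1" maxOccurs="unbounded"/>
    <!-- Optional: may be absent, but may appear at most once -->
    <xsd:element name="GiftMessage" type="xsd:string" minOccurs="0"/>
    <!-- Optional, and limited to three occurrences -->
    <xsd:element name="Coupon" type="xsd:string"
                 minOccurs="0" maxOccurs="3"/>
  </xsd:sequence>
</xsd:complexType>
```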

The nillable attribute indicates whether an explicit null value can be assigned to the element. If this particular attribute is omitted, it is assumed to be false. If it has a value of true, the element in the instance document may carry a nil attribute (from the XML Schema instance namespace, conventionally written xsi:nil) set to true. So what exactly does the nillable attribute do for you? Let's say you are writing an application that uses a database that supports NULL values for fields and you are representing your data as XML. When you request the data from your database and convert it into some XML grammar, how do you tell the difference between elements that are empty and elements that are NULL? That's where the nillable attribute comes into play. By adding xsi:nil="true" to an element, you mark it as NULL rather than merely empty. Remember, the nillable attribute applies only to an element's contents and not to the attributes of the element.
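The distinction between an empty value and a NULL value can be sketched as follows. DeliveryNotes is a hypothetical element name; the first fragment belongs in the schema and the other two in an instance document.

```xml
<!-- Schema: DeliveryNotes is allowed to be explicitly null -->
<xsd:element name="DeliveryNotes" type="xsd:string" nillable="true"/>

<!-- Instance: an empty string, comparable to '' in a database column -->
<DeliveryNotes></DeliveryNotes>

<!-- Instance: an explicit NULL; the element must have no content -->
<DeliveryNotes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:nil="true"/>
```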

The fixed attribute specifies that the element has a constant, predetermined value. This attribute applies only to those elements whose type definitions are based on simpleType or whose content is text only.
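For comparison, here is a sketch of both attributes on element declarations. The names Country and Currency are hypothetical. One detail worth keeping in mind is that an element default is applied when the element is present but empty, not when the element is missing altogether.

```xml
<!-- An empty <Country/> in the document is treated as <Country>US</Country> -->
<xsd:element name="Country" type="xsd:string" default="US"/>

<!-- <Currency> must either be empty or contain exactly "USD" -->
<xsd:element name="Currency" type="xsd:string" fixed="USD"/>
```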

Declaring complex elements
Many times within an XML document an element may contain child elements and/or attributes. To indicate this within the XML Schema Definition Language, you'll use the <complexType> element. If you examine the sample section from Listing 4, you'll see the basics used to define a complex element within an XML schema.

The sample section specifies the definition of PurchaseOrderType. This particular type contains three child elements - ShippingInformation, BillingInformation, and Order - as well as two attributes - Tax and Total. You should also notice the use of the maxOccurs and minOccurs attributes on the element declarations. With a value of 1 indicated for both attributes, the element declarations specify that each element must occur exactly one time within the PurchaseOrderType element.
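Based on that description, a definition along these lines would fit. This is only a sketch, not a reproduction of Listing 4: the use of <xsd:sequence>, the type names InfoType and OrderType, and the decimal type for the two attributes are assumptions.

```xml
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <!-- Each child must occur exactly once -->
    <xsd:element name="ShippingInformation" type="InfoType"
                 minOccurs="1" maxOccurs="1"/>
    <xsd:element name="BillingInformation" type="InfoType"
                 minOccurs="1" maxOccurs="1"/>
    <xsd:element name="Order" type="OrderType"
                 minOccurs="1" maxOccurs="1"/>
  </xsd:sequence>
  <!-- Attribute declarations follow the content model -->
  <xsd:attribute name="Tax" type="xsd:decimal"/>
  <xsd:attribute name="Total" type="xsd:decimal"/>
</xsd:complexType>
```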

The basic syntax for the <complexType> element is as follows:

<xsd:complexType name='' [abstract='']
    [block=''] [final=''] [mixed='']/>

The abstract attribute indicates whether an element may define its content directly from this type definition or from a type derived from this type definition. If this attribute is true, an element must define its content from a derived type definition. If this attribute is omitted or its value is false, an element may define its content directly based on this type definition.

In the final W3C recommendation, the base type of a derived complex type is not given on the <complexType> element itself. Instead, it is specified with the base attribute of an <extension> or <restriction> element placed inside a <complexContent> or <simpleContent> child. That attribute names the simple or complex type from which the new type is derived.

The block attribute indicates what types of derivation are prevented for this element definition. This attribute can contain any of the following values:

#all
extension
restriction

A value of #all prevents all complex types derived from this type definition from being used in place of this type definition. A value of extension prevents complex type definitions derived through extension from being used in place of this type definition. Assigning a value of restriction prevents a complex type definition derived through restriction from being used in place of this type definition. If this attribute is omitted, any type definition derived from this type definition may be used in place of this type definition.

The mixed attribute indicates whether character data is permitted to appear between the child elements of this type definition. If this attribute is false or is omitted, no character data may appear. If the type definition contains a simpleContent type element, this value must be false. If the complexContent element appears as a child element, the mixed attribute on the complexContent element can override the value specified in the current type definition.
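A mixed content model might be sketched as follows. NoteType, Note, and Emphasis are hypothetical names; the first fragment is the type definition and the second is a conforming instance.

```xml
<!-- Schema: text may be interleaved with <Emphasis> child elements -->
<xsd:complexType name="NoteType" mixed="true">
  <xsd:sequence>
    <xsd:element name="Emphasis" type="xsd:string"
                 minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>

<!-- Instance, assuming an element Note declared with type NoteType -->
<Note>Leave the order at the <Emphasis>back</Emphasis> door.</Note>
```

Without mixed="true", the text surrounding the Emphasis element would make the Note element invalid.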

A <complexType> element in the XML Schema Definition Language may contain only one of the following elements:

all
choice
complexContent
group
sequence
simpleContent

Declaring simple types
Sometimes it's not necessary to declare a complex element type within an XML schema. In these cases you can use the <simpleType> element of the XML Schema Definition Language. These type definitions support an element or attribute based on the simple XML data types or any simpleType declaration within the current schema. Consider the following example:

<xsd:simpleType name="PaymentMethodType">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="Check"/>
    <xsd:enumeration value="Cash"/>
    <xsd:enumeration value="Credit Card"/>
    <xsd:enumeration value="Debit Card"/>
    <xsd:enumeration value="Other"/>
  </xsd:restriction>
</xsd:simpleType>

This type definition defines the PaymentMethodType simple type, which is based on the string data type included in the XML Schema Definition Language. You may notice the use of the <enumeration> element. This particular element is referred to as a facet: a constraint that narrows the set of values the base type allows.
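Enumeration is only one kind of facet. The following sketch shows a few others; the type names StateCodeType and UnitPriceType are hypothetical.

```xml
<!-- A string that must be exactly two characters long -->
<xsd:simpleType name="StateCodeType">
  <xsd:restriction base="xsd:string">
    <xsd:length value="2"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- A non-negative decimal with at most two digits after the point -->
<xsd:simpleType name="UnitPriceType">
  <xsd:restriction base="xsd:decimal">
    <xsd:minInclusive value="0"/>
    <xsd:fractionDigits value="2"/>
  </xsd:restriction>
</xsd:simpleType>
```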

The basic syntax for defining a simpleType element definition is as follows:

<xsd:simpleType name=''>
  <xsd:restriction base=''/>
</xsd:simpleType>

The base attribute may contain any simple XML data type or any simpleType declared within the schema. Specifying the value of this attribute determines the type of data it may contain. A simpleType may contain only a value, not other elements or attributes.

You may also notice the inclusion of the <restriction> element. This is probably the most common method in which to declare types, and it helps to set more stringent boundaries on the values an element or attribute based on this type definition may hold. So, to indicate that a type definition's value may hold only string values, you would declare a type definition as follows:

<xsd:simpleType name='mySimpleType'>
  <xsd:restriction base='xsd:string'/>
</xsd:simpleType>

Two other methods are available to an XML schema author to "refine" a simple type definition: <list> and <union>. The <list> element allows an element or attribute based on the type definition to contain a whitespace-separated list of values of a specified simple data type. The <union> element allows you to combine two or more simple type definitions so that a value is acceptable if it is valid for any one of them.
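Both refinements can be sketched briefly. QuantityListType and PaymentOrCodeType are hypothetical names; PaymentMethodType refers to the enumeration defined earlier and is assumed to live in a schema without a target namespace.

```xml
<!-- A whitespace-separated list, for example "2 12 1" -->
<xsd:simpleType name="QuantityListType">
  <xsd:list itemType="xsd:positiveInteger"/>
</xsd:simpleType>

<!-- Either one of the PaymentMethodType strings or a positive integer code -->
<xsd:simpleType name="PaymentOrCodeType">
  <xsd:union memberTypes="PaymentMethodType xsd:positiveInteger"/>
</xsd:simpleType>
```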

Putting It All Together
Now let's look at Listing 5, a complete schema for the document shown in Listing 1. You may notice the use of the <xsd:choice> element. This element can be used to indicate that only one of a group of elements may show up, but not all, as is the case with DeliveryDate and BillingDate. Also, notice the use of the xsd namespace prefix. This prefix can be anything, as long as it is bound to the XML Schema namespace, but we'll use xsd to indicate an XML Schema Definition Language element.
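The two points above can be sketched in a few lines. This is not an excerpt from Listing 5: the type name DateChoiceType is hypothetical, and DeliveryDate and BillingDate are assumed to be declared as elements of type date, since <xsd:choice> groups element declarations.

```xml
<!-- The xsd prefix is bound to the XML Schema namespace here;
     any other prefix bound to the same URI would work equally well -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:complexType name="DateChoiceType">
    <!-- Exactly one of the two elements may appear, never both -->
    <xsd:choice>
      <xsd:element name="DeliveryDate" type="xsd:date"/>
      <xsd:element name="BillingDate" type="xsd:date"/>
    </xsd:choice>
  </xsd:complexType>

</xsd:schema>
```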

As we indicated earlier, the listing is substantially more complex than a DTD would be, but it provides much better control over your XML document. There are many additional facets to an XML schema, but the information and examples here should be enough to get your feet wet.

Summary
The XML Schema Definition Language provides a very powerful and flexible way in which to validate XML documents. It includes everything from declaring elements and attributes to "inheriting" elements from other schemas, from defining complex element definitions to defining restrictions for even the simplest of data types. This gives the XML schema author such control over specifying a valid construction for an XML document that there is almost nothing that cannot be defined with an XML schema.

Further Reading

  • Schmelzer, R., Vandersypen, T., et al. (2002). XML and Web Services Unleashed. Sams Publishing.
  • Savourel, Y. (2001). XML Internationalization and Localization. Sams Publishing.
  • Rambhia, A.M. (2002). XML Distributed Systems Design. Sams Publishing.

Ron Schmelzer is founder and senior analyst of ZapThink. A well-known expert in the field of XML and XML-based standards and initiatives, Ron has been featured in and written for periodicals and has spoken on the subject of XML at numerous industry conferences.

Travis Vandersypen, a programmer with EPS Software Corporation, has five years' development experience in XML, UML, XSLT, FoxPro, HTML, and other tools. He has authored a number of articles and is a frequent speaker at conferences.
