Document XSLT Automatically

Business users spend a great deal of money on new software systems. In return, they demand faithful implementation of their project objectives, and they expect enough visibility into an application to verify that those objectives have been met. This visibility also makes it possible to identify the changes needed to satisfy new business goals.

One approach to meeting these objectives is to use a formal specification language, on the premise that a more formal specification will lead to an implementation closer to the goals of the business. Formal specification approaches include mathematical notations such as Z and diagrammatic languages such as UML. While many such languages have been developed, few can serve as a bridge between business users and software developers, primarily because of the large gap between domain concepts and software design tools. In addition, unless implementations can be generated automatically from the specification, the implementation tends to diverge quickly from its initial design as it is maintained.

Consequently, most business users rely on documentation to explain the inner workings of a system. This addresses the critical need to gain visibility into the system to ensure that their goals have been met and the system can be changed easily to adjust to new business concerns. The value of documentation depends on:

  • How well it conveys an understanding of the system
  • How easily modifications can be performed
  • How well the documentation can be kept up to date with new changes
Literate programming addresses this concern to some degree, although the most widely used documentation tools, such as Javadoc, are used by programmers for the benefit of other programmers.

This article discusses a method of automatically documenting, in domain-specific terms, the behavior of conditional text processing applications. Because the documentation uses these terms, together with actual text in its domain-specific format, the remaining gap is small enough for business users to bridge readily. Business users can mark up the documentation with minimal ambiguity, and because it is generated automatically, the documentation remains faithful to each build of the application.

Conditional text processing is a very large horizontal application area with potential impact on much of literate society. It affects areas as diverse as traditional print document production, Web page generation, document personalization, targeted advertising, and access-controlled documents. Customized text processing is bound to increase rapidly with the trend toward information delivery that is increasingly personalized, access controlled, and market-segment specific.

Representing Text
There are many ways to describe textual content that can be manipulated programmatically. Historically, this subject has been addressed with a myriad of ad hoc and proprietary formats. However, the SGML community has long recognized the benefits of formats that are open, standard, and domain defined. The phenomenal success of HTML (whose format followed that of SGML) and the need for the flexibility of domain-defined markup combined to motivate the W3C to recommend the similar but streamlined XML format as a basis for all Web-based content.

In the domain of technical documentation, DocBook is a well-accepted XML application that can be used as an intermediate form for generating text in a variety of formats including plaintext, XHTML, RTF, TeX, PDF, and PostScript. DocBook is directed at the production of articles or books. Common tags include <book>, <chapter>, and <para>. Listing 1 presents the skeleton of a document in DocBook format. The document is intended to represent a fragment from a financial planning document.

XSLT for Conditional Text
XSL is the W3C stylesheet standard for XML documents. It includes XSLT, a language for transforming XML documents. XSLT supports several programming styles, but we focus on the "fill-in-the-blanks" style identified by Michael Kay in his book XSLT Programmer's Reference. This style is useful when a target XML document is produced by filling in missing items with data taken from an XML data document.

Listing 2 presents an XML data document that includes data about an individual customer. While we won't present the details of the XML DTD (or XML Schema) defining the document structure, it should be evident that it represents properties of a single individual such as age and estimated financial net worth. It's assumed that some other calculation process has determined this information based on data about the individual's financial status.

Listing 3 presents XSLT markup that's been added to Listing 1. The markup includes both a simple data substitution and a simple conditional statement. It uses XPath, another W3C standard and part of XSL, to refer to the data in the XML data document. The XPath reference "customer/name" refers to the customer's name in the XML document given in Listing 2.
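Because Listings 2 and 3 do not appear in this text, the following sketch substitutes a hypothetical customer record (`DATA`) and a hypothetical one-paragraph template (`TEMPLATE`). It imitates in Python, using only the standard library, what an XSLT processor such as Saxon does with a fill-in-the-blanks template: literal DocBook markup is copied, `xsl:value-of` is replaced by the text its path selects, and the content of `xsl:if` is kept only when its test holds. It is a toy rather than XSLT: it supports only simple child paths such as `customer/name` and tests of the form `path OP number`, and all helper names are our own.

```python
import operator
import re
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"

# Hypothetical stand-in for the customer data of Listing 2.
DATA = "<customer><name>Pat Jones</name><age>58</age><netWorth>1250000</netWorth></customer>"

# Hypothetical stand-in for the XSLT-annotated DocBook of Listing 3.
TEMPLATE = (
    '<para xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    'Prepared for <xsl:value-of select="customer/name"/>.'
    '<xsl:if test="customer/age &gt; 55"> Consider income-producing investments.</xsl:if>'
    "</para>"
)

OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
       "=": operator.eq, ">": operator.gt, "<": operator.lt}


def eval_test(expr, doc):
    """Evaluate a restricted test of the form 'path OP number'."""
    m = re.fullmatch(r"\s*([\w/]+)\s*(>=|<=|!=|=|>|<)\s*(-?[\d.]+)\s*", expr)
    if m is None:
        raise ValueError(f"unsupported test: {expr!r}")
    path, op, number = m.groups()
    value = doc.findtext(path)
    return value is not None and OPS[op](float(value), float(number))


def append_items(parent, items):
    """Attach strings and elements to parent, following ElementTree's text/tail model."""
    for item in items:
        if isinstance(item, str):
            if len(parent):
                parent[-1].tail = (parent[-1].tail or "") + item
            else:
                parent.text = (parent.text or "") + item
        else:
            parent.append(item)


def expand_children(node, doc):
    """Instantiate the content (text, child nodes, tails) of a template node."""
    items = [node.text] if node.text else []
    for child in node:
        items.extend(instantiate(child, doc))
        if child.tail:
            items.append(child.tail)
    return items


def instantiate(node, doc):
    """Instantiate one template node into a list of output strings and elements."""
    if node.tag == XSL + "value-of":
        return [doc.findtext(node.get("select"), default="")]
    if node.tag == XSL + "if":
        return expand_children(node, doc) if eval_test(node.get("test"), doc) else []
    if node.tag.startswith(XSL):
        raise ValueError(f"unsupported instruction: {node.tag}")
    # Literal result element: copy its name and attributes, then fill its content.
    out = ET.Element(node.tag, node.attrib)
    append_items(out, expand_children(node, doc))
    return [out]


def fill_in(template_xml, data_xml):
    """Apply a fill-in-the-blanks template to a data document."""
    # Wrap the data root so 'customer/name' resolves as it would from the XPath root node.
    doc = ET.Element("document")
    doc.append(ET.fromstring(data_xml))
    [result] = instantiate(ET.fromstring(template_xml), doc)
    return result


if __name__ == "__main__":
    print(ET.tostring(fill_in(TEMPLATE, DATA), encoding="unicode"))
```

Run as a script, it prints `<para>Prepared for Pat Jones. Consider income-producing investments.</para>`; changing the age in `DATA` to 40 drops the conditional sentence.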

Listing 4 shows the DocBook document produced by applying the XSLT transformation of Listing 3 to the XML data in Listing 2. We used Michael Kay's Saxon processor to generate this example and rendered the DocBook document into XHTML (see Figure 1) to show how it can be presented to the business user.

Documenting XSLT Processing
Because an XSLT program is itself an XML application, other XSLT programs can be applied to it. Our method maps every XSLT element used in a transformation into textual elements of the target XML document. As a result, the documentation can be rendered in the same fashion as the documents of the domain, so it looks familiar to business users: it appears as a normal document with the addition of pseudocode annotations.

Listing 5 presents the XSLT program that maps the XSLT program of Listing 3 into a target DocBook document. Because this is the most critical step in the process, we examine it in detail; line numbers have been added to the left-hand side of the listing for reference. Line 1 identifies the file as an XML document in the Latin-1 character encoding. Line 2 declares an XSL transformation using the standard XSL namespace. The next two lines specify the public and system identifiers that the output file should include to identify it as a DocBook document. Line 6 specifies that whitespace-only text is to be stripped from all elements of the input.
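For readers who have not used `xsl:output` or `xsl:strip-space`, their effect can be imitated in Python with the standard library. In the sketch below, `strip_space` drops whitespace-only text nodes, as `<xsl:strip-space elements="*"/>` does for the source tree (here, the XSLT program being documented), and `serialize_docbook` prepends the DOCTYPE declaration that the `doctype-public` and `doctype-system` attributes request. The DocBook XML V4.1.2 identifiers are an assumption, since the article does not say which DocBook version Listing 5 names; the sketch also writes UTF-8, because the Latin-1 declaration on Line 1 describes the stylesheet file itself rather than its output.

```python
import xml.etree.ElementTree as ET

# Assumed identifiers for DocBook XML V4.1.2 (the version used by Listing 5 is not stated).
DOCBOOK_PUBLIC = "-//OASIS//DTD DocBook XML V4.1.2//EN"
DOCBOOK_SYSTEM = "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"


def strip_space(elem):
    """Remove whitespace-only text nodes throughout elem, like <xsl:strip-space elements="*"/>."""
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_space(child)
        if child.tail is not None and not child.tail.strip():
            child.tail = None
    return elem


def serialize_docbook(root):
    """Serialize root as UTF-8 bytes with a DocBook DOCTYPE, like xsl:output's doctype attributes."""
    header = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE {root.tag} PUBLIC "{DOCBOOK_PUBLIC}" "{DOCBOOK_SYSTEM}">\n'
    )
    return (header + ET.tostring(root, encoding="unicode")).encode("utf-8")


if __name__ == "__main__":
    article = ET.fromstring("<article>\n  <para>Net worth summary</para>\n</article>")
    print(serialize_docbook(strip_space(article)).decode("utf-8"))
```

The DOCTYPE name is taken from the root element, which is also how an XSLT processor chooses it when `doctype-system` is present.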

The remainder of Listing 5 consists of five templates (i.e., specific transformations). The template at Line 7 matches those XSL elements whose content should be processed further without any special treatment of the containing XSL element. The template at Line 10 ignores all markup inside xsl:output elements. The template at Line 11 performs the first actual metamarkup of the output document, outputting (source-data-name) for each fragment of text to be pulled from the <customer> input data. Similarly, the template at Line 18 generates metamarkup of the form (IF condition text) to express the condition under which text should be included in the output document. The final template, at Line 26, is a common XSLT default rule that simply copies unmatched markup to the output file.
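The five templates can be approximated in a few lines of Python, which may make the mapping easier to follow for readers who do not read XSLT. The sketch below is not the authors' Listing 5; it is a hypothetical equivalent written against Python's `xml.etree.ElementTree` and applied to a made-up program (`PROGRAM`) in the spirit of Listing 3. `xsl:stylesheet` and `xsl:template` pass their content through, `xsl:output` is ignored, `xsl:value-of` becomes `(path)`, `xsl:if` becomes `(IF test text)`, and any other element is copied. Whitespace-only text is skipped to mirror Line 6.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"

# Hypothetical XSLT program in the spirit of Listing 3.
PROGRAM = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>
  <xsl:template match="/">
    <para>Prepared for <xsl:value-of select="customer/name"/>.<xsl:if test="customer/age &gt; 55"> Consider income-producing investments.</xsl:if></para>
  </xsl:template>
</xsl:stylesheet>"""

# XSL elements whose content is processed without leaving a trace of the element itself.
PASS_THROUGH = {XSL + "stylesheet", XSL + "transform", XSL + "template"}


def append_items(parent, items):
    """Attach strings and elements to parent, following ElementTree's text/tail model."""
    for item in items:
        if isinstance(item, str):
            if len(parent):
                parent[-1].tail = (parent[-1].tail or "") + item
            else:
                parent.text = (parent.text or "") + item
        else:
            parent.append(item)


def expand(node):
    """Document a node's content, skipping whitespace-only text (cf. xsl:strip-space)."""
    items = []
    if node.text and node.text.strip():
        items.append(node.text)
    for child in node:
        items.extend(document(child))
        if child.tail and child.tail.strip():
            items.append(child.tail)
    return items


def document(node):
    """Map one XSLT-program node to DocBook items carrying pseudocode metamarkup."""
    if node.tag in PASS_THROUGH:
        return expand(node)
    if node.tag == XSL + "output":
        return []  # output settings are not part of the documented content
    if node.tag == XSL + "value-of":
        return [f"({node.get('select')})"]
    if node.tag == XSL + "if":
        return [f"(IF {node.get('test')}"] + expand(node) + [")"]
    # Default rule: copy unmatched markup and document its content.
    out = ET.Element(node.tag, node.attrib)
    append_items(out, expand(node))
    return [out]


if __name__ == "__main__":
    for item in document(ET.fromstring(PROGRAM)):
        print(item if isinstance(item, str) else ET.tostring(item, encoding="unicode"))
```

The resulting `para` element has the text `Prepared for (customer/name).(IF customer/age > 55 Consider income-producing investments.)`, the kind of annotated paragraph a business user can read and mark up.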

Listing 6 presents a mapping from XSLT variable names to domain-specific names used by business users. This step is syntactic sugar for increasing readability beyond that provided by clearly named XML elements in the input data file. Listing 5 Line 32 includes routine XSLT code that performs this mapping. In production, this mapping was performed in a postprocessing phase via a Perl script.
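In production this mapping was a Perl postprocessing pass; the following is a hypothetical Python equivalent of such a pass, with an invented `NAMES` table standing in for Listing 6. A single regular expression replaces whole XPath references only, so a path is never rewritten inside a longer path, and each match is replaced once, so already-substituted text is not scanned again.

```python
import re

# Hypothetical stand-in for Listing 6: XPath references -> business terms.
NAMES = {
    "customer/name": "customer's name",
    "customer/age": "customer's age",
    "customer/netWorth": "customer's net worth",
}


def make_mapper(names):
    """Build a function that swaps whole XPath references for business terms."""
    # Longest paths first; the lookarounds stop a match inside a longer path.
    alternatives = "|".join(re.escape(p) for p in sorted(names, key=len, reverse=True))
    pattern = re.compile(rf"(?<![\w/])(?:{alternatives})(?![\w/])")
    return lambda text: pattern.sub(lambda m: names[m.group(0)], text)


map_names = make_mapper(NAMES)

if __name__ == "__main__":
    print(map_names("(IF customer/age > 55 Consider income-producing investments.)"))
```

Applied to the metamarkup above, it yields `(IF customer's age > 55 Consider income-producing investments.)`.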

Listing 7 presents the DocBook markup produced by applying Listing 5 to Listings 3 and 6. In practice, we've termed this a specification because it precisely specifies the operation of the XSLT program in producing the target documents.

Figure 2 shows the XHTML presentation of the DocBook document in Listing 7. Two extra files that support the build process (autodocxslt.xsl and mapnames.xsl) are too small to warrant inclusion in the listings, but they can be downloaded from www.sys-con.com/xml/sourcec.cfm so that interested readers can build the examples. (A README file that explains all the files is included.)

Discussion
The method outlined above has been used to map a number of other XSLT constructs into text including variable-length lists and loops. These items may appear in a variety of textual contexts including page headers and footers, section headers, tables, bulleted lists, multicolumn layouts, and glossaries.

However, a more general treatment of the subject is difficult because the method would need to account for the appearance of any XSLT element in any textual context. In practice, the XSLT documentation program has been developed by accounting for every XSLT construct used, along with every DocBook context in which it appears. To confirm that each of these combinations is rendered correctly, each specification is checked with James Clark's nsgmls validator for conformance to the DocBook DTD. Because of this constraint, a properly working XSLT program is not sufficient; it must also be translatable into valid DocBook. If DocBook directly supported metamarkup constructs, or if we targeted a different output format that provided such support, we could have avoided the challenge of choosing output representations for metalevel markup while maintaining validity.
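Loops such as `xsl:for-each` sharpen the problem: inside an `itemizedlist`, metamarkup cannot simply be emitted as plain text, because that element does not allow character data, so the loop has to be documented in terms of list items. The sketch below illustrates only the bookkeeping side of accounting for every construct in every context: it lists the (instruction, enclosing DocBook element) pairs in a program that a hypothetical `SUPPORTED` table does not cover. The table, the sample program, and the function name are our own assumptions, and the sketch does not validate anything against the DTD; that remained the job of nsgmls.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"

# Hypothetical table of (XSLT instruction, enclosing DocBook element) pairs
# for which the documenting stylesheet has a rendering known to stay valid.
SUPPORTED = {
    ("value-of", "para"),
    ("value-of", "title"),
    ("if", "para"),
    ("if", "section"),
    ("for-each", "itemizedlist"),
}


def unhandled_pairs(node, context=None, found=None):
    """Collect instruction/context pairs that the documenter cannot yet render."""
    if found is None:
        found = []
    for child in node:
        if child.tag.startswith(XSL):
            name = child.tag[len(XSL):]
            if context is not None and (name, context) not in SUPPORTED:
                found.append((name, context))
            unhandled_pairs(child, context, found)  # XSL elements keep the DocBook context
        else:
            unhandled_pairs(child, child.tag, found)  # literal elements set a new context
    return found


# Hypothetical program: a title, a loop inside a list, and a value inside a table cell.
PROGRAM = """<xsl:template match="/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <section>
    <title>Plan for <xsl:value-of select="customer/name"/></title>
    <itemizedlist>
      <xsl:for-each select="customer/accounts/account">
        <listitem><para><xsl:value-of select="type"/></para></listitem>
      </xsl:for-each>
    </itemizedlist>
    <informaltable><tgroup cols="1"><tbody><row>
      <entry><xsl:value-of select="customer/netWorth"/></entry>
    </row></tbody></tgroup></informaltable>
  </section>
</xsl:template>"""

if __name__ == "__main__":
    print(unhandled_pairs(ET.fromstring(PROGRAM)))
```

Here the only gap reported is `('value-of', 'entry')`: a table cell is a context for which no rendering has been defined yet, so it would need attention before the specification could be regenerated and validated.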

Another aspect of this method that requires attention is the translation of expressions used in conditionals, loops, and other XSLT instructions. The method assumes that every expression can be transformed by simply replacing XPath references with short, descriptive English names. More complex expressions may need additional processing to render them into readable form. For example, if an expression calls the XSLT format-number function to depict a number as a dollar amount, a U.S. business user would prefer to see it rendered with a leading dollar sign:

format-number(customer/netWorth, '#,###,###.00')
=> $customer's net worth
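A minimal sketch of such a rewrite follows, assuming that expressions are available as plain strings and that a format pattern ending in `.00` signals a dollar amount. That heuristic, the `NAMES` table, and the `describe` function are our own assumptions, not the authors' implementation.

```python
import re

# Hypothetical business names, as in the Listing 6 mapping.
NAMES = {"customer/netWorth": "customer's net worth"}

# Matches format-number(path, 'pattern') with a simple path and a quoted pattern.
FORMAT_NUMBER = re.compile(r"format-number\(\s*([\w/]+)\s*,\s*'([^']*)'\s*\)")


def describe(expr, names=NAMES):
    """Render format-number calls in an XPath expression as business-friendly text."""
    def render(match):
        path, pattern = match.groups()
        name = names.get(path, path)
        # Assumption: a pattern ending in two fixed decimals denotes a U.S. dollar amount.
        return "$" + name if pattern.endswith(".00") else name
    return FORMAT_NUMBER.sub(render, expr)


if __name__ == "__main__":
    print(describe("format-number(customer/netWorth, '#,###,###.00')"))
```

Paths without a business name fall back to the raw XPath, so an unmapped reference stays visible rather than silently disappearing.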

Conclusions
This method works well for several reasons.

  1. The XSLT transformation is itself represented as markup, so it can be easily manipulated by other XSLT transformations. If the transformation were written in a conventional 3GL, a programming-language parser would be needed, along with custom code for handling references to text elements.
  2. The programming style used in the XSLT transformation was limited to the "fill-in-the-blanks" style that did not require a general solution to mapping programming constructs into text markup such as DocBook. A rule-based programming style would have been very difficult to map into the proper text structures.
  3. There is a small distance between an instance of the target document and its automatically documented specification, making it easier for business users to comprehend.
  4. The use of domain-specific business terms instead of program variables helped to make the XSLT control flow statements easier to understand.
This method is very useful for communicating conditional text processing to business users because it expresses processing in terms and contexts that are familiar to them. Because each XSLT element is translated precisely into readable text, and the documentation is regenerated with every build, the documentation always matches the operating program, which makes it an ideal tool for maintaining the XSLT program.

For More Information

  • Clark, J. (1999). XSL Transformations (XSLT) Version 1.0. W3C Recommendation, 16 November 1999. www.w3.org/TR/xslt
  • Walsh, N., and Muellner, L. (1999). DocBook: The Definitive Guide. O'Reilly. www.docbook.org/
  • Kay, M. (2000). XSLT Programmer's Reference. Wrox Press.
  • SAXON XSLT Processor. http://saxon.sourceforge.net/

    This work was supported by ExpLore Reasoning Systems, Inc., a firm specializing in intelligent systems for financial services applications.

    Karl Schwamb

    Karl B. Schwamb is President of Colonnade Software (http://www.colonnadesoftware.com/), a consulting firm specializing in distributed systems. He has led systems design and development efforts for several Fortune 500 companies, primarily in the area of financial services. Many of these systems employ cutting-edge technology such as XML, Java, middleware, and intelligent systems in environments that demand high availability, high throughput, and security.

    Kenneth Hughes

    Kenneth J. Hughes is President of Entelechy Corporation (http://www.entel.com), a consulting firm that specializes in XML. He received a BS in Electrical and Computer Engineering and Mathematics in 1985 and an MS in Electrical and Computer Engineering in 1988 from Carnegie Mellon. He has provided strategic guidance, architectural design, and hands-on development for organizations seeking to apply XML to both traditional publishing and Internet-based systems.
