Document XSLT Automatically

Business users spend a great deal of money on new software systems. In return, they demand faithful implementation of their project objectives, and they expect enough visibility into an application to verify that their goals have been implemented. This visibility also makes it possible to identify the changes needed to satisfy new business goals.

One approach to meeting these objectives is to use a formal specification language, in the expectation that a more formal specification will lead to an implementation closer to the goals of the business. Formal specification approaches include mathematical notations such as Z and diagrammatic languages such as UML. While many such languages have been developed, few can serve as a bridge between business users and software developers, primarily because of the large gap between domain concepts and software design tools. In addition, unless implementations can be generated automatically from the specification, the implementation, as it is maintained, often diverges quickly from its initial design.

Consequently, most business users rely on documentation to explain the inner workings of a system. Documentation addresses their critical need for visibility into the system: to confirm that their goals have been met and that the system can be changed easily to address new business concerns. The value of documentation depends on:

  • How well it conveys an understanding of the system
  • How easily modifications can be performed
  • How well the documentation can be kept up to date with new changes
Literate programming addresses this concern to some extent, although the most widely used tools, such as Javadoc, are used by programmers for the benefit of other programmers.

This article discusses a method for automatically documenting, in domain-specific terms, the behavior of conditional text processing applications. Using such terms, along with actual text in its domain-specific format, leaves only a small gap that business users can readily bridge. The documentation takes a form that business users can mark up with minimal ambiguity. Because it is generated automatically, the documentation remains faithful to each build of the application.

Conditional text processing is a very large horizontal application area with potential impact on much of literate society. It affects areas as diverse as traditional print document production, Web page generation, document personalization, targeted advertising, and access-controlled documents. Customized text processing is bound to increase rapidly with the trend toward information delivery that is increasingly personalized, access controlled, and market-segment specific.

Representing Text
There are many ways to describe textual content so that it can be manipulated programmatically. Historically, this need has been met with a myriad of ad hoc and proprietary formats. The SGML community, however, has long recognized the benefits of formats that are open, standard, and domain defined. The phenomenal success of HTML (itself an SGML application), together with the need for the flexibility of domain-defined markup, motivated the W3C to recommend XML, a streamlined subset of SGML, as a basis for Web-based content.

In the domain of technical documentation, DocBook is a well-accepted XML application that can be used as an intermediate form for generating text in a variety of formats including plaintext, XHTML, RTF, TeX, PDF, and PostScript. DocBook is directed at the production of articles or books. Common tags include <book>, <chapter>, and <para>. Listing 1 presents the skeleton of a document in DocBook format. The document is intended to represent a fragment from a financial planning document.

XSLT for Conditional Text
XSL is the W3C stylesheet standard for XML documents. It includes a language, XSLT, for transforming XML documents. XSLT supports several programming styles, but we focus on the "fill-in-the-blanks" style identified by Michael Kay in his book XSLT Programmer's Reference. This style is useful when a target XML document is produced by filling in missing items with data from an XML data document.

Listing 2 presents an XML data document that includes data about an individual customer. While we won't present the details of the XML DTD (or XML Schema) defining the document structure, it should be evident that it represents properties of a single individual such as age and estimated financial net worth. It's assumed that some other calculation process has determined this information based on data about the individual's financial status.
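Listing 2 is not reproduced here. As a rough stand-in, the following Python sketch (standard library only) assumes a hypothetical customer document. The paths customer/name and customer/netWorth appear elsewhere in this article; the <age> element and all values are invented for illustration. The sketch shows how a path such as customer/name selects a value from the data document, much as the XPath references in the stylesheet do, although ElementTree supports only a small subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the customer data document (Listing 2).
# customer/name and customer/netWorth appear in the article; <age> and
# all values are assumptions made for this sketch.
CUSTOMER_XML = """
<customer>
  <name>Jane Doe</name>
  <age>58</age>
  <netWorth>1250000</netWorth>
</customer>
"""


def lookup(doc_root, path):
    """Resolve a simple absolute path such as 'customer/name'.

    ElementTree evaluates paths relative to an element, so the first
    step is matched against the root element itself and the rest of
    the path is handed to Element.find().
    """
    first, _, rest = path.partition("/")
    if doc_root.tag != first:
        return None
    node = doc_root.find(rest) if rest else doc_root
    if node is None:
        return None
    return (node.text or "").strip()


root = ET.fromstring(CUSTOMER_XML.strip())
print(lookup(root, "customer/name"))      # Jane Doe
print(lookup(root, "customer/netWorth"))  # 1250000
```

A missing element, or a path whose first step does not name the root, yields None rather than raising.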

Listing 3 presents XSLT markup that's been added to Listing 1. The markup includes both a simple data substitution and a simple conditional statement. It uses XPath, another W3C standard and part of XSL, to refer to the data in the XML data document. The XPath reference "customer/name" refers to the customer's name in the XML document given in Listing 2.
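To make the fill-in-the-blanks idea concrete, here is a minimal Python sketch of what an XSLT processor does with a simple substitution and a simple conditional. The template is hypothetical (Listing 3 is not reproduced here); it uses XSLT's simplified "literal result element as stylesheet" form, and the toy interpreter handles only xsl:value-of and xsl:if with a numeric > or < comparison. It illustrates the semantics and is not a replacement for a real processor such as Saxon.

```python
import operator
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"

# Hypothetical fill-in-the-blanks template in the spirit of Listing 3.
TEMPLATE = """<article xsl:version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <para>Prepared for <xsl:value-of select="customer/name"/>.</para>
  <xsl:if test="customer/age &gt; 55">
    <para>Your plan now emphasizes retirement income.</para>
  </xsl:if>
</article>"""


def lookup(data, path):
    """Resolve 'customer/name'-style paths against the data root."""
    first, _, rest = path.partition("/")
    if data.tag != first:
        return None
    node = data.find(rest) if rest else data
    return None if node is None else (node.text or "").strip()


def condition_holds(data, test):
    """Evaluate 'path > number' or 'path < number' (a tiny XPath subset)."""
    for symbol, compare in ((">", operator.gt), ("<", operator.lt)):
        if symbol in test:
            path, number = (part.strip() for part in test.split(symbol, 1))
            value = lookup(data, path)
            return value is not None and compare(float(value), float(number))
    return lookup(data, test.strip()) is not None  # bare path: existence test


def append_text(parent, text):
    """Append text at the current end of parent's content."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def fill(node, data, out):
    """Copy node's content into out, filling in the XSLT blanks."""
    append_text(out, node.text)
    for child in node:
        if child.tag == XSL + "value-of":
            append_text(out, lookup(data, child.get("select")) or "")
        elif child.tag == XSL + "if":
            if condition_holds(data, child.get("test")):
                fill(child, data, out)
        else:  # literal result element: copy it, minus xsl:* attributes
            attrs = {k: v for k, v in child.attrib.items() if not k.startswith(XSL)}
            fill(child, data, ET.SubElement(out, child.tag, attrs))
        append_text(out, child.tail)


def transform(template, data):
    """Apply a simplified stylesheet to a data document."""
    attrs = {k: v for k, v in template.attrib.items() if not k.startswith(XSL)}
    out = ET.Element(template.tag, attrs)
    fill(template, data, out)
    return out


data = ET.fromstring("<customer><name>Jane Doe</name><age>58</age></customer>")
result = transform(ET.fromstring(TEMPLATE), data)
for para in result.iter("para"):
    print("".join(para.itertext()))
```

With an age of 58 both paragraphs are produced; with an age at or below 55 only the first one is.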

Listing 4 shows the DocBook document produced by applying the XSLT transformation of Listing 3 to the XML data in Listing 2. We used Michael Kay's Saxon processor to generate this example and rendered the DocBook document into XHTML (see Figure 1) to show how it can be presented to business users.

Documenting XSLT Processing
Since the XSLT program is itself an XML application, we can apply other XSLT programs to it. Our method maps every XSLT element used in a transformation to text elements of the target XML document type. Because every XSLT element is mapped this way, the resulting documentation can be rendered in the same fashion as the documents of the domain. The documentation is therefore familiar to business users: it appears as a normal document with added pseudocode annotations.
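Because a stylesheet is ordinary XML, any XML tool can inspect it. As a small illustration (Python standard library, not the authors' code), the sketch below parses a hypothetical stylesheet fragment as plain data and counts the XSLT instructions it contains. An inventory like this shows which constructs a documenting transformation has to map.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XSL_NS = "{http://www.w3.org/1999/XSL/Transform}"


def xslt_inventory(stylesheet_text):
    """Count each XSLT instruction (by local name) in a stylesheet."""
    root = ET.fromstring(stylesheet_text)
    counts = Counter()
    for elem in root.iter():  # iter() includes the root element itself
        if elem.tag.startswith(XSL_NS):
            counts[elem.tag[len(XSL_NS):]] += 1
    return counts


# Hypothetical fragment; the real Listing 3 is not reproduced here.
SAMPLE = """<article xsl:version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <para>Prepared for <xsl:value-of select="customer/name"/>.</para>
  <xsl:if test="customer/age &gt; 55"><para>...</para></xsl:if>
</article>"""

print(dict(xslt_inventory(SAMPLE)))  # {'value-of': 1, 'if': 1}
```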

Listing 5 presents the XSLT program that maps the XSLT program of Listing 3 into a target DocBook document. Because this is the most critical step in the process, we examine it in detail; line numbers have been added to the left-hand side of the listing for reference. Line 1 identifies the file as an XML document in the Latin-1 character encoding. Line 2 declares an XSL transformation using the standard XSL namespace. The next two lines specify the public and system identifiers that the output file should include to identify it as a DocBook document. Line 6 specifies that whitespace-only text nodes are to be stripped from all elements.
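For readers who do not use XSLT daily, the effect of those header declarations can be approximated in Python. This sketch assumes DocBook XML V4.1.2 identifiers (the article does not say which DocBook version was used) and assumes the declarations behave like XSLT's standard xsl:strip-space elements="*" and the doctype-public/doctype-system attributes of xsl:output. strip_space drops whitespace-only text nodes; serialize_docbook writes the corresponding DOCTYPE.

```python
import xml.etree.ElementTree as ET

# Assumed DocBook identifiers; the article does not name a DocBook version.
DOCTYPE_PUBLIC = "-//OASIS//DTD DocBook XML V4.1.2//EN"
DOCTYPE_SYSTEM = "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"


def strip_space(root):
    """Remove whitespace-only text nodes everywhere, like xsl:strip-space elements="*".

    In ElementTree a text node is either an element's .text or a child's
    .tail, so both are checked. This also removes single spaces between
    inline elements, exactly as strip-space would.
    """
    for node in root.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    return root


def serialize_docbook(root):
    """Serialize with a DOCTYPE naming the DocBook public/system identifiers."""
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE {root.tag} PUBLIC "{DOCTYPE_PUBLIC}" "{DOCTYPE_SYSTEM}">\n'
        + ET.tostring(root, encoding="unicode")
    )


doc = ET.fromstring("<article>\n  <para>Hello</para>\n</article>")
print(serialize_docbook(strip_space(doc)))
```

The DOCTYPE is what lets a validating parser locate the DocBook DTD when the specification is checked.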

The remainder of Listing 5 consists of five templates (that is, specific transformations). The template at Line 7 matches those XSL elements whose content should be processed further without any special treatment due to the containing XSL element. The template at Line 10 ignores all markup inside xsl:output elements. The template at Line 11 performs the first actual metamarkup of the output document, emitting (source-data-name) for each fragment of text to be pulled from the <customer> input data. Similarly, the template at Line 18 generates metamarkup of the form (IF condition text) to express the condition under which text should be included in the output document. The final template, at Line 26, is a common catch-all rule (the identity template) that simply copies unmatched markup to the output file.
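The five rules can be mirrored in a short Python sketch. This is not Listing 5 itself: it is a stand-in built on the standard library's ElementTree and applied to a hypothetical stylesheet, with the (source-data-name) and (IF condition text) forms taken from the description above. Each branch of document corresponds to one template.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"
PASS_THROUGH = {XSL + "stylesheet", XSL + "transform", XSL + "template"}


def append_text(parent, text):
    """Append text at the current end of parent's content."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def document(node, out):
    """Write a documented version of node's content into out."""
    append_text(out, node.text)
    for child in node:
        if child.tag in PASS_THROUGH:          # rule 1: just process content
            document(child, out)
        elif child.tag == XSL + "output":      # rule 2: ignore entirely
            pass
        elif child.tag == XSL + "value-of":    # rule 3: (source-data-name)
            append_text(out, "(" + child.get("select") + ")")
        elif child.tag == XSL + "if":          # rule 4: (IF condition text)
            append_text(out, "(IF " + child.get("test") + " ")
            document(child, out)
            append_text(out, ")")
        else:                                  # rule 5: copy other markup
            document(child, ET.SubElement(out, child.tag, dict(child.attrib)))
        append_text(out, child.tail)


def document_stylesheet(stylesheet):
    """Return the DocBook element generated from a whole stylesheet."""
    holder, wrapper = ET.Element("holder"), ET.Element("wrapper")
    wrapper.append(stylesheet)
    document(wrapper, holder)
    return holder[0]


# Hypothetical stylesheet standing in for Listing 3.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>
  <xsl:template match="/">
    <article>
      <para>Prepared for <xsl:value-of select="customer/name"/>. <xsl:if
        test="customer/age &gt; 55">Retirement income is emphasized.</xsl:if></para>
    </article>
  </xsl:template>
</xsl:stylesheet>"""

spec = document_stylesheet(ET.fromstring(STYLESHEET))
print("".join(spec.find("para").itertext()))
# Prepared for (customer/name). (IF customer/age > 55 Retirement income is emphasized.)
```

Here the xsl:if sits inside a paragraph, so its annotation is ordinary inline text; placing it between block elements raises the validity issues discussed later in the article.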

Listing 6 presents a mapping from XSLT variable names to the domain-specific names used by business users. This step is syntactic sugar that increases readability beyond what clearly named XML elements in the input data file already provide. Line 32 of Listing 5 contains routine XSLT code that performs this mapping; in production, the mapping was performed in a postprocessing phase by a Perl script.
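Listing 6 is not shown here, but the idea can be sketched with a hypothetical mapping table. The Python below stands in for both the XSLT mapping at Line 32 and the production Perl postprocessor; the path-to-term pairs are assumptions (only "customer's net worth" echoes wording used later in this article). It replaces whole paths only, so customer/name is not rewritten inside a longer path such as customer/nameSuffix.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping in the spirit of Listing 6: XPath -> business term.
BUSINESS_NAMES = {
    "customer/name": "customer's name",
    "customer/age": "customer's age",
    "customer/netWorth": "customer's net worth",
}


def make_renamer(mapping):
    """Build a function that swaps whole XPath references for business terms."""
    alternatives = "|".join(re.escape(p) for p in sorted(mapping, key=len, reverse=True))
    # The lookarounds stop a match from starting or ending inside a longer path.
    pattern = re.compile(r"(?<![\w/])(?:" + alternatives + r")(?![\w/])")
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)


def rename_tree(root, rename):
    """Apply rename to every text and tail string in a generated spec."""
    for node in root.iter():
        if node.text:
            node.text = rename(node.text)
        if node.tail:
            node.tail = rename(node.tail)
    return root


rename = make_renamer(BUSINESS_NAMES)
spec = ET.fromstring("<para>Prepared for (customer/name). "
                     "(IF customer/age &gt; 55 Retirement income is emphasized.)</para>")
print(rename_tree(spec, rename).text)
# Prepared for (customer's name). (IF customer's age > 55 Retirement income is emphasized.)
```

Doing the rename as a separate pass over the generated text, as the Perl postprocessor did, keeps the documenting transformation independent of any particular vocabulary.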

Listing 7 presents the DocBook markup produced by applying Listing 5 to Listings 3 and 6. We call this output a specification because it precisely specifies how the XSLT program produces the target documents.

Figure 2 shows the XHTML presentation of the DocBook document in Listing 7. Two additional files that support the build process (autodocxslt.xsl and mapnames.xsl) are too minor to include in the listings, but they can be downloaded from www.sys-con.com/xml/sourcec.cfm so that interested readers can build the examples. (A README file explaining all the files is included.)

Discussion
The method outlined above has been used to map a number of other XSLT constructs into text including variable-length lists and loops. These items may appear in a variety of textual contexts including page headers and footers, section headers, tables, bulleted lists, multicolumn layouts, and glossaries.
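The article does not show how loops were rendered, so the following Python sketch uses a hypothetical (FOR EACH path ...) wording, and the paths customer/accounts/account and type are invented. Its main point is that the same construct needs different output in different contexts. Inline, the annotation is plain text; directly inside a DocBook itemizedlist or orderedlist, where bare text is not allowed, the loop boundaries become annotation list items instead.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"
LIST_TAGS = {"itemizedlist", "orderedlist"}  # DocBook lists: no bare text allowed


def append_text(parent, text):
    """Append text at the current end of parent's content."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def note_item(parent, text):
    """Add a list item whose only content is an annotation paragraph."""
    ET.SubElement(ET.SubElement(parent, "listitem"), "para").text = text


def document(node, out):
    append_text(out, node.text)
    for child in node:
        if child.tag == XSL + "value-of":
            append_text(out, "(" + child.get("select") + ")")
        elif child.tag == XSL + "for-each":
            select = child.get("select")
            if out.tag in LIST_TAGS:       # block context: annotate with items
                note_item(out, "(FOR EACH " + select + ")")
                document(child, out)
                note_item(out, "(END FOR EACH)")
            else:                          # inline context: annotate with text
                append_text(out, "(FOR EACH " + select + " ")
                document(child, out)
                append_text(out, ")")
        else:
            document(child, ET.SubElement(out, child.tag, dict(child.attrib)))
        append_text(out, child.tail)


def document_root(template_text):
    template = ET.fromstring(template_text)
    out = ET.Element(template.tag)
    document(template, out)
    return out


# Hypothetical templates; customer/accounts/account and type are invented names.
NS = 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"'
LIST_TEMPLATE = f"""<itemizedlist {NS}>
  <xsl:for-each select="customer/accounts/account">
    <listitem><para><xsl:value-of select="type"/></para></listitem>
  </xsl:for-each>
</itemizedlist>"""
INLINE_TEMPLATE = (f'<para {NS}>Accounts: <xsl:for-each select="customer/accounts/account">'
                   '<xsl:value-of select="type"/></xsl:for-each></para>')

for item in document_root(LIST_TEMPLATE).findall("listitem"):
    print("".join(item.itertext()).strip())
print("".join(document_root(INLINE_TEMPLATE).itertext()))
```

Adding a new construct or a new DocBook context means adding another branch, which is why coverage was tracked construct by construct and context by context.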

However, a more general treatment is difficult because the method would need to account for any XSLT element appearing in any textual context. In practice, the XSLT documentation program was developed by accounting for every XSLT construct used, along with every DocBook context in which it appears. To confirm that all cases have been covered, each specification is validated against the DocBook DTD using James Clark's nsgmls parser. Because of this requirement, a properly working XSLT program is not sufficient; it must also be translatable into valid DocBook. If DocBook directly supported metamarkup constructs, or if we had targeted an output format that provided such support, we could have avoided the challenge of choosing output representations for metalevel markup while simultaneously maintaining validity.
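Python's standard library cannot validate a document against the DocBook DTD, so the following is not a substitute for nsgmls. It is a sketch of a cheap pre-check one might run first: it flags XSLT elements that leaked into a specification because no mapping rule handled them, elements outside a hypothetical allowed subset of DocBook, and bare text placed directly inside a list, which the DocBook content model does not permit.

```python
import xml.etree.ElementTree as ET

XSL_NS = "{http://www.w3.org/1999/XSL/Transform}"
# Hypothetical subset of DocBook elements a project's specifications use.
ALLOWED = {"article", "title", "section", "para", "emphasis",
           "itemizedlist", "orderedlist", "listitem"}
LIST_TAGS = {"itemizedlist", "orderedlist"}


def precheck(spec_root):
    """Return human-readable problems found in a generated specification."""
    problems = []
    for elem in spec_root.iter():
        if elem.tag.startswith(XSL_NS):
            problems.append("unmapped XSLT construct: xsl:" + elem.tag[len(XSL_NS):])
        elif elem.tag not in ALLOWED:
            problems.append("element outside allowed subset: " + elem.tag)
        if elem.tag in LIST_TAGS:
            # Text in a list lives in the list's .text or its children's .tail.
            strings = [elem.text] + [child.tail for child in elem]
            if any(s and s.strip() for s in strings):
                problems.append("text directly inside " + elem.tag)
    return problems


ok = ET.fromstring("<article><para>Prepared for (customer's name).</para></article>")
leaked = ET.fromstring('<article xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
                       '<para><xsl:choose/></para></article>')
print(precheck(ok))      # []
print(precheck(leaked))  # ['unmapped XSLT construct: xsl:choose']
```

A leaked xsl:choose is exactly what a catch-all copy rule produces for a construct nobody mapped, so such a check points directly at the missing rule.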

Another aspect of this method that requires attention is the translation of expressions used in conditionals, loops, and other XSLT instructions. The method assumes that every expression can be translated simply by replacing XPath references with short, descriptive English names. More complex expressions may need additional processing to render them in readable form. For example, if a call is made to the XSLT format-number() function to display a number as a dollar amount, a U.S. business user would prefer to see the expression rendered with a leading dollar sign:

format-number(customer/netWorth, '#,###,###.00')
=> $customer's net worth
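A rule of this kind can be sketched as a rewrite over the expression text. The Python below is illustrative only: the mapping table is hypothetical, and the rule that a pattern ending in exactly two decimal places means a dollar amount is an assumption chosen to reproduce the example above, not a general treatment of format-number patterns.

```python
import re

# Hypothetical business-name table.
BUSINESS_NAMES = {"customer/netWorth": "customer's net worth"}

# format-number(path, 'pattern') or format-number(path, "pattern")
FORMAT_NUMBER = re.compile(r"format-number\(\s*([\w/@.]+)\s*,\s*(['\"])(.*?)\2\s*\)")
CURRENCY_PATTERN = re.compile(r"[#,0]*\.00")  # assumed: two decimals means dollars


def render_expression(expr, names=BUSINESS_NAMES):
    """Rewrite format-number() calls in an XPath expression into readable text."""
    def replace(match):
        path, pattern = match.group(1), match.group(3)
        readable = names.get(path, path)
        if CURRENCY_PATTERN.fullmatch(pattern):
            return "$" + readable
        return readable
    return FORMAT_NUMBER.sub(replace, expr)


print(render_expression("format-number(customer/netWorth, '#,###,###.00')"))
# $customer's net worth
```

Expressions without a format-number() call pass through unchanged, so this rule can run alongside the plain name substitution.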

Conclusions
This method works well for several reasons.

  1. The XSLT transformation itself is represented as markup, so it can easily be manipulated by other XSLT transformations. If the transformation were written in a conventional third-generation language (3GL), a programming language parser would be needed, along with custom code to handle its references to text elements.
  2. The programming style used in the XSLT transformation was limited to the "fill-in-the-blanks" style that did not require a general solution to mapping programming constructs into text markup such as DocBook. A rule-based programming style would have been very difficult to map into the proper text structures.
  3. There is a small distance between an instance of the target document and its automatically documented specification, making it easier for business users to comprehend.
  4. The use of domain-specific business terms instead of program variables helped to make the XSLT control flow statements easier to understand.
This method is very useful for communicating conditional text processing to business users because it expresses processing in terms and contexts that are familiar to them. The precise translation of each XSLT element into readable text makes the generated documentation an ideal tool for maintaining the XSLT program, because the documentation always matches the program as built.

For More Information

  • Clark, J. (Ed.) (1999). XSL Transformations (XSLT) Version 1.0. W3C Recommendation, 16 November 1999. www.w3.org/TR/xslt
  • Walsh, N., and Muellner, L. (1999). DocBook: The Definitive Guide. O'Reilly. www.docbook.org/
  • Kay, M. (2000). XSLT Programmer's Reference. Wrox Press.
  • Saxon XSLT Processor. http://saxon.sourceforge.net/

    This work was supported by ExpLore Reasoning Systems, Inc., a firm specializing in intelligent systems for financial services applications.


    Karl B. Schwamb is President of Colonnade Software (http://www.colonnadesoftware.com/), a consulting firm specializing in distributed systems. He has led systems design and development efforts for several Fortune 500 companies, primarily in financial services. Many of these systems employ cutting-edge technologies such as XML, Java, middleware, and intelligent systems in environments that demand high availability, high throughput, and security.


    Kenneth J. Hughes is President of Entelechy Corporation (http://www.entel.com), a consulting firm that specializes in XML. He received a BS in Electrical and Computer Engineering and Mathematics in 1985 and an MS in Electrical and Computer Engineering in 1988 from Carnegie Mellon. He has provided strategic guidance, architectural design, and hands-on development for organizations seeking to apply XML to both traditional publishing and Internet-based systems.
