Managing Data Sources with XSLT
By: Craig King
May. 30, 2002 12:00 AM
A colleague and I were working on a research project when we saw an opportunity to approach our data management from a different angle. XML appeared on the scene, and when IBM alphaWorks released its first parser we were on our way, using XML to solve our data access problems. Over several Department of Defense research projects since the mid-1990s, we have explored various techniques for accessing different types of data sources using XML. These data sources include relational databases, CORBA objects, flat files, and, more recently, LDAP and the J2EE Connector architecture. Initially we used JDBC to access the relational data sources. The JDBC code was tedious to write, however, and difficult to update when new database schemas were released.
XML As a Data Access Language
The client part of our program was already completely dynamic. Everything about the client, including the menu bar and dialog boxes, was defined by property files, and the code needed to run the interface wasn't known until runtime. We took the lessons learned from the client side and applied them to the server side of the program. Our desire for a completely dynamic processing engine on the server side pushed us to look deeper for ways to use XML technologies to manipulate data sources without writing any Java code. Along the way we looked at using XSLT to provide a conditional execution and transformation environment. This turned out to be exactly what was needed. XSLT provided the technology necessary to make our static documents dynamic. The next step was to make our instruction set dynamic and build an extensible processing engine for the instructions.
Java As a Dynamic Plug-in Language
The processing framework requires two basic interfaces: one to define the instruction set and another to wrap the data source drivers. Using Java interfaces to define the idea of an "instruction" and a "driver" allows us to define the contract of the handler without requiring the presence of any code. This allows the loading and knowledge of the instruction handlers and data source handlers to be deferred until runtime. A small property file is used to define each instruction:
type=Instruction
The instruction set can be extended or replaced, and even the XML namespace used to define the instruction set can be changed. The drivers are also created in this way. The JDBC driver that wraps JDBC is defined in a property file as well:
type=Driver
Using a pluggable component-based engine is key to providing a dynamic, loosely coupled framework for handling any type of data source through a common interface.
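The deferred loading described above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the framework's actual code: the method signatures on Instruction and Driver, the "class" property key, and the EchoInstruction, NullDriver, and PluginLoader names are all hypothetical. Only the "type" key appears in the article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical contract for an instruction handler (signature assumed).
interface Instruction {
    String execute(String argument);
}

// Hypothetical contract for a data source wrapper (signature assumed).
interface Driver {
    String describe();
}

// Example handlers; the loader only ever sees them by class name.
class EchoInstruction implements Instruction {
    public String execute(String argument) {
        return "echo:" + argument;
    }
}

class NullDriver implements Driver {
    public String describe() {
        return "null driver";
    }
}

class PluginLoader {
    // Reads a property definition and instantiates the class it names.
    // The "class" key is an assumption made for this sketch.
    static <T> T load(String propertyText, String expectedType, Class<T> contract) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertyText));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable property text", e);
        }
        if (!expectedType.equals(props.getProperty("type"))) {
            throw new IllegalArgumentException("expected type=" + expectedType);
        }
        String className = props.getProperty("class");
        if (className == null) {
            throw new IllegalArgumentException("missing class property");
        }
        try {
            // The handler class is resolved only now, at runtime.
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return contract.cast(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }
}

class PluginLoaderDemo {
    public static void main(String[] args) {
        // In a real deployment this text would come from a property file on disk.
        String definition = "type=Instruction\nclass=" + EchoInstruction.class.getName() + "\n";
        Instruction echo = PluginLoader.load(definition, "Instruction", Instruction.class);
        System.out.println(echo.execute("hello"));
    }
}
```

Because the engine depends only on the two interfaces, replacing an instruction or driver means editing a property file rather than recompiling the engine.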
XSLT As a Transformation Between the Data Source and the Result
XSLT also provides a mechanism for handling XML input documents. Our first efforts centered on generating XML content from data sources, and processing an input XML document was tedious: we searched the DOM tree for relevant data and then used it to update the data source. This approach was hardly dynamic.
Using XSLT to process an input XML document allows us to transform the input into a set of data processing instructions (let's call this a "map document"). XML input and output can then be handled without writing any code to handle the data, and XSLT expresses the business rules of our data processing. The result of the XSLT transformation is a set of processing instructions similar to the XSLT elements. These instructions have their own namespace, "xdl". When the XSL processor has finished transforming the input document, the result is passed to the processing engine, which iterates over the instructions and performs the required data processing. This may or may not result in the generation of a new XML document.
There is one other very compelling reason for using XSLT: the Java bindings to the language allow us to extend the functionality of XSL. That gives us extensibility in the preprocessing through XSLT and extensibility in the processing engine through dynamic instruction handling and data source handling. One very simple feature that makes working with relational data easier is the ability to create the unique sequence numbers required for most tables before we insert the data. This is much easier than letting the database define its own sequences because the ID is immediately available for use in child tables. The namespace for the extension is defined in the XSLT stylesheet tag, and the extension is then used anywhere an expression is valid:
<xsl:variable name="myid"
There are many other ways to make use of the XSLT bindings to Java. When working with multiple data sources you can do quick database lookups to provide conditional behavior before the data sources are queried or updated. It's entirely up to your imagination and creativity.
<xsl:variable name="pid" select=
The best part of using XSLT to handle the business rules is that it allows you to define your data access requirements outside your program.
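The two-stage flow described in this section, in which XSLT turns an input document into a map document and an engine then walks the resulting instructions, might look roughly like this with the standard JAXP API. It is only a sketch under stated assumptions: the namespace URI, the stylesheet, the input document shape, and the attribute layout of the emitted xdl:variable elements are invented for illustration, and the "engine" merely records what it would execute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class MapDocumentPipeline {
    // Placeholder URI: the article names the "xdl" prefix but not its namespace URI.
    static final String XDL_NS = "urn:example:xdl";

    // Hypothetical stylesheet: emits one xdl:variable instruction per input <user>.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:xdl='" + XDL_NS + "'>"
      + "<xsl:template match='/users'>"
      + "<xdl:querysheet>"
      + "<xsl:for-each select='user'>"
      + "<xdl:variable name='user' expr='{@name}'/>"
      + "</xsl:for-each>"
      + "</xdl:querysheet>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Stage 1: transform the input document into a map document (a DOM tree).
    static Document toMapDocument(String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            DOMResult result = new DOMResult();
            transformer.transform(new StreamSource(new StringReader(inputXml)), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    // Stage 2: iterate over the instructions; a real engine would dispatch
    // each one to a handler instead of building a log.
    static List<String> execute(Document map) {
        List<String> log = new ArrayList<>();
        Element root = map.getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && XDL_NS.equals(n.getNamespaceURI())) {
                Element e = (Element) n;
                log.add(e.getLocalName() + " " + e.getAttribute("name") + "=" + e.getAttribute("expr"));
            }
        }
        return log;
    }
}

class MapDocumentPipelineDemo {
    public static void main(String[] args) {
        String input = "<users><user name='Tom'/><user name='Jane'/></users>";
        for (String line : MapDocumentPipeline.execute(MapDocumentPipeline.toMapDocument(input))) {
            System.out.println(line);
        }
    }
}
```

Keeping the stylesheet outside the Java code is what lets the business rules change without recompiling the engine.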
XML Grammar / Instruction Set
We divided the basic instruction set into four core abstractions: instructions, expressions, functions, and extensions. After I define what each of these means, I'll provide an example that should make everything clear.
Instructions are children of an xdl:querysheet element. Instructions provide the context for managing flow control, interpreting application logic, and providing data output. The flow control and conditional evaluation instructions are based on XSLT and consist of the following: template, call-template, choose, when, otherwise, cdata, pi (processing instruction), value-of, and variable. We've defined the data source interaction instructions as transaction, session, for-each, bind, and bind-param. And for XML content creation there are the element and attribute instructions.
<xdl:call-template name="mytemplate">
Expressions are used to provide operators on the values of an instruction's attribute. This is the same as the use of the "$" in XSLT to indicate that a variable is being operated on rather than a constant value:
<xdl:value-of select="$id"/>
Expressions are resolved by matching the first character of the expr attribute against a registered expression handler. The ability to define new expression handlers is very powerful. We've defined expressions to allow evaluation of result set data values (*), variables ($), constant values ('), XPath (/), and others. What about creating an expression handler to read in an XML document from a URL so the contents could be added to a database?
<xdl:value-of expr="@file://mydoc.xml"/>
Functions are addressed by name and can be evaluated with parameters. A function is executed through the function expression handler (:). The functions we've defined are for the most part limited to string manipulation, performing such tasks as concatenation and substring:
<xdl:value-of
Extensions are globally defined instructions. Whereas an instruction is used inside an xdl:template, an extension is a child of the xdl:querysheet and therefore global to the entire document. This distinction is subtle, but it's important to know which instructions are global and which are local. Extensions are used to acquire connections to data sources, format queries presented to the data source, and encapsulate the execution of scripting languages such as Python, JavaScript, or Pnuts.
<xdl:statement ns="getUsers">
Given these definitions, Listing 1 shows what a complete data map might look like. Given a table with Tom and Jane, the output generated by the map would look like this:
<Users>
By changing a couple of lines in the map, we can generate the ID as an attribute rather than an element:
<xdl:attribute name="id" expr="*1"/>
With this simple change the output would appear like this:
<Users>
This basic set of instructions is all that's needed to handle most document-processing requirements. What's left is to come up with a way to handle different kinds of data sources in a uniform way.
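The expression mechanism described in this section, where the first character of an expression selects a registered handler, can be sketched in Java as follows. The ExpressionRegistry class, its method names, and the 1-based column index for the "*" prefix (mirroring JDBC) are assumptions made for the example. The article names the prefixes but does not show the handler interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ExpressionRegistry {
    // Maps a prefix character to the handler for the rest of the expression.
    private final Map<Character, Function<String, String>> handlers = new HashMap<>();

    // Registering a new prefix is all it takes to add a new expression type.
    void register(char prefix, Function<String, String> handler) {
        handlers.put(prefix, handler);
    }

    // Dispatches on the first character and passes the remainder to the handler.
    String evaluate(String expr) {
        if (expr == null || expr.isEmpty()) {
            throw new IllegalArgumentException("empty expression");
        }
        Function<String, String> handler = handlers.get(expr.charAt(0));
        if (handler == null) {
            throw new IllegalArgumentException("no handler for prefix " + expr.charAt(0));
        }
        return handler.apply(expr.substring(1));
    }

    // Hypothetical defaults covering three of the prefixes the article lists.
    static ExpressionRegistry withDefaults(Map<String, String> variables, String[] currentRow) {
        ExpressionRegistry registry = new ExpressionRegistry();
        registry.register('$', name -> variables.get(name));   // variable lookup
        registry.register('\'', text -> text);                 // constant value
        registry.register('*', index ->                        // result set column, 1-based (assumed)
            currentRow[Integer.parseInt(index) - 1]);
        return registry;
    }
}

class ExpressionRegistryDemo {
    public static void main(String[] args) {
        Map<String, String> variables = new HashMap<>();
        variables.put("id", "42");
        ExpressionRegistry registry =
            ExpressionRegistry.withDefaults(variables, new String[] {"7", "Tom"});
        System.out.println(registry.evaluate("$id"));
        System.out.println(registry.evaluate("*2"));
    }
}
```

A handler for a new prefix, such as one that fetches a document from a URL, would be one more register call and would not touch the dispatch logic.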
Wrapping Data Sources
The query for a data source is provided by the xdl:statement extension, which acts as a container for the query. When the processing engine encounters an xdl:session element, the query is simply passed off to the underlying driver. If the underlying driver is a JDBC source, the xdl:statement element would contain an SQL query with optional bind parameters:
<xdl:statement ns="mymoney">
The statement's "ns" attribute provides the namespace value for addressing the statement at a later point in the processing. The xdl:bind would be used to define a session context for binding values, and the xdl:bind-param provides the actual binding values:
<xdl:session ns="%mymoney">
The bind value can come from a variable, an XSLT parameter, or the result of a previous query. Multiple data sources can easily be integrated using this technique, and updating multiple data sources from a single document is just as easy. The transaction control is nested and does not provide a complete two-phase commit capability, but it is appropriate for a majority of the cases.
When constructing a driver, four things have to be taken into consideration: connections, statements, result sets, and metadata.
When managing connections, remember that the connection user name and password may not be the same for each access. It's easy to place the login credentials in a property file, but the connection mechanism should account for server-based use with many different users and roles - pool the connections if possible.
The statements should be pooled and there should be a way to defer the binding of values into the statement. JDBC includes the ability to bind everywhere a "?" appears in the query. The instruction processing is set up to allow binding for any data source, so the statement handler should handle this. In fact, the parameter handling must be part of the driver interface specification.
Metadata must be made available to create dynamic interactions with the data source. The metadata includes things such as the source schema and the size and format of the returned elements. The result set specification of the driver interface must include handling of metadata.
The driver is important in this framework. Developing it is more difficult than developing an instruction handler, but when done correctly, it allows LDAP directories to be treated like relational databases, which are then compatible with flat files, and so on.
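To make the driver requirements more concrete, here is a minimal Java sketch of a driver contract with deferred parameter binding and result set metadata, backed by an in-memory stand-in for a flat file. All interface and class names, the 1-based bind position (mirroring JDBC), and the toy "column=?" query syntax are assumptions for illustration. Connection handling and pooling are left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver contract: statements, deferred binding, results, metadata.
interface SourceStatement {
    void bind(int position, String value);   // 1-based, like JDBC's "?" markers
    SourceResult execute();
}

interface SourceResult {
    List<String> columnNames();              // metadata about the returned data
    List<List<String>> rows();
}

interface SourceDriver {
    SourceStatement prepare(String query);
}

// In-memory stand-in for a flat-file source with comma-separated records.
class FlatFileDriver implements SourceDriver {
    private final List<String> columns;
    private final List<List<String>> records = new ArrayList<>();

    FlatFileDriver(String header, String... lines) {
        columns = Arrays.asList(header.split(","));
        for (String line : lines) {
            records.add(Arrays.asList(line.split(",")));
        }
    }

    // Toy query language for this sketch only: "<column>=?".
    public SourceStatement prepare(String query) {
        String[] parts = query.split("=");
        if (parts.length != 2 || !"?".equals(parts[1].trim())) {
            throw new IllegalArgumentException("expected <column>=?");
        }
        int column = columns.indexOf(parts[0].trim());
        if (column < 0) {
            throw new IllegalArgumentException("unknown column " + parts[0].trim());
        }
        return new SourceStatement() {
            private String bound;

            // Binding is deferred: the value arrives after the statement is prepared.
            public void bind(int position, String value) {
                if (position != 1) {
                    throw new IllegalArgumentException("only one parameter");
                }
                bound = value;
            }

            public SourceResult execute() {
                if (bound == null) {
                    throw new IllegalStateException("parameter 1 not bound");
                }
                List<List<String>> matches = new ArrayList<>();
                for (List<String> record : records) {
                    if (record.get(column).equals(bound)) {
                        matches.add(record);
                    }
                }
                return new SourceResult() {
                    public List<String> columnNames() { return columns; }
                    public List<List<String>> rows() { return matches; }
                };
            }
        };
    }
}

class FlatFileDriverDemo {
    public static void main(String[] args) {
        SourceDriver driver = new FlatFileDriver("name,role", "Tom,admin", "Jane,user");
        SourceStatement statement = driver.prepare("role=?");
        statement.bind(1, "user");
        SourceResult result = statement.execute();
        System.out.println(result.columnNames() + " " + result.rows());
    }
}
```

A JDBC-backed implementation of the same interfaces could delegate to PreparedStatement and ResultSetMetaData, which is how one engine could treat both kinds of source uniformly.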
Output Styles
<xdl:for-each expr="*">
Again, the kinds of things you can do are limited only by your imagination. XSLT provides the business rules and transformation instructions, the processing engine provides the mapping to the data source through the instruction execution, and you supply the ingenuity.
Performance Implications
Many user interface applications or Web-based applications use small data sets, which will perform well in the type of system we've been talking about. In the end it's up to you, the way you've designed your system, and your specific performance needs. My experience while working on this XSLT framework indicates that the XML processing overhead for each record returned from JDBC is about 0.1 to 0.5 ms. This was the case whether we processed 250 records or 25,000 records, so the largest of those runs would carry roughly 2.5 to 12.5 seconds of XML processing overhead. The database query execution time more often than not will dominate the overall processing time.
Conclusion
To learn more about what I've been working on for the past five years, or if you have questions about the performance of XML/XSLT in data-intensive applications, please send me an e-mail.
Acknowledgment