Building XML Middleware Using OmniMark
By: Mark Baker
Aug. 18, 2000 12:00 AM
XML is rapidly becoming the way applications communicate. Because it isn't one language but a means to create many languages, it can play many roles in application integration and data exchange. In this article I'll explore some of the varied roles that XML can play by developing a simple middleware application that serves up data from a database. The application (see Listing 1) uses XML in three ways:
OmniMark is alone, though, in having developed the streaming model into a full, generalized programming language. The conventional approach to writing a program is to design a data structure, populate it, manipulate it and finally serialize it to create output; that is, it's a memory-centric approach. The streaming approach involves acting on the data directly as it streams, without creating data structures - it's an I/O-centric approach. An OmniMark program is structured as a collection of rules, with different rule types used for different purposes:
Server Programming
The two essential performance characteristics that every server program must have are:
OmniMark uses OMX (OmniMark extension) components to connect to external data sources. To establish a TCP service, I use the tcpServiceOpen function found in the TCP/IP library. The function returns an OMX variable that's a handle to a TCPService OMX component that manages the TCP service. The function is called in the initializer of the "service" variable:
    local tcpService service
How to Receive Requests
Once the service is established, the program must wait for a connection attempt from a client. This is accomplished with the TCPServiceAcceptConnection function, which takes the OMX variable representing the service and waits for a connection. When a connection is made, it returns another OMX variable, which is a handle to a TCPConnection OMX component that will manage the connection:
    local tcpConnection connection
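For readers who don't have OmniMark at hand, the open-then-accept pattern described above can be sketched with Python's standard socket module. This is only an analogy, not the OmniMark TCP/IP library: the function name serve_once, the loopback address, and the uppercase "reply" are all assumptions made for illustration.

```python
import socket
import threading

def serve_once(service, handled):
    """Wait for one connection, send the request back uppercased, then close."""
    connection, _address = service.accept()   # blocks until a client connects
    with connection:
        request = connection.recv(1024)
        connection.sendall(request.upper())
    handled.append(request)

# Open the service (analogous to establishing the TCP service).
service = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS choose
port = service.getsockname()[1]
handled = []
worker = threading.Thread(target=serve_once, args=(service, handled))
worker.start()

# A throwaway client so the sketch can be run end to end.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"ping")
    reply = client.recv(1024)

worker.join()
service.close()
```

The service object plays the role of the TCPService handle and the object returned by accept() plays the role of the TCPConnection handle.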
Establishing the Request and Response Streams
    local stream reply
The statement "protocol IOProtocolMultiPacket" is a second parameter to the TCPConnectionGetOutput function. It establishes the protocol to use for writing the data. Because a TCP/IP connection is a two-way communication channel, the sender can't signal the end of its data by closing the channel; an I/O protocol is required to establish when a message ends. OmniMark provides support for most common protocols through the IOProtocol library. This program uses the MultiPacket protocol, which breaks up the message into packets and sends each packet prefixed by a network-long value specifying its size. The message ends with a zero-length packet.
Opening the "reply" stream with the TCP/IP connection attached creates a vehicle for sending output to the TCP/IP connection. To actually send output to the connection, I have to make "reply" the current output stream. I do this using the statement "using output as":
    using output as reply
The "using output as" statement is a prefix to the "do" block and establishes the current output for all the code that executes within the block, including any functions or rules called within the block. I can output data to the TCP/IP connection with a simple "output" statement anywhere within the output scope established by this statement.
To receive data from the TCP/IP connection, I use the "tcpConnectionGetSource" function. It takes the connection OMX and protocol parameters - just like "tcpConnectionGetOutput" - and returns an OmniMark source that can be used by any of the OmniMark keywords that accept sources - in this case the "scan" keyword:
    scan tcpConnectionGetSource connection
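The MultiPacket framing described above (size-prefixed packets ending with a zero-length packet) can be illustrated with a small Python sketch. It assumes that "network-long" means a 4-byte unsigned integer in network (big-endian) byte order, and the packet size of 4 bytes is arbitrary; it is not claimed to be byte-compatible with OmniMark's IOProtocol library. The function names are hypothetical.

```python
import io
import struct

def write_multipacket(sink, message, packet_size=4):
    """Write message as length-prefixed packets, then a zero-length packet."""
    for start in range(0, len(message), packet_size):
        packet = message[start:start + packet_size]
        sink.write(struct.pack("!L", len(packet)))  # 4-byte network-order length
        sink.write(packet)
    sink.write(struct.pack("!L", 0))                # end-of-message marker

def read_multipacket(source):
    """Yield packet payloads until the zero-length terminator is read."""
    while True:
        header = source.read(4)
        if len(header) < 4:
            raise EOFError("stream ended before the terminating packet")
        (size,) = struct.unpack("!L", header)
        if size == 0:
            return
        payload = source.read(size)
        if len(payload) < size:
            raise EOFError("truncated packet")
        yield payload

# Round trip through an in-memory byte stream standing in for the connection.
wire = io.BytesIO()
write_multipacket(wire, b"<request/>")
wire.seek(0)
received = b"".join(read_multipacket(wire))
```

Because the receiver learns where the message ends from the zero-length packet, neither side has to close the connection to mark the end of its data.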
How to Survive Errors and Stay Running
    catch #program-error
OmniMark's catch and throw keywords provide robust structured exception handling that you can use for both flow control and error handling. A throw isn't a GOTO. A throw starts a systematic process in which program scopes are closed one by one, starting with the scope in which the throw occurs and ending with the one in which the catch occurs. Garbage collection is automatic.
"#program-error" is a built-in catch name that I use here as the line of last defense. Any program error, any uncaught throw, any error in an external system that is not caught and handled by another catch will be caught here. The catch will result in the current iteration of the loop being shut down and tidied up. Control will then return to the top of the loop, starting a new iteration.
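The same "catch everything inside the loop" structure can be sketched in Python. This is an analogy to the behavior described above, not OmniMark code; handle and serve are hypothetical names, and a list of strings stands in for incoming connections.

```python
def handle(request):
    """Hypothetical request handler; fails on malformed input."""
    if not request.startswith("<"):
        raise ValueError("not XML: %r" % request)
    return "ok"

def serve(requests):
    """Process each request in its own scope; one failure never stops the loop."""
    log = []
    for request in requests:          # stands in for "wait for next connection"
        try:
            log.append(handle(request))
        except Exception as error:    # last line of defense, like #program-error
            log.append("error: %s" % type(error).__name__)
        # Nothing from the failed request is carried into the next iteration.
    return log

results = serve(["<request/>", "garbage", "<request/>"])
```

The bad request in the middle is recorded and discarded, and the third request is served normally.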
How to Return to a Stable State After Each Request
At one place in this program I violate this rule. I use a global variable for the database connection ("db"), trading off a little robustness for the performance advantage of maintaining a permanent connection to the database. You can use catch and throw to detect problems with the database connection and recover from them, but that's outside the scope of this article.
XML Processing
Once parsing is initiated, the parser fires markup rules for the various markup structures it encounters. The most common of the markup rules is the element rule. While SAX has separate events for the start and end of an element, OmniMark fires a single element rule for each element. Each element rule uses the parse continuation operator ("%c") to initiate parsing of the element's content. Element rules are thus fired hierarchically. Each rule is suspended while the element's content is parsed and resumes once the parsing of the content is complete. Parsing is initiated by the "do xml-parse" statement:
    do xml-parse instance
The "with" clause specifies the DTD to use. In this program the request DTD is precompiled in the start-up section. The "scan" clause specifies the source from which to read the XML document. In this case it's the source returned by the tcpConnectionGetSource function that attaches the OmniMark source to the TCP/IP connection. What this means is that the XML document will be streamed directly from the TCP/IP connection into the XML parser.
"do xml-parse" is a block statement. Within that block the parse state has been established, but parsing isn't actually in progress. Parsing is started by the parse-continuation operator ("%c"), which is roughly equivalent in function to the XSL statement apply-templates. "%c" is a string escape sequence that allows you to easily express where you want the content of an element to fall in the output stream.
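The idea of one rule per element, suspended while its content is processed, can be approximated in Python with recursion. The sketch below is an analogy only: it uses xml.etree.ElementTree, which builds the whole tree in memory rather than streaming as OmniMark does, and the rule table, the render function, and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET

def render(element, rules):
    """Fire the rule for `element`, handing it a way to process its content."""
    def content():
        # Rough analog of "%c": process text and child elements, in order.
        parts = [element.text or ""]
        for child in element:
            parts.append(render(child, rules))
            parts.append(child.tail or "")
        return "".join(parts)
    # Elements without a rule simply pass their content through.
    rule = rules.get(element.tag, lambda content: content())
    return rule(content)

# Each rule decides where the element's content falls in its output.
rules = {
    "para": lambda content: "<p>" + content() + "</p>",
    "em":   lambda content: "<i>" + content() + "</i>",
}
html = render(ET.fromstring("<para>Hello <em>big</em> world</para>"), rules)
```

The "para" rule is suspended inside its call to content() while the "em" rule runs, then resumes to emit its closing tag, which mirrors the hierarchical firing of element rules.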
Parsing the Information Requests
The parser is started in response to the "%c" in the "do xml-parse" block. When it finds the request element, it fires the element rule for "request" and pauses. The "request" element rule has no work to do, so it simply restarts the parser with "%c".
    element "request"
Suppose the request is for a list of selected products. The request element will contain an element, "selected-products", whose data content will be a comma-separated list of product IDs. The element rule for "selected-products" begins like this:
    element "selected-products"
The "%c", which every element rule must contain, is tacked onto the end of the initializer for the variable "query". The data content of the "selected-products" element is streamed into the variable "query" and becomes part of the SQL statement that will be used to query the database. In effect, the list of product IDs has been streamed directly from the TCP/IP connection into the SQL statement.
Database Access
    global dbDatabase db initial {dbOpenODBC dsn}
The dbOpenODBC function takes a parameter that is the data source name (DSN) used by the ODBC driver manager to identify a database. It returns a database OMX variable.
Querying the Database
    dbQuery db sql query record rs
The dbQuery function takes three parameters: the OMX variable for the database, the SQL query - heralded by the word "sql" - and an OmniMark shelf named "rs" - heralded by the word "record." A shelf is an OmniMark data structure. It's an associative array, meaning that items can be addressed either by position or by a textual key value. "dbField" is an OMX variable type for an OMX component representing a database field. The "dbQuery" function will populate the "rs" shelf with "dbField" OMX variables representing the fields of the current record. The names of the fields will become the keys of the shelf.
After executing the query, the program checks to see if any records were returned; if not, it throws "record-not-found":
    throw record-not-found unless dbRecordExists rs
This throw is caught by the statement:
    catch record-not-found
Throwing out of an element rule terminates the current parse. The code in the catch block then outputs the XML message:
    <response status="notfound"/>
Since the "do xml-parse" block is within the output scope created by the statement "using output as reply", this output goes straight to the TCP/IP connection and to the client.
Figure 1 illustrates how data streams through the program. In this figure the top line shows the streaming of the request data from the TCP/IP port to the XML parser and into the SQL query. The query itself is passed as a function call to the database OMX, not streamed. The bottom line shows the streaming of the response data from the database to the find rules that escape markup characters in the text, the interpolation of the XML tagging by the program, and the streaming of the result to the TCP/IP port.
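The query-then-throw pattern can be sketched in Python with the standard sqlite3 module standing in for the ODBC connection. Everything specific here is an assumption for illustration: the "product" table, its columns, the shape of the "ok" response, and the function names. Note also that the sketch binds the product ID as a query parameter, whereas the article's program streams the request text into the SQL statement itself.

```python
import sqlite3

class RecordNotFound(Exception):
    """Stands in for the article's record-not-found throw."""

def product_response(db, product_id):
    """Return an XML response for one product, or raise RecordNotFound."""
    row = db.execute(
        "SELECT id, name FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:                      # no record came back
        raise RecordNotFound(product_id)
    return '<response status="ok"><product id="%d">%s</product></response>' % row

def answer(db, product_id):
    """Catch the throw and turn it into the 'notfound' response."""
    try:
        return product_response(db, product_id)
    except RecordNotFound:
        return '<response status="notfound"/>'

# A one-row in-memory database, purely for demonstration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO product VALUES (1, 'Widget')")
found = answer(db, 1)
missing = answer(db, 2)
```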
Building an XML Encoded Response
    output '<response status="ok">%n'
Sometimes it takes more than one SQL query to collect the information needed to construct a response. This is the case for the product-by-line and product-by-type requests, both of which return a description of the product type or product line followed by a list of products. Because there are two separate queries, either of which can fail, I buffer the output until both queries have succeeded. To do this, I attach the stream "response-buffer" to a buffer and make it the current output scope for the duration of the two queries:
    local stream response-buffer
After the block governed by "using output as response-buffer," I close the stream and output it. The original output scope was restored when the block ended, so the output once again goes to the TCP/IP connection.
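The buffering strategy described above can be sketched in Python with io.StringIO standing in for both the buffer and the TCP/IP connection. This is an analogy, not OmniMark code; build_response, the two query functions, and the sample data are hypothetical, and LookupError stands in for the record-not-found throw.

```python
import io

def build_response(reply, queries):
    """Run every query into a buffer; touch `reply` only once all succeed."""
    buffer = io.StringIO()
    try:
        for query in queries:
            buffer.write(query())      # any query may raise
    except LookupError:
        # Nothing partial has reached the client, so a clean error can be sent.
        reply.write('<response status="notfound"/>')
        return
    reply.write('<response status="ok">\n')
    reply.write(buffer.getvalue())
    reply.write('</response>')

def line_description():
    return "<description>Garden tools</description>\n"

def products_missing():
    raise LookupError("no products")

ok, failed = io.StringIO(), io.StringIO()
build_response(ok, [line_description, lambda: "<product>Rake</product>\n"])
build_response(failed, [line_description, products_missing])
```

In the failing case the description produced by the first query is discarded along with the buffer, so the client never sees half a response.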
Dealing with the Markup in the Database
I deal with this markup by simple inclusion. One of the virtues of XML is that because of its linear nature and nested structure, the root element of one XML document can become an element in another document simply by dropping it in place. Because I control both the database and the server, I don't have to worry about namespace conflicts. In effect, the "description" language used in the database is just a subset of the "response" language used by the server.
Escaping the Markup Characters in the Data
Rather than outputting the values of the fields directly, the program "submits" them. Submitted data is processed by find rules, which apply pattern-matching techniques to data streams. (The "dbFieldValue" function returns an OmniMark source, not a string, so the data is being streamed, not copied here.) OmniMark supports a full pattern-matching language. However, this program requires only literal text matching. Here's the find rule for escaping the "<" character:
    find "<"
This find rule looks for "<" in the streaming data. When it finds it, it removes the matched character from the stream and outputs the escape sequence "&lt;" in its place. All the find rules are active at once so all the replacements are done in a single pass. There's no need to worry about the "&" inserted by this find rule being seen and replaced by the rule find "&". All data not matched by a find rule will stream to the current output, which is still the output scope established for the parse: the TCP/IP connection. Notice that the "description" field, since it's already in XML, isn't submitted for escaping.
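Why a single pass matters can be shown with a short Python sketch. It is an analogy to the find rules, not OmniMark code: a regular expression plays the part of the simultaneously active rules, and the function names are hypothetical. The second function is a deliberate counter-example showing what goes wrong when replacements are chained and the output of one is rescanned by the next.

```python
import re

ESCAPES = {"&": "&amp;", "<": "&lt;"}
PATTERN = re.compile("[&<]")

def escape_single_pass(text):
    """Replace each markup character once; inserted text is never rescanned."""
    return PATTERN.sub(lambda match: ESCAPES[match.group(0)], text)

def escape_sequential_wrong_order(text):
    """Counter-example: chained replaces rescan their own output."""
    return text.replace("<", "&lt;").replace("&", "&amp;")

good = escape_single_pass("a < b & c")
bad = escape_sequential_wrong_order("a < b & c")
```

Here good is "a &lt; b &amp; c", while bad contains "&amp;lt;" because the "&" produced by the first replacement was escaped again.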
Shutting Down the Server
A Stub Client
Sending a Request
    set connection to TCPConnectionOpen
Once I have a connection, I can send a request. Since the request is a single value, I use "set" rather than "using output as" and "output". As in the server, the attachment to the TCP/IP connection is provided by the "TCPConnectionGetOutput" function:
    set TCPConnectionGetOutput connection
Processing the Response
The response is streamed directly into the parser, just as in the server program. The only difference in the client is that the DTD (see Listing 3) is fed to the parser as text rather than being precompiled.
    do xml-parse document
The response is fed to the parser, which fires element rules as before. The element rules output simple HTML tagging in place of the XML tagging in the response.
Summary
Because it combines broad-based connectivity, a streaming programming model and an integrated parser, OmniMark is a good language for building the next generation of XML-enabled Internet applications.