By Sam Natarajan | April 18, 2002
Our technical solution included the development of a complete
rules engine in XSLT, XML parsing in Java using DOM and SAX, a persistent
store of XML data within a relational database (Oracle), a
customizable report writer (Cognos), and a complete runtime
environment on Unix. A two-person team completed the project work
in 12 weeks, a schedule we believe could not have been met with
"traditional" programming paradigms. XML, XSLT, and Java technologies
are worthy of your attention; read further to see how they helped us
deliver an innovative solution for one of our important clients on
Wall Street.
Project Overview
A large Wall Street financial services firm developed a
powerful quantitative model for valuing convertible bonds for use by
its traders, risk managers, and accounting department. The
convertible bond asset class is a hybrid of fixed-income and equity
instruments and presents a challenge for data manipulation.
Convertible bonds can have hundreds of fields of reference data
(e.g., coupon, maturity, conversion price, conversion rate, currency,
par amount), with many embedded time elements (e.g., call schedule,
put schedule).
Although many vendors supply convertible bond reference data (e.g., Bloomberg, Reuters), our client was unwilling to accept any single vendor's data feed as the sole basis for its model inputs. The firm needed to normalize the structure of the convertible bond data elements and to account for each vendor's differences in feed format and in the nomenclature (i.e., field names) used to describe convertible bond reference data. By transforming each vendor's data elements into a standardized database structure, the firm created a consistent framework for selecting inputs to its valuation engine.
Prior to our solution, the client was trying to manage the large volume of data feeds through a manual comparison process, an onerous task given the quantity of data. The comparison logic also had to be heuristic, learning to suppress previously flagged discrepancies so that daily reporting wouldn't include repeat offenders.
Our project was intended to fulfill several key business objectives:
- Process data feeds from multiple vendors.
- Normalize the incoming data feeds to a common schema using a complex set of rules.
- Enable users to transform source data using complex and changing business rules - without involving subsequent IT programming resources.
- Store normalized data into a persistent data store.
- Enable users to create reports that compare data from multiple sources.
- Automate the manual data entry process.
We developed many shared classes and utilities, leveraging the solution to meet additional client requests to transform related data, generate feeds, and provide automated ways to populate their internal databases.
This article provides an overview of the project and describes the mechanics of that effort, including business and functional requirements, the detailed technical solution, and its rationale. I've included some sample code to highlight key methods we employed to meet these critical business needs for our client.
Background on Convertible Bonds
Bonds are fixed-income securities that represent the debt of
domestic and international governments, corporations, banks,
institutions, or municipalities. When purchasing a fixed-income
security, an investor is lending money to the issuer for a specified
period of time. In return, the investor (i.e., lender) receives
regular interest payments (i.e., the coupon) and later the bond face
value on the maturity date. The fixed-income market offers a wide
range of asset classes with varying degrees of risk. The credit
ratings from independent rating agencies such as Moody's Investors
Service and Standard & Poor's are intended to rank the relative
creditworthiness of these securities.
Convertible bonds are hybrid instruments that offer a
fixed-income component (i.e., a coupon) and an option to convert the
bond into the issuer's underlying equity at a predetermined
conversion price and ratio. Convertible bonds are influenced by
interest rates, credit risk, underlying equity price, market
volatility (due to the embedded equity option), and other market
factors. Sometimes even New York City subway noise can affect
convertible bond valuation! The convertible bond asset class is an
exemplary illustration of Wall Street innovation, particularly as
bonds are currently issued with many interesting features. Hedge
funds and other investors have played an important role in
influencing the valuation and hedging of convertible bonds.
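To make the conversion feature concrete, the sketch below applies the standard textbook relationships: conversion ratio = par ÷ conversion price (the number of shares received per bond), and conversion value, or parity, = conversion ratio × current share price. The class name `ConvertibleMath` and the figures in `main` are illustrative assumptions, not data from the project.

```java
/**
 * Hypothetical helper illustrating basic convertible bond relationships.
 * Conversion ratio = par / conversion price (shares received per bond).
 * Conversion value (parity) = conversion ratio * current share price.
 */
final class ConvertibleMath {
    private ConvertibleMath() {}

    /** Number of shares received on converting one bond. */
    static double conversionRatio(double par, double conversionPrice) {
        if (conversionPrice <= 0) {
            throw new IllegalArgumentException("conversion price must be positive");
        }
        return par / conversionPrice;
    }

    /** Market value of the shares the bond converts into. */
    static double conversionValue(double conversionRatio, double sharePrice) {
        return conversionRatio * sharePrice;
    }

    /** Parity expressed as a percentage of par. */
    static double parityPercent(double par, double conversionPrice, double sharePrice) {
        double value = conversionValue(conversionRatio(par, conversionPrice), sharePrice);
        return 100.0 * value / par;
    }

    public static void main(String[] args) {
        double ratio = conversionRatio(1000.0, 40.0);
        double value = conversionValue(ratio, 50.0);
        System.out.println(ratio + " shares, conversion value " + value);
        System.out.println("parity " + parityPercent(1000.0, 40.0, 50.0) + "%");
    }
}
```

With a $1,000 par bond convertible at $40 per share, the holder receives 25 shares; at a share price of $50 the conversion value is $1,250, or 125% of par.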
Solution Overview
Briefly, our solution consisted of the following modules:
- Job scheduling framework in Unix to invoke the XSLT rules engine upon delivery of vendor feeds
- Utility classes to transform non-XML feeds to XML format
- XSLT business rules definitions for transforming incoming data feeds into normalized schema
- Java programs to parse data feeds and invoke XSLT business rules
- Java programs to store transformed data feeds into Oracle
- Oracle database to store normalized convertible bond data elements for each vendor
- Cognos reporting framework for creating customized reports using the Oracle database
- Utility classes used as extensions in XSLT transformations
The functional and reporting framework requirements drove the main system design considerations.
Functional Requirements
- Rules engine should be extensible, accommodating input file changes automatically and consolidating data from multiple files per vendor.
- Rules engine should be independent of nomenclature used by the feed provider.
- Business rules grammar should be easy for the end user to learn, develop, test, and deploy into the production environment.
- Persistent data store should enable extraction of data in XML or other formats.
Reporting Requirements
- Define reports to compare convertible bond data from different sources.
- Define reports to identify the history of changes in data fields from a single-source vendor.
- Create a report to identify new or recently terminated/expired convertible bond issues.
- Allow downloading of report data into various formats (Excel, XML, and text files).
Why XML? Given the flexibility expected from the rules engine, XML was the obvious choice. XML doesn't carry the baggage of fixed-format files. Because it works only with XML, the rules engine is independent of both the source feed formats and the nomenclature used to define indicative data. (Also, we needed an excuse to get published in XML-Journal!)
Why XSLT? One of the core requirements of the rules engine was to allow the end user to maintain it. This required a grammar that was easy to learn yet flexible enough to handle data spanning numerous files with different nomenclature for attributes. XSLT was the hands-down choice: it's a declarative language (like SQL) and provides mechanisms, such as the document() function, for handling XML data that spans multiple files. XSLT processors also allow Java functions to be called from rules (a very powerful feature for handling more complex rules).
Why Java? We chose Java as the programming language for its platform independence and the availability of freely available, open source XML parsers.
Why Oracle? We chose the Oracle database for its rich XML/XSLT parser API (see Figure 1) and its database-level tools for extracting XML data. Oracle's XSLT engine supports calling Java functions as extensions from XSLT rules. Oracle's XML SQL Utility API provides easy ways to store XML documents in, and extract them from, a relational database structure. The "oraxsl" command-line tool is especially useful for quick testing of business rules.
Why Cognos Impromptu? For the reporting tool, we selected Cognos Impromptu, which provides an interactive reporting framework with numerous built-in functions and the flexibility to save reports in multiple formats (e.g., CSV, HTML, PDF, XLS). Impromptu also contains a job scheduler and allows a report to be saved as an SQL query, which can be used to extract the report's contents as well-formed XML data.
Rules Engine Architecture
Figure 2 provides a high-level architecture and data flow
diagram of the rules engine. The rules engine consists of (1)
numerous Java modules to parse XML files using a combination of
SAX/DOM parsers, (2) an XSLT transformation engine to apply business
rules in XSLT to transform feeds into normalized XML Schema, and (3)
an Oracle XML SQL utility to move transformed data into a persistent
data store.
The rules engine can store its output to a database, a file, or both. The ability to write output to a file is exploited in handling large feed files (explained later). In addition, the rules engine can be configured to process a subset of the convertible bond instruments on a feed file rather than all of them, an important feature given how much memory DOM parsing requires.
Once the rules engine stored the normalized data into a relational structure, the business user analyzed the quality of the data and selected the "best" data source for use in the financial models. The XML generator utility provided additional tools for creating XML output from saved SQL queries and/or text files.
Key System Modules
Converting Text Files to XML
We developed a utility class to convert the data contained in
delimited text files to XML format. The premise of this utility was
simple: the first line in the text file would contain the field names
to be used as attribute names, while all the other lines
contained the values for those fields. The utility built each record
in the text file as a row element, with all data from each row as
attributes of the row element. The utility provided flexibility in
defining the root node and rows of the XML file as parameters within
a configuration file. The text file delimiter was also defined as a
configuration file parameter, which meant that a file with any
delimiter could be processed. We also created two helper classes: one
that abstracted out the parser and the other that had utility
functions for creating/manipulating XML messages.
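The conversion described above can be sketched with the standard JAXP DOM API. This is a minimal sketch under stated assumptions, not the project's utility: the class name `DelimitedToXml` is hypothetical, the configuration file is replaced by plain method arguments (root name, row name, delimiter), and header field names are assumed to already be legal XML attribute names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical delimited-text-to-XML converter: header line = attribute names, other lines = rows. */
final class DelimitedToXml {
    private DelimitedToXml() {}

    /** Builds <rootName><rowName field1="..." .../>...</rootName> from delimited text. */
    static Document convert(Reader input, String rootName, String rowName, String delimiter) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement(rootName);
        doc.appendChild(root);

        // Pattern.quote treats the delimiter literally, so "|", ",", or "\t" all work.
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        try (BufferedReader reader = new BufferedReader(input)) {
            String header = reader.readLine();
            if (header == null) {
                return doc; // empty file: root element only
            }
            String[] fields = splitter.split(header, -1);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                // A limit of -1 keeps trailing empty values so columns stay aligned.
                String[] values = splitter.split(line, -1);
                Element row = doc.createElement(rowName);
                for (int i = 0; i < fields.length; i++) {
                    // Missing trailing values become empty attributes.
                    String value = i < values.length ? values[i].trim() : "";
                    row.setAttribute(fields[i].trim(), value);
                }
                root.appendChild(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return doc;
    }

    /** Serializes a DOM document to a string without an XML declaration. */
    static String toXmlString(Document doc) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String feed = "ticker|coupon|maturity\nABC|2.5|06/15/2008\nXYZ|0|\n";
        System.out.println(toXmlString(convert(new StringReader(feed), "bonds", "bond", "|")));
    }
}
```

Keeping the delimiter, root name, and row name as parameters mirrors the configurability the text describes; only the source of those values (a configuration file) is left out.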
Handling Large XML Files
The most important technical aspect of the rules engine is
its ability to handle very large XML files - over 100MB. These large
files typically contained data for more than 10,000 convertible bond
instruments, with each instrument having over 200 attributes. We
adopted a divide-and-conquer strategy to avoid loading the entire
file into memory for DOM parsing. The XML splitter (based on the SAX
parser) extracted a subset of instruments at a time, from which we
built a document object for the subsequent XSLT transformation. To
reduce the size of each subset, we extracted only the relevant field
attributes from the input file using selector XSLT rules.
(Figure 3 provides a schematic representation of this process.)
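A streaming splitter of this kind can be sketched with the JDK's SAX API. This is a hedged sketch rather than the project's code: the class name `XmlSplitter`, the batch-of-N design, and the flat, attribute-only instrument layout are assumptions, and attribute selection is done here inside the SAX handler instead of with the selector XSLT rules the text describes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical SAX-based splitter. Streams a feed shaped like
 * <feed><bond a="..." b="..."/>...</feed> and hands every batch of batchSize
 * instruments to a consumer as a small XML string, so only one batch at a time
 * needs to be built into a DOM. Assumes flat instrument elements whose fields
 * are attributes; child elements are ignored.
 */
final class XmlSplitter extends DefaultHandler {
    private final String instrumentTag;
    private final Set<String> keepAttributes; // fields to copy; empty set means copy all
    private final int batchSize;
    private final Consumer<String> batchConsumer;
    private final StringBuilder batch = new StringBuilder();
    private int countInBatch = 0;

    private XmlSplitter(String instrumentTag, Set<String> keepAttributes, int batchSize,
                        Consumer<String> batchConsumer) {
        this.instrumentTag = instrumentTag;
        this.keepAttributes = keepAttributes;
        this.batchSize = batchSize;
        this.batchConsumer = batchConsumer;
    }

    /** Streams the input once, emitting batches as they fill up. */
    static void split(InputStream in, String instrumentTag, Set<String> keepAttributes,
                      int batchSize, Consumer<String> batchConsumer) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in,
                    new XmlSplitter(instrumentTag, keepAttributes, batchSize, batchConsumer));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not split feed", e);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!qName.equals(instrumentTag)) {
            return; // root and non-instrument elements are skipped
        }
        batch.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            if (keepAttributes.isEmpty() || keepAttributes.contains(name)) {
                batch.append(' ').append(name).append("=\"")
                     .append(escape(atts.getValue(i))).append('"');
            }
        }
        batch.append("/>");
        if (++countInBatch == batchSize) {
            flush();
        }
    }

    @Override
    public void endDocument() {
        flush(); // emit the last, possibly partial, batch
    }

    private void flush() {
        if (countInBatch > 0) {
            batchConsumer.accept("<batch>" + batch + "</batch>");
            batch.setLength(0);
            countInBatch = 0;
        }
    }

    /** Re-escapes characters not allowed raw inside a double-quoted attribute value. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String feed = "<feed><bond id=\"1\" cpn=\"2\" junk=\"x\"/>"
                + "<bond id=\"2\" cpn=\"4\"/><bond id=\"3\" cpn=\"1\"/></feed>";
        List<String> batches = new ArrayList<>();
        split(new ByteArrayInputStream(feed.getBytes(StandardCharsets.UTF_8)),
              "bond", Set.of("id", "cpn"), 2, batches::add);
        batches.forEach(System.out::println);
    }
}
```

Because SAX never holds more than the current batch in memory, the batch size becomes the tuning knob for the DOM and XSLT stages that follow.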
Transforming and Saving Results to a Persistent Store
The XML transformer served as the main module, coordinating
the activities of the rules engine and creating an instance of the
XML splitter to parse the XML file into smaller subsets. With each
extracted subset, the transformer used the XSLT processor to apply
the business rules. After transformation, the resulting XML data was
saved into the database using the Oracle XML SQL Utility API. (Figure
4 illustrates this process.) By carefully monitoring the performance
of the system, we determined an optimum batch size for each of the
feeds. We created intermediary XML documents for each of the vendor's
input files and used XPath to access content from these imported
documents in the main module.
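The coordinating loop can be sketched with JAXP: compile the rules stylesheet once into a `Templates` object, create a fresh `Transformer` per batch, and hand each result to a sink. This is an assumption-laden sketch, not the project's transformer: the class name `BatchTransformer` and the demo stylesheet are hypothetical, and the Oracle XML SQL Utility insert is replaced by a plain `Consumer<String>` so the example runs without a database. The demo stylesheet's ROWSET/ROW output is only loosely modeled on the layout associated with Oracle's XML SQL Utility.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical coordinator: apply compiled business rules to each batch, then store. */
final class BatchTransformer {
    /** Illustrative rules: turn each instrument's attributes into ROWSET/ROW child elements. */
    static final String DEMO_RULES = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output omit-xml-declaration="yes"/>
          <xsl:template match="/*">
            <ROWSET>
              <xsl:for-each select="*">
                <ROW>
                  <xsl:for-each select="@*">
                    <xsl:element name="{name()}"><xsl:value-of select="."/></xsl:element>
                  </xsl:for-each>
                </ROW>
              </xsl:for-each>
            </ROWSET>
          </xsl:template>
        </xsl:stylesheet>
        """;

    private final Templates rules; // compiled once; Templates objects are thread-safe

    BatchTransformer(String stylesheet) {
        try {
            rules = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(stylesheet)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("invalid stylesheet", e);
        }
    }

    /** Applies the rules to one batch; a new Transformer is used because they are not thread-safe. */
    String transform(String batchXml) {
        try {
            Transformer t = rules.newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(batchXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    /** Transforms every batch and passes the result to the sink (a stand-in for the database insert). */
    int run(List<String> batches, Consumer<String> sink) {
        int processed = 0;
        for (String batch : batches) {
            sink.accept(transform(batch));
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        BatchTransformer bt = new BatchTransformer(DEMO_RULES);
        bt.run(List.of("<batch><bond id=\"1\" cpn=\"2\"/></batch>"), System.out::println);
    }
}
```

Compiling the stylesheet once matters when a 100MB feed is processed as many small batches: only the per-batch transformation cost is paid repeatedly.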
Business Rules - XSLT
After carefully reviewing the various data and working with
the business users, we created an initial set of business rules for
each data source. From the outset we organized the XSLT rules in
layers to promote modularity and to enable the reuse of generic rules
common to multiple sources of data. We created a template for each
rule with guidelines for variable names, formatting, and rules logic.
The following are examples of the business rules:
- Rule to translate the coupon frequency: Replace numeric data 1, 2, 4, and 12 with "A" for annual, "S" for semiannual, "Q" for quarterly, and "M" for monthly, respectively. There were many such "code/decode" rules in the engine (see Listing 1).
- Rule to set the maturity date: Here the maturity date should
be set to "01/01/2001" for perpetual instruments (instruments for
which the marketSector="Pfds") where the maturity date on the
incoming file is null/blank.
- Rule to set a field based on another field: The ability to change
a field based on the content of another field was another common
rule definition (see Listing 2).
- Rule for par: For legacy euro-zone currencies, par is calculated by dividing the legacy par amount by that currency's exchange rate to the euro. We kept an XML file containing the legacy European currency codes and their euro exchange rates. We had many such rules that referenced legacy data for rules processing (see Listing 3).
- Rule to obtain the number of days between two dates: Here we use Java functions to obtain the number of days between two dates. We had many rules involving date manipulation (see Listing 4).
- Rule to extract time series elements: Here we show how to extract schedule data from an imported file. The ability to manipulate time series data from schedules was an important feature of the XSLT rules definitions (see Listing 5).
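Three of the rules above (coupon-frequency code/decode, the perpetual maturity default, and the legacy-currency par adjustment) can be sketched in XSLT 1.0 and run with the JDK's built-in processor through JAXP. This is a hedged illustration, not Listings 1 to 3: the attribute names (`cpnFreq`, `marketSector`, `maturity`, `currency`, `par`), the `legacy-rates.xml` lookup layout, the `UNKNOWN` fallback code, and the class name `BondRules` are assumptions. The lookup file is served from memory by a `URIResolver`, which JAXP uses to resolve URIs passed to `document()`; the three rates shown are the fixed euro conversion rates for those currencies.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical business rules written as XSLT and applied through JAXP. */
final class BondRules {
    private BondRules() {}

    static final String RULES = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output omit-xml-declaration="yes"/>
          <!-- Legacy-currency lookup file, loaded through document() -->
          <xsl:variable name="rates" select="document('legacy-rates.xml')/rates"/>
          <xsl:template match="/bonds">
            <bonds><xsl:apply-templates select="bond"/></bonds>
          </xsl:template>
          <xsl:template match="bond">
            <xsl:variable name="ccy" select="@currency"/>
            <xsl:variable name="fx" select="$rates/rate[@ccy = $ccy]/@perEuro"/>
            <bond id="{@id}">
              <!-- Code/decode rule: 1, 2, 4, 12 become A, S, Q, M -->
              <xsl:attribute name="couponFrequency">
                <xsl:choose>
                  <xsl:when test="@cpnFreq = 1">A</xsl:when>
                  <xsl:when test="@cpnFreq = 2">S</xsl:when>
                  <xsl:when test="@cpnFreq = 4">Q</xsl:when>
                  <xsl:when test="@cpnFreq = 12">M</xsl:when>
                  <xsl:otherwise>UNKNOWN</xsl:otherwise>
                </xsl:choose>
              </xsl:attribute>
              <!-- Perpetuals (marketSector Pfds) with a blank maturity get 01/01/2001 -->
              <xsl:attribute name="maturity">
                <xsl:choose>
                  <xsl:when test="@marketSector = 'Pfds' and normalize-space(@maturity) = ''">01/01/2001</xsl:when>
                  <xsl:otherwise><xsl:value-of select="@maturity"/></xsl:otherwise>
                </xsl:choose>
              </xsl:attribute>
              <!-- Legacy euro-zone currency: divide par by its rate to the euro -->
              <xsl:attribute name="par">
                <xsl:choose>
                  <xsl:when test="$fx"><xsl:value-of select="@par div $fx"/></xsl:when>
                  <xsl:otherwise><xsl:value-of select="@par"/></xsl:otherwise>
                </xsl:choose>
              </xsl:attribute>
            </bond>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Units of each legacy currency per euro. */
    static final String LEGACY_RATES = """
        <rates>
          <rate ccy="DEM" perEuro="1.95583"/>
          <rate ccy="FRF" perEuro="6.55957"/>
          <rate ccy="ITL" perEuro="1936.27"/>
        </rates>
        """;

    static String apply(String bondsXml) {
        try {
            // A system ID gives document() a base URI to resolve against.
            Transformer t = TransformerFactory.newInstance().newTransformer(
                    new StreamSource(new StringReader(RULES), "file:///rules/bond-rules.xsl"));
            // Serve document('legacy-rates.xml') from memory instead of the file system.
            t.setURIResolver((href, base) -> href.endsWith("legacy-rates.xml")
                    ? new StreamSource(new StringReader(LEGACY_RATES)) : null);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(bondsXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the lookup data in its own XML file lets users update reference data without touching the rules themselves.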
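The date rule relies on a Java function that a stylesheet can call. A minimal sketch of such a helper follows, assuming dates arrive as `MM/dd/yyyy` strings (the format used for the maturity default) and using `java.time`, which postdates the project. The class and method names are hypothetical, and the namespace declaration that binds a Java class into a stylesheet is processor-specific and is not shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Hypothetical static helper of the kind exposed to XSLT as an extension function. */
final class DateRules {
    private static final DateTimeFormatter US_DATE = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    private DateRules() {}

    /** Signed number of calendar days from start to end; negative if end is earlier. */
    public static long daysBetween(String start, String end) {
        LocalDate from = LocalDate.parse(start, US_DATE);
        LocalDate to = LocalDate.parse(end, US_DATE);
        return ChronoUnit.DAYS.between(from, to);
    }

    public static void main(String[] args) {
        // Days remaining from a settlement date to a maturity date.
        System.out.println(daysBetween("04/18/2002", "06/15/2008"));
    }
}
```

Keeping such helpers static and string-based makes them easy to call from rules, since XSLT 1.0 passes attribute values as strings.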
Reporting Tool
With active participation from the business user, we defined
report templates, allowing the end user to create reports with
minimal effort. In addition to the above report templates, we
provided the capability to suppress known differences/exceptions in
the daily comparison of data elements. Using the report/job
scheduler, the user could save daily production reports in XLS/PDF
formats and subsequently transfer the files to a shared network drive
for distribution to other users.
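The suppression of known differences can be illustrated with a small comparison routine. The article does not say how suppression was implemented (the reports themselves came from Cognos over the Oracle store), so the Java sketch below only shows the logic. The nested-map data shape, the `instrumentId|field` suppression key, the sample identifiers, and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical cross-vendor comparison that skips previously suppressed differences. */
final class VendorComparison {
    private VendorComparison() {}

    record Difference(String instrumentId, String field, String vendorA, String vendorB) {}

    /**
     * Each vendor map is instrumentId -> (field -> value). Suppressed entries use
     * the key "instrumentId|field" and are left out of the report.
     */
    static List<Difference> compare(Map<String, Map<String, String>> vendorA,
                                    Map<String, Map<String, String>> vendorB,
                                    Set<String> suppressed) {
        List<Difference> report = new ArrayList<>();
        // Only instruments present in both feeds are compared field by field.
        for (String id : new TreeSet<>(vendorA.keySet())) {
            Map<String, String> b = vendorB.get(id);
            if (b == null) {
                continue;
            }
            Map<String, String> a = vendorA.get(id);
            Set<String> fields = new TreeSet<>(a.keySet());
            fields.addAll(b.keySet());
            for (String field : fields) {
                String va = a.get(field);
                String vb = b.get(field); // null if the vendor does not supply the field
                if (!Objects.equals(va, vb) && !suppressed.contains(id + "|" + field)) {
                    report.add(new Difference(id, field, va, vb));
                }
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> a = Map.of(
                "XS123", Map.of("coupon", "2.5", "maturity", "06/15/2008"));
        Map<String, Map<String, String>> b = Map.of(
                "XS123", Map.of("coupon", "2.75", "maturity", "06/15/2009"));
        compare(a, b, Set.of("XS123|maturity")).forEach(System.out::println);
    }
}
```

Once a user reviews and accepts a difference, adding its key to the suppressed set keeps it out of subsequent daily reports.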
Lessons Learned
The system has been well received by our client and continues
to enjoy heavy use in their daily processing of convertible bond
data. We achieved user satisfaction by empowering the business user
with the ability to write and control the rules engine.
From a technical perspective, this project helped remind us
of several key points:
- XSLT is a powerful framework for manipulating XML data.
- XSLT can be written in simple, easy-to-understand declarative rules.
- XSLT stylesheets can be used effectively to manage rules definitions.
- Using a combination of SAX and DOM parsing with large files is effective and efficient.
- The paradigm of modular design extends well to the organization of business rules.
Copyright © 2002 SYS-CON Media, Inc. — All Rights Reserved.
The author, Mr. Sam Natarajan, is the Founder and CEO of Harvest Technology Corporation (www.HarvestTechnology.com), a leading software solutions provider for financial services clients. Mr. Natarajan has 20 years of experience developing technology solutions in the financial services industry. He holds a Master’s Degree in Computer Science from the Stevens Institute of Technology and an MBA from New York University. Mr. Natarajan worked for over 12 years at Merrill Lynch building web applications, global databases, financial models, and trading and risk management systems. In addition, his time at Merrill Lynch included four years as a derivatives trader in the Fixed Income and Equity divisions. His principal duties as CEO of Harvest Technology Corporation include the oversight and management of the company's development team of Java programmers, XML developers, data modelers, and project managers.