Defining a Mainframe Transaction's Signature with an XML Schema: How to Convert Cobol Metadata
Converting Cobol metadata into an XML Schema using regular-expression processing
By: Edgardo Burin
Jun. 28, 2005 11:00 AM
Integrating mainframe applications into an SOA often carries the burden of dealing with metadata in the form of Cobol Copybooks. Once converted to XML Schema, this metadata can serve a range of applications, from validation to the creation of services. This article explains how to automate the conversion from Copybooks to XML Schema using regular expressions.
Elementary Item is the name Cobol assigns to a data item that is not further subdivided (analogous to a variable in other languages). An Elementary Item is composed of a Level Number, a Data Name, and a Picture Clause. The Picture Clause (or PIC) declares the data format of the item.
In Cobol there are three basic data types: Alphanumeric (text strings), Numeric, and Alphabetic. Each is declared with a Picture Clause, whose basic symbols are X for Alphanumeric, 9 for Numeric, and A for Alphabetic. The number of positions the data item occupies is given as a number in parentheses: PIC X(10), for example, declares an alphanumeric item of 10 characters. There are more symbols and declaration variants, but for the sake of simplicity I will restrict the explanation to these basic formats. For more details see the References section at the end of the article.
Group Items group a set of Elementary Items (or other Group Items) together. A Group Item is composed of a Level Number and a Data Name, but has no Picture Clause. Level Numbers create a hierarchical structure in which an item groups the subordinate items that follow it (those with higher level numbers); in other words, the Level Number expresses the relationship between the items in the definition.
For example, consider a data definition composed of a Group Item called COURSES that holds information about training courses. This group includes two items: the first is an Elementary Item called COURSE-NAME, defined as a 20-position alphanumeric field, and the second is a Group Item called COURSE-ID. COURSE-ID in turn is composed of two Elementary Items: a three-character item called COURSE-TYPE and a five-position numeric item called COURSE-NUMBER. For a full description of the copybook see Listing 2. Usually Level Numbers between 1 and 49 are free to use without restrictions.
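These item rules are regular enough to match with a single regular expression. The Python sketch below is one possible take, not the author's implementation: the names `ITEM_RE`, `parse_item`, and `COPYBOOK` are hypothetical, and the level numbers in the sample copybook (01, 05, 10) are assumptions, since only the names and formats of the COURSES items are described. It also assumes the sequence-number columns are blank and handles only the PIC X(n), 9(n), and A(n) forms.

```python
import re

# Hypothetical copybook for the COURSES example. Names and PIC clauses follow
# the article's description; the level numbers 01/05/10 are assumptions.
COPYBOOK = """\
       01  COURSES.
           05  COURSE-NAME       PIC X(20).
           05  COURSE-ID.
               10  COURSE-TYPE   PIC X(3).
               10  COURSE-NUMBER PIC 9(5).
"""

# Level Number, Data Name, optional "PIC S(n)" clause (S being X, 9, or A),
# then the terminating period. Anything else on the line makes the match fail.
ITEM_RE = re.compile(
    r"^\s*(?P<level>\d{1,2})\s+(?P<name>[A-Za-z0-9-]+)"
    r"(?:\s+PIC\s+(?P<symbol>[XA9])\((?P<size>\d+)\))?"
    r"\s*\.\s*$",
    re.IGNORECASE,
)


def parse_item(line):
    """Return a dict describing an Elementary or Group Item, or None."""
    m = ITEM_RE.match(line)
    if m is None:
        return None  # comments, blank lines, or formats not covered here
    item = {"level": int(m.group("level")), "name": m.group("name").upper()}
    if m.group("symbol"):
        # A Picture Clause is present: this is an Elementary Item.
        item["kind"] = "elementary"
        item["symbol"] = m.group("symbol").upper()
        item["size"] = int(m.group("size"))
    else:
        # No Picture Clause: a Group Item whose members follow it.
        item["kind"] = "group"
    return item


for line in COPYBOOK.splitlines():
    print(parse_item(line))
```

Lines that don't match (comments, blank lines, or clauses outside the basic formats, such as COMP or OCCURS) come back as None, so a caller can skip or report them.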
Levels don't need to be contiguous: a 01 group item can, for example, directly contain 04 items without any 02 or 03 items in between. Levels 66, 77, and 88 have special meanings. Since the main purpose of this article is to present a technique for converting Cobol data definitions into XML Schema, I will restrict the Copybooks to these basic formats (Elementary Items and Group Items) and leave out other kinds of data, such as arrays. Readers who need them can extend the model to include other formats.
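Because levels need not be contiguous, the nesting can't be derived by assuming each child sits exactly one level deeper than its parent. A common approach, sketched here in Python under the assumption that items arrive as (level, name, PIC) tuples, is a stack: each item becomes a child of the nearest preceding item with a lower level number. `Node`, `build_tree`, `show`, and the sample levels (01, 04, 07) are hypothetical; levels 66, 77, and 88 are simply skipped, in line with the restriction above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One copybook item; pic is None for a Group Item."""
    level: int
    name: str
    pic: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def build_tree(items):
    """Nest (level, name, pic) tuples according to their Level Numbers.

    An item becomes a child of the nearest preceding item with a lower
    level number, so gaps between levels (01 -> 04 -> 07) are harmless.
    Special levels 66, 77 and 88 are skipped.
    """
    root = Node(level=0, name="<root>")
    stack = [root]
    for level, name, pic in items:
        if level in (66, 77, 88):
            continue
        node = Node(level, name, pic)
        # Discard siblings and their descendants until a parent is on top.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root.children


def show(nodes, depth=0):
    """Print one item per line, indented by nesting depth."""
    for node in nodes:
        suffix = f" PIC {node.pic}" if node.pic else ""
        print("  " * depth + node.name + suffix)
        show(node.children, depth + 1)


# Hypothetical COURSES items with deliberately non-contiguous levels.
EXAMPLE = [
    (1, "COURSES", None),
    (4, "COURSE-NAME", "X(20)"),
    (4, "COURSE-ID", None),
    (7, "COURSE-TYPE", "X(3)"),
    (7, "COURSE-NUMBER", "9(5)"),
]

show(build_tree(EXAMPLE))
# COURSES
#   COURSE-NAME PIC X(20)
#   COURSE-ID
#     COURSE-TYPE PIC X(3)
#     COURSE-NUMBER PIC 9(5)
```

The root sentinel has level 0, so it is never popped and every top-level item (normally 01) attaches to it.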
XML Schema 101
As I said before, an XML Schema describes the valid structure of a related XML document. XML Schemas can therefore be considered a metadata definition "from an underlying information set," in the words of the W3C. The complete XML Schema reference can be found on the W3C site (see the References section). Elements are declared in an XML Schema with the element construct and can be based on primitive datatypes or derived datatypes; derived datatypes are defined in terms of existing datatypes (primitive or not). XML Schemas let us define two kinds of types: simpleType and complexType. For example, COURSE-ID can be declared with a complexType containing a sequence of two other elements, COURSE-TYPE and COURSE-NUMBER. The sequence construct requires the child elements to appear in the order defined; with the default occurrence constraints (minOccurs and maxOccurs of 1), each appears exactly once. The ref attribute refers to a global element declared elsewhere in the schema, so in this case I need to declare COURSE-TYPE and COURSE-NUMBER elements in the same Schema:
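A Group Item maps naturally onto this construct. The Python sketch below uses the standard `xml.etree.ElementTree` module to emit the COURSE-ID declaration just described; the helpers `xs` and `group_to_element` are hypothetical. Unlike the article's unprefixed snippets, which assume XML Schema is the default namespace, it serializes with the conventional `xs:` prefix, and it assumes the schema has no targetNamespace, so that `ref="COURSE-TYPE"` resolves to a global element of that name.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize XML Schema tags as xs:...


def xs(tag):
    """Qualify a tag name with the XML Schema namespace."""
    return f"{{{XS}}}{tag}"


def group_to_element(name, child_names):
    """Declare a Cobol Group Item as an element with an anonymous complexType.

    Children are referenced with ref=..., so global element declarations
    with those names must exist elsewhere in the same schema.
    """
    element = ET.Element(xs("element"), name=name)
    complex_type = ET.SubElement(element, xs("complexType"))
    sequence = ET.SubElement(complex_type, xs("sequence"))
    for child in child_names:
        ET.SubElement(sequence, xs("element"), ref=child)
    return element


course_id = group_to_element("COURSE-ID", ["COURSE-TYPE", "COURSE-NUMBER"])
print(ET.tostring(course_id, encoding="unicode"))
```

Using ref keeps each Elementary Item declared once at the top level; an alternative design would nest the child declarations inline, at the cost of making them unreachable from other groups.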
<element name="COURSE-TYPE">
  <simpleType>
    <restriction base="string">
      <length value="3"/>
    </restriction>
  </simpleType>
</element>
The COURSE-TYPE element has a simple type based on the XML Schema primitive datatype string. I added a constraint (called a facet in XML Schema terminology) with the length element, so only strings exactly three characters long are valid. string is one of the primitive datatypes built into the XML Schema recommendation; others include boolean, decimal, float, and double. A numeric element can be declared with a similar statement, using positiveInteger as the base type and another facet, totalDigits, to limit the number of digits. Note that positiveInteger is not a primitive datatype but a derived built-in datatype; other derived built-in datatypes include normalizedString, integer, and negativeInteger.
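The Picture Clause carries everything these simpleType declarations need. The following Python sketch, with hypothetical names `PIC_RE` and `pic_to_element`, maps X(n) and A(n) to a string with a length facet and 9(n) to positiveInteger with a totalDigits facet, following the choices described above. Two assumptions are worth flagging: A is treated like X here (no pattern facet restricting it to letters), and positiveInteger rejects 0 even though a PIC 9(5) field can hold 00000, so nonNegativeInteger may be the closer match for some data. Because the output uses the `xs:` prefix, the base attributes are written as `xs:string` and `xs:positiveInteger`.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize XML Schema tags as xs:...


def xs(tag):
    """Qualify a tag name with the XML Schema namespace."""
    return f"{{{XS}}}{tag}"


# Basic forms only: X(n), A(n), 9(n), or a repeated symbol such as XXX or 99.
PIC_RE = re.compile(r"^(?P<sym>[XA9])(?:\((?P<n>\d+)\)|(?P<rep>(?P=sym)*))$")


def pic_to_element(name, pic):
    """Declare an Elementary Item as an element with a restricted simpleType."""
    m = PIC_RE.match(pic.strip().upper())
    if m is None:
        raise ValueError(f"unsupported PIC clause: {pic!r}")
    if m.group("n") is not None:
        size = int(m.group("n"))
    else:
        size = 1 + len(m.group("rep"))  # first symbol plus its repeats
    element = ET.Element(xs("element"), name=name)
    simple_type = ET.SubElement(element, xs("simpleType"))
    restriction = ET.SubElement(simple_type, xs("restriction"))
    if m.group("sym") == "9":
        # Numeric: positiveInteger limited to `size` digits by totalDigits.
        restriction.set("base", "xs:positiveInteger")
        ET.SubElement(restriction, xs("totalDigits"), value=str(size))
    else:
        # X and A both become a string of exactly `size` characters.
        restriction.set("base", "xs:string")
        ET.SubElement(restriction, xs("length"), value=str(size))
    return element


for item_name, clause in [("COURSE-TYPE", "X(3)"), ("COURSE-NUMBER", "9(5)")]:
    print(ET.tostring(pic_to_element(item_name, clause), encoding="unicode"))
```

Signed or decimal pictures such as S9(5)V99 raise ValueError here; supporting them would mean extending the pattern and mapping to decimal with totalDigits and fractionDigits.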