By Edgardo Burin | June 28, 2005 11:00 AM EDT | Reads: 16,519
Integrating mainframe applications into an SOA often carries the burden of dealing with metadata in the form of Cobol Copybooks. This metadata converted to an XML Schema format can be useful for a range of applications (from validation to creation of services). This article explains how to automate the conversion from Copybooks to XML Schema using regular expression logic.
Cobol Copybooks 101
Mainframe metadata is usually defined using a subset of the Cobol language. Mainframe developers call these descriptions Copybooks. Cobol data definition is based on a hierarchical structure composed of two different types of items: Elementary Items and Group Items.
Elementary Item is the name Cobol assigns to a data item that is not further subdivided (analogous to variables in other languages). Elementary Items are composed of: a Level Number, a Data Name, and a Picture Clause. The Picture Clause (or PIC) allows us to declare the data format of the item.
In Cobol there are three basic data types: Alphanumeric (text strings), Numeric, and Alphabetic. Each of these formats is defined using a declaration sentence associated with a Picture Clause. The basic symbols used in the Picture Clause are: X for Alphanumeric, 9 for Numeric, and A for Alphabetic. The number of positions taken up by the data item is defined with a number inside parentheses, as in PIC X(10), which means an alphanumeric composed of 10 characters. There are more symbols and variants of declarations, but for the sake of simplicity I will restrict the explanation to these basic formats. For more details see the References section at the end of the article.
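As a rough illustration of how a regular expression can pick apart these basic Picture Clauses, here is a minimal Python sketch. It is not the program from the article's listings; the names PIC_SYMBOLS, PIC_PATTERN, and parse_picture are assumptions made for this example, and only the X(n), 9(n), and A(n) forms described above are handled.

```python
import re

# Hypothetical lookup table (not from the article): the three basic PIC symbols.
PIC_SYMBOLS = {"X": "alphanumeric", "9": "numeric", "A": "alphabetic"}

# Matches "PIC X(10)", "PIC 9(5)", "PIC A(8)": one symbol, then a length in parentheses.
PIC_PATTERN = re.compile(r"PIC\s+([X9A])\((\d+)\)", re.IGNORECASE)


def parse_picture(clause):
    """Return (type_name, length) for a basic PIC clause, or None if unsupported."""
    match = PIC_PATTERN.search(clause)
    if match is None:
        return None
    symbol = match.group(1).upper()
    length = int(match.group(2))
    return PIC_SYMBOLS[symbol], length


print(parse_picture("PIC X(10)"))
```

Anything outside the three basic forms (signed or decimal pictures, for instance) simply returns None here, which keeps the sketch within the simplified subset used in this article.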
Group Items allow grouping a set of Elementary Items (or other Group Items) together. Group Items are composed of a Level Number and a Data Name, but don't contain a Picture Clause. Level Numbers create a hierarchical structure: an item contains every item that follows it with a higher Level Number, up to the next item whose Level Number is equal or lower. In other words, the Level Number expresses the relationship between the items in the definition.
For example, the following declaration:
01 COURSES.
  02 COURSE-ID.
    03 COURSE-TYPE PIC X(3).
    03 COURSE-NUMBER PIC 9(5).
  02 COURSE-NAME PIC X(20).
represents a data definition composed of a Group Item called COURSES containing information about training courses. This group includes two items: a Group Item called COURSE-ID and an Elementary Item called COURSE-NAME, which is defined as a 20-position alphanumeric field. COURSE-ID is in turn composed of two Elementary Items: a three-character alphanumeric item called COURSE-TYPE and a five-position numeric item called COURSE-NUMBER. For a full description of the copybook see Listing 2.
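The way Level Numbers imply nesting can be sketched with a regular expression and a stack. The following Python fragment is only an illustration under the simplifications used in this article (one item per line, a terminating period, no arrays); ITEM_PATTERN and parse_copybook are hypothetical names, not taken from the article's listings.

```python
import re

# Level number, data name, optional PIC clause, terminating period.
ITEM_PATTERN = re.compile(
    r"^\s*(\d{2})\s+([A-Za-z0-9-]+)(?:\s+PIC\s+(\S+?))?\s*\.\s*$"
)


def parse_copybook(text):
    """Build a tree of items; each item is a dict with level, name, pic, children."""
    root = {"level": 0, "name": None, "pic": None, "children": []}
    stack = [root]  # chain of currently open group items
    for line in text.splitlines():
        match = ITEM_PATTERN.match(line)
        if match is None:
            continue  # skip blank or unsupported lines
        item = {
            "level": int(match.group(1)),
            "name": match.group(2),
            "pic": match.group(3),  # None for Group Items
            "children": [],
        }
        # Close every open item whose level is equal to or higher than this one.
        while stack[-1]["level"] >= item["level"]:
            stack.pop()
        stack[-1]["children"].append(item)
        stack.append(item)
    return root["children"]


COPYBOOK = """
01 COURSES.
  02 COURSE-ID.
    03 COURSE-TYPE PIC X(3).
    03 COURSE-NUMBER PIC 9(5).
  02 COURSE-NAME PIC X(20).
"""

courses = parse_copybook(COPYBOOK)[0]
print([child["name"] for child in courses["children"]])
```

Because the stack only compares Level Numbers, the sketch does not care whether the numbers are contiguous; a 05 item directly under a 01 item nests the same way.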
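Level Numbers between 01 and 49 can generally be used freely. They don't need to be contiguous (for example, the items directly under a 01 group can be numbered 05 instead of 02), as long as a subordinate item has a higher number than the group that contains it. Levels 66, 77, and 88 have special meanings assigned (RENAMES clauses, standalone elementary items, and condition names, respectively).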
Since the main purpose of this article is to present a technique to convert Cobol data definitions into XML Schema, I will restrict the Copybooks to these basic formats (Elementary Items and Group Items), not including other kinds of data (like arrays). If needed, the reader can extend the model to include other formats.
XML Schema 101
Having taken a look at the basics of Cobol data definition, I will now move to our target: defining data structures in XML Schema. XML Schema allows us to define what makes an XML document valid. Schemas are defined using a vocabulary that names data items and their constraints (data types, for example). The relationship between items is also part of the schema definition.
Because an XML Schema describes the valid structure of a related XML file, it can be considered a metadata definition "from an underlying information set," in the words of the W3C. The complete XML Schema reference can be found on the W3C site (see the References section).
Elements are defined in an XML Schema with the element construct. Elements can be based on primitive datatypes or derived datatypes; derived datatypes are defined in terms of existing datatypes (primitive or not). XML Schema distinguishes two kinds of type definitions: simpleTypes and complexTypes. For example, COURSE-ID can be defined as a complexType as in:
<element name="COURSE-ID">
  <complexType>
    <sequence>
      <element ref="COURSE-TYPE"/>
      <element ref="COURSE-NUMBER"/>
    </sequence>
  </complexType>
</element>
This means COURSE-ID is a complex construct that includes a sequence of two other elements: COURSE-TYPE and COURSE-NUMBER. The sequence tag implies that the elements appear in the order defined and, by default, exactly once each. The ref attribute allows me to reference an element declared elsewhere. In this case, I will need to declare the COURSE-TYPE and COURSE-NUMBER elements in the same Schema:
<element name="COURSE-TYPE">
  <simpleType>
    <restriction base="string">
      <length value="3"/>
    </restriction>
  </simpleType>
</element>
The COURSE-TYPE element is a simple type based on the XML Schema primitive datatype string. I included an additional constraint (called a facet in XML Schema language) using the length keyword, so the definition allows only strings that are exactly three characters long. Primitive datatypes are built into the XML Schema recommendation and include, for example, string, boolean, decimal, float, and double.
Additionally, a numeric datatype can be defined using a similar statement, as in:
<element name="COURSE-NUMBER">
  <simpleType>
    <restriction base="positiveInteger">
      <totalDigits value="5"/>
    </restriction>
  </simpleType>
</element>
Here I used another facet called totalDigits to constrain the numeric values to at most five digits, matching the PIC 9(5) declaration. Also note that positiveInteger is a derived built-in datatype. Because positiveInteger excludes zero, nonNegativeInteger is the safer base if 00000 is a legal value for the field. Some examples of derived built-in datatypes are: normalizedString, integer, positiveInteger, and negativeInteger.
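To show how the two mappings above fit together, here is a hedged Python sketch that emits the same kind of declarations with the standard xml.etree.ElementTree module. It is not the article's own program: elementary_element and group_element are hypothetical helper names, only the X(n) and 9(n) pictures are supported, and the unprefixed tags assume, as the fragments in this article do, that the enclosing schema element makes the XML Schema namespace the default one.

```python
import re
import xml.etree.ElementTree as ET

# Only the two basic pictures discussed in the article: X(n) and 9(n).
PIC_PATTERN = re.compile(r"^([X9])\((\d+)\)$")


def elementary_element(name, pic):
    """Declare an Elementary Item as an element with a restricted simpleType."""
    match = PIC_PATTERN.match(pic)
    if match is None:
        raise ValueError(f"unsupported picture: {pic}")
    symbol, size = match.groups()
    # X(n) -> string with length n; 9(n) -> positiveInteger with n total digits
    # (the base type the article uses; it does not allow zero).
    if symbol == "X":
        base, facet = "string", "length"
    else:
        base, facet = "positiveInteger", "totalDigits"
    element = ET.Element("element", {"name": name})
    simple_type = ET.SubElement(element, "simpleType")
    restriction = ET.SubElement(simple_type, "restriction", {"base": base})
    ET.SubElement(restriction, facet, {"value": size})
    return element


def group_element(name, child_names):
    """Declare a Group Item as a complexType whose sequence refers to its children."""
    element = ET.Element("element", {"name": name})
    complex_type = ET.SubElement(element, "complexType")
    sequence = ET.SubElement(complex_type, "sequence")
    for child_name in child_names:
        ET.SubElement(sequence, "element", {"ref": child_name})
    return element


declarations = [
    group_element("COURSE-ID", ["COURSE-TYPE", "COURSE-NUMBER"]),
    elementary_element("COURSE-TYPE", "X(3)"),
    elementary_element("COURSE-NUMBER", "9(5)"),
]
for declaration in declarations:
    print(ET.tostring(declaration, encoding="unicode"))
```

Keeping one global declaration per Cobol item and linking them with ref mirrors the structure of the hand-written fragments; nesting anonymous types inline would work as well, but it is harder to compare against the Copybook line by line.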
Copyright © 2005 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Edgardo Burin
Edgardo Burin works for ING Canada as a solution architect in integration projects using webMethods. He works in different projects integrating mainframe transactions, MQ services, and Oracle databases using webMethods. He has more than 10 years of experience managing infrastructure. His areas of expertise are in Oracle databases, integration, and service-oriented architecture.
Carlos Magno 02/13/08 08:25:44 AM EST
Thank you, fantastic, i was search about Mainframe and xml, but i need know about how to write cobol screen on the jsp language by running browser.

Peter Prager 08/18/05 09:14:50 PM EDT
COBOL copybooks and other COBOL and C structures can be easily converted with *XML Thunder* from Canam Software. The logic to consume or create XML in COBOL or C languages can also be generated.

Caroline Williams 07/23/05 11:54:38 PM EDT
The last page of this article mentions "Listing 1". There is no link to "Listing 1". I would like to see the java program. Thanks for the excellent article.

Binoy Bastin 07/08/05 10:49:45 AM EDT
You only talked about java .Is it possible to do the same thing using .net ,also how do you writeback xml to mainframe in this way