Defining Mainframe Transaction's Signature with an XML Schema; How To Convert Cobol Metadata

Converting Cobol metadata into an XML Schema using regular expression processing

Integrating mainframe applications into an SOA often carries the burden of dealing with metadata in the form of Cobol Copybooks. Once converted to XML Schema, this metadata is useful for a range of applications, from validation to the creation of services. This article explains how to automate the conversion from Copybooks to XML Schema using regular expression logic.

Cobol Copybooks 101
Mainframe metadata is usually defined using a subset of the Cobol language. Mainframe developers call these descriptions Copybooks. Cobol data definition is based on a hierarchical structure composed of two different types of items: Elementary Items and Group Items.

Elementary Item is the name Cobol assigns to a data item that is not further subdivided (analogous to variables in other languages). Elementary Items are composed of: a Level Number, a Data Name, and a Picture Clause. The Picture Clause (or PIC) allows us to declare the data format of the item.

In Cobol there are three basic data types: Alphanumeric (text strings), Numeric, and Alphabetic. Each of these formats is defined using a declaration sentence associated with a Picture Clause. The basic symbols used in the Picture Clause are: X for Alphanumeric, 9 for Numeric, and A for Alphabetic. The number of positions taken up by the data item is defined with a number inside parentheses, as in PIC X(10), which means an alphanumeric composed of 10 characters. There are more symbols and variants of declarations, but for the sake of simplicity I will restrict the explanation to these basic formats. For more details see the References section at the end of the article.
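
The basic Picture Clause forms described above are regular enough to be recognized with a short regular expression. The following Python sketch is only an illustration, not the converter this article builds: the names PIC_PATTERN, PIC_KINDS, and parse_pic are hypothetical, and the pattern deliberately covers only the symbol(length) form, so signs, decimal points, and repeated symbols are reported as unsupported.

```python
import re

# Assumption: only the basic "PIC <symbol>(<length>)" form is handled.
PIC_PATTERN = re.compile(r"PIC\s+([X9A])\((\d+)\)", re.IGNORECASE)

# The three basic data types named in the text.
PIC_KINDS = {"X": "alphanumeric", "9": "numeric", "A": "alphabetic"}


def parse_pic(clause):
    """Return (kind, length) for a basic Picture Clause, or None if unsupported."""
    match = PIC_PATTERN.search(clause)
    if match is None:
        return None
    symbol = match.group(1).upper()
    length = int(match.group(2))
    return PIC_KINDS[symbol], length


print(parse_pic("PIC X(10)"))
print(parse_pic("PIC 9(5)."))
```

Returning None for anything outside the basic forms keeps the sketch honest about what it does not cover.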

Group Items group a set of Elementary Items (or other Group Items) together. A Group Item is composed of a Level Number and a Data Name, but has no Picture Clause. Level Numbers create a hierarchical structure in which an item contains all of the following items with higher Level Numbers, up to the next item at its own level or above. The Level Number thus expresses the relationship between the different items in the definition.

For example, the following declaration:


01 COURSES.
   02 COURSE-ID.
      03 COURSE-TYPE PIC X(3).
      03 COURSE-NUMBER PIC 9(5).
   02 COURSE-NAME PIC X(20).

represents a data definition composed of a Group Item called COURSES containing information about training courses. This group includes two items: a Group Item called COURSE-ID and an Elementary Item called COURSE-NAME, defined as a 20-position alphanumeric field. COURSE-ID is in turn composed of two Elementary Items: a three-character alphanumeric item called COURSE-TYPE and a five-position numeric item called COURSE-NUMBER. For a full description of the copybook see Listing 2.
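
Both item shapes in this example (with and without a Picture Clause) can be captured by one regular expression. The Python sketch below is a minimal illustration under stated assumptions: ITEM_PATTERN and parse_items are hypothetical names, each item is assumed to sit on a single line ending in a period, and only the X, 9, and A symbols with an explicit length are recognized.

```python
import re

# Assumed line shape: level number, data name, optional "PIC <symbol>(<n>)", period.
ITEM_PATTERN = re.compile(
    r"^\s*(?P<level>\d{2})\s+(?P<name>[A-Z0-9-]+)"
    r"(?:\s+PIC\s+(?P<symbol>[X9A])\((?P<length>\d+)\))?\s*\.\s*$"
)

COPYBOOK = """\
01 COURSES.
02 COURSE-ID.
03 COURSE-TYPE PIC X(3).
03 COURSE-NUMBER PIC 9(5).
02 COURSE-NAME PIC X(20).
"""


def parse_items(text):
    """Turn copybook lines into dictionaries; unsupported lines are skipped."""
    items = []
    for line in text.splitlines():
        match = ITEM_PATTERN.match(line)
        if match is None:
            continue  # blank line or a form this sketch does not cover
        length = match.group("length")
        items.append({
            "level": int(match.group("level")),
            "name": match.group("name"),
            "symbol": match.group("symbol"),  # None marks a Group Item
            "length": int(length) if length else None,
        })
    return items


for item in parse_items(COPYBOOK):
    kind = "group" if item["symbol"] is None else "elementary"
    print(item["level"], item["name"], kind)
```

A Group Item is identified here simply by the absence of a Picture Clause, which mirrors the definition given earlier.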

Level Numbers from 01 to 49 can usually be used freely. They don't need to be contiguous (a 01 group item can be subdivided directly into 03 or 04 items, for example); only their relative order matters. Levels 66, 77, and 88 have special meanings assigned.
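
Because only the relative order of Level Numbers matters, the hierarchy can be rebuilt with a stack: each item becomes a child of the nearest preceding item that has a smaller Level Number. The following Python sketch shows that idea under assumptions of my own. The function build_tree and its (level, name) input are hypothetical, and the special levels 66, 77, and 88 are not handled.

```python
def build_tree(items):
    """Nest (level, name) pairs by Level Number.

    An item is attached to the nearest preceding item with a smaller level.
    Assumes levels 01-49 only; 66, 77, and 88 are outside this sketch.
    """
    root = {"level": 0, "name": "ROOT", "children": []}
    stack = [root]
    for level, name in items:
        node = {"level": level, "name": name, "children": []}
        # Close every open item that is at the same depth or deeper.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


# Non-contiguous levels (01, 05, 10) give the same shape as 01, 02, 03.
tree = build_tree([
    (1, "COURSES"),
    (5, "COURSE-ID"),
    (10, "COURSE-TYPE"),
    (10, "COURSE-NUMBER"),
    (5, "COURSE-NAME"),
])
print([child["name"] for child in tree[0]["children"]])
```

The artificial level-0 root is never popped, since real Level Numbers start at 01, so the stack is never empty.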

Since the main purpose of this article is to present a technique for converting Cobol data definitions into XML Schema, I will restrict the Copybooks to these basic formats (Elementary Items and Group Items) and leave out other kinds of data (like arrays). Readers can extend the model to include other formats as needed.

XML Schema 101
Having looked at the basics of Cobol data definition, I will now move to our target: defining data structures in XML Schema. An XML Schema describes what makes an XML document valid. Schemas are defined using a vocabulary that names data items and their constraints (data types, for example). The relationships between items are also part of the schema definition.

As noted above, an XML Schema describes the valid structure of a related XML file. XML Schemas can therefore be considered a metadata definition "from an underlying information set," in the words of the W3C. The complete XML Schema reference can be found on the W3C site (see the References section).

Elements are defined in the XML Schema with the element construct. Elements can be based on primitive datatypes or derived datatypes. Derived datatypes are defined using existing datatypes (primitive or not). XML Schemas allow us to define two kinds of types: simpleTypes and complexTypes. For example, COURSE-ID can be defined as a complexType as in:


<element name="COURSE-ID"><complexType><sequence>
<element ref="COURSE-TYPE"/>
<element ref="COURSE-NUMBER"/>
</sequence></complexType></element>
This means COURSE-ID is a complex construct that includes a sequence of two other elements: COURSE-TYPE and COURSE-NUMBER. The sequence tag implies that the elements appear in the order defined and, by default, exactly once each. The ref attribute references an element declared elsewhere in the schema. In this case, I need to declare the COURSE-TYPE and COURSE-NUMBER elements in the same Schema:

<element name="COURSE-TYPE"><simpleType><restriction base="string">
<length value="3"/></restriction></simpleType></element>

The element is a simple type based on the XML Schema primitive datatype string. I included an additional constraint (called a facet in XML Schema language) using the length element. This definition allows only strings that are exactly three characters long. Primitive datatypes are built into the XML Schema recommendation and include, for example, string, boolean, decimal, float, and double.
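
A declaration of this shape can be produced mechanically once the name and length of an alphanumeric item are known. The Python sketch below uses the standard xml.etree.ElementTree module; string_element is a hypothetical helper, and the element names are left unprefixed as in the listings here, which assumes the XML Schema namespace is declared as the default namespace on the enclosing schema element.

```python
import xml.etree.ElementTree as ET


def string_element(name, length):
    """Build an element declaration restricted to strings of exactly `length` characters."""
    element = ET.Element("element", {"name": name})
    simple_type = ET.SubElement(element, "simpleType")
    restriction = ET.SubElement(simple_type, "restriction", {"base": "string"})
    # The length facet fixes the exact number of characters.
    ET.SubElement(restriction, "length", {"value": str(length)})
    return element


# COURSE-TYPE is declared as PIC X(3) in the copybook.
course_type = string_element("COURSE-TYPE", 3)
print(ET.tostring(course_type, encoding="unicode"))
```

Building the fragment as a tree rather than by string concatenation means every opening tag gets its closing tag automatically.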

Additionally a numeric datatype can be defined using a similar statement as in:


<element name="COURSE-NUMBER"><simpleType><restriction
base="positiveInteger">
<totalDigits value="5"/></restriction></simpleType></element>
Here I used another facet called totalDigits to constrain the numeric values. Also note that positiveInteger is a derived built-in datatype. Some examples of derived built-in datatypes are: normalizedString, integer, positiveInteger, and negativeInteger.
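
The numeric mapping and the COURSE-ID group declaration shown earlier can be combined into a small generator. This is a sketch under assumptions of my own rather than a finished converter: NUMERIC_PIC, numeric_element, and group_element are hypothetical names, only the PIC 9(n) form is recognized, and positiveInteger is used simply because it is the datatype chosen in the example above.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: only the basic "PIC 9(<digits>)" form is supported.
NUMERIC_PIC = re.compile(r"PIC\s+9\((\d+)\)")


def numeric_element(name, pic_clause):
    """Map a PIC 9(n) item to a positiveInteger restricted by totalDigits."""
    match = NUMERIC_PIC.search(pic_clause)
    if match is None:
        raise ValueError(f"unsupported Picture Clause: {pic_clause!r}")
    element = ET.Element("element", {"name": name})
    simple_type = ET.SubElement(element, "simpleType")
    restriction = ET.SubElement(simple_type, "restriction", {"base": "positiveInteger"})
    ET.SubElement(restriction, "totalDigits", {"value": match.group(1)})
    return element


def group_element(name, child_names):
    """Map a Group Item to a complexType whose sequence references its children."""
    element = ET.Element("element", {"name": name})
    complex_type = ET.SubElement(element, "complexType")
    sequence = ET.SubElement(complex_type, "sequence")
    for child in child_names:
        ET.SubElement(sequence, "element", {"ref": child})
    return element


for fragment in (
    group_element("COURSE-ID", ["COURSE-TYPE", "COURSE-NUMBER"]),
    numeric_element("COURSE-NUMBER", "PIC 9(5)"),
):
    print(ET.tostring(fragment, encoding="unicode"))
```

The number of digits is taken directly from the Picture Clause, so the totalDigits facet always agrees with the copybook.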

More Stories By Edgardo Burin

Edgardo Burin works for ING Canada as a solution architect in integration projects using webMethods. He works on projects integrating mainframe transactions, MQ services, and Oracle databases using webMethods. He has more than 10 years of experience managing infrastructure. His areas of expertise are Oracle databases, integration, and service-oriented architecture.
