Defining Mainframe Transaction's Signature with an XML Schema; How To Convert Cobol Metadata
Converting Cobol metadata into an XML Schema using regular expressions processing
By: Edgardo Burin
Jun. 28, 2005 11:00 AM
In a nutshell, an XML Schema is defined using primitive data types and derived data types, which are themselves built from primitive or other derived data types. The primitive data types can be any of the standard formats (for our application we will use just string and integer). Simple datatypes are declared with the <simpleType> element and include the following basic attributes: a name, a base type, and optionally a valid constraining facet. Complex datatypes are declared with the <complexType> element and are defined by extension or restriction of other datatypes.

Instead of referring to a datatype defined in another portion of the same schema, derived data types can also nest datatype definitions one inside the other. Although this kind of nested definition is less clear than one that uses references, it will be useful for automating the generation of the XML Schema from the copybook, as we will see soon. For a full description of the XML Schema obtained from the Cobol copybook, see Listing 3.
Regular Expressions 101

Regular expressions, also called regex, are used in several UNIX utilities and languages (Perl, awk, etc.). A regex lets us locate a specific pattern, or a particular sequence of characters, in a string. The pattern is defined using a rather powerful syntax built around special characters that are matched against the actual string. These special characters let us create a template against which each portion of the compared text is matched and processed in a certain way.

For example, the regular expression ^.PIC * matches a string that starts (^) with exactly one character (.), followed by the string "PIC", followed by zero or more blanks. It matches "APIC" and "BPIC" followed by two blanks, but it does not match "CCPIC" (two characters before PIC) or "PIC" (no character before PIC).

As this example shows, special characters play an essential role in regex definitions. Table 1 introduces the most common special characters used in regex. Although this is a very basic list, it will suffice for our project. For more information about regular expressions, see the reference section.
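The ^.PIC * example can be checked with a few lines of Java. This is a minimal sketch that uses the standard java.util.regex package rather than the Jakarta ORO classes discussed later; the class and method names (PicPrefixDemo, startsWithOneCharThenPic) are made up for illustration.

```java
import java.util.regex.Pattern;

public class PicPrefixDemo {
    // ^ anchors at the start of the string, . is exactly one character,
    // "PIC" is literal, and " *" is zero or more blanks.
    private static final Pattern PIC_PREFIX = Pattern.compile("^.PIC *");

    // True when the pattern is found in s (the ^ anchor forces the match to begin at index 0).
    static boolean startsWithOneCharThenPic(String s) {
        return PIC_PREFIX.matcher(s).find();
    }

    public static void main(String[] args) {
        // The four sample strings from the text above.
        for (String s : new String[] {"APIC", "BPIC  ", "CCPIC", "PIC"}) {
            System.out.println("'" + s + "' -> " + startsWithOneCharThenPic(s));
        }
    }
}
```

Running main reports true for the first two strings and false for the last two, in line with the explanation above.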
The Project

As said before, Cobol organizes the metadata in levels. To produce an XML Schema representation, I will convert any level that does not include a PIC clause (that is, any level that doesn't define a basic field) into a complexType. As one level usually includes other levels nested inside it, I will nest the complexType definitions to mimic the Cobol definition, using the syntax seen in the XML Schema section. The corollary of this rule is that any definition including a PIC clause will be considered a simpleType. We will use the length as a restriction in the definition of the field.

The Cobol example seen in the first paragraph can then be translated as a complexType called COURSES that is composed of one complexType, COURSE-ID, and one simpleType, COURSE-NAME. COURSE-ID is composed, in turn, of two simpleType fields: COURSE-TYPE and COURSE-NUMBER. With these two simple rules I can try to produce the schema. Now I will explain the tool we will use to achieve this objective.
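The two rules can be sketched in Java as follows. This is only an illustration under stated assumptions, not the program from the article: it uses java.util.regex, the class CopybookRules and its methods are hypothetical, the accepted line shape (level, name, optional PIC X(n) or PIC 9(n), final period) is a simplification, and the PIC clauses in the sample lines are invented. Mapping numeric fields to xsd:integer with a totalDigits facet is likewise an assumption, chosen because the XML Schema length facet does not apply to integers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CopybookRules {
    // Assumed line shape: level number, field name, optional "PIC X(n)" or "PIC 9(n)", final period.
    private static final Pattern FIELD = Pattern.compile(
        "^\\s*(\\d{2})\\s+([A-Z0-9-]+)(?:\\s+PIC\\s+([X9])\\((\\d+)\\))?\\s*\\.\\s*$");

    private static Matcher parse(String line) {
        Matcher m = FIELD.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized copybook line: " + line);
        }
        return m;
    }

    // Rule 1: no PIC clause means complexType. Rule 2: a PIC clause means simpleType.
    static String kindOf(String line) {
        return parse(line).group(3) != null ? "simpleType" : "complexType";
    }

    // Emits a simpleType whose restriction carries the field length taken from the PIC clause.
    static String toSimpleType(String line) {
        Matcher m = parse(line);
        if (m.group(3) == null) {
            throw new IllegalArgumentException("No PIC clause: " + line);
        }
        boolean alphanumeric = m.group(3).equals("X");
        String base = alphanumeric ? "xsd:string" : "xsd:integer";
        String facet = alphanumeric ? "xsd:length" : "xsd:totalDigits";
        return "<xsd:simpleType name=\"" + m.group(2) + "\">"
            + "<xsd:restriction base=\"" + base + "\">"
            + "<" + facet + " value=\"" + m.group(4) + "\"/>"
            + "</xsd:restriction></xsd:simpleType>";
    }

    public static void main(String[] args) {
        // Hypothetical copybook lines using the field names from the text.
        System.out.println(kindOf("01 COURSES."));
        System.out.println(kindOf("  05 COURSE-ID."));
        System.out.println(toSimpleType("    10 COURSE-TYPE PIC X(2)."));
        System.out.println(toSimpleType("  05 COURSE-NAME PIC X(20)."));
    }
}
```

Nesting the generated elements according to the level numbers is left out here; the sketch only shows how the presence or absence of a PIC clause selects the kind of type.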
The Program

The program uses a set of classes included in Jakarta ORO (mainly under org.apache.oro.text). These classes give us the basic functionality to search based on regular expressions:
import org.apache.oro.text.awk.*;
import org.apache.oro.text.regex.*;

The regex functionality is provided by three classes: Pattern, AwkMatcher, and AwkCompiler. AwkCompiler compiles a regex, as in:
Pattern pattern = compiler.compile("(\\sPIC)|(\\sVALUE)|

The compiled pattern can then be matched against a string (held here in an irecord variable) using an AwkMatcher object:

matcher.contains(irecord, pattern)
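To try the compile-then-match flow without the Jakarta ORO jar, the same idea can be sketched with the standard java.util.regex package, where Matcher.find() plays the role of contains. The sketch uses only the two alternatives that are visible in the pattern above (the original pattern continues with alternatives not shown here); the class ClauseFinder, the method firstClause, and the sample records are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClauseFinder {
    // Only the two alternatives visible in the article's pattern; the rest is not reproduced.
    private static final Pattern CLAUSE = Pattern.compile("(\\sPIC)|(\\sVALUE)");

    // Returns the first keyword found in the record ("PIC" or "VALUE"), or null if there is none.
    static String firstClause(String irecord) {
        Matcher matcher = CLAUSE.matcher(irecord);
        return matcher.find() ? matcher.group().trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical copybook records.
        System.out.println(firstClause("05 COURSE-NAME PIC X(20)."));
        System.out.println(firstClause("01 COURSES."));
    }
}
```

A record with no PIC or VALUE clause yields null, which corresponds to the levels that become a complexType.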