Industrial IoT Authors: William Schmarzo, Elizabeth White, Stackify Blog, Yeshim Deniz, SmartBear Blog

Related Topics: Industrial IoT

Industrial IoT: Article

'There's Too Much Confusion!'

'There's Too Much Confusion!'

In the past year I've encountered much confusion about XML and the role of databases. If you're among the confused, don't feel bad. XML is the technology du jour and there's a steady stream of new information. Vendors are competing for mindshare, so their white papers and product literature add to the confusion about databases and XML servers. In addition, many developers simply don't understand databases. To help reduce the confusion, this article examines some database concepts and looks at solutions related to XML, data access, databases and data integration.

XML enables us to define vocabularies for a wide spectrum of applications. Most XML applications fall into one of two categories: publishing and data integration. For convenience, assume an application that sends content to a browser for a human audience is a publishing application. A software application that emits XML for not-for-human consumption is using XML for data integration. Neither type of application is immune to problems related to version control, authentication, authorization, searching and large volumes of data.

Managing and searching XML data is harder when content exceeds a small number of files. Version control for collaboration and security for applications having sensitive information are problems if you have multiple authors or users. Another issue is the need for sophisticated indexing and searching techniques. These are required to search a large collection of documents or to execute content-based queries. A content-based query can match text in an XML document, patterns in an image, a location in an area, or all three.

Data volume can also be a problem. Publishing projects deliver pages to a browser but they may need to search large data stores. The (U.S.) National Library of Medicine's MEDLINE databases include over 10 million bibliographic citations being converted to XML. Using an XML-enabled Web browser, you'll be able to search the world's largest medical literature database for information such as the citations shown in Figure 1.

Server-to-Server Messaging
Data integration is a server-to-server scenario instead of server-to-browser. It involves an exchange of XML as data streams intended for machine processing, not individual documents to be read by a person. Servers exchanging B2B messages are an example of data integration. They can, for example, use XML to transmit and confirm purchase orders. Other examples of XML for data integration include feeding metadata to a data warehouse, content syndication and exchanging information between disparate databases.

B2B applications often extract data from databases before sending an XML document, and store information in a database after receiving one. In addition, B2B messaging may require a database as a persistent store. This provides a safety net when an application deals with a large volume of messages or supports message queuing.

The XML world is not a narrow niche, and being an XML developer today is somewhat like being a handyman. Whether you use XML for publishing or data integration, you need a mix of skills and tools. Knowledge of databases and data access techniques can be a useful addition to your skillset. If your role is that of a system architect, you'll need other skills. The system architect should understand how XML relates to objects and components, integration servers, Java, portals and messaging middleware. That's too much to cover in a single article, so I'll focus on databases and data access.

Database Concepts
People who come to XML from publishing or designing Web sites might not have database development experience (design or programming). Because they don't understand how databases work, they don't know how to use a database for problems such as concurrency in collaborative environments. They amass a collection of XML files because they don't know how to store XML in a database.

To effectively use databases for XML projects, you need to understand certain fundamentals. First, a database management system (DBMS) provides a consistent means of accessing data, particularly data that's shared among applications and multiple users. Databases come in more than one flavor. They don't all have the same physical organization or the same logical model for accessing information. Since the 1970s, the industry has published standards for databases that conform to three different logical models - linked lists, sets and objects. Objects are data plus methods that operate on that data.

Databases aren't a passive container like an XML text file. They're flexible because you can store behavior that suits your application needs. You can store XML documents in their entirety, or decompose them. In either case you can embed logic and rules in a database to help produce XML applications having consistent behavior. (Uniform rules and consistent behavior are the rationale for creating XML schema repositories such as BizTalk.org and xml.org.)

Physical and Logical Models
Successive generations of database technology produced different physical and logical models for data. The nodes or tree structure of an XML document is a hierarchical data model that first appeared in DBMS products in the 1960s. Because this type of database dates back to the mainframe era, you can still find legacy applications using a hierarchical DBMS. Some developers see XML as an excellent solution for data integration with these legacy databases. If you don't have legacy data and are looking for off-the-shelf database technology, you can expect to use relational, object or object-relational databases.

As database models evolved, so did the languages for querying databases. SQL is a query language that's been an international standard since 1986. IBM originally developed SQL as a solution for working with relational data, but there have been several standards updates. SQL is a set-oriented language that provides the ability to treat data as tables, and more recently as objects.

SQL is nonprocedural. You submit SQL statements to a query processor, and it returns a result set containing rows matching your search criteria. Other names for the rows returned by a query are recordset and rowset. Many popular SQL products use a client/ server or distributed processing architecture. The SQL server executes queries sent to it by a client application or another server, such as an XML server. SQL servers enable you to embed stored procedures and other logic in databases that serve multiple clients. A stored procedure is a precompiled SQL script that executes at the server. Because stored procedures offer performance benefits, database vendors such as Microsoft encourage their use for serving up XML from databases.

Object-Relational Databases
The early versions of SQL enabled us to store and retrieve tables of numbers and characters. However, object-relational technology changed that. With an ORDBMS you can store and retrieve images, video, text and geospatial data. Because XML is structured text, an object-relational database can treat XML data as tables or objects, and store information about a document's structure.

Today IBM, Oracle and Informix offer object-relational DBMS (ORDBMS) products having an SQL interface. Their architecture is extensible so you can support a new type by adding a server plug-in. This is similar to the way you can use a Web browser plug-in to read Adobe Acrobat documents. Browser and SQL servers share another common trait. They include a Java library and embed the Java Virtual Machine. A browser uses the JVM to execute Java applets downloaded with Web pages, but an SQL server uses the JVM to execute Java classes embedded in databases. This gives developers the capability of writing Java classes for processing documents and storing the classes in the same database as XML data, DTDs and even Web pages.

APIs and Schemas
To exploit the combination of XML and databases, a developer needs to understand schemas and APIs (application programming interfaces). Programmers writing XML applications can use different APIs for parsing documents and different APIs for accessing databases. For designing documents and databases, they also use different schema languages.

Schemas are a formal means of describing data and defining rules about it. SQL includes a Data Definition Language (DDL) for specifying database schemas. XML developers can define documents using a Document Type Definition (DTD) or XML schema. Several XML initiatives have been related to schemas, including Resource Description Framework (RDF), Document Content Description (DCD), XML Data and XML Data-Reduced (XDR). The W3C XML Schema Working Group has also developed a two-part draft specification for a schema definition language (see "XML References" at the end of this article).

Part 1 explains structures, constraints, attribute groups, model groups, derived type definitions and validation. Part 2 describes how to specify data types used in XML schemas.

SQL schemas and XML schemas aren't divergent concepts. Both enable you to define custom types to encapsulate information and provide abstraction to simplify programming. All versions of SQL include the ability to define tables with a CREATE TABLE statement. Some versions provide statements for defining objects or object views. Oracle 8.x includes a CREATE TYPE statement to define new types, including object types. Example 1 is an SQL excerpt that shows creation of an Author type for an Oracle database. Example 2 shows what the counterpart XML Schema might be for defining an Author type; it is based on an XML Schema specification that's still a draft as I write this.

Example 1: SQL

authorID VARCHAR2 (5),
firstname VARCHAR2(40),
lastname VARCHAR2(40),
middlename VARCHAR2(30),
biography VARCHAR2(512));

Example 2: XML

<Schema targetNS="http://burns/books.xsd"

<element name = "Author">
<element name = "AuthorID" type="string"/>
<element name = "Firstname" type="string"/>
<element name = "Lastname" type="string"/>
<element name = "Middlename"
type="string" minOccurs="0"/ maxOccurs="1"/>
<element name = "Biography"
type="string" minOccurs="0"/ maxOccurs="1"/>

There's no one single solution for retrieving information from databases and delivering it as XML. Likewise, there are different solutions for parsing XML data before storing it in a database.

Parsing Documents with DOM and SAX
The W3C published a specification for the Document Object Model for XML documents. There are XML processors for Java and C++ that will parse a document and create in memory a version of its DOM objects. Companies and individuals have released XML parsers that are freely available on the Web for developers using Java, C++ and other languages. If you're developing XML applications that use SQL servers from IBM, Microsoft and Oracle, you'll find all three provide a DOM-compliant parser. Microsoft's parser exposes Component Object Model (COM) interfaces to programs and scripts. IBM offers parsers for Java and C++ and Oracle offers parsers for Java, C/C++ and PL/SQL.

DOM parsers consume more memory as document size increases. This makes DOM unsuitable for applications that require large XML data streams. The Simple API for XML (SAX) is an event-driven API. A SAX parser works through an XML stream and generates an alert for events such as the start of an element. SAX programs can run in less memory than a corresponding DOM program because they do not build an in-memory structure for mapping the document to objects. This makes SAX a better choice for Java classes stored in databases. Both APIs are undergoing revision so you'll be hearing about DOM2 and SAX2.

XML Data Bindings for Java Programs
Databases can enforce certain restrictions before inserting or changing data, but it's important to validate content before sending it to a database. The current generation of DOM and SAX parsers can use a DTD to validate a document's structure. At this writing the XML Schema specification is imminent, so many DOM and SAX programmers are looking to incorporate schemas in the near future. This will make it easier for a developer to write data validation routines based on using type, structure and constraint information.

For developers working with Java, Sun has been developing XML Data Binding for Java as an alternative to the SAX and DOM APIs. This specification will permit the creation of tools that read XML schemas and generate Java classes to access XML documents. The rules for document structure and content will be reflected in generated Java code for parsing and validating an XML document. SQL Data Access

We've seen that developers can use different APIs for processing XML data. Given the widespread adoption of SQL, it should be no surprise that there are multiple APIs for accessing SQL databases. Historically, database vendors developed proprietary APIs and then adopted multidatabase APIs as standards emerged. In 1995 the ISO modified the SQL-92 standard by adding a call-level interface, and Microsoft revised the ODBC (Open Database Connectivity) API to align with that standard. ODBC returns data as result sets that conform to a logical model of rows and columns. Shortly after the introduction of Java, Sun introduced the JDBC API. JDBC enables developers to use Java, SQL and advanced objects such as arrays and disconnected rowsets. ODBC and JDBC are ubiquitous. There are ODBC and JDBC drivers for a variety of databases (see online).

Microsoft has also developed an object-based data access API named ActiveX Data Objects (ADO). ADO can return XML query results as a recordset object and use flat files or streams to make a recordset persistent. ADO's XML persistence mechanism writes a schema section followed by a data section. At present the schema conforms to XDR, but Microsoft will eventually use the XML Schema standard. To support data hierarchies such as XML documents, ADO can operate with a hierarchical recordset. Applications or scripts can save recordset objects directly into DOM objects. Microsoft's Web server technology, Active Server Pages (ASP), also provides support for XML. A program or script can save an XML recordset directly to ASP response and request objects.

Different programming and scripting languages require different SQL APIs and XML parsers. A C++ programmer, for example, can use ODBC and an XML parser for C++ to write XML programs that access databases. A Java programmer uses a Java XML parser and JDBC. A developer writing Visual Basic code or VBScript uses the COM interfaces exposed by Microsoft's XML parser and ADO. Because of the variety of programming languages and data-access APIs, the large database companies maintain a multi-API strategy. They also offer tools for developers programming with C++, Java and scripting languages such as VBScript and JavaScript.

SQL vendors have developed tools to simplify the process of using XML and SQL databases. Oracle, for example, provides an XML Developer Kit that includes a program to read a DTD and generate Java classes for processing that type of document. Oracle also developed a Java servlet (XSQL) that can return query results as XML and transform the XML content to render an HTML page.

Informix developed an XML Mapper to enable developers to store and retrieve XML data in Informix databases. XML Mapper works with SQL tables and an XML document template to generate the code for the document's objects. For example, given two tables (Authors and Books) and an XML template containing author information, you can generate a Java class that includes functionality for storing and retrieving XML. The Java code generated by XML Mapper includes getXML and setXML methods. In this scenario the getXML method returns a string containing author data. The setXML method uses a filled-in XML template to insert or update data in the database. Informix XML Mapper can also generate a servlet that enables you to take database content and render it for a Web browser.

SQL Server XML Technology Preview
Microsoft is upgrading SQL Server's XML capabilities and, in late 1999, it released an XML Technology Preview for SQL Server. The preview includes a library (SQLXML.DLL) that enables a developer to use keywords in SQL queries to return results as XML. Version of SQLXML works with an ODBC connection to SQL Server. Future releases are likely to include OLE DB connections to Oracle and other databases. Figure 2 is an example of the XML returned by SQLXML when you query the Northwind database.

The Big Picture
This article discussed some aspects of the confusion about XML and databases. The tools and products discussed here aren't the whole picture of XML and data access. Besides the SQL database vendors, other companies have developed solutions for storing and serving XML documents. These include Bluestone Software, eXcelon (formerly Object Design), POET Software, and Software AG. Space doesn't permit me to discuss them in this article, but you can browse the URLs below for more information.

XML References 1. XML Schema Part 1: Structures www.w3.org/TR/1999/WD-xmlschema-1-19991217/
2. XML Schema Part 2: Datatypes www.w3.org/TR/1999/WD-xmlschema-2-19991217/
3. JDBC Drivers http://ourworld.compuserve.com/homepages/Ken_North/jdbcvend.htm
4. ODBC Drivers http://ourworld.compuserve.com/homepages/Ken_North/odbcvend.htm
5. Bluestone XML-Suite www.bluestone.com
6. Software AG www.softwareag.com
7. eXcelon www.odi.com
8. Informix Software www.informix.com
9. Oracle www.oracle.com
10. Microsoft msdn.microsoft.com/xml

More Stories By Ken North

Ken North is a consultant, software developer, author, speaker, industry analyst, and company founder. His software technology expertise has spanned the mainframe, minicomputer, microcomputer and Internet eras. Ken teaches Expert Series seminars and is the publisher of SQLSummit.com, WebServicesSummit.com and GridSummit.com. Ken was Contributing Editor for Internet Computing, Web Techniques and Dr. Dobb's Journal. He wrote the "Database Developer" column for Web Techniques and Dr. Dobb's Sourcebook and was XML and Web Services Editor for Dr. Dobbs Journal.

Ken has consulted, spoken at conferences and taught seminars in North America, South America, Asia and Europe. He's served as conference chair for LinkedData Planet, NextWare, and the XML DevCon conferences in Europe and North America. He has programmed conference content for Penton Media, Jupitermedia, SIGS, 101 Communications, Giga Information Group and Camelot Communications.

Prior to founding Resource Group, Inc. in 1981, Mr. North held management and software engineering positions with TRW and Computer Sciences Corporation. He founded Ken North Computing (www.kncomputing.com) in 1997.

Ken wrote Database Magic with Ken North (Prentice Hall) and Windows Multi-DBMS Programming (John Wiley & Sons). He developed APIBench, the SQL API benchmarking suite and contributed to Dr. Dobb's Database Development: Tools and Techniques (R&D Books). He was a technical reviewer of JDBC Database Access with Java (Addison-Wesley) and JDBC API Tutorial and Reference: Second Edition (Addison-Wesley). Ken's articles have appeared in dozens of publications including Dr. Dobb's Journal, Software Times, Database Trends and Applications, Intelligent Enterprise, SQL Server, DB2, Business Integration Journal, XML, XML-Journal, Web Techniques, The Data Administration Newsletter, SearchDatabase, Java Pro, Software Development, DBMS, Byte, PC Week, Windows NT, Network Computing, Windows NT Systems, Windows Tech Journal.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.

IoT & Smart Cities Stories
Whenever a new technology hits the high points of hype, everyone starts talking about it like it will solve all their business problems. Blockchain is one of those technologies. According to Gartner's latest report on the hype cycle of emerging technologies, blockchain has just passed the peak of their hype cycle curve. If you read the news articles about it, one would think it has taken over the technology world. No disruptive technology is without its challenges and potential impediments t...
Nicolas Fierro is CEO of MIMIR Blockchain Solutions. He is a programmer, technologist, and operations dev who has worked with Ethereum and blockchain since 2014. His knowledge in blockchain dates to when he performed dev ops services to the Ethereum Foundation as one the privileged few developers to work with the original core team in Switzerland.
Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life settlement products to hedge funds and investment banks. After, he co-founded a revenue cycle management company where he learned about Bitcoin and eventually Ethereal. Andrew's role at ConsenSys Enterprise is a mul...
René Bostic is the Technical VP of the IBM Cloud Unit in North America. Enjoying her career with IBM during the modern millennial technological era, she is an expert in cloud computing, DevOps and emerging cloud technologies such as Blockchain. Her strengths and core competencies include a proven record of accomplishments in consensus building at all levels to assess, plan, and implement enterprise and cloud computing solutions. René is a member of the Society of Women Engineers (SWE) and a m...
If a machine can invent, does this mean the end of the patent system as we know it? The patent system, both in the US and Europe, allows companies to protect their inventions and helps foster innovation. However, Artificial Intelligence (AI) could be set to disrupt the patent system as we know it. This talk will examine how AI may change the patent landscape in the years to come. Furthermore, ways in which companies can best protect their AI related inventions will be examined from both a US and...
In his general session at 19th Cloud Expo, Manish Dixit, VP of Product and Engineering at Dice, discussed how Dice leverages data insights and tools to help both tech professionals and recruiters better understand how skills relate to each other and which skills are in high demand using interactive visualizations and salary indicator tools to maximize earning potential. Manish Dixit is VP of Product and Engineering at Dice. As the leader of the Product, Engineering and Data Sciences team at D...
Bill Schmarzo, Tech Chair of "Big Data | Analytics" of upcoming CloudEXPO | DXWorldEXPO New York (November 12-13, 2018, New York City) today announced the outline and schedule of the track. "The track has been designed in experience/degree order," said Schmarzo. "So, that folks who attend the entire track can leave the conference with some of the skills necessary to get their work done when they get back to their offices. It actually ties back to some work that I'm doing at the University of San...
When talking IoT we often focus on the devices, the sensors, the hardware itself. The new smart appliances, the new smart or self-driving cars (which are amalgamations of many ‘things'). When we are looking at the world of IoT, we should take a step back, look at the big picture. What value are these devices providing. IoT is not about the devices, its about the data consumed and generated. The devices are tools, mechanisms, conduits. This paper discusses the considerations when dealing with the...
Bill Schmarzo, author of "Big Data: Understanding How Data Powers Big Business" and "Big Data MBA: Driving Business Strategies with Data Science," is responsible for setting the strategy and defining the Big Data service offerings and capabilities for EMC Global Services Big Data Practice. As the CTO for the Big Data Practice, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He's written several white papers, is an avid blogge...
Dynatrace is an application performance management software company with products for the information technology departments and digital business owners of medium and large businesses. Building the Future of Monitoring with Artificial Intelligence. Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more busine...