Industrial IoT Authors: Pat Romanski, William Schmarzo, Elizabeth White, Stackify Blog, Yeshim Deniz

Related Topics: Industrial IoT

Industrial IoT: Article

'There's Too Much Confusion!'

'There's Too Much Confusion!'

In the past year I've encountered much confusion about XML and the role of databases. If you're among the confused, don't feel bad. XML is the technology du jour and there's a steady stream of new information. Vendors are competing for mindshare, so their white papers and product literature add to the confusion about databases and XML servers. In addition, many developers simply don't understand databases. To help reduce the confusion, this article examines some database concepts and looks at solutions related to XML, data access, databases and data integration.

XML enables us to define vocabularies for a wide spectrum of applications. Most XML applications fall into one of two categories: publishing and data integration. For convenience, assume an application that sends content to a browser for a human audience is a publishing application. A software application that emits XML for not-for-human consumption is using XML for data integration. Neither type of application is immune to problems related to version control, authentication, authorization, searching and large volumes of data.

Managing and searching XML data is harder when content exceeds a small number of files. Version control for collaboration and security for applications having sensitive information are problems if you have multiple authors or users. Another issue is the need for sophisticated indexing and searching techniques. These are required to search a large collection of documents or to execute content-based queries. A content-based query can match text in an XML document, patterns in an image, a location in an area, or all three.

Data volume can also be a problem. Publishing projects deliver pages to a browser but they may need to search large data stores. The (U.S.) National Library of Medicine's MEDLINE databases include over 10 million bibliographic citations being converted to XML. Using an XML-enabled Web browser, you'll be able to search the world's largest medical literature database for information such as the citations shown in Figure 1.

Server-to-Server Messaging
Data integration is a server-to-server scenario instead of server-to-browser. It involves an exchange of XML as data streams intended for machine processing, not individual documents to be read by a person. Servers exchanging B2B messages are an example of data integration. They can, for example, use XML to transmit and confirm purchase orders. Other examples of XML for data integration include feeding metadata to a data warehouse, content syndication and exchanging information between disparate databases.

B2B applications often extract data from databases before sending an XML document, and store information in a database after receiving one. In addition, B2B messaging may require a database as a persistent store. This provides a safety net when an application deals with a large volume of messages or supports message queuing.

The XML world is not a narrow niche, and being an XML developer today is somewhat like being a handyman. Whether you use XML for publishing or data integration, you need a mix of skills and tools. Knowledge of databases and data access techniques can be a useful addition to your skillset. If your role is that of a system architect, you'll need other skills. The system architect should understand how XML relates to objects and components, integration servers, Java, portals and messaging middleware. That's too much to cover in a single article, so I'll focus on databases and data access.

Database Concepts
People who come to XML from publishing or designing Web sites might not have database development experience (design or programming). Because they don't understand how databases work, they don't know how to use a database for problems such as concurrency in collaborative environments. They amass a collection of XML files because they don't know how to store XML in a database.

To effectively use databases for XML projects, you need to understand certain fundamentals. First, a database management system (DBMS) provides a consistent means of accessing data, particularly data that's shared among applications and multiple users. Databases come in more than one flavor. They don't all have the same physical organization or the same logical model for accessing information. Since the 1970s, the industry has published standards for databases that conform to three different logical models - linked lists, sets and objects. Objects are data plus methods that operate on that data.

Databases aren't a passive container like an XML text file. They're flexible because you can store behavior that suits your application needs. You can store XML documents in their entirety, or decompose them. In either case you can embed logic and rules in a database to help produce XML applications having consistent behavior. (Uniform rules and consistent behavior are the rationale for creating XML schema repositories such as BizTalk.org and xml.org.)

Physical and Logical Models
Successive generations of database technology produced different physical and logical models for data. The nodes or tree structure of an XML document is a hierarchical data model that first appeared in DBMS products in the 1960s. Because this type of database dates back to the mainframe era, you can still find legacy applications using a hierarchical DBMS. Some developers see XML as an excellent solution for data integration with these legacy databases. If you don't have legacy data and are looking for off-the-shelf database technology, you can expect to use relational, object or object-relational databases.

As database models evolved, so did the languages for querying databases. SQL is a query language that's been an international standard since 1986. IBM originally developed SQL as a solution for working with relational data, but there have been several standards updates. SQL is a set-oriented language that provides the ability to treat data as tables, and more recently as objects.

SQL is nonprocedural. You submit SQL statements to a query processor, and it returns a result set containing rows matching your search criteria. Other names for the rows returned by a query are recordset and rowset. Many popular SQL products use a client/ server or distributed processing architecture. The SQL server executes queries sent to it by a client application or another server, such as an XML server. SQL servers enable you to embed stored procedures and other logic in databases that serve multiple clients. A stored procedure is a precompiled SQL script that executes at the server. Because stored procedures offer performance benefits, database vendors such as Microsoft encourage their use for serving up XML from databases.

Object-Relational Databases
The early versions of SQL enabled us to store and retrieve tables of numbers and characters. However, object-relational technology changed that. With an ORDBMS you can store and retrieve images, video, text and geospatial data. Because XML is structured text, an object-relational database can treat XML data as tables or objects, and store information about a document's structure.

Today IBM, Oracle and Informix offer object-relational DBMS (ORDBMS) products having an SQL interface. Their architecture is extensible so you can support a new type by adding a server plug-in. This is similar to the way you can use a Web browser plug-in to read Adobe Acrobat documents. Browser and SQL servers share another common trait. They include a Java library and embed the Java Virtual Machine. A browser uses the JVM to execute Java applets downloaded with Web pages, but an SQL server uses the JVM to execute Java classes embedded in databases. This gives developers the capability of writing Java classes for processing documents and storing the classes in the same database as XML data, DTDs and even Web pages.

APIs and Schemas
To exploit the combination of XML and databases, a developer needs to understand schemas and APIs (application programming interfaces). Programmers writing XML applications can use different APIs for parsing documents and different APIs for accessing databases. For designing documents and databases, they also use different schema languages.

Schemas are a formal means of describing data and defining rules about it. SQL includes a Data Definition Language (DDL) for specifying database schemas. XML developers can define documents using a Document Type Definition (DTD) or XML schema. Several XML initiatives have been related to schemas, including Resource Description Framework (RDF), Document Content Description (DCD), XML Data and XML Data-Reduced (XDR). The W3C XML Schema Working Group has also developed a two-part draft specification for a schema definition language (see "XML References" at the end of this article).

Part 1 explains structures, constraints, attribute groups, model groups, derived type definitions and validation. Part 2 describes how to specify data types used in XML schemas.

SQL schemas and XML schemas aren't divergent concepts. Both enable you to define custom types to encapsulate information and provide abstraction to simplify programming. All versions of SQL include the ability to define tables with a CREATE TABLE statement. Some versions provide statements for defining objects or object views. Oracle 8.x includes a CREATE TYPE statement to define new types, including object types. Example 1 is an SQL excerpt that shows creation of an Author type for an Oracle database. Example 2 shows what the counterpart XML Schema might be for defining an Author type; it is based on an XML Schema specification that's still a draft as I write this.

Example 1: SQL

authorID VARCHAR2 (5),
firstname VARCHAR2(40),
lastname VARCHAR2(40),
middlename VARCHAR2(30),
biography VARCHAR2(512));

Example 2: XML

<Schema targetNS="http://burns/books.xsd"

<element name = "Author">
<element name = "AuthorID" type="string"/>
<element name = "Firstname" type="string"/>
<element name = "Lastname" type="string"/>
<element name = "Middlename"
type="string" minOccurs="0"/ maxOccurs="1"/>
<element name = "Biography"
type="string" minOccurs="0"/ maxOccurs="1"/>

There's no one single solution for retrieving information from databases and delivering it as XML. Likewise, there are different solutions for parsing XML data before storing it in a database.

Parsing Documents with DOM and SAX
The W3C published a specification for the Document Object Model for XML documents. There are XML processors for Java and C++ that will parse a document and create in memory a version of its DOM objects. Companies and individuals have released XML parsers that are freely available on the Web for developers using Java, C++ and other languages. If you're developing XML applications that use SQL servers from IBM, Microsoft and Oracle, you'll find all three provide a DOM-compliant parser. Microsoft's parser exposes Component Object Model (COM) interfaces to programs and scripts. IBM offers parsers for Java and C++ and Oracle offers parsers for Java, C/C++ and PL/SQL.

DOM parsers consume more memory as document size increases. This makes DOM unsuitable for applications that require large XML data streams. The Simple API for XML (SAX) is an event-driven API. A SAX parser works through an XML stream and generates an alert for events such as the start of an element. SAX programs can run in less memory than a corresponding DOM program because they do not build an in-memory structure for mapping the document to objects. This makes SAX a better choice for Java classes stored in databases. Both APIs are undergoing revision so you'll be hearing about DOM2 and SAX2.

XML Data Bindings for Java Programs
Databases can enforce certain restrictions before inserting or changing data, but it's important to validate content before sending it to a database. The current generation of DOM and SAX parsers can use a DTD to validate a document's structure. At this writing the XML Schema specification is imminent, so many DOM and SAX programmers are looking to incorporate schemas in the near future. This will make it easier for a developer to write data validation routines based on using type, structure and constraint information.

For developers working with Java, Sun has been developing XML Data Binding for Java as an alternative to the SAX and DOM APIs. This specification will permit the creation of tools that read XML schemas and generate Java classes to access XML documents. The rules for document structure and content will be reflected in generated Java code for parsing and validating an XML document. SQL Data Access

We've seen that developers can use different APIs for processing XML data. Given the widespread adoption of SQL, it should be no surprise that there are multiple APIs for accessing SQL databases. Historically, database vendors developed proprietary APIs and then adopted multidatabase APIs as standards emerged. In 1995 the ISO modified the SQL-92 standard by adding a call-level interface, and Microsoft revised the ODBC (Open Database Connectivity) API to align with that standard. ODBC returns data as result sets that conform to a logical model of rows and columns. Shortly after the introduction of Java, Sun introduced the JDBC API. JDBC enables developers to use Java, SQL and advanced objects such as arrays and disconnected rowsets. ODBC and JDBC are ubiquitous. There are ODBC and JDBC drivers for a variety of databases (see online).

Microsoft has also developed an object-based data access API named ActiveX Data Objects (ADO). ADO can return XML query results as a recordset object and use flat files or streams to make a recordset persistent. ADO's XML persistence mechanism writes a schema section followed by a data section. At present the schema conforms to XDR, but Microsoft will eventually use the XML Schema standard. To support data hierarchies such as XML documents, ADO can operate with a hierarchical recordset. Applications or scripts can save recordset objects directly into DOM objects. Microsoft's Web server technology, Active Server Pages (ASP), also provides support for XML. A program or script can save an XML recordset directly to ASP response and request objects.

Different programming and scripting languages require different SQL APIs and XML parsers. A C++ programmer, for example, can use ODBC and an XML parser for C++ to write XML programs that access databases. A Java programmer uses a Java XML parser and JDBC. A developer writing Visual Basic code or VBScript uses the COM interfaces exposed by Microsoft's XML parser and ADO. Because of the variety of programming languages and data-access APIs, the large database companies maintain a multi-API strategy. They also offer tools for developers programming with C++, Java and scripting languages such as VBScript and JavaScript.

SQL vendors have developed tools to simplify the process of using XML and SQL databases. Oracle, for example, provides an XML Developer Kit that includes a program to read a DTD and generate Java classes for processing that type of document. Oracle also developed a Java servlet (XSQL) that can return query results as XML and transform the XML content to render an HTML page.

Informix developed an XML Mapper to enable developers to store and retrieve XML data in Informix databases. XML Mapper works with SQL tables and an XML document template to generate the code for the document's objects. For example, given two tables (Authors and Books) and an XML template containing author information, you can generate a Java class that includes functionality for storing and retrieving XML. The Java code generated by XML Mapper includes getXML and setXML methods. In this scenario the getXML method returns a string containing author data. The setXML method uses a filled-in XML template to insert or update data in the database. Informix XML Mapper can also generate a servlet that enables you to take database content and render it for a Web browser.

SQL Server XML Technology Preview
Microsoft is upgrading SQL Server's XML capabilities and, in late 1999, it released an XML Technology Preview for SQL Server. The preview includes a library (SQLXML.DLL) that enables a developer to use keywords in SQL queries to return results as XML. Version of SQLXML works with an ODBC connection to SQL Server. Future releases are likely to include OLE DB connections to Oracle and other databases. Figure 2 is an example of the XML returned by SQLXML when you query the Northwind database.

The Big Picture
This article discussed some aspects of the confusion about XML and databases. The tools and products discussed here aren't the whole picture of XML and data access. Besides the SQL database vendors, other companies have developed solutions for storing and serving XML documents. These include Bluestone Software, eXcelon (formerly Object Design), POET Software, and Software AG. Space doesn't permit me to discuss them in this article, but you can browse the URLs below for more information.

XML References 1. XML Schema Part 1: Structures www.w3.org/TR/1999/WD-xmlschema-1-19991217/
2. XML Schema Part 2: Datatypes www.w3.org/TR/1999/WD-xmlschema-2-19991217/
3. JDBC Drivers http://ourworld.compuserve.com/homepages/Ken_North/jdbcvend.htm
4. ODBC Drivers http://ourworld.compuserve.com/homepages/Ken_North/odbcvend.htm
5. Bluestone XML-Suite www.bluestone.com
6. Software AG www.softwareag.com
7. eXcelon www.odi.com
8. Informix Software www.informix.com
9. Oracle www.oracle.com
10. Microsoft msdn.microsoft.com/xml

More Stories By Ken North

Ken North is a consultant, software developer, author, speaker, industry analyst, and company founder. His software technology expertise has spanned the mainframe, minicomputer, microcomputer and Internet eras. Ken teaches Expert Series seminars and is the publisher of SQLSummit.com, WebServicesSummit.com and GridSummit.com. Ken was Contributing Editor for Internet Computing, Web Techniques and Dr. Dobb's Journal. He wrote the "Database Developer" column for Web Techniques and Dr. Dobb's Sourcebook and was XML and Web Services Editor for Dr. Dobbs Journal.

Ken has consulted, spoken at conferences and taught seminars in North America, South America, Asia and Europe. He's served as conference chair for LinkedData Planet, NextWare, and the XML DevCon conferences in Europe and North America. He has programmed conference content for Penton Media, Jupitermedia, SIGS, 101 Communications, Giga Information Group and Camelot Communications.

Prior to founding Resource Group, Inc. in 1981, Mr. North held management and software engineering positions with TRW and Computer Sciences Corporation. He founded Ken North Computing (www.kncomputing.com) in 1997.

Ken wrote Database Magic with Ken North (Prentice Hall) and Windows Multi-DBMS Programming (John Wiley & Sons). He developed APIBench, the SQL API benchmarking suite and contributed to Dr. Dobb's Database Development: Tools and Techniques (R&D Books). He was a technical reviewer of JDBC Database Access with Java (Addison-Wesley) and JDBC API Tutorial and Reference: Second Edition (Addison-Wesley). Ken's articles have appeared in dozens of publications including Dr. Dobb's Journal, Software Times, Database Trends and Applications, Intelligent Enterprise, SQL Server, DB2, Business Integration Journal, XML, XML-Journal, Web Techniques, The Data Administration Newsletter, SearchDatabase, Java Pro, Software Development, DBMS, Byte, PC Week, Windows NT, Network Computing, Windows NT Systems, Windows Tech Journal.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.

IoT & Smart Cities Stories
Here are the Top 20 Twitter Influencers of the month as determined by the Kcore algorithm, in a range of current topics of interest from #IoT to #DeepLearning. To run a real-time search of a given term in our website and see the current top influencers, click on the topic name. Among the top 20 IoT influencers, ThingsEXPO ranked #14 and CloudEXPO ranked #17.
The Jevons Paradox suggests that when technological advances increase efficiency of a resource, it results in an overall increase in consumption. Writing on the increased use of coal as a result of technological improvements, 19th-century economist William Stanley Jevons found that these improvements led to the development of new ways to utilize coal. In his session at 19th Cloud Expo, Mark Thiele, Chief Strategy Officer for Apcera, compared the Jevons Paradox to modern-day enterprise IT, examin...
According to Forrester Research, every business will become either a digital predator or digital prey by 2020. To avoid demise, organizations must rapidly create new sources of value in their end-to-end customer experiences. True digital predators also must break down information and process silos and extend digital transformation initiatives to empower employees with the digital resources needed to win, serve, and retain customers.
In his keynote at 18th Cloud Expo, Andrew Keys, Co-Founder of ConsenSys Enterprise, provided an overview of the evolution of the Internet and the Database and the future of their combination – the Blockchain. Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life settl...
Contextual Analytics of various threat data provides a deeper understanding of a given threat and enables identification of unknown threat vectors. In his session at @ThingsExpo, David Dufour, Head of Security Architecture, IoT, Webroot, Inc., discussed how through the use of Big Data analytics and deep data correlation across different threat types, it is possible to gain a better understanding of where, how and to what level of danger a malicious actor poses to an organization, and to determin...
@CloudEXPO and @ExpoDX, two of the most influential technology events in the world, have hosted hundreds of sponsors and exhibitors since our launch 10 years ago. @CloudEXPO and @ExpoDX New York and Silicon Valley provide a full year of face-to-face marketing opportunities for your company. Each sponsorship and exhibit package comes with pre and post-show marketing programs. By sponsoring and exhibiting in New York and Silicon Valley, you reach a full complement of decision makers and buyers in ...
There are many examples of disruption in consumer space – Uber disrupting the cab industry, Airbnb disrupting the hospitality industry and so on; but have you wondered who is disrupting support and operations? AISERA helps make businesses and customers successful by offering consumer-like user experience for support and operations. We have built the world’s first AI-driven IT / HR / Cloud / Customer Support and Operations solution.
LogRocket helps product teams develop better experiences for users by recording videos of user sessions with logs and network data. It identifies UX problems and reveals the root cause of every bug. LogRocket presents impactful errors on a website, and how to reproduce it. With LogRocket, users can replay problems.
Data Theorem is a leading provider of modern application security. Its core mission is to analyze and secure any modern application anytime, anywhere. The Data Theorem Analyzer Engine continuously scans APIs and mobile applications in search of security flaws and data privacy gaps. Data Theorem products help organizations build safer applications that maximize data security and brand protection. The company has detected more than 300 million application eavesdropping incidents and currently secu...
Rafay enables developers to automate the distribution, operations, cross-region scaling and lifecycle management of containerized microservices across public and private clouds, and service provider networks. Rafay's platform is built around foundational elements that together deliver an optimal abstraction layer across disparate infrastructure, making it easy for developers to scale and operate applications across any number of locations or regions. Consumed as a service, Rafay's platform elimi...