Feature
What Lies Beneath
Data(base) considerations for service-oriented architectures
Apr. 26, 2005 11:00 AM
How important are data considerations to service-oriented architectures (SOA) vending Web services? Consider the following definition of Web services from AMR Research: "Web services have been commonly defined as a standardized way of integrating applications using Extensible Markup Language (XML), Simple Object Access Protocol (SOAP), Web Services Definition Language (WSDL), and Universal Description, Discovery and Integration (UDDI) standards. XML is used to tag the data; SOAP is used to transfer the data; WSDL is used for describing the services available; and UDDI is used for listing what services are available." (All the emphasis is mine).
In other words, we use technologies such as UDDI, WSDL, SOAP, and XML to find, understand, package, and format Web services - but the services themselves are all about managing data and providing data-based behavior. Now, this is neither the time nor the place to get into a "process vs. data" war. The point is simply that for service-oriented architecture to live up to expectations, data-related factors must be given serious consideration. What are these factors? Interestingly enough, from the perspective of data management they are essentially no different for a Web services architecture than for other application architectures, such as a client/server application or a conventional Web-based transaction processing application. (Conventional and Web-based together? How the times have changed.)
A note to the purists: for the purposes of this article, "service-oriented architecture" and "Web services" are used interchangeably. In truth, of course, any loosely coupled, layered architecture in which one or more layers expose their functionality as callable services may be deemed a "service-oriented architecture."
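To make the division of labor in the AMR definition concrete, here is a minimal sketch in Python of XML tagging the data and a SOAP envelope carrying it. The `GetCustomer` operation, the `urn:example:customers` namespace, and both helper functions are hypothetical, invented purely for illustration; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:customers"  # hypothetical application namespace (assumption)
NAMESPACES = {"soap": SOAP_NS, "app": APP_NS}


def build_soap_request(customer_id: str) -> bytes:
    """Wrap a hypothetical GetCustomer request in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The payload is ordinary XML: tags describe the data being exchanged.
    request = ET.SubElement(body, f"{{{APP_NS}}}GetCustomer")
    ET.SubElement(request, f"{{{APP_NS}}}CustomerId").text = customer_id
    return ET.tostring(envelope, encoding="utf-8")


def read_customer_id(message: bytes) -> str:
    """Extract the customer id on the receiving side of the exchange."""
    root = ET.fromstring(message)
    node = root.find("soap:Body/app:GetCustomer/app:CustomerId", NAMESPACES)
    if node is None or node.text is None:
        raise ValueError("message does not contain a CustomerId")
    return node.text
```

Note that everything inside the envelope is data: the plumbing only packages and transports it, which is why the data-related factors deserve the attention argued for above.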
Data Integration
Perhaps the most important aspect of data management is the integration of the various components and layers of data. The greatest threat to the success of SOA lies here: fail to integrate all your data, and the application may become increasingly disservice-oriented. As our information systems move from being soloists to joining ensembles, chorus and symphony are the name of the game. It is no longer sufficient that an application or a component of a large architecture manages its own data correctly - though let's face it, even this is often no ordinary task. In addition, the component's data must stay coordinated and "march in step" with that of other components.
In a research paper on IT Trends 2004, Philip Russom of Giga Research (a wholly owned subsidiary of Forrester Research) identified the following key trends in data integration. In the paper, Mr. Russom discusses "data integration" from a more classical, and thus somewhat different, perspective than the one I have in mind; hence, although the bullet titles are borrowed from the paper, the ordering, interpretation, and corresponding descriptions are my own.
Convergence between data integration and application integration. No longer can data integration be viewed as a concern distinct from (and independent of) application integration. If we consider a Web services architecture as a collection of semi-independent applications, then integrating these applications into a single cohesive architecture requires attention to how the corresponding data repositories integrate, not just at the physical (database) level but, more importantly, at the conceptual and transactional levels.
Distributed architectures for data integration. Today, traditional (or classical) data integration has come to connote the bringing together of widely disparate and distributed data by various means such as data extraction-transformation-loading (ETL) and enterprise information integration (EII), which requires a distributed architectural approach (and curiously enough, may also use Web services!). On the other hand, while SOA may not quite be "distributed" in nature (or at least by definition), it is certainly not a conventional, centralized architecture: it falls, as a loosely coupled architecture, somewhere in the middle. From a data integration perspective, this loose coupling leads to considerations that are very similar to those in distributed data architectures: we would still have to deal with disparate data sources and data formats, and data integration is still de rigueur.
Real-time data integration. Perhaps the greatest data management challenge for service-oriented architectures is real-time data integration. Whereas conventional data integration - such as ETL - often contends with (and is content with) data integration with a degree of latency, the on-demand nature of Web services requires an on-demand integration of data. It is important to understand, however, that "on-demand" is not quite the same as "on-the-fly." On-demand (or real-time) data integration typically requires significant up-front planning, design, and development efforts in order for the promise to be realized. Once again, think symphony.
Virtual and federated data integration. If the problem is distributed data architectures and the need for real-time data integration, then consider using data virtualization techniques in your SOA. Data virtualization is a "required competency" for service-oriented architectures, and it has been reborn in the forceful new avatar of Enterprise Information Integration (EII): it is now available in the form of sophisticated, off-the-shelf solutions that you more or less plug into your SOA. Of course, this does not imply that the need for a planned approach to data integration is diminished in any way - in fact, quite the contrary.
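As a rough illustration of on-demand, virtualized integration, the sketch below answers each service request by reading two separate sources at call time instead of copying them into a combined store beforehand. Everything in it is an assumption made for the example: the `VirtualCustomerService` facade, the `customers` table, and the in-memory order dictionary standing in for a second system. It is not a depiction of any EII product.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CustomerView:
    """Integrated record returned to the service consumer."""
    customer_id: int
    name: str
    open_orders: int


class VirtualCustomerService:
    """Hypothetical facade that federates two sources at request time."""

    def __init__(self, crm: sqlite3.Connection, orders: Dict[int, List[dict]]):
        self._crm = crm        # source 1: a relational customer store
        self._orders = orders  # source 2: stand-in for a separate order system

    def get_customer(self, customer_id: int) -> Optional[CustomerView]:
        row = self._crm.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        # Read the second source now, so the answer reflects its current state.
        open_count = sum(
            1 for order in self._orders.get(customer_id, [])
            if order["status"] == "open"
        )
        return CustomerView(customer_id, row[0], open_count)


def build_demo():
    """Set up two tiny sample sources and the facade over them."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    crm.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")
    orders = {1: [{"status": "open"}, {"status": "shipped"}]}
    return VirtualCustomerService(crm, orders), orders
```

Because nothing is copied ahead of time, a new order shows up in the very next call, with none of the latency of a batch load. The up-front work has not gone away, though: someone still had to decide how the two sources relate and what the integrated view should look like.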
About Rajan Chandras
Rajan Chandras is a principal consultant with the New York offices of CSC Consulting (www.csc.com). The article is written in his personal capacity and not on behalf of or representing CSC.