
Web-Services Transactions

From Loosely Coupled - The Missing Pieces of Web Services

Most non-programmers think of transactions as associated with buying and selling, credit-card authorizations, and the like. But in the jargon of computer science, the word transaction has a very specific meaning: the interaction and managed outcome of a well-defined set of tasks. If that definition still sounds rather vague or abstract, it's because the scope of what's considered a transaction has expanded over the past two decades, and the older, simpler definitions are no longer adequate. Computer systems have been connected via networks, and applications are more distributed in nature. The theories and practices of transactions have been repeatedly stretched to their limits, re-evaluated, and extended. Now, because of web services, we're once again expanding that definition to include long-lived, loosely coupled asynchronous transactions.


Transaction Basics
Most database operations are simple, and thus don't qualify as transactions per se. For example, when a customer-service application wants to look up a customer's phone number, the application sends a query message to the database. It's a read-only operation that involves only one record in a single database. But most importantly, it's a one-step (atomic) operation that doesn't interact or conflict with other applications that may be interacting with the same record or even the same database.

More complex database operations require multiple steps that must all be completed for the operation to succeed. We refer to these operations as transactions. The traditional definition of a transaction is a single unit of work composed of two or more tasks. If any of these component tasks cannot be completed, the entire transaction fails, leaving the data in the state it was in before the transaction was initiated. In other words, a transaction is a collection of tasks that either all succeed, or all fail. Achieving this consistent termination of a unit of work is the goal of a traditional transaction-processing monitor (TP monitor), software that manages lower-level database operations.

An example of a simple transaction is a transfer of funds from one account to another within the same bank. The transaction's unit of work consists of two tasks: the debiting from one account, and the crediting to another. Ideally, both tasks will execute properly (commit), but even more important is that if one task can't be accomplished, neither will be executed (i.e., they'll both abort). It's okay if the matching credit and debit both fail—the application initiating the transaction can always try again. But it's a serious problem if the credit is executed without the associated debit, or vice versa.
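The all-or-nothing behavior described above can be sketched in code. This is a minimal illustration using Python's standard sqlite3 module; the `accounts` table and the `transfer` function are invented for the example, not taken from the book:

```python
# Minimal sketch of an atomic funds transfer: both tasks (debit and credit)
# commit together, or neither takes effect.
import sqlite3

def transfer(conn, from_acct, to_acct, amount):
    """Debit one account and credit another as a single unit of work."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, from_acct))
            cur = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (from_acct,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # forces a rollback
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, to_acct))
        return True   # both tasks committed
    except ValueError:
        return False  # neither task took effect

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()

assert transfer(conn, "A", "B", 60) is True
assert transfer(conn, "A", "B", 60) is False  # would overdraw: both tasks abort
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {'A': 40, 'B': 60}
```

The second transfer fails partway through, and the rollback restores the data to its state before the transaction was initiated, exactly the "all succeed or all fail" behavior described above.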

As a result of their theoretical studies of transactions, Theo Härder and Andreas Reuter published a 1983 paper, "Principles of Transaction-Oriented Database Recovery," in which they presented the requirements for systems that could process multiple-task units of work (transactions), and would not be corrupted by hardware, database, or operating-system failures. The paper is most famous for its specification of the principles of Atomicity, Consistency, Isolation, and Durability (ACID). A system that conforms to these so-called ACID properties guarantees the reliability of its transactions.

Two-Phase Commit
When all of the data involved in a transaction resides on a single database, only one transaction manager (TM) is required to maintain atomicity. But applications and databases are increasingly distributed, such as those linked by web services. The challenge for web services is to maintain atomicity by guaranteeing the mutual success and durability of all of the elements of such a distributed transaction, so named because it involves a distributed unit of work. In other words, multiple steps are required that involve two or more databases.

The traditional method for handling distributed transactions is known as the two-phase commit, which, as its name implies, breaks transactions into two cooperating phases. The two-phase commit protocol is illustrated in Figure 1.
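The protocol's shape can be sketched in a few lines of code. This is an idealized illustration, assuming reliable delivery and no timeouts; the class and function names are invented for the example:

```python
# Minimal two-phase commit coordinator (illustrative only: message loss,
# timeouts, and durable logging are not modeled).
class ResourceManager:
    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.prepared = False
        self.committed = False

    def prepare(self, work):
        # Phase one: durably stage the work, then vote yes or no.
        if not self.can_prepare:
            return False          # "abort" vote
        self.staged = work
        self.prepared = True
        return True               # "prepared" vote

    def commit(self):
        # Phase two: make the staged work permanent.
        assert self.prepared
        self.committed = True

    def abort(self):
        self.staged = None
        self.prepared = False

def two_phase_commit(work, resource_managers):
    # Phase one: the coordinator asks every participant to prepare.
    votes = [rm.prepare(work) for rm in resource_managers]
    if all(votes):
        # Phase two: everyone voted yes, so everyone commits.
        for rm in resource_managers:
            rm.commit()
        return "committed"
    # Any "no" vote aborts the entire unit of work.
    for rm in resource_managers:
        rm.abort()
    return "aborted"

rms = [ResourceManager("debit-db"), ResourceManager("credit-db")]
result = two_phase_commit({"transfer": 100}, rms)
print(result)  # committed
```

Because every participant promises success in phase one, the coordinator never has to ask whether phase two worked, which is precisely what makes a lost commit message so troublesome, as discussed next.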


The two-phase commit process assures the atomicity of the distributed transaction. It's clean and simple—except when things go wrong. Due to hardware, software, or communications failures, it's possible that one or more messages may be lost, resulting in an uncertain state for one or more of the resource managers. As it turns out, however, only the loss of a commit message can cause a serious problem. Losses of other message types are less critical. If a resource manager fails to get the request-to-prepare message, it will simply fail to respond. The controller will give up waiting for the resource manager's response and send out an abort message. The other resource managers will not have committed any of their changes. The same occurs if one or more of the response messages is lost. And if a done message is lost, no action need be taken, since all of the resource managers will have committed the transaction.

The most serious problem occurs when a resource manager prepares for the transaction but never receives either a commit or an abort message from the transaction coordinator. Once a resource manager has sent its prepared response, it's in limbo. It can't commit the transaction, and it can't release any resources locked on behalf of the transaction. (Resource locks are under the control of the individual resource managers, not the transaction controller.)

In fact, there's no simple solution to this problem. No two-phase commit protocol can protect against all failures. The possibility will always exist that a communications failure can cause a resource manager to become blocked, or unable to commit or abort. Still, even with its limitations, the two-phase commit protocol remains the mainstay of distributed transactions.

The Web-Services Challenges
The ACID model has been the focus of transaction technologies for twenty years. It's widely used for both local and—via the two-phase commit protocol—distributed transaction systems. But as valuable as the ACID model has proven to be for tightly coupled distributed systems, it falls short for long-lived, loosely coupled asynchronous transactions.

Long-lived transactions
Web services are far more complex in terms of time and space than the transactions for which the ACID concepts were developed. Whereas ACID-based transactions may span many seconds or even a few minutes, loosely coupled web-services transactions may extend over hours or even days. Considerable time can elapse between the preparation and commit phases. Using ACID-style transactions in such long-running business processes would mean that participating resources could be locked and unavailable for extended periods of time—which is unacceptable to the many local applications that use the same databases and must wait until the resources they require are released.

ACID-style transactions are designed to cope with failures in hardware, software, and communications, but only in otherwise reliable environments where such failures occur relatively infrequently. Most ACID-style distributed-transaction systems are based on synchronous, connection-oriented protocols, which maintain communications paths between transaction coordinators and the participating resource managers for at least the duration of the transaction. These synchronous protocols assist in handling such errors by signaling the transaction-coordinator or resource-manager software when a communication failure occurs, so that the coordinator or resource manager knows it can no longer communicate with the service at the other end of the connection. When a communications link fails, all synchronous transactions that depend on that link are promptly aborted.

Short-term communications failures are therefore fatal errors for tightly coupled synchronous transactions, but they must be routinely handled by the systems that support long-lived, loosely coupled asynchronous transactions. The latter are based on a reliable-messaging infrastructure that delivers messages with a high degree of assuredness, even in cases where the recipient and the intervening infrastructure may be down for extended periods of time.

Because the resource locks typically used with ACID-style transactions may block applications, it's critical that they be held for as short a time as possible. If an application dies after locking a resource, that resource could be orphaned forever. If the resource in question represents the availability of an airline seat, that seat might never be filled. A resource manager therefore manages its resources like a mother hen, making sure that locked resources are never abandoned. If a local application requests a lock and then terminates, the resource manager must clean up the mess by unlocking the resource. Before a resource manager allows transactions to be initiated by remote transaction coordinators, a great deal of trust must exist among the resource manager, the remote coordinators, and other resource managers participating in the transactions.

Suppose it's not the link that fails, but rather the remote transaction coordinator. Although the messaging software won't signal a communications error (the communications link is still operational), the local resource manager has the ultimate fallback: It can rely on timeouts to protect its resources. Unfortunately, timeouts can't be used for long-lived transactions, because by definition they execute over extended periods. Again, the techniques that support ACID-style transactions won't work with those that are long-lived, loosely coupled, and asynchronous.
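A resource manager's timeout fallback amounts to granting locks as leases that expire on their own. A minimal sketch, with invented class and parameter names, might look like this:

```python
# Sketch: a resource manager protecting itself with lock leases, so a dead
# or unreachable coordinator can't orphan a resource forever. Illustrative
# only; real resource managers also log and recover leases durably.
import time

class LeasedLockManager:
    def __init__(self, lease_seconds):
        self.lease = lease_seconds
        self.locks = {}  # resource -> (owner, lease expiry time)

    def acquire(self, resource, owner, now=None):
        """Grant the lock if it's free, expired, or already held by owner."""
        now = time.monotonic() if now is None else now
        holder = self.locks.get(resource)
        if holder and holder[1] > now and holder[0] != owner:
            return False  # lease still alive: the resource stays protected
        self.locks[resource] = (owner, now + self.lease)
        return True

mgr = LeasedLockManager(lease_seconds=30)
assert mgr.acquire("seat-12A", "coordinator-1", now=0.0)
assert not mgr.acquire("seat-12A", "coordinator-2", now=10.0)  # lease alive
assert mgr.acquire("seat-12A", "coordinator-2", now=40.0)      # lease expired
```

The sketch also shows why the technique breaks down for long-lived transactions: any lease short enough to free orphaned resources promptly is far shorter than a transaction that runs for hours or days.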

Cancellation risks and abuses
External web services introduce a number of risks just by exposing internal systems to access by others. Allowing externally initiated transactions increases what's known as cancellation risk. For example, consider airline seats purchased at full price a few months before the flight. If they're cancelled at the last minute, the airline may be unable to sell them.

The problem becomes more acute when business processes are automated by web services, because accidental or even intentional abuse can so easily go undetected. For example, imagine how an unethical travel aggregator might exploit an airline-reservation web service. Months in advance, the aggregator reserves every available seat on a particular flight—but at the last minute, cancels them. In a panic to sell the seats, the airline puts them on sale at a deep discount. The unscrupulous travel aggregator then repurchases the same seats at this much lower price.

Accepting a reservation carries an inherent risk of such a last-minute cancellation. This problem exists even without web services, but there are systems in place to detect and prevent most abuses. Airlines manage this risk through overbooking. Concert and theater ticket agencies protect themselves using no-refund policies. But many other businesses - particularly those in wholesale trade - have no formal methods for managing cancellation risks. The risks and abuses of cancellations will probably increase and spread to other industries as external web services are deployed. Web services will ultimately need to express and negotiate the policies under which such transactions are made.

Loosely Coupled Transactions
Clearly, the web-services requirements for transactions far exceed what can be accomplished using traditional technologies. The more loosely we couple systems - separating them in time, space, and control - the more difficult it becomes to manage transactions distributed among them. Loosely coupled transactions, it would seem, come at a cost of increased complexity. That's true, but only so long as we keep trying to apply, refine, and improve traditional approaches based on ACID-style concepts. Instead, let's consider how we can build an all-new transactional system based on loosely coupled web services technologies: asynchronous communications, reliable messaging, and document-style interaction. Let's use an example of a tightly coupled transaction, then see how it can be improved.

You're in your car, listening to the radio, when you hear an announcement that your favorite musician will be performing in your town. You grab your cell phone and dial the ticket-sales agency. A friendly salesperson answers the phone, and you launch into your request - only to be interrupted by the salesperson telling you, "I'm sorry, but our computers are down right now, and we don't know when they'll be back up. You'll have to call again later."

You've just stumbled into one of the drawbacks of synchronous transactions: In this case, there's nothing you can do but abort the transaction. You (the requestor) and the reservation system (the provider) must be available simultaneously. There's no point leaving your information with a salesperson who's just an intermediary, with no store-and-forward capability. Even if the salesperson were willing to take down your information, would you trust that person to complete your order? The responsibility for recovering from the system failure and restarting the transaction falls entirely on you, the requestor.

Half an hour later, you call back (retry), and learn that the system is now available. Of course the context of your transaction has been lost, so you've got to start from the very beginning. As luck would have it, the agent submits your request only to report, "Sorry, but all of the orchestra seats are now sold out. The best I can do is row J, seats 103 and 104 in the upper mezzanine." For a period of a few minutes, the reservation system locks the database records that represent those two seats while you make up your mind. If other customers are placing orders through different agents, they won't be offered those same seats. (This is now a synchronous transaction.)

You tell the agent you'll take the tickets, but your cell phone goes dead just as you're about to jot down your confirmation number. Now what? Did the transaction complete? Do you really have two tickets for the concert, or do you need to call back and place another order? If you do, will you end up with four tickets instead of two? Unfortunately, there's no way to know. Such are the problems of tightly coupled transactions without a reliable asynchronous messaging infrastructure.

Wouldn't it be great if you could just leave a voice-mail message (a self-contained document) including not only the obvious details, but instructions (the business logic) for what to do in case your first-choice seats aren't available? Your voice-mail message would then enter a message queue along with those of other customers, and be processed in sequence. As a result of your request, the ticket agency would call you back or send you an email message confirming your purchase. The acknowledgement would complete this long-lived, loosely coupled asynchronous transaction.
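In web-services terms, that voice-mail message is a self-contained document: the data and the fallback business logic travel together. A hypothetical order document (the field names and values are invented for illustration) might look like this:

```python
# Sketch of a self-contained, document-style order: the request carries its
# own fallback instructions, so no live dialogue with an agent is needed.
import json

order = {
    "customer": "you@example.com",
    "event": "favorite-musician-concert",
    "quantity": 2,
    "preferences": [                 # the business logic travels with the data
        {"section": "orchestra"},    # first choice
        {"section": "mezzanine"},    # acceptable fallback
    ],
    "max_price_per_ticket": 85,      # decline the order if nothing fits
    "confirm_via": "email",          # how to send the acknowledgement
}

message = json.dumps(order)  # the document dropped into the message queue
print(message)
```

Because the document is complete in itself, it can sit in a queue until the reservation system is ready, and the confirmation that eventually comes back closes the loop on the long-lived transaction.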

Long-lived transactions
By communicating asynchronously, you've eliminated the real-time constraint of the transaction. You can make your request in the middle of the night. Even if a human agent must review your order, that person need not be available at the time you submit it. Although the vendor's voice-mail system must be able to accept calls at a reasonable rate, the actual transaction system that processes the request is highly scalable. Even if the transaction system goes offline, all orders will get processed in due time as long as customers can submit voice-mail orders. You can see how a reliable asynchronous messaging system is key to long-lived, loosely coupled asynchronous transactions.

Isolation without locking
You've also eliminated the need for record locking. So long as all requests are submitted through a single queue, the ticket agency can process its requests serially. And provided only one ticket request is being processed at a time, the application doesn't need to simulate serialization by locking resources.
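The serial-processing idea can be sketched directly. This is an illustrative model, with invented data and request fields, showing that a single queue-draining worker gives isolation with no locks at all:

```python
# Sketch: serial processing from a single queue yields isolation without
# record locking, because requests never interleave.
from queue import Queue

seats_available = {"orchestra": 2, "mezzanine": 4}  # shared state, unlocked

def process(request):
    """Handle one ticket request; preferences are tried in order."""
    for section in request["preferences"]:
        if seats_available[section] >= request["quantity"]:
            seats_available[section] -= request["quantity"]
            return f"confirmed: {request['quantity']} in {section}"
    return "declined: no seats match your preferences"

inbox = Queue()
inbox.put({"quantity": 2, "preferences": ["orchestra", "mezzanine"]})
inbox.put({"quantity": 2, "preferences": ["orchestra", "mezzanine"]})

# One worker drains the queue. Since only one request is in flight at a
# time, no record ever needs to be locked against a concurrent reader.
results = []
while not inbox.empty():
    results.append(process(inbox.get()))
print(results)
```

The first request takes the last orchestra seats, and the second falls back to the mezzanine; there's no window in which two customers can be offered the same seats.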

Compensating Transactions
Once a transaction has been committed, it can no longer be aborted. Yet in the real world, there are often times when the effects of a transaction must be undone. The problem is that some transactions can't be reversed because their effects are permanent, and/or conditions have so changed over time that restoring the previous state would be inappropriate. As an example, consider a transaction that triggers the manufacturing of an item. Materials are consumed, and money is spent. It's impossible to simply wipe out the transaction. You can't un-manufacture the item. Instead, other actions must be taken, such as charging the customer a cancellation fee and offering the item for sale to other parties. In the earlier example of an unscrupulous travel aggregator who cancelled airline tickets at the last minute, we saw how the airline chose to put those seats on sale at a discount in order to make sure they'd be sold and the airplane would be full.

These are examples of compensating transactions that can be applied after an original transaction has been committed in order to undo its effects, without necessarily returning resources to their original states. Many transaction managers support compensating transactions - and as we'll see in the case of long-lived, loosely coupled asynchronous web services, compensating transactions can actually be used instead of resource locking.

ACID-style transactions are optimistic, and assume a high likelihood of success. You can imagine a human coordinator of a simple two-phase commit transaction commanding the participants. Phase One: "Okay, here's what you need to do. [Coordinator enumerates the requirements.] Has everyone prepared for the transaction by safely storing the results? Good." Phase Two, after receiving affirmative votes from all participants: "Now everyone...GO!" There's no need for the coordinator to ask whether anyone was unsuccessful, since all of the participants promised in Phase One that they could do as requested. The key to the success (and integrity) of the transaction is the locking of the resources between these two phases.

On the other hand, a loosely coupled transaction coordinator must take a pessimistic view of a transaction's outcome. Even with a reliable messaging protocol, many other errors can occur due to the long-term nature of the transaction. Rather than reserve their resources in advance, loosely coupled participants prepare compensating transactions that will undo the local effects in case the first phase is unsuccessful. If the transaction is later aborted, all participants execute their compensating transactions.

When using compensating transactions, our human coordinator might say in Phase One, "Okay, here's what you need to do. Don't do it yet, but in case this doesn't work, I want each of you to figure out ahead of time how to recover. Now everyone...GO!" Then, in Phase Two: "Great...did that work for everyone, or do we all need to run our back-out scenarios?"
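That back-out conversation maps onto a simple control structure, sometimes described as a saga-like pattern. A minimal sketch, with invented function names and a simulated failure for illustration:

```python
# Sketch of compensation-based coordination: each participant does its work
# immediately and registers a compensating action to undo the local effect
# if the overall unit of work later fails.
def run_with_compensation(steps):
    """steps: list of (do, undo) callables. Returns 'committed' or 'aborted'."""
    done = []
    for do, undo in steps:
        try:
            do()
            done.append(undo)
        except Exception:
            # Run the compensating transactions in reverse order.
            for compensate in reversed(done):
                compensate()
            return "aborted"
    return "committed"

log = []

def charge_card():
    raise RuntimeError("payment declined")  # simulated failure

steps = [
    (lambda: log.append("seat reserved"),
     lambda: log.append("reservation cancelled, fee charged")),
    (charge_card, lambda: None),
]
result = run_with_compensation(steps)
print(result)  # aborted
print(log)     # ['seat reserved', 'reservation cancelled, fee charged']
```

Note that the compensation doesn't restore the original state—the cancellation fee remains charged—which is exactly the point: the effects are undone by a new forward action, not by holding locks until the outcome is known.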

Compensating transactions are one of the technologies that decouple systems from one another, and are a first step towards filling in the missing pieces of complex web services.

IBM, Microsoft, and BEA are at work on WS-Coordination, a framework that supports multiple coordination types, including WS-AtomicTransaction for short-lived "all-or-nothing" transactions, and WS-BusinessActivity for long-lived, loosely coupled transactions using compensation.

Sun, Oracle, Iona and others have announced plans for WS-CAF, the Web Services Composite Application Framework, for transactions and coordination of interdependent web services. And the OASIS Business Transaction Technical Committee is continuing to develop BTP, the Business Transaction Protocol, but they're awaiting implementations so that they can progress it towards a full OASIS standard.

The issues are both political and technical. Because the traditional mechanisms for handling distributed transactions don't work for web services, the standards for web-services transactions will be some of the last to be developed, agreed to, and adopted. Most experts don't expect much impact from these competing standardization efforts until 2005.

This article is excerpted from Loosely Coupled - The Missing Pieces of Web Services (ISBN 1881378241). ©2003 Doug Kaye.

More Stories By Doug Kaye

Doug Kaye is the CEO of RDS Strategies LLC and the publisher of the IT Strategy Letter.

