Web-Services Transactions

From Loosely Coupled - The Missing Pieces of Web Services

Most non-programmers think of transactions as associated with buying and selling, credit-card authorizations, and the like. But in the jargon of computer science, the word transaction has a very specific meaning: the interaction and managed outcome of a well-defined set of tasks. If that definition still sounds rather vague or abstract, it's because the scope of what's considered a transaction has expanded over the past two decades, and the older, simpler definitions are no longer adequate. Computer systems have been connected via networks, and applications have become more distributed in nature. The theories and practices of transactions have been repeatedly stretched to their limits, re-evaluated, and extended. Now, because of web services, we're once again expanding that definition to include long-lived, loosely coupled asynchronous transactions.

 

Transaction Basics
Most database operations are simple, and thus don't qualify as transactions per se. For example, when a customer-service application wants to look up a customer's phone number, the application sends a query message to the database. It's a read-only operation that involves only one record in a single database. But most importantly, it's a one-step (atomic) operation that doesn't interact or conflict with other applications that may be interacting with the same record or even the same database.

More complex database operations require multiple steps that must all be completed for the operation to succeed. We refer to these operations as transactions. The traditional definition of a transaction is a single unit of work composed of two or more tasks. If any of these component tasks cannot be completed, the entire transaction fails, leaving the data in the state it was in before the transaction was initiated. In other words, a transaction is a collection of tasks that either all succeed or all fail. Achieving this consistent termination of a unit of work is the goal of a traditional transaction-processing monitor (TP monitor), which is software that manages lower-level database operations.

An example of a simple transaction is a transfer of funds from one account to another within the same bank. The transaction's unit of work consists of two tasks: the debiting from one account, and the crediting to another. Ideally, both tasks will execute properly (commit), but even more important is that if one task can't be accomplished, neither will be executed (i.e., they'll both abort). It's okay if the matching credit and debit both fail—the application initiating the transaction can always try again. But it's a serious problem if the credit is executed without the associated debit, or vice versa.
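The all-or-nothing behavior of the funds transfer can be sketched in a few lines. This is a minimal illustration, assuming Python's built-in sqlite3 module and a hypothetical accounts table; the account names, balances, and the transfer function are invented for the example and do not come from any particular banking system.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one unit of work: both happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: the debit above is rolled back, the credit never runs.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

# Hypothetical data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both tasks commit
failed = transfer(conn, "alice", "bob", 500)  # both tasks abort
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed transfer, the balances are the same as they were before it started, so the initiating application can simply try again.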

ACID
As a result of their theoretical studies of transactions, Theo Härder and Andreas Reuter published a 1983 paper, "Principles of Transaction-Oriented Database Recovery," in which they presented the requirements for systems that could process multiple-task units of work (transactions), and would not be corrupted by hardware, database, or operating-system failures. The paper is most famous for its specification of the principles of Atomicity, Consistency, Isolation, and Durability (ACID). A system that conforms to these so-called ACID properties guarantees the reliability of its transactions.

Two-Phase Commit
When all of the data involved in a transaction resides on a single database, only one transaction manager (TM) is required to maintain atomicity. But applications and databases are increasingly distributed, such as those linked by web services. The challenge for web services is to maintain atomicity by guaranteeing the mutual success and durability of all of the elements of such a distributed transaction, so named because it involves a distributed unit of work. In other words, multiple steps are required that involve two or more databases.

The traditional method for handling distributed transactions is known as the two-phase commit, which, as its name implies, breaks transactions into two cooperating phases. The two-phase commit protocol is illustrated in Figure 1.

[Figure 1: The two-phase commit protocol]

The two-phase commit process assures the atomicity of the distributed transaction. It's clean and simple, except when things go wrong. Due to hardware, software, or communications failures, it's possible that one or more messages may be lost, resulting in an uncertain state for one or more of the resource managers. As it turns out, however, only the loss of a commit message can cause a serious problem. Losses of other message types are less critical. If a resource manager fails to get the request-to-prepare message, it will simply fail to respond. The transaction coordinator will give up waiting for the resource manager's response and send out an abort message. The other resource managers will not have committed any of their changes. The same occurs if one or more of the response messages is lost. And if a done message is lost, no action need be taken, since all of the resource managers will have committed the transaction.
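The message flow described above can be sketched as a small in-process simulation. This is only an illustration under stated assumptions: the ResourceManager class, its reachable flag (which stands in for a lost request-to-prepare or a lost response), and the two_phase_commit function are hypothetical names, and a real coordinator would also log its decisions durably and retry undelivered messages.

```python
from enum import Enum

class Vote(Enum):
    PREPARED = "prepared"
    REFUSED = "refused"

class ResourceManager:
    """Hypothetical participant that stages a change, then commits or aborts."""

    def __init__(self, name, can_prepare=True, reachable=True):
        self.name = name
        self.can_prepare = can_prepare
        self.reachable = reachable
        self.state = "idle"

    def prepare(self):
        if not self.reachable:
            return None  # models a lost request or a lost response
        if not self.can_prepare:
            return Vote.REFUSED
        self.state = "prepared"  # changes staged, resources locked
        return Vote.PREPARED

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase one: ask every participant to prepare and collect the votes.
    votes = [rm.prepare() for rm in participants]
    # Phase two: commit only if every vote is affirmative; a missing vote
    # (None) is treated like a timeout and leads to an abort.
    if all(vote is Vote.PREPARED for vote in votes):
        for rm in participants:
            rm.commit()
        return "committed"
    for rm in participants:
        rm.abort()
    return "aborted"

healthy = [ResourceManager("bank-a"), ResourceManager("bank-b")]
outcome_ok = two_phase_commit(healthy)

lossy = [ResourceManager("bank-a"), ResourceManager("bank-b", reachable=False)]
outcome_lost = two_phase_commit(lossy)
```

The sketch does not model the hard case, a prepared participant that never hears the decision, because inside a single process the commit or abort call always arrives.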

The most serious problem occurs when a resource manager prepares for the transaction but never receives either a commit or an abort message from the transaction coordinator. Once a resource manager has sent its prepared response, it's in limbo. It can't commit the transaction, and it can't release any resources locked on behalf of the transaction. (Resource locks are under the control of the individual resource managers, not the transaction coordinator.)

In fact, there's no simple solution to this problem. No two-phase commit protocol can protect against all failures. The possibility will always exist that a communications failure can cause a resource manager to become blocked, or unable to commit or abort. Still, even with its limitations, the two-phase commit protocol remains the mainstay of distributed transactions.

The Web-Services Challenges
The ACID model has been the focus of transaction technologies for twenty years. It's widely used for both local and—via the two-phase commit protocol—distributed transaction systems. But as valuable as the ACID model has proven to be for tightly coupled distributed systems, it falls short for long-lived, loosely coupled asynchronous transactions.

Long-lived transactions
Web-services transactions are far more complex in terms of time and space than the transactions for which the ACID concepts were developed. Whereas ACID-based transactions may span many seconds or even a few minutes, loosely coupled web-services transactions may extend over hours or even days. Considerable time can elapse between the preparation and commit phases. Using ACID-style transactions in such long-running business processes would mean that participating resources could be locked and unavailable for extended periods of time. That is unacceptable to many local applications that use the same databases and must wait until the resources they require are released.

Reliability
ACID-style transactions are designed to cope with failures in hardware, software, and communications, but only in otherwise reliable environments where such failures occur relatively infrequently. Most ACID-style distributed-transaction systems are based on synchronous, connection-oriented protocols, which maintain communications paths between transaction coordinators and the participating resource managers for at least the duration of the transaction. These synchronous protocols assist in handling such errors by signaling the transaction-coordinator or resource-manager software when a communication failure occurs, so that the coordinator or resource manager knows it can no longer communicate with the service at the other end of the connection. When a communications link fails, all synchronous transactions that depend on that link are promptly aborted.

Short-term communications failures are therefore fatal errors for tightly coupled synchronous transactions, but they must be routinely handled by the systems that support long-lived, loosely coupled asynchronous transactions. The latter are based on a reliable-messaging infrastructure that delivers messages with a high degree of assuredness, even in cases where the recipient and the intervening infrastructure may be down for extended periods of time.

Trust
Because the resource locks typically used with ACID-style transactions may block applications, it's critical that they be held for as short a time as possible. If an application dies after locking a resource, that resource could be orphaned forever. If the resource in question represents the availability of an airline seat, that seat might never be filled. A resource manager therefore manages its resources like a mother hen, making sure that locked resources are never abandoned. If a local application requests a lock and then terminates, the resource manager must clean up the mess by unlocking the resource. Before a resource manager allows transactions to be initiated by remote transaction coordinators, a great deal of trust must exist among the resource manager, the remote coordinators, and other resource managers participating in the transactions.

Suppose it's not the link that fails, but rather the remote transaction coordinator. Although the messaging software won't signal a communications error (the communications link is still operational), the local resource manager has the ultimate fallback: It can rely on timeouts to protect its resources. Unfortunately, timeouts can't be used for long-lived transactions, because by definition they execute over extended periods. Again, the techniques that support ACID-style transactions won't work with those that are long-lived, loosely coupled, and asynchronous.
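The timeout fallback, and the reason it breaks down for long-lived transactions, can be illustrated with a small sketch. The SeatLocks class, the 120-second timeout, and the owner names are assumptions made up for this example; the current time is passed in explicitly so the behavior is easy to follow.

```python
class SeatLocks:
    """Hypothetical resource manager that expires abandoned locks after a timeout."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._locks = {}  # seat -> (owner, expiry time in seconds)

    def acquire(self, seat, owner, now):
        holder = self._locks.get(seat)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False  # someone else holds an unexpired lock
        self._locks[seat] = (owner, now + self.timeout_seconds)
        return True

    def still_held_by(self, seat, owner, now):
        holder = self._locks.get(seat)
        return holder is not None and holder[0] == owner and holder[1] > now

locks = SeatLocks(timeout_seconds=120)

# A remote coordinator locks a seat, then goes silent.
first = locks.acquire("J103", "remote-coordinator", now=0)

# A local application is blocked while the lock is fresh...
blocked = locks.acquire("J103", "local-app", now=60)

# ...but can take the seat once the timeout has passed.
reclaimed = locks.acquire("J103", "local-app", now=121)

# A long-lived transaction that returns an hour later finds its lock gone.
survived = locks.still_held_by("J103", "remote-coordinator", now=3600)
```

The same timeout that rescues the seat from a dead coordinator also takes it away from a legitimate transaction that simply needs more time than the timeout allows.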

Cancellation risks and abuses
External web services introduce a number of risks just by exposing internal systems to access by others. Allowing externally initiated transactions increases what's known as cancellation risk. For example, consider airline seats purchased at full price a few months before the flight. If they're cancelled at the last minute, the airline may be unable to sell them.

The problem becomes more acute when business processes are automated by web services, because accidental or even intentional abuse can so easily go undetected. For example, imagine how an unethical travel aggregator might exploit an airline-reservation web service. Months in advance, the aggregator reserves every available seat on a particular flight—but at the last minute, cancels them. In a panic to sell the seats, the airline puts them on sale at a deep discount. The unscrupulous travel aggregator then repurchases the same seats at this much lower price.

Accepting a reservation carries an inherent risk of such a last-minute cancellation. This problem exists even without web services, but there are systems in place to detect and prevent most abuses. Airlines manage this risk through overbooking. Concert and theater ticket agencies protect themselves using no-refund policies. But many other businesses, particularly those in wholesale trade, have no formal methods for managing cancellation risks. The risks and abuses of cancellations will probably increase and spread to other industries as external web services are deployed. Web services will ultimately need to express and negotiate the policies under which such transactions are made.

Loosely Coupled Transactions
Clearly, the web-services requirements for transactions far exceed what can be accomplished using traditional technologies. The more loosely we couple systems (separating them in time, space, and control), the more difficult it becomes to manage transactions distributed among them. Loosely coupled transactions, it would seem, come at a cost of increased complexity. That's true, but only so long as we keep trying to apply, refine, and improve traditional approaches based on ACID-style concepts. Instead, let's consider how we can build an all-new transactional system based on loosely coupled web-services technologies: asynchronous communications, reliable messaging, and document-style interaction. Let's use an example of a tightly coupled transaction, then see how it can be improved.

You're in your car, listening to the radio, when you hear an announcement that your favorite musician will be performing in your town. You grab your cell phone and dial the ticket-sales agency. A friendly salesperson answers the phone, and you launch into your request, only to be interrupted by the salesperson telling you, "I'm sorry, but our computers are down right now, and we don't know when they'll be back up. You'll have to call again later."

You've just stumbled into one of the drawbacks of synchronous transactions: In this case, there's nothing you can do but abort the transaction. You (the requestor) and the reservation system (the provider) must be available simultaneously. There's no point leaving your information with a salesperson who's just an intermediary, with no store-and-forward capability. Even if the salesperson were willing to take down your information, would you trust that person to complete your order? The responsibility for recovering from the system failure and restarting the transaction falls entirely on you, the requestor.

Half an hour later, you call back (retry), and learn that the system is now available. Of course the context of your transaction has been lost, so you've got to start from the very beginning. As luck would have it, the agent submits your request only to report, "Sorry, but all of the orchestra seats are now sold out. The best I can do is row J, seats 103 and 104 in the upper mezzanine." For a period of a few minutes, the reservation system locks the database records that represent those two seats while you make up your mind. If other customers are placing orders through different agents, they won't be offered those same seats. (This is now a synchronous transaction.)

You tell the agent you'll take the tickets, but your cell phone goes dead just as you're about to jot down your confirmation number. Now what? Did the transaction complete? Do you really have two tickets for the concert, or do you need to call back and place another order? If you do, will you end up with four tickets instead of two? Unfortunately, there's no way to know. Such are the problems of tightly coupled transactions without a reliable asynchronous messaging infrastructure.

Wouldn't it be great if you could just leave a voice-mail message (a self-contained document) including not only the obvious details, but instructions (the business logic) for what to do in case your first-choice seats aren't available? Your voice-mail message would then enter a message queue along with those of other customers, and be processed in sequence. As a result of your request, the ticket agency would call you back or send you an email message confirming your purchase. The acknowledgement would complete this long-lived, loosely coupled asynchronous transaction.
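The voice-mail message can be thought of as a self-contained document that carries its own fallback instructions. The sketch below is one possible shape for such a document, not a standard format: the TicketOrder fields, the choose_section helper, and the sample availability figures are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TicketOrder:
    """Hypothetical self-contained order document."""
    customer: str
    quantity: int
    preferences: list  # seating sections, most preferred first
    reply_to: str      # where to send the confirmation

def choose_section(order, available):
    """Apply the order's own fallback logic: first preferred section that fits."""
    for section in order.preferences:
        if available.get(section, 0) >= order.quantity:
            return section
    return None

order = TicketOrder(
    customer="pat",
    quantity=2,
    preferences=["orchestra", "mezzanine"],
    reply_to="pat@example.com",
)

# Serialize the document as it might be placed on a message queue.
message = json.dumps(asdict(order))

# Later, possibly hours later, the provider reads it back and processes it
# without any further conversation with the customer.
received = TicketOrder(**json.loads(message))
section = choose_section(received, {"orchestra": 0, "mezzanine": 40})
```

Because the preferences travel with the request, the provider can settle on the mezzanine seats without calling the customer back to ask.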

Long-lived transactions
By communicating asynchronously, you've eliminated the real-time constraint of the transaction. You can make your request in the middle of the night. Even if a human agent must review your order, that person need not be available at the time you submit it. Although the vendor's voice-mail system must be able to accept calls at a reasonable rate, the actual transaction system that processes the request is highly scalable. Even if the transaction system goes offline, all orders will get processed in due time as long as customers can submit voice-mail orders. You can see how a reliable asynchronous messaging system is key to long-lived, loosely coupled asynchronous transactions.

Isolation without locking
You've also eliminated the need for record locking. So long as all requests are submitted through a single queue, the ticket agency can process its requests serially. And provided only one ticket request is being processed at a time, the application doesn't need to simulate serialization by locking resources.
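A minimal sketch of this idea follows, assuming a single in-memory queue and a simple seat counter; the process_orders function and the customer names are invented for the illustration. Because one worker handles one request at a time, no two requests can ever compete for the same seats, so no locks appear in the code.

```python
from collections import deque

def process_orders(queue, seats_left):
    """Handle requests strictly one at a time, in arrival order."""
    confirmations = []
    while queue:
        customer, quantity = queue.popleft()
        if quantity <= seats_left:
            seats_left -= quantity
            confirmations.append((customer, "confirmed"))
        else:
            confirmations.append((customer, "sold out"))
    return confirmations, seats_left

# Hypothetical requests: (customer, number of tickets), in arrival order.
requests = deque([("ann", 2), ("bob", 2), ("cy", 1)])
results, remaining = process_orders(requests, seats_left=3)
```

With three seats available, the first request is confirmed, the second cannot be filled, and the third takes the last seat.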

Compensating Transactions
Once a transaction has been committed, it can no longer be aborted. Yet in the real world, there are often times when the effects of a transaction must be undone. The problem is that some transactions can't be reversed because their effects are permanent, and/or conditions have so changed over time that restoring the previous state would be inappropriate. As an example, consider a transaction that triggers the manufacturing of an item. Materials are consumed, and money is spent. It's impossible to simply wipe out the transaction. You can't un-manufacture the item. Instead, other actions must be taken, such as charging the customer a cancellation fee and offering the item for sale to other parties. In the earlier example of an unscrupulous travel aggregator who cancelled airline tickets at the last minute, we saw how the airline chose to put those seats on sale at a discount in order to make sure they'd be sold and the airplane would be full.

These are examples of compensating transactions that can be applied after an original transaction has been committed in order to undo its effects, without necessarily returning resources to their original states. Many transaction managers support compensating transactions, and as we'll see in the case of long-lived, loosely coupled asynchronous web services, compensating transactions can actually be used instead of resource locking.

Optimism
ACID-style transactions are optimistic, and assume a high likelihood of success. You can imagine a human coordinator of a simple two-phase commit transaction commanding the participants. Phase One: "Okay, here's what you need to do. [Coordinator enumerates the requirements.] Has everyone prepared for the transaction by safely storing the results? Good." Phase Two, after receiving affirmative votes from all participants: "Now everyone...GO!" There's no need for the coordinator to ask whether anyone was unsuccessful, since all of the participants promised in Phase One that they could do as requested. The key to the success (and integrity) of the transaction is the locking of the resources between these two phases.

On the other hand, a loosely coupled transaction coordinator must take a pessimistic view of a transaction's outcome. Even with a reliable messaging protocol, many other errors can occur due to the long-term nature of the transaction. Rather than reserve their resources in advance, loosely coupled participants prepare compensating transactions that will undo the local effects in case the first phase is unsuccessful. If the transaction is later aborted, all participants execute their compensating transactions.

When using compensating transactions, our human coordinator might say in Phase One, "Okay, here's what you need to do. Don't do it yet, but in case this doesn't work, I want each of you to figure out ahead of time how to recover. Now everyone...GO!" Then, in Phase Two: "Great...did that work for everyone, or do we all need to run our back-out scenarios?"
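The coordinator's instructions can be sketched as follows. This is an illustrative pattern under simple assumptions (everything runs in one process, and a failed step raises an exception); the run_with_compensation function and the flight and hotel steps are hypothetical.

```python
def run_with_compensation(steps):
    """Run each (action, compensation) pair; on failure, back out completed work.

    Each participant supplies its back-out routine before any work starts.
    """
    completed = []  # compensations for the steps that have succeeded
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # "Do we all need to run our back-out scenarios?" Yes: undo in
            # reverse order whatever has already been done.
            for undo in reversed(completed):
                undo()
            return "compensated"
        completed.append(compensate)
    return "completed"

log = []

def reserve_flight():
    log.append("flight reserved")

def cancel_flight():
    log.append("flight cancelled")

def reserve_hotel():
    raise RuntimeError("no rooms available")

def cancel_hotel():
    log.append("hotel cancelled")

outcome = run_with_compensation([
    (reserve_flight, cancel_flight),
    (reserve_hotel, cancel_hotel),
])
```

No resources are locked while the steps run. The flight reservation is made immediately and then undone by its compensating transaction once the hotel step fails.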

Compensating transactions are one of the technologies that decouple systems from one another, and are a first step towards filling in the missing pieces of complex web services.

Standards
IBM, Microsoft, and BEA are at work on WS-Coordination, a framework that supports multiple coordination types including WS-AtomicTransaction for short-lived "all-or-nothing" transactions, and WS-BusinessActivity for long-lived, loosely coupled transactions using compensation.

Sun, Oracle, IONA, and others have announced plans for WS-CAF, the Web Services Composite Application Framework, for transactions and coordination of interdependent web services. And the OASIS Business Transactions Technical Committee is continuing to develop BTP, the Business Transaction Protocol, but it is awaiting implementations before it can advance the protocol toward a full OASIS standard.

The issues are both political and technical. Because the traditional mechanisms for handling distributed transactions don't work for web services, the standards for web-services transactions will be some of the last to be developed, agreed to, and adopted. Most experts don't expect much impact from these competing standardization efforts until 2005.

This article is excerpted from Loosely Coupled - The Missing Pieces of Web Services (ISBN 1881378241). ©2003 Doug Kaye.

Doug Kaye is the CEO of RDS Strategies LLC and the publisher of the IT Strategy Letter.
