The Challenges of SOA

Which rules are necessary and which are just nice to have

"Our processes are bulletproof. Nothing gets into production that doesn't go through the proper and complete approval process." Famous last words uttered by far too many enterprise architects. Some of them actually believe it's true - others think that by hoping it's true, maybe, just maybe, they can make it true.

The reality, as any line-of-business developer can attest, is much less clear-cut. The challenge is that governance only gets harder the more an organization moves towards a service-based architecture.

One of the first myths driving enterprise architecture governance decisions is that adding more rules reduces risk. That may be true in theory, but in practice it often increases risk. The reason is simple: complexity increases risk. A perfect case study, one that most people have probably experienced, is password-control policy. As many IT organizations have attempted to "improve security," they've disallowed dictionary words in passwords, forced frequent password changes, disallowed reuse of older passwords, and so on. The net result is that, because of the added complexity, more people write their passwords down on Post-it notes. And written-down passwords increase the likelihood of a security breach while, at the same time, making the breach harder to detect. Increased complexity increases risk.

Avoiding the Complexity Pitfall
There are two ways to address this complexity issue:

  • Have fewer rules, but make them more important rules
  • Automate compliance with the rules

In terms of gauging the importance of rules, I've seen a number of cases where architects put too much emphasis on the technical side and too little on the business side. For example, let's look at a technical requirement: the need to promote reuse. This often leads to many rules: rules around the use of certain schemas, security mechanisms, service interface design, and many others. Reuse is no doubt important, so it makes sense to have rules to promote it. But let's contrast this with a business requirement: regulatory compliance - whether it's Sarbanes-Oxley (SOX), European Union privacy regulations, HIPAA, or even Visa's Cardholder Information Security Program (CISP). These lead to a large set of rules as well. So, let's say you had to choose between rules to promote reuse and rules to ensure regulatory compliance. Would you choose the rules whose upside isn't directly quantifiable and whose neglect, at worst, leads to increased cost and reduced agility? Or would you choose the rules that keep you from going to jail, getting fired, getting fined, or being forced to shut your company down? Put in these terms, it's easy to see which rules are the most important.

The other approach is to attempt to automate as much rule checking as possible. There are solutions that help address this at every stage of the application lifecycle. Of course, not every rule can be automated, so you still need to be mindful to tightly control and prioritize the set of rules that development must follow.

Automating Governance
For the rules that can be automated, one of the most common approaches is a deployment checkpoint. At deployment time your services are checked against a set of automated rules. These might validate that the services are WS-I compliant (increasing their interoperability) and that they follow further rules specific to your organization, for example that the services use specific predefined schemas or only certain message transports. The good thing is that this catches non-compliant services before they go into production. The downside is that by the time a non-compliant service is caught, it's often too late to fix it. When it's a choice between meeting a business deadline and following the architecture committee's guidelines, the business most often wins.
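
To make the deployment checkpoint concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `ServiceDescriptor` shape, the allowed transports, and the approved schema names are hypothetical and do not come from any particular governance product.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical organization-specific rules; the values are illustrative only.
ALLOWED_TRANSPORTS = {"https", "jms"}
APPROVED_SCHEMAS = {"urn:example:customer:v1", "urn:example:order:v1"}


@dataclass(frozen=True)
class ServiceDescriptor:
    """Metadata a checkpoint might extract from a service description (assumed shape)."""
    name: str
    transport: str
    schemas: frozenset[str]


def check_service(service: ServiceDescriptor) -> list[str]:
    """Return the rule violations for one service; an empty list means it passes."""
    violations = []
    if service.transport not in ALLOWED_TRANSPORTS:
        violations.append(f"{service.name}: transport '{service.transport}' is not allowed")
    # Any schema outside the approved set is reported, in a stable order.
    for schema in sorted(service.schemas - APPROVED_SCHEMAS):
        violations.append(f"{service.name}: schema '{schema}' is not approved")
    return violations


def deployment_gate(services: list[ServiceDescriptor]) -> list[str]:
    """Collect the violations for every service in a release."""
    return [v for s in services for v in check_service(s)]
```

A release would be allowed through only when `deployment_gate` returns an empty list. The same function could just as well run on a developer's machine, which is the idea behind the development-time checks.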

The next aspect of automating governance rule validation is applying checks at development time. There are a number of products emerging that can validate the same sets of rules as the "deployment checkpoint" approaches, but do this as a normal and natural part of the development process itself. The advantage of these tools is that they guide developers down the right path from day one as they build their services, so there's no wasted effort. An added benefit of these tools is that they not only validate that the metadata (such as WSDL) complies with the rules, but often validate that the content of the messages themselves is also compliant. This includes checks such as whether the messages actually match the WSDL, whether the use of the SOAP protocol is WS-I compliant, and so on.
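
As a rough illustration of a message-content check, the sketch below tests whether a SOAP 1.1 message carries exactly one payload element and whether that element is one the service declares. It is a simplified stand-in, not a WS-I conformance test: the function name is hypothetical, and the set of declared elements is assumed to have been extracted from the WSDL beforehand.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def check_message(xml_text: str, declared_elements: set[str]) -> list[str]:
    """Return problems found in a SOAP message; an empty list means none were found.

    declared_elements holds payload element names in ElementTree's
    "{namespace}localname" form (assumed to come from the service's WSDL).
    """
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SOAP11_NS}}}Envelope":
        return ["root element is not a SOAP 1.1 Envelope"]
    body = root.find(f"{{{SOAP11_NS}}}Body")
    if body is None:
        return ["Envelope has no Body"]

    problems = []
    children = list(body)
    if len(children) != 1:
        problems.append(f"expected exactly one Body child, found {len(children)}")
    for child in children:
        if child.tag not in declared_elements:
            problems.append(f"undeclared payload element {child.tag}")
    return problems
```

A development-time tool would run checks of this kind against sample or captured messages while the service is still being built, rather than waiting for the deployment checkpoint.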

There is a major blind spot in these approaches: they can only validate what they can see. This is where the third aspect of automating governance comes in: runtime governance. There are three different kinds of blind spots in development and deployment time governance products that are addressed by the more advanced runtime governance products.

Blind Spot #1: Service Behavior
While development and deployment time approaches can validate metadata like WSDL and (in some cases) message content, what they can't do is validate that a service behaves according to the rules. For example, does the service properly keep an audit trail in all required cases? Does the service only allow authorized individuals to use it? These are things that can't be validated by development or deployment time governance tools. Even testing tools can't adequately validate that these kinds of rules are enforced in all the requisite cases. In many cases, when these types of rules are implemented in code, the only way to validate that they are properly enforced is by diving deeply into the code and evaluating it against a wide series of potential scenarios.

Alternatively, you can take advantage of runtime governance tools. These products change governance rules related to behavior from a coding task into a configuration task: you point and click to declare auditing, security, and other policy behaviors. Moving the enforcement of these rules from code to configuration addresses two issues. The first is repeatability: configure these products the same way and they behave the same way, which can't be said of custom per-service code. The second is verifiability: since the configuration itself is metadata, validating whether the service meets the governance rules can now be automated, eliminating or at least significantly reducing the chance of human error while also reducing the time and cost of validation.
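
The point that configuration is itself checkable metadata can be sketched in a few lines. The policy names and the dictionary layout below are hypothetical; a real runtime governance product would have its own configuration format.

```python
# Hypothetical policies an organization might require on every service.
REQUIRED_POLICIES = ("audit", "authentication")

# Declarative per-service configuration (assumed shape, not a real product format).
SERVICE_POLICIES = {
    "order-service": {"audit": True, "authentication": True},
    "report-service": {"audit": False, "authentication": True},
    "legacy-service": {"authentication": True},
}


def policy_gaps(config: dict) -> dict:
    """Map each non-compliant service to the required policies it has not enabled."""
    gaps = {}
    for service, settings in config.items():
        # A policy that is absent from the configuration counts as disabled.
        missing = [p for p in REQUIRED_POLICIES if not settings.get(p, False)]
        if missing:
            gaps[service] = missing
    return gaps
```

Because the check reads only declared settings, it produces the same answer every time for the same configuration, with no need to inspect each service's code.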

Blind Spot #2: Process Awareness
Service-oriented architectures dramatically change the way you need to think about your production applications. When a service goes into production, that's not the end - it's just the beginning. The reason is that every time a service is reused, it essentially becomes part of a new application - a new business process - and that business process may have an entirely new set of rules to obey. For instance, consider a service that's used to store an audit log. When the service goes into production you might apply a certain set of governance rules to it, checking those at development and deployment time. Now let's say another service - part of a new application - starts using the audit-log service to store order information as part of an ordering process, and that order information includes credit card data. In this case, the audit-log service would now be subject to Visa CISP rules even though it wasn't changed and wasn't redeployed. The only thing that changed was how the service was used, and with it the set of applicable governance rules.

The net result is that you can't assume that development and deployment time governance checks on a service are enough. This is another area where runtime governance comes to the rescue. The most advanced runtime governance products can apply their governance policies not only to individual services, but across entire end-to-end business processes, regardless of when the services were deployed. Since the new business process, and thus the new use of the service, is what is being deployed, you can validate the policies effectively at business-process deployment time. In contrast, without awareness of the new context of use (the business process), you'd be forced to re-analyze every service already in production the moment another application is deployed - a very complex and time-consuming challenge.
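
The audit-log example can be modeled as a small calculation: the rules that apply to a service are the union of the rules triggered by every business process that uses it. The data classes, process records, and rule mapping below are assumptions made for illustration, not a description of how any product represents them.

```python
# Hypothetical mapping from the kind of data a process handles to the rules it triggers.
RULES_BY_DATA_CLASS = {
    "cardholder": {"CISP"},
    "financial": {"SOX"},
    "health": {"HIPAA"},
}


def applicable_rules(service: str, processes: list) -> set:
    """Union of the rule sets triggered by every process that uses the service."""
    rules = set()
    for process in processes:
        if service in process["services"]:
            for data_class in process["data_classes"]:
                rules |= RULES_BY_DATA_CLASS.get(data_class, set())
    return rules


# The audit-log service as first deployed: used by a process with no regulated data.
monitoring = {"services": {"audit-log"}, "data_classes": set()}
# A later ordering process reuses it and passes credit card data through it.
ordering = {"services": {"order-entry", "audit-log"}, "data_classes": {"cardholder"}}

before = applicable_rules("audit-log", [monitoring])
after = applicable_rules("audit-log", [monitoring, ordering])
```

Here `before` is empty and `after` contains `"CISP"`, although nothing about the audit-log service itself changed between the two calls.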

Blind Spot #3: Rogue Services
Up until now, we've gone with the assumption that the governance review process is aware of all of the services and uses of services that are going into production. But, is this a realistic assumption? It turns out that in many cases it's not. If a service or service-use gets into production and it didn't go through the proper approval process, you have what's called a rogue service. Rogue services are organizational risks because you just don't know whether they're in compliance or not. It doesn't matter how well you tried to follow your process - if a service gets into production and it's not auditing financial data (and so isn't SOX-compliant) someone might go to jail. The SEC doesn't give you amnesty because you tried to follow your process.

Rarely are rogue services the result of malicious acts. Most people in an organization don't try to bypass the approval process - it can happen for a lot of innocent reasons. For example, let's say you're deploying a packaged application or an application built by a third-party outsourcer - you might not be aware of all of the services contained in this application. Even when you are, sometimes there are just too many to fully evaluate - SAP, for example, has hundreds of services ready to use out of the box. A second case might be a service that was built purely for internal use in an application, and so wasn't subject to the approval process, but someone working on another application gets hold of it and starts using it. When you talk about rogue service use, the list of cases where this can occur grows even longer. One organization relayed a story of how they had built a service that had five authorized consumers (each of which had been issued a special consumer key so that the service owner could track them), but it turned out there were 34 different consumers. What happened was that one of the five authorized consumers had built the use of the service into a JAR file, which embedded the consumer key for simplicity. Twenty-nine other project teams reused this JAR file without knowing that it called an external service, so they unwittingly reused the service. These service uses were never approved; they were rogue service uses.

How did this organization find out about these other uses? To track down a performance problem with the service, they deployed a runtime governance product (service-level measurement is one of the common capabilities of such products). Since they thought there were only five consumers, they couldn't understand why the service wasn't performing as expected. The runtime governance product they deployed could also automatically discover new services and new service consumers, and it discovered all 34 consumers. By interfacing with the company's registry of approved services, the product determined that 29 of these consumers were actually rogue consumers and immediately flagged them for approval. The most advanced runtime governance products can even automatically quarantine rogue services and service uses until they're approved - eliminating the risk of rogue services.
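
At its core, the discovery step in this story is a set difference between the consumers observed at runtime and the consumers in the registry of approvals. The sketch below assumes each observed call can be attributed to a distinct calling application. That is an assumption: in the story the shared consumer key could not tell the teams apart, so a real product would have to rely on some other identifying attribute. The team names and the function name are hypothetical.

```python
def find_rogue_consumers(observed_callers, approved_consumers):
    """Return the observed consumers that are missing from the approval registry."""
    observed = set(observed_callers)  # collapse repeated calls by the same consumer
    return sorted(observed - set(approved_consumers))


# Five approved consumers, as recorded in a (hypothetical) registry.
approved = {f"team-{i:02d}" for i in range(1, 6)}
# Traffic observed at runtime: 34 distinct callers, each seen several times.
traffic = [f"team-{i:02d}" for i in range(1, 35)] * 3

rogue = find_rogue_consumers(traffic, approved)
```

With these inputs `rogue` holds 29 entries, matching the 34 observed consumers minus the five approved ones.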

Bringing It All Together
To implement a complete approach to SOA governance, you have to consider the roles of development, deployment, and runtime governance. Taking a holistic view of governance across the lifecycle will automate as much of the governance burden as possible, while providing a backstop to catch the rogue services and service uses that your human-centric processes don't catch. Of course, there's no perfect solution - the human element still plays a key role. To reduce risk, you have to reduce the complexity of the manual processes - so remember to think strategically about which rules are really necessary, and which are just "nice to have."

More Stories By Dan Foody

Dan Foody, CTO of Sonic and Actional products, leverages his extensive experience in enterprise systems software toward designing robust and manageable service-oriented architectures. Foody's experience with distributed systems technologies including middleware, integration and Web services, gives him a broad knowledge of the complexities and requirements for managing real-world enterprise software deployments. He is the author of various standards, and contributed significantly to the OMG standard for COM/CORBA interworking. Most recently, Foody was the recipient of InfoWorld's 2005 CTO 25 award. Foody holds a BSEE and MSEE from Cornell University.
