Balancing IT Needs For Effective Drug Discovery
By: Joe Shaumbaugh
Feb. 4, 2001 12:00 AM
Programming in script-based languages such as Perl and Tcl is popular in life science disciplines such as bio- and cheminformatics, especially among developers at pharmaceutical and biotech companies. Adding the ability to use XML to pipe these scripts into more robust systems used in many bioinformatics computing environments lowers the entry hurdle for those with little or no programming experience in CORBA, Java, C, and C++. Although XML is no replacement for CORBA, adding XML can bring significant computing benefits along with huge time and cost savings if applied properly.
Often, programming tools that offer the greatest flexibility and ease of use aren't the ones you'd use to deploy solutions to large numbers of people. Typically, tools are chosen to complete specific tasks, with some of the unseen implications given only secondary consideration. In the area of life sciences research, for example, programmers piece together solutions using a variety of different programming languages, scripting tools, and data formats. Perl is a favorite tool of many bioinformaticists due to its parsing capabilities and because most common forms of biological data exist in easy-to-read ASCII formats. Since Perl and HTML work well together, the results are typically distributed via Web interfaces to scientists who depend on the data. While this scenario works well for individual programs and result sets, it becomes a problem when integration across data and tools is the goal.
On the other hand, technologies such as CORBA provide a means for creating robust, scalable solutions that provide interoperability in a secure, stable, runtime environment composed of dissimilar computing platforms, servers, and client applications. This is the key reason for its appeal in the drug discovery business, where technology advancements have enabled the generation of an ever-increasing amount of data from all kinds of sources.
The success of pharmaceutical companies today is tied to how effectively they can analyze, integrate, and share this data among research scientists. This is especially true lately, as the pharmaceutical industry continues to consolidate. Mergers among large pharmaceutical companies result in the blending of research groups that approach the generation, manipulation, and storage of data differently.
CORBA has become a standard for building enterprise-scale software solutions in many industries. The Life Sciences Research Domain Task Force at the Object Management Group (OMG) recently adopted the Biomolecular Sequence Analysis (BSA) standard. This is good news for an IT director whose job is to facilitate the integration of data across different domains. As standards are adopted, more vendors will produce software products that interoperate.
CORBA provides an object-oriented middleware solution that allows objects and services to interact no matter where they're located on the Web, provided they're designed to communicate through a common interface. CORBA is an ideal technology for constructing large-scale systems. It also lends itself well to incorporating legacy systems through a process known as wrappering. Here, a programmer defines interfaces and writes code to encapsulate an existing process or program so it can be treated as an object in a larger object-oriented system. However, CORBA is known to have a steep learning curve, and if systems are not properly designed, solutions can also become rigid, fragile, and expensive.
In the CORBA paradigm, interfaces are defined using Interface Definition Language (IDL). Once the interfaces are specified and the system is designed, the implementation can be accomplished using a variety of languages, including Java, C++, C, Perl, and Python. Unfortunately, in rapidly evolving industries the requirements for the system may not be well defined at design time.
There's also an increasing need to rapidly incorporate functionality into an existing system, even though the specific functionality may have a very short life span or be likely to change or evolve significantly. In this scenario the extra work required to create robust CORBA-based solutions becomes less attractive.
The area of drug discovery research reflects just this type of environment. Rapidly changing technologies are driving frequent revisions in the requirements for information systems to access and analyze data. Developers working for pharmaceutical companies, whether as employees, contractors, or consultants, are interested in solving a researcher's immediate needs while meeting the higher-level objective of producing a robust, enterprise-wide informatics solution. Ideally, an enterprise framework based on COM, CORBA, or some other middleware technology would provide an API to enable these immediate needs to be met.
Pharmaceutical IT directors are discovering that they need a robust system that can handle data in a generic way, yet allow programmers to rapidly incorporate algorithms, tools, and viewers to manipulate the data in ways that weren't considered when the larger system was designed. The key is to provide informaticists with tools that enable them to rapidly prototype functionality and test it within the context of a larger framework.
Increasingly, XML appears to be the flexible data interchange format that could provide a means to link complex data into large, enterprise-scale systems. In fact, over the past few years bioinformatics professionals accustomed to using Perl in conjunction with HTML have migrated toward XML in order to take advantage of its ability to encode semantic information.
A Practical Solution
Gene Expression 1.3 works like an object-oriented, biocentric, smart spreadsheet. Data representing experimental values are included in a spreadsheet format with rows usually symbolizing individual genes and columns containing experimental observations about those genes. Users are able to perform complex calculations and statistical analyses on these data sets by plugging in their own data sources and algorithms.
For instance, a user could plug in a clustering algorithm from a statistical package simply by writing an executable wrapper in a scripting language, such as Perl, and registering it with the system. This new algorithm would then dynamically become available on the client through a drop-down menu. When the user selects the cluster algorithm, the client dynamically generates the appropriate input parameter screens. After the appropriate clustering parameters and data columns are selected, the script is executed and the data is returned to the client spreadsheet in the form of a new column of data containing the clustering details from the algorithm.
The system allows for additional data to be associated with each cell in the spreadsheet. The user interacts with this additional data by double-clicking on a given cell. This data can be supplied in one of several formats, including text, HTML, and XML. When the data is formatted in XML, the client can read the tags and launch the appropriate viewer. For example, a specific script might be capable of returning complex data that's best viewed in a tree format. When the client inspects the XML, it would learn that the data is best viewed using the registered tree viewer. The client would then launch the tree viewer with the XML data.
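The viewer selection described above can be illustrated with a minimal Python sketch. It is not the product's actual client code: the `viewer` attribute, the `<result>` and `<node>` element names, the `VIEWERS` registry, and the `open_cell_data` function are all hypothetical names chosen for illustration, and the "viewers" here simply return text rather than opening a window.

```python
import xml.etree.ElementTree as ET


def render_tree(element, depth=0):
    """Return the nested <node> elements as indented lines."""
    lines = []
    for child in element.findall("node"):
        lines.append("  " * depth + child.get("name", "?"))
        lines.extend(render_tree(child, depth + 1))
    return lines


def tree_viewer(root):
    # Hypothetical tree viewer: renders the hierarchy as an indented outline.
    return "\n".join(render_tree(root))


def text_viewer(root):
    # Fallback viewer: shows whatever text the payload contains.
    return "".join(root.itertext()).strip()


# Hypothetical registry of viewers, keyed by the name a script puts in its XML.
VIEWERS = {"tree": tree_viewer, "text": text_viewer}


def open_cell_data(xml_text):
    """Inspect the XML attached to a cell and hand it to the matching viewer."""
    root = ET.fromstring(xml_text)
    viewer = VIEWERS.get(root.get("viewer", "text"), text_viewer)
    return viewer(root)


# Example payload a clustering script might attach to a spreadsheet cell.
CELL_XML = """
<result viewer="tree">
  <node name="cluster-1">
    <node name="gene-A"/>
    <node name="gene-B"/>
  </node>
  <node name="cluster-2">
    <node name="gene-C"/>
  </node>
</result>
"""

rendered = open_cell_data(CELL_XML)
print(rendered)
```

Because the script names its preferred viewer in the data itself, a new viewer could be supported by adding one entry to the registry, with no change to the scripts that already exist.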
Integrating CORBA and XML
In this context XML is simply used to extend the CORBA functionality to incorporate any executable program, script, or URL. Writing scripts that conform to a particular DTD is fairly straightforward, as this requires only the data input and output to be tagged with a few specific tags that indicate their type. The Service Plug-in takes this approach only at the bottom layer. XML is used only to pass data between easily written analysis scripts and a CORBA service. CORBA networking is used between the other layers. At the moment we think this is the most appropriate mix.
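To make the idea of type-tagged input and output concrete, here is a small Python sketch of an analysis script that reads tagged inputs and returns a tagged result. The article does not show the actual DTD, so the `<request>`, `<input>`, `<output>`, and `<value>` elements, the `name` and `type` attributes, and the ratio calculation are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Converters for the hypothetical "type" attribute values.
CONVERTERS = {"float": float, "int": int, "string": str}


def read_inputs(xml_text):
    """Parse <input> elements into a dict of typed Python lists."""
    root = ET.fromstring(xml_text)
    values = {}
    for item in root.findall("input"):
        convert = CONVERTERS[item.get("type", "string")]
        values[item.get("name")] = [convert(v.text) for v in item.findall("value")]
    return values


def write_output(name, type_name, values):
    """Wrap a list of results in a typed <output> element."""
    root = ET.Element("response")
    out = ET.SubElement(root, "output", name=name, type=type_name)
    for v in values:
        ET.SubElement(out, "value").text = str(v)
    return ET.tostring(root, encoding="unicode")


def analysis_script(request_xml):
    """Toy analysis: ratio of two expression columns, row by row."""
    inputs = read_inputs(request_xml)
    ratios = [t / c for t, c in zip(inputs["treated"], inputs["control"])]
    return write_output("ratio", "float", ratios)


# Example request: two data columns, each tagged with its type.
REQUEST = """
<request>
  <input name="treated" type="float"><value>4.0</value><value>9.0</value></input>
  <input name="control" type="float"><value>2.0</value><value>3.0</value></input>
</request>
"""

response_xml = analysis_script(REQUEST)
print(response_xml)
```

In a layout like this, the script author only has to read and emit a handful of tags. The CORBA service on the other side would be responsible for building the request and turning the response into a new spreadsheet column.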
Summary
The relationship between CORBA and XML is expanding as the OMG actively pursues the use of XML through initiatives such as XML Metadata Interchange (XMI). The OMG's IDL is just one way of representing a model. What if, as in the case of the Service Plug-in described above, an XML representation is needed? Wouldn't it be nice if the model could be designed without worrying about all the possible representations? Automated tools that address just this issue are becoming available, enabling the design of a model to be done in UML (Unified Modeling Language), with the generation of IDL or XML DTD as additional representations. UML permits semantic specifications that go beyond what's expressible in IDL or XML.
XMI is an OMG standard, allowing representation of UML in XML. XMI is the result of collaboration between industry leaders such as Unisys, IBM, Oracle, Rational, Fujitsu, and others interested in allowing tools to exchange models. The purpose is to provide standardized methods for sharing data between programming and data modeling tools in a collaborative environment. This will allow developers to share data and designs from different development tools in a heterogeneous distributed environment, resulting in shorter development times for large-scale, multivendor solutions built with such tools. This should come as welcome news to IT professionals in life science informatics and other industries with long development cycles.