Untangling the Semantic Web

The straightforward way to give more meaning to the Web

The Semantic Web is a hot topic in information circles today, and its adoption will largely depend on stakeholders understanding its potential benefits and tools vendors providing an easy entry for developers to learn and work with its related technologies.

The Semantic Web Vision
Imagine this scenario. You're a software consultant, and today you're taking a working lunch with one of your biggest clients. Her company has an emergency project at one of its remote offices, and they need your consulting services there for the next two weeks. You need to get there as soon as possible to begin work, so you take out your hand-held computer, activate its Semantic Web agent, and instruct it to book a nonstop flight that leaves before 10 a.m. the next day. You want an aisle seat if it's available. Once the agent finds an acceptable flight with an available aisle seat, it books it using your credit card and assigns the charges to your client's account in your accounting application. It also warns you that you'll be missing a dentist appointment back home and adds a note to your calendar reminding you to reschedule. Next, you specify that you want a car service to the client's site, so the agent scans the availability of limos in the area with "very good" or higher service ratings and books an appointment to have you picked up 30 minutes after your flight lands. The agent also books you at your favorite hotel chain, automatically securing the lowest rate using your rewards card number. Finally, your agent updates your calendar with your trip information and prints out your confirmation documents back at your office.

With just a few clicks your Semantic Web software agent found and booked your flight, hotel, and car service, then updated your accounting system and calendars automatically. It even compared your itinerary to your calendar and detected the scheduling conflict with your dentist appointment. To do all this, the agent had to find, interpret, combine, and act on information from multiple sources and multiple, disparate applications and data repositories. During those few minutes it takes your software agent to book your trip, you wonder what you ever did before the Semantic Web. This example, of course, is a long-term vision for applying the Semantic Web. It's one that may or may not come to fruition - only the future will tell. However, the vision itself is important for understanding the potential of Semantic Web technologies.

The Semantic Web is currently the focus of W3C Working Groups (www.w3.org/2001/sw/) and is considered the next step in Web evolution. In the Semantic Web, data itself becomes part of the Web and can be processed independently of application, platform, or domain. This is in contrast to the World Wide Web as we know it today, which contains virtually boundless information in the form of documents. We can use computers to search for keywords in these documents, but search results still have to be read and interpreted by humans before any useful information can be extracted. Computers can present you with information but can't understand it well enough to display the data that is most relevant in a given circumstance. The Semantic Web, on the other hand, is about having data as well as documents on the Web so that machines can process, transform, assemble, and even act on the data in useful ways. To accomplish this, the Semantic Web relies on structured sets of information and inference rules that allow applications to "understand" the relationships between different data resources.

The true impact of the Semantic Web will not be known for quite some time, but some proponents have asserted that it will lead to the evolution of human knowledge itself by allowing people and machines - for the first time - to quickly filter and synthesize the massive amounts of data that exist in the world in a relevant, productive way.

As with any potentially revolutionary technology, the scope of the Semantic Web's evolution will depend on several factors, including industry buy-in, the technology learning curve, and the availability of productive Semantic Web development tools. The first factor, industry buy-in, depends largely on the other two: developer proficiency and tools support.

Spinning Meaning and Relationships
The Semantic Web is a "web of data" that not only harnesses the seemingly endless amount of data on the World Wide Web, but also connects that information with data in relational databases and other noninteroperable data repositories. Considering that relational databases house the majority of enterprise data today, the ability of Semantic Web technologies to access and process them alongside data from Web sites, other databases, XML documents, and other systems greatly increases the amount of data available. In addition, relational databases can adapt easily to the Semantic Web model since they already include a great deal of semantic information. Database tables and columns are created based on the relationships between the data they house, and this organization reveals some of the meaning - the semantics - of the data.
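To make the point about relational data concrete, here is a minimal sketch of how a table's structure can be read as statements about the data. It is an illustration only: the product table, the http://example.org/ base URI, and the rows_to_triples helper are all assumptions and do not come from any particular tool or standard mapping.

```python
import sqlite3

# Hypothetical base URI used to build identifiers (an assumption).
BASE = "http://example.org/"


def rows_to_triples(conn, table, key_column):
    """Read every row of a table as (subject, predicate, object) tuples.

    The row's key becomes the subject, each other column name becomes a
    predicate, and the cell value becomes the object.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is already in the subject; skip empty cells
            triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


# Hypothetical sample data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES (1, 'Widget', 12.0)")

triples = rows_to_triples(conn, "product", "id")
for triple in triples:
    print(triple)
```

The column names already carry part of the meaning of each value, which is why the conversion needs no information beyond the table definition itself.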

Implementing the Semantic Web requires adding semantic metadata to the information resources that are available on the Web or on internal networks. This will allow machines to effectively process the data based on the semantic information that describes it. When there is enough semantic information associated with data, computers can make inferences about the data, i.e., understand what a particular data resource is and how it relates to other data.

XML has paved the road by adding some metadata in the form of human-readable tags that describe data. In addition, XML documents can include information about the author of a Web page, relevant keywords for search engine optimization, and the software tools used to create the XML file, for example.

Before XML, data was stored in flat file and database formats, where most data was proprietary to an application. XML came along and made data interoperable within a single domain, i.e., within the domain defined by a schema or a set of related schemas that define the structure of related documents. By itself, XML provides syntactic interoperability only when both parties know and understand the element names used. If I label an element <price>12.00</price> and someone else labels it <cost>12.00</cost>, there's no way for a machine to know if those are the same thing without the aid of a separate application to map between the elements. Semantic Web technologies help address this problem by making tags understandable not just to humans, but to machines as well.
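The price and cost example can be sketched in a few lines. The code below plays the role of the "separate application" mentioned above: it maps each party's element name to one shared identifier by hand. The vocabulary URI and the extract_properties helper are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared property identifier (an assumption for illustration).
PRICE = "http://example.org/vocab#price"

# Hand-written mapping from each party's element name to the shared property.
ELEMENT_TO_PROPERTY = {"price": PRICE, "cost": PRICE}


def extract_properties(xml_text):
    """Return {shared property: text} for every element with a known mapping."""
    root = ET.fromstring(xml_text)
    result = {}
    for element in root.iter():
        prop = ELEMENT_TO_PROPERTY.get(element.tag)
        if prop is not None:
            result[prop] = element.text
    return result


doc_a = "<item><price>12.00</price></item>"
doc_b = "<item><cost>12.00</cost></item>"

# With the mapping in place, both documents yield the same statement.
same = extract_properties(doc_a) == extract_properties(doc_b)
print(same)
```

Without the ELEMENT_TO_PROPERTY table, nothing in either document tells a program that the two elements mean the same thing. That gap is what shared, machine-readable vocabularies are meant to close.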

The first step required for machines to understand data is to get that data into a uniform format, where, for instance, a field labeled "street" always has the same format and contains the same type of information, and so on. This type of functionality can be found today on Web sites that use forms that allow users to enter information and run a query, such as airline sites that allow visitors to search for and book flights based on a variety of criteria. However, considering the amount and variety of data available from different sources today, this method of data typing does not scale beyond very specific applications.

The next step towards the Semantic Web requires data from multiple domains to be classified based on its properties and its relationship with other data. This is where Semantic Web technologies such as RDF and OWL come in.

RDF and OWL
RDF (Resource Description Framework) is the W3C standard that forms the basis for the Semantic Web. It defines a data model that is commonly serialized as XML (RDF/XML). RDF statements describe a resource (identified by a URI), the resource's properties, and the values of those properties. RDF statements are often referred to as "triples" that consist of a subject, predicate, and object, which correspond to a resource (subject), a property (predicate), and a property value (object). A triple written in plain text is depicted in Figure 1.
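As a rough illustration of the triple structure, the sketch below models statements as plain Python tuples rather than using an RDF library. The article URI is hypothetical. The two property URIs are the Dublin Core title and creator terms, and the Triple class, objects, and to_ntriples are names invented for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described, identified by a URI
    predicate: str  # the property, also a URI
    obj: str        # the property value (a literal string in this sketch)


# Hypothetical URI for this article (an assumption).
ARTICLE = "http://example.org/articles/untangling-the-semantic-web"
# Dublin Core element set properties.
TITLE = "http://purl.org/dc/elements/1.1/title"
CREATOR = "http://purl.org/dc/elements/1.1/creator"

graph = [
    Triple(ARTICLE, TITLE, "Untangling the Semantic Web"),
    Triple(ARTICLE, CREATOR, "Erin Cavanaugh"),
]


def objects(graph, subject, predicate):
    """Return every value asserted for the given subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


def to_ntriples(t):
    """Format a triple with a literal object as one N-Triples-style line.

    Escaping of quotes and backslashes inside the literal is omitted here.
    """
    return f'<{t.subject}> <{t.predicate}> "{t.obj}" .'


print(objects(graph, ARTICLE, CREATOR))
for t in graph:
    print(to_ntriples(t))
```

Because every statement has the same three-part shape, statements from different sources can be placed in one list and queried together.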

By creating triples with subjects, predicates, and objects, RDF allows machines to make logical assertions based on the associations between subjects and objects. However, while RDF provides a model and a syntax (the rules that specify the elements of a sentence) for describing resources, it does not specify the semantics (the meaning) of the resources. To truly define semantics, we need RDFS (RDF Schema) or OWL (Web Ontology Language).
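One way to see what a schema language adds is a small inference sketch. The code below applies a single RDFS rule: if a resource has type C and C is a subclass of D, then the resource also has type D. The rdf:type and rdfs:subClassOf URIs are the standard ones. The flight classes, the example.org namespace, and the infer_types function are assumptions made for illustration, and this is not a complete RDFS reasoner.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace


def infer_types(triples):
    """Apply (x type C) and (C subClassOf D) => (x type D) until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == RDFS_SUBCLASS_OF]
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                new_triple = (s, RDF_TYPE, sup)
                if o == sub and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


# Hypothetical facts: one instance statement and two schema statements.
facts = {
    (EX + "flight42", RDF_TYPE, EX + "NonstopFlight"),
    (EX + "NonstopFlight", RDFS_SUBCLASS_OF, EX + "Flight"),
    (EX + "Flight", RDFS_SUBCLASS_OF, EX + "Trip"),
}

closure = infer_types(facts)
for triple in sorted(closure):
    print(triple)
```

The input never states that flight42 is a Flight or a Trip. Those statements follow from the subclass declarations, which is the kind of meaning that plain RDF triples do not supply on their own.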

More Stories By Erin Cavanaugh

Erin Cavanaugh is product marketing manager for Altova (www.altova.com), creator of XMLSpy and other leading XML, data management, UML, and Web services tools. In this role, Erin manages Altova's XML-related line of tools. She has held product marketing, training, and technical copywriting roles at a variety of hardware and software firms.
