
Understanding the Semantic Web Stack

 

Last summer, as part of my work on the Mailing Systems Capabilities reference implementation, I had to wrestle with a whole new model for describing data. Instead of the structured data design used in standard programming paradigms, I was exposed to the idea of designing from a knowledge perspective while trying to understand the W3C standards for describing devices. These standards use the Semantic Web language stack that the W3C Semantic Web Activity designed for representing knowledge. The Semantic Web Stack includes three web technologies, with XML at the bottom, RDF/RDF Schema in the middle, and OWL at the top. At the time, I had real trouble understanding what value each level added when designing representations for data consumed by computer applications. Specifically, I could not understand the justification for the added complexity (and the use of less stable tools) of the RDF/RDF Schema level. This discussion is intended to start a conversation inside AC&T about criteria for selecting a web technology for representing data.

XML/XML Schema is a technology that can be used to define the hierarchical structure of data and the constraints on values carried within that structure. XML instance documents whose data conforms to that structure and those value constraints are said to be valid documents. There are many stable tools that simplify the use of instance documents in software applications, including the ability to: 1) ensure the validity of instance documents, 2) create/extract data using the Web DOM paradigm, and 3) create/extract data using the OO paradigm. For example, JAXB, a specification for mapping schemas to Java objects, helps integrate data from instance documents more naturally into object-based applications. Actual Java classes are generated from the schema, and the data in instance documents is transformed into the appropriate objects. Given the breadth and reliability of XML-related tools, using XML Schema to represent data within well-defined systems makes a lot of sense.
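As a rough illustration of the third capability (extracting data in the OO paradigm), here is a minimal sketch that uses Python's standard library rather than JAXB. The Device class, the element names, and the sample document are all hypothetical, and the hand-written mapping only stands in for the classes a binding tool would generate from a schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical instance document; the element names are invented for illustration.
INSTANCE = """
<device>
  <name>Mailer-1</name>
  <maxEnvelopeWidthMm>240</maxEnvelopeWidthMm>
</device>
"""


@dataclass
class Device:
    """Plays the role of a class that a binding tool would generate."""
    name: str
    max_envelope_width_mm: int


def load_device(xml_text: str) -> Device:
    """Turn an instance document into a typed object."""
    root = ET.fromstring(xml_text)
    name = root.findtext("name")
    width = root.findtext("maxEnvelopeWidthMm")
    # A real schema validator would reject the document before this point;
    # this check is only a stand-in for that step.
    if name is None or width is None:
        raise ValueError("document does not match the expected structure")
    return Device(name=name, max_envelope_width_mm=int(width))


device = load_device(INSTANCE)
print(device)
```

The application then works with device.name and device.max_envelope_width_mm as ordinary typed fields instead of walking the document tree.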

RDF/RDF Schema can also be used to represent data; however, it de-emphasizes hierarchical structure and prescribed constraints. XML is an underlying technology, but it is relegated to being the "interchange language," i.e., the language used to send RDF documents from one computer to another. Theoretically speaking, RDF represents data as a pool of "Things" whose inter-relationships are defined by "Properties." It relies on unique naming so that the "Things" can be identified beyond a single scope. From what I can understand, the motivation behind RDF and RDF Schema, as they exist today, is to represent data that is incomplete, and to provide a simple mechanism for merging data that appears related in order to build a "more complete" representation of the world from the available data. Think about medical records, for example. Each medical office you visit has a piece of your medical history. If you could consolidate the data pieces from across the entire web, your view into your health would be more complete. RDF is made for this. Relationships can be aggregated for "Things" with the same ids, so if all of your medical records had "your" unique web identifier, you could create a document with your complete medical history.
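The merging idea can be sketched in a few lines of plain Python, without any RDF library. This is only an illustration of the data model: each statement is a (thing, property, value) triple, and the identifier and property names below are hypothetical.

```python
# Each statement is a (thing, property, value) triple.
# The identifier and the property names are hypothetical.
PATIENT = "http://example.org/people/jane"

# Two offices each hold part of the picture, keyed by the same identifier.
office_a = {
    (PATIENT, "hasAllergy", "penicillin"),
    (PATIENT, "bloodType", "O+"),
}
office_b = {
    (PATIENT, "hasVaccination", "tetanus"),
    (PATIENT, "bloodType", "O+"),
}

# Merging is a set union: duplicate statements collapse, new ones accumulate.
merged = office_a | office_b


def describe(graph, thing):
    """Return every (property, value) pair known about one thing."""
    return sorted((p, v) for s, p, v in graph if s == thing)


history = describe(merged, PATIENT)
for prop, value in history:
    print(prop, value)
```

Because both offices use the same identifier, no mapping step is needed; the union alone yields the combined history.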

OWL is a set of language extensions to RDF Schema that help complete the aggregation and interpretation of disparate data. For example, the equivalence of "Things" with different identifiers can be declared, along with other meta information that supports inference across the data set. We all know, for instance, that medical offices use different identifiers for their patients. With OWL, you can specify that patient 123 from one medical office is the same as patient XYZ from another, and so more realistically aggregate the information across medical offices. From my POV, RDF/RDF Schema is not fully capable of representing and merging partially represented data. OWL is needed to fully address this space. Therefore, applications should generally use the OWL extensions rather than just RDF Schema (IMHO). Unfortunately, the OWL specification still seems to be evolving; an OWL 2.0 has just been released. Also, tools that support the use of RDF and OWL instance documents are few, and those that exist are not as mature as the tools for XML. They tend to be harder to use and to have much slower performance (based on experience from prior projects).
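To make the patient 123 / patient XYZ example concrete, here is a minimal sketch of how an equivalence declaration lets two sets of statements be folded together. It is plain Python, not an OWL reasoner: the identifiers and property names are hypothetical, and the only OWL term borrowed is owl:sameAs.

```python
SAME_AS = "owl:sameAs"

# Hypothetical statements from two offices that use different patient ids,
# plus one declaration that the two ids denote the same person.
graph = {
    ("officeA:patient123", "hasAllergy", "penicillin"),
    ("officeB:patientXYZ", "hasVaccination", "tetanus"),
    ("officeA:patient123", SAME_AS, "officeB:patientXYZ"),
}


def canonical_ids(graph):
    """Build a lookup that maps each id to one representative per sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the chain as we walk it
            x = parent[x]
        return x

    for s, p, o in graph:
        if p == SAME_AS:
            # Pick the alphabetically smaller id so the result is deterministic.
            keep, drop = sorted((find(s), find(o)))
            parent[drop] = keep
    return find


def merge_equivalents(graph):
    """Rewrite every subject to its representative and drop the sameAs statements."""
    find = canonical_ids(graph)
    return {(find(s), p, o) for s, p, o in graph if p != SAME_AS}


merged = merge_equivalents(graph)
for triple in sorted(merged):
    print(triple)
```

Without the sameAs statement, the two records would stay attached to two unrelated identifiers, which is the gap in plain RDF/RDF Schema described above.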

Another, less obvious distinction between XML and RDF technologies is the idea of "closed world" vs. "open world." When using XML technologies, the data is assumed to be fully specified. If a data element is missing, then it simply does not, and will not, exist. This is referred to as the "closed world" assumption. (Please note I am oversimplifying for the purposes of this document.) RDF assumes an underlying "open world" model. If data is missing, then the idea is that it could still exist, either at another place or another time, or… This shows itself most clearly in the idea of validation. An XML document must exactly match the hierarchy and constraints expressed in the underlying schema. New elements cannot be added to the instance document unless the schema is first changed. In RDF, new properties can be added at any time, and the document will still be considered valid. For example, an RDF vocabulary can specify a household with the properties isFather, isMother, isDaughter, and isSon. An RDF instance document could contain a property, isPet, that is unknown to the schema and still be a valid document.
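The household example can be sketched as two small checks, one per assumption. This is a simplified illustration in plain Python, not how an XML Schema validator or an RDF toolkit is actually implemented; the function names and the sample record are hypothetical.

```python
# Properties declared by the (hypothetical) household vocabulary.
KNOWN_PROPERTIES = {"isFather", "isMother", "isDaughter", "isSon"}

# An instance that uses one property the vocabulary does not declare.
household = {"isFather": "Tom", "isMother": "Ann", "isPet": "Rex"}


def closed_world_check(record, known):
    """Closed world: any undeclared property makes the record invalid."""
    unknown = sorted(set(record) - known)
    return (not unknown, unknown)


def open_world_check(record, known):
    """Open world: undeclared properties are reported but still accepted."""
    unknown = sorted(set(record) - known)
    return (True, unknown)


def has_property(record, prop, open_world):
    """Answer 'does this household have prop?' under each assumption."""
    if prop in record:
        return "yes"
    # Missing data means "no" in a closed world, but only "unknown" in an open one.
    return "unknown" if open_world else "no"


print(closed_world_check(household, KNOWN_PROPERTIES))
print(open_world_check(household, KNOWN_PROPERTIES))
print(has_property(household, "isSon", open_world=False))
print(has_property(household, "isSon", open_world=True))
```

The same record fails the closed-world check because of isPet and passes the open-world one, and the missing isSon is read as "no" in one case and "unknown" in the other.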

So, when you need to choose which of the web technologies to use to represent data, I recommend staying with XML unless you need to represent partial (and/or evolving) data. For now, the tools make the difference. In a few years, OWL tools may mature. At that point, transforming XML data to RDF, if needed, is very straightforward, and I believe that the W3C is actually looking into ways to formalize the translation, based on a conversation that David and I had with them in January.

This document is subject to change, based on further experience I may gain with RDF/RDF Schema and OWL.

Deb Zukowski

June 18, 2009