There are many debates on the internet about the relationships between the Resource Description Framework (RDF), Topic Maps and various ontology languages. The introduction of further ontology languages such as OWL and SKOS has added fuel to the fire. The W3C has made an attempt to establish standard guidelines for RDF/Topic Maps interoperability by consolidating the existing proposals for integrating RDF and Topic Maps data.
The primary goal of W3C was to achieve interoperability between RDF and Topic Maps at the data level. This means that it should be possible to translate data from one form to the other without unacceptable loss of information or corruption of the semantics. It should also be possible to query the results of a translation in terms of the target model and it should be possible to share vocabularies across the two paradigms.
For readers who are not familiar with Topic Maps, a good introduction can be found in Steve Pepper's famous document: "The TAO of Topic Maps, Finding the way in the age of infoglut". It is also assumed that the reader is familiar with RDF and its triple concept, as well as with Semantic Web concepts.
Introduction
The Resource Description Framework (RDF) is a model developed by the W3C for representing information about resources in the World Wide Web. Topic Maps is a standard for knowledge integration developed by the ISO. The two specifications were developed in parallel during the late 1990s within their separate organizations for what at first appeared to be very different purposes. However, it turns out that they have a lot in common.
A number of attempts have been made to uncover the synergies between RDF and Topic Maps and to find ways of achieving interoperability at the data level. The goal of W3C now is to provide guidelines for users who want to combine the W3C's RDF/OWL family of specifications and the ISO's family of Topic Maps standards. This article is a survey of existing approaches and an analysis of their strengths and weaknesses. A W3C Recommendation with guidelines on transforming between the two has yet to be published.
So what are the proposals?
I have selected five different proposals for analysis. They will be referred to by the names of their authors or, in the case of multiple authors, by the name of the organisation to which the authors are affiliated. Each proposal builds upon and references previous work and they are characterised here in terms of the translation directions that they cover: i.e., RDF to Topic Maps (RDF2TM), and Topic Maps to RDF (TM2RDF), respectively:
- Moore: RDF2TM and TM2RDF proposal. (Described in "RDF and Topic Maps: An exercise in convergence").
- Stanford: TM2RDF proposal. (Described in "On the Integration of Topic Maps and RDF Data").
- Ogievetsky: TM2RDF proposal. It is implemented in XTM2RDF translator. (Described in "XML Topic Maps through RDF glasses").
- Garshol: RDF2TM and TM2RDF proposals. Both are implemented in Ontopia Knowledge Suite (a commercial set of tools for building, maintaining, and deploying topic map-based applications). The original proposals are described in "Topic maps, RDF, DAML, OIL: A comparison", and "Living with Topic Maps and RDF".
- Unibo: RDF2TM and TM2RDF proposals. Described in "Metadata on the Web: On the integration of RDF and Topic Maps". This proposal has been implemented in Meta, a tool implemented at the University of Bologna. Meta allows the creation and navigation of documents containing meta-data information in an environment where RDF and Topic Maps need to coexist.
A word must also be said about XTM. TM is short for Topic Maps, the name of the standard, the paradigm and (lower-cased) the artifacts themselves. XTM (XML Topic Maps) is the standard interchange syntax. Where XTM specifically is meant, it is named explicitly.
All the existing approaches fall into two distinct categories that Moore originally termed "modeling the model" and "mapping the model". These might be more appropriately termed "object mappings" and "semantic mappings" respectively. The basic difference between the two approaches can be summed up as follows:
- Object mappings use the low-level building blocks of one language to describe the object model of the other. For example, assuming for now that the structure of a simple binary associations data model is a quintuple, consisting of one (a)ssociation, two (r)oles, and two role (p)layers (p-r-a-r-p), that association would be represented using an object mapping as four statements that relate five resources.
- Semantic mappings start from higher level concepts that carry the semantics of each model and attempt to find equivalences between them. A binary association in Topic Maps would be seen to represent the same kind of "thing" that is often represented by an RDF statement (i.e., a relationship between two entities) and would therefore be represented using a single RDF statement. Where no direct semantic equivalent can be found, the missing semantics are defined using the facilities available in one of the two paradigms, i.e., classes, properties, or published subjects.
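The difference between the two categories can be sketched in a few lines of Python. This is only an illustration of the p-r-a-r-p idea described above, not code from any of the proposals: the `ex:plays` and `ex:roleIn` properties, the `ex:` identifiers and the choice of the first role player as the subject of the single statement are all assumptions made for the example.

```python
from typing import NamedTuple


class BinaryAssociation(NamedTuple):
    """A binary association seen as the quintuple p-r-a-r-p."""
    player1: str
    role1: str
    association: str
    role2: str
    player2: str


def object_mapping(assoc):
    """Describe the structure of the association itself:
    four statements that relate five resources."""
    return [
        (assoc.player1, "ex:plays", assoc.role1),        # hypothetical property
        (assoc.role1, "ex:roleIn", assoc.association),   # hypothetical property
        (assoc.role2, "ex:roleIn", assoc.association),
        (assoc.player2, "ex:plays", assoc.role2),
    ]


def semantic_mapping(assoc):
    """Treat the association as a relationship between its two players:
    a single statement (subject choice is an assumption)."""
    return [(assoc.player1, assoc.association, assoc.player2)]


tosca = BinaryAssociation(
    "ex:tosca", "ex:work", "ex:composed-by", "ex:composer", "ex:puccini"
)
print(len(object_mapping(tosca)), len(semantic_mapping(tosca)))  # prints "4 1"
```

The object mapping keeps every structural part of the source model visible, which is why it grows so quickly; the semantic mapping discards the role information unless it is recorded elsewhere.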
The Moore proposal
It was the first proposal to address the issue of interoperability between RDF and Topic Maps. Having presented the two models, Moore introduces the distinction between what he calls "mapping the model" and "modeling the model". The key difference is that the former is "semantic", whereas the latter "uses each standard as a tool for describing other models". Let's look at what this means in practice.
Moore's RDF2TM object mapping approach is based on defining PSIs (Published Subject Identifiers: stable URIs that identify a subject by pointing to a published subject indicator) for every RDF construct in his model (i.e., resource, statement, property, subject, object, identity, literal, and model) and expressing RDF statements as ternary associations of type rdf-statement using the role types rdf-subject, rdf-property and rdf-object. This raises issues with the handling of literals (since role players in associations cannot be strings), to which no solution is proposed.
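As a rough sketch of this object mapping, the following Python turns one RDF statement into a ternary association. The association type and role type names come from the description above; the `Literal` and `Association` classes and the function name are hypothetical helpers introduced only for this illustration, and raising an error for literals simply reflects the fact that the proposal offers no solution for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A string value, as opposed to a resource identifier (hypothetical helper)."""
    value: str


@dataclass(frozen=True)
class Association:
    type: str
    roles: tuple  # (role_type, player) pairs


def statement_to_association(subject, prop, obj):
    """Express one RDF statement as a ternary association of type rdf-statement."""
    if isinstance(obj, Literal):
        # Role players cannot be strings, and the proposal gives no way around this.
        raise ValueError("literal objects cannot be role players")
    return Association(
        "rdf-statement",
        (("rdf-subject", subject), ("rdf-property", prop), ("rdf-object", obj)),
    )


assoc = statement_to_association("ex:tosca", "ex:composed-by", "ex:puccini")
for role_type, player in assoc.roles:
    print(role_type, player)
```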
The TM2RDF object mapping approach is based on defining RDF properties for each TM construct as follows: topic, topicassoc, instanceof, topicassocmember, roleplayingtopic, roledefiningtopic, topicoccur, topicname, topicnamevalue, scopeset, subjindicatorref, resourceref.
Moore's object mapping approach is reasonably complete, whereas his semantic mapping approach is just a sketch that focuses on RDF statements and associations. Neither approach is reversible. In the case of the object mapping approach, the assumption is that one is working in one domain or the other, but not in both.
In the case of the semantic mapping approach, the fact that a statement maps to a single association whereas an association maps to two statements shows that translations cannot be reversed. Semantic mappings are shown to be superior to object mappings in terms of naturalness. The latter yield unnatural results in both directions. Whatever the direction, a "natural" source document leads to an "unnatural" result and achieving a "natural" result is only possible if the starting point is "unnatural". In the object mapping example given in Moore's proposal, a simple binary association translates to 22 RDF statements.
The Stanford proposal
The main idea is to make it possible to query Topic Maps using an "RDF-aware infrastructure". This proposal is thus TM2RDF only. Reference is made to the layered integration model of data interoperability, which separates the data integration problem into three quasi-independent layers: the syntax layer, the object layer, and the semantic layer. The idea is to build an RDF representation of the topic map on the object layer and then perform a "bijective graph transformation" such that the topic map can be viewed as RDF. Ignoring the syntax layer means that the approach will work with both the SGML and the XML serialization syntaxes of Topic Maps.
The approach also ignores the semantic layer so that, according to the authors, all information is preserved. Instead of defining their own model for Topic Maps, the authors use PMTM4, the Processing Model for Topic Maps, proposed by Newcomb and Biezunski ("Topicmaps.net's Processing Model for XTM 1.0"). In short, PMTM4 is a graph model consisting of three node types (for topics, associations, and scopes), and four arc types: associationMember (aM), associationScope (aS), associationTemplate (aT), and scopeComponent (sC).
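The shape of such a graph can be sketched in Python as follows. This only enumerates the node and arc types named above and flattens arcs into triples; it is not the Stanford transformation itself. The class names, the node identifiers, the `tms:` prefix on every arc and the choice of which node types each arc connects in the usage example are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    TOPIC = "topic"
    ASSOCIATION = "association"
    SCOPE = "scope"


class ArcType(Enum):
    AM = "associationMember"
    AS = "associationScope"
    AT = "associationTemplate"
    SC = "scopeComponent"


@dataclass
class Pmtm4Graph:
    nodes: dict = field(default_factory=dict)  # node id -> NodeType
    arcs: list = field(default_factory=list)   # (source, ArcType, target)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_arc(self, source, arc_type, target):
        # Both ends must be known nodes; endpoint node types are not checked here.
        for end in (source, target):
            if end not in self.nodes:
                raise KeyError(f"unknown node: {end}")
        self.arcs.append((source, arc_type, target))

    def to_triples(self, prefix="tms:"):
        """Flatten every arc into a (subject, property, object) triple."""
        return [(s, prefix + arc.value, t) for s, arc, t in self.arcs]


g = Pmtm4Graph()
g.add_node("a1", NodeType.ASSOCIATION)
for topic in ("tosca", "puccini", "composed-by"):
    g.add_node(topic, NodeType.TOPIC)
g.add_arc("a1", ArcType.AM, "tosca")
g.add_arc("a1", ArcType.AM, "puccini")
g.add_arc("a1", ArcType.AT, "composed-by")
for triple in g.to_triples():
    print(triple)
```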
Having constructed an RDF graph from the topic map, the authors show how it can be queried, together with native RDF data, by a single query expressed in a special logic syntax. The query in the following example uses the RDF-encoded topic map to find all countries that have petroleum as a natural resource and then extracts links to DMOZ Travel_and_Tourism pages for those countries from the RDF-encoded Open Directory (See example1.txt):
Example 1
FORALL pages <- Country, DMOZCountry Y,X, Z
Y[tms:roleLabel->country;rdf:object->Country]
@CIA_WORLD_FACTBOOK and
X[tms:roleLabel->natural-resource;
rdf:object->petroleum;
rdf:subject->Z[tms:associationMember->Country]
@CIA_WORLD_FACTBOOK]
@CIA_WORLD_FACTBOOK and
Country[mapsTo->DMOZCountry] and
DMOZCountry[Travel_and_Tourism ->dmozpage[links->pages]]
@DMOZ.
The Stanford approach is complete with respect to PMTM4, but the latter is not a complete model for Topic Maps since it does not handle URIs and strings. The Stanford proposal itself is therefore not complete. The proposal does not score well in terms of naturalness since it requires upwards of 20 statements to represent information that would naturally be modeled using two statements in RDF.
The Ogievetsky proposal
In this proposal, the author describes both a method for transforming topic maps expressed in XTM syntax to RDF and the author's XSLT-based implementation of this approach in the XTM2RDF Translator. Transformations are described in terms of the processing of XTM elements and the approach is thus very syntax-oriented. The resulting RDF conforms to a vocabulary (called RTM) which consists of 11 classes and 17 properties defined partly in terms of XTM itself and partly in terms of PMTM4, the "processing model" proposed by Newcomb and Biezunski and described in the preceding section.
The classes and properties defined by the RTM vocabulary are:
- rdfs:Class: t-node, topic, scope, member, association, basename, variantname, occurrence, class-subclass, class-instance, templaterpc.
- rdf:Property: association-role, validIn, indicatedBy, constitutedBy, name, templatedBy, role-topic, role-basename, role-variantname, role-occurrence, role-superclass, role-subclass, role-class, role-instance, role-template, role-role, role-rpc.
The mapping is pretty simple: each <topic> element results in the creation of an RDF statement of type rtm:topic. The topic's subject locator (if any) becomes the URI of the subject of the statement; otherwise a blank node is created. Subject identifiers (if any) result in properties of type rtm:indicatedBy.
Associations are represented as blank nodes whose type corresponds to the association type. In addition, for each role in the association there is one statement whose property corresponds to the role type (e.g. ns1:composer and ns1:work in the example below); its value is a node of type rtm:member that references the role player. Referencing is done through an rtm:indicatedBy property when the role player has a subject identifier and an rtm:constitutedBy property when the role player has a subject locator. (The text does not state what form the reference takes when the role player has neither.)
The following example shows how the association between Tosca and Puccini is represented in RDF/XML in "third RDF basic abbreviated form" (See example2.txt):
Example 2
<ns1:composed-by>
  <ns1:composer>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="http://en.wikipedia.org/wiki/Puccini" />
    </rtm:member>
  </ns1:composer>
  <ns1:work>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="http://psi.ontopia.net/opera/#tosca" />
    </rtm:member>
  </ns1:work>
</ns1:composed-by>
There is a very obvious similarity between the syntax shown above and XTM, which could indicate that the desire to output readable RDF/XML syntax (and perhaps the exigencies of XSLT-based processing) have influenced the form of RDF chosen for the target model.
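To make the statement count concrete, the association rule described above can be sketched in Python as a function that emits plain triples. This is an illustrative reading of the rule, not the XSLT implementation: the function name, the blank node labels and the `"identifier"` / `"locator"` tags are assumptions, and a role player with neither kind of reference is rejected because the text does not say what form the reference takes in that case.

```python
RDF_TYPE = "rdf:type"

# Which RTM property references the role player, by kind of reference.
REFERENCE_PROPERTY = {
    "identifier": "rtm:indicatedBy",   # role player has a subject identifier
    "locator": "rtm:constitutedBy",    # role player has a subject locator
}


def association_to_triples(assoc_type, roles):
    """roles is a list of (role_type, kind, reference) tuples."""
    assoc_node = "_:b0"
    # The association is a blank node typed by the association type.
    triples = [(assoc_node, RDF_TYPE, assoc_type)]
    for index, (role_type, kind, reference) in enumerate(roles, start=1):
        if kind not in REFERENCE_PROPERTY:
            raise ValueError(f"no reference form is specified for kind {kind!r}")
        member = f"_:b{index}"
        triples.append((assoc_node, role_type, member))       # role statement
        triples.append((member, RDF_TYPE, "rtm:member"))      # member node type
        triples.append((member, REFERENCE_PROPERTY[kind], reference))
    return triples


triples = association_to_triples(
    "ns1:composed-by",
    [
        ("ns1:composer", "identifier", "http://en.wikipedia.org/wiki/Puccini"),
        ("ns1:work", "identifier", "http://psi.ontopia.net/opera/#tosca"),
    ],
)
print(len(triples))  # prints 7
```

One type statement for the association plus three statements per role gives seven statements for a binary association.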
String values for names and internal occurrences are represented as the values of rtm:name properties of member nodes. The following example shows the base name of the composer Puccini as output by the xtm2rdf.xsl XSLT stylesheet (See example3.txt). A blank node represents the topic-basename relationship. Syntactically, the rtm:baseName construct has exactly the same "shape" as the association shown above:
Example 3
<rtm:baseName rdf:ID="XSLTbaseName122124120120">
  <rtm:role-topic>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="#puccini" />
    </rtm:member>
  </rtm:role-topic>
  <rtm:role-name>
    <rtm:member>
      <rtm:name>Giacomo Puccini</rtm:name>
    </rtm:member>
  </rtm:role-name>
</rtm:baseName>
As with binary associations, seven RDF statements are required to represent a single topic name characteristic that would naturally be modeled using a single statement in RDF.
The author also shows how such "RDF Topic Maps" can be queried (using the RDF query language SquishQL) and constrained (using DAML+OIL). The following sample query (See example4.txt) shows how to find all topics that have names in the scope "taxon":
Example 4
SELECT ?topic, ?name
FROM http://www.cogx.com/xtm2rdf/seacr.rtm#
WHERE
(rdf::type ?a rtm::basename)
(rtm::role-topic ?a ?m1) (rtm::indicatedBy ?m1 ?topic)
(rtm::role-name ?a ?m2)(rtm::name ?m2 ?name)
(rtm::validIn ?a ?s)(rtm::indicatedBy ?s this::taxon)
USING
rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
rtm FOR http://www.cogx.com/xtm2rdf/rtm.rdf#
this FOR http://www.cogx.com/xtm2rdf/seacr.rtm#
The proposal appears to be fairly complete in that it covers more-or-less every aspect of XTM syntax (which requires extending the underlying PMTM4 model in order to cater for identifiers). The proposal requires seven statements to represent information content that would naturally be modeled using one statement in RDF and thus rates very low in terms of naturalness. Translating the Topic Maps test case results in an RDF document containing 125 statements.
The Garshol proposal
This proposal was originally presented as part of a comparative analysis of the RDF and Topic Maps models. The analysis was further developed (and extended to partially address OWL). The approach has been implemented by the author in the Ontopia Knowledge Suite.
The author starts by comparing RDF and Topic Maps through an examination of concepts that are fundamental to both paradigms: "symbols and things", "assertions", "identity", "reification", "qualification", and "types and subtypes". For each concept, Garshol shows how it is expressed in each paradigm and draws out the similarities and differences.
According to Garshol, RDF and Topic Maps are both "identity-based technologies"; that is, the key concept in both is symbols representing identifiable things about which assertions can be made. In Topic Maps, "things" are called "subjects"; in RDF they are called "resources" and, despite different definitions, they are essentially the same concept. Subjects are represented by topics; resources are represented by RDF nodes (or "nodes" for short). According to Garshol, the correspondence between "topic" and "node" is close but not exact.
Assertions express relationships between things and take the form of "topic characteristics" in Topic Maps and "statements" in RDF. A topic characteristic can be a name, an occurrence, or an association. An RDF statement can thus in theory be mapped to any one of these three kinds of construct. Special attention is paid to associations since these can be of any arity, whereas all RDF statements are binary. A binary association maps fairly well to an RDF statement, but a non-binary association does not.
The concept of types and subtypes, on the other hand, is regarded as being identical in Topic Maps and RDF (except for the fact that the subClassOf property is part of RDF Schema rather than RDF itself).
The author considers the object-mapping approaches described in previous proposals to be heavy-weight and rather awkward to work with. As an alternative, Garshol proposes to use vocabulary-specific mappings underpinned by a generic mapping. Statements should in general be mapped to names, occurrences or associations since this provides the most "natural" results. However, it is not possible to know which of these is most appropriate for any given statement without an understanding of the semantics of the property in question -- hence the need for vocabulary-specific mappings.
For example, the RDF statement:
<http://example.com/X> <http://example.com/Y> "foo" .
could be mapped in Topic Maps to either a name or an internal occurrence (since the object is a literal). Similarly, the statement:
<http://example.com/X> <http://example.com/W> <http://example.com/Z> .
could be mapped to either an association or an external occurrence (since the object is a resource). An optimal semantic translation cannot be performed without knowledge of the semantics of the properties Y and W.
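The ambiguity can be stated as a small Python function that looks only at the form of the object term. It is a sketch of the reasoning above rather than part of Garshol's implementation; the function name is hypothetical and blank nodes are deliberately left out because the text does not discuss them.

```python
def candidate_constructs(object_term: str) -> tuple:
    """Return the Topic Maps constructs a statement could map to,
    judging only by whether its object is a literal or a resource."""
    if object_term.startswith('"'):
        # Literal object: a name or an internal occurrence.
        return ("name", "internal occurrence")
    if object_term.startswith("<"):
        # Resource object: an association or an external occurrence.
        return ("association", "external occurrence")
    raise ValueError(f"unsupported object term: {object_term}")


print(candidate_constructs('"foo"'))
print(candidate_constructs("<http://example.com/Z>"))
```

In both cases two candidates remain, which is exactly why knowledge of the property's semantics is needed to pick one.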
For RDF2TM mapping, the solution is to provide additional mapping information. This is done using an RDF vocabulary called RTM which is used to annotate RDF documents (or their schemas) and thus guide the translation process. The RTM vocabulary is used for translating from RDF to Topic Maps and consists of the following RDF properties: maps-to, type, in-scope, subject-role, object-role.
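A guided translation of this kind might look roughly like the following Python sketch. The keys `maps-to`, `type`, `in-scope`, `subject-role` and `object-role` are the property names listed above; everything else is an assumption made for illustration, including the plain-string values `"name"`, `"occurrence"` and `"association"`, the dictionary representation of the result, the role URIs, and the fallback of using the RDF property itself as the type when no `type` is given. The real vocabulary is expressed as RDF annotations, not as a Python dictionary.

```python
# Hypothetical mapping annotations, keyed by RDF property URI.
MAPPING = {
    "http://example.com/Y": {"maps-to": "name"},
    "http://example.com/W": {
        "maps-to": "association",
        "subject-role": "http://example.com/role-a",  # hypothetical role type
        "object-role": "http://example.com/role-b",   # hypothetical role type
    },
}


def translate(subject, prop, obj, mapping):
    """Translate one RDF statement into a description of a Topic Maps construct."""
    rule = mapping.get(prop)
    if rule is None:
        # Real implementations may apply defaults here; this sketch does not.
        raise LookupError(f"no mapping for property {prop}")
    kind = rule["maps-to"]
    scope = rule.get("in-scope")
    if kind == "name":
        return {"construct": "name", "topic": subject, "value": obj, "scope": scope}
    if kind == "occurrence":
        return {"construct": "occurrence", "topic": subject,
                "type": rule.get("type", prop), "value": obj, "scope": scope}
    if kind == "association":
        return {"construct": "association", "type": rule.get("type", prop),
                "roles": [(rule["subject-role"], subject),
                          (rule["object-role"], obj)],
                "scope": scope}
    raise ValueError(f"unknown maps-to value: {kind}")


print(translate("http://example.com/X", "http://example.com/Y", "foo", MAPPING))
print(translate("http://example.com/X", "http://example.com/W",
                "http://example.com/Z", MAPPING))
```

Keeping the mapping separate from the data means the same RDF document can be translated differently by swapping the annotations, at the cost of having to write them for every vocabulary.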
For TM2RDF, additional information is likewise required in order to achieve optimal and/or predictable results. As with the RDF2TM translation, the implementations provide some level of defaulting. Both subject identifiers and subject locators are automatically mapped to resource URIs. In addition, associations can be exported to RDF in the absence of mapping information about roles; in this case the choice of subject and object for the resulting statement is arbitrary.
As currently specified the Garshol proposal provides an almost complete solution and the author himself identifies most of the respects in which it is incomplete. Those which are not mentioned include containers, collections, XML literals and typed literals. A high degree of reversibility and round-tripping is achievable, provided appropriate reverse mappings are generated during the translation. An issue exists with subject locators that end up as subject identifiers when round-tripping from Topic Maps to RDF and back to Topic Maps.
The Unibo proposal
The authors of the Unibo proposal clearly prefer Garshol's approach to those previously described, because it produces much more "readable" results and is similar to their own. The main difference is that Garshol does not utilise the "standard RDF and RDFS predicates" and thus always requires a mapping to be specified.
Like earlier authors, Ciancarini et al recognize that there are two fundamental approaches to tackling the problem of translation, corresponding to what this survey calls object mapping and semantic mapping. The first of these is seen to be problematic in that "the converted document is necessarily very different from the one that would have been written directly in the destination language, and hardly readable." The problem with the second one is that it is "not always possible" to identify semantic equivalences, and that doing so often requires a case-by-case approach and thus has no general usefulness.
The authors therefore consider a hybrid approach to be the optimal solution and their implementation in the Meta Converter combines a generic mapping, which tries to stay as close as possible to the original semantics, with the ability to define specific mappings using an XML vocabulary.
The Unibo proposal is fairly complete but some features, e.g., language tags and data typing in RDF, and reification of roles and topic maps, are not covered explicitly. The proposal permits some degree of reversibility, but the result of a roundtrip may not always be the same as the starting point. For example, using the generic mappings, most RDF statements would be translated to typed associations with untyped roles, each of which would result in several statements when translated back to RDF.
The approach produces somewhat natural results in both directions provided mapping information is supplied. Generic translations are far less satisfactory, with a single binary association resulting in nine RDF statements.
Resolving remaining issues
Among the several possible criteria for evaluating these proposals, two -- completeness and naturalness -- have been selected as the most relevant and appropriate for evaluating the qualities and limitations of each proposal.
Completeness is defined as the extent to which any semantic structure in the source model is correctly (i.e., without losing or adding information) translated into the destination model. It provides a clear indication of the semantic power of each translation approach.
Naturalness is defined as the extent to which a translated model resembles in structure and content an equivalent model expressed directly in the target paradigm. It provides an indication of the level of integration that each approach offers, that is, how well the translated result can merge and interact with other models expressed in the same paradigm.
The analysis of the proposals identified two main approaches towards translation, which we dubbed "object mapping" (providing a translation of every structural component of the source paradigm) and "semantic mapping" (providing a structure corresponding to every conceptual structure of the source model).
The analysis of the options and solutions provided in the literature, therefore, clearly shows the advantages of semantic mapping, but at the same time lists the issues that need to be addressed and solved in any future translation approach. However, now that both RDF and Topic Maps have formal data models, and with the help of RDF Schema and OWL, it seems likely that most, if not all, of the issues we have listed here can be resolved without resorting to the restricted interoperability offered by object mapping.





