Giovanni Pirrotta

Just a curious person

Semantic Web Ingredients: RDFa, RDF in attributes

July 30, 2014

In previous posts I introduced the RDF model as the key ingredient of the Semantic Web, along with other standard languages, such as RDF Schema and OWL, that describe concepts and relationships more expressively. These semantic technologies make it possible to extend the current Web with a globally coherent notion of meaning, turning text into concepts and relations so that data can be integrated more easily with external applications.

We know that HTML does not explicitly encode the meaning of the information it presents. So, to create the new Web (which some call Web 3.0), we would have to proceed as follows:

  • for each new Web page, we have to describe the information it contains in a new, parallel document, applying one or more ontologies that map the page's data to resources. The result is a machine-understandable document;
  • for pages already on the Web (the majority), one solution is to use a triplifier tool or to scrape the pages in order to automatically generate a semantic version of each document that machines can process.

In both cases some problems immediately arise. Web pages may change frequently and without warning, and each update requires human intervention to realign the data. Publishing a companion RDF document for each Web page also significantly increases data redundancy; this violates the “Don’t Repeat Yourself” (DRY) principle and should be avoided.

To overcome these limits, the W3C Recommendation Resource Description Framework in attributes (RDFa) provides a way to embed machine-readable data in the Web page itself. The concept is very simple: RDFa brings the RDF model into Web pages by encoding RDF statements as attributes of XHTML tags. It defines a syntax for embedding an RDF graph in an XHTML document, using attributes to express RDF properties of the concepts in the page. This way the DRY principle is not violated, and the data is written and published only once, for both humans and machines. Data inside pages becomes easier to maintain, since each update modifies a single source. In addition, structured information allows better indexing by search engines, which stand to gain in efficiency.

Now I will show an example:

<html>
  <head><title>An RDFa sample</title></head>
  <body>
    <p>The author of <b>UML Distilled</b>
    is <i>Martin Fowler</i>
    </p>
  </body>
</html>

We need to unlock the metadata already present in Web pages, and RDFa provides a generic way to do this by building on features already in HTML. The following code shows how to enrich the previous HTML with RDFa:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:ex="http://example.org/" version="XHTML+RDFa 1.0" xml:lang="en">
  <head><title>An RDFa sample</title></head>
  <body>
    <p typeof="ex:Book" about="http://example.org/umld">
      The author of <b><span property="ex:title">UML Distilled</span></b>
      is <i typeof="ex:Person" about="http://example.org/mfowler">
        <span property="ex:name">Martin</span>
        <span property="ex:surname">Fowler</span></i>
    </p>
  </body>
</html>

  • the about attribute determines what we are talking about (the subject of the statements);
  • the property attribute names the property that links the subject to the element's text content;
  • the typeof attribute defines the class the subject is an instance of.
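
To make the role of these three attributes more concrete, here is a minimal sketch in Python that reads a well-formed fragment equivalent to the body of the sample and lists the RDF statements it encodes. It is only a toy: the class `ToyRDFaParser` and the function `extract_triples` are names invented for this illustration, it understands only `about`, `typeof`, `property` and `xmlns:` prefixes, and it does not follow the full RDFa processing rules defined by the W3C.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Well-formed fragment equivalent to the body of the RDFa sample.
SAMPLE = """
<p xmlns:ex="http://example.org/" typeof="ex:Book" about="http://example.org/umld">
  The author of <b><span property="ex:title">UML Distilled</span></b>
  is <i typeof="ex:Person" about="http://example.org/mfowler">
    <span property="ex:name">Martin</span>
    <span property="ex:surname">Fowler</span></i>
</p>
"""


class ToyRDFaParser(HTMLParser):
    """Hypothetical, simplified extractor; assumes well-nested tags."""

    def __init__(self):
        super().__init__()
        self.triples = []   # (subject, predicate, object) tuples
        self.prefixes = {}  # prefix -> namespace URI, from xmlns:* attributes
        self.stack = []     # one (subject, property) frame per open element
        self.text = []      # text collected inside a property element

    def expand(self, curie):
        # Turn "ex:title" into "http://example.org/title".
        prefix, _, local = curie.partition(":")
        return self.prefixes.get(prefix, prefix + ":") + local

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        # "about" sets a new subject; otherwise inherit the parent's subject.
        inherited = self.stack[-1][0] if self.stack else None
        subject = attrs.get("about", inherited)
        # "typeof" states which class the subject belongs to.
        if "typeof" in attrs and subject is not None:
            self.triples.append((subject, RDF_TYPE, self.expand(attrs["typeof"])))
        prop = attrs.get("property")
        if prop is not None:
            self.text = []
        self.stack.append((subject, prop))

    def handle_data(self, data):
        # Only text directly inside a property element is collected.
        if self.stack and self.stack[-1][1] is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        subject, prop = self.stack.pop()
        # "property" links the subject to the element's text content.
        if prop is not None and subject is not None:
            value = "".join(self.text).strip()
            self.triples.append((subject, self.expand(prop), value))


def extract_triples(markup):
    parser = ToyRDFaParser()
    parser.feed(markup)
    return parser.triples


for triple in extract_triples(SAMPLE):
    print(triple)
```

Run on the fragment, the sketch yields five statements: one type statement and one title for the book, and one type statement, a name and a surname for the person. Note that nothing in this markup actually links the book to the person; as far as I understand, stating the authorship would need an additional attribute such as `rel` with a suitable property.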

A Web page that is a valid RDFa document often displays the following icon…

Image

…and here you can find the W3C Validator tool.

Obviously, this is just an overview. For details, I invite you to refer to the official documentation.

That’s all folks. Stay tuned!
