Giovanni Pirrotta


Semantic Web Ingredients: why XML alone is not sufficiently powerful

November 20, 2013

This post continues the Semantic Web series (the previous installment was published in July). In it I explain why XML alone is not sufficiently powerful in the Semantic Web context.

XML stands for Extensible Markup Language. It is a meta-language for defining the syntax of markup languages: it lets us attach metadata to data and separate content from presentation using a neutral, text-based format.

Imagine we have the following XML code:


<?xml version="1.0" encoding="ISO-8859-1" ?>
<mailbox>
	<mail id="01">
		<to>Giovanni</to>
		<from>Francesco</from>
		<title>Enjoy Christmas</title>
		<body>Merry Christmas and a Happy New Year</body>
	</mail>
</mailbox>

In the above example the tags <mailbox>, <mail>, <to>, <from>, <title> and <body> have been created specifically for our document, and each tag name has been chosen to describe its content as well as possible. The big problem is that XML is unsuitable for global semantic interpretation: the tag names are meaningful to a human reader, but there is no way to tell a machine that, for example, the string Giovanni enclosed within the tag <to> represents the name of a person.
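To make this concrete, here is a minimal sketch that parses the mailbox document with Python's standard xml.etree.ElementTree module. The helper name flatten is my own and purely illustrative. All the parser hands back is tag names paired with plain strings; nothing in the result says that Giovanni is a person.

```python
import xml.etree.ElementTree as ET

# The mailbox document from the post, as bytes so that the encoding
# declaration is honoured by the parser.
DOC = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<mailbox>
    <mail id="01">
        <to>Giovanni</to>
        <from>Francesco</from>
        <title>Enjoy Christmas</title>
        <body>Merry Christmas and a Happy New Year</body>
    </mail>
</mailbox>
"""


def flatten(xml_bytes):
    """Return (tag, text) pairs for every leaf element (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    # An element with no children has length 0.
    return [(el.tag, el.text) for el in root.iter() if len(el) == 0]


pairs = flatten(DOC)
for tag, text in pairs:
    # Every value is an opaque string: the parser attaches no meaning to it.
    print(f"{tag!r} -> {text!r} ({type(text).__name__})")
```

Renaming <to> to <x> would change nothing for the parser, which is exactly the point: the meaning lives in the reader's head, not in the markup.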

With XML it is possible to adequately describe the structure and content of a document, but XML syntax does not define any explicit mechanism to semantically describe relationships between resources inside and outside the document. In other words, XML alone cannot express semantics independently of the context in which the document is used.

XML is also unable to generate new knowledge by inferring new statements from the original data. For example, given the fact The dog is a mammal, XML does not let us express that sentence as a relationship between two resources. If we then add the fact Fido is a dog, it should be possible to infer the new fact Fido is a mammal. But XML gives us no way to state such a rule between resources, because it is not designed to represent resources on the Web semantically.
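The kind of inference described above can be sketched in a few lines of Python once facts are stored as (subject, predicate, object) statements. This is only a toy illustration, not RDF and not any real reasoner: the predicate names is_a and subclass_of, the function infer, and the choice to model The dog is a mammal as a subclass relationship are all my own assumptions.

```python
# Hypothetical predicate names, chosen only for this illustration.
IS_A = "is_a"
SUBCLASS_OF = "subclass_of"


def infer(facts):
    """Return the facts plus everything derivable from two rules:

    (x is_a C)        and (C subclass_of D) => (x is_a D)
    (B subclass_of C) and (C subclass_of D) => (B subclass_of D)
    """
    known = set(facts)
    while True:
        new = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                # Chain a statement with a subclass statement about its object.
                if p2 == SUBCLASS_OF and o == s2:
                    new.add((s, p, o2))
        if new <= known:
            # Nothing new was derived, so the set is closed under the rules.
            return known
        known |= new


facts = {
    ("Fido", IS_A, "Dog"),          # Fido is a dog
    ("Dog", SUBCLASS_OF, "Mammal"), # The dog is a mammal
}

closure = infer(facts)
print(("Fido", IS_A, "Mammal") in closure)
```

The derived statement is available only because the relationships were made explicit as data. In the XML version there is simply nothing for such a rule to operate on.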

To overcome these limitations of XML as a semantic markup, the W3C has standardized several models that provide an abstract formalism for describing resources and their relationships (RDF, RDFS, OWL).

The RDF model will be described in the next post, so stay tuned!
