Wednesday 25 May 2005

Understanding Semantic Web (Part 3)

In the previous two posts (part 1 and part 2) about the Semantic Web, I mainly talked about the problems people face in making data and services work together seamlessly, and the approaches they follow. From databases to service-oriented middleware, the integration issues and the challenges that lie ahead are huge.

What is the Semantic Web going to change? Will it make the whole system work automatically? Rather than predicting the answer, let's walk through the approaches that the Semantic Web community is trying to apply.

At the core of Semantic Web techniques is a data model (comparable to the relational model, the object-oriented model, or XML) called RDF (Resource Description Framework). RDF is commonly written in XML syntax (RDF/XML), but semantically the two are very different: RDF is a graph-oriented model, whereas XML is a hierarchical (tree) model. RDF lets one make statements about resources. So if one wants to say "John has age 35", the RDF data model can represent this statement.

Consider the above statement as a tuple (John, hasAge, 35), where John is the subject, hasAge is the predicate (property), and the value 35 is the object (here a literal).

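RDF also lets the object of one statement be the subject of another, so statements can be chained. (RDF can even make statements about statements, a feature called reification.) Extending the above example:
John owns a Ferrari. The color of the Ferrari is red.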

Equivalent tuples:
(John, owns, Ferrari) (Ferrari, hasColor, Red)

So RDF lets one represent statements as (subject, predicate, object) triples and then aggregate such triples.
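To make the triple idea concrete, here is a minimal Python sketch (not any particular RDF library) that stores the statements above as plain tuples in a set and chains two of them to answer "what color is the thing John owns?". The helper name `objects` is my own, chosen for illustration.

```python
# Each RDF statement is modelled as a (subject, predicate, object) tuple.
triples = {
    ("John", "hasAge", 35),
    ("John", "owns", "Ferrari"),
    ("Ferrari", "hasColor", "Red"),
}

def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

# Chain two statements: the object of the first becomes the subject of the second.
colors = {c for car in objects(triples, "John", "owns")
            for c in objects(triples, car, "hasColor")}
print(colors)  # {'Red'}
```

Because the graph is just a set of triples, aggregating data from several places amounts to taking the union of the sets.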

One thing that makes RDF interesting is its use of URIs. A URI (Uniform Resource Identifier) is a string that identifies a resource uniquely; a URL, for example, is a particular type of URI. So how do URIs help?

Let's consider a scenario where I represent certain information about myself on my website in RDF: (Sunil, hasAge, 26) (Sunil, livesIn, New Delhi)
Let's say I identify Sunil using the URI http://enventure.blogspot.com/person#Sunil
This URI is unique to Sunil, and anyone using it anywhere refers to the same resource.
I identify the property hasAge by the URI http://enventure.blogspot.com/person#hasAge
and livesIn by http://enventure.blogspot.com/person#livesIn

So the triples can be written as:
(http://enventure.blogspot.com/person#Sunil, http://enventure.blogspot.com/person#hasAge, 26)

(http://enventure.blogspot.com/person#Sunil, http://enventure.blogspot.com/person#livesIn, New Delhi)

Now let's say my friend Lomesh wants to talk about himself on his website.

(http://lomesh.blogspot.com/person#Lomesh, http://lomesh.blogspot.com/person#hasAge, 26)
(http://lomesh.blogspot.com/person#Lomesh, http://lomesh.blogspot.com/person#livesIn, Sacramento)

Now suppose Lomesh wants to refer to Sunil, for instance to say that he is a friend of Sunil, or to add more information about Sunil. He just adds the corresponding triples to his website:

(http://lomesh.blogspot.com/person#Lomesh, http://lomesh.blogspot.com/person#friendOf, http://enventure.blogspot.com/person#Sunil)

RDF is flexible enough to let anyone add any kind of triples identified by URIs. Any RDF aggregator can collect all such triples together and then reason on top of them. In object-oriented models, every class has two main components: the data or properties it defines (e.g. name and age for class Person) and the methods (getName(), getAge()) that operate on that data. In the RDF model, properties are defined outside any class definition, which lets anyone add any data or property at will.
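The contrast between the two models can be sketched in a few lines of Python. The `Person` class below uses `__slots__` so that, like a class in a stricter OO language, it only accepts the properties it declares; the RDF side is just a set of triples. Both are simplified illustrations, not a real object model or triple store.

```python
class Person:
    """Object-oriented view: the class fixes which properties a Person has."""
    __slots__ = ("name", "age")  # no undeclared attributes can be attached

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def get_name(self):
        return self.name

    def get_age(self):
        return self.age


sunil = Person("Sunil", 26)
try:
    sunil.lives_in = "New Delhi"  # not declared in the class definition
    oo_accepted_new_property = True
except AttributeError:
    oo_accepted_new_property = False

# RDF view: properties live outside any class, so a new one is just another triple.
graph = {("Sunil", "hasName", "Sunil"), ("Sunil", "hasAge", 26)}
graph.add(("Sunil", "livesIn", "New Delhi"))

print(oo_accepted_new_property)  # False
print(len(graph))                # 3
```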

So if a future search engine aggregates the triples on both Lomesh's and Sunil's websites, it will be able to integrate the enriched information (the shared URIs make the matching possible) and present a more coherent picture.
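Here is a minimal Python sketch of such an aggregator, assuming the triples from both websites have already been fetched and parsed into tuples (the fetching and parsing are left out). Matching works purely by string equality of URIs, which is exactly why both sites must use the same URI for Sunil. Literals such as 26 and "New Delhi" are plain Python values here, a simplification of RDF's typed literals; `aggregate` and `describe` are hypothetical helper names.

```python
SUNIL_NS = "http://enventure.blogspot.com/person#"
LOMESH_NS = "http://lomesh.blogspot.com/person#"

sunil_site = [
    (SUNIL_NS + "Sunil", SUNIL_NS + "hasAge", 26),
    (SUNIL_NS + "Sunil", SUNIL_NS + "livesIn", "New Delhi"),
]
lomesh_site = [
    (LOMESH_NS + "Lomesh", LOMESH_NS + "hasAge", 26),
    (LOMESH_NS + "Lomesh", LOMESH_NS + "livesIn", "Sacramento"),
    (LOMESH_NS + "Lomesh", LOMESH_NS + "friendOf", SUNIL_NS + "Sunil"),
]

def aggregate(*sources):
    """Merge triples from several sources into one graph (a set drops duplicates)."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph

def describe(graph, uri):
    """Every triple in which `uri` appears as subject or object."""
    hits = [(s, p, o) for (s, p, o) in graph if s == uri or o == uri]
    return sorted(hits, key=str)  # key=str because objects mix ints and strings

graph = aggregate(sunil_site, lomesh_site)
facts = describe(graph, SUNIL_NS + "Sunil")
print(len(facts))  # 3: two from Sunil's site, one (friendOf) from Lomesh's
```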

But for this to happen, one key problem needs to be solved: standardizing vocabularies! If everyone defines his own vocabulary for hasAge, livesIn and friendOf with his own independent URIs, search engines will still be confused, because they cannot tell that two such properties mean the same thing, and we would still be stuck in the syntactic world.
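The vocabulary problem shows up immediately in code. In this sketch, each site first mints its own hasAge URI, so a query by one of them misses the other person; then both reuse a single shared property URI. The shared URI `http://example.org/vocab#hasAge` is hypothetical, standing in for whatever vocabulary people actually agree on.

```python
SUNIL_NS = "http://enventure.blogspot.com/person#"
LOMESH_NS = "http://lomesh.blogspot.com/person#"
HAS_AGE = "http://example.org/vocab#hasAge"  # hypothetical shared vocabulary URI

def who_has_age(graph, age_property, age):
    """Subjects whose `age_property` value equals `age`; matching is by exact URI."""
    return {s for (s, p, o) in graph if p == age_property and o == age}

# Each site invents its own hasAge URI ...
separate = {
    (SUNIL_NS + "Sunil", SUNIL_NS + "hasAge", 26),
    (LOMESH_NS + "Lomesh", LOMESH_NS + "hasAge", 26),
}
# ... versus both sites reusing one agreed-upon property URI.
shared = {
    (SUNIL_NS + "Sunil", HAS_AGE, 26),
    (LOMESH_NS + "Lomesh", HAS_AGE, 26),
}

print(len(who_has_age(separate, SUNIL_NS + "hasAge", 26)))  # 1: Lomesh is missed
print(len(who_has_age(shared, HAS_AGE, 26)))                # 2
```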

Vocabularies/Ontologies bring semantics to the RDF world. In essence, any kind of structured data, whether a dictionary, a thesaurus or a set of categories, can be considered an ontology; they vary in richness. Ontologies differ from vocabularies in two ways: they try to map human or world knowledge into a structure, and they represent shared knowledge, so they have to be AGREED UPON. An ontology you define purely for yourself isn't really an ontology; it's just another vocabulary.

So let's say we want to define a person via an ontology. We could define an ontology such as:
(Person hasName String)
Person is a class or a concept. It has a property "hasName" whose value is a String.

Similarly, other properties can be attached, or new concepts can be defined.
(Person hasAge Numeric)
(Person livesIn City)
(City isPartOf State)
(State isPartOf Country)
(Country isPartOf World)
(India isInstanceOf Country)

The above relationships, when combined, form a graph-like structure in which entities (subjects or objects) are related by certain properties. Ontologies can vary from very simple to very complex. There are ontologies for the cultural domain, education, beer and wine, persons, address books, etc. Then there are ontologies that link multiple ontologies or act as upper-level ontologies. A good resource for ontologies is http://www.schemaweb.info
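The graph structure can be seen by loading the ontology statements into an adjacency list and walking one kind of edge. This is a minimal sketch: property names are normalized to `isPartOf`, and the hypothetical `follow` helper simply takes the first matching edge at each step, which is enough here because each node has at most one isPartOf edge.

```python
from collections import defaultdict

ontology = [
    ("Person", "hasName", "String"),
    ("Person", "hasAge", "Numeric"),
    ("Person", "livesIn", "City"),
    ("City", "isPartOf", "State"),
    ("State", "isPartOf", "Country"),
    ("Country", "isPartOf", "World"),
    ("India", "isInstanceOf", "Country"),
]

# Index the statements as a directed, labelled graph: node -> [(edge label, node)].
edges = defaultdict(list)
for s, p, o in ontology:
    edges[s].append((p, o))

def follow(start, predicate):
    """Walk edges labelled `predicate` from `start` (first match at each step),
    stopping at a dead end or a cycle; return the nodes reached, in order."""
    path, node, seen = [], start, {start}
    while True:
        nxt = [o for (p, o) in edges.get(node, []) if p == predicate]
        if not nxt or nxt[0] in seen:
            return path
        node = nxt[0]
        seen.add(node)
        path.append(node)

print(follow("City", "isPartOf"))  # ['State', 'Country', 'World']
```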

Other schema languages provided by the Semantic Web community are RDFS (RDF Schema) and newer ones like OWL (the Web Ontology Language, which extends RDF-S). RDF-S and OWL provide constructs for building ontologies, e.g. instance-of (rdf:type), subclass (rdfs:subClassOf) and subproperty (rdfs:subPropertyOf) relationships. These constructs enable one to reason about things.

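Example:
(City subClassOf Place)
(NewDelhi instanceOf City)

From these two statements, one can deduce that NewDelhi is a Place, even though nobody stated it directly. Note that subClassOf means "is a kind of", not "is part of": (City subClassOf Country) would wrongly make every city a country. To deduce that NewDelhi is part of India, one needs part-of statements such as (NewDelhi isPartOf Delhi) and (Delhi isPartOf India), plus a way to declare isPartOf transitive, which OWL provides.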
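To make this kind of deduction concrete, here is a minimal forward-chaining sketch in Python. It repeatedly applies three rules until nothing new is derived: an instance of a class is also an instance of that class's superclasses (the RDFS subclass rule), subClassOf is transitive, and isPartOf is treated as transitive (something OWL can state with owl:TransitiveProperty; plain RDF-S cannot). The class Place and the region Delhi are illustrative assumptions, and the naive pairwise loop is chosen for clarity, not speed.

```python
def infer(triples):
    """Forward-chain simple rules over (s, p, o) triples until a fixpoint."""
    facts = set(triples)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                if b != c:
                    continue
                # Subclass rule: (x instanceOf A), (A subClassOf B) => (x instanceOf B)
                if p1 == "instanceOf" and p2 == "subClassOf":
                    new.add((a, "instanceOf", d))
                # subClassOf is transitive
                if p1 == "subClassOf" and p2 == "subClassOf":
                    new.add((a, "subClassOf", d))
                # isPartOf declared transitive (an OWL-style assumption)
                if p1 == "isPartOf" and p2 == "isPartOf":
                    new.add((a, "isPartOf", d))
        if new <= facts:  # nothing new derived: done
            return facts
        facts |= new

kb = {
    ("City", "subClassOf", "Place"),     # hypothetical superclass
    ("NewDelhi", "instanceOf", "City"),
    ("India", "instanceOf", "Country"),
    ("NewDelhi", "isPartOf", "Delhi"),
    ("Delhi", "isPartOf", "India"),
}
closure = infer(kb)
print(("NewDelhi", "instanceOf", "Place") in closure)    # True
print(("NewDelhi", "isPartOf", "India") in closure)      # True
print(("NewDelhi", "instanceOf", "Country") in closure)  # False
```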
Languages like OWL provide even richer constructs, such as cardinality constraints.
For example, one can represent the following two sentences using OWL:
A Person owns a Car. A Person can own 0 or more Cars.
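A cardinality constraint bounds how many values a property may have for each instance of a class. The Python sketch below checks such bounds over a small set of triples. One important caveat: this is a closed-world check (a missing triple counts as "no value"), whereas OWL reasoners use open-world semantics, where a missing triple does not by itself violate a minimum cardinality. The function name, the extra instance Fiat, and the stricter "at most one car" rule are all hypothetical.

```python
def check_cardinality(triples, cls, prop, min_count=0, max_count=None):
    """Closed-world check: return (instance, count) pairs for instances of `cls`
    whose number of `prop` values lies outside [min_count, max_count].
    max_count=None means unbounded. All `prop` values are counted."""
    instances = {s for (s, p, o) in triples if p == "instanceOf" and o == cls}
    violations = []
    for x in sorted(instances):
        n = sum(1 for (s, p, o) in triples if s == x and p == prop)
        if n < min_count or (max_count is not None and n > max_count):
            violations.append((x, n))
    return violations

data = {
    ("John", "instanceOf", "Person"),
    ("Sunil", "instanceOf", "Person"),
    ("John", "owns", "Ferrari"),
    ("John", "owns", "Fiat"),           # hypothetical second car
    ("Ferrari", "instanceOf", "Car"),
    ("Fiat", "instanceOf", "Car"),
}

# "A Person can own 0 or more Cars": nobody violates it.
print(check_cardinality(data, "Person", "owns", 0, None))  # []
# A hypothetical stricter rule, "at most one car", flags John.
print(check_cardinality(data, "Person", "owns", 0, 1))     # [('John', 2)]
```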


Because RDF-S and OWL are based on RDF, they use URIs to distinguish concepts and properties. So anyone can publish an ontology, and anyone else can extend it.

Taking the previous example of Sunil and Lomesh, there are now two things that make the data semantically rich:
An RDF-S based ontology for defining people and friends.
Particular instances, Sunil and Lomesh, who use that ontology to make particular statements. Any third person can use or refer to the ontology or to the triples written by Lomesh or Sunil.

An intelligent search engine should be able to aggregate all such triples, combine ontologies, combine the instances and then do intelligent reasoning based on that.
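As a small taste of combining an ontology with instance data, the Python sketch below mirrors how rdfs:domain and rdfs:range work: if friendOf has domain Person and range Person, then anyone who is a friendOf someone is a Person, and so is the friend. The short names (friendOf, domain, range, instanceOf) stand in for full URIs, and the ontology statements themselves are assumptions for illustration.

```python
# Ontology: statements about the friendOf property (cf. rdfs:domain / rdfs:range).
ontology = {
    ("friendOf", "domain", "Person"),
    ("friendOf", "range", "Person"),
}
# Instance data, e.g. aggregated from Sunil's and Lomesh's websites.
instances = {
    ("Lomesh", "friendOf", "Sunil"),
    ("Sunil", "livesIn", "New Delhi"),
}

def apply_domain_range(ontology, instances):
    """Return the instances plus the instanceOf facts implied by domain/range."""
    inferred = set()
    for (prop, kind, cls) in ontology:
        for (s, p, o) in instances:
            if p != prop:
                continue
            if kind == "domain":    # the subject belongs to the domain class
                inferred.add((s, "instanceOf", cls))
            elif kind == "range":   # the object belongs to the range class
                inferred.add((o, "instanceOf", cls))
    return instances | inferred

merged = apply_domain_range(ontology, instances)
print(sorted(t for t in merged if t[1] == "instanceOf"))
# [('Lomesh', 'instanceOf', 'Person'), ('Sunil', 'instanceOf', 'Person')]
```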

RDF, RDF-S and OWL all come from an Artificial Intelligence background. Expert systems have been around for a long time, and they did most of this and much more than what RDF or OWL does. The key thing that has changed between then and now is the Web. The earlier systems were closed systems intended to solve a particular AI problem, whereas the current ones are being designed with the ubiquitous nature of the Web in view.

But the problems that earlier AI systems faced still have to be overcome by the Semantic Web community. Most of them revolve around ontology building, ontology maintenance, ontology merging, ontology mapping, and whether we need complex ontologies at all. Most ontologies have grown complex, sometimes far beyond an average individual's comprehension. What is needed is the application of software engineering practices to Semantic Web technologies. In the next sections, I will go through some particular challenges and use case scenarios where Semantic Web techniques are being applied.
