Thursday, 16 June 2005

Google Will Eat Itself

Via John Battelle's Blog

Check out GWEI

We generate money by serving Google text advertisments on our website GWEI.org. With this money we automatically buy Google shares via our Swiss e-banking account. We buy Google via their own advertisment! Google eats itself - but in the end we will own it!


So who else wanna do it???

Wednesday, 15 June 2005

Understanding Semantic Web (Part -4)

In this last post of the "Understanding Semantic Web" series (1 2 3), I will mainly talk about where Semantic Web techniques are currently being applied and what the future holds.

Semantic Web technologies are trying to penetrate every corner of software development and the way we build software: databases, middleware, applications, even designing the UI for a portal. A number of databases (Kowari, rdfDB etc.), semantic middleware platforms (Intellidimension) and multimedia systems are being built by the research community. In this post I will mainly concentrate on applications, which is where I believe the action lies!

In the previous posts I mentioned ontologies, which provide definitions and are an important part of the application domain. Most of the applications that have become widespread use smaller ontologies that an individual can understand. The list below is not exhaustive; it is only meant to give readers (newbies) an initial impression.

User Applications

RSS - RSS is a widespread technology. It's commonly referred to as the "low hanging fruit" of the semantic web. RSS enables anyone to package content and metadata and provide updates. It is mainly used for news and blog updates.
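
To make "package content and metadata" a little more concrete, here is a minimal sketch in Python that reads an RSS 2.0 document using only the standard library. The feed text, titles, dates and links are made up for illustration; a real reader would fetch the feed over HTTP and cope with missing elements.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed (all titles, dates and links are placeholders).
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.org/</link>
    <item>
      <title>First post</title>
      <link>http://example.org/first</link>
      <pubDate>Wed, 15 Jun 2005 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/second</link>
      <pubDate>Thu, 16 Jun 2005 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_text):
    """Return one (title, link, pubDate) tuple per <item> in the feed."""
    root = ET.fromstring(feed_text)
    return [
        (item.findtext("title"), item.findtext("link"), item.findtext("pubDate"))
        for item in root.iter("item")
    ]


items = read_items(FEED)
for title, link, published in items:
    print(published, "|", title, "|", link)
```

The content (title, link) and the metadata (publication date) travel together in one predictable structure, which is what lets an aggregator poll many sites without scraping their pages.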

FOAF - Friend of a Friend - FOAF provides a vocabulary with which friends can write semantic descriptions about each other. Simple GUI applications exist for creating and visualizing FOAF data. People are trying to define a web of trust based on FOAF data. Perhaps this can revolutionize the way social networking works today, or help build smarter systems which know your friends and perhaps your enemies too!!
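
As a rough sketch of what "friend of a friend" data makes possible, the Python below walks foaf:knows links breadth-first and counts the hops between people. The people and their URIs are hypothetical, and treating the hop count as a stand-in for trust is my own simplification, not something FOAF defines.

```python
from collections import deque

# The "knows" property from the FOAF vocabulary.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical people and relationships, written as (subject, predicate, object).
triples = [
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/bob"),
    ("http://example.org/bob", FOAF_KNOWS, "http://example.org/carol"),
    ("http://example.org/carol", FOAF_KNOWS, "http://example.org/dave"),
]


def degrees_of_separation(triples, start):
    """Breadth-first walk over foaf:knows links; returns {person: hops}."""
    knows = {}
    for s, p, o in triples:
        if p == FOAF_KNOWS:
            knows.setdefault(s, set()).add(o)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in knows.get(person, ()):
            if friend not in hops:  # first visit is the shortest path
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return hops


hops = degrees_of_separation(triples, "http://example.org/alice")
for person, distance in sorted(hops.items(), key=lambda kv: kv[1]):
    print(distance, person)
```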

Creative Commons - Creative Commons provides a very simple vocabulary to define licensing schemes for any kind of original/derived work (content, blog, website, pictures, audio, video). Via a simple form, an author can choose what kind of license he wants. A semantic description is generated behind the scenes, which the author can put on his website. Any search engine can then aggregate these descriptions and provide a semantic search. So I can now search for images which I, as an author, can use freely without paying a royalty. Yahoo recently incorporated Creative Commons in its search engine.

EventSherpa (SemaView) - Tool and service for creating and sharing events, schedules and calendar information over the Internet. (www.semaview.com or www.eventsherpa.com) I used their tool around a year back but apparently the site is down at the moment.


Enterprise Systems

Adobe XMP (Extensible Metadata Platform) - The Adobe XMP platform enables embedding semantic descriptions within the multimedia file (JPEG, GIF, TIFF images etc.) itself. So once an image is tagged with semantics using an Adobe tool, it retains them: for instance, if the image file is copied, the semantic descriptions get carried along. The platform is open for anyone to build or query semantic descriptions.


There are a number of other companies providing enterprise-level solutions using Semantic Web technologies, e.g. Mondeca, Empolis, Ontoprise, Cerebra, Profium and others.

Application of semantic web technologies in the enterprise world is more in the sense of supporting a particular standard. The systems or platforms here offer a particular solution for their clients (an Intranet or Internet portal which can be configured more dynamically, a better search using semantic metadata, or an information integrator). There isn't yet a standard semantic web software stack in use; it is still being built.


Research Applications

Haystack - information client as part of the Information Management Project at MIT. It aggregates RDF from multiple arbitrary locations and presents it to the user in a human-readable fashion.

TAP - A project at Stanford enabling Activity Based Search. Check out this.

Annotation Tools: Tools enabling rich annotations of HTML/multimedia documents. A number of such tools exist (MnM, Cultos, GATE, KIM, SWAN), enabling manual, semi-automatic or automatic generation of semantic descriptions.

AKTive Space Visualisations: Prototype showing, geographically, the research being conducted at different locations in the UK. http://www.aktors.org/technologies/geography/


Research Themes: There are a number of interesting projects going on in the research community.

- Languages - RDF, RDFS, OWL, RuleML etc.
- Semantic Databases - store and query RDF descriptions via SQL-like query languages.
- Inference Engines - reasoning over RDF/OWL semantic descriptions
- Visualization Tools
- Ontology Core Research + Tools (Ontology creation, editing, merging, maintenance etc.)
- Building Domain Specific and Domain Independent Ontologies (human effort by ontologists)
- Semantic Search (Latent Semantic Indexing - how does one define PageRank in Semantic metadata ?)
- Semantic Middleware (standard J2EE/.NET, P2P, Asynchronous Messaging Services)
- Semantic Web Services (Service creation, discovery, query, integration)
- Semantic User Interfaces
- NLP (Natural Language Processing)
....
and the list is endless.


So what's the killer app of Semantic Web?

I think the Semantic Web is much like the Web of today. The infrastructure is still being built, but there isn't a single application, like email, which one could call a killer app. For an application to be a killer app, there has to be widespread adoption. Semantic Web techniques as of today are too heavy for the common man to comprehend, and even for a software engineer to build with. The revolution is more likely to occur in the middleware space, which will act as an integration platform. For users, it could be an information dashboard, or a single service criss-crossing multiple platforms; the interface would likely remain the same or get simpler. What's going to change is the richness!

One important part of the semantic web lies in the creation of semantics. With content creation moving from the web onto mobiles, semantic web technologies can be cleverly applied to bring in richer semantics. So, if carefully harnessed, the next semantic web killer app might lie in the mobile world rather than on the web.

Semantic Web technologies are not just about generating RDF/OWL encoded data; they are also about making systems more open. XML/Web Services have already started that trend. Companies like Google, Yahoo, Amazon, eBay (and more recent ones like Flickr) provide a platform to query their data. This has resulted in a number of small startups and innovative services (for example, combining Craigslist classifieds with Google Maps).

In the future, one could envision every major service provider starting to provide data and services as XML/Web services/RDF etc.

Monster providing Jobs data.
Social networks like LinkedIn, Orkut.
Match.com
Travel Sites

As each of the above services starts opening up, one would see greater ease in information integration and newer services cropping up. A killer application might be one which helps one manage oneself better. RSS removed the pain of visiting every website, even though one is essentially viewing the same content. The semantic web would remove the pain in more or less the same way as RSS did with content. One thing it might fuel is more innovative services and more innovation!

Tuesday, 7 June 2005

Greasemonkey

Greasemonkey is a Firefox extension that allows users to alter the content and behavior of any website through user scripts which run inside the browser.

I had been hearing the buzz around Greasemonkey for quite some time, so I thought of checking the Greasemonkey site and found quite a few scripts. One can find scripts for altering the webpages of CNN, BBC, Amazon, craigslist, eBay, ESPN, Friendster, GMail and even an Indian website, Indiatimes.com.
The current list of user scripts is available here.

Friday, 27 May 2005

Blogging As a Career

Gawker is trying to set up a model for advertising-supported weblogs. Gawker Media blogs include the popular gossip sites Gawker, Wonkette and Defamer, gadget blogs like Gizmodo, and others. Bloggers are paid $2500 a month and the blog aims to earn $75K per annum. And yes, you can be an intern too. IWantMedia has published a story on Gawker.

Mobile Search

Got this interesting piece from Mobile Technology Weblog about Mobile Search getting hotter.

Firstly, 30% of searches are currently to look for mobile content (ringtones etc). Since about 2/3 of mobile content is currently sold via operator portals, this is a clear and present danger for operator revenues. In other words, while they may make money from the advertisers paying for their ads to be presented to users, many of these ads will be for competitors of the operators.


Complete article here.

Wednesday, 25 May 2005

Understanding Semantic Web (Part -3)

In the previous two posts (part1 and part2) about the Semantic Web, I mainly talked about the problems and approaches people follow towards making data and services more seamless. From databases to service-oriented middleware, the issues of integration and the challenges that lie ahead are huge.

What's the semantic web going to change? Will it make the whole system work automatically? Rather than predicting the answer, let's walk through the approaches that the Semantic Web community is trying to apply.

At the core of Semantic Web techniques, there exists a data model (like Relational DB Model, Object Oriented Model, XML) called RDF (Resource Description Framework). RDF is based on XML from a syntactical point of view, but semantically there are huge differences between the two. RDF is mainly a graph oriented model and XML is a hierarchical model. RDF enables one to make statements about resources. So if one wants to say "John has age 35", the RDF data model enables one to represent this statement.

Consider the above statement as a tuple (John, hasAge, 35), where John is the subject, hasAge is the predicate (property) and the value 35 is the object (a literal).

RDF enables one to make statements and also statements about statements. Extending the above example:
John owns Ferrari. The color of Ferrari is red.

Equivalent tuples:
(John, owns, Ferrari) (Ferrari, hasColor, Red)

So RDF allows one to represent statements in the form of (subject, predicate, object) triples and then aggregate such triples.
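
The John and Ferrari statements can be sketched in a few lines of Python. This is only an illustration of the data model: plain tuples in a set, plus a hypothetical helper called match() for pattern lookups. Real RDF toolkits offer far more than this.

```python
# Each statement is a (subject, predicate, object) triple.
triples = {
    ("John", "hasAge", 35),
    ("John", "owns", "Ferrari"),
    ("Ferrari", "hasColor", "Red"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        (
            t
            for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ),
        key=str,  # objects mix numbers and strings, so sort on the text form
    )


# Everything said about John.
about_john = match(triples, s="John")

# Follow the graph: what colour are the things John owns?
owned = [o for _, _, o in match(triples, s="John", p="owns")]
colors = [o for thing in owned for _, _, o in match(triples, s=thing, p="hasColor")]

print(about_john)
print(colors)
```

Because the object of one triple (Ferrari) is the subject of another, the set of triples behaves like a graph that can be walked, rather than a fixed hierarchy.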

One thing that makes RDF interesting is the use of URIs. URIs are Uniform Resource Identifiers, via which one can identify a resource uniquely; a URL, for example, is a particular type of URI. So how do URIs help?

Let's consider a scenario where I represent certain information about myself on my website in RDF: (Sunil, hasAge, 26) (Sunil, livesIn, New Delhi)
Let's say I identify Sunil using the URI http://enventure.blogspot.com/person#Sunil
This URI is unique for Sunil and anyone using the above URI anywhere refers to the same resource.
I identify the property hasAge by URI http://enventure.blogspot.com/person#hasAge
and livesIn by http://enventure.blogspot.com/person#livesIn

So triples can be represented via
(http://enventure.blogspot.com/person#Sunil , http://enventure.blogspot.com/person#hasAge, 26)

(http://enventure.blogspot.com/person#Sunil, http://enventure.blogspot.com/person#livesIn, New Delhi)

And now let's say my friend Lomesh wants to talk about himself on his website.

(http://lomesh.blogspot.com/person#Lomesh , http://lomesh.blogspot.com/person#hasAge, 26)
(http://lomesh.blogspot.com/person#Lomesh, http://lomesh.blogspot.com/person#livesIn, Sacramento)

Now say Lomesh wants to refer to Sunil and say that Lomesh is a friend of Sunil, or wants to add some more information about Sunil; he just adds the equivalent triples to his website.

(http://lomesh.blogspot.com/person#Lomesh , http://lomesh.blogspot.com/person#friendOf, http://enventure.blogspot.com/person#Sunil)
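
A small Python sketch of the same triples, with the URIs built from a namespace prefix plus a local name. The uri() helper and the variable names are mine, purely for illustration; the URIs themselves are the ones used above.

```python
SUNIL_NS = "http://enventure.blogspot.com/person#"
LOMESH_NS = "http://lomesh.blogspot.com/person#"


def uri(namespace, local_name):
    """Build a full URI from a namespace prefix and a local name."""
    return namespace + local_name


# Triples published on Sunil's site.
sunil_site = [
    (uri(SUNIL_NS, "Sunil"), uri(SUNIL_NS, "hasAge"), 26),
    (uri(SUNIL_NS, "Sunil"), uri(SUNIL_NS, "livesIn"), "New Delhi"),
]

# Triples published on Lomesh's site, including one that points at Sunil.
lomesh_site = [
    (uri(LOMESH_NS, "Lomesh"), uri(LOMESH_NS, "hasAge"), 26),
    (uri(LOMESH_NS, "Lomesh"), uri(LOMESH_NS, "livesIn"), "Sacramento"),
    (uri(LOMESH_NS, "Lomesh"), uri(LOMESH_NS, "friendOf"), uri(SUNIL_NS, "Sunil")),
]

# The object of Lomesh's friendOf triple is exactly the URI Sunil uses for
# himself, so both sites are talking about the same resource.
friend = lomesh_site[2][2]
same_resource = friend == sunil_site[0][0]
print(friend, same_resource)
```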

RDF is flexible enough to allow anyone to add any kind of triples identified by URIs. Any RDF aggregator can aggregate all such triples together and then do reasoning on top of them. In Object Oriented models, every class has two main components: the data or properties it defines (e.g. name, age for class Person) and the methods (getName(), getAge()) which operate on the data. In the RDF model, data lies outside the class definition, enabling anyone to add any data or property at will.

So if a future search engine aggregates the triples that exist on both Lomesh's and Sunil's websites, it will be able to integrate the enriched information (URIs make the corresponding matching possible) and present a more coherent picture.
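
What such an aggregator does can be sketched roughly as a set union followed by a lookup on the resource URI. The aggregate() and describe() functions below are hypothetical names of my own, and the two "sites" are just in-memory sets rather than anything fetched from the web.

```python
from collections import defaultdict

SUNIL = "http://enventure.blogspot.com/person#"
LOMESH = "http://lomesh.blogspot.com/person#"

sunil_site = {
    (SUNIL + "Sunil", SUNIL + "hasAge", 26),
    (SUNIL + "Sunil", SUNIL + "livesIn", "New Delhi"),
}
lomesh_site = {
    (LOMESH + "Lomesh", LOMESH + "livesIn", "Sacramento"),
    (LOMESH + "Lomesh", LOMESH + "friendOf", SUNIL + "Sunil"),
}


def aggregate(*sources):
    """Union the triples from every source; duplicates collapse in the set."""
    merged = set()
    for source in sources:
        merged |= set(source)
    return merged


def describe(triples, resource):
    """Collect what a resource says and who mentions it, matched by URI."""
    view = defaultdict(list)
    for s, p, o in triples:
        if s == resource:
            view["says"].append((p, o))
        if o == resource:
            view["mentioned_by"].append((s, p))
    return {key: sorted(value, key=str) for key, value in view.items()}


graph = aggregate(sunil_site, lomesh_site)
sunil_view = describe(graph, SUNIL + "Sunil")
print(sunil_view)
```

The friendOf link written on Lomesh's site shows up in Sunil's combined view without either site knowing about the aggregator, which is the whole point of shared URIs.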

But for this to happen, one key thing that needs to be solved is standardizing vocabularies! If everyone defines his own vocabulary of hasAge, livesIn, friendOf with his own independent URIs, search engines will still be confused, as they will not be able to do any interpretation, and we would still be in the syntactical world.
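
The problem is easy to reproduce in a sketch. Below, a query for ages using one site's hasAge URI misses the other site's data until both properties are mapped onto a shared one. The shared namespace http://example.org/people# and the hand-written EQUIVALENT table are assumptions made up for this example.

```python
SUNIL = "http://enventure.blogspot.com/person#"
LOMESH = "http://lomesh.blogspot.com/person#"
SHARED = "http://example.org/people#"  # hypothetical agreed-upon vocabulary

triples = [
    (SUNIL + "Sunil", SUNIL + "hasAge", 26),
    (LOMESH + "Lomesh", LOMESH + "hasAge", 26),
]


def ages(triples, age_property):
    """Return {person: age} for triples using exactly this property URI."""
    return {s: o for s, p, o in triples if p == age_property}


# The two hasAge properties look alike to a human but are different URIs,
# so a query with one of them only finds that site's data.
only_sunil = ages(triples, SUNIL + "hasAge")

# Hypothetical mapping of each private property onto the shared vocabulary.
EQUIVALENT = {
    SUNIL + "hasAge": SHARED + "hasAge",
    LOMESH + "hasAge": SHARED + "hasAge",
}


def normalise(triples, mapping):
    """Rewrite predicates through the mapping; unmapped ones are kept."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


everyone = ages(normalise(triples, EQUIVALENT), SHARED + "hasAge")
print(len(only_sunil), len(everyone))
```

Maintaining such mapping tables by hand does not scale, which is why an agreed vocabulary matters so much.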

Vocabularies/Ontologies bring semantics to the RDF world. In essence, any kind of structured data, from a dictionary to a thesaurus to a set of categories, can be considered an ontology, the variation being the richness. Ontologies differ from vocabularies in two ways: they try to map human or world knowledge into a structure, and they represent shared knowledge, so they have to be AGREED UPON. An ontology you define on your own isn't really an ontology; it's just another vocabulary.

So let's say we want to define a person via an ontology. We can define an ontology such as:
(Person hasName String)
Person is a class or a concept. It has a property "hasName" whose value is a String.

Similarly, other properties can be attached or new concepts can be defined.
(Person hasAge Numeric)
(Person livesIn City)
(City isPartOf State)
(State isPartOf Country)
(Country isPartOf World)
(India isInstanceOf Country)

The above relationships, when combined together, form a graph-like structure where entities (subjects or objects) are related by certain properties. Ontologies can vary from very simple to very complex. There are ontologies for the cultural domain, education, beer and wine, persons, address books etc. Then there are ontologies which link multiple ontologies or act as upper-level ontologies. A good resource for ontologies is http://www.schemaweb.info
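
Here is a minimal Python sketch of that graph-like structure, using the Person and City statements from this post. The property name is written as isPartOf throughout, and part_of_chain() and properties_of() are hypothetical helpers of my own.

```python
schema = [
    ("Person", "hasName", "String"),
    ("Person", "hasAge", "Numeric"),
    ("Person", "livesIn", "City"),
    ("City", "isPartOf", "State"),
    ("State", "isPartOf", "Country"),
    ("Country", "isPartOf", "World"),
    ("India", "isInstanceOf", "Country"),
]


def properties_of(schema, concept):
    """Map each property declared on a concept to its value type."""
    return {p: o for s, p, o in schema if s == concept}


def part_of_chain(schema, concept):
    """Follow isPartOf links upward from a concept and list what it is part of."""
    parent = {s: o for s, p, o in schema if p == "isPartOf"}
    chain = []
    seen = {concept}
    while concept in parent:
        concept = parent[concept]
        if concept in seen:  # guard against accidental cycles
            break
        seen.add(concept)
        chain.append(concept)
    return chain


person_properties = properties_of(schema, "Person")
city_chain = part_of_chain(schema, "City")
print(person_properties)
print(city_chain)
```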

Another data schema model provided by the Semantic Web community is RDFS (RDF Schema), along with newer ones like OWL (Web Ontology Language, an extension of RDF-S). RDF-S and OWL provide constructs to build ontologies, e.g. instanceOf, subClass and subProperty. These constructs enable one to reason about things.

Example:
(isCapitalOf subPropertyOf isPartOf)
(NewDelhi isCapitalOf India)
(India instanceOf Country)

From the above three statements, one can deduce that NewDelhi is a part of India, and that India is a Country.
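
A minimal sketch of how a reasoner might derive new statements from such constructs. The facts below are my own illustrative set rather than the exact ones listed above, and the three rules are simplified versions of the subclass and subproperty rules in RDF Schema, applied by brute force until nothing new appears.

```python
facts = {
    ("Capital", "subClassOf", "City"),
    ("City", "subClassOf", "Place"),
    ("NewDelhi", "instanceOf", "Capital"),
    ("isCapitalOf", "subPropertyOf", "isPartOf"),
    ("NewDelhi", "isCapitalOf", "India"),
    ("India", "instanceOf", "Country"),
}


def infer(facts):
    """Apply three RDFS-style rules repeatedly until no new triple appears."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # Rule 1: subClassOf is transitive.
                if p == "subClassOf" and p2 == "subClassOf" and o == s2:
                    new.add((s, "subClassOf", o2))
                # Rule 2: an instance of a class is an instance of its superclasses.
                if p == "instanceOf" and p2 == "subClassOf" and o == s2:
                    new.add((s, "instanceOf", o2))
                # Rule 3: a statement made with a sub-property also holds
                # for the super-property.
                if p2 == "subPropertyOf" and p == s2:
                    new.add((s, o2, o))
        if new <= known:  # nothing new was derived, so we are done
            return known
        known |= new


inferred = infer(facts)
for triple in sorted(inferred - facts):
    print(triple)
```

None of the derived triples (for instance, that NewDelhi is part of India) was written down by anyone; they follow from the schema statements alone.
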
Languages like OWL provide richer constructs, with which one can, for example, express cardinality constraints.
e.g. One can represent the following two sentences using OWL.
Person owns a Car. Person can own 0 or more cars.
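
To give a feel for cardinality constraints, here is a small Python sketch that counts property values per instance and compares them against (min, max) bounds. This is a simple closed-world check written for illustration. As far as I understand, an OWL reasoner works under an open-world assumption, so it would not treat a missing value as a violation the way this does. The hasAge constraint and the names John, Mary and Beetle are made up.

```python
# (class, property) -> (min, max); None means "no upper bound".
constraints = {
    ("Person", "owns"): (0, None),  # a Person can own 0 or more cars
    ("Person", "hasAge"): (1, 1),   # hypothetical: exactly one age
}

facts = [
    ("John", "instanceOf", "Person"),
    ("John", "owns", "Ferrari"),
    ("John", "owns", "Beetle"),
    ("John", "hasAge", 35),
    ("Mary", "instanceOf", "Person"),
]


def violations(facts, constraints):
    """Return (instance, property, count) for every constraint that is broken."""
    problems = []
    for s, p, cls in facts:
        if p != "instanceOf":
            continue
        for (c, prop), (lo, hi) in constraints.items():
            if c != cls:
                continue
            # Count how many values this instance has for the property.
            n = sum(1 for s2, p2, _ in facts if s2 == s and p2 == prop)
            if n < lo or (hi is not None and n > hi):
                problems.append((s, prop, n))
    return problems


broken = violations(facts, constraints)
print(broken)
```

John's two cars are fine under "0 or more", while Mary is flagged only because of the made-up "exactly one age" rule.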


RDF-S and OWL being based on RDF use URI's to distinguish concepts and properties. So anyone can establish an ontology and anyone else can make extensions to it.

Taking the previous example of Sunil and Lomesh, two things are now needed to make the data semantically rich:
An RDF-S based ontology for defining people and friends.
Particular instances, Sunil and Lomesh, who use that ontology to make particular statements. Any third person can use or refer to the ontology or the triples written by Lomesh or Sunil.

An intelligent search engine should be able to aggregate all such triples, combine ontologies, combine the instances and then do intelligent reasoning based on that.

RDF-S, OWL and RDF all come from an Artificial Intelligence background. There have been expert systems around which did most of what RDF or OWL does, and much more. The key thing that has changed between the past and now is the WEB. The previous systems were closed systems intended to solve a particular AI problem, but the current ones are being designed keeping in view the ubiquitous nature of the web.

But the semantic web community will still need to overcome the problems that previous AI systems faced. Most of these problems revolve around ontology building, ontology maintenance, ontology merging, ontology mapping, and whether we need complex ontologies at all. Most ontologies have grown to be complex, sometimes far beyond an average individual's comprehension. What one needs is the application of software engineering practices to semantic web technologies. In the next sections, I will go through some particular challenges and use case scenarios where Semantic Web techniques are being applied.