What is the Semantic Web?

I seem to be asked this question on a daily basis, mostly in reference to Google now using RDFa and the Drupal 7 effort to put RDFa in Drupal core (iO1 is sponsoring ;-). The answer is not a simple one-liner, so I am constantly struggling to come up with a concise, unifying statement that lets me explain it to anyone. No luck yet, so here's my current best effort.

This is a VERY top-level, simplified explanation. It is not written as a proper technical introduction but merely as something that will help non-technical people get their heads around the concepts. It is missing chunks of the semantic web on purpose.

The Semantic Web
Let's start by trying to get you thinking in a way that will drive your interest deeper. Have a look at this list, http://sindice.com/map, and imagine all the ways in which you could use the data from these sites if you could just link it together in a useful manner and query it in an intelligent manner. The way to build whatever your mind just came up with is the semantic web.

At the moment we have a web of documents that are linked via hyperlinks. While this has been great for getting the web started, it is extremely limiting when it comes to developing intelligent computing solutions to common problems. This happens because all the data ends up in silos, with no way of properly expressing the links between the pieces of information.

The semantic web allows us to add meaning to the links between "stuff" on the web (stuff can include the documents themselves, or people) and to add additional machine-processable data to that "stuff". Critical to the concept of the semantic web is the notion of being able to run a query against the web in the way that you would run a query on an in-house database. As you can see from the list above, we are moving towards a web of data, with more and more people publishing their information in usable formats.
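
To make "run a query against the web" a little more concrete, here is a minimal sketch in plain Python. It assumes the data has already been collected as subject / link / object triples. Every URI and link name in it (the example.org addresses, "knows", "wrote", "hasTopic") is made up for illustration; real semantic web data would normally be published as RDF and queried with a language such as SPARQL.

```python
# Hypothetical data: each entry says "subject --link--> object".
# None of these URIs or link names come from a real site or vocabulary.
TRIPLES = [
    ("http://example.org/people/alice", "knows", "http://example.org/people/bob"),
    ("http://example.org/people/alice", "wrote", "http://example.org/posts/1"),
    ("http://example.org/posts/1", "hasTopic", "http://example.org/topics/semantic-web"),
    ("http://example.org/people/bob", "wrote", "http://example.org/posts/2"),
    ("http://example.org/posts/2", "hasTopic", "http://example.org/topics/cooking"),
]


def match(triples, subject=None, link=None, obj=None):
    """Yield every triple that agrees with the parts given; None matches anything."""
    for s, l, o in triples:
        if subject is not None and s != subject:
            continue
        if link is not None and l != link:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, l, o)


def authors_on_topic(triples, topic):
    """Follow two kinds of link: find posts on the topic, then who wrote them."""
    authors = set()
    for post, _, _ in match(triples, link="hasTopic", obj=topic):
        for author, _, _ in match(triples, link="wrote", obj=post):
            authors.add(author)
    return authors


if __name__ == "__main__":
    print(authors_on_topic(TRIPLES, "http://example.org/topics/semantic-web"))
```

The point of the sketch is that the question "who writes about the semantic web?" can only be answered because the links carry a meaning ("wrote", "hasTopic") rather than being plain hyperlinks.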

Obvious problems that this approach can solve:

Scenario 1)

  • The BBC runs a front-page news article linking to a site, www.we-are-scammers.me, saying that this site is known to be a scam and that no one should use it.
  • Obviously other people will now run with the story and start to post similar links to www.we-are-scammers.me.
  • Google then gives a massive boost to the credibility of www.we-are-scammers.me because of all the links, and now this site can rank for almost any keyword it wants in Google's SERPs.

Result: Users are put in danger.

Solution: Give the BBC a way of expressing, in a computer-readable manner, that it thinks this is a bad link (and before anyone asks, nofollow does not work in this scenario).
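
As a rough sketch of what that could mean in practice, the snippet below scores sites twice: once treating every link as a vote, and once reading a link type attached to each link. The link types "endorses" and "warnsAgainst", the extra site names and the scoring rule are all assumptions invented for this example; this is not how any real search engine ranks pages.

```python
from collections import Counter

# Hypothetical link records: (linking site, link type, target site).
# "endorses" and "warnsAgainst" are invented labels, not a real vocabulary.
LINKS = [
    ("bbc.co.uk", "warnsAgainst", "www.we-are-scammers.me"),
    ("blog-a.example", "warnsAgainst", "www.we-are-scammers.me"),
    ("blog-b.example", "warnsAgainst", "www.we-are-scammers.me"),
    ("blog-a.example", "endorses", "www.honest-shop.example"),
]


def naive_score(links):
    """Treat every inbound link as a vote of confidence, whatever it says."""
    return Counter(target for _, _, target in links)


def typed_score(links):
    """Add one for an endorsement and subtract one for a warning."""
    scores = Counter()
    for _, link_type, target in links:
        if link_type == "endorses":
            scores[target] += 1
        elif link_type == "warnsAgainst":
            scores[target] -= 1
    return scores


if __name__ == "__main__":
    print("naive:", dict(naive_score(LINKS)))
    print("typed:", dict(typed_score(LINKS)))
```

With untyped links the scam site collects the most votes; once each link says what it means, the same three links count against it.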

Scenario 2)

  • A search engine reads this paragraph:
    • I go out for a stroll on a breezy day and as I wind my way along I feel the wind in my face. This sets my mind thinking about how to improve the way that people wind up companies when they fall on their face. Recently I was involved in the wind up of a company that was involved in making gears that make it easier to wind up clocks. I felt that one of the directors was trying to wind the rest of the board of directors up so that they would quit and that this was the real reason why communications failed.
  • How does it work out what the six instances of "wind" mean?
    • If you Google "wind", the entire first page (for me at least) is made up of 8 separate links to "wind" in the weather context and two links to products/services that are named "wind".

Result: Search engines return your article to the wrong people.

Solution: Use a semantically enabled CMS that allows you to define what each instance of "wind" means using ontological references, so that you help the search engines understand what you meant.
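
A minimal sketch of the idea, assuming the author (or the CMS on their behalf) supplies one meaning per occurrence of the word. The sentence is a shortened version of the paragraph above, and the two sense URIs are hypothetical; a real CMS would point at terms in a published ontology instead.

```python
import re

TEXT = "I wind my way along and feel the wind in my face."

# Hypothetical identifiers for two meanings of "wind".
MEANDER = "http://example.org/senses/wind#meander"
WEATHER = "http://example.org/senses/wind#weather"


def annotate(text, word, senses):
    """Pair each occurrence of `word` (by character offset) with its declared sense.

    `senses` lists one sense per occurrence, in reading order.
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    offsets = [m.start() for m in re.finditer(pattern, text)]
    if len(offsets) != len(senses):
        raise ValueError("need exactly one sense per occurrence of the word")
    return list(zip(offsets, senses))


def mentions_sense(annotations, sense):
    """True if any occurrence was tagged with the given sense."""
    return any(tagged == sense for _, tagged in annotations)


if __name__ == "__main__":
    tagged = annotate(TEXT, "wind", [MEANDER, WEATHER])
    for offset, sense in tagged:
        print(offset, TEXT[offset:offset + 4], sense)
```

A search engine reading these annotations no longer has to guess: it can see that only the second "wind" is about the weather.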

Scenario 3)

  • Twitter becomes massively popular
  • I follow a few hundred people
  • Some of these users "occasionally" tweet something useful but in the main publish links to crap or retweet people I am not interested in listening to

Result: I stop following them, as there is no simple way of filtering for the tweets that are directly relevant to what I am interested in, and I feel that I am missing genuinely interesting information because I am swamped with too many useless tweets.

Solution: Using RDFa, FOAF and SIOC I can create a client that intelligently drops stuff that is obviously not of interest to me and auto-filters based on:
a) Topic of the tweet (RDFa)
b) Relationship between me and the person and/or the person they are retweeting (FOAF)
c) The relationship between that person and the content (SIOC)
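
A minimal sketch of such a filter, covering only (a) and (b) to keep it short. It assumes the topic and the "who I know" list have already been extracted from the markup; the Tweet fields, the names and the topics are all hypothetical, and no real Twitter, RDFa or FOAF library is used.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    author: str
    topic: str            # the kind of fact RDFa markup could state about a tweet
    retweet_of: str = ""  # original author if this is a retweet, else empty


# Hypothetical profile data.
MY_INTERESTS = {"semantic-web", "drupal"}
PEOPLE_I_KNOW = {"alice", "bob"}  # the kind of fact a FOAF "knows" link could supply


def is_relevant(tweet, interests, known_people):
    """Keep a tweet only if its topic interests me and I know its original source."""
    if tweet.topic not in interests:
        return False
    source = tweet.retweet_of or tweet.author
    return source in known_people


if __name__ == "__main__":
    timeline = [
        Tweet("alice", "semantic-web"),
        Tweet("alice", "celebrity-gossip"),
        Tweet("carol", "drupal", retweet_of="bob"),
        Tweet("bob", "drupal", retweet_of="stranger"),
    ]
    for tweet in timeline:
        if is_relevant(tweet, MY_INTERESTS, PEOPLE_I_KNOW):
            print(tweet)
```

Point (c) would add one more check of the same shape, for example whether the person wrote the linked content or is merely passing it on.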

There are millions of incredibly useful semantic applications that are suddenly becoming feasible as commercial projects, thanks to the vast amounts of data that are now starting to be published. DBpedia alone offers huge commercial opportunities. I think that we are at the beginning of the end of the beginning for the semantic web, and that next year will be the year when it really starts to flourish.

Some links I hope you will find useful:

Intro to semantic web
http://www.youtube.com/watch?v=OGg8A2zfWKg

Tim Berners-Lee talking about the semantic web
http://www.youtube.com/watch?v=mVFY52CH6Bc

Tim Berners-Lee at TED
http://www.ted.com/talks/lang/eng/tim_berners_lee_on_the_next_web.html

RDFa intro
http://www.youtube.com/watch?v=ldl0m-5zLz4

Just a great site on what's happening in the semantic world:
http://www.semanticfocus.com/

Great list of videos on the semantic web
http://www.semanticfocus.com/blog/entry/title/17-semantic-web-rdf-and-owl-videos/

Interesting article on cross-pollinating DBpedia and Freebase
http://www.semanticfocus.com/blog/entry/title/cross-pollinating-dbpedia-and-freebase/
