More than Buzz — The Semantic Web

Sir Tim Berners-Lee, Director of the World Wide Web Consortium (W3C), has been evangelizing his vision for “The Semantic Web” for quite some time. Heck, a lot of people in the industry have been talking about it… and like most things, when people start talking, the “buzz” is created and hits the mainstream.

I typically recognize that “the buzz” has escaped the halls of tech-geekdom and entered the mainstream when my clients start asking me about the specifics of this or that. With this as my unscientific buzz-measuring tool, I can clearly see “the buzz” of “The Semantic Web” being on the rise.

As silly as it sounds, I’ve recently been asked by folks in meetings “How can we be more Semantic?” I guess they look to me for answers, as it is my job to answer these types of questions for them. With regard to this subject specifically, it is a really hard question to answer at this stage in the Semantic Web game. (And a rather silly question, if I do say so myself.)

It reminds me of a conversation almost two years ago when I heard someone say “Our online stuff is old. We need to ‘Ajax’ it”.

Yikes!

I did make an attempt to answer them the best I could while being put on the spot. Much to their dismay, my answer really centered on “give it some time… ”

So, while thoughts about The Semantic Web are fresh in my head, I figured I’d hammer out some ideas related to it, and make an attempt to demystify things the best I can. (Without writing a giant paper on the subject.) Let me preface the rest of this posting by saying that this is so, so, so high-level. (For other thoughts on technologies related to the Semantic Web, check out my post on Microformats).


What’s the Semantic Web Buzz? Is it a better “Web 2.0?”

So, Semantic Web as a Buzzword: I have a hard time lumping something that is potentially so revolutionary into the “buzzword” category.

“Web 2.0” is a buzzword. The Semantic Web is a fundamental change to how we create, consume and integrate content, of one form or another, into our online applications and into the lives of users (consumers).

The “Web 2.0” that people refer to is nothing more than a better “Web 1.0”.

It took about 15 years, but the industry has finally learned that user experience really does matter. As a result, existing technologies (JavaScript, CSS and DHTML) evolved to make Web sites more engaging, interactive, real-time and “software-like”.

Don’t get me wrong, everything that people consider to be facets of “Web 2.0” absolutely thrills me. From cooler, more fun-to-use interfaces to social content and collaboration, the innovations (and implementations) of the last several years have made much of the Web a far better experience for users.

I admit it, I use my “Web 2.0” apps as much as anyone and find a lot of the social content stuff like Twitter, Pownce, and Digg to be essential to my daily information consuming activities.

Before I get off track here, my point is really that while we call things “Web 2.0”, all we’ve really done is put lipstick on the Web 1.0 pig.

It’s better, it’s kissable… but it isn’t a fundamental transformation of the way people (and computers) communicate. We’ve made applications better, but we haven’t changed the way we think about content.

Cases in point:

  • In 1996, we had: Personal Web Pages, IRC, ICQ, PowWow, Web Rings, etc.
  • In 2008, we have: Blogs, Jabber, Twitter, Social Networks, etc.
  • We’ve gotten smarter through experience, but we haven’t yet really seen a revolution…

Enter “The Semantic Web”

At its root, the term “semantic” stands for “meaning of.”

The “semantic of ‘X’” = The “meaning of ‘X’”.

With this in mind, we can say that the Semantic Web is a Web that is able to describe things (content: text, data, video, audio, etc) in a way that computer software can understand and that allows computers (software) to interpret and relate content in a specific fashion (to a specific user, or company, or subject, etc).

It is easy for people to understand the meaning of content, as our brains don’t need to be programmed to do so.

We understand the meaning of things by learning. Unfortunately, software hasn’t evolved to the point where AI is just “built in” and computer programs aren’t yet sophisticated enough to learn solely on their own. They need help from people, and more so from the way that people tell them to behave. (That’s you I’m talking about)

As an example, a few specific sentences worth of information can give a person a fairly well-rounded understanding of a topic:

  • Barack Obama is a Democrat.
  • Barack Obama is running for president in the Democratic Primary against Hillary Clinton.
  • Both Barack Obama and Hillary Clinton are US Senators.
  • The Democratic Primary winner will run against John McCain in the General Election.
  • I live in Chicago as does Barack Obama.
  • Hillary Clinton is from Chicago also, but spent most of her time in Arkansas and is now living in New York.
  • My sister lives in New York.

These sentences can easily be understood by people, but not so easily by computer systems. The theory behind the Semantic Web is to allow computers to understand information like this, and put it into context related to other information.
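
To make that a little more concrete, here is a minimal Python sketch of how a few of the sentences above might be broken into subject / predicate / object statements that software can query. The predicate names (lives_in, holds_office and so on) and the helper functions are my own inventions for illustration; they are not part of any W3C standard.

```python
# Each fact is a (subject, predicate, object) triple.
# The predicate names are made up for this illustration.
FACTS = [
    ("Barack Obama", "member_of_party", "Democratic Party"),
    ("Barack Obama", "holds_office", "US Senator"),
    ("Hillary Clinton", "holds_office", "US Senator"),
    ("I", "lives_in", "Chicago"),
    ("Barack Obama", "lives_in", "Chicago"),
    ("Hillary Clinton", "is_from", "Chicago"),
    ("Hillary Clinton", "lives_in", "New York"),
    ("My sister", "lives_in", "New York"),
]


def match(facts, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the fields that were given."""
    return [
        (s, p, o)
        for (s, p, o) in facts
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def subjects(facts, predicate, obj):
    """Return the subjects of all triples with this predicate and object."""
    return [s for (s, _, _) in match(facts, predicate=predicate, obj=obj)]


senators = subjects(FACTS, "holds_office", "US Senator")
new_yorkers = subjects(FACTS, "lives_in", "New York")
print(senators)
print(new_yorkers)
```

Nothing here “understands” anything yet, but once facts share a common shape, a program can answer questions that nobody wrote a specific lookup for.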

We can understand these sentences because we understand the syntax of the English language. All sentences are constructed with the same type of rules / syntax. The syntax of a language defines the rules used to construct meaningful statements that can be understood and put into context with other statements.

This is what the Semantic Web is all about: defining a way that things can be described so that computer applications can understand them.

According to the W3C, The Semantic Web is about two things:

  1. It is about common formats for integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents.

  2. It is about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.

 
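Here is a rough sketch of that “connected by being about the same thing” idea in Python. Two made-up sources describe things using the same identifier, so their statements can simply be pooled and followed from one to the other. The example.org identifiers, the property names and both helper functions are assumptions of mine, not anything the W3C prescribes.

```python
from collections import defaultdict

# Hypothetical identifiers; what matters is that both sources use the same one.
BROOKLYN = "http://example.org/place/brooklyn"
LISTING = "http://example.org/listing/1"

listings_source = [
    (LISTING, "located_in", BROOKLYN),
    (LISTING, "price_usd", 650000),
]
city_source = [
    (BROOKLYN, "part_of", "http://example.org/place/new-york-city"),
    (BROOKLYN, "label", "Brooklyn"),
]


def merge(*sources):
    """Combine triples from any number of sources, dropping exact duplicates."""
    merged = []
    seen = set()
    for source in sources:
        for triple in source:
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged


def describe(triples, thing):
    """Collect everything said about `thing`, whichever source said it."""
    description = defaultdict(list)
    for s, p, o in triples:
        if s == thing:
            description[p].append(o)
    return dict(description)


graph = merge(listings_source, city_source)
# Start at the listing, hop to its location, then read what the other source knows.
location = describe(graph, LISTING)["located_in"][0]
print(describe(graph, location))
```

No join keys were agreed on ahead of time here; the shared identifier is the only thing tying the two sources together.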

The W3C also spells out what the Semantic Web is not about, and what it describes instead:

  1. The Semantic Web is not about links between web pages.

  2. The Semantic Web describes the relationships between things (like A is a part of B and Y is a member of Z) and the properties of things (like size, weight, age, and price).

 
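As a small illustration of “relationships” versus “properties”, here is a Python sketch. The bicycle data and the function names are invented for the example; the point is only that when “is a part of” is stated explicitly, software can follow the chain by itself.

```python
# Relationships between things: each key is a part of its value.
PART_OF = {"spoke": "wheel", "wheel": "bicycle"}

# Properties of things (all values are made up).
PROPERTIES = {
    "bicycle": {"price_usd": 400, "weight_kg": 12.5},
    "wheel": {"weight_kg": 1.8},
}


def containers(thing, part_of):
    """Walk the part-of chain upward and return every enclosing thing."""
    found = []
    seen = {thing}
    while thing in part_of:
        thing = part_of[thing]
        if thing in seen:  # guard against accidental cycles in the data
            break
        seen.add(thing)
        found.append(thing)
    return found


def describe(thing):
    """Bundle what is known about a thing: its containers and its properties."""
    return {
        "part_of": containers(thing, PART_OF),
        "properties": PROPERTIES.get(thing, {}),
    }


print(describe("spoke"))
print(describe("wheel"))
```

Nobody ever stated that a spoke is part of a bicycle; that falls out of the two relationships that were stated.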


Let’s break it down

It is a lot to take in, and it is a relatively hard thing to understand unless specific examples of how semantics actually come into play are presented in a practical way.

While the march towards a Semantic Web is highly technical in nature (hey… someone has to build this stuff), the real output of the effort to make this happen is all about user experience.

Computers don’t care about content, people do. When all is said and done, this is about delivering better content to people. Yes, “systems” will benefit from this. But what good is a system if it doesn’t provide value to end users at some point?

I think the first time the impact of the Semantic Web concept became clear to me was at the end of 2004, when I stumbled across a video called “Epic 2014”. I went from being confused and skeptical about the whole concept to thrilled about the possibility of a Web that could deliver on the future that Tim Berners-Lee and the W3C were proposing.

Since then, my understanding of what a Semantic Web means has evolved, but for the sake of fun (and a brief intermission), check out the Epic 2014 and Epic 2015 videos below. They should provide some context around the rest of this blog post.

Epic 2014 (The Original Video)


Epic 2015 – The Video Updated

“EPIC 2014” was created by Robin Sloan and Matt Thompson and based on a presentation that they gave at the Poynter Institute in the spring of 2004. The “Museum of Media History” is a fictitious organization, and as you can clearly see, the actual scenario in the video is also made up to demonstrate their point.

Sloan and Thompson were inspired to create their movies after a speech in 2003 by Martin Nisenholtz, CEO of New York Times Digital. While not a direct representation of Nisenholtz’s speech, the film producers borrowed from his general concept and ran with it in their own direction.

True or not, it is quite thought-provoking and really does explore the potential long-term evolution of news aggregators like Google News and Newsbot alongside other Web 2.0 technologies such as blogging, social networking and user-generated content.

The second video, “EPIC 2015”, takes things a little bit further and incorporates additional “Web 2.0” concepts such as podcasting, GPS and web-based mapping services.

“Mash-Ups Vs. The Semantic Web”

We’ve all seen Mash-Up Web sites. I am assuming that if you are reading this, then you know what a mash-up is. For the edge-case users that happen upon this posting, “mash-up” is a term for a Web site that takes data from different sources and “mashes” it together to create an application.

For the sake of being lazy, I’ve included this link to Wikipedia that lays out the whole mash-up concept.

So, what’s different about the Semantic Web concept and the Mash-ups that we currently have?
The easiest way for me to digest this is to go back to the W3C’s description of what the Semantic Web “is not” (from above):

1) The Semantic Web is NOT ABOUT LINKS between Web pages.

It is also not about how we currently see “mash-ups” being executed by Web developers. In a mash-up, we are relating information (data) to other data based on simple look-up values. We click on an item in a Web page and the Web server returns related information to the screen as that information is pulled from other data sources.

2) The Semantic Web describes the relationships between things (like A is a part of B and Y is a member of Z) and the properties of things (like size, weight, age, and price)

In your typical mash-up, we combine one or more sources of relational data to provide additional content to users. For example, we know that if a user is looking at real estate in Brooklyn, we can also present a map of Brooklyn with a point plotted for each property for sale, as well as crime statistics from the NYPD.

This is the result of doing relational lookups of existing “dumb” data and presenting them to the user in a usable manner.

Under the rules of the Semantic Web, the data itself is much richer and in a format that describes not only the data itself, but also the relationships between items and all of the different properties that individual data points could potentially have.

In the article “The Road to the Semantic Web”, Alex Iskold explains his thinking that the core idea of the Semantic Web is to create the metadata describing data, which will enable computers to process the meaning of things. “Once computers are equipped with semantics, they will be capable of solving complex semantical optimization problems.”

So, What does It Mean Now?

Let’s be clear. A full realization of true Semantic Web applications is still quite some time away.

Data consumed by Semantic Web applications is described using a framework known as RDF (Resource Description Framework). RDF was invented and evolved by people with academic backgrounds in Artificial Intelligence and Logic and, as a result, is not so easy for your typical Web developer (or even an experienced programmer) to understand and implement. There is quite a learning curve!
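
For a taste of what RDF statements look like on the wire, here is a small Python sketch that writes a few statements in N-Triples, one of the simpler RDF text formats: one statement per line, identifiers in angle brackets, plain text values in quotes, and a period at the end. The example.org namespace and the property names are placeholders I made up; real data would use published vocabularies, and the escaping below only handles backslashes and double quotes.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def iri(local_name):
    """Wrap a name from our made-up namespace in N-Triples angle brackets."""
    return f"<{EX}{local_name}>"


def literal(value):
    """Quote a plain text value, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


statements = [
    (iri("BarackObama"), iri("memberOf"), iri("DemocraticParty")),
    (iri("BarackObama"), iri("livesIn"), iri("Chicago")),
    (iri("Chicago"), iri("label"), literal("Chicago")),
]
output = to_ntriples(statements)
print(output)
```

Even in this toy form you can see where the learning curve comes from: every thing and every relationship needs a globally unique identifier before a single statement can be written down.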

It is complicated stuff, not 100% perfected, and will definitely continue to change over time until it is more fully baked.

There is somewhat of a shortcut though. While not as robust as a future vision of RDF, it is quite possible to use the very popular RSS format to build Semantic applications today. I love how we’ve learned to take a technology intended for one use, and expand upon it to do other things. For a tutorial on using RSS in this manner, check this lesson brought to you by Eric van der Vlist and the folks at O’Reilly.
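
As a sketch of why RSS can serve as a stepping stone, the Python below reads a tiny feed and turns each item into subject / property / value statements. I am assuming the RDF-based RSS 1.0 flavor here (other RSS versions are plain XML and would need a different approach), and the feed content and the item_triples helper are made up for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"

# A made-up RSS 1.0 feed with a single item.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/posts/semantic-web">
    <title>More than Buzz</title>
    <link>http://example.org/posts/semantic-web</link>
    <dc:subject>Semantic Web</dc:subject>
  </item>
</rdf:RDF>"""


def item_triples(feed_xml):
    """Return (item URI, property URI, text value) for every field of every item."""
    root = ET.fromstring(feed_xml)
    triples = []
    for item in root.iter(f"{{{RSS_NS}}}item"):
        subject = item.get(f"{{{RDF_NS}}}about")
        for child in item:
            # ElementTree tags look like "{namespace}name"; joined, they form the property URI.
            predicate = child.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, (child.text or "").strip()))
    return triples


triples = item_triples(FEED)
for triple in triples:
    print(triple)
```

The interesting bit is the dc:subject line: the feed borrows a property from another vocabulary (Dublin Core) without the feed format itself having to change.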

A fully realized Semantic Web will be quite amazing indeed, but it is going to take a long time to get to the point where the technology regularly intersects with our daily lives.

It is going to take a long time to annotate the world’s information and then to capture personal information in the right way in order to really make it work the way it is supposed to.

We are still a few years away from seeing real traction in terms of Semantic Web technology.

We have to start somewhere though, and there are a variety of interesting companies out there working to be early adopters and technology “shapers” of the Semantic Web:

I’d like to wrap this one up by saying that if you have additional information, thoughts or opinions about where The Semantic Web is headed, please drop me an email. I’d be more than happy to incorporate them and share them with the community!

Also, check out this other post about Microformats and how they relate to the concept of a Semantic Web.