While interviewing with Yahoo I was asked a simple question, “Are you familiar with the idea of the Semantic Web?” Embarrassingly enough, I did not know the answer. Luckily I still got the job, and later I did my research. Here’s the answer:
Modern computing has evolved over many generations that are defined by how we store, search, and view our data. To understand the next generation, what is being called the “Semantic Web,” we must first look at its origins.
Desktop Computers
Remember before everyone had the internet? Data was stored in local directory structures and databases.
Web 1.0 - 1994 to 2004
In the early days of the internet, information was only transmitted one way. Companies focused on broadcasting their information via a static website, and the entire web experience was controlled by the companies who created and hosted web pages.
Web 2.0 - 2004 to Present
Social networking, blogging, tagging, sharing, and updating. Information is no longer one-way. Gone are websites; now we have web applications. Bloggers create their own news and information sources. Users choose their news via RSS feeds. Communities collectively write their own encyclopedia. Videos and pictures are hosted, shared, commented on, and tagged. Networks are created by interpersonal connections. Widgets and add-ons are created by individuals to expand their favorite web applications.
Thus, vast amounts of data are available on the web, orders of magnitude more than was available in Web 1.0. The problem we currently have is how to make sense of this information and let the user search all of it. Industry leaders such as Google and Yahoo use keyword searching. The trouble is that keyword searching returns a haystack of web pages that contain the keyword(s), and you are left to look for the needle. This is because Web 2.0 is built on technologies that focus on user interaction and do not put their data into any sort of universal underlying structure.
The Semantic Web - The Future
Web 3.0 won’t make the web more social or more interactive. Instead, it will turn this giant cloud of user-generated information into something meaningful and easily searchable. Currently the web is meant to be read by humans, and its content is stored in a way that computers cannot understand.
For example, a simple storefront web page selling something is easy for a human to read, but a computer cannot understand it. This is because the information is simply thrown onto the page with HTML and then styled to be human-readable with CSS. Thus, you have <div>s, <table>s, and <span>s. But imagine if you had <item for sale>, <item cost>, <item description>, <item benefit>, and <item use>, where each object is part of a huge interconnected description of human knowledge.
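To make the contrast concrete, here is a minimal Python sketch. It is only an illustration: the product, the underscore versions of the tag names, and the extra `item_name` tag and `currency` attribute are all made up for this example, and real Semantic Web formats look different. The point is just that a program can pull the price out of the second snippet by asking for it by name, while the first snippet gives it nothing to ask for.

```python
import xml.etree.ElementTree as ET

# Presentational markup: the tags describe layout, not meaning.
PRESENTATIONAL = """
<div>
  <span>Red Wagon</span>
  <span>24.99</span>
  <span>A sturdy steel wagon for kids.</span>
</div>
"""

# Hypothetical semantic markup, loosely based on the tag names in the text.
# (Underscores replace spaces because XML tag names cannot contain spaces.)
SEMANTIC = """
<item_for_sale>
  <item_name>Red Wagon</item_name>
  <item_cost currency="USD">24.99</item_cost>
  <item_description>A sturdy steel wagon for kids.</item_description>
</item_for_sale>
"""


def find_cost(markup):
    """Return the item's cost as a float, or None if no tag declares it."""
    root = ET.fromstring(markup)
    cost = root.find("item_cost")  # look for a child tag that *means* "cost"
    return float(cost.text) if cost is not None else None


presentational_cost = find_cost(PRESENTATIONAL)  # nothing is labeled as a cost
semantic_cost = find_cost(SEMANTIC)              # the cost is labeled explicitly

print("presentational:", presentational_cost)
print("semantic:", semantic_cost)
```

With the <span> version, a program would have to guess which piece of text is the price, which is roughly what keyword search engines are stuck doing today.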
The entire web would then be written in a universally understandable format. What the user sees will not change, only the underlying structure. The major problem facing the Semantic Web is that all web pages will have to be carefully designed, and semantic techniques will have to be used widely before there is any real benefit.
If these challenges can be overcome, then the internet will evolve into a large artificial knowledge system. The obvious next step is to overlay some form of universal decision making…