Frequent thinker, occasional writer, constant smart-arse

Tag: semantic web

The future is webiquitous

In the half century since the Internet was created – and the 20 years since the web was invented – a lot has changed. More recently, we’ve seen the Dot Com bubble and the web2.0 craze drive new innovations forward. But as I’ve postulated before, those eras are now over. So what’s next?

Well, ubiquity of course.


Huh?
Let’s work backwards with some questions to help you understand.

Why we now need ubiquity, and what exactly it means, requires us to think through another two questions. The changes brought by the Internet are not one big hit, but a gradual evolution. For example, "open" has existed in Internet culture since its first days: it wasn't a web2.0 invention. But "openness" was only recognised by the masses in web2.0 as a new way of doing things. This "open" culture had profound consequences: it led to the mass socialisation around content, and recognition of the power that is social media.

As the Internet's permeation of our society continues, it will generate externalities that affect us (and that are not predictable). But the core trend can be identified, and that is what I hope to explain in this post. By understanding this core trend, we can comfortably understand where things are heading.

So let’s look at these two questions:
1) What is the longer term trend, that things like “open” are a part of?
2) What are aspects of this trend yet to be fully developed?

The longer term trend
The explanation can be found in why the Internet and the web were created in the first place. The short answer: interoperability and connectivity. The long answer – keep reading.

Without going deep into the history, the reason why the Internet was created was so that it could connect computers. Computers were machines that enabled better computation (hence the name). As they had better storage and querying capacities than humans, they became the way the US government (and large corporations) would store information. Clusters of these computers would be created (called networks) – and the ARPANET was built as a way of building connections between these computers and networks by the US government. More specifically, in the event of a nuclear war and if one of these computing networks were eliminated – the decentralised design of the Internet would allow the US defense network to rebound easily (an important design decision to remember).

The web has a related but slightly different reason for its creation. Hypertext was conceptualised in the 1960s by a philosopher and a scientist, as a way of harnessing computers to better connect human knowledge. These men were partly inspired by an essay written in the 1940s called "As We May Think", where the chief scientist of the United States laid out a vision whereby all knowledge could be stored on neatly categorised microfilm (the information storage technology of the time), and any piece of knowledge could be retrieved in moments. Several decades of experimentation in hypertext followed, and finally a renegade scientist created the World Wide Web. He broke some of the conventions of what the ideal hypertext system was meant to look like, and created a functional system that solved his problem: connecting all these distributed scientists around the world and their knowledge.

So as is clearly evident, computers have been used as a way of storing and manipulating information. The Internet was invented to connect computing systems around the world; the Web did the same thing for the people who used this network. Two parallel innovations (the Internet and hypertext) used a common modern marvel (the computer) to connect the communication and information-sharing abilities of machines and humans alike. With machines and the information they process, it's called interoperability. With humans, it's called being connected.


But before we move on, it’s worth noting that the inventor of the Web has now spent a decade advocating for his complete vision: a semantic web. What’s that? Well if we consider the Web as the sum of human knowledge accessible by humans, the Semantic Web is about allowing computers to be able to understand what the humans are reading. Not quite a Terminator scenario, but so computers can become even more useful for humans (as currently, computers are completely dependent on humans for interpretation).

What aspects of the trend haven’t happened yet?
Borders that previously restrained us have been broken down. The Internet and hypertext are enabling connectivity for humans and interoperability for the computer systems that store information. Computers, in turn, are enabling humans to process tasks that could not be done before. If the longer term trend is connecting and bridging systems, then the demons to be demolished are the borders that create division.

So with that in mind, we can now ask another question: “what borders exist that need to be broken down?” What it all comes down to is “access”. Or more specifically, access to data, access to connectivity, and access to computing. Which brings us back to the word ubiquity: we now need to strive to bridge the gap in those three domains and make them omnipresent. Information accessible from anywhere, by anyone.

Let's now look at each of these in a bit more detail.
Ubiquitous data: We need a world where data can travel without borders. We need to unlock all the data in our world, and have it accessible by all where possible. Connecting data is how we create information: the more data at our hands, the more information we can generate. Data needs to break free – detached from the published form and atomised for reuse.

Ubiquitous connectivity: If the Internet is a global network that connects the world, we need to ensure we can connect to that network regardless of where we are. The value of our interconnected world can only reach its optimum if we can connect from wherever, with whatever. At home on your laptop, at work on your desktop, on the streets with your mobile phone. No matter where you are, you should be able to connect to the Internet.

Ubiquitous computing: Computers need to become a direct tool available for our minds to use. They need to become an extension of ourselves, a "sixth sense". The border that prevents this is the non-assimilation of computing into our lives (and bodies!). Information processing needs to become thoroughly integrated into everyday objects and activities.

Examples of when we have ubiquity
My good friend Andrew Aho over the weekend showed me something that he bought at the local office supplies shop. It was a special pen that, well, did everything.
– He wrote something on paper, and then through its USB connection could transfer an exact replica to his computer, in his original handwriting.
– He could perform a search on his computer to find a word in his digitised handwritten notes.
– He was able to pass the pen over a pre-written bit of text, and it would replay the sounds in the room from when he wrote that word (as in the position on the paper, not the time sequence).
– Passing the pen over a word also allowed it to be translated into several other languages.
– He could punch out a query with the drawn-out calculator, to compute a function.
– and a lot more. The company has now created an open API on top of its platform – meaning anyone can create additional features that build on this technology. The opportunity is equivalent to when the Web was created as a platform, and anyone was allowed to build on top of it.

The pen wasn't all that bulky, and it did all this simply by having a camera and a microphone attached, along with special dotted paper that allowed the pen to recognise its position. Imagine if this pen could connect to the Internet, with access to any data, and to cloud computing resources for more advanced queries.

Now watch this TED video to the end, which shows the power when we allow computers to be our sixth sense. Let your imagination run wild as you watch it – and while it does, just think about ubiquitous data, connectivity, and computation which are the pillars for such a future.

Trends right now enabling ubiquity
So from the 10,000-foot view I've just shown you, let's now zoom down and look at trends occurring right now – trends feeding this ever-growing push towards ubiquity.

From the data standpoint – where I believe this next wave of innovation will centre – we need to see two things: syntactic interoperability and semantic interoperability. Syntactic interoperability is when two or more systems can communicate with each other – for example, Facebook being able to communicate with MySpace (say, with people sending messages to each other). Semantic interoperability is the ability to automatically interpret the information exchanged meaningfully – so when I Google "Paris Hilton", the search engine understands that I want a hotel in a city in Europe, not a celebrity.
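
To make the distinction concrete, here is a minimal Python sketch – the field names and concept URIs are invented for illustration, not taken from any real system. Syntactic interoperability means two systems can parse each other's messages; semantic interoperability means they also agree on what the terms in those messages refer to.

    import json

    # Syntactic interoperability: any system that speaks JSON can parse this message.
    message = json.dumps({"from": "alice", "to": "bob", "body": "hello"})
    parsed = json.loads(message)

    # Semantic interoperability: each label is pinned to an unambiguous concept,
    # so "Paris" the city and "Paris" the person stop colliding.
    concepts = {
        ("Paris", "city"): "http://example.org/concept/Paris_France",
        ("Paris", "person"): "http://example.org/concept/Paris_Hilton",
        ("Hilton", "hotel chain"): "http://example.org/concept/Hilton_Hotels",
    }
    print(concepts[("Paris", "city")])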

The Semantic Web and Linked Data is one key trend enabling this. It is about interlinking all the information out there in a way that makes it accessible for humans and machines alike to reuse. Data portability is another such trend (and the one I focus my own efforts on), where the industry is fast moving to let us move our identities, media and other metadata wherever we want.

As Chris Messina recently said:

…the whole point of working on open building blocks for the social web is much bigger than just creating more social networks: our challenge is to build technologies that enhance the network and serve people so that they in turn can go and contribute to building better and richer societies…I can think of few other endeavors that might result in more lasting and widespread benefits than making the raw materials of human connection and knowledge sharing a basic and fundamental property of the web.

The DiSo Project that Chris leads is an umbrella effort that is spearheading a series of technologies, that will lay the infrastructure for when social networking will become “like air“, as Charlene Li has been saying for the last two years.

One of the most popular pieces of open source software (Drupal) has for a while now been innovating on the data side rather than on other features. More recently, we've seen Google announce that it will cater better for websites that mark up their content in more structured formats, giving people an economic incentive to participate in the Semantic Web. APIs (ways for external entities to access a website's data and technology) are now flourishing, providing a new basis for companies to innovate and allow mashups (even by newspapers).

As for computing and connectivity, these are more hardware issues, which will see innovation at a different pace and scale to the data domain. Cloud computing has long been understood as a long-term shift, and it aligns with the move to ubiquitous computing. Theoretically, all you will need is an Internet connection, and the cloud will put computing resources at your disposal.


On the connectivity side, we are seeing governments around the world make broadband access a top priority (like the Australian government's recent proposal to create a national broadband network unlike anything else in the world). The more evident trend in this area, however, will be the mobile phone – which, since the iPhone, has completely transformed our perception of what we can do with a portable computing device. The mobile phone, when connected to the cloud carrying all that data, unleashes the power that is ubiquity.

And then?
Along this journey, we are going to see some unintended impacts, like how we are currently seeing social media replace the need for a mass media. Spin-off trends will occur that no reasonable person will be able to predict, and externalities (both positive and negative) will emerge as we drive towards this longer term trend of everything and everyone being connected. (The latest, for example, being the real-time web and the social distribution network powering it.)


It's going to challenge conventions in our society and the way we go about our lives – and that's something we can't predict but can only expect. For now, however, the trend points to how we get ubiquity. Once we reach that, we can ask the question of what happens after it – that is, what happens when everything is connected. But until then, we've got to work out how to get everything connected in the first place.

The evolution of news and the bootstrapping of the Semantic Web

The other month (as in, one of those months where I am working 16-hour days and don't have time to blog), I read in amazement about a stunning move made by the New York Times: the announcement of its first API, which lets you query campaign finance data. It turns out this wasn't an isolated incident, as evidenced by yet another API release, this time for movies, with plenty more to come.

That is massive! Basically, using the same data, people will be able to create completely different information products.
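
A hedged sketch of what that enables – the endpoint, parameters and response fields below are hypothetical stand-ins, not the actual New York Times API, but the shape of the idea is the same: pull the raw data and build a different product from it.

    import json
    import urllib.request

    # Hypothetical endpoint and fields, purely for illustration.
    url = "https://api.example-news.com/campaign-finance/candidates?state=NY&key=YOUR_KEY"

    with urllib.request.urlopen(url) as response:
        candidates = json.loads(response.read())

    # The same raw data can now power something other than a news story,
    # e.g. a donations leaderboard.
    top = sorted(candidates, key=lambda c: c.get("total_donations", 0), reverse=True)[:5]
    for candidate in top:
        print(candidate.get("name"), candidate.get("total_donations"))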

I doubt the journalists toiling away at the Times have any idea what this will do to their antiquated craft (validating that to get the future of media you need to track technology). As the switched on Marshall Kirkpatrick said in the above linked article for Read Write Web "We believe that steps like this are going to prove key if big media is to thrive in the future."

Hell yeah. The web has now evolved beyond 'destination' sites as a business model. News organisations need to harness the two emerging business models – platforms and networks. Whilst we've seen lots of people try the platform model (as aggregators – after all, that is what a traditional newspaper has been in society), this is the first real example I have seen of the heritage media doing the network model. The network model means your business thrives when people use *other* people's sites and services. It sounds counter-intuitive, but it's the evolution of the information value chain.

This will certainly make Sir Tim Berners-Lee happy. The Semantic Web is a vision that information on the web is machine readable, so that computers can truly unleash their power. However, this vision is gaining traction very slowly. We will get there, but I wonder whether the way we get there will be quite how we expect.

The New Improved Semantic Web: now with added meaning!

These APIs, which allow web services to reuse their data in a structured way, may be just what the Semantic Web needs to bootstrap it. There's an assumption with the vision that for it to work, all data needs to be open and publicly accessible. The economics are just not there yet for companies to unlock their data, and my work this year with the DataPortability Project has made me realise that to get value out of your data you simply need access to it (which doesn't necessarily mean public data).

Either way, for me this was one of the biggest news events of the year, and one that has passed by very quietly. This will certainly be something worth tracking in 2009 as we see the evolution of not just the Semantic Web, but also social media.

It’s all still alpha in my eyes

The invention of hypertext has been the most revolutionary thing since two earlier technologies: the printing press and the alphabet. Combined with computing and the Internet, we have seen a new world represented by the World Wide Web that has transformed entire industries in its mere 19-year existence.

The web caught our imagination in the nineties, which became the Dot-Com bubble. Several years after the bust, optimism reawakened when the Google machine listed on the stock exchange – heralding a new era dubbed “web2.0”. This era has now been recognised in the mainstream, elevated by the mass adoption of the social computing services, and has once again seen the web transform traditional ideas and generate excitement.

The web2.0 era is far from over – however, the recent global recession has flagged that the pioneers of the industry are looking for something new. As the mainstream is rejuvenated by web2.0, like the Valley was not that long ago, it's time to look for what the next big thing will be. Innovation on the web is apparently flattening. Perhaps it has flattened – but the seeds of the next generation of innovation on the web are already here.

Controversy over the meaning of web2.0 – and what its successor will be – should not distract us. We are seeing the web and its associated technologies evolve to new heights. So the question is not when web2.0 ends, but what are we seeing now that will dominate in the future?

My view:
• The mobile web. The mobile phone is evolving into a generic entertainment device, becoming a new computing device that extends the reach of the internet. First with the desktop computer, and then with the laptop, each new computing platform presented new opportunities in the way we could use computers. This new platform will create opportunities whose surface we have only scratched.
• The 3D web. Visit Second Life, the virtual world, and you quickly note that the main driver of activity is sex and that it's just a game. However, porn and games have spearheaded a lot of technology innovation in the past. The 3D web is now emerging through four separate but related trends: virtual worlds, mirror worlds, augmented reality and lifelogging.
• The data web. Data has now become a focus in the industry. The semantic web, eventually, will allow a weak form of artificial intelligence, letting computer agents work in an automated fashion. Vendor Relationship Management is changing the fundamental assumptions of advertising, offering a new way of transacting in our world. Those trends, combined with the drive for portability of people's data, have us seeing the web in a new light with new potential: not as a collection of documents, and not as a platform for computing, but as a database that can be queried (a sketch of what that looks like follows below).
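
To sketch what "a database that can be queried" means in practice, here is a toy Python example using the rdflib library: a few invented triples are loaded into a graph and then queried with SPARQL, much as you would query rows in a database. The data and URIs are made up for illustration.

    from rdflib import Graph

    # A tiny, made-up slice of the data web, written as triples in Turtle syntax.
    data = """
    @prefix ex: <http://example.org/> .
    ex:Sydney    ex:locatedIn  ex:Australia .
    ex:Melbourne ex:locatedIn  ex:Australia .
    ex:Auckland  ex:locatedIn  ex:NewZealand .
    """

    graph = Graph()
    graph.parse(data=data, format="turtle")

    # Query the graph the way you would query a database table.
    query = """
    SELECT ?city WHERE { ?city <http://example.org/locatedIn> <http://example.org/Australia> . }
    """
    for row in graph.query(query):
        print(row.city)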

So to get some discussion, I thought I might ping some smart people I know in the industry on what they think: Chris Saad, Daniela Barbosa, Ben Metcalfe, Ross Dawson, Mick Liubinskas, Randal Leeb-du Toit, Stewart Mader, Tim Bull, Seth Yates, Richard Giles as well as you reading this now.
What do you think is currently in the landscape that will dominate the next generation of the web?

What is the DataPortability Project

When we created the DataPortability workgroup in November 2007, it was after discussion among a few of us to further explore an idea: a vision for the future of the social web. By working together, we thought we could make real change in the industry. What we didn't realise was how quickly, and how big, the attention generated by this workgroup would be. A press release has been issued that details the journey to date, and it highlights some interesting tidbits. What I am going to write below is how my own thoughts have evolved over the last few months, and what it is that I think DataPortability is.

1) Getting companies to adopt open, existing standards
RSS, OpenID, APML, OAuth, RDF, and the rest. These technologies exist, and some of them have been around for many years. Everyone who understands what they are knows that they rock. If these standards are all so great – why hasn't the entire technology industry adopted them yet? Now we just need awareness, education and, in some cases, pressure on the industry heavies to adopt them.
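
As a small taste of what adopting even one of these standards buys you, here is a sketch using the third-party Python library feedparser to read any RSS feed; the feed URL is a placeholder.

    import feedparser  # third-party: pip install feedparser

    # Any site that publishes RSS can be read with the same few lines:
    # that is the payoff of an open, existing standard.
    feed = feedparser.parse("https://example.com/blog/feed")

    for entry in feed.entries[:5]:
        print(entry.title, "-", entry.link)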

2) Create best practices of implementing these standards
When you are part of a community, you are in the know, and don't realise how the outside world looks in. Let the standards communities focus their precious energies on creating and maintaining the technologies; DataPortability can help provide resources for people to implement them. Is providing PHP4 support for OAuth really a priority? It isn't for them – but by pooling a community of people with diverse skillsets who are committed to the overall picture, it has a better chance of happening.

3) Synthesise these open standards to play nice with each other.
All these different communities working in isolation have been doing their own thing. An example is how Yadis-XRDS is working on service discovery and has a lacklustre catalogue. Do we just leave them to do their own thing? Does someone else in Bangalore create his own catalogue? (Which is highly likely, given the under-exposure of this key aspect to the groups needing it for the other standards, and the current state it's in.) Thanks to Kaliya for mentioning that the XRDS guys have been more than proficient in working with other groups – "how do you think their spec is part of the OpenID spec?". Julian Bond goes on to say: "Yadis-XRDS is only months old and XRDS-Simple is literally days old… Having trouble thinking of a community that is working in isolation. And that isn't likely to be hugely offended if you suggested it." So let me leave the examples there, and just say that the DataPortability Project, when defining technical and policy blueprints, can identify issues and, from the bigger-picture perspective, focus attention on where it's needed. By embracing the broader community, and focusing our attention on weaknesses, we can ensure no one is reinventing wheels.

4) Communicate all the good things the existing communities are doing, under the one brand, to the end user.
RSS is by far the most recognised open standard. Have you ever tried explaining RSS to someone outside the tech industry? I have. Multiple times. It's like I've just told them about a future with flying cars and settlements on Mars. I've done it in the corporate world, to friends, family, girls I date, guys I weight train with and anyone else. Moving on to OpenID – does anyone, apart from Scoble and the technorati who try every web service they can, really care? Most people use Facebook, Hotmail (the cutting edge are using Gmail) and that's it. On your next trip to Europe, ask a cultured French (wo)man if they know what OpenID is; why they need it; what they can do with it. Now add RSS to the mix. And APML. And OAuth. Bonus points if you can explain RDF to yourself.

Wouldn't it be easier if you just explained what DataPortability is, and the benefits that can be achieved by using all these standards? Standards are invisible things that consumers shouldn't need to care about; they just care about the benefits. Do consumers care about the standards behind Wi-Fi, as defined by zeroconf – or do they care about clicking "enable wireless" on their laptop and connecting to the Internet? If you go around evangelising the technical standards, the only audience you will get is the corporates in IT departments, who couldn't care less. The corporate IT guys respond to their customer- and client-facing colleagues, who in turn respond to consumers – and consumers couldn't care less how it's done, only what they can do. Have the consumer channel their demand, and it benefits the whole ecosystem.


The new DataPortability trustmark

It has been said that the average consumer doesn't care about DataPortability. Of course they don't – we are still in the investigation phase of the Project, which will later evolve into the design phase and then the evangelising phase. We know people would want RSS, OAuth, and the rest of the alphabet soup – so let's use DataPortability as a brand through which we can communicate this. Sales is about creating demand – let's coordinate our 'selling' to make it overwhelming, and make it easy for consumers to channel that want in a way they can relate to. You don't say "OAuth"; you say "preventing password theft" to them instead.

5) Make the business case that a user should get open access to their data
Why should Facebook let other applications use the data it has on its servers? Why should Google give up all the data it has about its users to a competitor? Why should a Fortune 500 adopt solutions that decentralise its control? Why should a user adopt RDF on their blog when they get no clear benefit from it? Is a self-trained PHP coder who can whack something together going to be able to articulate that to the VCs?

The tech industry has this obsession that nothing gets done unless the developers are on board. No surprises there – if we don't have an engineer to build the bridge, we are going to have to keep jumping off the cliff hoping we make it to the other side. But at the same time, if you don't have people persuading those who would fund this bridge, or the broader population about how important it is for them to have this bridge, that engineer can build whatever he wants and the end result is that no one will ever walk on it. Funny how web2.0 companies suck at the revenue model thing: over-hype on the development innovation, under-hype on the value proposition to the ordinary consumer who funds their business.

Developers need to be on board because they hassle their bosses, and sometimes that evangelising from within works; but imagine if we get the developers' bosses' bosses on board because some old bear on the board of directors wants DataPortability after his daughter explained it to him (the same person who also told him about Facebook and YouTube). I can assure you, as I've seen it first-hand with the senior leadership at my own firm, this is exactly what is happening.

Intel is one of the best-selling computer chip companies in the world. Do you really think, as a consumer, I care about what chip my computer runs on? Logically – no. But the "Intel Inside" marketing campaign gave them a monopoly, because end consumers would ask "does it have Intel inside?", and this pressure forced Intel's customers (IBM and the rest) to actually use Intel. Steve Greenberg corrects me by saying: "The Intel Inside campaign came a decade after Intel took over the world. It wasn't what got them there. It was in response to Microsoft signaling that they liked AMD. Looked like AMD was going to take off… but then they didn't". So my facts were slightly wrong, but the point still remains.
At the same time, it isn't just about political pressure; it's also about education. I genuinely believe opening up your data is a smart business strategy that will change the potential of web services.

You make people care by giving them an incentive (business opportunities; customer political pressure; peer pressure on individuals and on the industry, which later evolves into industry norms). The semantic web communities, the VRM communities, the entire open standards communities – all have a common interest in doing this. DataPortability is culture change on an industry-wide level, and it will improve the entire ecosystem. Apparently innovation has died – I say it's just beginning.

Here’s a secret: the semantic web is the boring bit

Marshall Kirkpatrick caused a wave today when he gave a brutally honest assessment of one of the most talked-up semantic web applications, Twine. It was, as per usual, an excellent analysis by Marshall, and I don't think he needs to hide behind his words, as they are fair. However, now that the semantic web is gaining traction in the mainstream, moving from academic thesis to real-world web applications, I think it is crucial that we do a little bit of stakeholder management.

Ready? The semantic web is as boring as bat shit.

Essentially, the semantic web is about structuring content in a way so that computers can interpret the information. It’s a bit like linking every word on the web, to a dictionary entry so that computers understand the language that humans use.

But seriously, how is that exciting? People don't get the semantic web because it's the fundamentals – and that's boring! Take, for example, RDF, the semantic web's building block, which is about structuring data into subject, predicate and object. This is straight from primary school grammar lessons, where we learn the fundamentals of the English language (it's no coincidence I just linked to a grammar guide, not the RDF guide). And if you have heard of subject, predicate and object in the context of the semantic web, you probably didn't even realise it's what the entire English language is based on. It's because you probably did learn it, and forgot – it's boring as bat shit. But damn, without those fundamentals, we wouldn't be communicating right now.
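
If it helps, here is the grammar analogy written out as a few lines of Python: each RDF-style statement is nothing more than a subject, a predicate and an object, like a bare English sentence (the statements themselves are just examples).

    # Each statement is just subject / predicate / object.
    triples = [
        ("Sydney", "is located in", "Australia"),
        ("Tim Berners-Lee", "invented", "the World Wide Web"),
        ("this post", "is tagged with", "semantic web"),
    ]

    for subject, predicate, obj in triples:
        print(subject, predicate, obj)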

The point I want to make is that the building blocks are not where the excitement lies: the excitement is in what you can do once we have those building blocks. In English, we have poetry, literature, and language in general, through which we communicate as human beings. Once we get the basics of information down, we lay the foundation for a whole new world of computational possibilities. Marshall is spot on in saying "…semantics may be best suited to the back end…", because the excitement is in what the semantics enable, not the semantics themselves, which are going to take a long time to build up.

Imagine the sum of human knowledge accessible by a computer to query. Semantic web applications are boring and you won't ever 'get' them – but what they enable is a whole new world of potential, and once we can flick the switch, it will mean a world we will barely recognise from today's standpoint.

My presentation at Kickstart forum

I’m currently at Kickstart forum (along with the Mickster), and I just gave a presentation on DataPortability to a bunch of Aussie journalists. I didn’t write a speech, but I did jot down some points on paper before I spoke, so I thought I might share them here given I had a good response.

My presentation had three aspects: background, explanation, and implications of DataPortability. Below is a summary of what I said

Background

  • Started by a bunch of Australians and a few other people overseas in November 2007 out of a chatroom. We formed a workgroup to explore the concept of social network data portability
  • In January 2008, Robert Scoble had an incident, which directed a lot of attention to us. As a consequence, we’ve seen major companies such as Google, Microsoft, Yahoo, Facebook, Six Apart, LinkedIn, Digg, and a host of others pledge support for the project.
  • We now have over 1000 people contributing, and have the support of a lot of influential people in the industry who want us to succeed.

Explanation

  • The goal is to not invent anything new. Rather, it’s to synthesise existing standards and technologies, into one blueprint – and then we push it out to the world under the DataPortability brand
  • When consumers see the DataPortability brand, they will know it represents certain things – similar to how users recognise that the Centrino brand represents Intel, mobility, wireless internet, and long battery life. The brand is there to communicate some fundamental things about a web service, allowing a user to recognise that a supporting site respects its users' data rights and offers certain functionality.
  • Analogy of zero-networking: before the zeroconf initiative, it was difficult to connect to the internet wirelessly. Thanks to the standardisation of policies, we can now connect to the internet wirelessly at the click of a button. The consequence is not just a better consumer experience, but the enablement of future opportunities such as what we are seeing with the mobile phone. Likewise, with DataPortability we will be able to connect to new applications and things will just “work” – and it will open up new opportunities for us.
  • Analogy of the bank: I explained how the attention economy is something we give our attention to (i.e. we put up with advertising, and in return we get content), and that the currency of the attention economy is data. With DataPortability, we can store our data in a bank, and via “electronic transfer” we can interact with various services, controlling the use of that data in a centralised manner. We update our data at the bank, and it automatically synchronises with the services we use, i.e. automatically updating your Facebook and MySpace profiles.

Implications

  1. Interoperability: when diverse systems and organisations work together. A DataPortability world will allow you to use your data generated on other sites, i.e. if you buy books on Amazon about penguins, you can get recommendations for penguin movies in your pay-TV movie catalogue. Things like the ability to log in across the web with one sign-on create a self-supporting ecosystem where everyone benefits.
  2. Semantic web: I gave an explanation of the semantic web (which generated a lot of interest afterwards in chats), and then proceeded to explain that the problem for the semantic web has been the lack of uptake of standards and technologies. I said that when a company adopts the DataPortability blueprint, it will effectively be supporting the semantic web – and hence enabling the next phase of computing history.
  3. Data rights: I claimed the DataPortability Project is putting data rights in the spotlight, and that it's an issue that has generated interest from other industries, like the health and legal sectors, and not just the Internet sector. Things like: what is privacy, and what exactly does my “data” mean? DataPortability is creating a discussion on what this actually means.
  4. Wikiocracy: I briefly explained how we are running a social experiment with a new type of governance model, which can be regarded as an evolution of the open source model – “decentralised” and “non-hierarchical”. With time, what we are trying to do will become more evident.

Something that amused me came in the one-on-one sessions I had with the journalists afterwards, when one woman asked: “So why are you doing all of this?”. I said it was an amazing opportunity to meet people and build my profile in the tech industry, to which she concluded: “you’re doing this to make history, aren’t you?”. I smiled 🙂

Don’t get the Semantic Web? You will after this

Prior to 2006, I had sort of heard of the Semantic Web. To be honest, I didn't know much – it was just another buzzword. I'd been hearing about Microformats for years, along with cool but useless initiatives like XFN. To me it was simply another web thing being thrown around.

Then in August 2006, I came across Adrian Holovaty’s article where he argues journalism needs to move from a story-centric world to a data-centric world. And that’s when it dawned on me: the Semantic web is some serious business.

I have since done a lot of reading, listening, and thinking. I don't profess to be a Semantic Web expert – but I know more than the average person, as I have (painfully) put myself through videos and audio of academic types who confuse the crap out of me. I've also read through a myriad of academic papers from the W3C, which are like those times when you read a novel and keep re-reading the same page and still can't remember what you just read.

Hell – I still don't get some things. But I get the vision, so that's what I am going to share with you now. Hopefully, my understanding will benefit the clueless and the skeptical alike, because it's a powerful vision which is entirely possible.

1) The current web is great for humans; useless for machines
When you search for ambiguous terms, at best search engines can algorithmically predict some sort of answer that partially addresses your query. Sometimes not even that. The complexity of language is not something engineers can simply engineer around. After all, without the ambiguity of natural languages, the existence of poetry would be impossible.

Fine.

What did you think when you read that? As in: “I’ve had it – fine!”, which is another way of saying OK or agreeing with something. Perhaps you thought about that parking ticket I just got – illegal parking gets you fined. Maybe you thought I was applauding myself, saying that was one fine piece of wordcraft I just wrote; or perhaps, in another context again, you thought of a fine wine.

Language is ambiguous, and depending on the context of the other words, we can determine what the meaning of a word is. Search start-up Powerset, which is hoping to kill Google and rule the world, is employing exactly this technique to improve search: intelligent processing of words depending on their context. So when I put in “it’s a fine”, it understands the context is a parking ticket, because you wouldn’t say “it’s a” in front of ‘fine’ when you use it to agree with something (the ‘ok’ meaning above).

But let’s use another example: “Hilton Paris” in Google – the world’s most ‘advanced’ search engine. Obviously, as a human reading that query, you understand from the context of those words that I would like to find information about the Hilton in Paris. Well, maybe.

Let’s see what Google comes up with. Of the ten search results (as of when I wrote this blog post), one was a news item on the celebrity; six were on the celebrity, describing her in some shape or form; and three results were on the actual hotel. Google, at 30/70, is a little unsure.

Why is Paris Hilton, that blonde haired thingy of a celebrity, coming up in the search results?

Technologies like Powerset apparently produce a better result because they understand the order of the words and the context of the search query. But the problem with these searches isn’t just the interpretation of what the searcher wants – it’s also the ability to understand the actual search results. Powerset can only interpret so much of the gazillions of words out there. There is the whole problem of the source data, not just the query. Don’t get what I mean? Keep reading. But for now, learn this lesson:

Computers have no idea about the data they are reading. In fact, Google pumping out those search results is based on people linking. Google is a machine, and reads 1s and 0s – machine language. It doesn’t get human language

2) The Semantic Web is about making what humans read, machine readable
Tim Berners-Lee, the guy who invented the World Wide Web and the visionary behind the Semantic Web, prefers to call it the ‘data web’. The current web is a web of documents – by adding extra data to content, machines will be able to understand it. Metadata is data about data.

A practical outcome of having a semantic web is that Google, when it pulls up a web page, would understand what the content is regardless of the context of the words. Think of every word on the web being linked to a master dictionary.

The benefit of the semantic web is not for humans – at least immediately. The Semantic Web is actually pretty boring with what it does – what is exciting, is what it will enable. Keep reading.

3) The Semantic web is for machines to interpret, not people
A lot of the skeptics of the semantic web simply don’t see the value in it. Who cares about adding all this extra metadata? I mean, heck – Google still managed to get me the website I needed – the Hilton in Paris. Sure, the other 60% of the results on that page were irrelevant, but I’m happy.

I once came across a Google employee who asked: “what’s the point of a semantic web; don’t we already have enough metadata?” To some extent, he’s right – there are some websites out there that have metadata. But the point of the semantic web is that once machines can read the information, they can start thinking the way a human would and connecting it to other information. There needs to be metadata across the board.

For example, my friend Michael was recently looking to buy a car. A painful process, because there are so many variables. So many different models, different makes, different dealers, different packages. We have websites, with cars for sale neatly categorised into profile pages saying what model it is, what colour it is, and how much. (Which may I add, are hosted on multiple car sites with different types of profiles). A human painfully reads through these profiles, and computes as fast as a human can. But a machine can’t read these profiles.

Instead of wasting his (and my) weekends driving around Sydney to find his car, a machine could find it for him. So Mike would enter his profile – what he requires in a car, what his credit limit is, what his prior history with cars is – everything that would affect his judgement of a car. Then the computer could query every online website with cars to match the criteria. Because the computer can interpret these websites across the board, it can evaluate them and go back to Michael and say: “this is the car for you, at this dealer – click yes to buy”.
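
A hedged sketch of what that could look like, assuming the car sites published their listings in a shared, structured vocabulary – the sites, fields and numbers below are invented:

    # Hypothetical structured listings, as different sites might publish them
    # once they share a common vocabulary.
    listings = [
        {"site": "siteA", "make": "Toyota", "model": "Corolla", "year": 2006, "price": 14000},
        {"site": "siteB", "make": "Mazda", "model": "3", "year": 2007, "price": 17500},
        {"site": "siteC", "make": "Toyota", "model": "Camry", "year": 2005, "price": 12500},
    ]

    # Mike's criteria stand in for his profile.
    budget, min_year = 15000, 2005

    matches = [car for car in listings if car["price"] <= budget and car["year"] >= min_year]
    for car in matches:
        print(car["make"], car["model"], car["year"], "on", car["site"], "for", car["price"])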

The semantic web is about giving computers the information they need to interpret data, so that they can do what they do really well – compute.

4) A worldwide database
What Berners-Lee essentially envisions is turning the entire World Wide Web into a database that can be queried. Currently, the web looks like a Microsoft Word document – one big swab of text. However, if that text were neatly categorised in an Excel spreadsheet, you could manipulate that data and do what you please with it – create reports, reorder it, filter it, and do whatever else until your heart is content.

At university, I was forced to do an Information Systems subject which was essentially about the theory of databases. Damn painful. I learned only two things from that course. The first thing was that my lecturer, tutor, and classmates spoke less intelligible English than a caterpillar. But the second thing was that I learned what information is and how it differs from data. I am now going to share with you that lesson, and save you three months of your life.

You see, data is meaningless. For example, 23 degrees is data. On its own, it’s useless. Another piece of data: Sydney. Again – useless. I mean, you can think of all sorts of things when you think of Sydney, but on its own it doesn’t have any meaning.

Now put together 23 degrees and Sydney, and you have just created information. Information is about creating relationships between data. By creating a relationship, an association, between these two different pieces of data – you can determine it’s going to be a warm day in Sydney. And that is what information is: Relationship building; connecting the dots; linking the islands of data together to generate something meaningful.
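
The same lesson in a few lines of Python, with the numbers obviously invented:

    # Two isolated pieces of data...
    temperature = 23      # degrees: of what? where?
    place = "Sydney"      # a city: so what?

    # ...become information once a relationship links them together.
    forecast = {"place": place, "expected_high_c": temperature}

    if forecast["expected_high_c"] >= 22:
        print("It's going to be a warm day in", forecast["place"])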

The semantic web is about allowing computers to be able to query the sum of human knowledge like one big database to generate information

Concluding thoughts
You are probably now starting to freak out, picturing “Terminator”-style images of computers suddenly erupting from under your desk and smashing you against the wall as a battle between humans and computers begins. But I don’t see it like that.

I think about the thousands of hours humans spend trying to compute things. I think of cancer research, where all this experimentation occurring in labs is trying to connect new pieces of data with old data to create new information. I think about computers being able to query the entire taxation legislation to make sure I don’t pay any tax, because they know how it all fits together (having studied tax, I can assure you – it takes a lifetime to understand only a portion of tax law). In short, I understand the vision of the Semantic Web as a way of linking things together, to enable computers to compute – so that I can sit in my hammock drinking my beer, as I delegate the duties of my life to the machines.

All the semantic web is trying to do is make sure everything is structured in a consistent manner, with a consistent dictionary behind the content, so that a machine can draw connections. As Berners-Lee said in one of the videos I saw: “it’s all about creating links”.

The process of getting to a Semantic Web is boring. But once we have those links, we can start talking about those hammocks. And that’s when the power of the internet – the global network – will really take off.

On the future of search

Robert Scoble has put together a video presentation on how Techmeme, Facebook and Mahalo will kill Google in four years time. His basic premise is that SEO’s who game Google’s algorithm are as bad as spam (and there are some pissed SEO experts waking up today!). People like the ideas he introduces about social filtering, but on the whole – people are a bit more skeptical on his world domination theory.

There are a few good posts like Muhammad‘s on why the combo won’t prevail, but on the whole, I think everyone is missing the real issue: the whole concept of relevant results.

Relevance is personal

When I search, I am looking for answers. Scoble uses the example of searching for HDTV and makes note of the top manufacturers as something he would expect at the top of the results. For him – that’s probably what he wants to see – but for me, I want to be reading about the technology behind it. What I am trying to illustrate here is that relevance is personal.

The argument for social filtering is that it makes results more relevant. For example, with a bunch of my friends associated with me on my Facebook account, an inference engine can determine that if my friend A is also friends with person B, who is friends with person C, then something I like must also be something that person C likes. When it comes to search results, that sort of social/collaborative filtering doesn’t work, because relevance is complicated. The only value a social network can provide is telling you whether content is spam or not – a yes or no type of answer – and even that assumes someone in my network has come across the content. Just because my social network can (potentially) help filter out spam doesn’t make the search results higher quality. It just means fewer spam results. There is plenty of content that may be on-topic but may as well be classed as spam.
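
For what it’s worth, here is a toy sketch of that friend-of-a-friend inference in Python (the names and the friendship graph are made up). Note what it gives you: a guess about people, not a measure of how relevant a page is.

    # A tiny, made-up friendship graph.
    friends = {
        "A": {"B"},
        "B": {"A", "C"},
        "C": {"B"},
    }

    def friends_of_friends(person):
        direct = friends.get(person, set())
        result = set()
        for friend in direct:
            result |= friends.get(friend, set())
        return result - direct - {person}

    # If A knows B and B knows C, the engine guesses C shares A's tastes.
    print(friends_of_friends("A"))   # {'C'}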

Google’s algorithm essentially works on the popularity of links, which is how it determines relevance. People can game this algorithm, because someone can make a website popular to manipulate rankings, through linking from fake sites and other optimisations. But Google’s PageRank algorithm assumes that relevant results are, at their core, purely about popularity. The innovation the Google guys brought to the world of search is something to be applauded, but the extreme lack of innovation in this area since just shows how hard it is to come up with new ways of determining relevance. Popularity is a smart way of determining relevance (because most people would like the result) – but since it can be gamed, it no longer is.
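
For the curious, here is a deliberately simplified sketch of the idea behind PageRank – not Google’s actual implementation: rank flows along links, so pages with many well-ranked inbound links float to the top, which is exactly the signal SEOs try to game.

    # A toy link graph: each page lists the pages it links to.
    links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
    }

    damping = 0.85
    rank = {page: 1.0 / len(links) for page in links}

    # Repeatedly pass each page's rank along its outbound links.
    for _ in range(50):
        new_rank = {}
        for page in links:
            inbound = sum(rank[src] / len(out) for src, out in links.items() if page in out)
            new_rank[page] = (1 - damping) / len(links) + damping * inbound
        rank = new_rank

    print(rank)   # popularity as a proxy for relevance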

The semantic web

I still don’t quite understand why people don’t realise the potential of the semantic web, something I go on about over and over again (maybe not on this blog – maybe it’s time I did). But if anything is going to change search, it will be that – because the semantic web will structure data, moving away from the document approach that web pages represent and towards the data approach that resembles a database table. It may not be able to make results more relevant to your personal interests, but it will better understand the sources of data that make up the search results, and it can match them up to whatever constructs you present it with.

Like Google’s PageRank, the semantic web will require humans to structure data, from which a machine will then make inferences – similar to how PageRank makes inferences based on the links people make. However, Scoble’s claim that humans can overtake a machine is silly – yes, humans have a much higher intellect and are better at filtering, but they can in no way match the speed and power of a machine. Once the semantic web gets into full gear a few years from now, humans will have trained the machine to think – and it can then do the filtering for us.

Human intelligence will be crucial for the future of search – but not in the way Mahalo does it, which is like manually categorising pieces of paper into a filing cabinet, and which is not sustainable. It’s a bit like how, when the painters of the Sydney Harbour Bridge finish painting it, they have to start all over again because the other side is already starting to rust. Once we can teach a machine that, for example, a dog is an animal that has four legs and makes a sound like “woof”, the machine can then act on our behalf, like a trained animal, and go fetch what we want; how those paper documents are stored becomes irrelevant, and the machine can do the sorting for us.
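
Here is a toy version of that “training” in Python – a hand-written “is a” hierarchy standing in for a real ontology language, but the effect is the same: once the machine knows a dog is an animal, a request for animals fetches the dog documents without anyone re-filing them.

    # A hand-written "is a" hierarchy standing in for an ontology.
    is_a = {
        "dog": "animal",
        "cat": "animal",
        "animal": "living thing",
    }

    def belongs_to(thing, category):
        # Walk up the chain: dog -> animal -> living thing.
        while thing is not None:
            if thing == category:
                return True
            thing = is_a.get(thing)
        return False

    documents = {"doc1": "dog", "doc2": "cat", "doc3": "granite"}
    print([doc for doc, topic in documents.items() if belongs_to(topic, "animal")])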

The Google killer of the future will be the people who can convert the knowledge on the World Wide Web into information readable by computers, to create this (weak) form of artificial intelligence. Now that’s where it gets interesting.

Google: the ultimate ontology

A big issue with the semantic web is ontologies – the use of consistent definitions for concepts. For those who don’t understand what I’m talking about: essentially, the next evolution of the web is about making content readable not just by humans but also by machines. However, for a machine to understand something it reads, it needs consistent definitions. Humans, for example, are intelligent – they understand that the word “friend” is also related to the word “acquaintance” – but a computer would treat them as two different things. Or would it?
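
A minimal sketch of what treating two words as the same concept looks like, using a hand-written synonym table (a real ontology would be far richer than this):

    # Hand-written equivalences between terms and a canonical concept.
    same_as = {
        "acquaintance": "friend",
        "acquaintances": "friends",
    }

    def canonical(term):
        return same_as.get(term, term)

    query = "how many acquaintances do people have"
    print(" ".join(canonical(word) for word in query.split()))
    # -> how many friends do people have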

Just casually looking at some of my web analytics, I noticed some people landed on my site by doing a Google search for how many acquaintances people have, which took them to a popular post of mine about how many friends people have on Facebook. I’ve had a lot of visitors because of that post, and it’s been an interesting case study for me on how search engines work. Today, however, was something different: I found the word ‘acquaintance’ odd. I know I didn’t use that word in my post – and when I went to the Google cache I realised something interesting: because someone linked to me using that word, the search engine had replaced the word ‘friend’ with ‘acquaintances’.


Google’s linking mechanism is one powerful ontology generator.

Patents: more harm than good

When I was in Prague two years ago, I met a bloke from Bristol (UK) who very convincingly explained how patents, as a concept, are stupid. Because alcohol was involved, I can’t recall his actual argument, but it has since made me question: do you really need a patent to protect your business idea?

Narendra Rocherolle, an experienced entrepreneur, has written a good little article explaining when you should, and shouldn’t, spend money to protect your IP. Rocherolle offers a good analysis, but I am going to extend it by stating that a patent can be dangerous for your business, and not just because of the monetary cost. Radar Networks is my case study – a stealth-mode “Semantic Web” company that has received a lot of press lately because apparently they are doing something big, but they are not going to tell us what until later this year.
