Tag Archive for 'information'

Here’s a secret: the semantic web is the boring bit

Marshall Kirkpatrick caused a wave today when he gave a brutally honest assessment of one of the most talked-up semantic web applications, Twine. It was, as per usual, an excellent analysis by Marshall, and I don't think he needs to hide behind his words, as they are fair. However, now that the semantic web is gaining traction and moving from an academic thesis to real-world web applications in the mainstream, I think it is crucial that we do a little bit of stakeholder management.

Ready? The semantic web is as boring as bat shit.

Essentially, the semantic web is about structuring content so that computers can interpret the information. It's a bit like linking every word on the web to a dictionary entry, so that computers understand the language that humans use.

But seriously, how is that exciting? People don't get the semantic web because it's the fundamentals - and that's boring! Take RDF, for example, the semantic web building block, which is about structuring data into subject, predicate and object. This is straight from primary school grammar lessons, where we learn the fundamentals of the English language (no coincidence I just linked to a grammar guide, not the RDF guide). And if you have heard of subject, predicate and object before in the context of the semantic web, you probably didn't even realise it's what the entire English language is based on. That's because you probably did learn it, and forgot - it's boring as bat shit. But damn, without them, we wouldn't be communicating with each other right now.
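To make the subject, predicate and object idea concrete, here is a minimal sketch in Python. It is not real RDF (there are no URIs or namespaces, and no RDF library is involved); the `Triple` type, the `match` helper and the example facts are all made up for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: who or what, the relationship, and the thing related to."""
    subject: str
    predicate: str
    obj: str


# Hypothetical facts, each written as a subject-predicate-object statement.
triples = [
    Triple("Marshall", "reviewed", "Twine"),
    Triple("Twine", "is_a", "semantic web application"),
    Triple("RDF", "is_a", "data model"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the fields that were supplied."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Which things are semantic web applications?"
apps = match(triples, predicate="is_a", obj="semantic web application")
```

Boring, right? But once every statement has the same three-part shape, a computer can answer questions by filling in the blanks, which is the whole point.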

The point I want to make is that the building blocks are not where the excitement is: the excitement is in what you can do once we have those building blocks. In English, we have poetry, literature, and language in general, through which we communicate as human beings. Once we get the basics of information down, we are laying the foundation for a whole new world of computational possibilities. Marshall is spot on in saying "...semantics may be best suited to the back end..." because the excitement is in what they enable, not the actual semantics themselves, which are going to take a long time to build up.

Imagine the sum of human knowledge, accessible for a computer to query. Semantic web applications are boring and you won't ever get them - but what they enable is a whole new world of potential which, once we can flick the switch, will mean a world we will barely recognise from today's standpoint.

Pageviews are a misleading metric

Recently MySpace, the social networking site that once dominated but is now being overtaken by Facebook, sent me an e-mail informing me that a friend of mine had a birthday. What is unusual is that, although I have received notifications of this type when logged into the site, I had never been e-mailed one.

Below is a copy of the e-mail; let's see if you notice what I did:
[Image: birthday reminder e-mail]

It doesn't tell me whose birthday it is. In fact, it is even ambiguous as to whether it was just the one person or not. Big deal? Not really. But it very clearly tells me something: MySpace is trying to increase its pageviews.

Social networking sites are very useful services to an individual; they enable a person to manage and monitor their personal networks. Not only am I in touch with so many people I lost contact with, but I am in the loop with their lives. I may not message them, but by passive observation, I know what everyone is up to. Things like what they're studying, where they work, what countries they will be holidaying in, and useful things like when they have their birthday.

Social networking sites are not just a website, but an information service, to help you manage your life. However as useful as I find these services, the revenue model is largely dependent on advertising, with premium features a rare thing now. So when you rely on advertising, you are going to be looking at ways of boosting the key figures that determine that revenue stream.

Friendster's surprising growth in May was due to some clever techniques of using e-mail to drive pageviews. And it worked. E-mail notifications, when done tactfully, can drive a huge amount of activity. Of what seems like hundreds of web services I have joined, e-mail is at times the only thing that reminds me I even subscribed to them once upon a time. Combine e-mail with information I want to be updated on, and you've got a great recipe for using e-mail as a tool to drive pageviews.

...And that is the problem. MySpace has very cleverly sent this e-mail to get me to log into my account. A marketing campaign like that will, at the very least, see a good day of pageview growth. But the reason I am logging in is just so I can see whose birthday it is. MySpace is now irrelevant to me: the pageviews attributed to me are not actually those of an engaged user.

As a metric for measuring audience engagement, pageviews are prone to manipulation. On the face of it, an increase in pageviews makes a website appear more popular. But dig a little deeper, and the correlation with what really matters (audience engagement) is not quite on par.

So everyone, repeat after me: Pageviews - we need to drop them as a concept if we are ever going to make progress.

How Google Reader can finally start making money

Today, you would have heard that Newsgator, Bloglines, Me.dium, Peepel, Talis and Ma.gnolia have joined the APML workgroup and are in discussions with workgroup members on how they can implement APML into their product lines. Bloglines created some news the other week on their intention to adopt it, and the announcement today about Newsgator means APML is now fast becoming an industry standard.

Google, however, is still sitting on the sidelines. I really like using Google Reader, but if they don't announce support for APML soon, I will have to switch back to my old favourite Bloglines, which is doing some serious innovating. Seeing as Google Reader came out of beta recently, I thought I'd help them out to finally add a new feature (APML) that will see it generate some real revenue.

What a Google Reader APML file would look like
Read my previous post on what exactly APML is. If the Google Reader team were to support APML, what they could add to my APML file is a ranking of blogs, authors, and key-words. First an explanation, and then I will explain the consequences.

In terms of blogs I read, the percentage of a particular blog's posts that I actually read will determine the relevancy score in my APML file. So if I was to read 89% of Techcrunch posts - which is information already provided to users - it would convert this into a relevancy score for Techcrunch of 89% or 0.89.
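As a rough sketch of that conversion (the function name and the counts are my own invention, not anything Google Reader exposes), the score is just the share of a blog's posts that I read:

```python
def source_relevance(posts_read: int, posts_published: int) -> float:
    """Share of a blog's posts that were read, as a score between 0 and 1."""
    if posts_published <= 0:
        # Nothing published yet, so there is nothing to base a score on.
        return 0.0
    # Cap at 1.0 in case a post is counted as read more than once.
    return round(min(posts_read / posts_published, 1.0), 2)


# Hypothetical counts: 89 of the last 100 Techcrunch posts were read.
techcrunch_score = source_relevance(89, 100)
```

With those made-up counts, Techcrunch ends up with a relevancy score of 0.89, matching the example above.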

[Image: ranking]

APML: pulling rank

In terms of authors I read, it can extract who posted the entry from the individual blog postings I read and, like the blog ranking above, perform a similar procedure. I don't imagine it would be too hard to do this; however, given it's a small team running the product, I would put this on a lower priority to support.

In terms of key-words, Google could apply its contextual analysis technology to each of the postings I read and extract key-words. By performing this on each post I read, the frequency of the extracted key-words determines the relevance score for those concepts.
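Google's contextual analysis technology is not public, so the sketch below swaps in a deliberately crude stand-in: it counts the words in each post I read, skips a small made-up stop-word list, and scales the counts so the most frequent key-word scores 1.0. Every name in it is an assumption for illustration.

```python
import re
from collections import Counter

# A tiny, made-up stop-word list; a real system would use something far better.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "on"}


def extract_keywords(text):
    """Crude stand-in for contextual analysis: lower-case words minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS and len(w) > 2]


def concept_scores(posts_read):
    """Count key-words across all posts read and scale them into 0..1 scores."""
    counts = Counter()
    for post in posts_read:
        counts.update(extract_keywords(post))
    if not counts:
        return {}
    top = max(counts.values())
    return {word: round(n / top, 2) for word, n in counts.items()}


# Hypothetical reading history.
scores = concept_scores([
    "Privacy and the attention economy",
    "Privacy is the new currency",
])
```

Here "privacy" appears in both posts, so it tops the ranking, and everything that appears once gets half its score.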

So that would be the how. The APML file generated from Google Reader would simply rank these blogs, authors, and key-words - and the relevance scores would update over time. Over time, the data is indexed and re-calculated from scratch so as concepts stop being viewed, they start to diminish in value until they drop off.
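One way to picture the "re-calculated from scratch" part is a rolling window: every time the scores are rebuilt, only recent reading counts. The 30-day window, the function name and the event format below are all assumptions of mine, not anything from the APML spec or Google Reader.

```python
from collections import Counter
from datetime import date, timedelta


def recalculate(read_events, today, window_days=30):
    """Rebuild concept scores using only reads inside the recent window.

    read_events is an iterable of (date_read, concept) pairs.
    """
    cutoff = today - timedelta(days=window_days)
    counts = Counter(concept for day, concept in read_events if day > cutoff)
    if not counts:
        return {}
    top = max(counts.values())
    return {concept: round(n / top, 2) for concept, n in counts.items()}


# Hypothetical reading history.
events = [
    (date(2007, 8, 1), "myspace"),
    (date(2007, 10, 1), "apml"),
    (date(2007, 10, 2), "apml"),
    (date(2007, 10, 2), "privacy"),
]

october_profile = recalculate(events, today=date(2007, 10, 10))
```

In this sketch "myspace" was last read in August, so by October it has dropped off the profile entirely, while a concept I read less and less would slide down the ranking before disappearing.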

What Google Reader can do with that APML file
1. Ranking of content
One of the biggest issues facing consumers of RSS is information overload. I am quite confident that people would pay a premium for any attempt to help rank what can be hundreds of items per day that a user needs to read. By having an APML file, over time Google Reader can match postings to a user's ranked interests. So rather than presenting the content in reverse chronological order (most recent to oldest), it can instead organise content by relevancy (items of most interest to least).

This won't reduce the amount of RSS consumption by a user, but it will enable them to know how to allocate their attention to content. There are a lot of innovative ways you can rank the content, down to the way you extract key-words and rank concepts, so there is scope for competing vendors to have their own methods. However, the point is that a feature to 'Sort by Personal Relevance' would be highly sought after, and I am sure quite a few people would be willing to pay the price for this godsend.
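Here is a minimal sketch of what 'Sort by Personal Relevance' could look like. The profile layout, the field names and the simple averaging of blog, author and key-word scores are my own assumptions; as I said, every vendor could weigh these differently.

```python
def item_relevance(item, profile):
    """Average the scores for an item's blog, author and best-matching key-word."""
    source = profile["sources"].get(item["source"], 0.0)
    author = profile["authors"].get(item["author"], 0.0)
    concept = max(
        (profile["concepts"].get(k, 0.0) for k in item["keywords"]),
        default=0.0,
    )
    return (source + author + concept) / 3


def sort_by_personal_relevance(items, profile):
    """Order items from most to least relevant instead of newest first."""
    return sorted(items, key=lambda i: item_relevance(i, profile), reverse=True)


# Hypothetical profile and unread items, listed newest first.
profile = {
    "sources": {"Techcrunch": 0.89, "Some Other Blog": 0.10},
    "authors": {},
    "concepts": {"apml": 0.9},
}
items = [
    {"title": "Newest post", "source": "Some Other Blog",
     "author": "anon", "keywords": ["gadgets"]},
    {"title": "Older post", "source": "Techcrunch",
     "author": "anon", "keywords": ["apml"]},
]

ranked = sort_by_personal_relevance(items, profile)
```

With this made-up data, the older Techcrunch post about APML jumps ahead of the newer post from a blog I barely read.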

I know Google seems to think contextual ads are everything, but maybe the Google Reader team can break the mould and generate a different revenue stream through a value-add feature like that. Google should apply its contextual advertising technology to determine key-words for filtering, not advertising. It can use this pre-existing technology to generate a different revenue stream.

2. Enhancing its AdSense programme

[Image: blatant ads]

Targeted advertising is still bloody annoying

One of the great benefits of APML is that it creates an open database about a user. Contextual advertising, in my opinion, is actually a pretty sucky technology, and its success to date is only because all the other types of targeted advertising models are flawed. As I explain above, the technology should instead be used to better analyse what content a user consumes, through keyword analysis. Over time, a ranking of these concepts can occur - as well as being shared from other web services that are doing the same thing.

An APML file that ranks concepts is exactly what Google needs to enhance its AdWords technology. Don't use it to analyse a post to show ads; use it to analyse a post to rank concepts. Then, in aggregate, the contextual advertising will work because it can be based off this APML file with great precision. And even better, a user can tweak it - which will be the equivalent of tweaking what advertising a user wants to get. The transparency of a user being able to see what 'concept ranking' you generate for them is powerful, because a user is likely to monitor it to keep it accurate.

APML is contextual advertising's biggest friend, because it profiles a user in a sensible way that can be shared across applications and monitored by the user. Allowing a user to tweak their APML file in order to get more targeted content aligns their self-interest with ensuring that the targeted ads thrown at them, based on those ranked concepts, are in fact relevant.

3. Privacy credibility
Privacy is the inflation of the attention economy. You can't proceed to innovate with targeted advertising technology, whilst ignoring privacy. Google has clearly realised this the hard way by being labeled one of the worst privacy offenders in the world. By adopting APML, Google will go a long way to gain credibility in privacy rights. It will be creating open transparency with the information it collects to profile users, and it will allow a user to control that profiling of themselves.

APML is a very clever approach to dealing with privacy. It's not the only approach, but it is one of the most promising. Even if Google never uses an APML file as I describe above, the pure brand-enhancing value of giving its users some control over their rightful attention data is something that alone would benefit the Google Reader product (and Google's reputation itself) if they were to adopt it.

[Image: privacy]

Privacy. Stop looking.

Conclusion
Hey Google - can you hear me? Let's hope so, because you might be the market leader now, but so was Bloglines once upon a time.