
On the future of search

Robert Scoble has put together a video presentation on how Techmeme, Facebook and Mahalo will kill Google in four years' time. His basic premise is that SEOs who game Google's algorithm are as bad as spam (and there are some pissed SEO experts waking up today!). People like the ideas he introduces about social filtering, but on the whole, they are more skeptical of his world-domination theory.

There are a few good posts, like Muhammad's, on why the combination won't prevail, but I think everyone is missing the real issue: the whole concept of relevant results.

Relevance is personal

When I search, I am looking for answers. Scoble uses the example of searching for HDTV, and says he would expect the top manufacturers at the top of the results. That is probably what he wants to see, but I want to read about the technology behind it. What I am trying to illustrate here is that relevance is personal.

The argument for social filtering is that it makes results more relevant. For example, with a bunch of my friends connected to my Facebook account, an inference engine can reason that if my friend A is friends with person B, who is friends with person C, then something I like must also be something person C likes. When it comes to search results, that sort of social/collaborative filtering doesn't work, because relevance is complicated. The only value a social network can provide is whether content is spam or not, a yes-or-no answer, and that assumes someone in my network has already come across the content. Just because my social network can (potentially) help filter out spam doesn't make the search results higher quality; it just means fewer spam results. Plenty of content is on-topic but may as well be classed as spam.
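
To make that limitation concrete, here is a minimal Python sketch, assuming a hypothetical friend graph and hypothetical sets of URLs that people in it have flagged as spam. The friend-of-a-friend step is a plain breadth-first search; the only thing the network contributes is a yes/no spam flag, and only for pages someone nearby has actually seen. The order of the remaining results is untouched, so relevance is no better than before.

```python
from collections import deque

def network_within(friends, me, max_hops):
    """Return everyone reachable from `me` in at most `max_hops` friend links."""
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue
        for friend in friends.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    seen.discard(me)
    return seen

def social_filter(results, network, spam_flags):
    """Drop results flagged as spam by anyone in the network.

    The network gives only a binary signal per URL, and only for URLs
    someone in it has seen; the ranking of what is left is unchanged.
    """
    flagged = set()
    for person in network:
        flagged |= spam_flags.get(person, set())
    return [url for url in results if url not in flagged]

# Hypothetical data: me -> A -> B -> C is the friend-of-a-friend chain from the text.
friends = {"me": ["A"], "A": ["B"], "B": ["C"]}
spam_flags = {"C": {"hdtv-deals.example"}}
results = ["hdtv-deals.example", "hdtv-specs.example", "hdtv-history.example"]

network = network_within(friends, "me", max_hops=3)
print(social_filter(results, network, spam_flags))
# prints ['hdtv-specs.example', 'hdtv-history.example']
```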

Google's algorithm essentially works on the popularity of links, which is how it determines relevance. People can game this algorithm by making a website artificially popular, for example by linking to it from fake sites, along with other optimisations. But Google's PageRank algorithm assumes that relevance is, at its core, purely about popularity. The innovation the Google founders brought to search deserves applause, but the extreme lack of innovation in this area since then shows just how hard it is to come up with new ways of determining relevance. Popularity is a smart proxy for relevance (because most people would like what is popular), but because it can be gamed, it no longer works as one.
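
The link-popularity idea can be sketched with the textbook PageRank power iteration. This is a minimal Python version for illustration, not Google's production algorithm; the damping factor of 0.85 is the commonly cited value, and the page names are hypothetical. Adding a handful of fake pages whose only job is to link to one page is enough to raise that page's score, which is exactly the gaming described above.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Pages with no outgoing links spread their score evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score in equal parts along its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical web: three sites that link to each other, plus one spam page.
honest = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "spam": ["a"]}

# The spammer adds five fake sites whose only purpose is to link to "spam".
gamed = dict(honest)
for i in range(5):
    gamed[f"fake{i}"] = ["spam"]

before = pagerank(honest)["spam"]
after = pagerank(gamed)["spam"]
print(f"spam score before: {before:.4f}, after: {after:.4f}")
# prints spam score before: 0.0375, after: 0.0875
```

With no inbound links the spam page only gets the random-jump share; five fake inbound links more than double its score, even though nothing about its relevance changed.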

The semantic web

I still don't quite understand why people don't realise the potential of the semantic web, something I go on about over and over again (maybe not on this blog; maybe it's time I did). If anything is going to change search, it will be the semantic web, because it structures data: it moves away from the document approach that web pages represent and towards a data approach that resembles a database table. It may not be able to make results more relevant to your personal interests, but it will better understand the sources of data that make up the search results, and can match them against whatever query constructs you give it.
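
A rough illustration of "data, not documents": facts stored as subject-predicate-object triples (the shape RDF uses) can be queried by pattern, much like selecting rows from a table. The products, brands and predicate names below are made up, and a real system would use RDF with a query language such as SPARQL rather than this toy Python matcher. The point is that the same facts answer both Scoble's question (who makes HDTVs?) and mine (what technology is behind them?).

```python
# Facts as (subject, predicate, object) triples: rows in a three-column table.
triples = {
    ("tv-001", "type", "HDTV"),
    ("tv-001", "manufacturer", "BrandX"),
    ("tv-001", "display", "LCD"),
    ("tv-002", "type", "HDTV"),
    ("tv-002", "manufacturer", "BrandY"),
    ("tv-002", "display", "Plasma"),
}

def match(pattern, facts):
    """Return the triples that fit `pattern`; None in any position matches anything."""
    return sorted(
        fact for fact in facts
        if all(want is None or want == got for want, got in zip(pattern, fact))
    )

def objects(subjects, predicate, facts):
    """Collect the values of `predicate` for the given subjects."""
    return sorted({o for s, p, o in facts if s in subjects and p == predicate})

hdtvs = {s for s, _, _ in match((None, "type", "HDTV"), triples)}
print(objects(hdtvs, "manufacturer", triples))  # Scoble's question: prints ['BrandX', 'BrandY']
print(objects(hdtvs, "display", triples))       # mine: prints ['LCD', 'Plasma']
```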

Like Google's PageRank, the semantic web will require humans to structure data, from which a machine can then make inferences, similar to how PageRank makes inferences from the links people create. However, Scoble's claim that humans can overtake a machine is silly: yes, humans have a much higher intellect and are better at filtering, but they can in no way match the speed and power of a machine. Once the semantic web gets into full gear a few years from now, humans will have trained the machine to think, and it can then do the filtering for us.

Human intelligence will be crucial for the future of search, but not in the way Mahalo uses it, which is like manually sorting pieces of paper into a filing cabinet: it is not sustainable. It's a bit like painting the Sydney Harbour Bridge: by the time the painters finish, they have to start all over again because the other side is already starting to rust. Once we can teach a machine that, for example, a dog is an animal that has four legs and makes a sound like "woof", the machine can act on our behalf like a trained animal and go fetch what we want. How those paper documents are stored then becomes irrelevant, because the machine can do the sorting for us.
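
The dog example can be sketched as a tiny inference step in Python: once humans state that a dog is a kind of animal, the machine can follow those "kind of" links on its own, inherit facts along them, and fetch every document that qualifies, however the documents are filed. The class names, the `kind_of` and `properties` tables and the document list are all hypothetical, and the hierarchy is assumed to contain no cycles.

```python
# Human-supplied knowledge: "X is a kind of Y". Assumed to contain no cycles.
kind_of = {
    "dog": "mammal",
    "mammal": "animal",
    "parrot": "bird",
    "bird": "animal",
}

# Human-supplied facts attached to a class; subclasses inherit them.
properties = {
    "dog": {"legs": 4, "sound": "woof"},
    "animal": {"alive": True},
}

# Documents filed any which way, each tagged with what it is about.
documents = [("rex.txt", "dog"), ("polly.txt", "parrot"), ("invoice.txt", "paperwork")]

def ancestors(cls):
    """Follow kind_of links upward, e.g. dog -> mammal -> animal."""
    chain = []
    while cls in kind_of:
        cls = kind_of[cls]
        chain.append(cls)
    return chain

def lookup(cls, prop):
    """Find a property on the class itself or on the nearest ancestor that has it."""
    for c in [cls] + ancestors(cls):
        if prop in properties.get(c, {}):
            return properties[c][prop]
    return None

def fetch(wanted, docs):
    """Return documents about `wanted` or about any kind of `wanted`."""
    return [name for name, subject in docs if subject == wanted or wanted in ancestors(subject)]

print(ancestors("dog"))            # prints ['mammal', 'animal']
print(lookup("dog", "sound"))      # prints woof
print(lookup("dog", "alive"))      # prints True
print(fetch("animal", documents))  # prints ['rex.txt', 'polly.txt']
```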

The Google killer of the future will be whoever can convert the knowledge on the World Wide Web into information readable by computers, creating this (weak) form of artificial intelligence. Now that's where it gets interesting.

Facebook poll: how many friends do you have?

One of Facebook's new features is the ability to create surveys targeted at certain groups of people within the site. One caught my eye today: it asked 1,000 random people "How many friends do you have?". Although I am not sure of the conditions this poll was conducted under (i.e., did only Australians see it?), 1,000 random people should, in theory, be a fairly representative sample of the entire population.
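
How representative 1,000 respondents can be is easy to estimate, assuming they really are a simple random sample (which, given the open question about who saw the poll, may not hold). The standard normal-approximation margin of error for a proportion at about 95% confidence is 1.96 × sqrt(p(1 − p)/n). A minimal Python sketch; the subgroup size of 500 is hypothetical, chosen to show that breakdowns by gender or age rest on fewer respondents and so carry wider margins.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion.

    p: observed proportion (0..1), n: sample size, z: 1.96 for roughly 95% confidence.
    Assumes a simple random sample, which a self-selected poll may not be.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for the full poll, and for a hypothetical subgroup of 500.
print(f"n=1000: ±{100 * margin_of_error(0.5, 1000):.1f} points")  # prints n=1000: ±3.1 points
print(f"n=500:  ±{100 * margin_of_error(0.5, 500):.1f} points")   # prints n=500:  ±4.4 points
```

So, under that assumption, any percentage from the full poll is good to within roughly three percentage points either way.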

Whilst the results immediately show some interesting information about the typical size of a person's network (which is a discussion in itself), I am equally fascinated by the gender and age breakdown of the people who answered, and how it correlates with network size. One theory I have for why people spend so much time on the site is that they 'collect' friends. They are constantly rediscovering old friends through mutual friends: a friend's list leads to another profile, where they may find someone they have lost touch with. Check the results first, before I continue:

[Figure: Facebook poll breakdown]

[Figure: Facebook poll breakdown by age]

Some of my interpretations of the results

  • Despite the site being open to anyone since late last year, university students still dominate it: over half the respondents were in the 18-24 age bracket
  • About 46% of males and 49% of females have more than 200 friends. It's impossible to have 200 'friends': no one can physically see 200 friends on a regular basis. This tells me Facebook is now more about 'contacts' and keeping in touch with people you know. That makes it less a closed network of your close friends and more a networking tool, validating what some commentators have been saying of late. I could spend a whole blog post on the implications, but basically, Facebook is now 'the' social networking site, and it's only going to become more entrenched due to the law of cumulative advantage.
  • Of people aged 35 and above, 70% have under 99 friends, compared with only 41% of people aged 25-34 and 19% of those aged 18-24. This is interesting, because people aged 25 and over didn't have Facebook when they were at university (which is why 18-24 is so dominant in this regard). Over time, you would expect the age groups to even out; if anything, older people should have much larger networks. This tells me that, despite all the hype, Facebook is still not mainstream: there is a heck of a lot more growth to come.
  • ...and picking up where I started this post: more males answered the poll (53%), despite women generally outnumbering men in Western countries, which implies men are more interested in how many friends people have. Tie that to my 'friend collector' theory and it means more men spend time 'collecting'... in other words, men stalk more!

Thoughts on attention, advertising, and a metric to measure both: keep it simple

Advertising on the Internet is exploding. If you accept my premise that the Internet will be the backbone of the world's attention economy, then I am sure you can see the urgency of developing an effective metric for measuring online audiences. Advertisers expect more accountability online, and there is increasing demand for an independent third party to verify results. But you can't have accountability, and audits are worthless, if one site measures in apples and another in bananas.

The Attention Economy is seriously lacking an effective measurement system

Ajax broke the pageview model of impressions, the billion-dollar practice of click fraud is the dirty big secret of pay-for-performance advertising, and the other major metric, unique visitors (tracked through cookies), is proving inaccurate.
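
Both measurement failures show up in a toy event log. In this Python sketch (made-up log format, made-up visitors), an Ajax page loads once and then swaps in new content without reloading, so a pageview counter sees one view where the reader saw several; and a reader who clears her cookies between visits is counted as two "unique" visitors. The `person` field is something a real site cannot know; it is included only to expose the error.

```python
from collections import namedtuple

# Hypothetical log format: browser cookie, the real person behind it, and the event type.
Event = namedtuple("Event", ["cookie", "person", "kind"])

log = [
    Event("c1", "alice", "page_load"),
    Event("c1", "alice", "ajax_update"),  # new content arrives, no page reload
    Event("c1", "alice", "ajax_update"),
    Event("c2", "alice", "page_load"),    # alice cleared her cookies
    Event("c3", "bob", "page_load"),
]

pageviews = sum(1 for e in log if e.kind == "page_load")
content_views = sum(1 for e in log if e.kind in ("page_load", "ajax_update"))
unique_cookies = len({e.cookie for e in log})
unique_people = len({e.person for e in log})

print(pageviews, content_views)       # prints 3 5
print(unique_cookies, unique_people)  # prints 3 2
```

Pageviews undercount what was actually consumed, while cookie-based uniques overcount who consumed it.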

It sounds crazy, doesn't it? The Internet has the best potential for targeted advertising, and advertisers are moving onto it in stampedes. And yet we still can't work out how to measure audiences effectively. Measurement is broken on the Net.

(Although I am focusing on advertising, this can be applied in other contexts. An advertising metric is simply putting a monetary value on what is really an attention metric.)

Yet compared with traditional media, are we being a little harsh on the new? Is the web's measurement problem simply that it is more accountable for its errors? After all, radio, television and print determine their audiences by inference from sampling, not by directly measuring them. Sampling is about making educated guesses, but a guess is still a guess.

Maybe another way of looking at it is that the old way of doing advertising is no longer effective. Although we can say pageviews are broken because of Ajax, the truth is they were always an ineffective measure, because they were based on the traditional media's premise of how many viewers or subscribers could theoretically see an ad. Here is why that is not how it should be: when people visit my blog via Google Images, they hang around for 30 seconds. People who search for the business issues I write about, like what you are reading right now, spend 5+ minutes. If both count the same in pageviews, but the latter actually read the pages while the former only scan them for an image, why are we treating them equally? My blog is half about travel and half about the business of the Internet, which is why I have two very different audiences. Just because I get high pageviews from my travel content doesn't mean I can justify higher CPMs to people who want to advertise on Internet issues. Not all pageviews are the same, especially when I know the people giving me high pageviews aren't really consuming my content.
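
One way to act on this, sketched in Python under assumed numbers: count a pageview as "engaged" only if the reader stayed past some dwell-time threshold, and price an ad on engaged views from the audience the advertiser actually wants. The 60-second threshold, the traffic-source labels, the visit records and the $10 CPM are all hypothetical; only the 30-second and 5-minute figures come from the paragraph above.

```python
# Hypothetical visit records: (traffic source, seconds spent on the page).
visits = [
    ("google_images", 30),
    ("google_images", 30),
    ("google_images", 30),
    ("business_search", 300),
    ("business_search", 420),
]

ENGAGED_SECONDS = 60  # assumed threshold for "actually read the page"

def engaged_views(visits, source=None):
    """Count visits lasting at least ENGAGED_SECONDS, optionally for one traffic source."""
    return sum(
        1 for src, seconds in visits
        if seconds >= ENGAGED_SECONDS and (source is None or src == source)
    )

def cost(views, cpm):
    """Apply a price per thousand views (CPM) to a view count."""
    return views / 1000 * cpm

print(len(visits), engaged_views(visits))  # prints 5 2
print(f"${cost(engaged_views(visits, 'business_search'), cpm=10.0):.2f}")  # prints $0.02
```

Raw pageviews say five; engaged views say two, and the advertiser on Internet issues is billed only for the readers who stayed.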

Another issue is that advertisers are so caught up in who can create the most entertaining 30-second ad that the creativity to entertain people has overtaken the reason advertising exists in the first place: to make sales. The way you do that is by communicating your product to the people who would want to buy it. If I placed ads on this blog from people who want to do web-business-related stuff, they should only pay for the people who read my postings on the attention economy for 5+ minutes, not for the Google Images searchers looking for porn (my top keywords, and how people find my blog, make me laugh out loud sometimes!).

When we create a metric that measures attention, let's be sure of one thing: the old way is broken, and the new ways will stay broken if we simply copy and paste the old ways. New ways like the click-through ads that appear on search results, which account for 40% of Internet advertising, are not how advertising should be measured, because they put the burden of an effective advertising campaign on the publisher. Why should a publisher go unpaid, and bear the opportunity cost of not running another ad that would have paid, because the advertiser's campaign was poorly targeted?
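
The burden shift is plain arithmetic. Under CPM the publisher is paid per thousand impressions; under pay-per-click the same impressions earn impressions × click-through rate × cost per click, so a poorly targeted ad that nobody clicks earns the publisher nothing. A minimal Python sketch with hypothetical rates ($2 CPM, $0.40 per click, 50,000 impressions):

```python
def cpm_revenue(impressions, cpm):
    """Publisher revenue when paid per thousand impressions."""
    return impressions / 1000 * cpm

def cpc_revenue(impressions, click_through_rate, cost_per_click):
    """Publisher revenue when paid only for clicks."""
    return impressions * click_through_rate * cost_per_click

impressions = 50_000
for ctr in (0.0, 0.001, 0.005):  # hypothetical click-through rates
    print(
        f"CTR {ctr:.1%}: CPM pays ${cpm_revenue(impressions, 2.0):.2f}, "
        f"CPC pays ${cpc_revenue(impressions, ctr, 0.40):.2f}"
    )
```

With these made-up rates the two models only pay the same at a 0.5% click-through rate; below that, the publisher absorbs the shortfall for an ad it had no say in targeting.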

When measuring audience attention, let's not overcomplicate it. It should measure purely whether someone saw the ad. As an advertiser, I should be able to choose which people, from which demographics, can see my ad, and yes, I will pay a premium for that targeting. Whether it turns into a sale, or whether they enjoyed the content, is where your complex web analytics packages come in. But for a simple global measurement system, let's keep it simple.
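
A minimal Python sketch of "did someone see it": an impression counts only if the ad was actually on screen, and counts are broken down by the demographics the advertiser chose to target. What counts as "seen" here (at least half the ad visible for at least one second) and the record fields are assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Impression:
    demographic: str         # e.g. an age bracket the advertiser targets
    visible_fraction: float  # share of the ad's area that was on screen
    seconds_visible: float

MIN_FRACTION = 0.5  # assumed: at least half the ad on screen...
MIN_SECONDS = 1.0   # ...for at least one second

def was_seen(imp):
    return imp.visible_fraction >= MIN_FRACTION and imp.seconds_visible >= MIN_SECONDS

def seen_by_demographic(impressions, targets):
    """Count seen impressions per targeted demographic; ignore everything else."""
    return Counter(
        imp.demographic for imp in impressions
        if imp.demographic in targets and was_seen(imp)
    )

# Hypothetical served impressions.
served = [
    Impression("18-24", 1.0, 5.0),
    Impression("18-24", 0.2, 5.0),   # mostly scrolled off screen: not seen
    Impression("25-34", 0.8, 0.3),   # scrolled past too quickly: not seen
    Impression("25-34", 0.9, 2.0),
    Impression("35+", 1.0, 10.0),    # outside the targeted audience
]

print(seen_by_demographic(served, targets={"18-24", "25-34"}))
```

Everything after "seen", such as clicks, sales or enjoyment, stays with the analytics packages; the shared metric only needs these counts.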

Concluding thought

If I stood naked at the toll booths of the Sydney Harbour Bridge, some people would honk at me and others wouldn't. If I can guarantee that they can see me naked, that's all I need to do as a publisher. Whether people honk at me or not is the advertiser's problem. (Not enough honks? As the model, I should still get my wage. They just need to hire a better-looking model next time!)