Tag Archive for 'analysis'


Ouch – widgets bypassing Google’s wall

On the right of my blog as I write this, I have a widget – a simple piece of JavaScript from the company Feedjit that lets me embed a short snippet of code showing my readers how other people find my blog. Since its launch the widget seems to have become very popular, with 60 million widgets claimed on the company's website.

I made a discovery today almost by accident: I accessed my blog on another computer. Or rather, I accessed my blog via Google's cache – Google has replicated my content for its search results, widgets and all. When you look at the Feedjit widget there (image below left), the data is very different: it no longer shows visitors to my blog, but visitors to Google's servers.

If you follow through to the detailed statistics you can even see the most popular sites that day, as well as the locations of the visitors. As this is data from Google's cache server, you are effectively getting an analysis of its visitors – who they are, what keywords they are searching for, and what they found. So because my blog is part of the Google cache, I can effectively sneak in the back door of Google's data.

(Having a quick look, it seems this URL is the main Google cache address; however data will only get logged when someone looks at the cache.)

Does it matter?
While this is a fun thing to look at and then move on, I think it raises some serious issues – multiple ones at that.

On widgets: With the proliferation of widgets across the web, have they become potentially the next big security risk?

On privacy: It's not that hard to identify the people making those searches. Search engines handing over data to governments has been a hot issue – Google's resistance was a much-hyped story as the company tried to prove it protected its users. With the growing cross-pollination of the web, exemplified by widgets, are we prepared for what it means to have open data (which is becoming inevitable)?

On metrics: I knew Google had a complete download of my blog in its cache, but what I didn't realise is that it is a copy of the full blog, scripts like my web stats included. When I look at my statistics, I see an awful lot of activity from bots, for example. Does this mean that every time Google, Yahoo or MSN analyse content ripped from my site, I can actually see what they are doing behind their closed walls?

Those are questions with simple but also complicated answers. Either way, if it's that easy to hack even Google, then God help us.

How Google reader can finally start making money

Today you will have heard that Newsgator, Bloglines, Me.dium, Peepel, Talis and Ma.gnolia have joined the APML workgroup and are in discussions with workgroup members on how they can implement APML into their product lines. Bloglines created some news the other week with its intention to adopt it, and today's announcement about Newsgator means APML is fast becoming an industry standard.

Google, however, is still sitting on the sidelines. I really like using Google Reader, but if they don't announce support for APML soon, I will have to switch back to my old favourite Bloglines, which is doing some serious innovating. Seeing as Google Reader came out of beta recently, I thought I'd help them out with a new feature (APML) that could see it generate some real revenue.

What a Google reader APML file would look like
Read my previous post on what exactly APML is. If the Google Reader team were to support APML, they could add to my APML file a ranking of blogs, authors, and key-words. First an explanation, and then I will explain the consequences.

In terms of blogs I read, the percentage of posts I read from a particular blog would determine its relevancy score in my APML file. So if I read 89% of Techcrunch posts – which is information Google Reader already provides to users – it would convert this into a relevancy score for Techcrunch of 89%, or 0.89.
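To make that concrete, here is a minimal sketch (my own illustration, not Google's code – the feed names and numbers are just examples) of turning per-feed read percentages into relevance scores between 0 and 1:

```python
# Hypothetical read statistics of the kind Google Reader already shows users.
read_stats = {
    "Techcrunch": {"posts": 100, "read": 89},
    "Read/WriteWeb": {"posts": 80, "read": 20},
}

def relevance_scores(stats):
    """Relevance = fraction of a feed's posts the user actually read."""
    return {feed: round(s["read"] / s["posts"], 2) for feed, s in stats.items()}

print(relevance_scores(read_stats))  # Techcrunch comes out at 0.89, as in the example
```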


APML: pulling rank

In terms of authors I read, it could extract who posted each entry I read and, like the blog ranking above, perform a similar calculation. I don't imagine it would be too hard to do this; however, given it's a small team running the product, I would put this at a lower priority.

In terms of key-words, Google could employ its contextual analysis technology on each of the postings I read and extract key words. The frequency of the extracted key words then determines the relevance scores for those concepts.

So that would be the how. The APML file generated from Google Reader would simply rank these blogs, authors, and key-words, with the relevance scores updating over time. As the data is periodically re-indexed and recalculated from scratch, concepts that stop being viewed diminish in value until they drop off.
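The "diminish until they drop off" behaviour could look something like the sketch below. This is purely my own hypothetical illustration (the decay factor, cutoff, and keyword counts are assumed): each rebuild blends the old scores with fresh keyword frequencies, so concepts the user stops reading about decay below a threshold and vanish.

```python
from collections import Counter

DECAY = 0.5     # assumed weight given to the old scores at each rebuild
CUTOFF = 0.01   # assumed threshold below which a concept is dropped

def rebuild(old_scores, keyword_counts):
    """Blend old concept scores with fresh keyword frequencies."""
    total = sum(keyword_counts.values()) or 1
    fresh = {k: c / total for k, c in keyword_counts.items()}
    merged = {}
    for k in set(old_scores) | set(fresh):
        score = DECAY * old_scores.get(k, 0.0) + (1 - DECAY) * fresh.get(k, 0.0)
        if score >= CUTOFF:
            merged[k] = round(score, 3)
    return merged

# "dogs" starts decaying once the user stops reading about dogs.
scores = rebuild({"dogs": 0.8}, Counter({"cats": 3, "widgets": 1}))
```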

What Google reader can do with that APML file
1. Ranking of content
One of the biggest issues facing consumers of RSS is information overload. I am quite confident that people would pay a premium for anything that helps rank the hundreds of items a day a user needs to read. With an APML file, Google Reader could over time match postings to a user's ranked interests. So rather than presenting content in reverse chronology (most recent to oldest), it could instead organise content by relevancy (items of most interest to least).
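A sketch of what that sort could look like, with made-up concept scores and item keywords (all names here are my own hypothetical examples): score each item against the user's APML concept rankings instead of sorting by date.

```python
# Hypothetical APML concept rankings for one user.
apml_concepts = {"attention": 0.9, "privacy": 0.7, "widgets": 0.4}

def item_score(item, concepts):
    # Sum the relevance of every ranked concept the item's keywords hit.
    return sum(concepts.get(k, 0.0) for k in item["keywords"])

items = [
    {"title": "Widget round-up", "keywords": ["widgets"]},
    {"title": "Attention and privacy", "keywords": ["attention", "privacy"]},
]
ranked = sorted(items, key=lambda i: item_score(i, apml_concepts), reverse=True)
# "Attention and privacy" (1.6) now outranks "Widget round-up" (0.4)
```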

This won't reduce the amount of RSS a user consumes, but it will help them allocate their attention. There are a lot of innovative ways to rank content, down to how you extract key words and rank concepts, so there is scope for competing vendors to have their own methods. The point, however, is that a 'Sort by Personal Relevance' feature would be highly sought after, and I am sure quite a few people would be willing to pay for this godsend.

I know Google seems to think contextual ads are everything, but maybe the Google Reader team can break from the mould and generate a different revenue stream through a value-add feature like this: applying its pre-existing contextual advertising technology to determine key words for filtering, not advertising.

2. Enhancing its AdSense programme


Targeted advertising is still bloody annoying

One of the great benefits of APML is that it creates an open database about a user. Contextual advertising, in my opinion, is actually a pretty sucky technology, and its success to date is only because all the other targeted advertising models are flawed. As I explain above, the technology should instead be used to better analyse what content a user consumes, through keyword analysis. Over time, a ranking of these concepts can build up – as well as being shared from other web services doing the same thing.

An APML file that ranks concepts is exactly what Google needs to enhance its AdWords technology. Don't use it to analyse a post to show ads; use it to analyse a post to rank concepts. Then, in aggregate, the contextual advertising will work, because it can be based on this APML file with great precision. Even better, a user can tweak it – the equivalent of tweaking what advertising they want to get. The transparency of a user being able to see the 'concept ranking' you generate for them is powerful, because a user is likely to monitor it for accuracy.

APML is contextual advertising's biggest friend, because it profiles a user in a sensible way that can be shared across applications and monitored by the user. Allowing a user to tweak their APML file, with more targeted content as the motivation, aligns their self-interest with ensuring the targeted ads thrown at them based on those ranked concepts are, in fact, relevant.

3. Privacy credibility
Privacy is the inflation of the attention economy. You can't keep innovating with targeted advertising technology whilst ignoring privacy. Google has learned this the hard way by being labeled one of the worst privacy offenders in the world. By adopting APML, Google would go a long way towards gaining credibility on privacy rights: it would create open transparency around the information it collects to profile users, and it would allow users to control that profiling of themselves.

APML is a very clever approach to dealing with privacy. It's not the only approach, but it is one of the most promising. Even if Google never uses an APML file as I describe above, the pure brand-enhancing value of giving users some control over their rightful attention data would alone benefit the Google Reader product (and Google's reputation) if they were to adopt it.


Privacy. Stop looking.

Conclusion
Hey Google – can you hear me? Let’s hope so, because you might be the market leader now, but so was Bloglines once upon a time.

Bloglines to support APML

Tucked away in a post by one of the leading RSS readers in the world, Bloglines announced that it will be investigating how it can implement APML into its service. The thing about standards is that, as fantastic as they are, if no one uses them they are not a standard. Over the last year, dozens of companies have implemented APML support, and this latest announcement by a revitalised Bloglines team – set to take back what Google took from them – means we are going to see a lot more innovation in an area that has largely gone unanswered.

The announcement has been covered by Read/WriteWeb and APML founders Faraday Media, and a thoughtful analysis has been done by Ross Dawson. Ben Metcalfe has also written a thought-provoking analysis of the merits of APML.

What does this mean?

APML is about taking control of the data that companies collect about you. For example, if you are reading lots of articles about dogs, an RSS reader can make a good guess that you like dogs – and will tick the "likes dogs" box on the profile it builds of you, which it uses to determine advertising. Your attention data is anything you give attention to – when you click on a link within Facebook, that's attention data that reveals things about you implicitly.

The big thing about APML is that it solves a massive problem when it comes to privacy. If you look at my definition of what constitutes privacy, the ability APML gives you to control what data is collected completely fits the bill. I was so impressed when I first heard about it – it's a problem I had been thinking about for years – that I immediately joined the APML workgroup.

Privacy is the inflation of the attention economy, and companies like Google are painfully learning about the natural tension between privacy and targeted advertising (targeted advertising being the thing Google is counting on to fund its revenue). The web has seen a lot of technological innovation, which has disrupted a lot of our culture and society. It's time the companies disrupting the world's economies started innovating to answer the concerns of the humans using their services. Understanding how to deal with privacy is a key competitive advantage for any company in the Internet sector. It's good to see some finally realising that.

Understand your content

I picked up a book my parents used on their recent trip to Greece – a guidebook of the Peloponnese. Flicking through this paper book reminded me of how rife with piracy the content business is. Especially in an online world, people can copy content – images, text, audio – and mash it up into their own creations. It seems crazy, so why do people enter a business like that?

The information sector is not only a big money-maker but a very unique one as well. Yes, its products can be copied and ripped off – unlike a Barbie doll, whose form can't really be manipulated into a new product. But unlike Barbies, information products do things that are very unique in this world and extremely powerful. In my view there are four types of information product, which can be explained under the categories of data or culture.

Data

New data
A friend and aspiring politician once said to me that "information is the currency of politics". Reuters, the famed news organisation that supplies breaking news to media outfits across the world, derives 90% of its revenue from selling up-to-the-minute financial information to stockbrokers and the like, who profit from getting information before others. New information, like what the weather will be tomorrow, loses value with time (not many care what the weather was eight days ago). But people are willing to pay a price, and a big one, for access to breaking news, because it can help them make decisions.

Old data
On the flip side, old information can be very valuable because of the ability to conduct research and analysis. Search engines effectively fit into this segment of the information economy, because they can query past news and knowledge to produce answers. Extending the weather example: being able to find out the weather eight days ago, along with the weather exactly one, five and ten years ago, can help you identify trends that, for example, validate the global warming theory.

Culture

Analysis
The third category of information products I simply call analysis, because what they offer is unique insight. We all have access to the same news, for example, but it takes a smart thinker to create a prediction by pulling the pieces together and creating new value from them. Analytical content usually gets plagiarised by students writing essays, but it's also the stuff that shapes people's perceptions in world-changing ways.

Entertainment
One of the most powerful uses of content is the way it can impact people – entertainment-type content is the stuff that generates emotion. Emotions are a key human trait to keep in mind in any decision – no matter how logical someone is, the emotional self can take over. A documentary that portrays an issue negatively and generates an angry response is the stuff that can topple governments and corporations.

Not all information is equal
If you are a content creator, you need to accept that other people can copy your creation. The key is to understand what type of content you are creating, and develop a content strategy that exploits its unique characteristics.

Information products need different strategies to be monetised effectively. Below is a brief discussion that extends the above.
New data
With this type of content, the value is in the time: the quicker the information can be accessed, the more useful it is. News items (like current affairs) fit into this category. As a news consumer, I don't care how I get my news, but I care about how quickly I can get it. It's for this reason that I no longer read newspapers, yet through various technologies like RSS and my mobile phone I probably consume more news than ever before.

You should sell this data based on access – the more you pay, the quicker the access. Likewise, the ability to enable multiple outputs is key: you need to be able to deliver your content to as many different places as possible – SMS, email, RSS and so on. You should not discriminate on the output; the value is in the time.

If you create news breaks, why waste your time restricting who can access that information because of the threat that someone might copy it? If the value is in the time, who cares who copies it – by the time they republish it, it has already lost value. A Flash-driven site like the Australian Financial Review's is an example of a management that doesn't realise this.

Old data
A recent example of action in this space is the New York Times, which recently removed its paid subscription wall – content previously available only via subscription can now be accessed by anyone for free. This is a smart business move, because if you are selling archived content, you will make more money by having more people know what exists. A paid wall limits usage, which decreases the opportunity for consumption: you are relying on brand alone to create demand. If you are a website with a lot of historical content, restricting access is stupid, because you are effectively asking people to pay for access to something when they have no idea what value it holds for them. It's a bit like travelling – if you've never been overseas, you don't know what you are missing out on. Give people a taste of the travel bug, and they will never be able to sit still.

Unlike new data, where the value is based on time, old data finds its value in accessibility. People will place value on things like search, and the ability to find relevant content through the mountains of content available. Here the multitude of outputs doesn't matter, because researchers have all the time in the world. What matters is a good interface and powerful tools to mine the data: the value is in being able to find information. You shouldn't charge people for access to the content; where you will make money is on the tools to mine the data.

Analysis
This type of content is difficult to create but easily ripped off – just think how rife plagiarism is in schools and universities, where the latter treat plagiarism as a crime just short of murder. You can distinguish this type of content because it is produced from a common set of inputs that anyone could access, yet creates a viewpoint that only a certain type of person could create. The value is in the unique insight.

Despite the higher intellect required to produce it, it is unfortunately content that is harder to capitalise on. A lot of technology blogs feel the pressure to move to a more news-style service rather than an analytical one, because news is what gets eyeballs. If you are a blogger looking to make money, the new-data approach above should be your strategy. But if you are a blogger trying to build your brand, do analysis. The catch with analysis is that it's harder to do, so you shouldn't feel pressured to produce more content. I've noticed, for example, that if I post more blog postings, I get more traffic. But by the same token, more postings put more pressure on me, which means less quality content. Understand that the value of analysis isn't dependent on time. Or better said, the value of analysis is not how quickly it gets pumped out, but how thoroughly it gets incubated as an idea and later communicated.

The value of analysis is clarity and the ability to offer new thoughts. Looking at the relationship with advertising models: new data like news (discussed above) typically gets more page views, which works for the pageview model (the more people refreshing, the more CPMs). Analysis, on the other hand, works with the time-spent model. Take advantage of the engagement you have with those readers, because you are cultivating a community of smart people – there can be a lot more loyalty with that type of readership.

Entertainment
My sister downloads the Chaser's War on Everything as a podcast. She first came across them on the radio, but she now downloads the podcasts religiously. Even though I had known about the Chaser's efforts for years across their various products, I didn't realise they were still around. In the last few weeks, I have noticed my friends bringing up the shows they are doing. The value in this content is its ability to make people laugh, thanks to their unique stunts. Their brand is built on word-of-mouth recommendations.

Like analysis, entertainment can be a very hard thing to generate because it relies on unique thinking. With a strong brand, people will pay for access to that content. Although the viral spreading of funny content for free may seem like a nightmare for a content producer trying to collect royalties, it's actually a good thing, because it entrenches the brand: more people will find out about it. The nature of entertainment, like analysis, is that it is difficult to do repeatedly. Sure, people can copy your individual tricks – but only after the fact. They can't anticipate the next thing you will do, because unlike breaking news, which is about how quickly you can pump out content, entertainment requires a unique creative process to produce.

The key with entertainment content is to build a relationship with an audience and sustain it. Create a predictable flow of content. Encourage people to copy it, because all that does is get more people wanting to see what you come up with next. If it wasn't for Stephen Colbert's clips on YouTube, I would never have realised his brilliance. When I didn't know he existed, a DVD set of his shows meant nothing to me (though it holds a lot of value to me now). The value of entertainment is to generate emotions in people repeatedly. Emotions are a powerful influence on human behaviour – master that and you can be dangerous!

Concluding thoughts
This posting only touches on the issues, but what I suggest is that creators of content need to look at what type of content they are producing in order to exploit its unique aspects. Content represents human ideas, and content isn't distinguished by a physical form. The theft of your content should be taken as a given – and it can actually help you. Depending on what that content is, there may be natural safeguards that make theft irrelevant (i.e., the time value of news).

Understanding the Facebook poll feature

A little while ago, I was lucky to catch a Facebook poll, as a way of advertising its new poll feature. As a follow up from that experience, I thought I might purchase my own poll to validate its effectiveness. Here are a few of my observations:

1) Answers appear to be clustered

One of the interesting things about the poll feature is that it is real time: you get answers as people vote. You select what type of people you want to target, and Facebook will then quiz users matching that criteria by putting the poll on their home screen. Something I noticed, however, was that answers seemed to come in together, followed by a gap. I also noticed that answers arriving in a group usually have similar responses.

It appears that users are highly responsive to a poll: if it appears on their screen, a lot of people answer it. I know this because I specifically targeted my poll at Australians, in the middle of the day, when I wouldn't expect people to be using Facebook.
The placing of the options also seems to affect the results. Anyone who has studied polling would probably know that the order of a ballot heavily influences the result, and that appears evident here. Usefully, Facebook allows you to randomise the poll so that different users see a different order. However, as demonstrated above, with this clustering it is groups of users that see a different order, not individuals.
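The fix implied here can be sketched in a few lines. This is my own hypothetical illustration (not how Facebook actually implements it): randomise the option order per individual respondent, seeded by a user id so each user always sees the same order, rather than per batch, so ballot-order bias averages out across the sample.

```python
import random

OPTIONS = ["Yes", "No", "Undecided"]  # example ballot options

def ballot_for(user_id):
    """Return this user's option order, deterministic per user."""
    rng = random.Random(user_id)       # seed with the user id
    return rng.sample(OPTIONS, k=len(OPTIONS))
```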

2) Facebook users appear to be more male, and younger
Something I noticed in my previous blog posting on the poll feature was that there appeared to be more males answering. This seems to have occurred with this poll as well, and it indicates to me that Facebook's user population has a higher male base – which is unusual, given that women generally outnumber men in society.


It should also be noted that there is no age-group option for people above 50 years old.

3) Takers of the poll appear to be a genuinely random population
The reason I picked 200 people was that this is often cited as a minimum sample size for a poll to be considered statistically meaningful for a population. And because I could watch the data as the poll was running, it gave me insight into how random (and representative) the population taking the test was.
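For context on what a sample of 200 actually buys you, the standard back-of-the-envelope calculation below (a textbook formula, not anything specific to Facebook's product) gives the 95% margin of error for a simple random sample at the worst case of p = 0.5:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(200) * 100, 1))  # roughly 6.9 percentage points
```

So at n = 200 the result is indicative rather than precise – each reported percentage could be off by around seven points either way.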

Below is a screenshot halfway through, as well as the final result.


The results at the halfway point and at the end are almost identical. Without reading too much into it, that tells me the conditions of the test were genuinely random.

There are a few other things I noticed, but this isn't me trying to promote a Facebook service, so I will leave you to make your own analysis in combination with the other Facebook poll I blogged about. I just want to highlight that, for next to nothing, you can get insight into a market in literally hours.

IBM recently released a report saying that the Internet has overtaken TV, changing the dynamics of the advertising industry, and that it sees the role of advertising agencies in the future going "beyond traditional creative roles to become brokers of consumer insight".

Facebook is an amazing company because of the amount of data it holds about the population in various societies. And for a fee – the rest of the world can take advantage of this as well. Welcome Facebook – the world’s most competitive agency for consumer insight.

Half the problem has been solved with time spent

On Thursday, I attended the internal launch of the Australian Entertainment & Media Outlook for 2007-2011. It was an hour packed with interesting analysis, trends, and statistics across a dozen industry segments. You can leave a comment on my blog if you are interested in purchasing the report and I’ll see if I can arrange it for you.

One valuable thing briefly mentioned, was the irony of online advertising.
Continue reading ‘Half the problem has been solved with time spent’

Patents: more harm than good

When I was in Prague two years ago, I met a bloke from Bristol (UK) who very convincingly explained how patents, as a concept, are stupid. Because alcohol was involved, I can't recall his actual argument, but it has since made me question: do you really need a patent to protect your business idea?

Narendra Rocherolle, an experienced entrepreneur, has written a good little article explaining when you should, and shouldn't, spend money to protect your IP. Rocherolle offers a good analysis, but I am going to extend it by stating that a patent can be dangerous for your business, and not just because of the monetary cost. Radar Networks is my case study – a stealth-mode "semantic web" company that has received a lot of press lately because apparently they are doing something big, but they are not going to tell us what until later this year.

Continue reading ‘Patents: more harm than good’