Tag Archive for 'data'


Can you answer my question?

We at the DataPortability project have kick-started a research phase, because we've realised we need to spend more time consulting with the community to work through issues that don't have a single answer.

Chris Saad and I are also experimenting with a new type of social organisation as we incubate the DataPortability project, which I call wikiocracy (Chris calls it participant democracy). To keep in line with the decentralised ethos we are encouraging with DataPortability, I thought I would post these issues on my blog. This is something the entire world should be questioning.

So below are some thoughts I have had. They've changed a lot since I first thought about what a user's data rights are, and no doubt they will change again. But hopefully my thoughts can act as a catalyst for what people think data rights really are, and focus attention on the issue at stake, which I pose as a question at the end. I think the bill of rights for users on the social web is not quite adequate, and we need a more careful analysis of the issues.

It's the data, stupid
Data is essentially an object. Standalone it's useless - take for example the name "Elias". In the absence of anything else, that datum means nothing. However, when you associate that name with my identity (i.e., appending my surname Bizannes or linking it to my Facebook profile), it suddenly becomes "information". Data is an object, and information is generated when you create linkages between different pieces of data - the 'relationships'.

Take this data definition from DMReview which defines data (and information):

Items representing facts, text, graphics, bit-mapped images, sound, analog or digital live-video segments. Data is the raw material of a system supplied by data producers and is used by information consumers to create information.

Data is an object and information is a relationship between data - I studied enough database theory at university to be authoritative on that! But since I didn't do philosophy, what then is knowledge?

Knowledge can be considered as the distillation of information that has been collected, classified, organized, integrated, abstracted and value added
(source)

Relationships, facts, assumptions, heuristics and models derived through the formal and informal analysis or interpretation of data
(source)

So in other words, knowledge is the application of information to a scenario. I apologise if it appears that I am splitting hairs, but I think clarifying what these terms mean is fundamental to the implementation of DataPortability. Why this is relevant will be seen below, but now we need to move on to what the second concept means.

Portability
On first interpretation, portability means the ability to move something - exporting and importing. I think we shouldn't take the ability to move data around as the sole definition of portability; it should also mean being able to port the context in which data is used. After all, information and knowledge are based on the manipulation of data, and you don't need to move data per se but merely change the context to do that. A vendor can add value for a consumer by building unique relationships between data and applying them in unique ways to other scenarios - where the original data is stored is irrelevant as long as it's accessible.

Portability to me means a person needs to have the ability to determine where their data is used. But to do that, they need control over that data - which means determining how it is used. Yet there is little point being able to determine how your data is used if you can't determine who can access your data. Therefore, the concept of portability invokes an understanding of what exactly control and accessibility mean.

So discussing portability requires us to also understand what data control and data accessibility really mean. You can't "port" something unless you control it, and you can't "control" something if you can't determine who can "access" it. As I said, as long as the data is accessible, it can be located on the moon for all I care: for the concept of portability by context to exist, we must ensure as a condition that the data is open to access.

Ownership
Now here is where it gets complicated: who owns what? Maybe the conversation should turn to who owns the information and knowledge generated from that data. Data on its own potentially doesn't belong to anyone. My name "Elias" is shared by millions of other people in the world. Whilst I may own my identity, of which my name is a representation, is it fair to say I own the name "Elias"? On the flip side, if a picture I took is considered data - I think it's fair to say I "own" that piece of data.

Information, on the other hand, requires a bit of work to create. Therefore, the generator of that information should get ownership. However, when we start applying this concept to something like a social relationship, it gets a bit tricky. If I add a friend on Facebook, and they accept me, who "owns" that relationship? Effectively both of us - so we become joint partners in ownership of that piece of information. If I were to add someone as a friend on MySpace, they don't necessarily have to reciprocate - therefore it's a one-way relationship. Does that mean I own that information?

This is where the concept of privacy comes in. If I am generating information about someone, am I entitled to it? If someone owns the underlying data I used to generate that information, then it would be fair to say I am "licensing" usage of that data to generate information which de facto is owned by them. But privacy as a concept, and in the legislation of many countries, doesn't work like that. Privacy is even a right alongside other basic rights like freedom of expression and religion in the constitution of Iraq (Article 17). So what's privacy in the context of information that relates to someone's identity?

Perhaps we should define privacy as the right to control information that represents an entity's identity (being a person or legal body). Such a definition ties in with defamation law, for example, and the principle of privacy: you have control over what's been said about you, as a fundamental human right. But yet again, I've just opened up a can of worms: what is "identity"? Maybe the Identity Commons people can answer that? Would it be fair to say that, in the context of an "identity", an entity like a person 'owns' that? So when it comes to information relating to someone's identity, does this human right to privacy override the question of who owns that information, regardless of who generated it?

This posting is a question, rather than an answer. When we say we want "data portability", we need to be clear what exactly this means. Companies, I believe, are slightly afraid of DataPortability because they think they will lose something, which is not true. Companies' commercial interests are something I am very mindful of when we have these discussions, and I will ensure with my involvement that DataPortability pioneers not some unrealistic ideal but a genuine move forward in business thinking. It needs to be clear what constitutes ownership, and of what, so we can design a blueprint that accounts for users' data rights without ruining the business models of companies that rely on our data.

Which brings me to my question - "who owns what"?

How many people are there on Facebook?

Facebook's new advertising features allow people to create targeted advertising campaigns. I took advantage of this feature to uncover some data about Facebook's user base as I designed a mock campaign, because I've been curious to know where it's strongest.

Although not all countries are listed below (i.e., I have friends in Russia and Serbia whose data I could not fetch), this does give a good indication of users by country. The subtotal of 50 million is about the number of users I'd expect to be on Facebook; the countries not included are obviously small and would make an immaterial difference. Fifty million users is within the ballpark of what sounds right (sorry, no link, but I read it somewhere), so the breakdown seems pretty complete.

I thought it might also be useful to add the data for under-18-year-olds, to show social networking is certainly an adult's tool now and not just some teen fad.

Facebook users in US	Canada	UK	Australia	China	Columbia	Dominican Republic	Egypt	France	Germany	India	Ireland	Israel	Italy	Japan	Lebanon	Malaysia	Mexico	Netherlands	New Zealand	Norway	Pakistan	Saudi Arabia	Singapore	South Africa	Korea, Republic of	Spain	Sweden	Switzerland	Turkey	United Arab Emirates

Update March 2008: I've done a follow-up posting with the March 2008 numbers.

How Google Reader can finally start making money

Today, you would have heard that Newsgator, Bloglines, Me.dium, Peepel, Talis and Ma.gnolia have joined the APML workgroup and are in discussions with workgroup members on how they can implement APML into their product lines. Bloglines created some news the other week on their intention to adopt it, and the announcement today about Newsgator means APML is now fast becoming an industry standard.

Google, however, is still sitting on the sidelines. I really like using Google Reader, but if they don't announce support for APML soon, I will have to switch back to my old favourite Bloglines, which is doing some serious innovating. Seeing as Google Reader came out of beta recently, I thought I'd help them out to finally add a new feature (APML) that will see it generate some real revenue.

What a Google Reader APML file would look like
Read my previous post on what exactly APML is. If the Google Reader team were to support APML, what they could add to my APML file is a ranking of blogs, authors, and keywords. First I will explain how, and then the consequences.

In terms of blogs I read, the percentage of posts I read from a particular blog will determine the relevancy score in my APML file. So if I was to read 89% of Techcrunch posts - which is information already provided to users - it would convert this into a relevancy score for Techcrunch of 89% or 0.89.
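A rough sketch of that conversion, assuming all we have is a count of posts published and posts read per blog. The function name and the second blog are made up for illustration; this is not anything Google Reader actually exposes.

```python
def read_ratio_scores(posts_published, posts_read):
    """Return a 0..1 relevance score per blog: posts read / posts published."""
    scores = {}
    for blog, published in posts_published.items():
        if published <= 0:
            continue  # nothing published in the period, so no evidence either way
        # Cap at the published count so a counting glitch can't push a score above 1.
        read = min(posts_read.get(blog, 0), published)
        scores[blog] = round(read / published, 2)
    return scores


# Hypothetical reading history: 89 of 100 Techcrunch posts read.
scores = read_ratio_scores(
    {"Techcrunch": 100, "Some Other Blog": 40},
    {"Techcrunch": 89, "Some Other Blog": 10},
)
print(scores)  # {'Techcrunch': 0.89, 'Some Other Blog': 0.25}
```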

APML: pulling rank

In terms of authors I read, it can extract who posted the entry from the individual blog postings I read and, like the blog ranking above, perform a similar procedure. I don't imagine it would be too hard to do this; however, given it's a small team running the product, I would make this a lower priority to support.

In terms of keywords, Google could apply its contextual analysis technology to each of the postings I read and extract keywords. By performing this on each post I read, the frequency of extracted keywords determines the relevance score for those concepts.
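To make the idea concrete, here is a minimal sketch. The keyword extractor is a deliberately crude stand-in (lowercased words minus a tiny stopword list) for whatever contextual analysis Google actually runs, and scoring a keyword by the share of read posts it appears in is just one assumption about how "frequency" could become a score.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use something far better.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def extract_keywords(text):
    """Crude stand-in for contextual analysis: distinct words minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}


def concept_scores(read_posts):
    """Score each keyword by the share of read posts it appears in."""
    total = len(read_posts)
    if total == 0:
        return {}
    counts = Counter()
    for post in read_posts:
        counts.update(extract_keywords(post))  # each post counts a keyword once
    return {kw: round(n / total, 2) for kw, n in counts.items()}


posts = [
    "Facebook opens its advertising platform",
    "Facebook and the privacy debate",
    "APML and attention data",
    "Privacy in the attention economy",
]
concepts = concept_scores(posts)
print(concepts["facebook"], concepts["apml"])  # 0.5 0.25
```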

So that would be the how. The APML file generated from Google Reader would simply rank these blogs, authors, and keywords - and the relevance scores would update over time. The data is periodically re-indexed and recalculated from scratch, so as concepts stop being viewed, they start to diminish in value until they drop off.
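One way to get that "diminish until they drop off" behaviour is to rebuild the scores from only the most recent reading activity each time. The sketch below assumes a 30-day window, scores relative to the most-read concept, and a cut-off floor; all three choices, and the sample dates, are mine for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def recalculate(read_events, today, window_days=30, floor=0.05):
    """Rebuild concept scores from scratch using only recent reading events.

    read_events is a list of (date_read, concept) pairs.
    """
    cutoff = today - timedelta(days=window_days)
    recent = [concept for when, concept in read_events if cutoff < when <= today]
    if not recent:
        return {}
    counts = Counter(recent)
    top = max(counts.values())
    # Score each concept relative to the most-read concept in the window.
    scores = {c: round(n / top, 2) for c, n in counts.items()}
    # Anything below the floor is treated as having dropped off.
    return {c: s for c, s in scores.items() if s >= floor}


events = [
    (date(2008, 1, 1), "facebook"),
    (date(2008, 1, 2), "facebook"),
    (date(2008, 1, 3), "facebook"),
    (date(2008, 1, 20), "apml"),
    (date(2008, 1, 25), "apml"),
    (date(2008, 2, 1), "apml"),
    (date(2008, 2, 5), "apml"),
]

late_january = recalculate(events, today=date(2008, 1, 25))
early_february = recalculate(events, today=date(2008, 2, 1))
mid_february = recalculate(events, today=date(2008, 2, 10))

print(late_january)    # {'facebook': 1.0, 'apml': 0.67}
print(early_february)  # {'facebook': 0.33, 'apml': 1.0}
print(mid_february)    # {'apml': 1.0}
```

Because nothing is carried over between runs, a concept I stop reading about fades on its own as its events age out of the window.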

What Google Reader can do with that APML file
1. Ranking of content
One of the biggest issues facing consumers of RSS is information overload. I am quite confident that people would pay a premium for any attempt to help rank what can be hundreds of items per day that a user needs to read. By having an APML file, over time Google Reader can match postings to what a user's ranked interests are. So rather than presenting the content by reverse chronology (most recent to oldest), it can instead organise content by relevancy (items of most interest to least).

This won't reduce the amount of RSS consumption by a user, but it will enable them to know how to allocate their attention to content. There are a lot of innovative ways you can rank the content, down to the way you extract keywords and rank concepts, so there is scope for competing vendors to have their own methods. However, the point is, a feature to 'Sort by Personal Relevance' would be highly sought after, and I am sure quite a few people will be willing to pay the price for this godsend.
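Here is one hypothetical way a 'Sort by Personal Relevance' could work, assuming each feed item already carries its source and some extracted keywords, and that the profile holds a score per source and per concept. The 50/50 weighting is an arbitrary assumption; as noted above, vendors would each have their own method.

```python
def personal_relevance(item, source_scores, concept_scores):
    """Blend how much I read this source with how much I care about its topics."""
    source = source_scores.get(item["source"], 0.0)
    matched = [concept_scores[k] for k in item["keywords"] if k in concept_scores]
    concept = max(matched, default=0.0)  # best-matching concept wins
    return 0.5 * source + 0.5 * concept  # arbitrary equal weighting


def sort_by_personal_relevance(items, source_scores, concept_scores):
    """Return items ordered from most to least relevant."""
    return sorted(
        items,
        key=lambda item: personal_relevance(item, source_scores, concept_scores),
        reverse=True,
    )


# Made-up profile scores and feed items, listed newest first as a reader would show them.
source_scores = {"Techcrunch": 0.89, "Some Other Blog": 0.25, "Gadget Blog": 0.2}
concept_scores = {"facebook": 0.5, "apml": 0.9}
inbox = [
    {"title": "Gadget roundup", "source": "Gadget Blog", "keywords": ["phones"]},
    {"title": "Facebook ad platform", "source": "Techcrunch", "keywords": ["facebook", "advertising"]},
    {"title": "APML workgroup news", "source": "Some Other Blog", "keywords": ["apml"]},
]

ranked = sort_by_personal_relevance(inbox, source_scores, concept_scores)
print([item["title"] for item in ranked])
# ['Facebook ad platform', 'APML workgroup news', 'Gadget roundup']
```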

I know Google seems to think contextual ads are everything, but maybe the Google Reader team can break from the mould and generate a different revenue stream through a value add feature like that. Google should apply its contextual advertising technology to determine key words for filtering, not advertising. It can use this pre-existing technology to generate a different revenue stream.

2. Enhancing its AdSense programme

Targeted advertising is still bloody annoying

One of the great benefits of APML is that it creates an open database about a user. Contextual advertising, in my opinion, is actually a pretty sucky technology, and its success to date is only because all the other types of targeted advertising models are flawed. As I explain above, the technology should instead be used to better analyse what content a user consumes, through keyword analysis. Over time, a ranking of these concepts can occur - and it can be shared with other web services that are doing the same thing.

An APML file that ranks concepts is exactly what Google needs to enhance its AdWords technology. Don't use it to analyse a post to show ads; use it to analyse a post to rank concepts. Then, in aggregate, the contextual advertising will work because it can be based off this APML file with great precision. And even better, a user can tweak it - which will be the equivalent of tweaking what advertising a user wants to get. The transparency of a user being able to see what 'concept ranking' you generate for them is powerful, because a user is likely to monitor it to keep it accurate.

APML is contextual advertising's biggest friend, because it profiles a user in a sensible way that can be shared across applications and monitored by the user. Allowing a user to tweak their APML file, motivated by more targeted content, aligns their self-interest with ensuring that the targeted ads thrown at them based on those ranked concepts are in fact relevant.
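As a sketch of what "tweaking" could mean in practice: keep the scores the service inferred separate from the values the user has set by hand, and let the user's value win whenever both exist. The function name and the use of None to mean "remove this concept" are my own assumptions for illustration.

```python
def effective_profile(inferred, user_edits):
    """Combine inferred scores with the user's own edits; the user's value wins."""
    merged = dict(inferred)
    for concept, score in user_edits.items():
        if score is None:
            merged.pop(concept, None)  # the user asked for this concept to be removed
        else:
            merged[concept] = max(0.0, min(1.0, score))  # keep scores within 0..1
    return merged


# Made-up example: the service inferred three concepts, the user corrects two of them.
inferred = {"facebook": 0.5, "apml": 0.25, "celebrity gossip": 0.4}
user_edits = {"apml": 0.9, "celebrity gossip": None}

profile = effective_profile(inferred, user_edits)
print(profile)  # {'facebook': 0.5, 'apml': 0.9}
```

Ads (or content) targeted from the merged profile then reflect what the user says they care about, not only what was inferred.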

3. Privacy credibility
Privacy is the inflation of the attention economy. You can't proceed to innovate with targeted advertising technology, whilst ignoring privacy. Google has clearly realised this the hard way by being labeled one of the worst privacy offenders in the world. By adopting APML, Google will go a long way to gain credibility in privacy rights. It will be creating open transparency with the information it collects to profile users, and it will allow a user to control that profiling of themselves.

APML is a very clever approach to dealing with privacy. It's not the only approach, but it is one of the most promising. Even if Google never uses an APML file as I describe above, the pure brand-enhancing value of giving its users some control over their rightful attention data is something that alone would benefit the Google Reader product (and Google's reputation itself) if they were to adopt it.

Privacy. Stop looking.

Conclusion
Hey Google - can you hear me? Let's hope so, because you might be the market leader now, but so was Bloglines once upon a time.