Frequent thinker, occasional writer, constant smart-arse

Tag: web

Facebook users: more and more in just four months

I am currently doing some research for an analyst report at work, and I thought I might update my November findings on how many Facebook users there are.

The total is within the ballpark of Facebook’s overall user count (from memory, the Mix08 panel indicated around 65 million), so the listing seems fairly complete, with perhaps less than a million users missing from small countries not listed.

I found some of the results impressive, especially the user growth in less than four months, even in countries like the US and Australia, which I’d thought would be peaking. Sweden appears to have a bit of Facebook fatigue, with cancelled accounts, and it looks like fundamentalist Saudi Arabia has a bigger userbase than tech-savvy Russia.

Facebook users: March 2008 update

Here’s a secret: the semantic web is the boring bit

Marshall Kirkpatrick caused a wave today when he gave a brutally honest assessment of one of the most talked-up semantic web applications, Twine. It was, as usual, an excellent analysis by Marshall, and I don’t think he needs to hide behind his words as they are fair. However, now that the semantic web is making its way into the mainstream, from academic thesis to real-world web applications, I think it’s crucial that we do a little bit of stakeholder management.

Ready? The semantic web is as boring as bat shit.

Essentially, the semantic web is about structuring content so that computers can interpret the information. It’s a bit like linking every word on the web to a dictionary entry so that computers understand the language that humans use.

But seriously, how is that exciting? People don’t get the semantic web because it’s the fundamentals – and that’s boring! Take for example RDF, the semantic web building block, which is about structuring data into subject, predicate and object. This is straight from primary school grammar lessons, where we learn the fundamentals of the English language (it’s no coincidence I just linked to a grammar guide, not the RDF guide). And if you have heard of subject, predicate and object before in the context of the semantic web, you probably didn’t even realise they are what the entire English language is built on. That’s because you probably did learn it, and forgot – it’s boring as bat shit. But damn, without them, we wouldn’t be communicating with each other right now.
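To make the grammar analogy concrete, here is a minimal Python sketch that stores facts as plain subject-predicate-object tuples and answers a simple pattern query. It is only an illustration: real RDF identifies subjects and predicates with URIs and is usually handled with a library such as rdflib, and the facts and the `match` helper here are hypothetical.

```python
from typing import List, NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "obj" rather than "object" to avoid shadowing the built-in name

# Each fact is one subject-predicate-object statement, like a simple sentence.
graph: List[Triple] = [
    Triple("Sneaky Sound System", "is a", "band"),
    Triple("Sneaky Sound System", "plays", "electro pop"),
    Triple("alice", "likes", "Sneaky Sound System"),
]

def match(graph: List[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> List[Triple]:
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

# "Who likes Sneaky Sound System?"
for t in match(graph, predicate="likes", obj="Sneaky Sound System"):
    print(t.subject)  # prints: alice
```

Because every fact has the same three-part shape, a computer can answer questions by matching patterns, which is the boring foundation the rest of the semantic web builds on.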

The point I want to make is that the building blocks are not where the excitement is: the excitement is in what we can do once we have those building blocks. In English, we have poetry, literature, and language in general, through which we communicate as human beings. Once we get the basics of information down, we are laying the foundation for a whole new world of computational possibilities. Marshall is spot on in saying “…semantics may be best suited to the back end…” because the excitement is in what they enable, not in the semantics themselves, which are going to take a long time to build up.

Imagine the sum of human knowledge accessible for a computer to query. Semantic web applications are boring and you won’t ever get them – but what they enable is a whole new world of potential which, once we flick the switch, will make the world barely recognisable from today’s standpoint.

DataPortability is about user value, fool!

In a recent interview, VentureBeat asks Facebook creator and CEO Mark Zuckerberg the following:

VB: Facebook has recently joined DataPortability.org, a working group among web companies, that intends to develop common standards so users can access their data across sites. Is Facebook going to let users — and other companies — take Facebook data completely off Facebook?

MZ: I think that trend is worth watching.

It disappoints me to see that, because it seems like a quick journalist’s hit at a contentious issue. On the other hand, we have seen amazing news today that exemplifies exactly the type of thing we should expect in a data portability enabled world. First, the Google contacts API, which we have highlighted for months now as an issue for data security. Second, Google Analytics allowing benchmarking, a clear example of a company that understands that by linking different types of data you generate more information, and therefore more value, for the user. The DataPortability project is about advocating new ways of thinking, and indeed, we don’t have to formally produce a product so much as maintain the agenda in the industry.

However, the reason I write this is that it worries me a bit that we are throwing around the term “data portability” despite the fact that the DataPortability Project has yet to formally define what it means. I can say this because, as a member of the policy action group and the steering action group, which are responsible for making this definition, I know we have yet to formally decide.

Today, I offer an analysis of what the industry needs to be talking about, because the term is being thrown around like buggery. Whilst it may be weeks or months before we finalise this, it’s starting to bother me that people seem to think the concept means solving the rest of the world’s problems or disrupting the status quo. It’s time for some focus!

Value creation
First of all, we need to determine why the hell we want data portability. DataPortability (note the distinction from the term ‘data portability’: the latter represents the philosophy, whilst the former is the implementation of that philosophy by DataPortability.org) is not a new utopian ideal; it’s a new way of thinking that will generate value across the entire information sector. So if we genuinely want to create value for consumers and businesses alike, we need to apply the same thinking we use in the rest of the business world.

A company should be centered on generating value for its customers. Companies have obligations to generate returns for their shareholders, and may try different things to meet those obligations, but generating shareholder value ultimately means funding the growth of the business through increased customer utility, which is the only long-term way of doing so (acquisitions and operational efficiency also generate value, but they are shorter-term measures). Therefore an analysis of what value DataPortability creates should be done with the customer in mind.

The economic value of a user having some sort of control over their data is that they can generate more value through their transactions within the information economy. This means better insights (ie, greater interoperability, allowing data to be connected to create more information), less redundancy (being able to reuse the same data), and more security (which includes better privacy, since poorly managed personal data can compromise a consumer’s existence).

Secondly, what does it mean for a consumer to have data portability? Since we have realised that the purpose of such an exercise is to generate value, concepts like “control”, “access” and “ownership” of data need to be re-evaluated, because applied at face value they may be either beneficial or detrimental to new business models. The international accounting standards state that you can legally “own” an asset but not necessarily receive the economic benefits associated with that asset. The link between ownership and benefit is something we really need to clarify, because quite frankly, ownership does not translate into economic benefit, which is what we are trying to achieve.

Privacy is a concept that has legal implications, and regardless of what we discuss with DataPortability, it still needs to be considered because business operates within the framework of the law. Specifically, the human rights of individuals (who are consumers) need to be given greater priority than any other factor. So although we should be focused on how we can generate value, we also need to be mindful that certain types of data, like personally identifiable data, need to be considered in a different light, as there are social implications in addition to the economic ones.

The use cases
The technical action group within the DataPortability project has been attempting to create a list of scenarios that constitute use cases for DataPortability enablement. This is crucial because to develop the blueprint, we also need to know what exactly the blueprint applies to.

I think it’s time, however, that we recognise this isn’t merely a technical issue but an industry issue. So now that we have begun the research phase of the DataPortability Project, I ask you and everyone else to join me as we discuss exactly what economic benefit DataPortability creates. Rather than asking if Facebook is going to give up its users’ data to other applications, we need to be thinking about the end value we strive to achieve by having DataPortability.

Portability in context, not location
When the media discuss DataPortability, please understand that a user simply being able to export their data is quite irrelevant to the discussion, as I have outlined in my previous posting. What truly matters is “access”. The ability for a user to command the economic benefits of their data is the ability to determine who else can access their data. Companies need to recognise that value creation comes from generating information, which is simply the relationships between different data ‘objects’. If users are to get the economic benefits of using their data from other repositories, companies simply need to allow them to delegate permission for others to access that data. This does not compromise a company’s competitive advantage, as it won’t necessarily have to delete the data it holds on a user; rather, it requires the company to realise that holding a user’s data (or parts of it) in custody is an advantage in itself, because hosting that data gives it complete access with which to come up with innovative new information products for the user.
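The difference between exporting data and delegating access can be sketched in a few lines of Python. The `DataHost` class and its methods are hypothetical, not any company’s API; a real service would rely on a delegation protocol such as OAuth rather than an in-memory set of grants. The point is only that the host keeps custody of the data while the user decides who else may read it.

```python
from typing import Any, Dict, Set, Tuple

class DataHost:
    """Hypothetical service that keeps custody of user data but honours user-issued grants."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}          # user -> {field: value}
        self._grants: Dict[str, Set[Tuple[str, str]]] = {}  # user -> {(grantee, field)}

    def store(self, user: str, field: str, value: Any) -> None:
        self._data.setdefault(user, {})[field] = value

    def grant(self, user: str, grantee: str, field: str) -> None:
        """The user, not the host, decides who else may read a given field."""
        self._grants.setdefault(user, set()).add((grantee, field))

    def revoke(self, user: str, grantee: str, field: str) -> None:
        self._grants.get(user, set()).discard((grantee, field))

    def read(self, requester: str, user: str, field: str) -> Any:
        """Owners can always read their own data; anyone else needs a grant."""
        allowed = requester == user or (requester, field) in self._grants.get(user, set())
        if not allowed:
            raise PermissionError(f"{requester} may not read {user}'s {field}")
        return self._data[user][field]

# The data never leaves the host; the user simply lets a third-party app read it.
host = DataHost()
host.store("alice", "contacts", ["bob", "carol"])
host.grant("alice", "photo-app", "contacts")
print(host.read("photo-app", "alice", "contacts"))  # ['bob', 'carol']
```

Nothing is deleted or exported here, which is why delegated access need not cost the host its competitive advantage.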

So what’s my point? When discussing DataPortability, let’s focus on the value to the user. And the next time the top tech blogs confront the companies that are supporting the movement with a simplistic “when are you going to let users take their data completely off ” I am going to burn my bra in protest.

Disclosure: I’m a heterosexual male who doesn’t cross-dress

Update: I didn’t mean to scapegoat Eric from VentureBeat, who is a brilliant writer. However, I used him as an example of the language being used across the entire community, which now needs to change. With the DP research phase officially underway for the next few months, the questions we ask should be more open-ended, as we at the DataPortability project have realised these issues are complex and we need the entire community to come to a consensus. DataPortability is no longer just about exporting your social graph – it’s an entirely new approach to how we will do business on the net, and as such, it requires us to fundamentally re-examine a lot more than we originally thought.

My presentation at Kickstart forum

I’m currently at Kickstart forum (along with the Mickster), and I just gave a presentation on DataPortability to a bunch of Aussie journalists. I didn’t write a speech, but I did jot down some points on paper before I spoke, so I thought I might share them here given I had a good response.

My presentation had three aspects: background, explanation, and implications of DataPortability. Below is a summary of what I said:

Background

  • Started by a bunch of Australians and a few other people overseas in November 2007 out of a chatroom. We formed a workgroup to explore the concept of social network data portability
  • In January 2008, Robert Scoble had an incident, which directed a lot of attention to us. As a consequence, we’ve seen major companies such as Google, Microsoft, Yahoo, Facebook, Six Apart, LinkedIn, Digg, and a host of others pledge support for the project.
  • We now have over 1000 people contributing, and have the support of a lot of influential people in the industry who want us to succeed.

Explanation

  • The goal is not to invent anything new. Rather, it’s to synthesise existing standards and technologies into one blueprint, and then push it out to the world under the DataPortability brand
  • When consumers see the DataPortability brand, they will know it represents certain things – similar to how users recognise the Centrino brand represents Intel, mobility, wireless internet, and long battery life. The brand is to communicate some fundamental things about a web service, allowing a user to recognise that a supporting site respects its users’ data rights and offers certain functionality.
  • Analogy of zero-configuration networking: before the zeroconf initiative it was difficult to connect to the internet (wirelessly). Due to the standardisation of policies, we can now connect to the internet wirelessly at the click of a button. The consequence of this is not just a better consumer experience, but the enablement of future opportunities such as what we are seeing with the mobile phone. Likewise, with DataPortability we will be able to connect to new applications and things will just “work” – and it will open up new opportunities for us
  • Analogy of the bank: I stated how the attention economy is something we give our attention to, ie, we put up with advertising, and in return we get content. And that the currency of the attention economy is data. With DataPortability, we can store our data in a bank, and via “electronic transfer”, we can interact with various services while controlling the use of that data in a centralised manner. We update our data at the bank, and it automatically synchronises with the services we use, ie, automatically updating your Facebook and MySpace profiles
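The bank analogy is essentially a publish/subscribe arrangement: update once at the centre, and every subscribed service is notified. A minimal Python sketch of that idea follows; `DataBank` and `ServiceProfile` are hypothetical names, and the sketch pushes changes directly, whereas a real deployment would also need authentication and access control.

```python
from typing import Callable, Dict, List

class DataBank:
    """Hypothetical central 'bank' holding a user's profile data."""

    def __init__(self) -> None:
        self.profile: Dict[str, str] = {}
        self._subscribers: List[Callable[[str, str], None]] = []

    def subscribe(self, on_change: Callable[[str, str], None]) -> None:
        """Register a service to be told about every future change."""
        self._subscribers.append(on_change)

    def update(self, field: str, value: str) -> None:
        """Update the data once at the bank, then push it to every subscribed service."""
        self.profile[field] = value
        for notify in self._subscribers:
            notify(field, value)

class ServiceProfile:
    """Stand-in for a profile kept by some third-party service."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.fields: Dict[str, str] = {}

    def receive(self, field: str, value: str) -> None:
        self.fields[field] = value

bank = DataBank()
facebook = ServiceProfile("Facebook")
myspace = ServiceProfile("MySpace")
bank.subscribe(facebook.receive)
bank.subscribe(myspace.receive)
bank.update("city", "Sydney")  # one update at the bank, both profiles now match
```

A pull model, where services fetch from the bank when they need data, is the obvious alternative design; the push version is shown because it matches the “automatically synchronises” description.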

Implications

  1. Interoperability: When diverse systems and organisations work together. A DataPortability world will allow you to use your data generated on other sites, ie, if you buy books on Amazon about penguins, you can get movie recommendations about penguins from your pay TV movie catalogue. Things like the ability to log in across the web with one sign-on create a self-supporting ecosystem where everyone benefits.
  2. Semantic web: I gave an explanation of the semantic web (which generated a lot of interest afterwards in chats), and then explained that the problem for the semantic web is that there hasn’t been uptake of its standards and technologies. I said that when a company adopts the DataPortability blueprint, it will effectively be supporting the semantic web – and hence enabling the next phase of computing history
  3. Data rights: I claimed the DataPortability project is putting data rights in the spotlight, and it’s an issue that has generated interest from other industries like the health and legal sectors, and not just the Internet sector. Things like what is privacy, and what exactly does my “data” mean. DataPortability is creating a discussion on what this actually means
  4. Wikiocracy: I briefly explained how we are doing a social experiment with a new type of governance model, which can be regarded as an evolution of the open source model: “decentralised” and “non-hierarchical”. With time, what we are trying to do will become more evident

Something that amused me: in the one-on-one sessions the journalists had with me afterwards, one woman asked: “So why are you doing all of this?”. I said it was an amazing opportunity to meet people and build my profile in the tech industry, to which she concluded: “you’re doing this to make history, aren’t you?”. I smiled 🙂

I’m back!

This is just a short note to say I am still alive, after a few of you asked why I haven’t blogged in a while! As I said in my previous entry, I had to prepare for an exam in mid December, and then I spent a month in South America – my first holiday in two years. Since I came back, however, I have been spending nearly all my free time helping manage the DataPortability Project, which I helped create in November and which has just exploded in the press – to the point of being called one of the key trends of 2008!

I’ve updated my blog behind the scenes to the latest version, and am considering a few cosmetic changes. However, for those of you who have subscribed, I just want to warn you that I will be cleaning up my historical blog posts for weird characters, so be aware if my feed starts pinging a bit crazily.

In addition to this blog, I’ve also got a new feed where I share links I come across on the web. I am still trying to get back into a routine, but you can subscribe to it with this address: http://feeds.feedburner.com/LiakoBiz/shareditems

And finally, I just wish to say I am going to post less but with a focus on quality – frequent posting has made me realise it puts a strain on the writer, and readers already have enough trouble keeping up with their information overload. Once a week is my goal…except for the next month, which will be murder at work 🙂

I have a lot of interesting things happening to me and in the pipeline, some small and some dramatic. This is going to be an interesting year…

Ouch – widgets bypassing Google’s wall

Feedjit
On the right of my blog as I write this, I have a widget: a simple piece of JavaScript from the company Feedjit that lets me embed a short piece of code showing my readers how other people find my blog. Since its launch, the widget seems to have become very popular, with 60 million widgets claimed by the company’s website.

I made a discovery today almost by accident: I accessed my blog on another computer. Or rather, I accessed my blog via Google’s cache – who have replicated my content for their search results, widgets and all. Now when you look at the Feedjit widget (image below left), the data is very different: it no longer shows visitors to my blog, but visitors to Google servers.

If you follow through to the detailed statistics you will even see what the most popular sites are that day, as well as the locations of the visitors. As this is data from the Google cache server, you are effectively getting an analysis of visitors – who they are, what keywords they are searching for, and what they found. So because my blog is part of Google cache, I can effectively hack and sneak in the backdoor of Google’s data.

(Having a quick look, it seems this URL is the main Google cache address; however data will only get logged when someone looks at the cache.)

Feedjit google cache

Does it matter?
While this is a fun thing to look at and then move on, I think it raises some serious issues – multiple ones at that.

On widgets: With the proliferation of widgets on the web, have they potentially become the next big security risk on the web?

On privacy: It’s not that hard to identify the people making those searches. Search engines handing over data to the government has been a hot issue, with Google’s resistance a much-hyped story as the company tried to prove it protected its users. With the growing cross-pollination of the web, exemplified by widgets, are we prepared for what it means to have open data (which is becoming inevitable)?

On metrics: Google has a complete download of my blog in its cache, but what I didn’t realise is that it is a copy of the full blog (including scripts like my web stats). When I look at my statistics, I see an awful lot of activity from computer bots, for example. Is this because every time Google, Yahoo or MSN analyses content that has been ripped off my site, I can actually see what they are doing behind their closed walls?

Those are questions with simple but also complicated answers. Either way, if it’s that easy to hack even Google, then God help us.

Pageviews are a misleading metric

Recently MySpace, the social networking site that once dominated but is now being overtaken by Facebook, sent me an e-mail informing me that a friend of mine had a birthday. What is unusual is that although I have received notifications of this type when logged into the site, I had never been e-mailed one before.

Below is a copy of the e-mail; let’s see if you notice what I did:
Birthday reminder

It doesn’t tell me whose birthday it is. In fact, it is even ambiguous as to whether it was just the one person or not. Big deal? Not really. But it very clearly tells me something: MySpace is trying to increase its pageviews.

Social networking sites are very useful services to an individual; they enable a person to manage and monitor their personal networks. Not only am I in touch with so many people I had lost contact with, but I am in the loop with their lives. I may not message them, but by passive observation, I know what everyone is up to. Things like what they’re studying, where they work, what countries they will be holidaying in, and useful things like when their birthday is.

Social networking sites are not just a website, but an information service, to help you manage your life. However as useful as I find these services, the revenue model is largely dependent on advertising, with premium features a rare thing now. So when you rely on advertising, you are going to be looking at ways of boosting the key figures that determine that revenue stream.

Friendster’s surprising growth in May was due to some clever techniques for using e-mail to drive pageviews. And it worked. E-mail notifications, when done tactfully, can drive a huge amount of activity. Of what seems like hundreds of web services I have joined, e-mail is at times the only thing that reminds me I even subscribed once upon a time. Combine e-mail with information I want to be updated on, and you’ve got a great recipe for using e-mail as a tool to drive pageviews.

…And that is the problem. MySpace has very cleverly sent this e-mail to get me to log into my account. A marketing campaign like that will, at the very least, see a good day of pageview growth. But the only reason I am logging in is to see whose birthday it is. MySpace itself is now irrelevant to me: the pageviews attributed to me are not those of an engaged user.

Pageviews as a metric for measuring audience engagement are prone to manipulation. Increases in pageviews, on the face of it, make a website appear more popular. But dig a little deeper and their correlation with what really matters (audience engagement) is much weaker.

So everyone, repeat after me: Pageviews – we need to drop them as a concept if we are ever going to make progress.

Facebook’s privacy is smart on technology but stupid in thought

I’ve had to neglect this blog because I have been insanely busy with work and my studies, and will continue to do so for the rest of the year. But I thought I’d post a quick observation I made today, that I found interesting. Even more interesting, because I rarely notice details!

Whenever Facebook notifies you by e-mail – for example, when a friend messages you – it will actually show you their e-mail address. An example is in the screenshot below: I could click ‘reply’ and the e-mail would go directly to my friend’s personal address. (I’ve noticed, however, that this only occurs if you have already added the person as a friend.)

direct e-mail

This raises some interesting issues regarding privacy. The first being: why the heck is Facebook allowing this? Am I going to reply to my friends asking them what they said in the message?! Privacy is my right to determine when people can see information about me – and I don’t want my friends seeing my e-mail. I can think of an example when a friend collected my e-mail from my profile and added me to a forwarding list of chain e-mails. Unlike the postal system, where the sender pays for a message with a stamp, e-mail makes the recipient pay with their time when they receive a message. Before, I didn’t have a choice, but now, with new ways of communicating, I can control what gets sent to me.

This actually goes a bit deeper. I’ve had fake profiles send me friend requests – I always deny people I don’t know, but I know that lots of my friends add people blindly (I remember asking a friend who a friend requester was when I noticed she was a mutual friend of his, to which he replied: “No idea, but she’s hot!”). This has now become a very easy way to obtain someone’s e-mail address – certainly not as easy as harvesting e-mails from a public-facing website, but still another means. The concern, however, is not spam but identity threats.

A crucial thing to understand about privacy is the concept of identifiable data. Corporations can collect data about me to their heart’s content and I wouldn’t mind, but only on the basis that they can’t specifically identify me. An e-mail address is what I regard as identifiable information: the various web services I use that hold different data about me can easily be linked purely through my e-mail address.
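How easily an e-mail address links otherwise separate records can be shown with a short Python sketch. The three data sets are made up for illustration; the only technique involved is normalising the address and using it as a join key, which is exactly why an e-mail address counts as identifiable data.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Made-up records held by three unrelated services.
shop = [{"email": "jo@example.com", "purchases": ["penguin book"]}]
social = [{"email": "Jo@Example.com", "employer": "Acme"}]
forum = [{"email": " jo@example.com", "posts": 42}]

def link_by_email(*sources: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Merge records from several sources into one profile per normalised e-mail address."""
    profiles: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for source in sources:
        for record in source:
            # Trimming and lower-casing is usually enough to line the records up.
            key = record["email"].strip().lower()
            profiles[key].update({k: v for k, v in record.items() if k != "email"})
    return dict(profiles)

print(link_by_email(shop, social, forum))
# {'jo@example.com': {'purchases': ['penguin book'], 'employer': 'Acme', 'posts': 42}}
```

None of the three services shared anything beyond an address, yet the result is a single profile combining shopping habits, employer and forum activity.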

I’ve previously said how social networking sites are a new type of communication that is far better than e-mail. E-mail is one of the world’s most powerful technologies, but also one of the most dangerous. Whilst most would think that is because of e-mail overload and spam, what I really mean is how much damage a single e-mail address can do if used by someone trying to investigate you and your life.

As our digital world becomes more sophisticated (and scary), let’s be clear about some things. People no longer need e-mail to contact you; they can instead contact your ‘identity’, which is far superior (I discussed this in the posting I linked to just above). However, with this advancement also comes the opportunity to recognise what your e-mail address really is: a key piece of identifiable data that can link your multiple identities across the digital world into one mega profile.

How Google reader can finally start making money

Today, you would have heard that Newsgator, Bloglines, Me.dium, Peepel, Talis and Ma.gnolia have joined the APML workgroup and are in discussions with workgroup members on how they can implement APML into their product lines. Bloglines created some news the other week on their intention to adopt it, and the announcement today about Newsgator means APML is now fast becoming an industry standard.

Google, however, is still sitting on the sidelines. I really like using Google reader, but if they don’t announce support for APML soon, I will have to switch back to my old favourite Bloglines, which is doing some serious innovating. Seeing as Google reader came out of beta recently, I thought I’d help them out by suggesting a new feature (APML) that will finally see it generate some real revenue.

What a Google reader APML file would look like
Read my previous post on what exactly APML is. If the Google reader team were to support APML, what they could add to my APML file is a ranking of blogs, authors, and key-words. First an explanation of how, and then the consequences.

In terms of blogs I read, the percentage of a particular blog’s posts that I read would determine its relevancy score in my APML file. So if I were to read 89% of Techcrunch posts (information already provided to users), it would convert this into a relevancy score for Techcrunch of 89%, or 0.89.
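The conversion described above, where the share of a blog’s posts read becomes its relevance score, is simple enough to sketch in Python. This assumes the reader already knows how many posts each blog published in the period and how many of them the user read; the function name, the second blog and all counts are hypothetical, with Techcrunch’s 89 out of 100 mirroring the example in the text.

```python
from typing import Dict

def blog_relevance(read: Dict[str, int], published: Dict[str, int]) -> Dict[str, float]:
    """Relevance per blog = share of that blog's posts the user actually read (0.0 to 1.0)."""
    return {
        blog: round(read.get(blog, 0) / total, 2)
        for blog, total in published.items()
        if total > 0  # skip blogs with no posts in the period to avoid dividing by zero
    }

published = {"Techcrunch": 100, "AnotherBlog": 40}
read = {"Techcrunch": 89, "AnotherBlog": 10}
print(blog_relevance(read, published))  # {'Techcrunch': 0.89, 'AnotherBlog': 0.25}
```

The same function would work unchanged for authors, by keying the counts on author names instead of blog names.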

ranking

APML: pulling rank

In terms of authors I read, it can extract who wrote each of the individual blog postings I read and, like the blog ranking above, perform a similar procedure. I don’t imagine it would be too hard to do this; however, given it’s a small team running the product, I would put this at a lower priority.

In terms of key-words, Google could employ its contextual analysis technology from each of the postings I read and extract key words. By performing this on each post I read, the frequency of extracted key words determines the relevance score for those concepts.

So that would be the how. The APML file generated from Google Reader would simply rank these blogs, authors, and key-words, and the relevance scores would update over time: the data is periodically re-indexed and recalculated from scratch, so as concepts stop being viewed, they diminish in value until they drop off.
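A rough Python sketch of the keyword side, including the recalculate-from-scratch behaviour, follows. It assumes keyword extraction has already happened (the text suggests Google’s contextual analysis for that step; here each post is just a list of keywords), and it uses a fixed 30-day window as one possible way to make unused concepts fade out. Normalising so the most frequent concept scores 1.0, and the window length, are assumptions made for this sketch, not part of APML.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, List, Tuple

def keyword_profile(reading_log: List[Tuple[date, List[str]]], today: date,
                    window_days: int = 30) -> Dict[str, float]:
    """Recompute concept scores from scratch using only posts read in the last window_days.

    Each read post counts each of its keywords once; a score is the keyword's frequency
    divided by that of the most frequent keyword. Concepts not seen inside the window are
    simply absent, i.e. they have dropped off the profile.
    """
    cutoff = today - timedelta(days=window_days)
    counts = Counter(
        keyword
        for read_on, keywords in reading_log
        if read_on > cutoff
        for keyword in dict.fromkeys(keywords)  # de-duplicate while keeping order
    )
    if not counts:
        return {}
    top = max(counts.values())
    return {keyword: n / top for keyword, n in counts.items()}

log = [
    (date(2008, 1, 1), ["semantic web"]),  # read long ago: outside the window
    (date(2008, 3, 1), ["apml", "privacy"]),
    (date(2008, 3, 5), ["apml"]),
]
print(keyword_profile(log, today=date(2008, 3, 10)))  # {'apml': 1.0, 'privacy': 0.5}
```

A decaying weight (older reads counting for less) would be a smoother alternative to a hard window; the window is used here because it is the simplest reading of “recalculated from scratch”.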

What Google reader can do with that APML file
1. Ranking of content
One of the biggest issues facing consumers of RSS is information overload. I am quite confident that people would pay a premium for anything that helps rank what can be hundreds of items per day that a user needs to read. With an APML file, Google Reader can, over time, match postings to a user’s ranked interests. So rather than presenting content in reverse chronological order (most recent to oldest), it can instead organise content by relevancy (items of most interest to least).
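Sorting by relevance instead of reverse chronology could look like the following Python sketch. It assumes each feed item carries extracted keywords and that the user’s APML-style profile is a simple mapping from concept to score; the additive scoring rule and all names are hypothetical, and this is precisely the part where competing vendors could differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

@dataclass
class Item:
    title: str
    published: datetime
    keywords: List[str]

def item_score(item: Item, profile: Dict[str, float]) -> float:
    """Hypothetical rule: add up the user's scores for every concept the item mentions."""
    return sum(profile.get(keyword, 0.0) for keyword in item.keywords)

def sort_by_relevance(items: List[Item], profile: Dict[str, float]) -> List[Item]:
    """Most relevant first; items with equal scores fall back to newest first."""
    return sorted(items, key=lambda i: (item_score(i, profile), i.published), reverse=True)

def sort_by_recency(items: List[Item]) -> List[Item]:
    """The usual reverse-chronological view, for comparison."""
    return sorted(items, key=lambda i: i.published, reverse=True)
```

With a profile of `{"apml": 1.0, "privacy": 0.5}`, an older post about APML would be listed above a brand-new post about football, which the reverse-chronological view would have put first.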

This won’t reduce the amount of RSS a user consumes, but it will enable them to know how to allocate their attention to content. There are a lot of innovative ways you can rank the content, down to the way you extract key words and rank concepts, so there is scope for competing vendors to have their own methods. However, the point is that a feature to ‘Sort by Personal Relevance’ would be highly sought after, and I am sure quite a few people will be willing to pay the price for this godsend.

I know Google seems to think contextual ads are everything, but maybe the Google Reader team can break from the mould and generate a different revenue stream through a value add feature like that. Google should apply its contextual advertising technology to determine key words for filtering, not advertising. It can use this pre-existing technology to generate a different revenue stream.

2. Enhancing its AdSense programme

blatant ads

Targeted advertising is still bloody annoying

One of the great benefits of APML is that it creates an open database about a user. Contextual advertising, in my opinion, is actually a pretty sucky technology, and its success to date is only because all the other types of targeted advertising models are flawed. As I explain above, the technology should instead be used to better analyse what content a user consumes, through keyword analysis. Over time, a ranking of these concepts can develop, as well as being shared with other web services that are doing the same thing.

An APML file that ranks concepts is exactly what Google needs to enhance its AdWords technology. Don’t use it to analyse a post to show ads; use it to analyse a post to rank concepts. Then, in aggregate, the contextual advertising will work because it can be based off this APML file with great precision. Even better, a user can tweak it, which will be the equivalent of tweaking what advertising they want to get. The transparency of a user being able to see what ‘concept ranking’ you generate for them is powerful, because the user is likely to monitor it for accuracy.
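The “tweak your profile, tweak your ads” idea can be sketched in Python: the user’s overrides are layered on top of the inferred concept scores, and ad selection reads the adjusted profile. The functions, ad names and scores are hypothetical, and treating a score of 0.0 as “remove this interest” is an assumed convention.

```python
from typing import Dict, Optional

def apply_user_tweaks(inferred: Dict[str, float], tweaks: Dict[str, float]) -> Dict[str, float]:
    """User overrides win over inferred scores; a score of 0.0 removes the concept entirely."""
    adjusted = {**inferred, **tweaks}
    return {concept: score for concept, score in adjusted.items() if score > 0}

def pick_ad(ads: Dict[str, str], profile: Dict[str, float]) -> Optional[str]:
    """ads maps an ad name to the concept it targets; choose the ad whose concept ranks highest."""
    eligible = {ad: profile[concept] for ad, concept in ads.items() if concept in profile}
    return max(eligible, key=eligible.get) if eligible else None

inferred = {"football": 0.9, "apml": 0.6}  # what the service guessed about the user
ads = {"Boots sale": "football", "Attention tools": "apml"}
print(pick_ad(ads, inferred))  # Boots sale
print(pick_ad(ads, apply_user_tweaks(inferred, {"football": 0.0})))  # Attention tools
```

Because the user sees and corrects the same profile the ads are drawn from, their own interest in relevant content keeps the targeting accurate.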

APML is contextual advertising’s biggest friend, because it profiles a user in a sensible way that can be shared across applications and monitored by the user. Allowing a user to tweak their APML file in the hope of more targeted content aligns their self-interest with ensuring that the targeted ads thrown at them, based on those ranked concepts, are in fact relevant.

3. Privacy credibility
Privacy is the inflation of the attention economy. You can’t proceed to innovate with targeted advertising technology whilst ignoring privacy. Google has clearly realised this the hard way, by being labelled one of the worst privacy offenders in the world. By adopting APML, Google will go a long way towards gaining credibility on privacy rights. It will be creating open transparency about the information it collects to profile users, and it will allow users to control that profiling of themselves.

APML is a very clever approach to dealing with privacy. It’s not the only approach, but it is one of the most promising. Even if Google never uses an APML file as I describe above, the brand-enhancing value of giving users some control over their rightful attention data would, on its own, benefit the Google Reader product (and Google’s reputation itself) if Google were to adopt it.

Privacy. Stop looking.

Conclusion
Hey Google – can you hear me? Let’s hope so, because you might be the market leader now, but so was Bloglines once upon a time.

Explaining APML: what it is & why you want it

Lately there has been a lot of chatter about APML. As a member of the workgroup advocating this standard, I thought I might help answer some of the questions on people’s minds, primarily “what is an APML file?” and “why do I want one?”. If you haven’t already done so, I suggest you read Marjolein Hoekstra’s excellent recent article on attention profiling as an introduction to the topic. This article focuses on the technical side of an APML file and what can be done with it. Hopefully, by understanding what APML actually is, you’ll understand how it can benefit you as a user.

APML – the specification
APML stands for Attention Profile Markup Language. It’s an attention economy concept, based on the XML technical standard. I am going to assume you don’t know what attention means, nor what XML is, so here is a quick explanation to get you on board.

Attention
There is a concept floating around the web called the attention economy. It means that as a consumer, you use web services – e-mail, RSS readers, social networking sites – and you generate value through your attention. For example, if I am on the MySpace band page for Sneaky Sound System, I am giving attention to that band. News Corp (the company that owns MySpace) is capturing that implicit data about me (ie, it knows I like Electro/Pop/House music). By giving my attention, News Corp has collected information about me. Implicit data is what you give away about yourself without saying it, like how people can judge what type of person you are purely from the clothes you wear. Contrast this with explicit data: information you deliberately give about yourself (like your gender when you signed up to MySpace).

I know what you did last Summer

XML
XML is one of the core standards on the web. The web pages you access probably use a form of XML (XHTML) to deliver their content to you. If you use an RSS reader, it pulls a version of XML to deliver that content to you. I am not going to get into a discussion about XML, because there are plenty of other places that cover it. However, I want to make sure you understand that XML is a very flexible way of structuring data. Think of it like a street directory: if you are trying to find a house, a map with no street names is useless. But once the map has street names, it suddenly becomes a lot more useful, because you can make sense of the houses (the content). XML is a way of describing a piece of content.

APML – the specification
So APML is simply a way of converting your attention into a structured format. It does this by storing your implicit and explicit data, and scoring it. Lost? Keep reading.
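For a feel of what that structured format looks like, here is a cut-down, APML-style document and a few lines of Python that read it. The element and attribute names (`Profile`, `ImplicitData`, `ExplicitData`, `Concept` with `key`, `value` and `from`) follow my reading of the APML 0.6 spec, but the namespace, `Head` section and date attributes are left out, so treat it as illustrative rather than valid APML. The fallback label "user" for concepts without a `from` attribute is my own assumption.

```python
import xml.etree.ElementTree as ET

# A cut-down, APML-style document (not validated against the spec).
APML_DOC = """
<APML version="0.6">
  <Body defaultprofile="personal">
    <Profile name="personal">
      <ImplicitData>
        <Concepts>
          <Concept key="pop" value="0.2" from="MySpace"/>
        </Concepts>
      </ImplicitData>
      <ExplicitData>
        <Concepts>
          <Concept key="electro" value="0.8"/>
        </Concepts>
      </ExplicitData>
    </Profile>
  </Body>
</APML>
"""

def read_concepts(xml_text):
    """Return (kind, key, score, source) tuples for every concept in the file."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for kind in ("ImplicitData", "ExplicitData"):
        for data in root.iter(kind):
            for concept in data.iter("Concept"):
                # Explicit concepts here carry no "from", so label them "user".
                source = concept.get("from", "user")
                rows.append((kind, concept.get("key"), float(concept.get("value")), source))
    return rows

for row in read_concepts(APML_DOC):
    print(row)
```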

Continuing with my example about Sneaky Sound System: if MySpace supported APML, it would identify that I like pop music. But just because someone gives attention to something doesn’t mean they really like it; the thing about implicit data is that companies are guessing, because you haven’t actually said it. So MySpace might say I like pop music, but with a score of 0.2 (20% positive), meaning it’s not too confident. Now let’s say that directly after that, I go to the Britney Spears music page. Okay, there’s no doubt now: I definitely like pop music. So my score against “pop” is now 0.5 (50%). And if I visited the Christina Aguilera page: forget about it – my APML rank just blew out to 1.0! (Note that the score works like a percentage, ranging from -1.0 to +1.0, or -100% to +100%.)
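The example above only fixes the range of a score, not how an application calculates it. As one possible rule, the sketch below adds a weight for each new piece of evidence and clamps the result to -1.0 to +1.0. The `update_score` function and the per-visit weights are hypothetical, chosen so the loop reproduces the 0.2, 0.5, 1.0 progression.

```python
def update_score(score, evidence):
    """Nudge a concept score by new evidence, clamped to the -1.0..+1.0 range.

    The additive update is a hypothetical rule chosen for illustration; the
    example only fixes the range of a score, not how it is calculated.
    """
    return max(-1.0, min(1.0, score + evidence))

# Hypothetical evidence weights for each page visit in the example.
visits = [("Sneaky Sound System", 0.2), ("Britney Spears", 0.3), ("Christina Aguilera", 0.7)]

pop = 0.0
for page, weight in visits:
    pop = update_score(pop, weight)
    print(f"after {page}: pop = {pop:.1f}")
```

The clamp is what stops the Christina Aguilera visit from pushing the score past 1.0.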

APML doesn’t just rank concepts like genres: it also ranks authors. Take Marjolein Hoekstra, who wrote the post I mention in my intro: because I keep reading her other work, I clearly have a high regard for her writing, so my APML file gives her a high score. On the other hand, I have an allergic reaction whenever I read something from Valleywag because they have cooties. So Marjolein’s rank would be 1.0, but Valleywag’s would be -1.0.

Aside from the ranking of concepts (which is the core of what APML is), there are other things in an APML file that might confuse you when reviewing the spec. “From” means ‘from the place you gave your attention’. So with the Sneaky Sound System concept, it would be ‘from: MySpace’; it simply names the application that added the implicit node. Another thing you may notice in an APML file is that you can create “profiles”. For example, the concepts about me in my “work” profile are not something I want to mix with my “personal” profile. Profiles let you segment the ranked concepts in your APML file into different groups, so applications can be given access to only a particular profile.
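Here is a minimal sketch of profile segmentation, assuming a hypothetical `GRANTS` table that records which profile each application may read. The profile contents are made up, and APML itself does not define this access-control table; it only provides the profiles that make such a rule possible.

```python
# Ranked concepts segmented into profiles, as an APML file allows.
# The profile contents and the GRANTS table are hypothetical examples.
PROFILES = {
    "work": {"semantic web": 0.9, "analyst reports": 0.7},
    "personal": {"pop": 1.0, "electro": 0.6},
}

# Which profile each application has been allowed to read (illustrative only).
GRANTS = {"MySpace": "personal", "Bloglines": "work"}

def concepts_for(app):
    """Return a copy of the concepts in the one profile this app may read."""
    profile_name = GRANTS.get(app)
    if profile_name is None:
        raise PermissionError(f"{app} has not been granted access to any profile")
    return dict(PROFILES[profile_name])

print(concepts_for("MySpace"))
```

MySpace sees only the personal concepts; an application with no grant gets an error rather than an empty profile, which makes a missing permission hard to miss.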

Another thing to take note of is ‘implicit’ and ‘explicit’, which I touched on above. Implicit is what you give attention to (ie, the clothes you wear: people guess from what you wear that you are a certain personality type); explicit is what you state outright (the words you say: when you say “I’m a moron”, it’s quite obvious that you are). APML categorises concepts based on whether you explicitly said it, or it was implicitly determined by an application.
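Because the two kinds of data are stored separately, an application reading the file has to decide how to combine them. The sketch below assumes one plausible rule, which is my assumption rather than something APML mandates: whatever the user stated explicitly overrides what an application inferred. The scores are made up.

```python
def effective_scores(implicit, explicit):
    """Combine inferred (implicit) and stated (explicit) concept scores.

    Assumption for illustration: whatever the user stated explicitly overrides
    what an application inferred. The APML file itself keeps the two apart.
    """
    merged = dict(implicit)   # start from the application's guesses
    merged.update(explicit)   # let the user's own statements win
    return merged

implicit = {"fashion": 0.4, "pop": 0.2}   # guessed from pages visited
explicit = {"pop": 1.0}                   # said outright by the user
print(effective_scores(implicit, explicit))
```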

Okay, big whoop – what can an APML file do for me?
In my eyes, there are five main benefits of APML: filtering, accountability, privacy, shared data, and you being boss.

1) Filtering
If a company supports APML, it is using a smart standard that other companies also use to profile you. By ranking concepts and authors, for example, it can use your APML file in the future to filter for things that might interest you. As I have such a high ranking for Marjolein, when Bloglines implements APML, it will be able to use this information to start prioritising content in my RSS reader. That means that of the 1000 items in my Bloglines reader, all the blog posts from her will be given more emphasis, whilst all the ones from Valleywag will sit at the bottom (with last night’s trash).
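A reader could apply author scores like this with a simple sort. This is a minimal sketch under stated assumptions, not Bloglines’ behaviour: the `prioritise` helper and feed items are hypothetical, unknown authors get a neutral 0.0, and the author scores are the ones from the example above.

```python
# Author scores from an APML file (values from the example above).
AUTHOR_SCORES = {"Marjolein Hoekstra": 1.0, "Valleywag": -1.0}

def prioritise(items, author_scores):
    """Sort feed items so highly ranked authors come first.

    Unknown authors get a neutral 0.0. sorted() is stable even with
    reverse=True, so items with equal scores keep their original order.
    """
    return sorted(items, key=lambda item: author_scores.get(item["author"], 0.0), reverse=True)

feed = [
    {"title": "Gossip roundup", "author": "Valleywag"},
    {"title": "Some other post", "author": "Someone else"},
    {"title": "Attention profiling", "author": "Marjolein Hoekstra"},
]
for item in prioritise(feed, AUTHOR_SCORES):
    print(item["title"])
```

Sorting rather than filtering keeps the Valleywag posts available at the bottom instead of hiding them.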

2) Accountability
If a company is collecting implicit data about me and trying to profile me, I would like to see that information, thank you very much. It’s a bit like me wearing a pink shirt at a party. You meet me at the party and think “Pink – the dude must be gay”. Now, I am actually as straight as a doornail, and wearing that pink shirt is me trying to be trendy. But what you have done is profile me by observation. Now imagine that on a web application, where this happens all the time. If the application lets you access that data – your APML file – you can correct it. I’ve actually done this with Particls, which supports APML. It had ranked some concepts highly based on things I had read, which was wrong. So I changed the score for one of them to -1.0, so that Particls would never again show me content on a topic it wrongly thought I would like.
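The effect of that kind of correction can be sketched as a threshold filter. This is an illustration, not how Particls works: the concept names, items and `visible_items` helper are hypothetical, and concepts missing from the profile are treated as neutral and left visible.

```python
def visible_items(items, concept_scores, threshold=0.0):
    """Keep only items whose concept scores at or above the threshold.

    Concepts missing from the profile count as neutral (0.0) and stay visible.
    """
    return [item for item in items if concept_scores.get(item["concept"], 0.0) >= threshold]

# The application's guess, then a manual correction by the user.
scores = {"pop": 0.9, "celebrity gossip": 0.8}
scores["celebrity gossip"] = -1.0

items = [
    {"title": "New Sneaky Sound System single", "concept": "pop"},
    {"title": "Who wore it better?", "concept": "celebrity gossip"},
    {"title": "APML workgroup update", "concept": "apml"},
]
print([item["title"] for item in visible_items(items, scores)])
```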

3) Privacy
I joined the APML workgroup for this reason: to me, it was a smart way to deal with the growing privacy issue on the web. It fits my requirements for being privacy compliant:

  • who can see information about you
  • when can people see information about you
  • what information they can see about you

The way APML does that is by allowing me to create ‘profiles’ within my APML file; allowing me to export my APML file from a company; and by allowing me to access my APML file so I can see what profile I have.
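Exporting can be selective as well. The sketch below writes out only one named profile as APML-style XML using Python’s standard `xml.etree.ElementTree`, so whoever receives the file sees nothing from the other profiles. The `export_profile` helper and profile data are hypothetical, and the element names only loosely follow APML; the output is not validated against the spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical profiles, each holding implicit and explicit concept scores.
MY_PROFILES = {
    "personal": {"ImplicitData": {"pop": 1.0}, "ExplicitData": {"electro": 0.6}},
    "work": {"ImplicitData": {"semantic web": 0.9}, "ExplicitData": {}},
}

def export_profile(profiles, name):
    """Serialise one profile as an APML-style XML string.

    Only the named profile is written, so the recipient sees nothing from the
    others. Element names loosely follow APML; this is not validated output.
    """
    root = ET.Element("APML", version="0.6")
    body = ET.SubElement(root, "Body")
    profile = ET.SubElement(body, "Profile", name=name)
    for kind, concepts in profiles[name].items():
        parent = ET.SubElement(ET.SubElement(profile, kind), "Concepts")
        for key, value in concepts.items():
            ET.SubElement(parent, "Concept", key=key, value=f"{value:.2f}")
    return ET.tostring(root, encoding="unicode")

print(export_profile(MY_PROFILES, "personal"))
```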

Here is my APML, now let me in. Biatch.

4) Shared data
An APML file can, with your permission, share information between your web services. My concept rankings for books on Amazon.com can sit alongside my RSS feed rankings. What’s powerful about that is the unintended consequences of sharing the data. For example, if Amazon ranked my favourite book genres, this could be useful information to help me filter my RSS feeds by blog topic. The data generated in Amazon’s ecosystem can help me enjoy a product in another ecosystem, in a mutually beneficial way.
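Combining rankings from two services needs some merge rule. The sketch below averages a concept’s scores across the services that ranked it. The rule is a hypothetical choice (taking the maximum, or weighting by how much you trust each source, would be just as valid), and the genre and topic scores are made up.

```python
def merge_rankings(*sources):
    """Merge concept scores from several services into one ranking.

    Hypothetical rule: average a concept's scores across the services that
    ranked it; concepts ranked by only one service keep that score.
    """
    totals, counts = {}, {}
    for scores in sources:
        for concept, value in scores.items():
            totals[concept] = totals.get(concept, 0.0) + value
            counts[concept] = counts.get(concept, 0) + 1
    return {concept: totals[concept] / counts[concept] for concept in totals}

amazon_books = {"science fiction": 0.9, "economics": 0.6}
rss_feeds = {"economics": 0.8, "web standards": 0.7}
print(merge_rankings(amazon_books, rss_feeds))
```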

5) You’re the boss!
By being able to generate APML for the things you give attention to, you are recognising the value your attention has – something companies already place a lot of value on. Your browsing habits can reveal useful information about your personality, and the ability to control your profile is a very powerful concept. It’s like controlling the image people have of you: you don’t want the wrong things being said about you. 🙂

Want to know more?
Check the APML FAQ. Otherwise, post a comment if you still have no idea what APML is. One of the other APML workgroup members or I would be more than happy to answer your queries.
