Tag Archive for 'data portability'

Data portability and media: explaining the business case

The information value chain I wrote about a while back, although in need of further refinement, underpins how I think about the business case for data portability.

In this post, I am going to give a brief illustration of how interoperability is a win-win for all involved in the digital media business.

To do this, I am going to use the following companies as examples:
- Amazon (S3)
- Facebook
- Yahoo! (Flickr)
- Adobe (Photoshop Express)
- SmugMug
- Cooliris

How the world works right now
I've listed six different companies, each of which can provide services for your photos. Using a simplistic view of the market, they are all competitors - they ought to be fighting to be the ultimate place where you store your photos. But the reality is, they aren't.

Our economic system is underpinned by a concept known as "comparative advantage". It means that even if you are the best at everything, you are better off specialising in what you do relatively best and letting another entity perform the other functions. In world trade, different countries specialise in different industries, because focusing on what you are uniquely good at and trading with other countries for the rest is a lot more efficient.

Which is why I take a value chain approach when explaining data portability. Different companies and websites should have different areas of focus - in fact, as we all know, one website can't do everything. That is not just because of a lack of resources, but also because of the conflict it creates in allocating them. For example, a community site doesn't want to have to worry about storage costs, because it is better off investing in resources that support its community. Trying to do both may make the community site fail.
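To make the comparative advantage argument concrete, here is a small worked example in Python. The two sites and every number in it are invented purely for illustration: `storage_co` is assumed to be better at both activities, yet total output still rises when each site leans towards the activity it gives up the least to perform.

```python
# Hypothetical output per week if a site spends ALL of its effort on one activity.
# storage_co is better at both activities (an absolute advantage in each).
capacity = {
    "storage_co": {"storage": 10, "community": 10},
    "community_site": {"storage": 2, "community": 8},
}


def opportunity_cost(site, activity, other):
    """Units of `other` a site gives up to produce one unit of `activity`."""
    return capacity[site][other] / capacity[site][activity]


def total_output(storage_pct):
    """storage_pct maps site -> percent of effort spent on storage.

    The remaining effort goes to community features.
    """
    totals = {"storage": 0.0, "community": 0.0}
    for site, pct in storage_pct.items():
        totals["storage"] += capacity[site]["storage"] * pct / 100
        totals["community"] += capacity[site]["community"] * (100 - pct) / 100
    return totals


# Each site splits its effort evenly between both activities.
everyone_does_both = total_output({"storage_co": 50, "community_site": 50})

# storage_co leans towards storage; community_site does community only.
specialised = total_output({"storage_co": 80, "community_site": 0})

print(everyone_does_both)  # {'storage': 6.0, 'community': 9.0}
print(specialised)         # {'storage': 8.0, 'community': 10.0}
```

In this made-up scenario the community site gives up only 0.25 units of storage for each unit of community it produces, against 1.0 for the storage company, which is why it should be the one focusing on community.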

How specialisation makes for a win-win
With that theoretical understanding, let's now look into the companies.

Amazon
They have a service that allows you to store information in the cloud (i.e., not on your local computer, and permanently accessible via a browser). Amazon's economies of scale allow it to create the most efficient storage system on the web. I'd love to be able to store all my photos here.

Facebook
Most of the people I know in the offline world are connected to me on Facebook. It's become a useful way for me to share my life with my friends and family, and to stay permanently connected with them. I often get asked by my friends to make sure I put my photos on Facebook so they can see them.

Yahoo
Yahoo owns a company called Flickr - which is an amazing community of people passionate about photography. I love being able to tap into that community to share and compare my photos (as well as find other people's photos to use in my blog posts).

Adobe
Adobe makes the industry standard program for graphic design: Photoshop. When it comes to editing my photos - everything from cropping them and removing red-eye to converting them into different file formats - I love using Photoshop. They now offer an online version, Photoshop Express, which provides functionality similar to what you have on the desktop, but in the cloud.

SmugMug
I actually don't have a SmugMug account, but I've always been curious. I'd love to be able to see how my photos look in their interface, and to tap into some of the features they have available, like printing photos in special ways.

Cooliris
Cooliris is a cool web service I've only just stumbled on. I'd love to be able to plug my photos into the system and see what cool results come out.

Putting it together

  • I store my photos on Amazon, including my massive RAW picture files which most websites can't read.
  • I can pull my photos into Facebook, and tag them how I see fit for my friends.
  • I can pull my photos into Flickr, and get access to the unique community competitions, interaction, and feedback I get there.
  • With Adobe Photoshop Express, I can access my RAW files on Amazon to create edited versions of my photos, based on the feedback I received in the comments on Flickr.
  • With those edited photos now sitting on Amazon, and with the tags I have on Facebook adding better context to my photos (friends tagging people in them), I pull those photos into SmugMug and create really funky prints to send to my parents.
  • I can take the same photos I used in SmugMug into Cooliris and create a funky screensaver for my computer.

As a customer of all these services - that's awesome. With the same set of photos, I get the benefit of all these services, each of which provides something unique for me.

And as a supplier that is providing these services, I can focus on what I am good at - my comparative advantage - so that I can continue adding value to the people that use my offering.

Sounds simple enough, eh? Well, the word for that is "interoperability", and it's what we are trying to advocate at the DataPortability Project: a world where data does not have borders and can be reused again and again. What's stopping us from having a world like this? Basically, the simplistic thinking that one site should try to do everything rather than focus on what it does best.
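For the technically minded, the scenario above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any of these companies actually work: `PhotoStore`, `Service` and the service names are all hypothetical, and a real system would talk over a network rather than share memory. The point is that one stored photo is referenced by several specialised services, each adding its own kind of value.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    photo_id: str
    filename: str
    # Metadata contributed by each service, keyed by service name.
    annotations: dict = field(default_factory=dict)


class PhotoStore:
    """Stands in for the single storage provider (hypothetical, in-memory)."""

    def __init__(self):
        self._photos = {}

    def put(self, photo):
        self._photos[photo.photo_id] = photo

    def get(self, photo_id):
        return self._photos[photo_id]


class Service:
    """A specialised site that reads from the shared store instead of
    keeping its own copy of the photo."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def annotate(self, photo_id, key, value):
        photo = self.store.get(photo_id)
        photo.annotations.setdefault(self.name, {})[key] = value

    def view(self, photo_id):
        return self.store.get(photo_id)


store = PhotoStore()
store.put(Photo("p1", "beach.raw"))

social_network = Service("social_network", store)
photo_community = Service("photo_community", store)
print_shop = Service("print_shop", store)

# Each service contributes what it is good at.
social_network.annotate("p1", "people", ["Mum", "Dad"])
photo_community.annotate("p1", "feedback", "Crop tighter on the left")

# The print shop sees the same photo, plus the context added elsewhere.
print(print_shop.view("p1").annotations)
```

No service has to store the photo itself, and each one benefits from the context the others have added.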

DataPortability Project

Help us change the market's thinking and demand for data portability.

Why open wins

Open standards matter, but so does water. And just as water alone is not what creates a Mona Lisa or a Hoover Dam, open standards are not what really matters in the longer term for what we are trying to do with the DataPortability Project. But they matter for the industry, which is why we advocate for them. Here's why.

[Image: Hoover Dam]

Bill Washburn is one of those soft-spoken individuals who have driven a lot of change, like leading the charge to open government technology (the Internet as we know it) to the rest of the world. He's been around long enough to see trends, so I asked him: why does open always win? What is it about the walled garden that makes it only temporary?

Bill gave me two reasons: technologies need to be easy to implement, and they also need to be cheap. It may sound obvious, but below I offer my interpretation of why, in the context of standards.

1) Easy to implement
If you are a developer constantly implementing a standard, you want the easiest one to implement. Having to learn a new standard each time you need to do something is a burden - you want to learn how to do something once and that's it. And if there is a choice to implement two standards that do the same thing, guess which one will win?

That's why you will see the technically inferior RSS dominate over Atom. Both allow syndication and give the end-user the same experience, but for a developer trying to parse it, Atom is an absolute pain in the buttocks. Compare also JSON and XML - the former being a data structure that's not even really a standard, and the latter being one of the older data format standards on the Internet. JSON wins out for asynchronous technologies in the Web 2.0 world, because it's just easier to work with. Grassroots-driven microformats and W3C-endorsed RDF? Same deal. RDF is academically brilliant - but academic isn't real world.
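As a rough illustration of the "easier to implement" point, here is the same small record parsed from JSON and from XML using only Python's standard library. The record and its field names are made up for the example. In a browser the gap is arguably wider still, since JSON maps directly onto JavaScript objects, but even here the difference in effort is visible.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical photo record in both formats.
json_text = '{"title": "Sunset", "tags": ["beach", "holiday"]}'
xml_text = (
    "<photo><title>Sunset</title>"
    "<tags><tag>beach</tag><tag>holiday</tag></tags></photo>"
)

# JSON: one call, and the result is already a dict holding a list.
from_json = json.loads(json_text)

# XML: parse the tree, then walk it and rebuild the structure by hand.
root = ET.fromstring(xml_text)
from_xml = {
    "title": root.findtext("title"),
    "tags": [tag.text for tag in root.findall("tags/tag")],
}

print(from_json == from_xml)  # True
```

Both routes end up with the same data; the JSON route simply has fewer steps for the developer to get wrong.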

2) Cheap to implement
This is fairly obvious - imagine you had two ways of doing the same thing, but one was free and the other had licensing costs. Which do you think a developer or company will use? Companies don't want to pay licensing fees, especially for non-core activities, and developers can't afford license fees for a new technology. Entities will bias their choices towards the cheaper of the two, and ideally towards free.

I think an interesting observation can be made about developer communities. Compare people in the .NET community with, say, Python advocates. You tend to find Python people are more open to collaboration, meetups, and other idea exchanges than .NET developers, who keep to themselves (it is a proprietary platform). Because the Microsoft-owned .NET suite costs a lot more to implement, its adoption is held back and it is less able to dominate the market. If people aren't collaborating as much as those using rival technologies, that means less innovation and higher learning costs - a longer-term barrier to market adoption.

The most important point to make is about the actual companies that push these standards. Let's say you are Facebook pushing your own standard which, although free, can only be modified and adapted by the Facebook team. That's going to cost resources - at the very least, a developer overseeing it. Maybe a team of evangelists to promote your way of thinking, and a supervisor to manage this team. If you are the sole organisation in charge of something, it's going to cost you (not anyone else) a lot of money.

[Image: Bridge being built at the Hoover Dam]

Compare that to an open community effort, where lots of companies and people pool their resources. Instead of one entity bearing the cost, hundreds of entities share it. On a per-entity basis, it's actually cheaper to create a community-driven standard. And honestly, when you think about it, the reason a company fights over which standard gets implemented has nothing to do with its core strategic objectives. Sure, it might get some marketing out of it (as the Wikipedia page says, "this company created this standard"), but realistically, it rewards the individuals within these companies, who can now put on their resume "I created this technology that everyone is using now".

Why Open wins
In the short run, open doesn't win, because it's a longer process that in part relies on an industry reacting to a proprietary approach. In the long run, Internet history has shown that the above two factors always come to dominate. Why? Because infrastructure is expensive to build and maintain, and usually it's better to pool our efforts to build that infrastructure. You don't want to spend your money on something that's for the public benefit, only to have no one in the public using it - do you, Mr Corporate Vice-President?

Best error message ever (for Data Portability in action)

As we were preparing for the upgrade of the DataPortability Project's website, we realised we needed to close off some of our legacy mailing lists... but we didn't want to lose the hundreds of people already on these mailing lists. So we decided to export the emails and paste them into the new Google group as subscribers.

I then got this error message.

[Image: email permissions]

This has to be one of the best error messages I have ever seen. Yes, I'm happy that I could port the data from a legacy system/group to a new one, using an open standard (CSV). Yes, I was impressed that the Google Groups team supports this functionality (a team which, I am told, is just one Google engineer and completely understaffed). But what blew me away was the fact that Google was able to recognise how to treat these emails.

These particular people have opted not to allow someone to reuse their e-mail address, other than for the intended purpose for which they submitted it (which was to be subscribed to this legacy Group). Google recognised that and told me I wasn't allowed to add them as part of my batch add.

That's Google respecting their users, while making life a hell of a lot easier for me as an administrator of these mailing lists.

I'm happy to be helped out like that, because I don't want to step on any toes. And these people are happy, because they have control of the way their data is used. That's what I call "Awesome".
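I don't know how Google Groups implements this check internally, but the general idea can be sketched in Python. Everything here is an assumption made for illustration: the addresses, the `user_allows_direct_add` preference table, and the `batch_add` function are hypothetical. The exported CSV carries only the addresses, while the destination system holds each person's preference and decides who may be added directly.

```python
import csv
import io

# Hypothetical CSV export from the legacy mailing list.
exported = io.StringIO(
    "email\n"
    "alice@example.com\n"
    "bob@example.com\n"
    "carol@example.com\n"
)

# Hypothetical preferences held by the destination system, not by the admin.
user_allows_direct_add = {
    "alice@example.com": True,
    "bob@example.com": False,
    "carol@example.com": True,
}


def batch_add(csv_file, preferences):
    """Split an import into addresses that were added and addresses refused."""
    added, refused = [], []
    for row in csv.DictReader(csv_file):
        email = row["email"].strip().lower()
        # An address with no recorded preference is refused, to be safe.
        if preferences.get(email, False):
            added.append(email)
        else:
            refused.append(email)
    return added, refused


added, refused = batch_add(exported, user_allows_direct_add)
print("Added:", added)
print("Not added (permission not granted):", refused)
```

The administrator still gets the bulk import done in one step, and the people who opted out keep control over how their address is used.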