Tag Archive for 'link'

The changing dynamics of news

The recent controversy that has erupted over the firing of Michael Arrington from TechCrunch represents, I believe, an era of innovation led by TechCrunch that we’re only starting to appreciate.

To start this thought experiment, consider how four years ago (meaning, things haven’t changed since) I wrote about the two kinds of content that exist: data, like breaking news or archived news; and culture, which includes analysis (such as editorials) and entertainment (such as satire).

I argue that each content form has unique characteristics that need to be exploited in different ways. Think about that before digesting this blog post, because understanding the product (such as news) shapes the way the market will operate.

Some trends of the past
Over the last two decades, we’ve seen the form (and costs) of news be disrupted dramatically.

It started with hypertext systems that helped humans share knowledge (the most successful hypertext implementation, the world wide web, forever changed the world 20 years ago); then search engines that helped us find information more easily (with Google transforming the world 10 years ago); and content management systems that reduced the cost of publishing to practically zero (with Movable Type and especially WordPress driving this).

While the sourcing of news still relies on unique relationships that journalists can draw on, even that has changed due to social media, which has created a distributed ‘citizen journalism’ world. Related to this is a movement Julian Assange calls “scientific journalism”, where the sourcing of news is now democratised and exposed in its raw form.

Some observations of the present
With that, I’ve noticed two interesting things about the tech news ecosystem, which is helping shape the trends in news more broadly. First, tech bloggers kill themselves to break stories, to the point where blogs like TechCrunch have become cults for those who work there. Second, news aggregators like Techmeme and Hacker News (or Slashdot and Digg before them) have built audiences who are overwhelmed by information overload and crave a filter from a quality editorial voice (the latter being why news personalisation technologies cannot work on their own).

The big secret (that’s not particularly secret due to the abundance of ‘share this’ buttons on webpages) about the news ecosystem is that it’s the aggregators who drive traffic to news outlets that report the news. When you understand that point, a lot of other things become clearer.

Content Aggregation infographic

On the other hand, tech entrepreneurs break their backs in the hope of getting written about on the tech blogs. The reasons vary: credibility so they can recruit talent; exposure so they can raise money; and a belief that they can acquire customers (the whole point of building a startup).

Which leads me to think that, despite the randomness of the observations I’ve listed above, there is a fundamental efficiency evolving in news reporting that may give an insight into the future.

Let’s keep thinking. Other things to consider include:

  • The audience starts with the aggregators for news, where articles with better headlines tend to perform better
  • News in its barest form is awareness of an event (data); anything additional is analysis (culture) that shapes understanding of the event
  • The rise of ‘scientific journalism’ and social media allows society to discover and share information without a third party (thanks to technology tools)
  • Press releases are an invention to communicate a message for reporters to base their writing on; reporters often just copy and paste the words

Some thinking about the future
News should be stripped to its barest form: a description of the event. It should be what we currently consider a “headline”, preferably with a link to the source material. Professional journalists, bloggers, and the rest of the world should therefore compete to break news not on who can write the best prose, but on who can share a one-line summary based on their ability to extract that information (either by happening to be at the event or by having exclusive relationships with the newsmaker). The cost of breaking news should simply be a matter of who can share a link the quickest.

News Article - Wichita Falls Record News

Editorial, which is effectively analysis (or entertainment in some cases) and what blogging has become, should be left to what we now consider “comments”. Readers get to have the “news” coloured for them, based on a managed curation of the top commentators.

Tying this together: imagine a world where anyone could submit “news” and anyone could provide “editorial”: a rolling river of submitted headlines and links, with discussions roaring underneath each item reflecting the interpretation of the masses.

You could argue Twitter has become the first true example of that: most content is in full public view but with a restricted output (140 characters); people can share links with their comments; and the top stories tend to get retweeted, which gains further exposure. Similar things could be said about Digg, Reddit and Hacker News. But these services, along with Twitter (and Facebook), are simply an insight into a future that’s already begun. I think they are just early pioneers before the real solution comes, similar to how Tim Berners-Lee created a hypertext system in a saturated market that then became the standard; Google created a search engine in a saturated market that then became the standard; and WordPress created a blogging platform in a saturated market that then became the standard. Lots of people have tried to innovate in the news ecosystem, but I still don’t think the nut’s been cracked.

News has a lot of value, but the value differs based on who breaks it and who interprets it. For example, when I fire up some of my favourite aggregators, I tend not to click on the original headline but on brands I like, so as to read their take on the event (though when I’m looking deeply into something, I dig for the source material). But the problem with news now is that there is a fundamental disruption of the cost structures supporting it: the economics favour those who break the news, with those who interpret it suffering, as traditionally both roles were considered one function. Something’s going on, and the answer is cheaper production, faster distribution and more of a decentralised effort across society, not the self-appointed curators.

While the newspaper industry is collapsing, something more fundamental is happening with news and we’re simply in the eye of the storm. Stay tuned.

The new magazine

The Facebook homescreen is a remarkable thing. I just saw a video of a friend throwing food at birds; relatives taking pictures of themselves in a hot tub; a link to a mind-expanding article; and a status message that made me laugh. It made me think: the homescreen is the new magazine.

Sure, we can be simplistic about this and say lots of pictures and content makes a magazine. But what strikes me as fascinating is how much personal content is shared: people’s thoughts, insights into their lives, and the real-time autobiographical dictation by our “friends”. It makes me think of the fascination people have with celebrities, and how gossip magazines are some of the highest grossing of their kind. The same phenomenon is being exploited here: people want to know more about people they know. While with celebrities you could say people do it out of a fixation on celebrity status and looks, I would argue the reason gossip magazines are so popular is curiosity about the lives of people who are familiar. People would be equally fascinated with a magazine about their neighbours as with one about celebrities, if it were practical.

It’s almost like Facebook’s homescreen is the new media version of a publication. But of your friends. And like a glossy magazine. Of original content from otherwise hard-to-obtain situations.

Or more practically speaking, like a gossip magazine of your neighbours.

Why the angel bubble is not a bubble but actually the missing link

Naval Ravikant has written a thought-provoking post on the growing “angel bubble”. His thesis is that there is no bubble, because the total amount of money being invested in venture hasn’t increased. What’s changed, he claims, is simply that instead of bigger Venture Capital (VC) rounds that are fewer in number, we’re seeing smaller but far more numerous angel investments. In other words, the VC industry — not the Federal Reserve — is the one that should be worried about this “bubble”.

I actually think what’s happening is that the market is now more resistant to bubbles. Contrary to a previous post of mine where I hypothesised a seed investment bubble (which I’ve since reconsidered, and will explain later in this post), the angel “bubble” is an externality of one simple fact: it’s now a lot cheaper to build a startup. To understand this, watch the presentation Naval gave a few months ago, which is the best explanation of this trend I’ve seen to date.

As a consequence, angel investment has now become (and rightfully so) the dominant way to fund a startup, with the existing VC model being relegated to a later-stage role.

Why is this a good thing? Well, first of all, a lot more startups are being funded, but with the same amount of money in the economy. Statistical theory suggests this alone is a good thing for the economy, as there is a higher probability of home runs. By spreading risk across more bets, there’s a better opportunity to generate returns.

But something more important is happening. VCs now have better-qualified businesses to invest in. The huge amounts of capital they invest will now be deployed after having seen a more advanced startup’s potential, pushed to that stage by the seed accelerators or angels that covered its startup costs.

What I mean is that by the time a company gets to VC, it will no longer be a startup (a business searching for a business model) but a high-growth business executing on its newly discovered, high-potential business model. The VC firms are no longer needed in the business of starting something in information technology; they are now purely in the business of growing a business (an area some of the larger funds already focus on exclusively). And the capital they put at risk on behalf of the endowments and pension funds that gave them that money now carries a lower risk while chasing higher returns.

Better still, the VC funds can focus on the future of technology, like clean energy, biotechnology, and nanotechnology: industries that are what information technology was in the 1970s, with high startup costs and a low chance of return.

And while that’s all well and good for the VCs, this new funding lifecycle actually opens up opportunities for returns for everyone (which is why this isn’t a bubble). The seed accelerators and angels can pass the baton and exit their investments to better-capitalised groups like the VCs, allowing them to focus on the earlier stage of the market. With the IPO market dead since the introduction of the Sarbanes-Oxley legislation, tech has relied on acquisitions as the sole form of return. But with earlier-stage investors like the angels getting exits to VCs, and the VCs having better-qualified businesses they can grow to a large IPO, we may actually see the IPO market reopen.

All in all, that’s not a bubble: that’s efficiency and rejuvenation. The angel “bubble” isn’t a bubble but a maturing and evolution of the technology ecosystem. This is actually the missing link in efficient information technology being built: the link that connects the super-highways of the economy to sustainable growth and value, not a bubble.

How to piss your customers off – a lesson courtesy from eBay

I get e-mails from companies. Sometimes I request them, but on the whole I always tick the option “please do NOT send me promotional material”. So when I receive e-mails from companies, I give them the benefit of the doubt that it was my error, although this is extremely generous because I know I never allow them to send communications beyond what I need. The fact that I am getting an e-mail from them already has me tense.

So if a company is going to send me promotional e-mails, I expect courtesy because they are taking up my time. Note to companies about how not to do it:

ebay

“…to change your communication preferences, log into eBay…” and click through the barrage of poor-usability options to find the hidden box that lets you stop being spammed. After all, a one-click unsubscribe option, or even a link to where you need to go, makes it more likely that you would unsubscribe, so we adopt a model of trying to discourage you, because we know most people can’t be bothered to act and would rather delete each e-mail than stop it at the source. Hey, marking us as ‘spam’ or deleting each incoming e-mail is a better option for us, because the more names we have on our mailing list as ‘receiving’, the more warm and fuzzy the marketing director feels about having distribution outlets for campaigns, even though we know you don’t read them.

“Please note it may take us up to 10 business days to process your request”: because it takes 10 microseconds to do technologically, but we are a bunch of losers who hope you’ll forget you tried unsubscribing, and we will send follow-up e-mails in the meantime hoping to win you back, because we refuse to accept that we screwed up and ruined our relationship with you.

Bloglines to support APML

Tucked away in a post by one of the leading RSS readers in the world, Bloglines announced that it will investigate how to implement APML in its service. The thing about standards is that, as fantastic as they are, if no one uses them they are not a standard. Over the last year, dozens of companies have implemented APML support, and this latest announcement by a revitalised Bloglines team, set on taking back what Google took from it, means we are going to see a lot more innovation in an area that has largely gone unanswered.

The announcement has been covered by Read/WriteWeb and APML founders Faraday Media, and a thoughtful analysis has been done by Ross Dawson. Ben Metcalfe has also written a thought-provoking analysis of the merits of APML.

What does this mean?

APML is about taking control of the data that companies collect about you. For example, if you are reading lots of articles about dogs, RSS readers can make a good guess that you like dogs – and will tick the “likes dogs” box on the profile they build of you, which they use to determine advertising. Your attention data is anything you give attention to: when you click on a link within Facebook, that’s attention data that reveals things about you implicitly.

The big thing about APML is that it solves a massive problem when it comes to privacy. If you look at my definition of what constitutes privacy, the ability APML gives you to control what data is collected completely fits the bill. I was so impressed when I first heard about it – because it’s a problem I had been thinking about for years – that I immediately joined the APML workgroup.

Privacy is the inflation of the attention economy, and companies like Google are painfully learning about the natural tension between privacy and targeted advertising. (Targeted advertising being the thing Google is counting on to fund its revenue.) The web has seen a lot of technological innovation, which has disrupted a lot of our culture and society. It’s time the companies that are disrupting the world’s economies started innovating to answer the concerns of the humans using their services. Understanding how to deal with privacy is a key competitive advantage for any company in the Internet sector. It’s good to see some finally realising that.

Don’t get the Semantic Web? You will after this

Prior to 2006, I had sort of heard of the Semantic Web. To be honest, I didn’t know much – it was just another buzzword. I had been hearing about Microformats for years, and cool but useless initiatives like XFN. To me it was simply another web thing being thrown around.

Then in August 2006, I came across Adrian Holovaty’s article arguing that journalism needs to move from a story-centric world to a data-centric world. And that’s when it dawned on me: the Semantic Web is serious business.

I have since done a lot of reading, listening, and thinking. I don’t profess to be a Semantic Web expert – but I know more than the average person, having (painfully) put myself through videos and audio of academic types who confuse the crap out of me. I’ve also read through a myriad of academic papers from the W3C, which are like those times when you read a novel and keep re-reading the same page and still can’t remember what you just read.

Hell – I still don’t get everything. But I get the vision, and that’s what I am going to share with you now. Hopefully my understanding will benefit the clueless and the skeptical alike, because it’s a powerful vision that is entirely possible.

1) The current web is great for humans; useless for machines
When you search for ambiguous terms, at best, search engines can algorithmically predict some sort of answer that partially addresses your query. Sometimes not even that. The complexity of language is not something engineers can simply engineer around. After all, without the ambiguity of natural languages, poetry couldn’t exist.

Fine.

What did you think when you read that? As in: “I’ve had it – fine!”, another way of saying OK or agreeing with something. Perhaps you thought about the parking ticket I just got – illegal parking gets you fined. Maybe you thought I was applauding myself, saying that was one fine piece of wordcraft I just wrote, as in a fine wine.

Language is ambiguous, and depending on the context of the surrounding words, we can determine the meaning of a word. Search start-up Powerset, which is hoping to kill Google and rule the world, is employing exactly this technique to improve search: intelligent processing of words depending on context. So when I type “it’s a fine”, it understands from context that it’s a parking ticket, because you wouldn’t say “it’s a” in front of ‘fine’ when using it to agree with something (the ‘OK’ meaning above).
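To make the idea concrete, here is a toy sketch of context-based disambiguation. This is not Powerset’s actual technology (which is far more sophisticated and uses statistical language models); the cue words below are invented purely for illustration.

```python
# Toy word-sense disambiguation: pick a sense of "fine" from the words
# around it. Real systems learn these cues; here they are hand-written.

SENSES = {
    "penalty": ["a", "parking", "pay", "ticket"],   # "it's a fine"
    "agreement": ["ok", "alright", "everything"],   # "fine!" as assent
    "quality": ["wine", "piece", "very"],           # "a fine wine"
}

def sense_of_fine(context_words):
    """Score each sense by how many of its cue words appear in context."""
    scores = {sense: sum(w in context_words for w in cues)
              for sense, cues in SENSES.items()}
    return max(scores, key=scores.get)

print(sense_of_fine(["it's", "a", "parking"]))  # → penalty
```

Crude as it is, this is the shape of the trick: the surrounding words, not the word itself, carry the meaning.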

But let’s use another example: “Hilton Paris” in Google – the world’s most ‘advanced’ search engine. Obviously, as a human reading that phrase, you understand from the context of those words that I would like to find information about the Hilton in Paris. Well, maybe.

Let’s see what Google comes up with. Of the ten search results (as of when I wrote this blog posting), one was a news item on the celebrity; six described the celebrity in some shape or form; and three were about the actual hotel. Google, at 30/70, is a little unsure.

Why is Paris Hilton, that blonde haired thingy of a celebrity, coming up in the search results?

Technologies like Powerset apparently produce a better result because they understand the order of the words and the context of the search query. But the problem with these searches isn’t just the interpretation of what the searcher wants – it’s also the ability to understand the actual search results. Powerset can only interpret so much of the gazillions of words out there. There is the whole problem of the source data, not just the query. Don’t get what I mean? Keep reading. But for now, learn this lesson:

Computers have no idea about the data they are reading. In fact, Google pumping out those search results is based on people linking. Google is a machine, and reads 1s and 0s – machine language. It doesn’t get human language.

2) The Semantic Web is about making what humans read machine-readable
Tim Berners-Lee, the guy who invented the World Wide Web and the visionary behind the Semantic Web, prefers to call it the ‘data web’. The current web is a web of documents; by adding extra data to content, machines will be able to understand it. Metadata is data about data.

A practical outcome of having a semantic web is that Google would know, when it pulls up a web page – regardless of the context of the words – what the content actually is. Think of every word on the web being linked to a master dictionary.

The benefit of the semantic web is not for humans – at least not immediately. The Semantic Web is actually pretty boring in what it does; what is exciting is what it will enable. Keep reading.

3) The Semantic web is for machines to interpret, not people
A lot of skeptics of the semantic web don’t see the value of it. Who cares about adding all this extra metadata? I mean, heck – Google still got me the website I needed – the Hilton in Paris. Sure, the other 70% of the results on that page were irrelevant, but I’m happy.

I once came across a Google employee who asked, “What’s the point of a semantic web; don’t we already have enough metadata?” To some extent, he’s right – some websites out there do have metadata. But the point of the semantic web is that once machines read the information, they can start thinking like a human would and connect it to other information. There needs to be metadata across the board.

For example, my friend Michael was recently looking to buy a car. A painful process, because there are so many variables: so many different models, makes, dealers, and packages. We have websites with cars for sale neatly categorised into profile pages saying what model a car is, what colour it is, and how much it costs (which, may I add, are hosted on multiple car sites with different types of profiles). A human painfully reads through these profiles, and computes as fast as a human can. But a machine can’t read these profiles.

Instead of wasting his (and my) weekends driving around Sydney to find his car, a machine could find it for him. Mike would enter his profile – what he requires in a car, what his credit limit is, what his prior history with cars is – everything that would affect his judgement of a car. Then the computer could query every website with cars online to match the criteria. Because the computer can interpret these websites across the board, it can evaluate the options and go back to Michael and say: “this is the car for you, at this dealer – click yes to buy”.
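The car scenario above can be sketched in a few lines. Assume (hypothetically) that every dealer published its listings as structured records rather than free-form pages; the fields, prices, and dealer names below are all made up for illustration.

```python
# Once listings are structured data, a machine can query them all at once
# instead of a human reading profile pages one by one.

listings = [
    {"model": "Corolla", "colour": "blue", "price": 18000, "dealer": "A"},
    {"model": "Civic",   "colour": "red",  "price": 22000, "dealer": "B"},
    {"model": "Corolla", "colour": "red",  "price": 16500, "dealer": "C"},
]

def find_cars(listings, max_price, colour=None):
    """Return listings matching the buyer's criteria, cheapest first."""
    matches = [c for c in listings
               if c["price"] <= max_price
               and (colour is None or c["colour"] == colour)]
    return sorted(matches, key=lambda c: c["price"])

best = find_cars(listings, max_price=20000, colour="red")[0]
print(best["dealer"])  # → C
```

The hard part, of course, is not the query – it’s getting every car site to expose its data in a form a machine can read, which is exactly what the semantic web proposes.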

The semantic web is about giving computers the information they need to interpret data, so they can do what they do really well: compute.

4) A worldwide database
What Berners-Lee essentially envisions is turning the entire world wide web into a database that can be queried. Currently, the web looks like a Microsoft Word document – one big slab of text. But if that text were neatly categorised in an Excel spreadsheet, you could manipulate the data and do what you please with it – create reports, reorder it, filter it, and do whatever else to your heart’s content.

At university, I was forced to do an Information Systems subject which was essentially about the theory of databases. Damn painful. I learned only two things from that course. The first was that my lecturer, tutor, and classmates spoke less intelligible English than a caterpillar. The second was what information is and how it differs from data. I am now going to share that lesson with you, and save you three months of your life.

You see, data is meaningless. For example, 23 degrees is data. On its own, it’s useless. Another piece of data is Sydney. Again – useless. I mean, you can think all sorts of things when you think of Sydney, but on its own it doesn’t have any meaning.

Now put 23 degrees and Sydney together, and you have just created information. Information is about creating relationships between data. By creating a relationship – an association – between these two different pieces of data, you can determine that it’s going to be a warm day in Sydney. And that is what information is: relationship building; connecting the dots; linking the islands of data together to generate something meaningful.

The semantic web is about allowing computers to query the sum of human knowledge like one big database, to generate information.
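The data-versus-information lesson fits in a few lines of code. The function below is purely illustrative: it relates two otherwise meaningless pieces of data (a number and a place) into a statement you can act on.

```python
# Data on its own is meaningless; relating pieces of data creates
# information. The 20-degree "warm" threshold is an arbitrary choice.

def describe(place, temperature_c):
    """Relate two pieces of data into one piece of information."""
    verdict = "warm" if temperature_c >= 20 else "cool"
    return f"a {verdict} day in {place}"

print(describe("Sydney", 23))  # → a warm day in Sydney
```

Neither `23` nor `"Sydney"` tells you anything alone; the relationship between them does. The semantic web aims to make such relationships explicit at the scale of the whole web.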

Concluding thoughts
You are probably now starting to freak out, picturing “Terminator” images of computers suddenly erupting from under your desk and smashing you against the wall as a battle between humans and computers begins. But I don’t see it like that.

I think about the thousands of hours humans spend trying to compute things. I think of cancer research, where all the experimentation occurring in labs is trying to connect new pieces of data with old data to create new information. I think about computers being able to query the entire taxation legislation to make sure I don’t pay any tax, because they know how it all fits together (having studied tax, I can assure you – it takes a lifetime to understand even a portion of tax law). In short, I understand the vision of the Semantic Web as a way of linking things together to enable computers to compute – so that I can sit in my hammock drinking my beer, delegating the duties of my life to the machines.

All the semantic web is trying to do is make sure everything is structured in a consistent manner, with a consistent dictionary behind the content, so that a machine can draw connections. As Berners-Lee said in one of the videos I watched: “it’s all about creating links”.

The process to a Semantic Web is boring. But once we have those links, we can then start talking about those hammocks. And that’s when the power of the internet – the global network – will really take off.

John Hagel – What do you think is the single most important question after everything is connected?

I was recently pointed to a presentation by John Hagel, a renowned strategy consultant and author on the impact the Internet has on business. He recently joined Deloitte and Touche, where he will head a new Silicon Valley research institute. At the conference (Supernova 2007), John outlined critical but unresolved research questions regarding the future of digital business, which revolved around the following:

What happens after everything is connected? What are the most important questions?

I had to watch the video a few times, because it’s not possible to capture everything he says in one hit. So I started writing notes each time, which I have reproduced below to guide your thoughts and give a summary as you watch the presentation (which I highly recommend).

I also have discovered (after writing these notes – damn it!) that he has written his speech (slightly different however) and posted it on his blog. I’ll try and reference my future postings on these themes here, by pinging or adding links to this posting.
Continue reading ‘John Hagel – What do you think is the single most important question after everything is connected?’

On the future of search

Robert Scoble has put together a video presentation on how Techmeme, Facebook and Mahalo will kill Google in four years’ time. His basic premise is that SEOs who game Google’s algorithm are as bad as spam (and there are some pissed-off SEO experts waking up today!). People like the ideas he introduces about social filtering, but on the whole they are a bit more skeptical about his world-domination theory.

There are a few good posts, like Muhammad’s, on why the combo won’t prevail – but on the whole, I think everyone is missing the real issue: the whole concept of relevant results.

Relevance is personal

When I search, I am looking for answers. Scoble uses the example of searching for HDTV and notes the top manufacturers as something he would expect at the top of the results. For him, that’s probably what he wants to see – but I want to be reading about the technology behind it. What I am trying to illustrate is that relevance is personal.

The argument for social filtering is that it makes results more relevant. For example, with a bunch of my friends associated with me on my Facebook account, an inference engine can determine that if my friend A is also friends with person B, who is friends with person C, then something I like must also be something person C likes. When it comes to search results, that sort of social/collaborative filtering doesn’t work, because relevance is complicated. The only value a social network can provide is whether content is spam or not – a yes-or-no type of answer – and that assumes someone in my network has come across the content. Just because my social network can (potentially) help filter out spam doesn’t make the search results higher quality. It just means fewer spam results. There is plenty of content that may be on-topic but might as well be classed as spam.
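The friend-of-friend inference described above can be sketched naively, to show both how it works and why it is such a blunt instrument. The graph and names here are invented; real engines weight edges rather than treating all friendships equally.

```python
# Naive social-filtering assumption: tastes propagate along friendship
# links, so anyone within a few hops of me "shares" my relevance.

friends = {
    "me": {"A"},
    "A": {"me", "B"},
    "B": {"A", "C"},
    "C": {"B"},
}

def reachable(start, hops):
    """Everyone within `hops` friendship links of `start`."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        frontier = {f for p in frontier for f in friends.get(p, set())} - seen
        seen |= frontier
    return seen - {start}

# The engine assumes my tastes matter to anyone within 3 hops:
print(sorted(reachable("me", 3)))  # → ['A', 'B', 'C']
```

Notice the blunt yes-or-no nature of the output: person C is either in my extended network or not. Nothing here says whether C actually shares my interest in HDTV technology, which is the point of the paragraph above.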

Google’s algorithm essentially works on the popularity of links, which is how it determines relevance. People can game this algorithm, because someone can make a website popular to manipulate rankings, through linking from fake sites and other optimisations. Google’s PageRank algorithm assumes that relevant results are, at their core, purely about popularity. The innovation the Google guys brought to the world of search is something to be applauded, but the extreme lack of innovation in this area since just shows how hard it is to come up with new ways of determining relevance. Popularity is a smart proxy for relevance (because most people would like the result) – but since it can be gamed, it no longer is.
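To see why link popularity is gameable, it helps to look at the mechanism itself. Below is a bare-bones power-iteration sketch of PageRank (not Google’s production algorithm, which adds many refinements): a page’s rank is fed by the ranks of pages linking to it, so manufacturing incoming links manufactures rank.

```python
# Minimal PageRank by power iteration, damping factor 0.85 as in the
# original Brin/Page formulation. `links` maps a page to its outlinks.

def pagerank(links, iterations=50, d=0.85):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)
                for q in outs:          # each page passes rank to its outlinks
                    new[q] += share
            else:                        # dangling page: spread rank evenly
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Everyone links to "hub", so it ends up with the highest rank:
links = {"a": ["hub"], "b": ["hub"], "hub": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # → hub
```

Swap the genuine pages `a` and `b` for a farm of fake sites and you have the gaming problem in miniature: the algorithm cannot tell an earned link from a manufactured one.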

The semantic web

I still don’t quite understand why people don’t realise the potential of the semantic web, something I go on about over and over again (maybe not on this blog – maybe it’s time I did). But if anything is going to change search, it will be that – because the semantic web will structure data, moving away from the document approach that web pages represent and towards the data approach of a database table. It may not be able to make results more relevant to your personal interests, but it will better understand the sources of data that make up the search results, and can match them up to whatever constructs you present it with.

Like Google’s PageRank, the semantic web will require humans to structure data, from which a machine will then make inferences – similar to how PageRank makes inferences based on the links people make. However, Scoble’s claim that humans can overtake a machine is silly: yes, humans have a much higher intellect and are better at filtering, but they can in no way match the speed and power of a machine. Once the semantic web gets into full gear a few years from now, humans will have trained the machine to think – and it can then do the filtering for us.

Human intelligence will be crucial for the future of search – but not in the way Mahalo does it, which is like manually categorising pieces of paper into a filing cabinet. That is not sustainable – a bit like how, when the painters of the Sydney Harbour Bridge finish painting it, they have to start all over again because the other side is already starting to rust. Once we can teach a machine that, for example, a dog is an animal that has four legs and makes a sound like “woof”, the machine can then act on our behalf, like a trained animal, and go fetch what we want; how those paper documents are stored becomes irrelevant, because the machine can do the sorting for us.
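The “teach it once, query it forever” idea above can be sketched as a toy knowledge base of subject–predicate–object facts. The facts and predicate names are invented for illustration; real semantic-web systems use RDF triples and formal ontologies, but the principle is the same.

```python
# Teach the machine facts once; it can then answer by following links
# between facts instead of re-sorting documents each time.

facts = {
    ("dog", "is_a", "animal"),
    ("dog", "legs", 4),
    ("dog", "sound", "woof"),
    ("animal", "is_a", "living_thing"),
}

def is_a(thing, category):
    """Follow is_a links transitively: a dog is also a living thing."""
    if (thing, "is_a", category) in facts:
        return True
    parents = {o for s, p, o in facts if s == thing and p == "is_a"}
    return any(is_a(parent, category) for parent in parents)

print(is_a("dog", "living_thing"))  # → True
```

Nobody ever told the machine directly that a dog is a living thing; it inferred that by chaining two taught facts – which is precisely the kind of connection-drawing the semantic web is meant to enable at web scale.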

The Google killer of the future will be the people who can convert the knowledge on the world wide web into information readable by computers, to create this (weak) form of artificial intelligence. Now that’s where it gets interesting.

Google: the ultimate ontology

A big issue with the semantic web is ontologies – the use of consistent definitions for concepts. For those who don’t understand what I’m talking about: the next evolution of the web is about making content readable not just by humans but also by machines. For a machine to understand something it reads, however, it needs consistent definitions. Humans, for example, are intelligent – they understand that the word “friend” is related to the word “acquaintance” – but a computer would treat them as two different things. Or would it?

Casually looking at some of my web analytics, I noticed some people landed on my site by doing a Google search for how many acquaintances people have, which took them to a popular posting of mine about how many friends people have on Facebook. I’ve had a lot of visitors because of that posting, and it’s been an interesting case study for me on how search engines work. But today was different: I found the word acquaintance weird. I knew I hadn’t used that word in my posting – and when I went to the Google cache I realised something interesting: because someone linked to me using that word, the search engine associated my ‘friends’ posting with ‘acquaintances’.

acquaintances

Google’s linking mechanism is one powerful ontology generator.