On the right of my blog as I write this, I have a widget: a simple piece of JavaScript from the company Feedjit that lets me embed a short snippet of code showing my readers how other people find my blog. Since its launch, the widget appears to have become very popular, with the company’s website claiming 60 million widgets.
I made a discovery today almost by accident when I accessed my blog on another computer. Or rather, I accessed my blog via Google’s cache, which replicates my content for its search results, widgets and all. When you look at the Feedjit widget there (image below left), the data is very different: it no longer shows visitors to my blog, but visitors to Google’s servers.
If you click through to the detailed statistics, you can even see the most popular sites of the day, as well as the locations of the visitors. Because this data comes from the Google cache server, you are effectively getting an analysis of Google’s visitors: who they are, what keywords they searched for, and what they found. So because my blog sits in Google’s cache, I can effectively sneak in through the back door of Google’s data.
(From a quick look, this URL seems to be the main Google cache address; however, data is only logged when someone actually views the cached page.)
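To make the mechanism concrete, here is a minimal, hypothetical sketch of the kind of logic a referrer-based visitor widget might run. It is not Feedjit’s actual code: the function names `extractSearchKeywords` and `describeVisit`, the `expectedHost` parameter and the example hostnames are my own assumptions. It assumes the widget sees the page address and the referrer (in a browser, `location.href` and `document.referrer`) and that Google search URLs carry the query in a `q` parameter. The point it illustrates is that the script simply reports wherever it happens to be running, so on a cached copy it reports Google’s server rather than my blog.

```javascript
// Hypothetical sketch only: NOT Feedjit's real code.
// Guess the search keywords a visitor used, assuming the referrer is a
// Google-style search URL with the query in the "q" parameter.
function extractSearchKeywords(referrer) {
  let url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null; // empty or malformed referrer (e.g. a direct visit)
  }
  // Treat hosts such as www.google.com or google.co.uk as search engines.
  const isGoogle = /(^|\.)google\.[a-z.]+$/.test(url.hostname);
  if (!isGoogle) return null;
  const q = url.searchParams.get("q"); // "+" in the query decodes to a space
  return q ? q.trim() : null;
}

// Hypothetical record a widget might send home for one page view.
// pageUrl is wherever the script is running; expectedHost is the blog's
// own hostname (an assumed parameter, for illustration).
function describeVisit(pageUrl, referrer, expectedHost) {
  const host = new URL(pageUrl).hostname;
  return {
    servedBy: host,                      // the server that served this copy
    onOriginalSite: host === expectedHost,
    keywords: extractSearchKeywords(referrer),
  };
}
```

In a browser, a widget would call something like `describeVisit(location.href, document.referrer, "example-blog.com")`. Nothing in the script stops it from running, and reporting, on a copy of the page hosted somewhere else.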
Does it matter?
While it is easy to treat this as a fun curiosity and move on, I think it raises several serious issues.
On widgets: with the proliferation of widgets on the web, could they become the web’s next big security risk?
On privacy: it’s not that hard to identify the people making those searches. Search engines handing over data to the government has been a hot issue, with Google’s resistance becoming a much-hyped story as the company tried to show that it protected its users. With the growing cross-pollination of the web, exemplified by widgets, are we prepared for what it means to have open data (which is becoming inevitable)?
On metrics: Google has a complete download of my blog in its cache, but what I didn’t realise is that it is a copy of the full blog, including scripts like my web stats. My statistics, for example, show an awful lot of activity from bots. Is this because every time Google, Yahoo or MSN analyse content ripped from my site, I can actually see what they are doing behind their closed walls?
These are questions with answers that are both simple and complicated. Either way, if it’s that easy to hack even Google, then God help us.