Big data is a rather ubiquitous term. It is a lot like sex in high school: everyone is talking about it, everyone claims to be doing it, yet no one really knows what it is all about. In the simplest terms, big data is when someone takes massive amounts of empirical data, often meaningless on its own, and uses it to solve problems, build solutions or sell a service. It promises seemingly endless answers through the modelling of human behaviour and communications.
Big data, as the name suggests, is a collection of data sets so large and complex that it becomes difficult to manage, and therefore to understand, using traditional hands-on database processing and data management tools. Think of it through the four Vs: velocity, volume, veracity and variety. There are currently 18.9 billion internet connections. If each one of them sent a single word identifying its present emotional state, there would be a great deal of sorting to do before any trends emerged or the collective mood of the connected world could be understood. As for volume, it is estimated that 2.5 QUINTILLION bytes of new data are created every day; the New York Stock Exchange alone captures one terabyte of trading information every trading session. As for veracity, poor information costs the US economy nearly $1.3 trillion every single year.
Advocates of this new science claim that big data will help solve some of the world’s most difficult problems: distributing food to the impoverished, easing traffic congestion, allocating emergency services during natural and man-made disasters, and providing a better understanding of the impacts of climate change. If the 20th century was the theoretical century, riddled with the implementation of competing economic theories, then the 21st century will be the empirical century, rooted in the “science” of empirical “big data”.
Like everything technological, the utopianism of societal advancement is met with the protectionism of cyber-sceptics. With last week’s story in the Telegraph that the hospital records of all NHS patients have been sold to insurers, a more nuanced approach to examining the risks and benefits of big data may now be warranted. After all, the Telegraph reported that:
“A report by a major UK insurance society discloses that it was able to obtain 13 years of hospital data – covering 47 million patients – in order to help companies 'refine' their premiums.”
As University of East Anglia Lecturer Paul Bernal points out, this isn’t big data to perform a social good, but rather “to help business to make more money – potentially to the detriment of many thousands of individuals, and entirely without those individuals’ consent or understanding”.
And Bernal is correct. Just this weekend I witnessed a woman arguing quite vehemently with a senior law professor from the London School of Economics over how the NHS was able to sell her data to an insurance company. The professor rightly and firmly pointed out: “You don’t own your own data under the current legal regime.”
Think about that for a second: the data you volunteer to your GP and/or your hospital is not owned by you. And if you tick that little box giving the NHS the ability to protect your data, you have satisfied the “consent” requirement of the Data Protection Act. The DPA governs only how others treat your data when they plan to collect it and once they have received it. While some in the legal profession have been calling for a new “data privacy” right, I am not convinced that this is the proper solution either.
This week I will be looking at the costs, benefits, and legal regime regulating that ever more ubiquitous “big data”. I will leave it up to you to determine whether or not the benefits from big data outweigh the costs of regulating. All I ask is that you keep an open mind. Make it big.