Yesterday, after attending a seminar on Saturday evening at the London School of Economics on privacy and its relationship with technology, I introduced (or re-introduced) the concept of “big data” to help shed light on the legal issues that come from managing large data sets. I make no apologies for my support of the use of technology to help bring about a better society, but as I pointed out in my article, helping insurance companies achieve better premiums isn’t exactly what I had in mind when I argued that big data will help bring about an “empirical century” based on data science and information rather than theories and guesswork. So when the NHS sells patient data to a consulting company, this is the type of action where we need privacy advocates to roar in disapproval, and they will have my full support.
It emerged this week that a private company called PA Consulting bought the entire England and Wales health service database and uploaded it to Google’s proprietary BigQuery database service. The data filled nearly 27 DVDs and took nearly two weeks to upload to the servers, which incidentally were hosted in non-EU jurisdictions. The NHS dataset included each patient's NHS number, post code, address, date of birth and gender, as well as all their inpatient, outpatient and emergency hospital records [Update: PA Consulting Group says this is not the case. See its response below]. There was no easy way to opt out of the service. In fact, it wasn’t feasible to do so until members of the public created a fax service that people could send their requests to.
Now some of that data has been made available online. Pseudonymised Hospital Episode Statistics (HES) data, which is collected about patients when they visit hospital and includes patient age, gender, ethnicity, diagnoses, operations, time waited and so on, was made publicly accessible via an online tool created by Earthware, a commercial real estate web mapping company. The company, which offers services including property data, claimed the tool allowed users to locate areas in England where a single individual had gone for specialised treatment. Data protection rules prohibit the release of information from which a person may be readily identifiable, even if the person is not readily identifiable by name.
On Monday night, the Health and Social Care Information Centre (HSCIC) said the site’s tool had been taken down and an investigation had been launched into how the company had come into possession of the data. Earthware does not appear to have been cleared to use the data.
A spokesperson for the HSCIC said: "The link to this tool has been taken down following a request by the HSCIC. We are investigating urgently the source of the data used and whether controls demanded of any organisation using data have been maintained. After this investigation we will take any necessary action."
A statement released by Earthware added that the company was "confident that we have not breached any legal or regulatory rules regarding the licensing or publication of [Hospital Episode Statistics] data".
Questions still remain over how PA Consulting was able to get a hold of the dataset from HSCIC. The HSCIC said: "PA Consulting used a product called Google BigQuery to manipulate the datasets provided and the NHS IC was aware of this. The NHS IC had written confirmation from PA Consulting prior to the agreement being signed that no Google staff would be able to access the data; access continued to be restricted to the individuals named in the data sharing agreement."
In a statement, PA Consulting Group said it had purchased the data from a predecessor of the HSCIC: "The data set does not contain information linked to specific individuals. The information is held securely in the cloud in accordance with conditions specified and approved by HSCIC."
Update: PA Consulting Group has contacted The Drum to state that the data it has does not contain patient name, address, NHS number or date of birth. It also wants to clarify that it acquired the data it does have after signing a data sharing agreement to gain access to the Hospital Episode Statistics dataset from the Health and Social Care Information Centre. You can read more detail here.