Government privacy regulations, and calls for companies to better self-govern their data collection, today all revolve around the concept of “personal data.” But defining exactly what “personal data” is will be a major issue in the battle over consumer privacy.
One breakthrough in a new field of privacy protection known as “differential privacy” will reset notions around the concept of personal data. Once again, technology development will outpace regulatory development to ultimately solve the very problem it created.
Ask a consumer what personal data is and he or she will say name, date of birth, address, contact information, and any government identification numbers. Ask GDPR regulators, and they will go so far as to say any information relating to an identified or identifiable person (Internet browsing habits, for example). Ask California regulators trying to figure out enforcement of the upcoming CaCPA legislation, and they will say any data that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.
The general consensus is that “personal data” includes who a person is and any information on their activity and interests that can reasonably be tied back to that particular person. No one wants digital tracking to tie back to real-world identities in a way that allows unauthorized entities to amass a large profile of a specific person.
Now, what if information could be collected but not tied to a specific, identifiable person? In his call to arms to fight for privacy, Apple CEO Tim Cook has demanded that companies “challenge themselves to strip identifying information from customer data.” He recognizes that the more data breaches and leakages occur, the more data tied to identifiable people gets out there, leading to more personal privacy loss and greater digital security vulnerabilities. If those breaches and leakages didn’t include a way to identify the individual, the liability and negative consequences would be greatly mitigated. At minimum, Cook is asking that information be tied to an anonymized ID rather than anything personally identifiable.
But recent advancements in privacy technology have been both positive and negative. Taking anonymized data sets and combining them with other, identifiable data sets (from data breaches, public records, etc.) lets you correlate information to find patterns that may re-identify users in the anonymous data. This de-anonymization technique threatens to turn any people-related data set into personal data.
Despite these risks, sharing data is paramount to achieving better outcomes. For digital advertising, sharing ad exposure data leads to more ad transparency, more ad spend, and, as a result, more ad-supported media. For digital apps, sharing usage data leads to improved software stability and user experience.
It’s no wonder that Apple has been leading the positive developments in privacy by adopting “differential privacy.” When Apple devices ask to collect analytics data from each device owner about events such as crashes and usage, Apple may seem like it is amassing data about individuals, but it isn’t. It now uses differential privacy to mask the identity and identifiable patterns of any particular data contributor while still getting accurate aggregate information across all the devices.
In a simplified example of using a form of differential privacy, if you wanted to research a sensitive topic such as the prevalence of infidelity, you could survey a large group of couples. Any individual in the group might reasonably worry their information could be compromised — even if promised anonymity. So, instead of just directly answering the question, “Have you been unfaithful to your partner?”, the respondent is first asked to flip a coin. If it’s heads, the response should be the truth, but the outcome of the coin shouldn’t be shared. If the coin flips tails, the person needs to flip a second coin; if that one is heads, the response should be “yes.” If the second is tails, it’s “no.”
Because coin tosses come up heads or tails 50 percent of the time over the long run, roughly a quarter of all respondents will answer “yes” purely by chance (tails followed by heads), so the researcher can subtract that share and still roughly estimate how many people actually had affairs. But if the data set leaked, or a bad actor found out that one particular individual answered “yes,” the bad actor would have no idea whether that was because the person cheated or because the coins came up tails and then heads. So even knowing the algorithm wouldn’t compromise any individual’s personal data.
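The survey scheme above can be sketched in a few lines of code. This is an illustrative simulation only, not any company’s production mechanism; the function names and the assumed 20 percent “true rate” are made up for the example:

```python
import random

def randomized_response(truthful_answer: bool) -> bool:
    """One respondent's answer under the two-coin scheme."""
    if random.random() < 0.5:        # first coin: heads -> answer truthfully
        return truthful_answer
    return random.random() < 0.5     # tails -> flip again: heads = "yes", tails = "no"

def estimate_true_rate(responses: list) -> float:
    """Recover the population rate from the noisy answers.

    Observed "yes" rate = 0.5 * true_rate + 0.25 (the forced-"yes" share),
    so true_rate = 2 * (observed_rate - 0.25).
    """
    observed = sum(responses) / len(responses)
    return 2 * (observed - 0.25)

random.seed(0)      # deterministic run for the example
true_rate = 0.20    # assumed prevalence, for illustration only
answers = [randomized_response(random.random() < true_rate)
           for _ in range(100_000)]
print(round(estimate_true_rate(answers), 2))  # close to 0.20
```

Note that no single stored answer reveals anything reliable about its respondent; only the aggregate estimate is meaningful.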
Actual differential privacy algorithms can be much more complicated. However, applying these principles to advertising and data tracking can now enable advertising data collection that doesn’t risk re-identification (i.e., it is not personal data). This approach may usher in the rise of “community gardens,” where advertising data can be shared with advertisers to generate better products and experiences, but personal data isn’t leaked or compromised.
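One standard building block behind those more complicated algorithms is the Laplace mechanism: calibrated random noise is added to an aggregate count so that adding or removing any one person barely changes the published result. The sketch below is a textbook illustration under assumed parameters (epsilon = 1.0), not a description of any specific company’s implementation:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sample from a Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def private_count(true_count: int, epsilon: float) -> float:
    # A counting query changes by at most 1 when any one person is added or
    # removed (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    # epsilon-differential privacy for the released count.
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(1)
# Any single release is deliberately imprecise, but averaged over many
# releases the aggregate stays accurate.
releases = [private_count(1000, epsilon=1.0) for _ in range(10_000)]
print(round(sum(releases) / len(releases)))  # near 1000
```

Smaller epsilon values add more noise and give stronger privacy at the cost of accuracy; choosing that trade-off is where real deployments get complicated.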
While technology has found a way to track users across their digital lives — putting personally identifiable information at risk — it now also offers a solution to protect consumer privacy while enabling an ad-supported Internet that has come to depend on targeting and measurement.
Victor Wong is chief executive of Thunder Experience Cloud