An existential threat: Italy’s ChatGPT ban underscores why consent is fundamental
The problem with generative AI programs like OpenAI’s wildly popular ChatGPT is fundamentally one of user privacy and consent, writes DataGrail’s Daniel Barber as part of The Drum’s Data Deep Dive, The New Data & Privacy Playbook.
OpenAI’s ChatGPT has captured the attention of millions of people all over the world. It’s been thrilling to see all kinds of possibilities to make life easier and more informed.
But modern generative AI is a bit terrifying from a privacy perspective.
You may be thinking, ‘I just want to use ChatGPT to write a blog post – where is the harm in that?’
The underlying issue is one of consent.
Why consent matters
All current generative AI models, whether it’s Stable Diffusion or ChatGPT, are built without the concept of consent. They’re created based on information people didn’t realize would be used for this purpose.
It’s one thing to agree to have your information used to personalize an experience on a website, or even leave a review for a local restaurant in the public domain. It’s something else entirely to have your data factored into a massive algorithm being used for who knows what. No one explicitly agreed to have their creative output, thoughts, ideas or work ingested into these models.
Today, companies like Stability AI (the developer behind Stable Diffusion) and OpenAI build their models by pulling information from the internet (although, in the case of Stable Diffusion, artists can identify used images and opt out via a designated tool). This process is problematic because the modern internet does not have the concept of consent attached to any particular piece of data.
The closest thing we have is the modern copyright system, which wasn't constructed to afford consent but rather to give control to rights holders. As such, billions of data points are being scraped at a remarkable rate in ways that people don’t know about, understand or have the ability to approve.
The issue gets particularly sticky because some identities are getting confused, or in the absence of data, ChatGPT has been known to make up false information about known individuals – sometimes known as ‘hallucination’ in the field – and there is very little recourse. It remains unclear how people can rectify errors about themselves, opening the door for ChatGPT to become the grandest internet troll of all.
This speaks to a larger regulatory issue. In the US, people forget to opt out of sharing their information all the time. They want to use a site or post a review and move on without thinking of potential consequences. What they share is then in the public domain, ripe for large language models (LLMs) like ChatGPT to pull into their training data.
In the EU, it’s a different story; robust privacy legislation provided through the EU’s General Data Protection Regulation (GDPR) mandates that people must opt in to allow access to and sharing of their information.
Generative AI programs under regulatory scrutiny
OpenAI and other companies using LLMs in ways that access personal information are officially at risk. On March 31, Italy ordered ChatGPT to stop processing its citizens’ data, effective immediately.
And Italy is not likely to be the last European country to issue such a ban. Because OpenAI doesn’t have an established legal entity in the EU, it is subject to intervention by any data protection authority under the GDPR, which applies any time an EU citizen’s personal information is processed.
Assuming that OpenAI has processed the data of Europeans in training its LLMs, it’s possible that beyond potential bans, Europe’s various data protection authorities could order such data to be deleted. This would wreak havoc that would affect usage everywhere. The ripple effects would be profound. And the process of deleting data is incredibly complex at such a scale. Even if it were possible, models would need to be retrained or the sign-up process would have to change profoundly.
Another possibility is that companies leveraging ChatGPT could face substantial fines from the EU for processing personal information without consent. This could be a pretty big deal, as GDPR touches about 500 million people, and the EU has grown more aggressive in terms of enforcement.
Italy fired the first shot at ChatGPT, and we’ll see how OpenAI responds in the coming days. But one thing is clear: this issue will not go away any time soon.
The onus falls on companies
Generative AI poses an urgent threat, as ChatGPT and similar competitors proliferate at an astounding rate. More LLMs, fueled by greater amounts of personal information, will make it out into the world before we can fully press pause.
In the US, lawmakers have dragged their feet on federal data privacy legislation for far too long. Congress has to move beyond a myopic focus on TikTok’s privacy risks and expand its horizons. ChatGPT and other generative AI programs, if left unregulated, have the potential to do even greater damage. Regulation is necessary in terms of obtaining consent, determining who is entitled to access that content and establishing guidelines for how to protect identities and intellectual property in the new generative AI world. It’s time for Congress to do its job.
In the interim, businesses themselves must take on the responsibility, particularly those that absorb and house user-generated content. How can they protect their users – or at the very least make them aware of how their data is being used?
Companies must think about how users are affected by consent and generative AI today – or how they could be in the future. Many organizations have made their money by making the content of individual users widely accessible. Now the rules have changed.
Any organization that accepts the argument that nonconsensual uses of user content should be prohibited must recognize the potential harm it does to users by making their data publicly accessible to generative AI models. Companies must figure out what they are going to do, even if it is just a checkbox asking users to consent to having their content fed into LLMs.
Of course, it’s important to point out that generative AI is not intrinsically bad. The technology itself is actually pretty incredible. But consent must be baked into the process, and the models that power it need to be built with the knowledge and agreement of contributors.
Without a doubt, generative AI is going to change society. It’s incumbent on us to ensure that the ways in which it touches lives are net beneficial. Generative AI socializes cost and privatizes benefit, and we have to find a way to equalize that power flow. Time is not on our side.
Daniel Barber is co-founder and chief executive officer at DataGrail. To read more from The Drum’s latest Deep Dive, where we’ll be demystifying data & privacy for marketers in 2023, head over to our special hub.