
Will OpenAI and other LLM developers be able to weather the winds of privacy regulation?

By Kendra Barnett, Associate Editor

April 30, 2024 | 10 min read

AI developers are facing a growing number of legal challenges and enforcement complaints over alleged violations of consumer privacy rights. Whether they’ll change their practices, however, remains to be seen.

LLMs are increasingly coming under fire for their data privacy practices / Adobe Stock

OpenAI is facing a new complaint that its uber-popular generative AI platform ChatGPT has violated the EU’s sweeping consumer privacy law, the General Data Protection Regulation (GDPR), by ‘hallucinating’ – or producing inaccurate information – about private citizens.

The complaint, lodged by nonprofit group Noyb (founded by Austrian privacy advocate Max Schrems, who has helped pioneer rules around EU-US data transfers), was filed with the Austrian Data Protection Authority (DPA) on Monday on behalf of an individual complainant. The individual, an unnamed person identified only as a “public figure” by Noyb, alleges that ChatGPT produced an inaccurate birthdate for them.

The GDPR guarantees EU residents a handful of rights regarding their personal data, including the right to have false or inaccurate data about them corrected. Noyb argues that OpenAI is failing to fulfill this requirement, saying that OpenAI rejected a request to fix the private citizen’s incorrect birthdate on the grounds that it was technically impossible.

“Factual accuracy in large language models remains an area of active research,” OpenAI reportedly told Noyb.

As Noyb wrote in a blog post published Monday: “In the EU, the GDPR requires that information about individuals is accurate and that they have full access to the information stored, as well as information about the source. Surprisingly, however, OpenAI openly admits that it is unable to correct incorrect information on ChatGPT. Furthermore, the company cannot say where the data comes from or what data ChatGPT stores about individual people. The company is well aware of this problem but doesn’t seem to care.”

It’s not the first time that ChatGPT and other generative AI platforms have been criticized for allegedly violating consumer data privacy legislation.

In September, a class action lawsuit filed against OpenAI and its largest investor, Microsoft, alleged that the companies train their AI models on illegally acquired personal data belonging to hundreds of millions of internet users. Clearview AI, meanwhile, is facing a lawsuit for allegedly violating Illinois’s Biometric Information Privacy Act by scraping billions of images from the web for its facial recognition software without the consent of the people depicted. A number of other AI companies – and consumer companies employing AI technology – are also fielding legal challenges over data privacy concerns.

As it stands, 97% of attorneys indicate some level of concern about data privacy in light of the proliferation of generative AI, according to a recent survey from Bloomberg Law.

Individuals are also voicing their concerns to DPAs across various jurisdictions. One such complaint was filed in January by Lukasz Olejnik, an independent researcher, consultant and co-author of the 2023 book Philosophy of Cybersecurity, who has claimed that he found ChatGPT not only producing erroneous data about him but going so far as to attribute fictitious works to him.

Olejnik’s complaint alleges that the model violates several of the GDPR’s rules: the principle of transparent and fair data processing; the rights permitting EU residents to access and rectify inaccuracies in their personal information; the security of processed data; and ‘privacy by design’ requirements.

He’s pleased that others, like Noyb, are also raising concerns with Europe’s DPAs. “I welcome another complaint joining my existing one concerning the same aspects of LLMs and ChatGPT,” he tells The Drum. “Those data protection issues are real and they raise aspects of human dignity.”


And OpenAI and other LLM developers won’t shake off scrutiny around their privacy practices anytime soon, experts say.

“The GDPR focuses on rights surrounding personal data and is deliberately designed to be a technology-neutral legislative instrument. However, the GDPR was developed at a time when generative AI as we know it was not foreseen,” says Anne Flanagan, vice-president of artificial intelligence at the Future of Privacy Forum, a privacy-focused Washington, DC-based thinktank. “Advances in technology will continue to challenge the interpretation of the law and we will likely continue to see regulators respond by leaning into the spirit of the GDPR when issuing guidance.”

The issue at the heart of Noyb’s GDPR complaint – the problem of hallucination and EU citizens’ right to correct erroneous data about themselves – may prove especially challenging, however.

“Due to the complexity of generative AI models, rectification and deletion [of data about individuals] are particularly difficult,” explains Daniel Barber, the co-founder and CEO of DataGrail, a data privacy software firm.

In Barber’s view, protections around data corrections will remain at the center of generative AI’s privacy problems, even if AI developers get better at demonstrating how and what kind of data they process and use to train their models. “LLMs may need to provide confidence ratings on data retrieved, but EU rectification rights will continue to be in question,” he says.
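To make Barber’s suggestion concrete, here is a minimal sketch in Python of what a confidence rating on a retrieved personal fact might look like. The `RetrievedFact` structure, the threshold and the withholding policy are illustrative assumptions for this article, not a description of how any existing LLM actually works.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RetrievedFact:
    subject: str           # the person the claim is about
    claim: str             # e.g. "date of birth: 1981-03-27"
    source: Optional[str]  # provenance, if the system can supply it
    confidence: float      # estimated probability the claim is accurate

def render_personal_claim(fact: RetrievedFact, threshold: float = 0.9) -> str:
    # Hypothetical policy: withhold personal claims the system cannot back
    # with high confidence and a traceable source, rather than emit a
    # plausible-sounding guess (a 'hallucination').
    if fact.confidence < threshold or fact.source is None:
        return f"No verified information is available about {fact.subject}."
    return f"{fact.claim} (source: {fact.source})"

# A low-confidence, unsourced birthdate is suppressed rather than stated.
guess = RetrievedFact(subject="a public figure",
                      claim="date of birth: 1954-07-17",
                      source=None, confidence=0.42)
print(render_personal_claim(guess))
```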

But the AI industry also faces other, broader data privacy concerns. Common protections built into data privacy legislation across the world – including transparency about the kind of data collected and rules about data minimization, which require many companies to collect, process and store only personal data that is relevant and necessary for specific purposes – may be threatened by the inherent design of many AI models. In short, many models can’t operate without violating some of the core principles of many privacy laws.
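For readers unfamiliar with the principle, the sketch below illustrates data minimization in its conventional form, using invented field names and purposes: a system retains only the personal data needed for a declared purpose. The tension the experts describe is that a model trained on web-scale scrapes has no equivalent purpose-by-purpose accounting.

```python
# Illustrative purpose-to-fields map: each declared processing purpose
# may retain only the personal data it actually needs.
ALLOWED_FIELDS = {
    "order_fulfilment": {"name", "shipping_address"},
    "newsletter": {"email"},
}

def minimize(record: dict, purpose: str) -> dict:
    """Drop every field that is not necessary for the declared purpose."""
    allowed = ALLOWED_FIELDS.get(purpose, set())
    return {key: value for key, value in record.items() if key in allowed}

user = {"name": "Ada", "email": "ada@example.com",
        "shipping_address": "1 Main St", "date_of_birth": "1990-01-01"}
print(minimize(user, "newsletter"))  # -> {'email': 'ada@example.com'}
```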

“The way that OpenAI and other generative AI companies have built their products is in direct conflict with international privacy laws – GDPR, LGPD in Brazil, the Privacy Act in Canada [and others] – and basic data processing and privacy principles …” says Calli Schroeder, senior counsel and global privacy counsel at the public interest research organization the Electronic Privacy Information Center.

While Noyb’s complaint focuses on the GDPR right to rectify inaccurate personal data, AI programs will soon have to grapple with other kinds of data rights, “like right to deletion or objection – and basic data processing requirements, like establishing a legal basis to use the personal data in training datasets in the first place,” says Schroeder.

Mitigating privacy violations in AI development is, of course, possible. Developers could filter out data that is inaccurate or not obtained legally to prevent such information from being processed and used to train models. Regular auditing that evaluates inputs, outputs and algorithmic design would also help to ensure compliance with global privacy regulations.
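As a sketch of the filtering step described above – and only a sketch – the example below assumes per-document 'lawful_basis' metadata and uses a single regex as a stand-in for real PII detection, which in practice requires dedicated tooling.

```python
import re

# A crude stand-in for PII detection (emails only); production pipelines
# use dedicated detection models, not a single regular expression.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def contains_pii(text: str) -> bool:
    return bool(EMAIL_RE.search(text))

def filter_corpus(documents: list) -> list:
    """Keep only documents that are lawfully sourced and free of PII.

    Each document is assumed to carry a 'lawful_basis' flag set upstream
    (e.g. licensed, public-domain or consented data) - an assumption made
    for this illustration, not a feature of any real training pipeline.
    """
    kept = []
    for doc in documents:
        if not doc.get("lawful_basis"):
            continue  # no legal basis for processing: exclude
        if contains_pii(doc["text"]):
            continue  # personal data present: exclude (or redact)
        kept.append(doc)
    return kept

corpus = [
    {"text": "A public-domain essay on privacy law.", "lawful_basis": True},
    {"text": "Contact me at jane@example.com.", "lawful_basis": True},
    {"text": "A scraped forum post.", "lawful_basis": False},
]
print(len(filter_corpus(corpus)))  # -> 1
```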

Schroeder adds: “They could be forced to divest training datasets that contain inaccurate information or personal data that they are using without a proper legal basis – and divest any algorithms trained on the ill-gotten data.”

These practices – which would ultimately reduce the amount of data processed and stored by developers – are, of course, likely to undercut the reach and capabilities of AI tools. For this reason, experts like Schroeder are skeptical that players like OpenAI will change their ways without an imminent and existential threat to their businesses.

As she puts it: “Could … generative AI companies solve these privacy problems? Yes. Will they? Not until their only options are ‘follow the law or don’t do business.’ [These changes] cost time and money, so my guess is they will avoid this – and claim enforcing these laws will ‘stifle innovation’ – until they have no other option.”

Schroeder predicts that many generative AI developers will argue to DPAs and judiciaries that AI simply cannot comply with certain privacy requirements and should, therefore, be exempted under the law.

Of course, privacy advocates, lawmakers concerned with data rights and enforcement agencies aren’t likely to back down, either. In Schroeder’s words: “Expect law enforcement and privacy advocates to respond that if [AI developers] can’t comply with basic protections, maybe they shouldn’t exist.”

