As European data authorities scrutinize ChatGPT, experts see AI regulation on the horizon
Consumer data privacy and protection are at the heart of growing concerns around OpenAI’s wildly popular ChatGPT and similar AI programs. Now, the pressure is mounting on lawmakers across the globe.
ChatGPT is raising alarms among some privacy regulators – which could be a sign of things to come / Adobe Stock
Europe’s preeminent data protection authority, the European Data Protection Board (EDPB), announced last week that it’s assembling a task force focused on ChatGPT, OpenAI’s generative artificial intelligence (AI) program that, since its November launch, has made a splash as the fastest-growing consumer application in history.
In a statement, the group cited growing concerns among regulators over potential privacy risks of the tool, pointing to a decision in early April by Italy’s data protection agency to temporarily restrict ChatGPT and launch an investigation after a March 20 data breach put some users’ personal information at risk.
“The EDPB members discussed the recent enforcement action undertaken by the Italian data protection authority against OpenAI about the Chat GPT service,” the April 13 statement said. “The EDPB decided to launch a dedicated task force to foster cooperation and to exchange information on possible enforcement actions conducted by data protection authorities.”
Meanwhile, Spain’s privacy watchdog, AEPD, announced on the same day that it is opening up a probe into potential data breaches by ChatGPT. Germany also signaled in early April that it is considering barring ChatGPT over data privacy and security concerns.
Across the pond, Canada’s privacy commissioner said it plans to investigate ChatGPT, and some experts believe the California Privacy Protection Agency and other state authorities could attempt to lump new AI rules into various efforts to crack down on automated decision-making.
This growing momentum across markets, experts posit, could accelerate the establishment of broad privacy regulations around AI in Europe and North America.
Here’s what to know about ChatGPT’s potential privacy pitfalls – and key predictions for the future of AI regulation in Europe and the US, according to leaders in the privacy space.
What are the major privacy concerns surrounding ChatGPT and similar applications?
Though there are a plethora of potential concerns about data protection and security within the context of AI, Ariell Garcia, chief privacy officer at ad agency UM Worldwide, says that “privacy concerns surrounding ChatGPT and generative AI can broadly be thought of as falling into two categories: those relating to the inputs, and those relating to the impacts.”
A lot of the concerns in the first camp involve what Garcia calls “traditional privacy and data protection rights and expectations,” including “notice, choice, and individual privacy rights.”
And many top concerns relating to inputs have to do with the information that’s used to train large language models (LLMs) like ChatGPT. “[It’s about] the large-scale scraping of information over the web to build the model, without notice or consent and the failure to properly safeguard customer accounts that include some personal information and access to prompt history,” says Ben Winters, senior counsel at the Electronic Privacy Information Center. Winters leads the organization’s AI and Human Rights Project. “Particularly with OpenAI’s decision to close access to the data set, it is unclear what the dataset is comprised of, and there are massive privacy and intellectual property concerns associated with that. There is no knowledge of or power over access to your data in that circumstance.”
And of course, once the inputs of a system affect its outputs, it can be difficult to weed out individual bits of data. “Since model training is expensive and time-consuming, and since it’s more complicated than merely deleting an entry in a database, the process whereby information – or downstream variants thereof – is removed from these systems is also not always clear,” says Rob Leathern, an investor, privacy expert and former Google product vice-president. “There are a lot of questions people and privacy regulators have both about the training data and also the data that people input in their prompts, and how those data are used. Companies have been giving employees guidance not to put confidential data into these systems but presumably it is still happening.”
Some privacy pros are especially concerned that sensitive categories of data – such as individual consumers’ religion, race and personal health information – could have been used in the training sets of some LLMs, and that those programs could then reveal highly sensitive information.
But concerns about inputs also go beyond personal information, Garcia points out. “It also touches on intellectual property considerations – such as where training data might include copyrighted material, or where users share confidential and proprietary information, as was the case with Samsung employees inadvertently leaking source code and other commercially sensitive information” – an incident associated with ChatGPT’s March data breach.
Plus, as existing privacy regulations like the EU’s General Data Protection Regulation and the US’ California Consumer Privacy Act have highlighted, there is growing worry about organizations’ ability to act in accordance with consumers’ explicit rights and choices, Garcia says, “such as the rights to be forgotten, correction, objection and opt out.” She says that “honoring individual rights is invariably more challenging in the context of algorithmic models, and will bolster existing calls for algorithmic transparency.”
Beyond the profusion of concerns about system inputs, there are also a number of issues that fall into Garcia’s second camp – concerns related to impacts.
Many of these are closely tied to concerns about inputs. For example, if an LLM is indeed using sensitive consumer data in its training, how might this fact affect the outputs it produces? “Are the systems producing outputs that are fair – in different ways – to different populations? Is it safe to be releasing these systems when we’re still plagued by a ‘black box’ problem with AI, where AI and machine learning systems cannot adequately explain the origins of and reasoning behind specific outputs?” says Justin Sherman, a senior fellow at Duke’s Sanford School of Public Policy and the chief exec of Global Cyber Strategies, a Washington, DC-based research and advisory firm.
Some organizations are already attempting to correct for these kinds of issues. Amazon, for instance – which this month made its first foray into generative AI – is developing a metric it calls ‘conditional demographic disparity’ to ensure some measure of fairness is built into the system’s code. Of course, without legal, agreed-upon definitions of fairness, definitive answers about how equitable an LLM is can be nearly impossible to reach.
Other, perhaps more general, concerns involve the safety and accuracy of LLM outputs – something the Italian authorities allude to in their new order, pointing out that, without stringent age verification requirements, LLMs like ChatGPT can expose children to misinformation and inappropriate or potentially harmful content. “Concerns about data-enabled harms including the proliferation of harmful content, misinformation and disinformation, as well as algorithmic manipulation, bias and discrimination, are not new at their core – despite the expansion of concerns relating to synthetic media like deepfakes and voice clones,” says Garcia.
Concern around LLM-driven plagiarism is also increasing. “That’s going to be a major issue for advertisers, marketing agencies, and other companies looking to take advantage of these systems,” says Sherman. “Is a text output going to be infringing on someone else’s intellectual property? And related, how specific must queries be to solicit an output that is sufficiently unique compared to what others might be asking?”
It’s an issue that’s already arising – especially as it relates to image-generating models like Midjourney and OpenAI’s DALL-E and DALL-E 2, some of which have already been accused of stealing independent artists’ work.
So, how will AI regulation shape up in Europe and North America?
While efforts to regulate AI touch on a range of issues, data privacy and protection are at the heart of many.
“What we’re seeing is that privacy and data regulators are emerging as the frontline of regulatory activity on large language models because of their authority over personal information,” says Cameron F. Kerry, a distinguished visiting fellow at the Brookings Institution and a global leader in privacy, information technology and artificial intelligence. “With all the buzz about ChatGPT and other LLMs, a push for broader regulation is gaining momentum.”
Some regulatory progress is already being made. In the EU, for example, a proposed bill called the Artificial Intelligence Act (or AI Act) intends to create new guardrails. But experts aren’t confident that the proposal will stand up to the challenges posed by ChatGPT and similar programs. “The European Parliament is struggling to adapt the Artificial Intelligence Act to make it work with generative AI,” says Dr. Lukasz Olejnik, a leading security and privacy researcher and consultant. “It isn’t clear how successful it will be since the original proposal issued by the European Commission did not foresee [the proliferation of generative AI systems]. There are limits to what the European Parliament can fine-tune.”
Nonetheless, many experts are optimistic that rising pressure from Europe’s data protection authorities (DPA) could catalyze legislative change. “The investigation into Chat GPT by the Italian DPA will show whether the existing legal framework applicable to AI in Europe is robust enough to equip regulators – in this case a data protection regulator – to conduct investigations and, where appropriate, impose fines and corrective actions,” says Isabelle Roccia, the European managing director at the International Association of Privacy Professionals (IAPP). She also believes that scrutiny from Europe’s DPAs will put more pressure on lawmakers to make progress on the AI Act, which they aim to finalize by the end of the year.
As with many other regulatory issues, Europe is likely to outpace the US when it comes to cracking down on ChatGPT and other LLMs over privacy concerns. “Going forward, we can expect Europe to take more action than the US in passing laws and regulations around AI chatbots and large language models,” says Duke’s Sherman.
Sherman is sure to caveat that “this is not to say European tech policies are necessarily better,” but notes that “data privacy is a prime example of where Europe, while imperfect, has clearly done a better job than the US so far.” The US, he says, “remains tied up in Congress’ ineffectiveness at getting things done – technology regulation included.”
Of course, the US isn’t totally without AI regulatory attention – though, at this point, such attention largely concerns enforcement rather than lawmaking. The US Federal Trade Commission (FTC), for example, is making AI a priority, as the agency’s commissioner Alvaro Bedoya made clear in a keynote speech delivered at the IAPP Global Privacy Summit 2023 in Washington, D.C. earlier this month. Bedoya highlighted how AI can be regulated under the agency’s authority to crack down on what it deems “unfair and deceptive trade practices,” as well as under civil rights and product liability laws.
“Given the FTC’s increasingly active enforcement actions, and their use of more substantive remedies such as algorithmic disgorgement, further enforcement action relating to AI is nearly certain to precede federal legislation,” Garcia says. In the shorter term, she predicts that “the FTC may choose to sustain focus on enforcement against more common instances of algorithmic harm” rather than generative AI specifically, though it could use this approach to “further signal and reinforce expectations of generative AI applications.”
Plus, outside of the FTC, other government agencies may involve themselves, the Brookings Institution’s Kerry says. For instance, “in the US, agencies that regulate employment (like the Equal Employment Opportunity Commission), housing (like the Department of Housing and Urban Development), and financial services (like the Consumer Financial Protection Bureau) and others are looking closely at how to use antidiscrimination law to police algorithmic discrimination.”
Despite the relatively slow pace of progress, pressure is mounting on lawmakers in Europe and North America to institute new restrictions on AI development and privacy protections for consumers. Just last week, US Senate majority leader Chuck Schumer announced that the Senate will begin developing new legislation focused on AI transparency.
And the need for change is pressing, Garcia says. “At a time where the tech industry is fervently pursuing efficiency, even to the extent that it impacts growth, generative AI endeavors appear an attractive means to advance dual ends. Amidst the intensifying frenzy to develop, release and adopt generative AI, the absence of regulation creates an all-too-familiar risk of enabling a race to the bottom on trust, safety and ethics.”
Implications for marketers
For marketers, there’s an opportunity to be more scrupulous about how AI is used in campaigns – and to see growing regulatory pressure as another hazard in a landscape already inhospitable to non-privacy-preserving methods of tracking and targeting, experts say.
The Electronic Privacy Information Center’s Winters says that scrutiny around AI and its potential threats to privacy should inspire marketers and advertisers “to be more innovative and creative in advertising with less data-intensive or privacy-violating technologies and methods.”
It shouldn’t be too great a task for the industry, Winters posits. “It may require some reorientation for companies that already are employing generative AI in their business, but the advertising and creative industries have been able to operate without these tools for a long time, and should be able to continue to do so. For creatives particularly, a ban may reduce instances of their work being copied or content-generating jobs being diminished.”