What you need to know about copyright issues surrounding generative AI
While the technology is being hailed within the marketing industry for its ability to supercharge and supplement human creativity, it’s also presenting some thorny legal questions.
ChatGPT is based upon an LLM called GPT-4, which was trained on vast amounts of data from the internet. / Adobe Stock
2023 could very well be remembered within the marketing industry as the Year of Generative AI.
Over the past several months, platforms like ChatGPT, DALL-E 2, Midjourney and Stable Diffusion have been making waves within the world of advertising for their ability to quickly and competently produce content from text-based user inputs. Some marketers have hailed generative AI as a complete paradigm shift. And at the most recent Cannes Lions festival – the ad industry’s biggest annual event – the tech was the undisputed center of attention.
But generative AI has a glaring problem – one that’s becoming increasingly difficult for marketers to ignore: it’s opening up a messy and dangerous web of copyright concerns.
The problem with training LLMs
Rather than being programmed like traditional computer software, generative AI models are trained on enormous quantities of data. Large language models (LLMs) like OpenAI’s GPT-4 and Google’s LaMDA, for example, were trained on huge volumes of text culled from the internet. Through techniques like natural language processing and reinforcement learning, LLMs gradually develop the capability to mimic the patterns in their training data, producing ‘original’ text that convincingly appears as though it could’ve been composed by a human being (even if that text occasionally deviates from the truth).
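That mimicry can be illustrated with a deliberately tiny sketch. To be clear, this is not how GPT-4 or LaMDA actually work – real LLMs are neural networks trained on billions of words – but a toy bigram model makes the underlying idea concrete: the model records which words follow which in its training text, then reproduces those statistical patterns when generating.

```python
from collections import Counter, defaultdict

# Toy illustration only: a bigram "language model" that learns
# next-word statistics from its training text. The training corpus
# here is a single made-up sentence.
training_text = "the cat sat on the mat and the cat slept"

def train_bigram_model(text):
    """Count, for each word, which words follow it in the training data."""
    words = text.split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def most_likely_next(model, word):
    """Predict the continuation most frequently seen in training."""
    return model[word].most_common(1)[0][0]

model = train_bigram_model(training_text)
print(most_likely_next(model, "the"))  # prints "cat" ("cat" follows "the" 2 of 3 times)
```

The model can only ever echo what it was fed – which is precisely why the provenance of the training data is at the heart of the copyright dispute.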
As generative AI becomes increasingly powerful and accessible, a growing number of voices are rising in protest against what they view as the technology’s flagrant disregard for copyright law.
A number of artists, for example, have publicly claimed that image-generating platforms like Midjourney and Stable Diffusion are plagiarizing their work. And in June, two lawsuits filed on behalf of a total of five authors – including the comedian Sarah Silverman – accused OpenAI and Meta of illegally using copyrighted book material to train LLMs.
The attorneys behind the class-action lawsuits, Joseph Saveri and Matthew Butterick, claim in their complaint that the tech companies “copied” the authors’ work “without consent, without credit, and without compensation.” (Saveri and Butterick have also filed similar lawsuits against Stability AI, the maker of Stable Diffusion, and against GitHub over its code-generating AI tool.)
Generative AI, Saveri and Butterick write in their complaint, “is just human intelligence, repackaged and divorced from its creators.”
The June lawsuits against OpenAI and Meta are just two examples from a growing string of cases being brought against the companies building generative AI.
“The basic question of whether or not an AI using copyrighted work constitutes copyright infringement is, for now, an open issue,” says patent attorney Robert McFarlane. Ultimately, McFarlane believes that some uses of generative AI will be deemed to constitute copyright infringement while others won’t. “These cases that are just starting now are going to try to draw that line,” he says.
How applicable is ‘fair use’?
Columbia Law School professor Shyamkrishna Balganesh echoes the belief that the American legal system’s eventual determination of whether the training of an LLM constitutes copyright infringement will almost certainly not be cut-and-dry. “The biggest impediment we have right now, in my view, is the assumption that everyone makes that there is a clear answer,” he says. He adds that “there’s a lot of misplaced reliance” within the legal profession on the so-called “fair use” doctrine, a provision of US copyright law intended to limit the reach of copyright claims and allow for the legal and permissionless use of some copyrighted content.
In a foundational case for US fair use law, Authors Guild v Google, a federal appeals court held in 2015 that the Google Book Search program – through which millions of copyrighted books were scanned and digitized without the authors’ permission – did not violate US copyright law. The US Supreme Court later declined to hear an appeal, leaving the decision in place. In essence, the courts reasoned that Google’s digital versions were not attempting to substitute for the original books, and that the company was in fact providing a legitimate public service by making information more accessible and widening authors’ readerships.
“A lot of lawyers for the AI industry strongly believe that the fair use doctrine coming out of [Authors Guild v Google] is going to protect the training purposes that are behind the ML model,” Balganesh says. “Myself, I’m not sure that that’s an open-and-shut case.”
The courts could ultimately decide, for example, that the scanning and digitization of books for the purpose of online search is fundamentally different from a machine ingesting those same books in order to refine its capabilities. “There’s a universe in which there’s a difference [and] it may seem subtle, but I think it’s a subtlety with a potentially significant variation,” says Balganesh.
Commerciality – that is, the intention or lack thereof to make a profit – is also a salient issue here. To this point, Balganesh points to the recent Supreme Court decision in Andy Warhol Foundation for the Visual Arts Inc v Goldsmith, which held that the foundation’s licensing of Warhol’s portrait of the musician Prince, based on a photograph taken by Lynn Goldsmith, did not constitute fair use, primarily because the image had been licensed for commercial purposes to the magazine publisher Condé Nast.
When considering the lawsuits that are being leveled against the tech companies that are developing generative AI models, Balganesh says that courts will need to assess: “Are you producing it for commercial purposes, or are you producing it for non-commercial purposes? OpenAI may well be different from Meta in terms of commerciality… that’s why the belief that the fair use doctrine has a clear answer to the question [of whether or not using copyrighted material to train generative AI models is permissible] is probably a big mistake.”
How might tech companies respond to the accusations of copyright infringement that are being leveled against them? One possible defense might be the argument that the content produced by LLMs is sufficiently different from the texts upon which they were trained so as to exonerate them from any such accusations.
Brenda Leong, an attorney who specializes in AI, compares this hypothetical scenario to a human painter who borrows from the style of a well-known artist: “I can paint something in the style of Van Gogh as long as I don’t try to assert that it’s by Van Gogh,” she says. “I can’t copy his exact picture, but I can paint in his style, and I can even copy his exact picture if I make enough changes to it. I can make a spoof, I can make a parody, I can make these different categories of reinterpretation of someone else’s art and use that commercially.”
The question of whether or not a machine can legally do the same thing, Leong says, is one that the courts will now have to grapple with.
Advice for marketers, from marketers
Given the uncertain and still-evolving legal landscape which currently surrounds generative AI, how should marketers approach this technology?
Broadly speaking, since these legal questions are only beginning to be debated, the best thing marketers can do at the moment is pay attention to the relevant cases. “For now, marketers working with [generative AI] would be foolish not to keep their ears to the ground on legal challenges,” says Mark Penn, president and managing partner of the Stagwell Group.
The legal landscape surrounding the tech “is very fluid between lawsuits and regulation and it will take some time for the complexity of copyright and IP to get sorted out and unwind,” says Brian Yamada, chief innovation officer at VMLY&R. “It is critical to stay connected to both agency and client legal teams for any AI-forward work.”
If a brand partner is concerned about the associated legal risks of leveraging generative AI, “then it might be best to stay away from using models that are trained on copyrighted data,” says Mike Creighton, executive director of experience innovation at Instrument.
It’s also worth bearing in mind the current legal status of content that’s produced by generative AI. In a recently published document called Generative Artificial Intelligence and Copyright Law, the Congressional Research Service highlighted the fact that US law only “affords copyright protection to ‘original works of authorship,’” and that “the US Copyright Office recognizes copyright only in works ‘created by a human being.’”
The document also notes that “works created by humans using generative AI could arguably be entitled to copyright protection, depending on the nature of human involvement in the creative process.” But simply entering a text-based prompt, according to the document, will probably not constitute a sufficient degree of “human involvement” to be afforded copyright protection.
“This certainly affects client-agency relationships in which content is created under a work-for-hire agreement,” says Paul Roetzer, founder and CEO of the Marketing AI Institute. “If agencies are using AI to generate content for clients, the client usually assumes they own the copyright for that content, but that may not be the case.”