What the New York Times copyright suit means for AI

Hello Eye on AI readers and Happy 2024!

As many of you know, I was on leave for the past several months, working on a book about the generative AI revolution and all its potential ramifications. The book is due to be published this summer by Simon & Schuster. I'll be letting you know more about it as the publication date approaches. Now back at Fortune, I'm assuming a new role as our AI editor, helping to build out our coverage of this vital technology. And I've got some exciting news: Eye on AI will be coming to your inbox more frequently. We are dedicated to providing you, as business leaders, with all the AI news you need to stay informed. AI is currently one of the hottest topics in the corporate world, and considering its rapid advancements, Eye on AI will now be delivered to you twice a week, on Tuesdays and Thursdays. Imagine, you'll be twice as knowledgeable as before!

OK, the biggest news in AI this past week has got to be the copyright infringement lawsuit the New York Times filed against Microsoft and OpenAI in federal court on Dec. 27. It's a doozy, one many think will be precedent-setting. Some commentators speculated it could even spell the end of OpenAI, and perhaps the entire business model on which many generative AI companies have been built. The suit doesn't include a specific claim for damages but says the two tech companies should be held liable for billions of dollars in statutory and actual damages.

OpenAI, which had been in talks with the Times since April over possible licensing terms for the newspaper's content, said it had thought negotiations were progressing and that it was surprised and disappointed by the Times' suit. "We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from A.I. technology and new revenue models," OpenAI spokesperson Lindsey Held said. "We're hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers." Microsoft declined to comment on the lawsuit.

The Times alleges that tens of thousands of its articles were copied, without its permission, in the process of training the GPT models that underpin OpenAI's ChatGPT and Microsoft's Copilot (formerly called Bing Chat). It also alleges that ChatGPT and Copilot allow users to further infringe on the Times' copyrights by producing text that plagiarizes Times articles. It argues that the integration of OpenAI's GPT models with web browsing and search tools steals commercial referrals and traffic from the newspaper's own website. In a novel claim for this sort of case, the publisher also alleges its reputation is damaged when OpenAI's models hallucinate, making up information and falsely attributing it to the Times. Among the reams of evidence that the Times submitted in support of its claims is a 127-page exhibit that includes 100 examples of OpenAI's GPT-4 outputting verbatim lengthy passages from Times articles when prompted with just a sentence, or part of a sentence, from the original.

The Times' lawsuit is certainly the most significant of the copyright infringement claims that have been filed against OpenAI and Microsoft to date. The Times has top copyright lawyers, relatively deep pockets, and a history of pursuing claims all the way to the Supreme Court when it feels an issue presents a threat to not just its own journalism, but to the free press as a whole. The newspaper is claiming here that OpenAI's copyright infringement undercuts the revenues publications require to serve the public interest through news reporting and investigative journalism. This sets it apart from most of the other copyright infringement claims previously filed against OpenAI, which simply pit the commercial interests of creators against those of OpenAI. But what really differentiates the Times' case is the clarity of the narrative and exhibits it presents. Many commentators believe these will prove highly persuasive to a jury if the case winds up in front of one.

Gary Marcus, the emeritus New York University cognitive scientist and vocal AI expert, opined, in a series of posts on X (formerly Twitter), that this is OpenAI's "Napster moment." He claims the Times' lawsuit could wind up bankrupting the high-flying AI startup, just as a landmark 2001 copyright judgment against Napster obliterated the peer-to-peer music-sharing company's business model and eventually drove it under.

Having done a fair bit of research into AI and copyright for my forthcoming book, I think this is unlikely to happen. For one, this case is likely to settle. The fact that the newspaper was in negotiations with OpenAI for a licensing deal and only filed suit after those talks apparently reached an impasse (probably because the Times was asking for more money than OpenAI wanted to pay) is a good indication that, despite the public interest gloss the Times applied to its complaint, its real motivation here is commercial. OpenAI has signed a deal with the Associated Press to license its content for AI training and last month inked a multiyear deal with publisher Axel Springer, which owns Business Insider and Politico, that gives OpenAI access to its current and archived content. That deal is worth more than $10 million per year, according to one report. OpenAI and Microsoft have a strong incentive to settle rather than deal with years of legal uncertainty; chances are, they will.

Even if this case goes to trial, a ruling might not ultimately go the Times' way. Microsoft has deeper pockets than the Times and also has access to top-notch legal talent. And there are more precedents here than just Napster. Copyright experts vigorously debate which cases might be most analogous: the Google Books case, the Sega case, the Sony case, or the recent Andy Warhol case. The specifics of these analogies are too complicated to get into here. But the point is, this is far from a settled matter, and OpenAI and Microsoft have decent arguments they can use to try to defend themselves. It isn't open-and-shut by any means.

It is also possible that the U.S. Copyright Office or Congress will weigh in before the Supreme Court does. The Copyright Office has just concluded a public comment period on the implications of generative AI. The Senate also recently held hearings on the topic. It is possible Congress will step in and pass a new law that would render the Times' claim moot. Some legal scholars have suggested Congress should create a "fair learning" law that gives software companies an explicit right to use copyrighted material for AI training. Meanwhile, those sympathetic to rights holders have suggested lawmakers should mandate that creators are compensated for any works used to train AI. Congress could also insist that AI companies apply filters to screen out any model outputs that are identical to copyrighted material used in training. There is a precedent for Congress weighing in this way: The 1992 Audio Home Recording Act exempted sellers of digital audio tape from being sued for copyright infringement. But it also set up a licensing fee that all manufacturers and importers of audio recording devices have to pay to the Copyright Office, which then distributes those funds as royalty payments to music rights holders. Congress could wind up establishing a similar licensing and royalty regime for generative AI software.

Finally, even if OpenAI is ultimately forced to pay creators licensing fees, it can probably afford it. The company is, according to some news accounts, currently bringing in revenue at a $1.6 billion per year clip, with some insiders predicting that this figure will hit $5 billion before 2024 is out. With this kind of cash machine, OpenAI can probably survive. While copyright infringement claims sank Napster, Spotify was eventually able to reach a settlement with music rights holders. And while those payments crimped Spotify's profits, and the company has lately struggled to sell stock investors on a convincing growth story, Spotify is also not about to go bust.

So, no, I don't think OpenAI will go under. But I do think the Times' lawsuit signifies that the era of freely using copyrighted material for AI training is coming to an end. The threat of lawsuits will push most companies building AI models to license any data they use. For instance, there are reports that Apple is currently in discussions to do exactly this for the data it is seeking to train its own AI models. In image generation, artists are also increasingly turning to masking technology that makes it impossible to effectively train AI models on their work without consent. Similar technology does not yet exist for text or music, but researchers are working on it. And plenty of publishers have now taken steps to prevent their websites from being freely scraped by web crawlers. Pretty soon, the only way companies are going to be able to obtain the data they need to train good generative AI models is if they pay to license it. One way or another, the sun is setting on the Wild West of generative AI.

And with that, more AI news below.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

Ex-Trump lawyer blames AI for fake precedents cited in legal brief. The former Trump fixer Michael Cohen said in court papers unsealed last week that he accidentally provided his own lawyer with fictitious legal citations used in a filing submitted to a federal judge because he relied on Google's AI chatbot Bard. Cohen said he had not realized Bard could hallucinate, creating realistic-looking but fictitious citations, and had provided these cases to his lawyer not expecting the attorney, David Schwartz, would drop them into his brief without checking them for accuracy, the New York Times reported. Schwartz had filed a motion asking the court to end its supervision of Cohen, now that Cohen has been released from prison after serving time for campaign finance law violations. The Bard hallucinations could factor in the upcoming New York criminal trial of former President Donald Trump, where Cohen is expected to serve as a key prosecution witness. Trump's lawyers have seized on the fake citations as evidence that Cohen is an unreliable and untrustworthy witness.

U.S. Supreme Court Chief Justice offers thoughts on AI and the law. Chief Justice John Roberts offered his thoughts on AI in the legal system in a year-end report published last week, the Independent reported. Roberts said that AI would not replace human judges any time soon but predicted that AI would increasingly be used to help lawyers prepare cases and do legal research. He said that such AI software could help level the playing field, improving access to legal resources for Americans who might not otherwise be able to afford them. However, he cautioned about AI's risks, including the problem of fake citations leading to legal errors, citing the Michael Cohen news as an example, and warned about possible data privacy issues. He advised legal professionals to use AI with caution and humility.

U.K. terrorism law monitor warns AI chatbot could radicalize people. A lawyer appointed by the British government to assess its terrorism-related legislation says the country's laws are insufficient to prevent people from being radicalized by AI chatbots. The lawyer, Jonathan Hall KC, told British newspaper the Telegraph that he chatted with a digital persona created by AI startup character.ai that was designed to mimic the head of the Islamic State and that it tried to recruit him to the terrorist group. He said the country currently had no laws that would hold someone responsible in cases where an AI chatbot, rather than a person, generated text that encouraged terroristic activities. Character.ai's terms and conditions prohibit users from uploading content that promotes violence and extremism but do not prevent the chatbot itself from outputting such content. Character.ai told the newspaper that its products should never produce responses that encourage users to harm others.

Nobel-winning economist cautions on STEM emphasis in new AI era. Christopher Pissarides, a Nobel Prize-winning labor market economist who works at the London School of Economics, said computer programmers were now "sowing the seeds of their own destruction" with the development of AI. He predicted that many coding and engineering roles in the future may be taken over by AI, while the skills that will be in high demand will be the empathetic and creative ones that humanities and liberal arts programs emphasize. He said that jobs requiring face-to-face contact, such as hospitality and health care, would not easily be replicated by AI, according to Bloomberg.

Sharing the burden. Many LLMs require huge amounts of computing power, not just to train, but also for inference. So there is growing interest in how this computing power might be federated, allowing groups of people without access to high-powered GPU clusters to run big AI models using laptops and PCs with a few GPUs available. Researchers from Yandex, Neiro.ai, the University of Washington, and Hugging Face have now proposed a method for distributed inference and fine-tuning of LLMs, a system they call PETALS. They demonstrate that it can work on both Llama 2, which is an open-source 70-billion-parameter LLM, and BLOOM, which is a 176-billion-parameter model. With PETALS, each computer in the network only has to handle less than 3% of the full model weights, and inference can run efficiently despite the latency and information loss that come from trying to integrate lots of machines across the internet. You can read the paper, which is on the non-peer-reviewed research repository arxiv.org, here.

Boards are woefully unprepared for AI. Heres how they can start to catch up by Lila MacLellan

IBM AI chief advises people who want a tech job in 2024 to learn the language and creative thinking skills you get with the liberal arts by Ryan Hogg

These movies do the best job of accurately capturing AIs power and nuance, according to 10 AI experts by Kylie Robison

Queen Latifah feels the same nervousness that everyone feels about AI, but she's monetizing her digital avatar. "It's a bell we can't un-ring" by Rachyl Jones

This is the online version of Eye on AI, Fortune's weekly newsletter on how AI is shaping the future of business. Sign up for free.
