


OpenAI's Motion to Dismiss Copyright Claims Rejected by Judge (arstechnica.com)
Is OpenAI's ChatGPT violating copyrights? The New York Times sued OpenAI in December 2023. Ars Technica summarizes OpenAI's response: The New York Times (or NYT) "should have known that ChatGPT was being trained on its articles... partly because of the newspaper's own reporting..."
OpenAI pointed to a single November 2020 article, where the NYT reported that OpenAI was analyzing a trillion words on the Internet.
But on Friday, U.S. district judge Sidney Stein disagreed, denying OpenAI's motion to dismiss the NYT's copyright claims partly based on one NYT journalist's reporting. In his opinion, Stein confirmed that it's OpenAI's burden to prove that the NYT knew that ChatGPT would potentially violate its copyrights two years prior to its release in November 2022... And OpenAI's other argument — that it was "common knowledge" that ChatGPT was trained on NYT articles in 2020 based on other reporting — also failed for similar reasons...
OpenAI may still be able to prove through discovery that the NYT knew that ChatGPT would have infringing outputs in 2020, Stein said. But at this early stage, dismissal is not appropriate, the judge concluded. The same logic follows in a related case from The Daily News, Stein ruled. Davida Brook, co-lead counsel for the NYT, suggested in a statement to Ars that the NYT counts Friday's ruling as a win. "We appreciate Judge Stein's careful consideration of these issues," Brook said. "As the opinion indicates, all of our copyright claims will continue against Microsoft and OpenAI for their widespread theft of millions of The Times's works, and we look forward to continuing to pursue them."
The New York Times is also arguing that OpenAI contributes to ChatGPT users' infringement of its articles, and OpenAI lost its bid to dismiss that claim, too. The NYT argued that by training AI models on NYT works and training ChatGPT to deliver certain outputs, without the NYT's consent, OpenAI should be liable for users who manipulate ChatGPT to regurgitate content in order to skirt the NYT's paywalls... At this stage, Stein said that the NYT has "plausibly" alleged contributory infringement, showing through more than 100 pages of examples of ChatGPT outputs and media reports showing that ChatGPT could regurgitate portions of paywalled news articles that OpenAI "possessed constructive, if not actual, knowledge of end-user infringement." Perhaps more troubling to OpenAI, the judge noted that "The Times even informed defendants 'that their tools infringed its copyrighted works,' supporting the inference that defendants possessed actual knowledge of infringement by end users."
Is it copying their work though? (Score:2, Interesting)
It seems to me like AI is just sort of 'ingesting' content, internalizing it, and building its world view based on it... Just like a person would. No word for word copying is going on (otherwise the model would be many many terabytes)... So IMHO this should be dismissed, plain and simple.
Re: Is it copying their work though? (Score:1)
Re: (Score:2)
Re: (Score:2, Informative)
Re: (Score:3)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
That's the issue. These large language models are useless without stealing other people's data and copyrighted work. All it is is a glorified search engine profiting off others.
I just stole your post by looking at it. Sue me.
Re: Is it copying their work though? (Score:4, Insightful)
That's the issue. These large language models are useless without stealing other people's data and copyrighted work. All it is is a glorified search engine profiting off others.
You are either a lame-ass troll, or a software engineer who just got replaced by an LLM. I'm going with the former. If it's the latter, just grow up already, and find a new career. Calling it stealing doesn’t make it so. That may work as clickbait, but it fails as analysis. Courts have a well-defined process for determining whether use of copyrighted material qualifies as infringement—and that process includes doctrines like fair use, transformative use, and de minimis use. You can scream theft all day, but until a judge agrees with you, all you’ve got is a talking point that tattoos "Troll here" on your forehead.
And no, LLMs are not “glorified search engines.” That’s not even wrong—and I get that trolls like you aren't interested in being right-- but it misunderstands both what search engines do and how large language models work. LLMs are generative systems that create statistically probable outputs based on token prediction across massive, contextually learned patterns. They don’t fetch—they synthesize. That’s a critical difference, and if you’re going to rage-troll about the technology, you should at least try to describe it accurately.
If you want to argue that LLMs raise real ethical or legal issues, fine—we’re all ears. But if you show up with vague accusations, tech illiteracy, and zero nuance, don’t expect the rest of us to mistake your trollish drivel for a point. There are a lot of serious conversations happening in this thread. Maybe try contributing to one. *plonk*.
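To make the "token prediction" point concrete, here's a toy sketch in Python (the model, vocabulary, and probabilities are all made up for illustration; real LLMs use neural networks over huge vocabularies): the system samples a statistically probable next token from learned distributions rather than fetching a stored document.

```python
import random

# Toy "language model": for each context word, a learned probability
# distribution over possible next words (counts, not stored text).
toy_model = {
    "the": {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.6, "sat": 0.4},
}

def next_token(context: str, rng: random.Random) -> str:
    """Sample the next token from the learned distribution for this context."""
    dist = toy_model.get(context, {"end": 1.0})
    words = list(dist)
    weights = [dist[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

def generate(start: str, rng: random.Random, max_len: int = 5) -> list[str]:
    """Generate a short token sequence by repeated sampling -- synthesis, not retrieval."""
    out = [start]
    while out[-1] != "end" and len(out) < max_len:
        out.append(next_token(out[-1], rng))
    return out

print(generate("the", random.Random(0)))
```

The point of the sketch: nothing in `toy_model` is a stored sentence, yet the generator still produces fluent-looking sequences from the statistics.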
Re: (Score:2)
Re: (Score:3)
OpenAI's defense at this point is geared around minimizing damages. They have already lost on infringement, and they know it. The Times has a very strong case, including willful infringement, so the only question is what the penalties will be. All that's happening now is going through the motions to the guilty verdict. The Times will have to screw up royally in order to lose.
Re: (Score:2)
If they lose, some lawyers are going to write to the owners of every registered work under the sun to start the largest class action in history. With statutory damages for wilful infringement times millions, they'd be bankrupt if the truth comes out. How many people is Sam willing to kill to prevent the truth of the training set from getting out?
I think fair use is their only chance.
Re: (Score:2)
Re: (Score:2)
Copyright protects reproduction, like the reproduction into the training set. What the LLM contains and produces is entirely beside the point ... start with the low hanging fruit: statutory damages for copies of registered works into the training set. That alone can bankrupt OpenAI.
They pirated every text in the world, same as Meta. Even with assassinations, I don't think they can cover that up.
Re: (Score:2)
You are treating a denied motion to dismiss like it’s the closing statement at trial. It is not. The judge didn’t rule that OpenAI infringed copyright—he ruled that the allegations, if proven, are strong enough to warrant discovery and trial. That is a big deal, yes—but it is not a verdict. Pretending otherwise oversimplifies what’s actually happening and turns complex legal proceedings into fanfiction.
OpenAI's defense at this point is geared around minimizing damages.
Not even close. OpenAI is still actively defending the core claim that train
Re: (Score:2)
Re: (Score:1)
So every news station that reports "The New York Times today said XYZ" should fall in the same bucket because they are doing the SAME THING... They are reading their content and parroting parts of it to their audience. How is this any different? If OpenAI cited the NYT, would that be better?
Re:Is it copying their work though? (Score:5, Insightful)
So every news station that reports "The New York Times today said XYZ" should fall in the same bucket because they are doing the SAME THING...
That's an interesting analogy, but flawed, because the news station is only referring to the New York Times article, not pretending that it's their own reporting.
A better analogy might be a person (or company) that buys a copy of the New York Times every day, then rewrites all the articles by hand into their own words, and publishes the rewritten articles as a "competing newspaper" at a lower price, because they didn't have to pay for any actual reporting or information gathering, only for rewriting. Dunno where that would stand legally, but it seems ethically dubious.
Re: (Score:1)
Or, I improved my property based on an article in Home and Garden, and then subsequently rented out the property?
All the arguments seem to be that someone else is making money that isn't them. That seems more like jealousy or a lack of imagination on the part of the first party.
Re: (Score:2)
Re: (Score:2)
A better analogy might be a person (or company) that buys a copy of the New York Times every day
So far so good....
then rewrites all the articles by hand into their own words
This is getting off track..... (See also every English teacher ever demanding this to avoid plagiarism charges....)
and publishes the rewritten articles as a "competing newspaper" at a lower price
And... we've lost the plot! This is direct mens rea. You intentionally bought a newspaper with the explicit goal of copying the content and selling it.
This isn't a "Well, I read this 6 months / 20 years / etc. ago and applied that concept to make X" issue at all. Nor is it a "Person A read concept 1, and person B asked Person A how something could be
Re: (Score:2)
Re: (Score:1)
Despite what TV and Hollywood show you, courts/judges do not like when either side plays stupid word games.
The entire legal profession almost exclusively revolves around discussing the legal definitions of words and playing word games you quarterwit. Judges do this more than anyone else.
Re: (Score:2)
Except exactly the way the OP did, and it's worth pointing out that definitions are precisely the basis of this case as well.
No, I don't watch TV, and if I did it wouldn't be boring arse courtroom stuff. I have actually been in court, twice; one was a contract dispute where my counsel spent ages describing why my precise actions fit the definition of a specific word in the contract. I have taken several legal subjects in my degree and my sister is a lawyer who does this stuff on a daily basis. The definitio
Re: (Score:3)
Set booby traps as a form of watermarking.
Companies will start deliberately seeding articles with fake news, grammatical oddities, made up words and other forms of digital subterfuge. Much like dictionaries and mapmakers used to insert phantom content to detect copying.
Then when bots scrape your content, you can show the judge the fingerprinting you inserted.
You may inadvertently invent a whole new vocabulary but once you've draffered the April sneggleklergen, you're past the point of no return.
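The booby-trap idea above can be sketched in a few lines of Python (the function names and canary format are invented for illustration; real fingerprinting schemes are more subtle): embed an improbable marker string when publishing, then check whether someone else's text reproduces it.

```python
import secrets

def embed_canary(article: str) -> tuple[str, str]:
    """Append a unique nonsense marker that would never occur naturally."""
    canary = f"sneggleklergen-{secrets.token_hex(4)}"
    return f"{article} ({canary})", canary

def contains_canary(suspect_text: str, canary: str) -> bool:
    """If the marker shows up in scraped or generated text, it was copied."""
    return canary in suspect_text

article, canary = embed_canary("Local man wins pie contest.")
repost = "Blog repost: " + article           # a naive copier keeps the marker
assert contains_canary(repost, canary)        # evidence of copying
assert not contains_canary("Original reporting.", canary)
```

Of course, any copier that paraphrases rather than copies verbatim will strip the marker, which is exactly the limitation the replies below point out.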
That doesn't really work (Score:2)
That's a problem that was seen in advance; a lot of work and effort has already gone into solving it, and a lot more will continue to go into solving it.
So you don't need to poison your content. If they can't find a workable solution to the problem you're raising, AI will collapse on its own, and if they can, then you're wasting your time poisoning it.
Re: (Score:2)
That's why services like CloudFlare send AI bots down a maze of twisty passages, all filled with AI-generated slop. Normal users aren't likely to come across links to those traps, because they're things like white-on-white text, or links embedded in tiny fonts or on punctuation, which nobody would follow other than by accident.
The goal is that services that blindly scrape meet their AI demise by internalizing AI-generated slop. The other stuff can be excluded by robots.txt. The AI
Re: (Score:2)
This is happening now.
You can go to recipe websites and find stupid instructions like, "Cook the chicken at 368 degrees for 20 minutes". No one would ever actually use that as real instructions for chicken, because it doesn't matter if it's 368 degrees or 350 for that amount of time. That number is inserted so the folks making the recipe websites can tell when someone has copied their recipes.
Re: (Score:2)
Re: (Score:2)
Since, as far as I know, none of us are copyright lawyers, our opinions are basically of no value.
Our opinions have value because we, as a society, define the laws that we all abide under. This also means we have a stake in this decision, because we are directly impacted by it and its potential to limit our ability to compete globally / domestically.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: Is it copying their work though? (Score:2)
For thousands of years, man took a small boat and went to fish in the ocean to feed his family. Now mega trawlers rake the ocean floor with nets that catch everything swimming for miles around the ship.
For thousands of years, fish populations have existed and been caught by humans. Now, fish populations are going extinct because the trawlers are fishing faster than humans did.
Re:Is it copying their work though? (Score:4, Insightful)
It's copying, because it cannot be transformative: for it to have a new purpose and meaning would imply that so-called AI systems have the ability to express purpose or meaning.
Data go into computer, data come out of computer. Copyright still holds.
Laypeople are far too quick to anthropomorphize an algorithm. It's not like you or I reading Harry Potter, then deciding that it would be fun to write about a wizard's school for cats. Sure, maybe derivative, but very rarely is anything in art cut from whole cloth.
Re: (Score:2)
You’ve taken a philosophical objection—“AI can’t express meaning”—and tried to turn it into a legal slam dunk. But copyright law is not that simple, and courts do not require sentience to evaluate whether a use is transformative or infringing. You’re confusing how you feel about AI with how the law actually works. More to the point—you claim an AI cannot express purpose or meaning, and therefore cannot be transformative. That may sound deep, but it collapses u
Re: (Score:2)
The LLM might use data from Harry Potter under that prompt, but it might not. Removing the Harry Potter bit only makes it more uncertain. Just because an LLM could use data from Harry Potter doesn't mean that the LLM's output is Harry Potter. Nor does it guarantee that Harry Potter's copyright would legally apply.
People may be quick to anthropomorphize things, but just as many others are quick to declare something as cut and dry
Re: (Score:2)
I'd recommend suing the AI service's owner if you find a few matching words in the output with your copyrighted work. Then have them prove that the data they ingested didn't get accidentally used in the LLM's output. Since it's a civil case, you pretty much just have to show that a business was materially harmed by what is probably a copyright violation. As it did ingest copyrighted material without permission, and the defendant has no way of knowing if the copyrighted content was used as a basis for the gen
Re: (Score:3)
more than 100 pages of examples of ChatGPT outputs and media reports showing that ChatGPT could regurgitate portions of paywalled news articles
You should at least TRY to get to the end of the summary before you make a fool of yourself in public.
So the problem is the ingestion (Score:3)
I don't think it matters. The courts tend to side with whoever has the most money and whoever can make the most money and in this case the AI companies have an unlimited capacity to make money here and unlimited amounts of ve
Re: (Score:2)
Let me get this straight. Your argument boils down to: “Ingesting is copying, copying is illegal, therefore case closed,” followed by a shrug and a rant about how money always wins. That is not a legal position. That is a trollish tantrum trying to look like cynicism. The court walked in with a ruling that will echo through every AI model, license agreement, and copyright claim for years to come. If you want to troll at the bumper-sticker level, that is definitely your lane—but do not mis
Re: (Score:2)
Re: Is it copying their work though? (Score:2)
Re: Is it copying their work though? (Score:2)
Re: (Score:2)
It seems to me like AI is just sort of 'ingesting' content, internalizing it, and building its world view based on it... Just like a person would. No word for word copying is going on (otherwise the model would be many many terabytes)... So IMHO this should be dismissed, plain and simple.
You’re absolutely right in spirit—these models do internalize data and abstract patterns from it in a way that feels eerily human. That’s the fascinating part.
But the legal system isn’t just asking how it learns, it’s asking what it can regurgitate, and under what circumstances. The NYT's case is not claiming that the entire model is a giant database of news articles—it’s alleging that, under the right conditions, ChatGPT can reproduce near-verbatim excerpts from th
Re: Is it copying their work though? (Score:2)
"No word for word copying is going on"
You don't know that. It can't keep a record of everything it's read, but the models are largely black boxes. There's really no way to know if it's memorizing long passages.
The only way to test this would be to see if it generated identical passages. And these tools have indeed done so. It actually doesn't even matter whether it's copying word for word or whether it's parallel construction. It's still infringement if it's the same words.
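One way to run that test, sketched below in Python (a heuristic, not a legal standard; the function name and the 8-word threshold are arbitrary choices for illustration): long word n-grams shared between the source and the output are very unlikely to arise by chance, so they suggest verbatim or near-verbatim reuse.

```python
def shared_ngrams(source: str, output: str, n: int = 8) -> set:
    """Return word n-grams appearing in both texts; long matches suggest copying."""
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return ngrams(source) & ngrams(output)

src = "the quick brown fox jumps over the lazy dog near the river bank"
out = "it said the quick brown fox jumps over the lazy dog near the river today"

# An 8-word run copied intact, even inside otherwise different sentences,
# shows up as a non-empty intersection.
matches = shared_ngrams(src, out, n=8)
assert matches
```

This is roughly the shape of the evidence in the case: the filing's 100+ pages of examples are long runs of output matching article text.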
Re: (Score:2)
Re: (Score:2)
If material could never be reproduced (reading and remembering) then the material would be worthless to everyone. But if it could always be reproduced with no benefit to its creators, then they could not feed themselves and survive. Where to set the balance is full of detail and difficulty.
LLMs may well need their own special rules. For example, I for one gave up my O'Reilly subscription because now I can look up the basics of some tech thing quite quickly with an LLM -- so someh
Re: Is it copying their work though? (Score:2)
Lawyers do not understand LLMs (Score:2)
OpenAI should be liable for users who manipulate ChatGPT to regurgitate content in order to skirt the NYT's paywalls ...
LLMs are notoriously bad at verbatim retrieval, and the notion that someone would use ChatGPT or whatever to read the NYT is the stupidest thing I'm likely to read on an unusually stupid news day.
The New York Times is a drop, or at most a bucket, in the ocean of training material used to generate a vast soup of vectors. This is profoundly transformative: it's not a .zip file of NYT articles being published, it's the combined influence of a myriad of sources--including the New York Times. If this isn't fa
Re:Lawyers do not understand LLMs (Score:5, Insightful)
Google won, and if there's any consistency at all then the LLM trainers will win too.
Google Library Project and LLM trainers aren't even remotely similar, so it would be incredibly inconsistent for OpenAI to prevail. At the very least, Library Project shows only a snippet of a book. It then points users to legitimate purchasing options rather than charging users for access to material for which Google has no legal rights. This does not infringe on the rights-holders' ability to monetize their rights. OpenAI, on the other hand, copies the entirety of such material, then directly charges for access to it. This deprives the rights-holders of their ability to control/monetize their creations.
OpenAI is the largest, most blatant copyright infringer ever created. If OpenAI were to prevail, it would destroy anyone's ability to make a living from any creative endeavor that can exist digitally.
Re: (Score:2, Troll)
OpenAI, on the other hand, copies the entirety of such material, then directly charges for access to it.
What part of "LLMs are bad at verbatim regurgitation" do you not understand? Do you think OpenAI or Alibaba or whoever discovered a 1000:1 lossless compression scheme?
You sound like a lot of people I know who have strong opinions on the subject but little to no experience actually using LLMs.
It has nothing to do with the retrieval (Score:2)
Copyright doesn't cover what goes into a human brain. But as soon as you start reading bits and bytes and then copying those bits and bytes, you've triggered copyright. If the law is applied as written, then AI isn't legal.
I do not expect the law to be applied as written though. There's so much money to be made in AI and judges tend to side with whoever's got the most money. Like th
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
The issue here is that the original content still gets stored in some form in order to be used to create the new content.
Nope, that's not alleged in the NYT lawsuit, or I can't find it in their filing despite a thorough search just now. Their beef is with the training, alleged verbatim retrieval, associated search stuff etc.
Look up "transformative" fair use. Your post reveals a fundamental misunderstanding of how copyright is applied and of the specific issues at hand in this suit.
Re: It has nothing to do with the retrieval (Score:3)
Transformative fair use might apply to the finished product. But it doesn't apply to training. If I am a corporation selling my employees' services to others and I copy my competitors' training material to train my workforce, I have infringed.
Individuals are covered by fair use for copying things for learning, commercial entities are not.
Re: (Score:2)
Transformative fair use might apply to the finished product. But it doesn't apply to training.
[citation missing]
You won't find that citation, either, because neither statute nor case law has caught up with the technology. Also, your analogy is misapplied because "training" isn't really the same thing in those two cases.
Re: It has nothing to do with the retrieval (Score:2)
Training is not transformative. It's simply copying the data from a website to a database for future processing.
Re: (Score:2)
1: Training is not transformative, which is patently untrue.
2: Copying the data is infringing, which is complete nonsense, because if it were, neither the web nor search engines as we know them would be possible.
You really need to read up on fair use and modern copyright case law.
Public is public (Score:2)
This is no different from a student educating themself by reading publicly available stuff.
There is a bigger issue here, the future of copyright and the value of content.
As content, whether text, images or video becomes effortless to create, its value will drop to zero.
Some things, like professional journalism, still require great effort to create. This is expensive.
In the past, the costs were paid by advertising, but bots don't read ads.
Expect the end of publicly available quality journalism. Expect subscr
Re: (Score:3)
more than 100 pages of examples of ChatGPT outputs and media reports showing that ChatGPT could regurgitate portions of paywalled news articles
Paywalled isn't public.
Re: Public is public (Score:2)
Correction: this is no different than a teacher photocopying textbooks to give to his students.
There is no agree to disagree (Score:1)
Online? Readable. (Score:2)
Yes, I know there are a lot of amateurish, ill-behaved bots out there. That's a different problem altogether. The point is: material made freely available on the internet is free to read: for humans, aliens, or AIs.
Share the money (Score:2)
It's really simple. We have the technology.
Let AI read everything.
If you charge for AI, then you share the revenue you make off the knowledge you learned.
If you charge $10 for a response that includes information learned from 3 books written by John Doe, John Doe gets a % of what you charge.
Maybe your CEO doesn't get paid $100M a month as a result, I'm ok with that.
Fair Use Takes a Hit in NYT v. OpenAI (Score:5, Interesting)
I just plowed through the full 47-page ruling in New York Times v. OpenAI, and the Ars Technica summary leaves out some of the most important bits.
Yes, the judge let key claims move forward—including contributory infringement—but Ars barely mentions the most striking part: the court rejected OpenAI’s “substantial noninfringing use” defense, calling it a “straw man.” The judge made clear that ChatGPT’s ongoing relationship with users means OpenAI could still be liable if its models regurgitate copyrighted material. This is a big deal. It shows courts may not treat LLMs like neutral tools—and that alone could reshape how AI output liability works.
Also missing: while the NYT’s “hot news” and DMCA claims were tossed, similar DMCA claims from other plaintiffs in the consolidated cases survived. That nuance matters. As a co-defendant in this case, Microsoft got out clean this time, but the ruling invites a deeper discussion on whether Big Tech partners are merely embedding AI—or helping build bullet-proof copyright infringement engines.
This ruling is a canary in the coal mine for Meta and others. If courts follow this logic, arguments about “fair use” and “general-purpose tools” may not be enough to avoid discovery—or liability. The AI legal landscape just got a lot more real.
Re: (Score:1)
Re: (Score:2)
There are two sides to generative AI:
1) Training. Is someone allowed to train on the NYT data?
2) Output. Are they allowed to produce output too similar to the original data?
1) will still take a few court cases and it is dangerous to dismiss fair use as it would affect a lot of other uses than AI training.
2) is not that complicated, I would think. Let's take ChatGPT as a black box and not care if AI is in there or not. If it now produces content that infringes copyright, it should not matter what's inside the box.
AI has no taste (Score:1)
AI News PiHole (Score:1)
Why only this one suit on LLM outputs? (Score:2)
TLDR: Bullshit defense rejected (Score:2)
This does not affect the merits of the case.
OpenAI tried to weasel out of it with technicalities (NYT could have known we were crawling because we've talked about such things in the past) and the judge told them they won't get out that easily.