OpenAI's Motion to Dismiss Copyright Claims Rejected by Judge (arstechnica.com) 102

Posted by EditorDavid on Saturday April 05, 2025 @10:34AM from the prompt-ruling dept.

Is OpenAI's ChatGPT violating copyrights? The New York Times sued OpenAI in December 2023. But Ars Technica summarizes OpenAI's response. The New York Times (or NYT) "should have known that ChatGPT was being trained on its articles... partly because of the newspaper's own reporting..."

OpenAI pointed to a single November 2020 article, where the NYT reported that OpenAI was analyzing a trillion words on the Internet.

But on Friday, U.S. district judge Sidney Stein disagreed, denying OpenAI's motion to dismiss the NYT's copyright claims partly based on one NYT journalist's reporting. In his opinion, Stein confirmed that it's OpenAI's burden to prove that the NYT knew that ChatGPT would potentially violate its copyrights two years prior to its release in November 2022... And OpenAI's other argument — that it was "common knowledge" that ChatGPT was trained on NYT articles in 2020 based on other reporting — also failed for similar reasons...

OpenAI may still be able to prove through discovery that the NYT knew that ChatGPT would have infringing outputs in 2020, Stein said. But at this early stage, dismissal is not appropriate, the judge concluded. The same logic follows in a related case from The Daily News, Stein ruled. Davida Brook, co-lead counsel for the NYT, suggested in a statement to Ars that the NYT counts Friday's ruling as a win. "We appreciate Judge Stein's careful consideration of these issues," Brook said. "As the opinion indicates, all of our copyright claims will continue against Microsoft and OpenAI for their widespread theft of millions of The Times's works, and we look forward to continuing to pursue them."

The New York Times is also arguing that OpenAI contributes to ChatGPT users' infringement of its articles, and OpenAI lost its bid to dismiss that claim, too. The NYT argued that by training AI models on NYT works and training ChatGPT to deliver certain outputs, without the NYT's consent, OpenAI should be liable for users who manipulate ChatGPT to regurgitate content in order to skirt the NYT's paywalls... At this stage, Stein said that the NYT has "plausibly" alleged contributory infringement, showing through more than 100 pages of examples of ChatGPT outputs and media reports showing that ChatGPT could regurgitate portions of paywalled news articles that OpenAI "possessed constructive, if not actual, knowledge of end-user infringement." Perhaps more troubling to OpenAI, the judge noted that "The Times even informed defendants 'that their tools infringed its copyrighted works,' supporting the inference that defendants possessed actual knowledge of infringement by end users."

OpenAI's Motion to Dismiss Copyright Claims Rejected by Judge

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 102 Comments Log In/Create an Account

Comments Filter:

Is it copying their work though? (Score:2, Interesting)

by Talon0ne ( 10115958 ) writes:

It seems to me like AI is just sort of 'ingesting' content, internalizing it, and building its world view based on it... Just like a person would. No word for word copying is going on (otherwise the model would be many many terabytes)... So IMHO this should be dismissed, plain and simple.
- Re: Is it copying their work though? (Score:1)
  
  by hunter44102 ( 890157 ) writes:
  
  That's the issue. These Large language models are useless without stealing other people's data and copyrighted work. All it is a glorified search engine profiting off others
  - Re: (Score:2)
    
    by Valgrus Thunderaxe ( 8769977 ) writes:
    
    If I read the *same* material, freely available on the web, and use it to form some type of world-view or intellectually enrich myself, and then use that information to start a business, for example, is this somehow different? I'm not convinced this is the case.
    - Re: (Score:2, Informative)
      
      by Anonymous Coward writes:
      
      Exactly. They're essentially summarizing information that these companies have made available to the public. Search engines have done this almost since the beginnings of search engines, and everyone here defended this practice, vigorously (see the stuff from Australia where they didn't like Google summarizing their articles in search results).
    - Re: (Score:3)
      
      by Retired Chemist ( 5039029 ) writes:
      
      You are not making commercial use of their content. If you start a competing business using their content, you might well be sued.
      - Re: (Score:2)
        
        by Visarga ( 1071662 ) writes:
        
        NYT is off the mark here. Who needs their 10 year old news today? Nobody. It's useless content, with only historical or reference value. Why are they attacking OpenAI for useless content the model only partially regurgitates? Isn't it easier to infringe directly? I mean why would I generate Harry Potter from ChatGPT when it will be slow, expensive and imprecise, while copying is fast, free and exact? It just shows LLMs are the *worst* copyright infringement method ever invented.
    - - Re: (Score:1)
        
        by Anonymous Coward writes:
        
        We're allowed to, for example, take what we read and turn it into a profitable podcast episode or blog post, even including direct quotes of the material.
  - Re: (Score:2)
    
    by thegarbz ( 1787294 ) writes:
    
    That's the issue. These Large language models are useless without stealing other people's data and copyrighted work. All it is a glorified search engine profiting off others
    I just stole your post by looking at it. Sue me.
  - Re: Is it copying their work though? (Score:4, Insightful)
    
    by rocket rancher ( 447670 ) writes: <themovingfinger@gmail.com> on Saturday April 05, 2025 @07:04PM (#65284037)
    
    That's the issue. These Large language models are useless without stealing other people's data and copyrighted work. All it is a glorified search engine profiting off others
    You are either a lame-ass troll, or a software engineer who just got replaced by an LLM. I'm going with the former. If it's the latter, just grow up already, and find a new career. Calling it stealing doesn’t make it so. That may work as clickbait, but it fails as analysis. Courts have a well-defined process for determining whether use of copyrighted material qualifies as infringement—and that process includes doctrines like fair use, transformative use, and de minimis use. You can scream theft all day, but until a judge agrees with you, all you’ve got is a talking point that tattoos "Troll here" on your forehead.
    And no, LLMs are not “glorified search engines.” That’s not even wrong—and I get that trolls like you aren't interested in being right-- but it misunderstands both what search engines do and how large language models work. LLMs are generative systems that create statistically probable outputs based on token prediction across massive, contextually learned patterns. They don’t fetch—they synthesize. That’s a critical difference, and if you’re going to rage-troll about the technology, you should at least try to describe it accurately.
    If you want to argue that LLMs raise real ethical or legal issues, fine—we’re all ears. But if you show up with vague accusations, tech illiteracy, and zero nuance, don’t expect the rest of us to mistake your trollish drivel for a point. There are a lot of serious conversations happening in this thread. Maybe try contributing to one. *plonk*.
    
    - Re: (Score:2)
      
      by Visarga ( 1071662 ) writes:
      
      I'm sorry I don't have points to upvote you.
    - Re: (Score:1)
      
      by WidjettyOne ( 10203247 ) writes:
      
      Not the OP, but LLMs actually are a little bit like search engines. Both of them scrape tremendous amounts of data, tokenise it, shove that data into a big table/matrix with frequency/importance/reliability measures, and then try to guess which bits of that table is most appropriate given the user's input.
      It's a pretty broad generalisation but on balance I'd say an LLM is much more like a modern Search Engine than it is like Eliza or any of the old rules-based AIs.
      I think the main difference is that LLMs do
- Re: (Score:3)
  
  by StormReaver ( 59959 ) writes:
  
  OpenAI's defense at this point is geared around minimizing damages. They have already lost the infringement, and they know it. The Times has a very strong case, including willful infringement, so the only question is what the penalties will be. All that's happening now is going through the motions to the guilty verdict. The Times will have to screw up royally in order to lose.
  - Re: (Score:2)
    
    by Pinky's Brain ( 1158667 ) writes:
    
    If they lose some lawyers are going to write to the owners of every registered work under the sun to start the largest class action in history. With statutory damages for wilful infringement times millions they'd be bankrupt if the truth comes out. How many people is Sam willing to kill to prevent the truth of the training set of getting out?
    I think fair use is their only chance.
    - Re: (Score:3)
      
      by Visarga ( 1071662 ) writes:
      
      I don't think they need to defend much on this front, copyright is dead, just doesn't realize it yet. There are two choices here: 1. either protect expression while LLMs can generate different enough expression unimpeded, in which case copyright can't be protected anymore, or 2. protect abstractions, styles and facts so LLMs can't use them in any form, but in this case human creators are also going to be barred from the same, which will tank creativty. No way around it, the problem is that now LLMs can quic
      - Re: (Score:2)
        
        by Pinky's Brain ( 1158667 ) writes:
        
        Copyright protects reproduction, like the reproduction into the training set. What the LLM contains and produces is entirely besides the point ... start with the low hanging fruit, statutory damages for copies of registered works into the training set. That alone can bankrupt OpenAI.
        They pirated every text in the world, same as Meta. Even with assamsinations, I don't think they can cover that up.
  - Re: (Score:3)
    
    by rocket rancher ( 447670 ) writes:
    
    You are treating a denied motion to dismiss like it’s the closing statement at trial. It is not. The judge didn’t rule that OpenAI infringed copyright—he ruled that the allegations, if proven, are strong enough to warrant discovery and trial. That is a big deal, yes—but it is not a verdict. Pretending otherwise oversimplifies what’s actually happening and turns complex legal proceedings into fanfiction.
    OpenAI's defense at this point is geared around minimizing damages.
    Not even close. OpenAI is still actively defending the core claim that train
  - Re: (Score:2)
    
    by Visarga ( 1071662 ) writes:
    
    This is a case of regurgitation that used to happen in the era where LLM developers didn't deduplicate their training sets well enough. Have you seen any other regurgitation suits more recently? No? Because it doesn't happen. In fact it only happens if you are entrapping the model with an exact paragraph from the target material as seed. So you already need to have access to the material to be able to make a LLM regugitate it, and it only happens like 1 in 100 times, and it's usually imprecise.
- - Re: (Score:1)
    
    by Talon0ne ( 10115958 ) writes:
    
    So every news station that reports "The New York Times today said XYZ" should fall in the same bucket because they are doing the SAME THING... They are reading their content and parroting parts of it to their audience. How is this any different? If OpenAI sited the NYT would that be better?
    - Re:Is it copying their work though? (Score:5, Insightful)
      
      by Jeremi ( 14640 ) writes: on Saturday April 05, 2025 @11:28AM (#65283417) Homepage
      
      So every news station that reports "The New York Times today said XYZ" should fall in the same bucket because they are doing the SAME THING...
      
      That's an interesting analogy, but flawed because the news station is only referring to the New York times article, not pretending that it's their own reporting.
      A better analogy might be a person (or company) that buys a copy of the New York Times every day, then rewrites all the articles by hand into their own words, and publishes the rewritten articles as a "competing newspaper" at a lower price, because they didn't have to pay for any actual reporting or information gathering, only for rewriting. Dunno where that would stand legally, but it seems ethically dubious.
      
      - Re: (Score:1)
        
        by Valgrus Thunderaxe ( 8769977 ) writes:
        
        What If I look a recipe published in the NYT and made a dish based on that at my hypothetical restaurant?
        
        Or, I improved my property based on an article in Home and Garden, and then subsequently rented out the property?
        
        All the arguments seem to be that someone else is making money that isn't them. That seems more like jealously or a lack of imagination on the part of the first-party.
        
        Re: (Score:1)
        
        by martin-boundary ( 547041 ) writes:
        
        They are if they are collected and organized in a recipe book
        
        Re: (Score:1)
        
        by Anonymous Coward writes:
        
        No; the actual words and photos in the book are copyrighted, but the procedure/process/methods of operation aren't. So you can't xerox (nor write out word-for-word) the recipe out of the book and sell that. But you can re-write the same recipe in your own words and make your own recipe book, and you can make the recipe and sell the result.
      - Re: (Score:2)
        
        by codebase7 ( 9682010 ) writes:
        
        Disclaimer: IANAL.
        A better analogy might be a person (or company) that buys a copy of the New York Times every day
        So far so good....
        then rewrites all the articles by hand into their own words
        This is getting off track..... (See also every English teacher ever demanding this to avoid plagiarism charges....)
        and publishes the rewritten articles as a "competing newspaper" at a lower price
        And... we've lost the plot! This is a direct means rea. You intentionally bought a newspaper with the explicit goal of copying the content and selling it.
        
        This isn't a "Well, I read this 6 months / 20 years / etc. ago and applied that concept to make X" issue at all. Nor is it a "Person A read concept 1, and person B asked Person A how something could be
      - Re: (Score:2)
        
        by Visarga ( 1071662 ) writes:
        
        That is not a better analogy because ChatGPT does not write NYT every day. It answers pointed questions from their users, so it doesn't even solve the same task as newspapers. You got to realize there are 2 ways content can enter a LLM - during training, through the training set (this case), or during inference, using a search engine (RAG). In the first case - it's just text from old news - there is no value lost by NYT. In the second case, the user adds their own transformation, and LLMs retrieve 5-10 sour
  - Re: (Score:1)
    
    by thegarbz ( 1787294 ) writes:
    
    Despite what TV and Hollywood show you, courts/judges do not like when either side plays stupid word games.
    The entire legal profession almost exclusively revolves around discussing the legal definitions of words and playing word games you quarterwit. Judges do this more than anyone else.
    - - Re: (Score:2)
        
        by thegarbz ( 1787294 ) writes:
        
        Except exactly like the way the OP did, and worth pointing out that definitions is precisely the basis of this case as we.
        No I don't watch TV, and if I did it wouldn't be boring arse courtroom stuff. I have actually been in court, twice, and one was a contract dispute where my council spent ages describing why my precise actions fit the definition of a specific word in the contract, I have taken several legal subjects in my degree and my sister is a lawyer who does this stuff on a daily basis. The definitio
- Re: (Score:3)
  
  by ChunderDownunder ( 709234 ) writes:
  
  Set booby traps as a form of watermarking.
  Companies will start deliberately seeding articles with fake news, grammatical oddities, made up words and other forms of digital subterfuge. Much like dictionaries and mapmakers used to insert phantom content to detect copying.
  Then when bots scrape your content, you can show the judge the fingerprinting you inserted.
  You may inadvertently invent a whole new vocabulary but once you've draffered the April sneggleklergen, you're past the point of no return.
  - That doesn't really work (Score:2)
    
    by rsilvergun ( 571051 ) writes:
    
    Or if it does AI is just going to collapse on its own. That's because as AI takes over the internet AIs will begin to train on AI.
    
    That's a problem that was already seen in advance and a lot of work and effort has already gone into solving it and a lot of work and effort will continue to go into solving it.
    
    So you don't need the poison your content. If they can't find a workable solution to The problem you're raising AI will collapse on its own and if they can then you're wasting your time poisoning
    - Re: (Score:3)
      
      by tlhIngan ( 30335 ) writes:
      
      That's why services like CloudFlare go and send AI bots down a maze of twisty passages, all filled with AI generated slop. Normal users aren't likely to come across links to those traps because they're going to be things like white on white text, or links embedded in tiny fonts or on punctuation and other things other than by accident.
      The goal of which is those services that blindly scrape get their AI demise by internalizing AI generate slop. The other stuff of which can be excluded by robots.txt. The AI
  - Re: (Score:2)
    
    by xevioso ( 598654 ) writes:
    
    This is happening now.
    You can go to recipe websites and find stupid instructions like, "Cook the chicken at 368 degrees for 20 minutes". No one would ever actually use that as real instructions for chicken, because it doesn't matter if it's 368 degrees or 350 for that amount of time. That number in inserted so the folks making the recipe websites can tell when someone has copied their recipes.
- Re: (Score:2)
  
  by Retired Chemist ( 5039029 ) writes:
  
  That is what the lawsuit is about. What uses are permitted under the copyright law and what are not. Since no one had heard of LLMs when the law was written, this is open to debate. Since as far as I know, none of us are copyright lawyers, are opinions are basically of no value.
  - Re: (Score:2)
    
    by codebase7 ( 9682010 ) writes:
    
    Since as far as I know, none of us are copyright lawyers, are opinions are basically of no value.
    Our opinions have value because, we as a society, define the laws that we all abide under. This also means we have a stake in this decision because we are directly impacted by it and it's potential to limit our ability to compete globally / domestically.
    - Re: (Score:2)
      
      by Retired Chemist ( 5039029 ) writes:
      
      We are entitled to opinions about what the law ought to be. What the law is, is another matter. Whether LLMs should be allowed to use other people's data is a matter of opinion, whether it is legal is a matter for lawyers, judges, and juries.
      - Re: (Score:2)
        
        by codebase7 ( 9682010 ) writes:
        
        Laws that fail to account for the needs of all, are not laws, but tools wielded against the public for little benefit. Such tools have no place in a democracy. If you want to justify living under the reign of tools, there are plenty of countries around the world that would be a better fit for you.
        
        Re: (Score:2)
        
        by Retired Chemist ( 5039029 ) writes:
        
        It is called the rule of law, and it is all that protects us from the arbitrary acts of the powerful. The laws may be wrong or out of date, but anarchy or arbitrary totalitarianism are the alternatives. If you do not like the laws elect someone who will change them. That is what the MAGAs have done. Now they get to live with the consequences, that they did not understand. Ignorance is the great enemy of democracy.
- Re: Is it copying their work though? (Score:2)
  
  by Big Hairy Gorilla ( 9839972 ) writes:
  
  This is why, as posted by martin-boundary 8 hours ago, on the thread about how Wikipedia is serving 80% of the hits on the site to bots:
  
  For thousands of years, man took a small boat and went to fish in the ocean to feed his family. Now mega trawlers rake the ocean floor with nets that catch everything swimming for miles around the ship.
  
  For thousands years, fish populations have existed and been caught by humans. Now, fish populations are going extinct because the trawlers are fishing faster than humans did.
- Re:Is it copying their work though? (Score:4, Insightful)
  
  by OrangeTide ( 124937 ) writes: on Saturday April 05, 2025 @12:11PM (#65283457) Homepage Journal
  
  It's copying because it cannot be transformative because for it to have a new purpose and meaning, would imply that so-called AI systems have the ability to express purpose or meaning.
  Data go into computer, data come out of computer. Copyright still holds.
  Laypeople are far too quick to anthropomorphize an algorithm. It's not like you or I reading a Harry Potter, then deciding that it would be fun to write about a wizard's school for cats. Sure maybe derivative, but very rarely is anything in art is cut from whole cloth.
  
  - Re: (Score:2)
    
    by rocket rancher ( 447670 ) writes:
    
    You’ve taken a philosophical objection—“AI can’t express meaning”—and tried to turn it into a legal slam dunk. But copyright law is not that simple, and courts do not require sentience to evaluate whether a use is transformative or infringing. You’re confusing how you feel about AI with how the law actually works. More to the point—you claim an AI cannot express purpose or meaning, and therefore cannot be transformative. That may sound deep, but it collapses u
  - Re: (Score:2)
    
    by codebase7 ( 9682010 ) writes:
    
    Prompt: "Read a Harry Potter, then write about a wizard's school for cats."
    
    The LLM might use data from Harry Potter under that prompt, but it might not. Removing the Harry Potter bit only makes it more uncertain. Just because an LLM could use data from Harry Potter doesn't mean that the LLM's output is Harry Potter. Nor does it guarantee that Harry Potter's copyright would legally apply.
    
    People may be quick to anthropomorphize things, but just as many others are quick to declare something as cut and dry
    - Re: (Score:2)
      
      by OrangeTide ( 124937 ) writes:
      
      I'd recommend suing the AI service's owner if you find a few matching words in the output with your copyrighted work. Then have them prove that the data they ingested didn't get accidentally used in the LLM's output. Since it's a civil case, you pretty much just have to show that a business was materially harmed by what is probably a copyright violation. As it did ingest copyrighted material without permission, and the defendant has no way of knowing if the copyrighted content was use as a basis for the gen
- Re: (Score:3)
  
  by GrumpySteen ( 1250194 ) writes:
  
  more than 100 pages of examples of ChatGPT outputs and media reports showing that ChatGPT could regurgitate portions of paywalled news articles
  You should at least TRY to get to the end of the summary before you make a fool of yourself in public.
- So the problem is the ingestation (Score:3)
  
  by rsilvergun ( 571051 ) writes:
  
  The content still exists on their servers in order to be transformed into whatever the AI creates and they're not allowed to do that under the law without licensing it. Copyright is just that. Your right to make a copy. And they are absolutely making a copy when they ingest the data.
  
  I don't think it matters. The courts tend to side with whoever has the most money and whoever can make the most money and in this case the AI companies have an unlimited capacity to make money here and unlimited amounts of ve
  - Re: (Score:2)
    
    by rocket rancher ( 447670 ) writes:
    
    Let me get this straight. Your argument boils down to: “Ingesting is copying, copying is illegal, therefore case closed,” followed by a shrug and a rant about how money always wins. That is not a legal position. That is a trollish tantrum trying to look like cynicism. The court walked in with a ruling that will echo through every AI model, license agreement, and copyright claim for years to come. If you want to troll at the bumper-sticker level, that is definitely your lane—but do not mis
- Re: (Score:2)
  
  by fluffernutter ( 1411889 ) writes:
  
  You can't call it a 'world view' if it exists in a complete vacuum away from the world. Nor can you call a direct calculation on only the internet a 'world view'. Having a world view involves experiencing the five senses of all the things happening around you.
  - - Re: Is it copying their work though? (Score:2)
      
      by fluffernutter ( 1411889 ) writes:
      
      She had three senses.
      - Re: Is it copying their work though? (Score:2)
        
        by fluffernutter ( 1411889 ) writes:
        
        I would say her world view was definitely smaller than if she had five senses. That is not the common case for a person though, you just picked out one anomaly
- Re: (Score:2)
  
  by rocket rancher ( 447670 ) writes:
  
  It seems to me like AI is just sort of 'ingesting' content, internalizing it, and building its world view based on it... Just like a person would. No word for word copying is going on (otherwise the model would be many many terabytes)... So IMHO this should be dismissed, plain and simple.
  You’re absolutely right in spirit—these models do internalize data and abstract patterns from it in a way that feels eerily human. That’s the fascinating part.
  But the legal system isn’t just asking how it learns, it’s asking what it can regurgitate, and under what circumstances. The NYT's case is not claiming that the entire model is a giant database of news articles—it’s alleging that, under the right conditions, ChatGPT can reproduce near-verbatim excerpts from th
- Re: Is it copying their work though? (Score:2)
  
  by reanjr ( 588767 ) writes:
  
  "No word for word copying is going on"
  You don't know that. It can't keep a record of everything it's read, but the models are largely black boxes. There's really no way to know if it's memorizing long passages
  The only way to test this would be to see if it generated identical passages. And these tools have indeed done so. It actually doesn't even matter if it's actually copying word for word or whether it's parallel construction. Its still infringement if it's the same words.
- Re: (Score:2)
  
  by Visarga ( 1071662 ) writes:
  
  Yes, the training set is 100 to 1000x the size of the model. Even if they wanted, they could not encode the full training set into the model. What the model retains is just a residue of it.
- Re: (Score:2)
  
  by Bongo ( 13261 ) writes:
  
  If material could never be reproduced (reading and remembering) then the material would be worthless to everyone. But if it could always be reproduced with no benefit to its creators, then they could not feed themselves and survive. Where to set the balance is full of detail and difficulty.
  LLMs may well need their own special rules. For example, I for one gave up my O'Reilly subscription because now I can get most of that quickly looking up the basics of some tech thing quite quickly from an LLM -- so someh
- Re: Is it copying their work though? (Score:2)
  
  by zkiwi34 ( 974563 ) writes:
  
  Yet computers don't remember stuff but rather store it on disk. Keeping a copy. And that's the issue. AI cannot function well, if at all, without copying everything it can get hold of.
Lawyers do not understand LLMs (Score:2)

by ihadafivedigituid ( 8391795 ) writes:

OpenAI should be liable for users who manipulate ChatGPT to regurgitate content in order to skirt the NYT's paywalls ...
LLMs are notoriously bad at verbatim retrieval, and the notion that someone would use ChatGPT or whatever to read the NYT is the stupidest thing I'm likely to read on an unusually stupid news day.

The New York Times is a drop, or at most a bucket, in the ocean of training material used to generate a vast soup of vectors. This is profoundly transformative: it's not a .zip file of NYT articles being published, it's the combined influence of a myriad of sources--including the New York Times. If this isn't fa
- Re:Lawyers do not understand LLMs (Score:5, Insightful)
  
  by StormReaver ( 59959 ) writes: on Saturday April 05, 2025 @11:45AM (#65283433)
  
  Google won, and if there's any consistency at all then the LLM trainers will win too.
  Google Library Project and LLM trainers aren't even remotely similar, so it would be incredibly inconsistent for OpenAI to prevail. At the very least, Library Project shows only a snippet of a book. It then points users to legitimate purchasing options rather than charging users for access to material for which Google has no legal rights. This does not infringe on the rights-holders ability to monetize their rights. OpenAI, on the other hand, copies the entirety of such material, then directly charges for access to it. This deprives the rights-holders of their ability to control/monetize their creations.
  OpenAI is the largest, most blatant copyright infringer ever created. If OpenAI were to prevail, it would destroy anyone's ability to make a living from any creative endeavor that can exist digitally.
  
  - Re: (Score:3, Insightful)
    
    by ihadafivedigituid ( 8391795 ) writes:
    
    OpenAI, on the other hand, copies the entirety of such material, then directly charges for access to it.
    What part of "LLMs are bad at verbatim regurgitation" do you not understand? Do you think OpenAI or Alibaba or whoever discovered a 1000:1 lossless compression scheme?
    
    You sound like a lot of people I know who have strong opinions on the subject but little to no experience actually using LLMs.
- It has nothing to do with the retrieval (Score:2)
  
  by rsilvergun ( 571051 ) writes:
  
  The issue here is that the original content still gets stored in some form in order to be used to create the new content.
  
  Copyright doesn't cover what goes into a human brain. But as soon as you start reading bits and bytes and then copying those bits and bites you've triggered copyright. If the law is applied as written then AI isn't legal.
  
  I do not expect the law to be applied as written though. There's so much money to be made in AI and judges tend to side with whoever's got the most money. Like th
  - Re: (Score:2)
    
    by techno-vampire ( 666512 ) writes:
    
    If the case depended on what the law says, and what it means, you would probably be right because it's the judge's job to interpret the law. However, unless the plaintiff's attorneys are incompetent, this case is going to hing on the facts, and the jury is the trier of facts. It doesn't matter what the judge thinks, or how he (she) would rule giving the chance, the only thing that matters is what the jury decides. And, as this is a civil suit, the standard is not the famous "beyond a reasonable doubt," b
    - Re: (Score:2)
      
      by ihadafivedigituid ( 8391795 ) writes:
      
      You're sort of right, but the judge exerts considerable influence on the outcome through decisions on admissibility of evidence and a mass of other procedural rulings.
      - Re: (Score:2)
        
        by techno-vampire ( 666512 ) writes:
        
        True. However, the judge has to be careful not to be too openly biased, lest he leave himself open to an appeal on the grounds of bias.
  - Re: (Score:2)
    
    by ihadafivedigituid ( 8391795 ) writes:
    
    The issue here is that the original content still gets stored in some form in order to be used to create the new content.
    Nope, that's not alleged in the NYT lawsuit, or I can't find it in their filing despite a thorough search just now. Their beef is with the training, alleged verbatim retrieval, associated search stuff etc.
    
    Look up "transformative" fair use. Your post reveals a fundamental misunderstanding of how copyright is applied and of the specific issues at hand in this suit.
    - Re: It has nothing to do with the retrieval (Score:3)
      
      by reanjr ( 588767 ) writes:
      
      Transformative fair use might apply to the finished product. But it doesn't apply to training. If I am a corporation selling my employees' services to others and I copy my competitors' training material to train my workforce, I have infringed.
      Individuals are covered by fair use for copying things for learning, commercial entities are not.
      - Re: (Score:2)
        
        by ihadafivedigituid ( 8391795 ) writes:
        
        Transformative fair use might apply to the finished product. But it doesn't apply to training.
        [citation missing]
        
        You won't find that citation, either, because neither statute nor case law has caught up with the technology. Also, your analogy is misapplied because "training" isn't really the same thing in those two cases.
        
        Re: It has nothing to do with the retrieval (Score:2)
        
        by reanjr ( 588767 ) writes:
        
        Training is not transformative. It's simply copying the data from a website to a database for future processing.
        
        Re: (Score:2)
        
        by ihadafivedigituid ( 8391795 ) writes:
        
        Now you're saying two different things:
        
        1: Training is not transformative, which is patently untrue.
        2: Copying the data is infringing, which is complete nonsense because neither the web nor search engines as we know them would be possible.
        
        You really need to read up on fair use and modern copyright case law.
        
        Re: It has nothing to do with the retrieval (Score:2)
        
        by reanjr ( 588767 ) writes:
        
        Please, oh legal scholar, provide case law citation to support your comments.
        
        Re: (Score:2)
        
        by ihadafivedigituid ( 8391795 ) writes:
        
        Sure thing. Copying & indexing:
        Field v. Google, Inc., 412 F.Supp. 2d 1106 (D. Nev. 2006) [stanford.edu]. Plaintiff, an attorney, asserted your line of thought and got mauled.
        
        I've already referred to Authors Guild v. Google 804 F.3d 202 (2nd Cir. 2015) [justia.com] for an example of a successful use of the transformative fair use exemption. As I already noted, case law (and indeed statute) hasn't caught up with the technology when it comes to LLM training, but it's a judicial absurdity to treat actual copying and verbatim stora
        
        Re: It has nothing to do with the retrieval (Score:2)
        
        by reanjr ( 588767 ) writes:
        
        Both of those cases struggle with commercial damages. The first because it is simply pointing at another's content which has already been made public, and the second as an analog to physical libraries.
        OpenAI is absolutely causing commercial damages. As I mentioned by analogy, you can't copy your competitors training materials to distribute to your own people.
        
        Re: (Score:2)
        
        by ihadafivedigituid ( 8391795 ) writes:
        
        If LLMs distributed verbatim copies, special prompting that goes against the TOS that elicits a few paragraphs notwithstanding, your analogy would be less bogus.
        
        To repeat: the New York Times isn't losing any significant business because ten year old articles were used as a minuscule influence on a vast sea of vectors. I would love to see them prove damages even if they overcame the obvious fair use issues. Anything can happen in court, but given the way the wind is blowing, I wouldn't bet anything I care
        
        Re: It has nothing to do with the retrieval (Score:2)
        
        by reanjr ( 588767 ) writes:
        
        Just leaving this here, cause it seems relevant.
        https://techcrunch.com/2025/04... [techcrunch.com]
        Law professors say:
        "The use of copyrighted works to train generative models is not âtransformative,â(TM) because using works for that purpose is not relevantly different from using them to educate human authors, which is a principal original purpose of all of [authorsâ(TM)] works"
        
        Re: (Score:2)
        
        by ihadafivedigituid ( 8391795 ) writes:
        
        The EFF, whom I trust more than rando "law professors", is siding with Meta on this one:
        
        https://chatgptiseatingtheworld.com/wp-content/uploads/2025/04/EFF-Amicus-Brief-in-Opp-to-Kadrey-Motion-for-SJ-Apr-3-2025.pdf [chatgptise...eworld.com]
        
        Re: It has nothing to do with the retrieval (Score:2)
        
        by reanjr ( 588767 ) writes:
        
        That's fine. But can you admit that you are no more an authority on this than I? That we are both relying on plain readings of actual professionals who are way more qualified to answer this question?
        
        Re: (Score:2)
        
        by ihadafivedigituid ( 8391795 ) writes:
        
        No, I clearly understand the subject better as it stands. Law, especially IP and constitutional law, has been a hobby of mine since the 90s--and I have been beating up on family law attorneys for the past 15 years with great success.
        
        What neither of us know is how this will play out. I will repeat myself repeating myself:
        As I already noted, case law (and indeed statute) hasn't caught up with the technology when it comes to LLM training ...
        While there is uncertainty, it seems pretty obvious in the current environment that if this goes all the way (which is probable given the stakes on all sides) then SCOTUS is not likely to
Public is public (Score:2)

by MpVpRb ( 1423381 ) writes:

This is no different from a student educating themself by reading publicly available stuff.
There is a bigger issue here, the future of copyright and the value of content.
As content, whether text, images or video becomes effortless to create, its value will drop to zero.
Some things, like professional journalism, still require great effort to create. This is expensive.
In the past, the costs were paid by advertising, but bots don't read ads.
Expect the end of publicly available quality journalism. Expect subscr
- Re: (Score:3)
  
  by GrumpySteen ( 1250194 ) writes:
  
  more than 100 pages of examples of ChatGPT outputs and media reports showing that ChatGPT could regurgitate portions of paywalled news articles
  Paywalled isn't public.
- Re: Public is public (Score:2)
  
  by reanjr ( 588767 ) writes:
  
  Correction: this is no different than a teacher photocopying textbooks to give to his students.
There is no agree to disagree (Score:1)

by Iamthecheese ( 1264298 ) writes:

Some people think AIs work is transformative. Some think it's just mindless copying and passing on no matter how much mixing and creativity there is. Never the twain shall meet because of emotional investments on both sides on topics as esoteric as whether machines, however intelligent, can be said to think. There is no common ground on that topic so let's bypass it. If you think AIs violate copyright every time they read an article what remedy do you want? LLMs cannot exist without doing that and cannot p
Online? Readable. (Score:2)

by bradley13 ( 1118935 ) writes:

If OpenAI hacked their way into the NYT, then they are liable. If they just trained on publicly available content? That needs to be declared legal.
Yes, I know there are s lot of amateurish, ill-behaved bots out there. That's a different problem altogether. The point is: Material made freely available on the internet is free to read: for humans, aliens, or AIs.
Share the money (Score:2)

by n0w0rries ( 832057 ) writes:

It's really simple. We have the technology.
Let AI read everything.
If you charge for AI, then you share the revenue you make off the knowledge you learned.
If you charge $10 for a response that includes information learned from 3 books written by John Doe, John Doe gets a % of what you charge.
Maybe your CEO doesn't get paid $100M a month as a result, I'm ok with that.
Fair Use Takes a Hit in NYT v. OpenAI (Score:5, Interesting)

by rocket rancher ( 447670 ) writes: <themovingfinger@gmail.com> on Saturday April 05, 2025 @03:37PM (#65283747)

I just plowed through the full 47-page ruling in New York Times v. OpenAI, and the Ars Technica summary leaves out some of the most important bits.
Yes, the judge let key claims move forward—including contributory infringement—but Ars barely mentions the most striking part: the court rejected OpenAI’s “substantial noninfringing use” defense, calling it a “straw man.” The judge made clear that ChatGPT’s ongoing relationship with users means OpenAI could still be liable if its models regurgitate copyrighted material. This is a big deal. It shows courts may not treat LLMs like neutral tools—and that alone could reshape how AI output liability works.
Also missing: while the NYT’s “hot news” and DMCA claims were tossed, similar DMCA claims from other plaintiffs in the consolidated cases survived. That nuance matters. As a co-defendant in this case, Microsoft got out clean this time, but the ruling invites a deeper discussion on whether Big Tech partners are merely embedding AI—or helping build bullet-proof copyright infringement engines.
This ruling is a canary in the coal mine for Meta and others. If courts follow this logic, arguments about “fair use” and “general-purpose tools” may not be enough to avoid discovery—or liability. The AI legal landscape just got a lot more real.

- Re: (Score:2)
  
  by allo ( 1728082 ) writes:
  
  There are two sides to generative AI:
  1) Training. Is someone allowed to train on the NYT data?
  2) Output. Are they allowed to produce output too similar to the original data?
  1) will still take a few court cases and it is dangerous to dismiss fair use as it would affect a lot of other uses than AI training.
  2) is not that complicated, I would think. Let's take ChatGPT as blackbox and not care if AI is in there or not. If it now produces content that infringes copyright, it should not matter what's inside the b
AI has no taste (Score:1)

by vladoshi ( 9025601 ) writes:

The "art" is terrible. If someone gave me a birthday card drawn by AI I would punch them in the face because the art is so awful. It reproduces 3 art styles in 1 picture resulting in garish colours, nonsense images and out of context objects. Because there is not underlying artistic reason for the objects we just end up the uncanny valley feeling of undigestible food like substance that reminds you of food .
AI News PiHole (Score:1)

by ShadowDragen ( 805730 ) writes:

Would someone please create a scraper for the NYT, the WSJ, the economist, and the rest of the paywalled stuff, compare it to the AP and Reuters, and just publish it for free? I'm too lazy to get the script working yet, but the copyright nonsense is about to be bullshit.
Why only this one suit on LLM outputs? (Score:2)

by Visarga ( 1071662 ) writes:

This is the only suit that focuses on LLM outputs as being infringing. Most other suits focus on inputs, using data in training models. I think this demonstrates that LLMs normally do not infringe copyrights in their outputs, which is a big blow to copyright defenders. If regurgitation was more common, we would see plenty of suits.
TLDR: Bullshit defense rejected (Score:2)

by allo ( 1728082 ) writes:

This does not affect the merits of the case.
OpenAI tried to weasel out of it with technicalities (NYT could have known we were crawling because we've talked about such things in the past) and the judge told them they won't get out that easily.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Is it copying their work though? (Score:2, Interesting)

Re: Is it copying their work though? (Score:1)

Re: (Score:2)

Re: (Score:2, Informative)

Re: (Score:3)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: Is it copying their work though? (Score:4, Insightful)

Re: (Score:2)

Re: (Score:1)

Re: (Score:3)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:1)

Re:Is it copying their work though? (Score:5, Insightful)

Re: (Score:1)

Re: (Score:1)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:3)

That doesn't really work (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: Is it copying their work though? (Score:2)

Re:Is it copying their work though? (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

So the problem is the ingestation (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: Is it copying their work though? (Score:2)

Re: Is it copying their work though? (Score:2)

Re: (Score:2)

Re: Is it copying their work though? (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: Is it copying their work though? (Score:2)

Lawyers do not understand LLMs (Score:2)

Re:Lawyers do not understand LLMs (Score:5, Insightful)

Re: (Score:3, Insightful)

It has nothing to do with the retrieval (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: It has nothing to do with the retrieval (Score:3)

Re: (Score:2)

Re: It has nothing to do with the retrieval (Score:2)

Re: (Score:2)

Re: It has nothing to do with the retrieval (Score:2)

Re: (Score:2)

Re: It has nothing to do with the retrieval (Score:2)

Re: (Score:2)

Re: It has nothing to do with the retrieval (Score:2)

Re: (Score:2)

Re: It has nothing to do with the retrieval (Score:2)

Re: (Score:2)

Public is public (Score:2)

Re: (Score:3)

Re: Public is public (Score:2)

There is no agree to disagree (Score:1)

Online? Readable. (Score:2)

Share the money (Score:2)

Fair Use Takes a Hit in NYT v. OpenAI (Score:5, Interesting)

Re: (Score:2)

AI has no taste (Score:1)