
OpenAI's Motion to Dismiss Copyright Claims Rejected by Judge (arstechnica.com) 68

Is OpenAI's ChatGPT violating copyrights? The New York Times sued OpenAI in December 2023, and Ars Technica summarizes OpenAI's response: The New York Times (or NYT) "should have known that ChatGPT was being trained on its articles... partly because of the newspaper's own reporting..."

OpenAI pointed to a single November 2020 article, where the NYT reported that OpenAI was analyzing a trillion words on the Internet.

But on Friday, U.S. District Judge Sidney Stein disagreed, denying OpenAI's motion to dismiss the NYT's copyright claims partly based on one NYT journalist's reporting. In his opinion, Stein confirmed that it's OpenAI's burden to prove that the NYT knew that ChatGPT would potentially violate its copyrights two years prior to its release in November 2022... And OpenAI's other argument — that it was "common knowledge" that ChatGPT was trained on NYT articles in 2020 based on other reporting — also failed for similar reasons...

OpenAI may still be able to prove through discovery that the NYT knew that ChatGPT would have infringing outputs in 2020, Stein said. But at this early stage, dismissal is not appropriate, the judge concluded. The same logic follows in a related case from The Daily News, Stein ruled. Davida Brook, co-lead counsel for the NYT, suggested in a statement to Ars that the NYT counts Friday's ruling as a win. "We appreciate Judge Stein's careful consideration of these issues," Brook said. "As the opinion indicates, all of our copyright claims will continue against Microsoft and OpenAI for their widespread theft of millions of The Times's works, and we look forward to continuing to pursue them."

The New York Times is also arguing that OpenAI contributes to ChatGPT users' infringement of its articles, and OpenAI lost its bid to dismiss that claim, too. The NYT argued that by training AI models on NYT works and training ChatGPT to deliver certain outputs, without the NYT's consent, OpenAI should be liable for users who manipulate ChatGPT to regurgitate content in order to skirt the NYT's paywalls... At this stage, Stein said that the NYT has "plausibly" alleged contributory infringement, showing, through more than 100 pages of examples of ChatGPT outputs and media reports demonstrating that ChatGPT could regurgitate portions of paywalled news articles, that OpenAI "possessed constructive, if not actual, knowledge of end-user infringement." Perhaps more troubling to OpenAI, the judge noted that "The Times even informed defendants 'that their tools infringed its copyrighted works,' supporting the inference that defendants possessed actual knowledge of infringement by end users."


Comments Filter:
  • by Talon0ne ( 10115958 ) on Saturday April 05, 2025 @10:51AM (#65283369)

    It seems to me like AI is just sort of 'ingesting' content, internalizing it, and building its world view based on it... Just like a person would. No word for word copying is going on (otherwise the model would be many many terabytes)... So IMHO this should be dismissed, plain and simple.

    • OpenAI's defense at this point is geared around minimizing damages. They have already lost on infringement, and they know it. The Times has a very strong case, including willful infringement, so the only question is what the penalties will be. All that's happening now is going through the motions to the guilty verdict. The Times will have to screw up royally in order to lose.

      • If they lose, some lawyers are going to write to the owners of every registered work under the sun to start the largest class action in history. With statutory damages for wilful infringement times millions of works, they'd be bankrupt if the truth comes out. How many people is Sam willing to kill to prevent the truth about the training set from getting out?

        I think fair use is their only chance.

      • You are treating a denied motion to dismiss like it’s the closing statement at trial. It is not. The judge didn’t rule that OpenAI infringed copyright—he ruled that the allegations, if proven, are strong enough to warrant discovery and trial. That is a big deal, yes—but it is not a verdict. Pretending otherwise oversimplifies what’s actually happening and turns complex legal proceedings into fanfiction.

        OpenAI's defense at this point is geared around minimizing damages.

        Not even close. OpenAI is still actively defending the core claim that train

    • Set booby traps as a form of watermarking.

      Companies will start deliberately seeding articles with fake news, grammatical oddities, made-up words, and other forms of digital subterfuge, much like dictionary publishers and mapmakers used to insert phantom entries and trap streets to detect copying.

      Then when bots scrape your content, you can show the judge the fingerprinting you inserted.

      You may inadvertently invent a whole new vocabulary but once you've draffered the April sneggleklergen, you're past the point of no return.
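      A minimal sketch of this fingerprinting idea, assuming you plant unique canary phrases (the ones below are invented for illustration) and later scan scraped or generated text for them:

```python
# Canary-trap sketch: plant unique nonsense phrases in published articles,
# then scan bot or model output for them. All phrases here are hypothetical.
CANARIES = {
    "draffered the april sneggleklergen",  # invented vocabulary planted in an article
    "cook the chicken at 368 degrees",     # implausibly specific detail
}

def find_canaries(text: str) -> list:
    """Return the planted canary phrases that appear in `text`."""
    lowered = text.lower()
    return [c for c in CANARIES if c in lowered]

output = "To roast it, cook the chicken at 368 degrees for 20 minutes."
print(find_canaries(output))  # → ['cook the chicken at 368 degrees']
```

      A single hit doesn't prove training on your article by itself, but a pile of distinct canaries showing up verbatim is hard to explain any other way.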

      • Or if it does, AI is just going to collapse on its own. That's because, as AI takes over the internet, AIs will begin to train on AI.

        That's a problem that was seen in advance; a lot of work and effort has already gone into solving it, and more will follow.

        So you don't need to poison your content. If they can't find a workable solution to the problem you're raising, AI will collapse on its own; and if they can, then you're wasting your time poisoning it.
      • by xevioso ( 598654 )

        This is happening now.

        You can go to recipe websites and find stupid instructions like, "Cook the chicken at 368 degrees for 20 minutes." No one would ever actually use that as real instructions for chicken, because it doesn't matter whether it's 368 degrees or 350 for that amount of time. That number is inserted so the folks running the recipe websites can tell when someone has copied their recipes.

    • That is what the lawsuit is about: what uses are permitted under copyright law and what are not. Since no one had heard of LLMs when the law was written, this is open to debate. And since, as far as I know, none of us are copyright lawyers, our opinions are basically of no value.
      • Since, as far as I know, none of us are copyright lawyers, our opinions are basically of no value.

        Our opinions have value because we, as a society, define the laws that we all abide by. This also means we have a stake in this decision, because we are directly impacted by it and by its potential to limit our ability to compete globally and domestically.

    • This is why, as martin-boundary posted 8 hours ago on the thread about Wikipedia serving 80% of the hits on its site to bots:

      For thousands of years, man took a small boat and went to fish in the ocean to feed his family. Now mega trawlers rake the ocean floor with nets that catch everything swimming for miles around the ship.

      For thousands of years, fish populations have existed and been caught by humans. Now, fish populations are going extinct because the trawlers are fishing faster than humans did.
    • It's copying because it cannot be transformative: for it to have a new purpose and meaning would imply that so-called AI systems have the ability to express purpose or meaning.

      Data go into computer, data come out of computer. Copyright still holds.

      Laypeople are far too quick to anthropomorphize an algorithm. It's not like you or me reading a Harry Potter book, then deciding that it would be fun to write about a wizards' school for cats. Sure, maybe derivative, but very rarely is anything in art cut from

      • You’ve taken a philosophical objection—“AI can’t express meaning”—and tried to turn it into a legal slam dunk. But copyright law is not that simple, and courts do not require sentience to evaluate whether a use is transformative or infringing. You’re confusing how you feel about AI with how the law actually works. More to the point—you claim an AI cannot express purpose or meaning, and therefore cannot be transformative. That may sound deep, but it collapses u

      • Prompt: "Read a Harry Potter, then write about a wizard's school for cats."

        The LLM might use data from Harry Potter under that prompt, but it might not. Removing the Harry Potter bit only makes it more uncertain. Just because an LLM could use data from Harry Potter doesn't mean that the LLM's output is Harry Potter. Nor does it guarantee that Harry Potter's copyright would legally apply.

        People may be quick to anthropomorphize things, but just as many others are quick to declare something as cut and dry
    • more than 100 pages of examples of ChatGPT outputs and media reports showing that ChatGPT could regurgitate portions of paywalled news articles

      You should at least TRY to get to the end of the summary before you make a fool of yourself in public.

    • The content still exists on their servers in order to be transformed into whatever the AI creates, and they're not allowed to do that under the law without licensing it. Copyright is just that: your right to make a copy. And they are absolutely making a copy when they ingest the data.

      I don't think it matters, though. The courts tend to side with whoever has the most money and whoever can make the most money, and in this case the AI companies have an unlimited capacity to make money here and unlimited amounts of ve
      • Let me get this straight. Your argument boils down to: “Ingesting is copying, copying is illegal, therefore case closed,” followed by a shrug and a rant about how money always wins. That is not a legal position. That is a trollish tantrum trying to look like cynicism. The court walked in with a ruling that will echo through every AI model, license agreement, and copyright claim for years to come. If you want to troll at the bumper-sticker level, that is definitely your lane—but do not mis

    • You can't call it a 'world view' if it exists in a complete vacuum away from the world. Nor can you call a direct calculation over only the internet a 'world view'. Having a world view involves experiencing with your five senses all the things happening around you.
    • It seems to me like AI is just sort of 'ingesting' content, internalizing it, and building its world view based on it... Just like a person would. No word for word copying is going on (otherwise the model would be many many terabytes)... So IMHO this should be dismissed, plain and simple.

      You’re absolutely right in spirit—these models do internalize data and abstract patterns from it in a way that feels eerily human. That’s the fascinating part.

      But the legal system isn’t just asking how it learns, it’s asking what it can regurgitate, and under what circumstances. The NYT's case is not claiming that the entire model is a giant database of news articles—it’s alleging that, under the right conditions, ChatGPT can reproduce near-verbatim excerpts from th

    • "No word for word copying is going on"

      You don't know that. It can't keep a record of everything it's read, but the models are largely black boxes. There's really no way to know if it's memorizing long passages

      The only way to test this would be to see whether it generates identical passages. And these tools have indeed done so. It doesn't even matter whether it's actually copying word for word or whether it's parallel construction. It's still infringement if it's the same words.
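      One rough way to run the test described above is to look for long shared word n-grams between a source text and a model's output. The threshold and the texts below are illustrative assumptions, not a legal standard:

```python
# Sketch: flag near-verbatim copying by checking whether any long word
# n-gram from the source also appears in the generated text.
def shared_ngrams(source: str, generated: str, n: int = 8) -> set:
    """Return the word n-grams that occur in both texts."""
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return ngrams(source) & ngrams(generated)

src = "the quick brown fox jumps over the lazy dog near the riverbank at dawn"
gen = "as reported, the quick brown fox jumps over the lazy dog near the riverbank today"
print(len(shared_ngrams(src, gen)) > 0)  # → True
```

      Long exact matches (say, eight or more consecutive words) are rare by chance, which is why they make persuasive exhibits.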

  • OpenAI should be liable for users who manipulate ChatGPT to regurgitate content in order to skirt the NYT's paywalls ...

    LLMs are notoriously bad at verbatim retrieval, and the notion that someone would use ChatGPT or whatever to read the NYT is the stupidest thing I'm likely to read on an unusually stupid news day.

    The New York Times is a drop, or at most a bucket, in the ocean of training material used to generate a vast soup of vectors. This is profoundly transformative: it's not a .zip file of NYT articles being published, it's the combined influence of a myriad of sources--including the New York Times. If this isn't fa

    • by StormReaver ( 59959 ) on Saturday April 05, 2025 @11:45AM (#65283433)

      Google won, and if there's any consistency at all then the LLM trainers will win too.

      Google's Library Project and LLM trainers aren't even remotely similar, so it would be incredibly inconsistent for OpenAI to prevail. At the very least, the Library Project shows only a snippet of a book. It then points users to legitimate purchasing options rather than charging users for access to material for which Google has no legal rights. This does not infringe on the rights-holders' ability to monetize their rights. OpenAI, on the other hand, copies the entirety of such material, then directly charges for access to it. This deprives the rights-holders of their ability to control and monetize their creations.

      OpenAI is the largest, most blatant copyright infringer ever created. If OpenAI were to prevail, it would destroy anyone's ability to make a living from any creative endeavor that can exist digitally.

      • OpenAI, on the other hand, copies the entirety of such material, then directly charges for access to it.

        What part of "LLMs are bad at verbatim regurgitation" do you not understand? Do you think OpenAI or Alibaba or whoever discovered a 1000:1 lossless compression scheme?

        You sound like a lot of people I know who have strong opinions on the subject but little to no experience actually using LLMs.

    • The issue here is that the original content still gets stored in some form in order to be used to create the new content.

        Copyright doesn't cover what goes into a human brain. But as soon as you start reading bits and bytes and then copying those bits and bytes, you've triggered copyright. If the law is applied as written, then AI isn't legal.

      I do not expect the law to be applied as written though. There's so much money to be made in AI and judges tend to side with whoever's got the most money. Like th
      • If the case depended on what the law says, and what it means, you would probably be right, because it's the judge's job to interpret the law. However, unless the plaintiff's attorneys are incompetent, this case is going to hinge on the facts, and the jury is the trier of fact. It doesn't matter what the judge thinks, or how he (or she) would rule given the chance; the only thing that matters is what the jury decides. And, as this is a civil suit, the standard is not the famous "beyond a reasonable doubt," b
        • You're sort of right, but the judge exerts considerable influence on the outcome through decisions on admissibility of evidence and a mass of other procedural rulings.
          • True. However, the judge has to be careful not to be too openly biased, lest he leave himself open to an appeal on the grounds of bias.
      • The issue here is that the original content still gets stored in some form in order to be used to create the new content.

        Nope, that's not alleged in the NYT lawsuit, or I can't find it in their filing despite a thorough search just now. Their beef is with the training, alleged verbatim retrieval, associated search stuff etc.

        Look up "transformative" fair use. Your post reveals a fundamental misunderstanding of how copyright is applied and of the specific issues at hand in this suit.

        • Transformative fair use might apply to the finished product. But it doesn't apply to training. If I am a corporation selling my employees' services to others and I copy my competitors' training material to train my workforce, I have infringed.

          Individuals are covered by fair use for copying things for learning, commercial entities are not.

          • Transformative fair use might apply to the finished product. But it doesn't apply to training.

            [citation missing]

            You won't find that citation, either, because neither statute nor case law has caught up with the technology. Also, your analogy is misapplied because "training" isn't really the same thing in those two cases.

  • This is no different from a student educating themselves by reading publicly available material.
    There is a bigger issue here, the future of copyright and the value of content.
    As content, whether text, images or video becomes effortless to create, its value will drop to zero.
    Some things, like professional journalism, still require great effort to create. This is expensive.
    In the past, the costs were paid by advertising, but bots don't read ads.
    Expect the end of publicly available quality journalism. Expect subscr

  • Some people think AI's work is transformative. Some think it's just mindless copying and passing-on, no matter how much mixing and creativity there is. Never the twain shall meet, because of emotional investments on both sides in topics as esoteric as whether machines, however intelligent, can be said to think. There is no common ground on that topic, so let's bypass it. If you think AIs violate copyright every time they read an article, what remedy do you want? LLMs cannot exist without doing that and cannot p
  • If OpenAI hacked their way into the NYT, then they are liable. If they just trained on publicly available content? That needs to be declared legal.

    Yes, I know there are a lot of amateurish, ill-behaved bots out there. That's a different problem altogether. The point is: material made freely available on the internet is free to read, for humans, aliens, or AIs.

  • It's really simple. We have the technology.
    Let AI read everything.
    If you charge for AI, then you share the revenue you make off the knowledge you learned.
    If you charge $10 for a response that includes information learned from 3 books written by John Doe, John Doe gets a % of what you charge.
    Maybe your CEO doesn't get paid $100M a month as a result, I'm ok with that.
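    A back-of-the-envelope sketch of that split, with a hypothetical second author and a simple works-count weighting (real attribution of which works informed a given response is the hard, unsolved part):

```python
# Sketch of a per-response revenue split, weighted by how many of each
# rights-holder's works contributed. Names and numbers are hypothetical.
def split_revenue(charge: float, sources: dict) -> dict:
    """Split a response's charge across rights-holders in proportion to
    the number of their works that contributed."""
    total_works = sum(sources.values())
    return {author: round(charge * count / total_works, 2)
            for author, count in sources.items()}

# A $10 response drawing on 3 books by John Doe and 2 by Jane Roe
print(split_revenue(10.0, {"John Doe": 3, "Jane Roe": 2}))
# → {'John Doe': 6.0, 'Jane Roe': 4.0}
```

    In practice the provider would keep some share too; the point is only that the arithmetic is trivial once attribution exists.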

  • I just plowed through the full 47-page ruling in New York Times v. OpenAI, and the Ars Technica summary leaves out some of the most important bits.

    Yes, the judge let key claims move forward—including contributory infringement—but Ars barely mentions the most striking part: the court rejected OpenAI’s “substantial noninfringing use” defense, calling it a “straw man.” The judge made clear that ChatGPT’s ongoing relationship with users means OpenAI could still be liable if its models regurgitate copyrighted material. This is a big deal. It shows courts may not treat LLMs like neutral tools—and that alone could reshape how AI output liability works.

    Also missing: while the NYT’s “hot news” and DMCA claims were tossed, similar DMCA claims from other plaintiffs in the consolidated cases survived. That nuance matters. As a co-defendant in this case, Microsoft got out clean this time, but the ruling invites a deeper discussion on whether Big Tech partners are merely embedding AI—or helping build bullet-proof copyright infringement engines.

    This ruling is a canary in the coal mine for Meta and others. If courts follow this logic, arguments about “fair use” and “general-purpose tools” may not be enough to avoid discovery—or liability. The AI legal landscape just got a lot more real.

  • The "art" is terrible. If someone gave me a birthday card drawn by AI, I would punch them in the face, the art is so awful. It reproduces three art styles in one picture, resulting in garish colours, nonsense images, and out-of-context objects. Because there is no underlying artistic reason for the objects, we just end up with that uncanny-valley feeling of an undigestible food-like substance that reminds you of food.
  • Would someone please create a scraper for the NYT, the WSJ, the economist, and the rest of the paywalled stuff, compare it to the AP and Reuters, and just publish it for free? I'm too lazy to get the script working yet, but the copyright nonsense is about to be bullshit.
