The New York Times Sues OpenAI and Microsoft Over AI Use of Copyrighted Work (nytimes.com)
The New York Times sued OpenAI and Microsoft for copyright infringement on Wednesday, opening a new front in the increasingly intense legal battle over the unauthorized use of published work to train artificial intelligence technologies. From a report: The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works. The lawsuit [PDF], filed in Federal District Court in Manhattan, contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information.
The suit does not include an exact monetary demand. But it says the defendants should be held responsible for "billions of dollars in statutory and actual damages" related to the "unlawful copying and use of The Times's uniquely valuable works." It also calls for the companies to destroy any chatbot models and training data that use copyrighted material from The Times. The lawsuit could test the emerging legal contours of generative A.I. technologies -- so called for the text, images and other content they can create after learning from large data sets -- and could carry major implications for the news industry. The Times is among a small number of outlets that have built successful business models from online journalism, but dozens of newspapers and magazines have been hobbled by readers' migration to the internet.
Times sues $badguy. Read about it in the Times (Score:1)
to get totally objective and unbiased coverage with expert analysis by real experts and not just the reporters' friends, roommates, and fuckbuddies.
Re: (Score:2)
Re: (Score:3)
Can anyone name anyone right-of-center that works for this publication? Just asking? Can you be objective with a staff configured this way?
David Brooks, Ross Douthat, Bret Stephens
Re: (Score:2)
Peter Wehner
Re: Times sues $badguy. Read about it in the Times (Score:1)
Re: (Score:2)
OK I know not the other two guys, but David Brooks is very liberal. He's a Republican, but like me, he's about as liberal as they come without being a Democrat.
Re: (Score:1)
So, right of center, then. As if you guys know where center is anymore.
Re: (Score:1)
Left wing fascism exists. Even if you try to redefine it away.
OP said fascism, not dictatorship. The left really hates to admit one entire form of fascism exists, but they're both equally bad. In fact, the leftist version is alive and well and thriving in the west. In fact, it controls practically everything.
OK you can say ridiculous stuff all you want, secular expansion doesn't become religious just because the people are mostly one religion.
And the Crusades were not how Christianity expanded, it was an attempt to RE-take the holy lands. This is what I referred to in my OP. All major religions expanded largely via military expansion, but none so entirely and early on as Islam.
Yeah, Islam was born in and expanded more through military expansion than any other major religion, and boy does that continue to show. Everywhere Islam exists, violence is normal and hatred is taught.
Re: (Score:2)
It's funnier when someone posts those things and thinks they're far right.
I'm quite far left, actually, and a dedicated atheist. I even explained twice how both religions expanded mainly through conquest, but you chose to not quote that bit when claiming I am "far right." This your problem.
Re: (Score:1)
Re: (Score:2)
I rail against both the far left and far right. And yep, I'm an atheist.
Re: Times sues $badguy. Read about it in the Times (Score:3)
They write opinion, so they explicitly state their political valence and don't (usually) pretend to be neutral. They don't count. They get to be as partisan as they want.
The problem comes when the people who write news inject their opinions into coverage while pretending they're just giving you information you can use to form your own assessment.
Reporting about litigation your employer is a party to is a pretty brazen (apolitical) conflict of interest. I'm a little surprised they're doing it with a straight
The fix is easy (Score:2)
We can strut around and bray "Our Stuff is copyright, so eyez only!" all we like, but that fails to understand that once it is on the internet, people can print it out, use it for rule 34 stuff, and so on. It's almost like putting top secret stuff on the internet, giving the URL, and demanding that no one look at it.
Re: (Score:2)
Also don't put things in libraries. Anyone can just walk in and use their phone to take pictures, or use the copying machines.
Also don't put things on news stands, anyone walking by might see the headlines and pictures and such.
Maybe more than one fix is needed, but boy, they're so easy it's a wonder that no one has done them before!
Re: The fix is easy (Score:1)
Re: (Score:2)
There ackchyually are rules about scraping whole web sites and using them for other purposes, too, and they're usually found at the bottom of the web page as well as in the robots.txt file. OpenAI, MetaFace, Alphabet etc. just ignored those rules, and now they're being sued over it.
Re: (Score:2)
Re: (Score:2)
The Terms of Service [nytimes.com] linked from the bottom of NYT articles prohibits (without prior written permission) using a spider to crawl their site without permission and also, with a search indexing exception, to "cache or archive the Content". LLM training will violate those terms.
I would like to see a judge's reaction if a lawyer made the argument "my client didn't bother checking for terms of use, they just assumed that robots.txt completely described the limitations". I do not think it would go well for that
Re: The fix is easy (Score:2)
Even if robots.txt was configured correctly, only recently has OpenAI started respecting it. And by respecting it, I mean that instead of personally scraping your site they buy it from a third party.
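For anyone wondering what "respecting robots.txt" amounts to in practice, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content below is made up for illustration (it is not the NYT's actual file), and example.com and SomeSearchBot are placeholder names; GPTBot is the user agent OpenAI has published for its crawler. Keep in mind that robots.txt is purely advisory: nothing in it stops a crawler that chooses to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not any real site's file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; a well-behaved crawler would run this check before fetching.
    url = "https://example.com/2023/12/27/some-article.html"
    for agent in ("GPTBot", "SomeSearchBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Under these made-up rules GPTBot is refused everywhere, while any other crawler is only kept out of /private/. The check is entirely voluntary on the crawler's side, which is why the Terms of Service point elsewhere in the thread still matters.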
Re: (Score:2)
The Terms of Service [nytimes.com] linked from the bottom of NYT articles prohibits (without prior written permission) using a spider to crawl their site without permission and also, with a search indexing exception, to "cache or archive the Content". LLM training will violate those terms.
I would like to see a judge's reaction if a lawyer made the argument "my client didn't bother checking for terms of use, they just assumed that robots.txt completely described the limitations". I do not think it would go well for that lawyer.
Outfits can say what they like, but lots of spiders crawl NYT - and everyone else, BTW - every day.
The AI people are just getting sued because they are a handy target.
Re: (Score:2)
Do you have a citation for that? Not to be a dick, but it seems like an area of law that is very unsettled - and I legitly don't know, were there robots.txt files that were ignored?
It is very unsettled, because if it's there, it can be copied and used by others. And there are these weird gray areas like web searches.
As well, does this mean if I read several articles and create something new from them - I have violated copyright? Point is, if John Doe is interviewed and that is posted as part of a copyrighted article, am I prohibited from, say, writing in a blog what John Doe said? It's unsettled as hell. And my point was always, put it on the web, and it isn't yours any more. copying it is
Re: The fix is easy (Score:1)
Re: (Score:2)
If I open a webpage I have to make a local copy. Does this mean just reading it violates copyright?
Re: (Score:2)
Also don't put things in libraries. Anyone can just walk in and use their phone to take pictures, or use the copying machines.
Also don't put things on news stands, anyone walking by might see the headlines and pictures and such.
Maybe more than one fix is needed, but boy, they're so easy it's a wonder that no one has done them before!
Certainly copying machines are a long used way of getting information to use. But there is a bit of a difference. At the library, or at work, there's hella lot of copying taking place. I do it myself for research notes. Some of the open literature is not to be removed from the library.
We can, however, make a case that internet bots taking portions of articles in order to enable web searches of any copyrighted words is a de facto violation of copyright, and that no more web searches should be allowed.
The web
There's always the NY Post (Score:2)
If the Times doesn't like OpenAI training their AIs on the Times, I wonder how they'd feel about training them on the Post? I expect it would make for a more entertaining AI, anyway.
Re: (Score:2)
This would be like the publishers of textbooks suing the users of the books for learning from them and using the knowledge learned in the pursuit of their jobs.
Re: (Score:1)
They (Score:1)
They will probably award themselves the Pulitzer for this next year.
One hundred beeeeeljon dollars! (Score:2)
Re: (Score:2)
Seriously, "billions of dollars in statutory and actual damages related to the unlawful copying and use of The Times's uniquely valuable works"?
Those are OpenAI's numbers, they think they are worth $100 billion [yahoo.com]
New York Times is the Disney of News (Score:3, Insightful)
Re: New York Times is the Disney of News (Score:2)
I agree with your criticism of the Times, but at the same time I think it would be hilarious if all the best AI systems were pirate systems and being forced to follow the stupid copyright laws screwed Microsoft. In situations where everyone is a bad guy, sometimes you just root for chaos.
Re: (Score:2)
Which does beg the question (modern use of the phrase), if the content is paywalled then how were they not remunerated for the computer reading it? Or were they silly enough to allow web crawlers an exception from the paywall? That would be on them.
Re: (Score:2)
A subscription does not give you rights to publish their articles.
Re: (Score:2)
I had a look at their lawsuit and they actually seem to have some good points. There are some screenshots here: https://x.com/jason_kint/statu... [x.com]
The first one shows ChatGPT reproducing their work word-for-word. It's not learned to be like the NYT, it's just copied their work wholesale.
They also note that different sources seem to be weighted differently as training data, and the NYT is one of the most valuable. That's Microsoft admitting that the NYT content is valuable to it, and that it was selected caref
Re: (Score:2)
I had a look at their lawsuit and they actually seem to have some good points. There are some screenshots here: https://x.com/jason_kint/statu... [x.com]
for what it's worth, that literal quote is all over the internet, and chatgpt didn't even exist when it was written. so even if they prove that the text is an actual bot response (i ignore this very important fact but it doesn't seem relevant to this twitter opiner nor his followers) it could have come from anywhere, and i very much doubt that the nyt maintains articles from 2020 online and behind the paywall.
so this "proof" could actually backfire for failing to police the content they are complaining abou
Re: (Score:2)
Well their very large submission has over 100 other examples to choose from.
Re: (Score:2)
tbh, i was just shitposting to some extent. what should i know.
the verbatim reproduction is indeed concerning, and surprising to me. that's not how it is supposed to work, so what is happening here? i don't think a court is the best tool to find out, but then my intuition is that indeed openai have been reckless and the nyt has smelled money and is obviously exploiting copyright law in the most toxic way they possibly can, which ... was all to be expected, and we get to watch!
If you make something readable (Score:3, Insightful)
If you make something readable, guess what, it may get read. Does it really matter if a human or an AI does the reading? Both may well learn something, both may even recite parts of what they read.
This is legally muddy, but I hope the courts come down on the side of fair use. If the New York Times doesn't want people reading their articles, they shouldn't publish them.
Re: (Score:2)
I read the Linux source code. I write an OS kernel. Is that software I wrote under the GPL because I read the Linux source code?
Now, I have a
Re: (Score:2)
Microsoft can then run the source code to say, Paint.net, or Notepad++ through an AI and ship New Paint or New Notepad with Windows as a closed source binary.
I'm sure there are plenty of Software as a Service (SaaS) vendors who will instantly run plenty of AGPL frameworks through an AI so they can use them without obeying the terms of the AGPL, because fair use is fair use.
If you insist copyright doesn't apply to LLM training works, it doesn't apply when an LLM trains on free/open source software either. And without copyright, those FOSS licenses are useless. If the code is free, the GPL isn't required.
Modality of reproduction is irrelevant. Even if by freak chance you independently develop something that happens to be copyright protected by someone else ("innocent infringement"), that doesn't save you.
What matters is whether the resulting work is deemed by a court of law to be derivative of someone else's copyright.
Re: (Score:2)
Complete misunderstanding of copyright (Score:3)
NYT isn't being serious. I don't think they even intend to win; they just want to add to the public noise crying over having their business models upended by technology.
They know full well copyright does not extend to the underlying information. The fact you did the work to gather facts whether it was paying a real journalist to do real journalism or compiling a book filled with all kinds of interesting phone numbers is tough shit. Copyright only protects fixed works. It does not control access to or use of underlying information.
Re: (Score:1)
As for your overlooking the facts here: the fact is that the New York Times doesn't present facts, they present narratives.
Re: Complete misunderstanding of copyright (Score:2)
... who doesn't?
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
i guess (Score:3)
Copyright on news? (Score:2)
How come a company can claim copyright on news? I mean, anything happening might generate a report, just as we read in this very post from /. . Does that mean this post is also copyrighted news? And does that mean that every single news article in the entire world is liable to be copyrighted - or worse, copyright infringed? That will make things look very, very bad in no time.
Re: (Score:1)
Are you legitimately curious about this? News isn't copyrighted. The words used to express news are. Always have been. You can't copyright that there was a terrorist attack, but you can copyright the phrases used to describe it to readers.
Does that mean this post is also copyrighted news?
Yes, and Slashdot could find themselves in legal trouble if they copied the entire story into the summary, as it would breach fair use.
That will make things look very, very bad in no time.
Nope, it would literally look the way it looks now.
Re: (Score:2)
Mass Plagiarism (Score:2)
How long before ... (Score:2)
Re: (Score:2)
Your comment shows a complete lack of understanding as to what copyright is and how it works. I suggest you do some research on the subject.
Re: (Score:2)