OpenAI Claims NYT Tricked ChatGPT Into Copying Its Articles
Emilia David reports via The Verge: OpenAI has publicly responded to a copyright lawsuit by The New York Times, calling the case "without merit" and saying it still hoped for a partnership with the media outlet. In a blog post, OpenAI said the Times "is not telling the full story." It took particular issue with claims that its ChatGPT AI tool reproduced Times stories verbatim, arguing that the Times had manipulated prompts to include regurgitated excerpts of articles. "Even when using such prompts, our models don't typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts," OpenAI said.
OpenAI claims it's attempted to reduce regurgitation from its large language models and that the Times refused to share examples of this reproduction before filing the lawsuit. It said the verbatim examples "appear to be from year-old articles that have proliferated on multiple third-party websites." The company did admit that it took down a ChatGPT feature, called Browse, that unintentionally reproduced content. However, the company maintained its long-standing position that in order for AI models to learn and solve new problems, they need access to "the enormous aggregate of human knowledge." It reiterated that while it respects the legal right to own copyrighted works -- and has offered opt-outs to training data inclusion -- it believes training AI models with data from the internet falls under fair use rules that allow for repurposing copyrighted works. The company announced website owners could start blocking its web crawlers from accessing their data in August 2023, nearly a year after it launched ChatGPT. OpenAI still hopes to form a "constructive partnership with The New York Times and respect its long history," the company said.
Last month, OpenAI struck an unprecedented deal with Politico parent company Axel Springer, allowing ChatGPT to summarize news stories from Politico and Business Insider.
Yeah, no (Score:2)
Re: (Score:3)
Linguistic Gymnastics (Score:4, Interesting)
The linguistic gymnastics employed here is astounding. It makes the Wookie defense seem almost plausible.
Re: (Score:2, Interesting)
It's been independently reproduced too.
Image generators are just as bad. Someone asked Midjourney for "man in robes with laser sword" and out popped Luke Skywalker.
Re: (Score:2)
Image generators are just as bad. Someone asked Midjourney for "man in robes with laser sword" and out popped Luke Skywalker.
Do you have a citation or link to the image? Was Luke Skywalker barefoot?
Re: (Score:2)
Impossible, every trekkie knows Worfenstein's favourite pocket knife is called a "light saber", not "laser sword".
Re:Linguistic Gymnastics (Score:4, Interesting)
Searching for this reference only turns up your Slashdot post [google.com].
That said, given that Luke Skywalker is everywhere on the internet, why shouldn't it know what he looks like? What sort of gating mechanism are you proposing that says "learn some things that are everywhere verbatim" (flags, the Mona Lisa, etc) but not others (Luke Skywalker, in this case)? And what about, say, corporate logos - should it know them?
Furthermore, at what point will you aim your fire at the user for deliberately attempting to misuse a tool to violate copyright? You're talking like the AI is a person. The law disagrees that AIs are people. The user is a person. And if they're deliberately trying to use the tool to violate copyright, why isn't that on them? One can draw Snoopy in Photoshop in just a couple of minutes - is that Adobe's violation? Yeah, AI tools are faster and better, regardless of what the user is trying to do - does that somehow change the equation of who is deliberately trying to violate copyright in this case?
Re: (Score:2)
Re: (Score:2)
I'm not a lawyer, but I'm pretty sure it is actually against US copyright law to draw snoopy. How do you think before computers people violated copyrights?
Re: Linguistic Gymnastics (Score:2)
Copyright was created as a response to the printing press. So before computers, yes, but only automatic, large scale, easy copying was enough of a threat to create a law stopping such copies.
Before that, people used to spend months creating a single copy of a single book.
Re: (Score:2)
It's precedent that Napster - a tool people used to violate copyright (while the software was never implied to be a person) - was held responsible for the copyright violations due to some... well, I'd call it interesting logic. So I don't think that's as slam-dunk as you think it is.
Re: (Score:2)
The issue is that if you asked a human to draw that, any sensible artist would avoid drawing something that infringes upon Mark Hamill's image and Disney's copyright.
AI is not an artist and does not have agency under US law. Neither is "infringing upon" an image of a person a thing under US law in this context.
This person intentionally, explicitly asked the AI to draw it and then played innocent, claiming all they did was type a routine prompt, to get attention. Anyone who doesn't believe me is free to type "man in robes with light sword" and get Skywalker out... It's not happening. Prove me wrong. I tried several different models generating over a hundred images and none of them produced any
Re: (Score:2)
No, the person asked the company Midjourney to make something generic. They did not say "Luke Skywalker". The company Midjourney produced that image, and is responsible for the infringement.
Re: (Score:2)
No, the person asked the company Midjourney to make something generic. They did not say "Luke Skywalker". The company Midjourney produced that image, and is responsible for the infringement.
The actual prompt included far more than the "man in robes with light sword" you quoted. It was intentionally and deliberately designed to navigate the latent space in a way that would produce a Skywalker image without explicitly saying "Luke Skywalker", in order to get attention. While it might have seemed "generic" to someone not familiar with the technology, it was certainly not a generic request.
Again if you have any doubts or questions I invite you or anyone else to try and reproduce the prompt "man in robes wit
Re: (Score:2)
Even if what you claim is true, they should still have told their AI not to produce copyright infringing output even when asked to.
You can't shift all the legal liability onto the user, when you are providing a service. It's different to providing a tool. You are doing the work, even if you automated it via AI.
Re: (Score:2)
the last paragraph of openai's statement is particularly telling.
nyt will strike a deal, once they come down off their high horse and after a few antics to cover up the sleazy trolling that this lawsuit was.
Re: (Score:3)
You mean this sentence:
the [OpenAI] company maintained its long-standing position that in order for AI models to learn and solve new problems, they need access to "the enormous aggregate of human knowledge."
Sounds like a confession to me.
It is also a stupid statement, "AI models" don't "solve new problems", they regurgitate old data into patterns that "look" the same to the model.
Re: (Score:2)
Good AI models outscore the vast majority of humans in creativity tasks, and they're used every day to solve "novel" problems, particularly in programming.
And FYI, but YOU are a predictive engine. That's literally how human learning works. Your brain makes constant predictions about what your senses will experience; the error between the prediction and the reality then propagates backwards through the network, including through the deeper, more abstract / conceptual layers.
You cannot make good predictions
Re: (Score:2)
Good AI models outscore the vast majority of humans in creativity tasks,
Bullshit. AI models regurgitate the human creativity that was fed into them.
and they're used every day to solve "novel" problems, particularly in programming
LOL, aptly placed quote marks. "AI" hasn't done anything that isn't a regurgitation of fed data.
That's literally how human learning works.
How would you know :)
AIs perform well because they have a good conceptual model of the world underlying what they're predicting.
Quite the opposite: "AI" is a matrix of coefficients that computes values by multiplying those coefficients with input data. It doesn't understand what a conceptual model is any more than you understand what "AI" is.
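For what it's worth, that "matrix of coefficients" description roughly corresponds to a single layer's forward pass. A minimal Python sketch - the shapes and values below are made up for illustration and have nothing to do with the scale of a real LLM:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)             # input vector (think: a token embedding)
    W = rng.normal(size=(3, 4))        # the "matrix of coefficients" (learned weights)
    b = rng.normal(size=3)             # bias
    hidden = np.maximum(0, W @ x + b)  # multiply, add, apply a ReLU nonlinearity
    print(hidden)

Whether stacking billions of such multiplications amounts to a "conceptual model" is exactly the question this thread is arguing about.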
Re: (Score:2)
Yeah, I skimmed through the claims. Apparently, it is no "AI", but an application of a manually tuned genetic algorithm implementation.
Re: (Score:3)
Re: (Score:2)
Yep.
Re: (Score:2)
It was a genetic algorithm based around an LLM.
Re: (Score:2)
No, it was LLM based around a genetic algorithm, as it is the genetic algorithm that was driving the setup toward a solution.
Re: (Score:2)
If NYT gets a big enough pound of flesh to satisfy them, the next class action from the rest of the world will butcher them.
Supreme court saying it's fair use is their only hope, otherwise it's all over.
Re:Linguistic Gymnastics (Score:5, Interesting)
The linguistic gymnastics employed here is astounding. It makes the Wookie defense seem almost plausible.
I find their response fairly straightforward.
There were two ways ChatGPT could reproduce articles:
1) You could ask it to scrape the URL and reproduce in real time, OpenAI took down this feature fairly quickly when the copyright issue became apparent so probably not a big deal.
2) ChatGPT could regurgitate the memorized article.
#2 is the interesting claim since it gets right to the core of LLMs. Here OpenAI made two main claims.
a) OpenAI has safeguards to prevent ChatGPT from regurgitating content; they're not perfect, but they're far more extensive than the NYTimes suit suggested.
b) The articles in question were already widely reproduced on the web.
I think there's still a couple important questions.
First, the existence of the articles on the web suggests that OpenAI didn't need to scrape the NYTimes to get the data, but it leaves open the possibility that they did.
Second, OpenAI is trying to frame this as a debate about whether ChatGPT spits out copyrighted content and if their good-faith efforts are sufficient to protect them from legal liability. NYTimes wants it to be a debate about whether they're allowed to use the NYTimes IP to train their model in the first place.
Re:Linguistic Gymnastics (Score:5, Insightful)
1) The NYT (employees) created the article => The NYT has the copyright for the article => The NYT can license to its customers the right to display the article in limited circumstances => Everyone who gets the article from them is a licensee of the NYT, so they get some limited personal rights, but do not get the right to republish or sublicense the article further to anyone else.
2) OpenAI scraped the article data illegally from the web: just because it was accessible to their spider does not give them the right to copy it to their own disks, but they did it anyway. At best, the NYT licensee who made the article available unsecured committed an offence by not following the license terms. The spider took advantage, which it had no right to do (eg proceeds of crime).
TL;DR writing spiders is hard, and it's always better to NOT download a file if there's any doubt at all about the licensing terms and legal owner.
Re: (Score:2)
It's legal to scrape the web for training, despite what you or the NYT might think. What's not legal is if it spits out the articles verbatim without being prompted with the text of the articles. OpenAI claims that the NYT was using big pieces of their articles as prompts, and the NYT isn't telling what prompts they use, so at this point it's a he-said/he-said between two disreputable characters. (The NYT is continually publishing articles that downplay genocide, and murder of journalists no less [hyperallergic.com], just like
Re: (Score:3)
just because it was accessible to their spider does not give them the right to copy it to their own disks
Don't they though? How is this different from time-shifting a TV show with my TIVO (yes I'm that old).
I thought copyright was more about protection against me re-publishing or rebroadcasting my time-shifted TV show back to the general public.
I see ChatGPT as more of generating derivative works based on its training data - more akin to taking 30-second clips of a bunch of TV shows and stitching them back into a new but similar TV show, which is perfectly legal already.
Re: (Score:2)
is it legal already?
Re: (Score:2)
I don't think it matters whether the NYT tried to prompt the AI to reproduce the text verbatim and that this wouldn't happen normally; the problem is that the AI is capable of doing this _at_all_
GPT4 somehow has in its interior encoded large parts of the archive of the NYTimes and it is using them to generate more text. It's like if I created a program to create-your-own-super-hero, and it included parts of the Spiderman costume that I collage together into new pieces; it does not matter that Marvel had to
Re: (Score:2)
I don't think it matters whether the NYT tried to prompt the AI to reproduce the text verbatim and that this wouldn't happen normally; the problem is that the AI is capable of doing this _at_all_
GPT4 somehow has in its interior encoded large parts of the archive of the NYTimes and it is using them to generate more text.
I can recite copyrighted works as well, it's only an issue if I recite them in certain contexts.
It's like I said at the end. The NYTimes agrees with your belief that OpenAI is in trouble if those memorized texts exist in ChatGPT at all. OpenAI thinks they're alright if they make a fairly effective good faith effort to stop ChatGPT from reciting that copyrighted content.
It's like if I created a program to create-your-own-super-hero, and it included parts of the Spiderman costume that I collage together into new pieces;
I'm not sure that's a good metaphor for a few reasons:
a) In your example you're obviously just trying to save yourself effort of making your
Re: (Score:2)
Yes, but that's basically closing the barn door after the horse has run out. LLMs like ChatGPT respond oddly.
Like how having it repeat "poem" causes it to suddenly spit out copyrighted text verbatim, this seems like a fruitless task
Re: (Score:2)
Yes, but that's basically closing the barn door after the horse has run out. LLMs like ChatGPT respond oddly.
Like how having it repeat "poem" causes it to suddenly spit out copyrighted text verbatim, this seems like a fruitless task - are you going to try to close every loophole that results in it spitting out copyrighted text as they come up? Because that's going to be impossible - that's like saying Windows will be secure if you keep patching security holes as they come up. That just means someone else will find another prompt that will cause it to do something weird, and who knows what happens then.
So what?
Is occasionally tricking the LLM into spewing out copyrighted data really a source of damage to the IP holders? Because without harm there's no grounds for a lawsuit.
Re: (Score:3)
Which is perfectly allowable under fair use exemptions; copyright doesn't grant you a dictatorship over works. And one of said exemptions is for automated processing to create new transformative products and services. Web spiders constantly download copyrighted content from the entire internet - that doesn't make them illegal. Google constantly posts excerpts of copyrighted content, even entire pages of it, in caches, previews, thumbnails, whole pages from books, etc - all made fr
Re: (Score:2)
it actually does give them that dictatorship; check what happens whenever a newspaper decides they want Google to pay them for displaying their news on news.google.com instead of redirecting readers to the newspaper site: Google delists the newspaper. Why? Because they don't want to lose that fight.
Many attempts isn't a defense (Score:5, Insightful)
Re: (Score:2)
Technically, if they fed them the articles, it would be the *same* stories. :-)
Re: (Score:2)
Re: (Score:2)
How many times do you have to try to draw Daffy Duck in photoshop before it's you who's trying to violate copyright, not Adobe?
Re: (Score:2)
Do they really want to use that argument? (Score:4, Insightful)
If it's that easy to trick ChatGPT into breaking the law, maybe it should not be allowed in public until it can be made certain that it doesn't.
Re: (Score:3)
And of course the point is to demonstrate that it is storing NYT articles. If it quit merely spewing them back out it would make it harder to prove, but not any less true.
Re: (Score:3)
storing nyt articles is no violation of copyright, not even if you could prove that they got them from the nyt. "publishing" it would be.
Re: (Score:3)
I don't believe that's correct. The copy is being made when it is stored. "Publishing" it would certainly also be, as would creating a derivative work, as would publishing that derivative work.
Re: (Score:2)
I don't believe that's correct. The copy is being made when it is stored. "Publishing" it would certainly also be, as would creating a derivative work, as would publishing that derivative work.
I agree, under US copyright law the rights holder has an exclusive right over the production of "fixed" copies of their copyrighted works even if they are held privately and never distributed or performed.
Separately there seems to be two obvious questions in this regards.
1. Is training a neural network on NYT data producing a copy? The answer is clearly no. The neural network is clearly transformative and as a result rights holders have no rights over the transformative work.
2. Is the AI spitting out copy
Re:Do they really want to use that argument? (Score:4, Informative)
The point the NYT is making is that the neural net was able to spit out large chunks of their article verbatim, indicating that training DOES produce a copy.
Re: (Score:2)
The point is the NYT used large chunks of their articles in the prompts. Yes, if you feed a big part of the article in as a series of tokens, you will have a good chance of getting it out again.
Re: (Score:2)
The point the NYT is making is that the neural net was able to spit out large chunks of their article verbatim, indicating that training DOES produce a copy.
Copyright is not a grant of authority over underlying knowledge and ideas. For example while a phone book is copyrighted the knowledge it contains is not. I can OCR all the numbers in the phone book into a computer database and the copyright holder can't do shit about it.
Google's search index containing petabytes of copyrighted material has already been adjudicated as a transformative work. It does not seem credible to assume neural networks would be considered differently under law. If anything the ca
Re: (Score:2)
Is training a neural network on NYT data producing a copy? The answer is clearly no. The neural network is clearly transformative and as a result rights holders have no rights over the transformative work.
The court may try to divide the resulting work into parts based on how much is from the original work, and how much is from the derived work. (See for example the abstraction filtration comparison test). NYT can't claim to own the entire derived work, but they can claim to own the parts that were from the original work.
2. Is the AI spitting out copyrighted material in response to user prompts a copyright issue?
If a copy doesn't exist in some way in the AI's database, then it couldn't reproduce it. Therefore a copy exists (in some way) in the AI's database. Whether it is an interactive conversation
Re: (Score:2)
A lossy copy is encoded in the model that can sometimes be reversed back to portions of identical text. But a human mind can do that too. They aren't accused of a violation unless they use that memory to recreate the original work verbatim or a very direct derivative. Human memory is much more limited in fidelity, but there has to be some threshold level of fidelity that marks the limit of what's acceptable - whether it's a human or a computer doing it.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
If it doesn't reproduce the original text, there is probably no copyright issue.
However, copyright law treats a "transformative work" as a derivative work, and making one is a right reserved to the copyright holder and the people they choose to license. You would say that optioning a book and making a movie is transformative - sometimes greatly so. But you have to pay for that.
Re: (Score:2)
You: JPEG is lossy, therefore it's clearly not a copy. It's clearly transformative and the PNG file's copyright owner has no rights over the JPEG image.
Everyone else: The JPEG is a deliberate copy of the PNG, only in a different representation.
Re: (Score:2)
Only if you're not a lawyer. Those of us that are recognize you have to transform the work, not point to some random, potentially unrelated output and say "tah dah! Transformative!"
Besides, you appear overly confident there's no intervening copy between the NYT server's output (to what it thinks is a browser) and the actual entry into OpenAI's black box. I don't see how you do that without ChatGPT figuratively sucking on the fire hose, which nobody seems to have conten
Re: (Score:2)
under US copyright law the rights holder has an exclusive right over the production of "fixed" copies of their copyrighted works even if they are held privately and never distributed or performed.
i would expect that to be not true even in the most dystopian corner of u.s.
do you happen to have a source to the relevant article of law?
e.g. https://www.copyright.gov/what [copyright.gov]... says nothing of the sort.
https://www.law.cornell.edu/us... [cornell.edu]
"Subject to sections 107 through 122, the owner of copyright under this title has the exclusive rights to do and to authorize any of the following:"
"(1) to reproduce the copyrighted work in copies or phonorecords;"
There are of course "fair use" exemptions that are applicable.
nothing about owning copies, "fixed" or otherwise.
This is a very important point. Copyright only extends to production of fixed copies and public performances. It restricts writing not reading. The copyright holder has no control over ownership or u
Re: (Score:2)
i stand corrected, thank you.
Re: (Score:2)
yeah, i realized that right after posting, nice 'oh, crap' moment. thanks for pointing out.
it isn't where i live (yet), luckily. the mere possession of copyrighted material isn't a copyright infringement. not even sharing it is, unless there is profit involved. of course, as you would expect, companies have repeatedly tried to spin in court that "use" is equivalent to "profit" on several occasions, but afaik to no avail.
and, yes, software is explicitly excluded from this provision for some reason.
Re: (Score:2)
If it's that easy to trick ChatGPT into breaking the law, maybe it should not be allowed in public until it can be made certain that it doesn't.
it isn't breaking the law if it is giving the nyt morons the exact answer they specifically engineered the prompt for, that's the whole point. pun intended: that's not fair use :D
there could be a discussion if some random query produced that output.
otoh, as i pointed out when this story first appeared here a few days ago with some examples, the text reproduced verbatim was years old and replicated all over the internet, rendering the nyt claims that it was protected and valuable content moot. the whole clai
Re: (Score:3)
the text reproduced verbatim was years old and replicated all over the internet, rendering the nyt claims that it was protected and valuable content moot.
Rampant piracy doesn't invalidate a copyright. But one thing it does do for a large language model is heavily weight that phrasing as it enters the model from multiple sources.
Re: (Score:2)
If it's that easy to trick ChatGPT into breaking the law, maybe it should not be allowed in public until it can be made certain that it doesn't.
Should this apply to hammers as well?
Re: (Score:2)
It does, my LLM-powered hammer's input is legally limited to its built-in accelerometer, gravity sensor and whatever touch-sensing stuff it has wrapped around the handle.
The Prompts Don't work (Score:4)
Re:The Prompts Don't work (Score:5, Insightful)
To be fair, they are almost certain to immediately tweak their system once they know what prompts are in play in the suit. So of course it won't work the next time. But that might be on the NYT for not having video evidence of this happening, with some way to prove the video wasn't faked.
Re: (Score:3)
The only valuable evidence they can have is sworn testimony. Everything else can be faked since ChatGPT does not cryptographically sign its answers.
Re: (Score:2)
They probably spent days if not weeks trying to find a way to accomplish it.
Re: The Prompts Don't work (Score:2)
So? That doesn't mean another user couldn't do that without trying for days or weeks.
It's not supposed to refuse to regurgitate (Score:3)
It's supposed to be unable to regurgitate anything other than small particularly famous quotes.
Re: (Score:2)
I don't think that's true. If it were, standard web search engines wouldn't pass this test. They index literally all of the text of NYT and other publications, and then show you snippets long enough for you to know whether you got the right link. Would they "be able to" regurgitate the entire text? Certainly they could, they have that information.
Re: (Score:2)
The design of a search engine doesn't tell you much about the design of a language model. It might say something about what is allowed legally, but that's different too because websites want (and give permission via robots.txt) search engines to index them and benefit symbiotically, but as AI training data they gain nothing.
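For reference, the "permission via robots.txt" mechanism is the same one OpenAI pointed publishers to for opting out of crawling. A minimal sketch using Python's standard library robots.txt parser - the example site and rules are invented, but "GPTBot" is the user-agent token OpenAI published for its crawler:

    from urllib.robotparser import RobotFileParser

    robots_txt = """
    User-agent: GPTBot
    Disallow: /

    User-agent: *
    Allow: /
    """

    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())

    # The AI crawler is refused everywhere; ordinary search crawlers are still allowed.
    print(rp.can_fetch("GPTBot", "https://example.com/2024/01/some-article.html"))    # False
    print(rp.can_fetch("Googlebot", "https://example.com/2024/01/some-article.html")) # True

A site that wants to stay in search results but out of AI training data can express exactly that distinction, which is the asymmetry the parent comment is describing.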
Re: (Score:2)
This is one of many news stories about news sites who think traditional search engines are violating their copyrights. https://www.reuters.com/articl... [reuters.com] Apparently, they don't all feel the relationship is "symbiotic."
The point is, you suggested that AI should be "unable" to regurgitate content. If it indexes the content, it can regurgitate it. That point is not different from a traditional search engine.
OpenAI and others are really just fancy search engines that pretty-print the results. I think they just n
Re: (Score:2)
That isn't about websites that didn't want to be indexed by search engines -- basically a death sentence. Getting un-indexed would not make them happy at all. What they wanted is some of the search engine's money, or for the search engine's summary to be sufficiently uninformative that people follow the link to their website. Conversely very few sites would go into full-blown panic if they get removed from an AI training dataset.
And no, originally search engines were unable to regurgitate websites they inde
Re: (Score:2)
Memorization is not proof of overtraining, it's just proof of memorization. They were correlated in older ANNs but less so now.
Re: It's not supposed to refuse to regurgitate (Score:2)
If it were, standard web search engines wouldn't pass this test.
Ok, so why don't you create a "prompt" for a search engine that reproduces a long article verbatim?
Search engines typically become "unable" to reproduce a long article by simply limiting the output length. If chatGPT were doing that, nobody would be complaining here, or in the courts.
Re: (Score:3)
In this case, "famous" means the articles were already copied all over the Internet, which does affect the weights on that text phrasing.
Re: (Score:2)
What's your standard for famous? Things that are seen once on the internet, no. But copy the same thing around the internet enough, and it'll learn it - the more common it is, the better it'll learn it.
If you read a poem once, even while trying to memorize it, you're not going to be able to recite it verbatim. But if you keep encountering the same poem and trying to memorize it, eventually you're going to learn it verbatim.
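The "keep encountering it" effect can be shown with a deliberately crude toy - nothing like a real transformer, just next-word counts - where a sentence duplicated many times in the training text comes back out verbatim under greedy generation:

    from collections import Counter, defaultdict

    # Toy corpus: one sentence duplicated heavily, plus a little other text.
    corpus = ["a stitch in time saves nine"] * 50 + [
        "the cat sat on the mat",
        "a man in robes holds a sword",
    ]

    follows = defaultdict(Counter)  # word -> counts of the words that follow it
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1

    # Greedy generation: always emit the most frequent continuation.
    word, out = "a", ["a"]
    while follows[word]:
        word = follows[word].most_common(1)[0][0]
        out.append(word)

    print(" ".join(out))   # prints the heavily duplicated sentence, word for word

Real models are vastly more than bigram counters, but the weighting effect is the same: text that is duplicated all over the training set is effectively trained on many times over, which is why widely republished articles are the ones most likely to come back verbatim.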
Yeah. No. (Score:2)
Re: (Score:2)
Indeed. Dishonest assholes, the lot of them. I hope they burn for what they clearly did.
ChatGPT regurgitated excerpts of articles. (Score:2)
If ChatGPT can be tricked into regurgitating original content then ChatGPT does indeed copy original content.
Cherry-picking really does not matter (Score:2)
If it _sometimes_ delivers articles without those having been part of the queries, then OpenAI is guilty as hell. At this time, they are just trying to confuse the issue, because they _know_ what they did was deeply wrong and they essentially stole most of their training data.
Re:I have a memory (Score:4, Informative)
If someone asks me about a NYT article and I recite it mostly word for word because I have a good memory, is that a copyright violation?
It would likely be considered a performance of the work, depending on just how close to accurate you were and if there is an audience, so I would say the short answer is yes. It would depend on the circumstances, so it's possible it could fall under fair use.
Re: (Score:3)
If someone asks me about a NYT article and I recite it mostly word for word because I have a good memory, is that a copyright violation?
It would likely be considered a performance of the work
This is unreasonable. A conversation is obviously not a public performance.
Re: (Score:2)
Memorizing and reciting verbatim should probably be considered a violation. Learning from it and incorporating knowledge into future works is not an infringement when it's not a computer doing it. Academic papers even include citations of where they copy learned ideas from. I think it's fair to say that OpenAI's argument is good even if their implementation doesn't live up to that standard.
Re: I have a memory (Score:2)
Learning from it and incorporating knowledge into future works is not an infringement when it's not a computer doing it.
It is also not a violation when a computer does it. Otherwise search engines would be in violation for indexing pages.
If the NYT (or anyone else) doesn't want their pages looked at, they shouldn't make them public. Put them behind a login, problem solved.
Re: I have a memory (Score:4, Informative)
Re: (Score:2)
The NYT articles are behind a login and free accounts only get a limited subset of articles. To view them all, you have to have a paid account.
So... problem not solved.
Re: (Score:3)
It is also not a violation when a computer does it. Otherwise search engines would be in violation for indexing pages.
That was actually considered and there have been a bunch of lawsuits about that. There is now effectively considered to be an implied license for search engines for limited use when you put things on the web, but that can be withdrawn using, for example, a robots.txt file which disallows automated downloading. At the beginning, there were even arguments about what happens when you copy from web to computer, from disk to memory and so on. Most of these arguments are solved via "fair use" which is a different
Re: (Score:2)
If someone asks me about a NYT article and I recite it mostly word for word because I have a good memory, is that a copyright violation?
It would likely be considered a performance of the work, depending on just how close to accurate you were and if there is an audience, so I would say the short answer is yes. It would depend on the circumstances, so it's possible it could fall under fair use.
if you were to re-write a NYT article fairly closely (not even word for word, but close enough that a moron in a hurry [wikipedia.org] can't tell the difference) and publish it under your own name, that is 100% a violation.
Re: (Score:3)
Re: (Score:2)
Interestingly enough, even some of the giant evil megacorps are giving out their weights for free. While there's a massive open source AI community out there (visit Huggingface some time and browse around!).
AI cannot be monopolized. Giant megacorps may retain a lead, but open source will always catch up eventually.
Re: (Score:2)
NYT is hardly an admirable company, but it is doing everyone a favor by asserting and demanding the value of its data to OpenAI in court. If NYT succeeds, then maybe all of us will get some money for all the data we supply to the giant data trollers for them to ma
Re: (Score:2)
It's fundamentally incompatible with natural rights.
who cares about imaginary rights? other than that, i have to agree.
Re: (Score:2)
It's fundamentally incompatible with natural rights.
who cares about imaginary rights? other than that, i have to agree.
Lol "imaginary rights"? What are the real ones, then?
If there aren't any natural or God given rights, then what rights exactly are there, maybe "znrt [your username] rights"? Or maybe whatever "rights" the mob thinks we should have today?
Re: (Score:2)
it was tongue-in-cheek; all rights are imaginary, a narrative we invented because it is useful to us.
fortunately we largely left the time behind where we believed in rights granted by cosmic deities, in favor of a more rational social contract. but it's still just narrative.
Re: (Score:3)
Lol "imaginary rights"? What are the real ones, then?
There are no real rights. They are all imaginary.
They are useful to imagine, and then demand, but that doesn't make them unalienable, clearly. History is absolutely filled with examples of people being denied their legal rights, let alone other imagined ones.
If a right is what no one can take away from you, there clearly are no rights. If a right is what no one can take away from you legally, then every single right is situational and conditional. Right to life? Death penalty. Right to freedom? Incarceratio
Re:I have a memory (Score:4, Informative)
If you have to deliberately trick the system into doing so, to the point of repeatedly feeding it parts of articles verbatim (which it learned because they were already copied widely around the internet) and trying to come up with clever ways to word your prompt to get regurgitation and bypass the anti-copyvio filters in the system, then you are misusing a tool to deliberately violate copyright, against the best efforts of the tool developer.
Re: (Score:2)
the problem with this argument is that there is no guarantee that if someone asks a question on something that the AI learned from the NYT, the AI won't start using paragraphs verbatim, or nearly verbatim
see, for example, what happens in Midjourney when you ask for an "Italian videogame": you suddenly receive images of Mario Bros. Those images are clearly a copyright violation, but the person using the tool did not ask the AI to violate copyright; it just happened, and Midjourney cannot do much other than che
Re: Fuck NYT (Score:2)
Is this ChatGPT going on the defensive?
Re:Fuck NYT (Score:4, Informative)
The comment from Bahbus is in accordance with how the anti-AI lawsuits have in general gone thus far: nowhere.
You have a right to process copyrighted data in an automated fashion to create new transformative products and services. Which is why e.g. ~95% of Google's business model isn't illegal. And even posting whole pages of books verbatim against authors' explicit requests has been ruled to still be transformative.