

Creatives Demand AI Comes Clean On What It's Scraping
Over 400 prominent UK media and arts figures -- including Paul McCartney, Elton John, and Ian McKellen -- have urged the prime minister to support an amendment to the Data Bill that would require AI companies to disclose which copyrighted works they use for training. The Register reports: The UK government proposes to allow exceptions to copyright rules in the case of text and data mining needed for AI training, with an opt-out option for content producers. "Government amendments requiring an economic impact assessment and reports on the feasibility of an 'opt-out' copyright regime and transparency requirements do not meet the moment, but simply leave creators open to years of copyright theft," the letter says.
The group -- which also includes Kate Bush, Robbie Williams, Tom Stoppard, and Russell T Davies -- said the amendments tabled for the Lords debate would create a requirement for AI firms to tell copyright owners which individual works they have ingested. "Copyright law is not broken, but you can't enforce the law if you can't see the crime taking place. Transparency requirements would make the risk of infringement too great for AI firms to continue to break the law," the letter states. Baroness Kidron, who proposed the amendment, said: "How AI is developed and who it benefits are two of the most important questions of our time. The UK creative industries reflect our national stories, drive tourism, create wealth for the nation, and provide 2.4 million jobs across our four nations. They must not be sacrificed to the interests of a handful of US tech companies." Baroness Kidron added: "The UK is in a unique position to take its place as a global player in the international AI supply chain, but to grasp that opportunity requires the transparency provided for in my amendments, which are essential to create a vibrant licensing market."
The letter was also signed by a number of media organizations, including the Financial Times, the Daily Mail, and the National Union of Journalists.
Okay. (Score:5, Informative)
Here it is: it SCRAPES EVERYTHING!11!!!
Re: (Score:3)
What is needed is for those LLM companies to state it as such. As in, "Yes, we're knowingly stealing everything."
Re: (Score:3)
When they think they've offset the risks, they will even boast about it - "we have stolen everything humanity has done and more, buy our electric brain".
Re: (Score:2)
That won't happen. The product is utter shite.
Re: (Score:2)
While that is true, the fact that a product is utter shite hasn't stopped anyone in marketing so far.
Re: (Score:2)
Lost count (Score:5, Interesting)
Re: Lost count (Score:2)
Just make the site text only with simple html and with a lot of hidden metadata. Any images with alt text should have texts like 'tricky dick', 'shaved kitten' and other improper descriptions.
Add some throttling too that gets worse the more requests are made. HTTP redirects can also be fun to toy with: just redirect to a random AI source, making it eat itself.
Re: (Score:2)
Doesn't work. Your mislabeled data will stand out like a sore thumb on loss graphs. They'll automatically ditch the bad labels, and potentially automatically regenerate new labels.
Re: (Score:1)
Re: Lost count (Score:2)
Yup. I've seen how many people on here cared about search engine news summaries, paywalled content and general attitudes towards anything copyrighted that isn't given away freely, the disdain for any kind of intellectual property rights.
If this was about copying books, news, movies, software, etc for their own enjoyment, the attitude changes and the narrative turns into "I wasn't going to buy it anyway". But AI model training bad torrents gud.
Go away (Score:1)
Muccah and his dinosaur friends have profited off society for decades. They can demand to know all they want, no company should be required to expose their trade secrets to them. It's an absurd idea that you'd have to inform some demented boomer which book you fed to a computer program on your machine. And an absurd precedent: next will be college book authors suing you for using their knowledge against their license. Or self-help authors suing you for giving unlicensed advice to a friend.
Copyright is a lim
Re:Go away (Score:5, Interesting)
As to the idea that it should be ridiculous to demand what companies are doing internally, see also: taxes, income declarations, VAT collection, etc.
Re: (Score:2)
Copyright is no such thing. Maybe in America, maybe a hundred years ago. This story is about UK copyright. Completely different purpose
Completely different when both are under the Berne Convention? There are differences [copyrightservice.co.uk], but they are for the same primary purposes: rent seeking, ownership, and taxation.
Re: Go away (Score:2)
You mean theft secrets.
It's not just about musicians. It's also about everyday Joe and small companies who have their IP stolen without consent or compensation. It's predatory and parasitic.
Re: (Score:3)
You mean theft secrets.
It's not just about musicians. It's also about everyday Joe and small companies who have their IP stolen without consent or compensation. It's predatory and parasitic.
It's not theft. Nothing was taken. Everything is still with the original owner.
At least that's what we're told when someone steals music, movies, or software.
Re: (Score:2)
It's like charging a newspaper or an encyclopedia for a picture, that they made themselves, of a work of art. LLMs are a source of knowledge.
Re: (Score:2)
An AI is not a person. It's an absurd idea that you can just scan anything you want online, process it, and then redistribute the results [wikipedia.org]
Re: (Score:3)
By "the results", you mean an entirely different, unprotected work?
Re: (Score:2)
Yes, just like fan edits of movies.
Re: (Score:2)
No. AI tools are not collagers. That is not how they work.
Re: (Score:2)
Oh really? How do they work?
Re: (Score:2)
Enjoy [transformer-circuits.pub]
(TL/DR: They're *generalizers*. They learn to *generalize* problems.)
I have thoughts (Score:1)
It's such an odd thing to be upset by, honestly. Like screaming into the void, "I want to be forgotten."
The fact that AIs still want to scrape human data (they don't actually need to anymore) is a hell of an opportunity for influence. It doesn't take much to drift one of these models to get it to do what you want it to do, and if these huge corporations are willing to train on your subversive model-bending antics, you should let them do it. We'll only get more interesting models out of it.
I get it though.
give it a rest (Score:1)
Creatives demand ... (Score:1)
... then flounce off stage swishing their Ostrich boas and stamping their cuban heels. Send in the lawyers.
Re: Creatives demand ... (Score:2)
Copyright infringement OK for corporations? (Score:2)
The British initiative has some good points. It's a bit strange that many parts of society have accepted widespread copyright infringement and disregard of /robots.txt when done by corporations to feed their generative (or plagiarising, depending on your point of view) models, while file sharing, even small-scale, was met with harsher penalties, IP blocking and law changes that mostly help the large corporations.
It would probably be much better the other way round - that some regulated personal non-profit sharin
Re: Copyright infringement OK for corporations? (Score:2)
Re: (Score:2)
Well, maybe do some machine learning data injection (which is a quite available branch of attacks) in material scraped without permission to make their generative models output images of Tiananmen Square, Winnie the Pooh or maps with neither Taiwan nor Tibet :-)?
Also, as they at least on paper claim to follow international treaties, it could be a more reasonable case for applying some trade policy.
Okay, Slashdot, it's time for an intervention. (Score:2)
The front page is almost entirely stories about AI. I get it. AI makes line go up. Stories about AI makes line go up.
But, other things happen in IT and geek culture these days that don't involve line-goes-up at all.
Prior written consent. (Score:5, Insightful)
It's too late now. AI has already consumed and sucked the value out of every available source. Trying to enforce your rights now is a futile attempt. After all, how do you prove with certainty what the training set consisted of?
AI companies should have been forced to seek legal written consent before scraping any data.
Instead, they have followed the good ol' maxims of "it's better to seek forgiveness than permission" and "damages cost less than consent".
Re: (Score:1)
They're billionaires (Score:2)
A bunch of billionaires who shouldn't have had copyright a day later than 14 years are whining that they can't have even more intellectual property stuffed into copyright.
AI is not copying your songs. They're learning from them to make something different. Humans do this every time they create something new and are influenced by what came before.
We need to reduce copyright back to 7 years where it's economically justified to encourage innovation. And allow low income creators the chance to renew for up to a
Re:We do the same (Score:5, Insightful)
We humans do the same as the AI, we read, think and we learn.
Digital computers don't read, think, or learn. They are data processing engines, and are absolutely nothing like living organisms. Software like that produced by OpenAI is a data storage, processing, and retrieval engine. Nothing more. These are copyright infringement engines like the world has never seen before.
Re: (Score:2)
Nope
Nothing is copied
It's just like a student reading a book to increase their knowledge
The weights in an LLM are just numbers that represent statistical connections, not the training data set
Learn how it works before making accusations
Re: (Score:2)
Yes, they do "learn to think". [transformer-circuits.pub] (albeit in rather alien manners sometimes)
ehh? (Score:1)
"The UK creative industries reflect our national stories"
Can you still say that when My Lady Jane shows the king of England as black, gay, and disabled?
I mean, there's certainly a narrative there, to say nothing of an agenda ... but in no way does it represent the British "national story".
"Creatives" *eyeroll* (Score:1)
I'll accept the coined word "talenteds". But you are not more "creative" just because you have a talent.
And for the record, most professional musicians today buy riffs or entire backing tracks from others and also outsource the mixing and mastering to third parties, so they can honestly lay off people for outsourcing part of their work to AI.
Better Be Careful– (Score:2)
El Presidente del mundo will go and fire 'em all.
Stop... (Score:1)
The scraping must stop! (Score:2)
Re:The scraping must stop! FTFY (Score:2)
"There will be no incentive for anyone to make anything-" online
welcome to the analog
Way past time... (Score:2)
...to arrest the CEOs and charge them with grand theft and receiving stolen goods, and throw 'em in the clink for 10-20 years.