United Kingdom AI

Creatives Demand AI Comes Clean On What It's Scraping

Over 400 prominent UK media and arts figures -- including Paul McCartney, Elton John, and Ian McKellen -- have urged the prime minister to support an amendment to the Data Bill that would require AI companies to disclose which copyrighted works they use for training. The Register reports: The UK government proposes to allow exceptions to copyright rules in the case of text and data mining needed for AI training, with an opt-out option for content producers. "Government amendments requiring an economic impact assessment and reports on the feasibility of an 'opt-out' copyright regime and transparency requirements do not meet the moment, but simply leave creators open to years of copyright theft," the letter says.

The group -- which also includes Kate Bush, Robbie Williams, Tom Stoppard, and Russell T Davies -- said the amendments tabled for the Lords debate would create a requirement for AI firms to tell copyright owners which individual works they have ingested. "Copyright law is not broken, but you can't enforce the law if you can't see the crime taking place. Transparency requirements would make the risk of infringement too great for AI firms to continue to break the law," the letter states.
Baroness Kidron, who proposed the amendment, said: "How AI is developed and who it benefits are two of the most important questions of our time. The UK creative industries reflect our national stories, drive tourism, create wealth for the nation, and provide 2.4 million jobs across our four nations. They must not be sacrificed to the interests of a handful of US tech companies." Baroness Kidron added: "The UK is in a unique position to take its place as a global player in the international AI supply chain, but to grasp that opportunity requires the transparency provided for in my amendments, which are essential to create a vibrant licensing market."

The letter was also signed by a number of media organizations, including the Financial Times, the Daily Mail, and the National Union of Journalists.

  • Okay. (Score:5, Informative)

    by Mr. Dollar Ton ( 5495648 ) on Tuesday May 13, 2025 @01:44AM (#65372415)

    Here it is: it SCRAPES EVERYTHING!11!!!

    • by evanh ( 627108 )

      What is needed is for those LLM companies to state it as such. As in, "Yes, we're knowingly stealing everything."

    • No way they'll admit it though; they'll plead the 5th or accidentally delete the model and its training data. Whoops, unrecoverable. But seriously, image generation models are already being used instead of hiring a graphic artist, so demand for digital art is already falling.
  • Lost count (Score:5, Interesting)

    by ve3oat ( 884827 ) on Tuesday May 13, 2025 @02:02AM (#65372431) Homepage
    I have lost count of the number of times my own website has been scraped, sometimes by the big names and sometimes by bots that I didn't know existed. It's not just the HTML content that they scrape but often all of my original graphics (charts and diagrams) too. Of course my site is covered by copyright (Creative Commons, Attribution, Non-commercial, and Share-Alike 4.0) but I doubt that ChatGPT will ever tell you that it got this or that piece of information from me or my website. Some say that "search is dying" and I see some evidence of this in the falling number of organic visitors to my site. Maybe I should just take down my site, save a pile of money in domain registration and hosting fees, and move on to some other, more satisfying activity.
    • Just make the site text only with simple html and with a lot of hidden metadata. Any images with alt text should have texts like 'tricky dick', 'shaved kitten' and other improper descriptions.

      Add some throttling too that gets worse the more requests that are made. HTTP redirects can also be fun to toy with: just redirect to a random AI source, making it eat itself.

      • by Rei ( 128717 )

        Doesn't work. Your mislabeled data will stand out like a sore thumb on loss graphs. They'll automatically ditch the bad labels, and potentially auto-regenerate new ones.

    • Google been doing it since 97 - and now you got issues?
      • Yup. I've seen how many people on here cared about search engine news summaries, paywalled content and general attitudes towards anything copyrighted that isn't given away freely, the disdain for any kind of intellectual property rights.

        If this was about copying books, news, movies, software, etc for their own enjoyment, the attitude changes and the narrative turns into "I wasn't going to buy it anyway". But AI model training bad torrents gud.

  • Muccah and his dinosaur friends have profited off society for decades. They can demand to know all they want; no company should be required to expose its trade secrets to them. It's an absurd idea that you'd have to inform some demented boomer which book you fed to a computer program on your machine. And an absurd precedent: next will be college book authors suing you for using their knowledge against their license. Or self-help authors suing you for giving unlicensed advice to a friend.

    Copyright is a lim

  • It's such an odd thing to be upset by, honestly. Like screaming into the void, "I want to be forgotten."

    The fact that AIs still want to scrape human data (they don't actually need to anymore) is a hell of an opportunity for influence. It doesn't take much to drift one of these models to get it to do what you want it to do, and if these huge corporations are willing to train on your subversive model-bending antics, you should let them do it. We'll only get more interesting models out of it.

    I get it though.

  • When you start stopping real people from also using your material to train, then I think they may have a point. As it is, they are just scared they are being made redundant (justifiably so, I guess) and want to discriminate. If you want your stuff out there to be listened to and consumed, you are going to have to accept that AI will potentially consume it too.
  • ... then flounce off stage swishing their ostrich boas and stamping their Cuban heels. Send in the lawyers.

  • The British initiative has some good points. It's a bit strange that many parts of society have accepted widespread copyright infringement and ignoring of /robots.txt when done by corporations to feed their generative (or plagiarising, depending on your point of view) models, while file sharing, even small-scale, was met with harsher penalties, IP blocking and law changes that mostly helped the large corporations.

    It would probably be much better the other way round - that some regulated personal non-profit sharin

      • by pereric ( 528017 )

        Well, maybe do some machine learning data injection (which is quite an accessible branch of attacks) in material scraped without permission, to make their generative models output images of Tiananmen Square, Winnie the Pooh or maps with neither Taiwan nor Tibet :-)?

        Also, as they at least on paper claim to follow international treaties, it could be a more reasonable case for applying some trade policy.

  • The front page is almost entirely stories about AI. I get it. AI makes line go up. Stories about AI makes line go up.

    But, other things happen in IT and geek culture these days that don't involve line-goes-up at all.

  • by devslash0 ( 4203435 ) on Tuesday May 13, 2025 @05:46AM (#65372743)

    It's too late now. AI has already consumed and sucked the value out of every available source. Trying to enforce your rights now is futile. After all, how do you prove with certainty what the training set consisted of?

    AI companies should have been forced to seek legal written consent before scraping any data.

    Instead, they have followed the good ol' maxims of "it's better to seek forgiveness than permission" and "damages cost less than consent".

    • by JoshZK ( 9527547 )
      They are just scraping like everything else has been doing for decades. It just turns out they are doing something meaningful with that data. Or as meaningful as the anger at their scraping would lead me to believe. Either way, draw me a cat arm wrestling Jesus. Haha awesome.
  • A bunch of billionaires who shouldn't have had copyright a day later than 14 years are whining that they can't have even more intellectual property stuffed into copyright.

    AI is not copying your songs. They're learning from them to make something different. Humans do this every time they create something new and are influenced by what came before.

    We need to reduce copyright back to 7 years where it's economically justified to encourage innovation. And allow low income creators the chance to renew for up to a

  • "The UK creative industries reflect our national stories"
    Can you still say that when My Lady Jane shows the king of England as black, gay, and disabled?

    I mean, there's certainly a narrative there, to say nothing of an agenda ... but in no way does it represent the British "national story".

  • I'll accept the coined word "talenteds". But you are not more "creative" just because you have a talent.

    And for the record, most professional musicians today buy riffs or entire backing tracks from others and also outsource the mixing and mastering to third parties, so they can honestly lay off people for outsourcing part of their work to AI.

  • El Presidente del mundo will go and fire 'em all.

  • Using makeup and autotune or shut the fuck up ;-D
  • There will be no incentive for anyone to make anything if it will only be stolen. AI is an IP theft machine.
  • ...to arrest the CEOs and charge them with grand theft and receiving stolen goods, and throw 'em in the clink for 10-20 years.
