United Kingdom AI

Creatives Demand AI Comes Clean On What It's Scraping

Over 400 prominent UK media and arts figures -- including Paul McCartney, Elton John, and Ian McKellen -- have urged the prime minister to support an amendment to the Data Bill that would require AI companies to disclose which copyrighted works they use for training. The Register reports: The UK government proposes to allow exceptions to copyright rules in the case of text and data mining needed for AI training, with an opt-out option for content producers. "Government amendments requiring an economic impact assessment and reports on the feasibility of an 'opt-out' copyright regime and transparency requirements do not meet the moment, but simply leave creators open to years of copyright theft," the letter says.

The group -- which also includes Kate Bush, Robbie Williams, Tom Stoppard, and Russell T Davies -- said the amendments tabled for the Lords debate would create a requirement for AI firms to tell copyright owners which individual works they have ingested. "Copyright law is not broken, but you can't enforce the law if you can't see the crime taking place. Transparency requirements would make the risk of infringement too great for AI firms to continue to break the law," the letter states.
Baroness Kidron, who proposed the amendment, said: "How AI is developed and who it benefits are two of the most important questions of our time. The UK creative industries reflect our national stories, drive tourism, create wealth for the nation, and provide 2.4 million jobs across our four nations. They must not be sacrificed to the interests of a handful of US tech companies." Baroness Kidron added: "The UK is in a unique position to take its place as a global player in the international AI supply chain, but to grasp that opportunity requires the transparency provided for in my amendments, which are essential to create a vibrant licensing market."

The letter was also signed by a number of media organizations, including the Financial Times, the Daily Mail, and the National Union of Journalists.

  • Okay. (Score:5, Informative)

    by Mr. Dollar Ton ( 5495648 ) on Tuesday May 13, 2025 @01:44AM (#65372415)

    Here it is: it SCRAPES EVERYTHING!11!!!

    • by evanh ( 627108 )

      What is needed is for those LLM companies to state it as such. As in, "Yes, we're knowingly stealing everything."

    • No way they'll admit it though. They'll plead the 5th or accidentally delete the model and its training data (whoops, unrecoverable). But seriously, image generation models are already being used instead of hiring a graphic artist, so demand is already falling for digital art.
  • Lost count (Score:5, Interesting)

    by ve3oat ( 884827 ) on Tuesday May 13, 2025 @02:02AM (#65372431) Homepage
    I have lost count of the number of times my own website has been scraped, sometimes by the big names and sometimes by bots that I didn't know existed. It's not just the HTML content that they scrape but often all of my original graphics (charts and diagrams) too. Of course my site is covered by copyright (Creative Commons, Attribution, Non-commercial, and Share-Alike 4.0) but I doubt that ChatGPT will ever tell you that it got this or that piece of information from me or my website. Some say that "search is dying" and I see some evidence of this in the falling number of organic visitors to my site. Maybe I should just take down my site, save a pile of money in domain registration and hosting fees, and move on to some other, more satisfying activity.
    • Just make the site text only with simple html and with a lot of hidden metadata. Any images with alt text should have texts like 'tricky dick', 'shaved kitten' and other improper descriptions.

      Add some throttling too that gets worse the more requests that are made. HTTP redirects can also be fun to toy with: just redirect to a random AI source, making it eat itself.

      • by Rei ( 128717 )

        Doesn't work. Your mislabeled data will stand out like a sore thumb on loss graphs. They'll automatically ditch the bad labels, and potentially auto-regenerate new labels.

      • by allo ( 1728082 )

        If fake content would be that easy, you would see a lot more spam both in search engines and LLM results. There are good pipelines for post-processing low-quality data.

    • Google been doing it since 97 - and now you got issues?
      • Yup. I've seen how many people on here cared about search engine news summaries, paywalled content and general attitudes towards anything copyrighted that isn't given away freely, the disdain for any kind of intellectual property rights.

        If this was about copying books, news, movies, software, etc for their own enjoyment, the attitude changes and the narrative turns into "I wasn't going to buy it anyway". But AI model training bad torrents gud.

  • Muccah and his dinosaur friends have profited off society for decades. They can demand to know all they want, no company should be required to expose their trade secrets to them. It's an absurd idea that you'd have to inform some demented boomer which book you fed to a computer program on your machine. And an absurd precedent: next will be college book authors suing you for using their knowledge against their license. Or self-help authors suing you for giving unlicensed advice to a friend.

    Copyright is a lim

    • Re:Go away (Score:5, Interesting)

      by martin-boundary ( 547041 ) on Tuesday May 13, 2025 @02:33AM (#65372457)
      Copyright is no such thing. Maybe in America, maybe a hundred years ago. This story is about UK copyright. Completely different purpose, completely different laws, completely different copyright.

      As to the idea that it should be ridiculous to demand what companies are doing internally, see also: taxes, income declarations, VAT collection, etc.

      • Copyright is no such thing. Maybe in America, maybe a hundred years ago. This story is about UK copyright. Completely different purpose

        Completely different when both are under the Berne Convention? There are differences [copyrightservice.co.uk], but they are for the same primary purposes: rent seeking, ownership, and taxation.

    • You mean theft secrets.

      It's not just about musicians. It's also about everyday Joe and small companies who have their IP stolen without consent or compensation. It's predatory and parasitic.

      • You mean theft secrets.

        It's not just about musicians. It's also about everyday Joe and small companies who have their IP stolen without consent or compensation. It's predatory and parasitic.

        It's not theft. Nothing was taken. Everything is still with the original owner.

        At least that's what we're told when someone steals music, movies, or software.

      • by sosume ( 680416 )

        It's like charging a newspaper or an encyclopedia for a picture, that they made themselves, of a work of art. LLMs are a source of knowledge.

    • An AI is not a person. It's an absurd idea that you can just scan anything you want online, process it, and then redistribute the results [wikipedia.org]

      • by Rei ( 128717 )

        By "the results", you mean an entirely different, unprotected work?

        • Yes, just like fan edits of movies.

          • by Rei ( 128717 )

            No. AI tools are not collagers. That is not how they work.

            • Oh really? How do they work?

              • by Rei ( 128717 )

                Enjoy [transformer-circuits.pub]

                (TL/DR: They're *generalizers*. They learn to *generalize* problems.)

                • Similar to how the homeless guy I met at the library in Toronto generalized magazine advertisements using the library copy machine and a pair of scissors.

                  • by Rei ( 128717 )

                    You clearly read nothing.

                    • Na, there was just some nuance I didn't communicate clearly. I actually don't really care "how" they work, just that the end result is a hodge podge of token streams cribbed from the training data. My bad.

                    • by Rei ( 128717 )

                      Again, I'll repeat: if you believe that, you clearly read nothing.

                    • I read it and even looked at the pictures. Unfortunately, you can ask any LLM to tell you the first line of The Bible, or the 10th line of The Count of Monte Cristo, and both ChatGPT and Copilot (after prompting away the safety features) can reliably regurgitate the exact text. The way it stores and decodes the data doesn't change the fact that the data is in there and it is possible to get it out.

  • It's such an odd thing to be upset by, honestly. Like screaming into the void, "I want to be forgotten."

    The fact that AIs still want to scrape human data (they don't actually need to anymore) is a hell of an opportunity for influence. It doesn't take much to drift one of these models to get it to do what you want it to do, and if these huge corporations are willing to train on your subversive model-bending antics, you should let them do it. We'll only get more interesting models out of it.

    I get it though.

  • When you start stopping real people from also using your material to train, then I think they may have a point. As it is, they are just scared they are being made redundant (justifiably so, I guess) and want to discriminate. If you want your stuff out there to listen to and consume, you are going to have to accept that this also means AI will potentially consume it.
  • ... then flounce off stage swishing their Ostrich boas and stamping their cuban heels. Send in the lawyers.

  • The British initiative has some good points. It's a bit strange that many parts of society have accepted widespread copyright infringement and disregard for /robots.txt when done by corporations to feed their generative (or plagiarising, depending on your point of view) models, while file sharing, even small-scale, was met with harsher penalties, IP blocking, and law changes that mostly help the large corporations.

    It would probably be much better the other way round - that some regulated personal non-profit sharin

      • by pereric ( 528017 )

        Well, maybe do some machine learning data injection (which is a quite accessible branch of attacks) in material scraped without permission to make their generative models output images of Tiananmen Square, Winnie the Pooh or maps with neither Taiwan nor Tibet :-)?

        Also, as they at least on paper claim to follow international treaties, it could be a more reasonable case for applying some trade policy.

  • The front page is almost entirely stories about AI. I get it. AI makes line go up. Stories about AI makes line go up.

    But, other things happen in IT and geek culture these days that don't involve line-goes-up at all.

  • by devslash0 ( 4203435 ) on Tuesday May 13, 2025 @05:46AM (#65372743)

    It's too late now. AI has already consumed and sucked the value out of every available source. Trying to enforce your rights now is futile. After all, how do you prove with certainty what the training set consisted of?

    AI companies should have been forced to seek legal written consent before scraping any data.

    Instead, they have followed the good ol' maxims of "it's better to seek forgiveness than permission" and "damages cost less than consent".

    • by JoshZK ( 9527547 )
      They are just scraping like everything else has been doing for decades. It just turns out they are doing something meaningful with that data. Or as meaningful as the anger of their scraping would lead me to believe. Either way, draw me a cat arm wrestling Jesus. Haha awesome.
      • by allo ( 1728082 )

        They are doing something visible with it.

        Have a look into your webserver logs. What benefit do I have from the "SEO analytics" bot? They spam their link in the referer, but when I click on it, it asks me to buy a subscription, while they use the content to oracle what my domain name may be worth and what other sites may benefit from cooperating with me. They give back nothing and use the data for shady business (don't get me started on "person search engines") for many years before the first LLM was trained

  • A bunch of billionaires who shouldn't have had copyright a day later than 14 years are whining that they can't have even more intellectual property stuffed into copyright.

    AI is not copying your songs. They're learning from them to make something different. Humans do this every time they create something new and are influenced by what came before.

    We need to reduce copyright back to 7 years where it's economically justified to encourage innovation. And allow low income creators the chance to renew for up to a

  • "The UK creative industries reflect our national stories"
    Can you still say that when My Lady Jane shows the king of England as black, gay, and disabled?

    I mean there's certainly a narrative there, to say nothing of an agenda ... but in no way does it represent the British "national story".

  • I'll accept the coined word "talenteds". But you are not more "creative" just because you have a talent.

    And for the record, most professional musicians today buy riffs or entire backing tracks from others and also outsource the mixing and mastering to third parties, so they can honestly lay off people for outsourcing part of their work to AI.

  • El Presidente del mundo will go and fire 'em all.

  • Using makeup and autotune or shut the fuck up ;-D
  • There will be no incentive for anyone to make anything if it will only be stolen. AI is an IP theft machine.
  • ...to arrest the CEOs and charge them with grand theft and receiving stolen goods, and throw 'em in the clink for 10-20 years.

  • "Create X in the style of Y" - if that results in anything remotely correct, they scraped Y.
  • We should be restricting the non-consensual act of data collection and amalgamation itself, allowing it only with written consent that must be renewed annually or is automatically withdrawn.
  • It's really easy to tell them what it is training on: everything!
  • The head of the US Copyright Office has been fired, so who are these creatives complaining to?

  • It's just transforming the next stage of human consciousness https://www.youtube.com/watch?... [youtube.com]
