AI Youtube

Is OpenAI's Video-Generating Tool 'Sora' Scraping Unauthorized YouTube Clips? (msn.com) 17

"OpenAI's video generation tool, Sora, can create high-definition clips of just about anything you could ask for..." reports the Washington Post.

"But OpenAI has not specified which videos it grabbed to make Sora, saying only that it combined 'publicly available and licensed data'..." With ChatGPT, OpenAI helped popularize the now-standard industry practice of building more capable AI tools by scraping vast quantities of text from the web without consent. With Sora, launched in December, OpenAI staff said they built a pioneering video generator by taking a similar approach. They developed ways to feed the system more online video — in more varied formats — including vertical videos and longer, higher-resolution clips... To explore what content OpenAI may have used, The Washington Post used Sora to create hundreds of videos that show it can closely mimic movies, TV shows and other content...

In dozens of tests, The Post found that Sora can create clips that closely resemble Netflix shows such as "Wednesday"; popular video games like "Minecraft"; and beloved cartoon characters, as well as the animated logos for Warner Bros., DreamWorks and other Hollywood studios, movies and TV shows. The publicly available version of Sora can generate only 20-second clips, without audio. In most cases, the look-alike scenes were made by typing basic requests like "universal studios intro." The results also showed that Sora can create AI videos with the logos or watermarks that broadcasters and tech companies use to brand their video content, including those for the National Basketball Association, Chinese-owned social app TikTok and Amazon-owned streaming platform Twitch...

Sora's ability to re-create specific imagery and brands suggests a version of the originals appeared in the tool's training data, AI researchers said. "The model is mimicking the training data. There's no magic," said Joanna Materzynska, a PhD researcher at Massachusetts Institute of Technology who has studied datasets used in AI. An AI tool's ability to reproduce proprietary content doesn't necessarily indicate that the original material was copied or obtained from its creators or owners. Content of all kinds is uploaded to video and social platforms, often without the consent of the copyright holder... Materzynska co-authored a study last year that found more than 70 percent of public video datasets commonly used in AI research contained content scraped from YouTube.

Netflix and Twitch said they did not have a content partnership for training OpenAI, according to the article (which adds that OpenAI "has yet to face a copyright suit over the data used for Sora.")

Two key quotes from the article:
  • "Unauthorized scraping of YouTube content continues to be a violation of our Terms of Service." — YouTube spokesperson Jack Malon
  • "We train on publicly available data consistent with fair use and use industry-leading safeguards to avoid replicating the material they learn from." — OpenAI spokesperson Kayla Wood

  • by topham ( 32406 ) on Saturday September 20, 2025 @10:48AM (#65672690) Homepage

    Terms of service aren't much of a concern here. Honestly, completely irrelevant unless OpenAI opens up 30,000 streams at once.

    Terms of service have little value in court when it comes to scraping of content that doesn't cause issues with the service itself. Contract violation with near zero repercussions.

    • by allo ( 1728082 )

      Most ToS are rather lengthy explanations that the service may close your account if they dislike what you do, listing a lot of things so you know that you shouldn't do them, or so they can at least argue in court that you should have known if you try to sue them for closing your account. Using ToS to sue for damages is a lot harder, as you would need to establish that the ToS had effect, while using them defensively means that the user with the banned account has to show they don't.

  • Former CTO (Score:4, Interesting)

    by EltonJuan ( 10503148 ) on Saturday September 20, 2025 @10:49AM (#65672692)
    A year ago, I recall the Wall Street Journal interviewing OpenAI's former CTO, Mira Murati. When she was asked whether they scrape YouTube videos, she got suspiciously hesitant and responded as if she were in a deposition, coached by a lawyer, saying she wasn't sure.
  • by forrie ( 695122 ) on Saturday September 20, 2025 @11:16AM (#65672730)

    Sora, which I refer to as "Sore-a" as it's not always great at doing *what you ask for*, has certainly used a ton of data that wasn't "authorized" -- but I have taken note that there are specific filters it refuses to act on, such as popular cartoon or comic characters, etc. This likely means they have a big "DMCA list" somewhere.

    I once asked for Miss Piggy to be dancing in a background, and it gave me something that looked like "Elf on a Shelf" instead LOL.

    But, AI *has* to use reference material. That's how it learns, works and extrapolates. It's how the human brain works, too. If something is that publicly available, then what is "unauthorized use"? This is about content quality and control of copyright, for one. But if I paint Miss Piggy and it becomes a popular artwork piece, would I then be sued? Did I use unauthorized reference? Where would that even go?

    The courts are going to have to decide this one and I think it will take a while. There are fair arguments from both sides.

    I'm concerned that a balance won't ever be achieved, that censorship could become the norm, and that multiple DMCA takedowns and the fear of being sued will impact the level and quality of service the public has. That will spawn a number of free tools that accomplish the same (I think that's already happening). I'm sure this will get very interesting.

    • Reform copyright, allow derivative works, abolish moral rights. What's the worst that could happen? Solves the problem of AI being "inspired" by existing works. Well, perhaps someone will write a crappy HP-inspired story about Tanya Grotter, a machine-gun wielding lady wizard who goes after bad Chechens (that is a real book, BTW). So what? The goal of copyright is cultural abundance, and that will (eventually) include AI generated works.

      Look at Nosferatu, considered to be one of the great vampire mo
      • by forrie ( 695122 )

        Another aspect to this is that AI is causing humanity and society at large to begin redefining itself. We have been spoon-fed a lot of "fear" through media and movies about AI; and we know the military will weaponize anything that moves, but for the rest of us -- we are likely witnessing a fundamental change. We have AI and blockchain, social media... all of which are likely already being impacted or affected by AI. It's a rabbit hole to traverse that is OT. But something to consider, in context.

  • Any bad habit is repeated until that smack hits your bum.

    • by gweihir ( 88907 )

      That seems to characterize the whole "gen AI" industry nicely. They should rightfully all be behind bars with their personal assets seized. But the law is far too slow for the Internet age.

  • It's an intelligence.

    And like all those, it obviously WATCHES YouTube. :-)

    It has just a slightly better recollection, like a savant.

    • by gweihir ( 88907 )

      Only that it is NOT an "intelligence" and that the training data (!) gets fed to it by human decision.
