AI is stealing your content
Reveals the shocking truth that AI models are trained on stolen content, including the creator's own videos, sparking outrage and curiosity.
The video discusses how AI companies are non-consensually using copyrighted content, such as YouTube videos and music, to train their AI models, labeling it as theft. The creator highlights a report from The Atlantic that provides an open tool to search what data these models are trained on. The creator personally found 67 of their own videos in nine datasets used by AI corporations.
AI is already deeply affecting people's lives, often without their knowledge or consent.
The Atlantic's report reveals what AI models are being trained on; because the companies have no permission for it, the creator calls this 'stealing', joking that it is the scientific term.
The Atlantic provides an open tool where users can directly search what music and videos AI is being trained on.
Huge artists are calling out AI companies like Suno for ripping their music, including unreleased material, to train models.
The creator searched their name in the dataset and found 67 of their videos in nine datasets used by AI corporations.
Examples include a video about a horrible chair, Doom mod commentary, washing machine repair championship, and a foot fetish wedding dildo circus.
YT Temporal 180M is a collection of 5.4 million YouTube videos compiled by University of Washington and Allen Institute for AI to train a multimodal model called MERLOT, released in 2021.
The creator argues that stealing content is theft regardless of the entity's wealth, but billion-dollar corporations get to skirt around it.
Nintendo, a highly litigious company, had 4,926 trailers used to train AI models without permission, including by Runway Gen 3.
Runway Gen 3 took 4,772 Nintendo trailers, likely without permission, to train their AI model.
Internal documents obtained by 404 Media list 3,970 channels identified as high-quality video sources for training, including Nintendo video game trailers.
The search tool includes all YouTube videos from named channels published before May 17, 2024, one month before Runway introduced their model.
The creator wagers that since videos are in the dataset, they were likely used to train the models, though not 100% confirmed.
The video underscores the massive scale of unapproved data scraping by AI companies, framing it as theft that disproportionately affects smaller creators while large corporations evade accountability, and emphasizes the need for greater transparency through tools like The Atlantic's dataset search.
"The title 'What a Surprise' is mildly clickbaity by being vague, but the video delivers on the surprise of widespread content theft by AI companies, aligning well with the actual content."
What does the Atlantic's AI watchdog tool allow users to do?
Search what music and videos AI is being trained on.
0:53
How many of the creator's videos were found in AI datasets?
67 videos in nine datasets.
1:59
What is YT Temporal 180M?
A collection of 5.4 million YouTube videos compiled by University of Washington and Allen Institute for AI to train a multimodal model called MERLOT.
3:23
Which company had 4,926 trailers used in AI training without permission?
Nintendo.
6:33
How many Nintendo trailers did Runway Gen 3 likely use?
4,772 trailers.
7:57
What date cutoff does the search tool use for YouTube videos?
May 17, 2024.
9:06
AI theft confirmed by The Atlantic
Provides concrete evidence that AI models are trained on stolen content without permission.
0:20
Creator's own content stolen
Personal anecdote illustrating the widespread nature of unauthorized data scraping.
1:56
Legal double standard for wealthy corporations
Highlights the inequity where theft is redefined when committed by billion-dollar companies.
4:50
Nintendo's trailers used without permission
Shows that even the most litigious companies are not exempt from AI data scraping.
6:18
Runway's internal documents reveal source channels
Demonstrates the systematic compilation of high-quality video sources for AI training.
8:09
[00:02] That's the sound of AI coming in and
[00:04] non-consensually motorboating us all.
[00:07] Whether you want to admit it or not,
[00:09] whether you even recognize it or not,
[00:11] your life is already being affected
[00:14] deeply by AI. And I know I've yapped
[00:17] about it a lot, but I did just see The
[00:20] Atlantic's most recent report where they
[00:22] actually pulled down the pants on a lot
[00:24] of what AI models are being trained on
[00:26] here, aka what they're stealing to train
[00:29] their AI models on because they don't
[00:31] have permission for it. The word, the
[00:33] nomenclature is stealing. That's the
[00:35] scientific term. But because these
[00:37] multi-billion corporations are in the
[00:40] driver's seat now at the helm, they
[00:42] can go ahead and rebrand that whole
[00:44] stealing thing into something entirely
[00:46] different. They're trying to argue that
[00:48] it's all by the book in fact. So, uh,
[00:51] The Atlantic now has this AI watchdog,
[00:53] which is just an open tool. Anyone can go
[00:55] in and they can just directly search
[00:57] what music AI is being trained on as
[01:00] well as like videos AI is being trained
[01:02] on. Now, a ton of huge artists have been
[01:04] made aware of this and have called out
[01:06] AI companies for just directly ripping
[01:09] their music. Some of it unreleased, by
[01:11] the way, in order to train their AI
[01:13] models on. Notably, Suno has been
[01:16] getting called out with spitballs fired
[01:18] at it and rotten tomatoes thrown at it
[01:20] because they're pretty shameless and
[01:22] unapologetic about it. Anyone remember
[01:25] back in the day the whole "you wouldn't
[01:26] download a car," one of the most iconic
[01:28] [ __ ] anti-piracy advertisements
[01:31] ever? Arguably just one of the most
[01:33] well-known uh campaigns ever as well.
[01:36] Well, now the same people that used to
[01:39] run that [ __ ] are the ones that are
[01:41] literally just downloading everything,
[01:43] stealing everything they can. It It's
[01:46] It's pretty incredible the uh 180
[01:48] they've done there on that whole
[01:50] messaging. And I know what you're
[01:52] thinking. Yes, they have stolen my
[01:56] videos to train their AI models on. I
[01:59] searched my name in their data set. 67
[02:02] of my videos have made it into nine data
[02:05] sets used by AI corporations to train
[02:08] their models.
[02:10] Heaven help us. Lord have mercy on the
[02:14] absent souls of these AI models that are
[02:17] being [ __ ] Clockwork Oranged,
[02:19] having the eyelids pulled wide open to
[02:22] be trained off my videos here. Those
[02:25] have to be the dumbest AI models you can
[02:27] find. Just look at this poor bastard
[02:29] here from YT Temporal 180M who's got 221
[02:33] of my videos shoved down its throat
[02:36] being trained off of things like
[02:37] horrible chair where I'm just making fun
[02:40] of a dog [ __ ] chair. Some old gameplay
[02:42] commentary like Doom four feathers where
[02:44] I'm playing a Doom mod where I'm a
[02:46] chicken shooting at warthogs from Halo.
[02:50] It's [ __ ] uh washing machine repair
[02:53] championship where I'm just commentating
[02:56] a washing machine repair competition
[02:59] between some of the highest quality
[03:00] athletes you can find in the washing
[03:02] machine circuit. Not to besmirch the
[03:04] good name there. A foot fetish wedding
[03:06] dildo circus where you know I did a lot
[03:08] of cool trick shots with dildos. Like
[03:11] this this [ __ ] AI model. It must be
[03:14] just sitting there drooling. Actually,
[03:16] let me learn a little bit about my son
[03:18] here, seeing as I taught him everything
[03:20] he knows. YT Temporal 180M. It's a
[03:23] collection of 5.4 million YouTube videos
[03:26] compiled by a team of researchers at the
[03:29] University of Washington and the Allen
[03:31] Institute for AI to train a multi a
[03:33] multimodal model called MERLOT.
[03:37] It was released in 2021.
[03:40] Bro, is MERLOT an idiot? Be honest with
[03:42] me. Is this the dumbest AI you can find?
[03:44] If it's being trained even partly on
[03:46] some of my videos, there's a chance you
[03:47] ask it who wrote the Declaration of
[03:49] Independence and it says dildo titty
[03:51] fart or something. Very fascinating.
[03:54] Very interesting. So, YT Temporal got a
[03:58] huge dollop of some of my incredible
[04:02] work like hentai survive. What the hell
[04:05] is this? I don't even remember this. As
[04:07] the 21st century continues to evolve,
[04:10] human sexual fetishes are evolving right
[04:12] there alongside it. Okay, you know what?
[04:14] That one might actually be somewhat
[04:15] educational. That that actually might
[04:18] help them out a little bit because it
[04:20] it's not wrong. Now, obviously, my
[04:22] videos weren't handpicked for these AI
[04:24] models or anything. They made it into
[04:26] those giant compilations a lot of these
[04:29] groups put together solely to train
[04:32] models off of to give like a huge sample
[04:35] size. I I understand that, but it
[04:38] doesn't make it any less garbage. It's
[04:40] so [ __ ] ridiculous because it is just
[04:43] stealing it. Same thing when they do it
[04:45] with music. It is just stealing all of
[04:47] that to train their models off of. And
[04:50] it's been a huge contentious topic for a
[04:53] while now when it comes to AI. And no
[04:55] amount of like trying to put makeup on
[04:58] it changes the truth that they are just
[05:01] stealing to train their models. Now, if
[05:04] you as just a normal person try and
[05:06] follow their footsteps and do the same
[05:08] thing of just taking a ton of artists
[05:11] music and videos in order to make your
[05:14] own product off of those works, you're
[05:17] going to get arrested and charged with a
[05:19] little something called theft. But that
[05:22] word doesn't exist once you reach a
[05:24] certain level of wealth. These
[05:26] billion-dollar corporations get to kind
[05:29] of skirt around that a little bit.
[05:31] They're able to tiptoe around that. And
[05:33] we don't need to worry about a little
[05:34] pesky thing called theft for them. It's
[05:36] very different rules they play by there.
[05:39] And I know there's not a soul on this
[05:41] planet surprised by this, but it's still
[05:44] something I think worth yapping about,
[05:46] especially now that it's so easily
[05:47] accessible to see how many things just
[05:50] get up by these AI companies to train
[05:55] their models off of. All of these huge
[05:57] compilations of work and IP that they
[06:00] steal to train their models on in order
[06:02] to sell it to the people that are now
[06:04] hooked on AI as everyone is in this
[06:06] giant [ __ ] gold rush, this whirlwind
[06:09] of the in the industry. It's just so
[06:12] bizarre how things have just accelerated
[06:16] to this point. Now, one thing I got very
[06:18] curious about is trying to think of like
[06:20] the most litigious company I could think
[06:22] of to see if their work had also just
[06:24] been stolen by a ton of these AI models.
[06:27] So, obviously, the first thing that pops
[06:30] in my noodle is Nintendo. And yeah,
[06:33] Nintendo is not exempt from this. 4,926.
[06:37] That is a pretty big chunk of their
[06:40] trailers being used to train AI models
[06:43] that I'm sure Nintendo didn't sign off
[06:45] on because Nintendo wouldn't just be
[06:47] giving away this for free. 1,000%.
[06:51] You would need to pay oodles of clams to
[06:55] have access to their material in order
[06:57] to train your own stuff off of to sell
[06:59] that product. They are extremely strict
[07:03] when it comes to their copyright. They
[07:05] rule that [ __ ] with an iron fist. They
[07:07] are Judge Dredd when it comes to their
[07:10] copyright. And yet here we have Runway
[07:12] Gen 3 that just shamelessly takes 4,772
[07:17] of their trailers. Again, Nintendo is
[07:19] extremely strict with their trailers.
[07:21] You can use their trailers in like
[07:23] YouTube videos if you follow a very very
[07:26] particular set of guidelines around it
[07:28] that is extremely transformative.
[07:31] Like there are intense rules.
[07:33] Nintendo is so brutal with this. I
[07:35] remember there was a couple YouTube
[07:37] channels that got taken down because
[07:38] they used Nintendo music. Like they they
[07:41] took Nintendo music from games they
[07:43] owned, put it in their videos, and they
[07:44] lost their whole channels for it. There
[07:46] was also that time where some streamers
[07:48] got banned because they watched Nintendo
[07:50] trailers during the Direct on stream.
[07:52] Like the point is they take that very
[07:54] seriously. And now Runway Gen 3 just
[07:57] comes in here from the top turnbuckle.
[07:58] Takes 4,772
[08:00] of their trailers, most likely
[08:03] completely for free without Nintendo's
[08:05] permission, didn't pay a dime for it, to
[08:07] train their model on. So, uh, let's see
[08:09] what this is. Runway AI collected
[08:11] YouTube videos to train a
[08:12] video-generating AI model released as Gen
[08:14] 3 in 2024. An internal company document
[08:17] that was obtained by 404 Media lists
[08:19] 3,970.
[08:21] What? Wait, that can't be right. Oh, oh,
[08:23] channels. I thought I was talking about
[08:24] videos in general. I was like, "No, they
[08:26] have more than that from just Nintendo."
[08:27] That Runway identified as sources of
[08:29] high-quality video for training.
[08:32] Nintendo video game trailers. Okay. The
[08:35] spreadsheet contains comments describing
[08:37] what is desirable about some of the
[08:38] channels. For example, beautiful
[08:40] cinematic landscapes, high quality
[08:42] scenes from movies, only four videos,
[08:44] but they are really well done. Super
[08:45] high quality sci-fi short films, and the
[08:48] holy grail of car cinematics so far.
[08:52] That must be talking about like Mario
[08:53] Kart or something. It's not clear which
[08:56] if any videos Runway actually used for
[08:58] training its AI system. Our search our
[09:00] search tool includes all YouTube videos
[09:02] from the named channels that were
[09:03] published before May 17th, 2024, which
[09:06] is 1 month before Runway introduced
[09:08] their model. So, this is something that
[09:11] The Atlantic also made note of. It's not
[09:14] 100% confirmed that they used all of
[09:17] these videos, but the fact that they
[09:18] were compiled by these companies. I
[09:20] think you can make a pretty educated
[09:22] guess that it was likely used to train
[09:24] their models. And also another thing
[09:26] they mention is that there are likely a
[09:29] lot of other ones that even though
[09:30] they're not here, doesn't mean they
[09:31] weren't used to train their model. It's
[09:33] a very tricky and messy, sloppy thing to
[09:36] nail down exactly what and what isn't
[09:38] being used to train models. But I would
[09:41] wager a guess that since it's here in
[09:44] the data set, they probably used it.
[09:47] Much like all of these also most likely
[09:49] used it, like this company, which also
[09:51] used a lot of my YouTube videos, which
[09:54] that actually kind of gave me a giggle
[09:56] when I saw that Nintendo of America was
[09:58] part of the same data set as this one
[10:00] where my videos are in, such as
[10:03] Seaweed's [ __ ] cool or whatever that
[10:06] one was. I already forgot. So Nintendo
[10:09] and I, we're basically in the in the
[10:12] same ballpark now when it comes to
[10:14] quality. That's pretty cool. I bet
[10:16] Nintendo's thrilled about that. Uh but
[10:19] anyway, point is thanks to the Atlantic
[10:22] data sets here that you can freely
[10:23] explore. You can see so so so many
[10:28] things have just been taken by these
[10:29] companies and put in these data sets
[10:31] that are presumably being used to train
[10:33] their AI models on. It's just it's
[10:36] pretty egregious. I I wanted to yap
[10:38] about it a little bit. That's it. See
[10:40] you.