29 of my videos were swiped from YouTube by large tech companies that used them to train their AI chatbots without my knowledge or consent. I am not only upset but also deeply frustrated. Let's start with a brief recap of what happened a couple of months ago.
Remember when OpenAI showed off their Sora video generation program? An executive was asked about the data used to train Sora, to which the response was "publicly available data and licensed data." Pressed on whether that meant videos on YouTube, the executive said she was not sure. A recent article by Proof News, a nonprofit investigative journalism organization, showed how large corporations used thousands of YouTube videos to train AI.
Proof News created a tool where creators could check if their videos were used in these AI models. To my shock, I found that 29 of my own videos were there, along with one collaborative video with Abigail from Philosophy Tube. Captions and subtitles were scraped from YouTube and used as text input to train large language models, as part of a dataset called "the Pile."
The article, titled "Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI," detailed how these companies violated YouTube's terms of service. Subtitles from over 170,000 YouTube videos across 48,000 channels were siphoned off for AI training. The dataset seems to pull heavily from educational channels, and it also includes problematic sources like Infowars and Ben Shapiro. Some of the data contained racial and gender slurs.
Proof News reached out to these companies for statements. Most were silent, but Anthropic confirmed the use, albeit calling it "a very small subset." The companies essentially passed the blame onto the dataset's creators rather than taking full responsibility. Even YouTube, owned by Google, has not taken action, likely because Google itself reportedly used the same method to train its AI models.
The theft is affecting small creators severely. For instance, Jay from Fancy Geeks found that five of his videos were stolen, all of which had user-uploaded captions. Smaller channels, which likely weren't profitable to begin with, suffer the most: their hard work goes unnoticed, yet it is still exploited.
In my case, the stolen videos were ones where I wasn't as thoughtful or accurate as I try to be now. It’s frustrating because putting effort and money into creating and captioning these videos doesn’t feel worthwhile when they get stolen.
I am thinking about re-releasing remastered versions of my older videos, ones that were popular but not up to my current standards. With better research and conclusions, these revised videos could hopefully rectify any past mistakes.
I'm thankful to my audience for letting me rant. It's a challenging time, and realizing the extent to which our hard work gets exploited is immensely disheartening. Until next time, stay wonderful, nerds.