Around two years ago, the world was inundated with news about how generative AI or large language models would revolutionize the world. At the time it was easy to get caught up in the hype, but in the intervening months these tools have done little in the way of productive work outside of a few edge cases, and mostly serve to burn tons of cash while turning the Internet into even more of a desolate wasteland than it was before. They do this largely by regurgitating human creations like text, audio, and video into inferior simulacrums and, if you still want to exist on the Internet, there’s basically nothing you can do to prevent this sort of plagiarism. Except feed the AI models garbage data like this YouTuber has started doing.
At least as far as YouTube is concerned, the worst offenders of AI plagiarism work by downloading the video’s subtitles, passing them through some sort of AI model, and then generating another YouTube video based off of the original creator’s work. Most subtitle files are the fairly straightforward .srt filetype which only allows for timing and text information. But a more obscure subtitle filetype known as Advanced SubStation Alpha, or .ass, allows for all kinds of subtitle customization like orientation, formatting, font types, colors, shadowing, and many others. YouTuber [f4mi] realized that using this subtitle system, extra garbage text could be placed in the subtitle file but set out of view of the video itself, either by placing the text outside the viewable area or increasing its transparency. So now when an AI crawler downloads the subtitle file it can’t distinguish real subtitles from the garbage placed into it.
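To make the trick concrete, here is a minimal Python sketch of the idea. It is not [f4mi]’s script. The function names, the off-screen coordinates, and the `Default` style name are all assumptions for illustration. It emits only the `Dialogue:` event lines of an .ass file, using the format’s `\pos` and `\alpha` override tags to hide the decoy text. A complete file would also need the usual `[Script Info]` and style sections, and how the result gets onto YouTube is not shown here.

```python
import random


def ass_time(seconds: float) -> str:
    """Format seconds as the H:MM:SS.cc timestamp used by ASS events."""
    cs = round(seconds * 100)          # ASS uses centiseconds
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def dialogue(start: float, end: float, text: str, override: str = "") -> str:
    """Build one ASS event line; 'Default' is an assumed style name."""
    return (f"Dialogue: 0,{ass_time(start)},{ass_time(end)},"
            f"Default,,0,0,0,,{override}{text}")


def poison(cues, decoys, seed=0):
    """Interleave each real cue with one hidden decoy line.

    cues:   list of (start_seconds, end_seconds, text)
    decoys: list of garbage strings to hide alongside the real text
    """
    rng = random.Random(seed)
    lines = []
    for start, end, text in cues:
        lines.append(dialogue(start, end, text))
        junk = rng.choice(decoys)
        if rng.random() < 0.5:
            hide = r"{\pos(-2000,-2000)}"   # far outside the video frame
        else:
            hide = r"{\alpha&HFF&}"         # fully transparent
        lines.append(dialogue(start, end, junk, hide))
    return lines


if __name__ == "__main__":
    for line in poison([(1.0, 4.0, "Welcome back to the channel.")],
                       ["The moon is made of recycled keyboards."]):
        print(line)
```

A viewer still sees only the real cue, while anything that reads the raw text gets both lines.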
[f4mi] created a few scripts to do this automatically so that it doesn’t have to be done by hand for each one. It also doesn’t impact the actual subtitles on the screen for people who need them for accessibility reasons. It’s a great way to “poison” AI models and make it at least harder for them to rip off the creations of original artists, and [f4mi]’s tests show that it does work. We’ve actually seen a similar method for poisoning data sets used for emails long ago, back when we were all collectively much more concerned about groups like the NSA using automated snooping tools in our emails than we were that machines were going to steal our creative endeavors.
Thanks to [www2] for the tip!
This won’t stop autotranscription from audio. And a right-thinking person wants to be in more training data for obvious reasons (wider exposure, shifting the text prior).
If they’re using autotranscription then they’re already poisoning themselves, saving us the hassle.
“for obvious reasons (wider exposure”
So it’s not copyright infringement if EVERYONE does it and it’s big enough… then it’s advertising?
Anything that can increase the amount of effort that companies or creators need to go through is awesome, because it increases the possibility that laws can be passed to make this garbage explicitly illegal as opposed to trusting courts containing people who apparently can’t comprehend “if I just steal ENOUGH people’s work it can’t be stealing!”
“So it’s not copyright infringement if EVERYONE does it and it’s big enough… then it’s advertising?”
I’ve never heard of any AI emissions crediting whose work it’s using so any exposure a given person gets from having his content slurped up seems to be “none”.
Your rationale for wanting to be in training data is nigh-incomprehensible. Exposure isn’t relevant in a context that lacks attribution, and “shifting the text prior” is plain word salad when used WRT plagiarism.
Perhaps you shouldn’t have farmed out your internet commenting to an LLM just yet.
Exposure is what you die of. No thank you.
“Wider exposure” how to tell me you aren’t a creative without saying you hate to think.
This trick won’t last long: It should be easy enough to write a program that will take the fancied-up .ass-format text, simulate how it would appear on the screen, then use only the text that is “inside the visible area of the screen” and “displayed prominently enough to be seen when displayed over the video itself” (enough contrast, large enough, etc.) as input to the AI.
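A much cruder version of that countermeasure can be sketched in Python. It does not simulate rendering at all. It only drops event lines whose override tags put the text outside an assumed 1920x1080 frame or make it (nearly) fully transparent. The function name, frame size, and alpha cutoff are assumptions, and it ignores style definitions, tiny fonts, `\move`, clipping, and every other way to hide text.

```python
import re

# \pos(x,y) positions an event; \alpha (or \1a) sets transparency, FF = invisible.
POS = re.compile(r"\\pos\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")
ALPHA = re.compile(r"\\(?:alpha|1a)&H([0-9A-Fa-f]{2})&?")
TAGS = re.compile(r"\{[^}]*\}")


def visible_text(line, width=1920, height=1080, max_alpha=0xE0):
    """Return the cleaned text of an ASS event if it looks visible, else None."""
    if not line.startswith("Dialogue:"):
        return None
    # The Text field is the tenth field and may itself contain commas.
    fields = line.split(",", 9)
    if len(fields) < 10:
        return None
    text = fields[9]

    pos = POS.search(text)
    if pos:
        x, y = float(pos.group(1)), float(pos.group(2))
        if not (0 <= x <= width and 0 <= y <= height):
            return None                      # positioned off screen

    alpha = ALPHA.search(text)
    if alpha and int(alpha.group(1), 16) >= max_alpha:
        return None                          # effectively invisible

    return TAGS.sub("", text).strip()        # strip remaining override tags


if __name__ == "__main__":
    sample = [
        r"Dialogue: 0,0:00:01.00,0:00:04.00,Default,,0,0,0,,Real line",
        r"Dialogue: 0,0:00:01.00,0:00:04.00,Default,,0,0,0,,{\alpha&HFF&}junk",
    ]
    print([t for t in map(visible_text, sample) if t is not None])
```

Each extra hiding method needs another rule in a filter like this, which is the arms race the comment describes.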
Evolution in action.
But unless everyone does this, they won’t bother – and the poisoning will also have negligible effect.
Exactly. This isn’t meant to be used at scale. It’s for individual creators who are fed up with AI cash farms on YouTube using their content.
Make an .ass out of AI.
Sounds like they have a lot of easy ways to get around this. They could lower the weights of subtitle data, they could run a check of audio transcription against the data before judging whether it should be used or not, or they could just ignore it completely. The most important part of video training is the video itself, so this isn’t as damaging as people think it is. There still aren’t any video models that include audio, and they get most of their information from context on metadata outside of subtitle information.
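The “check of audio transcription against the data” idea could look something like the following Python sketch. It assumes a transcript has already been produced by some speech-to-text system, which is not shown. The function names and the 0.6 threshold are arbitrary illustrations, not anything a scraper is known to use.

```python
import re
from difflib import SequenceMatcher


def words(text: str) -> list:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def agreement(subtitles: str, transcript: str) -> float:
    """Word-level similarity between subtitle text and an audio transcript (0..1)."""
    matcher = SequenceMatcher(None, words(subtitles), words(transcript),
                              autojunk=False)
    return matcher.ratio()


def looks_poisoned(subtitles: str, transcript: str, threshold: float = 0.6) -> bool:
    """Flag subtitle files that diverge too far from what was actually said."""
    return agreement(subtitles, transcript) < threshold


if __name__ == "__main__":
    spoken = "today we are repairing an old synthesizer"
    padded = spoken + " the moon is made of recycled keyboards" * 3
    print(looks_poisoned(spoken, spoken), looks_poisoned(padded, spoken))
```

Hidden filler text inflates the subtitle word count without matching anything in the audio, which drags the similarity score down.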
Yet another naive and already out of date scheme, which will become increasingly futile too, as reasoning models (text based) are already here and the next generation over the coming months will have visual reasoning. If you don’t want machines learning from your content, in the same way that humans do, you only have one (flawed) option: some nonexistent law that allows you to discriminate, plus a platform that cooperates with you and only shows your content to verified humans and at a rate that is matched to human perception speeds. Even then you will have people with AI agents interacting with your content using the human’s credentials and you can’t stop that without getting into legal hot water for discriminating against people with disabilities.
Time you got your head around what just happened (re: DeepSeek etc.): we have just experienced a profound paradigm shift where human intelligence’s value has just imploded. Once your value to society, and the rewards it offered you, were closely correlated with your intelligence; this is no longer the case.
Value to society has never been correlated with intelligence. What’s more important is how much people like you and perceive you as intelligent. Hence why names like Thomas Edison and Elon Musk and Mark Zuckerberg are known world-round, despite the fact that these guys are actually dumb as rocks and terrible engineers
Ass? Sounds so wrong 😂
Haha, my thoughts exactly! Though, it sounds like something a hacker would come up with, so it’s kind of amazing at the same time.
I’m pretty confident someone out there has built some accessibility solution for vision impaired people and is pretty pissed by this approach.
I entirely understand if people don’t have time or desire to watch a video, but I’m still surprised how rarely that prevents them from commenting on things that are not only explicitly and repeatedly addressed in said video, but also literally mentioned in the summary article on hackaday.
Shockingly, [f4mi] seems to be quite knowledgeable of all the shortcomings of her approach and as conscientious of the accessibility and other legitimate uses of her youtube videos, as you would hopefully expect of someone who not only seems to be quite smart and thoughtful but also literally does this for a living.
I don’t believe screen readers for the visually impaired are really addressed in the article or the video. I’m not 100 percent sure on the latter since I listened to it on 1.5x speed and jumped past all of the filler as marked by sponsorblock. All that’s stated is that it doesn’t affect people who need subtitles for accessibility reasons. How so? I’m not sure. I think maybe the author of the poisoning may think that screen readers for videos just digitally process the visual data and read the visible text? If so that’s definitely not universally the case and probably isn’t common. The reality is that screen reader software implementations for videos may just actively read the srt regardless of whether it’s visually on screen.
It didn’t appear to me that really any of these types of shortcomings are addressed. On the face of it, if a screen reader could bypass the garbage data with the article/video’s poisoning approach then the only thing data miners would have to do is the same bypass during a preprocess phase. If such screen readers cannot bypass the garbage data then the approach shouldn’t be used because it makes accessibility that much more difficult. Either case would seem to make this approach a dead end in practice.
As a side note, I wish people on the Internet would stop implying others should spend 20 to 30 minutes watching a video accompanying a summary article on every article I read on every website. You are trying to distance yourself from the fact that you are implying this but you are indeed still implying it to the other commenter. The funny thing is that the video is mostly filler and has maybe 3 to 5 minutes of actual content similar to many other videos on YouTube designed to maximize ad revenue and algorithm views. Practically speaking, people have better things to do than spend large swaths of time watching the accompanying videos on every article they read. It shouldn’t be necessary for the context anyway. I don’t know why folks are surprised that someone wouldn’t do so.
Moreover, let’s suppose the author did address the issues with visual impairment in detail in the video. Why on earth wouldn’t you just say that, what they said that addressed it in the video, and what timestamp it was addressed at? Imagine writing an essay or paper arguing something without providing any evidence for said argument. That’s equivalent to writing a comment telling someone they are wrong without providing any details showing otherwise and telling someone to read the article or watch the video. You presumably put in the effort to watch the video so why is your comment as little effort as possible? Be kind and provide the details if you are going to say something. Don’t ask people to do all of the leg work themselves unnecessarily if you are making a claim.
This kind of behavior happens on summaries of scientific papers all of the time as well. People respond to others telling them that what they asked/said is addressed in the paper, as if it’s not reasonable that a person didn’t read a 4 to 10 page paper, and then don’t even bother to provide where it’s addressed or how. It’s just bad commenting behavior… Once again, be kind and provide the details if you are going to say something or make a claim.
Cool hack. I’m in love with that thumbnail.
Glad to hear that in future history books “ASS” will have its place in the strange saga of adversarial anti-AI arms races (that inevitably will fail)
Everything old is new again. Remember pre-Google when search engines worked off simple keyword matching of the content of a webpage, and people would do keyword stuffing in the site text, changing the text colour to background so it wouldn’t be obvious to the end user?
Yeah, it didn’t work great then either.
“mostly serve to burn tons of cash while turning the Internet into even more of a desolate wasteland than it was before” I’m disappointed. I expect this kind of sensationalist garbage from normie media desperately hunting for fear-induced clicks, not Hackaday. We’ve all seen every tool ever made used for good as well as evil, and I’d expect the editors to know better than most that blame as well as merit should be given to the users, not the tools themselves.
Maybe they’re equating AI with cryptocurrency. An easy mistake to make.
What about the .bho (b*** hole) extension? Or .dhd? (d*** head) .cnt .bch .lsr .wkr .gth .stfu .esad
“Butt” is now a word that needs censorship? Did you know that ass is also a synonym for donkey?
And what exactly is your point anyway?
I think it’s hilarious that people really think this stuff is AI. What a trendy buzzword. The algorithms being used are less intelligent than a game AI from twenty years ago. There’s nothing “generative” about it.
Haven’t read the article, only watched her video, but she’s very specifically talking about people scraping her subtitles, then using GenAI to rewrite her script. That’s very much generative.
This seems good in theory but what about people who are both hearing impaired and vision impaired? I can’t say for certain, but maybe they would use software that transcribed the subtitles into Braille? Would this software then get confused?