Preventing AI Plagiarism With .ASS Subtitling

Around two years ago, the world was inundated with news about how generative AI or large language models would revolutionize everything. At the time it was easy to get caught up in the hype, but in the intervening months these tools have done little in the way of productive work outside of a few edge cases, and mostly serve to burn tons of cash while turning the Internet into even more of a desolate wasteland than it was before. They do this largely by regurgitating human creations like text, audio, and video into inferior simulacra and, if you still want to exist on the Internet, there’s basically nothing you can do to prevent this sort of plagiarism. Except feed the AI models garbage data, as this YouTuber has started doing.

At least as far as YouTube is concerned, the worst offenders of AI plagiarism work by downloading a video’s subtitles, passing them through some sort of AI model, and then generating another YouTube video based on the original creator’s work. Most subtitle files are the fairly straightforward .srt filetype, which only allows for timing and text information. But a more obscure subtitle filetype known as Advanced SubStation Alpha, or .ass, allows for all kinds of subtitle customization like orientation, formatting, font types, colors, shadowing, and many others. YouTuber [f4mi] realized that this subtitle system could be used to stuff extra garbage text into the subtitle file while keeping it out of view of the video itself, either by placing the text outside the viewable area or by increasing its transparency. So when an AI crawler downloads the subtitle file, it can’t distinguish the real subtitles from the garbage placed alongside them.
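To get a feel for the trick (this is our own rough sketch, not [f4mi]’s actual scripts), here are a few lines of Python that write an .ass file containing one ordinary caption plus a decoy line rendered fully transparent and parked far off-screen, so a viewer never sees it but a scraper reading the raw file will:

```python
# Minimal sketch: one real caption plus one hidden decoy line in an .ass file.
HEADER = """[Script Info]
ScriptType: v4.00+
PlayResX: 1280
PlayResY: 720

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,48,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,0,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""

def dialogue(start, end, text, hidden=False):
    # {\alpha&HFF&} makes the line fully transparent; \pos() parks it far off-screen.
    tags = r"{\alpha&HFF&\pos(-5000,-5000)}" if hidden else ""
    return f"Dialogue: 0,{start},{end},Default,,0,0,0,,{tags}{text}\n"

with open("poisoned.ass", "w", encoding="utf-8") as f:   # output name is just a placeholder
    f.write(HEADER)
    f.write(dialogue("0:00:01.00", "0:00:04.00", "The caption viewers actually see."))
    f.write(dialogue("0:00:01.00", "0:00:04.00",
                     "Decoy text only a scraper will ever read.", hidden=True))
```

Any player that honors ASS override tags hides the decoy, while a crawler that simply strips the Dialogue lines for text picks up both.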

[f4mi] created a few scripts to do this automatically so that it doesn’t have to be done by hand for each video. The trick also doesn’t impact the actual subtitles on the screen for people who need them for accessibility reasons. It’s a great way to “poison” AI models and make it at least harder for them to rip off the creations of original artists, and [f4mi]’s tests show that it does work. We’ve actually seen a similar method for poisoning data sets used for emails long ago, back when we were all collectively much more concerned about groups like the NSA using automated snooping tools on our emails than we were that machines were going to steal our creative endeavors.

Thanks to [www2] for the tip!

Continue reading “Preventing AI Plagiarism With .ASS Subtitling”

Live Subtitles For Your Life

Personal head-up displays are a technology whose time ought by now to have come, but which, notwithstanding attempts such as the Google Glass, has steadfastly refused to catch on. There’s an intriguing possibility in [Basel Saleh]’s CaptionIt project though, a head-up display that provides captions for everyday situations.

The hardware is a tiny I²C OLED screen with a reflector and a 3D-printed mount attached to a pair of glasses, and it’s claimed to work with almost any ARM v7 SBC, including more recent Raspberry Pi boards. It uses the Vosk speech recognition toolkit to read audio from a USB audio device, with the resulting text being displayed on the screen.
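For a flavor of how little glue code this takes (CaptionIt has its own implementation; the snippet below is a minimal sketch with an assumed model path and a 16 kHz sample rate), here’s Vosk plus the sounddevice library streaming microphone audio into a recognizer and printing the recognized text, which the real device would push to the OLED instead:

```python
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

audio_q = queue.Queue()

def callback(indata, frames, time, status):
    # Push raw 16-bit mono audio from the USB microphone into the queue.
    audio_q.put(bytes(indata))

model = Model("model")              # assumed path to a downloaded Vosk model directory
rec = KaldiRecognizer(model, 16000)

with sd.RawInputStream(samplerate=16000, blocksize=8000, dtype="int16",
                       channels=1, callback=callback):
    while True:                      # Ctrl-C to stop
        data = audio_q.get()
        if rec.AcceptWaveform(data):
            text = json.loads(rec.Result()).get("text", "")
            if text:
                print(text)          # the glasses would draw this on the OLED instead
```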

The device is shown in action in the video below the break, and without trying it ourselves we can’t comment on its utility, but aside from the novelty we can see it could have a significant impact as an accessibility aid. But it’s as an electronic Babel fish coupled with translation software that we’d like to see it develop, so that inadvertent but hilarious international misunderstandings can be shared by all.

Regular readers will know that we’ve brought you plenty of HUD tomfoolery in the past.

Continue reading “Live Subtitles For Your Life”

History Of Closed Captions: The Analog Era

Closed captioning on television and subtitles on DVD, Blu-ray, and streaming media are taken for granted today. But it wasn’t always so. In fact, it was quite a struggle for captioning to become commonplace. Back in the early 2000s, I unexpectedly found myself involved in a variety of closed captioning projects, both designing hardware and consulting with engineering teams at various consumer electronics manufacturers. I may have been the last engineer working with analog captioning as everyone else moved on to digital.

But before digging in, let’s clear up some of the confusing and imprecise language floating around on this topic and establish a few definitions. I often use the word captioning, which encompasses both closed captions and subtitles:

Closed Captions: Transmitted in a non-visible manner as textual data. Usually they can be enabled or disabled by the user. In the NTSC system this is often referred to as Line 21, since the data was transmitted on video line number 21 in the Vertical Blanking Interval (VBI); see the small decoding sketch after this list.
Subtitles: Rendered in a graphical format and overlaid onto the video / film. Usually they cannot be turned off. Also called open or hard captions.
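To give a sense of just how lean the Line 21 channel is: each video field carries a two-byte pair, and each byte packs seven data bits plus an odd-parity bit. A minimal parity-stripping sketch in Python (the example byte pair is ours, purely for illustration):

```python
def strip_odd_parity(byte):
    """Return the 7-bit Line 21 value if the odd-parity bit checks out, else None."""
    if bin(byte).count("1") % 2 == 1:   # Line 21 bytes use odd parity
        return byte & 0x7F
    return None

# One field's worth of data: a two-byte pair, here the characters 'H' and 'i'
# with their parity bits already set (0x48 -> 0xC8, 0x69 -> 0xE9).
pair = (0xC8, 0xE9)
print("".join(chr(strip_odd_parity(b)) for b in pair))   # prints "Hi"
```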

The text contained in captions generally falls into one of three categories. Pure dialogue (nothing more) is often the style of captioning you see in subtitles on a DVD or Blu-ray. Ordinary captioning includes the dialogue, but with the addition of occasional cues for music or a non-visible event (a doorbell ringing, for example). Finally, “Subtitles for the Deaf or Hard-of-hearing” (SDH) is a more verbose style that adds even more descriptive information about the program, including the speaker’s name, off-camera events, etc.

Roughly speaking, closed captions target the deaf and hard of hearing audience, while subtitles target an audience who can hear the program but want to view the dialogue for some reason, like understanding a foreign movie or learning a new language.

Continue reading “History Of Closed Captions: The Analog Era”

[Image: sample of automatically generated comics]

Read Your Movies As Automatically Generated Comic Books

A research paper from Dalian University of Technology in China and City University of Hong Kong (direct PDF link) outlines a system that automatically generates comic books from videos. But how can an algorithm boil down video scenes to appropriately reflect the gravity of the scene in a still image? This impressive feat is accomplished by saving two still images per second, then segmenting the frames into scenes through analysis of region-of-interest and importance ranking.
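The paper has its own selection pipeline, but the first step is easy to picture; here’s roughly what sampling two stills per second looks like with OpenCV in Python (the file name is just a placeholder):

```python
import cv2

cap = cv2.VideoCapture("movie.mp4")          # placeholder path
fps = cap.get(cv2.CAP_PROP_FPS) or 25        # fall back if the container lacks FPS metadata
step = max(int(round(fps / 2)), 1)           # keep two frames per second

frames, idx = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % step == 0:
        frames.append(frame)                 # candidate keyframes for later ranking
    idx += 1
cap.release()
print(f"sampled {len(frames)} candidate frames")
```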

[Image: movie to comic book pipeline diagram]

For its next trick, speech for each scene is processed by combining subtitle information with the audio track of the video. The audio is analyzed for emotion to determine the appropriate speech bubble type and the size of the subtitle text. Frames are even analyzed to establish which person is speaking for proper placement of the bubbles. The system then creates layouts of the keyframes, determining panel sizes for each page based on the region-of-interest analysis.
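The paper’s speech processing goes well beyond this, but the subtitle half starts with something like the sketch below: parse the .srt cues, then look up whichever cue is active at a given keyframe’s timestamp (the file layout assumptions and helper names are ours, not the paper’s):

```python
import re
from datetime import timedelta

TS = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def parse_ts(stamp):
    # Turn "00:01:02,500" into seconds as a float.
    h, m, s, ms = map(int, TS.match(stamp.strip()).groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms).total_seconds()

def parse_srt(path):
    """Yield (start_s, end_s, text) cues from a plain .srt file."""
    with open(path, encoding="utf-8") as f:
        for block in f.read().strip().split("\n\n"):
            lines = block.splitlines()
            if len(lines) < 3:
                continue
            start, end = (parse_ts(p) for p in lines[1].split("-->"))
            yield start, end, " ".join(lines[2:])

def cue_for_frame(cues, t_seconds):
    """Return the subtitle text active at a keyframe's timestamp, if any."""
    for start, end, text in cues:
        if start <= t_seconds <= end:
            return text
    return None

cues = list(parse_srt("movie.srt"))          # placeholder path
print(cue_for_frame(cues, 12.5))             # dialogue to drop into that frame's bubble
```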

The process is completed by stylizing the keyframes with flat color through quantization, for that classic cel shading look, and then populating the layouts with each frame and word balloon.
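The paper doesn’t spell out its exact quantization scheme, but the classic way to get that flat, cel-shaded palette is k-means over a frame’s pixels, which looks something like this with OpenCV:

```python
import cv2
import numpy as np

def quantize(frame, k=8):
    """Snap every pixel to one of k dominant colors for a flat, cel-shaded look."""
    pixels = frame.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
    return centers.astype(np.uint8)[labels.flatten()].reshape(frame.shape)

styled = quantize(cv2.imread("keyframe.png"))   # placeholder image path
cv2.imwrite("keyframe_flat.png", styled)
```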

The team conducted a study with 40 users, pitting their results against previous techniques that require more human intervention, and still besting them in every measure. Like any great superhero, the team still sees room for improvement. In the future, they would like to improve the accuracy of keyframe selection and propose using a neural network to do so.

Thanks to [Qes] for the tip!