As visionary as he was, [George Carlin] vastly underestimated the situation with his classic “Seven Words You Can’t Say on TV” bit. Judging by [Ben Eater]’s reverse engineering of the “TVGuardian Foul Language Filter” device, the actual number is at least 20 times that.
To begin at the beginning, a couple of weeks ago [Alec] over at everyone’s favorite nerd hangout Technology Connections did a video on the TVGuardian, a device that attempted to clean up the language of live TV and recorded programming. Go watch that video for the details, but for a brief summary, TVGuardian worked by scanning the closed caption text for naughty words and phrases, muting the audio when something suggestive was found in a lookup table, and inserting a closed caption substitute for the offensive content. In his video, [Alec] pined for a way to look at the list of verboten words, and [Ben] accepted the challenge.
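To make that scan, mute, and substitute loop concrete, here is a minimal Python sketch. It is only an illustration of the idea as summarized above, not the device's firmware: the function name `filter_caption` and the two-entry table are assumptions made up for the example (the two substitutions are the ones that come up later in this post).

```python
import re

# Hypothetical stand-in for the device's lookup table; the real list lives
# in the EEPROM and is far longer.
SUBSTITUTIONS = {"dick": "jerk", "dyke": "gay"}

def filter_caption(caption):
    """Return (mute, cleaned_caption) for one line of caption text."""
    mute = False

    def swap(match):
        nonlocal mute
        word = match.group(0)
        clean = SUBSTITUTIONS.get(word.lower())
        if clean is None:
            return word          # not in the table: leave the word alone
        mute = True              # a hit means the audio should be muted
        # Captions are often all caps, so keep the replacement in caps too.
        return clean.upper() if word.isupper() else clean

    cleaned = re.sub(r"[A-Za-z']+", swap, caption)
    return mute, cleaned
```

Calling `filter_caption` on a caption line gives back both the mute decision and the text to display in place of the original.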
The naughty word list ended up living on a 93LC86 serial EEPROM, which [Ben] removed from his TVGuardian for further exploration. Rather than just plugging it into a programmer and dumping the contents, he decided to roll his own decoder with an Arduino, because that’s more fun. And can we just point out our ongoing amazement that [Ben] is able to make watching someone else code interesting?
The resulting NSFW word list is titillating, of course, and the video would be plenty satisfying if that’s where it ended. But [Ben] went further and figured out how the list is organized, how the dirty-to-clean substitutions are made, and even how certain words are whitelisted. That last bit resulted in the revelation that Hollywood legend [Dick Van Dyke] gets a special whitelisting, lest his name become sanitized to a hilarious [Jerk Van Gay].
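One plausible way to picture the whitelisting is to mark the spans covered by protected phrases first and skip substitutions inside them. The Python sketch below does that; it is a guess at the general approach, not a description of how the TVGuardian stores or applies its whitelist, and `sanitize`, `WHITELIST`, and the tiny table are hypothetical names used only for illustration.

```python
import re

# Hypothetical table and whitelist, using only the example from this post.
SUBSTITUTIONS = {"dick": "jerk", "dyke": "gay"}
WHITELIST = ["dick van dyke"]

def sanitize(caption, whitelist=WHITELIST):
    """Substitute listed words unless they fall inside a whitelisted phrase."""
    # Record the character spans covered by whitelisted phrases.
    protected = []
    for phrase in whitelist:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, caption, re.IGNORECASE):
            protected.append((m.start(), m.end()))

    def swap(m):
        # Leave the word untouched if it sits inside a protected span.
        if any(start <= m.start() and m.end() <= end for start, end in protected):
            return m.group(0)
        return SUBSTITUTIONS.get(m.group(0).lower(), m.group(0))

    return re.sub(r"[A-Za-z]+", swap, caption)
```

With the whitelist in place the name passes through unchanged; passing `whitelist=[]` shows what would happen without it.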
Hats off to [Alec] for inspiring [Ben]’s fascinating reverse engineering effort here.
I found the spreadsheet work more impressive than the Arduino programming. I didn’t even know you could do stuff like that in a spreadsheet.
Ain’t that the truth! Some serious Excel-fu
https://en.m.wikipedia.org/wiki/Scunthorpe_problem
…not to mention Penistone… https://www.dw.com/en/austrian-village-of-fucking-decides-to-change-its-name/a-55740967
Once I raised a support ticket that we were not able to access Scunthorpe University Centre’s site at work.
Ah yes, clbuttic
*&%! That’s some $%&^#@’n impressive work right there. He’s one smart *&*&^% #@*^&
The standards for decency are so vague. Obviously there are differences between broadcast networks and cable shows, but then there seems to be late night where a lot more goes, and exceptions galore like certain movie channels are allowed to show and say a bunch more because … art I guess? Plus the hilarity of how Spanish language TV and radio were “anything goes” for a good long time. Not sure how it is now.
I personally love the dichotomy you get living in WNY, where my broadcast sources are split evenly between Buffalo/Rochester and Toronto (TV AND radio). The Canadian stations generally follow our “safe harbor” hours, but they don’t have to, and after sunset anything goes (within reason) – always fun to hear an uncensored song or watch a documentary and hear someone speak without a filter.
And for anyone curious, remember that all that crap only applies to what goes “over the air” – cable/satellite/streaming/any paid service is not part of those regulations. Period. There is no comparison between the two when they operate in different regulatory frameworks. Fox News and CNN are allowed to be polarized shitshows for the same reason HBO can show porn – the rules don’t apply if it’s not broadcast.
One gotcha with simple search-and-replace… you must check for word boundaries. A challenge I’m sure for the buttembly programmers out there.
https://thedailywtf.com/articles/The-Clbuttic-Mistake-
Which is why it’s not just simple search and replace. There’s an implicit word boundary match at the start of each entry, and the end has explicit codes for whether it has to match a word boundary or if a matching prefix is sufficient. There’s even a few special entries for articles to improve the grammar of the final result.
Nonetheless it still screws up quite regularly, since the problem has no simple solution.
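For anyone who wants to see the difference, here is a rough Python sketch of naive search-and-replace next to the scheme described above: an implicit word boundary at the start of every entry, plus a per-entry flag saying whether the end must also be a boundary or a prefix match is enough. The entries, the flag names, and both function names are made up for illustration; the device's actual encoding and word list are not reproduced here.

```python
import re

# Hypothetical end-of-entry codes and entries, for illustration only.
WHOLE_WORD, PREFIX = "whole", "prefix"
ENTRIES = [
    ("ass", "butt", WHOLE_WORD),   # must end on a word boundary
    ("damn", "darn", PREFIX),      # a matching prefix is sufficient
]

def naive_filter(text):
    """Plain substring replacement: the source of the 'clbuttic' mistake."""
    for word, clean, _ in ENTRIES:
        text = text.replace(word, clean)
    return text

def boundary_filter(text):
    """Replacement with an implicit start boundary and an explicit end rule."""
    for word, clean, end_mode in ENTRIES:
        pattern = r"\b" + re.escape(word)   # implicit boundary at the start
        if end_mode == WHOLE_WORD:
            pattern += r"\b"                # explicit boundary at the end
        text = re.sub(pattern, clean, text, flags=re.IGNORECASE)
    return text
```

The naive version mangles "classic" and "assembly", while the boundary-aware version leaves them alone and still catches "damned" through the prefix rule.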
What about reversing the product itself? Make it replace legit words with naughty words?
Some kind of “Gilles de la Tourette” apparatus :)
That would be such an amazing guerrilla radio / TV hack.
I’ll take eight! XD
It’s gotta have a catchy name, though.
Was there a special exemption for shiitake mushrooms?
Man, I remember the trouble I had on a forum trying to say I had a Matsushita CD-ROM one time. I finally settled on matsusheeta or something and still got a ban for “evading the swear filter” … nutcases running that place, and long gone, or I’d name and shame so anyone bored could go and troll them.
I actually watched this video to see if my favorite substitution came up.
Watching “Die Hard” on TBS: “Yippee Ki-yai-yay, Mister Falcon!”
I still use that curse to this day.
My German cousin said they replaced it with “pig ear” in the dubbed version. Lol
“You see what happens when you find a Stranger in the Alps!”
-The Big Lebowski
It took a minute to find.
Carlin did a much more complete list:
https://forum.multitheftauto.com/topic/5493-2443-dirty-words-by-george-carlin/
Another funny note is that here in Europe I sometimes see in kids’ shows the pre-censored (at production) word replaced in subtitles with the uncensored version.
Which brings up the annoying thing that the US all too often exports its hang-ups and doesn’t quite seem to get that there are other places where people view things differently, and not only do they force-feed that censorship (and such) but they even try to get lawmakers to join in.
Excuse the mini-rant.
You’ll never get the ‘diamond chips’ Cheech and Chong Simpson’s jokes without the broadcast TV version context.
Taking into account the lamentable state of the English-speaking World, I’m surprised that there aren’t different lists depending on political preferences of the head of the family.
When I worked at Digital Equipment Corp in the 80s, a story came through the internal feed that this reminded me of. The (then new) VAX/VMS machines had a password generator you could invoke. It gave a list of possible passwords, and you picked the one you wanted. But there had been a horror that it might produce a password that was naughty. So someone coded a list of naughty words to watch for. Because people looked through the compiled code, these were stored encrypted, but the list was there.
When the rest of Digital became aware of the list, it was pointed out that Digital had a global presence. What about other languages? So someone else made up a table with English words in the first column and translations to French, German, etc. in the others. AFAIK this table was never implemented, but the punch line to all this is the Swedish entry for ‘breast’. That was just an asterisk with an accompanying note that the Swedes find nothing naughty about breasts, bless them.