There’s more to some sentences than meets the eye, and we’re not talking about interpretive nonsense. Rather, some sentences may contain up to four paragraphs’ worth of hidden text, invisible to readers.
Thanks to Zero Width Obfuscation, it is possible to hide text using Zero Width Characters: Unicode characters that are invisible even when you try to highlight them. They’re normally used in writing systems that need joining or separating hints that don’t take up an entire space. In this case, they’re used to obfuscate and de-obfuscate hidden messages sent through text.
[inzerosight] published a browser extension that identifies, de-obfuscates, and obfuscates these messages for you on the web. It scans each page for the code points of the Zero Width Characters (U+FEFF, U+200C, U+200D, U+200E, U+2060, U+180E) and highlights where they’ve been spotted. The encoding replaces each character of the message with a permutation of two of the Zero Width Characters, essentially doing a find-and-replace across the text.
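To make the idea concrete, here’s a minimal Python sketch of the two halves: spotting the six code points listed above, and hiding a message in them. The hiding scheme is a stand-in, not inzerosight’s actual encoding: it assumes a simple binary scheme where each UTF-8 byte of the secret becomes eight bits, written as U+200C for 0 and U+200D for 1. The names find_zero_width, hide and reveal are hypothetical.

```python
import unicodedata

# The six code points the article lists; the real extension's list may differ.
ZERO_WIDTH = frozenset("\ufeff\u200c\u200d\u200e\u2060\u180e")

def find_zero_width(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every listed zero width character."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in ZERO_WIDTH]

# Hypothetical scheme (NOT inzerosight's encoding): one bit per invisible
# character, ZERO WIDTH NON-JOINER = 0, ZERO WIDTH JOINER = 1.
ZERO, ONE = "\u200c", "\u200d"

def hide(cover: str, secret: str) -> str:
    """Embed the secret's UTF-8 bits after the first visible character."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ONE if b == "1" else ZERO for b in bits)
    return cover[:1] + payload + cover[1:]

def reveal(text: str) -> str:
    """Collect the invisible bits, regroup them into bytes and decode."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")

if __name__ == "__main__":
    cover = "This sentence isn't just a sentence."
    stego = hide(cover, "hi")
    print(len(cover), len(stego))      # the hidden payload adds invisible length
    print(find_zero_width(stego)[:2])  # first two hits with their Unicode names
    print(reveal(stego))
```

Both strings print identically; only their lengths and the detector give the game away. Using just two characters keeps the sketch short at the cost of eight invisible characters per byte; a scheme using all six characters, as the extension does, can pack more per character.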
I’m just waiting to see how long it takes for Zero Width Obfuscation to become the next Konami Code Easter Egg.
It’s a nice topic, but…
Great, you’ve added zero width characters to the article title, but the extension doesn’t display them. It only indicates that some text is there, with no simple way to view it.
View them? Like looking at the page source?
(from the tag but with added spaces because I don’t know if the HaD comments will treat it as text or try to interpret it)
& #8203; & zwnj; & zwnj; & zwj; & #8203; & #8203; & #8203; & #8203; & #8203;
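For anyone who wants to see which code points those entities stand for, Python’s standard html.unescape decodes both the numeric and the named forms. This sketch assumes the spaces were added only to stop the comment system from interpreting the entities, so it removes them first.

```python
import html

# The entity list from the comment above, spaces included.
raw = "& #8203; & zwnj; & zwnj; & zwj; & #8203; & #8203; & #8203; & #8203; & #8203;"

# Undo the inserted spaces so each "& #8203;" becomes a real "&#8203;" entity.
entities = raw.replace("& ", "&").replace("; ", ";")
decoded = html.unescape(entities)

# Show the hidden characters as code points instead of invisible text.
print([f"U+{ord(ch):04X}" for ch in decoded])
```

Note that &#8203; is U+200B (ZERO WIDTH SPACE), which is not among the six code points the article lists.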
For Android phone users there’s no menu option; however, there is a URL prefix:
view-source:https://hackaday.com/2019/10/10/this-sentence-isnt-just-a-sentence/
Apart from [Sheldon]’s recommendation to view the code, you are viewing the characters. Finding a renderer to render the characters doesn’t ‘display anything’ because there’s ‘nothing’ to display. The characters themselves are ‘nothing’. It’s like trying to do a calculation on a NaN value: you don’t get a meaningful result because there’s nothing to calculate.
Zero width characters are also useful to de-emojify text that would otherwise change into a stupid icon, like listing (a), (b) [beer], (c).
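A minimal sketch of that trick: insert a zero width space after the first character of each sequence an app would turn into an icon. The trigger list here is hypothetical (it assumes a chat client that turns “(b)” into a beer icon); which sequences a given app converts, and whether a zero width space actually stops it, is something to check per app.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

def defuse(text: str, triggers: list[str]) -> str:
    """Insert a zero width space after the first character of each trigger."""
    for trigger in triggers:
        text = text.replace(trigger, trigger[0] + ZWSP + trigger[1:])
    return text

# Hypothetical trigger list for an app that converts "(b)" into a beer icon.
listing = defuse("(a), (b), (c)", ["(b)"])
print(listing)                # looks exactly like "(a), (b), (c)"
print("(b)" in listing)       # the literal trigger sequence is gone
```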
How long before the browser devs decide this is just a bug though?
That’s a great idea.
I already mirror my emoticons, to prevent some of this (-:
I use zero-width spaces to stop Google Calendar from interpreting numbers in an event’s description as a time of day, which it does even when that conflicts with the time I explicitly selected before entering the description.
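Here’s a minimal sketch of automating that, under an assumed rule: a number counts as “time-like” when it is directly followed by a colon or by am/pm. It inserts a zero width space right after such a digit. The pattern and the break_times name are hypothetical, and whether Google Calendar’s parser is actually fooled by this exact placement is untested here.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# Hypothetical rule: a digit followed by ":" or by "am"/"pm" (optionally
# "a.m."/"p.m.", optionally after one space) looks like a time of day.
TIME_LIKE = re.compile(r"(\d)(?=\s?(?::|[ap]\.?m\b))", re.IGNORECASE)

def break_times(text: str) -> str:
    """Insert a zero width space after each digit that starts a time marker."""
    return TIME_LIKE.sub(lambda m: m.group(1) + ZWSP, text)

print(break_times("Bring 3 amps and call at 3pm or 10:30"))
```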
I know a better one: steganographic text inside text (no headers).
https://web.archive.org/web/20101130203351/http://ukrywanieinformacji.appspot.com/
I prefer left-to-right marks.
I dedicated a whole hackaday.io project to it ;-)
https://hackaday.io/project/166833-%E2%80%8E
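The project name in that URL is percent-encoded UTF-8. A short Python check with the standard urllib.parse.unquote shows that it decodes to a single left-to-right mark.

```python
import unicodedata
from urllib.parse import unquote

slug = "%E2%80%8E"    # the part of the project URL after "166833-"
name = unquote(slug)  # percent-decodes the UTF-8 bytes E2 80 8E

print(len(name), f"U+{ord(name):04X}", unicodedata.name(name))
```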
You can detect these by selecting the first character and then extending the selection with your arrow keys. If the selection doesn’t visibly change when you press an arrow key, you’ve found a zero-width character.
That reminds me of military manuals with pages that have printed on them:
“This page is intentionally blank.”
But, they are NOT blank!
They have printing on them!
B^)
Right up there with “Keyboard error, press F1 to continue” or “This system does not support the installed processor.”
I have seen “this page intentionally left almost blank.”
It’d be interesting to see how often this happens in the wild.
Trying to deobfuscate the header gives me only ” the e”. What was it supposed to say?
Hmm. I wonder if this could be used maliciously, like hiding extra commands in something meant to be copy-pasted, or as a way to get an HTTPS cert for google.com with some hidden Unicode characters.
In HTML you can have hidden text that can still be selected and copied.
See this example: https://output.jsbin.com/qifolexiyi that I slapped together.
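On the receiving end, a minimal defensive sketch for the copy-paste worry is to flag and strip invisible format characters before trusting pasted text. This one drops every code point in Unicode general category Cf, which covers the characters discussed here. It is a blunt instrument: it also removes joiners that legitimate emoji sequences and some scripts rely on. It does not model the IDN rules browsers and certificate authorities apply to domain names. The function names are hypothetical.

```python
import unicodedata

def strip_format_chars(text: str) -> str:
    """Drop every code point in Unicode category Cf (format characters)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

def looks_spoofed(text: str) -> bool:
    """True if the text contains invisible format characters."""
    return strip_format_chars(text) != text

pasted = "goo\u200bgle.com"  # renders like "google.com"
print(pasted == "google.com")
print(looks_spoofed(pasted))
print(strip_format_chars(pasted) == "google.com")
```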
Bash, the C shell and the Bourne shell don’t treat a zero width space as a separator, so something like sh_zws_a1sum … doesn’t appear to work.
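The same thing shows up outside shells. This is Python rather than a shell, but its str.split() with no arguments splits on whitespace and likewise leaves a zero width space glued inside the word. That’s because U+200B is a format character (category Cf), not a whitespace character.

```python
line = "sh\u200ba1sum file.txt"  # "sh", ZERO WIDTH SPACE, "a1sum", then a real space

print(line.split())        # only the ordinary space separates words
print("\u200b".isspace())  # U+200B is not considered whitespace
```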
It could be weird if text-to-voice doesn’t ignore it.
FYI VoiceOver ignores them.
Twitter won’t accept it!
You have to use “limited twiT” in the version selector before obfuscating for Twitter.
Almost a lifetime ago in high school, we used these chars in the names of files we didn’t want people to access, delete or find on a computer system. Most of the time it was just using alt+0255 as a space replacement to mess with the IT teacher.