Pixelating Text Not A Good Idea

February 23, 2022

People have gotten much savvier about computer security in the last decade or so. Most know that sending a document with sensitive information in it is a no-no, so many try to redact documents, with varying levels of success. A common strategy is to replace text with a black box, but you sometimes see sophisticated users pixelate the part of an image or document they want to keep private. If you do this for text, be careful. It is possible to unredact pixelated text with software.

It appears that the algorithm is pretty straightforward. It simply guesses letters, pixelates them, and checks whether the result matches the redacted image. You do have to estimate the size of the pixelation blocks, but that’s usually not very hard to do. The code is written in TypeScript, and while the process does require a little manual preparation, nothing seems very difficult or impossible to automate if you were sufficiently motivated.
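To make the guess, pixelate, and compare loop concrete, here is a minimal Python sketch. It is not the TypeScript tool’s actual code. The 3×5 digit font, the `render`, `pixelate`, and `recover` helpers, the block size of 3, and the assumption that the secret’s length is already known are all made up for illustration. A real attack would also have to match the document’s font, offset, and pixelation method.

```python
# Toy illustration only: a hypothetical 3x5 bitmap font for digits (1 = ink).
# A real attack would have to render the document's actual font.
GLYPHS = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}
CHAR_W = 4  # 3 glyph columns plus 1 blank spacing column
HEIGHT = 6  # 5 glyph rows plus 1 blank row


def render(text):
    """Draw text as a list of pixel rows, each a list of 0/1 values."""
    rows = [[] for _ in range(HEIGHT)]
    for ch in text:
        glyph = GLYPHS[ch]
        for y in range(HEIGHT):
            bits = glyph[y] if y < len(glyph) else "000"
            rows[y].extend(int(b) for b in bits)
            rows[y].append(0)  # spacing column after each character
    return rows


def pixelate(image, block):
    """Replace each block x block tile with the sum of its pixels.

    Sums are used instead of averages so comparisons stay exact integers.
    """
    height, width = len(image), len(image[0])
    return [
        [
            sum(
                image[y][x]
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            )
            for bx in range(0, width, block)
        ]
        for by in range(0, height, block)
    ]


def recover(target, length, block):
    """Return every string of `length` digits whose pixelation equals target."""
    matches = []

    def extend(prefix):
        if len(prefix) == length:
            if pixelate(render(prefix), block) == target:
                matches.append(prefix)
            return
        for ch in GLYPHS:
            guess = prefix + ch
            # Only block columns fully covered by the guess can be checked yet.
            done = (len(guess) * CHAR_W) // block
            blocks = pixelate(render(guess), block)
            if all(row[:done] == want[:done] for row, want in zip(blocks, target)):
                extend(guess)  # prefix is still consistent, so keep guessing

    extend("")
    return matches


if __name__ == "__main__":
    secret = "4071"
    redacted = pixelate(render(secret), block=3)
    print(recover(redacted, length=len(secret), block=3))
```

With a font this coarse, more than one string may pixelate to the same blocks, so the sketch returns every match instead of a single answer. Pruning on the block columns already covered by the guess is what keeps the search from trying every possible string.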

You don’t see it as often as you used to, but there have been a slew of legal and government scandals where someone redacted a document by putting a black box over text in a PDF, so it was hidden when printed but the text was still in the document. Older word processors often didn’t really delete text, either, if you knew how to look at the files. The Facebook valuation comes to mind. Not to mention that the National Legal and Policy Center was stung by poor redaction techniques.

64 thoughts on “Pixelating Text Not A Good Idea”

Miles says:

February 23, 2022 at 10:01 pm

If an approach this simple works, throwing some AI at it means pixelating is double plus non-good for redaction.

Also, cue CSI meme: “Enlarge”

1. Fungus says:
  
  February 23, 2022 at 10:20 pm
  
  You mean “Enhance”
  
  1. DainBramage says:
    
    February 24, 2022 at 7:31 am
    
    I thought that was from Star Trek?
    
    1. Perry says:
      
      February 24, 2022 at 12:01 pm
      
      Super Troopers is the most popular current reference.
      
      1. Moonpie says:
        
        February 26, 2022 at 7:42 am
        
        Just print the damn thing!
        
    2. Endless says:
      
      February 26, 2022 at 4:28 pm
      
      Blade Runner
      
2. Andy B says:
  
  February 25, 2022 at 7:04 am
  
  That’s double plus UN good. In to the memory hole with it all. The past never had been altered. And Eurasia was at war with Eastasia. Eurasian had always been at war with Eastasia.
  
  1. Jim says:
    
    February 27, 2022 at 11:42 am
    
    Russian bot with bad AI
    
    1. Asher says:
      
      February 28, 2022 at 5:43 pm
      
      It’s a reference to the book 1894 but okay
      
      1. Xeno Shvetsario says:
        
        March 1, 2022 at 1:53 am
        
        Lol “1894” you mean 1984? Or is this from the multiverse?
        
  2. Anthony Larson says:
    
    February 28, 2022 at 6:21 pm
    
    Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
    
Dude says:

February 23, 2022 at 10:21 pm

It’s such a fragile approach. Even the author says “the whole thing really depends on being able to correctly replicate the redacted characters.” which means anything can go wrong.

1. A7 says:
  
  February 23, 2022 at 11:00 pm
  
  So what? An exploit like this only needs to happen once, for the right target, under the right circumstances.
  
  1. Dude says:
    
    February 24, 2022 at 1:04 pm
    
    In any realistic scenario, where you’re looking at a photo of some blurred characters, it’s a cold day in hell before you even guess the font right.
    
    1. plab says:
      
      February 24, 2022 at 2:06 pm
      
      Except that presumably there is unredacted text to go along with the redacted text
      
      1. Matias says:
        
        February 25, 2022 at 9:53 am
        
        This. You can get all the info you want from the other text, and maybe there is some kind of adjustable threshold to set how close to the original the output has to be (for antialiasing, slightly different x/y position, whatever)
        
      2. Dude says:
        
        February 25, 2022 at 12:04 pm
        
        Then you have to guess which blur algorithm was used.
        
      3. Foxy Bill says:
        
        February 26, 2022 at 7:16 am
        
        Yes, this. Also, even if you change the font for redacted text, this is one computer using brute force. You could be smarter on the brute force with a rainbow table equivalent (except more effective because this wouldn’t be all or nothing; someone could get feedback on a partial answer). You could throw multiple computers at the problem. You could train a neural net and invest upfront time on learning for fast inference on multiple fonts. You could use a language faster than TypeScript and greedily guess the first letter on a font-by-font basis and back out if you didn’t get high alignment. This PoC is incredibly damning of this blurring practice
        
    2. zeel says:
      
      February 27, 2022 at 7:40 am
      
      It’s not that hard to guess the font metrics. And you don’t need a complete reveal, just enough for a human to be able to guess the words.
      
2. wiebel says:
  
  February 23, 2022 at 11:01 pm
  
  Unscrambling passwords does not have to be robust to be a threat. The scrambling part is the one that must not be fragile.
  
3. WereCatf says:
  
  February 23, 2022 at 11:05 pm
  
  Most people, even in this day and age, tend to use easily recognizable words. Even with a couple of errors in the unredacted output, a human could probably guess the correct one.
  
4. bm says:
  
  February 25, 2022 at 7:36 am
  
  But you probably don’t even need all the characters to ascertain the message and context, though.
  
e says:

February 23, 2022 at 11:57 pm

this is why I always convert my text to cuneiform before pixelating.

1. mathman says:
  
  February 24, 2022 at 10:54 am
  
  XD
  
2. Daniel Scott Matthews says:
  
  February 24, 2022 at 1:01 pm
  
  𒀱 would like that.
  
Alexander Wikström says:

February 24, 2022 at 12:29 am

To be fair, there are many ways to “pixelate” text. Quantizing the pixels into larger pixels is just one approach. One can also first scramble the original pixels a bit, before pixelating it.

Or, the easiest solution: one changes the characters to different characters first. Since the recipient shouldn’t know the original text, they have no reason to really care about the pixelated text.

Even using black bars to censor text isn’t without flaws. One still knows the length of the word/phrase used, and that alone leaks some information. (Though this is starting to go into TEMPEST territory.)

1. JohnU says:
  
  February 24, 2022 at 1:17 am
  
  Surely the easiest and least computationally intensive method is to put a plain black box over them?
  
  1. Dan says:
    
    February 24, 2022 at 1:57 am
    
    As Alexander said, the length of a text can give away the content, especially where there’s a small set of likely options.
    
    1. Alexander Wikström says:
      
      February 24, 2022 at 2:29 am
      
      Sometimes, even the knowledge of the message existing can be enough to leak vital information.
      
      That is one reason why some more security-oriented organizations send “empty” documents internally.
      
      One example is an embassy just sending a briefcase with content that in itself is pointless. But by sending it one can greatly blur when actual data is being sent. If transactions only happen when things of importance happen, then it is easy to conclude that whatever is happening is important enough to converse about. It is easier to hide these patterns if one always sends something at certain intervals.
      
      The same applies to trivial things like web traffic between servers in an organization. (And here the timing of transactions can leak even more data compared to sneakernetting documents.)
      
      But information security is a field where the question of “can it be decrypted?” shouldn’t be asked, but rather, “how long until they/someone find(s) out?” is the more important question, that hopefully has the answer “long enough.”
      
      1. cliff claven says:
        
        February 24, 2022 at 6:12 am
        
        See https://www.mattblaze.org/blog/neinnines for example (or, better, the referenced source: Strzok’s _Compromised_)
        
      2. Kyle says:
        
        February 24, 2022 at 8:18 am
        
        Isn’t that a theory behind numbers stations? That they were always broadcasting garbage but when necessary you could transmit encrypted messages that naively resembled the garbage
        
      3. Alexander Wikström says:
        
        February 24, 2022 at 10:58 am
        
        Kyle
        Some number stations are very easy to know when they send actual messages, and when they send garbage.
        
        Since some just play music, or noise when “offline”, and have someone reading out numbers or even words when “online”. Making a very stark contrast between the two modes of operation.
        
        However, most people who are meant to receive information from a number station will listen at a pre-specified time. So the broadcasts tend not to be reactive, and one can’t glean as much from the timing of a message.
        
        Though, we don’t really know how many of the “messages” are actual messages. I wouldn’t be surprised if some are just garbage to fill the void.
        
      4. DJ says:
        
        February 25, 2022 at 5:16 am
        
        Numbers stations are not sending an encoded message directly. The people who are expecting the messages know the schedule and also know the key. The numbers could be referring to pages/chapters/words on a line from a specific book that they know to use, etc. There is not likely a directly encoded message in those numbers.
        
  2. Karl Emmanuel Sanchez says:
    
    March 13, 2022 at 3:09 pm
    
    Best method: put in your text, cover the sensitive data with a text box containing things like “F* you I won’t give my f*ing password” and pixelate that. At least it will make the hacker burn some time…
    
Janez+D. says:

February 24, 2022 at 1:39 am

It would be a much better demo if the text said “No more secrets”

Blunder Blender says:

February 24, 2022 at 2:51 am

Does it work with porn?

1. Elliot Williams says:
  
  February 24, 2022 at 5:27 am
  
  Read the OP’s writeup, and you’ll find out!
  
2. Matt says:
  
  February 26, 2022 at 6:41 am
  
  There was something a couple of years ago that could supposedly unblur a face. The pixelization would change as the face moved, and it could guess smaller and smaller features. Can’t seem to find anything about it anymore.
  
Hassi says:

February 24, 2022 at 5:24 am

Why bother pixelating stuff anymore? just flat out put a black bar on top and call it a day.

Twisty Plastic says:

February 24, 2022 at 5:58 am

This brings to mind a more fun way to redact text. Paste other pixelated text over it!

It could be false information, made to look correct but actually harmful to the competitor/enemy/opponent who thinks they are being sneaky by decoding it. It could be nonsense text. Or it could even just be insults.

So much more fun than a plain black box.

1. Foldi-One says:
  
  February 24, 2022 at 7:21 am
  
  Hehehe, I like it, and you can make it easier or harder to unscramble depending on just how ‘important’ the document seems to be – so they will spend all those hours of computer time to find ‘Erm, reading others mail is rude buddy, please bugger off…’ just because that document seemed like it was worth it…
  
  Really got to ask the question these days though of why redact at all? If it’s digital, just having a character you use to represent a cut is enough; it doesn’t need to be in a printed-page format that gives all these hints as to the content just by its length… And if you really are putting it down on paper, or the layout matters for some reason, yet you actually need to redact it in some way, you have to ask why. Can’t you find a better method to transfer the more public elements separately from the sensitive ones?
  
2. Hirudinea says:
  
  February 24, 2022 at 4:14 pm
  
  Yep, I can just imagine the NSA guy saying “What the hell does ‘Be sure to drink your Ovaltine’ mean!?”
  
  1. BrightBlueJim says:
    
    February 25, 2022 at 12:05 am
    
    Beat me to it!
    
anachronda says:

February 24, 2022 at 5:59 am

back in the day, my university had a burroughs mainframe and a bunch of printing terminals. when you logged in, the password was echoed. to obscure this, the mainframe would overprint with a sequence of xes and asterisks and other characters that put a lot of ink on the paper. some of the poor students hanging around in the terminal room were pretty good at seeing through the overprinting to extract the password.

Nathan says:

February 24, 2022 at 6:44 am

This is why you take a PDF, screen shot it, use MS Paint (or equivalent) to overlay black boxes, then ‘print’ the image back to a PDF document.

1. DainBramage says:
  
  February 24, 2022 at 7:36 am
  
  You beat me to it.
  It’s pretty hard to extract text from a black box in a JPEG.
  
2. Hirudinea says:
  
  February 24, 2022 at 4:16 pm
  
  Screen shot, edit image, print out image, scan image and send scan.
  
  1. Bob says:
    
    February 25, 2022 at 5:06 am
    
    For the love of cod DON’T. Especially not if you’re simply being overly paranoid about trivial content. I’ve been making a number of public records requests recently and some of the local governments have peculiar sensitivities and definitions of “personal” information, despite the documents being completely legible if I was able to view them in person. Instead I get charged a quarter for the paper they use to print them out on some splotchy printer, and then receive a 2-bit scan of the result as a multi-page TIFF. They’ve complied with the letter of the request, but not the spirit, as the result is completely unreadable.
    
    1. Stu Pidaso says:
      
      February 25, 2022 at 8:31 pm
      
      If I were being paid by a 3-letter org I would delete and rewrite these sensitive messages prior to pixelation and public dissemination. You know they already decrypt far more difficult puzzles. False messages would be well planted in such a fashion….
      
socksbot says:

February 24, 2022 at 8:41 am

Seriously, just blot it out. Pixellate when you want to give others a hint.

Gravis says:

February 24, 2022 at 9:05 am

This isn’t terribly new as something similar was presented over a year ago: https://hackaday.com/2020/12/18/this-week-in-security-solarwinds-and-fireeye-wordpress-ddos-and-enhance/
Depix: https://github.com/beurtschipper/Depix

SteveS says:

February 24, 2022 at 9:51 am

I once got a whole pile of “redacted” Word documents related to a bid we were working on for a company that was way too into its “proprietary” data.

It turned out that the genius lawyers at the sending end had just generated all their copious black boxes by selecting the text and changing the background color to black.

After I discovered that it took all of two seconds to unmask each “double-extra-super-secret” document.

snarkysparky says:

February 24, 2022 at 1:16 pm

pixelate.. fine. but definitely salt the pixels with random numbers.

1. Sabás says:
  
  February 26, 2022 at 10:58 am
  
  I thought the same: something like new, small QR codes as characters, with a high-security private and public key to encode/decode the text using a zero-knowledge proof method or something. That would be too heavy and intense, but if someone really wants that low level of security on printed or digital documents, they will have the resources to do so.
  
2. the Keyboard Kid says:
  
  February 26, 2022 at 12:29 pm
  
  or even have the text pixelisation use a random lorem ipsum section for the pixels (or other “random” word fillers, or even replace the letters with random numbers, possibly even have the pixelization choose a random font for the text to be pixelized).
  
  even with random numbers inserted in the pixelization you could still potentially un-pixelate the text and filter out the numbers.
  replacing the text with random noise will mostly guarantee that someone can’t extract data from a redacted section.
  
  and you could have the algorithm used to pixelate the text require 2-factor authentication to unredact the text (i.e. pull from an encrypted unredacted file); otherwise the document will just show the random noise when you try to unredact the text.
  
BrightBlueJim says:

February 25, 2022 at 12:12 am

Since we are ALL familiar with the pixelated picture of Abe Lincoln, and how easily it is recognized, it’s amazing that anyone, anywhere, would ever think that this would be a robust way to obscure text. Human neural nets are quite good at extracting data at extremely low signal-to-noise ratios.

Joseph Eoff says:

February 25, 2022 at 5:40 am

Ask this guy how well pixellation worked:

https://en.wikipedia.org/wiki/Christopher_Paul_Neil

They untwirled his swirled pixels, tracked him down, and put him in jail.

Sometimes you’ve got to be glad that people don’t understand the technology they are using.

Yoinx says:

February 25, 2022 at 9:36 am

I’ll see your single pixelation un-pixelater. Now I’ll pixelate my picketed text, then pixelate that pixelated text.

Your move.

Christopher Paul Neil says:

February 26, 2022 at 6:33 am

Just don’t write things down you’d need to redact.

Md says:

February 26, 2022 at 12:28 pm

That’s why I always use Gaussian blur with a large radius when redacting something.

1. bcsj says:
  
  March 1, 2022 at 5:56 am
  
  You might be surprised what can be recovered from even extremely blurred text. Check out last year’s Helsinki Deblur Challenge. At the highest difficulty levels you would hardly guess that the picture actually contains text if you weren’t told.
  
Chris walker says:

February 28, 2022 at 8:55 am

Good thing I use the Wingdings font for all my important communication… I pixelate that…

(Ok I don’t do this…but it would throw a wrench at hackers (after pixelation) and the people you intend to read it both)

Jacob says:

March 1, 2022 at 9:41 am

This is why you should just write all personal information in Wingdings and then blur it. No one’s guessing that shit without brute force.

Atharv says:

April 1, 2026 at 8:56 am

I think pixelation with a larger block size will work.


Hackaday
