If only you could get your hands on the code to fix the broken features on your beloved electronic widget. But wait, hardware hackers have the skills to write their own firmware… as long as we can get the compiled binary into a format the hardware needs.
Luckily, we have Uri Shaked to walk us through that process. This workshop from the 2020 Hackaday Remoticon demonstrates how to decipher the encryption scheme used on the firmware binary of a 3D printer. Along the way, we learn about the tools and techniques that are useful for many encrypted binary deciphering adventures.
The origin story of this workshop began when Uri decided to become a backer of a 3D printer on Kickstarter that had okay hardware but rubbish firmware. This was the second time he had fallen for it, but the first time around someone had saved his bacon by writing custom firmware to make the thing run well. This time the community needed help reverse engineering the new binary format before they could run custom code, so Uri jumped into action.
He’s using a Colab notebook during the workshop to help everyone follow along as he runs a combination of Python and Linux shell commands. You can find the links for everything on the workshop guide, but to be honest there isn’t much in the way of secret sauce here. The true ingredient is ingenuity; a hex dumper to visualize the binary and Python to process it cover the bulk of the work.
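To give a feel for how little tooling is involved, here is a minimal sketch of a hex dump helper in Python. It is not Uri’s code; the `hexdump` name and the output layout are assumptions made for illustration.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII, one row per line."""
    pad = width * 3 - 1  # width of the hex column when a row is full
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  |{text}|")
    return "\n".join(lines)


# Made-up sample bytes, not taken from the printer firmware.
print(hexdump(b"Hello, printer!\x00\x01\x02\xff"))
```

Rows of identical characters in the right-hand column are often the first visual hint that a file is obfuscated rather than truly encrypted.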
Since the previous printer model’s firmware used a substitution cipher, Uri checks whether the new one might too by compressing the target binary. Robust encryption like AES-256 produces output that looks random and barely compresses, while a simple substitution preserves the repetition that compressors exploit. Since the bin file drops from 58 kB to 38 kB, there’s a good chance it’s just a substitution.
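The compression test is easy to reproduce. The sketch below uses Python’s `zlib` on made-up data rather than the real firmware, and the substitution table is a hypothetical random permutation; it only illustrates why a substituted file still shrinks while random-looking data does not.

```python
import os
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means more compressible."""
    return len(zlib.compress(data, 9)) / len(data)


# Stand-in "firmware": repetitive made-up data, not the real printer binary.
plain = b"G28 X0 Y0\nM104 S200\n" * 500

# Hypothetical substitution table: a fixed random permutation of 0..255.
table = list(range(256))
random.Random(0).shuffle(table)
substituted = plain.translate(bytes(table))

# Random bytes approximate what the output of strong encryption looks like.
random_like = os.urandom(len(plain))

print(f"plain:       {compression_ratio(plain):.3f}")
print(f"substituted: {compression_ratio(substituted):.3f}")
print(f"random-like: {compression_ratio(random_like):.3f}")
```

A ratio well below 1.0 on the real file, like the 58-to-38 drop Uri saw, points toward substitution or similar light obfuscation.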
Next he plots a histogram of how often each byte value occurs. It becomes obvious right away which value stands in for 0x00, as that will be the most frequently used value (think leading zeros on 32-bit numbers), as well as which one stands in for 0xFF, which usually pads out the ends of binaries. This is further confirmed by looking up the datasheet of the STM32 microcontroller to find that the vector table, which holds the reset vector, is always placed at the beginning of program memory. Since many values in this table are known, and the placement of the unknown values is specified, this turns out to be a huge key in getting the deciphering effort started.
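The frequency argument can be sketched in a few lines with `collections.Counter`. Everything below runs on synthetic data: the fake "vector table" of flash addresses, the padding length, and the `guess_zero_and_pad` helper are assumptions for illustration, not Uri’s notebook code.

```python
import random
import struct
from collections import Counter


def guess_zero_and_pad(cipher: bytes):
    """Guess which cipher bytes stand in for 0x00 and 0xFF.

    Assumes 0x00 is the most common plaintext byte and that the image
    ends in 0xFF padding; both are heuristics, not guarantees.
    """
    counts = Counter(cipher)
    zero_guess = counts.most_common(1)[0][0]
    pad_guess = cipher[-1]
    return zero_guess, pad_guess


# Synthetic plaintext: 200 little-endian 32-bit flash addresses, then padding.
words = b"".join(struct.pack("<I", 0x08000001 + 4 * i) for i in range(200))
plain = words + b"\xff" * 100

# Hypothetical substitution table: a fixed random permutation of 0..255.
table = list(range(256))
random.Random(1).shuffle(table)
cipher = plain.translate(bytes(table))

zero_guess, pad_guess = guess_zero_and_pad(cipher)
print(f"0x00 is probably encoded as 0x{zero_guess:02x}")
print(f"0xFF is probably encoded as 0x{pad_guess:02x}")
```

On a real image the guesses still need confirming against known plaintext, which is where the vector table comes in.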
The rest is an interesting game of finding obfuscated strings. Uri uses dotPeek to decompile the .NET software that runs on the computer and controls the printer, and locates a few strings that the software listens for on the serial connection. Since the printer will be sending these to the computer, they must exist somewhere in the binary.
This diagram shows a function Uri wrote to visualize repetition within strings. He uses it to judge whether a string has enough repetition to serve as a uniquely identifiable pattern in the binary, which will then reveal the substitution for each of those values. Here he hits a brick wall repeatedly before a eureka moment reveals that pairs of bytes in the binary are reversed. From there he rapidly pieces together a substitution table, using a custom hex dump view he wrote in Python to sanity check the process as he goes.
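The idea of matching a repetition pattern, and why the byte-pair swap hides it, can be sketched as follows. This is a reconstruction of the technique on made-up data, not Uri’s function: the string `printer ready`, the helper names, and the substitution table are all hypothetical.

```python
import random


def repetition_pattern(s: bytes) -> tuple:
    """Replace each byte with the index of its first occurrence.

    b"hello" -> (0, 1, 2, 2, 4). A substitution cipher leaves this unchanged.
    """
    first_seen = {}
    return tuple(first_seen.setdefault(b, i) for i, b in enumerate(s))


def find_pattern(haystack: bytes, known: bytes) -> list:
    """Return offsets where haystack repeats bytes the same way known does."""
    target = repetition_pattern(known)
    n = len(known)
    return [i for i in range(len(haystack) - n + 1)
            if repetition_pattern(haystack[i:i + n]) == target]


def swap_pairs(data: bytes) -> bytes:
    """Swap each pair of adjacent bytes; a trailing odd byte is left alone."""
    even = len(data) & ~1
    out = bytearray(data)
    out[0:even:2] = data[1:even:2]
    out[1:even:2] = data[0:even:2]
    return bytes(out)


# Made-up firmware fragment containing a string the host software expects.
known = b"printer ready\n"
plain = b"\x00" * 16 + known + b"\xff" * 16

# Hypothetical obfuscation: byte substitution followed by a pair swap.
table = list(range(256))
random.Random(2).shuffle(table)
cipher = swap_pairs(plain.translate(bytes(table)))

print(find_pattern(cipher, known))      # the swap hides the pattern
unswapped = swap_pairs(cipher)
hits = find_pattern(unswapped, known)   # undoing the swap reveals it
print(hits)

# Each match yields cipher-byte -> plain-byte entries for the table.
recovered = dict(zip(unswapped[hits[0]:hits[0] + len(known)], known))
print({f"0x{c:02x}": chr(p) for c, p in recovered.items()})
```

A string with several repeated characters produces fewer false matches, which is why judging the amount of repetition first pays off.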
Due to a problem with the Zoom recording, the segment after the labs section of the workshop was not recorded. However, everything you need is available in this video. Follow Uri’s lead, download the files he links, and see if you can find your own way through the rest of the substitutions.
This is a fantastic way to get better at this kind of deobfuscation task. We have both the binary itself and Uri’s firsthand recollection of the experience to guide us. The work is very much a jigsaw puzzle without a picture on the box, but as with most things in life, you get the hang of it as you do more of it.