[Kuba Tyszko] like many of us, has been hacking things from a young age. An early attempt at hacking around with grandpa’s tractor might have been swiftly quashed by his father, but likely this was not the last such incident. With a more recent interest in cracking encrypted applications, [Kuba] gives us some insights into some of the tools at your disposal for reading out the encrypted secrets of applications that have something worth hiding. (Slides here, PDF.)
There may be all sorts of reasons for such applications to have an encrypted portion, and that’s not really the focus here. One such application that [Kuba] describes was a pre-trained machine-learning model written in the R scripting language. If you’re not familiar with R, it is commonly used for ‘data science’ type tasks and has a big fan base; it’s worth checking out. Anyway, the application binary took two command-line arguments: one the encrypted blob of the model, and the other the path to the test data set for model verification.
The first thing [Kuba] suggests is to disable network access, just in case the application wants to ‘dial home.’ We don’t want that. The application was intended for Linux, so the first port of call was to see what libraries it was linked against using the ldd command. This indicated that it was linked against OpenSSL, so that was a likely candidate for the encryption support. Next up, running objdump gave some clues about the various components of the binary, and it was determined that it was doing something with 256-bit AES encryption. Now, after applying a little experience (or educated guesswork, if you prefer), the likely scenario is that the binary yanks the private key from somewhere within itself, reads the encrypted blob file, and passes both over to libssl. The plaintext R script is then passed off to the R runtime, the model executes against the test data, and the results are collated.
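The recon steps above boil down to a couple of standard binutils commands. A minimal sketch, using /bin/ls as a stand-in target since the actual binary from the talk isn’t available here:

```shell
TARGET=/bin/ls   # hypothetical stand-in; substitute the real binary

# Which shared libraries is it linked against? OpenSSL showing up
# here is the tip-off that libssl/libcrypto handle the decryption.
ldd "$TARGET"

# Dump the dynamic symbol table and disassembly for clues about
# what the binary actually calls.
objdump -T "$TARGET" | head
objdump -d "$TARGET" | grep -i -m5 aes || true

# Printable strings often betray algorithm names outright.
strings "$TARGET" | grep -i -m5 'ssl\|aes' || true
```

The `|| true` guards just keep the pipeline going when a grep finds nothing, which is expected for an innocent binary like ls.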
[Kuba]’s first attack method was to grab the OpenSSL source code and drop some strategic printf() calls into the target functions. Next, using the LD_PRELOAD ‘trick,’ the standard system OpenSSL library was substituted with the ‘fake’ version containing the trojan printf()s. The result of this was the decryption function gleefully sending the plaintext R script straight to the terminal. No need to even locate the private key!
Next, [Kuba] outlines the ‘easy way,’ which is to freeze the binary (just as we could freeze a whole machine in years gone by) by having it read from a FIFO instead of a file, but never placing any data on the other end. With the read() call blocked, the binary is frozen, and hopefully the private key is already in memory. We then use gcore to create a core dump of the running application, which requires only knowledge of the process PID. Since the binary has already accessed the key and decrypted the secret model data, which is held in memory, the plaintext contents will be in the core file and easily visible by just opening it as a text file! After a bit of searching around, the R script code was visible. No special libraries are needed, just a handful of standard Linux commands.
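The freeze can be reproduced with nothing but coreutils. Here plain cat stands in for the protected binary, and the blob filename is made up; gcore (part of gdb) is left as a comment since it needs ptrace privileges:

```shell
mkfifo blob.enc
cat blob.enc &            # "target" opens the blob for reading
PID=$!
sleep 1000 > blob.enc &   # hold the write end open, never send data
WPID=$!
sleep 1                   # give the target time to block inside read()

# The target is now asleep, frozen mid-read, secrets still in memory.
grep State "/proc/$PID/status"

# With gdb installed, snapshot its memory and hunt for plaintext:
#   gcore -o core "$PID"
#   strings "core.$PID" | less

kill "$PID" "$WPID"
rm blob.enc
```

Opening the write end without writing matters: it lets the target get past open() and block in read() itself, exactly the state you want before taking the core dump.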
With the shoe on the other foot, how can you protect your application against such a simple hacking process? Roll your own crypto? That’s a dangerous proposition, and the consensus is: don’t. Preventing the FIFO attack could be as simple as using stat() to check that the file presented is an actual regular file. You could also statically link certain critical libraries, where possible, to prevent the LD_PRELOAD attack. [Kuba] also suggests that the application could inspect any loaded shared objects using callbacks, to verify that the libraries are the expected ones.
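The stat() defence is easy to sketch. In C the application would call stat() and test S_ISREG(st.st_mode); the shell’s [ -f ] test is the moral equivalent. A small sketch with a hypothetical check_blob helper:

```shell
# Refuse anything that is not a plain regular file: FIFOs, devices,
# and (for good measure) symlinks are all rejected.
check_blob() {
    if [ -f "$1" ] && [ ! -L "$1" ]; then
        echo "ok: regular file"
    else
        echo "refusing: not a regular file" >&2
        return 1
    fi
}

mkfifo sneaky.enc
check_blob sneaky.enc || true   # refused: it is a FIFO
rm sneaky.enc
```

It’s a narrow fix, of course: it stops the FIFO freeze as described, but an attacker can still pause the process by other means, which is rather the point of the article’s conclusion.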
The only way to be sure (and you can never be 100% sure) is to enumerate all the possible attack methods and mitigate each one accordingly. There is no hack-proof method; you just have to make it as hard as possible.
“With the shoe on the other foot, how can you protect your application against such a simple hacking process?”
You can’t. You can only make it more cumbersome/entertaining for the attacker. Yet some people imagine that they can provide a magic bit of software that will have access to some data, but the user will not be able to access it directly…
While you’re right about any countermeasure you might think of, the attacker can still run the application in a virtual machine and pause it at the right time to inspect its memory, bypassing any modification of the software or environment.
Yet, now that homomorphic encryption systems are available, you can have your program manipulate encrypted data without ever decrypting it, so you’ll never be able to access the cleartext data anymore. Granted, this is currently so slow that even a simple addition takes milliseconds to perform, but a lot of progress is being made in this field, and I suspect a DH-style key exchange could soon be possible under FHE, so you’d never have the decrypted key in memory at any time.
While homomorphic encryption allows (untrusted) parties to perform computations/manipulations on encrypted data without decryption, at some point in time, you probably want to read that data as well, which requires decryption.
That’s where you’re wrong.
This. What’s described here is not a security hack but effectively a DRM crack, allowing the user to inspect the software running on their own machine. You can’t and shouldn’t protect an application from the person who has root access to the computer. The only winning move with DRM is not to play.