The phrase “extraordinary claims require extraordinary evidence” is most often attributed to Carl Sagan, specifically from his television series Cosmos. Sagan was probably not the first person to put forward such a hypothesis, and the show certainly didn’t claim he was. But that’s the power of TV for you; the term has since come to be known as the “Sagan Standard” and is a handy aphorism that nicely encapsulates the importance of skepticism and critical thinking when dealing with unproven theories.
It also happens to be the first phrase that came to mind when we heard about Obfuscation Revealed: Leveraging Electromagnetic Signals for Obfuscated Malware Classification, a paper presented during the 2021 Annual Computer Security Applications Conference (ACSAC). As described in the mainstream press, the paper detailed a method by which researchers were able to detect viruses and malware running on an Internet of Things (IoT) device simply by listening to the electromagnetic waves being emanated from it. One needed only to pass a probe over a troubled gadget, and the technique could identify what ailed it with near 100% accuracy.
Those certainly sound like extraordinary claims to us. But what about the evidence? Well, it turns out that digging a bit deeper into the story uncovered plenty of it. Not only has the paper been made available for free thanks to the sponsors of the ACSAC, but the team behind it has released all of the code and documentation necessary to recreate their findings on GitHub.
Unfortunately we seem to have temporarily misplaced the $10,000 1 GHz Picoscope 6407 USB oscilloscope that their software is written to support, so we’re unable to recreate the experiment in full. If you happen to come across it, please drop us a line. But in the meantime we can still walk through the process and try to separate fact from fiction in classic Sagan style.
Baking a Malware Pi
The best way of understanding what this technique is capable of, and further what it’s not capable of, is to examine the team’s test rig. In addition to the aforementioned Picoscope 6407, the hardware configuration includes a Langer PA-303 amplifier and a Langer RF-R H-Field probe that’s been brought to rest on the BCM2837 processor of a Raspberry Pi 2B. The probe and amplifier were connected to the first channel of the oscilloscope as you might expect, but interestingly, the second channel was connected to GPIO 17 on the Pi to serve as the trigger signal.
As explained in the project’s Wiki, the next step was to intentionally install various rootkits, malware, and viruses onto the Raspberry Pi. A wrapper program was then used that would first trigger the Picoscope over the GPIO pin, and then run the specific piece of software under examination for a given duration. This process was repeated until the team had amassed tens of thousands of captures for various pieces of malware including bashlite, mirai, gonnacry, keysniffer, and maK_it. This gave them data on what the electromagnetic (EM) output of the Pi’s SoC looked like when its Linux operating system had become infected.
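To make the capture procedure a little more concrete, here is a minimal sketch of what such a wrapper could look like. It is not the team's code: the function name, the callback-based trigger, and the timeout handling are all our own assumptions. On a real Pi the callback would toggle GPIO 17; here it is left pluggable so the sketch runs anywhere.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence


def run_with_trigger(
    command: Sequence[str],
    duration_s: float,
    set_trigger: Callable[[bool], None],
) -> Optional[int]:
    """Raise the trigger, run `command` for at most `duration_s` seconds, lower the trigger.

    `set_trigger` is a hypothetical callback. On real hardware it might drive
    GPIO 17 through a GPIO library (an assumption; the article does not say
    which library the team used).

    Returns the exit code, or None if the process was still running when the
    capture window closed and had to be killed.
    """
    set_trigger(True)  # rising edge: tell the oscilloscope to start capturing
    try:
        proc = subprocess.Popen(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        try:
            return proc.wait(timeout=duration_s)
        except subprocess.TimeoutExpired:
            proc.kill()   # capture window is over, stop the program under test
            proc.wait()   # reap the killed process
            return None
    finally:
        set_trigger(False)  # always release the trigger line, even on errors


if __name__ == "__main__":
    # Stand-in for the GPIO pin: just record the trigger transitions.
    events = []
    exit_code = run_with_trigger([sys.executable, "-c", "pass"], 5.0, events.append)
    print(events, exit_code)
```

Repeating a call like this thousands of times, once per program and capture, would produce the kind of labeled dataset described above.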
But critically, they also performed the same data acquisition on what they called a “benign” dataset. These captures were made while the Raspberry Pi was operating normally and running tools that would be common for IoT applications. EM signatures were collected for well known programs and commands such as mpg123, wget, tar, more, grep, and dmesg. This data established a baseline for normal operations, and gave the team a control to compare against.
Crunching the Numbers
As explained in section 5.3 of the paper, Data Analysis and Preprocessing, the raw EM captures need to be cleaned up before any useful data can be extracted. As you can imagine, the probe picks up a cacophony of electronic noise at such close proximity. The goal of the preprocessing stage is to filter out as much of the background noise as possible, and identify the telltale frequency fluctuations and peaks that correspond to individual programs running on the processor.
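As a rough illustration of this kind of preprocessing, the sketch below turns a raw trace into a spectrogram and then subtracts a per-frequency noise floor. To be clear, this is a generic short-time Fourier transform with median subtraction, chosen by us for illustration; the window size, hop, and noise-floor method are assumptions and not the pipeline from the paper. It uses only the standard library, so it is slow but easy to follow.

```python
import cmath
import math
from statistics import median
from typing import List, Sequence


def spectrogram(samples: Sequence[float], window: int = 64, hop: int = 32) -> List[List[float]]:
    """Return one magnitude spectrum per time frame (Hann-windowed DFT)."""
    # Periodic Hann window to reduce spectral leakage between bins.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]
    bins = window // 2 + 1  # real input: only non-negative frequencies are needed
    # Precompute the DFT basis for each frequency bin.
    twiddle = [
        [cmath.exp(-2j * math.pi * k * n / window) for n in range(window)]
        for k in range(bins)
    ]
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        chunk = [samples[start + n] * hann[n] for n in range(window)]
        frames.append([abs(sum(c * w for c, w in zip(chunk, row))) for row in twiddle])
    return frames


def subtract_noise_floor(frames: List[List[float]]) -> List[List[float]]:
    """Subtract each bin's median over time, clipping at zero.

    Anything present in most frames (steady background) is removed;
    short-lived fluctuations and peaks survive.
    """
    if not frames:
        return []
    floors = [median(column) for column in zip(*frames)]
    return [[max(v - f, 0.0) for v, f in zip(row, floors)] for row in frames]


if __name__ == "__main__":
    # Synthetic trace: a constant background tone (bin 20) plus a burst (bin 8)
    # that only appears in the last quarter of the capture.
    trace = [
        math.cos(2 * math.pi * 20 * n / 64)
        + (math.cos(2 * math.pi * 8 * n / 64) if n >= 768 else 0.0)
        for n in range(1024)
    ]
    cleaned = subtract_noise_floor(spectrogram(trace))
    last = cleaned[-1]
    print("strongest bin in final frame:", last.index(max(last)))
```

The median is used here because a steady background component shows up in nearly every frame, while program activity comes and goes. A real pipeline would likely use an optimized FFT and a more careful noise model.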
The resulting cleaned up spectrograms were then put through a neural network designed to classify the EM signatures. In much the way a computer vision system is able to classify objects in an image based on its training set, the team’s software demonstrated an uncanny ability to pick out what type of software was running on the Pi when presented with a captured EM signature.
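The team used a neural network for this step, which we won't try to reproduce here. But the basic idea of learning from labeled captures and then labeling a new one can be shown with a much simpler stand-in. The sketch below is a nearest-centroid classifier, our own substitution rather than the paper's model, and the tiny feature vectors are invented purely for the demonstration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fit_centroids(examples: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the feature vectors of each label into a single centroid."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(features: Vector, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


if __name__ == "__main__":
    # Invented three-value "signatures"; real inputs would be flattened spectrograms.
    training = [
        ([0.9, 0.1, 0.0], "benign"),
        ([1.0, 0.2, 0.1], "benign"),
        ([0.1, 0.8, 0.9], "ransomware"),
        ([0.2, 0.9, 1.0], "ransomware"),
    ]
    model = fit_centroids(training)
    print(classify([0.15, 0.85, 0.95], model))
```

A neural network can learn far subtler patterns than distance to an average, but the workflow is the same: labeled EM captures go in, and a label for an unseen capture comes out.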
When asked to classify a signature as ransomware, rootkit, DDoS, or benign, the neural network had an accuracy of better than 98%. Similar accuracy was achieved when the system was tasked with drilling down and determining the specific type of malware that was running. This meant the system was not only capable of detecting if the Pi was compromised, but could even tell the difference between a gonnacry and a bashlite infection.
Accuracy took a considerable hit when attempting to identify the specific binary being executed, but the system still managed a respectable 82.28%. Perhaps most impressively, the team claims an accuracy of 82.70% when attempting to distinguish between various types of malware even when attempts were made to actively obfuscate their execution, such as running them in a virtualized environment.
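It helps to see why accuracy depends so much on how fine-grained the labels are. The sketch below scores one set of predictions at two levels: the exact family, and the broader type. The predictions are made up for illustration and are not results from the paper, and the family-to-type mapping reflects how these families are commonly described, so check it against the paper before relying on it. The paper may also have trained a separate model for each task rather than coarsening one model's output as done here.

```python
from typing import Callable, List, Tuple

# Assumed mapping from malware family to broader type (verify against the paper).
FAMILY_TO_TYPE = {
    "gonnacry": "ransomware",
    "bashlite": "ddos",
    "mirai": "ddos",
    "keysniffer": "rootkit",
    "maK_it": "rootkit",
    "benign": "benign",
}


def accuracy(
    pairs: List[Tuple[str, str]],
    coarsen: Callable[[str], str] = lambda label: label,
) -> float:
    """Fraction of (true, predicted) pairs that match after applying `coarsen`."""
    if not pairs:
        raise ValueError("need at least one (true, predicted) pair")
    hits = sum(coarsen(true) == coarsen(pred) for true, pred in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Invented (true, predicted) family labels, for illustration only.
    pairs = [
        ("mirai", "bashlite"),      # wrong family, right type
        ("gonnacry", "gonnacry"),
        ("keysniffer", "maK_it"),   # wrong family, right type
        ("benign", "benign"),
    ]
    print(accuracy(pairs))                                  # 0.5
    print(accuracy(pairs, lambda f: FAMILY_TO_TYPE[f]))     # 1.0
```

Confusing two DDoS bots with each other counts as a miss at the family level but a hit at the type level, which is one reason the coarser task is easier.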
Realistic Expectations
While the results of the experiment are certainly compelling, it’s important to stress that this all took place under controlled and ideal conditions. At no point in the paper is it claimed that this technique, at least in its current form, could actually be used in the wild to determine if a computer or IoT device has been infected with malware.
At the absolute minimum, data would need to be collected on a much wider array of computing devices before you could even say if this idea has any practical application outside of the lab. For their part, the authors say they chose the Pi 2B as a sort of “boilerplate” device, believing its 32-bit ARM processor and vanilla Linux operating system provided a reasonable stand-in for a generic IoT gadget. That’s a logical enough assumption, but there are still far too many variables at play to say that any of the EM signatures collected on the Pi test rig would be applicable to a random wireless router pulled off the shelf.
Still, it’s hard not to come away impressed. While the researchers might not have created the IT equivalent of the Star Trek medical tricorder, a device that you can simply wave over the patient to instantly see what malady of the week they’ve been struck by, it certainly seems like they’re tantalizingly close.
Maybe an easier and cheaper method would be to use the CPU’s performance counters. Although that seems obvious enough that I’d be very surprised if someone hasn’t already done/tried that.
The problem with using the CPU’s own reported data is that malware can potentially foul that data. It’s the observer and the observed being the same thing, so you can’t really trust the data.
Also if they are targeting the IOT crowd then most of these devices should be headless, with very little user access to how they work, so actually asking the system for such details may not even be possible – if you are going to embed a fully functional Linux computer with the entire general purpose Linux OS then you are doing something rather wrong when it comes to IOT, where efficiency, small data footprint and low power have to be high priorities.
Modifying a CPU’s hardware counters is usually impossible. Messing with the software that reads them might be easier, but then again, if you’re doing that, you might as well just randomize your own runtime…
If you’d like a very approachable dissection of “the Sagan standard” beyond Wikipedia and Quote Investigator, going back to Plato, you can find it here: https://link.springer.com/article/10.1007/s11406-016-9779-7
For those of us who spend time talking management sorts down from leaps into unworkable technologies, it’s handy to have in your pocket.
Old tricks… can’t find a proper source, but this will do…
50 years ago, before multiprocessing, before personal computers, an infinite loop meant that someone had to push a reset button to reboot the computer. Some computers had a transistor radio sitting nearby, or a speaker hooked to an address line. A constant tone from the speaker meant that the program had got stuck in a loop, so that the operator would know to reset it and send a rude note to the programmer who had submitted the job
Well back in the early 1970s our ICL 1904 had a speaker, so we didn’t even need the radio. George 3 did have process monitoring but certain sounds from the speaker were an indicator that you needed to use it to see which of the MOP terminals (I am looking at you Highways) was trying to sneak a high load task in without submitting a batch job, bypassing our scheduling and slowing down payroll and credit processing.
Similar to the NSA screen grabbing technique often referred to as TEMPEST.
You mean the one that Dutch guy invented and anyone can build at home?
“even when attempts were made to actively obfuscate their execution, such as running them in a virtualized environment.” This is not very surprising. Virtualization is mostly a software concept, the CPU/MMU has some useful features to allow for efficient implementations, but they are not involved most of the time. The CPU is not really concerned whether “ADD a, b” is called inside a VM, container, user/kernel mode, interrupt handler, or whatever else.
The observations would most likely differ if the VM involved CPU emulation.
A good point, though I wonder if it would differ enough to matter with CPU emulation – ultimately the program is still calling the same operations, and most of the CPU’s operations are pretty architecture neutral, so it would almost certainly run about as well as a legless critter with the emulation overheads but the patterns are quite likely still largely there, just played out a little slower and with the odd hitch thrown in when the operations really don’t match with the native host hardware well…
Would love to see this experiment carried forward, along with using the dataset created on the Pi to try classifying the crap on say a snapdragon common phone processor (so sticking to similarly modern Arm, as I expect in the arm family there would be enough similarity that not much retraining would be needed – though probably a much bigger change if you went back to much older ARM or AMD64)
Back in the ’80s we used Apollo DN3000 workstations with switching power supplies. The supplies made an audible “scratching” noise depending on the transient loads (probably due to inductor vibration). After a few weeks, you could tell what the machine was doing by the variations in this noise.
“scratching” noises in switchmode PSUs are almost always the PSU running in discontinuous mode, what you hear is it turning on and off ;-)
It’s impressive that this is possible even under lab conditions, but I am not sure I understand what the real-world use case is envisioned to be. If the signatures are specific to the exact hardware setup, then it seems like it would be much easier to do this via a JTAG header or something. But if the idea is that the EM fingerprint of MalwareX 2.17 on ARM will be recognizably similar to the fingerprint of MalwareX 2.64 on X86, then that would be very interesting. Except, especially in the latter case, this seems like it would be much more useful to a spy than to someone trying to secure their own network…
A field-trainable version could be quite useful. After you detect an invasion in your network, show it a normal device and a compromised device. Then go through the bazillion wifi base stations deployed to find the ones that remain compromised.
But a problem with this is malware that sleeps a lot of the time. If it is not executing anything, it is not detectable.
Thanks for this write up, the first I’ve found that actually bothered to read the paper and explain what they were doing, the equipment they were using (and how much it cost), and the limitations of the technique.
This doesn’t sound too extraordinary a claim to me. Most IoT devices should be pretty much doing the same thing all the time – probably mostly idle and sending a few packets a minute? – so detecting any deviations from that via EM sounds reasonable. It’s got to be much simpler than detecting it in a PC which is up to all kinds of stuff anyway.
On the assumption that most IoT devices are basically idle 99% of the time, I wouldn’t be surprised if you could spot some malware just by touching the device to see if it’s warmer than it should be! (Probably won’t work for lightbulbs, but the rest…)
I could always hear the EMI from the Tandy 2000 desktop running my BBS software. It had a heartbeat type of sound as it polled the modem port for signals, and when a call came in the EMI/RFI sound changed and you could hear the remote keypresses pretty well… more than I wanted to hear on the ham radio. That Tandy really had some RFI; I could hear it across the street sometimes on FM and 27 MHz.