Over on the Cloudflare blog, [Marek] found himself wondering about computer memory, as we all sometimes do. Specifically, he pondered if he could detect the refresh of his SDRAM from within a running program. We’re probably not ruining the surprise by telling you that the answer is yes — with a little more than 100 lines of C and help from our old friend the Fast Fourier Transform (FFT), [Marek] was able to detect SDRAM refresh cycles every 7818.6 ns, lining right up with the expected result.
The “D” in SDRAM stands for dynamic, meaning that unless periodically refreshed by reading and writing, data in the memory will decay. In this kind of memory, each bit is stored as a charge on a tiny capacitor. Given enough time (which varies with ambient temperature), this charge can leak away to neighboring silicon, turning all the 1s to 0s, and destroying the data. To combat this process, the memory controller periodically issues a refresh command which reads the data before it decays, then writes the data back to fully charge the capacitors again. Done often enough, this will preserve the memory contents indefinitely. SDRAM is relatively inexpensive and available in large capacity compared to the alternatives, but the drawback is that the CPU can’t access the portion of memory being refreshed, so execution gets delayed a little whenever a memory access and refresh cycle collide.
Chasing the Correct Hiccup
[Marek] figured that he could detect this “hiccup,” as he calls it, by running some memory accesses and recording the current time in a tight loop. Of course, the cache on modern CPUs would mean that for a small amount of data, the SDRAM would never be accessed, so he flushes the cache each time. The source code, which is available on GitHub, outputs the time taken by each iteration of the inner loop. In his case, the loop typically takes around 140 ns.
Hurray! The first frequency spike is indeed what we were looking for, and indeed does correlate with the refresh times.
The other spikes at 256kHz, 384kHz, 512kHz and so on, are multiplies of our base frequency of 128kHz called harmonics. These are a side effect of performing FFT on something like a square wave and totally expected.
As [Marek] notes, the raw data doesn’t reveal too much. After all, there are a lot of things that can cause little delays in a modern multitasking operating system, resulting in very noisy data. Even thresholding and resampling the data doesn’t bring refresh hiccups to the fore. To detect the SDRAM refresh cycles, he turned to the FFT, an efficient algorithm for computing the discrete Fourier transform, which excels at revealing periodicity. A few lines of python produced the desired result: a plot of the frequency spectrum of the lengthened loop iterations. Zooming in, he found the first frequency spike at 127.9 kHz, corresponding to the SDRAMs refresh period of 7.81 us, along with a number of other spikes representing harmonics of this fundamental frequency. To facilitate others’ experiments, [Marek] has created a command line version of the tool you can run on your own machine.
If this technique seems familiar, it may be because it’s similar the the Rowhammer attack we covered back in 2015, which can actually change data in SDRAM on vulnerable machines by rapidly accessing adjacent rows. As [Marek] points out, the fact that you can make these kinds of measurements from a userspace program can have profound security implications, as we saw with the meltdown and spectre attacks. We have to wonder what other vulnerabilities are lying inside our machines waiting to be discovered.
At the 33rd annual Chaos Communications Congress, [Antonio Barresi] and [Erik Bosman] presented not one, not two, but three (3!!) great hacks that were all based on exploiting memory de-duplication in virtual machines. If you’re interested in security, you should definitely watch the talk, embedded below. And grab the slides too. (PDF)
Memory de-duplication is the forbidden fruit for large VM setups — obviously dangerous but so tempting. Imagine that you’re hosting VMs and you notice that many of the machines have the same things in memory at the same time. Maybe we’re all watching the same cat videos. They can save on global memory across the machines by simply storing one copy of the cat video and pointing to the shared memory block from each of the machines that uses it. Notionally separate machines are sharing memory. What could go wrong?
The technique is deceptively simple. Dynamic RAM is organized into a matrix of rows and columns. By performing fast reads on addresses in the same row, bits in adjacent rows can be flipped. In the example image to the left, fast reads on the purple row can cause bit flips in either of the yellow rows. The Project Zero team discovered an even more aggressive technique they call “double-sided hammering”. In this case, fast reads are performed on both yellow rows. The team found that double-sided hammering can cause more than 25 bits to flip in a single row on a particularly vulnerable computer.
Why does this happen? The answer lies within the internal structure of DRAM, and a bit of semiconductor physics. A DRAM memory bit is essentially a transistor and a capacitor. Data is stored by charging up the capacitor, which immediately begins to leak. DRAM must be refreshed before all the charge leaks away. Typically this refresh happens every 64ms. Higher density RAM chips have forced these capacitors to be closer together than ever before. So close in fact, that they can interact. Repeated reads of one row will cause the capacitors in adjacent rows to leak charge faster than normal. If enough charge leaks away before a refresh, the bit stored by that capacitor will flip.
Cache is not the answer
If you’re thinking that memory subsystems shouldn’t work this way due to cache, you’re right. Under normal circumstances, repeated data reads would be stored in the processor’s data cache and never touch RAM. Cache can be flushed though, which is exactly what the Project Zero team is doing. The X86 CLFLUSH opcode ensures that each read will go out to physical RAM.
Wanton bit flipping is all fine and good, but the Project Zero team’s goal was to use the technique as an exploit. To pull that off, they had to figure out which bits they were flipping, and flip them in such a way as to give elevated access to a user level process. The Project Zero team eventually came up with two working exploits. One works to escape Google’s Native Client (NaCL) sandbox. The other exploit works as a userspace program on x86-64 Linux boxes.
Native Client sandbox escape exploit
Google defines Native Client (NaCL) as ” a sandbox for running compiled C and C++ code in the browser efficiently and securely, independent of the user’s operating system.” It was designed specifically as a way to run code in the browser, without the risk of it escaping to the host system. Let that sink in for a moment. Now consider the fact that rowhammer is able to escape the walled garden and access physical memory. The exploit works by allocating 250MB of memory, and rowhammering on random addresses, and checking for bit flips. Once bit flips are detected, the real fun starts. The exploit hides unsafe instructions inside immediate arguments of “safe” institutions. In an example from the paper:
Viewed from memory address 0x20EA0, this is an absolute move of a 64 bit value to register rax. However, if we move off alignment and read the instruction from address 0x20EA02, now it’s a SYSCALL – (0F 05). The NaCL escape exploit does exactly this, running shell commands which were hidden inside instructions that appeared to be safe.
Linux kernel privilege escalation exploit
The Project Zero team used rowhammer to give a Linux process access to all of physical memory. The process is more complex than the NaCL exploit, but the basic idea revolves around page table entries (PTE). Since the underlying structure of Linux’s page table is well known, rowhammer can be used to modify the bits which are used to translate virtual to physical addresses. By carefully controlling which bits are flipped, the attacking process can relocate its own pages anywhere in RAM. The team used this technique to redirect /bin/ping to their own shell code. Since Ping normally runs with superuser privileges, the shell code can do anything it wants.
Rowhammer is a nasty vulnerability, but the sky isn’t falling just yet. Google has already patched NaCL by removing access to the CLFLUSH opcode, so NaCL is safe from any currently known rowhammer attacks. Project Zero didn’t run an exhaustive test to find out which computer and RAM manufacturers are vulnerable to rowhammer. In fact, they were only able to flip bits on laptops. The desktop machines they tried used ECC RAM, which may have corrected the bit flips as they happened. ECC RAM will help, but doesn’t guarantee protection from rowhammer – especially when multiple bit flips occur. The best protection is a new machine – New RAM technologies include mitigation techniques. The LPDDR4 standard includes “Targeted Row Refresh” (TRR) and “Maximum Activate Count” (MAC), both methods to avoid rowhammer vulnerability. That’s a good excuse to buy a new laptop if we ever heard one!