This Week In Security: Use-After-Free For Dummies, WiFi Cracking, And PHP-FPM

In a brilliant write-up, [Stephen Tong] brings us his “Use-After-Free for Dummies”. It’s a surprising tale of a vulnerability that really shouldn’t exist, and a walkthrough of how to complete a capture-the-flag challenge. The vulnerable binary is running on a Raspberry Pi, which turns out to be very important. It’s a multithreaded application that uses lock-free data sharing through a pair of integers readable by multiple threads. Those ints are declared using the volatile keyword, which is a useful way to tell a compiler not to optimize too heavily, as the value may get changed by another thread.

On an x86 machine, this approach works flawlessly, as all the out-of-order execution features are guaranteed to be globally transparent. Put another way, even if thread one can speed up execution by modifying shared memory ahead of time, the CPU will make the shared memory changes visible in the proper order. When that shared memory is controlling concurrent access, it’s really important that ordering happens the way you expect. What was a surprise to me is that the ARM platform does not provide that global memory ordering. While out-of-order execution will be transparent to the thread making the changes, other threads and processes may observe those actions out of order. An example may help:

volatile int value;
volatile int ready;

// Thread 1
value = 123; // (1)
ready = 1; // (2)

// Thread 2
while (!ready); // (3)
print(value); // (4)

This is one of [Stephen]’s examples. If this were set up to run in two threads, on an x86 machine you would have a guarantee that (4) would always print 123. On ARM, there’s no such guarantee, and you may very well print an uninitialized value. It’s a race condition. Now you may look at this and wonder, like I did, how anyone programs anything for ARM chips. First, even though memory reordering is a thing, ARM guarantees consistency within a single thread, so this quirk only affects multi-threaded programming. And second, libraries for multi-threaded programming offer semantics for marking memory accesses that need to be properly ordered across threads.
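
As a concrete illustration, here’s a minimal sketch (my own, not code from [Stephen]’s writeup) of how C11 atomics express that ordering requirement. The release store pairs with the acquire load, so once thread 2 observes ready == 1, the earlier write to value is guaranteed to be visible:

#include <stdatomic.h>
#include <stdio.h>

int value;
atomic_int ready;

// Thread 1: a release store publishes every write made before it.
void producer(void) {
    value = 123;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

// Thread 2: the acquire load synchronizes with the release store,
// so value is guaranteed to read 123 here, even on ARM.
void consumer(void) {
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    printf("%d\n", value);
}

On x86 these orderings compile down to plain loads and stores; on ARM the compiler emits the barrier instructions for you.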

The actual exploitable binary in question uses a circular queue for the inter-process buffer, and tracks a head and tail location to determine how full the buffer is. One process puts data in, the second reads it out. The vulnerability is that when the buffer is completely full, memory access reordering can result in a race condition. This ring buffer gets filled with pointers, and when an attacker wins the race, the same pointer is used twice. In essence, the program now has two references to the same object. Without any further tricks, this results in a double-free error when the second reference is released.
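
To make the flawed pattern concrete, here’s a minimal sketch of a volatile-only single-producer, single-consumer pointer ring. This is my own illustration of the pattern, not the CTF binary’s actual code:

#include <stddef.h>

#define SLOTS 16

void *volatile slots[SLOTS];
volatile unsigned head, tail; // producer advances head, consumer advances tail

int push(void *p) {
    if (head - tail == SLOTS) return 0; // buffer full
    slots[head % SLOTS] = p;            // (A) store the pointer
    head = head + 1;                    // (B) publish the slot
    return 1; // on ARM, another core may observe (B) before (A)
}

void *pop(void) {
    if (head == tail) return NULL;      // buffer empty
    void *p = slots[tail % SLOTS];      // a stale pointer, if the producer's
    tail = tail + 1;                    // stores arrived out of order
    return p;
}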

What are the tricks we could use to make this into an exploit? First, know that what we have is two references to one object, and that object contains a pointer to a string whose length is entirely controlled by user-provided data. We can trigger a release of one of those references, which leads to the object getting freed, but we still hold another reference, which now points to freed memory. To turn this into an arbitrary read, a very clever trick is used. Before freeing our doubly-referenced object, we allocate another object and store a long-ish string in it. Then we free the doubly-referenced object, followed by the object holding the long string. Finally, we allocate one more object, but the string we store is crafted to look like a valid object. Memory gets reallocated in last-in, first-out order, so the string lands in the reclaimed memory we still have a reference to. The program expects the object to contain a pointer to a string, so our fake object can point to arbitrary memory, which we can then read.
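
That last-in, first-out reuse is easy to see on a typical glibc system. A quick sketch of my own (the exact reuse behavior depends on the allocator):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *a = malloc(64);
    char *b = malloc(64);

    free(a); // freed first...
    free(b); // ...freed last, so b sits on top of the free list

    char *c = malloc(64); // gets b's chunk back (last in, first out)
    char *d = malloc(64); // gets a's chunk back
    printf("c == b? %s\n", c == b ? "yes" : "no");
    printf("d == a? %s\n", d == a ? "yes" : "no");
    return 0;
}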

The last trick is arbitrary write, which is even harder to pull off. The trick here is to actually perform the double free, but manipulate the heap so it doesn’t result in a segfault. We can use the trick above to write arbitrary data to a freed memory location. Because the location has made it onto the free list twice, the allocator still considers it free even though it’s also in use. The heap allocator uses a clever trick to manage reclaimed memory chunks, storing a pointer to the next reclaimed location inside each free chunk. Write the location you want to overwrite into that free chunk, and then allocate another chunk. The allocator now thinks your arbitrary location is the next free memory location to use, so the next allocation is your arbitrary write. The writeup has more details, as well as the rest of the exploitation chain, so be sure to read the whole thing.
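
Boiled down to a toy, the free-list trick looks something like the sketch below. Fair warning: this is an illustration of the idea, not a working exploit. Modern glibc mangles the next pointer and detects naive double frees, and the writeup’s actual chain is more involved.

#include <stdlib.h>

long target; // stand-in for the arbitrary location we want to write

int main(void) {
    char *a = malloc(32);
    free(a); // a's chunk now heads the free list

    // The use-after-free write: a freed chunk's first bytes hold the
    // "next free chunk" pointer, so aim it at our target.
    *(void **)a = &target;

    malloc(32);              // pops a's chunk off the free list
    long *evil = malloc(32); // the allocator hands back &target
    *evil = 0x41414141;      // the arbitrary write
    return 0;
}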

How Secure is that WiFi?

[Ido Hoorvitch] of CyberArk had some pandemic-induced time on his hands, and opted to collect packet captures of 5000 password-protected WiFi networks around Tel Aviv. In the old days, you had to capture a full 4-way handshake to have any chance at breaking WPA encryption. In 2018 a new technique was discovered, where a single authentication response is all that’s required to attempt to crack the key, no active user required. The magic string here is the PMKID: an HMAC-SHA-1 hash over network details, keyed with a value derived from the WPA password by a key derivation function.
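
For reference, the construction looks something like this (simplified; everything except the passphrase is visible on the air, which is what makes offline guessing possible):

// PMK   = PBKDF2-HMAC-SHA1(passphrase, SSID, 4096 iterations, 32 bytes)
// PMKID = first 16 bytes of HMAC-SHA1(PMK, "PMK Name" | MAC_AP | MAC_STA)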

The popular tool Hashcat can take advantage of a GPU to accelerate cracking a PMKID. SHA-1 hashes are one of the things GPUs are particularly good at, after all. [Ido]’s rig of 8 Quadros managed almost 7 million hash calculations per second. The problem with trying to crack a WPA key is that while it must be at least 8 characters long, it can be much longer, making for an enormous search space. The first trick [Ido] used was to take advantage of a common password source: a cell phone number. In Tel Aviv, that means the password is 05 followed by 8 more digits. That’s a searchable key space, and of the 5000 sniffed networks, nearly half fell to this approach. Next was pointing Hashcat at a dictionary file, to automatically try known passwords. Between the dictionary attack and constraint-based approaches like the cell number format, 70% of the targeted networks were cracked. The takeaway? Use a long password that isn’t easily guessed and won’t turn up in a constrained search.
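
The cell number attack is a textbook Hashcat mask attack. A run along these lines (the capture file name is hypothetical; -m 22000 is Hashcat’s modern WPA mode, and each ?d matches a single digit) covers all hundred million 05-prefixed numbers:

hashcat -m 22000 -a 3 captures.hc22000 05?d?d?d?d?d?d?d?d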

Google Use After Free PoC

Reported in June of this year by the Security For Everyone team, CVE-2021-30573 now has a published PoC. This vulnerability was fixed in Chrome/Chromium 92. The triggering code is a bit of simple but very malformed HTML. Trying to parse this code just by looking at it, I immediately called it “cursed” HTML, so it’s no wonder Chrome had trouble with it, too.

<select class="form-control">
<option style="font-size: 1rem;" value="
<"">
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA(abbreviated)
</>">a
</option>
</select>

PHP Worker to Root

A bug in PHP-FPM discovered by Ambionics Security allows jumping from control over a PHP worker straight to system root. While it’s a serious problem, this isn’t a remote code execution vulnerability. Some other technique needs to be used first to take over a PHP worker process. This means an attacker would need to be able to run PHP code, and then also find a way to escape the PHP “sandbox.” While not trivial, there are techniques and bugs that make this possible.

The problem is that the inter-process communication mechanism is shared mapped memory, and far too much of the data structure is made available to the individual workers. A worker can modify the main data structure, causing the top-level process to write to arbitrary memory locations. While the location may be arbitrary, the actual data writes are extremely limited in this vulnerability. In fact, it boils down to two write primitives: set-0-to-1 and clear-1168-bytes. That may not seem like much, but there are a lot of flags that can be toggled by setting a value to 1, and the rest of the exploit makes heavy use of that technique. The real trick is to generate arbitrary error messages, then use the 0-to-1 primitive to corrupt the data structures of those messages.
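
To see why that design is scary, here’s a toy model of the trust boundary, with made-up structure names; this is my own sketch, not PHP-FPM’s actual layout. The parent blindly acts on a structure that a compromised worker can rewrite at will:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Illustrative only: a shared structure the parent process trusts.
struct scoreboard {
    char *log_path;    // the parent dereferences this
    int   needs_flush; // a flag a worker can flip (the set-0-to-1 idea)
};

int main(void) {
    struct scoreboard *sb = mmap(NULL, sizeof *sb,
                                 PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    static char path[] = "/tmp/fpm.log";
    sb->log_path = path;

    if (fork() == 0) {                     // the "worker"
        sb->needs_flush = 1;               // innocent-looking flag flip
        sb->log_path = (char *)0x41414140; // corrupted shared pointer
        _exit(0);
    }
    wait(NULL);
    if (sb->needs_flush) // the parent now acts on attacker-controlled data
        printf("parent would write via %p\n", (void *)sb->log_path);
    return 0;
}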

The vulnerability has been around for a very long time, since PHP 5.3.7. It’s fixed in 8.0.12, 7.4.25, and 7.3.32. One final wrinkle here is that PHP 7.3 is still in security support, but the fix was considered an invasive change, and the PHP maintainers initially opted not to push it to the older version. After some back and forth on the bug discussion, the right call was made, and 7.3.32 was released with the fix.

GitLab in the Wild

HN Security had a client report something suspicious, and it turned out to be CVE-2021-22205 in use in the wild. This bug is a problem in ExifTool, where a DjVu file can execute arbitrary Perl code. This RCE was abused to make new users admins on the attacked system. If you’re running GitLab, make sure you’re up to date. Versions 13.10.3, 13.9.6, and 13.8.8 were released with the fix on April 14 of this year. It appears that the 14.* versions were never vulnerable, as version 14.0 was released after this fix. HackerOne and the entire bug bounty community have their share of problems, but the disclosure thread for this one is an example of a program run correctly.

27 thoughts on “This Week In Security: Use-After-Free For Dummies, WiFi Cracking, And PHP-FPM”

  1. With ARM compilers, using volatile isn’t enough for such operations (depending on the compiler).
    You also have to use memory barrier instructions (again, syntax depending on the compiler) to enforce that data is flushed to/from memory before proceeding past a critical operation. Then any other thread can depend on the data being consistent.
    When porting to ARM, yes, this can be an extra headache, but really the programmers doing the port should know all this stuff already.

    1. Yes. This is because of cache incoherence. Each core/thread sees a copy of a variable in its own cache, which may hold a different value until a memory barrier instruction is issued.

        1. Are you sure? That would mean information about pending memory accesses is part of a thread’s context. It’s the only reason I can imagine why a different thread on the same core would see a different value in memory. Otherwise, how would the same thread (on the same core) be able to properly retrieve the value it wrote a few instructions earlier?

          1. Yes, I am sure. The simple explanation for the need for memory barriers is that the instruction and data pipelines on ARM take different routes, so your code can be executing away without the data having made it to memory (this improves performance).

            The memory barrier instruction is a synchronisation point between the two pipelines.

            If you don’t do it, then later code can execute that references the same memory, and reads the value in from physical memory rather than the value still traversing the earlier write pipeline. You might think of the pipeline values as tiny caches, but that would probably confuse the matter, as the data and instruction caches are much bigger, use different hardware, and have a different “synchronisation API”.

            ARM do a course on this stuff, I did it as part of an M3 and M7 programming course.
            I have also experienced these opportunities for bugs first hand while writing device drivers and fixing linked list code.

          2. Oh, and in case I didn’t make this clear: for your “multiple threads” you don’t even need an RTOS or similar, if you use IRQs. Just consider an IRQ to be a different thread from the main code, and add memory barriers where needed.

    2. “really the programmers doing the port should know all this stuff already.”

      When a “high level” language is incapable of hiding the basic chip architecture from the application programmer, it is most certainly a grave flaw in the language.

      WHY does every C programmer have to memorize thousands of pages of chip manuals and work out the arcane details of multithreaded memory coherency for a long laundry list of architectures? Why do we all have to be experts on something the language could be handling transparently? Is this really what we “should” be doing?

      And where are these “programmers doing the port”? They do not exist! They are exactly the same programmers who have been making security disaster after security disaster for decades now, why do you expect them to magically get better at their jobs?

        1. The compiler is not going to figure out that you forgot to use the barrier instructions that were not needed when the original developer wrote the code for x86_64. They retired and are no longer available so you have to pick over the code yourself to find the race conditions.

          1. True – and it’s all “part of the fun” when porting code between architectures.
            But now that I’ve been doing ARM (“do no ARM” said the x86 programmer :-) ) for a few years, this stuff is second nature, and when I look at such code I’m internally thinking “needs a barrier there”, “needs a barrier here”, etc.

            I suppose it would therefore be possible to get a compiler/preprocessor to do this for you, maybe with some simple rules, then produce a copy of the same code with the barriers added, which you could save over the original. That would be pretty neat.

      1. I’ve written C code on countless embedded targets over a couple of decades now, and I’ve never “memorised thousands of pages of chip manuals”. Sure, I have to read them, but I can write code to handle it and then forget most of the details. Or use an existing library where someone else has done the work.

        It’s ridiculous to assume you can blindly execute *any* code on a different platform and expect it to work perfectly. Even if there was a language that cleanly hid those details across every platform… is that supposed to have magically appeared out of the aether? Someone has to have written it, and they have to do it in a language that gives access to those platform features. If this magic language has hidden it all, it obviously can’t be self-hosting… which means someone has to be doing it in C *anyway*. If it’s exposed via some mechanism then you’ll have someone try to dumbly use THAT instead of the intended method, and we’re right back here again.

        Don’t blindly throw ‘volatile’ around unless you understand what it does, which means understanding the platform you’re on. If you can’t do that, use one of the libraries that already do the job properly. Which is *functionally identical* to having it built into the language.

      1. Memory barriers have huge performance penalties on modern processors. The execution pipeline must be flushed. You might just as well throw out 30 years of innovation and compile for 80386 because that’s what kind of performance you will get.

    3. I’m looking at it and thinking: bad software design, with no thread-safe mutex lock. If you have multiple threads accessing the exact same memory, why would you ever assume the RAM would be in a consistent state? Me, if I were carrying out something like a multi-threaded Fast Fourier Transform (for the sake of an example), I’d treat any RAM shared between threads as read-only, and allocate a mutex lock per thread for the block of memory being processed in parallel. Only once all locks had been released by all the threads would I consider it safe to allow a different thread to write to that block of RAM (with an additional mutex lock). I would end up using more memory, but I would be thread safe. I guess the problem is whether people fully think things through or half-ass the job (Bailey, the “I have no idea what I’m doing” dog, always comes to mind https://i.imgur.com/ZQM77OT.jpeg RIP: 2009-2016).

      1. “I’d treat any RAM shared between threads as read only and allocate a mutex lock per thread”

        This is a performance-killing disaster and completely unnecessary in most circumstances. You can get far better performance and still ensure thread safety with lockless algorithms; you have to code it up yourself and you must be very careful, but the rewards are big.

        This is why we need a new language, it should be easy to do the right thing.

        1. “This is a performance killing disaster and completely unnecessary in most circumstances”
          It was a contrived example, but at the end of the day it depends on the size of the FFT: if it was a 128-point FFT then yes, it would totally destroy performance, but if it was a 33554432-point FFT, the overhead becomes insignificant.

  2. The example is kind of contrived, because you could use a byte-sized type for the variables and just ignore the whole ordering problem. A smart compiler would figure this out and do it for you, but alas, C compilers are too stupid to infer typing.

    1. Man, you really hate C. Using pointer(s), show us where C did ya wrong.
      Just messing with ya, you can like/hate whatever you like/hate.
      Its all 01101111 01101110 01100101 01110011 00100000 01100001 01101110 01100100 00100000 01111010 01100101 01110010 01101111 01110011 00100000 01110100 01101111 00100000 00101011 00100000 00101101 00100000 01110110 01101111 01101100 01110100 01100001 01100111 01100101 00100000 01100110 01110010 01101111 01101101 00100000 01110110 01100001 01110010 01101001 01100001 01110100 01101001 01101111 01101110 01110011 00100000 01100100 01100101 01100110 01101001 01101110 01100101 01100100 00100000 01100010 01111001 00100000 01100011 01101111 01101110 01110011 01100101 01101110 01110011 01110101 01110011 00100000 01101111 01100110 00100000 01110000 01100001 01110010 01100001 01101101 01100101 01110100 01100101 01110010 01110011 00101110 00100000 01010100 01101000 01100101 01101110 00100000 01100001 01100111 01100001 01101001 01101110 00100000 01111001 01101111 01110101 00100000 01101000 01100001 01110110 01100101 00100000 01110100 01101111 00100000 01100101 01111000 01100011 01110101 01110011 01100101 00100000 01101101 01100101 00100000 01001001 00100111 01101101 00100000 01100001 00100000 01101100 01101001 01110100 01110100 01101100 01100101 00100000 01100010 01101001 01110100 00100000 01000010 01110010 01101001 01110100 01101001 01110011 01101000 01101100 01111001 00100000 01101101 01100001 01100100 . have a fun day

  3. There are no guarantees on x86 either (MSVC has an exception that’s non-standard). The compiler is free to reorder ordinary accesses across volatiles; it is not allowed to reorder volatiles with respect to each other, though. And even if the cache might reorder accesses (I believe ARM is notorious for ordering writes in memory address order), the CPU itself may also do this when reordering accesses in its pipeline.
