Overhauling The ESP8266’s Flash Memory Handling

September 9, 2016

If you’ve ever corrupted a flash memory on a power failure, you’ll be glad to hear that the ESP8266 SDK implements a very safe and almost infallible read/write scheme for its flash memory. The catch: It’s also very wasteful. For every block of stored data, three blocks of physical flash memory are occupied. [Peter Scargill] enlightens us with a better solution.

When the ESP8266 writes data to its external flash memory, for example during an OTA update, it can’t simply overwrite the block holding the currently running program — it needs to write that data to a second block. Once the write operation is complete, it must keep track of which block holds the current data. For this, the ESP8266 SDK employs a third block, in which it stores the pointer to the current block. However, besides the block pointer, that third block stores no useful data.
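To make the three-block idea concrete, here is a minimal host-side sketch in Python. It only models the scheme as described above and is not the SDK’s actual code: the class name `ThreeSectorStore`, the 4096-byte sector size, and the one-byte pointer format are all assumptions made for illustration. It also skips the case of power failing while the pointer block itself is being rewritten.

```python
SECTOR_SIZE = 4096   # assumed sector size, for illustration only
ERASED = 0xFF        # erased flash reads back as all ones


class ThreeSectorStore:
    """Hypothetical model: sectors 0 and 1 hold data, sector 2 holds only the pointer."""

    def __init__(self):
        self.sectors = [bytearray([ERASED]) * SECTOR_SIZE for _ in range(3)]

    def _erase(self, index):
        self.sectors[index][:] = bytes([ERASED]) * SECTOR_SIZE

    def _active(self):
        # The pointer sector stores the index of the valid data sector.
        flag = self.sectors[2][0]
        return flag if flag in (0, 1) else None  # None: nothing saved yet

    def save(self, payload, fail_before_pointer=False):
        if len(payload) > SECTOR_SIZE:
            raise ValueError("payload does not fit into one sector")
        # Never touch the sector that currently holds the valid data.
        target = 1 if self._active() == 0 else 0
        self._erase(target)
        self.sectors[target][:len(payload)] = payload
        if fail_before_pointer:
            return  # simulated power loss: the pointer still names the old sector
        # Only now switch the pointer over to the freshly written sector.
        self._erase(2)
        self.sectors[2][0] = target

    def load(self, length):
        active = self._active()
        if active is None:
            return None
        return bytes(self.sectors[active][:length])
```

A `save()` that is interrupted before the pointer update leaves `load()` returning the previous data, which is the whole point of the scheme. The price is the third sector, which carries a single useful byte in this model.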

It’s a deliberately wasteful technique that’s extremely useful for bulletproof firmware updates, but for storing additional data in the flash memory, you’d want a more efficient method. [Peter] managed to achieve the same data integrity using only two blocks per stored block of data. His method adds a 4-byte (32-bit) version counter to each block. When a block is read, the two version counters are compared and the block with the newer counter supplies the current data. When data is written, the counters identify the older block, which can safely be overwritten.
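The selection logic can be sketched in a few lines of Python. This is a host-side illustration under stated assumptions, not [Peter’s] code: the function names, the little-endian counter at offset 0, the 4096-byte sector size, and treating an all-ones counter as “no valid data” are choices made for this example. Counter wrap-around is ignored.

```python
import struct

SECTOR_SIZE = 4096            # assumed sector size
COUNTER_SIZE = 4              # 32-bit version counter
ERASED_COUNTER = 0xFFFFFFFF   # an erased counter marks a sector without valid data


def new_sector():
    return bytearray(b"\xff" * SECTOR_SIZE)


def counter_of(sector):
    # Assumption: the counter is stored little-endian at offset 0.
    return struct.unpack_from("<I", sector, 0)[0]


def newest(sectors):
    """Return the index of the sector with the highest valid counter, or None."""
    valid = [(counter_of(s), i) for i, s in enumerate(sectors)
             if counter_of(s) != ERASED_COUNTER]
    return max(valid)[1] if valid else None


def load(sectors, length):
    index = newest(sectors)
    if index is None:
        return None
    return bytes(sectors[index][COUNTER_SIZE:COUNTER_SIZE + length])


def save(sectors, payload):
    if len(payload) > SECTOR_SIZE - COUNTER_SIZE:
        raise ValueError("payload does not fit next to the counter")
    current = newest(sectors)
    if current is None:
        target, version = 0, 1
    else:
        # Overwrite the older sector and give it the next version number.
        target, version = 1 - current, counter_of(sectors[current]) + 1
    sectors[target][:] = b"\xff" * SECTOR_SIZE
    sectors[target][COUNTER_SIZE:COUNTER_SIZE + len(payload)] = payload
    struct.pack_into("<I", sectors[target], 0, version)
```

Calling `save()` repeatedly on a pair of sectors alternates between them, and `load()` always returns the payload with the highest counter. No pointer block is needed, because the counters themselves say which copy is current.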

[Peter] initially placed the version counter at the very end of a block, so it would naturally be written after the rest of the block had been written successfully. Unfortunately, flash memory requires an entire block to be erased before new data can be written to it, so this approach would leave the version counter in an erased state during the write operation. In the end, he placed the version counter at the very beginning of the block using a flash-specific trick: When writing the data, he fills the first four bytes with all ones (0xFFFFFFFF). This coincides with the erased state of the flash memory, allowing him to go back to the first four bytes and write the version counter after the entire block has been written successfully. A runnable test implementation of [Peter’s] overhauled flash read/write method for the ESP8266 can be found on his blog. Still too much overhead? Let us know in the comments!
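The write-the-counter-last trick relies on how NOR flash behaves: erasing sets every bit to 1, and programming can only clear bits to 0. The Python model below illustrates that ordering on the host. It is a sketch under assumptions, not [Peter’s] implementation: `FlashSector`, `write_block`, the 4096-byte sector size, and the little-endian counter are hypothetical names and choices.

```python
import struct

SECTOR_SIZE = 4096            # assumed sector size
COUNTER_SIZE = 4
ERASED_COUNTER = 0xFFFFFFFF


class FlashSector:
    """NOR-style model: erase sets all bits to 1, programming can only clear bits."""

    def __init__(self):
        self.cells = bytearray(b"\xff" * SECTOR_SIZE)

    def erase(self):
        self.cells[:] = b"\xff" * SECTOR_SIZE

    def program(self, offset, data):
        if offset + len(data) > SECTOR_SIZE:
            raise ValueError("write exceeds sector")
        for i, byte in enumerate(data):
            self.cells[offset + i] &= byte  # bits can go from 1 to 0, never back

    def counter(self):
        return struct.unpack_from("<I", self.cells, 0)[0]


def write_block(sector, version, payload, power_fails_before_commit=False):
    # First pass: the counter field is written as all ones, which leaves
    # those four bytes in their erased state.
    image = struct.pack("<I", ERASED_COUNTER) + payload
    sector.erase()
    sector.program(0, image)
    if power_fails_before_commit:
        return  # simulated power loss: the counter still reads as erased
    # Second pass: go back and program the real counter over the erased bytes.
    sector.program(0, struct.pack("<I", version))
```

If power is lost before the second pass, the sector keeps an erased counter and a reader can ignore it in favour of the other block. Because the counter bytes were never cleared in the first pass, the second pass needs no additional erase.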

11 thoughts on “Overhauling The ESP8266’s Flash Memory Handling”

werecatf says:

September 9, 2016 at 4:01 am

The Arduino core for the ESP8266 splits the flash into two sections, one for the sketch and one for the SPIFFS filesystem. When performing an OTA update, it stores the new sketch at the end of the sketch area, so that area has to be large enough to hold both the new and the old sketch simultaneously. It then sets a flag in the RTC memory and resets the ESP8266, at which point the bootloader notices the flag, copies the new sketch over the old one and proceeds to load it as usual. This way all the contiguous area after the running sketch is available for OTA updates and there’s no fragmentation. The SPIFFS filesystem isn’t touched at all, allowing its contents to remain as they were.

1. M says:
  
  September 9, 2016 at 9:22 am
  
  Neat, but kind of irrelevant? Could this updated method not improve flash usage regardless?
  
  1. werecatf says:
    
    September 9, 2016 at 5:51 pm
    
    Aye, it may be a tad irrelevant. I just shared it as a point of comparison, should anyone find it interesting.
    
    The way the Arduino-core writes the OTA-sketch to flash, then the bootloader writes it again to another location is kind of wasteful. It works, but it needlessly writes the same thing twice, wearing the flash out that much faster.
    
StephaneAG says:

September 9, 2016 at 7:00 am

hi there !
If anyone has a working hack to allow support for 128M flash chips ( Winbond W25Q128 ), I’ll be glad to know about it ;p
https://github.com/themadinventor/esptool/issues/123
http://forum.espruino.com/conversations/279176/?offset=125#comment13202242

thannnnks :)

Matt says:

September 9, 2016 at 8:10 am

This is pretty much how wear leveling in NOR flash parts works.

thereza says:

September 9, 2016 at 9:32 am

I don’t understand how this would work. If you are updating the firmware and the upload dies, then you just guarantee no bad data in a 4K block. But half the code can be from the old firmware and the other half the new firmware.

1. pelrun says:
  
  September 10, 2016 at 6:02 am
  
  No, because the version counters aren’t written until all the blocks are successfully uploaded. And then the bootloader can ensure all the version counters are consistent before deciding which firmware to boot. If they’re not, just start the old firmware.
  
U.S. Water Rockets says:

September 9, 2016 at 9:40 am

Why don’t they just write a simple bootloader that can test for a new firmware stored in a specific flash area and have it checked with a hash and if valid it will copy it over the previous firmware and then mark the update complete? If the update fails due to power cycle, it will just reboot and repeat the process until it succeeds.

1. 4afb says:
  
  September 9, 2016 at 10:01 am
  
  That’s exactly my method. While flashing, keep two running checksums (crc32 or md5 will do): one tracks the data to be written and one tracks the data read back from the flash. At the end both must match, and only then is the flash operation considered successful and the active boot slot changed appropriately. No wasting sectors there, just verify what you wrote, just like in the old days.
  
Blah` says:

September 9, 2016 at 3:22 pm

Thereza: You start by erasing the version number of the code you’re going to replace. Last thing you do is writing the new version number if everything checks out. If something goes wrong the other block has a valid number and will be chosen.

Water Rockets: What if power fails halfway during the copy? You have nothing, and no way to re-download, as the download code will be in the corrupted bootloader.

ItsThatIdiotAgain says:

September 9, 2016 at 4:47 pm

Ooops the curse of HAD strikes again …”enlightens us with a better solution.” => Error establishing a database connection.
Never mind, I assume Peter will have the site back online soon.
