The late 1990s saw the widespread introduction of solid-state storage based around NAND Flash. Ranging from memory cards for portable devices to storage for desktops and laptops, the data storage future was prophesied to rid us of the shackles of magnetic storage that had held us down until then. As solid-state drives (SSDs) took off in the consumer market, there were those who confidently knew that before long everyone would be using SSDs and hard-disk drives (HDDs) would be relegated to the dust bin of history as the price per gigabyte and general performance of SSDs would just be too competitive.
Fast-forward a number of years, and we are now in a timeline where people are modifying SSDs to have less storage space, just so that their performance and lifespan are less terrible. The reason for this is that by now NAND Flash has hit a number of limits that prevent it from further scaling density-wise, mostly in terms of its feature size. Workarounds include stacking more layers on top of each other (3D NAND) and increasing the number of voltage levels – and thus bits – within an individual cell. Although this has boosted the storage capacity, the transition from single-level cell (SLC) to multi-level cell (MLC) and today’s TLC and QLC NAND Flash has come with severe penalties, mostly in the form of limited write cycles and much reduced transfer speeds.
So how did we get here, and is there life beyond QLC NAND Flash?
Floating Gates
At the core of NAND Flash lies the concept of floating gates, as first pioneered in the 1960s with the floating-gate MOSFET (FGMOS). As an FGMOS allows for the retention of a charge in the floating gate, it enabled the development of non-volatile semiconductor storage technologies like EPROM, EEPROM and flash memory. With EPROM each cell consists of a single FET with the floating and control gates. By inducing hot carrier injection (HCI) with a programming voltage on the control gate, electrons are injected into the floating gate, which thus effectively turns the FET on. This then allows the state of the transistor to be read out and interpreted as the stored bit value.
Naturally, just being able to program an EPROM once and then needing to erase the values by exposing the entire die to UV radiation (to induce ionization within the silicon oxide which discharges the FET) is a bit of a bother, even if it allowed the chip to be rewritten thousands of times. In order to make EPROMs in-circuit rewritable, EEPROMs change the basic FET-only structure with two additional transistors. Originally EEPROMs used the same HCI principle for erasing a cell, but later they would switch to using Fowler-Nordheim tunneling (FNT, the wave-mechanical form of field electron emission) for both erasing and writing a cell, which removes the damaging impact of hot carrier degradation (HCD). HCD and the application of FNT are both major sources of the physical damage that ultimately makes a cell ‘leaky’ and renders it useless.
Combined with charge trap flash (CTF) that replaces the original polycrystalline silicon floating gate with a more durable and capable silicon nitride material, modern EEPROMs can support around a million read/write cycles before they wear out.
Flash memory is a further evolution of the EEPROM, with the main distinctions being a focus on speed and high storage density, as well as the use of HCI for writes in NOR Flash, due to the speed benefits this provides. The difference between NOR and NAND Flash comes from the way in which the cells are connected, with NOR Flash named for the way its behavior resembles a NOR gate.
To write a NOR Flash cell (set it to logical ‘0’), an elevated voltage is applied to the control gate, inducing HCI. To erase a cell (reset to logical ‘1’), a large voltage of opposite polarity is applied to the control gate and the source terminal, which draws electrons out of the floating gate due to FNT.
Reading a cell is then performed by pulling the target word line high. Since all of the storage FETs are connected to both ground and the bit line, this will pull the bit line low if the floating gate is active, creating a logical ‘1’ and vice versa. NOR Flash is set up to allow for bit-wise erasing and writing, although modern NOR Flash is moving to a model in which erasing is done in blocks, much like with NAND Flash.
The reason why NAND Flash is called this way is readily apparent from the way the cells are connected, with a number of cells connected in series (a string) between the bit line and ground. NAND Flash uses FNT for both writing and erasing cells, which due to its layout always has to be written (set to ‘0’) and read in pages (a collection of strings), while erasing is performed on a block level (a collection of pages).
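The page and block granularity described above can be sketched as a toy model. This is only an illustration: the `NandBlock` class, its page count and its page size are made up for the example and do not correspond to any real part.

```python
class NandBlock:
    """Toy model of one NAND block: program per page, erase per block."""

    def __init__(self, pages: int = 4, page_bytes: int = 8):
        self.page_bytes = page_bytes
        # Erased flash reads back as all ones (0xFF in every byte).
        self.pages = [bytearray(b"\xff" * page_bytes) for _ in range(pages)]
        self.erase_count = 0

    def program_page(self, index: int, data: bytes) -> None:
        """Program a whole page; programming can only clear bits (1 -> 0)."""
        if len(data) != self.page_bytes:
            raise ValueError("data must fill exactly one page")
        page = self.pages[index]
        for i, value in enumerate(data):
            page[i] &= value  # a 0 bit stays 0 until the block is erased

    def erase(self) -> None:
        """Erase resets every page in the block back to all ones."""
        for page in self.pages:
            page[:] = b"\xff" * self.page_bytes
        self.erase_count += 1


block = NandBlock()
block.program_page(0, bytes([0x0F] * 8))
block.program_page(0, bytes([0xF0] * 8))  # second write without an erase
print(block.pages[0].hex())  # bits have only moved from 1 to 0
block.erase()
print(block.pages[0].hex())  # back to all ones after a block erase
```

Because a cell can only be returned to ‘1’ by erasing its entire block, changing a single page in place means copying the rest of the block elsewhere first, and every erase counts against the block’s endurance.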
Unlike NOR Flash and (E)EPROM, the reading out of a value is significantly more complicated than toggling a control gate and checking the level of the bit line. Instead the control gate on a target cell has to be activated, while putting a much higher (>6V) voltage on the control gate of unwanted cells in a string (which turns them on no matter what). Depending on the charge inside the floating gate, the bit line voltage will reach a certain level, which can then be interpreted as a certain bit value. This is also how NAND Flash can store multiple bits per cell, by relying on precise measurements of the charge level of the floating gate.
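To get a feel for why more bits per cell demand more precise measurements, here is a rough sketch. The 4 V sensing window and the evenly spaced levels are assumptions chosen for round numbers, not figures from any datasheet, and the function names are invented for the example.

```python
def level_spacing(bits_per_cell: int, window_volts: float = 4.0) -> float:
    """Width of one voltage band when the window is split evenly."""
    return window_volts / (2 ** bits_per_cell)


def read_level(cell_volts: float, bits_per_cell: int,
               window_volts: float = 4.0) -> int:
    """Quantize a sensed voltage into one of 2**bits_per_cell levels."""
    levels = 2 ** bits_per_cell
    index = int(cell_volts / level_spacing(bits_per_cell, window_volts))
    return max(0, min(levels - 1, index))  # clamp to the valid range


for name, bits in (("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)):
    print(f"{name}: {2 ** bits:2d} levels, {level_spacing(bits):.2f} V apart")
```

In this simplified picture an SLC cell has 2 V of room per level while a QLC cell has only 0.25 V, so the same amount of charge drift that SLC shrugs off can push a QLC cell into a neighboring level.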
All of this means that while NOR Flash supports random (byte-level) access and erase and thus eXecute in Place (XiP, allows for running applications directly off ROM), NAND Flash is much faster with (block-wise) writing and erasing, which together with the higher densities possible has led to NAND Flash becoming the favorite for desktop and mobile data storage applications.
Scaling Pains
With the demand for an increasing number of bytes-per-square-millimeter for Flash storage ever present, manufacturers have done their utmost to shrink down the transistors and other structures that make up a NAND Flash die. This has led to issues such as reduced data retention due to electron leakage and increased wear due to thinner structures. The quick-and-easy way to bump up total storage size by storing more bits per cell has not only exacerbated these issues, but also introduced significant complexity.
The increased wear can be easily observed when looking at the endurance rating (program/erase (P/E) cycles per block) for NAND Flash, with SLC NAND Flash hitting up to 100,000 P/E cycles, MLC below 10,000, TLC around a thousand and QLC dropping down to hundreds of P/E cycles. Meanwhile the smaller feature sizes have made NAND Flash more susceptible to electron leakage from electron mobility, such as from high environmental temperatures. Data retention also decreases with wear, making data loss increasingly likely with high-density, multiple bits per cell NAND Flash.
Because of the complexity of QLC NAND Flash with four bits (and thus 16 voltage levels) per cell, the write and read speeds have plummeted compared to TLC and especially SLC. This is why QLC (and TLC) SSDs use a pseudo-SLC (pSLC) cache, which allocates part of the SSD’s Flash to be used only with the much faster SLC access pattern. In the earlier referenced tutorial by Gabriel Ferraz this is painfully illustrated by writing beyond the size of the pSLC cache of the target SSD (a Crucial BX500).
Although the writes to the target SSD are initially nearly 500 MB/s, the moment the ~45 GB pSLC cache fills up, the write speeds are reduced to the write speeds of the underlying Micron 3D QLC NAND, which are around 50 MB/s. Effectively QLC NAND Flash is no faster than a mechanical HDD, and with worse data retention and endurance characteristics. Clearly this is the point where the prophesied solid state storage future comes crumbling down as even relatively cheap NAND Flash still hasn’t caught up to the price/performance of HDDs.
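The effect of the cache running out can be estimated with some back-of-the-envelope arithmetic. The sketch below uses the approximate figures quoted above (45 GB cache, 500 MB/s cached, 50 MB/s native) and assumes, as a simplification, that the cache does not drain in the background during the transfer. The function name is made up for the example.

```python
def write_time_seconds(total_gb: float, cache_gb: float = 45.0,
                       cache_mb_s: float = 500.0,
                       native_mb_s: float = 50.0) -> float:
    """Time for one sustained write that may overflow the pSLC cache."""
    fast_gb = min(total_gb, cache_gb)        # portion absorbed by the cache
    slow_gb = max(0.0, total_gb - cache_gb)  # portion written at QLC speed
    return fast_gb * 1000 / cache_mb_s + slow_gb * 1000 / native_mb_s


total_gb = 100
seconds = write_time_seconds(total_gb)
print(f"{total_gb} GB takes {seconds:.0f} s, "
      f"averaging {total_gb * 1000 / seconds:.0f} MB/s")
```

Under these assumptions a 100 GB transfer spends 90 seconds in the cache and 1,100 seconds at native QLC speed, for an average of about 84 MB/s.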
The modification performed by Gabriel Ferraz on the BX500 SSD involves reprogramming its Silicon Motion SM2259XT2 NAND Flash controller using the MPTools software, which is not provided to consumers but has been leaked onto the internet. While not as simple as toggling on a ‘use whole SSD as pSLC’ option, this is ultimately what it comes down to after flashing modified firmware to the drive.
Running the BX500 SSD in pSLC mode knocks the storage capacity down from 500 GB to 120 GB, but the P/E rating goes up from a rated 900 cycles in QLC mode to 60,000 cycles in pSLC mode, a roughly 66-fold increase. The write performance is a sustained 496 MB/s with none of the spikes seen in QLC mode, leading to about double the score in the PCMark 10 Full System Drive test.
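The trade-off can be put into numbers with a quick calculation. This is a naive sketch that simply multiplies capacity by rated P/E cycles; it ignores write amplification, spare area and everything else the controller does, so the results are rough upper bounds rather than a TBW rating. The helper names are invented for the example.

```python
QLC_CAPACITY_GB, QLC_PE_CYCLES = 500, 900
PSLC_CAPACITY_GB, PSLC_PE_CYCLES = 120, 60_000


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


def raw_write_budget_tb(capacity_gb: float, pe_cycles: int) -> float:
    """Naive lifetime write volume: every cell rewritten pe_cycles times."""
    return capacity_gb * pe_cycles / 1000


qlc_tb = raw_write_budget_tb(QLC_CAPACITY_GB, QLC_PE_CYCLES)
pslc_tb = raw_write_budget_tb(PSLC_CAPACITY_GB, PSLC_PE_CYCLES)
print(f"P/E rating: +{percent_increase(QLC_PE_CYCLES, PSLC_PE_CYCLES):.0f}%")
print(f"QLC: {qlc_tb:.0f} TB, pSLC: {pslc_tb:.0f} TB, "
      f"ratio {pslc_tb / qlc_tb:.0f}x")
```

Even after giving up roughly three quarters of the capacity, the naive write budget goes from 450 TB to 7,200 TB, about sixteen times more data written before the rated endurance is used up.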
With all of this in mind, it’s not easy to see a path forward for NAND Flash which will not make these existing issues even worse. Perhaps Intel and Micron will come out of left field before long with a new take on the 3D XPoint phase-change memory, or perhaps we’ll just keep muddling on for the foreseeable future with ever worse SSDs and seemingly immortal HDDs.
Clearly one should never believe prophets, especially not those for shiny futuristic technologies.
Featured image: “OCZ Agility 3 PCB” by [Ordercrazy]
You can get somewhat reliable SSDs from Swissbit. They sell pSLC as well as true SLC drives. Not exactly cheap though.
I can also recommend Cactus Technologies. https://www.cactus-tech.com/
We used to get a lot of PC Card format memory from them for weird embedded computers.
They make SLC and pSLC products, but they have the same cost issues as Swissbit.
Buying NAND to only use 1/4 of it makes the economics kinda poor.
It’s not obvious to me that density is a big problem for NAND flash. I can currently put 8TB on a PCIe NVMe carrier and it’s not obvious that we have a short-term need for more than that. Those drives are still considered huge and difficult to get. They’re also hardly filling up all the space with silicon; both speed and capacity worries could be addressed by just stacking more ICs on a module, or by making motherboards with more M.2 slots. My current laptop came with two M.2 slots; who really needs 16TB of storage on their laptop?
People always say that over the years. “Who really needs more than ____”. Things keep getting bigger and bigger, some games are into the hundreds of GB. For a single game.
A baseline Windows 11 install is 20-27 GB alone. Linux used to fit on less than 4GB with room to spare…now most distros tend to be more (with WM and DE).
Just my opinion, but software optimization in the consumer software market is long gone. Instead unoptimized software (in terms of both storage and performance) is the norm.
I agree. In my old laptop that we take along on trips, I have a 2TB SSD for OS drive and 1TB SSD for data drive… so swimming in disk space. Reason it seems backwards is when SSDs were ‘cheap’, I bought a 2TB SSD to replace the 128G that was in it from the factory. Spread those writes around :) .
Also I think SSDs are ‘reliable’ enough (in most cases) to outlast the system they are running on. I’ve yet to experience an SSD ‘failure’. Knock on wood. That said, I do keep good and many backups — just in case.
Many applications these days are dozens or hundreds of megabytes, but there is no reason why they couldn’t be 1 megabyte.
People forget how much 1 MegaByte is. The entire Bible is only a few MegaBytes and can be compressed to be around 1 MegaByte. Imagine how much source code that is (and compiled it is much more code).
And many applications just need the business logic as operating systems supply many necessary libraries for GUI and IO. But applications come with a lot of bloat.
Not just applications. Operating systems come with a lot of bloat. Often applications that cannot be uninstalled.
The solution is to write more elegant software. Hire better programmers. Software quality is a major issue. This is why software gets slower and bigger every year. It’s like companies stopped caring when they no longer had to fit things on a CD and download speeds improved enough for people to quickly download software.
Multimedia takes up a lot of space.
A wider adoption of JPEG-XL would help a lot (supports lossless/reversible recompression of JPEG, 20% reduction in file size).
Movies take up a lot more space. But with the decline of optical media people want all of it on their drives. I would love optical media to stay.
I still remember my first hard disk on a PC-XT clone. 20 megabytes! Looking at the free space reported by CHKDSK… can it really have so many digits! All my software, tools and utilities took much less than half of the space.
Later, when the free space started to get low, I installed an RLL controller and got 30MB. As disks got cheaper, I added another, 60MB disk, for a total of 90MB, and was able to install Slackware Linux beside the MSDOS. Oh the happiness.
“The solution is to write more elegant software. Hire better programmers. Software quality is a major issue” : Got to remember that ‘time’ is money. So ‘good enough’ is the motto for most cases. Let the consumer find the edge case bugs. That is, unless you are writing mission critical software, where money is no object (relatively) to write tight, efficient, and correct code.
We’ve always known there was a long-term role for both magnetic and solid-state storage, even when there was more optimism about progress in SSDs.
Yet somehow “we” decided to summarily ditch spinning disks from consumer devices, even though it meant smaller and more expensive disks, and in many cases non-replaceable storage and dependence on cloud services. Only one of those things constitutes progress, and only if you’re Amazon or Google – as for users, it looks like we’ve been plain old hoodwinked.
It’s true there are performance gains from SSDs, but a hybrid arrangement can give you all those gains *and* abundant cheap local storage. Except that option was quickly discarded in favor of everyone renting hard disk space (in the cloud) instead of buying it.
I make monthly backups of all my SSD drives to spinning USB drives. So far, no problems with SSD (totalling 2.5 TB) but one 500 GB USB HD has failed. I trust that the probability of both the SSD and (otherwise unused) backup HD failing during same month is sufficiently low.
Of course I also follow the SSD error statistics with smartd and smartctl.
Keeping your data in multiple places is the key. I avoid cloud storage however.
“and dependence on cloud services”: I never bought into the cloud for storage mantra for several reasons of my own. That is why I also use cheap 4TB and 8TB HDD external backup drives for on-site and off-site backups. Speed isn’t a concern when backing up data…. Works well, and my data stays relatively private and safe. FYI, 1TB and 2TB SSD drives are not that pricey and do work well for local storage. I have 5 2TB 2.5 SATA SSD drives, and one HDD (for quick backups) in my local file server. And all desktops and laptops have at least one 2TB SSD drive in them. Works for me.
I have been seeing the issues with SSD life degradation for the past few years, but personally I see all of my small devices as temporary…
I am moving ever closer to having no single copies of data, pushing toward smaller drives in my laptops and SD cards, and syncing all of the data up to my NAS, and once I figure out wireguard, to be replicated to offsite storage and also up to the cloud.
I have just seen too many memory cards fail, things get dropped or lost, etc. to have any trust in *any* single storage medium.
Personally, I really hope that we get significant investment in MRAM to overcome some of these endurance hurdles.
Everspin has been working on this tech for ages now and it seems quite capable and mature, and the low latency, like that of SRAM and 3D XPoint, would be a good fit with the bus-connected CXL technologies we are starting to see.
