It’s a problem familiar to anyone who’s spent a decent amount of time playing with a Raspberry Pi – over time, the flash in the SD card reaches its write-cycle limit and causes a cavalcade of confusing errors before failing entirely. While flash storage is fast, compact, and mechanically robust, it has always had a far shorter write endurance than magnetic storage.
Of course, with proper wear levelling techniques and careful use, these issues can be mitigated successfully. The surprising thing is when a major automaker fails to implement such basic features, as was the case with several Tesla models. Due to the car’s Linux operating system logging excessively to its 8 GB eMMC storage, the flash modules have been wearing out. This leads to widespread failures in the car, typically putting it into limp mode and disabling many features controlled via the touchscreen.
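To illustrate what wear levelling buys you, here is a toy model. It is not how an eMMC controller actually works internally. A hypothetical device has eight physical blocks, and a logger rewrites the same logical block over and over. Without levelling, every rewrite lands on the same physical block. With a simple dynamic scheme that always writes the new copy into the least-worn free block, the wear is spread across the whole device. The `ToyFlash` class, `simulate` function, and block counts are made up for illustration.

```python
class ToyFlash:
    """Hypothetical flash device that tracks program/erase cycles per physical block."""

    def __init__(self, num_blocks: int, wear_level: bool) -> None:
        self.wear_level = wear_level
        self.erase_counts = [0] * num_blocks
        self.mapping = {}  # logical block -> physical block

    def write(self, logical: int) -> None:
        if not self.wear_level:
            # Naive mapping: logical block N always lives in physical block N,
            # so every rewrite wears out that one physical block.
            self.mapping[logical] = logical
            self.erase_counts[logical] += 1
            return
        # Dynamic wear levelling: write the new copy into the least-worn
        # block that is not currently in use, then release the old copy.
        used = set(self.mapping.values())
        free = [b for b in range(len(self.erase_counts)) if b not in used]
        target = min(free, key=lambda b: self.erase_counts[b])
        self.erase_counts[target] += 1
        self.mapping[logical] = target  # old physical block becomes free


def simulate(wear_level: bool, writes: int = 800, blocks: int = 8) -> list[int]:
    flash = ToyFlash(num_blocks=blocks, wear_level=wear_level)
    for _ in range(writes):
        flash.write(logical=0)  # a logger hammering a single "log" block
    return flash.erase_counts


print(max(simulate(wear_level=False)))  # 800: one block takes every cycle
print(max(simulate(wear_level=True)))   # 100: cycles spread over 8 blocks
```

Levelling only divides the wear across the chip. It does not reduce the total amount written, which is why excessive logging eventually exhausts even a well-levelled device.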
With the issue affecting important subsystems such as the heater, defroster, and warning systems, the NHTSA wrote to the automaker in January requesting a recall. Tesla reluctantly agreed to the request while downplaying the severity of the issue. Now the company is claiming that the eMMC chip should be considered a “wear item”, and thus should not be subject to such scrutiny. This is despite the chip being ball-grid soldered to the motherboard, inaccessible without disassembling the dash, and not specifically mentioned in the owner’s manual.
Certainly An Odd Wear Item
Historically, major electronic parts in automobiles have not been considered consumables. While it’s not uncommon for some cars to face issues with engine control units or body control modules, these parts are not typically treated as wear items to be replaced at scheduled intervals. So far, precedent has treated them as parts that should last the lifetime of the vehicle and be replaced only in the case of unexpected malfunction. The Tesla case is different because the eMMC failure is, by and large, inevitable. Rather than isolated malfunctions in a small percentage of cars, as would be expected from the occasional manufacturing defect, this is an issue affecting every car that rolled off the line up to a certain date. Failure rates reach up to 30 percent in certain build months. With the computer and touchscreen in charge of so many vital vehicle functions, it’s not a defect the end user can easily ignore.
Tesla’s assertion that the eMMC chip should be considered a ‘wear item’ is a dubious one at best. Flash memory does wear out, it’s true, as Tesla points out when discussing the limits of the technology. Many parts on a modern car wear out over time – brake pads, belts, and air filters are all common examples. The difference is that these parts are all designed to be replaced by the end user or a typical mechanic.
Trying to claim that a ball-grid array chip, permanently soldered onto a PCB and buried inside the dashboard, is a wear item is patently ludicrous. If it were, we’d expect to see several things. There’d be a recommended time and mileage at which the eMMC should be changed to avoid surprise failures, and this would be listed in the manual. Additionally, Tesla’s repair process would involve desoldering the eMMC chip from the board and replacing it directly. The fact that Tesla are instead replacing the computers as a whole indicates that nobody, anywhere, is treating the part as a wear item.
Obviously, the chip can be replaced, but it’s no easy job. Once the computer’s main board has been extracted from the car, the storage must be backed up over JTAG. The chip must then be removed by carefully reflowing the board, a delicate process with a significant chance of damaging other components. If the chip were a wear item, changing it wouldn’t require specialist BGA reflow equipment. We’d see Tesla doing it routinely, replacing a sub-$7 chip rather than swapping out entire mainboards at a cost of thousands of dollars. Granted, some parts of modern cars are also time-consuming to replace, such as timing belts and water pumps. Once again, however, in those cases automakers make it clear ahead of time that these are wear items, publish maintenance schedules for them, and define standard procedures for changing them.
Nobody would put up with swapping out their entire front suspension setup every time their brakes wore out – automakers realised brake pads were wear items and designed accordingly. Tesla simply dropped the ball, writing too often to the flash memory, which isn’t easily replaceable. The proper solution is trivial. Either stop logging so much to flash storage, or make it easier to swap out.
And maybe put the logs on their own separate storage. While SD cards probably aren’t up to snuff for holding the car’s operating system, they’d make a cheap place to store non-critical logs that are probably never read anyway. Alternatively, put the eMMC chip on a removable module, or just use an M.2 drive with automotive-rated connectors.
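Another cheap lever on the “stop logging so much” front is to decide which messages are worth persisting at all. The sketch below uses Python’s standard `logging` module. `FlashWriteCounter` is a hypothetical stand-in for a handler that writes to flash. It counts how many records would actually reach storage when chatty DEBUG traffic is filtered out and only WARNING and above are persisted. The message volumes are invented for illustration.

```python
import logging


class FlashWriteCounter(logging.Handler):
    """Hypothetical stand-in for a handler that persists records to flash."""

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level)
        self.writes = 0

    def emit(self, record: logging.LogRecord) -> None:
        self.writes += 1  # a real handler would write to storage here


def run(persist_level: int) -> int:
    logger = logging.getLogger(f"car.demo.{persist_level}")
    logger.setLevel(logging.DEBUG)  # the application still produces everything
    logger.propagate = False        # keep the demo away from the root logger
    flash = FlashWriteCounter(level=persist_level)
    logger.addHandler(flash)
    for i in range(1000):
        logger.debug("sensor tick %d", i)  # chatty, high-volume telemetry
        if i % 100 == 0:
            logger.warning("voltage dip at tick %d", i)  # rare, worth keeping
    logger.removeHandler(flash)
    return flash.writes


print(run(logging.DEBUG))    # 1010 records would hit storage
print(run(logging.WARNING))  # 10 records would hit storage
```

The handler’s level acts as the gate. The logger still sees every message, but only records at or above `persist_level` are handed to the storage-backed handler.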
The issue is claimed to only affect models built prior to March 2018, which run on an NVIDIA Tegra 3. Later models are based on the Intel Atom and feature a larger eMMC chip on board. These modules are yet to demonstrate the same failures, and Tesla claim they should not suffer the issue. We’ll see.
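A back-of-the-envelope estimate shows why a larger chip only buys time. Assuming ideal wear levelling, total endurance scales with capacity times the rated program/erase (P/E) cycles. That figure is then divided by how much actually gets written per day, inflated by write amplification. All inputs below are hypothetical placeholders, not measured Tesla figures: 3,000 P/E cycles, a write-amplification factor of 2, and 5 GB of host writes per day.

```python
def flash_lifetime_days(capacity_gb: float, pe_cycles: int,
                        host_writes_gb_per_day: float,
                        write_amplification: float) -> float:
    """Rough days until the rated P/E budget is spent, assuming ideal wear levelling."""
    total_endurance_gb = capacity_gb * pe_cycles  # data the cells can absorb in total
    nand_writes_per_day = host_writes_gb_per_day * write_amplification
    return total_endurance_gb / nand_writes_per_day


# Hypothetical workload: none of these numbers are measured values.
for capacity in (8, 16):
    days = flash_lifetime_days(capacity, pe_cycles=3000,
                               host_writes_gb_per_day=5, write_amplification=2)
    print(f"{capacity} GB: {days:.0f} days (~{days / 365.25:.1f} years)")
```

Under these made-up numbers, doubling the capacity doubles the estimate, but the result is still finite. Cutting the daily write volume has exactly the same effect as buying more capacity.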
8 thoughts on “Tesla Recalls Cars With EMMC Failures, Calls Part A ‘Wear Item’”
As a Tesla owner, I know this is going to be an issue. I saw it before I bought my Model Y. Why they didn’t go for removable flash is beyond me. Even as tech advances, the packages will change, but at least there are adapters to deal with that. In this case, it’s far more complicated than replacing the flash on an iPhone, which most consumers couldn’t do today.
Well, flash wears, so why wouldn’t it be a “wear item”? Most products incorporate wear leveling, but regardless, flash will not last forever. Tesla is just like any car company: the early adopters are the ones who get to find all the bugs. First-gen cars are typically riddled with them.
The fact that it eventually wears out is not in dispute; what is in dispute is whether the chips were ever intended to be replaced. If they had only written to the memory on rare occasions, it could easily have lasted millions of miles of driving.
I find the statement:
“Now they are claiming that the eMMC chip, ball-grid soldered to the motherboard, inaccessible without disassembling the dash, and not specifically mentioned in the owner’s manual, should be considered a “wear item”, and thus should not be subject to such scrutiny.”
To seem very much in line with how Tesla and other tech companies operate.
Obviously nothing can ever actually be “incorrectly made”…
One common way to keep “wear” on flash chips from becoming a big issue is to use something else for the actual day-to-day activity. One example is battery-backed RAM that only writes to flash if the main power source disappears.
For an electric car with a 100 kWh battery in it, running a small bank of RAM for months should be fairly trivial, after all… Give it a supercapacitor and it can quickly save its contents to flash if the battery ever suddenly disappears. In this setup, the wear on the flash will likely be less than the wear on the car in general. (This is fairly common in many other applications where flash storage is used in a high-wear environment, for example battery-backed disk caching on some RAID cards.)
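The battery-backed RAM idea amounts to a write-back cache. Updates are absorbed in RAM and only committed to flash when power is about to go away. In the minimal sketch below, `WriteBackLog` and `on_power_loss` are hypothetical. `on_power_loss` stands in for whatever supercapacitor-powered power-fail interrupt the hardware would provide. The `flash` list simply records what would have been committed.

```python
class WriteBackLog:
    """Hypothetical RAM-first log store that commits to flash only on power loss."""

    def __init__(self) -> None:
        self.ram = []            # battery-backed RAM: cheap to rewrite endlessly
        self.flash = []          # what has been persisted to flash so far
        self.flash_commits = 0   # number of (expensive) flash write bursts

    def append(self, line: str) -> None:
        self.ram.append(line)    # day-to-day writes never touch flash

    def on_power_loss(self) -> None:
        # Assumed power-fail hook: dump everything in one burst while the
        # supercapacitor still holds enough energy, then clear the RAM copy.
        if self.ram:
            self.flash.extend(self.ram)
            self.flash_commits += 1
            self.ram.clear()


store = WriteBackLog()
for i in range(10_000):
    store.append(f"event {i}")
store.on_power_loss()
print(store.flash_commits, len(store.flash))  # 1 10000
```

Ten thousand updates cost a single flash write burst here. A real design would also need to cap how much RAM the log may use, and to size the supercapacitor so the final dump can always complete.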
That said, if one wants to treat the flash chip as a wear item, it shouldn’t be soldered to a board unless that board is a daughterboard. (I.e., the storage should sit in a socket or connector.)
Then we have:
“The issue is claimed to only effect models built prior to March 2018, which run on an NVIDIA Tegra 3. Later models are based on the Intel Atom, and feature a larger eMMC chip on board. These modules are yet to demonstrate the same failures, and Tesla claim they should not suffer the issue. We’ll see.”
It doesn’t really matter that the eMMC chip is larger, it still wears.
The only exception is if it is so large that the time for it to wear out approaches the life expectancy of the car. But even then, one generally prefers that a device doesn’t become unusable because a relatively cheap part broke, so the eMMC flash should probably be on a connector regardless. (This is one reason I personally don’t like SBCs with eMMC flash: to me, they are just future e-waste waiting to happen. Why throw away a whole SBC just because the flash died? Though with SBCs, we can usually just boot off something else…)
I do, though, like Lewin Day’s suggestion of keeping the logs and other non-vital data on a separate storage device that is far easier to replace.
Yes, this was an unforeseen problem that sandbagged Tesla. All they can do is fight their way out of it. Hopefully they may be able to get some recourse from the BGA makers?
This is a time and labor issue. Tesla should do a time-and-motion study to optimise the repair. There will be a number of different modules. They could set up a central depot to repair each type of module and keep a stock of repaired units ready to ship to the dozens of service bureaux all over. Then they can schedule cars in for repair with known-good modules already at hand to minimise the work. A dedicated, trained crew could handle the modules. No matter how this is done, it will be a large hit, and one they must eat.
All they can do is minimise it and supply a loaner if possible.
A sad, unforeseen event. Sockets would have mitigated this, if only it had been predictable.
Someone suggested that it could have been anticipated and, with changes to the way Linux addresses, refreshes, and wear-levels the flash, avoided.
“Nobody would put up with swapping out their entire front suspension setup every time their brakes wore out..”
No, but don’t put it past a car maker to do something equally stupid. The lower control arm bushings on my Civic are cast into the control arm. Want to replace the bushing? Replace the entire part. I can see why they do it, but what a waste.
At the first company I worked for, we had products based on several SBCs (RPi1 among them). We did use SD cards, but they were completely read-only except during system updates. It is not that hard to build such a system using Buildroot, but it is not completely easy either. Some GNU/Linux components insist on having access to mutable storage, and even if you use tmpfs, you have to make sure that it doesn’t run out of memory…
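The caveat about tmpfs running out of memory is usually handled by capping how much volatile log data can accumulate. On a real system, that cap might be tmpfs’s `size=` mount option. The sketch below shows the same idea inside a single process. It uses `collections.deque` with `maxlen` as a fixed-size ring buffer that silently discards the oldest entries. The `RamLog` class and the 1,000-line cap are illustrative choices, not anything from a Buildroot configuration.

```python
from collections import deque


class RamLog:
    """Hypothetical in-RAM log with a hard cap, so it can never exhaust memory."""

    def __init__(self, max_lines: int) -> None:
        # deque(maxlen=N) drops the oldest entry once N entries are stored.
        self.lines = deque(maxlen=max_lines)

    def write(self, line: str) -> None:
        self.lines.append(line)  # never touches flash; oldest lines fall off

    def dump(self) -> list[str]:
        return list(self.lines)


log = RamLog(max_lines=1000)
for i in range(50_000):
    log.write(f"boot message {i}")
print(len(log.dump()), log.dump()[0])  # 1000 boot message 49000
```

The trade-off is explicit: memory use is bounded, and the price is that old messages are lost rather than written to flash.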
This is exactly the most annoying aspect of this whole thing: almost everyone who’s done serious SBC development knows not to fall into this trap. “You’ll wear out your flash!” is almost as common a critique as “3D prints aren’t food safe!”
To simply not have bothered implementing the design correctly is unforgivable.
