[eigma] had a difficult problem. They pulled a TV out of the trash and brought it home, only to find it suffering from a boot loop that made it basically useless. As so many of us do, they decided to fix it…which ended up being a far bigger task than initially expected.
The TV in question was a Samsung UN40H5003AF. Powering it up would net a red standby light which would stay on for about eight seconds. Then it would flicker off, come back on, and repeat the cycle. So far, so bad. Investigation began with the usual—checking the power supplies and investigating the basics. No easy wins were found. A debug UART provided precious little information, and schematics proved hard to come by.
Eventually, though, investigation dialed in on a 4 MB SPI flash chip on the board. Dumping the chip revealed the firmware onboard was damaged and corrupt. Upon further tinkering, [eigma] figured that most of the dump looked valid. On a hunch, suspecting that maybe just a single bit was wrong, they came up with a crazy plan: use a script to brute-force flipping every single bit until the firmware’s CRC check came back valid. It took eighteen hours, but the script found a valid solution. Lo and behold, burning the fixed firmware to the TV brought it back to life.
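The write-up here doesn't give the actual script or the checksum parameters the firmware uses, so the following is only a minimal Python sketch of the idea. It assumes the image is protected by a standard CRC-32 (the one `zlib.crc32` computes) and that the expected checksum is already known; the function name `find_single_bit_fix` is made up for illustration.

```python
import zlib


def find_single_bit_fix(data: bytes, expected_crc: int):
    """Flip each bit in turn until the CRC-32 matches.

    Returns (byte_index, bit_index, repaired_bytes), or None if no
    single-bit flip yields the expected checksum.
    Assumption: plain zlib-style CRC-32 over the whole buffer.
    """
    buf = bytearray(data)
    for byte_index in range(len(buf)):
        for bit_index in range(8):
            mask = 1 << bit_index
            buf[byte_index] ^= mask            # try flipping this bit
            if zlib.crc32(buf) == expected_crc:
                return byte_index, bit_index, bytes(buf)
            buf[byte_index] ^= mask            # no match, restore it
    return None
```

A loop like this recomputes the checksum over the whole buffer for every candidate bit, so its cost grows with the square of the image size. That makes it painfully slow on a multi-megabyte dump, but it is very little code to write.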
It feels weird for a single bit flip to kill an entire TV, but this kind of failure isn’t unheard of. We’ve seen other dedicated hackers perform similar restorations previously. If you’re out there valiantly rescuing e-waste with these techniques, do tell us your story, won’t you?
“which ended up being a far bigger task than initially expected.”
respect…
“It feels weird for a single bit flip to kill an entire TV…”
As components get smaller, the scale of things that represent fundamental hazards does as well. It’s not my specialty (by a lot), but with SPI flash relying on “the presence or absence of trapped electrons on the floating gate” (yeah, I googled it…) for memory storage, things like charged particles/cosmic rays come into play in creating random failures. You can fill in all the vacuum tube/555 timer jokes here, but I think we’ll see more and more of this as the trend continues.
The manufacturers will likely just warranty out the customer complaints until the perceived service span of the devices begins to be questioned (e.g. “All Samsung TVs die after about 14 months”) and it begins to look like a market-share problem that can’t be solved by monopolization.
Samsung warranty the customers? Hell no! 5 TVs and 3 monitors. Within 2 years, 3 TVs are dead or about dead with one doing the boot loop, 2 TVs have lines missing on the screens, and 1 has dark spots that are extremely visible on a uniform white background. One monitor decides once in a while to show noise on the screen. I have to turn the screen off and on to restore it. Another one has the bezel separating itself from the monitor and needs to be snapped back into place.
Answer from Samsung: Sorry, buy a new one.
I had a SwampScum monitor on my PC. When it started to act up, I ended up breaking the case and then the screen trying to open it up. I guess the plastic case was welded together.
Yeah, and no Service Manual available.
Or alternatively arm their service ‘techs’ with a stanley knife (box cutter for those across the ocean) who subtly stab the TV whilst the owner is away then claim the TV was damaged and warranty void. Friends don’t let friends buy Samsung TVs.
Intel had a batch of (Atom?) CPUs which died after ~12-18m. Unfortunately they went into NASs.
The NAS manufacturer sent me a replacement under warranty for the first one, but when that died it was out of warranty. Intel refused to accept responsibility. Thankfully their later model had a different CPU, so I just replaced it with that.
You don’t know who has the issues until they happen. And it’s often not the manufacturers’ fault, it’s a part they bought and thought they could trust.
Don’t simply accept the manufacturer’s warranty terms – there’s a good chance that you have statutory warranty rights that extend longer than that, potentially up to the expected lifetime of the device. The manufacturer will generally do everything they can to pretend they don’t exist, or insist that they have already expired, but invoking your local consumer affairs entity is often enough to get them to suddenly decide they can “do you a favour”.
Capitalism!
With SPI (NOR flash) it’s not radiation, they’re stupidly more radiation resistant than NAND/DRAM/etc. It’s more likely either static or temperature plus age: the floating gate structure breaks down and the barrier becomes smaller. Happens with NAND too, but because it happens so often the NAND controller refreshes it.
To be fair, failed ‘AndroidTV’ updates (or successful ones), or flash wearing out since they are usually only running 4GB chips, are what kill the last few batches of TVs.
I don’t see the bit flipping to be that remarkable. Voyager ring any bells? I probably have dozens on my equipment today, but until I need to read the bit, I would never know.
Oh Christ. Wait until Europa Clipper starts in with its special MOSFETs in a few years. lol maybe we’ll have to start annealing our TVs.
Not really. There is a reason why some of us use ZFS and ECC RAM (which became a lot cheaper when Intel decided to bundle that with the i3s in the 2010s).
I’m going with a different tack on this than “not building to last”.
I think this is why we should move to separating the panel and controller in displays. I think it’d be much more interesting if we could swap out the controller for a good display to something with better features for less cost than an entire new TV when a new HDMI standard comes out.
This could also make for more interest in open source display controllers that could do more crazy or interesting and advanced tricks to get more out of panels.
Like a dumb TV, you mean? The smart part of the TV is just for the producer, who can sell you a new one after the firmware has been abandoned.
I think it’d be interesting to go even further than a dumb TV.
I mean something where there is only an input for some universal display controller standard and the controller handles inputs and everything.
I think a TV that is literally just a panel with a PSU would be very interesting.
So a giant monitor?
Almost, but a monitor still has a display controller and takes HDMI or DP, then converts, scales, etc. and then writes to the panel.
I mean a panel nearly by itself.
Embedded DisplayPort is a thing (abbreviated eDP), otherwise it’s the ye olde LVDS standard for panels.
I wonder if the TV would have worked fine if Samsung just had a CRC warning, while letting the boot continue? There is a decent chance the corruption would be in something nonessential. One could make products last longer by just being less finicky.
If it was boot looping, it probably wasn’t just being picky. It was likely an instruction changed to a completely different one that crashed the CPU.
It was in fact just the CRC which caused the boot process to halt. No idea what the affected bit actually does.
This exists in the form of commercial displays for digital signage.
https://www.youtube.com/watch?v=q9a3dCd1SQI
Wholeheartedly agree. A display should be just that. But then consumers want more than a display, they want the smartest of smart TVs which now the manufacturers are adding in all sorts of privacy/ad ‘features’ so if you could buy a display it would likely cost a hell of a lot more than the equivalent ‘smart’ TV.
Just as a use case, apparently my flagship Sony TV supports Alexa (yes I have Alexa spying on me). However in my European region the app isn’t allowed, because. So I splashed out on a Fire TV bought in the UK which allows my old Echo to control the TV.
If they made a system where a single bitflip kills it, they did it intentionally.
I don’t mean “mustache twisting evil”.
I mean “Hey, isn’t this going to be bad for people? We should do it in a way that fails gracefully.” Management: “Do it the cheaper way or you are fired.”
That is willful.
We CHOOSE to make disposable products.
We CHOOSE to value saving 1% on a BOM and then throw it away while 99% is perfectly serviceable.
This is what you get when you over optimize a system towards making “line go up” instead of making “good thing”.
What, no fail-safe boot-recovery mode? I’m shocked!
Never had to worry about a TV boot looping 50 years ago.
You just took out the tubes and ran them to the local store and plugged them into the tube tester and found the bad one and bought a replacement.
TVs these days are basically free. If a 50″ TV is $250, accounting for inflation it’s less than half the cost of the 13″ Sony my Dad bought in the mid-80s. And a 42″ will probably do fine as well. If you are buying the latest Quantum Dot Organic LED model, then get an extended warranty?
The TVs 50 years ago would get an error in the vertical sync and the picture would bounce up and down. I remember my eyes ‘bouncing’ on their own for a little while after playing video games at a friend’s house X’D
Reminds me of the zero ohm resistor that broke on the macbook laptop. Apple deemed it beyond repair but Louis Rossmann (youtube) fixed it.
My TV repair story is different, but lacking any documentation of the process, there’s no story to submit. So I guess a comment will have to do.
My parents’ aging TV developed a fault where the picture would go out after a short period of time, five or ten seconds I think it was. After a bit of troubleshooting, I determined that the panel was still displaying the picture, but the backlight was going out. Great, probably just a power supply problem, right?
Nope! Near as I could tell, it was fine. Alas, the testing methodology eludes me so I can’t elaborate. I just know I somehow came to the conclusion that it was working properly.
So I kept poking around the innards of the TV, and eventually found a line that used a 0-5v signal to control the brightness of the backlight. Except that whatever was driving this was the source of my problem. I’d get 5v for a short bit on power-on, and then it’d drop out. Since the TV was otherwise trashed, and had in fact already been replaced, I chanced tying it to the 5v rail of the PSU, and tada! Permanent, 100% brightness backlight.
Sure, it’s not possible to adjust the brightness anymore, and for a while I was concerned something might burn out because at the time it didn’t occur to me to slap a resistor in there, but it’s been running that way for years now.
And best of all, it’s not a smart TV >_>
i love that bit flip approach! dumb but effective
speaking of flipped bits…i read somewhere that RAM had a certain number of flipped bits per month per gigabyte or whatever and figured my PC with 16GB of RAM must have bit flips in RAM??? i couldn’t believe it. so i made a memtest program that tests 4GB of RAM at a time (using mlock()), periodically re-allocating the memory to get a different set of pages. and lo, it revealed a flip! i noticed after a few months of running it that it was one of three specific bits each time…so i blacklisted those addresses in the kernel commandline and that ‘solved’ the problem. i’ve since bought new RAM and i haven’t been able to detect a fault in it.
no punchline just an anecdote :)
Those memory blacklists are deprecated. At least they were last time I tried it. Bummer. Windows even used to have the option. Sad.
Not using any particularly sophisticated techniques but I’ve diverted 8 or so cheap white box dumb TVs from their dumpster destiny.
15 or so were purchased and installed all in one go. A couple of years later they started failing in quick succession. I was the guy installing the replacements, so I dragged the dead ones home to investigate.
Bad LEDs turned out to be the culprit. As I had broken the LCD on a couple while trying to disassemble them, I had no guilt about cannibalizing parts.
There were 9 strips of five 6-volt LEDs in series. One dead LED took out a bank of 3 strips, but the remaining backlight segments would still light. A second dead LED in another bank shut down the backlight.
Once I figured out the two-dead-all-dead pattern, it was just a matter of probing each LED with a 6-volt wall wart to find the bad ones and replacing the strips where the bad ones were.
I now have the issue of what to do with the pile of 55″ TVs. Talk about your first world problems….
Ahhhh …. If only that had occurred when David Letterman used to throw stuff off of a building and film it!
The TV manufacturers figured that one out. Now they are clipping the LCD in so tightly that you can’t get to the backlight without cracking it 😢.
replace the capacitor
Smart TVs are the dumbest idea, – Murphy’s law
Good effort! Lucky if indeed one bit was actually flipped (could have found a different solution as well now). I wonder how these flash chips typically fail. Only experience I have is after overheating a pcb for a night. Comparing the flash afterwards, looked like most bytes changed. If I even compared with the correct binaries, not sure..
If he was able to verify the CRC outside of the TV, I wonder if recomputing it with the single bit flip still in would have worked too. Good chance it would?
Yep, I think that would have worked. But would have left a time bomb in an unknown part of the TV firmware. (or maybe just an unimportant image/font file)
Something interesting I found while developing a project at work:
If there is a CRC with known parameters, you can correct any 1-bit error if you have the failed CRC value. Many 2-bit errors can be corrected but it works a little like even-numbered-roots – there can be two possible ‘corrections’ to the error and only one is the original data.
That being said, trying every bit was probably faster than writing code for CRC error correction.
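For the curious, the single-bit case can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses the standard reflected CRC-32 from `zlib`, not whatever the TV firmware actually uses, and the function name `locate_single_bit_error` is hypothetical. It relies on the fact that XORing the computed CRC with the expected one gives a "syndrome" that depends only on which bit flipped and how far that bit is from the end of the data.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib


def locate_single_bit_error(data: bytes, expected_crc: int):
    """Return (byte_index, bit_index) of a single flipped bit, or None.

    Assumption: zlib-style CRC-32 over the whole buffer.
    """
    syndrome = zlib.crc32(data) ^ expected_crc
    if syndrome == 0:
        return None  # checksum already matches, nothing to locate
    # Syndrome produced by an error in the very last bit processed
    # (bit 7 of the final byte, since bits are consumed LSB first).
    s = POLY
    for byte_index in range(len(data) - 1, -1, -1):
        for bit_index in range(7, -1, -1):
            if s == syndrome:
                return byte_index, bit_index
            # Moving the error one bit earlier is the same as clocking
            # one more zero bit through the CRC register.
            s = (s >> 1) ^ (POLY if s & 1 else 0)
    return None  # syndrome does not match any single-bit error
```

Flipping the reported bit, for example `buf[byte_index] ^= 1 << bit_index` on a `bytearray` copy, should then make the checksum match. This walks the data once instead of recomputing the CRC for every candidate bit, which is where the time saving over brute force comes from.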