Date: 2020-07-14

It should have been another fine day, but not all was well in paradise. Few things bring a creeping feeling of doom like a computer that hardlocks and then refuses to boot. The clicking sound coming from the tower probably isn’t a good sign either. Those backups are up to date, right? Right?
There are some legends and old stories about hard drive repair. One of my favorites is the official solution to stiction for old drives: Smack it with a mallet. Another trick I’ve heard repeatedly is to freeze a hard drive before trying to read data off of it. This could actually be useful in a couple instances. The temperature change can help with stiction, and freezing the drive could potentially help an overheating drive last a bit longer. The downside is the potential for condensation inside the drive. Don’t turn to one of these questionable fixes unless you’ve exhausted the safer options.
For the purpose of this article, we’ll assume the problem is the hard drive, and not another component like a power supply or SATA cable causing problems. A truly dead drive is a topic for another time, but if the drive is alive enough to show up as a block device when plugged in, then there’s hope for recovering the data. One of the USB to SATA cables available on your favorite online store is a great way to recover data. Another option is booting off a Linux DVD or flash drive, and accessing the drive in place. If you’re lucky, you can just copy your files and call it a day. If the file transfer fails because of the dying drive, or you need a full disk image, it’s time to pull out some tools and get to work.
As a hard drive degrades, individual sectors can become unreadable. This is an expected process, and modern drives are built with spare sectors to fend off the inevitable. As sectors begin to become unreliable, they are retired, and spare sectors are used instead. When the spare sectors are gone, the disk begins accumulating unreadable sectors. An unreadable sector in the middle of a file will kill a file transfer, or maybe even make the device unmountable. The ironic part is that it’s usually only a tiny percentage of the disk that’s unreadable. If only there was a way to manage those unreadable sectors.
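One way to see how far along a drive is in that process is to look at its SMART counters. The sketch below is only an illustration: it assumes the smartmontools package is installed and that the drive reports the commonly used attribute names Reallocated_Sector_Ct and Current_Pending_Sector, which not every drive does. The helper name smart_raw_value is made up for this example.

```shell
# Print the raw value of one SMART attribute, reading `smartctl -A` output on stdin.
# In the attribute table, column 2 is the attribute name and column 10 the raw value.
smart_raw_value() {
    local name="$1"
    awk -v name="$name" '$2 == name { print $10 }'
}

# Typical use (needs root; /dev/sda is a placeholder for your drive):
#   sudo smartctl -A /dev/sda | smart_raw_value Reallocated_Sector_Ct
#   sudo smartctl -A /dev/sda | smart_raw_value Current_Pending_Sector
```

A reallocated count that keeps climbing, or any pending sectors at all, is a good hint that it’s time to start copying data off.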
Turning to DDRescue
The amateur sysadmin has a potent tool in his toolkit: ddrescue. It’s a descendant of sorts of the venerable dd disk copy tool, but with an important difference. When dd encounters a read error, it stops the transfer and displays the error. ddrescue makes a note of the error, leaves a blank spot in the output file, and continues transferring what data it can. Because there is a record of the missing chunks, we can keep trying to read the missing parts, and maybe recover more data.
To get ddrescue running, we give it an input, an output, and a mapfile.
ddrescue /dev/sda diskimage.img mapfile.log
By default, ddrescue goes through three phases of rescue. First, it copies the drive in large blocks until it hits an error. For a drive that’s working perfectly, this operation completes without issue and the whole drive is copied. If a block can’t be copied, or is even particularly slow in responding, ddrescue jumps ahead, hopefully beyond the problem.
The second phase is trimming. To put it simply, ddrescue starts at the end of each skipped section, and works backwards till it hits a bad sector. The purpose is to recover the largest amount of data as quickly as possible, and to establish exactly which sectors are the problematic ones. The last phase is scraping, where each unread sector is examined individually, attempting to read the data contained. Each time a sector is read, the mapfile is modified to keep track.
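Since the mapfile is plain text, it’s easy to peek at how a rescue is going. Here is a minimal sketch that totals the bytes in each state. It assumes the GNU ddrescue mapfile layout: comment lines starting with #, one status line, then "pos size status" rows in hex, where + means rescued, - means a bad sector, and ?, * and / mean not tried, not trimmed and not scraped. The function name mapfile_summary is my own invention.

```shell
# Sum the block sizes in a ddrescue mapfile by state and print the totals in bytes.
mapfile_summary() {
    local map="$1" rescued=0 bad=0 pending=0 pos size status
    while read -r pos size status; do
        # Skip comments and blank lines.
        case "$pos" in '#'*|'') continue ;; esac
        # Skip the current-status line: its second field is not a hex size.
        case "$size" in 0x*) ;; *) continue ;; esac
        case "$status" in
            '+') rescued=$((rescued + $size)) ;;
            '-') bad=$((bad + $size)) ;;
            *)   pending=$((pending + $size)) ;;   # ?, * or / still has work left
        esac
    done < "$map"
    echo "rescued=$rescued bad=$bad pending=$pending"
}

# Typical use:
#   mapfile_summary mapfile.log
```

Watching the bad and pending totals shrink between runs is a decent way to judge whether another pass is worth the wear on the drive.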
A sector might fail to read 15 times in a row, and on the 16th attempt, finally read successfully. Because of this, ddrescue supports making multiple retry passes over the remaining bad sectors, in alternating directions (the -r option). Part of the theory is that the read head alignment might be slightly different when approaching the sector from a different location, and that difference might be enough to finally get a successful read.
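A common way to put those retry passes to work is a two-stage run: grab everything easy first, then come back for the stubborn sectors. The sketch below follows the pattern of the two-stage example in the GNU ddrescue manual, using -n to skip scraping on the first run, then -d for direct disk access and -r for retry passes on the second. The helper names run and staged_rescue, the DRY_RUN variable, and the default of three retries are assumptions for illustration, not anything ddrescue provides.

```shell
# Run a command, or just print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stage 1 copies the easy data; stage 2 goes back for the hard parts.
staged_rescue() {
    local src="$1" img="$2" map="$3" retries="${4:-3}"
    # -n (--no-scrape) skips the slow scraping phase on the first run.
    run ddrescue -n "$src" "$img" "$map" || return
    # -d reads the disk directly; -rN makes N retry passes over the bad sectors.
    run ddrescue -d "-r$retries" "$src" "$img" "$map"
}

# Typical use (as root; the device name is a placeholder):
#   staged_rescue /dev/sda diskimage.img mapfile.log 3
```

Because both stages share the same mapfile, the second run picks up exactly where the first left off instead of starting over.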
When It’s Not So Simple
While the ideal operation of ddrescue is straightforward enough, there are some potential problems to be aware of. The first is heat. The process of trying to recover data from an already dying drive can quickly overheat it, and make further reads impossible. The best and simplest solution is a fan blowing cool air over the drive. The other common problem I’ve encountered is a bit harder to explain, but it’s identified by a specific error message: ddrescue: Input file disappeared: No such file or directory. When trying to read from the drive, something went wrong badly enough that the drive has disappeared from the system. My theory in this case is that the firmware on the drive itself has crashed and halted. Regardless, unpowering and repowering the drive is usually enough to get back to work.
This means that for a particularly stubborn drive, the process of recovering bits feels a lot like babysitting. Power cycle the drive once it crashes, and restart ddrescue, over and over and over again. Since the read fails as a result of the crash, that sector is marked as bad, and the rescue attempt jumps past it. Sectors in good shape might not trigger the crash, so some data gets read.
If you think that spending hours power cycling a hard drive doesn’t sound like a fun task, and is something that should be automated, then you’re right. It’s easy enough to wrap our ddrescue command in a loop, ideally along with five seconds of sleep. That handles half the problem, but power cycling the drive isn’t a software problem. I’ve used Adafruit’s power switch tail in the past, connected to a Raspberry Pi GPIO pin, to kill the drive’s power supply every 30 seconds. It’s not ideal, but it works. Unfortunately that device is discontinued, and I’m not aware of a direct replacement.
The last time I ran into this problem, I used a WiFi power switch, pictured above. Whenever the device disappeared, the script triggered the plug to power cycle the drive. This worked, and on a 500 GB drive, I recovered all but the last 1.5 megs. The only downside is that the smart plug only works via the cloud, so every power cycle required a request sent to the IFTTT cloud. Leaving the drive running overnight resulted in too many requests, and my account was frozen. Next time, I’ll have to use a device that supports one of the open source firmwares, like Tasmota. Regardless, the script is simple:
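while true; do
    sudo ddrescue /dev/sda diskimage.img mapfile.log
    # Is the drive still attached after ddrescue exits?
    if [ -e /dev/sda ]; then
        # Still here: run again with -M to re-queue the failed blocks.
        sudo ddrescue /dev/sda diskimage.img mapfile.log -M
    else
        # Gone: power cycle the drive through the IFTTT-controlled plug.
        curl -X POST https://maker.ifttt.com/trigger/switch_off/with/key/REDACTED
        sleep 10
        curl -X POST https://maker.ifttt.com/trigger/switch_on/with/key/REDACTED
        sleep 10
    fi
done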
If the device disappears, use the switch to power cycle the drive. If ddrescue completes, and the device is still present, then use the -M switch to mark all the failed blocks as non-trimmed, so they get tried again.
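For anyone who wants to avoid the cloud round trip, here is a rough sketch of the same idea with a locally controlled plug. Everything specific in it is an assumption: the plug is taken to be running Tasmota and reachable at a placeholder address, and the function names wait_for_device, power_cycle and rescue_loop are invented for this example. Unlike the script above, this version stops once ddrescue exits with the drive still attached, rather than looping forever.

```shell
# Wait up to $2 seconds for path $1 to exist; succeed as soon as it does.
wait_for_device() {
    local dev="$1" timeout="${2:-60}" waited=0
    while [ ! -e "$dev" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical hook: swap the body for whatever toggles your switch.
# Assumes a Tasmota plug; 192.0.2.10 is a documentation-only address.
power_cycle() {
    local plug="${PLUG_HOST:-192.0.2.10}"
    curl -fsS "http://$plug/cm?cmnd=Power%20Off" >/dev/null
    sleep 10
    curl -fsS "http://$plug/cm?cmnd=Power%20On" >/dev/null
}

# Re-run ddrescue until it exits with the drive still attached.
rescue_loop() {
    local dev="$1" img="$2" map="$3"
    while true; do
        ddrescue "$dev" "$img" "$map"
        if [ -e "$dev" ]; then
            break   # ddrescue finished and the drive is still present
        fi
        power_cycle
        wait_for_device "$dev" 60 || { echo "drive did not come back" >&2; return 1; }
    done
}

# Typical use (as root):
#   rescue_loop /dev/disk/by-id/YOUR-DRIVE-ID diskimage.img mapfile.log
```

A /dev/disk/by-id/ path is a safer bet than /dev/sdX here, since the kernel is free to hand the drive a different letter when it comes back after a power cycle.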
In many cases, this isn’t a process that ever really finishes, but the rate of recovery eventually drops too low to be worth continuing. Once you’ve copied as much of the raw data off the drive as possible, it’s a good idea to use fsck/chkdsk to repair the now-rescued filesystem. If it’s a system drive, after you write the image to a new disk, you’ll want to use your OS’s tools to verify the system files. For Windows, I’ve had good success with SFC and DISM. On Linux, use your system’s package manager to verify your installed packages. On a Fedora/Red Hat system, rpm -Va will show any installed files that have unexpected contents.
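Since a filesystem repair rewrites data, it can be worth running it against a copy of the image rather than the only rescued original. This is a small sketch of that habit, not part of the process described above; the function name make_working_copy and the file names are placeholders, and the loop device name in the comments will differ on your system.

```shell
# Copy the rescued image and confirm the copy is identical before repairing it.
make_working_copy() {
    local src="$1" dst="$2"
    cp -- "$src" "$dst" || return 1
    # cmp -s prints nothing and exits non-zero if the two files differ.
    cmp -s -- "$src" "$dst"
}

# Typical use on Linux with a whole-disk image (as root):
#   make_working_copy diskimage.img work.img
#   losetup -fP --show work.img     # prints the loop device, e.g. /dev/loop0
#   fsck /dev/loop0p1               # repair the first partition of the copy
```

If the repair goes sideways, the untouched image is still there for another attempt.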
Over the years I’ve rescued a handful of drives with ddrescue that other techniques just wouldn’t touch. It’s true that a good backup is the ideal solution, but if you find yourself in a situation where you really need to get data off a dying drive, ddrescue might just be your saving grace. Good luck!
Banner Image: “Shiny” by Nick Perla, BY-ND
21 thoughts on “Tales From The Sysadmin: Impending Hard Drive Doom”
I actually used the freezer trick on a drive and it worked. I documented it all on my site if anyone cares to read it:
https://miscdotgeek.com/adventures-in-hard-drives/
I have used it as well, worked like a charm twice on two different drives. Although I have had it not work on a few drives as well. Tends to work best on drives that are not spinning on their own.
Long time ago a drive just clicked one morning. Welp, tried the freezer trick w/o success couple of times, except that on the last try I just left the drive in the freezer and decided not to bother go further. You know, f the drive, restore from backups etc. Lost just a few files I had modified previous night.
About six months later I needed more space in my freezer so took out the drive, which had quite a bit of frost and ice on it, because I didn’t bag it. The drive spent a few months on the kitchen table melting, drying and just laying around.
Then one weekend I thought the faulty drive had been around too long and I’d kill it one way or another before throwing it away. Got my hammer ready, powered the drive and… it spun up and didn’t click. The trick did work after all.
I also had successful use of the freezer trick on a laptop drive. Although, I had to keep the drive IN THE FREEZER for the drive to work at all, with the SATA and power cable run out through the door seal and the PC laid on top of the freezer. It literally took over a week (12ish days iirc) of looping ddrescue runs for ddrescue to get a complete enough disk image that I could successfully mount under MacOS.
SpinRite has saved my butt a ton of times as well.
Fixed enough issues for me to get the data off and on to another drive!
I wish he would update it already.
Steve is now, finally, working on Spinrite 6.1 to bring a whole load of modern-era stuff in to it (faster transfers and I believe support for modernly-huge drives).
I’ve had mixed success with Spinrite, but it’s definitely worth trying. I would probably try ddrescue first, and then let SpinRite work on the disk for a while, and finally let ddrescue try reading the missing sectors again.
Totally agree – if the drive could be saved, Spinrite is the best bet to do it!
Another option for software-controlled mains is to find a socket or power-strip using the Gembird chipset and hence able to be controlled locally over USB with sispmctl – also very convenient for any rPi-based mains switching needs.
Unfortunately they’re not very fashionable now the “wifi smarthome” style of devices have come around, so they appear to be dying out. I probably couldn’t use a cloud-and-wifi-based switching adaptor for my automatic power-cycling of my slightly unstable router – no way to turn the router back on again!
I did the freezer trick on a 240 MB drive, and it worked long enough to get the 486’s proprietary setup software backed up.
I later opened it up, and the rubber parts around the head looked solid until I touched one to confirm that it was in fact a gross, super sticky liquid.
From dead drives to a drive that caught fire. I’ve seen many drive crashes and failures over the years.
If you don’t know what’s going on with it, first putting your ear to the drive is always a good start. Many drives are so quiet that you need to get up close and personal to hear if the drive is having issues. Is it buzzing, ticking or making any odd noise?
Putting your hand on the drive can also tell you if it’s even spinning. Picking it up will tell you too (from the gyroscopic effect), but you have to be very gentle.
Stiction can be overcome by “gently” tapping the drive as you apply power. I wouldn’t do it with a metal hammer. I used the end of a plastic screwdriver. Just tap on power up.
If I remember correctly Steve Gibson has been around since the 80’s. I think it was one of his tools that I used to reinterleave MFM and RLL hard drives or to test the performance of the drives after.
I’ve found that connecting the drive directly to SATA works better than USB. Then again, as long as you can get it to work, anything goes. Just get the data off of it ASAP.
Interleaving drives… That brings back memories! Like recovering data from a floppy disk that was so worn out you could actually see through parts of it. I used “Super Zap” for that. I miss that tool. Yes, it was SpinRite used for interleaving. We always ran SpinRite on new hard drives we sold in new computers or as replacements, even if the controllers were rated for 1:1. Sometimes the computers still weren’t fast enough… and maybe more to the point M$ DOS.
And, yes, it’s an awesome tool for data recovery. Gibson is a genius. But like every other fix mentioned here it’s only good for fixing certain kinds of failures. The best road to success is always to address the failure, i.e. what’s actually broken. I never froze a drive for stiction issues, although I imagine repeated rapid heat cool cycles would work. I use the freezer to deal with circuits that get too hot and quit working.
For stiction I prefer to twist the platter spindle. But on recent drives it’s usually not accessible. So strategically applying vibration works. I haven’t used a hammer. But I have banged it on a desktop. That doesn’t sound very strategic, but I usually only use/need one tap. The key is to get the head stack to break free so you make an educated guess on what direction it wants to swing and apply jarring force in that direction, bearing in mind that over application of said force can destroy instead of fix. And, of course, tapping on it while the electronics are trying to pull it free works when the balance of force is adequate. Obviously these tricks are not likely to fix a broken drive if the issue isn’t “classical” stiction (seized bearing, burnt coils, toasted drivers, …)
I have used DD to recover data. I haven’t heard of DDRescue. So THANKS Jonathan! It will save me from having to write something next time. But honestly I haven’t built a system without RAID in so long I’ve not needed to get ingenious about recovering data. Yes, I know it’s not a 100% solution, nothing is and I can’t argue with the results. Toss that drive, slap in a new one, done. And for the rest of the time I have backups, remembering what was so eloquently stated in HaD comments on another article, “If you have one backup you DON’T have ANY backups!”
Oh… and by “toss that drive” I really mean: disk erasure by Ruger at the range. ;-)
Thx for the tips. Good to know when you are caught out and really want to get a file or two off the drive. As said above though, backups are the best strategy now-a-days as cheap as the drives are (when you take them apart it is amazing you can sell them at the price they do). Don’t think these techniques would work with an SSD :) .
Be aware that ddrescue (https://www.gnu.org/software/ddrescue/) and dd_rescue, sometimes called ddrescue (http://www.garloff.de/kurt/linux/ddrescue/) are different tools which do much the same thing…
Came here to say that. And one is much better at it than the other..
I had one case of a dead drive in a SCSI RAID array, and of course the customer had no usable backups. Through some testing, I determined that the drive controller board had failed, but the mechanics were fine. So I hopped on the Bay of e and found an identical drive for cheap, got it shipped, and swapped its controller board to the dead drive. Bingo! The RAID array was resurrected long enough for me to pull all the data off, and we were off to the races.
I had another case, which I caused myself to my own drive. I had attached an external drive to my Linux laptop and was beginning to wipe it for some reason. Only, out of muscle memory, I typed the dev entry for the laptop’s boot drive instead of the external drive, and subsequently wiped its partition table (late at night, eyes starting to cross, etc.).
Luckily I had the testdisk/photorec utilities installed, so I immediately ran photorec on the drive (the laptop is still operational, mind you! I hadn’t rebooted it or anything), and it found the partitions. I let it rebuild the partition table and learned to verify which drive or partition I was nuking before hitting GO. The laptop ran perfectly fine for a long time after that.
Two comments. A looonnng time ago (and far far away) there was a product called SpinRite. What it did was do a low level full track read to lift all of the data, then a track erase to restabilize the track’s magnetic surface, and finally a full track rewrite of the original data, which essentially performed a track reformat with the original data. It would do this over all of the tracks on the drive. This restoration worked quite well on drives of that era (typically 100 megabytes or less) and corrected the progressive track and sector deterioration caused by normal use. It also worked on floppy and cartridge disks (remember those?). After backing up a drive (for safety), we would run this app on roughly a six month schedule. We found that our data access failure rates were substantially reduced compared to before.
I now wonder if there is a modern equivalent and whether such an application would work on modern ultra high density drives (we typically use 1 to 8 TB 7200 rpm drives) or whether it’s even needed.
Second, we have had 4 WD drive controller board failures over the last couple of years (1 was toasted, fried by bad power hookup – our bad). Anyway, modern drives are apparently factory tuned, with the parametrics for each specific head-platter stored in onboard flash. We’ve been sending our failed WD HD controller boards out to DataPro Data Recovery Labs (hddparts@gmail.com), a company in White Rock BC Canada. Using the serial numbers on the drives, they swap the parametric flash from the bad board to a new identical controller board and return both for $49 USD. When we reinstalled the replacement boards on the failed drives, all four came back up and running with no data loss. (Your mileage may vary.)
The external USB enclosure is terrible advice. Depending on the chip, full resets are issued when disks start misbehaving. It’s a common issue that I ran across multiple times: a few stuck sectors are easily recovered when attached to a SATA controller, but almost impossible to fix in a USB enclosure.
I’ve had the same problem with direct SATA attached drives, except it takes a computer reboot to clear up. Your mileage may vary, and it’s definitely worth trying both ways.
I used to get seriously into this shit, I bought a cracked copy of PC3000 drive recovery from china (of course) for $500, the real deal was thousands. I since sold it to some other guy for i think $800, the package was amazing. Drives progress , but this didn’t which is why I sold it, it was increasingly no good for newer drives.
The best story was when a guy came in my store with his dead drive AND a donor he got already from ebay. He thought we could just swap the boards, but you actually have to swap the nvram chip from the old board to the new. Otherwise your stuff will not be right. It’s been a while, but I believe the remapped sectors & other platter specific details were on it. Used a USB microscope to move the chip, it’s a small smd. Had the guy’s data back that afternoon.
I am waist deep in yak hair over one old drive I really really want data off. It has been an on and off project, just trying things gently, trying not to further screw it up in case it has to go to a pro. I have thus far been able to rule out the PCB. Had a side quest about potentially reading the serial eeprom with an AVR. Am looking into faking temperature sensors at the moment. I believe it has weak heads, that are just outside the calibration parameters. Oh yah, it’s also led to me basically rebuilding 3 different computers so far (Insert wild tales of weird age related failures and strange incompatibilities here)
The stupid thing is, there’s a whole lot of NDA stuff on it that mean I can’t let the thing out of my sight, which may have significant commercial value, but that is not the data I need back, that stuff is backed up where it needs to be, be better if I just smashed it up with a hammer on that score. There is some personal data, with some personal relevance, which may lead to realising a monetary benefit of zero to a few thousand. BUT it may be worth completely zip and/or I might need to get forensic with other data or metadata on there to determine where that data did go or where else to look. So the drive is a huge liability, with a slim chance of gain. So I can’t use a pro unless I find a fairly local one with onsite techs who aren’t going to send it to India or something. Meaning the kind of pro I need isn’t going to be Billy-Joe’s Bait, Tackle and Data Recovery, but someone expensive, with megacorp, law enforcement and government clients.