Today I have a tale of mystery, of horror, and of hope. The allure of a newer kernel and packages was too much to resist, so I found myself upgrading to Fedora 30. All the packages had downloaded; all that was left was to let DNF reboot the machine and install all the new packages. I started the process and meandered off to find a cup of coffee: black, and darker than the stain this line of work leaves on the soul. After enough time had elapsed, I returned, expecting the warming light of a newly upgraded desktop. Instead, all that greeted me was the harsh darkness of a grub command line. Something was amiss, and it was bad.
(An aside to the reader: I had this experience on two different machines, stemming from two different root problems. One was a wayward setting, and the other an unusual permissions problem.)
How does the fledgling Linux sysadmin recover from such a problem? The grub command line is an inscrutable mystery to the uninitiated, but once you understand the basics, it’s not terribly difficult to boot your system and try to restore the normal boot process. This depends on what has broken, of course. If the disk containing your root partition has crashed, then sorry, this article won’t help.
In order to get a system booting, what exactly needs to happen? How does booting Linux work, even? Two components need to be loaded into memory: the kernel and the initramfs. Once these two elements are loaded into memory, grub performs a jump into the kernel code, which takes over and finishes the machine’s boot. There is one more important detail that we care about: the kernel needs to know where to find the root partition. This is typically part of the kernel parameters, specified on the kernel boot line.
When working with an unfamiliar shell, the help command is a good starting point. grub runs in a very limited environment, and running the help command scrolls most of the text off the screen. There is an environment variable that helps out here by enabling output paging: set pager=1.
Finding What You’re Looking For
ls is your friend. Don’t know which drive is which? ls to the rescue. grub uses a unique nomenclature for accessing partitions. You might see entries like (hd0,0) or (hd0,msdos1). A modern grub will even let you list the files and folders contained in that partition using a command like ls (hd0,msdos1)/.
We want to start by figuring out which partition stores the kernel and initrd files. Those files might be in a boot folder, or at the top level of one of the partitions. The kernel is generally named vmlinuz-kernel_version.architecture, so for example: vmlinuz-5.3.7-200.fc30.x86_64. The initrd we need will match the kernel’s version, something like initramfs-5.3.7-200.fc30.x86_64.img.
The last needed bit of information, the root filesystem location, can be a bit trickier to find. While searching through partitions, you may find one with a root filesystem layout, containing boot, bin, etc, home, and so on. You can likely figure out what the kernel will call the partition based on the name in grub. hd0 is probably sda, hd1 is probably sdb. The second half of grub's name tells you which partition it is, so (hd0,msdos1) is likely sda1.
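As a rough sketch of that naming rule, the shell function below turns a GRUB 2 style name into the name the kernel will probably use. The function name grub_to_linux_dev is made up for illustration, and the mapping is only a guess: it assumes sdX-style disk names (not NVMe or MMC) and that grub and the kernel enumerate the disks in the same order, which is not guaranteed.

```shell
# Translate a GRUB 2 device name such as "(hd0,msdos1)" into the name the
# Linux kernel will *probably* use, such as /dev/sda1.
# Assumptions: sdX-style naming, same disk order on both sides, and GRUB 2
# numbering (disks count from 0, partitions count from 1).
grub_to_linux_dev() (
    # "(hd0,msdos1)" -> "hd0,msdos1"
    name=$(printf '%s' "$1" | tr -d '()')
    case "$name" in
        hd*,*) ;;
        *) echo "unrecognized GRUB name: $1" >&2; return 1 ;;
    esac

    disk=${name%%,*}; disk=${disk#hd}        # "hd0"    -> "0"
    part=${name#*,}                          # "msdos1"
    part=${part#msdos}; part=${part#gpt}     # "msdos1" -> "1"

    # Only hd0..hd25 are handled, which map to sda..sdz.
    case "$disk" in
        [0-9]|1[0-9]|2[0-5]) ;;
        *) echo "unsupported disk index in: $1" >&2; return 1 ;;
    esac
    # Reject empty, zero, or non-numeric partition numbers.
    case "$part" in
        ''|0*|*[!0-9]*) echo "unsupported partition in: $1" >&2; return 1 ;;
    esac

    letter=$(printf 'abcdefghijklmnopqrstuvwxyz' | cut -c "$((disk + 1))")
    printf '/dev/sd%s%s\n' "$letter" "$part"
)
```

Calling grub_to_linux_dev '(hd0,msdos1)' prints /dev/sda1, and '(hd1,gpt2)' gives /dev/sdb2. An old-style name such as (hd0,0) is rejected rather than guessed at, since its partition number does not follow the GRUB 2 convention assumed here.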
Putting It Together
To actually boot, we issue three commands in grub. The first command sets the kernel image and any kernel boot options. The one required option is setting the root location:
linux (hd0,msdos1)/boot/vmlinuz-4.19.0-6-amd64 root=/dev/sda1
Next we set the initrd option:
initrd (hd0,msdos1)/boot/initrd.img-4.19.0-6-amd64
Once those options are set, we can tell grub to try to boot the kernel. It’s a simple command:
boot
Assuming we set the right options, and the system isn’t otherwise terribly broken, that should boot your machine back into normalcy. Time to troubleshoot what caused grub to go off the rails to begin with. That, however, is for another time.
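For anyone who wants a cheat sheet ready before the next upgrade, here is a minimal sketch that assembles those three commands from their parts. It runs on any working machine and only prints text; nothing is executed against grub. The function name grub_boot_commands and its argument order are invented for this example.

```shell
# Print the three GRUB commands needed to boot by hand.
#   $1  GRUB partition holding the kernel, e.g. "(hd0,msdos1)"
#   $2  path to the kernel image on that partition
#   $3  path to the initrd/initramfs image on that partition
#   $4  root device as the kernel will name it, e.g. /dev/sda1
#   $5+ optional extra kernel options
grub_boot_commands() (
    if [ "$#" -lt 4 ]; then
        echo "usage: grub_boot_commands GRUB_PART KERNEL INITRD ROOT_DEV [OPTION...]" >&2
        return 1
    fi
    grub_part=$1 kernel=$2 initrd=$3 root_dev=$4
    shift 4
    # Any remaining arguments are appended to the kernel line as-is.
    printf 'linux %s%s root=%s%s\n' "$grub_part" "$kernel" "$root_dev" "${*:+ $*}"
    printf 'initrd %s%s\n' "$grub_part" "$initrd"
    printf 'boot\n'
)
```

For example, grub_boot_commands '(hd0,msdos1)' /boot/vmlinuz-4.19.0-6-amd64 /boot/initrd.img-4.19.0-6-amd64 /dev/sda1 prints the same linux, initrd, and boot lines used above, ready to be typed at the grub prompt.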
Since we’re here, there are a few other tricks worth knowing about grub and booting. The most useful is probably single user mode, which is enabled by adding a “1” to the boot options:
linux (hd0,msdos1)/boot/vmlinuz-4.19.0-6-amd64 root=/dev/sda1 1
On some distributions, this even bypasses the need to know a root password, which is useful if you find yourself locked out of a system. Many modern systems still require logging in as root to proceed. Still, single-user mode is helpful for troubleshooting other boot and system problems.
One more trick to have up your sleeve is the ability to blacklist a driver. Adding modprobe.blacklist=amdgpu to the boot options, for example, would keep the amdgpu driver from loading automatically, regardless of the hardware present. If a buggy or misconfigured driver is causing a crash during boot, blacklisting it will likely let you successfully boot.
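Once the machine is up, it is worth confirming that the extra options actually reached the kernel. The running kernel exposes its boot line in /proc/cmdline, so a small check like the sketch below can look for a given option. The function name cmdline_has is hypothetical, and the modprobe.blacklist=amdgpu spelling in the usage line is the form normally used on the kernel command line.

```shell
# Succeed if an option appears as a whole word on the kernel command line.
#   $1  option to look for, e.g. "1" or "modprobe.blacklist=amdgpu"
#   $2  optional command line to search; defaults to the running kernel's
cmdline_has() (
    option=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    # Pad with spaces so that "1" does not match the end of "root=/dev/sda1".
    case " $cmdline " in
        *" $option "*) return 0 ;;
        *) return 1 ;;
    esac
)
```

A typical use would be: cmdline_has modprobe.blacklist=amdgpu && echo "blacklist option is active". Running lsmod afterwards shows whether the module was in fact kept out.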
Hopefully this is enough to give you the edge next time you’re debugging a Linux boot problem, and adds a couple tools to your repertoire. Happy hacking.
GRUB: At least it’s not LiLo
Yeah Lilo that simple bootloader that just worked with only a few lines of config? Open up the grub.conf file and what does the first line say? Do not edit this file directly. Modern Linux is hot garbage to deal with anymore.
You know why it says not to edit that file directly? Any changes made there will be overwritten when you update the kernel.
“You know why it says not to edit that file directly?”
no, that’s exactly the problem, there’s a half trillion config files that say do not edit
My comment wasn’t nostalgic. LiLo was extremely simple but also extremely limited. Linux isn’t hot garbage- it’s just not the simple little OS we had 15-20 years ago. It’s a full enterprise OS with the complications that come with that. Of course, it was then, too- but back then computers were single core processors with ram measured in MB and disks in the tens of GB most of the time. It’s no surprise that as it became possible to do more, the needs grew and the complexity likewise grew.
Also, remember this: Grub isn’t a Linux Only bit of software. It can boot any OS.
What Limit? Lilo LBA support has been around forever. EFI support predates Mac86 and the proliferation of EFI on the consumer market. Ability to randomly load a “plug in” RAT on the hard drive during boot, from a malware dropper on a website your grandmother visited last week? OK; That’s a grub only feature.
Kidding aside, What problems did grub fix that Lilo didn’t? Every single “feature” you are going to name is for manufacturers/OEMs/Vendors. At best they give consumers nothing and at worst lead to reduced user rights. See also; Secure Boot, TPM, et al.
Secure Boot isn’t our fault, it’s just some garbage that looms over everything, and they probably wanted to make sure they could deal with it if it became mandatory on some systems.
Grub was a nice improvement up until version 2 when it became boot bloatware. Version 1 is not developed anymore so it’s back to LiLo for me.
what? you don’t like stuff like if [ x"${myvariable}" = "x" ] :)
there are literally 95 lines of conditional code just to deal with the header.
GRuB is horrible.
So you’re saying I should stop using Lilo?
Honestly I didn’t know anyone still did! If it’s working, why fix it? It’s a fine boot loader for Linux only.
It’s been over a decade since I bothered with a dual boot. I upgrade my drive every couple years and just do a clean Slackware install at that time.
oh you had to go there! another trick is to put init=/bin/bash to get a password free shell.
LILO: At least it’s not LOADLIN. ;)
No one in their right mind would ever want the help command to be paged by default.
A wise default was chosen when the feature was added.
Why?
Some help features ‘helpfully’ list all available commands in a not-so-brief format or even in full, leaving you manually paging through a potentially large amount of text before you can retry the mistyped command.
Sane pagination systems let you press q to quit.
Indeed. Sadly there’s still a lot of ‘legacy’ code in use which doesn’t.
Many system management boards can redirect the text console over a serial port instead of using the console, and in both cases the terminal can (and these days usually does) provide its own scroll back capability.
A person’s default assumption is the output will operate both normally and normally (no, that isn’t a redundancy)
Commands normally output in whole and so it will be assumed you can redirect the output in whole too.
If paging is desired it will be assumed you must provide that with a pager or otherwise configured your preference to use.
Breaking that assumption results in things like trying to redirect output to a file for offline review only to discover you only have the first 24 lines saved, or that you pipe the paged output to a pager and now can’t provide the input to the inner command from the outer pager command and need to abort.
These aren’t show-stopper problems, but any annoyance added needlessly to an already concerning situation is going to draw ire towards whomever needlessly caused that frustration.
If you want it paged, pipe it to less
Most grub environments don’t have pipe redirection or the less binary.
That was sarcasm.
If you’re using the interactive shell, especially the help command you’re on an interactive console.
It should be paged by default at that point.
In the rare hypothetical case that for whatever reason you’re piping boot-time grub to a file (I can’t guess what information would be useful non-interactively if the boot process is borked), which would be a very advanced technical power-user case (eg: running a server farm, but even then, why?), that advanced technical power-user could disable paging as needed.
It’s really a bad UX to have paging disabled by default and not documented in the help.
If your only machine won’t boot it’s really difficult to look up stuff online to figure out you need to type “set pager=1” before getting any meaningful help from grub.
Agreed, on modern machines that usually don’t imply a serial port being used, it ought to be the default setting IMO.
I wonder why all this. I upgraded to Fedora 30 without a hitch (from Fedora 29). I haven’t had any dealings with grub for many years. Understanding booting is always a fascinating business though — you should be thankful for the chance to get into the details. The answer probably lies in what the system was before the upgrade.
It’s been a long time, now, since this happened. It seems like on one machine, it was an odd folder permissions issue, and on the other, a minor setting change inherited from upgrading through multiple Fedora releases.
I’ve been doing DNF upgrades from one release to the other as long as I can remember. One thing I try to do is to have a “disposable” root partition (so I keep my home directory on another partition). The idea is that if all hell breaks loose I can reformat and do a fresh install on that partition without losing data that is kept on other partitions. An approach I endorse although in my current setup I see things have crept back — I should clean that up so I have this option once again. Of course I would try the sort of things you outline before resorting to that, but it is a good bailout to have.
You lose the installed programs tho don’t you?
It depends. Make /opt a link to a separate partition and you keep some/most. I always make a copy of /etc when I nuke root to serve as a guideline for setting up things I customize. There is always some pain. The thing is to minimize it.
“I wonder why all this. I upgraded to Fedora 30 without a hitch”
Yeah that sounds like Linux, one person has a perfectly user friendly experience, the other person’s entire system explodes by pressing the update button. While I use Linux at work for all the versatility it offers me, I would never recommend it as a home PC for myself or anyone else. It’s just such a headache when something breaks, and something will break. Honestly the most frustrating thing is that the more average user friendly the distro the more difficult it is to fix problems when something goes wrong.
“the more average user friendly the distro the more difficult it is to fix problems when something goes wrong.”
I’ve found this as well, though it’s possible it’s because I don’t use those distros, so am not as familiar with how to fix them.
I’ve broken my Linux install countless times, but almost every time it’s been because I’ve been tinkering. Like installing bleeding edge Mesa. Cases like the problems in the article are the exception, thankfully.
I use “n00b” distros exclusively, and I almost never run into anything like this to begin with.
The one time I had a boot issue(I messed up installing something to an external drive), I just used a live CD and that one automatic GRUB fixer tool with the GUI wizard that should really come with all distros.
Of course, I don’t tinker with drivers or kernels, and Ubuntu doesn’t give you anything that’s not tested for a crazy long time, so it may well be that it is hard to fix, I just don’t do anything that is likely to break it badly.
So you take a computer that was intended to run Windows, you run a different operating system on it, and you are surprised when there are problems???
I’ve run linux on Intel NUC systems and Dell Laptops without any incident, because these systems are certified to run linux. Do you put diesel fuel in your car and expect it to work?
If a machine will run Windows (well, the x86/64 versions) then you’ll be able to run some kind of Linux on it.
The Intel NUC’s aren’t even a good example, because you need a recent kernel to enable all of the hardware.
You came to a hobby hacking site, and are surprised people are pushing limits?
Seriously though, I’ve never owned a corporately-blessed-for-linux system, and it’s never caused a problem. I’ve also never encountered the idea that PCs are designed for a specific OS (outside of Macdom). Hardware doesn’t support OS’s, OS’s support hardware.
That’s just a false analogy–computers all use the same energy source, electricity. All electrons are the same, there are no “Windows electrons” or any other OS-specific electrons.
Computers do not have intentions, they run programs. While specialized computers are stripped-down devices capable of running only one program, or a limited number of programs, general purpose computers are designed with sufficient resources to let them execute any program that the user desires. Windows has always been a program for general purpose computers. (The short-lived “Windows CE” for specialized computers was a completely different program.) The whole point of general purpose computing is to be able to run whatever you want.
While “Secure Boot” is a testament to how much leverage Microsoft has over the industry, and the undeniable purpose of it is to scare away unsophisticated computer users from ever trying any OS other than Windows (Apple uses even more Draconian measures), the fact is that it can be switched off easily. And for those afraid to even look at their UEFI, there are Linux distros with certificates to allow them to be installed and run with SB. The notion that people must obtain “certification” to use any software on our own computers is bizarrely Orwellian, and fallacious. If you own it, you can use it how you like.
It’s not like there’s much alternative. Everything can break. It’s just as painful to fix Windows when it goes south.
Ever get a blue-screen and stuck in a reboot loop after loading classpnp.sys ?
Even Macs with their ideal hardware+OS integration can get kernel panics on boot if you muck around in the system much.
Pretending no other OS in the world breaks except Linux is just being disingenuous.
Because Windows updates have never gone wrong.
2018: https://www.techrepublic.com/article/windows-10-users-should-wait-to-install-the-latest-update-its-bricking-some-pcs/
2019: https://www.pcgamer.com/a-windows-10-bug-is-bricking-some-pcs-that-use-system-restore/
https://answers.microsoft.com/en-us/windows/forum/all/new-pc-bricked-by-windows-10-update-again-2nd-time/3e47da02-53b2-4a78-93a5-70ce58d070a0
https://www.lifewire.com/how-to-fix-problems-caused-by-windows-updates-2625775
That happens with every OS. They can’t plan for every hardware configuration out there.
“One person has a perfectly user friendly experience, the other persons entire system explodes by pressing the update button. ”
You sure you aren’t talking about Windows right now, lol?
I know on my garage computer if I let one of the noobie friendly distro’s update the next boot will be to recovery command line, every single stinking time too. Installs fine, everything works out of the box, want to update, yes please, 30 min later brick
You really ought to file a bug report or something. Chances are, you can reproduce it by running update-grub. Also, use SuperGrubDisk to solve the “borked Grub” problem quickly the next time you upgrade!
How do you solve this problem when it appears?
Which n00b distro are you running? A lot of the non-ubuntu “Let’s make linux easy” distros seem untested like nobody actually uses them.
I have to partially disagree with the comment “I would never recommend it as a home PC for myself or anyone else. It’s just such a headache when something breaks, and something will break.”.
I had to set my mother up, age 79, with a Linux system on her Lenovo S10 netbook computer. No WINDOZE! About 7 years ago, I installed Linux Mint 13. She has never had a problem with it and uses it for web browsing, email client, word processing, camera/photo downloads and cataloging. Eventually, the Mint 13 repositories went off-line (EOL) but it didn’t matter until about a year ago when one of her banks decided to force a web-browser code update that rendered the FIREFOX version on her system unusable for that bank’s web site. Effectively she was locked out and forced to upgrade her version of FIREFOX, which was nearly impossible for me to compile from source code (outdated or missing libraries, GCC version issues, etc.). I decided to upgrade the OS instead. [ Gripe: we all seem to get FORCED by Commerce into spending $$$ and time to upgrade hardware and software. It’s sad when the banking system supports the commerce flow of the computer and software industry. ] Anyhow, with only 1 GB of RAM, I was forced to find a Linux with a small GUI memory footprint, so I picked DEBIAN-based SPARKY Linux. Once installed, about 6 months ago, it has been used by her without any issues. Everything looks to her just like the old Mint 13 desktop and the same programs are still available for her to run. I suppose it’s only a matter of time before she is (I am) forced again to update her OS. My point: All was fine for many years until the bank forced a browser upgrade by not being backward compatible. The system didn’t just break on its own, it was forced into being broken by an outside force. Other than that, we never had a problem with the same Linux running reliably for 7 years.
Peace and blessings.
There are other reasons why I typically wouldn’t drop a random person onto a Linux desktop; but breakages on version updates seems like a questionable one.
I have the…pleasure…of shepherding a whole bunch of Win10 boxes(all relatively new, while a few slip through the cracks, policy is to replace systems when their 3 year warranty is up; and all are deeply boring intel-based corporate typing systems, nothing exotic) through updates and feature updates at work.
Most of them work most of the time. Then there are the ones that don’t; and they are deeply cryptic. Hundreds of megabytes of log spew, often no clear answers. Even better are the ones that ‘work’; but retain weird quirks afterwards. (Hello Tile Cache system; nice to see that you are still hammering the event log…)
All that said, when I really need to relax and get back to an OS where things don’t move unless I tell them, I go to OpenBSD; contemporary Linux has a lot of moving parts behind the scenes as well.
Or…
What I usually do when this happens is just fire up a Linux live CD. Then I create some folder to act as a mount point and mount my root partition to it. I mount my boot, dev, sys, etc.. partitions into their correct places within that. Then I chroot into it all.
After that I can run grub-mkconfig. Sometimes it just works. Other times there really is some error that needs fixed but I can read the output of the grub-mkconfig command to get a hint as to what needs done.
The advantage – I don’t have to learn grub. I spend the whole time in a Linux environment with the same bash shell I am already used to. It’s not that I think learning grub would be any harder than learning Linux and Bash were but it does have its own naming convention and its own commands with their own syntax. Why invest the effort to learn an environment that you will only use on rare occasion to get yourself back to the environment you actually want to be in?
The disadvantage – Lots of extra steps. I always forget exactly which non-hard drive partitions need mounted like dev, sys, proc, etc and exactly how to do that. Plus it’s changed over the years anyway. For this I always go to the Gentoo installation handbook and follow the steps regarding mounting partitions and entering the chroot environment.
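For reference, the mount-and-chroot steps described above can be sketched roughly as follows. The names run and rescue_chroot, the device names, and the mount point are all placeholders. The grub-mkconfig command name and output path are the Debian-style ones; Fedora calls it grub2-mkconfig and writes under /boot/grub2. Setting DRY_RUN prints the commands instead of running them, which is a sensible first pass on an unfamiliar disk layout.

```shell
# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Mount an installed system from a live environment and regenerate grub.cfg.
#   $1  root partition (placeholder, e.g. /dev/sda1)
#   $2  mount point to create and use (placeholder, e.g. /mnt/rescue)
#   $3  optional separate /boot partition
rescue_chroot() (
    root_part=$1
    target=$2
    boot_part=${3:-}

    run mkdir -p "$target" || return 1
    run mount "$root_part" "$target" || return 1
    if [ -n "$boot_part" ]; then
        run mount "$boot_part" "$target/boot" || return 1
    fi

    # Bind-mount the virtual filesystems that tools inside the chroot expect.
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$target/$fs" || return 1
    done

    # Assumption: Debian-style command name and config path.
    run chroot "$target" grub-mkconfig -o /boot/grub/grub.cfg
)
```

A dry run looks like DRY_RUN=1 rescue_chroot /dev/sda1 /mnt/rescue; dropping DRY_RUN and running as root performs the mounts for real. Unmounting afterwards is left out of the sketch.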
One of the machines that broke on me was a laptop without a dvd drive, and I didn’t have a bootable flash drive handy. Chroot is another super handy trick to have in your repertoire, though.
Or just use SuperGrub2Disk, a bootable flash that automagically boots your distro if Grub’s broken.
I carry a Debian install disk. With that I can repair boot problems and avoid learning yet another language. But one of these days I’m going to have to spend some time bouncing a computer to try and make friends with GRUB. I just don’t like having to deal with complexity created for the sake of having complexity.
Thank you very much for this. I use GRUB so infrequently that I’ve never learned how to use it better. `set pager=1` is very useful.
Check out Super Grub Disk – it’s a small LiveCD that scans all the connected disks and helps you boot directly into the system so that, from there, you can run `update-grub` and fix the problem without much typing. Very helpful, whether you have a broken bootloader, or just want to boot your UEFI-based system on a non-UEFI computer.
If you can not figure out the grub command line with its built-in help and all, you ain’t much of a sysadmin.
Every one of us are sysadmins on at least our own machine. How does the guy with 30 years of Unix/Linux sysadmin experience get to that point? Making dumb mistakes and fixing them just like the rest of us.
Grub is a particularly rough environment the first time through. I don’t think someone who had never messed with grub config files would be able to figure out how to boot just from the built-in help.
Despite the trendy and infrequent announcements on the Grub-devel list, GRUB2 is still very early Alpha. And even the maintainers behind the basis of Fedora 30 should have given you the option of _not_ using it instead of definitely using it. And second of all, I also run Slackware here. But Slackware 11.0 on a Dell Dimension. And 14 in 64 bit form inside of Docker.