Date: 2019-11-25

Today I have a tale of mystery, of horror, and of hope. The allure of a newer kernel and packages was too much to resist, so I found myself upgrading to Fedora 30. All the packages had downloaded; all that was left was to let DNF reboot the machine and install them. I started the process and meandered off to find a cup of coffee: black, and darker than the stain this line of work leaves on the soul. After enough time had elapsed, I returned, expecting the warming light of a newly upgraded desktop. Instead, all that greeted me was the harsh darkness of a grub command line. Something was amiss, and it was bad.
(An aside to the reader: I had this experience on two different machines, stemming from two different root problems. One was a wayward setting, and the other an unusual permissions problem.)
How does the fledgling Linux sysadmin recover from such a problem? The grub command line is an inscrutable mystery to the uninitiated, but once you understand the basics, it’s not terribly difficult to boot your system and try to restore the normal boot process. This depends on what has broken, of course. If the disk containing your root partition has crashed, then sorry, this article won’t help.
In order to get a system booting, what exactly needs to happen? How does booting Linux work, even? Two components need to be loaded into memory: the kernel, and the initramfs. Once these two elements are loaded into memory, grub performs a jump into the kernel code, which takes over and finishes the machine’s boot. There is one more important detail that we care about: the kernel needs to know where to find the root partition. This is typically part of the kernel parameters, specified on the kernel boot line.
When working with an unfamiliar shell, the help command is a good starting point. grub runs in a very limited environment, and running the help command scrolls most of the text off the screen. There is an environment variable that helps out here, enabling output paging:
set pager=1
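As a minimal sketch of how that looks at the prompt (lines starting with # are annotations, not something to type):

```text
# Turn on paging so long output stops at each screenful
set pager=1

# List every available command, one page at a time
help

# Show usage for a single command, here the one that loads a kernel
help linux
```

The setting only lasts for the current session, so it has to be entered again the next time you land at the grub prompt.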
Finding What You’re Looking For
ls is your friend. Don’t know which drive is which? ls to the rescue. grub uses a unique nomenclature for accessing partitions. You might see entries like (hd0,0) or (hd0,msdos1). A modern grub will even let you list the files and folders contained in that partition using a command like ls (hd0,msdos1)/ with a trailing slash; without the slash, ls reports on the partition itself rather than its contents.
We want to start by figuring out which partition stores the kernel and initrd files. Those files might be in a boot folder, or just in one of the partitions. The kernel is generally named vmlinuz-kernel_version.architecture, for example vmlinuz-5.3.7-200.fc30.x86_64. The initrd we need will match the kernel’s version, something like initramfs-5.3.7-200.fc30.x86_64.img.
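A sketch of that hunt might look like the following. The device names are only examples and will differ from machine to machine; lines starting with # are annotations.

```text
# List the disks and partitions grub can see
ls

# Look at the top level of one partition (note the trailing slash)
ls (hd0,msdos1)/

# If there is a boot folder, look inside it for the vmlinuz and initramfs (or initrd) files
ls (hd0,msdos1)/boot/
```

If the first partition turns up nothing useful, repeat the last two commands on the next one, for instance a hypothetical (hd0,msdos2).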
The last needed bit of information, the root filesystem location, can be a bit trickier to find. While searching through partitions, you may find one with a root filesystem layout, containing boot, bin, etc, home, and so on. You can likely figure out what the kernel will call the partition based on the name in grub. hd0 is probably sda, hd1 is probably sdb. The second half of grub's name tells you which partition it is, so (hd0,msdos1) is likely sda1.
Putting It Together
To actually boot, we issue three commands in grub. The first command sets the kernel image and any kernel boot options. The one required option is setting the root location:
linux (hd0,msdos1)/boot/vmlinuz-4.19.0-6-amd64 root=/dev/sda1
Next we set the initrd option:
initrd (hd0,msdos1)/boot/initrd.img-4.19.0-6-amd64
Once those options are set, we can tell grub to try to boot the kernel. It’s a simple command:
boot
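For a Fedora layout like the one described earlier, the same three steps might look like this. It is only a sketch under assumptions: it supposes the kernel files sit at the top level of a separate boot partition, (hd0,msdos1), and that the root filesystem is a plain partition on (hd0,msdos2), which the kernel would likely call /dev/sda2. Adjust each name to match what ls showed on your machine.

```text
# Make (hd0,msdos1) the default device so the paths below can omit it
set root=(hd0,msdos1)

# Load the kernel and tell it where the root filesystem lives
linux /vmlinuz-5.3.7-200.fc30.x86_64 root=/dev/sda2

# Load the matching initramfs
initrd /initramfs-5.3.7-200.fc30.x86_64.img

# Hand control to the kernel
boot
```

Setting grub's root variable is optional; it only saves retyping the partition name, and it is separate from the root= option passed to the kernel.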
Assuming we set the right options, and the system isn’t otherwise terribly broken, that should boot your machine back into normalcy. Time to troubleshoot what caused grub to go off the rails to begin with. That, however, is for another time.
Since we’re here, there are a few other tricks worth knowing about grub and booting. The most useful is probably single user mode, which is enabled by adding a “1” to the boot options:
linux (hd0,msdos1)/boot/vmlinuz-4.19.0-6-amd64 root=/dev/sda1 1
On some distributions, this even bypasses the need to know a root password, which is useful if you find yourself locked out of a system. Many modern systems still require logging in as root to proceed. Still, single-user mode is helpful for troubleshooting other boot and system problems.
One more trick to have up your sleeve is the ability to blacklist a driver. Adding modprobe.blacklist=amdgpu to the kernel boot options, for example, would keep the amdgpu driver from loading automatically, regardless of the hardware present. If a buggy or misconfigured driver is causing a crash during boot, blacklisting it will likely let you successfully boot.
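As a sketch, the blacklist option rides along on the same linux line used earlier, in the modprobe.blacklist= form that the kernel command line expects (the partition and device names are the same example values as before):

```text
# Same kernel line as before, with the amdgpu module kept from auto-loading
linux (hd0,msdos1)/boot/vmlinuz-4.19.0-6-amd64 root=/dev/sda1 modprobe.blacklist=amdgpu
```

The initrd and boot commands then follow unchanged.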
Hopefully this is enough to give you the edge next time you’re debugging a Linux boot problem, and adds a couple tools to your repertoire. Happy hacking.
12 thoughts on “Tales From The Sysadmin: Dumped Into The Grub Command Line”
GRUB: At least it’s not LiLo
Yeah Lilo that simple bootloader that just worked with only a few lines of config? Open up the grub.conf file and what does the first line say? Do not edit this file directly. Modern Linux is hot garbage to deal with anymore.
You know why it says not to edit that file directly? Any changes made there will be overwritten when you update the kernel.
My comment wasn’t nostalgic. LiLo was extremely simple but also extremely limited. Linux isn’t hot garbage- it’s just not the simple little OS we had 15-20 years ago. It’s a full enterprise OS with the complications that come with that. Of course, it was then, too- but back then computers were single core processors with ram measured in MB and disks in the tens of GB most of the time. It’s no surprise that as it became possible to do more, the needs grew and the complexity likewise grew.
Also, remember this: Grub isn’t a Linux Only bit of software. It can boot any OS.
No one in their right mind would ever want the help command to be paged by default.
A wise default was chosen when the feature was added.
Why?
Many system management boards can redirect the text console over a serial port instead of using the console, and in both cases the terminal can (and these days usually does) provide its own scroll back capability.
A person’s default assumption is the output will operate both normally and normally (no, that isn’t a redundancy)
Commands normally output in whole and so it will be assumed you can redirect the output in whole too.
If paging is desired it will be assumed you must provide that with a pager or otherwise configured your preference to use.
Breaking that assumption results in things like trying to redirect output to a file for offline review only to discover you only have the first 24 lines saved, or that you pipe the paged output to a pager and now can’t provide the input to the inner command from the outer pager command and need to abort.
These aren’t show-stopper problems, but any annoyance added needlessly to an already concerning situation is going to draw ire towards whomever needlessly caused that frustration.
I wonder why all this. I upgraded to Fedora 30 without a hitch (from Fedora 29). I haven’t had any dealings with grub for many years. Understanding booting is always a fascinating business though — you should be thankful for the chance to get into the details. The answer probably lies in what the system was before the upgrade.
It’s been a long time, now, since this happened. It seems like on one machine, it was an odd folder permissions issue, and on the other, a minor setting change inherited from upgrading through multiple Fedora releases.
“I wonder why all this. I upgraded to Fedora 30 without a hitch”
Yeah that sounds like Linux, one person has a perfectly user friendly experience, the other person’s entire system explodes by pressing the update button. While I use Linux at work for all the versatility it offers me, I would never recommend it as a home PC for myself or anyone else. It’s just such a headache when something breaks, and something will break. Honestly the most frustrating thing is that the more average user friendly the distro the more difficult it is to fix problems when something goes wrong.
“the more average user friendly the distro the more difficult it is to fix problems when something goes wrong.”
I’ve found this as well, though it’s possible it’s because I don’t use those distros, so am not as familiar with how to fix them.
I’ve broken my Linux install countless times, but almost every time it’s been because I’ve been tinkering. Like installing bleeding edge Mesa. Cases like the problems in the article are the exception, thankfully.
So you take a computer that was intended to run Windows, you run a different operating system on it, and you are surprised when there are problems???
I’ve run linux on Intel NUC systems and Dell Laptops without any incident, because these systems are certified to run linux. Do you put diesel fuel in your car and expect it to work?