Write Your Own x86 Bootloader

What if you want to make a very lean machine and do without any operating system? Or maybe you want to try to write your own OS, even just for the challenge or fun? Maybe you were reading up on a cool OS architecture and thought to yourself, “I can write that!” Well, before diving into your code, you’d first have to write something called a bootloader.

A bootloader is code that runs early on in a PC’s, Mac’s, Raspberry Pi’s or microcontroller’s boot sequence, before anything like an operating system is up. Often its job is to set up minimal hardware, such as RAM, and then load the OS or your embedded code.

[Alex Parker] has written a three-part series of clear blog posts that make writing the bootloader part easy, at least for x86 machines. And the nice thing is that you don’t need an x86 to get started. He does it on a Mac using the QEMU processor emulator, though he also talks about doing it under Windows and Linux.

In the first part of the series, the bootloader leaves you in the x86’s real mode, with 16-bit instructions and access to one megabyte of memory — think pre-80286 days, or 1982 for those of us who were computing back then. To prove it works, he uses BIOS calls to display “Hello world!”. This also shows that through the BIOS, you have a set of peripherals you can work with.
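To make that first stage concrete, here is a minimal sketch in Python of what such a boot sector looks like as raw bytes. It is not taken from the blog series, which writes this in assembly. It assumes a legacy BIOS, which loads the first 512-byte sector to address 0x7C00 and expects the bytes 0x55 0xAA at the end of that sector. The function name `build_boot_sector` is made up for illustration, and the opcode bytes are hand-assembled, so treat them as something to check against a real assembler.

```python
# Sketch: build a 512-byte legacy-BIOS boot sector that prints a message.
# The opcode bytes are hand-assembled 16-bit x86 (an assumption of this
# example, not code from the blog series); verify them with an assembler.

LOAD_ADDR = 0x7C00            # where a legacy BIOS places the boot sector
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a bootable sector


def build_boot_sector(message: bytes) -> bytes:
    prologue = bytes([
        0x31, 0xC0,  # xor ax, ax
        0x8E, 0xD8,  # mov ds, ax    ; DS = 0, so SI holds a plain address
        0xFC,        # cld           ; make lodsb walk forwards
    ])
    body = bytes([
        0xB4, 0x0E,  # mov ah, 0x0E  ; BIOS teletype output function
        0xAC,        # print: lodsb  ; AL = next message byte, SI += 1
        0x08, 0xC0,  # or al, al     ; reached the zero terminator?
        0x74, 0x04,  # jz halt
        0xCD, 0x10,  # int 0x10      ; BIOS video service
        0xEB, 0xF7,  # jmp print
        0xF4,        # halt: hlt
        0xEB, 0xFD,  # jmp halt
    ])
    # "mov si, imm16" is 3 bytes and sits between prologue and body;
    # the message starts right after the body.
    msg_addr = LOAD_ADDR + len(prologue) + 3 + len(body)
    mov_si = bytes([0xBE]) + msg_addr.to_bytes(2, "little")

    code = prologue + mov_si + body + message + b"\x00"
    if len(code) > SECTOR_SIZE - len(BOOT_SIGNATURE):
        raise ValueError("message too long for a single sector")
    # Pad with zeros and finish with the boot signature.
    return code.ljust(SECTOR_SIZE - len(BOOT_SIGNATURE), b"\x00") + BOOT_SIGNATURE


image = build_boot_sector(b"Hello world!")
```

Writing `image` to a file, say `boot.bin`, and starting QEMU with `qemu-system-x86_64 -drive format=raw,file=boot.bin` should be enough to try it out, provided the opcodes hold up when compared with assembler output.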

In the second part, he shows how to set up 32-bit protected mode and a Global Descriptor Table, making access to a large amount of memory easier.
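As a rough illustration of what a Global Descriptor Table contains, the sketch below packs the classic flat layout: a null entry, a 32-bit code segment, and a 32-bit data segment, each covering the full 4 GiB. This is an assumption about a typical minimal GDT, not a copy of the table used in the blog series, and the helper names `gdt_entry` and `gdt_pointer` are invented for this example.

```python
import struct


def gdt_entry(base: int, limit: int, access: int, flags: int) -> bytes:
    """Pack one 8-byte segment descriptor (hypothetical helper)."""
    if not 0 <= base <= 0xFFFFFFFF:
        raise ValueError("base must fit in 32 bits")
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must fit in 20 bits")
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                                   # limit bits 0-15
        base & 0xFFFF,                                    # base bits 0-15
        (base >> 16) & 0xFF,                              # base bits 16-23
        access & 0xFF,                                    # access byte
        ((flags & 0x0F) << 4) | ((limit >> 16) & 0x0F),   # flags + limit 16-19
        (base >> 24) & 0xFF,                              # base bits 24-31
    )


# Access 0x9A: present, ring 0, executable and readable code segment.
# Access 0x92: present, ring 0, writable data segment.
# Flags 0xC: 4 KiB granularity, 32-bit segment.
NULL_DESCRIPTOR = bytes(8)
CODE_DESCRIPTOR = gdt_entry(0, 0xFFFFF, 0x9A, 0xC)
DATA_DESCRIPTOR = gdt_entry(0, 0xFFFFF, 0x92, 0xC)
GDT = NULL_DESCRIPTOR + CODE_DESCRIPTOR + DATA_DESCRIPTOR


def gdt_pointer(table: bytes, table_addr: int) -> bytes:
    """Pack the 6-byte operand of lgdt: 16-bit size minus one, 32-bit address."""
    return struct.pack("<HI", len(table) - 1, table_addr)
```

With base 0 and a limit of 0xFFFFF pages, both segments span all of memory, which is what makes 32-bit addressing feel linear after the switch.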

In the first two parts, the code is written in assembly, so in the third part he finishes the series by showing how to load C++ code into memory and execute it. That C++ code would of course be your application, which we’ll leave to your imagination.

It’s reasonably rare to write bootloader code for a desktop computer — much less so for microcontrollers. For instance, [Dmitry Grinberg] wrote his own bootloader so that he could have encrypted ROM images for his AVR on USB. And we’ve talked about [Lady Ada]’s guide to burning Arduino bootloaders. But if you want to get down to the bare metal on your x86, the bootloader is the place to start. And it’s not so bad.

38 thoughts on “Write Your Own x86 Bootloader”

    1. Was going to post that one myself. When I dove into OS development on x86, that wiki was a huge help in figuring out where the machine ends and my code could begin. Did that to apply for a job, specifically to do their filter test. Their filter test was meant to be impossible, and it was if you tried to do it inside an existing OS; you were supposed to figure that out. I, however, ditched the OS and made it possible. Didn’t get the job, despite the incredible amount of ingenuity such a solution implies.

  1. Technically, the BIOS is already a bootloader. It provides basic input and output to the system. It takes care of setting up memory and other stuff like the disk drive, video and hard disk; after this it loads a second-stage bootloader like this one.

    Without the BIOS, the bootloader would be much more complicated than this.

    1. Ask the guys working on the Cromwell BIOS for the original XBOX or the people behind Libreboot just how hard it is to produce a true bootloader for a modern x86 system.

      Doing it for an older PC (say, original IBM PC or 286 or 386) would probably be a lot easier if you didn’t need to be specifically compatible with existing PC software.

      1. Well, it would be a lot easier, if it weren’t for the gazillion different CPUs and configurations out there, not to mention at least three different ways of determining the amount and location of RAM…

      1. Pretty much like letting go of the wrong end of a hot soldering iron. Real mode (what a bogus name; what is “real” about it?) is the old, non-32-bit, run-like-an-ancient-Intel-processor mode that x86 processors fire up in. Just thinking about it makes me glad I am working with ARM these days. Now that I am hooked on ARM, I don’t think I will ever look back. Think of dental work without anesthesia. I could go on, but you get the idea, I hope.

        1. The reason “real” mode is there is backwards compatibility. True, it is rarely used these days, but it was still used significantly back in the day, much as 386, 486, and early Pentium machines had “turbo” buttons. These were not really turbo buttons but slow-down buttons. They would drop the clock of these newer chips down to the speed of an 8088/8086 for older software. This was frequently an issue with some games: if the chip was not slowed down, the game would play in hyper-fast motion because things like timing loops were designed around the slow speed of the first 8088/8086. This is all still there today because of the x86 design decision to keep backwards compatibility in mind. Sure, you could drop all this stuff, but then it truly wouldn’t be an x86 any longer.

          If your x86 CPU didn’t support real mode, you would not be able to boot a bootable USB stick that has DOS on it. DOS needs real mode.

          Also in x86 systems, almost all of the bootloaders still need real mode. The systems don’t switch into protected mode until the operating system’s kernel begins to load and the kernel makes that change.

          1. @Jim:
            See my comment below…
            The bootloader that boots the UEFI mode bootloader… It starts in 16-bit real-mode to decompress the UEFI routines, modules, code, blobs, etc.
            Then UEFI has a filesystem driver built-in to load a file off said filesystem and said file is yet another boot-loader.
            So now the modern X86_64 boot chain is as follows:

            Platform bring-up OS bootloader -> boots: platform OS, which waits for the power switch to close on a GPIO (Intel ME or platform security) -> on power switch GPIO close, boots: UEFI bootloader -> boots: UEFI service -> boots: UEFI application (OS bootloader) -> boots: the OS!

            I’ll try to TL;DR that:

            Bootchain:
            PCH – Intel ME – ROM loader – UEFI service – UEFI application – operating system.

            0-bit – 0-bit – 16-bit – 32/64bit – 32/64bit – 32/64bit
            Where 0-bit means main CPU is off!

          2. @Mr Mannering
            I’d suggest reading the platform initialization (PI) spec.
            In a nutshell, here is the boot process coming from the x86 CPU reset vector (there is a bunch of stuff the PCH needs to do in descriptor mode, but that’s chipset specific).

            The reset vector is the SEC phase of the PI spec, and the CPU is in real mode. Anything needed to establish the root of trust is done here: some special magic is done so the CPU can use some of its cache as RAM for a C stack, the CPU is put into 32-bit protected mode, the stack is set up (for C code), and the code searches for the Pre-EFI Initialization (PEI) core entry point. Main memory has not been discovered or initialized yet, so code is being executed from flash and the heap is extremely limited. This leads to some interesting caveats, such as all globals being read-only and restrictions on how to work with variables. The main goal of the system right now is to run the code needed to program the memory controller and CPU interconnects (which is a bit more complex in a multisocket server).
            Once that code is complete, PEI can now support a heap, shadow code into RAM rather than execute directly from flash and can handle compressed modules. There might be some remaining chipset initialization that needs to be done here but the system goal at this point is to locate the DXE core and pass control to it.

            Once control is passed to the Driver Execution Environment (DXE) core entry point, the system will be put into 64 bit mode if the CPU supports it. The format and data structures specified in the PI spec become much more important. DXE will look in specified firmware volumes (FV) and parse their firmware file system (FFS) entries. Those FFS entries will contain sections which may be compressed or uncompressed, may be executable code, may be data, or something vendor defined.

            The behavior of DXE is largely up to the OEM. The modules the DXE core executes can be defined to be a minimum set needed to get to a specific boot device, or can be a complete system initialization. The system then enters boot device selection (BDS) where it tries to figure out where the OS the user intends to boot is located and load its boot loader.

            It’s also important to note that FVs and FFS entries are used in PEI and DXE. Usually all of PEI is in a single FV. The DXE core and required architectural components will be in a separate FV. The built in modules may be in the same FV as the DXE core, or it might be in a separate FV. It’s generally up to the OEM though the reference designs are typically one FV for PEI, and one for DXE and BDS.

            Backwards compatibility on a UEFI system is handled by something called a compatibility support module (CSM). The ones I have seen are basically stripped-down legacy BIOS projects. All the stuff like the interrupt vectors and in-memory call tables for the 16-bit interfaces is in this code. There can be complex systems where an interrupt vector (like int 13 or int 10) triggers a thunk to 32- or 64-bit UEFI code which does the work, the system thunks back down to 16-bit mode, and all the data is passed as the legacy call specifies. This can also be done using something called a system management interrupt (SMI), which has the ability to intercept accesses to certain IO ports so things like legacy keyboard and mouse accesses can be made to work with modern USB keyboards and mice.

            The reverse is also true, it is possible to wrap a legacy service with a UEFI driver so there can be a UEFI block IO driver provided for an add in card that only supports a legacy int 13 interface.
            Both of these methods are really hacky and are responsible for things like the int 10 weirdness when the initial OS transition to UEFI was going on.

          3. Ok, @Matt:

            Yes, I’ve read both the PI specs and the platform-specific datasheets. I have looked into the Q67 chipset and the GM45 chipset, and checked out some platform datasheets for the 5th-gen architecture.

            So for the most part you just confirmed what I just said, think about it:
            You already mentioned the platform-specific stuff with the “descriptor mode”; that is the PCH+ME and other pre-power-on stuff. Though there is descriptor mode stuff that happens after the main power is applied:

            0.8v-standby for the PCH and ME (AMT) stepped down from the 5v-standby (AT/ATX) or the 3.2v-to-19v (portable/battery-operated)
            (all other VRM units are off)

            Power-up:
            PCH @ 0.8v-AMT + 0.8v to 1.5 (or user/MFG supplied) for peripherals and interconnect,
            0.7v to 1.5v CPU (upon request from the VID lines)
            2.5v-ddr 1.8v-ddr2 or for DDR3: LDDR3 1.2v, DDR3 1.5, HDDR3 1.8v (High Performance)
            and any other system voltages.

            If what you said for a modern Core i series is as true to the datasheet as you say, then why is it so difficult for coreboot, Libreboot and Purism to just simply disable Intel ME or any of the other blob-ware?
            and why can’t they just compile their own ME firmware and just run it?
            Also why do Libreboot and friends go on about how little documentation there is for this stuff and why is it they produce at least some results if they’re always wrong? (after all that is one of my many sources + experience)

            Modern x86 is an onion of layers and is essentially an x86 emulator on a couple chips with more x86 (ARC on older ME) emulators scattered around running various “security” stuff.

            P.S. SoC units like the Bay Trail and Cherry Trail are worse: secure boot is optionally burnt into the fuse table on the CPU/SoC itself, and thus the ROM is checked before the ROM can check itself (it seems partitionable, as NVRAM emulation partitions are allowed changes)

            .
            .
            .

            TL;DR:
            My post that you wrote against was just a TL;DR version of what you wrote: heck I even skipped out Cache-as-ram mode since it was about the boot chain, not the nitty-gritty instruction-by-instruction description.

        2. Real mode is a 16 bit instruction subset with unprotected segmented memory access. What’s so bad about it? If you use a higher level language the only problem is choosing the correct memory model, if you code assembly then why do so if you don’t like it?

          Really don’t see the problem.

    1. OK, let’s all stop using Intel processors and SoCs… Or if we need Intel, let’s all move to the Itanium* architecture!

      I’m writing this on a machine that started in real mode. Massive data centers started in real mode (assuming IA-32_64),
      UEFI SoC tablet PCs included.

      Heck, many “pure UEFI” systems use old routines from a cut-down BIOS that don’t expose their API in the same places as the old BIOS spec. Instead of 10h/13h for primitive video at 0xA0000 with just under about 500KB of linear allocation AFAIR, only UEFI can fall back to real mode and change these… or just have direct access to the memory mapping tables in the descriptor tables and move the buffer into a local buffer (the ones that can’t display the UEFI shell EDIT command properly!).

      I tried to modify the Tesco Hudl 2 BIOS and all I got was a stone cold CPU with no voltage: Some form of boot-ROM signed boot was burnt into the SOC fusible table!!!
      It should have got to 50 to 70°C by being in a real-mode tight loop consisting of:

      Pseudo ASM:

      reset_vector: ; located at F000:FFF0
      ; Why does Intel boot from the end of ROM space? Need more room!
      jmp start

      start:
      ; Copy the image from ROM and other routines
      jmp halted ; was "jmp decompress", changed to a halt-and-catch-fire routine

      halted:
      invd ; invalidate the cache so it doesn't accidentally contain IRQ service routines
      cli ; mask interrupts before they are cached AND serviced
      jmp halted

      That was the 16-bit area of the ROM that copied the remaining 8MB mem-mapped ROM (excluding the other 8MB Platform Security partition) and subsequently decompressing the 32-bit UEFI mode and jumping to it when the magic byte appeared at the end of the decompressed code (the last byte of the data).
      Oh and it won’t load Platform security if the real-mode code doesn’t check out due to a hard-fuse in the SoC!

      *Isn’t Itanium supposedly a pure 64-bit architecture?
      Also, I assume the SoC didn’t run that code at 90MHz (standby-exec/connected-standby) whilst the CPU only needed 0V across VCC and VDD? ’Cos that’ll explain the cold CPU!

      1. For compatibility yes. But it isn’t what the OP meant. I kind of like the original x86 instruction set and design, it is clean but restricting with most registers having specific functionality with specific instructions using them. If one can accept those limitations it is a good design.

        Intel did have an x86 chip that started in 32-bit mode (don’t remember the name/number; it was an 80386-based design). Almost nobody wanted it, and Intel learned from that.
        Besides, the extra functionality required for real mode support is small; most instruction decode and execution logic is also needed for 32- and 64-bit modes. Allowing 16-bit protected mode means that one almost has real mode supported already, and most of the 32-bit support also requires supporting 16-bit protected mode.

        TL;DR supporting real mode gives support for 8086 software, not supporting it wouldn’t buy much.

        1. You’re probably thinking of the Pentium Pro. It was a complete redesign from the ground up for 32bit environments. The Slot1 Celerons, Pentium 2, 3 and Core/Core Duo, Core Quad were derived from the Pentium Pro with added support for 16bit to be more usable for home environments where win95/98/ME was still common. Pentium 4 was its own power hungry netburst beast. And I believe the current i series processors we have these days are a merging of the Pentium Core line with some of the Netburst features.

          1. That’s strange because every BIOS image I’ve seen for the Intel atom SoC (cherrytrail/baytrail), QM67-core-i-series, core2duo/quad, Pentium M (P3 with enhanced speedstep, sprinkling of new instructions from P4 and more cache)… They all have 16-bit code strapped at F000FFF0 and the block boundary begins at around 1MB before the F000FFFF alignment.

            Thus Pentium and onwards (Pentium Pro was the server stuff of the day, just me being picky… sorry) was an add-on to 16-bit real mode… hence you have had to “enter protected mode” ever since.

            Also I’ve only once used IDA to modify a BIOS into a halt-and-catch-fire panic() like loop… that didn’t even boot due to DRM* burnt into the SoC.
            I have viewed plenty of BIOS images in IDA, BTW.

            * DRM should be acronym for: Damn Retarded Morons, or, Dangerously Restricted Machinery
            One to describe the people who came up with DRM and one to describe the crippled state of the machine.

          2. Nah the Pentium Pro supported all the horrible old 16-bit stuff, including the really bizarre way Intel implemented paging with the segment registers. It was just slower at running 16-bit code than ordinary Pentia were. At the time there was still a lot of 16-bit code being run on PCs, even Windows 95 used a fair bit of it. So the “Pro” failed because of that.

            Is there a prize for stupidest name in a processor? Pentium must be a contender. As in, “we can’t trademark 80586”. Seems a lot like they had a competition and the losing entry got the job for some reason. Maybe they used FMUL in the entry-picking code.

            And then after that, well, Hexium, obviously. Nope! And then fast-forward and what have we got, “Core”. May as well call the next one “Chip” or “Processor”, “Bucket O’ Transistors” maybe.

            All that money…

          3. The Pentium Pro booted in real mode as all other x86 processors. It wasn’t fast when executing 16 bit code but it did execute it. Later processors in the series (Pentium II, III) improved the 16 bit support a bit.

            Current x86 processors do kind of the same thing, support legacy software but optimizing for the common case which is 32 bit mode without segmentation or 64 bit mode.

            As Doc Oct wrote (thanks!) it was the 80376 I was thinking about.

    2. Millions of computers stop running in real mode every day, a few milliseconds after they are reset. :-)

      Programming in real mode is not that hard, certainly not for a small program like a boot loader or a slightly bigger program such as a BIOS which is basically a boot loader and some device drivers. If you’re afraid of real mode, you shouldn’t call yourself a hacker.

  2. Great post. Would have loved to have had this 20 years ago when I was writing an OS for the 386. It took me a long time to figure out the A20 nonsense. I spent almost as much time on the bootloader as I did the kernel!
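For anyone wondering what the A20 fuss is about, here is a small sketch of the address arithmetic, under the usual description of the A20 gate: when the gate is disabled, address line 20 is forced low, so real-mode addresses just past 1 MB wrap around to the bottom of memory as they did on the 8086. The function name `physical_address` and the constant `A20_BIT` are made up for this illustration; actually toggling the gate on hardware is a separate matter.

```python
A20_BIT = 1 << 20  # address line 20, the 21st address bit


def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Return the address that reaches the bus for a real-mode segment:offset."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        # With the gate closed, bit 20 is masked off and the address wraps.
        linear &= ~A20_BIT
    return linear


# FFFF:0010 is the first byte above 1 MB, but only if A20 is enabled.
wrapped = physical_address(0xFFFF, 0x0010, a20_enabled=False)
above_1mb = physical_address(0xFFFF, 0x0010, a20_enabled=True)
```

A bootloader that wants to use memory above 1 MB therefore has to make sure the gate is open first, otherwise odd megabytes silently alias the even ones.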

  3. Hi Steven! Great article. Many Forth systems from the ’70s and ’80s would do that from their core. They were also able to read and write raw sectors from hard disks and floppy disk drives (dividing the media into 1024-byte (1K) blocks). I remember from that time that volksForth86 from the Forth Gesellschaft in Germany did this in the x86 version. It is still available if you want to give it a try: it is a 32K .com with online assembler and editor included, and fully documented source with all the calls to the BIOS functions. It was also easy to make it start from EPROMs or floppy disk emulators in a ROM/EPROM card. Since it saved and generated .COM files, it was excellent for creating device drivers for DOS too… and TSR routines.

  4. Real mode with the segmented addressing is one of the ugliest things I have ever seen in computers. And a real WTF for a person like me that used to program 680x0 in assembly.
    Programming it in assembly is like solving a puzzle all the time. Why the hell is the segment selector shifted by 4 (!!!) bits when its purpose is to be combined with a 16-bit offset?!! Is that some technical limitation of the day, a joke, or idiocy of the designers? The first and only thing I do in real mode is escape from it to protected mode with linear addressing.

    Good luck to everybody enjoying real mode, especially programming VGA through the narrow addressing window instead of a linear buffer; it is a pure joy and time saver.

    1. If you read up on how and why segmented addressing was implemented the way it was, it was actually (yes arguably) pretty brilliant. But you should think about it from the perspective of a CP/M programmer. In CP/M there was a predefined memory map where a bunch of things were in low memory (addresses 0000-00FF hex) and programs were loaded and started from address 100 hex. It was possible (with a few tweaks) to take the source code for a CP/M program written for the 8080 processor and assemble it for the 8086, and run it with almost no modification if the CP/M BIOS/BDOS calls were implemented.

      With the segmenting scheme, on the 8086/8088 you could load multiple programs in memory, at intervals of 16 bytes, and each program could be made to think that its memory map started at 0000. In the worst case, if you had a computer that had the huge amount of an entire megabyte of memory, you could load 16 programs at the same time, each with 64KB of address space, and they would never touch each other as long as they didn’t know how to use the segment registers. And of course in the best case, programs that would behave nicely and wouldn’t peek or poke into memory that they weren’t supposed to peek/poke could just be stored at intervals of 16 bytes so you didn’t even need that gigantic megabyte of memory.

      Yes, real mode gets to be a real pain in the butt when you have to start juggling pointers around. And of course not having a memory management module is a big pain if you want to have an operating system that lets programs work together and not take the entire system down when they don’t. We’re pretty spoiled nowadays. But for booting a machine, real mode addressing should be just fine. Nobody needs more than 64K to boot a PC (right? :-)
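The arithmetic behind this scheme fits in a few lines. The sketch below is only an illustration of the standard real-mode rule (segment times 16 plus offset); the helper names `linear` and `segment_for` are invented here and do not come from the article or the comment.

```python
PARAGRAPH = 16  # segments can start on any 16-byte boundary


def linear(segment: int, offset: int) -> int:
    """20-bit real-mode address: the segment is shifted left by 4 bits."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def segment_for(load_address: int) -> int:
    """Segment value that makes offset 0000 point at load_address."""
    if load_address % PARAGRAPH:
        raise ValueError("load address must be 16-byte aligned")
    return load_address // PARAGRAPH


# Sixteen programs, each given a private, non-overlapping 64 KB window.
program_segments = [slot * 0x1000 for slot in range(16)]

# A CP/M-style program starts at offset 0x100 within whatever segment it
# was loaded into, so the same code runs unchanged at any paragraph.
entry_points = [linear(seg, 0x0100) for seg in program_segments]
```

The same rule is why a boot sector loaded at 0x7C00 can be addressed either as 0000:7C00 or as 07C0:0000; both pairs name the same byte.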

  5. This is great! I’m working with an older 486 machine that I need to get running C code. I lost the harddisk and I am running all my code on the EPROM, so I need to set up some kind of netboot setup through serial.

  6. I once dabbled with the Linksys NAS200 which runs an RDC8610 which is basically an 80386 emulator running at 150MHz. Of course one of the things I did while I worked on it was try out some commands in the interactive boot loader, which made me brick my machine. I had to create a JTAG cable to make it boot one instruction at a time, and the RDC8610 apparently has some bugs in real mode which made it do all kinds of weird stuff while booting (I think the problem is that it was impossible to set the CS register through the JTAG interface or something).

    I don’t remember how I finally got it to work again, but I think it was basically stepping through memory one instruction at a time and trying to store a long JMP instruction somewhere that would make it go to the code that would set up the GDT and switch to protected mode, so I could make it run the TFTP server to download the boot loader again.

    Good times!
