The Unusual Pi Boot Process Explained

If you’ve ever experimented with a microprocessor at the bare metal level, you’ll know that when it starts up, it will look at its program memory for something to do. On an old 8-bit machine, that program memory was usually an EPROM at the start of its address space, while on a PC, it would be the BIOS or UEFI firmware. This takes care of initialising the environment in both hardware and software, and then loading the program, OS, or whatever the processor does. The Raspberry Pi, though, isn’t like that, and [Patrick McCanna] is here to tell us why.

The Pi eschews bringing up its ARM core first. Instead, it has a GPU firmware that brings up the GPU. It’s this part of the chip that then initialises all peripherals and memory. Only then does it activate the ARM part of the chip. As he explains, this is because the original Pi chip, the BCM2835, is a set-top-box chip. It’s not an application processor at all, but a late-2000s GPU that happened to have an ARM core on a small part of its die, so the GPU wakes first, not the CPU. Even though the latest versions of the Pi have much more powerful Broadcom chips, this legacy of their ancestor remains. For most of us using the board it doesn’t matter much, but it’s interesting to know.

Fancy trying bare metal Pi programming? Give it a go. We’ve seen some practical projects that start at that level.

7 thoughts on “The Unusual Pi Boot Process Explained”

  1. Well, this is close to useless – he tells us three times that:
    1) The GPU starts up from either internal ROM or external EEPROM, while the ARM cores are held in reset.
    2) The GPU reads config.txt, initializes all of the hardware, and finds the kernel.
    3) The ARM core runs the kernel.

    Yeah, I just figured everybody needed to see that one more time.

    The linked HaD article on bare metal programming the Pi may actually be useful, though. I don’t know how I missed that the first time around.

    1. Well, it may not be ready for prime time, but that page looks like a treasure map to me. Not that I’m ready to do that kind of work; the Circle project https://github.com/rsta2/circle looks more like my speed, since it shows you how to build a “kernel” (i.e., any program you want to run on an A7, A53, or A72 core in lieu of a Linux kernel) that the standard Pi firmware will boot. I don’t have time for it right now, but it’s in my notebook.

  2. The “Traditional PC Boot” sequence is also wrong. The BIOS is definitely not the first code that executes on any modern PC! For instance, on Intel PCs, a small mask ROM executes on the Intel Management Engine core, which then loads and verifies a second-stage Management Engine binary that lives in the BIOS flash. As I understand it, ME does not initialize DRAM itself (DDR5 training is tricky business!), but it does initialize plenty of “uncore” components, and it also adjusts some pad settings based on configuration options in the BIOS flash. Only after setting up those parts of the system, and loading in verification keys for Boot Guard, does ME then jump into the BIOS flash (which proceeds to set up DRAM).

    This post says a bunch of contradictory things about how Pi’s secure boot is implemented. (It suggests that Pi’s secure boot does not provide immunity to malicious GPU firmware, but then says that the chain of trust does start in the GPU. As it turns out, Raspberry Pi’s own documentation (page 5) shows that GPU firmware is also verified.)

    Ugh. This post is very confidently written for how wrong it is.

    1. But to be fair, the IME sort of is a whole embedded computer running Minix.
      It’s thus more like an external debugger hardware that analyzes/manipulates the “PC” side at will.
      Also, modern x86 CPUs aren’t true x86 anymore. They’re RISC-CISC hybrids that have a front-end that converts x86 instructions into native format.

      Heck, even the 8086/8088 had microcode that performed things in software rather than in hardware! :)
      The NEC V20/V30 line thus was more “native” at executing x86 instructions than the Intel 808x line ever was.
      Before the 80286 (maybe the 80186), the x86 line didn’t even have dedicated hardware for address calculation – it had to use the ALU for that (slow).
      The Zilog Z80 likewise also used more hardware implemented functions, which made it more “native” at code execution than i8080/i8085.

  3. “On an old 8-bit machine, that program memory was usually an EPROM at the start of its address space”. Commodore 64, old and 8-bit enough? The 6502 starts at the end of its address space. Just like the 8086, 80186, 80286, 80386 and 80486 processors. And back in the Commodore times, commercial computers simply had ROM. Not EPROM.

    1. The 6502, coming out of reset, would read the two bytes at FFFC and FFFD to obtain a 16-bit starting address. (Much like the 68xx it evolved from, which keeps its reset vector at FFFE and FFFF.)

      The 68K, coming out of reset, would read the 32-bit word at address 000000 into the stack pointer, and the next 32-bit word from 000004 into the program counter, to get going.

      Meanwhile, the 8088 approach was to just start executing at address 0000 in segment FFFF, which if you ironed that out to a flat 20 bit space, would be FFFF0, i.e., the last 16 bytes of the address space, room enough for a couple of instructions, e.g. a jump back to a more convenient location in the ROM.

      Of course, in those days, the DRAM refresh usually took care of itself (baked into H/W), or was handled once you init’d your DMA controller. After that, you’d get your interrupt controller and interrupt sources under control, maybe init a video system, and then maybe go for a boot block from a disk.

      The actual CPU took care of all of that. That was when processors were products that tended to be very well debugged before they hit the market. But then, those products had on the order of 10^3 to 10^4 transistors, so “getting it right” was undoubtedly a lot easier than the situation today. With north of 10^9 transistors, apparently we’re stuck with CPUs that themselves need to be initialized (patched) before running, by some simpler, internalized CPU like the IME or what have you.
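
For anyone who wants to poke at the reset conventions discussed in this thread, here is a minimal Python sketch of the arithmetic involved. It assumes the vector locations given in the respective datasheets: the 6502 reset vector at FFFC/FFFD (little-endian), the 68000 initial stack pointer at 000000 and initial program counter at 000004 (big-endian), and the 8086/8088 starting at FFFF:0000. The function names and the memory images are hypothetical, made up purely for illustration, and none of this models real hardware timing.

```python
def reset_6502(memory):
    """Return the 16-bit start address a 6502 loads from its reset vector."""
    # Little-endian: low byte at $FFFC, high byte at $FFFD.
    return memory[0xFFFC] | (memory[0xFFFD] << 8)


def reset_68k(memory):
    """Return (initial stack pointer, initial program counter) for a 68000."""
    # Big-endian 32-bit words: stack pointer at 0x000000, PC at 0x000004.
    ssp = int.from_bytes(memory[0:4], "big")
    pc = int.from_bytes(memory[4:8], "big")
    return ssp, pc


def physical_8086(segment, offset):
    """Fold a real-mode segment:offset pair into a 20-bit physical address."""
    # Segment is shifted left by 4 bits; the sum wraps at 1 MiB on an 8086/8088.
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical 64 KiB image for a 6502 whose code starts at $E000.
rom_6502 = bytearray(0x10000)
rom_6502[0xFFFC] = 0x00  # low byte of the start address
rom_6502[0xFFFD] = 0xE0  # high byte of the start address

# Hypothetical 68000 vector table: SSP = 0x00010000, PC = 0x00000400.
rom_68k = bytes.fromhex("0001000000000400")

print(hex(reset_6502(rom_6502)))
print([hex(value) for value in reset_68k(rom_68k)])
print(hex(physical_8086(0xFFFF, 0x0000)))
```

The last call shows why the 8086/8088 entry point sits 16 bytes below the top of the 1 MiB space: FFFF shifted left by four bits is FFFF0, leaving just enough room for a jump to somewhere more convenient in the ROM.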
