Degrees Of Freedom: Booting ARM Processors

Any modern computer with an x86 processor, whether it’s from Intel or AMD, is a lost cause for software freedom and privacy. We harp on this a lot, but it’s worth repeating that it’s nearly impossible to get free, open-source firmware to run on these machines thanks to the Intel Management Engine (IME) and the AMD Platform Security Processor (PSP). Without libre firmware there’s no way to trust anything else, even if your operating system is completely open-source.

The IME and PSP have access to memory, storage, and the network stack even when the computer is shut down, and once the computer boots they run at such a low level that the operating system can’t see what they’re really doing. Luckily, there’s a dark horse in the personal computing race that gives us some hope that one day there will be an x86 competitor that lets users run free firmware they can trust. ARM processors, which have been steadily gaining user share for years but are seeing a surge of interest since the recent announcement by Apple, are poised to take over the personal computing world and hopefully give those of us concerned with freedom and privacy some relevant, modern options. But in the real world of ARM processors, the road ahead will decidedly be long, winding, and forked.

Even ignoring the tedious nitpick that the distinction between RISC and CISC is more blurred now than it was “back in the day”, RISC machines like ARM have a natural leg up on the x86 CISC machines built by Intel and AMD. RISC machines use simpler instruction sets and run with far better thermal efficiency than their x86 competitors; they can often be passively cooled, while many AMD and Intel machines need noisy or bulky fans. But for me, the most interesting advantage is the ability to run ARM machines without the proprietary firmware that comes with x86 chips.

ARM is an Architecture Licensed to Many Manufacturers

Unlike x86 manufacturers such as Intel, ARM doesn’t make any chips itself. Rather, it maintains its architecture and licenses it to other companies, who in turn build processors that use the ARM instruction set. There is an almost uncountable number of companies making ARM processors: Broadcom, Qualcomm, Rockchip, Atmel, STMicroelectronics, and Texas Instruments, to name a few. And don’t forget Apple, who have been making ARM-based phones and tablets for years and who are about to transition their entire line of products to this superior architecture.

The diversity in manufacturers is both a blessing and a curse when it comes to privacy-respecting options for firmware and software. With so many manufacturers, ARM chips are in almost everything, and they are so common that there is an easily accessible wealth of knowledge about how to build software for them (even though desktop computing applications are just a little bit behind).

Applications for the platform are varied as well, from microcontrollers to routers to smartphones and a handful of PCs. However, as anyone with an Android phone may have experienced when trying to unlock their bootloader, there is no uniform way that ARM processors are booted, and no uniform or even standardized boot software for ARM-based chips. Some use U-Boot or coreboot, some require binary blobs, and still others ship proprietary firmware that cannot be inspected or modified in any way and may even prohibit modifying other software on the device.

Companies building ARM devices are free to make them as open as possible, as Pine64 does with its phones, tablets, and computers, but others (including cell phone service providers like AT&T or Verizon) can use the same freedom the ARM platform affords them to make sure their customers have almost no access to the software running on that hardware. Finding open ARM platforms is a challenge if the original manufacturer or supplier didn’t make it a priority, but there are some other options available.

Finding Your Way to ARM and Libre Firmware

One of the more favorable of those options is the Rockchip RK3288, which is built around ARM Cortex-A17 cores and can be found in a number of different Chromebooks. Libreboot, a free and open-source firmware available for a small set of computers, supports these chips as well, which means that (as long as you can get the right graphics driver installed) you can run 100% free software on such a machine. Of course, the chipset is around six years old, so while it is a fair bit newer than most other computers running libreboot (my personal libreboot laptop, for example, is of 2008 vintage), it’s still not the most modern processor out there.

PineBook Pro teardown shows a Rockchip RK3399 ARM processor.

For something a little newer, a great example of the openness possible with ARM comes from Pine64, which produces several laptops, phones, and a tablet all based on ARM chips. Their PineBook Pro, for example, uses the upgraded Rockchip RK3399, which pairs two Cortex-A72 cores with four Cortex-A53 cores so the system can schedule each task on whichever core type suits it best, and of course it uses a libre bootloader as well. The offerings from Rockchip aren’t the only options, either; the Free Software Foundation keeps a list of other systems-on-a-chip with varying degrees of software freedom.

Popular Choices for Open Bootloaders on ARM

While the open, diverse nature of ARM means that anyone anywhere can code a firmware/bootloader/BIOS for their specific platform of choice, it’s not necessary to reinvent the wheel. There are a few options already out there that are popular choices.

The most free of these is the oft-mentioned libreboot, which is 100% free and open-source software and never includes any binary blobs. It is available for a handful of ARM-based laptops from the early 2010s (as well as some older x86-based boards). Libreboot itself is a distribution of coreboot, a bootloader that is largely free (licensed under GPLv2) but occasionally relies on proprietary binary “blobs” of non-free software to bring up certain hardware that has no free initialization code.

Besides these two main bootloaders there is also Das U-Boot, or simply U-Boot, another free bootloader available for various platforms including ARM. Many specialty bootloaders exist as well, such as RedBoot, Red Hat’s bootloader built on the eCos real-time OS, and barebox, which is used largely in embedded devices. Of course, like the many flavors of Linux, there is an astounding number of other bootloaders available with various features and levels of freedom.

You Should Value Your Privacy and Security

With so many variables, hopefully the coming ARM revolution will include free options for those of us who value security and freedom from the ground up. While Apple almost certainly will not use a free or open-source bootloader as the firmware for its laptops, Apple isn’t actually driving this movement. There’s a sea change happening right now throughout the computing world in favor of ARM processors over their less efficient and less secure x86 competitors, and if Apple is any indication it may eventually spill over into the rest of the PC world as well.

The current state of PCs doesn’t really allow us to “vote with our wallets”, since there are almost no options in the landscape for security or privacy. But your privacy and security have value. With the diversity of manufacturers of ARM devices, I am hopeful that a growing number of companies will listen to our needs and finally offer modern, powerful, and competitive computers built from the ground up with hardware, firmware, and software choices that begin with privacy and security in mind.

97 thoughts on “Degrees Of Freedom: Booting ARM Processors”

  1. I don’t see why this is x86 specific.
    The market forces that drove the IME and PSP will probably still be there for an ARM, RISC-V, or any other core.
    Trusted boot, debug, remote security management, safety requirements, hacker/virus detection, and so on all drive needs for deep HW access outside the user application space.

    I agree we should all value privacy and security, just saying more has to change than the CPU to do this.

    1. I don’t really think the IME/PSP is concerning because it’s *there*, it’s concerning because it’s a black box. If you want those capabilities in a computer that’s fine, it just needs to be open source and accessible to the user/owner.

        1. From what I can tell, not every ARM processor is required to use ARM TrustZone. If it were analogous to the IME/PSP then any ARM without it wouldn’t start up, or would cripple itself in some way if TrustZone wasn’t detected. As far as I can tell this isn’t the case. It also seems that TrustZone is less of a black box than the IME/PSP. As it stands right now, you can get ARM processors to easily boot with libreboot/U-Boot/coreboot. This is not the case with the IME/PSP, where you need a computer from 2008 at the newest in order to remove the proprietary firmware.

        1. Don’t even put the Raspberry Pi or its processor in the same classification as ANY other ARM device. The RPi has, simply, a closed, black-box, proprietary-blob CPU AND low-level operating system (gee, you CAN’T program the RPi in Assembly Language? Now, why in the world would THAT be…?). It has been STRICTLY PROPRIETARY, since the very first day, and will never change.
          I assume you’ve NEVER WONDERED why you have to run a high-level operating system just to get the RPi to do what it’s best at: flashing LEDs, and a poor imitation of an SE/NE 555 timer at that, have you?

  2. First I was thinking “what’s the big deal? Read block zero, save it in RAM, and jmp to it”. But these things are turning into full featured OSes in their own right. Also, the Wikipedia rat hole reveals that GRUB2 can boot from a Zip drive, so there goes the afternoon. Thanks, Hackaday.
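
    For illustration, here’s that conceptual model as a C sketch; disk_read() is a made-up ROM routine (real first-stage code is assembly with no C runtime), so treat it as a picture of the idea, not an implementation:

        /* hypothetical ROM service: read `count` sectors starting at `lba` */
        extern void disk_read(unsigned lba, unsigned count, void *dest);

        typedef void (*entry_fn)(void);

        void boot(void)
        {
            void *load_addr = (void *)0x8000; /* arbitrary load address */
            disk_read(0, 1, load_addr);       /* read block zero into RAM */
            ((entry_fn)load_addr)();          /* jmp to it */
        }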

    OK, SPARCstations had a FORTH interpreter, so I guess you could use the boot ROMs as your main machine. It’d be an Apple ][ on steroids…

      1. Au contraire! Sure, BOOTP/TFTP booting still requires a minimal IP stack and a driver (well, a ROM anyway), but everybody supported it. Your stack only needed to support a minimal IP implementation, and just enough UDP to fetch an IP address and TFTP down your kernel. And in the days of net booting PCs, the stack was contained in the NIC’s bootrom. PXE came later and add some necessary (and unnecessary) complications.
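
        To make “minimal” concrete, the read request a TFTP client sends (per RFC 1350) is just this; a C sketch of the packet layout, with socket setup and the DATA/ACK loop left out:

            #include <stdint.h>
            #include <string.h>
            #include <stddef.h>

            /* TFTP RRQ: 2-byte opcode, filename, NUL, mode, NUL. */
            size_t build_tftp_rrq(uint8_t *pkt, const char *filename)
            {
                size_t off = 0;
                pkt[off++] = 0;                     /* opcode 1 = read request, */
                pkt[off++] = 1;                     /* in network byte order    */
                strcpy((char *)&pkt[off], filename);
                off += strlen(filename) + 1;        /* include the NUL */
                strcpy((char *)&pkt[off], "octet"); /* binary transfer mode */
                off += sizeof("octet");             /* include the NUL */
                return off;                         /* send to UDP port 69 */
            }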

        And to belabor the point, yes booting was really that easy. That’s how booting worked once the first engineer integrated enough bootcode into a ROM at the start address of a CPU.

        1. Wow, you just repeated what I said: you need to have a network stack and a NIC driver for network booting. Not “everyone” supported it; there are lots of old ISA network cards with no boot ROM socket.

  3. The reason for the IME and PSP existing is to maximise profits. In effect the same silicon, after testing, is being sold as different products. The IME/PSP will tell you what the device is and which features you are allowed to access. If they are making 3.2 GHz 64 core CPUs and there is a very low yield rate on the silicon wafer, maybe they can be sold as 2.4 GHz dual core CPUs; just get the IME/PSP operating system to enable only what should be reported back and allowed access.

    For maximum profit, licensed ARM cores will eventually introduce the same. It is done either with permanent one-time fuses or with “secure” software that is reconfigurable after the product has shipped (which may avoid a costly recall).

    1. If they wanted to offer a version without IME/PSP perhaps they could design it to have some sort of fuse bit that permanently disables IME/PSP. Something which is only intended to be settable at the factory, perhaps with something on the wafer that never gets bonded to any pin.

      That should make it fairly inexpensive to provide both options.

      But I doubt that will ever happen because they would lose their friends at all those three letter agencies.

      1. There’s also the fact that, at very least for AMD (and probably Intel’s IME as well), the PSP does a bunch of the pre-init setup to get the CPU into a bootable state (I think I read somewhere that it does the RAM “training” to dial in the different values of termination resistors for the various lines before the CPU starts up). Simply disabling it at a hardware level isn’t that easy.

    2. You’re talking out of your ass. IME/PSP has nothing to do with binning and disabling features on “cheaper” silicon. CPU/GPU manufacturers have been doing this for decades already by blowing fuses during manufacture.

      1. I know people who work with actual wafers and it is how they described why the IME/PSP exist today. Maybe they are wrong, but I put more faith in their knowledge, working in the industry since the ’80s, than yours.

        1. Intel/AMD wafers, or are they just anonymous wafer fab techs? They have no idea about the AMD/Intel security processor architecture in either case. There are several groups working on reverse engineering the IME and PSP and there is nothing that shows either company to be limiting features using this. It is still set using fuses, of which there are a large number now. Intel has various fuses to lock down what can be done or changed in the IME. I won’t be able to convince you, so believe what you want.

  4. It is funny how often this RISC vs CISC thing gets muddled. The only reason we still have CISC (i.e. x86) is because of the Microsoft stranglehold (combined perhaps with Intel’s lethargy). Any notion that “complex” is better is off the mark — and ignores that under the CISC hood, things are pretty much RISC anyway. The truly amazing thing to me is that the x86 instruction set has persisted to the modern day.

    1. The main difference between RISC and CISC is how many instructions they have.

      Architectures with more than 32 instructions can be called CISC.
      Architectures with fewer than 50 instructions can be called RISC. (Yes, the two overlap.)

      Also, do note, op-codes/instruction-calls aren’t “instructions”. x86 for an example has 90+ op-codes for conditional jumps (“if statements”), but all these instruction calls are largely going to be regarded as the same instruction. The same is true for a whole bunch of other instruction calls for x86. (So people stating that x86 has 2000+ instructions are a bit incorrect. Though yes, there are 2000+ instruction calls.)

      Now, a lot of modern architectures use microcode instructions, breaking down the more insanely complicated instructions into more manageable strips of code. Saving us the need to implement dedicated hardware for seldom used functions, while still getting the performance advantage of not needing to toss around as much memory to execute the same function.

      But microcode doesn’t send the x86 instruction count down below 50 instructions. Not even close, to be fair… Modern x86 processors are still CISC.

      ARM, on the other hand: most of its more modern implementations tend to have more than 50 instructions as well. So even ARM tends to be CISC these days, despite having “RISC” as part of its name. (The same is true for PowerPC, which also started out life as a RISC architecture before IBM/Apple added a bunch more instructions to improve its performance in more application-specific areas, more than once as well…)

      One of the main goals of RISC architectures was that they could provide similar performance across a whole slew of more general applications. After all, there are a lot of applications that only need if statements and basic integer math.

      RISC isn’t magic. It doesn’t “use fewer cycles to do the same work” as so often quoted by people. (including this article… There is more to this, but I’ll get to that in a bit.)

      The main reason why RISC architectures can perform on par, and usually with superior power efficiency, is mainly due to the fact that they have less control logic to bind together their various features. A crude example is a crossbar switch. If we have 4 inputs that we want to link to any of 4 outputs, we need 4*4 gates for controlling that signal flow. But if we have more outputs, we need more gates, and it can get rather absurd if we also add on even more inputs.
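
      To put numbers on that crossbar example, a quick C sketch of how the switch-point count grows with port count:

          #include <stdio.h>

          /* An n-input, n-output crossbar needs n*n switch points,
           * so the control logic grows quadratically with port count. */
          int main(void)
          {
              for (int n = 2; n <= 16; n *= 2)
                  printf("%2d x %2d crossbar: %3d switch points\n", n, n, n * n);
              return 0;
          }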

      But if we run an application that doesn’t need that many resources, nor is really able to make use of them, then all that hardware will only chew power while not providing any additional performance.

      Another application that can make use of these additional resources would see increased performance.

      So what about power efficiency?
      Well, our control logic will generally have a fairly constant power consumption per cycle, and our execution resources will have their own power demands. (And unused resources wouldn’t cycle their transistors, so as to save on power. (This is trivial to implement btw.))

      A RISC architecture would generally need to execute multiple instructions for each CISC instruction. (There are exceptions for the more basic instructions.)
      This means that our RISC architecture will need more cycles to implement the same function as that one instruction over in our CISC architecture. With each of those cycles our RISC architecture will also need to run its control logic, not only slowing down execution but also cycling more transistors and thereby lowering power efficiency. (Similar to the difference between a proper ASIC and an FPGA: FPGAs are rather flexible, but draw more power for less performance compared to an ASIC using the same manufacturing technology/node.)

      A RISC architecture is theoretically unable to achieve better performance or power efficiency than a CISC architecture aimed at our application of choice.

      Now, neither Intel nor AMD, nor most other CPU vendors, make “application oriented CPUs”. Or they kinda do, but it’s way broader… (Though, some CPU vendors do make CPUs aiming at very specific applications and including relevant instructions, bringing forth rather immense performance and power efficiency advantages for said application.)

      Applications that use the CPU’s more advanced functions only seldom will generally not partake in the advantage that those functions bring, greatly narrowing the gap in both performance and especially power efficiency between CISC and RISC implementations. This is more or less where we expect the general “PC” software environment to fall.

      But then we have the “RISC architectures uses fewer instructions than CISC ones!” argument some people make.

      This is actually true at times. CISC and RISC aren’t just divided by how many instructions an architecture has; it is also partly down to the philosophy behind the instruction order, and how instruction calls are made.

      x86 for an example can get a bit flamboyant in how it wishes you to express your code, and a lot of the time you need to do a fair bit of housekeeping just to make it do what you want. (Exaggeration to be fair.)

      While some other architectures are more like programming in BASIC. (Ie, you don’t need to go about resetting various flags.)

      This means that in some cases you can end up using fewer cycles to do the same job. A well implemented CISC architecture on the other hand wouldn’t have this deficit.

      But then we have the question of why is CISC still a thing?
      Well, as stated, even ARM is largely CISC these days.
      x86 and a lot of other architectures do use microcode, but this only removes the more extreme functions that are rarely even used. (Honestly, some of the more niche instructions might never get used for the whole life of a computer… While some applications use those instructions all the time, maybe even to the point where dedicated hardware makes sense, but including hardware for such niche instructions would be a bit silly if people in general don’t use it.)

      Also, well implemented CISC architectures can be less memory intensive compared to RISC ones, since you need less code to do the same job. (Microcode conversion is done inside the CPU and thereby doesn’t need to pass via any external buses, saving the need to also pass along register information and the numerous instructions behind said microcoded instruction.)

      Is Microsoft to blame for x86’s “dominance” in the PC world?
      No not directly, Linux runs on a lot of other architectures, BSD too, not to mention Solaris and other OSes. Even Windows has dipped its toes into other architectures too… (mainly in the embedded world)

      The bigger reasons x86 is around are mainly its insane back catalog of code, its high performance, and the fact that two of the larger CPU suppliers in the world are producing it. (There are technically more ARM devices in the world, though, but if we look at the higher performance side of things, ARM is but a fraction of x86 at current. Though, this might change.)

      So the fact of the matter in regards to the standing of CISC and RISC on the current market isn’t as clear cut as most might think at first.

      Though, I do need to say that I have skimmed over a lot of details here, but this comment is already 100 words shy of being longer than the original article. (excluding this last paragraph.) But to be short, computer architectures are fairly complicated beasts, though it varies from architecture to architecture, not to mention implementations of said architectures. But this comment outlines some of the general differences between CISC and RISC. (Though, to be fair, I more or less expect uneducated troll answers to this comment regardless…. It’s the internet after all.)

      1. RISC started out with slim instruction sets, but the real differentiator is that they use a load/store architecture, vs complex memory-addressing modes in CISC.

        That is, in RISC, the instructions are either load/store instructions, which read or write between memory and registers, or other instructions which operate only on registers.

        In CISC, instructions can perform directly on memory locations – the cpu has to do the load/store implicitly as part of the instruction.

        It’s more efficient to have manual control, and also separation of concerns allows better optimization.
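
        As a concrete sketch, the same C statement compiles to one memory-operand instruction on x86 but a load/modify/store triple on a classic RISC (typical codegen shown in the comments, not guaranteed):

            void bump(int *counter)
            {
                *counter += 1;
                /* x86:  add dword ptr [rdi], 1   ; one instruction, memory operand
                 * ARM:  ldr r1, [r0]             ; load
                 *       add r1, r1, #1           ; modify in a register
                 *       str r1, [r0]             ; store back
                 */
            }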

        1. It indeed tends to be the case that CISC architectures get a bit more hands-on with memory, though it isn’t a major requirement to be fair, and even some RISC architectures do the same, though generally to a more limited degree due to having fewer instructions to start with.

          Though, some architectures don’t even have registers and always work directly with memory.

          But yes, in general RISC architectures do aim for a simpler way to interact with memory by mostly having a few instructions that do anything with it. (Though, then we have cache related instructions to flush stuff out of various cache levels, or to do forced coherence checks, etc. And this of course varies from architecture to architecture as well. Some architectures can even have forced prefetching. (Sometimes I poke at the idea of a system that mostly uses software managed cache and prefetching, but it has some downsides….))

      2. I recall that the original argument for RISC was that with far fewer instructions to decode there was far less silicon needed; on the same size die freed real-estate could be put into task-specific registers so that on a task switch, instead of storing the register contents out to RAM, the task state would just switch to the new task, which would still have its registers loaded, eliminating nearly the entire time cost of a task switch. Further die-space savings could be put to more on-die cache, further decreasing the amount of time spent communicating with RAM. The trade-off was an expansion of the 20% or so of more complex instructions being rewritten by the compiler to use the reduced set.

        It appears to me that CPU designers instead went “wide”. Instead of increasing instruction width, they evaluate multiple paths in parallel and then throw away the results from all the branches that weren’t required after all. I suppose that does make it more difficult to cut power use in the name of performance.

        I’ve understood the main reason for x86 was Intel’s original insistence on not screwing their customers with their new products. Everyone else seemed to think that new processor-> new compiler -> new software version was a great way to go. Compared to some other contemporary architectures, x86 is a horror. But people can get used to horror.

        1. Yes, one reason for going RISC is to reduce die space.
          That’s one reason a lot of microcontrollers use RISC architectures. (PIC for an example only has 22 instructions.)

          And yes, the modern take on serial execution performance, speculative execution, throws power efficiency into the bin in exchange for pure performance. Good for single threaded applications.

          Though, some processors don’t do this, due to parallel workloads benefiting from increased power efficiency and thereby higher density in one’s computing system. (Ie, more performance for the same amount of rack space.) And obviously, processors aiming at power efficiency don’t really go “wide” either.

      3. ” I do need to say that I have skimmed over a lot of details here, but this comment is already 100 words shy of being longer than the original article”
        This comment alone could be a HAD article or even the introduction to a new series. Please consider adding a few more words to maybe not necessarily cover but at least “touch” these details, and make it happen.
        Great read!

        1. Having studied computer architectures for 15+ years and developed a few architectures of my own, I tend to get a bit passionate about providing some more in depth information on the topic.

          Most articles and videos or explanations in general tend to only vaguely hint at the more complicated things. Not to mention that a lot of sources make huge overarching generalizations, so adding “in some architectures” to the end of every explanation is recommended. (Though, a lot of architectures have similarities, but they also have their own tricks that make them interesting. And some are just outright weird…)

          For an example, a lot of people talk about the Fetch, Decode, Execute and Store pipeline, but that is mostly scratching the surface of an antiquated solution. Some single instruction per cycle architectures do use it, but toss in something as trivial as cache and things get more interesting.

          Suddenly one “can’t” just fetch content. Not to mention that branches tend to make fetching a bigger pain as well. If the architecture is multithreaded then we also have thread switching to worry about. And out of order execution will usually add a whole wait stage to our pipeline. (Though, for out of order execution to really make sense, we need to decode multiple instructions in parallel per cycle, and this requires fetching more things in parallel too, not to mention making everything downstream wider as well… And if we toss in some SMT on top, things can get even more “fun”….)

          There are also a lot of misconceptions about processing architectures in general, to be fair.
          Like looking at the performance of a CPU by how many “bits” it has, or MHz/GHz clock speed, or core count. Or various “multiples” of these. (Neglecting any instruction set extensions, memory hierarchies, interconnect topology/protocol/overhead (IPC), what exact instructions we have, hardware accelerators, or even the usable instructions per cycle limit (if supporting out of order), SMT and MT, etc, etc)

          Now, to a limited degree, a quad core 1.5 GHz processor will have “similar” performance to a dual core 3 GHz processor, especially if they both use the same architecture and implementation and aim at the same market. Ie, comparing for an example an i3 to an i7/i9 isn’t logical; one is aiming at performance, the other at power efficiency, wildly different implementations of x86. (And making comparisons between manufacturers gets even more tricky, especially if using different architectures.)

          Though, the software we run will also impact which processor ends up on top to a degree. (Single threaded applications will mainly care about serial execution rate and the out of order queue depth. (Now, speculative execution can trade power efficiency for better serial performance, but this isn’t particularly advantageous for parallel workloads, since there we rather want more cores for our power budget. (Though, Amdahl’s law is a thing, so using big/little cores can be advantageous: some high speed cores, and a smattering of power efficient cores.)))

          And yes, both serial execution rate and out of order queue depth aren’t really stated in any datasheets, to be fair. (Putting a number on these is hard at the best of times. Also, core clock speed doesn’t equal serial execution rate. (The PIC family of microcontrollers is a good example; its serial execution rate is half its clock speed. But with non-fixed cycle times per instruction, things get a lot more complicated, especially when we add in microcode… And the Pentium D is another extreme case: with its 32 stage pipeline, it can grow a lot of bubbles during execution, leading to lackluster real world performance.))

          I will probably make a video series about computer architectures some day or another, but need to up my video editing/production knowledge/skill a bit first…..

          1. Some of those last out of the door P4 cores have some weirdly specific optimisations. A Presler for example, I noticed, has a real outlier high score on the Linux System Info (Hardinfo?) Fibonacci benchmark. Maybe it’s that it hit best case instructions, or maybe it’s one of Intel’s benchmark cheats.

            I was actually testing that system to determine whether it would be mildly useful with a Pentium D in it, and the conclusion was that it didn’t seem worth the bother, couldn’t find anything that made Pentium D seem better than hyperthreading on a similar clock speed late Prescott or Presler. Kinda weird implementation really, you could MP mod two cache starved Durons and put them in a dual socket board and show performance increase, but somehow sticking two P4 cores together does virtually nothing.

          2. RW ver 0.0.1
            Yes, the Pentium D was an odd chip to say the least.

            Probably were great for some applications though.
            But with its huge pipeline it is kinda used as a schoolbook example of “why one shouldn’t build excessively long pipelines for general purpose processors.”

      4. Well, I couldn’t have said it better myself. The “tedious” arguments about RISC vs CISC are FAR from tedious, and you effectively de-pantsed the idea that, given a certain common workload, an equally performing x86 or ARM CPU will use about the same amount of energy and give off the same amount of heat.

        1. Yes, some workloads will have far greater performance on one architecture compared to the other.

          Though, seeing as x86 currently tends to have higher core clock speeds, more cores, and on top much wider out of order execution capabilities, it is unlikely that ARM “outperforms” an x86 processor in raw performance.

          Though, a more apples to apples comparison would need to lock in at least some specs to be on par with each other. Be it core count/clock, or power consumption, or even price. (ARM is after all generally aiming at the lower performance and higher power efficiency side of the market. While x86 runs the world’s fastest super computers….)

          In regards to power efficiency, on the other hand, ARM will most likely do splendidly in a lot of cases, but for a workload that makes ample use of x86’s more nuanced array of instructions, x86 will likely be more power efficient. (Due to the aforementioned control logic cycling that ARM would need. (Though, technically, we can just toss in instruction set extensions catering to the specific application and “solve” this issue. A rough example would be h.264 decoding or SSEx. (Though, the h.264 decoder isn’t remotely an instruction inside the core itself, but rather an on chip accelerator sitting “off to the side” somewhere on the die….)))

          In the end, the simplest way to know is to benchmark the actual hardware.
          (Unless it is “obvious” that one will be better at something of relevance to one’s application. (Like a Pentium II isn’t all that impressive these days and easily gets beaten in all categories by a Raspberry Pi 4, be it core count, clock speed, power efficiency, Linpack benchmark score, max supported RAM, IO, etc…))

    2. Gonna stop you right there. Look at the pipeline development chart. AMD were the FIRST to make a 64-bit pipeline. Intel had to create an obscene amount of pipeline add-ons to emulate 64-bit.

      Their 64-bit wasn’t even compatible. You’d have to be an extremely brain damaged CTO to invest in Itaniums.

      People forget.. I am here to remind.

      Now really… it is about how smart you are in designing the chip.

      Forget Intel/AMD…. if ARM kicks off correctly we can have backplane motherboards, be it with Crossfire or NVLink connections.

      The IME and the PSP are utter garbage.

      The fact they were redirecting Cisco router shipments to rando sites to reflash and add on another chip to get a backdoor onto a product is enough to say they are trash.

  5. Even before the IME and PSP, there were no open source alternatives to the BIOS on my PC. There was a handful only for old and popular motherboards. What stopped them back then from working on the BIOS? Really?

    Too bad that the developers won’t touch the 20GB leak of Intel’s NDA developer materials including the IME and other things.

      > Too bad that the developers won’t touch the 20GB leak of Intel’s NDA developer materials including the IME and other things.

      This is, unfortunately, a legal issue. I’m sure they would be absolutely giddy to read it, but the moment they do Intel can claim copyright violations against the project and have it shut down.

        1. Actually, the watermarks aren’t set. They are the generic placeholder watermarks that would normally be edited to show the client’s information. The contents still weren’t terribly interesting to me though.

  6. Just a plug for U-Boot. I’ve used it a lot at work and it’s great. It also has a lot of pre-built code for different boards which you can leverage to help bring up your own if you’re using a similar chipset to one that’s already included. It also gives you a lot of debugging power if something has gone wrong on boot, as well as letting you load test builds into RAM. Great tool!

    1. I’ve customized or ported U-Boot to various ARM processors for use in large cluster systems, and it really is infinitely better than UEFI or traditional BIOS. Not only do you get the flexibility of a scripted pre-boot environment, but it’s so easy to support hardware variations within a single build.
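
      As a taste of that flexibility: adding a custom command is about this much C, assuming a reasonably recent U-Boot tree (older trees spell the table type cmd_tbl_t); the “boardinfo” command and its rev-B string are hypothetical placeholders:

          #include <common.h>
          #include <command.h>

          /* Hypothetical command: report which board variant we detected. */
          static int do_boardinfo(struct cmd_tbl *cmdtp, int flag, int argc,
                                  char *const argv[])
          {
              printf("board variant: %s\n", "rev-B"); /* placeholder detection */
              return CMD_RET_SUCCESS;
          }

          U_BOOT_CMD(boardinfo, 1, 0, do_boardinfo,
                     "print detected board variant", "");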

  7. Concerning the closed source firmware… x86 chips aren’t actually x86 anymore. Both AMD and Intel make their own proprietary architectures, which in turn emulate the x86 instruction set. That’s essentially what the IME and PSP are: the OS that handles that emulation.

    Ah ARM… The unfortunate thing about booting ARM machines is that the boot process is such a pain. ARM never specified a boot process for the platform, and as a result, trying to boot Linux on a new board is a fiddly and painful process.

    The boards I’ve worked with go like this: You have to have a custom version of uboot that has the custom DTB baked into it. Even then, uboot uses some terrible ugly hack to determine what board it’s running on and send the right DTB to the kernel it boots. Then you have to have a custom 10-year-old Linux kernel, because no hardware vendor is capable of pushing their code upstream and maintaining it properly.
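
    (The DTB itself, at least, is easy to poke at from C with libfdt; a sketch of validating a blob and reading /chosen, assuming you already have a pointer to it from somewhere:)

        #include <stdio.h>
        #include <libfdt.h>

        int inspect_dtb(const void *dtb)
        {
            int node, len;
            const char *args;

            if (fdt_check_header(dtb) != 0)
                return -1;                 /* not a valid flattened device tree */

            node = fdt_path_offset(dtb, "/chosen");
            if (node < 0)
                return node;               /* no /chosen node */

            args = fdt_getprop(dtb, node, "bootargs", &len);
            if (args)
                printf("kernel command line: %.*s\n", len, args);
            return 0;
        }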

    The alternative is something like the Pi, which has a black box proprietary boot process. Yeah, it works, but you have closed source firmware that’s doing all the initialization and booting.

    I really want to be excited about ARM, but every time I’ve tried to run a mainline kernel and distro on an ARM machine, I’ve given up in frustration after hours of work. The Pi is sort of an exception, because Raspbian is actually a decent distro with a relatively modern kernel. Even then, I have to ask why they couldn’t have worked directly with the upstream kernel to bring up new boards. (I know the answer, Broadcom.)

    1. “ARM never specified a boot process for the platform”: what do you mean by that, which ‘platform’ do you mean? There is no single ARM platform. They license CPU designs to tens or hundreds of manufacturers who add their own hardware on top of that. It is pretty clear where each CPU jumps after reset/power-on. The rest is not for ARM to decide.

      1. ARM the company develops the specifications and instruction set that it licenses. It would have been relatively simple to add a boot standard to one of those iterations. “In order to be Armv8 compliant you must support UEFI”, for instance.

        1. That would be limiting or make no sense for many use cases where such a CPU can be used. What is UEFI good for inside a smartwatch, 5G modem, SSD controller, HDMI dongle like a Chromecast, video or wireless card, or any other random device that does not work like a traditional PC at all?

          1. You’re conflating ARM SoCs with ARM microcontrollers. They’re different lines with different use cases and specifications. It’s entirely feasible to specify a standard boot sequence for one and not the other.

          2. Exactly. An ARM SoC or microcontroller can boot in any way it wants. If you want to run bare metal code, no bootloader or Linux, on your ARM SoC, that’s up to you. Don’t need RAM? Fine, no problem. ARM made the right call.

            Linus’s rant is also true, ARM was a mess. But with DTBs it’s quite nice now. Some auto discovery help would have been nice though, be it a built-in EEPROM to store some identifiers for example. But that also is up to the SoC designer.

            As for UEFI, no thanks. That bloated painful mess… Better than BIOS? Maybe in some parts…

    2. Booting is always a pain. ARM or x86 or m68k or whatever. You are at the mercy of whatever boot firmware is in mask ROM in the chip. In an x86 PC you have the BIOS, which is the moral equivalent of U-Boot. I have spent days, perhaps weeks, working out the details to get an RTOS I was working on booting on x86 — it was miserable. In the ARM world, usually you are handed U-Boot, which honestly is worlds better. For one thing it is open source and well understood and there are no end of resources to get up the curve with it.

      The Pi is a special case. I have avoided the Pi for a multitude of reasons, but the story at one point was that booting was done via the GPU. What the ???? Who and why thought of that?

      I think most of your complaints boil down to “things are different”. As far as the 10 year old kernel thing, take a look at Armbian which is running current kernels on a host of ARM single board machines. Usually U-Boot is built for a specific target board and doesn’t need to do anything to figure out what board it is running on. Where do you get all of this from? Usually you have U-Boot for your specific board, just like you have a BIOS in the PC world.

      And no, the IME is not handling instruction emulation, you are totally wrong there. That is hardware level.

      1. It’s more than a simple “things are different”, though I’m sure that’s part of it. I’ll put it this way. I can burn a Fedora DVD or flash drive, plug it into any x86 computer built in the last 10 years, and that machine will boot Fedora and work. Good luck getting Fedora booting on an ARM device that isn’t one of their few supported boards, and even then it will likely take more work than the instructions indicate.

        1. Well, if you are saying that Linux support for the myriad of ARM boards is not smooth and polished like Fedora on x86, I would have to agree. The only ARM with polished Linux support is probably the Raspberry Pi. On one hand we have Microsoft to thank for this. It is impossible to sell an x86 machine that won’t support Windows. The ARM world is much more fragmented. I run Fedora on my desktop (the very machine I am typing on now), but don’t even think of running Fedora on ARM; I go with Debian (i.e. Armbian) in the rare case when I do want to run Linux on one of my ARMs — which is rare, usually I am hacking away on my RTOS project.

          As for IME and instruction emulation, I stand by what I said. Maybe the IME loads microcode into whatever gate array or whatever it is that does instruction emulation, but I don’t know. As for the ARM, we don’t need no stinking instruction emulation. That is the whole point of RISC – no microcode! The real architecture is exposed to the compiler (and end user).

          1. Yeah, that’s essentially the point I was making. I’d argue that a big part of the reason X86 support is so good is that IBM built a standardized boot process way back when they released the IBM PC.

            And yeah, it’s not really accurate to say that IME/PSP is the OS that emulates the X86 arch. I should have double-checked what I was remembering.

        2. Apples and oranges. If I put in a fedora drive into my PC, there is a BIOS that handles a lot of the bringup.

          This would be comparable, on my arm system, there’s uboot (or whatever) to handle a lot of the bringup.

          So distros would only have to support an existing bootloader that always tries certain things in the same way (search for syslinux on CD for example)

          ARM gave us freedom by not locking anything down, and this freedom now inconveniences us. A little messy? Sure, but far more flexible. Yes, adding support is harder. Wanna change the world for ‘desktop ARM’? Come up with a boot specification for distros to follow, ensure U-Boot and coreboot can do that, and require the boards to ship with a boot ROM that follows that spec ;)

    3. “Both AMD and Intel make their own proprietary architectures, which in turn emulate the x86 instruction set. That’s essentially what the IME and PSP is, the OS that handles that emulation.”

      No. IME/PSP has absolutely nothing to do with this. The μop architecture isn’t x86 but it isn’t emulated either. It is a combination of hardware implemented instructions and microcoded instructions. There is no emulation involved at all. Performance would be beyond awful if it was being emulated on the IME/PSP.

      The only x86 CPUs that used emulation were the long gone Transmeta chips.

  8. Two things:

    “The IME or PSP have access to memory, storage, and the network stack even if the computer is shut down”
    The article would be improved if the author briefly explained how that works. To me “shut down” means “no electricity”.

    “binary blobs” Awkward.

    1. ATX power spec: 5 V standby voltage, ~1-2 A.
      That will power a lot of computing nowadays, from a Raspberry Pi quad core to an Intel Compute Stick.

      If your power switch (toggle) isn’t in the back or side and doesn’t make a satisfying ‘thunk’, you are not powered down. (AT power supply)

      I did have a few 486 machines with a real power switch on the front, not sure how popular that was. They were still AT form factor though.

      1. Exactly why I shut off my UPS, followed by its own switch on the power strip, after shutting down my computer. The only way it’s turning itself back on without my okay is if someone climbed through my 3rd floor window and manually turned it back on. “Off” means no power.
        (and two satisfying clunks!)

      2. Thanks to the three of you. You improved the article and confirmed what I suspected was being implied. Imagine chainsaws or lasers being designed in a similar way. Cars probably already are.

      3. Every x86 machine I owned, from my first 286 through I believe my first Pentium class chips, used an AT style case and AT power supply. I was going to say my first P2 machines, but I believe those were my first ATX machines. I will attest that the power switch on the front (the physical rocker style) was pretty standard at the time. I can’t say for sure what OEMs like Packard Bell, HP and Compaq did, but pretty much all of the smaller brands, mom and pop shops, and custom builds used a fairly standard AT case design layout, from mini-tower to mid-tower to full-tower. A lot of these case designs were adapted slightly for the ATX style switch and power supply, but remained visibly similar for quite a while.

        Any of my modern machines remain connected to a surge strip, so if I need a physical switch, I always have one ready regardless.

        1. I think the furthest you could go on AT with mainstream parts (industrial unicorn bollock hair impregnated stuff excepted) was about a 1.4 GHz Tualatin (Celeron or PIII) on a BX or Apollo AT motherboard, probably in a slotket.

          I know Acer did some crossbreed bastards with an AT(X)ish PSU and a soft power switch in a case that seemed otherwise AT but might have had funny board dimensions so either the board wouldn’t fit standard AT or standard AT wouldn’t fit case, forget which was their particular brand of nonsense. Dell did similar with some P4 class that looked ATX but weren’t.

  9. Don’t forget OpenPOWER! One can currently buy a POWER9 machine from Raptor CS that is fully open. Yes, it’s expensive, but the machines are much faster than an A72 per clock, in clock speed (3.8 GHz all-core), and in core count (4-22 cores). The software used to be a rough experience, but it’s pretty good these days. I happily run Fedora Workstation on my 8 core Blackbird, and most of the time the experience is no different from an x86 PC.

  10. A big thing with saying “ARM processors” is there are broadly 3 levels of licensing with the ARM processor which compounds the ‘diversity’ issue, plus the ARM IP is only a fraction of the stuff in the chip which needs to be controlled to make a chip actually useful (yeah, the instruction set in the core is common which makes controlling the processor easy but what about getting the pins toggling?).

    If I was to use a house analogy, where the chip represents the entire house, the ARM processor is just the kitchen.
    There are those customers who license just the architecture, the features of the kitchen, in this analogy (e.g. Apple and Qualcomm).
    Then there are those who buy a designed processor from ARM, which could be considered the equivalent of the fixtures and fittings to go in this theoretical kitchen (TI, STM, Atmel, et al).
    Then there are those that buy the pre-fab kitchen, the licenses of the hard-macros (Processor Optimised Package in ARM parlance, or POP) which have a fixed layout and the most choice is what colour the doors are (which silicon fab-house you go to).
    Now, there’s still all the other bits of the house; that is totally the licensee’s choice. Yeah, ARM can supply extras such as corridors (interconnect) and some rooms (GPUs), but the licensee can still change where they sit in the house (the memory layout).

  11. Sorry, but this article is full of BullSheet…
    First, these background systems are not x86 only; just look at the Raspberry Pi. The VideoCore boots first, prior to the ARM, and it has a closed binary blob doing god knows what.
    Second, you compare jet fighters to bicycles in power usage. Of course some ARMs don’t even need a heatsink, but please consider performance per watt. There will be not much difference.
    And finally, you can never know if there is a hidden core on the die connected to every subsystem.

  12. Differentiating ARM and x86 processors by instruction count might be a valid comparison, but it does not map to RISC or CISC. When I worked at an IBM lab (on the PC RT) in the ’80s, we never counted “instructions”. At least for us, RISC stood for Reduced Instruction Set Complexity. The late engineer Larry Loucks frequently admonished us that the “C” did not stand for Computer. He noted that a RISC computer might actually have more (but less complex) instructions. The goal was to execute each instruction with fewer cycles. I think the RISC and CISC acronyms have mostly become marketing buzzwords during the past 35 years, with their true meaning lost.

  13. CPUs are more than just the CPU core; there is memory management, buses to the outside world like USB and PCIe peripherals, and so on.

    All processor chips are full of black boxes. Some company provides one IP block, another comes from some other source, and so on. Those blocks are inserted by the fab, so the company making the processor may not even see what each one is doing: a true black box where the designer may just connect the leads and hope the design is correct. Also some IP blocks are less dark and can be delivered even as an example implementation with synthesizable sources. Even the pad where the bond wire attaches can be a black box.

    Verifying silicon die is next to impossible even if there was all design sources.

    The second stage of black box is that you don’t have the VHDL sources which make up the CPU core, so writing to some register may trigger some kind of operation on a peripheral without your knowledge. This is important when trying to understand if some bit is really a don’t-care.

    Thirdly, any chip without a PCB is rather useless. Boards and schematics aren’t open source, and even if they were, verifying them would be really hard.

    And fourth thing. Complete documentation. There are always some production test features which aren’t disclosed.

    Anyway evilness may lie in many places and truly verifiable devices aren’t available. I don’t think that anyone can even handle the full CPU (and its peripherals) from VHDL to C.

    1. C is also problematic: too many things are undefined, and you never really know what a C program is going to do. On one machine it might do something vaguely like what you want, and on another system you get nonsense. “Portable” is a joke.
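
      Two stock examples of what “undefined” means in practice; both compile, and different compilers (or optimization levels) may legitimately do different things:

          #include <limits.h>
          #include <stdio.h>

          int main(void)
          {
              int i = 0;
              printf("%d %d\n", i++, i++); /* unsequenced side effects: undefined */

              int big = INT_MAX;
              big = big + 1;               /* signed overflow: undefined */
              printf("%d\n", big);
              return 0;
          }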

  14. I’m a little saddened by the article, as it makes it sound like Rockchip with coreboot is the most common thing. Yes, Chromebooks rely on it, but in the embedded world I’d argue U-Boot is king… Not in installed units probably, but in supported chips/boards surely…

    Also, the linked FSF article is from 2013. I’m sure half the stuff there is no longer accurate nor updated. E.g. ‘Exynos requires a signed blob’? Nonsense. It’s up to the device vendor to do this; the SoC simply offers the ability to do signature checking. That Samsung phones make use of this, I understand. But it’s by no means a requirement of the SoC.

  15. E.g. Qualcomm sells their SoCs practically wide open; it’s just that most OEMs choose to close them down. But if you place a minimum order with Qualcomm (or go on AliExpress), you can get a blank SoC… no such option with Intel/AMD…

  16. I dunno. I see ARM going that route too once it grows up and has to play with the big boys in the enterprise world. It will be interesting to try these new devices, as I have rarely used an ARM processor and felt like it was just tearing through data; it has always felt more like it was taking a bite and chewing it thoroughly. I am sure Apple has done their homework and it will be great, so we all keep our heads up their walled world butts. As for the IME, it is pretty easy to get around with two system call changes, so while I get there is something to fuss about, I feel like it is more about yelling fire while roasting marshmallows.

  17. Can you get an ARM device right now in 2020 capable of running a full desktop Linux stack that doesn’t have all sorts of binary blobs or hidden code?
    Raspberry Pi, anything using a Qualcomm Snapdragon, and pretty much anything Android: all of these have code underneath the Linux layer controlling things. (That’s if they actually comply with the GPL and release the kernel source as required, which even big companies that have big legal departments and should know better aren’t always doing properly.)
    And then you have ARM devices like the iPhone/iPad/etc and probably the new ARM Macs where you are locked out from even running your own OS on them.

    Even things claiming to be “open” like the Pinebook still seem to require binary blobs for things like DRAM init (at least based on a quick Google) and I can’t even tell if the GPUs in these things work without blobs (or if they do have open drivers, what you give up in terms of perf and features by going with the open drivers)

    1. I don’t know of anything that can run with all its possible features while being 100% blob free, though the pine devices were the first things to come to mind to look into.

      But the important part to look at is the core functionality – if the boot, OS, and all the core parts of a PC are open and it’s just a blob to use HDMI, WWAN, etc., you can opt out if you don’t need that function, and be reasonably sure that, as the core parts of your system are open, they will be auditable/secure everywhere they don’t interact with that blob.

      1. But are these so-called “open” devices or do they need blobs for things like DRAM init that are essential for device functionality?

        If there is a device that can boot to a Linux desktop without any blobs at all involved, I haven’t seen it.

    2. You might be confusing “blobs” with “Intellectual Packages”. Literally a one-minute look will show you ARM is giving away integration of Snapdragon core tech. But you must use a software “blob” to interact with it. I don’t see the problem: you can run a multi-matrixed “dumb” cluster with only raw computation and build your own drivers around it, or take the easy way and deploy that graphics driver blob.

      You don’t seem to get it, just like many of the fanatical coreboot has-beens that think Libreboot is bad.

      RISC-V chips are available right now; why are you so obstinate and narrow minded!

  18. The X86 to ARM transition is going to be a hassle. Is ARM really that much more power efficient (comparing similar device classes, obviously not an Atom to a Cortex M3)?

    Are we really going to get i9 equivalent ARMs any time soon, with all the media, graphics, crypto, etc, instructions, or enough performance to equal them?

    If not, then we’re going to have two architectures for “real” PCs instead of one, which is going to cause some trouble. ARM is great for specialty uses, but I wouldn’t be happy about the switch unless it has more benefits than just the lack of an IME.

    1. Yes. We will. It’s called Crossfire, NVLink, and backplanes. I don’t want to drop another 2k on a new system when I can just add 4 more ARM computing nodes to do the job, at 50% of the cost.
