Close To The Metal

Firmware is caught between hardware and software. What do I mean? Microcontroller designers compete on how many interesting and useful hardware peripherals they can add to the chips, and they are all different on purpose. Meanwhile, software designers want to abstract away from the intricacies and idiosyncrasies of the hardware peripherals, because code wants to be generic and portable. Software and hardware designers are Montagues and Capulets, and we’re caught in the crossfire.

I’m in the middle of a design that takes advantage of perhaps one of the most idiosyncratic microcontroller peripherals out there – the RP2040’s PIOs. Combining these with the chip’s direct memory access (DMA) controllers allows some fairly high-bandwidth processing, without bogging down the CPUs. But because I want this code to be usable and extensible by a wide audience, I’m also trying to write it in MicroPython. And configuring DMA controllers is just too idiosyncratic for MicroPython.

But there’s an escape hatch. In my case, it’s courtesy of MicroPython’s machine.mem32, which lets you read and write directly into the chip’s memory, including all of the memory-mapped configuration registers. Sure, it’s absurdly low-level, but it means that anything you read about in the chip’s datasheet, you can do right away, and from within the relative comfort of a MicroPython program. Other languages have their PEEK and POKE equivalents as well, or allow inline assembly, or otherwise furnish you the tools to get closer to the metal without having to write all the rest of your code at a low level.
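
To make that concrete, here’s a minimal sketch of the read-modify-write idiom you end up using with machine.mem32. The bit-twiddling helper is plain Python and runs anywhere; the register address in the usage comment is a hypothetical placeholder, not a real entry from the RP2040 register map.

```python
# Hedged sketch of the read-modify-write idiom used with machine.mem32.
# set_field() is plain Python; the address below is a made-up example.

def set_field(reg_value, mask, shift, field):
    """Clear the bits under (mask << shift), then write `field` there."""
    reg_value &= ~(mask << shift) & 0xFFFFFFFF   # registers are 32-bit
    reg_value |= (field & mask) << shift
    return reg_value

# On a real board you would apply it to a memory-mapped register:
#   from machine import mem32
#   CTRL = 0x50000000   # hypothetical peripheral register address
#   mem32[CTRL] = set_field(mem32[CTRL], mask=0x3, shift=4, field=0b10)

print(hex(set_field(0xFFFFFFFF, 0x3, 4, 0b10)))  # 0xffffffef
```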

I’m honestly usually a straight-C or even Forth programmer, but this experience of using a higher-level language while simultaneously being able to dive down to the lowest levels of bit-twiddling has been a revelation. If you’re just using MicroPython, open up your chip’s datasheet and see what it can offer you. Or if you’re programming at the configure-this-register level, check out the extra benefits you can get from a higher-level language. You can have your cake and eat it too!

81 thoughts on “Close To The Metal”

  1. “Meanwhile, software designers want to abstract away from the intricacies and idiosyncrasies of the hardware peripherals, because code wants to be generic and portable.”

    Good API design helps.

  2. Will you please explain–and expand on– the statement, “…perhaps one of the most idiosyncratic microcontroller peripherals out there – the RP2040’s PIOs…” ?
    I am a complete newbie when it comes to this machine, and understand only the 8- (or 4-) bit I/O ports (and instructions) of traditional, conventional machines, wherein the contents of a register–or memory location–are transmitted directly to an I/O port.
    Many thanks, in advance, for your help.

    1. The RP2040 has these weird, very basic processors for its I/O pins, separate from the regular cores. They are fairly limited state-machine devices, but they can be useful. I’ve never seen anything like them, but I haven’t had time to play with them yet.

      1. I have used them. For WS2812 LEDs, or even VGA timing. They are very versatile; they are somewhere between state machines and very limited CPUs. I wouldn’t call them the most idiosyncratic. But they are super powerful, up to the point where they could have just removed all I2C, SPI, and UART support from the chip and had that all handled by the PIO peripherals.

        I’m quite sure you could also do PWM outputs and various other things with them that I haven’t even imagined.

        1. Isn’t that basically the approach taken 15 years ago by the BeagleBoard with the PRUs? They’re stripped-down processors on the same die as the CPU but separate from it, with a very simple instruction set for writing bit-banged protocols and such.

    2. The PIO on the RP2040 is a really powerful peripheral that goes beyond simple GPIO. It has its own little mini-assembly language that allows it to run code that can interpret some of the low-level signals, so you can just DMA bits to and fro without worrying about the simple stuff in software. Think of it as halfway between bit-banging and a dedicated peripheral. As an example, there’s an implementation of QSPI for the PIO floating around somewhere.
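
To take some of the mystery out of that mini-assembly: each PIO instruction is a single 16-bit word, and the encoding is simple enough to do by hand. Below is a hedged sketch, based on my reading of the RP2040 datasheet’s PIO chapter, of how two common instructions pack into words. In real code you’d let MicroPython’s @rp2.asm_pio decorator or pioasm do this for you.

```python
# Hand-encoding two RP2040 PIO instructions as 16-bit words, following
# the instruction formats in the RP2040 datasheet (section 3). This is
# an illustrative sketch; normally @rp2.asm_pio or pioasm does this.

def pio_out(destination, bit_count, delay=0):
    # OUT: opcode 011 | delay/side-set (5 bits) | destination (3 bits) | bit count
    assert 1 <= bit_count <= 32
    return (0b011 << 13) | (delay << 8) | (destination << 5) | (bit_count % 32)

def pio_jmp(address, condition=0, delay=0):
    # JMP: opcode 000 | delay/side-set (5 bits) | condition (3 bits) | target address
    return (0b000 << 13) | (delay << 8) | (condition << 5) | address

PINS = 0b000  # OUT destination code for the mapped output pins

# "out pins, 1" followed by "jmp 0": shift one bit out to the pins, loop forever
program = [pio_out(PINS, 1), pio_jmp(0)]
print([hex(w) for w in program])  # ['0x6001', '0x0']
```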

  3. Users of the original 8-bit 6502 BBC BASIC have had this since the eighties with a powerful built-in assembler. BASIC code could have embedded machine code for the fast, deep-probing, bare-metal stuff. Good to see it has been rediscovered in the Python world… Peek and poke were always the hallmarks of the inferior BASIC languages.

  4. @Elliot Williams said: “Microcontroller designers compete on how many interesting and useful hardware peripherals they can add to the chips, and they are all different on purpose.”

    If only the “Microcontroller Designers” understood how providing signals in quadrature (coherent 90 degrees phase difference) opens the door to incredibly powerful analog and digital signal processing opportunities – they would always do it (but they usually don’t). The time-worn adage goes: “Give me signals in quadrature, and I’ll give you back whatever you want.”

    O/T – BTW: The comment system here on Hackaday is really going downhill.

          1. MicroPython provides pretty good control over when/how the GC operates. You can even turn it off if you’re careful.

            For example, we’ve disabled the GC, performed our timing-stringent tasks and then turned it on again.
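
That disable-work-re-enable pattern can be sketched with the standard gc module. gc.enable()/disable()/collect() exist in both CPython and MicroPython; gc.isenabled() is CPython-only and is used here just to show the state was restored. The “critical work” is a placeholder.

```python
# Sketch of the pattern described above: take the GC pause on your own
# schedule, keep collections out of the timing-stringent window, then
# restore normal operation afterwards.
import gc

def run_critical_section(work):
    gc.collect()      # pay for a full collection now, before the window
    gc.disable()      # no automatic collections inside the window
    try:
        return work()
    finally:
        gc.enable()   # restore automatic collection afterwards

result = run_critical_section(lambda: sum(range(100)))
print(result, gc.isenabled())  # 4950 True
```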

    1. We’ve observed savings of ~30% in development effort by using MicroPython compared to C++ (for medical device development). Any good engineer should consider that when weighing up the choice of language.

      There are some constraints in using MicroPython but they are, by and large, overblown. Check out my PyCon AU talk for a discussion of some of the more common concerns.

        1. Yes, 60601 is a baseline for all our devices, of course. It’s not really much different for a MicroPython device to satisfy 60601 than it is for a C-based system.

          We’ve had a few projects where MicroPython has been the primary language at Class I and II. I wouldn’t define them as *safety critical* but they do execute diagnostic tests for example.

      1. And what is it, exactly, that gives this productivity difference? As a language Python is very weak and uninspiring; all the power comes from the tons of libraries that people wrote for this awful language for some reason, but this is not even the case for MicroPython.

        1. Python operates at a higher level of abstraction. Akin to the difference between asm and C.

          You may think Python is weak and uninspiring but that’s an opinion that isn’t borne out by the wider community – or certainly not by my experience, at least. Further, we’ve had a number of projects now where we’ve gone through the usual estimation process and quoted for both C and MP – each time MicroPython comes out at 20-30% lower. We have a *lot* of experience with C estimation (and development) and the MP projects have run on time. The point is that my claim is based on data. Developers who’ve worked on MP projects are unsurprised by the development savings.

          Libraries are a benefit of Python. While MicroPython doesn’t have access to all of them, many are easy to port.

          More importantly, it is *significantly* easier to write peripheral drivers in MicroPython than C. It’s hard to express just how effective it is to explore interacting with a peripheral live at the REPL so you can determine how the driver should behave…

          1. I don’t really see any higher level of abstraction here. Same procedural flow, same functions, same primitive data types.

            Of course, I cannot really have a valid opinion here because I have not tried MicroPython in any real-world project yet.

            I can see the value of the REPL, though; but, again, Python cannot really compete with something like Forth in this regard.

          2. @combinatorylogic I can’t really speak of the Forth REPL, but being able to run the MicroPython REPL live, on a device, is a huge productivity boon. In particular, exploring how peripherals behave in an interactive manner makes writing drivers *significantly* easier.

    2. On MicroPython, the lowest-level, close-to-hardware programming tasks are implemented in C.
      One might need to do high-level programming on a small platform like a microcontroller.
      For some other tasks, even software is too high-level.
      Would a scripting language have its purpose? Be it for interactive troubleshooting as the product evolves?

    3. “…Low-level, close-to-hardware programming is not meant to be done in a high-level, far-from-hardware language.”

      Low-level, close-to-hardware programming simply can NOT be done in a high-level, far-from-hardware language.

  5. All is fun and games until you change some timer or interrupt configuration that the Python VM depends on, or forget to clean up some state, and suddenly nothing makes sense, and your firmware goes on a happy trip.

    1. That’s my big question about all this — how would you know which registers the high-level language is already relying on, either by setting them explicitly, or assuming they’re left at default? I don’t think it’s safe to assume that, even if you did read into every function used by every library in the whole system, that they’d even declare their assumptions correctly for you to find.

      Is that a matter of trial and error? How do you get decent test coverage to make sure you haven’t laid a trap for yourself in unusual conditions?

  6. @Elliot Williams said: “…I’m in the middle of a design that takes advantage of perhaps one of the most idiosyncratic microcontroller peripherals out there – the RP2040’s PIOs. Combining these with the chip’s direct memory access (DMA) controllers allows some fairly high-bandwidth processing, without bogging down the CPUs. But because I want this code to be usable and extensible by a wide audience, I’m also trying to write it in MicroPython. And configuring DMA controllers is just too idiosyncratic for MicroPython…”

    Hi Elliot – As usual, a great post. I have a couple of comments…

    A. The hardware PIO on the RP2040 is pretty much unique, especially if you use DMA as well. I seem to remember the open-source BeagleBoard [1] having a similar capability with its “PRU DMA”. Trying to emulate this hardware capability cross-platform in software/firmware methinks is a big challenge. I would steer clear of this, especially if deterministic timing is a requirement.

    B. The MicroPython [2] or CircuitPython [3] interpreter-form really has NO PLACE anywhere near a micro-controller – ever – even if you dye your hair pink and live your life in the footsteps of Karl Marx. Python interpreter-form source code on a micro-controller is horribly inefficient. Yeah, if you absolutely must rapid-prototype like a script-kid with the likes of the MicroPython interpreter, go ahead – but ALWAYS move to something like compiled C/C++ (at least) in final production.

    * References:

    1. BeagleBoard

    2. MicroPython

    3. CircuitPython

    1. B) Have you used MicroPython? It saves a significant amount of development effort, is robust, is more amenable to testing, and performs well. If you do need higher perf than the interpreter can provide, it’s easy to drop into C to create a module you can use from MicroPython. I’ve used it in production and will continue to do so. You’d be a crappy engineer if you weren’t evaluating it.

        1. I didn’t say “more” robust. But it’s robust. It’s very rare to find a critical bug in the interpreter and test coverage is very high. (Defects in the various port-specific code bases are admittedly more common.)

          But I’ve had to write my fair share of all-rules-enabled MISRA-C, and if you have too then you’ll know that it’s a painful experience with extraordinarily slow development cycles. But sure, if done right – which is often not the case (it’s very easy to argue exceptions!) – MISRA does produce very robust code.

  7. “this experience of using a higher-level language and simultaneously being able to dive down to the lowest levels of bit-twiddling at the same time has been a revelation.”

    Rust is like that too, particularly embedded rust. To quote an Oxide podcast, “hardware engineers don’t believe they can have nice things.”

  8. Back to the beginning…

    From the second post in this series–

    “Will you please explain–and expand on– the statement, “…perhaps one of the most idiosyncratic microcontroller peripherals out there – the RP2040’s PIOs…” ?

    With all due respect and appreciation to the people who tried to provide clarification, the answers were somewhat circular and/or dependent on terms which didn’t clarify the situation at all (eg, appeals to ‘state machines’, ‘limited CPUs’…)

    Let’s try this…

    since assembly language on almost all conventional machines (Motorola, Intel, Zilog, TI…) is well understood, simply demonstrate, using RP2040 assembly language, how one would output an 8-bit byte from a register or memory location (whichever is easiest) to an 8-bit peripheral which one has attached to an I/O port on the RP2040.

    TL;DR–show, please, the RP2040 Assembly Language (ONLY) program which sends internal 8-bit data to a RP2040 I/O port.

    Thanks again…


    “By understanding a machine-oriented language, the programmer will tend to use a much more efficient method; it is much closer to reality.”
    “People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird.”
    –Donald Knuth

    1. The literal answer to your question is something like:
      out pins, 1
      jmp loop
      Full details are in the RP2040 datasheet.

      But to get a better sense of what PIO is for, the somewhat more complicated example programs are probably more helpful.

      Lots more examples are out there, or you could, you know, read the actual docs (section 3, starting on page 309).

      1. Thank you very much for clarifying for me the Raspberry Pi’s I/O process.
        So, if I understand what I’m reading in all your references, the RPi cannot output 8 bits in parallel (all eight bits–at one time–from either an internal register or memory location) to an 8-bit port. (Each individual bit must be written independently, over an extended period of time?)
        Please correct me if I have not read these references (thanks, again) correctly.

        1. “…So, if I understand what I’m reading in all your references, the RPi cannot output 8 bits in parallel (all eight bits–at one time–from either an internal register or memory location) to an 8-bit port. (Each individual bit must be written independently, over an extended period of time?)…”

          Or, put another way: in order to output an 8-bit byte, the RPi can only output one bit at a time – via one instruction at a time – resulting in a long ‘byte-output time’ compared to (for example) the i80x86, the M680xx (including, of course, MOS Technology’s 650x series), the Z80 series, TI’s 9900 series, etc.
          Not to put too fine a point on this, but if this is the case, then the need for a separate I/O processor is obvious.

  9. No advantage to using assembler in 2023, imo.

    Use the gcc C compiler, coupled with short high-level C code generation accompanied by constructs like asmseg[..] = 0x…. ;, asmseg[..] = 0x…. ;, …, or an inline to generate machine code.

    The owner of Laboratory Microsystems responded to my 1983 question, “Should we use assembler or a compiler to bring up 8080 fig-Forth on Sandia Labs’ rad-hard 8085 processor?”

    The owner was adamant: “Use a compiler!”

    We did, and it made our work LOTS easier. A stunning success compared to other Sandians who used assembler.

    The decision to abandon 1960s platform-specific assembler is a lot easier now.

    There are lots more interesting hardware platforms to choose from.

    Most platforms now have a C compiler to produce code for them.

    Another enormous advantage is that many of these hardware platforms now have a Linux
    OS with which to compile and test embedded firmware INTERACTIVELY!

    Here is the gcc C code to call a high-level C code segment, as given by the gcc C manual writers:

    We are using Ubuntu or RpiOS for Orange, Libre, and Raspberry Pi ARM processors.

    Ubuntu is used for x86 processors.

  10. You lost me when you said ‘MicroPython’…

    You would be much better off writing it in C++, and putting the hardware-specific stuff in its own routines or classes. You could then just replace those routines/classes for different processors.

    I do that for the embedded processors I use, and you don’t then have the insane overhead of running Python…

  11. So many bad takes in this comment thread, not that I didn’t expect it. “PyThOn Is BaD aNd SlOw” is hilarious coming from people who clearly haven’t tried it and is a clear example of premature optimisation. I’m laughing at the “you should use C++” instead when received wisdom used to be that C++ was the WORST THING EVER to use in embedded. I guess Arduino ended up dispelling that particular myth pretty handily.

    The truth is, use *whatever* language gets you the results you’re after. Telling people that they *MUST* use something specific is just revealing your own narrow set of experiences. You might even discover that you’ve been doing things the hard way all along.

    1. Nope. If you are writing code that needs to work reliably in the microsecond (or below) range, using any interpreted language turns out to be not such a good idea… It doesn’t matter what it is.

      1. You’re wrong. If you need extremely tight timing, then doing it in code is already the wrong approach. It doesn’t matter if it’s C or Python or even BASIC, you should be using the hardware peripherals for that. 99% of an application’s code is not timing sensitive, and the real hurdle is developer time.

        As I already said, going for the FaStEst LaNgUaGe PoSsIbLe without any profiling information is premature optimisation. Yada yada, root of all evil, etc.

        This goes for non-embedded programming as well – big-iron scientific number crunching routinely uses Python – the Python defines the calculation, but the actual processing is still done by native code.

        If all you can think about is the performance on a 16 MHz Arduino, then you’re about 20 years behind the times. Modern microcontrollers have more processing power than you are likely to ever use, and even an inefficient Python script running on them will smoke a native routine running on the ancient chip.

        1. I’ve ended up using code for timing critical stuff many times because the hardware had weird bus synchronization issues, or it just didn’t do what was required, or it was already in use for something else.

          1. Ironically (?) that’s exactly the benefit of the PIO hardware in the RP2040. It provides just enough low-level, close-to-the-hardware programmability that you can use it for all the fiddly timing-sensitive stuff, connected via FIFO to the vast majority of your high-level code.

          2. Same. But the fact is that the occasions where you need insane optimisation are very rare indeed, and you’re able to implement *just* those pieces in whatever way is more appropriate. Insisting that 99% of your codebase has to be written in the same painful way that the 1% of hot code is plain barmy. Different parts of your codebase have different design requirements, and an experienced engineer knows to evaluate them separately.

            I’ve straight-up thrown out designs and started from scratch when I’ve discovered that a core routine could not be made fast enough on the original uC; the RP2040 PIOs can trivially do things that otherwise required an 800 MHz Cortex-M7 to bitbang in tightly written code. I love that chip.

          3. I feel it’s an interesting discussion, but I just don’t get the point of using Python.

            I have had times when the language was an issue (well, actually quite recently – YAML always has messed-up syntax, but that’s neither C nor Python…) but mainly this was a long time ago. The real issue in my everyday work is figuring out what the code should do, not making it do it.

            And that is independent of language.

            I have a theory that many prefer Python because of the libraries they use/are accustomed to, but libraries are just that… they can be made/fetched/used in any language, again. And I’m also usually hesitant to use too many, because I’ve found bugs in too many libraries! That’s a real time sink, and I imagine it will be… in any language.

            Also, I wonder if I’m just not good enough at Python to “get it”, even though I have university credits for it.

            I would understand if the debate was “use C++ or Python, not asm, C, VHDL!”, but in my mind they are both simple and powerful enough to not be the bottleneck.

          1. Yes, the MCU needs to be more powerful. It’s worth noting that the first bottleneck isn’t compute, it’s RAM – an interpreted GC language imposes a cost in memory.

            But with the cost of MCUs having dropped so much, it’s becoming increasingly attractive to consider more powerful MCUs for a reduction in dev effort.

            You can continue to claim that Python is not more productive than C but I think you’ll find that a difficult argument! I’ve used C and C++ for a few decades at this point and am very comfortable with those languages. There is no doubt, none at all, that they take more effort to develop with. I find it difficult to understand how anyone could begin to make the counter argument.

            In terms of battery life, this is another overblown argument – though it does have a modicum of truth. Battery life is primarily extended by maximising the time spent in low-power sleep modes. It doesn’t matter what language you’re using; sleep modes consume the same power! That said, MicroPython can use more power when the device is awake, so yes, it will cost additional battery life for some applications. Sometimes that will matter, sometimes it won’t.
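
The sleep-mode argument is easy to check with back-of-envelope arithmetic. All the numbers below are made up for illustration – substitute your own datasheet figures:

```python
# Back-of-envelope sketch: at a low duty cycle, battery life is dominated
# by sleep current, so a higher awake current costs less than you'd think.
# All numbers here are illustrative, not from any real datasheet.

def battery_life_hours(capacity_mah, awake_ma, sleep_ma, awake_fraction):
    avg_ma = awake_ma * awake_fraction + sleep_ma * (1 - awake_fraction)
    return capacity_mah / avg_ma

# Device awake 1% of the time, 1000 mAh cell, 10 uA sleep current:
c_version = battery_life_hours(1000, awake_ma=20, sleep_ma=0.01, awake_fraction=0.01)
mp_version = battery_life_hours(1000, awake_ma=30, sleep_ma=0.01, awake_fraction=0.01)

# Even with 50% higher awake current, the hit shrinks as duty cycle falls:
print(round(c_version), round(mp_version))  # 4764 3227
```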

          2. And I find it really difficult to understand how anyone can argue that Python is not just the same language as C (not C++ – really, Python is a lower-level language than C++). I don’t see any possible development-time savings from using such a sub-par language.

          1. (1) “Forth is a nice little language. But it’s a small language with few features…”

            Forth is a nice little language which uses a data stack and Reverse Polish Notation, and is an extensible language, which means that one CREATES NEW INSTRUCTIONS (called ‘words’) in the writing of the application code. Forth is a nice, little, VERY BIG language, whose size can get to be very large and is determined strictly by the application.
            (One consequence of this attribute of extensibility – and one which has not escaped the notice of designers who can appreciate Forth’s capabilities – is that it is well-nigh impossible to reverse-engineer Forth code.)

            (2) “…it’s very difficult to hire developers with Forth experience.”

            It’s very difficult to find anyone willing to invest the intellectual rigor and currency in the learning and mastering of any very powerful tool or technique.
            See (1), above.

            Also see

          2. While Forth is, indeed, extensible, there are very few features or APIs that are standardised. This can make development inefficient.

            Take I2C and SPI comms, for example. They are the most common protocols for peripheral control and yet there is no standard library or API to use. You’ve got to hunt down a third-party library (of which there are a few, with differing interfaces), then evaluate and test it. At least that was the status last I looked…

            And, no, I maintain that hiring Forth developers is simply difficult. Hiring C/C++/Python folks is significantly easier, at least here in Australia. I’d be surprised if it were different anywhere else.

            “Intellectual rigor” (sic)? Forth is a small, simple (stack based!) language – it’s not exactly challenging to learn. But few *want* to use it, at least professionally.

            I think it’s an interesting language and would consider it on very small devices. But then, for those cases, I’d probably just use C.

          3. Forth as a language is infinitely more powerful than Python, with infinitely more features. Literally.

            Forth is a meta-language. It can be turned into any language you can imagine (and any language you cannot imagine). Python is a fixed and very low-level language that does not have any features allowing one to create higher levels of abstraction.

  12. Okay, here’s what I know of Python: it is an interpreted language. However, according to Wikipedia, MicroPython can compile code to machine code.

    However, since you know C, why not develop the firmware in C? That is as close to machine code as possible.

    It would be interesting to compare the efficiency and code size of the output from C and MicroPython.

    1. Why not develop it all in C? Because, in a typical system, only a *tiny fraction* – if any – needs to be in C. You get the benefit of faster development, better testing, etc. by using MicroPython for the majority of your codebase.

      Comparing the efficiency between the languages is important, though it depends heavily on the task. If you’re working in tight loops in the MicroPython interpreter, expect poor results (possibly 100x worse!). But many operations are no faster in C – like SPI comms, I2S transfers, or RP2040 PIO commands.
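
That split – slow interpreted loops, fast compiled primitives – can be demonstrated even in desktop Python. Both functions below compute the same checksum, but the second pushes the loop down into compiled code (the C-implemented sum()), which is the same trick MicroPython users pull with C modules or @micropython.native for their hot loops:

```python
# Two equivalent checksums: one loops in the interpreter, one lets a
# compiled builtin do the looping. Same answer, very different speed.
data = bytes(range(256)) * 100

def checksum_interpreted(buf):
    total = 0
    for b in buf:              # every iteration runs in the interpreter
        total += b
    return total & 0xFFFF

def checksum_builtin(buf):
    return sum(buf) & 0xFFFF   # the loop runs inside compiled code

assert checksum_interpreted(data) == checksum_builtin(data)
print(hex(checksum_builtin(data)))  # 0xce00
```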

          1. Why would I care what people agree or disagree with when it’s an entirely formalised question that has one and only one correct answer? It’s a science; there is no place for opinions there.

            C and Python are very close in terms of the level of abstraction of their semantics. It’s a fact that does not need to be backed by any opinions.

          1. Python? With “very high-level abstractions”? I don’t see any high-level abstractions in Python. Not a single one. Same procedural control flow, very much exposed (unlike C++, for example, where control flow can be hidden away). Same primitive data types. And no functionality whatsoever that would allow one to construct higher levels of abstraction on top of Python or on top of C. They’re very similar, and very low-level.

  13. “The required techniques of effective reasoning are pretty formal, but as long as programming is done by people that don’t master them, the software crisis will remain with us and will be considered an incurable disease. And you know what incurable diseases do: they invite the quacks and charlatans in, who in this case take the form of Software Engineering gurus.” –Edsger W. Dijkstra

  14. Something that I learned along the way is that peripherals are often the same for all MCUs of the same family, sometimes across families, and even across vendors.

    In the example of the RP2040, most things but the PIO come from third-party sources like ARM, Cadence, and Synopsys. So instead of implementing a driver named “RP2040 UART”, you could call the driver “ARM PrimeCell UART”, and have that same implementation for every core. [1]

    Zephyr allows implementation of portable applications, but also portable drivers, using the DeviceTree, which contains all the device-specific data instead of the code.

    Once a driver is implemented for Zephyr, it is a matter of DeviceTree configuration to have it on another MCU with the same peripheral. This abstracts the vendor SDK API and allows you to use supported external hardware on any MCU: a library of drivers.

    Most RTOSes have that. Zephyr does it with DeviceTrees + macros (little runtime overhead), other RTOSes will have other interesting strategies as well, and even MicroPython/CircuitPython and the Arduino cores are a bit like that: they have a collection of drivers and APIs to make them portable across the supported chips.

    A crossfire? Surely, let’s get well geared to harness all of that! :)
    Should I even get started with vendors not disclosing the datasheets? :P

    [1] Search for “Excerpted” on

  15. @combinatorylogic I’ve not used MicroPython for *hard* real-time. Hard real-time is not the most obvious fit for MicroPython…

    But, if you were to consider its use, pre-allocating everything, disabling the GC, and preventing allocations would be the way to go. (Incidentally, this specific use-case was required by work supported by the European Space Agency.)

    1. This is how real-time Java works, for example. And it limits the language quite significantly, making one question why use this language in the first place if you have to dance around one of its main conveniences. And it makes it really hard to even think of using third-party libraries, which are the second main Python convenience. I wonder if what is left is really worth it. REPL? Well, there are a lot of languages with better REPLs.

      For a soft real-time use case – sure, GC is not necessarily an obstacle. I have to admit to using RacketCS in such scenarios, but then I could really separate the hard real-time processing from the soft real-time physically, running on different dedicated cores (and PL fabric) on a single SoC.

      My main issue with Python is still the language itself. Not its interpreted nature, not the GC, but the very semantics of the language that make it so unfit for anything serious. Formal verification? Impossible. Even a reasonable degree of static analysis? Still impossible. The language is too dynamic, just riddled with hash tables everywhere, in every place you would not normally expect to see them.

      This, and its lack of extensibility.
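
The “hash tables everywhere” remark is literal in CPython: a plain object’s attributes live in a per-instance dict, which is part of why static analysis struggles, and __slots__ is the standard opt-out. A small sketch (CPython semantics; MicroPython’s internals differ, though its instances are also map-backed by default, as I understand it):

```python
# Plain classes get a per-instance __dict__ (a hash table), so attributes
# can appear at runtime; __slots__ trades that away for a fixed layout.

class Open:
    pass

class Slotted:
    __slots__ = ("x",)

o = Open()
o.anything = 1                    # new attributes can appear at runtime
print(hasattr(o, "__dict__"))     # True: backed by a hash table

s = Slotted()
s.x = 1
print(hasattr(s, "__dict__"))     # False: fixed layout, no per-instance dict
try:
    s.anything = 1
except AttributeError:
    print("Slotted rejects unknown attributes")
```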

      I’ll still give MicroPython a try – I have a few low-key tasks with soft real-time requirements at most, and I’m not so budget-constrained that I must pare the MCU requirements down to the bare bones.

    2. The only things one needs to know: Python simply provides NO deterministic latency; it is an interpreted language; and it has memory-management and storage issues, as well as a very large assemblage of libraries and functions, which slow it down.
      The very fact that a distinction needs to be made between ‘hard’ real-time and ‘soft’ real-time parts of an application written in Python should be a large red flag to anyone considering Python for use in a real-time scenario.

      Ada is a compiled (fast) language specifically written for embedded and real-time systems. These attributes have made it the language of choice – and, in a lot of cases, the mandated language – for military, avionics, and space-based applications. There is no reason why an Ada designer need give any consideration to ‘hard’ versus ‘soft’ real-time parts of their designs; any application written in Ada IS a ‘hard’ real-time application.
      With all of Python’s inherent, built-in problems militating against its use as the basis for the design of a real-time application (“stop-the-world” garbage collection being the absolute worst), one has to ask, “Real-time? Python? Why?”
