8 Pins For Linux

We’ve seen Linux-based operating systems made to run on some widely varying pieces of hardware over the years, but [Dmitry Grinberg]’s latest project may be one of the most unusual. It’s a PCB with three integrated circuits on it, which doesn’t seem too interesting at first, but what makes it special is that all three of those chips are in 8-pin SOIC packages. How on earth can Linux run on 8-pin devices? The answer lies, as you might expect, in emulation.

Two of the chips are easy to spot: a USB-to-serial chip and an SPI RAM chip. The processor is an STM32G0-series device, which packs a pretty fast Arm Cortex-M0+ core and is ripe for overclocking. It runs a MIPS emulator that we’ve seen in a previous project. At a 148 MHz clock it’s equivalent to a MIPS running at about 1.4 MHz, which is just about usable. The OS in question is a full-featured Debian, too, not some special take on Linux trimmed down for speed.
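
For a rough sense of where that roughly hundredfold slowdown comes from, here’s a minimal sketch of the sort of fetch-decode-execute loop such an interpreter has to run. It isn’t [Dmitry Grinberg]’s actual emulator code, and guest_read32() is a hypothetical stand-in for whatever fetches guest memory over SPI, but it shows why every emulated MIPS instruction costs a fetch, a decode, and a branchy dispatch on the Cortex-M0+:

    #include <stdint.h>

    /* Illustrative only: a bare-bones MIPS interpreter step, not the real
       emulator. Branches, delay slots, and exceptions are omitted. */
    typedef struct {
        uint32_t pc;
        uint32_t r[32];
    } cpu_t;

    /* Hypothetical helper: in the real design, guest memory lives in the
       external 8-pin SPI RAM, so even the fetch costs many host cycles. */
    extern uint32_t guest_read32(uint32_t addr);

    static void step(cpu_t *cpu)
    {
        uint32_t insn = guest_read32(cpu->pc);      /* fetch                 */
        uint32_t op   = insn >> 26;                 /* decode: shifts, masks */
        uint32_t rs   = (insn >> 21) & 31;
        uint32_t rt   = (insn >> 16) & 31;
        int32_t  imm  = (int16_t)(insn & 0xFFFF);   /* sign-extended         */

        switch (op) {                               /* dispatch              */
        case 0x09:                                  /* ADDIU rt, rs, imm     */
            cpu->r[rt] = cpu->r[rs] + imm;
            break;
        /* ...dozens more cases... */
        }
        cpu->r[0] = 0;                              /* $zero stays zero      */
        cpu->pc  += 4;
    }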

We like some of the hardware hacks needed to get serial, memory, and an SD card onto so few pins. The SD and serial share the same pins, with a filter in place to remove the high-frequency SPI traffic from the low-frequency serial traffic. We’re not entirely sure what use this machine could be put to, but it remains an impressive piece of work.

27 thoughts on “8 Pins For Linux”

      1. Yeah weird. He only added the section on the WCH chips after I asked on Hacker News why he hadn’t considered them.

        I guess the consolation is that he also doesn’t like STM, who make the chip he ended up using.

      2. RISC-V requires bit swizzling to extract constants out of the machine instructions, which slows down emulation significantly, whereas in MIPS constants are easier to extract. The bit swizzling simplifies the most basic hardware implementations of RISC-V but complicates emulation.
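
        To make the swizzling concrete, here’s a minimal sketch (not code from the project, just an illustration) comparing what an interpreter does to extract a MIPS I-type immediate with reassembling the scattered offset of a RISC-V JAL:

            #include <stdint.h>

            /* MIPS I-type: the 16-bit immediate is one contiguous field. */
            static int32_t mips_imm16(uint32_t insn)
            {
                return (int32_t)(int16_t)(insn & 0xFFFF);   /* mask, sign-extend, done */
            }

            /* RISC-V J-type (JAL): the 21-bit offset is spread across the word
               as imm[20|10:1|11|19:12] and has to be reassembled bit field by
               bit field before it can be used. */
            static int32_t riscv_imm_j(uint32_t insn)
            {
                uint32_t imm = (((insn >> 31) & 0x1)   << 20)    /* imm[20]    */
                             | (((insn >> 21) & 0x3FF) << 1)     /* imm[10:1]  */
                             | (((insn >> 20) & 0x1)   << 11)    /* imm[11]    */
                             | (((insn >> 12) & 0xFF)  << 12);   /* imm[19:12] */
                return (int32_t)(imm << 11) >> 11;               /* sign-extend from bit 20 */
            }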

        1. On the other hand, RISC-V is open so you could implement your own instruction decoding on top of the normal one and a compiler for that one (or just re-pack the instructions). You could do that with the others too, but perhaps you would be harassed by some lawyer cats trying to bury you.

          1. “On the other hand, RISC-V is open so you could implement your own instruction decoding on top of the normal one and a compiler for that one”

            You can’t just repack the instructions – anything that generates instructions (like a compiler) would need to be changed. Plus anything that does hash verification or size verification on compressed binaries would fail.

            It also wouldn’t solve the problem entirely! You can’t repack all the instructions because then it moves source/destination operands between formats, and now you’ll have to shift those.

          2. (could not reply to the original)

            “You can’t repack all the instructions because then it moves source/destination operands between formats, and now you’ll have to shift those.”

            Yup … like MIPS and Arm (all versions).

            In hardware, the RISC-V encoding of literals makes for smaller and faster hardware, and being able to start reading rs1 and rs2 from the register file before you even know which format of instruction you’re looking at shaves off gate delays.

            In an emulator that caches decoded instructions somehow, which both Spike (most recent instructions in a hash table) and Qemu (JITed code) do, it doesn’t matter at all.

            Emulators so simple that they parse every instruction anew are a small niche, but the instruction hit hardest by the complex offset encoding, JAL, occurs relatively infrequently dynamically, so the overall slowdown is pretty minor.

            On the other hand, emulating anything with condition codes (not MIPS, obviously, but x86 and Arm) slows down every arithmetic instruction, even on a host machine that itself has condition codes.
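
            (A minimal sketch of that cost, not taken from Spike, Qemu, or the project’s emulator: an interpreter for a flag-setting ISA has to reconstruct N/Z/C/V in software on every add, while a MIPS or RISC-V add is just the addition.)

                #include <stdint.h>

                /* Emulating an Arm/x86-style flag-setting ADD: the host flags are
                   not visible from C, so they get recomputed in software here. */
                static uint32_t emu_add_flags(uint32_t a, uint32_t b, uint32_t *nzcv)
                {
                    uint32_t r = a + b;
                    uint32_t n = r >> 31;                       /* negative */
                    uint32_t z = (r == 0);                      /* zero     */
                    uint32_t c = (r < a);                       /* carry    */
                    uint32_t v = (~(a ^ b) & (a ^ r)) >> 31;    /* overflow */
                    *nzcv = (n << 3) | (z << 2) | (c << 1) | v;
                    return r;
                }

                /* Emulating a MIPS ADDU: no flags to maintain at all. */
                static uint32_t emu_addu(uint32_t a, uint32_t b)
                {
                    return a + b;
                }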

            Just as a simple example, look at my primes benchmark, which I wrote to compare Arm SBCs with each other and with x86 in 2016 (http://hoult.org/primes.txt). Some speeds on my i9-13900HX laptop:

            1967 ms 246 bytes of code native
            10567 ms 246 bytes native binary run in qemu-x86_64
            14466 ms 268 bytes arm64 binary run in qemu-aarch64
            9521 ms 292 bytes mips64 run in qemu-mips64
            5612 ms 188 bytes riscv64 binary run in qemu-riscv64

            This carries through to running full Linux for the various ISAs in Docker (which uses Qemu underneath). Running RISC-V Linux in emulation feels like an early i7-3770 or so, while MIPS and x86 feel like an early Core 2 Duo and Aarch64 feels like a Pentium III.

          3. “speeds on my i9-13900HX laptop:”

            You’re literally comparing speeds on an x86??? Replying to a comment about the bit encoding??

            This article is about running an emulator on a microcontroller. You know what your laptop has that those microcontrollers almost universally do not?

            A barrel shifter. Meaning the entire conversation about how expensive it is to decode instructions is almost pointless.

            If you’re saying things like “oh if you’ve got instruction caching and a hardware barrel shifter architecture X is…” you’re missing the point.

          4. Pat: Well, that’s incorrect.

            Re-packing: All information (think information theory) about the encoded data in the insn is there, so you could re-swizzle it. It would take the same number of bits to encode, but in a more convenient position. The baked relocations would be the same – no offsets in the code change.

            Own compiler: This is another topic; I meant to say “or another” and not “and another”, but the point still stands. Since the thing in the article is running a Linux + userland stack, which is all open, you could write your own compiler. Perhaps a modified RISC-V backend for LLVM would be the easiest. The point here is not that the source is unavailable, but that you could make an IR suitable for emulation at speed.

        1. Dmitry, awesome work! It makes my brain hurt.

          I tend to agree that RISC-V is too RISCy. I bought a RISC-V based system to play with the architecture, and was astounded to find it so incredibly slow that a 1 GHz 64-bit RISC-V CPU was at least 4x slower than an ancient ARM32 implementation running at the same clock rate. Seeing things like not having foundational math flags (carry/borrow) kind of explains it all. It is almost like they went to the school of uneducated novices for its design. I’m still baffled.

  1. The SD and serial share the same pins, with a filter in place to remove the high-frequency SPI traffic from the low-frequency serial traffic.

    BEHOLD! The frequency domain pin multiplexing! I have no words for this.

    1. A very nice & clever trick I once saw is to control one (or more) 74HC595 shift registers with a single microcontroller pin. It used a few RC filters to adjust the timing and split the uC output into data, clock and strobe signals for the 595.

          1. Apparently pausing mid-sentence to search for a link and switching contexts from “I had seen it discussed” to “it’s been discussed” didn’t go well. As usual.

  2. Actually, SPI RAM and SD share the same four pins. Serial and SD sharing was considered but rejected. This would be very boring if I said five pins, since SPI is composed of three and you would need two chip selects. Sharing four pins to address two SPI devices is quite a bit more interesting.

  3. Hmm, what about running the serial rx/tx in half duplex somehow in order to use only 1 pin?
    Since you can program both the MCU and the USB chip, this seems possible, right?
    You could imagine a system where you just run the interchip comm at 2x the speed of the external comm, giving a window of time for going each way.
