It is easy to port C compilers to architectures that look like old minicomputers or bigger CPUs. However, as the authors of the Small Device C Compiler (SDCC) found, pushing C into a typical 8-bit CPU is challenging. Lessons learned from SDCC inspired a new 8-bit architecture, F8. This isn’t just a theoretical architecture: you can find an example Verilog implementation in the SDCC project and on GitHub. The name choice may turn out to be unfortunate, as there was an F8 CPU from Fairchild back in the 1970s that apparently few people remember.
In the video from FOSDEM 2025, [Phillip Krause] provides a nice overview of the how and why of F8. While it might seem odd to create a new 8-bit CPU when you can get bigger CPUs for pennies, you have to consider that 8-bit machines are more than enough for many jobs, and if you can squeeze one into an FPGA, it might be a better choice than having to get a bigger FPGA to hold both your design and a 32-bit CPU.
Many 8-bit computers struggle with efficient C code, mainly because the CPU’s data size is smaller than the width of a pointer. Even common operations like adding two numbers take extra code. For example, suppose you have a pointer to an array, and each element of the array is four bytes wide. To find the address of the nth element, you need to compute element_n = base_address + (n * 4). An 8086, say, with its 16-bit pointers and its many 16-bit instructions and addressing modes, can do that calculation very succinctly.
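To make the cost concrete, here’s a quick C sketch of that indexing pattern (the function name is ours, just for illustration):

```c
#include <stdint.h>

/* Fetch element n of an array of 4-byte values: every access makes
   the compiler compute base_address + (n * 4). A 16-bit CPU can do
   the shift and add with a few full-width instructions; an 8-bit CPU
   has to synthesize the 16-bit pointer math from 8-bit adds with
   carry and multi-byte shifts. */
uint32_t element_at(const uint32_t *base_address, uint8_t n)
{
    return base_address[n];  /* = *(base_address + n), scaled by 4 */
}
```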
Other problems you frequently run into when compiling code for small CPUs include segmented address spaces, dedicated registers for memory indexing, and difficulty putting wider items on a stack (or, for some very small CPUs, even having a stack at all).
The wish list included stack-relative addressing, hardware 8-bit multiplication, and BCD support to enable an efficient printf implementation.
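To see why BCD support earns a spot on that list, consider that printf has to turn binary numbers into decimal digits, and on a CPU without hardware division that usually means a shift-and-add-3 (“double dabble”) loop, exactly the kind of adjustment BCD instructions accelerate. Here’s a minimal C sketch of the software version (our function name, not anything from F8):

```c
#include <stdint.h>

/* Convert an 8-bit binary value to three packed BCD digits using
   "double dabble": before each shift, add 3 to any BCD digit >= 5 so
   the shift carries it into the next decade. No division required;
   hardware BCD support makes the adjust step nearly free. */
uint16_t bin_to_bcd(uint8_t bin)
{
    uint16_t bcd = 0;
    for (int i = 0; i < 8; i++) {
        if ((bcd & 0x00F) >= 0x005) bcd += 0x003;
        if ((bcd & 0x0F0) >= 0x050) bcd += 0x030;
        if ((bcd & 0xF00) >= 0x500) bcd += 0x300;
        bcd = (uint16_t)((bcd << 1) | (bin >> 7));
        bin = (uint8_t)(bin << 1);
    }
    return bcd;  /* e.g. 255 -> 0x255 */
}
```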
Keep in mind, it isn’t that you can’t compile C for strange 8-bit architectures. SDCC is proof that you can. The question is how efficient the generated code is. F8 provides features that facilitate efficient binaries for C programs.
We’ve seen other modern 8-bit CPUs use SDCC. Writing C code for the notorious PIC (with its banked memory, lack of stack, and other hardships) was truly a surreal experience.
I was absolutely floored when I found out there are Chinese 8-bit MCUs which can do Bluetooth. The one I saw didn’t even have an FPU otherwise!
Kinda makes sense though: if you have everything (timers, serial interfaces, DMA, etc.) in hardware, why do you need a 32-bit CPU?
Can you name a few models, so I can look them up at Ali? If they support BLE on top that would be very useful to me. Cheers!
I tried to find the exact model but I sadly can’t find it anymore. The last time I stumbled across it was 6-7 years ago. iirc it was an 8051 core with BLE.
Sorry
Sinowealth SH79F081B is one…
A quick search found the CC2540 and CC2541 from Texas Instruments: BLE (4.0) chips with an 8051 core. https://www.ti.com/lit/ds/symlink/cc2540.pdf
They’ve been around since about 2010, so I’m sure that by now there are Chinese clones floating around.
If you want an 8-bit C-compatible MCU there’s the ATmega. Otherwise it’s cheaper to just go with a Cortex-M0 (or M3, or even M4).
Also the 6809. But that’s from the dark times of the 20th century, so nobody remembers it.
I taught 6809 embedded systems at Purdue in the late 80s. We had 16 SWTPC systems in the lab. We walked students through developing a round-robin multitasking system with circular buffered interrupt driven I/O in a semester. One lab was to write a driver that controlled a paper tape reader, turning the reader on and off as the buffer was emptied, passing data to a loader.
The development cycles were low-level, cross-compiling on a VAX and then downloading to a bootloader (that the students also wrote) with S-records.
The C compiler worked very well, and the 6809 instruction set was really nice. Indirect addressing through a pointer in memory in a single instruction was better than what the 8085 offered. I asked the students to come up with a mnemonic for the flag bits EFH1NZVC. One student wrote “Extra Fast Hardware 1nterrupts Never Zap Vital Code”.
It also had rudimentary BCD support, addition only. I don’t know why they bothered, since BCD subtraction required you to manage carries in code.
You can do BCD subtraction via ten’s complement addition.
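For the curious, here’s a hedged C sketch of the trick (the helper names are made up): every valid BCD digit is at most 9, so the nine’s complement of a packed byte is just 0x99 minus it, with no borrows between digits. Then a - b becomes a + (0x99 - b) + 1 using ordinary BCD addition, where a carry out of the top digit means no borrow.

```c
#include <stdint.h>

/* Add two packed-BCD bytes with a decimal adjust (what a DAA
   instruction does after a binary add on real hardware). */
static uint8_t bcd_add(uint8_t a, uint8_t b, int *carry)
{
    unsigned lo = (a & 0x0F) + (b & 0x0F);
    unsigned hi = (a >> 4) + (b >> 4);
    if (lo > 9) { lo -= 10; hi++; }
    *carry = (hi > 9);
    if (hi > 9) hi -= 10;
    return (uint8_t)((hi << 4) | lo);
}

/* BCD a - b via ten's complement: a + (0x99 - b) + 1. */
uint8_t bcd_sub(uint8_t a, uint8_t b, int *borrow)
{
    int c1, c2;
    uint8_t r = bcd_add(a, (uint8_t)(0x99 - b), &c1);  /* nine's complement */
    r = bcd_add(r, 0x01, &c2);                         /* +1 makes it ten's */
    *borrow = !(c1 || c2);   /* a carry out means no borrow was needed */
    return r;                /* e.g. 0x42 - 0x17 -> 0x25, borrow = 0 */
}
```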
The 6809 was the ultimate 8-bitter! It had a clean, symmetrical instruction set and just enough built-in 16-bit math support to make C work well. It was also the last successful non-microcoded CPU.
I wrote a PL/M-like compiler for the 6809 before C became available, and it outperformed anything else at the time. Those were fun days.
AVR-8 is indeed battle-tested and has very good documentation. I can only recommend it as a start, since it is easy to understand (no MMU, no caches, …). ARM is a whole different level of complexity, but also a whole different level of (compute) power.
To be honest, the AVR architecture is somewhere in between 8-bit and 16-bit, as it has a surprising amount of support for 16-bit operations for an 8-bit CPU. And an 8-bit multiplier. All that really helps make the AVR a lot more performant than most “legacy” 8-bit controllers that struggle with running C code for many reasons.
AVR is a pretty rotten architecture, using different instructions to access data in RAM and ROM. The result is absolute hell if you’re trying to write functions that can access data in either.
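For anyone who hasn’t fought this: on classic AVR parts with avr-gcc, flash-resident data goes through the pgmspace API, so you end up writing every accessor twice. A minimal sketch using the real avr/pgmspace.h interface:

```c
#include <avr/pgmspace.h>
#include <stdint.h>

/* The same C pointer type can't say whether data lives in RAM
   (LD instructions) or flash (LPM instructions), so generic code
   gets duplicated. */
const char msg_flash[] PROGMEM = "hello";  /* stored in flash */
const char msg_ram[]           = "hello";  /* copied to RAM at startup */

char get_ram(const char *p, uint8_t i)   { return p[i]; }
char get_flash(const char *p, uint8_t i) { return (char)pgm_read_byte(p + i); }
```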
You mean you don’t like the Harvard machine architecture. It’s a feature, not a bug (and it’s not unique to AVR).
it’s an absolute hell of a feature wherever it shows up
It’s a perfectly sensible thing to want ROM and RAM to look the same to the programmer, especially from a modern perspective where even some embedded processors implement the kind of virtual memory usually associated with large systems. But history and fundamental differences between RAM and ROM make that too hard or too expensive to be worthwhile. First, there’s the fundamental problem that flash memory is orders of magnitude slower to write than RAM: if some ‘0’ bit in a block needs to be changed to ‘1’, the whole block has to be erased and rewritten, at great cost in time. For that reason alone, unifying flash and RAM has limited usefulness apart from adding convenience for the programmer. And that hasn’t been a factor in ISA design in ages.
Because you really can’t ignore the differences in how these devices behave (they’re not invisible at the machine level), the RISC approach would be to let the OS or the application writer implement a driver to handle this when or if needed, rather than building hardware into the chip to support it directly just for programmer convenience.
Yeah, I don’t agree. There’s no need for the memory to be writable: you could either trap or ignore a write to those areas and alter the flash via totally different paths.
The main benefit from an EE perspective is performance: you just have higher memory bandwidth overall because you’ve got two busses. You can fetch an instruction and load/store data at the same time.
Hence modern processors are modified Harvard, to get the benefits of both.
This may be a perfect example of the profound shift in “what to include, what to leave out”, and why, that started as RISC architectures became ascendant. The mentality of designing an ISA around the human programmer is pretty dead after research showed that many instructions were barely used, and that the silicon spent on a convenient instruction set might be better deployed on more registers, more cache, or some kind of accelerator or special-purpose computation useful to applications.
Once you think about die size and how cheaper chips get deployed, it’s no wonder a latter-day 8-bit design like the AVR would spend more silicon on registers and peripherals than on making memory easier to use by making ROM look like RAM. Post-RISC, that sort of non-performance-enhancing virtualization gets cut regardless of whether it makes the code easier. They backronymed RISC to “Relegate the Impossible Stuff to the Compiler” for good reason. :)
RISC hasn’t been renamed; it still means Reduced Instruction Set Computer. It’s arguable that after four decades it’s finally winning, with Intel controlling the remaining CISC architecture and failing against ARM-64 (I’m typing this on a MacBook M2).
To summarise: RISC had an initial advantage because it simplified architectures, letting designers turn decode logic and microcode into pipelines and registers, which sped up performance.
That meant CPU speeds started to outpace RAM, which in turn forced a switch towards caches. As caches became bigger, x86 (and 68K) cores took up proportionally less die area and, thanks to better code density, needed less cache than RISC devices. With ten times as many engineers, Intel’s superscalar x86 started to catch up with superscalar PowerPC and eventually overtook it.
However, both RISC and CISC then hit a GHz ceiling, forcing a shift from superscalar architectures (multiple functional units mimicking sequential execution) to multiple cores. This shift to real parallelism has given RISC the edge again, thanks to its large number of registers and its smaller possible cores (and therefore more cores per die). Core size and internal parallelism are now the advantage.
The number of RISC CPUs shipped is now probably 100x that of Intel, even if x86 is still the majority in the desktop/laptop market. ARM-based Windows PCs will increasingly become viable and normal; x64 will fade; RISC-V will be the new competitor at the high end.
I think I’ve seen C-optimized 8-bit MCUs from Microchip as well (of course, Atmel is now part of Microchip). The PIC16F193x-series MCUs, for instance, have C-compiler-optimized instructions.
The CC5X compiler generates code for tiny PIC chips like the PIC16 and PIC12. I used it for the Curilights project. Google CC5X C Compiler.
Providing links separately so they don’t get killed by moderation:
https://curilights.com
https://www.bknd.com/cc5x/
New super-high-speed math implementations like “Diamon” can manipulate data at 10,000x speeds.
So too 8-bit machines. Covenants on my brain (trade secrets) preclude my ability to explain exactly how. If you think about it, you can make an 8-bit AVR do backpropagation at 10,000x.
I used CC5X for my early PIC stuff and was always very happy with the workflow. Thanks for the reminder; I should find my tube of 16F688s that I bought when < $2.50 / MPU was an incredible deal.
Lovely documentation for those of us casually browsing on the web who eschew videos.
https://github.com/f8-arch/doc/blob/trunk/manual.tex
The example they give of an array with 4-byte-wide elements doesn’t need a MUL; it needs a left shift by 2 bits (and then the ADD of the array base address).
Is there a comparison with other 8-bit architectures that shows where F8 is better (apart from being GPL) for the C use case?
Can we start the Rust arguments already btw?
It is not “lovely”. It might be lovely if they bothered to include a PDF version of the manual.
If they can’t be arsed to do that, then what else haven’t they bothered to do?
I will point out that while it is a left shift, it is a multi-byte left shift (which is still a multiply). So either way you go, you have to do it over multiple instructions.
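To illustrate what “multi-byte” means here, a hedged C sketch that mirrors the byte-at-a-time instructions an 8-bit CPU actually executes:

```c
#include <stdint.h>

/* Left-shift a 16-bit index by 2 on an 8-bit machine: each single-bit
   shift takes two instructions (shift the low byte, then rotate the
   carried-out bit into the high byte), so "<< 2" costs four shifts
   before you even ADD the array base address. */
uint16_t shl2_16(uint8_t lo, uint8_t hi)
{
    for (int i = 0; i < 2; i++) {
        uint8_t carry = (uint8_t)(lo >> 7);   /* bit crossing the bytes */
        lo = (uint8_t)(lo << 1);              /* like LSL on the low byte */
        hi = (uint8_t)((hi << 1) | carry);    /* like ROL on the high byte */
    }
    return (uint16_t)((uint16_t)hi << 8 | lo);
}
```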
While this is an interesting bit of open source work, it pretty much follows the spirit of the development of the AVR (released in 1996), which was created specifically to work well with compiled high-level languages; the developers worked directly with IAR.
I’m a third of the way through the video and I haven’t seen any mention of this bit of history.
This is interesting: of “stack-relative addressing, hardware 8-bit multiplication, and BCD support”, the first and third are 65C02 items, and an “instant” multiply with a 16-bit result can be done with a ROM LUT.
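A full 8x8 product LUT is 128 KiB of ROM, so the classic space-saving variant is a quarter-square table. A hedged C sketch (the table is built at startup here for clarity; on a real part it would be precomputed in ROM):

```c
#include <stdint.h>

/* Quarter-square multiply: a*b = n2q[a+b] - n2q[|a-b|], where
   n2q[n] = n*n/4. It works because a+b and a-b always have the same
   parity, so the truncated quarters cancel exactly. Roughly 1 KiB of
   table instead of 128 KiB for a full 8x8 -> 16-bit product LUT. */
static uint16_t n2q[511];   /* indices 0..510 cover a+b for 8-bit a, b */

void init_n2q(void)
{
    for (uint32_t n = 0; n < 511; n++)
        n2q[n] = (uint16_t)(n * n / 4);
}

uint16_t mul8x8(uint8_t a, uint8_t b)
{
    unsigned d = (a > b) ? (a - b) : (b - a);
    return (uint16_t)(n2q[a + b] - n2q[d]);
}
```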
I have often pondered ways to make a 16 bit arch with the nice features of the 6502 but I never come up with anything that, surprise, the smallest ARM doesn’t do better.
For small FPGA footprint, neo430 probably beats everything and has very compact code as well.
I need a 10-bit processor (or two, with a CRC error detector in memory).
10 bits are better than 8: easy addressing and still small enough for many programs.
Ha! I still have a few F8s laying around, both the quartz-window EPROM version and a socketed version that takes something like a 2716 EPROM. Plus a bunch of datasheets. Had plans for them, but never used ’em.
Now those F8 component chips would probably be a prize exhibit at the Computer History Museum, or some museum at any rate. Bits of more obscure history like the SC/MP, F8, or 1802 are especially worth preserving in my view, because their lower numbers and near absence from commercial volume mean there’s less available to preserve.
The 8051 has been around for years and has excellent C support. Most of the very cheap Chinese parts with a processor are based on an MCS-51 core, and yes, some of them have Bluetooth and USB.
Been using the STM8 for the past few years. Very low power, and the STM32s have only recently caught up in terms of power consumption. If they had continued to shrink the STM8 it would no doubt still be leading, but unfortunately they aren’t, instead putting future development into the STM32.
Magic-1 is a very interesting 8/16-bit CPU designed to support C well.
https://www.homebrewcpu.com/index.htm
i felt like the write up of the disadvantages of C on an 8-bit microcontroller was pretty good, but i think the article would have floated my boat better if it included a summary of how they overcame those.
and the answer is, they included a 16-bit ALU and a convenient stack. that was the improvement they made to an 8-bit CPU to make it friendlier to C. i’m sorry, but i’m going to say: duh.
that’s exactly what microchip did with the PIC18. and just speaking practically, at first i loved the PIC18, but now that the stm32 exists i don’t any longer. if i want something that is aimed at running C, ARM is just so convenient these days. there’s just not much that makes the PIC18 particularly special compared to an rp2040. maybe some power-save modes? i don’t know, i do love PIC peripherals.
but by comparison the PIC12 is still special. and not least because i would never be tempted to use C on it. so every time i use it, i confidently control every detail. which is actually handy in embedded work and not just a mental handicap like in the rest of my life :)
also the PIC12 includes some nice hacks to make it easier to live within 8 bits. one of those is that it’s really fundamentally 8 bit addressing for data memory. which isn’t quite so limiting because program memory is on a separate address bus. man, i just loaded up the PIC12F675 datasheet. 132 pages, and it’s complete, everything you need. i sure do love PIC12 :)
“As opposed to having to get a bigger FPGA to hold your design and a 32-bit CPU.”
My current interest is Decimal64 math. I’d need 8 registers to hold a value, 16 if I want to add / subtract etc.
32-bit CPUs take a lot of FPGA space, but what about 16-bit? Are 8- and 32-bit REALLY the only games in town?
There’s a soft implementation of an MSP430 called Neo430, I think. It’s an exceptionally clean 16-bit architecture.
I see what you did there.