It is easy to port C compilers to architectures that look like old minicomputers or bigger CPUs. However, as the authors of the Small Device C Compiler (SDCC) found, pushing C into a typical 8-bit CPU is challenging. Lessons learned from SDCC inspired a new 8-bit architecture, F8. This isn’t just a theoretical architecture: you can find an example Verilog implementation in the SDCC project and on GitHub. The name choice may turn out to be unfortunate, as there was an F8 CPU from Fairchild back in the 1970s that apparently few people remember.
In the video from FOSDEM 2025, [Phillip Krause] provides a nice overview of the how and why of F8. While it might seem odd to create a new 8-bit CPU when you can get bigger CPUs for pennies, you have to consider that 8-bit machines are more than enough for many jobs, and if you can squeeze one into an FPGA, it might be a better choice than having to get a bigger FPGA to hold both your design and a 32-bit CPU.
Many 8-bit computers struggle with efficient C code, mainly because the CPU’s data size is smaller than the width of a pointer. Even common operations like adding two numbers take extra code. For example, suppose you have a pointer to an array, and each element of the array is four bytes wide. To find the address of the nth element, you need to compute element_n = base_address + (n * 4). An 8086, say, with its 16-bit pointers and its many 16-bit instructions and addressing modes, can do that calculation very succinctly.
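To make the cost concrete, here’s a quick C sketch of that indexing pattern (the function name is ours, just for illustration):

```c
#include <stdint.h>

/* Fetch element n of an array of 4-byte values: every access makes
   the compiler compute base_address + (n * 4). A 16-bit CPU can do
   the shift and add with a few full-width instructions; an 8-bit CPU
   has to synthesize the 16-bit pointer math from 8-bit adds with
   carry and multi-byte shifts. */
uint32_t element_at(const uint32_t *base_address, uint8_t n)
{
    return base_address[n];  /* = *(base_address + n), scaled by 4 */
}
```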
Other problems you frequently run into when compiling code for small CPUs include segmented address spaces, dedicated registers for memory indexing, and difficulty putting wider items on a stack (or, for some very small CPUs, even having a stack at all).
The wish list included stack-relative addressing, hardware 8-bit multiplication, and BCD support to enable an efficient printf implementation.
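To see why BCD support earns a spot on that list, consider that printf has to turn binary numbers into decimal digits, and on a CPU without hardware division that usually means a shift-and-add-3 (“double dabble”) loop, exactly the kind of adjustment BCD instructions accelerate. Here’s a minimal C sketch of the software version (our function name, not anything from F8):

```c
#include <stdint.h>

/* Convert an 8-bit binary value to three packed BCD digits using
   "double dabble": before each shift, add 3 to any BCD digit >= 5 so
   the shift carries it into the next decade. No division required;
   hardware BCD support makes the adjust step nearly free. */
uint16_t bin_to_bcd(uint8_t bin)
{
    uint16_t bcd = 0;
    for (int i = 0; i < 8; i++) {
        if ((bcd & 0x00F) >= 0x005) bcd += 0x003;
        if ((bcd & 0x0F0) >= 0x050) bcd += 0x030;
        if ((bcd & 0xF00) >= 0x500) bcd += 0x300;
        bcd = (uint16_t)((bcd << 1) | (bin >> 7));
        bin = (uint8_t)(bin << 1);
    }
    return bcd;  /* e.g. 255 -> 0x255 */
}
```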
Keep in mind, it isn’t that you can’t compile C for strange 8-bit architectures. SDCC is proof that you can. The question is how efficient the generated code is. F8 provides features that facilitate efficient binaries for C programs.
We’ve seen other modern 8-bit CPUs use SDCC. Writing C code for the notorious PIC (with its banked memory, lack of stack, and other hardships) was truly a surreal experience.
I was absolutely floored when I found out there are Chinese 8-bit MCUs which can do Bluetooth. The one I saw didn’t even have an FPU otherwise!
Kinda makes sense though: if you have everything (timers, serial interfaces, DMA, etc.) in hardware, why do you need a 32-bit CPU?
Can you name a few models, so I can look them up at Ali? If they support BLE on top that would be very useful to me. Cheers!
I tried to find the exact model but I sadly can’t find it anymore. The last time I stumbled across it was 6-7 years ago. iirc it was an 8051 core with BLE.
Sorry
Sinowealth SH79F081B is one…
A quick search found the CC2540 and CC2541 from Texas Instruments: BLE (4.0) chips with an 8051 core. https://www.ti.com/lit/ds/symlink/cc2540.pdf
They’ve been around since about 2010, so I’m sure that by now there are Chinese clones floating around.
If you want an 8-bit C-compatible MCU there’s the ATmega. Otherwise it’s cheaper to just go with a Cortex-M0 (or M3, or even M4).
Also the 6809. But that’s from the dark times of the 20th century, so nobody remembers it.
I taught 6809 embedded systems at Purdue in the late 80s. We had 16 SWTPC systems in the lab. We walked students through developing a round-robin multitasking system with circular buffered interrupt driven I/O in a semester. One lab was to write a driver that controlled a paper tape reader, turning the reader on and off as the buffer was emptied, passing data to a loader.
The development cycles were low-level, cross-compiling on a VAX and then downloading to a bootloader (that the students also wrote) with S-records.
The C compiler worked very well, and the 6809 instruction set was really nice. Indirect addressing through a pointer in memory in a single instruction was better than what the 8085 offered. I asked the students to come up with a mnemonic for the flag bits EFH1NZVC. One student wrote “Extra Fast Hardware 1nterrupts Never Zap Vital Code”.
It also had rudimentary BCD support, addition only. I don’t know why they bothered, since BCD subtraction required you to manage carries in code.
You can do BCD subtraction via ten’s complement addition.
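For the curious, here’s a hedged C sketch of the trick (the helper names are made up): every valid BCD digit is at most 9, so the nine’s complement of a packed byte is just 0x99 minus it, with no borrows between digits. Then a - b becomes a + (0x99 - b) + 1 using ordinary BCD addition, where a carry out of the top digit means no borrow.

```c
#include <stdint.h>

/* Add two packed-BCD bytes with a decimal adjust (what a DAA
   instruction does after a binary add on real hardware). */
static uint8_t bcd_add(uint8_t a, uint8_t b, int *carry)
{
    unsigned lo = (a & 0x0F) + (b & 0x0F);
    unsigned hi = (a >> 4) + (b >> 4);
    if (lo > 9) { lo -= 10; hi++; }
    *carry = (hi > 9);
    if (hi > 9) hi -= 10;
    return (uint8_t)((hi << 4) | lo);
}

/* BCD a - b via ten's complement: a + (0x99 - b) + 1. */
uint8_t bcd_sub(uint8_t a, uint8_t b, int *borrow)
{
    int c1, c2;
    uint8_t r = bcd_add(a, (uint8_t)(0x99 - b), &c1);  /* nine's complement */
    r = bcd_add(r, 0x01, &c2);                         /* +1 makes it ten's */
    *borrow = !(c1 || c2);   /* a carry out means no borrow was needed */
    return r;                /* e.g. 0x42 - 0x17 -> 0x25, borrow = 0 */
}
```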
The 6809 was the ultimate 8-bitter! It had a clean, symmetrical instruction set and just enough built-in 16-bit math support to make C work well. It was also the last successful non-microcoded CPU.
I wrote a PL/M-like compiler for the 6809 before C became available, and it outperformed anything else at the time. Those were fun days.
AVR-8 is indeed battle-tested and has very good documentation. I can only recommend it as a start, since it is easy to understand (no MMU, no caches, …). ARM is a whole different level of complexity, but also a whole different level of (compute) power.
To be honest, the AVR architecture is somewhere in between 8-bit and 16-bit, as it has a surprising amount of support for 16-bit operations for an 8-bit CPU. And an 8-bit multiplier. All that really helps make the AVR a lot more performant than most “legacy” 8-bit controllers that struggle with running C code for many reasons.
AVR is a pretty rotten architecture, using different instructions to access data in RAM and ROM. The result is absolute hell if you’re trying to write functions that can access data in either.
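For anyone who hasn’t fought this: on classic AVR parts with avr-gcc, flash-resident data goes through the pgmspace API, so you end up writing every accessor twice. A minimal sketch using the real avr/pgmspace.h interface:

```c
#include <avr/pgmspace.h>
#include <stdint.h>

/* The same C pointer type can't say whether data lives in RAM
   (LD instructions) or flash (LPM instructions), so generic code
   gets duplicated. */
const char msg_flash[] PROGMEM = "hello";  /* stored in flash */
const char msg_ram[]           = "hello";  /* copied to RAM at startup */

char get_ram(const char *p, uint8_t i)   { return p[i]; }
char get_flash(const char *p, uint8_t i) { return (char)pgm_read_byte(p + i); }
```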
You mean you don’t like the Harvard machine architecture. It’s a feature, not a bug (and it’s not unique to AVR).
it’s an absolute hell of a feature wherever it shows up
It’s a perfectly sensible thing to want ROM and RAM to look the same to the programmer, especially from a modern perspective where even some embedded processors implement the kind of virtual memory usually associated with large systems. But history and fundamental differences between RAM and ROM make that too hard or too expensive to be worthwhile. First, there’s the fundamental problem that flash memory is orders of magnitude slower to write than RAM: if some ‘0’ bit in a block needs to be changed to ‘1’, the whole block has to be erased and rewritten, at great cost in time. For that reason alone, unifying flash and RAM has limited usefulness apart from adding convenience for the programmer. And that hasn’t been a factor in ISA design in ages.
Because you really can’t ignore the differences in how these devices behave (they’re not invisible at the machine level), the RISC approach would be to let the OS or the application writer implement a driver to handle this when or if needed, rather than building hardware into the chip to support it directly just for programmer convenience.
Yeah, I don’t agree. There’s no need for the memory to be writable: you could either trap or ignore a write to those areas and alter the flash via totally different paths.
The main benefit from an EE perspective is performance: you just have higher memory bandwidth overall because you’ve got two busses. You can fetch an instruction and load/store data at the same time.
Hence modern processors are modified Harvard, to get the benefits of both.
This may be a perfect example of the profound shift in “what to include, what to leave out”, and why, that started as RISC architectures became ascendant. The mentality of designing an ISA around the human programmer is pretty dead after research showed that many instructions were barely used, and that the silicon spent on a convenient instruction set might be better deployed on more registers, more cache, or some kind of accelerator or special-purpose computation useful to applications.
Once you think about die size and how cheaper chips get deployed, it’s no wonder a latter-day 8-bit design like the AVR would spend more silicon on registers and peripherals than on making memory easier to use by making ROM look like RAM. Post-RISC, that sort of non-performance-enhancing virtualization gets cut regardless of whether it makes the code easier. They backronymed RISC to “Relegate the Impossible Stuff to the Compiler” for good reason. :)
RISC hasn’t been renamed; it still means Reduced Instruction Set Computer. It’s arguable that after four decades it’s finally winning, with Intel controlling the remaining CISC architecture and failing against ARM-64 (I’m typing this on a MacBook M2).
To summarise: RISC had an initial advantage because it simplified architectures, letting designers turn decode logic and microcode into pipelines and registers, which sped up performance.
That meant CPU speeds started to outpace RAM, which in turn forced a switch towards caches. As caches became bigger, x86 (and 68K) cores took up proportionally less die area and, thanks to better code density, needed less cache than RISC devices. With ten times as many engineers, Intel’s superscalar x86 started to catch up with superscalar PowerPC and eventually overtook it.
However, both RISC and CISC then hit a GHz ceiling, forcing a shift from superscalar architectures (multiple functional units mimicking sequential execution) to multiple cores. This shift to real parallelism has given RISC the edge again, thanks to its large number of registers and its smaller possible cores (and therefore more cores per die). Core size and internal parallelism are now the advantage.
The number of RISC CPUs shipped is now probably 100x that of Intel, even if x86 is still the majority in the desktop/laptop market. ARM-based Windows PCs will increasingly become viable and normal; x64 will fade; RISC-V will be the new competitor at the high end.
I think I’ve seen C-optimized 8-bit MCUs from Microchip as well (of course, Atmel is now part of Microchip). The PIC16F193x-series MCUs, for instance, have C-compiler-optimized instructions.
The CC5X compiler generates code for tiny PIC chips like the PIC16 and PIC12. I used it for the Curilights project. Google CC5X C Compiler.
Providing links separately so they don’t get killed by moderation:
https://curilights.com
https://www.bknd.com/cc5x/
New super-high-speed math implementations like “Diamon” can manipulate data at 10,000x speeds.
So too 8-bit machines. Covenants on my brain (trade secrets) preclude my ability to explain exactly how. If you think about it, you can make an 8-bit AVR do backpropagation at 10,000x.
I used CC5X for my early PIC stuff and was always very happy with the workflow. Thanks for the reminder; I should find my tube of 16F688s that I bought when < $2.50 / MPU was an incredible deal.
Lovely documentation for those of us casually browsing on the web who eschew videos.
https://github.com/f8-arch/doc/blob/trunk/manual.tex
The example they give of an array with 4-byte-wide elements doesn’t need a MUL; it needs a left shift by 2 bits (and then the ADD of the array base address).
Is there a comparison with other 8-bit architectures that shows where F8 is better (apart from being GPL) for the C use case?
Can we start the Rust arguments already btw?
It is not “lovely”. It might be lovely if they bothered to include a PDF version of the manual.
If they can’t be arsed to do that, then what else haven’t they bothered to do?
I will point out that while it is a left shift, it is a multi-byte left shift (which is still a multiply). So either way you go, you have to do it over multiple instructions.
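To illustrate what “multi-byte” means here, a hedged C sketch that mirrors the byte-at-a-time instructions an 8-bit CPU actually executes:

```c
#include <stdint.h>

/* Left-shift a 16-bit index by 2 on an 8-bit machine: each single-bit
   shift takes two instructions (shift the low byte, then rotate the
   carried-out bit into the high byte), so "<< 2" costs four shifts
   before you even ADD the array base address. */
uint16_t shl2_16(uint8_t lo, uint8_t hi)
{
    for (int i = 0; i < 2; i++) {
        uint8_t carry = (uint8_t)(lo >> 7);   /* bit crossing the bytes */
        lo = (uint8_t)(lo << 1);              /* like LSL on the low byte */
        hi = (uint8_t)((hi << 1) | carry);    /* like ROL on the high byte */
    }
    return (uint16_t)((uint16_t)hi << 8 | lo);
}
```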
While this is an interesting bit of open source work, it pretty much follows the spirit of the development of the AVR (released in 1996), which was created specifically to work well with compiled high-level languages; the developers worked directly with IAR.
I’m a third of the way through the video and I haven’t seen any mention of this bit of history.
This is interesting: of “stack-relative addressing, hardware 8-bit multiplication, and BCD support”, the first and third are 65C02 items, and an “instant” multiply with a 16-bit result can be done with a ROM LUT.
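A full 8x8 product LUT is 128 KiB of ROM, so the classic space-saving variant is a quarter-square table. A hedged C sketch (the table is built at startup here for clarity; on a real part it would be precomputed in ROM):

```c
#include <stdint.h>

/* Quarter-square multiply: a*b = n2q[a+b] - n2q[|a-b|], where
   n2q[n] = n*n/4. It works because a+b and a-b always have the same
   parity, so the truncated quarters cancel exactly. Roughly 1 KiB of
   table instead of 128 KiB for a full 8x8 -> 16-bit product LUT. */
static uint16_t n2q[511];   /* indices 0..510 cover a+b for 8-bit a, b */

void init_n2q(void)
{
    for (uint32_t n = 0; n < 511; n++)
        n2q[n] = (uint16_t)(n * n / 4);
}

uint16_t mul8x8(uint8_t a, uint8_t b)
{
    unsigned d = (a > b) ? (a - b) : (b - a);
    return (uint16_t)(n2q[a + b] - n2q[d]);
}
```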
I have often pondered ways to make a 16 bit arch with the nice features of the 6502 but I never come up with anything that, surprise, the smallest ARM doesn’t do better.
For small FPGA footprint, neo430 probably beats everything and has very compact code as well.
I need a 10-bit processor (or two, with a CRC error detector in memory).
10 bits are better than 8: easy addressing and still small enough for many programs.
Ha! I still have a few F8s laying around, both the quartz-window EPROM version and a socketed version that takes something like a 2716 EPROM. Plus a bunch of datasheets. Had plans for them, but never used ’em.
Now those F8 component chips would probably be a prize exhibit at the Computer History Museum, or some museum at any rate. Bits of more obscure history like the SC/MP, F8, or 1802 are especially worth preserving in my view, because their lower numbers and near absence from commercial volume mean there’s less available to preserve.
The 8051 has been around for years and has excellent C support. Most of the very cheap Chinese parts with a processor are based on an MCS-51 core, and yes, some of them have Bluetooth and USB.
Been using the STM8 for the past few years. Very low power, and the STM32s have only recently caught up in terms of power consumption. If they had continued to shrink the STM8 it would no doubt still be leading, but unfortunately they aren’t, instead putting future development into the STM32.
Magic-1 is a very interesting 8/16-bit CPU designed to support C well.
https://www.homebrewcpu.com/index.htm
i felt like the write up of the disadvantages of C on an 8-bit microcontroller was pretty good, but i think the article would have floated my boat better if it included a summary of how they overcame those.
and the answer is, they included a 16-bit ALU and a convenient stack. that was the improvement they made to an 8-bit CPU to make it friendlier to C. i’m sorry, but i’m going to say: duh.
that’s exactly what microchip did with the PIC18. and just speaking practically, at first i loved the PIC18, but now that the stm32 exists i don’t any longer. if i want something that is aimed at running C, ARM is just so convenient these days. there’s just not much that makes the PIC18 particularly special compared to an rp2040. maybe some power-save modes? i don’t know, i do love PIC peripherals.
but by comparison the PIC12 is still special. and not least because i would never be tempted to use C on it. so every time i use it, i confidently control every detail. which is actually handy in embedded work and not just a mental handicap like in the rest of my life :)
also the PIC12 includes some nice hacks to make it easier to live within 8 bits. one of those is that it’s really fundamentally 8 bit addressing for data memory. which isn’t quite so limiting because program memory is on a separate address bus. man, i just loaded up the PIC12F675 datasheet. 132 pages, and it’s complete, everything you need. i sure do love PIC12 :)
“As opposed to having to get a bigger FPGA to hold your design and a 32-bit CPU.”
My current interest is Decimal64 math. I’d need 8 registers to hold a value, 16 if I want to add / subtract etc.
32-bit CPUs take a lot of FPGA space, but what about 16-bit? Are 8- and 32-bit REALLY the only games in town?
There’s a soft implementation of an MSP430 called Neo430, I think. It’s an exceptionally clean 16-bit architecture.
I see what you did there.