Weird Processing Unit Only Has 4 Instructions

October 16, 2012

[Tomáš], a.k.a. [Frooxius], is playing around with computational theory and processor architectures – a strange hobby in itself, we know – and has created the strangest CPU we’ve ever seen described.

The Weird Processing Unit, or WPU, isn’t designed like the Intel or ARM CPU in your laptop or phone. No, the WPU is a thought experiment in computer design that’s something between being weird for the sake of being weird and throwing stuff at the wall and seeing what sticks.

The WPU only has four instructions, or attoinstructions, to change the state of one of the 64 pins on the computer – set to logical 1, set to logical 0, invert current state, and halt. These instructions are coded with two bits, and the operand (i.e. the wire connected to the computer) is encoded in another six bits.
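
To make the encoding concrete, here is a minimal Python sketch of decoding and executing such 8-bit attoinstructions. The bit layout (opcode in the top two bits, pin index in the low six) and the opcode numbering are assumptions for illustration; the description above only says that two bits select the operation and six bits select the pin.

```python
# Assumed opcode numbering; the article names the four operations
# but does not say which 2-bit code maps to which.
SET0, SET1, INVERT, HALT = 0b00, 0b01, 0b10, 0b11


def encode(op, pin):
    """Pack an opcode and a pin index into one byte (assumed layout)."""
    return ((op & 0b11) << 6) | (pin & 0b111111)


def decode(byte):
    """Split a byte into (opcode, pin): top two bits, low six bits."""
    return (byte >> 6) & 0b11, byte & 0b111111


def run(program):
    """Execute attoinstructions in order until HALT; return 64 pin states."""
    pins = [0] * 64
    for byte in program:
        op, pin = decode(byte)
        if op == HALT:
            break
        if op == SET1:
            pins[pin] = 1
        elif op == SET0:
            pins[pin] = 0
        else:  # INVERT
            pins[pin] ^= 1
    return pins


program = [
    encode(SET1, 5),
    encode(INVERT, 5),   # pin 5 goes back to 0
    encode(INVERT, 63),  # pin 63 goes from 0 to 1
    encode(HALT, 0),
    encode(SET1, 1),     # never reached
]
pins = run(program)
```

This sketch runs straight through a list of bytes; it does not model the buses or the jump mechanism, which the real design layers on top of the pin states.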

These 64 wires are divided up into several buses – eight-bit address and control buses make up the lowest 16 bits, a 32-bit data bus has a function akin to a register, and a 16-bit ‘Quick aJump bus’ provides the program counter and attocode memory. The highest bit on the WPU is a ‘jump bit’, implemented for unconditional jumps in code.

We’re not sure the WPU can even be considered a computer. We realize, though, that’s probably not the point; [Tomáš] simply created the WPU to do something out of the ordinary. It’s not meant to be a real, or even useful, CPU; it’s simply a thought experiment to see what is possible by twiddling bits around.

Tip o’ the hat to [Adam] for sending this one in.

34 thoughts on “Weird Processing Unit Only Has 4 Instructions”

Ren says:

October 16, 2012 at 7:26 am

I still don’t have my head around this concept, but wouldn’t a 4 instruction processor, have an advantage over a 1 instruction processor? Advantage in programming (less jumps? more control) yet, not as, oh how do I say it?, less confusing as CISC?

1. Megol says:
  
  October 16, 2012 at 5:07 pm
  
  Not really as it depends on what the instruction(s) do(es). As one example a 3 address instruction set with 3 instructions can be relatively easy to program compared to this 4 instruction machine.
  
  (operation/memory width is 1 bit)
  NAND dst, src1, src2
  mem[dst]=mem[src1] nand mem[src2]
  
  SKIP0 src1
  if mem[src1]=0 skip next instruction
  
  BRA target
  branch to target address
  
  Even if the system state except the program itself is unknown at the start it is pretty easy to program as a general purpose machine.
  
  Not: NAND dst, src, src
  
  And: NAND tmp, src1, src2
  NAND dst, tmp, tmp
  
  Or: NAND tmp1, src1, src1
  NAND tmp2, src2, src2
  NAND dst, tmp1, tmp2
  
  Xor: NAND tmp1, src1, src2
  NAND tmp2, src1, tmp1
  NAND tmp1, src2, tmp1
  NAND dst, tmp2, tmp1
  
  …
  
  Some PLC systems in the real world have been just a bit more complicated.
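
The NAND sequences in the comment above can be checked mechanically. A minimal Python sketch, with the helper names (`not_`, `and_`, `or_`, `xor_`) chosen here for illustration and the temporaries mirroring the comment's `tmp`, `tmp1`, and `tmp2`:

```python
from itertools import product


def nand(a, b):
    """1-bit NAND, the only logic primitive of the 3-instruction machine."""
    return 1 - (a & b)


def not_(src):
    return nand(src, src)


def and_(src1, src2):
    tmp = nand(src1, src2)
    return nand(tmp, tmp)


def or_(src1, src2):
    tmp1 = nand(src1, src1)
    tmp2 = nand(src2, src2)
    return nand(tmp1, tmp2)


def xor_(src1, src2):
    tmp1 = nand(src1, src2)
    tmp2 = nand(src1, tmp1)
    tmp1 = nand(src2, tmp1)  # tmp1 is reused, as in the comment
    return nand(tmp2, tmp1)


# Compare each construction with Python's built-in bit operators
# over every combination of 1-bit inputs.
all_match = all(
    not_(a) == 1 - a
    and and_(a, b) == (a & b)
    and or_(a, b) == (a | b)
    and xor_(a, b) == (a ^ b)
    for a, b in product((0, 1), repeat=2)
)
```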
  
fartface says:

October 16, 2012 at 7:33 am

Without a jmp instruction it’s pretty useless.

1. IceBrain says:
  
  October 16, 2012 at 8:00 am
  
  Read more carefully. You can jump by setting the ‘execute’ bit.
  
chango says:

October 16, 2012 at 7:51 am

It’s not so much weird as it’s an architecture with the world’s laziest instruction decoder.

Tarface says:

October 16, 2012 at 8:03 am

There is already a single-instruction CPU design out there, so I’m not sure how a four-instruction design is weirder?

http://www.ddj.com/embedded/221800122

1. frooxius says:
  
  October 16, 2012 at 1:01 pm
  
  It’s not that much about the number of instructions, as the architecture in the whole. The description of the architecture is a bit ripped out of the context (doesn’t mention other important parts).
  
eddieallanwood says:

October 16, 2012 at 8:06 am

Sounds pretty RISCy to me!

1. Frank Reich says:
  
  October 16, 2012 at 8:24 am
  
  Make that MISCy.
  
  Nobody has mentioned NISC or ZISC yet… ;)
  
ScottV says:

October 16, 2012 at 8:32 am

I need two of them for my parallel computing processor project.

mstone says:

October 16, 2012 at 8:39 am

If you can code an AND-or-Invert function with it, it’s Turing complete. If you can’t, it isn’t.

A quick scan of the documentation doesn’t show anything that looks like a conditional, though there’s some funkiness in the interaction between the address and control buses that might do the job. If ‘look at the data and decide what to do’ has been abstracted that far away from anything immediately visible, coding an adder or multiplier will be Fun in the Dwarf Fortress sense of the term.

attilasukosd says:

October 16, 2012 at 10:17 am

Is it turing complete? :P

Jack says:

October 16, 2012 at 11:02 am

Back in my academic days, there was a proof that it took 7 instructions to make a minimal Von Neumann machine.

Just wish I could find the paper!

1. mstone says:
  
  October 16, 2012 at 8:06 pm
  
  Sounds vaguely like Marvin Minsky’s cellular automaton Turing machine. IIRC the later version of that had something like 7 states.
  
  You only need about four concepts to get Turing completeness: substitution (if you see pattern A replace it with pattern B), representation (pattern A and pattern B are ‘equal’ in the sense that you can replace either one with the other), scope (rule R only applies within defined limits), and persistence (very loosely, the ability to put information somewhere, forget it, then get it back again later).
  
  If the substitution, representation, and scope rules are stored in memory the machine can write to, you have a Turing machine.
  
Luca says:

October 16, 2012 at 11:05 am

When I started working in industrial automation more than 20 years ago, my company couldn’t use commercial PLCs because they were too slow for their line of business (ceramic tiles), so the machines were “programmed” with 74XX chips.
Then they developed an in-house PLC, which had a single one bit register and 5 instructions (load, store, and, or, jump), the “programmer” was a suitcase with a keypad to key in those instructions, a single line display and a ZIF socket to program the EPROM. Oh, and the program memory was limited to 1024 instructions.
This is just to say that this CPU design isn’t as weird as it seems (even if I wouldn’t wish my worst enemy to have to use it).
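
For a feel of what programming such a one-bit machine might look like, here is a hypothetical Python sketch. Only the five instruction names (load, store, and, or, jump) and the single one-bit register come from the comment above; the exact semantics, the unconditional jump, the named memory bits, and the `run_plc` helper are all assumptions for illustration.

```python
def run_plc(program, memory, max_steps=100):
    """Run (op, arg) pairs against a dict of named 1-bit memory cells.

    Assumed semantics: one accumulator bit, and 'jump' is unconditional.
    max_steps guards against endless scan loops in this sketch.
    """
    acc = 0
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, arg = program[pc]
        pc += 1
        steps += 1
        if op == "load":
            acc = memory[arg]
        elif op == "store":
            memory[arg] = acc
        elif op == "and":
            acc &= memory[arg]
        elif op == "or":
            acc |= memory[arg]
        elif op == "jump":
            pc = arg
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


# Classic latch: motor = (start OR motor) AND stop_n,
# where stop_n is an active-low stop input (1 means "not pressed").
latch = [
    ("load", "start"),
    ("or", "motor"),
    ("and", "stop_n"),
    ("store", "motor"),
]

mem = {"start": 1, "motor": 0, "stop_n": 1}
run_plc(latch, mem)            # start pressed: motor turns on
after_start = mem["motor"]

mem["start"] = 0
run_plc(latch, mem)            # start released: motor stays latched
after_release = mem["motor"]

mem["stop_n"] = 0
run_plc(latch, mem)            # stop pressed: motor turns off
after_stop = mem["motor"]
```

Four instructions are enough for a self-holding motor latch, which suggests how far a 1024-instruction program memory could go for simple sequencing logic.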

Willrandship says:

October 16, 2012 at 12:02 pm

While it’s a cool project, I don’t like the implication on his site that he made the first strange computer architecture ever.

Here’s a challenge: make a CPU that has as small an instruction set as possible, while still being easy to program.

Galane says:

October 16, 2012 at 12:26 pm

There are probably uses for a 2 bit computer.

Most CD players use single 1 bit DACs clocked really fast with buffer RAM to synch the channels instead of dual 16 bit DACs running at the exact speed required to decode the audio in real time.

1. chango says:
  
  October 16, 2012 at 12:59 pm
  
  It’s still 16 bits per channel all the way down to the decimator.
  
2. cutandpaste says:
  
  October 18, 2012 at 5:34 pm
  
  And most other CD players don’t operate at exactly 44.1 kHz, either: oversampling is more the rule than the exception (4x or 8x being common multipliers). Doing so keeps the antialiasing filter simple, cheap, and inaudible.
  
Davidsdvbuifh says:

October 16, 2012 at 2:08 pm

But does it have a Brainf*ck compiler?

1. Kris Lee says:
  
  October 16, 2012 at 6:06 pm
  
  +1
  
agtrier says:

October 16, 2012 at 2:25 pm

It’s simple enough to run at, say, 10 THz, beating the i7 by lengths.

1. Megol says:
  
  October 16, 2012 at 4:11 pm
  
  it isn’t and it doesn’t.
  
2. mstone says:
  
  October 16, 2012 at 9:32 pm
  
  Lessee.. 10THz is 10^13 cycles per second, and the speed of light is 3*10^8 meters per second. So at the speed of light, the physical distance between signal wavefronts will be about 30 microns. Actual signal propagation rates are about half that, so let’s call it 15 microns.
  
  15 microns is 15,000 nanometers, and we have 32 nanometer processes these days. That means you could fit about 450 minimum-aspect features between wavefronts. A minimum-aspect transistor is about 9×16 minimum feature squares, so let’s be generous and say you could fit a 30×40 array of transistors into the space between wavefronts.
  
  Of course, that assumes that the transistor at the left side will be a full clock cycle ahead of the one at the right side (or vice versa). They’ll be 180 degrees out of phase within 15-20, and 90 degrees out of phase within 8-10.
  
  Conclusion #1: clock synchronization is gonna be a bitch.
  
  Each transistor has the usual junction, gate, and substrate parasitic capacitances, which are generally measured in the femtofarad/picofarad range. Assuming you have to shove 100 femtofarads of capacitance around to turn the transistor on or off, then multiplying by frequency, you’ll need an amp of current per transistor to make the thing go. Let’s be generous and assume that the logic levels are 100mV apart, so we only get 100mW per transistor. That translates to something like 12W of power consumption out of our 30×40 array of impossible-to-sync transistors.
  
  Conclusion #2: the energy density of the chip will be well beyond that of the average nuclear reactor core.
  
  Copper interconnects suffer 99% electromigration failure within an hour at current densities of 15 million amps per square centimeter of cross section. There are 10^14 square nanometers per square centimeter, but a feature is 32 nanometers wide and the average interconnect is about a micron thick.. 32,000 square nanometers.
  
  Plugging and chugging, that gives us a MTBF well under an hour for a minimum-feature interconnect carrying more than about 500 microamps.
  
  We need an amp to drive the transistor, and let’s be generous and assume the MTBF scales linearly with current. That means the average trace has an MTBF of about 500 microseconds.
  
  Conclusion #3: I don’t know whether to nominate this for a Magic Smoke award or a Darwin award.
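
The back-of-the-envelope figures in this comment can be reproduced with a short Python sketch. All inputs are the comment's own round numbers; the formulas used here (I = C·V·f for charging current, P = C·V²·f for switching power, I = J·A for the electromigration limit) are the standard ones and are an assumption about what the commenter intended.

```python
C_LIGHT = 3e8        # speed of light, m/s
FREQ = 10e12         # 10 THz clock, Hz
FEATURE = 32e-9      # process feature size, m

wavelength = C_LIGHT / FREQ          # wavefront spacing at light speed, m
on_chip = wavelength / 2             # the comment's "about half" signal speed
features_across = on_chip / FEATURE  # minimum features between wavefronts

CAP = 100e-15        # switched capacitance per transistor, F
SWING = 0.1          # logic swing, V
TRANSISTORS = 30 * 40

amps_per_volt = CAP * FREQ              # I = C * V * f, per volt of swing
power_each = CAP * SWING ** 2 * FREQ    # P = C * V^2 * f, W per transistor
power_total = power_each * TRANSISTORS  # W for the whole 30x40 array

# Electromigration limit for a 32 nm x 1 micron copper trace.
J_LIMIT = 15e6                      # A per square centimeter
area_cm2 = (32 * 1000) / 1e14       # 32,000 square nm expressed in cm^2
current_limit = J_LIMIT * area_cm2  # A

print(f"wavefront spacing: {wavelength * 1e6:.0f} um, on chip {on_chip * 1e6:.0f} um")
print(f"features across:   {features_across:.0f}")
print(f"power:             {power_each * 1e3:.0f} mW each, {power_total:.0f} W total")
print(f"current limit:     {current_limit * 1e3:.1f} mA")
```

With these inputs the 30 micron spacing, the roughly 450 features, and the 12 W total all come out as stated. Two figures differ: C·V²·f gives 10 mW per transistor (1 A at 100 mV would be 100 mW each, or about 120 W for the array), and the electromigration limit lands nearer 4.8 mA than 500 microamps. Neither changes the conclusions, since an amp per transistor is far beyond either limit.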
  
  1. SonicBroom says:
    
    October 17, 2012 at 2:34 am
    
    Mmhm. Yeah. Mhmm. Oh yeah yeah yeah I know some of these words.
    
  2. Mj says:
    
    October 17, 2012 at 11:53 am
    
    This has to be one of the funniest, nerdiest and most insightful (even educational) “trashing” replies ever. Well done!
    
  3. mwil7034 says:
    
    October 17, 2012 at 4:25 pm
    
    Or possibly a Chewbacca-type defence… ;)
    
  4. agtrier says:
    
    October 19, 2012 at 1:06 pm
    
    Uh, yeah, I guess you have a point here (or something that looks similar to a point ;-)
    
3. mstone says:
  
  October 17, 2012 at 10:12 am
  
  @ SonicBroom:
  
  *laugh* Bravo sir.. bravo!
  
tedmeyers says:

October 16, 2012 at 8:15 pm

It doesn’t sound like it has a way to read a pin???

tedmeyers says:

October 18, 2012 at 7:35 am

Great, hackaday does a post on this guy and he promptly disables his website.

András says:

October 21, 2012 at 12:32 am

I do not think this can be considered a CPU; the whole thing seems to be just a control unit, part of a CPU. As someone has pointed out, it is a kind of lazy instruction decoder.

1. frooxius says:
  
  October 26, 2012 at 11:16 am
  
  That’s not really true, I don’t know why the article only talks about the instruction decoder, because that’s just one (and relatively small) part of this architecture.
  
  It’s a fully featured and highly customizable/modular architecture, I even implemented a playable Pong clone on this architecture as well as many other examples, but the article doesn’t mention any of this for some reason.
  
  1. András says:
    
    November 1, 2012 at 2:36 am
    
    Yes, I agree, but what the article is referring to can be considered solely a control unit.
    
    I have briefly read the documentation and agree that this is a modular system with up to 256 units. But how the units are actually implemented is not discussed; they are assumed to be “somehow” available. Only addressing and controlling those units is described, as far as I can see.
    So in my view the whole innovation is in how these units can be connected and controlled by the control signals generated by what is referred to here as the “WPU”.
    