This CPU Has Only One Instruction

Most of us will be familiar at some level with the operation of a basic CPU, usually through exposure to microprocessors of the type that find their way into our projects. We can look at its internal block diagram and get how it works, see the registers and ALU, follow the principles of a von Neumann architecture, and understand that it has an instruction set with different instructions for each of its functions. We all know that this describes only one type of CPU, though, so it’s always interesting to see alternatives. [Ike Jr] has a project that should provide a lot of interest: a CPU that has only a single instruction. It can only move data from one place to another, which seems to preclude any possibility of computation. How on earth can it work?

The machine has a set of registers as well as memory, and it achieves computation by having specific registers where we might expect to see instructions. For example, the AND register is a two-position stack that, when full, computes the AND of its two pieces of data and places the result in a general-purpose register. The write-up goes into significant detail on the CPU’s operation, and while it’s unlikely the world will move en masse to this architecture, it’s still a very interesting read. For now this machine exists as a software simulation rather than in silicon, and he’s working on releasing it so enthusiasts for unusual CPUs can have a go.
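
To make the idea concrete, here is a minimal Python sketch of a machine whose only operation is a move. Its AND unit is modelled on the description above: a two-entry stack that, once full, writes the AND of its contents to a general-purpose register. The register names (R0 to R2, AND) and the choice of R2 as the result register are assumptions for illustration. They are not details taken from [Ike Jr]’s design.

```python
class MoveMachine:
    """A toy machine whose only instruction is move(src, dst)."""

    def __init__(self):
        self.regs = {"R0": 0, "R1": 0, "R2": 0}  # hypothetical general-purpose registers
        self.and_stack = []                       # the AND unit's two-position stack
        self.and_result = "R2"                    # assumed destination for the AND result

    def read(self, src):
        return self.regs[src]

    def write(self, dst, value):
        if dst == "AND":
            self.and_stack.append(value)
            if len(self.and_stack) == 2:          # stack full: compute, then empty it
                a, b = self.and_stack
                self.and_stack.clear()
                self.regs[self.and_result] = a & b
        else:
            self.regs[dst] = value

    def move(self, src, dst):
        """The machine's single instruction: copy a value from src to dst."""
        self.write(dst, self.read(src))


m = MoveMachine()
m.regs["R0"] = 0b1100  # operands preloaded here for brevity
m.regs["R1"] = 0b1010
m.move("R0", "AND")
m.move("R1", "AND")
print(bin(m.regs["R2"]))  # 0b1000
```

The computation happens as a side effect of the second move into the AND register. No separate “AND” opcode is needed.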

The idea of having registers that compute reminds us of a transport triggered architecture machine, which is not the same thing as a one-instruction CPU built around a more conventional computing instruction.
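
For contrast, here is a sketch of that second kind: a one-instruction CPU whose single instruction does the computing. It assumes the subtract-and-compare semantics one commenter describes further down. The instruction subtracts one memory cell from another and stores the difference in the first. It then runs the next instruction if the result is zero and skips it otherwise. The (a, b) operand layout and where the result is stored are assumptions. This toy version also has no way to jump backwards, so it only runs straight-line programs.

```python
def run_subtract_skip(program, mem):
    """Run (a, b) instructions: mem[a] -= mem[b]; execute the next instruction
    if the result is zero, otherwise skip it. Stops when the PC leaves the program."""
    pc = 0
    while 0 <= pc < len(program):
        a, b = program[pc]
        mem[a] -= mem[b]
        pc += 1 if mem[a] == 0 else 2
    return mem


# Hypothetical memory layout: x, y, a flag, and the constant -1.
X, Y, FLAG, MINUS_ONE = 0, 1, 2, 3
# Sets FLAG to 1 when x == y (x is overwritten with x - y).
equal_test = [(X, Y), (FLAG, MINUS_ONE)]

print(run_subtract_skip(equal_test, [5, 5, 0, -1])[FLAG])  # 1
print(run_subtract_skip(equal_test, [5, 3, 0, -1])[FLAG])  # 0
```

The flag is set by subtracting -1 from zero. Because that result is non-zero, whatever instruction follows would be skipped, which a real program has to plan around.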

Abstract PCB header image: Harland Quarrington/MOD [OGL v1.0].

37 thoughts on “This CPU Has Only One Instruction”

  1. The ability to just move data between memory and registers might, in and of itself, seem like it only has one instruction.

    But it isn’t unreasonable to argue that the registers themselves act as instructions, since a given register is an input to a specific execution unit.

    Still an interesting approach nonetheless.

    1. It can be an allegory of how the CPU is implemented in silicon. If you look at how the 6502 is implemented, you can see how the instructions are decoded in a matrix. If I understand this CPU, the registers take care of the “instructions”, and therefore there is technically only one instruction.

      The most important part of this is to explore how you can make a cpu. It got my gears spinning.

      1. The data routing matrix serves the same purpose as the instruction decoder. Since the addresses are hard-coded, they take the place of instructions.

        The question is just about granularity. With a register based machine, each register has a definite function and has to implement that function separately. With an instruction/microcode based machine, a single ALU can typically perform most of the functions, so the circuitry is not duplicated unless a second ALU is added to speed things up.

  2. There are several examples of single instruction computer architectures. Years ago I built one that used a Subtract and Compare Instruction… If the result was zero, it executed the next instruction… If not, it skipped it. It was Turing complete albeit not very efficient.

    1. None of the “move” architectures really went anywhere. Maybe the best known were the GRI minicomputers of the very early seventies (and who remembers them?). I think Burroughs used some of the concepts in the microcode engines in some of their smaller mainframes of the 1970s.

    1. “The Ultimate RISC” was the article that got me started on this path back in 1988 when it was published. I made a lot of changes to the original design.

      The PC is address 0 and may be read or written. Address 1 when read returns (PC++). When written, register 1 causes the PC to be written to the stack and the write value is loaded into register 0 (PC) making it a Call instruction.

      A preliminary specification is at https://github.com/BillBohan/NISC

      I have received very few comments on it and plan to update the spec in 2020.

  3. We did this in the early nineties in my computer-architecture group at university (TUD). We called it “move architecture”.

    Some 15 years ago, a guy who was working at a novel computer-chip manufacturer at the time posted publicly that it is very important to “compress” the instructions. To get good performance out of a computer you need to reduce the number of bits in your instruction stream, because the instruction stream is a significant bottleneck for the whole system. That guy’s name is “Linus Torvalds”.

    So in this case, often using only 12 out of 40 instruction bits is quite inefficient. A better architecture might use 4 bits to select a “condition”, and then 6+6 for the register numbers. To get literals, a few special registers can take the 6-bit source register number as a constant and move it into a particular 6-bit field of a constant register. Or a special register might trigger an automatic “skip next instruction” but load the data into either the top half or the bottom half of a register.

    Anyway, there are lots of options that would in practice work a bit better than this “let’s add 26 extra bits to every instruction that are rarely used” approach.

      1. You beat me to it… Plus, it turns out that Linus is much better at software than at hardware, as instruction bandwidth is not the limiting factor once hardware becomes cheaper.

  4. Interesting at the software level, but for those interested in a hardware implementation: following the links, I understand that such a processor has physically existed since 2004:

    MAXQ[5][6] from Dallas Semiconductor, the only commercially available microcontroller built upon transport triggered architecture, is an OISC or “one instruction set computer”. It offers a single though flexible MOVE instruction, which can then function as various virtual instructions by moving values directly to the program counter.

    https://www.maximintegrated.com/en/design/technical-documents/app-notes/3/3222.html
    and development boards: https://www.maximintegrated.com/en/products/microcontrollers/low-power-microcontrollers.html

    NOTE: I didn’t think deeply about this, but I believe what’s missing in Periwinkle is interrupt capability…

  5. This is technically a two instruction computer. It either moves a literal or it moves the content of a register. But it could be converted to a one instruction computer if you store the literals in ROM.

    1. That is honestly up for debate.

      Do we count each instruction call as its own instruction?
      Or should we rather look at the function of the instruction call and simply consider its variants as the same?

      For example, x86 has 96 variants for its conditional jump instruction, depending on what condition one jumps based on. So is that 1 instruction, or 96 instructions?

      One can argue that it occupies 96 instruction calls and therefore is 96 instructions.
      One can also argue that it uses 1 set of hardware and simply takes inputs from a multitude of locations.

      I am at least in the camp where one counts a hammer as one object, even if one hammers in a multitude of different types of nails with it. I.e., x86 has 1 conditional jump instruction, with 96 instruction calls.

      1. Not much room for debate: the things you describe have different names because they are different.
        An instruction is an op-code, represented as a word in memory, sans arguments.
        A micro-instruction manipulates the transistor switching. Each instruction is a list of one or more micro-instructions.

        96 machine language op-codes are 96 instructions, even if they share one function/“name” in assembly.
        Each is a different list of micro-instructions. The functions are up to a human to describe.

        Conditional vs direct jump is an obvious difference, but even jump to literal vs relative is a vastly different list of micro-instructions, and jump to literal vs jump to register differs even more.
        Specifying a literal address to jump to only manipulates the program counter register, whereas a register jump must also fetch data, and a relative jump does both of those plus invokes the ALU. The list of micro-instructions grows longer with each.

        Just because there is one assembly “JMP” command (one hammer in your comparison) doesn’t mean it doesn’t matter which op-code you use (a hammer for human hands, a 1/32nd scale model mini, a sledge, metal, plastic, drawn on paper, hydraulic multi-ton… those differences matter), even if they appear to have the same function (a screwdriver can hammer nails too, if you want it badly enough).

    2. Hi, I would say that in _theory_ it is even more than a two-instruction computer: each time you choose a different register you choose a different meaning, and so a different “instruction”.

      Now, _technically_ the instruction is unique at processor level.

  6. Julian Ilett has been (slowly) building such a computer on his YouTube channel. It uses 8-bit words, and you build literals by way of a register that shifts by 4 bits and then copies over the source bits from the current instruction word. Copying that register elsewhere works as normal. I believe he also had an adder implemented at this point.

  7. Caveat: I’m an emulation author and I spent way too much time studying the 6502 and its variants.

    If you strip off the opcode byte, then you’re left with 5-byte RISC-like instructions that resemble a traditional 1-byte opcode plus 4-byte operand format; and arguably you’re optionally repurposing some of the opcode bits for operand bits — therefore you’re not really making a one-opcode CPU. You’re making a CPU that has an extra appendage to every real opcode that is itself not really an opcode at all. But I digress.

    Other thoughts: I don’t think it’s a nonsensical idea to store values in the status register; how else would you clear the carry and overflow bits?

    Also, truncating the target address seems like it would undermine the overall usefulness of that operation. That reminds me of scenarios that require bank switching to resolve, which can get ugly really fast.

  8. A move-based architecture would be a good candidate for a partially mechanical computer for use in a science museum. Use a robot to literally move data between operational units.

    For example bits could be represented by disks that have a zero on one side and a one on the other. Flipping a bit becomes a literal operation. The robot could push a set of bits into a register for a write.

  9. Sorry, but spreading the logic units of a computer across a bus does not make it a single-instruction CPU. It makes it a DPU – Distributed Processing Unit.

    By the way: the idea of a co-processor, in which the operands are written to two addresses and the result read from a third, has been around since 1944, in the IBM Automatic Sequence Controlled Calculator, also known as the Harvard Mark I. Its multiplication unit was connected this way.

  10. Get rid of the objections to special hardware like AND registers and stacks. Just use look up tables with the data as the address. For the AND example, move data to location X and X+1. Read the result at Y+3,4,5 as needed by the operation results. ROM is mapped to RAM with ROM address lines as the inputs and data lines are a shared output by all operations in that ROM. You can do everything. Some quite sophisticated, and all “instant”. Or, for less hardware, just a BIG EPROM full of all the tables. Would the LUTs be considered instructions?

    Does this architecture need literals stored somewhere? Or can you always get the values you need using an increment or decrement register? I assume there is a MOV Immediate with a literal in the instruction?

  11. As it only has one instruction, specifying it is redundant, and the real instruction is the destination register, of which it has many. So does it really have one instruction or several?

  12. Many have already noticed the similarity of this to previous attempts, dataflow as well as stack machines. From the write-up it seems that the register address space is replacing opcodes, but with significantly more difficulty for the compiler in both timing and execution-unit scheduling. It’s always good to see new ideas, even if they were approached before. After all, out-of-order execution is common today, but failed miserably when first attempted. It took the Pentium Pro design to make it work.

    1. I don’t remember the name of the project or of the inventor, but in the one 1-instruction computer I read about, the first thing he did once the hardware reached a working state was to write a macro assembler for it, and a set of macros to make reasonable macro-instructions that looked like the instructions of a more conventional machine. So it’s really only one (admittedly large) step from there to having a full compiler or interpreter to run your favorite language. That’s the beauty of Turing machines: any such machine with sufficient resources (memory, mainly) can be made to do what any other can do.
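
The address-mapped control scheme in the comment about “The Ultimate RISC” can be sketched as a small interpreter. In that scheme, the PC lives at address 0, reading address 1 yields an inline literal, and writing address 1 performs a call. We read “(PC++)” as: return the word at the PC, then advance the PC. Several details are hypothetical additions so the example can run, stop and return, and are not taken from the linked NISC specification. These are the two-word (source, destination) instruction format, halting by writing to address 2, and returning by reading address 3 to pop the stack.

```python
PC_ADDR = 0    # read: current PC; write: jump
LIT_ADDR = 1   # read: word at PC, then PC += 1; write: call
HALT_ADDR = 2  # hypothetical: any write stops the machine
RET_ADDR = 3   # hypothetical: a read pops the return stack


def run_move_machine(mem, start):
    """Each instruction is two words, (source, destination): copy one to the other."""
    pc = start
    stack = []
    halted = False

    def read(addr):
        nonlocal pc
        if addr == PC_ADDR:
            return pc
        if addr == LIT_ADDR:
            value = mem[pc]  # inline literal stored after the instruction
            pc += 1
            return value
        if addr == RET_ADDR:
            return stack.pop()
        return mem[addr]

    def write(addr, value):
        nonlocal pc, halted
        if addr == PC_ADDR:
            pc = value        # jump
        elif addr == LIT_ADDR:
            stack.append(pc)  # call: save the return address, then jump
            pc = value
        elif addr == HALT_ADDR:
            halted = True
        else:
            mem[addr] = value

    while not halted:
        src, dst = mem[pc], mem[pc + 1]
        pc += 2
        write(dst, read(src))
    return mem, stack


mem = [0] * 64
mem[16:19] = [LIT_ADDR, 40, 7]         # mem[40] = 7
mem[19:22] = [LIT_ADDR, LIT_ADDR, 32]  # call the subroutine at 32
mem[22:25] = [LIT_ADDR, HALT_ADDR, 0]  # halt
mem[32:35] = [LIT_ADDR, 41, 9]         # subroutine: mem[41] = 9
mem[35:37] = [RET_ADDR, PC_ADDR]       # return: pop into the PC
mem, stack = run_move_machine(mem, 16)
print(mem[40], mem[41], stack)  # 7 9 []
```

The call leaves the return address (22) on the stack until the subroutine’s last move pops it back into the PC. Jumps, calls and literals are all just moves to or from special addresses.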
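
The 16-bit format suggested in one of the comments above (4 condition bits plus two 6-bit register numbers) is easy to express with shifts and masks. This is only a sketch. The field order (condition in the top bits, then source, then destination) is an assumption, and the meaning of the condition codes is left open.

```python
COND_BITS = 4
REG_BITS = 6
REG_MASK = (1 << REG_BITS) - 1


def encode(cond, src, dst):
    """Pack a condition and two register numbers into one 16-bit word."""
    if not 0 <= cond < (1 << COND_BITS):
        raise ValueError("condition out of range")
    if not (0 <= src <= REG_MASK and 0 <= dst <= REG_MASK):
        raise ValueError("register number out of range")
    return (cond << (2 * REG_BITS)) | (src << REG_BITS) | dst


def decode(word):
    """Split a 16-bit word back into (cond, src, dst)."""
    return word >> (2 * REG_BITS), (word >> REG_BITS) & REG_MASK, word & REG_MASK


word = encode(0b0001, 5, 63)
print(hex(word))     # 0x117f
print(decode(word))  # (1, 5, 63)
```

At 16 bits per move this is 40% of the 40-bit instruction mentioned in that comment. The trade-off is a limit of 64 addressable registers per field.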
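
The literal-building register in Julian Ilett’s machine, as described in the comment above, can be modelled as a shift-by-four followed by an OR. This sketch assumes two things: the new bits are the low 4 bits of the source field, and anything shifted past bit 7 is discarded. The `build_literal` helper is hypothetical. The actual hardware may differ.

```python
WORD_MASK = 0xFF  # 8-bit words


class LiteralRegister:
    """Shift left 4 bits, then OR in a 4-bit field taken from the instruction word."""

    def __init__(self):
        self.value = 0

    def load_nibble(self, nibble):
        self.value = ((self.value << 4) | (nibble & 0xF)) & WORD_MASK


def build_literal(value):
    """Hypothetical helper: the two nibbles (high first) needed for an 8-bit literal."""
    return [(value >> 4) & 0xF, value & 0xF]


lit = LiteralRegister()
for nibble in build_literal(0xA5):
    lit.load_nibble(nibble)
print(hex(lit.value))  # 0xa5
```

Any 8-bit constant therefore costs two moves into the literal register, followed by one move to copy it wherever it is needed.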
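
The look-up-table idea from the comments can be sketched in software. The two operands written to the input locations together form the table address, and each result is just a different output of the same table. Three choices here are assumptions made to keep the table small (256 entries): the 4-bit operand width, the choice of AND/OR/XOR, and the result indices.

```python
BITS = 4
OPERAND_MASK = (1 << BITS) - 1
AND, OR, XOR = 0, 1, 2  # hypothetical result offsets within each table entry

# The "ROM": address (a << BITS) | b holds every result for that operand pair.
ROM = [(a & b, a | b, a ^ b) for a in range(1 << BITS) for b in range(1 << BITS)]


class LutUnit:
    """Two input latches (X and X+1); results come from a table lookup."""

    def __init__(self):
        self.inputs = [0, 0]

    def write(self, offset, value):
        self.inputs[offset] = value & OPERAND_MASK

    def read(self, result):
        a, b = self.inputs
        return ROM[(a << BITS) | b][result]


unit = LutUnit()
unit.write(0, 0b1100)
unit.write(1, 0b1010)
print(bin(unit.read(AND)), bin(unit.read(OR)), bin(unit.read(XOR)))
# 0b1000 0b1110 0b110
```

Doubling the operand width squares the table size. A practical version would therefore probably split wider operands into slices.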
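
The macro-assembler approach from the last comment could look something like the sketch below, which expands conventional-looking mnemonics into lists of moves. Every mnemonic and register name here is hypothetical. This includes “ACC” as the general-purpose register that the AND unit writes its result to, and “#label” as the notation for a literal. A jump is just a move into the PC, as in the MAXQ description above.

```python
# Hypothetical mnemonics -> lists of (source, destination) moves.
MACROS = {
    "MOV": lambda dst, src: [(src, dst)],
    "JMP": lambda target: [("#" + target, "PC")],  # a jump is a move into the PC
    "AND": lambda dst, a, b: [(a, "AND"), (b, "AND"), ("ACC", dst)],
}


def assemble(source):
    """Expand each line 'OP arg, arg, ...' into the machine's only instruction."""
    moves = []
    for line in source.strip().splitlines():
        op, *args = line.replace(",", " ").split()
        moves.extend(MACROS[op.upper()](*args))
    return moves


program = assemble("""
MOV R1, R0
AND R3, R1, R2
JMP loop
""")
for src, dst in program:
    print(f"MOVE {src} -> {dst}")
```

Adding a mnemonic is just adding a dictionary entry, which is roughly why a macro layer makes a one-instruction machine feel conventional to program.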
