Register Renaming

[Shreeyash] asks an interesting question: how many registers does your CPU have? The answer is probably more than you think. The reason? Modern CPUs, at least many of them, execute instructions out of order so they can complete multiple instructions per clock cycle. To do this, they may need to execute instructions that change registers that other instructions are still reading. In addition, you might be writing a result speculatively: a branch might go the other way, in which case your result should never wind up in the target register. The answer to both of these problems is register renaming.

The ARM CPU he looks at has many physical registers you can’t see. These get mapped to the registers you use on the fly. So when you read a register in software, you are really getting an underlying physical register. Which one? Depends on when you read it.

The RAT, or Register Alias Table, keeps track of the mapping between physical registers and the register names you use. Not only does this allow the CPU to run operations out of order, but it also lets results sit in unnamed physical registers until the time is right for one of them to become the real register. As a byproduct, moving one register to another becomes fast since you can just point the destination’s logical name at the physical register the source already uses.
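To make the idea concrete, here is a minimal Python sketch of a RAT. It is a toy model, not how any particular CPU implements it: the class name `RegisterAliasTable`, its methods, and the register counts are all made up for illustration, and it never returns physical registers to the free list, which real hardware has to do when instructions retire.

```python
class RegisterAliasTable:
    """Toy RAT: maps logical register names to physical register indices.

    Hypothetical model for illustration only. It never frees physical
    registers, so it runs out after enough writes.
    """

    def __init__(self, logical_names, num_physical):
        if num_physical < len(logical_names):
            raise ValueError("need at least one physical register per logical name")
        # Initially, logical register i lives in physical register i.
        self.alias = {name: i for i, name in enumerate(logical_names)}
        # The remaining physical registers are free to hold in-flight results.
        self.free = list(range(len(logical_names), num_physical))

    def rename(self, dest, sources):
        """Rename one instruction; return (physical dest, physical sources)."""
        # Look up sources first so they see the mapping from before this write.
        phys_sources = [self.alias[s] for s in sources]
        # Every write gets a fresh physical register.
        phys_dest = self.free.pop(0)
        self.alias[dest] = phys_dest
        return phys_dest, phys_sources

    def move(self, dest, source):
        """Register move without copying data: share the physical register."""
        self.alias[dest] = self.alias[source]
        return self.alias[dest]


rat = RegisterAliasTable(["R0", "R1", "R2", "R3"], num_physical=8)
first = rat.rename("R2", ["R0", "R1"])   # R2 = R0 op R1
reader = rat.rename("R3", ["R2"])        # R3 = f(R2), reads the first R2
second = rat.rename("R2", ["R0"])        # a later write to R2 lands elsewhere
rat.move("R1", "R2")                     # R1 now aliases the newest R2
```

The two writes to R2 land in different physical registers, so the instruction that reads the first R2 is unaffected no matter when the second write actually executes.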

Not clear? Try reading the post. There are other ways to get the same result (e.g., reservation stations), but the technique goes way back to mainframe computers. While it didn’t appear right away in microprocessors, modern ones often execute out of order and have to have some scheme to address this problem.

If you build your own CPUs with FPGAs, it is possible to do the same trick. There are also RISC-V variants that can do it.

13 thoughts on “Register Renaming”

  1. The NEC V20/V30 did use register renaming in 8080 emulation mode, I vaguely remember.
    That’s why i8080 software could run, but not Z80 software.
    The i8086 ISA was a superset of i8080, basically.

    1. Z80 was also a superset of i8080, just a different set than i8086. Z80 repurposed unused or redundant opcodes from the 8080 (such as NOP variants and alternative jump/call codes) and introduced opcode prefixes, plus a duplicate register set that can be quickly swapped using the EX AF, AF’ and EXX instructions. That isn’t register renaming in the modern out-of-order execution sense, but rather a register bank switch used for fast context switching, especially useful in interrupt handling. I bring it up as an example of the alternative that was used before register renaming.

      CP/M itself had system calls which emulated the Z80 instructions (mainly LDIR), which could be “accelerated” if the processor was a real Z80, but of course programs occasionally ignored this and just used the Z80 instructions directly, which messed things up for running on CP/M-86 (which, a bit confusingly, was built for the IBM 8088 PC).

    2. The V20 in 8080 emulation mode mapped the 8080 registers to its own internal (8086 style) registers, but I assumed that for most or all of them this was probably a static 1:1 remapping. The 8086 registers were in many ways a more flexible 16-bit superset of those used in the 8080, after all.

      If you’re remembering that it did some kind of dynamic remapping to improve efficiency, that’d be a neat trick! It’s not impossible – it was a very clever CPU for the time, after all.
      (Though I don’t think that would be considered “register renaming” in the modern sense, since it was still very much executing one instruction at a time, not doing any kind of out-of-order execution, branch prediction, etc.)

      1. “but I assumed that for most or all of them this was probably a static 1:1 remapping.”

        Yeah, the datasheet explicitly says that it’s a 1:1 remapping and tells you what it is. It’s just implemented via a microcode switch – and the V20 microcode saga is a whole Big Thing in microprocessor intellectual property history involving drama and hilariousness including the judge having to recuse himself near the end because he held some trivial amount of Intel stock indirectly through an investment group.

        1. You’re not wrong, but the Intel F00F bug shows how messy CPU design gets. That obsolete legacy is one reason modern CPUs treat microcode updates and features (and things like modern standby behavior) with far more centralized, opaque firmware control today.

          1. Sure, but that was a lot more in the future than the V20. That’s one of the reasons designers speak fondly of the older designs since they’re far easier to understand.

            The V20 legal saga actually is what established microcode is copyrightable.

    3. Yeah, uh, you’re not remembering correctly, or you’re misunderstanding what register renaming is. The V20 mapped the 8 general purpose registers (well, the 8-bit versions of the 16-bit registers) to 8080 registers directly. That’s not renaming, that’s… naming.

  2. I’ll have to do a deeper dive to find an answer, but I’m wondering how this is related to shadow registers, such as the Motorola 68000’s A7 register, which was either the user-stack-pointer or the supervisor-stack-pointer depending on what mode your code was running in at the time. My gut says there is at least some historical connection.

    1. This is different: register renaming is for architectures that do out-of-order execution, to solve data dependency issues. TL;DR: the CPU can do whatever it wants as long as the result is exactly the same as if the instructions were executed one by one in the order they were written.

      For example let’s have this program:

      I1 R1, R2
      I2 R2, R3
      I3 R4, R2
      (Ix being some instructions and Rx being some registers)

      and your CPU decides that it is a good idea to execute I1, then I3, and then I2 (out-of-order execution). Now I3 would overwrite register R2 before I2 executes, and the “in order” expectation is broken. So the CPU remaps R2, either for I1 and I2 or for I3 and whatever comes next (architecture and ordering dependent).

  3. When looking at tiny CPUs, you cannot get by with fewer than 4 registers: Accumulator, Operand, Instruction, Pointer. That is assuming you have a CPU that has no flags (Carry is a 1-bit register too!). Even the MC14500 (Usagi’s vacuum tube computer) uses 4 latches (IEN, OEN, RR, IR).
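The I1/I2/I3 example from the thread above can be run as a small Python experiment. This is only a sketch under assumptions the comment does not state: operands are read as "source, destination", every instruction computes destination = source + 1, and the initial register values are arbitrary. The function names are hypothetical.

```python
# Each instruction is (name, source, destination). The operand order and the
# operation (destination = source + 1) are assumptions made for illustration.
PROGRAM = [("I1", "R1", "R2"), ("I2", "R2", "R3"), ("I3", "R4", "R2")]
INITIAL = {"R1": 10, "R2": 0, "R3": 0, "R4": 40}


def run_in_order(program, initial):
    """Reference result: execute the instructions one by one as written."""
    regs = dict(initial)
    for _, src, dst in program:
        regs[dst] = regs[src] + 1
    return regs


def run_reordered_without_renaming(program, initial, order):
    """Execute in a different order directly on the named registers."""
    regs = dict(initial)
    by_name = {name: (src, dst) for name, src, dst in program}
    for name in order:
        src, dst = by_name[name]
        regs[dst] = regs[src] + 1
    return regs


def run_reordered_with_renaming(program, initial, order):
    """Rename in program order, then execute in the given order."""
    names = sorted(initial)
    alias = {name: i for i, name in enumerate(names)}  # logical -> physical
    phys = [initial[name] for name in names]           # physical register file
    renamed = {}
    for name, src, dst in program:                     # rename stage, in order
        phys_src = alias[src]                          # read the mapping before the write
        phys.append(0)                                 # allocate a fresh physical register
        alias[dst] = len(phys) - 1
        renamed[name] = (phys_src, alias[dst])
    for name in order:                                 # execute stage, reordered
        phys_src, phys_dst = renamed[name]
        phys[phys_dst] = phys[phys_src] + 1
    return {name: phys[alias[name]] for name in names}


ORDER = ["I1", "I3", "I2"]
expected = run_in_order(PROGRAM, INITIAL)
broken = run_reordered_without_renaming(PROGRAM, INITIAL, ORDER)
fixed = run_reordered_with_renaming(PROGRAM, INITIAL, ORDER)
print(expected["R3"], broken["R3"], fixed["R3"])
```

Without renaming, I2 picks up the R2 value written by I3, so R3 ends up wrong. With renaming, I3 writes to a different physical register and the final state matches in-order execution.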
