We’ve often heard that modern x86 CPUs don’t really execute x86 instructions. Instead, they decode them into RISC instructions that are easier to schedule, pipeline, and execute. But we never really looked into that statement to see if it is true. [Fanael] did, though, and the results are very interesting.
The post starts with a very simple loop containing four instructions. In a typical RISC CPU — RISC-V — the same loop requires six instructions. However, a modern CPU is likely to do much more than just blindly convert one instruction set to another.
The reason is that CPUs aim to increase the average number of operations performed per clock cycle. There are many ways to maximize instructions per clock. One is pipelining, where instructions execute in multiple phases. For example, the CPU can fetch one instruction while decoding a second and executing a third.
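To make the pipelining idea concrete, here is a rough sketch in Python. It assumes an idealized three-stage pipeline (fetch, decode, execute) where every stage takes exactly one cycle and nothing ever stalls, which no real CPU manages. The function names are ours, not anything from [Fanael]'s post.

```python
# Toy model of an ideal pipeline: one cycle per stage, no stalls (an assumption).
STAGES = ("fetch", "decode", "execute")


def cycles_unpipelined(n_instructions, n_stages=len(STAGES)):
    """Each instruction runs through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages=len(STAGES)):
    """A new instruction enters the pipeline every cycle once it has filled."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def pipeline_timeline(n_instructions, stages=STAGES):
    """Return one dict per cycle mapping stage name -> instruction index."""
    timeline = []
    for cycle in range(cycles_pipelined(n_instructions, len(stages))):
        row = {}
        for depth, stage in enumerate(stages):
            instr = cycle - depth  # instruction occupying this stage right now
            if 0 <= instr < n_instructions:
                row[stage] = instr
        timeline.append(row)
    return timeline


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_timeline(4)):
        print(cycle, row)
    print("unpipelined:", cycles_unpipelined(4), "pipelined:", cycles_pipelined(4))
```

In this toy model, four instructions take twelve cycles one at a time but only six when overlapped, and by the third cycle all three stages are busy with different instructions.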
There is a problem, though. Suppose you add three numbers and then increment a counter within a loop, and the three numbers don’t depend on the counter. In a classic in-order pipeline, you must wait for the additions to finish before you can increment the counter and continue the loop. But with an out-of-order pipeline, the CPU can figure out that the increment doesn’t depend on the additions and do it in parallel with them. To further improve parallel operation, register renaming lets the CPU place results in a temporary register that it can commit or discard later.
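The difference between the two approaches is easier to see with a toy scheduler. The sketch below is our own simplification, not how any real core works: it assumes a machine that can issue two instructions per cycle, gives the additions a made-up three-cycle latency, and gives every result its own register name, which stands in for register renaming. The `Instr` class and `schedule` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str      # register written (unique per instruction, mimicking renaming)
    srcs: tuple    # registers read
    latency: int   # cycles until the result is available (made-up numbers)


def schedule(program, ready_regs, width=2, in_order=True):
    """Return {instruction name: cycle in which it issues}."""
    ready_at = {reg: 0 for reg in ready_regs}  # cycle each value becomes usable
    pending = list(program)
    issued = {}
    cycle = 0
    while pending:
        if cycle > 10_000:
            raise RuntimeError("an operand never becomes ready")
        slots = width
        issuing = []
        for instr in list(pending):
            if slots == 0:
                break
            if all(ready_at.get(s, float("inf")) <= cycle for s in instr.srcs):
                issuing.append(instr)
                pending.remove(instr)
                slots -= 1
            elif in_order:
                break  # in-order: nothing may pass a stalled instruction
        for instr in issuing:
            issued[instr.name] = cycle
            ready_at[instr.dest] = cycle + instr.latency
        cycle += 1
    return issued


# sum = (a + b) + c, then bump the loop counter i.
PROGRAM = [
    Instr("add1", "t", ("a", "b"), 3),
    Instr("add2", "sum", ("t", "c"), 3),
    Instr("inc", "i_next", ("i",), 1),
]
READY = ("a", "b", "c", "i")

if __name__ == "__main__":
    print("in-order:    ", schedule(PROGRAM, READY, in_order=True))
    print("out-of-order:", schedule(PROGRAM, READY, in_order=False))
```

Under these assumptions, the in-order run holds the increment back until cycle 3 because it is stuck behind the second addition, while the out-of-order run issues it in cycle 0 alongside the first addition.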
The P6 from 1995 was the first x86 that did out-of-order execution. This CPU does, in fact, convert x86 instructions to RISC instructions. However, the Pentium M introduced micro-operation fusion, which allowed the CPU to treat certain pairs of micro-operations as a single unit, and each subsequent architecture diverged further and further from the model of the P6.