Clockhands For Faster CPU Execution

December 9, 2023

When you design your first homebrew CPU, you are probably happy if it works at all, and you don’t worry much about performance. Eventually, though, you’ll start thinking about how to make things run faster. For a single CPU, the standard strategy is to execute multiple instructions at the same time. This is feasible because an instruction is processed in several steps, and different instructions can be at different steps at the same moment. But like most solutions, this one comes with a new set of problems. Japanese researchers are proposing a novel way to work around some of those problems in a recent paper about a technique they call Clockhands.

Suppose you have a set of instructions like this:

LOAD A, 10
LOAD B, 20
SUB A,B
LOAD B, 30
JMPZ DONE
INC B

If you do these one at a time, you have no problem. But if you try to execute them all together, there are a variety of problems. First, the subtract has to wait for A and B to have the proper values in them. Also, the INC B may or may not execute, and unless we know the values of A and B ahead of time (which, of course, we do here), we can’t tell until run time. But the biggest problem is the subtract has to use B before B contains 30, and the increment has to use it afterward. If everything is running together, it can be hard to keep straight.
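To make those conflicts concrete, here is a minimal Python sketch that lists every pair of instructions in the example that touch the same register in a conflicting way. The read and write sets are our own reading of the listing (we assume SUB A,B stores its result in A), the zero flag that JMPZ tests is not modeled, and the check is deliberately conservative: it reports every overlapping pair, even when another write sits in between.

```python
# Each entry is (text, registers read, registers written).
# These sets are an assumption about the listing, not something the article states.
PROGRAM = [
    ("LOAD A, 10", set(), {"A"}),
    ("LOAD B, 20", set(), {"B"}),
    ("SUB A,B", {"A", "B"}, {"A"}),
    ("LOAD B, 30", set(), {"B"}),
    ("JMPZ DONE", set(), set()),  # the zero flag it tests is not modeled
    ("INC B", {"B"}, {"B"}),
]


def find_hazards(program):
    """Return (kind, earlier index, later index, register) for each overlap."""
    hazards = []
    for later, (_, l_reads, l_writes) in enumerate(program):
        for earlier in range(later):
            _, e_reads, e_writes = program[earlier]
            for reg in sorted(e_writes & l_reads):
                hazards.append(("RAW", earlier, later, reg))  # read after write
            for reg in sorted(e_reads & l_writes):
                hazards.append(("WAR", earlier, later, reg))  # write after read
            for reg in sorted(e_writes & l_writes):
                hazards.append(("WAW", earlier, later, reg))  # write after write
    return hazards


hazards = find_hazards(PROGRAM)
for kind, earlier, later, reg in hazards:
    print(f"{kind} on {reg}: '{PROGRAM[earlier][0]}' -> '{PROGRAM[later][0]}'")
```

The WAR entry between SUB A,B and LOAD B, 30 is the "biggest problem" described above: the load must not overwrite B until the subtract has read it.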

The normal way to handle this is register renaming. Instead of using A and B as registers, the CPU uses physical registers that it can call A or B (or something else) as it sees fit. So, for example, the subtraction won’t really be SUB A,B but, internally, something like SUB R004,R009. The LOAD instruction for 30 nominally writes to B, but the CPU actually assigns a currently unused physical register to B and loads 30 into that (e.g., LOAD R001,30). Now the SUB instruction will still use 20 (in R009) when it gets around to executing.
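A toy version of that bookkeeping might look like the Python sketch below. The Renamer class and its method names are hypothetical, the physical register numbers it hands out are arbitrary (they will not match the R004/R009 used above), and a real rename stage also has to free registers again, which this sketch never does.

```python
class Renamer:
    """Toy rename stage: maps logical register names to physical registers."""

    def __init__(self, num_physical):
        # Free list of physical register names: R000, R001, ...
        self.free = [f"R{n:03d}" for n in range(num_physical)]
        # Logical name -> physical register holding its newest value.
        self.mapping = {}

    def rename(self, dest, sources):
        # Look up sources first so they see the mapping from before this write.
        phys_sources = [self.mapping[s] for s in sources]
        # Every write gets a fresh physical register (never recycled here).
        phys_dest = self.free.pop(0)
        self.mapping[dest] = phys_dest
        return phys_dest, phys_sources


renamer = Renamer(num_physical=8)
load_a = renamer.rename("A", [])           # LOAD A, 10
load_b = renamer.rename("B", [])           # LOAD B, 20
sub = renamer.rename("A", ["A", "B"])      # SUB A,B
load_b30 = renamer.rename("B", [])         # LOAD B, 30
inc = renamer.rename("B", ["B"])           # INC B

print("SUB reads", sub[1], "and writes", sub[0])
print("LOAD B, 30 writes", load_b30[0])
print("INC B reads", inc[1], "and writes", inc[0])
```

Because LOAD B, 30 lands in a different physical register than the one SUB reads, the two no longer conflict and can execute in either order.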

This is a bit of an oversimplification, but the point is there’s plenty of circuitry in a modern CPU tracking which physical registers are in use and which one corresponds to a logical register for this particular instruction. One proposed way to avoid that circuitry is to stop referring to registers directly and, instead, refer to values by how far back in the code they were produced (e.g., SUB A-2, B-1). This can be easier for the hardware, but more difficult for the compiler.
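As a rough illustration of distance-based operands, the Python sketch below runs a tiny program in which each operand is "the result produced N instructions ago". The tuple encoding and the run_distance_program name are invented for this example and are not taken from any real instruction set.

```python
def run_distance_program(program):
    """Execute instructions whose operands are distances to earlier results."""
    results = []  # results[i] is the value produced by instruction i
    for op, *args in program:
        if op == "LOAD":
            value = args[0]  # immediate value
        elif op == "SUB":
            # A distance of 1 means the previous instruction, 2 the one before it.
            left, right = (results[len(results) - d] for d in args)
            value = left - right
        else:
            raise ValueError(f"unknown op {op!r}")
        results.append(value)
    return results


distance_results = run_distance_program([
    ("LOAD", 10),   # instruction 0
    ("LOAD", 20),   # instruction 1
    ("SUB", 2, 1),  # instruction 0's result minus instruction 1's result
])
print(distance_results)
```

Every result gets its own slot, so nothing is ever overwritten. The catch is that inserting or removing an instruction shifts every distance that spans it, which hints at why this style is harder on the compiler.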

Clockhands differs in that it refers to the number of writes to the register, not the number of instructions. It is somewhat like using a stack for each register and allowing the instructions to refer to a specific value on the stack. The hardware becomes simpler, and there is less for the compiler to do. This could potentially reduce power consumption as well.
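The stack analogy can be sketched in a few lines of Python. This is only a model of the idea as described here, not the paper's actual design: the Hand class, its depth of four, and the convention that 0 means the most recent write are all assumptions made for illustration.

```python
from collections import deque


class Hand:
    """Toy register that remembers its last few writes (newest at index 0)."""

    def __init__(self, depth):
        # With maxlen set, the oldest value is dropped once the deque is full.
        self.values = deque(maxlen=depth)

    def write(self, value):
        self.values.appendleft(value)

    def read(self, writes_ago):
        # 0 = most recent write, 1 = the write before that, and so on.
        return self.values[writes_ago]


a = Hand(depth=4)
b = Hand(depth=4)
a.write(10)  # LOAD A, 10
b.write(20)  # LOAD B, 20
b.write(30)  # LOAD B, 30 has already happened...
diff = a.read(0) - b.read(1)  # ...yet the 20 is still addressable, one write back
a.write(diff)
print(diff, b.read(0))
```

Since an operand counts writes to one register rather than instructions, unrelated instructions in between do not change it, and the older value of B survives the later load without a rename table.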

Confused? Read the paper if you want to know more. Some background from Wikipedia might help, too. It reminded us of a CPU architecture from way back called The Mill (dead link inside, but there’s always a copy). If you didn’t know your CPU registers aren’t what you think they are, it is even worse than you think.

15 thoughts on “Clockhands For Faster CPU Execution”

Jon H says:

December 9, 2023 at 11:04 pm

The Mill people reorganized their website.

https://millcomputing.com/docs/

1. Truth says:
  
  December 10, 2023 at 2:29 am
  
  I check the Mill forum about once every 6 months (Libre-SOC as well) to see how far they have progressed. And as an average Joe, reading what has been made public, neither project is dead, just not moving along as fast as I would wish.
  
jalunaki says:

December 10, 2023 at 12:37 am

We have many CPU ISAs: SPARC, Motorola, PowerPC, etc.
The problem is with power efficiency.

1. combinatorylogic says:
  
  December 11, 2023 at 1:32 am
  
  All those ISAs rely on a complex and costly register renaming machinery. Remove it and you’ll have better power efficiency.
  
Truth says:

December 10, 2023 at 2:39 am

Many happy lawyers ?
https://millcomputing.com/topic/tokyo-universitys-straight-compiler/#post-3958

Ostracus says:

December 10, 2023 at 6:20 am

2033 the patents should expire.

STrRedWolf says:

December 10, 2023 at 7:23 am

Doesn’t this really mean your compiler just sucks at optimization? If the compiler/assembler is doing a “dumb” compile into assembly assuming a limited number of registers, then it’s leaving a lot on the table if you got 16+ registers to play with. It should be able to scan through and change the registers to enable more speed w/o the CPU having to do such lifting.

1. Dyson says:
  
  December 10, 2023 at 8:42 am
  
  But the example is simple. What about branches that depend on data that is unknown at compile time?
  
2. Greg A says:
  
  December 10, 2023 at 9:11 pm
  
  the thing is, most popular ISAs expose on the order of 16 registers. but hardware has much more than that, specifically to support superscalar / out-of-order execution. and you can’t just expose these registers in the ISA, because they keep changing…it’s nice to have a relatively fixed ISA that doesn’t expose too many details of the execution engine, so the execution engine can change.
  
  i am not gonna bother to click through to read a better summary of clockhands than the one here, but it doesn’t seem to me that it’s worth much. no one needs a more complicated and more implementation-specific way to define references to hidden registers. it seems to me like a waste to add bits to your instruction encoding just to reference “past values of a register”. if you’re going to add the bits, you might as well just name more of the registers. i mean, that’s ultimately how register allocator would have to treat it anyways.
  
  though i’m not convinced referencing more registers is a huge win anyways. it seems hard to avoid the fact that a lot of operations simply are serial.
  
  seems kind of like a baby step back towards VLIW?
  
  1. Paul P says:
    
    February 26, 2024 at 9:02 am
    
    From the paper: a 3-argument instruction in RISC-V uses 15 bits for its register fields. In Clockhands, the equivalent instruction only uses 14 bits. So no, they are not adding bits to the instruction encoding, they are actually removing one.
    
3. combinatorylogic says:
  
  December 11, 2023 at 1:37 am
  
  You can have a lot more physical registers than the ISA register addressing space allows, and this is what OoO register renaming is for.
  
preamp.org says:

December 10, 2023 at 8:41 am

How are they going to make that thing secure? Surely the ‘S’ in ‘Clockhands’ stands for ‘Security’, but it is only a small ‘S’ right at the end, which might as well be left out altogether…
The efficiency gain probably holds true for exfiltrating secret data, no?

Feinfinger (kinda angry here) says:

December 10, 2023 at 8:43 am

“””
NOTE: the slides require genuine Microsoft PowerPoint to view; open source PowerPoint clones are unable to show the animations, which are essential to the slide content. If you do not have access to PowerPoint then watch the video, which shows the slides as intended.
“””

*PLONK!*

1. Ostracus says:
  
  December 10, 2023 at 12:12 pm
  
  I didn’t even get those and I have genuine PowerPoint. Maybe I need genuine EDGE.
  
2. Jon H says:
  
  December 13, 2023 at 6:11 am
  
  Worked fine in Apple’s Keynote.
  

Hackaday
