Binary Math Tricks: Shifting To Divide By Ten Ain’t Easy

On small CPUs, you often don’t have a multiply or divide instruction. Of course, good programmers know that shifting right and left will multiply or divide by a power of two. But there are always cases where you need to multiply or divide by something that isn’t a power of two. For multiplication, at least, you can often work it out with a few shifts and adds.

For example, multiplying by 10 is common when dealing with conversion between binary and decimal. But since 10n is equal to 8n+2n, you can express that as a left shift three times to multiply by eight, adding that value to your original value shifted left once to multiply by two.

But division is a different problem. n/10 does not equal n/8-n/2 or anything else simple like that. The other day a friend showed me a very convoluted snippet of code on Stack Overflow by user [realtime] that divides a number by 10 and wanted to know how it worked. It is pretty straightforward if you just stick with the math and I’ll show you what I mean in this post. Turns out the post referenced the venerable Hacker’s Delight book, which has a wealth of little tricks like this.

First Multiply

First, let’s do some shifts to multiply. Each left shift is a power of two, so n<<1 is 2*n and n<<8 is 256*n. That’s easy. The hard part is decomposing it for non-powers of two: (n<<3) + (n<<1) is the same as n*10.
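If you prefer to see that in C, here is a minimal sketch (mul10 is just an illustrative name, not anything from the original code):

unsigned mul10(unsigned n) {
  return (n << 3) + (n << 1); /* 8n + 2n = 10n */
}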

If you are in assembly doing one shift at a time, you can often save a little time by combining the two shifts:

SHL A    ; A = 2n
MOV A,T  ; Save the 2n value in temporary storage
SHL A    ; A = 4n
SHL A    ; A = 8n
ADD A, T ; A = 8n + 2n = 10n

That’s pretty efficient even on processors that can multiply.

Division Shifts Work but Composites Are Tricky

Division is the same way, but it doesn’t combine nicely. So n>>1 is n/2 and n>>8 is n/256. But there’s no easy way to combine divisions like you can multiplications.

The code we looked at looked like this:

unsigned divu10(unsigned n) {
  unsigned q, r;
  q = (n >> 1) + (n >> 2);
  q = q + (q >> 4);
  q = q + (q >> 8);
  q = q + (q >> 16);
  q = q >> 3;
  r = n - (((q << 2) + q) << 1);
  return q + (r > 9);
}

That’s a mouthful! But it does work. The secret to understanding this is to treat each shift as taking a fraction of the number. Look at the first working line:

q=(n>>1)+(n>>2)

This is really n/2 + n/4. If you remember your high school math, that’s the same as 3n/4. Of course, this is the same as multiplying by 0.75. If you look ahead to the last assignment of q, you’ll get a clue: q = q >> 3;

That’s saying q = q/8. So if our goal is to divide by 10, it may be easier to think of it as multiplying by 0.1. To fit well with powers of two, we really want to think about multiplying the whole thing by 0.8 and then dividing by 8, since 0.8/8 = 0.1.

So adding a one-bit right shift to a two-bit right shift gets us a factor of 0.75, which isn’t too far away from 0.8, but it isn’t quite there. The next line adds a little bit more to that 0.75 factor. How much more? The amount is 3n/64, and the total is now 51n/64, which works out to about 0.797. Getting closer to 0.8 already. Each term adds a little bit less and gets us a little bit closer. Here’s how it breaks down; note that the last term only gets you slightly closer, since the 13107n/16384 term is already almost there.

Expression     Base Value     Delta                                      Total Value             Ratio
(n>>1)+(n>>2)  3n/4           0                                          3n/4                    0.75
q+(q>>4)       3n/4           (3n/4)/16 = 3n/64                          51n/64                  0.7969
q+(q>>8)       51n/64         (51n/64)/256 = 51n/16384                   13107n/16384            0.7999
q+(q>>16)      13107n/16384   (13107n/16384)/65536 = 13107n/1073741824   858993459n/1073741824   0.8
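If you want to watch that convergence numerically, a quick throwaway sketch like this (n = 1000000 is just an arbitrary sample) prints each intermediate value of q next to the 0.8n target:

#include <stdio.h>

int main(void) {
  unsigned n = 1000000;             /* arbitrary sample input */
  unsigned q = (n >> 1) + (n >> 2); /* 3n/4 */
  printf("%u\n", q);                /* 750000 */
  q = q + (q >> 4);                 /* 51n/64 */
  printf("%u\n", q);                /* 796875 */
  q = q + (q >> 8);                 /* 13107n/16384 */
  printf("%u\n", q);                /* 799987 */
  q = q + (q >> 16);                /* roughly 0.8n */
  printf("%u (0.8n = %u)\n", q, n / 10 * 8); /* 799999 (0.8n = 800000) */
  return 0;
}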

I couldn’t help but think that this would be pretty easy to implement in an FPGA.

Commenting the Code

Here’s the code with comments, which is a bit easier to follow:

unsigned divu10(unsigned n) {
  unsigned q, r;
  q = (n >> 1) + (n >> 2); // q=n/2+n/4 = 3n/4
  q = q + (q >> 4);        // q=3n/4+(3n/4)/16 = 3n/4+3n/64 = 51n/64
  q = q + (q >> 8);        // q=51n/64+(51n/64)/256 = 51n/64+51n/16384 = 13107n/16384
  q = q + (q >> 16);       // q=13107n/16384+(13107n/16384)/65536 = 13107n/16384+13107n/1073741824 = 858993459n/1073741824
  // note: q is now roughly 0.8n
  q = q >> 3;              // q=(0.8n)/8, which is about 0.1n, i.e. n/10
  r = n - (((q << 2) + q) << 1); // r = n - 10q, since ((q<<2)+q)<<1 = (4q+q)*2 = 10q
  return q + (r > 9);      // adjust answer by error term
}

Once you break it down, it isn’t that hard to understand. This method dates back a while, and it appears the original source is the book Hacker’s Delight (see Figure 10-12 in the second edition). I couldn’t find the associated web site, but there is an archival copy. The only difference in that code is that the last line is commented out and replaced with: return q+((r+6)>>4);
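As a quick sanity check, a throwaway test like this (not from the article or the book; the names are just for illustration) brute-forces both correction styles against the compiler’s division and prints any mismatch, of which there shouldn’t be any:

#include <stdio.h>

static unsigned divu10_article(unsigned n) { /* correction used in the article */
  unsigned q, r;
  q = (n >> 1) + (n >> 2);
  q = q + (q >> 4);
  q = q + (q >> 8);
  q = q + (q >> 16);
  q = q >> 3;
  r = n - (((q << 2) + q) << 1);
  return q + (r > 9);
}

static unsigned divu10_book(unsigned n) { /* the book's alternate last line */
  unsigned q, r;
  q = (n >> 1) + (n >> 2);
  q = q + (q >> 4);
  q = q + (q >> 8);
  q = q + (q >> 16);
  q = q >> 3;
  r = n - (((q << 2) + q) << 1);
  return q + ((r + 6) >> 4);
}

int main(void) {
  for (unsigned n = 0; n < 50000000u; n++)
    if (divu10_article(n) != n / 10 || divu10_book(n) != n / 10)
      printf("mismatch at %u\n", n);
  return 0;
}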

The book explains how to figure out the optimal shifts and adds and gives several other examples. There are also many other division tricks in that chapter. If you like this sort of thing, have a look at the bit twiddling hacks page. If your division dreams tend to floating point, you might find this interesting.

51 thoughts on “Binary Math Tricks: Shifting To Divide By Ten Ain’t Easy”

  1. “Division is the same way, but it doesn’t combine nicely. So n>>1 is n/2 and n>>8 is n/256. But there’s no easy way to combine divisions like you can multiplications.

    The code we looked at looked like this”

    Or you can just subtract 10 from the passed number, returning if it carries, incrementing if it doesn’t and looping. Much *longer*, time-wise, but *significantly* more compact without a barrel-shifter. Sometimes execution time doesn’t matter. (Also can be faster, too, depending on what you expect the range to be). Always cracks me up when people forget that division’s repeated subtraction.

    1. “returning if it carries,” this is poor practice because unsigned integers don’t “carry”, the behavior is Undefined and you’ve created a really nasty bug that will bite you hard when you change compilers or platforms.

      Welcome once again to the poorly defined world of C arithmetic and why we should all give up on this stupid language.

      1. “Welcome once again to the poorly defined world of C arithmetic ”

        (looks at what I wrote)
        Uh… exactly where was the C code in there?

        That’s an algorithm. If you can’t figure out how to implement it in your language, that’s not the algorithm’s problem. For reference, in C, it’s just while (1) { if (n < 10) return; n-= 10; q++; } – and if the compiler can't figure out that it can do subtract / return if C / increment / jump, the compiler's stupid, not the code.

      2. Stupid language? So what is better in your mind, then? Python perhaps? Dare to guess what Python was written in? I think you call it stupid because you don’t know how to use it.

    2. Yes, that method can work just fine for some cases. You can also just use a general purpose division subroutine. Clearly this post is about needing it to be fast. One last pedantic point: division isn’t repeated subtraction; division is best defined as a scaling operation.

  2. In the code, q=quotient, r=remainder. The remainder is calculated as (input number) – 10 * q. If the quotient were exact, the remainder would always be < 10. However, since it is not exact, the remainder can be larger. If this happens, it means q was too small, so this is why it's adjusted by 1. Presumably, q is calculated with enough precision that r won't be bigger than 19; if it could be, an adjustment of 1 wouldn't be enough.
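     As a concrete case where that adjustment fires (my own worked example, using the divu10() from the article): for n = 10, q starts as (10>>1)+(10>>2) = 7, the next three terms add nothing, and q >> 3 truncates to 0, which is one too small. The remainder then comes out as r = 10 - 0 = 10, so the r > 9 test adds back the missing 1 and the function correctly returns 1.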

    1. Divide by 10 in Z80 using ADD only, works on 8-bit numbers, uses Z80’s 16-bit capability and then uses the upper 8-bits as the answer

      ; H = E \ 10
      e_div_10:
      ld d,0
      ld h,d
      ld l,e
      add hl,hl
      add hl,de
      add hl,hl
      add hl,hl
      add hl,de
      add hl,hl

      it works, but it’s a play on numbers.

  3. Easy, divide by almost anything…. Let’s say you’re working on a 16 bit number. Multiply it with (0x10000 / 10) and shift the result 16 bits to the right.
    As long as the amount you want to divide by is constant, it works pretty efficiently. The compiler takes the hit for you.
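    A minimal C sketch of that reciprocal-multiply idea (my illustration, not the commenter's code; note that the truncated constant 0x10000/10 = 6553 can come out one low for a handful of inputs, so this sketch uses the usual rounded constant 0xCCCD with a 19-bit shift, which is exact for every 16-bit value):

    #include <stdint.h>

    /* n / 10 for any 16-bit n: one multiply, one shift */
    uint16_t div10_16(uint16_t n) {
        return (uint16_t)(((uint32_t)n * 0xCCCDu) >> 19);
    }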

      1. It’s true, the solution above does assume a hardware multiply. A hardware multiply is pretty cheap in silicon, while a divide is not. Even the humble 8-bit ATmega8 includes a hardware multiply, while the latest 32-bit Cortex-M0+ still excludes a divide. The above solution is clean, easily understandable, and extends to division by all real numbers. Algorithms designed around specific constants inevitably result in a different solution for each constant. That unnecessarily adds workload for programmers and debuggers. Again, you’re right that there is a need, but it’s very small.

        1. The 6502 and 65C02 and their variants had no multiply or divide.
          Neither did the Z80.
          While you may be spoiled by what you have now, for many of us in the 70’s and 80’s we had to make do with what we had…

          Division on the Z80 was easy.

          Div8:                 ; divide HL by D: quotient in HL, remainder in A
            xor a               ; clear the partial remainder
            ld b,16             ; 16 bits to process
          LoopForDiv:
            add hl,hl           ; shift HL left; its top bit drops into the carry
            rla                 ; ...and from the carry into the partial remainder
            cp d                ; is the partial remainder >= the divisor?
            jp c,NxtBit         ; no: this quotient bit stays 0
            sub d               ; yes: subtract the divisor
            inc l               ; ...and set this quotient bit (bit 0 is clear after the shift)
          NxtBit:
            djnz LoopForDiv     ; repeat for all 16 bits
            ret
          
      2. Some compilers (e.g. gcc) will convert constant multiply to shift/add/subtract where it makes sense. In that case, it’s better to write it as a multiply, and increase portability of the code (or simply reduce size of the code with -Os option, so it will go back to using library call)

    1. Where I most often use this type of technique is in ADC scaling. Let’s say you have a 10-bit ADC, and the ADC maxes out at 23V (this is the actual input, before the voltage divider). To calculate the number of millivolts without a divide, multiply the ADC result by 23000 and shift right by the ADC precision (in this case 10 bits). This makes for efficient, portable code that is easy to read, and requires no divide.
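       In code, that scaling looks something like the sketch below (the 10-bit width and 23000 mV full scale are the numbers from the comment; the function name is just for illustration):

       #include <stdint.h>

       /* 10-bit ADC reading -> millivolts, for a 23 V (23000 mV) full-scale input */
       uint16_t adc_to_mv(uint16_t adc) {
           return (uint16_t)(((uint32_t)adc * 23000u) >> 10);
       }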

  4. How many even decently modern CPUs would benefit from such hackery? I don’t think it makes you a good programmer to know many tricks of that kind. It actually held me back in the past, worrying too much about performance where it wasn’t an issue. Others take it to the opposite extreme, which is why so much software is a real resource hog now.

    Most CPUs even in the 90s took just a few cycles for division. With pipelining it matters even less. Maybe this is useful for an FPGA or some very low performance µC. In general it’s not too useful, but it’s a nice curiosity.

    1. Not so much modern CPUs, I’m sure, but I actually could’ve totally used this article about 3 months ago when doing some embedded stuff, where there was no division available, but shifts were cheap. Ended up having to resort to pre-computed lookup tables, which, in the end, still might have been the better choice, but this would’ve been nice to at least benchmark.

      The amount of effort needed to hack around dividing by 10 was insane.

    2. It may matter a lot if you take into account huge data manipulations, as with matrix calculations that perform similar operations on a large number of values.
      You won’t notice the time benefit on one operation but you will on several million operations.

  5. “left shift three times to multiply by eight, adding that value to your original value shifted left once to multiply by two.”
    no — that’s a total of 3+1 = 4 left shifts (although with a barrel shifter it’s only 2 shift operations) — instead left shift TWICE, add to the original and left shift once more. Fewer operations, same result.
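    In C that ordering is just ((n << 2) + n) << 1, i.e. (4n + n) * 2 = 10n; a minimal sketch to make it concrete:

    unsigned mul10_fewer_shifts(unsigned n) {
        return ((n << 2) + n) << 1; /* two shifts, an add, one more shift */
    }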

    1. You mention but don’t explicitly say this only matters if you don’t have a barrel shifter. It also could actually be slower in some superscalar architectures because you’re introducing a data dependency.

      That being said, if you care about hand optimizing execution speed that much, you probably don’t have a barrel shifter or a superscalar architecture.

  6. How we used to do it on the PDP-11 back in the 70s was to have a table like this:

    table: .word 40000, 20000, 10000, 8000, 4000, 2000, 1000, 800, 400, 200, 100, 80, 40, 20, 10, 8, 4, 2, 1

    Then for each word in the table, compare it with the source value. If table < source, subtract table from source and set a bit in the destination. The destination was then 20-bit BCD; add ASCII 0 to each nibble and you have a conversion.
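    A rough C transcription of that table method (my sketch, not the original PDP-11 code; note it compares with <= so that exact multiples convert correctly):

    static const unsigned table[19] = {
        40000, 20000, 10000, 8000, 4000, 2000, 1000, 800, 400, 200,
        100, 80, 40, 20, 10, 8, 4, 2, 1
    };

    unsigned bin_to_bcd(unsigned n) { /* n up to 65535 -> packed BCD, e.g. 12345 -> 0x12345 */
        unsigned bcd = 0;
        for (int i = 0; i < 19; i++) {
            bcd <<= 1;                /* move to the next 8/4/2/1 weight */
            if (table[i] <= n) {
                n -= table[i];
                bcd |= 1;
            }
        }
        return bcd;                   /* add ASCII '0' to each nibble to print it */
    }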

    1. Here’s an algorithm for BCD that doesn’t require a table: iterate over the input register, left shifting each MS bit into the bottom of the output register. After each shift, add 3 to any nibble in the output reg that is >=5.

      If you’re using a CPU that has BCD arithmetic or if you’re dealing with words > the ALU size, this would be a more efficient way of doing it.

      1. A little PoC of the above, as we’re code golfing…

        #include <stdio.h>

        int main(void)
        {
            unsigned int i;
            unsigned int a = 0xbc614e, b = 0;       /* 0xbc614e is 12345678 decimal */
            unsigned int x;

            printf("0x%08x ", a);
            for (i = 1; i; i <<= 1) {               /* 32 passes, one per input bit */
                /* flag every output nibble that is 5 or more... */
                x = ((b >> 3) | ((b >> 2) & ((b >> 1) | b))) & 0x11111111;
                x |= x << 1;                        /* ...turn each flag into a 3... */
                b = (b + x) << 1;                   /* ...add it and shift the output left */
                if (a & (1u << 31))                 /* bring in the next input bit */
                    b |= 1;
                a <<= 1;
            }

            printf("= %08x\n", b);                  /* prints 12345678 */
            return 0;
        }

  7. This is from what is now extremely flakey memory. When doing my training at Control Data in London 40 odd years ago, one language we had to learn was ICL Plan (for ICL 1900). There was no native divide but there was a library routine. For our code we were not allowed to use the library but had to implement divide ourselves.

  8. Great article and I’m already a good way through Hacker’s Delight. You have a typo in the example code comments line 4: 51n/644. It should be 51n/64. It initially confused me, as I’d never come across the fractional approach to division.

  9. Hello,

    Maybe the BLD format is what you want. (BLD: binary-look decimal)

    For example: let N = 19635.

    Now convert it to BLD.

    19635
    04312 : 11011
    02101 : 00110
    01000 : 00101
    00000 : 01000

    So, the result is: 01000, 00101, 00110, 11011.

    Note that these are not binary numbers; they are regular decimal numbers, just made up only of 1s and 0s.

    It looks similar to BCD, but it is totally different.

    Backward: 1000 * 2^3 + 101 * 2^2 + 110 * 2 + 11011 = 8000 + 404 + 220 + 11011 = 19635

    Advantage is you only operate with 1s and 0s.

    Regards

    PS: (C) Copyright by Adem GÜNEŞ
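    If I’m reading the scheme right (this is my own interpretation and sketch, not the commenter’s code), each BLD digit is just the parity of every decimal digit of the running value, after which the running value is reduced and halved:

    #include <stdio.h>

    /* parity of each decimal digit, kept in its decimal position: 19635 -> 11011 */
    static unsigned digit_parities(unsigned n) {
        unsigned r = 0, place = 1;
        while (n) {
            r += (n % 10 % 2) * place;
            n /= 10;
            place *= 10;
        }
        return r;
    }

    int main(void) {
        unsigned n = 19635;
        printf("%u ->", n);
        while (n) {                   /* emit BLD digits, least significant first */
            unsigned r = digit_parities(n);
            printf(" %05u", r);
            n = (n - r) / 2;
        }
        printf("\n");                 /* prints: 19635 -> 11011 00110 00101 01000 */
        return 0;
    }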

    1. Dammit html sanitizer!
      …what should have been between the angular brackets:
      “Division by shifts is just multiplication by a fraction where denominator is a power of two, so you can use traditional composition (by addition) of left shifts in the numerator, and just a right shift for the denominator. The higher power of two you can afford to pick for the denominator, the closer you will generally get to exact result.”

  10. Yes, this would be easy to implement in an FPGA, but you’re more apt to use one of the hard-wired multiplier cores, which probably take less power and fewer gates than this version, which would be implemented with fabric LUTs and carry chains. You’d just multiply by 1/10 with the multiplier.
