Binary division when your processor lacks hardware division

[Hamster] wanted to take a look at division operations when the chip you’re using doesn’t have a divide instruction. He makes the point that the divide instruction takes a lot of space on the die, and that’s why it’s sometimes excluded from a chip’s instruction set. For instance, he tells us the ARM processor used on the Raspberry Pi doesn’t have a divide instruction.

Without hardware division you’re left to implement a binary division algorithm. Eventually [Hamster] plans to do this in an FPGA, but started researching the project by comparing division algorithms in C on an AMD processor.

His test uses all 16-bit possibilities for dividend and divisor. He was shocked to find that binary division doesn’t take much longer than using the hardware instruction for the same tests. A bit of poking around in his code and he manages to beat the AMD hardware divide instruciton by 175%. When testing with an Intel chip the hardware beats his code by about 62%.

He’s got some theories on why he’s seeing these performance differences which we’ll let you check out on your own.

Follow

Get every new post delivered to your Inbox.

Join 97,706 other followers