[Hamster] wanted to take a look at division operations when the chip you’re using doesn’t have a divide instruction. He makes the point that the divide instruction takes a lot of space on the die, and that’s why it’s sometimes excluded from a chip’s instruction set. For instance, he tells us the ARM processor used on the Raspberry Pi doesn’t have a divide instruction.
Without hardware division you’re left to implement a binary division algorithm. Eventually [Hamster] plans to do this in an FPGA, but started researching the project by comparing division algorithms in C on an AMD processor.
His test uses all 16-bit possibilities for dividend and divisor. He was shocked to find that binary division doesn’t take much longer than using the hardware instruction for the same tests. A bit of poking around in his code and he manages to beat the AMD hardware divide instruciton by 175%. When testing with an Intel chip the hardware beats his code by about 62%.
He’s got some theories on why he’s seeing these performance differences which we’ll let you check out on your own.