Once the domain of esoteric scientific and business computing, floating point calculations are now practically everywhere. From video games to large language models and kin, it would seem that a processor without floating point capabilities is pretty much a brick at this point. Yet the truth is that integer-based approximations can be good enough to hit the required accuracy. One example is approximating floating point multiplication with integer addition, which [Malte Skarupke] recently had a poke at, based on the integer-addition-only LLM approach suggested by [Hongyin Luo] and [Wei Sun].
As for the way this works, it does pretty much what it says on the tin: add the two floating point inputs as integer values, then adjust the exponent. This adjustment factor is what gets you close to the answer, but as the article and the comments to it illustrate, there are plenty of issues and edge cases to concern yourself with. These include under- and overflow, but also special floating point values.
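To make that concrete, here is a minimal sketch (ours, not code lifted from the article) for 32-bit floats: reinterpret the bit patterns as unsigned integers, add them, and subtract the bit pattern of 1.0f (0x3F800000) to undo the doubled exponent bias:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Approximate a*b by adding the raw bit patterns of two positive
   floats and re-subtracting the exponent bias (0x3F800000 is the
   bit pattern of 1.0f). Signs, zero, NaN/Inf, and subnormals are
   ignored here -- exactly the edge cases the article warns about. */
static float mul_by_int_add(float a, float b)
{
    uint32_t ia, ib, ir;
    memcpy(&ia, &a, sizeof ia);   /* type-pun via memcpy, not a cast */
    memcpy(&ib, &b, sizeof ib);
    ir = ia + ib - 0x3F800000u;   /* exponents add exactly; mantissas only approximately */
    float r;
    memcpy(&r, &ir, sizeof r);
    return r;
}

int main(void)
{
    /* 3.0 * 5.0 comes out as 14.0 instead of 15.0: the exponent is
       exact, and the error lives entirely in the mantissa. */
    printf("%f (exact %f)\n", mul_by_int_add(3.0f, 5.0f), 3.0f * 5.0f);
    return 0;
}
```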
Unlike in scientific calculations, where even minor inaccuracies tend to propagate and cause much larger errors down the line, graphics and LLMs do not care that much about floating point precision, so the integer approach's error of roughly 7.5% is good enough. The question is whether it's truly more efficient, as the paper suggests, rather than a fallback as seen with e.g. integer-only audio decoders for platforms without an FPU.
Since one of the nice things about FP-focused vector processors like GPUs and derivatives (tensor, ‘neural’, etc.) is that they can churn through a lot of data quite efficiently, the benefits of shifting this to the ALU of a CPU seem questionable, and expecting (energy) improvements from it quite optimistic.
Neat math trick.
I recently learned about Q numbers (Qm.n and sQm.n) used for fixed-point computing, and the fixed-point arithmetic that goes with them. Possibly more interesting and accurate than integer addition?
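For anyone who hasn't met them: a Qm.n number is just an integer holding the value scaled by 2^n, so addition stays a plain integer add and multiplication only needs a shift afterwards. A minimal sketch for sQ0.15, the usual 16-bit DSP format (rounding and saturation omitted for brevity):

```c
#include <stdint.h>

typedef int16_t q15_t;  /* sQ0.15: 1 sign bit, 15 fraction bits */

/* The 32-bit product of two Q15 values carries 30 fraction bits,
   so shift right by 15 to get back to Q15. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    return (q15_t)(((int32_t)a * b) >> 15);
}

/* Addition in a Q format is just integer addition (overflow permitting). */
static q15_t q15_add(q15_t a, q15_t b)
{
    return (q15_t)(a + b);
}
```

E.g. 0.5 is 16384 in Q15, and q15_mul(16384, 16384) returns 8192, i.e. 0.25.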
floating point is really just fixed point with extra steps. fortunately most of those extra steps (as well as the ones you already had to deal with) are automatically handled by the fpu. code long enough and you will break float. if you do fixed point, you know why.
No, the key here is turning multiplication into addition. Schoolbook multiplication is quadratic in the number of digits (just think about the way you do multi-digit multiplication with pen and paper, it’s obviously square), while addition is linear. Using fixed point instead of floating point doesn’t change mults into additions.
The reason this works fine with small floats is just that the only part of float multiplication that is actually multiplication is the mantissa (fraction) multiply, and small floats (especially the insane ones used in LLMs) have so few bits in the fraction, who cares. Like, if your fraction only has 3 bits (so only 0.125, 0.25, … 0.75, 0.875), if you just ignore the multiply you can’t ever be that far off anyway.
Really the whole thing pretty much tells you that LLMs basically only work in log space anyway, they don’t care that much about the actual values as much as log(value).
The article (not the paper) also explains the floor-shift method for log2 approximation, which you can do in a microcontroller or FPGA easily. For 8 bits, think of it like this: start with an 8-bit value A, and an (initially zero) value B, plus an (initially zero) counter C. Shift A.B right – if A is zero, stop. Otherwise increment C and repeat. Now C.B is a Q8.8 approximation to log2. It works because it’s a piecewise-linear approximation between powers of 2 (which he shows in that plot, although it’s more obvious if your span is larger).
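Here's one way to write that floor-shift idea in C (a sketch; note this version stops when only the leading one bit is left in A, so that C comes out as floor(log2(a)) and B as the linear fraction below the leading one):

```c
#include <stdint.h>
#include <stdio.h>

/* Piecewise-linear log2 of an 8-bit value a > 0, returned as Q8.8.
   A lives in the high byte and B in the low byte of one 16-bit
   register; shift right until only the leading one bit is left in
   A, counting shifts in C. C is then floor(log2(a)) and B holds
   the bits below the leading one -- the linear interpolation
   between neighboring powers of two. */
static uint16_t log2_q8_8(uint8_t a)
{
    uint16_t ab = (uint16_t)a << 8;  /* A.B with B initially zero */
    uint16_t c  = 0;
    while ((ab >> 8) > 1) {          /* stop once A == 1 */
        ab >>= 1;
        c++;
    }
    return (uint16_t)((c << 8) | (ab & 0xFF));  /* C.B as Q8.8 */
}

int main(void)
{
    /* log2(200) is ~7.644; this prints 0x0790 = 7 + 144/256 = 7.5625 */
    printf("0x%04X\n", log2_q8_8(200));
    return 0;
}
```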
Shudders in CS degree and numerical methods minor
This kind of math had to be done all the time in early to mid-era computer graphics: converting floating point data so it could be processed on integer-only processing chains. It’s great to see new uses for old techniques.
I wonder if, instead of trying to mimic floating point formats, they could just store the logarithm of the values with some suitable base that gives enough resolution. Multiplications would become additions, but additions would then need to be approximated via log(x+y) = log(x) + log(1 + y/x).
Well, since 0 is probably the most-used number (and it has no logarithm), and division is probably the hardest operation in an ALU, using logs directly isn’t very useful. In a way, an FP number already is a log number, since it stores the exponent in one part (which is the log of the number) and the mantissa in another (which is the usual kind of number we deal with), but it’s done smartly and, better, it’s standard.
“Well, since 0 is probably the most-used number”
Zero has a special case representation in float anyway, so if you were trying to make hardware that handles stored logs it wouldn’t be any harder. Special-casing a mult or add with zero isn’t exactly a challenge.
And adding wouldn’t really require a division: you just take the larger of the two numbers and adjust it based on the difference from the second, which is the equivalent of a small float anyway (e.g. a+b = a if b/a is less than the resolution of your mantissa).
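In code, that no-division add looks something like this (a sketch; the math library computes the correction term here, but since it depends only on the difference lb − la, it could come straight out of a small lookup table):

```c
#include <math.h>
#include <stdio.h>

/* Log-domain addition: given la = log2(a) and lb = log2(b), return
   log2(a + b) without leaving log space. With la >= lb:
   log2(a + b) = la + log2(1 + 2^(lb - la)). */
static float log2_add(float la, float lb)
{
    if (la < lb) { float t = la; la = lb; lb = t; }  /* keep la >= lb */
    float d = lb - la;                   /* <= 0 */
    return la + log2f(1.0f + exp2f(d));  /* correction -> 0 as d -> -inf */
}

int main(void)
{
    /* "Adding" log2(8)=3 and log2(4)=2 should give log2(12) ~ 3.585 */
    printf("%f\n", log2_add(3.0f, 2.0f));
    return 0;
}
```

And indeed, once the difference d is more negative than the resolution of your mantissa, the correction term vanishes and a+b = a, just as said above.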
” but it’s done smartly and, better, it’s standard.”
(glances at the huge number of new float types)
are you sure
This was the very same idea Yamaha used in the oscillators of their DX7 synths.
“…the benefits of shifting this to the ALU of a CPU…”
I suspect that it’s less about using a CPU than it is about using fewer gates and less energy.
Notionally you could produce an NPU much more cheaply and/or make it larger, and maybe we won’t have to fire up Three Mile Island to feed the data center, either.
I don’t know enough about how the magic of floating point multiplication works out in a CPU.
Multiplication is just repeated addition, and a single bitshift can multiply/divide by two. But if integer is still faster than float, you could multiply by 10 until the decimal is gone, do the operation as an integer, then divide back out by however many factors of 10 you multiplied in to get rid of the decimal.
With division being “destructive” you could potentially wind up splitting a number and landing in endless recursion, so you’d have to set a precision floor to trim to and break out of the loop.
If multiplying out your decimals pushes your number out of 32-bit or 64-bit space, you could always spread it across multiple memory addresses or registers in the CPU and chunk through the numbers long-division style, swapping the significant bits in and out as you go.
There’s probably some specialized formula that could blow all my theorizing away, or a magic lookup table of hacks that’s implemented in hardware.
But you could totally get it narrowed down to some n instructions per float operation, scaling linearly with how massive your input numbers are.
Of course, multiplying a float by 10 is still a floating point operation =(
I’m pretty sure there are bits that store the location of the decimal, and those would let you iterate or mask out the decimal part. But that part might be where the magic of higher math comes in.
The mantissa will speak with you now
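For reference, here’s where that decimal point actually lives: an IEEE 754 single-precision float packs a sign bit, an 8-bit biased exponent (the “location of the decimal”), and a 23-bit mantissa with an implicit leading 1. A quick sketch of pulling them apart:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float f = 6.25f;   /* 6.25 = 2^2 * 1.5625 */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint32_t sign     = bits >> 31;           /* 1 bit  */
    uint32_t exponent = (bits >> 23) & 0xFF;  /* 8 bits, biased by 127 */
    uint32_t mantissa = bits & 0x7FFFFF;      /* 23 bits, implicit leading 1 */

    /* prints: sign=0 exp=129 (2^2) mantissa=0x480000 */
    printf("sign=%u exp=%u (2^%d) mantissa=0x%06X\n",
           sign, exponent, (int)exponent - 127, mantissa);
    return 0;
}
```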