On February 25, 1991, during the Gulf War, a Scud missile fired from Iraqi positions hit a US Army barracks in Dhahran, Saudi Arabia. A defense was available: Patriot missiles had intercepted Iraqi Scuds earlier in the year, but not on this day.
The computer controlling the Patriot battery in Dhahran had been operating for over 100 hours when the Scud was launched. The computer’s internal clock counted in tenths of a second; that count was multiplied by 1/10 to get seconds, and the result was stored in a 24-bit register. The binary representation of 1/10 is non-terminating, so chopping it down to 24 bits introduced a small error. This error grew slightly with every tick of the clock, and after 100 hours, the system clock of the Patriot missile system was 0.34 seconds off.
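The arithmetic behind that 0.34 second figure can be reproduced in a few lines. This is only a sketch: it assumes the chopped constant kept 23 bits after the binary point (an assumption that reproduces the figure quoted above, not a detail given in this article), and the function names are made up for illustration.

```python
from fractions import Fraction

FRACTION_BITS = 23        # assumption: bits kept after the binary point
TICKS_PER_SECOND = 10     # the clock counts tenths of a second

def chopped_tenth(bits=FRACTION_BITS):
    """Return 1/10 truncated (not rounded) to `bits` binary places."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)

def clock_drift_seconds(hours, bits=FRACTION_BITS):
    """Accumulated timing error, in seconds, after `hours` of uptime."""
    ticks = hours * 3600 * TICKS_PER_SECOND
    error_per_tick = Fraction(1, 10) - chopped_tenth(bits)
    return float(error_per_tick * ticks)

drift = clock_drift_seconds(100)
print(f"drift after 100 hours: {drift:.4f} s")
print(f"distance covered at 1,600 m/s: {drift * 1600:.0f} m")
```

Under these assumptions the drift comes out at roughly a third of a second, which at Scud speeds is roughly half a kilometer. Each individual tick is off by less than a ten-millionth of a second; the damage comes from adding that error up 3.6 million times.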
A Scud missile travels at about 1,600 meters per second. In one third of a second, it travels about half a kilometer, well outside the “range gate” that the Patriot tracked. The Patriot failed to intercept the Scud, which struck the barracks, killing 28 and wounding 100 others. It was the first time a floating point error had killed a person, and it certainly won’t be the last.
Floating point numbers have been around for as long as digital computers; the Zuse Z1, the first binary, programmable mechanical computer, used a 24-bit number representation that included a sign bit, a seven-bit exponent, and a sixteen-bit significand, the progenitor of today’s floating point formats.
Despite such a long history, floating point numbers, or floats, have problems. Small programming errors can creep in from a poor implementation. Floats do not guarantee identical answers across different computers. Floats underflow, overflow, and have rounding errors. Floats substitute a nearby number for the mathematically correct answer. There is a better way to do floating point arithmetic. It’s called a unum, a universal number that is easier to use, more accurate, less demanding on memory and energy, and faster than IEEE standard floating point numbers.
The End of Error
In The End of Error: Unum Computing, [John L. Gustafson] begins his case for a superset of floating point arithmetic with a simple number system of integers expressed with just five bits. The number 11111 represents +16, the number 01111 represents zero, and 00000 represents -15; in other words, each value is the five bits read as an unsigned integer, minus fifteen. Adding 10000 (+one) and 10001 (+two) gives 10010 (+three). This is simple binary arithmetic.
What happens when 10110 (+seven) and 11001 (+ten) are added together? The answer, obviously, is seventeen. How is this represented in the five-bit number format? It can’t be, because the largest value is +16. If this data type were implemented in any computer, the sum would overflow and wrap around to 00000 (-fifteen), and while the answer would be wrong, nothing of significance would happen until that answer leaked out into the real world.
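The five-bit system is small enough to model directly. The sketch below assumes the encoding implied by the stated endpoints (00000 is -15 and 11111 is +16, so a value is its five bits read as an unsigned integer, minus fifteen) and assumes that overflow silently wraps around; the helper names are hypothetical.

```python
BITS = 5
BIAS = 15                   # 00000 -> -15, 01111 -> 0, 11111 -> +16
MASK = (1 << BITS) - 1      # keeps only the low five bits

def encode(n: int) -> str:
    """Encode an integer as five bits, silently wrapping on overflow."""
    return format((n + BIAS) & MASK, "05b")

def decode(bits: str) -> int:
    """Decode a five-bit string back into an integer."""
    return int(bits, 2) - BIAS

def add(a: str, b: str) -> str:
    """Add two five-bit values; out-of-range sums wrap around."""
    return encode(decode(a) + decode(b))

print(decode(add(encode(1), encode(2))))    # in range: 3
print(decode(add(encode(7), encode(10))))   # out of range: wraps to -15
```

Nothing in `add` signals that the second result is wrong. The caller gets a perfectly valid five-bit value back, which is exactly why this kind of error goes unnoticed until it matters.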
IEEE floats have exceptions and tests for conditions like overflow, but a result calculated with floats will not always be the exact result. This is the underlying problem of floating point arithmetic: giving exact results is too hard, so floats give an inexact result instead.
[Gustafson]’s solution to this problem is a superset of IEEE floats.
IEEE floating point numbers have just three parts: a sign, an exponent, and a fraction. A 32-bit floating point number is capable of expressing magnitudes between about 10^-38 and 10^38, with about seven decimal digits of accuracy.
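To see the three parts, a 32-bit float can be pulled apart with Python’s standard `struct` module. The sketch relies on the IEEE 754 single precision layout (1 sign bit, 8 exponent bits, 23 fraction bits); the function name is made up for illustration.

```python
import struct

def float32_fields(x: float):
    """Split x, stored as an IEEE 754 32-bit float, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 bits, leading 1 is implicit
    return sign, exponent, fraction

print(float32_fields(1.0))        # (0, 127, 0)
print(float32_fields(-0.15625))   # (1, 124, 2097152)

# 0.1 does not survive a round trip through 32 bits unchanged.
(stored,) = struct.unpack(">f", struct.pack(">f", 0.1))
print(stored == 0.1)              # False
```

The last line is the Patriot problem in miniature: the fraction field has a fixed width, so 1/10 gets replaced by a nearby value, and nothing in the 32 bits records that this happened.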
Unums, on the other hand, are a superset of IEEE floating point numbers. They include three additional pieces of metadata: a ubit, the size of the exponent, and the size of the fraction.
The example [Gustafson] gives for the utility of the ubit is just a question: what is the value of π? It’s 3.141592653, and that’s good enough for most calculations. Pi continues, and we signify that by adding an ellipsis on the end – “3.141592653…” Now, what is the value of 7/2? That’s “3.5” – no ellipsis on the end. For hundreds of years, humans have been able to mark whether a number is exact or continues on, and computers haven’t caught up yet. The ubit is the solution to this problem: a single bit that denotes whether the fraction is exact, or whether there are more non-zero bits beyond the ones stored.
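The idea of the ubit can be sketched without the rest of the format. The function below is a simplified illustration, not the actual unum bit layout: it truncates a non-negative exact value to a fixed number of binary places and reports a ubit of 1 whenever something was cut off. The function name and the choice of four fraction bits are assumptions made for the example.

```python
import math
from fractions import Fraction

def truncate_with_ubit(value: Fraction, fraction_bits: int):
    """Truncate a non-negative value to `fraction_bits` binary places.

    Returns (stored_value, ubit). A ubit of 0 means the stored value is
    exact; a ubit of 1 means non-zero bits were dropped, so the true
    value lies strictly between stored_value and the next step up.
    """
    scale = 2 ** fraction_bits
    scaled = value * scale            # still exact, thanks to Fraction
    kept = math.floor(scaled)         # drop everything past the last bit
    ubit = 0 if kept == scaled else 1
    return Fraction(kept, scale), ubit

print(truncate_with_ubit(Fraction(7, 2), 4))    # (Fraction(7, 2), 0)
print(truncate_with_ubit(Fraction(1, 10), 4))   # (Fraction(1, 16), 1)
```

The first result is the “3.5” case: exact, no ellipsis. The second is the “…” case: 1/16 is only a lower bound, and the ubit says so instead of letting the truncated value pass as the real one.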
Exponent Size Size and Fraction Size Size
The two additional pieces of metadata [Gustafson] adds to the floating point number format are the size of the exponent and the size of the fraction. Each is simply the number of bits that the exponent or fraction of a particular number actually uses. The “size size” is one level up: the number of bits required to store the largest exponent size or fraction size. For example, if the value of the fraction is 110101011, the fraction size is nine bits, which is 1001 in binary. The fraction size size, therefore, is four bits, the number of bits required to express a fraction size of nine.
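The worked example above is short enough to check mechanically. This sketch follows the simplified description in this article (the size is written out directly in binary) rather than the exact field encoding in the book, and the function name is hypothetical.

```python
def size_fields(fraction_bits: str):
    """Return (fraction size, that size in binary, fraction size size)."""
    fraction_size = len(fraction_bits)            # bits used by the fraction
    size_in_binary = format(fraction_size, "b")   # the size, written in binary
    fraction_size_size = len(size_in_binary)      # bits needed to store the size
    return fraction_size, size_in_binary, fraction_size_size

print(size_fields("110101011"))   # (9, '1001', 4)
```

The same calculation applies to the exponent. The point is that the metadata stays small: four bits of fraction size are enough to describe any fraction from one to fifteen bits long.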
By only storing what is needed in the exponent and fraction, the average size of each unum is decreased, even if in the worst case the unum is larger than a quad precision floating point number. Accuracy is retained, and even larger numbers than what a quad can handle can be represented.
If dealing with numbers that vary the length of their data structure sounds ludicrous, think of it this way: we’re already dealing with at least four different floating point sizes, and conversions between them can have disastrous consequences.
In the first launch of the Ariane 5 rocket, engineers reused the inertial reference system of the earlier Ariane 4. The Ariane 5 was a much larger and much faster rocket, and its higher horizontal velocity caused a data conversion from a 64-bit float to a 16-bit signed integer to overflow. A unum would have prevented this.
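The hazard in that conversion is easy to illustrate. The sketch below is not the Ariane flight software and does not model how it failed; it only shows, with hypothetical helper names, what it looks like to check a 64-bit float against the range of a signed 16-bit integer before narrowing it.

```python
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1   # -32768 .. 32767

def to_int16_checked(x: float) -> int:
    """Convert a finite float to a 16-bit integer, refusing values that don't fit."""
    n = int(x)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in a signed 16-bit integer")
    return n

def to_int16_saturating(x: float) -> int:
    """Convert a finite float to a 16-bit integer, clamping at the limits."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))

print(to_int16_checked(1234.9))      # 1234
print(to_int16_saturating(1.0e6))    # 32767
try:
    to_int16_checked(40000.0)
except OverflowError as err:
    print("rejected:", err)
```

Either policy forces a decision about out-of-range values at the point of conversion. Which one is appropriate depends on the system; the sketch only shows that the range of the source type is far wider than the destination can hold.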
There are considerable downsides to unums: since the Zuse Z1, continuing through the first math coprocessors of the 80s, and even today, chip designers have put floating point units in silicon. Your computer is more than capable of handling floating point numbers, poorly or not.
Technologically, we’re at a local minimum. Unums are a far better choice than floats for representing numbers, but they come at a cost. Any implementation of unums eventually falls back on software, rather than on the 30 years of chip design that has followed the introduction of IEEE floating point. Unums may never be implemented in commercial hardware, but they would solve a lot of problems.