Hash Functions With The Golden Ratio

In the realm of computer science, it’s hard to go very far without encountering hashing or hash functions. The concept appears throughout security, from encryption to password storage to cryptocurrency, and more generally wherever large or complex data must be mapped efficiently to a smaller, fixed-size set. Hashing makes looking up data much faster for a computer than performing a search, and it can be incredibly powerful when mastered. [Malte] did some investigation into hash functions and found a method called Fibonacci hashing that not only seems to have been largely forgotten but also speeds up this lookup process even further.

In a typical hashing operation, the data is transformed in some way, and part of this new value is used to store it in a specific location. That second step is often done with an integer modulo function. The problem with any hashing operation is that two different pieces of data can end up with the same value after the modulo operation is performed, resulting in those two pieces of data being placed at the same point. The Fibonacci hash, on the other hand, uses the golden ratio rather than the modulo function to map the data to its final location, resulting in far fewer collisions like these while also being much faster. As a consequence of being based around Fibonacci numbers, it also appears to do a better job of using the smaller fixed-size set evenly, as long as the input data doesn’t itself contain a large number of Fibonacci numbers.
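To make the two mapping steps concrete, here is a minimal sketch in C++ (the function names and the choice of a power-of-two table size are assumptions for illustration, not code from [Malte]’s write-up):

```cpp
#include <cstdint>
#include <cstddef>

// 2^64 divided by the golden ratio, rounded to an integer (an odd constant).
constexpr uint64_t kGoldenRatio64 = 11400714819323198485ull;

// Conventional mapping step: integer modulo by the table size.
size_t modulo_index(uint64_t hash, size_t table_size) {
    return static_cast<size_t>(hash % table_size);
}

// Fibonacci hashing: multiply by 2^64/phi and keep the top k bits,
// giving an index into a table of 2^k slots. The shift replaces the
// comparatively expensive integer division of the modulo step.
size_t fibonacci_index(uint64_t hash, unsigned k) {
    return static_cast<size_t>((hash * kGoldenRatio64) >> (64 - k));
}
```

Because the multiplier is 2^64 divided by the golden ratio, incrementing the key moves the resulting index roughly 61.8% of the way around the table, which is where the notably even spread comes from.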

Going through the math that [Malte] works through in his write-up shows that, at least as far as the mapping step of a hash function is concerned, the Fibonacci hash performs much better than integer modulo. Some of the comments mention that it’s a specific case of a more general method called multiplicative hashing. For anyone using hash functions in their code, it might be worth a look either way; [Malte] admits to not knowing everything about this branch of computer science, but he still goes into an incredible amount of depth on this specific method. If you’re more of a newcomer to the topic, take a look at this person who put an enormous bounty on a Bitcoin wallet, which shows why reversing a hash is so hard.

9 thoughts on “Hash Functions With The Golden Ratio”

  1. “…just as long as the input data doesn’t have a large number of Fibonacci numbers themselves…”

    Given that the Fibonacci Sequence–meaning Fibonacci numbers, of course–appears so regularly and is so pervasive in the physical–and mathematical–world, this could prove highly problematic.

  2. “Fibonacci hashing solves both of these problems. 1. It’s really fast.”

    Exactly a feature I do NOT want in a (cryptographic) hash function. I want slow and secure. Without a doubt this still has very interesting uses, as the author implied, e.g. maybe searching distributed hash tables.

    1. The linked article kinda explains this – this is a good “hash” function for mapping a large set to a small set, but not a cryptographic hash, which is often what we mean. This confusion in terms is probably why it’s been ignored.

    2. No, for a cryptographic hash function, you need it to be fast and secure. Not slow. Because hashing is omnipresent in signatures and integrity checks, if they are slow, they risk slowing down your entire system, especially if they are used at a low level.
      If you need slowness in hashing, e.g. for password hashing, then use dedicated algorithms such as Argon2 or even bcrypt.

      1. I don’t think you want a password hash to be slow (at least not algorithmically slow). Fundamentally, the entire time you’re computing a hash it’s a security vulnerability to some degree. If you intentionally slow it down by randomly adding waits to defeat things like glitching or power-consumption attacks, that’s one thing, but if there are a ton of steps I’m not sure that doesn’t make it less secure fundamentally.

        1. You need a slow hashing function (and salts) for passwords in order to mitigate the generation of rainbow tables / brute-force matching against a stolen password database.

  3. As someone for whom this is terra incognita, I appreciate the long and (by local standards) detailed introduction to the topic of hashing. Now I can be slightly more informed when I read my next introduction. :)

  4. This has been bugging me since I read it and I guess I have to say something.

    The reason this isn’t used is not that people don’t know about it. I assure you the people working on our standard runtimes know about all the reasonable options. I would assume the reasoning they use is that most consumers of hash tables don’t have lookup performance quite so much in their hot path, and there’s risk associated with this method, so it’s a bad tradeoff. You need to know something about the data to apply it safely. If you just blindly do a multiply-and-mask, then you get problems — this is mentioned in what I think is an overly cursory way in the article when they say “in order to use this you have to be certain that all the important information is in the lower bits”.

    If you aren’t careful about that you get a critical failure where everything hashes to the same bin! Imagine you didn’t, and you have a bunch of numbers that are multiples of the hash table size — they will always map to bin 0, because the multiplication only mixes leftwards, and then because the table is a power-of-2 size the integer truncation will not help you at all. (There’s a short sketch of this failure mode below, after the comments.)

    I knew about this trick already; I use it when I’m implementing my own hash tables, but when I do that I’m explicitly pairing the hash algorithm to the data I expect to see. J. Random Hacker who isn’t a data-structures expert doesn’t need that headache, and can afford a couple of extra nanoseconds for the integer modulus.

    Realistically — if the performance of your hash table is that critical in your algorithm, fine, survey the landscape of different hash table implementations. Don’t just use the standard library. Consider building your own. I don’t think it’s right to suggest the standard library implementers were naive, though. They have priorities besides just making that one step in the lookup super fast!

    As far as I know, there’s nothing particularly special about the golden ratio when you’re doing this kind of hashing, you just want something relatively prime (i.e. 2 should not be a factor) with a reasonable number of bits set so you get some mixing. That “spread out” pattern will appear whenever you do this repeated multiply-modulo with relatively prime numbers. I suspect there is a bit of myth going on here, because I don’t see any real justification of the properties (and what I do see doesn’t make sense).

    I don’t have Knuth on my shelf (I know, for shame) but I have to imagine the “Something related to CRC hashes” is a reference to Galois field arithmetic, which is used in CRCs, and actually this is really significant, because more recent CPUs have GF multiplication instructions that could actually give the benefit that this article is looking for.

    Anyhoo. Off my soapbox. Be careful out there folks.
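As a rough illustration of the failure mode described in comment 4 above, here is a minimal sketch, assuming a power-of-two table and 64-bit keys (the constant and function names are only for this example): multiplying and then keeping the low bits sends every key that is a multiple of the table size to bin 0, because the multiplication only carries information leftwards, while keeping the high bits, as Fibonacci hashing does, still spreads those keys out.

```cpp
#include <cstdint>
#include <cstdio>

constexpr uint64_t kGoldenRatio64 = 11400714819323198485ull; // ~2^64 / phi

// Pitfall: multiply, then AND off the LOW k bits. Multiplication only
// carries information to the left, so any key with k trailing zero bits
// (e.g. a multiple of the table size 2^k) produces index 0.
uint64_t low_bits_index(uint64_t key, unsigned k) {
    return (key * kGoldenRatio64) & ((1ull << k) - 1);
}

// Fibonacci hashing keeps the HIGH k bits, where the mixing accumulates.
uint64_t high_bits_index(uint64_t key, unsigned k) {
    return (key * kGoldenRatio64) >> (64 - k);
}

int main() {
    const unsigned k = 10; // a 1024-slot table
    for (uint64_t i = 1; i <= 4; ++i) {
        uint64_t key = i << k; // multiples of the table size
        std::printf("key=%llu  low-bits bin=%llu  high-bits bin=%llu\n",
                    (unsigned long long)key,
                    (unsigned long long)low_bits_index(key, k),
                    (unsigned long long)high_bits_index(key, k));
    }
    return 0;
}
```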
