Compiling Four Billion If Statements

With modern tools, you have to try very hard to do something stupid, because the tools (rightly) recognize you’re doing something stupid. [Andreas Karlsson] can speak to that first hand as he tried to get four billion if statements to compile.

You may ask what state space requires four billion comparisons to evaluate? The answer is easy: the range of an unsigned 32-bit integer. The whole endeavor started with a simple idea: what if instead of evaluating whether an integer is even or odd with a modulo or bit mask, you just did an if statement for every case? Small ranges like 0-10 are trivial to write out by hand, but you reach for more automated solutions as you pass 8 bits and move towards 16. [Andreas] wrote some Python that outputs a valid C program with all the comparisons. For 16 bits, the source only clocks in at 130k lines with the executable less than 2 MB.

Of course, scaling to 32 bits is a very different problem. The source file balloons to 330 GB, and most compilers barf at that point. Undeterred, [Andreas] modified the Python to output x86_64 assembly instead of C. Of course, the executable format of Windows (PE) only allows executables up to 4 GB, so a helper program mapped the 40 GB generated executable and jumped into it.

What’s incredible about this whole journey is how performant the program is. Even large numbers complete in a few seconds. Considering that it has to thrash 40 GB of an executable through memory, we can’t help but shake our heads at how even terrible solutions can work. We love seeing someone turn a bad idea into an interesting one, like this desoldering setup.

44 thoughts on “Compiling Four Billion If Statements

  1. Older class AMD GPUs with DX10 class hardware can only execute loop with 32 bit counter and don’t have actual jump, only rigid loop and if/else constructs in shader instruction set.
    So when used for GPGPU they cannot truly implements infinite loop, so the best next thing the compiler does is 4 billion iteration loop. It completes in few second (when empty). So they can finish infinite loops in few seconds ;-)

    1. Isn’t this still the case for all GPUs? The shader units in a group can only execute the same instruction, one can’t branch while the others don’t. Groups are made up of 32-64 shader units or something around that number.
      If a single shader unit would be able to branch independently that would mean it needs its own code memory, instruction fetch, decode and program counter. Basically it would be a full processor. If all shaders in a group run the same code then only one code mem, fetch, decode, program counter unit is needed for all the shader units in that group. 16000 independent processors on a single die would eat up a lot more area.

      1. No, a wavefront on modern GPUs can run forever in infinite loop.
        It has been possible on nVidia G80 arch from the very beginning.
        AMD stopped implementing DirectX spec to the letter (including the rigid shader structure) starting with Southern Islands. Where basically scalar unit (which does flow control) can run any general purpose code.
        Infinite loops have nothing to do with how massively parallel GPU is or divergence between pixels, It is just about instruction set.

  2. An important note is that he obviously turned off optimizations even before resorting to machine code: I’m kinda curious as to the compiler behavior as n grows. Would it actually be able to recognize the modulo operation for the elements compared? Or at least convert it into a jump table of some sort?

    1. yeah! the sequential if() on a single variable can be translated to a switch table. the identical code for each of the two possible results can be collapsed into just two destination labels. the switch table branch code can take into account the bitwise pattern. it’s very possible for a compiler to reduce it to the proper if (number&1).

      one challenge is to actually chain them all together, to take care of all of the details that inevitably foul up the next step in the optimization. the other challenge is to implement it in a reasonably performant way. for example, searching for bitwise operations that will match a pattern of “case x” constants can easily be exhaustively slow, and isn’t rewarding at all very often. i don’t think i’ve ever seen a compiler that would recognize an even/odd pattern of case labels, though i’ve seen plenty of other neat bitwise tricks in generated code.

      if i was spending a lot of time on conditional optimization, i think there’s a lot more low hanging fruit converting … } else if (!strcmp(opt, “-flag”)) { … chains into a sort of lexer, maybe a series of nested switch tables. :)

  3. There must be something wrong with me.

    I immediately thought of how to reduce the average execution time, by a nested tree:

    if (number == 0x7ffffff) printf(“odd\n”);
    else { if (number < 0x7fffffff) {
    if (number == 0x3fffffff) printf(“odd\n”);
    else { if (number == 0x1fffffff) printf(“odd\n”);

    } else /* number > 0x7ffff */
    if (number = 0xc0000000) printf(“even\n”);
    else { if (number > 0xc0000000) printf(“even\n”);

    Maximum of 63 (I think) comparisons instead of maximum 4 billion.

    1. Nice, but I prefer the elegant recursive version:

      int iseven(int num) {
      if (num == 1) return 0;
      return ! iseven(num – 1);
      }

      For positive integers only, naturally; anything else is a luxury. With a q&d test with MSVC (too lazy to ssh to a Linux VM to try under GCC) compiling to a 64-bit Windows executable in debug mode, this can handle values as high as 16106 before overflowing the stack and crashing (which it does silently, in typical awesome Microsoft fashion). That should be large enough for anyone.

    1. There is, of course! You could write the program with a for loop instead, like this:

      void evenorodd() {
      int I;
      for (I = 0 ; I < INT_MAX ; I += 2) {
      if (number == I) {
      printf(“even\n”);
      return;
      }
      }
      printf(“odd\n”);
      }

      This is much shorter and efficient!

    1. There is, of course! You could write the program with a for loop instead, like this:

      void evenorodd() {
      int I;
      for (I = 0 ; I < INT_MAX ; I += 2) {
      if (number == I) {
      printf(“even\n”);
      return;
      }
      }
      printf(“odd\n”);
      }

      This is much shorter and efficient!

  4. Except they didn’t “compile” it.
    They wrote a script to generate intentionally bloated assembly.

    A modern compiler, if able to load the dumb source code of that huge file, would optimize it into something reasonable like a binary tree(or something more efficient. But even non coders know what a binary tree is even if they don’t know what it is called).

    This is the same kind of trickery as a compiler unrolling a loop into 40k if statements. Or the opposite, changing a variable and functions into a constant, because the output always ends up the same based on other code.

    The difference here is the coder intentionally generated bad assembly for… reasons? I guess just to see if they could get it to run?
    Perfectly good reason to do it, but not exactly useful.

    This is one of those “why did you climb that mountain?” Because it was there” reasons. Nothing gained but the experience.

    1. Come on, just upgrade your computer :-p. In the XXI century, 40 GB of RAM wasted on a silly program with 4 billion IFs and not a single else statement (or return, for that matter) is, or will soon become, the norm. Or – wait, there’s a solution: put it in the cloud and pay only when you use it.

  5. “What’s incredible about this whole journey is how performant the program is. Even large numbers complete in a few seconds.”

    One shouldn’t be too surprised. Windows seems to contain similar high quality code and it still runs reasonably well, doesn’t it?

  6. Sad to see 330GB source failling in GCC, try minifying it, and macro for printf, and shorter var name.

    PS. what happened to HD comment module? It doesn’t remember me and cannot use single login.

  7. Makes me nostalgic for the first text adventure I ever wrote in BASIC. I didn’t know about flow control or subroutines, so it was one enormous loop:

    10 INPUT I$
    20 IF I$ = “go north” AND POSITION = 1 AND HEALTH > 50 AND SWORD = 1 THEN PRINT “You enter the castle”

    10000 GOTO 10

  8. I cant believe Im not the only one working on scalable integer size approaches to solutions. I do algorithm checks for fun, and Ive started making sure my code can handle arbitrary sized integers. The research paper I based this idea on only covers 16, 32, 64bit cases, but Ive extended it to any integer with bitmasking.

  9. But for the last 10+ years we have 64 bit numbers and addressing (although that seems to be limited to 56 bits or so). So there is plenty of room left to heat the universe even more.

    Or maybe it was 20 years and it was only the raspberry foundation that kept the 32 bitters alive for another 10 years.

    Gosh, at the moment it would be quite limiting to only have a 32 bit (4GiBi) address space.

  10. Have I missed something? Or have I indeed missed since else do this exact same comment but…..
    Surely if you check the least significant bit of the number, that would instantly tell you whether it’s odd or even? (Ie. If bit 1 is zero, then it’s an even number, likewise, if bit 1 is odd, then it’s an odd number
    Or is this the usual method? And even this is slower than trailing through all those if statements?

    Feel free to call me a dumbarse if I have missed something 😂

    1. You missed this part:
      what if instead of evaluating whether an integer is even or odd with a modulo or bit mask, you just did an if statement for every case?
      There was no problem solving – just curiosity.

    2. Once I saw an algorithm to calculate the factorial of a number. But it had some twists in it.
      First, it was a recursive algorithm calling itself. This is common enough, as calculating factorials is often used as an example for recursive algorithms.

      Second, for calculating the factorial of a number, it did not multiply the factorial of the previous number with the last (next?) number, but it re-calculated the factorial of the previous number, and as any recursive algorithm, it had to recalculate the factorials of the numbers before that to get to that factorial. This resulted in an explosion of excursiveness (is that quadratic or exponential), but it did come with the right answer at the end. And it even had a useful application too.

      It was used in an example application for writing multi threaded applications, and in that example a “work thread” was needed that kept itself busy for a while and then reported back to the GUI thread when it had a result. And that factorial work thread sure kept itself busy!

  11. I recall some drama about an indie game that had it’s source doe leaked. and all the uninformed people were ragging on the amount of if statements blaming them instead of all the assets with bloated polygon counts for performance issues. Compilers are smart, it’s more about best practices and legibility than performance.

Leave a Reply

Your email address will not be published. Required fields are marked *

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.