Someone Set Us Up The Compiler Bomb

Despite the general public’s hijacking of the word “hacker,” we don’t advocate doing disruptive things. However, studying code exploits can often be useful both as an academic exercise and to understand what kind of things your systems might experience in the wild. [Code Explainer] takes apart a compiler bomb in a recent blog post.

If you haven’t heard of a compiler bomb, perhaps you’ve heard of a zip bomb. This is a small zip file that “explodes” into a very large file. A compiler bomb is a small piece of C code that will blow up a compiler, in this case gcc specifically. [Code Explainer] didn’t create the bomb, though; that credit goes to [Digital Trauma].

We aren’t sure what practical use this would have, but it does illustrate a few interesting points. First, the code itself is simple, and you will probably be surprised that it works at all:

main[-1u]={1};

The linker apparently doesn’t care that main is actually a function. You could argue that it doesn’t work because it blows up the compiler, but if you restrict the size to fit in your available memory, it will create an executable. Of course, that executable won’t actually run, but still.
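To see why such a short line is so expensive, it helps to work out the array size. The sketch below assumes a typical 64-bit Linux target where `unsigned int` is 32 bits and `int` is 4 bytes, so `-1u` wraps around to 4294967295. The helper name `bomb_size_bytes` is ours and purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption of this sketch: 4-byte int, as on common desktop targets. */
static_assert(sizeof(int) == 4, "sketch assumes 4-byte int");

/* Hypothetical helper (not from the original post): size in bytes of the
 * array declared by `main[-1u]={1};`, which defaults to element type int. */
uint64_t bomb_size_bytes(void)
{
    uint64_t elements = -1u; /* -1u is unsigned, so it wraps to UINT_MAX */
    return elements * (uint64_t)sizeof(int);
}
```

Under those assumptions `bomb_size_bytes()` returns 17179869180, four bytes short of 16 GiB, all of which has to be emitted as initialized data.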

To us, this seems less an exploit than a strange bug in the linker. The reason it works is that the compiler runs out of memory during the link phase. We didn’t try it, but we wondered if --no-keep-memory or some other option could help. After all, you could imagine the linker being smart enough to initialize the static array in a “streaming” way rather than trying to build it all in memory. It just doesn’t.
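The “streaming” idea can be sketched in a few lines. This is not how GNU ld or the assembler actually work; it only illustrates writing a long run of zeros through a small fixed buffer instead of building it all in memory. The name `write_zeros` and the 64 KiB chunk size are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: append `count` zero bytes to `out` using a small
 * fixed buffer, so memory use stays constant however large `count` is.
 * Returns 0 on success, -1 on a write error. */
int write_zeros(FILE *out, uint64_t count)
{
    static const unsigned char zeros[64 * 1024]; /* zero-initialized */

    assert(out != NULL);
    while (count > 0) {
        size_t chunk = count < sizeof zeros ? (size_t)count : sizeof zeros;
        if (fwrite(zeros, 1, chunk, out) != chunk)
            return -1; /* e.g. the disk filled up */
        count -= chunk;
    }
    return 0;
}
```

Streaming like this would only cap memory use. The output file still needs the full size on disk, so a large enough array fails either way.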

If you are more interested in making gcc produce smaller executables, we don’t blame you.

43 thoughts on “Someone Set Us Up The Compiler Bomb”

  1. For a more interesting program, see
    https://en.wikibooks.org/wiki/C%2B%2B_Programming/Templates/Template_Meta-Programming#History_of_TMP

    It causes the *compiler* to compute prime numbers *during compilation* in the form of an error message. Naturally it does not actually finish compiling.

    Most interestingly, the C++ template designers at first refused to believe that their creation was Turing-complete. This program rubbed their faces in it, and demonstrated they did not understand their creation. Now if something is so complex that the creators can’t understand it, what chance do mere mortals have?

    If C++ is the answer, you need to change the question.

    1. Modern C++ (fourteen) is actually a really different beast – you have to treat it differently and there’s a good language in there. It shares a room with “old” C++ but you can work around it. If it were taught “right”, from fresh, I think a lot of the kickback against C++ would disappear….

      1. Ah, an optimist; you must be young.

        Tools that require workarounds are waving a big red flag warning that Something Is Fundamentally Wrong.

        Workarounds won’t be used. If they can, people will tend to work around using a tool properly. Fools are damn ingenious.

        1. If you seriously believe other languages don’t have debilitating quirks all over, you haven’t been around much. It’s just a matter of choosing your poison, and I prefer the chainsaw and machete, as I can accomplish just about any task with them, really fast and very efficiently (though there might be some collateral damage). Tomy toys and padded screwdrivers are for wussies.

          1. Actually it is neither fast nor efficient to get most working programs.

            As someone that has used C professionally since 1982 (and C++ since 1987), I was gobsmacked at how soon and how many high quality plug-and-play libraries became available for Java within 9 months of its release. In contrast C++ had dismally failed to manage that in 10 years of its release.

      2. I have been using C++ for years (started back when Turbo C++ for dos was a thing) and am currently reading through the latest edition of “The C++ Programming Language” (which covers up to C++11) and I can see just how much better modern C++ actually is.

          1. As they say, “you can write FORTRAN in any language”. No matter how much of an improvement C++14 is, most developers will still write in earlier versions of the language.

            Then there’s all the existing code that simply can’t be changed. Whether new code can use ’14s features when linked with “old” libraries is unclear – and is probably undecidable.

            Then there’s the issue of how long it takes compilers to support the *full* language, as opposed to compiler-specific subsets. With C/C++99 I remember the triumphant announcement of the first full compiler – SIX YEARS LATER. That was another indication that the language was part of the problem rather than part of the solution.

          2. @[Tom G], I was not advocating for C++, but making a joke, which I blew because most of the good stuff (e.g. type inference, lambdas) arrived in C++11. In my defense, my recent C++ work has been on a project that is stranded on a compiler release from 2003.

            As Stroustrup himself has said, “Within C++, there is a much smaller and cleaner language struggling to get out.”

          3. JDX: you illustrate a common problem (existing 2003 codebase) that makes potential benefits somewhat moot!

            I wasn’t aware of Stroustrup’s comment, but it is certainly true. Tony Hoare’s famous comment (see my earlier reply) is valid and relevant.

            Fortunately new /simple/ languages are appearing. Hopefully over the decades C++ will become the new COBOL.

    2. C++ is a tool and if you don’t like it, that’s ok. However, there’s nothing mystical about metaprogramming because it’s been around forever. But if you are all about C, then I got some bad news about the preprocessor: it too is Turing-complete. Lots of things are Turing-complete, even Conway’s Game of Life.

      Honestly, just get over yourself.

      1. You’ve missed the point. C++ is so overly complex that even its designers refused to believe how complex it had become – that’s a sign that it is out of control of even the *experts*, let alone mere mortals.

        As C.A.R. Hoare famously put it, “There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”
        https://en.wikiquote.org/wiki/C._A._R._Hoare

          1. aaah we’re producing linker input, are we? I only was *compiling* the code. Yeah, if you ask the compiler to produce the representation of `main` that would get linked, you simply run out of disk space eventually; to explain:

            The compiler output looks like

            main:
              .long 1
              .zero 17179869176
            

            as the compiler runs fine. The assembler, however, tries to convert that into an object that actually contains that many zeros, to hand over to the linker. Fun time for the whole family!

            test.c:1:1: warning: data definition has no type or storage class
             main[-1u]={1};
             ^~~~
            test.c:1:1: warning: type defaults to ‘int’ in declaration of ‘main’ [-Wimplicit-int]
            test.c:1:1: warning: ‘main’ is usually a function [-Wmain]
            /tmp/ccrzGbyZ.s: Assembler messages:
            /tmp/ccrzGbyZ.s: Fatal error: cannot fill 256 bytes in section .data of /tmp/ccbpdxDa.o because: 'No space left on device'
            
          2. If you use the main[0xFFFFFFFF] version then I get this (after a few minutes)

            /usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `deregister_tm_clones':
            crtstuff.c:(.text+0x1): relocation truncated to fit: R_X86_64_32 against symbol `__TMC_END__' defined in .data section in a.out
            crtstuff.c:(.text+0x8): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
            crtstuff.c:(.text+0x21): relocation truncated to fit: R_X86_64_32 against `.tm_clone_table'
            /usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `register_tm_clones':
            crtstuff.c:(.text+0x41): relocation truncated to fit: R_X86_64_32 against symbol `__TMC_END__' defined in .data section in a.out
            crtstuff.c:(.text+0x49): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
            crtstuff.c:(.text+0x6f): relocation truncated to fit: R_X86_64_32 against `.tm_clone_table'
            /usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `__do_global_dtors_aux':
            crtstuff.c:(.text+0x82): relocation truncated to fit: R_X86_64_PC32 against `.bss'
            crtstuff.c:(.text+0x95): relocation truncated to fit: R_X86_64_PC32 against `.bss'

  2. This article is using the terms “compiler” and “linker” interchangeably. Which one, exactly, does it break?

    Makes perfect sense to me how it would break a linker, but can someone explain to me why the static initializer is needed?

    1. It clearly says the linker is what blows up. If you notice, about 99% of gcc builds out there use gcc as the driver for everything. So you call gcc with a bunch of .o files and it is the “compiler program” that calls ld, the linker.

      Without doing my homework, I think the static init causes the compiler to emit an initialized data segment for the linker. If it were not initialized, I suspect it would go in BSS and the elaboration phase just zeros BSS. So that would then fail at run time linking, not at the link phase. But I’m speculating — I didn’t go do the work to test that.

  3. Why is this worth an article?
    You tell the compiler to emit a huge initialized array that won’t fit into your RAM and the linker dies when it tries to load it (or the assembler when it tries to write the object file). Congratulations, who would have thought that this is possible…
    Btw., the name of the array doesn’t matter.

  4. The QNX 4.1 C compiler had a bug where a null program (an empty file) would generate a program that crashed the OS when executed.
    This is not a bug or a bomb.

  5. i was disappointed that this headline adorns an article that is just “hey allocate a large global array huhhuhuhuh”. also, i haven’t found a compiler bold enough to try yet. even the under-tested niche compiler at my day job says the array is too big for the addressing space.

  6. Mr. Williams, you seriously need to up your game. There are too many foggy misunderstandings in this article. The compiler and linker are two different things for example. And main is just a function by golly, there should be no surprise there, although it does get some special treatment in the usual way things are set up.
    Fundamentally though, this boils down to “tell your computer to do something stupid, and by golly it will”. Like being surprised that compiling for (;;) ; will run for ever (or get optimized away). Calling this a “bomb” borders on yellow journalism.

  7. The deceiving aspect in this silly example is the name main. Here it is just an array, and the name “dog” or “cat” could be chosen just as well. And there is no great surprise that when you declare and initialize an array of immense size that some kind of error results. This whole business gets less interesting the more I look at it.

    And Williams’ write-up only gets worse as he suggests that the linker should set this up in some “streaming way”. Note that the (well written) original article mentions that the array is 17179869180 bytes in size. You can do the math and probably conclude that your machine doesn’t have enough RAM to handle this, or swap space if you have some foggy notion of getting virtual memory to save the day.

    So the bottom line lesson in this? Computers have limits, resources aren’t infinite.

    1. Dude, what has you so torqued up on this? Maybe you should ask HAD to refund your subscription price. Oh yeah. Or maybe you ought to point us to your article. Oh yeah.

      I read a lot of HAD stuff. Some of it is great. Some of it is not to my liking for whatever reason. You know what I do? I just go look at something else. It really is that easy. I agree that the best articles are the ones where they are talking about stuff they did themselves, but I don’t go complaining every time I don’t like some article.

      I am always curious about people like you. Would you go up to these people in person and tell them to >up their game<? If you came to my work and did that I would tell you up yours. I can never figure out if the internet just brings out jerks or brings out the jerk in people but either way, dude. Click on another story and live your life.

      1. Anonymity provides a sense of personal security for those that are insecure.

        This is something I started to notice decades ago when “the Internet” started to become available to the PUBLIC. People hiding behind the anonymous nature of the Internet and saying things they would not normally say to and/or about others.

        Lately, I’ve been running into the same thing with people I meet on a day-to-day basis. I randomly meet someone, we have a great conversation and positive interaction, might even swap phone numbers. Next thing I know, I get feedback from someone else that that “random encounter person” is now back-stabbing me with lies about me. That “random encounter person” is too cowardly to face me and tell me the same thing (or denies they did) but has no problem going out of their way to assassinate my character behind my back. That includes some family members as well! WTF is wrong with people these days? Is it all really just lack of self-image, self-confidence, self-esteem, insecurities --> over-zealous EGO compensation <-- ?? It’s like some global collective-subconscious energy demon has been programmed into people at the subconscious level, taken them over (bypassing their rational conscious thinking) and causes them to act in malicious ways against others. It’s craziness!

        Peace and blessings.

        1. It’s because our ancestors clearly were baboons!!!!

          OK, joking apart…
          Sometimes it is also those who claim that someone is backstabbing you with lies that are lying… so who do you believe anymore?

          I’ve heard stories… i.e. including about someone claiming an abusive relationship only for their ex to go to depression and disappear with their ex-friends hating on them only for the “abuse-victim” to have a recently ex-ed partner for “violence”… however it turned out the “victim” was crying wolf! This was obvious because everyone knew the “victim”s last ex (especially the last ex’s old partner) would never harm anyone (unless attacked, life threateningly… obviously).
          So one person got cast out into suicidal depression and another almost had the same fate due to jealousy, deceit, backstabbing, etc…

          Personally I can’t be asked with such drama… this seems to predominately affect the Anglosphere populace.
          Theresa May!!! Kick me out of your country and forget my long genetically-British history, It has no value!!!

          Peace to the remaining good people :)

          1. Not sure what your comment has to do with my observations. Although you seem to be specifically targeting me and my comment BUT no harm done nor taken. If you are implying; I’m not claiming and never have claimed “victim” as that’s purely an individual and personal perspective and I don’t perceive my self as a “victim”, I’m only observing and verbalizing my observations.

            Yes, much DRAMA going on with folks these days. Not sure if this is a “world-wide” affliction (as it appears to be) as I have yet to travel the world and directly, personally observed other culture’s general behavior. Still, one must look at the psychological aspects of the behavior being exhibited.

            Fortunately, I’ve not seen this phenomenon with any more than about 25% of the folks I randomly meet, but it does concern me since I’ve noticed an increase in the past few years.

            Peace and blessings.

          2. BTW: I’ve never stated nor implied that these people are “bad” because everyone has facets of “good” mixed in with the facets of “not so good”. I’m only observing that their anonymous malicious behavior is well, “not so good”. The “soul” expressing into the physical world through the “personality” or “life” filters as some call it. :)

            Just look at many of the HaD comments on various projects and articles over the years and you will find many “personal attacks” on the HaD author, designer, other commentators, etc., which goes back to my original observation that folks tend to “safely hide” behind the anonymity of the Internet.

            Best to be able to observe without emotional association to the specific “person” and/or following behavioral event. :)

            Peace and blessings.

  8. Quote:
    “What is surprising is that main does not have to be a function. You can throw just about anything called main at the compiler and it will produce a valid executable — but don’t expect much more than a segmentation fault when you run it.”

    Strangely, my program example below mostly debunks the statement: it actually compiled, ran and worked normally until it completed… then the program seg-faults upon exit!!! I got a load more than a mere segfault… lol.

    //Begin rog.cpp

    #include <stdio.h>

    class PROGRAM
    {
    public:
        PROGRAM()
        {
            //fgets replaces the original gets: it is bounded, and gets no longer exists in current C and C++
            printf("\nInsert a word here:");
            fgets(myfavvar, sizeof myfavvar, stdin);
            printf("%s",myfavvar);
            printf("\nAnother?:");
            fgets(myfavvar, sizeof myfavvar, stdin);
            printf("%s",myfavvar);
            printf("\nThis Actually Worked!\n");
        }

        ~PROGRAM()
        {
            //we’ll never make it here!! SegFaults trying to find main!
        }

    private:
        char myfavvar[255];
    };

    //main is an object here, not a function
    PROGRAM main;

  9. I haven’t tried your code, but I suspect you are running into C++ name decoration so the symbol main in your code won’t be main in your object file. It will be __main@12345 or something like that. So at link time, the linker is looking for _main and can’t find it. There is probably a weak main() in one of the standard libraries so you pick that up instead. So you do elaboration which should include the constructor, then do nothing, then do the destructor. I am not sure why you got a segfault at all with this unless printf and friends (which includes malloc) isn’t initialized during elaboration. I’m not sure about that.

    So if you want to be pedantic, you can throw anything at the linker that generates the same symbol as main() does and it doesn’t care if it is a function. You didn’t do that. You only appeared to do that at your layer of abstraction.

    1. It segfaults for the same reason as why the quoted person’s program segfaults.
      I made a type (class) of PROGRAM and declared the object main as an object of type PROGRAM.
      This is the same(ish) as declaring an object main of type int.
      The rule exploited here is “Global variables are initialized first”, therefore Program main; is initialized first with constructor PROGRAM().
      Because the variable (object) is declared and global, the object/var stays constructed… thus a call to main is made… There is no int main() to point to and the only other main is at another location (PROGRAM main;) where the contents are in its own context (object) and thus boundary (segmentation).

      int A=1;
      int B;
      won’t make:
      int B == 1;
      or we obviously have bad hardware (segmentation fault).
      I’ll dump another code piece shortly after this so you can see when main is called during global variable initialization…

      1. //Begin obfiscate.cpp

        #include <stdio.h>

        class PROGRAM
        {
        public:
            PROGRAM()
            {
                //runs during global initialization, before main()
                printf("\nConstructed object of PROGRAM");
            }

            ~PROGRAM()
            {
                //runs after main() returns
                printf("\nDestructed object of PROGRAM\n");
            }
        };

        PROGRAM program;

        int main()
        {
            printf("\nIn main!");
            return 0;
        }

        //EOF

        Output:

        Constructed object of PROGRAM
        In main!
        Destructed object of PROGRAM

        Try this and the other example to see ;-)

  10. This code is invalid in both C and C++. Any chance of “compilability” would be based on language extensions implemented by the compiler. These extensions have nothing to do with C or C++. So, even if you somehow managed to blow up the compiler with this, it still has nothing to do with C++. I’m pretty sure the compiler would’ve rejected it outright if you properly configured it to enforce the rules of C++ language.
