The Linux X86 Journey To Main()

Have you ever had a program crash before your main function executes? It is rare, but it can happen. When it does, you need to understand what happens behind the scenes between the time the operating system starts your program and the time your first line of code in main executes. Luckily, [Patrick Horgan] has a very detailed tutorial on the subject. It doesn’t cover statically linked libraries but, as he points out, if you understand what he does cover, that’s easy to figure out on your own.

The operating system, it turns out, knows nothing about main. It does, however, know about a symbol called _start, which your runtime library provides. That code does some stack manipulation and eventually calls __libc_start_main, which is also provided by the library.

From there, you wind up with some trickery to manage the program’s environment, and more library calls such as __libc_init_first and __libc_init do some more setup work. You’d think that would get you close, but there’s plenty more to do, including setting up for atexit and thunking for position-independent code, not to mention dynamically linked libraries.
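To make the idea of code running on either side of main a little more concrete, here is a minimal sketch, assuming GCC or Clang with glibc on Linux. It relies on the compiler's constructor attribute (an extension, not standard C) and on the standard atexit function. The function and variable names are invented for illustration and do not come from the tutorial.

```c
#include <stdio.h>
#include <stdlib.h>

/* Counter bumped by the hook so the firing order can be inspected later. */
static int hook_counter = 0;
static int constructor_order = 0;   /* 0 means "never ran" */

/* GCC/Clang extension: the startup code runs this before main is called.
   On current toolchains it is typically reached through .init_array. */
__attribute__((constructor))
static void run_before_main(void)
{
    constructor_order = ++hook_counter;
}

/* Registered with atexit(); the C library calls it once main returns
   or exit() is called. */
static void run_after_main(void)
{
    fputs("atexit handler: main has already finished\n", stderr);
}

/* Returns 1 if the constructor has already fired, 0 otherwise. */
int constructor_already_ran(void)
{
    return constructor_order == 1;
}

/* Returns 0 on success, exactly like atexit() itself. */
int register_exit_hook(void)
{
    return atexit(run_after_main);
}
```

If you call constructor_already_ran() as the very first statement of your own main, it should report that the constructor has already fired, which means something ran your code before main got control. Calling register_exit_hook() from main arranges for the handler to run on the way out, after main is done.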

This is one of those topics that seems like something you don’t really need until you do. Even if you use another language to generate executables, they all have to follow these steps somewhere. Granted, for many languages the startup is static and unlikely to require you to debug it, but it is still good to know what’s going on under the hood.

If you want a quick Linux assembly tutorial, have at it. If you prefer to shovel your assembly into a C source code file, you can do that, too.

7 thoughts on “The Linux X86 Journey To Main()”

  1. The info is gcc/glibc specific. If you are using a different compiler or runtime library, the mechanics are not the same. Much of this is dictated by the glibc ABI.

    The ELF loader is a piece of code that gets executed first in your process PID – for dynamically linked ELF binaries. It is responsible for loading all dependent “shared objects” and mapping them into memory, and also for resolving linker relocations – aka adapting your code to the virtual address it was loaded at. The loader also calls an array of “initializer” functions specified in each binary, and finally it passes control to the main binary “entry point”. The loader binary is specified in the ELF file itself. Typically something like /lib/ld-linux.so.2 for 32-bit binaries.

    What is described is the mechanics of the code at the entry point of a binary compiled with gcc and the glibc runtime library. The described code itself is part of the glibc library. If you are using newlib for embedded systems or bionic for Android, the mechanics are slightly different even though the gcc compiler is the same.

  2. Great analysis by the original author and the commenters. Gentlemen and ladies, great job.
    I don’t think ANYONE has EVER put this kind of an analysis of program startup and initialization
    together since VAX/VMS. (My apologies to the SNA fans…)

    1. About a decade ago, the site https://opensecuritytraining.info/Training.html had a class called Life of Binaries, along with many more classes. Now they are back with updated and still-updating content at https://ost2.fyi/ . Added a space in case links can’t be posted here, as I don’t know if my original post went through. I make no money, nor am I affiliated with them (the links). I am only sharing these links to pass on what had been so kindly provided to me before.
      Just throwing it out there, ya know, receive and give back in kind. Thanks to hackaday.com also for being a great site to check every day.

  3. Imagine if the file system from which you paged in the first page of your executable – the one which exec*() was looking at – suddenly got very, very slow… before you paged in all your shared libraries.

    it’s a dark tunnel, at night, and suddenly you are driving on jello
