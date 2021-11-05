Have you ever had a program crash before your
main function executes? It is rare, but it can happen. When it does, you need to understand what happens behind the scenes between the time the operating system starts your program and the time the first line of code in main executes. Luckily, [Patrick Horgan] has a very detailed tutorial on the subject. It doesn’t cover statically linked libraries but, as he points out, if you understand what he does cover, that’s easy to figure out on your own.
The operating system, it turns out, knows nothing about main. It does, however, know about the program’s entry point, which by default is a symbol called _start. Your runtime library provides this. That code does some stack manipulation and eventually calls __libc_start_main, which is also provided by the library.
From there, you wind up with some trickery to manage the program’s environment, and more library calls such as __libc_init_first and __libc_init do some more setup work. You’d think that would get you close, but there’s plenty more to do, including setting up for atexit and thunking for position-independent code, not to mention dynamically linked libraries.
This is one of those topics that seems like something you don’t really need to know until you do. Even if you use another language to generate executables, the result still has to follow these steps somewhere. Granted, for many languages the startup is static and unlikely to require you to debug it, but it is still good to know what’s going on under the hood.
If you want a quick Linux assembly tutorial, have at it. If you prefer to shovel your assembly into a C source code file, you can do that, too.
One thought on “The Linux X86 Journey To Main()”
The info is gcc/glibc specific. If you are using a different compiler or runtime library, the mechanics are not the same. Much of this is dictated by the glibc ABI.
The ELF loader is the first piece of code that gets executed in your process for dynamically linked ELF binaries. It is responsible for loading all dependent “shared objects”, mapping them into memory, and resolving linker relocations, that is, adapting your code to the virtual address it was loaded at. The loader also calls an array of “initializer” functions specified in each binary, and finally it passes control to the main binary’s “entry point”. The loader binary is specified in the ELF file itself. Typically it is something like /lib/ld-linux.so.2 for 32-bit binaries.
What is described is the mechanics of the code at the entry point of a binary compiled with gcc and the glibc runtime library. The described code itself is part of the glibc library. If you are using newlib for embedded systems or bionic for Android, the mechanics are slightly different even though the gcc compiler is the same.