Now that we've learned that compilation is not just a simple process, lets try to see what is the complete list of steps taken by the compiler in order to compile a C program.
- Driver - what we invoked as "gcc". This is actually the "engine", that drives the whole set of tools the compiler is made of. We invoke it, and it begins to invoke the other tools one by one, passing the output of each tool as an input to the next tool.
- C Pre-Processor - normally called "cpp". It takes a C source file, and handles all the pre-processor definitions (#include files, #define macros, conditional source code inclusion with #ifdef, etc.) You can invoke it separately on your program, usually with a command like:
$gcc -E code1.c
Try this and see what the resulting code looks like.
- The C Compiler - normally called "cc1". This is the actual compiler, that translates the input file into assembly language. As you saw, we used the "-c" flag to invoke it, along with the C Pre-Processor, (and possibly the optimizer too, read on), and the assembler.
- Optimizer - sometimes comes as a separate module and sometimes as the found inside the compiler module. This one handles the optimization on a representation of the code that is language-neutral. This way, you can use the same optimizer for compilers of different programming languages.
- Assembler - sometimes called "as". This takes the assembly code generated by the compiler, and translates it into machine language code kept in object files. With gcc, you could tell the driver to generated only the assembly code, by a command like:
$gcc -S code1.c
- Linker-Loader - This is the tool that takes all the object files (and C libraries), and links them together, to form one executable file, in a format the operating system supports. A Common format these days is known as "ELF". On SunOs systems, and other older systems, a format named "a.out" was used. This format defines the internal structure of the executable file - location of data segment, location of source code segment, location of debug information and so on.
As you see, the compilation is split in to many different phases. Not all compiler employs exactly the same phases, and sometimes (e.g. for C++ compilers) the situation is even more complex. But the basic idea is quite similar - split the compiler into many different parts, to give the programmer more flexibility, and to allow the compiler developers to re-use as many modules as possible in different compilers for different languages (by replacing the preprocessor and compiler modules), or for different architectures (by replacing the assembly and linker-loader parts).
The explanations given here are specific to the "gdb" debugger, since there are no real standards regarding the activation and usage of debuggers, but once you know what features to expect from a debugger, it's not too hard to adapt your knowledge to different debuggers.
Cem Ozdogan
2011-02-14