Compilation Process

Every time I talk about a program, I mention compilation and execution. But what exactly happens when I say compilation and execution? We write a program in Dev-C++, save it, and then compile it. Internally, the compiler runs a series of steps that produce an executable file the operating system can understand, so that we can run (execute) it.

The program we write follows the standards/syntax specified for C; only then can a C compiler understand it. These standards are designed so that developers can read and write programs easily. The computer, however, understands only 0s and 1s, which would be difficult for developers to deal with directly. It doesn't know what "printf" or "scanf" means until the program is converted into some other form the computer can understand. This translation of our source program into machine-understandable object code is done by the compiler. But will it blindly convert whatever we write? Definitely not. So let's look at the steps that take place.

Step 1: Lexical Analysis (also called Linear Analysis or Scanning): All the statements in the C program are broken down into tokens.
There are 5 types of tokens:
a) Identifiers, e.g. variable names
b) Constants, e.g. 3, 5, 10
c) Operators, e.g. +, -, (, )
d) Keywords, e.g. if, else
e) Delimiters/Separators, e.g. ;

Say we have the following piece of code,
if(x<5)
    x = x + 2;
else
    x = x + 10;

Here the tokens that will be generated are
Keywords : if, else
Identifier : x
Constants : 5, 2, 10
Operators : (, ), <, =, +
Delimiter : ;
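
The tokenizing step above can be sketched in C roughly as follows. This is only a minimal illustration, not how a real C compiler's scanner is written: the names Token, TokenType and next_token are made up for this example, it knows only the keywords if and else, and it treats every single character other than ';' as an operator, following the classification used in this post.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token categories from the list above (hypothetical names, for illustration). */
typedef enum {
    TOK_KEYWORD, TOK_IDENTIFIER, TOK_CONSTANT,
    TOK_OPERATOR, TOK_DELIMITER, TOK_END
} TokenType;

typedef struct {
    TokenType type;
    char text[32];   /* tokens longer than 31 characters are split in this sketch */
} Token;

/* Returns 1 if 'word' is one of the two keywords this sketch knows about. */
static int is_keyword(const char *word)
{
    return strcmp(word, "if") == 0 || strcmp(word, "else") == 0;
}

/* Reads one token starting at *src and advances *src past it.
   Returns a TOK_END token when the end of the text is reached. */
Token next_token(const char **src)
{
    assert(src != NULL && *src != NULL);
    const char *p = *src;
    Token tok = { TOK_END, "" };   /* text starts out zero-filled */
    size_t len = 0;

    while (isspace((unsigned char)*p))   /* whitespace separates tokens */
        p++;

    if (isalpha((unsigned char)*p) || *p == '_') {
        /* a word: either a keyword or an identifier */
        while ((isalnum((unsigned char)*p) || *p == '_') && len < sizeof tok.text - 1)
            tok.text[len++] = *p++;
        tok.type = is_keyword(tok.text) ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        /* a run of digits: an integer constant */
        while (isdigit((unsigned char)*p) && len < sizeof tok.text - 1)
            tok.text[len++] = *p++;
        tok.type = TOK_CONSTANT;
    } else if (*p != '\0') {
        /* any other single character: ';' is a delimiter, the rest are operators */
        tok.text[len++] = *p++;
        tok.type = (tok.text[0] == ';') ? TOK_DELIMITER : TOK_OPERATOR;
    }

    *src = p;
    return tok;
}
```

Calling next_token repeatedly on the text "if(x<5) x = x + 2;" yields the keyword if, then the operator (, the identifier x, the operator <, the constant 5, and so on, until a TOK_END token is returned.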

Step 2: Syntax Analysis: This phase checks the syntax of our statements and then generates a parse tree for them. For example, for x=2+5 the tree looks something like this:
  =
 / \
x   +
   / \
  2   5

If there is a syntax error, as in x=x+; where the identifier/constant after the '+' operator is missing, this phase reports an error.

Step 3: Semantic Analysis: This phase checks for type mismatches. Say we have declared the variable x as float and we try to use the modulo operator (%, which finds the remainder of a division) on x. Since the type of x is float, we can't use % on it, so this causes a semantic error.
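
The type check described above can be sketched as a small rule table in C. This is a simplified illustration with only two types and five operators; ValueType, is_operation_allowed and result_type are hypothetical names, and a real compiler's type rules are far more detailed.

```c
#include <assert.h>

/* The two operand types this sketch distinguishes. */
typedef enum { TYPE_INT, TYPE_FLOAT } ValueType;

/* Returns 1 if 'op' may be applied to operands of the given types. */
int is_operation_allowed(char op, ValueType left, ValueType right)
{
    if (op == '%')   /* modulo needs integer operands on both sides */
        return left == TYPE_INT && right == TYPE_INT;
    return op == '+' || op == '-' || op == '*' || op == '/';
}

/* Returns the type of the result; only valid for an allowed operation. */
ValueType result_type(char op, ValueType left, ValueType right)
{
    assert(is_operation_allowed(op, left, right));
    /* if either operand is float, the result is float */
    return (left == TYPE_FLOAT || right == TYPE_FLOAT) ? TYPE_FLOAT : TYPE_INT;
}
```

Here is_operation_allowed('%', TYPE_FLOAT, TYPE_INT) returns 0, which corresponds to the semantic error described above. In real C code, a floating-point remainder is computed with the fmod function from math.h instead of %.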

When the same program is compiled on different platforms/machines, the above phases stay the same. Their output changes only when the source language changes. Since these phases depend largely on the source language, they are called front-end operations.

Step 4: Intermediate Code Generation: The actual machine-understandable code is generated only after the later phases, Code Optimization (Step 5) and Code Generation (Step 6). Since those phases depend largely on the target machine, they are called back-end operations, and they are independent of the source language used.

To make the compilation process more efficient when programs in many source languages have to be compiled for many platforms, we introduce this step of Intermediate Code Generation. With a common intermediate form, each source language needs only one front end and each platform needs only one back end, instead of a separate compiler for every language/platform pair. This creates a logical separation between the machine-dependent phases and the source-dependent phases.
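
One widely used form of intermediate code is three-address code, where each instruction has at most one operator and results are kept in temporary names. The sketch below shows the idea in C by walking an expression tree and writing the instructions into a text buffer. The Node type, the function names and the t1, t2 naming of temporaries are all assumptions made for this example; real compilers use their own internal representations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A node of the expression tree produced by the earlier phases. */
typedef struct Node {
    char op;                    /* '+', '*', ... or 0 for a leaf */
    const char *name;           /* leaf text such as "x" or "2" */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Appends 'line' to 'out' without writing past out_size bytes. */
static void append_line(char *out, size_t out_size, const char *line)
{
    size_t used = strlen(out);
    if (used < out_size - 1)
        strncat(out, line, out_size - used - 1);
}

/* Emits instructions for 'n' and stores the name holding its value in 'result'. */
static void emit_expr(const Node *n, char *out, size_t out_size,
                      int *temp_count, char *result, size_t result_size)
{
    char left[16], right[16], line[64];

    if (n->op == 0) {           /* a leaf is already a name or a constant */
        snprintf(result, result_size, "%s", n->name);
        return;
    }
    emit_expr(n->left, out, out_size, temp_count, left, sizeof left);
    emit_expr(n->right, out, out_size, temp_count, right, sizeof right);
    snprintf(result, result_size, "t%d", ++*temp_count);   /* fresh temporary */
    snprintf(line, sizeof line, "%s = %s %c %s\n", result, left, n->op, right);
    append_line(out, out_size, line);
}

/* Writes three-address code for "target = expr" into 'out'. */
void generate_assignment(const char *target, const Node *expr,
                         char *out, size_t out_size)
{
    char value[16], line[64];
    int temp_count = 0;

    assert(target != NULL && expr != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    emit_expr(expr, out, out_size, &temp_count, value, sizeof value);
    snprintf(line, sizeof line, "%s = %s\n", target, value);
    append_line(out, out_size, line);
}
```

For the statement x = x + 2 * y, this produces the three lines t1 = 2 * y, t2 = x + t1 and x = t2. Nothing in that output is specific to C or to a particular CPU, which is what lets the front end and the back end be developed separately.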

Step 5: Code Optimization: To make the program run efficiently, using less time and fewer resources, the compiler optimizes our code to some extent. We should still write our code following some code tuning techniques (will be explained later).
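
As an illustration of what optimization can mean, the two C functions below compute the same result. The first is how a programmer might write the loop; the second shows, by hand, the kind of rewrite an optimizer may perform: folding 4 * 2 into 8 and moving the part that does not change out of the loop. The function names are made up for this example, and which transformations are actually applied depends on the compiler and its settings.

```c
#include <assert.h>
#include <stddef.h>

/* As written: the factor is recomputed on every pass of the loop. */
int sum_scaled(const int *values, int count, int a, int b)
{
    assert(values != NULL || count == 0);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i] * (a * b + 4 * 2);
    return total;
}

/* Hand-optimized equivalent of the function above. */
int sum_scaled_optimized(const int *values, int count, int a, int b)
{
    assert(values != NULL || count == 0);
    int factor = a * b + 8;   /* 4 * 2 folded to 8, computed once before the loop */
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i] * factor;
    return total;
}
```

Both functions return the same value for the same arguments; the second simply does less work inside the loop.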

Step 6: Code Generation: The final step is the actual generation of code for the target machine. It depends on the architecture of the CPU and other devices.

Throughout these steps, the compiler uses a symbol table, typically implemented as a hash table, to keep track of the name, type, address, and size of variables. It also uses an error-processing module to report and recover from errors that occur in each stage of compilation. Error reporting gives the line on which an error occurred and a possible reason. Error recovery tries to correct or skip the lines where an error occurred so that the compiler can proceed with compilation of the remaining lines.
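
The bookkeeping of variables described above can be sketched as a small hash table in C. This is a minimal illustration only: the fixed table size, the linear probing, and the names Symbol, SymbolTable, symtab_insert and symtab_lookup are all assumptions made for this example, not the layout of any particular compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 64
#define SYM_NAME_LEN 32

/* What the compiler remembers about one variable. */
typedef struct {
    char name[SYM_NAME_LEN];   /* longer names are truncated in this sketch */
    char type[16];
    unsigned address;
    unsigned size;
    int used;                  /* 0 while the slot is empty */
} Symbol;

typedef struct {
    Symbol slots[TABLE_SIZE];  /* must start out zero-filled */
} SymbolTable;

/* Simple string hash that maps a name to a slot index. */
static unsigned hash_name(const char *name)
{
    unsigned h = 5381;
    while (*name != '\0')
        h = h * 33 + (unsigned char)*name++;
    return h % TABLE_SIZE;
}

/* Adds or updates an entry; returns NULL if the table is full. */
Symbol *symtab_insert(SymbolTable *table, const char *name, const char *type,
                      unsigned address, unsigned size)
{
    assert(table != NULL && name != NULL && type != NULL);
    unsigned start = hash_name(name);
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        Symbol *s = &table->slots[(start + i) % TABLE_SIZE];   /* linear probing */
        if (!s->used || strcmp(s->name, name) == 0) {
            snprintf(s->name, sizeof s->name, "%s", name);
            snprintf(s->type, sizeof s->type, "%s", type);
            s->address = address;
            s->size = size;
            s->used = 1;
            return s;
        }
    }
    return NULL;
}

/* Finds an entry by name; returns NULL if the name is not in the table. */
Symbol *symtab_lookup(SymbolTable *table, const char *name)
{
    assert(table != NULL && name != NULL);
    unsigned start = hash_name(name);
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        Symbol *s = &table->slots[(start + i) % TABLE_SIZE];
        if (!s->used)
            return NULL;        /* an empty slot ends the probe sequence */
        if (strcmp(s->name, name) == 0)
            return s;
    }
    return NULL;
}
```

After symtab_insert(&table, "x", "float", 0, 4), a later phase such as semantic analysis can call symtab_lookup(&table, "x") to find out that x is a float before deciding whether an operator like % may be applied to it.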

Those are the basics of the compilation process. The generated output code can now be executed to produce the actual output of the program.
