Computer Science Learners: Semantic Analysis

Introduction

Compiler:

A compiler is a piece of software that translates source code written in a high-level programming language into object code in a low-level language. This is done so that the program can be turned into an executable.

The first compiler is generally credited to Grace Hopper, who wrote it in 1952 for the A-0 System. The FORTRAN team is generally credited with having introduced the first complete compiler in 1957. In 1960, COBOL became one of the first languages to be compiled on multiple architectures.

Structure of a Compiler:

The overall structure of a compiler can be broken down into the following three parts:

Front-end - The front-end of the compiler is responsible for checking the correctness of the program with respect to the syntax and semantics of the programming language in which it is written. It separates legal programs from illegal ones, and any errors in the program are reported to the programmer in a useful way. Type checking is also done in this part of the compiler. After these steps have been performed, the front-end generates an Intermediate Representation (IR) of the code for processing by the middle-end.
Middle-end - The middle-end of the compiler performs optimization of the code. This involves several steps, such as the removal of useless or unreachable code and the discovery and propagation of constant values. After these steps, the middle-end generates an IR of the code for processing by the back-end of the compiler.
Back-end - The back-end is responsible for translating the IR received from the middle-end into assembly code. The target instructions for each IR instruction are chosen. Register allocation assigns the processor registers to the variables of the program where possible. The back-end also utilizes the hardware by filling delay slots and figuring out how to keep parallel execution units busy.
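As a rough illustration of the "discovery and propagation of constant values" step mentioned for the middle-end, here is a minimal Python sketch. The tuple-based IR and the names fold and OPS are assumptions made for this example only; real compilers use much richer representations.

```python
import operator

# Toy IR (an assumption for this sketch): an expression is an int constant,
# a variable name (str), or a tuple (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr, constants):
    """Propagate known constants and fold constant sub-expressions."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        # Replace a variable by its value when that value is known.
        return constants.get(expr, expr)
    op, left, right = expr
    left, right = fold(left, constants), fold(right, constants)
    if isinstance(left, int) and isinstance(right, int):
        # Both operands are constants, so compute the result at compile time.
        return OPS[op](left, right)
    return (op, left, right)


# x is known to be 2, y is unknown: x * 4 is folded to 8, y is left alone.
print(fold(("+", ("*", "x", 4), "y"), {"x": 2}))  # ('+', 8, 'y')
```

The unknown variable y stays symbolic, which is why the result is still an expression rather than a single number.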

Phases of a Compiler:

The phases of a compiler, as seen in the figure above, can be described as follows:

After the source code has been written and sent to the compiler, the program is passed to the lexical analyzer.
The lexical analyzer, also known as the scanner, uses regular expressions to split the source code into tokens, each paired with its lexeme (the matching text). This output is then sent to the syntax analyzer.
The syntax analyzer checks the tokens against a context-free grammar and generates a parse tree or syntax tree as output. This output is then sent to the semantic analyzer.
The semantic analyzer takes the generated parse tree as input and generates an annotated parse tree as output. The annotated parse tree is sometimes also called an error-free parse tree. This output is then sent to the intermediate code generator.
The intermediate code generator takes the annotated parse tree as input and translates it into intermediate code. It also introduces temporary variables that hold intermediate results. This output code is then sent to the code optimizer.
The code optimizer optimizes the code by removing redundant instructions and unnecessary temporary variables, which would otherwise increase the execution time. The optimized code is then sent to the code generator.
The code generator takes the optimized code as input and converts it into object code as output. This object code is the final version of code that is executed by the system.
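The first few phases above can be sketched in a few lines of Python for simple arithmetic expressions. This is only an illustration under stated assumptions: the grammar, the token kinds (NUM, ID, SYM) and the function names tokenize, parse and to_three_address are all made up for this example, and the semantic analysis and optimization phases are skipped.

```python
import re

# One regular expression with three alternatives: number, identifier, symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


def tokenize(source):
    """Lexical analysis: split the source into (kind, lexeme) tokens."""
    tokens = []
    for number, name, symbol in TOKEN_RE.findall(source):
        if number:
            tokens.append(("NUM", number))
        elif name:
            tokens.append(("ID", name))
        else:
            tokens.append(("SYM", symbol))
    return tokens


def parse(tokens):
    """Syntax analysis: build a syntax tree from a small context-free grammar.

    expr   -> term (('+' | '-') term)*
    term   -> factor (('*' | '/') factor)*
    factor -> NUM | ID | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, lexeme = advance()
        if kind in ("NUM", "ID"):
            return lexeme
        if lexeme == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {lexeme!r}")

    def binary(operand, operators):
        # Left-associative chain: operand (operator operand)*
        node = operand()
        while peek()[0] == "SYM" and peek()[1] in operators:
            op = advance()[1]
            node = (op, node, operand())
        return node

    def term():
        return binary(factor, "*/")

    def expr():
        return binary(term, "+-")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


def to_three_address(tree):
    """Intermediate code generation: flatten the tree using temporaries."""
    code = []

    def visit(node):
        if isinstance(node, str):
            return node  # a number or identifier needs no temporary
        op, left, right = node
        left_name, right_name = visit(left), visit(right)
        temp = f"t{len(code) + 1}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    visit(tree)
    return code


for line in to_three_address(parse(tokenize("a + b * (c - 2)"))):
    print(line)
# t1 = c - 2
# t2 = b * t1
# t3 = a + t2
```

The temporaries t1, t2 and t3 are the kind of temporary variables the intermediate code generator introduces, and they are also what a later optimizer may try to eliminate.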

Semantic Analysis
Semantic analysis is the process performed by the semantic analyzer, the part of the compiler that finds errors the syntax analyzer could not detect. It works on the parse tree generated by the syntax analyzer and, if no errors are found, generates an annotated parse tree, also commonly known as an error-free parse tree. Semantic analysis is an important part of a compiler because parsing alone cannot find all the errors in the source code.

What is performed?
Some of the procedures performed during semantic analysis are:

Checks whether all the identifiers being used in the program have been declared or not
Type compatibility and type checking
Checks whether each class used in the program is defined exactly once
Checks whether the methods declared in a class are defined once or multiple times
Checks whether the programming language’s reserved identifiers are being misused or not
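A few of the checks listed above can be sketched in Python as follows. The function check_program, the RESERVED keyword set and the flat description of a program (lists of names instead of a real parse tree) are simplifying assumptions for this example.

```python
from collections import Counter

# Hypothetical keyword set for a toy language.
RESERVED = {"if", "else", "while", "return", "class"}


def check_program(classes, declared, used):
    """Return a list of semantic error messages for a toy program description.

    classes  -- list of (class_name, [method_name, ...]) in source order
    declared -- names of declared identifiers
    used     -- names of identifiers that appear in expressions
    """
    errors = []
    for name in declared:
        if name in RESERVED:
            errors.append(f"reserved word used as identifier: {name}")
    for name in used:
        if name not in declared:
            errors.append(f"undeclared identifier: {name}")
    for name, count in Counter(c for c, _ in classes).items():
        if count > 1:
            errors.append(f"class defined more than once: {name}")
    for class_name, methods in classes:
        # Note: a language with overloading would compare full signatures,
        # not just names. This sketch ignores parameters.
        for method, count in Counter(methods).items():
            if count > 1:
                errors.append(f"method defined more than once: {class_name}.{method}")
    return errors


for message in check_program(
    classes=[("Shape", ["area", "area"]), ("Shape", ["draw"])],
    declared=["x", "while"],
    used=["x", "y"],
):
    print(message)
```

Running it reports one error of each kind: the reserved word while, the undeclared identifier y, the duplicate class Shape and the duplicate method Shape.area.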

Type Checking
Type checking is the process of verifying that each operation performed in the program follows the type system of the language, for example that only values of an appropriate type are assigned to each variable. If a violation is found, a type error is generated and displayed to the programmer. This process can be performed at compile time, at run time, or divided across both.

The following are the two methods of type checking:

Type Checking (Static):
Static type checking refers to type checking that is done at compile time. Because the compiler sees only the program text and not the values it will compute, it cannot detect every error: anything that depends on run-time values is out of its reach. For example, two variables “int a” and “int b” may, when multiplied, give a result greater than the range of integer values. The expression itself is well typed, so such an error cannot be detected at compile time.
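To make this concrete, here is a small Python sketch of a static checker for a toy expression language. The expression encoding and the names check and wrap_int32 are assumptions for this example. The checker looks only at declared types, never at values, so it accepts the multiplication of two ints even though the result may not fit in 32 bits at run time.

```python
def check(expr, env):
    """Return the static type ("int" or "string") of expr, or raise TypeError.

    expr is ("int", value), ("string", value), ("var", name)
    or ("mul", left, right). env maps variable names to declared types.
    """
    kind = expr[0]
    if kind in ("int", "string"):
        return kind
    if kind == "var":
        return env[expr[1]]  # assumes the variable has been declared
    if kind == "mul":
        left, right = check(expr[1], env), check(expr[2], env)
        if left == right == "int":
            return "int"
        raise TypeError(f"cannot multiply {left} by {right}")
    raise ValueError(f"unknown expression kind: {kind}")


def wrap_int32(value):
    """Simulate 32-bit two's-complement wraparound (as Java's int does)."""
    return (value + 2**31) % 2**32 - 2**31


env = {"a": "int", "b": "int", "s": "string"}
print(check(("mul", ("var", "a"), ("var", "b")), env))  # int
try:
    check(("mul", ("var", "a"), ("var", "s")), env)
except TypeError as error:
    print("type error:", error)

# The well-typed product can still overflow once real values are involved.
print(wrap_int32(2_000_000_000 * 2))  # -294967296
```

The wraparound shown is the behavior Java defines for int. In C, signed integer overflow is undefined behavior, so the actual result there is not guaranteed.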

Type Checking (Dynamic):
Dynamic type checking refers to type checking that is done during the execution of the program. Because the actual values are available at that point, languages that use it can detect errors that static type checking cannot, such as errors that depend on run-time data. The trade-off is that the checks run alongside the executing program, which degrades performance, and an error is reported only when the faulty operation is actually executed.
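Python is itself a dynamically checked language, so it can be used to show this behavior directly. In the sketch below (the function describe is made up for this example), the ill-typed operation is reported only when the line containing it actually runs.

```python
def describe(value, verbose):
    if verbose:
        # str + int is rejected, but only when this line is executed.
        return "value: " + value
    return "ok"


print(describe(42, verbose=False))  # ok (the faulty line never runs)

try:
    describe(42, verbose=True)
except TypeError as error:
    print("caught at run time:", error)
```

The first call succeeds even though the function contains a type error, because the check happens per operation during execution rather than over the whole program in advance.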

Type Compatibility/Subtyping
The process of checking type compatibility varies from language to language. This is because some languages allow values to be exchanged only between variables of the same type, while other languages allow values to be exchanged between variables with compatible data types. For example, assigning between two variables “int a” and “float b” is possible in C because the language converts the value implicitly; assigning the float to the int truncates the fractional part. In Java, however, assigning a float to an int requires an explicit type cast; without it, the program is rejected with a compile-time error.
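The difference between the two languages can be pictured as two different tables of allowed implicit conversions. The Python sketch below is a deliberately simplified model covering only int and float; the table IMPLICIT and the function assignable are assumptions for this example and do not reflect the full conversion rules of either language.

```python
# Hypothetical, simplified tables of (source type, target type) pairs that
# may be assigned without an explicit cast.
IMPLICIT = {
    "c": {("int", "float"), ("float", "int")},  # float -> int truncates
    "java": {("int", "float")},                 # widening only
}


def assignable(language, source, target):
    """Return True if a source-typed value may be assigned to target implicitly."""
    return source == target or (source, target) in IMPLICIT[language]


print(assignable("c", "float", "int"))     # True
print(assignable("java", "float", "int"))  # False: needs an explicit cast
print(assignable("java", "int", "float"))  # True
print(int(3.9))                            # 3, the effect of truncation
```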
Subtypes are also related to type compatibility. A subtype is a way of designating data types that are freely compatible with another type. In other words, if a data type has all the behavior and features of another data type, then the first is a subtype of the second, and its values can be used wherever values of the second are expected. For example, “enum” values in C are integer constants, so they can be used wherever an integer is expected.
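A loosely similar situation can be shown in Python with the standard library's IntEnum, whose members are genuine subtypes of int. This is an analogy rather than a description of C's rules, and the names Color and double are made up for this example.

```python
from enum import IntEnum


class Color(IntEnum):
    RED = 1
    GREEN = 2


def double(n: int) -> int:
    """Works on any int, and therefore on any subtype of int."""
    return n * 2


print(issubclass(Color, int))  # True
print(double(Color.GREEN))     # 4
```

Because Color has all the behavior of int, a Color value can be passed to a function that expects an int without any conversion.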

Scope Checking
Scope checking is the process of checking whether each variable in the program is used within the scope in which it is available. If the programmer tries to use a variable outside of its scope, an out-of-scope error (often reported as an undeclared identifier) is generated and displayed to the programmer. For example, a local variable declared in a function cannot be used outside that function. If it is accessed elsewhere, the relevant error is generated and displayed.
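One common way to implement this check is a stack of symbol tables, one per open scope. The Python sketch below assumes that approach; the class ScopeChecker and its method names are invented for this example.

```python
class ScopeChecker:
    """Track declared names with a stack of scopes (innermost scope last)."""

    def __init__(self):
        self.scopes = [set()]  # index 0 is the global scope

    def enter(self):
        self.scopes.append(set())

    def leave(self):
        self.scopes.pop()

    def declare(self, name):
        self.scopes[-1].add(name)

    def use(self, name):
        # A name is visible if any enclosing scope declares it.
        if not any(name in scope for scope in self.scopes):
            raise NameError(f"'{name}' is not declared in this scope")


checker = ScopeChecker()
checker.declare("total")   # global variable
checker.enter()            # entering a function body
checker.declare("count")   # local variable
checker.use("count")       # fine: local
checker.use("total")       # fine: global is visible inside the function
checker.leave()            # leaving the function
try:
    checker.use("count")   # error: the local is no longer in scope
except NameError as error:
    print("scope error:", error)
```

Leaving the function pops its scope, so the local variable count disappears from the table and the final use is rejected.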

Feel free to comment with your questions and suggestions regarding the post content...!

Thursday, September 27, 2012
