Chapter 1: Compiler Design Lecture Note

adugnanegero, Aug 06, 2024


Chapter One: Introduction to Compiler Design

Course Outline: Required Preliminaries
- Basic knowledge of programming languages.
- Basic knowledge of formal language and automata theory.

Textbook: Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, "Compilers: Principles, Techniques, and Tools", Addison-Wesley, 2007.

Objectives

At the end of this session students will be able to:
- Understand the basic concepts and principles of compiler design.
- Understand the term compiler, its functions, and how it works.
- Be familiar with the different classifications of compilers.
- Be familiar with the cousins of the compiler: linkers, loaders, interpreters, and assemblers.
- Understand the need for studying compiler design and construction.
- Understand the phases and steps of compilation.
- Understand the history of compilers. (Assignment for G1 students)
- Be familiar with the different compiler tools. (Assignment for G2 students)

Introduction

Definition
- A compiler is an executable program that reads a program in a high-level language and translates it into an equivalent executable program in machine language.
- More generally, a compiler is a computer program that translates a program in a source language into an equivalent program in a target language.
- A source program/code is a program written in the source language, which is usually a high-level language.
- A target program/code is a program written in the target language, which is often a machine language or an intermediate code (object code).

[Diagram: source program -> Compiler -> target program; the compiler also reports error messages.]

Contd.

As a discipline, compiler design involves multiple computer science and engineering courses, such as:
- Programming Languages
- Data Structures and Algorithms
- Theory of Computation (Automata and Formal Language Theory)
- Assembly Language
- Software Engineering
- Computer Architecture
- Operating Systems
- Discrete Mathematics

Why Study the Theory of Compilers?
- Curiosity.
- A prerequisite for developing advanced compilers, an area that continues to be active as new computer architectures emerge.
- To improve the capabilities of an existing compiler/interpreter.
- To write more efficient code in a high-level language.
- Useful for developing software tools that parse computer code or strings, e.g., editors, debuggers, interpreters, preprocessors, ...
- Important for understanding how compilers work, so as to program more effectively.
- To provide a solid foundation in parsing theory for parser writing.
- Compiler design makes an excellent "capstone" project.
- It applies almost all of the major computer science fields: automata theory, computer programming, programming language design theory, assembly language, computer architecture, data structures, algorithms, and software engineering.

Classification of Compilers

Compilers can be viewed from many perspectives:
- By construction: single pass, multiple pass.
- By function: load & go, debugging, optimizing.

However, all utilize the same basic tasks to accomplish their actions.

Classifying compilers by number of passes has its background in the hardware resource limitations of early computers. Compiling involves a lot of work, and early computers did not have enough memory to hold one program that did all of it. So compilers were split into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translation.

Contd.

Single (One) Pass Compilers
- A single-pass compiler passes through the source code of each compilation unit only once; also called a narrow compiler.
- The ability to compile in a single pass has classically been seen as a benefit because it simplifies the job of writing a compiler.
- Single-pass compilers generally perform compilations faster than multi-pass compilers.
- Due to the resource limitations of early systems, many early languages were specifically designed so that they could be compiled in a single pass (e.g., Pascal).

Disadvantages of single-pass compilers:
- It is not possible to perform many of the sophisticated optimizations needed to generate high-quality code.
- It can be difficult to count exactly how many passes an optimizing compiler makes; for instance, different phases of optimization may analyse one expression many times but analyse another expression only once.

Contd.

Multi-Pass Compilers
- A multi-pass compiler processes the source code or abstract syntax tree of a program several times; also called a wide compiler.
- Phases are separate "programs", which run sequentially.
- Splitting the compiler into small programs helps produce correct programs: proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.
- Many programming languages cannot be handled by a single-pass compiler; for example, most modern languages, such as Java, require a multi-pass compiler.

Contd.

Load and Go Compilers
- A load-and-go compiler generates machine code and then immediately executes it.
- Compilers usually produce either absolute code, which is executed immediately upon conclusion of the compilation, or object code, which is transformed by a linking loader into absolute code. These compiler organizations are called Load & Go and Link/Load.
- Both Load & Go and Link/Load compilers use a number of passes to translate the source program into absolute code.
- A pass reads some form of the source program, transforms it into another form, and normally outputs this form to an intermediate file, which may be the input of a later pass.

Contd.

Optimizing Compilers
- An optimizing compiler tries to minimize or maximize some attributes of an executable computer program.
- The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied. The growth of portable computers has also created a market for minimizing the power consumed by a program.
- Compiler optimization is generally implemented using a sequence of optimizing transformations: algorithms that take a program and transform it to produce a semantically equivalent output program that uses fewer resources.
- Types of optimization include peephole optimization, local optimization, global optimization, loop optimization, machine-code optimization, etc.

Cousins of Compilers

Assembler
- A translator that converts programs written in assembly language into machine code.
- Translates mnemonic operation codes to their machine-language equivalents.
- Assigns machine addresses to symbolic labels.

Interpreter
- A computer program that translates high-level instructions/programs into machine code as they are encountered; it produces the output of each statement as it is interpreted.
- It generally uses one of the following strategies for program execution:
  - execute the source code directly;
  - translate the source code into some efficient intermediate representation and immediately execute this;
  - explicitly execute stored precompiled code made by a compiler that is part of the interpreter system.

Contd.

Linker
- A program that takes one or more objects generated by a compiler and combines them into a single executable program.

Loader
- The part of an operating system that is responsible for loading programs from executables (i.e., executable files) into memory, preparing them for execution, and then executing them.

[Diagram: source program -> preprocessor -> modified source program -> compiler -> target assembly program -> assembler -> relocatable machine code -> linker/loader (with library files) -> target machine code]

Compiler vs. Interpreter

Most languages are usually thought of as using either one or the other:
- Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
- Interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, Smalltalk

But languages are not always implemented this way.

[Diagram, ideal concept: a compiler translates source code into an executable, which then maps input data to output data; an interpreter takes source code and input data directly and produces output data.]

Basic Compiler Design

- In principle, a compiler is one huge program that takes as input another program in the source language and gives as output an executable that we can run.
- To make the code easy to modify, we usually use a modular design (decomposition) methodology to design a compiler.
- Two design strategies:
  - Write a "front end" of the compiler (i.e., the lexer, parser, semantic analyzer, and assembly tree generator), and write a separate back end for each platform that you want to support.
  - Write an efficient, highly optimized back end, and write a different front end for each of several languages, such as Fortran, C, C++, and Java.

[Diagram: source code -> Front End -> intermediate code -> Back End -> target code]

The Analysis-Synthesis Model of Compilation

There are two parts to compilation: analysis and synthesis.
- Analysis (the front end): lexical analysis, syntax analysis, and semantic analysis. It breaks up the source program into constituent pieces and creates an intermediate representation of it. During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.
- Synthesis (the back end): code generation and optimization. It constructs the target program from the intermediate representation, taking the tree structure and translating the operations into the target program.

Analysis

In compiling, analysis has three phases:
- Linear analysis: the stream of characters is read from left to right and grouped into tokens; known as lexical analysis or scanning. It converts the input text into a stream of known objects called tokens, which simplifies later processing.
- Hierarchical analysis: tokens are grouped hierarchically into collections with collective meaning; known as parsing or syntax analysis. It relates the code to the rules of the grammar and builds a representation of the code.
- Semantic analysis: checks whether the program components fit together meaningfully. It checks the source program for semantic errors, gathers type information for subsequent code generation (type checking), and identifies the operators and operands of expressions and statements.

Phases of Compilation: General Structure of a Compiler

stream of characters
  -> scanner -> stream of tokens
  -> parser -> parse/syntax tree
  -> semantic analyzer -> annotated tree
  -> intermediate code generator -> intermediate code
  -> code optimization -> intermediate code
  -> code generator -> target code
  -> code optimization -> target code

Phase I: Lexical Analysis

- The low-level text processing portion of the compiler.
- The source file, a stream of characters, is broken into larger chunks called tokens.
- The lexical analyzer (scanner) reads a stream of characters and groups them into units that are meaningful with respect to the source language, called tokens.
- Typically, spaces, tabs, end-of-line characters, and comments are ignored by the lexical analyzer.
- To design a lexical analyzer: input a description (regular expressions) of the tokens in the language, and output a lexical analyzer (a program).

For example, the source text

    void main() { int x; x=3; }

is broken into 13 tokens: void, main, (, ), {, int, x, ;, x, =, 3, ;, }
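As a sketch of how such a scanner can be driven by regular expressions, here is a minimal tokenizer in Python; the token names and patterns are assumptions made for this example, not part of the lecture.

```python
import re

# A minimal scanner sketch (illustrative only): the token names and
# patterns below are assumptions made for this example.
TOKEN_SPEC = [
    ("NUM",   r"\d+"),            # integer literals
    ("ID",    r"[A-Za-z_]\w*"),   # identifiers and keywords
    ("PUNCT", r"[(){};=]"),       # single-character punctuation
    ("SKIP",  r"[ \t\n]+"),       # whitespace, ignored
]

def tokenize(source):
    """Yield (kind, text) pairs, skipping whitespace."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    for m in re.finditer(pattern, source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

tokens = list(tokenize("void main() { int x; x=3; }"))
print(len(tokens))  # 13, matching the slide's count
```

This is exactly the workflow the slide describes: a description of tokens as regular expressions goes in, and a working scanner comes out.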

Phase II: Parsing (Syntax Analysis)

- A parser gets a stream of tokens from the scanner and determines whether the syntax (structure) of the program is correct according to the (context-free) grammar of the source language.
- It then produces a data structure, called a parse tree or an abstract syntax tree, which describes the syntactic structure of the program.
- The parser ensures that the sequence of tokens returned by the lexical analyzer forms a syntactically correct program.
- It also builds a structured representation of the program, the abstract syntax tree, which is easier for the type checker to analyze than a stream of tokens.
- It catches syntax errors such as the statement below:

    if if (x > 3) then x = x + 1

- Context-free grammars are used (as input) by parser generators to describe the syntax of the language being compiled.
- Most compilers do not generate a parse tree explicitly but rather go to intermediate code directly as syntax analysis takes place.
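The parser's job can be sketched with a toy recursive-descent parser for arithmetic expressions; the grammar, function names, and tuple-based AST shape here are assumptions for illustration, not the lecture's parser.

```python
# A toy recursive-descent parser sketch for the expression grammar
#   expr -> term ('+' term)* ; term -> factor ('*' factor)* ; factor -> ID | NUM
# The function names and the tuple-based AST shape are assumptions.

def parse_expr(tokens, pos=0):
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        rhs, pos = parse_term(tokens, pos + 1)
        node = ("+", node, rhs)          # left-associative '+' node
    return node, pos

def parse_term(tokens, pos):
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        rhs, pos = parse_factor(tokens, pos + 1)
        node = ("*", node, rhs)          # '*' binds tighter than '+'
    return node, pos

def parse_factor(tokens, pos):
    return tokens[pos], pos + 1          # a leaf: identifier or number

# The running example from the slides: initial + rate * 60
ast, _ = parse_expr(["initial", "+", "rate", "*", "60"])
print(ast)  # ('+', 'initial', ('*', 'rate', '60'))
```

Because `term` is parsed below `expr`, the multiplication ends up deeper in the tree, which is how the grammar encodes operator precedence.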

Parse Tree

- The output of parsing; a top-down description of the program's syntax.
- The root node is the entire program, and the leaves are the tokens that were identified during lexical analysis.
- Constructed by repeated application of the rules of the context-free grammar (CFG).
- Syntax structures are analyzed by a DPDA (deterministic push-down automaton).
- Example: the parse tree for position := initial + rate*60

Phase III: Semantic Analysis

- The semantic analyzer gets the parse tree from the parser, together with information about some syntactic elements.
- It determines whether the semantics (meaning) of the program is correct.
- It detects program errors such as using variables before they are declared or assigning an integer value to a Boolean variable.
- This phase deals with static semantics: the semantics of a program that can be checked by reading the program alone, i.e., the parts of the language that cannot be described by a context-free grammar.
- Mostly, a semantic analyzer does type checking (i.e., gathers type information for subsequent code generation).
- It modifies the parse tree in order to obtain (statically) semantically correct code.
- In this phase, the abstract syntax tree produced by the parser is traversed, looking for semantic errors.

Contd.

- The main tool used by the semantic analyzer is the symbol table.
- Symbol table: a data structure with a record for each identifier and its attributes. Attributes include storage allocation, type, scope, etc. All the compiler phases insert into and modify the symbol table.
- Using the symbol table, the semantic analyzer discovers the meaning of the program, performs static semantics checks, and simplifies the structure of the parse tree (from parse tree to abstract syntax tree (AST)).
- Static semantics checks include:
  - making sure identifiers are declared before use;
  - type checking for assignments and operators;
  - checking the types and number of parameters to subroutines;
  - making sure functions contain return statements;
  - making sure there are no repeats among switch-statement labels.
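A minimal symbol-table sketch follows, assuming a dict-based design and illustrative error messages; this is not the lecture's actual data structure, just one way the declare-before-use and type checks above could look.

```python
# A minimal symbol-table sketch; the dict-based design and the error
# messages are assumptions made for illustration.

class SymbolTable:
    def __init__(self):
        self.symbols = {}                # name -> attribute (here: a type)

    def declare(self, name, typ):
        self.symbols[name] = typ

    def lookup(self, name):
        if name not in self.symbols:
            raise NameError(f"use of undeclared identifier '{name}'")
        return self.symbols[name]

def check_assignment(table, target, value_type):
    """Static check: the assigned value's type must match the target's."""
    declared = table.lookup(target)
    if declared != value_type:
        raise TypeError(f"cannot assign {value_type} to {declared} '{target}'")

table = SymbolTable()
table.declare("x", "int")
check_assignment(table, "x", "int")      # passes silently
try:
    check_assignment(table, "flag", "int")   # 'flag' was never declared
except NameError as err:
    print(err)
```

A real symbol table would also carry scope and storage attributes, as the slide notes; this sketch keeps only the type to stay small.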

Phase IV: Intermediate Code Generation

- An intermediate code generator takes a parse tree from the semantic analyzer and generates a program in the intermediate language.
- In some compilers, a source program is translated into an intermediate code first, and then the intermediate code is translated into the target language; in other compilers, a source program is translated directly into the target language.
- The compiler makes a second pass over the parse tree to produce the translated code.
- If there are no compile-time errors, the semantic analyzer translates the abstract syntax tree into the abstract assembly tree.
- The abstract assembly tree is then passed to the code optimization and assembly code generation phases.

Contd.

- Using intermediate code is beneficial when compilers that translate a single source language into many target languages are required.
- The front end of a compiler (scanner through intermediate code generator) can then be shared by all of those compilers.
- A different back end (code optimizer and code generator) is required for each target language.
- One popular intermediate code is three-address code. A three-address code instruction has the form x = y op z.
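The x = y op z form can be illustrated by flattening a small tuple AST into three-address instructions; the temporary-name scheme and helper name are assumptions made for this sketch.

```python
# Sketch: flattening a tuple AST into three-address code of the form
# x = y op z; the temporary-name scheme is an assumption for this example.

def to_tac(node, code, counter):
    """Return the name holding node's value, appending instructions to code."""
    if isinstance(node, str):            # leaf: identifier or constant
        return node
    op, left, right = node
    l = to_tac(left, code, counter)
    r = to_tac(right, code, counter)
    counter[0] += 1
    temp = f"t{counter[0]}"              # fresh temporary for this result
    code.append(f"{temp} = {l} {op} {r}")
    return temp

# position := initial + rate * 60, as in the earlier parse-tree example
ast = ("+", "initial", ("*", "rate", "60"))
code, counter = [], [0]
result = to_tac(ast, code, counter)
code.append(f"position = {result}")
print("\n".join(code))
```

Running this emits one instruction per interior tree node, innermost first, so the multiplication is computed into a temporary before the addition uses it.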

Phase V: Assembly Code Generation

- The code generator converts the abstract assembly tree into the actual assembly code.
- To do code generation, the generator covers the abstract assembly tree with tiles (each tile represents a small portion of an abstract assembly tree) and outputs the actual assembly code associated with the tiles used to cover the tree.

Phase VI: Machine Code Generation and Linking

- The final phase of compilation converts the assembly code into machine code and links in (by a linker) the appropriate language libraries.

Code Optimization

- Replacing an inefficient sequence of instructions with a better sequence of instructions; sometimes called code improvement.
- Code optimization can be done:
  - after semantic analysis: performed on a parse tree;
  - after intermediate code generation: performed on an intermediate code;
  - after code generation: performed on a target code.
- Two types of optimization: local and global.

Local Optimization

- The compiler looks at a very small block of instructions and tries to determine how it can improve the efficiency of this local code block.
- Relatively easy; included as part of most compilers.
- Examples of possible local optimizations:
  - constant evaluation;
  - strength reduction;
  - eliminating unnecessary operations.
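The three local optimizations listed above can be sketched as one rewriting pass over the small tuple-AST form used in the earlier examples; the helper names and the particular rewrite rules chosen are assumptions for illustration.

```python
# Sketches of the three local optimizations named above, applied to a
# ('op', left, right) tuple AST; helper names and rules are assumptions.

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def is_const(n):
    return isinstance(n, str) and n.isdigit()

def optimize(node):
    """One local pass: fold constants, reduce strength, drop no-ops."""
    if isinstance(node, str):
        return node
    op, left, right = node
    left, right = optimize(left), optimize(right)
    if is_const(left) and is_const(right):
        # Constant evaluation: compute e.g. 2 * 3 at compile time.
        return str(OPS[op](int(left), int(right)))
    if op == "*" and right == "2":
        # Strength reduction: replace x * 2 with the cheaper x + x.
        return ("+", left, left)
    if op == "+" and right == "0":
        # Eliminating unnecessary operations: x + 0 is just x.
        return left
    return (op, left, right)

print(optimize(("+", ("*", "2", "3"), "0")))  # folds all the way to '6'
```

Each rule only inspects one node and its children, which is what makes these transformations "local" and cheap enough for most compilers to include.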

Global Optimization

- The compiler looks at large segments of the program to decide how to improve performance.
- Much more difficult; usually omitted from all but the most sophisticated and expensive production-level "optimizing compilers".
- Optimization cannot make an inefficient algorithm efficient.

The Phases of a Compiler

Phase (with its output and a sample for A=B+C;):
- Programmer (source code producer): source string, e.g. A=B+C;
- Scanner (performs lexical analysis): token string 'A', '=', 'B', '+', 'C', ';' and a symbol table with names
- Parser (performs syntax analysis based on the grammar of the programming language): parse tree or abstract syntax tree, e.g.

        =
       / \
      A   +
         / \
        B   C

- Semantic analyzer (type checking, etc.): annotated parse tree or abstract syntax tree
- Intermediate code generator: three-address code, quads, or RTL, e.g.

      t1 := int2fp B
      t2 := t1 + C
      A  := t2

- Optimizer: three-address code, quads, or RTL, e.g.

      t1 := int2fp B
      A  := t1 + #2.3

- Code generator: assembly code, e.g.

      MOVF #2.3,r1
      ADDF2 r1,r2
      MOVF r2,A

Summary of Phases of a Compiler

Compiler Construction Tools

Software development tools are available to implement one or more compiler phases:
- scanner generators
- parser generators
- syntax-directed translation engines
- automatic code generators
- data-flow engines

Other compiler tools:
- JavaCC, a parser generator for Java, including a scanner generator and a parser generator. Input specifications are different from those suitable for Lex/YACC; also, unlike YACC, JavaCC generates a top-down parser.
- ANTLR, a set of language translation tools (formerly PCCTS). Includes scanner/parser generators for C, C++, and Java.
- Scanner generators for C/C++: Flex, Lex.
- Parser generators for C/C++: Bison, YACC.
- Scanner generators for Java: JLex, a scanner generator very similar to Lex; JFlex, flex for Java.
- Parser generators for Java: CUP, a parser generator very similar to YACC; BYACC/J, a version of Berkeley YACC for Java. It is an extension of standard YACC (a -j flag has been added to generate Java code).

Assignment I
- Compiler Design Tools
- History of Compilers

Thank You!