COMPILED BY: RITURAJ JAIN
Lex – A Lexical Analyzer Generator
A Unix utility from the mid-1970s
A tool widely used to specify lexical analyzers for a variety of languages
We refer to the tool as the Lex compiler, and to its input specification as the Lex language.
A compiler that takes as source a specification of:
the tokens/patterns of a language
and generates a lexical analyzer program in C
[Figure: creating a lexical analyzer with Lex]
Lex source program lex.l → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
Input stream → a.out → Sequence of tokens
The Lex compiler generates lex.yy.c, which defines a routine yylex().
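The pipeline above can be tried with a very small specification. The following is a minimal sketch, not a definitive setup: the patterns, the printed labels, and the build commands in the comment are assumptions chosen for illustration, and the exact commands may differ between Lex and Flex installations.

```lex
%{
/* Assumed build steps on a typical Unix system:
     lex lex.l          writes lex.yy.c
     cc lex.yy.c        writes a.out
     ./a.out < input    runs the scanner on an input stream
   (with Flex, "flex lex.l" can replace the first command) */
#include <stdio.h>
%}
%%
[0-9]+      { printf("NUMBER: %s\n", yytext); }
[ \t\n]+    ;   /* skip white space */
.           { printf("OTHER: %s\n", yytext); }
%%
int yywrap(void) { return 1; }   /* report that there is no further input */

int main(void)
{
    yylex();      /* the routine defined in the generated lex.yy.c */
    return 0;
}
```

Because this sketch defines its own main() and yywrap(), it should not need the Lex support library at link time.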
Format of a Lexical Specification
lex.l File Format:
DECLARATIONS
%%
TRANSLATION RULES
%%
AUXILIARY PROCEDURES
The lex input file consists of three sections, separated by
lines containing only %%.
Definitions Section
The definitions section sets up the environment in two ways.
First, it sets up the environment for the lexer itself, which is C code.
This part of the Lex specification is delimited by “%{” and “%}”.
It contains C code, such as global declarations, directives that include
library header files, and other declarations, which is copied into the lexical analyzer
(i.e. lex.yy.c) when the specification passes through the lex tool.
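As an illustration of the C part of the definitions section, here is a minimal sketch. The global variable line_count and the two rules are hypothetical, chosen only to show that code between “%{” and “%}” is visible to the actions and to the rest of lex.yy.c.

```lex
%{
#include <stdio.h>      /* library header, copied into lex.yy.c */
int line_count = 0;     /* hypothetical global used by the actions below */
%}
%%
\n      { line_count++; }   /* count each newline */
.       ;                   /* ignore every other character */
%%
int yywrap(void) { return 1; }

int main(void)
{
    yylex();
    printf("lines: %d\n", line_count);
    return 0;
}
```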
Second, the definitions section gives the lex tool what it needs to
convert the Lex specification correctly and efficiently into a lexical analyzer.
It holds simple name definitions, i.e. regular definitions, which simplify
the scanner specification.
Regular / name definitions have the form:
name definition
Example:
DIGIT [0-9]
ID [a-z][a-z0-9]*
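A name is used in a later pattern by writing it in braces, for example {DIGIT}. The sketch below reuses the two definitions from the example; the rules and the printed messages are assumptions added for illustration.

```lex
%{
#include <stdio.h>
%}
DIGIT   [0-9]
ID      [a-z][a-z0-9]*
%%
{DIGIT}+              { printf("An integer: %s\n", yytext); }
{DIGIT}+"."{DIGIT}*   { printf("A float: %s\n", yytext); }
{ID}                  { printf("An identifier: %s\n", yytext); }
[ \t\n]+              ;   /* skip white space */
.                     { printf("Unrecognized: %s\n", yytext); }
%%
int yywrap(void) { return 1; }
int main(void) { yylex(); return 0; }
```

Here {DIGIT}+ stands for [0-9]+, so the definitions keep the rules short and readable.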
Rules Section
The rules section of the lex input contains a series of rules of the form:
pattern1 {action1}
pattern2 {action2}
Each pattern is a regular expression. The lexer matches the longest possible
string; if several patterns match the same longest string, the rule listed first wins.
Once a pattern is matched, the corresponding action is executed.
The action consists of ordinary C statements enclosed
in “{” and “}” characters; a single-statement action may omit the braces.
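The matching behaviour can be seen in a small sketch. The token labels are hypothetical; the point is that the longest match is chosen, and that when two patterns match the same text the rule listed first is used.

```lex
%{
#include <stdio.h>
%}
%%
"if"                { printf("KEYWORD if\n"); }
[a-z][a-z0-9]*      { printf("ID %s\n", yytext); }
"=="                { printf("EQ\n"); }
"="                 { printf("ASSIGN\n"); }
[ \t\n]+            ;   /* skip white space */
.                   { printf("OTHER %s\n", yytext); }
%%
int yywrap(void) { return 1; }
int main(void) { yylex(); return 0; }
```

Given the input line `if iffy = ==`, this sketch should print KEYWORD if, ID iffy, ASSIGN and EQ, one per line. The text "if" matches both of the first two patterns with the same length, so the first rule applies; "iffy" is matched as a whole by the identifier pattern because that match is longer; and "==" is preferred over two separate "=" matches.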
Example:
{ID} printf( "An identifier: %s\n", yytext );
The variable yytext holds the lexeme of the matched input string, and
the variable yyleng holds the length of that lexeme.
If the action is empty, the matched input is discarded.
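A minimal sketch of these points follows. It assumes the ID definition from the definitions section, and it uses yyleng, the name under which Lex and Flex expose the lexeme length; the cast to int is a precaution, since the exact type of yyleng can vary between implementations.

```lex
%{
#include <stdio.h>
%}
ID      [a-z][a-z0-9]*
%%
{ID}        { printf("An identifier: %s (length %d)\n", yytext, (int)yyleng); }
[ \t\n]+    ;   /* empty action: white space is matched and discarded */
.           ;   /* likewise, any other character is ignored */
%%
int yywrap(void) { return 1; }
int main(void) { yylex(); return 0; }
```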
Auxiliary Procedures
The third section holds whatever auxiliary procedures are needed by the
actions; it is copied verbatim to lex.yy.c.
Alternatively, these procedures can be compiled separately and linked with
the lexical analyzer.
The auxiliary procedures are written in C.
This section is optional; if it is missing, the second %% in
the input file may be omitted.
In the definitions and rules sections, any indented text or text enclosed in
%{ and %} is copied verbatim to the output (with the %{ and %} delimiters removed).
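The sketch below shows an auxiliary procedure being called from an action. The helper is_keyword() and its keyword list are hypothetical and are not part of Lex; they only illustrate how code in the third section is made available to the rules.

```lex
%{
#include <stdio.h>
#include <string.h>
int is_keyword(const char *s);   /* prototype for the helper defined below */
%}
%%
[a-z]+      {
                if (is_keyword(yytext))
                    printf("KEYWORD %s\n", yytext);
                else
                    printf("ID %s\n", yytext);
            }
[ \t\n]+    ;   /* skip white space */
.           ;   /* ignore anything else */
%%
/* Auxiliary procedures: this part is copied verbatim into lex.yy.c. */

/* Hypothetical helper: returns 1 if s is one of a few sample keywords. */
int is_keyword(const char *s)
{
    static const char *const keywords[] = { "if", "else", "while" };
    size_t i;

    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    return 0;
}

int yywrap(void) { return 1; }
int main(void) { yylex(); return 0; }
```

The prototype sits in the %{ ... %} block so that the actions, which appear earlier in lex.yy.c than the third section, can call the helper.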