CSE-3201-Lecture-04-Lexical Analysis-Regular Expression.pptx

mahammedcse 1 views 16 slides Oct 11, 2025
Slide 1
Slide 1 of 16
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16

About This Presentation

Topic:
Lexical Analysis-Regular Expression.


Slide Content

Lecture-04 Lexical Analysis ( Regular Expressions) Mostafiz Ahammed Lecturer Department of Computer Science and Engineering Notre Dame University Bangladesh

Language and Regular Expressions A Regular Expression is a Set of Rules / Techniques for Constructing Sequences of Symbols (Strings) From an Alphabet. Let  Be an A l ph a bet, r a Regular Expre s s i on Th e n L(r) is the Language that is Characterized by the Rules of r.

Rules for specifying Regular Expressions Regular expressions over alphabet   is a regular expression that denotes {  }. If a is a symbol (i.e., if a   ), then a is a regular expression that denotes {a}. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then (r) | (s) is a regular expression denoting L(r) U L(s). (r)(s) is a regular expression denoting L(r)L(s). (r) * is a regular expression denoting (L(r)) * . (r) is a regular expression denoting L(r).

How to “Parse” Regular Expressions Precedence: * has highest precedence. Concatenation as middle precedence. | has lowest precedence. Use parentheses to override these rules. Examples: – a b* = a (b*) If you want (a b)* you must use parentheses. a | b c = a | (b c) If you want (a | b) c you must use parentheses. Concatenation and | are associative. – (a b) c = a (b c) = a b c – (a | b) | c = a | (b | c) = a | b | c Example: – b d | e f * | g a = (b d) | (e (f *)) | (g a)

Example Let  = {a, b} The regular expression a | b denotes the set {a, b} The regular expression (a|b)(a|b) denotes {aa, ab, ba, bb} The regular expression a * denotes the set of all strings of zero or more a’s. i.e., {  , a, aa, aaa, ….. } The regular expression (a|b) * denotes the set containing zero or more instances of an a or b. The regular expression a|a*b denotes the set containing the string a and all strings consisting of zero or more a’s followed by one b.

Equality vs Equivalence Are these regular expressions equal? R = a a* (b | c) S = a* a (c | b) ... No! Yet, they describe the same language. L(R) = L(S) “Equivalence” of regular expressions If L(R) = L(S) then we say R  S “R is equivalent to S” From now on, we’ll just say R = S to mean R  S

Algebraic law of regular expressions

Regular Definition If Σ is an alphabet of basic symbols then a regular definition is a sequence of the following form: d 1  r 1 d 2  r 2 …….. d n  r n where Each d i is a new symbol such that d i  Σ and d i  d j where j < I Each r i is a regular expression over Σ  {d 1 ,d 2 ,…,d i-1 )

Regular Definition

Addition Notation / Shorthand

Unsigned Number 1240, 39.45, 6.33E15, or 1.578E-41 digit  0 | 1 | 2 | … | 9 digits  digit digit* optional_fraction  . digits |  optional_exponent  ( E ( + | -|  ) digits ) |  num  digits optional_fraction optional_exponent digit  0 | 1 | 2 | … | 9 digits  digit + optional_fraction  (. digits ) ? optional_exponent  ( E ( + | -) ? digits ) ? num  digits optional_fraction optional_exponent Shorthand

Nonregular sets

Token Recognition How can we use concepts developed so far to assist in recognizing tokens of a source language ? Assume Following Tokens: if, then, else, relop, id, num Given Tokens, What are Patterns ? if  if then  then else  else relop  < | <= | > | >= | = | <> id  letter ( letter | digit )* num  digit + (. digit + ) ? ( E(+ | -) ? digit + ) ? Grammar: stmt  |if expr then stmt | if expr then stmt else stmt |  expr  term relop term | term term  id | num

What Else Does Lexical Analyzer Do? Scan a w ay b l a n k s , new li n es, t a bs Can we Define Tokens For These? bl a nk tab  blank  tab newline  newline delim  blank | tab | newline ws  delim + Ans: No token is returned to parser

Overall Regular Token Attribute-Value Expression ws - - if if - then then - else else - id id pointer to table entry num num Exact value < relop LT <= relop LE = relop EQ < > relop NE > relop GT >= relop GE Note: Each token has a unique token identifier to define category of lexemes

Example of Regular Expression Regular Expression for numbers digit  0|1|…|9 digits  digit digit* optional_fraction  .digits|  optional_exponent  ( E (+|-|  ) digits ) |  num  digits optional_fraction optional_exponent Using shorthands: digit  0|1|…|9 digits  digit + optional_fraction  (.digits)? optional_exponent  ( E (+|-|  ) digits ) ? num  digits optional_fraction optional_exponent ? means “zero or one instance of” r? is a shorthand of r | 
Tags