• Skip to main content
  • Skip to primary sidebar
  • Skip to secondary sidebar
  • Skip to footer

Computer Notes

Library
    • Computer Fundamental
    • Computer Memory
    • DBMS Tutorial
    • Operating System
    • Computer Networking
    • C Programming
    • C++ Programming
    • Java Programming
    • C# Programming
    • SQL Tutorial
    • Management Tutorial
    • Computer Graphics
    • Compiler Design
    • Style Sheet
    • JavaScript Tutorial
    • Html Tutorial
    • Wordpress Tutorial
    • Python Tutorial
    • PHP Tutorial
    • JSP Tutorial
    • AngularJS Tutorial
    • Data Structures
    • E Commerce Tutorial
    • Visual Basic
    • Structs2 Tutorial
    • Digital Electronics
    • Internet Terms
    • Servlet Tutorial
    • Software Engineering
    • Interviews Questions
    • Basic Terms
    • Troubleshooting
Menu

Header Right

Home » Compiler » Regular Expression – Compiler Design
Next →
← Prev

Regular Expression – Compiler Design

By Dinesh Thakur

• Regular expressions are a notation to represent lexeme patterns for a token.

• They are used to represent the language for lexical analyzer.

• They assist in finding the type of token that accounts for a particular lexeme.

We’ll be covering the following topics in this tutorial:

  • Strings and Languages
  • Operations on Languages
  • Regular Expressions
  • Regular Definition

Strings and Languages

Alphabets are finite, non-empty set of input symbols.

               Σ = {0, 1} – binary alphabets

String represents the collection of alphabets.

               w = {0,1, 00, 01, 10, 11, 001, 010, … }

w indicates the set of possible strings for the given binary alphabet Σ

Language (L) is the collection of strings which are accepted by finite automata.

                L = {0n1 I n >= 0}

Length of string is defined as the number of input symbols in a given string. It is found by || operator.

             Let ω = 0101

             | ω | =4

Empty string denotes zero occurrence of input symbol. It is represented by Ɛ. Concatenation of two strings p and q is denoted by pq.

        Let       p = 010

        And      q = 001

                  pq = 010001

                  qp = 001010

                  i.e., pq ≠ qp

 

Empty string is identity under concatenation.

     Let x be a string.

                    Ex= XE= X

Prefix A prefix of any string s, is obtained by removing zero or more symbols from the end of s.

          (eg.) s = balloon

Possible prefixes are: ball, balloon,

Suffix A suffix of any string s, is obtained by removing zero or more symbols from the beginning of s.

          (eg.) s =balloon

Possible prefixes are: loon, balloon

Proper prefix: Proper prefix p of a strings, can be given by s ≠ p and p ≠ E

Proper suffix: Proper suffix x of a string s, can be given by s ≠ x and x ≠ E

Substring: Substring is part of a string obtained by removing any prefix and any suffix from s.

Operations on Languages

Important operations on a language are:

• Union

• Concatenation and

• Closure

Union

Union of two languages Land M produces the set of strings which may be either in language L or in language M or in both. It can be denoted as,

LUM = {p I p is in L or p is in M}

Concatenation

Concatenation of two languages L and M, produces a set of strings which are formed by merging the strings in L with strings in M (strings in L must be followed by strings in M). It can be represented as,

LUM= {pq | p is in L and q is in M}

Closure

Kleene closure (L*)

Kleene closure refers to zero or more occurrences of input symbols in a string, i.e., it includes empty string Ɛ(set of strings with 0 or more occurrences of input symbols).

                                       

Positive closure (L +)

Positive closure indicates one or more occurrences of input symbols in a string, i.e., it excludes empty string Ɛ(set of strings with 1or more occurrences of input symbols).

                                       

L3– set of strings each with length 3.

(eg.) Let Σ = {a, b}

L* = {E, a, b, aa, ab, ba, bb, aab, aba, aaba, … }

L+ = {a, b, aa, ab, ba, bb, aab, aaba, }

L3 = {aaa, aba, abb, bba, bob, bbb, }

Precedence of operators

• Unary operator (*) is having highest precedence.

• Concatenation operator (-) is second highest and is left associative.

           letter_ (letter_ I digit )*

• Union operator ( I or U) has least precedence and is left associative.

Based on the precedence, the regular expression is transformed to finite automata when implementing lexical analyzer.

Regular Expressions

Regular expressions are a combination of input symbols and language operators such as union, concatenation and closure.

It can be used to describe the identifier for a language. The identifier is a collection of letters, digits and underscore which must begin with a letter. Hence, the regular expression for an identifier can be given by,

Letter_ (letter I digit)*

Note: Vertical bar ( I ) refers to ‘or’ (Union operator).

The following describes the language for given regular expression:

                                       Languages for regular expressions

                

S.No.

Regular expression

Language

1

r

L(r)

2

a

L(a)

3

r | s

L(r) | L(s)

4

rs

L(r) L(s)

5

r*

(L(r))*

 

Regular set Language defined by regular expression.

Two regular expressions are equivalent, if they represent the same regular set.

                                        (p I q) = (q | p)

 

                                    Algebraic laws of regular expressions

 

Law

Description

r | s = s | r| is commutative
r | (s | t) = (r | s ) | t| is associative
r (st) = (rs)tConcatenation is associative
r(s|t) = rs | rt; (s|t)r = sr | trConcatenation is distributive
Ɛr = rƐ = rƐ is identity for concatenation
r* = (r | Ɛ)*Ɛ is guaranteed in closure
r** = r** is idempotent

Regular Definition

Regular definition d gives aliases to regular expressions r and uses it for convenience. Sequences of definitions are of the following form

di –> ri

d2–>r2

d3–> rs

dn–> rn

in which definitions di, d2, … , can be used in place of ri, r2 respectively.

letter –> A I B I · · · I Z I a I b I · · · I z I

digit –>0 |1 I 2 … I 9

id –> letter_ (letter I digit)*

You’ll also like:

  1. Convert Regular Expression to DFA – Compiler Design
  2. Compiler Construction tools – Compiler Design
  3. Phases of Compiler – Compiler Design
  4. Regular Expression in Python
  5. LR Parsers – Compiler Design
Next →
← Prev
Like/Subscribe us for latest updates     

About Dinesh Thakur
Dinesh ThakurDinesh Thakur holds an B.C.A, MCDBA, MCSD certifications. Dinesh authors the hugely popular Computer Notes blog. Where he writes how-to guides around Computer fundamental , computer software, Computer programming, and web apps.

Dinesh Thakur is a Freelance Writer who helps different clients from all over the globe. Dinesh has written over 500+ blogs, 30+ eBooks, and 10000+ Posts for all types of clients.


For any type of query or something that you think is missing, please feel free to Contact us.


Primary Sidebar

Compiler Design

Compiler Design

  • CD - Translators
  • CD - Compiler Phases
  • CD - Input Buffering
  • CD - Lexical Analysis
  • CD - Regular Expression
  • CD - LR Parsers
  • CD - Parse Tree
  • CD - Parsing Type
  • CD - Lex
  • CD - Syntax Directed Definition
  • CD - Compiler Construction tools
  • CD - Regular Expression
  • CD - Context Free Grammars
  • CD - Syntax Analysis
  • CD - Grouping of Phases
  • CD - Language Processing System
  • CD - Derivations
  • CD - Error Handling

Other Links

  • Compiler Design - PDF Version

Footer

Basic Course

  • Computer Fundamental
  • Computer Networking
  • Operating System
  • Database System
  • Computer Graphics
  • Management System
  • Software Engineering
  • Digital Electronics
  • Electronic Commerce
  • Compiler Design
  • Troubleshooting

Programming

  • Java Programming
  • Structured Query (SQL)
  • C Programming
  • C++ Programming
  • Visual Basic
  • Data Structures
  • Struts 2
  • Java Servlet
  • C# Programming
  • Basic Terms
  • Interviews

World Wide Web

  • Internet
  • Java Script
  • HTML Language
  • Cascading Style Sheet
  • Java Server Pages
  • Wordpress
  • PHP
  • Python Tutorial
  • AngularJS
  • Troubleshooting

 About Us |  Contact Us |  FAQ

Dinesh Thakur is a Technology Columinist and founder of Computer Notes.

Copyright © 2023. All Rights Reserved.