by Dinesh Thakur Category: Compiler Design

LR parsers are used to parse the large class of context free grammars. This technique is called LR(k) parsing.

• L is left-to-right scanning of the input.

• R is for constructing a right most derivation in reverse.

*• k *is the number of input symbols of lookahead that are used in making parsing decisions.

There are three widely used algorithms available for constructing an LR parser:

• SLR(l) - Simple LR

o Works on smallest class of grammar.

o Few number of states, hence very small table.

o Simple and fast construction.

• LR( 1) - LR parser

o Also called as Canonical LR parser.

o Works on complete set of LR(l) Grammar.

o Generates large table and large number of states.

o Slow construction.

• LALR(l) - Look ahead LR parser

o Works on intermediate size of grammar.

o Number of states are same as in SLR(l).

**Reasons for attractiveness of LR parser**

• LR parsers can handle a large class of context-free grammars.

• The LR parsing method is a most general non-back tracking shift-reduce parsing method.

• An LR parser can detect the syntax errors as soon as they can occur.

• LR grammars can describe more languages than LL grammars.

**Drawbacks of LR parsers**

• It is too much work to construct LR parser by hand. It needs an automated parser generator.

• If the grammar contains ambiguities or other constructs then it is difficult to parse in a left-to-right scan of the input.

LR parser consists of an input, an output, a stack, a driver program and a parsing table that has two functions

1. Action

2. Goto

The driver program is same for all LR parsers. Only the parsing table changes from one parser to another.

The parsing program reads character from an input buffer one at a time, where a shift reduces parser would shift a symbol; an LR parser shifts a state. Each state summarizes the information contained in the stack.

The stack holds a sequence of states, *s _{o}, *s

**Action **This function takes as arguments a state *i *and a terminal *a *(or $, the input end marker). The value of ACTION [i, *a] *can have one of the four forms:

i) Shift *j, *where *j *is a state.

ii) Reduce by a grammar production A---> *β.*

iii) Accept.

iv) Error.

**Goto **This function takes a state and grammar symbol as arguments and produces a state.

If GOTO [I_{i} ,A] = I_{j}, the GOTO also maps a state *i *and non terminal A to state *j.*

**Behavior of the LR parser**

1. If ACTION[s_{m}, *ai] *= shift *s. *The parser executes the shift move, it shifts the next state *s *onto the stack, entering the configuration

a) *S _{m} - *the state on top of the stack.

b) *a _{i}- *the current input symbol.

2. If ACTION[s_{m}, *a _{i}] *=reduce A---> β

* (s _{0}s_{1} ... S_{(m-r)}S, *a

a) where *r *is the length of *β *and s= *GOTO[s _{m} - r, *A].

b) First popped *r *state symbols off the stack, exposing state *S _{m-r}·*

c) Then pushed *s, *the entry for *GOTO[s _{m-r}, *A], onto the stack.

3. If ACTION[s_{m}, *a _{i}] *= accept, parsing is completed.

4. If ACTION[s_{m}, *a _{i}] *= error, the parser has discovered an error and calls an error recovery routine.

**Algorithm **LR Parsing Algorithm.

**Input** Input string w,

LR-Parsing table with functions ACTION and

GOTO for a grammar G

Output If w is in L(G), the reduction steps of a

bottom-up parse for w,

otherwise, an error indication.

Method Initially, the parser has So on its stack,

where So is the initial state, and w $ in the

input buffer.

let a be the first symbol of w $

while(l) { //repeat forever

let s be the state on top of the stack;

if(ACTION[s, a] =shift t {

push t onto the stack;

let a be the next input symbol;

} else if (ACTION [s, a] = reduce A---> *β*) {

pop *β* symbols off the stack;

let state t now be on top of the stack;

push GOTO[t, A] onto the stack;

output the production A---> *β**;*

} else if (ACTION [s, a] accept) break;

//parsing is done

else call error-recovery routine;

}

An LR(O) item of a grammar G is a production of G with a dot at some position of the body.

(eg.)

A ---> •XYZ

A ---> XeYZ

A ---> XYeZ

A ---> XYZ•

One collection of set of LR(O) items, called the canonical LR(O) collection, provides finite automaton that is used to make parsing decisions. Such an automaton is called an LR(O) automaton.

An LR(O)parser is a shift-reduce parser that uses zero tokens of lookahead to determine what action to take (hence the 0). This means that in any configuration of the parser, the parser must have an unambiguous action to choose-either it shifts a specific symbol or applies a specific reduction. If there are ever two or more choices to make, the parser fails and the grammar is not LR(O).

An LR parser makes shift-reduce decisions by maintaining states to keep track of parsing. States represent a set of *items.*

If I is a set of items for a grammar G, then *CLOSURE(I) *is the set of items constructed from I by the two rules.

• Initially, add every item I to *CLOSURE(I).*

• If A ---> αB,β is in *CLOSURE(I) *and B ---> ɣ is a production, then add the item B ---> • ɣ to *CLOSURE(i), *if it is not already there. Apply this rule until no more items can be added to *CLOSURE (!).*

• Augmented grammar is defined with two functions, CLOSURE and GOTO. If G is a grammar with start symbol S, then augmented grammar G' is G with a new start symbol S' ---> S.

• The role of augmented production is to stop parsing and notify the acceptance of the input i.e., acceptance occurs when and only when the parser performs reduction by S' ---> S.

Consider grammar for matched parentheses

1. S' ---> S

2. S' ---> (S) S

3. S' ---> Ɛ

The LR(O) DFA of grammar G is shown below

In states: 0, 2 and 4 parser can shift (and reduce *Ɛ *to S)

**Conflicts**

Conflicts are the situations which arise due to more than one option to opt for a particular step of shift or reduce.

• Two kinds of conflicts may arise.

Shift-reduce and *reduce-reduce.*

• In state 0 parser encounters a conflict.

It can shift state 2 on stack when next token is (.

It can reduce production 2: S ---> Ɛ on +.

This is a called a *shift-reduce *conflict.

This conflict also appears in states 2 and 4.

** Shift-reduce **conflict parser can shift and can reduce.

** Reduce-reduce **conflict two (or more) productions can be reduced.

• SLR(l) parsing increases the power of LR(O) significantly.

Look ahead token is used to make parsing decisions

Reduce action is applied more selectively according to FOLLOW set.

• A grammar is SLR(l) if two conditions are met in every state.

If A ---> α *•* x ɣ and B ---> β ,•then token *x Ɛ *FOLLOW(B).

If A ---> α *• *and B ---> • then FOLLOW(A) П FOLLOW(B) = Ø.

• Violation of first condition results in *shift-reduce conflict.*

A---> α *•* x ɣ and B ---> β*•* and *x *Ɛ FOLLOW(B) then ...

Parser can shift *x *and reduce B ---> β*.*

• Violation of second condition results in *reduce-reduce conflict.*

A---> α *•* and B ---> β,•and *x *Ɛ FOLLOW(A) n FOLLOW(B).

Parser can reduce A ---> α and B ---> β,*.*

• SLR(l) grammars are a superset of LR(O) grammars.

• Even more powerful than SLR(l) is the LR(l) parsing method.

• LR(l) includes LR(O) items and a look ahead token in itemsets.

• An LR(l) item consists of,

o Grammar production rule.

o Right-hand position represented by the dot and.

o Lookahead token.

o A --->X_{1} · · · X_{i} • X_{i+1} · · · *X _{n}, *

• The • represents how much of the right-hand side has been seen,

o X_{1} · · · X_{i} appear on top of the stack.

o X_{i+l} · · · *X _{n} *are expected to appear on input buffer.

• The lookahead token *l *is expected after X_{1} · · · *X _{n} *appears on stack.

• An LR(l) state is a set of LR(l) items.

• LALR stands for *lookahead LR *parser.

• This is the extension of LR(O) items, by introducing the one symbol of lookahead on the input.

• It supports large class of grammars.

• The number of states is LALR parser is lesser than that of LR( 1) parser. Hence, LALR is preferable as it can be used with reduced memory.

• Most syntactic constructs of programming language can be stated conveniently.

Steps to construct LALR parsing table

• Generate LR(l) items.

• Find the items that have same set of first components (core) and merge these sets into one.

• Merge the *goto's *of combined itemsets.

• Revise the parsing table of LR(l) parser by replacing states and *goto's *with combined states and combined *goto's *respectively.

About Dinesh Thakur

Dinesh Thakur holds an B.SC (Computer Science), MCSE, MCDBA, CCNA, CCNP, A+, SCJP certifications. Dinesh authors the hugely popular Computer Notes blog. Where he writes how-to guides around Computer fundamental , computer software, Computer programming, and web apps. For any type of query or something that you think is missing, please feel free to Contact us.

Search Content

Popular Article

#### Phases of Compiler - Compiler Design

#### What is Translators? Different type of translators

#### Input Buffering – Compiler Design

#### Lexical Analysis – Compiler Design

#### Compiler Construction tools - Compiler Design

#### Convert Regular Expression to DFA - Compiler Design

#### LR Parsers - Compiler Design

#### Syntax Directed Definition (SDD) and Types of Syntax Directed Definitions

#### Type of Parsing

#### What is Parse Tree? - Compiler Design

#### What is LEX? Use of Lex.

#### What is Parser (Syntax analysis)? Error Handling and Recovery in Syntax Analyzer

#### Regular Expression - Compiler Design

#### Grouping of Phases - Compiler Design

#### Compiler Design - Language Processing System

#### What is Context Free Grammars? Compiler Design

#### What is Derivations? - Compiler Design

#### Error Handling and Error Recovery In Syntax Analyzer

Basic Courses

Advance Courses