Syntax analysis is the second phase of compiler.
Syntax analysis is also known as parsing.
Parsing is the process of determining whether a string of tokens can be generated by a grammar.
It is performed by syntax analyzer which can also be termed as parser.
In addition to construction of the parse tree, syntax analysis also checks and reports syntax errors accurately.
(eg.)
C = a + b * 5
Syntax tree can be given as,
Parser is a program that obtains tokens from lexical analyzer and constructs the parse tree which is passed to the next phase of compiler for further processing.
Parser implements context free grammar for performing error checks.
We’ll be covering the following topics in this tutorial:
Types of Parser
• Top down parsers Top down parsers construct parse tree from root to leaves.
• Bottom up parsers Bottom up parsers construct parse tree from leaves to root.
Role of Parser
Figure depicts the role of parser with respect to other phases.
• Once a token is generated by the lexical analyzer, it is passed to the parser.
• On receiving a token, the parser verifies the string of token names that can be generated by the grammar of source language.
• It calls the function getNextToken(), to notify the lexical analyzer to yield another token.
• It scans the token one at a time from left to right to construct the parse tree.
• It also checks the syntactic constructs of the grammar.
Need for Parser
• Parser is needed to detect syntactic errors efficiently.
• Error is detected as soon as a prefix of the input cannot be completed to form a string in the language. This process of analyzing the prefix of input is called viable-prefix property.
Error Recovery Strategies
Error recovery strategies are used by the parser to recover from errors once it is detected. The simplest recovery strategy is to quit parsing with an error message for the first error itself.
Panic Mode Recovery
Once an error is found, the parser intends to find designated set of synchronizing tokens by discarding input symbols one at a time.
Synchronizing tokens are delimiters, semicolon or } whose role in source program is clear.
• When parser finds an error in the statement, it ignores the rest of the statement by not processing the input.
• This is the easiest way of error-recovery.
• It prevents the parser from developing infinite loops.
Advantages
• Simplicity.
• Never get into infinite loop.
Disadvantage
• Additional errors cannot be checked as some of the input symbols will be skipped.
Phrase Level Recovery
Parser performs local correction on the remaining input when an error is detected.
• When a parser finds an error, it tries to take corrective measures so that the rest of inputs of statement allow the parser to parse ahead.
• One wrong correction will lead to an infinite loop.
The local correction may be
• Replacing a prefix by some string.
• Replacing comma by semicolon.
• Deleting extraneous semicolon.
• Insert missing semicolon.
Advantage
• It can correct any input string.
Disadvantage
• It is difficult to cope up with actual error if it has occurred before the point of detection.
Error Production
Productions which generate erroneous constructs are augmented to the grammar by considering common errors that occur.
These productions detect the anticipated errors during parsing.
Error diagnostics about the erroneous constructs are generated by the parser.
Global Correction
There are algorithms which make changes to modify an incorrect string into a correct string.
These algorithms perform minimal sequence of changes to obtain globally least-cost correction.
When a grammar G and an incorrect string pis given, these algorithms find a parse tree for a string q related top with smaller number of transformations.
The transformations may be insertions, deletions and change of tokens.
Advantage
• It has been used for phrase level recovery to find optimal replacement strings.
Disadvantage
• This strategy is too costly to implement in terms of time and space.