Compiler Design (MCA554) – Unit 2: Parsing and Syntax Analysis

 Compiler Design (MCA554) – Unit 2: Parsing and Syntax Analysis

Based on MCA Semester III Syllabus 
---
Introduction to Parsing
Parsing is the second phase of a compiler.
After Lexical Analysis generates tokens, the Parser checks whether the sequence of tokens follows the grammar rules of the programming language.
Example
Source Code:
a = b + c;
Tokens:
id = id + id ;
Parser verifies whether the token sequence is grammatically correct.

---
Role of Parser
The parser performs the following tasks:
Checks syntax errors
Builds parse trees
Verifies grammar rules
Reports syntax violations

Position in Compiler
Source Program
      ↓
Lexical Analyzer
      ↓
Parser
      ↓
Semantic Analyzer

---
Context Free Grammar (CFG)
A Context Free Grammar is used to describe the syntax of programming languages.
A CFG consists of:
Terminals
Non-terminals
Productions
Start Symbol

Example Grammar
E → E + T
E → T
T → id
Where:
E = Expression
T = Term
id = Identifier


---
Derivation
Derivation means generating a string using grammar rules.
Example
Grammar:
E → E + T
E → T
T → id
Derivation:
E
→ E + T
→ T + T
→ id + T
→ id + id

---
Parse Tree
A Parse Tree is a graphical representation of derivation.
Example
E
      / | \
     E + T
     | |
     T id
     |
    id
Parse Trees help understand the structure of expressions.

---
Ambiguous Grammar
A grammar is ambiguous if one string can have multiple parse trees.
Example
E → E + E
E → E * E
E → id
Expression:
id + id * id
Can have multiple interpretations.

---
Left Recursion
A grammar is left recursive when a non-terminal appears first on the right side of its own production.
Example
E → E + T
E → T
This is left recursive.

---
Elimination of Left Recursion
Original Grammar:
E → E + T | T
Converted Grammar:
E → T E'
E' → + T E' | ε
Advantages:
Suitable for top-down parsing
Avoids infinite recursion


---
Left Factoring
Used when multiple productions share common prefixes.
Example
Original:
A → if E then S
A → if E then S else S
After Left Factoring:
A → if E then S A'
A' → else S | ε
Benefits:
Simplifies parsing
Reduces ambiguity


---
Dangling Else Problem
Consider:
if(condition1)
   if(condition2)
      statement1;
   else
      statement2;
Question:
Which "if" does the "else" belong to?
This ambiguity is called the Dangling Else Problem.

---
Top Down Parsing
Top-down parsers start from the root and construct the parse tree downward.
Methods:
1. Recursive Descent Parsing

2. Predictive Parsing



---
Recursive Descent Parsing
Uses recursive procedures for each grammar rule.
Example:
E()
T()
F()
Advantages:
Easy to implement

Disadvantages:
Cannot handle left recursion


---
Predictive Parsing
A special recursive descent parser without backtracking.
Requirements:
No left recursion
Left factored grammar

Advantages:
Fast
Efficient


---
Bottom Up Parsing
Bottom-up parsers start from input symbols and build toward the start symbol.
Process:
Input String
      ↓
Reductions
      ↓
Start Symbol

---
Shift Reduce Parsing
Most important Bottom-Up Parsing technique.
Operations:
Shift
Push input symbol onto stack.
Reduce
Replace symbols using grammar rules.
Accept
Parsing successful.
Error
Invalid input.

---
Example
Input:
id + id
Actions:
Shift id
Reduce T→id
Reduce E→T
Shift +
Shift id
Reduce T→id
Reduce E→E+T
Accept

---
Handle
A Handle is a substring that matches the right side of a production and can be reduced.
Example:
E → E + T
Handle:
E + T

---
Handle Pruning
Process of repeatedly reducing handles until the start symbol is obtained.

---
Shift Reduce Conflicts
Two common conflicts:
Shift-Reduce Conflict
Parser cannot decide:
Shift?
or
Reduce?

---
Reduce-Reduce Conflict
Parser cannot decide which reduction to apply.

---
LR Parsers
LR means:
L = Left to Right Scanning
R = Rightmost Derivation in Reverse
LR Parsers are powerful Bottom-Up Parsers.

---
Types of LR Parsers
1. SLR Parser
Simple LR Parser
Advantages:
Easy implementation

Disadvantages:
Less powerful


---
2. CLR Parser
Canonical LR Parser
Advantages:
Most powerful

Disadvantages:
Large parsing tables


---
3. LALR Parser
Look Ahead LR Parser
Advantages:
Compact tables
Widely used

Used in:
Compiler construction tools


---
Parsing Ambiguous Grammars
Ambiguous grammars create multiple parse trees.
Solutions:
Rewrite grammar
Use precedence rules
Use associativity rules


---
YACC
YACC stands for:
Yet Another Compiler Compiler
Used to generate parsers automatically.
Input:
Grammar Rules
Output:
Parser Program
Works together with:
LEX
LEX → Token Generator
YACC → Parser Generator

---
Important Exam Questions
Short Questions
1. Define Parsing.

2. What is CFG?

3. Define Parse Tree.

4. What is Left Recursion?

5. What is Left Factoring?

6. Define Handle.

7. What is Shift Reduce Parsing?

8. What is YACC?



---
Long Questions
1. Explain Context Free Grammar with examples.

2. Explain Parse Tree and Derivation.

3. Discuss Left Recursion and its elimination.

4. Explain Predictive Parsing.

5. Explain Shift Reduce Parsing with example.

6. Compare SLR, CLR and LALR Parsers.

7. Explain YACC and its role in compiler design.



---
Quick Revision Notes
Parsing = Syntax checking phase.
CFG = Context Free Grammar.
Parse Tree = Graphical representation of derivation.
Left Recursion = Non-terminal appears first on RHS.
Left Factoring = Removes common prefixes.
Top Down = Root → Leaves.
Bottom Up = Leaves → Root.
Shift Reduce = Most common Bottom-Up technique.
LR = Left-to-right scan, Rightmost derivation reverse.
SLR < LALR < CLR (Power).
LEX = Scanner Generator.
YACC = Parser Generator.



Comments

Popular posts from this blog

Raster scan Vs Vector Scan

MCA SYLLABUS ALLAHABAD UNIVERSITY 2025

📘 PAPER 4 – DESIGN & ANALYSIS OF ALGORITHMS (UNIT 1 – INTRODUCTION & SORTING TECHNIQUES ) university of allahabad