Title: Design Patterns for Recursive Descent Parsing
1. Design Patterns for Recursive Descent Parsing
- Dung Nguyen, Mathias Ricken, Stephen Wong
- Rice University
2. RDP in CS2?
- Context: objects-first intro curriculum which already covers
  - Polymorphism
  - Recursion
  - Design patterns (visitors, factories, etc.)
  - OOD principles
- Want a good OOP/D example
- Want a relevant CS topic
- Recursive Descent Parsing
- Smooth transitions from simple to complex examples, developing an abstract model
  - Small change in grammar → small change in code
3. The Problem of Teaching RDP
Mutual Recursion!
A complex, isolated, advanced topic for upper division only
New Grammar → Global Analysis → New Code
4. Object-Oriented Approach
- Grammar must drive any processing related to it, e.g. parsing
  - → Model the grammar first
  - Terminal symbols (tokens)
  - Non-terminal symbols (incl. start symbol)
  - Rules
- Driving forces
  - Decouple intelligent tokens from rules → visitors to tokens
  - Extensible system: open-ended number of tokens → extended visitors
Then parsing will come!
5. Representing Tokens
- Intelligent tokens → no type checking!
- Decoupled from processing → Visitor pattern
- For LL(1) grammars, in any given situation, the token determines the parsing action taken
  - → Parsing is done by visitors to tokens
6. Processing Tokens with Visitors
Standard Visitor Pattern
[Diagram: a Visitor with methods caseA and caseB visits Token A and Token B; Token A calls caseA and Token B calls caseB.]
But we want to be able to add an unbounded number of tokens!
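As a minimal sketch of the standard pattern described on this slide (all names here, such as `TokenVisitor`, `AToken` and `DescribeVisitor`, are hypothetical and not taken from the authors' code), the visitor interface fixes one case per token type, which is exactly why it cannot grow with the token set:

```java
// Hypothetical names: TokenVisitor, AToken, TokenA, TokenB, DescribeVisitor.
interface TokenVisitor<R> {
    R caseA(TokenA host);
    R caseB(TokenB host);
    // Adding a TokenC would force a new caseC here and in every visitor.
}

abstract class AToken {
    private final String lexeme;
    AToken(String lexeme) { this.lexeme = lexeme; }
    String getLexeme() { return lexeme; }
    // Each concrete token knows which case of the visitor to call.
    abstract <R> R execute(TokenVisitor<R> visitor);
}

class TokenA extends AToken {
    TokenA(String lexeme) { super(lexeme); }
    @Override <R> R execute(TokenVisitor<R> visitor) { return visitor.caseA(this); }
}

class TokenB extends AToken {
    TokenB(String lexeme) { super(lexeme); }
    @Override <R> R execute(TokenVisitor<R> visitor) { return visitor.caseB(this); }
}

// One algorithm on tokens, written without instanceof or casts.
class DescribeVisitor implements TokenVisitor<String> {
    @Override public String caseA(TokenA host) { return "A(" + host.getLexeme() + ")"; }
    @Override public String caseB(TokenB host) { return "B(" + host.getLexeme() + ")"; }
}

class StandardVisitorDemo {
    public static void main(String[] args) {
        AToken[] tokens = { new TokenA("x"), new TokenB("y") };
        for (AToken t : tokens) System.out.println(t.execute(new DescribeVisitor()));
    }
}
```

The client code never asks a token what type it is; the token selects the case itself.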
7. Processing Tokens with Visitors
Visitor Pattern modified with Chain-of-Responsibility
[Diagram: VisitorA (caseA, defaultCase) visits Token A, which calls caseA. When VisitorA visits Token B, defaultCase delegates to the next visitor in the chain, VisitorB (caseB, defaultCase), which visits Token B, and Token B calls caseB.]
Handles any type of token!
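One way the modified pattern could look in Java is sketched below. It is an illustration under assumptions, not the authors' implementation: the names (`TokVisitor`, `NumToken.ChainVisitor`, `ChainDemo`, and so on) are hypothetical, and the single `instanceof` test is confined to the token's own `execute` method so that client code stays free of type checks.

```java
// Hypothetical names: TokVisitor, Token, NumToken, IdToken, PlusToken, ChainDemo.
interface TokVisitor<R> {
    R defaultCase(Token host);  // the only case every visitor must have
}

abstract class Token {
    final String lexeme;
    Token(String lexeme) { this.lexeme = lexeme; }
    abstract <R> R execute(TokVisitor<R> v);
}

class NumToken extends Token {
    // Each token type brings its own visitor interface ...
    interface Visitor<R> extends TokVisitor<R> { R numCase(NumToken host); }

    // ... and a chain link that passes every other token to its successor.
    abstract static class ChainVisitor<R> implements Visitor<R> {
        private final TokVisitor<R> successor;
        ChainVisitor(TokVisitor<R> successor) { this.successor = successor; }
        @Override public R defaultCase(Token host) { return host.execute(successor); }
    }

    NumToken(String lexeme) { super(lexeme); }

    // The token, not the client, decides whether the visitor can handle it.
    @Override <R> R execute(TokVisitor<R> v) {
        return (v instanceof Visitor) ? ((Visitor<R>) v).numCase(this) : v.defaultCase(this);
    }
}

class IdToken extends Token {
    interface Visitor<R> extends TokVisitor<R> { R idCase(IdToken host); }

    abstract static class ChainVisitor<R> implements Visitor<R> {
        private final TokVisitor<R> successor;
        ChainVisitor(TokVisitor<R> successor) { this.successor = successor; }
        @Override public R defaultCase(Token host) { return host.execute(successor); }
    }

    IdToken(String lexeme) { super(lexeme); }

    @Override <R> R execute(TokVisitor<R> v) {
        return (v instanceof Visitor) ? ((Visitor<R>) v).idCase(this) : v.defaultCase(this);
    }
}

// A token added later: no existing interface or visitor has to change.
class PlusToken extends Token {
    interface Visitor<R> extends TokVisitor<R> { R plusCase(PlusToken host); }

    PlusToken(String lexeme) { super(lexeme); }

    @Override <R> R execute(TokVisitor<R> v) {
        return (v instanceof Visitor) ? ((Visitor<R>) v).plusCase(this) : v.defaultCase(this);
    }
}

class ChainDemo {
    // Builds the chain: num link -> id link -> catch-all end of chain.
    static TokVisitor<String> describe() {
        TokVisitor<String> end = host -> "unknown:" + host.lexeme;
        TokVisitor<String> idLink = new IdToken.ChainVisitor<String>(end) {
            @Override public String idCase(IdToken host) { return "id:" + host.lexeme; }
        };
        return new NumToken.ChainVisitor<String>(idLink) {
            @Override public String numCase(NumToken host) { return "num:" + host.lexeme; }
        };
    }

    public static void main(String[] args) {
        Token[] tokens = { new NumToken("42"), new IdToken("x"), new PlusToken("+") };
        for (Token t : tokens) System.out.println(t.execute(describe()));
    }
}
```

Supporting `PlusToken` in the chain would only require one more link in front of `end`; the existing links stay untouched.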
8. Modeling an LL(1) Grammar
E → F E1
E1 → empty | + E
F → num | id
- Left-factoring
- Make grammar predictively parsable
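For reference, the textbook left-factoring step can be written as follows. The general rule is standard; the starting rule for E shown in the second line is an assumption made for illustration and is not taken from the slides.

```latex
% General rule: factor the common prefix \alpha out of two alternatives.
A \to \alpha\,\beta_1 \mid \alpha\,\beta_2
\quad\Longrightarrow\quad
A \to \alpha\,A', \qquad A' \to \beta_1 \mid \beta_2

% Assumed starting point, with \alpha = F,\ \beta_1 = \varepsilon,\ \beta_2 = +\,E:
E \to F \mid F + E
\quad\Longrightarrow\quad
E \to F\,E_1, \qquad E_1 \to \varepsilon \mid +\,E
```

After factoring, one token of lookahead is enough to choose between the alternatives of E1.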
9. Modeling an LL(1) Grammar
E → F E1
E1 → empty | E1a
E1a → + E
F → F1 | F2
F1 → num
F2 → id
- In multiple rules (branches), replace sequences and tokens with unique non-terminal symbols
- Branches only contain non-terminals
10. Modeling an LL(1) Grammar
- Branches modeled by inheritance (is-a)
  A → B | C
- Sequences modeled by composition (has-a)
  S → X Y
11. Object Model of Grammar
E → F E1
E1 → empty | E1a
E1a → + E
F → F1 | F2
F1 → num
F2 → id
[Diagram: grammar structure alongside the corresponding class structure]
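The class structure could be sketched in Java roughly as follows: each branch becomes an abstract class with one subclass per alternative, and each sequence becomes a class holding its parts as fields. The class names simply mirror the non-terminals, and storing the terminals as plain strings (and assuming the operator in E1a is `+`) is a simplification made here, not something the slides prescribe.

```java
// Hypothetical classes mirroring the non-terminals; tokens are plain strings here.
class E {                                // E -> F E1  (sequence: has-a F, has-a E1)
    final F f;
    final E1 e1;
    E(F f, E1 e1) { this.f = f; this.e1 = e1; }
    @Override public String toString() { return f.toString() + e1.toString(); }
}

abstract class E1 { }                    // E1 -> empty | E1a  (branch: is-a)

class Empty extends E1 {
    @Override public String toString() { return ""; }
}

class E1a extends E1 {                   // E1a -> + E  (sequence)
    final E e;
    E1a(E e) { this.e = e; }
    @Override public String toString() { return " + " + e; }
}

abstract class F { }                     // F -> F1 | F2  (branch: is-a)

class F1 extends F {                     // F1 -> num
    final String num;
    F1(String num) { this.num = num; }
    @Override public String toString() { return num; }
}

class F2 extends F {                     // F2 -> id
    final String id;
    F2(String id) { this.id = id; }
    @Override public String toString() { return id; }
}

class GrammarModelDemo {
    public static void main(String[] args) {
        // Object structure for the sentence "a + 3".
        E tree = new E(new F2("a"), new E1a(new E(new F1("3"), new Empty())));
        System.out.println(tree);  // prints "a + 3"
    }
}
```

Nothing in these classes describes how to parse; they only state what the grammar is.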
12. Modeling an LL(1) Grammar
No Predictive Parsing Table!
Declarative, not procedural
Model the grammar, not the parsing!
13. Detailed and Global Analysis
E → F E1
E1 → empty | E1a
E1a → + E
F → F1 | F2
F1 → num
F2 → id
To process E, we must first know about F and E1
But to process F, we must first know about F1 and F2
But to process F1, we must first know about num!
The processing of one rule requires deep knowledge of the whole grammar!
Or does it??...
Abstract and Local Analysis!
To process E, we must have the ability to process F and E1, independent of how either F or E1 is processed!
Since parsing is done with visitors to tokens, all we need to parse E are the visitors to parse F and E1.
But E doesn't know what it takes to make the F and E1 parsing visitors
We need abstract construction of the visitors
Abstract Factories Decouple Rules
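The idea can be sketched end to end in Java as below. This is an illustration under several assumptions rather than the authors' parser: all names (`Fac`, `ChainFac`, `BranchFac`, `RdpDemo`, and so on) are hypothetical, the operator in E1a is assumed to be `+`, tokens must be separated by whitespace, and the parse result is a normalized string instead of grammar objects. Each factory only asks other factories for visitors, and it does so lazily, at the moment a token is visited, so the recursive grammar does not cause infinite construction.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Visitors: a base interface plus one sub-interface per token type.
interface Vis { String defaultCase(Tok t); }
interface NumVis extends Vis { String numCase(Tok t); }
interface IdVis extends Vis { String idCase(Tok t); }
interface PlusVis extends Vis { String plusCase(Tok t); }

// A token calls its own case if the visitor has one, else defaultCase.
abstract class Tok {
    final String text;
    Tok(String text) { this.text = text; }
    abstract String execute(Vis v);
}
class Num extends Tok {
    Num(String s) { super(s); }
    String execute(Vis v) { return v instanceof NumVis ? ((NumVis) v).numCase(this) : v.defaultCase(this); }
}
class Id extends Tok {
    Id(String s) { super(s); }
    String execute(Vis v) { return v instanceof IdVis ? ((IdVis) v).idCase(this) : v.defaultCase(this); }
}
class Plus extends Tok {
    Plus(String s) { super(s); }
    String execute(Vis v) { return v instanceof PlusVis ? ((PlusVis) v).plusCase(this) : v.defaultCase(this); }
}
class Eof extends Tok {
    Eof() { super("<eof>"); }
    String execute(Vis v) { return v.defaultCase(this); }
}

// Abstract factories: a rule asks for a visitor without knowing how it is built.
interface Fac { Vis makeVisitor(); }
interface ChainFac extends Fac {
    Vis FAIL = t -> { throw new IllegalArgumentException("unexpected token: " + t.text); };
    Vis makeChainedVisitor(Vis next);  // tokens this rule cannot start with go to next
    default Vis makeVisitor() { return makeChainedVisitor(FAIL); }
}
class NumFac implements ChainFac {  // F1 -> num
    public Vis makeChainedVisitor(Vis next) {
        return new NumVis() {
            public String numCase(Tok t) { return t.text; }
            public String defaultCase(Tok t) { return t.execute(next); }
        };
    }
}
class IdFac implements ChainFac {  // F2 -> id
    public Vis makeChainedVisitor(Vis next) {
        return new IdVis() {
            public String idCase(Tok t) { return t.text; }
            public String defaultCase(Tok t) { return t.execute(next); }
        };
    }
}
class EmptyFac implements Fac {  // empty: accepts any token and puts it back unread
    private final Deque<Tok> in;
    EmptyFac(Deque<Tok> in) { this.in = in; }
    public Vis makeVisitor() { return t -> { in.push(t); return ""; }; }
}
class BranchFac implements Fac {  // A -> first | last
    private final ChainFac first; private final Fac last;
    BranchFac(ChainFac first, Fac last) { this.first = first; this.last = last; }
    public Vis makeVisitor() { return first.makeChainedVisitor(last.makeVisitor()); }
}
class E1aFac implements ChainFac {  // E1a -> + E
    private final Deque<Tok> in;
    Fac e;  // assigned after construction because the grammar is recursive
    E1aFac(Deque<Tok> in) { this.in = in; }
    public Vis makeChainedVisitor(Vis next) {
        return new PlusVis() {
            public String plusCase(Tok t) { return " + " + in.poll().execute(e.makeVisitor()); }
            public String defaultCase(Tok t) { return t.execute(next); }
        };
    }
}
class EFac implements Fac {  // E -> F E1
    private final Deque<Tok> in;
    private final Fac f, e1;
    EFac(Deque<Tok> in, Fac f, Fac e1) { this.in = in; this.f = f; this.e1 = e1; }
    public Vis makeVisitor() { return t -> t.execute(f.makeVisitor()) + in.poll().execute(e1.makeVisitor()); }
}
class RdpDemo {
    static String parse(String src) {
        Deque<Tok> in = new ArrayDeque<>();
        for (String s : src.trim().split("\\s+"))
            if (!s.isEmpty()) in.add(s.equals("+") ? new Plus(s) : s.matches("\\d+") ? new Num(s) : new Id(s));
        in.add(new Eof());
        E1aFac e1a = new E1aFac(in);
        Fac f = new BranchFac(new NumFac(), new IdFac());               // F  -> F1 | F2
        Fac e = new EFac(in, f, new BranchFac(e1a, new EmptyFac(in)));  // E1 -> E1a | empty
        e1a.e = e;  // close the E -> E1 -> E1a -> E cycle
        String result = in.poll().execute(e.makeVisitor());
        if (in.size() != 1) throw new IllegalArgumentException("unexpected token: " + in.peek().text);
        return result;
    }
    public static void main(String[] args) {
        System.out.println(parse("a + 3 + b"));  // prints "a + 3 + b"
    }
}
```

Note that `EFac` never mentions `num`, `id` or `+`: it is written against the two factories it was given, which is the decoupling the slide refers to.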
14. Factory Model of Parser
E → F E1
E1 → empty | E1a
E1a → + E
F → F1 | F2
F1 → num
F2 → id
[Diagram: parser structure alongside the corresponding factory structure]
Grammar represented purely with composition
15. Extending the Grammar
- Adding new tokens and rules
- Highly localized impact on code
- No re-computing of prediction tables
16.
Extended grammar:
E → S E1
E1 → empty | E1a
E1a → + E
S → P | T
P → (E)
T → F T1
T1 → empty | T1a
T1a → * S
F → F1 | F2
F1 → num
F2 → id
Original grammar:
E → F E1
E1 → empty | E1a
E1a → + E
F → F1 | F2
F1 → num
F2 → id
17. Parser Demo
(If time permits)
We change your grammar in two minutes while you
wait!
18. Automatic Parser Generator
- No additional theory needed for generalization
  - No fixed-points, FIRST and FOLLOW sets
- Kooprey
  - Parser generator: BNF → Java
  - kouprey (noun): a rare short-haired ox (Bos sauveli) of forests of Indochina (Merriam-Webster Online)
- Extensions
  - Skip generation of source, create parser at runtime
19. Conclusion
- Simple enough to introduce in a CS2 course (at Rice: near end of CS2)
- Teaches an abstraction of grammars and parsing
- Reinforces foundational OO principles
  - Abstract representations
  - Abstract construction
  - Decoupled systems
  - Recursion
http://www.exciton.cs.rice.edu/research/sigcse05