Title: The reuse of grammars with embedded semantic actions
1 The reuse of grammars with embedded semantic actions
- Terence Parr
- University of San Francisco
2 The goal and problem
- We want to reuse grammars (and fragments)
- All apps recognize same language
- Bug fixes propagate automatically
- Development / testing easier with reuse
- Sometimes we need embedded, unrestricted semantic actions
- Uses: program comprehension, configuration file readers, ...
- But actions within a grammar lock it into a specific app
3 Common rewriting approach
- Simply decouple syntax and semantics
- Can treat grammars like libraries
- Term rewriters use this approach
- ASF+SDF, Stratego (also AST rewrites)
- TXL has concrete syntax transformation rules
- These systems have rewrites in nice concrete declarative form
- Can generate API for accessing implicitly-created tree, visitors
4 ANTLR rewriter strategy
- Have grammar build IR tree
- Isolate semantics in tree grammars; e.g., walk trees to annotate, build symbol tables, or make use-def chains, ...
- Also can trigger events like XML SAX
- For rewriting, ANTLR is more raw than the purely declarative approach: must describe AST, tree grammar
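A minimal sketch of the tree-walking idea above, written in Java. The `Ast` and `SymbolTableBuilder` classes and the `VAR_DECL` tree shape are assumptions made for illustration; they are not ANTLR's tree API and not a generated tree parser. The sketch only shows the symbol-table logic living in a walker, so the grammar that built the tree can stay action-free.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal AST node (not ANTLR's Tree or CommonTree).
class Ast {
    final String text;
    final List<Ast> children;

    Ast(String text, Ast... children) {
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// Semantics are isolated in the walker rather than embedded in the parser grammar.
class SymbolTableBuilder {
    final Map<String, String> symbols = new LinkedHashMap<>(); // name -> type

    void walk(Ast node) {
        // Assumed tree shape for a declaration: (VAR_DECL type name)
        if (node.text.equals("VAR_DECL") && node.children.size() == 2) {
            String type = node.children.get(0).text;
            String name = node.children.get(1).text;
            symbols.put(name, type);
        }
        for (Ast child : node.children) {
            walk(child);
        }
    }
}
```

A second app could walk the same tree with a different walker (say, a pretty printer) without touching the parser grammar.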
5 Decoupled approach issues
- New app must use verbatim syntax and AST structure (if walking ASTs), or
- Grammar metalanguages must support includes, inheritance, or modules
- ANTLR v2 supported grammar inheritance
- Subgrammar overrides syntax or AST rules
- Such metalanguage reuse mechanisms work best when decoupled; i.e., no actions in grammar
6 What about non-rewrite apps?
- Can't decouple when you need an internal data structure, not text output
- Cannot escape need to execute actions in a general-purpose language
- External visitors or API calls are cumbersome, lack grammatical context
- Cost of proximity (actions in grammar) is entangling syntax, semantics
7 Reuse in presence of actions
- Existing reuse mechanisms don't deal well with tweaks to actions from super or new actions
- Currently, coders dup then modify existing grammar to add actions, change rules, ...
- OK, except bug fixes aren't propagated; decidedly lowbrow and undignified
8 Aside: ANTLR LL(*) parsers
- LL(*) recursive-descent parser generator; unified syntax for lexer, parser, tree parser
- decl is not LL(k) for any fixed k, but it is LL(*)
- Lookahead DFA spins ahead past the modifiers to a token that distinguishes the alternatives
- No strict ordering with LR(k): decl is not LR(k)
- reduce-reduce conflict between modifier rules
- For non-LL(*) grammars, ANTLR accepts PEGs

    // simplified Java declaration rule; e.g.,
    //   "public static int i"
    //   "public static int f() ..."
    decl : variable_modifier+ variable
         | function_modifier+ function
         ;
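The "spins ahead" behaviour can be imitated by hand. This sketch is not ANTLR's generated lookahead DFA: the class `DeclPredictor`, its modifier vocabulary, and the assumption that a `(` after the type and the name marks a function are all hypothetical. It only illustrates why no fixed k suffices: the loop consumes however many modifiers appear before it reaches the deciding token.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DeclPredictor {
    // Assumed modifier vocabulary, for illustration only.
    static final Set<String> MODIFIERS =
        new HashSet<>(Arrays.asList("public", "private", "static", "final"));

    /** Returns 1 for the variable alternative, 2 for the function alternative. */
    static int predict(List<String> tokens) {
        int i = 0;
        // Spin over an arbitrary number of modifiers: this is the unbounded part.
        while (i < tokens.size() && MODIFIERS.contains(tokens.get(i))) {
            i++;
        }
        // Assumption: a type and a name follow, then the distinguishing token.
        int decision = i + 2;
        boolean isFunction =
            decision < tokens.size() && tokens.get(decision).equals("(");
        return isFunction ? 2 : 1;
    }
}
```

With the two inputs from the slide, `public static int i` selects alternative 1 and `public static int f ( ...` selects alternative 2, regardless of how many modifiers precede the type.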
9 Current ANTLR grammar reuse
- Cobble together grammar from others
- Embed complete grammar in another
- E.g., Java within HTML or SQL within C
- Derive variant
- E.g., GCC vs C, vendor specific SQL
- Traverse trees from existing grammar
- E.g., program comprehension tools like lint
- Copy, modify semantics of existing grammar
- Change symbol table actions, tweak pretty printer
- Lots of cut-and-paste going on, few opportunities for verbatim grammar reuse
10 Grammar inheritance imperfect
- Single grammar inheritance (v2 did include)
- Works in some cases, but
- subgrammar can require changes to super
- fine grained control forces small unnatural rules
- copy-n-paste preferable to altering working super
- with lots of subgrammars, hard to imagine overall language
11 Grammar inheritance imperfect (cont'd)
- Inheritance is a blunt instrument for altering actions strewn through super
- At least with ANTLR, had to override entire rule to change a single action
- Could identify actions by name, but that would require labeling all of them in case needed
- Not sufficient: might need to tweak inside of an action, not replace it
- Is it a lost cause to reuse grammars with actions?
12 Prototype-based grammar reuse
- Solution: consider a tool, not a metalanguage feature
- Recall cut-n-paste has great flexibility; only problem is lack of change propagation (lose single change point)
- If we propagate changes from an original prototype grammar to our modified version, that smacks of revision control
- ANTLR philosophy is to formalize what programmers do naturally
- So, formalize grammar reuse via prototype grammar mechanism
- Track changes between prototype and derived, not history of changes to single file
13 Grammar prototyping tools
- Begin project: gderive Java.g MyJava.g
- Later, pick up changes: gsync MyJava.g
- Could either update rules, actions or both
- Works well
- add actions to standard action-free grammar
- tweak actions to get slightly different app
- have multiple versions but diff actions (ANTLR has 5 identical tree grammars but diff actions)
- Yeah, but how is this different than diff3?
14 Why not diff3?
- Must compare structure, not text lines
- Text tools like diff3 cannot separate grammars from actions etc.; e.g., (ID {names.add($ID.text);}) vs ID
- Can't see rule renaming as a rename since diff3 can't identify refactoring patterns
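To make the structure-versus-text point concrete, here is a rough sketch of comparing two rule bodies after their actions are removed. `ActionStripper` is a hypothetical helper, not part of ANTLR or of any gsync implementation, and it assumes actions are written in braces with no braces hidden inside string literals or quoted tokens. A line-based diff would report the two bodies as different, while this comparison treats them as the same syntax.

```java
class ActionStripper {
    /** Removes {...} action blocks (nested braces allowed) and collapses whitespace. */
    static String stripActions(String ruleBody) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // current brace nesting level
        for (char c : ruleBody.toCharArray()) {
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth > 0) {
                    depth--;
                }
            } else if (depth == 0) {
                out.append(c); // keep only text outside of actions
            }
        }
        return out.toString().trim().replaceAll("\\s+", " ");
    }

    /** True if the two rule bodies differ only in their embedded actions. */
    static boolean sameSyntax(String a, String b) {
        return stripActions(a).equals(stripActions(b));
    }
}
```

A real tool would compare grammar trees rather than strings, which is also what makes renames and other refactoring patterns detectable.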
15 Change patterns seen in antlr.g over 14 months
- new rules
- add new rule and refer to that
- extract rules and change references
- modify rules
- add/remove branch
- add/change actions
- change closure notations ()?, ()*, ()+
- rename rules
- rule labels
- add new label and the references to the label variables
- rename label and update the references
- delete label and delete the references
- meta-language changes
- add token declarations, change options
16 What does gsync look like?
- Computer-aided: can't always do a safe automatic merge
- Visual diff will show different perspectives of prototype and derived grammar with tabbed views, one for each pattern it can detect
- gsync could use grammar refactoring patch tools to do the actual merge (e.g., work of Laemmel).
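One way to picture a rule-level sync is as a three-way merge keyed by rule name. The sketch below is purely hypothetical: `RuleSync`, its map-based representation of a grammar, and its conflict policy are assumptions for illustration, not a description of how gsync works. It shows why the merge is only computer-aided: when both the prototype and the derived grammar changed the same rule, the rule is reported for manual review instead of being merged.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

class RuleSync {
    /**
     * Merges prototype changes into a derived grammar, rule by rule.
     * Each map goes from rule name to rule body. Names that need manual
     * review are appended to conflicts. Rule deletion is not handled here.
     */
    static Map<String, String> sync(Map<String, String> oldPrototype,
                                    Map<String, String> newPrototype,
                                    Map<String, String> derived,
                                    List<String> conflicts) {
        Map<String, String> merged = new TreeMap<>(derived);
        for (Map.Entry<String, String> e : newPrototype.entrySet()) {
            String name = e.getKey();
            String base = oldPrototype.get(name);   // null if the rule is new
            String theirs = e.getValue();
            String ours = derived.get(name);
            if (theirs.equals(base)) {
                continue;                           // prototype did not change this rule
            }
            if (Objects.equals(ours, base)) {
                merged.put(name, theirs);           // derived untouched: take the update
            } else if (!theirs.equals(ours)) {
                conflicts.add(name);                // both sides changed: leave to the user
            }
        }
        return merged;
    }
}
```

Here a new or changed prototype rule flows into the derived grammar only when the derived copy still matches the old prototype; anything else is left as the coder wrote it.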
17 ANTLR grammar composition
- Prototype grammar mechanism breaks down when composing grammars
- Might need to break up big grammar
- An Oracle 10g grammar yielded 129k lines of Java code! (v2's include mech.)
- Can't auto-split due to actions
- Let coder break up into logical chunks
- We need grammar composition now to put it back together
- Allows better organization, size control, and opportunities for reuse
18 Composition mechanism
- Root grammar imports dependent grammars
- Import operates like inheritance
- rule overriding
- polymorphic rule invocation
- Duplicates resolved in favor of rule imported first
- Imported grammars can import others
- Implementation: delegation model; if R imports A, B
- ANTLR generates classes R, R_A, R_B
- R has 2 delegate pointers to R_A, R_B
- R_A, R_B have back delegator pointers to R
- Every rule can get to every other rule
19 Composition example

    parser grammar JavaDecl;
    type : 'int' ;
    decl : type ID ';'          // type here calls Java.type
         | type ID init ';'
         ;
    init : '=' INT ;

    // Root grammar
    parser grammar Java;
    import JavaDecl;
    prog : decl ;
    type : 'int' | 'float' ;    // overrides JavaDecl.type

    class Java extends Parser {
        JavaDecl d;                       // delegate pointer
        void prog() { decl(); }
        void type() { /* ... */ }
        void decl() { d.decl(); }
        void init() { d.init(); }
    }

    class JavaDecl extends Parser {
        public Java parent;               // back pointer to delegator
        void decl() { /* ... */ }
        void init() { /* ... */ }
    }
20 Interesting effects
- Overriding alters lookahead
- rule type overridden, alters lookahead of rule decl
- FIRST3(decl) was {int, ID, ';', '='} but is now {int, float, ID, ';', '='}
- Polymorphism through delegate pointer
- Ref to rule r in delegate sees R.r if overridden in R.
- Here, JavaDecl.decl invokes Java.type via parent.type()
- Broke up Java.g from a 22k line file into 6 smaller ones
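The delegate and delegator pointers can be exercised with a small runnable version of the slide 19 classes. This is a sketch under assumptions, not ANTLR's generated code: the `Parser` base class is omitted, rule methods do no parsing, and the `trace` list is a hypothetical addition that records which rule methods ran.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the delegate class generated from the imported grammar.
class JavaDecl {
    final Java parent; // back pointer to the delegator (root)

    JavaDecl(Java parent) {
        this.parent = parent;
    }

    void decl() {
        parent.trace.add("JavaDecl.decl");
        parent.type(); // rule reference goes through the root, so an override wins
    }

    void init() {
        parent.trace.add("JavaDecl.init");
    }
}

// Stand-in for the root class.
class Java {
    final List<String> trace = new ArrayList<>(); // hypothetical: call log for the demo
    final JavaDecl d = new JavaDecl(this);        // delegate pointer

    void prog() { trace.add("Java.prog"); decl(); }
    void type() { trace.add("Java.type"); }       // overrides the imported rule type
    void decl() { d.decl(); }                     // not overridden: forwarded to delegate
    void init() { d.init(); }
}
```

Calling `new Java().prog()` leaves `Java.prog`, `JavaDecl.decl`, `Java.type` in the trace, in that order: the imported `decl` runs in the delegate but reaches the root's overriding `type` through the back pointer.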
21 Prototypes and composition
- Even with composition, need prototype grammars to alter pre-existing grammars
- Easy to combine mechanisms
- Delegates become prototypes
- Composer imports refined delegates
- Sync by syncing derived delegates from respective prototypes, then run ANTLR on the root grammar
22 Summary
- Can't always decouple the semantics from syntax
- Nonrewriter apps need actions to build data structures
- Currently coders cut, paste, and modify
- Prototype grammars are like live cut-and-paste: changes to prototype merged into derived via structured tree difference tool akin to RCS
- Tool applicable to any metalanguage tool
- Grammar composition to factor large grammars; uses delegation to get rule polymorphism and inheritance
- Combination allows programmers to easily reuse existing grammars or grammar fragments even in presence of semantic actions