Parsers and Grammar - PowerPoint PPT Presentation

About This Presentation
Title:

Parsers and Grammar

Description:

Structures such as statement blocks, methods, and ... Bottoms-up (LR) Parsing Example (1) tokens: ID = ( NUMBER * ID NUMBER ) * ID ... Bottoms-up Parsing ... – PowerPoint PPT presentation

Number of Views:40
Avg rating:3.0/5.0
Slides: 20
Provided by: kenneth67
Category:

less

Transcript and Presenter's Notes

Title: Parsers and Grammar


1
Parsers and Grammar

2
Categories of Grammar Rules
  • Declarations or definitions.
  • AttributeDeclaration
  • final static access datatype
    expression , datatype expression
  • access ' public ' ' protected ' '
    private '
  • Statements.
  • assignment, if, for, while, do_while
  • Expressions,
  • such as the examples in these slides.
  • Structures such as statement blocks, methods, and
    entire classes.
  • StatementBlock '' Statement ''

3
Parsing Algorithms (1)
  • Broadly divided into LL and LR.
  • LL algorithms match input directly to left-side
    symbols, then choose a right-side production that
    matches the tokens. This is top-down parsing
  • LR algorithms try to match tokens to the
    right-side productions, then replace groups of
    tokens with the left-side nonterminal. They
    continue until the entire input has been
    "reduced" to the start symbol
  • LALR (look-ahead LR) are a special case of LR
    they require a few restrictions to the LR case
  • Reference Sebesta, section 4.3 - 4.5.

4
Parsing Algorithms (2)
  • Look ahead
  • algorithms must look at next token(s) to decide
    between alternate productions for current tokens
  • LALR(1) means LALR with 1 token look-ahead
  • LL(1) means LL with 1 token look-ahead
  • LL algorithms are simpler and easier to
    visualize.
  • LR algorithms are more powerful can parse some
    grammars that LL cannot, such as left recursion.
  • yacc, bison, and CUP generate LALR(1) parsers
  • Recursive-descent is a useful LL algorithm that
    "every computer professional should know"
    Louden.

5
Top-down (LL) Parsing Example
  • For the input z (2x 5)y - 7
  • tokens ID ( NUMBER ID NUMBER ) ID -
    NUMBER
  • Grammar rules (as before)

assignment gt ID expression expression gt
expression term expression - term
term term gt term factor term /
factor factor factor gt ( expression
) ID NUMBER
6
Top-down Parsing Example (2)
  • The top-down parser tries to match input to left
    sides.

ID ( NUMBER ID NUMBER ) ID - NUMBER
assignment ID expression
ID expression
- term ID term
- term ID term
factor - term ID factor
factor - term ID ( expression
factor - term ID ( expression term )
factor - term ID ( term term )
factor - term ID ( term factor term
) factor - term ID ( factor
ID factor ) factor - term ID ( NUMBER
ID NUMBER ) factor - term ID ( NUMBER
ID NUMBER ) ID - factor ID ( NUMBER
ID NUMBER ) ID - ID

7
Top-down Parsing Example (3)
  • Problem in example we had to look ahead many
    tokens in order to know which production to use.
  • This isn't necessary provided that we know the
    grammar is parsable using LL (top-down) methods.
  • There are conditions on the grammar that we can
    test to verify this. (see The Parsing Problem)
  • Later we will study the recursive-descent
    algorithm which does top-down parsing with
    minimal look-ahead.

8
Bottoms-up (LR) Parsing Example (1)
  • tokens ID ( NUMBER ID NUMBER ) ID -
    NUMBER

parser ID ... read (shift) first
token factor ... reduce factor
... shift FAIL Can't match any rules
(reduce) Backtrack and try again ID ( NUMBER
... shift ID ( factor ... reduce ID ( term
... sh/reduce ID ( term ID ... shift ID
( term factor ... reduce ID ( term
... reduce ID ( term ... shift ID (
expression NUMBER ... reduce/sh ID (
expression factor ... reduce ID ( expression
term ... reduce
Action
9
Bottoms-up Parsing Example (2)
  • tokens ID ( NUMBER ID NUMBER ) ID
    -NUMBER

input ID ( expression ... reduce ID (
expression ) ... shift ID factor ...
reduce ID factor ... shift ID
term ID ... reduce/sh ID term factor
... reduce ID term ... reduce ID
term - ... shift ID expression - ...
reduce ID expression - NUMBER ... shift ID
expression - factor ... reduce ID expression -
term ... reduce ID expression
shift assignment reduce SUCCESS!!
Start Symbol
10
Bottoms-up Parsing Example (3)
  • LR parsing processes the input stream from the
    Left and tries to match the input to the Right
    side of a production.
  • When something matches, it reduces the expression
    to a left side non-terminal symbol.
  • Repeat the process until the entire input stream
    is matched.
  • This could potentially be an O(n3) task, but
    Knuth and others devised a table-based algorithm
    that is O(n).

11
The Parsing Problem
12
The Parsing Problem
  • Top-down parsers must decide which production to
    use based on the current symbol, and perhaps
    "peeking" at the next symbol (or two...).
  • Predictive parser a parser that bases its
    actions on the next available token (called
    single symbol look-ahead).
  • Two conditions are necessary see Louden, p.
    108-110

13
The Parsing Problem (cont.)
  • Condition 1 the ability to choose between
    multiple alternatives, such as A ? ?1 ?2
    ... ?n
  • define First(?) set of all tokens that can be
    the first token for any production cascade that
    produces symbol ?
  • then a predictive parser can be used for rule A
    if
  • First(?1) ? First(?2) ... ? First(?n) is empty.
  • Condition 2 the ability of the parser to detect
    presence of an optional element, such as A ? ?
    b .
  • Can the parser detect for certain when b is
    present?

14
The Parsing Problem (cont.)
  • Example list ? expr list.
  • How do we know that list isn't part of expr?
  • define Follow( ? ) set of all tokens that can
    follow the non-terminal ? some production. Use a
    special symbol () to represent the end of input
    if ? can be the end of input.
  • Example Follow( factor ) , -, , /, ),
    while Follow( term ) , /, ),
  • then a predictive parser can detect the presence
    of optional symbol b if First( b ) ? Follow( b )
    is empty.

15
Review and Thought Questions
16
Lexics vs. Syntax vs. Semantics
  • Division between lexical and syntactic structure
    is not fixed
  • number can be a token or defined by a grammar
    rule.
  • Implementation can often decide
  • scanners are faster
  • parsers are more flexible
  • error checking of number format as regex is
    simpler
  • Division between syntax and semantics is not
    fixed
  • we could define separate rules for IntegerNumber
    and FloatingPtNumber , IntegerTerm,
    FloatingPtTerm, ... in order to specify which
    mixed-mode operations are allowed.
  • or specify as part of semantics

17
Numbers Scan or Parse?
  • We can construct numbers from digits using the
    scanner or parser. Which is easier / better ?
  • Scanner Define numbers as tokens
  • number -\d
  • Parser grammar rules define numbers (digits are
    tokens)
  • number gt '-' unsignednumber unsignednumber
  • unsignednumber gt unsignednumber digit digit
  • digit gt 0 1 2 3 4 5 6 7 8
    9

18
Is Java 'Class' grammar context-free?
  • A class may have static and instance attributes.
  • An inner class or local class have same syntax as
    top-level class, but
  • may not contain static members (except static
    constants)
  • inner class may access outer class using
    OuterClass.this
  • local class cannot be "public"
  • Does this means the syntax for a class depends on
    context?

19
Alternative operator notation
  • Some languages use prefix notation operator
    comes first
  • expr gt expr expr expr expr NUMBER
  • Examples
  • 2 3 4 means (2 3) 4
  • 2 3 4 means 2 (3 4)
  • Using prefix notation, we don't have to worry
    about precedence of different operators in BNF
    rules !
Write a Comment
User Comments (0)
About PowerShow.com