Title: Formal Languages and Automata in Programming Languages
1. Formal Languages and Automata in Programming Languages
2. Motivation
The problem of parsing structured text is very common.
Consider the structure of email addresses (using a grammar):
<emailAddress> ::= <person> @ <host>
<person> ::= <word>
<host> ::= <word> | <word>.<host>
Goal: describe and recognize email addresses in arbitrary text.
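The only recursion in this grammar sits at the right end of the host rule, so the language it describes is regular and a regular expression can recognize it. The sketch below is illustrative only. It assumes that a word is a non-empty run of ASCII letters and digits (the slide does not define word), and the names WORD, EMAIL, and find_addresses are hypothetical.

```python
import re

# Assumption: a <word> is one or more ASCII letters or digits.
WORD = r"[A-Za-z0-9]+"

# <emailAddress> ::= <person> @ <host>, with <host> ::= <word> | <word>.<host>
EMAIL = re.compile(rf"{WORD}@{WORD}(?:\.{WORD})*")

def find_addresses(text):
    """Return every substring of text that matches the email grammar."""
    return EMAIL.findall(text)

print(find_addresses("write to jim@cs.example.edu or ann@host today"))
```

Real email syntax is considerably richer than this grammar; the pattern mirrors the slide, not any mail standard.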
3. Languages and Automata in Programming Languages
- Regular languages
  - Recognized (accepted) by finite automata
  - Useful for tokenizing program text (lexical analysis)
- Context-free languages
  - Recognized (accepted) by pushdown automata
  - Useful for parsing the syntax of a program
4. Deterministic Finite Automata (DFA)
- Q: a finite set of states
- Σ: a finite set of letters (the alphabet)
- δ: Q × Σ → Q (the transition function)
- q0: the start state (in Q)
- F: the set of accept states (a subset of Q)
- Acceptance: the input is consumed with the automaton in an accept (final) state.
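The five components translate almost directly into code. The following is a minimal sketch, not a reference implementation: the transition function is stored as a dictionary keyed by (state, symbol), and the example machine, which accepts binary strings ending in 1, uses an assumed layout in which q1 is the start state and q2 the only accept state.

```python
def dfa_accepts(delta, start, accept, word):
    """Run a DFA on word; delta maps (state, symbol) to the next state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accept

# Assumed example machine: accepts binary strings that end in 1.
DELTA = {
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q2",
}

print(dfa_accepts(DELTA, "q1", {"q2"}, "0101"))
print(dfa_accepts(DELTA, "q1", {"q2"}, "0110"))
```

Because the function is total on Q × Σ, the loop never has to choose between moves; that is the sense in which the automaton is deterministic.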
5. Example of DFA
[Figure: DFA with states q1 and q2; transitions labeled 0 and 1.]
Accepts all strings that end in 1.
6. Another Example of a DFA
[Figure: DFA with states S, q1, q2, r1, and r2; transitions labeled a and b.]
Accepts all strings that start and end with a, or that start and end with b.
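The diagram itself did not survive, so the transition table below is an assumed reconstruction: S is taken to be the start state, q1 and q2 handle strings that began with a, r1 and r2 handle strings that began with b, and q1 and r1 are the accept states. The sketch checks that reconstruction against the stated description on every string over {a, b} up to length 6.

```python
from itertools import product

# Assumed reconstruction of the diagram (not taken verbatim from the slide).
DELTA = {
    ("S", "a"): "q1", ("S", "b"): "r1",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q2",
    ("r1", "a"): "r2", ("r1", "b"): "r1",
    ("r2", "a"): "r2", ("r2", "b"): "r1",
}
ACCEPT = {"q1", "r1"}

def accepts(word):
    """Run the DFA from the start state S."""
    state = "S"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPT

def same_ends(word):
    """The informal description: non-empty, first symbol equals last symbol."""
    return len(word) > 0 and word[0] == word[-1]

# Collect every short string on which automaton and description disagree.
mismatches = [
    "".join(w)
    for n in range(7)
    for w in product("ab", repeat=n)
    if accepts("".join(w)) != same_ends("".join(w))
]
print(mismatches)
```

An empty list of mismatches is evidence, not proof, that the table matches the description; a proof would argue about what each state remembers.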
7. Non-deterministic Finite Automata (NFA)
- The transition function is different:
  - δ: Q × Σε → P(Q)
  - P(Q) is the power set of Q (the set of all subsets)
  - Σε is the union of Σ and the special symbol ε (denoting the empty string)
A string is accepted if at least one path consumes all the input and leads to an accept state.
8. Example of an NFA
[Figure: NFA with states q1, q2, q3, and q4; transitions labeled "0, 1", "0, 1", "0, ε", "1", and "1".]
What strings does this NFA accept?
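One way to explore the question is to simulate the machine. The sketch below assumes a particular reading of the lost diagram: q1 and q4 loop on 0 and 1, q1 moves to q2 on 1, q2 moves to q3 on 0 or ε, q3 moves to q4 on 1, q1 is the start state, and q4 is the accept state. Instead of following one path at a time, it tracks the set of all states the NFA could be in.

```python
# Assumed reconstruction of the diagram; "" stands for the empty move (epsilon).
DELTA = {
    ("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"}, ("q2", ""): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}
START, ACCEPT = "q1", {"q4"}

def closure(states):
    """Add every state reachable through epsilon moves alone."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in DELTA.get((stack.pop(), ""), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def nfa_accepts(word):
    """Track the set of all states the NFA could be in after each symbol."""
    current = closure({START})
    for symbol in word:
        moved = set()
        for state in current:
            moved |= DELTA.get((state, symbol), set())
        current = closure(moved)
    return bool(current & ACCEPT)

for word in ["11", "101", "010", "100"]:
    print(word, nfa_accepts(word))
```

Running it on more inputs should suggest a pattern that can then be argued directly from the diagram.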
9. Regular Expressions
- R is a regular expression if R is:
  - a, for some a in Σ
  - ε (the empty string)
  - ∅ (the empty language)
  - R1 ∪ R2, the union of two regular expressions
  - R1 R2, the concatenation of two regular expressions
  - R1* (Kleene closure: zero or more repetitions of R1)
10Examples of Regular Expressions
0, 1 0 all strings that end in 0 0, 1
0 string that start with 1 or 0 followed
by zero or more 0s. 0, 1
all strings 0n1n, n gt0 not a regular
expression!!! The value of a regular expression
is a regular language (a set of
strings).Utilities such as AWK, GREP, PERL,
emacs, provide facilities for the description
and matchingof patterns using regular
expressions
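The first three examples can be tried out with Python's re module, used here purely as an illustration. In that syntax the character class [01] plays the role of the set {0, 1}; the dictionary PATTERNS and the helper matches are hypothetical names.

```python
import re

# Python's re syntax: [01] is the set {0, 1}, * is the Kleene closure.
PATTERNS = {
    "ends in 0": r"[01]*0",
    "0 or 1, then zero or more 0s": r"[01]0*",
    "all binary strings": r"[01]*",
}

def matches(pattern, word):
    """True when the whole word (not just a substring) is in the language."""
    return re.fullmatch(pattern, word) is not None

for name, pattern in PATTERNS.items():
    print(name, [w for w in ["", "0", "10", "100", "011"] if matches(pattern, w)])
```

Using fullmatch rather than search matters: a regular expression denotes a set of whole strings, while search would also succeed on strings that merely contain a match.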
11. Important Theorems
- A language is regular if and only if a regular expression describes it.
- A language is regular if and only if a finite automaton recognizes it.
- DFAs and NFAs are equally powerful.
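The usual argument that DFAs and NFAs are equally powerful is the subset construction: build a DFA whose states are sets of NFA states. The sketch below follows that idea under stated assumptions: "" stands for ε, transitions are stored in a dictionary, and the three-state NFA (binary strings whose second-to-last symbol is 1) is a hypothetical example, not one from the slides.

```python
def closure(delta, states):
    """States reachable from `states` using only epsilon ("") moves."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in delta.get((stack.pop(), ""), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

def nfa_to_dfa(delta, start, accept, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    dfa_start = closure(delta, {start})
    dfa_delta, todo, seen = {}, [dfa_start], {dfa_start}
    while todo:
        subset = todo.pop()
        for symbol in alphabet:
            moved = set()
            for state in subset:
                moved |= delta.get((state, symbol), set())
            target = closure(delta, moved)
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accept = {s for s in seen if s & accept}
    return dfa_delta, dfa_start, dfa_accept

# Hypothetical NFA: accepts binary strings whose second-to-last symbol is 1.
NFA = {("p", "0"): {"p"}, ("p", "1"): {"p", "q"},
       ("q", "0"): {"r"}, ("q", "1"): {"r"}}
dfa_delta, dfa_start, dfa_accept = nfa_to_dfa(NFA, "p", {"r"}, "01")

def dfa_accepts(word):
    """Run the constructed DFA; no guessing is needed any more."""
    state = dfa_start
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accept

print(len({s for s, _ in dfa_delta}), "DFA states")
```

Only subsets reachable from the start state are built, which is why the result is often far smaller than the full power set P(Q).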
12. Context-free Grammars
Context-free grammars are defined by substitution rules. Example sentences:
- Big Jim ate green cheese
- Green Jim ate green cheese
- Jim ate cheese
- Cheese ate Jim
13. Backus-Naur Form
- Context-free grammars are used to formally describe the syntax of programming languages.
- Every syntactically correct program is derived using the context-free grammar of the language.
- Parsing a program involves tracing such a derivation, given the context-free grammar and the program.
- See Slonneger and Kurtz for more details.
14. Context-free Grammars
- A context-free grammar consists of:
  - V: a finite set of variables
  - Σ: a finite set of terminals
  - R: a finite set of rules of the form variable → {variable, terminal}*
  - S: the start variable
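To make the four components concrete, here is a small sketch with a hypothetical grammar for sentences of the "Jim ate cheese" kind. The rule set is an assumption made for illustration (the original substitution rules are not recoverable from the slides), and all words are lowercase for simplicity. The dictionary keys are the variables, any symbol without a rule is a terminal, and Sentence is the start variable.

```python
from itertools import product

# Hypothetical rules: each variable maps to a list of alternative bodies.
RULES = {
    "Sentence": [["NounPhrase", "Verb", "NounPhrase"]],
    "NounPhrase": [["Adjective", "Noun"], ["Noun"]],
    "Adjective": [["big"], ["green"]],
    "Noun": [["Jim"], ["cheese"]],
    "Verb": [["ate"]],
}

def expand(symbol):
    """Yield every terminal string derivable from symbol.

    Terminates only because this particular grammar has no recursive rules.
    """
    if symbol not in RULES:          # a terminal derives just itself
        yield [symbol]
        return
    for body in RULES[symbol]:       # try each substitution rule in turn
        for parts in product(*(list(expand(s)) for s in body)):
            yield [word for part in parts for word in part]

sentences = {" ".join(words) for words in expand("Sentence")}
print(len(sentences), "sentences, e.g.", sorted(sentences)[:3])
```

The grammar happily derives "cheese ate Jim": a context-free grammar describes syntax only, not meaning.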
15. Pushdown Automata (PDA)
- A pushdown automaton consists of:
  - Q: a set of states
  - Σ: the input alphabet (of terminals)
  - Γ: the stack alphabet
  - δ: a set of transition rules, Q × Σε × Γε → P(Q × Γε), read as
    (currentState, inputSymbol, headOfStack) → (newState, pushSymbolOnStack)
  - q0: the start state
  - F: the set of accept states (a subset of Q)
- Deterministic: at most one move is possible from any configuration.
16. How does a PDA accept?
- By final state
  - Consume all the input while
  - Reaching a final state
- By empty stack
  - Consume all the input while
  - Having an empty stack
  - The set of final states is irrelevant
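17. Example of a PDA
[Figure: PDA with states q1, q2, q3, and q4; transitions labeled "ε, ε → $", "0, ε → 0", "1, 0 → ε", "ε, $ → ε", and "1, 0 → ε".]
Notation: "a, b → c" means that when the PDA reads a from the input, it replaces b at the top of the stack with c.
What does this PDA accept?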
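One way to explore the question is to simulate the machine by searching over its configurations. Everything specific in the sketch below is an assumption, since the diagram was lost: the edge layout, the use of "$" as the bottom-of-stack marker, q1 as the start state, and q1 and q4 as accept states. Acceptance is by final state.

```python
from collections import deque

# Assumed reconstruction of the diagram. Each rule is
# (state, input symbol, popped symbol) -> set of (new state, pushed symbol);
# "" plays the role of epsilon and "$" marks the bottom of the stack.
RULES = {
    ("q1", "", ""): {("q2", "$")},
    ("q2", "0", ""): {("q2", "0")},
    ("q2", "1", "0"): {("q3", "")},
    ("q3", "1", "0"): {("q3", "")},
    ("q3", "", "$"): {("q4", "")},
}
START, ACCEPT = "q1", {"q1", "q4"}

def pda_accepts(word):
    """Breadth-first search over configurations (state, input position, stack).

    The stack is a string whose last character is the top. The search is
    finite here because this machine only pushes while reading input.
    """
    first = (START, 0, "")
    queue, seen = deque([first]), {first}
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(word) and state in ACCEPT:
            return True  # all input consumed in an accept state
        for (src, read, pop), targets in RULES.items():
            if src != state:
                continue
            if read and (pos >= len(word) or word[pos] != read):
                continue
            if pop and not stack.endswith(pop):
                continue
            rest = stack[:len(stack) - len(pop)]
            for new_state, push in targets:
                config = (new_state, pos + len(read), rest + push)
                if config not in seen:
                    seen.add(config)
                    queue.append(config)
    return False

for word in ["", "01", "0011", "001", "10", "0101"]:
    print(repr(word), pda_accepts(word))
```

The stack is what a finite automaton lacks: it lets the machine remember an unbounded count of 0s while it reads the 1s.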
18. Important Theorems
- A language is context-free iff a pushdown automaton recognizes it.
- Non-deterministic PDAs are more powerful than deterministic ones.
19. Example of Context-free Language That Requires a Non-deterministic PDA
{w w^R | w belongs to {0, 1}*}, where w^R is w written backwards.
Idea: non-deterministically guess the middle of the input string.
20. The Solution
[Figure: PDA with states q1, q2, q3, and q4; transitions labeled "ε, ε → $", "0, ε → 0", "1, ε → 1", "ε, ε → ε", "0, 0 → ε", "1, 1 → ε", and "ε, $ → ε".]
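A deterministic program cannot guess, but it can try every possible middle in turn, which is what the nondeterministic PDA does implicitly. The sketch below is a stand-in for the machine, not a PDA simulator: for each candidate middle it pushes the first part onto a stack, then pops while comparing against the rest. The function name accepts_w_wr is hypothetical, and the input is assumed to contain only the symbols 0 and 1.

```python
def accepts_w_wr(word):
    """Try every possible middle, as a stand-in for the nondeterministic guess."""
    for middle in range(len(word) + 1):
        stack = list(word[:middle])          # phase 1: push the first part
        rest = word[middle:]
        ok = True
        for symbol in rest:                  # phase 2: pop and compare
            if not stack or stack.pop() != symbol:
                ok = False
                break
        if ok and not stack:                 # input consumed and stack empty
            return True
    return False

for word in ["", "11", "0110", "010", "01"]:
    print(repr(word), accepts_w_wr(word))
```

Each iteration of the outer loop corresponds to one branch of the PDA's computation; the string is accepted if any branch succeeds.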