The Cocke-Kasami-Younger Algorithm

Slides: 15
Provided by: annn6
1
The Cocke-Kasami-Younger Algorithm
  • An example of bottom-up parsing,
    for a CFG in Chomsky normal form

An example of a CFG in CNF:
G:  S → AB | BB
    A → CC | AB | a
    B → BB | CA | b
    C → BA | AA | b

2 possibilities for the first production: S → AB or S → BB.
Possible splits for the string aabb: aa|bb, a|abb, aab|b.
[Figure: for each of S → AB and S → BB, three candidate derivation trees for aabb, one per split.]
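The split enumeration on this slide can be sketched in a few lines of Python. The function name `splits` is only an illustration and does not come from the slides.

```python
def splits(s):
    """Return every way to cut s into two non-empty parts."""
    return [(s[:m], s[m:]) for m in range(1, len(s))]

# A string of length n has n-1 splits; for "aabb" that gives three.
for left, right in splits("aabb"):
    print(left, right)
```

Each split has to be tried for each candidate production (S → AB and S → BB), which is the work the CKY pyramid organises.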
2
The CKY Algorithm
  • Provides an efficient way of generating substring
    divisions and checking whether each substring can
    be legally derived

A nonterminal is placed in cell (i,j) if it can derive
i consecutive symbols of the string, starting at the j-th position.
Thus, for a string of length 4, if cell (4,1) contains S,
the string ∈ L(G).
If cell (i,j) contains the nonterminal A1, cell (i',i+j)
contains the nonterminal A2, and there is a production
A → A1 A2, then cell (i+i',j) will contain the nonterminal A.
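The pairing rule can be illustrated with a small Python sketch. It assumes the binary productions of the example grammar G are stored in a dictionary; the names `BINARY_RULES` and `combine` are hypothetical and chosen only for this illustration.

```python
# Binary productions of the example grammar G (assumed encoding):
# left-hand side -> list of (A1, A2) right-hand sides.
BINARY_RULES = {
    "S": [("A", "B"), ("B", "B")],
    "A": [("C", "C"), ("A", "B")],
    "B": [("B", "B"), ("C", "A")],
    "C": [("B", "A"), ("A", "A")],
}

def combine(left_cell, right_cell, rules=BINARY_RULES):
    """Return every A with a production A -> A1 A2,
    where A1 is in left_cell and A2 is in right_cell."""
    return {
        lhs
        for lhs, bodies in rules.items()
        for a1, a2 in bodies
        if a1 in left_cell and a2 in right_cell
    }

# For the string aabb, cell (1,2) holds {A} and cell (1,3) holds {B, C}.
# Pairing them yields nonterminals for cell (1+1, 2) = (2,2).
print(sorted(combine({"A"}, {"B", "C"})))  # ['A', 'S']
```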
3
The CKY Algorithm
  • Provides an efficient way of generating substring
    divisions and checking whether each substring can
    be legally derived

G:  S → AB | BB
    A → CC | AB | a
    B → BB | CA | b
    C → BA | AA | b
A nonterminal is placed in cell (i,j) if
it can derive i consecutive symbols of the string,
starting at the j-th position.
4
The Cocke-Kasami-Younger Algorithm
  • Relation between the derivation tree and the pyramid

[Figure: derivation trees for aabb with S → AB at the root, one per split: aa|bb, aab|b, a|abb.]
5
[Figure: derivation trees for aabb with S → BB at the root, one per split: aa|bb, a|abb, aab|b.]
6
The Cocke-Kasami-Younger Algorithm
  • Builds up the pyramid in a bottom-up fashion

G:  S → AB | BB
    A → CC | AB | a
    B → BB | CA | b
    C → BA | AA | b
Step 1: fill the cells in row 1.
Cells (1,1) and (1,2) contain A, because of A → a.
Cells (1,3) and (1,4) contain B and C, because of B → b and C → b.
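Step 1 can be sketched as follows for the string aabb. The dictionary `TERMINAL_RULES` and the function `fill_row_1` are hypothetical names introduced here; cells are keyed as (row, position), both counted from 1, to match the slides.

```python
# Terminal productions of the example grammar G (assumed encoding).
TERMINAL_RULES = {"A": ["a"], "B": ["b"], "C": ["b"]}

def fill_row_1(word, terminal_rules=TERMINAL_RULES):
    """Return {(1, j): nonterminals that derive the j-th symbol of word}."""
    return {
        (1, j): {lhs for lhs, terminals in terminal_rules.items()
                 if symbol in terminals}
        for j, symbol in enumerate(word, start=1)
    }

row_1 = fill_row_1("aabb")
for cell in sorted(row_1):
    print(cell, sorted(row_1[cell]))
# (1, 1) ['A']
# (1, 2) ['A']
# (1, 3) ['B', 'C']
# (1, 4) ['B', 'C']
```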
7
The Cocke-Kasami-Younger Algorithm
  • Builds up the pyramid in a bottom-up fashion

G:  S → AB | BB
    A → CC | AB | a
    B → BB | CA | b
    C → BA | AA | b
Step 2: fill the cells in row 2.
C is in cell (2,1) because of C → AA, with A in cell (1,1) and A in cell (1,2).
A is in cell (2,2) because of A → AB, with A in cell (1,2) and B in cell (1,3).
B is in cell (2,3) because of B → BB, with B in cell (1,3) and B in cell (1,4).
S is in cell (2,3) because of S → BB, with B in cell (1,3) and B in cell (1,4).
8
The Cocke-Kasami-Younger Algorithm
  • Builds up the pyramid in a bottom-up fashion

G:  S → AB | BB
    A → CC | AB | a
    B → BB | CA | b
    C → BA | AA | b
Step 3: fill the cells in row 3.
A nonterminal Z is in cell (3,1) because of a production Z → XY with
X in cell (1,1) and Y in cell (2,2), or
X in cell (2,1) and Y in cell (1,3).
C is in cell (3,1) because of C → AA, with A in cell (1,1) and A in cell (2,2).
A is in cell (3,1) because of A → CC, with C in cell (2,1) and C in cell (1,3).
9
The Cocke-Kasami-Younger Algorithm
  • Builds up the pyramid in a bottom-up fashion

G:  S → AB | BB
    A → CC | AB | a
    B → BB | CA | b
    C → BA | AA | b
Step 4: fill the cell in row 4.
Since S is at the top, aabb ∈ L(G).
Step i, general rule: a nonterminal Z is in cell (i,j) because of a
production Z → XY with X in cell (m,j) and Y in cell (i-m,j+m),
for some m with 1 ≤ m ≤ i-1.
[Figure: the completed pyramid for aabb, with S in the top cell.]
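Putting steps 1 to i together, the whole bottom-up construction can be sketched in Python. This is a minimal illustration for the example grammar G, not the presenter's own code; the names `TERMINAL_RULES`, `BINARY_RULES`, `cky_table` and `accepts` are assumptions made for this sketch. Cells are keyed as (i,j) exactly as on the slides.

```python
# Example grammar G in Chomsky normal form (assumed encoding).
TERMINAL_RULES = {"A": ["a"], "B": ["b"], "C": ["b"]}
BINARY_RULES = {
    "S": [("A", "B"), ("B", "B")],
    "A": [("C", "C"), ("A", "B")],
    "B": [("B", "B"), ("C", "A")],
    "C": [("B", "A"), ("A", "A")],
}

def cky_table(word):
    """Build the pyramid: table[(i, j)] holds the nonterminals that derive
    the i consecutive symbols of word starting at position j (1-based)."""
    n = len(word)
    table = {}
    # Row 1: productions of the form A -> a.
    for j in range(1, n + 1):
        table[(1, j)] = {lhs for lhs, terminals in TERMINAL_RULES.items()
                         if word[j - 1] in terminals}
    # Rows 2..n: general rule with split point m, 1 <= m <= i-1.
    for i in range(2, n + 1):
        for j in range(1, n - i + 2):
            cell = set()
            for m in range(1, i):
                left = table[(m, j)]
                right = table[(i - m, j + m)]
                for lhs, bodies in BINARY_RULES.items():
                    for x, y in bodies:
                        if x in left and y in right:
                            cell.add(lhs)
            table[(i, j)] = cell
    return table

def accepts(word, start="S"):
    """True if the start symbol ends up in the top cell (n, 1)."""
    n = len(word)
    return n > 0 and start in cky_table(word)[(n, 1)]

print(accepts("aabb"))                      # True
print(sorted(cky_table("aabb")[(3, 1)]))    # ['A', 'C']
```

Run on aabb, the sketch fills every cell completely, so it also reports entries that the slides do not list explicitly, for example S in cell (2,2), which follows from S → AB.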
10
Theorem
  • The CKY algorithm is correct

Given a grammar (T, N, P, S) in Chomsky normal form and a string
w = x_1 ... x_n over T, a nonterminal A ∈ N is in cell (i,j) of the
CKY pyramid if and only if A ⇒* x_j ... x_{j+i-1}.

Proof by induction on the row number.

Base step, i = 1: in row 1 we get the nonterminals from which
length-1 substrings of the string to parse can be derived. This is
only possible by using productions of type A → a. Thus if A is in
cell (1,j), 1 ≤ j ≤ n, then A → x_j ∈ P, thus A ⇒ x_j.

Induction hypothesis: the theorem applies for all rows < i, i.e.
for all substrings of length < i.


11
Induction step. We first prove the "if" direction (⇐). Assume a
derivation of a substring of length i, i > 1:
A ⇒ BC ⇒* x_j ... x_{j+i-1}. Then for some m with 0 < m < i there
must hold that B ⇒* x_j ... x_{j+m-1} and C ⇒* x_{j+m} ... x_{j+i-1}.
Thus, by the induction hypothesis, B is in cell (m,j) and C is in
cell (i-m,j+m). Since there is a production A → BC, A is in
cell (i,j).

We now prove the "only if" direction (⇒). Assume A is in cell (i,j)
with i > 1. A can only have been placed there because there is a
production of the form A → BC with B, C ∈ N and, for some m with
1 ≤ m ≤ i-1, B is in cell (m,j) and C is in cell (i-m,j+m). By the
induction hypothesis we have B ⇒* x_j ... x_{j+m-1} and
C ⇒* x_{j+m} ... x_{j+i-1}. Therefore we can write
A ⇒ BC ⇒* x_j ... x_{j+i-1} and conclude A ⇒* x_j ... x_{j+i-1}.

Note: both cells have a lower row than i, so the induction
hypothesis applies.
12
The complexity of the CKY algorithm
  • What is the time complexity of deciding w ∈ L(G)?
  • Let G = (T, N, P, S) be a CFG in Chomsky normal
    form, with k = |N|.
  • Then using the CKY algorithm, w ∈ L(G) can be
    decided in time proportional to n^3,
    where n = |w|.
  • Proof
  • First notice that
  • (1) the number of entries in a cell is at most k
    (each nonterminal can only occur once in a cell),
  • (2) the maximum number of productions of the form
    A → BC is k^3.
  • I. Complexity for row 1 cells
  • For each A ∈ N, we have to check if it can be
    placed in cell (1,i), i.e. if A derives (in 1
    step) the terminal at position i. There are k
    nonterminals, thus the cost per cell is k × 1.
  • There are n row 1 cells, thus the total cost for row
    1 is kn.

Cfr. 3
13
II. Complexity for a cell in a row > 1
The content of a cell is the result of at most n-1 pairings
of lower cells. For each pairing, at most k nonterminals are
paired with at most k other nonterminals, and each pair is
checked against at most k^3 productions. Thus for each cell:
    cost ≤ k × k × k^3 × 1 × (n-1) = k^5 × (n-1)
There are (n-1) + (n-2) + ... + 1 = n(n-1)/2 cells in
rows 2 to n, thus the total cost for these rows is
bounded above by
    n(n-1)/2 × k^5 × (n-1)
To conclude: the total cost is bounded above by
    kn + n(n-1)/2 × k^5 × (n-1)
See slide 119
Cfr. 1 and 2
Since k is independent of n, the conclusion is
O(n^3).
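As a quick arithmetic check, the bound above can be evaluated for concrete values. The sketch below is only an illustration of the formula kn + n(n-1)/2 × k^5 × (n-1); the function name `cky_cost_bound` is hypothetical.

```python
def cky_cost_bound(k, n):
    """Upper bound from the slides: k*n for row 1, plus
    n(n-1)/2 cells in rows 2..n, each costing at most k^5 * (n-1)."""
    row_1_cost = k * n
    cells_in_rows_2_to_n = n * (n - 1) // 2
    cost_per_cell = k**5 * (n - 1)
    return row_1_cost + cells_in_rows_2_to_n * cost_per_cell

# Example grammar: k = 4 nonterminals (S, A, B, C); string aabb: n = 4.
print(cky_cost_bound(4, 4))  # 18448
```

With k fixed, the dominant term is k^5 × n(n-1)^2 / 2, which grows as n^3.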
14
Some remarks
  • Not really of practical use, since
  • O(n^3) is too slow,
  • the grammar must be converted to CNF,
  • it only tests membership; this is not the
    complexity for building the derivation tree

See course on compilers for faster algorithms
Semantics!!!!
To think about CKY and unambiguous grammars.