Title: Slide 1
1 Information Retrieval
Indexing Structures
2 Index Classes
- Types of strings that can be indexed:
- - structures that index strings/natural-language texts (linguistic texts)
- - full-text indexing structures
- The first category comprises inverted files, signature files, and bitmaps.
- The second category comprises suffix trees, suffix arrays, and directed acyclic word graphs (DAWG and cDAWG).
- Model of computation: the structures above are suited to RAM; for secondary memory the String B-tree is the more suitable structure.
3 Linguistic Text Indexing
- Use: large text collections. Information is organised like a table of contents (which texts each term appears in) so that queries can be answered.
- Methods
- - Inverted files
- - Signature files
- - Bitmaps
- Boolean queries
- - Disjunctive (t1 ∨ t2 ∨ … ∨ tq)
- - Conjunctive (t1 ∧ t2 ∧ … ∧ tq)
- - Combinations of the two.
- Ranked queries
- - A similarity score is computed.
- Proximity queries
- - Some distance is allowed between the terms.
5 Inverted Files
They consist of a lexicon and, for every term, an inverted list of pointers to the texts that contain it. Compression techniques: store d-gaps and compress them with global or local methods. Global methods are either parameterised or non-parameterised.
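The lexicon, the inverted lists, the d-gaps and the Boolean queries described so far can be sketched in a few lines. This is only an illustration under simple assumptions: documents are numbered from 1, terms are whitespace-separated lowercase words, and all function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map every term to the sorted list of document numbers (1-based) containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs, start=1):
        for term in sorted(set(text.lower().split())):
            index[term].append(doc_id)
    return dict(index)


def to_d_gaps(postings):
    """Replace every pointer by its difference from the previous pointer."""
    gaps, prev = [], 0
    for doc_id in postings:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_d_gaps(gaps):
    """Inverse of to_d_gaps: running sums restore the original pointers."""
    postings, total = [], 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings


def conjunctive(index, terms):
    """t1 AND t2 AND ... : intersect the inverted lists."""
    lists = [set(index.get(t, [])) for t in terms]
    return sorted(set.intersection(*lists)) if lists else []


def disjunctive(index, terms):
    """t1 OR t2 OR ... : merge the inverted lists."""
    return sorted(set().union(*(index.get(t, []) for t in terms)))


docs = ["suffix tree index", "inverted file index", "signature file", "suffix array index"]
index = build_inverted_file(docs)
print(index["index"], to_d_gaps(index["index"]))
print(conjunctive(index, ["suffix", "index"]), disjunctive(index, ["file", "tree"]))
```

The d-gaps are small when a term is frequent, which is what the compression methods on the following slides exploit.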
6 Compression Methods for Inverted Files
- Binary: ⌈log N⌉ bits for every pointer.
- Unary: a number x is written as x − 1 ones followed by a single zero. In general an uneconomical method.
- γ: the unary code of 1 + ⌊log x⌋ followed by the binary code of x − 2^⌊log x⌋. Requires about 1 + 2 log x bits.
- δ: the γ code of 1 + ⌊log x⌋ followed by the binary code of x − 2^⌊log x⌋. For large numbers it beats γ.
- The techniques above are non-parameterised. The parameterised ones model the gaps as Bernoulli trials, where prob(x) = (1 − p)^(x−1) · p,
- and then apply Huffman coding or …
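The unary, γ and δ codes defined on this slide translate directly into code. The sketch below returns bit strings rather than packed bits to keep it readable; the function names are illustrative only.

```python
def _floor_log2(x):
    """floor(log2 x) for an integer x >= 1."""
    if x < 1:
        raise ValueError("these codes are defined for integers x >= 1")
    return x.bit_length() - 1


def unary(x):
    """x - 1 one-bits followed by a single zero."""
    if x < 1:
        raise ValueError("unary code is defined for integers x >= 1")
    return "1" * (x - 1) + "0"


def _low_bits(x):
    """x - 2**floor(log2 x), written in exactly floor(log2 x) bits."""
    n = _floor_log2(x)
    return format(x - (1 << n), "b").zfill(n) if n > 0 else ""


def gamma(x):
    """Unary code of 1 + floor(log2 x), then the low bits of x."""
    return unary(1 + _floor_log2(x)) + _low_bits(x)


def delta(x):
    """Gamma code of 1 + floor(log2 x), then the low bits of x."""
    return gamma(1 + _floor_log2(x)) + _low_bits(x)


print(gamma(9))  # 1110001
print(delta(9))  # 11000001
```

A γ code always has 1 + 2⌊log x⌋ bits, which matches the "about 1 + 2 log x" estimate above.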
7 Compression Methods for Inverted Files
- Global Bernoulli: uses arithmetic coding or the Golomb method.
- - For a parameter b, every number x > 0 is coded in two parts: q + 1 in unary, where q = ⌊(x − 1)/b⌋, and the remainder r = x − qb − 1 in binary (about log b bits).
- - It has been shown that, for a suitably chosen b, the technique is equivalent to the optimal Huffman code for an infinite set of probabilities.
- - A single parameter b is used for all lists, with b ≈ (0.69 · N · n) / f.
- Local Bernoulli: a different b ≈ 0.69 · N / f_t for every list. Better compression than the global variant.
- Interpolative: exploits clustering. It compresses pointers rather than d-gaps. The most effective method, but also the most complex. It follows a level-by-level compression scheme.
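A minimal Golomb encoder following the two-part description above might look as follows. Two details are assumptions of this sketch rather than statements from the slides: the remainder is written with a truncated binary code (so it takes ⌊log b⌋ or ⌈log b⌉ bits), and the local parameter is rounded up to an integer.

```python
import math


def _bits(value, width):
    """`value` in exactly `width` bits; the empty string when width is 0."""
    return format(value, "b").zfill(width) if width > 0 else ""


def golomb_encode(x, b):
    """Golomb code of x >= 1 with parameter b >= 1, as a bit string."""
    if x < 1 or b < 1:
        raise ValueError("x and b must be positive integers")
    q, r = divmod(x - 1, b)        # x = q*b + r + 1, so r = x - q*b - 1
    k = (b - 1).bit_length()       # ceil(log2 b)
    short = (1 << k) - b           # remainders below this get k - 1 bits
    tail = _bits(r, k - 1) if r < short else _bits(r + short, k)
    return "1" * q + "0" + tail    # q + 1 in unary, then the remainder


def local_b(num_docs, f_t):
    """b ~ 0.69 * N / f_t for one list (rounding up is an assumption)."""
    return max(1, math.ceil(0.69 * num_docs / f_t))


print(golomb_encode(9, 3))   # 11011
print(local_b(1000, 100))    # 7
```

With b = 1 the code degenerates to plain unary, and with b a power of two every remainder takes exactly log b bits.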
8 Compression Methods for Inverted Files
- General comparison: local methods are better than global ones. Local Bernoulli with Golomb coding performs well; γ and δ are simple, perform reasonably well, need no parameters, and are easy to implement.
9 Signature Files
- Bitstring signature files: a hash string for every term. Each text gets a signature that is the OR of the hash strings of its words. For queries, query hash strings are built in the same way. Candidates must be checked for false matches. Efficient for more specific queries.
- Bitsliced signature files: the signature matrix is transposed into a set of bitslices, so fewer bits are retrieved. They work best with sparse bitslices, which give fewer false matches.
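The bitstring scheme can be illustrated with a short sketch. The signature width, the number of bits set per term and the use of SHA-256 as the hash are all assumptions made for this example; the slides do not fix any of them.

```python
import hashlib

WIDTH = 64          # signature width in bits (assumed)
BITS_PER_TERM = 3   # bits set by each term (assumed)


def term_hash(term):
    """Hash string of a term: an integer with at most BITS_PER_TERM bits set."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % WIDTH)
    return sig


def signature(terms):
    """OR of the hash strings of all terms."""
    sig = 0
    for term in terms:
        sig |= term_hash(term)
    return sig


def conjunctive_query(docs, query_terms):
    """Return (candidates, answers); candidates may include false matches."""
    query_sig = signature(query_terms)
    candidates = [i for i, terms in enumerate(docs)
                  if (signature(terms) & query_sig) == query_sig]
    # The signature test can only produce false positives, so each candidate
    # is verified against the actual terms of the document.
    answers = [i for i in candidates if all(t in docs[i] for t in query_terms)]
    return candidates, answers


docs = [{"suffix", "tree"}, {"suffix", "array"}, {"inverted", "file"},
        {"signature", "file", "suffix"}]
print(conjunctive_query(docs, ["suffix", "file"]))
```

In a real system the document signatures would be computed once and stored; recomputing them per query here only keeps the sketch short.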
10 Signature Files
- Blocked signature files: records are organised in blocks. Each bit in each slice corresponds to B records (B is the blocking factor). Each slice uses a different mapping from record numbers to block numbers. A large B leads to more checks for false matches and more disk accesses.
- Bitmaps: every term sets 1 bit in each signature, whose length equals the number of terms (a 1-1 hash function). Fast, but they consume space. Best for common terms. Hybrid schemes.
11 Compression of Signature Files
- Compression does not pay off as it does for inverted files. It is lossy and introduces noise: more false matches. Decompression takes time. A table of slice addresses is needed in main memory.
- Other issues
- - Records whose lengths vary widely (low performance).
- - Handling of common terms (a problem if they share space with rare terms).
12 Comparison of Indexing Methods
- An inverted list is never longer than the corresponding bitslice, so inverted files never need more disk accesses. The exception is when the vocabulary does not fit in main memory. Bitslices can come out ahead, in special cases, when the query has many terms.
- Disk space: IF need 6-10% and bitsliced SF 25-40% of the total.
- Memory requirements: SF need no space for a vocabulary, but all their other costs are comparable in size to those of IF.
- Index construction: SF take longer to build (hash functions etc.).
- Parallelism: each query is handled by a different processor.
13 Comparison of Indexing Methods
- Update: necessary in dynamic databases. Easier for bitstring SF. Batching reduces the cost, and then IF are better.
- Scalability: query evaluation for SF is linear in the size of the database, whereas for IF it is sublinear.
- Ranking: IF handle ranking queries better.
- Extensibility: IF are easily adapted to answer proximity, NOT, and ranked queries.
- For typical indexing applications IF are superior. SF need a lot of space and a long time to build. IF are more efficient for ranked and proximity queries.
15 Full Text Indexing
- Without an index structure (naive algorithm, algorithms with basic preprocessing, Knuth-Morris-Pratt, Boyer-Moore, Aho-Corasick automaton)
- With an index structure (suffix tree, suffix array, DAWG, String B-tree)
16 Basic Definitions
- String: x = x1 x2 … xn, xi ∈ Σ, |x| = n
- - e.g. x = acgttaaaca, |x| = 10, Σ = {a, c, g, t}
- Empty string: ε
- Substring w: x = uwv
- Prefix w: x = wu
- Suffix w: x = uw
- Every string S of length |S| = m has m possible non-empty suffixes: S[1…m], S[2…m], …, S[m−1…m] and S[m].
- Example, "sequence": sequence, equence, quence, uence, ence, nce, ce, e.
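As a quick check of these definitions, the non-empty suffixes and prefixes of a string can be listed with slicing. Python indexes from 0, whereas the slides count positions from 1; the helper names are illustrative.

```python
def suffixes(s):
    """All non-empty suffixes S[i..m], from the longest to the shortest."""
    return [s[i:] for i in range(len(s))]


def prefixes(s):
    """All non-empty prefixes S[1..i], from the shortest to the longest."""
    return [s[:i] for i in range(1, len(s) + 1)]


print(suffixes("sequence"))
print(len(suffixes("acgttaaaca")))  # a string of length m has m non-empty suffixes
```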
17 The Suffix Tree
- Definition: it stores all possible suffixes of a string S (assumption: the last letter of the string does not occur anywhere inside it).
- It differs from the Pat tree, which stores only words and applies a different compression rule (it keeps the first letter on every edge).
- Example: x = xabxac
Definition: Let U = Σ^l be the universe for an alphabet Σ and l > 0. A trie is the k-ary tree (k = |Σ|) that contains all prefixes of the elements of a set S ⊆ U. Each level of the tree corresponds to one digit d_i (d_1 at the root). Every element x = x1 x2 … xl is placed in the subtree x1 → x2 → … .
19 Trie - example
S = {102, 120, 121, 210, 211, 212}, Σ = {0, 1, 2}
Time for ins/del/search: O(l). Space: O(nlk)
20 Compressed Trie
Space: O(nlk) → O(nk) → O(n)
21 Compressed Trie - example
22 Suffix Tree
Definition: the suffix tree of a string S[1…n] is a compact trie whose keys are all the suffixes S[i…n], 1 ≤ i ≤ n.
23 Naïve Construction
[Figure: suffixes 1 = bcabc and 2 = cabc inserted into the tree]
Time: O(n²), e.g. for S = aaaaa...
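The naive construction can be sketched by inserting every suffix, one character at a time, into a trie. For brevity this sketch builds an uncompacted suffix trie (nested dictionaries) rather than the compact tree of the definition, appends a terminal symbol that is assumed not to occur in the string, and reports 0-based start positions. It shows where the O(n²) cost comes from: each of the n suffixes is walked in full.

```python
def build_suffix_trie(s, terminal="$"):
    """Insert every suffix of s + terminal into a trie of nested dicts."""
    text = s + terminal
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node[None] = i  # leaf marker: start position of this suffix
    return root


def find_occurrences(root, pattern):
    """0-based start positions of pattern: the leaves below the node it reaches."""
    node = root
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    starts, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is None:
                starts.append(child)
            else:
                stack.append(child)
    return sorted(starts)


trie = build_suffix_trie("xabxac")
print(find_occurrences(trie, "xa"))  # [0, 3]
```

A real suffix tree merges chains of single-child nodes into one edge, which brings the space down to O(n) nodes.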
24 Suffix Links Speed Up
[Figure labels: x, u]
- In the tree T_i only head_i will lack a valid suffix link.
- At the i-th step the algorithm visits the contracted locus of head_i in …
25 Suffix Links Speed Up
Example: S = bbbbbababbbaabbbbb
Step i−1: inserted S[13..19] = abbbbb, with x = a, u = bbb, z = bb, u1 = b, u2 = bb
Step i: insert S[14..19] = bbbbb
[Figure labels: Node c, Node d]
26 Exact Pattern Matching
Time: O(n + m + a)
27 Repeated Sub-Sequences - Regularities
Longest Repeated Substring
28 Longest Common Substring of two (more) Strings
Generalized Suffix Tree
29 Maximal Pairs
Gusfield: O(n + a)
Brodal: O(n log n + a) for t1 ≤ gap ≤ t2; O(n + a) for t1 ≤ gap
30 Maximal Pairs in Multiple Strings
31 Nearest Common Ancestor Suffix Tree
nca(x, y) = u in O(1) time
nca(001, 101): leftmost1(XOR(001, 101)) = leftmost1(100) → 1xx
nca(001, 111): leftmost1(XOR(001, 111)) = leftmost1(110)
nca(011, 010): leftmost1(XOR(011, 010)) = leftmost1(001)
32 Exact Matching with wild cards using longest common extension
33 Suffix Arrays
- Let
- - Σ be an alphabet
- - A = a0 a1 … a(N−1) be a text of size N, with ai ∈ Σ and 0 ≤ i < N
- - W = w0 w1 … w(P−1) be a string.
- The task is to find all occurrences of W in A.
34 First approach - Suffix trees
- Suffix tree for A
- - Exactly N leaves, numbered 0 to N−1
- - Every internal node other than the root has at least two children
- - The label of every edge is a non-empty substring of A
- - At every node, the labels of the outgoing edges start with different characters
- - Concatenating the edge labels on the path from the root to leaf k gives the suffix of A at position k.
- Construction: O(N log |Σ|) time, O(N) space
- - A linear-time solution exists (Farach's algorithm)
- Query answering: O(P log |Σ|) time
35 Second approach - Suffix Arrays
- A sorted list of all the suffixes of A.
- Construction: O(N log N) time (although the expected time is O(N))
- Space: 2N integers
- Query answering: P⌈log2(N−1)⌉ symbol comparisons
36 Suffix Trees vs. Suffix Arrays
- Query: Is W a substring of A?
- Suffix Tree
- - O(P log |Σ|) with O(N) space, or
- - O(P) with O(N|Σ|) space (impractical)
- Suffix Array
- - Competitive/better: O(P + log N) search
- - Main advantage: space, 2N integers
- - (In practice, the problem is the space overhead of the query data structure)
- - Another advantage: independent of Σ
- A_i: the suffix of A that starts at position i, i.e. A_i = a_i a_(i+1) … a_(N−1)
- If u is a string, then u^p is the prefix consisting of the first p symbols of u.
- For strings u, v we define the relation <_p as follows:
- - u <_p v ⇔ u^p < v^p
- The relations =_p, >_p, ≤_p, ≥_p, ≠_p are defined similarly.
Pos: an array of N positions holding the suffixes of A in lexicographic order: A_Pos[0] < A_Pos[1] < … < A_Pos[N−1].
Let L_W = min{k : W ≤_P A_Pos[k] or k = N} and R_W = max{k : A_Pos[k] ≤_P W or k = −1}.
If L_W ≤ R_W, then for every k ∈ [L_W, R_W] and i = Pos[k] we have W = a_i a_(i+1) … a_(i+P−1), and conversely. Hence, if L_W ≤ R_W, there are R_W − L_W + 1 occurrences of W in A.
Pos is sorted ⇒ binary search for L_W and R_W ⇒ O(P log N) time.
39 Pseudocode - 1 (search for L_W)
if W ≤_P A_Pos[0] then
    L_W ← 0
else if W >_P A_Pos[N−1] then
    L_W ← N
else
    (L, R) ← (0, N−1)
    while R − L > 1 do
        M ← (L + R) / 2
        if W ≤_P A_Pos[M] then R ← M
        else L ← M
    L_W ← R
40 Let A = mississippi
Let W = issa
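The binary search for L_W, and the symmetric search for R_W, can be tried on this example. The sketch builds Pos by naively sorting the suffixes, which is simple but not the efficient construction, and the function names are illustrative. Comparing W with the first P symbols of a suffix plays the role of ≤_P.

```python
def suffix_array(text):
    """Pos: start positions of the suffixes of text in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_lw(text, pos, w):
    """Smallest k with W <=_P A_Pos[k], or N if there is none."""
    n, p = len(pos), len(w)
    if n == 0:
        return 0
    head = lambda k: text[pos[k]:pos[k] + p]  # first P symbols of A_Pos[k]
    if w <= head(0):
        return 0
    if w > head(n - 1):
        return n
    lo, hi = 0, n - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if w <= head(mid):
            hi = mid
        else:
            lo = mid
    return hi


def find_rw(text, pos, w):
    """Largest k with A_Pos[k] <=_P W, or -1 if there is none."""
    n, p = len(pos), len(w)
    if n == 0:
        return -1
    head = lambda k: text[pos[k]:pos[k] + p]
    if head(n - 1) <= w:
        return n - 1
    if head(0) > w:
        return -1
    lo, hi = 0, n - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if head(mid) <= w:
            lo = mid
        else:
            hi = mid
    return lo


def occurrences(text, w):
    """Sorted start positions of w in text; empty when L_W > R_W."""
    pos = suffix_array(text)
    lw, rw = find_lw(text, pos, w), find_rw(text, pos, w)
    return sorted(pos[lw:rw + 1]) if lw <= rw else []


A = "mississippi"
print(suffix_array(A))         # [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
print(occurrences(A, "issa"))  # []  (L_W = 2 > R_W = 1)
print(occurrences(A, "issi"))  # [1, 4]
```

For W = issa the search ends with L_W = 2 and R_W = 1, so the interval is empty and W does not occur in A.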
41 How can the time be reduced?
lcp(v, w): the length of the longest common prefix of v and w.
l = lcp(A_Pos[L], W), r = lcp(W, A_Pos[R]). Initially l = lcp(A_Pos[0], W) and r = lcp(W, A_Pos[N−1]). Every comparison of W with A_Pos[M] updates l or r.
A_Pos[L] =_l W and W =_r A_Pos[R] ⇒ A_Pos[k] =_h W for all k ∈ [L, R], where h = min(l, r) ⇒ the number of symbol comparisons needed drops by h.
42 How can the time be reduced further?
- Precomputed information about the lcp of A_Pos[M] with A_Pos[L] and A_Pos[R].
- There are N−2 triples (L, M, R) with midpoint M ∈ [1, N−2], 0 ≤ L < M < R ≤ N−1.
- (L_M, M, R_M) is the unique triple with M as midpoint.
- Llcp and Rlcp are arrays of size N−2:
- - Llcp[M] = lcp(A_Pos[L_M], A_Pos[M])
- - Rlcp[M] = lcp(A_Pos[M], A_Pos[R_M]).
43 Reducing comparisons I
- Consider one iteration of the search loop with triple (L, M, R). Let H = max(l, r) and let ΔH be the difference between the values of H at the beginning and at the end of the iteration.
- Assume r ≤ l = H (recall l = lcp(A_Pos[L], W)).
- Case 1: Llcp[M] > l, i.e. lcp(A_Pos[L_M], A_Pos[M]) > l.
- - Then A_Pos[M] =_(l+1) A_Pos[L] ≠_(l+1) W.
- - W lies in the right half: L ← M, and l stays as it is.
- Case 2: Llcp[M] < l, i.e. lcp(A_Pos[L_M], A_Pos[M]) < l.
- - Then W =_l A_Pos[L] <_l A_Pos[M],
- - W lies in the left half, and the new value of r is Llcp[M].
44 Reducing comparisons II
- Case 3: Llcp[M] = l, i.e. lcp(A_Pos[L_M], A_Pos[M]) = l.
- - Then A_Pos[M] =_l W.
- - We compare symbol l+1, then l+2, and so on up to l+j, the first one for which W ≠_(l+j) A_Pos[M].
- - Symbol l+j determines whether W lies on the left or the right side; the new value of r or l is l+j−1.
- - At the start of the iteration l = H, so this takes ΔH + 1 symbol comparisons.
- ΔH + 1 comparisons per iteration, and ΣΔH ≤ P.
- Total number of symbol comparisons:
- - at most P + ⌈log2(N−1)⌉
- - O(P + log N) time in the worst case.
46 Pseudocode - 2
l ← lcp(A_Pos[0], W); r ← lcp(A_Pos[N−1], W)
if l = P or w_l ≤ a_(Pos[0]+l) then
    L_W ← 0
else if r < P and w_r > a_(Pos[N−1]+r) then
    L_W ← N
else
    (L, R) ← (0, N−1)
    while R − L > 1 do
        M ← (L + R) / 2
        if l ≥ r then
            if Llcp[M] ≥ l then m ← l + lcp(A_(Pos[M]+l), W_l)
            else m ← Llcp[M]
        else
            if Rlcp[M] ≥ r then m ← r + lcp(A_(Pos[M]+r), W_r)
            else m ← Rlcp[M]
        if m = P or w_m ≤ a_(Pos[M]+m) then (R, r) ← (M, m)
        else (L, l) ← (M, m)
    L_W ← R
47 DAWG (Directed Acyclic Word Graphs) and cDAWGs
50 Secondary Memory
- Large collections of electronic texts that are heterogeneous and come from various sources
- Large electronic volumes, on the order of gigabytes / terabytes
- Use of secondary-memory devices (hard disks, magnetic tapes, DVDs)
- Data index structures and search engines are basic tools for storing, updating and extracting useful information
51 Data Structures
- Supra-suffix array (a hybrid structure)
- Compact pat-trees (binary representation; the structure is moved to secondary memory)
- Prefix B-tree
- String B-tree
52 String B-Tree
- A combination of B-trees and Patricia tries for indexing the internal nodes
- Extra pointers have been added to speed up searching and the update operations
- String B-trees have the same worst-case performance as regular B-trees, yet they can store strings of unbounded length and perform search operations comparable to those of suffix trees.
53 String B-Tree: Notation
- A string X of s characters is written X[1, s], and |X| denotes its length (s)
- X[1, i] denotes a prefix, X[j, s] a suffix, and X[i, j] a substring of X, where 1 ≤ i ≤ j ≤ s
- A pattern string P occurs in X when there is a substring X[i, i + |P| − 1] of X that equals P
54 Problem 1: Prefix Search and Range Query
- Let Δ = {δ1, …, δk} be a set of strings from a text whose total length is N
- We store Δ and keep it sorted in external memory
- Prefix Search(P): retrieves all strings of Δ that have the pattern P as a prefix
- Range Query(K', K''): retrieves all strings of Δ that lie between K' and K'' in lexicographic order
55 Problem 2: Substring Search
- The query Substring Search(P) finds all occurrences of P in the strings of Δ
- occ denotes the number of occurrences
- It extends Problem 1: the set now consists of all suffixes of the strings of Δ.
56 Memory Model
- We analyse Problems 1 and 2 in the classical two-level memory model [Cormen et al. 1990]
- We assume a fast, small main memory (RAM) and a slower but very large external memory (hard disks, DVDs). External memory is divided into transfer blocks called disk pages
57 Memory Model
- Every disk page holds B atomic items, which may be integers, characters or pointers
- B is called the disk page size, and writing or reading a page is called a disk access
58 Complexity: Problem 1
- Prefix Search(P) takes O((p + occ)/B + log_B k) disk accesses in the worst case, where p = |P|
- Range Query(K', K'') takes O((k' + k'' + occ)/B + log_B k) disk accesses in the worst case, where k' = |K'| and k'' = |K''|
- Inserting or deleting a string of length m in the set Δ takes O(m/B + log_B k) disk accesses in the worst case
- The space used is Θ(k/B) disk pages, while the set Δ itself occupies Θ(N/B) pages
59 Complexity: Problem 2
- Substring Search(P) takes O((p + occ)/B + log_B N) disk accesses in the worst case, where p = |P|
- Inserting or deleting a string of length m in the set Δ takes O(m log_B (N + m)) disk accesses in the worst case
- The space used by the String B-tree and by the set Δ is Θ(N/B) disk pages
60 Storing Strings
61 Storing Strings
- With this structure we can locate the disk page that holds the i-th character of a string by performing a constant number of simple arithmetic operations on its pointer
- We can pack Θ(B) logical pointers into a single disk page, but reading that page alone does not let us retrieve all the characters of the strings they point to
62 Storing Strings
- We can compare any two strings character by character, but this operation is very inefficient when repeated, because its worst-case cost is proportional to the length of the two strings
- We call this problem rescanning, because the same characters are re-examined many times
63 B-tree-Like Structure
- Strings are represented by logical pointers
- The input is the set of strings Δ, with N characters in total
- Let K = {K1, …, Kk} be the elements of Δ in increasing lexicographic order (≤_L)
- The strings of K are distributed over the leaves of the B-tree, and only some of them also appear in the internal nodes
64 B-tree-Like Structure
- The ordered set of strings associated with a node π is denoted F_π ⊆ K; its leftmost and rightmost strings are L(π) and R(π) respectively
- Every node π is stored in one disk page, and the number of its strings is constrained by b ≤ |F_π| ≤ 2b, where b = Θ(B) is an even integer chosen so that a node fits in one disk page
- Only the root may have fewer than b strings
65 B-tree-Like Structure
66 B-tree-Like Structure
- We split K into groups of b consecutive strings
- Each group is mapped to a leaf, say π, and forms F_π
- Every internal node π has n(π) children σ1, …, σ_n(π), and its ordered set is F_π = {L(σ1), R(σ1), …, L(σ_n(π)), R(σ_n(π))}
- Since n(π) = |F_π| / 2, every node other than the root and the leaves has between b/2 and b children. The resulting B-tree therefore has height H = O(log_(b/2) k) = O(log_B k)
67 Prefix Search(P)
- The analysis of Prefix Search(P) rests on two observations by Manber and Myers:
- - The strings that have prefix P occupy contiguous positions in K
- - The leftmost string with prefix P is adjacent to P in K with respect to increasing lexicographic order
68 Prefix Search(P)
- The position of P in K is represented by a pair (τ, j), where τ is the leaf containing that position and j − 1 is the number of strings in F_τ that are lexicographically smaller than P, with 1 ≤ j ≤ |F_τ| + 1
- We say that j is the position of P in F_τ
- The procedure that determines the position of P in a set F_π is called PT-Search(P, F_π)
69 Prefix Search(P): Algorithm
- We first check the two cases in which P is smaller / larger than every string in K
- If both checks fail, we start with π = root and traverse the B-tree downwards
- On visiting a node π, we load its disk page and apply PT-Search to find the position j of P in F_π
- The two adjacent strings are determined by K_(j−1) <_L P ≤_L K_j
70 Prefix Search(P): Algorithm
- If π is a leaf we stop the traversal; otherwise there are two cases:
- - If K_(j−1) and K_j belong to two different children of π, the two strings are adjacent in K. Let σ be the child of π that contains K_j. We choose τ to be the leftmost leaf of the subtree rooted at σ, and we know that P falls in the first position of F_τ because L(τ) = L(σ) = K_j
- - If K_(j−1) and K_j belong to the same child of π, we continue the traversal recursively at the next level
71 Prefix Search(P): Algorithm
- In either case, at the end of the traversal we have a pair (τ_L, j_L) that represents the position of the leftmost string of K with prefix P
- The pair (τ_R, j_R) is found analogously
- To answer Prefix Search(P) we scan the sequence of leaves delimited by τ_L and τ_R and list the strings from the j_L-th string of F_(τ_L) to the (j_R − 1)-th string of F_(τ_R)
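The two observations behind Prefix Search(P) can be checked in main memory on a plain sorted list. This is not the String B-tree traversal itself: it uses Python's bisect module in place of the tree descent, positions are 0-based, and the function name is hypothetical.

```python
from bisect import bisect_left


def prefix_search(sorted_keys, p):
    """Return (j_left, j_right, matches); the keys with prefix p are sorted_keys[j_left:j_right]."""
    # The leftmost key with prefix p, if any, is the first key >= p.
    j_left = bisect_left(sorted_keys, p)
    # Keys with prefix p are contiguous, so scan right until the prefix no longer matches.
    j_right = j_left
    while j_right < len(sorted_keys) and sorted_keys[j_right].startswith(p):
        j_right += 1
    return j_left, j_right, sorted_keys[j_left:j_right]


K = sorted(["aid", "atlas", "atom", "attenuate", "car", "patent", "zoo"])
print(prefix_search(K, "at"))  # (1, 4, ['atlas', 'atom', 'attenuate'])
```

The scan to the right costs time proportional to the number of matches, which mirrors the scan over the leaves between τ_L and τ_R.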
72 Simplified String B-Tree
- It combines the B-tree-like structure described so far with the Patricia trie
- The goal is to organise the strings so that a search within a node compares P against only one string of F_π in the worst case, instead of the log2 |F_π| strings needed by plain binary search
73 Simplified String B-Tree
74 Simplified String B-Tree
- The Patricia trie (PT) can be defined in two steps:
- - (1) Build a compact trie over the strings of F_π
- - (2) Label every node of the trie with the length of the substring stored at it, and replace every string on the branches by its first character
75 Simplified String B-Tree
- The Patricia trie loses some information compared with the compact trie, because all characters on the branches except the first are deleted
- It fits Θ(B) strings in one B-tree node, regardless of their length
- It allows lexicographic searches within a node without extra disk accesses
76 Example: PT-Search
77 Example: PT-Search
- Consider an example of PT-Search(P, F_π), where
- - P = bcbabcba
- On the left is the first phase, in which l denotes the rightmost leaf
- We start by determining the common prefix of the string at l and P (here bcb), and then find the lowest ancestor of l whose label is greater than |bcb| = 3
- We then use the mismatching character P[4] = a to find the exact position of P (j = 4) by traversing the marked …
- S??d?????ta? ta Patricia Tries µe t? B-Tree
ap?fe????µe t? d?ad??? a?a??t?s? st??? ??µß???
p?? pe???µe ?a? ?ts? µe?????µe t? s???????
p???p????t?ta - ?? ???? a?t? de? e??a? a??µa ??a??p???t???. ?
????? e??a? ?t? se ???e ??µß? p?? ????µe
ep?s?efte? ?a?asa?????µe (rescanning) t? ? ap?
t?? a???
79 Example: PT-Search
- PT-Search must be designed to make better use of the properties of the String B-tree and the Patricia trie
- We use three input parameters (P, F_π, l), where l satisfies the property that some string in F_π shares its first l characters with P
- PT-Search(P, F_π, l) returns a pair (j, lcp), where j is the position of P in F_π and lcp is the length of the common prefix
80 Complexity of Prefix Search
- The strings of K with prefix P are retrieved by examining the leaves of the String B-tree delimited by τ_L and τ_R, in O(occ/B) disk accesses
- The total cost of Prefix Search(P) is O((p + occ)/B + log_B k) disk accesses
81 Problem 2: Substring Search
- Problem 2 concerns a more powerful operation, Substring Search(P), which looks for occurrences of P in the strings of Δ
- The search is based on finding substrings of length p that equal P
- We define the set of suffixes as SUF(Δ) = {δ[i, |δ|] : 1 ≤ i ≤ |δ|, δ ∈ Δ}, which contains N lexicographically ordered suffixes
82 Problem 2: Substring Search
- The task is to retrieve all strings of SUF(Δ) that have prefix P
- Substring Search(P) on the set Δ thus becomes Prefix Search(P) on the set SUF(Δ)
83 Example: Substring Search
84 Example: Substring Search
- P = "at", Δ = {aid, atlas, atom, attenuate, car, patent, zoo}, b = 4 and K = {aid, ar, as, . . . , uate, zoo}
- In this example there are occ = 4 occurrences: atlas, atom, attenuate and patent
- The suffixes with prefix P that correspond to these occurrences have their logical pointers stored in adjacent leaves of the tree
- We can set K = SUF(Δ), with size k = N, and run Prefix Search(P)
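The reduction from Substring Search(P) on Δ to Prefix Search(P) on SUF(Δ) can be tried on this example with an in-memory sorted list of suffixes. The list stands in for the leaves of the String B-tree; the tuple layout and function names are assumptions of this sketch.

```python
from bisect import bisect_left


def build_suf(strings):
    """SUF(Δ): sorted (suffix, string index, 1-based start position) triples."""
    return sorted((s[i:], d, i + 1)
                  for d, s in enumerate(strings)
                  for i in range(len(s)))


def substring_search(suf, p):
    """All suffixes with prefix p; they occupy one contiguous run of SUF(Δ)."""
    j = bisect_left(suf, (p,))  # (p,) sorts before every triple whose suffix is >= p
    hits = []
    while j < len(suf) and suf[j][0].startswith(p):
        hits.append(suf[j])
        j += 1
    return hits


delta = ["aid", "atlas", "atom", "attenuate", "car", "patent", "zoo"]
suf = build_suf(delta)
hits = substring_search(suf, "at")
print([h[0] for h in hits])                 # ['ate', 'atent', 'atlas', 'atom', 'attenuate']
print(sorted({delta[d] for _, d, _ in hits}))
```

The matching suffixes come from the four strings listed on the slide. Note that attenuate contributes two of them (attenuate and ate), so this sketch reports five suffix positions across four strings.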
85 String B-Tree
- Inserting a single string Y into Δ, where m = |Y|, requires inserting all m of its suffixes into SUF(Δ) in lexicographic order
- The problem is that the m inserted suffixes are treated as completely separate strings, so rescanning occurs
- We extend the simplified String B-tree with two kinds of auxiliary pointers that avoid rescanning during updates
86 String B-Tree
- One kind is the familiar parent pointer, defined for every node
- The other is the succ pointer, defined for every string in SUF(Δ) as follows: the succ pointer of δ[i, |δ|] ∈ SUF(Δ) leads to the leaf of the String B-tree that contains δ[i+1, |δ|]. If i = |δ|, succ points to its own leaf (self-loop pointer)
87 Insertion Algorithm
- After Y[i, m] has been inserted, the following two conditions must hold:
- - The suffixes Y[j, m] are stored in the String B-tree for 1 ≤ j ≤ i, and Y[i, m] shares its first h_i characters with one of its adjacent strings
- - All succ pointers are set correctly for every string in the String B-tree except Y[i, m]. That is, succ(Y[i, m]) is the only dangling pointer, unless i = m, in which case it points to its own leaf
88 Insertion Algorithm
- The suffixes of Y are inserted into the String B-tree, and hence into SUF(Δ), from the longest to the shortest
- The insertion proceeds for i = 1, 2, …, m
- For i = 1, we find the position of Y[1, m] by traversing the tree as in Prefix Search(Y[1, m])
- For the remaining suffixes of Y (i > 1) we use a different approach in order to avoid rescanning
89 Insertion Algorithm
- To find the position of Y[i, m], instead of starting from the root we traverse the tree starting from the last leaf visited
- We know that Y[i−1, m] shares its first h_(i−1) characters with one of its adjacent strings
- Following the succ pointer of that adjacent string leads to a leaf with the following property: it contains a string that shares its first max{0, h_(i−1) − 1} characters with Y[i, m]
90 Insertion Algorithm
- We continue the insertion with an upward and then a downward traversal of the String B-tree until we reach the leaf π_i that contains the position of Y[i, m]
- Since it can be shown that h_i ≥ max{0, h_(i−1) − 1}, the algorithm avoids rescanning by examining only the characters of Y at positions i + max{0, h_(i−1) − 1}, …, i + h_i
91 Insertion Algorithm
- A straightforward handling of the parent and succ pointers would need O(B log_B (N + m)) disk accesses per inserted suffix in the worst case
- In total, Σ_(i=1..m) d_i = O(m log_B (N + m)) disk accesses are needed to insert Y into Δ
- We achieve the same worst-case performance as inserting m integers into a regular B-tree