Title: Slide 1
1 Information Retrieval
Indexing Structures
2 Classes of Indexes
- Types of strings that can be indexed:
- - structures that index strings/natural-language texts (linguistic texts)
- - full-text indexing structures
- First category: inverted files, signature files, bitmaps.
- Second category: suffix trees, suffix arrays, directed acyclic word graphs (DAWG and cDAWG).
- Models of computation: the structures above are suited to RAM; for secondary memory the string B-tree is the more suitable structure.
3 Linguistic Text Indexing
- Use: large text collections. The information is organized like a table of contents (which documents each term occurs in) and is used to answer queries.
- Methods:
- Inverted Files
- Signature Files
- Bitmaps
4 Queries
- Boolean Queries
- Disjunctive (t1 ∨ t2 ∨ … ∨ tq)
- Conjunctive (t1 ∧ t2 ∧ … ∧ tq)
- Combinations of the two.
- Ranked Queries
- A similarity score is computed.
- Proximity Queries
- The terms must lie within some distance of each other.
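As a minimal sketch of the two Boolean query types, assuming each term already maps to the set of document numbers that contain it (the `postings` dictionary and its contents are made up for illustration), a conjunctive query is a set intersection and a disjunctive query is a set union:

```python
# Hypothetical postings: term -> set of document numbers containing it.
postings = {
    "suffix": {1, 3},
    "tree": {1, 2, 3},
    "array": {2, 4},
}

def conjunctive(terms, postings):
    """Documents containing every term (t1 AND t2 AND ... AND tq)."""
    sets = [postings.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def disjunctive(terms, postings):
    """Documents containing at least one term (t1 OR t2 OR ... OR tq)."""
    return set().union(*(postings.get(t, set()) for t in terms))

print(sorted(conjunctive(["suffix", "tree"], postings)))
print(sorted(disjunctive(["suffix", "array"], postings)))
```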
5 Inverted Files
An inverted file consists of a lexicon and, for every term, an inverted list of pointers to the documents containing it. For compression, the pointers are stored as d-gaps, which are then coded with global or local methods. Global methods are either parameterized or non-parameterized.
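The following is a small sketch of an inverted file and of the d-gap transformation, under the assumptions that documents are numbered from 1 and that terms are whitespace-separated lowercase words; the function names are illustrative only.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Map each term to the ascending list of document numbers containing it."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs, start=1):
        for term in set(text.lower().split()):
            postings[term].append(doc_id)
    return dict(postings)

def to_dgaps(doc_ids):
    """Store each pointer as the difference from the previous pointer."""
    return [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]

def from_dgaps(gaps):
    """Recover the original pointers by a running sum."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids

docs = ["suffix tree index", "inverted index", "signature file", "inverted file index"]
index = build_inverted_file(docs)
print(index["index"], to_dgaps(index["index"]))
```

The gaps are never larger than the pointers they replace, which is what makes the variable-length codes of the next slides worthwhile.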
6 Compression Methods for Inverted Files
- Binary: ⌈log N⌉ bits for every pointer.
- Unary: a number x is written as x − 1 ones followed by a single zero. Generally an uneconomical method.
- γ: the unary code of 1 + ⌊log x⌋ followed by the binary code of x − 2^⌊log x⌋. Requires about 1 + 2 log x bits.
- δ: the γ code of 1 + ⌊log x⌋ followed by the binary code of x − 2^⌊log x⌋. For large numbers it outperforms γ.
- The techniques above are non-parameterized. The parameterized ones model the gaps with Bernoulli trials, where prob(x) = (1 − p)^(x−1) p, and then apply Huffman or arithmetic coding.
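The unary, γ and δ codes described above can be sketched as follows. Codes are returned as strings of '0' and '1' for readability; the helper names are assumptions, and a real implementation would pack bits instead.

```python
def unary(x):
    """x - 1 ones followed by a single zero (x >= 1)."""
    return "1" * (x - 1) + "0"

def binary(value, width):
    """value written in exactly `width` bits (empty string when width is 0)."""
    return format(value, "b").zfill(width) if width > 0 else ""

def gamma(x):
    """Unary code of 1 + floor(log2 x), then x - 2^floor(log2 x) in binary."""
    k = x.bit_length() - 1          # floor(log2 x)
    return unary(k + 1) + binary(x - (1 << k), k)

def delta(x):
    """Gamma code of 1 + floor(log2 x), then x - 2^floor(log2 x) in binary."""
    k = x.bit_length() - 1
    return gamma(k + 1) + binary(x - (1 << k), k)

for x in (1, 2, 9, 1000):
    print(x, unary(x) if x < 10 else "...", gamma(x), delta(x))
```

For x = 1000 the γ code takes 19 bits and the δ code 16, which illustrates why δ wins for large numbers.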
7 Compression Methods for Inverted Files
- Global Bernoulli: uses arithmetic coding or the Golomb method.
- For a parameter b, every number x > 0 is coded in two parts: q + 1 in unary, where q = ⌊(x − 1)/b⌋, and the remainder r = x − qb − 1 in binary (about log b bits).
- It has been shown that, for a suitable choice of b, the technique is equivalent to the optimal Huffman code for an infinite set of probabilities.
- A single parameter b is used for all lists, with b ≈ (0.69 · N · n) / f.
- Local Bernoulli: a different b ≈ 0.69 N / f_t for each list. Better compression than the global variant.
- Interpolative: exploits clustering. It compresses the pointers themselves rather than the d-gaps. The most effective method, but also the most complex. It follows a level-by-level compression rule.
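A hedged sketch of the Golomb code as described above: q + 1 in unary, then the remainder r = x − qb − 1 in binary. The remainder is written here with a truncated binary code, which is one common way to realise "about log b bits"; that choice, the rounding in `local_b`, and the function names are assumptions of this example.

```python
def unary(x):
    """x - 1 ones followed by a single zero."""
    return "1" * (x - 1) + "0"

def truncated_binary(r, b):
    """Code r in [0, b) using floor(log2 b) or ceil(log2 b) bits."""
    k = (b - 1).bit_length()        # ceil(log2 b)
    u = (1 << k) - b                # number of short codewords
    if r < u:
        value, width = r, k - 1
    else:
        value, width = r + u, k
    return format(value, "b").zfill(width) if width > 0 else ""

def golomb(x, b):
    """Golomb code of x > 0 with parameter b >= 1."""
    q = (x - 1) // b
    r = x - q * b - 1
    return unary(q + 1) + truncated_binary(r, b)

def local_b(num_docs, term_freq):
    """Per-list parameter b ~ 0.69 * N / f_t, rounded and kept at least 1."""
    return max(1, round(0.69 * num_docs / term_freq))

print(golomb(9, 3), golomb(9, 6), local_b(1000, 50))
```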
8 Compression Methods for Inverted Files
- Overall comparison: local methods are better than global ones. Local Bernoulli with Golomb coding performs well. The plain γ and δ codes perform reasonably well, need no parameters and are easy to implement.
9 Signature Files
- Bitstring Signature Files: every term has a hash string. Every document gets a signature that is the OR of the hash strings of its words. For queries, query hash strings are built in the same way. Candidate documents must be checked for false matches. Efficient for more specific queries.
- Bitsliced Signature Files: the signature matrix is transposed into a set of bitslices, so fewer bits are accessed. They work better when the bitslices are sparse, since there are then fewer false matches.
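The bitstring scheme can be sketched as below. The signature width, the number of bits set per term and the use of SHA-256 as the hash are all assumptions made for illustration, not parameters taken from the slides.

```python
import hashlib

WIDTH = 64           # signature width in bits (assumed)
BITS_PER_TERM = 3    # bits set by each term (assumed)

def term_hash_string(term):
    """Hash string of one term: an integer with a few bits set."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % WIDTH)
    return sig

def signature(words):
    """Signature of a document or query: OR of the term hash strings."""
    sig = 0
    for w in words:
        sig |= term_hash_string(w)
    return sig

def candidates(query_terms, docs):
    """Documents whose signature covers every bit of the query signature."""
    q = signature(query_terms)
    return [i for i, d in enumerate(docs) if (signature(d.split()) & q) == q]

def conjunctive_query(query_terms, docs):
    """Filter the candidates against the text to discard false matches."""
    return [i for i in candidates(query_terms, docs)
            if all(t in docs[i].split() for t in query_terms)]

docs = ["suffix tree index", "inverted index", "signature file", "inverted file index"]
print(conjunctive_query(["inverted", "index"], docs))
```

A signature test can report a document that does not contain the terms, but it never misses one that does, which is why only the false-match check is needed.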
10 Signature Files
- Blocked Signature Files: records are organized in blocks. Every bit of every slice corresponds to B records (B is the blocking factor). Each slice uses a different mapping from record numbers to block numbers. A large B leads to more checks for false matches, and therefore more disk accesses.
- Bitmaps: every term sets one bit in each signature, whose length equals the number of terms (a 1-1 hash function). Easy to use and fast, but space-consuming. Better for common terms. Hybrid techniques exist.
11 Compression of Signature Files
- Compression does not pay off as it does for inverted files. It is lossy and introduces noise, so there are more false matches. Decompression takes time. A table of slice addresses has to be kept in main memory.
- Other issues:
- Records whose lengths vary widely (low performance).
- Handling of frequently used terms (a problem when they share bits with rare terms).
12 Comparison of Indexing Methods
- An inverted list is never longer than the corresponding bitslice, so inverted files never need more disk accesses. The exception is when the vocabulary does not fit in main memory. Bitslices can win in special cases where the query has many terms.
- Disk space: inverted files (IF) need 6-10% and bitsliced signature files (SF) 25-40% of the collection size.
- Memory requirements: SF need no space for the vocabulary, but all their remaining costs are comparable to those of IF.
- Index construction: more time-consuming for SF (hash functions etc.).
- Parallelism: each query can be handled by a different processor.
13 Comparison of Indexing Methods
- Updates: necessary in dynamic databases. They are easiest for bitstring SF. The cost drops with batching, and then IF are better.
- Scalability: query evaluation for SF is linear in the size of the database, whereas for IF it is sublinear.
- Ranking: IF handle ranking queries better.
- Extensibility: IF are easily modified to answer proximity, NOT and ranked queries.
14 Conclusions
- For typical indexing applications IF are superior. SF need a lot of space and a long time to build. IF are more efficient for ranked queries and proximity queries.
15 Full Text Indexing
- Without an index structure (naive algorithm, algorithm with basic preprocessing, Knuth-Morris-Pratt, Boyer-Moore, Aho-Corasick automaton)
- With an index structure (suffix tree, suffix array, DAWG, string B-tree)
16 Basic Definitions
- String: x = x1 x2 … xn, xi ∈ Σ, |x| = n
- Example: x = acgttaaaca, |x| = 10, Σ = {a, c, g, t}
- Empty string: ε
- Substring w: x = uwv
- Prefix w: x = wu
- Suffix w: x = uw
- Every string S of length |S| = m has m possible non-empty suffixes: S[1…m], S[2…m], …, S[m−1…m] and S[m].
- Example "sequence": sequence, equence, quence, uence, ence, nce, ce, e.
17 The Suffix Tree
- Definition: it stores all possible suffixes of a string S (assumption: the last letter of the string does not occur anywhere else in it).
- It differs from the Pat tree, which stores only words and applies a different compression rule (it keeps the first letter on each edge).
- Example: x = xabxac
18 Trie
Definition: let U = Σ^l be a universe over an alphabet Σ, with l > 0. A trie is the k-ary tree (k = |Σ|) that contains all prefixes of the elements of a set S ⊆ U. Each level of the tree corresponds to one position d_i (d_1 = root). Every element x = x1 x2 … xl is placed in the subtree reached by following x1 → x2 → … .
19 Trie: example
S = {102, 120, 121, 210, 211, 212}, Σ = {0, 1, 2}
Time for ins/del/search: O(l). Space: O(nlk).
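A minimal sketch of this trie using nested dictionaries, one level per symbol. The `"$"` end-of-key marker is an assumption of the example (it must not belong to the alphabet), and the sketch is uncompressed, so it illustrates the O(l) search rather than the space bounds.

```python
END = "$"   # end-of-key marker, assumed not to be in the alphabet

def trie_insert(root, key):
    """Walk or create one child per symbol, then mark the end of the key."""
    node = root
    for ch in key:
        node = node.setdefault(ch, {})
    node[END] = True

def trie_search(root, key):
    """Follow one edge per symbol: O(l) steps for a key of length l."""
    node = root
    for ch in key:
        if ch not in node:
            return False
        node = node[ch]
    return END in node

S = ["102", "120", "121", "210", "211", "212"]
root = {}
for key in S:
    trie_insert(root, key)

print(trie_search(root, "121"), trie_search(root, "122"))
```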
20 Compressed Trie
Space: O(nlk) → O(nk) → O(n)
21 Compressed Trie: example
22 Suffix Tree
Definition: the suffix tree of a string S[1…n] is a compact trie whose keys are all the suffixes S[i…n], 1 ≤ i ≤ n.
23 Naïve Construction
Example: S = bcabc, with the suffixes inserted one at a time (1: bcabc, 2: cabc, …).
Time: O(n²), e.g. for S = aaaaa...
24 Suffix Links Speed Up
[Figure: suffix links between the paths of steps i−1 and i; path labels x, u, z and ux.]
- In tree T_i only head_i may lack a valid suffix link.
- In the i-th step the algorithm visits the contracted locus of head_i in T_{i−1}.
25 Suffix Links Speed Up
S = bbbbbababbbaabbbbb
Step i−1: S[13..19] = abbbbb has been inserted, with x = a, u = bbb, z = bb, u1 = b, u2 = bb.
Step i: insert S[14..19] = bbbbb.
[Figure: the tree before and after the step, showing nodes c and d and the positions of head13 and head14.]
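26 Exact Pattern Matching
[Figure: the pattern p1 p2 p3 … pm is matched along a path from the root; the leaves below the end of the path give the occurrences i1, i2, i3, ….]
Time: O(n + m + α)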
27 Repeated Sub-Sequences: Regularities
[Figure: a substring u repeated at positions i1, i2, i3; umax is the longest such substring.]
Longest Repeated Substring
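The slide finds the longest repeated substring with a suffix tree. As a simpler stand-in, the sketch below uses a different technique: it sorts the suffixes and takes the longest common prefix of adjacent pairs, which yields the same answer but is quadratic in the worst case. The function name is illustrative.

```python
def longest_repeated_substring(s):
    """Longest substring occurring at least twice (occurrences may overlap)."""
    suffixes = sorted(s[i:] for i in range(len(s)))
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        if k > len(best):
            best = a[:k]
    return best

print(longest_repeated_substring("mississippi"))
print(longest_repeated_substring("xabxac"))
```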
28 Longest Common Substring of Two (or More) Strings
[Figure: umax occurs at position i of text1 and position j of text2.]
Generalized Suffix Tree
29 Maximal Pairs
[Figure: two occurrences of x at positions i and j, separated by a gap.]
Gusfield: O(n + α)
Brodal: O(n log n + α) for t1 ≤ gap ≤ t2; O(n + α) for t1 ≤ gap
30 Maximal Pairs in Multiple Strings
[Figure: two occurrences of x separated by a gap.]
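31 Nearest Common Ancestor: Suffix Tree
nca(x, y) = u in O(1) time
[Figure: tree with root r, internal nodes w and u, leaves x and y, and edge labels a and b.]
nca(001, 101): leftmost1(XOR(001, 101)), XOR = 100 → 1xx
nca(001, 111): leftmost1(XOR(001, 111)), XOR = 110 → 1xx
nca(011, 010): leftmost1(XOR(011, 010)), XOR = 001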
32 Exact Matching with Wild Cards Using Longest Common Extension
[Figure: the text acgtttaacctttgagtttgggcv aligned with a pattern; of the pattern only the fragment "at" survives.]
33 Suffix Arrays
- Let
- - Σ be an alphabet
- - A = a0 a1 … a(N−1) be a text of size N, ai ∈ Σ, 0 ≤ i < N.
- W = w0 w1 … w(P−1) be a string.
- The task is to find all occurrences of W in A.
34 First Approach: Suffix Trees
- Suffix tree for A
- Exactly N leaves, numbered 0 to N−1
- Every internal node other than the root has at least two children
- Every edge is labelled with a non-empty substring of A
- At every node, the labels of the outgoing edges start with different characters
- Concatenating the edge labels on the path from the root to leaf k gives the suffix of A at position k.
- Construction: O(N log |Σ|) time, O(N) space
- A linear-time solution exists (Farach's algorithm)
- Query answering: O(P log |Σ|) time
35 Second Approach: Suffix Arrays
- A sorted list of all suffixes of A.
- Construction: O(N log N) time (in the average case, however, the time is O(N))
- Space: 2N integers
- Query answering: P + ⌈log2(N−1)⌉ symbol comparisons
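To make the definition concrete, here is a deliberately naive construction that sorts the suffix start positions by comparing whole suffixes. It does not achieve the O(N log N) bound quoted above (each comparison may cost O(N)); it is only meant to show what the array contains. The name `build_suffix_array` is an assumption.

```python
def build_suffix_array(a):
    """Start positions of the suffixes of `a`, in lexicographic order."""
    return sorted(range(len(a)), key=lambda i: a[i:])

A = "mississippi"
Pos = build_suffix_array(A)
for k, i in enumerate(Pos):
    print(k, i, A[i:])
```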
36 Suffix Trees vs. Suffix Arrays
- Query: is W a substring of A?
- Suffix Tree
- O(P log |Σ|) with O(N) space, or
- O(P) with O(N|Σ|) space (impractical)
- Suffix Array
- Competitive/better: O(P + log N) search
- Main advantage: space, 2N integers
- (In practice, the problem is the space overhead of the query data structure)
- Another advantage: independent of |Σ|
37 Definitions
- A_i: the suffix of A that starts at position i, i.e. A_i = a_i a_(i+1) … a_(N−1)
- If u is a string, then u^p is the prefix consisting of the first p symbols of u.
- For strings u and v, the relation <_p is defined as follows:
- u <_p v ⇔ u^p < v^p
- The relations ≤_p, >_p, ≥_p, =_p and ≠_p are defined similarly.
38 Searching
Pos: an array of N positions holding the suffixes of A in lexicographic order: A_Pos[0] < A_Pos[1] < … < A_Pos[N−1].
Let L_W = min{k : W ≤_P A_Pos[k] or k = N} and R_W = max{k : W ≥_P A_Pos[k] or k = −1}.
If L_W ≤ R_W, then for every k ∈ [L_W, R_W] and i = Pos[k] we have W = a_i a_(i+1) … a_(i+P−1), and conversely. So, if L_W ≤ R_W, there are R_W − L_W + 1 occurrences of W in A.
Pos is sorted ⇒ binary search for L_W and R_W ⇒ O(P log N) time.
39 Pseudocode 1 (search for L_W)
if W ≤_P A_Pos[0] then L_W ← 0
else if W >_P A_Pos[N−1] then L_W ← N
else
    (L, R) ← (0, N−1)
    while R − L > 1 do
        M ← (L + R) / 2
        if W ≤_P A_Pos[M] then R ← M
        else L ← M
    L_W ← R
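A runnable sketch of this binary search, together with the symmetric search for R_W, is given below. The suffix array is built naively for brevity, and the comparison ≤_P is implemented by slicing the first P symbols of a suffix; both are simplifications assumed for the example.

```python
def find_lw(a, pos, w):
    """Smallest k with W <=_P A_Pos[k], or N if there is none."""
    n, p = len(pos), len(w)
    pref = lambda k: a[pos[k]:pos[k] + p]   # first P symbols of suffix k
    if w <= pref(0):
        return 0
    if w > pref(n - 1):
        return n
    L, R = 0, n - 1
    while R - L > 1:
        M = (L + R) // 2
        if w <= pref(M):
            R = M
        else:
            L = M
    return R

def find_rw(a, pos, w):
    """Largest k with W >=_P A_Pos[k], or -1 if there is none."""
    n, p = len(pos), len(w)
    pref = lambda k: a[pos[k]:pos[k] + p]
    if w >= pref(n - 1):
        return n - 1
    if w < pref(0):
        return -1
    L, R = 0, n - 1
    while R - L > 1:
        M = (L + R) // 2
        if w >= pref(M):
            L = M
        else:
            R = M
    return L

def occurrences(a, pos, w):
    """Start positions of W in A: the entries Pos[L_W..R_W]."""
    lw, rw = find_lw(a, pos, w), find_rw(a, pos, w)
    return sorted(pos[lw:rw + 1]) if lw <= rw else []

A = "mississippi"
Pos = sorted(range(len(A)), key=lambda i: A[i:])
print(occurrences(A, Pos, "ssi"), occurrences(A, Pos, "issa"))
```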
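40 Example
Let A = mississippi and W = issa.
Sorted suffixes (in Pos order): i, ippi, issippi, ississippi, mississippi, pi, ppi, sippi, sissippi, ssippi, ssissippi.
[Figure: the markers L, M and R point at the first, the middle and the last suffix of the current search interval.]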
41 How to Reduce the Time
lcp(v, w): the length of the longest common prefix of v and w.
l = lcp(A_Pos[L], W), r = lcp(W, A_Pos[R]). Initially, l = lcp(A_Pos[0], W) and r = lcp(W, A_Pos[N−1]).
Every comparison of W with A_Pos[M] updates l or r.
A_Pos[L] =_l W and W =_r A_Pos[R] ⇒ A_Pos[k] =_h W for all k ∈ [L, R], where h = min(l, r) ⇒ the number of symbol comparisons needed drops by h.
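The idea of skipping the first h = min(l, r) symbols can be sketched as follows. This is only the l/r bookkeeping described above, without any precomputed lcp tables; the helper names are assumptions.

```python
def lcp_at(a, start, w, skip=0):
    """lcp of W and the suffix a[start:], given the first `skip` symbols match."""
    k = skip
    while k < len(w) and start + k < len(a) and a[start + k] == w[k]:
        k += 1
    return k

def w_leq_suffix(a, start, w, m):
    """Decide W <=_P a[start:] when m = lcp(W, a[start:])."""
    if m == len(w):
        return True                  # W is a prefix of the suffix
    if start + m == len(a):
        return False                 # the suffix ended first, so it sorts before W
    return w[m] < a[start + m]

def find_lw(a, pos, w):
    """Search for L_W, comparing symbols only from offset min(l, r) onwards."""
    n = len(pos)
    l = lcp_at(a, pos[0], w)
    r = lcp_at(a, pos[n - 1], w)
    if w_leq_suffix(a, pos[0], w, l):
        return 0
    if not w_leq_suffix(a, pos[n - 1], w, r):
        return n
    L, R = 0, n - 1
    while R - L > 1:
        M = (L + R) // 2
        m = lcp_at(a, pos[M], w, skip=min(l, r))
        if w_leq_suffix(a, pos[M], w, m):
            R, r = M, m
        else:
            L, l = M, m
    return R

A = "mississippi"
Pos = sorted(range(len(A)), key=lambda i: A[i:])
print(find_lw(A, Pos, "ssi"))
```

Starting each comparison at offset min(l, r) is safe because every suffix between Pos[L] and Pos[R] shares that many leading symbols with W.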
42 How to Reduce the Time Further
- Precomputed information about the lcp of A_Pos[M] with A_Pos[L] and A_Pos[R].
- There are N−2 triples (L, M, R) with midpoint M ∈ [1, N−2], 0 ≤ L < M < R ≤ N−1.
- (L_M, M, R_M) is the unique triple with M as its midpoint.
- Llcp and Rlcp are arrays of size N−2.
- Llcp[M] = lcp(A_Pos[L_M], A_Pos[M])
- Rlcp[M] = lcp(A_Pos[M], A_Pos[R_M]).
43 Reducing Comparisons I
- Consider one iteration of the binary search with triple (L, M, R). Let H = max(l, r), and let ΔH be the difference between the values of H at the start and at the end of the iteration.
- Assume r ≤ l = H (where l = lcp(A_Pos[L], W)).
- Case 1: Llcp[M] > l, i.e. lcp(A_Pos[L_M], A_Pos[M]) > l.
- Then A_Pos[M] =_(l+1) A_Pos[L] ≠_(l+1) W.
- W is in the right half: L ← M, and l stays as it is.
- Case 2: Llcp[M] < l, i.e. lcp(A_Pos[L_M], A_Pos[M]) < l.
- Then W =_l A_Pos[L] <_l A_Pos[M].
- W is in the left half, and the new value of r is Llcp[M].
44 Reducing Comparisons II
- Case 3: Llcp[M] = l, i.e. lcp(A_Pos[L_M], A_Pos[M]) = l.
- Then A_Pos[M] =_l W.
- We compare symbol l+1, then l+2 and so on, up to l+j, the first one for which W ≠_(l+j) A_Pos[M].
- Symbol l+j determines whether W is in the left or the right half; the new value of r or l is l+j−1.
- At the start of the iteration l = H, so the iteration costs ΔH + 1 symbol comparisons.
- ΔH + 1 comparisons per iteration, with ΣΔH ≤ P.
- Total number of symbol comparisons:
- at most P + ⌈log2(N−1)⌉
- O(P + log N) time in the worst case.
46 Pseudocode 2
l ← lcp(A_Pos[0], W), r ← lcp(A_Pos[N−1], W)
if l = P or w_l ≤ a_(Pos[0]+l) then L_W ← 0
else if r < P and w_r > a_(Pos[N−1]+r) then L_W ← N
else
    (L, R) ← (0, N−1)
    while R − L > 1 do
        M ← (L + R) / 2
        if l ≥ r then
            if Llcp[M] ≥ l then m ← l + lcp(A_(Pos[M]+l), W_l)
            else m ← Llcp[M]
        else
            if Rlcp[M] ≥ r then m ← r + lcp(A_(Pos[M]+r), W_r)
            else m ← Rlcp[M]
        if m = P or w_m ≤ a_(Pos[M]+m) then (R, r) ← (M, m)
        else (L, l) ← (M, m)
    L_W ← R
47 DAWG (Directed Acyclic Word Graphs) and cDAWGs
50 Secondary Memory
- Large collections of electronic texts that are heterogeneous and come from different sources
- Electronic databases of the order of Gigabytes / Terabytes
- Use of secondary-memory devices (hard disks, magnetic tapes, DVD)
- Data indexing structures and search engines are basic tools for storing, updating and extracting useful information
51 Data Structures
- Supra-suffix array (a hybrid structure)
- Compact pat-trees (binary representation; the structure is moved to secondary memory)
- Prefix B-tree
- String B-tree
52 String B-Tree
- A combination of B-Trees and Patricia Tries for the pointers of internal nodes
- Extra pointers have been added to speed up searching and update operations
- String B-Trees have the same worst-case performance as regular B-Trees, but they can store strings of unbounded length and perform search operations comparable to those of suffix trees.
53 String B-Tree: Notation
- A string of s characters is written X[1, s], and |X| denotes its length (s)
- X[1, i] denotes a prefix, X[j, s] a suffix and X[i, j] a substring of X, where 1 ≤ i ≤ j ≤ s
- A pattern string P occurs in X when there is a substring X[i, i + |P| − 1] of X that is equal to P
54 Problem 1: Prefix Search and Range Query
- Let Δ = {δ1, .., δk} be a set of strings of a text, with total length N
- We store Δ and keep it sorted in external memory
- Prefix Search(P): retrieves all strings of Δ that have the pattern P as a prefix
- Range Query(K', K''): retrieves all strings of Δ that lie between K' and K'' in lexicographic order
55 Problem 2: Substring Search
- The query Substring Search(P) finds all occurrences of P in the strings of Δ
- occ denotes the number of occurrences
- It extends Problem 1: the indexed set now consists of all suffixes of the strings of Δ.
56 Memory Model
- We analyse Problems 1 and 2 in the classical two-level memory model [Cormen et al. 1990]
- We assume a fast, small main memory (RAM) and a slower but very large external memory (hard disks, DVDs). External memory is divided into transfer blocks called disk pages
57 Memory Model
- Every disk page holds B atomic items, which may be integers, characters or pointers
- B is called the disk page size; writing or reading a page is called a disk access
58 Complexity: Problem 1
- Prefix Search(P) needs O((p + occ)/B + log_B k) disk accesses in the worst case, where p = |P|
- Range Query(K', K'') needs O((k' + k'' + occ)/B + log_B k) disk accesses in the worst case, where k' = |K'| and k'' = |K''|
- Inserting or deleting a string of length m in the set Δ needs O(m/B + log_B k) disk accesses in the worst case
- The structure uses Θ(k/B) disk pages, while the set Δ itself occupies Θ(N/B) pages
59 Complexity: Problem 2
- Substring Search(P) needs O((p + occ)/B + log_B N) disk accesses in the worst case, where p = |P|
- Inserting or deleting a string of length m in the set Δ needs O(m log_B (N + m)) disk accesses in the worst case
- The space used by the String B-Tree and by the set Δ is Θ(N/B) disk pages
60 Storing Strings
61 Storing Strings
- With this structure we can locate the disk page containing the i-th character of a string by performing a constant number of simple arithmetic operations on its pointer
- We can pack Θ(B) logical pointers into a single disk page, but reading only that page does not let us retrieve all the characters of the strings
62 Storing Strings
- We can compare any two strings character by character, but this operation is very inefficient when repeated, because its worst-case cost is proportional to the length of the two strings
- We call this problem rescanning, because the same characters are examined many times
63 B-tree-Like Structure
- Strings are represented by logical pointers
- The input is the set of strings Δ, with N characters in total
- Let K = {K1, …, Kk} be the elements of Δ in ascending lexicographic order (≤_L)
- The strings of K are distributed among the leaves of the B-Tree, and only some of them appear in the internal nodes
64 B-tree-Like Structure
- The ordered set of strings associated with node π is denoted F_π ⊆ K; its leftmost and rightmost strings are denoted L(π) and R(π) respectively
- Every node π is stored in one disk page, and the number of its strings is constrained to b ≤ |F_π| ≤ 2b, where b = Θ(B) is an even integer chosen so that a node fits in a single disk page
- Only the root may hold fewer than b strings
65 B-tree-Like Structure
66 B-tree-Like Structure
- We split the set K into groups of b consecutive strings
- We map each group to a leaf, say π, and form its set F_π
- Every internal node π has n(π) children σ1, …, σn(π), and its ordered set is F_π = {L(σ1), R(σ1), …, L(σn(π)), R(σn(π))}
- Since n(π) = |F_π| / 2, every node except the root and the leaves has between b/2 and b children. The final height of the resulting B-Tree is therefore H = O(log_(b/2) k) = O(log_B k)
67 Prefix Search(P)
- The analysis of Prefix Search(P) relies on two observations by Manber and Myers
- The strings that have P as a prefix occupy contiguous positions in K
- The leftmost string that has P as a prefix is adjacent to P in K with respect to ascending lexicographic order
68 Prefix Search(P)
- We represent the position of P in the set K by the pair (τ, j), where τ is the leaf holding that position and j−1 is the number of strings of F_τ that are lexicographically smaller than P, with 1 ≤ j ≤ |F_τ| + 1
- We say that j is the position of P in F_τ
- The procedure that determines the position of P in a set F_π is called PT-Search(P, F_π)
69 Prefix Search(P): Algorithm
- We first check the two cases in which P is smaller or larger than all the strings of K
- If both checks fail, we start with π = root and scan the B-tree downwards
- When we visit a node π, we load the corresponding disk page and apply PT-Search to find the position j of P in the set F_π
- The two neighbouring strings are determined by the relation K_(j−1) <_L P ≤_L K_j
70 Prefix Search(P): Algorithm
- If node π is a leaf we stop the scan; otherwise there are two cases
- If the strings K_(j−1) and K_j belong to two distinct children of π, then the two strings are adjacent in K. Let σ be the child of π that contains K_j. We choose τ to be the leftmost leaf of the subtree rooted at σ, and we know that P lies in the first position of F_τ because L(τ) = L(σ) = K_j
- If K_(j−1) and K_j belong to the same child of π, we continue the scan at the next level
71 Prefix Search(P): Algorithm
- In every case, at the end of the scan we have a pair (τ_L, j_L) that represents the position of the leftmost string of K having P as a prefix
- Similarly we can find the pair (τ_R, j_R)
- To answer Prefix Search(P) we scan the sequence of leaves delimited by τ_L and τ_R and list the strings from the j_L-th string of F_τL to the (j_R − 1)-th string of F_τR
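The two observations behind Prefix Search can be illustrated in main memory with a sorted list standing in for K. The sketch ignores disk pages and the B-tree entirely; `bisect_left` plays the role of locating the position of P, and the answers are then read off from contiguous positions. The function name and the sample strings are assumptions.

```python
from bisect import bisect_left

def prefix_search(K, P):
    """All strings of the sorted list K that have P as a prefix."""
    lo = bisect_left(K, P)          # position of P in K: first string >= P
    hi = lo
    while hi < len(K) and K[hi].startswith(P):
        hi += 1                     # the matches are contiguous from lo
    return K[lo:hi]

K = sorted(["aid", "atlas", "atom", "attenuate", "car", "patent", "zoo"])
print(prefix_search(K, "at"))
```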
72 Simplified String B-Tree
- It combines the B-Tree-like structure seen so far with the Patricia Trie
- The goal is to organize the strings so that a search within a node compares only one string of F_π in the worst case, instead of the log2 |F_π| strings needed by plain binary search
73 Simplified String B-Tree
74 Simplified String B-Tree
- The Patricia Trie (PT) can be defined in two steps
- (1) Build a compact trie from the strings of F_π
- (2) Label every node of the trie with the length of the substring stored at it, and replace every string on the branches with its first character
75 Simplified String B-Tree
- The Patricia Trie loses some information compared with the compact trie, because all characters except the first are removed from the branches of the tree
- It fits Θ(B) strings into one B-Tree node, regardless of their length
- It allows lexicographic searches inside a node without additional disk accesses
76 PT-Search Example
77 PT-Search Example
- Consider an example of PT-Search(P, F_π), where
- P = bcbabcba
- The left part of the figure shows the first phase, in which l denotes the rightmost leaf reached
- We start by determining the common prefix of the string of l and P (here bcb), and then find the lowest ancestor of l whose label is greater than |bcb| = 3
- We then use the mismatching character P[4] = a to find the exact position of P (j = 4) by traversing the marked arcs
78 PT-Search Example
- By combining Patricia Tries with the B-Tree we avoid binary search in the nodes we traverse and thus reduce the overall complexity
- This bound is still not satisfactory. The reason is that at every node we visit we rescan P from the beginning
79 PT-Search Example
- PT-Search must be designed to exploit the properties of the String B-Tree and of the Patricia Trie more effectively
- We use three input parameters (P, F_π, l), where l satisfies the property that some string in F_π has its first l characters equal to those of P
- PT-Search(P, F_π, l) returns a pair (j, lcp), where j is the position of P in F_π and lcp is the length of the common prefix
80 Complexity of Prefix Search
- We retrieve the strings of K that have P as a prefix by examining the leaves of the String B-Tree delimited by τ_L and τ_R, in O(occ/B) disk accesses
- The total cost of Prefix Search(P) is O((p + occ)/B + log_B k) disk accesses
81 Problem 2: Substring Search
- Problem 2 concerns a more powerful operation, Substring Search(P), which looks for occurrences of P in the strings of Δ
- The search is based on finding substrings of length p that are equal to P
- We define the set of suffixes as SUF(Δ) = {δ[i, |δ|] : 1 ≤ i ≤ |δ|, δ ∈ Δ}, which contains N lexicographically ordered suffixes
82 Problem 2: Substring Search
- Our problem is to retrieve all strings of SUF(Δ) that have P as a prefix
- The operation Substring Search(P) on the set Δ is transformed into Prefix Search(P) on the set SUF(Δ)
83 Substring Search Example
84 Substring Search Example
- P = "at", Δ = {aid, atlas, atom, attenuate, car, patent, zoo}, b = 4 and SUF(Δ) = {aid, ar, as, . . . , uate, zoo}
- In this example we have occ = 4 occurrences, in atlas, atom, attenuate and patent
- The suffixes with prefix P that correspond to these occurrences have their logical pointers stored in neighbouring leaves of the tree
- We can set K = SUF(Δ), with size k = N, and run the operation Prefix Search(P)
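The reduction of Substring Search to Prefix Search over SUF(Δ) can be sketched in main memory as below. A sorted Python list stands in for the String B-Tree, each suffix is stored together with its source string and 1-based start position, and the names `build_suf` and `substring_search` are assumptions. Note that the sketch reports one hit per matching suffix, so attenuate contributes two positions while the set of matching strings has four elements.

```python
from bisect import bisect_left

def build_suf(strings):
    """SUF(Δ): every suffix with its source string and 1-based start, sorted."""
    return sorted((d[i:], d, i + 1) for d in strings for i in range(len(d)))

def substring_search(suf, P):
    """Prefix search for P over the sorted suffixes."""
    keys = [s for s, _, _ in suf]
    k = bisect_left(keys, P)        # position of P among the suffixes
    hits = []
    while k < len(suf) and suf[k][0].startswith(P):
        hits.append((suf[k][1], suf[k][2]))
        k += 1
    return hits

delta_set = ["aid", "atlas", "atom", "attenuate", "car", "patent", "zoo"]
suf = build_suf(delta_set)
hits = substring_search(suf, "at")
print(hits)
print(sorted({d for d, _ in hits}))
```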
85 String B-Tree
- Inserting a new string Y into the set Δ, where m = |Y|, requires inserting all m of its suffixes into the set SUF(Δ) in lexicographic order
- The problem is that the m inserted suffixes are treated as completely separate strings, so rescanning occurs
- We extend the simplified String B-Tree with two types of auxiliary pointers that let us avoid rescanning during updates
86 String B-Tree
- The first type is the familiar parent pointer, defined for every node
- The second is the succ pointer, defined for every string in SUF(Δ) as follows: the succ pointer of δ[i, |δ|] ∈ SUF(Δ) leads to the leaf of the String B-Tree that contains δ[i+1, |δ|]. If i = |δ|, succ points to its own leaf (self-loop pointer)
87 Insertion Algorithm
- After the insertion of Y[i, m], the following two conditions must hold
- The suffixes Y[j, m] are stored in the String B-Tree for 1 ≤ j ≤ i, and Y[i, m] shares its first h_i characters with one of its neighbouring strings
- All succ pointers are set correctly for every string in the String B-Tree except Y[i, m]. That is, succ(Y[i, m]) is the only dangling pointer, unless i = m, in which case it points to its own leaf
88 Insertion Algorithm
- We insert the suffixes of Y into the String B-Tree that stores SUF(Δ), starting from the longest and proceeding to the shortest
- The insertion proceeds for i = 1, 2, …, m
- For i = 1, we find the position of Y[1, m] by scanning the tree as in Prefix Search(Y[1, m])
- For the remaining suffixes of Y (i > 1) we use a different approach in order to avoid rescanning
89 Insertion Algorithm
- To find the position of Y[i, m], instead of starting from the root we scan the tree from the last leaf visited
- We know that Y[i−1, m] shares its first h_(i−1) characters with one of its neighbouring strings
- We can follow the succ pointer of that neighbouring string and reach a leaf with the following property: it contains a string that shares its first max{0, h_(i−1) − 1} characters with Y[i, m]
90 Insertion Algorithm
- We continue the insertion with an upward and then a downward scan of the String B-Tree until we reach the leaf π_i that contains the position of Y[i, m]
- Since it can be shown that h_i ≥ max{0, h_(i−1) − 1}, the algorithm avoids rescanning by examining only the characters of Y at positions i + max{0, h_(i−1) − 1}, …, i + h_i
91 Insertion Algorithm
- Handling the parent and succ pointers directly would need O(B log_B (N + m)) disk accesses per inserted suffix in the worst case
- In total, Σ_(i=1..m) d_i = O(m log_B (N + m)) disk accesses are needed to insert Y into Δ
- We achieve the same worst-case performance as inserting m integers into a regular B-tree