Slide 1 - PowerPoint PPT Presentation

About This Presentation
Title:

Slide 1

Description:

– PowerPoint PPT presentation

Slides: 92
Provided by: theo131
Category:
Tags: locus | root


Transcript and Presenter's Notes

Title: Slide 1


1
Information Retrieval
Indexing Structures
2
Indexing Framework
  • 1. Types of sequences that can be
    indexed:
  • - structures that index natural-language
    sequences/texts (linguistic texts)
  • - full-text indexing structures
  • The first category includes inverted files,
    signature files and bitmaps
  • The second category includes suffix trees,
    suffix arrays and directed acyclic word graphs
    (DAWG and cDAWG).
  • 2. Computational model: the structures above are
    designed for RAM; for secondary memory the
    String B-tree was designed.

3
Linguistic Text Indexing
  • Use: in large collections of texts. The information
    is organized by content (in which texts each
    word appears). Answering
    queries.
  • Methods
  • Inverted Files
  • Signature Files
  • Bitmaps

4
Queries
  • Boolean Queries
  • Disjunctive (t1 ∨ t2 ∨ ... ∨ tq)
  • Conjunctive (t1 ∧ t2 ∧ ... ∧ tq)
  • A combination of the two.
  • Ranked Queries
  • A similarity score is computed.
  • Proximity Queries
  • They take into account the distance between the words.
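To make the two Boolean query types concrete, here is a minimal Python sketch over an uncompressed, in-memory inverted index. The function names, the whitespace tokenization and the tiny document collection are illustrative assumptions, not part of the slides. Ranked and proximity queries would also need term frequencies and word positions, which this sketch does not store.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def conjunctive(index, terms):
    """t1 AND t2 AND ... AND tq: documents that contain every term."""
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for t in terms[1:]:
        result &= set(index.get(t, []))
    return sorted(result)

def disjunctive(index, terms):
    """t1 OR t2 OR ... OR tq: documents that contain at least one term."""
    result = set()
    for t in terms:
        result.update(index.get(t, []))
    return sorted(result)

docs = ["the suffix tree", "the suffix array", "inverted files and signature files"]
index = build_inverted_index(docs)
print(conjunctive(index, ["the", "suffix"]))  # [0, 1]
print(disjunctive(index, ["tree", "files"]))  # [0, 2]
```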

5
Inverted Files
They consist of a lexicon and, for each word, an inverted
list of pointers to the texts that contain it.
Compression methods: the pointers are stored as d-gaps, and
these are compressed with global or local methods. The
global methods are either parameterized or
non-parameterized.
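A d-gap is the difference between consecutive document numbers in an inverted list. The gaps are small numbers, and those are what the codes on the next slides compress well. The following is a minimal sketch, assuming document numbers start at 1 so that the first gap is the first document number itself. The helper names are hypothetical.

```python
def to_dgaps(postings):
    """Ascending document numbers -> d-gaps (the first gap is the first document number)."""
    gaps, prev = [], 0
    for doc in postings:
        gaps.append(doc - prev)
        prev = doc
    return gaps

def from_dgaps(gaps):
    """d-gaps -> document numbers, by a running sum."""
    postings, total = [], 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings

print(to_dgaps([3, 5, 20, 21, 23, 76]))  # [3, 2, 15, 1, 2, 53]
```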
6
Methods for Compressing Inverted Files
  • Binary: ⌈log N⌉ bits for each pointer.
  • Unary: a number x is written as x − 1 ones
    followed by a single 0. Generally an inefficient method.
  • γ: the unary code of 1 + ⌊log x⌋ followed by the
    binary code of x − 2^⌊log x⌋ (⌊log x⌋ bits). It needs
    about 1 + 2 log x bits.
  • δ: the γ code of 1 + ⌊log x⌋ followed by the
    binary code of x − 2^⌊log x⌋ (⌊log x⌋ bits). For large
    numbers it outperforms γ.
  • The techniques above are non-parameterized. The
    parameterized ones model the gaps as a sequence of
    Bernoulli trials, with prob(x) = (1 − p)^(x−1) p,
    and then apply Huffman or arithmetic coding.
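The γ and δ codes described above can be sketched in Python as follows. For readability, bit strings are returned as Python `str` values, whereas a real index would pack them into machine words. The function names are illustrative.

```python
def unary(n):
    """Unary code of n >= 1: n - 1 ones followed by a single zero."""
    return "1" * (n - 1) + "0"

def low_bits(value, width):
    """value in exactly `width` binary digits (the empty string when width == 0)."""
    return format(value, "b").zfill(width) if width else ""

def gamma(x):
    """Elias gamma: unary(1 + floor(log2 x)), then x - 2^floor(log2 x) in floor(log2 x) bits."""
    k = x.bit_length() - 1            # floor(log2 x) for x >= 1
    return unary(1 + k) + low_bits(x - (1 << k), k)

def delta(x):
    """Elias delta: gamma(1 + floor(log2 x)), then the same floor(log2 x) low bits."""
    k = x.bit_length() - 1
    return gamma(1 + k) + low_bits(x - (1 << k), k)

print(gamma(5), delta(5))  # 11001 10101
```

Take x = 5 as an example. Here ⌊log x⌋ = 2, so γ writes unary(3) = 110 followed by the two low-order bits 01, giving 11001. That is 5 bits, which equals 1 + 2⌊log x⌋.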

7
Methods for Compressing Inverted Files
  • Global Bernoulli: uses arithmetic coding or
    the Golomb method.
  • For a parameter b, every number x > 0 is coded
    in two parts: q + 1 in unary, where q = ⌊(x − 1)/b⌋, and the
    remainder r = x − qb − 1 in binary (about log b bits).
  • It has been shown that, for a suitable choice of b,
    the technique is equivalent to the optimal Huffman
    code for an infinite set of probabilities.
  • A single parameter b is used for all lists, with
    b ≈ 0.69 · N · n / f (N: number of documents, n: number of
    distinct terms, f: total number of pointers).
  • Local Bernoulli: a different b ≈ 0.69 · N / f_t for
    each list (f_t: number of documents containing term t).
    Better compression than global.
  • Interpolative: exploits clustering. It compresses the
    pointers themselves, not d-gaps. The most efficient
    method, but also the most complex. It compresses
    recursively, level by level.
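Below is a sketch of the Golomb code with parameter b, as described above. It uses truncated binary for the remainder, so the remainder takes either ⌊log b⌋ or ⌈log b⌉ bits. The truncated-binary detail and the rounding in `local_b` are assumptions about how "about log b bits" is realized. The function names are hypothetical.

```python
def golomb(x, b):
    """Golomb code of x >= 1 with parameter b >= 1, in the form given on the slide:
    q + 1 in unary (q ones, then a zero) with q = (x - 1) // b, then the
    remainder r = x - q*b - 1 in truncated binary (floor or ceil of log2 b bits)."""
    q, r = (x - 1) // b, (x - 1) % b
    code = "1" * q + "0"
    if b == 1:
        return code                     # the remainder is always 0
    k = (b - 1).bit_length()            # ceil(log2 b)
    cutoff = (1 << k) - b               # the first `cutoff` remainders use k - 1 bits
    if r < cutoff:
        return code + format(r, "b").zfill(k - 1)
    return code + format(r + cutoff, "b").zfill(k)

def local_b(num_docs, doc_freq):
    """Local Bernoulli parameter b ~ 0.69 * N / f_t, rounded and at least 1."""
    return max(1, round(0.69 * num_docs / doc_freq))

print([golomb(x, 3) for x in range(1, 7)])  # ['00', '010', '011', '100', '1010', '1011']
```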

8
Methods for Compressing Inverted Files
  • General conclusion: local methods outperform global ones. The best
    performance comes from local Bernoulli with Golomb coding, while γ and δ
    are simple and perform relatively well. They need no
    parameters and are easy to compute.

9
Signature Files
  • Bitstring Signature Files: a hash string is computed for each
    word. Each text has a signature, which is the OR of
    the hash strings of its words. For queries, query
    hash strings are built. False matches are
    possible. Efficient for more specific queries.
  • Bitsliced Signature Files: the signature table is
    transposed into a set of bitslices, so fewer
    bits are read. The more bitslices a query examines,
    the fewer false matches remain.
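Here is a toy Python sketch of bitstring and bitsliced signature files. The signature width, the number of bits set per word and the use of SHA-256 as the word hash are all arbitrary assumptions, chosen only to make the example runnable. The returned candidates may include false matches, which must be checked against the texts.

```python
import hashlib

WIDTH = 16          # signature width in bits (arbitrary choice for the example)
BITS_PER_WORD = 3   # bits set in each word's hash string (arbitrary choice)

def word_hash(word):
    """Hash string of a word: up to BITS_PER_WORD bits set in a WIDTH-bit integer."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_WORD):
        sig |= 1 << (digest[i] % WIDTH)
    return sig

def query_signature(words):
    sig = 0
    for w in words:
        sig |= word_hash(w)
    return sig

def signature(text):
    """Document signature: bitwise OR of the hash strings of its words."""
    return query_signature(text.lower().split())

def candidates(doc_sigs, query_words):
    """Bitstring SF: every document whose signature contains all query bits."""
    q = query_signature(query_words)
    return [d for d, s in enumerate(doc_sigs) if (s & q) == q]

def bitslices(doc_sigs):
    """Transpose the signature table: slice j holds bit j of every document."""
    slices = [0] * WIDTH
    for d, s in enumerate(doc_sigs):
        for j in range(WIDTH):
            if (s >> j) & 1:
                slices[j] |= 1 << d
    return slices

def candidates_bitsliced(slices, num_docs, query_words):
    """Bitsliced SF: AND together only the slices of the bits set in the query."""
    q = query_signature(query_words)
    acc = (1 << num_docs) - 1
    for j in range(WIDTH):
        if (q >> j) & 1:
            acc &= slices[j]
    return [d for d in range(num_docs) if (acc >> d) & 1]

docs = ["inverted files", "signature files", "suffix arrays"]
sigs = [signature(t) for t in docs]
print(candidates(sigs, ["files"]))  # contains 0 and 1, possibly also a false match
```

Both functions return the same candidate set. The bitsliced version reads only the slices whose bit is set in the query signature, and that saving is the reason for the transposition.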

10
Signature Files
  • Blocked Signature Files: documents are grouped
    into blocks. Each bit of each slice corresponds to B
    documents (B: the blocking factor). A different
    mapping of documents to blocks is used
    in each slice. A large B leads to more
    checks for false matches, which cost disk
    accesses.
  • Bitmaps: each word sets 1 bit in each signature,
    whose length equals the number of words. A 1-1 hash
    function. Fast and simple, but wasteful of
    space. Better for common terms. Hybrid
    techniques exist.

11
Signature File Compression
  • Compression does not pay off as it does for inverted
    files. It is lossy and introduces noise, which leads to
    more false matches. Decompression takes
    time. A table of slice addresses is kept in main
    memory. Other issues:
  • Documents whose lengths vary widely (low
    performance).
  • Handling of compound terms (a problem when
    words are combined with rare terms).

12
Comparison of Indexing Methods
  • An inverted list is never longer than the
    corresponding bitslice, so inverted files never need more
    disk accesses. The exception is when the lexicon does not fit
    in main memory. In special cases, bitslices can
    be superior when the query has many terms.
  • Disk space: IF need 6-10% and
    bitsliced SF 25-40% of the collection size.
  • Memory requirements: SF need no space for
    the lexicon, but all their other requirements
    are comparable with those of IF.
  • Index construction is more involved for SF (hash
    functions, etc.).
  • Parallelism: each query is
    handled by a different processor.

13
Comparison of Indexing Methods
  • Updates: necessary in dynamic collections. They are easier
    for bitstring SF. Their cost is reduced by
    batching, and then IF are better.
  • Scalability: the query algorithm for SF is
    linear in the size of the collection, while for IF
    it is sublinear.
  • Ranking: IF handle ranking
    queries better.
  • Extensibility: IF are easily adapted to
    answer proximity, NOT and ranked queries.

14
Conclusions
  • For typical indexing applications, IF are
    superior. SF need more space and take more
    time to build. IF are more
    efficient for ranked queries and proximity
    queries.

15
  • Full Text Indexing
  • Without an index structure (naive algorithm,
    preprocessing-based algorithms: Knuth Morris
    Pratt, Boyer Moore, Aho Corasick Automaton)
  • With an index structure (suffix tree, suffix array,
    DAWG, string B-tree)
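Searching without an index structure can be illustrated with the Knuth-Morris-Pratt algorithm. The compact Python sketch below uses illustrative function names. It preprocesses only the pattern, so every query costs O(n + m) time over the whole text; the index structures above avoid that per-query scan.

```python
def failure_function(pattern):
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(text, pattern):
    """0-based start positions of every occurrence of a non-empty pattern."""
    fail, k, hits = failure_function(pattern), 0, []
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):          # full match ending at position i
            hits.append(i - k + 1)
            k = fail[k - 1]            # keep going, allowing overlapping matches
    return hits

print(kmp_search("acgttaaaca", "aa"))  # [5, 6]
```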

16
Basic Definitions
  • Sequence (string): x = x1x2...xn, xi ∈ Σ,
    x ∈ Σ^n
  • x = acgttaaaca, x ∈ Σ^10, Σ = {a, c, g, t}
  • Empty sequence: ε
  • Substring w: x = uwv
  • Prefix w: x = wu
  • Suffix w: x = uw
  • Every sequence S of length m has m possible
    non-empty suffixes, namely S[1..m],
    S[2..m], ..., S[m−1..m] and S[m].
  • Example "sequence": sequence, equence,
    quence, uence, ence, nce, ce, e.

17
The Suffix Tree
  • Definition: it stores all possible suffixes of a
    sequence S (assuming that the last letter
    of the string does not appear anywhere else
    in it)
  • It differs from the Pat tree, which stores only
    words and applies different compression methods
    (it keeps only the first letter on each edge).
  • Example: x = xabxac

18
Trie
Definition: Let U ⊆ Σ^l be a universe over an alphabet Σ, with
l > 0, and let S ⊆ U. The trie T(S) is the |Σ|-ary tree
that contains all the
prefixes of the elements of S. Each
level of the tree corresponds to a
digit d_i (d_1 at the root). Each element x = x1x2...xl
is placed in the subtree reached by following x1, x2, ... .
19
Trie - example
S = {102, 120, 121, 210, 211, 212}, Σ = {0, 1, 2}
Time for ins/del/search: O(l). Space: O(n·l·k), with n = |S| and k = |Σ|
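Below is a minimal trie sketch for the example set S = {102, 120, 121, 210, 211, 212}. Children are kept in a dictionary rather than a |Σ|-ary array; that is an implementation choice, not something taken from the slides. Insert and search each visit one node per digit, hence O(l) time.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # symbol -> child; only symbols that actually occur
        self.is_key = False  # marks the end of an element of S

def trie_insert(root, key):
    node = root
    for symbol in key:                       # one level per digit d_i
        node = node.children.setdefault(symbol, TrieNode())
    node.is_key = True

def trie_search(root, key):
    node = root
    for symbol in key:
        node = node.children.get(symbol)
        if node is None:
            return False
    return node.is_key

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())

root = TrieNode()
for key in ["102", "120", "121", "210", "211", "212"]:
    trie_insert(root, key)
print(trie_search(root, "121"), trie_search(root, "122"))  # True False
print(count_nodes(root))                                   # 12
```

The trie for S has 12 nodes: the root, 2 nodes at the first level, 3 at the second and 6 leaves.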
20
Compressed Trie
Space: O(n·l·k) → O(n·k) = O(n) (for a constant alphabet size k)
21
Compressed Trie - example
22
Suffix Tree
Definition: The suffix tree of a sequence
S[1..n] is a compact trie that contains
as keys all the suffixes S[i..n], 1 ≤ i ≤ n.
23
Naïve Construction
S = bcabc
[Figure: suffix 1 (bcabc) is inserted first, then suffix 2 (cabc)]
Time O(n²), e.g. for S = aaaaa...
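The naive construction can be sketched in Python as follows. Each suffix is inserted by walking down from the root and splitting an edge at the point where the suffix leaves it. A terminator `$`, assumed absent from S, plays the role of the unique last letter required by the definition. The class and function names are hypothetical.

```python
class Node:
    def __init__(self, leaf=None):
        self.edges = {}    # first character of the edge label -> (label, child)
        self.leaf = leaf   # 1-based start of the suffix, on leaves only

def naive_suffix_tree(s):
    """Insert S[1..n], S[2..n], ..., each time walking down from the root.
    Worst case O(n^2) character comparisons, e.g. for S = aaaa...a."""
    s += "$"                                   # assumed not to occur in s
    root = Node()
    for i in range(len(s)):
        node, rest = root, s[i:]
        while True:
            if rest[0] not in node.edges:      # no edge starts with this character
                node.edges[rest[0]] = (rest, Node(leaf=i + 1))
                break
            label, child = node.edges[rest[0]]
            k = 0
            while k < len(label) and label[k] == rest[k]:
                k += 1
            if k < len(label):                 # mismatch inside the edge: split it
                middle = Node()
                middle.edges[label[k]] = (label[k:], child)
                node.edges[rest[0]] = (label[:k], middle)
                child = middle
            node, rest = child, rest[k:]       # continue below the matched part
    return root

def leaf_labels(node):
    if node.leaf is not None:
        return [node.leaf]
    return sorted(p for _, child in node.edges.values() for p in leaf_labels(child))

root = naive_suffix_tree("bcabc")
print(leaf_labels(root))   # [1, 2, 3, 4, 5, 6]  (6 is the suffix "$")
print(root.edges["b"][0])  # bc
```

For S = aaaaa..., the walk for suffix i matches n − i characters before it splits an edge, which gives the O(n²) bound.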
24
Suffix Links Speed Up
[Figure: suffix link from the locus of xu to the locus of u, across steps i − 1 and i]
- In tree T_i, only head_i may lack a valid
suffix link.
- At step i, the algorithm visits the contracted locus of head_i in
T_{i−1}.
25
Suffix Links Speed Up
S = bbbbbababbbaabbbbb. Step i − 1 inserted S[13..19] = abbbbb, with x = a, u = bbb, z = bb,
u1 = b, u2 = bb. Step i inserts S[14..19] = bbbbb.
[Figure: the tree around nodes c and d, with the loci of head13 and head14]

26
Exact Pattern Matching
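[Figure: the pattern p1 p2 p3 ... pm spelled from the root down to node c, whose leaves i1, i2, i3 are the occurrences]
Time O(n + m + a), where a is the number of occurrences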
27
Repeated Sub-Sequences - Regularities
[Figure: a substring u repeated at positions i1, i2, i3; u_max is the longest repeated substring]
Longest Repeated Substring
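The slides find the longest repeated substring through the suffix tree node u_max. The sketch below swaps in an equivalent method that is easier to write down: it takes the longest common prefix of two lexicographically adjacent suffixes in a naively built suffix array. The names are illustrative.

```python
def suffix_array(s):
    """Start positions of the suffixes of s in lexicographic order (naive construction)."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_len(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:]."""
    n = 0
    while i + n < len(s) and j + n < len(s) and s[i + n] == s[j + n]:
        n += 1
    return n

def longest_repeated_substring(s):
    """The longest lcp of two adjacent suffixes in sorted order is the longest
    substring that occurs at least twice (the string depth of u_max)."""
    sa = suffix_array(s)
    best, start = 0, 0
    for a, b in zip(sa, sa[1:]):
        h = lcp_len(s, a, b)
        if h > best:
            best, start = h, a
    return s[start:start + best]

print(longest_repeated_substring("mississippi"))  # issi
```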
28
Longest Common Substring of two (more) Strings
[Figure: u_max occurs at position i of text1 and at position j of text2; both texts are stored in a Generalized Suffix Tree]
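For two strings, the sketch below uses a generalized suffix array in place of the generalized suffix tree on the slide. The texts are concatenated with two separator characters, which are assumed not to occur in either text. Only adjacent suffixes that come from different texts are compared. This is an illustration with naive construction, not an efficient implementation.

```python
def longest_common_substring(t1, t2):
    """Longest substring shared by t1 and t2 (separators \\x00, \\x01 assumed absent)."""
    s = t1 + "\x00" + t2 + "\x01"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    boundary = len(t1)               # positions < boundary belong to t1
    best, start = 0, 0
    for a, b in zip(sa, sa[1:]):
        if (a < boundary) == (b < boundary):
            continue                 # both suffixes come from the same text
        n = 0
        while s[a + n] == s[b + n] and s[a + n] not in "\x00\x01":
            n += 1                   # common prefix, never crossing a separator
        if n > best:
            best, start = n, a
    return s[start:start + best]

print(longest_common_substring("xabxac", "abxa"))  # abxa
```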
29
Maximal Pairs
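[Figure: a maximal pair, two occurrences of x at positions i and j separated by a gap]
Gusfield: O(n + a)
Brodal: O(n log n + a) for t1 ≤ gap ≤ t2; O(n + a) for t1 ≤ gap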
30
Maximal Pairs in Multiple Strings
[Figure: two occurrences of x separated by a gap]
31
Nearest Common Ancestor Suffix Tree
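[Figure: suffix tree with root r, in which u = nca(x, y); labels w, a, b]
nca(x, y) = u in O(1) time
nca(001, 101): leftmost1(XOR(001, 101)) = leftmost1(100) → 1xx
nca(001, 111): leftmost1(XOR(001, 111)) = leftmost1(110) → 1xx
nca(011, 010): leftmost1(XOR(011, 010)) = leftmost1(001)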
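The XOR step above works on a complete binary tree whose nodes are numbered in inorder, so that the height of a node is the position of its lowest 1 bit. Below is a minimal sketch of nca for that special case. Mapping a general tree, such as a suffix tree, onto this case is not shown. The function names are hypothetical.

```python
def height(v):
    """Height of node v (> 0): index of its lowest 1 bit."""
    return (v & -v).bit_length() - 1

def ancestor_at(v, h):
    """Ancestor of v at height h >= height(v): keep the bits above h, then a 1 at h."""
    return ((v >> (h + 1)) << (h + 1)) | (1 << h)

def nca(x, y):
    """A constant number of word operations; leftmost1(x XOR y) bounds the height."""
    h = max(height(x), height(y), (x ^ y).bit_length() - 1)
    return ancestor_at(x, h)

print(format(nca(0b001, 0b101), "03b"))  # 100
print(format(nca(0b011, 0b010), "03b"))  # 010
```

For 001 and 101, the leftmost 1 of the XOR is the top bit, so the nca is the root 100. For 011 and 010, the result is 010, because 010 is itself an ancestor of 011.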
32
Exact Matching with wild cards using longest
common extension
[Figure: a pattern with wild cards aligned against the text acgtttaacctttgagtttgggcv]

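Below is a sketch of exact matching with wild cards using longest common extension (LCE) queries. At each text position, one LCE call jumps over a maximal run of exactly matching characters, and each wild card skips one position. Here LCE is computed by direct comparison. The point of the slide is that, with a suffix tree and constant-time nca, each LCE query takes O(1) time, which gives O(nk) total time for k wild cards. The wild-card symbol `?` and the function names are assumptions.

```python
WILDCARD = "?"  # hypothetical wild-card symbol: matches any single character

def lce(a, i, b, j):
    """Longest common extension of a[i:] and b[j:]. Computed naively here; with a
    suffix tree and O(1) nca queries each call would take O(1) time."""
    n = 0
    while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
        n += 1
    return n

def wildcard_match(text, pattern):
    """0-based start positions where pattern occurs, each wild card matching one character."""
    n, m, hits = len(text), len(pattern), []
    for start in range(n - m + 1):
        t, p = start, 0
        while p < m:
            run = lce(text, t, pattern, p)   # jump over a run of exact matches
            t, p = t + run, p + run
            if p < m and pattern[p] == WILDCARD:
                t, p = t + 1, p + 1          # the wild card absorbs one text character
            elif p < m:
                break                        # genuine mismatch
        if p == m:
            hits.append(start)
    return hits

print(wildcard_match("acgtttaacctttgagtttgggcv", "t?ga"))  # [11]
```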
33
Suffix Arrays
  • Let
  • - Σ be an alphabet
  • - A = a_0 a_1 ... a_{N−1} be a text of length N, a_i ∈ Σ,
    0 ≤ i < N.
  • - W = w_0 w_1 ... w_{P−1} be a string of length P
  • We want to find all occurrences of W in
    A.

34
1st approach - Suffix trees
  • A suffix tree is built for A
  • It has N leaves, numbered 0 to N − 1
  • Every internal node except the root has
    at least two children
  • Each edge is labelled with a non-empty substring of A
  • At each node, the labels of the outgoing edges
    begin with different characters
  • The concatenation of the edge labels on the path from
    the root to leaf k is the suffix of A at position k
  • Construction time O(N log|Σ|), space O(N)
  • Linear time is possible (Farach's algorithm)
  • Query time O(P log|Σ|)

35
2nd approach - Suffix Arrays
  • A sorted list of all the suffixes of A.
  • Construction time O(N log N) (in the average case,
    however, the time is O(N))
  • Space: 2N integers
  • Query time: P + ⌈log2(N − 1)⌉
    symbol comparisons

36
Suffix Trees vs. Suffix Arrays
  • Query: Is W a substring of A?
  • Suffix Tree
  • O(P log|Σ|) with O(N) space, or
  • O(P) with O(N|Σ|) space (impractical)
  • Suffix Array
  • Competitive/better O(P + log N) search
  • Main advantage: space, 2N integers
  • (In practice, the problem is the space overhead of the query
    data structure)
  • Another advantage: independent of |Σ|

37
Definitions
  • A_i is the suffix of A starting at position i, i.e. A_i =
    a_i a_{i+1} ... a_{N−1}
  • If u is a string, then u^p is the prefix made of the first p
    symbols of u.
  • For strings u, v, we define the relation <_p as follows:
  • u <_p v ⇔ u^p < v^p
  • The relations ≤_p, >_p, ≥_p, =_p and ≠_p are defined similarly

38
Search
Pos is an array of N positions that lists the suffixes of A
in lexicographic order: A_{Pos[0]} <
A_{Pos[1]} < ... < A_{Pos[N−1]}. Let L_W = min(k :
W ≤_P A_{Pos[k]} or k = N) and R_W = max(k : A_{Pos[k]} ≤_P W or
k = −1). If L_W ≤ R_W, then for every k ∈ [L_W, R_W], with
i = Pos[k], W = a_i a_{i+1} ... a_{i+P−1}, and
conversely. So, if L_W ≤ R_W, there are R_W − L_W
+ 1 occurrences of W in A. Since Pos is sorted,
binary search finds L_W and R_W in O(P log N) time.
39
Pseudocode - 1 (search for L_W)
if W ≤_P A_{Pos[0]} then L_W = 0
else if W >_P A_{Pos[N−1]} then L_W = N
else
    (L, R) = (0, N − 1)
    while R − L > 1 do
        M = ⌊(L + R) / 2⌋
        if W ≤_P A_{Pos[M]} then R = M else L = M
    L_W = R
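Below is a Python sketch of the two binary searches for L_W and R_W, using a naively built Pos array. Comparing `w` with the first P symbols of a suffix implements the relations ≤_P and >_P defined earlier. The names are illustrative, and the construction is not the O(N log N) algorithm of the slides.

```python
def suffix_array(text):
    """Pos: start positions of the suffixes of A in lexicographic order (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def search(text, pos, w):
    """Occurrences of W in A via two binary searches over Pos (O(P log N) comparisons).
    L_W = first k with W <=_P A_Pos[k]; R_W = last k with A_Pos[k] <=_P W."""
    p, n = len(w), len(pos)

    def prefix(k):                      # (A_Pos[k])^P, the first P symbols of the suffix
        return text[pos[k]:pos[k] + p]

    lo, hi = 0, n
    while lo < hi:                      # first k with W <= prefix(k), i.e. L_W
        mid = (lo + hi) // 2
        if w <= prefix(mid):
            hi = mid
        else:
            lo = mid + 1
    left = lo
    lo, hi = left, n
    while lo < hi:                      # first k with prefix(k) > W, i.e. R_W + 1
        mid = (lo + hi) // 2
        if prefix(mid) > w:
            hi = mid
        else:
            lo = mid + 1
    right = lo - 1
    return sorted(pos[k] for k in range(left, right + 1))  # empty when L_W > R_W

A = "mississippi"
Pos = suffix_array(A)
print(search(A, Pos, "issi"))  # [1, 4]
print(search(A, Pos, "issa"))  # []
```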
40
Let A = mississippi and W = issa. The sorted suffixes of A, with the initial L, M and R of the binary search:
L → i
    ippi
    issippi
    ississippi
    mississippi
M → pi
    ppi
    sippi
    sissippi
    ssippi
R → ssissippi
41
How can the time be reduced?
lcp(v, w) is the length of the longest common prefix of v and
w. Let l = lcp(A_{Pos[L]}, W) and r = lcp(W, A_{Pos[R]}). Initially, l =
lcp(A_{Pos[0]}, W) and r = lcp(W, A_{Pos[N−1]}). Every
comparison of W with A_{Pos[M]} updates l or
r. Since A_{Pos[L]} =_l W and W =_r A_{Pos[R]}, we get
A_{Pos[k]} =_h W for every k ∈ [L, R], where h = min(l, r), so the
number of symbol comparisons needed
is reduced by h.
42
How can the time be reduced further?
  • Precompute information about the lcp of
    A_{Pos[M]} with A_{Pos[L]} and A_{Pos[R]}.
  • There are N − 2 triples (L, M, R) with midpoint M ∈ [1,
    N − 2], 0 ≤ L < M < R ≤ N − 1.
  • (L_M, M, R_M) is the unique triple with midpoint
    M.
  • Llcp and Rlcp are arrays of size N − 2
  • Llcp[M] = lcp(A_{Pos[L_M]}, A_{Pos[M]})
  • Rlcp[M] = lcp(A_{Pos[M]}, A_{Pos[R_M]}).

43
Reducing comparisons I
  • Consider an iteration of the binary search with triple
    (L, M, R). Let H = max(l, r), and let ΔH be the difference between
    the values of H at the start and at the end of the
    iteration.
  • Let r ≤ l = H (l = lcp(A_{Pos[L]}, W)).
  • Case 1: Llcp[M] > l, i.e. lcp(A_{Pos[L_M]},
    A_{Pos[M]}) > l.
  • Then A_{Pos[M]} =_{l+1} A_{Pos[L]} ≠_{l+1} W.
  • W is in the right half: L = M, and l stays unchanged.
  • Case 2: Llcp[M] < l, i.e. lcp(A_{Pos[L_M]},
    A_{Pos[M]}) < l.
  • Then W =_l A_{Pos[L]} <_l A_{Pos[M]}.
  • W is in the left half, and the new value of r is
    Llcp[M].

44
Reducing comparisons II
  • Case 3: Llcp[M] = l, i.e. lcp(A_{Pos[L_M]},
    A_{Pos[M]}) = l.
  • Then A_{Pos[M]} =_l W.
  • We compare symbol l + 1, then symbol l + 2, and so on, up to
    l_j, the first one for which W ≠_{l_j} A_{Pos[M]}.
  • l_j determines whether W is in the left or the
    right half; the new value of r or l is l_j − 1.
  • At the start of the iteration l = H, so the iteration makes ΔH + 1
    symbol comparisons.
  • With ΔH + 1 comparisons per iteration and ΣΔH ≤ P,
  • the total number of symbol comparisons
  • is at most P + ⌈log2(N − 1)⌉,
  • i.e. O(P + log N) time in the worst case.

45
(No Transcript)
46
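Pseudocode - 2 (W_l denotes w_l w_{l+1} ... w_{P−1})
l = lcp(A_{Pos[0]}, W); r = lcp(A_{Pos[N−1]}, W)
if l = P or w_l ≤ a_{Pos[0]+l} then L_W = 0
else if r < P and w_r > a_{Pos[N−1]+r} then L_W = N
else
    (L, R) = (0, N − 1)
    while R − L > 1 do
        M = ⌊(L + R) / 2⌋
        if l ≥ r then
            if Llcp[M] ≥ l then m = l + lcp(A_{Pos[M]+l}, W_l)
            else m = Llcp[M]
        else
            if Rlcp[M] ≥ r then m = r + lcp(A_{Pos[M]+r}, W_r)
            else m = Rlcp[M]
        if m = P or w_m ≤ a_{Pos[M]+m} then (R, r) = (M, m)
        else (L, l) = (M, m)
    L_W = R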
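Below is a Python transcription of Pseudocode 2, as a sketch. Llcp and Rlcp are filled by replaying the binary search once, and `lcp` works on offsets so that no suffix is copied. The helper names, the naive Pos construction and the use of arrays of size N (instead of N − 2) are simplifications made for this example.

```python
def lcp(a, i, b, j):
    """Length of the longest common prefix of a[i:] and b[j:], without slicing."""
    n = 0
    while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
        n += 1
    return n

def precompute_lcp(text, pos):
    """Llcp[M], Rlcp[M] for the unique triple (L_M, M, R_M) of each midpoint M."""
    n = len(pos)
    llcp, rlcp = [0] * n, [0] * n

    def fill(L, R):                     # replay the binary search on [L, R]
        if R - L > 1:
            M = (L + R) // 2
            llcp[M] = lcp(text, pos[L], text, pos[M])
            rlcp[M] = lcp(text, pos[M], text, pos[R])
            fill(L, M)
            fill(M, R)

    fill(0, n - 1)
    return llcp, rlcp

def le_p(w, text, start, m):
    """W <=_P A_start, given m = lcp(A_start, W) and P = len(w)."""
    return m == len(w) or (start + m < len(text) and w[m] < text[start + m])

def search_lw(text, pos, llcp, rlcp, w):
    """L_W as in Pseudocode 2: l, r and the precomputed lcps skip matched symbols."""
    n = len(pos)
    l, r = lcp(text, pos[0], w, 0), lcp(text, pos[n - 1], w, 0)
    if le_p(w, text, pos[0], l):
        return 0
    if not le_p(w, text, pos[n - 1], r):
        return n
    L, R = 0, n - 1
    while R - L > 1:
        M = (L + R) // 2
        if l >= r:
            m = l + lcp(text, pos[M] + l, w, l) if llcp[M] >= l else llcp[M]
        else:
            m = r + lcp(text, pos[M] + r, w, r) if rlcp[M] >= r else rlcp[M]
        if le_p(w, text, pos[M], m):
            R, r = M, m
        else:
            L, l = M, m
    return R

A = "mississippi"
Pos = sorted(range(len(A)), key=lambda i: A[i:])
Llcp, Rlcp = precompute_lcp(A, Pos)
print(search_lw(A, Pos, Llcp, Rlcp, "issa"))  # 2
```

For A = mississippi and W = issa, the search returns L_W = 2, which is the position of issippi in the sorted list of suffixes.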
47
DAWG (Directed Acyclic Word Graphs) and cDAWGs
48
(No Transcript)
49
(No Transcript)
50
Secondary Memory
  • There are collections of electronic texts that
    are heterogeneous and come from
    various sources
  • The collections are large, of the order of Gigabytes /
    Terabytes
  • They are stored on secondary-memory systems (hard
    disks, magnetic tapes, DVD)
  • Index structures over the data and
    search methods are basic tools for
    storing, updating and making efficient use of
    information

51
Data Structures
  • Supra-suffix array (a hybrid structure)
  • Compact pat-trees (compact Pat trees with a binary
    representation, moving the structure to secondary memory)
  • Prefix B-tree
  • String B-tree
52
String B-Tree
  • Combines B-Trees and Patricia Tries, the latter used for the
    internal nodes
  • Additional pointers are added to speed up
    search and update operations
  • String B-Trees have the same worst-case
    performance as classic B-Trees, but
    they allow storing strings of unbounded length
    and perform searches
    like suffix trees.

53
String B-Tree Notation
  • We denote a string X of s characters by
    X[1, s], and its length by |X| (= s)
  • We denote by X[1, i] a prefix, by X[j, s]
    a suffix and by X[i, j] a substring
    of X, where 1 ≤ i ≤ j ≤ s
  • A string P occurs in X
    when there is a substring of X,
    X[i, i + |P| − 1], that is equal to P

54
Problem 1: Prefix Search and Range
Query
  • Let K = {d1, ..., dk} be a set of strings
    (texts) whose total length is N
  • We store K and keep it sorted
    in external memory
  • Prefix Search(P): finds all the strings
    of K whose prefix is the string P
  • Range Query(K', K''): finds all the
    strings of K that lie between K' and K'' in
    lexicographic order
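For a sorted in-memory list, both operations reduce to binary searches, because strings that share a prefix are contiguous in lexicographic order. The sketch below uses Python's `bisect` module. It does not model external memory or disk pages. It also assumes that the largest Unicode code point, which is appended to P to mark the end of the prefix range, never occurs in K.

```python
from bisect import bisect_left, bisect_right

def prefix_search(sorted_keys, p):
    """All strings of K (kept sorted) that have prefix p; they form a contiguous range."""
    lo = bisect_left(sorted_keys, p)
    hi = bisect_left(sorted_keys, p + "\U0010FFFF")  # assumes this code point is not in K
    return sorted_keys[lo:hi]

def range_query(sorted_keys, k1, k2):
    """All strings s of K with k1 <= s <= k2 in lexicographic order."""
    return sorted_keys[bisect_left(sorted_keys, k1):bisect_right(sorted_keys, k2)]

K = sorted(["aid", "atlas", "atom", "attenuate", "car", "patent", "zoo"])
print(prefix_search(K, "at"))         # ['atlas', 'atom', 'attenuate']
print(range_query(K, "atom", "car"))  # ['atom', 'attenuate', 'car']
```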

55
Problem 2: Substring Search
  • The query Substring Search(P) finds all the
    occurrences of P in the strings of K
  • occ denotes the number of occurrences
  • It extends problem 1: the set
    to be searched consists of all the suffixes of the
    strings of K.

56
Computational Model
  • We analyse problems 1 and 2 in the
    classic two-level memory model (Cormen et al.
    1990)
  • We assume a fast, small main
    memory (RAM) and a slow but much larger
    external memory (hard disks, DVDs). External
    memory is divided into transfer blocks,
    called disk pages

57
Computational Model
  • Each disk page contains B atomic items,
    which can be integers, characters or
    pointers
  • B is called the disk page
    size, and each read or write of a page
    is one disk access

58
Complexity of Problem 1
  • Prefix Search(P) requires O((p + occ)/B +
    log_B k) disk accesses in the worst
    case, where p = |P|
  • Range Query(K', K'') requires O((k' + k'' + occ)
    / B + log_B k) disk accesses in the worst
    case, where k' = |K'| and k'' = |K''|
  • Inserting or deleting a string of length
    m in the set K requires O(m/B + log_B k)
    disk accesses in the worst case
  • The space used is Θ(k/B) disk pages,
    while the set K itself occupies
    Θ(N/B) pages

59
Complexity of Problem 2
  • Substring Search(P) requires O((p + occ)/B +
    log_B N) disk accesses in the worst
    case, where p = |P|
  • Inserting or deleting a string of length
    m in the set K requires O(m log_B(N + m))
    disk accesses in the worst case
  • The String B-Tree and the set K
    together use Θ(N/B) disk pages

60
Storing Strings
61
Storing Strings
  • With this structure, we can locate the disk page
    that contains the i-th character of a string
    by performing a constant number of simple
    arithmetic operations on its pointer
  • We can group Θ(B) string pointers
    in a single disk page, but reading
    only that page is not enough to
    retrieve all the characters of the
    strings

62
Storing Strings
  • We can compare any two strings
    character by character, but this
    method is very inefficient when it is
    repeated, because its worst-case cost
    is proportional to the length of the two
    strings
  • We call this problem rescanning,
    because the same characters are examined many
    times

63
B-tree-Like Structure
  • We represent the strings through logical
    pointers
  • The input is the set of strings K, with a
    total of N characters
  • We denote by K1, ..., Kk the elements of K
    in ascending lexicographic order (≤_L)
  • The strings of K are distributed among the leaves of the
    B-tree, and only some of them
    also appear in the internal nodes

64
B-tree-Like Structure
  • We denote by F_π ⊆ K the sorted set of strings
    associated with node π, and by L(π) and R(π)
    the leftmost and rightmost
    strings of F_π respectively
  • We store each node π in one disk
    page and bound the number of its
    strings: b ≤ |F_π| ≤ 2b, where b = Θ(B)
    is an even integer chosen so that
    a node fits in one disk
    page
  • Only the root may have fewer than b
    strings

65
B-tree-Like Structure
66
B-tree-Like Structure
  • We partition the set K into groups of b consecutive
    strings
  • We map each group to a leaf, say π,
    and build F_π
  • Each internal node π has n(π) children
    σ1, ..., σn(π), and its sorted set
    is F_π = {L(σ1), R(σ1), ..., L(σn(π)),
    R(σn(π))}
  • Since n(π) = |F_π| / 2, every node except the root
    and the leaves has between b/2 and b children.
    So the height of the resulting B-Tree
    is H = O(log_{b/2} k) = O(log_B k)

67
Prefix Search (P)
  • To answer Prefix Search(P),
    we rely on two observations by Manber and
    Myers
  • The strings that have prefix P
    occupy contiguous positions of K
  • The leftmost string that has prefix P
    is adjacent to the position of P in K, in
    ascending lexicographic order

68
Prefix Search(P)
  • We represent the position of P in the set K by the
    pair (t, j), where t is the leaf
    that holds this position and j − 1 is the number
    of strings of F_t that are lexicographically
    smaller than P, with 1 ≤ j ≤ |F_t| + 1
  • We say that j is the position of P in F_t
  • We call the procedure that finds the position
    of P in the set F_π PT-Search(P, F_π)

69
Prefix Search(P) Algorithm
  • We first check the two cases in which P
    is smaller or larger than all the
    strings of K
  • If both checks fail, we start
    from π_root and traverse the
    B-tree downwards
  • When we visit a node π, we load the
    corresponding disk page and apply
    PT-Search to find the position j of
    P in F_π
  • The two strings adjacent to this position satisfy
    K_{j−1} <_L P ≤_L K_j

70
Prefix Search(P) Algorithm
  • If node π is a leaf, we stop the traversal;
    otherwise we examine the following two cases
  • If K_{j−1} and K_j belong to two
    neighboring children of π, then the two strings
    are adjacent in K. Let σ be the child of π that
    contains K_j, and let t be the
    leftmost leaf of the subtree rooted at
    σ. Then P falls at the first
    position of F_t, since L(t) = L(σ) = K_j
  • If K_{j−1} and K_j belong to the same child of π,
    we continue the traversal recursively at the next
    level

71
Prefix Search(P) Algorithm
  • In either case, the traversal ends with a
    pair (t_L, j_L) that gives
    the position of the leftmost string of K
    that has prefix P
  • The pair (t_R, j_R) is found
    in the same way
  • To answer Prefix Search(P),
    we scan the sequence of leaves
    delimited by t_L and t_R and build a
    list from the j_L-th string of F_{t_L} to the
    (j_R − 1)-th string of F_{t_R}
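The whole Prefix Search(P) procedure can be sketched on an in-memory B-tree-like structure. Two simplifications are not from the slides. First, each node's F_π is searched with binary search instead of PT-Search. Second, the right end (t_R, j_R) is found by locating P followed by the largest Unicode code point, which is assumed not to occur in K. All names are hypothetical.

```python
from bisect import bisect_left

class BNode:
    def __init__(self, F, children=(), leaf_index=None):
        self.F = list(F)                  # sorted set F_pi
        self.children = list(children)
        self.leaf_index = leaf_index      # position in the left-to-right sequence of leaves

def build(sorted_keys, b):
    """Leaves: groups of b consecutive strings. Internal node with children s1..sn:
    F = [L(s1), R(s1), ..., L(sn), R(sn)], at most b children. The lower bound
    b <= |F| is not enforced for the last node of each level in this sketch."""
    leaves = [BNode(sorted_keys[i:i + b], leaf_index=t)
              for t, i in enumerate(range(0, len(sorted_keys), b))]
    level = leaves
    while len(level) > 1:
        groups = [level[i:i + b] for i in range(0, len(level), b)]
        level = [BNode([s for c in g for s in (c.F[0], c.F[-1])], g) for g in groups]
    return level[0], leaves

def leftmost_leaf(node):
    while node.children:
        node = node.children[0]
    return node

def locate(root, p):
    """Position (t, j) of p in K: leaf t, and j - 1 strings of F_t are smaller than p."""
    node, j = root, bisect_left(root.F, p) + 1   # bisect stands in for PT-Search
    if node.children and j == len(node.F) + 1:   # p is larger than every string of K
        while node.children:
            node = node.children[-1]
        return node.leaf_index, len(node.F) + 1
    if node.children and j == 1:                 # p is smaller than every string of K
        return leftmost_leaf(node).leaf_index, 1
    while node.children:
        child = node.children[(j - 1) // 2]
        if j % 2 == 1:       # K_{j-1} = R(s_i), K_j = L(s_{i+1}): neighbours in K
            return leftmost_leaf(child).leaf_index, 1
        node = child         # K_{j-1}, K_j in the same child: go one level down
        j = bisect_left(node.F, p) + 1
    return node.leaf_index, j

def prefix_search(root, leaves, p):
    tl, jl = locate(root, p)
    tr, jr = locate(root, p + "\U0010FFFF")      # assumes this code point is not in K
    out = []
    for t in range(tl, tr + 1):                   # scan the leaves t_L .. t_R
        first = jl if t == tl else 1
        last = jr - 1 if t == tr else len(leaves[t].F)
        out.extend(leaves[t].F[first - 1:last])
    return out

K = sorted(["aid", "atlas", "atom", "attenuate", "car", "patent", "zoo"])
root, leaves = build(K, 2)
print(prefix_search(root, leaves, "at"))  # ['atlas', 'atom', 'attenuate']
```

With b = 2, the leaves are [aid, atlas], [atom, attenuate], [car, patent] and [zoo], numbered 0 to 3 in the code. Locating at gives (t_L, j_L) = (0, 2) and the right end (t_R, j_R) = (2, 1), so the scan returns atlas, atom and attenuate.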

72
Simplified String B-Tree
  • It combines the B-Tree-like structure
    above with the Patricia Trie
  • Our aim is to organize the strings so that
    a search inside a node
    compares only one string of F_π
    in the worst case, instead of the
    log2|F_π| comparisons needed by simple binary search

73
Simplified String B-Tree
74
Simplified String B-Tree
  • We can build the PT in two steps
  • (1) We build a compact trie from the
    strings of F_π
  • (2) We label each node of the trie with
    the length of the substring
    it represents, and replace each
    substring on the branches with its first
    character

75
Simplified String B-Tree
  • Compared with the compact trie, the Patricia Trie hides
    some information, because it deletes all the
    characters on the branches of the tree
    except the first
  • It holds Θ(B) strings in one node of the
    B-Tree, regardless of their length
  • It allows lexicographic searches within a
    node, without additional disk accesses

76
PT-Search Example
77
PT-Search Example
  • Consider an example of PT-Search(P, F_π), where
  • P = bcbabcba
  • The left part of the figure shows the first phase, where l
    is the rightmost leaf
  • First, we determine the common prefix of the
    string of l and P (here bcb). We
    then find the ancestor of l closest to the
    root whose label is greater than |bcb| = 3
  • Then we use the mismatching
    character P[4] = a to find the correct
    position of P (j = 4); the highlighted
    arcs are the ones traversed

78
PT-Search Example
  • By combining Patricia Tries with the B-Tree, we
    avoid binary search in the nodes
    we pass through, and so reduce the overall
    complexity
  • This idea is not yet satisfactory,
    because at every node we
    visit we rescan P from the
    beginning

79
PT-Search Example
  • We must design the PT-Search procedure
    so that it exploits the properties of the
    String B-Tree and the Patricia Trie better
  • PT-Search takes three input parameters,
    (P, F_π, l), where the parameter l states that
    F_π contains a string whose
    first l characters are equal to those of P
  • PT-Search(P, F_π, l) returns a
    pair (j, lcp), where j is the position of P in
    F_π and lcp is the length of the longest common
    prefix

80
Complexity of Prefix Search
  • We find the strings of K that have prefix
    P by examining the leaves of the String B-Tree
    delimited by t_L and t_R, in O(occ/B)
    disk accesses
  • The total cost of Prefix Search(P) is
    O((p + occ)/B + log_B k) disk accesses

81
Problem 2: Substring Search
  • Problem 2 concerns the operation
    Substring Search(P), which finds the occurrences of P
    in the strings of K
  • The search is based on finding
    substrings of length |P| that are equal to P
  • We define the set of suffixes SUF(K) = {d[i, |d|] :
    1 ≤ i ≤ |d|, d ∈ K}, which
    holds the suffixes in lexicographic order

82
Problem 2: Substring Search
  • Our problem is to retrieve all the strings of
    SUF(K) that have prefix P
  • The operation Substring Search(P) on the set K
    is thus transformed into Prefix Search(P) on the set
    SUF(K)

83
Substring Search Example
84
Substring Search Example
  • P = at, K = {aid, atlas, atom, attenuate,
    car, patent, zoo}, b = 4 and SUF(K) = {aid,
    ar, as, ..., uate, zoo}
  • In this example, P occurs in four strings: atlas, atom,
    attenuate and patent. Since attenuate contains at twice,
    there are occ = 5 occurrences
  • The suffixes with prefix P that correspond to
    these occurrences have their logical pointers
    stored in neighboring leaves of the
    tree
  • We can build the set SUF(K), whose size is k = N,
    and run Prefix Search(P) on it
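Substring Search(P) can be sketched as Prefix Search(P) over SUF(K). Each suffix is kept together with the string and the (1-based) position it came from. This in-memory version uses a sorted list and `bisect`. The names are illustrative.

```python
from bisect import bisect_left

def build_suf(K):
    """SUF(K): every suffix d[i, |d|] of every d in K, sorted, with (d, i) attached."""
    return sorted((d[i:], d, i + 1) for d in K for i in range(len(d)))

def substring_search(suf, p):
    """Substring Search(P) on K, done as Prefix Search(P) on SUF(K).
    Returns (string, 1-based start position) for each occurrence."""
    out = []
    for suffix, d, i in suf[bisect_left(suf, (p,)):]:
        if not suffix.startswith(p):
            break                     # suffixes with prefix p are contiguous
        out.append((d, i))
    return out

K = ["aid", "atlas", "atom", "attenuate", "car", "patent", "zoo"]
SUF = build_suf(K)
print(substring_search(SUF, "at"))
```

For P = at, the search reports five suffixes: ate and attenuate from attenuate, atent from patent, and atlas and atom. In this example SUF(K) has 33 entries, which is the total length N of the strings in K.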

85
String B-Tree
  • Inserting a new string Y into the set K,
    where m = |Y|, requires inserting all of its m
    suffixes into SUF(K) in lexicographic
    order
  • The problem is that the m
    inserted suffixes are treated as unrelated
    strings, so the rescanning phenomenon
    appears
  • We extend the simplified String B-Tree by
    introducing two types of auxiliary pointers
    that let us avoid rescanning during
    updates

86
String B-Tree
  • One type is the classic parent pointer,
    kept for every node
  • The other is the succ pointer, kept for
    every string in SUF(K). The succ
    pointer of d[i, |d|] ∈ SUF(K) points to the leaf
    of the String B-Tree that contains d[i + 1, |d|]. If
    i = |d|, then succ points to its own
    leaf (self-loop pointer)

87
Insertion Algorithm
  • After inserting Y[i, m], the following two
    conditions must hold
  • The suffixes Y[j, m] are stored in the
    String B-Tree for 1 ≤ j ≤ i, and Y[i, m]
    shares its first h_i characters with one
    of its neighboring strings
  • The succ pointers are correctly set
    for all strings in the String B-Tree except
    Y[i, m]. This means that succ(Y[i, m])
    is the only dangling pointer, unless i =
    m, in which case it points to its own leaf

88
Insertion Algorithm
  • We insert the suffixes of Y into the String B-Tree
    that stores SUF(K), from the
    longest to the shortest
  • The insertion proceeds for i = 1, 2, ..., m
  • For i = 1, we find the position of Y[1, m] by traversing
    the tree, exactly as in Prefix
    Search(Y[1, m])
  • For the remaining suffixes of Y (i > 1),
    we use a different approach to
    avoid rescanning

89
Insertion Algorithm
  • To find the position of Y[i, m], we traverse the tree
    from the last leaf we visited instead of
    starting from the root
  • We know that Y[i−1, m] shares its first h_{i−1}
    characters with one of its neighboring
    strings
  • We can follow the succ pointer of that
    neighboring string. It leads to
    a leaf that contains
    a string sharing its first
    max(0, h_{i−1} − 1) characters with Y[i, m]

90
Insertion Algorithm
  • We continue the insertion with an upward and
    then a downward traversal of the String B-Tree
    until we reach the leaf π_i that will contain the
    position of Y[i, m]
  • Since h_i ≥
    max(0, h_{i−1} − 1) can be proven, the algorithm avoids
    rescanning: it examines only the characters of Y
    at positions i + max(0, h_{i−1} − 1), ..., i + h_i

91
Insertion Algorithm
  • With careful handling of the parent and
    succ pointers, inserting the i-th suffix takes d_i
    disk accesses in the worst case
  • In total, Σ_{i=1}^{m} d_i = O(m log_B(N + m))
    disk accesses are needed to insert
    Y into K
  • This matches the worst-case performance
    of inserting m integers into a classic
    B-tree