Title: PAT tree (Patricia tree)
1?????
2 Outline
- ????
- ????
- ????
- ????
- PAT tree (Patricia tree)
- ????
3??
- ????????
- ?????????????
- ???????????????
- ????????
- Search
- ?????????????
- Query
- Query???????????????
- ?????????????????????
- ????????????
- ????????????????????
4 String matching
- Exact string matching
- Brute Force
- Knuth-Morris-Pratt
- Boyer-Moore
- Shift-Or
- Suffix Automaton
- Approximate string matching
- Dynamic Programming
- Non-deterministic Finite Automaton
- Bit-Parallelism
- ??????????
5??
- ????
- ?????,????????????
- ????,????,?????????
6????
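- Karp-Rabin: basic idea
- Goal: decide whether string A equals string B
- Compute the hash values hash(A) and hash(B)
- If hash(A) ≠ hash(B), then A ≠ B
- If hash(A) = hash(B), A = B does not necessarily hold, so the strings must still be compared directly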
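- Karp-Rabin: example
- Pattern x[0..5] = A A C T C T, Hash(x[0..5]) = 17579
- Text y[0..9] = G C A A C T C T C A
- Hash(y[0..5]) = 17819 ≠ Hash(x[0..5]): no match at position 0
- Hash(y[1..6]) = 17533 ≠ Hash(x[0..5]): no match at position 1
- Hash(y[2..7]) = 17579 = Hash(x[0..5]): candidate match at position 2, confirmed by comparing y[2..7] with x[0..5]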
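The slide does not say which hash function it uses. The sketch below therefore assumes a common polynomial rolling hash (base 256, modulus 1,000,000,007; both are illustrative choices), so its hash values will not equal the 17579/17819/17533 shown above. The control flow is the same as in the example. Each window's hash is updated in O(1) from the previous one, and a hash match is only a candidate that is then confirmed by direct comparison.

```python
def karp_rabin(pattern: str, text: str, base: int = 256, mod: int = 1_000_000_007) -> list:
    """Return every 0-based start index where pattern occurs in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    hash_x = hash_y = 0
    for i in range(m):  # hash(x[0..m-1]) and hash(y[0..m-1])
        hash_x = (hash_x * base + ord(pattern[i])) % mod
        hash_y = (hash_y * base + ord(text[i])) % mod
    matches = []
    for s in range(n - m + 1):
        # Equal hashes only mean "possibly equal", so confirm with a direct comparison.
        if hash_x == hash_y and text[s:s + m] == pattern:
            matches.append(s)
        if s + m < n:  # roll the window: drop y[s], append y[s+m]
            hash_y = ((hash_y - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches


print(karp_rabin("AACTCT", "GCAACTCTCA"))  # [2]
```

On the slide's strings this reports the single match at position 2, the same window the example identifies.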
7 Signature files
- ?????
- Each text block is mapped to an F-bit binary Signature
- ???????????,??????????????
- Superimposed coding
- Each word is hashed to an F-bit Signature
- A block's Signature is the bitwise OR of the Signatures of its words
- False drop
- A block's Signature matches the query word's Signature even though the block does not contain the word
- Signature??????????????,?????
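8 Signature files: example
Text divided into blocks:
Block 1: This is a text.
Block 2: A text has many
Block 3: words. Words
Block 4: are made from letters.
Block signatures: Block 1 = 000101, Block 2 = 110101, Block 3 = 100100, Block 4 = 101101
Word hashes:
h(text) = 000101
h(many) = 110000
h(words) = 100100
h(made) = 001100
h(letters) = 100001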
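The block signatures in the example can be reproduced by superimposed coding, that is, by OR-ing the word hashes of each block. The minimal sketch below copies the slide's word hashes into a hypothetical `WORD_HASH` table instead of computing them, and it assumes that only those five words are indexed (the other words act as stop words). Querying "text" shows a false drop: Block 4's signature covers every 1-bit of h(text), yet Block 4 does not contain "text". The final text check removes it.

```python
import re

F = 6  # signature width in bits, as in the example

# Word hashes copied from the slide; only these five words are indexed here.
WORD_HASH = {
    "text": 0b000101,
    "many": 0b110000,
    "words": 0b100100,
    "made": 0b001100,
    "letters": 0b100001,
}
BLOCKS = ["This is a text.", "A text has many", "words. Words", "are made from letters."]


def words_of(block: str) -> list:
    return [w.lower() for w in re.findall(r"[A-Za-z]+", block)]


def block_signature(block: str) -> int:
    """Superimposed coding: OR together the hashes of the block's indexed words."""
    signature = 0
    for word in words_of(block):
        signature |= WORD_HASH.get(word, 0)  # unhashed (stop) words contribute nothing
    return signature


SIGNATURES = [block_signature(b) for b in BLOCKS]


def candidate_blocks(word: str) -> list:
    """1-based numbers of blocks whose signature has every 1-bit of h(word) set."""
    query = WORD_HASH[word]
    return [i + 1 for i, sig in enumerate(SIGNATURES) if sig & query == query]


def search(word: str) -> list:
    """Filter out false drops by checking the candidate blocks' actual text."""
    return [b for b in candidate_blocks(word) if word in words_of(BLOCKS[b - 1])]


print([format(sig, f"0{F}b") for sig in SIGNATURES])  # ['000101', '110101', '100100', '101101']
print(candidate_blocks("text"), search("text"))         # [1, 2, 4] [1, 2]
```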
9????
- ??
- ??????,??????????
- ????,??,??,??????
- ?????,???????????
- ??
- ?????,??????
- ??,?False Drop????????????
- ??
- ???????????????????
10 Inverted files
- ??????
- ?????????????????
- ????????????????????????????????
- Structure of an inverted file
- Vocabulary
- By Heaps' law, the vocabulary size is O(n^β) for a text of size n, with β between 0.4 and 0.6
- Stored in its own file (the index file)
- Occurrences
- Large: O(n), about 30%–40% of the text size
- Stored in the posting file
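Heaps' law is why the vocabulary stays small while the occurrences grow linearly. A minimal sketch follows. It uses the usual form V = K·n^β, and the constant `K = 20.0` is a hypothetical value chosen only for illustration, because K depends on the collection. Doubling the text multiplies the vocabulary by 2^β, roughly 1.32 to 1.52 for β between 0.4 and 0.6, whereas the O(n) occurrence lists double.

```python
def heaps_vocabulary_size(n: int, k: float, beta: float) -> float:
    """Heaps' law estimate V = K * n**beta; K and beta depend on the collection."""
    return k * n ** beta


K = 20.0  # hypothetical constant, for illustration only
for beta in (0.4, 0.5, 0.6):
    growth = heaps_vocabulary_size(2_000_000, K, beta) / heaps_vocabulary_size(1_000_000, K, beta)
    print(f"beta={beta}: doubling the text multiplies the vocabulary by {growth:.2f}")
```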
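11 Inverted files
Text (words start at character positions 1, 6, 9, 11, 17, 19, 24, 28, 33, 40, 46, 50, 55, 60):
This is a text. A text has many words. Words are made from letters.
Vocabulary   Occurrences
letters      60
made         50
many         28
text         11, 19
words        33, 40
- Inverted list: the list of occurrences of one word
- Inverted file: the vocabulary together with all inverted lists
- Addressing granularity: document, word positions, or character positions (this example uses character positions)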
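The inverted file in the example can be rebuilt by recording, for each word, the 1-based character position where it starts. The sketch below is minimal and makes two assumptions: words are runs of ASCII letters, and they are lower-cased. Unlike the figure, it also keeps lists for stop words such as "this" and "a".

```python
import re
from collections import defaultdict

TEXT = "This is a text. A text has many words. Words are made from letters."


def build_inverted_index(text: str) -> dict:
    """Map each lower-cased word to the 1-based character positions where it starts."""
    index = defaultdict(list)
    for match in re.finditer(r"[A-Za-z]+", text):
        index[match.group().lower()].append(match.start() + 1)  # 1-based, like the figure
    return dict(index)


index = build_inverted_index(TEXT)
for word in ("letters", "made", "many", "text", "words"):
    print(word, index[word])  # e.g. text [11, 19], words [33, 40]
```

Because the text is scanned left to right, each inverted list comes out already sorted by text position.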
12 Inverted files: block addressing
- ?????
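- The text is divided into blocks, and occurrences record only the block number
- The index becomes much smaller, but exact positions require scanning the candidate blocks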
Text divided into blocks:
Block 1: This is a text.
Block 2: A text has many
Block 3: words. Words
Block 4: are made from letters.
Inverted index (vocabulary → block numbers):
letters   4
made      4
many      2
text      1, 2
words     3
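Block addressing stores only block numbers, so answering a query is a two-step process: first look up the candidate blocks in the index, then scan those blocks sequentially to find exact offsets. The minimal sketch below uses the four blocks from the example. The helper names `build_block_index` and `find` are hypothetical.

```python
import re
from collections import defaultdict

BLOCKS = ["This is a text.", "A text has many", "words. Words", "are made from letters."]
WORD = re.compile(r"[A-Za-z]+")


def build_block_index(blocks: list) -> dict:
    """Map each lower-cased word to the 1-based numbers of the blocks containing it."""
    index = defaultdict(list)
    for number, block in enumerate(blocks, start=1):
        for word in dict.fromkeys(w.lower() for w in WORD.findall(block)):  # once per block
            index[word].append(number)
    return dict(index)


def find(word: str, blocks: list, index: dict) -> list:
    """Exact offsets are not stored, so scan each candidate block sequentially."""
    hits = []
    for number in index.get(word, []):
        for match in WORD.finditer(blocks[number - 1]):
            if match.group().lower() == word:
                hits.append((number, match.start()))  # (block number, offset within block)
    return hits


index = build_block_index(BLOCKS)
print(index["text"], index["words"], find("text", BLOCKS, index))  # [1, 2] [3] [(1, 10), (2, 2)]
```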
13????
- ???????
- ?????????????????
- ?????????????
- ????????????????
- ???????????????????
- ??????????IO??,??????????
- ??????????
- Using a hash table
- ??????????
- Using a trie, B-tree, etc.
- ???????
- Delta compression
14????
- ????????
- ????????,???????
- ?????????????
- ??????????,???????????
??????,??2??????64K?? - ??
- ????
- ????
- ??
- ??????????????
- ???????????,??????
- ???????????
15????
- ????????
- ???????????
- ???????????
- ??
- ???? ??????(??????)
- ??
- ?????????
- ?????????????????????
- ?????????Nlog N (??????)
- ????????????????????????
- ??????????????IO????,???????
- ???????
- ????logN?????????????
- ??????????????????
16????
- Lucene???????????
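- Suppose the term dictionary holds 16,000 terms
- indexInterval = 16, so 16,000 / 16 = 1,000 terms are kept in memory
- A lookup takes at most 16 + log2(1000) ≈ 26 comparisons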
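The 16 + log2(1000) ≈ 26 bound comes from two steps: a binary search over the in-memory sample of every 16th term, followed by a sequential scan of at most 16 terms. The minimal sketch below counts comparisons on a hypothetical dictionary of 16,000 synthetic terms. It models only this two-level idea, not Lucene's actual file format, and the function names are illustrative.

```python
import math
from typing import List, Optional, Tuple

INDEX_INTERVAL = 16  # indexInterval: every 16th term is kept in memory


def build_sample(sorted_terms: List[str]) -> List[str]:
    """In-memory sample holding every INDEX_INTERVAL-th term of the sorted dictionary."""
    return sorted_terms[::INDEX_INTERVAL]


def lookup(term: str, sorted_terms: List[str], sample: List[str]) -> Tuple[Optional[int], int]:
    """Return (position of term or None, number of term comparisons performed)."""
    comparisons = 0
    lo, hi = 0, len(sample)
    while lo < hi:  # binary search for the last sampled term <= term
        mid = (lo + hi) // 2
        comparisons += 1
        if sample[mid] <= term:
            lo = mid + 1
        else:
            hi = mid
    if lo == 0:
        return None, comparisons
    start = (lo - 1) * INDEX_INTERVAL
    for i in range(start, min(start + INDEX_INTERVAL, len(sorted_terms))):  # sequential scan
        comparisons += 1
        if sorted_terms[i] == term:
            return i, comparisons
    return None, comparisons


terms = [f"term{i:05d}" for i in range(16_000)]  # hypothetical, already sorted, 16,000 terms
sample = build_sample(terms)                      # 1,000 sampled terms
worst = max(lookup(t, terms, sample)[1] for t in terms)
print(len(sample), worst, INDEX_INTERVAL + math.ceil(math.log2(len(sample))))
```

The worst case over all 16,000 lookups stays within the 26-comparison bound. Only the small sample needs to stay in memory.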
17????
- ???????
- ????????????????
- Hash table, B-tree, trie, etc.
- Hash table
- ??????,????????
- B-tree
- ??????,???????,?????
- Trie
- ????????????
- ???????????
- ??????????????
- log(vocabulary size) > E(word length), where E(·) denotes the expected value
18 Trie
- What is a trie?
- trie????????????????
- trie??????????????????
- ?????????????????
- ?trie????????????????
- ??,??????,??????????????,??????trie??
19
- Example word set: a, b, c, aa, ab, ac, ba, ca, aba, abc, baa, bab, bac, cab, abba, baba, caba, abaca, caab
(Figure: the trie built from these words)
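A trie over the example word set can be sketched as nested child maps, one character per edge. A flag marks the nodes where a stored word ends. In this minimal Python version, each node uses a dict instead of a fixed 256-slot array, and the class and method names are illustrative. The example also shows the difference between a stored word ("abaca") and a mere prefix ("abac").

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.is_word = True

    def _walk(self, s: str):
        """Follow s character by character; None if the path does not exist."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix: str) -> bool:
        return self._walk(prefix) is not None


WORDS = ("a b c aa ab ac ba ca aba abc baa bab bac cab "
         "abba baba caba abaca caab").split()
trie = Trie(WORDS)
print(trie.contains("abaca"), trie.contains("abac"), trie.has_prefix("abac"))  # True False True
```

A lookup visits one node per character, so its cost depends on the word length, not on how many words are stored.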
20 Trie
- ??
- ?????,???????
- Trie???????????????
- ?????????????13??
- ?????????????
- ???????????????????
- ???????????????????
- ?????,?????
- ??,????Trie????????????
- ????????????,Trie?????
- ?????,????????????
- ??
- ??????
- Looking up a word of length m depends only on m, not on the vocabulary size
- ??Trie??,?????????
- Number of words | Average word length | Pointers per node | Bytes per pointer | Total space
- 20000 | 6 | 256 | 4 | 120M
- ?????
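The 120M figure is consistent with one trie node per character and a full array of 256 child pointers of 4 bytes in every node. The minimal sketch below reproduces the arithmetic. It assumes the table columns mean words, average length, pointers per node and bytes per pointer, and it ignores shared prefixes, so it is an upper bound.

```python
def standard_trie_bytes(num_words: int, avg_len: int, pointers_per_node: int, bytes_per_pointer: int) -> int:
    """Rough upper bound: one node per character of every word (no shared prefixes),
    each node holding a full array of child pointers."""
    nodes = num_words * avg_len
    return nodes * pointers_per_node * bytes_per_pointer


total = standard_trie_bytes(20_000, 6, 256, 4)
print(total, f"{total / 2**20:.1f} MiB")  # 122880000 117.2 MiB
```

Each node costs 256 × 4 = 1,024 bytes, and most of those pointer slots are empty. This is the space problem that motivates compacted variants such as the Patricia tree.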
21 Delta compression
- ????
- ????????????
- Postings store document IDs (ID) and word positions (Pos)
- ????
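- Store each ID as its difference from the previous ID
- Store each Pos as its difference from the previous Pos
- The differences are much smaller than the original ID and Pos values
- Example: word A occurs at positions 13, 124, 346
- Without compression: 346 > 255, the largest value one byte can hold, so 346 needs two bytes
- With delta compression: 346 − 124 = 222 < 256, so one byte is enough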
- ????
- Lucene also stores its postings with delta compression
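The byte counts in the example can be checked with a small sketch. It stores the first value as-is and each later value as the gap from its predecessor, then counts the whole bytes each number needs, as the slide does. Lucene's own variable-length integers use 7 payload bits per byte, so their exact byte counts differ from this simplified fixed-byte count. The helper names are hypothetical.

```python
from itertools import accumulate


def delta_encode(values: list) -> list:
    """Keep the first value, then store each value as the gap from its predecessor."""
    return [b - a for a, b in zip([0] + values[:-1], values)]


def delta_decode(gaps: list) -> list:
    """Running sums restore the original sorted values."""
    return list(accumulate(gaps))


def bytes_needed(value: int) -> int:
    """Whole bytes needed for a non-negative integer (at least one)."""
    return max(1, (value.bit_length() + 7) // 8)


positions = [13, 124, 346]
gaps = delta_encode(positions)
print(gaps, sum(map(bytes_needed, positions)), sum(map(bytes_needed, gaps)))  # [13, 111, 222] 4 3
```

Delta encoding only pays off because postings are kept sorted. That ordering guarantees the gaps are non-negative and usually small.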
22 PAT tree (Patricia tree)
- What is a Patricia tree?
- A Patricia tree is a variant (path-compressed form) of the trie
- ???????????????????
- Suffix tree
- The Patricia tree built over all suffixes of the text
- ???????????????????
- ????
- ??????
- ??????
- ????
- Suffix array
- ???????????,????
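Path compression is the idea that links the trie, the Patricia tree and the suffix tree. A chain of single-child nodes collapses into one edge labelled with a string, and inserting every suffix of a text into such a compacted trie yields a suffix tree. The minimal sketch below uses string edge labels. It does not model Morrison's original PATRICIA, which works on the binary representation of keys with skip counts. Construction is naive (O(n²)), and "banana" is just a small illustrative input.

```python
class Node:
    def __init__(self):
        self.edges = {}        # first character of edge label -> (label, child)
        self.terminal = False  # a stored key ends exactly at this node


def insert(root: Node, key: str) -> None:
    """Insert key into a compacted trie, splitting an edge where the key diverges."""
    node = root
    while key:
        first = key[0]
        if first not in node.edges:
            leaf = Node()
            leaf.terminal = True
            node.edges[first] = (key, leaf)  # the whole remainder becomes one edge
            return
        label, child = node.edges[first]
        k = 0
        while k < len(label) and k < len(key) and label[k] == key[k]:
            k += 1
        if k < len(label):  # diverges inside the edge: split it at position k
            split = Node()
            split.edges[label[k]] = (label[k:], child)
            node.edges[first] = (label[:k], split)
            child = split
        node, key = child, key[k:]
    node.terminal = True


def occurs(root: Node, pattern: str) -> bool:
    """True if pattern spells a path from the root (it may end in the middle of an edge)."""
    node = root
    while pattern:
        entry = node.edges.get(pattern[0])
        if entry is None:
            return False
        label, child = entry
        if len(pattern) <= len(label):
            return label.startswith(pattern)
        if not pattern.startswith(label):
            return False
        node, pattern = child, pattern[len(label):]
    return True


def build_suffix_tree(text: str) -> Node:
    """Naive O(n^2) construction: insert every suffix of text + '$'."""
    root = Node()
    text += "$"  # terminator so that no suffix is a prefix of another suffix
    for i in range(len(text)):
        insert(root, text[i:])
    return root


tree = build_suffix_tree("banana")
print(occurs(tree, "ana"), occurs(tree, "nab"))  # True False
```

Every substring of the text is a prefix of some suffix, so substring search becomes a walk from the root. The `$` terminator ensures that each suffix ends at its own leaf.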
23 Suffix trie and suffix tree
Text (words start at character positions 1, 6, 9, 11, 17, 19, 24, 28, 33, 40, 46, 50, 55, 60):
This is a text. A text has many words. Words are made from letters.
(Figure: suffix trie and suffix tree over the index points 11, 19, 28, 33, 40, 50, 60)
Space overhead: 120%–240% over the text size
24 Difference between suffix array and inverted list
- Suffix array: the occurrences of each word are sorted lexicographically by the text following the word
- Inverted list: the occurrences of each word are sorted by text position
Text (words start at character positions 1, 6, 9, 11, 17, 19, 24, 28, 33, 40, 46, 50, 55, 60):
This is a text. A text has many words. Words are made from letters.
(Figure: vocabulary supra-index over the suffix array, compared with the inverted list)
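The two orderings can be compared directly on the example text. The minimal sketch below sorts the index points by the case-folded text that follows them, which gives a suffix array, and it finds all suffixes starting with a prefix by binary search. It assumes that only the content words (letters, made, many, text, words) are index points, with the other words treated as stop words. The supra-index is not modelled. For "text", the suffix array lists 19 before 11 because "text has" sorts before "text.", while the inverted list keeps text order.

```python
import re
from bisect import bisect_left, bisect_right

TEXT = "This is a text. A text has many words. Words are made from letters."
STOP_WORDS = {"this", "is", "a", "has", "are", "from"}  # assumption: not indexed in the figure

# Index points: 1-based start positions of the indexed words.
points = [m.start() + 1 for m in re.finditer(r"[A-Za-z]+", TEXT)
          if m.group().lower() not in STOP_WORDS]


def suffix(p: int) -> str:
    return TEXT[p - 1:].lower()  # case-folded text from position p to the end


suffix_array = sorted(points, key=suffix)  # index points ordered by the text that follows


def suffix_array_hits(prefix: str) -> list:
    """Binary search for the contiguous range of suffixes starting with prefix."""
    keys = [suffix(p) for p in suffix_array]  # rebuilt per call for brevity
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix + chr(0x10FFFF))  # upper sentinel above any real character
    return suffix_array[lo:hi]


def inverted_list(word: str) -> list:
    """Occurrences of word in text-position order, as an inverted list stores them."""
    return [p for p in points if re.match(r"[a-z]+", suffix(p)).group() == word]


print(suffix_array)                                      # [60, 50, 28, 19, 11, 40, 33]
print(suffix_array_hits("text"), inverted_list("text"))  # [19, 11] [11, 19]
```

Because matching suffixes form one contiguous run, the suffix array also answers prefix and phrase queries with two binary searches. The price is that positions come out in lexicographic order rather than text order.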
25????
- ????
- ??????????????????
- ????
- ????????????????
- ????????,?????????
- ????
- ???????????????
- ??????????????????
- ????(??????????)
- ?????????????????????
- ???????????????
- ??????,?????????????
- ??Trie ??,???????????,??E(??)
26 Thank you!