Provided by: lcl53
Learn more at: https://www.cs.upc.edu
1
Information Retrieval (Recuperació de la informació)
  • Modern Information Retrieval (1999)
  • Ricardo Baeza-Yates and Berthier Ribeiro-Neto
  • Flexible Pattern Matching in Strings (2002)
  • Gonzalo Navarro and Mathieu Raffinot
  • Algorithms on Strings (2001)
  • M. Crochemore, C. Hancart and T. Lecroq
  • http://www-igm.univ-mlv.fr/~lecroq/string/index.html

2
String Matching
String matching: definition of the problem (text, pattern).
The approach depends on what we have in advance: the text or the patterns.
  • Exact matching
  • The patterns --> Data structures for the patterns
  • 1 pattern --> The algorithm depends on |p| and |Σ| (pattern length and alphabet size)
  • k patterns --> The algorithm depends on k, |p| and |Σ|
  • Extensions
  • Regular expressions
  • The text --> Data structure for the text (suffix tree, ...)
  • Approximate matching
  • Dynamic programming
  • Sequence alignment (pairwise and multiple)
  • Sequence assembly: hash algorithm
  • Probabilistic search: Hidden Markov Models
3
Index
Part 1: Suffix trees. Algorithms on Strings, Trees, and Sequences, Dan Gusfield, Cambridge University Press
Part 2: Suffix arrays. Suffix arrays: a new method for on-line string searches, G. Myers, U. Manber
4
Suffix trees
Given string ababaas
Suffixes
3 abaas
1 ababaas
4 baas
2 babaas
What kind of queries?
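The suffix list above can be reproduced in a few lines of Python. This is only an illustrative sketch: the helper name suffixes is made up here, and positions are counted from 1 to match the slide.

```python
def suffixes(text):
    """Return (position, suffix) pairs; positions start at 1, as on the slide."""
    return [(i + 1, text[i:]) for i in range(len(text))]


# Print the suffixes of the slide's string in lexicographic order.
for pos, suf in sorted(suffixes("ababaas"), key=lambda pair: pair[1]):
    print(pos, suf)
```

Suffixes 3, 1, 4 and 2 come out in the same relative order as on the slide.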
5
Applications of Suffix trees
1. Exact string matching
  • Does the sequence ababaas contain any occurrence
    of the patterns abab, aab, and ab?
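A pattern occurs in the text exactly when it can be spelled from the root of a tree holding all suffixes. The sketch below answers the question with an uncompressed suffix trie (one character per edge), which is simpler than a real suffix tree but uses quadratic space. The function names build_suffix_trie and contains are assumptions made for this example.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a nested-dict trie (one char per edge)."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """A pattern occurs in the text iff it labels a path starting at the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("ababaas")
for p in ("abab", "aab", "ab"):
    print(p, contains(trie, p))
```

For the slide's string, abab and ab are found and aab is not. Each query costs time proportional to the pattern length, independent of the text length.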


6
Quadratic insertion algorithm
Invariant properties
Given the string ......
...
P1: the leaves of the suffixes from ? have been inserted
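The quadratic algorithm can be sketched as follows: insert the suffixes one at a time, walking down from the root and splitting an edge wherever the new suffix diverges. This is a minimal sketch, not the slides' exact procedure. The names Node, insert_suffix, build_suffix_tree and leaves are hypothetical, edge labels are stored as string copies for readability (real implementations store index pairs), and the text is assumed to end with a character that appears nowhere else.

```python
class Node:
    def __init__(self):
        self.children = {}  # first char of edge label -> (label, child node)
        self.suffix = None  # 1-based suffix number, set on leaves only


def insert_suffix(root, text, i):
    """Insert text[i:] by walking from the root; split an edge on mismatch."""
    node, rest = root, text[i:]
    while True:
        first = rest[0]
        if first not in node.children:
            leaf = Node()
            leaf.suffix = i + 1
            node.children[first] = (rest, leaf)
            return
        label, child = node.children[first]
        k = 0
        while k < len(label) and k < len(rest) and label[k] == rest[k]:
            k += 1
        if k == len(label):          # whole edge matched: continue below it
            node, rest = child, rest[k:]
            continue
        mid = Node()                 # mismatch inside the edge: split it
        mid.children[label[k]] = (label[k:], child)
        node.children[first] = (label[:k], mid)
        leaf = Node()
        leaf.suffix = i + 1
        mid.children[rest[k]] = (rest[k:], leaf)
        return


def build_suffix_tree(text):
    # Assumption: the last character is a unique terminator.
    assert text and text.count(text[-1]) == 1
    root = Node()
    for i in range(len(text)):
        insert_suffix(root, text, i)
    return root


def leaves(node):
    """Collect the suffix numbers stored at the leaves below node."""
    if not node.children:
        return [node.suffix]
    found = []
    for _, child in node.children.values():
        found.extend(leaves(child))
    return found


tree = build_suffix_tree("ababaabbs")
print(sorted(leaves(tree)))
```

Each insertion may scan the whole suffix, so the total cost is quadratic in the length of the text. With the string ababaabbs, inserting suffix 4 splits the edge babaabbs into ba followed by baabbs, leading to leaf 2.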
7-21
Quadratic insertion algorithm
Given the string ababaabbs
[Figure sequence: the suffixes of ababaabbs are inserted one at a time. The tree starts with the leaves ababaabbs,1 and babaabbs,2; later insertions split edges, producing the internal edge ba above the leaf baabbs,2.]
22
Generalized suffix tree
The suffix tree of a set of strings is called the generalized suffix tree.
It is the suffix tree of the concatenation of the strings.
For instance,
23-36
Generalized suffix tree
Construction of the suffix tree of ababaabbaaabaaß, given the suffix tree of ababaaba
[Figure sequence: the remaining suffixes are inserted one at a time into the existing tree. Leaves are labelled with the rest of the text and a suffix number, for example baabba,1, bba,3, ba,5, aaß,1, ß,2, ß,3 and ß,4. The last slide shows the complete generalized suffix tree of ababaabbaaabaaß.]
37
Applications of Generalized Suffix trees
1. The substring problem for a database of
strings DB
  • Does the DB contain any occurrence of the patterns
    abab, aab, and ab?
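One way to picture the database query is a single index over the suffixes of all strings, where each node remembers which strings pass through it. The sketch below uses an uncompressed trie rather than a true generalized suffix tree, so it needs no terminators but uses quadratic space. The two database strings and all names (TrieNode, build_generalized_trie, strings_containing) are assumptions made for this example.

```python
class TrieNode:
    def __init__(self):
        self.next = {}    # character -> child node
        self.ids = set()  # indices of the strings that contain this path


def build_generalized_trie(strings):
    """Insert every suffix of every string, tagging nodes with string indices."""
    root = TrieNode()
    root.ids = set(range(len(strings)))  # the empty pattern occurs everywhere
    for sid, s in enumerate(strings):
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node.next.setdefault(ch, TrieNode())
                node.ids.add(sid)
    return root


def strings_containing(root, pattern):
    """Return the indices of the database strings that contain pattern."""
    node = root
    for ch in pattern:
        node = node.next.get(ch)
        if node is None:
            return set()
    return set(node.ids)


db = ["ababaabb", "aabaa"]  # hypothetical two-string database
index = build_generalized_trie(db)
for p in ("abab", "aab", "ab"):
    print(p, sorted(strings_containing(index, p)))
```

With these two strings, abab is found only in the first one, while aab and ab are found in both.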

38
Applications of Generalized Suffix trees
2. The longest common substring of two strings
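In a generalized suffix structure, the longest common substring is the deepest path whose node is reached by suffixes of both strings. The sketch below illustrates the idea on an uncompressed trie, which is simpler than a suffix tree but quadratic in space. The names TrieNode and longest_common_substring and the two sample strings are assumptions for illustration.

```python
class TrieNode:
    def __init__(self):
        self.next = {}    # character -> child node
        self.ids = set()  # which of the two strings reach this node


def longest_common_substring(a, b):
    """Deepest trie path that is spelled by suffixes of both a and b."""
    root = TrieNode()
    for sid, s in enumerate((a, b)):
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node.next.setdefault(ch, TrieNode())
                node.ids.add(sid)
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for ch, child in node.next.items():
            if len(child.ids) == 2:  # path occurs in both strings
                stack.append((child, path + ch))
    return best


print(longest_common_substring("ababaabb", "aabaa"))
```

For these two sample strings the result is abaa. When several common substrings share the maximum length, this sketch returns one of them.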
39
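Definition of MUM
A MUM (Maximal Unique Match) is a match between two sequences that is:
  • Matching: the same substring appears in both sequences
  • Unique: it occurs exactly once in each sequence
  • Maximal: it cannot be extended to the left or to the right and still match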
40
Applications of Generalized Suffix trees
3. Finding MUMs.
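To make the three conditions concrete, the sketch below checks them by brute force. It is not the suffix-tree method the slides refer to, only a direct test of the definition, and it is far slower. The names occurrences and find_mums, the min_len parameter and the two sample strings are assumptions for this example; positions are reported from 1.

```python
def occurrences(text, w):
    """Count occurrences of w in text, overlapping ones included."""
    return sum(1 for k in range(len(text) - len(w) + 1) if text.startswith(w, k))


def find_mums(a, b, min_len=1):
    """Return (position in a, position in b, substring) for every MUM."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue  # extendable to the left, so not maximal
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1    # extend to the right as far as possible
            w = a[i:i + k]
            if k >= min_len and occurrences(a, w) == 1 and occurrences(b, w) == 1:
                mums.append((i + 1, j + 1, w))
    return mums


print(find_mums("ababaabb", "aabaa"))
```

For the two sample strings this reports abaa (position 3 in the first string, 2 in the second) and aab (positions 5 and 1).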
41
Quadratic insertion algorithm
Invariant properties
Given the string ......
...
P1: the leaves of the suffixes from ? have been inserted
42
Linear insertion algorithm
Invariant properties
Given the string ......
P1: the leaves of the suffixes from ? have been inserted
P2: the string ? is the longest string that can be spelled through the tree.
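For comparison, here is a compact sketch of one well-known linear-time construction, Ukkonen's online algorithm. It may differ from the variant animated on the slides. It also keeps track of how much of the pending suffix is already in the tree (the active point) and uses suffix links to avoid rescanning. The names Node, build_suffix_tree and contains are assumptions, and the $ terminator appended to the slide's string is hypothetical.

```python
class Node:
    def __init__(self, start, end):
        self.start = start  # edge label is text[start:end]
        self.end = end      # None marks a leaf: label runs to the end of the text
        self.children = {}
        self.link = None    # suffix link, used by internal nodes


def build_suffix_tree(text):
    root = Node(0, 0)
    node, edge, length = root, 0, 0  # active point
    remaining = 0                    # suffixes still waiting to be inserted
    for i, ch in enumerate(text):
        remaining += 1
        last_internal = None
        while remaining > 0:
            if length == 0:
                edge = i
            first = text[edge]
            child = node.children.get(first)
            if child is None:
                node.children[first] = Node(i, None)  # new leaf
                if last_internal is not None:
                    last_internal.link = node
                    last_internal = None
            else:
                end = i + 1 if child.end is None else child.end
                span = end - child.start
                if length >= span:   # active point lies below this edge
                    edge += span
                    length -= span
                    node = child
                    continue
                if text[child.start + length] == ch:  # already in the tree
                    if last_internal is not None and node is not root:
                        last_internal.link = node
                        last_internal = None
                    length += 1
                    break
                split = Node(child.start, child.start + length)
                node.children[first] = split
                split.children[ch] = Node(i, None)
                child.start += length
                split.children[text[child.start]] = child
                if last_internal is not None:
                    last_internal.link = split
                last_internal = split
            remaining -= 1
            if node is root and length > 0:
                length -= 1
                edge = i - remaining + 1
            elif node is not root:
                node = node.link or root
    return root


def contains(root, text, pattern):
    """Spell pattern from the root, comparing against whole edge labels."""
    node, k = root, 0
    while k < len(pattern):
        child = node.children.get(pattern[k])
        if child is None:
            return False
        end = len(text) if child.end is None else child.end
        label = text[child.start:end]
        part = pattern[k:k + len(label)]
        if not label.startswith(part):
            return False
        k += len(part)
        node = child
    return True


text = "ababaababb$"  # the slide's string plus a hypothetical terminator
tree = build_suffix_tree(text)
print(contains(tree, text, "baab"), contains(tree, text, "bbb"))
```

Edges are stored as index pairs into the text, so the tree takes linear space. Leaf edges are left open-ended, so they grow automatically as new characters are read.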
43-63
Linear insertion algorithm example
Given the string ababaababb...
[Figure sequence: the tree is extended step by step while the position markers advance from 6 7 8 to 8 9 and then 9. Leaves visible on the slides include ababb...,3, ababb...,4, ababb...,5 and aababb...,2; the new leaves b...,7, b...,8 and b...,9 are added in turn.]
64-72
Linear insertion algorithm
Given the string ababaababs
[Figure sequence: step-by-step linear insertion for the string ababaababs.]
73
Index
Part 1: Suffix trees. Algorithms on Strings, Trees, and Sequences, Dan Gusfield, Cambridge University Press
Part 2: Suffix arrays. Suffix arrays: a new method for on-line string searches, G. Myers, U. Manber
74
Suffix arrays
Given the string ababaa

Suffixes      Suffixes, lexicographically sorted
1 ababaa      6 a
2 babaa       5 aa
3 abaa        3 abaa
4 baa         1 ababaa
5 aa          4 baa
6 a           2 babaa

What is the cost?
O(n log(n))
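The simplest way to obtain a suffix array is to sort the suffix start positions by the suffixes they point to. The one-liner below is a sketch of that idea; the function name suffix_array is an assumption, and positions are counted from 1 as on the slide.

```python
def suffix_array(text):
    """Start positions (1-based) of the suffixes in lexicographic order."""
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


print(suffix_array("ababaa"))  # [6, 5, 3, 1, 4, 2]
```

Sorting this way uses O(n log(n)) suffix comparisons, but each comparison can itself take up to n character steps, so this naive version is slower than the bound quoted above in the worst case.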
75
Applications of suffix arrays
1. Exact string matching
  • Does the sequence ababaas contain any occurrence
    of the patterns abab, aab, and ab?

Binary search
76
Search with cost O(log(n) · P)
Invariant Properties
77
Search with cost O(log(n) · P)
Invariant Properties
Algorithm
If the middle suffix < query then α ← middle, else β ← middle
Cost
O(log(n) · P)
Can it be improved to O(log(n) + P)?
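The binary search can be sketched as follows, with lo and hi playing the role of the two bounds α and β. The invariant assumed here is that every suffix at or before lo is smaller than the query and every suffix at or after hi is not. The names suffix_array and occurs are hypothetical, and positions are 0-based in this sketch.

```python
def suffix_array(text):
    """Start positions (0-based) of the suffixes in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurs(text, sa, pattern):
    """Binary search for pattern; each step compares at most len(pattern) chars."""
    lo, hi = -1, len(sa)  # virtual bounds just outside the array
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid      # middle suffix is smaller than the query
        else:
            hi = mid      # middle suffix is greater or starts with the query
    return hi < len(sa) and text.startswith(pattern, sa[hi])


text = "ababaas"
sa = suffix_array(text)
for p in ("abab", "aab", "ab"):
    print(p, occurs(text, sa, p))
```

There are about log(n) iterations and each compares up to P characters, which gives the O(log(n) · P) cost stated above. For ababaas, the patterns abab and ab are found and aab is not.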
78
Fast search with cost O(log(n) + P)
Invariant Properties
79
Fast search with cost O(log(n) + P)
Suffix array
1 2 ... n
Invariant Properties
Algorithm
If x < y then α ← middle; if x > y then β ← middle; if x = y then fi
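A simpler relative of the fast search keeps, for each bound, the number of pattern characters already known to match there, and skips that many characters in the next comparison. The sketch below implements only this simplification. It is not the full O(log(n) + P) method, which also needs precomputed longest-common-prefix values between suffix-array entries, so its worst case is still O(log(n) · P). All names (suffix_array, match_len, find) are assumptions, and positions are 0-based.

```python
def suffix_array(text):
    """Start positions (0-based) of the suffixes in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def match_len(text, start, pattern, k):
    """Extend a known match of length k between pattern and text[start:]."""
    while k < len(pattern) and start + k < len(text) and text[start + k] == pattern[k]:
        k += 1
    return k


def find(text, sa, pattern):
    """Return the start of one occurrence of pattern, or -1 if there is none."""
    lo, hi = -1, len(sa)  # virtual bounds just outside the array
    l = r = 0             # matched prefix lengths at the lo and hi bounds
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Every suffix between the bounds shares min(l, r) chars with pattern.
        k = match_len(text, sa[mid], pattern, min(l, r))
        pos = sa[mid] + k
        if k < len(pattern) and (pos == len(text) or text[pos] < pattern[k]):
            lo, l = mid, k  # middle suffix is smaller than the pattern
        else:
            hi, r = mid, k  # middle suffix is greater or starts with the pattern
    if hi < len(sa) and r == len(pattern):
        return sa[hi]
    return -1


text = "ababaa"
sa = suffix_array(text)
print(find(text, sa, "baa"), find(text, sa, "aab"))
```

The position returned is that of the lexicographically smallest suffix that starts with the pattern.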