Title: SCALED Pattern Matching
1SCALED Pattern Matching
- Amihood Amir, Bar-Ilan University and Johns Hopkins University
- Ayelet Butman and Moshe Lewenstein, Bar-Ilan University
2Motivation
- Searching for templates in aerial photographs
- Input: aerial photo, template
- Task: search for all locations where the template appears in the image.
4Model
- Low level (pixel level)
- avoid costly processing
- Asymptotically efficient solutions.
- Serial, exact algorithms.
5Types of Approximations
- Local errors: level of detail, occlusion, noise
- Results:
- O(n² log m) for mismatches
- O(n²k²) for edit distance with k errors, rectangular patterns (AL-88)
- O(n²k √(m log m) √(k log k)) for edit distance with k errors, half-rectangular patterns (AF-95)
6Types of Approximation
- Orientation
- Results: O(n²m ) (FU-98), O(n²m³) (ACL-98)
- Scaling: natural scales
- Results: O(n) 1-d (EV-88), O(n² log |Σ|) 2-d (ALV-92), O(n²) dictionary (AC-96)
- Scaling: real scales
- This result: O(n) 1-d, truncation
7It seems daunting, but
8CPM 2003 Morelia, Mexico
9Problem inherently inexact
- What if the occurrence is 1½ times bigger?
- What is the meaning of ½ a pixel?
- Solutions until now: natural scales
- Consider only discrete scales: 1, 2, 3, 4, 5, . . .
10Definition
- Text: n × n array
- Pattern: m × m array
- Find all occurrences of the pattern in the text in all discrete sizes.
11Discrete exact Scaled Matching
- P
A A A
A C A
A A A
- T
A A A A A A A A A A A A A
A A A A A A A A A A C C A
A A A C C A A A A A C C A
A A A C C A A A A A A A A
A A A A A A A A A A A A A
A A A A A A A A A C C A A
A A A A A A A A A C C A A
A A A C C C A A A A A A A
A A A C C C A A A A A A A
A A A C C C A A A A C A A
A A A A A A A A A A A A A
A A A A A A A A A C C A C
A A A A A A A A A A A A A
12Discrete exact Scaled Matching
- P
Z U Y
K V S
X E T
- P³
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
K K K V V V S S S
K K K V V V S S S
K K K V V V S S S
X X X E E E T T T
X X X E E E T T T
X X X E E E T T T
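The scaling shown on this slide can be sketched in a few lines of Python. This is only an illustrative brute-force search, not the O(n²) algorithm of the talk: the function names `scale_up` and `discrete_scaled_matches` are hypothetical, arrays are assumed to be square and given as lists of equal-length strings, and reported positions are 0-based. `T` and `P` are the text and pattern from the Discrete exact Scaled Matching example.

```python
def scale_up(pattern, s):
    """Scale a square pattern (list of strings) to natural scale s."""
    rows = []
    for row in pattern:
        wide = "".join(ch * s for ch in row)  # repeat each symbol s times
        rows.extend([wide] * s)               # repeat each widened row s times
    return rows


def discrete_scaled_matches(text, pattern):
    """Brute force: list (scale, row, col) for every natural scale that fits."""
    n, m = len(text), len(pattern)
    hits = []
    for s in range(1, n // m + 1):
        target = scale_up(pattern, s)
        size = m * s
        for r in range(n - size + 1):
            for c in range(n - size + 1):
                # Compare the size x size window at (r, c) row by row.
                if all(text[r + i][c:c + size] == target[i] for i in range(size)):
                    hits.append((s, r, c))
    return hits


P = ["AAA", "ACA", "AAA"]
T = [
    "AAAAAAAAAAAAA",
    "AAAAAAAAAACCA",
    "AAACCAAAAACCA",
    "AAACCAAAAAAAA",
    "AAAAAAAAAAAAA",
    "AAAAAAAAACCAA",
    "AAAAAAAAACCAA",
    "AAACCCAAAAAAA",
    "AAACCCAAAAAAA",
    "AAACCCAAAACAA",
    "AAAAAAAAAAAAA",
    "AAAAAAAAACCAC",
    "AAAAAAAAAAAAA",
]

for row in scale_up(["ZUY", "KVS", "XET"], 3):
    print(row)
print(discrete_scaled_matches(T, P))
```

The brute-force comparison costs far more than the bound claimed in the talk; it is meant only to make the definition of a discrete scaled occurrence concrete.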
13Idea: Fix a scale s
- Partition the n × n text into s × s squares (s-blocks), n/s per side.
- Constant amount of work for each square (s-block)
14Algorithm time
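- Time for scale s: O((n/s)²), a constant amount of work for each of the (n/s)² s-blocks
- Total time: O(n² (1/1² + 1/2² + 1/3² + . . .))
- The sum 1/1² + 1/2² + 1/3² + . . . converges to a constant, making the total time O(n²)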
15Problem: Real scales
- Was open even for strings
- How do we define?
- aabcccbb
- Scaled to 2: aaaabbccccccbbbb
- Scaled to 1½: aaab(½b)cccc(½c)bbb
- Truncate the fractional symbols ½b and ½c, giving aaabccccbbb
16Formally
- Denote a^r = aaa . . . a (r times).
- Problem Definition 1
- Input: Pattern P = a1^r1 a2^r2 . . . aj^rj, Text T
- Output: All text locations where a1^⌊α·r1⌋ a2^⌊α·r2⌋ . . . aj^⌊α·rj⌋ appears, for some α ≥ 1
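Scaling with truncation can be sketched as follows. The function name `scale_string` is hypothetical, and the sketch assumes each run of r equal symbols is replaced by ⌊α·r⌋ copies, as in the aabcccbb example. Exact rational arithmetic (`fractions.Fraction`) is used so that the floor is not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import groupby
from math import floor


def scale_string(s, alpha):
    """Scale every run of s by alpha, truncating fractional symbols."""
    alpha = Fraction(alpha)
    pieces = []
    for symbol, run in groupby(s):
        r = sum(1 for _ in run)                   # run length
        pieces.append(symbol * floor(alpha * r))  # keep floor(alpha * r) copies
    return "".join(pieces)


print(scale_string("aabcccbb", 2))
print(scale_string("aabcccbb", Fraction(3, 2)))
```

Passing the scale as a `Fraction` (or an integer) keeps the computation exact; a float argument is converted to the exact rational it represents, which may differ slightly from the intended value.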
17Remark
- α ≥ 1 means we only scale up.
- Reasons: Avoid the conceptual problem of loss of resolution.
- From far enough away everything looks the same.
- By our definition, for k < 1/m there is a match at every text location.
18Simplify definition
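- Definition 2: Look for the scaled pattern in the text, with every scaled run of the pattern, including the first and last, matching a whole run of the text.
- Example: P = aabcccbbbb
- Match by definition 2: daaabccccbbbbbbe
- Match by definition 1 but not by definition 2: daaaabccccbbbbbbbe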
19Why are definitions equivalent?
- Split text and pattern into
- symbol part Ts, Ps and
- length part TL, PL.
- Example: P = aabcccbbbb
- Ps = abcb
- PL = 2 1 3 4
- T = daaabccccbbbbbbe
- Ts = dabcbe
- TL = 1 3 1 4 6 1
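The split into a symbol part and a length part is a run-length encoding. A minimal sketch, with `split_runs` as a hypothetical helper name:

```python
from itertools import groupby


def split_runs(s):
    """Return (symbol part, length part) of the run-length encoding of s."""
    symbols, lengths = [], []
    for symbol, run in groupby(s):
        symbols.append(symbol)
        lengths.append(sum(1 for _ in run))  # number of symbols in this run
    return "".join(symbols), lengths


Ps, PL = split_runs("aabcccbbbb")
Ts, TL = split_runs("daaabccccbbbbbbe")
print(Ps, PL)
print(Ts, TL)
```

The split takes one pass over each string, so it is linear in the lengths of the text and the pattern.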
20Time
- Time for split: O(n+m)
- Finding Ps in Ts: O(n+m) (e.g. KMP)
- HARD PART: Finding PL in TL.
21Definitions are Equivalent
- Claim: Solving def 2 in time O(f(n)) implies solving def 1 in time O(f(n)).
- Why?
- Find the pattern without its first and last runs, by def 2, in time O(f(n)).
- For each match, verify the 1st and last symbol in constant time in Ts and TL.
- Total time: O(f(n)+n) = O(f(n)).
22Naïve algorithm for matching PL in TL
- For each text location, position the pattern starting at that location and calculate the interval [t/p, (t+1)/p) for each resulting <text, pattern> pair.
- This is the interval of possible scales, since ⌊αp⌋ = t exactly when t ≤ αp < t+1:
- (t/p)·p = t, and for every α < t/p, αp < t
- ((t+1)/p)·p = t+1, while for every α ≥ t/p, αp ≥ t
23Check intersection
- If the intersection of all intervals is not empty then there is a match.
- Time: O(nm)
- Example:
- PL = 2 1 2 3 2
- TL = 2 4 2 4 7 4 5 3
- Location 1: [1, 3/2), [4, 5)
- The intersection is empty, thus no scaled match in location 1. But . . .
24Check intersection
- If the intersection of all intervals is not empty then there is a match.
- Time: O(nm)
- Example:
- PL = 2 1 2 3 2
- TL = 2 4 2 4 7 4 5 3
- Location 2: [2, 5/2), [2, 3), [2, 5/2), [7/3, 8/3), [2, 5/2)
- The intersection is [7/3, 5/2), thus there is a scaled match in location 2.
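The naïve O(nm) check can be sketched as follows. The names `scale_interval` and `naive_scaled_matches` are hypothetical; the sketch assumes a non-empty pattern, uses 0-based start indices internally, reports 1-based locations to agree with the example, and starts the lower bound at 1 because only scales α ≥ 1 are allowed.

```python
from fractions import Fraction


def scale_interval(TL, PL, start):
    """Intersect the intervals [t/p, (t+1)/p) for PL aligned at TL[start].

    Returns (lo, hi), meaning lo <= alpha < hi, or None if the
    intersection is empty.
    """
    lo, hi = Fraction(1), None            # only scales alpha >= 1 are allowed
    for k, p in enumerate(PL):
        t = TL[start + k]
        lo = max(lo, Fraction(t, p))      # need alpha >= t/p
        upper = Fraction(t + 1, p)        # need alpha < (t+1)/p
        hi = upper if hi is None else min(hi, upper)
        if lo >= hi:
            return None
    return lo, hi


def naive_scaled_matches(TL, PL):
    """Map each 1-based matching location to its interval of scales."""
    matches = {}
    for start in range(len(TL) - len(PL) + 1):
        interval = scale_interval(TL, PL, start)
        if interval is not None:
            matches[start + 1] = interval
    return matches


print(naive_scaled_matches([2, 4, 2, 4, 7, 4, 5, 3], [2, 1, 2, 3, 2]))
```

Rational arithmetic keeps the endpoint comparisons exact, which matters because the intervals are half-open and can touch at a single point.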
25Improvement: Parameterized Matching
- Introduced by Baker, 1994.
- Motivation: copying code.
26Parameterized Matching
- Input: two strings s and t, |s| = |t|, over alphabets Σs and Σt.
- s parameterize matches t if there is a bijection π: Σs → Σt such that π(s) = t.
- Example: s = a a b b b, t = x x y y y, with π(a) = x and π(b) = y.
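A direct check of this definition for two equal-length strings might look like the sketch below. The function name `p_match` is hypothetical; it returns the mapping on the symbols that actually occur, or `None` when no bijection exists. It only tests one alignment and is not the linear-time matching algorithm of AFM-94.

```python
def p_match(s, t):
    """Return the symbol mapping if s parameterize matches t, else None."""
    if len(s) != len(t):
        return None
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # forward keeps the mapping a function; backward keeps it one-to-one.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return None
    return forward


print(p_match("aabbb", "xxyyy"))
print(p_match("aabbb", "xxxxx"))
```

Because two empty strings yield an empty (falsy) mapping, callers should compare the result against `None` rather than test its truth value.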
27Parameterized Matching
- Claim (AFM-94):
- For Σ that can be sorted in linear time (e.g. Σ = {1, . . . , n}),
- parameterized matching can be done in time O(n).
28The reduction
- Lemma: There is an α ≥ 1 for which PL matches TL at location i scaled to α only if PL p-matches TL at i.
- Proof: Assume PL does not p-match TL at location i. The possible situations are:
29Possibility 1
- TL: a, c ≠ a
- PL: b, b
- w.l.o.g. c ≥ a+1
- For c = a+1 (smallest possible), the intervals [a/b, (a+1)/b) and [(a+1)/b, (a+2)/b) do not intersect.
30Possibility 2
- TL: a, a
- PL: b, c ≠ b
- w.l.o.g. c ≥ b+1
- Intersection not empty only if (a+1)/(b+1) > a/b, i.e. ab + b > ab + a, i.e. b > a.
- But this can never happen if α ≥ 1, since then a = ⌊αb⌋ ≥ b.
31Algorithm for Real Scaled String Matching
- Let Pi1, Pi2, . . ., Pij be the different numbers in PL.
- P-match PL in TL.
- For each match, check the intersection of the intervals between Pi1, . . . , Pij and the corresponding symbols in TL.
- End Algorithm
32Example
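- PL = 2 3 2 3 2
- Pi1 = 2, Pi2 = 3
- TL = 5 6 5 6 5 6 10 6 10 6 10 7
- PL p-matches TL at locations 1, 2, 6 and 7, e.g. 5 6 5 6 5 at location 1.
- Only location 6 (6 10 6 10 6) is a scaled match: the intervals [3, 7/2) and [10/3, 11/3) intersect in [10/3, 7/2).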
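The two phases of the algorithm (p-match, then interval verification on the different numbers only) can be sketched as below. All names are hypothetical, and the p-matching step here is a simple O(nm) check used as a stand-in; it is not the linear-time parameterized matching of AFM-94. Locations are reported 1-based to agree with the example.

```python
from fractions import Fraction


def p_matches_at(TL, PL, i):
    """True if PL parameterize matches TL[i : i + len(PL)]."""
    forward, backward = {}, {}
    for k, p in enumerate(PL):
        t = TL[i + k]
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return False
    return True


def real_scaled_matches(TL, PL):
    """List (location, lo, hi): PL matches at location for lo <= alpha < hi."""
    first = {}                              # different number -> first index in PL
    for k, p in enumerate(PL):
        first.setdefault(p, k)
    hits = []
    for i in range(len(TL) - len(PL) + 1):
        if not p_matches_at(TL, PL, i):
            continue
        lo, hi = Fraction(1), None          # only scales alpha >= 1
        for p, k in first.items():          # one interval per different number
            t = TL[i + k]
            lo = max(lo, Fraction(t, p))
            upper = Fraction(t + 1, p)
            hi = upper if hi is None else min(hi, upper)
        if hi is not None and lo < hi:
            hits.append((i + 1, lo, hi))
    return hits


print(real_scaled_matches([5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7], [2, 3, 2, 3, 2]))
```

Checking only the first occurrence of each different number is enough after a p-match, because every later occurrence of the same pattern number is aligned with the same text number and would give the same interval.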
33Important Fact
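- The different numbers in PL are distinct positive integers whose sum is at most m, so 1 + 2 + . . . + j ≤ m.
- So there are at most O(√m) different Piks.
- Time: O(n) for parameterized matching (Σ = {1, 2, . . . , n}).
- O(√m) verification for each location.
- Total: O(n√m).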
34Tighter analysis
- Upper bound the number of possible p-matches.
- Lemma: Let |P| = m, |T| = n, and let Pi1, Pi2, . . ., Pij be the different numbers in PL.
- Then there are at most 2n/j p-matches of PL in TL.
- Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is
- O((2n/j) · j) = O(n)
35Proof of Lemma
- 1st appearance of Pi1, . . . , Pij
- PL: . . . Pi1 . . . Pi2 . . . Pij . . .
- TL: . . . a1 . . . a2 . . . aj . . .
- m-match
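36Lemma's proof (cont.)
- Let x be the total number of p-matches in the text.
- The sum of all text elements that match 1st occurrences of Piks in the pattern is at least (xj²)/2, since each p-match contributes j distinct positive integers, whose sum is at least 1 + 2 + . . . + j ≥ j²/2.
- But: There are overlaps!
- How many?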
37Lemma's proof (cont.)
- For each text location, at most j matches will count it. Therefore
- Total count without overlaps ≥ ((xj²)/2)/j = xj/2
- Clearly xj/2 ≤ n, thus
- x ≤ (2n)/j
38Open Problem
- Give 1-d algorithm linear in run-length
compressed text and pattern.