Title: Optimization of Sequence Queries in Database Systems
1Optimization of Sequence Queries in Database
Systems
- Reza Sadri Carlo Zaniolo
- reza_at_cs.ucla.edu zaniolo_at_cs.ucla.edu
- sadri_at_procom.com
-
- Amir Zarkesh Jafar Adibi
- azarkesh_at_u4cast.com jabibi_at_u4cast.com
2Time series Analysis
- Many Applications
- Querying purchase patterns for marketing
- Stock market analysis
- Studying meteorological data
- Whats needed
- Expressive query language for finding complex
patterns in database sequences - Efficient and scalable implementation Query
Optimization
3SQL-TS
- A query language for finding complex patterns in
sequences - Minimal extension of SQLonly the from clause
affected - A new Query optimization technique based on
extensions of the Knuth, Morris Pratt (KMP)
string-search algorithm
4Text Search Optimization
- Boyer and Moore
- Precomputed shift functions for each character
and sub-pattern - Dependent to the alphabet size
- Works best for non-repeating patterns
- O(mn) worst case time
- Knuth, Morris and Pratt (KMP)
- Independent of the alphabet size
- Most efficient in general O(mn) time
- Karp and Rabin
- Prefix Hashing
- Dependent to the alphabet size
- O(mn) worst case time
5Optimized string searchKMP
Consider text array text and pattern array p i
1 2 3 4 5 6 7 8
9 10 11 texti a b a b a
b c a b c a j 1 2
3 4 5 6 patternj a b a b c
a
- After failing, use the information acquired so
to - - backtrack to shift(j), rather than i1,
and - - only check pattern values after next(j)
- But in SQL-TS we have general predicates star
patterns
6shift and next
- Success for first j-1 elements of pattern.
Failure for jth element (when input is at i) - Any shift less than shift(j) is guaranteed to
lead to failure, - Match elements in the pattern starting at next(j)
7Equality Predicates KMP suffices
- Find companies whose closing stock price in
- three consecutive days was 10, 11, and 15.
- SELECT X.name
- FROM quote CLUSTER BY name
- SEQUENCE BY date AS (X, Y, Z)
- WHERE X.price 10 AND Y.price11
- AND Z.price15
- But in SQL-TS we have general predicates
8Optimal Pattern Search (OPS)
Search path for naive algorithm vs. optimized
algorithm
9Beyond KMP General Predicates
- For IBM stock prices, find all instances where
the pattern of two successive drops followed by
two successive increases, and the drops take the
price to a value between 40 and 50, and the first
increase doesn't move the price beyond 52. - SELECT X.date AS start_date, X.price
- U.date AS end_date, U.price
- FROM quote
- CLUSTER BY name
- SEQUENCE BY date
- AS (X, Y, Z, T, U)
- WHERE X.name'IBM'
- AND Y.price lt X.price
- AND Z.price lt Y.price
- AND 40 lt Z.price lt 50
- AND Z.price lt T.price
- AND T.price lt 52
- AND T.price lt U.price
10 Beyond KMP Star Patterns
- Relaxed Double Bottom
- Only considering increases and decreases that are
more than 2
11Relaxed Double Bottom Ninety fold improvement
12Relaxed Double Bottom in June 1990
13Conclusion
- Significant speedupsfrom 6 to 900 times faster
- Queries, partial ordered domains, aggregates also
treated in this approach - Many other optimization opportunities e.g.,
parallel search for multiple patterns
14shift and next
- Success for first j-1 elements of pattern.
Failure for jth element (when input is at i) - Any shift less than shift(j) is guaranteed to
lead to failure, - Match elements in the pattern starting at next(j)
15General Predicates--Cont
- p1(t) (t.price lt t.previous.price)
- p2(t) (t.price lt t.previous.price) ?
(40ltt.pricelt50) - p3(t) (t.price gt t.previous.price) ?
(t.pricelt52) - p4(t) (t.price gt t.previous.price)
- And we need to find the implication between this
pattern elements
16Matrices q and j Input tested on pj is now
tested against pk
pj succeeded
pj failed
Combing values of these lower triangular matrices
( j ³ k), We derive the values of next(j) and
shift (j)
17Example
18STAR Patterns
- SELECT X.NEXT.date, X.NEXT.price,
- S.previous.date, S.previous.price
- FROM quote
- CLUSTER BY name,
- SEQUENCE BY date
- AS (X, Y, Z, T, U, V, S)
- WHERE
- X.name'IBM AND X.price gt X.previous.price
- AND 30 lt Y.price AND Y.price lt 40
- AND Z.price lt Z.previous.price AND T.price gt
T.previous.price - AND 35 lt U.price AND U.price lt 40
- AND V.price lt V.previous.price AND S.price lt
30
19Handling Star Patterns
- Same input, Transitions on Original Pattern vs.
Transitions on Pattern after the index set back
j-k - ?21
- ?
- ?31 ? ?32
- ? ?
- ?41 ? ?42 ? ?43
- ? ?
- Example Elements j and k are star predicates and
?jk is U - U ? ?j,k1
- ?
- ?j1,k ?j1,k1
20Possible Transitions
- Elements j and k are star predicates and ?jk is
U - U ? ?j,k1
- ?
- ?j1,k ?j1,k1
- Elements j and k are star predicates and ?jk is
1 - 1 ?j,k1
- ?
- ?j1,k ?j1,k1
- Elements j and k are not star predicates
- ?j,k ?j,k1
-
- ?j1,k ?j1,k1
21Implication Graph