Optimization of Sequence Queries in Database Systems - PowerPoint PPT Presentation

About This Presentation
Title:

Optimization of Sequence Queries in Database Systems

Description:

none – PowerPoint PPT presentation

Number of Views:19
Avg rating:3.0/5.0
Slides: 22
Provided by: rezas2
Learn more at: http://web.cs.ucla.edu
Category:

less

Transcript and Presenter's Notes

Title: Optimization of Sequence Queries in Database Systems


1
Optimization of Sequence Queries in Database
Systems
  • Reza Sadri Carlo Zaniolo
  • reza_at_cs.ucla.edu zaniolo_at_cs.ucla.edu
  • sadri_at_procom.com
  • Amir Zarkesh Jafar Adibi
  • azarkesh_at_u4cast.com jabibi_at_u4cast.com

2
Time series Analysis
  • Many Applications
  • Querying purchase patterns for marketing
  • Stock market analysis
  • Studying meteorological data
  • Whats needed
  • Expressive query language for finding complex
    patterns in database sequences
  • Efficient and scalable implementation Query
    Optimization

3
SQL-TS
  • A query language for finding complex patterns in
    sequences
  • Minimal extension of SQLonly the from clause
    affected
  • A new Query optimization technique based on
    extensions of the Knuth, Morris Pratt (KMP)
    string-search algorithm

4
Text Search Optimization
  • Boyer and Moore
  • Precomputed shift functions for each character
    and sub-pattern
  • Dependent to the alphabet size
  • Works best for non-repeating patterns
  • O(mn) worst case time
  • Knuth, Morris and Pratt (KMP)
  • Independent of the alphabet size
  • Most efficient in general O(mn) time
  • Karp and Rabin
  • Prefix Hashing
  • Dependent to the alphabet size
  • O(mn) worst case time

5
Optimized string searchKMP
Consider text array text and pattern array p i
1 2 3 4 5 6 7 8
9 10 11 texti a b a b a
b c a b c a j 1 2
3 4 5 6 patternj a b a b c
a
  • After failing, use the information acquired so
    to
  • - backtrack to shift(j), rather than i1,
    and
  • - only check pattern values after next(j)
  • But in SQL-TS we have general predicates star
    patterns

6
shift and next
  • Success for first j-1 elements of pattern.
    Failure for jth element (when input is at i)
  • Any shift less than shift(j) is guaranteed to
    lead to failure,
  • Match elements in the pattern starting at next(j)

7
Equality Predicates KMP suffices
  • Find companies whose closing stock price in
  • three consecutive days was 10, 11, and 15.
  • SELECT X.name
  • FROM quote CLUSTER BY name
  • SEQUENCE BY date AS (X, Y, Z)
  • WHERE X.price 10 AND Y.price11
  • AND Z.price15
  • But in SQL-TS we have general predicates

8
Optimal Pattern Search (OPS)
Search path for naive algorithm vs. optimized
algorithm
9
Beyond KMP General Predicates
  • For IBM stock prices, find all instances where
    the pattern of two successive drops followed by
    two successive increases, and the drops take the
    price to a value between 40 and 50, and the first
    increase doesn't move the price beyond 52.
  • SELECT X.date AS start_date, X.price
  • U.date AS end_date, U.price
  • FROM quote
  • CLUSTER BY name
  • SEQUENCE BY date
  • AS (X, Y, Z, T, U)
  • WHERE X.name'IBM'
  • AND Y.price lt X.price
  • AND Z.price lt Y.price
  • AND 40 lt Z.price lt 50
  • AND Z.price lt T.price
  • AND T.price lt 52
  • AND T.price lt U.price

10
Beyond KMP Star Patterns
  • Relaxed Double Bottom
  • Only considering increases and decreases that are
    more than 2

11
Relaxed Double Bottom Ninety fold improvement
12
Relaxed Double Bottom in June 1990
13
Conclusion
  • Significant speedupsfrom 6 to 900 times faster
  • Queries, partial ordered domains, aggregates also
    treated in this approach
  • Many other optimization opportunities e.g.,
    parallel search for multiple patterns

14
shift and next
  • Success for first j-1 elements of pattern.
    Failure for jth element (when input is at i)
  • Any shift less than shift(j) is guaranteed to
    lead to failure,
  • Match elements in the pattern starting at next(j)

15
General Predicates--Cont
  • p1(t) (t.price lt t.previous.price)
  • p2(t) (t.price lt t.previous.price) ?
    (40ltt.pricelt50)
  • p3(t) (t.price gt t.previous.price) ?
    (t.pricelt52)
  • p4(t) (t.price gt t.previous.price)
  • And we need to find the implication between this
    pattern elements

16
Matrices q and j Input tested on pj is now
tested against pk
pj succeeded
pj failed
Combing values of these lower triangular matrices
( j ³ k), We derive the values of next(j) and
shift (j)
17
Example
18
STAR Patterns
  • SELECT X.NEXT.date, X.NEXT.price,
  • S.previous.date, S.previous.price
  • FROM quote
  • CLUSTER BY name,
  • SEQUENCE BY date
  • AS (X, Y, Z, T, U, V, S)
  • WHERE
  • X.name'IBM AND X.price gt X.previous.price
  • AND 30 lt Y.price AND Y.price lt 40
  • AND Z.price lt Z.previous.price AND T.price gt
    T.previous.price
  • AND 35 lt U.price AND U.price lt 40
  • AND V.price lt V.previous.price AND S.price lt
    30

19
Handling Star Patterns
  • Same input, Transitions on Original Pattern vs.
    Transitions on Pattern after the index set back
    j-k
  • ?21
  • ?
  • ?31 ? ?32
  • ? ?
  • ?41 ? ?42 ? ?43
  • ? ?
  • Example Elements j and k are star predicates and
    ?jk is U
  • U ? ?j,k1
  • ?
  • ?j1,k ?j1,k1

20
Possible Transitions
  • Elements j and k are star predicates and ?jk is
    U
  • U ? ?j,k1
  • ?
  • ?j1,k ?j1,k1
  • Elements j and k are star predicates and ?jk is
    1
  • 1 ?j,k1
  • ?
  • ?j1,k ?j1,k1
  • Elements j and k are not star predicates
  • ?j,k ?j,k1
  • ?j1,k ?j1,k1

21
Implication Graph
Write a Comment
User Comments (0)
About PowerShow.com