Joining Interval Data in Relational Databases - PowerPoint PPT Presentation

About This Presentation
Title:

Joining Interval Data in Relational Databases

Description:

... relations on join attribute values and compare ... Use indices defined on join attributes for efficient retrieval ... Each node represents an instant of time ... – PowerPoint PPT presentation

Number of Views:27
Avg rating:3.0/5.0
Slides: 66
Provided by: vai77
Learn more at: https://crystal.uta.edu
Category:

less

Transcript and Presenter's Notes

Title: Joining Interval Data in Relational Databases


1
Joining Interval Data in Relational Databases
  • By
  • Jost Enderle
  • Matthias Hampel
  • Thomas Seidl

2
Outline
  • Problem at hand
  • Current solutions
  • Important concepts
  • Solution to problem at hand
  • Some numbers
  • Final words

3
Problem at hand
  • Joining interval data in relational databases
  • Interval is duration of period of time
  • It is a scalar
  • E.g. 24 hours
  • Chronological period is a pair of date times
  • It is a compound object
  • E.g. 2nd January 2004 10th November 2004
  • Has definition for Overlaps operator
  • Above period Overlaps with 2nd November 2004
    3rd December 2004

4
Problem at hand
  • Overlaps operator formulation
  • Where R.period overlaps S.period
  • Where R.period.lower lt S.period.upper
  • and R.period.upper gt S.period.lower
  • P1 2nd January 2004 10th November 2004
  • P2 2nd November 2004 3rd December 2004
  • 2nd January 2004 lt 3rd December 2004
  • And 10th November 2004 gt 2nd November 2004
  • R and S can have 1000s of rows and the overlaps
    operator needs to be executed 1,000,000 of times

5
Problem at hand
  • Relational databases support equi-joins
    efficiently not Multiple Interdependent Join
    predicates
  • Sort order needed on interval values for sort
    merge joins
  • Here there are 2 such values
  • Index can help but it is still costly
  • It can provide advantage if Index is designed for
    interval data

6
Current solutions
  • Nested loop algorithms
  • Quadratic cost
  • E.g. TJ-2
  • Assumes inner join relation to be sorted
  • Other approaches
  • Assume inner join relation can be stored in memory

7
Current solutions
  • Sort-Merge algorithms
  • Need to find sort order on period data
  • E.g. TJ-1, multi-predicate merge join
  • Relations need to be sorted on lower and upper
    points of interval
  • E.g. TJ-3
  • For append only databases
  • E.g. Algorithm by Pfoser et al
  • For append only databases
  • E.g. Algorithm by Zhang et al, Multi predicate
    merge join
  • Does not support an overlaps join predicate

8
Current solutions
  • Partition Based join
  • Partition input relations on join attribute
    values and compare tuples in relevant partitions
  • Something like using MBRs and R trees
  • E.g. Algorithms by Snodgrass and Jensen
  • Perform poorly for lots of tuples having long
    intervals
  • E.g. Algorithms by Sitzmann and Stuckey
  • Optimize performance of above algorithm using
    histograms
  • Difficult to integrate with existing databases
  • E.g. Algorithm by lu, Ooi and Tan
  • Convert temporal data to spatial data and perform
    spatial join
  • Not efficient and difficult to integrate to
    existing databases

9
Current solutions
  • Index Based joins
  • Use indices defined on join attributes for
    efficient retrieval
  • Need to check if the index can be integrated in
    the existing databases
  • E.g. Algorithm by Zhang, Tsotras and Seeger
  • Efficient implementation by MVBT (Multi version
    B tree) index
  • Difficult to integrate with existing databases
  • Complex algorithms!!

10
Important concepts
  • Relational Interval Tree
  • Binary tree of height h covers range 1, 2h -1
    of interval bounds.
  • Tree is virtual, traversal of tree is purely
    arithmetic
  • Each node represents an instant of time
  • Each non-leaf node represents an interval for the
    entire sub-tree rooted at it

11
Important concepts
  • Relational Interval (RI) tree

12
Important concepts
  • Height h 5 , range 1, 31

13
Important concepts
  • Node 4 represents the instant 4, 21 represents
    instant 21

14
Important concepts
  • Node 2 represents interval 1, 3

15
Important concepts
  • Node 12 represents interval 9,15,10,14,11,14
    etc

16
Important concepts
  • Node 16 represents interval 1, 31, etc

17
Important concepts
  • Mapping of an interval to a node. E.g. 2, 13
  • It is the root of the smallest sub-tree including
    the nodes 2 and 13

18
Important concepts
  • It is the root of the smallest sub-tree including
    the nodes 2 and 13 i.e. Node 8

19
Important concepts
  • Database indices of RI tree
  • Two Indices one on the upper and the other on the
    lower interval values
  • Values in the Lower index stores has the
    following structure
  • Structure is indexed by the lower interval value
  • Node is the interval Node computed as previously
    shown 8 for 2, 13
  • Lower is the lower bound of the interval i.e. 2
  • ID is the primary key of the record in the table

20
Important concepts
  • Database indices of RI tree
  • Similarly for Upper index
  • Node is the interval Node computed as previously
    shown 8 for 2, 13
  • Upper is the upper bound of the interval i.e. 13
  • ID is the primary key of the record in the table

21
Important concepts
  • Using Relational Interval (RI) tree
  • Record with ID Mary has interval 2, 13
  • Record with ID John has interval 4, 23
  • Record with ID Bob has interval 10, 21
  • Record with ID Ann has interval 21, 30

22
Important Concepts
  • Left queries, Right queries and Inner Queries
  • For any Range lower, upper , for all the nodes
    on the path from root to lower and upper
  • Nodes registered to the left of lower may
    intersect the range - left queries
  • Nodes registered to the right of upper may
    intersect the range right queries
  • Nodes registered in lower, upper DEFINITELY
    intersect the range inner queries

23
Important Concepts
  • For any Range 11, 13 , for all the nodes on
    the path from root to lower and upper i.e. 16 to
    11 and 16 to 13

24
Important Concepts
  • Nodes registered to the left of lower may
    intersect the range - left queries here 8, 10
  • Note here that range 1, 9 and 1, 13 are
    both registered at 8, however only 1, 13
    intersects with 11, 13

25
Important Concepts
  • Nodes registered to the right of upper may
    intersect the range right queries here 14, 16

26
Important Concepts
  • Nodes registered in lower, upper DEFINITELY
    intersect the range inner queries here 11, 13

27
Solution to problem at hand
  • Determine the IDs of all the tuples that
    intersect with a given range lower, upper
  • Determine left queries
  • Get the IDs of the tuples from left queries that
    intersect with range lower, upper
  • Determine right queries
  • Get the IDs of the tuples from right queries that
    intersect with range lower, upper
  • Determine inner queries
  • Get IDs of all the tuples from inner queries
  • Return union of all the above

28
Solution to problem at hand
  • Determine the IDs of all the tuples that
    intersect with a given range lower, upper
  • Determine left queries
  • Get the IDs of the tuples from left queries that
    intersect with range lower, upper
  • Determine right queries
  • Get the IDs of the tuples from right queries that
    intersect with range lower, upper
  • Determine inner queries
  • Get IDs of all the tuples from inner queries
  • Return union of all the above

29
Solution to problem at hand
  • The final intersection query

30
Solution to problem at hand
  • Note here that range 1, 9 and 1, 13 are
    both registered at 8, however only 1, 13
    intersects with 11, 13

31
Solution to problem at hand
  • Both 1, 9 and 1, 13 are picked on condition 1
  • But only 1, 13 passes condition 2
  • Since upper of 1, 13 gt lower of query 11, 13
  • i.e. 13 gt 11

32
Solution to problem at hand
  • Condition 1 does filtering and condition 2 does
    range comparison
  • Less tuples have to be compared for range
    intersection here due to filtering lowering time
    for query evaluation

33
Solution to problem at hand
  • Index based loop join
  • Use the OVERLAPS algorithm defined previously to
    basic loop join algorithm
  • For each tuple in the outer relation, project the
    all tuples that OVERLAP with the inner relation
  • Approach is naïve, does not take partitioning
    into consideration

34
Solution to problem at hand
  • Partition based join
  • In an RI tree each interval in a tuple is
    assigned to exactly one node of the tree
  • Some observations can be made regarding which
    nodes of the second relation join with a given
    node of the first relation

35
Solution to problem at hand
  • For all tuples of relation R registered at node
    nR (node value n at relation R) , the tuples
    registered at the following nodes in S may join
    with them
  • Nodes on the path from the root of the tree to
    the node nS (called UP nodes)
  • Node nS ( definite overlap)
  • All nodes in the sub tree of nS (called down
    nodes)

36
Solution to problem at hand
  • Here nR ( nS) 12
  • The shaded region in S represents node that may
    join with nR

37
Solution to problem at hand
  • UP nodes 8, 16
  • DOWN nodes 9, 10, 11, 13, 14, 15

38
Solution to problem at hand
  • Depending on the manner in which the two
    relations are partitioned, we have three
    approaches
  • RI-Tree Up-Down join
  • RI-Tree Down-Down join
  • RI-Tree Up-Up join
  • We will discuss the Up-Down approach in greater
    detail

39
Solution to problem at hand
  • Steps for RI-Tree Up-Down join approach
  • Determine all nodes in relation R
  • Determine the max-range for all the nodes in
    relation R
  • Alternatively you can also find the exact-range
    for the nodes
  • Determine all join partners in UP of nS
  • Determine all join partners in DOWN of nS
  • Generate the query performing the join

40
Solution to problem at hand
  • Determine all nodes in relation R
  • Note that the Index structure is actually
    materialized in the database so you get all the
    nodes that are actually present in the database

41
Solution to problem at hand
  • Determine the max-range for all the nodes in
    relation R
  • max-range is fn_lower, fn_upper
  • fn_lower nR step 1
  • fn_upper nR step 1
  • step abs( nR parent( nR ))
  • For nR 12
  • step abs( 12 8 ) 4
  • fn_lower 12 4 1 9
  • fn_upper 12 4 1 15
  • max-range 9, 15

42
Solution to problem at hand
  • Alternatively you can also find the exact-range
    for the nodes
  • This will be a closer bound for range covered by
    node R since it is dynamically determined based
    on values in the database
  • Even if node 12 in theory may cover the range 9,
    15 the table R may only have tuples in range
    11, 15 registered at node 12.
  • Here you will get 11, 15, in the previous case
    you will get 9, 15

43
Solution to problem at hand
  • Determine all join partners in UP of nS
  • For all nodes in UP( nS )
  • If the node is less than ns add it to left
    queries
  • If the node is greater than ns add it to right
    queries

44
Solution to problem at hand
  • Determine all join partners in DOWN of nS
  • Nodes on the path of nS to fn_lower are stored as
    left queries
  • Nodes on the path of nS to fn_upper are stored as
    right queries
  • Range between fn_lower, ns -1 is stored as
    left query
  • Range between ns 1, fn_upper is stored as
    right query

45
Solution to problem at hand
  • Left query schema
  • (forknode, from, to) from and to define the
    range
  • Similarly, Right query schema
  • (forknode, from, to) from and to define the
    range

46
Solution to problem at hand
  • (12, 8, 8)
  • (12, 10, 10)
  • (12, 9, 11)
  • The following left queries are generated for nS
    12

47
Solution to problem at hand
  • Generate the query performing the join
  • Project ID pairs (R.ID, S.ID) for all the tuples
    where nR nS
  • Project ID pairs for all left queries of nS where
    upper portion of interval S.period intersects
    with lower portion of interval R.period
  • Project ID pairs for all right queries of nS
    where lower portion of interval S.period
    intersects with upper portion of interval R.period

48
Solution to problem at hand
  • Tuples registered at the same node in R and S
    definitely overlap and so project them
    unconditionally
  • Intersection guaranteed because range of tuples
    registered at node n intersects with instant n
  • I.E. instant n in period fn_lower, fn_upper

49
Solution to problem at hand
  • Project ID pairs for all left queries of nS where
    upper portion of interval S.period intersects
    with lower portion of interval R.period
  • Intervals registered at Nodes corresponding to
    Left queries of S does not include instant nR.
    However, the interval may match the interval
    registered at nR.
  • Similarly for right queries for nS

50
Solution to problem at hand
  • The finally generated query

51
Solution to problem at hand
  • The Down-Down strategy, briefly

52
Solution to problem at hand
  • The Up-Up strategy, briefly

53
Some numbers
  • Experimental setup
  • Integration with Oracle Server Release 9.2.0
  • Pentium 4 with 2.4 GHZ
  • 512 MB of RAM
  • EIDE hard drive
  • 200 database blocks cache
  • Block size 2kB

54
Some numbers
55
Some numbers
56
Some numbers
57
Some numbers
58
Some numbers
59
Some numbers
60
Some numbers
61
Final words
  • Number indicate
  • Approach can be used on an existing database
  • RI-tree UP-DOWN method outperforms standard
    B-tree methods
  • RI-tree UP-DOWN method scales well and works
    equally well on real data

62
Final words
  • Future work
  • Cost models for prediction of performance based
    on data characteristics

63
References
64
References
  • Bercken J. v. d., Seeger B. Query Processing
    Techniques for Multiversion Access Methods. VLDB
    1996, 168-179.
  • Sitzmann I., Stuckey P.J. Improving Temporal
    Joins Using Histograms. DEXA 2000, 488-498.

65
Questions
  • ???
  • Is a RI tree dynamically extensible
  • YES. Since it grows at the roots
  • How much memory will a RI tree with 123562 nodes
    require
  • 0 Bytes. Since, it is a virtual tree.
  • Why are all you guys still here!!!
Write a Comment
User Comments (0)
About PowerShow.com