Search Trees - PowerPoint PPT Presentation

About This Presentation
Title:

Search Trees

Slides: 35
Provided by: sus69

Transcript and Presenter's Notes

Title: Search Trees


1
Search Trees
2
R-Trees
  • Introduction
  • In order to handle spatial data efficiently, as
    required in CAD and Geo-data applications, a
    database system needs an index mechanism that
    will help it retrieve data items quickly
    according to their spatial locations

3
R-Trees
  • R-Tree Index Structure
  • An R-tree is a height-balanced tree similar to
    a B-tree with index records in its leaf nodes
    containing pointers to data objects
  • Leaf nodes in an R-tree contain index record
    entries of the form (I, tuple-identifier) where
    tuple-identifier refers to a tuple in the
    database and I is an n-dimensional rectangle
  • Non-leaf nodes contain entries of the form (I,
    child-pointer) where child-pointer is the address
    of a lower node in the R-tree and I covers all
    rectangles in the lower node's entries

4
R-Trees
  • Properties of R-Tree
  • Every leaf node contains between m and M index
    records unless it is the root, where m ≤ M/2
  • For each index record (I, tuple-identifier) in
    a leaf node, I is the smallest rectangle that
    spatially contains the n-dimensional data object
    represented by the indicated tuple
  • Every non-leaf node has between m and M
    children unless it is the root
  • For each entry (I, child-pointer) in a non-leaf
    node, I is the smallest rectangle that spatially
    contains the rectangles in the child node
  • The root node has at least two children unless
    it is a leaf
  • All leaves appear on the same level
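The properties above can be checked mechanically. The following is a minimal Python sketch, assuming 2-D rectangles stored as (xmin, ymin, xmax, ymax) and hypothetical `Rect`, `Entry`, `Node` and `check_rtree` names. It cannot verify the leaf-level property (that I is the smallest rectangle around the data object), because the sketch does not store the objects themselves.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bounding(rects: list[Rect]) -> Rect:
    """Smallest rectangle enclosing every rectangle in rects."""
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))


@dataclass
class Entry:
    rect: Rect     # I
    child: object  # child Node (non-leaf) or tuple identifier (leaf)


@dataclass
class Node:
    leaf: bool
    entries: list[Entry] = field(default_factory=list)


def check_rtree(root: Node, m: int, M: int) -> bool:
    """Return True if the tree satisfies the structural R-tree properties."""
    if m < 1 or 2 * m > M:
        raise ValueError("expected 1 <= m <= M/2")
    leaf_levels = set()

    def visit(node: Node, level: int, is_root: bool) -> bool:
        n = len(node.entries)
        if n > M:
            return False
        if is_root:
            # The root needs at least two children unless it is a leaf
            if not node.leaf and n < 2:
                return False
        elif n < m:
            return False
        if node.leaf:
            leaf_levels.add(level)
            return True
        for e in node.entries:
            if not visit(e.child, level + 1, False):
                return False
            # A non-leaf I must be the smallest rectangle around the child's rectangles
            if e.rect != bounding([c.rect for c in e.child.entries]):
                return False
        return True

    # All leaves must sit on the same level
    return visit(root, 0, True) and len(leaf_levels) <= 1
```

Checking the covering rectangles bottom-up (after visiting the child) means an empty child is rejected by the m test before `bounding` is ever called on an empty list.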

5
R-Tree structure
6
R-Tree structure
7
R-Trees . contd
  • Algorithm Search
  • Given an R-tree whose root node is T, find all
    index records whose rectangles overlap a search
    rectangle S
  • S1 Search subtrees: If T is not a leaf, check
    each entry E to determine whether E.I overlaps S.
    For all overlapping entries, invoke Search on
    the tree whose root node is pointed to by E.p
  • S2 Search leaf node: If T is a leaf, check all
    entries E to determine whether E.I overlaps S. If
    so, E is a qualifying record
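A minimal Python sketch of Algorithm Search, assuming 2-D rectangles stored as (xmin, ymin, xmax, ymax) and hypothetical `Rect`, `Entry`, `Node` and `search` names. A leaf entry's `child` holds the tuple identifier; a non-leaf entry's `child` holds the child node.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Rect") -> bool:
        # Closed rectangles overlap when their extents overlap on both axes
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


@dataclass
class Entry:
    rect: Rect     # E.I
    child: object  # E.p: a Node (non-leaf) or a tuple identifier (leaf)


@dataclass
class Node:
    leaf: bool
    entries: list[Entry] = field(default_factory=list)


def search(t: Node, s: Rect) -> list[object]:
    """Return the identifiers of all index records whose rectangles overlap s."""
    found: list[object] = []
    for e in t.entries:
        if not e.rect.overlaps(s):
            continue
        if t.leaf:
            found.append(e.child)             # S2: qualifying record
        else:
            found.extend(search(e.child, s))  # S1: search the subtree
    return found


leaf1 = Node(True, [Entry(Rect(0, 0, 1, 1), "a"), Entry(Rect(2, 2, 3, 3), "b")])
leaf2 = Node(True, [Entry(Rect(5, 5, 6, 6), "c")])
root = Node(False, [Entry(Rect(0, 0, 3, 3), leaf1), Entry(Rect(5, 5, 6, 6), leaf2)])
print(search(root, Rect(0.5, 0.5, 2.5, 2.5)))  # ['a', 'b']
```

Because covering rectangles may overlap, more than one subtree can be searched for a single query; the right subtree is skipped here only because its rectangle does not meet S.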

8
R-Trees . contd
  • Algorithm Insert
  • L1 Find position for new record: Invoke
    ChooseLeaf to select a leaf node L in which to
    place the new entry E
  • L2 Add record to leaf node: If L has room for
    another entry, install E. Otherwise invoke
    SplitNode to obtain L and LL containing E and
    all the old entries of L
  • L3 Propagate changes upward: Invoke AdjustTree
    on L, also passing LL if a split was performed
  • L4 Grow tree taller: If node split propagation
    caused the root to split, create a new root
    whose children are the two resulting nodes

9
R-Trees . contd
  • Algorithm ChooseLeaf
  • CL1 Initialize: Set N to be the root node
  • CL2 Leaf check: If N is a leaf, return N
  • CL3 Choose subtree: If N is not a leaf, let F
    be the entry in N whose rectangle F.I needs least
    enlargement to include E.I. Resolve ties by
    choosing the entry with the rectangle of
    smallest area.
  • CL4 Descend until a leaf is reached: Set N to
    be the child node pointed to by F.p and repeat
    from CL2
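ChooseLeaf can be sketched in Python as below, again assuming 2-D rectangles (xmin, ymin, xmax, ymax) and hypothetical `Rect`, `Entry`, `Node`, `enlargement` and `choose_leaf` names. The tie-breaking rule of CL3 is expressed as a sort key of (enlargement, area).

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(r: Rect) -> float:
    return (r.xmax - r.xmin) * (r.ymax - r.ymin)


def union(a: Rect, b: Rect) -> Rect:
    return Rect(min(a.xmin, b.xmin), min(a.ymin, b.ymin),
                max(a.xmax, b.xmax), max(a.ymax, b.ymax))


def enlargement(r: Rect, new: Rect) -> float:
    """Area by which r must grow to include new."""
    return area(union(r, new)) - area(r)


@dataclass
class Entry:
    rect: Rect
    child: object


@dataclass
class Node:
    leaf: bool
    entries: list[Entry] = field(default_factory=list)


def choose_leaf(root: Node, new: Rect) -> Node:
    n = root                                   # CL1
    while not n.leaf:                          # CL2
        # CL3: least enlargement first, then smallest area to break ties
        f = min(n.entries, key=lambda e: (enlargement(e.rect, new), area(e.rect)))
        n = f.child                            # CL4
    return n
```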

10
R-Trees . contd
  • Algorithm AdjustTree
  • AT1 Initialize: Set N = L. If L was split
    previously, set NN to be the resulting second
    node
  • AT2 Check if done: If N is the root, stop
  • AT3 Adjust covering rectangle in parent entry:
    Let P be the parent node of N, and let EN be N's
    entry in P. Adjust EN.I so that it tightly
    encloses all entry rectangles in N
  • AT4 Propagate node split upward: If N has a
    partner NN resulting from an earlier split,
    create a new entry ENN with ENN.p pointing to NN
    and ENN.I enclosing all rectangles in NN. Add ENN
    to P if there is room. Otherwise, invoke
    SplitNode to produce P and PP containing ENN and
    all of P's old entries (contd)

11
R-Trees . contd
  • Algorithm AdjustTree contd
  • AT5 Move up to next level: Set N = P and set
    NN = PP if a split occurred. Repeat from AT2

12
R-Trees . contd
  • Algorithm Deletion
  • D1 Find node containing record: Invoke FindLeaf
    to locate the leaf node L containing E. Stop if
    the record was not found.
  • D2 Delete record: Remove E from L
  • D3 Propagate changes: Invoke CondenseTree,
    passing L
  • D4 Shorten tree: If the root node has only one
    child after the tree has been adjusted, make the
    child the new root

13
R-Trees . contd
  • Algorithm FindLeaf
  • FL1 Search subtrees: If T is not a leaf, check
    each entry F in T to determine if F.I overlaps
    E.I. For each such entry invoke FindLeaf on the
    tree whose root is pointed to by F.p until E is
    found or all entries have been checked
  • FL2 Search leaf node for record: If T is a leaf,
    check each entry to see if it matches E. If E is
    found, return T
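FindLeaf differs from Search in that it stops at the first leaf holding an exact match. A minimal Python sketch under the same assumptions (2-D rectangles, hypothetical `Rect`, `Entry`, `Node` and `find_leaf` names), where "matches E" is taken to mean same rectangle and same tuple identifier:

```python
from dataclasses import dataclass, field
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Rect") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


@dataclass
class Entry:
    rect: Rect
    child: object


@dataclass
class Node:
    leaf: bool
    entries: list[Entry] = field(default_factory=list)


def find_leaf(t: Node, rect: Rect, tid: object) -> Optional[Node]:
    """Return the leaf holding the record (rect, tid), or None if it is absent."""
    if t.leaf:
        # FL2: look for an entry that matches E exactly
        for e in t.entries:
            if e.rect == rect and e.child == tid:
                return t
        return None
    # FL1: only descend into subtrees whose rectangles overlap E.I
    for f in t.entries:
        if f.rect.overlaps(rect):
            leaf = find_leaf(f.child, rect, tid)
            if leaf is not None:
                return leaf
    return None
```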

14
R-Trees . contd
  • Algorithm CondenseTree
  • CT1 Initialize: Set N = L. Set Q, the set of
    eliminated nodes, to be empty
  • CT2 Find parent entry: If N is the root, go to
    CT6. Otherwise let P be the parent of N, and let
    EN be N's entry in P
  • CT3 Eliminate under-full node: If N has fewer
    than m entries, delete EN from P and add N to
    set Q
  • CT4 Adjust covering rectangle: If N has not
    been eliminated, adjust EN.I to tightly contain
    all entries in N
  • contd

15
R-Trees . contd
  • Algorithm CondenseTree contd
  • CT5 Move up one level in the tree: Set N = P and
    repeat from CT2
  • CT6 Re-insert orphaned entries: Re-insert all
    entries of nodes in set Q. Entries from
    eliminated leaf nodes are re-inserted in tree
    leaves as described in Algorithm Insert, but
    entries from higher-level nodes must be placed
    higher in the tree, so that leaves of their
    dependent subtrees will be on the same level as
    leaves of the main tree

16
R-Trees . contd
  • Algorithm Quadratic Split
  • QS1 Pick first entry for each group: Apply
    PickSeeds to choose two entries to be the first
    elements of the groups. Assign each to a group.
  • QS2 Check if done: If all entries have been
    assigned, stop. If one group has so few entries
    that all the rest must be assigned to it in
    order for it to have the minimum number m,
    assign them and stop.
  • QS3 Select entry to assign: Invoke PickNext to
    choose the next entry to assign. Add it to the
    group whose covering rectangle will have to be
    enlarged least to accommodate it. Resolve ties
    by adding the entry to the group with smaller
    area, then to the one with fewer entries, then
    to either. Repeat from QS2

17
R-Trees . contd
  • Algorithm PickSeeds
  • PS1 Calculate inefficiency of grouping entries
    together: For each pair of entries E1 and E2,
    compose a rectangle J including E1.I and E2.I.
    Calculate d = area(J) - area(E1.I) - area(E2.I).
  • PS2 Choose the most wasteful pair: Choose the
    pair with the largest d
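PickSeeds is a quadratic scan over all pairs. A minimal Python sketch, assuming 2-D rectangles (xmin, ymin, xmax, ymax) and a hypothetical `pick_seeds` that returns the indices of the chosen pair:

```python
from itertools import combinations
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(r: Rect) -> float:
    return (r.xmax - r.xmin) * (r.ymax - r.ymin)


def union(a: Rect, b: Rect) -> Rect:
    return Rect(min(a.xmin, b.xmin), min(a.ymin, b.ymin),
                max(a.xmax, b.xmax), max(a.ymax, b.ymax))


def pick_seeds(rects: list[Rect]) -> tuple[int, int]:
    """Indices of the pair whose covering rectangle J wastes the most area."""
    def waste(pair: tuple[int, int]) -> float:
        i, j = pair
        # PS1: d = area(J) - area(E1.I) - area(E2.I)
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    # PS2: the most wasteful pair
    return max(combinations(range(len(rects)), 2), key=waste)
```

For three rectangles where two overlap and one lies far away, the two seeds are one of the close pair and the distant one, which is exactly the pair that should not end up in the same group.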

18
R-Trees . contd
  • Algorithm PickNext
  • PN1 Determine cost of putting each entry in
    each group: For each entry E not yet in a group,
    calculate d1 = the area increase required in the
    covering rectangle of Group 1 to include E.I.
    Calculate d2 similarly for Group 2.
  • PN2 Find entry with greatest preference for one
    group: Choose any entry with the maximum
    difference between d1 and d2
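Quadratic Split, PickSeeds and PickNext fit together as follows. This is a minimal Python sketch, assuming 2-D rectangles (xmin, ymin, xmax, ymax); `quadratic_split`, `pick_next` and the other helpers are hypothetical names, and the split works on entry indices rather than on real node entries.

```python
from itertools import combinations
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(r: Rect) -> float:
    return (r.xmax - r.xmin) * (r.ymax - r.ymin)


def union(a: Rect, b: Rect) -> Rect:
    return Rect(min(a.xmin, b.xmin), min(a.ymin, b.ymin),
                max(a.xmax, b.xmax), max(a.ymax, b.ymax))


def enlargement(r: Rect, new: Rect) -> float:
    return area(union(r, new)) - area(r)


def pick_seeds(rects: list[Rect]) -> tuple[int, int]:
    return max(combinations(range(len(rects)), 2),
               key=lambda p: area(union(rects[p[0]], rects[p[1]]))
               - area(rects[p[0]]) - area(rects[p[1]]))


def pick_next(rects: list[Rect], remaining: list[int], covers: list[Rect]) -> int:
    # PN1/PN2: the entry with the largest |d1 - d2| has the strongest preference
    return max(remaining, key=lambda k: abs(enlargement(covers[0], rects[k])
                                            - enlargement(covers[1], rects[k])))


def quadratic_split(rects: list[Rect], m: int) -> list[list[int]]:
    """Split the entries (given by their rectangles) into two groups of indices."""
    i, j = pick_seeds(rects)                                   # QS1
    groups, covers = [[i], [j]], [rects[i], rects[j]]
    remaining = [k for k in range(len(rects)) if k not in (i, j)]
    while remaining:
        # QS2: a group that needs every remaining entry to reach m gets them all
        short = [g for g in (0, 1) if len(groups[g]) + len(remaining) <= m]
        if short:
            groups[short[0]].extend(remaining)
            break
        k = pick_next(rects, remaining, covers)                # QS3
        # Least enlargement, then smaller area, then fewer entries
        g = min((0, 1), key=lambda h: (enlargement(covers[h], rects[k]),
                                       area(covers[h]), len(groups[h])))
        groups[g].append(k)
        covers[g] = union(covers[g], rects[k])
        remaining.remove(k)
    return groups
```

The "then to either" tie rule is handled implicitly: `min` keeps group 0 when all three keys are equal.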

19
Generalized Search Tree (GiST)
  • Why GiST
  • Extensible both in data types supported and in
    the queries applied on this data
  • Allows new data types to be indexed in a
    manner that supports the queries natural to the
    data type
  • Unifies previously disparate structures for
    currently common data types
  • Example
  • B-trees and R-trees can be implemented as
    extensions to GiST, giving a single code base for
    indexing multiple dissimilar applications

20
GiST . contd
  • Definition
  • A GiST is a balanced multi-way tree of variable
    fan-out between kM and M, where k is the minimum
    fill factor
  • The exception is the root node, which can have
    a fan-out from 2 to M
  • Leaf nodes: (p, ptr)
  • ptr: Identifier of some tuple of the DB
  • Non-leaf nodes: (p, ptr)
  • ptr: Pointer to another tree node
  • In both kinds of node, p: Predicate used as a
    search key

21
GiST . contd
  • Properties
  • Every node contains between kM and M index
    entries unless it is the root.
  • For each index entry (p,ptr) in a leaf node, p
    holds for the tuple
  • For each index entry (p,ptr) in a non-leaf
    node, p is true when instantiated with the values
    of any tuple reachable from ptr
  • The root has at least two children unless it is
    a leaf
  • All leaves appear on the same level

22
GiST . contd
  • GiST Methods
  • Key Methods
  • The methods the user can specify to configure
    the GiST. The methods encapsulate the structure
    and behavior of the object class used for keys in
    the tree
  • Tree Methods
  • Provided by the GiST, and may invoke the
    required key methods

23
GiST . contd
  • GiST Key Methods contd
  • E is an entry of the form (p, ptr), q is a
    query, and P is a set of entries
  • Consistent(E, q): returns false if p ∧ q is
    guaranteed unsatisfiable, true otherwise.
  • Union(P): returns a predicate r that holds for
    every tuple satisfying any predicate in P
  • Compress(E): returns (π, ptr), where π is a
    compressed representation of p.
  • Decompress(E): returns (r, ptr) where p → r. This
    is a potentially lossy compression, as we do not
    require p ↔ r

24
GiST . contd
  • GiST Key Methods contd
  • Penalty(E1, E2): returns a domain-specific penalty
    for inserting E2 into the subtree rooted at E1.
    Typically the penalty metric represents the
    increase in size from E1.p to Union({E1, E2}).
  • PickSplit(P): given a set P of M+1 entries, splits
    P into two sets of entries P1 and P2, each of size
    at least kM. The choice of the minimum fill factor
    is controlled here
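The key methods can be written down as an abstract interface. This is a hypothetical Python sketch, not the API of any particular GiST implementation: the class name `GiSTKeyMethods`, the method names, and the choice to default Compress/Decompress to the identity are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Generic, TypeVar

P = TypeVar("P")  # predicate (key) type
Q = TypeVar("Q")  # query type


class GiSTKeyMethods(ABC, Generic[P, Q]):
    """Hypothetical interface for the user-supplied GiST key methods."""

    @abstractmethod
    def consistent(self, p: P, q: Q) -> bool:
        """Return False only if p AND q is guaranteed unsatisfiable."""

    @abstractmethod
    def union(self, preds: list[P]) -> P:
        """Return a predicate r implied by every predicate in preds."""

    def compress(self, p: P) -> Any:
        """Return a (possibly lossy) compact form of p; identity by default."""
        return p

    def decompress(self, c: Any) -> P:
        """Return a predicate r with p -> r; identity by default."""
        return c

    @abstractmethod
    def penalty(self, existing: P, new: P) -> float:
        """Cost of inserting new into the subtree whose key is existing."""

    @abstractmethod
    def pick_split(self, entries: list[tuple[P, Any]],
                   min_fill: int) -> tuple[list[tuple[P, Any]], list[tuple[P, Any]]]:
        """Split M+1 entries into two groups of at least min_fill entries each."""
```

Consistent, Union, Penalty and PickSplit must be supplied by every extension, while an identity Compress/Decompress pair is always a valid (non-compressing) choice because p → p.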

25
GiST . contd
  • GiST Tree Methods
  • Search: controlled by the Consistent method
  • Insert: controlled by Penalty and PickSplit
  • Delete: controlled by Consistent
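To show how Search depends only on Consistent, here is a minimal Python sketch of a generic GiST search; `GNode`, `gist_search` and the interval example are hypothetical. The same traversal works for any key type once a suitable `consistent` function is passed in.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class GNode:
    leaf: bool
    # Each entry is (p, ptr): ptr is a child GNode in non-leaf nodes, a tuple id in leaves
    entries: list[tuple[Any, Any]] = field(default_factory=list)


def gist_search(node: GNode, q: Any, consistent: Callable[[Any, Any], bool]) -> list[Any]:
    """Return the tuple ids of all leaf entries consistent with q."""
    found: list[Any] = []
    for p, ptr in node.entries:
        if consistent(p, q):
            if node.leaf:
                found.append(ptr)
            else:
                found.extend(gist_search(ptr, q, consistent))
    return found


# Example: keys are half-open integer intervals [lo, hi); q is a point lookup
def point_in_interval(p: tuple[int, int], x: int) -> bool:
    lo, hi = p
    return lo <= x < hi


leaf1 = GNode(True, [((1, 2), "t1"), ((2, 3), "t2"), ((3, 4), "t3")])
leaf2 = GNode(True, [((10, 11), "t10"), ((11, 12), "t11")])
root = GNode(False, [((1, 4), leaf1), ((10, 12), leaf2)])
print(gist_search(root, 11, point_in_interval))  # ['t11']
```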

26
Example
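[Figure: inserting a new entry (q, ptr). At each level the entry descends into the subtree with the smaller penalty (m < n at the first level, then j < i); if the chosen leaf is full, it is split according to PickSplit.]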
27
Applications
  • GiST Over Z (B Trees)
  • GiST Over Polygons in R2 (R Trees)

28
B Trees Using GiST
  • p here is of the form Contains([xp, yp), v)
  • Consistent(E, q) returns true if
  • If q = Contains([xq, yq), v): (xp < yq) ∧ (yp > xq)
  • If q = Equal(xq, v): xp ≤ xq < yp
  • Union(P) returns [MIN(x1, x2, ..., xn), MAX(y1, y2,
    ..., yn)).
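The B-tree Consistent and Union methods can be sketched in Python as below. Keys are half-open intervals [x, y) stored as (x, y) tuples; encoding a query as `("contains", xq, yq)` or `("equal", xq)` is an assumption made for this illustration.

```python
def consistent(p: tuple[float, float], q: tuple) -> bool:
    """p is the key [xp, yp); q is ('contains', xq, yq) or ('equal', xq)."""
    xp, yp = p
    if q[0] == "contains":
        _, xq, yq = q
        return xp < yq and yp > xq   # [xp, yp) and [xq, yq) intersect
    if q[0] == "equal":
        _, xq = q
        return xp <= xq < yp         # xq falls inside [xp, yp)
    raise ValueError(f"unknown query {q!r}")


def union(preds: list[tuple[float, float]]) -> tuple[float, float]:
    """[MIN(x1..xn), MAX(y1..yn))"""
    return (min(x for x, _ in preds), max(y for _, y in preds))
```

Because the intervals are half-open, a key [0, 10) is not consistent with Contains([10, 20), v) or with Equal(10, v).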

29
B Trees Using GiST contd
  • Penalty(E, F), where E = ([x1, y1), ptr1) and
    F = ([x2, y2), ptr2)
  • If E is the leftmost pointer on its node,
    returns MAX(y2 - y1, 0)
  • If E is the rightmost pointer on its node,
    returns MAX(x1 - x2, 0)
  • Otherwise, returns MAX(y2 - y1, 0) + MAX(x1 - x2, 0)
  • PickSplit(P): let the first ⌊|P|/2⌋ entries in
    order go to the left node and the remaining
    entries go to the right node.
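A minimal Python sketch of the B-tree Penalty and PickSplit methods. Passing the entry's position as a `position` string and sorting entries by key inside `pick_split` are assumptions of this illustration (in a real node the entries are already ordered).

```python
from typing import Any


def penalty(e: tuple[float, float], f: tuple[float, float], position: str) -> float:
    """E = [x1, y1), F = [x2, y2); position is 'leftmost', 'rightmost' or 'middle'."""
    (x1, y1), (x2, y2) = e, f
    grow_right = max(y2 - y1, 0)   # how far F extends past E on the right
    grow_left = max(x1 - x2, 0)    # how far F extends past E on the left
    if position == "leftmost":
        return grow_right
    if position == "rightmost":
        return grow_left
    return grow_right + grow_left


def pick_split(entries: list[tuple[Any, Any]]) -> tuple[list, list]:
    """First floor(|P|/2) entries in key order go left, the rest go right."""
    ordered = sorted(entries, key=lambda entry: entry[0])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]
```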

30
B Trees Using GiST contd
  • Compress(E): if E is the leftmost key on a
    non-leaf node, return 0 bytes; otherwise, return
    E.p.x
  • Decompress(E)
  • If E is the leftmost key on a non-leaf node, let
    x = -∞; otherwise let x = E.p.x
  • If E is the rightmost key on a non-leaf node, let
    y = ∞. If E is any other entry in a non-leaf node,
    let y = the value stored in the next key. Otherwise,
    let y = x + 1
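Compress and Decompress can be sketched at the level of a whole node, since Decompress needs the neighbouring key. In this hypothetical Python sketch, `None` stands for the 0-byte leftmost key, keys are integers (so a leaf key x decompresses to [x, x + 1)), and `compress_keys`/`decompress_key` are illustrative names.

```python
import math


def compress_keys(lows: list[int], is_leaf: bool) -> list:
    """Store only each key's lower bound x; drop the leftmost non-leaf key (0 bytes)."""
    return list(lows) if is_leaf else [None] + list(lows[1:])


def decompress_key(stored: list, i: int, is_leaf: bool) -> tuple[float, float]:
    """Rebuild the interval [x, y) for the i-th key of a node."""
    if is_leaf:
        x = stored[i]
        return (x, x + 1)                 # leaf keys over integers: [x, x + 1)
    x = -math.inf if i == 0 else stored[i]
    # The upper bound comes from the next key, or +inf for the rightmost key
    y = math.inf if i == len(stored) - 1 else stored[i + 1]
    return (x, y)
```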

31
R-Trees Using GiST
  • The key here is of the form (xul, yul, xlr, ylr),
    the upper-left and lower-right corners of a
    rectangle
  • Query predicates are
  • Contains((xul1, yul1, xlr1, ylr1),
    (xul2, yul2, xlr2, ylr2))
  • Returns true if (xul1 ≤ xul2) ∧ (yul1 ≥ yul2) ∧
    (xlr1 ≥ xlr2) ∧ (ylr1 ≤ ylr2)
  • Overlaps((xul1, yul1, xlr1, ylr1),
    (xul2, yul2, xlr2, ylr2))
  • Returns true if (xul1 ≤ xlr2) ∧ (yul1 ≥ ylr2) ∧
    (xul2 ≤ xlr1) ∧ (ylr1 ≤ yul2)
  • Equal((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2,
    ylr2))
  • Returns true if (xul1 = xul2) ∧ (yul1 = yul2) ∧
    (xlr1 = xlr2) ∧ (ylr1 = ylr2)

32
R-Trees Using GiST contd
  • Consistent(E, q)
  • p is the rectangle (xul1, yul1, xlr1, ylr1), and q
    is a Contains, Overlaps or Equal query on (xul2,
    yul2, xlr2, ylr2)
  • Returns true if Overlaps((xul1, yul1, xlr1, ylr1),
    (xul2, yul2, xlr2, ylr2))
  • Union(P) returns the coordinates of the minimum
    bounding rectangle of all rectangles in P.
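The query predicates and Consistent translate directly into Python. This sketch assumes boxes are plain (xul, yul, xlr, ylr) tuples with y growing upward, so yul ≥ ylr for a well-formed box; the function names are illustrative.

```python
def contains(r1: tuple, r2: tuple) -> bool:
    """True if box r1 encloses box r2."""
    xul1, yul1, xlr1, ylr1 = r1
    xul2, yul2, xlr2, ylr2 = r2
    return xul1 <= xul2 and yul1 >= yul2 and xlr1 >= xlr2 and ylr1 <= ylr2


def overlaps(r1: tuple, r2: tuple) -> bool:
    """True if boxes r1 and r2 share at least one point."""
    xul1, yul1, xlr1, ylr1 = r1
    xul2, yul2, xlr2, ylr2 = r2
    return xul1 <= xlr2 and yul1 >= ylr2 and xul2 <= xlr1 and ylr1 <= yul2


def equal(r1: tuple, r2: tuple) -> bool:
    return tuple(r1) == tuple(r2)


def consistent(p: tuple, q: tuple) -> bool:
    """Whatever the query type, a subtree can only match if its box overlaps q's box."""
    return overlaps(p, q)
```

Using Overlaps for every query type is safe: a box that contains, equals or is contained in the query box necessarily overlaps it, so no qualifying subtree is pruned.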

33
R-Trees Using GiST contd
  • Penalty(E, F)
  • Compute q = Union({E, F}) and return area(q) -
    area(E.p)
  • PickSplit(P)
  • A variety of algorithms can be used to split
    the entries of an over-full node
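A minimal Python sketch of the R-tree Union and Penalty methods, assuming (xul, yul, xlr, ylr) tuples with y growing upward; `union`, `area` and `penalty` are illustrative names.

```python
def union(rects: list[tuple]) -> tuple:
    """Smallest (xul, yul, xlr, ylr) box enclosing every box in rects."""
    return (min(r[0] for r in rects), max(r[1] for r in rects),
            max(r[2] for r in rects), min(r[3] for r in rects))


def area(r: tuple) -> float:
    xul, yul, xlr, ylr = r
    return (xlr - xul) * (yul - ylr)


def penalty(e: tuple, f: tuple) -> float:
    """area(Union(E, F)) - area(E.p): how much E's box must grow to absorb F."""
    return area(union([e, f])) - area(e)
```

This is the same least-enlargement criterion that ChooseLeaf uses in the original R-tree insertion algorithm.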

34
R-Trees Using GiST contd
  • Compress(E)
  • Form the bounding rectangle of E.p
  • Decompress(E)
  • The identity function
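For polygon data, Compress can reduce each polygon to its bounding box, and Decompress simply returns that box. A minimal Python sketch, assuming a polygon is a list of (x, y) vertices and boxes are (xul, yul, xlr, ylr) with y growing upward:

```python
def compress(polygon: list[tuple[float, float]]) -> tuple:
    """Bounding rectangle (xul, yul, xlr, ylr) of a polygon given by its vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), max(ys), max(xs), min(ys))


def decompress(box: tuple) -> tuple:
    """The identity: the bounding box itself serves as the predicate."""
    return box
```

This compression is lossy: the box is implied by the polygon (p → r), but the polygon cannot be recovered from the box.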