ICS 222: Principles of Data Management - PowerPoint PPT Presentation

1 / 27
About This Presentation
Title:

ICS 222: Principles of Data Management

Description:

... fanout by compressing key representation. These compression ... Consider a page containing keys: Manino, Manna, Mannari, Mannarino, Mannella, Mannelli ' ... – PowerPoint PPT presentation

Number of Views:66
Avg rating:3.0/5.0
Slides: 28
Provided by: che7
Category:

less

Transcript and Presenter's Notes

Title: ICS 222: Principles of Data Management


1
ICS 222 Principles of Data Management
  • B-Trees

2
Overview
  • B-trees
  • Give up on sequentiality of index
  • Try to get balance the structure
  • B means balanced.
  • Can have as many levels of index as appropriate
  • Make sure each block is between half used and
    completely full
  • One particular variant B tree
  • We first discuss on B trees
  • Then we will talk about B-trees

3
B tree Example
  • Root

Fanout n3
100
120 150 180
30
3 5 11
120 130
180 200
100 101 110
150 156 179
30 35
4
Sample non-leaf (interior) node
57 81 95
  • to keys to keys to keys to keys
  • lt 57 57lt klt81 81ltklt95 gt95

5
Sample leaf node
  • From non-leaf node
  • to next leaf
  • in the sequence

57 81 95
To record with key 57 To record with key
81 To record with key 85
6
In textbooks notation
n3
  • NonLeaf

30
30
Leaf
30 35
30
35
7
Control shape of Btree
  • All leaves at same lowest level (balanced tree)
  • Pointers in leaves point to records, except for
    the sequence pointer
  • Size of nodes (fixed)
  • n1 pointers
  • n keys
  • Dont want a node to be too empty. Thus use at
    least
  • Non-leaf ?(n1)/2? pointers
  • Leaf ?(n1)/2? pointers to tuples

8
Example
  • Full node min. node
  • Non-leaf
  • Leaf

n3
120 150 180
30
3 5 11
30 35
9
Number of pointers/keys
Max Max Min Min ptrs keys
ptrs keys
Root
n1
n
2
1
Non-leaf (non-root)
n1
n
?(n1)/2?
?(n1)/2?-1
Leaf (non-root)
n
n
?(n1)/2?
?(n1)/2?
Not including the sequence pointer
10
Using B-trees to do a search
  • Lookup queries
  • empId 235
  • Range queries
  • empID gt 400
  • empID lt 600
  • 100 lt empID lt 400

11
Insertions
  • Cases
  • (a) simple case space available in leaf
  • (b) leaf overflow
  • (c) non-leaf overflow
  • (d) new root

12
Case (a) Insert Key 32
100
n3
30
3 5 11
30 31
13
Case (b) Insert Key 7
100
n3
30
3 5 11
30 31
14
Case (c) Insert Key 160
100
n3
120 150 180
180 200
150 156 179
15
Case (d) New root (insert 45)
n3
10 20 30
1 2 3
10 12
20 25
30 32 40
16
Deletion
  • Cases
  • (a) Simple case - no example
  • (b) Coalesce with neighbor (sibling)
  • (c) Redistribute keys
  • (d) Cases (b) or (c) at non-leaf

17
Case (b) Coalesce with a sibling
  • Delete 50

n4
10 40 100
10 20 30
40 50
18
Case (c) redistribute key
n4
  • Delete 50

10 40 100
10 20 30 35
40 50
19
Case (d) nonleaf coalesce
Delete 37
25
n4
10 20
30 40
25 26
30 37
40 45
20 22
10 14
1 3
20
Btree deletions in practice
  • Often, coalescing is not implemented
  • Since its too hard and not worth it!

21
Buffering for Btree
  • Is LRU a good policy for Btree buffers?
  • No!
  • Should try to keep root in memory at all times
    and
  • perhaps some nodes from second level

22
How to choose fanout n?
  • n affects
  • The depth of the tree logn (N), where N of
    records
  • Why?
  • The time of binary search within a node is very
    small compared to the disk IO time
  • We want to have a large n to reduce the tree
    height
  • If a node corresponds to a block, we choose a
    largest n to fill up the block space.

23
Btree Optimizations
  • To improve performance, we want to reduce the
    height.
  • Two strategies
  • decrease the number of leaf pages
  • shorten the data stored in leaf pages using
    compression techniques
  • increase index node fanout by compressing key
    representation
  • These compression techniques
  • reduce I/O costs
  • but typically complicate insertion, deletion, and
    search algorithms
  • Deciding the policy for maintaining Btree is
    part of physical database design.

24
Example prefix compression
  • Use prefix compression at the lower nodes.
  • Consider a page containing keys
  • Manino, Manna, Mannari, Mannarino, Mannella,
    Mannelli
  • Man is a common prefix. Store keys as
  • (3, ino) (3 na) (5 ri) (7 no) (4 ella) (7
    i)
  • Construct the strings
  • (3, ino) Man ino ? Manino
  • (3 na) Man na ? Manna
  • (5 ri) Manna ri ? Mannari ? e.g., pick
    first 5 characters of previous string
  • (7 no) Mannari no ? Mannarino
  • (4 ella) Mann ella ? Mannella
  • (7 i) Mannell I ? Mannelli

25
Suffix Compression
Beret
Bertolucci
  • Use suffix compression in high nodes.
  • Keys truncated to the right as long as they can
    distinguish between the high key value the left
    pointer points to and the low value the right
    pointer points to.
  • E.g. If Bert is enough at the root node to do
    the routing, we dont need to store Bertolucci

26
B-tree versus Btree
  • to record to record to record
  • with K1 with K2 with K3
  • to keys to keys to keys to
    keys
  • lt K1 K1ltxltK2 K2ltxltk3 gtk3
  • A B-tree has record pointers in non-leaf nodes
  • In practice, Btrees are preferred, and widely
    used

27
B-tree example n2
65 125

145 165
85 105
25 45
10 20
30 40
110 120
90 100
70 80
170 180
50 60
130 140
150 160
Write a Comment
User Comments (0)
About PowerShow.com