B -Tree Index - PowerPoint PPT Presentation

1 / 21
About This Presentation
Title:

B -Tree Index

Description:

Chapter 10 Modified by Donghui Zhang Nov 9, 2005 14 13 15 16 17 18 20 21 10 10 Motivation Suppose every disk page holds 133 records. You are given 1334 = 0.3 billion ... – PowerPoint PPT presentation

Number of Views:39
Avg rating:3.0/5.0
Slides: 22
Provided by: RaghuRama126
Category:
Tags: double | index | root | tree

less

Transcript and Presenter's Notes

Title: B -Tree Index


1
B-Tree Index
  • Chapter 10

Modified by Donghui Zhang Nov 9, 2005
2
Motivation
  • Suppose every disk page holds 133 records.
  • You are given 1334 0.3 billion records. They
    occupy 1333 2.3 million disk pages.
  • You can utilize a small memory buffer of 134
    pages.
  • You can build an index structure.
  • Given the key of a record, what is the minimum
    guaranteed number of disk I/Os to find the record?

3
Content
  • B-tree index
  • Structure
  • Search
  • Insert
  • Delete
  • Bulk-loading a B-tree
  • Aggregation Query
  • SB-tree

4
B Tree Structure
  • Insert/delete at log F N cost keep tree
    height-balanced. (F fanout, N leaf pages)
  • Minimum 50 occupancy (except for root). Each
    node contains d lt m lt 2d entries. The
    parameter d is called the order of the tree.
  • Supports equality and range-searches efficiently.

5
B Tree Equality Search
  • Search begins at root, and key comparisons direct
    it to a leaf.
  • Search for 15

Root
17
24
30
13
39
3
5
19
20
22
24
27
38
2
7
14
16
29
33
34
  • Based on the search for 15, we know it is not
    in the tree!

6
B Tree Range Search
  • Search all records whose ages are in 15,28.
  • Equality search 15.
  • Follow sibling pointers.

Root
17
24
30
13
39
3
5
19
20
22
24
27
38
2
7
14
16
29
33
34
7
B Trees in Practice
  • Typical order 100. Typical fill-factor 67.
  • average fanout 133
  • Can often hold top levels in buffer pool
  • Level 1 1 page 8 KB
  • Level 2 133 pages 1 MB
  • Level 3 17,689 pages 145 MB
  • Level 4 2,352,637 pages 19 GB
  • With 1 MB buffer, can locate one record in 19 GB
    (or 0.3 billion records) in two I/Os!

8
Inserting a Data Entry into a B Tree
  • Find correct leaf L.
  • Put data entry onto L.
  • If L has enough space, done!
  • Else, must split L (into L and a new node L2)
  • Redistribute entries evenly, copy up middle key.
  • Insert index entry pointing to L2 into parent of
    L.
  • This can happen recursively
  • To split index node, redistribute entries evenly,
    but push up middle key. (Contrast with leaf
    splits.)
  • Splits grow tree root split increases height.
  • Tree growth gets wider or one level taller at
    top.

9
Inserting 8 into Example B Tree
  • Find leaf, in the same way as the Search
    algorithm.
  • Handle overflow by splitting.

Root
17
24
30
13
39
3
5
19
20
22
24
27
38
2
7
14
16
29
33
34
10
Inserting 8 into Example B Tree
Entry to be inserted in parent node.
  • Observe how minimum occupancy is guaranteed in
    both leaf and index pg splits.
  • Note difference between copy-up and push-up be
    sure you understand the reasons for this.

(Note that 5 is
s copied up and
5
continues to appear in the leaf.)
3
5
2
7
8
appears once in the index. Contrast
11
Example B Tree After Inserting 8
Root
17
24
30
13
5
2
3
39
19
20
22
24
27
38
7
5
8
14
16
29
33
34
  • Notice that root was split, leading to increase
    in height.
  • In this example, we can avoid split by
    re-distributing entries however,
    this is usually not done in practice.

12
Deleting a Data Entry from a B Tree
  • Start at root, find leaf L where entry belongs.
  • Remove the entry.
  • If L is at least half-full, done!
  • If L has only d-1 entries,
  • Try to re-distribute, borrowing from sibling
    (adjacent node with same parent as L).
  • If re-distribution fails, merge L and sibling.
  • If merge occurred, must delete entry (pointing to
    L or sibling) from parent of L.
  • Merge could propagate to root, decreasing height.

13
Deleting 19 and 20
Root
17
24
30
13
5
2
3
39
19
20
22
24
27
38
7
5
8
14
16
29
33
34
  • Deleting 19 is easy.
  • Deleting 20 is done with re-distribution.

14
After Deleting 19 and 20
Root
17
27
30
13
5
2
3
39
38
7
5
8
22
24
27
29
14
16
33
34
  • Notice, in re-distribution, how middle key is
    copied up.
  • If delete 24

15
... And Then Deleting 24
  • Must merge.
  • Observe toss of index entry (on right), and
    pull down of index entry (below).

30
39
22
27
38
29
33
34
Root
13
5
30
17
3
39
2
7
22
38
5
8
27
33
34
14
16
29
16
Example of Non-leaf Re-distribution
  • Tree is shown below during deletion of 24. (What
    could be a possible initial tree?)
  • In contrast to previous example, can
    re-distribute entry from left child of root to
    right child.

Root
22
30
17
20
13
5
17
After Re-distribution
  • Intuitively, entries are re-distributed by
    pushing through the splitting entry in the
    parent node.
  • It suffices to re-distribute index entry with key
    20 weve re-distributed 17 as well for
    illustration.

Root
17
13
5
30
22
20
39
7
5
8
2
3
38
17
18
33
34
22
27
29
20
21
14
16
18
Bulk Loading of a B Tree
  • If we have a large collection of records, and we
    want to create a B tree on some field, doing so
    by repeatedly inserting records is very slow.
  • Bulk Loading can be done much more efficiently.
  • Initialization Sort all data entries, insert
    pointer to first (leaf) page in a new (root) page.

Root
Sorted pages of data entries not yet in B tree
19
Bulk Loading (Contd.)
  • Index entries for leaf pages always entered into
    right-most index page just above leaf level.
  • Assume pages in the rightmost path to have double
    page size.
  • Split when double plus one.

Root
Data entry pages
10
12
20
6
not yet in B tree
3
6
9
10
11
12
13
23
31
36
38
41
44
4
20
22
35
Root
12
Data entry pages
not yet in B tree
6
20
10
23
3
6
9
10
11
12
13
23
31
36
38
41
44
4
20
22
35
20
Summary of Bulk Loading
  • Option 1 multiple inserts.
  • Slow.
  • Does not give sequential storage of leaves.
  • Option 2 Bulk Loading
  • Has advantages for concurrency control.
  • Fewer I/Os during build.
  • Leaves will be stored sequentially (and linked,
    of course).
  • Can control fill factor on pages.

21
Exercise for B-tree with Buffering
  • r, w, p, u logical read/write, pin, unpin.
  • R, W physical read/write.
  • ? one pin.
  • O a dirty page.
  • Draw the evolution
  • of an LRU buffer for
  • an insertion to page
  • 5 which results page 5 to split, followed by a
    search in page 8. Assume the buffer initially
    contains page 1 with pincount1.

1
2
3
4
5
6
7
8
9
Write a Comment
User Comments (0)
About PowerShow.com