Title: CS 245 Database System Principles
1CPSC-608 Database Systems
Fall 2008
Instructor Jianer Chen Office HRBB 309B Phone
845-4259 Email chen_at_cs.tamu.edu
Notes 7
2Terms
- Sequential file (data file)
- Index file (key-pointer pairs)
- Search key
- Primary index
- Secondary index
- Dense index
- Sparse index
- Multi-level index
3BTrees
- Support fast search
- Support range search
- Support dynamic changes
- Could be either dense or sparse
4BTree Example n3
100
120 150 180
30
120 130
3 5 11
180 200
100 101 110
150 156 179
30 35
5Sample non-leaf
57 81 95
- to keys to keys to keys to keys
- lt 57 57? klt81 81?klt95 ?95
6Sample leaf node
- From non-leaf node
- to next leaf
- in sequence
57 81 95
To record with key 57 To record with key
81 To record with key 85
7- Size of nodes n1 pointers
- n keys
FIXED
8Dont want nodes to be too empty
- Use at least
- Non-leaf ?(n1)/2? pointers
- Leaf ?(n1)/2? pointers to data
9n3
- Full node Min. node
- Non-leaf
- Leaf
120 150 180
30
3 5 11
30 35
10Btree rules tree of order n
- (1) All leaves at same lowest level (balanced
tree) - (2) Pointers in leaves point to records except
for sequence pointer
11- (3) Number of pointers/keys for Btree
Max Max Min Min ptrs keys
ptrs?data keys
Non-leaf (non-root)
n1
n
?(n1)/2?
?(n1)/2?1
Leaf (non-root)
n1
n
?(n1)/2?
?(n1)/2?
Root
n1
n
2
1
could be 1
12Search in a Btree
- Start from the root
- Search in a leaf block
- May not have to go to the data file
-
13Pseudo Code for Search in a Btree
- Search(ptr, k)
- \\ search a record of key value k in the subtree
rooted at ptr - Case 1. ptr is a leaf
- IF (k ki) for a key ki in ptr THEN
return(pi) - ELSE return(Null)
- Case 2. ptr is not a leaf
- find a key ki in ptr such that ki lt k lt
k(i1) - return(Search(pi, k)
14Insert into Btree
- (a) simple case
- space available in leaf
- (b) leaf overflow
- (c) non-leaf overflow
- (d) new root
-
15Pseudo Code for Insertion in a Btree
- Insert(ptr, (k,p), (k',p'))
- \\ p is a pointer to a data record with key value
k, which are inserted into the subtree rooted at - \\ ptr p' is a pointer to a new brother of
ptr, if created, in which k' is the smallest key
value - Case 1. ptr is a leaf
- IF there is room in ptr, THEN insert (k,p) into
ptr return(k'0, p'Null) - ELSE re-arrange the content in ptr and (k,p)
into (p0, k1, p1, ..., k(n1), p(n1)) - leave (p0, k1, ..., k(r-1), p(r-1)) in
ptr - create a new leaf q put (pr, k(r1),
p(r1), ..., k(n1), p(n1)) in q - return( k' k_r, p' q )
- Case 2. ptr is not a leaf
- find a key ki in ptr such that ki lt k lt
k(i1) - Insert(pi, (k,p), (k",p"))
- IF (p" Null) THEN return(k'0, p'Null)
- ELSE IF there is room in ptr, THEN insert
(k",p") into ptr return(k'0, p'Null) - ELSE re-arrange the content in ptr and
(k",p") into (p0, k1, p1, ..., k(n1), p(n1)) - leave (p0, k1, ..., k(r-1), p(r-1)) in
ptr - create a new leaf q put (pr, k(r1),
p(r1), ..., k(n1), p(n1)) in q
16n3
100
30
3 5 11
30 31
17n3
100
30
3 5 11
30 31
32
18n3
100
30
3 5 11
30 31
19n3
100
30
7
3 5
3 5 11
30 31
7
20n3
100
120 150 180
180 200
150 156 179
21n3
100
120 150 180
180 200
150 156 179
160 179
22n3
10 20 30
1 2 3
10 12
20 25
30 32 40
23n3
10 20 30
1 2 3
10 12
20 25
30 32 40
24Deletion from Btree
- (a) Simple case (no example)
- (b) Coalesce with neighbor (sibling)
- (c) Re-distribute keys
- (d) Cases (b) or (c) at non-leaf
25- (b) Coalesce with sibling
- Delete 50
n4
10 40 100
10 20 30
40 50
26- (b) Coalesce with sibling
- Delete 50
n4
10 40 100
10 20 30
40 50
40
27- (c) Redistribute keys
- Delete 50
n4
10 40 100
10 20 30 35
40 50
28- (c) Redistribute keys
- Delete 50
n4
10 40 100
35
10 20 30 35
40 50
35
29- (d) Non-leaf coalesce
- Delete 37
n4
25
10 20
30 40
25 26
30 37
40 45
20 22
10 14
1 3
30Btree deletions in practice
- Often, coalescing is not implemented
- Too hard and not worth it!
31Interesting problem
- For a Btree, how large should n be?
n is number of keys per node
32Sample assumptions
- Time to read node from disk is (STn) ms
(S, T constants, e.g. S70, T0.5)
(2) Once block in memory, use binary search (a
b log2n) ms (a, b constants, a S)
(3) Assume Btree is full, i.e., nodes to examine
is logn N, where N records
(4) Total search time f(n)
(lognN1)(STn) (a b log2n)
33Can get f(n) time to find a record
34? FIND nopt by f(n) 0
- Answer is nopt few hundred
35Variation on Btree B-tree (no )
- Idea
- Avoid duplicate keys
- Have record pointers in non-leaf nodes
36- to record to record to record
- with K1 with K2 with K3
- to keys to keys to keys to
keys - lt K1 K1ltxltK2 K2ltxltk3 gtk3
37B-tree example n2
65 125
145 165
85 105
25 45
10 20
30 40
110 120
90 100
70 80
170 180
50 60
130 140
150 160
38B-tree example n2
- sequence pointers
- not useful now!
65 125
145 165
85 105
25 45
10 20
30 40
110 120
90 100
70 80
170 180
50 60
130 140
150 160
39Tradeoffs
- ? B-trees have faster lookup than Btrees
- ? in B-tree, non-leaf leaf different sizes
- ? in B-tree, insertion and deletion more
complicated
? Btrees preferred!