Title: Tirgul 6
1 Tirgul 6
- B-Trees: another kind of balanced tree
2 Motivation
- Primary memory (RAM): very fast, but costly.
- Secondary storage (disk): very cheap, but slow.
- Problem: a large database must reside partially on disk, but disk operations are very slow.
- Solution: take advantage of an important disk property. The basic read/write unit is a page (2-4 KB); we can't read or write less.
- Thus, when analyzing database performance, we consider two different measures: CPU time and the number of times we need to access the disk.
- Besides, B-trees are an interesting type of balanced tree...
3 B-Trees
- B-Tree: a balanced search tree whose nodes can have many children.
- A node x contains $n[x]$ keys and, if it is internal, has $n[x]+1$ children ($c_1[x], c_2[x], \ldots, c_{n[x]+1}[x]$).
- The keys in each node are ordered, and relate to their left and right sub-trees as in regular search trees: if $k_i$ is any key stored in the sub-tree rooted at $c_i[x]$, then $k_1 \le key_1[x] \le k_2 \le key_2[x] \le \cdots \le key_{n[x]}[x] \le k_{n[x]+1}$.
- All leaves have the same depth h (the tree's height).
- There is a fixed integer t (the minimum degree).
- Every node (besides the root) has at least t-1 keys (i.e. at least t children, if it is internal).
- Every node can contain at most 2t-1 keys (i.e. at most 2t children).
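The rules above can be written down as a small checker. This is only a sketch: the `Node` class and the `check` function are hypothetical names introduced here for illustration, the tree is assumed to be non-empty and held entirely in memory, and keys are assumed to be comparable values.

```python
class Node:
    """In-memory stand-in for one B-tree node (one disk page)."""
    def __init__(self, keys=None, children=None):
        self.keys = keys or []            # sorted keys stored in this node
        self.children = children or []    # empty list means the node is a leaf


def check(node, t, lo=None, hi=None, is_root=True):
    """Return the height of the sub-tree; raise AssertionError if a rule is broken."""
    n = len(node.keys)
    assert n <= 2 * t - 1                          # at most 2t-1 keys
    assert n >= (1 if is_root else t - 1)          # at least t-1 keys (root: at least 1)
    assert node.keys == sorted(node.keys)          # keys are ordered inside the node
    assert all((lo is None or lo <= k) and (hi is None or k <= hi)
               for k in node.keys)                 # keys lie between the parent's separators
    if not node.children:
        return 0                                   # a leaf has height 0
    assert len(node.children) == n + 1             # n keys means n+1 children
    bounds = [lo] + node.keys + [hi]
    heights = {check(c, t, bounds[j], bounds[j + 1], False)
               for j, c in enumerate(node.children)}
    assert len(heights) == 1                       # all leaves have the same depth
    return heights.pop() + 1


root = Node([10, 34], [Node([3, 7]), Node([17, 20, 25]), Node([39, 40])])
print(check(root, t=3))  # 1
```

The checker walks the whole tree, so it is a debugging aid rather than something to run on a disk-resident tree.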
4 Example
[Figure: an example B-tree with t = 3.]
5 B-Trees and disk access (last time...)
- Each node contains as many keys as possible without being larger than a single page on disk.
- Whenever we need to access a node, we load it from the disk (one read operation); after changing a node, we rewrite it to the disk.
- For example, say each node contains 1000 keys. Then the root has 1001 children, each of which also has 1001 children. Thus, with just 2 disk accesses (assuming the root is already in memory), we are able to access about $1000^3$ (a billion) records.
- Operations are designed to work in one pass from the root to the leaves; we do not need to backtrack. This further reduces the number of disk accesses we make.
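The arithmetic behind the 1000-keys example can be checked with a few lines. This is a sketch under the assumption that every node is completely full; `max_keys` is a hypothetical helper name, not something from the slides.

```python
def max_keys(keys_per_node: int, height: int) -> int:
    """Total keys when every node holds keys_per_node keys and has keys_per_node + 1 children."""
    branching = keys_per_node + 1
    # Number of nodes at depth d is branching**d, for d = 0 .. height.
    nodes = sum(branching ** depth for depth in range(height + 1))
    return nodes * keys_per_node


# Root, its 1001 children, and their 1001 * 1001 children:
print(max_keys(1000, 2))  # 1003003000
```

So a tree of height 2 with 1000 keys per node holds a little over a billion keys, which is where the "about $1000^3$" figure comes from.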
6 The height of a B-Tree
- Theorem: If $n \ge 1$, then for any B-tree of height h with n keys and minimum degree $t \ge 2$: $h \le \log_t \frac{n+1}{2}$.
- Proof: Each child of the root has at least t children, each of them also has at least t children, and so on. Thus in every sub-tree of the root there are at least $1 + t + t^2 + \cdots + t^{h-1} = \frac{t^h - 1}{t - 1}$ nodes. Each of them contains at least t-1 keys. The root contains at least one key and has at least two children, so we have $n \ge 1 + 2(t-1)\frac{t^h - 1}{t - 1} = 2t^h - 1$, and therefore $t^h \le \frac{n+1}{2}$, i.e. $h \le \log_t \frac{n+1}{2}$.
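To get a feel for the bound, the following sketch computes the largest height the theorem allows for given n and t. It uses only integer arithmetic, to avoid floating-point logarithms; `min_keys` and `max_height` are hypothetical names used for illustration.

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys in a B-tree of height h and minimum degree t: 2*t**h - 1."""
    return 2 * t ** h - 1


def max_height(n: int, t: int) -> int:
    """Largest h with 2*t**h - 1 <= n, i.e. the floor of log_t((n + 1) / 2)."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h


print(max_height(10 ** 9, 1001))  # 2
print(max_height(10 ** 9, 2))     # 28
```

With a billion keys, a tree with minimum degree 1001 has height at most 2, while a tree with minimum degree 2 may be 28 levels tall.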
7 B-Tree Search
- Search is done in the regular way: in each node, we find the sub-tree in which our value might be, and recursively search for it there.
- Performance: $O(th) = O(t \log_t n)$ total run time, out of which $O(h) = O(\log_t n)$ are disk access operations.
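A minimal sketch of the search, assuming the whole tree is in memory (a real implementation would read each child from disk before descending into it). `Node` and `search` are hypothetical names; the linear scan inside a node matches the O(t) work per node counted above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)      # sorted keys in this node
    children: list = field(default_factory=list)  # empty for a leaf


def search(node, k):
    """Return (node, index) of key k, or None if k is not in the sub-tree."""
    i = 0
    while i < len(node.keys) and k > node.keys[i]:   # O(t) scan inside the node
        i += 1
    if i < len(node.keys) and node.keys[i] == k:
        return node, i
    if not node.children:                            # reached a leaf without finding k
        return None
    return search(node.children[i], k)               # one "disk access" per level


root = Node([10, 34], [Node([3, 7]), Node([17, 20, 25]), Node([39, 40])])
found = search(root, 20)
print(found[0].keys, found[1])  # [17, 20, 25] 1
print(search(root, 5))          # None
```

Replacing the scan with a binary search inside each node would lower the CPU time but not the number of disk accesses.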
8 B-Tree Split
- Used for insertion. This operation ensures that a node has fewer than 2t-1 keys (i.e. is not full).
- What we do is split the full node into two nodes, each with t-1 keys. The extra key goes into the node's parent (we assume the parent is not full).
- To split a full node x (look at the next slide for an illustration), take $key_t[x]$ (notice it is the median key). All smaller keys (exactly t-1 of them) form one new (legal) node, and likewise all larger keys. $key_t[x]$ goes into x's parent.
- If the node we split is the root, then a new root is created. This new root contains only one key.
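The split step described above can be sketched as follows. The `Node` class and `split_child` are hypothetical names, and the tree is assumed to be in memory; on disk, the three touched nodes (the two halves and the parent) would each be written back.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list means leaf


def split_child(parent, i, t):
    """Split the full child parent.children[i] around its median key."""
    full = parent.children[i]
    assert len(full.keys) == 2 * t - 1           # only full nodes are split
    assert len(parent.keys) < 2 * t - 1          # the parent must have room
    median = full.keys[t - 1]                    # key_t[x] in 1-based slide notation
    right = Node(full.keys[t:], full.children[t:])   # the t-1 larger keys
    full.keys = full.keys[:t - 1]                    # the t-1 smaller keys stay
    full.children = full.children[:t]
    parent.keys.insert(i, median)                # the median moves up
    parent.children.insert(i + 1, right)


parent = Node([50], [Node([10, 20, 30, 40, 45]), Node([60, 70])])
split_child(parent, 0, t=3)
print(parent.keys)                         # [30, 50]
print([c.keys for c in parent.children])   # [[10, 20], [40, 45], [60, 70]]
```

Splitting the root fits the same function: create an empty node whose only child is the old root, then call `split_child` on it with index 0.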
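9 B-tree split
[Figure: a full node with median key m (t-1 keys on each side of it) below a parent with keys x and y. After the split, m has moved up into the parent, and the two groups of t-1 keys are separate nodes.]
Notice that the parent has many other sub-trees that don't change.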
10 Example
[Figure: splitting a full node (t = 3). Keys shown in the figure: 50, 65, 83, 89, 95, 96.]
11 B-Tree Insert
- We insert a key only into a leaf. We start from the root and go down to the appropriate leaf.
- On the way, before going down to the next node, we check whether it is full. If so, we split it (its parent is not full, because we checked this before going down to the parent).
- When we reach the right leaf, we know that the leaf is not full, so we can simply insert the new value into it.
- Notice that we may need to split the root, if it is full. In this case, the tree's height increases (but the tree remains completely balanced!). That's why we say that a B-tree grows from the root, in contrast to most trees, which grow from the leaves...
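The one-pass insertion above might look like this in code. It is a sketch under stated assumptions: the tree lives in memory, keys are distinct, and the names `Node`, `BTree`, `_split_child` and `insert` are hypothetical, chosen here for illustration.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list means leaf


class BTree:
    def __init__(self, t):
        self.t = t
        self.root = Node()

    def _is_full(self, node):
        return len(node.keys) == 2 * self.t - 1

    def _split_child(self, parent, i):
        t, full = self.t, parent.children[i]
        right = Node(full.keys[t:], full.children[t:])
        parent.keys.insert(i, full.keys[t - 1])      # median moves up
        parent.children.insert(i + 1, right)
        full.keys = full.keys[:t - 1]
        full.children = full.children[:t]

    def insert(self, k):
        if self._is_full(self.root):                 # the tree grows from the root
            self.root = Node([], [self.root])
            self._split_child(self.root, 0)
        node = self.root
        while node.children:                         # single pass down, no backtracking
            i = bisect.bisect_left(node.keys, k)
            if self._is_full(node.children[i]):      # split before descending
                self._split_child(node, i)
                if k > node.keys[i]:                 # k belongs right of the new separator
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, k)                  # the leaf is guaranteed not full


tree = BTree(t=3)
for key in [3, 7, 34, 10, 39, 25, 40, 20, 17]:
    tree.insert(key)
print(tree.root.keys)                         # [10, 34]
print([c.keys for c in tree.root.children])   # [[3, 7], [17, 20, 25], [39, 40]]
```

With this key sequence and t = 3, inserting 25 is the first insertion that finds a full root, and inserting 17 is the one that finds a full right leaf.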
12 Example
We start with an empty tree (t = 3).
[Figure: the tree after each of the following steps.]
(I) Inserting 3, 7, 34, 10, 39
(II) Inserting 25 splits the root
(III) Inserting 40 and 20
(IV) Inserting 17 splits the right leaf
13 B-Tree Insert (cont.)
- Performance:
- Split: three disk accesses (to write the 2 new nodes and the parent); O(t) total run time.
- Insert: O(h) disk accesses; $O(th) = O(t \log_t n)$ total run time.
- Requires O(1) pages in main memory.
14 B-Tree Delete
- Delete is a bit more complicated...
- The basic idea: when we go down the tree, we make sure the next node has at least t keys, so that we will be able to delete a value from a leaf.
- For this we use merge, the inverse of split. If we have two adjacent children with t-1 keys each and a parent with at least t keys, we take one key from the parent and merge it with the two children to become a single node.
- We also use rotation: move a key from one child that has at least t keys, through the parent, to an adjacent sibling (see next slides).
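15 Merge
[Figure: merge, the inverse of split. Before: a parent with at least t keys, among them x, m and y, and two adjacent children of t-1 keys each on either side of m. After: m has moved down, and the two children together with m form a single node.]
Notice that the parent has many other sub-trees that don't change.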
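16 (left) rotation
[Figure: left rotation. Before: the parent key m separates a child with t-1 keys from a sibling with more than t-1 keys, one of which is X; T is the sub-tree that moves along. After: m has moved down into the child that had t-1 keys, and X has replaced m in the parent.]
- Rotation is done when we have two consecutive siblings, one with exactly t-1 keys and one with at least t keys.
- Similarly we can do a right rotation.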
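Merge and the two rotations can be sketched as three small functions. The names are hypothetical, the tree is assumed to be in memory, and the naming convention is an assumption: here "right rotation" moves a key from the left sibling, through the parent, into the right sibling, and "left rotation" is the mirror image.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list means leaf


def merge_children(parent, i):
    """Merge children i and i+1 together with the separating key parent.keys[i]."""
    left, right = parent.children[i], parent.children[i + 1]
    left.keys += [parent.keys.pop(i)] + right.keys   # t-1 keys + separator + t-1 keys
    left.children += right.children
    del parent.children[i + 1]


def rotate_right(parent, i):
    """Move one key from children[i] up, and parent.keys[i] down into children[i+1]."""
    left, right = parent.children[i], parent.children[i + 1]
    right.keys.insert(0, parent.keys[i])
    parent.keys[i] = left.keys.pop()
    if left.children:                                # the sub-tree T changes parent
        right.children.insert(0, left.children.pop())


def rotate_left(parent, i):
    """Move one key from children[i+1] up, and parent.keys[i] down into children[i]."""
    left, right = parent.children[i], parent.children[i + 1]
    left.keys.append(parent.keys[i])
    parent.keys[i] = right.keys.pop(0)
    if right.children:
        left.children.append(right.children.pop(0))


root = Node([10, 34], [Node([3, 7]), Node([17, 20, 25]), Node([39, 40])])
rotate_right(root, 1)
print(root.keys, [c.keys for c in root.children])
# [10, 25] [[3, 7], [17, 20], [34, 39, 40]]
merge_children(root, 0)
print(root.keys, [c.keys for c in root.children])
# [25] [[3, 7, 10, 17, 20], [34, 39, 40]]
```

None of the three functions checks key counts; the caller is expected to apply them only in the situations described on the slides.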
17 B-Tree Delete (1)
- For each node x on the way to k (x is internal, and doesn't contain k), determine the sub-tree that should contain k (whose root is $c_i[x]$), and:
- If $c_i[x]$ has at least t keys, simply delete k from it by recursion.
- If $c_i[x]$ has only t-1 keys but has an adjacent sibling with at least t keys, give $c_i[x]$ an extra key by a left/right rotation. Now recurse into $c_i[x]$.
- Otherwise, merge $c_i[x]$ with one of its adjacent siblings (note that x itself has at least t keys, unless it is the root). Continue the deletion from the merged node.
- Similarly to insertion, the tree height decreases (only) when we perform a merge operation whose parent is the root, and the root contains only one key.
18 B-Tree Delete (2)
- If we want to delete a key k and we found the node x that contains k:
- If x is a leaf, simply remove k from x (we assume x has at least t keys, which we verify during the descent; see also below).
- If x is an internal node:
- Let y be the left child of k. If y has at least t keys, then recursively delete the largest key in the sub-tree rooted at y (call it k') and replace k with k' in x.
- Otherwise, do the same for the right child, z, of k: if z has at least t keys, recursively delete the smallest key in its sub-tree and use it to replace k in x.
- If both y and z have only t-1 keys, merge y, z, and k. Recursively delete k from the merged node (notice that the new node has 2t-1 keys).
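Putting the two delete slides together gives roughly the following. This is a sketch rather than a definitive implementation: it assumes distinct keys and an in-memory tree, all names (`Node`, `merge`, `fill`, `delete_from`, `delete`) are hypothetical, and for brevity it locates the predecessor or successor with a separate short walk before deleting it, instead of doing both in a single pass. When both siblings have only t-1 keys it merges with the left sibling if there is one, which is one possible choice.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list means leaf


def merge(x, i):
    """Merge x.children[i], x.keys[i] and x.children[i+1] into one node."""
    left, right = x.children[i], x.children[i + 1]
    left.keys += [x.keys.pop(i)] + right.keys
    left.children += right.children
    del x.children[i + 1]


def fill(x, i, t):
    """Make sure the child we descend into has at least t keys; return its index."""
    child = x.children[i]
    if len(child.keys) >= t:
        return i
    if i > 0 and len(x.children[i - 1].keys) >= t:       # rotate from the left sibling
        left = x.children[i - 1]
        child.keys.insert(0, x.keys[i - 1])
        x.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
        return i
    if i + 1 < len(x.children) and len(x.children[i + 1].keys) >= t:
        right = x.children[i + 1]                        # rotate from the right sibling
        child.keys.append(x.keys[i])
        x.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
        return i
    if i > 0:                                            # both siblings are minimal: merge
        merge(x, i - 1)
        return i - 1
    merge(x, i)
    return i


def delete_from(x, k, t):
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and x.keys[i] == k:               # "Delete (2)": k is in x
        if not x.children:
            x.keys.pop(i)
        elif len(x.children[i].keys) >= t:               # replace k by its predecessor
            node = x.children[i]
            while node.children:
                node = node.children[-1]
            x.keys[i] = node.keys[-1]
            delete_from(x.children[i], x.keys[i], t)
        elif len(x.children[i + 1].keys) >= t:           # replace k by its successor
            node = x.children[i + 1]
            while node.children:
                node = node.children[0]
            x.keys[i] = node.keys[0]
            delete_from(x.children[i + 1], x.keys[i], t)
        else:                                            # y and z both have t-1 keys
            merge(x, i)
            delete_from(x.children[i], k, t)
    elif x.children:                                     # "Delete (1)": keep descending
        j = fill(x, i, t)
        delete_from(x.children[j], k, t)
    # otherwise x is a leaf without k: the key is not in the tree


def delete(root, k, t):
    """Delete k and return the (possibly new) root."""
    delete_from(root, k, t)
    if not root.keys and root.children:                  # the tree shrinks from the root
        return root.children[0]
    return root


root = Node([10, 34], [Node([3, 7]), Node([17, 20, 25]), Node([39, 40])])
for key in [39, 20, 10, 40, 34]:
    root = delete(root, key, t=3)
print(root.keys)  # [3, 7, 17, 25]
```

In this run, deleting 39 and 40 each trigger a rotation from the left sibling, deleting 20 triggers a merge, and deleting 34 triggers a merge that empties the root, so the height drops by one.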
19 Performance of delete operation
- O(h) disk accesses
- $O(th) = O(t \log_t n)$ total run time
- The advantage: only one pass down the tree!
20 Example
[Figure: the tree at the start and after each of the following steps.]
(I) Deleting 39 causes a right rotation
(II) Deleting 20 causes a merge
(III) Deleting 10
(IV) Deleting 40
(V) Deleting 34 causes a merge, and the tree shrinks