Title: The BTree Index
1The B-Tree Index
Lecture 22 CS 157A
- Prof. Sin-Min Lee
- Department of Computer Science
- San Jose State University
2Outline
- Introduction
- The B-tree shape
- Dynamic changes in the B-tree
- Properties of the B-tree
- Create index statement syntax
- More about the B-tree
- Summary
3Introduction
- A B-tree is a keyed index structure, comparable
to a number of memory resident keyed lookup
structures such as balanced binary tree, the AVL
tree, and the 2-3 tree. - The difference is that a B-tree is meant to
reside on disk, being made partially
memory-resident only when entries int the
structure are accessed. - The B-tree structure is the most common used
index type in databases today. - It is provided by ORACLE, DB2, and INGRES.
4(No Transcript)
5The B-Tree Shape
- A B-tree is built upside down with the root at
the top and the leaves at the bottom. - All nodes above the leaf level, including the
root, are called directory nodes or index nodes. - Directory nodes below the root are called
internal nodes. - The root node is known as level 1 of the B-tree
and successively lower levels are given
successively larger level numbers with the leaf
nodes at the lowest level. - The total number of levels is called the depth
of the B-tree.
6- Balanced and Unbalanced Trees
- Trees can be balanced or unbalanced.
- In a balanced tree, every path from the route to
a leaf node is the same length. - A tree that is balanced has at most logorder n
levels. This is desirable for an index.
7The Problem of Unbalanced Trees
- a. A Troublesome Search Tree
- b. A More Troublesome Search Tree
1
1
2
2
3
3
4
4
5
6
5
7
8
9
8Disadvantage of unbalanced tree
- Searching an unbalanced tree may
- require traversing an arbitrary and
unpredictable number of nodes and pointers.
9Unbalanced Tree (cont.)
- Problems
- 1. The levels of the tree are only sparsely
filled - 2. Resulting in long
- 3. Deep paths and defeating the purpose of
binary trees in the first place.
10The general name for B-trees is multiway trees.
However, the best known of them have their very
own names 2-3 trees, B trees, and B trees.
Here we will look only at 2-3 trees. They are the
simplest. And or purposes of learning the
underlying principles this is good. Multiway
trees can get pretty complicated pretty fast.
Keep in mind that 2-3 trees as we study them are
rarely built. Their larger cousins, B and B
trees are often used for medium and large
database applications. The general idea of a
multiway tree of order n is that each node can
hold up to n - 1 key values and each node can
have up to n childern. So, a 2-3 tree is actually
a multiway tree of order 3. That means that a 2-3
tree has nodes that can hold 1 or 2 values and
that a node can have 0, 1, 2, or 3 children. Thus
the name 2-3 tree
11The B-Tree Shape (c.)
- level1 root node, level2 directory nodes,
level3 leaf nodes
12(No Transcript)
13B-Trees
- For a binary search tree, the search time is
O(h), where h is the height of the tree. - The height cannot be less than roughly log2n for
a tree with n nodes. - If the tree is too large for internal memory,
each access of a node is an I/O. - For a tree with 10,000 nodes
- log210000 100 disk accesses
- We need a structure that would require only
3 or 4 disk accesses. -
14B-Trees - Definition
- A B-tree of order M is an M-ary tree with
the following properties - (1) The data items are stored at leaves.
- (2) The nonleaf nodes store up to M - 1 keys
to guide the searching key i represents the
smallest key in subtree i 1.
15- (3) The root is either a leaf or has between 2
and M children. - (4) All nonleaf nodes (except the root) have
between ceiling(M/2) and M children. - (5) All leaves are at the same depth and have
between ceiling(L/2) and L data items, for some
L.
16(No Transcript)
17B-tree of order n
- Every B-tree is of some "order n", meaning nodes
contain from n to 2n keys (so nodes are always at
least half full of keys), and n1 to 2n1
pointers, and n can be any number. - Keys are kept in sorted order within each node. A
corresponding list of pointers are effectively
interspersed between keys to indicate where to
search for a key if it isn't in the current node.
18- A B-tree of order n is a multi-way search tree
with two properties - 1.All leaves are at the same level
- 2.The number of keys in any node lies between
n and 2n, with the possible exception of the root
which may have fewer keys.
19Other definition
- A B-tree of order m is a m-way tree that
satisfies the following conditions. - Every node has lt m children.
- Every internal node (except the root) has ltm/2
children. - The root has gt2 children.
- An internal node with k children contains (k-1)
ordered keys. The leftmost child contains keys
less than or equal to the first key in the node.
The second child contains keys greater than the
first keys but less than or equal to the second
key, and so on.
20A B-tree of order 2
21Dynamic changes in the B-Tree
- A B-tree is an efficient self-modifying structure
when new entries are inserted pointing to new
rows inserted in the indexed table. - The nodes at every level are generally assumed
not to be full. - Space is left so that inserts are often possible
to a node at any level without new disk space
being required. - An insert of a new entry always occurs at the
leaf level, but occasionally the leaf node is too
full to simply accept the new entry. In this
case, for additional space the leaf level node is
split into two leaf pages.
22Properties of the B-Tree
- Assumptions
- Entry key values can have variable length because
of variable-length column values appearing in the
index key. - When a node split occurs, equal lengths of entry
information are placed in the left and right
split node. - Rebalancing actions in the B-tree occur when
entries are deleted.
23Properties of the B-Tree (c.)
- Properties
- Every node is disk-page sized and resides in a
well-defined location. - Nodes above the leaf level contain directory
entries, with n-1 separator keys and n disk
pointers to lower-level B-tree nodes. - Nodes at the leaf level contain entries with
(keyval, rowid) pairs pointing to individual rows
indexed. - All nodes below the root are at least half full
with entry information. - The root node contains at least two entries.
24Insertion in B-Tree
f
a b f g
a b
g k
25Insertion (cont.)
- 3. 4.
- d, h, m j
- 5. 6.
- e, s, i, r
x -
f j
f
a b d
g h
k m
a b d
g h k m
f j r
f j
g h i
s x
k m
a b d e
a b d e
k m r s
g h i
26Insertion (cont.)
c f j r
s t u x
k l m n
g h i
a b
d e
j
m r
c f
a b
d e
k l
n p
g h i
s t u x
27Inserting into a B-Tree
- To insert key value x into a B-Tree
- Use the B-Tree search to determine on which node
to make the insertion. - Insert x into appropriate position on that leaf
node. - If resulting number of keys on that node lt L,
then simply output that node to disk and return. - Otherwise, split the node.
28Inserting into a B-Tree Splitting a Node
- Allocate a new leaf node. Put about half (i.e.,
- about L/2) of the keys on the new node and
- leave about half of the keys on the existing
node. - Make appropriate changes to keys and pointers in
the parent node. - If the parent node was already full, then split
the parent node. - The splitting of parents may continue all the way
back up to the root node.
29Insert 19,12, 22,15.
30Insertion
- Insert the keys in the folowing order into a
B-tree of order 5. - A, G, F, B, K, D, H, M, J, E, S, I, R, X, C,
L, N, T, U, P.
31(No Transcript)
32Searching
- Searching for an Item in a B-Tree
- 1. Make a local variable, i, equal to the
first index such that datai gt target. If
there is no such index, then set i equal to
data_count, indicating that none of the entries
is grater than or equal to the target. - 2. if (we found the target at datai)
- return true
- else if (the root has no children)
- return false
- else
- return subseti-gtcontains
(target)
33Searching (cont.)
6 17
19 22
12
4
20
25
2 3
16
10
18
5
34Deletion form a B-Tree
- 1. detete h, r
-
s promote s and -
delete form leaf
j
c f
m r
s t u x
d e
a b
k l
n p
g h i
g i
t u x
35Deletion (cont.)
- 2. delete p
-
-
t pull s
down -
pull t up
j
c f
m s
d e
a b
g i
n p
k l
t u x
n s
36Deletion (cont.)
j
m t
c f
u x
k l
n s
g i
d e
a b
37Deletion (cont.)
j
m t
f
k l
n s
u x
g i
a b c e
f j m t
u x
n s
k l
g i
a b c e
38Deleting from a B-Tree
- To delete a key value x from a B-tree, first
search to determine the leaf node that contains
x. - If removing x leaves that leaf node with
fewer than the minimum number of keys, try to
adopt a key from a neighboring node. If thats
possible, then youre finished.
39Deleting from a B-Tree (continued)
- If the neighboring node is already at its
minimum, combine the leaf node with its
neighboring node, resulting in one full leaf
node. - This will require restructuring the parent node
since it has lost a child - If the parent now has fewer than the minimum
keys, adopt a key from one of its neighbors. If
thats not possible, combine the parent with its
neighbor.
40Deleting from a B-Tree (continued)
- This process may percolate all the way to the
root. - If the root is left with only one child, then
remove the root node and make its child the new
root. - Both insertion and deletion are O(h), where h is
the height of the tree.
41Delete 18
42Delete 5
43Delete 19
44Delete 12
45Properties of B-Trees
- (1) In a B-tree of order m with n keys the number
of nodes, p, satisfies -
- and on average p 1.44 n/m.
- (2) On average less than
- nodes are split per insertion
46Advantages of B-tree
- Searching a balanced tree means that all leaves
are at the same depth. There is no runaway
pointer overhead. Indeed, even very large B-trees
can guarantee only a small number of nodes must
be retrieved to find a given key. For example, a
B-tree of 10,000,000 keys with 50 keys per node
never needs to retrieve more than 4 nodes to find
any key.
47ORACLE Create Index Statement
- Syntax
- create unique index indexname on tablename
- (columnname asc desc , columnname
asc desc) - tablespace tblspacename
- storage ( initial n next n
minextents n - maxextents n pctincrease n )
- pctfree n
- other disk storage and transaction
clauses not covered - or deferred
- nosort
48ORACLE Create Index Statement (c.)
- Explanation
- The value of n in pctfree can range from 0 to 99
and this number determines the percentage of each
B-tree node page. - The list of columnnames in parentheses on the
second line specifies a concatenation of column
values that make up an index key on the table
specified. - The nosort indicates that the rows already lie on
disk in sorted order by the key values for this
index.
49DB2 Create Index Statement
- Syntax
- create unique index indexname on tablename
- (columname asc desc , columnname asc
desc) - using . . .
- freepage n
- pctfree n
- additional clauses not covered or
deferred
50DB2 Create Index Statement (c.)
- Explanation
- The using specifies how the index is to be
constructed from disk files. - The integer n of the freepage specifies how
frequently an empty free page should be left in
the sequence of pages assigned to the index when
it is loaded with entries by a DB2 utility. - One free page is left for every n index pages
where n varies from 0 to 255. The defaut value
for n is 0 meaning that no free pages are left. A
value of n1 means that alternate disk page are
left empty.
51INGRES Create Index Statement
- Syntax
- create unique index indexname on table
- (columname , columnname)
- with / comas separate
clauses following / - location . . .
- structure btree isam hash . . .
- key (columnname , columnname)
- fillfactor n
- nonleaffill n
- additional clauses not covered or
deferred
52INGRES Create Index Statement (c.)
- Explanation
- The with keyword must be present if any of the
later coma-separated clauses appear. - The location specifies how the index is to be
constructed from disk files. - The structure is unique to INGRES and names the
access structure that the index will be assigned
when it is created. - The key indicates that the index key value will
be constructed from the columnnames listed. - The fillfactor and nonleaffill gives the
percentage of node space that should be filled.
53Index Node Layout and Free Space
- Below is the schematic layout of a normal
leaf-level index node with unique key value. - Header info
Free space
keyval rid
keyval rid
54More about the B-Tree
- The purpose of the B-tree index is to minimize
the number of disk I/Os needed to locate a row
wit a given index key value. - The depth of the B-tree bears a close
relationship to the number of disk I/Os used to
reach the leaf-level entry where the rowid is
kept. - The nodes of the B-tree are loaded in a
left-to-right fashion so that successive inserts
normally occur to the same leaf node held
consistently in memory buffer. - When the leaf node splits, the successive leaf
node is allocated from the next disk page of the
allocated extent.
55More about the B-Tree (c.)
- Node splits at every level occur in a controlled
way and allow us to leave just the right amount
of free space on each page. - It is common to estimate the fanout at each level
to have a value of n where n is expected number
of entries that appear in each node. Assuming
that there are n directory entries at the root
node and every node below that, the number of
entries at the second level is n2, at third
level is n3, and so on. For a tree of depth K,
the number of leaf-level entries is nK just
before a root split occurs in the tree to make it
a tree of depth K1.
56Summary
- The B-tree is a tree-like structure that helps us
to organize data in an efficient way. - The B-tree index is a technique used to minimize
the disk I/Os needed for the purpose of locating
a row with a given index key value. - Because of its advantages, the B-tree and the
B-tree index structure are widely used in
databases nowadays.