Title: FILE
1FILE SYSTEIM STRUCTURE
2Physical Storage Media
- A system consists of several forms of storage
- Cache fastest and most costly form of storage
volatile managed by the computer system
hardware. - Main memory
- fast access (10ns to 100ns 1 nanosecond 109
seconds) - generally too small (or too expensive) to store
the entire database - capacities of up to a few Gigabytes widely used
currently - Capacities have gone up and per-byte costs have
decreased steadily and rapidly (roughly factor
of 2 every 2 to 3 years) - Volatile contents of main memory are usually
lost if a power failure or system crash occurs.
3Disks and Files
- DBMS stores information on (hard) disks.
- This has major implications for DBMS design!
- READ transfer data from disk to main memory
(RAM). - WRITE transfer data from RAM to disk.
- Both are high-cost operations, relative to
in-memory operations, so must be planned
carefully!
4Why Not Store Everything in Main Memory?
- Costs too much. 1000 will buy you either 0.5GB
of RAM or 50GB of disk today. - Main memory is volatile. We want data to be
saved between runs. (Obviously!) - Typical storage hierarchy
- Main memory (RAM) for currently used data.
- Disk for the main database (secondary storage).
- Tapes for archiving older versions of the data
(tertiary storage).
5Disks
- Secondary storage device of choice.
- Main advantage over tapes random access vs.
sequential. - Data is stored and retrieved in units called disk
blocks or pages. - Unlike RAM, time to retrieve a disk page varies
depending upon location on disk. - Therefore, relative placement of pages on disk
has major impact on DBMS performance!
6Components of a Disk
Spindle
Disk head
- The platters spin (say, 90rps).
- The arm assembly is moved in or out to position
a head on a desired track. Tracks under heads
make a cylinder (imaginary!).
Sector
Platters
- Only one head reads/writes at any one time.
- Block size is a multiple of sector size (which
is fixed).
7Accessing a Disk Page
- Time to access (read/write) a disk block
- seek time (moving arms to position disk head on
track) - rotational delay (waiting for block to rotate
under head) - transfer time (actually moving data to/from disk
surface) - Seek time and rotational delay dominate.
- Seek time varies from about 1 to 20msec
- Rotational delay varies from 0 to 10msec
- Transfer rate is about 1msec per 4KB page
- Key to lower I/O cost reduce seek/rotation
delays! Hardware vs. software solutions?
8Arranging Pages on Disk
- Next block concept
- blocks on same track, followed by
- blocks on same cylinder, followed by
- blocks on adjacent cylinder
- Blocks in a file should be arranged sequentially
on disk (by next), to minimize seek and
rotational delay. - For a sequential scan, pre-fetching several pages
at a time is a big win!
9Files of Records
- Page or block is OK when doing I/O, but higher
levels of DBMS operate on records, and files of
records. - FILE A collection of pages, each containing a
collection of records. Must support - insert/delete/modify record
- read a particular record (specified using record
id) - scan all records (possibly with some conditions
on the records to be retrieved)
10Record Formats Fixed Length
F1
F2
F3
F4
L1
L2
L3
L4
Base address (B)
Address BL1L2
- Information about field types same for all
records in a file stored in system catalogs. - Finding ith field requires scan of record.
11Record Formats Variable Length
- Two alternative formats ( fields is fixed)
F1 F2 F3
F4
Fields Delimited by Special Symbols
Field Count
F1 F2 F3 F4
Array of Field Offsets
- Second offers direct access to ith field,
efficient storage - of nulls (special dont know value) small
directory overhead.
12Page Formats Fixed Length Records
Slot 1
Slot 1
Slot 2
Slot 2
Free Space
. . .
. . .
Slot N
Slot N
Slot M
N
M
1
. . .
1
1
0
M ... 3 2 1
number of records
number of slots
PACKED
UNPACKED, BITMAP
- Record id . In first
alternative, moving records for free space
management changes rid may not be acceptable.
13Page Formats Variable Length Records
Rid (i,N)
Page i
Rid (i,2)
Rid (i,1)
N
Pointer to start of free space
20
16
24
N . . . 2 1
slots
SLOT DIRECTORY
- Can move records on page without changing rid
so, attractive for fixed-length records too.
14Alternative File Organizations
- Many alternatives exist, each ideal for some
situation, and not so good in others - Heap files Suitable when typical access is a
file scan retrieving all records. - Sorted Files Best if records must be retrieved
in some order, or only a range of records is
needed. - Hashed Files Good for equality selections.
- File is a collection of buckets. Bucket primary
page plus zero or more overflow pages. - Hashing function h h(r) bucket in which
record r belongs. h looks at only some of the
fields of r, called the search fields.
15TERMINOLOGY
- Computers represent data as a sequence of zero
and ones, termed bits - A byte is eight contiguous bits
0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
16TERMINOLOGY
- Computers represent data as a sequence of zero
and ones, termed bits - A byte is eight contiguous bits
0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
17TERMINOLOGY
- Computers represent data as a sequence of zero
and ones, termed bits - A byte is eight contiguous bits
0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
18TERMINOLOGY
- Computers represent data as a sequence of zero
and ones, termed bits - A byte is eight contiguous bits
- 1024 bytes is one Kilobyte (KB), e.g., your
textbook.
0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
19TERMINOLOGY
- Computers represent data as a sequence of zero
and ones, termed bits - A byte is eight contiguous bits
- 1024 bytes is one Kilobyte (KB).
- 1024 KB is one Megabyte (MB), a high resolution
photograph.
0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
20TERMINOLOGY
- Computers represent data as a sequence of zero
and ones, termed bits - A byte is eight contiguous bits
- 1024 bytes is one Kilobyte.
- 1024 KB is one Megabyte (MB).
- 1024 MB is one Gigabyte (GB), e.g., a DVD quality
movie.
0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
21TERMINOLOGY
- Computers represent data as a sequence of zero
and ones, termed bits - A byte is eight contiguous bits
- 1024 bytes is one Kilobyte.
- 1024 KB is one Megabyte.
- 1024 MB is one Gigabyte.
- 1024 GB is one Terabyte (TB), all text in the
library of congress.
0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
22TERMINOLOGY
- Computers represent data as a sequence of zero
and ones, termed bits - A byte is eight contiguous bits
- 1024 bytes is one Kilobyte.
- 1024 KB is one Megabyte.
- 1024 MB is one Gigabyte.
- 1024 GB is one Terabyte (TB).
- 1024 TB is one Petabyte (PB), entire multimedia
collection at LoC.
0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
23TERMINOLOGY
- Computers represent data as a sequence of zero
and ones, termed bits - A byte is eight contiguous bits
- 1024 bytes is one Kilobyte.
- 1024 KB is one Megabyte.
- 1024 MB is one Gigabyte.
- 1024 GB is one Terabyte.
- 1024 TB is one Petabyte (PB).
- 1024 PB is one Exabyte (XB), record all phone
conversations in a year.
0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
24TERMINOLOGY
- Computers represent data as a sequence of zero
and ones, termed bits - A byte is eight contiguous bits
- 1024 bytes is one Kilobyte.
- 1024 KB is one Megabyte.
- 1024 MB is one Gigabyte.
- 1024 GB is one Terabyte.
- 1024 TB is one Petabyte.
- 1024 PB is one Exabyte.
- 1024 XB is one Zetabyte (ZB), all uncompressed
medical data.
0101111110011010101010110000000000..00000
0101111110011010101010110000000000..00000
25HOW MUCH DATA IS THERE?
- Approximately 5000 films are made each year
(worldwide) - Two hour display time at 240 mbps 900 TB
- Approximately 52 billion photographs are taken
each year - _at_ 10 KB per photograph, 520 PB
- Library of congress
- 20 million books _at_ 1MB 20 TB
- 15 million photographs _at_ 1 MB 13 TB
- 4 million maps _at_ 100 MB 400 TB
- 500,000 movies _at_ 10 GB 5 PB
- 3.5 million sound recordings at library of
congress _at_ 1 audio per CD 2 PB
26FILE SYSTEM STRUCTURE (Cont)
- A database system is organized as several layers
of software - Query parser translates a higher level query
language to an internal representation - Query optimizer transforms the internal
representation to an efficient execution paradigm
- Concurrency control and crash recovery ensures
consistency of data in the presence of multiple
concurrent update operations and
crash-recoveries. - Index methods efficient retrieval of records
for fast retrieval and update operations - Abstraction of multiple records on a disk page
implements the concept of multiple records on a
disk page.
27BIG PICTURE
SELECT SS FROM emp WHERE sal 50K
DBMS
28Overall Organization
SELECT SS FROM emp WHERE sal 50K
Relational Algebra operators ?, ?, ?, ?, ?, ?,
?, ?, ?
29Overall Organization
SELECT SS FROM emp WHERE sal 50K
Query Parser
?SS(?sal 50K (emp))
Relational Algebra operators ?, ?, ?, ?, ?, ?,
?, ?, ?
30QUERY TREE
- ?SS(?sal 50K (emp)) becomes a query tree
Computer Screen
?
TMP File1
? sal 50K
emp
31Overall Organization
Query Parser
Query Optimizer
Query Interpretor
Relational Algebra operators ?, ?, ?, ?, ?, ?,
?, ?, ?
Index structures
Abstraction of records
Buffer Pool Manager
File System
32FILE SYSTEM STRUCTURE (Cont)
- Buffer manager maintains a portion of memory that
is conceptualized as disk page frames. It
maintains which disk pages are memory resident.
It also implements a replacement policy in order
to swap a page out in favor of another disk page
that is being referenced. This happens because
the number of memory page frames is significantly
smaller than the number of disk pages. - File manager provides the following services
create a file, delete a file, read a disk page
into a specific memory address given the physical
address of disk page on the secondary storage
device, write a disk page from a memory address
on to the appropriate physical disk address,
insert a page into a file, modify a page, and
delete a page from a file.
33FILE SYSTEM STRUCTURE (Cont)
- When a program requests a disk page (by
specifying its address), the buffer manager takes
the following steps - Check if the page is in the buffer.
- If it is then pass its address to the calling
program. - Otherwise, read the page from the disk into the
buffer, possibly replacing some other page, and
then pass its address to the calling program. - Pinned blocks Occasionally, the DBMS needs to
specifically indicate that some blocks have to be
kept in the buffer until released by unpinning
them. These blocks are termed pinned. - Forced writing of blocks to disks To preserve
the consistency of the database during
crash-recovery, the DBMS might force the buffer
manager to flush some blocks to disks.
34INDEX
35TOPICS
- Basic concepts
- Hashing
- B-tree
36INTRODUCTION
E-R data model
Conceptual
Logical
Relational data model SQL
relation a file Org. of records on a disk
page Organization of attributes within a
record Index Files
Physical
37Software Architecture of a DBMS
Query Parser
Query Optimizer
Query Interpretor
Relational Algebra operators ?, ?, ?, ?, ?, ?,
?, ?, ?
Index structures
Abstraction of records
Buffer Pool Manager
File System
38Implementation of ?
- Emp table
- ?Salary30,000(Employee)
- Process the select operator using a file scan
(linear scan) - F1 Open the file corresponding to Employee
- P read first page of F1
- While P is not null
- For each record in P, if the record satisfies the
selection predicate then produce as output - P read next page of F1 / P becomes null when
EoF is reached /
39Implementation of ?
- Emp table
- ?Salary30,000(Employee)
- Process the select operator using a file scan
(linear scan) - F1 Open the file corresponding to Employee
- P read first page of F1
- While P is not null
- For each record in P, if the record satisfies the
selection predicate then produce as output - P read next page of F1
Fetch the page from disk if not in the buffer
pool
40Implementation of ?
- Emp table
- ?Salary30,000(Employee)
- Process the select operator using a file scan
(linear scan) - F1 Open the file corresponding to Employee
- P read first page of F1
- While P is not null
- For each record in P, if the record satisfies the
selection predicate then produce as output - P read next page of F1
41TERMINOLOGY
- An exact match selection predicate
?Salary30,000(Employee) , ?FirstNameShideh(Emp
loyee) - A range selection predicate ?Salary30,000(Employ
ee) , ?Salary30,000
and Salary
42INTRODUCTION (Cont)
- Example
- Assume, size of disk page 2 data records 5
index records. - Indexing or not indexing?
- SELECT age SELECT age
- FROM personnel FROM personnel
- WHERE name Alice WHERE name Don
43INTRODUCTION (Cont)
- Example
- Assume, size of disk page 2 data records 5
index records. - Primary vs. Secondary
- SELECT name SELECT age
- FROM personnel FROM personnel
- WHERE state Ohio WHERE name David
44INTRODUCTION (Cont)
- Example (page 2 data 5 index)
- Exact match vs. Range
- SELECT name SELECT name
- FROM personnel FROM personnel
- WHERE state California WHERE state
Alaska and - state Florida
- Speedup by employing binary search (is it
possible?)
45Dense Index Files
- Dense index Index record appears for every
search-key value in the file.
46Example of Sparse Index Files
47Multilevel Index
48Clustered vs. Unclustered Index
- Suppose that Alternative (2) is used for data
entries, and that the data records are stored in
a Heap file. - To build clustered index, first sort the Heap
file (with some free space on each page for
future inserts). - Overflow pages may be needed for inserts. (Thus,
order of data recs is close to, but not
identical to, the sort order.)
Index entries
UNCLUSTERED
CLUSTERED
direct search for
data entries
Data entries
Data entries
(Index File)
(Data file)
Data Records
Data Records
49HASHING
- Hash function
- K the set of all search key values
- V the set of all bucket address
- h(K) K V
- K is large (perhaps infinite) but set of
search-key values actually stored in the database
is much smaller than K. - Fast lookup To find Ki, search the bucket with
h(Ki) address.
50HASHING (Cont)
- Example
- K salary (set of all 6 digit integers)
- V 1000 buckets addressed from 0 to 999
- h(k) k mod 1000.
- SELECT name
- FROM personnel
- WHERE salary 120,100
- To find a 120,100 salary, we should search bucket
number 100. - Hash is only appropriate for Exact match queries.
- A bad hash function maps the value to a subset of
(or a few) buckets (e.g., h(k) k mod 10.
51HASHING (Cont)
- Clustered Hash Index
- The index structure and its buckets are
represented as a file (say file.hash) - The relation is stored in file.hash (I.e., each
entry in file.hash corresponds to a record in
relation) - Assuming no duplicates the record can be
accessed in 1 IO. - Non-clustered Hash Index
- The index structure and its buckets are
represented as a file (say file.hash) - The relation remains intact
- Each entry in file.hash has the following format
(search-key value, RID) - Assuming no duplicates the record can be
accessed in 2 IO.
52HEAP FILE ORGANIZATION
- Assume a student table Student(name, age, gpa,
major) - t(Student) 16
- P(Student) 4
Bob, 21, 3.7, CS
Kane, 19, 3.8, ME
Louis, 32, 4, LS
Chris, 22, 3.9, CS
Mary, 24, 3, ECE
Lam, 22, 2.8, ME
Martha, 29, 3.8, CS
Chad, 28, 2.3, LS
Tom, 20, 3.2, EE
Chang, 18, 2.5, CS
James, 24, 3.1, ME
Leila, 20, 3.5, LS
Kathy, 18, 3.8, LS
Vera, 17, 3.9, EE
Pat, 19, 2.8, EE
Shideh, 16, 4, CS
53Non-Clustered Hash Index
- A non-clustered hash index on the age attribute
with 4 buckets, - h(age) age B
(24, (1, 2))
(20, (4,3))
(32, (3,1))
(16, (4,4))
(21, (1, 1))
(20, (1,3))
(24, (3,3))
(17, (2,4))
(28, (4,2))
(29, (3,2))
0
1
2
(18, (1, 4))
3
(22, (2,2))
(19, (2, 1))
(22, (4,1))
(19, (3, 4))
(18, (2,3))
Bob, 21, 3.7, CS
Kane, 19, 3.8, ME
Louis, 32, 4, LS
Chris, 22, 3.9, CS
Mary, 24, 3, ECE
Lam, 22, 2.8, ME
Martha, 29, 3.8, CS
Chad, 28, 2.3, LS
Tom, 20, 3.2, EE
Chang, 18, 2.5, CS
James, 24, 3.1, ME
Leila, 20, 3.5, LS
Kathy, 18, 3.8, LS
Vera, 17, 3.9, EE
Pat, 19, 2.8, EE
Shideh, 16, 4, CS
54Clustered Hash Index
- A clustered hash index on the age attribute with
4 buckets, - h(age) age B
Mary, 24, 3, ECE
Shideh, 16, 4, CS
Louis, 32, 4, LS
Leila, 20, 3.5, LS
Bob, 21, 3.7, CS
Tom, 20, 3.2, EE
James, 24, 3.1, ME
Vera, 17, 3.9, EE
Chad, 28, 2.3, LS
Martha, 29, 3.8, CS
0
1
2
Kathy, 18, 3.8, LS
3
Lam, 22, 2.8, ME
Kane, 19, 3.8, ME
Chris, 22, 3.9, CS
Pat, 19, 2.8, EE
Chang, 18, 2.5, CS
55Non-Clustered Hash Index
- A non-clustered hash index on the age attribute
with 4 buckets, - h(age) age B
- Pointers are page-ids
500
(24, (1, 2))
(20, (4,3))
1001
(32, (3,1))
(16, (4,4))
(21, (1, 1))
(20, (1,3))
(24, (3,3))
(17, (2,4))
(28, (4,2))
(29, (3,2))
0
500
706
1
1001
2
(18, (1, 4))
706
101
3
101
(22, (2,2))
(19, (2, 1))
(22, (4,1))
(19, (3, 4))
(18, (2,3))
Bob, 21, 3.7, CS
Kane, 19, 3.8, ME
Louis, 32, 4, LS
Chris, 22, 3.9, CS
Mary, 24, 3, ECE
Lam, 22, 2.8, ME
Martha, 29, 3.8, CS
Chad, 28, 2.3, LS
Tom, 20, 3.2, EE
Chang, 18, 2.5, CS
James, 24, 3.1, ME
Leila, 20, 3.5, LS
Kathy, 18, 3.8, LS
Vera, 17, 3.9, EE
Pat, 19, 2.8, EE
Shideh, 16, 4, CS
56HASHING (Cont)
- Hashing vs. Indexing
- Hashing is appropriate for exact match queries
(cannot support range queries) - SELECT A1, A2,
- FROM r
- WHERE (Ai c)
- Indexing is appropriate for both range and exact
match queries - SELECT A1, A2,
- FROM r
- WHERE (Ai c2)
57B-TREE
- B-tree is a multi-level tree structured
directory - Clustered Leaf nodes contain the records,
themselves.
58B-TREE (Cont)
- Non-clustered Leaf nodes contain the pairs (P,
K), where P is a pointer to the record in the
file and K is a search-key.
59B-TREE (Cont)
- Leaf nodes
- Maintain between to n-1 values per
leaf. - If i
- Every search-key value in the file appears in
some leaf node. - Suppose Li and Lj are two leaves and i every search value in Li is less than every
search value in Lj.
P1
K1
P2
. . .
Pn-1
Kn-1
Pn
5
7
10
15
17
18
60B-TREE (Cont)
- Internal nodes
- Maintain between to n pointers per
internal node - root is an exception It must have more than one
pointer. - Suppose a node with m pointers and 2
- Pi points to subtree containing search-key values
Ki-1. - Pm points to subtree containing search-key values
Km-1. - P1 points to subtree containing search-key values
61B-TREE (Cont)
- Lookup
- Find 7 4 Ios
- Find 4-20 4 IOs (assuming primary index), 8 IOs
(assuming secondary index) - More than 10 selection it is more efficient to
do sequential scan (do not use the secondary
index). - Example 10,000 records, select 1000 of them,
1000 records per disk page (Sequential search
10 IOs, Secondary index potentially 1000 IOs)
62B-TREE (Cont)
- Analysis
- B in B-tree stands for Balanced. i.e., the
length of every path from the root to a leaf node
is the same. - Hence, good performance for lookup, insertion,
and deletion - K number of search key values in a file, then
the path is - K 1,000,000, and 10 to 9 nodes be accessed.
- Insertion and Deletion should not destroy the
balance of the tree. - References
- J. Jannink, Inplementing Deletion in B-Trees,
SIGMOD RECORD, Volume 24, Number 1 (March 1995),
pages 33-38. - D. Comer, The Ubiquitous B-tree, ACM Computing
Surveys, Volume 11, Number 2 (June 1979), pages
121-137
63Inserting a Data Entry into a B Tree
- Find correct leaf L.
- Put data entry onto L.
- If L has enough space, done!
- Else, must split L (into L and a new node L2)
- Redistribute entries evenly, copy up middle key.
- Insert index entry pointing to L2 into parent of
L. - This can happen recursively
- To split index node, redistribute entries evenly,
but push up middle key. (Contrast with leaf
splits.) - Splits grow tree root split increases height.
- Tree growth gets wider or one level taller at
top.
64B-TREE (Cont)
n 4 Internal nodes 2 to 4 pointers Leaf
nodes 2 to 3 values
8
25
Insert 41
10
20
4
7
30
40
41
Insert 47
30
40
41
47
8
25
41
10
20
4
7
30
40
41
47
65B-TREE (Cont)
Insert 50
Insert 52
41
47
50
52
8
25
41
50
41
25
8
50
10
20
4
7
30
40
41
47
50
52
66B-TREE (Cont)
Delete 20
30
underflow
8
41
50
4
7
30
40
41
47
50
52
10
30
41
50
10
4
7
41
47
50
30
40
52
67B-TREE (Cont)
Delete 20
30
underflow
8
41
50
4
7
30
40
41
47
50
52
10
30
41
50
10
4
7
41
47
50
30
40
52
68Multi-Level Indexes
- Such a multi-level index is a form of search tree
however, insertion and deletion of new index
entries is a severe problem because every level
of the index is an ordered file.
69FIGURE 14.8A node in a search tree with pointers
to subtrees below it.
70FIGURE 14.9A search tree of order p 3.
71Dynamic Multilevel Indexes Using B-Trees and
B-Trees
- Because of the insertion and deletion problem,
most multi-level indexes use B-tree or B-tree
data structures, which leave space in each tree
node (disk block) to allow for new index entries - These data structures are variations of search
trees that allow efficient insertion and deletion
of new search values. - In B-Tree and B-Tree data structures, each node
corresponds to a disk block - Each node is kept between half-full and
completely full
72Dynamic Multilevel Indexes Using B-Trees and
B-Trees (contd.)
- An insertion into a node that is not full is
quite efficient if a node is full the insertion
causes a split into two nodes - Splitting may propagate to other tree levels
- A deletion is quite efficient if a node does not
become less than half full - If a deletion causes a node to become less than
half full, it must be merged with neighboring
nodes
73Difference between B-tree and B-tree
- In a B-tree, pointers to data records exist at
all levels of the tree - In a B-tree, all pointers to data records
exists at the leaf-level nodes - A B-tree can have less levels (or higher
capacity of search values) than the corresponding
B-tree
74FIGURE 14.10B-tree structures. (a) A node in a
B-tree with q 1 search values. (b) A B-tree of
order p 3. The values were inserted in the
order 8, 5, 1, 7, 3, 12, 9, 6.
75FIGURE 14.11The nodes of a B-tree. (a) Internal
node of a B-tree with q 1 search values. (b)
Leaf node of a B-tree with q 1 search values
and q 1 data pointers.
76B Tree The Most Widely Used Index
- Insert/delete at log F N cost keep tree
height-balanced. (F fanout, N leaf pages) - Minimum 50 occupancy (except for root).
- Supports equality and range-searches efficiently.
77Example B Tree
- Search begins at root, and key comparisons direct
it to a leaf. - Search for 5, 15, all data entries 24 ...
Root
17
24
30
13
39
3
5
19
20
22
24
27
38
2
7
14
16
29
33
34
- Based on the search for 15, we know it is not
in the tree!
78Inserting 8 into Example B Tree
Entry to be inserted in parent node.
- Observe how minimum occupancy is guaranteed in
both leaf and index pg splits. - Note difference between copy-up and push-up be
sure you understand the reasons for this.
(Note that 5 is
s copied up and
5
continues to appear in the leaf.)
3
5
2
7
8
appears once in the index. Contrast
79Example B Tree After Inserting 8
Root
17
24
30
13
5
2
3
39
19
20
22
24
27
38
7
5
8
14
16
29
33
34
- Notice that root was split, leading to increase
in height.
- In this example, we can avoid split by
re-distributing entries however,
this is usually not done in practice.
80Deleting a Data Entry from a B Tree
- Start at root, find leaf L where entry belongs.
- Remove the entry.
- If L is at least half-full, done!
- If L has only d-1 entries,
- Try to re-distribute, borrowing from sibling
(adjacent node with same parent as L). - If re-distribution fails, merge L and sibling.
- If merge occurred, must delete entry (pointing to
L or sibling) from parent of L. - Merge could propagate to root, decreasing height.
81Example Tree After (Inserting 8, Then) Deleting
19 and 20 ...
Root
17
27
30
13
5
2
3
39
38
7
5
8
22
24
27
29
14
16
33
34
- Deleting 19 is easy.
- Deleting 20 is done with re-distribution. Notice
how middle key is copied up.
82 ... And Then Deleting 24
- Must merge.
- Observe toss of index entry (on right), and
pull down of index entry (below).
30
39
22
27
38
29
33
34
Root
13
5
30
17
3
39
2
7
22
38
5
8
27
33
34
14
16
29