Title: Indexing
1Indexing
- Indexing is a combination of methods for speeding
up the access to the data in a database - The speed is determined by two factors
- where the data is stored on disk
- available access paths
- The primary access method to any table is a table
scan, reading a table by finding all tuples one
by one. - Indexing creates multiple access paths to the
data, each of which is called a secondary access
method. - indexing speeds up access to a table for a
specific set of attributes
2Indexing
- Example ITEMS(itemid, description, price, city)
- to find all items in Dallas, read all tuples
from secondary storage and check if cityDallas
for each one (scan) - may read many extra tuples that are not part of
the answer - suppose instead we create index itemcity for the
city attribute of the items relation - itemcity, Dallast1,t5,t10,
Bostont2,t3,t15, - to find all items for Dallas, we can now find
the index entry for Dallas and get the ids of
just the tuples we want - many fewer tuples are read from secondary storage
3Disk terminology
rotation
Platters (2 platters 4 read/write surfaces
Track Can read four tracks from four surfaces at
the same time (all tracks at the same radius are
called a cylinder)
Read/write heads one for each surface
Disk arm controller
4Disk terminology
- Reading data from a disk involves
- seek time -- move heads to the correct
track/cylinder - rotational latency -- wait until the required
data spins under the read/write heads (average is
about half the rotation time) - transfer time -- transfer the required data
(read/write data to/from a buffer) - Tracks contain the same amount of information
even though they have different circumferences
A block or a page is the smallest unit of data
transfer for a disk. Read a block / write a block
track
Block/page
5Disk Space - final frontier (1999)
- Disk is much cheaper than memory (1GB 15 - 20)
- A fast disk today
- 40 - 10 GB per disk
- 4.9 ms average read seek time
- 2.99 ms average latency
- 10,000 rpm
- 280 - 452 Mbits/sec transfer rate
- 7.04 bits per square inch density
- 12 - 3 heads, 6 - 2 disk platters
- Disk is non-volatile storage that
- survives power failures
6Disk space - the next generation (2002)
- Ultrastar 146Z10 by IBM (up to 146GB)
- 825 Mbs transfer rate
- 15 recoding zones, 12-2 data heads, 6-1 data
disks - Max areal density 26,263 Mbits/sq inch
- 4.7 ms seek time
- Average latency 3pm
- 10,000 rpm
7Reading from disk
- Reading data from disk is extremely slow
(compared to reading from memory) - To read a page from disk
- find the page on disk (seeklatency times)
- transfer the data to memory/buffer (total bytes
transfer rate) - Assume the average page size is 4KB. To retrieve
a single row/tuple, we need to load the page that
contains it - assume 300 Mbits/sec transfer rate, to read a
page 4KB32764bits0.03Mbits hence we take
1/10000 of a second - 4.9 ms (seek) 2.99 ms (latency) 0.1 ms
(transfer time) 7.99 ms (seek and latency times
dominate!)
8Reading from disk
- Assume the database saves a number of memory
slots (each holding exactly one page), which are
called buffers - To read / modify / write tuple t
- DISK (read it from disk, write it to buffer)
- DB (read it from buffer, modify)
- DISK (write it to disk, free the buffer space)
Buffer slots This buffer can hold 4 pages at any
time
9Tablespaces
- Age old wisdom if you store a set of pages in
contiguous pages / blocks on disk, then the
transfer time will improve greatly (reduce seek
and latency times) - A tablespace is an allocation of space in
secondary storage - when creating a tablespace, a DBMS requests
contiguous blocks of disk space from the OS - the tablespace appears as a single file to the OS
- the management of physical addresses in a
tablespace is performed by the DBMS - a DBMS can have many tablespaces
- when a table is created, it is placed in a
tablespace, or partitioned between multiple
tablespaces
10Tablespaces
- CREATE TABLESPACE tspace1
- DATAFILE diskafile1.dat SIZE 20M,
- DATAFILE diskafile2.dat SIZE 40M
- CREATE TABLE temp1 (
-
- TABLESPACE file1
- STORAGE (initial 6144, next 6144,
- minextents 1, maxextents 5)
- )
- CREATE TABLE temp2 (
-
- TABLESPACE file2
- STORAGE (initial 12144, next 6144,
- minextents 1, maxextents 5)
- )
tspace1
file1
file2
temp1
temp2
Actual data
11Tablespaces
- Create table -- assign the tuples in the table
to a file in a tablespace - when a table is created, a chunk of space is
assigned to this table, the size of this chunk is
given by the INITIALEXTENT - when the initial extent becomes full, a new chunk
is allocated, the size of all next chunks is
given by the NEXTEXTENT - can also specify
- maxextents, minextents
- pctincrease (increase the size of extents at each
step) - pctfree (how much of the extent must be left free
12Data Storage on pages
- Layout of a single disk page (assume fixed size
rows) - To find a specific row in a page, must know
- page number (or block number) BBBBBBBB
- offset (slot number of record within the page)
SSSS - file name (which datafile/tablespace) FFFF
- ROWID is then a unique number BBBBBBBB.SSSS.FFFF
for a row - B,S,F are hexadecimal numbers
...
Header info
row directory
Free space
Data rows
Row N
Row N-1
Row 1
1
2
N
...
13Pseudocolumns
- Since each tuple has a unique rowid, we can refer
to the tuples with their rowid field
- However, rowid may change if the tuple is stored
at a different location (the valueof its primary
key is a better identifier)
14Storage of tuples
- The order of tuples in disk may be
- Arbitrary, the order of insertion of tuples. This
is called a HEAP file. - SORTED with respect to a column.
- A sorted file can be searched using binary search
technique - To preserve the sorted order, a number of empty
slots can be preserved for each disk page (based
on some fill factor), and overflow pages can be
used for cases when a specific page is too full
15Indexing Concepts
- Indexing speeds up access to data residing on
disk - disk access is much slower than main memory
access, by orders of magnitude - goal minimize the number of disk accesses
- Primary access methods rely on the physical
location of data as stored in a relation - finding all tuples with value x requires
reading the entire relation (heap) or binary
search (sorted) - Secondary access methods use a directory to
enable tuples to be found more quickly based on
the value of one or more attributes (keys) in a
tuple
16Secondary Index
- To create a simple index on column A of table T,
make a list of pairs of the form - (attribute A value, tuple rowid)
- for each tuple in T
- example secondary index for the SSN attribute
- SSN ROWIDs (RID)
- 111-11-1111 AAAAqYAABAAAEPvAAH
- 222-22-2222 AAAAqYAABAAAEPvAAD
- 333-33-3333 AAAAqYAABAAAEPvAAG
- . . . .
- This index is large and stored on the disk
17Secondary Index
- Suppose a disk page can contain 200 index entries
from a secondary index - To store a secondary index for a relation with 1
million tuples assuming no duplicate values
requires - 1,000,000 / 200 5,000 disk pages
- To find a particular Person tuple in the SSN
index given his or her SSN, you must on average
scan half of the index (5,000 / 2 2500 disk
accesses) - If 20 tuples of the Person relation fit on a
page, then sequential scan of the relation itself
needs to read on average half the relation
(50,000 / 2 25,000 disk accesses) - In this case, the secondary index helps a lot
18Efficiency
- Need to organize the index information in a way
that makes it efficient to access and search - scanning the index from the beginning is not good
enough - Sorting the secondary index helps, but is not
sufficient - Solution 1 build a tree index
- Solution 2 hash the index
19Tree Indices
- Want to minimize number of disk accesses.
- each tree node requires a disk access
- therefore, trees that are broad and shallow are
preferred over trees that are narrow and deep - Balanced binary search tree, AVL tree, etc. that
are useful in main memory are too narrow and deep
for secondary storage. - Need an m-way tree where m is large.
- also need a tree that is balanced
20B-Tree
- A B -Tree of order d is a tree in which
- each node has between d and 2d key values
- the keys values within a node are ordered
- each key in a node has a pointer immediately
before and after it - leaf nodes pointer following a key is pointer to
record with that key - interior nodes pointers point to other nodes in
the tree - the length of the path from root to leaf is the
same for every leaf (balanced) - the root may have fewer keys and pointers
21Example B -Tree
53
11 30
66 78
66 69 71 76
2 7
11 15 22
30 41
53 54 63
78 84 93
B-Tree of order 2 each node can hold up to four
keys
22Searching in B-Trees
- Search(T, K) / searching for tuple with key
value K in tree T / -
- if T is a non-leaf node
- search for leftmost key K in node T such
that K lt K - if such a K exists
- ptr pointer in T immediately before K
- return the result of Search(ptr, K)
- if no such K exists
- ptr rightmost pointer in node T
- return the result of Search(ptr, K)
- else if T is a leaf node
- search for K in T
- if found, return the pointer following K
- else return NULL / K not in tree T /
23Insert Algorithm
- To insert a new tuple with key K and address
rowid - use a modified Search algorithm to look for the
leaf node into which key K should be inserted - insert key K followed by address rowid into the
proper place in this leaf node to maintain order
and rebalance the tree if necessary - Rebalancing the tree
- if the leaf node has room for K and rowid, then
no rebalancing is needed - if the leaf node has no room for K and rowid,
then it is necessary to create a new node and
rebalance the tree
24Rebalancing Algorithm
- Assume that K and rowid are to be inserted into
leaf node L, but L has no more room. - create a new empty node
- put K and rowid in their proper place among the
entries in L to maintain the key sequence order
-- there are 2d1 keys in this sequence - leave the first d keys with their rowids in node
L and move the final d1 keys with their rowids
to the new node - copy the middle key K from the original sequence
into the parent node followed by a pointer to the
new node - put them immediately after the pointer to node L
in the parent node - apply this algorithm recursively up the tree as
needed
25Insert Example
Insert record with key 57 BTree of order 2
26Another Insert Example
Insert record with key 65 BTree of order 2
27Insertion Algorithm (1)
Insert (T, K, rowid, child) / insert new tuple
with key K and address rowid into tree T / /
child is NULL initially / / handle an
interior node of the B-Tree / if T is a
non-leaf node find j such that Kj ? K ? Kj1
for keys K1, , Kn in T ptr pointer between
Kj and Kj1 Insert (ptr, K, rowid, child) if
child is NULL then return / must insert key
and child pointer into T / if T has space for
another key and pointer put child.key and
child.ptr into T at proper place child
NULL return
28Insert Algorithm (2)
else / must split node T / construct
sequence of keys and pointers from T
with child.key and child.ptr inserted at
proper place first d keys and d1 pointers
from sequence stay in T last d keys and d1
pointers from sequence move to a new node
N child.key middle key from
sequence child.ptr pointer to N if T is
root create new node containing pointer to T,
child.key, and child.ptr make this node
the new root node of the B-Tree return If T
is a leaf node / handle leaf node of the B-Tree
/ if T has space for another key and
rowid put K and rowid into T at proper
place return
29Insert Algorithm (3)
else / must split leaf node T / construct
sequence of keys and pointers from T with K
and rowid inserted at proper place first d
keys and d1 pointers from sequence stay in
T last d1 keys and d2 pointers from sequence
move to a new node N child.key first
key in new node N child.ptr pointer to
N if T was root create new node containing
pointer to T, child.key, and
child.ptr make this node the new root node of
the B-Tree return
30Deletion
- Assume that a tuple with key K and address rowid
is to be deleted from leaf node L. There is a
problem if after removing K and rowid from L it
has fewer than d keys remaining. To fix this - if a neighbor node has at least d1 keys, then
evenly redistribute the keys and rowids with the
neighbor node and adjust the separator key in the
parent node - otherwise, combine node L with a neighbor node
and discard the empty node - the parent node now needs one less key and node
pointer, so recursively apply this algorithm up
the tree until all nodes have enough keys and
pointers
31Deletion Example
Redistribute between the second and third leaf
nodes.
53
11 30
66 78
30
22
66 69 71 76
2 7
53 54 63
78 84 93
30 41
11 15 22
41
Delete key 30
B-Tree of order 2
32Another Deletion Example
Cannot redistribute, so combine the left two leaf
nodes
53
11 30
66 78
66 69 71 76
2 7
11 15
30 41
53 54 63
78 84 93
2 11 15
B-Tree of order 2 Delete 7 from the B-Tree
33Another Deletion Example
53
30
66 78
66 69 71 76
11 15
30 41
53 54 63
78 84 93
2 11 15
B-Tree of order 2 Delete 7 from the B-Tree
34Deletion Algorithm (1)
Delete (Parent, T, K, oldchild) / delete key K
from Tree T / / Parent is parent node for T,
initially NULL / / oldchild is discarded child
node, initially NULL / / handle an interior
node of the B-Tree / if T is a non-leaf
node find j such that Kj ? K ? Kj1 for keys
K1, , Kn in T ptr pointer between Kj and
Kj1 Delete (T, ptr, K, oldchild) if oldchild
is NULL then return / must handle discarded
child node of T / remove oldchild and adjacent
key from T if T still has enough keys and
pointers oldchild NULL return
35Deletion Algorithm (2)
/ must fix node T / get a sibling node S of T
using Parent if S has entry keys / redistribute
S and T / redistribute keys adjacent
pointers evenly between S T K middle
unused key from the redistribution replace the
key in Parent between the pointers to S and T
with K oldchild NULL return else / merge
S and T / R S or T, whichever is to the
right of the other oldchild R copy key from
Parent node that is immediately before R to
the end of the node on the left move all keys
adjacent pointers from R to node to the
left discard node R return
36Deletion Algorithm (3)
/ handle leaf node of the B-Tree / if T is a
leaf node if T has extra keys remove key K
from T oldchild NULL return / must fix
node T / get a sibling node S of T using
Parent if S has entry keys / redistribute S and
T / redistribute keys and adjacent pointers
evenly between S T K first key from node S
or T, whichever is to the right of the
other replace the key in Parent between the
pointers to S and T with K oldchild
NULL return
37Deletion Algorithm (4)
else / merge S and T / R S or T,
whichever is to the right of the
other oldchild R move all keys and
adjacent rowids from R to the node on the
left discard node R return
38Analysis of B-Trees
- Every access to a node is an access to disk and
hence is expensive. - Analysis of Find
- if there are n tuples in the tree, the height of
the tree, h, is bounded by h ceil(logd (n)) - example d 50, tree contains 1 million records,
then h 4 - Analysis of Insert and Delete
- finding the relevant node required h accesses
- rebalancing required O(h) accesses
- therefore, the total is O(logd n) accesses
39B-tree
- The create index command creates a B-tree index
- CREATE INDEX age_idx ON people(age)
- TABLESPACE file1
- PCTFREE 70
- PCTFREE defines how full each node should be
- Optimal operation is usually with nodes about 70
full - To reduce disk accesses for sequential
processing, pointers are added to the leaf nodes
that point to the previous and next leaf nodes
40A B-Tree Example
- Givens
- disk page has capacity of 4K bytes
- each rowid takes 6 bytes and each key value takes
2 bytes - each node is 70 full
- need to store 1 million tuples
- Leaf node capacity
- each (key value, rowid) pair takes 8 bytes
- disk page capacity is 4K, so (41024)/8 512
(key value, rowid) pairs per leaf page - in reality there are extra headers and pointers
that we will ignore - Hence, the degree for the tree is about 256
41Example Continued
- If all pages are 70 full, each page has about
- 5120.7 359 entries
- To store 1 million tuples, requires
- 1,000,000 / 359 2786 pages at the leaf level
- 2789 / 359 8 pages at next level up
- 1 root page pointing to those 8 pages
- Hence, we have a B-tree with 3 levels, and a
total of 278681 2795 disk pages
42Duplicate Key Values
- Duplicate key values in a B-tree can be handled.
- (key, rowid) pairs for same key value can span
multiple index nodes - Search algorithm needs to be changed
- find leftmost entry at the leaf level for the
searched item, then scan the index from left to
right following leaf level pointers - The insertion and deletion algorithms also
require small changes - they are more costly and hence not always
implemented in practice
43Bitmap Index
- For some attribute x with possible values A,B and
C - create a list of all tuples in the relation and
store their rowids at some known location - build an index for each value, for example for
value A - the bitmap contains a 1 at location k if tuple
k has value A for this attribute - otherwise it contains a 0
- indices with a lot of 0s are called sparse and
can be compressed
44Bitmap Example
1 0 0 1 0 . . .
Bitmap for A10
Bitmap for A15
0 1 1 0 1 . . .
. . .
Tuple List
. . .
Tuples
45Querying with Bitmap Index
- Suppose have bitmap indices on attributes x and y
- Find if xA or xB, take the bitmaps for
both values and do a logical or - Find if xA and yltgtB, compute the logical
inverse of bitmap for yB and then do a logical
and with bitmap for xA - Bitmaps depend on the actual row ids of tuples
- If a tuple is deleted, its location can be
removed or swapped by another tuple (costly if
the index is compressed) - Too many updates or attributes with too many
values lead to bitmaps that are not cost effective
46B-tree index on attributes A1,,Ak
Primary access methods Heap tuples are placed
in the order they are inserted Cluster tuples
with the same values for attributes A1,,Ak are
placed close to each other on disk Hash tuples
with the same hash value are placed close to each
other on disk
Row directory
Tuple 11, Tuple 12, , Tuple 20
Secondary access methods The primary
access method can be anything. Additional indexes
are created with entries that point to actual
tuples
Row directory
Tuple 1, Tuple 2, , Tuple 10
47Clusters
- A cluster is a primary access method, it changes
the placement of tuples on disk - CREATE CLUSTER personnel
- (department_number integer)
- SIZE 512
- STORAGE (INITIAL 100K NEXT 50K)
- In ORACLE, a cluster can be generated for many
tables containing the same set of attributes - All tuples in different tables from the same
cluster will be placed closed to each other on
disk (i.e. on the same page and on consecutive
pages)
48Adding tables to a cluster
- CREATE CLUSTER movies.actorcluster
- (actorid INTEGER)
- CREATE INDEX actorcluster_idx
- ON CLUSTER actorcluster
- CREATE TABLE movies.actors (
- actorid INTEGER NOT NULL,
- stagename VARCHAR2(50) NOT NULL,
- datesofwork VARCHAR2(10),
- birthname VARCHAR2(20),
- firstname VARCHAR2(20),
- CONSTRAINT actor_pidx PRIMARY KEY (actorid)
- ) CLUSTER actorcluster(actorid)
- CREATE TABLE movies.casts (
- filmid INTEGER NOT NULL,
- actorid INTEGER NOT NULL,
- roletype VARCHAR2(30) NOT NULL,
- FOREIGN KEY (filmid)
- REFERENCES movies.films(filmid)
- ON DELETE CASCADE,
- FOREIGN KEY (actorid)
- REFERENCES movies.actors(actorid)
- ON DELETE CASCADE,
- CONSTRAINT cast_pidx
- PRIMARY KEY (filmid, actorid, role)
- ) CLUSTER actorcluster(actorid)
49Clusters
- Each table may belong to at most one cluster.
- Suppose we retrieve an employee tuple with
deptno10. We find a page with this employee and
read it into memory. - If there are 20 employees in the department 10,
then chances are that all these employees are on
the same page. - To find all employees in department 10 through
20, we can simply read the necessary pages. - A cluster is not an index, but we can also create
a B-tree index on a cluster - CREATE INDEX idx_personnel ON CLUSTER personnel
50Hashing
- Hashing is another index method that changes the
way tuples are placed on disk - A hash index on attribute A allocates an initial
set of pages to store the relation
1
Hash function h ranges between 1 and n
New tuple T with key A
2
3
. .
If multiple tuples map to the same location/page,
this is called a collision. These tuples are
placed in an overflow page.
h(T. A)
n
51Hashing
- The number of key values is given by HASHKEYS
- Hashing is useful for finding a tuple with a
given key value - Hashing is not as useful for ranges or key values
or for sequential processing of tuples in key
order - In the best case, a tuple is found with one disk
access - In the average case, expect 1.2 disk accesses or
more (because of overflow pages)
52Extensible Hashing
- Assume that we originally allocate 2n pages for
the hash - Distribute tuples according to hash function mod
2 - hash the key to produce a bit string and then use
the least significant bit - If a disk page becomes full, double the directory
size instead of creating overflow buckets
Page 0 Page 1 Hash directory
0 1
tuples
53Extensible Hashing
- Insert into a full Page 1 double the directory
size
The full page is split into two. Its contents
are rehashed between the original page and the
new page, using one additional bit from the
string produced by the hash function
Page 0 Page 1 Page 2 Page 3 Hash directory
00 01
10 11
New Page
Just created a new directory entry for Page 3.
Since Page 1 is not full, this directory entry
points to Page 1. The contents of Page 1 will
be rehashed with Page 3 when Page 1 becomes
full rather than creating a new page.
54Extensible Hashing
- As the hash directory size grows it must be
stored on the disk - At most two disk accesses are needed to retrieve
any tuple - this is a better upper bound than for B-Trees
- However, extensible hashing is not as good as
B-Trees for range queries and sequential
processing where you want to process all the
tuples of a relation - Consequently, B-Trees are used more frequently
than Extensible Hashing