Title: Principles of Database Management Systems 4: Index Structures
1Principles of Database Management Systems4
Index Structures
- Pekka Kilpeläinen
- (after Stanford CS245 slide originals by Hector
Garcia-Molina, Jeff Ullman and Jennifer Widom)
2An Index?
- Data structure for locating records with given
search key efficiently value - Also facilitates a full scan of a relation (or
the extent of an object class) if records not
stored on physically consecutive blocks
record
?
value
3Topics
- Conventional indexes (This lecture)
- based on sequential file order
- B-trees
- Hashing schemes
4Sequential File
Assume only 2 records / block
5Sequential File
Dense Index
Index key for each key value in file
6Sequential File
Sparse Index
Index key for each block of file
7Sparse 2nd level index
Sequential File
8Sparse vs. Dense Tradeoff
- Sparse Less index space per record -gt can keep
more of index in memory - Dense Can tell if any record exists
without accessing file - (Later
- sparse better for insertions
- dense needed for secondary indexes)
9Terms
- Search key ( ? primary key)
- Primary index
- index locations determine locations of data
blocks - Secondary index (See later)
- Dense index (contains all search key values)
- Sparse index
- Multi-level index
10Next
- Duplicate keys
- Deletion/Insertion
- Secondary indexes
11Duplicate keys
12Dense index, one way to implement?
Duplicate keys
10
10
10
10
10
10
20
20
20
20
30
30
30
30
30
30
13Duplicate keys
Dense index, better way?
10
20
30
40
14Duplicate keys
Sparse index, one way?
10
10
20
30
15Duplicate keys
Sparse index, another way?
10
20
30
30
16 Duplicate values, primary index
Summary
- Index may point to first instance of each value
only - File
- Index
a
a
a
. .
b
17Deletion from sparse index
Index update operations
10
30
50
70
90
110
130
150
18Deletion from sparse index
10
30
50
70
90
110
130
150
19Deletion from sparse index
10
30
50
70
90
110
130
150
20Deletion from sparse index
10
30
50
70
90
110
130
150
21Deletion from dense index
10
20
30
40
50
60
70
80
22Deletion from dense index
10
20
30
30
40
40
50
60
70
80
23Insertion, sparse index case
10
30
40
60
24Insertion, sparse index case
10
30
40
60
25Insertion, sparse index case
10
30
40
60
- Illustrated Immediate reorganization
- Variation
- insert new block (chained file)
- update index
26Insertion, sparse index case
10
30
40
60
27Insertion, dense index case
- Similar
- Often more expensive . . .
28Secondary indexes
Sequence field
29Secondary indexes
Sequence field (other than secondary key)
30Secondary indexes
Sequence field
31With secondary indexes
- Lowest level is dense
- Other levels are sparse
32Duplicate values secondary indexes
33Duplicate values secondary indexes
one option...
- Problem
- excess overhead!
- disk space
- search time
34Duplicate values secondary indexes
buckets
(aka postings file)
35Why bucket idea is useful
- Example
- Indexes Records
- Name primary EMP(name,dept,floor,...)
- Dept secondary
- Floor secondary
36Query Get employees in (Toy Dept) AND (2nd
floor)
? Intersect toy bucket and 2nd Floor
bucket to get set of matching EMPs
37This idea used in text information retrieval
...the cat is fat ...
...was raining cats and dogs...
...Fido the dog ...
38IR QUERIES
- Find articles with cat and dog
- Find articles with cat or dog
- Find articles with cat and not dog
- Find articles with cat in title
- Find articles with cat and dog within 5
words
39Common technique more info in inverted
list
position
location
type
d1
Title
5
Author
10
Abstract
57
d2
d3
dog
Title
100
Title
12
40Summary so far
- Conventional index
- Basic Ideas sparse, dense, multi-level
- Duplicate Keys
- Deletion/Insertion
- Secondary indexes
- Buckets of Postings List/File
41Conventional indexes
- Advantage
- - Simple
- - Index is sequential file
- good for scans
Disadvantage - Inserts expensive, and/or -
lose sequentiality balance
42- Example Index (sequential)
- continuous
- free space
10
20
30
40
50
60
70
80
90
43Outline
- Conventional indexes
- B-Trees ? NEXT
- a commonly used index structure that adapts well
to insertions and deletions - Hashing schemes