Principles of Database Management Systems 4: Index Structures

About This Presentation

Title:

Principles of Database Management Systems 4: Index Structures

Description:

Title: THJ: Principles of Database Systems Author: Pekka Kilpel inen Description: Based on Stanford University CS 245 notes originally authored by Garcia-Molina ... – PowerPoint PPT presentation

Number of Views:284

Avg rating:3.0/5.0

Slides: 44

Provided by: PekkaK9

Category:

more less

Transcript and Presenter's Notes

Title: Principles of Database Management Systems 4: Index Structures

1
Principles of Database Management Systems4
Index Structures

Pekka Kilpeläinen
(after Stanford CS245 slide originals by Hector
Garcia-Molina, Jeff Ullman and Jennifer Widom)

2
An Index?

Data structure for locating records with given
search key efficiently value
Also facilitates a full scan of a relation (or
the extent of an object class) if records not
stored on physically consecutive blocks

record
?
value
3
Topics

Conventional indexes (This lecture)
based on sequential file order
B-trees
Hashing schemes

4
Sequential File
Assume only 2 records / block
5
Sequential File
Dense Index
Index key for each key value in file
6
Sequential File
Sparse Index
Index key for each block of file
7
Sparse 2nd level index
Sequential File
8
Sparse vs. Dense Tradeoff

Sparse Less index space per record -gt can keep
more of index in memory
Dense Can tell if any record exists
without accessing file
(Later
sparse better for insertions
dense needed for secondary indexes)

9
Terms

Search key ( ? primary key)
Primary index
index locations determine locations of data
blocks
Secondary index (See later)
Dense index (contains all search key values)
Sparse index
Multi-level index

10
Next

Duplicate keys
Deletion/Insertion
Secondary indexes

11
Duplicate keys
12
Dense index, one way to implement?
Duplicate keys
10
10
10
10
10
10
20
20
20
20
30
30
30
30
30
30
13
Duplicate keys
Dense index, better way?
10
20
30
40
14
Duplicate keys
Sparse index, one way?
10
10
20
30
15
Duplicate keys
Sparse index, another way?

first new key from block

10
20
30
30
16
Duplicate values, primary index
Summary

Index may point to first instance of each value
only
File
Index

a
a
a
. .
b
17
Deletion from sparse index
Index update operations
10
30
50
70
90

110
130
150
18
Deletion from sparse index

delete record 40

10
30
50
70
90

110
130
150
19
Deletion from sparse index

delete record 30

10
30
50
70
90

110
130
150
20
Deletion from sparse index

delete records 30 40

10
30
50
70
90

110
130
150
21
Deletion from dense index
10
20
30
40
50

60
70
80
22
Deletion from dense index

delete record 30

10
20
30
30
40
40
50

60
70
80
23
Insertion, sparse index case

10
30
40
60
24
Insertion, sparse index case

insert record 34

10
30
40
60
25
Insertion, sparse index case

insert record 15

10
30
40
60

Illustrated Immediate reorganization
Variation
insert new block (chained file)
update index

26
Insertion, sparse index case

insert record 25

10
30
40
60
27
Insertion, dense index case

Similar
Often more expensive . . .

28
Secondary indexes
Sequence field
29
Secondary indexes
Sequence field (other than secondary key)

Sparse index

30
Secondary indexes
Sequence field

Dense index

31
With secondary indexes

Lowest level is dense
Other levels are sparse

32
Duplicate values secondary indexes
33
Duplicate values secondary indexes
one option...

Problem
excess overhead!
disk space
search time

34
Duplicate values secondary indexes
buckets
(aka postings file)
35
Why bucket idea is useful

Example
Indexes Records
Name primary EMP(name,dept,floor,...)
Dept secondary
Floor secondary

36
Query Get employees in (Toy Dept) AND (2nd
floor)
? Intersect toy bucket and 2nd Floor
bucket to get set of matching EMPs
37
This idea used in text information retrieval

Documents

...the cat is fat ...
...was raining cats and dogs...
...Fido the dog ...
38
IR QUERIES

Find articles with cat and dog
Find articles with cat or dog
Find articles with cat and not dog

Find articles with cat in title
Find articles with cat and dog within 5
words

39
Common technique more info in inverted
list
position
location
type
d1

Title
5
Author
10
Abstract
57
d2
d3
dog
Title
100
Title
12
40
Summary so far

Conventional index
Basic Ideas sparse, dense, multi-level
Duplicate Keys
Deletion/Insertion
Secondary indexes
Buckets of Postings List/File

41
Conventional indexes

Advantage
- Simple
- Index is sequential file
good for scans

Disadvantage - Inserts expensive, and/or -
lose sequentiality balance
42

Example Index (sequential)
continuous
free space

10
20
30
40
50
60
70
80
90
43
Outline

Conventional indexes
B-Trees ? NEXT
a commonly used index structure that adapts well
to insertions and deletions
Hashing schemes

Write a Comment

User Comments (0)

About PowerShow.com

Principles of Database Management Systems 4: Index Structures - PowerPoint PPT Presentation

Principles of Database Management Systems 4: Index Structures

Title: THJ: Principles of Database Systems Author: Pekka Kilpel inen Description: Based on Stanford University CS 245 notes originally authored by Garcia-Molina ... – PowerPoint PPT presentation