Principles of Database Management Systems 4: Index Structures - PowerPoint PPT Presentation

1 / 43
About This Presentation
Title:

Principles of Database Management Systems 4: Index Structures

Description:

Title: THJ: Principles of Database Systems Author: Pekka Kilpel inen Description: Based on Stanford University CS 245 notes originally authored by Garcia-Molina ... – PowerPoint PPT presentation

Number of Views:278
Avg rating:3.0/5.0
Slides: 44
Provided by: PekkaK9
Category:

less

Transcript and Presenter's Notes

Title: Principles of Database Management Systems 4: Index Structures


1
Principles of Database Management Systems4
Index Structures
  • Pekka Kilpeläinen
  • (after Stanford CS245 slide originals by Hector
    Garcia-Molina, Jeff Ullman and Jennifer Widom)

2
An Index?
  • Data structure for locating records with given
    search key efficiently value
  • Also facilitates a full scan of a relation (or
    the extent of an object class) if records not
    stored on physically consecutive blocks

record
?
value
3
Topics
  • Conventional indexes (This lecture)
  • based on sequential file order
  • B-trees
  • Hashing schemes

4
Sequential File
Assume only 2 records / block
5
Sequential File
Dense Index
Index key for each key value in file
6
Sequential File
Sparse Index
Index key for each block of file
7
Sparse 2nd level index
Sequential File
8
Sparse vs. Dense Tradeoff
  • Sparse Less index space per record -gt can keep
    more of index in memory
  • Dense Can tell if any record exists
    without accessing file
  • (Later
  • sparse better for insertions
  • dense needed for secondary indexes)

9
Terms
  • Search key ( ? primary key)
  • Primary index
  • index locations determine locations of data
    blocks
  • Secondary index (See later)
  • Dense index (contains all search key values)
  • Sparse index
  • Multi-level index

10
Next
  • Duplicate keys
  • Deletion/Insertion
  • Secondary indexes

11
Duplicate keys
12
Dense index, one way to implement?
Duplicate keys
10
10
10
10
10
10
20
20
20
20
30
30
30
30
30
30
13
Duplicate keys
Dense index, better way?
10
20
30
40
14
Duplicate keys
Sparse index, one way?
10
10
20
30
15
Duplicate keys
Sparse index, another way?
  • first new key from block

10
20
30
30
16
Duplicate values, primary index
Summary
  • Index may point to first instance of each value
    only
  • File
  • Index

a
a
a
. .
b
17
Deletion from sparse index
Index update operations
10
30
50
70
90

110
130
150
18
Deletion from sparse index
  • delete record 40

10
30
50
70
90

110
130
150
19
Deletion from sparse index
  • delete record 30

10
30
50
70
90

110
130
150
20
Deletion from sparse index
  • delete records 30 40

10
30
50
70
90

110
130
150
21
Deletion from dense index
10
20
30
40
50

60
70
80
22
Deletion from dense index
  • delete record 30

10
20
30
30
40
40
50

60
70
80
23
Insertion, sparse index case

10
30
40
60
24
Insertion, sparse index case
  • insert record 34


10
30
40
60
25
Insertion, sparse index case
  • insert record 15


10
30
40
60
  • Illustrated Immediate reorganization
  • Variation
  • insert new block (chained file)
  • update index

26
Insertion, sparse index case
  • insert record 25


10
30
40
60
27
Insertion, dense index case
  • Similar
  • Often more expensive . . .

28
Secondary indexes
Sequence field
29
Secondary indexes
Sequence field (other than secondary key)
  • Sparse index

30
Secondary indexes
Sequence field
  • Dense index

31
With secondary indexes
  • Lowest level is dense
  • Other levels are sparse

32
Duplicate values secondary indexes
33
Duplicate values secondary indexes
one option...
  • Problem
  • excess overhead!
  • disk space
  • search time

34
Duplicate values secondary indexes
buckets
(aka postings file)
35
Why bucket idea is useful
  • Example
  • Indexes Records
  • Name primary EMP(name,dept,floor,...)
  • Dept secondary
  • Floor secondary

36
Query Get employees in (Toy Dept) AND (2nd
floor)
? Intersect toy bucket and 2nd Floor
bucket to get set of matching EMPs
37
This idea used in text information retrieval
  • Documents

...the cat is fat ...
...was raining cats and dogs...
...Fido the dog ...
38
IR QUERIES
  • Find articles with cat and dog
  • Find articles with cat or dog
  • Find articles with cat and not dog
  • Find articles with cat in title
  • Find articles with cat and dog within 5
    words

39
Common technique more info in inverted
list
position
location
type
d1
  • cat

Title
5
Author
10
Abstract
57
d2
d3
dog
Title
100
Title
12
40
Summary so far
  • Conventional index
  • Basic Ideas sparse, dense, multi-level
  • Duplicate Keys
  • Deletion/Insertion
  • Secondary indexes
  • Buckets of Postings List/File

41
Conventional indexes
  • Advantage
  • - Simple
  • - Index is sequential file
  • good for scans

Disadvantage - Inserts expensive, and/or -
lose sequentiality balance
42
  • Example Index (sequential)
  • continuous
  • free space

10
20
30
40
50
60
70
80
90
43
Outline
  • Conventional indexes
  • B-Trees ? NEXT
  • a commonly used index structure that adapts well
    to insertions and deletions
  • Hashing schemes
Write a Comment
User Comments (0)
About PowerShow.com