LEARNING OBJECTIVES - PowerPoint PPT Presentation

About This Presentation

Title:

LEARNING OBJECTIVES

Description:

LEARNING OBJECTIVES Index files. Operations Required to Maintain an Index File. Primary keys. Secondary keys. – PowerPoint PPT presentation

Slides: 34

Provided by: dhu65

Transcript and Presenter's Notes

Title: LEARNING OBJECTIVES

1
LEARNING OBJECTIVES

Index files.
Operations Required to Maintain an Index File.
Primary keys.
Secondary keys.

2
Index

An index is a tool for finding records in a file.
It consists of a key field on which the index is
searched and a reference field (a byte address or
a relative record number, RRN) that tells where to
find the data file record associated with a
particular key.

3
Examples of an Index

The index to a book (usually at the end of the
book) provides a way to find a topic quickly.
Imagine a book without an index!
The index in a library (an online catalog)
allows you to locate items by author, by title,
or by call number.

4
Index in Databases: Example

A music recording store uses an index file to
keep track of its inventory.
Each record in the data file consists of the
following fields:
Id number
Title
Composer or composers
Artist or artists
Label (publisher)

5
recording.h

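#include <iostream>

class IOBuffer; // buffer class, declared elsewhere

class Recording // a recording with a composite key
{
public:
    Recording ();
    Recording (char * label, char * idNum, char * title,
               char * composer, char * artist);
    char IdNum [7];
    char Title [30];
    char Composer [30];
    char Artist [30];
    char Label [7];
    char * Key () const; // returns the key of the object
    int Unpack (IOBuffer &);
    int Pack (IOBuffer &) const;
    void Print (std::ostream &, char * label = 0) const;
};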

6
Primary Key: Example

The primary key in our example consists of the
initials of the company label combined with the
product ID. The canonical form of this key
consists of the uppercase form of the Label field
followed by the ASCII representation of the ID
number, e.g. DG241.
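
The canonical form described above can be sketched as a small helper. The function name makeKey and the use of std::string are assumptions made for illustration; the slides only describe the rule (uppercase label followed by the ID number).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the slides): builds the canonical
// primary key by upper-casing the label and appending the ID number.
std::string makeKey(const std::string& label, const std::string& idNum) {
    std::string key;
    for (unsigned char c : label) {
        key += static_cast<char>(std::toupper(c));
    }
    key += idNum;  // the ID is already held as ASCII text
    return key;
}
```

With this sketch, makeKey("dg", "241") yields "DG241", so differently cased inputs map to the same key.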

7
Index file

An index file provides rapid keyed access to
individual records in the data file.
Each entry in the index file consists of the
following fields:
key (e.g. ANG3795)
reference: the address of the corresponding record in the data file
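
A minimal sketch of such an index held in memory, assuming the entries are kept sorted by key and the reference is a byte offset. The names IndexEntry and findAddress are illustrative and do not come from the slides.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One index entry: a key plus a reference into the data file.
struct IndexEntry {
    std::string key;  // e.g. "ANG3795"
    long address;     // assumed here to be a byte offset in the data file
};

// Binary search over an index that is sorted by key.
// Returns the record address, or -1 if the key is not in the index.
long findAddress(const std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) {
        return it->address;
    }
    return -1;
}
```

Only the small index is searched; the data file is touched once, at the returned address.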

8
Operations Required to Maintain an Indexed File

Create the original empty index file and data
file
Load index file into memory before using it (if
possible, load the whole file)
Rewrite the index file from memory to the
permanent storage after modifying it
Add data records to the data file
Delete data records from the data file
Update records in the data file
Update the index to reflect changes in the data
file

9
Creating Files

Create two empty files
index file and
data record file

10
Loading Index into Main Memory

This can be supported with buffered I/O or by
reading the index entries into an array.

11
Rewriting the Index File from Memory

This can be supported as part of the close
operation for the index file (i.e., write the
buffer or the array to disk).

12
Dangers of losing the index file

If the index file is outdated, corrupted, or lost,
then there must be some means of reconstructing
the index file from the data file!
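
One way to reconstruct the index is to scan the data file once and record where each record starts. The sketch below assumes a record layout that the slides do not specify: one record per line, with fields separated by '|' in the order Label|IdNum|Title|Composer|Artist. The name rebuildIndex is also hypothetical.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Rebuilds a key -> byte-offset index by scanning the data file.
// ASSUMED layout (not from the slides): one record per line,
// '|'-separated fields, starting with Label|IdNum|...
std::map<std::string, std::streamoff> rebuildIndex(std::istream& data) {
    std::map<std::string, std::streamoff> index;
    std::string line;
    while (true) {
        const std::streamoff offset = data.tellg();  // start of this record
        if (!std::getline(data, line)) break;
        std::istringstream fields(line);
        std::string label, idNum;
        if (!std::getline(fields, label, '|') ||
            !std::getline(fields, idNum, '|')) {
            continue;  // skip malformed records
        }
        for (char& c : label) {
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        index[label + idNum] = offset;  // canonical key: LABEL + ID
    }
    return index;
}
```

Because the key is derived entirely from fields stored in each record, the data file alone is enough to regenerate the index.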

13
Record addition

Adding a new data record to the data file
requires that we add a new entry to the index
file too.
Since the index file is usually kept sorted,
adding a new entry requires rearranging the
entries in this file. (This is easily done if the
index is kept in main memory.)
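
A sketch of the in-memory case, assuming the index is a sorted std::vector of key/address pairs. IndexEntry and insertEntry are illustrative names, not part of the original class library.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long address;  // assumed byte offset of the record in the data file
};

// Inserts an entry at its sorted position, shifting later entries.
// Returns false (and changes nothing) if the key is already present,
// since a primary key must be unique.
bool insertEntry(std::vector<IndexEntry>& index, const IndexEntry& entry) {
    auto pos = std::lower_bound(
        index.begin(), index.end(), entry.key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (pos != index.end() && pos->key == entry.key) {
        return false;
    }
    index.insert(pos, entry);
    return true;
}
```

The shifting done by insert is cheap in memory; the same rearrangement directly on disk is what makes large indexes expensive to maintain.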

14
Record deletion

Deleting a data record requires deletion of the
corresponding index record.
Note that in an index file organization all data
records are pinned. (WHY?)
What are the consequences of this fact?

15
Record Updating

There are two categories of updates:
the update modifies the value of the key
the update does not modify the value of the key
If the update changes the primary key, then
reordering of the index file might be required.
If the update does not change the primary key, it
might still require reordering of records in the
data file. (WHY?)

16
Indexes that are too large to hold in Memory

If the index file is too large to be kept in main
memory, then it has to be kept on secondary
storage. Keeping an index file on disk has a
number of disadvantages:
searching the index file can be very time consuming
rearranging the index can be time consuming too

17
Possible alternatives to storing index files

If the index file is too large to be kept in main
memory, then the following alternative
organizations should be considered:
a hashed organization (if access speed is very important)
a tree-structured organization, or multilevel index, such as a B-tree

18
Pros of a simple index file

Even if a simple index file has to be stored on
disk, in some cases it can still be a useful way
of organizing data.
Advantages of the simple index file:
it allows the use of binary search to obtain keyed access to a record
if index entries are much smaller than data records, then sorting and maintaining the index is much easier than sorting and maintaining the data file
if the data records are pinned, then the index file allows the keys to be rearranged without moving the data records

19
Indexing with Multiple Key Access

Because the primary key is unique, it is often
used as the search key.
The primary key of the Recording class is the
label followed by the ID (e.g. ANG3795). But most
of the time, when one searches for a music CD, one
would rather provide a title, a composer, or an
artist.

20
Secondary key

A secondary key is a key for which multiple
records may exist in the data file.
Examples:
The composer's name in the Recording class example (there can be a number of CDs with Beethoven's work in a store).
The artist's name in the Recording class.

21
Secondary Index File

A secondary index file might be created for each
of the possible secondary keys.
Each entry in the secondary index file should
consist of the following two fields:
secondary key field (e.g. Beethoven)
the corresponding primary key (e.g. ANG3795)
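
A sketch of such a secondary index and a lookup that returns every primary key filed under one secondary key. It assumes the entries are kept sorted by secondary key; SecondaryEntry and primaryKeysFor are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One secondary index entry: it refers to the record by primary key,
// not by address.
struct SecondaryEntry {
    std::string secondaryKey;  // e.g. "Beethoven"
    std::string primaryKey;    // e.g. "ANG3795"
};

// Collects all primary keys stored under secondaryKey.
// Assumes the index is sorted by secondaryKey.
std::vector<std::string> primaryKeysFor(const std::vector<SecondaryEntry>& index,
                                        const std::string& secondaryKey) {
    auto it = std::lower_bound(
        index.begin(), index.end(), secondaryKey,
        [](const SecondaryEntry& e, const std::string& k) {
            return e.secondaryKey < k;
        });
    std::vector<std::string> result;
    for (; it != index.end() && it->secondaryKey == secondaryKey; ++it) {
        result.push_back(it->primaryKey);
    }
    return result;
}
```

Each returned primary key is then looked up in the primary index to obtain the record address, so only the primary index has to change when a data record moves.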

22
Record Addition

Adding a record to the data file implies adding a
record to the secondary index file.
Costs of that are similar to the cost of adding a
record in the primary index file. (e.g. records
might have to be shifted)

23
Record Deletion

Deleting a record implies removing all references
to that record in the file system.
After the search on the secondary key, we search
the matching entries for the primary key of the
record to be deleted and remove that entry from
the secondary index file.

24
Record Updating

There are three possible situations:
The update changes the secondary key (if the secondary key is changed, we may have to rearrange the secondary key index so that it stays in sorted order).
The update changes the primary key (this has a big impact on the primary key index, but in the secondary key index we only need to update the affected primary key field).

25
Record Updating

The update is confined to other fields: updates
that affect neither the primary nor the secondary
key fields do not affect the secondary key index,
even if the update is substantial.

26
Retrieving Data with Multiple Secondary Keys

Example: if we want to find all CDs in a music
store that have Beethoven's Symphony No. 9, then
we should search using the following secondary
keys:
composer AND title.
Each of those searches should produce a list of
CDs, identified by their primary keys.

27
Boolean AND in searches

For example, the search by composer could produce
the following list of CDs (ANG3795, DG139201,
DG18807, RCA2626) and the search by title could
produce the following list of CDs (ANG3795,
COL31809, DG18807).
The CDs that we are interested in have to belong
to both of the above lists. (In other words, we
are taking the intersection of two sets.)
WHY?
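
The intersection can be sketched with the standard library, assuming each search returns its primary keys in sorted order (as a sorted secondary index would). The function name matchAll is illustrative.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Boolean AND: keeps only the primary keys present in both result lists.
// Both inputs must already be sorted.
std::vector<std::string> matchAll(const std::vector<std::string>& a,
                                  const std::vector<std::string>& b) {
    std::vector<std::string> both;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(both));
    return both;
}
```

Applied to the two lists above, matchAll returns ANG3795 and DG18807, the only keys that appear in both searches.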

28
Boolean OR searches

If we want to find all CDs by either Beethoven or
Chopin, then we will use the OR operation in our
secondary key searches.
To obtain the list of CDs that we are interested
in, we have to combine the outcomes of both
searches (that is, take the union of two sets).
WHY?
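
A sketch of the union, again assuming each search yields a sorted list of primary keys. The name matchAny is hypothetical; std::set_union emits a key that occurs in both lists only once.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Boolean OR: merges two sorted result lists, listing each key once.
std::vector<std::string> matchAny(const std::vector<std::string>& a,
                                  const std::vector<std::string>& b) {
    std::vector<std::string> either;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(either));
    return either;
}
```

Using the two lists from the AND example purely as sample input, matchAny returns ANG3795, COL31809, DG139201, DG18807, RCA2626.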

29
Cons of the Current Secondary Index Structure

Index file has to be rearranged every time a new
record is added to the file.
If there are duplicate secondary keys, the
secondary key field is repeated for each entry.

30
Improvements to the secondary index key structure

Solution 1
Allow multiple primary keys to be associated with
a single secondary key by allocating an array of
primary keys for each secondary key entry.
Solves the problem of re-sorting the index each
time a new entry is added.
Suffers from internal fragmentation (WHY?), and
the number of allocated entries in the array may
prove too small.
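
A sketch of Solution 1 that makes both drawbacks visible. The capacity of four references per secondary key is an arbitrary assumption, and the names kMaxRefs, SecondaryEntry, and addReference are illustrative.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Assumed fixed capacity per secondary key (chosen for illustration).
constexpr std::size_t kMaxRefs = 4;

struct SecondaryEntry {
    std::string secondaryKey;                       // e.g. "Beethoven"
    std::array<std::string, kMaxRefs> primaryKeys;  // unused slots stay empty
    std::size_t used = 0;                           // number of slots in use

    // Adds a reference without touching any other entry in the index.
    // Returns false when the array is already full.
    bool addReference(const std::string& primaryKey) {
        if (used == kMaxRefs) {
            return false;
        }
        primaryKeys[used++] = primaryKey;
        return true;
    }
};
```

A composer with a single recording still occupies all four slots (internal fragmentation), while a fifth recording for a popular composer cannot be stored at all.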

31
Improvements to the secondary index key structure

Solution 2
Create an inverted list of indexes. Have each
secondary key point to a list of primary key
references associated with it.
This method eliminates most of the problems
associated with maintaining a secondary index
file. WHY?
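
A sketch of the inverted-list idea, assuming the lists are stored as linked nodes in one shared array and that a new reference is simply placed at the head of its list. The class and member names are hypothetical, and std::map stands in for the sorted secondary index.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One node of a primary-key list; next == -1 marks the end of the list.
struct ListNode {
    std::string primaryKey;
    int next;
};

class InvertedIndex {
public:
    // Appends a node to the shared list storage and makes it the new head
    // of the list for secondaryKey. Existing nodes are never moved.
    void add(const std::string& secondaryKey, const std::string& primaryKey) {
        auto it = heads_.find(secondaryKey);
        const int oldHead = (it == heads_.end()) ? -1 : it->second;
        nodes_.push_back(ListNode{primaryKey, oldHead});
        heads_[secondaryKey] = static_cast<int>(nodes_.size()) - 1;
    }

    // Follows the list for secondaryKey, most recently added key first.
    std::vector<std::string> find(const std::string& secondaryKey) const {
        std::vector<std::string> result;
        auto it = heads_.find(secondaryKey);
        int i = (it == heads_.end()) ? -1 : it->second;
        while (i != -1) {
            const ListNode& node = nodes_[static_cast<std::size_t>(i)];
            result.push_back(node.primaryKey);
            i = node.next;
        }
        return result;
    }

private:
    std::map<std::string, int> heads_;  // secondary key -> head node index
    std::vector<ListNode> nodes_;       // storage shared by all lists
};
```

Each secondary key is stored once, a list can grow without a fixed limit, and adding a reference to an existing key changes only one head pointer.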

32
Selective Index

A selective index contains keys for only a
portion of the records in the data file. Such an
index provides the user with a view of a specific
subset of the file's records (e.g. all CDs of
Beethoven's work produced in 1998).

33
Binding

Binding takes place when a key is associated with
a particular physical record in the data file.
This can take place either during the preparation
of the data file and indexes or later on during
program execution.
