Title: LEARNING OBJECTIVES
1LEARNING OBJECTIVES
- Index files.
- Operations Required to Maintain an Index File.
- Primary keys.
- Secondary keys.
2Index
- An index is a tool for finding records in a file.
It consists of a key field on which the index is
searched and a reference field (an address or RRN,
i.e. relative record number) that tells where to
find the data file record associated with a
particular key.
3Examples of an Index
- The index to a book (usually at the end of the
book) provides a way to find a topic quickly.
Imagine a book without an index.
- The index in a library (an on-line catalog)
allows you to locate items by author, by
title, or by call number.
4Index in Databases -example
- A musical recording store uses an index file to
keep track of its inventory.
- The data file consists of the following fields in
each record:
- Id number
- Title
- Composer or composers
- Artist or artists
- Label (publisher)
5recording.h
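#include <iostream>

class IOBuffer;  // buffer class used by Pack/Unpack; defined elsewhere

class Recording  // a recording with a composite key
{
public:
    Recording();
    Recording(char *label, char *idNum, char *title,
              char *composer, char *artist);
    char IdNum[7];
    char Title[30];
    char Composer[30];
    char Artist[30];
    char Label[7];
    char *Key() const;           // returns the key of the object
    int Unpack(IOBuffer &);      // reads the fields from a buffer
    int Pack(IOBuffer &) const;  // writes the fields to a buffer
    void Print(std::ostream &, char *label = 0) const;
};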
6Primary key -example
- The primary key in our example consists of the
initials of the company label combined with the
product ID. The canonical form of this key
consists of the uppercase form of the Label field
followed by the ASCII representation of the ID
number.
- E.g. DG241
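The canonical form described above can be sketched as a small helper. This is only an illustration: makeKey is a hypothetical function name, not part of the Recording class, and it assumes the label and ID are already available as text.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: builds the canonical primary key as the
// uppercase label followed by the ID number in its ASCII form.
std::string makeKey(const std::string& label, const std::string& idNum) {
    std::string key;
    for (unsigned char c : label) {
        key += static_cast<char>(std::toupper(c));  // canonical form is uppercase
    }
    key += idNum;  // the ID is assumed to be stored as ASCII digits already
    return key;
}
```

With this sketch, makeKey("dg", "241") yields DG241, the example key from the slide.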
7Index file
- An index file is used to provide rapid keyed access
to individual records in the data file.
- Each entry in the index file consists of the
following fields:
- key (e.g. ANG3795)
- reference: the address of the corresponding
record in the data file
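One possible in-memory picture of such an index is a sorted array of (key, reference) pairs searched by binary search. The names IndexEntry and findAddress are assumptions made for this sketch, and the reference is taken to be a byte offset.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed layout of one index entry.
struct IndexEntry {
    std::string key;  // canonical primary key, e.g. "ANG3795"
    long address;     // byte offset (or RRN) of the record in the data file
};

// Binary search over an index that is kept sorted by key.
// Returns the record address, or -1 if the key is not in the index.
long findAddress(const std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) {
        return it->address;
    }
    return -1;
}
```

The address returned would then be used to seek directly to the record in the data file.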
8Operations Required to Maintain an Indexed File
- Create the original empty index file and data
file
- Load the index file into memory before using it (if
possible, load the whole file)
- Rewrite the index file from memory to
permanent storage after modifying it
- Add data records to the data file
- Delete data records from the data file
- Update records in the data file
- Update the index to reflect changes in the data
file
9Creating Files
- Create two empty files
- index file and
- data record file
10Loading Index into Main Memory
- This can be supported with buffered I/O or with
an in-memory array.
11Rewriting the Index File from Memory
- This can be supported as a part of the close
operation for the index file (i.e. write the
buffer or the array to disk).
12Dangers of losing the index file
- If the index file is
- outdated
- corrupted or
- lost
- then there must be some means of
reconstructing the index file from the data file!
13Record addition
- Adding a new data record to the data file
requires that we add a new record to the index
file too.
- Since the index file is usually kept sorted,
adding a new record requires rearranging the
records in this file. (This is easily done
if the index is kept in main memory.)
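When the index is held in memory as a sorted array, adding an entry amounts to finding the insertion point and shifting the later entries. A minimal sketch, assuming the IndexEntry layout used for illustration here (addEntry is a hypothetical name):

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long address;
};

// Insert a new entry so that the index stays sorted by key.
// Returns false if the key already exists (primary keys are unique).
bool addEntry(std::vector<IndexEntry>& index, const std::string& key, long address) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) {
        return false;
    }
    index.insert(it, IndexEntry{key, address});  // shifts later entries in memory
    return true;
}
```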
14Record deletion
- Deleting a data record requires deletion of the
corresponding index record.
- Note that in an index file organization all data
records are pinned. (WHY?)
- What are the consequences of this fact?
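Removing the index record for a deleted data record can be sketched as follows, again assuming an in-memory index sorted by key. removeEntry is a hypothetical name, and reclaiming the space in the data file is outside this sketch.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long address;
};

// Remove the index entry for a deleted data record.
// Returns false if no entry with that key exists.
bool removeEntry(std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it == index.end() || it->key != key) {
        return false;
    }
    index.erase(it);  // remaining entries stay in sorted order
    return true;
}
```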
15Record Updating
- There are two categories of updates
- the update modifies the value of the key
- the update does not modify the value of the key
- If the update modifies (changes) the primary key,
then re-ordering of the index file might be
required.
- If the update does not change the primary key it
might still require reordering of records in the
data file. (WHY?)
16Indexes that are too large to hold in Memory
- If the index file is too large to be kept in main
memory then it has to be kept on secondary
storage. There are a number of disadvantages of
keeping an index file on disk:
- searching the index file can be very time
consuming
- index rearrangement can be time consuming too.
17Possible alternatives to storing index files
- If the index file is too large to be kept in main
memory then the following alternative
organizations should be considered:
- a hashed organization (if access speed is very
important)
- a tree-structured, or multilevel, index such as
a B-tree
18Pros of a simple index file
- Even if a simple index file has to be stored on
disk, in some cases it might still prove a useful
way of organizing the data.
- Advantages of the simple index file:
- it allows binary search to be used for keyed
access to a record
- if index entries are much smaller than data
records then sorting and maintaining the index is
much easier than sorting and maintaining the data
file
- if the data records are pinned then the index
file allows for rearranging the keys without
moving the data records
19Indexing with Multiple Key Access
- Since the primary key is unique, it is often
used as a search key.
- An example of a primary key for the Recording
class is the Label and Id combination (e.g.
ANG3795). But most of the time when one searches
for a music CD one would rather provide a title,
a composer, or an artist.
20Secondary key
- A secondary key is a key for which multiple records
may exist in the data file.
- Examples:
- The composer's name in the Recording class
example (there can be a number of CDs with
Beethoven's work in a store).
- The artist's name in the Recording class.
21Secondary Index File
- A secondary index file might be created for each
of the possible secondary keys.
- Each entry in the secondary index file
consists of the following two fields:
- secondary key field (e.g. Beethoven)
- the corresponding primary key (e.g. ANG3795)
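The two-field entry above suggests a two-step lookup: the secondary index yields primary keys, and the primary index then yields record addresses. The first step might be sketched like this; SecondaryEntry and findPrimaryKeys are names assumed for the example, and the index is assumed sorted by secondary key.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed layout of one secondary index entry.
struct SecondaryEntry {
    std::string secondaryKey;  // e.g. composer, "BEETHOVEN"
    std::string primaryKey;    // e.g. "ANG3795"
};

// Collect the primary keys of every entry matching the secondary key.
// The index must be sorted by secondaryKey.
std::vector<std::string> findPrimaryKeys(const std::vector<SecondaryEntry>& index,
                                         const std::string& secondaryKey) {
    auto it = std::lower_bound(
        index.begin(), index.end(), secondaryKey,
        [](const SecondaryEntry& e, const std::string& k) { return e.secondaryKey < k; });
    std::vector<std::string> result;
    for (; it != index.end() && it->secondaryKey == secondaryKey; ++it) {
        result.push_back(it->primaryKey);  // duplicates of the secondary key are adjacent
    }
    return result;
}
```

Each primary key returned would then be looked up in the primary index to obtain the address of the data record.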
22Record Addition
- Adding a record to the data file implies adding a
record to the secondary index file.
- The cost of that is similar to the cost of adding a
record to the primary index file (e.g. records
might have to be shifted).
23Record Deletion
- Deleting a record implies removing all references
to that record in the file system.
- After the search on the secondary key, we perform
a search on the primary key of the record to be
deleted and remove its entry from the secondary
index file.
24Record Updating
- There are three possible situations:
- The update changes the secondary key (if the
secondary key is changed, we may have to
rearrange the secondary key index so it stays in
sorted order)
- The update changes the primary key (it has a big
impact on the primary key index but in the
secondary key index we only need to update the
affected primary key field)
25Record Updating
- The update is confined to other fields: updates
that do not affect either the primary or
secondary key fields do not affect the secondary
key index, even if the update is substantial.
26Retrieving Data with Multiple Secondary Keys
- Example: if we want to find all CDs in a music
store that contain Beethoven's Symphony No. 9, then
we should search using the
following secondary keys: composer AND title.
- Both of those searches should produce a list of
CDs identified by their primary keys.
27Boolean AND in searches
- E.g. the search by composer could produce the
following list of CDs (ANG3795, DG139201,
DG18807, RCA2626) and the search by title could
produce the following list of CDs (ANG3795,
COL31809, DG18807).
- The CDs that we are interested in will have to
belong to both of the above lists. (In other
words we are taking an intersection of two sets.)
WHY?
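The AND combination can be sketched with the standard library's set intersection, assuming each search returns its primary keys in sorted order. matchAll is a hypothetical name chosen for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Boolean AND: keep only the primary keys present in both result lists.
// Both inputs must be sorted, as std::set_intersection requires.
std::vector<std::string> matchAll(const std::vector<std::string>& a,
                                  const std::vector<std::string>& b) {
    std::vector<std::string> both;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(both));
    return both;
}
```

Applied to the two lists on the slide, the composer list and the title list have ANG3795 and DG18807 in common.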
28Boolean OR searches
- If we want to find all CDs by either Beethoven or
Chopin then we will use the OR operation in our
secondary key searches.
- To obtain the list of CDs that we are interested
in, we would have to combine the outcomes of both
searches (i.e. take the union of the two sets). WHY?
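The OR combination can be sketched the same way with a set union. matchAny is a hypothetical name, and both inputs are assumed to be sorted lists of primary keys.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Boolean OR: every primary key that appears in either result list,
// listed once. Both inputs must be sorted, as std::set_union requires.
std::vector<std::string> matchAny(const std::vector<std::string>& a,
                                  const std::vector<std::string>& b) {
    std::vector<std::string> either;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(either));
    return either;
}
```

A key that appears in both lists is reported only once in the combined result.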
29Cons of the Current Secondary Index Structure
- The index file has to be rearranged every time a new
record is added to the file.
- If there are duplicate secondary keys, the
secondary key field is repeated for each entry.
30Improvements to the secondary index key structure
- Solution 1
- Allow for multiple primary keys to be associated
with a single secondary key by allocating an
array of primary keys for each secondary key
entry.
- Avoids rearranging the index each time a new
entry is added for an existing secondary key.
- Suffers from internal fragmentation (WHY?), and
the number of allocated entries in the array may
prove too small.
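A rough sketch of Solution 1 is a secondary index entry with a fixed-size array of primary keys. The capacity of 4 and the names ArrayEntry and addReference are assumptions chosen only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxRefs = 4;  // assumed fixed capacity per secondary key

// One secondary index entry holding a fixed array of primary keys.
struct ArrayEntry {
    std::string secondaryKey;
    std::array<std::string, kMaxRefs> primaryKeys;  // unused slots still take space
    std::size_t used = 0;                           // number of slots filled
};

// Add a primary key to an existing entry without touching other entries.
// Returns false when the fixed array is already full.
bool addReference(ArrayEntry& entry, const std::string& primaryKey) {
    if (entry.used == kMaxRefs) {
        return false;  // the allocated array proved too small
    }
    entry.primaryKeys[entry.used++] = primaryKey;
    return true;
}
```

Both drawbacks from the slide are visible here: slots that are never filled are reserved anyway, and a fifth reference for the same secondary key has nowhere to go.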
31Improvements to the secondary index key structure
- Solution 2
- Create an inverted list of indexes. Have each
secondary key point to a list of primary key
references associated with it.
- This method eliminates most of the problems
associated with maintaining a secondary index
file. WHY?
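One way to picture Solution 2 is a secondary index whose entries hold only the position of the first node in a linked list of primary keys. In this sketch a vector stands in for the separate list file and a std::map stands in for the sorted secondary index; these choices and the names ListNode and InvertedIndex are assumptions for the example.

```cpp
#include <map>
#include <string>
#include <vector>

// One node of a primary key list; next is the position of the
// following node, or -1 at the end of the list.
struct ListNode {
    std::string primaryKey;
    int next;
};

struct InvertedIndex {
    std::map<std::string, int> heads;  // secondary key -> first node position
    std::vector<ListNode> nodes;       // stands in for the list file

    // New nodes are appended to the list file and linked in at the head,
    // so existing nodes never move.
    void add(const std::string& secondaryKey, const std::string& primaryKey) {
        auto it = heads.find(secondaryKey);
        int oldHead = (it == heads.end()) ? -1 : it->second;
        nodes.push_back({primaryKey, oldHead});
        heads[secondaryKey] = static_cast<int>(nodes.size()) - 1;
    }

    // Follow the list for one secondary key (most recently added first).
    std::vector<std::string> lookup(const std::string& secondaryKey) const {
        std::vector<std::string> result;
        auto it = heads.find(secondaryKey);
        int i = (it == heads.end()) ? -1 : it->second;
        for (; i != -1; i = nodes[i].next) {
            result.push_back(nodes[i].primaryKey);
        }
        return result;
    }
};
```

In this arrangement each secondary key is stored once, and adding a reference for an existing key only appends a node and updates one head pointer.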
32Selective Index
- A selective index contains keys for only a
portion of the records in the data file. Such an
index provides the user with a view of a specific
subset of the file's records (e.g. all CDs of
Beethoven's work produced in 1998).
33Binding
- Binding takes place when a key is associated with
a particular physical record in the data file.
This can take place either during the preparation
of the data file and indexes or later on during
program execution.