Lecture 5: Record Storage and Primary File Organizations


Provided by: ils2


1
Lecture 5 Record Storage and Primary File
Organizations

Storage Devices
Storage of Databases
Operations on Files
Primary vs. Secondary File Organizations
Heap Files
Sorted Files
Hashing

2
Storage Devices

Computer Storage Medium (Hierarchy)
Factors: cost, capacity, speed
Primary Storage: data processed directly by the
CPU (main memory, cache memory)
Secondary (on-line) Storage: data must first be
copied into primary storage for processing
(magnetic disks)
Secondary (off-line) Storage: optical disks
(direct access), magnetic tapes (sequential)

3
Storage of Databases

Main Memory Databases
entire databases are kept in main memory
main memory is volatile storage, so it requires a
backup copy (on magnetic disk)
Most Databases
are stored permanently on magnetic disk
are too large to fit entirely in main memory
magnetic disk is less expensive

4
File Records on Disk

Records
file as a sequence of records (fig5.7)
record type: field names + data types
Fixed-Length Records
records with the same size in a file
Variable-Length Records (with separators)
records of different sizes
caused by multi-valued fields, optional fields,
or variable-length fields

5
File Blocks on Disk

Disk Block (fig5.8)
unit of data transfer between disk and memory
records of a file are allocated to disk blocks
usually 512 to 4K bytes (K = 1024)
Blocking Factor (bfr)
number of (fixed-length) records in a block
bfr = ⌊B/R⌋ (floor function)
B = block size, R = record size (in bytes)
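
A minimal sketch of the blocking-factor formula above, assuming fixed-length records. The function names and the sample sizes (512-byte blocks, 100-byte records) are illustrative choices, not values from the lecture.

```python
def blocking_factor(block_size: int, record_size: int) -> int:
    """bfr = floor(B / R): whole fixed-length records that fit in one block."""
    if record_size <= 0 or record_size > block_size:
        raise ValueError("record size must be between 1 and the block size")
    return block_size // record_size


def unused_bytes_per_block(block_size: int, record_size: int) -> int:
    """Space left over in each block when records are not split across blocks."""
    return block_size - blocking_factor(block_size, record_size) * record_size


bfr = blocking_factor(512, 100)            # 5 records per block
unused = unused_bytes_per_block(512, 100)  # 12 bytes left over per block
```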

6
File Blocks on Disk

Spanned vs. Unspanned File Org. (fig5.8)
Unspanned: leaves the remaining space in each
block unused
Spanned: a record may span more than one block,
so the otherwise unused space is utilized
Contiguous vs. Linked Allocation
Contiguous: file blocks are allocated to
consecutive disk blocks
Linked: each file block contains the pointer to
the next block
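
To make the space difference concrete, the sketch below counts the blocks a file needs under each organization. It assumes fixed-length records and ignores the per-block pointer that a real spanned organization would need, so the spanned figure is a lower bound. The function names and sample numbers are hypothetical.

```python
def blocks_needed_unspanned(num_records: int, block_size: int, record_size: int) -> int:
    """Each block holds only whole records; leftover bytes stay unused."""
    bfr = block_size // record_size
    return -(-num_records // bfr)  # ceiling division


def blocks_needed_spanned(num_records: int, block_size: int, record_size: int) -> int:
    """Records may continue in the next block, so bytes are packed densely."""
    total_bytes = num_records * record_size
    return -(-total_bytes // block_size)  # ceiling division


unspanned = blocks_needed_unspanned(1000, 512, 100)  # 200 blocks
spanned = blocks_needed_spanned(1000, 512, 100)      # 196 blocks
```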

7
Operations on Files

Types of Operations
Retrieval: does not change data in the file
(open/close a file, find/read records)
Update: changes the file by insertion, deletion,
or modification of records
Record-at-a-time: operations are applied to a
single record
Set-at-a-time: operations are applied to a set of
records or to the whole file

8
Operations on Files

File Open/Close Operations
Open: readies the file for access, allocates
buffers to hold file blocks, sets the file
pointer to the beginning of the file
Close: terminates access to the file
Record-at-a-time Operations
Find: searches for the first file record that
satisfies a certain condition (selection
condition), and makes it the current file record

9
Operations on Files

FindNext: searches for the next file record (from
the current record) and makes it the current file
record
Read: reads the current file record
Insert: inserts a new record into the file and
makes it the current file record
Delete: removes the current file record from the
file by marking the record to indicate that it is
no longer valid

10
Operations on Files

Modify: changes the values of some fields of the
current file record
Set-at-a-time Operations
FindAll: locates all the records satisfying a
search condition
FindOrdered: retrieves all the records in a
specific order
Reorganize: reorganizes the records after update
operations
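
The operations on slides 8 to 10 can be sketched with a toy in-memory class. This is only an illustration under simplifying assumptions: records are Python dicts held in a list instead of disk blocks, and the class and method names (RecordFile, find_next, and so on) are hypothetical, not an API from the lecture.

```python
class RecordFile:
    """Toy in-memory stand-in for a file of records."""

    def __init__(self):
        self.records = []    # each slot: {"data": dict, "deleted": bool}
        self.current = None  # index of the current file record

    def _scan(self, start, condition):
        for i in range(start, len(self.records)):
            slot = self.records[i]
            if not slot["deleted"] and condition(slot["data"]):
                self.current = i  # the match becomes the current record
                return slot["data"]
        return None

    def find(self, condition):
        return self._scan(0, condition)

    def find_next(self, condition):
        start = 0 if self.current is None else self.current + 1
        return self._scan(start, condition)

    def read(self):
        return self.records[self.current]["data"]

    def insert(self, data):
        self.records.append({"data": dict(data), "deleted": False})
        self.current = len(self.records) - 1

    def delete(self):
        # Only mark the record as no longer valid; space is reclaimed later.
        self.records[self.current]["deleted"] = True

    def modify(self, **changes):
        self.records[self.current]["data"].update(changes)

    def find_all(self, condition):
        return [s["data"] for s in self.records
                if not s["deleted"] and condition(s["data"])]

    def reorganize(self):
        # Physically drop the records that were marked as deleted.
        self.records = [s for s in self.records if not s["deleted"]]
        self.current = None
```

Marking instead of removing keeps Delete cheap, at the cost of a later Reorganize pass.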

11
Operations on Files

Operation Factors
Access Type: by attribute value (=) or by range (>)
Access Time: time to find a particular record(s)
Insertion Time: time to insert a new record (find
the place to insert + index structure update)
Deletion Time: time to delete a record (find the
record(s) to delete + index structure update)
Space Overhead: additional space occupied by an
index structure

12
Primary vs. Secondary File Organizations

Primary File Organizations
Heap Files
Sorted Files
Hashing
Secondary File Organizations (Index)
Single-level or Multi-level Indexes
B-trees
B+-trees

13
Heap Files

Files of Unordered Records
simplest and most basic file organization
new records are inserted at the end of the file
Access: linear search requires searching through
the file block by block (N/2 file blocks on
average if the record exists, N file blocks if
not), very inefficient (it takes O(N) time)
Insertion: very efficient (records are kept in
no particular order)
Deletion: must first find the record's block,
inefficient
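
A small sketch of linear search over a heap file, counting block reads to show the N/2 versus N behaviour. Blocks are modelled as Python lists of dict records. The field name "id" and the sample data are assumptions for illustration.

```python
def linear_search(blocks, key, key_field="id"):
    """Scan the file block by block; return (record or None, blocks read)."""
    blocks_read = 0
    for block in blocks:
        blocks_read += 1  # one block transfer from disk
        for record in block:
            if record[key_field] == key:
                return record, blocks_read
    return None, blocks_read


heap_file = [
    [{"id": 7}, {"id": 3}],
    [{"id": 9}, {"id": 1}],
    [{"id": 4}],
]
hit = linear_search(heap_file, 9)     # found after reading 2 blocks
miss = linear_search(heap_file, 100)  # all 3 blocks read, nothing found
```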

14
Heap Files

Direct File
allows direct access by the position of a record
in a file
applies only to fixed-length records, contiguous
allocation, and unspanned blocks
file records: 0, 1, ..., r-1 (e.g., 120)
records in each block (bfr): 0, 1, ..., bfr-1 (e.g., 15)
ith record of a file (e.g., i = 43): block position
⌊i/bfr⌋, record position in the block (i mod
bfr)
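
The position arithmetic can be checked with a short sketch, using the slide's numbers (i = 43, bfr = 15). The byte-offset helper goes one step further and assumes contiguous allocation starting at offset 0, plus a hypothetical block size of 512 bytes and record size of 34 bytes (which gives bfr = 15).

```python
def locate(i: int, bfr: int):
    """Return (block position, record position within the block)."""
    return divmod(i, bfr)  # (i // bfr, i mod bfr)


def byte_offset(i: int, block_size: int, record_size: int) -> int:
    """Byte offset of record i, assuming unspanned, contiguous blocks."""
    bfr = block_size // record_size
    block, position = divmod(i, bfr)
    return block * block_size + position * record_size


block, position = locate(43, 15)  # block 2, position 13
offset = byte_offset(43, 512, 34)  # 2 * 512 + 13 * 34 = 1466
```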

15
Sorted Files

Files of Ordered Records
file records are kept sorted by the values of an
ordering field (sequential file) fig5.9
Access: binary search (on the ordering field)
requires reading and searching about log2(N) of
the N file blocks on average (O(log N) time), an
improvement over linear search
Insertion: records must be inserted in the
correct order, very inefficient
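
A sketch of binary search over the blocks of a sorted file, again counting block reads. It assumes every block is non-empty and that records are sorted on the ordering field both within and across blocks. The field name "id" and the sample data are illustrative.

```python
def binary_search_blocks(blocks, key, key_field="id"):
    """Return (record or None, blocks read) for a file sorted on key_field."""
    low, high = 0, len(blocks) - 1
    blocks_read = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]
        blocks_read += 1  # one block transfer from disk
        if key < block[0][key_field]:
            high = mid - 1   # key can only be in an earlier block
        elif key > block[-1][key_field]:
            low = mid + 1    # key can only be in a later block
        else:
            # Key falls in this block's range: search it in main memory.
            for record in block:
                if record[key_field] == key:
                    return record, blocks_read
            return None, blocks_read
    return None, blocks_read


sorted_file = [
    [{"id": 1}, {"id": 3}],
    [{"id": 5}, {"id": 7}],
    [{"id": 9}, {"id": 11}],
    [{"id": 13}, {"id": 15}],
]
found = binary_search_blocks(sorted_file, 11)   # 2 block reads
absent = binary_search_blocks(sorted_file, 6)   # 1 block read, not present
```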

16
Sorted Files

Files of Ordered Records (cont)
Deletion: inefficient, less expensive with a
deletion marker and periodic reorganization
FindOrdered: reading the records in order of the
ordering key values is extremely efficient
Overflow: temporary unordered file for new
records to improve insertion efficiency,
periodically merged with the main ordered file

17
Hashing

Hash Functions
records in the file are unordered
determine the address (B) of a record based on
the value of the hash field (K) in the record
h(K) -> B
e.g., h(K) = K mod M (addresses 0, 1, ..., M-1)
allows direct access to the target disk block
record search within the block is done in main
memory
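
As a quick illustration of the mod-based hash function, with M = 7 chosen arbitrarily for the example:

```python
M = 7  # number of addresses; an arbitrary choice for this example


def h(key: int) -> int:
    """h(K) = K mod M, always in the range 0 .. M-1."""
    return key % M


addresses = [h(k) for k in (10, 22, 31, 49)]  # [3, 1, 3, 0]
```

Keys 10 and 31 map to the same address, which is the collision case discussed on the following slides.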

18
Internal Hashing

Internal Hashing
hashing for an internal file
hash table as an array of records (fig5.10)
noninteger hash field values such as names can be
transformed into integers (using ASCII codes)
Collision (of hash addresses)
occurs when two hash field values are mapped into
the same hash address
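
One simple way to turn a name into an integer is to add up its character codes. This is an assumed scheme for illustration, since the slide only says ASCII codes are used. The sketch also shows a collision: names made of the same letters get the same address.

```python
def string_to_int(name: str) -> int:
    """Sum of the character codes; one simple (assumed) transformation."""
    return sum(ord(ch) for ch in name)


def h(name: str, m: int) -> int:
    """Hash address in 0 .. m-1 for a string-valued hash field."""
    return string_to_int(name) % m


a = h("abc", 10)  # (97 + 98 + 99) mod 10 = 4
b = h("cab", 10)  # same letters, same sum: also 4, so a collision
```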

19
Collision Resolution

Open Addressing
checks the subsequent positions in order until an
empty position is found
Chaining
extend the array with a number of overflow
positions
use a linked list of overflow records for each
hash address
overflow pointer refers to the position of the
next record (fig5.10(b))
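
Both techniques can be sketched in a few lines. The sketch assumes integer keys and h(K) = K mod M, probes the next position one step at a time for open addressing, and uses a Python list per hash address in place of the overflow positions and pointers of fig5.10(b). Deletion is left out. The class names are hypothetical.

```python
class OpenAddressingTable:
    def __init__(self, m: int):
        self.m = m
        self.slots = [None] * m

    def insert(self, key: int) -> int:
        start = key % self.m
        for step in range(self.m):
            pos = (start + step) % self.m  # check subsequent positions in order
            if self.slots[pos] is None:
                self.slots[pos] = key
                return pos
        raise RuntimeError("hash table is full")

    def contains(self, key: int) -> bool:
        start = key % self.m
        for step in range(self.m):
            pos = (start + step) % self.m
            if self.slots[pos] is None:
                return False  # an empty position ends the probe sequence
            if self.slots[pos] == key:
                return True
        return False


class ChainedTable:
    def __init__(self, m: int):
        self.m = m
        self.chains = [[] for _ in range(m)]  # one overflow chain per address

    def insert(self, key: int) -> None:
        self.chains[key % self.m].append(key)

    def contains(self, key: int) -> bool:
        return key in self.chains[key % self.m]
```

With M = 7, inserting 10, 31 and 17 into the open-addressing table places them at positions 3, 4 and 5, because all three keys hash to address 3.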

20
Collision Resolution

Multiple Hashing
applies a second hash function if the first hash
function results in a collision
uses open addressing or applies a third hash
function if another collision results
Good Hashing Function
uniform and random distribution of records
hash table 70-90% full to minimize collisions
with fewer unused locations
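
A sketch of multiple hashing with a fallback to open addressing, plus a helper that reports how full the table is. The two hash functions here (K mod M and (K div M) mod M) are hypothetical choices made for the example. The lecture does not prescribe particular functions.

```python
def insert_multiple_hashing(slots: list, key: int) -> int:
    """Try h1, then h2, then probe forward from the h2 address."""
    m = len(slots)
    candidates = [key % m, (key // m) % m]  # h1(K), h2(K): assumed functions
    for pos in candidates:
        if slots[pos] is None:
            slots[pos] = key
            return pos
    start = candidates[-1]
    for step in range(1, m):  # open addressing as the last resort
        pos = (start + step) % m
        if slots[pos] is None:
            slots[pos] = key
            return pos
    raise RuntimeError("hash table is full")


def load_factor(slots: list) -> float:
    """Fraction of positions in use; the slide suggests keeping it at 0.7-0.9."""
    return sum(s is not None for s in slots) / len(slots)


table = [None] * 7
positions = [insert_multiple_hashing(table, k) for k in (10, 31, 17, 24)]
# positions == [3, 4, 2, 5]
```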

21
External Hashing

Hashing Function
target address space is made of buckets (one disk
block or a cluster of contiguous blocks)
maps a hash field value into a bucket number
bucket number is then converted to the
corresponding disk block address (fig5.11)
collision is less severe with buckets because as
many records as will fit in a bucket can hash to
the same bucket without causing problems

22
External Hashing

Bucket Overflow
when a bucket is filled to capacity
can be handled by the chaining method (fig5.12)
a pointer is maintained in each bucket to a
linked list of overflow records for the bucket
record pointers include both a block address and
a relative record position within the block
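
A sketch of external hashing with bucket overflow handled by chaining. It is simplified: buckets and overflow chains are Python lists rather than disk blocks with record pointers, keys are integers, and the bucket number is K mod M. The class name and the sizes are assumptions.

```python
class BucketFile:
    def __init__(self, num_buckets: int, bucket_capacity: int):
        self.m = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]  # one chain per bucket

    def insert(self, key: int) -> int:
        b = key % self.m  # hash field value -> bucket number
        if len(self.buckets[b]) < self.capacity:
            self.buckets[b].append(key)
        else:
            self.overflow[b].append(key)  # bucket is filled to capacity
        return b

    def search(self, key: int) -> bool:
        b = key % self.m
        return key in self.buckets[b] or key in self.overflow[b]


f = BucketFile(num_buckets=3, bucket_capacity=2)
for k in (3, 6, 9, 4):
    f.insert(k)
# bucket 0 holds [3, 6]; 9 goes to bucket 0's overflow chain; 4 is in bucket 1
```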

23
External Hashing

Static Hashing
very fast access to records by the hash field
a fixed number of buckets M is allocated
not suitable for dynamic files (files that grow
and shrink dynamically)
difficult to determine the number of buckets in
advance
dynamic files require a dynamic hashing technique

24
Dynamic Hashing

Extendible Hashing (fig5.13)
maintains a directory of 2^d bucket addresses
uses first d bits of a hash value to determine a
directory entry and then a bucket address
d = global depth, d' = local depth of a bucket
directory expands and shrinks dynamically
directory doubling (after a bucket split) vs.
halving (after bucket merges)
update directory and local depth appropriately
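
The following is a minimal sketch of extendible hashing for insertion and search only (no merging or directory halving). It assumes integer keys, an 8-bit hash value, and a bucket capacity of 2. All names are hypothetical. The directory is indexed by the first (most significant) d bits of the hash value, as on the slide.

```python
HASH_BITS = 8  # assumed width of the hash value


class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHashFile:
    def __init__(self, bucket_capacity: int = 2):
        self.capacity = bucket_capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]  # always 2**global_depth entries

    def _hash(self, key: int) -> int:
        return key % (1 << HASH_BITS)

    def _index(self, key: int) -> int:
        # First global_depth bits of the hash value select the entry.
        return self._hash(key) >> (HASH_BITS - self.global_depth)

    def search(self, key: int) -> bool:
        return key in self.directory[self._index(key)].keys

    def insert(self, key: int) -> None:
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)  # then retry with the updated directory

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.global_depth:
            if self.global_depth == HASH_BITS:
                raise RuntimeError("cannot split further")
            # Double the directory: old entry j becomes entries 2j and 2j+1.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        shift = HASH_BITS - bucket.local_depth
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            # The newly significant bit decides which bucket keeps the key.
            target = sibling if (self._hash(k) >> shift) & 1 else bucket
            target.keys.append(k)
        dir_shift = self.global_depth - bucket.local_depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> dir_shift) & 1:
                self.directory[i] = sibling


f = ExtendibleHashFile(bucket_capacity=2)
for k in (1, 128, 192, 64, 224):
    f.insert(k)
# After these inserts the global depth is 2 and the directory has 4 entries;
# entries 0 and 1 still share one bucket of local depth 1.
```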
