Lecture 5: Record Storage and Primary File Organizations

Provided by: ils2

1
Lecture 5: Record Storage and Primary File Organizations
  • Storage Devices
  • Storage of Databases
  • Operations on Files
  • Primary vs. Secondary File Organizations
  • Heap Files
  • Sorted Files
  • Hashing

2
Storage Devices
  • Computer Storage Media (Hierarchy)
  • Factors: cost, capacity, speed
  • Primary Storage: data processed directly by the
    CPU (main memory, cache memory)
  • Secondary (on-line) Storage: data must first be
    copied into primary storage for processing
    (magnetic disks)
  • Secondary (off-line) Storage: optical disks
    (direct access), magnetic tapes (sequential)

3
Storage of Databases
  • Main Memory Databases
  • entire databases are kept in main memory
  • main memory is volatile storage, so it requires a
    backup copy (on magnetic disk)
  • Most Databases
  • are stored permanently on magnetic disk
  • are too large to fit entirely in main memory
  • magnetic disk is less expensive per byte than
    main memory

4
File Records on Disk
  • Records
  • a file is a sequence of records (fig5.7)
  • record type: field names and their data types
  • Fixed-Length Records
  • all records in the file have the same size
  • Variable-Length Records (with separators)
  • records of different sizes
  • caused by multi-valued fields, optional fields,
    or variable-length fields

5
File Blocks on Disk
  • Disk Block (fig5.8)
  • unit of data transfer between disk and main memory
  • records of a file are allocated to disk blocks
  • usually 512 bytes to 4K bytes (K = 1024)
  • Blocking Factor (bfr)
  • number of (fixed-length) records in a block
  • bfr = ⌊B/R⌋ (floor function)
  • B: block size, R: record size (in bytes)
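The blocking-factor formula can be checked with a few lines of Python. This is a minimal sketch: the block and record sizes in the example are made-up values, and the blocks-per-file formula ⌈r/bfr⌉ for an unspanned file is a standard addition that the slide does not state.

```python
import math


def blocking_factor(B: int, R: int) -> int:
    """bfr = floor(B / R): whole fixed-length records that fit in one block."""
    return B // R


def unused_bytes_per_block(B: int, R: int) -> int:
    """Space left over in each block when records may not span blocks."""
    return B - blocking_factor(B, R) * R


def blocks_needed(r: int, B: int, R: int) -> int:
    """Blocks required for r records in an unspanned file: ceil(r / bfr)."""
    return math.ceil(r / blocking_factor(B, R))


# Illustrative values only: B = 512-byte blocks, R = 100-byte records
print(blocking_factor(512, 100))          # 5
print(unused_bytes_per_block(512, 100))   # 12
print(blocks_needed(30000, 512, 100))     # 6000
```

The 12 leftover bytes per block are exactly the space an unspanned organization wastes and a spanned one can reuse.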

6
File Blocks on Disk
  • Spanned vs. Unspanned File Organization (fig5.8)
  • Unspanned: records may not cross block
    boundaries, leaving the remaining space in each
    block unused
  • Spanned: a record may continue in the next
    block, utilizing the otherwise unused space
  • Contiguous vs. Linked Allocation
  • Contiguous: file blocks are allocated to
    consecutive disk blocks
  • Linked: each file block contains a pointer to
    the next block

7
Operations on Files
  • Types of Operations
  • Retrieval: does not change data in the file
    (open/close a file, find/read records)
  • Update: changes the file by insertion, deletion,
    or modification of records
  • Record-at-a-time: operations are applied to a
    single record
  • Set-at-a-time: operations are applied to a set of
    records or to the whole file

8
Operations on Files
  • File Open/Close Operations
  • Open: readies the file for access, allocates
    buffers to hold file blocks, and sets the file
    pointer to the beginning of the file
  • Close: terminates access to the file
  • Record-at-a-time Operations
  • Find: searches for the first file record that
    satisfies a certain condition (selection
    condition) and makes it the current file record

9
Operations on Files
  • FindNext: searches for the next file record (from
    the current record) that satisfies the condition
    and makes it the current file record
  • Read: reads the current file record
  • Insert: inserts a new record into the file and
    makes it the current file record
  • Delete: removes the current file record from the
    file by marking the record to indicate that it is
    no longer valid

10
Operations on Files
  • Modify: changes the values of some fields of the
    current file record
  • Set-at-a-time Operations
  • FindAll: locates all the records satisfying a
    search condition
  • FindOrdered: retrieves all the records in a
    specific order
  • Reorganize: reorganizes the records after update
    operations

11
Operations on Files
  • Operation Factors
  • Access Type: lookup by attribute value (=) or by
    range (>, <)
  • Access Time: time to find a particular record or
    set of records
  • Insertion Time: time to insert a new record
    (finding the place to insert plus updating any
    index structure)
  • Deletion Time: time to delete a record (finding
    the record(s) to delete plus updating any index
    structure)
  • Space Overhead: additional space occupied by an
    index structure

12
Primary vs. Secondary File Organizations
  • Primary File Organizations
  • Heap Files
  • Sorted Files
  • Hashing
  • Secondary File Organizations (Index)
  • Single-level or Multi-level Indexes
  • B-trees
  • B+-trees

13
Heap Files
  • Files of Unordered Records
  • simplest and most basic file organization
  • new records are inserted at the end of the file
  • Access: linear search, reading the file block by
    block (N/2 of the N file blocks on average if the
    record exists, all N blocks if not); very
    inefficient, O(N) time
  • Insertion: very efficient, since records are in
    no particular order and the new record is simply
    appended
  • Deletion: inefficient, because the record's block
    must first be found by linear search
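A heap file can be modeled as a list of blocks to see why insertion is cheap and search costs up to N block reads. This is a minimal in-memory sketch; the record layout `(key, name)` and the class name `HeapFile` are hypothetical, and each list access stands in for one block transfer.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # hypothetical record layout: (key, name)


class HeapFile:
    """In-memory model of a heap (unordered) file stored as a list of blocks."""

    def __init__(self, bfr: int):
        self.bfr = bfr
        self.blocks: List[List[Record]] = []

    def insert(self, rec: Record) -> None:
        # Append to the last block; start a new block only when it is full.
        if not self.blocks or len(self.blocks[-1]) == self.bfr:
            self.blocks.append([])
        self.blocks[-1].append(rec)

    def find(self, key: int) -> Tuple[Optional[Record], int]:
        """Linear search block by block; returns (record or None, blocks read)."""
        reads = 0
        for block in self.blocks:
            reads += 1  # one block transfer from disk
            for rec in block:
                if rec[0] == key:
                    return rec, reads
        return None, reads


f = HeapFile(bfr=3)
for k in [42, 7, 19, 88, 3, 55, 61]:
    f.insert((k, f"name{k}"))
print(len(f.blocks))   # 3
print(f.find(88))      # ((88, 'name88'), 2)
print(f.find(100))     # (None, 3): an absent key costs all N blocks
```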

14
Heap Files
  • Direct File
  • allows direct access to a record by its position
    in the file
  • applies only to fixed-length records, contiguous
    allocation, and unspanned blocks
  • file records are numbered 0, 1, ..., r-1 (e.g.,
    r = 120)
  • records in each block are numbered 0, 1, ...,
    bfr-1 (e.g., bfr = 15)
  • the ith record of the file (e.g., i = 43) is in
    block ⌊i/bfr⌋, at position (i mod bfr) within
    that block
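The position arithmetic of a direct file fits in a single function. This is a minimal sketch that reads the slide's parenthetical numbers as r = 120, bfr = 15 and i = 43; `locate` is a hypothetical helper name.

```python
from typing import Tuple


def locate(i: int, bfr: int) -> Tuple[int, int]:
    """Return (block number, position within the block) of record i (0-based)."""
    return i // bfr, i % bfr


r, bfr = 120, 15
print(r // bfr)          # 8 blocks hold records 0..119
print(locate(43, bfr))   # (2, 13): third block, fourteenth slot
```

Because allocation is contiguous, the block number can then be turned into a disk address by adding it to the file's first block address, with no search at all.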

15
Sorted Files
  • Files of Ordered Records
  • file records are kept sorted by the values of an
    ordering field (sequential file) (fig5.9)
  • Access: binary search on the ordering field
    reads about log2 N of the N file blocks
    (O(log N) time), an improvement over linear
    search
  • Insertion: records must be inserted in the
    correct order, which requires making room by
    shifting records; very inefficient
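Binary search in a sorted file works on whole blocks: each probe reads one block and compares the key against that block's first and last ordering-field values. This is a minimal sketch, assuming every block is non-empty and holds its keys in ascending order; the block contents in the example are made up.

```python
from typing import List, Optional, Tuple


def binary_search_blocks(blocks: List[List[int]], key: int) -> Tuple[Optional[int], int]:
    """Binary search over the blocks of a sorted file.

    Returns (index of the block holding key, or None; number of blocks read).
    """
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        reads += 1  # one block transfer
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            # key lies in this block's range: finish the search in main memory
            return (mid if key in block else None), reads
    return None, reads


blocks = [[i, i + 1, i + 2, i + 3] for i in range(0, 64, 4)]  # 16 blocks of 4 keys
print(binary_search_blocks(blocks, 37))   # (9, 3)
```

With N = 16 blocks, a search reads at most about log2 16 + 1 = 5 blocks, compared with 16 for a linear scan of an absent key.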

16
Sorted Files
  • Files of Ordered Records (cont.)
  • Deletion: inefficient; less expensive with a
    deletion marker and periodic reorganization
  • FindOrdered: reading the records in order of the
    ordering key values is extremely efficient
  • Overflow: a temporary unordered file holds new
    records to improve insertion efficiency and is
    periodically merged with the main ordered file
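Periodic reorganization combines the overflow file and deletion markers into one pass. This is a minimal sketch: records are reduced to their ordering-key values, and deletion markers are modeled as a set of deleted keys. Both simplifications are assumptions made for illustration.

```python
import heapq
from typing import List, Set


def reorganize(main_file: List[int], overflow: List[int], deleted: Set[int]) -> List[int]:
    """Sort the unordered overflow records, merge them with the already-sorted
    main file in one sequential pass, and drop records marked as deleted."""
    merged = heapq.merge(main_file, sorted(overflow))
    return [k for k in merged if k not in deleted]


main_file = [3, 8, 15, 22, 40]   # ordered on the ordering field
overflow = [17, 1, 30]           # new records, appended in arrival order
deleted = {8}                    # records carrying a deletion marker
print(reorganize(main_file, overflow, deleted))  # [1, 3, 15, 17, 22, 30, 40]
```

`heapq.merge` consumes both inputs sequentially, which mirrors how the merge would stream through the blocks of the two files.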

17
Hashing
  • Hash Functions
  • records in the file are unordered
  • determine the address (B) of a record from the
    value of the hash field (K) in the record
  • h(K) → B
  • e.g., h(K) = K mod M, giving addresses
    0, 1, ..., M-1
  • allows direct access to the target disk block
  • the record is then searched for within the block
    in main memory
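The division-remainder hash function can be written directly. This is a minimal sketch; the table size M = 7 and the keys are arbitrary illustrative values.

```python
def h(K: int, M: int) -> int:
    """Division-remainder hashing: maps hash field value K to an address in 0..M-1."""
    return K % M


M = 7  # illustrative number of addresses
print([h(K, M) for K in (12, 20, 31, 45)])  # [5, 6, 3, 3]: 31 and 45 collide
```

Choosing M as a prime is a common way to spread keys more evenly, though the slide does not require it.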

18
Internal Hashing
  • Internal Hashing
  • hashing for an internal file (one kept in main
    memory)
  • hash table as an array of records (fig5.10)
  • non-integer hash field values such as names can
    be transformed into integers (e.g., using ASCII
    codes)
  • Collision (of hash addresses)
  • occurs when two hash field values are mapped to
    the same hash address
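One simple way to turn a name into an integer is to sum its character codes, which also shows how easily collisions arise. This is a minimal sketch; summing codes is just one illustrative choice, and `fold_to_int` is a hypothetical name.

```python
def fold_to_int(value: str) -> int:
    """Sum of character codes (the ASCII codes for plain-ASCII names)."""
    return sum(ord(c) for c in value)


def h(value: str, M: int) -> int:
    return fold_to_int(value) % M


print(fold_to_int("Smith"))           # 517
print(h("Smith", 11))                 # 0
print(h("abc", 13) == h("cab", 13))   # True: anagrams always collide
```

Position-sensitive schemes (for example, weighting each character by its position) reduce such collisions at a small cost in computation.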

19
Collision Resolution
  • Open Addressing
  • checks the subsequent positions in order until an
    empty position is found
  • Chaining
  • extends the array with a number of overflow
    positions
  • uses a linked list of overflow records for each
    hash address
  • an overflow pointer refers to the position of the
    next record (fig5.10(b))
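Both collision-resolution methods can be sketched over a plain Python list. This is a minimal illustration with integer keys and h(K) = K mod M. The class names are hypothetical, linear probing is used as the open-addressing order, and the open-addressing search assumes no deletions have occurred.

```python
from typing import List, Optional


class OpenAddressingTable:
    """Open addressing: on a collision, check the subsequent positions in order."""

    def __init__(self, M: int):
        self.M = M
        self.slots: List[Optional[int]] = [None] * M

    def insert(self, key: int) -> int:
        pos = key % self.M
        for _ in range(self.M):
            if self.slots[pos] is None:
                self.slots[pos] = key
                return pos
            pos = (pos + 1) % self.M  # try the next position
        raise OverflowError("hash table is full")

    def search(self, key: int) -> Optional[int]:
        pos = key % self.M
        for _ in range(self.M):
            if self.slots[pos] is None:
                return None  # an empty slot ends the probe sequence
            if self.slots[pos] == key:
                return pos
            pos = (pos + 1) % self.M
        return None


class ChainedTable:
    """Chaining: M home positions followed by an overflow area. Each occupied
    slot holds [key, next], where next is the position of the next record with
    the same hash address, or -1 at the end of the chain."""

    def __init__(self, M: int, overflow_size: int):
        self.M = M
        self.slots: List[Optional[List[int]]] = [None] * (M + overflow_size)
        self.next_free = M  # first unused overflow position

    def insert(self, key: int) -> int:
        home = key % self.M
        if self.slots[home] is None:
            self.slots[home] = [key, -1]
            return home
        if self.next_free == len(self.slots):
            raise OverflowError("overflow area is full")
        new = self.next_free
        self.next_free += 1
        # Link the new record directly after the home record.
        self.slots[new] = [key, self.slots[home][1]]
        self.slots[home][1] = new
        return new

    def search(self, key: int) -> Optional[int]:
        pos = key % self.M
        if self.slots[pos] is None:
            return None
        while pos != -1:
            k, nxt = self.slots[pos]
            if k == key:
                return pos
            pos = nxt  # follow the overflow pointer
        return None


oa = OpenAddressingTable(7)
print([oa.insert(k) for k in (31, 45, 12)])   # [3, 4, 5]
ch = ChainedTable(7, overflow_size=3)
print([ch.insert(k) for k in (31, 45, 12)])   # [3, 7, 5]
```

Open addressing needs no extra space but lets one collision push records into neighboring addresses; chaining keeps each address's records on their own list at the cost of the overflow area and pointers.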

20
Collision Resolution
  • Multiple Hashing
  • applies a second hash function if the first hash
    function results in a collision
  • uses open addressing, or applies a third hash
    function, if another collision results
  • Good Hashing Function
  • distributes records uniformly and randomly over
    the address space
  • keep the hash table 70-90% full: collisions stay
    low without leaving too many locations unused
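Multiple hashing can be sketched by trying a second hash function and then falling back to open addressing. This is a minimal sketch: the second function `h2` (the quotient K div M, taken mod M) is a hypothetical choice made for illustration, and the fallback uses linear probing from h2's address.

```python
from typing import List, Optional


def h1(key: int, M: int) -> int:
    return key % M


def h2(key: int, M: int) -> int:
    # Hypothetical second hash function: the quotient instead of the remainder.
    return (key // M) % M


def insert(table: List[Optional[int]], key: int) -> int:
    M = len(table)
    for pos in (h1(key, M), h2(key, M)):
        if table[pos] is None:
            table[pos] = key
            return pos
    # Both hash functions collided: fall back to open addressing from h2's address.
    pos = h2(key, M)
    for _ in range(M):
        pos = (pos + 1) % M
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError("hash table is full")


table: List[Optional[int]] = [None] * 7
print([insert(table, k) for k in (31, 45, 59, 10)])  # [3, 6, 1, 2]
```

Here 45 and 59 collide with 31 under h1 and are placed by h2; key 10 collides under both functions and lands in the next free slot after h2's address.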

21
External Hashing
  • Hashing Function
  • target address space is made of buckets (one disk
    block or a cluster of contiguous blocks)
  • maps a hash field value into a bucket number
  • bucket number is then converted to the
    corresponding disk block address (fig5.11)
  • collision is less severe with buckets, because as
    many records as will fit in a bucket can hash to
    the same bucket without causing a problem

22
External Hashing
  • Bucket Overflow
  • occurs when a new record hashes to a bucket that
    is already filled to capacity
  • can be resolved by the chaining method (fig5.12)
  • a pointer is maintained in each bucket to a
    linked list of overflow records for the bucket
  • record pointers include both a block address and
    a relative record position within the block
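Static external hashing with chained overflow can be modeled in a few classes. This is a minimal sketch with hypothetical conventions: bucket b is one block, overflow blocks get addresses M, M+1, ..., and each record pointer is a (block address, position in block) pair, matching the record-pointer description above.

```python
from typing import List, Optional, Tuple

RecordPtr = Tuple[int, int]  # (block address, relative position within the block)


class ExternalHashFile:
    """M primary buckets of fixed capacity plus an overflow area; each bucket
    keeps a pointer to a linked list of its overflow records."""

    def __init__(self, M: int, capacity: int):
        self.M = M
        self.capacity = capacity
        self.buckets: List[List[int]] = [[] for _ in range(M)]
        self.overflow_head: List[Optional[RecordPtr]] = [None] * M
        # Overflow blocks; each entry is (key, pointer to the next overflow record).
        self.overflow: List[List[Tuple[int, Optional[RecordPtr]]]] = []

    def insert(self, key: int) -> None:
        b = key % self.M  # bucket number
        if len(self.buckets[b]) < self.capacity:
            self.buckets[b].append(key)
            return
        # Bucket overflow: store the record in an overflow block and make it
        # the new head of this bucket's chain.
        if not self.overflow or len(self.overflow[-1]) == self.capacity:
            self.overflow.append([])
        addr = self.M + len(self.overflow) - 1  # overflow blocks follow the buckets
        self.overflow[-1].append((key, self.overflow_head[b]))
        self.overflow_head[b] = (addr, len(self.overflow[-1]) - 1)

    def search(self, key: int) -> bool:
        b = key % self.M
        if key in self.buckets[b]:
            return True
        ptr = self.overflow_head[b]
        while ptr is not None:
            addr, pos = ptr
            k, ptr = self.overflow[addr - self.M][pos]  # follow the chain
            if k == key:
                return True
        return False


f = ExternalHashFile(M=3, capacity=2)
for k in (3, 6, 9, 12, 1):
    f.insert(k)
print(f.buckets)           # [[3, 6], [1], []]
print(f.overflow_head[0])  # (3, 1): 12 heads bucket 0's chain, followed by 9
print(f.search(9))         # True
```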

23
External Hashing
  • Static Hashing
  • very fast access to records by the hash field
  • a fixed number of buckets M is allocated
  • not suitable for dynamic files (files that grow
    and shrink dynamically)
  • difficult to determine the number of buckets in
    advance
  • such files require a dynamic hashing technique

24
Dynamic Hashing
  • Extendible Hashing (fig5.13)
  • maintains a directory of 2^d bucket addresses
  • uses the first d bits of a hash value to select a
    directory entry, which gives a bucket address
  • d: global depth, d': local depth of a bucket
  • directory expands and shrinks dynamically
  • bucket doubling (split) vs. halving (merge)
  • update directory and local depth appropriately
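Extendible hashing's insert path (bucket split plus directory doubling) can be sketched compactly. This is a minimal sketch under stated assumptions: hash values are W = 8 bits wide and the hash function `h` simply takes the key's low 8 bits, both hypothetical choices. Halving (merging buckets and shrinking the directory) is not implemented.

```python
from typing import List

W = 8  # hypothetical width of hash values in bits


def h(key: int) -> int:
    """Hypothetical hash function: the low W bits of the key."""
    return key % (1 << W)


class Bucket:
    def __init__(self, local_depth: int, capacity: int):
        self.local_depth = local_depth  # d'
        self.capacity = capacity
        self.keys: List[int] = []


class ExtendibleHashFile:
    def __init__(self, capacity: int):
        self.global_depth = 0  # d
        self.directory: List[Bucket] = [Bucket(0, capacity)]  # 2^d entries

    def _index(self, key: int) -> int:
        # The first (most significant) d bits of the W-bit hash value.
        return h(key) >> (W - self.global_depth)

    def search(self, key: int) -> bool:
        return key in self.directory[self._index(key)].keys

    def insert(self, key: int) -> None:
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < bucket.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)  # bucket full: split it, then retry

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.global_depth:
            if self.global_depth == W:
                raise OverflowError("no hash bits left to split on")
            # Directory doubling: entry i becomes entries 2i and 2i+1.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        bucket.local_depth += 1
        new = Bucket(bucket.local_depth, bucket.capacity)
        # Entries that pointed to the old bucket and have a 1 in bit d'
        # (counting from the left) now point to the new bucket.
        shift = self.global_depth - bucket.local_depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> shift) & 1:
                self.directory[i] = new
        # Redistribute the old bucket's records between the two buckets.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)


f = ExtendibleHashFile(capacity=2)
for k in (1, 128, 64, 192, 32):  # 00000001, 10000000, 01000000, 11000000, 00100000
    f.insert(k)
print(f.global_depth, len(f.directory))        # 2 4
print([b.local_depth for b in f.directory])    # [2, 2, 1, 1]
```

The last two directory entries still share one bucket with d' = 1 < d = 2, so that bucket can later split without doubling the directory again.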