Chapter 11: Storage and File Structure - PowerPoint PPT Presentation

1 / 69
About This Presentation
Title:

Chapter 11: Storage and File Structure

Description:

Chapter 11: Storage and File Structure. Overview of Physical Storage Media ... What is the value of data (e.g., Maxines Salon)? Classification of Storage Media ... – PowerPoint PPT presentation

Number of Views:226
Avg rating:3.0/5.0
Slides: 70
Provided by: marily198
Category:

less

Transcript and Presenter's Notes

Title: Chapter 11: Storage and File Structure


1
Chapter 11 Storage and File Structure
  • Overview of Physical Storage Media
  • Redundant Arrays of Independent Disks (RAID)
  • Buffer Management
  • File Organization

2
Properties of Physical Storage Media
  • Types of storage media differ in terms of
  • Speed of data access
  • Cost per unit of data
  • Reliability
  • Data loss on power failure or system (software)
    crash
  • Physical failure of the storage device
  • gt What is the value of data (e.g., Maxines
    Salon)?

3
Classification of Storage Media
  • Storage be classified as
  • Volatile
  • Content is lost when power is switched off
  • Includes primary storage (cache, main-memory)
  • Non-volatile
  • Contents persist even when power is switched off
  • Includes secondary and tertiary storage, as well
    as battery-backed up RAM

4
Storage Hierarchy, Cont.
  • Storage can also be classified as
  • Primary Storage
  • Fastest media
  • Volatile
  • Includes cache and main memory
  • Secondary Storage
  • Moderately fast access time
  • Non-volatile
  • Includes flash memory and magnetic disks
  • Also called on-line storage
  • Tertiary Storage
  • Slow access time
  • Non-volatile
  • Includes magnetic tape and optical storage
  • Also called off-line storage

5
Primary Storage
  • Cache
  • Volatile
  • Fastest and most costly form of storage
  • Managed by the computer system hardware
  • Main memory (also known as RAM)
  • Volatile
  • Fast access (10s to 100s of nanosecs 1 nanosec
    109 seconds)
  • Generally too small (or too expensive) to store
    the entire database
  • Capacities of up to a few Gigabytes
  • Capacities have gone up, while costs have gone
    down
  • gt The above comments notwithstanding, main
    memory databases do exist

6
Physical Storage Media, Cont.
  • Flash memory
  • Non-volatile
  • Widely used in embedded devices such as digital
    cameras
  • Data can be written at a location only once, but
    location can be erased and written to again
  • Can support only a limited number of write/erase
    cycles.
  • Erasing of memory has to be done to an entire
    bank of memory
  • Reads are roughly as fast as main memory
  • Writes are slow (few microseconds), erase is
    slower
  • Cost per unit of storage roughly similar to main
    memory
  • Also known as Electrically Erasable Programmable
    Read-Only Memory (EEPROM)

7
Physical Storage Media, Cont.
  • Magnetic-disk
  • Non-volatile
  • Survives power failures and system crashes
  • Failure can destroy data, but is very rare
    (??????)
  • Primary medium for database storage
  • Typically stores the entire database
  • Capacities range up to roughly 100 GB currently
  • Much larger capacity than main or flash memory
  • Growing constantly and rapidly with technology
    improvements
  • Direct-access, i.e., data on disk can be accessed
    in any order, unlike magnetic tape also called
    random-access.

8
Physical Storage Media, Cont.
  • Optical storage
  • Non-volatile, data is read optically from a
    spinning disk using a laser
  • CD-ROM (640 MB) and DVD (4.7 to 17 GB) most
    popular forms
  • Write-once, read-many (WORM) optical disks used
    for archival storage (CD-R and DVD-R)
  • Multiple write versions available (CD-RW, DVD-RW,
    and DVD-RAM)
  • Reads and writes are slower than with magnetic
    disk

9
Physical Storage Media, Cont.
  • Tape storage
  • Non-volatile
  • Sequential-access much slower than a disk
  • Very high capacity (40 to 300 GB tapes available)
  • Storage costs are much cheaper than for a disk,
    but high quality drives can be very expensive
  • Used primarily for backup (to recover from disk
    failure), archiving and transfer of large
    quantities of data

10
Physical Storage Media, Cont.
  • Juke-boxes
  • For storing massive amounts of data
  • Hundreds of terabytes (1 terabyte 109 bytes),
    petabyte (1 petabyte 1012 bytes) or exabytes (1
    exabyte 1012 bytes)
  • Large number of removable disks or tapes
  • A mechanism for automatic loading/unloading of
    disks or tapes

11
Storage Hierarchy
12
Magnetic Hard Disk Mechanism
NOTE Diagram is schematic, and simplifies the
structure of actual disk drives
13
Magnetic Disks
  • Disk assembly consists of
  • A single spindle that spins continually (at 7500
    or 10000 RPMs typically)
  • Multiple disk platters (typically 2 to 4)
  • Surface of platter divided into circular tracks
  • Over 16,000 tracks per platter on typical hard
    disks
  • Each track is divided into sectors
  • A sector is the smallest unit of data that can be
    read or written
  • Sector size typically 512 bytes
  • Typical sectors per track 200 (on inner tracks)
    to 400 (on outer tracks)
  • Head-disk assemblies
  • One head per platter, mounted on a common arm,
    each very close to its platter.
  • Reads or writes magnetically encoded information
  • Cylinder i consists of ith track of all the
    platters

14
Magnetic Disks, Cont.
  • To read/write a sector
  • disk arm swings to position head on right track
  • sector rotates under read/write head
  • data is read/written as sector passes under head
  • gt Disks are the primary performance bottleneck
    in a database system, in part because of the need
    for physical movement

15
Performance Measures of Disks
  • Access time - The time it takes from when a read
    or write request is issued to when data transfer
    begins.
  • Access time consists of two parts
  • Seek time - the time it takes to reposition the
    arm over the correct track
  • 4 to 10 milliseconds on typical disks
  • Rotational latency - the time it takes for the
    sector to be accessed to appear under the head
  • 4 to 11 milliseconds on typical disks (5400 to
    15000 r.p.m.)

16
Performance Measures of Disks, Cont.
  • Data Transfer Rate - The rate at which data can
    be retrieved from or stored to the disk.
  • 25 to 100 MB per second is typical
  • Multiple disks may share an interface controller,
    so the rate that the interface controller can
    handle data is also important
  • ATA-5 66 MB/second, SCSI-3 40 MB/s, Fiber
    Channel 256 MB/s

17
Performance Measures (Cont.)
  • Mean time to failure (MTTF) - The average time
    the disk is expected to run continuously without
    any failure.
  • Typically 3 to 5 years
  • Probability of failure of new disks is quite low
    - 30,000 to 1,200,000 hours for a new disk
  • An MTTF of 1,200,000 hours for a new disk means
    that given 1000 relatively new disks, on an
    average one will fail every 1200 hours
  • MTTF decreases as disk ages

18
Magnetic Disks (Cont.)
  • The term controller is used (primarily) in two
    ways, both in the book and other literature the
    book does not distinguish between the two very
    well.
  • Disk controller (use 1)
  • Packaged within the disk
  • Accepts high-level commands to read or write a
    sector
  • Initiates actions such as moving the disk arm to
    the right track and actually reading or writing
    the data
  • Computes and attaches checksums to each sector
    for reliability
  • Performs re-mapping of bad sectors

19
Disk Subsystem
interface
  • Interface controller (use 2)
  • Multiple disks are typically connected to the
    system bus through an interface controller
    (a.k.a., a host adapter)
  • Many functions (checksum, bad sector re-mapping)
    are often carried out by individual disk
    controllers reduces load on the interface
    controller

20
Disk Subsystem
  • The distribution of work between the disk
    controller and interface controller depends on
    the interface standard.
  • Disk interface standard families
  • ATA (AT adaptor)/IDE
  • SCSI (Small Computer System Interconnect)
  • Fibre Channel, etc.
  • Several variants of each standard (different
    speeds and capabilities)
  • One computer can have many interface controllers,
    of the same or different type
  • gt Like disks, interface controllers are also a
    performance bottleneck.

21
Techniques for Optimizationof Disk-Block Access
  • Several techniques are employed to minimize the
    negative effects of disk and controller
    bottlenecks
  • File organization
  • Organize data on disk based on how it will be
    accessed
  • Store related information (e.g., data in the same
    file) on the same or nearby cylinders.
  • A disk may get fragmented over time
  • As data is inserted and deleted from disk, free
    slots on disk tend to become scattered
  • Data items (e.g., files) are then broken up and
    scattered over the disk
  • Sequential access to a fragmented data results in
    increased disk arm movement
  • Most DBMSs have utilities to de-fragment the file
    system

22
Techniques for Optimizationof Disk-Block Access
  • Block size adjustment
  • A block is a contiguous sequence of sectors from
    a single track
  • A DBMS transfers data between disk and main
    memory in blocks
  • Sizes range from 512 bytes to several kilobytes
  • Smaller blocks more transfers from disk
  • Larger blocks may waste space due to partially
    filled blocks
  • Typical block sizes range from 4 to 16 kilobytes
  • Block size can be adjusted to accommodate
    workload
  • Disk-arm-scheduling algorithms
  • Order pending accesses so that disk arm movement
    is minimized
  • One example is the Elevator algorithm

23
Techniques for Optimizationof Disk Block Access,
Cont.
  • Log disk
  • A disk devoted to the transaction log
  • Writes to a log disk are very fast since no seeks
    are required
  • Often times provided with battery back-up
  • Nonvolatile write buffers/RAM
  • Battery backed up RAM or flash memory
  • Typically associated with a disk or storage
    device
  • Blocks to be written are first written to the
    non-volatile RAM buffer
  • Disk controller then writes data to disk whenever
    the disk has no other requests or when the
    request has been pending for some time
  • Allows processes to continue without waiting on
    write operations
  • Data is safe even in the event of power failure
  • Allows writes to be reordered to minimize disk
    arm movement

24
RAID
  • Redundant Arrays of Independent Disks (RAID)
  • Disk organization techniques that exploit a
    collection of disks
  • Speed, reliability or the combination are
    increased
  • Collection appears as a single disk
  • Originally inexpensive disks
  • Using multiple disks increases the risk of
    failure
  • A system with 100 disks, each with MTTF of
    100,000 hours (approx. 11 years), will have a
    system MTTF of 1000 hours (approximately 41 days)
  • Techniques that use redundancy to avoid data loss
    are therefore critical

25
Improvement of Reliabilityvia Redundancy
  • RAID improves on reliability by using two
    techniques
  • Parity Information
  • Mirroring
  • Parity Information (basic)
  • Makes use of the exclusive-or operator
  • 1 1 0 1 1
  • 0 1 1 0 0
  • Enables the detection of single-bit errors

26
Improvement of Reliabilityvia Redundancy
  • Mirroring
  • Duplicate every disk
  • Every write is carried out on both disks
  • If one disk in a pair fails, data still available
  • Reads can take place from either disk and, in
    particular, from both disks at the same time
  • Data loss would occur only if a disk fails, and
    its mirror disk also fails before the system is
    repaired
  • Probability of both events is very small.
  • Mean time to data loss depends on time to failure
    and time to repair.
  • If MTTF is100,000 hours, mean time to repair is10
    hours, then mean time to data loss is 500106
    hours (or 57,000 years) for a mirrored pair of
    disks.

27
Improvement in Performancevia Parallelism
  • RAID improves performance main through the use of
    parallelism.
  • Two main goals of parallelism in a disk system
  • Load balance multiple small accesses to increase
    throughput (I/O rate).
  • Parallelize large accesses to reduce response
    time (transfer rate).
  • Parallelism is achieved through striping.
  • Striping can be done at varying levels
  • bit-level
  • block-level (variable size)

28
Improvement in Performance via Parallelism
  • Bit-level striping
  • Split the bits of each byte across multiple
    disks
  • In an array of eight disks, write bit i of each
    byte to disk i.
  • In theory, each access can read data at eight
    times the rate of a single disk.
  • But seek/access time worse than for a single
    disk.
  • Not commonly used, in particular, as described.
  • Block-level striping
  • With n disks, block i of a file goes to disk (i
    mod n) 1
  • Requests for different blocks can run in parallel
    if the blocks reside on different disks (improves
    throughput).
  • A request for a long sequence of blocks can use
    all disks in parallel (improves response time).

29
RAID Levels
  • Different RAID organizations, or RAID levels,
    have differing cost, performance and reliability
    characteristics.
  • Levels vary by source and vendor
  • Standard levels 0, 1, 2, 3, 4, 5, 6
  • Nested levels 01, 10, 50, 51
  • Several other, non-standard, vendor specific
    levels exist as well.
  • Our book 0, 1, 2, 3, 4, 5
  • Performance and reliability are not linear in the
    level .
  • It is helpful to compare each level to every
    other level, and to the single disk option.

30
RAID Levels
  • RAID Level 0
  • Block striping.
  • Used in high-performance applications where data
    lost is not critical.
  • RAID Level 1
  • Mirrored disks with block striping.
  • Offers the best write performance.
  • Popular for applications such as storing log
    files.
  • Sometimes called 01, 10, 01 or 10.

31
RAID Levels, Cont.
  • RAID Level 2
  • Memory-Style Error-Correcting-Codes (ECC) with
    bit-level striping.
  • Parity information is used to detect, locate and
    correct errors.
  • Outdated Disks contain embedded functionality to
    detect and locate sector errors, so only a single
    parity disk is needed (for correction).

32
RAID Levels, Cont.
  • RAID Level 3
  • Bit-level striping, with parity.
  • To recover data in a damaged disk, compute XOR of
    bits from other disks (including parity bit
    disk).
  • Faster data transfer than with a single disk, but
    fewer I/Os per second since every disk has to
    participate in every I/O.
  • When writing data, corresponding parity bits must
    also be computed and written to a parity bit
    disk.
  • Subsumes Level 2 (provides all its benefits, at
    lower cost).

33
RAID Levels (Cont.)
  • RAID Level 4
  • Block-level striping, and keeps a parity block on
    a separate disk for corresponding blocks from N
    other disks.
  • To recover data in a damaged disk, compute XOR of
    bits from other disks (including parity bit
    disk).
  • When writing data block, the corresponding block
    of parity bits must also be recomputed and
    written to the parity disk
  • Can be done by using old parity block, old value
    of current block and new value of current block
    (2 block reads 2 block writes)
  • Or by re-computing the parity value using the new
    values of blocks corresponding to the parity
    block
  • More efficient for writing large amounts of data
    sequentially

34
RAID Levels (Cont.)
  • RAID Level 4 (Cont.)
  • Parity block becomes a bottleneck for independent
    block writes.
  • Provides higher I/O rates for independent block
    reads than Level 3
  • Provides higher transfer rates for large,
    multi-block reads compared to a single disk.

35
RAID Levels (Cont.)
  • RAID Level 5
  • Block-level striping with distributed parity
    partitions data and parity among all N 1 disks,
    rather than storing data in N disks and parity in
    1 disk.
  • For example, with 5 disks, parity block for ith
    set of N blocks is stored on disk (i mod 5) 1,
    with the data blocks stored on the other 4 disks.

36
RAID Levels (Cont.)
  • RAID Level 5 (Cont.)
  • Provides same benefits of level 4, but minimizes
    the parity disk bottleneck.
  • Higher I/O rates than Level 4.
  • RAID Level 6
  • PQ Redundancy scheme similar to Level 5, but
    uses additional information to guard against
    multiple disk failures.
  • Better reliability than Level 5 at a higher cost
    not used as widely.

37
Choice of RAID Level
  • Factors in choosing RAID level
  • Monetary cost of disks, controllers or RAID
    storage devices
  • Reliability requirements
  • Performance requirements
  • Throughput vs. response time
  • Short vs. long I/O
  • Reads vs. writes
  • During failure and rebuilding of a failed disk

38
Choice of RAID Level, Cont.
  • RAID 0 is used only when (data) reliability is
    not important.
  • Level 2 and 4 never used since they are subsumed
    by 3 and 5.
  • Level 3 is not used (typically) since bit-level
    striping forces single block reads to access all
    disks, wasting disk arm movement.
  • Sometimes vendors advertize a version of level 3
    using byte-level striping.
  • Level 6 is rarely used since levels 1 and 5 offer
    adequate safety for almost all applications.
  • So competition is between 1 and 5 only.

39
Choice of RAID Level (Cont.)
  • Level 1 provides much better write performance
    than level 5.
  • Level 5 requires at least 2 block reads and 2
    block writes to write a single block, whereas
    Level 1 only requires 2 block writes.
  • Level 1 preferred for high update environments.
  • Level 1 has higher storage cost than level 5,
    however
  • I/O requirements have increased greatly, e.g. for
    Web servers.
  • When enough disks have been bought to satisfy
    required rate I/O rate, they often have spare
    storage capacity
  • In that case there is often no extra monetary
    cost for Level 1!
  • Level 5 is preferred for applications with low
    update rate,and large amounts of data.
  • Level 1 is preferred for all other applications.

40
Other RAID Issues
  • Hardware vs. software RAID.
  • Local storage buffers.
  • Hot-swappable disks.
  • Spare components disks, fans, power supplies,
    controllers.
  • Battery backup.

41
Storage Access
  • A DBMS will typically have several files
    allocated to it for storage, which are (usually)
    formatted and managed by the DBMS.
  • In the simplest case each file maps to either a
    disk or disk partition.
  • Each file is partitioned into blocks (a.k.a,
    pages).
  • Consists of zero or more contiguous sectors.
  • A block is the smallest unit of DBMS storage
    allocation.
  • A block is the smallest unit of DBMS data
    transfer.
  • Each block is partitioned into records.
  • Each record is partitioned into fields.

42
Buffer Management
  • The DBMS will transfer blocks of data between RAM
    and Disk, in a manner similar to an operating
    system.
  • The DBMS seeks to minimize the number of block
    transfers between the disk and memory.
  • The portion of main memory available to store
    copies of disk blocks is called the buffer.
  • The DBMS subsystem responsible for allocating
    buffer space in main memory is called the buffer
    manager.

43
Buffer Manager Algorithm
  • Programs call the buffer manager when they need a
    (disk) block.
  • If a requested block is already in the buffer
    then the requesting program is given the address
    of the block in main memory
  • If the block is not in the buffer then the buffer
    manager
  • Allocates space in the buffer for the block,
    throwing out some other block, if required.
  • Writes out to disk the thrown out block if it has
    been modified since the last time it was
    retrieved or written.
  • Reads the block from disk to the buffer once
    space is available, and passes the address of the
    block in main memory to requester.

44
Buffer-Replacement Policies
  • How is a block selected for replacement?
  • Most operating systems use Least Recently Used
    (LRU).
  • Past pattern of block references is used to
    predict future references.
  • In a DBMS, LRU can be a bad strategy for certain
    access patterns.
  • Queries have well-defined access patterns (such
    as sequential scans), and so a DBMS can predict
    future references (an OS cannot).
  • Mixed strategies are common.
  • Statistical information can be exploited.
  • Examples of statistical information
  • The data dictionary is frequently accessed, so
    keep those blocks in the buffer.
  • Index blocks are used frequently, so keep them in
    the buffer.
  • Store and use histograms on the distribution of
    column values.

45
Buffer-Replacement Policies, Cont.
  • Some terminology...
  • A memory block that has been loaded into the
    buffer from disk, and is not allowed to be
    replaced is said to be pinned.
  • At any given time a buffer block is in one of the
    following states
  • Unused (free) does not contain the copy of a
    block from disk.
  • Used, but not pinned contains the copy of a
    block from disk, which is available for
    replacement.
  • Used and pinned contains the copy of a block
    from disk, but which is not available for
    replacement.

46
Buffer-Replacement Policies, Cont.
  • Page replacement strategies
  • Toss immediate Frees the space in the buffer
    occupied by a block, typically done when the
    final tuple of a block has been used writes the
    block out if necessary.
  • Most recently used (MRU) The moment a disk
    block in the buffer becomes unpinned, it becomes
    the most recently used block.
  • Least recently used (LRU) Of all unpinned disk
    blocks in the buffer, the one referenced furthest
    back in time.

47
Buffer-Replacement Policies, Cont.
  • Consider the following query and query plan,
    chosen by the query optimizer, for implementing a
    join query between the borrower and customer
    relations.
  • select customer-name, loan-number, street, city
  • from borrower, customer
  • where bcustomer-name ccustomer-name
  • for each tuple b of borrower do // scan
    borrower sequentially
  • for each tuple c of customer do // scan
    customer sequentially
  • if bcustomer-name ccustomer-name then
    begin
  • xcustomer-name bcustomer-name
  • xloan-number bload-number
  • xstreet cstreet
  • xcity ccity
  • include x in the result of the query
  • end if
  • end for
  • end for

48
Buffer-Replacement Policies, Cont.
  • Suppose that borrower consists of blocks b1, b2,
    and b3, that customer consists of blocks c1, c2,
    c3, c4, and c5, and that the buffer can fit five
    (5) blocks total.
  • LRU Example
  • Read in a block of borrower, and keep it in the
    buffer until the last tuple in that block has
    been processed (pinned).
  • Once all of the tuples from that block have been
    processed, free up the buffer space used by that
    block immediately (toss-immediately).
  • If a customer block is to be brought into the
    buffer, and another block must be moved out in
    order to make room, then move out the least
    recently used block (borrower, or customer).

49
Buffer-Replacement Policies, Cont.
  • Applying the LRU replacement algorithm results in
    the following for the first two tuples of b1
  • b1 b1 b1 b1 b1 b1 b1 b1 b1 b1 b1
  • c1 c1 c1 c1 c5 c5 c5 c5 c4 c4
  • c2 c2 c2 c2 c1 c1 c1 c1 c5 .......
  • c3 c3 c3 c3 c2 c2 c2 c2
  • c4 c4 c4 c4 c3 c3 c3
  • Notice that each time a customer block is read
    into the buffer, the block replaced is the next
    block required!
  • This results in a total of six (6) page
    replacements.

50
Buffer-Replacement Policies, Cont.
  • MRU Example
  • Read in a block of borrower, and keep it in the
    buffer until the last tuple in that block has
    been processed (pinned).
  • Once all of the tuples from that block have been
    processed, free up the buffer space used by that
    block immediately (toss-immediately).
  • If a customer block is to be brought into the
    buffer, and another block must be moved out in
    order to make room, then move out the most
    recently used block (borrower, or customer).

51
Buffer-Replacement Policies, Cont.
  • Applying the MRU replacement algorithm results in
    the following for the first two tuples of b1
  • b1 b1 b1 b1 b1 b1 b1
  • c1 c1 c1 c1 c1 c1
  • c2 c2 c2 c2 c2 .......
  • c3 c3 c3 c4
  • c4 c5 c5
  • This results in a total of two (2) page
    replacements.

52
File Organization
  • Several standard low-level storage DBMS issues
  • Avoiding large numbers of block reads
  • Algorithmically, the bar has been raised
  • Records crossing block boundaries
  • Dangling pointers
  • Disk fragmentation
  • Allocating free space
  • Other issues exist.
  • Partial solutions to the above will be presented
    in the following.

53
File Organization
  • Recall
  • A database is stored as a collection of files.
  • A file consists of a sequence of blocks.
  • A block consists of a sequence of records.
  • A record consists of a sequence of fields.
  • Initial assumptions
  • Record size is fixed.
  • Each file has records of one particular type
    only.
  • Different files are used for different relations.
  • Tuples are stored in an arbitrary order, i.e.,
    unsorted.
  • This case is easiest to deal with variable
    length records will be considered later.

54
Fixed-Length Records
  • Simple approach
  • Store record i starting from byte n ? (i 1),
    where n is the size of each record.
  • Record access is simple.
  • Technically, records could cross block
    boundaries, but this can be fixed.
  • Insertion is easy
  • Add each new record to the end
  • Deletion of record i options
  • Move records i 1, . . ., n to i, . . . , n 1
    (shift)
  • Move record n to i
  • What problems arise with each?
  • gt Dont move records, but link all free
    records on a free list.

55
Free Lists
  • Store the address, i.e., a pointer, of the first
    deleted record in a file header.
  • Store a pointer in the first record to the second
    record, etc.
  • Improvement - use space for a normal attribute to
    store the pointers.
  • Records in the file could be connected with
    points for efficient scanning.

56
Variable-Length Records
  • Variable-length records arise in databases in
    several ways
  • Storage of multiple record types in a file
  • Record types that allow variable lengths for one
    or more fields
  • Record types that allow repeating fields
  • Options for storing variable-length records
  • Byte-string
  • Reserved space
  • Pointers
  • Slotted page structure

57
Variable-Length Records, Cont.
  • Byte string representation
  • Attach an end-of-record (?) character to the end
    of each record.
  • Could also combine this with pointers for free
    and un-free records.
  • Deletion insertion result in fragmentation.
  • Difficulty with individual record growth.
  • Reserved space can use fixed-length records of
    a known maximum length unused space in shorter
    records filled with a null or end-of-record
    symbol.
  • Once again, could use with pointers for free and
    un-free records.

58
Variable Length Records, Cont.
  • Pointer method
  • A variable-length record is represented by a list
    of fixed-length records, chained together via
    pointers.
  • Can be used even if the maximum record length is
    not known.
  • Once again, could use with pointers for free and
    un-free records.

59
Variable Length Records, Cont.
  • Disadvantages
  • Records are more likely to cross block boundaries
  • Space is wasted in all records except the first
    in a a chain
  • Partial solution is to allow two kinds of block
    in file
  • Anchor block contains the first records of
    chain
  • Overflow block contains records other than
    those that are first in a chain.
  • Records are still more likely, to cross block
    boundaries.

60
Variable-Length Recordsthe Slotted Page
Structure
  • The slotted page (e.g., block) header contains
  • Number of record entries
  • Pointer to the end of free space in the block
  • Location and size of each record
  • Records can be moved within a page to eliminate
    empty space
  • Entry in the header must be updated.
  • Pointers from outside the block do not point
    directly to a record, but rather to the
    corresponding entry in the header.

61
Organization of Records in Files
  • How are records assigned to blocks within a
    specific file?
  • Heap a record is placed anywhere in the file
    where there is space.
  • Sequential records are stored in sequential
    order, based on the value of the search key of
    each record.
  • Hashing a hash value is computed from some
    attribute(s) of each record, which specifies the
    block where the record is placed.
  • Clustering records of different tables are
    stored in the same file.
  • Motivation is to store related records on the
    same block to minimize I/O
  • The preceding slides covered heap organization.

62
Sequential File Organization
  • Suitable for applications that require sequential
    processing of the entire file.
  • The records in the file are ordered by a
    search-key

63
Sequential File Organization (Cont.)
  • Deletion must maintain pointer chains.
  • Insertion locate the position where the record
    is to be inserted
  • if there is free space insert there
  • if no free space, insert the record in an
    overflow block
  • In either case, pointer chain must be updated
  • Need to reorganize the filefrom time to time to
    restoresequential order.

64
Clustering File Organization
  • Simple file structure stores each relation in a
    separate file
  • A clustering file organization stores several
    relations in one file
  • Clustered organization of customer and depositor
  • Good for queries involving depositor
    customer, and for queries involving one single
    customer and his accounts
  • Bad for queries involving only customer
  • Results in variable size records

65
Data Dictionary Storage
  • The data dictionary (a.k.a. system catalog)
    stores metadata
  • Information about relations
  • names of relations
  • names and types of attributes
  • names and definitions of views
  • integrity constraints
  • Physical file organization information
  • How relation is stored (sequential, hash, etc.)
  • Physical location of relation
  • operating system file name, or
  • disk addresses of blocks containing the records
  • Statistical and descriptive data
  • number of tuples in each relation
  • User and accounting information, including
    passwords
  • Information about indices (Chapter 12)

66
Data Dictionary Storage (Cont.)
  • Catalog structure can use either
  • specialized data structures designed for
    efficient access
  • a set of relations, with existing system features
    used to ensure efficient access
  • The latter alternative is usually preferred
  • A possible catalog representation

Relation-metadata (relation-name,
number-of-attributes, storage-organization,
location)Attribute-metadata (attribute-name,
relation-name, domain-type, position,
length) User-metadata (user-name,
encrypted-password, group) Index-metadata
(index-name, relation-name, index-type,
index-attributes) View-metadata (view-name,
definition)
67
Large Objects
  • Large objects
  • text documents
  • images
  • computer aided designs
  • audio and video data
  • Large objects may need to be stored in a
    contiguous sequence of bytes when brought into
    memory.
  • If an object is bigger than a block, contiguous
    blocks in the buffer must be allocated to store
    it.
  • Might be preferable to disallow direct access to
    data, and only allow access through a
    file-system-like API, to remove need for
    contiguous storage.
  • Most vendors provide supporting types
  • binary large objects (blobs)
  • character large objects (clobs)
  • text, image, etc.

68
Modifying Large Objects
  • If the application requires insertion or deletion
    of bytes from specified regions of an object
  • B-tree file organization (described later in
    Chapter 12) can be modified to represent large
    objects
  • Each leaf block of the tree stores between half
    and 1 block worth of data from the object
  • Special-purpose application programs outside the
    database are frequently used to manipulate large
    objects
  • Text data treated as a byte string manipulated by
    editors and formatters.
  • Graphical data and audio/video data is typically
    created and displayed by separate application

69
End of Chapter
Write a Comment
User Comments (0)
About PowerShow.com