SEG3550 Fundamentals of Information System - PowerPoint PPT Presentation

About This Presentation
Description:

Why do we need to know about storage/file structure. Many database technologies are developed to utilize the storage architecture/hierarchy ...

Slides: 27
Provided by: huan9
Transcript and Presenter's Notes

1
SEG3550 Fundamentals of Information System
  • Tutorial 8
  • Storage and File Structure

2
Overview
  • Storage Hierarchy
  • Physical Storage Media
  • RAID
  • RAID Levels
  • File Organization

3
Why do we need to know about storage/file structure?
  • Many database technologies are developed to
    utilize the storage architecture/hierarchy
  • Data in the database needs to be organized and
    stored/retrieved efficiently

4
Physical Storage Media
  • Physical storage media are classified according to:
    • Data access speed
    • Cost per unit of data
    • Reliability
  • Storage can be differentiated into:
    • Volatile storage: loses contents when power is switched off
    • Non-volatile storage: contents persist even when power is
      switched off

5
Storage Hierarchy
[Figure: storage hierarchy, with "speed" and "unit price" axes.
Volatile primary storage: Cache, Memory.
Non-volatile secondary storage: Flash Memory, Magnetic Disk.
Non-volatile tertiary storage: Optical Disk, Magnetic Tape.]

6
Primary Storage
  • Cache
    • Volatile, managed by hardware
    • Speed: 7 to 20 ns (1 nanosecond = 10^-9 seconds)
    • Capacity:
      • A typical PC level 2 cache is 64 KB to 2 MB.
      • Within processors, level 1 cache usually ranges in size
        from 8 KB to 64 KB.
  • Main memory
    • Volatile
    • Speed: 10s to 100s of nanoseconds
    • Capacity: up to a few gigabytes are widely used currently
    • Per-byte costs have decreased by roughly a factor of 2 every
      2 to 3 years

7
Secondary Storage
  • Flash memory
    • Non-volatile
    • Speed: reads are about as fast as main memory, but writes are
      slow (a few microseconds) and erases are slower still
    • Capacity: 32 MB to a few gigabytes currently
    • Forms: SmartMedia, Memory Stick, Secure Digital, BIOS
    • Cost: roughly the same as main memory
  • Magnetic disk
    • Non-volatile
    • Capacity: up to roughly 1 TB (1000 GB) currently, growing
      constantly and rapidly with technology improvements (a factor
      of 2 to 3 every 2 years)
    • Data must be moved from disk to main memory for access, and
      written back for storage

8
Tertiary Storage (Non-volatile)
  • Optical storage
    • CD-ROM (640 MB) and DVD (4.7 to 17 GB) are the most popular
      forms
    • Reads and writes are slower than with magnetic disk
  • Tape storage
    • Used primarily for backup (to recover from disk failure) and
      for archival data
    • Sequential access: much slower than disk
    • Very high capacity (40 to 300 GB tapes available)

9
Between Memory and Disk
  • The database resides permanently mostly on disk
  • In a database, cost is usually measured by the number of disk
    I/Os
  • But disks are too slow, so memory is needed as a buffer, yet
    memory is volatile
  • This introduces a number of issues

10
RAID
  • RAID: Redundant Arrays of Independent Disks
  • Disk organization techniques that manage a large number of
    disks, providing:
    • high capacity and high speed by using multiple disks in
      parallel, and
    • high reliability by storing data redundantly

11
Mean time to failure (MTTF)
  • Average time the disk is expected to run continuously without
    any failure
  • Typically 3 to 5 years (1 year = 8,760 hours)
  • MTTF: 30,000 to 1,200,000 hours for a new disk
  • An MTTF of 1,200,000 hours for a new disk means that, given 1000
    relatively new disks, on average one will fail every 1200 hours
    (assuming exponentially distributed failures)
  • As the number of disks increases, the chance that some disk
    fails increases proportionally
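
The arithmetic on this slide can be checked with a short sketch. It assumes independent disks with exponentially distributed lifetimes, as the slide does; the function name is made up for illustration.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours, as on the slide


def hours_between_failures(mttf_hours: float, num_disks: int) -> float:
    """Expected time until some disk in the array fails.

    Assumes independent disks with exponentially distributed lifetimes,
    so the array's failure rate is num_disks times one disk's rate.
    """
    return mttf_hours / num_disks


# The slide's example: 1000 disks, each with an MTTF of 1,200,000 hours.
print(hours_between_failures(1_200_000, 1000))  # 1200.0
# The same MTTF expressed in years for a single disk (about 137).
print(1_200_000 / HOURS_PER_YEAR)
```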

12
Parallelism
  • Two main goals of parallelism in a disk system:
    • 1. Load-balance multiple small accesses to increase
      throughput
    • 2. Parallelize large accesses to reduce response time
  • Basic strategy: striping
  • Compare and contrast bit striping and byte striping
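
A minimal sketch of byte striping, assuming the simplest layout in which byte i of the data goes to disk i mod n. Bit striping applies the same idea at the level of individual bits. The function names are illustrative only.

```python
def stripe_bytes(data: bytes, num_disks: int) -> list[bytes]:
    """Byte striping: byte i of the data is stored on disk i mod num_disks."""
    return [data[d::num_disks] for d in range(num_disks)]


def unstripe_bytes(stripes: list[bytes]) -> bytes:
    """Interleave the per-disk stripes back into the original byte order."""
    out = bytearray(sum(len(s) for s in stripes))
    for d, stripe in enumerate(stripes):
        out[d::len(stripes)] = stripe
    return bytes(out)


stripes = stripe_bytes(b"ABCDEFGH", 4)
print(stripes)                  # [b'AE', b'BF', b'CG', b'DH']
print(unstripe_bytes(stripes))  # b'ABCDEFGH'
```

Because every disk holds part of each large request, all disks can transfer their share in parallel, which is how striping reduces response time for large accesses.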

13
Redundancy
  • Store extra information that can be used to rebuild information
    lost in a disk failure
  • Basic strategies: mirroring, parity
  • Mean time to data loss depends on mean time to failure and mean
    time to repair
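
How mean time to data loss depends on the other two quantities can be sketched for a mirrored pair. The formula below is the usual textbook approximation and assumes the two disks fail independently; the input values are illustrative assumptions, not figures from the slides.

```python
def mirrored_pair_mttdl(mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for two mirrored disks.

    Data is lost only if the second disk fails while the first is
    still being repaired, giving MTTF^2 / (2 * MTTR) under the
    assumption of independent failures.
    """
    return mttf_hours ** 2 / (2 * mttr_hours)


# Illustrative values: MTTF of 100,000 hours, repair time of 10 hours.
print(mirrored_pair_mttdl(100_000, 10))  # 500000000.0
```

A shorter repair time raises the mean time to data loss directly, which is why quick replacement of a failed disk matters as much as the disks' own reliability.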

Data: 10010010    Parity: 1
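
The parity example above can be reproduced with a short sketch. It assumes even parity (the parity bit makes the total number of 1 bits even), which matches the slide's data and parity values. The second function shows the same XOR idea applied across disks; its name and the sample block values are made up for illustration.

```python
from functools import reduce


def even_parity_bit(bits: str) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bits.count("1") % 2


def xor_parity(blocks: list[int]) -> int:
    """Parity block for a stripe: the XOR of all data blocks."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


print(even_parity_bit("10010010"))  # 1

# Rebuilding a lost block: XOR the surviving blocks with the parity.
data = [0b1001, 0b0110, 0b1111]
parity = xor_parity(data)
rebuilt = xor_parity([data[0], data[2], parity])  # disk 1 has failed
print(rebuilt == data[1])  # True
```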
14
RAID Levels
  • RAID level 1: twice the read transaction rate of single disks,
    same write transaction rate as single disks.
  • RAID level 2: due to its cost and complexity, level 2 never
    really "caught on". Therefore, much of the information below is
    based upon theoretical analysis, not empirical evidence.
15
RAID Levels (cont)
  • RAID level 3: very high read data transfer rate and very high
    write data transfer rate, but controller design is fairly
    complex.
  • RAID level 4: very high read data transaction rate, but quite
    complex controller design. Difficult and inefficient data
    rebuild in the event of disk failure.
16
RAID Levels (cont)
  • RAID level 5: highest read data transaction rate and medium
    write data transaction rate. Most complex controller design.
    Difficult to rebuild in the event of a disk failure (compared
    to RAID level 1).
  • RAID level 6: essentially an extension of RAID level 5 which
    allows for additional fault tolerance by using a second
    independent distributed parity scheme (dual parity).
17
Choice of RAID Levels
  • Level 1 provides much better write performance than level 5
    • Level 5 requires at least 2 block reads and 2 block writes to
      write a single block, whereas level 1 requires only 2 block
      writes
    • Level 1 is preferred for high-update environments such as log
      disks
  • Level 1 has a higher storage cost than level 5
    • Disk drive capacities are increasing rapidly (50%/year),
      whereas disk access times have decreased much less (a factor
      of 3 in 10 years)
    • I/O requirements have increased greatly, e.g. for Web servers
    • When enough disks have been bought to satisfy the required
      rate of I/O, they often have spare storage capacity
    • So there is often no extra monetary cost for level 1!
  • Level 5 is preferred for applications with a low update rate
    and large amounts of data
  • Level 1 is preferred for all other applications
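
Why a level 5 write costs 2 block reads and 2 block writes can be seen in a small sketch of the parity update. It assumes XOR parity over integer-valued blocks; the function name and block values are illustrative only.

```python
def raid5_small_write(old_data: int, old_parity: int, new_data: int) -> int:
    """Return the new parity after overwriting one data block.

    Block reads:  old_data and old_parity   (2 reads)
    Block writes: new_data and the returned parity (2 writes)
    The other data blocks in the stripe are never touched.
    """
    return old_parity ^ old_data ^ new_data


stripe = [5, 9, 3]                        # three data blocks
parity = stripe[0] ^ stripe[1] ^ stripe[2]

new_parity = raid5_small_write(stripe[1], parity, 6)
stripe[1] = 6

# The shortcut agrees with recomputing parity over the whole stripe.
print(new_parity == stripe[0] ^ stripe[1] ^ stripe[2])  # True
```

Level 1, by contrast, simply writes the new block to both mirrors, with no reads needed first.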

18
Buffer Management
  • The database cannot fit entirely in memory, so it needs memory
    as a buffer for speed reasons
  • LRU is used in many operating systems
    • It exploits spatial and temporal locality, e.g. due to loops
  • A database has more predictable behavior
    • Example: join
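
A minimal sketch of an LRU buffer pool, to make the join example concrete. The class, its `read_block` callback, and the block counts are assumptions for illustration, not a description of any particular system.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Fixed number of in-memory frames with least-recently-used eviction."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # hypothetical callback: block id -> contents
        self.frames = OrderedDict()   # block id -> contents, oldest first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)  # mark as most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict least recently used
        self.disk_reads += 1
        self.frames[block_id] = self.read_block(block_id)
        return self.frames[block_id]


# A join that rescans a 4-block inner relation three times through a
# 3-frame pool: LRU always evicts the block that is needed next.
pool = LRUBufferPool(3, read_block=lambda b: f"block {b}")
for _ in range(3):
    for block in range(4):
        pool.get(block)
print(pool.disk_reads)  # 12
```

Every one of the 12 accesses goes to disk here. Since the access pattern of a join is known in advance, a database can choose a replacement policy that suits it better than plain LRU.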

19
File Organization
  • The database is stored as a collection of files. Each file is a
    sequence of records. A record is a sequence of fields.
  • One approach:
    • assume record size is fixed
    • each file has records of one particular type only
    • different files are used for different relations

20
Fixed-Length Records
  • Simple approach
    • Store record i starting from byte n * (i - 1), where n is the
      size of each record
    • Record access is simple, but records may cross blocks
  • Deletion of record i: several alternatives
    • Move records i + 1, ..., n to i, ..., n - 1
    • Move record n to i
    • Link all free records on a free list
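
The offset formula and the first two deletion alternatives can be sketched as follows. The record size, block size, and function names are assumptions chosen for illustration; records are numbered from 1 as on the slide.

```python
RECORD_SIZE = 40   # n, in bytes (hypothetical)
BLOCK_SIZE = 512   # bytes per disk block (hypothetical)


def record_offset(i: int, n: int = RECORD_SIZE) -> int:
    """Byte offset at which record i (1-based) starts."""
    return n * (i - 1)


def crosses_block(i: int, n: int = RECORD_SIZE, block: int = BLOCK_SIZE) -> bool:
    """True if record i starts in one block and ends in the next."""
    start = record_offset(i, n)
    return start // block != (start + n - 1) // block


def delete_by_shifting(records: list, i: int) -> None:
    """Alternative 1: move every later record up by one position."""
    del records[i - 1]


def delete_by_moving_last(records: list, i: int) -> None:
    """Alternative 2: move the last record into the freed position."""
    last = records.pop()
    if i - 1 < len(records):
        records[i - 1] = last


print(record_offset(3))    # 80
print(crosses_block(13))   # True: bytes 480 to 519 span two blocks

recs = ["r1", "r2", "r3", "r4"]
delete_by_moving_last(recs, 2)
print(recs)                # ['r1', 'r4', 'r3']
```

Moving the last record touches only one record but does not preserve the original order, whereas shifting preserves order at the cost of moving every later record.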

21
Free Lists
  • Store the address of the first record whose contents are
    deleted in the file header
  • Use this first record to store the address of the second
    available record, and so on
  • Can think of these stored addresses as pointers, since they
    point to the location of a record
  • More space-efficient representation: reuse the space for normal
    attributes of free records to store pointers (no pointers are
    stored in in-use records)
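
A minimal in-memory sketch of a free list. It assumes slot indexes stand in for record addresses and that a newly freed slot is pushed onto the front of the list; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FreeSlot:
    """A deleted record whose space now holds only the next-free pointer."""
    next_free: Optional[int]


class FixedLengthFile:
    def __init__(self):
        self.free_head: Optional[int] = None  # kept in the file header
        self.slots: list = []                 # each slot: a record or a FreeSlot

    def insert(self, record) -> int:
        """Reuse the first free slot if there is one, else append."""
        if self.free_head is None:
            self.slots.append(record)
            return len(self.slots) - 1
        idx = self.free_head
        self.free_head = self.slots[idx].next_free
        self.slots[idx] = record
        return idx

    def delete(self, idx: int) -> None:
        """Overwrite the record with a pointer and link it into the free list."""
        self.slots[idx] = FreeSlot(self.free_head)
        self.free_head = idx


f = FixedLengthFile()
for name in ["rec0", "rec1", "rec2"]:
    f.insert(name)
f.delete(1)
f.delete(0)
print(f.free_head, f.slots)  # header points to slot 0, which points to slot 1
print(f.insert("rec3"))      # 0, the most recently freed slot is reused first
```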

22
Variable-Length Records
  • Variable-length records arise in database systems in several
    ways:
    • Storage of multiple record types in a file
    • Record types that allow variable lengths for one or more
      fields
    • Record types that allow repeating fields (used in some older
      data models)
  • Byte string representation
    • Attach an end-of-record (⊥) control character to the end of
      each record
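
A minimal sketch of the byte string representation. The choice of the ASCII "record separator" byte as the end-of-record character is an assumption for illustration, as are the function names and sample records.

```python
END_OF_RECORD = b"\x1e"  # assumption: ASCII "record separator" as the marker


def pack(records: list[bytes]) -> bytes:
    """Concatenate records, terminating each with the end-of-record byte."""
    for r in records:
        if END_OF_RECORD in r:
            raise ValueError("record contains the end-of-record byte")
    return b"".join(r + END_OF_RECORD for r in records)


def unpack(data: bytes) -> list[bytes]:
    """Split a packed byte string back into its records."""
    return data.split(END_OF_RECORD)[:-1]  # drop the empty tail after the last marker


packed = pack([b"Perryridge,A-102,400", b"Mianus,A-215"])
print(unpack(packed))  # [b'Perryridge,A-102,400', b'Mianus,A-215']
```

Finding record i requires scanning from the start, since record boundaries are only marked, not indexed.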

23
Organization of Records in Files
  • Sequential file organization
    • Suitable for applications that require sequential processing
      of the entire file
    • The records in the file are ordered by a search key
    • Deletion: use pointer chains
    • Insertion: must locate the position in the file where the
      record is to be inserted
      • If there is free space, insert there
      • If there is no free space, insert the record in an overflow
        block
      • In either case, the pointer chain must be updated
    • Need to reorganize the file from time to time to restore
      sequential order
  • Clustering file organization
    • A simple file structure stores each relation in a separate
      file
    • Can instead store several relations in one file using a
      clustering file organization
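
Insertion into a sequential file with a pointer chain can be sketched as below. This is a simplified model under stated assumptions: records hold only a search key, slot indexes stand in for addresses, and every insert goes to the overflow area at the end (the "no free space" case). The class name and branch names are illustrative.

```python
class SequentialFile:
    """Records chained in search-key order; new records go to an overflow area."""

    def __init__(self, sorted_keys):
        # Main area: physically sorted, each entry is [key, index of next record].
        self.records = [[k, i + 1] for i, k in enumerate(sorted_keys)]
        if self.records:
            self.records[-1][1] = None
        self.head = 0 if self.records else None

    def insert(self, key):
        # Walk the pointer chain to locate the record's logical position.
        prev, cur = None, self.head
        while cur is not None and self.records[cur][0] <= key:
            prev, cur = cur, self.records[cur][1]
        # Place the record in the overflow area and update the chain.
        self.records.append([key, cur])
        new_idx = len(self.records) - 1
        if prev is None:
            self.head = new_idx
        else:
            self.records[prev][1] = new_idx

    def scan(self):
        """Return keys in search-key order by following the chain."""
        out, cur = [], self.head
        while cur is not None:
            out.append(self.records[cur][0])
            cur = self.records[cur][1]
        return out


f = SequentialFile(["Brighton", "Downtown", "Perryridge"])
f.insert("Mianus")
print(f.scan())                   # logical order follows the pointer chain
print([r[0] for r in f.records])  # physical order: Mianus sits at the end
```

The growing gap between physical and logical order is what makes periodic reorganization necessary.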

24
Exercises (1)
  • Show the structure of the file below after each of the
    following steps:
    • a. Insert(Brighton, A-323, 1600)
    • b. Delete record 2
    • c. Insert(Brighton, A-626, 2000)

25
Exercises (2)
  • Show the structure of the file below after each of the
    following steps:
    • a. Insert(Mianus, A-101, 2800)
    • b. Insert(Brighton, A-323, 1600)
    • c. Delete(Perryridge, A-102, 400)

26
  • Thanks!