Title: SEG3550 Fundamentals of Information System
1. SEG3550 Fundamentals of Information System
- Tutorial 8
- Storage and File Structure
2. Overview
- Storage Hierarchy
- Physical Storage Media
- RAID
- RAID Levels
- File Organization
3. Why do we need to know about storage/file structure?
- Many database technologies are developed to utilize the storage architecture/hierarchy
- Data in the database needs to be organized and stored/retrieved efficiently
4. Physical Storage Media
- Physical storage media are classified according to
  - Data access speed
  - Cost per unit of data
  - Reliability
- Storage can be differentiated into
  - Volatile storage: loses contents when power is switched off
  - Non-volatile storage: contents persist even when power is switched off
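5. Storage Hierarchy
[Figure: storage hierarchy, listed from top (fastest, highest unit price) to bottom (slowest, lowest unit price)]
- Primary storage (volatile): Cache, Memory
- Secondary storage (non-volatile): Flash Memory, Magnetic Disk
- Tertiary storage (non-volatile): Optical Disk, Magnetic Tape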
6. Primary Storage
- Cache
  - Volatile, managed by hardware
  - Speed: 7 to 20 ns (1 nanosecond = 10^-9 seconds)
  - Capacity
    - A typical PC level 2 cache: 64 KB to 2 MB
    - Within processors, level 1 cache usually ranges in size from 8 KB to 64 KB
- Main memory
  - Volatile
  - Speed: 10s to 100s of nanoseconds
  - Capacity: up to a few gigabytes is widely used currently
  - Per-byte costs have decreased (roughly a factor of 2 every 2 to 3 years)
7. Secondary Storage
- Flash memory
  - Non-volatile
  - Speed: read speed is similar to main memory, but writes are slow (a few microseconds) and erase is slower
  - Capacity: 32 MB to a few gigabytes currently
  - Forms: SmartMedia, Memory Stick, Secure Digital, BIOS
  - Cost: roughly the same as main memory
- Magnetic disk
  - Non-volatile
  - Capacities: up to roughly 1 TB (1000 GB) currently
  - Data must be moved from disk to main memory for access, and written back for storage
  - Capacity is growing constantly and rapidly with technology improvements (a factor of 2 to 3 every 2 years)
8. Tertiary Storage (Non-volatile)
- Optical storage
  - CD-ROM (640 MB) and DVD (4.7 to 17 GB) are the most popular forms
  - Reads and writes are slower than with magnetic disk
- Tape storage
  - Used primarily for backup (to recover from disk failure) and for archival data
  - Sequential access: much slower than disk
  - Very high capacity (40 to 300 GB tapes available)
9. Between Memory and Disk
- The database resides permanently mostly on disk
- In a database, cost is usually measured by the number of disk I/Os
- But disks are too slow, so memory is needed as a buffer; memory, however, is volatile
- This introduces a number of issues
10. RAID
- RAID: Redundant Arrays of Independent Disks
- Disk organization techniques that manage a large number of disks, providing
  - high capacity and high speed by using multiple disks in parallel, and
  - high reliability by storing data redundantly
11. Mean time to failure (MTTF)
- Average time the disk is expected to run continuously without any failure
- Typically 3 to 5 years (1 year = 8,760 hours)
- MTTF: 30,000 to 1,200,000 hours for a new disk
- An MTTF of 1,200,000 hours for a new disk means that, given 1000 relatively new disks, on average one will fail every 1200 hours (assuming failure times follow an exponential distribution)
- When the number of disks increases, the chance that some disk fails increases proportionally
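The arithmetic behind the 1000-disk example can be checked with a short sketch. It assumes independent disks with exponentially distributed failure times, as the slide does; the function names are illustrative only.

```python
import math


def hours_between_failures(mttf_hours: float, num_disks: int) -> float:
    """Expected time between failures somewhere in an array of independent disks."""
    return mttf_hours / num_disks


def prob_some_failure(mttf_hours: float, num_disks: int, window_hours: float) -> float:
    """Probability that at least one disk fails within the given time window."""
    array_failure_rate = num_disks / mttf_hours  # failures per hour for the whole array
    return 1.0 - math.exp(-array_failure_rate * window_hours)


print(hours_between_failures(1_200_000, 1000))  # 1200.0
print(prob_some_failure(1_200_000, 1000, 1200))
```

Doubling the number of disks halves the expected time between failures, which is the proportionality the last bullet refers to.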
12. Parallelism
- Two main goals of parallelism in a disk system:
  - 1. Load-balance multiple small accesses to increase throughput
  - 2. Parallelize large accesses to reduce response time
- Basic strategy: striping
- Compare and contrast bit striping and byte striping
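As a rough illustration of striping, the sketch below spreads consecutive bytes round-robin over the disks, so that byte i lands on disk i mod n. This is a minimal in-memory model of byte striping; the function names are hypothetical and no real disk I/O is involved. Bit striping follows the same pattern with bits instead of bytes.

```python
def stripe_bytes(data: bytes, num_disks: int) -> list[bytes]:
    """Byte i of the input goes to disk (i mod num_disks)."""
    return [data[d::num_disks] for d in range(num_disks)]


def unstripe_bytes(stripes: list[bytes]) -> bytes:
    """Reassemble the original byte sequence by interleaving the per-disk stripes."""
    n = len(stripes)
    out = bytearray(sum(len(s) for s in stripes))
    for d, stripe in enumerate(stripes):
        out[d::n] = stripe
    return bytes(out)


stripes = stripe_bytes(b"ABCDEFGH", 4)
print(stripes)                  # [b'AE', b'BF', b'CG', b'DH']
print(unstripe_bytes(stripes))  # b'ABCDEFGH'
```

A large read touches all four disks at once, which is how striping reduces response time for large accesses.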
13. Redundancy
- Store extra information that can be used to rebuild information lost in a disk failure
- Basic strategies: mirroring, parity
- Mean time to data loss depends on mean time to failure and mean time to repair
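To make the dependence on MTTF and repair time concrete, here is a sketch for one mirrored pair of disks. It uses the common approximation MTTF^2 / (2 * MTTR), which assumes independent failures and is not stated on the slide; the input numbers are hypothetical.

```python
def mean_time_to_data_loss(mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for a mirrored pair.

    Assumption: data is lost only if the second disk fails while the
    first is still being repaired, and the two failures are independent.
    """
    return mttf_hours ** 2 / (2 * mttr_hours)


# Hypothetical inputs: MTTF of 100,000 hours, mean time to repair of 10 hours.
print(mean_time_to_data_loss(100_000, 10))  # 500000000.0
```

A shorter repair time raises the mean time to data loss in direct proportion, which is why fast replacement of failed disks matters.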
Data       Parity
10010010   1
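The parity bit in the example above can be reproduced with a few lines of code. The sketch assumes even parity, which matches the example (three 1 bits in the data plus a parity bit of 1 gives an even count), and extends the idea to whole blocks with XOR. The helper names are illustrative.

```python
from functools import reduce


def parity_bit(bits: str) -> int:
    """Even parity: 1 if the number of 1 bits is odd, else 0."""
    return bits.count("1") % 2


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


print(parity_bit("10010010"))  # 1

# Rebuilding a lost block: XOR the surviving data blocks with the parity block.
d0, d1, d2 = b"\x92\x0f", b"\x35\xf0", b"\x11\x22"
parity = xor_blocks([d0, d1, d2])
rebuilt_d1 = xor_blocks([d0, d2, parity])
print(rebuilt_d1 == d1)  # True
```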
14. RAID Levels
Twice the read transaction rate of a single disk, same write transaction rate as a single disk.
Due to its cost and complexity, level 2 never really "caught on". Therefore, much of the available information on it is based upon theoretical analysis, not empirical evidence.
15. RAID Levels (cont)
Very high read data transfer rate and very high write data transfer rate, but controller design is fairly complex.
Very high read data transaction rate, but quite complex controller design. Difficult and inefficient data rebuild in the event of disk failure.
16. RAID Levels (cont)
Highest read data transaction rate and medium write data transaction rate. Most complex controller design. Difficult to rebuild in the event of a disk failure (compared to RAID level 1).
RAID 6 is essentially an extension of RAID level 5 which allows for additional fault tolerance by using a second independent distributed parity scheme (dual parity).
17. Choice of RAID Levels
- Level 1 provides much better write performance than level 5
  - Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes
  - Level 1 is preferred for high-update environments such as log disks
- Level 1 has a higher storage cost than level 5
  - Disk drive capacities are increasing rapidly (50%/year), whereas disk access times have decreased much less (a factor of 3 in 10 years)
  - I/O requirements have increased greatly, e.g. for Web servers
  - When enough disks have been bought to satisfy the required rate of I/O, they often have spare storage capacity
    - so there is often no extra monetary cost for Level 1!
- Level 5 is preferred for applications with a low update rate and large amounts of data
- Level 1 is preferred for all other applications
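The "2 block reads and 2 block writes" cost of a Level 5 single-block write can be seen in a small simulation. The sketch assumes the usual read-modify-write parity update (new parity = old parity XOR old data XOR new data); the `CountingDisk` class and function names are hypothetical and model one block per disk.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class CountingDisk:
    """Holds one block and counts the I/O operations performed on it."""

    def __init__(self, block: bytes):
        self.block, self.reads, self.writes = block, 0, 0

    def read(self) -> bytes:
        self.reads += 1
        return self.block

    def write(self, block: bytes) -> None:
        self.writes += 1
        self.block = block


def raid5_write(data_disk: CountingDisk, parity_disk: CountingDisk, new_block: bytes) -> None:
    old_data = data_disk.read()        # block read 1
    old_parity = parity_disk.read()    # block read 2
    new_parity = xor(xor(old_parity, old_data), new_block)
    data_disk.write(new_block)         # block write 1
    parity_disk.write(new_parity)      # block write 2


def raid1_write(primary: CountingDisk, mirror: CountingDisk, new_block: bytes) -> None:
    primary.write(new_block)           # block write 1
    mirror.write(new_block)            # block write 2


# Stripe with two data blocks and one parity block.
d0, d1 = CountingDisk(b"\x0f\x0f"), CountingDisk(b"\xf0\x01")
p = CountingDisk(xor(d0.block, d1.block))
raid5_write(d0, p, b"\xaa\xbb")
print(d0.reads + p.reads, d0.writes + p.writes)  # 2 2
print(p.block == xor(d0.block, d1.block))        # True: parity is still consistent
```

The untouched data disk `d1` is never read, which is why the update needs only the old data block and the old parity block.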
18. Buffer Management
- The database cannot fit entirely in memory, so memory is used as a buffer for speed
- LRU is used in many operating systems
  - Spatial and temporal locality due to loops
- A database has more predictable behavior
  - Example: join
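A small LRU buffer sketch shows why a database's predictable access pattern matters. The class below is a minimal model (page ids only, no real I/O), and the names are hypothetical. The second access pattern imitates the repeated scan of one relation that a simple nested-loop join would perform, which is an assumption about the join algorithm, not something the slide specifies.

```python
from collections import OrderedDict


class LRUBuffer:
    """Fixed number of frames; evicts the least recently used page."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.frames = OrderedDict()  # page_id -> True, ordered oldest to newest use
        self.hits = 0
        self.misses = 0

    def access(self, page_id: int) -> None:
        if page_id in self.frames:
            self.frames.move_to_end(page_id)     # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)      # evict least recently used page
        self.frames[page_id] = True


def count_hits(capacity: int, accesses: list[int]) -> int:
    buf = LRUBuffer(capacity)
    for page in accesses:
        buf.access(page)
    return buf.hits


# Access pattern with temporal locality: LRU does well.
print(count_hits(3, [0, 1, 0, 1, 2, 0]))  # 3
# Repeated scan of 4 pages with only 3 frames: LRU never hits.
print(count_hits(3, [0, 1, 2, 3] * 3))    # 0
```

Because the database knows in advance that the scan will wrap around, it can choose a replacement policy better suited to the pattern than plain LRU.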
19. File Organization
- The database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields.
- One approach:
  - assume record size is fixed
  - each file has records of one particular type only
  - different files are used for different relations
20. Fixed-Length Records
- Simple approach:
  - Store record i starting from byte n * (i - 1), where n is the size of each record
  - Record access is simple, but records may cross blocks
- Deletion of record i: several alternatives (here n denotes the last record in the file)
  - move records i + 1, ..., n to i, ..., n - 1
  - move record n to i
  - link all free records on a free list
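The offset formula and the first two deletion alternatives can be sketched as follows. Records are modelled as a Python list with 1-based record numbers, as on the slide; the record size of 40 bytes and the function names are assumptions for illustration.

```python
RECORD_SIZE = 40  # hypothetical record size n, in bytes


def record_offset(i: int, n: int = RECORD_SIZE) -> int:
    """Byte offset at which record i (1-based) starts."""
    return n * (i - 1)


def delete_by_shifting(records: list, i: int) -> list:
    """Alternative 1: move records i + 1, ..., n up by one position."""
    return records[: i - 1] + records[i:]


def delete_by_moving_last(records: list, i: int) -> list:
    """Alternative 2: move the last record into the slot of record i."""
    remaining = list(records)
    last = remaining.pop()
    if i - 1 < len(remaining):  # record i was not itself the last record
        remaining[i - 1] = last
    return remaining


print(record_offset(3))                                       # 80
print(delete_by_shifting(["r1", "r2", "r3", "r4"], 2))        # ['r1', 'r3', 'r4']
print(delete_by_moving_last(["r1", "r2", "r3", "r4"], 2))     # ['r1', 'r4', 'r3']
```

Shifting preserves record order but moves many records; moving the last record touches only one record but changes the order.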
21. Free Lists
- Store the address of the first record whose contents are deleted in the file header
- Use this first record to store the address of the second available record, and so on
- Can think of these stored addresses as pointers, since they point to the location of a record
- More space-efficient representation: reuse the space for normal attributes of free records to store pointers (no pointers are stored in in-use records)
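A free list along these lines might look like the sketch below. It is a simplified in-memory model: slot indexes stand in for record addresses, `free_head` plays the role of the pointer in the file header, and a freed slot stores the pointer in place of its attributes. The class name and the choice to push newly freed slots onto the front of the list are assumptions.

```python
class FixedLengthFile:
    """Fixed-length record file with a free list threaded through deleted slots."""

    def __init__(self):
        self.slots = []        # each slot is ("rec", value) or ("free", next_free_slot)
        self.free_head = None  # file header: address of the first free slot, if any

    def insert(self, record) -> int:
        if self.free_head is not None:
            slot = self.free_head
            _, next_free = self.slots[slot]
            self.free_head = next_free          # header now points to the next free slot
            self.slots[slot] = ("rec", record)
        else:
            slot = len(self.slots)              # no free slot: append at end of file
            self.slots.append(("rec", record))
        return slot

    def delete(self, slot: int) -> None:
        # Reuse the record's own space to store the pointer to the next free slot.
        self.slots[slot] = ("free", self.free_head)
        self.free_head = slot


f = FixedLengthFile()
for name in ["a", "b", "c"]:
    f.insert(name)
f.delete(1)
f.delete(0)
print(f.free_head, f.slots[0], f.slots[1])  # 0 ('free', 1) ('free', None)
print(f.insert("d"), f.insert("e"), f.insert("g"))  # 0 1 3
```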
22. Variable-Length Records
- Variable-length records arise in database systems in several ways:
  - Storage of multiple record types in a file
  - Record types that allow variable lengths for one or more fields
  - Record types that allow repeating fields (used in some older data models)
- Byte string representation
  - Attach an end-of-record control character to the end of each record
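The byte string representation can be sketched as below. The specific control characters are assumptions chosen for illustration (ASCII record separator 0x1E as the end-of-record marker and unit separator 0x1F between fields); the sketch also assumes that neither character occurs inside field values.

```python
END_OF_RECORD = b"\x1e"  # assumed end-of-record control character
FIELD_SEP = b"\x1f"      # assumed separator between fields of one record


def encode_record(fields: list[str]) -> bytes:
    """Serialize one record of any length and terminate it with the marker."""
    return FIELD_SEP.join(f.encode("utf-8") for f in fields) + END_OF_RECORD


def decode_file(data: bytes) -> list[list[str]]:
    """Split a byte string back into records, then into fields."""
    raw_records = data.split(END_OF_RECORD)[:-1]  # drop the piece after the last marker
    return [[f.decode("utf-8") for f in raw.split(FIELD_SEP)] for raw in raw_records]


data = encode_record(["Perryridge", "A-102", "400"]) + encode_record(
    ["Brighton", "A-323", "1600", "A-626", "2000"]  # repeating fields make this record longer
)
print(decode_file(data))
```

Finding record i in this layout requires scanning for markers from the start, which is one of the costs of the byte string representation compared with fixed-length records.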
23. Organization of Records in Files
- Sequential File Organization
  - Suitable for applications that require sequential processing of the entire file. The records in the file are ordered by a search key. The file needs to be reorganized from time to time to restore sequential order.
  - Deletion: use pointer chains
  - Insertion: must locate the position in the file where the record is to be inserted
    - If there is free space, insert there
    - If there is no free space, insert the record in an overflow block
    - In either case, the pointer chain must be updated
- Clustering File Organization
  - A simple file structure stores each relation in a separate file
  - Can instead store several relations in one file using a clustering file organization
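The insertion and deletion rules for a sequential file can be sketched with a pointer chain over slots. This is a deliberately simplified model: the main area is a fixed number of slots rather than blocks, any free main slot counts as "free space", and slots appended past the main area stand in for the overflow block. The class and method names are hypothetical.

```python
class SequentialFile:
    """Records chained in search-key order; overflow slots follow the main area."""

    def __init__(self, main_slots: int):
        self.main_slots = main_slots
        self.slots = [None] * main_slots  # each used slot is [key, next_slot_index]
        self.head = None                  # slot holding the smallest key

    def _locate(self, key):
        """Return (predecessor slot, first slot whose key is >= key) along the chain."""
        prev, cur = None, self.head
        while cur is not None and self.slots[cur][0] < key:
            prev, cur = cur, self.slots[cur][1]
        return prev, cur

    def insert(self, key) -> int:
        prev, nxt = self._locate(key)
        try:
            pos = self.slots.index(None, 0, self.main_slots)  # free space in main area
        except ValueError:
            self.slots.append(None)                           # otherwise use overflow
            pos = len(self.slots) - 1
        self.slots[pos] = [key, nxt]
        if prev is None:
            self.head = pos
        else:
            self.slots[prev][1] = pos                         # update the pointer chain
        return pos

    def delete(self, key) -> None:
        prev, cur = self._locate(key)
        if cur is None or self.slots[cur][0] != key:
            raise KeyError(key)
        nxt = self.slots[cur][1]
        if prev is None:
            self.head = nxt
        else:
            self.slots[prev][1] = nxt                         # unlink from the chain
        self.slots[cur] = None

    def scan(self) -> list:
        """Read records in search-key order by following the chain."""
        keys, cur = [], self.head
        while cur is not None:
            keys.append(self.slots[cur][0])
            cur = self.slots[cur][1]
        return keys


f = SequentialFile(main_slots=3)
for branch in ["Brighton", "Mianus", "Perryridge"]:
    f.insert(branch)
print(f.insert("Downtown"))  # 3: main area is full, so the record goes to overflow
print(f.scan())              # ['Brighton', 'Downtown', 'Mianus', 'Perryridge']
```

Physical order and key order drift apart as overflow slots accumulate, which is why the file needs periodic reorganization.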
24. Exercises (1)
- Show the structure of the file below after each of the following steps:
  - a. Insert(Brighton, A-323, 1600)
  - b. Delete record 2
  - c. Insert(Brighton, A-626, 2000)
25. Exercises (2)
- Show the structure of the file below after each of the following steps:
  - a. Insert(Mianus, A-101, 2800)
  - b. Insert(Brighton, A-323, 1600)
  - c. Delete(Perryridge, A-102, 400)