SEG3550 Fundamentals of Information System - PowerPoint PPT Presentation

Provided by: huan9

Transcript and Presenter's Notes

1
SEG3550 Fundamentals of Information System

Tutorial 8
Storage and File Structure

2
Overview

Storage Hierarchy
Physical Storage Media
RAID
RAID Levels
File Organization

3
Why do we need to know about storage/file
structure?

Many database technologies are developed to
utilize the storage architecture/hierarchy
Data in the database needs to be organized and
stored/retrieved efficiently

4
Physical Storage Media

Physical storage media are classified according to
Data access speed
Cost per unit of data
Reliability
Storage can be differentiated into
Volatile storage: loses contents when power
is switched off
Non-volatile storage: contents persist even
when power is switched off

5
Storage Hierarchy
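Primary storage (volatile): Cache, Memory
Secondary storage (non-volatile): Flash Memory,
Magnetic Disk
Tertiary storage (non-volatile): Optical Disk,
Magnetic Tape
Speed and unit price both increase toward the top
of the hierarchy (Cache) and decrease toward the
bottom (Magnetic Tape)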
6
Primary Storage

Cache
Volatile, managed by hardware
Speed: 7 to 20 ns (1 nanosecond = 10^-9 seconds)
Capacity:
A typical PC level 2 cache is 64 KB to 2 MB.
Within processors, level 1 cache usually ranges
in size from 8 KB to 64 KB.
Main memory
Volatile
Speed: 10s to 100s of nanoseconds
Capacity: up to a few gigabytes widely used
currently
Per-byte costs have decreased by roughly a
factor of 2 every 2 to 3 years

7
Secondary Storage

Flash memory
Non-volatile
Speed: reads are about as fast as main memory, but
writes are slow (a few microseconds) and erases
are slower.
Capacity: 32 MB to a few gigabytes currently
Forms: SmartMedia, Memory Stick, Secure Digital,
BIOS
Cost: roughly the same as main memory
Magnetic disk
Non-volatile
Capacity: up to roughly 1 TB (1000 GB) currently,
growing constantly and rapidly with technology
improvements (a factor of 2 to 3 every 2 years)
Data must be moved from disk to main memory for
access, and written back for storage.

8
Tertiary Storage (Non-volatile)

Optical storage
CD-ROM (640 MB) and DVD (4.7 to 17 GB) are the
most popular forms
Reads and writes are slower than with magnetic
disk
Tape storage
Used primarily for backup (to recover from disk
failure) and for archival data
Sequential access: much slower than disk
Very high capacity (40 to 300 GB tapes available)

9
Between Memory and Disk

The database resides permanently, for the most
part, on disk
In a database, cost is usually measured by the
number of disk I/Os
But disks are too slow, so memory is needed as a
buffer; because memory is volatile, this
introduces a number of issues

10
RAID

RAID: Redundant Arrays of Independent Disks
Disk organization techniques that manage a large
number of disks, providing
high capacity and high speed by using multiple
disks in parallel, and
high reliability by storing data redundantly

11
Mean time to failure (MTTF)

Average time the disk is expected to run
continuously without any failure.
Typically 3 to 5 years (1 year = 8,760 hours)
MTTF: 30,000 to 1,200,000 hours for a new disk
An MTTF of 1,200,000 hours for a new disk means
that, given 1000 relatively new disks, on
average one will fail every 1200 hours
(assuming exponentially distributed failure times)
As the number of disks increases, the chance that
some disk fails increases proportionally
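
The arithmetic on this slide can be checked with a short sketch. It assumes independent disks with exponentially distributed lifetimes, so the failure rate of the array is the sum of the per-disk rates; the function name is made up for illustration.

```python
def expected_hours_between_failures(mttf_hours: float, num_disks: int) -> float:
    """Expected time until some disk in an array fails.

    Assumption: disks fail independently with exponentially distributed
    lifetimes, so per-disk failure rates (1 / MTTF) simply add up.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    return mttf_hours / num_disks


if __name__ == "__main__":
    # The slide's example: 1000 disks, each with an MTTF of 1,200,000 hours.
    print(expected_hours_between_failures(1_200_000, 1000))  # 1200.0
```

With 1000 disks the expected interval is 1200 hours, about 50 days, which is why large arrays need redundancy.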

12
Parallelism

Two main goals of parallelism in a disk system
1. Load-balance multiple small accesses to
increase throughput
2. Parallelize large accesses to reduce response
time
Basic strategy: striping
Compare and contrast bit striping and byte
striping
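
To make the comparison concrete, here is a minimal sketch of both striping schemes. The round-robin layout, the choice of 8 disks for bit striping, and the function names are assumptions for illustration, not something the slides specify.

```python
from typing import List


def stripe_bytes(data: bytes, num_disks: int) -> List[bytes]:
    """Byte striping: byte i of the data goes to disk i mod num_disks."""
    return [data[d::num_disks] for d in range(num_disks)]


def unstripe_bytes(stripes: List[bytes], total_len: int) -> bytes:
    """Reassemble the original byte sequence from the per-disk stripes."""
    num_disks = len(stripes)
    out = bytearray(total_len)
    for d, stripe in enumerate(stripes):
        out[d::num_disks] = stripe
    return bytes(out)


def stripe_bits(data: bytes) -> List[List[int]]:
    """Bit striping over 8 disks: bit i of every byte goes to disk i.

    Bit 0 is taken to be the least significant bit (an arbitrary choice).
    """
    return [[(byte >> i) & 1 for byte in data] for i in range(8)]


if __name__ == "__main__":
    stripes = stripe_bytes(b"ABCDEFGH", 3)
    print(stripes)                      # [b'ADG', b'BEH', b'CF']
    print(unstripe_bytes(stripes, 8))   # b'ABCDEFGH'
```

In both schemes a single large access touches every disk, so the disks work in parallel; the schemes differ only in the unit that is spread across them.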

13
Redundancy

Store extra information that can be used to
rebuild information lost in a disk failure
Basic strategies: mirroring, parity
Mean time to data loss depends on mean time to
failure and mean time to repair

Data      Parity
10010010  1
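
The parity example above uses even parity: the parity bit makes the total number of 1 bits even. The sketch below reproduces it and shows how the same XOR idea, applied to whole blocks, lets a lost block be rebuilt from the surviving blocks plus the parity block. The function names and sample block values are illustrative assumptions.

```python
from functools import reduce
from typing import List


def parity_bit(bits: str) -> int:
    """Even parity: 1 if the number of 1 bits is odd, otherwise 0."""
    return bits.count("1") % 2


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks (the parity block)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    print(parity_bit("10010010"))  # 1, as on the slide

    data = [b"\x92", b"\x0f", b"\xf0"]   # blocks on three data disks
    parity = xor_blocks(data)            # block on the parity disk

    # Suppose disk 1 fails: XOR the survivors with the parity block.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])            # True
```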
14
RAID Levels
RAID level 1: twice the read transaction rate of a
single disk, same write transaction rate as a
single disk.
RAID level 2: due to its cost and complexity,
level 2 never really "caught on". Therefore, much
of the information about it is based upon
theoretical analysis, not empirical evidence.
15
RAID Levels (cont)
RAID level 3: very high read data transfer rate
and very high write data transfer rate, but
controller design is fairly complex.
RAID level 4: very high read data transaction
rate, but quite complex controller design.
Difficult and inefficient data rebuild in the
event of disk failure.
16
RAID Levels (cont)
RAID level 5: highest read data transaction rate
and medium write data transaction rate. Most
complex controller design. Difficult to rebuild in
the event of a disk failure (compared to RAID
level 1).
RAID level 6: essentially an extension of RAID
level 5 which allows for additional fault
tolerance by using a second independent
distributed parity scheme (dual parity).
17
Choice of RAID Levels

Level 1 provides much better write performance
than level 5
Level 5 requires at least 2 block reads and 2
block writes to write a single block, whereas
level 1 only requires 2 block writes
Level 1 is preferred for high-update environments
such as log disks
Level 1 has a higher storage cost than level 5, but
disk drive capacities are increasing rapidly
(50% per year) whereas disk access times have
decreased much less (a factor of 3 in 10 years)
I/O requirements have increased greatly, e.g. for
Web servers
When enough disks have been bought to satisfy the
required rate of I/O, they often have spare
storage capacity, so there is often no extra
monetary cost for level 1
Level 5 is preferred for applications with a low
update rate and large amounts of data
Level 1 is preferred for all other applications
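
The "2 block reads and 2 block writes" cost of a level 5 write comes from the usual read-modify-write parity update, sketched below. The slides do not spell out this formula; it is the standard technique, shown here under that assumption with an illustrative function name.

```python
def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Return the new parity block for a single-block update.

    I/O cost of the whole operation:
      2 reads  - the old data block and the old parity block
      2 writes - the new data block and the new parity block
    new_parity = old_parity XOR old_data XOR new_data
    """
    return bytes(p ^ d ^ n for p, d, n in zip(old_parity, old_data, new_data))


if __name__ == "__main__":
    d0, d1 = b"\x01\x02", b"\x10\x20"
    parity = bytes(a ^ b for a, b in zip(d0, d1))

    new_d0 = b"\xff\x00"
    new_parity = raid5_small_write(d0, parity, new_d0)

    # Same result as recomputing parity from all data blocks.
    print(new_parity == bytes(a ^ b for a, b in zip(new_d0, d1)))  # True
```

Level 1 (mirroring) skips the two reads entirely: it just writes the new block to both copies.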

18
Buffer Management

A database usually cannot fit entirely in memory,
so memory is used as a buffer for speed
LRU (least recently used) replacement is used in
many operating systems
It relies on spatial and temporal locality, e.g.
due to loops
A database has more predictable access behavior
Example: join
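
A minimal sketch of an LRU-managed buffer follows. The class, its fields, and the `read_block` callback are hypothetical names for illustration; a real buffer manager also handles pinned and dirty blocks, which this sketch ignores.

```python
from collections import OrderedDict


class LRUBuffer:
    """A tiny buffer pool with least-recently-used replacement."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # callable: block_id -> contents ("disk read")
        self.frames = OrderedDict()    # block_id -> contents, least recent first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # now the most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)     # evict the least recently used
        self.disk_reads += 1
        self.frames[block_id] = self.read_block(block_id)
        return self.frames[block_id]


if __name__ == "__main__":
    buf = LRUBuffer(3, lambda b: "block %d" % b)
    # Scan a 4-block relation twice, as the inner loop of a join might.
    for _ in range(2):
        for b in range(4):
            buf.get(b)
    print(buf.disk_reads)  # 8: every access misses
```

With 3 frames and a repeated scan of 4 blocks, LRU always evicts the block that will be needed soonest. Because a database knows its access pattern in advance (for example, during a join), it can choose a better replacement policy than a general-purpose OS.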

19
File Organization

The database is stored as a collection of files.
Each file is a sequence of records. A record is a
sequence of fields
One approach:
assume record size is fixed
each file has records of one particular
type only
different files are used for different
relations

20
Fixed-Length Records

Simple approach
Store record i starting from byte n * (i - 1),
where n is the size of each record
Record access is simple, but records may
cross blocks
Deletion of record i: several alternatives
(here n denotes the last record in the file)
move records i + 1, . . . , n to i, . . .
, n - 1
move record n to i
do not move records, but link all free records
on a free list
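
The offset formula and the first two deletion alternatives can be sketched as follows. The record size of 40 bytes, the block size of 512 bytes, and all function names are assumptions chosen for illustration; records are numbered from 1, as on the slide.

```python
from typing import List

RECORD_SIZE = 40   # n: bytes per record (hypothetical value)
BLOCK_SIZE = 512   # bytes per disk block (hypothetical value)


def record_offset(i: int, n: int = RECORD_SIZE) -> int:
    """Byte offset of record i (1-based): n * (i - 1)."""
    return n * (i - 1)


def crosses_block(i: int, n: int = RECORD_SIZE, block: int = BLOCK_SIZE) -> bool:
    """True if record i starts in one block and ends in the next."""
    start = record_offset(i, n)
    end = start + n - 1
    return start // block != end // block


def delete_by_shifting(records: List[str], i: int) -> None:
    """Alternative 1: move records i + 1, ..., n up by one slot."""
    del records[i - 1]


def delete_by_moving_last(records: List[str], i: int) -> None:
    """Alternative 2: move the last record into slot i."""
    records[i - 1] = records[-1]
    records.pop()


if __name__ == "__main__":
    print(record_offset(3))    # 80
    print(crosses_block(13))   # True: bytes 480..519 straddle the 512-byte boundary
```

Shifting preserves record order but moves many records; moving the last record touches only one record but changes the order.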

21
Free Lists

Store the address of the first record whose
contents are deleted in the file header
Use this first record to store the address of the
second available record, and so on
Can think of these stored addresses as pointers
since they point to the location of a record
More space-efficient representation: reuse the
space for normal attributes of free records to
store pointers (no pointers are stored in in-use
records)
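
A free list along these lines might look like the sketch below. The class and field names are hypothetical, slots are numbered from 0, and newly freed slots are pushed onto the front of the list; the slides do not say in which order free records are linked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Free:
    """Contents of a deleted slot: the record's space is reused for a pointer."""
    next_free: Optional[int]


class FixedLengthFile:
    def __init__(self):
        self.free_head = None   # "file header": slot number of the first free record
        self.slots = []         # in-use slot holds a record; free slot holds Free(...)

    def insert(self, record) -> int:
        """Reuse the first free slot if there is one, otherwise append."""
        if self.free_head is None:
            self.slots.append(record)
            return len(self.slots) - 1
        slot = self.free_head
        self.free_head = self.slots[slot].next_free
        self.slots[slot] = record
        return slot

    def delete(self, slot: int) -> None:
        """Mark the slot free and link it at the head of the free list."""
        if isinstance(self.slots[slot], Free):
            raise ValueError("slot %d is already free" % slot)
        self.slots[slot] = Free(self.free_head)
        self.free_head = slot


if __name__ == "__main__":
    f = FixedLengthFile()
    for r in ("r0", "r1", "r2"):
        f.insert(r)
    f.delete(1)
    f.delete(0)
    print(f.free_head, f.slots[0], f.slots[1])   # header -> slot 0 -> slot 1 -> end
    print(f.insert("r3"))                        # reuses slot 0
```

No record is moved on deletion, and in-use records carry no pointer overhead.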

22
Variable-Length Records

Variable-length records arise in database systems
in several ways
Storage of multiple record types in a file
Record types that allow variable lengths
for one or more fields
Record types that allow repeating fields
(used in some older data models)
Byte string representation
Attach an end-of-record (⊥) control
character to the end of each record
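
The byte string representation can be sketched as below. The zero byte used as the end-of-record character is a stand-in chosen for illustration, and the function names are hypothetical; the sketch simply rejects records that contain the terminator.

```python
from typing import List

END_OF_RECORD = b"\x00"  # hypothetical stand-in for the end-of-record character


def pack_records(records: List[bytes]) -> bytes:
    """Concatenate variable-length records, each followed by the terminator."""
    for r in records:
        if END_OF_RECORD in r:
            raise ValueError("record contains the end-of-record character")
    return b"".join(r + END_OF_RECORD for r in records)


def unpack_records(data: bytes) -> List[bytes]:
    """Split a packed byte string back into its records."""
    return data.split(END_OF_RECORD)[:-1]


if __name__ == "__main__":
    packed = pack_records([b"Perryridge|A-102|400", b"Mianus|A-215|700"])
    print(unpack_records(packed))
```

Finding record i requires scanning from the start, and a record that grows or is deleted forces later records to move, which is the main weakness of this layout.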

23
Organization of Records in Files

Sequential File Organization
Suitable for applications that require sequential
processing of the entire file. The records in the
file are ordered by a search key. The file needs to
be reorganized from time to time to restore
sequential order.
Deletion: use pointer chains
Insertion: locate the position in the file
where the record is to be inserted
If there is free space, insert there
If there is no free space, insert the record in an
overflow block
In either case, the pointer chain must be updated
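
A rough sketch of the pointer-chain idea follows. It is a simplification under stated assumptions: records are reduced to their search keys, "free space" means any previously deleted slot rather than space in the right block, and slots appended past the original file stand in for overflow blocks. All names are hypothetical.

```python
class SequentialFile:
    """Keys chained in search-key order via next-slot pointers."""

    def __init__(self, sorted_keys):
        # Each slot is [key, next_slot]; initially physical order = key order.
        self.slots = [[k, i + 1] for i, k in enumerate(sorted_keys)]
        if self.slots:
            self.slots[-1][1] = None
        self.head = 0 if self.slots else None

    def insert(self, key):
        # Locate the position in the chain where the key belongs.
        prev, cur = None, self.head
        while cur is not None and self.slots[cur][0] <= key:
            prev, cur = cur, self.slots[cur][1]
        # Use free space if any, otherwise an overflow slot at the end.
        free = next((i for i, s in enumerate(self.slots) if s is None), None)
        if free is None:
            self.slots.append(None)
            free = len(self.slots) - 1
        self.slots[free] = [key, cur]
        # In either case, update the pointer chain.
        if prev is None:
            self.head = free
        else:
            self.slots[prev][1] = free

    def delete(self, key):
        prev, cur = None, self.head
        while cur is not None and self.slots[cur][0] != key:
            prev, cur = cur, self.slots[cur][1]
        if cur is None:
            raise KeyError(key)
        nxt = self.slots[cur][1]
        if prev is None:
            self.head = nxt
        else:
            self.slots[prev][1] = nxt
        self.slots[cur] = None   # the slot becomes free space

    def scan(self):
        """Return keys in search-key order by following the chain."""
        out, cur = [], self.head
        while cur is not None:
            out.append(self.slots[cur][0])
            cur = self.slots[cur][1]
        return out


if __name__ == "__main__":
    f = SequentialFile(["A", "C", "E"])
    f.insert("D")        # no free space: goes to an overflow slot
    f.delete("C")
    f.insert("B")        # reuses the slot freed by C
    print(f.scan())      # ['A', 'B', 'D', 'E']
```

After enough insertions the physical order drifts away from the key order, which is why the file has to be reorganized from time to time.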
Clustering File Organization
A simple file structure stores each relation in a
separate file
Can instead store several relations in one file
using a clustering file organization

24
Exercises (1)

Show the structure of the file below after each
of the following steps
a. Insert(Brighton, A-323, 1600)
b. Delete record 2
c. Insert(Brighton, A-626, 2000)

25
Exercises (2)

Show the structure of the file below after each
of the following steps
a. Insert(Mianus, A-101, 2800)
b. Insert(Brighton, A-323, 1600)
c. Delete(Perryridge, A-102, 400)