Recap of Feb 25: Physical Storage Media - PowerPoint PPT Presentation



Provided by: david227


Transcript and Presenter's Notes


1
Recap of Feb 25: Physical Storage Media

Issues are speed, cost, reliability
Media types
Primary storage (volatile): Cache, Main Memory
Secondary or On-line storage (non-volatile): Flash Memory, Mag Disk
Tertiary or Off-line storage (non-volatile): Optical Storage, Tape Storage
Mag disk issues
definitions: sector, track, cylinder
disk controllers, multiple disks
disk performance measures (seek time, rotational latency, data transfer rate, MTTF)
Now we start with Optimization of Disk-Block Access

2
Optimization of Disk-Block Access: Motivation

Requests for disk I/O are generated both by the file system and by the virtual memory manager
Each request specifies the address on the disk to be referenced in the form of a block number
a block is a contiguous sequence of sectors from a single track on one platter
block sizes range from 512 bytes to several KB (4-16 KB is typical)
smaller blocks mean more transfers from disk
larger blocks mean more wasted space due to partially filled blocks
the block is the standard unit of data transfer between disk and main memory
Since disk access is much slower than main memory access, methods for optimizing disk-block access are important
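
The block-size trade-off above can be made concrete with a small calculation. This is only a sketch: the function name `block_stats` and the file sizes are made up for illustration, and it assumes each file occupies a whole number of blocks.

```python
import math

def block_stats(file_sizes, block_size):
    """Return (blocks needed, bytes wasted) when every file is stored in whole blocks."""
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    wasted = blocks * block_size - sum(file_sizes)
    return blocks, wasted

files = [700, 5_000, 20_000]  # hypothetical file sizes in bytes
for block_size in (512, 4096, 16384):
    blocks, wasted = block_stats(files, block_size)
    print(f"block size {block_size}: {blocks} transfers, {wasted} bytes wasted")
```

For these sizes, going from 512-byte to 16 KB blocks cuts the number of block transfers from 52 to 4, while the space lost in partially filled blocks grows from 924 to 39836 bytes.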

3
Optimization of Disk-Block Access: Methods

Disk-arm scheduling: requests for several blocks may be sped up by requesting them in the order they will pass under the head.
If the blocks are on different cylinders, it is advantageous to ask for them in an order that minimizes disk-arm movement
Elevator algorithm: move the disk arm in one direction until all requests in that direction are satisfied, then reverse and repeat
Sequential access is 1-2 orders of magnitude faster; random access is about 2 orders of magnitude slower
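
A minimal sketch of the elevator idea, assuming requests are identified only by cylinder number and that all of them are already pending when the sweep starts. The names `elevator_order` and `arm_travel` are illustrative, not from the slides.

```python
def elevator_order(requests, head, direction=1):
    """Order pending cylinder requests as one elevator sweep would serve them.

    direction is +1 (towards higher cylinders) or -1 (towards lower ones).
    """
    ahead = sorted(c for c in requests if (c - head) * direction >= 0)
    behind = sorted(c for c in requests if (c - head) * direction < 0)
    if direction > 0:
        # sweep upwards first, then come back down through the rest
        return ahead + behind[::-1]
    # sweep downwards first, then go back up through the rest
    return ahead[::-1] + behind

def arm_travel(order, head):
    """Total number of cylinders the arm crosses when serving requests in this order."""
    total = 0
    for cylinder in order:
        total += abs(cylinder - head)
        head = cylinder
    return total

pending = [10, 95, 60, 30, 70]   # hypothetical request queue
order = elevator_order(pending, head=50)
print(order, arm_travel(order, 50), arm_travel(pending, 50))
```

With the head at cylinder 50 and moving upwards, the sweep serves 60, 70, 95, 30, 10 and crosses 130 cylinders, against 230 for serving the same requests in arrival order.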

4
Optimization of Disk-Block Access: Methods

Non-volatile write buffers
store written data in a RAM buffer rather than writing it to disk immediately
write the buffer to disk whenever it becomes full or when no other disk requests are pending
buffer must be non-volatile to protect against power failure
called non-volatile random-access memory (NV-RAM)
typically implemented with battery-backed-up RAM
dramatic speedup on writes: with a reasonable-sized buffer, write latency essentially disappears
why can't we do the same for reads? (hints: ESP, clustering)
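
The buffering behaviour described above can be sketched as a toy model. A Python object is of course volatile, so this only illustrates when writes reach the disk; the class `WriteBuffer`, its methods, and the dict standing in for the disk are assumptions made for the example.

```python
class WriteBuffer:
    """Toy model of a write buffer sitting in front of a disk (here a plain dict)."""

    def __init__(self, disk, capacity):
        self.disk = disk          # block number -> data, stand-in for the real disk
        self.capacity = capacity  # number of blocks the buffer can hold
        self.pending = {}         # block number -> latest data not yet on disk

    def write(self, block_no, data):
        # The write completes as soon as the data is in the buffer.
        self.pending[block_no] = data
        if len(self.pending) >= self.capacity:
            self.flush()

    def read(self, block_no):
        # Buffered data is newer than the disk copy, so check the buffer first.
        if block_no in self.pending:
            return self.pending[block_no]
        return self.disk.get(block_no)

    def flush(self):
        # Called when the buffer is full; a real system would also call this when idle.
        # Writing in block-number order keeps arm movement small.
        for block_no in sorted(self.pending):
            self.disk[block_no] = self.pending[block_no]
        self.pending.clear()

disk = {}
buf = WriteBuffer(disk, capacity=3)
buf.write(7, "a")
buf.write(2, "b")
buf.write(7, "c")   # overwrites the buffered copy, nothing reaches the disk yet
buf.write(5, "d")   # third distinct block fills the buffer and triggers a flush
print(disk)
```

Repeated writes to the same block are absorbed in the buffer, so only the latest version is ever written to disk.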

5
Optimization of Disk-Block Access: Methods

File organization (Clustering): reduce access time by organizing blocks on disk in a way that corresponds closely to the way we expect them to be accessed
sequential files should be kept organized sequentially
hierarchical files should be organized with mothers next to daughters
for joining tables (relations), put the joining tuples next to each other
over time fragmentation can become an issue
restoration of disk structure (copy and rewrite, reordered) controls fragmentation

6
Optimization of Disk-Block Access: Methods

Log-based file system
does not update in place; instead it writes updates to a log disk
essentially, a disk functioning as a non-volatile RAM write buffer
all access to the log disk is sequential, eliminating seek time
eventually updates must be propagated to the original blocks
as with NV-RAM write buffers, this can occur at a time when no disk requests are pending
the updates can be ordered to minimize arm movement
this can generate a high degree of fragmentation on files that require constant updates
fragmentation increases seek time for sequential reading of files

7
Storage Access (11.5)

Basic concepts (some already familiar)
block-based: a block is a contiguous sequence of sectors from a single track; blocks are units of both storage allocation and data transfer
a file is a sequence of records stored in
fixed-size blocks (pages) on the disk
each block (page) has a unique address called
BID
optimization is done by reducing I/O, seek time,
etc.
database systems seek to minimize the number of
block transfers between the disk and memory. We
can reduce the number of disk accesses by keeping
as many blocks as possible in main memory.
Buffer - portion of main memory used to store
copies of disk blocks
buffer manager - subsystem responsible for
allocating buffer space in main memory and
handling block transfer between buffer and disk

8
Buffer Management

The buffer pool is the part of main memory allocated for temporarily storing blocks read from disk and making them available to the CPU
The buffer manager is the subsystem responsible
for the allocation and the management of the
buffer space (transparent to users)
On a process (user) request for a block (page), the buffer manager:
checks to see if the page is already in the
buffer pool
if so, passes the address to the process
if not, it loads the page from disk and then
passes the address to the process
loading a page might require clearing (writing
out) a page to make space
Very similar to the way virtual memory managers
work, although it can do a lot better (why?)
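
The request-handling steps above can be sketched as follows. This is a simplified model under stated assumptions: the disk is a dict keyed by block id, LRU is used as the replacement policy, and the "address" handed back is just a reference to the in-memory frame. The names `BufferManager`, `get`, and `update` are hypothetical.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer manager: a fixed number of frames with LRU replacement."""

    def __init__(self, disk, capacity):
        self.disk = disk             # block id -> contents, stand-in for the disk
        self.capacity = capacity     # number of frames in the buffer pool
        self.frames = OrderedDict()  # block id -> [contents, dirty]; least recently used first
        self.disk_reads = 0

    def get(self, bid):
        """Return the frame holding block bid, loading it from disk if necessary."""
        if bid in self.frames:
            self.frames.move_to_end(bid)       # already buffered: mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()                  # make space before loading
            self.frames[bid] = [self.disk[bid], False]
            self.disk_reads += 1
        return self.frames[bid]

    def update(self, bid, contents):
        """Modify a block in the buffer and mark it dirty."""
        frame = self.get(bid)
        frame[0] = contents
        frame[1] = True

    def _evict(self):
        victim, (contents, dirty) = self.frames.popitem(last=False)
        if dirty:
            self.disk[victim] = contents       # write back only if the block was modified

disk = {1: "a", 2: "b", 3: "c"}
bm = BufferManager(disk, capacity=2)
bm.get(1)
bm.get(2)
bm.update(2, "B")   # changed in memory only
bm.get(3)           # evicts block 1, which is clean, so no write is needed
bm.get(1)           # evicts block 2, which is dirty, so it is written back
print(disk, bm.disk_reads)
```

Tracking the dirty flag is what lets the manager skip the write when it clears a page that was never modified.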

9
Buffer Replacement Strategies

Most operating systems use an LRU replacement scheme. In database environments, MRU is better for some common operations (e.g., join)
LRU strategy: replace the least recently used block
MRU strategy: replace the most recently used block
Sometimes it is useful to fasten or pin blocks to keep them available during an operation and not let the replacement strategy touch them
a pinned block is thus a block that is not allowed to be written back to disk
There are situations where it is necessary to write a block back to disk even though the buffer space it occupies is not yet needed. This write is called the forced output of a block; it is useful in recovery situations
Toss-immediate strategy: free the space occupied by a block as soon as the final tuple of that block has been processed
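
To see why MRU can beat LRU for a join, consider the inner relation of a nested-loop join, which is scanned repeatedly from start to end. The sketch below counts buffer misses for such an access pattern; `count_misses` and the four-block relation are illustrative assumptions, and pinning is left out to keep it short.

```python
def count_misses(accesses, capacity, policy):
    """Count buffer misses for a sequence of block ids under "LRU" or "MRU"."""
    buffer = []   # block ids ordered from least to most recently used
    misses = 0
    for bid in accesses:
        if bid in buffer:
            buffer.remove(bid)            # hit: will be re-appended as most recent
        else:
            misses += 1
            if len(buffer) >= capacity:
                # LRU drops the oldest entry, MRU drops the newest one
                buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(bid)
    return misses

inner_scan = [0, 1, 2, 3]   # hypothetical inner relation of 4 blocks
accesses = inner_scan * 3   # scanned once per outer block, three times here
print(count_misses(accesses, capacity=3, policy="LRU"))
print(count_misses(accesses, capacity=3, policy="MRU"))
```

With 3 buffer frames for a 4-block relation, LRU misses on all 12 accesses because it always evicts the block that the scan will need soonest, while MRU misses only 6 times.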

10
Buffer Replacement Strategies

Most recently used (MRU) strategy: the system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned and becomes the most recently used block. This is essentially toss-immediate with pinning, and works very well with joins.
The buffer manager can often use other
information (design or statistical) to predict
the probability that a request will reference a
particular page
e.g., the data dictionary is frequently accessed
-- keep the data dictionary blocks in main memory
buffer
if several pages are available for overwrite, replace the one with the lowest number of recent access requests

11
Buffer Management (cont)

The existing OS affects DBMS operations through:
read ahead, write behind
wrong replacement strategies
Unix is not a good platform for a DBMS to run on top of
Most commercial systems implement their own I/O
on a raw disk partition
Variations of buffer allocation
common buffer pool for all relations
separate buffer pool for each relation
as above but with relations borrowing space from
each other
prioritized buffers for very frequently accessed
blocks, e.g. data dictionary

12
Buffer Management (cont)

For each buffer the manager keeps the following:
which disk and which block it is in
whether the block is dirty (has been modified) or
not (why?)
information for the replacement strategy
last time block was accessed
whether it is pinned
possible statistical information (access
frequency etc.)
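
The per-buffer bookkeeping listed above might look like the record below. The field names are made up for this sketch, and using a pin count rather than a single pinned flag is an assumption, chosen so that several operations can pin the same block at once.

```python
from dataclasses import dataclass

@dataclass
class FrameInfo:
    """Bookkeeping the buffer manager keeps for one buffer frame."""
    disk_id: int           # which disk the block came from
    block_no: int          # which block on that disk
    dirty: bool = False    # modified since it was read, so it must be written back
    pin_count: int = 0     # greater than 0 means the replacement strategy must skip it
    last_access: int = 0   # logical timestamp of the last access, used by LRU/MRU
    access_count: int = 0  # optional statistic, e.g. for frequency-based replacement

def choose_lru_victim(frames):
    """Pick the unpinned frame with the oldest access, or None if all are pinned."""
    candidates = [f for f in frames if f.pin_count == 0]
    if not candidates:
        return None
    return min(candidates, key=lambda f: f.last_access)

frames = [
    FrameInfo(0, 1, last_access=5),
    FrameInfo(0, 2, pin_count=1, last_access=1),   # oldest, but pinned
    FrameInfo(0, 3, dirty=True, last_access=3),
]
victim = choose_lru_victim(frames)
print(victim.block_no, victim.dirty)
```

Here block 3 is chosen even though block 2 is older, because block 2 is pinned; since the victim is dirty, it would have to be written to disk before its frame is reused.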

13
Buffer Management and Disk-block Access Optimization (end)

Disk-block access methods must take care of some
information within each block, as well as
information about each block
allocate records (tuples) within blocks
support record addressing by address and by
value
support auxiliary (secondary indexing) file
structures for more efficient processing
These concerns lead into our next topic: file organization.