Using Secondary Storage Effectively - PowerPoint PPT Presentation

1 / 22
About This Presentation
Title:

Using Secondary Storage Effectively

Description:

When implementing a DBMS, one must assume that the data does not fit into main memory. ... Output blocks are written one at a time, interspersed with block reads ... – PowerPoint PPT presentation

Number of Views:39
Avg rating:3.0/5.0
Slides: 23
Provided by: scie232
Category:

less

Transcript and Presenter's Notes

Title: Using Secondary Storage Effectively


1
Using Secondary Storage Effectively
  • In most studies of algorithms, one assumes the
    "RAM model
  • The data is in main memory,
  • Access to any item of data takes as much time as
    any other.
  • When implementing a DBMS, one must assume that
    the data does not fit into main memory.
  • In designing efficient algorithms, one must take
    into account the use of secondary, and perhaps
    even tertiary storage.
  • Often, the best algorithms for processing very
    large amounts of data differ from the best
    main-memory algorithms for the same problem.
  • There is a great advantage in choosing an
    algorithm that uses few disk accesses, even if
    the algorithm is not very efficient when viewed
    as a main-memory algorithm.

2
Assumptions (for now)
  • One processor
  • One disk controller, and one disk.
  • The database itself is much too large to fit in
    main memory.
  • Many users, and each user issues disk-I/O
    requests frequently,
  • Disk controller serving on a first-come-first-serv
    ed basis.
  • We will change this assumption later.
  • Thus, each request for a given user will appear
    random
  • Even if the relation that a user is reading is
    stored on a single cylinder of the disk.

3
I/O model of computation
  • Disk I/O read or write of a block is very
    expensive compared with what is likely to be done
    with the block once it arrives in main memory.
  • Perhaps 1,000,000 machine instructions in the
    time to do one random disk I/O.
  • Random block accesses is the norm if there are
    several processes accessing disks, and the disk
    controller does not schedule accesses carefully.
  • Reasonable model of computation that requires
    secondary storage count only the disk I/O's.

In examples, we shall assume that the disk is a
Megatron 747, with 16Kbyte blocks and the timing
characteristics determined before. In
particular, the average time to read or write a
block is about 11ms
4
Good DBMS algorithms
  • Try to make sure that if we read a block, we use
    much of the data on the block.
  • Try to put blocks that are accessed together on
    the same cylinder.
  • Try to buffer commonly used blocks in main memory.

5
Sorting Example
  • Setup
  • 10,000,000 records of 160 bytes 1.6Gb file.
  • Stored on Megatron 747 disk, with 16Kb blocks,
    each holding 100 records
  • Entire file takes 100,000 blocks
  • 100M available main memory
  • The number of blocks that can fit in 100M bytes
    of memory (which, recall, is really 100 x 220
    bytes), is 100 x 220/214, or 6400 blocks ?1/16th
    of file.
  • Sort by primary key field.

6
Merge Sort
  • Common mainmemory sorting algorithms don't look
    so good when you take disk I/O's into account.
    Variants of Merge Sort do better.
  • Merge take two sorted lists and repeatedly
    chose the smaller of the heads of the lists
    (head first of the unchosen).
  • Example merge 1,3,4,8 with 2,5,7,9
    1,2,3,4,5,7,8,9.
  • Merge Sort based on recursive algorithm divide
    records into two parts recursively mergesort the
    parts, and merge the resulting lists.

7
TwoPhase, Multiway Merge Sort
  • Merge Sort still not very good in disk I/O model.
  • log2n passes, so each record is read/written from
    disk log2n times.
  • The secondary memory algorithms operate in a
    small number of passes
  • in one pass every record is read into main memory
    once and written out to disk once.
  • 2PMMS 2 reads 2 writes per block.
  • Phase 1
  • 1. Fill main memory with records.
  • 2. Sort using favorite mainmemory sort.
  • 3. Write sorted sublist to disk.
  • 4. Repeat until all records have been put into
    one of the sorted lists.

8
Phase 2
  • Use one buffer for each of the sorted sublists
    and one buffer for an output block.
  • Initially load input buffers with the first
    blocks of their respective sorted lists.
  • Repeatedly run a competition among the first
    unchosen records of each of the buffered blocks.
  • Move the record with the least key to the output
    block it is now chosen.
  • Manage the buffers as needed
  • If an input block is exhausted, get the next
    block from the same file.
  • If the output block is full, write it to disk.

9
Analysis Phase 1
  • 6400 of the 100,000 blocks will fill main memory.
  • We thus fill memory ?100,000/6,400?16 times,
    sort the records in main memory, and write the
    sorted sublists out to disk.
  • How long does this phase take?
  • We read each of the 100,000 blocks once, and we
    write 100,000 new blocks. Thus, there are 200,000
    disk I/O's for 200,00011ms 2200 seconds, or
    37 minutes.

Avg. time for reading a block.
10
Analysis Phase 2
  • In the second phase, unlike the first phase, the
    blocks are read in an unpredictable order, since
    we cannot tell when an input block will become
    exhausted.
  • However, notice that every block holding records
    from one of the sorted lists is read from disk
    exactly once.
  • Thus, the total number of block reads is 100,000
    in the second phase, just as for the first.
  • Likewise, each record is placed once in an output
    block, and each of these blocks is written to
    disk. Thus, the number of block writes in the
    second phase is also 100,000.
  • We conclude that the second phase takes another
    37 minutes.
  • Total Phase 1 Phase 2 74 minutes.

11
How Big Should Blocks Be?
  • We have assumed a 16K byte block in our analysis
    of algorithms using the Megatron 747 disk.
  • However, there are arguments that a larger block
    size would be advantageous.
  • Recall that it takes about a quarter of a
    millisecond (0.25ms) for transfer time of a 16K
    block and 10.63 milliseconds for average seek
    time and rotational latency.
  • If we doubled the size of blocks, we would halve
    the number of disk I/O's.
  • On the other hand, the only change in the time to
    access a block would be that the transfer time
    increases to 0.2520.50 millisecond.
  • We would thus approximately halve the time the
    sort takes.
  • For a block size of 512K (i.e., an entire track
    of the Megatron 747) the transfer time is
    0.25328 milliseconds.
  • At that point, the average block access time
    would be 20 milliseconds, but we would need only
    12,500 block accesses, for a speedup in sorting
    by a factor of 14.

12
Reasons to limit the block size
  • First, we cannot use blocks that cover several
    tracks effectively.
  • Second, small relations would occupy only a
    fraction of a block, so large blocks would waste
    space on the disk.
  • There are also certain data structures for
    secondary storage organization that prefer to
    divide data among many blocks and thereforework
    less well when the block size is too large.
  • In fact, the larger the blocks are, the fewer
    records we can sort by 2PMMS.
  • Nevertheless, as machines get faster and disks
    more capacious, there is a tendency for block
    sizes to grow.

13
How many records can we sort?
  • The block size is B bytes.
  • The main memory available for buffering blocks is
    M bytes.
  • Records take R bytes.
  • Number of main memory buffers M/B blocks
  • We need one output buffer, so we can actually use
    (M/B)-1 input buffers.
  • How many sorted sublists that makes sense to
    produce?
  • (M/B)-1.
  • Whats the total number of records we can sort?
  • Each time we fill in the memory we sort M/R
    records.
  • Hence, we are able to sort (M/R)(M/B)-1 or
    approximately M2/RB.
  • If we use the parameters in the example about
    TPMMS we have
  • M100MB 100,000,000 Bytes 108 Bytes
  • B 16,384 Bytes
  • R 160 Bytes
  • So, M2/RB (108)2 / (160 16,384) 4.2 billion
    records, or 2/3 of a TeraByte.

14
Sorting larger relations
  • If our relation is bigger, then, we can use 2PMMS
    to create M2/RB sorted sublists.
  • Then, in a third pass we can merge (M/B)-1 of
    these sorted sublists.
  • The third phase lets us sort
  • (M/B)-1M2/RB ? M3/RB2 records
  • For our example, the third phase lets us sort 75
    trillion records occupying 7500 Petabytes!!

15
Improving the Running Time of 2PMMS
  • Here are some techniques that sometimes make
    secondarymemory algorithms more efficient
  • Group blocks by cylinder.
  • One big disk ? several smaller disks.
  • Mirror disks multiple copies of the same data.
  • Prefetching'' or double buffering.
  • Disk scheduling the elevator algorithm.

16
Cylindrification
  • If we are going to read or write blocks in a
    known order, place them by cylinder, so once we
    reach that cylinder we can read block after
    block, with no seek time or rotational latency.
  • Application to Phase 1 of 2PMMS
  • 1. Initially, records on 196 consecutive
    cylinders.
  • 2. Load main memory from 13 consecutive
    cylinders.
  • Order of blocks read is unimportant, so only time
    besides transfer time is one random seek and 12
    1cylinder seeks (neglect).
  • Time to transfer 6,400 blocks at 0.25 ms/block
    1.60 sec.
  • 3. Write each sorted list onto 13 consecutive
    cylinders, so write time is also about 1.60 sec.
  • Total for Phase1 about (1.60)?2?1652 sec
  • But in Phase 2 ?

17
Cylindrification Phase 2
  • Storage by cylinders does not help in the second
    phase.
  • Blocks are read from the fronts of the sorted
    lists in an order that is determined by which
    list next exhausts its current block.
  • Output blocks are written one at a time,
    interspersed with block reads
  • Thus, the second phase will still take 37 min.
  • We have cut the sorting time almost half, but
    cannot do better by cylindrification alone.

18
Multiple Disks
  • Use several disks with independent heads
  • Example Instead of a large disk of 8GB (Megatron
    747), lets use 4 smaller disks of 2GB each
    (Megatron 737)
  • We divide the given records among the four disks
    the data will occupy 196 adjacent cylinders on
    each disk.
  • Phase 1 of 2PMMS Load in main memory from all 4
    disks in parallel.
  • Time for the fill of ¼ of the memory 1/4 1.60
    Sec 0.4 Sec
  • Since, we read in parallel during that time we
    fill from the other disks, the other ¾ of the
    memory.
  • Hence, in total, one full fill of the memory
    takes 0.4 Sec.
  • We do this 16 times, which means 0.4 16
    6.4sec
  • Plus 6.4sec for writing for a total of 12.8sec in
    phase 1.

19
Multiple Disks Phase 2
  • Once a fill is sorted in main memory write the
    blocks onto the 4 disks
  • Phase 1 16 (0.4sec 0.4sec) 12.8sec
  • It was 52sec when having 1 diskcylindrification.
  • Phase 2 use 4 output buffers one per disk and
    and cut writing time in about 1/4.
  • However, what about the reading part?
  • If we are careful about timing we could manage to
    read from four different sorted lists whose
    previous blocks were exhausted.
  • This cuts the reading time in about 1/2 to 1/3.
  • Total time for phase 2 is about 37/218min.
  • Total time for both phases ? 18min

20
Mirror Disks
  • Mirror disk identical copy of disk
  • Improves reliability (when one crashes) at the
    expense of disk copies.
  • With n copies we can handle n reads in time equal
    to 1 read
  • A read can be done from the disk with the
    shortest seek time (i.e., closer head)
  • Writing no speedup, no slowdown compared to
    single disk (why?)

21
Mirroring disks (Contd)
  • Writing no speedup, no slowdown
  • That is because whenever we need to write a
    block, we write it on all disks that have a copy.
  • Since the writing can take place in parallel, the
    elapsed time is about the same as for writing to
    a single disk.
  • Obvious minus extra disk cost.
  • In the multiple disks example, if we were
    careful about timing we could manage to read from
    four different sorted lists whose previous blocks
    were exhausted. With mirroring disks way we are
    guaranteed for that.

22
Prefetching and large scale buffering
  • If we have extra space for mainmemory buffers,
    consider loading buffers in advance of need.
    Similarly, keep output blocks buffered in main
    memory until it is convenient to write them to
    disk.
  • Example Phase 2 of 2PMMS
  • With 128Mb of main memory, we can afford to
    buffer 2 cylinders for each of 16 sublists and
    for the output (one cylinder 4Mb).
  • Consume one cylinder for each sublist while the
    other is being loaded from main memory.
  • Similarly, write one output cylinder while the
    other is being constructed.
  • Thus, seek and rotational latency are almost
    eliminated, lowering total read and write times
    to about 27sec each.
Write a Comment
User Comments (0)
About PowerShow.com