Parity Declustering for Continuous Operation in Redundant Disk Arrays

1
Parity Declustering for Continuous Operation in
Redundant Disk Arrays
  • Mark Holland, Garth A. Gibson

2
Purpose of Parity Declustering
  • Parity declustering is designed to balance
    cost against data reliability and performance
    during failure recovery.
  • It improves on standard parity organizations
    by reducing the additional load on surviving
    disks during the reconstruction of a failed
    disk's contents, yielding higher user
    throughput during recovery and/or shorter
    recovery time.

3
Declustered Parity Layout
  • RAID 5 is a special case of a declustered
    parity layout, namely the case G = C.

4
Definition of some terms
  • Data unit is the minimum amount of contiguous
    user data allocated to one disk before any data
    is allocated to any other disk.
  • Parity unit is a block of parity information that
    is the size of a data stripe unit.
  • Parity stripe is the set of data units over
    which a parity unit is computed, plus the parity
    unit itself.
  • e.g. In the layout figure, each S is either a
    data unit or a parity unit.
  • Four such stripe units together form one
    parity stripe.

5
Example declustered layout
  • Di.j represents one of the four data units in
    parity stripe number i, and Pi represents the
    parity unit for parity stripe i.
  • The declustering ratio is defined as
    α = (G-1)/(C-1). It indicates the fraction of
    each surviving disk that must be read during
    the reconstruction of a failed disk.
  • For example, D1.0, D1.1, D1.2, and P1 together
    form a parity stripe.
  • So G = 4, C = 5, and α = 75% (see the sketch
    after this list).
  • In RAID 5, α = 100%, since G = C.
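  A minimal sketch of the ratio above, assuming nothing beyond the
  slide's parameters; the helper name declustering_ratio is
  illustrative, not from the paper.

      # Declustering ratio alpha = (G-1)/(C-1): the fraction of each
      # surviving disk that must be read to rebuild a failed disk.
      def declustering_ratio(G: int, C: int) -> float:
          return (G - 1) / (C - 1)

      print(declustering_ratio(4, 5))  # 0.75 -> the example's alpha = 75%
      print(declustering_ratio(5, 5))  # 1.0  -> RAID 5, where G = C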

6
Data layout strategy
  • How should data be laid out in a
    parity-declustered disk array?
  • Our goals:
  • 1. Single-failure correcting: no two stripe
    units in the same parity stripe may reside on the
    same physical disk.
  • 2. Distributed reconstruction: when any disk
    fails, its user workload should be evenly
    distributed across all other disks in the array.
  • 3. Distributed parity: parity information should
    be evenly distributed across the array.
  • 4. Efficient mapping: the function mapping a file
    system's logical block address to physical disk
    addresses must be efficiently computable.
  • 5. Large-write optimization: a write of a full
    parity stripe should not need the four accesses
    of a read-modify-write parity update.
  • 6. Maximal parallelism: a read of contiguous user
    data should achieve maximum parallelism.

7
Layout strategy
  • The distributed reconstruction criterion
    requires that the same number of units be read
    from each surviving disk during the
    reconstruction of a failed disk. This is
    achieved if the number of times that a pair of
    disks contains stripe units from the same parity
    stripe is constant across all pairs of disks.
    Such a layout can be implemented with a balanced
    incomplete block design.
  • A block design is an arrangement of v distinct
    objects into b tuples, each containing k
    elements, such that each object appears in
    exactly r tuples, and each pair of objects
    appears in exactly λ tuples. (A verification
    sketch follows this list.)
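  A small sketch, not from the paper, that checks the two properties
  just defined (constant r and constant λ) for a list of tuples:

      # Check that every object appears in the same number of tuples (r)
      # and every pair of objects appears in the same number of tuples
      # (lambda).
      from collections import Counter
      from itertools import combinations

      def check_design(tuples):
          objects = sorted({x for t in tuples for x in t})
          r = Counter(x for t in tuples for x in t)
          lam = Counter(p for t in tuples for p in combinations(sorted(t), 2))
          r_ok = len({r[o] for o in objects}) == 1
          lam_ok = len({lam[p] for p in combinations(objects, 2)}) == 1
          return r_ok and lam_ok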

8
Complete block design
  • It is simpler than an incomplete block design.
  • A block design is called complete if it includes
    all combinations of exactly k distinct elements
    selected from the set of v objects. The number of
    such combinations is C(v, k) = v! / (k! (v-k)!)
    (see the sketch below).
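  A short sketch of building a complete block design by enumerating all
  k-element subsets; math.comb gives the count quoted above.

      import math
      from itertools import combinations

      def complete_design(v: int, k: int):
          # Every k-element subset of v objects; there are C(v, k) of them.
          return list(combinations(range(v), k))

      print(math.comb(5, 4))        # 5 tuples, matching the next slide's example
      print(complete_design(5, 4))  # [(0,1,2,3), (0,1,2,4), (0,1,3,4), ...]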

9
Example complete block design
  • In this example, we arrange 5 distinct
    objects (numbers) into 5 tuples, such that each
    object appears in exactly 4 tuples, and each pair
    of objects appears in exactly 3 tuples.
  • e.g. the number 0 appears in 4 tuples,
  • the pair (0,1) appears in tuples 0, 1, and 2,
  • and the pair (1,4) appears in tuples 1, 2, and 4.
  • It is complete because it includes all
    combinations of exactly 4 distinct elements
    selected from the set of 5 objects. (The sketch
    below verifies these counts.)
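  Using the illustrative helpers sketched after the previous two slides,
  the stated counts can be confirmed:

      tuples = complete_design(5, 4)   # the five tuples listed on the next slide
      print(tuples)
      print(check_design(tuples))      # True: r = 4 and lambda = 3, as claimed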

10
Layout with complete block design
  • Tuple 0: 0,1,2,3
  • Tuple 1: 0,1,2,4
  • Tuple 2: 0,1,3,4
  • Tuple 3: 0,2,3,4
  • Tuple 4: 1,2,3,4
  • If we associate disks with objects (numbers) and
    parity stripes with tuples, we get the layout
    shown.
  • Although it is complete, it violates design
    goal 3: parity is not distributed evenly, and the
    parity on disk 4 becomes the bottleneck for write
    operations (see the sketch after this list).
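  An illustrative count of the imbalance, assuming the parity unit of
  each stripe sits on the tuple's last disk, as in the layout shown:

      # If the last disk of every tuple holds the parity unit, disk 4
      # ends up with most of the parity in the array.
      from collections import Counter

      tuples = [(0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 3, 4),
                (0, 2, 3, 4), (1, 2, 3, 4)]
      print(Counter(t[-1] for t in tuples))
      # Counter({4: 4, 3: 1}) -> disk 4 holds 4 of the 5 parity units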

11
  • We duplicate the previous layout G times,
    assigning parity to a different element of each
    tuple in each duplication; the result is the full
    block design table shown above (a sketch of the
    rotation follows).
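  A sketch of that duplication step, reusing the five tuples from the
  previous slide; rotating the parity position balances the parity load.

      # Repeat the design G times; in copy p, the object in position p of
      # each tuple holds the parity unit.
      from collections import Counter

      G = 4
      tuples = [(0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 3, 4),
                (0, 2, 3, 4), (1, 2, 3, 4)]
      full_design = [(t, t[pos]) for pos in range(G) for t in tuples]
      print(Counter(parity for _, parity in full_design))
      # Counter({0: 4, 1: 4, 2: 4, 3: 4, 4: 4}) -> parity spread evenly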

12
Problem with full block design
  • The size of the block design table may be very
    large, so it is not guaranteed that the layout
    will have an efficient mapping, which is required
    by our fourth criterion.
  • Our fifth and sixth criteria depend on the data
    mapping function used by higher levels of
    software.
  • The large-write optimization is guaranteed.
  • But parallel reads cannot achieve maximal
    parallelism.
  • That is, not all sets of five adjacent data
    units in the mapping (D0.0, D0.1, D0.2, D1.0,
    D1.1, D1.2, D2.0, etc.) are allocated on five
    different disks. Reading five adjacent data units
    starting at data unit 0 uses disks 0 and 1 twice,
    and disks 3 and 4 not at all.

13
Problem with full block design
  • In addition, when the number of disks in an
    array (C) is large relative to the number of
    stripe units in a parity stripe (G), the full
    block design cannot be implemented.
  • e.g. a 41-disk array with 20% parity overhead
    (G = 5) allocated by a complete block design will
    have about 3,750,000 tuples. This cannot be
    implemented, because even large disks rarely have
    more than a few million sectors. (A quick check
    of this figure follows.)
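  A back-of-the-envelope check of that figure, assuming it counts the
  complete design duplicated G times as on slide 11:

      import math

      C, G = 41, 5
      print(math.comb(C, G))      # 749,398 tuples in the complete design
      print(math.comb(C, G) * G)  # 3,746,990 -> "about 3,750,000"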

14
Balanced Incomplete block design
  • Our goal is to find a small block design on C
    objects with a tuple size of G. Hall presents a
    list containing a large number of known block
    designs, and states that, within the bounds of
    this list, a solution is given in every case
    where one is known to exist.
  • Sometimes a balanced incomplete block design with
    the required parameters may not be known; we then
    resort to choosing the closest feasible design
    point, that is, the point which yields a value of
    α closest to what is desired (see the sketch
    after this list).
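  A hedged sketch of the "closest feasible design point" rule; the
  candidate list below is purely illustrative, not Hall's actual table.

      # Pick the known (C, G) design whose ratio alpha = (G-1)/(C-1) is
      # closest to the desired value.
      def closest_design(candidates, alpha_target):
          return min(candidates,
                     key=lambda cg: abs((cg[1] - 1) / (cg[0] - 1) - alpha_target))

      candidates = [(21, 5), (25, 5), (41, 5)]  # hypothetical known designs
      print(closest_design(candidates, 0.2))    # (21, 5): alpha = 4/20 = 0.20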

15
Balanced Incomplete block design
  • We can choose the closest feasible design point
    from this subset of Hall's list of designs.

16
Average response time
  • These two figures show that, except for writes
    with α = 0.1, fault-free performance is
    essentially independent of parity declustering.
  • Declustering may lead to slightly better average
    response time in the degraded mode than in the
    fault-free mode (a user write may induce only one
    write access).

17
Reconstruction Performance
  • Higher user performance during recovery compared
    to RAID 5.
  • The simplest reconstruction involves a single
    sweep through the contents of the failed disk.
    For each stripe unit on the replacement disk, the
    reconstruction process reads all other stripe
    units in the corresponding parity stripe and
    computes an exclusive-or over these units. The
    resulting unit is then written to the replacement
    disk (a sketch follows this list).
  • The time needed to entirely repair a failed disk
    is equal to the time needed to replace it in the
    array plus the time needed to reconstruct its
    entire contents and store them on the
    replacement.
  • Continuous-operation systems require data
    availability during reconstruction.
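  A minimal sketch of that single-sweep reconstruction; read_unit,
  write_unit, and the stripe records are assumed helpers, not from the
  paper.

      # For each stripe unit lost with the failed disk, XOR the surviving
      # units of its parity stripe and write the result to the replacement.
      from functools import reduce

      def xor_units(units):
          return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), units)

      def rebuild_disk(parity_stripes, read_unit, write_unit):
          for stripe in parity_stripes:  # one sweep over the failed disk
              survivors = [read_unit(u) for u in stripe["surviving_units"]]
              write_unit(stripe["lost_unit"], xor_units(survivors))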

18
Four reconstruction algorithms
  • Minimal-update algorithm: no extra work is done;
    whenever possible, user writes are folded into
    the parity unit, and neither reconstruction
    optimization is enabled.
  • User-writes algorithm: all user writes explicitly
    targeted at the replacement disk are sent
    directly to the replacement.
  • Redirection of reads: user accesses to data that
    has already been reconstructed are serviced by
    (redirected to) the replacement disk, rather
    than invoking on-the-fly reconstruction as they
    would if the data were not yet available.
  • Piggybacking of writes: user reads that cause
    on-the-fly reconstruction also cause the
    reconstructed data to be written to the
    replacement disk. This is targeted at speeding
    reconstruction.
  • (Redirection of reads and piggybacking of
    writes were proposed by Muntz and Lui. A sketch
    of both follows this list.)
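  A hedged sketch, not the paper's code, of how a user read might be
  handled when the last two optimizations are enabled; all helper names
  are illustrative.

      def handle_user_read(unit, already_reconstructed, read_replacement,
                           reconstruct_on_the_fly, write_replacement,
                           piggyback=True):
          if already_reconstructed(unit):
              # Redirection of reads: serve from the replacement disk.
              return read_replacement(unit)
          data = reconstruct_on_the_fly(unit)   # rebuild from surviving disks
          if piggyback:
              # Piggybacking of writes: save the rebuilt unit as a side effect.
              write_replacement(unit, data)
          return data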

19
Comparison of the four algorithms
  • The test results showed that Muntz and Lui's
    redirection of reads and redirect-plus-piggyback
    do not consistently decrease reconstruction time
    relative to the simpler algorithms.
  • The reason is that loading the replacement disk
    with random work penalizes the reconstruction
    writes to this disk more than off-loading
    benefits the surviving disks, unless the
    surviving disks are highly utilized.
  • Even a small amount of random load imposed on the
    replacement disk may greatly increase its average
    access times, because reconstruction writes are
    sequential and do not require long seeks.

20
Conclusion
  • We demonstrated that:
  • Parity declustering, a strategy for allocating
    parity in a single-failure-correcting redundant
    disk array that trades increased parity overhead
    for reduced user-performance degradation during
    on-line failure recovery, can be effectively
    implemented in array-controlling software.
  • Using a block design to map parity stripes onto a
    disk array ensures that both the parity-update
    load and the on-line reconstruction load are
    balanced over all disks in the array.

21
Questions
  • 1. What is parity declustering?
  • 2. What are the data layout goals?
  • 3. What is the disadvantage of a complete block
    design?