Title: Parity Declustering for Continuous Operation in Redundant Disk Arrays
1. Parity Declustering for Continuous Operation in Redundant Disk Arrays
- Mark Holland, Garth A. Gibson
2. Purpose of Parity Declustering
- Parity declustering is designed to balance cost against data reliability and performance during failure recovery.
- It improves on standard parity organizations by reducing the additional load on surviving disks during the reconstruction of a failed disk's contents, yielding higher user throughput during recovery and/or shorter recovery time.
3. Declustered Parity Layout
- RAID 5 is a special case of the declustered parity layout in which G = C (each parity stripe spans every disk in the array).
4. Definition of some terms
- Data unit: the minimum amount of contiguous user data allocated to one disk before any data is allocated to any other disk.
- Parity unit: a block of parity information that is the size of a data stripe unit.
- Parity stripe: the set of data units over which a parity unit is computed, plus the parity unit itself.
- e.g., each S in the layout figure is either a data unit or a parity unit; four Ss together form one parity stripe.
5. Example declustered layout
- Di,j represents one of the four data units in parity stripe number i, and Pi represents the parity unit for parity stripe i.
- The declustering ratio is defined as α = (G-1)/(C-1). It indicates the fraction of each surviving disk that must be read during the reconstruction of a failed disk.
- For example, D1.0, D1.1, D1.2, and P1 together form a parity stripe.
- So here G = 4, C = 5, α = 75%.
- In RAID 5, α = 100%.
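To make the ratio concrete, here is a minimal Python sketch (mine, not the paper's) computing α for the example layout and for RAID 5:

```python
# Declustering ratio: the fraction of each surviving disk that must be
# read to reconstruct a failed disk.
def declustering_ratio(G: int, C: int) -> float:
    """G = stripe units per parity stripe, C = disks in the array."""
    return (G - 1) / (C - 1)

print(declustering_ratio(4, 5))  # example layout: G = 4, C = 5 -> 0.75
print(declustering_ratio(5, 5))  # RAID 5 (G = C)             -> 1.0
```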
6. Data layout strategy
- How should data be laid out in a parity-declustered disk array? Our goals:
- 1. Single failure correcting: No two stripe units in the same parity stripe may reside on the same physical disk.
- 2. Distributed reconstruction: When any disk fails, its user workload should be evenly distributed across all other disks in the array.
- 3. Distributed parity: Parity information should be evenly distributed across the array.
- 4. Efficient mapping: The function mapping a file system's logical block address to physical disk addresses must be efficiently computable.
- 5. Large write optimization: A write of a full parity stripe should not need the four accesses (read old data, read old parity, write new data, write new parity) of a small write.
- 6. Maximal parallelism: A read of contiguous user data should achieve maximal parallelism across the disks.
7. Layout strategy
- The distributed reconstruction criterion requires that the same number of units be read from each surviving disk during the reconstruction of a failed disk. This is achieved if the number of times that a pair of disks contains stripe units from the same parity stripe is constant across all pairs of disks. Such a layout can be implemented with a balanced incomplete block design.
- A block design is an arrangement of v distinct objects into b tuples, each containing k elements, such that each object appears in exactly r tuples, and each pair of objects appears in exactly λ tuples.
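As a quick illustration, the following sketch (a hypothetical helper, not from the paper) verifies the r and λ properties of a candidate design, using the complete design listed on slide 10:

```python
from collections import Counter
from itertools import combinations

def design_parameters(tuples, v):
    """Return the sets of observed r (per-object) and lambda (per-pair) counts.
    The design is balanced iff each returned set has exactly one element."""
    object_count = Counter()  # appearances of each object (should all equal r)
    pair_count = Counter()    # appearances of each pair (should all equal lambda)
    for t in tuples:
        object_count.update(t)
        pair_count.update(combinations(sorted(t), 2))
    r_set = {object_count[o] for o in range(v)}
    lam_set = {pair_count[p] for p in combinations(range(v), 2)}
    return r_set, lam_set

tuples = [(0,1,2,3), (0,1,2,4), (0,1,3,4), (0,2,3,4), (1,2,3,4)]
print(design_parameters(tuples, 5))  # ({4}, {3}) -> r = 4, lambda = 3
```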
8. Complete block design
- It's simpler than an incomplete block design.
- A block design is called a complete block design when it includes all combinations of exactly k distinct elements selected from the set of v objects. The number of these combinations is C(v, k) = v! / (k!(v-k)!).
9. Example complete block design
- In this example, we arrange 5 distinct objects (numbers) into 5 tuples, such that each object appears in exactly 4 tuples, and each pair of objects appears in exactly 3 tuples.
- e.g., the number 0 appears in 4 tuples.
- The pair (0,1) appears in tuples 0, 1, and 2.
- The pair (1,4) appears in tuples 1, 2, and 4.
- It's complete because it includes all C(5, 4) = 5 combinations of exactly 4 distinct elements selected from the set of 5 objects.
10. Layout with complete block design
- Tuple 0: 0, 1, 2, 3
- Tuple 1: 0, 1, 2, 4
- Tuple 2: 0, 1, 3, 4
- Tuple 3: 0, 2, 3, 4
- Tuple 4: 1, 2, 3, 4
- If we associate disks with objects (numbers) and parity stripes with tuples, we get the layout shown.
- Although it's complete, it violates design goal 3: parity is not distributed evenly. With parity placed on the last element of each tuple, disk 4 becomes the bottleneck for write operations, as the count below shows.
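A two-line count (my sketch, assuming parity sits on each tuple's last element as in the figure) makes the imbalance explicit:

```python
from collections import Counter

tuples = [(0,1,2,3), (0,1,2,4), (0,1,3,4), (0,2,3,4), (1,2,3,4)]
parity_per_disk = Counter(t[-1] for t in tuples)  # parity on each tuple's last element
print(parity_per_disk)  # Counter({4: 4, 3: 1}): disk 4 holds parity for 4 of 5 stripes
```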
11.
- We duplicate the previous layout G times, assigning parity to a different element of each tuple in each duplication; this yields the full block design table above. A sketch of this step follows.
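A minimal sketch of the duplication step (assumed representation, matching the construction just described): rotate the parity position through each of the G slots and confirm that parity is now balanced.

```python
from collections import Counter

tuples = [(0,1,2,3), (0,1,2,4), (0,1,3,4), (0,2,3,4), (1,2,3,4)]
G = 4  # stripe width (tuple size)

# Full block design: G copies of the design; copy p puts parity at position p.
full_design = [(t, p) for p in range(G) for t in tuples]
parity_per_disk = Counter(t[p] for t, p in full_design)
print(parity_per_disk)  # every disk holds parity for exactly 4 of the 20 stripes
```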
12. Problems with the full block design
- The size of the block design table may be very large, so the layout is not guaranteed to have an efficient mapping, which our fourth criterion requires.
- Our fifth and sixth criteria depend on the data mapping function used by higher levels of software.
- Large-write optimization is guaranteed.
- But parallel reads cannot achieve maximal parallelism.
- That is, not all sets of five adjacent data units from the mapping (D0.0, D0.1, D0.2, D1.0, D1.1, D1.2, D2.0, etc.) are allocated on five different disks. Reading five adjacent data units starting at data unit 0 uses disks 0 and 1 twice, and disks 3 and 4 not at all. A check for this criterion is sketched below.
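A hypothetical check for the maximal-parallelism criterion (the `unit_to_disk` sequence below is illustrative; the real mapping comes from the layout figure):

```python
def max_parallelism_ok(unit_to_disk, C):
    """True iff every window of C consecutive data units touches C distinct disks."""
    return all(len(set(unit_to_disk[i:i + C])) == C
               for i in range(len(unit_to_disk) - C + 1))

# Illustrative mapping in which units 0-4 land on disks 0 and 1 twice and
# never on disks 3 and 4, so the criterion fails:
print(max_parallelism_ok([0, 1, 2, 0, 1], C=5))  # False
```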
13. Problems with the full block design
- In addition, when the number of disks in the array (C) is large relative to the number of stripe units in a parity stripe (G), the full block design cannot be implemented.
- e.g., a 41-disk array with 20% parity overhead (G = 5) allocated by a complete block design will have about 3,750,000 stripe units (C(41, 5) = 749,398 tuples of G = 5 units each). It cannot be implemented, because even large disks rarely have more than a few million sectors.
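The arithmetic behind that figure, checked with Python's math.comb:

```python
import math

tuples = math.comb(41, 5)  # 749,398 tuples in the complete design
stripe_units = tuples * 5  # 3,746,990 stripe units (~3.75 million)
print(tuples, stripe_units)
```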
14. Balanced incomplete block design
- Our goal is to find a small block design on C objects with a tuple size of G. Hall presents a list containing a large number of known block designs, and states that, within the bounds of this list, a solution is given in every case where one is known to exist.
- Sometimes a balanced incomplete block design with the required parameters may not be known; in that case we resort to choosing the closest feasible design point, that is, the design that yields a value of α closest to what is desired.
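A sketch of that fallback (the candidate list here is illustrative, not Hall's actual table): among known designs on v = C objects, pick the tuple size whose ratio is nearest the desired α.

```python
def closest_design(candidate_ks, C, alpha_target):
    """Pick the tuple size k whose declustering ratio (k-1)/(C-1) is closest
    to the desired alpha. candidate_ks stands in for Hall's list of designs."""
    return min(candidate_ks, key=lambda k: abs((k - 1) / (C - 1) - alpha_target))

print(closest_design([3, 4, 6, 9], C=21, alpha_target=0.25))  # -> 6 (alpha exactly 0.25)
```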
15. Balanced incomplete block design
- We can choose the closest feasible design point from the subset of Hall's list of designs.
16. Average response time
- These two figures show that, except for writes with α = 0.1, fault-free performance is essentially independent of parity declustering.
- It may lead to slightly better average response time in the degraded mode than in the fault-free mode (a user write may induce only one write access).
17. Reconstruction Performance
- Parity declustering gives higher user performance during recovery compared to RAID 5.
- The simplest reconstruction algorithm involves a single sweep through the contents of the failed disk. For each stripe unit on the replacement disk, the reconstruction process reads all other stripe units in the corresponding parity stripe and computes an exclusive-or over them. The resulting unit is then written to the replacement disk. A sketch of this sweep appears below.
- The time needed to entirely repair a failed disk is equal to the time needed to replace it in the array plus the time needed to reconstruct its entire contents and store them on the replacement.
- Continuous-operation systems require data availability during reconstruction.
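A minimal sketch of that sweep, assuming hypothetical read_unit/write_unit helpers and a table mapping each lost unit to the addresses of its surviving stripe units:

```python
def xor_units(units):
    """Exclusive-or a list of equal-length bytes objects."""
    out = bytearray(len(units[0]))
    for unit in units:
        for i, byte in enumerate(unit):
            out[i] ^= byte
    return bytes(out)

def reconstruct(stripe_map, read_unit, write_unit, replacement):
    """stripe_map: list of (lost_offset, [(disk, offset), ...]) entries, one per
    stripe unit of the failed disk (hypothetical representation)."""
    for lost_offset, survivors in stripe_map:
        units = [read_unit(disk, offset) for disk, offset in survivors]
        write_unit(replacement, lost_offset, xor_units(units))  # rebuilt unit
```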
18. Four reconstruction algorithms
- Minimal-update algorithm: No extra work is done; whenever possible, user writes are folded into the parity unit, and neither reconstruction optimization is enabled.
- User-writes algorithm: All user writes explicitly targeted at the replacement disk are sent directly to the replacement.
- Redirection of reads: User accesses to data that has already been reconstructed are serviced by (redirected to) the replacement disk, rather than invoking on-the-fly reconstruction as they would if the data were not yet available.
- Piggybacking of writes: User reads that cause on-the-fly reconstruction also cause the reconstructed data to be written to the replacement disk. This is targeted at speeding reconstruction.
- (Redirection of reads and piggybacking of writes were proposed by Muntz and Lui; a sketch of both options follows.)
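A sketch (assumed interfaces, not the paper's code) of how these two options change the servicing of a user read that targets the failed disk during reconstruction:

```python
def service_read(addr, reconstructed, redirect_reads, piggyback_writes,
                 read_replacement, rebuild_on_the_fly, write_replacement):
    """addr targets the failed disk; `reconstructed` is the set of already
    rebuilt addresses. All callables are hypothetical array-controller hooks."""
    if redirect_reads and addr in reconstructed:
        return read_replacement(addr)      # redirection of reads
    data = rebuild_on_the_fly(addr)        # XOR of the surviving stripe units
    if piggyback_writes and addr not in reconstructed:
        write_replacement(addr, data)      # piggybacking of writes
        reconstructed.add(addr)
    return data
```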
19. Comparison of the four algorithms
- The test results showed that Muntz and Lui's redirection of reads and redirect+piggyback do not consistently decrease reconstruction time relative to the simpler algorithms.
- The reason is that loading the replacement disk with random work penalizes the reconstruction writes to this disk more than off-loading benefits the surviving disks, unless the surviving disks are highly utilized.
- Even a small amount of random load imposed on the replacement disk may greatly increase its average access times, because reconstruction writes are otherwise sequential and do not require long seeks.
20. Conclusion
- We demonstrated:
- Parity declustering, a strategy for allocating parity in a single-failure-correcting redundant disk array that trades increased parity overhead for reduced user-performance degradation during on-line failure recovery, can be effectively implemented in array-controlling software.
- Using a block design to map parity stripes onto a disk array ensures that both the parity update load and the on-line reconstruction load are balanced over all disks in the array.
21. Questions
- 1. What is parity declustering?
- 2. What are the data layout goals?
- 3. What is the disadvantage of a complete block design?