Title: Lecture 4: A Case for RAID (Part 2)
Slide 1: Lecture 4, A Case for RAID (Part 2)
- Prof. Shahram Ghandeharizadeh
- Computer Science Department
- University of Southern California
Slide 2: Smaller, Inexpensive Disks
- 25% annual reduction in size, 40% annual drop in price.
- 1 GB in 2008: IBM Microdrive @ $125; 1 inch in height, weighs 1 ounce (16 grams).
- 1 GB in 1980: IBM 3380 @ $40,000; the size of a refrigerator, 550 pounds (250 kg).
Slide 3: Inexpensive Disks
- Less than 9 cents per gigabyte of storage.
Slide 4: Challenge: Managing Data is Expensive
- The cost of managing data is $100K/TB/year.
- High availability: down time is estimated at thousands of dollars per minute.
- Data loss results in lost productivity:
  - 20 megabytes of accounting data requires 21 days and costs $19K to reproduce.
  - 50% of companies that lose their data due to a disaster never re-open; 90% go out of business within 2 years!
Slide 5: Challenge: Managing Data is Expensive
- The proposed solution: RAID.
Slide 6: MTTF, MTBF, MTTR, AFR
- MTBF (Mean Time Between Failures)
  - Designed for repairable devices.
  - The number of hours from when the system was started until its failure.
- MTTF (Mean Time To Failure)
  - Designed for non-repairable devices such as magnetic disk drives.
  - Disks of 2008 are more than 40 times more reliable than disks of 1988.
- MTTR (Mean Time To Repair)
  - The number of hours required to replace a failed disk drive, AND
  - reconstruct the data stored on the failed disk drive.
- AFR (Annualized Failure Rate)
  - Computed by assuming a case temperature (40 degrees centigrade), power-on hours per year (say 8,760, i.e., 24x7), and an average of 250 motor start/stop cycles per year.
Slide 7: Focus on MTTF and MTTR
- MTTF (Mean Time To Failure): for non-repairable devices such as magnetic disk drives; disks of 2008 are more than 40 times more reliable than disks of 1988.
- MTTR (Mean Time To Repair): the hours required to replace a failed disk drive AND reconstruct the data stored on it.
Slide 8: Assumptions
- The MTTF of a disk is independent of the other disks in a RAID.
- Assume:
  - the MTTF of a disk is 100 years, and
  - an array of 1,000 such disks.
- Then some disk in the array fails, on average, once every 100 years / 1,000 ≈ 37 days.
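A minimal sketch of this computation, assuming independent, exponentially distributed disk lifetimes (the function and variable names are ours, for illustration):

```python
# Expected time until the first disk failure in an array:
# the per-disk MTTF divided by the number of disks.
HOURS_PER_YEAR = 8760

def array_mttf_hours(disk_mttf_years: float, num_disks: int) -> float:
    """MTTF of the whole array under independent exponential failures."""
    return disk_mttf_years * HOURS_PER_YEAR / num_disks

mttf = array_mttf_hours(disk_mttf_years=100, num_disks=1000)
print(f"Array MTTF: {mttf:.0f} hours (~{mttf / 24:.1f} days)")  # ~876 hours, ~36.5 days
```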
Slide 9: RAID
- RAID organizes D disks into nG groups, where each group consists of G data disks and C parity disks.
- Example:
  - D = 8
  - G = 4
  - C = 1
  - nG = 8/4 = 2
[Figure: Parity Group 1 = Disks 1-4 + Parity 1; Parity Group 2 = Disks 5-8 + Parity 2]
Slide 11: RAID With 1 Group
- With G data disks in a group and C check disks, a failure is encountered when:
  - a disk in the group fails, AND
  - a second disk fails before the failed disk of step 1 is repaired.
- The MTTF of a group of disks with RAID is:
  $$\mathit{MTTF}_{group} = \frac{\mathit{MTTF}_{disk}}{G+C} \cdot \frac{1}{P(\text{another failure during repair})}$$
Slide 12: RAID With 1 Group (Cont.)
- Probability of another failure during repair:
  $$P(\text{another failure}) = \frac{\mathit{MTTR}}{\mathit{MTTF}_{disk}/(G+C-1)} = \frac{(G+C-1)\,\mathit{MTTR}}{\mathit{MTTF}_{disk}}$$
- MTTR includes the time required to:
  - replace the failed disk drive, and
  - reconstruct the content of the failed disk.
- Performing step 2 in a lazy manner increases the duration of MTTR, and hence the probability of another failure.
- What happens if we increase the number of data disks in a group?
Slide 13: RAID with nG Groups
- With nG groups, the Mean Time To Failure of the RAID is computed in a similar manner:
  $$\mathit{MTTF}_{RAID} = \frac{\mathit{MTTF}_{group}}{n_G} = \frac{\mathit{MTTF}_{disk}^2}{n_G\,(G+C)(G+C-1)\,\mathit{MTTR}}$$
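A small sketch putting the two formulas together; the disk MTTF, MTTR, and geometry values below are illustrative assumptions, not figures from the slides:

```python
# Group and RAID MTTF from the formulas above (all times in hours).
def group_mttf(disk_mttf: float, g: int, c: int, mttr: float) -> float:
    """MTTF of one parity group: a disk fails, then a second disk in the
    same group fails before the first is repaired."""
    return disk_mttf**2 / ((g + c) * (g + c - 1) * mttr)

def raid_mttf(disk_mttf: float, d: int, g: int, c: int, mttr: float) -> float:
    """MTTF of the whole RAID: nG = D/G independent groups."""
    ng = d // g
    return group_mttf(disk_mttf, g, c, mttr) / ng

# Example: 100-year disk MTTF, D=8, G=4, C=1, 24-hour MTTR (assumed values).
hours = 100 * 8760
print(f"Group MTTF: {group_mttf(hours, 4, 1, 24):,.0f} hours")
print(f"RAID  MTTF: {raid_mttf(hours, 8, 4, 1, 24):,.0f} hours")
```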
Slide 14: Review
- RAID 1 and RAID 3 were presented in the previous lecture.
- Here is a quick review.
Slide 15: RAID 1: Disk Mirroring
- Contents of disks 1 and 2 are identical.
- Redundant paths keep data available in the presence of either a controller or a disk failure.
- A write operation by a CPU is directed to both disks.
- A read operation is directed to one of the disks.
  - Each disk might be reading different sectors simultaneously.
[Figure: CPU 1 connected through Controller 1 and Controller 2 to Disk 1 and Disk 2]
Slide 16: RAID 3: Small Block Reads
- Bit-interleaved.
- Bad news: a small read of less than the group size requires reading the whole group.
  - E.g., a read of one sector requires reading 4 sectors.
- One parity group has a read rate identical to that of one disk.
[Figure: a bit stream interleaved across Disks 1-4, with the parity of each 4-bit slice on the Parity disk]
Slide 17: RAID 3: Small Block Reads
- Given a large number of disks, say D = 12, enhance performance by constructing several parity groups, say 3.
- With G = 4 disks per group and D = 8 (say), the number of read requests supported by RAID 3, compared with one disk, is the number of groups (2). The number of groups is D/G.
[Figure: Parity Group 1 = Disks 1-4 + Parity 1; Parity Group 2 = Disks 5-8 + Parity 2]
Slide 18: Any Questions?
Slide 19: A Few Questions
- Assume one instance of the RAID-1 organization. What are the values for:
  - D?
  - G?
  - C?
  - nG?
Slide 20: A Few Questions
- Assume one instance of the RAID-1 organization. What are the values for:
  - D = 1
  - G = 1
  - C = 1
  - nG = 1
Slide 21: A Few Questions
- (For RAID-1: D = 1, G = 1, C = 1, nG = 1, as above.)
- Are the availability characteristics of the following Level 3 RAID better than RAID 1?
[Figure: one parity group consisting of Disks 1-4 and Parity 1]
Slide 22: RAID 4
- Enhances the performance of small reads/writes/read-modify-writes. How?
- Interleave data across disks at the granularity of a transfer unit. The minimum size is a sector.
- Parity block ECC 1 is an exclusive-or of the bits in blocks a, b, c, and d (see the sketch after this slide).
[Figure: Blocks a, b, c, d on Disks 1-4; ECC 1 on the Parity disk]
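A minimal sketch of this parity construction, treating each block as a byte string; the helper name and block contents are made up for illustration:

```python
# Parity block as the bitwise XOR of the data blocks in a group.
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR same-sized blocks byte by byte; the result is the ECC/parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

# Four data blocks (contents are arbitrary example values).
a, b, c, d = b"\x0f\xf0", b"\x33\x33", b"\x55\xaa", b"\x00\xff"
ecc1 = xor_blocks([a, b, c, d])
print(ecc1.hex())  # parity of blocks a, b, c, d
```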
Slide 23: RAID 4
- A small read retrieves its block from one disk.
- Now, 4 requests referencing blocks on different data disks may proceed in parallel.
- When compared with 1 disk, the throughput of a D-disk system is D times higher.
[Figure: Blocks a, b, c, d on Disks 1-4; ECC 1 on the Parity disk]
Slide 24: RAID 4: Failures
- If Disk 2 fails, a small read for Block b retrieves blocks a, c, d, and ECC 1 from Disks 1, 3, 4, and the Parity disk to compute the missing block (see the sketch after this slide). What is the throughput relative to one disk now?
- Once Disk 2 is replaced with a new one, its content is reconstructed either eagerly or in a lazy manner. The system cannot be too lazy, because we want to minimize MTTR.
[Figure: Blocks a, b, c, d on Disks 1-4; ECC 1 on the Parity disk]
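Recovery follows from the XOR identity: the missing block is the XOR of the surviving blocks and the parity. Continuing the hypothetical xor_blocks sketch from Slide 22:

```python
# Reconstruct the block on a failed disk from the survivors plus parity.
# Reuses xor_blocks(), a, c, d, and ecc1 from the earlier sketch.
missing_b = xor_blocks([a, c, d, ecc1])
assert missing_b == b  # XOR of survivors and parity yields the lost block
```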
Slide 25: RAID 4: Failures (Cont.)
- If the Parity disk fails, reads of data blocks may proceed as in the normal mode of operation.
- Once the Parity disk is replaced, the content of the new Parity disk is constructed either eagerly or lazily.
[Figure: Blocks a, b, c, d on Disks 1-4; ECC 1 on the Parity disk]
Slide 26: RAID 4: Small Writes
- The performance of small writes is improved.
- To write Block b:
  - read the old Block b and the old parity block ECC 1,
  - compute the new parity using the old Block b, the new Block b, and the old parity:
    new parity = (old block XOR new block) XOR old parity ECC 1.
- A write requires 4 accesses: 2 reads and 2 writes (see the sketch after this slide).
[Figure: Blocks a, b, c, d on Disks 1-4; ECC 1 on the Parity disk]
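A sketch of this read-modify-write parity update, again continuing the hypothetical xor_blocks example (the new data value is arbitrary):

```python
# Small write: update Block b and ECC 1 with 2 reads and 2 writes.
new_b = b"\x12\x34"                            # the data being written (example value)
old_b, old_ecc = b, ecc1                       # read old block and old parity (2 reads)
new_ecc = xor_blocks([old_b, new_b, old_ecc])  # (old XOR new) XOR old parity
# ...then write new_b to Disk 2 and new_ecc to the Parity disk (2 writes).
assert new_ecc == xor_blocks([a, new_b, c, d])  # same as recomputing from scratch
```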
Slide 27: RAID 4: Bottlenecks
- For writes, the parity disk is a bottleneck.
- Two different writes, to Block b and Block g, must read ECC 1 and ECC 2 respectively, both from the single Parity disk. A queue forms on the Parity disk.
- The performance of small writes is the same as RAID 3: D/2G.
[Figure: two stripes, Blocks a-d with ECC 1 and Blocks e-h with ECC 2; both ECC blocks reside on the single Parity disk]
Slide 28: RAID 4: Summary
Slide 29: RAID 5: Resolve the Bottleneck
- Distribute data and check blocks across all disks.
[Figure: Blocks a-t and parity blocks ECC 1-5 distributed across Disks 1-5, with each stripe's parity block on a different disk]
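A sketch of one possible rotated-parity placement; this left-symmetric rotation is a common convention and an assumption here, as the exact rotation in the slide's figure may differ:

```python
# One common rotated-parity layout: for stripe s over n disks,
# place the parity block on disk (n - 1 - s) % n.
def parity_disk(stripe: int, num_disks: int = 5) -> int:
    return (num_disks - 1 - stripe) % num_disks

for s in range(5):
    print(f"Stripe {s + 1}: parity on Disk {parity_disk(s) + 1}")
# Parity rotates across all 5 disks, so no single disk is a write bottleneck.
```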
Slide 30: RAID 5: Resolve the Bottleneck
- Writes of Blocks a and j may proceed in parallel now, because their parity blocks (ECC 1 and ECC 3) reside on different disks.
[Figure: the same rotated-parity layout as the previous slide]
Slide 31: RAID 5: Read Performance
- Check disks also service read requests.
- With D data disks broken into nG groups, the number of parity disks is nG × C, where nG = D/G.
- When compared with one disk, the read throughput of the system is D + C×D/G times higher.
[Figure: rotated-parity layout across Disks 1-5]
Slide 32: RAID 5: Write Performance
- For writes: read the referenced block and its parity block, compute the new parity block, then write the new data block and its parity block.
- Continue to use the parity blocks.
- With D data disks broken into nG groups, the number of parity disks is nG × C, where nG = D/G.
- When compared with one disk, the write throughput of the system is D/4 + (C×D/G)/4 times higher (each small write costs 4 disk accesses).
[Figure: rotated-parity layout across Disks 1-5]
Slide 33: RAID 5: R-M-W Performance
- For a read-modify-write, the read and write of the data block come for free:
  - the referenced block is already retrieved, so we must perform only one extra disk I/O to read the parity block, compute the new parity block, and write the new data block and its parity block.
- Continue to use the parity blocks.
- With D data disks broken into nG groups, the number of parity disks is nG × C, where nG = D/G.
- When compared with one disk, the R-M-W throughput of the system is D/2 + (C×D/G)/2 times higher.
[Figure: rotated-parity layout across Disks 1-5]
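A small sketch tabulating these relative throughputs for the running example (D = 8, G = 4, C = 1); the formulas are from the slides, the tabulation is ours:

```python
# Relative throughput of a RAID 5 array versus a single disk.
def relative_throughput(d: int, g: int, c: int) -> dict[str, float]:
    total = d + c * d / g          # all disks, data and parity, serve requests
    return {
        "read":  total,            # 1 I/O per request
        "write": total / 4,        # 2 reads + 2 writes per request
        "r-m-w": total / 2,        # data read/write is free; 2 parity I/Os
    }

for op, x in relative_throughput(8, 4, 1).items():
    print(f"{op:>5}: {x:.1f}x one disk")  # read 10.0x, write 2.5x, r-m-w 5.0x
```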
Slide 34: RAID 5: Summary

Slide 35: RAID 5: Summary
- A significant improvement in the performance of small writes and R-M-W operations.
Slide 36: RAID Summary
- If your workload consists of small R-M-W operations, which RAID would you choose?