CS152 - PowerPoint PPT Presentation

1 / 37

About This Presentation

Title:

CS152

Description:

CS152 Computer Architecture and Engineering Lecture 18 ECC, RAID, Bandwidth vs. Latency 2004-10-28 John Lazzaro (www.cs.berkeley.edu/~lazzaro) – PowerPoint PPT presentation

Number of Views:91

Avg rating:3.0/5.0

Slides: 38

Provided by: instEecsB

Category:

more less

Transcript and Presenter's Notes

Title: CS152

1
CS152 Computer Architecture andEngineeringLect
ure 18 ECC, RAID, Bandwidth vs. Latency
2004-10-28 John Lazzaro(www.cs.berkeley.edu/lazz
aro) Dave Patterson (www.cs.berkeley.edu/patters
on) www-inst.eecs.berkeley.edu/cs152/
2
Review

Buses are an important technique for building
large-scale systems
Their speed is critically dependent on factors
such as length, number of devices, etc.
Critically limited by capacitance
Direct Memory Access (dma) allows fast, burst
transfer into processors memory
Processors memory acts like a slave
Probably requires some form of cache-coherence so
that DMAed memory can be invalidated from cache.
Networks and switches popular for LAN, WAN
Networks and switches starting to replace buses
on desktop, even inside chips

3
Review ATA cables

Serial ATA, Rounded parallel ATA,Ribbon parallel
ATA cables
40 inches max vs. 18 inch

Serial ATA cables are thin

4
Outline

ECC
RAID Old School Update
Latency vs. Bandwidth (if time permits)

5
Error-Detecting Codes

Computer memories can make errors occasionally
To guard against errors, some memories use
error-detecting codes or error-correcting codes
(ECC)
gt extra bits are added to each memory word
When a word is read out of memory, the extra
bits are checked to see if an error has occurred
and, if using ECC, correct them
Data extra bits called code words

6
Error-Detecting Codes

Given 2 code words, can determine how many
corresponding bits differ.
To determine how many bits differ, just compute
the bitwise Boolean EXCLUSIVE OR of the two
codewords, and count the number of 1 bits in the
result
The number of bit positions in which two
codewords differ is called the Hamming distance
if two code words are a Hamming distance d apart,
it will require d single-bit errors to convert
one into the other

7
Error-Detecting Codes
For example, the code words 11110001 and 00110000
are a Hamming distance 3 apart because it takes 3
single-bit errors to convert one into the other.
11110001 Xor 00110000 --------
11000001 3 1s Hamming distance 3
8
Error-Detecting Codes

As a simple example of an error-detecting code,
consider a code in which a single parity bit is
appended to the data.
The parity bit is chosen so that the number of 1
bits in the codeword is even (or odd).
E.g., if even parity, parity bit for 11110001 is
1.
Such a parity code has Hamming distance 2, since
any single-bit error produces a codeword with the
wrong parity
It takes 2 single-bit errors to go from a valid
codeword to another valid codeword gt detect
single bit errors.
Whenever a word containing the wrong parity is
read from memory, an error condition is signaled.
The program cannot continue, but at least no
incorrect results are computed.

9
Error-Correcting Codes

a Hamming distance of 2k 1 is required to be
able to correct k errors in any data word
As a simple example of an error-correcting code,
consider a code with only four valid code words
0000000000, 0000011111, 1111100000, and
1111111111
This code has a distance 5, which means that it
can correct double errors.
If the codeword 0000000111 arrives, the receiver
knows that the original must have been 0000011111
(if there was no more than a double error). If,
however, a triple error changes 0000000000 into
0000000111, the error cannot be corrected.

10
Hamming Codes

How many parity-bits are needed?
m parity-bits can code 2m-1-m info-bits

Info-bits Parity-bits
lt5 3
lt12 4
lt27 5
lt58 6
lt121 7

How correct single error (SEC) and detect 2
errors (DED)?
How many SEC/DED bits for 64 bits data?

11
Administrivia - HW 3, Lab 4
12
ECC Hamming Code.

Hamming Coding is a coding method for detecting
and correcting errors.
ECC Hamming distance between 2 coded words must
be 3
Number bits from right, starting with 1
All bits whose bit number is a power of 2 are
parity bits
We use EVEN PARITY in this example
This example shows a 4 data bits

Bit 1 will check (parity) in all the bit
positions that use a 1 in their number
Bit 2 will check all the bit positions that use a
2 in their number
Bit 4 will check all the bit positions that use a
4 in their number
Etc.

7 6 5 4 3 2 1
D D D P D P P 7-BIT CODEWORD
D - D - D - P (EVEN PARITY)
D D - - D P - (EVEN PARITY)
D D D P - - - (EVEN PARITY)
13
Example Hamming Code.

Example The message 1101 would be sent as
1100110, since

7 6 5 4 3 2 1
1 1 0 0 1 1 0 7-BIT CODEWORD
1 - 0 - 1 - 0 (EVEN PARITY)
1 1 - - 1 1 - (EVEN PARITY)
1 1 0 0 - - - (EVEN PARITY)
EVEN PARITY If number of 1s is even then Parity
0 Else Parity 1
Let us consider the case where an error caused by
the channel transmitted message received
message 1 1 0 0 1 1 0 ------------gt 1 1 1 0 1
1 0 BIT 7 6 5 4 3 2 1 BIT 7 6 5 4 3 2
1
14
Example Hamming Code.
transmitted message received message 1 1 0 0
1 1 0 ------------gt 1 1 1 0 1 1 0 BIT 7 6 5
4 3 2 1 BIT 7 6 5 4 3 2 1 The above
error (in bit 5) can be corrected by examining
which of the three parity bits was affected by
the bad bit

7 6 5 4 3 2 1
1 1 1 0 1 1 0 7-BIT CODEWORD
1 - 1 - 1 - 0 (EVEN PARITY) NOT! 1
1 1 - - 1 1 - (EVEN PARITY) OK! 0
1 1 1 0 - - - (EVEN PARITY) NOT! 1
bad parity bits labeled 101 point directly to the
bad bit since 101 binary equals 5
15
Will Hamming Code detect and correct errors on
parity bits? Yes!
transmitted message received message 1 1 0 0
1 1 0 ------------gt 1 1 0 0 1 1 1 BIT 7 6 5
4 3 2 1 BIT 7 6 5 4 3 2 1 The above
error in parity bit (bit 1) can be corrected by
examining as below

7 6 5 4 3 2 1
1 1 0 0 1 1 1 7-BIT CODEWORD
1 - 0 - 1 - 0 (EVEN PARITY) NOT! 1
1 1 - - 1 1 - (EVEN PARITY) OK! 0
1 1 0 0 - - - (EVEN PARITY) OK! 0
the bad parity bits labeled 001 point directly to
the bad bit since 001 binary equals 1. In this
example error in parity bit 1 is detected and can
be corrected by flipping it to a 0
16
RAID Beginnings

We had worked on 3 generations of Reduced
Instruction Set Computer (RISC) processors 1980
1987
Our expectation I/O will become a performance
bottleneck if doesnt get faster
Randy Katz gets Macintosh with disk along side
Use PC disks to build fast I/O to keep pace with
RISC?

17
Redundant Array of Inexpensive Disks (1987-93)

Hard to explain ideas, given past disk array
efforts
Paper to educate, differentiate?
RAID paper spread like virus
Products from Compaq, EMC, IBM,
RAID I
Sun 4/280, 128 MB of DRAM,
4 dual-string SCSI controllers,
28 5.25 340 MB disks SW
RAID II
Gbit/s net 144 3.5 320 MB disks
1st Network Attached Storage
Ousterhout Log Structured File Sys. widely used
(NetAp)
Today RAID 25B industry 80 of server disks
in RAID
1998 IEEE Storage Award

Students Peter Chen, Ann Chevernak, Garth
Gibson, Ed Lee, Ethan Miller, Mary Baker, John
Hartman, Kim Keeton, Mendel Rosenblum, Ken
Sherriff,
18
Latency Lags Bandwidth

Over last 20 to 25 years, for network disk,
DRAM, MPU, Latency Lags Bandwidth
Bandwidth Improved 120X to 2200X
But Latency Improved only 4X to 20X
Look at examples, reasons for it

19
Disks Archaic(Nostalgic) v. Modern(Newfangled)

Seagate 373453, 2003
15000 RPM (4X)
73.4 GBytes (2500X)
Tracks/Inch 64000 (80X)
Bits/Inch 533,000 (60X)
Four 2.5 platters (in 3.5 form factor)
Bandwidth 86 MBytes/sec (140X)
Latency 5.7 ms (8X)
Cache 8 MBytes

CDC Wren I, 1983
3600 RPM
0.03 GBytes capacity
Tracks/Inch 800
Bits/Inch 9550
Three 5.25 platters
Bandwidth 0.6 MBytes/sec
Latency 48.3 ms
Cache none

20
Latency Lags Bandwidth (for last 20 years)

Performance Milestones
Disk 3600, 5400, 7200, 10000, 15000 RPM (8x,
143x)

(latency simple operation w/o contention BW
best-case)
21
MemoryArchaic(Nostalgic)v. Modern(Newfangled)

1980 DRAM (asynchronous)
0.06 Mbits/chip
64,000 xtors, 35 mm2
16-bit data bus per module, 16 pins/chip
13 Mbytes/sec
Latency 225 ns
(no block transfer)

2000 Double Data Rate Synchr. (clocked) DRAM
256.00 Mbits/chip (4000X)
256,000,000 xtors, 204 mm2
64-bit data bus per DIMM, 66 pins/chip (4X)
1600 Mbytes/sec (120X)
Latency 52 ns (4X)
Block transfers (page mode)

22
Latency Lags Bandwidth (last 20 years)

Performance Milestones
Memory Module 16bit plain DRAM, Page Mode DRAM,
32b, 64b, SDRAM, DDR SDRAM (4x,120x)
Disk 3600, 5400, 7200, 10000, 15000 RPM (8x,
143x)

(latency simple operation w/o contention BW
best-case)
23
LANs Archaic(Nostalgic)v. Modern(Newfangled)

Ethernet 802.3
Year of Standard 1978
10 Mbits/s link speed
Latency 3000 msec
Shared media
Coaxial cable

Ethernet 802.3ae
Year of Standard 2003
10,000 Mbits/s (1000X)link speed
Latency 190 msec (15X)
Switched media
Category 5 copper wire

Coaxial Cable
Plastic Covering
Braided outer conductor
Insulator
Copper core
24
Latency Lags Bandwidth (last 20 years)

Performance Milestones
Ethernet 10Mb, 100Mb, 1000Mb, 10000 Mb/s
(16x,1000x)
Memory Module 16bit plain DRAM, Page Mode DRAM,
32b, 64b, SDRAM, DDR SDRAM (4x,120x)
Disk 3600, 5400, 7200, 10000, 15000 RPM (8x,
143x)

(latency simple operation w/o contention BW
best-case)
25
CPUs Archaic(Nostalgic) v. Modern(Newfangled)

1982 Intel 80286
12.5 MHz
2 MIPS (peak)
Latency 320 ns
134,000 xtors, 47 mm2
16-bit data bus, 68 pins
Microcode interpreter, separate FPU chip
(no caches)

2001 Intel Pentium 4
1500 MHz (120X)
4500 MIPS (peak) (2250X)
Latency 15 ns (20X)
42,000,000 xtors, 217 mm2
64-bit data bus, 423 pins
3-way superscalar,Dynamic translate to RISC,
Superpipelined (22 stage),Out-of-Order execution
On-chip 8KB Data caches, 96KB Instr. Trace
cache, 256KB L2 cache

26
Latency Lags Bandwidth (last 20 years)

Performance Milestones
Processor 286, 386, 486, Pentium, Pentium
Pro, Pentium 4 (21x,2250x)
Ethernet 10Mb, 100Mb, 1000Mb, 10000 Mb/s
(16x,1000x)
Memory Module 16bit plain DRAM, Page Mode DRAM,
32b, 64b, SDRAM, DDR SDRAM (4x,120x)
Disk 3600, 5400, 7200, 10000, 15000 RPM (8x,
143x)

Note Processor Biggest, Memory Smallest
(latency simple operation w/o contention BW
best-case)
27
Annual Improvement per Technology
CPU DRAM LAN Disk
Annual Bandwidth Improvement (all milestones) 1.50 1.27 1.39 1.28
Annual Latency Improvement (all milestones) 1.17 1.07 1.12 1.11

But what about recent BW, Latency change?

Annual Bandwidth Improvement (last 3 milestones) 1.55 1.30 1.78 1.29
Annual Latency Improvement (last 3 milestones) 1.22 1.06 1.13 1.09

How summarize BW vs. Latency change?

28
Towards a Rule of Thumb

How long for Bandwidth to Double?

Time for Bandwidth to Double (Years, all milestones) 1.7 2.9 2.1 2.8

How much does Latency Improve in that time?

Latency Improvement in Time for Bandwidth to Double (all milestones) 1.3 1.2 1.3 1.3

But what about recently?

Time for Bandwidth to Double (Years, last 3 milestones) 1.6 2.7 1.2 2.7
Latency Improvement in Time for Bandwidth to Double (last 3 milestones) 1.4 1.2 1.2 1.3

Despite faster LAN, all 1.2X to 1.4X

29
Rule of Thumb for Latency Lagging BW

In the time that bandwidth doubles, latency
improves by no more than a factor of 1.2 to 1.4
Stated alternatively Bandwidth improves by more
than the square of the improvement in Latency
(and capacity improves faster than bandwidth)

30
6 Reasons Latency Lags Bandwidth

1. Moores Law helps BW more than latency
Faster transistors, more transistors, more pins
help Bandwidth
MPU Transistors 0.130 vs. 42 M xtors (300X)
DRAM Transistors 0.064 vs. 256 M xtors (4000X)
MPU Pins 68 vs. 423 pins (6X)
DRAM Pins 16 vs. 66 pins (4X)
Smaller, faster transistors but communicate over
(relatively) longer lines limits latency
Feature size 1.5 to 3 vs. 0.18 micron (8X,17X)
MPU Die Size 35 vs. 204 mm2 (ratio sqrt ? 2X)
DRAM Die Size 47 vs. 217 mm2 (ratio sqrt ?
2X)

31
6 Reasons Latency Lags Bandwidth (contd)

2. Distance limits latency
Size of DRAM block ? long bit and word lines ?
most of DRAM access time
Speed of light and computers on network
1. 2. explains linear latency vs. square BW?
3. Bandwidth easier to sell (biggerbetter)
E.g., 10 Gbits/s Ethernet (10 Gig) vs. 10
msec latency Ethernet
4400 MB/s DIMM (PC4400) vs. 50 ns latency
Even if just marketing, customers now trained
Since bandwidth sells, more resources thrown at
bandwidth, which further tips the balance

32
6 Reasons Latency Lags Bandwidth (contd)

4. Latency helps BW, but not vice versa
Spinning disk faster improves both bandwidth and
rotational latency
3600 RPM ? 15000 RPM 4.2X
Average rotational latency 8.3 ms ? 2.0 ms
Things being equal, also helps BW by 4.2X
Lower DRAM latency ? More access/second (higher
bandwidth)
Higher linear density helps disk BW (and
capacity), but not disk Latency
9,550 BPI ? 533,000 BPI ? 60X in BW

33
6 Reasons Latency Lags Bandwidth (contd)

5. Bandwidth hurts latency
Queues help Bandwidth, hurt Latency (Queuing
Theory)
Adding chips to widen a memory module increases
Bandwidth but higher fan-out on address lines may
increase Latency
6. Operating System overhead hurts Latency more
than Bandwidth
Long messages amortize overhead overhead bigger
part of short messages

34
3 Ways to Cope with Latency Lags Bandwidth
If a problem has no solution, it may not be a
problem, but a fact--not to be solved, but to be
coped with over time Shimon Peres (Peress
Law)

Caching (Leveraging Capacity)
Processor caches, file cache, disk cache
Replication (Leveraging Capacity)
Read from nearest head in RAID, from nearest
site in content distribution
Prediction (Leveraging Bandwidth)
Branches Prefetching disk, caches

35
HW BW Example Micro Massively Parallel
Processor (mMMP)

Intel 4004 (1971) 4-bit processor,2312
transistors, 0.4 MHz, 10 micron PMOS, 11 mm2
chip

RISC II (1983) 32-bit, 5 stage pipeline, 40,760
transistors, 3 MHz, 3 micron NMOS, 60 mm2 chip
4004 shrinks to 1 mm2 at 3 micron

250 mm2 chip, 0.090 micron CMOS 2312 RISC IIs
Icache Dcache
RISC II shrinks to 0.05 mm2 at 0.09 mi.
Caches via DRAM or 1 transistor SRAM
(www.t-ram.com)
Proximity Communication via capacitive coupling
at gt 1 TB/s (Ivan Sutherland_at_Sun)

Processor new transistor?
Cost of Ownership, Dependability, Security v.
Cost/Perf. gt mMPP

36
Too Optimistic so Far (its even worse)?

Optimistic Cache, Replication, Prefetch get more
popular to cope with imbalance
Pessimistic These 3 already fully deployed, so
must find next set of tricks to cope hard!
Its even worse bandwidth gains multiplied by
replicated components ? parallelism
simultaneous communication in switched LAN
multiple disks in a disk array
multiple memory modules in a large memory
multiple processors in a cluster or SMP

37
Conclusion Latency Lags Bandwidth

For disk, LAN, memory, and MPU, in the time that
bandwidth doubles, latency improves by no more
than 1.2X to 1.4X
BW improves by square of latency improvement
Innovations may yield one-time latency reduction,
but unrelenting BW improvement
If everything improves at the same rate, then
nothing really changes
When rates vary, require real innovation
HW and SW developers should innovate assuming
Latency Lags Bandwidth