Title: Review last class
1Review last class
- Fundamental concepts in Fault Tolerant Computing
- FTC Reliability and availability
- Fault, Error, Failure, Dependability
- Reliability Models (structural, markovian)
- Redundancy as key mechanism in FTC
- HW
- SW
- Information
- Time redundancy
We'll concentrate on these topics, with some case
studies in HW/SW
2Today's Topics
- Information Redundancy
- Error Detecting and Correcting Codes (ECC)
- ECC is a huge topic; the techniques require knowledge of abstract algebra
- Fields, rings, groups, vector spaces, etc.
- We will introduce the basic techniques in ECC
3Information Redundancy
- Key idea: add redundant information to data to allow
- Fault detection
- Fault masking
- Fault tolerance
- Mechanisms
- Error detecting codes and error correcting codes
(ECC)
4Information Redundancy
- Important to distinguish
- Data words: the actual information contents
- Code words: the transmitted information (redundant)
- A dataword with d bits is encoded into a codeword with c bits, where $c > d$
- Not all $2^c$ combinations are valid codewords
- If the c bits are not a valid codeword, an error is detected
- Extra bits may be used to correct errors
- Overhead: time to encode and decode
5Information Redundancy
Trade-off: more code bits give more error tolerance, but leave less bandwidth available for real information
6Error Detection/Correction
- We use information redundancy in
- Error Detection (EDC)
- Parity bits
- Checksums
- Hamming codes
- Error Detection/Correction (ECC)
- Hamming codes
- Cyclic codes
- Reed-Solomon
- Turbo Codes
7ECC vs. EDC
- Highly reliable channels (e.g. fiber optics)
- It is cheaper to use an error detecting code and just retransmit the occasional block found to be faulty.
- Probability that any given bit is in error (bit error rate, BER) in fiber optics: $10^{-12}$
- Unreliable channels (e.g. wireless links)
- It is cheaper to use an error correcting code and reconstruct the original message
- Many retransmissions would otherwise be needed
- A retransmission can itself be in error
- Bit error rate of electrical channels: $10^{-9}$
8Data Communication/Storage
- Error correcting codes provide reliable digital
data transmission when the communication medium
used has an unacceptable bit error rate (BER) and
a low signal-to-noise ratio (SNR)
Figure: an ECC encoder/decoder sits on each side of a noisy transmission channel (communication) or of a data medium accessed by write and read (storage), where errors are introduced
9Shannon's Theorem
- Shannon's theorem [1] states the maximum amount of error-free data (i.e., information) that can be transmitted over a communication link with a specific bandwidth in the presence of noise:
$C = W \log_2(1 + S/N)$
- C is the channel capacity in bits per second (including bits for error correction)
- W is the bandwidth of the channel (in hertz)
- S/N is the signal-to-noise ratio of the channel
[1] C. E. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, Volume 27, pp. 379-423 and pp. 623-656, 1948.
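As a rough illustration of the capacity limit, the sketch below evaluates the Shannon-Hartley formula C = W log2(1 + S/N). The function names and the 3 kHz / 30 dB channel are assumptions chosen for the example, not values from the slides.

```python
import math

def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon capacity C = W * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

def db_to_linear(snr_db):
    """Convert a signal-to-noise ratio from decibels to a plain ratio."""
    return 10 ** (snr_db / 10)

# Hypothetical channel: 3 kHz of bandwidth at 30 dB SNR (S/N = 1000).
capacity = channel_capacity(3000, db_to_linear(30))
print(f"{capacity:.0f} bit/s")
```

Note that S/N enters the formula as a plain power ratio, so a value quoted in decibels has to be converted first.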
10Coding and Redundancy
- If we transmit at a data rate that is less than the channel capacity, there exists an error control code that provides arbitrarily low BER.
- Shannon establishes a limit for error-free data, but doesn't say how we can reach it.
- Many redundancy techniques can be considered as coding schemes
- E.g. the code {000, 111} can be used to encode {0, 1}
- The best codes provide the most robustness (fewest errors) with the least additional overhead of bits.
- Simplest error detecting code
- Parity bit
- 2-dimensional parity bit
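The two parity schemes named above can be sketched as follows. This is a minimal illustration assuming even parity and data held as lists of 0/1; all function names are hypothetical.

```python
def parity_bit(bits):
    """Even-parity bit: makes the total number of 1s even."""
    return sum(bits) % 2

def add_parity(bits):
    """Append the parity bit to a list of data bits."""
    return list(bits) + [parity_bit(bits)]

def parity_ok(word):
    """A word passes the check when it holds an even number of 1s."""
    return sum(word) % 2 == 0

def encode_2d(rows):
    """Append a parity bit to every row, then a final row of column parities."""
    with_row_parity = [add_parity(r) for r in rows]
    column_parity = [parity_bit(col) for col in zip(*with_row_parity)]
    return with_row_parity + [column_parity]

def locate_single_error(block):
    """Return (row, col) of a single flipped bit, or None if every parity holds."""
    bad_rows = [i for i, r in enumerate(block) if not parity_ok(r)]
    bad_cols = [j for j, c in enumerate(zip(*block)) if not parity_ok(c)]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("more than one bit error: cannot locate")
    return bad_rows[0], bad_cols[0]
```

A single parity bit only detects an odd number of flipped bits; the 2-dimensional layout can also locate a single flipped bit, because exactly one row check and one column check fail.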
11CheckSum
- Check code used to detect errors in blocks of data transmitted on communication networks
- Also used in memory systems
- Basic idea: add up the block of data being transmitted and transmit this sum as well
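Figure: the transmitter calculates a checksum and sends the data block plus checksum over the noisy channel; the receiver calculates the checksum of the received data block and compares it with the received checksum; on an error, the block is retransmitted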
12Versions of Checksum
- Data words are d bits long
- Versions
- Single-precision: checksum is a modulo $2^d$ addition
- Double-precision: modulo $2^{2d}$ addition
- Double-precision catches more errors
- Residue checksum takes into account the carry out of the d-th bit as an end-around carry; somewhat more reliable
- The Honeywell checksum concatenates words into pairs for the checksum calculation (done modulo $2^{2d}$); guards against errors in the same position
- TCP uses a checksum field which is the 16-bit one's complement of the one's complement sum of all 16-bit words in the header and text.
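A minimal sketch of the single-precision, double-precision, residue and TCP-style checksums, assuming words are given as Python integers of at most d bits. The function names are hypothetical.

```python
def single_precision(words, d):
    """Sum of d-bit words modulo 2**d (carries out of bit d are dropped)."""
    return sum(words) % (1 << d)

def double_precision(words, d):
    """Sum of d-bit words kept in 2d bits, i.e. modulo 2**(2d)."""
    return sum(words) % (1 << (2 * d))

def residue(words, d):
    """d-bit sum where the carry out of bit d is added back in (end-around carry)."""
    mask = (1 << d) - 1
    total = 0
    for w in words:
        total += w
        total = (total & mask) + (total >> d)  # fold the carry back in
    return total

def internet_checksum(words16):
    """One's complement of the one's complement sum of 16-bit words."""
    return ~residue(words16, 16) & 0xFFFF

block = [0b0000, 0b0101, 0b1111, 0b0010]  # illustrative 4-bit words
print(single_precision(block, 4), double_precision(block, 4), residue(block, 4))
```

With the one's complement variant, the receiver can add all words including the checksum field: an undamaged block sums to all ones (0xFFFF).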
13Comparing Versions of Checksum
Checksum schemes allow error detection but not error location: the entire block of data must be retransmitted if an error is detected
Figure: example in which the single-precision checksum does not detect the error but the Honeywell method does (calculated checksums shown: 00000111 and 0111)
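The difference between the two methods can be reproduced with a small constructed case. The block and the fault below are assumptions made for illustration (they are not the slide's figure), and padding an odd-length block with a zero word is also an assumption.

```python
def single_precision(words, d):
    """Sum of d-bit words modulo 2**d."""
    return sum(words) % (1 << d)

def honeywell(words, d):
    """Concatenate consecutive d-bit words into 2d-bit pairs, then sum modulo 2**(2d)."""
    if len(words) % 2:
        words = words + [0]  # assumption: pad an odd-length block with a zero word
    pairs = [(words[i] << d) | words[i + 1] for i in range(0, len(words), 2)]
    return sum(pairs) % (1 << (2 * d))

d = 4
sent = [0b0000, 0b0101, 0b1111, 0b0010]
# Hypothetical fault: the most significant bit flips 0 -> 1 in the first two words.
received = [0b1000, 0b1101, 0b1111, 0b0010]

single_detects = single_precision(sent, d) != single_precision(received, d)
honeywell_detects = honeywell(sent, d) != honeywell(received, d)
print(single_detects, honeywell_detects)  # prints: False True
```

The two flips add 8 + 8 = 16 to the plain sum, which vanishes modulo 2^4. After pairing, the same flips land in different bit positions of the 8-bit word, so they no longer cancel.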
14Reliable Data Communication
- Single checksums provide error detection
- Data: 1 1 1 1
- Message: 1 1 1 1 0
- If error, retransmit (reduces channel capacity by a factor of 2)
- Repeating data in the same message
- Data: 1 1 1 1
- Message: 1 1 1 1  1 1 1 1  1 1 1 1
- Majority vote
- Reduces channel capacity by a factor of 3
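The repetition scheme with majority voting can be sketched as below, assuming the data block is repeated three times back to back as in the message shown above. Function names are hypothetical.

```python
def repeat_encode(data, copies=3):
    """Send the whole data block `copies` times in one message."""
    return list(data) * copies

def majority_decode(message, copies=3):
    """Recover each bit by majority vote across the repeated copies."""
    n = len(message) // copies
    decoded = []
    for i in range(n):
        votes = sum(message[i + c * n] for c in range(copies))
        decoded.append(1 if 2 * votes > copies else 0)
    return decoded

message = repeat_encode([1, 1, 1, 1])
message[5] = 0                      # one bit corrupted in the second copy
print(majority_decode(message))     # prints: [1, 1, 1, 1]
```

One corrupted copy of a bit is outvoted by the other two; two corrupted copies of the same bit position defeat the vote.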
15Hamming Distance
- Hamming distance for a pair of code words
- The number of bits that are different between the two code words: HD(v1, v2) = HW(v1 ⊕ v2), where HW is the Hamming weight (number of 1 bits)
- E.g. 0000, 0001: HD = 1
- E.g. 0100, 0011: HD = 3
- Minimum Hamming distance for a code
- MinHD(code) = min over all pairs x ≠ y of HD(x, y)
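A short sketch of both definitions, assuming code words are stored as integers. The function names are hypothetical.

```python
from itertools import combinations

def hamming_weight(x):
    """Number of 1 bits in x."""
    return bin(x).count("1")

def hamming_distance(v1, v2):
    """Number of bit positions in which v1 and v2 differ."""
    return hamming_weight(v1 ^ v2)

def min_hamming_distance(code):
    """Smallest distance over all pairs of distinct code words."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

print(hamming_distance(0b0100, 0b0011))                   # prints: 3
print(min_hamming_distance([0b001, 0b010, 0b100, 0b111])) # prints: 2
```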
16Hamming Distance
- Hamming distance of 2 means that a single bit error will not change one of the codewords into another
The code {001, 010, 100, 111} has distance 2. The code can detect a single bit error
17Hamming Distance
- Hamming distance of 3 means that a two-bit error will not change one of the codewords into another
The code {000, 111} has distance 3. The code can detect a single or double bit error
18Error Detection/Correction
- In general
- To detect up to D bit errors, the code distance should be at least D + 1
- To correct up to C bit errors, the code distance should be at least 2C + 1
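Figure: code words a and b separated by distance 2C + 1; a word within C errors of a is still at least C + 1 away from b. Panels: single bit error correction, C-bit error correction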
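The two bounds can be inverted to read off what a code with a given minimum distance can do. A minimal sketch with hypothetical function names:

```python
def detectable_errors(distance):
    """Largest D satisfying distance >= D + 1."""
    return distance - 1

def correctable_errors(distance):
    """Largest C satisfying distance >= 2C + 1."""
    return (distance - 1) // 2

for dist in (1, 2, 3, 4, 5):
    print(dist, detectable_errors(dist), correctable_errors(dist))
```

For example, distance 3 allows detecting up to 2 bit errors or correcting 1, which matches the {000, 111} code.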
19Break
20Hamming's Error Correction Solution
- Encoding
- Use multiple checksums (called r, s, t)
- Message = a b c d
- r = (a + b + d) mod 2
- s = (a + b + c) mod 2
- t = (b + c + d) mod 2
- Code = r s a t b c d
- Example: Message = 1 0 1 0
- r = (1 + 0 + 0) mod 2 = 1
- s = (1 + 0 + 1) mod 2 = 0
- t = (0 + 1 + 0) mod 2 = 1
- Code = 1 0 1 1 0 1 0
21Hamming Codes
- Examples
- r s a t b c d
- 1 0 1 0 1 0 1
- 0 0 1 0 0 1 1
- 0 0 0 1 1 1 1
- 1 2 3 4 5 6 7 (bit position)
- This encodes a 4-bit information word into a 7-bit codeword (called a (7,4) code)
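The encoding step can be sketched directly from the check equations r = (a + b + d) mod 2, s = (a + b + c) mod 2, t = (b + c + d) mod 2 and the bit order r s a t b c d. The function name is hypothetical.

```python
def hamming74_encode(a, b, c, d):
    """Return the 7-bit codeword [r, s, a, t, b, c, d] for data bits a, b, c, d."""
    r = (a + b + d) % 2
    s = (a + b + c) % 2
    t = (b + c + d) % 2
    return [r, s, a, t, b, c, d]

print(hamming74_encode(1, 1, 0, 1))  # prints: [1, 0, 1, 0, 1, 0, 1]
```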
22Hamming(7,4) Code
- The Hamming (7,4) code may be defined with the use of a Venn diagram.
- Place the four digits of the un-encoded binary word in the inner sections of the diagram.
- Choose digits r, s, and t so that the parity of each circle is even.
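Figure: Venn diagram of three circles, with data bits a, b, c, d in the inner sections and check bits r, s, t in the outer sections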
r,s,t bits are in charge of checking the bits
within their scope
23Hamming(7,4) Code
- Example
- Code: 1 1 0 1
- Codeword: 1010101
Figure: Venn diagram with a = 1, b = 1, c = 0, d = 1 and check bits r = 1, s = 0, t = 0
24Hamming Codes
- The previous method of construction can be generalized to construct an (n,k) Hamming code
- Simple bound
- k = number of information bits
- r = number of check bits
- n = k + r = total number of bits
- n + 1 = number of single errors or no error
- Each error (including no error) must have a distinct syndrome (which indicates its location)
- With r check bits, the maximum number of possible syndromes is $2^r$
- Hence $2^r \ge n + 1$
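The bound can be turned into a small search for the minimum number of check bits. This is a sketch with a hypothetical function name; it finds the smallest r with 2^r >= k + r + 1.

```python
def min_check_bits(k):
    """Smallest r such that 2**r >= k + r + 1 (one syndrome per error, plus no error)."""
    r = 1
    while (1 << r) < k + r + 1:
        r += 1
    return r

for k in (4, 11, 26):
    r = min_check_bits(k)
    print(f"k={k}: r={r}, n={k + r}")
```

For k = 4 this gives r = 3 and n = 7, the (7,4) code.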
25Hamming Codes Single Error Correcting (SEC)
- Properties of SEC
- If there is no error, all parity equations will be satisfied
- c1 = r ⊕ r', c2 = s ⊕ s', c4 = t ⊕ t' (r', s', t' = calculated check bits)
- If there is exactly one error, then c1, c2, c4 point to the location of the error
- The vector (c1, c2, c4) is called the syndrome
- The (7,4) Hamming code is a Single Error Correcting code
26Hamming Codes Single Error Correcting (SEC)
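- Example: error in code bit
- Code = 1101, rst = 100
- Codeword transmitted: 1 0 1 0 1 0 1
- Codeword received: 1 0 0 0 1 0 1 (code = 0101, rst = 100)
- Recalculating
- r's't' = 010
- c1 = r ⊕ r', c2 = s ⊕ s', c4 = t ⊕ t'
- c1 = 1, c2 = 1, c4 = 0
- position 3 has error
Figure: Venn diagram of the received word with a = 0, b = 1, c = 0, d = 1 and recalculated check bits r' = 0, s' = 1, t' = 0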
27Hamming Codes Single Error Correcting (SEC)
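- Example: error in check bit
- Code = 1101, rst = 100
- Codeword transmitted: 1 0 1 0 1 0 1
- Codeword received: 1 1 1 0 1 0 1 (code = 1101, rst = 110)
- Recalculating
- r's't' = 100
- c1 = 0, c2 = 1, c4 = 0
- position 2 has error
Figure: Venn diagram of the received word with a = 1, b = 1, c = 0, d = 1 and recalculated check bits r' = 1, s' = 0, t' = 0
SEC works for errors in both code bits and check bits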
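A sketch of single error correction for the two examples above, using the check equations as given in the encoding slide. One assumption to note: the syndrome is looked up in a table built from those equations rather than read directly as a binary position, because with the equations as written an error in b yields syndrome (1, 1, 1) and an error in d yields (1, 0, 1). All names are hypothetical.

```python
def check_bits(a, b, c, d):
    """Check bits (r, s, t) computed from the data bits."""
    return ((a + b + d) % 2, (a + b + c) % 2, (b + c + d) % 2)

def encode(a, b, c, d):
    r, s, t = check_bits(a, b, c, d)
    return [r, s, a, t, b, c, d]

def syndrome(word):
    """(c1, c2, c4): received check bits XOR recalculated check bits."""
    r, s, a, t, b, c, d = word
    r2, s2, t2 = check_bits(a, b, c, d)
    return (r ^ r2, s ^ s2, t ^ t2)

# Syndrome produced by each single-bit error, derived from the equations themselves.
SYNDROME_TO_INDEX = {}
for i in range(7):
    error = [0] * 7
    error[i] = 1
    SYNDROME_TO_INDEX[syndrome(error)] = i

def correct(word):
    """Return (corrected_word, position); position is 1-based, 0 means no error."""
    syn = syndrome(word)
    if syn == (0, 0, 0):
        return list(word), 0
    i = SYNDROME_TO_INDEX[syn]
    fixed = list(word)
    fixed[i] ^= 1
    return fixed, i + 1

print(correct([1, 0, 0, 0, 1, 0, 1]))  # error in code bit a
print(correct([1, 1, 1, 0, 1, 0, 1]))  # error in check bit s
```

Because all seven single-bit errors produce distinct non-zero syndromes, every single error in either a code bit or a check bit is corrected.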
28A Cube of Bits
Figure: cube whose vertices are the 3-bit words (e.g. 110, 011). Vertices are fixed at 1 unit, 2 units and 3 units away from the origin
29Cyclic Codes
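- A code C is cyclic if every cyclic shift of a codeword c in C also belongs to C. That is, if C is cyclic, then $(c_0, c_1, \ldots, c_{n-1}) \in C$ implies $(c_{n-1}, c_0, \ldots, c_{n-2}) \in C$
- Example: a 5-bit cyclic code
- Cyclic codes are easy to generate (with a shift register)
- Hamming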
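The defining property can be checked mechanically. The sketch below assumes codewords are bit strings; the 5-bit even-parity code used here is an illustrative choice, not necessarily the example from the slide.

```python
def cyclic_shift(word):
    """Rotate a codeword (a string of bits) one position to the right."""
    return word[-1] + word[:-1]

def is_cyclic(code):
    """True if the cyclic shift of every codeword is also in the code."""
    code = set(code)
    return all(cyclic_shift(w) in code for w in code)

# Illustrative 5-bit code (an assumption): all 5-bit words with an even number of 1s.
even_parity_5 = {format(x, "05b") for x in range(32) if bin(x).count("1") % 2 == 0}

print(is_cyclic(even_parity_5))        # prints: True
print(is_cyclic({"00000", "00011"}))   # prints: False
```

Checking a single shift per codeword is enough: if the code is closed under one shift, it is closed under any number of shifts.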
30Cyclic Codes
- Encoding
- Data word × constant = Code word, i.e. C(x) = D(x) · G(x)
- Decoding
- Code word / constant = Data word, i.e. D(x) = C(x) / G(x)
- If the remainder is non-zero, an error has occurred
D(x) (data word), C(x) (code word) and G(x) (constant) are polynomials; arithmetic is performed in GF(2).
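Encoding by multiplication and decoding by division can be sketched with polynomials stored as integers, where bit i is the coefficient of x^i. The constant G(x) = x^3 + x + 1 and the data word are assumptions picked for the example; function names are hypothetical.

```python
def gf2_mul(a, b):
    """Multiply two GF(2) polynomials stored as ints (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result

def gf2_divmod(dividend, divisor):
    """Polynomial long division over GF(2); returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = 0
    deg = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= deg:
        shift = dividend.bit_length() - 1 - deg
        quotient |= 1 << shift
        dividend ^= divisor << shift   # subtraction in GF(2) is also XOR
    return quotient, dividend

G = 0b1011             # hypothetical constant G(x) = x^3 + x + 1
D = 0b1010             # data word D(x) = x^3 + x
C = gf2_mul(D, G)      # code word C(x) = D(x) * G(x)

print(gf2_divmod(C, G))               # quotient is D, remainder is 0
print(gf2_divmod(C ^ 0b100, G)[1])    # one flipped bit leaves a non-zero remainder
```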
31GF(2)
- Calculations performed in Galois Field GF(2)
- multiplication modulo 2 = AND operation
- addition modulo 2 = XOR operation
- in GF(2), subtraction = addition
32Cyclic Code Theory
Figure: Code word (n bits) = Data word (k bits) × Constant (n-k+1 bits)
- (n,k) cyclic code: generator polynomial of degree n-k and total number of encoded bits n
- An (n,k) cyclic code can detect all single errors and all runs of adjacent bit errors shorter than n-k
- Useful in applications like wireless communication: channels are frequently noisy and have bursts of interference resulting in runs of adjacent bit errors
33Cyclic Redundancy Code (CRC)
- Basic idea
- The message is multiplied by x raised to the degree of a fixed generator polynomial, and the result is divided by the generator polynomial. The remainder of the division is appended to the message as the error checking information
- The receiver performs the same division and compares the calculated remainder with the transmitted remainder.
- CRC calculations are based on
- polynomial division
- arithmetic over GF(2)
- Ethernet uses CRC
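The CRC procedure described above can be sketched with integers as bit vectors. The 4-bit generator 1011 and the message are assumptions chosen for illustration; real protocols such as Ethernet use a much longer generator and additional conventions that are not modeled here.

```python
def gf2_mod(value, generator):
    """Remainder of polynomial division over GF(2) (ints used as bit vectors)."""
    deg = generator.bit_length() - 1
    while value.bit_length() - 1 >= deg:
        value ^= generator << (value.bit_length() - 1 - deg)
    return value

def crc_encode(message, generator):
    """Shift the message by deg(generator) bits and append the remainder."""
    deg = generator.bit_length() - 1
    shifted = message << deg                  # multiply by x^deg
    return shifted | gf2_mod(shifted, generator)

def crc_check(codeword, generator):
    """Repeat the division on the message part and compare the remainders."""
    deg = generator.bit_length() - 1
    message = codeword >> deg
    received_remainder = codeword & ((1 << deg) - 1)
    return gf2_mod(message << deg, generator) == received_remainder

GEN = 0b1011                                  # hypothetical generator x^3 + x + 1
codeword = crc_encode(0b11010011101100, GEN)
print(bin(codeword & 0b111))                  # prints: 0b100
print(crc_check(codeword, GEN), crc_check(codeword ^ (1 << 6), GEN))
```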
34Reed-Solomon (RS) Codes
- RS codes (1960) are block-based error correcting codes with a wide range of applications in digital communications and storage.
- Storage devices (tape, CD, DVD, barcodes, etc.)
- Wireless or mobile communications
- Satellite communications
- Digital television / DVB
- High-speed modems such as ADSL, xDSL, etc.
35Reed-Solomon (RS) Codes
Reed-Solomon codes are particularly good at dealing
with "bursts" of errors. Current implementations
of Reed-Solomon codes in CD technology are able
to cope with error bursts as long as 4000
consecutive bits (2.5 millimeters in a scratched
CD surface). Other codes are better for random
errors, e.g. Gallager codes, Turbo codes
36Reed-Solomon (RS) Codes
- Symbols = 8 bits, 25-bit burst noise
A whole symbol is replaced even if only a single
bit in it is incorrect
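To see why symbol-based codes suit burst errors, the sketch below counts how many 8-bit symbols a 25-bit burst touches, depending on where in a symbol the burst starts. The function name and the bit-offset convention are assumptions.

```python
def symbols_hit(burst_start_bit, burst_len_bits, symbol_bits=8):
    """Number of symbols that contain at least one bit of the burst."""
    first = burst_start_bit // symbol_bits
    last = (burst_start_bit + burst_len_bits - 1) // symbol_bits
    return last - first + 1

for offset in range(8):
    print(offset, symbols_hit(offset, 25))
```

Whatever the alignment, the 25 corrupted bits count as only 4 symbol errors for the decoder.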
37Reed-Solomon (RS) Codes
- Typical Reed-Solomon codeword: RS(n,k)
Example: A popular Reed-Solomon code is RS(255,223) with 8-bit symbols ($GF(2^8)$). Each codeword contains 255 codeword bytes, of which 223 bytes are data and 32 bytes are parity. For this code: n = 255, k = 223, s = 8, 2t = 32, t = 16. The decoder can correct any 16 symbol errors in the codeword, i.e. errors in up to 16 bytes anywhere in the codeword can be automatically corrected.
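The relationship between n, k and t in the example can be written out as a small helper. The function name is hypothetical, and the length check assumes the usual limit of n <= 2^s - 1 symbols for s-bit symbols.

```python
def rs_parameters(n, k, s):
    """Derived quantities for an RS(n, k) code with s-bit symbols."""
    if n > 2 ** s - 1:
        raise ValueError("codeword too long for this symbol size")
    parity = n - k                      # 2t parity symbols
    return {
        "parity_symbols": parity,
        "t": parity // 2,               # correctable symbol errors
        "code_rate": k / n,
    }

print(rs_parameters(255, 223, 8))
```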
38RS-Encoding
- $b_0, b_1, \ldots, b_{d-1}$ is the message. Since we use bytes for symbols, computations will occur in $GF(p)$ with $p = 2^r$.
- Our encoding will be a longer sequence of numbers $e_0, e_1, \ldots, e_{n-1}$, where we require that $p > n$. The e sequence is derived from the original b sequence by defining a polynomial P, which we evaluate at n points.
- Let $P(x) = c_0 + c_1 x + c_2 x^2 + \ldots + c_{d-1} x^{d-1}$, such that $P(0) = b_0, P(1) = b_1, \ldots, P(d-1) = b_{d-1}$. Our encoding would consist of the values $P(0), P(1), \ldots, P(n-1)$.
- The message is part of the encoding. P(x) can be found using Lagrange interpolation
- The polynomial of degree d-1 that passes through points $(a_0, b_0), \ldots, (a_{d-1}, b_{d-1})$ is $P(x) = \sum_{j=0}^{d-1} b_j L_j(x)$, with $L_j(x) = \prod_{k \ne j} \frac{x - a_k}{a_j - a_k}$
- $L_j(x) = 1$ when $x = a_j$, and $0$ when $x = a_k$ ($k \ne j$)
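The encoding by polynomial evaluation can be sketched as follows. To keep the arithmetic readable, this sketch works modulo a prime (257) instead of in GF(2^r); that substitution, the function names and the sample message are assumptions made for illustration.

```python
P = 257  # assumption: a prime modulus larger than n, standing in for GF(2^r)

def lagrange_eval(points, x, p=P):
    """Evaluate at x the polynomial of degree < len(points) through `points`, mod p."""
    total = 0
    for j, (aj, bj) in enumerate(points):
        num, den = 1, 1
        for k, (ak, _) in enumerate(points):
            if k != j:
                num = num * (x - ak) % p
                den = den * (aj - ak) % p
        # pow(den, p - 2, p) is the modular inverse of den (Fermat's little theorem)
        total = (total + bj * num * pow(den, p - 2, p)) % p
    return total

def rs_encode(message, n, p=P):
    """Encoding e_0..e_{n-1} = P(0)..P(n-1), where P(i) = b_i for i < d."""
    points = list(enumerate(message))
    return [lagrange_eval(points, x, p) for x in range(n)]

print(rs_encode([5, 10, 20], 7)[:3])  # the message itself: [5, 10, 20]
```

The first d values of the encoding reproduce the message, and the remaining n - d values are the redundant evaluations.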
39RS-Decoding
- Receiver and sender agree on the polynomial.
- The receiver must determine the polynomial from the received values; once the polynomial is determined, the receiver can determine the original message values.
- If there are no errors
- Given the polynomial, the message is determined by computing P(0), P(1), etc.
- If there are errors, use the Berlekamp and Welch decoding algorithm
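The error-free case can be sketched by interpolating the polynomial from any d correctly received values and evaluating it at 0, ..., d-1. This sketch again assumes arithmetic modulo a prime (257) rather than GF(2^r), assumes the positions of the received values are known, and does not implement the Berlekamp and Welch algorithm for the case with errors. All names are hypothetical.

```python
P = 257  # assumption: prime modulus standing in for GF(2^r)

def lagrange_eval(points, x, p=P):
    """Evaluate at x the polynomial of degree < len(points) through `points`, mod p."""
    total = 0
    for j, (aj, bj) in enumerate(points):
        num, den = 1, 1
        for k, (ak, _) in enumerate(points):
            if k != j:
                num = num * (x - ak) % p
                den = den * (aj - ak) % p
        total = (total + bj * num * pow(den, p - 2, p)) % p
    return total

def rs_decode_no_errors(received, d, p=P):
    """received: {position: value} holding at least d correct values."""
    points = list(received.items())[:d]
    if len(points) < d:
        raise ValueError("need at least d received values")
    return [lagrange_eval(points, x, p) for x in range(d)]

# Values at positions 3, 5 and 6 of a codeword whose message was [1, 2, 3].
print(rs_decode_no_errors({3: 4, 5: 6, 6: 7}, 3))  # prints: [1, 2, 3]
```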