Title: Chapter 3 Design Techniques to Achieve Fault Tolerance

Primary Design Issues
- The development of a fault-tolerant system requires the consideration of many design issues, among them fault detection, fault containment, fault location, fault recovery, and fault masking.
- A system that employs fault masking achieves fault tolerance by hiding the faults that occur. Systems that do not use fault masking require fault detection, fault location, and fault recovery to achieve fault tolerance.

The Concept of Fault Tolerance
- Redundancy: the addition of information, resources, or time beyond what is needed for normal operation.
- Hardware redundancy
  - Triple modular redundancy
- Software redundancy
  - N-version programming
- Information redundancy
  - Parity codes in memories
- Time redundancy
  - Recomputation on the same processor

Hardware Redundancy
- There are three basic forms of hardware redundancy.
- Passive hardware redundancy
  - Uses the concept of fault masking to hide the occurrence of faults and prevent them from resulting in errors.
- Active hardware redundancy (dynamic method)
  - Achieves fault tolerance by detecting the existence of faults and performing some action to remove the faulty hardware from the system.
- Hybrid hardware redundancy

Fault Tolerance Approaches
- Passive or masking redundancy
  - Adds redundancy to mask out the effects of faults immediately; errors are corrected.
  - Relies on voting mechanisms (majority voting) to mask the occurrence of faults; the passive design inherently tolerates faults, without the need for fault detection or system reconfiguration.
- Active or standby redundancy
  - Detect fault
  - Locate fault
  - Reconfigure system around fault
  - Recover and restart

Passive Hardware Redundancy
- Triple Modular Redundancy (TMR)
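
A minimal sketch of the TMR idea, assuming bit-wise majority voting over three module outputs (the function names are illustrative, not from the text):

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bit-wise 2-out-of-3 majority: (a AND b) OR (a AND c) OR (b AND c)."""
    return (a & b) | (a & c) | (b & c)

# One module produces a corrupted word; the vote masks the fault, so no
# fault detection or reconfiguration is needed.
good = 0b10110010
assert majority_vote(good, good, 0b00000000) == good
```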

Passive Hardware Redundancy
- N-Modular Redundancy (NMR)

Passive Hardware Redundancy
- The voting mechanism can be implemented in software or hardware (e.g., a 1-bit majority voter).
- The time required to perform the vote in hardware is simply the propagation delay through the digital logic circuit.

Active Hardware Redundancy
- Duplication with comparison (Fig. 3.12)
- Standby sparing (Fig. 3.14)
- Watchdog timer

Active Hardware Redundancy
- Standby redundancy (dynamic redundancy)
  - Only one active copy of the system.
  - Standby modules are used to replace active modules when they become faulty.
  - Explicit steps for fault detection, location, and repair or reconfiguration are required.
- Figure: an example memory system in which a spare column replaces a faulty active column.

Hybrid Hardware Redundancy
- NMR plus spares: a disagreement detector compares each module output with the voter output; a disagreeing (faulty) module is replaced by a spare.
- Self-purging redundancy (Fig. 3.17, Fig. 3.18); binary threshold gate (Table 3.1)

Hybrid Hardware Redundancy
- Sift-out modular redundancy (Fig. 3.21, 3.22, 3.23)
- Triple-duplex architecture (Fig. 3.25, 3.26)

Component-Level Masking
- Use of redundant components.
- Quadded logic: four copies of each component.
- Non-redundant diode: zero resistance forward, infinite resistance reverse.
- Redundant circuit: any single faulty diode (open or short) is tolerated.

Information Redundancy
- Information redundancy is the addition of redundant information to data to allow fault detection, fault masking, or possibly fault tolerance.
- Error-detecting codes and error-correcting codes.
- A code is a means of representing information, or data, using a well-defined set of rules.
- A codeword is a collection of symbols, often called digits if the symbols are numbers, used to represent a particular piece of data based upon a specified code.
- A binary code is one in which the symbols forming each codeword consist of only the digits 0 and 1.

Information Redundancy
- Encoding operation: the process of determining the corresponding codeword for a particular data item.
- Decoding operation: the process of recovering the original data from the codeword.
- Single-error correcting codes, double-error correcting codes.
- The Hamming distance between any two binary words is the number of bit positions in which the two words differ.
- The distance of a code is the minimum Hamming distance between any two valid codewords.

Error-Detecting Codes
- A fault is a physical malfunction.
- An error is an incorrect output caused by a fault.
- The output of a circuit may be encoded so that it takes on only a subset of the possible values during normal (fault-free) operation.
- Formally, a code is a subset S of a universe U of possible vectors.
- A noncode word is a vector in the set U − S.
- If X is a codeword and X′ is a different vector produced by a fault, then X′ is a detectable error if X′ ∈ U − S and an undetectable error if X′ ∈ S.

Error-Detecting Codes (Cont.)
- Example: assume a codeword has 8 bits, so U contains 2^8 vectors.
- Figure: codewords X1, X2, X3 lie in S; X4 is a noncode word in U − S. A failure that turns X2 into X4 is a detectable error; a failure that turns X1 into X3 is an undetectable error.

Fault Detection through Encoding
- At the logic level, codes provide a means of masking or detecting errors.
- Formally, a code is a subset S of a universe U of possible vectors.
- A noncode word is a vector in the set U − S.
- Example (S = even parity): X1 = <10010011> is a codeword; due to multiple bit errors it becomes X3 = <10011100>, another codeword, so the error is not detectable. X2 is a codeword that becomes the noncode word X4, so that error is detectable.

Basic Code Operations
- Consider n-bit vectors, a space of 2^n vectors.
- A subset of the 2^n vectors are codewords.
- The subset is called an (n, k) code, where the fraction k/n is called the rate of the code.
- The addition operation on vectors is bit-wise XOR; multiplication by a scalar c is bit-wise AND:
  X + Y = <x1 ⊕ y1, x2 ⊕ y2, …, xn ⊕ yn>
  cX = <cx1, cx2, …, cxn>

Information Redundancy
- Separability
  - A separable code is one in which the original information is appended with new information (check bits) to form the codeword.
  - A nonseparable code does not possess the property of separability.
- Parity codes
  - Odd parity, even parity.
  - The single-bit parity code (either odd or even) has a distance of 2, allowing any single-bit error to be detected but not corrected.
  - The basic parity code is a separable code.

Parity Codes - Example
- Figure (Fig. 3.27, Table 3.3): a parity generator computes the parity bit for the data written into memory; on a read, parity checking of the data out raises an error signal on a mismatch.

XOR Tree for Parity Generation
- Figure: an XOR tree over the data bits produces the generated parity bit; comparing it with the stored parity bit yields the error signal.
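
A small sketch of even-parity generation and checking, mirroring the XOR-tree idea above (the helper names are illustrative):

```python
from functools import reduce
from operator import xor

def even_parity_bit(bits):
    """XOR of all data bits: the bit that makes the total parity even."""
    return reduce(xor, bits, 0)

def parity_error(bits_with_parity):
    """Error signal: 1 if the overall parity of data plus parity bit is odd."""
    return reduce(xor, bits_with_parity, 0)

data = [1, 0, 1, 1, 0, 0, 1, 1]        # five 1s, so the parity bit is 1
word = data + [even_parity_bit(data)]
assert parity_error(word) == 0          # fault-free read
word[3] ^= 1                            # single-bit error
assert parity_error(word) == 1          # detected (distance-2 code)
```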

Information Redundancy
- The basic parity scheme can be modified to provide additional error detection capability: bit-per-word, bit-per-byte, bit-per-chip, bit-per-multiple-chips, and interlaced parity.
- Overlapping parity
  - Parity groups are formed with each bit appearing in more than one parity group.
  - The primary advantage of overlapping parity is that errors can be located in addition to being detected.
  - Once the erroneous bit is located, it can be corrected by simple complementation.
  - This is the basic concept behind the Hamming error-correcting codes.

Codes for RAMs
- Figure (Fig. 3.29): parity organizations for RAMs: bit-per-word parity (one odd or even parity bit P per word), bit-per-byte parity (one parity bit P1, P2 per byte), bit-per-chip parity, bit-per-multiple-chips parity, and interlaced parity.

Parity Codes for RAMs - Comparison

Code | Advantages | Disadvantages
Bit-per-word (even) parity | Detects single-bit errors | Certain errors go undetected, e.g., if a word, including the parity bit, becomes all 1s
Bit-per-byte parity | Detects the all-1s and the all-0s conditions | Ineffective in the detection of multiple errors
Bit-per-multiple-chips parity | Detects the failure of an entire chip | Failure of a complete chip is detected, but it is not located
Bit-per-chip parity | Detects single errors and identifies the chip that contains the erroneous bit | Susceptible to whole-chip failure
Interlaced parity | Detects errors in adjacent bits | Parity groups are not based on the physical memory organization

Information Redundancy
- Overlapping parity for four information bits (3, 2, 1, 0) and three parity check bits (p2, p1, p0):

Bit in error | Parity groups affected
3 | p2 p1 p0
2 | p2 p1
1 | p2 p0
0 | p1 p0
p2 | p2
p1 | p1
p0 | p0

Error Correction with Overlapped Parity
- Figure: three parity generators recompute p2, p1, and p0 from the received bits 3, 2, 1, 0 and compare them with the received parity bits. The three comparison results drive a 3-to-8 decoder whose outputs (C3 = correct bit 3, C2 = correct bit 2, C1 = correct bit 1, C0 = correct bit 0, CP2 = correct bit p2, CP1 = correct bit p1, CP0 = correct bit p0, E = no error) select the single bit to complement, yielding the corrected bits.
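
A hedged sketch of the overlapping-parity scheme tabulated above (four information bits 3..0, three check bits p2..p0; the group assignments follow the "bit in error / parity groups affected" table):

```python
GROUPS = {"p2": (3, 2, 1), "p1": (3, 2, 0), "p0": (3, 1, 0)}

def checks(d):
    """d maps bit number -> value; returns the three parity check bits."""
    return {name: d[a] ^ d[b] ^ d[c] for name, (a, b, c) in GROUPS.items()}

def correct(d, p):
    """Recompute the checks, form the syndrome, and complement the located bit."""
    syndrome = {name for name, bit in checks(d).items() if bit != p[name]}
    if len(syndrome) > 1:                 # information-bit error
        for bit in (3, 2, 1, 0):          # find the bit lying in exactly these groups
            if {n for n, g in GROUPS.items() if bit in g} == syndrome:
                d[bit] ^= 1               # simple complementation corrects it
    return d                              # empty/single syndrome: data bits intact

word = {3: 1, 2: 0, 1: 1, 0: 1}
p = checks(word)
word[2] ^= 1                              # inject a single-bit error
assert correct(word, p) == {3: 1, 2: 0, 1: 1, 0: 1}
```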

Information Redundancy
- Let m be the number of information bits to be protected using an overlapping parity approach, and let k be the number of parity bits required to protect those m information bits; then 2^k ≥ m + k + 1 (Table 3.4).
- m-out-of-n codes (Table 3.5); a small validity check appears after this list.
  - The codewords of an m-out-of-n code are n bits in length and contain exactly m 1s.
  - Any single-bit error can be detected.
  - The major disadvantage is that the encoding, decoding, and detection processes are often difficult to perform.
  - It provides detection of all single errors and all multiple, unidirectional errors.
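
A tiny validity check for an assumed 2-out-of-5 code, illustrating why any single error, and any multiple unidirectional error, is detected (the count of 1s changes):

```python
def valid_2_of_5(word: str) -> bool:
    """An m-out-of-n codeword must contain exactly m ones (here m=2, n=5)."""
    return len(word) == 5 and word.count("1") == 2

assert valid_2_of_5("01010")         # a valid 2-out-of-5 codeword
assert not valid_2_of_5("01011")     # single-bit error detected
assert not valid_2_of_5("00000")     # unidirectional 1->0 errors detected
```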

Information Redundancy
- Duplication codes
  - Duplication codes are based on the concept of completely duplicating the original information to form the codeword.
  - Duplication codes are found in many applications, including memory systems and some communication systems.
  - A variation of the basic duplication code is to complement the duplicated portion of the codeword.
  - Complemented duplication (Fig. 3.32, Fig. 3.33)
  - Swap and compare (Fig. 3.34)
- Checksum (Fig. 3.35)
  - The checksum is another form of separable code that is most applicable when blocks of data are to be transferred from one point to another.
  - The checksum is a quantity of information that is added to the block of data to achieve error detection capability.

Information Redundancy
- Single-precision checksum (Fig. 3.36; any overflow is ignored)
  - The single-precision checksum is unable to detect certain types of errors (Fig. 3.37).
- Double-precision checksum (Fig. 3.38)
- Honeywell checksum
- Residue checksum
- Cyclic codes
  - The fundamental feature of cyclic codes is that any end-around shift of a codeword produces another codeword.
  - They are frequently applied to sequential-access devices such as tapes, bubble memories, and disks.
  - The encoding operation can be implemented using simple shift registers with feedback connections.

Checksum Codes - Basic Concepts
- A checksum computed on the original data is appended to the block of data when the block is transferred.
- Figure: after the transfer, the checksum is recomputed on the received data and compared with the received version of the checksum.

Single-Precision Checksums
- A single-precision checksum is formed by adding the data words and ignoring any overflow (the carry out of the addition is discarded).
- Figure: the single-precision checksum is unable to detect certain types of errors. With a transmission line stuck at 1, the received checksum and the checksum recomputed on the received data can still be equal, so no error is detected.
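
A minimal sketch of a single-precision checksum over 4-bit words, including a case (constructed for illustration, not taken from Fig. 3.37) in which a stuck-at-1 line corrupts the data and the checksum consistently, so the error escapes detection:

```python
MASK = 0b1111                              # n = 4-bit words

def checksum(words):
    return sum(words) & MASK               # overflow beyond n bits is ignored

block = [0b0111, 0b0001]
cs = checksum(block)                       # 0b1000

# Bit 3 of the transmission line is stuck at 1: every transferred word,
# including the checksum, is ORed with 0b1000.
stuck = lambda w: w | 0b1000
received = [stuck(w) for w in block]       # [0b1111, 0b1001]
received_cs = stuck(cs)                    # 0b1000 (unchanged)
assert checksum(received) == received_cs   # equal: no error is flagged
```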

Double-Precision Checksums
- Compute a 2n-bit checksum for a block of n-bit words, using modulo-2^(2n) arithmetic.
- Overflow is still a concern, but it is now overflow out of a 2n-bit sum.
- Figure: with the stuck-at-1 line of the previous example, the received checksum and the checksum recomputed on the received data are no longer equal, so the error is detected.

Honeywell Checksums
- Concatenate consecutive words to form double words, creating k/2 words of 2n bits each; the checksum is formed over the newly structured data.
- Figure: a stuck line now affects two different bit positions of the concatenated double words, so the checksum recomputed on the received data differs from the received checksum and the fault is detected.

Residue Checksums
- The same concept as the single-precision checksum, except that the carry bit is not ignored: it is added back into the checksum in an end-around fashion.
- Figure: the carries generated during the addition are fed back into the sum (end-around carry addition); the checksum of the received data is compared with the received checksum.
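
Hedged sketches of the three variants described above, again for 4-bit words (the parameters and block contents are illustrative):

```python
def double_precision(words):        # 2n-bit checksum, modulo-2^(2n) arithmetic
    return sum(words) & 0xFF        # n = 4, so keep 8 bits

def honeywell(words):               # concatenate word pairs into 2n-bit words
    pairs = [(words[i] << 4) | words[i + 1] for i in range(0, len(words), 2)]
    return sum(pairs) & 0xFF        # assumes an even number of words

def residue(words):                 # the carry is added back, end-around
    s = 0
    for w in words:
        s += w
        s = (s & 0b1111) + (s >> 4) # fold the carry back into 4 bits
    return s & 0b1111

# The stuck-at-1 fault that defeated the single-precision checksum is now
# caught: the 8-bit checksum recomputed on the received words no longer
# matches the received checksum.
assert double_precision([0b1111, 0b1001]) != (0b1000 | 0b1000)
```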

Information Redundancy
- A cyclic code is characterized by its generator polynomial, G(X), a polynomial of degree (n − k) or greater, where n is the number of bits in the complete codeword produced by G(X) and k is the number of bits in the original information to be encoded.
- For binary cyclic codes, the coefficients of the generator polynomial are either 0 or 1.
- A cyclic code with a generator polynomial of degree (n − k) is called an (n, k) cyclic code.
- Cyclic codes can detect all single errors and all multiple, adjacent errors affecting fewer than (n − k) bits.

Cyclic Code - Example
- Consider the generator polynomial g(x) = x^3 + x + 1 for a (7, 4) code.
- One can verify that g(x) divides x^7 + 1.
- Given the data word (1111), generate the codeword:
  - d(x) = x^3 + x^2 + x + 1
  - c(x) = g(x)d(x) = (x^3 + x^2 + x + 1)(x^3 + x + 1) = x^6 + x^5 + x^3 + 1 (code polynomial)
  - Hence the codeword is (1101001).
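
A sketch of this non-systematic encoding as GF(2) polynomial multiplication; polynomials are represented as integers whose bit i is the coefficient of x^i (a common convention, assumed here):

```python
def gf2_mul(a: int, b: int) -> int:
    """Carry-less (XOR) multiplication of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # add (XOR) a shifted copy for each set coefficient
        a <<= 1
        b >>= 1
    return result

g = 0b1011                # g(x) = x^3 + x + 1
d = 0b1111                # d(x) = x^3 + x^2 + x + 1
c = gf2_mul(g, d)         # c(x) = x^6 + x^5 + x^3 + 1
assert c == 0b1101001     # the codeword (1101001) derived above
```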

Properties of Cyclic (n, k) Codes
- If a polynomial g(X) of degree r = n − k divides X^n − 1, then g(X) generates an (n, k) cyclic code.

Encoding a Cyclic Code
- Define a data polynomial d(X) from the k data bits.
- One way to encode is to perform the multiplication c(X) = d(X)g(X).
- This is non-systematic encoding with a shift register, since the data does not appear explicitly as a subsequence of the output code digits.

Example Cyclic Code
- Consider the (7, 3) code generated by g(x) = x^4 + x^3 + x^2 + 1; n = 7, k = 3, r = 4, so g(x) has degree 4.

Circuit for Generating Cyclic Codes
- Consider the blocks labeled X as multipliers and the addition elements as modulo-2 adders.
- Another representation replaces the multipliers by storage elements and the adders by EX-OR gates.

Generation of Code Words (Barry, p. 106)

Cyclic code for 4-bit information words, with data polynomial d(x) = d0 + d1x + d2x^2 + d3x^3, generator polynomial g(x) = 1 + x + x^3, and code polynomial v(x) = v0 + v1x + v2x^2 + v3x^3 + v4x^4 + v5x^5 + v6x^6 = d(x)g(x):

Information (d0 d1 d2 d3) | Codeword (v0 v1 v2 v3 v4 v5 v6)
0000 | 0000000
0001 | 0001101
0010 | 0011010
0011 | 0010111
0100 | 0110100
0101 | 0111001
0110 | 0101110
0111 | 0100011
1000 | 1101000
1001 | 1100101
1010 | 1110010
1011 | 1111111
1100 | 1011100
1101 | 1010001
1110 | 1000110
1111 | 1001011

Table: the encoding process, listing the shift-register contents (registers 1, 2, 3) at each clock period as D(x) is shifted in and V(x) is shifted out.

Generation of Code Words
- Figures: step-by-step contents of the encoding shift register as a code word is generated.

Decoding of Cyclic Codes
- Determine whether a received word is a valid code word.
- If the received polynomial r(x) is a valid code polynomial, it must be a multiple of the generator polynomial g(x): r(x) = d(x)g(x) + s(x), where the syndrome polynomial s(x) should be zero.
- Hence divide r(x) by g(x) and check whether the remainder is equal to 0.
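
A sketch of that check: GF(2) polynomial division by g(x), returning the remainder (the syndrome). The integer-as-polynomial convention matches the encoding sketch earlier:

```python
def gf2_mod(r: int, g: int) -> int:
    """Remainder of dividing the GF(2) polynomial r(x) by g(x)."""
    while r.bit_length() >= g.bit_length():
        r ^= g << (r.bit_length() - g.bit_length())  # cancel the top term
    return r

g = 0b1011                                     # g(x) = x^3 + x + 1
assert gf2_mod(0b1101001, g) == 0              # valid codeword: syndrome 0
assert gf2_mod(0b1101001 ^ 0b0000010, g) != 0  # single-bit error: nonzero
```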

Circuits for Decoding
- Another representation replaces the multipliers by storage elements and the adders by EX-OR gates.
- Note: once the division is completed, the registers contain the value of the syndrome (the remainder).

Example Decoding
- Tables: the decoding process, listing the register contents (registers 1, 2, 3) at each clock period as V(x) is shifted in. With correct information the registers end at zero (syndrome 0); with erroneous information the final register contents (the syndrome) are nonzero.

Systematic Encoding of Cyclic Codes
- Let d(X) = d_{k-1}X^{k-1} + d_{k-2}X^{k-2} + … + d_0 be a data polynomial.
- Consider X^{n-k}d(X) = d_{k-1}X^{n-1} + d_{k-2}X^{n-2} + … + d_0X^{n-k}.
- Express this as a polynomial division: X^{n-k}d(X) = q(X)g(X) + r(X), where r(X) = P_{n-k-1}X^{n-k-1} + … + P_0.
- Add r(X) to both sides (modulo 2):
  d_{k-1}X^{n-1} + d_{k-2}X^{n-2} + … + d_0X^{n-k} + r(X) = q(X)g(X)
- Since the left-hand side is a multiple of g(X), it is a code polynomial.
- Therefore (d_{k-1}, d_{k-2}, …, d_0, P_{n-k-1}, …, P_0) is a systematic code word.

Systematic Cyclic Codes
- The previous cyclic code was not systematic, i.e., the data is not part of the code word.
- To generate an (n, k) systematic cyclic code, do the following (see the sketch after this list):
  - Multiply d(x) by x^{n-k}, accomplished by shifting d(x) left by n − k bits.
  - The code polynomial is c(x) = x^{n-k}d(x) + r(x), where r(x) is the remainder of dividing x^{n-k}d(x) by g(x).
  - Since x^{n-k}d(x) + r(x) = g(x)q(x), c(x) is a multiple of g(x) and hence a valid code word.
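
A sketch of this systematic encoding under the same integer-as-polynomial convention (gf2_mod is the same division routine as in the decoding sketch):

```python
def gf2_mod(r: int, g: int) -> int:
    """Remainder of dividing the GF(2) polynomial r(x) by g(x)."""
    while r.bit_length() >= g.bit_length():
        r ^= g << (r.bit_length() - g.bit_length())
    return r

def systematic_encode(d: int, g: int, n_k: int) -> int:
    shifted = d << n_k                      # x^(n-k) * d(x)
    return shifted | gf2_mod(shifted, g)    # data bits followed by check bits

g = 0b11101                                 # g(x) = x^4 + x^3 + x^2 + 1
assert systematic_encode(0b110, g, 4) == 0b1101001  # row 110 of the table below
```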

Example of a Systematic Cyclic Code
- Generator polynomial g(x) = x^4 + x^3 + x^2 + 1 of the (7, 3) code; the data contains 3 bits and there are n − k = 4 check bits.
- With r(x) = Rem[x^4 d(x) / g(x)], the code polynomial is v(x) = x^4 d(x) + r(x):

Message bits (d2 d1 d0) | Codeword (v6 v5 v4 v3 v2 v1 v0) | Code polynomial
000 | 0000000 | 0
001 | 0011101 | x^4 + x^3 + x^2 + 1
010 | 0100111 | x^5 + x^2 + x + 1
011 | 0111010 | x^5 + x^4 + x^3 + x
100 | 1001110 | x^6 + x^3 + x^2 + x
101 | 1010011 | x^6 + x^4 + x + 1
110 | 1101001 | x^6 + x^5 + x^3 + 1
111 | 1110100 | x^6 + x^5 + x^4 + x^2

Definitions
- Group: a group G is a set of elements and a defined operation ∗ for which certain axioms hold:
  - For any a, b in G, a ∗ b is in G (closure).
  - For any a, b, c in G, (a ∗ b) ∗ c = a ∗ (b ∗ c) (associativity).
  - There is an identity e in G such that e ∗ a = a ∗ e = a.
  - For each a in G there is an inverse a^{-1} such that a ∗ a^{-1} = a^{-1} ∗ a = e.
- An Abelian (or commutative) group is a group for which the commutative law a ∗ b = b ∗ a is satisfied.
- A subset H of elements in a group G is called a subgroup of G if H itself is a group.
- Example: the group G1 of integers modulo 5 with the operation of addition. The members of G1 are {0, 1, 2, 3, 4}. The identity e is 0. The integers 2 and 3 are inverses of each other, as are 1 and 4.

Definitions
- A ring R is a set of elements with two operations (addition and multiplication) defined:
  - The set R is an Abelian group under addition.
  - For any a and b in R, ab is in R (closure).
  - For any a, b, and c in R, a(b + c) = ab + ac and (b + c)a = ba + ca (distributive laws).
- A ring is called commutative if for any a, b in R, ab = ba.
- The additive identity is denoted as 0.
- A field F is a commutative ring with a multiplicative identity (denoted as 1) in which every nonzero element has a multiplicative inverse.

Definitions
- A field of q elements is denoted GF(q), where GF stands for Galois field.
- A vector space V is a set of elements called vectors over a field F satisfying the following axioms:
  - For any v in V and any field element c in F, a product cv, which is a vector, is defined.
  - If u and v are in V and c is in F, then c(u + v) = cu + cv.
  - If v is in V and c and d are in F, then (c + d)v = cv + dv.
  - If v is in V and c and d are in F, then (cd)v = c(dv) and 1v = v.
- A subset of a vector space which is itself a vector space is called a subspace.

Parity Check Codes
- We are concerned with symbols from GF(2) (the Galois field of two elements), i.e., binary codes, and from GF(q) with q = 2^b, b > 1 (b-adjacent binary codes).
- There are q^n different vectors of the form X = <x1, x2, …, xn>, where xj ∈ GF(q).
- A subset S of q^k (k < n) vectors are the code words.

Matrix Description of Parity Check Codes
- Consider a k-tuple of data (d1, d2, …, dk).
- Consider a one-to-one mapping into an n-tuple (x1, x2, …, xn) called a code word.
- The arithmetic is modulo 2.
- In order to have a one-to-one mapping between the set of 2^k data sequences and the corresponding 2^k code words, it is necessary that the k rows of G be linearly independent.

G Matrix Description
- The G matrix generates a one-to-one mapping from the k-tuple data space to the n-tuple code space.
- It is a linear mapping; hence these are called linear codes.
- More general non-binary linear codes are sometimes considered, where g_ij, d_i, and x_j are field elements from GF(q). In the general case there are q^k code words.

Parity Check Codes (G Matrix)
- Consider the set of 3-bit data words (000, 001, 010, …, 110, 111). We can convert them into code words through a transformation: multiplying by a generator matrix G (k × n).
- Multiplication and addition are modulo 2.

Matrix Description of Parity Check Codes
- Hence parity codes are also called linear codes.
- Here g_ij = 0 or 1 and the arithmetic is modulo 2 (1 + 1 = 0, 1 × 1 = 1, 0 × 1 = 1 × 0 = 0 × 0 = 0).
- G is called the generator matrix; it is a k × n matrix.

Systematic Parity Check Codes
- The first k columns of G form a k × k identity matrix.
- Hence the first k code bits are identical to the k data bits.
- The remaining n − k code bits are the parity check bits.
- Convenient because the data can be extracted directly from the code word (a separable code).

Systematic Parity Check Codes
- The first k columns of the G matrix form a k × k identity matrix.
- Then the first k bits of an n-bit code word are identical to the data bits.
- The remaining n − k code bits are parity check bits.
- Convenient because the data is extracted directly from the code word.
- Example: Data <0110> → Code <0110...>

Properties of Binary Parity Codes
- Interchanging rows of the generator matrix, or adding rows to other rows, does not change the set of code words, but it changes the mapping.
- Hence we can convert any nonsystematic code into a systematic code.
- For the systematic code, G = [I | P] for some parity matrix P.
- The parity bits can then be related directly to the data bits.

Properties of Binary Parity Codes
- This is why the Pj are called parity check bits.
- These parity equations can be rewritten, and expressed in matrix form, as the parity check relation developed on the following slides.

Properties of Parity Codes
- Interchanging rows of the generator matrix, or adding rows to other rows, does not change the set of code words.
- It does change the mapping of data words to code words.
- Hence any nonsystematic code can be converted into a systematic code.
- Exchanging column positions is also allowed; it changes bit positions in the code word.

Data | Codeword
000 | 000000
001 | 011011
010 | 101110
011 | 110101
100 | 110010
101 | 101001
110 | 011100
111 | 000111
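
A sketch of generator-matrix encoding over GF(2). The rows of G are read off the table above (the codewords for data 100, 010, and 001), so the mapping reproduces the table:

```python
G = [0b110010,   # codeword for data 100
     0b101110,   # codeword for data 010
     0b011011]   # codeword for data 001

def encode(data_bits):
    """Codeword = XOR (modulo-2 sum) of the rows of G selected by the data."""
    word = 0
    for bit, row in zip(data_bits, G):
        if bit:
            word ^= row
    return word

assert encode([0, 1, 1]) == 0b110101   # data 011 -> codeword 110101
```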

Another Equivalent Representation
- Definition: let H be an r × n (r = n − k) matrix of symbols from GF(2). Then the set of binary n-component vectors X satisfying X · H^T = 0 is called the null space of H.
- Property: let H be an r × n matrix with rank r = n − k. Then the null space of H has 2^k vectors.
- Example: any vector X = <x1, x2, x3, x4, x5> in the null space of H must satisfy the two parity check equations (one per row of H).

Another Equivalent Representation
- Two vectors u and v are orthogonal if u · v = 0. Since a code word X is orthogonal to every row of H, it is orthogonal to every vector in the (n − k)-dimensional space spanned by the rows of H; X is in the null space of the row space of H.
- Example (figure).

Parity Check Codes (H Matrix)
- An equivalent (and more common) representation is the H matrix (an r × n matrix), where r = n − k, 2^n = size of the code space, and 2^k = number of codewords.
- The set of codewords (n-bit vectors) X must satisfy X · H^T = 0.
- Codewords: (0000000, 0001111, 0010110, 0011001, …).
- We can verify that (0010110) · H^T = 000.

Generating Codewords from H

Error Detection in Parity Codes
- Consider (n, k) codewords, where n = 6 and k = 3.
- Consider the codewords (000000, 011011, 101110, …, 000111).
- If there is a single-bit error in any of the codewords, we get words (000001, 011010, 101111, …, 000110, …) that are not members of the code space.

Basic Concepts: Hamming Distance
- The Hamming weight of a vector x (e.g., a codeword), w(x), is the number of nonzero elements of x.
- The Hamming distance between two vectors x and y, d(x, y), is the number of bits in which they differ.
- The distance of a code is the minimum of the Hamming distances between all pairs of code words.
- Example: x = (1011), y = (0110): w(x) = 3, w(y) = 2, d(x, y) = 3.
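
A direct transcription of these definitions (the helper names are illustrative):

```python
def weight(x: str) -> int:
    """Hamming weight: the number of nonzero elements."""
    return x.count("1")

def distance(x: str, y: str) -> int:
    """Hamming distance: the number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

x, y = "1011", "0110"
assert weight(x) == 3 and weight(y) == 2 and distance(x, y) == 3
```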

Distance Properties of Parity Codes
- Definition: the minimum distance of a code S is the minimum of the Hamming distances between all pairs of code words, e.g., a fragment of a distance-5 (double-error-correcting) code.

Distance Properties
- To detect all error patterns of Hamming distance ≤ d, the code distance must be ≥ d + 1.
  - e.g., a code with distance 2 can detect patterns of distance 1 (i.e., single-bit errors).
- To correct all patterns of Hamming distance ≤ c, the code distance must be ≥ 2c + 1.
- To detect all patterns of Hamming distance ≤ d and correct all patterns of Hamming distance ≤ c (c ≤ d), the code distance must be ≥ c + d + 1.
  - e.g., a code with distance 3 can detect or correct all single-bit errors.

Distance Properties of Parity Check Codes
- To detect all error patterns of weight ≤ d and correct all error patterns of weight ≤ c, we must have dmin ≥ c + d + 1.
- The distance of a group code is at most rank(H) + 1.
- The minimum distance dmin between two different codewords equals the weight of the lowest-weight nonzero codeword.
- To correct all error patterns of weight e or less, we must have dmin > 2e.

Distance Properties
- The distance of a group code is the minimum weight of its nonzero code words.
- The distance of a code is also the minimum number of columns of the H matrix that are linearly dependent, i.e., that add to the zero vector.
- Example: the sum of columns 2, 4, and 5 equals the zero vector; hence the distance of the code is 3, and it can correct single-bit errors (an SEC code).
- If the code has m = n − k check bits, the longest single-error-correcting code is of length 2^m − 1.
- The resulting (2^m − 1, 2^m − 1 − m) code is called a Hamming single-error-correcting code.

Simple Parity Check Code
- Consider the simple parity code, e.g., 8 bits of data plus a parity bit, forming a (9, 8) code which has 2^8 codewords in a space of 2^9 words. The corresponding H matrix (1 × 9) is H = [1 1 1 1 1 1 1 1 1].
- One example (even-parity) codeword is (010010011), for which X · H^T = 0.
- Consider the word (010010010): multiplying by H^T gives a nonzero result; hence it is not a codeword, and the error is detected.

Single-Bit Even Parity Code
- H = [1 1 … 1], a 1 × n matrix.
- The code has n − 1 information bits and 1 check bit.
- No column of H is zero, but any two columns are linearly dependent; hence this is a distance-2 code (rank 1).
- Used to detect single errors in computer peripherals and memories.

Hamming Single-Error-Correcting Codes
- A single-bit error-correcting code needs distance 3 = 2c + 1 (c = 1).
- We want each column of the H matrix to be different and nonzero.
- The code has m = n − k check bits.
- Hence the code has 2^m − 1 nonzero syndromes; the longest single-error-correcting code having m check bits is of length 2^m − 1.
- The resulting code is (2^m − 1, 2^m − 1 − m).

Hamming Code
- In 1950, R. W. Hamming described a general method for constructing codes with a minimum distance of 3, now called Hamming codes.
- For any value of i, this method yields a (2^i − 1)-bit code with i parity bits and 2^i − 1 − i information bits.
- The bit positions in a Hamming code word are numbered from 1 to 2^i − 1. Any position whose number is a power of 2 is a parity bit; the remaining positions are information bits.
- Each parity bit is grouped with a subset of the information bits, as specified by a parity-check matrix.

Hamming Codes
- Each parity bit is grouped with the information positions whose numbers have a 1 in the same bit position when expressed in binary.
- For a given combination of information-bit values, each parity bit is chosen so that the total number of 1s in its group is even.
- To extend the distance, we simply add one more parity bit, chosen so that the parity of all the bits, including the new one, is even.
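
A hedged sketch of the construction just described, for i = 3 (a 7-bit code with 3 parity bits and 4 information bits); positions are numbered 1 to 7 and the powers of 2 hold the parity bits:

```python
def hamming_encode(info):
    """info: four bits, placed at the non-power-of-2 positions 3, 5, 6, 7."""
    word = {}
    for pos, bit in zip((3, 5, 6, 7), info):
        word[pos] = bit
    for p in (1, 2, 4):                    # parity positions: powers of 2
        # group: information positions whose binary number has a 1 where p does
        group = [pos for pos in (3, 5, 6, 7) if pos & p]
        word[p] = sum(word[pos] for pos in group) % 2  # even parity per group
    return [word[pos] for pos in range(1, 8)]

print(hamming_encode([1, 0, 1, 1]))        # -> [0, 1, 1, 0, 0, 1, 1]
```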

Hamming Code

Hamming Codes

Information bits | Parity bits (3) | Parity bits (4)
0000 | 000 | 0000
0001 | 011 | 0111
0010 | 101 | 1011
0011 | 110 | 1100
0100 | 110 | 1101
0101 | 101 | 1010
0110 | 011 | 0110
0111 | 000 | 0001
1000 | 111 | 1110
1001 | 100 | 1001
1010 | 010 | 0101
1011 | 001 | 0010
1100 | 001 | 0011
1101 | 010 | 0100
1110 | 100 | 1000
1111 | 111 | 1111

Error Checking

Error Checking (Cont.)

Error Correction with Hamming Code

Algorithm for Correcting Errors
- Test whether the syndrome S is 0. If S is 0, the word is assumed to be error free.
- If S ≠ 0, try to find a perfect match between S and a column of the H matrix; the match is implemented by n r-way AND gates (r = n − k).
- If S is the same as the i-th column of H, the i-th bit of the word is in error and is corrected by flipping that bit.
- If S is not equal to any column of H, the error is uncorrectable (UE).
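
A sketch of this algorithm for the 7-bit Hamming code, using the property (assumed from the construction above) that the column of H for position i is the binary representation of i, so a nonzero syndrome directly names the erroneous position:

```python
def correct(word):
    """word: bits for positions 1..7 (list index = position - 1)."""
    syndrome = 0
    for pos in range(1, 8):
        if word[pos - 1]:
            syndrome ^= pos            # XOR the positions that hold a 1
    if syndrome:                       # matches the column for that position
        word[syndrome - 1] ^= 1        # flip the erroneous bit
    return word                        # syndrome 0: assumed error free

codeword = [0, 1, 1, 0, 0, 1, 1]       # from the encoding sketch above
corrupted = codeword.copy()
corrupted[4] ^= 1                      # inject an error at position 5
assert correct(corrupted) == codeword
```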

Circuitry for Correcting Errors
- Figure: an XOR tree recomputes the check bits from the k data bits and XORs them with the r check bits read, producing r syndrome bits. A syndrome decoder (n r-way AND gates) identifies the erroneous bit, which is corrected by a bit-wise XOR to form the corrected word. An OR of the decoder outputs signals a detected error, a NOR detects the all-zero (no-error) syndrome, and an unmatched nonzero syndrome indicates an uncorrectable error. The r recomputed check bits are written back.

SEC/DED Codes
- Single-error correction, double-error detection.
- Distance = c + d + 1 = 1 + 2 + 1 = 4.
- SEC/DED codes are used in memories.
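
A hedged sketch of SEC/DED classification, assuming the distance-4 extension described earlier: an overall parity bit is appended, and the (syndrome, overall parity) pair separates correctable single errors from detected-but-uncorrectable double errors:

```python
def classify(syndrome_nonzero: bool, overall_parity_odd: bool) -> str:
    if not syndrome_nonzero and not overall_parity_odd:
        return "no error"
    if overall_parity_odd:                  # single error (possibly the
        return "single error: correctable"  # overall parity bit itself)
    return "double error: detected, uncorrectable"

assert classify(False, False) == "no error"
assert classify(True, True) == "single error: correctable"
assert classify(True, False) == "double error: detected, uncorrectable"
```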

Summary
- Applied information redundancy
- Described parity check codes
- G and H matrices
- Distance properties of codes
- SEC/DED codes