Title: The Math Behind the Compact Disc
1. The Math Behind the Compact Disc
- Linear Algebra and Error-Correcting Codes
William J. Martin, Mathematical Sciences, WPI
Wednesday, December 3, 2008, Fairfield University
2. How the device works
The compact disc is a complex system incorporating interesting ideas from engineering, physics, CS and math. We will focus only on the mathematics of the error-correction strategy. For more info on the CD, see Kelin J. Kuhn's book Laser Engineering.
3. [Figure borrowed from K. J. Kuhn's book Laser Engineering]
4. The Pits
- Each pit is 0.5 microns wide and 0.83 to 3.56 microns long.
- Tracks are separated by 1.6 microns of land
- Wavelength of green light is about 0.5 micron
- 40 tracks under one strand of human hair
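5. Modelling a Communications Channel
Linear algebra model: $r = m + e$ (vector addition), where m is the transmitted message, e is the error added by the channel, and r is the received vector.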
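The model $r = m + e$ is easy to simulate. In this sketch the channel is assumed to be a binary symmetric channel (each bit flipped independently with probability p); the slide does not say how errors arise, and the names `transmit`, `p` and the seed are hypothetical.

```python
import random

def transmit(m, p, rng):
    """Simulate r = m + e over {0, 1}: each bit of m is flipped with probability p."""
    e = [1 if rng.random() < p else 0 for _ in m]   # error vector chosen by the channel
    r = [(mi + ei) % 2 for mi, ei in zip(m, e)]      # coordinatewise addition mod 2
    return r, e

rng = random.Random(2008)   # fixed seed so the run is repeatable
m = [0, 0, 1, 0, 1, 1, 1, 0]
r, e = transmit(m, 0.1, rng)
print("sent    ", m)
print("error   ", e)
print("received", r)
```

Because $e + e = 0$ in each coordinate, knowing e would give back $m = r + e$; the whole difficulty is that the receiver does not know e.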
6. Channel with Error Correction
7. Turn it into an algebra problem!
- A number system that the computer can understand:
- $F = \{0, 1\}$
- Ordinary multiplication
- Addition: $1 + 1 = 0$
- Now music is turned into binary vectors!
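The two operations on $F = \{0, 1\}$ are exactly the bitwise operations a computer already has: addition is XOR and multiplication is AND. A minimal sketch (the helper names are ours):

```python
def f2_add(a, b):
    """Addition in F = {0, 1}: 1 + 1 = 0, i.e. XOR."""
    return a ^ b

def f2_mul(a, b):
    """Ordinary multiplication restricted to {0, 1}, i.e. AND."""
    return a & b

for a in (0, 1):
    for b in (0, 1):
        print(f"{a} + {b} = {f2_add(a, b)}    {a} * {b} = {f2_mul(a, b)}")

def vec_add(x, y):
    """Binary vectors add coordinatewise."""
    return [f2_add(a, b) for a, b in zip(x, y)]

x = [1, 0, 1, 1]
print(vec_add(x, x))  # [0, 0, 0, 0]: every vector is its own negative
```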
8. A bit (or a nibble?) of graph theory
- The n-cube is a type of Hamming graph
- Vertices are all binary n-tuples
- n-tuples are adjacent if they differ in only one coordinate
- Nice eigenvalues: $n - 2k$ with multiplicity $\binom{n}{k}$, for $k = 0, 1, \ldots, n$
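One way to see the eigenvalues: for each binary n-tuple a, the function $\chi_a(x) = (-1)^{a \cdot x}$ satisfies $A\chi_a = (n - 2\,\mathrm{wt}(a))\chi_a$, where A is the adjacency matrix and wt(a) is the number of 1s in a. The sketch below checks this by brute force for small n; the function names are hypothetical.

```python
from itertools import product
from math import comb

def neighbours(x):
    """All n-tuples that differ from the tuple x in exactly one coordinate."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def character(a, x):
    """chi_a(x) = (-1)^(a . x), with the dot product taken mod 2."""
    return -1 if sum(ai * xi for ai, xi in zip(a, x)) % 2 else 1

def cube_spectrum(n):
    """Check A chi_a = (n - 2 wt(a)) chi_a for every a; return {eigenvalue: multiplicity}."""
    vertices = list(product((0, 1), repeat=n))
    spectrum = {}
    for a in vertices:
        lam = n - 2 * sum(a)
        for x in vertices:
            # (A chi_a)(x) is the sum of chi_a over the n neighbours of x
            assert sum(character(a, y) for y in neighbours(x)) == lam * character(a, x)
        spectrum[lam] = spectrum.get(lam, 0) + 1
    return spectrum

print(cube_spectrum(3))                                              # {3: 1, 1: 3, -1: 3, -3: 1}
print(cube_spectrum(3) == {3 - 2 * k: comb(3, k) for k in range(4)})  # True
```

The $2^n$ functions $\chi_a$ are mutually orthogonal, so they account for the whole spectrum.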
9. Binary Vector Spaces
- The vectors are all possible binary n-tuples, for example (n = 15):
0 0 1 0 1 1 1 0 1 0 1 1 0 0 0
0 0 1 1 1 1 0 0 0 0 0 0 0 0 1
0 0 0 1 0 0 1 0 1 0 1 1 0 0 1
10. Hamming Distance
- The distance between two binary n-tuples x and y is the number of coordinates in which they differ:
  $\mathrm{dist}(001100, 001011) = 3$
- This is a metric:
- $\mathrm{dist}(x, y) \ge 0$, with $\mathrm{dist}(x, y) = 0$ iff $x = y$
- $\mathrm{dist}(x, y) = \mathrm{dist}(y, x)$
- Triangle inequality: $\mathrm{dist}(x, z) \le \mathrm{dist}(x, y) + \mathrm{dist}(y, z)$
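A direct implementation of the definition, followed by a brute-force check of the three metric properties on all binary 4-tuples (a sanity check, not a proof):

```python
from itertools import product

def dist(x, y):
    """Hamming distance: the number of coordinates in which x and y differ."""
    if len(x) != len(y):
        raise ValueError("tuples must have the same length")
    return sum(a != b for a, b in zip(x, y))

print(dist("001100", "001011"))  # 3

words = ["".join(w) for w in product("01", repeat=4)]
for x in words:
    for y in words:
        assert dist(x, y) >= 0 and (dist(x, y) == 0) == (x == y)
        assert dist(x, y) == dist(y, x)
        for z in words:
            assert dist(x, z) <= dist(x, y) + dist(y, z)   # triangle inequality
```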
11. Theorem
- Let C (the code) be a subset of $F^n$ with minimum distance between any two codewords equal to d.
- Then there exists an algorithm which corrects up to t errors per transmitted codeword if and only if $d \ge 2t + 1$.
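Read the other way, the theorem says a code with minimum distance d corrects $t = \lfloor (d-1)/2 \rfloor$ errors. A brute-force sketch; the helper names and the repetition-code example are our own illustration:

```python
from itertools import combinations

def minimum_distance(code):
    """Smallest Hamming distance between two distinct codewords (brute force)."""
    return min(sum(a != b for a, b in zip(x, y)) for x, y in combinations(code, 2))

def correctable_errors(d):
    """Largest t satisfying d >= 2t + 1."""
    return (d - 1) // 2

repetition = ["00000", "11111"]   # length-5 repetition code
d = minimum_distance(repetition)
print(d, correctable_errors(d))   # 5 2
```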
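12. Proof
- If x and y are distinct codewords, then the balls of radius t around them are disjoint: a vector z within distance t of both would give $\mathrm{dist}(x, y) \le \mathrm{dist}(x, z) + \mathrm{dist}(z, y) \le 2t < d$, a contradiction. So if the received vector is within distance t of x, it must be at distance $> t$ from any other codeword. So decoding is unique.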
13. A Useful Extension of the Theorem
- The above (computationally infeasible) decoding algorithm also correctly recovers from any t symbol errors and any s symbol erasures, provided $d > 2t + s$.
- Transmit 0 1 1 2 2 3 0, receive 0 1 3 3 ? ? ? (here, t = 2 errors and s = 3 erasures).
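A sketch of the brute-force decoder with erasures: compare only on the unerased coordinates (marked "?") and pick the nearest codeword. On those $n - s$ coordinates the code still has minimum distance at least $d - s > 2t$, which is why the theorem applies. The function name and the repetition-code example are illustrative assumptions.

```python
def decode_with_erasures(received, code):
    """Nearest-codeword decoding that ignores erased positions (marked '?')."""
    def partial_dist(c):
        return sum(r != ci for r, ci in zip(received, c) if r != "?")
    return min(code, key=partial_dist)

repetition = ["00000", "11111"]   # d = 5, so t = 1 error plus s = 2 erasures is fine
print(decode_with_erasures("1?0?1", repetition))  # 11111
```

Searching the whole code like this is exactly the "computationally infeasible" part for realistic code sizes.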
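14. Small Example
- Let C denote the rowspace of the matrix
$$G = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 \end{pmatrix}$$
- Then C = {000000, 110100, 011010, 101110, 001101, 111001, 010111, 100011},
- and every nonzero codeword has weight at least 3, so C has minimum distance 3 and allows correction of any single-bit error in any transmitted codeword.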
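The rowspace can be generated by taking all $2^3$ sums of rows, and single-error correction done by nearest-codeword search. The generator rows below are read off from the codeword list (any basis of C would do); the helper names are hypothetical.

```python
from itertools import product

G = ["110100", "011010", "001101"]   # generator rows (one basis of C)

def add(x, y):
    """Coordinatewise sum mod 2 of two binary strings."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(x, y))

def rowspace(rows):
    """All F_2-linear combinations of the given rows."""
    code = set()
    for coeffs in product((0, 1), repeat=len(rows)):
        word = "0" * len(rows[0])
        for c, row in zip(coeffs, rows):
            if c:
                word = add(word, row)
        code.add(word)
    return sorted(code)

C = rowspace(G)
# For a linear code, minimum distance = minimum weight of a nonzero codeword.
d = min(w.count("1") for w in C if "1" in w)

def decode(r):
    """Return the codeword nearest to r in Hamming distance."""
    return min(C, key=lambda c: sum(a != b for a, b in zip(r, c)))

print(len(C), d)            # 8 3
print(decode("111100"))     # 110100 (third bit was flipped)
```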
15. The binary Hamming code
- Codewords:
  0000000   1111111
  1101000   0010111
  0110100   1001011
  0011010   1100101
  0001101   1110010
  1000110   0111001
  0100011   1011100
  1010001   0101110
- Quadratic Residues!
- In $\mathbb{Z}_7$ we have
- $1^2 = 6^2 = 1$
- $2^2 = 5^2 = 4$
- $3^2 = 4^2 = 2$
- So the quadratic residues mod 7 are {1, 2, 4}, and (indexing coordinates 0, ..., 6) the seven weight-3 codewords are the indicator vectors of the translates {1 + i, 2 + i, 4 + i} mod 7.
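A sketch that rebuilds the 16 codewords from the quadratic residues: the 7 translates of {1, 2, 4}, their complements, and the all-0 and all-1 words. This construction is our reading of how the slide connects residues to the code; it reproduces the listed codewords and checks the sphere-packing count that makes the code perfect.

```python
from itertools import combinations

QR = sorted({(x * x) % 7 for x in range(1, 7)})   # quadratic residues mod 7

def indicator(support, n=7):
    """Binary word of length n with 1s exactly at the positions in support."""
    return "".join("1" if i in support else "0" for i in range(n))

def complement(word):
    return "".join("1" if b == "0" else "0" for b in word)

# Weight-3 codewords: the translates {1+i, 2+i, 4+i} (mod 7) of the residue set.
weight3 = [indicator({(q + i) % 7 for q in QR}) for i in range(7)]
hamming = sorted(["0000000", "1111111"] + weight3 + [complement(w) for w in weight3])

d = min(sum(a != b for a, b in zip(x, y)) for x, y in combinations(hamming, 2))
print(QR, len(hamming), d)                 # [1, 2, 4] 16 3
# Perfect code: 16 balls of radius 1, each holding 1 + 7 words, exactly fill F_2^7.
print(len(hamming) * (1 + 7) == 2 ** 7)    # True
```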
16. The Fano projective plane
- Vector space: $F_2^3$
- Points: 1-dimensional subspaces
- Lines: 2-dimensional subspaces
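Over $F_2$ a 1-dimensional subspace {0, v} is determined by its nonzero vector v, and a 2-dimensional subspace is {0, u, v, u + v}. The sketch below enumerates points and lines this way and checks the incidence properties of the Fano plane (variable names are ours):

```python
from itertools import combinations, product

# Points: the 7 nonzero vectors of F_2^3 (each spans a 1-dimensional subspace).
points = [v for v in product((0, 1), repeat=3) if any(v)]

def add(u, v):
    return tuple((a + b) % 2 for a, b in zip(u, v))

# Lines: each 2-dimensional subspace {0, u, v, u+v} contributes the 3 points u, v, u+v.
lines = {frozenset((u, v, add(u, v))) for u, v in combinations(points, 2)}

print(len(points), len(lines))                                  # 7 7
assert all(len(L) == 3 for L in lines)                          # 3 points on each line
assert all(sum(p in L for L in lines) == 3 for p in points)     # 3 lines through each point
assert all(sum({p, q} <= L for L in lines) == 1                 # 2 points lie on exactly 1 line
           for p, q in combinations(points, 2))
```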
17. C = nullsp(H), where
$$H = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}$$
(the columns of H are the seven points of the Fano plane)
- All codewords:
  0000000   1111111
  0001111   1110000
  0110011   1001100
  0111100   1000011
  1010101   0101010
  1011010   0100101
  1100110   0011001
  1101001   0010110
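With H as above (column j is j written in binary, for j = 1, ..., 7), a single flipped bit at position j gives syndrome $Hr = $ column j, so the syndrome read as a binary number is the error position. A minimal syndrome-decoding sketch under that assumption; `syndrome` and `correct` are hypothetical names.

```python
from itertools import product

# Column j (j = 1..7) of H is j in binary, most significant bit in the first row.
H = [[(j >> bit) & 1 for j in range(1, 8)] for bit in (2, 1, 0)]

def syndrome(word):
    """H times word, mod 2."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

code = [list(w) for w in product((0, 1), repeat=7) if not any(syndrome(list(w)))]

def correct(word):
    """Single-error correction: the syndrome, read in binary, is the error position."""
    s = syndrome(word)
    pos = 4 * s[0] + 2 * s[1] + s[2]     # 0 means no error detected
    fixed = list(word)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed

print(len(code))                         # 16
received = [1, 1, 0, 1, 0, 1, 1]         # codeword 1101001 with its 6th bit flipped
print(correct(received))                 # [1, 1, 0, 1, 0, 0, 1]
```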
18. Codes from polynomials
- Let's replace $F = \{0, 1\}$ with $F = \{0, 1, \ldots, 6\}$ (with modular arithmetic, mod 7). Now consider the vector space $F[z]$ of all polynomials in z with coefficients in F. For any subset N of F, we have a linear transformation
- $L_N : F[z] \to F^N$
- via $f(z) \mapsto (f(0), f(1), f(2), f(3), f(4), f(5))$ (here, we use $N = \{0, 1, 2, 3, 4, 5\}$).
- Restricted to polynomials of degree less than k, the image of $L_N$ is a Reed-Solomon code.
19. Polynomials to Codewords
- Example
- Let the message be (1, 2, 2) (working mod 7)
- Polynomial is $f(z) = z^2 + 2z + 2$
- Codeword is $(f(0), f(1), f(2), f(3), f(4), f(5)) = (2, 5, 3, 3, 5, 2)$
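Encoding is just polynomial evaluation mod 7. A sketch that reads the message as coefficients from the highest power down, as in the example; `evaluate` and `rs_encode` are hypothetical names.

```python
P = 7                      # F = {0, 1, ..., 6} with arithmetic mod 7
N = [0, 1, 2, 3, 4, 5]     # evaluation points, as on the slide

def evaluate(coeffs, z):
    """Horner's rule mod P; coeffs are listed from the highest power down."""
    acc = 0
    for c in coeffs:
        acc = (acc * z + c) % P
    return acc

def rs_encode(message):
    return [evaluate(message, z) for z in N]

print(rs_encode([1, 2, 2]))   # [2, 5, 3, 3, 5, 2], from f(z) = z^2 + 2z + 2
```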
20. Reed-Solomon Codes
- FACT: Two polynomials of degree less than k having k points of intersection must be equal.
- SO: two distinct codewords agree in at most k - 1 coordinates, and a Reed-Solomon code of length $n < q$ and dimension k has minimum distance $n - k + 1$.
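For the small example ($q = 7$, $n = 6$, $k = 3$) the code has only $7^3 = 343$ codewords, so the claim $d = n - k + 1 = 4$ can be checked by brute force. Since the code is linear, it suffices to find the smallest weight of a nonzero codeword:

```python
from itertools import product

P, N, K = 7, [0, 1, 2, 3, 4, 5], 3   # q = 7, length n = 6, dimension k = 3

def evaluate(coeffs, z):
    """Horner's rule mod P; coeffs are listed from the highest power down."""
    acc = 0
    for c in coeffs:
        acc = (acc * z + c) % P
    return acc

codewords = [[evaluate(m, z) for z in N] for m in product(range(P), repeat=K)]
# Linear code: minimum distance = smallest weight of a nonzero codeword.
d = min(sum(x != 0 for x in c) for c in codewords if any(c))
print(len(codewords), d, len(N) - K + 1)   # 343 4 4
```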
21. Compact Disc Parameters
- SONY/Philips design (1980)
- Music is sampled 44,100 times per second
- Each sample consists of 32 bits, representing the left and right channel signal magnitudes, each in the range 0 to 65535 (Pulse Code Modulation, PCM)
- So the chip must process 1,411,200 raw data bits per second
- But it gets much worse!
22. Cross-Interleaved RS Codes
- Inner code is a 28-dimensional subspace of a 32-dimensional vector space over a finite field of size 256.
- Outer code is a 24-dimensional subspace of a 28-dimensional vector space.
- Six 32-bit samples make up a 192-bit frame, which is encoded as a 224-bit codeword. (Eventually, codewords have length 588 bits!)
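The slides do not say how the field of size 256 is built. A common construction, assumed here, treats each byte as a polynomial over $F_2$ of degree below 8 and multiplies modulo $x^8 + x^4 + x^3 + x^2 + 1$ (0x11D); whether this is exactly the CD's choice is not stated on the slides, so treat it as an assumption. `gf256_mul` is a hypothetical helper.

```python
def gf256_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(256) = F_2[x] / (x^8 + x^4 + x^3 + x^2 + 1).

    Bits of a byte are polynomial coefficients; addition is XOR.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # add a * (current power of x)
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x100:            # a degree-8 term appeared: reduce modulo poly
            a ^= poly
    return result

# x (the byte 0x02) is a primitive element: its powers run through all 255 nonzero bytes.
powers, e = set(), 1
for _ in range(255):
    powers.add(e)
    e = gf256_mul(e, 2)
print(len(powers), e)   # 255 1
```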
23. Encoding the numbers
- The codewords from the first code are interleaved into a virtually infinite array of 28 rows of symbols over GF(256).
- We pull out 8 binary columns (one symbol) to obtain a 28 × 8 = 224-bit frame, which is then encoded using another Reed-Solomon code to obtain a codeword of length 256 bits.
24. Interleaving to disperse errors
- Codewords of first code are stacked like bricks
- 28 rows of vectors over GF(256)
- Extract columns and re-encode using second
Reed-Solomon code
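A simplified model of "stacking like bricks": symbol i of codeword j is placed in column $j + D \cdot i$, so consecutive symbols of one codeword sit D columns apart. The delay D = 4 is a hypothetical value chosen for illustration; the actual CIRC delay structure is not given on the slides. The sketch counts how many symbols of each codeword a burst of wiped-out columns destroys.

```python
from collections import Counter

L = 28      # symbols per codeword (28 rows of GF(256) symbols)
D = 4       # hypothetical delay between rows, in columns

def column_of(j, i):
    """Column that carries symbol i of codeword j in the staggered array."""
    return j + D * i

def damage(burst_start, burst_len, num_codewords=200):
    """Count, per codeword, the symbols that fall inside a burst of lost columns."""
    burst = range(burst_start, burst_start + burst_len)
    hits = Counter()
    for j in range(num_codewords):
        for i in range(L):
            if column_of(j, i) in burst:
                hits[j] += 1
    return hits

hits = damage(burst_start=100, burst_len=16)
print(max(hits.values()))   # 4: a 16-column burst costs each codeword at most 4 symbols
```

A burst of B consecutive columns costs any one codeword at most $\lceil B/D \rceil$ symbols; if those are flagged as erasures, a code with minimum distance 5 can fill in 4 of them, since $2 \cdot 0 + 4 < 5$.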
25. Splitting Odd and Even Bits
26. Back to the Pits
- Each pit is 0.5 microns wide and 0.83 to 3.56 microns long.
- Tracks are separated by 1.6 microns of land
- Not all 0/1 sequences can be recorded
27. EFM: Eight-to-Fourteen Modulation
- This encoding scheme can only store sequences where each consecutive pair of ones is separated by at least 2 and at most 10 zeros
- This is achieved by a mapping $F_2^8 \to F_2^{14}$,
- which is given by a lookup table.
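Why 14 bits? Counting the 14-bit words that obey the run-length rule shows there are just enough for the 256 byte values, while 13 bits would not suffice. The sketch assumes the runs of zeros at either end of a word are also limited to 10 (our reading of the constraint); it does not reproduce the actual EFM lookup table.

```python
from itertools import product

def efm_ok(bits, dmin=2, kmax=10):
    """True if ones are separated by dmin..kmax zeros and no run of zeros
    (including the runs at either end of the word) exceeds kmax."""
    runs = "".join(map(str, bits)).split("1")   # leading, in-between and trailing zero runs
    if any(len(r) > kmax for r in runs):
        return False
    return all(len(r) >= dmin for r in runs[1:-1])

def count_words(n):
    return sum(efm_ok(w) for w in product((0, 1), repeat=n))

print(count_words(14))         # 267, enough for the 256 byte values
print(count_words(13) < 256)   # True, so 13-bit words would not suffice
```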
28. Further Processing
- Three more merge bits are added to each of these 14-bit words
- So 256 + 8 = 264 = 33 × 8 bits, carrying six samples, or 192 information bits, get encoded as 588 channel bits on the disc: 33 × (14 + 3) = 561 bits, plus a 24-bit synchronization pattern and its 3 merge bits
- This represents 0.000136 seconds of music
29. What actually goes on the disc?
- We must do this 7,350 times per second
- So the CD player reads 4,321,800 bits per second of music produced
- To get 74 minutes of music, we must store $74 \times 60 \times 4{,}321{,}800 = 19{,}188{,}792{,}000$ bits of data on the compact disc!
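The rates on the last few slides follow from a handful of multiplications, reproduced below. The 27 bits per frame for synchronization (a 24-bit sync pattern plus 3 merge bits) are an addition of ours; they are what turns $33 \times 17 = 561$ into 588.

```python
SAMPLE_RATE = 44_100          # samples per second
BITS_PER_SAMPLE = 32          # 16 bits each for left and right
SAMPLES_PER_FRAME = 6

raw_rate = SAMPLE_RATE * BITS_PER_SAMPLE                 # raw PCM bits per second
frames_per_second = SAMPLE_RATE // SAMPLES_PER_FRAME     # frames per second
symbols_per_frame = 33                                   # 256 + 8 = 264 bits = 33 bytes
channel_bits = symbols_per_frame * (14 + 3) + 24 + 3     # EFM + merge bits, then sync + merge
channel_rate = frames_per_second * channel_bits          # channel bits per second
total_bits = 74 * 60 * channel_rate                      # 74 minutes of music

print(f"{raw_rate:,} {frames_per_second:,} {channel_bits} {channel_rate:,} {total_bits:,}")
print(f"{SAMPLES_PER_FRAME / SAMPLE_RATE:.6f} seconds per frame")
```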
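30. When in doubt, erase
- Inner code has minimum distance 5 (over GF(256))
- Rather than correct two-symbol errors, the CD just erases the entire received vector.
- The outer code also has minimum distance 5, so it can fill in up to 4 erased symbols per codeword ($2 \cdot 0 + 4 < 5$).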
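Erasures are cheap to repair because their positions are known: for a Reed-Solomon code, any k surviving symbols determine the polynomial, so up to $d - 1 = n - k$ erased symbols can be recomputed by Lagrange interpolation. The toy code below uses $F = \{0, \ldots, 6\}$ with $n = 6$ and $k = 2$, so $d = 5$ as for the CD's codes; these parameters and the names `encode` and `fill_erasures` are illustrative assumptions, and `pow(den, -1, P)` needs Python 3.8 or later.

```python
P = 7
N = [0, 1, 2, 3, 4, 5]      # evaluation points
K = 2                        # dimension, so d = n - k + 1 = 5

def encode(coeffs):
    """Evaluate the polynomial (coefficients from the highest power down) at N, mod P."""
    out = []
    for z in N:
        acc = 0
        for c in coeffs:
            acc = (acc * z + c) % P
        out.append(acc)
    return out

def fill_erasures(received):
    """Recompute erased symbols (None) by Lagrange interpolation through K survivors."""
    known = [(z, y) for z, y in zip(N, received) if y is not None][:K]
    if len(known) < K:
        raise ValueError("too many erasures")

    def f(z):
        total = 0
        for j, (zj, yj) in enumerate(known):
            num, den = 1, 1
            for m, (zm, _) in enumerate(known):
                if m != j:
                    num = num * (z - zm) % P
                    den = den * (zj - zm) % P
            total += yj * num * pow(den, -1, P)   # modular inverse of den
        return total % P

    return [y if y is not None else f(z) for z, y in zip(N, received)]

sent = encode([5, 3])                                    # f(z) = 5z + 3
received = [None, sent[1], None, None, sent[4], None]    # 4 of 6 symbols erased
print(sent, fill_erasures(received) == sent)             # [3, 1, 6, 4, 2, 0] True
```

The sketch trusts the unerased symbols and does not cross-check the extra survivors, which matches the "when in doubt, erase" strategy of flagging anything suspicious before this step.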
31. So, how good is it?
- The two Reed-Solomon codes team up to correct burst errors of up to 4000 consecutive data bits (2.5 mm scratch on disc)
- If the signal at time t cannot be recovered, interpolate
- With smart data distribution, this allows for recovery from burst errors of up to 12,000 data bits (7.5 mm track length on disc)
- If all else fails, mute, giving 0.00028 sec of silence.
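A toy version of the last two fallbacks: a short gap between two good samples is filled by linear interpolation, and anything else is muted to 0. The slides do not specify the player's actual rules, so `conceal`, the `max_gap` threshold and the muting-to-zero behaviour are hypothetical.

```python
def conceal(samples, max_gap=1):
    """Fill unrecoverable samples (None): interpolate short gaps between two good
    neighbours linearly; mute (set to 0) longer gaps and gaps at either end."""
    out = list(samples)
    i = 0
    while i < len(out):
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] is None:
            j += 1                                  # out[i:j] is a maximal run of missing samples
        left = out[i - 1] if i > 0 else None
        right = out[j] if j < len(out) else None
        for k in range(i, j):
            if left is not None and right is not None and j - i <= max_gap:
                t = (k - i + 1) / (j - i + 1)
                out[k] = round(left + t * (right - left))
            else:
                out[k] = 0                          # mute
        i = j
    return out

print(conceal([100, None, 300, None, None, 50]))   # [100, 200, 300, 0, 0, 50]
```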
32. Other Applications
- Space communications (Mariner, Voyager, etc.)
- DVD, CD-R, CD-ROM
- Cell phones, internet packets
- Memory chips, hard drives, USB sticks
- RAID disk arrays
- Quantum computing
33. The Last Slide
Thank You All!