L4139 - PowerPoint PPT Presentation

Provided by: ianmc8
1
COM347J1 Networks and Data Communications
Lecture 4: Data Compression, Error Detection and
Error Correction
  • Ian McCrum, Room 5D03B
  • Tel: 90 366364 (voice mail on 6th ring)
  • Email: IJ.McCrum_at_Ulster.ac.uk
  • Web site: http://www.eej.ulst.ac.uk

2
The Encoding and compression of data
  • Introduction
  • Information Content of a message stream
  • simple coding methods
  • Huffman coding
  • compression techniques

3
REDUNDANCY
  • Consider that you were in receipt of the
    following telegram:
  • RONMIE (ROCKTT) OSULLIVON 146 CREAK
  • Owing to the inherent redundancy of natural
    language, it is possible to perform a reconstruction
    leading to the message on the next slide.

4
REDUNDANCY
  • Consider that you were in receipt of the
    following telegram:
  • RONMIE (ROCKTT) OSULLIVON 146 CREAK
  • Owing to the inherent redundancy of natural
    language, it is possible to perform a reconstruction
    leading to the message below:
  • RONNIE (ROCKET) OSULLIVAN 146 BREAK
  • But what about the numbers in the message?

5
Redundancy
  • Redundancy arises from the correlation between
    letters occurring in natural language. Consider
    the word
  • YACH (if T is sent, it will carry no information)
  • Is it possible for a coding scheme to produce an
    ideal code?

6
Reduction of Redundancy
  • observe the
  • Statistical occurrence of symbols
  • Repetition of symbols
  • employ
  • Fano coding, Huffman coding (the most common
    symbols are given shorter codes)
  • data compression (e.g. code repetition as a
    special case)

7
Packed decimal / half byte compression
  • Used when frames contain only numeric characters.
  • Use binary coded decimal instead of 7-bit ASCII
    or 8-bit EBCDIC, as only the four least
    significant bits change from digit to digit.
  • In ASCII, two further characters in the same column
    as the digits are used as the decimal point and
    space respectively.

8
Packed Decimal
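Frame layout: STX Cntrl XX 26 3 2 45 ETX BCC
  • STX: opening flag
  • Cntrl: control character indicating half-byte compression
  • XX: number of digits following
  • 26 3 2: packed digits; the 1st number is 26.32
  • ETX: closing flag
  • BCC: block check character (Block CC)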
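A minimal sketch of half-byte packing, written in Python for illustration. The nibble values 0xA and 0xB for the decimal point and the space are assumptions (the slides do not give them), as are the function names and the choice to pad an odd number of symbols with a space nibble.

```python
# Nibble values for the two non-digit symbols are hypothetical, not from the slides.
POINT_NIBBLE = 0xA
SPACE_NIBBLE = 0xB

_TO_NIBBLE = {str(d): d for d in range(10)}
_TO_NIBBLE.update({".": POINT_NIBBLE, " ": SPACE_NIBBLE})
_FROM_NIBBLE = {value: symbol for symbol, value in _TO_NIBBLE.items()}


def pack(text):
    """Pack digits, '.' and ' ' two per byte; return (symbol_count, packed_bytes)."""
    nibbles = [_TO_NIBBLE[ch] for ch in text]  # KeyError on unsupported characters
    if len(nibbles) % 2:
        nibbles.append(SPACE_NIBBLE)  # pad to a whole number of bytes
    packed = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
    return len(text), packed


def unpack(count, packed):
    """Recover the first `count` symbols from the packed bytes."""
    nibbles = []
    for byte in packed:
        nibbles += [byte >> 4, byte & 0x0F]
    return "".join(_FROM_NIBBLE[n] for n in nibbles[:count])


count, frame_body = pack("26.32 45")
print(count, frame_body.hex())  # 8 symbols carried in 4 bytes
print(unpack(count, frame_body))
```

The symbol count plays the role of the "number of digits following" field, so the receiver knows whether the last half byte is padding.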
9
Relative encoding
  • Whenever only small differences occur between
    successive values
  • send only that difference
  • very effective in data logging
  • consider level of a river

Relative encoding using sign, number and delimiter:
STX 1 4 ETX BCC
Relative encoding using signed 8-bit integers:
STX 3 -95 11 124 -100 ETX BCC
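A small Python sketch of relative encoding with signed 8-bit differences. The river levels below are invented so that their differences match the values in the frame above, and sending the first reading in full is an assumption; the lecture does not say how the starting value is conveyed.

```python
from itertools import accumulate


def delta_encode(samples):
    """Return the first sample followed by the differences between successive samples."""
    if not samples:
        return []
    encoded = [samples[0]]  # assumption: the first value is sent in full
    for previous, current in zip(samples, samples[1:]):
        diff = current - previous
        if not -128 <= diff <= 127:
            raise ValueError("difference does not fit in a signed 8-bit integer")
        encoded.append(diff)
    return encoded


def delta_decode(encoded):
    """Rebuild the samples by keeping a running sum of the differences."""
    return list(accumulate(encoded))


levels = [1000, 1003, 908, 919, 1043, 943]  # hypothetical river levels
print(delta_encode(levels))  # [1000, 3, -95, 11, 124, -100]
```

Each difference fits in one byte even though the readings themselves need more, which is why the method suits slowly changing logged data.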
10
Character suppression
  • In a stream of characters there are often sequences
    of the same character, most frequently spaces.
  • If a sequence contains a continuous string of three
    or more identical characters, the string is replaced
    by Cntrl, char, number.
  • Thus Cntrl F 25 means 25 Fs in a sequence.
  • This is a type of run-length encoding.

11
Character suppression
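Frame layout: STX Cntrl sp 45 A B ETX BCC
  • STX: opening flag
  • Cntrl: control character
  • sp: character being suppressed (space)
  • 45: number of characters
  • A B: single letters
  • ETX: closing flag
  • BCC: block check character (Block CC)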
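Character suppression can be sketched in Python as follows. The control character value and the token representation (a tuple for a suppressed run, a plain character otherwise) are illustrative assumptions, not part of the lecture.

```python
from itertools import groupby

CNTRL = "\x1b"  # hypothetical control character; the slides do not name one


def suppress(text, min_run=3):
    """Replace each run of min_run or more identical characters by (CNTRL, char, count)."""
    tokens = []
    for char, group in groupby(text):
        run = len(list(group))
        if run >= min_run:
            tokens.append((CNTRL, char, run))
        else:
            tokens.extend(char * run)  # short runs are sent unchanged
    return tokens


def expand(tokens):
    """Undo suppress(): expand every (CNTRL, char, count) triple."""
    return "".join(t[1] * t[2] if isinstance(t, tuple) else t for t in tokens)


frame = suppress(" " * 45 + "AB")
print(frame)  # one triple for the 45 spaces, then 'A' and 'B'
```

With this representation the 47-character input is carried by three items plus two single letters, mirroring the frame layout shown on the slide.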
12
Run length encoding
  • Run-length compression where the codeword
    actually contains the number of repetitions.
  • A minimum repetition of three bytes is chosen, so
    that every run of 3 or more identical characters
    is encoded as
  • <char><char><char><n>
  • This four-byte codeword can represent repetitions
    of up to 258 (3 + 255, if the count is a single byte).
  • <char> → <char>
  • <char><char> → <char><char>
  • <char><char><char> → <char><char><char><0>
  • <char><char><char><char> → <char><char><char><1>
  • <char><char><char><char><char> → <char><char><char><2>
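The mapping above can be sketched in Python. This is an illustration that assumes the count is a single byte holding the number of repetitions beyond three, and that longer runs are split into several codewords; the lecture does not cover the latter case.

```python
from itertools import groupby

MAX_EXTRA = 255  # assumption: the count <n> occupies one byte


def rle_encode(data: bytes) -> bytes:
    """Encode runs of 3 or more as <char><char><char><n>, where n = run length - 3."""
    out = bytearray()
    for value, group in groupby(data):
        run = len(list(group))
        while run >= 3:
            chunk = min(run, 3 + MAX_EXTRA)
            out += bytes([value, value, value, chunk - 3])
            run -= chunk
        out += bytes([value]) * run  # 0, 1 or 2 leftover characters are sent as they are
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """After three identical characters, read the next byte as a repeat count."""
    out = bytearray()
    previous, same = None, 0
    stream = iter(encoded)
    for byte in stream:
        out.append(byte)
        same = same + 1 if byte == previous else 1
        previous = byte
        if same == 3:
            out += bytes([byte]) * next(stream)  # the count byte is not data
            previous, same = None, 0
    return bytes(out)


print(rle_encode(b"AAAAA"))  # b'AAA\x02'
```

Note that a run of exactly three grows by one byte, so the scheme only pays off when longer runs are common.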

13
Huffman coding
  • Instead of representing symbols with a fixed number
    of bits, fewer bits are used for frequently
    occurring symbols and more bits for rare ones.
  • Method: determine the relative frequency of the
    symbols, then create an unbalanced tree with unequal
    branches.

14
Example of Huffman
  • Consider that a group of characters A to H is to
    be transmitted. This comprises
  • 9 As, 9 Bs, 5 Cs, 5 Ds, 2 Es, 2 Fs, 2 Gs, 2 Hs
  • Sequence of operations:
  • a) Order the symbols in terms of probability.
  • b) Combine the two least frequently occurring
    symbols,
  • c) assigning 1 (upper) and 0 (lower) to each.
  • d) The combined pair is now considered to be one entity.

15
Huffman continued
  • Perform the same steps until only two symbols are
    left.
  • Determine the codeword by reading from left to
    right. The first bit being read is the least
    significant one.
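The combining steps can be sketched in Python with a priority queue. This is an illustrative implementation, not the lecture's own construction: the 0/1 assignment and the tie-breaking between equal frequencies are assumptions, so individual codewords may differ from the slide's figure even though the total number of bits is the same.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a prefix code by repeatedly merging the two least frequent entries."""
    order = count()  # unique tie-breaker so the heap never compares the dicts
    heap = [(f, next(order), {symbol: ""}) for symbol, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f_low, _, low = heapq.heappop(heap)
        f_high, _, high = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in low.items()}
        merged.update({s: "1" + code for s, code in high.items()})
        heapq.heappush(heap, (f_low + f_high, next(order), merged))
    return heap[0][2]


FREQS = {"A": 9, "B": 9, "C": 5, "D": 5, "E": 2, "F": 2, "G": 2, "H": 2}
codes = huffman_codes(FREQS)
total_bits = sum(FREQS[s] * len(codes[s]) for s in FREQS)
symbols = sum(FREQS.values())
# 98 bits for 36 symbols, about 2.72 bits per symbol
print(codes, total_bits, round(total_bits / symbols, 2))
```

For the 36 symbols of the example this gives 98 bits, against 108 bits for a fixed 3-bit code.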

16

17
Comparison
  • If there were N symbols then N codewords would be
    sent. In the case of fixed length binary codes
    this would be represented by 3N bits.
  • How does this compare with those required by this
    example of Huffman encoding?

18

19
Therefore there has been a saving of 0.28N bits
in comparison with fixed-length binary codes of 3
bits each.
Redundancy: it can be shown that the ideal
code for this sequence of symbols would take
2.53N bits, i.e. this is the actual information
content of the stream of codewords. Redundancy is defined as
\[ \text{Redundancy} = 1 - \frac{\text{Information content}}{\text{Number of bits sent}} \]
Thus for fixed-length binary codes
\[ 1 - \frac{2.53N}{3.0N} \approx 16\% \]
and for Huffman
\[ 1 - \frac{2.53N}{2.72N} \approx 7\% \]
20
MNP Class 5 Compression
  • MNP Class 5 is a combination of Huffman and
    run-length encoding.
  • The symbol stream is run-length encoded with a
    minimum repetition of 3 bytes and then Huffman
    encoded using a statistically generated table.
  • During transmission the statistics for the
    occurrence of each symbol are updated and the
    allocation of codewords is dynamically changed.
  • MNP Class 5 compression achieves 2:1 compression
    on a regular basis. Its major drawback is that it
    cannot turn itself off when it offers no gain, so
    that an incompressible file actually expands by
    >10%.

21
Error detection and protection
  • Introduction
  • Error detection: recognise that an error has happened
  • Error correction: repair damaged data
  • Parity and CRC
  • BCC and Hamming

22
Data errors
  • Errors can arise due to attenuation of signal
    strength and due to other reasons.
  • well shaped signals can become distorted and
    thus misinterpreted.
  • Random errors (each occurs with certain
    probability)
  • noise in electronics
  • distance traveled
  • Burst errors (groups of bits in error occur)
  • source interference
  • faults in equipment

23
Error detection
  • A sequence of bits (I0...In) is subjected to some
    processing (P), giving rise to a check sequence
    (C0...Ck).
  • Both are transmitted toward a receiver and incur
    a possibility of corruption.
  • Upon reception the bit stream is separated into
    received data (I0r...Inr) and a received check
    sequence (C0r...Ckr).
  • The received data (I0r...Inr) is assumed to be
    correct and the same processing (P) is performed
    on it, giving the reconstructed sequence
    (C0rr...Ckrr).
  • If the received check sequence (C0r...Ckr) and the
    reconstructed sequence (C0rr...Ckrr) are equal,
    then no detectable error has occurred.

24
Parity for ASCII codes
  • Consider a seven-bit ASCII code to comprise the
    following bits which can be labeled I6, I5, I4,
    I3, I2, I1, I0
  • A Parity bit P0 is placed beside the most
    significant bit I6 so that the codeword P0, I6,
    I5, I4, I3, I2, I1, I0 is formed.
  • The Parity bit is determined as before so that
    for Odd parity there are an odd number of 1s in
    the codeword.
  • and for Even parity there are an even number of
    1s in the codeword.
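A short Python sketch of the parity rule, using bit strings written most significant bit first (I6 down to I0). The function names are illustrative.

```python
def add_parity(code7: str, odd: bool = False) -> str:
    """Prefix a 7-bit code with a parity bit P0 so the 8-bit codeword has the chosen parity."""
    parity = code7.count("1") % 2  # makes the total number of 1s even
    if odd:
        parity ^= 1                # flip it to make the total odd instead
    return str(parity) + code7


def parity_ok(codeword: str, odd: bool = False) -> bool:
    """Return True if the number of 1s in the codeword matches the chosen parity."""
    return codeword.count("1") % 2 == (1 if odd else 0)


print(add_parity("1001010"))            # 11001010 (even parity)
print(add_parity("1001010", odd=True))  # 01001010 (odd parity)
```

Flipping any single bit of a codeword changes the count of 1s by one, so `parity_ok` fails; flipping two bits goes undetected.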

25
Block Sum Check Character
                   P0 I6 I5 I4 I3 I2 I1 I0
Codeword 1          1  0  1  1  1  0  1  0
Codeword 2          0  1  1  1  1  0  0  1
Codeword 3          1  1  0  1  1  1  0  0
Codeword 4          0  0  1  0  0  1  1  0
Codeword 5          1  1  1  0  1  0  0  1
Codeword 6          1  1  1  1  0  1  1  1
Block Check Char.   0  1  0  1  1  0  0  0
Hey! See me!!
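The block check character can be computed with a few lines of Python. The bit values on this slide are consistent with odd parity along each row and odd parity down each information column, with the BCC's own P0 chosen as its row parity; that convention is inferred from the bits shown, not stated in the lecture, and the function names are illustrative.

```python
def odd_parity_bit(bits: str) -> str:
    """Return the bit that makes the total number of 1s odd."""
    return "0" if bits.count("1") % 2 else "1"


def block_sum_check(chars7):
    """Return (codewords with P0 prefixed, block check character) for 7-bit characters."""
    rows = [odd_parity_bit(c) + c for c in chars7]
    # Column parity is taken over the information bits I6..I0 of every character.
    bcc7 = "".join(odd_parity_bit("".join(column)) for column in zip(*chars7))
    return rows, odd_parity_bit(bcc7) + bcc7


# Information bits I6..I0 of the six codewords on the slide.
data = ["0111010", "1111001", "1011100", "0100110", "1101001", "1110111"]
rows, bcc = block_sum_check(data)
print(rows)
print(bcc)  # 01011000
```

A single bit error then fails exactly one row check and one column check, which locates the bit.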
26
Block Sum Check Character
  • Consider what this method can do
  • in terms of detecting errors.
  • in terms of correcting errors.
  • Can you see where it might be used in practice?
  • Where will it cease to work adequately?

27
Cyclic Redundancy Check (CRC)
  • The CRC is so called because its codes belong to
    the class of cyclic codes, in which a cyclic shift
    of any legal codeword gives another legal codeword.
    When added to a sequence of bits, the check bits
    increase the redundancy of the codeword.
  • The data sequence is divided by a standard
    polynomial and the remainder forms the check bits, or
    CRC.
  • The polynomial is of the form
  • 1.X^4 + 0.X^3 + 1.X^2 + 0.X^1 + 1
  • more usually written X^4 + X^2 + 1
  • and in binary takes the form 10101

28
The arithmetic is different! But easier
  • In decimal, 0..9 plus 0..9 gives 100 different
    additions and 19 different answers (0..18).
  • In binary, using a half adder or exclusive OR,
    there are (0, 1) and (0, 1), giving 4 different
    additions and only 2 answers.
  • Thus 0 ⊕ 0 = 0, 0 ⊕ 1 = 1, 1 ⊕ 0 = 1 and
    1 ⊕ 1 = 0,
  • ⊕ being the symbol for exclusive OR.
  • Think of a half adder as an adder without a
    carry.

29
To perform CRC determination
  • Get the data to be protected, say 11011.
  • Choose a polynomial, say X^4 + X^2 + 1.
  • Append to the data as many zero bits as the
    order of the polynomial (4), giving
    110110000.
  • Divide this number by the polynomial, using
    modulo-2 arithmetic, thus
  • 110110000 / 10101
  • Take the remainder and send it after the original
    data.
  • Upon reception, compare the received CRC with the
    reconstructed CRC to determine error conditions.

30
Use the polynomial X^4 + X^2 + 1 to generate the CRC
            11101
10101 ) 110110000
        10101
         11100
         10101
          10010
          10101
            11100
            10101
             1001
Thus the remainder is 1001 and the codeword is 110111001
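The long division above can be reproduced with a short Python sketch that works directly on bit strings. The function names are illustrative; the only arithmetic used is exclusive OR.

```python
def mod2_remainder(bits: str, poly: str) -> str:
    """Divide `bits` by `poly` using modulo-2 (XOR) arithmetic and return the remainder."""
    work = [int(b) for b in bits]
    divisor = [int(b) for b in poly]
    for i in range(len(work) - len(divisor) + 1):
        if work[i]:  # a leading 1 means the divisor "goes into" this position
            for j, d in enumerate(divisor):
                work[i + j] ^= d
    return "".join(str(b) for b in work[-(len(divisor) - 1):])


def crc_append(data: str, poly: str) -> str:
    """Append zeros equal to the polynomial's order, divide, and send data plus remainder."""
    padded = data + "0" * (len(poly) - 1)
    return data + mod2_remainder(padded, poly)


print(mod2_remainder("110110000", "10101"))  # 1001
print(crc_append("11011", "10101"))          # 110111001
```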
31
Does 111010010, generated using the same polynomial
as before, contain an error?
            11010
10101 ) 111010000
        10101
         10000
         10101
           10100
           10101
             0010

Thus the remainder is 0010, giving the codeword 111010010, which matches what was received
32
Or divide the received data and CRC by the generating
polynomial; the remainder should be zero
            11010
10101 ) 111010010
        10101
         10000
         10101
           10101
           10101
             0000

Thus the remainder is 0000 and the data 11101 was
received OK!
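The receiver's check can also be sketched with integers and shifts instead of bit strings. This is an illustrative alternative formulation of the same modulo-2 division; the function names are assumptions.

```python
def crc_remainder(value: int, nbits: int, poly: int) -> int:
    """Modulo-2 remainder of an nbits-wide value divided by poly."""
    plen = poly.bit_length()
    for shift in range(nbits - plen, -1, -1):
        if value & (1 << (shift + plen - 1)):  # leading bit set at this position
            value ^= poly << shift
    return value


def received_ok(frame: str, poly: str) -> bool:
    """A frame (data followed by CRC) is accepted when the remainder is zero."""
    return crc_remainder(int(frame, 2), len(frame), int(poly, 2)) == 0


print(received_ok("111010010", "10101"))  # True
print(received_ok("111000010", "10101"))  # False: one bit has been flipped
```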
33
Hamming Codes
Position in codeword:    11 10  9  8  7  6  5  4  3  2  1
Information and checks:  I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
Given an ASCII code 1001010, what is the Hamming
code?
Position in codeword:    11 10  9  8  7  6  5  4  3  2  1
Information and checks:  I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
Bit value:                1  0  0  x  1  0  1  x  0  x  x
34
How to determine the values of C3C2C1C0
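Write the position of every information bit that is 1 in binary
and add the columns modulo 2:
Position  C3 C2 C1 C0
   11      1  0  1  1
    7      0  1  1  1
    5      0  1  0  1
  Sum      1  0  0  1
giving the codeword
I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
 1  0  0  1  1  0  1  0  0  0  1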
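The encoding step can be sketched in Python. The layout follows the slides (positions 11 down to 1, check bits at positions 8, 4, 2 and 1); the function name is illustrative.

```python
from functools import reduce
from operator import xor

CHECK_POSITIONS = (1, 2, 4, 8)  # C0, C1, C2, C3


def hamming_encode(ascii7: str) -> str:
    """Encode a 7-bit code (I6..I0) into an 11-bit codeword, positions 11 down to 1."""
    info_positions = [p for p in range(11, 0, -1) if p not in CHECK_POSITIONS]
    word = {p: int(bit) for p, bit in zip(info_positions, ascii7)}
    # XOR together the position numbers of all information bits that are 1.
    checks = reduce(xor, (p for p, bit in word.items() if bit), 0)
    for p in CHECK_POSITIONS:
        word[p] = 1 if checks & p else 0
    return "".join(str(word[p]) for p in range(11, 0, -1))


print(hamming_encode("1001010"))  # 10011010001
```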
35
How does this detect an error?
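I6 I5 I4 C3 I3 I2 I1 C2 I0 C1 C0
 1  0  0  1  1  1  1  0  0  0  1
(I2, in position 6, is the bit in error)
Write the position of every received bit that is 1, check bits
included, in binary and add the columns modulo 2:
Position  C3 C2 C1 C0
   11      1  0  1  1
    8      1  0  0  0
    7      0  1  1  1
    6      0  1  1  0
    5      0  1  0  1
    1      0  0  0  1
  Sum      0  1  1  0
Therefore the 6th bit was received in error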
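The receiver's calculation can be sketched in Python: XOR the position numbers of every received 1 bit, and a non-zero result names the position to flip. The function names are illustrative, and the sketch only attempts single-bit correction.

```python
def syndrome(codeword: str) -> int:
    """XOR of the positions (leftmost bit = highest position) of all bits that are 1."""
    n = len(codeword)
    result = 0
    for index, bit in enumerate(codeword):
        if bit == "1":
            result ^= n - index
    return result


def correct_single_error(codeword: str) -> str:
    """Flip the bit named by the syndrome; a zero syndrome means no detectable error."""
    position = syndrome(codeword)
    if position == 0:
        return codeword
    if position > len(codeword):
        raise ValueError("syndrome points outside the codeword: multiple errors")
    index = len(codeword) - position
    flipped = "0" if codeword[index] == "1" else "1"
    return codeword[:index] + flipped + codeword[index + 1:]


received = "10011110001"
print(syndrome(received))              # 6
print(correct_single_error(received))  # 10011010001
```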
36
Summary
  • Hamming codes have their redundant bits in the
    positions which are powers of 2, i.e. 1, 2, 4, 8 etc.
  • They can detect and correct single errors.
  • They can indicate multiple error conditions but
    cannot correct them.
  • Used for random errors.
  • Can you think of how they might be applied to a
    circumstance in which a burst error could occur? Assume
    that the burst is shorter than 8 bits and there
    are 256 bytes to be transmitted.