Title: Multimedia: Representation, Compression and Transmission
Chapter 2: Multimedia Representation, Compression and Transmission
Contents
- 1. Text
- 1.1 Text Representation
- 1.2 Principle of Text Compression
- 1.3 Theoretical Limit on Compression Efficiency
- 1.4 Compression Methods
- 1.4.1 Run-Length Encoding
- 1.4.2 Huffman Coding
- 1.4.3 Remark
- 2. Audio
- 2.1 Human Perception
- 2.2 Audio Bandwidth
- 2.3 Digitization
- 2.4 Audio Compression
- 2.4.1 Differential PCM
- 2.4.2 Adaptive Differential PCM
- 2.4.3 MP3
- 3. Image
- 3.1 Image Representation
- 3.1.1 Resolution
- 3.1.2 Color
- 3.2 Image Compression
- 3.2.1 General Concept
- 3.2.2 Concept of Discrete Cosine Transform (DCT)
- 3.2.3 JPEG
- 3.2.4 JPEG2000
- 4. Video
- 4.1 Video Representation
- 4.2 Video Compression
- 4.2.1 General Concept
- 4.2.2 MPEG-1
- 4.2.3 Other MPEG Standards
1. Text
1.1 Text Representation
- Unformatted Text
- Unformatted text comprises strings of characters from a character set.
- ASCII Character Set (static encoding)
- Each character is represented by a 7-bit codeword.
- There are 128 characters (some are printable characters, some are control characters).
- ASCII is an example of a fixed-length code. There are 95 printable characters and 33 control characters in the ASCII character set, giving 128 characters in total. Since log2(128) = 7, ASCII requires 7 bits to represent each character. The ASCII character set treats each character in the alphabet equally and makes no assumptions about the frequency with which each character occurs.
- Extended ASCII Character Set
- Each character is represented by an 8-bit codeword.
- There are 128 extra characters for representing non-English characters and graphics/mathematical symbols.
- Formatted Text
- In formatted text, characters can have different styles/sizes/shapes, and they can be structured into chapters, sections, paragraphs, etc.
- We can use word-processing software to produce formatted text.
- Hypertext
- Hypertext contains formatted text as well as hyperlinks to other documents (e.g., web documents).
1.2 Principle of Text Compression
- It is desirable to compress text to reduce its size (i.e., reduce the total number of bytes) before transmission (dynamic encoding).
- Compression saves network resources, speeds up transmission, and saves storage space.
- Principle of Text Compression
- Different characters have different frequencies of occurrence (e.g., "e" occurs more frequently than "z").
- Use fewer bits to represent the frequently used characters, and more bits to represent the less frequently used characters.
- The average number of bits per character can thus be reduced.
- After compression, different codewords may have different numbers of bits.
1.3 Theoretical Limit on Compression Efficiency
- Suppose there are N characters C1, C2, C3, ..., CN, and character Ci occurs with probability pi.
- If successive characters are statistically independent, the amount of information gained after observing the character Ci is defined to be
I(Ci) = log2(1/pi) = -log2(pi) bits.
- The average information is called the entropy. It is the weighted average of I(Ci):
H = Σi pi I(Ci) = -Σi pi log2(pi) bits/character.
- Shannon's theorem: the mean codeword length for any coding method is at least H.
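To make the bound concrete, here is a minimal sketch (in Python, not from the original slides) that computes H for a toy distribution; the probabilities are the same ones used in the Huffman example later in this section.

```python
from math import log2

def entropy(probs):
    """H = sum of p_i * log2(1/p_i): the lower bound, in bits per
    character, on the mean codeword length of any lossless code."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Toy distribution over four characters (matches the Huffman example below).
print(entropy([0.500, 0.250, 0.125, 0.125]))  # 1.75 bits/character
```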
1.4 Compression Methods
1.4.1 Run-Length Encoding
Run-Length Encoding: every string of repeated symbols (e.g., bits, digits, characters) is replaced by (i) a special marker, (ii) the symbol, and (iii) the number of times the symbol occurs.

Example: consider the following string of digits: 31500000000000084511111111. Suppose we use "A" as the marker and a two-digit number for the repetition counter. The encoded (compressed) string is 315A012845A108.
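A minimal sketch of this scheme (the marker "A" and the two-digit counter come from the slide; the rule that only runs of at least four symbols are escaped is an assumption made to reproduce the example, since the escape itself costs four characters and shorter runs would grow if escaped):

```python
def rle_encode(s, marker="A", min_run=4):
    """Replace each run of length >= min_run with
    marker + symbol + two-digit count; copy shorter runs as-is.
    Runs longer than 99 would need to be split (not handled here)."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                      # extend the current run
        run = j - i
        if run >= min_run:
            out.append(f"{marker}{s[i]}{run:02d}")
        else:
            out.append(s[i] * run)
        i = j
    return "".join(out)

print(rle_encode("31500000000000084511111111"))  # 315A012845A108
```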
1.4.2 Huffman Coding
Huffman coding assigns shorter codewords to the more frequently occurring characters (lossless compression).

- A Huffman code is an optimal prefix code, which guarantees unique decodability of a file compressed using the code. The code was devised by Huffman as part of a course assignment at MIT in the early 1950s.
- Huffman coding is a technique for assigning binary sequences to the elements of an alphabet. The goal of an optimal code is to assign the minimum number of bits to each symbol (letter) in the alphabet.
- Recall from Section 1.1 that ASCII is a fixed-length code: it treats each character equally and makes no assumptions about the frequency with which each character occurs.
- A variable-length code is based on the idea that, for a given alphabet, some letters occur more frequently than others. This fact is the basis for much of information theory, and compression algorithms exploit it to encode data in as few bits as possible without losing information.
- More sophisticated compression techniques may actually discard information (lossy compression). For example, image and video data can sustain a certain amount of loss, since our brain can compensate for missing information up to a degree.
- For text compression, however, we do not want characters to be discarded, so a text compression algorithm must satisfy a unique-decodability condition. In the Huffman coding algorithm, symbols that occur more frequently have shorter codewords than symbols that occur less frequently, and the two symbols that occur least frequently have codewords of the same length.
- Construction of Huffman Code
- 1. List the characters in order of decreasing occurrence probability.
- 2. Assign a "0" and a "1" to the two characters with the lowest probabilities, and "combine" them into a new character whose probability of occurrence is the sum of those of the two original characters. Replace the two characters with the new character.
- 3. Repeat the above steps until only two characters remain.
- 4. The codeword for each character is determined by working backward and tracing the sequence of 0s and 1s assigned to that character as well as its successors. (A sketch of this procedure appears after the list.)
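A minimal sketch of the construction (in Python, not from the slides; the heap-based formulation and the tie-breaking counter are implementation choices, and which sibling receives "0" versus "1" is arbitrary, so in general only the codeword lengths are guaranteed to match a hand-built tree):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code from {symbol: probability}.
    Repeatedly combine the two least probable entries, prefixing
    '0'/'1' to the codewords of the symbols inside each entry."""
    tiebreak = count()  # keeps the heap from ever comparing two dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # least probable entry
        p1, _, c1 = heapq.heappop(heap)   # second least probable entry
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

print(huffman_code({"a1": 0.500, "a2": 0.250, "a3": 0.125, "a4": 0.125}))
# {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'}
```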
- Example: the probabilities of occurrence of four characters a1, a2, a3, a4 are 0.500, 0.250, 0.125, 0.125 respectively.
- The codewords for a1, a2, a3, a4 can be found to be 0, 10, 110, 111 respectively.
- [Figure: the original slide shows the step-by-step combination tree for this example.]
- The mean codeword length can be found to be 1.75 bits/character.
- The entropy can also be found to be 1.75 bits/character.
- In this example, the codewords are optimal (i.e., the mean codeword length is the minimum possible).
Decompression: the receiver maps each codeword back to its original character. It must know the codewords adopted (e.g., it obtains the codewords in advance, or receives them from the transmitter).
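A minimal sketch of the receiver side (again in Python and not from the slides), relying on the prefix property: since no codeword is a prefix of another, the decoder can emit a character as soon as the accumulated bits match a codeword:

```python
def decode(bits, codewords):
    """Decode a bit string using a prefix code {symbol: codeword}."""
    inverse = {w: s for s, w in codewords.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:          # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return out

codewords = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}
print(decode("010110111", codewords))  # ['a1', 'a2', 'a3', 'a4']
```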
1.4.3 Remark
- There are other interesting text compression methods:
- Dynamic Huffman coding
- LZ coding
- LZW coding
- If you are interested in these methods, please refer to F. Halsall, Multimedia Communications, Chapters 2-3, Pearson Education, 2001.