Title: Chapter 12 Multimedia Information
1. Chapter 12: Multimedia Information
- Lossless Data Compression
- Compression of Analog Signals
- Image and Video Coding
2. Bits, Numbers, Information
- Bit: number with value 0 or 1
- n bits: digital representation for 0, 1, ..., 2^n - 1
- Byte or octet: n = 8
- Computer word: n = 16, 32, or 64
- n bits allows enumeration of 2^n possibilities
- n-bit field in a header
- n-bit representation of a voice sample
- Message consisting of n bits
- The number of bits required to represent a message is a measure of its information content
- More bits -> more content
3. Block vs. Stream Information
- Block
- Information that occurs in a single block
- Text message
- Data file
- JPEG image
- MPEG file
- Size = bits/block or bytes/block
- 1 kbyte = 2^10 bytes
- 1 Mbyte = 2^20 bytes
- 1 Gbyte = 2^30 bytes
- Stream
- Information that is produced & transmitted continuously
- Real-time voice
- Streaming video
- Bit rate = bits/second
- 1 kbps = 10^3 bps
- 1 Mbps = 10^6 bps
- 1 Gbps = 10^9 bps
4. Transmission Delay
- L = number of bits in message
- R bps = speed of digital transmission system
- L/R = time to transmit the information
- tprop = d/c = time for signal to propagate across medium
- d = distance in meters
- c = speed of light (3 x 10^8 m/s in vacuum)
- Use data compression to reduce L
- Use a higher-speed modem to increase R
- Place the server closer to reduce d
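The delay formulas above can be sketched in code; the message size, link rate, and distance below are assumed example values, not from the slides:

```python
# Total delivery delay = transmission delay (L/R) + propagation delay (d/c).

def total_delay(L_bits, R_bps, d_meters, c_mps=3e8):
    """Return (transmission, propagation, total) delay in seconds."""
    t_trans = L_bits / R_bps       # time to push L bits onto the link
    t_prop = d_meters / c_mps      # time for the signal to cross the medium
    return t_trans, t_prop, t_trans + t_prop

# Assumed example: 1-Mbyte file, 10 Mbps link, 3000 km distance
t_trans, t_prop, total = total_delay(8 * 2**20, 10e6, 3e6)
print(f"transmission {t_trans:.4f} s, propagation {t_prop:.4f} s, total {total:.4f} s")
```

Compression shrinks the L/R term, a faster link shrinks it further, and moving the server closer shrinks d/c.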
5. Compression
- Information is usually not represented efficiently
- Data compression algorithms
- Represent the information using fewer bits
- Noiseless (lossless): original information recovered exactly
- E.g. zip, compress, GIF, fax
- Noisy (lossy): information recovered approximately
- E.g. JPEG
- Tradeoff: bits vs. quality
- Compression ratio = bits (original file) / bits (compressed file)
6. Color Image
- Red component image + green component image + blue component image = color image
- Total bits = 3 x H x W pixels x B bits/pixel = 3HWB
- Example: 8 x 10 inch picture at 400 x 400 pixels per in^2
- 400 x 400 x 8 x 10 = 12.8 million pixels
- At 8 bits/pixel/color: 12.8 megapixels x 3 bytes/pixel = 38.4 megabytes
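A quick check of the 3HWB formula against the slide's example (a sketch; the helper name is ours):

```python
# Raw RGB image size: 3 color components x W x H pixels x B bits per component.

def raw_image_bits(width_px, height_px, bits_per_component=8, components=3):
    return components * width_px * height_px * bits_per_component

# Slide example: 8 x 10 inch picture at 400 x 400 pixels per square inch
width_px, height_px = 400 * 8, 400 * 10
pixels = width_px * height_px                   # 12.8 million pixels
bytes_total = raw_image_bits(width_px, height_px) // 8
print(pixels, bytes_total)                      # 12800000 38400000
```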
7. Examples of Block Information
8. Chapter 12: Multimedia Information
- Lossless Data Compression
9. Data Compression
- Information is produced by a source
- Usually contains redundancy
- A lossless data compression system exploits redundancy to produce a more efficient (usually binary) representation of the information
- The compressed stream is stored or transmitted, depending on the application
- A data expansion system recovers the exact original information stream
10. Binary Tree Codes
- Suppose an information source generates symbols from A = {a1, a2, ..., aK}
- Binary tree code
- K leaves, 1 leaf assigned to each symbol
- The binary codeword for symbol aj is the sequence of bits from the root to the corresponding leaf
- Encoding: use a table
- Decoding: trace the path from root to leaf, output the corresponding symbol, repeat
Encoding Table
a1 -> 00, a2 -> 1, a3 -> 010, a4 -> 011
11. Performance of Tree Code
- Average number of encoded bits per source symbol:
- E[l] = sum over j of P(aj) l(aj)
- where l(aj) = length of the codeword for aj
- To minimize this expression, assign short codewords to frequent symbols and longer codewords to less frequent symbols
12. Example
- Assume
- 5-symbol information source {a, b, c, d, e}
- symbol probabilities 1/4, 1/4, 1/4, 1/8, 1/8
Symbol -> Codeword: a = 00, b = 01, c = 10, d = 110, e = 111
[Figure: code tree with 0/1 branch labels; leaves a, b, c at depth 2 and d, e at depth 3]
- aedbbad... is mapped into 00 111 110 01 01 00 110 ... (17 bits)
- Note: decoding is done without commas or spaces
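The point that no separators are needed can be checked with a short sketch of the encoding table above (the function names are ours):

```python
# Decoding a prefix (tree) code needs no separators: accumulate bits until a
# full codeword is matched, emit the symbol, and repeat.

CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
DECODE = {v: k for k, v in CODE.items()}

def encode(symbols):
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:              # a complete codeword has been seen
            out.append(DECODE[buf])
            buf = ""
    assert buf == "", "bitstream ended mid-codeword"
    return "".join(out)

bits = encode("aedbbad")
print(bits, len(bits))   # 00111110010100110 17
print(decode(bits))      # aedbbad
```

Greedy matching works here because the code is prefix-free: no codeword is a prefix of another.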
13. Finding Good Tree Codes
- What is the best code if K = 2?
- Simple! There is only one tree code: assign 0 or 1 to each of the symbols
- What about K = 3?
- Assign the longest pair of codewords to the two least frequent symbols
- If you don't, then switching the most frequent symbol to the shortest codeword will reduce the average length
- Picking the two least probable symbols is always the best thing to do
14. Huffman Code
- Algorithm for finding the optimum binary tree code for a set of symbols
- A = {1, 2, ..., K}, denote symbols by index
- Symbol probabilities: p1, p2, p3, ..., pK
- Basic step
- Identify the two least probable symbols, say i and j
- Combine them into a new symbol (i,j) with probability pi + pj
- Remove i and j from A and replace them with (i,j)
- The new alphabet A has 1 fewer symbol
- If A has two symbols, stop; else repeat the basic step
- Building the tree code
- Each time two symbols are combined, join them in the binary tree
15. Building the Tree Code by the Huffman Algorithm
[Figure: Huffman merges for symbols a, b, c, d, e with probabilities .50, .20, .15, .10, .05; merge .05 + .10 = .15, then .15 + .15 = .30, then .20 + .30 = .50, then .50 + .50 = 1.00]
E[l] = 1(.5) + 2(.20) + 3(.15) + 4(.10 + .05) = 1.95
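A minimal Huffman sketch using a heap, run on the slide's probabilities; it tracks only codeword lengths, which is enough to verify E[l] = 1.95:

```python
# Huffman's algorithm via a min-heap: repeatedly merge the two least probable
# entries. Every merge adds one bit to the codewords of the symbols involved.
import heapq
from itertools import count

def huffman_lengths(probs):
    tiebreak = count()                  # keeps heap comparisons well-defined
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:               # these symbols sink one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), s1 + s2))
    return lengths

probs = [0.5, 0.2, 0.15, 0.10, 0.05]    # a, b, c, d, e from the slide
lengths = huffman_lengths(probs)
avg = sum(p * l for p, l in zip(probs, lengths))
print(lengths, avg)                     # [1, 2, 3, 4, 4] 1.95
```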
16. What Is the Best Performance?
- Can we do better?
- Huffman is optimum, so we cannot do better for A
- If we take pairs of symbols, we have a different alphabet
- A^2 = {aa, ab, ac, ..., ba, bb, ..., ea, eb, ..., ee}
- with probabilities (.5)(.5), (.5)(.2), ..., (.05)(.05)
- By taking pairs, triplets, and so on, we can usually improve performance
- So what is the best possible performance?
- The entropy of the source
17. Entropy of an Information Source
- Suppose a source
- produces symbols from alphabet A = {1, 2, ..., K}
- with probabilities p1, p2, p3, ..., pK
- and source outputs are statistically independent of each other
- Then the entropy of the source, H = -sum over k of pk log2 pk, is the best possible performance (bits/symbol)
18. Examples
- Example 1: source with probabilities .5, .2, .15, .10, .05
- H ≈ 1.92 bits/symbol; the Huffman code gave E[l] = 1.95, so it's pretty close to H
- Example 2: source with K equiprobable symbols
- H = log2 K
- Example 3: source with K = 2^m equiprobable symbols
- H = m, so a fixed-length code with m bits is optimum!
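The entropy bound can be checked numerically (a sketch; the slide gives only the comparison, and the values below follow from H = -Σ p log2 p):

```python
# Entropy H = -sum(p * log2 p): the best achievable bits/symbol for a
# memoryless source, compared here against the Huffman E[l] from the slides.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

H = entropy([0.5, 0.2, 0.15, 0.10, 0.05])
print(round(H, 3))        # 1.923 bits/symbol, vs. Huffman E[l] = 1.95

# K = 2**m equiprobable symbols: H = log2(K) = m exactly (Example 3)
assert abs(entropy([1 / 8] * 8) - 3.0) < 1e-12
```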
19. Run-Length Codes
- Blanks in strings of alphanumeric information
- ------5----3-------------2--------3------
- 0 (white) and 1 (black) in fax documents
- When one symbol is much more frequent than the rest, block codes don't work well
- Run-length codes work better
- Parse the symbol stream into runs of the frequent symbol
- Apply Huffman or a similar code to encode the lengths of the runs
20. Binary Run-Length Code 1

Run        Length           Codeword   Codeword (m = 4)
1          0                00..00     0000
01         1                00..01     0001
001        2                00..10     0010
0001       3                00..11     0011
00001      4                .          .
000001     5                .          .
0000001    6                .          .
.          .                .          .
000...01   2^m - 2          11..10     1110
000...00   run > 2^m - 2    11..11     1111

(each codeword is m bits)
- Use an m-bit counter to count complete runs up to length 2^m - 2
- If 2^m - 1 consecutive zeros occur, send m 1s to indicate length > 2^m - 2
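A sketch of the Code 1 encoder (one assumed detail: a trailing incomplete run is simply dropped here, whereas a real encoder would have to flush it):

```python
# Binary run-length Code 1: each codeword is m bits. A complete run of
# k zeros followed by a 1 (k <= 2**m - 2) encodes as k in binary; seeing
# 2**m - 1 consecutive zeros emits the all-ones codeword and resets the count.

def rle1_encode(bits, m=4):
    maxrun = 2**m - 2
    words, run = [], 0
    for b in bits:
        if b == "1":
            words.append(format(run, f"0{m}b"))   # complete run of `run` zeros
            run = 0
        else:
            run += 1
            if run == maxrun + 1:                 # 2**m - 1 zeros seen
                words.append("1" * m)
                run = 0
    return words                                   # trailing partial run dropped

print(rle1_encode("101" + "0" * 15 + "001"))
# ['0000', '0001', '1111', '0010']
```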
21. Example: Code 1
Code 1 performance: m / E[R] encoded bits per source bit, where E[R] is the average number of source bits consumed per codeword
22. Binary Run-Length Code 2

Run        Length           Codeword   Codeword (m = 4)
1          0                10..00     10000
01         1                10..01     10001
001        2                10..10     10010
0001       3                10..11     10011
00001      4                .          .
000001     5                .          .
0000001    6                .          .
.          .                .          .
000...01   2^m - 1          11..11     11111
000...00   run > 2^m - 1    0          0

(codewords are 1 or m + 1 bits)
- When all-zero runs are frequent, encoding that event with a single bit gives higher compression
23. Example: Code 2
Code 2 performance: E[l] / E[R] encoded bits per source bit
24. Predictive Coding
25. Fax Documents Use Run-Length Encoding
- CCITT Group 3 facsimile standard
- Default: 1-D Huffman coding of run lengths
- Option: 2-D (predictive) run-length coding
26. Adaptive Coding
- Adaptive codes provide compression when symbol and pattern probabilities are unknown
- Essentially, the encoder learns/discovers frequent patterns
- The Lempel-Ziv algorithm is powerful & popular
- Incorporated in many utilities
- Whenever a pattern is repeated in the symbol stream, it is replaced by a pointer to where it first occurred and a value to indicate the length of the pattern
- Example string: "All tall. We all are tall. All small. We all are small."
[Figure: Lempel-Ziv parsing of the string, with repeated patterns such as "_tall", "All_", "all_", "ll", "small", and "all_We_all_are_" replaced by (position, length) pointers]
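A toy parser in the spirit of Lempel-Ziv (this greedy (offset, length) scheme is our illustration, not the exact algorithm used by the utilities mentioned):

```python
# Toy LZ77-style parse: whenever the text repeats an earlier pattern, emit an
# (offset, length) pointer back to it; otherwise emit a literal character.

def lz_parse(text, min_match=3):
    out, i = [], 0
    while i < len(text):
        best_len, best_off = 0, 0
        for j in range(i):                       # candidate earlier positions
            k = 0
            while i + k < len(text) and text[j + k] == text[i + k]:
                k += 1
                if j + k == i:                   # stay within the seen prefix
                    break
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_match:
            out.append((best_off, best_len))
            i += best_len
        else:
            out.append(text[i])
            i += 1
    return out

def lz_expand(tokens):
    s = ""
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):              # char-by-char copy-back
                s += s[-off]
        else:
            s += t
    return s

text = "All tall. We all are tall. All small. We all are small."
tokens = lz_parse(text)
assert lz_expand(tokens) == text
print(tokens)
```

The O(n^2) search is fine for a sketch; practical implementations use hash chains or tries over a sliding window.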
27. Chapter 12: Multimedia Information
- Compression of Analog Signals
28. Stream Information
- A real-time voice signal must be digitized & transmitted as it is produced
- The analog signal level varies continuously in time
29. Digitization of Analog Signal
- Sample the analog signal in time and amplitude
- Find the closest approximation
[Figure: original signal, sample values, and approximation at 3 bits/sample]
Rs = bit rate = bits/sample x samples/second
30. Sampling Theorem
- Nyquist: perfect reconstruction if sampling rate 1/T > 2Ws
[Figure: (a) sampling of the signal; (b) reconstruction with an interpolation filter]
31. Quantization of Analog Samples
- The quantizer maps input x(nT) into the closest of 2^m representation values y(nT)
- Quantization error (noise) = x(nT) - y(nT)
[Figure: staircase quantizer characteristic with output levels at ±0.5Δ, ±1.5Δ, ±2.5Δ, ±3.5Δ]
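A sketch of the uniform midrise quantizer from the figure (m = 3 gives the eight output levels ±0.5Δ ... ±3.5Δ; the clipping rule at the extremes is our assumption):

```python
# Uniform midrise quantizer: maps x to the closest of 2**m representation
# values spaced delta apart, clipping inputs that fall outside the range.
import math

def quantize(x, m=3, delta=1.0):
    levels = 2**m
    idx = math.floor(x / delta)                          # which delta-wide bin
    idx = max(-levels // 2, min(levels // 2 - 1, idx))   # clip to 2**m levels
    return (idx + 0.5) * delta                           # midpoint of the bin

samples = [-3.7, -0.2, 0.6, 2.49, 9.9]
outputs = [quantize(x) for x in samples]
errors = [x - y for x, y in zip(samples, outputs)]
print(outputs)   # [-3.5, -0.5, 0.5, 2.5, 3.5]
```

For inputs inside the range, the quantization error stays within ±Δ/2; the last sample (9.9) shows overload error from clipping.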
32. Bit Rate of Digitized Signal
- Bandwidth Ws (Hertz): how fast the signal changes
- Higher bandwidth -> more frequent samples
- Minimum sampling rate = 2 x Ws
- Bit rate = 2Ws samples/second x m bits/sample
- Representation accuracy: range of approximation error
- Higher accuracy
- -> smaller spacing between approximation values
- -> more bits per sample
- SNR ≈ 6m - 7 dB
33. Example: Voice & Audio
- Telephone voice
- Ws = 4 kHz -> 8000 samples/sec
- 8 bits/sample
- Rs = 8 x 8000 = 64 kbps
- Cellular phones use more powerful compression algorithms: 8-12 kbps
- CD audio
- Ws = 22 kHz -> 44,000 samples/sec
- 16 bits/sample
- Rs = 16 x 44,000 = 704 kbps per audio channel
- MP3 uses more powerful compression algorithms: ≈ 50 kbps per audio channel
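The two bit-rate examples follow directly from Rs = 2Ws x m:

```python
# Bit rate of a digitized signal: Rs = (2 x Ws samples/second) x (m bits/sample),
# reproducing the slide's telephone-voice and CD-audio numbers.

def bit_rate(ws_hz, bits_per_sample):
    return 2 * ws_hz * bits_per_sample

voice = bit_rate(4_000, 8)     # 64000 bps = 64 kbps
cd = bit_rate(22_000, 16)      # 704000 bps per channel
print(voice, cd)
```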
34. Differential Coding
- Successive samples tend to be correlated
- Use prediction to get better quality for m bits
35. Differential PCM
- Quantize the difference between the prediction and the actual signal
The end-to-end error is only the error introduced
by the quantizer!
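A DPCM sketch illustrating the claim above: because encoder and decoder both predict from the *reconstructed* previous sample, the end-to-end error equals the quantizer error. The step size and one-sample predictor are assumed for illustration:

```python
# DPCM sketch: quantize the prediction error, then update the predictor with
# the same reconstructed value the decoder will see.

def quantize(d, step=0.5):
    return step * round(d / step)           # uniform quantizer, error <= step/2

def dpcm_encode(samples):
    codes, pred = [], 0.0
    for x in samples:
        q = quantize(x - pred)              # quantized prediction error
        codes.append(q)
        pred = pred + q                     # track the decoder's reconstruction
    return codes

def dpcm_decode(codes):
    out, pred = [], 0.0
    for q in codes:
        pred = pred + q
        out.append(pred)
    return out

samples = [0.2, 0.9, 1.4, 1.3, 0.8]
recon = dpcm_decode(dpcm_encode(samples))
print([round(x - y, 3) for x, y in zip(samples, recon)])  # each within +/- step/2
```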
36. Voice Codec Standards
A variety of voice codecs have been standardized for different target bit rates and implementation complexities, including:
- G.711: 64 kbps using PCM
- G.723.1: 5-6 kbps using CELP
- G.726: 16-40 kbps using ADPCM
- G.728: 16 kbps using low-delay CELP
- G.729: 8 kbps using CELP
37. Transform Coding
- Quantization noise in PCM is white (flat spectrum)
- At high frequencies, noise power can be higher than signal power
- If coding can produce noise that is shaped so that signal power is always higher than noise power, then masking effects in the ear result in better subjective quality
- Transform coding maps the original signal into a different domain prior to encoding
38. Subband Coding
- Subband coding is a form of transform coding
- The original signal is decomposed into multiple signals occupying different frequency bands
- Each band is PCM or DPCM encoded separately
- Each band is allocated bits so that signal power is always higher than noise power in that band
39. MP3 Audio Coding
- MP3 is the coding for digital audio in MPEG
- Uses subband coding
- Sampling rate 16 to 48 kHz @ 16 bits/sample
- Audio signal decomposed into 32 subbands
- Fast Fourier transform used for decomposition
- Bits allocated according to signal power in subbands
- Adjustable compression ratio
- Trade off bit rate vs. quality
- 32 kbps to 384 kbps per audio signal
40. Chapter 12: Multimedia Information
- Image and Video Coding
41. Image Coding
- Two-dimensional signal
- Variation in intensity in 2 dimensions
- RGB color representation
- Raw representation requires a very large number of bits
- Linear prediction & transform techniques applicable
- Joint Photographic Experts Group (JPEG) standard
42. Transform Coding
[Figure: (a) smooth time signal x(t) and its 1-D DCT X(f)]
- The time signal on the left side is smooth, that is, it changes slowly with time
- If we take its discrete cosine transform (DCT), we find that the non-negligible frequency components are clustered near zero frequency; the other components are negligible
43. Image Transform Coding
- Take a block of samples from a smooth image
- If we take the two-dimensional DCT, the non-negligible values will cluster near low spatial frequencies (upper left-hand corner)
44. Sample Image in 8x8 Blocks
45. DCT Coding
- In image and video coding, the picture array is divided into 8x8 pixel blocks which are coded separately
- Quantized DCT coefficients are scanned in zigzag fashion
- The resulting sequence is run-length and variable-length (Huffman) coded
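A sketch of the zigzag scan order (the diagonal-by-diagonal rule below matches the standard JPEG pattern; the toy coefficient block is ours):

```python
# Zigzag scan of an 8x8 coefficient block: orders coefficients from low to
# high spatial frequency so the mostly-zero high frequencies bunch at the
# end, ready for run-length coding.

def zigzag_order(n=8):
    # Walk anti-diagonals (constant r+c); alternate direction per diagonal.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[1] if (rc[0] + rc[1]) % 2 == 0 else rc[0]),
    )

def zigzag_scan(block):
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Toy block: a DC value plus a few low-frequency terms, zeros elsewhere
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 90, 12, -7, 3
print(zigzag_scan(block)[:6])   # [90, 12, -7, 0, 3, 0]
```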
46. JPEG Image Coding Standard
- JPEG defines
- Several coding modes for different applications
- Quantization matrices for DCT coefficients
- Huffman VLC coding tables
- Baseline DCT/VLC coding gives 5:1 to 30:1 compression
47. Low Quality (23.5 kB) vs. High Quality (64.8 kB)
- Look for jaggedness along boundaries
48. Video Signal
- Sequence of picture frames
- Each picture digitized & compressed
- Frame repetition rate
- 10-30-60 frames/second depending on quality
- Frame resolution
- Small frames for videoconferencing
- Standard frames for conventional broadcast TV
- HDTV frames
Rate = M bits/pixel x (W x H) pixels/frame x F frames/second
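The rate formula applied to assumed example numbers (24-bit RGB, 720x480, 30 frames/s; these values are illustrative, not from the slide):

```python
# Raw video bit rate from the slide's formula:
# Rate = M bits/pixel x (W x H) pixels/frame x F frames/second.

def video_rate_bps(bits_per_pixel, width, height, fps):
    return bits_per_pixel * width * height * fps

rate = video_rate_bps(24, 720, 480, 30)
print(rate / 1e6, "Mbps")   # 248.832 Mbps uncompressed
```

Rates this high are why compression (and chroma subsampling) is essential for transmission.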
49. Luminance Signal (Black & White) and Chrominance Signals
50. Color Representation
- RGB (Red, Green, Blue)
- Each RGB component has the same bandwidth and dynamic range
- YUV
- Commonly used to mean YCbCr, where Y represents the intensity and Cb and Cr represent chrominance information
- Derived from "color difference" video signals Y, R-Y, B-Y
- Y = 0.299R + 0.587G + 0.114B
- Sampling ratio of YCbCr
- Y is typically sampled more finely than Cb & Cr
- 4:4:4, 4:2:2, 4:2:0, 4:1:1
52. Typical Video Formats
- CIF: Common Intermediate Format
- 352x288 pixels, 30 frames/second, 4:2:0 sampling
- SIF: Source Input Format
- 360x242 pixels, 30 frames/second, 4:2:0 sampling
- 360x288 pixels, 25 frames/second, 4:2:0 sampling
- CCIR-601 (ITU-601)
- 720x525 pixels, 30 frames/second, 4:4:4 or 4:2:2 sampling
- 720x625 pixels, 25 frames/second, 4:4:4 or 4:2:2 sampling
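The sampling ratios translate into bits per frame as follows (a sketch; the fraction table reflects the usual interpretation of the ratios, and the helper names are ours):

```python
# Bits per frame under chroma subsampling: the luma plane is full size, and
# each of the two chroma planes is scaled by the subsampling fraction
# (4:4:4 full, 4:2:2 half, 4:2:0 and 4:1:1 quarter resolution).

CHROMA_FRACTION = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25, "4:1:1": 0.25}

def frame_bits(width, height, bits=8, ratio="4:2:0"):
    luma = width * height * bits
    chroma = 2 * width * height * bits * CHROMA_FRACTION[ratio]
    return int(luma + chroma)

# CIF-sized frame (352x288) at 8 bits/sample
print(frame_bits(352, 288, ratio="4:4:4") // 8, "bytes")  # 304128
print(frame_bits(352, 288, ratio="4:2:0") // 8, "bytes")  # 152064
```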
53. Video Compression Techniques
- Intraframe coding: compression of a single image, e.g. JPEG
- Interframe coding: compression of the difference between the current image block & a reference block in another frame
- Requires motion compensation
- Prediction: reference frame is in the past
- Interpolation: reference frames are in the past & future
54. Motion Compensation
- Motion vector, error block, intra block
- Find the block from the previous frame that best matches the current block & transmit the displacement vector
- Encode the difference between the current & previous block
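A brute-force block-matching sketch (the window size, block size, and SAD cost are our assumptions; real encoders use faster search strategies):

```python
# Block-matching motion estimation: exhaustively search a small window in the
# previous frame for the block minimizing the sum of absolute differences
# (SAD); the winning displacement is the motion vector.

def sad(prev, cur, pr, pc, cr, cc, n):
    return sum(abs(prev[pr + i][pc + j] - cur[cr + i][cc + j])
               for i in range(n) for j in range(n))

def motion_vector(prev, cur, cr, cc, n=4, search=2):
    best = (None, float("inf"))
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            pr, pc = cr + dr, cc + dc
            if 0 <= pr <= len(prev) - n and 0 <= pc <= len(prev[0]) - n:
                cost = sad(prev, cur, pr, pc, cr, cc, n)
                if cost < best[1]:
                    best = ((dr, dc), cost)
    return best

# Synthetic frames: the current block at (4, 4) appeared at (3, 5) before
prev = [[(r * 13 + c * 7) % 50 for c in range(12)] for r in range(12)]
cur = [row[:] for row in prev]
for i in range(4):
    for j in range(4):
        cur[4 + i][4 + j] = prev[3 + i][5 + j]

mv, cost = motion_vector(prev, cur, 4, 4)
print(mv, cost)   # (-1, 1) 0
```

A zero SAD means only the motion vector needs transmitting; otherwise the residual error block is DCT-coded as well.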
55. H.261 Encoder
- Intended for videoconferencing applications
- Bit rates p x 64 kbps; p = 2, 6, 24 common
56. Video Codecs: H.263
- Frame-based coding
- Low-bit-rate coding
- < 64 kbps (typical)
- H.261 coding with improvements
- I/P/B frames
- Additional image formats: 4CIF, 16CIF
- Suitable for desktop videoconferencing over low-speed links
57. MPEG Coding Standard
- Moving Picture Experts Group (MPEG)
- Video and audio compression & multiplexing
- Video display controls
- Fast forward, reverse, random access
- Elements of encoding
- Intra- and inter-frame coding using DCT
- Bidirectional motion compensation
- Group of Pictures structure
- Scalability options
- MPEG only standardizes the decoder
58. MPEG Video Block Diagram
- DCT: Discrete Cosine Transform
- FS: Frame Store
- MC: Motion Compensation
- VB: Variable Buffer
- VLC: Variable-Length Coding
- VLD: Variable-Length Decoding
59. MPEG Motion Compensation
[Figure: 1-D examples comparing quantization of individual samples, linear prediction from F(n-1), interpolation, and bidirectional MC using F(n-1) and F(n+1)]
- Modes: intra, forward, backward, bidirectional
60. Group of Pictures Structure
- I-frames: for random access
- intraframe coded; lowest compression
- P-frames: predictively encoded
- from the most recent I- or P-frame; medium compression
- B-frames: interpolation
- from the most recent & subsequent I- or P-frames; highest compression
61. MPEG-2 Scalability Modes
- Scalability modes
- Data partitioning
- Separates headers and payloads
- SNR (signal-to-noise ratio)
- Different levels of quality
- Temporal
- Different frame rates
- Spatial
- Different resolutions
- Limited scalability capabilities
- Three layers only
62. MPEG Scalability
63. MPEG Versions
- MPEG-1
- For video storage on CD-ROM & transmission over T-1 lines (1.5 Mbps)
- MPEG-2
- Many options: 352x240, 720x480, 1440x1152, 1920x1080 pixels
- Many profiles (sets of coding tools & parameters)
- Main Profile
- I, P & B frames; 720x480; conventional TV
- Very good quality @ 4-6 Mbps
- MPEG-4
- <64 kbps to 4 Mbps
- Designed to enable viewing, access & manipulation of objects, not only pixels
- For digital TV, streaming video, mobile multimedia & games
64. MPEG Systems and Multiplex
- Provides packetization and multiplexing for audio/video elementary streams
- Provides timing and error control information
- MPEG-1 systems
- System Streams: long variable-size packets, suitable for error-free environments
- MPEG-2 systems
- Transport Streams: short fixed-size packets, suitable for error-prone environments
- Program Streams: long variable-size packets, suitable for relatively error-free environments
65. MPEG-2 Multiplexing
- Program Streams (for error-free environments); Transport Streams (for error-prone environments)
- Packetized Elementary Streams (PES)
- Packet length, presentation & decoding timestamps, bit rate
- Timestamps used for lip-sync and clock recovery
66. Digital Video Summary