Title: Audio, Images, and Video
1. Audio, Images, and Video
- Efficient representation of media
- Larry Denenberg
2. Epitaph
John Backus, inventor of FORTRAN
1924-2007
3. Epigraph
- Prof. Denenberg showed early promise as a
riveting lecturer who had done a lot of thinking
about higher education and would be a great
lecturer. Apparently, however, Prof. Denenberg
failed to realize that the CS121 style of
teaching -- read off your own notes that you
simultaneously put up on the overhead projector
and hand out to the students -- is the absolute
hands-down BEST way to reduce students to
stupefied, uninterested, uncaring blobs, as well
as the world's least effective way of encouraging
students to come to class.
4. The Problem
- People want to listen to music, look at pictures, and watch videos
- Straightforward high-quality representation of audio/images/video uses LOTS of space
- Cellphones, iPods, etc., have limited space
- Networks have limited bandwidth
- Hence: we need to represent media efficiently
- (A small slice of an immense topic)
5. Review: Lossy vs. Lossless Compression
- Lossless compression: the original can be retrieved perfectly, bit for bit
- Lossy compression: information is lost; the original cannot be recovered precisely
- When is loss acceptable?
- Parametrizable loss: how bad can it be?
- Rest of this talk: lossy compression
- Info on lossless compression: LD ch. 5
6. Transforms: What is a transform?
- A transform is a way of changing the representation of data, typically losslessly
- We use transforms to put data into a more convenient form
- For example, a transform might rearrange data to group together "useful" components
- Example: the Fourier transform, which decomposes a signal into sine waves of various frequencies, some not so useful
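As a minimal sketch of the idea, the snippet below (Python with numpy; the tone frequencies and amplitudes are made up for illustration) builds a two-tone signal and reads the tones back out of its Fourier transform:

```python
import numpy as np

fs = 1000                      # sampling rate in Hz (assumed for the example)
t = np.arange(0, 1, 1 / fs)    # one second of samples
signal = 2.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.fft.rfft(signal)                  # transform to the frequency domain
freqs = np.fft.rfftfreq(len(signal), 1 / fs)
amplitudes = 2 * np.abs(spectrum) / len(signal)

# The two input tones show up as peaks at 50 Hz and 120 Hz.
for f, a in zip(freqs, amplitudes):
    if a > 0.1:
        print(f"{f:.0f} Hz: amplitude {a:.2f}")
```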
7. Audio
- The gold standard is CD quality: 2-channel 16-bit PCM encoding at 44.1 kHz sampling
- Multiply it out to get about 10.6 MB per minute (worked out below)
- Recall that even this is an approximation!
- Digression on CDs: where is the music?
- Key to (lossy) compression: limitations of human hearing
- E.g., the ear hears nothing above about 21 kHz
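A quick back-of-the-envelope check of the CD figure, using only the parameters on the slide:

```python
# 2 channels, 16 bits per sample, 44,100 samples per second.
channels, bits_per_sample, sample_rate = 2, 16, 44_100

bytes_per_second = channels * (bits_per_sample // 8) * sample_rate
bytes_per_minute = bytes_per_second * 60
print(bytes_per_minute / 1_000_000)   # ~10.6 MB per minute, uncompressed
```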
8. The mp3 strategy
- Transform to the frequency domain: filter the music into twelve bands chosen based on the frequency response of the human ear
- Use psychoacoustic modelling to determine how much each band contributes to sound perception
- Allocate bits to bands according to importance, i.e., encode important bands more accurately (more bits) than others
9. Filter input into frequency bands
- Recall that any waveform can be written as a combination of sine waves (pure tones) of various frequencies and amplitudes
- We can filter the input music into components, each of which is made up of a narrow range (band) of frequencies (see the sketch below)
- Each band is encoded separately; they are combined when the music is played
- Digression: the mp3 standard specifies only decoding, not encoding
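A toy sketch of the band-splitting idea, using an FFT mask rather than the polyphase filter bank a real mp3 encoder uses; the test signal and band edges are invented for illustration:

```python
import numpy as np

def split_into_bands(signal, sample_rate, band_edges_hz):
    """Toy band splitter: zero out everything outside each band in the
    frequency domain and transform back."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1 / sample_rate)
    bands = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        masked = np.where((freqs >= lo) & (freqs < hi), spectrum, 0)
        bands.append(np.fft.irfft(masked, n=len(signal)))
    return bands

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
music = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 1500 * t)

bands = split_into_bands(music, fs, [0, 500, 2000, 4000])
# Summing the bands reconstructs the original: the split itself is lossless.
# Compression happens later, when each band is quantized separately.
print(np.allclose(sum(bands), music, atol=1e-8))
```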
10. Psychoacoustic modelling
- When two tones are close in frequency, the louder masks the softer; that is, the softer is much less audible than a tone of the same volume at a very different frequency
- A pure tone is not as effective at masking as noise is, and is masked more easily
- Soft sounds just before or just after loud sounds are masked
- All these factors are considered to calculate the signal-to-mask ratio (SMR) for each band; a higher SMR means a more important band
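A toy illustration of the SMR bookkeeping; a real psychoacoustic model derives the per-band signal levels and masking thresholds from the spectrum, whereas the numbers here are invented:

```python
# Per-band signal level and masking threshold, both in dB (made-up values).
signal_db = {"0-500 Hz": 70, "500-2000 Hz": 55, "2000-4000 Hz": 40}
mask_db   = {"0-500 Hz": 45, "500-2000 Hz": 50, "2000-4000 Hz": 38}

smr = {band: signal_db[band] - mask_db[band] for band in signal_db}
print(smr)   # higher SMR => band is more audible => it deserves more bits
```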
11. Bit allocation
- With mp3, the bit rate is fixed in advance (e.g., 128 Kbps or 192 Kbps); a lower bit rate means more compression and lower quality
- So we know a priori how many bits we can use (contrast with, e.g., Huffman encoding)
- We use the results of psychoacoustic modelling to apportion these bits among the frequency bands; more bits mean more accurate coding and more faithful reproduction (sketched below)
- VBR (variable bit rate) encoding tries to improve on this
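A minimal sketch of fixed-budget allocation, assuming an SMR per band is already known; the greedy loop and the rough 6-dB-per-bit rule stand in for mp3's actual rate/distortion loops, and the numbers are illustrative:

```python
def allocate_bits(smr_per_band, total_bits):
    bits = {band: 0 for band in smr_per_band}
    # The "benefit" of the next bit for a band drops as it accumulates bits.
    for _ in range(total_bits):
        best = max(smr_per_band, key=lambda b: smr_per_band[b] - 6 * bits[b])
        bits[best] += 1          # each extra bit buys roughly 6 dB of accuracy
    return bits

smr = {"0-500 Hz": 25, "500-2000 Hz": 5, "2000-4000 Hz": 2}
print(allocate_bits(smr, 8))     # most bits go to the most audible band
```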
12. Images
- RGB representation: describe each color with 24 bits, 8 each for red, green, and blue
- Write a color as 6 hex digits: 0xRRGGBB
- An image is represented by pixels (picture elements); each encodes a tiny colored dot
- So a 4-megapixel image (2272 x 1704) takes about 12 MB of storage (checked below)
- Recall that even this is an approximation!
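A quick check of both claims, the hex notation and the storage figure (plain Python, nothing assumed beyond the slide's numbers):

```python
# Write a 24-bit RGB color as 6 hex digits, and size an uncompressed image.
def rgb_to_hex(r, g, b):
    return f"0x{r:02X}{g:02X}{b:02X}"

print(rgb_to_hex(255, 128, 0))        # 0xFF8000, a shade of orange

width, height, bytes_per_pixel = 2272, 1704, 3
print(width * height * bytes_per_pixel / 1_000_000)  # ~11.6 MB uncompressed
```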
13. (No transcript)
14. Simple compression techniques
- Reduce resolution by subsampling
- Fewer bits per color: use 5 bits for R and B, 6 bits for G, to save 1/3 of the space (our eyes perceive G better than R or B; sketched below)
- Fewer colors: use 8 bits to encode a color palette of 256 carefully chosen colors
- Make a table of 256 colors, image-specific
- Encode each color as an index into this table
- GIF and PNG format images can work this way
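A minimal sketch of the 5-6-5 idea: packing a 24-bit color into 16 bits and seeing what precision is lost (the helper names are just for illustration):

```python
def pack_rgb565(r, g, b):
    # Keep 5 bits of red, 6 of green, 5 of blue: 16 bits instead of 24.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def unpack_rgb565(p):
    # The low bits are gone: this is where the 1/3 space saving costs accuracy.
    return ((p >> 11) << 3, ((p >> 5) & 0x3F) << 2, (p & 0x1F) << 3)

print(hex(pack_rgb565(200, 100, 50)))
print(unpack_rgb565(pack_rgb565(200, 100, 50)))  # (200, 100, 48) -- close, not exact
```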
15. Dithering
- A region colored with sufficiently small pixels of two different colors is perceived by the eye as uniformly colored
- With careful selection of the colors and the densities of each, it is possible to imitate colors not in the palette
- Black-and-white dithering to get shades of grey is called halftoning
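A minimal sketch of error-diffusion dithering, simplified to a single row of pixels quantized to pure black and white (i.e., halftoning); real dithering spreads the error in two dimensions:

```python
def dither_row(greys):                 # greys: values in 0..255
    out, error = [], 0.0
    for g in greys:
        value = g + error
        pixel = 255 if value >= 128 else 0
        error = value - pixel          # carry the quantization error forward
        out.append(pixel)
    return out

row = [64] * 16                        # a uniform 25%-grey row
print(dither_row(row))                 # roughly one white pixel in four
```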
16. Dithering examples (from Wikipedia)
17. JPEG step 1: Chroma subsampling
- Transform the color space from RGB to YCbCr
- Y: brightness, a measure of black vs. white
- Cb: blue chroma, a measure of blueness
- Cr: red chroma, a measure of you-guess-what
- The eye is more sensitive to brightness than to color, so we reduce the resolution of Cb and Cr while maintaining the resolution of Y
- JPEG subsampling is 4:2:0, meaning half as many Cb/Cr samples in each dimension
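A minimal sketch of this step, assuming the standard JPEG (BT.601-style, full-range) RGB-to-YCbCr conversion and 2x2 averaging for the 4:2:0 subsampling:

```python
import numpy as np

def rgb_to_ycbcr(r, g, b):
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

print(rgb_to_ycbcr(200, 80, 40))   # a reddish color: low Cb, high Cr

# 4:2:0 subsampling: keep Y at full resolution, average Cb (and Cr) over
# each 2x2 block, so chroma has half the samples in each dimension.
cb = np.arange(16, dtype=float).reshape(4, 4)
cb_420 = cb.reshape(2, 2, 2, 2).mean(axis=(1, 3))
print(cb_420)                      # 2x2 chroma plane for a 4x4 image
```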
18. JPEG step 2: Brightness transform
- Differences in brightness are easy to see over large regions, but rapidly varying brightness over small areas is imperceptible
- Therefore: transform the brightness information to separate variations at "high frequency" from those at "low frequency" (sketched below)
- This is analogous to separating music into frequency components and throwing away frequencies too high for the ear to hear
- The resulting data is losslessly encoded
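A minimal sketch of the idea using an 8x8 DCT built with numpy: a smooth block has almost no high-frequency content, so nearly all of its 64 coefficients quantize to zero. Real JPEG uses standard per-frequency quantization tables; the single step size of 16 here is a simplification.

```python
import numpy as np

N = 8
u, x = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
D = np.sqrt(2 / N) * np.cos(np.pi * (2 * x + 1) * u / (2 * N))
D[0, :] = np.sqrt(1 / N)                       # orthonormal DCT-II matrix

block = 128 + np.outer(np.linspace(-20, 20, N), np.ones(N))  # smooth gradient
coeffs = D @ (block - 128) @ D.T               # to the "frequency" domain
quantized = np.round(coeffs / 16)              # throw away fine detail
print(np.count_nonzero(quantized), "of", N * N, "coefficients survive")
```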
19. JPEG examples (Wikipedia again)
- Image file sizes: 83K, 15K, 5K, 1.5K
20. Video
- A video stream is a sequence of frames
- HDTV is 1920 pixels horizontally by 1080 lines vertically at 30 frames per second
- At 24 bits/pixel, this is over 11 GB/minute, and that doesn't even include the audio! (checked below)
- Recall that even this is an approximation!
- We can cut this to about 30 MB/minute including audio and other data
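A quick check of the raw-HDTV figure, using only the slide's numbers:

```python
# 1920 x 1080 pixels, 24 bits (3 bytes) per pixel, 30 frames per second.
width, height, bytes_per_pixel, fps = 1920, 1080, 3, 30

bytes_per_minute = width * height * bytes_per_pixel * fps * 60
print(bytes_per_minute / 1_000_000_000)   # ~11.2 GB per minute, video only
```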
21. Key to (lossy) compression: Frames are similar
- Often only a small portion of the screen changes during 1/30 of a second
- So: in each frame, encode only the pieces that have changed since the preceding frame
22. MPEG improvement: Handling motion efficiently
- Encode frames in 16x16 blocks of pixels
- Each block is either:
- explicitly specified
- unchanged from the preceding frame
- set equal to some block somewhere in a preceding frame (i.e., with a motion vector)
- An error correction can also be specified when a new block is similar to an old one
- Quickly finding the best encoding of a block is the hard problem of MPEG encoding (sketched below)
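A minimal sketch of motion search for one block: an exhaustive search over a small window in the previous frame, where the offset of the best match is the motion vector and the remaining difference is the error-correction term. Real encoders use much faster search strategies; the frames here are synthetic.

```python
import numpy as np

def find_motion_vector(prev_frame, new_frame, bx, by, search=8, size=16):
    target = new_frame[by:by + size, bx:bx + size]
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and 0 <= x and y + size <= prev_frame.shape[0] \
                    and x + size <= prev_frame.shape[1]:
                candidate = prev_frame[y:y + size, x:x + size]
                cost = float(np.abs(candidate - target).sum())  # sum of abs differences
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best   # (cost, dx, dy): dx, dy is the motion vector

rng = np.random.default_rng(0)
prev_frame = rng.integers(0, 256, (64, 64)).astype(float)
new_frame = np.roll(prev_frame, shift=(0, 4), axis=(0, 1))  # scene shifted right by 4
print(find_motion_vector(prev_frame, new_frame, bx=24, by=24))  # cost 0, dx = -4, dy = 0
```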
23. Example
- All we need to encode in the second frame is that the square has moved and that its old location is taken from some other blank area; the error-correction term can even handle the rotation
24. How frames are put into video files
- First idea: encode the first frame explicitly, encode the second based on the first, the third based on the second, etc., indefinitely
- Problem: with this scheme, all playback must start at the beginning of the file!
- So we periodically insert I-frames, frames fully specified without reference to any other frame (an I-frame is essentially a JPEG)
- Between the I-frames are the P-frames, which are encoded w/r/t the preceding frame
25. An Infelicity
- There's no good way to encode the new frame from the old one. But if the square continues moving to the right, it's easy to encode the new frame based on the upcoming frame.
26. Next improvement: To know the present, use the future
- Some situations are not handled well by this encoding, e.g., when objects appear, are partially obscured, or change background
- We cope by introducing B-frames: frames each of whose blocks can be specified using either the preceding or the upcoming frame
- "Preceding" and "upcoming" frames refer only to I- or P-frames; B-frames are not specified w/r/t other B-frames
- (Newer standards are even more general)
27. Frame sequencing
- Consider the following (typical) frame sequence: I1 B1 B2 P1 B3 B4 P2 B5 B6 I2 . . .
- Here:
- I1 and I2 are fully specified
- P1 depends on I1; P2 depends on P1
- B1 and B2 depend on I1 and P1
- B3 and B4 depend on P1 and P2
- B5 and B6 depend on P2 and I2
28. Frame sequencing, cont.
- Display order: I1 B1 B2 P1 B3 B4 P2 B5 B6 I2 . . .
- When it's time to display B1, we've seen only I1. But B1 may also depend on P1!
- To cope, the frames are arranged in a different order in the actual video file: I1 P1 B1 B2 P2 B3 B4 I2 B5 B6 . . .
- Each frame follows all frames it depends on (see the sketch below)
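A minimal sketch of how a decoder turns file (decode) order back into display order, assuming the simple I/P/B pattern above; newer standards allow more elaborate dependencies:

```python
def display_order(decode_order):
    # Hold back each I- or P-frame until the B-frames that depend on it
    # (which follow it in the file) have been emitted.
    shown, held = [], None
    for frame in decode_order:
        if frame.startswith("B"):
            shown.append(frame)          # B-frames are shown immediately
        else:
            if held is not None:
                shown.append(held)       # the previous reference frame is now due
            held = frame                 # hold the new I/P frame back
    if held is not None:
        shown.append(held)
    return shown

file_order = ["I1", "P1", "B1", "B2", "P2", "B3", "B4", "I2", "B5", "B6"]
print(display_order(file_order))
# ['I1', 'B1', 'B2', 'P1', 'B3', 'B4', 'P2', 'B5', 'B6', 'I2']
```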