Audio, Images, and Video - PowerPoint PPT Presentation

Provided by: seasHa

1
Audio, Images, and Video
  • Efficient representation of media
  • Larry Denenberg

2
Epitaph
John Backus, inventor of FORTRAN
1924-2007
3
Epigraph
  • Prof. Denenberg showed early promise as a
    riveting lecturer who had done a lot of thinking
    about higher education and would be a great
    lecturer. Apparently, however, Prof. Denenberg
    failed to realize that the CS121 style of
    teaching -- read off your own notes that you
    simultaneously put up on the overhead projector
    and hand out to the students -- is the absolute
    hands-down BEST way to reduce students to
    stupefied, uninterested, uncaring blobs, as well
    as the world's least effective way of encouraging
    students to come to class.

4
The Problem
  • People want to listen to music, look at pictures,
    and watch videos
  • Straightforward high-quality representation of
    audio/images/video uses LOTS of space
  • Cellphones, iPods, etc., have limited space
  • Networks have limited bandwidth
  • Hence: We need to represent these media efficiently
  • (Small slice of an immense topic)

5
Review: Lossy vs. Lossless Compression
  • Lossless compression: Original can be retrieved
    perfectly, bit for bit
  • Lossy compression: Information is lost; the
    original cannot be recovered precisely
  • When is loss acceptable?
  • Parametrizable loss: How bad can it be?
  • Rest of this talk: Lossy compression
  • Info on lossless compression: LD ch. 5

6
Transforms: What is a transform?
  • A transform is a way of changing the
    representation of data, typically losslessly
  • We use transforms to put data into a more
    convenient form
  • For example, a transform might rearrange data to
    group together "useful" components
  • Example: The Fourier transform, which decomposes
    a signal into sine waves of various
    frequencies---some not so useful
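
The Fourier decomposition mentioned above can be sketched in a few lines. This is the naive O(N²) DFT, for illustration only; real systems use the much faster FFT. The energy of a two-tone test signal concentrates in exactly the two "useful" frequency bins:

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform (O(N^2), illustrative only)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

# A signal built from two pure tones: 3 and 5 cycles per window.
N = 64
signal = [math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 5 * t / N)
          for t in range(N)]

spectrum = [abs(x) for x in dft(signal)]
# The transform groups the "useful" components: energy sits in bins 3 and 5.
peaks = sorted(range(N // 2), key=lambda k: -spectrum[k])[:2]
print(sorted(peaks))  # [3, 5]
```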

7
Audio
  • Gold standard is CD quality: 2-channel 16-bit
    PCM encoding at 44.1 kHz sampling
  • Multiply it out to get about 10.6 MB per minute
  • Recall that even this is an approximation!
  • Digression on CDs: Where is the music?
  • Key to (lossy) compression: Limitations of human
    hearing
  • E.g., the ear hears nothing above 21 kHz
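
The "multiply it out" step is just arithmetic:

```python
# CD-quality PCM: 2 channels x 16 bits (2 bytes) x 44,100 samples per second.
SAMPLE_RATE = 44_100        # samples per second per channel
BYTES_PER_SAMPLE = 2        # 16-bit PCM
CHANNELS = 2

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
bytes_per_minute = bytes_per_second * 60
print(bytes_per_minute)     # 10584000 bytes, i.e. about 10.6 MB per minute
```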

8
The mp3 strategy
  • Transform to the frequency domain: filter the
    music into 32 bands chosen based on the frequency
    response of the human ear
  • Use psychoacoustic modelling to determine how
    much each band contributes to sound perception
  • Allocate bits to bands according to importance,
    i.e., encode important bands more accurately
    (more bits) than others

9
Filter input into frequency bands
  • Recall that any waveform can be written as a
    combination of sine waves (pure tones) of various
    frequencies and amplitudes
  • We can filter the input music into components,
    each of which is made up of a narrow range
    (band) of frequencies
  • Each band is encoded separately; they are
    combined when the music is played
  • Digression: the mp3 standard specifies only
    decoding, not encoding

10
Psychoacoustic modelling
  • When two tones are close in frequency, the louder
    masks the softer; that is, the softer is much
    less audible than a tone of the same volume at a
    very different frequency
  • A pure tone is not as effective at masking as
    noise is, and is itself masked more easily
  • Soft sounds just before or just after loud sounds
    are masked
  • All these factors are considered to calculate the
    signal-to-mask ratio for each band; a higher SMR
    means a more important band

11
Bit allocation
  • With mp3, the bit rate is fixed in advance (e.g.
    128 Kbps, 192 Kbps); a lower bit rate means more
    compression and lower quality
  • So we know a priori how many bits we can use
    (contrast with, e.g., Huffman encoding)
  • We use the results of psychoacoustic modelling to
    apportion these bits among the frequency bands:
    more bits means more accurate coding and more
    faithful reproduction
  • VBR encoding tries to improve this
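
As a toy illustration (not the real mp3 algorithm, which uses nested rate/distortion loops), SMR-driven allocation can be sketched greedily: repeatedly give one more bit to the band whose quantization noise is still furthest above its mask, using the standard rule of thumb that each bit of resolution buys about 6 dB of accuracy. The SMR numbers below are hypothetical:

```python
import heapq

def allocate_bits(smr, total_bits):
    """Greedy sketch of SMR-driven bit allocation (simplified, not real mp3).

    smr: signal-to-mask ratio in dB for each band (higher = more important).
    Each allocated bit is assumed to lower quantization noise by ~6 dB.
    """
    bits = [0] * len(smr)
    # Priority = how far the band's noise still exceeds its mask (negated for min-heap).
    heap = [(-s, i) for i, s in enumerate(smr)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        _, i = heapq.heappop(heap)          # band with the worst remaining deficit
        bits[i] += 1
        heapq.heappush(heap, (-(smr[i] - 6.0 * bits[i]), i))
    return bits

# Three bands; the first is most perceptually important (hypothetical SMRs).
bits = allocate_bits([30.0, 12.0, 6.0], total_bits=8)
print(bits)  # most bits go to the most important band
```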

12
Images
  • RGB representation: describe each color with 24
    bits, 8 each for red, green, and blue
  • Write a color as 6 hex digits: 0xRRGGBB
  • An image is represented by pixels (picture
    elements); each encodes a tiny colored dot
  • So a 4-megapixel image (2272 x 1704) takes about
    12 MB of storage
  • Recall that even this is an approximation!
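
The storage figure follows directly from the pixel count:

```python
WIDTH, HEIGHT = 2272, 1704      # a 4-megapixel image
BYTES_PER_PIXEL = 3             # 24-bit RGB

size_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
print(size_bytes)               # 11614464 bytes, i.e. about 12 MB uncompressed
```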

13
(No Transcript)
14
Simple compression techniques
  • Reduce resolution by subsampling
  • Fewer bits per color: Use 5 bits for R and B, 6
    bits for G, to save 1/3 of the space (our eyes
    perceive G better than R or B)
  • Fewer colors: Use 8 bits to encode a color
    palette of 256 carefully chosen colors
  • Make a table of 256 colors, image-specific
  • Encode each color as an index into this table
  • GIF and PNG format images work this way
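
The 5-6-5 scheme above can be sketched as bit packing; note that the low bits of each channel are simply discarded, which is what makes it lossy:

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit R, G, B into 16 bits: 5 bits R, 6 bits G, 5 bits B.
    Green keeps an extra bit because the eye is most sensitive to it."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def unpack_rgb565(v):
    """Recover approximate 8-bit channels (the dropped low bits are lost)."""
    r = (v >> 11) << 3
    g = ((v >> 5) & 0x3F) << 2
    b = (v & 0x1F) << 3
    return r, g, b

packed = pack_rgb565(0xCA, 0xFE, 0x42)
print(hex(packed), unpack_rgb565(packed))  # 0xcfe8 (200, 252, 64)
```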

15
Dithering
  • A region colored with sufficiently small pixels
    of two different colors is perceived by the eye
    as uniformly colored
  • With careful selection of the colors and the
    densities of each, it is possible to imitate
    colors not in the palette
  • Black and white dithering to get shades of grey
    is called halftoning
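
A minimal halftoning sketch, using one-dimensional error diffusion (a stripped-down cousin of the usual Floyd-Steinberg algorithm, which diffuses error in two dimensions): each gray pixel is forced to black or white, and the rounding error is pushed onto the next pixel, so the *density* of white pixels matches the original shade:

```python
def dither_row(gray_row):
    """1-D error diffusion: quantize each pixel (0-255) to black (0) or
    white (255), carrying the quantization error to the next pixel."""
    out = []
    error = 0.0
    for g in gray_row:
        v = g + error
        q = 255 if v >= 128 else 0
        error = v - q
        out.append(q)
    return out

# A flat 25%-gray region: about a quarter of the pixels come out white,
# so from a distance the region is perceived as that in-between shade.
row = dither_row([64] * 100)
print(row.count(255))  # 25
```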

16
Dithering examples (from Wikipedia)
17
JPEG step 1: Chroma subsampling
  • Transform color space from RGB to YCbCr
  • Y = brightness, a measure of black vs. white
  • Cb = blue chroma, a measure of blueness
  • Cr = red chroma, a measure of you-guess-what
  • The eye is more sensitive to brightness than to
    color, so we reduce the resolution of Cb and Cr
    while maintaining the resolution of Y
  • JPEG subsampling is 4:2:0, meaning half as many
    Cb/Cr samples in each dimension
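
The RGB-to-YCbCr step uses the standard full-range JPEG (BT.601) coefficients; a neutral gray has zero chroma, so Cb and Cr land at the midpoint value 128:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range JPEG (BT.601) RGB -> YCbCr conversion."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# Pure gray carries no color information: Cb and Cr sit at neutral 128.
print(rgb_to_ycbcr(200, 200, 200))
```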

18
JPEG step 2: Brightness transform
  • Differences in brightness are easy to see over
    large regions, but rapidly varying brightness
    over small areas is imperceptible
  • Therefore: Transform the brightness information
    to separate variations at "high frequency" from
    those at "low frequency"
  • This is analogous to separating music into
    frequency components and throwing away
    frequencies too high for the ear to hear
  • Resulting data is losslessly encoded

19
JPEG examples (Wikipedia again)
File sizes: 83 KB, 15 KB, 5 KB, 1.5 KB
20
Video
  • A video stream is a sequence of frames
  • HDTV is 1920 pixels horizontally by 1080 lines
    vertically at 30 frames per second
  • At 24 bits/pixel, this is over 11 GB/minute, and
    that doesn't even include the audio!
  • Recall that even this is an approximation!
  • We can cut this to about 30 MB/minute including
    audio and other data
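
The raw-HDTV figure, multiplied out:

```python
WIDTH, HEIGHT = 1920, 1080      # HDTV frame
BYTES_PER_PIXEL = 3             # 24-bit color
FPS = 30

bytes_per_minute = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS * 60
print(bytes_per_minute)         # 11197440000 bytes, i.e. over 11 GB per minute
```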

21
Key to (lossy) compression: Frames are similar
  • Often only a small portion of the screen changes
    during 1/30 of a second
  • So: In each frame, encode only the pieces that
    have changed since the preceding frame

22
MPEG improvement: Handling motion efficiently
  • Encode frames in 16x16 blocks of pixels
  • Each block is either
  • explicitly specified
  • unchanged from the preceding frame
  • set equal to some block somewhere in a preceding
    frame (i.e., with a motion vector)
  • An error correction can also be specified when a
    new block is similar to an old one
  • Quickly finding the best encoding of a block is
    the hard problem of MPEG encoding
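
Finding a motion vector can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) over nearby offsets; the toy 6x6 frame and 2x2 blocks below are hypothetical (real MPEG blocks are 16x16, and real encoders use much cleverer search patterns, which is why this is the hard part):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally-sized pixel blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
                          for a, b in zip(ra, rb))

def best_motion_vector(prev_frame, block, top, left, search=2):
    """Exhaustive search: try every offset within `search` pixels of
    (top, left) in the previous frame; keep the best-matching block."""
    n = len(block)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(prev_frame) - n and 0 <= x <= len(prev_frame[0]) - n:
                candidate = [row[x:x + n] for row in prev_frame[y:y + n]]
                cost = sad(block, candidate)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best

# Previous frame: a bright 2x2 square at (1, 1) on a dark background.
prev = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        prev[y][x] = 255
# In the current frame the square sits at (1, 3): a perfect match is found
# two pixels to the left, so the block is encoded as a motion vector.
block = [[255, 255], [255, 255]]
print(best_motion_vector(prev, block, top=1, left=3))  # (0, 0, -2): cost 0
```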

23
Example
  • All we need encode in the second frame is that
    the square has moved and that its old location is
    taken from some other blank area; the error
    correction term can even handle the rotation

24
How frames are put into video files
  • First idea Encode the first frame explicitly,
    encode the second based on the first, the third
    based on the second, etc. indefinitely
  • Problem: With this scheme, all playback must
    start at the beginning of the file!
  • So we periodically insert I-frames, frames fully
    specified without reference to any other frame
    (an I-frame is essentially a JPEG)
  • Between the I-frames are the P-frames, which are
    encoded w/r/t the preceding frame

25
An Infelicity
  • There's no good way to encode the new frame from
    the old one. But if the square continues moving
    to the right, it's easy to encode the new frame
    based on the upcoming frame.

26
Next improvement: To know the present, use the
future
  • Some situations are not handled well by this
    encoding, e.g. when objects appear or are
    partially obscured or change background
  • We cope by introducing B-frames: frames each of
    whose blocks can be specified using either the
    preceding or the upcoming frame
  • "Preceding" and "upcoming" frames refer only to
    I- or P-frames; B-frames are not specified w/r/t
    other B-frames
  • (Newer standards are even more general)

27
Frame sequencing
  • Consider the following (typical) frame sequence:
    I1 B1 B2 P1 B3 B4 P2 B5 B6 I2 . . .
  • Here:
  • I1 and I2 are fully specified
  • P1 depends on I1; P2 depends on P1
  • B1 and B2 depend on I1 and P1
  • B3 and B4 depend on P1 and P2
  • B5 and B6 depend on P2 and I2

28
Frame sequencing, cont.
  • I1 B1 B2 P1 B3 B4 P2 B5 B6 I2 . . .
  • When it's time to display B1, we've seen only I1.
    But B1 may also depend on P1!
  • To cope, the frames are arranged in a different
    order in the actual video file: I1 P1 B1 B2
    P2 B3 B4 I2 B5 B6 . . .
  • Each frame follows all frames it depends on
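
The reordering rule ("each frame follows all frames it depends on") can be sketched directly: hold each B-frame back until the I- or P-frame it looks forward to has been emitted. The string labels are just illustrative frame names:

```python
def stream_order(display_frames):
    """Reorder frames from display order to file order: every B-frame is
    buffered until the next I- or P-frame (its forward reference) is out."""
    out, pending_b = [], []
    for frame in display_frames:
        if frame.startswith("B"):
            pending_b.append(frame)      # must wait for its forward reference
        else:                            # I- or P-frame: safe to emit now
            out.append(frame)
            out.extend(pending_b)        # dependent B-frames follow it
            pending_b = []
    return out

display = ["I1", "B1", "B2", "P1", "B3", "B4", "P2", "B5", "B6", "I2"]
print(stream_order(display))
# ['I1', 'P1', 'B1', 'B2', 'P2', 'B3', 'B4', 'I2', 'B5', 'B6']
```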