Title: Multi-Media Computing
1 Multi-Media Computing
2 Expected outcome after this lecture
- After this lecture, students are expected to be able to:
  - Explain how audio/video are stored digitally
  - Perform calculations on the storage requirement for uncompressed audio/video, as well as bitrate and compression ratio
  - Explain the differences and respective advantages of lossy and lossless compression
  - Name commonly used media formats for audio, image, and video
  - Suggest commonly used audio/video configurations (resolution, color depth, sampling frequency, etc.)
3 Multimedia components
- Typical computer systems represent information by means of text and graphics.
- Multimedia, apart from text and graphics, also consists of:
  - Audio
  - Still image
  - Video sequence
  - Computer animation (e.g. 3D)
4 Audio - How can audio be stored in a computer?
- Audio is originally an analog waveform signal
- Analog data cannot be stored in a computer directly.
- Two steps to change an analog signal to digital:
  - Sampling
  - Quantization
5 Sampling
- Measure the magnitude of the audio signal multiple times per second, at a fixed interval
- The audio signal now has discrete timing and is called Pulse-Amplitude Modulation (PAM)
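As a rough illustration of sampling, the sketch below measures a signal at a fixed interval. The function names, the 1000 Hz test tone, and the 8000 Hz rate are assumptions chosen for the example, not part of the lecture.

```python
import math


def sample_signal(signal, sample_rate_hz, duration_s):
    """Measure `signal` (a function of time in seconds) at a fixed interval."""
    interval = 1.0 / sample_rate_hz
    count = int(round(duration_s * sample_rate_hz))
    return [signal(n * interval) for n in range(count)]


def tone_1khz(t):
    # Hypothetical test signal: a 1000 Hz sine wave with amplitude 1.
    return math.sin(2 * math.pi * 1000 * t)


# 2 ms of audio sampled 8000 times per second yields 16 discrete-time values.
pam = sample_signal(tone_1khz, sample_rate_hz=8000, duration_s=0.002)
print(len(pam))  # 16
print([round(v, 3) for v in pam[:5]])
```

The values are still real numbers at this point; only the timing has become discrete.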
6 Sampling: How many times per second?
- The optimal sampling rate depends on the characteristics of the audio signal.
  - Too small: bad psychoacoustic perception
  - Too large: unnecessary waste of storage space
7 Sampling: How many times per second?
- To capture all audible sound (up to about 20 kHz), the sampling frequency must be more than twice the highest frequency; 44100 Hz, the sampling rate of Audio CD, satisfies this
- Some very high quality (studio quality) audio uses a sampling rate of 48000 Hz or 96000 Hz
- For voice/speech signals, a smaller sampling rate is good enough (e.g. telephone uses 8000 Hz)
8 Second step: Quantization
- Sampling is the first step of analog -> digital conversion
- The second step is quantization, which maps the magnitude of the PAM signal to the nearest valid number
  - (e.g. 36.45123 to 36, -124.5678 to -125)
- Finally, represent the digital signal in binary form, which is called Pulse-Code Modulation (PCM)
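The two steps on this slide can be sketched as follows. This is a minimal example that assumes the valid numbers are integers in a signed 16-bit range and that the binary form is little-endian; the helper names `quantize` and `to_pcm16` are hypothetical.

```python
import math
import struct


def quantize(sample, bits=16):
    """Map a sample to the nearest integer level, clamped to a signed `bits`-bit range."""
    lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    level = math.floor(sample + 0.5)  # round to nearest, halves go up
    return max(lowest, min(highest, level))


def to_pcm16(samples):
    """Pack quantized samples as little-endian signed 16-bit integers (an assumed layout)."""
    levels = [quantize(s) for s in samples]
    return struct.pack("<%dh" % len(levels), *levels)


levels = [quantize(s) for s in (36.45123, -124.5678)]
print(levels)  # [36, -125]

pcm = to_pcm16([36.45123, -124.5678])
print(len(pcm))  # 4 bytes: two samples of 2 bytes each
```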
9 Audio samples
- How many bits are used to store a single sample?
  - CD uses 16 bits (65536 steps, high quality)
  - Lower quality music uses 8 bits (256 steps) per sample
  - Speech samples may use as few as 6 bits
- Number of channels
  - Audio CD is stereo (2 channels: Left, Right)
  - Telephone is mono (1 channel)
10 Audio samples
- Some high definition audio uses 5.1 channels
  - Left, Right, Center, Left Surround, Right Surround
  - And a bass enhancement channel (LFE, Low-Frequency Effects)
- Newer sound cards even support 7.1 channels
  - 5.1 plus two additional surround channels (rear left and rear right)
11 How much storage space is required?
- Consider a 3-minute stereo music signal with 16 bits per sample and a sampling rate of 44100 Hz.
- 16 bits = 2 bytes per sample
- 1 second uses 44100 x 2 channels x 2 bytes = 176,400 bytes
- 3 minutes = 176,400 x 60 x 3 = 31,752,000 bytes = 31.75 MB
- (Note: MB is megabyte, Mb is megabit)
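The same calculation can be written as a small function. This is a sketch; the function name and parameter names are illustrative, and 1 MB is taken as 1,000,000 bytes, as on the slide.

```python
def pcm_size_bytes(sample_rate_hz, channels, bits_per_sample, duration_s):
    """Storage needed for uncompressed PCM audio, in bytes."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * channels * bytes_per_sample * duration_s


size = pcm_size_bytes(sample_rate_hz=44100, channels=2, bits_per_sample=16, duration_s=3 * 60)
print(size)        # 31752000
print(size / 1e6)  # 31.752 (MB)
```

Changing the arguments gives other cases, for example one second of 8-bit mono telephone audio at 8000 Hz needs 8000 bytes.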
12 Audio Compression
- Uncompressed audio data takes up a large amount of space (3-minute music: about 32 MB!)
- Compression is used to reduce the amount of space required (especially for hand-held devices)
- Compression ratio
  - Ratio between the original size and the compressed size
  - e.g. a 32 MB PCM signal compressed to 8 MB
  - Compression ratio = 32 : 8 = 4 : 1
13 Lossless audio compression
- Audio compression can be lossless or lossy
- Lossless: the decompressed audio signal is 100% identical to the original one
- Advantage
  - Best audio quality
- Disadvantage
  - The saving (percentage) in file size is not large.
  - Typical compression ratio is about 2:1 to 3:1
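The defining property of lossless compression, a bit-identical round trip, can be checked directly. The sketch below uses Python's general-purpose `zlib` compressor as a stand-in, which is an assumption made for illustration; it is not an audio codec, and the synthetic 440 Hz tone used as input is far more repetitive than real music, so the ratio it prints is not representative.

```python
import math
import struct
import zlib

# Hypothetical input: one second of a 440 Hz tone, 16-bit mono PCM at 8000 Hz.
samples = [int(10000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(8000)]
original = struct.pack("<%dh" % len(samples), *samples)

compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

# Lossless: every byte of the original comes back unchanged.
print(restored == original)  # True

ratio = len(original) / len(compressed)
print("compression ratio %.2f : 1" % ratio)
```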
14 Lossy compression
- The decompressed audio signal is not identical to the original one
- But the differences are almost undetectable by the human ear (the psychoacoustic perception is almost the same)
- Higher compression ratio (usually 10:1 to 36:1)
- Allows specifying the final compressed file size at the cost of audio quality.
15 Bitrate
- Compressed audio/video is usually described by its bitrate rather than its compression ratio
- Bitrate: the number of bits per second of audio/video
- e.g. for stereo PCM data (44100 Hz, 16 bit)
  - Bitrate = 44100 x 2 channels x 16 bits = 1,411,200 bps (bits per second) = 1411 kbps or 1.4 Mbps
- If the audio signal is compressed to MP3 at 128 kbps: compression ratio = 1411 : 128 = 11.0 : 1
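The bitrate arithmetic above can be reproduced in a few lines. This is a sketch with illustrative names; it assumes 1 kbps = 1000 bps, and the 3-minute duration in the last step is borrowed from the earlier storage example.

```python
def pcm_bitrate_bps(sample_rate_hz, channels, bits_per_sample):
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * channels * bits_per_sample


pcm_bps = pcm_bitrate_bps(44100, 2, 16)
mp3_bps = 128_000  # 128 kbps

print(pcm_bps)                      # 1411200
print(round(pcm_bps / mp3_bps, 1))  # 11.0

# A fixed bitrate also fixes the file size: bits per second x seconds / 8.
mp3_bytes = mp3_bps * 180 // 8
print(mp3_bytes)                    # 2880000
```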
16 Basics for lossy audio compression
- Audio data is basically a waveform.
- The audio data is said to be in the time domain
- The basic idea of audio compression is to change audio from the time domain to the frequency domain through mathematical manipulation
[Figure: the same signal shown in the time domain (axis: time, 0 ms to 2 ms) and in the frequency domain (axis: frequency, 1000 Hz)]
17 Mathematical transform
- The mathematical operation which changes a signal from the time domain to the frequency domain is called a transform
- A transform guarantees one-to-one mapping and is usually reversible
- After the transform, the audio signal can be represented in a more compact manner.
- Two most common transforms
  - DCT (Discrete Cosine Transform)
  - DWT (Discrete Wavelet Transform)
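To make the "more compact" and "reversible" points concrete, here is a minimal pure-Python sketch of a one-dimensional DCT (the orthonormal DCT-II) and its inverse. The 16-sample cosine used as input is a hypothetical signal chosen so that it matches a single DCT basis function; real audio codecs use more elaborate variants.

```python
import math


def dct(x):
    """Orthonormal DCT-II: time-domain samples -> frequency-domain coefficients."""
    size = len(x)
    coeffs = []
    for k in range(size):
        scale = math.sqrt((1 if k == 0 else 2) / size)
        total = sum(v * math.cos(math.pi * (n + 0.5) * k / size) for n, v in enumerate(x))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct(): frequency-domain coefficients -> time-domain samples."""
    size = len(coeffs)
    samples = []
    for n in range(size):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1 if k == 0 else 2) / size)
            total += scale * c * math.cos(math.pi * (n + 0.5) * k / size)
        samples.append(total)
    return samples


# Hypothetical signal: 16 samples of a cosine matching DCT basis function k = 3.
signal = [math.cos(math.pi * (n + 0.5) * 3 / 16) for n in range(16)]

coeffs = dct(signal)
significant = [k for k, c in enumerate(coeffs) if abs(c) > 1e-9]
print(significant)  # [3]: 16 samples are described by one coefficient

restored = idct(coeffs)
print(max(abs(a - b) for a, b in zip(signal, restored)) < 1e-9)  # True
```

A lossy codec would go one step further and discard or coarsely quantize the coefficients that matter least to perception, which is where the loss comes from.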
18 Commonly used audio formats
- MPEG I/II (for high fidelity audio)
  - Layer I .mp1 (for satellite broadcast)
  - Layer II .mp2 (audio in VCD)
  - Layer III .mp3 (audio transfer / sharing on the internet)
- Real Audio (.ra) (for high fidelity audio and speech)
  - For streaming audio on the internet
- Windows Media Audio (.wma)
- Ogg Vorbis (.ogg): open source format
- Monkey's Audio (.ape): lossless audio compression
19 Still Images
- An image is represented as a collection of color dots (pixel = picture element)
- Pixel depth: number of bits used to represent a pixel
  - From 1 bit (monochrome) up to 24 bits (true color)
- For 24-bit images, a pixel is divided into 3 channels
  - 8 bits each for Red / Green / Blue
- But for image compression, it is common practice to convert the image from RGB to YUV or YCbCr
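The slide does not say which RGB to YCbCr variant is meant. As one possibility, the sketch below uses the full-range conversion commonly associated with JPEG files (JFIF, based on the BT.601 weights); treat the choice of coefficients and the function name as assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF-style coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round to integers and clamp to the 8-bit range 0..255.
    return tuple(max(0, min(255, int(round(v)))) for v in (y, cb, cr))


print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128): white has no color offset
print(rgb_to_ycbcr(0, 0, 0))        # (0, 128, 128)
print(rgb_to_ycbcr(255, 0, 0))      # (76, 85, 255)
```

Y carries brightness and Cb/Cr carry color, which lets a codec spend fewer bits on the color channels.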
20 Lossless Image compression
- Similar to audio compression, image compression can be lossy or lossless.
- Lossless formats
  - .BMP (Windows Bitmap)
  - .PCX (ZSoft PC Paintbrush)
  - .GIF (supports only up to 256 colors)
    - Supports transparency and animation
    - Extensively used on the internet
  - .PNG (Portable Network Graphics)
    - As a replacement for GIF
    - The format itself is lossless; loss occurs only if the colors are reduced before encoding
21 Lossy Image compression
- Similar psychovisual perception at very high compression ratios.
- Common lossy formats
  - JPEG (Joint Photographic Experts Group)
  - JPEG2000 (the newer JPEG standard)
- Similar to audio compression, lossy image compression also makes use of a mathematical transform to change to the frequency domain
  - Audio: audio samples (time domain) -> frequency domain
  - Image: pixels (spatial domain) -> frequency domain
- The difference is that the transform is two-dimensional
22 Image compression: Mathematical Transform (2D) (for reference)
- Similar to audio compression, spatial domain pixels are converted to the frequency domain by either
  - DCT (Discrete Cosine Transform) (e.g. JPEG)
  - DWT (Discrete Wavelet Transform) (e.g. JPEG2000)
- The reason for the transform is that information can be coded in a more compact manner after the transform
23 Video Compression
- Video is just a sequence of images (frames) shown one after another very quickly.
- Video compression is similar to repeating image compression multiple times
- One major difference
  - There are similarities between frames
  - For example, if the video shows a car running along a road in the countryside
    - The background (sky, mountain, sun) is unchanged
    - The car in later frames is the same as the car in previous frames; the only difference is its position.
24 Video Compression
- Better compression efficiency can be achieved if we copy the car from one frame to another with its position adjusted.
- The process of searching for the new position (e.g. of the car) by comparing old and new frames is called motion estimation
- Motion estimation is the major difference between video and still image compression.
- Instead of coding the original image, one only needs to code the prediction error (called the residue)
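One simple way to do motion estimation is exhaustive block matching, sketched below with the sum of absolute differences (SAD) as the matching cost. The 6x6 frames, the 2x2 "car", and all function names are hypothetical; real encoders use larger blocks and faster search strategies.

```python
def sad(frame, top, left, block):
    """Sum of absolute differences between `block` and the same-sized region of `frame`."""
    return sum(
        abs(frame[top + r][left + c] - block[r][c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def estimate_motion(prev_frame, cur_frame, top, left, size, search=2):
    """Find the best match in prev_frame for the block at (top, left) in cur_frame.

    Returns (dy, dx, error): the offset into prev_frame and its SAD.
    """
    block = [row[left:left + size] for row in cur_frame[top:top + size]]
    height, width = len(prev_frame), len(prev_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if 0 <= t <= height - size and 0 <= l <= width - size:
                err = sad(prev_frame, t, l, block)
                if best is None or err < best[2]:
                    best = (dy, dx, err)
    return best


def make_frame(car_left):
    # Hypothetical 6x6 frame: background 0, a 2x2 "car" of value 9 on rows 2-3.
    frame = [[0] * 6 for _ in range(6)]
    for r in (2, 3):
        for c in (car_left, car_left + 1):
            frame[r][c] = 9
    return frame


prev_frame = make_frame(1)  # car at columns 1-2
cur_frame = make_frame(3)   # car has moved 2 pixels to the right

dy, dx, err = estimate_motion(prev_frame, cur_frame, top=2, left=3, size=2)
print(dy, dx, err)  # 0 -2 0: the block came from 2 pixels to the left

# Residue: current block minus the block predicted from the previous frame.
residue = [
    [cur_frame[2 + r][3 + c] - prev_frame[2 + dy + r][3 + dx + c] for c in range(2)]
    for r in range(2)
]
print(residue)  # [[0, 0], [0, 0]]
```

Here the prediction is perfect, so only the motion vector needs to be stored; in real video the residue is small but nonzero and is what gets compressed.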
25 Typical video specification
- The two most common specs
  - PAL (Phase Alternating Line)
    - Frame resolution: VCD (352x288), DVD (704x576) pixels
    - 25 frames per second
  - NTSC (National Television System Committee)
    - Frame resolution: VCD (352x240), DVD (704x480) pixels
    - About 30 frames per second (29.97)
- For VCD/DVD, video typically uses 24-bit color depth
- HDTV provides even higher resolution
  - 720p (1280x720)
  - 1080p (1920x1080) (so-called Full High-Definition)
  - Ultra-HDTV (7680x4320) (experimental, planned to begin in 2015)
26 How much storage space is required for 30 seconds of uncompressed clip (no sound)?
- Assume PAL VCD format (352 x 288 pixels, 25 frames per second)
- 1 pixel = 24 bits = 24/8 = 3 bytes
- 1 frame = 352 x 288 x 3 bytes = 304,128 bytes
- 1 second = 304,128 x 25 = 7,603,200 bytes
- 30 seconds = 7,603,200 x 30 = 228,096,000 bytes = 228.1 MB
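The steps above can be wrapped in a small function. This is a sketch; the function and parameter names are illustrative.

```python
def raw_video_bytes(width, height, bits_per_pixel, fps, duration_s):
    """Storage needed for uncompressed video without sound, in bytes."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps * duration_s


clip = raw_video_bytes(width=352, height=288, bits_per_pixel=24, fps=25, duration_s=30)
print(clip)  # 228096000
```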
27 Bitrate and compression ratio
- Similar to audio, video is usually described by its bitrate
- For example, the bitrate of the VCD standard is 1.5 Mbps
- Continuing from the previous page:
  - given that the audio of the 30-second clip uses 5,760,000 bytes, what is the compression ratio if the clip is compressed in VCD format?
28 Compression ratio?
- Video: 228,096,000 bytes; Audio: 5,760,000 bytes
- Total = 233,856,000 bytes = 233.856 MB = 1870.848 Mb
- VCD = 1.5 Mbps
- 30 sec = 1.5 x 30 = 45 Mb
- Compression ratio = 1870.848 : 45 = 41.57 : 1.0
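The worked answer can be checked with a few lines of arithmetic. This sketch assumes 1 Mb = 1,000,000 bits, as the slide does; the variable names are illustrative.

```python
video_bytes = 228_096_000
audio_bytes = 5_760_000

# Uncompressed size of the 30-second clip, in bits.
total_bits = (video_bytes + audio_bytes) * 8

# Size of the same clip at the VCD bitrate of 1.5 Mbps.
vcd_bits = 1_500_000 * 30

ratio = total_bits / vcd_bits
print(total_bits)       # 1870848000
print(round(ratio, 2))  # 41.57
```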
29 Common video formats
- .MPG / .MPEG / .MPV (MPEG Video)
- .RM / .RMVB (Real Media)
- .MOV (Quicktime Movie)
- .WMV / .ASF (Windows Media)
- .AVI (Audio Video Interleave)
- .MKV (Matroska Video)
- .FLV (Flash Video)