Image and Video Compression: A presentation to Avocent

1
Image and Video Compression: A presentation to Avocent
  • Noel O'Connor, Andrew Kinane, Daniel Larkin
  • 19/09/2006

2
Overview
  • Lossless Compression
  • Entropy coding: a brief review
  • Huffman Coding
  • Arithmetic Coding
  • Lossless Compression Standards
  • The FAX Group Standards, JBIG, Lossless JPEG
  • Lossy Compression
  • Generic Codec Structure
  • DCT/IDCT
  • Quantization
  • Motion Estimation
  • Motion Compensation
  • Lossy Compression Standards
  • JPEG, JPEG2000, H.261 / H.263 / H.264,
    MPEG-1/-2/-4
  • Image Analysis Techniques
  • Visual Feature Extraction

3
Lossless Compression: Entropy Coding
4
Entropy Coding
  • Also referred to as source coding
  • Assign each symbol a binary code-word
  • i.e. allocate a specific string of bits to each symbol
  • Based on information theory
  • S = {s1, ..., sN} is the set of symbols to encode, with probabilities p1, ..., pN
  • Entropy H(s) = -Σ pi log2 pi is a measure of the information content (bits/symbol)
  • Specifies a lower bound on the average number of bits/symbol achievable by any lossless code
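A minimal Python sketch of the entropy measure above (the function name `entropy` is our own). It returns the lower bound, in bits/symbol, that any lossless code for the source must respect.

```python
import math


def entropy(probs):
    """Entropy H(s) = -sum(p_i * log2(p_i)) in bits/symbol.

    Symbols with zero probability contribute nothing (p log p -> 0 as p -> 0).
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Four equiprobable symbols need 2 bits/symbol; a skewed source needs fewer.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```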

5
Huffman Coding
  • A form of Variable Length Coding
  • Assign shorter code-words to the symbols most likely to occur, longer ones to those less likely
  • Problem: code-words must be chosen carefully!
  • They must obey the prefix condition (no code-word is a prefix of another) so the decoder can parse the bitstream

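Figure: the sequence s1, s4, s3, s2 is encoded as the bitstream 1 0 1 0 0 1 1 0 1; because the code violates the prefix condition, the decoder's parse (s1, s4, s3, s2, s1, ...) becomes ambiguous: s2 or s4?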
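The prefix condition can be checked mechanically, and a decoder for a prefix-free code only needs to read bits until its buffer matches a code-word. A short Python sketch; the four-symbol code used below is a hypothetical example, not the code from the figure.

```python
def is_prefix_free(codewords):
    """True if no code-word is a prefix of another (the prefix condition)."""
    words = sorted(codewords)
    # In sorted order, any word that has `a` as a prefix follows `a` directly.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits, code):
    """Parse a bit string with a prefix-free code {symbol: code-word}."""
    inverse = {word: sym for sym, word in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:          # a complete code-word: emit and restart
            out.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bitstream ended in the middle of a code-word")
    return out


code = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}  # hypothetical code
print(is_prefix_free(code.values()))  # True
print(decode("011011110", code))      # ['s1', 's3', 's4', 's2']
```

With a code such as {"1", "10", "0"}, where "1" is a prefix of "10", `is_prefix_free` returns False and greedy parsing can no longer be trusted.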
6
Huffman Coding
  • Ensures instantaneously parseable code-words
  • 100% efficient when p1, ..., pN are negative integer powers of 2 (0.5, 0.25, etc.)
  • Algorithm: generate the Huffman coding tree
  • Form the tree
  • Sort the symbols by their probabilities
  • Merge the two smallest probabilities by adding them, producing a new node in the tree
  • Repeat until only a single node is reached
  • Assign bits
  • Traverse the tree from the root to the leaf nodes, assigning each branch encountered a one or a zero
  • Decoding is based on storing the code-words in a specially constructed look-up table (LUT)

7
Huffman Coding
  • Generate code-words for each grey level
  • S = {s1, s2, s3, s4, s5} = {0, 4, 5, 6, 7}
  • {p1, p2, p3, p4, p5} = {0.125, 0.484, 0.25, 0.125, 0.016}
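A minimal Python sketch of the Huffman algorithm from the previous slide, applied to this grey-level example. The heap-based merge and the `huffman_code` name are our own choices; ties between equal probabilities are broken by insertion order, so which of s1/s4 receives the 3-bit code-word is arbitrary.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a Huffman code {symbol: bit string} from {symbol: probability}.

    Internal tree nodes are (left, right) tuples, so symbols must not be tuples.
    """
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate one-symbol source
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two smallest probabilities into a new node.
        p_a, _, a = heapq.heappop(heap)
        p_b, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p_a + p_b, next(tie), (a, b)))
    codes = {}

    def assign(node, prefix):
        # Walk root -> leaves, labelling one branch 0 and the other 1.
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix

    assign(heap[0][2], "")
    return codes


# Grey levels 0, 4, 5, 6, 7 (s1..s5) with the probabilities from the slide.
probs = {0: 0.125, 4: 0.484, 5: 0.25, 6: 0.125, 7: 0.016}
codes = huffman_code(probs)
print(codes)  # {4: '0', 5: '10', 6: '110', 7: '1110', 0: '1111'}
```

The most probable level (4, p = 0.484) gets a 1-bit code-word and the rarest (7, p = 0.016) a 4-bit one, for an average of 1.923 bits/symbol.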

8
Huffman Coding

9
Huffman Coding
  • Efficiency
  • Calculate the average coding rate R = Σ pi·li (bits/symbol)
  • i.e. symbol probability (pi) x code-word length (li), summed over all symbols
  • Compare to the entropy H(s)

Efficiency = H(s) / R
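The efficiency calculation for the grey-level example, as a Python sketch. The code-word lengths are taken from one valid Huffman tree built by hand for these probabilities (s1 and s4 are tied, so their 3-bit and 4-bit lengths can swap without changing the result).

```python
import math

# Grey-level example: probabilities p_i and the code-word lengths l_i from one
# valid Huffman tree for them (s1 and s4 are tied, so their 3/4 bits may swap).
p = [0.125, 0.484, 0.25, 0.125, 0.016]
lengths = [4, 1, 2, 3, 4]

H = -sum(pi * math.log2(pi) for pi in p)         # entropy, bits/symbol
R = sum(pi * li for pi, li in zip(p, lengths))   # average coding rate, bits/symbol
efficiency = H / R

print(f"H(s) = {H:.3f}, R = {R:.3f}, efficiency = {efficiency:.1%}")
```

This gives roughly H(s) = 1.852 and R = 1.923 bits/symbol, an efficiency of about 96%; the shortfall comes from the probabilities not being powers of 1/2.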
10
Huffman Coding
  • Problems
  • Lower bound of 1 bit/symbol, even when H(s) is much lower
  • Does not facilitate adaptive coding
  • Example

11
Arithmetic Coding
  • Treats groups of symbols together but maintains a symbol-by-symbol encoding mechanism
  • Assigns a single code-word to a group of symbols
  • The code-word represents a half-open interval in [0.0, 1.0)
  • By assigning enough precision bits, one interval can be distinguished from another
  • Symbols with higher probabilities correspond to larger intervals, thereby requiring fewer precision bits

12
Arithmetic Coding
  • S = {a, b}, {p1, p2} = {1/3, 2/3}
  • The first symbol narrows the interval to that symbol's range
  • Subsequent symbols further restrict the current interval
  • Decoding reverses this
  • Receives a number in [0.0, 1.0)
  • Checks which symbol's range contains this number and decodes that symbol
  • Since the lower and upper bounds of the symbol are known, their effect on the encoded number can be reversed
  • This gives a new number
  • REPEAT
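A Python sketch of the interval narrowing and its reversal for the two-symbol model above. It uses exact rational arithmetic (Python's `fractions` module) instead of the finite-precision integer scheme a real coder needs, and assumes the decoder is told the message length; practical coders use an end-of-message symbol or a length field instead.

```python
from fractions import Fraction as F

# Model from the slide: S = {a, b}, p = {1/3, 2/3}, laid out on [0, 1).
RANGES = {"a": (F(0), F(1, 3)), "b": (F(1, 3), F(1))}


def encode(message):
    """Return the final half-open interval [low, high) for the message."""
    low, high = F(0), F(1)
    for sym in message:
        s_low, s_high = RANGES[sym]
        width = high - low
        # Narrow the current interval to this symbol's share of it.
        high = low + width * s_high
        low = low + width * s_low
    return low, high


def decode(value, n_symbols):
    """Recover n_symbols symbols from any number inside the final interval."""
    out = []
    for _ in range(n_symbols):
        for sym, (s_low, s_high) in RANGES.items():
            if s_low <= value < s_high:
                out.append(sym)
                # Undo this symbol's effect to get a new number in [0, 1).
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)


low, high = encode("babba")
print(low, high, high - low)   # width = (2/3)**3 * (1/3)**2 = 8/243
print(decode(low, 5))          # babba
```

The final width equals the product of the symbol probabilities, so about -log2(width) bits (here about 4.9) suffice to single out a number inside it.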

13
Arithmetic Coding
  • Incremental transmission
  • Example message: BILL<space>GATES

Digits of the code value transmitted incrementally as encoding proceeds: 2, 25, 257, 2572, 257216, 2572167
14
Arithmetic Coding
  • Can be performed very efficiently using 16/32-bit integer arithmetic
  • Bits are transmitted as they become available
  • Simplification: use the value 0.999... (recurring) rather than 1.0
  • In binary this corresponds to 0.111...
  • Only the fractional part is used → only integers are needed
  • High initially stores 0xFFFF, whilst Low stores 0x0000
  • For each symbol encoded, examine the most significant bit of both High and Low
  • If these bits are the same, output that bit and shift it out of both registers

15
Lossless Compression: Standards
16
ITU-T Facsimile
  • ITU-T Rec. T.4 (Group 3)
  • Targets scanned business documents
  • Binary images: white (1), black (0)
  • Two modes
  • Modified Huffman (MH)
  • Run-length encoding is used to form runs of 1s
    and 0s for each line in the image
  • Huffman coding applied to these (run,symbol)
    pairs
  • Different Huffman codes for runs of 1s and 0s
  • A special end-of-line (EOL) symbol is encoded for
    error detection purposes.
  • Modified READ (MR)
  • Pixel values from the previous line are used as predictors for the current pixels to be encoded
  • The prediction residual is then encoded using Huffman coding
  • MR mode is periodically interspersed with MH
    mode.
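The run-forming step of MH mode can be sketched in Python as follows (white = 1, black = 0, as on the slide). Only the run formation is shown; the separate white-run and black-run Huffman tables of T.4 are not reproduced.

```python
def run_lengths(line, white=1):
    """Split one scan line of a bi-level image into (colour, run length) pairs.

    As in MH coding, every line starts with a white run, which has length 0
    when the line begins with a black pixel.
    """
    runs = []
    colour, length = white, 0
    for px in line:
        if px == colour:
            length += 1
        else:
            runs.append((colour, length))   # close the current run
            colour, length = px, 1
    runs.append((colour, length))
    return runs


print(run_lengths([1, 1, 1, 0, 0, 1]))  # [(1, 3), (0, 2), (1, 1)]
print(run_lengths([0, 0, 1]))           # [(1, 0), (0, 2), (1, 1)]
```

Because runs alternate in colour, only the lengths need to be Huffman coded, each with the table for its colour.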

17
JBIG
  • Joint Bi-level Image Experts Group (JBIG) standard, developed jointly by ITU-T and ISO
  • Targets bi-level images
  • These may be either business documents or grey-scale images of natural scenes rendered as bi-level images
  • Uses adaptive arithmetic coding
  • A modelling step estimates the probability of the next symbol based on a context consisting of local pixels
  • This probability is then used to drive the arithmetic encoder
  • JBIG can be applied to grey-scale images by treating each bit plane of the grey-level image as a bi-level image
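The sketch below illustrates both ideas in Python: splitting a grey-scale image into bit planes, and a context model that estimates the probability of the next pixel from its already-coded neighbours. Two simplifications are our assumptions, not JBIG itself: the 4-pixel template is hypothetical (JBIG's templates are larger), and probabilities come from simple per-context counts rather than JBIG's adaptive estimator. The result is the ideal code length an arithmetic coder driven by these probabilities would approach.

```python
import math


def bit_planes(image, bits=8):
    """Split a grey-scale image (list of rows) into bi-level planes, MSB first."""
    return [[[(px >> b) & 1 for px in row] for row in image]
            for b in range(bits - 1, -1, -1)]


def context(plane, y, x):
    """Hypothetical 4-pixel causal template: left, up-left, up, up-right."""
    def px(yy, xx):
        if 0 <= yy < len(plane) and 0 <= xx < len(plane[0]):
            return plane[yy][xx]
        return 0                        # pixels outside the image count as 0
    return ((px(y, x - 1) << 3) | (px(y - 1, x - 1) << 2)
            | (px(y - 1, x) << 1) | px(y - 1, x + 1))


def adaptive_cost_bits(plane):
    """Ideal code length (bits) when each pixel is coded with its context's counts."""
    counts = {}                         # context -> [count of 0s, count of 1s]
    total = 0.0
    for y, row in enumerate(plane):
        for x, bit in enumerate(row):
            c = counts.setdefault(context(plane, y, x), [1, 1])  # start at 1/2
            total -= math.log2(c[bit] / (c[0] + c[1]))
            c[bit] += 1                 # adapt the model after coding the pixel
    return total


blank = [[0] * 16 for _ in range(16)]
print(adaptive_cost_bits(blank))        # about 8 bits for 256 pixels
```

A blank 16x16 plane costs only log2(257) ≈ 8 bits instead of 256, because the model quickly learns that context 0 almost always predicts a 0.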

18
Lossless JPEG
  • The Joint Photographic Experts Group (JPEG) standard has a lossless image compression mode
  • The prediction for the pixel to be encoded is based on a context of previously encoded pixels
  • Different ways of forming the prediction are available
  • The method used is encoded as side-information for each scan
  • To encode the prediction residual
  • A (length, magnitude) pair is formed
  • length indicates the number of bits used to encode the magnitude
  • A static Huffman code is used for length
  • magnitude is the actual residual value, directly encoded

19
Lossless JPEG
  • Pixel to encode: p = 190
  • Neighbouring pixels: p1 = 184, p2 = 176
  • Prediction: P = (p1 + p2)/2 = 180
  • Residual: R = P - p = 180 - 190 = -10
  • Encoded as the event (4, 0101): 4 bits are needed for |R| = 10 (binary 1010)
  • Negative residuals are encoded as the 1s complement (1010 → 0101)
  • The Huffman code for 4 is 001, which gives the final code-word 0010101
  • Decoder
  • Calculates the prediction value (180)
  • Parses the Huffman code (001 → 4), which allows decoding of the 4-bit magnitude (0101)
  • Detects a leading zero → knows the value must be negative, so the four bits are decoded as -10
  • Reconstruction: p = P - R = 180 - (-10) = 190
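The worked example can be reproduced in a few lines of Python. The one-entry Huffman table holds only the code quoted on the slide (001 for length 4) and is otherwise hypothetical; a real lossless JPEG stream carries a complete table. The averaging predictor is the one implied by the slide's numbers, and the residual sign convention R = P - p follows the slide.

```python
# Hypothetical one-entry table: only the code for length 4 ("001") is quoted
# on the slide; a real codec carries a full Huffman table for every length.
LENGTH_CODES = {4: "001"}
LENGTH_DECODE = {code: n for n, code in LENGTH_CODES.items()}


def predict(p1, p2):
    """Prediction from two previously coded neighbours (their average)."""
    return (p1 + p2) // 2


def encode_pixel(p, p1, p2):
    r = predict(p1, p2) - p                 # residual R = P - p, as on the slide
    n = abs(r).bit_length()                 # length: bits needed for |R|
    m = r + (1 << n) - 1 if r < 0 else r    # 1s complement for negative residuals
    magnitude = format(m, f"0{n}b") if n else ""
    return LENGTH_CODES[n] + magnitude


def decode_pixel(bits, p1, p2):
    for i in range(1, len(bits) + 1):       # parse the prefix-free length code
        if bits[:i] in LENGTH_DECODE:
            n = LENGTH_DECODE[bits[:i]]
            magnitude = bits[i:i + n]
            break
    else:
        raise ValueError("unknown length code")
    r = int(magnitude, 2) if n else 0
    if n and magnitude[0] == "0":           # leading 0 marks a negative residual
        r -= (1 << n) - 1
    return predict(p1, p2) - r              # reconstruction p = P - R


print(encode_pixel(190, 184, 176))          # 0010101
print(decode_pixel("0010101", 184, 176))    # 190
```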

20
Lossy Compression: Generic structure of a video codec
21
Redundancy in Video Sequences
  • Video compression targets 3 kinds of redundancy
  • Spatial: the correlation that exists between (groups of) pixels
  • Temporal: similarity between video frames
  • Perceptual: the Human Visual System (HVS) is less sensitive to high-frequency information
  • Lossy compression throws information away as part of these processes
  • The remaining information is encoded losslessly using entropy coding

22
Redundancy in Video Sequences
  • Spatial redundancy
  • Transform the data to be encoded into a new representation in which it is less correlated
  • Leads to a more compact representation
  • Temporal redundancy
  • Only encode the difference between 2 video frames (lower entropy)
  • Form a prediction of the frame to be encoded and encode the prediction residual
  • Perceptual redundancy
  • Suppress/remove high-frequency components corresponding to fine image detail

23
Coding Modes
  • INTRA
  • Encode a frame completely independently (i.e.
    with no reference to previous/future frames)
  • Forms a random access point in the bitstream, resets encoding and limits error propagation
  • Equivalent to having a JPEG-encoded still image at periodic intervals in the bitstream

24
Coding Modes
  • INTER
  • Use a previous/future frame (termed the reference frame) as the basis for a prediction of the current frame
  • Could simply subtract the reference frame from the current frame
  • Or use a more sophisticated prediction method
  • Need to use the reconstructed frame as the basis for prediction so that encoder and decoder stay synchronised

25
Coding Unit
  • Break the image/frame up into 16x16 macro-blocks
  • For YUV 4:2:0
  • 4 8x8 luminance pixel blocks
  • 2 8x8 chrominance pixel blocks (one U, one V)
  • Coding decisions are made on a macro-block basis
  • INTRA/INTER coding mode
  • Prediction method if INTER
  • Amount of loss introduced
  • Decisions are flagged in the bitstream syntax

26
Generic Codec Structure
27
Discrete Cosine Transform (DCT)
  • Why DCT?
  • What is it?
  • How does it work?
  • How is it computed (in reality)?
  • Adoption and variations
  • What about the DWT?
  • Quantisation

28
Why DCT?
  • Neighbouring pixels are likely to be similar
  • The same is true for prediction residual data
  • Want to exploit this spatial correlation
  • We want a transform that
  • Removes correlation from data
  • Packs signal energy into as few coefficients as
    possible
  • Coefficients suitable for entropy coding

29
Why DCT?
  • Optimal solution
  • Use the eigenvectors of the covariance matrix of the input pixel data as the basis
  • Order them by size of eigenvalue
  • Based on the theory of principal component analysis (PCA)
  • Referred to as the Karhunen-Loeve Transform (KLT) [rao90]
  • Achieves complete de-correlation
  • Packs most energy into the fewest coefficients
  • Minimises MSE for a given number of retained coefficients (quantisation)
  • Minimises the entropy
  • Disadvantages
  • Very computationally demanding
  • Transform kernel is data dependent
  • Kernel must also be sent to the decoder!
  • Not practical in a real compression system
  • Compromise → the DCT

30
What is the DCT?
  • Treat frame as a grid of 8x8 pixel blocks
  • Pixel data (intra block)
  • Prediction Residual (inter block)
  • Compute 8x8 2D DCT on each block
  • Formula
  • Basis functions derived using Fourier theory
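The formula itself did not survive in this transcript. For reference, the standard 8x8 forward and inverse DCT pair in the normalisation used by JPEG (our assumption about which form the slide showed) is:

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)
         \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}

f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v)
         \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}

C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
```

Here f(x,y) are the pixel (or residual) values of the block and F(u,v) the coefficients; F(0,0) is the DC term, 8 times the block mean.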

31
What is the DCT?
  • Fourier's theorem and the Nyquist sampling criterion mean that only certain discrete frequencies can be present in an 8x8 block of sampled data
  • DCT coefficients tell us how much of a particular frequency is present in a particular block
  • Very crude explanation!
  • The inverse DCT (IDCT) reverses this process
  • Essentially Fourier synthesis

32
How does the DCT work?
  • DCT does not compress anything in isolation!
  • This is achieved by the quantiser and entropy coding
  • The DCT output is easier to compress, though
  • Most natural video is dominated by low frequencies, so the energy is concentrated in a few low-frequency coefficients

33
How does the DCT work?
  • The human eye is less sensitive to high frequencies
  • Use a quantiser whose step size depends on frequency
  • Effectively discards perceptually unimportant data
  • After quantisation there will be many zero-valued coefficients
  • Typically only 5 or 6 non-zero coefficients [xanthopoulos99]
  • Suitable for run-length and entropy coding

34
How does the DCT work?
  • Zig-zag scan
  • Keeps statistically related coefficients together (low to high frequency)
  • Better run-length coding (long runs of zeros towards the end of the scan)
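A Python sketch of the zig-zag scan followed by a simple (zero-run, value) run-length step. Simplifications that are our assumptions: the DC coefficient is skipped because it is usually coded separately, runs are not capped at 15 as in JPEG, and the end-of-block marker is just the string "EOB".

```python
def zigzag_order(n=8):
    """(row, col) visiting order for an n x n block, JPEG-style zig-zag."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col: one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                 # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_length(block):
    """(zero_run, value) pairs over the zig-zag scan of the AC coefficients."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))][1:]  # skip DC
    pairs, run = [], 0
    for v in scan:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")          # trailing zeros are implied by end-of-block
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[0][2] = 50, -3, 2, 1
print(run_length(block))         # [(0, -3), (0, 2), (2, 1), 'EOB']
```

Sixty-three AC coefficients collapse to three pairs and an end-of-block marker, which is what makes the quantised DCT output suitable for entropy coding.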

35
How is the DCT Computed?
  • Most implementations exploit the fact that the 2D DCT is separable
  • Compute the 1D DCT of each column
  • Compute the 1D DCT of each resulting row
  • 16 x 1D 8-point DCTs in total
  • Need an efficient implementation of the 1D 8-point DCT
  • 30 years of research in this field
  • Basic implementation (64 multiplications, 56 additions)
  • Fast implementation [loeffler89] (11 multiplications, 29 additions)
  • Video codec optimised implementation AAN [arai89] (5 multiplications, 29 additions)
  • Arithmetic precision is a vital decision
  • If the constraint is 1920x1080 @ 30 Hz
  • 972,000 8x8 luminance blocks per second (32,400 per frame)
  • Need at least 171 x 10^6 multiplications and 451 x 10^6 additions per second using Loeffler!
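A direct (not fast) Python sketch of the separable approach: an orthonormal 8-point DCT applied to every column and then to every resulting row, i.e. 16 1D DCTs per block. For N = 8 the orthonormal scaling coincides with the 1/4·C(u)·C(v) normalisation used by JPEG. A real codec would replace `dct_1d` with a fast factorisation such as Loeffler or AAN.

```python
import math

N = 8
# Orthonormal DCT-II basis: A[k][n] = a(k) * cos((2n + 1) * k * pi / (2N)),
# with a(0) = sqrt(1/N) and a(k) = sqrt(2/N) otherwise.
A = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]


def dct_1d(x):
    """Direct 8-point DCT: 8 multiply-accumulates per output (no fast algorithm)."""
    return [sum(A[k][n] * x[n] for n in range(N)) for k in range(N)]


def idct_1d(X):
    """Inverse 8-point DCT (transpose of the orthonormal basis)."""
    return [sum(A[k][n] * X[k] for k in range(N)) for n in range(N)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """Separable 2D DCT: 1D DCT on each column, then on each resulting row."""
    cols_done = transpose([dct_1d(col) for col in transpose(block)])
    return [dct_1d(row) for row in cols_done]


def idct_2d(coeffs):
    """Inverse: 1D IDCT on each row, then on each column."""
    rows_done = [idct_1d(row) for row in coeffs]
    return transpose([idct_1d(col) for col in transpose(rows_done)])


flat = [[100] * 8 for _ in range(8)]
print(round(dct_2d(flat)[0][0], 6))   # 800.0: all energy in the DC coefficient
```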

36
How is the DCT Computed?
  • Sometimes dedicated hardware needed
  • Performance and/or power reasons
  • Hardware architecture taxonomy

37
Adoption and Variations
  • 8x8 DCT
  • Used in JPEG, H.261, H.263, MPEG-1, MPEG-2,
    MPEG-4 with specific quality requirements
  • Shape Adaptive DCT
  • Used in MPEG-4 Advanced Coding Efficiency (ACE)
    profile
  • Kernel basis functions determined by object shape
  • Integer DCT Approximation
  • Used in H.264
  • Block size of 4x4 and 8x8 depending on mode
  • Avoids the IDCT mismatch problem
  • Less computationally demanding (16-bit integer arithmetic)
  • More features (can discuss later if necessary)

38
What about the DWT
  • Discrete Wavelet Transform (DWT)
  • Used by JPEG-2000
  • MPEG-4 uses SA-DWT (for static shape textures)
  • Why? Better than Fourier analysis for non-stationary data
  • Inherently scalable
  • Involves successive low-pass and high-pass filtering (LPF/HPF) of the data, with subsampling
  • More efficient at very low bit rates
  • DCT and coarse Q → blocking artefacts
  • DWT and coarse Q → blurring/smearing (much less perceptible)
  • More computationally demanding than the DCT
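As an illustration of "successive LPF and HPF of the data and subsampling", here is one level of the simplest wavelet, the Haar transform, in Python. JPEG-2000 itself uses longer filters (the 5/3 and 9/7 wavelets); Haar is only a stand-in chosen for brevity.

```python
import math

S = 1 / math.sqrt(2)   # orthonormal Haar scaling


def haar_analysis(x):
    """One DWT level: low-pass (sums) and high-pass (differences), each subsampled by 2."""
    low = [(x[2 * i] + x[2 * i + 1]) * S for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) * S for i in range(len(x) // 2)]
    return low, high


def haar_synthesis(low, high):
    """Inverse of haar_analysis: upsample and recombine the two bands."""
    out = []
    for lo, hi in zip(low, high):
        out += [(lo + hi) * S, (lo - hi) * S]
    return out


signal = [4, 6, 10, 12, 8, 8, 2, 0]
low, high = haar_analysis(signal)
# Smooth regions give near-zero high-band coefficients; applying
# haar_analysis again to `low` gives the next, coarser level.
```

Repeating the analysis on the low band is what makes the DWT inherently scalable: each level is a half-resolution version of the previous one.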

39
What is Quantisation?
  • A lossy process
  • Get rid of information
  • Gives compression gain
  • Try to minimise distortion
  • Try to reduce entropy
  • Two primary types
  • Scalar quantiser (quantises one sample at a time)
  • Vector quantiser (quantises a group of samples jointly)

40
Scalar Quantiser
  • Need to find optimal values for
  • Decision levels di
  • Reconstruction levels ri
  • Difficult in general!

41
Scalar Quantiser
  • Aim to minimise distortion
  • Minimise MSE → Lloyd-Max quantiser
  • A good quantiser design depends on the probability distribution of the input data
  • Want less error for more probable inputs
  • Case 1: Uniform distribution
  • Decision bands all same width
  • Reconstruction levels equally spaced
  • Referred to as a linear quantiser
  • Used frequently for simplicity

42
Scalar Quantiser
  • Case 2: Piecewise constant distribution
  • Used when the number of decision levels N is large
  • Decision level solution difficult (use numerical methods for Lagrange multipliers)
  • Reconstruction levels

43
Scalar Quantiser
  • Case 3: Non-uniform distribution
  • Need numerical methods for di and ri
  • Tables available for standard distributions (Gaussian, Laplacian, Rayleigh, ...) for popular N
  • This is a true Lloyd-Max quantiser (or optimum mean-square quantiser)
  • Case 4: Uniform quantiser
  • Uniform refers to equal spacing between decision levels regardless of the distribution
  • Similar structure to Case 1 but different performance, because the distribution is not uniform
  • Commonly used (e.g. in JPEG)
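The Lloyd-Max conditions (decision levels midway between neighbouring reconstruction levels; each reconstruction level at the centroid of its band) can also be iterated on training data instead of an analytic distribution. This sample-based version, Lloyd's algorithm, is sketched below in Python; the uniform starting point and the iteration cap are our own choices.

```python
def lloyd_max(samples, levels, iterations=100):
    """Empirical Lloyd-Max design: returns (decision levels d, reconstruction levels r)."""
    lo, hi = min(samples), max(samples)
    # Start from a uniform quantiser spanning the data range.
    r = [lo + (i + 0.5) * (hi - lo) / levels for i in range(levels)]
    for _ in range(iterations):
        # Decision levels lie midway between neighbouring reconstruction levels.
        d = [(a + b) / 2 for a, b in zip(r, r[1:])]
        bands = [[] for _ in range(levels)]
        for x in samples:
            bands[sum(x > t for t in d)].append(x)   # index of the band holding x
        # Each reconstruction level moves to the mean (centroid) of its band.
        new_r = [sum(b) / len(b) if b else r[i] for i, b in enumerate(bands)]
        if new_r == r:
            break
        r = new_r
    d = [(a + b) / 2 for a, b in zip(r, r[1:])]
    return d, r


print(lloyd_max([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))
# ([6.5, 16.0], [2.0, 11.0, 21.0])
```

Starting from the uniform levels 4.5, 11.5 and 18.5, the reconstruction levels move to the centres of the three clusters, which is where the MSE is smallest.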

44
Scalar Quantiser Performance
  • MSE correlates well with subjective degradation
  • Don't rely on MSE minimisation in isolation, though
  • Need to consider the overall rate-distortion
  • Measures MSE as a function of the number of bits n
  • Constants a and b depend on the distribution
  • When designing a quantiser for each DCT coefficient i, need to know ni
  • 64 quantisers
  • How to determine ni (number of bits per coefficient)?
  • Depends on the variance of coefficient i relative to the others and on the specified average bitrate nav
  • Bit allocation algorithm paradigm

45
Bit allocation algorithms
  • Try to keep constant
  • As variance increases, distortion decreases by
    using more bits
  • Optimal allocation for N coefficients
  • Often a rate controller after the entropy encoder, with a feedback path to the quantiser
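The allocation formula itself is not reproduced in this transcript. Under the classical high-rate assumption that coefficient i has distortion proportional to var_i x 2^(-2 ni) (our assumption about the constants on the previous slide), the optimum is ni = nav + 0.5 x log2(var_i / geometric mean of all variances). A Python sketch:

```python
import math


def allocate_bits(variances, n_av):
    """Bits per coefficient: n_i = n_av + 0.5 * log2(var_i / geometric mean).

    Assumes distortion_i ~ c * var_i * 2**(-2 * n_i) (classical high-rate model).
    Results are real-valued and can be negative; practical allocators round
    them and clip at zero.
    """
    log_geo_mean = sum(math.log2(v) for v in variances) / len(variances)
    return [n_av + 0.5 * (math.log2(v) - log_geo_mean) for v in variances]


print(allocate_bits([16, 1], 2))   # [3.0, 1.0]
```

Each fourfold increase in variance earns one extra bit, while the average stays at nav, which is the trade-off described on this slide.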

46
Scalar Quantiser Summary
  • Uniform quantiser most commonly used
  • In fact, rather than transmitting a quantised
    coefficient, usually transmit the quantisation
    index
  • This has much lower entropy
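A Python sketch of a uniform quantiser that transmits the index rather than the reconstructed value. The step size of 10 and the round-half-away-from-zero rule are illustrative assumptions; codecs differ in their rounding and dead-zone details.

```python
import math


def quantise(coeff, step):
    """Uniform quantiser: the integer index that is actually transmitted."""
    # Round half away from zero, so +x and -x quantise symmetrically.
    return int(math.copysign(math.floor(abs(coeff) / step + 0.5), coeff))


def dequantise(index, step):
    """Decoder-side reconstruction level for a received index."""
    return index * step


coeffs = [37.4, -25, 4.9, -3.1, 1.2]
indices = [quantise(c, 10) for c in coeffs]
print(indices)                               # [4, -3, 0, 0, 0]
print([dequantise(i, 10) for i in indices])  # [40, -30, 0, 0, 0]
```

The indices are small integers, mostly zero, so their entropy is far lower than that of the original coefficients; the reconstruction error is at most half a step.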

47
Vector Quantiser
  • Quantise blocks of samples together
  • Each block assigned a single code
  • A code book used to find code for block
  • Code book can be dynamic or pre-defined
  • Each pattern has specific encoding
  • Can give very good performance
  • Quite computationally expensive
  • Difficult to design tables
  • Used by GIF standard
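A minimal Python sketch of VQ encoding and decoding with a fixed, hypothetical codebook of 2x2 blocks flattened to 4-vectors. The exhaustive nearest-neighbour search is what makes VQ computationally expensive: every block is compared against every codebook entry, while decoding is a simple table look-up.

```python
def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(blocks, codebook):
    """Map each block (a flattened vector) to the index of its nearest code-vector."""
    return [min(range(len(codebook)), key=lambda i: squared_error(codebook[i], blk))
            for blk in blocks]


def vq_decode(indices, codebook):
    """Decoder is a simple table look-up."""
    return [codebook[i] for i in indices]


# Hypothetical codebook of 2x2 blocks: flat dark, flat bright, vertical edge.
codebook = [(0, 0, 0, 0), (255, 255, 255, 255), (0, 255, 0, 255)]
blocks = [(10, 5, 0, 3), (250, 240, 255, 251), (5, 200, 10, 230)]
indices = vq_encode(blocks, codebook)
print(indices)  # [0, 1, 2]
```

With 3 code-vectors each 4-pixel block costs under 2 bits to index, instead of 32 bits for the raw pixels.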

48
Demo
  • Compression gain vs. perceptual quality

49
Motion Estimation & Compensation
  • Exploiting temporal redundancy
  • Motion Estimation
  • Block matching algorithm overview
  • Matching Criteria
  • Selection of Search Strategies
  • More advanced motion estimation techniques
  • Software / Hardware Considerations
  • Motion Compensation
  • Adoption in standards discussed later

50
Exploiting Temporal Redundancy
  • Very slight change between successive frames (e.g. A and B)
  • Camera and object motion
  • A temporal prediction model at the encoder and decoder provides compression if
  • model parameters + correction terms < raw pixel information
  • e.g. Frame differencing (C)
  • Entropy
  • B: 7.15 bits/pixel
  • C: 4.38 bits/pixel
  • More complex models can reduce entropy further
  • Trade-off between computational expense, memory and prediction performance
  • Temporal Prediction model
  • Motion estimation
  • Motion compensation

51
Taxonomy of Motion Estimation Algorithms
  • Good motion estimation reviews: [Mitchell96, Furht97, Kuhn99]

52
Block Matching Algorithm
  • For each MxN block in the current frame, find the best matching block within a predetermined or adaptive ±S pel search range in the reference frame(s)
  • Estimates the motion of a group of pixels
  • Assumes translational motion only
  • Typically operates on the luminance component only
  • Good trade-off between computational complexity and prediction accuracy
  • The motion vector (relative offset to the best match) undergoes VLC
  • The prediction residual undergoes further processing (DCT, VLC, etc.)
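A Python sketch of block matching with the SAD criterion and a full search over ±s pels. Frames are lists of rows of luminance values; candidates that fall outside the reference frame are simply skipped (restricted motion vectors), which is our simplification.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in `cur` and the block displaced
    by (dx, dy) in `ref` (frames are lists of rows of luminance values)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n=16, s=8):
    """Test all (2s+1)^2 displacements; return (best motion vector, its SAD)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            # Skip candidates lying (partly) outside the reference frame.
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

Each candidate costs n x n x 3 operations (subtract, absolute, accumulate) plus one comparison, which is where the operation counts on the following slides come from.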

53
Matching Criteria
  • At each MxN block search position a matching criterion is evaluated
  • Wide variety of matching criteria
  • Mean Squared Error (MSE)
  • Mean Absolute Difference (MAD)
  • Sum of Absolute Differences (SAD)
  • Reduced-complexity matching criteria
  • Binary Block Match
  • Others
  • Cross correlation
  • SAD summation truncation
  • SAD estimation
  • Reduced Bit Mean Absolute Difference
  • Minimised Maximum Error function
  • Etc.
  • The choice of matching criterion is a complexity/prediction performance trade-off

54
Search Strategies (1/4)
  • Many possible search strategies!
  • Full Search: search every position
  • Best results, but very computationally expensive
  • Operations required to generate 1 MV for 1 current block
  • (2S+1)^2 block matches
  • For each pixel in an MxN block match: subtract, absolute, accumulate
  • After each block match, a minimum SAD comparison
  • Therefore total operations
  • (2S+1)^2 x (MxNx3 + 1), e.g. S = 8: 289 x (MxNx3 + 1)
  • Reduce computational expense
  • Logarithmic searches reduce the number of search positions
  • They assume the matching criterion increases monotonically moving away from the minimum point, so the search can iteratively converge to it
  • Possibility of getting stuck in a local minimum
  • Yields a higher-energy prediction residual
  • Pseudocode for the Three Step Search
  • 1. R = 2^(log2(S) - 1)
  • 2. Search positions within the search window defined using R
  • 3. R = R/2
  • 4. If R < 1, finished; else go to 2
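The Three Step Search pseudocode above, sketched in Python with SAD as the criterion. It assumes s is a power of two, so R = 2^(log2(s) - 1) is simply s // 2, and it counts the candidate positions tested, which for s = 8 gives the 9 + 8 + 8 = 25 block matches quoted on the next slide.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in `cur` and its (dx, dy) displacement in `ref`."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def three_step_search(cur, ref, bx, by, n=16, s=8):
    """Three Step Search; returns (motion vector, SAD, candidate positions tested)."""
    h, w = len(ref), len(ref[0])

    def cost(dx, dy):
        if 0 <= bx + dx <= w - n and 0 <= by + dy <= h - n:
            return sad(cur, ref, bx, by, dx, dy, n)
        return float("inf")            # candidate outside the reference frame

    cx, cy = 0, 0                      # current search centre (zero MV to start)
    best_cost = cost(0, 0)
    matches = 1
    r = s // 2                         # step 1: R = 2**(log2(s) - 1)
    while r >= 1:                      # step 4: stop once R < 1
        best = (best_cost, cx, cy)
        for dy in (-r, 0, r):          # step 2: the 8 neighbours at distance R
            for dx in (-r, 0, r):
                if dx == 0 and dy == 0:
                    continue           # centre already evaluated
                c = cost(cx + dx, cy + dy)
                matches += 1
                if c < best[0]:
                    best = (c, cx + dx, cy + dy)
        best_cost, cx, cy = best
        r //= 2                        # step 3: R = R / 2
    return (cx, cy), best_cost, matches
```

On smooth content the SAD surface is close to monotonic and the search reaches the true minimum; on textured content it can settle in a local minimum, as noted above.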

55
Search Strategies (2/4)
  • Logarithmic searches (contd.)
  • Three Step Search [Koga81]
  • S = 8, initial R = 4
  • Search positions defined using R
  • (x-R, y-R), (x, y-R), (x+R, y-R), ..., (x, y), ..., (x+R, y+R)
  • Operations required to generate 1 MV
  • (9+8+8) x (MxNx3 + 1)
  • Variants
  • 2-D logarithmic [Jain81], Parallel 1-D [Chen91], CDS [Rao83], N3SS [Li94], 4SS [Po96]
  • Hierarchical Search Strategies
  • Search fewer positions and use fewer pixels in the matching criterion
  • Achieved via sub-sampling the current and reference frames
  • Disadvantage: increased memory
  • The best match at a lower resolution seeds the search at subsequent resolutions
  • Can help to avoid local minima due to the low-pass filtering effect
  • Local minima still possible for small regions which disappear during sub-sampling

56
Search Strategies (3/4)
  • 3 Level Hierarchical Search Example
  • Level 1: original
  • Level 1 sub-sampled by a factor of 2, generating level 2
  • Level 1 sub-sampled by 4, generating level 3
  • Motion estimation starts at level 3
  • Block size N/4 x M/4
  • Search window ±S/4
  • FS or TSS employed within this window
  • Produces motion vector (Vx3, Vy3)
  • Motion estimation at level 2
  • Block size N/2 x M/2
  • Centred on (x/2 + 2Vx3, y/2 + 2Vy3)
  • Search window ±1 around this point
  • Produces motion vector (Vx2, Vy2)
  • Motion estimation at level 1
  • Centred on (x + 2Vx2, y + 2Vy2)
  • Search window ±1 around this point
  • Produces final motion vector (Vx1, Vy1)
  • Operations required to generate 1 MV using a FS at level 3
  • (2(S/4)+1)^2 x (M/4 x N/4 x 3 + 1) + 9 x (M/2 x N/2 x 3 + 1) + 9 x (MxNx3 + 1)

57
Search Strategies (4/4)
  • Scene-adaptive search area
  • Zone-based search strategies
  • Can employ a stopping threshold in each zone
  • Advantageous in a rate/distortion sense
  • [Chan95, Jung96, Zhe97]
  • Spiral Search
  • Dynamic search window size
  • Many techniques used to adjust the range
  • Spatial correlation of MVs [Chain95, In97]
  • Gradient-based methods
  • Block-based gradient descent search [Liu96]
  • Stops after 4 steps
  • Diamond search [Cote97]
  • Early stopping technique
  • Skip to the next block match when the current minimum SAD has been exceeded
  • Successive elimination algorithm [Li95]
  • Conservative block SAD [Do98]

58
Different Search Strategy Performance
  • Frame Differencing
  • Zero motion vector
  • Entropy 4.38 bits/pixel
  • 1 operation/pixel (subtraction)
  • Full Search
  • Block size 16x16
  • Search range ±8
  • Entropy 2.61 bits/pixel
  • 868 operations/pixel
  • Hierarchical Search
  • Block size 4x4, 8x8, 16x16
  • Search window 2, 4, 8
  • Entropy 3.08 bits/pixel
  • 39 operations/pixel
  • Hierarchical Search
  • Block size 4x4, 16x16, 32x32
  • Search window 2, 4, 8
  • Entropy 2.91 bits/pixel
  • 35 operations/pixel

59
More advanced techniques (1/2)
  • Bi-directional (Forward and Reverse) Prediction
  • Termed B-frames
  • Not feasible for real-time systems
  • Multiple Reference Frames
  • Improves prediction
  • Increases computational expense and memory requirements
  • Unrestricted Motion Vectors
  • Allow block matches outside the reference frame
  • Pixel padding used to extend beyond frame boundaries
  • Predictive Motion Vectors
  • Rather than starting at the collocated block, use an MV predictor
  • Temporal and/or spatial prediction [Lee97, Kos97, Zheng97]
  • Can improve prediction residual quality
  • Can employ thresholds to gate off motion estimation
  • In H/W: reduces pixel reusability between current block positions
  • Global Motion Compensation
  • Default motion for the frame/object

60
More advanced techniques (2/2)
  • Sub-pel Motion Estimation
  • Real motion is not constrained to integer pixel amounts
  • Half-pel and quarter-pel accuracy frequently used
  • But memory increases
  • H.264
  • 6-tap FIR filter for ½ pel
  • Bilinear for ¼ pel
  • Variable Block Size Motion
  • Smaller block sizes lead to smaller residuals
  • But the number of motion vectors and the signalling information increase
  • 41 MVs estimated per 16x16 block in H.264 (1 + 2 + 2 + 4 + 8 + 8 + 16 over all partition sizes)
  • MPEG-4 and H.263 Advanced Prediction Motion Estimation (4MV)
  • H.264
  • Dynamically adapts between multiple block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4)
  • Rate/distortion optimised
  • Motion Vector Coding Prediction
  • Adding MVs to the bitstream can be costly, particularly if the block size is small
  • DPCM used to exploit spatial MV redundancies
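A 1-D Python sketch of H.264-style luma interpolation: the half-pel sample between two integer pixels uses the 6-tap filter (1, -5, 20, 20, -5, 1)/32 with rounding and clipping, and the quarter-pel sample is the rounded average of its two nearest integer/half-pel neighbours. It ignores frame-edge padding and the 2-D centre positions, which filter intermediate half-pel values.

```python
def clip255(v):
    return max(0, min(255, v))


def half_pel(row, i):
    """Half-pel sample between row[i] and row[i + 1] (needs row[i-2] .. row[i+3])."""
    e, f, g, h, k, m = row[i - 2:i + 4]
    # 6-tap FIR (1, -5, 20, 20, -5, 1), +16 for rounding, /32 via shift, then clip.
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * k + m + 16) >> 5)


def quarter_pel(row, i):
    """Quarter-pel sample between row[i] and the half-pel position: rounded average."""
    return (row[i] + half_pel(row, i) + 1) >> 1


ramp = [0, 10, 20, 30, 40, 50, 60, 70]
print(half_pel(ramp, 3), quarter_pel(ramp, 3))   # 35 33
```

The negative taps sharpen the interpolated signal, so results can overshoot the 0-255 range; that is why the clip is needed.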

61
ME Software/Hardware considerations
  • Software algorithmic complexity (simplified analysis)
  • To support 1920x1280 @ 30 Hz: 9600 x 30 = 288K 16x16 blocks/sec
  • ±8 search window: 289 block matches per current block
  • Total block matches: 289 x 288K = 83,232,000 matches/sec
  • Operations: 83,232,000 x (256 pixels x 3 + 1) ≈ 64 GOPS
  • Hardware implementations can be attractive
  • Systolic array (1D/2D) approaches typically employed
  • Memory-bandwidth efficient and high throughput
  • Full Search commonly used
  • Architectures also available for heuristic search strategies
  • Architectures for H.264 variable block size emerging
  • Ball-park figures for an H.264 VBSME core
  • 1-D 16 PE SA
  • Area: 40-60K gates; memory bandwidth: 3 pixels per clock cycle
  • 1 16x16 block match every 4096 clock cycles (±8 search range)
  • 2-D 256 PE SA
  • Area: 100-200K gates; memory bandwidth: 48 pixels per clock cycle
  • 1 16x16 block match every 256 clock cycles (±8 search range)

62
Motion Compensation
  • Straightforward relative to motion estimation
  • Reconstructed MB = residual + motion-compensated MB (pointed to by the MVs)
  • Copy the block of pixels from the displaced block in the reference frame into the current frame
  • The reference frame must be stored in the decoder
  • For encoder and decoder to remain synchronised
  • The encoder also needs to do motion compensation
  • Considerations
  • Additional frame memory at the decoder
  • Low computational requirements
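At the decoder, motion compensation reduces to a displaced block copy plus the decoded residual. A Python sketch for integer-pel motion vectors, with sub-pel interpolation and frame-edge padding omitted and the result clipped to 8 bits:

```python
def motion_compensate(ref, residual, bx, by, mv):
    """Reconstruct the block at (bx, by): reference block displaced by the
    integer MV (dx, dy), plus the residual, clipped to 0..255."""
    dx, dy = mv
    n = len(residual)
    return [[max(0, min(255, ref[by + dy + j][bx + dx + i] + residual[j][i]))
             for i in range(n)]
            for j in range(n)]


ref = [[10 * y + x for x in range(8)] for y in range(8)]
print(motion_compensate(ref, [[0, 0], [0, 0]], 2, 2, (1, -1)))  # [[13, 14], [23, 24]]
```

Per pixel this is one addition and one clip, which is why motion compensation is cheap compared with motion estimation.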

63
Lossy Compression: Standards
64
Standards Evolution
65
JPEG
  • Flexible image coding standard
  • 4 Modes of operation
  • Lossless encoding (earlier)
  • Baseline sequential encoding
  • Progressive encoding
  • Hierarchical encoding (towards JPEG-2000)
  • Motion JPEG
  • Baseline encoding of each frame
  • No motion estimation
  • Not properly standardised

66
JPEG-2000
  • JPEG not optimised for a wide range of apps
  • JPEG-2000 even more flexible
  • Interesting features
  • Uses DWT instead of DCT
  • Region of Interest (ROI) coding
  • Scalability
  • Spatial scalability
  • SNR scalability
  • More resilient to channel errors
  • Individual quality packets independently decoded
  • Also supports lossless coding
  • Added flexibility comes at computational cost

67
JPEG/JPEG-2000 Summary
  • JPEG capable of an average compression of 15:1 for subjectively transparent quality
  • JPEG-2000: better compression @ fixed rate
  • For Foreman
  • Gain of 1.5 to 4 dB over a range of 1.2 to 0.12 bpp
  • Applications
  • Internet
  • Digital photography
  • Many more

68
ITU-T H.261
  • ITU-T narrow-bandwidth real-time apps
  • H.261: (p x 64) kb/s over ISDN (1 ≤ p ≤ 30)
  • CIF and QCIF resolution
  • Real-time video telephony/conferencing
  • Up to 3 frames interpolated by decoder
  • Supports frame rates of 30 Hz, 15 Hz, 10 Hz, 7.5 Hz
  • Video compression tools
  • 8x8 DCT
  • Uniform scalar quantiser (rate control optional)
  • Entropy coder is modified run-length and Huffman
  • Motion estimation
  • Forward direction only
  • Search window limited to ±15
  • Integer pixel accuracy only
  • Motion compensation is optional
  • Loop filter (alleviates blocking)

69
ISO/IEC MPEG-1
  • Storage of AV content for delivery at 1.5Mb/s
  • Flexible
  • Resolutions up to 768x576
  • Framerate typically 30Hz
  • H.261 was starting point for the standard
  • Compression gain at expense of latency
  • Specific features
  • Standard VLCs determined by Huffman coding
  • DCT DC coeffs are differentially predicted
  • Bi-directional prediction (I,P,B frames)
  • Motion compensation with half-pixel accuracy
  • Maximum MV range of (-512,511.5) for half pixel
    and (-1024,1023) for integer pixel
  • Weighted quantisation (H.261 does not have this)
  • Random access to the bitstream, fast forward (FF), fast reverse (FR)

70
ISO/IEC MPEG-1
71
ISO/IEC MPEG-2
  • High-quality video @ 4-15 Mb/s
  • VOD, Broadcast TV, DVD, HDTV, Satellite TV
  • Major differences w.r.t. MPEG-1
  • More resolutions, framerates, qualities and
    bitrates
  • SIF (352x288 @ 25 Hz) to HDTV (1920x1250 @ 60 Hz)
  • Profiles and levels
  • Has interlaced/progressive option
  • Frame/Field based ME, MC and DCT
  • Scalability (temporal, spatial, SNR)
  • Minor differences
  • More bits for quantisation
  • Alternate scan (as well as zigzag)

72
ITU-T H.263
  • Very low bitrate apps (< 64 kb/s)
  • Video telephony over PSTN, mobile telephony
  • Recommended resolutions: sub-QCIF, QCIF, CIF, 4CIF, 16CIF
  • Non-interlaced @ 29.97 Hz
  • Similar to H.261
  • Extensions (some optional, in annexes, but included in H.264)
  • MVs differentially encoded
  • Half-pixel accurate motion estimation
  • Extensions support quarter- and one-eighth-pixel accuracy
  • Unrestricted motion vector mode
  • MVs can point outside the image; edge pixels form the prediction
  • Advanced prediction mode
  • An MB can have 4 MVs associated with it
  • Syntax-based arithmetic encoding (SAC)
  • Optional mode to replace VLCs with arithmetic
    encoding
  • PB frames
  • Error resilience
  • Synchronisation markers
  • Reversible VLCs

73
ISO/IEC MPEG-4
  • An all encompassing standard!
  • Improved compression at 5 kb/s to 1 Gb/s
  • Resolutions of sub-QCIF to studio
  • Content-based interactivity (semantic objects)
  • Universal access (scalability, error resilience)
  • Synthetic and natural hybrid coding (SNHC)

74
ISO/IEC MPEG-4
75
ISO/IEC MPEG-4
  • Video coding tools
  • Integer, half and quarter pixel ME
  • Boundary MB ME: padding or polygon matching
  • Global ME
  • Shape Adaptive DCT
  • AC/DC intra prediction
  • Enhanced scalability: Fine Granularity Scalability (FGS)
  • Still texture coding (uses SA-DWT)
  • Shape Coding tools
  • Context-based arithmetic encoding (CAE)
  • Compute context
  • Index into LUT for probability of 0,1
  • Drive arithmetic encoder

76
ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)
  • Targets enhanced compression for wide range of
    apps
  • Improved prediction
  • Variable block-size MC with small block sizes
  • Up to quarter-pixel MC
  • Unrestricted motion vector mode
  • Multiple reference picture MC
  • Weighted prediction (generalised B-pictures)
  • Directional intra prediction (9 4x4 modes, 4 16x16 modes)
  • In the loop adaptive deblocking filter
  • Improved coding efficiency tools
  • Small block size transform
  • Hierarchical block transform
  • Short word length transform (16 bit integer
    arith)
  • Exact match inverse transform
  • CAVLC, CABAC
  • Enhanced error robustness and network
    friendliness

77
ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)
78
ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)
  • H.264 Version 1 has 3 profiles
  • Baseline
  • Main
  • Extended
  • Fidelity Range Extension (FRExt) Amendment
  • High Profile
  • High 10 Profile
  • High 4:2:2 Profile
  • High 4:4:4 Profile
  • Up to 12 bits per sample
  • Supports lossless region coding
  • Codes RGB to avoid colour space transformation
    error

79
Comparing Standards
  • Video conferencing applications
  • Low latency real-time requirement
  • H.264/AVC MP would improve by a further 10-20%
  • Using low-delay bi-prediction and CABAC

80
Comparing Standards
  • Video streaming applications
  • Less of a delay constraint

81
Comparing Standards
  • Entertainment-quality applications
  • High resolution, delay tolerable

82
Comparing Standards
  • Professional motion picture production
  • Random access to individual frames
  • Up to HDTV, H.264/AVC MP comparable or better
    than Motion-JPEG2000

83
Comparing Standards
  • PSNR, while good, does not take into account the intricacies of the human eye
  • Need subjective video tests
  • Other metrics
  • MPQM, ...
  • Experiments show that H.264 gives the lowest bitrate for subjectively equivalent video over a range of apps
  • Improved performance comes at the cost of computational complexity
  • Main bottleneck is ME (very memory intensive)
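For reference, PSNR for 8-bit images is 10 x log10(255^2 / MSE) dB. A small Python sketch (the name `psnr` is our own); note it weights every pixel error equally, which is exactly why it misses the perceptual effects mentioned above.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized images given as lists of rows."""
    errors = [(a - b) ** 2
              for row_o, row_d in zip(original, decoded)
              for a, b in zip(row_o, row_d)]
    mse = sum(errors) / len(errors)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


print(round(psnr([[10, 10]], [[0, 0]]), 2))   # MSE = 100 -> about 28.13 dB
```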

84
Image Analysis: Visual Feature Extraction
85
Visual Features - Still Images
  • What features are important?
  • Colour
  • Texture
  • The feel, appearance, consistency of a surface
  • In an image
  • Distribution over the entire image?
  • Of specific parts of the image?

Figure: example regions with no texture and highly textured
86
Visual Features - Colour
  • Colour is visually important to humans
  • Colour features and similarity metrics easy to
    compute
  • Histogram [Swain and Ballard, 1992]
  • Most commonly used structure to represent global image features
  • Invariant to translation and rotation, and can be made invariant to scale by normalisation
  • MPEG-7 Scalable Colour Descriptor
  • HSV histogram (H: 16 levels, S: 4 levels, V: 4 levels) encoded with a Haar transform for efficiency and scalability
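A Python sketch of the quantised HSV histogram step only (16 x 4 x 4 = 256 bins), using the standard library `colorsys` module. The uniform bin boundaries are our assumption, and the MPEG-7 descriptor's non-linear value mapping and Haar encoding are not reproduced.

```python
import colorsys


def hsv_histogram(pixels):
    """Normalised 256-bin HSV histogram: 16 hue x 4 saturation x 4 value levels.

    `pixels` is a list of (r, g, b) tuples with 8-bit components.
    """
    hist = [0] * 256
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # each in [0, 1]
        hi = min(int(h * 16), 15)
        si = min(int(s * 4), 3)
        vi = min(int(v * 4), 3)
        hist[(hi * 4 + si) * 4 + vi] += 1
    # Dividing by the pixel count makes images of different sizes comparable.
    return [c / len(pixels) for c in hist]


hist = hsv_histogram([(255, 0, 0), (255, 0, 0), (0, 0, 0), (0, 0, 0)])
print(hist[15], hist[0])   # 0.5 0.5: half saturated red, half black
```

Because pixel positions are ignored, shuffling, translating or rotating the image leaves the histogram unchanged, matching the invariance claim above.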

87
Visual Features - Texture
  • Simple texture descriptors [Pratt, 1991]
  • Autocorrelation function
  • Co-occurrence matrices
  • Edge frequency
  • Primitive length
  • More sophisticated (based on transforms and/or filtering)
  • Wavelet [Mallat, 1990], Haar [Theodoridis, 1999], Gabor [Bovis, 1990]
  • Others
  • Mathematical morphology
  • Fractals
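A co-occurrence matrix counts how often grey level i is followed, at displacement (dx, dy), by grey level j. The Python sketch below builds one and computes the contrast statistic, which is zero for untextured regions and grows with local variation; the image is assumed to be already quantised to `levels` grey levels.

```python
def cooccurrence(image, dx=1, dy=0, levels=4):
    """Count pairs (level at p, level at p + (dx, dy)) over the image."""
    m = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            if 0 <= y + dy < h and 0 <= x + dx < w:
                m[image[y][x]][image[y + dy][x + dx]] += 1
    return m


def contrast(m):
    """Mean squared grey-level difference between paired pixels."""
    total = sum(map(sum, m))
    return sum((i - j) ** 2 * c
               for i, row in enumerate(m) for j, c in enumerate(row)) / total


flat = [[1, 1, 1], [1, 1, 1]]
stripes = [[0, 3, 0, 3], [0, 3, 0, 3]]
print(contrast(cooccurrence(flat)), contrast(cooccurrence(stripes)))   # 0.0 9.0
```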

88
Visual Features - Texture
  • Example: MPEG-7 Edge Histogram
  • Represents the global (and possibly local [Won, 2002]) spatial distribution of edges
  • Need to first generate an edge map
  • e.g. Roberts, Sobel, Prewitt, Canny
  • Build a histogram based on 5 edge types (vertical, horizontal, 45°, 135°, non-directional)

89
Change Detection
  • Compare 2 temporally adjacent images and determine how different they are
  • Why?
  • Surveillance-type applications
  • Assume a static camera and background
  • Anything changing between one frame and the next must be an object!
  • In fact, this is naïve, but it is the starting point of many object segmentation techniques
  • Temporal video structuring
  • Breaking video up into chunks for non-linear browsing: shots, scenes, events, story-lines
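The simplest form of change detection thresholds the absolute frame difference pixel by pixel. A Python sketch; the threshold of 20 grey levels is an arbitrary illustrative choice, and a real system would also need to cope with noise and illumination changes.

```python
def change_mask(prev, curr, threshold=20):
    """1 where |curr - prev| exceeds the threshold (assumed changed), else 0."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


def changed_fraction(mask):
    """Share of pixels flagged as changed."""
    pixels = [v for row in mask for v in row]
    return sum(pixels) / len(pixels)


prev = [[10, 10], [10, 10]]
curr = [[10, 50], [12, 10]]
mask = change_mask(prev, curr)
print(mask, changed_fraction(mask))   # [[0, 1], [0, 0]] 0.25
```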

90
Temporal Video Structuring
  • Shot boundary detection

Figure: a video document, its set of keyframes, and keyframe-based video browsers
91
(No Transcript)
92
Temporal Video Structuring
  • Shot boundary detection
  • A shot is a continuous piece of video taken with
    one camera
  • A shot cut is the abrupt or gradual transition
    between two shots
  • Uncompressed domain
  • Calculate colour histogram for each frame
  • Calculate the difference between histograms using a suitable metric: L1 (city-block), L2 (Euclidean), Mahalanobis, etc.
  • Threshold
  • Compressed domain
  • Parse features directly from the bitstream
  • E.g. use the DCT coefficients of each frame to reconstruct an approximation of the image
  • E.g. use the motion vectors for each pair of frames and detect changes in global statistics
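A Python sketch of uncompressed-domain shot cut detection as described above, with two simplifications that are our assumptions: it uses a 16-bin grey-level histogram instead of a colour histogram, and a fixed threshold of 0.5 on the L1 distance (which ranges from 0 to 2 for normalised histograms). Gradual transitions such as fades need more elaborate handling.

```python
def grey_histogram(frame, bins=16):
    """Normalised histogram of an 8-bit grey-level frame (list of rows)."""
    hist = [0] * bins
    pixels = [p for row in frame for p in row]
    for p in pixels:
        hist[p * bins // 256] += 1
    return [c / len(pixels) for c in hist]


def l1_distance(a, b):
    """City-block distance; between 0 and 2 for normalised histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_cuts(frames, threshold=0.5):
    """Indices i at which frame i is judged to start a new shot."""
    hists = [grey_histogram(f) for f in frames]
    return [i for i in range(1, len(frames))
            if l1_distance(hists[i - 1], hists[i]) > threshold]


dark = [[10] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
print(detect_cuts([dark, dark, dark, bright, bright]))   # [3]
```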