1
Context-based Data Compression
Xiaolin Wu, Polytechnic University, Brooklyn, NY
  • Part 3. Context modeling

2
Context model: estimated symbol probability
  • Variable-length coding schemes need an estimate of
    the probability of each symbol - the model
  • The model can be:
  • Static - a fixed global model for all inputs
    (e.g. English text)
  • Semi-adaptive - computed for the specific data being
    coded and transmitted as side information
    (e.g. C programs)
  • Adaptive - constructed on the fly
    (any source!)
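The adaptive case can be sketched as a model that encoder and decoder update symmetrically after each symbol. This is an illustrative sketch, not from the slides; the class name and the initial count of 1 per symbol (to avoid zero probabilities) are assumptions:

```python
from collections import Counter

class AdaptiveModel:
    """Order-0 adaptive model: symbol probabilities are learned on the fly."""

    def __init__(self, alphabet):
        # Start from a uniform model: one (pseudo-)count per symbol.
        self.counts = Counter({s: 1 for s in alphabet})
        self.total = len(alphabet)

    def prob(self, symbol):
        return self.counts[symbol] / self.total

    def update(self, symbol):
        # After coding each symbol, encoder and decoder update identical
        # counts, so no side information needs to be transmitted.
        self.counts[symbol] += 1
        self.total += 1

m = AdaptiveModel("ab")
p_before = m.prob("a")   # 1/2 under the uniform starting model
m.update("a")
p_after = m.prob("a")    # 2/3 after observing one 'a'
```

Because both sides perform the same update after every decoded symbol, the model stays synchronized in a single pass, which is exactly the adaptive advantage listed above.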

3
Adaptive vs. Semi-adaptive
  • Advantages of semi-adaptive
  • Simple decoder
  • Disadvantages of semi-adaptive
  • Overhead of specifying the model can be high
  • Two passes over the data are required
  • Advantages of adaptive
  • One pass → universal → as good, if not better
  • Disadvantages of adaptive
  • Decoder is as complex as the encoder
  • Errors propagate

4
Adaptation with Arithmetic and Huffman Coding
  • Huffman coding - the Huffman tree is manipulated on
    the fly; efficient algorithms are known, but they
    remain complex.
  • Arithmetic coding - only the cumulative probability
    distribution table is updated; efficient data
    structures and algorithms are known, and the rest of
    the coder is essentially unchanged.
  • Main advantage of arithmetic over Huffman is the
    ease by which the former can be used in
    conjunction with adaptive modeling techniques.
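One common realization of the "efficient data structure" for maintaining cumulative counts is a Fenwick (binary indexed) tree, giving logarithmic-time updates and prefix sums. The slides do not name a specific structure, so this is an illustrative sketch:

```python
class CumulativeCounts:
    """Fenwick tree over symbol counts, as used by adaptive arithmetic
    coders to answer 'total count of all symbols below s' quickly."""

    def __init__(self, n_symbols):
        self.n = n_symbols
        self.tree = [0] * (n_symbols + 1)   # 1-indexed internally
        for s in range(n_symbols):          # start with count 1 each
            self.add(s, 1)

    def add(self, symbol, delta):
        # O(log n) count update after coding a symbol.
        i = symbol + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def cum(self, symbol):
        # O(log n) total count of all symbols strictly below `symbol`.
        i, total = symbol, 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

c = CumulativeCounts(4)
low = c.cum(2)      # counts of symbols 0 and 1: 1 + 1 = 2
c.add(1, 3)         # symbol 1 observed more often
low2 = c.cum(2)     # now 1 + 4 = 5
```

The encoder maps a symbol to the interval [cum(s), cum(s + 1)) of the running total, then calls `add` - the per-symbol cost stays logarithmic even for large alphabets.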

5
Context models
  • If the source is not i.i.d., there are complex
    dependencies between symbols in the sequence.
  • In most practical situations, the pdf of a symbol
    depends on neighboring symbol values - i.e. its
    context.
  • Hence we condition the encoding of the current
    symbol on its context.
  • How to select contexts? - Rigorous answer beyond
    our scope.
  • Practical schemes use a fixed neighborhood.
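The simplest fixed neighborhood is the single previous symbol (an order-1 context). A minimal sketch of conditioning on that context via occurrence counts (the function name and the unsmoothed counting are illustrative choices):

```python
from collections import defaultdict, Counter

def order1_probs(seq):
    """Estimate P(symbol | previous symbol) from occurrence counts,
    i.e. a separate probability table per one-symbol context."""
    counts = defaultdict(Counter)
    for prev, cur in zip(seq, seq[1:]):
        counts[prev][cur] += 1
    # Normalize the counts within each context into probabilities.
    return {ctx: {s: n / sum(c.values()) for s, n in c.items()}
            for ctx, c in counts.items()}

p = order1_probs("abababab")
# In this sequence 'a' is always followed by 'b', so the conditional
# probability concentrates completely within the 'a' context.
```

An order-k model simply extends the context key to the previous k symbols, at the cost of exponentially more tables - which is where the context dilution problem of the next slide comes in.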

6
Context dilution problem
  • The minimum code length of a sequence x_1 x_2 … x_n
    achievable by arithmetic coding is
    -log2 P(x_1 x_2 … x_n) bits, if P(x_1 x_2 … x_n)
    is known.
  • The difficulty of estimating P(x_n | x_n-1, …, x_1)
    due to insufficient sample statistics prevents
    the use of high-order Markov models.
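Both points can be illustrated numerically. The first function assumes a known i.i.d. model purely for simplicity; the context count below shows how fast the number of contexts grows with model order:

```python
import math

def ideal_code_length(seq, probs):
    """-log2 P(sequence) in bits: the minimum code length arithmetic
    coding can approach when the model P is known (i.i.d. assumed
    here for illustration)."""
    return sum(-math.log2(probs[s]) for s in seq)

bits = ideal_code_length("aab", {"a": 0.5, "b": 0.5})  # 3 symbols x 1 bit

# Context dilution: an order-k Markov model over an alphabet of size m
# must estimate a distribution in m**k distinct contexts.
contexts = [256 ** k for k in range(4)]  # byte alphabet, orders 0..3
```

Already at order 2 a byte-alphabet model has 65,536 contexts to populate with reliable counts, which is why high-order models starve for samples.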

7
Estimating probabilities in different contexts
  • Two approaches:
  • Maintain symbol occurrence counts within each
    context
  • the number of contexts must be kept modest to avoid
    context dilution
  • Assume the pdf within each context has the same
    shape (e.g. Laplacian), with only the parameters
    (e.g. mean and variance) differing
  • estimation may be less accurate, but a much larger
    number of contexts can be used
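The second, parametric approach can be sketched by fitting only a Laplacian scale parameter per context. This sketch assumes zero-mean data (as is typical for prediction residuals); the function and context names are illustrative:

```python
from collections import defaultdict

def laplacian_scale_by_context(pairs):
    """For each context, estimate only the Laplacian scale
    b = mean(|x|) (zero mean assumed). One parameter per context
    instead of a full histogram needs far fewer samples, so many
    more contexts can be used."""
    residuals = defaultdict(list)
    for ctx, x in pairs:
        residuals[ctx].append(x)
    return {ctx: sum(abs(x) for x in xs) / len(xs)
            for ctx, xs in residuals.items()}

b = laplacian_scale_by_context([("smooth", 1), ("smooth", -1),
                                ("edge", 8), ("edge", -4)])
# Small scale in the "smooth" context, large scale near an "edge".
```

The coder then uses the fitted scale to generate the per-context distribution, trading some per-context accuracy for a much finer context partition.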

8
Entropy (Shannon 1948)
  • For a random variable X with alphabet A_X, the
    entropy is the average self-information:
    H(X) = -Σ P(x) log2 P(x), summed over x in A_X.
9
Conditional Entropy
  • Consider two random variables X and Y
  • Alphabet of X: A_X = {x_0, x_1, …, x_N-1}
  • Alphabet of Y: A_Y = {y_0, y_1, …, y_M-1}
  • The conditional self-information of x given y is
    i(x|y) = -log2 P(x|y)
  • Conditional entropy H(X|Y) is the average value of
    the conditional self-information:
    H(X|Y) = -Σ_x Σ_y P(x, y) log2 P(x|y)

10
Entropy and Conditional Entropy
  • The conditional entropy H(X|Y) can be interpreted
    as the amount of uncertainty remaining about X,
    given that we know the random variable Y.
  • The additional knowledge of Y should reduce the
    uncertainty about X: H(X|Y) ≤ H(X).
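Both quantities, and the inequality between them, can be checked numerically with plug-in estimates from sample frequencies; this is an illustrative sketch:

```python
import math
from collections import Counter

def entropy(xs):
    """H(X) = -sum p(x) log2 p(x), estimated from sample frequencies."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def conditional_entropy(pairs):
    """H(X|Y) = -sum p(x, y) log2 p(x|y) over observed (x, y) pairs."""
    n = len(pairs)
    pxy = Counter(pairs)
    py = Counter(y for _, y in pairs)
    # p(x|y) = count(x, y) / count(y)
    return -sum((c / n) * math.log2(c / py[y]) for (_, y), c in pxy.items())

# X is fully determined by Y here, so knowing Y removes all uncertainty.
pairs = [(0, "a"), (1, "b"), (0, "a"), (1, "b")]
hx = entropy([x for x, _ in pairs])    # 1.0 bit: X is fair 0/1
hxy = conditional_entropy(pairs)       # 0.0 bits given Y
```

This is precisely why context-based coders aim at the conditional entropy rather than the marginal one: the context Y can only lower the achievable rate.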

11
Context Based Entropy Coders
  • Consider a sequence of symbols x_1, x_2, …, x_n,
    each coded with a probability estimate conditioned
    on its context.

12
Decorrelation techniques to exploit sample
smoothness
  • Transforms
  • DCT, FFT
  • wavelets
  • Differential Pulse Code Modulation (DPCM)
  • predict current symbol with past observations
  • code prediction residual rather than the symbol
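The DPCM idea above can be sketched minimally; the previous-sample predictor used here is the simplest illustrative choice, not necessarily the one intended by the slides:

```python
def dpcm_encode(samples):
    """Predict each sample by the previous one and emit the residual.
    For smooth data the residuals cluster near zero, so they have
    lower entropy than the raw samples."""
    prev, residuals = 0, []
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals

def dpcm_decode(residuals):
    """Invert the prediction: accumulate residuals back into samples."""
    prev, out = 0, []
    for r in residuals:
        prev += r
        out.append(prev)
    return out

r = dpcm_encode([10, 12, 13, 13, 11])   # [10, 2, 1, 0, -2]
restored = dpcm_decode(r)               # original samples back
```

The residual stream, not the samples, is what gets entropy-coded; its concentrated distribution is what the Laplacian context models of slide 7 are fitted to.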

13
Benefits of prediction and transform
  • A priori knowledge exploited to reduce the
    self-entropy of the source symbols
  • Higher coding efficiency due to
  • Fewer parameters to be estimated adaptively
  • Faster convergence of adaptation

14
Further Reading
  • Text Compression - T. Bell, J. Cleary and I.
    Witten, Prentice Hall. Good coverage of statistical
    context modeling, though focused on text.
  • Articles in IEEE Transactions on Information
    Theory by Rissanen and Langdon.
  • Digital Coding of Waveforms: Principles and
    Applications to Speech and Video - Jayant and
    Noll. Good coverage of predictive coding.