Adaptive Dictionary: - PowerPoint PPT Presentation

Slides: 39
Provided by: Joseph513

Transcript and Presenter's Notes


1
Adaptive Dictionary
  • LZ and LZW

2
Adaptive Dictionary
  • In 1977 and 1978, Jacob Ziv and Abraham Lempel
    published two papers that led to a compression
    scheme still widely used today
  • (1977) LZ77 or LZ1
  • (1978) LZ78 or LZ2
  • These techniques and their variations are used in
    many data compression applications, for example
  • File compression in UNIX (compress)
  • Image compression (GIF, Graphics Interchange
    Format)
  • Compression over modems (V.42 bis)

3
LZ77
  • This algorithm uses a portion of the previously
    encoded sequence as its dictionary
  • The encoder examines the input sequence through a
    sliding window
  • The sliding window consists of two parts
  • Search Buffer
  • Look-ahead Buffer

4
LZ77
  • Encoding Process
  • Move a pointer back through the search buffer
    until it finds a match for the first symbol in the
    look-ahead buffer (the symbol to be encoded)
  • Offset: distance of the pointer from the first
    symbol of the look-ahead buffer
  • The encoder then checks whether the symbols
    following the pointer match the symbols following
    the first symbol of the look-ahead buffer
  • Length: number of consecutive symbols in the
    search buffer that match consecutive symbols in
    the look-ahead buffer, starting with the first
  • The encoder remembers the longest match found so
    far and continues back through the search buffer
    to look for a longer one
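The pointer search above can be sketched as a brute-force Python function. This is a minimal illustration under our own assumptions, not code from the presentation: `longest_match` is a hypothetical name, offsets count back from the first look-ahead symbol, and the length is capped at one less than the look-ahead size so that a symbol is left over for the third element of the triple.

```python
def longest_match(data: str, pos: int, search_size: int, lookahead_size: int) -> tuple[int, int]:
    """Return (offset, length) of the longest match for data[pos:] whose start
    lies in the search buffer data[pos - search_size:pos]; (0, 0) if none."""
    # Assumption: leave at least one look-ahead symbol for the codeword c.
    max_len = min(lookahead_size - 1, len(data) - pos - 1)
    best_offset, best_len = 0, 0
    # Move the pointer back through the search buffer one offset at a time.
    for offset in range(1, min(search_size, pos) + 1):
        start = pos - offset
        length = 0
        # Compare consecutive symbols; start + length may run past pos,
        # i.e. the match may extend into the look-ahead buffer.
        while length < max_len and data[start + length] == data[pos + length]:
            length += 1
        # Keep only a strictly longer match, so ties go to the nearer offset.
        if length > best_len:
            best_offset, best_len = offset, length
    return best_offset, best_len
```

For the sequence cabracadabrarrarrad with S = 7 and a look-ahead buffer of 6, `longest_match("cabracadabrarrarrad", 8, 7, 6)` returns `(7, 4)`, the offset-7 match found in the encoding example.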

5
LZ77
  • Encoding Process (contd)
  • Once the search is complete, the encoder encodes
    the information with a triple
  • <o, l, c>
  • o: offset
  • l: length of the match
  • c: codeword of the symbol in the look-ahead
    buffer that follows the match

6
LZ77
  • Encoding Process (contd)
  • Note: the third element (c) is included in the
    triple to handle the case where no match is found
    in the search buffer (i.e., l = 0)
  • Sending a whole triple when only c needs to be
    encoded may seem inefficient. However, this case
    is uncommon because, in practice, search buffers
    are much larger than the ones in the examples in
    this presentation
  • The reason for this design will become clear
    with an example

7
LZ77
  • Encoding Process (contd)
  • Let S represent the size of the search buffer
  • Let W represent the size of the entire window
  • Let A represent the size of the alphabet
  • Using fixed-length codes, the triple is encoded
    using
  • $\lceil \log_2 S \rceil + \lceil \log_2 W \rceil + \lceil \log_2 A \rceil$
    bits
  • Note: $\lceil x \rceil$ is the ceiling function
  • $\lceil 3.5 \rceil = 4$ (ceiling function)
  • $\lfloor 3.5 \rfloor = 3$ (floor function)
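The formula is easy to evaluate in code. The helper below is a hypothetical illustration; the alphabet size of 256 (8-bit symbols) used in the example call is our assumption, since the presentation does not fix A.

```python
import math


def triple_bits(search_size: int, window_size: int, alphabet_size: int) -> int:
    """Bits for one fixed-length LZ77 triple <o, l, c>."""
    offset_bits = math.ceil(math.log2(search_size))    # ceil(log2 S) for o
    length_bits = math.ceil(math.log2(window_size))    # ceil(log2 W) for l
    symbol_bits = math.ceil(math.log2(alphabet_size))  # ceil(log2 A) for c
    return offset_bits + length_bits + symbol_bits


print(triple_bits(7, 13, 256))  # 3 + 4 + 8 = 15
```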

8
LZ77
  • Encoding Process (contd)
  • The second term, $\lceil \log_2 W \rceil$, may seem a bit
    strange: at first it may seem that it should be
    $\lceil \log_2 S \rceil$. However, the length of the match
    may extend into the look-ahead buffer. This will
    become clear in an example
  • There are 3 cases to consider in this algorithm
  • No match in the search buffer
  • There is a match within the search buffer
  • The match extends inside the look-ahead buffer
  • The following example outlines each of these cases

9
LZ77
  • Encoding Process (contd)
  • Example
  • Let W = 13, S = 7 (so the look-ahead buffer
    size is 6)
  • Suppose the sequence to be encoded is
  • cabracadabrarrarrad
  • There is no match in the search buffer for d.
    Thus, we transmit the triple <0, 0, C(d)>
  • Shift the window by 1 symbol

Search buffer | look-ahead buffer: cabraca | dabrar
10
LZ77
  • Encoding Process (contd)
  • Example (contd)
  • A match is found at o = 2, l = 1
  • Another match is found at o = 4, l = 1
  • Another match is found at o = 7, l = 4
  • Thus, we encode the triple <7, 4, C(r)>
  • Shift the window by 5 symbols

Search buffer | look-ahead buffer: abracad | abrarr
11
LZ77
  • Encoding Process (contd)
  • Example (contd)
  • A match is found at o = 1, l = 1
  • Another match is found at o = 3, l = 3 if we do
    not look further into the look-ahead buffer
  • However, if we do look into the look-ahead
    buffer, we can extend the length to 5
  • This resolves the question regarding the second
    term, $\lceil \log_2 W \rceil$, in the number of bits
    needed to encode the triple

Search buffer | look-ahead buffer: adabrar | rarrad
12
LZ77
  • Encoding Process (contd)
  • Example (contd)
  • Thus, we encode the triple <3, 5, C(d)>
  • If we continued encoding, we would shift the
    window by 6 symbols (l + 1)
  • Decoding Process
  • The decoding process is best understood by an
    example

Search buffer | look-ahead buffer: adabrar | rarrad
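The three encoding steps of the example can be reproduced with a short brute-force encoder. It is a sketch under stated assumptions: `lz77_encode` is a hypothetical name, the first `start` symbols are treated as already encoded (they fill the initial search buffer), the raw symbol stands in for its codeword C(c), and the match length is capped at the look-ahead size minus one.

```python
def lz77_encode(data: str, start: int, search_size: int, lookahead_size: int):
    """Encode data[start:] as LZ77 triples (o, l, c)."""
    triples = []
    pos = start
    while pos < len(data):
        # Leave one look-ahead symbol for c (assumption, see lead-in).
        max_len = min(lookahead_size - 1, len(data) - pos - 1)
        best_o, best_l = 0, 0
        for o in range(1, min(search_size, pos) + 1):
            l = 0
            # The copy source pos - o + l may run into the look-ahead buffer.
            while l < max_len and data[pos - o + l] == data[pos + l]:
                l += 1
            if l > best_l:
                best_o, best_l = o, l
        triples.append((best_o, best_l, data[pos + best_l]))
        pos += best_l + 1  # shift the window by l + 1 symbols
    return triples


print(lz77_encode("cabracadabrarrarrad", 7, 7, 6))
# [(0, 0, 'd'), (7, 4, 'r'), (3, 5, 'd')]
```

Shifting the window by l + 1 symbols after each triple is what produces the shifts of 1, 5 and 6 symbols in the example.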
13
LZ77
  • Decoding Process (contd)
  • Example
  • Assume we have already decoded the sequence
    cabraca and have received the triples
  • (1) <0, 0, C(d)>
  • (2) <7, 4, C(r)>
  • (3) <3, 5, C(d)>
  • Initially, we start at
  • (0)

cabraca
14
LZ77
  • Decoding Process (contd)
  • Example (contd)
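  • (0) cabraca
  • (1) <0, 0, C(d)>: no match, so append d
  • (2) <7, 4, C(r)>: move back 7 symbols, copy 4
    symbols (abra), then append r
  • (3) <3, 5, C(d)>: move back 3 symbols, copy 5
    symbols one at a time (rarra), then append d

(0) cabraca
(1) cabracad
(2) cabracadabrar
(3) cabracadabrarrarrad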
15
LZ77
  • Decoding Process (contd)
  • Example (contd)
  • (2) <7, 4, C(r)>

16
LZ77
  • Decoding Process (contd)
  • Example (contd)
  • (3) <3, 5, C(d)>
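Decoding needs only the triples and the text decoded so far. In the sketch below (`lz77_decode` is a hypothetical name, and raw symbols again stand in for C(c)), matched symbols are copied one at a time. That detail is what lets a triple with l > o, such as <3, 5, C(d)>, reuse symbols it has only just produced.

```python
def lz77_decode(decoded: str, triples) -> str:
    """Append the output of each (o, l, c) triple to an already-decoded prefix."""
    out = list(decoded)
    for o, l, c in triples:
        start = len(out) - o  # move the pointer back o symbols
        for k in range(l):
            # Copy one symbol at a time so the source may overlap the output.
            out.append(out[start + k])
        out.append(c)  # the symbol that followed the match
    return "".join(out)


print(lz77_decode("cabraca", [(0, 0, "d"), (7, 4, "r"), (3, 5, "d")]))
# cabracadabrarrarrad
```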

17
LZ77 - SUMMARY
  • In General
  • The algorithm is a simple adaptive scheme that
    requires no prior knowledge of the source and
    seems to require no assumptions about it
  • Ziv and Lempel showed that, asymptotically, the
    performance of this algorithm approaches the best
    that could be obtained by a scheme with full
    knowledge of the statistics of the source
  • This may be true asymptotically; in practice,
    however, there are ways to improve LZ77
  • There is a hidden assumption that patterns
    recur close together. We shall see that LZ78
    removes this assumption

18
LZ77 - SUMMARY
  • Variations
  • Efficient encoding of triples
  • With added complexity, we can drop the
    assumption that the triples are fixed length
  • PKZIP, Zip, LHarc, PNG, gzip, and ARJ all use
    LZ77 with a variable-length coder
  • Varying the size of the search and look-ahead
    buffers
  • Increasing the size of the search buffer requires
    more effective search strategies
  • Such strategies can be implemented more
    efficiently if the contents of the search buffer
    are stored in a form that supports fast searches
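One way to organize the search buffer for fast searches is a hash table from short symbol strings to the positions where they occur, so the encoder only tests positions that already share a prefix with the look-ahead buffer. zlib's deflate uses a related hash-chain scheme keyed on three bytes; the sketch below is a simplified, hypothetical version (`encode_with_index` and `KEY_LEN` are our names) that finds no matches shorter than `KEY_LEN` and never evicts old positions.

```python
from collections import defaultdict

KEY_LEN = 3  # hypothetical: positions are indexed by the 3 symbols starting there


def encode_with_index(data: str, search_size: int, lookahead_size: int):
    """LZ77 triples (o, l, c) where candidate match positions come from a
    hash table keyed on KEY_LEN symbols instead of a scan of every offset."""
    index = defaultdict(list)  # KEY_LEN-symbol string -> start positions, oldest first
    triples, pos, indexed = [], 0, 0
    while pos < len(data):
        # Add every position the window has moved past to the index.
        while indexed < pos:
            index[data[indexed:indexed + KEY_LEN]].append(indexed)
            indexed += 1
        max_len = min(lookahead_size - 1, len(data) - pos - 1)
        best_o, best_l = 0, 0
        if max_len >= KEY_LEN:
            # Visit candidates nearest first; stop once they leave the search buffer.
            for cand in reversed(index.get(data[pos:pos + KEY_LEN], [])):
                o = pos - cand
                if o > search_size:
                    break
                l = 0
                while l < max_len and data[cand + l] == data[pos + l]:
                    l += 1
                if l > best_l:
                    best_o, best_l = o, l
        triples.append((best_o, best_l, data[pos + best_l]))
        pos += best_l + 1
    return triples
```

On cabracadabrarrarrad with S = 7 and a look-ahead buffer of 6, starting from an empty search buffer, this returns eight literal triples followed by (7, 4, 'r') and (3, 5, 'd'). A full scan would also find two-symbol matches such as ca, which a three-symbol key cannot see; that is the trade-off for looking at far fewer positions.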

19
LZ77 - SUMMARY
  • Variations
  • Eliminate the symbol codeword from the triple
  • This can be done using a flag bit
  • With the flag bit, a triple is no longer needed:
    each item is encoded either as a single-symbol
    codeword or as a pair representing the match.
    For example
  • Flag = 1 → single-symbol codeword
  • Flag = 0 → pair <o, l> giving the match offset
    and length
  • This is referred to as LZSS
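The flag-bit scheme can be sketched as follows. Tokens are Python tuples rather than packed bits, `lzss_encode` and `lzss_decode` are hypothetical names, and the minimum match length of 2 is our assumption (a very short match can cost more than a literal, so some threshold is typical); the flag values follow the slide.

```python
MIN_MATCH = 2  # assumed threshold: shorter matches are sent as literals


def lzss_encode(data: str, search_size: int, lookahead_size: int):
    """Return tokens (1, symbol) for literals and (0, offset, length) for matches."""
    tokens, pos = [], 0
    while pos < len(data):
        # No symbol is reserved for c, so the whole look-ahead buffer can match.
        max_len = min(lookahead_size, len(data) - pos)
        best_o, best_l = 0, 0
        for o in range(1, min(search_size, pos) + 1):
            l = 0
            while l < max_len and data[pos - o + l] == data[pos + l]:
                l += 1
            if l > best_l:
                best_o, best_l = o, l
        if best_l >= MIN_MATCH:
            tokens.append((0, best_o, best_l))  # flag 0: pair <o, l>
            pos += best_l
        else:
            tokens.append((1, data[pos]))  # flag 1: single symbol
            pos += 1
    return tokens


def lzss_decode(tokens) -> str:
    out = []
    for token in tokens:
        if token[0] == 1:
            out.append(token[1])
        else:
            _, o, l = token
            start = len(out) - o
            for k in range(l):
                out.append(out[start + k])  # overlapping copy, one symbol at a time
    return "".join(out)
```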

20
LZ78
  • Updates to LZ77
  • The LZ77 assumption that patterns recur close
    together is dropped
  • LZ77 uses the recent past of the sequence as its
    dictionary for encoding
  • However, this means that any pattern that recurs
    over a period longer than the coder window will
    not be captured

21
LZ78
  • Updates to LZ77
  • Consider a pattern that recurs with a period just
    one symbol longer than the search window: if the
    window were one symbol longer, the repeat would be
    found. As it is, every symbol is encoded as a
    single symbol
  • LZSS: an additional 1-bit overhead per symbol
  • LZ77: a full triple for each single symbol
  • Thus, this problem actually causes expansion
    instead of compression
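The effect is easy to check with a hypothetical sequence of period 9. The helper below (our own, hypothetical `positions_with_match`) counts the positions at which LZ77 could find any match at all, that is, where the current symbol occurs somewhere in the search buffer.

```python
def positions_with_match(data: str, search_size: int) -> int:
    """Count positions whose symbol also occurs in the preceding search_size
    symbols, i.e. where LZ77 could find a match with l >= 1."""
    return sum(
        1
        for pos in range(len(data))
        if data[pos] in data[max(0, pos - search_size):pos]
    )


periodic = "abcdefghi" * 3  # hypothetical sequence with period 9
print(positions_with_match(periodic, 8))  # 0: every repeat is 9 symbols back
print(positions_with_match(periodic, 9))  # 18: every symbol after the first period
```

With a search buffer of 8, every one of the 27 symbols goes out with l = 0, so each costs a full triple (or a flag bit plus a codeword in LZSS) and the output is larger than the input.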

22
LZ78
  • Solution to this problem
  • LZ78 replaces the search buffer with an explicit
    dictionary
  • Note: the encoder and decoder must build the
    dictionary identically
  • The data is now encoded as a double (or pair)
  • <i, c>
  • i: index of the dictionary entry that matches the
    input (0 if there is no match)
  • c: codeword for the character following the
    matched portion of the input
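A minimal LZ78 encoder and decoder in Python. Assumptions: the dictionary starts empty and its entries are numbered from 1 (index 0 meaning "no match"), the raw character stands in for its codeword, and if the input ends in the middle of a match we emit that index with an empty character, a convention chosen only for this sketch. The function names are hypothetical.

```python
def lz78_encode(data: str):
    """Encode data as LZ78 pairs (i, c); i = 0 means no dictionary match."""
    dictionary = {}  # phrase -> index, numbered from 1 in order of creation
    pairs, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch  # the match continues; keep reading
            continue
        # Longest match ends here: send its index and the next character.
        pairs.append((dictionary.get(phrase, 0), ch))
        dictionary[phrase + ch] = len(dictionary) + 1  # new entry = match + ch
        phrase = ""
    if phrase:  # assumed convention: flush a trailing match with an empty c
        pairs.append((dictionary[phrase], ""))
    return pairs


def lz78_decode(pairs) -> str:
    """Rebuild the same dictionary while decoding."""
    entries = [""]  # entries[0] is the empty phrase
    out = []
    for i, c in pairs:
        phrase = entries[i] + c
        out.append(phrase)
        entries.append(phrase)
    return "".join(out)
```

As a test input we use wabba␢wabba␢wabba␢wabba␢woo␢woo␢woo (a plain space for ␢), the sequence implied by the LZW encoder output later in the presentation; that it is also the word in this LZ78 example is our assumption. It encodes to 17 pairs, beginning <0, C(w)>, <0, C(a)>, <0, C(b)>, <3, C(a)>, <0, C(␢)>.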

23
LZ78
  • Example
  • Let us encode the following sequence
  • The character ␢ (a b with a slash) represents a
    space

24
LZ78
  • Example (contd)

25
LZ78
  • Example (contd)
  • Problems
  • The dictionary grows indefinitely
  • There are two options for resolving this problem
  • Pruning
  • However, added complexity is required to keep
    track of the most frequently used dictionary
    entries
  • Switching to a static dictionary (no new entries
    once it is full)
  • This limits the performance of the algorithm
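One way to read the static-dictionary option is to stop adding entries once the dictionary reaches a fixed size. The sketch below assumes that rule, with hypothetical names and a hypothetical `max_entries` parameter; the decoder must apply exactly the same rule, or the two dictionaries drift apart.

```python
def lz78_encode_capped(data: str, max_entries: int):
    """LZ78 pairs (i, c); the dictionary stops growing after max_entries."""
    dictionary = {}  # phrase -> index, numbered from 1
    pairs, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch
            continue
        pairs.append((dictionary.get(phrase, 0), ch))
        if len(dictionary) < max_entries:  # once full, the dictionary is static
            dictionary[phrase + ch] = len(dictionary) + 1
        phrase = ""
    if phrase:  # assumed convention: flush a trailing match with an empty c
        pairs.append((dictionary[phrase], ""))
    return pairs


def lz78_decode_capped(pairs, max_entries: int) -> str:
    entries, out = [""], []  # entries[0] is the empty phrase
    for i, c in pairs:
        phrase = entries[i] + c
        out.append(phrase)
        if len(entries) - 1 < max_entries:  # mirror the encoder's rule exactly
            entries.append(phrase)
    return "".join(out)
```

On the 35-symbol sequence wabba␢wabba␢wabba␢wabba␢woo␢woo␢woo, an unlimited dictionary gives 17 pairs, while a dictionary frozen at 4 entries gives 22, which illustrates the loss of performance.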

26
LZW
  • Variation of LZ78
  • Terry Welch proposed a method that removes the
    need to encode the pair <i, c>: only the index is
    encoded
  • The dictionary must be primed with the source
    alphabet
  • This variation is known as LZW

27
LZW
  • Encoding
  • Example

28
LZW
  • Encoding (contd)
  • Example (contd)

29
LZW
  • Encoding (contd)
  • Example (contd)
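A sketch of an LZW encoder. The input and initial dictionary used here are inferred from the encoder output listed on the decoding slide: the sequence wabba␢wabba␢wabba␢wabba␢woo␢woo␢woo with 1 = ␢, 2 = a, 3 = b, 4 = o, 5 = w (a plain space stands for ␢). `lzw_encode` is a hypothetical name, and the input is assumed to contain only alphabet symbols.

```python
def lzw_encode(data: str, alphabet):
    """LZW: the dictionary is primed with the alphabet (indices 1, 2, ...)
    and only indices are sent."""
    dictionary = {sym: i for i, sym in enumerate(alphabet, start=1)}
    codes, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the match
        else:
            codes.append(dictionary[phrase])  # index of the longest match
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ch  # the unmatched symbol starts the next match
    if phrase:
        codes.append(dictionary[phrase])
    return codes


ALPHABET = [" ", "a", "b", "o", "w"]  # 1 = space, 2 = a, 3 = b, 4 = o, 5 = w
print(lzw_encode("wabba wabba wabba wabba woo woo woo", ALPHABET))
```

With that input the function returns 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4, the same sequence as the decoding slide.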

30
LZW
  • Decoding
  • Example
  • The encoder output sequence from the previous
    example was 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4
    11 21 23 4
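The decoder rebuilds the same dictionary one step behind the encoder: each time it reads a code, it adds the previous output plus the first symbol of the current one. Below is a hypothetical sketch (`lzw_decode`), using the initial dictionary 1 = ␢, 2 = a, 3 = b, 4 = o, 5 = w inferred from this sequence; the `else` branch covers a code that refers to the entry still being built.

```python
def lzw_decode(codes, alphabet) -> str:
    """Rebuild the LZW dictionary while decoding; assumes codes is non-empty."""
    entries = {i: sym for i, sym in enumerate(alphabet, start=1)}
    prev = entries[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in entries:
            cur = entries[code]
        else:
            # The code names the entry not yet added: prev + first symbol of prev.
            cur = prev + prev[0]
        # Add the entry the encoder created at this point.
        entries[len(entries) + 1] = prev + cur[0]
        out.append(cur)
        prev = cur
    return "".join(out)


codes = [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]
print(lzw_decode(codes, [" ", "a", "b", "o", "w"]))  # wabba wabba wabba wabba woo woo woo
```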

31
LZW
  • LZW Problem
  • The algorithm breaks down in one particular case
  • Let A = {a, b}
  • Let the sequence to be encoded be
  • ababababababab ...

32
LZW
  • LZW Problem

33
LZW
  • LZW Problem
  • The encoded sequence is 1 2 3 5 ...
  • At this point everything is fine

34
LZW
  • LZW Problem
  • The encoded sequence is 1 2 3 5 ...
  • But wait! We do not have 5 in our dictionary

35
LZW
  • LZW Problem
  • The encoded sequence is 1 2 3 5 ...
  • We do have the beginning of the 5th entry, ab

36
LZW
  • LZW Problem
  • Entry 5 is ab followed by its own first symbol,
    so its last letter must be a (entry 5 = aba). We
    can now decode it and continue without further
    trouble

37
LZW
  • LZW Problem
  • Thus, the decoder must have an exception handler
    for this type of case
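The exception handler fits in a single branch. In the hypothetical sketch below (`lzw_decode_traced`), a code that is not yet in the dictionary can only be the entry about to be created, prev + x, and since that entry begins with prev, x must be the first symbol of prev. The function also reports which codes needed the special case.

```python
def lzw_decode_traced(codes, alphabet):
    """LZW decoder that also returns the codes that hit the special case."""
    entries = {i: sym for i, sym in enumerate(alphabet, start=1)}
    prev = entries[codes[0]]
    out, special = [prev], []
    for code in codes[1:]:
        if code in entries:
            cur = entries[code]
        else:
            # Exception handler: the code must be the entry being built,
            # prev + x, and x is the first symbol of that entry, i.e. prev[0].
            cur = prev + prev[0]
            special.append(code)
        entries[len(entries) + 1] = prev + cur[0]
        out.append(cur)
        prev = cur
    return "".join(out), special


print(lzw_decode_traced([1, 2, 3, 5], ["a", "b"]))  # ('abababa', [5])
```

Decoding 1 2 3 5 with 1 = a, 2 = b gives abababa, and code 5 is the only one that takes the exception path.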

38
References
  • K. Sayood, Introduction to Data Compression, 2nd
    Ed., Morgan Kaufmann Publishers, 2000