Modeling Delta Encoding of Compressed Files

1
Modeling Delta Encoding of Compressed Files
  • S.T. Klein, T.C. Serebro, D. Shapira

2
Delta Encoding
  • Example
  • S = The Prague Stringology Club
  • T = The Prague Stringology Conference 06
  • Δ(S,T) = (1, 24)onferenc(3,2)06
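
The example above can be checked mechanically. The following is a minimal sketch, assuming that each (pos, len) pair copies len characters of S starting at 1-based position pos and that everything else is a literal; the function name `apply_delta` and the list representation of the delta are illustrative choices, not part of the presentation.

```python
def apply_delta(source, delta):
    """Rebuild the target from `source` and a list of delta items."""
    out = []
    for item in delta:
        if isinstance(item, tuple):
            # (pos, len): copy `len` characters of S, position is 1-based
            pos, length = item
            out.append(source[pos - 1:pos - 1 + length])
        else:
            # anything else is treated as literal characters
            out.append(item)
    return "".join(out)


S = "The Prague Stringology Club"
delta = [(1, 24), "onferenc", (3, 2), "06"]
T = apply_delta(S, delta)
print(T)
```

Here (1, 24) copies "The Prague Stringology C" and (3, 2) copies the "e " that completes "Conference ".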

3
Compressed Differencing
  • Delta encoding: S, T → Δ(S,T)
  • Semi Compressed Differencing: E(S), T → Δ(S,T)
  • Full Compressed Differencing: E(S), E(T) → Δ(S,T)
  • Goal: create a delta file of S and T without decompressing the compressed files.

4
LZW compression
  • STR ← input character
  • WHILE there are input characters
  •     C ← input character
  •     IF STR·C is in T THEN
  •         STR ← STR·C
  •     ELSE
  •         output the code for STR
  •         add STR·C to T
  •         STR ← C
  • output the code for STR
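
The pseudocode above can be sketched in Python as follows. This is an illustration under two assumptions: the dictionary (called T in the pseudocode) is initialised with the alphabet a, b, c as codes 1, 2, 3, which matches the codewords used in the examples of this presentation, and new entries get the next free integer code. The name `lzw_compress` is hypothetical.

```python
def lzw_compress(text, alphabet="abc"):
    """LZW-compress `text` into a list of integer codewords."""
    # Assumed initial dictionary: a=1, b=2, c=3
    table = {ch: i + 1 for i, ch in enumerate(alphabet)}
    if not text:
        return []
    codes = []
    s = text[0]                              # STR <- input character
    for c in text[1:]:                       # WHILE there are input characters
        if s + c in table:                   # STR.C is in the dictionary
            s = s + c
        else:
            codes.append(table[s])           # output the code for STR
            table[s + c] = len(table) + 1    # add STR.C to the dictionary
            s = c
    codes.append(table[s])                   # output the code for the last STR
    return codes


print(lzw_compress("abccbaaabccba"))
```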

5
Example
  • S = abccbaaabccba
  • E(S) = 1233219571
6
Semi Compressed Differencing Algorithm
7
Example
  • E(S) = 1233219571, T = ccbbabccbabccbba
  • Δ(S,T) = (3,2) b (5,2) (9,3) (5,2) (9,3) b (5,2)
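
The body of the algorithm slide is not preserved in this transcript, so the following is only a sketch of one way to obtain the delta of this example, not the authors' procedure. It assumes that the LZW dictionary of S is rebuilt from the codewords of E(S) together with the 1-based position in S at which each phrase was created, and that T is then parsed greedily against that dictionary, with unmatched single characters emitted as literals. The function names and the alphabet coding a=1, b=2, c=3 are assumptions.

```python
def s_phrase_positions(codes_s, alphabet="abc"):
    """Map each multi-character LZW phrase of S to its position in S (1-based)."""
    strings = {i + 1: ch for i, ch in enumerate(alphabet)}
    positions, pos = {}, 1
    for old, cur in zip(codes_s, codes_s[1:]):
        w = strings[old]
        # First character of the next phrase; if `cur` is the entry being
        # created right now, that character equals the first character of w.
        k = strings[cur][0] if cur in strings else w[0]
        strings[len(strings) + 1] = w + k
        positions[w + k] = pos
        pos += len(w)
    return positions


def semi_compressed_delta(codes_s, target, alphabet="abc"):
    """Greedy parse of the uncompressed target against S's LZW phrases."""
    positions = s_phrase_positions(codes_s, alphabet)
    delta, i = [], 0
    while i < len(target):
        length = 1
        # The phrase set is prefix-closed, so extend one character at a time.
        while i + length < len(target) and target[i:i + length + 1] in positions:
            length += 1
        if length == 1:
            delta.append(target[i])                       # literal character
        else:
            delta.append((positions[target[i:i + length]], length))
        i += length
    return delta


print(semi_compressed_delta([1, 2, 3, 3, 2, 1, 9, 5, 7, 1], "ccbbabccbabccbba"))
```

On the example input this yields the same pairs and literals as the delta listed above.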
8
Full Compressed Differencing Algorithm
  • 1 construct the trie of E(S)
  • 2 flag ← 0 // output character k
  • 3 counter ← 1 // position in T
  • 4 input oldcw from E(T)
  • 5 while oldcw ≠ NULL // still processing E(T)
  • 5.1 input cw from E(T)
  • 5.2 node ← Dictionary[oldcw]
  • 5.3 if (Dictionary[cw] ≠ NULL)
  • 5.3.1 k ← first character of string corresponding to Dictionary[cw]
  • 5.4 else
  • 5.4.1 k ← first character of string corresponding to node
  • 5.5 if ((node has a child k) and (cw ≠ NULL))
  • 5.5.1 output (pos+flag, len-flag) corresponding to child k of node
  • 5.5.2 flag ← 1
  • 5.6 else
  • 5.6.1 output (pos+flag, len-flag) corresponding to node
  • 5.6.2 create a new child of node corresponding to k
  • 5.6.3 flag ← 0
  • 5.7 oldcw ← cw
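
The steps above can be sketched in Python. This is an illustration under several assumptions that the slide does not spell out:

  • The alphabet is coded a=1, b=2, c=3.
  • oldcw takes the value of cw after each iteration.
  • The counter advances by the length of the phrase just processed.
  • A reference "corresponding to node" points at the position in T where that phrase entered T's dictionary.
  • A reference "corresponding to child k" points into S. By the LZW construction, a child added from T cannot match before the last codeword.
  • Outputs of length 0 are dropped.

The tags "S" and "T" stand for the <pos, len> and (pos, len) pairs of the worked example, and all function names are hypothetical.

```python
def s_phrase_positions(codes_s, alphabet="abc"):
    """Step 1: phrases of S's LZW dictionary -> position in S (1-based)."""
    strings = {i + 1: ch for i, ch in enumerate(alphabet)}
    positions, pos = {}, 1
    for old, cur in zip(codes_s, codes_s[1:]):
        w = strings[old]
        k = strings[cur][0] if cur in strings else w[0]
        strings[len(strings) + 1] = w + k
        positions[w + k] = pos
        pos += len(w)
    return positions


def full_compressed_delta(codes_s, codes_t, alphabet="abc"):
    s_pos = s_phrase_positions(codes_s, alphabet)            # step 1
    # T's dictionary: code -> (phrase, position in T where it was created)
    t_dict = {i + 1: (ch, None) for i, ch in enumerate(alphabet)}
    out, flag, counter = [], 0, 1                            # steps 2-3
    for i, oldcw in enumerate(codes_t):                      # steps 4-5
        cw = codes_t[i + 1] if i + 1 < len(codes_t) else None    # 5.1
        w, w_pos = t_dict[oldcw]                             # 5.2
        k = t_dict[cw][0][0] if cw in t_dict else w[0]       # 5.3-5.4
        child = w + k
        if child in s_pos and cw is not None:                # 5.5
            out.append(("S", s_pos[child] + flag, len(child) - flag))
            flag = 1
        else:                                                # 5.6
            if len(w) - flag > 0:                            # skip empty output
                if len(w) == 1:
                    out.append(w)                            # literal character
                else:
                    out.append(("T", w_pos + flag, len(w) - flag))
            flag = 0
        if cw is not None:
            t_dict[len(t_dict) + 1] = (child, counter)       # new phrase of T
        counter += len(w)                                    # assumed update
    return out


delta = full_compressed_delta([1, 2, 3, 3, 2, 1, 9, 5, 7, 1],
                              [3, 3, 2, 2, 1, 2, 4, 7, 9, 5, 7])
print(delta)
```

With E(S) = 1233219571 and E(T) = 33221247957 the sketch produces the ten pairs that the worked example in this presentation ends with.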

9
Example
  • E(S) = 1233219571, E(T) = 33221247957

10
Example
  • E(S) = 1233219571, E(T) = 33221247957
  • S = abccbaaabccba
  • T = c, oldcw = 3
  • T = cc, oldcw = 3, cw = 3
  • T = cc, oldcw = 3, cw = 3, k = c
11
Example
  • E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
  • T = cc, oldcw = 3, cw = 3, k = c
  • Δ(S,T) = <3, 2>
  • Dictionary: 4 (1,2,c)
12
Example
  • E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
  • T = cc, oldcw = 3, cw = 3, flag = 1, k = c
  • T = ccb, oldcw = 3, cw = 3, flag = 1, k = c
  • Δ(S,T) = <3, 2>
  • Dictionary: 4 (1,2,c)
13
Example
  • E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
  • T = ccb, oldcw = 3, cw = 2, flag = 1, k = c
  • T = ccb, oldcw = 3, cw = 2, flag = 1, k = b
  • Δ(S,T) = <3, 2> <5, 1>
  • Dictionary: 4 (1,2,c), 5 (2,2,c)
14
Example
  • E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
  • T = ccb, oldcw = 3, cw = 2, flag = 1, k = b
  • T = ccbb, oldcw = 3, cw = 2, flag = 1, k = b
  • T = ccbb, oldcw = 2, cw = 2, flag = 1, k = b
  • T = ccbb, oldcw = 2, cw = 2, flag = 0, k = b
  • Δ(S,T) = <3, 2> <5, 1> <b, 0>
  • Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b)
15
Example
  • E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
  • T = ccbb, oldcw = 2, cw = 2, flag = 0, k = b
  • T = ccbba, oldcw = 2, cw = 2, flag = 0, k = b
  • T = ccbba, oldcw = 2, cw = 1, flag = 0, k = a
  • T = ccbba, oldcw = 2, cw = 1, flag = 1, k = a
  • Δ(S,T) = <3, 2> <5, 1> <5, 2>
  • Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b), 7 (4,2,b)
16
Example
  • E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
  • T = ccbba, oldcw = 2, cw = 1, flag = 1, k = a
  • T = ccbbab, oldcw = 2, cw = 1, flag = 1, k = a
  • T = ccbbab, oldcw = 1, cw = 2, flag = 1, k = b
  • Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1>
  • Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b), 7 (4,2,b), 8 (5,2,a)
17
Example
  • E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
  • T = ccbbabcc, oldcw = 2, cw = 4, flag = 1, k = c
  • Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1> <3, 1>
  • Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b), 7 (4,2,b), 8 (5,2,a), 9 (6,2,b)
18
Example
  • E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
  • T = ccbbabccba, oldcw = 4, cw = 7, flag = 1, k = b
  • T = ccbbabccba, oldcw = 4, cw = 7, flag = 0, k = b
  • Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1> <3, 1> (2, 1)
  • Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b), 7 (4,2,b), 8 (5,2,a), 9 (6,2,b)
19
Example
  • E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
  • T = ccbbabccbabc, oldcw = 7, cw = 9, flag = 0, k = b
  • T = ccbbabccbabccb, oldcw = 9, cw = 5, flag = 0, k = c
  • T = ccbbabccbabccb, oldcw = 9, cw = 5, flag = 1, k = c
  • T = ccbbabccbabccbba, oldcw = 5, cw = 7, flag = 1, k = b
  • T = ccbbabccbabccbba, oldcw = 7, cw = NULL, flag = 0, k = b
  • Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1> <3, 1> (2, 1) (4, 2) <9, 3> (3, 1) (4, 2)
  • Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b), 7 (4,2,b), 8 (5,2,a), 9 (6,2,b), 10 (7,3,c), 12 (11,3,b)
20
Combination of Pairs
  • If two consecutive ordered pairs are of the form <pos, len1> and <pos + len1, len2>, we combine them into a single ordered pair <pos, len1 + len2>.
  • Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1> <3, 1> (2, 1) (4, 2) <9, 3> (3, 1) (4, 2)
  • <3, 2> <5, 1> → <3, 3>
  • S = abccbaaabccba
21
Combination of Pairs
  • If two consecutive ordered pairs are of the form <pos, len1> and <pos + len1, len2>, we combine them into a single ordered pair <pos, len1 + len2>.
  • Δ(S,T) = <3, 3> <5, 2> <2, 1> <3, 1> (2, 1) (4, 2) <9, 3> (3, 1) (4, 2)
  • <2, 1> <3, 1> → <2, 2>
  • S = abccbaaabccba
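
The combination rule can be illustrated with a short sketch. It assumes each pair carries a tag saying which file it points into ("S" or "T", a hypothetical representation) and that only adjacent pairs with the same tag are merged; the slides show the rule applied to pairs pointing into S, so extending it to pairs pointing into T is an assumption of this sketch.

```python
def combine_pairs(items):
    """Merge consecutive (tag, pos, len) pairs that cover adjacent ranges."""
    merged = []
    for item in items:
        prev = merged[-1] if merged else None
        if (isinstance(item, tuple) and isinstance(prev, tuple)
                and item[0] == prev[0]                  # same file
                and item[1] == prev[1] + prev[2]):      # starts where prev ends
            merged[-1] = (prev[0], prev[1], prev[2] + item[2])
        else:
            merged.append(item)                         # literal or no merge
    return merged


print(combine_pairs([("S", 3, 2), ("S", 5, 1), ("S", 5, 2)]))
```

Because each new item is compared with the already merged predecessor, runs of three or more adjacent pairs collapse into one.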
22
Encoding the delta file
File consists of
  • (pos, len) in S
  • (pos, len) in T
  • Characters
  • Flags
23
Experiments
  • S = xfig.3.2.1
  • T = xfig.3.2.2
  • T = 812K
  • Gzip(T) = 325K
  • LZW(T) = 497K
  • Δ(S,T) ? 3K