Title: Flexible Media Compression
1Flexible Media Compression State of the Art
- Jin Li
- Microsoft Research
2Outline
- Introduction
- Media compression: any work to be done?
- Flexible media compression
- Image compression: JPEG 2000 and delivery
- Compression of the 3D environment
- Audio compression (functionality demo)
- Conclusion
3Media Compression So Many Standards
Audio/Speech: MP3, WMA, Real, G.723, G.722.1, MPEG-4 audio, etc.
Image: JPEG, BMP, GIF, JPEG 2000, etc.
Video: MPEG-1, MPEG-2, MPEG-4, H.261, H.263, WMV, Real, AVI, QuickTime
4Media vs. File Compression
- What's the difference between media and file compression?
- File compression
- Every bit is important, has to be compressed losslessly
- Media compression
- Exact bit/value is not important, distortion is tolerable
- Amount of media is huge, high compression ratio is required
- Often needs to be manipulated
5Example Image
[Figure: Lena image (512x512) shown as a 2D array of data, with sample values 167, 123, 84, 200]
6Media Representation
Subsample (128x128)
Manipulation
Reposition (256,256)-(384,384)
Image (512x512)
Compress (JPEG)
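The manipulations named on this slide can be sketched on a plain 2D array. This is a minimal illustration, assuming that "Subsample (128x128)" means keeping every 4th pixel of the 512x512 image and that "(256,256)-(384,384)" denotes a window with exclusive lower-right corner; the function names and the synthetic image are hypothetical.

```python
def subsample(image, factor):
    """Keep every `factor`-th pixel in both directions (no filtering)."""
    return [row[::factor] for row in image[::factor]]

def crop(image, top, left, bottom, right):
    """Return rows top..bottom-1 and columns left..right-1."""
    return [row[left:right] for row in image[top:bottom]]

# Synthetic 512x512 stand-in for the image (values 0..255).
image = [[(x + y) % 256 for x in range(512)] for y in range(512)]

small = subsample(image, 4)                # 128x128 overview
region = crop(image, 256, 256, 384, 384)   # 128x128 window of the original
```

Both operations are trivial on raw pixels; the point of the following slides is that they become hard once the image is stored as a compressed bitstream.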
7Flexible Media Compression
- So media needs to be manipulated; how is this related to its compression?
- Many current compression standards generate bitstreams that cannot be manipulated
- The compressed media should be flexibly adjustable to match the requirements of the
- Client device
- Network channel
- Storage device
8Flexible Media Compression
- A challenging task
- Flexible
- More functionality
- Yet as efficient as possible
9Examples
- Work done
- Flexible image compression
- Flexible environment compression
- Flexible audio compression
- Flexible compression can be achieved without the
loss of compression efficiency.
10Flexible Image Compression JPEG 2000 Vmedia
11How to Efficiently Browse a Large Image
- User specifies a region of interest (ROI)
- The image is compressed into scalable units
- Only the bitstream required for the current view is delivered
12Key Technologies
- Compress an image into a set of scalable units
- JPEG 2000
- Deliver and manage bitstream segments
- Vmedia
13The benefit of JPEG 2000 over JPEG
- Achieve better efficiency
- Superior low bit-rate performance compared with JPEG
- Better visual performance (visual tools)
- Handle more types of image
- Provide many new useful functionalities
14The benefit of JPEG 2000 over JPEG (2)
- Provide many new useful functionalities
- Lossless compression
- Progressive transmission
- By quality, visual and resolution
- Progression to lossless
- Region of interest (ROI)
- Encoder: code a certain region with high quality
- Decoder: access and processing
- Progressive ROI access: arbitrary access to a certain area, decoding resolution and quality level
- Robustness to bit errors
15JPEG 2000 Framework
Transform
Quantization
Entropy Coding
Image
Bitstream Assembler
. . .
Compressed bitstream
16Transform
Transform Coeff. 4123, -12.4, -96.7, 4.5,
Original 128, 129, 125, 64, 65,
17Quantization
Quantized Coeff. (Q=64) 64, 0, -1, 0,
Transform Coeff. 4123, -12.4, -96.7, 4.5,
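The numbers on this slide are consistent with dividing each coefficient by a step of 64 and truncating toward zero. The sketch below assumes exactly that rule; the midpoint reconstruction in `dequantize` is an added assumption for illustration and is not stated in the slides.

```python
def quantize(coeff, step):
    """Truncate |coeff| / step toward zero and keep the sign."""
    magnitude = int(abs(coeff) // step)
    return -magnitude if coeff < 0 else magnitude

def dequantize(index, step):
    """Reconstruct at the bin midpoint (an assumption); zero stays zero."""
    if index == 0:
        return 0.0
    sign = -1 if index < 0 else 1
    return sign * (abs(index) + 0.5) * step

coeffs = [4123, -12.4, -96.7, 4.5]
indices = [quantize(c, 64) for c in coeffs]   # [64, 0, -1, 0]
```

Small coefficients such as -12.4 and 4.5 collapse to zero, which is where most of the compression gain comes from.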
18Two Tier Entropy Coding
Res1
Res2
. . .
Encode each block separately and record the bitstream of each block. Block size is 64x64.
19Bitstream Assembler
[Figure: bitstream assembler diagram with labels D1 to D4 and R1 to R4]
20Assemble the Bitstream
Res1
Res2
- Bitstream
- Rate-distortion optimized, for progressive by quality
- May be reordered
- Region with resolution access, progressive by quality
. . .
Encode each block separately and record a bitstream for each block
21A Sample JPEG 2000 Bit Stream
22Delivery of Flexibly Compressed Image
- Technology for interactive browsing
- Find content related to the current view
- Deliver content efficiently
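Finding the content related to the current view can be sketched as a block lookup. This is a simplified illustration that assumes the scalable units are 64x64 blocks laid out on a regular grid in image coordinates (the 64x64 size comes from the entropy coding slide; in JPEG 2000 the blocks live in the wavelet domain, so a real mapping has an extra step). The function name and the view tuple layout are hypothetical.

```python
def blocks_in_view(view, block_size=64):
    """view = (left, top, right, bottom) in pixels, right/bottom exclusive.

    Returns the (block_row, block_col) pairs whose area overlaps the view.
    """
    left, top, right, bottom = view
    first_col, last_col = left // block_size, (right - 1) // block_size
    first_row, last_row = top // block_size, (bottom - 1) // block_size
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A 200x90 pixel view touches 3 block rows and 4 block columns.
needed = blocks_in_view((100, 40, 300, 130))
```

Only the bitstream segments of the blocks in `needed` would have to be requested for this view.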
23Problem
- Need to deliver many segments of bitstream
- Bitstream segments need to be delivered in a prioritized way
- Need to cache the delivered bitstream segments
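The three requirements above (many segments, prioritized delivery, caching) can be sketched with a priority queue and a dictionary. This is not the Vmedia API; the class, its methods and the segment names are hypothetical, and a lower number is assumed to mean higher priority.

```python
import heapq

class SegmentScheduler:
    """Deliver requested bitstream segments by priority and cache them."""

    def __init__(self):
        self._queue = []   # entries are (priority, arrival order, segment id)
        self._cache = {}   # segment id -> delivered bytes
        self._order = 0

    def request(self, segment_id, priority):
        if segment_id in self._cache:
            return  # already delivered, nothing to do
        heapq.heappush(self._queue, (priority, self._order, segment_id))
        self._order += 1

    def deliver_next(self, fetch):
        """Fetch the most urgent segment not yet cached; return its id or None."""
        while self._queue:
            _, _, segment_id = heapq.heappop(self._queue)
            if segment_id in self._cache:
                continue  # duplicate request for a segment we already hold
            self._cache[segment_id] = fetch(segment_id)
            return segment_id
        return None

fetched = []

def fake_fetch(segment_id):
    """Stand-in for a network read; records what was actually transferred."""
    fetched.append(segment_id)
    return b"segment-bytes"

scheduler = SegmentScheduler()
scheduler.request("block7-layer2", priority=5)
scheduler.request("block7-layer1", priority=1)
scheduler.request("block7-layer1", priority=3)   # duplicate request
delivered = [scheduler.deliver_next(fake_fetch) for _ in range(3)]
```

The duplicate request is absorbed by the cache, so each segment crosses the network once.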
24Vmedia
[Figure: media server with Vmedia, connected over the network to Client 1 ... Client n, each running a media program on top of Vmedia]
Most work done at client end
25Virtual Media Concept
26Cache Management
27JPEG 2000 Interactive Image Browser
28Initial Stage
- Read file header and media structure
29Initial Stage
- File header and packet headers marked by companion file
- Read with synchronous mode
30Entire Image Low Res
31Zooming In
32Panning Around
33Demo
Application
Browser
34Flexible Environment Compression
35Concentric Mosaic
Camera
Beam
36CM Data
37CM Rendering Engine
Camera trajectory
Inner circle
38CM Rendering Engine
Camera Path
Inner circle
Environment
39Challenge for Concentric Mosaic Compression
- Specialized data structure
- Large amount of data, even for one scene
- Random access
- Image is displayed as whole
- Video is accessed frame by frame
- A concentric mosaic data set is best kept in compressed form, and decoded and rendered just-in-time (JIT)
40Principles of Good Concentric Mosaic Coder
- High compression ratio
- Just-in-time rendering and decoding
- Access and decode only the content needed to render the current view
- Fast decoding operation
- Random bitstream access and delivery
41Block Coding
- Vector quantization
- Color quantization, S3TC, etc
- Block transform based coding
42Block Coding
- Description
- Encode each block at fixed length
- Good candidate for image cache
- Advantage
- Simple system
- Easy bitstream indexing and access
- Fast encoding and decoding
- Disadvantage
- Low compression ratio (around 4:1-50:1)
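Fixed-length block coding makes indexing easy because the position of a block in the bitstream is pure arithmetic. A minimal sketch, assuming blocks are stored in raster order after an optional header; all names and sizes here are hypothetical.

```python
def block_offset(block_row, block_col, blocks_per_row, bytes_per_block,
                 header_bytes=0):
    """Byte offset of a block when every block has the same coded length."""
    index = block_row * blocks_per_row + block_col
    return header_bytes + index * bytes_per_block

def read_block(bitstream, block_row, block_col, blocks_per_row,
               bytes_per_block, header_bytes=0):
    """Slice one coded block out of the bitstream without parsing the rest."""
    start = block_offset(block_row, block_col, blocks_per_row,
                         bytes_per_block, header_bytes)
    return bitstream[start:start + bytes_per_block]

# Toy bitstream: 16 blocks of 4 bytes each, 4 blocks per row.
bitstream = bytes(range(64))
block = read_block(bitstream, 1, 2, blocks_per_row=4, bytes_per_block=4)
```

No index table is needed, which is why this family is a good candidate for an image cache.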
43Vector Quantization
- Advantage
- Fast and simple decoding
- Disadvantage
- Need to record the lookup table
- Lookup table grows if subblock is large or required quality is high
- Complex in encoding
[Figure: each subblock is coded as an index (0, 1, 2, 3, ...) into a lookup table]
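The asymmetry between decoding and encoding in vector quantization can be sketched as follows. This is an illustration only: the four-entry codebook of flattened 2x2 subblocks is made up, and the encoder uses a plain exhaustive nearest-neighbor search.

```python
# Hypothetical lookup table: each entry is a flattened 2x2 subblock.
codebook = [
    (0, 0, 0, 0),
    (255, 255, 255, 255),
    (0, 255, 0, 255),
    (255, 0, 255, 0),
]

def vq_decode(indices, codebook):
    """Decoding is a table lookup per subblock."""
    return [codebook[i] for i in indices]

def vq_encode(subblocks, codebook):
    """Encoding searches the whole table for the closest entry."""
    def squared_error(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)),
                key=lambda i: squared_error(block, codebook[i]))
            for block in subblocks]

indices = vq_encode([(10, 5, 0, 3), (250, 240, 255, 251)], codebook)
decoded = vq_decode(indices, codebook)
```

The encoder cost grows with the table size, while the decoder cost stays at one lookup per subblock.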
44S3TC (Used in DirectX)
- Advantage
- Fast and simple decoding
- Easy encoding
- Disadvantage
- Limited compression ratio
Rule: 00 = C0, 01 = 2/3 C0 + 1/3 C1, 10 = 1/3 C0 + 2/3 C1, 11 = C1
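Reading the rule on this slide as a mapping from 2-bit codes to a blend of two endpoint colors C0 and C1 (00 gives C0, 01 gives 2/3 C0 + 1/3 C1, 10 gives 1/3 C0 + 2/3 C1, 11 gives C1), decoding can be sketched as below. The sketch uses 8-bit RGB tuples and rounded integer division for readability; the actual S3TC format packs its endpoints and indices differently, and the function names are hypothetical.

```python
def palette(c0, c1):
    """Colors for codes 0b00, 0b01, 0b10, 0b11, following the slide's rule."""
    def blend(a, b):
        # (2*a + b) / 3 per channel, rounded to the nearest integer.
        return tuple((2 * x + y + 1) // 3 for x, y in zip(a, b))
    return [c0, blend(c0, c1), blend(c1, c0), c1]

def decode_block(c0, c1, codes):
    """Decode a list of 2-bit codes into colors with one lookup per pixel."""
    colors = palette(c0, c1)
    return [colors[code] for code in codes]

black, white = (0, 0, 0), (255, 255, 255)
pixels = decode_block(black, white, [0b00, 0b01, 0b10, 0b11])
```

Every pixel costs a fixed 2 bits plus its share of the two endpoints, which explains both the fast decoding and the limited compression ratio.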
45Transform Based Coding
Quantization
- Advantage
- Higher compression ratio
- Disadvantage
- More complex
46Reference Block Coding - Structure
P A P P P P P P P A P P P P P P P A P P
47Reference Block Coding Rendering
Shot sequence: P P A P P P P P P P A P P P P P P P A P
Rendering engine
48Reference Block Coding
- Characteristics
- Macroblock coding similar to MPEG
- Modifications
- P frame only refers to nearby A frames
- Global panning and local motion improve efficiency
- Index for A and P frame MBG for random data access
- Advantage
- High compression ratio (40:1 to 200:1)
- Leverage existing hardware
- Able to JIT decode the compressed bitstream
- Disadvantage
- Relatively complex (compared to block coding)
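Because a P frame only refers to A frames, rendering one shot never requires decoding a chain of earlier P frames. The sketch below assumes that "nearby" means the single closest A frame, with ties going to the earlier one; the slides do not say whether a P frame may use more than one reference. Function names are hypothetical.

```python
def nearest_anchor(structure, shot):
    """Index of the closest 'A' frame to `shot` (earlier one wins a tie)."""
    anchors = [i for i, kind in enumerate(structure) if kind == "A"]
    return min(anchors, key=lambda a: (abs(a - shot), a))

def shots_to_decode(structure, shot):
    """Frames that must be decoded to render `shot`."""
    if structure[shot] == "A":
        return [shot]
    return [nearest_anchor(structure, shot), shot]

# Frame structure taken from the slide.
structure = "P A P P P P P P P A P P P P P P P A P P".split()
needed_for_4 = shots_to_decode(structure, 4)   # P frame close to the A at 1
needed_for_9 = shots_to_decode(structure, 9)   # an A frame itself
```

At most two frames are touched per shot, which keeps just-in-time decoding bounded regardless of the length of the sequence.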
49High Dimensional Transform Coding (3D Wavelet)
- Framework
- Frame alignment: smart rebinning
- Quick decoding: progressive inverse wavelet synthesis
50Why 3D Wavelet?
- Good decorrelation and energy compaction
- Easily designed quantization and coding algorithms
- Better flexibility
- Error resilience
513D Wavelet Compression System
523D Wavelet Transform
Fn
F3
F2
F1
F0
53Lifting Implementation
An example of biorthogonal 9-7 filter
[Figure: lifting lattice mapping the original samples x0 ... x8 to interleaved low-pass (L0 ... L4) and high-pass (H0 ... H3) coefficients, with lifting weights a, b, c, d]
- No auxiliary memory is needed
- Easy to do inverse transform
- Exactly the same result as convolution with half the computational complexity
- Convolution: on average, every node requires 4.5 multiplications and 7 additions
- Lifting: on average, every node requires 2 multiplications and 4 additions
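A minimal sketch of 9-7 lifting, assuming the commonly quoted lifting weights for this filter, symmetric extension at the borders, and no final scaling step. Whether these weights correspond one-to-one to the a, b, c, d of the slide is an assumption. Each node is lifted twice with one multiplication and two additions per lift, and the inverse simply replays the steps backwards with negated weights.

```python
# Commonly quoted lifting weights for the biorthogonal 9-7 filter.
ALPHA, BETA = -1.586134342, -0.05298011854
GAMMA, DELTA = 0.8829110762, 0.4435068522

def _lift(x, start, weight):
    """Add weight * (left + right neighbor) to every second sample, in place."""
    n = len(x)
    for i in range(start, n, 2):
        left = x[i - 1] if i > 0 else x[i + 1]       # mirror at the border
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += weight * (left + right)

def forward97(signal):
    """Even slots end up low-pass, odd slots high-pass (needs >= 2 samples)."""
    x = [float(v) for v in signal]
    _lift(x, 1, ALPHA)
    _lift(x, 0, BETA)
    _lift(x, 1, GAMMA)
    _lift(x, 0, DELTA)
    return x

def inverse97(coeffs):
    """Undo the lifting steps in reverse order with negated weights."""
    x = list(coeffs)
    _lift(x, 0, -DELTA)
    _lift(x, 1, -GAMMA)
    _lift(x, 0, -BETA)
    _lift(x, 1, -ALPHA)
    return x

original = [128, 129, 125, 64, 65, 70, 80, 90, 100]
coeffs = forward97(original)
restored = inverse97(coeffs)
flat = forward97([100] * 9)   # a constant signal has no high-pass energy
```

The work happens inside one array, so no auxiliary buffer is needed, and the low-pass and high-pass outputs stay interleaved.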
54Inverse Transform
Transform
[Figure: lifting lattice for the inverse transform, relating the interleaved coefficients L0 ... L4 and H0 ... H3 to the original samples x0 ... x8, with lifting weights a, b, c, d]
55Wavelet Packet Structure
[Figure: three 3D subband decompositions drawn along the x, y and z axes, with subbands such as HLL, LHL and HHL labeled]
Two-level Mallat
Two-level x decomp. + two-level (y,z) Mallat
Two-level z decomp. + two-level (x,y) Mallat
56Quantizer
- Scalar quantizer with a deadzone
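[Figure: deadzone quantizer with a zero bin of width 2Δ and remaining bins of width Δ; output is a quantized magnitude and a sign]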
57Block Entropy Coding
- Segment each subband into blocks
- Bitplane entropy encode each block
- Split each coefficient into bits, group the bits of the same significance into a bitplane
- Tree based encoder
- Golomb-Rice coder
- Context adaptive arithmetic coder
- Assemble the embedded coded bitstream
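The bitplane split described above can be sketched as follows, assuming sign-magnitude representation with planes ordered from most to least significant. The entropy coders named on the slide are not modeled, and the function names are hypothetical.

```python
def to_bitplanes(indices):
    """Split quantized coefficients into a sign list and MSB-first bitplanes."""
    magnitudes = [abs(v) for v in indices]
    signs = [1 if v < 0 else 0 for v in indices]
    num_planes = max(max(magnitudes).bit_length(), 1)
    planes = [[(m >> p) & 1 for m in magnitudes]
              for p in range(num_planes - 1, -1, -1)]
    return signs, planes

def from_bitplanes(signs, planes, keep=None):
    """Rebuild coefficients from the first `keep` planes; the rest count as 0."""
    num_planes = len(planes)
    keep = num_planes if keep is None else keep
    out = []
    for i, negative in enumerate(signs):
        magnitude = 0
        for k in range(keep):
            magnitude |= planes[k][i] << (num_planes - 1 - k)
        out.append(-magnitude if negative else magnitude)
    return out

indices = [64, 0, -1, 0, 5]
signs, planes = to_bitplanes(indices)
exact = from_bitplanes(signs, planes)
coarse = from_bitplanes(signs, planes, keep=1)   # only the top bitplane
```

Truncating the list of planes gives a coarser but still valid reconstruction, which is what makes the coded bitstream embedded.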
58Smart-Rebinning
- Problem
- 3D wavelet < Reference coder (in performance)?
- Approaches
- Pan compensation (Taubman and Zakhor)
- Register-warp 3D ASWT (Wang et al.)
- Block matching (Ohm)
- Block MC without filling the holes (Tham et al.)
- Our solution
- Data rearrangement
59Horizontal Shot Alignment
60Smart-rebinned Data Volume
61Smart-rebinning Process
62Smart-rebinned Data Volume
Original data volume
Part of the rebinned data volume
63Cross-Panorama Correlation
Pan. 0 Pan. 10 Pan. 20
Pan. 30 Pan. 40
- Well aligned objects
- Gradual parallax transition
64Arbitrary Region of Support
- Cause
- Environmental depth variation
- Solutions
- Simple rebinning
- Restrict all horizontal translation to be the same
- Padding
- Wavelet coding with arbitrary region of support
65PSNR-Y Results
PSNR-Y in dB, by algorithm and test dataset
Algorithm                                      LOBBY (0.2bpp)  LOBBY (0.12bpp)  KIDS (0.4bpp)  KIDS (0.24bpp)
MPEG-2                                         32.2            30.4             30.1           28.3
3D Wavelet                                     31.9            30.0             29.4           27.3
RBC                                            32.8            29.8             31.5           28.7
Simple rebinning                               35.5            33.6             32.8           30.5
Smart-rebinning padding                        36.0            34.0             33.4           31.1
Smart-rebinning arbitrary shape wavelet codec  36.3            34.3             33.8           31.3
66Just-in-time Rendering
- Challenges
- Decode and render wavelet compressed concentric mosaic just-in-time (JIT)
67Selective 3D Wavelet Decompression System
Bottleneck
. . .
. . .
Bitstream parsing
68Why is Partial Synthesis Slow?
69Progressive Inverse Wavelet Synthesis
- Use caching to avoid duplication
- Provide random data access
701D PIWS
[Figure: 1D PIWS diagram with rows labeled 0, 1, 2]
71Data Access with 1D PIWS
- Guarantee minimum calculation
- Great saving if adjacent access requests are near
[Figure: data access in 1D PIWS, rows labeled 0, 1, 2]
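The idea of caching intermediate results during on-demand inverse synthesis can be sketched in one dimension. This is an illustration in the spirit of PIWS, not the authors' implementation: it assumes a single-level 9-7 lifting transform with commonly quoted weights and no scaling, and it uses `functools.lru_cache` as the cache. Each requested sample pulls in only the lattice nodes it depends on, and nodes already computed for a neighbor are reused.

```python
from functools import lru_cache

ALPHA, BETA = -1.586134342, -0.05298011854
GAMMA, DELTA = 0.8829110762, 0.4435068522

def forward97(signal):
    """Forward lifting, used here only to produce test coefficients."""
    x = [float(v) for v in signal]
    n = len(x)
    for start, w in ((1, ALPHA), (0, BETA), (1, GAMMA), (0, DELTA)):
        for i in range(start, n, 2):
            left = x[i - 1] if i > 0 else x[i + 1]
            right = x[i + 1] if i + 1 < n else x[i - 1]
            x[i] += w * (left + right)
    return x

# Inverse lifting stages in the order they are undone: (parity lifted, weight).
INVERSE_STAGES = ((0, -DELTA), (1, -GAMMA), (0, -BETA), (1, -ALPHA))

def make_sample_reader(coeffs):
    """Return read(i), which reconstructs one sample on demand with caching."""
    n = len(coeffs)

    @lru_cache(maxsize=None)
    def node(stage, i):
        if stage == 0:
            return coeffs[i]
        parity, w = INVERSE_STAGES[stage - 1]
        value = node(stage - 1, i)
        if i % 2 != parity:
            return value                      # not lifted at this stage
        left = i - 1 if i > 0 else i + 1      # mirror at the border
        right = i + 1 if i + 1 < n else i - 1
        return value + w * (node(stage - 1, left) + node(stage - 1, right))

    def read(i):
        return node(len(INVERSE_STAGES), i)

    read.evaluated = lambda: node.cache_info().misses  # nodes computed so far
    return read

original = [128, 129, 125, 64, 65, 70, 80, 90, 100]
read = make_sample_reader(forward97(original))
first = read(4)
cost_first = read.evaluated()
second = read(5)
cost_second = read.evaluated() - cost_first
```

In this toy setting the second, adjacent request computes fewer new nodes than the first, which is the saving the slide refers to when access requests are near each other.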
72PIWS in Concentric Mosaics
High-pass and low-pass coefficients are interleaved.
73Multi-scale PIWS
Rendering Engine
PIWS Engine, Level 1
Selective Decoder
PIWS Engine, Level 2
Selective Decoder
Bitstream
74Three Movements and Slit Access Patterns
75Overall Rendering Speed VQ, RBC vs. PIWS
Movement  Coder  PS (frames/sec)  BI (frames/sec)
RT        VQ     19.7             16.3
RT        RBC    16.8             13.9
RT        PIWS   17.6             14.3
FB        VQ     19.0             15.8
FB        RBC    16.4             13.9
FB        PIWS   15.8             13.5
ST        VQ     17.7             14.9
ST        RBC    12.5             10.9
ST        PIWS   7.9              7.3
76Subjective Evaluation: VQ (12:1)
77Subjective Evaluation: RBC (100:1)
78Subjective Evaluation: 3D Wavelet (200:1)
79Demo
80Flexible Environment Compression
- Developed a number of IBR coding approaches
- All with JIT decoding and rendering capability
- Block coding
- Limited compression ratio (6:1-25:1)
- Simple
- Reference coding
- Derived from MPEG
- An order of magnitude more compression than the block coding approach
- High dimensional transform (3D wavelet) coding
- 2x to 4x more compression ratio than reference coding
- Still able to perform JIT rendering
81Flexible Audio Compression
82Features
- Versatile
- Lossless: as good as Monkey's Audio
- Lossy: matches or exceeds the best audio codec found
Original
MP4 TwinVQ (9.1kbps)
EAC (8kbps)
MP3 (17.8kbps)
83Conclusion
- Flexible media compression
- Not only high compression ratio
- But also manipulatable bitstream syntax
- Flexible media compression is doable
- Innovative technologies are yet to be invented
(There is work to be done)