Customizing Wide-SIMD Architectures for H.264

Slides: 28
Provided by: johngo



1
Customizing Wide-SIMD Architectures for H.264
  • Sangwon Seo¹, Mark Woh¹, Scott Mahlke¹, Trevor Mudge¹
  • Vijay Sundaram², Chaitali Chakrabarti²
  • ¹ University of Michigan
  • ² Arizona State University

2
Outline
  • Motivation
  • H.264 Analysis
  • Proposed Architecture
  • H.264 Kernel Mappings
  • Results
  • Conclusion

3
Motivation: Smart Phone
Reference images: http://www.apple.com/iphone/gallery/
4
Motivation: Inside a Smart Phone
Reference images: http://idannyb.files.wordpress.com/2008/07/xiuvbfueck3gsdum-large.jpg
5
H.264 Design
H.264 encoder/decoder reference design
Reference images: I. Richardson, H.264 and MPEG-4 Video Compression, Wiley, 2003
6
H.264 Analysis
  • H.264 Kernel Algorithms
  • Heavy SIMD workload
  • Different natural SIMD widths
  • High/medium thread-level parallelism
  • Need to support multiple SIMD widths to maximize
    SIMD utilization
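A minimal sketch of the utilization argument, under a simplified model that is not taken from the slides: a kernel whose natural SIMD width is narrower than the datapath leaves lanes idle, while a datapath split into narrower groups can fill those lanes with independent threads. The function names and the 4-wide/16-lane example are hypothetical.

```python
def lane_utilization(natural_width: int, simd_width: int) -> float:
    """Fraction of lanes doing useful work when a single thread with the given
    natural vector width runs on an unpartitioned simd_width-lane datapath."""
    return min(natural_width, simd_width) / simd_width


def partitioned_utilization(natural_width: int, simd_width: int, threads: int) -> float:
    """Hypothetical partitioned datapath: the simd_width lanes are split into
    groups of natural_width lanes, and each group runs one independent thread
    (thread-level parallelism). Assumes natural_width divides simd_width."""
    if simd_width % natural_width != 0:
        raise ValueError("natural_width must divide simd_width in this model")
    groups = simd_width // natural_width
    busy_groups = min(groups, threads)  # idle groups if there are too few threads
    return busy_groups * natural_width / simd_width
```

For a 4-wide kernel on 16 lanes, lane_utilization(4, 16) is 0.25, while partitioned_utilization(4, 16, 4) reaches 1.0 and partitioned_utilization(4, 16, 2) only 0.5. Partitioning therefore pays off only when enough thread-level parallelism is available to occupy every group.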

7
H.264 Analysis
  • Example: Deblocking Filter
  • Multimedia algorithms operate on two-dimensional
    data.
  • Row-order or column-order memory access works well
    for one set of edges, but not for the other.
  • A diagonal memory bank system allows blocks along
    either a row or a column to be accessed efficiently.

[Figure: horizontal filtering and vertical filtering]
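The slides do not give the exact bank mapping. The sketch below assumes the common skewed (diagonal) scheme, in which element (row, col) of a tile is stored in bank (row + col) mod N, and contrasts it with plain row-major interleaving. The function names are illustrative.

```python
from typing import Callable, Iterable, Tuple

# A mapping takes (row, col, num_banks) and returns the bank index.
Mapping = Callable[[int, int, int], int]


def row_major_bank(row: int, col: int, num_banks: int) -> int:
    """Conventional interleaving: consecutive columns go to consecutive banks."""
    return col % num_banks


def diagonal_bank(row: int, col: int, num_banks: int) -> int:
    """Skewed (diagonal) interleaving: each row is rotated by one bank."""
    return (row + col) % num_banks


def is_conflict_free(mapping: Mapping, cells: Iterable[Tuple[int, int]], num_banks: int) -> bool:
    """True if every (row, col) cell in one parallel access lands in a distinct bank."""
    banks = [mapping(r, c, num_banks) for r, c in cells]
    return len(set(banks)) == len(banks)
```

With 8 banks, all 8 elements of row 3 map to distinct banks under both schemes, but all 8 elements of column 5 land in bank 5 under row-major interleaving and must be serialized. The diagonal mapping keeps both the row access and the column access conflict-free, so horizontal and vertical edge filtering can each fetch a full vector in one parallel access.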
8
H.264 Analysis
  • Subgraphs for the Inner Loops of Two Kernel Algorithms
  • Large amount of data locality
  • High register file (RF) power consumption from reads
    and writes
  • Bypass and temporary buffer support
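To illustrate why short-lived values matter, here is a toy access-count model (an assumption for illustration, not the authors' methodology): values that are defined and last used within a small window of instructions, and are not needed after the block, are kept in a temporary buffer instead of the register file. The max_lifetime threshold stands in for a hypothetical buffer depth, and the function name is illustrative.

```python
from typing import Dict, List, Set, Tuple

# One instruction: (destination value, tuple of source values). Values are
# assumed to be in single-assignment form (each name defined at most once).
Instr = Tuple[str, Tuple[str, ...]]


def rf_traffic(code: List[Instr], live_out: Set[str], max_lifetime: int) -> Dict[str, int]:
    """Count register-file (RF) and temporary-buffer accesses when short-lived
    values bypass the RF and live in a temporary buffer instead."""
    defined_at: Dict[str, int] = {}
    last_use: Dict[str, int] = {}
    for i, (dst, srcs) in enumerate(code):
        for s in srcs:
            last_use[s] = i  # overwritten until the final use is recorded
        defined_at[dst] = i

    def short_lived(v: str) -> bool:
        # Defined in this block, used in it, dead afterwards, and used soon.
        return (
            v in defined_at
            and v in last_use
            and v not in live_out
            and last_use[v] - defined_at[v] <= max_lifetime
        )

    counts = {"rf_reads": 0, "rf_writes": 0, "buf_reads": 0, "buf_writes": 0}
    for dst, srcs in code:
        for s in srcs:
            counts["buf_reads" if short_lived(s) else "rf_reads"] += 1
        counts["buf_writes" if short_lived(dst) else "rf_writes"] += 1
    return counts
```

For the SAD-style chain d = a - b; e = |d|; s = s0 + e with only s live afterwards, rf_traffic(code, {"s"}, 2) moves the two intermediates d and e out of the register file, cutting RF accesses from 5 reads and 3 writes to 3 reads and 1 write.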

9
H.264 Analysis
  • Instruction Pairs
  • Heavy use of shuffle and arithmetic operations
  • Add-shift: rounding operation
  • Sub-abs: SAD (sum of absolute differences) operation
  • Need to fuse the frequently used instruction pairs
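The two instruction pairs can be expressed as single fused primitives. The sketch below models their per-lane semantics in Python; the names add_shift and sub_abs are illustrative, not the paper's mnemonics. H.264's 6-tap interpolation rounding (x + 16) >> 5 and its quarter-sample averaging (a + b + 1) >> 1 are both add-shift patterns, and SAD is a sum of sub-abs results.

```python
def add_shift(x: int, c: int, s: int) -> int:
    """Fused add-shift: add a rounding constant, then shift right.
    Python's >> on ints is an arithmetic (floor) shift, like a signed SIMD shift."""
    return (x + c) >> s


def sub_abs(a: int, b: int) -> int:
    """Fused subtract-absolute: |a - b|, the per-pixel step of SAD."""
    return abs(a - b)


def sad(block_a, block_b) -> int:
    """Sum of absolute differences built from the fused sub_abs primitive."""
    return sum(sub_abs(a, b) for a, b in zip(block_a, block_b))
```

Each fused operation replaces two dependent instructions per lane and never writes the intermediate sum or difference to the register file, which ties this optimization to the short-lived-value observation on the previous slide.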

10
H.264 Analysis
  • Permutation Patterns for Intra Prediction
  • A fixed set of shuffle patterns
  • Need for a programmable shuffle network
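A programmable shuffle network can be modeled as a gather driven by a per-lane index table. The sketch below assumes a hypothetical source-vector layout, with the four pixels above a 4x4 block in lanes 0-3 and the four pixels to its left in lanes 4-7, and encodes the H.264 Intra 4x4 vertical and horizontal prediction modes as shuffle patterns.

```python
from typing import List, Sequence


def shuffle(src: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Programmable shuffle: output lane i receives src[pattern[i]]."""
    return [src[i] for i in pattern]


# Assumed source layout: lanes 0-3 = pixels above the block, lanes 4-7 = pixels
# to its left. Output lanes are the 16 predicted pixels in raster order (y * 4 + x).
VERTICAL = [x for y in range(4) for x in range(4)]        # pred[y][x] = top[x]
HORIZONTAL = [4 + y for y in range(4) for x in range(4)]  # pred[y][x] = left[y]

# The "fixed set of shuffle patterns" the network would be programmed with.
PATTERNS = {"vertical": VERTICAL, "horizontal": HORIZONTAL}
```

Vertical and horizontal prediction are pure copies; the diagonal modes first compute a few filtered values and then replicate them along diagonals, which is again a fixed shuffle pattern. Because the set of patterns is small and fixed, a network that loads them from a small table can serve every mode.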

11
Modified SIMD architecture
12
Modified SIMD architecture
Multiple SIMD widths and thread-level parallelism
13
Modified SIMD architecture
Diagonal memory organization, memory bank system, and shuffle network
14
Modified SIMD architecture
Short-lived values stored in temporary buffers
15
Modified SIMD architecture
Short-lived values and fused operations
16
Modified SIMD architecture
Shuffle networks are placed at several points in the datapath to align data
17
Mapping of H.264 Kernels
  • Intra Prediction

18
Results
  • System Breakdown
  • H.264 CIF video at 30 fps

19
Results
  • Speedup Breakdown
  • 2.13x performance increase on average

20
Results
  • Energy-Delay product comparison
  • 29% energy-delay product improvement on average
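For reference, the standard definition of the energy-delay product is given below, with E the energy to process a workload (for example, one frame), T its execution time, and P the average power. The slides do not state the baseline or normalization behind the reported figure, so this is the textbook form only.

```latex
\mathrm{EDP} = E \cdot T = P \cdot T^{2},
\qquad
\text{EDP improvement} = 1 - \frac{\mathrm{EDP}_{\text{proposed}}}{\mathrm{EDP}_{\text{baseline}}}
```

Because delay enters squared when power is held fixed, a speedup lowers EDP faster than an equal relative power reduction would.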

21
Results
  • Comparison with the latest H.264 encoders
  • [17] T. C. Chen et al., "2.8 to 62.7 mW low-power and
    power-aware H.264 encoder for mobile applications,"
    2007 IEEE Symposium on VLSI Circuits, pp. 222-223,
    June 2007.
  • [18] M. Bhatnagar, "TMS320DM6446/3 Power Consumption
    Summary," Texas Instruments Application Reports,
    http://focus.ti.com/lit/an/spraad6a/spraad6a.pdf,
    Feb. 2008.

22
Conclusion
  • Key architectural enhancements
  • SIMD partitioning
  • Diagonal memory bank system
  • Bypass and temporary buffer support
  • Fused operation support
  • Programmable crossbar
  • Future work
  • Image processing algorithms on SIMD architectures

23
Backup Slides
24
H.264 Analysis
  • Diagonal Memory Organization
  • Multimedia algorithms operate on two-dimensional
    data.
  • Blocks along either a row or a column must be
    accessible efficiently.

25
Mapping of H.264 Kernels
  • Deblocking Filter
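The per-line arithmetic of the filter is what the SIMD lanes execute: every line of samples across an edge is filtered independently, so many lines can be processed at once if their samples can be fetched together. The sketch below follows the H.264 normal-strength (bS < 4) luma filter but is simplified: it updates only p0 and q0, takes alpha, beta and tc as precomputed inputs instead of deriving them from the QP-indexed tables, and omits the bS = 4 strong filter. The function name is illustrative.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def filter_luma_edge_line(p1: int, p0: int, q0: int, q1: int,
                          alpha: int, beta: int, tc: int, bit_depth: int = 8):
    """Filter one line of samples across a block edge (p* on one side, q* on
    the other) with a simplified H.264 normal-strength luma filter.
    Returns the new (p0, q0)."""
    # Edge decision: only small steps are smoothed; large steps are real edges.
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return p0, q0
    # Correction term, itself an add-shift rounding, clipped to +/- tc.
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    max_val = (1 << bit_depth) - 1
    return clip3(0, max_val, p0 + delta), clip3(0, max_val, q0 - delta)
```

For example, filter_luma_edge_line(60, 62, 70, 71, alpha=15, beta=4, tc=3) returns (65, 67); with alpha=5 the 8-level step is treated as a real edge and the samples are returned unchanged.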

26
Mapping of H.264 Kernels
  • Motion Compensation
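Luma motion compensation in H.264 interpolates half-sample positions with the 6-tap filter (1, -5, 20, 20, -5, 1), followed by an add-shift rounding and a clip. A minimal per-sample sketch along one row is shown below; a SIMD mapping would compute many adjacent outputs in parallel. Quarter-sample positions, which average neighboring samples, are omitted, and the function name is illustrative.

```python
# H.264 luma half-sample interpolation filter taps.
TAPS = (1, -5, 20, 20, -5, 1)


def half_pel(row, x: int, bit_depth: int = 8) -> int:
    """Half-sample value between integer samples row[x] and row[x + 1].
    Reads row[x - 2] .. row[x + 3]; the caller must provide that padding."""
    if x < 2 or x + 3 >= len(row):
        raise IndexError("half_pel needs 2 samples left and 3 right of x")
    acc = sum(t * row[x - 2 + k] for k, t in enumerate(TAPS))
    # Add-shift rounding, then clip to the valid sample range.
    return max(0, min((1 << bit_depth) - 1, (acc + 16) >> 5))
```

On a flat row every half-sample equals the input, while at a sharp step the filter can overshoot the valid range, which is why the final clip is needed.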

27
Mapping of H.264 Kernels
  • Motion Estimation
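Motion estimation is dominated by SAD evaluations. The sketch below is a plain integer-pixel full search over a square window; real encoders typically use faster search patterns and sub-pixel refinement, and the slides do not say which search the mapped kernel uses. The function names and parameters are illustrative.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def block_sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                search_range: int) -> Optional[Tuple[int, int, int]]:
    """Exhaustive block matching: return (best_sad, dx, dy) over all
    displacements in [-search_range, search_range] that stay inside ref."""
    height, width = len(ref), len(ref[0])
    best: Optional[Tuple[int, int, int]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block falls outside the reference frame
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every candidate SAD is an n x n grid of independent sub-abs operations followed by a reduction, which is the workload the fused sub-abs operation and the wide SIMD datapath target.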