Bottlenecks of SIMD

Slides: 20

Provided by: icsEleTu7

Transcript and Presenter's Notes

1
Bottlenecks of SIMD

2
Paper

Deepu Talla, Member, IEEE, Lizy Kurian John,
Senior Member, IEEE, and Doug Burger, Member, IEEE

3
Outline

4
Introduction

Multimedia SIMD extensions are widely used to
speed up media processing, but their efficiency
is low.
75 to 85 percent of the dynamic instructions in
the processor instruction stream are supporting
instructions rather than core computation.
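
To make the split between core and supporting
instructions concrete, here is a minimal sketch in C.
It is not taken from the paper: the op_tally type, the
function names, and the per-element counts are all
assumptions, a hand tally of what a simple scalar loop
does, not a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tally of operation classes; a hand count, not a measurement. */
typedef struct {
    unsigned long core;        /* arithmetic that produces the result */
    unsigned long supporting;  /* loads, stores, index updates, branches */
} op_tally;

/* Adds two pixel rows with saturation and tallies, per element, the
   operations a straightforward scalar loop would perform. */
static op_tally add_rows_saturating(const uint8_t *a, const uint8_t *b,
                                    uint8_t *out, size_t n)
{
    op_tally t = {0, 0};
    assert(a && b && out);
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + b[i];        /* 2 loads, 1 add   */
        out[i] = (uint8_t)(sum > 255u ? 255u : sum); /* 1 clamp, 1 store */
        t.core += 2;        /* the add and the clamp */
        t.supporting += 5;  /* 2 loads, 1 store, index increment, loop branch */
    }
    return t;
}

/* Fraction of tallied operations that are supporting. */
static double supporting_share(op_tally t)
{
    unsigned long total = t.core + t.supporting;
    return total ? (double)t.supporting / (double)total : 0.0;
}
```

In this toy tally five of every seven operations are
supporting, so even a trivial kernel spends most of its
instructions on loads, stores, and loop control.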

5
Introduction

The bottlenecks are caused by the loop structure
and the access patterns of media programs.
So instead of exploiting more data-level
parallelism (DLP), the paper focuses on improving
the efficiency of the instructions that support
the core computation.

6
Introduction

This paper makes two major contributions:
First, it targets the supporting instructions,
rather than the core computation, to improve
SIMD performance, which is a new approach.
Second, it shows how to reduce or eliminate
supporting instructions with the MediaBreeze
architecture.

7
Nested Loop

8
Analysis of the loop structure

The sub-blocks are very small, so the available
DLP is limited and many supporting instructions
are needed.
There are five loops for every block, which
wastes a lot of time on branches.
The data must be reorganized before SIMD
instructions can be used.
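
The loop structure described on this slide can be
sketched as follows. This is only an illustration: the
kernel (a sum of absolute differences), the 16x16
macroblock and 8x8 sub-block sizes, and the function
names are assumptions, not the paper's benchmark code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { MB = 16, SB = 8 };  /* assumed macroblock and sub-block edge lengths */

/* Sum of absolute differences between two frames, visited macroblock by
   macroblock and sub-block by sub-block through five nested loops.
   width and height must be multiples of MB. */
static unsigned long sad_by_subblock(const uint8_t *cur, const uint8_t *ref,
                                     int width, int height)
{
    unsigned long sad = 0;
    assert(width % MB == 0 && height % MB == 0);
    for (int mby = 0; mby < height; mby += MB) {        /* loop 1 */
        for (int mbx = 0; mbx < width; mbx += MB) {     /* loop 2 */
            for (int sb = 0; sb < 4; ++sb) {            /* loop 3: 4 sub-blocks */
                int y0 = mby + (sb / 2) * SB;
                int x0 = mbx + (sb % 2) * SB;
                for (int r = 0; r < SB; ++r) {          /* loop 4 */
                    for (int c = 0; c < SB; ++c) {      /* loop 5: 8 trips only */
                        int idx = (y0 + r) * width + (x0 + c);
                        sad += (unsigned long)abs(cur[idx] - ref[idx]);
                    }
                }
            }
        }
    }
    return sad;
}

/* Same result from a single flat loop, for comparison. */
static unsigned long sad_flat(const uint8_t *cur, const uint8_t *ref, int n)
{
    unsigned long sad = 0;
    for (int i = 0; i < n; ++i)
        sad += (unsigned long)abs(cur[i] - ref[i]);
    return sad;
}
```

Both functions return the same total, but the blocked
version leaves its innermost loop after only eight
iterations, so loop control and index arithmetic are
paid far more often per unit of useful work.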

9
Access patterns

10
Access patterns

The addressing sequences are complex, and many
supporting instructions are needed to generate
them.
Using general-purpose instruction sets to
generate multiple addressing sequences is not
very efficient.
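
As an illustration of why these sequences cost
instructions, the sketch below generates the offsets
needed to walk one sub-block of a row-major frame. The
function name and parameters are hypothetical and the
layout is an assumption; the paper's own access
patterns may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the row-major element offsets of a square sub-block into out.
   width: frame width in elements; (x0, y0): top-left corner of the
   sub-block; sb: edge length. Returns the number of offsets written. */
static size_t subblock_offsets(size_t width, size_t x0, size_t y0,
                               size_t sb, size_t *out)
{
    size_t n = 0;
    assert(out && x0 + sb <= width);
    for (size_t r = 0; r < sb; ++r) {
        size_t row_base = (y0 + r) * width + x0;  /* multiply and add per row */
        for (size_t c = 0; c < sb; ++c)
            out[n++] = row_base + c;              /* add per element */
    }
    return n;
}
```

For a frame 16 elements wide and a 4x4 sub-block at
(8, 2), the offsets run 40, 41, 42, 43 and then jump to
56: a step of 1 inside a row and a step of 13 at each
row end. A general-purpose instruction set has to
compute every one of these with explicit arithmetic,
and a kernel that reads several streams repeats that
work for each one.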

11
The overhead instructions