Title: Superword-Level Parallelism in the Presence of Control Flow


1
Superword-Level Parallelism in the Presence of
Control Flow
  • Jaewook Shin
  • Mary Hall
  • Jacqueline Chame

CGO'05
March 22, 2005
2
Multimedia Extension Architectures
  • Multimedia applications are becoming increasingly
    important.
  • Most microprocessors have multimedia extensions.
  • SIMD parallelism
  • Variable-sized data fields

3
Superword-Level Parallelism (SLP)
  • Fine grain SIMD parallelism in aggregate data
    objects larger than a machine word
  • Most compilers for multimedia extensions are
    based on conventional vectorization techniques.

4
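SLP Compiler (Larsen & Amarasinghe)
for (i = 0; i < 16; i++) a[i] = b[i] + c[i];
Unroll by 4
for (i = 0; i < 16; i += 4) {
  a[i+0] = b[i+0] + c[i+0];
  a[i+1] = b[i+1] + c[i+1];
  a[i+2] = b[i+2] + c[i+2];
  a[i+3] = b[i+3] + c[i+3];
}
Pack isomorphic statements
for (i = 0; i < 16; i += 4) a[i:i+3] = b[i:i+3] + c[i:i+3];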
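The unroll-and-pack steps on this slide can be sketched in plain C. This is a minimal model, not the SLP compiler's output: it assumes the loop body is an element-wise addition and a superword of four int elements, and the `superword` type and the `sw_*` helpers are hypothetical stand-ins for SIMD registers and instructions.

```c
#include <assert.h>

#define LANES 4  /* assumed superword width: four int elements */

/* Hypothetical stand-in for a superword register. */
typedef struct { int lane[LANES]; } superword;

/* Load LANES consecutive elements starting at p. */
static superword sw_load(const int *p) {
    superword v;
    for (int k = 0; k < LANES; k++) v.lane[k] = p[k];
    return v;
}

/* Store LANES elements to consecutive locations starting at p. */
static void sw_store(int *p, superword v) {
    for (int k = 0; k < LANES; k++) p[k] = v.lane[k];
}

/* Element-wise addition; models a single SIMD add. */
static superword sw_add(superword x, superword y) {
    superword r;
    for (int k = 0; k < LANES; k++) r.lane[k] = x.lane[k] + y.lane[k];
    return r;
}

/* Original scalar loop. */
void add_scalar(int *a, const int *b, const int *c, int n) {
    for (int i = 0; i < n; i++) a[i] = b[i] + c[i];
}

/* Loop after unrolling by LANES and packing the isomorphic statements. */
void add_packed(int *a, const int *b, const int *c, int n) {
    assert(n % LANES == 0);  /* this sketch has no remainder loop */
    for (int i = 0; i < n; i += LANES)
        sw_store(&a[i], sw_add(sw_load(&b[i]), sw_load(&c[i])));
}
```

Both functions produce the same array contents for any n that is a multiple of four; the packed version issues one load, add, and store per group of four elements.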
5
Control Flow and the SLP Compiler
for (i = 0; i < 16; i++) if (a[i] != 0) b[i]++;
Only parallelizes within a basic block!
6
Our Approach
for (i = 0; i < 16; i++) if (a[i] != 0) b[i]++;
for (i = 0; i < 16; i += 4) {
  Vcond = a[i:i+3] != (0, 0, 0, 0);
  Vtemp = b[i:i+3] + (1, 1, 1, 1);
  b[i:i+3] = Combine b[i:i+3] and Vtemp according to Vcond;
}
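A rough C rendering of this transformation may help. It assumes four int lanes per superword, and each inner loop over k stands in for one SIMD instruction; the function names are illustrative only.

```c
#include <assert.h>

#define LANES 4  /* assumed superword width */

/* Scalar reference: the conditional increment from the slide. */
void incr_scalar(const int *a, int *b, int n) {
    for (int i = 0; i < n; i++)
        if (a[i] != 0) b[i]++;
}

/* Superword version: each inner loop over k models one SIMD instruction. */
void incr_superword(const int *a, int *b, int n) {
    assert(n % LANES == 0);  /* this sketch has no remainder loop */
    for (int i = 0; i < n; i += LANES) {
        int vcond[LANES], vtemp[LANES];
        for (int k = 0; k < LANES; k++)   /* Vcond = a[i:i+3] != (0,0,0,0) */
            vcond[k] = (a[i + k] != 0);
        for (int k = 0; k < LANES; k++)   /* Vtemp = b[i:i+3] + (1,1,1,1) */
            vtemp[k] = b[i + k] + 1;
        for (int k = 0; k < LANES; k++)   /* combine according to Vcond */
            b[i + k] = vcond[k] ? vtemp[k] : b[i + k];
    }
}
```

Note that the addition runs for every lane, including lanes whose condition is false; only the final combine step decides which results are kept.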
7
Key Concepts
  • Borrow from optimizations for architectures
    supporting predicated execution
  • Derive a large basic block of predicated
    instructions
  • SELECT operations merge data values for different
    control flow paths
  • Restore control flow

if-conversion
parallelize
remove superword predicates (SELECT)
remove scalar predicates (unpredicate)
8
If-Conversion
if (a != 0) b = b + 1;
cond = a != 0; pT, pF = pset(cond);
b = b + 1; <pT>
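If-conversion can be imitated in C by computing the predicate explicitly and committing the guarded result only when the predicate holds. The `pred_pair` type and this `pset` function are hypothetical models of the slide's notation, not a real instruction set interface.

```c
#include <stdbool.h>

/* Hypothetical model of pset: yields a predicate and its complement. */
typedef struct { bool pT, pF; } pred_pair;

static pred_pair pset(bool cond) {
    pred_pair p = { cond, !cond };
    return p;
}

/* Original code with a conditional branch. */
int with_branch(int a, int b) {
    if (a != 0) b = b + 1;
    return b;
}

/* If-converted form: straight-line code, no branch in the source. */
int if_converted(int a, int b) {
    bool cond = (a != 0);
    pred_pair p = pset(cond);
    int result = b + 1;      /* computation of the guarded instruction */
    b = p.pT ? result : b;   /* models "b = b + 1 <pT>": commit only under pT */
    return b;
}
```

The two functions return the same value for every input, which is the property if-conversion must preserve.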
9
SELECT instruction
dst = SELECT(src1, src2, predicate)
[Figure: element-wise example of a SELECT operation]
Va = Vb + (1, 1, 1, 1) <Vp>
Vtemp = Vb + (1, 1, 1, 1); Va = SELECT(Va, Vtemp, Vp)
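A small C model of SELECT follows. The operand roles are an assumption read off the slide's example Va = SELECT(Va, Vtemp, Vp): each lane takes the second source where the predicate is true and keeps the first source otherwise. The types and function names are hypothetical.

```c
#include <stdbool.h>

#define LANES 4  /* assumed superword width */

typedef struct { int lane[LANES]; } superword;
typedef struct { bool lane[LANES]; } superpred;

/* dst = SELECT(src1, src2, pred): lane k is src2 where pred holds, else src1. */
superword sw_select(superword src1, superword src2, superpred pred) {
    superword dst;
    for (int k = 0; k < LANES; k++)
        dst.lane[k] = pred.lane[k] ? src2.lane[k] : src1.lane[k];
    return dst;
}

/* "Va = Vb + (1,1,1,1) <Vp>" rewritten as an unpredicated add plus a SELECT. */
superword predicated_add_one(superword va, superword vb, superpred vp) {
    superword vtemp;
    for (int k = 0; k < LANES; k++)
        vtemp.lane[k] = vb.lane[k] + 1;   /* Vtemp = Vb + (1,1,1,1) */
    return sw_select(va, vtemp, vp);      /* Va = SELECT(Va, Vtemp, Vp) */
}
```

Lanes whose predicate is false keep the old value of Va, so the rewritten pair behaves like the predicated instruction.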
10
Unpredicate
if (p) {
  bred[i] = fred;
  bgre[i] = fgre;
  bblu[i] = fblu;
} else {
  bred[i] = 100;
  bgre[i] = 100;
  bblu[i] = 100;
}

bred[i] = fred <p>
bred[i] = 100 <¬p>
bgre[i] = fgre <p>
bgre[i] = 100 <¬p>
bblu[i] = fblu <p>
bblu[i] = 100 <¬p>
11
Algorithm
  • If-conversion
      • Park and Schlansker's RK algorithm
  • SELECT
      • Insert the minimum number of SELECT instructions
      • Use reaching definitions based on predicate
        covering
  • Unpredicate
      • Try to reduce the number of conditional branches
      • Use predicate covering

12
Predicate Hierarchy Graph (PHG)
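  • PHG represents relationships among predicates.

pT1, pF1 = pset(c1) <TRUE>
pT2, pF2 = pset(c2) <TRUE>
pT3, pF3 = pset(c3) <pT1>

[Figure: PHG rooted at TRUE. The T and F edges of c1 lead to pT1 and pF1, those of c2 lead to pT2 and pF2, and the T and F edges of c3, placed under pT1, lead to pT3 and pF3.]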

13
Predicate Covering
  • A predicate p is covered by a set of predicates G
    if p = true ⇒ ∃ pc ∈ G such that pc = true.

Q: Predicate covering predecessors of I3?

I1 <pF3>
I2 <pT3>
I3 <pT1>

[Figure: PHG (TRUE at the root; pT1, pF1, pT2, pF2 below it; pT3 and pF3 under pT1), annotated with the set G and the predicates pT3, pF3, and pT1.]
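The covering definition can be checked by brute force for the three-condition PHG used in these slides. The sketch below is not the PHG-based analysis itself: it enumerates every assignment of c1, c2, c3 and tests the implication directly. It assumes that predicates start out false, so a pset whose guard is false leaves both of its results false; the enum names are illustrative.

```c
#include <stdbool.h>

enum { P_TRUE, PT1, PF1, PT2, PF2, PT3, PF3, NPRED };

/* Evaluate every predicate for one assignment of the conditions c1..c3. */
static void eval_predicates(bool c1, bool c2, bool c3, bool val[NPRED]) {
    val[P_TRUE] = true;
    val[PT1] = c1;  val[PF1] = !c1;    /* pT1, pF1 = pset(c1) <TRUE> */
    val[PT2] = c2;  val[PF2] = !c2;    /* pT2, pF2 = pset(c2) <TRUE> */
    val[PT3] = val[PT1] && c3;         /* pT3, pF3 = pset(c3) <pT1>  */
    val[PF3] = val[PT1] && !c3;        /* assumed false when pT1 is false */
}

/* p is covered by G if, whenever p is true, some member of G is true. */
bool is_covered(int p, const int *g, int g_len) {
    for (int bits = 0; bits < 8; bits++) {
        bool val[NPRED];
        eval_predicates((bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0, val);
        if (!val[p]) continue;          /* implication holds trivially */
        bool any = false;
        for (int k = 0; k < g_len; k++)
            any = any || val[g[k]];
        if (!any) return false;         /* p true but nothing in G true */
    }
    return true;
}
```

With this model, the set {pT3, pF3} covers pT1, while {pT3} alone does not, since pT1 can be true when c3 is false.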
14
SELECT Algorithm
  • SELECT is not necessary for the first reaching
    definition

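Predicated superword code:
d1: Va = V1 <pF3>
d2: Va = V2 <pT3>
u3: Vc = Va <pT1>

After SELECT insertion:
d1: Va = V1
d2: Va = SELECT(Va, V2, pT3)
u3: Vc = Va

[Figure: PHG with TRUE at the root; pT1, pF1, pT2, pF2 below it; pT3 and pF3 below those.]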
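The effect of dropping the SELECT on the first reaching definition can be illustrated with a lane-wise C model. It assumes four lanes, the SELECT operand order suggested by Va = SELECT(Va, V2, pT3), and hypothetical type and function names.

```c
#include <stdbool.h>

#define LANES 4  /* assumed superword width */

typedef struct { int lane[LANES]; } superword;
typedef struct { bool lane[LANES]; } superpred;

/* Predicated form: d1: Va = V1 <pF3>; d2: Va = V2 <pT3>.
   Returns the value of Va that reaches the use u3. */
superword va_predicated(superword va, superword v1, superword v2,
                        superpred pF3, superpred pT3) {
    for (int k = 0; k < LANES; k++)
        if (pF3.lane[k]) va.lane[k] = v1.lane[k];   /* d1 */
    for (int k = 0; k < LANES; k++)
        if (pT3.lane[k]) va.lane[k] = v2.lane[k];   /* d2 */
    return va;
}

/* After SELECT insertion: d1 needs no SELECT, d2 becomes a SELECT. */
superword va_with_select(superword v1, superword v2, superpred pT3) {
    superword va = v1;                               /* d1: Va = V1 */
    for (int k = 0; k < LANES; k++)                  /* d2: Va = SELECT(Va, V2, pT3) */
        va.lane[k] = pT3.lane[k] ? v2.lane[k] : va.lane[k];
    return va;
}
```

The two forms agree in every lane where pT1 is true, which is all the use u3 (guarded by pT1) can observe. In lanes where pT1 is false they may differ, and that difference is harmless for the same reason.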
15
Predicate CFG Generator
  • Find the predicate covering predecessors for each
    predicated instruction.

(a) predicated scalar code
I1 <pF3>
I2 <pT3>
I3 <pT1>

(b) CFG
[Figure: control flow graph containing I1, I2, and I3.]

(c) code generated
if (pT1) {
  if (pF3) I1;
  else I2;
  I3;
}
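One way to sanity-check the generated code is to compare which instructions execute in the predicated form and in the nested-if form. The sketch below records executed instructions in a bitmask; the function names are illustrative, and the predicates are assumed to follow pT3, pF3 = pset(c3) <pT1>.

```c
#include <stdbool.h>

/* Bitmask of executed instructions: bit 0 = I1, bit 1 = I2, bit 2 = I3. */

/* (a) predicated scalar code: each instruction checks its own guard. */
unsigned run_predicated(bool pT1, bool pT3, bool pF3) {
    unsigned done = 0;
    if (pF3) done |= 1u;   /* I1 <pF3> */
    if (pT3) done |= 2u;   /* I2 <pT3> */
    if (pT1) done |= 4u;   /* I3 <pT1> */
    return done;
}

/* (c) code generated: control flow restored from the predicates. */
unsigned run_generated(bool pT1, bool pF3) {
    unsigned done = 0;
    if (pT1) {
        if (pF3) done |= 1u;   /* I1 */
        else     done |= 2u;   /* I2 */
        done |= 4u;            /* I3 */
    }
    return done;
}
```

The else branch is valid only because, under pT1, exactly one of pF3 and pT3 is true; that is the covering relationship the generator relies on.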
16
Unpredicate Algorithm
  • Schedule each instruction within an existing
    basic block where it is safe.
  • If no such basic block exists, use predicate CFG
    generator.

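(a) predicated scalar code
I1 <p>
I2 <¬p>
I3 <TRUE>
I4 <p>
I5 <¬p>

(b) predicate CFG generator
[Figure: CFG with a separate block for each of I1 to I5, guarded in turn by p, ¬p, TRUE, p, ¬p.]

(c) unpredicate
[Figure: CFG with I1 and I4 in one block guarded by p, I2 and I5 in one block guarded by ¬p, and I3 in a block guarded by TRUE.]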
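The scheduling idea can be sketched as a greedy pass over the predicated instructions. This is a simplified illustration, not the paper's algorithm: it treats "safe" as "the instruction does not depend on anything in the blocks it would be hoisted over", takes the dependence matrix as given, and uses hypothetical names throughout.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSTR 16

typedef struct {
    int pred;               /* id of the predicate guarding this block */
    int instr[MAX_INSTR];   /* instruction indices in schedule order */
    int count;
} block;

/* pred[j] is the guard of instruction j (textual order).
   depends[j][i] is true when instruction j must stay after instruction i.
   Fills blocks[] and returns the number of blocks formed. */
int form_blocks(const int *pred, int n,
                bool depends[MAX_INSTR][MAX_INSTR], block *blocks) {
    assert(n <= MAX_INSTR);
    int nblocks = 0;
    for (int j = 0; j < n; j++) {
        int target = -1;
        /* Walk back over existing blocks looking for one with the same guard. */
        for (int b = nblocks - 1; b >= 0; b--) {
            if (blocks[b].pred == pred[j]) { target = b; break; }
            bool blocked = false;
            for (int k = 0; k < blocks[b].count; k++)
                if (depends[j][blocks[b].instr[k]]) blocked = true;
            if (blocked) break;   /* cannot hoist j above block b */
        }
        if (target < 0) {         /* no safe existing block: open a new one */
            target = nblocks++;
            blocks[target].pred = pred[j];
            blocks[target].count = 0;
        }
        blocks[target].instr[blocks[target].count++] = j;
    }
    return nblocks;
}
```

For five mutually independent instructions guarded by p, ¬p, TRUE, p, ¬p, this yields three blocks, {I1, I4}, {I2, I5}, and {I3}, instead of five, so fewer conditional branches are needed.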
17
Our Implementation
[Figure: compiler flow from original C code to output C code. Boxes: unroll, superword level locality, alignment analysis, if-conversion, parallelize, remove superword predicates (SELECT), remove scalar predicates (unpredicate), superword replacement. Legend: our previous work, MIT SLP compiler, new for this paper.]
18
Applications
  • Kernels
      • Chroma: Chroma keying of two images
      • Sobel: Sobel edge detection
      • TM: Template matching
      • Max: Max value search
      • transitive: Shortest path search
  • Functions from UCLA MediaBench
      • MPEG2-dist1: dist1 of MPEG2 encoder
      • EPIC-unquantize: unquantize_image of unepic
      • GSM-Calculation: Calculation_of_the_LTP_parameters
        of gsmencode
  • Two data set sizes
      • Large: Representative data set
      • Small: Isolates parallelization effects

19
Experimental Flow
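[Figure: the original C code yields three versions: Baseline, SLP (SLP compiler), and SLP-CF (SLP compiler with the control flow extension). All are compiled with AltiVec-extended GCC and run on a PowerPC G4.]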
20
Overall Improvements: large data
21
Overall Improvements: small data
22
Related Research
  • Vectorization techniques for conditionals
      • Sreraman and Govindarajan (IJPP'00)
      • Bik et al. (IJPP'02)
  • Architectures with SELECT
      • Multimedia extension architectures: AltiVec
      • Processing-in-memory: DIVA
      • Vector machines: Smith et al. (ISCA'00)
  • Phi-predication: Chuang et al. (CGO'03)
      • Requires scalar SELECT
  • Predicate CFG generator: Mahlke ('96)

23
Conclusion
  • Compiler system to exploit SLP in the presence of
    control flow
  • Compiler algorithms to
      • insert the minimum number of SELECTs
      • restore efficient control flow
  • In an experiment with 5 kernels and 3 benchmarks,
    the performance improved by 1.97X to 15.07X.

24
Predicate Covering Predecessors
  • A predicate p is covered by a set of predicates G
    if p = true ⇒ ∃ pc ∈ G such that pc = true.
  • Given a sequence of predicated instructions, the
    predicate covering predecessors of an instruction
    I guarded by a predicate p are
      • the first set S of textually preceding
        instructions of I such that
      • the predicate set G of S covers the predicate p
        of I, and
      • for each instruction Ic ∈ S and its predicate
        pc ∈ G,
          • p is not covered by any subset of predicates
            guarding instructions between Ic and I
          • pc and p are not mutually exclusive
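One reading of this definition is a backward scan from I that collects non-exclusive predecessors until their predicates cover p. The sketch below follows that reading and is not taken from the paper: it decides covering and mutual exclusion by enumerating the conditions c1 and c3 rather than by using a PHG, assumes a guarded pset leaves its results false when the guard is false, and uses hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum { P_TRUE, PT1, PF1, PT3, PF3, NPRED };
#define NCOND 2       /* conditions c1 and c3 */
#define MAX_PREDS 64  /* capacity of the scratch predicate set */

/* Evaluate all predicates for one assignment of (c1, c3). */
static void eval_predicates(int bits, bool val[NPRED]) {
    bool c1 = (bits & 1) != 0, c3 = (bits & 2) != 0;
    val[P_TRUE] = true;
    val[PT1] = c1;        val[PF1] = !c1;       /* pset(c1) <TRUE> */
    val[PT3] = c1 && c3;  val[PF3] = c1 && !c3; /* pset(c3) <pT1>  */
}

/* True when p and q are never true together. */
static bool mutually_exclusive(int p, int q) {
    for (int bits = 0; bits < (1 << NCOND); bits++) {
        bool val[NPRED];
        eval_predicates(bits, val);
        if (val[p] && val[q]) return false;
    }
    return true;
}

/* True when, whenever p is true, some member of g is true. */
static bool is_covered(int p, const int *g, int g_len) {
    for (int bits = 0; bits < (1 << NCOND); bits++) {
        bool val[NPRED], any = false;
        eval_predicates(bits, val);
        if (!val[p]) continue;
        for (int k = 0; k < g_len; k++) any = any || val[g[k]];
        if (!any) return false;
    }
    return true;
}

/* guard[i] is the predicate of instruction i in textual order.
   Writes the indices of the covering predecessors of instruction `at`
   into out (nearest first) and returns how many were found. If the start
   of the sequence is reached first, the returned set may not cover p. */
int covering_predecessors(const int *guard, int at, int *out) {
    int g[MAX_PREDS], count = 0;
    assert(at <= MAX_PREDS);
    for (int i = at - 1; i >= 0; i--) {
        if (is_covered(guard[at], g, count)) break;              /* S is complete */
        if (mutually_exclusive(guard[i], guard[at])) continue;   /* cannot precede I */
        out[count] = i;
        g[count] = guard[i];
        count++;
    }
    return count;
}
```

For the sequence I1 <pF3>, I2 <pT3>, I3 <pT1>, the scan returns I2 and then I1 as the predicate covering predecessors of I3.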

25
Is PHG inaccurate?
  • PHG is said to be inaccurate and limited in
    representation.
  • We use a subset of predicate analyses based on
    PHG.
      • Predicate deposit type: Conditional only
      • No false positives for the following properties
          • Mutually exclusive
          • Predicate covering
  • Leads to possibly conservative but always correct
    code
  • Predicate analyses can be replaced with any other
    predicate analysis system

26
Reverse If-Conversion (RIC)?
  • RIC assumes that no operations are inserted or
    removed during RIC.
      → Superword instructions are inserted and scalar
        instructions are removed in our case.
  • RIC uses
      • predicate define operations to preserve control
        dependences
      • implicit predicate merge operations to preserve
        control anti-dependences
      → The original CFG can be partially or totally
        removed in our case.