General Optimization Issues

About This Presentation

Description:

General Optimization Issues M. Smith – PowerPoint PPT presentation

Slides: 30

Provided by: Micha878

Transcript and Presenter's Notes

1
General Optimization Issues

M. Smith

2
To be tackled today

Most optimized TigerSHARC instruction
Integer and float
Systematic optimization procedure
SISD and SIMD modes
Exercises

3
Most optimized SIMD floating-point (32-bit) TigerSHARC instruction

  xR3:0 = CB Q[j0 += 4];  yR3:0 = CB Q[k0 += 4];
  xyFR4 = R5 * R6;  xyFR7 = R8 + R9, FR10 = R8 - R9;;

xR3:0 = CB Q[j0 += 4]
  /* Fetches 4 values on the J bus into X compute registers XR3, XR2,
     XR1, XR0. Increments the J register and adjusts for circular
     buffer operation. */
yR3:0 = CB Q[k0 += 4]
  /* Fetches 4 values on the K bus into Y compute registers YR3, YR2,
     YR1, YR0. Increments the K register and adjusts for circular
     buffer operation. */
xyFR4 = R5 * R6
  /* Two multiplications: XFR5 * XFR6 and YFR5 * YFR6 */
xyFR7 = R8 + R9, FR10 = R8 - R9
  /* Two additions, XFR8 + XFR9 and YFR8 + YFR9, AND two
     subtractions, XFR8 - XFR9 and YFR8 - YFR9 */
  /* The same registers must be used on either side of the + and -
     operators */
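What this instruction line does can be modelled in plain C. The sketch below is only a behavioural model under stated assumptions: the `ComputeBlock` type, the function names, the index-based circular buffer, and the choice that the lowest address lands in register 0 are all illustrative and are not taken from the slides or the processor manual. Real hardware issues all of these operations in the same cycle; the C model simply runs them one after another.

```c
#include <stddef.h>

/* Hypothetical model of one compute block's register file. */
typedef struct {
    float r[32];
} ComputeBlock;

/* Models R3:0 = CB Q[ptr += 4] for one block.
   Assumption: the lowest address is loaded into r[0]. */
static void quad_fetch_circular(ComputeBlock *blk, const float *buf,
                                size_t len, size_t *idx)
{
    for (size_t i = 0; i < 4; ++i) {
        blk->r[i] = buf[(*idx + i) % len];
    }
    *idx = (*idx + 4) % len;  /* post-increment with circular wrap */
}

/* Models FR4 = R5 * R6; FR7 = R8 + R9, FR10 = R8 - R9 for one block. */
static void multiply_add_sub(ComputeBlock *blk)
{
    float product = blk->r[5] * blk->r[6];
    float sum = blk->r[8] + blk->r[9];
    float diff = blk->r[8] - blk->r[9];
    blk->r[4] = product;
    blk->r[7] = sum;
    blk->r[10] = diff;
}

/* One instruction line: the same operations applied to the X and Y blocks.
   The fetches write r[0..3] and the arithmetic reads r[5], r[6], r[8], r[9],
   so the order of the calls below does not change the result. */
void simd_line(ComputeBlock *x, ComputeBlock *y,
               const float *jbuf, size_t jlen, size_t *jidx,
               const float *kbuf, size_t klen, size_t *kidx)
{
    multiply_add_sub(x);
    multiply_add_sub(y);
    quad_fetch_circular(x, jbuf, jlen, jidx);  /* J bus into X block */
    quad_fetch_circular(y, kbuf, klen, kidx);  /* K bus into Y block */
}
```

Counting the work in `simd_line` gives 2 multiplications, 2 additions, 2 subtractions and 8 values fetched, which is what the one instruction line is described as doing.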

4
Most optimized SIMD integer (short, 16-bit) TigerSHARC instruction

  xR3:0 = CB Q[j0 += 4];  yR3:0 = CB Q[k0 += 4];
  xyR7:6 = R5:4 * R3:2;  xySR9:8 = R7:6 + R1:0, SR11:10 = R7:6 - R1:0;;

xR3:0 = CB Q[j0 += 4]
  /* Fetches 4 values on the J bus into X compute registers XR3, XR2,
     XR1, XR0. Increments the J register and adjusts for circular
     buffer operation. */
yR3:0 = CB Q[k0 += 4]
  /* Fetches 4 values on the K bus into Y compute registers YR3, YR2,
     YR1, YR0. Increments the K register and adjusts for circular
     buffer operation. */
xyR7:6 = R5:4 * R3:2
  /* Eight multiplications: XR5.H * XR3.H, XR5.L * XR3.L,
     XR4.H * XR2.H, XR4.L * XR2.L; ditto for YR */
xySR9:8 = R7:6 + R1:0, SR11:10 = R7:6 - R1:0
  /* Eight additions ???????
     AND eight subtractions
     ????????????????? */

5
Exercise: Write out the 16 operations performed

  xySR9:8 = R7:6 + R1:0, SR11:10 = R7:6 - R1:0
  /* Eight additions ???????
     AND eight subtractions
     ????????????????? */

Now do a sideways add on xySR9:8 and get a value

6
Steps to optimize

Get the algorithm to work in C
Determine how much time is available
If the timing is already okay, quit
Determine the maximum number of each type of operation (add, subtract, multiply, memory fetch)
Divide each calculated maximum by the number of available resources for that type of operation
The largest division result is the theoretical minimum number of cycles needed for the algorithm
If that minimum time is more than 100% of the time available, find a new algorithm
If that minimum time is less than 40% of the time available, perhaps you can optimize the code to meet the speed requirements
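The estimate described in the steps above can be sketched in C. This is a minimal illustration, not part of the original slides: the type and function names (`ResourceDemand`, `min_cycles`, `judge`) are made up for the example, and the `UNCERTAIN` verdict is an assumption covering the range between 40% and 100% for which the steps give no guidance.

```c
#include <stddef.h>

/* Operation count and per-cycle capacity for one type of operation. */
typedef struct {
    unsigned long ops_needed;       /* maximum operations the algorithm performs */
    unsigned long units_per_cycle;  /* how many the hardware can do per cycle */
} ResourceDemand;

/* Theoretical minimum cycles: the largest ceil(ops / units) over all types.
   Entries with units_per_cycle == 0 are skipped (assumed unused). */
unsigned long min_cycles(const ResourceDemand *demands, size_t count)
{
    unsigned long worst = 0;
    for (size_t i = 0; i < count; ++i) {
        unsigned long units = demands[i].units_per_cycle;
        if (units == 0) {
            continue;
        }
        unsigned long cycles = (demands[i].ops_needed + units - 1) / units;
        if (cycles > worst) {
            worst = cycles;
        }
    }
    return worst;
}

typedef enum { FIND_NEW_ALGORITHM, WORTH_OPTIMIZING, UNCERTAIN } Verdict;

/* Applies the 100% and 40% rules of thumb to the estimate. */
Verdict judge(unsigned long cycles_needed, unsigned long cycles_available)
{
    if (cycles_needed > cycles_available) {
        return FIND_NEW_ALGORITHM;            /* more than 100% of the time */
    }
    if (cycles_needed * 100 < cycles_available * 40) {
        return WORTH_OPTIMIZING;              /* less than 40% of the time */
    }
    return UNCERTAIN;                         /* assumption: no rule given */
}
```

For example, an algorithm needing 128 additions and 128 multiplications at 2 per cycle each, plus 256 memory fetches at 8 per cycle, is bounded by the arithmetic: `min_cycles` returns 64.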

7
Code optimization: 32-bit integers or 32-bit floats
2 * SIZE additions
2 * SIZE memory fetches
If done correctly, can do 2 additions AND 2 memory fetches each cycle
Therefore the optimum is SIZE cycles, IFF we can find all the optimizations
8
Code optimization: 32-bit integers or 32-bit floats
2 * SIZE additions
2 * SIZE memory fetches
Left fetched on J-bus and done in X-compute
Right fetched on K-bus and done in Y-compute
9
16-bit integers (short int) might be okay in some circumstances
2 * SIZE additions
2 * SIZE memory fetches
If done correctly, can do 8 short additions AND 32 short memory fetches each cycle
Therefore the optimum is SIZE / 4 cycles, IFF we can find all the optimizations
10
FIR optimization
SIZE additions
SIZE multiplications
SIZE * 2 memory fetches
2 additions, 2 multiplications and 8 fetches per cycle
Should be able to do it in SIZE / 2 cycles
11
FIR optimization
SIZE additions
SIZE multiplications
SIZE * 2 memory fetches
Fetch 2 values along J-bus into XA and YA compute
Fetch 2 coefficients along K-bus into XB and YB compute
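The idea of splitting the FIR sum across the X and Y compute blocks can be sketched in C with two accumulators. This is an illustrative model only: the function name, the argument layout (sample `i` paired with coefficient `i`) and the handling of an odd tap count are assumptions, not the code used in the presentation.

```c
#include <stddef.h>

/* Dot-product core of an FIR filter with two independent accumulators.
   x_acc models work done in the X compute block, y_acc in the Y block.
   Each loop pass stands for one cycle: 2 multiplications and 2 additions. */
float fir_two_lane(const float *samples, const float *coeffs, size_t taps)
{
    float x_acc = 0.0f;
    float y_acc = 0.0f;

    for (size_t i = 0; i + 1 < taps; i += 2) {
        x_acc += samples[i] * coeffs[i];          /* X lane */
        y_acc += samples[i + 1] * coeffs[i + 1];  /* Y lane */
    }
    if (taps % 2 != 0) {
        /* Leftover tap when the count is odd (assumed handling). */
        x_acc += samples[taps - 1] * coeffs[taps - 1];
    }
    return x_acc + y_acc;  /* combine the two partial sums at the end */
}
```

With an even number of taps the loop runs taps / 2 times, which matches the SIZE / 2 cycle estimate. Keeping two separate accumulators also means neither lane has to wait for the other's result.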
12
Need a systematic approach to handling the optimization of code

Get the C code to work
Rewrite the code in its simplest format: one operation per line
Recommended: rewrite the code using register names
Unwrap (unroll) the loop: start with twice
Rewrite the second part of the loop using different register names; this avoids setting up unexpected dependencies
Overlap the first and second parts of the loop
Rearrange start-up and ending code
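The first few stages of this procedure can be illustrated on a small loop. The example below is a sketch under assumptions: the algorithm (summing a buffer), the function names, and the register-style variable names `J0`, `XR0`, `XR1`, `XR4`, `XR5` are chosen for illustration and are not the code shown in the presentation. The unrolled versions assume an even element count.

```c
#include <stddef.h>

/* Working C code (hypothetical example: sum a buffer). */
int sum_plain(const int *buffer, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += buffer[i];
    }
    return sum;
}

/* One operation per line, using register-style names. */
int sum_one_op_per_line(const int *buffer, size_t n)
{
    const int *J0 = buffer;  /* pointer register */
    int XR0;                 /* fetched value */
    int XR4 = 0;             /* running sum */
    for (size_t count = n; count > 0; --count) {
        XR0 = *J0;
        J0 = J0 + 1;
        XR4 = XR4 + XR0;
    }
    return XR4;
}

/* Unrolled twice; the second half uses different register names. */
int sum_unrolled_twice(const int *buffer, size_t n)
{
    const int *J0 = buffer;
    int XR0, XR1;
    int XR4 = 0, XR5 = 0;
    for (size_t count = n / 2; count > 0; --count) {
        XR0 = *J0;
        J0 = J0 + 1;
        XR4 = XR4 + XR0;
        XR1 = *J0;
        J0 = J0 + 1;
        XR5 = XR5 + XR1;
    }
    return XR4 + XR5;
}

/* First and second halves overlapped: both fetches, then both adds. */
int sum_overlapped(const int *buffer, size_t n)
{
    const int *J0 = buffer;
    int XR0, XR1;
    int XR4 = 0, XR5 = 0;
    for (size_t count = n / 2; count > 0; --count) {
        XR0 = *J0;
        J0 = J0 + 1;
        XR1 = *J0;
        J0 = J0 + 1;
        XR4 = XR4 + XR0;
        XR5 = XR5 + XR1;
    }
    return XR4 + XR5;
}
```

All four functions return the same result for an even `n`. Because the two halves in `sum_overlapped` touch different registers, adjacent lines no longer depend on each other, which is what makes them candidates for being issued in parallel in assembly.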

13
Step 1: Get the C code to work
15
Step 2: Rewrite in simplest format
Note the naming convention
Single operation per line
Note other changes
17
Step 3 -- Unwrap the loop
Again, note the naming convention
19
Step 4: Overlap the first and second parts of the loop
Note: The C code goes no faster, but the parallel assembly code translated from this format will
Step 1 -- 4 N
Step 3 -- 8 (N / 2) 2
Step 4 -- 6 (N / 2) 2
21
Step 5A - Rearrange start-up and ending code
Software pipeline: move the first read outside the loop
Need to add an extra read at the end of the loop
Timing: 2 (N/2 1) 6
Need to adjust the loop start (Is it done correctly? Are we one out?)
CAUTION: NEED TO FIX
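The software-pipelining idea, and the off-by-one risk it creates, can be sketched in C. This is one possible arrangement under assumptions, not the code from the presentation: the algorithm (summing a buffer) and the names `sum_pipelined`, `J0`, `XR0`, `XR4` are illustrative, and the loop is not unrolled so that the loop control stays easy to follow.

```c
#include <stddef.h>

/* Software-pipelined sum: the first read is moved in front of the loop,
   each loop pass fetches the next value while adding the previous one,
   and the last add is done after the loop. */
int sum_pipelined(const int *buffer, size_t n)
{
    if (n == 0) {
        return 0;            /* nothing to read, avoid touching the buffer */
    }

    const int *J0 = buffer;
    int XR4 = 0;

    int XR0 = *J0;           /* start-up code: first read outside the loop */
    J0 = J0 + 1;

    /* n - 1 passes, not n: one read has already been done. */
    for (size_t count = n - 1; count > 0; --count) {
        int next = *J0;      /* fetch for the following pass */
        J0 = J0 + 1;
        XR4 = XR4 + XR0;     /* add of the previously fetched value */
        XR0 = next;
    }

    XR4 = XR4 + XR0;         /* ending code: add the last fetched value */
    return XR4;
}
```

Inside the loop the fetch and the add work on different values, so they could be issued in the same cycle. Running the loop `n` times instead of `n - 1` would read one element past the end of the buffer, which is the kind of one-out error the caution refers to.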
22
Step 5B - Rearrange start-up and ending code
Can now parallelize additional adds and memory fetches
Note: the loop is still in error
23
Exercise -- Get the loop control correct
24
Exercise 1 -- Get the loop control correct
BUFFER_SIZE 1
BUFFER_SIZE 2
BUFFER_SIZE 4
BUFFER_SIZE 5
BUFFER_SIZE 8
BUFFER_SIZE 128
25
Exercise 2 -- Rewrite the code when it is known
that BUFFER_SIZE 127
BUFFER_SIZE 1 N 2 N 4 N 5 N 8 N 128
26
Code to this point is SISD parallel optimization

SISD: single instruction, single data
Using X_compute block and J memory bus
Next stage is SIMD: single instruction, multiple data
Using X_compute block and J memory bus for left
Using Y_compute block and K memory bus for right
Will need similar but different code when you are doing FIR in Lab. 3
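The difference between the two modes can be modelled in C. The sketch below is illustrative only: the per-channel sum, the `ChannelSums` type and the function names are assumptions, and C still executes the statements one at a time. The point is the shape of the loop, where each pass of the SIMD-style version corresponds to one instruction line working on left and right data together.

```c
#include <stddef.h>

typedef struct {
    int left;
    int right;
} ChannelSums;

/* SISD style: one compute block, one bus; left is processed, then right. */
ChannelSums sums_sisd(const int *left, const int *right, size_t n)
{
    ChannelSums s = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        s.left += left[i];
    }
    for (size_t i = 0; i < n; ++i) {
        s.right += right[i];
    }
    return s;
}

/* SIMD style (model): the same operation applied to two data streams. */
ChannelSums sums_simd_model(const int *left, const int *right, size_t n)
{
    int XR4 = 0;  /* X compute block accumulator, left channel */
    int YR4 = 0;  /* Y compute block accumulator, right channel */
    for (size_t i = 0; i < n; ++i) {
        int XR0 = left[i];   /* J bus fetch into X compute block */
        int YR0 = right[i];  /* K bus fetch into Y compute block */
        XR4 = XR4 + XR0;
        YR4 = YR4 + YR0;
    }
    ChannelSums s = {XR4, YR4};
    return s;
}
```

Both functions return the same sums. The SISD version makes 2 * n loop passes and the SIMD-style version makes n, mirroring the halving of the cycle count when both compute blocks and both buses are used.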

27
Exercise 3 -- BUFFER_SIZE 128
Rewrite so that X and Y ops are done together
BUFFER_SIZE 1 N 2 N 4 N 5 N 8 N 128
28
Exercise 4 -- BUFFER_SIZE 128
Rewrite so that no data dependency stalls are expected
BUFFER_SIZE 1 N 2 N 4 N 5 N 8 N 128
29
To be tackled today

Most optimized TigerSHARC instruction
Integer and float
Systematic optimization procedure
SISD and SIMD modes
Exercises
