General Optimization Issues

Slides: 26

Provided by: Micha879

Transcript and Presenter's Notes

1
General Optimization Issues

Solving the exercise issues

2
To be tackled today

Exercise 1: Solving the loop problem, SIZE = 128
Exercise 2: Solving the loop problem, SIZE = 127
Exercise 3: Moving from SISD to SIMD mode, SIZE = 128
Exercise 4: Removing any expected stalls

3
Most optimized SIMD floating point (32-bit) TigerSHARC instruction

    xR3:0 = CB Q[j0 += 4]; yR3:0 = CB Q[k0 += 4];
    xyFR4 = R5 * R6; xyFR7 = R8 + R9, FR10 = R8 - R9;;

xR3:0 = CB Q[j0 += 4];
    /* Fetches 4 values on J bus into x compute registers XR3, XR2, XR1, XR0.
       Increments J register and adjusts for circular buffer operation */
yR3:0 = CB Q[k0 += 4];
    /* Fetches 4 values on K bus into y compute registers YR3, YR2, YR1, YR0.
       Increments K register and adjusts for circular buffer operation */
xyFR4 = R5 * R6;
    /* Two multiplications: XFR5 * XFR6 and YFR5 * YFR6 */
xyFR7 = R8 + R9, FR10 = R8 - R9;;
    /* Two additions, XFR8 + XFR9 and YFR8 + YFR9, AND two subtractions,
       XFR8 - XFR9 and YFR8 - YFR9 */
    /* Same registers must be used on either side of the + and - operators */

4
Steps to optimize

Get the algorithm to work in C
Determine how much time is available
If timing is already okay, quit
Determine the maximum number of each type of operation (add, subtract, multiply, memory fetch)
Divide the calculated maximum by the number of available resources for that type of operation
The largest division result is the theoretical minimum number of cycles needed for the algorithm
If that minimum time is more than 100% of the time available, find a new algorithm
If that minimum time is less than 40% of the time available, perhaps you can optimize the code to meet the speed requirements
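
The resource-bound estimate described in these steps can be sketched in C. This is only an illustration: the OpBudget type, the function names, and the operation counts in the note that follows are assumptions made for the example, not values from the slides.

```c
#include <stddef.h>

/* Hypothetical description of one operation type: how often the algorithm
   performs it and how many hardware units can execute it per cycle. */
typedef struct {
    unsigned count;      /* operations of this type in the algorithm */
    unsigned resources;  /* units available per cycle (assumed > 0)  */
} OpBudget;

/* Ceiling division: cycles needed if every unit is busy on every cycle. */
static unsigned cycles_for(OpBudget op)
{
    return (op.count + op.resources - 1) / op.resources;
}

/* The largest per-type result is the theoretical minimum cycle count. */
unsigned min_cycles(const OpBudget *ops, size_t n)
{
    unsigned worst = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned c = cycles_for(ops[i]);
        if (c > worst)
            worst = c;
    }
    return worst;
}
```

With assumed figures of 256 additions on 2 adders, 128 multiplications on 2 multipliers, and 256 fetches at 8 values per cycle, the per-type results are 128, 64 and 32 cycles, so the bound is 128 cycles. That bound is then compared against the time available.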

5
Code optimization: 32-bit integers or 32-bit floats
2 * SIZE additions
2 * SIZE memory fetches
Left fetched on J-bus and done in X-compute
Right fetched on K-bus and done in Y-compute
SIZE / 2 cycles in theory
6
Stage 1: Get the C code to work
7
Stage 2: Rewrite in simplest format
Note naming convention
Single operation per line
Note other changes
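
The code on this slide was not captured in the transcript. As a minimal sketch only, assuming the exercise accumulates a left and a right buffer separately (which matches the operation counts on the earlier slide), a "single operation per line" version might look like the following. The function and variable names are hypothetical.

```c
/* Assumed example: sum the left and right buffers separately.
   Each line performs one memory fetch or one add, so each line can later
   be mapped to a single assembly instruction. */
void sum_simple(const int *left, const int *right, int size,
                int *left_sum_out, int *right_sum_out)
{
    int left_sum = 0;
    int right_sum = 0;
    int left_value;
    int right_value;
    int count;

    for (count = 0; count < size; count++) {
        left_value = left[count];             /* fetch left  */
        left_sum = left_sum + left_value;     /* add left    */
        right_value = right[count];           /* fetch right */
        right_sum = right_sum + right_value;  /* add right   */
    }
    *left_sum_out = left_sum;
    *right_sum_out = right_sum;
}
```

Keeping one operation per line makes it easy to count fetches and adds per iteration before any instructions are paired up.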
8
Step 3 -- Unwrap the loop
Again, note naming convention
9
Step 4: Overlap the first and second parts of the loop
Note: The C code goes no faster, but using this format for translating into parallel assembly code will.
Timing:
Step 1 -- 4 N
Step 3 8 (N / 2) 2
Step 4 6 (N / 2) 2
10
Step 5A - Rearrange start-up and ending code
Software pipeline: move the first read outside the loop
Need to add an extra read at the end of the loop
Timing 2 (N/2 1) 6
Need to adjust the loop start (Is it done correctly? Are we one out?)
CAUTION: NEED TO FIX
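
As a hedged illustration of the software-pipelining idea (not the code from the slide, which was not captured), the sketch below moves the first read ahead of the loop and consumes the last fetched value after it. The loop bounds are chosen so that no read goes past the end of the buffer; the function name and the single-buffer sum are assumptions made for the example.

```c
/* Software-pipelined sum: each iteration adds the value fetched on the
   previous iteration while fetching the next one, so the add and the
   fetch inside the loop do not depend on each other. */
int pipelined_sum(const int *data, int size)
{
    int sum = 0;
    int value;
    int count;

    if (size <= 0)
        return 0;

    value = data[0];                 /* prologue: first read moved out of the loop */
    for (count = 1; count < size; count++) {
        sum = sum + value;           /* uses the value fetched earlier */
        value = data[count];         /* fetch for the next iteration   */
    }
    sum = sum + value;               /* epilogue: consume the last fetch */
    return sum;
}
```

Starting the loop at 1 rather than 0 is what prevents the off-by-one: the loop performs size - 1 fetches, and the add after the loop accounts for the final element.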
11
Step 5B - Rearrange start-up and ending code
Can now parallelize additional adds and memory fetches
Note: loop still in error
12
Exercise 1 -- Get the loop control correct
BUFFER_SIZE = 1
BUFFER_SIZE = 2
BUFFER_SIZE = 4
BUFFER_SIZE = 5
BUFFER_SIZE = 8
BUFFER_SIZE = 128
16
Unrecognized second key error
What is it? How do you fix it?
18
Exercise 2 -- Rewrite the code when it is known that BUFFER_SIZE = 129
SIZE = 129, but the loop only handles 128, since 129 / 2 = 128 / 2 in integer division
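
One common way to handle the odd size, shown here as a sketch under assumptions rather than as the slide's answer, is to let the unrolled loop process the pairs and add the leftover element afterwards. The function name and the single-buffer sum are hypothetical.

```c
/* Loop unrolled by two, with a clean-up step for an odd element count. */
int unrolled_sum(const int *data, int size)
{
    int sum = 0;
    int pairs = size / 2;            /* 129 / 2 == 64, the same as 128 / 2 */
    int count;

    for (count = 0; count < pairs; count++) {
        sum = sum + data[2 * count];
        sum = sum + data[2 * count + 1];
    }
    if (size % 2 != 0)               /* one element left over when size is odd */
        sum = sum + data[size - 1];
    return sum;
}
```

When the size is a known constant such as 129, the test on size % 2 can be dropped and the final add written unconditionally after the loop.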
21
Code to this point is SISD parallel optimization

SISD: single instruction, single data
Using X_compute block and J memory bus
Next stage: SIMD (single instruction, multiple data)
Using X_compute block and J memory bus for left
Using Y_compute block and K memory bus for right
Will need similar but different code when you are doing FIR in Lab 3

22
Exercise 3 -- BUFFER_SIZE = 128
Rewrite so that X and Y ops are done together
24
Exercise 4 -- BUFFER_SIZE = 128
Rewrite so that no data dependency stalls are expected
BUFFER_SIZE = 1, N = 2, N = 4, N = 5, N = 8, N = 128
Leave this one for a while until we have handled multiple memory accesses, as the answer may change
25
Tackled today