Generating a software loop with memory accesses

Provided by: Micha392

1
Generating a software loop with memory accesses
  • TigerSHARC assembly syntax

2
Concepts
  • Learning just enough TigerSHARC assembly code to
    make a software loop work
  • Comparing the timings for rectification of
    integer and floating point arrays, using
      • Debug C code
      • Release C code
      • Our FIRST_ASM code
  • Looking in MIXED mode at the code generated by
    the compiler

3
Passing integer rectify
4
Add the ASM tests
Want the link to fail, to find the mangled name
Name-mangled function name
5
More detailed look at the code
As with 68K, needs a .section, but the name and format
are different
As with 68K, needs an .align statement. Is the 4 in
bytes (8 bits) or words (32 bits)?
As with 68K, needs .global to tell other code that
this function exists
Single semi-colons; double semi-colons
Start function label; end function label
Label format similar to 68K: needs a leading
underscore and a final colon
6
Using J8 for returned int value
Now passing this test by accident
Should be conditionally passing back NULL
7
Parameter passing
  • Spaces for first four parameters present on the
    stack (as with 68K)
  • But the first four parameters are passed in
    registers (J4, J5, J6 and J7 most of the time)
    (as with MIPS)
  • The parameters passed in registers are often
    stored into the spaces on the stack (like the
    MIPS) when assembly code functions call assembly
    code functions
  • J4, J5, J6 and J7 are volatile registers

8
Coding convention
  • // int HalfWaveRectifyRelease(int initial_array[ ],
  • //                            int final_array[ ], int N)
  • #define initial_pt_inpar1 J4
  • #define final_pt_inpar2 J5
  • #define M_J6_inpar3 J6
  • #define return_pt_J8 J8
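For orientation, the operation these register names describe can be sketched in C. This is only a reference sketch under assumptions: the function name `half_wave_rectify_sketch` is hypothetical, and the pointer return type (final array on success, NULL otherwise) is a guess based on the `return_pt_J8` name and the earlier note about passing back NULL, not something the slides state.

```c
#include <stddef.h>

/* Hypothetical C reference for the rectify routine.
 * Assumption: returns final_array on success and NULL when the
 * arguments are unusable; the slides do not spell out this contract. */
int *half_wave_rectify_sketch(const int initial_array[], int final_array[], int N)
{
    if (initial_array == NULL || final_array == NULL || N <= 0) {
        return NULL;
    }
    for (int i = 0; i < N; i++) {
        /* Half-wave rectification: negative samples become zero. */
        final_array[i] = (initial_array[i] < 0) ? 0 : initial_array[i];
    }
    return final_array;
}
```

In the assembly version, the three arguments would arrive in J4, J5 and J6 and the result would leave in J8, which is what the defines above give names to.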

9
ELSE is a KEYWORD
Missing ;; means all these instructions are
joined into 1 line of more than 4 instructions
Note: END_IF is not defined and not yet recognized
as an error
10
Personally, because of name mangling issues, I
cut-and-paste function name into labels
Two issues: jumps can be predicted to happen
(default); quad stuff issue
11
The code was not exactly what we designed (C
equivalent): refactor, and retest after the
refactoring
NEXT STEP
12
For-loop structure: use 68K style of looping
jumps
13
For-loop structure: use 68K style of
looping tests and jumps
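The "tests and jumps" loop shape can be sketched in C with labels and goto, which maps almost line for line onto assembly labels and jump instructions. This is an illustration only: the function name `sum_with_jumps_sketch` is hypothetical, and placing the test at the top of the loop is an assumption about what the slide code does.

```c
/* Sums N values using explicit labels, a test and jumps,
 * instead of a C for-loop. */
int sum_with_jumps_sketch(const int values[], int N)
{
    int count = 0;
    int total = 0;

loop_test:
    if (count >= N) goto loop_end;   /* test, then conditional jump out */
    total += values[count];          /* loop body */
    count++;
    goto loop_test;                  /* unconditional jump back to the test */

loop_end:
    return total;
}
```

With the test first, the body is skipped entirely when N is zero or negative.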
14
Accessing memory
  • Basic mode
  • Special register J31 acts as zero when used in
    additions
  • Pt_J5 is a pointer register into an array
  • Read_J1 is being used as a data register
  • J registers are like MIPS registers (used as pointer
    and data), NOT like 68K registers, which are either
    data or address but not both
  • Read_J1 = [Pt_J5] reads the value from the memory
    location pointed to by J5 -- Compare to 68K
    MOVE.L (A5), D1
  • Read_J1 = [Pt_J5 + 8] reads the value from the memory
    location pointed to by the value (J5 + 8) --
    Compare to 68K MOVE.L 8(A5), D1
    PRE-MODIFY: address used is J5 + 8, no change in J5
  • Read_J1 = [Pt_J5 + J31] reads the value from the memory
    location pointed to by J5, but I read somewhere
    that this CAN be faster than just Read_J1 =
    [Pt_J5] -- NEED TO CONFIRM

15
Accessing memory step 2
  • Basic mode
  • Pt_J5 is a pointer register into an array
  • Offset_J4 is used as an offset
  • Read_J1 is being used as a data register
  • Read_J1 = [Pt_J5 + Offset_J4] reads the value from
    the memory location pointed to by (J5 + J4)
  • PRE-MODIFY: address used is J5 + J4, no change in
    J5
  • Compare to 68K MOVE.L (A5, D4), D1
  • Read_J1 = [Pt_J5 += Offset_J4] reads the value from
    the memory location pointed to by J5, and then
    performs the add
  • POST-MODIFY: address used is J5, then perform J5 =
    J5 + J4
  • Compare to 68K MOVE.L (A5), D1 followed by
    ADD.L A4, A5, but as a single
    instruction
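The difference between the two modes can be modelled in C. This is a sketch for intuition, not a description of the hardware: the function names are hypothetical, and offsets here count array elements, which may not match how the processor counts addresses.

```c
#include <stdint.h>

/* Pre-modify: read from (pointer + offset); the pointer is unchanged. */
int32_t read_pre_modify(const int32_t *pt, int offset)
{
    return *(pt + offset);
}

/* Post-modify: read from the pointer, then advance the pointer by offset.
 * The pointer is passed by address so the update is visible to the caller. */
int32_t read_post_modify(const int32_t **pt, int offset)
{
    int32_t value = **pt;
    *pt += offset;
    return value;
}
```

Post-modify is the natural fit for walking through an array inside a loop, since the read and the pointer update happen in one step.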

16
Many other addressing modes
  • Normal memory accesses
  • Merged memory accesses
  • Broadcast memory accesses
  • Single register accesses
  • Dual register accesses
  • Quad register accesses
  • Cross-over accesses
  • Access of COMPLEX numbers

17
For-loop structure: use 68K style of looping
QUAD ERROR ISSUE AGAIN
18
Write the float-asm
  • Integer 0 has bit pattern 0x0000 0000
  • Float 0.0 has bit pattern 0x0000 0000
  • Integer has format b S??? ???? ???? ???? ???? ????
    ???? ????
  • Float has format b S??? ???? ???? ???? ???? ????
    ???? ???? (the slide labels the EXPONENT field
    within this pattern)
  • Float algorithm - if S = 1 (negative), set to
    zero
  • Otherwise leave unchanged, same as integer
    algorithm
  • Just re-use integer algorithm with a change of
    name

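The "re-use the integer algorithm" idea can be checked in C by treating each float as a 32-bit integer bit pattern. This is a sketch under assumptions: it assumes `float` is a 32-bit IEEE 754 value, the function name is hypothetical, and `memcpy` is used only to reinterpret the bits portably.

```c
#include <stdint.h>
#include <string.h>

/* Rectifies floats using only an integer sign test on their bit patterns.
 * Assumption: float is a 32-bit IEEE 754 single-precision value. */
void half_wave_rectify_float_bits_sketch(const float in[], float out[], int N)
{
    for (int i = 0; i < N; i++) {
        int32_t bits;
        memcpy(&bits, &in[i], sizeof bits);   /* view the float as a 32-bit integer */
        if (bits < 0) {                       /* sign bit S = 1: negative value */
            bits = 0;                         /* 0x0000 0000 is also float 0.0 */
        }
        memcpy(&out[i], &bits, sizeof bits);
    }
}
```

Any pattern with the sign bit set is mapped to zero, so negative zero also comes out as plain 0.0.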
19
Float ASM test
20
Do the timing tests
21
Weird results
  • Variation of about 6 cycles in testing
  • Our first ASM is faster than debug and slower
    than release; that was expected
  • Our integer code was slower than our float code;
    that was unexpected, since it is the same code
  • Can we optimize and improve the timing?

Timings in cycles (two values per entry)
          DEBUG      RELEASE    FIRST_ASM
INTEGER   426, 416   124, 118   316, 320
FLOAT     462, 458   210, 216   224, 222
22
Integer release code: identify new instructions
23
Float release code: identify new instructions
24
Exercise 1 needed for Lab. 1
  • FIR filter operation -- data and
    filter-coefficients are both integer arrays in
    C
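As a starting point for the exercise, one FIR output sample with integer data and coefficients could look like the sketch below. The lab's exact interface is not given in the slides, so the function name, the argument order, and the assumption that `data[]` already holds the samples lined up with `coeffs[]` are all hypothetical.

```c
/* Computes one FIR output sample: the sum of data[k] * coeffs[k]
 * over all taps. Assumption: data[k] is already the sample that
 * belongs with coefficient k. */
int fir_output_sketch(const int data[], const int coeffs[], int taps)
{
    int sum = 0;
    for (int k = 0; k < taps; k++) {
        sum += data[k] * coeffs[k];   /* multiply-accumulate */
    }
    return sum;
}
```

The loop body is a single multiply-accumulate, the same read-then-step pattern as the rectify loop but with two array reads per iteration.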

25
Exercise 1 needed for Lab. 1
  • FIR filter operation -- data and
    filter-coefficients are both integer arrays in
    ASM

26
Insert C code
27
Insert assembler code version
28
Concepts
  • Learning just enough TigerSHARC assembly code to
    make a software loop work
  • Comparing the timings for rectification of
    integer and floating point arrays, using
      • Debug C code
      • Release C code
      • Our FIRST_ASM code
  • Looking in MIXED mode at the code generated by
    the compiler