Processor Comparison - PowerPoint PPT Presentation

Title: Processor Comparison
Slides: 17
Provided by: mrsm3

Transcript and Presenter's Notes
1
Processor Comparison
  • This presentation will probably involve audience
    discussion, which will create action items. Use
    PowerPoint to keep track of these action items
    during your presentation
  • In Slide Show, click on the right mouse button
  • Select Meeting Minder
  • Select the Action Items tab
  • Type in action items as they come up
  • Click OK to dismiss this box
  • This will automatically create an Action Item
    slide at the end of your presentation with your
    points entered.
  • M. R. Smith (smithmr_at_ucalgary.ca)

2
  • TigerSHARC Processor Architecture

3
SHARC
4
Comparison -- Data registers
  • TigerSHARC ADSP-TS201
    • Data registers
    • 32 XR (float or integer)
    • 32 YR (float or integer)
    • Always in MIMD mode
  • SHARC ADSP-21XXX
    • Data registers
    • 16 R (float or integer)
    • 16 S (float or integer)
    • Can be switched to SIMD

5
Comparison -- Address registers
  • TigerSHARC ADSP-TS201
    • Address registers
    • 32 J (J-Bus)
    • 32 K (K-Bus)
    • 4 CB (circular buffer) operations
    • Instruction line: R6 = [J1 += J4]; R8 = [K1 += K4];
  • SHARC ADSP-21XXX
    • Address registers
    • I0 -- I7 (dm bus) with M0 -- M7
    • I8 -- I15 (pm bus) with M8 -- M15
    • 16 CB (circular buffer) operations
    • Instruction (SISD mode activated):
    • R6 = dm(I1, M4), R8 = pm(I9, M13);

6
Comparison -- Compute pipeline
  • TigerSHARC
    • 10 stage pipeline
    • 4 instruction, 4 J-IALU and 2 compute ALU stages
    • Consequences:
    • R2 = [J2 += J12];
    • Stall
    • R3 = R2 + R1;
    • Stall
    • R5 = R4 + R3;
  • SHARC
    • 3 stage pipeline
    • 1 instruction, 1 memory and 1 compute ALU stage
    • Consequences:
    • R2 = dm(I2, M2);
    • NO stall
    • R3 = R2 + R1;
    • NO stall
    • R5 = R4 + R3;
    • Has other consequences too
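The stall behaviour above can be sketched with a toy in-order pipeline model (a minimal illustration, not vendor code; the `latency` parameter is an assumption standing in for how many cycles separate producing a register value from a dependent compute being able to read it):

```python
# Toy in-order pipeline model (illustration only). Every instruction's
# result is assumed ready `latency` cycles after it issues; a dependent
# instruction must wait (stall) until its source registers are ready.

def count_stalls(program, latency):
    """program: list of (dest_reg, src_regs) tuples in issue order."""
    ready_at = {}          # register -> cycle its value becomes available
    cycle = 0              # cycle at which the next instruction can issue
    stalls = 0
    for dest, srcs in program:
        start = max([cycle] + [ready_at.get(r, 0) for r in srcs])
        stalls += start - cycle        # cycles spent waiting on sources
        cycle = start + 1
        ready_at[dest] = start + latency

    return stalls

# The three-instruction sequence from the slide:
seq = [("R2", []), ("R3", ["R2", "R1"]), ("R5", ["R4", "R3"])]

deep = count_stalls(seq, latency=2)   # deep pipeline: results arrive late
short = count_stalls(seq, latency=1)  # short pipeline: ready next cycle
```

With the late-result (TigerSHARC-like) setting the model reproduces the two stalls shown on the slide; with next-cycle results (SHARC-like) it reproduces the stall-free sequence.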

7
Comparison -- Memory access
  • TigerSHARC
    • 6 memory blocks
    • Blocks accessed by 3 busses:
    • J bus
    • K bus
    • Instruction bus
    • Avoids data / instruction fetch clashes
    • Avoids data / data fetch clashes
  • SHARC
    • 2 memory blocks
    • Blocks accessed by 2 busses:
    • dm bus
    • pm bus (instruction)
    • Avoids data / instruction fetch clashes
    • How does this architecture avoid data / data
      fetch clashes?
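The bus arithmetic behind this comparison can be sketched under a simple assumption (one access per bus per cycle; each instruction needs one instruction fetch plus its data fetches):

```python
# Sketch of the bus-clash arithmetic (assumption: each bus serves one
# memory access per cycle, and an instruction needs 1 instruction fetch
# plus `data_fetches` data fetches before it can complete).

def cycles_needed(data_fetches, busses):
    accesses = 1 + data_fetches       # instruction fetch + data fetches
    return -(-accesses // busses)     # ceiling division

dual_3bus = cycles_needed(2, 3)   # TigerSHARC-like: dual access, 3 busses
dual_2bus = cycles_needed(2, 2)   # SHARC-like: dual access, 2 busses
```

A dual data access fits in one cycle with three busses but needs an extra cycle with two, which is exactly the clash the following slides resolve with the instruction cache.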

8
SHARC -- a third memory block?
  • Most instructions use 1 instruction fetch plus
    only 0 or 1 data fetches, so 2 busses are sufficient
  • Have a separate (small) instruction cache that
    fills with the instructions that need 1
    instruction fetch plus 2 data fetches

9
Dual data accesses on SHARC
  • R2 = dm(I2, M2), R3 = pm(I9, M9);
  • R4 = R2 + R3;
  • The instruction fetch for R4 = R2 + R3 clashes with
    the data fetch of the previous instruction, so that
    instruction is put into the instruction cache

10
Dual data accesses on SHARC
  • Start loop:
  • R2 = dm(I2, M2), R3 = pm(I9, M9);
  • R4 = R2 + R3; -- the instruction fetch clashes with
    the data fetch of the previous instruction the first
    time round the loop, so the instruction is put into
    the instruction cache
  • End_loop -- BUT the second time round the loop the
    instruction is in the instruction cache, so 3 busses
    are effectively available and there is no stall
  • Consequence: if the loop is large, e.g. the Viterbi
    algorithm, it may contain many dual data accesses.
    Each new dual-access instruction placed into the
    instruction cache causes an old one to be thrown out.
    Next time around the loop, the old instruction is put
    back into the cache and the new one is thrown out:
    cache thrash occurs and no speed savings are gained
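The thrash scenario can be mimicked with a toy cache model (an assumption for illustration: a tiny FIFO cache with a single slot; the real SHARC cache has more entries, but the eviction ping-pong works the same once conflicting dual-access instructions outnumber the available slots):

```python
# Toy instruction-cache model illustrating the thrash described above.
# Assumption: a small FIFO cache (default one slot), not the actual
# SHARC cache organization.

def run_loop(dual_access_addrs, iterations, slots=1):
    cache = []             # FIFO list of cached instruction addresses
    hits = misses = 0
    for _ in range(iterations):
        for addr in dual_access_addrs:
            if addr in cache:
                hits += 1          # cached: no fetch clash, no stall
            else:
                misses += 1        # clash: stall, then cache it
                cache.append(addr)
                if len(cache) > slots:
                    cache.pop(0)   # evict the oldest entry

    return hits, misses

# One dual-access instruction: misses once, then always hits.
small_loop = run_loop(["A"], iterations=5)
# Two dual-access instructions, one slot: they evict each other forever.
big_loop = run_loop(["A", "B"], iterations=5)
```

The first loop pays the clash only on its first pass; in the second, every pass misses, which is the "no speed savings" thrash the slide describes.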

11
Hardware loops
  • TigerSHARC
    • 2 hardware loops available
  • SHARC
    • 6 hardware loops available
    • Not as useful as it sounds
    • Only the inner loop is executed often, so that
      is the only one where loop overhead is really
      important
    • Can't have loops ending on the same instruction,
      so need to add NOPs
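The claim that only the inner loop matters can be made concrete with an instruction-count sketch (assumptions: a software loop pays a decrement-and-branch overhead every iteration, a zero-overhead hardware loop pays only a one-time setup; the specific costs are illustrative, not chip-accurate):

```python
# Instruction-count sketch of loop overhead (illustrative assumptions:
# 2 overhead instructions per software-loop iteration, 1 setup
# instruction for a zero-overhead hardware loop).

def software_loop_count(body_len, iterations, overhead=2):
    # body plus decrement-and-branch overhead each time round
    return iterations * (body_len + overhead)

def hardware_loop_count(body_len, iterations, setup=1):
    # one-time loop setup, then just the body each iteration
    return setup + iterations * body_len

inner_sw = software_loop_count(3, 1000)  # 3-instruction body, 1000 times
inner_hw = hardware_loop_count(3, 1000)
```

For a hot inner loop the hardware loop removes thousands of overhead instructions; for an outer loop that runs a handful of times, the saving is negligible, which is why extra hardware loops are "not as useful as it sounds".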

12
Jumps and pipeline
  • TigerSHARC
    • 10 stage pipeline
    • A non-predicted branch causes many instruction
      fetches and execution stages to be thrown away
    • Partially solved by the ability to choose
      between predicted and non-predicted branches
    • Heavy penalty when the other choice is taken
    • Can't use the delayed branch concept, as
      instructions are always discarded
    • Most instructions are made conditional
  • SHARC
    • 3 stage pipeline
    • A non-predicted branch causes 2 instruction fetches
      and 1 execution to be thrown away
    • Uses the delayed branch concept
    • The two instructions after the branch are ALWAYS
      executed
    • If you can't find useful instructions to put in the
      delay slots then put in NOPs
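The taken-branch penalty can be sketched with a toy model (an assumption for illustration: `depth - 1` instructions after the branch are already in flight when the branch resolves; with a delayed branch they execute, usefully or as NOPs, while without it they are discarded):

```python
# Sketch of the taken-branch penalty described above (toy model, not a
# cycle-accurate simulation of either chip).

def wasted_fetches(pipeline_depth, taken, delayed_branch):
    in_flight = pipeline_depth - 1   # fetched past the branch already
    if not taken or delayed_branch:
        return 0                     # nothing has to be thrown away
    return in_flight                 # in-flight work is discarded

sharc_plain = wasted_fetches(3, taken=True, delayed_branch=False)
sharc_delay = wasted_fetches(3, taken=True, delayed_branch=True)
deep_plain = wasted_fetches(10, taken=True, delayed_branch=False)
```

In this model the short SHARC pipeline recovers its whole branch penalty through the two delay slots, while a 10-stage pipeline has far more in-flight work to lose, which is why TigerSHARC relies on prediction and conditional instructions instead.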

13
BDTI good source of info: www.bdti.com/bdtimark/chip_float_scores.pdf
14
BDTI good source of info: www.bdti.com/bdtimark/
15
BDTI good source of info
16
BDTI good source of info