Processor Comparison - PowerPoint PPT Presentation

Slides: 17
Provided by: mrsm3

Transcript and Presenter's Notes

Title: Processor Comparison


1
Processor Comparison
  • M. R. Smith, smithmr@ucalgary.ca

2
  • TigerSHARC Processor Architecture

3
SHARC
4
Comparison -- Data registers
  • TigerSHARC ADSP-TS201
  • Data registers
  • 32 XR (float or integer)
  • 32 YR (float or integer)
  • Always in MIMD mode
  • SHARC ADSP-21XXX
  • Data registers
  • 16 R (float or integer)
  • Can be switched to SIMD
  • 16 S (float or integer)
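To make the SISD/SIMD distinction concrete, here is a toy Python model of the SHARC data registers. It assumes a simplified machine where a single flag stands in for the mode bit that enables the second processing element: in SISD mode an instruction such as R3 = R1 + R2 updates only the R registers, while with SIMD enabled the same instruction also performs the matching operation on the S registers. The class and method names are hypothetical and the model ignores the real instruction set and timing.

```python
class SharcDataRegisters:
    """Toy model: R0-R15 belong to the first processing element,
    S0-S15 to the second. `simd` stands in for the mode bit (assumption)."""

    def __init__(self):
        self.r = [0] * 16  # R registers (always active)
        self.s = [0] * 16  # S registers (used only in SIMD mode)
        self.simd = False  # start in SISD mode

    def add(self, d, a, b):
        """Execute the hypothetical instruction Rd = Ra + Rb."""
        self.r[d] = self.r[a] + self.r[b]
        if self.simd:
            # In SIMD mode the same instruction also runs on the S registers
            self.s[d] = self.s[a] + self.s[b]


regs = SharcDataRegisters()
regs.r[1], regs.r[2] = 2, 3
regs.s[1], regs.s[2] = 10, 20
regs.add(3, 1, 2)  # SISD: only R3 changes
print(regs.r[3], regs.s[3])  # 5 0
regs.simd = True
regs.add(3, 1, 2)  # SIMD: R3 and S3 are both updated
print(regs.r[3], regs.s[3])  # 5 30
```

One instruction, two results: that is the throughput gain SIMD mode offers when the same operation applies to two data streams.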

5
Comparison Address registers
  • TigerSHARC ADSP-TS201
  • Address registers
  • 32 J (J-Bus)
  • 32 K (K-Bus)
  • 4 CB operations
  • Instruction line: R6 = [J1 += J4]; R8 = [K1 += K4];;
  • SHARC ADSP-21XXX
  • Address registers
  • I0 I7 (dm bus) M0 M7
  • I8 I15 (pm bus) M8 M15
  • 16 CB operations
  • Instruction (with SISD mode active)
  • R6 = dm(I1, M4), R8 = pm(I9, M13);
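The dm(I, M) and pm(I, M) accesses above use post-modify addressing: the index register supplies the address and is then advanced by the modifier, optionally wrapping inside a circular buffer. Below is a minimal sketch of the resulting address sequence. It assumes the wrap can be computed with a modulo (real hardware may behave differently when the modifier is larger than the buffer), and the function name and the `base`/`length` parameters are hypothetical stand-ins for the buffer's base and length registers.

```python
def post_modify_addresses(i, m, count, base=0, length=0):
    """Addresses produced by `count` post-modify accesses such as dm(I1, M4).

    i: starting value of the index register
    m: modifier register value
    base, length: circular buffer base and length; length == 0 means
                  plain linear addressing (no wrap)
    """
    addresses = []
    for _ in range(count):
        addresses.append(i)  # the access uses the current index value
        i += m               # post-modify: index is updated after the access
        if length:
            # wrap back into [base, base + length) to form a circular buffer
            i = base + (i - base) % length
    return addresses


print(post_modify_addresses(100, 1, 6, base=100, length=4))
# [100, 101, 102, 103, 100, 101]
```

Because the wrap happens in the address generator, a loop over a circular buffer needs no extra instructions to reset the pointer.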

6
Comparison Compute Pipeline
  • TigerSHARC
  • 10-stage pipeline
  • 4 instruction, 4 J-IALU and 2 compute-ALU stages
  • Consequences
  • R2 = [J2 += J12];;
  • Stall
  • R3 = R2 + R1;;
  • Stall
  • R5 = R4 + R3;;
  • SHARC
  • 3-stage pipeline
  • 1 instruction, 1 memory and 1 compute-ALU stage
  • Consequences
  • R2 = dm(I2, M2);
  • NO stall
  • R3 = R2 + R1;
  • NO stall
  • R5 = R4 + R3;
  • Has other consequences too
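The stall pattern above can be reproduced with a simple in-order issue model. This is a sketch that assumes each instruction's result becomes usable a fixed number of cycles after it issues; the latency values passed in are illustrative assumptions chosen to reproduce the stall counts on the slide, not datasheet figures, and the function name is hypothetical.

```python
def pipeline_stalls(program, latency):
    """Count stall cycles for an in-order, one-instruction-per-cycle pipeline.

    program: list of (dest, sources, kind) tuples
    latency: dict mapping kind -> cycles after issue until the result is
             usable (1 means the very next instruction can use it)
    """
    ready = {}   # register -> earliest cycle its new value can be read
    cycle = 0    # earliest cycle the next instruction could issue
    stalls = 0
    for dest, sources, kind in program:
        # wait until every source register has been produced
        issue = max([cycle] + [ready.get(r, 0) for r in sources])
        stalls += issue - cycle
        ready[dest] = issue + latency[kind]
        cycle = issue + 1
    return stalls


# R2 = load; R3 = R2 + R1; R5 = R4 + R3 (each line uses the previous result)
program = [("R2", [], "load"), ("R3", ["R2", "R1"], "alu"), ("R5", ["R4", "R3"], "alu")]
print(pipeline_stalls(program, {"load": 1, "alu": 1}))  # 0: short pipeline, no stalls
print(pipeline_stalls(program, {"load": 2, "alu": 2}))  # 2: one stall before each dependent line
```

The deeper the pipeline, the longer a result takes to become available, so back-to-back dependent instructions stall unless independent work is scheduled between them.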

7
Comparison Memory access
  • TigerSHARC
  • 6 memory blocks
  • Blocks accessed by 3 busses
  • J Bus
  • K Bus
  • Instruction Bus
  • Avoids data/instruction fetch clashes
  • Avoids data/data fetch clashes
  • SHARC
  • 2 memory blocks
  • Blocks accessed by 2 busses
  • DM bus
  • PM bus (instruction)
  • Avoids data/instruction fetch clashes
  • How does this architecture avoid data/data fetch clashes?

8
SHARC -- 3 memory block
  • Most instructions need 1 instruction fetch plus
    only 0 or 1 data fetch, so 2 busses are sufficient
  • Has a separate (small) instruction cache that
    fills with the instructions that need 1
    instruction fetch and 2 data fetches

9
Dual data accesses on SHARC
  • R2 = dm(I2, M2), R3 = pm(I9, M9);
  • R4 = R2 + R3;  Its instruction fetch clashes
    with the data fetch of the previous instruction,
    so this instruction is put into the instruction
    cache

10
Dual data accesses on SHARC
  • Start loop
  • R2 = dm(I2, M2), R3 = pm(I9, M9);
  • R4 = R2 + R3;  The first time round the loop,
    its instruction fetch clashes with the data
    fetch of the previous instruction, so this
    instruction is put into the instruction cache
  • End_loop: BUT the second time round the loop,
    the instruction is in the instruction cache, so
    3 busses are available and there is no stall
  • Consequence: if the loop is large (e.g. the Viterbi
    algorithm), it may contain many dual data accesses.
    Each new dual-access instruction placed into the
    instruction cache can cause an old one to be thrown
    out. Next time around the loop, the old instruction
    is put back into the cache and the new one is thrown
    out, so cache thrashing occurs and no speed savings
    are gained
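The thrashing effect can be illustrated with a small simulation. It assumes a fully associative instruction cache with least-recently-used replacement and a one-cycle stall each time a clashing instruction misses the cache. The capacity and instruction addresses are hypothetical, and the real SHARC cache organization differs, but the cyclic access pattern of a large loop defeats LRU replacement in the same way.

```python
from collections import OrderedDict


def cache_stalls(clash_addrs, iterations, capacity):
    """Stall cycles per loop iteration for a hypothetical LRU instruction cache.

    clash_addrs: addresses of loop instructions whose fetch clashes with a
                 data access on the PM bus (hypothetical values)
    capacity:    number of cache entries (hypothetical)
    """
    cache = OrderedDict()  # address -> None, least recently used first
    per_iteration = []
    for _ in range(iterations):
        stalls = 0
        for addr in clash_addrs:
            if addr in cache:
                cache.move_to_end(addr)  # hit: no stall, mark most recently used
            else:
                stalls += 1              # miss: the bus clash costs a stall
                cache[addr] = None       # cache the instruction for next time
                if len(cache) > capacity:
                    cache.popitem(last=False)  # throw out the least recently used
        per_iteration.append(stalls)
    return per_iteration


print(cache_stalls([0x10, 0x20], iterations=3, capacity=4))   # [2, 0, 0]
print(cache_stalls(list(range(6)), iterations=3, capacity=4))  # [6, 6, 6]
```

With two clashing instructions the stalls vanish after the first pass; with six competing for four entries, each insertion evicts the entry needed next, so every pass stalls on every clash.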

11
Hardware loops
  • TigerSHARC
  • 2 hardware loops available
  • SHARC
  • 6 hardware loops available
  • Not as useful as it sounds
  • Only the inner loop is executed often, so it is
    the only one where loop overhead really matters
  • Nested loops can't end on the same instruction,
    so you need to add NOPs
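A worked example of why the inner loop dominates: with an outer loop of M trips around an inner loop of N trips, inner-loop control runs M*N times while outer-loop control runs only M times. The per-iteration overhead costs below are hypothetical parameters, with a cost of 0 modelling a zero-overhead hardware loop; the function name is also hypothetical.

```python
def loop_overhead_cycles(outer_trips, inner_trips, outer_cost, inner_cost):
    """Total loop-control cycles for a two-level nested loop.

    outer_cost, inner_cost: hypothetical overhead cycles per iteration
    (e.g. decrement, test, branch); 0 models a zero-overhead hardware loop.
    """
    inner = outer_trips * inner_trips * inner_cost  # inner control runs M*N times
    outer = outer_trips * outer_cost                # outer control runs only M times
    return inner + outer


print(loop_overhead_cycles(100, 1000, 3, 3))  # 300300: both loops in software
print(loop_overhead_cycles(100, 1000, 3, 0))  # 300: hardware loop on the inner loop only
print(loop_overhead_cycles(100, 1000, 0, 3))  # 300000: hardware loop on the outer loop only
```

Under these assumed costs, a single hardware loop on the inner loop removes almost all of the overhead, which is why extra hardware loop levels matter less than the count suggests.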

12
Jumps and pipeline
  • TigerSHARC
  • 10-stage pipeline
  • A non-predicted branch causes many instruction
    fetch and execution stages to be thrown away
  • Partially solved by being able to choose
    between predicted and non-predicted branches
  • Heavy penalty when the other choice is taken
  • Can't use the delayed-branch concept, as the
    instructions are always discarded
  • Most instructions can be made conditional
  • SHARC
  • 3-stage pipeline
  • A non-predicted branch causes 2 instruction fetches
    and 1 execution to be thrown away
  • Uses the delayed-branch concept
  • Two instructions after the branch are ALWAYS
    executed
  • If you can't find useful instructions to put in
    the delay slots, put NOPs there
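The delay-slot rule can be sketched as a tiny model of a taken jump in a 3-stage pipeline, assuming the two instructions after the jump have already been fetched when the jump executes. With a delayed branch those two always execute, and any NOP among them is a wasted cycle; with a normal branch both are discarded, costing two cycles. The function and instruction names are hypothetical.

```python
def run_taken_jump(program, jump_index, target_index, delayed):
    """Model a taken jump in a 3-stage (fetch, decode, execute) pipeline.

    When the jump executes, the two following instructions are already in
    the pipeline. Returns (instructions executed in order, wasted cycles).
    """
    if delayed:
        # Delayed branch: the two delay-slot instructions ALWAYS execute
        slots = program[jump_index + 1 : jump_index + 3]
        wasted = slots.count("nop")  # a NOP in a slot does no useful work
    else:
        # Normal branch: both prefetched instructions are thrown away
        slots = []
        wasted = 2
    executed = [program[jump_index]] + slots + program[target_index:]
    return executed, wasted


program = ["jump", "a", "nop", "b", "c"]  # jump target is index 3 ("b")
print(run_taken_jump(program, 0, 3, delayed=True))   # (['jump', 'a', 'nop', 'b', 'c'], 1)
print(run_taken_jump(program, 0, 3, delayed=False))  # (['jump', 'b', 'c'], 2)
```

Filling one slot with useful work ("a") halves the cost of the jump in this model; filling both would make the taken jump free.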

13
BDTI good source of info www.bdti.com/bdtimark/chip_float_scores.pdf
14
BDTI good source of info www.bdti.com/bdtimark/
15
BDTI good source of info
16
BDTI good source of info