SMT for Dedicated Threads

Slides: 22
Provided by: jga7

1
SMT for Dedicated Threads
  • Jim Gast, jgast_at_cs.wisc.edu
  • Laura Spencer, ljspence_at_cs.wisc.edu
  • Brian Fields, fields_at_cs.wisc.edu

2
Agenda
  • Simultaneous Multithreading background
  • Motivation for Priority
  • Possible Priority Implementations
  • Priority Setting Mechanisms
  • Methodology/Results
  • Conclusions

3
Superscalar
SMT Background
[Figure: issue-slot diagram for a single thread (Thread 1). Axes: Time (Processor Cycles) and Functional Units; empty slots are marked Unused.]
Horizontal Waste: There are cycles in which no instructions can issue because they must wait for prior instructions to finish.
4
Fine-grained multithreading
SMT Background
[Figure: issue-slot diagram for Thread 1, Thread 2, and Thread 3. Axes: Time (Processor Cycles) and Functional Units; empty slots are marked Unused.]
Fills horizontal waste, but still has vertical waste if a thread cannot use all FUs in parallel.
5
Simultaneous Multithreading
SMT Background
[Figure: issue-slot diagram for Thread 1, Thread 2, and Thread 3. Axes: Time (Processor Cycles) and Functional Units; empty slots are marked Unused.]
SMT allows several threads to issue instructions in the same cycle.
6
Benefits
SMT Background
  • Eliminate overheads in interthread communication
  • Minimize interrupts
  • Minimize context switches
  • Minimize the cost of branch mis-predicts
  • Minimize the cost of pipeline flushes

7
Agenda
  • Simultaneous Multithreading background
  • Motivation for Priority
  • Possible Priority Implementations
  • Priority Setting Mechanisms
  • Methodology/Results
  • Conclusions

8
Motivation for Priority
Example without priority
Thread 1 (long)
  1 Add R1, R2, R3
  2 Add R4, R5, R6
  3 Add R7, R8, R9
  4 Add R9, R10, R11
  5 Unlock X
Thread 2 (short)
  10 Add R10, R11, R12
  11 Add R13, R14, R15
  12 Lock X
4-way issue, no priority
Cycle 1, Cycle 2, Cycle 3 (instructions issued, in order): 1, 2, 10, 11, 3, 12, 12, 4, 5
3 cycles to reach sync. point
9
Motivation for Priority
Example with priority
Thread 1 (long)
  1 Add R1, R2, R3
  2 Add R4, R5, R6
  3 Add R7, R8, R9
  4 Add R9, R10, R11
  5 Unlock X
Thread 2 (short)
  10 Add R10, R11, R12
  11 Add R13, R14, R15
  12 Lock X
4-way issue, T1 has priority
Cycle 1, Cycle 2, Cycle 3 (instructions issued, in order): 1, 2, 3, 10, 11, 4, 12, 5
2 cycles to reach sync. point
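
The two issue examples can be replayed with a small Python sketch. It is not SMTSIM and not necessarily the authors' model. It assumes in-order issue within each thread, that instruction 4 must wait one cycle for instruction 3 (both touch R9), that a Lock X attempt made before Unlock X has issued consumes an issue slot and is retried in the next cycle, and that "no priority" means the threads alternate slot by slot. The names Instr and simulate are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    num: int                   # instruction number from the slide
    text: str
    dep: Optional[int] = None  # must have issued in an EARLIER cycle (assumed)


# Thread 1 (long): instruction 4 is assumed to wait for 3 (both touch R9).
T1 = [Instr(1, "Add R1, R2, R3"), Instr(2, "Add R4, R5, R6"),
      Instr(3, "Add R7, R8, R9"), Instr(4, "Add R9, R10, R11", dep=3),
      Instr(5, "Unlock X")]
# Thread 2 (short)
T2 = [Instr(10, "Add R10, R11, R12"), Instr(11, "Add R13, R14, R15"),
      Instr(12, "Lock X")]


def simulate(threads, prioritized, width=4):
    """Return (cycle in which Unlock X issues, list of per-cycle issue lists)."""
    pos = [0] * len(threads)   # index of the next instruction in each thread
    issued_at = {}             # instruction number -> cycle it issued in
    unlock_cycle = None
    trace = []
    cycle = 0
    while unlock_cycle is None:
        cycle += 1
        slots, spun = [], set()

        def try_issue(t):
            nonlocal unlock_cycle
            if len(slots) == width or t in spun or pos[t] == len(threads[t]):
                return False
            ins = threads[t][pos[t]]
            if ins.dep is not None and issued_at.get(ins.dep, cycle) >= cycle:
                return False   # producer has not issued in an earlier cycle
            slots.append(ins.num)
            if ins.text == "Lock X":
                # The run stops in the cycle Unlock X issues, so every lock
                # attempt in this window fails: it uses a slot and retries.
                spun.add(t)
                return True
            issued_at[ins.num] = cycle
            pos[t] += 1
            if ins.text == "Unlock X":
                unlock_cycle = cycle
            return True

        order = range(len(threads))    # thread 0 is the high-priority thread
        if prioritized:
            for t in order:            # drain the higher-priority thread first
                while try_issue(t):
                    pass
        else:
            progress = True
            while progress:            # alternate threads one slot at a time
                progress = False
                for t in order:
                    progress |= try_issue(t)
        trace.append(slots)
    return unlock_cycle, trace


for prioritized in (False, True):
    cycles, trace = simulate([T1, T2], prioritized)
    print("priority" if prioritized else "no priority", cycles, trace)
```

Under these assumptions the sketch gives the same totals as the slides: Unlock X issues in cycle 3 without priority and in cycle 2 when Thread 1 is prioritized.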
10
Do it on a larger scale?
Motivation for Priority
[Figure: thread timeline. Axis: Time; marked events: Lock, Unlock.]
Let's give it a shot
11
Agenda
  • Simultaneous Multithreading background
  • Motivation for Priority
  • Possible Priority Implementations
  • Priority Setting Mechanisms
  • Methodology/Results
  • Conclusions

12
Where to implement priority?
Priority Implementation
  • Fetch unit?
      • ICOUNT 2.8 (Tullsen et al.)
      • Fetch 8 instructions/cycle from up to 2 threads
      • Balance of instructions from each active thread
  • Issue unit?
      • Typically issue oldest first
      • Very difficult implementation
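
As a rough illustration of the fetch-side option, the sketch below picks fetch threads in the spirit of ICOUNT 2.8: each cycle it favours the threads with the fewest instructions waiting in the pre-issue stages, and fetches up to 8 instructions from at most 2 of them. The function name icount_fetch and the available argument (how many instructions each thread could supply this cycle) are assumptions made for this example, not part of any simulator's interface.

```python
def icount_fetch(icounts, available, max_threads=2, width=8):
    """Plan one fetch cycle.

    icounts:   thread id -> instructions currently in pre-issue stages
    available: thread id -> instructions the thread could supply this cycle
               (hypothetical input, e.g. limited by a cache-line boundary)
    Returns a dict thread id -> number of instructions to fetch.
    """
    # Threads with the lowest instruction counts get fetch priority;
    # ties are broken by thread id to keep the result deterministic.
    chosen = sorted(icounts, key=lambda t: (icounts[t], t))[:max_threads]
    plan = {}
    left = width
    for t in chosen:
        n = min(available.get(t, 0), left)
        if n:
            plan[t] = n
        left -= n
    return plan


# Thread 1 has the fewest queued instructions, thread 2 the next fewest.
print(icount_fetch({0: 12, 1: 3, 2: 7, 3: 20}, {0: 8, 1: 5, 2: 8, 3: 8}))
```

Because lightly loaded threads are fetched first, a thread that clogs the queues is naturally throttled, which is the balancing effect the slide refers to.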

13
Priority Implementation
Proposed Prioritization Mechanisms
  • Fetch based on priority
      • Use ICOUNT for desired imbalance
      • Strict priority
  • Issue based on priority
      • Countdown to rendezvous
      • Issue first from thread farthest from barrier
      • Use empty issue slots for other threads
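
The issue-side idea (serve the thread farthest from its barrier first, then hand leftover slots to the others) might be sketched as follows. This is only an illustration: the function name issue_by_countdown, the ready and countdown inputs, and the 4-wide default are assumptions, and how a countdown value would be obtained is outside the sketch.

```python
def issue_by_countdown(ready, countdown, width=4):
    """Choose up to `width` instructions for one issue cycle.

    ready:     thread id -> list of ready instructions, oldest first
    countdown: thread id -> estimated distance to the rendezvous (barrier)
    Returns a list of (thread id, instruction) pairs in issue order.
    """
    # Largest countdown first: the thread farthest from the barrier
    # is the one everybody else will end up waiting for.
    order = sorted(ready, key=lambda t: (-countdown[t], t))
    issued = []
    for t in order:
        for ins in ready[t]:
            if len(issued) == width:
                return issued
            issued.append((t, ins))   # leftover slots fall to later threads
    return issued


ready = {0: ["a0", "a1"], 1: ["b0", "b1", "b2"], 2: ["c0"]}
countdown = {0: 5, 1: 40, 2: 12}
print(issue_by_countdown(ready, countdown))
```

In the usage example thread 1 is farthest from the barrier, so its three ready instructions issue first and the remaining slot goes to thread 2.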

14
Agenda
  • Simultaneous Multithreading background
  • Motivation for Priority
  • Possible Priority Implementations
  • Priority Setting Mechanisms
  • Methodology/Results
  • Conclusions

15
Who chooses thread priority?
Priority Setting Mechanisms
  • Programmer
      • Has knowledge of program structure
      • But no dynamic information
  • Operating System
      • Limited dynamic information
  • Hardware
      • Has dynamic information
      • But no knowledge of program structure

16
Agenda
  • Simultaneous Multithreading background
  • Motivation for Priority
  • Possible Priority Implementations
  • Priority Setting Mechanisms
  • Methodology/Results
  • Conclusions

17
SMTSIM (Tullsen at UCSD)
Methodology/Results
  • Subset of the Alpha instruction set
  • Synchronization primitives
      • smt_create, smt_lock, smt_unlock, smt_terminate
  • Implements quiesce
  • 8 threads, 8-way issue, 32-entry IF queues
  • 32 kB L1 D/I caches, 1 MB shared L2, 4 MB shared L3
  • We adjusted the synch. interface, fixed several bugs,
    and added an smt_priority instruction.

18
Benchmark
Methodology/Results
  • 7 threads simulating things a video conference
    server does
      • Decrypt, encrypt, generate CRC, check CRC,
        replicate packets
  • Some threads are DCache hogs, some threads spawn
    work to multiple threads, some threads are
    register / CPU intensive.

19
Experimental Results
Methodology/Results
20
Agenda
  • Simultaneous Multithreading background
  • Motivation for Priority
  • Possible Priority Implementations
  • Priority Setting Mechanisms
  • Methodology/Results
  • Conclusions

21
Conclusions
  • Using priority for performance gain
      • Use shared resources more effectively
      • Likely need program structure information
  • Be careful with metrics
      • IPC is misleading if it counts spin-lock instructions
  • Interaction between threads is complex
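
The caution about metrics can be made concrete with a short sketch. The numbers below are invented purely for illustration and are not results from the presentation. The function useful_ipc is a hypothetical name for IPC computed after subtracting instructions spent spinning on locks.

```python
def ipc(instructions, cycles):
    """Raw instructions per cycle, counting everything that retired."""
    return instructions / cycles


def useful_ipc(instructions, spin_instructions, cycles):
    """IPC after discarding instructions executed while spinning on a lock."""
    return (instructions - spin_instructions) / cycles


# Hypothetical runs of the same workload over 1000 cycles (made-up numbers).
run_a = dict(instructions=6000, spin_instructions=2500, cycles=1000)
run_b = dict(instructions=5000, spin_instructions=500, cycles=1000)

for name, run in (("A", run_a), ("B", run_b)):
    raw = ipc(run["instructions"], run["cycles"])
    print(name, "raw IPC:", raw, "useful IPC:", useful_ipc(**run))
```

With these invented figures run A looks better on raw IPC (6.0 against 5.0), yet run B does more real work per cycle (4.5 against 3.5), which is why a metric that counts spin-lock instructions can point the wrong way.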