Using Speculation to Simplify Multiprocessor Design

1
Using Speculation to Simplify Multiprocessor Design
  • Daniel J. Sorin (1), Milo M. K. Martin (2), Mark D.
    Hill (3), David A. Wood (3)
  • (1) Dept. of Electrical & Computer Engineering, Duke
    University
  • (2) Dept. of Computer & Information Science, Univ.
    of Pennsylvania
  • (3) Computer Sciences Dept., University of
    Wisconsin-Madison

2
My Talk in One Slide
  • Shared memory multiprocessors are complicated
  • Difficult to design for every possible corner
    case
  • Proposal: use speculation to target the common
    case
  • Speculate that corner cases won't happen
  • Detect if they do occur and recover system
  • Ensure forward progress
  • Case studies
  • Simplify cache coherence protocols
  • Simplify the interconnection network

3
Speculation for Simplicity
  • Why we want to avoid complexity
  • Time and money for design and verification
  • Design for the common case
  • But we have to make ALL cases work correctly
  • Examples of this philosophy in uniprocessors
  • Trapping to software for infrequent/obsolescent
    instructions
  • Pentium 4 recovers from edge-case scheduler
    deadlocks
  • But this idea hadn't been used for
    multiprocessors
  • Key: we now have efficient multiprocessor recovery

4
Framework for Speculation
  • Four keys to design simplification with
    speculation
  • Ensure that mis-speculations are rare
  • Detect all mis-speculations
  • Recover from mis-speculations
  • Ensure forward progress even in the worst case
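
The four keys above can be read as a control loop. The following is a minimal sketch, not the authors' mechanism: `run_with_speculation`, `step`, `checkpoint`, `restore`, and `slow_steps` are hypothetical names, a checkpoint is taken after every step for simplicity, and the conservative mode is assumed never to mis-speculate.

```python
class MisSpeculation(Exception):
    """Raised by a detector when a speculated-away corner case occurs."""


def run_with_speculation(step, checkpoint, restore, n_steps, slow_steps=3):
    """Run n_steps of `step`, recovering from detected mis-speculations.

    step(i, safe) performs step i and raises MisSpeculation on detection.
    safe=True selects a conservative mode (an assumption of this sketch).
    """
    recoveries = 0
    slow_left = 0          # steps still to run in conservative mode
    saved = checkpoint()   # state to roll back to
    i = 0
    while i < n_steps:
        try:
            step(i, safe=slow_left > 0)
        except MisSpeculation:
            restore(saved)           # recover: roll back to the checkpoint
            recoveries += 1
            slow_left = slow_steps   # forward progress: retry conservatively
            continue
        saved = checkpoint()
        slow_left = max(0, slow_left - 1)
        i += 1
    return recoveries


# Toy workload: step 2 hits a corner case unless run in safe mode.
state = {"done": []}


def demo_step(i, safe):
    if i == 2 and not safe:
        raise MisSpeculation("corner case hit")
    state["done"].append(i)


def demo_checkpoint():
    return list(state["done"])


def demo_restore(saved):
    state["done"] = list(saved)


recoveries = run_with_speculation(demo_step, demo_checkpoint, demo_restore, n_steps=5)
```

In this toy run the corner case is detected once, the state is rolled back, and the retried step completes in the conservative mode, which is what keeps the loop from repeating the same mis-speculation forever.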

5
SafetyNet Checkpoint/Recovery
  • We use SafetyNet [ISCA 2002] for system recovery
  • All-hardware checkpoint/recovery for shared
    memory multiprocessors
  • Periodically takes logical checkpoints of the system
  • Including caches, coherence state, memory,
    directory state
  • Implements checkpointing with incremental logging
  • Consistent checkpoints using logical time
    coordination
  • Can recover 100,000 cycles
  • Negligible performance impact
  • Incremental logging performed off critical path
  • Small log buffers (512 KB) at caches and memories
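
Checkpointing by incremental logging can be illustrated in software. This is a sketch under stated assumptions, not SafetyNet's hardware design: it models one node with one checkpoint, assumes the log records a location's old value the first time that location changes after a checkpoint, and leaves out logical time coordination entirely. `LoggedMemory` and its methods are hypothetical names.

```python
_ABSENT = object()  # marks "address had no value at checkpoint time"


class LoggedMemory:
    """Toy memory that logs old values so it can roll back to a checkpoint."""

    def __init__(self):
        self.data = {}
        self.log = {}  # address -> value the address held at the checkpoint

    def checkpoint(self):
        # A new checkpoint simply starts with an empty log.
        self.log = {}

    def write(self, addr, value):
        # Log the old value only on the first write since the checkpoint.
        if addr not in self.log:
            self.log[addr] = self.data.get(addr, _ABSENT)
        self.data[addr] = value

    def recover(self):
        # Undo every logged write, restoring the checkpointed contents.
        for addr, old in self.log.items():
            if old is _ABSENT:
                self.data.pop(addr, None)
            else:
                self.data[addr] = old
        self.log = {}


mem = LoggedMemory()
mem.write(0x10, 1)
mem.checkpoint()
mem.write(0x10, 2)   # logged: old value 1
mem.write(0x10, 3)   # not logged again
mem.write(0x20, 9)   # logged: address was absent
log_entries = len(mem.log)
mem.recover()
```

The log grows with the number of distinct locations touched since the checkpoint rather than with the number of writes, which suggests why small log buffers can suffice.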

6
The Need for Multiprocessor Recovery
  • Assumption: multiprocessors will have system-wide
    recovery mechanisms for purposes of availability
  • As fault rates keep increasing, recovery is
    crucial
  • Will be all-hardware (like SafetyNet) for
    performance
  • But many alternative designs are possible
  • We leverage this recovery mechanism for
    recovering from mis-speculations

7
Outline
  • A Framework for Speculation
  • Simplifying Cache Coherence Protocols
  • Simplifying the Interconnection Network
  • Evaluation
  • Conclusions

8
Directory Protocol Complexity
  • We want adaptive routing in interconnection
    network
  • Better performance and availability
  • But adaptive routing precludes point-to-point
    ordering
  • So what?
  • Point-to-point ordering simplifies protocol
    design
  • Eliminates several potential corner case races

9
Race Case in Directory Protocol
  • Example: race if no point-to-point ordering in
    network

[Figure: directory (Dir) and processors P1 and P2; messages shown: RequestReadWrite, Forwarded RequestReadWrite, Writeback]
RequestReadWrite arrives first at Dir, gets
forwarded to P1

10
Race Case in Directory Protocol
[Figure: Dir, P1, P2; messages shown: RequestReadWrite, Forwarded RequestReadWrite, Writeback, Writeback Ack]
Forwarded RequestReadWrite arrives after
Writeback Ack

11
Race Case in Directory Protocol
  • Problem: P1 sees Forwarded Request in state
    Invalid

[Figure: Dir, P1, P2; messages shown: RequestReadWrite, Forwarded RequestReadWrite, Writeback, Writeback Ack]
Not possible if point-to-point order in
interconnection network

12
Simplifying a Directory Protocol
  • Speculate that adaptive network provides ordering
  • Why is mis-speculation rare?
  • Not many re-orderings
  • Most re-orderings don't matter!
  • How do we detect all mis-speculations?
  • If we get a Forwarded RequestReadWrite in state
    Invalid
  • How do we recover?
  • SafetyNet
  • How do we ensure forward progress?
  • Slow-start operation for a while after recovery
  • Guarantees that this race can't keep recurring
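
The detection rule for this race can be sketched from P1's point of view. The sketch assumes P1 has already sent its Writeback and is waiting for the ack; the state name `WritebackPending` and the function `p1_receive` are hypothetical, and P1's data response to the forwarded request is elided.

```python
class MisSpeculation(Exception):
    """Signals that the network re-ordered messages in a way that matters."""


def p1_receive(messages):
    """Deliver messages to P1 in the given order and return its final state."""
    state = "WritebackPending"  # hypothetical name: Writeback sent, ack awaited
    for msg in messages:
        if msg == "WritebackAck":
            state = "Invalid"
        elif msg == "ForwardedRequestReadWrite":
            if state == "Invalid":
                # The corner case the simplified protocol does not handle.
                raise MisSpeculation("Forwarded RequestReadWrite in state Invalid")
            # Otherwise P1 still holds the block and can respond (elided).
        else:
            raise ValueError(f"unknown message: {msg}")
    return state


# Point-to-point order preserved: the forwarded request arrives before the ack.
in_order = p1_receive(["ForwardedRequestReadWrite", "WritebackAck"])

# Adaptive routing re-orders the two messages: the detector fires.
try:
    p1_receive(["WritebackAck", "ForwardedRequestReadWrite"])
    detected = False
except MisSpeculation:
    detected = True
```

Only the re-ordered delivery raises `MisSpeculation`; in a real system that signal would trigger the recovery mechanism rather than a Python exception.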

13
Simplifying a Snooping Coherence Protocol
  • During design, we missed a corner case

[Figure: cache block states M, trans1, trans2, then ???; events shown: Writeback, Request ReadWrite, Request ReadWrite]
  • Solution: it's rare, so treat it as a mis-speculation
  • Detect by seeing RequestReadWrite in state trans2
  • Recovery with SafetyNet
  • Forward progress with slow-start after recovery
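
One way to picture this is a transition table in which the missed corner case is listed as a detection point instead of being given a designed transition. The table entries below are hypothetical and only illustrate the shape; the slide specifies just the detection condition (RequestReadWrite in state trans2).

```python
class MisSpeculation(Exception):
    """Signals that a speculated-away corner case actually happened."""


# Hypothetical transitions for illustration; the slide does not define them.
TRANSITIONS = {
    ("M", "Writeback"): "trans1",
    ("trans1", "RequestReadWrite"): "trans2",
}

# The one corner case the slide names: RequestReadWrite while in trans2.
SPECULATED_AWAY = {("trans2", "RequestReadWrite")}


def next_state(state, event):
    """Return the next state, or raise MisSpeculation for the corner case."""
    if (state, event) in SPECULATED_AWAY:
        raise MisSpeculation(f"{event} in state {state}")
    return TRANSITIONS[(state, event)]


def run(events, state="M"):
    """Apply a sequence of events starting from `state`."""
    for event in events:
        state = next_state(state, event)
    return state


common = run(["Writeback", "RequestReadWrite"])
try:
    run(["Writeback", "RequestReadWrite", "RequestReadWrite"])
    corner_detected = False
except MisSpeculation:
    corner_detected = True
```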

14
Outline
  • A Framework for Speculation
  • Simplifying Cache Coherence Protocols
  • Simplifying the Interconnection Network
  • Deadlock
  • Avoiding deadlock
  • Evaluation
  • Conclusions

15
Two Causes of Deadlock
[Figure: Endpoint Deadlock: P1 and P2, each labeled "full of requests", each with a Response]
[Figure: Switch Deadlock: switch1 and switch2, each labeled "full of messages", with Message M1 and Message M2]

16
Avoiding Deadlock
  • Simple but wasteful solution: full buffering
  • But it's rare that we ever need full buffering
  • More efficient solution: virtual channels
    (networks)
  • For endpoint deadlock
  • Need a virtual network per type of message
  • For switch deadlock
  • Need some number of virtual channels per virtual
    network
  • Depends on network topology and routing scheme
  • A major source of design complexity

17
Simplifying Deadlock Avoidance
  • Speculate that deadlock won't occur, despite
    using less than full buffering and no virtual
    channels
  • Why is mis-speculation rare?
  • Can usually avoid deadlock with reasonable
    buffering
  • How do we detect all mis-speculations?
  • Timeout mechanism for cache coherence
    transactions
  • How do we recover?
  • SafetyNet
  • How do we ensure forward progress?
  • Slow-start operation for a while after recovery
  • Guarantees that deadlock can't keep recurring
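
Timeout-based detection and slow-start can be sketched together. Everything here is an assumption made for illustration: the class `TransactionTracker`, its parameters, and in particular the interpretation of slow-start as limiting the node to one outstanding transaction for a fixed number of cycles after recovery. The slides do not specify these details.

```python
class TransactionTracker:
    """Toy per-node tracker for outstanding coherence transactions."""

    def __init__(self, timeout_cycles, slow_start_cycles, normal_limit):
        self.timeout_cycles = timeout_cycles
        self.slow_start_cycles = slow_start_cycles
        self.normal_limit = normal_limit
        self.outstanding = {}  # transaction id -> cycle at which it was issued
        self.slow_until = 0    # cycle at which slow-start ends

    def limit(self, now):
        # Assumed slow-start policy: one transaction at a time after recovery.
        return 1 if now < self.slow_until else self.normal_limit

    def issue(self, txn, now):
        if len(self.outstanding) >= self.limit(now):
            return False
        self.outstanding[txn] = now
        return True

    def complete(self, txn):
        self.outstanding.pop(txn, None)

    def timed_out(self, now):
        # Detection: any transaction outstanding for too long.
        return any(now - t > self.timeout_cycles for t in self.outstanding.values())

    def recover(self, now):
        # Stand-in for a rollback: drop in-flight work and enter slow-start.
        self.outstanding.clear()
        self.slow_until = now + self.slow_start_cycles


t = TransactionTracker(timeout_cycles=1000, slow_start_cycles=5000, normal_limit=4)
t.issue("a", now=0)
t.issue("b", now=1)
early = t.timed_out(now=500)        # nothing has waited long enough yet
late = t.timed_out(now=1001)        # "a" has been outstanding 1001 cycles
t.recover(now=1001)
first = t.issue("a", now=1002)      # allowed: nothing outstanding
second = t.issue("b", now=1003)     # refused: slow-start limit is 1
t.complete("a")
third = t.issue("b", now=1004)      # allowed again once "a" completes
```

With at most one transaction in flight, messages from different transactions cannot fill buffers and block each other, which is the intuition behind using a restricted mode to guarantee forward progress.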

18
Outline
  • A Framework for Speculation
  • Simplifying Cache Coherence Protocols
  • Simplifying the Interconnection Network
  • Evaluation
  • Goals
  • Methodology
  • Results
  • Conclusions

19
Goals
  • Discover the point at which mis-speculation
    recoveries impact performance
  • This determines whether our simplified snooping
    protocol and our simplified interconnection
    network are viable
  • Determine whether our simplified directory
    protocol can usefully speculate on point-to-point
    ordering

20
Methodology
  • Full-system simulation
  • Simics provides full-system functionality
  • We added detailed timing model for memory system
  • Workloads
  • Online transaction processing (OLTP) with DB2
  • SPECjbb2000 Java middleware
  • Apache static web serving
  • Slashcode dynamic web serving
  • Barnes-Hut scientific simulation

21
How Rare Must Mis-speculation Be?
We can tolerate high mis-speculation rates;
these rates are much higher than what our
simplified designs incur

22
Adaptive Routing with Speculative Ordering
Adaptive routing can provide better performance
by routing around congestion, even with
mis-speculations

23
Conclusions
  • Simplify multiprocessor design with speculation
  • Treat corner cases as mis-speculations and recover
    from them
  • Must be able to ensure that
  • Mis-speculations are sufficiently rare
  • Can detect all mis-speculations
  • Can recover from mis-speculations
  • Can provide forward progress in all cases
  • Showed how to simplify
  • Cache coherence protocols
  • Interconnection network deadlock avoidance
  • Applicable to other complicated designs