
1
Adaptive Single-Chip Multiprocessing
  • Dan Gibson
  • degibson@wisc.edu
  • University of Wisconsin-Madison
  • Department of Electrical and Computer Engineering

2
Introduction
  • Moore's Law continues to provide more transistors
  • Devices are getting smaller
  • Devices are getting faster
  • Leads to increases in clock frequency
  • Memories are getting bigger
  • Large memories often require more time to access
  • RC Circuits continue to charge exponentially
  • Long-wire signal propagation time is not
    improving as rapidly as switching speed
  • On-chip communication time is slower relative to
    processor clock speeds

3
The Memory Wall
  • Processors grow faster, memory grows slower
  • Off-chip cache misses can halt even aggressive
    out-of-order processors
  • On-chip cache accesses are becoming long-latency
    events
  • Latency can sometimes be tolerated
  • Caching
  • Prefetching
  • Speculation
  • Out-of-order execution
  • Multithreading

4
The Power Wall
  • More devices, faster clocks ⇒ more power
  • Power supply accounts for lots of pins in chip
    packaging (3,057 of 5,370 pins on the
    POWER5)
  • Heat dissipation increases total cost of
    ownership (34W cooling power
    required to remove 100W of heat)
  • Dynamic power in CMOS: P ≈ α · C_L · V² · f
  • Devices get smaller, faster, and more numerous
  • More capacitance (C_L)
  • Higher frequency (f)
  • Architects can constrain α, C_L, and f
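The dynamic-power relationship above can be sketched numerically. The parameter values below are illustrative assumptions, not figures from the presentation:

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Dynamic (switching) power in watts: P = alpha * C_L * Vdd^2 * f.

    alpha:  activity factor (fraction of capacitance switched per cycle)
    c_load: total switched load capacitance, in farads
    vdd:    supply voltage, in volts
    freq:   clock frequency, in Hz
    """
    return alpha * c_load * vdd ** 2 * freq

# Illustrative (made-up) values: halving frequency halves dynamic power,
# and lowering Vdd helps quadratically.
p_fast = dynamic_power(alpha=0.2, c_load=1e-9, vdd=1.2, freq=3.0e9)
p_slow = dynamic_power(alpha=0.2, c_load=1e-9, vdd=1.0, freq=1.5e9)
```

This is why the slide notes that architects can attack power through α, C_L, and f: voltage scaling typically belongs to circuit designers, but activity, capacitance, and frequency are architectural levers.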

5
Enter Chip Multiprocessors (CMPs)
  • One chip, many processors
  • Multiple cores per chip
  • Often multiple threads per core

Dual-core AMD Opteron die photo (from Microprocessor Report, "Best Servers of 2004")
6
CMPs
  • CMPs can have good performance
  • Explicit thread-level parallelism
  • Related threads experience constructive
    prefetching
  • CMPs can tolerate long-latency events well
  • Many concurrent threads ⇒ long-latency memory
    accesses can be overlapped
  • CMPs can be power-efficient
  • Enables use of simpler cores
  • Distributes hot spots

7
CMPs
  • CMPs are very specialized
  • Assumes (highly) threaded workload
  • Parallel machines are difficult to use
  • Parallel programming is not (yet) commonplace
  • Many problems similar to traditional
    multiprocessors
  • Cache coherence
  • Memory consistency
  • Many new opportunities
  • Cache sharing
  • More integration

8
Adaptive CMPs
  • To combat specialization, adapt a CMP dynamically
    to its current workload and system
  • Adapt caching policy (Beckmann et al., Chang
    et al., and more)
  • Adapt cache structure (Alameldeen et al., and
    more)
  • Adapt thread scheduling (Kihm et al., in the
    SMT space)
  • Current idea
  • Adaptive thread scheduling from the space of
    un-stalled and stalled threads
  • A union of single-core multithreading and
    runahead execution in the context of CMPs

9
Single-Core Multithreading
  • Allow multiple (HW) threads within the same
    execution pipeline
  • Shares processor resources: FUs, decode, ROB,
    etc.
  • Shares local memory resources: L1 caches, LSQ,
    etc.
  • Can increase processor and memory utilization

Sun's Niagara pipeline block diagram (Kongetira
et al.)
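The fine-grained interleaving described above can be sketched as a round-robin issue loop. The thread representation here is a hypothetical simplification for illustration, not Niagara's actual thread-select logic:

```python
from collections import deque

def interleave(threads):
    """Fine-grained multithreading sketch: each cycle, issue one
    instruction from the next hardware thread that still has work,
    rotating round-robin among threads (assumed simplified model).

    Each thread is a dict: {"id": ..., "instrs": [...]}.
    Returns the issue schedule as (thread_id, instruction) pairs.
    """
    ready = deque(threads)
    schedule = []
    while ready:
        t = ready.popleft()
        if t["instrs"]:
            schedule.append((t["id"], t["instrs"].pop(0)))
            ready.append(t)  # thread goes to the back of the rotation
    return schedule
```

With two threads, one holding two instructions and one holding one, the issue order alternates A, B, A: long-latency stalls in one thread leave slots the other threads can fill, which is the utilization argument made above.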
10
Runahead Execution
  • Continue execution in the face of a cache miss
  • Checkpoint architectural state
  • Continue execution speculatively
  • Convert memory accesses to prefetches
  • Runahead prefetches can be highly accurate, and
    can greatly improve cache performance (Mutlu
    et al.)
  • It is possible to issue useless prefetches
  • Can be power-inefficient (Mutlu et al.)
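The checkpoint/speculate/prefetch/restore sequence above can be sketched as a toy state machine. The class, its fields, and the simplified memory model are assumptions for illustration, not the mechanism from the cited work:

```python
import copy

class RunaheadCore:
    """Minimal sketch of runahead execution (assumed, simplified model).

    On a long-latency cache miss the core checkpoints architectural
    state, keeps executing speculatively so that later memory accesses
    become prefetches, then restores the checkpoint when the miss
    resolves and discards all speculative results.
    """
    def __init__(self):
        self.regs = {}           # architectural register state
        self.checkpoint = None   # saved state for rollback
        self.runahead = False
        self.prefetches = []     # addresses touched while in runahead

    def on_cache_miss(self):
        self.checkpoint = copy.deepcopy(self.regs)  # checkpoint state
        self.runahead = True                        # enter runahead mode

    def load(self, addr):
        if self.runahead:
            self.prefetches.append(addr)  # convert access to prefetch
            return 0                      # bogus value; result is discarded
        return 0  # placeholder for a real cache access

    def on_miss_complete(self):
        self.regs = self.checkpoint      # discard speculative state
        self.checkpoint = None
        self.runahead = False
```

The useless-prefetch and power concerns noted above show up directly in this sketch: every instruction executed between `on_cache_miss` and `on_miss_complete` burns energy, and only the addresses left in `prefetches` can pay it back.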

11
Runahead/Multithreaded Core Interaction
  • Similar Hardware Requirements
  • Additional register files
  • Additional LSQ entries
  • Competition for Similar Resources
  • Execution time (Processor pipeline, Functional
    units, etc)
  • Memory bandwidth
  • TLB Entries, cache space, etc.

12
Runahead/Multithreaded Core Interaction
  • A multithreaded core in a CMP, with runahead,
    must make difficult scheduling decisions
  • Thread scheduling considerations
  • Which thread should run?
  • Should the thread use runahead?
  • How long should the thread run/runahead?
  • Scheduling implications
  • Is an idle thread making forward progress at the
    expense of a useful thread?
  • Is a thread spinning on a lock held by another
    thread?
  • Is runahead effective for a given thread?
  • Is a given thread causing performance problems
    elsewhere in the CMP?

13
Proposed Mechanism
  • Track per-thread state on
  • Runahead prefetching accuracy
  • High accuracy favors allowing thread to runahead
  • HW-assigned thread priority
  • Highly useful threads are preferred
  • Selection criteria
  • Heuristic-guided
  • Select the best priority/accuracy pair
  • Probabilistically-guided
  • Select a thread with likelihood proportional to
    its priority/accuracy
  • Useful-first
  • Select non-runahead threads first, then select
    runahead threads
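The three selection policies above can be sketched as follows. The per-thread record fields and the priority-times-accuracy score are assumptions made for illustration; the presentation does not specify the exact scoring:

```python
import random

# Each hardware thread is tracked as a dict with hypothetical fields:
#   priority:    HW-assigned usefulness (higher is better)
#   accuracy:    measured runahead prefetching accuracy in [0, 1]
#   is_runahead: whether the thread is currently in runahead mode

def select_heuristic(threads):
    # Heuristic-guided: pick the best priority/accuracy pair
    # (scored here, as an assumption, by their product).
    return max(threads, key=lambda t: t["priority"] * t["accuracy"])

def select_probabilistic(threads, rng=random):
    # Probabilistically-guided: likelihood proportional to the
    # thread's priority/accuracy score.
    weights = [t["priority"] * t["accuracy"] for t in threads]
    return rng.choices(threads, weights=weights, k=1)[0]

def select_useful_first(threads):
    # Useful-first: non-runahead threads first; fall back to
    # runahead threads only when no normal thread is available.
    normal = [t for t in threads if not t["is_runahead"]]
    pool = normal if normal else threads
    return max(pool, key=lambda t: t["priority"])
```

The policies trade off differently: the heuristic is greedy and can starve low-scoring threads, the probabilistic policy gives every thread some chance, and useful-first guarantees that speculation never displaces real work.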

14
Future Directions
  • Dynamically Adaptable CMPs offer several future
    areas of research
  • Adapt for power savings / heat dissipation
  • Computation relocation, load balancing, automatic
    low-power modes, etc.
  • Adapt to error conditions
  • Dynamically allocate backup threads
  • Automatically relocate threads to improve
    resource sharing
  • Combined HW/SW/VM approach

15
Summary
  • Latency now dominates off-chip communication
  • On-chip communication isn't far behind
  • Many techniques to tolerate latency, including
    multithreading
  • CMPs provide new challenges and opportunities to
    computer architects
  • Latency tolerance
  • Potential for power savings
  • Can adapt a CMP's behavior to its workload
  • Dynamic management of shared resources