Multicore: Panic or Panacea?

Mikko H. Lipasti, Associate Professor, Electrical and Computer Engineering, University of Wisconsin-Madison. http://www.ece.wisc.edu/~pharm


1
Multicore: Panic or Panacea?
  • Mikko H. Lipasti
  • Associate Professor
  • Electrical and Computer Engineering
  • University of Wisconsin-Madison

http://www.ece.wisc.edu/~pharm
2
Multicore Mania
  • First, servers
  • IBM Power4, 2001
  • Then desktops
  • AMD Athlon X2, 2005
  • Then laptops
  • Intel Core Duo, 2006
  • Soon, your cellphone
  • ARM MPCore, prototypes for a while now

3
What is behind this trend?
  • Moore's Law
  • Chip power consumption
  • Single-thread performance trend
  • Source: Intel

4
Dynamic Power
  • Static CMOS: current flows when active
  • Combinational logic evaluates new inputs
  • Flip-flop, latch captures new value (clock edge)
  • Terms
  • C: capacitance of circuit
  • wire length, number and size of transistors
  • V: supply voltage
  • A: activity factor
  • f: frequency
  • Future: fundamentally power-constrained
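
The four terms combine in the standard CMOS dynamic-power relation, P = A · C · V² · f. The sketch below is only an illustration of that relation: the capacitance, voltage, activity, and frequency values are assumptions chosen for the example, not figures from the talk.

```python
def dynamic_power(c_farads, v_volts, activity, f_hertz):
    """Dynamic power of switching CMOS logic: P = A * C * V^2 * f (watts)."""
    return activity * c_farads * v_volts ** 2 * f_hertz

# Hypothetical operating point (values chosen only for illustration).
base = dynamic_power(c_farads=1e-9, v_volts=1.2, activity=0.1, f_hertz=3e9)

# Power is linear in frequency but quadratic in supply voltage.
half_f = dynamic_power(c_farads=1e-9, v_volts=1.2, activity=0.1, f_hertz=1.5e9)
half_v = dynamic_power(c_farads=1e-9, v_volts=0.6, activity=0.1, f_hertz=3e9)

print(f"base power: {base:.3f} W")
print(f"f / 2: {half_f / base:.2f} of base power")
print(f"V / 2: {half_v / base:.2f} of base power")
```

Because voltage enters squared, lowering V is the strongest lever in this relation, which is one reason a power-constrained design tends toward slower cores.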

5
Easy answer: Multicore
                   Single Core   Dual Core   Quad Core
Core area          A             A/2         A/4
Core power         W             W/2         W/4
Chip power         W + O         W + O       W + O
Core performance   P             0.9P        0.8P
Chip performance   P             1.8P        3.2P
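
The chip-level rows follow from the per-core rows by multiplying by the core count. A small sketch of that arithmetic, in units of W and P, with the assumption (implied by the table) that every core is kept fully busy:

```python
# Per-core figures from the table, relative to one big core (power W, performance P).
configs = {
    "single": {"cores": 1, "core_power": 1.0, "core_perf": 1.0},
    "dual": {"cores": 2, "core_power": 0.5, "core_perf": 0.9},
    "quad": {"cores": 4, "core_power": 0.25, "core_perf": 0.8},
}

def chip_totals(cfg):
    """Return (total core power in units of W, peak chip throughput in units of P)."""
    power = cfg["cores"] * cfg["core_power"]
    perf = cfg["cores"] * cfg["core_perf"]
    return power, perf

for name, cfg in configs.items():
    power, perf = chip_totals(cfg)
    print(f"{name:6s}: core power {power:.1f} W, chip performance {perf:.1f} P")
```

Total core power stays at W in all three configurations, while peak throughput rises to 3.2P. That peak is only reached when the workload keeps every core busy.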
6
Amdahl's Law
[Figure: number of CPUs (1 to n) versus time, with segments labeled f and 1-f]
  • f: fraction that can run in parallel
  • 1-f: fraction that must run serially
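
With these definitions, Amdahl's Law gives a speedup of 1 / ((1 - f) + f / n) on n CPUs. A minimal sketch (the function name is chosen for this example):

```python
def amdahl_speedup(f, n):
    """Speedup on n CPUs when a fraction f of the work can run in parallel."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)

for n in (1, 2, 4, 8, 16):
    print(f"f=0.9, n={n:2d}: speedup {amdahl_speedup(0.9, n):.2f}x")
```

Even with 90% of the work parallel, the speedup can never exceed 1 / (1 - f) = 10x, however many CPUs are added.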

7
Fixed Chip Power Budget
  • Amdahl's Law
  • Ignores (power) cost of n cores
  • Revised Amdahl's Law
  • More cores → each core is slower
  • Parallel speedup < n
  • Serial portion (1-f) takes longer
  • Also, interconnect and scaling overhead
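
The revised formula itself did not survive in this transcript, so the sketch below is one plausible reading rather than the speaker's equation. It assumes both the serial and the parallel part run on cores that deliver only a fraction s of a single big core's speed, and it borrows s = 1.0, 0.9, 0.8 for 1, 2, and 4 cores from the "Easy answer" table. Interconnect and scaling overhead are ignored.

```python
# Per-core performance relative to one big core (1, 2, 4 cores),
# taken from the earlier single/dual/quad core table.
CORE_PERF = {1: 1.0, 2: 0.9, 4: 0.8}

def fixed_power_speedup(f, n, core_perf=CORE_PERF):
    """Speedup over one big core when n slower cores share a fixed power budget.

    Assumed model: the serial part (1 - f) and the parallel part f both run
    on cores that deliver only core_perf[n] of the big core's speed.
    """
    s = core_perf[n]
    return s / ((1.0 - f) + f / n)

for f in (0.9, 0.5, 0.2):
    row = ", ".join(f"n={n}: {fixed_power_speedup(f, n):.2f}x" for n in CORE_PERF)
    print(f"f={f}: {row}")
```

Under these assumptions a mostly serial program (f = 0.2) runs slower on four cores than on one big core, because the serial portion pays the full per-core slowdown.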

8
Fixed Power Scaling
  • Fixed power budget forces slow cores
  • Serial code quickly dominates
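
One way to see how quickly serial code dominates is to compute the share of run time spent in the serial part as the CPU count grows. The sketch assumes every core is slowed by the same factor, so that factor cancels out of the ratio; f = 0.9 is an illustrative value, not one from the talk.

```python
def serial_share(f, n):
    """Fraction of total run time spent in the serial part on n CPUs."""
    serial_time = 1.0 - f
    parallel_time = f / n
    return serial_time / (serial_time + parallel_time)

for n in (1, 2, 4, 16, 64):
    print(f"n={n:2d}: serial code is {serial_share(0.9, n):.0%} of run time")
```

With f = 0.9, the serial 10% of the work grows to roughly two thirds of the run time at 16 CPUs.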

9
Predictions and Challenges
  • Parallel scaling limits many-core
  • >4 cores only for well-behaved programs
  • Optimistic about new applications
  • Interconnect overhead
  • Single-thread performance
  • Will degrade unless we innovate
  • Parallel programming
  • Express/extract parallelism in new ways
  • Retrain programming workforce

10
Research Agenda
  • Programming for parallelism
  • Sources of parallelism
  • New applications, tools, and approaches
  • Single-thread performance and power
  • Most attractive to programmer/user
  • Chip multiprocessor overheads
  • Interconnect, caches, coherence, fairness

11
Finding Parallelism
  • Functional parallelism
  • Car: engine, brakes, entertainment, navigation, ...
  • Game: physics, logic, UI, render, ...
  • Automatic extraction [UW Multiscalar]
  • Decompose serial programs
  • Data parallelism
  • Vector, matrix, db table, pixels, ...
  • Request parallelism
  • Web, shared database, telephony, ...
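
As a small illustration of data parallelism, the sketch below applies the same operation to independent rows of pixels. The image data and function names are hypothetical. It uses Python's standard `concurrent.futures` thread pool only to show the structure; CPython threads would not actually speed up CPU-bound work like this.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel_row, delta=10):
    """Brighten one row of 8-bit pixels; rows are independent of each other."""
    return [min(255, p + delta) for p in pixel_row]

def brighten_image(image, workers=4):
    """Apply the same operation to every row, one task per row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, image))

image = [[0, 100, 250], [5, 128, 255]]
result = brighten_image(image)
print(result)
```

No row depends on another, so the work divides evenly and needs no coordination, which is what makes this class of parallelism comparatively easy to scale.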

12
Balancing Work
  • Amdahl's parallel phase f: all cores busy
  • If not perfectly balanced
  • (1-f) term grows (f not fully parallel)
  • Performance scaling suffers
  • Manageable for data & request parallel apps
  • Very difficult problem for other two
  • Functional parallelism
  • Automatically extracted
  • Scale power to mismatch [Multiscalar]
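
The cost of imbalance can be sketched by noting that the parallel phase ends only when the most heavily loaded core finishes. The work figures below are made-up units for illustration, and the model assumes identical cores with no other overhead.

```python
def speedup_with_imbalance(serial_work, work_per_core):
    """Speedup over one core when the parallel phase waits for the slowest core."""
    total_work = serial_work + sum(work_per_core)
    elapsed = serial_work + max(work_per_core)
    return total_work / elapsed

# 100 units of work in total, 10 of them serial, spread over 4 cores.
balanced = speedup_with_imbalance(10, [22.5, 22.5, 22.5, 22.5])
imbalanced = speedup_with_imbalance(10, [40, 20, 15, 15])

print(f"balanced:   {balanced:.2f}x")
print(f"imbalanced: {imbalanced:.2f}x")
```

In the imbalanced case three cores sit idle while the fourth finishes, so part of the nominally parallel work behaves like serial work.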

13
Coordinating Work
  • Synchronization
  • Some data somewhere is shared
  • Coordinate/order updates and reads
  • Otherwise → chaos
  • Traditionally: locks and mutual exclusion
  • Hard to get right, even harder to tune for performance
  • Research: Transactional Memory [UW Multifacet]
  • Programmer: declare potential conflict
  • Hardware and/or software: speculate & check
  • Commit or roll back and retry
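
The speculate, check, and commit-or-retry cycle can be imitated in plain software with a version number. This is a toy optimistic-concurrency sketch, not transactional memory as built by any research group named here; `VersionedCell` and `atomic_update` are hypothetical names for this example.

```python
import threading

class VersionedCell:
    """A shared value plus a version; a commit succeeds only if the version is unchanged."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._lock = threading.Lock()  # guards only the short read and commit steps

    def read(self):
        with self._lock:
            return self._value, self._version

    def try_commit(self, new_value, expected_version):
        with self._lock:
            if self._version != expected_version:
                return False  # conflict: another thread committed first
            self._value = new_value
            self._version += 1
            return True

def atomic_update(cell, fn):
    """Speculate on a snapshot, check for conflicts, then commit or retry."""
    while True:
        value, version = cell.read()   # speculative read
        new_value = fn(value)          # compute without holding the lock
        if cell.try_commit(new_value, version):
            return new_value
        # Conflict detected: discard new_value (roll back) and retry.

def worker(cell, count):
    for _ in range(count):
        atomic_update(cell, lambda v: v + 1)

cell = VersionedCell(0)
threads = [threading.Thread(target=worker, args=(cell, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_value, final_version = cell.read()
print(final_value)
```

Every successful commit adds exactly one increment, so the final value equals the number of updates regardless of how the threads interleave.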

14
Single-thread Performance
  • Still most attractive source of performance
  • Speeds up parallel and serial phases
  • Can use it to buy back power
  • Must focus on power consumption
  • Performance benefit vs. power cost

15
Single-thread Performance
  • Hardware accelerators and circuits
  • Domain-specific [UW MESA]
  • Reconfigurable [UW Compton]
  • VLSI and design automation [UW WISCAD, Kursun]
  • Increasing frequency
  • Seems prohibitive: clock power
  • Clever clocking schemes can help [UW Pharm]
  • Increasing instruction-level parallelism
  • [UW Multiscalar, UW Pharm, UW Smith]
  • Without blowing power budget
  • Alternatively, reduce power for same performance

16
Chip Multiprocessor Overheads
  • Core interconnect [UW Pharm]
  • 80% of chip power [Borkar, ISLPED '07 panel]
  • Need fundamentally different approach
  • Revisit circuit switching
  • Cache coherence [UW Multifacet, Pharm]
  • Match workload behavior
  • Optimize for on-chip communication

17
Chip Multiprocessor Overheads
  • Shared caches [UW Multifacet, Multiscalar, Smith]
  • On-chip memory can be shared
  • Optimize replacement, replication
  • Fairness [UW Smith]
  • Maintain performance isolation
  • Share resources fairly (memory, caches)

18
Research Groups @ UW
Group         Faculty                  URL
Compton       Kati Compton             www.ece.wisc.edu/kati
Kursun        Volkan Kursun            www.cae.wisc.edu/kursun
MESA          Mike Schulte             mesa.ece.wisc.edu
Multifacet    Mark Hill, David Wood    http://www.cs.wisc.edu/multifacet
Multiscalar   Guri Sohi                www.cs.wisc.edu/mscalar
PHARM         Mikko Lipasti            www.ece.wisc.edu/~pharm
Smith         James Smith              www.engr.wisc.edu/ece/faculty/smith_james.html
Vertical      Karu Sankaralingam       www.cs.wisc.edu/vertical/wiki
WISCAD        Azadeh Davoodi           www.cae.wisc.edu/adavoodi
19
Conclusion
  • Forecast
  • Limited multicore (4) is here to stay
  • Manycore (>4) will find its place
  • Hardware Challenges
  • Single-thread performance and power
  • Multicore overhead
  • Software Challenges
  • Finding application parallelism
  • Creating correct parallel programs
  • Creating scalable parallel programs

20
Questions?
  • http://www.ece.wisc.edu/~pharm