Ioana Burcea - PowerPoint PPT Presentation

Slides: 16
Provided by: johngratza

Transcript and Presenter's Notes


1
Initial Observations of the Simultaneous
Multithreading Pentium 4 Processor
Nathan Tuck and Dean M. Tullsen
  • Ioana Burcea

2
Agenda
  • SMT proposed in research
  • Intel Hyper-threading
  • Methodology
  • Benchmarks and experiments
  • Experimental Results
  • Questions?

3
SMT in Research
  • Up to 8 contexts (8-way SMT)
  • ICOUNT.2.8 fetch policy

4
Intel Hyper-threading
  • SMT in real silicon: Intel Pentium 4
  • Single vs. multithreaded mode

5
Methodology
  • Pentium 4, 2.5 GHz, 512 MB DRAM
  • RedHat 7.3, Linux 2.4.28smp
  • Linux treats the system as a dual-processor machine
  • It has a separate run queue for each virtual
    processor
  • Benchmarks: SPEC CPU2000, NAS parallel benchmarks,
    SPLASH2 (modified input)

6
Speedup for Heterogeneous Workloads
  • T_SMT = total_execution_time / number of runs
  • Speedup = T_seq / T_SMT
  • Speedup per combination = S_bench_1 + S_bench_2
  • At least 12 total jobs
  • At least 3 runs for each job
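
The speedup definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and all timing values are hypothetical, and it assumes that the per-combination speedup is the sum of the two per-benchmark speedups.

```python
def average_time(total_execution_time, number_of_runs):
    """T_SMT: average time of one run while sharing the processor."""
    return total_execution_time / number_of_runs


def speedup(t_seq, t_smt):
    """Per-benchmark speedup: stand-alone time over co-scheduled time."""
    return t_seq / t_smt


def combined_speedup(t_seq_1, t_smt_1, t_seq_2, t_smt_2):
    """Assumed rule: speedup of a pair is the sum of both speedups."""
    return speedup(t_seq_1, t_smt_1) + speedup(t_seq_2, t_smt_2)


# Hypothetical timings in seconds (not measurements from the talk).
t_smt_a = average_time(total_execution_time=500.0, number_of_runs=4)
t_smt_b = average_time(total_execution_time=600.0, number_of_runs=3)
pair_speedup = combined_speedup(100.0, t_smt_a, 100.0, t_smt_b)
print(pair_speedup)
```

Under this rule, a combined value of 1.0 means that running the pair together is no faster than running the two jobs back to back; values above 1.0 indicate a gain from multithreading.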

7
Static Partitioning of Resources
  • SPECINT: 83% on average
  • SPECFP: 85% on average
  • eon: 71%
  • wupwise: 72%
  • mcf: 93%
  • art: 97%
  • swim: 98%

8
Independent Threads

9
Parallel Multithreaded Speedup
  • NAS
  • SPLASH

10
Synchronization and Communication Speed
  • Reading a value protected by a lock:
    37 million times per second (68 cycles per lock read)
  • Updating a value protected by a lock:
    14.6 million times per second (171 cycles per lock update)
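
The cycle counts follow from the operation rates and the 2.5 GHz clock listed on the Methodology slide. A minimal check, with a hypothetical helper name:

```python
def cycles_per_operation(clock_hz, ops_per_second):
    """Clock cycles spent per operation at a given sustained rate."""
    return clock_hz / ops_per_second


CLOCK_HZ = 2.5e9  # 2.5 GHz Pentium 4, from the Methodology slide

read_cycles = cycles_per_operation(CLOCK_HZ, 37e6)      # lock-protected read
update_cycles = cycles_per_operation(CLOCK_HZ, 14.6e6)  # lock-protected update

print(round(read_cycles), round(update_cycles))  # 68 171
```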

11
Synchronization and Communication Speed (contd)
  • Loop:
    • result = independent computation
    • computation that uses result (flow dependence)
  • Independent computation: a loop that contains
    • a load
    • a float multiply
    • a float add

12
Synchronization and Communication Speed (contd)

13
Heterogeneous vs. Homogeneous Workloads
  • Two copies of the same SPEC benchmark
  • Average speedup 1.11 < 1.20
  • Integer vs. integer 1.17
  • Float vs. float 1.20
  • Integer vs. float 1.21

14
Compiler Interaction
  • Baseline?

15
Questions?
  • Is resource partitioning a good approach?
  • IBM's POWER5 implementation?
  • Other implementations?