Multiprogramming Performance of the Pentium 4 with Hyper-Threading


1
Multiprogramming Performance of the Pentium 4
with Hyper-Threading
  • James Bulpin and Ian Pratt
  • University of Cambridge
  • Computer Laboratory

2
Introduction
  • Simultaneous Multithreading (SMT)
  • Fine-grain hardware multithreading
  • Processor resources dynamically shared
  • Extracts thread-level parallelism
  • Intel Hyper-Threading (HT)
  • 2 heavyweight threads
  • Some static partitioning, dynamic core with some
    fairness mechanisms
  • Shared caches, free-for-all

3
Motivation to Measure
  • HT abstracted as independent processors
  • Processes can have mutual detrimental effect
  • Can improve performance by scheduling processes
    that perform well together
  • More important in multi-package/core systems
  • Need to know circumstances where performance lost

4
Duplicating and Deconstructing
  • "Initial Observations of the Simultaneous
    Multithreading Pentium 4 Processor", N. Tuck and
    D. M. Tullsen, PACT 2003
  • Various multithreaded and multiprogrammed
    workloads
  • We have slightly different hardware and compiler
  • We go further on multiprogrammed
  • Bias in performance between paired processes
  • More analysis of performance counter data
  • Compare Hyper-Threading to SMP

5
Experimental Method
  • Aim to measure mutual performance effect of a set
    of compute-intensive workloads
  • Dual-package 2.4 GHz HT Xeon, Linux 2.4.19
  • SPEC CPU2000 Benchmarks (INT and FP)
  • Measure standalone run times
  • Measure run times for the cross product of
    simultaneously executing pairs
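
The cross product of benchmark pairs can be sketched as below. This is only an illustration: the four benchmark names are a subset taken from the slides, the function name `pair_schedule` is hypothetical, and including self-pairs (a benchmark run against a copy of itself) is an assumption rather than something the slides state.

```python
from itertools import product

# Illustrative subset of the SPEC CPU2000 benchmarks named in the slides.
BENCHMARKS = ["swim", "mgrid", "mcf", "mesa"]


def pair_schedule(benchmarks):
    """Return every ordered (subject, partner) pair.

    Self-pairs are included here; that is an assumption of this sketch.
    """
    return list(product(benchmarks, repeat=2))


pairs = pair_schedule(BENCHMARKS)
print(f"{len(pairs)} paired runs to schedule")
for subject, partner in pairs[:3]:
    print(f"measure {subject} while {partner} runs on the sibling thread")
```

Ordered pairs are used so that each benchmark's run time is recorded separately against every partner, which is what allows a bias between the two processes in a pair to be observed.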

6
Hyper-Threading Results (1)
  • Performance for an individual benchmark
    • Standalone runtime / runtime in HT pair
  • System speedup
    • Sum of both performance figures; 1.0 is
      equivalent to serial execution
  • System speedups
    • Lowest: 0.86 (swim vs. mgrid)
    • Highest: 1.50 (mcf vs. mesa)
    • Mean: 1.20
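
The two metrics defined on this slide can be sketched as follows. The function names and the timings are hypothetical and chosen only to show the arithmetic; they are not measurements from the study.

```python
def performance(standalone_time, paired_time):
    """Per-benchmark performance: standalone runtime / runtime in an HT pair."""
    return standalone_time / paired_time


def system_speedup(standalone_a, paired_a, standalone_b, paired_b):
    """System speedup: the sum of both benchmarks' performance figures."""
    return (performance(standalone_a, paired_a)
            + performance(standalone_b, paired_b))


# Hypothetical timings in seconds. Each benchmark takes exactly twice as long
# when paired, so each scores 0.5 and the pair is no better than serial runs.
serial_equivalent = system_speedup(100.0, 200.0, 80.0, 160.0)

# Hypothetical pair where each benchmark slows down by only 25%.
good_pair = system_speedup(100.0, 125.0, 80.0, 100.0)

print(serial_equivalent, good_pair)
```

A system speedup below 1.0, such as the 0.86 reported for swim vs. mgrid, means the pair finished less work per unit time than running the two benchmarks one after the other.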

7
[Figure: multiprogrammed speedup (y-axis, 0.9 to 1.6) per benchmark: art, vpr, gcc, eon, mcf, gap, gzip, apsi, twolf, bzip2, swim, applu, mesa, crafty, mgrid, ammp, parser, vortex, equake, sixtrack, perlbmk, wupwise]
8
Hyper-Threading Results (2)
9
Hyper-Threading vs. SMP
  • Mean speedups: HT 20%, SMP 77%
  • SMP is a 48% improvement on HT
  • (HT is worth 65% of SMP)
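
The relationship between the first two figures can be checked with a short calculation. This sketch assumes the figures are percentage gains over serial execution, so that 20 and 77 correspond to mean system speedups of 1.20 and 1.77; the constant and function names are hypothetical.

```python
HT_MEAN_SPEEDUP = 1.20   # assumed: 20% mean gain over serial execution
SMP_MEAN_SPEEDUP = 1.77  # assumed: 77% mean gain over serial execution


def relative_improvement(new, base):
    """Fractional improvement of `new` over `base`."""
    return new / base - 1.0


smp_over_ht = relative_improvement(SMP_MEAN_SPEEDUP, HT_MEAN_SPEEDUP)
ht_fraction_of_smp = HT_MEAN_SPEEDUP / SMP_MEAN_SPEEDUP

print(f"SMP improvement on HT: {smp_over_ht:.1%}")
print(f"HT throughput as a fraction of SMP: {ht_fraction_of_smp:.1%}")
```

The rounded means give an SMP improvement of roughly 48%, in line with the slide. They do not exactly reproduce the 65 figure, which may have been derived from the unrounded or per-pair data.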

10
Conclusions
  • Individual benchmark in a pair up to 40% faster
    than sequential execution
  • High variance, lots of different behaviour
  • Biased performance gains/losses
  • Cache contention plays big part
  • High L2 miss rates hurt the other threads
  • High L1 miss rates benefit from Hyper-Threading

11
(No Transcript)
12
Hyper-Threading Results (1)
13
Best HT System Throughput
14
Worst HT System Throughput
15
SMP Result Matrix