Measuring and Modeling Hyperthreaded Processor Performance

Provided by: ethanb8

Learn more at: https://www.cs.umb.edu

1
Measuring and Modeling Hyper-threaded Processor
Performance

Ethan Bolker
UMass-Boston
September 17, 2003

2

Joint work with Yiping Ding, Arjun Kumar (BMC
Software)
Accepted for presentation at CMG32, December 2003
Paper (with references) available on request

3
Improving Processor Performance

Speed up clock
Invent revolutionary new architecture
Replicate processors (parallel application)
Remove bottlenecks (use idle ALU)
caches
pipelining
prefetch

4
Hyper-threading Technology (HTT): default for new
Intel high-end chips

One ALU
Duplicate state of computation (registers) to
create two logical processors (relative chip
size 1.05)
Parallel instruction preparation (decode)
ALU should see ready work more often
(provided there are two active threads)

5
The path to instruction execution
Intel Technology Journal, Volume 06 Issue 01,
February 14, 2002, p8
6
How little must we understand?

Treat processor as a black box
Experiment to observe behavior
Model to predict behavior

Batch workload: repeated dispatch of identical
compute-intensive jobs
vary number of threads
measure throughput (jobs/second)

7
Batch throughput
8
Transaction processing

More interesting than batch
Random size jobs arrive at random times
M/M/1
M = Markov
M//: arrival stream is Poisson, rate $\lambda$
/M/: job size exponentially distributed, mean $s$
//1: single processor

9
M/M/1 model evaluation

Utilization $U = \lambda s$
$U$ is dimensionless: jobs/sec × sec/job
$U < 1$, else saturation
Response time $r = s/(1-U)$
Randomness means each job sees a (virtual) processor
slowed down (by other jobs) by a factor $1/(1-U)$, so
accumulating $s$ seconds of real work takes
$r = s/(1-U)$ seconds of real time
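
A minimal sketch of the M/M/1 evaluation on this slide, written in Python for illustration. The function name `mm1_response_time` and its argument names are hypothetical; only the formulas $U = \lambda s$ and $r = s/(1-U)$ come from the slide.

```python
def mm1_response_time(arrival_rate, mean_service):
    """Mean response time r = s / (1 - U) for an M/M/1 queue.

    arrival_rate is lambda (jobs/sec); mean_service is s (sec/job).
    """
    utilization = arrival_rate * mean_service  # U = lambda * s, dimensionless
    if utilization >= 1.0:
        # The slide's condition: U must be below 1, else saturation.
        raise ValueError("U >= 1: the queue is saturated")
    return mean_service / (1.0 - utilization)


if __name__ == "__main__":
    s = 1.0  # mean service time in seconds
    for lam in (0.1, 0.5, 0.9):
        print(f"lambda={lam:.1f}  U={lam * s:.1f}  r={mm1_response_time(lam, s):.2f}")
```

With $s$ fixed at 1 second, the loop shows how sharply $r$ grows as $U$ approaches 1.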

10
Benchmark

Java driver
chooses interarrival times and service times from
exponential distributions,
dispatches each job in its own thread,
records actual job CPU usage, response time
Input parameters
job arrival rate $\lambda$
mean job service time $s$
Fix $s = 1$ second, vary $\lambda$ (hence $U$), track $r$
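
The experiment described above can be imitated in simulation. The sketch below is not the Java driver from the talk: it is a Python stand-in that draws exponential interarrival and service times and, as an assumption, serves jobs first-come-first-served on one idealized server (the real driver ran each job in its own thread on real hardware). The name `simulate_mm1` and its parameters are hypothetical.

```python
import random


def simulate_mm1(arrival_rate, mean_service, n_jobs, seed=0):
    """Mean response time over n_jobs in a simulated FCFS M/M/1 queue."""
    rng = random.Random(seed)
    wait = 0.0             # queueing delay seen by the current job
    total_response = 0.0
    for _ in range(n_jobs):
        # expovariate takes a rate, so pass 1/mean for the service time.
        service = rng.expovariate(1.0 / mean_service)
        total_response += wait + service
        interarrival = rng.expovariate(arrival_rate)
        # Lindley recursion: delay of the next job to arrive.
        wait = max(0.0, wait + service - interarrival)
    return total_response / n_jobs


if __name__ == "__main__":
    s = 1.0  # fix s = 1 second, vary lambda
    for lam in (0.2, 0.5, 0.8):
        measured = simulate_mm1(lam, s, n_jobs=200_000)
        predicted = s / (1.0 - lam * s)
        print(f"lambda={lam:.1f}  simulated r={measured:.3f}  model r={predicted:.3f}")
```

An idealized simulation like this should track $r = s/(1-U)$ closely, which makes it a useful baseline when measurements on real hardware do not.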

11
Benchmark validation
12
Theory vs practice

"In theory, there is no difference between theory
and practice. In practice, there is no
relationship between theory and practice."
(Grant Gainey)
"The gap between theory and practice in practice
is much larger than the gap between theory and
practice in theory." (Jeff Case)

13
Explain/remove discrepancy

Examine, tune benchmark driver
Compute actual coefficients of variation,
incorporate in corrected M/M/1 formula
Nothing helps
Postpone worry in the meanwhile

14
HTT on vs HTT off

Use this benchmark to measure the effect of
hyper-threading on response time
Use throughput ($\lambda$) as the independent variable
Utilization is ambiguous (digression)

15
HTT on vs HTT off
16
What's happening

Hyper-threading allows more of the application
parallelism to make its way to the ALU
Can we understand this quantitatively?

17
Model HTT architecture
18
Theory vs practice
s1 = 0.13, s2 = 0.81
19
Model parameters

To compute response time $r$ from the model, need
(virtual) service parameters s1, s2 ($\lambda$ is
known)
Finding s1, s2
eyeball measured data
fit two data points
maximum likelihood
derive from first principles
s1 = 0.13, s2 = 0.81 make sense
15% of work is preparatory, 85%
execution
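
One way to read the preparatory/execution split is as each stage's share of the total service time s1 + s2. That interpretation is an assumption; the slides do not say how the percentages were obtained. Under it, the arithmetic is:

```latex
\[
\frac{s_1}{s_1 + s_2} = \frac{0.13}{0.94} \approx 0.14,
\qquad
\frac{s_2}{s_1 + s_2} = \frac{0.81}{0.94} \approx 0.86
\]
```

These shares are close to the rounded figures quoted on the slide.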

20
Benchmark validation (reprise)

Chip hardware unchanged when HTT off
Assume one path used
Tandem queue
Parameter estimation as before

21
Theory vs practice

s1 = 0.045, s2 = 0.878
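
A hedged sketch of how a tandem-queue prediction could be computed for the HTT-off case. The slides do not give the formula; this assumes two exponential single-server stages in series fed by a Poisson stream, in which case the standard product-form result lets each stage be treated as its own M/M/1 queue. The function name `tandem_response_time` is hypothetical.

```python
def tandem_response_time(arrival_rate, s1, s2):
    """Mean response time of two M/M/1 stages in series.

    Assumption (not from the slides): exponential service at both stages,
    so the total is the sum of the per-stage M/M/1 response times.
    """
    for stage_service in (s1, s2):
        if arrival_rate * stage_service >= 1.0:
            raise ValueError("a stage is saturated (lambda * s >= 1)")
    stage1 = s1 / (1.0 - arrival_rate * s1)  # preparatory stage
    stage2 = s2 / (1.0 - arrival_rate * s2)  # execution stage
    return stage1 + stage2


if __name__ == "__main__":
    s1, s2 = 0.045, 0.878  # parameter values quoted on this slide
    for lam in (0.2, 0.5, 0.8, 1.0):
        print(f"lambda={lam:.1f}  r={tandem_response_time(lam, s1, s2):.3f}")
```

Under this assumption the execution stage, with the larger service time, saturates first, so it sets the maximum sustainable throughput at 1/s2.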
22
Future work

Do serious statistics
Does the 1+1 tandem queue model predict
hyper-threading response as well as the complex 2+1
model?
Understand two-processor machine puzzle
Explore how s1 and s2 vary with application
(e.g. fixed vs floating point)
Find ways to estimate s1 and s2 from first
principles

23
Summary

Hyper-threading is
Abstraction (modelling) leverages information:
you can often understand a lot even when you know
very little
$r = s/(1-U)$ is worth remembering
You do need to connect theory and practice, and
practice is harder than theory
Questions?
