Title: Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation
1. Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation
IA32/EM64T/IPF
- Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, Anand Karunanidhi
- Enterprise Platform Group
- Intel Corporation
Presented at MICRO-37, Portland, OR, Dec. 6th, 2004
2. Goal: Accurate Performance Prediction
- Target LARGE Applications
- With little/no manual intervention
- Within reasonable time
3. Instruction Counts: Some Itanium Applications
4. Whole-Program Simulation is Slow
5. Solution: Select Simulation Points
- Manually
- Randomly
- Anywhere
- From uniform regions
- Fine-grain sampling (SMARTS: CMU)
- By program-phase analysis (SimPoint: UCSD, iPart: Intel/MRL)
6. Running Commercial Applications on Simulators is Hard
- Resource requirements: disks, etc.
- Need to modify/re-configure the simulator
- OS dependencies
- Need support for specific kernel and device drivers
- License checking
- Need special action
7. Solution: Native Execution with Instrumentation
- Use PIN to select simulation points (PinPoints) and generate traces
- PIN: A dynamic-instrumentation system
- A tool for writing tools
- No special compiler/linker flags required
8. PIN-Tools: Profiling, Trace Generation and more.
[Diagram labels: PIN-based profiler, Profile, Simulation Point Selection, PinPoints, PIN-based Trace Generator, PIN-based Branch Predictor, Your Simulator Here]
9. Simulation Point Selection with SimPoint (UCSD)
- Why SimPoint?
- Instrumentation based
- Microarchitecture independent
- Works well (results later)
- Applied to multi-threaded programs
[Diagram labels: PIN-based profiler, Basic Block Vectors, SimPoint Tools, PinPoints]
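As a rough illustration of the Basic Block Vector idea named on this slide, the Python sketch below builds one normalized vector per fixed-size slice from a toy basic-block trace and then, for each cluster of slices, picks the slice closest to the cluster centroid along with its weight. This is a simplified stand-in and not SimPoint itself: the trace, the slice size, the cluster labels, the Manhattan distance, and all function names are assumptions made for illustration, and the clustering step is taken as given rather than computed.

```python
from collections import Counter

def build_bbvs(trace, slice_size):
    """Split a trace of (block_id, num_instructions) pairs into slices of
    roughly slice_size instructions; return one normalized BBV per slice."""
    bbvs, current, executed = [], Counter(), 0
    for block_id, num_insts in trace:
        current[block_id] += num_insts      # weight blocks by instructions executed
        executed += num_insts
        if executed >= slice_size:
            bbvs.append({b: n / executed for b, n in current.items()})
            current, executed = Counter(), 0
    if executed:                            # keep the final partial slice
        bbvs.append({b: n / executed for b, n in current.items()})
    return bbvs

def manhattan(u, v):
    """Manhattan distance between two sparse vectors stored as dicts."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in set(u) | set(v))

def pick_representatives(bbvs, labels):
    """For each cluster label, return (index of slice nearest the centroid,
    fraction of all slices that belong to the cluster)."""
    reps = {}
    for label in sorted(set(labels)):
        members = [i for i, l in enumerate(labels) if l == label]
        keys = set().union(*(bbvs[i].keys() for i in members))
        centroid = {k: sum(bbvs[i].get(k, 0.0) for i in members) / len(members)
                    for k in keys}
        best = min(members, key=lambda i: manhattan(bbvs[i], centroid))
        reps[label] = (best, len(members) / len(bbvs))
    return reps

# Toy trace (hypothetical): blocks A and B dominate, with one slice of block C.
trace = [("A", 6), ("B", 4), ("A", 7), ("B", 3), ("C", 10), ("A", 6), ("B", 4)]
bbvs = build_bbvs(trace, slice_size=10)
labels = [0, 0, 1, 0]                       # assumed output of a clustering step
reps = pick_representatives(bbvs, labels)
```

In this toy setup each representative slice plays the role of a simulation point, and its weight is the share of execution its cluster stands for.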
10. Goal: Accurate Performance Prediction
- Phase-detection is not enough!
- Need Trace Generation and Simulation
Error Source: Phase detection
Error Source: Non-repeatability
Error Source: Warm-up, Modeling
Multiple Sources of Error
11. Main Contributions
- A toolkit that automatically
- Profiles, finds phases/simulation regions (PinPoints)
- Validates that PinPoints are representative
- Generates traces for simulators
- Available for Itanium/IA32/EM64T
- Evaluations in a production environment
12. The PinPoints Toolkit
Phase Detection, PinPoint Selection
H/W counters-based Validation (pfmon: Itanium, PAPI: IA32)
PinPoints file
Compute CPI
Weighted Sum for PinPoints
Whole Program
Trace Generation/Simulation
Match?
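The validation step on this slide compares a weighted sum of per-PinPoint CPIs against the whole-program CPI. A minimal Python sketch of that comparison follows, assuming each PinPoint carries a weight equal to the fraction of execution it represents and that all slices contain the same number of instructions. The weights, CPI values, and function names are made up for illustration and do not come from the talk.

```python
def weighted_cpi(pinpoints):
    """pinpoints: list of (weight, cpi) pairs. Weights are normalized here,
    so they do not need to sum exactly to 1."""
    total_weight = sum(weight for weight, _ in pinpoints)
    return sum(weight * cpi for weight, cpi in pinpoints) / total_weight

def percent_error(predicted, actual):
    """Absolute prediction error as a percentage of the measured value."""
    return 100.0 * abs(predicted - actual) / actual

# Hypothetical numbers: three PinPoints and a counter-measured whole-program CPI.
pinpoints = [(0.50, 1.20), (0.30, 0.80), (0.20, 2.00)]
whole_program_cpi = 1.30

predicted = weighted_cpi(pinpoints)
error = percent_error(predicted, whole_program_cpi)
print(f"predicted CPI = {predicted:.3f}, error = {error:.2f}%")
```

With equal-sized slices, weighting CPI by the fraction of slices in each phase is equivalent to weighting by cycles per instruction over the whole run, which is why a plain weighted average is enough here.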
13. Evaluations
- Applications: Built w/ Intel's compilers (high opt)
- HPC: Fluent, AMBER, LS-Dyna, RenderMan
- SPEC2000: Processed 8-9 times
- Test Configurations: Linux (RedHat)
- Merced: Itanium (1), 800 MHz, L3 2 MB
- McKinley: Itanium-2, 900 MHz, L3 1.5 MB
- Madison: Itanium-2, 1.3 GHz, L3 3-6 MB
14. PinPoints Generated
Program              Retired Instructions (billions)   PinPoints (250 million insts. each)
AMBER-rt             3,994                             6
Fluent-m3            2,625                             8
LS-DYNA              4,932                             6
SPECINT2000 (avg.)   142                               4
SPECFP2000 (avg.)    373                               5
- PinPoints << 1% of program execution
- Turnaround time (Traces): Few days
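The claim that PinPoints cover well under 1% of execution can be checked directly from the table on this slide. The short Python sketch below does the arithmetic, using the instruction counts and PinPoint counts from the table and the stated slice size of 250 million instructions. The SPEC rows use the suite averages shown in the table, so those two figures are only approximate; the function and variable names are illustrative.

```python
PINPOINT_INSTS = 250e6  # instructions per PinPoint, as stated in the table header

# (retired instructions, number of PinPoints), taken from the table above.
programs = {
    "AMBER-rt": (3994e9, 6),
    "Fluent-m3": (2625e9, 8),
    "LS-DYNA": (4932e9, 6),
    "SPECINT2000 (avg.)": (142e9, 4),
    "SPECFP2000 (avg.)": (373e9, 5),
}

def coverage_percent(total_insts, num_pinpoints, pinpoint_insts=PINPOINT_INSTS):
    """Percentage of the program's retired instructions inside PinPoints."""
    return 100.0 * num_pinpoints * pinpoint_insts / total_insts

coverage = {name: coverage_percent(total, count)
            for name, (total, count) in programs.items()}

for name, pct in coverage.items():
    print(f"{name}: {pct:.3f}% of execution")
```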
15. Results Overview
- PinPoints: Whole-Program CPI prediction (SPEC2000 and HPC applications)
- Average CPI prediction error: 5%
- PinPoints better than random selection
- Predicting speedup between microarchitectures
- PinPoints can be used to evaluate microarchitecture variations
- PinPoints Traces: Prediction of native SPEC2000 ratios
- INT within 8%, FP within 3%
- More results in the paper
16. CPI Actual vs. Predicted: SPEC2000, Itanium-Madison
17. SPEC2000 CPI Prediction: Average Error
Madison 2.8%, Merced 3.2%, McKinley 2.7%
18. HPC Applications CPI Prediction: Average Error
Madison 5.0%
19. Comparison With Random Selection: 48 unique program runs
20. Comparison With Random Selection: 18 unique program runs
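To make the comparison with random selection concrete, the Python sketch below contrasts the two estimators on an idealized toy program with two perfectly uniform phases. It does not reproduce the experiments behind these slides: the per-slice CPIs, the phase weights, the sample count, the seed, and the function names are all assumptions chosen so the difference is easy to see.

```python
import random

def random_selection_cpi(slice_cpis, num_samples, rng):
    """Estimate whole-program CPI as the plain average of randomly chosen slices."""
    chosen = rng.sample(range(len(slice_cpis)), num_samples)
    return sum(slice_cpis[i] for i in chosen) / num_samples

def phase_based_cpi(representatives):
    """Estimate CPI from one (weight, cpi) representative per phase."""
    return sum(weight * cpi for weight, cpi in representatives)

# Toy program (hypothetical): 80 slices in a fast phase, 20 in a slow phase.
slice_cpis = [1.0] * 80 + [3.0] * 20
true_cpi = sum(slice_cpis) / len(slice_cpis)

# One representative per phase, weighted by the phase's share of execution.
phase_estimate = phase_based_cpi([(0.8, 1.0), (0.2, 3.0)])
phase_error = abs(phase_estimate - true_cpi)

# Random selection with the same budget of two slices, repeated over many trials.
rng = random.Random(0)
random_estimates = [random_selection_cpi(slice_cpis, 2, rng) for _ in range(1000)]
mean_random_error = sum(abs(e - true_cpi) for e in random_estimates) / len(random_estimates)

print(f"phase-based error: {phase_error:.3f}")
print(f"mean random-selection error: {mean_random_error:.3f}")
```

Real programs do not have perfectly uniform phases, so the phase-based error is not zero in practice; the sketch only shows why weighting by phase can beat an unweighted random sample of the same size.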
21. Speedup Merced → McKinley: SPEC2000
22. PinPoints Speedup Prediction: SPEC2000, Merced → McKinley
23. PinPoints Speedup Prediction Across Multiple Microarchitectures: Same Binaries/PinPoints
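One way to turn predicted CPIs into a speedup between two machines running the same binary is sketched below in Python. It assumes speedup is defined as the ratio of execution times and that time equals instructions × CPI / clock frequency; the slides do not state which definition the charts use. The clock frequencies are the Merced and McKinley values from the Evaluations slide, while the CPI values, the instruction count, and the function names are made up for illustration.

```python
def exec_time_seconds(instructions, cpi, freq_hz):
    """Execution time from instruction count, cycles per instruction, and clock rate."""
    return instructions * cpi / freq_hz

def speedup(instructions, cpi_old, freq_old_hz, cpi_new, freq_new_hz):
    """Time on the old machine divided by time on the new machine.
    The same binary is assumed, so the instruction count is identical."""
    old_time = exec_time_seconds(instructions, cpi_old, freq_old_hz)
    new_time = exec_time_seconds(instructions, cpi_new, freq_new_hz)
    return old_time / new_time

MERCED_HZ = 800e6     # from the test configurations listed earlier
MCKINLEY_HZ = 900e6   # from the test configurations listed earlier

# Hypothetical PinPoints-weighted CPIs for one benchmark on each machine.
predicted = speedup(100e9, cpi_old=2.0, freq_old_hz=MERCED_HZ,
                    cpi_new=1.2, freq_new_hz=MCKINLEY_HZ)
print(f"predicted speedup: {predicted:.3f}x")
```

Because the instruction count cancels out, only the CPI ratio and the frequency ratio matter, which is consistent with reusing the same binaries and PinPoints across microarchitectures.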
24. Putting it All Together: From PinPoints to Projections
- Does simulation of traces for PinPoints predict native performance?
Error: Cumulative
Error Source: Phase detection
Error Source: Non-repeatability
Error Source: Warm-up, Modeling
25. CPI Prediction with Simulation: SPEC2000, Itanium Madison
26. Native SPEC2000 Ratios, Spring 2004: Itanium Madison 1.5GHz/6MB L3
27. Performance Prediction from PinPoints Traces: Itanium Madison 1.5GHz/6MB L3
28. Summary
- PinPoints toolkit: Automatic simulation region selection, tracing, and validation
- Dynamic instrumentation (PIN) → LARGE programs
- PinPoints << 1% of execution: Capture whole-program CPI
- Average error < 5% for SPEC2000, HPC apps.
- Better than random selection
- PinPoints traces: Predict SPEC2000 Ratios
- INT within 8%, FP within 3%
29. Try it out!
(PIN + PinPoints) toolkit: http://rogue.colorado.edu/Pin
New
30. Backup: Simulator Warm-up
- Strategy 1: Large slice-size (250 million instructions)
- Too coarse-grain for phase detection
- Too much simulation time
- Strategy 2: 7 warm-up traces per simulation trace (30 million instructions)
- Art (SPECFP2000): First pinpoint touches most of the working set
- Simulate all pinpoint traces in succession