Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation

Transcript and Presenter's Notes

1
Pinpointing Representative Portions of Large
Intel Itanium Programs with Dynamic
Instrumentation
IA32/EM64T/IPF
  • Harish Patil, Robert Cohn, Mark Charney, Rajiv
    Kapoor, Andrew Sun, Anand Karunanidhi
  • Enterprise Platform Group
  • Intel Corporation

Presented at MICRO-37, Portland, OR, Dec. 6th, 2004
2
Goal: Accurate Performance Prediction
  • Target LARGE Applications
  • With little/no manual intervention
  • Within reasonable time

3
Instruction Counts: Some Itanium Applications
4
Whole-Program Simulation is Slow
5
Solution: Select Simulation Points
  • Manually
  • Randomly
  • Anywhere
  • From uniform regions
  • Fine-grain sampling (SMARTS: CMU)
  • By program-phase analysis (SimPoint: UCSD, iPart:
    Intel/MRL)

6
Running Commercial Applications on Simulators is Hard
  • Resource requirements: disks etc.
  • Need to modify/re-configure the simulator
  • OS dependencies
  • Need support for specific kernel and device
    drivers
  • License checking
  • Need special action

7
Solution: Native Execution with Instrumentation
  • Use PIN to select simulation points (PinPoints)
    and generate traces
  • PIN: A dynamic-instrumentation system
  • A tool for writing tools
  • No special compiler/linker flags required

8
PIN-Tools: Profiling, Trace Generation and more.
[Diagram labels: PIN-based profiler, Profile, Simulation Point Selection, PinPoints, PIN-based Trace Generator, PIN-based Branch Predictor, Your Simulator Here]
9
Simulation Point Selection with SimPoint (UCSD)
  • Why SimPoint?
  • Instrumentation based
  • Microarchitecture independent
  • Works well (results later)
  • Applied to multi-threaded programs

[Diagram: PIN-based profiler → Basic Block Vectors → SimPoint Tools → PinPoints]
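The basic block vectors named on this slide can be illustrated with a minimal sketch. It assumes the profiler emits a stream of (block id, instruction count) pairs and that execution is cut into fixed-size slices; the function names, the input format, and the toy trace are hypothetical and not taken from the PinPoints toolkit. Each vector records the share of a slice's instructions spent in each basic block, and slices with a small Manhattan distance between their vectors are candidates for the same phase.

```python
from collections import Counter

def basic_block_vectors(trace, slice_size):
    """Cut a trace of (block_id, n_insts) pairs into slices of roughly
    slice_size instructions and return one normalized vector per slice."""
    vectors = []
    current = Counter()
    executed = 0
    for block_id, n_insts in trace:
        current[block_id] += n_insts   # weight each block by its instruction count
        executed += n_insts
        if executed >= slice_size:
            vectors.append({b: c / executed for b, c in current.items()})
            current = Counter()
            executed = 0
    if executed:                        # keep the final, partial slice
        vectors.append({b: c / executed for b, c in current.items()})
    return vectors

def manhattan(u, v):
    """Manhattan distance between two sparse vectors stored as dicts."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in set(u) | set(v))

# Hypothetical toy trace: two slices dominated by blocks A and B, then two by C.
toy_trace = [("A", 4), ("B", 6), ("A", 4), ("B", 6), ("C", 10), ("C", 10)]
bbvs = basic_block_vectors(toy_trace, slice_size=10)
same_phase = manhattan(bbvs[0], bbvs[1])   # identical slices
diff_phase = manhattan(bbvs[0], bbvs[2])   # disjoint slices
```

A clustering step over these vectors (done by the SimPoint tools in the flow above) would then pick one representative slice per cluster; that step is omitted here.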
10
Goal: Accurate Performance Prediction
  • Phase-detection is not enough!
  • Need Trace Generation and Simulation

Multiple Sources of Error
Error Source: Phase detection
Error Source: Non-repeatability
Error Source: Warm-up, Modeling
11
Main Contributions
  • A Toolkit that automatically
  • Profiles, finds phases/simulation regions
    (PinPoints)
  • Validates that PinPoints are representative
  • Generates traces for simulators
  • Available for Itanium/IA32/EM64T
  • Evaluations in a production environment

12
The PinPoints Toolkit
[Diagram: Phase Detection / PinPoint Selection produces a PinPoints file, which feeds Trace Generation/Simulation. H/W counters-based Validation (pfmon: Itanium, PAPI: IA32): compute CPI as a weighted sum for the PinPoints and for the whole program, then check: Match?]
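The validation step on this slide (CPI as a weighted sum over PinPoints, compared against the whole program) can be sketched as below. This is a sketch under assumptions: each PinPoint is taken to carry a weight equal to the share of execution its phase represents, and all weights, CPI values, and function names are made-up illustrations, not measurements or toolkit APIs.

```python
def predicted_cpi(pinpoints):
    """pinpoints: list of (weight, cpi) pairs.
    Returns the weight-averaged CPI; weights need not sum to exactly 1."""
    total_weight = sum(weight for weight, _ in pinpoints)
    return sum(weight * cpi for weight, cpi in pinpoints) / total_weight

def prediction_error(predicted, actual):
    """Relative error of the prediction against the measured value."""
    return abs(predicted - actual) / actual

# Hypothetical numbers for illustration only.
pinpoints = [(0.5, 1.0), (0.3, 2.0), (0.2, 1.5)]
whole_program_cpi = 1.45        # would come from hardware counters

estimate = predicted_cpi(pinpoints)
error = prediction_error(estimate, whole_program_cpi)
print(f"predicted CPI {estimate:.3f}, error {100 * error:.1f}%")
```

If the error is small, the PinPoints are taken to be representative; otherwise the selection would need to be revisited before traces are generated.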
13
Evaluations
  • Applications: Built w/ Intel's compilers (high
    opt). HPC: Fluent, AMBER, LS-Dyna, RenderMan;
    SPEC2000. Processed 8-9 times
  • Test Configurations: Linux (RedHat)

Codename   Processor     Clock     L3
Merced     Itanium (1)   800 MHz   2 MB
McKinley   Itanium-2     900 MHz   1.5 MB
Madison    Itanium-2     1.3 GHz   3-6 MB
14
PinPoints Generated
Program              Retired Instructions (billions)   PinPoints (250 million insts. each)
AMBER-rt             3,994                             6
Fluent-m3            2,625                             8
LS-DYNA              4,932                             6
SPECINT2000 (avg.)   142                               4
SPECFP2000 (avg.)    373                               5
  • PinPoints << 1% of program execution
  • Turnaround time (Traces): Few days
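The coverage claim can be checked with simple arithmetic on the figures from this slide. The sketch below uses only the three individually listed HPC programs; the SPEC2000 rows are averages, so a per-program ratio cannot be derived from them. The function and variable names are illustrative.

```python
PINPOINT_INSTS = 250e6  # instructions per PinPoint, from the slide

# (retired instructions, number of PinPoints), from the slide
PROGRAMS = {
    "AMBER-rt": (3994e9, 6),
    "Fluent-m3": (2625e9, 8),
    "LS-DYNA": (4932e9, 6),
}

def coverage_percent(total_insts, num_pinpoints, pinpoint_insts=PINPOINT_INSTS):
    """Percentage of the program's retired instructions covered by its PinPoints."""
    return 100.0 * num_pinpoints * pinpoint_insts / total_insts

coverage = {name: coverage_percent(total, count)
            for name, (total, count) in PROGRAMS.items()}
for name, pct in coverage.items():
    print(f"{name}: {pct:.3f}% of execution")
```

For all three programs the PinPoints cover well under 0.1% of the retired instructions.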

15
Results Overview
  • PinPoints: Whole-program CPI prediction (SPEC2000
    and HPC applications)
  • Average CPI prediction error: 5%
  • PinPoints better than random selection
  • Predicting speedup between microarchitectures
  • PinPoints can be used to evaluate
    microarchitecture variations
  • PinPoints traces: Prediction of native SPEC2000
    ratios
  • INT within 8%, FP within 3%
  • More results in the paper
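One plausible way to turn CPI predictions into a speedup estimate between two microarchitectures is sketched below. It assumes the same binary runs on both machines (so the instruction count cancels) and that execution time is instructions × CPI / clock frequency; the slides do not state the exact formula used, and the CPI values here are hypothetical. Only the clock frequencies are taken from the test configurations.

```python
def execution_time(instructions, cpi, freq_hz):
    """Execution time in seconds under the simple model insts * CPI / frequency."""
    return instructions * cpi / freq_hz

def predicted_speedup(cpi_old, freq_old_hz, cpi_new, freq_new_hz):
    """Speedup of the new machine over the old one for the same binary.
    The instruction count is identical on both, so it cancels out."""
    return (cpi_old / freq_old_hz) / (cpi_new / freq_new_hz)

# Hypothetical CPIs; frequencies are Merced (800 MHz) and McKinley (900 MHz).
speedup = predicted_speedup(cpi_old=2.0, freq_old_hz=800e6,
                            cpi_new=1.2, freq_new_hz=900e6)
print(f"predicted speedup: {speedup:.3f}x")
```

Feeding in PinPoints-predicted CPIs on one side and whole-program CPIs on the other would give the predicted and actual speedups to compare.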

16
CPI: Actual vs. Predicted, SPEC2000,
Itanium-Madison
17
SPEC2000 CPI Prediction: Average Error
Madison 2.8%, Merced 3.2%, McKinley 2.7%
18
HPC Applications CPI Prediction: Average Error
Madison 5.0%
19
Comparison With Random Selection: 48 unique
program runs
20
Comparison With Random Selection: 18 unique
program runs
21
Speedup: Merced → McKinley, SPEC2000
22
PinPoints Speedup Prediction: SPEC2000, Merced →
McKinley
23
PinPoints Speedup Prediction Across Multiple
Microarchitectures: Same Binaries/PinPoints
24
Putting it All Together: From PinPoints to
Projections
  • Does simulation of traces for PinPoints predict
    native performance?

Error: Cumulative
Error Source: Phase detection
Error Source: Non-repeatability
Error Source: Warm-up, Modeling
25
CPI Prediction with Simulation: SPEC2000, Itanium
Madison
26
Native SPEC2000 Ratios, Spring 2004: Itanium
Madison 1.5GHz/6MB L3
27
Performance Prediction from PinPoints
Traces: Itanium Madison 1.5GHz/6MB L3
28
Summary
  • PinPoints toolkit: Automatic simulation region
    selection, tracing, and validation
  • Dynamic instrumentation (PIN) → LARGE programs
  • PinPoints << 1% of execution; capture
    whole-program CPI
  • Average error < 5% for SPEC2000, HPC apps.
  • Better than random selection
  • PinPoints traces: Predict SPEC2000 ratios
  • INT within 8%, FP within 3%

29
Try it out!
(PIN PinPoints) toolkit: http://rogue.colorado.edu/Pin
New
30
Backup: Simulator Warm-up
  • Strategy 1: Large slice-size (250 million
    instructions)
  • Too coarse-grain for phase detection
  • Too much simulation time
  • Strategy 2: 7 warm-up traces per simulation
    trace (30 million instructions)
  • Art (SPECFP2000): First pinpoint touches most of
    the working set
  • Simulate all pinpoint traces in succession