HP Caliper

1
HP Caliper
  • Eric Gouriou
  • September 2003

2
Today's Agenda
  • Intended audience of this presentation
  • What is HP Caliper?
  • Measurements
  • Usage
  • Caliper cheat sheet
  • Limitations
  • Future plans
  • More information on Caliper and DSPP
  • Questions

3
Intended Audience
  • Developers
  • Tuning experts
  • Engineers porting to HP-UX IPF
  • Anyone interested in performance

4
What is HP Caliper?
  • Performance analysis and improvement tool
  • Dynamic performance measurement tool for C / C++
    / Fortran / assembly applications
  • Data collection vehicle for compiler feedback/PBO
  • Works on all programs as-is (32/64-bit, debug or
    optimized, stripped, etc.)
  • Multiple measurements via a unified interface
  • Provides insights thanks to:
      • Itanium Performance Monitoring Unit (PMU)
      • Dynamic binary instrumentation

5
Key Features
  • Default measurement configurations, configurable
  • Selective process and module measurements
  • Text and HTML reports
  • Performance datafile
  • Three measurement models:
      • start application under Caliper
      • attach to running process
      • auto-invocation

6
Measurements
  • PMU event counts
      • Total count of selected hardware events per
        process
      • Negligible overhead
      • Default set of events, can be overridden
      • 400 events described in Itanium 2 documentation
      • Non-default use for advanced users
    -------------------------------------------
    Counter               Priv. Mask   Count
    -------------------------------------------
    IA64_INST_RETIRED     8 (USER)     3414917
    NOPS_RETIRED          8 (USER)      684477
    CPU_CYCLES            8 (USER)     1899187
    BACK_END_BUBBLE_ALL   8 (USER)      810631
    -------------------------------------------
    % of Cycles lost due to stalls (lower is better)
      42.68 = 100 * BACK_END_BUBBLE_ALL / CPU_CYCLES
    Effective CPI (lower is better)
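Both derived metrics are plain ratios of the raw counts. A minimal Python sketch recomputing them from the counts listed; the stall formula is the one printed in the report, while the Effective CPI formula and value are cut off in this transcript, so the definition used here (cycles per retired non-NOP instruction) is an assumption rather than Caliper's documented formula.

```python
# Raw user-level counts copied from the report excerpt on this slide.
counts = {
    "IA64_INST_RETIRED": 3414917,
    "NOPS_RETIRED": 684477,
    "CPU_CYCLES": 1899187,
    "BACK_END_BUBBLE_ALL": 810631,
}


def stall_percentage(c):
    """% of cycles lost due to stalls = 100 * BACK_END_BUBBLE_ALL / CPU_CYCLES."""
    return 100.0 * c["BACK_END_BUBBLE_ALL"] / c["CPU_CYCLES"]


def effective_cpi(c):
    """ASSUMED definition: cycles per retired instruction, NOPs excluded.

    The slide does not show Caliper's formula, so treat this as a guess.
    """
    useful_instructions = c["IA64_INST_RETIRED"] - c["NOPS_RETIRED"]
    return c["CPU_CYCLES"] / useful_instructions


print(f"{stall_percentage(counts):.2f}")  # 42.68, matching the report
print(f"{effective_cpi(counts):.2f}")
```

Excluding NOPs is a plausible reason for NOPS_RETIRED being part of the default event set, but that link is an inference, not something the slide states.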

7
Measurements (contd)
  • Histograms from samples of PMU data
      • Allows identification of hotspots
      • Module summary, function summary, function
        details, selected global counts and derived
        metrics
      • Flat profile, Cache / TLB / Branch Prediction /
        ALAT
      • Per-thread data available
      • Very low overhead
    Function Summary
    -----------------------------------------------------------------------------
    % Total  Cumulat
         IP     % of       IP
    Samples    Total  Samples  Function                               File
    -----------------------------------------------------------------------------
       9.71     9.71     1286  livermore::init                        livermore.c
       6.03    15.75      799  livermore::main                        livermore.c
       4.07    19.82      539  libc.so.1::T_19_3c30_cl___doprnt_main  doprnt.c
       0.60    20.42       79  libc.so.1::_f80_to_dec                 bindec.c
       0.45    20.86       59  libc.so.1::getenv                      getenv.c
    ...
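The two percentage columns follow from the raw IP sample counts: each function's share of all samples taken during the run, and a running share in descending order. A minimal sketch under that reading of the report, with a hypothetical helper name and made-up data (the run total for livermore is not shown on the slide):

```python
def flat_profile(samples, total):
    """Return (function, samples, % of total, cumulative %) rows, hottest first.

    samples: dict mapping function name -> IP sample count
    total:   IP samples in the whole run (may exceed sum(samples.values()))
    """
    rows = []
    running = 0
    for func, count in sorted(samples.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        # Cumulative share is computed from summed counts, not summed percentages.
        rows.append((func, count, 100.0 * count / total, 100.0 * running / total))
    return rows


# Hypothetical data: three functions out of 200 samples in the run.
for func, n, pct, cum in flat_profile({"f": 50, "g": 30, "h": 20}, total=200):
    print(f"{pct:7.2f}  {cum:7.2f}  {n:7d}  {func}")
```

Computing the cumulative column from summed counts rather than by adding rounded percentages would explain why it can differ by 0.01 from the sum of the displayed values (9.71 + 6.03 versus 15.75).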

8
Measurements (contd)
  • Traces of PMU samples
      • Provides full details for each sample
      • Low overhead but high volume of data
      • Customize configuration file for relevant data
    --------------------------------------------------------------------------------------
            -----------------------DCache Miss------------------------  ----IP Samples----
    Sample  Addr+Slot                      Data Runtime        Latency  Bundle Address
    Number  (module::function)             Address                      (module::function)
    --------------------------------------------------------------------------------------
    1       0x3eda0+0                      0x200000007979f700        5  0x502f0
            (dld.so::MM_malloc)                                         (dld.so::BU_grow)
    2       0x211c0+0                      0x200000007950b200       26  0x212a0
            (dld.so::LE_finish_create)                                  (dld.so::LE_finish_create)
    3       0x37bf0+0                      0x20000000795297b8      172  0x37c40
            (dld.so::R_apply_eplt_relocs)                               (dld.so::R_apply_eplt_relocs)
    ...
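The Caliper cheat sheet suggests using a latency threshold to focus on expensive D-cache misses. As a post-processing illustration only, the three sample rows shown here can be filtered by latency; the record type, field names, helper and the 64-cycle threshold are hypothetical and do not describe Caliper's datafile format or options.

```python
from dataclasses import dataclass


@dataclass
class DCacheMiss:
    """Hypothetical record for one D-cache miss sample (not Caliper's format)."""
    number: int
    miss_function: str  # module::function of the missing instruction, as read from the slide
    data_address: int   # runtime data address that missed
    latency: int        # miss latency in cycles
    ip_function: str    # module::function of the IP sample


# The three sample rows shown on the slide.
trace = [
    DCacheMiss(1, "dld.so::MM_malloc", 0x200000007979F700, 5, "dld.so::BU_grow"),
    DCacheMiss(2, "dld.so::LE_finish_create", 0x200000007950B200, 26,
               "dld.so::LE_finish_create"),
    DCacheMiss(3, "dld.so::R_apply_eplt_relocs", 0x20000000795297B8, 172,
               "dld.so::R_apply_eplt_relocs"),
]


def expensive_misses(samples, min_latency):
    """Keep only misses whose latency is at least min_latency cycles."""
    return [s for s in samples if s.latency >= min_latency]


# Hypothetical threshold of 64 cycles.
for s in expensive_misses(trace, min_latency=64):
    print(s.number, s.miss_function, hex(s.data_address), s.latency)
```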

9
Measurements (contd)
  • Source-level event counts
      • Function call counts, arc counts
      • High overhead, precise counts
      • Done via dynamic binary instrumentation
    Function Count Details
    -------------------------------------------------------
       Total  Function                    File         Line
    -------------------------------------------------------
         150  livermore::abs              livermore.c   405
         104  libc.so.1::__milli_memset
          92  libc.so.1::__milli_memmove
    ...
    Arc Counts
    ---------------------------------------------------------------------------------
      Total    Taken  % Taken  Source Address  Source Function  File         Line,Col
                               Target Address  Target Function  File         Line,Col
    ---------------------------------------------------------------------------------
      28672    28616       99  0x4005e70+2     livermore::init  livermore.c  376,7
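Exact counts mean that every call runs extra counting code, which is where the high overhead comes from. Caliper gets there by instrumenting the binary dynamically; the Python sketch below is only an analogy that swaps in a plain wrapper (decorator) for binary instrumentation. `counted` and `my_abs` are hypothetical names.

```python
import functools
from collections import Counter

call_counts = Counter()  # function name -> exact number of calls


def counted(func):
    """Wrap func so that every call is counted exactly (no sampling)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@counted
def my_abs(x):  # hypothetical stand-in for livermore::abs
    return -x if x < 0 else x


for value in range(-3, 3):
    my_abs(value)

print(call_counts["my_abs"])  # 6
```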

10
Measurements (contd)
  • Call graph profile (gprof-like)
      • Flat profile and call graph
      • High overhead
      • Hybrid of exact counts and PMU sampling
    Call Graph
    ------------------------------------------------------------------------
                          De-    Called/Total  Parents
    Index  % Time   Self  scen-   Called+Self  Name                    Index
                          dants  Called/Total  Children
    ------------------------------------------------------------------------
                    0.00   0.00           1/1      ROOT                    1
        2    25.1   0.00   0.00             1  livermore::main             2
                    0.00   0.00       150/150      livermore::abs         52
                    0.00   0.00         30/30      livermore::clock       45
                    0.00   0.00         14/14      livermore::init         3
                    0.00   0.00         18/18      libc.so.1::printf       4
    ------------------------------------------------------------------------
                    0.00   0.00         14/14      livermore::main         2
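In gprof-style call graphs, a child line's Called/Total entry reads as calls from this parent over all calls to that child, so 150/150 means every call to livermore::abs came from livermore::main. A small sketch of that bookkeeping, assuming Caliper follows the gprof convention here; the arcs are hypothetical, with `clock` given a second caller so the two numbers differ:

```python
from collections import defaultdict

# Hypothetical exact call arcs: (caller, callee, number of calls).
arcs = [
    ("main", "abs", 150),
    ("main", "clock", 30),
    ("init", "clock", 10),
]


def child_lines(arcs, parent):
    """Map each callee of parent to 'calls from parent/total calls to callee'."""
    total_calls = defaultdict(int)   # callee -> calls from all callers
    from_parent = defaultdict(int)   # callee -> calls from parent only
    for caller, callee, n in arcs:
        total_calls[callee] += n
        if caller == parent:
            from_parent[callee] += n
    return {callee: f"{n}/{total_calls[callee]}" for callee, n in from_parent.items()}


print(child_lines(arcs, "main"))  # {'abs': '150/150', 'clock': '30/40'}
```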

11
Measurements (contd)
  • PBO profile gathering configuration
      • Auto-invoked when compiling using +Oprofile=collect
        (+I deprecated)
      • Data used to improve compiler optimizations:
        +Oprofile=use (+P deprecated)
      • Can be done manually (caliper pbo ...); however,
        this is not recommended and sub-optimal
      • Variable overhead
    > cc +Oprofile=collect -o livermore livermore.c
    > ./livermore
    ...
    > ls flow
    flow.data flow.log
    > cc +Oprofile=use +O3 -o livermore livermore.c

12
Usage
  • Typical command line:
        caliper config_file [caliper_options] program [program_args]
  • Example:
        > caliper fprof --process=all cc -o livermore livermore.c
  • Configuration files:
      • packaged ones
      • copy/modify
      • command-line overrides

13
Usage (contd)
    Type                    Configuration Files  Comments
    -------------------------------------------------------------------------
    Histograms              fprof,               reduced samples,
                            dcache_miss,         very low impact
                            icache_miss,
                            dtlb_miss,
                            itlb_miss,
                            branch_prediction,
                            alat_miss
    Call graph              cgprof               sampled + exact, high impact
    Sampled details         pmu_trace            large data volume
    Total HW event counts   total_cpu            exact totals, no impact
    Exact source-level      arc_count,           exact, high impact
    details, event counts   func_count
    Compiler feedback       pbo                  black box

14
Caliper Cheat Sheet
  • Where should I start?
      • Global view
          • fprof, both for profile and per-process derived
            metrics
          • cgprof, caller/callee, check for surprises
          • dcache_miss, use latency threshold to show
            expensive misses
      • Drill-down
          • Restrict processes, libraries, functions measured
  • What is missing for a global view?
      • System-wide measurements
      • Multiplexed global counts (vs. many total_cpu
        runs)

15
Caliper Cheat Sheet (contd)
  • Tuning the data collection parameters
      • Multi-process application? Check the process tree
        output, select processes using --process
      • Multi-threaded application? Check --threads=all
        (per-thread histograms) versus --threads=sum-all
        (default, aggregated data)
      • Libraries of interest or out of your control?
        Use --module-include / --module-exclude
      • Functions of interest? Check
        --user-regions=rum/sum and/or triggered
        samples
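One way to picture the difference between the two thread options: --threads=sum-all reports roughly what you would get by adding up the per-thread histograms that --threads=all keeps separate, losing the information about which thread was hot. A minimal sketch of that aggregation with hypothetical thread ids, function names and counts; it is not Caliper's implementation:

```python
from collections import Counter

# Hypothetical per-thread IP-sample histograms: thread id -> samples per function.
per_thread = {
    1: Counter({"app::worker": 400, "app::init": 100}),
    2: Counter({"app::worker": 250, "libc.so.1::getenv": 10}),
}


def sum_all(histograms):
    """Merge per-thread histograms into one process-wide histogram."""
    total = Counter()
    for hist in histograms.values():
        total.update(hist)  # Counter.update adds counts instead of replacing them
    return total


print(sum_all(per_thread).most_common(1))  # [('app::worker', 650)]
```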

16
Caliper Cheat Sheet (contd)
  • Better reports
      • Use HTML output (--html); text is the default
      • Use datafiles:
          • Allow multiple reports for a single run
          • Faster collection in multi-process runs
      • Check source-level reporting:
          • --report-details=statement
          • Vary the amount of detail generated

17
Caliper Cheat Sheet (contd)
  • PBO
      • Performance for free for some applications
        (almost)
      • Use +Oprofile=collect/use
      • caliper pbo works on +O1 binaries but isn't
        recommended
      • Can use chatr +I enable to enable
        auto-invocation
      • Trade-offs for large multi-process
        applications: 1 vs. many Caliper instances

18
Limitations
  • Application characteristics
      • no dynamic library reload
  • Measurement control
      • pbo profile collection requires a +O1
        binary (automatic when using +Oprofile=collect)
      • HW limits the measurements possible per run
      • per-thread data limited to histograms
  • Other
      • emulated PA binaries are not measured
      • minimal dynamic code support
      • limited gcc/g++ support
      • setuid binaries require a workaround
      • limited support for MPAS binaries

19
Future Plans
  • PMU Measurements
      • multiplexed PMU runs
      • richer derived metrics
      • system-wide measurements
      • kernel profiles
  • PBO
      • PMU cache data collected for PBO
  • Data Files
      • aggregation
      • merging
      • diffing

20
Future Plans (contd)
  • Usability
      • Graphs with HTML reports
      • Reports on demand
      • Function context
      • Call stacks
  • Remove limitations
      • Detach for runs involving instrumentation
      • MPAS applications
      • Library load/unload
      • Dynamically generated code

21
More Information
  • The Caliper web page is on the DSPP website:
      • <http://www.hp.com/go/hpcaliper>
      • Documentation / Support / Downloads
  • The Caliper mailing lists:
      • Majordomo lists: <majordomo@cxx.cup.hp.com>
      • For product announcements: <caliper-announce>
      • For announcements and user forum: <caliper>

22
DSPP Tools & Resources for Itanium 2 Set You Up for Success
  • Software
      • development environments, compilers, operating
        systems, installation/configuration tools,
        performance tools and more
  • Technical documentation
      • white papers, tutorials, reference documents and
        manuals, FAQs, known problems, sample code, etc.
  • Training and Education
      • online and classroom training

23
More DSPP Tools & Resources
  • Community
      • Itanium forums, source code repository, document
        sharing and mailing lists
  • Equipment
      • rentals and purchase discounts
  • Partner Resources
  • News & Events

24
Where to go
  • Start with the Itanium web site for DSPP
    partners:
      • http://www.hp.com/go/dspp_itanium
  • Contact points for additional information,
    general support, equipment, localization
    resources and more:
      • Americas: spp@cup.hp.com,
        telephone 1.800.249.3294
      • Europe: dspp.emea@hp.com,
        telephone 800.100.929.70
      • Asia-Pac: hpdev.support@hp.com, or go to
        www.hp.com/go/dspp for local country
        contacts

25
Questions?