Software Performance Analysis Using CodeAnalyst for Windows - PowerPoint PPT Presentation

1
Software Performance Analysis Using CodeAnalyst
for Windows
  • Sherry Hurwitz
  • SW Applications Manager, SRD, sherry.hurwitz_at_amd.com
  • Advanced Micro Devices
  • Lei Yu
  • Member Technical Staff, SRD, lei.yu_at_amd.com
  • Advanced Micro Devices
2
Session Outline
  • Exploiting Performance Opportunities
  • Obvious Performance Potential
  • Hidden Performance Potential
  • Exposing Untapped Performance Potential
  • Analyzing Performance Improvement Trials
  • AMD CodeAnalyst Performance Analysis Tool
  • Capabilities of CodeAnalyst
  • Functionality of CodeAnalyst
  • Profile Capabilities
  • Thread Analysis
  • Pipeline Simulation

3
Obvious Performance Potential
  • Processor Architecture
  • x64 Processors
  • Extended Memory Addressing
  • Additional Registers
  • Deeper Execution Pipeline
  • Multi-Core Processors
  • Multiprocessing for the desktop system
  • Multiple processor platforms
  • 64-bit Windows operating systems
  • Compiler optimization switches
  • Optimized libraries (for example AMD ACML)

4
Hidden Performance Potential
  • Efficient algorithms
  • Cache friendly memory access
  • Branch Prediction friendly conditionals
  • Parallel work through Threads
  • Object Synchronization

5
Expose Untapped Performance Potential
  • Profile your application with the AMD CodeAnalyst
    Performance Analyzer
  • Timer-based sampling - identify time consuming or
    frequently executed code possibly pointing to
    algorithm issues (Hot Spots)
  • Opteron and Athlon 64 processor performance
    events - evaluate the application's use of
    architectural features
  • Thread View - evaluate effective use of multiple
    processors
  • Pipeline Simulation - understand how data
    dependencies can stall the processor execution
  • Iterate between profiling and code
    modifications, testing whether each change
    yields a performance benefit

6
Analyzing Performance Improvement Trials
7
Capabilities of AMD CodeAnalyst
  • CodeAnalyst CAN
  • Assist in optimizing your application
  • Identify program bottlenecks
  • Monitor and Analyze software performance
  • CodeAnalyst CANNOT
  • Identify defects in your program
  • (Profile a functioning stable application.)
  • CodeAnalyst RUNS ON
  • Windows NT, Windows 2000, Windows XP, and 64-bit
    Windows operating systems

8
Key Functionality of AMD CodeAnalyst
  • Profiling
  • Timer-based sampling
  • Event-based sampling
  • Thread analysis
  • Execution Pipeline Simulation

9
Profile Capabilities
  • Low overhead system-wide profile
  • Timer-based profile
  • 0.1 ms resolution on APIC-enabled systems
  • 1.0 ms resolution on APIC-disabled systems
  • Event-based profile
  • 32 AMD Athlon and AMD Athlon XP performance
    events
  • 78 AMD Opteron and AMD Athlon 64 performance
    events
  • Simultaneously profile up to 4 user selected
    performance events.
  • Profiles multiple processor systems up to 16
    processor cores

10
Profile Analysis
  • Identifies all active Process Names, Process IDs,
    Thread IDs
  • Identifies the Process CPU affinity
  • Identifies performance events per CPU
  • Maps sample addresses to Process, Module,
    Function, Source Line, Assembly Instruction, Code
    Byte
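
The mapping from sample addresses to functions can be pictured as a lookup in a sorted symbol table. The sketch below is only an illustration of that idea, not CodeAnalyst's implementation: the `symbol_t` record, `find_symbol`, and `attribute_samples` are hypothetical names, and the symbol ranges are assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record: a function name and its half-open
   address range [start, end). */
typedef struct {
    const char *name;
    uintptr_t   start;    /* first byte of the function */
    uintptr_t   end;      /* one past the last byte */
    unsigned    samples;  /* samples attributed so far */
} symbol_t;

/* Binary search over symbols sorted by start address.
   Returns the index of the symbol containing addr, or -1 if none. */
int find_symbol(const symbol_t *syms, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < syms[mid].start)
            hi = mid;            /* addr lies below this symbol */
        else if (addr >= syms[mid].end)
            lo = mid + 1;        /* addr lies above this symbol */
        else
            return (int)mid;     /* start <= addr < end */
    }
    return -1;
}

/* Attribute each sampled instruction address to a function.
   Returns the number of samples that matched no known symbol. */
size_t attribute_samples(symbol_t *syms, size_t n,
                         const uintptr_t *addrs, size_t count)
{
    assert(syms != NULL || n == 0);
    size_t unknown = 0;
    for (size_t i = 0; i < count; i++) {
        int idx = find_symbol(syms, n, addrs[i]);
        if (idx >= 0)
            syms[idx].samples++;
        else
            unknown++;
    }
    return unknown;
}
```

The same lookup could be repeated at finer granularity (source line, instruction) given debug information for the module.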

11
Hierarchical Navigation of Data Views
  • System Data View
  • System Graph View
  • Module Data View
  • Module Graph View
  • Source View
  • Disassembly View
  • Demo will show the details of each of these views
    and the navigation between the views.

12
Timer-based Profiling - the First Level of
Analysis
  • Exposes areas of intense activity
  • Identifies the most likely suspects
  • Provides a sample distribution chart
  • Ability to drill down through several data views
  • View the source code on and around the sample
  • Algorithmic issues may be evident from the hot
    spot code
  • Hot spot code might suggest particular events to
    profile in next level of Analysis
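
As a rough illustration of how a sample distribution points at hot spots, the sketch below turns per-function sample counts into percentage shares and an estimated time, treating each sample as one timer interval (for example 0.1 ms or 1.0 ms, per the resolutions listed earlier). The function names are hypothetical and are not part of CodeAnalyst.

```c
#include <assert.h>
#include <stddef.h>

/* Share of all samples that landed in one bucket, as a percentage. */
double sample_share_percent(unsigned long bucket, unsigned long total)
{
    return total == 0 ? 0.0 : 100.0 * (double)bucket / (double)total;
}

/* Estimated time for a bucket: each sample stands for one timer
   interval. This is a statistical estimate, not a measurement. */
double estimated_ms(unsigned long samples, double interval_ms)
{
    return (double)samples * interval_ms;
}

/* Index of the bucket with the most samples (the first one on ties).
   n must be at least 1. */
size_t hottest_bucket(const unsigned long *counts, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (counts[i] > counts[best])
            best = i;
    }
    return best;
}
```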

13
Common Hot Spots
  • Loops
  • Large content and large loop counts are natural
    hot spots but not bad for performance
  • Small content with small fixed loop counts should
    be unrolled
  • Remove redundant constant calculations from inner
    loops, including from inner control structures
  • Long Logical Expressions in If Statements
  • Long data dependent expressions
  • Complicated Floating Point expressions
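
Two of the loop suggestions above can be sketched in C. The functions below are hypothetical examples, not code from the presentation: the first pair hoists a constant calculation out of the loop, the second pair writes out a small fixed-count loop by hand. An optimizing compiler may already perform both transformations, so any benefit should be confirmed by profiling again.

```c
#include <assert.h>
#include <stddef.h>

/* Before: gain * offset never changes inside the loop, yet it is
   written so that it is recomputed on every iteration. */
void scale_naive(double *out, const double *in, size_t n,
                 double gain, double offset)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + gain * offset;
}

/* After: the constant calculation is done once, outside the loop. */
void scale_hoisted(double *out, const double *in, size_t n,
                   double gain, double offset)
{
    assert(out != NULL && in != NULL);
    const double bias = gain * offset;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + bias;
}

/* Before: small body, small fixed loop count. */
double dot3_loop(const double a[3], const double b[3])
{
    double sum = 0.0;
    for (int i = 0; i < 3; i++)
        sum += a[i] * b[i];
    return sum;
}

/* After: the three iterations are unrolled by hand, removing the
   loop counter and the loop branch. */
double dot3_unrolled(const double a[3], const double b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```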

14
Event-based Profile - Second Level of Analysis
  • Useful Events to Identify Memory Issues
  • Data Cache Access and Data Cache Misses
    simultaneously
  • Use the ratio of misses to accesses
  • Count Misaligned Data Reference
  • Useful Events to Identify Branching Issues
  • Retired branch mispredicted and Retired taken
    branches
  • Use the ratio of mispredicted branches to taken branches
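
The two ratios named on this slide are plain divisions of event counts collected in the same run. A minimal sketch follows; the function names and the idea of comparing against a threshold are assumptions made for illustration, and any threshold value would have to come from your own baseline measurements.

```c
#include <assert.h>

/* Ratio of two event counts gathered in the same profiling run.
   Returns 0.0 when no denominator events were recorded. */
double event_ratio(unsigned long long numerator,
                   unsigned long long denominator)
{
    return denominator == 0
        ? 0.0
        : (double)numerator / (double)denominator;
}

/* Data cache misses relative to data cache accesses. */
double dcache_miss_ratio(unsigned long long misses,
                         unsigned long long accesses)
{
    return event_ratio(misses, accesses);
}

/* Retired mispredicted branches relative to retired taken branches. */
double branch_mispredict_ratio(unsigned long long mispredicted,
                               unsigned long long taken)
{
    return event_ratio(mispredicted, taken);
}

/* Hypothetical helper: flag a ratio that exceeds a caller-chosen
   threshold. */
int ratio_exceeds(double ratio, double threshold)
{
    assert(threshold >= 0.0);
    return ratio > threshold;
}
```

Comparing the ratio per function or per module, rather than the raw miss count, keeps code that simply runs often from being mistaken for code that uses the cache or the branch predictor poorly.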

15
Examples of Memory Issues
  • Large data structures with variable size members
    not sorted by size
  • Use of pointer notation in manipulating large
    data arrays
  • Dereferenced pointer arguments inside a function
  • Large declarations of local variables declared
    randomly with respect to size
  • Memory buffers shared between threads
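
The first item, members not sorted by size, can be illustrated with two hypothetical structs holding the same members. The compiler inserts padding so that each member meets its alignment requirement; declaring members from largest to smallest usually reduces that padding. Exact sizes depend on the compiler and ABI, so the sketch compares sizes rather than assuming specific numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Members declared in arbitrary order: padding is typically inserted
   after tag and after flag so that value and count are aligned. */
struct record_unsorted {
    char   tag;
    double value;
    char   flag;
    int    count;
};

/* The same members, declared from largest to smallest. */
struct record_sorted {
    double value;
    int    count;
    char   tag;
    char   flag;
};

/* Bytes occupied by the members themselves, without any padding. */
#define PAYLOAD_BYTES (sizeof(double) + sizeof(int) + 2 * sizeof(char))

/* Expected to hold on common ABIs: sorting by size does not grow
   the struct. */
static_assert(sizeof(struct record_sorted) <= sizeof(struct record_unsorted),
              "sorting members by size should not grow the struct");

size_t padding_unsorted(void)
{
    return sizeof(struct record_unsorted) - PAYLOAD_BYTES;
}

size_t padding_sorted(void)
{
    return sizeof(struct record_sorted) - PAYLOAD_BYTES;
}
```

For large arrays of such records, less padding means more records per cache line, which is the link back to the data cache miss events discussed earlier.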

16
Examples of Branch Prediction Issues
  • Order of the expressions in compound branch
    conditions
  • Order of operands in Logical expressions
  • Large switch statements with noncontiguous
    expressions
  • Large switch statements with cases ordered
    without regard to probability
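
The effect of expression order in a compound condition can be sketched with C's short-circuit `&&`: when the test most likely to be false comes first, the remaining tests, and the branches they compile to, are skipped for most inputs. The predicates and the evaluation counter below are hypothetical instrumentation added only to make the difference visible.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative instrumentation: counts how many individual tests ran. */
static unsigned long tests_evaluated;

/* True for roughly 1 in 100 inputs. */
static int is_rare(int x)
{
    tests_evaluated++;
    return x % 100 == 0;
}

/* True for every non-negative input. */
static int is_common(int x)
{
    tests_evaluated++;
    return x >= 0;
}

/* The almost-always-true test comes first, so the second test
   runs for nearly every element. */
size_t count_common_first(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_common(v[i]) && is_rare(v[i]))
            hits++;
    }
    return hits;
}

/* The usually-false test comes first, so short-circuit evaluation
   skips the second test for most elements. */
size_t count_rare_first(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_rare(v[i]) && is_common(v[i]))
            hits++;
    }
    return hits;
}
```

Both functions return the same count; only the number of tests executed differs. Whether this shows up as fewer mispredicted branches on a given processor is something the event-based profile has to confirm.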

17
Thread Analysis
  • Identifies threads in the target application
  • Shows Thread creation and termination
  • Monitors CPU affinity of each thread
  • Identifies Non-local memory access
  • Graphs thread activity on each CPU

18
Thread Analysis Data View
19
Pipeline Simulation Capabilities
  • CodeAnalyst can simulate a user specified block
    of code on AMD microprocessors and provide
    cycle-precise execution info.
  • Requirement
  • Defining a code block to simulate requires the
    user to provide debug info for the target module.
  • Limitation
  • Cannot simulate instructions inside system space
  • Cannot simulate multithreaded execution

20
Some Assumptions in the Simulator
  • Assumes perfect memory subsystem
  • All Load/Store Micro-ops hit in the Data Cache
  • Assumes that 1 misaligned load = 2 back-to-back
    aligned loads (64-bit)
  • Assumes no cache bank conflicts
  • 100% instruction cache hit rate
  • Assumes perfect branch prediction
  • Assumes all schedulers are of infinite size

21
Pipeline Data View
22
CodeAnalyst Simulation Analysis
  • User specifies Simulation configuration
  • User sets trace point start, trace point end, and
    trace trigger
  • Pipeline Data View
  • Pipeline stage
  • Penalty
  • Dependency
  • Delta completion
  • IPC
  • User can view Simulation History

23
Call to Action
  • Download CodeAnalyst
  • Improve Your Software!

24
Additional Resources
  • Web Resources at
  • http://www.developwithamd.com
  • Download CodeAnalyst
  • Software Optimization Guide for AMD Athlon 64 and
    AMD Opteron
  • AMD64 Architecture Programmer's Manual Volume 1
    Application Programming
  • AMD64 Architecture Programmer's Manual Volume 2
    System Programming
  • AMD64 Architecture Programmer's Manual Volume 3
    General-Purpose and System Instructions
  • AMD64 Architecture Programmer's Manual Volume 4
    128-Bit Media Instructions
  • AMD64 Architecture Programmer's Manual Volume 5
    64-Bit Media and x87 Floating-Point Instructions
  • http://www.devx.com
  • Optimizing Your C/C++ Applications, Part 1 & 2
  • Whitepapers
  • Porting and Optimizing Applications on 64-bit
    Windows for AMD64 Architecture, WinHEC 2004 paper
    by Mike Wall