Title: Software Performance Analysis Using CodeAnalyst for Windows
1 Software Performance Analysis Using CodeAnalyst for Windows
- Sherry Hurwitz
- SW Applications Manager, SRD, sherry.hurwitz_at_amd.com
- Advanced Micro Devices
- Lei Yu
- Member Technical Staff, SRD, lei.yu_at_amd.com
- Advanced Micro Devices
2 Session Outline
- Exploiting Performance Opportunities
- Obvious Performance Potential
- Hidden Performance Potential
- Exposing Untapped Performance Potential
- Analyzing Performance Improvement Trials
- AMD CodeAnalyst Performance Analysis Tool
- Capabilities of CodeAnalyst
- Functionality of CodeAnalyst
- Profile Capabilities
- Thread Analysis
- Pipeline Simulation
3 Obvious Performance Potential
- Processor Architecture
- x64 Processors
- Extended Memory Addressing
- Additional Registers
- Deeper Execution Pipeline
- Multi-Core Processors
- Multiprocessing for the desktop system
- Multiple processor platforms
- 64-bit Windows operating systems
- Compiler optimization switches
- Optimized libraries (for example AMD ACML)
4 Hidden Performance Potential
- Efficient algorithms
- Cache-friendly memory access
- Branch-prediction-friendly conditionals
- Parallel work through Threads
- Object Synchronization
5 Expose Untapped Performance Potential
- Profile your application with the AMD CodeAnalyst Performance Analyzer
- Timer-based sampling - identify time-consuming or frequently executed code, possibly pointing to algorithm issues (Hot Spots)
- Opteron and Athlon 64 processor performance events - evaluate the application's use of architectural features
- Thread View - evaluate effective use of multiple processors
- Pipeline Simulation - understand how data dependencies can stall the processor execution
- Iterate between profiling and code modifications, testing whether each change brings a performance benefit
6 Analyzing Performance Improvement Trials
7 Capabilities of AMD CodeAnalyst
- CodeAnalyst CAN
- Assist in optimizing your application
- Identify program bottlenecks
- Monitor and Analyze software performance
- CodeAnalyst CANNOT
- Identify defects in your program
- (Profile a functioning stable application.)
- CodeAnalyst RUNS ON
- WinNT, Win2K, WinXP, and 64-bit Windows operating systems
8 Key Functionality of AMD CodeAnalyst
- Profiling
- Timer-based sampling
- Event-based sampling
- Thread analysis
- Execution Pipeline Simulation
9 Profile Capabilities
- Low overhead system-wide profile
- Timer-based profile
- 0.1 ms resolution on APIC enabled systems
- 1.0 ms resolution on APIC disabled systems
- Event-based profile
- 32 AMD Athlon and AMD Athlon XP performance events
- 78 AMD Opteron and AMD Athlon 64 performance events
- Simultaneously profile up to 4 user-selected performance events
- Profiles multiple-processor systems with up to 16 processor cores
10 Profile Analysis
- Identifies all active Process Names, Process IDs, Thread IDs
- Identifies the Process CPU affinity
- Identifies performance event per CPU
- Maps sample addresses to Process, Module, Function, Source Line, Assembly Instruction, Code Byte
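The mapping from sample addresses to functions can be pictured as a lookup in a sorted table of address ranges. The following is a minimal sketch of that idea, not CodeAnalyst's implementation: the `Symbol` record, `find_symbol`, and `attribute_samples` are hypothetical names, and the table is assumed to be sorted by start address with non-overlapping ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record: a function occupies [start, start + size). */
typedef struct {
    uintptr_t   start;
    size_t      size;
    const char *name;
} Symbol;

/* Binary search over symbols sorted by ascending start address.
   Returns the index of the symbol containing addr, or -1 if none does. */
int find_symbol(const Symbol *syms, size_t count, uintptr_t addr)
{
    assert(syms != NULL || count == 0);
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < syms[mid].start) {
            hi = mid;                       /* addr lies below this symbol */
        } else if (addr - syms[mid].start >= syms[mid].size) {
            lo = mid + 1;                   /* addr lies above this symbol */
        } else {
            return (int)mid;                /* addr falls inside this symbol */
        }
    }
    return -1;
}

/* Adds one hit to hits[i] for every sample that falls inside syms[i].
   Returns the number of samples that matched no symbol. */
size_t attribute_samples(const Symbol *syms, size_t count,
                         const uintptr_t *samples, size_t nsamples,
                         unsigned *hits)
{
    size_t unknown = 0;
    for (size_t i = 0; i < nsamples; ++i) {
        int idx = find_symbol(syms, count, samples[i]);
        if (idx >= 0) {
            hits[idx]++;
        } else {
            unknown++;
        }
    }
    return unknown;
}
```

The per-symbol hit counts are what a sample distribution chart would plot; the same lookup could be repeated at module or source-line granularity with a different table.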
11 Hierarchical Navigation of Data Views
- System Data View
- System Graph View
- Module Data View
- Module Graph View
- Source View
- Disassembly View
- Demo will show the details of each of these views and the navigation between the views.
12 Timer-based Profiling - the First Level of Analysis
- Exposes areas of intense activity
- Identifies the most likely suspects
- Provides a sample distribution chart
- Ability to drill down through several data views
- View the source code on and around the sample
- Algorithmic issues may be evident from the hot spot code
- Hot spot code might suggest particular events to profile in the next level of analysis
13 Common Hot Spots
- Loops
- Large content and large loop counts are natural hot spots but not bad for performance
- Small content with small fixed loop counts should be unrolled
- Remove redundant constant calculations from inner loops, including from inner control structures
- Long Logical Expressions in If Statements
- Long data dependent expressions
- Complicated Floating Point expressions
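Two of the loop items above can be sketched in C. This is an illustration under assumptions, not code from the session: the function names and the `gain`/`bias` parameters are hypothetical, and an optimizing compiler may already perform both transformations, so profile before and after.

```c
#include <assert.h>
#include <stddef.h>

/* Before: gain * 2.0 + bias does not depend on i, yet it is written
   inside the loop and may be recomputed on every iteration. */
void scale_naive(double *out, const double *in, size_t n,
                 double gain, double bias)
{
    assert(n == 0 || (out != NULL && in != NULL));
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * (gain * 2.0 + bias);
    }
}

/* After: the loop-invariant expression is computed once, outside the loop. */
void scale_hoisted(double *out, const double *in, size_t n,
                   double gain, double bias)
{
    assert(n == 0 || (out != NULL && in != NULL));
    const double factor = gain * 2.0 + bias;
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}

/* Small body, small fixed trip count: a 3-element dot product as a loop. */
double dot3_loop(const double a[3], const double b[3])
{
    double sum = 0.0;
    for (int i = 0; i < 3; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* The same computation fully unrolled: no loop counter, no loop branch. */
double dot3_unrolled(const double a[3], const double b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

Both pairs compute the same values; only the amount of per-iteration work differs.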
14 Event-based Profile - Second Level of Analysis
- Useful Events to Identify Memory Issues
- Data Cache Accesses and Data Cache Misses simultaneously - use the ratio of Misses to Accesses
- Count Misaligned Data References
- Useful Events to Identify Branching Issues
- Retired mispredicted branches and Retired taken branches - use the ratio of mispredicted to taken branches
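The two ratios named above are plain quotients of event counts. A minimal sketch, assuming the counts have already been read from an event-based profile; the function names are hypothetical and no threshold for a "bad" ratio is implied.

```c
#include <stdint.h>

/* Quotient of two event counts; returns 0.0 when the denominator is zero
   so that a code region with no samples does not divide by zero. */
double event_ratio(uint64_t numerator, uint64_t denominator)
{
    if (denominator == 0) {
        return 0.0;
    }
    return (double)numerator / (double)denominator;
}

/* Data cache misses per data cache access. */
double dcache_miss_ratio(uint64_t misses, uint64_t accesses)
{
    return event_ratio(misses, accesses);
}

/* Retired mispredicted branches per retired taken branch. */
double branch_mispredict_ratio(uint64_t mispredicted, uint64_t taken)
{
    return event_ratio(mispredicted, taken);
}
```

Comparing these ratios across functions or source lines, rather than reading raw counts, keeps a frequently executed but well-behaved region from looking like a problem.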
15 Examples of Memory Issues
- Large data structures with variable size members not sorted by size
- Use of pointer notation in manipulating large data arrays
- Dereferenced pointer arguments inside a function
- Large declarations of local variables declared randomly with respect to size
- Memory buffers shared between threads
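The first item can be made concrete with two layouts of the same members. This is a sketch with hypothetical struct and member names; exact sizes depend on the compiler and ABI, though on a typical x86-64 ABI the first layout occupies 32 bytes and the second 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Members declared without regard to size: the compiler inserts padding
   so that each member meets its alignment requirement. */
struct Unsorted {
    uint8_t  flag;
    uint64_t id;
    uint8_t  kind;
    uint32_t count;
    uint16_t tag;
};

/* The same members declared from largest to smallest. */
struct Sorted {
    uint64_t id;
    uint32_t count;
    uint16_t tag;
    uint8_t  flag;
    uint8_t  kind;
};

/* Bytes saved per element by sorting the members by size. */
size_t bytes_saved_per_element(void)
{
    return sizeof(struct Unsorted) - sizeof(struct Sorted);
}
```

In a large array, a smaller element size means more elements per cache line, which is where the layout change would be expected to show up in the data cache miss ratio.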
16 Examples of Branch Prediction Issues
- Order of the expressions in compound branch conditions
- Order of operands in Logical expressions
- Large switch statements with noncontiguous expressions
- Large switch statements with cases out of order with respect to probability
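The effect of expression order in a compound condition can be sketched by counting how many tests actually run. The predicates `is_common` and `is_rare` and the global `evaluations` counter are hypothetical. The sketch counts evaluated tests only, so whether a reordering reduces mispredictions on a real workload is something to confirm with an event-based profile.

```c
#include <assert.h>
#include <stddef.h>

/* Counts predicate calls so the two orderings can be compared. */
unsigned long evaluations = 0;

/* Hypothetical predicates for non-negative inputs: is_common() holds for
   99 of every 100 values, is_rare() for 1 of every 100. */
int is_common(int x) { ++evaluations; return x % 100 != 0; }
int is_rare(int x)   { ++evaluations; return x % 100 == 7; }

/* The test that is usually true comes first, so && rarely short-circuits
   and both tests run for most inputs. */
size_t count_common_first(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i) {
        if (is_common(v[i]) && is_rare(v[i])) {
            ++hits;
        }
    }
    return hits;
}

/* The test that is usually false comes first, so && short-circuits
   for most inputs and the second test is skipped. */
size_t count_rare_first(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i) {
        if (is_rare(v[i]) && is_common(v[i])) {
            ++hits;
        }
    }
    return hits;
}
```

Both functions return the same count; resetting `evaluations` before each call shows how many tests each ordering executed.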
17 Thread Analysis
- Identifies threads in the target application.
- Shows Thread creation and termination
- Monitors CPU affinity of each thread
- Identifies Non-local memory access
- Graphs thread activity on each CPU
18 Thread Analysis Data View
19 Pipeline Simulation Capabilities
- CodeAnalyst can simulate a user-specified block of code on AMD microprocessors and provide cycle-precise execution info.
- Requirement
- Defining a code block to simulate requires the user to provide debug info for the target module.
- Limitation
- Cannot simulate instructions inside system space
- Cannot simulate multi-threaded execution
20 Some Assumptions in the Simulator
- Assumes perfect memory subsystem
- All Load/Store Micro-ops hit in the Data Cache
- Assumes that 1 misaligned load = 2 back-to-back aligned loads (64-bit)
- Assumes no cache bank conflicts
- 100% Instruction cache hit rate
- Assumes perfect branch prediction
- Assumes all schedulers are of infinite size
21 Pipeline Data View
22 CodeAnalyst Simulation Analysis
- User specifies Simulation configuration
- User sets Trace Point Start, Trace Point End, and trace trigger
- Pipeline Data View
- Pipeline stage
- Penalty
- Dependency
- Delta completion
- IPC
- User can view Simulation History
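Pipeline simulation is presented earlier as a way to understand how data dependencies can stall execution. The sketch below shows the kind of code block one might mark for simulation and one possible rewrite. The function names are hypothetical, and the sketch makes no claim about the cycle counts or IPC the simulator would report for either version.

```c
#include <assert.h>
#include <stddef.h>

/* Single accumulator: every addition needs the result of the previous one,
   so the additions form one long dependency chain. */
double sum_chain(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}

/* Four accumulators: additions into different accumulators are independent
   of each other, giving four shorter chains instead of one long one. */
double sum_split(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i) {        /* leftover elements when n is not a multiple of 4 */
        s0 += v[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

Splitting the sum changes the order of floating-point additions, so results can differ in the last bits for inputs that are not exactly representable; that trade-off is part of deciding whether to keep the rewrite.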
23 Call to Action
- Download CodeAnalyst
- Improve Your Software!
24 Additional Resources
- Web Resources at
- http://www.developwithamd.com
- Download CodeAnalyst
- Software Optimization Guide for AMD Athlon 64 and AMD Opteron
- AMD64 Architecture Programmer's Manual Volume 1: Application Programming
- AMD64 Architecture Programmer's Manual Volume 2: System Programming
- AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions
- AMD64 Architecture Programmer's Manual Volume 4: 128-Bit Media Instructions
- AMD64 Architecture Programmer's Manual Volume 5: 64-Bit Media and x87 Floating-Point Instructions
- http://www.devx.com
- Optimizing Your C/C++ Applications, Part 1 & 2
- Whitepapers
- Porting and Optimizing Applications on 64-bit Windows for AMD64 Architecture, WinHEC 2004 paper by Mike Wall