Title: Low Overhead Program Monitoring and Profiling
1. Low Overhead Program Monitoring and Profiling
Naveen Kumar, Bruce Childers
- Department of Computer Science
- University of Pittsburgh
- Pittsburgh, Pennsylvania 15260
- naveen, childers_at_cs.pitt.edu
Mary Lou Soffa
- Department of Computer Science
- University of Virginia
- Charlottesville, Virginia 22904
- soffa_at_virginia.edu
2. Introduction
- Program instrumentation: insertion of additional code into a program
  - Monitor program behavior or gather information
  - Can be inserted at the source, intermediate, or binary level
- Applications
  - Detect program invariants [Ernst]
  - Dynamic slicing [Zhang]
  - Software testing [Misurda]
  - Software security checks [Scott]
3. Running Example
- Consider a software security system that monitors the memory behavior of untrusted programs (e.g., DynamoRIO)
  - Instrumentation at the binary instruction level
  - Instrument all loads and stores
- The program can be instrumented statically as well as dynamically
4. Static instrumentation
Application code:
- ro1 ro1 << 10
- ro1 ro1 0x228
- ro0 ro2 << 0x14
- rl4 ro0 << 0x14
- Mrl0 0x10 ro2
- Mro1 0x228 ro0
- ri4 ro1
- rl1 ro0
- jmp r31
-
- Mrl0 0x20 ro0
- rsp rsp -112
- ro0 ro0 << 10
- ro1 Mro0 0x3d0
Inserted jumps:
- jmp probe1
- jmp probe2
- jmp probe3
- jmp probe4
Probe code:
- probe1: call secure()
- probe2: call secure()
- probe3: call secure()
- probe4: call secure()
probe1 in detail:
- Mrsp -20 rl0
- save
- call save_gp_regs
- ro0 Mrsp 0x68
- ro0 ro0 0x10
- call secure
- ro1 rg0 1
- call restore_gp_regs
- restore
- rsp rsp 124
- Mrl0 0x10 ro2
- jmp probe1_ret
Example from gzip. Instrumentation is performed before execution starts.
5. Dynamic instrumentation
Application code:
- ro1 ro1 << 10
- ro1 ro1 0x228
- ro0 ro2 << 0x14
- rl4 ro0 << 0x14
- Mrl0 0x10 ro2
- Mro1 0x228 ro0
- ri4 ro1
- rl1 ro0
- jmp r31
-
- Mrl0 0x20 ro0
- rsp rsp -112
- ro0 ro0 << 10
- ro1 Mro0 0x3d0
Inserted jumps:
- jmp probe1
- jmp probe2
- jmp probe3
- jmp probe4
Probe code:
- probe1: call secure()
- probe2: call secure()
- probe3: call secure()
- probe4: call secure()
Instrumentation is performed at run-time on code that executes. More powerful than static instrumentation, and possibly less expensive.
6. Motivation
- Stumbling block: high overhead
  - Slowdown by an order of magnitude or more [Ernst]
- Existing solutions: user guided
  - Sampling [Arnold]
  - Smaller data sets analyzed (Test data set of SPEC instead of Ref) [Mock]
  - Less aggressive uses, especially in dynamic settings [Duesterwald]
  - User has to decide how best to apply instrumentation
- What is needed are automatic techniques to mitigate the overheads systematically
7. Goals
- Gather exact information
  - Separate accuracy from efficiency
  - User should focus on what to gather, rather than how to gather it efficiently
- Efficient
  - Comparable to hand-optimized instrumentation
- Automatic
  - No or little user guidance
8. Instrumentation Optimization
- Costs associated with instrumentation
  - Dynamic probe count: number of probes executed
  - Probe cost: number of instructions in a probe
  - Payload cost: frequency of invocation and cost of the payload
- Optimize instrumentation code to reduce costs
  - Dynamic probe coalescing
  - Partial context switches
  - Partial payload inlining
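The three costs listed on this slide can be combined into a rough overhead estimate. The sketch below is illustrative only: the additive model and every name in it (ProbeStats, instrumentation_overhead, the field names) are assumptions, not something the slides define.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-probe statistics; field names are illustrative only. */
typedef struct {
    unsigned long exec_count;     /* times the probe executed (dynamic probe count) */
    unsigned long probe_instrs;   /* instructions in the probe itself (probe cost) */
    unsigned long payload_calls;  /* payload invocations per probe execution */
    unsigned long payload_instrs; /* instructions per payload invocation */
} ProbeStats;

/* Estimated extra instructions executed because of instrumentation,
 * under the assumed additive model. */
unsigned long instrumentation_overhead(const ProbeStats *p, size_t n)
{
    unsigned long total = 0;
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += p[i].exec_count *
                 (p[i].probe_instrs + p[i].payload_calls * p[i].payload_instrs);
    }
    return total;
}
```

Under this model, each optimization named on the slide targets one factor: coalescing lowers exec_count, partial context switches lower probe_instrs, and partial payload inlining lowers the effective payload cost.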
9. Base Instrumenter
Application code:
- ro1 ro1 << 10
- ro1 ro1 0x228
- ro0 ro2 << 0x14
- rl4 ro0 << 0x14
- Mrl0 0x10 ro2
- Mro1 0x228 ro0
- ri4 ro1
- rl1 ro0
- jmp r31
-
- Mrl0 0x20 ro0
- rsp rsp -112
- ro0 ro0 << 10
- ro1 Mro0 0x3d0
Inserted jumps:
- jmp probe1
- jmp probe2
- jmp probe3
- jmp probe4
Probe code:
- probe1: call secure()
- probe2: call secure()
- probe3: call secure()
- probe4: call secure()
The base instrumenter generates a list of instrumentation points.
10. Dynamic Probe Coalescing
Initial probes:
- probe1: call secure()
- probe2: call secure()
- probe3: call secure()
- probe4: call secure()
After coalescing:
- probe5: call secure(), call secure()
- probe3: call secure()
- probe4: call secure()
After further coalescing:
- probe6: call secure(), call secure(), call secure()
Application code:
- ro1 ro1 << 10
- ro1 ro1 0x228
- ro0 ro2 << 0x14
- rl4 ro0 << 0x14
- Mrl0 0x10 ro2
- Mro1 0x228 ro0
- ri4 ro1
- rl1 ro0
- jmp r31
-
- Mrl0 0x20 ro0
- rsp rsp -112
- ro0 ro0 << 10
- ro1 Mro0 0x3d0
Inserted jumps, as shown while the probes are coalesced:
- jmp probe1
- jmp probe5
- jmp probe2
- jmp probe3
- jmp probe6
- jmp probe4
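The coalescing step can be sketched as merging neighboring instrumentation points into one probe that makes several payload calls. This is a minimal sketch under stated assumptions: the slides do not give the merge criterion, so a hypothetical region id is used as the grouping key, and the names InstrPoint, Probe, and coalesce_probes are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLS 8  /* illustrative cap on payload calls per probe */

/* Hypothetical instrumentation point: where it is and which region it is in. */
typedef struct {
    unsigned long addr;  /* address of the instrumented instruction */
    int region;          /* assumed grouping key, e.g. a code fragment id */
} InstrPoint;

/* A probe makes one payload call per instrumentation point it covers. */
typedef struct {
    int region;
    size_t ncalls;
    unsigned long addrs[MAX_CALLS];
} Probe;

/* Merge consecutive points of the same region into a single probe.
 * Returns the number of probes written to out (capacity must be >= n). */
size_t coalesce_probes(const InstrPoint *ips, size_t n, Probe *out)
{
    size_t nprobes = 0;
    assert(ips != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        Probe *last = nprobes ? &out[nprobes - 1] : NULL;
        if (last && last->region == ips[i].region && last->ncalls < MAX_CALLS) {
            last->addrs[last->ncalls++] = ips[i].addr;  /* extend current probe */
        } else {
            out[nprobes].region = ips[i].region;        /* start a new probe */
            out[nprobes].ncalls = 1;
            out[nprobes].addrs[0] = ips[i].addr;
            nprobes++;
        }
    }
    return nprobes;
}
```

With four points of which the first three share a region, the result is two probes: one with three payload calls and one with a single call, which is the same shape as probe6 and probe4 on the slide.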
11. Partial Context Switch
Probes:
- probe6: call secure(), call secure(), call secure()
- probe4: call secure()
Application code:
- ro1 ro1 << 10
- ro1 ro1 0x228
- ro0 ro2 << 0x14
- rl4 ro0 << 0x14
- Mrl0 0x10 ro2
- Mro1 0x228 ro0
- ri4 ro1
- rl1 ro0
- jmp r31
-
- Mrl0 0x20 ro0
- rsp rsp -112
- ro0 ro0 << 10
- ro1 Mro0 0x3d0
probe6 in detail:
- Mrsp -20 rl0
- Mrsp -28 ro1
- save
- call save_gp_regs
- (effective address)
- call secure
- (effective address)
- call secure
- (effective address)
- call secure
- call restore_gp_regs
- restore
- jmp probe6_ret
Inserted jumps:
- jmp probe6
- jmp probe4
Optimization:
- Analyze register usage in the payload
- Remove spill and reload of GP registers
- Registers not used in the payload: g0-g7
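The register-usage analysis can be sketched with bit sets: only registers that the payload actually uses need a save/restore pair around it. This is a hedged sketch, not the system's implementation. The bit layout (g0-g7 as bits 0-7, o0-o7 as bits 8-15) and the names RegSet, partial_save_set, and list_saves are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t RegSet;  /* bit r set => register number r is in the set */

/* Registers that still need a save/restore pair: only those the
 * payload uses out of the set a full context switch would save. */
RegSet partial_save_set(RegSet full_save_set, RegSet payload_used)
{
    return full_save_set & payload_used;
}

/* Write the register numbers in save_set to out; returns how many. */
int list_saves(RegSet save_set, int *out, int cap)
{
    int n = 0;
    for (int r = 0; r < 32; r++) {
        if (save_set & ((RegSet)1 << r)) {
            assert(n < cap);  /* caller must provide enough room */
            out[n++] = r;
        }
    }
    return n;
}
```

If the payload touches only two output registers, the save set shrinks to those two and no global register is spilled, which corresponds to dropping the save_gp_regs and restore_gp_regs calls from the probe.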
12. Partial Payload Inlining
probe6 in detail:
- Mrsp -20 rl0
- Mrsp -28 ro1
- rsp rsp -140
- (effective address)
- call secure
- (effective address)
- call secure
- (effective address)
- call secure
- rsp rsp 140
- jmp probe6_ret
Application code:
- ro1 ro1 << 10
- ro1 ro1 0x228
- ro0 ro2 << 0x14
- rl4 ro0 << 0x14
- Mrl0 0x10 ro2
- Mro1 0x228 ro0
- ri4 ro1
- rl1 ro0
- jmp r31
-
- Mrl0 0x20 ro0
- rsp rsp -112
- ro0 ro0 << 10
- ro1 Mro0 0x3d0
Payload source:
    void secure(address) {
        if (address > REDZONE) return;
        redAlerts++;
        createReport();
        if (critical(address)) assert(address);
    }
Inlined part of the payload:
- ro1 Mrg10
- ro1 ro1 - ro0
- ri0 1
- jmp r31
- ro3 Mrg2 0
- ro3 ro3 1
- !call createReport
- !call assert
- call __full_secure
Resulting functions:
- void __inlined_secure(address), which calls __full_secure(address, tag)
- void __full_secure(address, tag)
Inserted jumps:
- jmp probe6
- jmp probe4
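The split between an inlined part and an out-of-line remainder can be sketched in C as follows. This is an approximation under stated assumptions: the REDZONE value, the critical() rule, and the counters standing in for createReport() and assert(address) are all hypothetical, and the slide's tag parameter is omitted because its meaning is not given.

```c
#include <assert.h>
#include <stdint.h>

#define REDZONE ((uintptr_t)0x1000)  /* assumed boundary; the slides give no value */

static unsigned long red_alerts;     /* mirrors the slide's redAlerts counter */
static unsigned long reports;        /* stand-in for the effect of createReport() */
static unsigned long critical_hits;  /* stand-in for the slide's assert(address) */

/* Hypothetical helper: which addresses count as critical is an assumption. */
static int critical(uintptr_t address) { return address == 0; }

/* Out-of-line remainder of the payload, reached only on the rare path. */
static void full_secure(uintptr_t address)
{
    assert(address <= REDZONE);  /* the inlined check has already filtered the rest */
    reports++;
    if (critical(address))
        critical_hits++;
}

/* Inlined part: the common-case check plus the counter update. */
static inline void inlined_secure(uintptr_t address)
{
    if (address > REDZONE)
        return;                  /* common case: no call is made */
    red_alerts++;
    full_secure(address);
}
```

The design choice is that the frequently executed comparison costs only a few instructions inside the probe, while the expensive reporting code is paid for only when an access falls at or below the red zone.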
13. Implementation
- Strata dynamic translation system [Scott et al.]
  - Generates code at run-time for an application
  - Suitable for dynamic instrumentation
- FIST base instrumentation system [Kumar et al.]
  - Flexible for diverse instrumentation needs
  - Generates a list of instrumentation points (IPs)
- INS-OP, developed in this work
  - Constructs an IR for the list of IPs obtained from FIST
  - Each optimization is a pass that modifies the IR
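The pass structure described for INS-OP can be sketched as a list of functions applied in order to a shared IR. Everything below is hypothetical: the slides do not show the IR, so the IPNode fields, the toy pass bodies, and the names IR, Pass, and run_passes are assumptions chosen only to show the shape of a pass pipeline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IR node for one instrumentation point. */
typedef struct {
    unsigned long addr;  /* instrumented instruction address */
    int probe_id;        /* probe that serves this point */
    int full_context;    /* 1 = save/restore all registers around the payload */
    int inline_payload;  /* 1 = part of the payload is inlined into the probe */
} IPNode;

typedef struct {
    IPNode *nodes;
    size_t count;
} IR;

typedef void (*Pass)(IR *ir);

/* Toy passes: each one only rewrites fields of the IR. */
static void pass_coalesce(IR *ir)
{
    for (size_t i = 1; i < ir->count; i++)
        ir->nodes[i].probe_id = ir->nodes[0].probe_id;  /* toy rule: share one probe */
}

static void pass_partial_context(IR *ir)
{
    for (size_t i = 0; i < ir->count; i++)
        ir->nodes[i].full_context = 0;
}

static void pass_inline_payload(IR *ir)
{
    for (size_t i = 0; i < ir->count; i++)
        ir->nodes[i].inline_payload = 1;
}

/* Apply the passes in order; later passes see the earlier passes' changes. */
void run_passes(IR *ir, const Pass *passes, size_t npasses)
{
    assert(ir != NULL && passes != NULL);
    for (size_t i = 0; i < npasses; i++)
        passes[i](ir);
}
```

Keeping each optimization as a separate pass over one IR means passes can be enabled, disabled, or reordered without touching the base instrumenter that produced the list of IPs.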
14. Case Studies
- Case study 1: Program profiling
  - Lightweight instrumentation application
  - Lower initial overhead implies smaller benefits
  - Demonstrates efficacy of the optimizations in an unfavorable scenario
- Case study 2: Memory simulation
  - Relatively heavy-weight instrumentation application
  - Can compare with state-of-the-art systems to see the benefits of optimization
15. Case study 1: Program profiling
- The benefit of optimization varies depending upon the initial overhead
- The speedups range from 1.26 to 2.63
16. Case study 2: Memory Simulation
- Strata-Embra is a SPARC implementation of the cache simulator from SimOS
- Strata-Embra-Opt is the optimized cache simulator using INS-OP
- INS-OP speeds up the fastest cache simulator we could find by a factor of 2 to 3.3
17. Conclusions
- Introduced instrumentation optimization to reduce the cost of instrumented code
  - Reduce the probe count
  - Reduce the cost of an individual probe
  - Reduce the cost of the payload
- Speedups between 1.2 and 3.3 times
- More detailed information gathering
  - Accuracy need not be sacrificed for efficiency
- Feasibility of certain applications
  - Run-time monitoring becomes more feasible
  - Example: applications that perform continuous testing
18. Effectiveness of optimizations