Title: iDNA: Time Travel Debugging
1. Instruction-level Tracing: Framework & Applications
Sanjay Bhansali
Binary Technologies Group
Center for Software Excellence (CSE)
Microsoft
11/04/2005
2. Context
- Program analysis and transformation technology can have a huge impact on the engineering of software.
- Center for Software Excellence
  - Part of Windows Core OS Division
  - Balance research on innovation with focus on deployment
- Binary Technologies Group
  - Binary analysis
  - Static and dynamic approaches
3. Outline
- Applications of Execution Traces
- Dynamic Translation
- Trace Capture
- Trace Replay
- Applications
- Related Work
- Summary
4. Applications of Execution Traces
- Debugging
- Regression Analysis
- Bug detection
- Coverage Analysis
- Optimization
- Impact analysis
- Usage analysis
5. Run Once, Analyze Many
- Complete instruction-level trace
- Deterministic, full-fidelity replay of user-mode execution
- Pros
  - Run once, analyze multiple times
- Cons
  - Trace size, performance
6. Framework for Instruction-level Tracing and Analysis
- Task and machine independent
- User-mode processes
- Modest overhead (space and time)
- On-demand tracing
- Reduce engineering effort for building analysis tools
7. Dynamic Binary Translation
- Runtime interpretation/translation of binary instructions
- Pros
  - Requires no static instrumentation or special symbol information
  - Handles dynamically generated code and self-modifying code
- Cons
  - Approximately 5x slower than native execution
8. Nirvana Architecture
[Diagram. User mode: Nirvana Client, Nirvana API, Code Cache, JIT translator, Application, VM monitor. Kernel mode: Nirvana Driver, Operating System.]
9. JIT Translation Example
Native code:
    mov eax, ebp
Translated code:
    mov EDX, tls.ebp
    mov EAX, EDX
10. JIT Translation Example
Native code:
    mov eax, ebp
Translated code:
    mov EDX, tls.ebp
    mov ECX, tls
    call MemReadCallback
    mov EAX, EDX
11. Code Cache Management
- Single code cache
  - Contention, locality
- Per-thread code cache
  - Code bloat
- P + δ code caches, where
  - P = number of processors
  - Reuse code caches when possible
  - Fall back on interpretation
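The bounded-pool idea on this slide can be sketched as follows. This is a minimal illustration, not Nirvana's implementation: the class name `CodeCachePool`, the `slack` parameter (a small number of caches beyond the processor count), and the `-1` return value that signals "interpret instead" are all assumptions made for the example, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool holding a fixed number of code caches:
// one per processor plus a small slack.
class CodeCachePool {
public:
    CodeCachePool(std::size_t processors, std::size_t slack)
        : in_use_(processors + slack, false) {}

    // Returns the index of a free code cache, or -1 when all caches are
    // taken and the caller should fall back on interpretation.
    int acquire() {
        for (std::size_t i = 0; i < in_use_.size(); ++i) {
            if (!in_use_[i]) {
                in_use_[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Hands a cache back so that another thread can reuse it.
    void release(int index) {
        assert(index >= 0 && static_cast<std::size_t>(index) < in_use_.size());
        in_use_[static_cast<std::size_t>(index)] = false;
    }

    std::size_t capacity() const { return in_use_.size(); }

private:
    std::vector<bool> in_use_;  // true while a thread owns the cache
};
```

A thread would call `acquire()` when it is scheduled, translate into the cache it receives, and interpret instructions directly whenever it gets `-1`.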
12. Self-modifying Code
- Snoop on system calls to flush hardware cache
- Watch page protection of code bytes
  - Mark page if non-writable, and flush code cache on page protection change
  - Insert self-mod instruction check otherwise
- Fall back on interpretation if too many code cache flushes
13. Nirvana API
- RegisterEventCallback(event, callback)
- Events
  - Translation
  - InstructionStart
  - MemRead
  - MemWrite
  - FlowChange
  - Sequencing
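To make the callback model concrete, here is a small stand-alone sketch of an event registry. Only the name `RegisterEventCallback` and the event names come from the slide; the `Event` enum, the `EventInfo` payload, the `EventRegistry` class, and `Dispatch` are hypothetical stand-ins and do not reflect the real Nirvana signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Event names from the slide; the enum itself is an assumption.
enum class Event { Translation, InstructionStart, MemRead, MemWrite, FlowChange, Sequencing };

// Hypothetical payload passed to every callback.
struct EventInfo {
    const void* address;  // memory address the event refers to
    std::size_t bytes;    // number of bytes touched
};

using Callback = std::function<void(const EventInfo&)>;

class EventRegistry {
public:
    // Adds a callback to the list kept for the given event.
    void RegisterEventCallback(Event event, Callback callback) {
        callbacks_[event].push_back(std::move(callback));
    }

    // Runs every callback registered for the event; returns how many ran.
    std::size_t Dispatch(Event event, const EventInfo& info) const {
        auto it = callbacks_.find(event);
        if (it == callbacks_.end()) return 0;
        for (const Callback& cb : it->second) cb(info);
        return it->second.size();
    }

private:
    std::map<Event, std::vector<Callback>> callbacks_;
};
```

In this model the translator would call `Dispatch(Event::MemRead, ...)` at each emulated memory read, which is the point where a client such as a memory-read logger gets control.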
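14. Example Nirvana Client
    /* Memory Read Logger */
    bool Initialize()
    {
        if (!InitializeNirvanaClient())
            return false;   /* assumed: give up if client setup fails */
        RegisterCallback(MemReadEvent, MemCallback);
        return true;
    }

    /* Called on every memory read: log the instruction pointer, address, and size. */
    void MemCallback(NirvContext *ctx, void *pAddr, int nBytes)
    {
        X86REGS *pRegs = (X86REGS *) ctx->cpuRegs;
        Log(pRegs->InstructionPtr(), pAddr, nBytes);
    }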
15. Tracing & Replay Overview
[Diagram. Record Process: Application, Nirvana (Emulation), Trace Writer. Trace Log, which can be moved between different machines. Playback Process: Trace Reader, Nirvana (Replay), Debugger with backward (<<) and forward (>>) controls, Defect.]
16. Trace Writer
- Log only what cannot be regenerated by the processor
  - Values read from memory
  - Values changed by kernel
  - Machine- and time-sensitive instructions (cpuid, rdtsc)
- Everything else can be regenerated
- Trace size is 4-5 bytes per instruction
17. Optimization: Trace select reads
- Observation: Hardware caches eliminate most off-chip reads
- Use same trick to optimize logging
  - Have logger and replayer simulate identical cache memories
  - Only log cache misses
- Average trace size is <1 bit per instruction
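The log-only-misses idea can be sketched as a writer and a reader that keep identical simulated caches. Everything below is illustrative: the names `TraceWriter`, `TraceReader`, and `LogEntry` are used here only as labels, the cache is an unbounded map rather than a realistic fixed-size cache, and the on-disk encoding is ignored. Each log entry stores how many reads were predicted since the previous entry, so the reader knows which read the logged value belongs to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct LogEntry {
    std::size_t predicted_before;  // reads served by the cache since the previous entry
    std::uint32_t value;           // value the cache failed to predict
};

using Cache = std::unordered_map<std::uint64_t, std::uint32_t>;

class TraceWriter {
public:
    // Called for every memory read the traced program performs.
    void onRead(std::uint64_t address, std::uint32_t value) {
        auto it = cache_.find(address);
        if (it != cache_.end() && it->second == value) {
            ++predicted_;                     // hit: nothing to log
            return;
        }
        log_.push_back({predicted_, value});  // miss: log the value
        predicted_ = 0;
        cache_[address] = value;
    }
    // Writes are regenerated during replay, so they only update the cache.
    void onWrite(std::uint64_t address, std::uint32_t value) { cache_[address] = value; }
    const std::vector<LogEntry>& log() const { return log_; }

private:
    Cache cache_;
    std::vector<LogEntry> log_;
    std::size_t predicted_ = 0;
};

class TraceReader {
public:
    explicit TraceReader(std::vector<LogEntry> log) : log_(std::move(log)) {}

    // Returns the value the recorded run observed for this read.
    std::uint32_t onRead(std::uint64_t address) {
        if (next_ < log_.size() && consumed_ == log_[next_].predicted_before) {
            std::uint32_t value = log_[next_].value;  // this read was a logged miss
            cache_[address] = value;
            ++next_;
            consumed_ = 0;
            return value;
        }
        ++consumed_;                // this read was predicted by the cache
        return cache_.at(address);
    }
    void onWrite(std::uint64_t address, std::uint32_t value) { cache_[address] = value; }

private:
    Cache cache_;
    std::vector<LogEntry> log_;
    std::size_t next_ = 0;      // next log entry to consume
    std::size_t consumed_ = 0;  // predicted reads seen since the last entry
};
```

Because both sides apply the same update rules to the same sequence of reads and writes, the reader's cache always matches the writer's, and a value changed behind the program's back (for example by the kernel) shows up as a miss and is logged.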
18. Example
- The only read that is not predicted, and is therefore logged, is the one that follows the system call
19. Sequence points & Checkpoints
[Diagram labels: Kernel/User, User/Kernel, lock xadd, Kernel/User, Exception, Module Load]
- Tracing uses per-thread streams for performance
- Sequence points used to impose partial order on instruction executions across threads
- Checkpoint frames for random access into the trace (every 5 million instructions)
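One way to picture how sequence points order per-thread streams is a merge by a shared counter. The sketch below assumes each sequence point carries a value drawn from a global, monotonically increasing counter; that counter, the `SequencePoint` and `OrderedPoint` structs, and `mergeStreams` are assumptions for illustration, not the trace format itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct SequencePoint {
    std::uint64_t sequence;  // value of the assumed global counter
    std::string event;       // e.g. "lock xadd", "Exception"
};

using ThreadStream = std::vector<SequencePoint>;  // sorted by sequence

struct OrderedPoint {
    std::size_t thread;   // index of the stream the point came from
    SequencePoint point;
};

// Merges the per-thread streams into one list ordered by sequence number.
std::vector<OrderedPoint> mergeStreams(const std::vector<ThreadStream>& streams) {
    std::vector<std::size_t> cursor(streams.size(), 0);
    std::vector<OrderedPoint> merged;
    for (;;) {
        std::size_t best = streams.size();  // sentinel: no stream chosen yet
        for (std::size_t t = 0; t < streams.size(); ++t) {
            if (cursor[t] == streams[t].size()) continue;  // stream exhausted
            if (best == streams.size() ||
                streams[t][cursor[t]].sequence < streams[best][cursor[best]].sequence) {
                best = t;
            }
        }
        if (best == streams.size()) break;  // every stream is exhausted
        merged.push_back({best, streams[best][cursor[best]]});
        ++cursor[best];
    }
    return merged;
}
```

The order is only partial: instructions that fall between two sequence points on different threads are not ordered relative to each other by this merge.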
20. Trace Writer Performance
Application | Simulated instructions (millions) | Trace file size | Trace file bits/instruction | Native execution time | Execution time while tracing | Execution overhead (x native)
Gzip        | 24,097 | 245 MB  | 0.09 | 11.7 s  | 187 s  | 15.98
Excel       | 1,781  | 99 MB   | 0.47 | 18.2 s  | 105 s  | 5.76
PowerPoint  | 7,392  | 528 MB  | 0.60 | 43.6 s  | 247 s  | 5.66
IE          | 116    | 5 MB    | 0.50 | 0.499 s | 6.94 s | 13.90
Vulcan      | 2,408  | 152 MB  | 0.53 | 2.74 s  | 46.6 s | 17.01
Satsolver   | 9,431  | 1300 MB | 1.16 | 9.78 s  | 127 s  | 12.98
21. Trace Reader - Replay
- Nirvana requests code and data via the Fetch operations
- TraceReader uses same prediction cache as TraceWriter
[Diagram labels: Nirvana, Instruction Fetch, Data Fetch, Data Read, Data Write, Prediction Cache, Miss, Trace Log]
22. Trace Reader - Navigation
[Diagram: trace timeline with positions numbered 1-8, marking the Current Position, the Destination, and a Checkpoint Frame]
- Navigation involves going back to the closest checkpoint frame before the destination and executing forward to the destination from there.
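The navigation rule above amounts to a lookup followed by a forward replay. The following is a minimal sketch, assuming checkpoint frames are identified by the instruction index at which they were taken; `NavigationPlan` and `planNavigation` are hypothetical names, and a checkpoint located exactly at the destination is treated as usable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct NavigationPlan {
    std::uint64_t checkpoint;      // instruction index of the frame to restore
    std::uint64_t replay_forward;  // instructions to re-execute from that frame
};

// `checkpoints` holds the sorted instruction indices of all checkpoint frames
// and must contain at least one entry at or before `destination`.
NavigationPlan planNavigation(const std::vector<std::uint64_t>& checkpoints,
                              std::uint64_t destination) {
    // First checkpoint strictly after the destination...
    auto after = std::upper_bound(checkpoints.begin(), checkpoints.end(), destination);
    assert(after != checkpoints.begin());
    // ...so the one just before it is the closest usable frame.
    std::uint64_t checkpoint = *(after - 1);
    return {checkpoint, destination - checkpoint};
}
```

With one frame every 5 million instructions, a jump to any position replays fewer than 5 million instructions, regardless of where the current position is.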
25. Time Travel Debugging
- Examine a program as it runs backwards to figure out the root cause of a problem.
  - Reverse breakpoint
  - Step back
  - Search backwards in time
- Used to diagnose bugs in shipped products
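The slides do not say how a reverse breakpoint is implemented. One plausible approach on top of forward-only replay, shown here purely as an assumption, is to replay one checkpoint window at a time, working backwards from the current position, and keep the last hit seen in each window. The trace is modeled as a plain vector of instruction addresses, and `reverseBreakpoint` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the last instruction before `current` whose address
// equals `breakpoint`, or -1 if the breakpoint was never hit.
// Checkpoints are assumed to sit at every multiple of `checkpoint_interval`.
long long reverseBreakpoint(const std::vector<std::uint64_t>& executed_pcs,
                            std::size_t current,
                            std::uint64_t breakpoint,
                            std::size_t checkpoint_interval) {
    assert(checkpoint_interval > 0 && current <= executed_pcs.size());
    std::size_t end = current;  // exclusive upper bound of the window
    while (end > 0) {
        // Closest checkpoint strictly before `end`.
        std::size_t start = ((end - 1) / checkpoint_interval) * checkpoint_interval;
        long long last_hit = -1;
        for (std::size_t i = start; i < end; ++i) {  // forward replay of one window
            if (executed_pcs[i] == breakpoint) last_hit = static_cast<long long>(i);
        }
        if (last_hit >= 0) return last_hit;
        end = start;  // no hit here: try the previous window
    }
    return -1;
}
```

Step back and backward search can be expressed the same way, since each reduces to "find the latest position before the current one that satisfies a condition".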
26. Truscan: Defect Detection Tool
- Scan traces for bugs that hide
  - memory leaks
  - dangling pointers
  - uninitialized memory
- Report bugs that really happen: no false positives
- Debug with time travel debugging
27. Example: Memory Leak Detection
Tracked allocation: ADDR 0x3004, SIZE 42, REFCOUNT
    eax = HeapAlloc(42)    ; REFCOUNT 1 (eax)
    mov 0x4004, eax        ; REFCOUNT 2 (eax, 0x4004)
    eax = 0                ; REFCOUNT 1 (0x4004)
    mov 0x4004, 0          ; REFCOUNT 0: Leak!!
- This example is trivial, but
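The reference-counting walk-through on this slide can be replayed with a small detector. This is a simplified sketch, not Truscan itself: `LeakDetector` and its methods are hypothetical, locations are plain strings, and only exact pointers to the start of a block are tracked.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

class LeakDetector {
public:
    // A new heap block whose address was returned into `location`.
    void onAlloc(std::uint64_t block, const std::string& location) {
        refcount_[block] = 0;
        onStore(location, block);
    }

    // `location` (a register or memory address) now holds `value`.
    void onStore(const std::string& location, std::uint64_t value) {
        auto old = holds_.find(location);
        if (old != holds_.end()) {
            if (old->second == value) return;  // same pointer rewritten
            std::uint64_t previous = old->second;
            holds_.erase(old);
            release(previous);                 // one reference is gone
        }
        if (refcount_.count(value)) {          // value points at a tracked block
            holds_[location] = value;
            ++refcount_[value];
        }
    }

    // A freed block is no longer tracked, so it can never be reported.
    void onFree(std::uint64_t block) {
        refcount_.erase(block);
        for (auto it = holds_.begin(); it != holds_.end();) {
            if (it->second == block) it = holds_.erase(it); else ++it;
        }
    }

    const std::vector<std::uint64_t>& leaks() const { return leaks_; }

private:
    // Drops one reference; an unfreed block with no references is a leak.
    void release(std::uint64_t block) {
        auto it = refcount_.find(block);
        if (it != refcount_.end() && --it->second == 0) leaks_.push_back(block);
    }

    std::map<std::uint64_t, int> refcount_;          // block -> live references
    std::map<std::string, std::uint64_t> holds_;     // location -> block it points to
    std::vector<std::uint64_t> leaks_;
};
```

Feeding it the slide's four steps (allocate into `eax`, copy to `0x4004`, clear `eax`, clear `0x4004`) drives the count 1, 2, 1, 0 and reports block `0x3004` at the last store.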
28. Statistics
- A Windows application (under development)
  - 600 million instructions
  - 80,000 allocations
  - 30 million pointers
  - 48 leaks (8 unique bugs)
- Native: 9 seconds
- Trace: 44 seconds
- Analyze: 41 minutes
- (3 GHz, single threaded, 1 GB RAM)
29. Regression Analysis
[Diagram: grid of applications (App 1, ...) against OS versions (OS1, OS2) with cells marked "?", and TraceDiff]
30. Related Work
- Process Virtualization
  - DynamoRIO, Mojo, DELI, ReVirt, Valgrind
- Instrumentation
  - ATOM, Vulcan, SHADE, Pin
- Trace Compression
  - VPC
- Reverse Debugging
  - ReVirt, Traceback, BugNet, Flashback, FDR
- Program/Trace Diffing Applications
  - Zeller, Zhang & Gupta
31. Summary
- Flexible framework for instruction-level tracing and analysis
  - Complete full-fidelity traces
  - Run once, analyze multiple times
  - Reasonable overhead
- Many useful applications
  - Debugging, defect detection, optimization, ...
32. Shameless self promotion?!
- Hiring for internships and full-time positions at all levels
- Contact: sanjaybh_at_microsoft.com
33. Questions