Title: Client-Driven Pointer Analysis
1Client-Driven Pointer Analysis
- Samuel Z. Guyer
- Calvin Lin
- June 2003
2Using Pointer Analysis
- Pointer analysis is not a stand-alone analysis
- Supports other client analyses
- Current practice
- Client analysis: we'll focus on error detection
- Pointer analysis: choosing the algorithm chooses the precision
[Diagram: Pointer Analyzer, Client Analysis (Error Detector), Memory Model]
3Motivation
- Real-life scenario
- Check for security vulnerabilities in the BlackHole mail filter
- Manually inspect reported errors
- One thing in common: a string processing routine
- Clone the procedure: ad hoc context sensitivity
- Using CIFI, all 85 false positives go away
- Can we automate this process?
[Diagram: Pointer Analyzer, Error Detector, Memory Model]
4Our solution
- Problems
- Cost-benefit tradeoff is severe for pointer analysis
- Precision choices are too coarse
- Choice is made a priori by the compiler writer
- Solution: mixed-precision analysis
- Apply higher precision where it's needed
- Use cheap analysis elsewhere
- Key: let the needs of the client drive precision
- Customized precision policy created during analysis
5Client-Driven Pointer Analysis
- Algorithm
- Start with a fast, cheap analysis: flow-insensitive (FI) and context-insensitive (CI)
- Monitor how imprecision causes information loss
- Adapt: reanalyze with a customized precision policy
[Diagram: Pointer Analyzer, Client Analysis, Memory Model]
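The analyze, monitor, adapt cycle can be sketched as a small driver loop. This is a minimal sketch, not the authors' implementation: `client_driven`, `toy_analyze`, the `("CS", name)` policy entries, and the procedure name `copy_string` are all hypothetical, and repeating the cycle until the policy stops changing is an assumption of this example.

```python
def client_driven(analyze, max_rounds=10):
    """Repeat the analysis, refining the precision policy until it stabilizes.

    `analyze(policy)` is a hypothetical callback returning (errors, fixes):
    the reported errors and the precision fixes suggested by the monitor.
    """
    policy = frozenset()              # start fast: everything CI and FI
    errors = []
    for _ in range(max_rounds):
        errors, fixes = analyze(policy)
        new_policy = policy | frozenset(fixes)
        if new_policy == policy:      # nothing left to refine
            break
        policy = new_policy           # adapt, then reanalyze
    return errors, policy


def toy_analyze(policy):
    # Toy stand-in for the real analysis: the false positive disappears
    # once the (hypothetical) procedure copy_string is context-sensitive.
    if ("CS", "copy_string") in policy:
        return [], set()
    return ["possible remote access at execl"], {("CS", "copy_string")}


errors, policy = client_driven(toy_analyze)
```

The first round reports one error and one suggested fix; the second round runs with that fix applied and reports nothing.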
6Overview
- Motivation
- Our algorithm
- Automatically discover what the client needs
- Experiments
- Real programs and challenging error detection problems
- Related work and conclusions
7False Positives
Remote access vulnerability
sock = socket(AF_INET, SOCK_STREAM, 0);
read(sock, buffer, 100);
execl(buffer);
8Client-Driven Pointer Analysis
[Diagram: Pointer Analyzer, Client Analysis, Memory Model]
9Analysis framework
- Iterative dataflow analysis
- Pointer analysis: flow values are points-to sets
- Client analysis: flow values form a typestate lattice
- Fine-grained precision policies
- Context sensitivity: per procedure
- CS: clone or inline procedure invocation
- CI: merge values from all call sites
- Flow sensitivity: per memory location
- FS: build factored use-def chains
- FI: merge all assignments into a single flow value
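The difference between CI and CS for a single procedure can be shown on a toy program. This is a minimal sketch under stated assumptions: the procedure `ident`, the variables, and the `analyze` function are hypothetical and model only how return values reach each call site, not a full pointer analysis.

```python
from collections import defaultdict

# Hypothetical program: ident(p) returns its argument and is called twice.
#   x = ident(&a);   y = ident(&b);
calls = [("x", "ident", "a"), ("y", "ident", "b")]   # (result, callee, target)


def analyze(calls, context_sensitive):
    """Return points-to sets for the result variables.

    `context_sensitive` is the set of procedures analyzed per call site (CS);
    every other procedure merges values from all of its call sites (CI).
    """
    merged = defaultdict(set)                 # one CI summary per procedure
    for _, callee, target in calls:
        merged[callee].add(target)

    points_to = {}
    for result, callee, target in calls:
        if callee in context_sensitive:
            points_to[result] = {target}              # CS: this call site only
        else:
            points_to[result] = set(merged[callee])   # CI: all call sites
    return points_to


ci = analyze(calls, context_sensitive=set())
cs = analyze(calls, context_sensitive={"ident"})
```

Under CI both `x` and `y` may point to either `a` or `b`; making only `ident` context-sensitive separates them, which is the kind of per-procedure choice a fine-grained policy allows.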
10Client-Driven Pointer Analysis
[Diagram: Pointer Analyzer, Client Analysis, Memory Model]
11Monitor
- Runs alongside main analysis
- Monitors information loss
- Detects polluting assignments
- Merge two accurate flow values → ambiguous value
- Tracks complicit assignments
- Passing an ambiguous value from one variable to another
- Records in a dependence graph
- For both pointer and client analyses
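A polluting assignment can be detected at the point where flow values are merged. The sketch below assumes a flat lattice with a single `AMBIGUOUS` value; the function names, variable names, and diagnosis strings are hypothetical and only illustrate the idea.

```python
AMBIGUOUS = "ambiguous"   # stand-in for "could be more than one value"


def merge(a, b):
    """Merge two flow values in a flat lattice (None = no information yet)."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else AMBIGUOUS


def monitored_merge(var, a, b, diagnosis, log):
    """Merge, and record a polluting assignment when accuracy is lost.

    The merge is polluting only if both inputs were accurate and the result
    is not; passing along an already ambiguous value is complicit instead.
    """
    result = merge(a, b)
    accurate_inputs = AMBIGUOUS not in (a, b)
    if result == AMBIGUOUS and accurate_inputs:
        log.append((var, diagnosis))
    return result


log = []
v1 = monitored_merge("buffer", "Local", "Remote", "make copy_string CS", log)
v2 = monitored_merge("path", v1, "Local", "make wrapper CS", log)
v3 = monitored_merge("mode", "Local", "Local", "unused", log)
```

Only the first merge is logged: the second one starts from a value that is already ambiguous, and the third loses nothing.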
12Dependence graph (I)
- Polluting assignment
- Add a node for the variable; annotate it with a diagnosis
- Complicit assignment
- Add an edge from left side back to right side
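The two rules above map onto a small graph structure. This is a minimal sketch; the class, its method names, and the diagnosis string `"CS:copy_string"` are assumptions made for illustration.

```python
class DependenceGraph:
    def __init__(self):
        self.diagnoses = {}   # variable -> set of suggested fixes
        self.edges = {}       # variable -> variables its ambiguity came from

    def polluting(self, var, diagnosis):
        """Add a node for `var`, annotated with a diagnosis."""
        self.diagnoses.setdefault(var, set()).add(diagnosis)
        self.edges.setdefault(var, set())

    def complicit(self, lhs, rhs):
        """Record `lhs = rhs`: an edge from the left side back to the right."""
        self.edges.setdefault(lhs, set()).add(rhs)
        self.edges.setdefault(rhs, set())


g = DependenceGraph()
g.polluting("buffer", "CS:copy_string")   # values merged at a CI procedure
g.complicit("path", "buffer")             # path = buffer passes the value on
```

Pointing edges backward, from the assigned variable to its source, lets a later traversal walk from an error back to the assignments that caused the imprecision.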
13Dependence graph (II)
[Diagram: dependence graph example]
14Adaptor
[Diagram: Dependence Graph]
- After analysis...
- Start at the "maybe error" variables
- Find all reachable nodes; collect the diagnoses
- Often a small subset of all imprecision
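The adaptor step is a graph reachability query. The sketch below assumes the dependence graph is stored as two plain dictionaries; the variable names and diagnosis strings are hypothetical.

```python
def collect_diagnoses(edges, diagnoses, error_vars):
    """Gather the diagnoses on every node reachable from the error variables."""
    seen, stack, fixes = set(), list(error_vars), set()
    while stack:
        var = stack.pop()
        if var in seen:
            continue
        seen.add(var)
        fixes |= diagnoses.get(var, set())
        stack.extend(edges.get(var, ()))
    return fixes


# Hypothetical graph: `path` got its ambiguous value from `buffer`;
# `counter` is imprecise too, but no reported error depends on it.
edges = {"path": {"buffer"}, "buffer": set(), "counter": set()}
diagnoses = {"buffer": {"CS:copy_string"}, "counter": {"FS:counter"}}

fixes = collect_diagnoses(edges, diagnoses, error_vars=["path"])
```

The imprecision on `counter` is never reached, so it costs nothing in the next round; this is why the collected fixes are often a small subset of all imprecision.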
15In action...
- Monitor analysis
- Polluting assignments
- Diagnose and apply fix
- In this case: make one procedure context-sensitive
- Reanalyze
16Programs
- 18 real C programs
- Unmodified source: all the issues of production code
- Many are system tools run in privileged mode
- Representative examples
Name       Description    Priv  Lines of code  Procedures  CFG nodes
muh        IRC proxy      yes   5K (25K)       84          5,191
blackhole  E-mail filter  yes   12K (244K)     71          21,370
wu-ftpd    FTP daemon     yes   22K (66K)      205         23,107
named      DNS server     yes   26K (84K)      210         25,452
nn         News reader    no    36K (116K)     494         46,336
17Methodology
- 5 typestate error checkers
- Represent non-trivial program properties
- Stress the pointer analyzer
- Compare client-driven with fixed-precision
- Goals
- First, reduce number of errors reported
- Conservative analysis: fewer is better
- Second, reduce analysis time
18Results
[Chart: results for the remote access vulnerability client; callout: 10X]
19Why it works
Number of procedures made context-sensitive, by client:
Name  Total procs  Remote Access  File Access  FSV  RFSV  FTP
muh 84 6
apache 313 8 2 2 10
blackhole 71 2 5
wu-ftpd 205 4 4 17
named 210 1 2 1 4
cfengine 421 4 1 3 31
nn 494 2 1 1 30
- Notice
- Different clients have different precision requirements
- Amount of extra precision is small
20Related work
- Pointer analysis and typestate error checking
- Iterative flow analysis [Plevyak & Chien 94]
- Demand-driven pointer analysis [Heintze & Tardieu 01]
- Combined pointer analysis [Zhang, Ryder, Landi 98]
- Effects of pointer analysis precision [Hind 01 and others]
- More precision is more costly
- Does it help? Is it worth the cost?
21Conclusions
- Client-driven pointer analysis
- Precision should match the client and program
- Not all pointers are equal
- Need fine-grained precision policies
- Key: knowing where to add more precision, and what kind
- Roadmap for scalability
- Use more expensive analysis on small parts of programs
23Time
24Precision policies
Name   Procedures context-sensitive   Variables flow-sensitive
Name   RA  File  FSV  RFSV  FTP       RA  File  FSV  RFSV  FTP
muh 6 0.1 0.07 0.31
apache 8 2 2 10 0.89 0.18 0.91 1.07 0.83
blackhole 2 5 0.24 0.04 0.32
wu-ftpd 4 4 17 0.63 0.09 0.51 0.53 0.23
named 1 2 1 4 0.14 0.01 0.23 0.20 0.42
cfengine 4 1 3 31 0.43 0.04 0.46 0.48 0.03
nn 2 1 1 30 1.82 0.17 1.99 2.03 0.97
26Error detection problems
- Remote access vulnerability: data from an Internet socket should not specify a program to execute
- File access: files must be open when accessed
- Format string vulnerability (FSV): format string may not contain untrusted data
- Remote FSV: check if FSV is remotely exploitable
- FTP behavior: can this program be tricked into reading and transmitting arbitrary files?
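The file access property is a typestate check: each handle carries a state, and an access is an error unless the state is Open. The sketch below follows one straight-line event trace only, which is a simplification; the operation names and the `check_file_access` function are assumptions for this example, and the analysis described in the talk computes such states over all paths with dataflow analysis.

```python
def check_file_access(events):
    """Report accesses to handles that are not in the Open state.

    `events` is a list of (operation, handle) pairs along a single path.
    """
    state, errors = {}, []
    for i, (op, handle) in enumerate(events):
        if op == "open":
            state[handle] = "Open"
        elif op == "close":
            state[handle] = "Closed"
        elif op in ("read", "write") and state.get(handle) != "Open":
            errors.append((i, op, handle))
    return errors


trace = [("open", "f"), ("read", "f"), ("close", "f"), ("read", "f")]
errors = check_file_access(trace)
```

The second read happens after the close, so it is the only access reported.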
27Annotations (I)
- Dependence and pointer information
- Describe pointer structures
- Indicate which objects are accessed and modified
procedure fopen(pathname, mode)
{
  on_entry { pathname --> path_string
             mode --> mode_string }
  access { path_string, mode_string }
  on_exit { return --> new file_stream }
}
28Annotations (II)
- Library-specific properties
- Dataflow lattices
property State : { Open, Closed } initially Open
property Kind : { File, Socket { Local, Remote } }
[Diagram: lattices for State (Open, Closed) and Kind (File, Socket; Local and Remote under Socket)]
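The nesting in the Kind property suggests a tree-shaped lattice in which Local and Remote refine Socket. The sketch below is one way to model merging in such a lattice; the `Any` top element, the `PARENT` table, and the merge rule (most specific common ancestor) are assumptions of this example, not a statement of how the annotation language defines it.

```python
# Parent links encode the nesting: Local and Remote refine Socket.
# "Any" is an assumed top element meaning "kind unknown".
PARENT = {"Local": "Socket", "Remote": "Socket",
          "Socket": "Any", "File": "Any", "Any": None}


def ancestors(value):
    """Return `value` followed by its increasingly general ancestors."""
    chain = []
    while value is not None:
        chain.append(value)
        value = PARENT[value]
    return chain


def merge(a, b):
    """Most specific value that covers both `a` and `b`."""
    covers_b = set(ancestors(b))
    return next(v for v in ancestors(a) if v in covers_b)
```

With this rule, merging Local with Remote still says the object is a Socket, while merging Local with File loses the kind entirely.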
29Annotations (III)
- Library routine effects
- Dataflow transfer functions
procedure socket(domain, type, protocol)
{
  analyze Kind {
    if (domain == AF_UNIX) IOHandle <- Local
    if (domain == AF_INET) IOHandle <- Remote
  }
  analyze State { IOHandle <- Open }
  on_exit { return --> new IOHandle }
}
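Read as a dataflow transfer function, the socket annotation maps the call's arguments to property values on the new object. A minimal Python rendering follows; the dictionary representation, the symbolic constants, and the fallback to `Socket` for other domains are assumptions of this sketch.

```python
AF_UNIX, AF_INET = "AF_UNIX", "AF_INET"   # symbolic stand-ins for the C constants


def socket_transfer(domain):
    """Return the properties of the IOHandle created by socket(domain, ...)."""
    handle = {"State": "Open"}            # analyze State: IOHandle <- Open
    if domain == AF_UNIX:
        handle["Kind"] = "Local"          # analyze Kind, first rule
    elif domain == AF_INET:
        handle["Kind"] = "Remote"         # analyze Kind, second rule
    else:
        handle["Kind"] = "Socket"         # assumption: other domains stay general
    return handle


remote = socket_transfer(AF_INET)
local = socket_transfer(AF_UNIX)
```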
30Annotations (IV)
- Reports and transformations
procedure execl(path, args)
{
  on_entry { path --> path_string }
  report if (Kind : path_string could-be Remote)
    "Error at " callsite ": remote access"
}

procedure slow_routine(first, second)
{
  when (condition)
    replace-with { quick_check(first)
                   fast_routine(first, second) }
}
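The `could-be` test in the report rule explains where false positives come from: it fires whenever Remote is among the possible values, even if imprecision put it there. The sketch below models the possible values as a set; the function names and the call-site label `main.c:42` are hypothetical.

```python
def could_be(possible_kinds, kind):
    """True if `kind` is among the values the object may have."""
    return kind in possible_kinds


def check_execl(callsite, path_string_kinds):
    """Mirror of the report rule: flag execl when its path could be Remote."""
    if could_be(path_string_kinds, "Remote"):
        return f"Error at {callsite}: remote access"
    return None


# Precise result: the path is known to be Local, so nothing is reported.
ok = check_execl("main.c:42", {"Local"})
# Imprecise result: a merge left both kinds possible, so a report is issued
# whether or not the program is really vulnerable.
flagged = check_execl("main.c:42", {"Local", "Remote"})
```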
31Type Theory
- Equivalent to dataflow analysis (heresy?)
- Different in practice
- Dataflow: flow-sensitive problems, iterative analysis
- Types: flow-insensitive problems, constraint solver
- Commonality
- No magic bullet: same cost for the same precision
- Extracting the store model is a primary concern
Remember Phil Wadler's talk?
32Is it correct?
- Three separate questions
- Are Sam Guyer's experiments correct?
- Yes, to the best of our knowledge
- Checked PLAPACK results
- Checked detected errors against known errors
- Is our compiler implemented correctly?
- Flip answer: whose is?
- Better answer: testing suites
- How do we validate a set of annotations?
33Annotation correctness
- Not addressed in my dissertation, but...
- Theoretical approach
- Does the library implement the domain?
- Formally verify annotations against implementation
- Practical approach
- Annotation debugger: interactive
- Automated assistance in early stages of development
- Middle approach
- Basic consistency checks
34Error Checking vs Optimization
Error checking
- Optimistic
- False positives allowed
- It can even be unsound
- Tend to be "may" analyses
- Correctness is absolute
- Black and white
- Certify programs bug-free
- Cost tolerant
- Explore costly analysis
Optimization
- Pessimistic
- Must preserve semantics
- Soundness mandatory
- Tend to be "must" analyses
- Performance is relative
- Spectrum of results
- No guarantees
- Cost sensitive
- Compile-time is a factor
35Complexity
- Pointer analysis
- Address-taken: linear
- Steensgaard: almost linear (log log n factor)
- Andersen: polynomial (cubic)
- Shape analysis: doubly exponential
- Dataflow analysis
- Intraprocedural: polynomial (height of lattice)
- Context-sensitivity: exponential (call graph)
- Rarely see worst-case
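To make the Andersen entry concrete, here is a minimal inclusion-based (Andersen-style) points-to solver. It is a sketch, not an efficient implementation: it re-applies every constraint until nothing changes rather than using a worklist, and the constraint encoding and the name `andersen` are assumptions of this example.

```python
from collections import defaultdict


def andersen(constraints):
    """Solve inclusion constraints by naive iteration to a fixed point.

    Each constraint is a (kind, lhs, rhs) triple:
      ("addr",  p, x)  for  p = &x
      ("copy",  p, q)  for  p = q
      ("load",  p, q)  for  p = *q
      ("store", p, q)  for  *p = q
    """
    pts = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in constraints:
            if kind == "addr":
                targets, new = [lhs], {rhs}
            elif kind == "copy":
                targets, new = [lhs], set(pts[rhs])
            elif kind == "load":
                targets = [lhs]
                new = set().union(*(pts[t] for t in list(pts[rhs])))
            else:  # store
                targets, new = list(pts[lhs]), set(pts[rhs])
            for t in targets:
                if not new <= pts[t]:
                    pts[t] |= new
                    changed = True
    return pts


# p = &a; q = &b; r = &p; *r = q; s = *r
pts = andersen([("addr", "p", "a"), ("addr", "q", "b"), ("addr", "r", "p"),
                ("store", "r", "q"), ("load", "s", "r")])
```

The solver is flow-insensitive and context-insensitive: statement order is ignored, so `p` ends up pointing to both `a` and `b`, and so does `s`.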
36Find the error part 3
- State-of-the-art compiler
struct __sue_23 *var_72;
struct __sue_25 *new_f = (struct __sue_25 *) malloc(sizeof (struct __sue_25));
_IO_no_init(&new_f->fp.file, 1, 0, ((void *) 0), ((void *) 0));
(&new_f->fp)->vtable = &_IO_file_jumps;
_IO_file_init(&new_f->fp);
if (_IO_file_fopen((struct __sue_23 *) new_f, filename, mode, is32) != ((void *) 0)) {
  var_72 = &new_f->fp.file;
  if ((var_72->_flags2 & 1) && (var_72->_flags & 8)) {
    if (var_72->_mode <= 0)
      ((struct __sue_23 *) var_72)->vtable = &_IO_file_jumps_maybe_mmap;
    else
      ((struct __sue_23 *) var_72)->vtable = &_IO_wfile_jumps_maybe_mmap;
    var_72->_wide_data->_wide_vtable = &_IO_wfile_jumps_maybe_mmap;
  }
}
if (var_72->_flags & 8192U)
  _IO_un_link((struct __sue_23 *) var_72);
if (var_72->_flags & 8192U)
  status = _IO_file_close_it(var_72);
else
  status = var_72->_flags & 32U ? -1 : 0;
((*(struct _IO_jump_t **) ((void *) (&((struct __sue_23 *) (var_72))->vtable)
    + (var_72)->_vtable_offset))->__finish)(var_72, 0);
if (var_72->_mode <= 0) {
  if (((var_72)->_IO_save_base != ((void *) 0)))
    _IO_free_backup_area(var_72);
}
if (var_72 != ((struct __sue_23 *) (&_IO_2_1_stdin_))
    && var_72 != ((struct __sue_23 *) (&_IO_2_1_stdout_))
    && var_72 != ((struct __sue_23 *) (&_IO_2_1_stderr_))) {
  var_72->_flags = 0;
  free(var_72);
}
bytes_read = _IO_sgetn(var_72, (char *) var_81, bytes_requested);
37Challenge 2 Scope
- Call graph
- Objects flow throughout program
- No scoping constraints
- Objects referenced through pointers
- We need whole-program analysis
[Diagram: call graph with the calls sock = socket(AF_INET, SOCK_STREAM, 0), read(sock, buffer, 100), and execl(ref)]
38The Broadway Compiler
- Broadway: source-to-source C compiler
- Domain-independent compiler mechanisms
- Annotations: lightweight specification language
- Domain-specific analyses and transformations
- Many libraries, one compiler
39Security vulnerabilities
- How does remote hacking work?
- Most are not direct attacks (e.g., cracking passwords)
- Idea: trick a program into unintended behavior
- Automated vulnerability detection
- How do we define "intended"?
- Difficult to formalize and check application logic
- Libraries control all critical system services
- Communication, file access, process control
- Analyze routines to approximate vulnerability
40End backup slides