Title: New Techniques in Static and Dynamic Analysis
Slide 1: New Techniques in Static and Dynamic Analysis
- Dr. Howard Trickey
- Bell Laboratories
- trickey@lucent.com
Slide 2: How can we check our code?
[Figure: ways to check code classified by two questions, "Use run-time inputs?" (yes/no) and "Use behavior specs?" (yes/no)]
Slide 3: Finding Errors
- Error types:
  - syntactic
  - type
  - coding style
  - corner cases (array bounds)
  - algorithm errors
- Caught by:
  - compilers (automatic, fast)
  - static and dynamic analyzers (automatic, slower)
  - verifiers (human + automated, slowest)
Slide 4: PART I: Static Analysis
Slide 5: Static Code Analysis
- Examine source code
- Look for (usually bad) properties:
  - uninitialized variable usage
  - NULL pointer dereferencing
  - out-of-bounds array access
  - portability problems
  - security problems
  - coding style
  - code complexity
- The first three are the "UNO" problems: Uninitialized variables, Nil (NULL) pointer dereferences, Out-of-bounds array indexing
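A small C illustration of the three UNO defect classes. This is a hypothetical sketch, not output from any of the tools listed here: each comment shows a defective pattern of the kind an analyzer reports, and each function (the names are invented for this example) shows a guarded version that avoids it.

```c
#include <stddef.h>

/* Hypothetical illustrations of the three UNO defect classes. Each comment
   shows a defective pattern; each function is a guarded version. */

/* U, uninitialized use.  Defective:  int sum; for (...) sum += a[i];  */
int sum_array(const int *a, size_t n) {
    int sum = 0;                      /* initialized before the first read */
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* N, NULL dereference.  Defective:  return *p;  when p may be NULL */
int deref_or_default(const int *p, int dflt) {
    return p != NULL ? *p : dflt;     /* test before dereferencing */
}

/* O, out-of-bounds access.  Defective:  for (i = 0; i <= n; i++) a[i] = 0;  */
void clear_array(int *a, size_t n) {
    for (size_t i = 0; i < n; i++)    /* '<' keeps i within a[0..n-1] */
        a[i] = 0;
}
```

Each defective form compiles without complaint, which is why these checks fall to an analyzer rather than the compiler.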
Slide 6: Some Static Analysis Tools
C/C++
Java
FlexeLint
Coverity
Flawfinder
Klocwork
Orion
Discover
PolySpace
CodeAssure
FindBugs
Klocwork
Prexis
Fortify
CodeSurfer
PREfast
Fortify
Orion
ITS4
DMS
C (only)
Splint
Smatch
Blast
Uno
CQual
MOPS
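Slide 7: Static Analysis Problem: False Errors
- False error: an error reported by the analyzer that is not in fact a latent error in the program

    int f(int x) {
        int y;
        if (x > 0) y = x;
        if (x > 3) x = y;
        return x;
    }

- A path-insensitive analyzer may warn that y is used uninitialized in x = y, but that statement runs only when x > 3, which implies x > 0, so y has already been assigned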
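[Figure: control-flow graph of f: a branch on (x <= 0) / (x > 0), with y = x on the (x > 0) side; then a branch on (x <= 3) / (x > 3), with x = y on the (x > 3) side; then return x]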
Slide 8: FlexeLint on sendmail
- collect.c 316 Warning 644 Variable 'df' (line 73) may not have been initialized
- deliver.c 4269 Warning 644 Variable 'buf' (line 3821) may not have been initialized
- deliver.c 5023 Warning 644 Variable 'mxprefs' (line 4874) may not have been initialized
- deliver.c 5146 Info 771 Symbol 'prefer' (line 5086) conceivably not initialized
- deliver.c 5150 Info 771 Symbol 'rndm' (line 5087) conceivably not initialized
- domain.c 298 Info 771 Symbol 'weight' (line 128) conceivably not initialized
- map.c 766 Info 771 Symbol 'status' (line 664) conceivably not initialized
- map.c 2241 Warning 644 Variable 'vsize' (line 2189) may not have been initialized
- map.c 7323 Info 771 Symbol 'v' (line 7276) conceivably not initialized
- mime.c 251 Warning 644 Variable 'argv' (line 117) may not have been initialized
- mime.c 660 Info 771 Symbol 'bt' (line 106) conceivably not initialized
False error rate: 72%
Slide 9: Orion
- A static analyzer from Bell Labs offering:
  - analysis of C, C++, Java (GCC parser)
  - tunable speed/precision (immediate vs. overnight, or continuous)
  - incremental interprocedural analysis
  - built-in and user-defined checks
  - aim at semantic errors (use-before-def rather than type mismatch)
  - concentrate on UNO errors first
  - detailed error traces (code view GUI)
Orion authors: Dennis Dams, Kedar Namjoshi, Nils Klarlund, Chris Conway
Emphasis: reduce the false error rate
Slide 10: Track More Dataflow Info?
- Data-flow analysis determines an approximation of all possible run-time states of a program
  - by exploring all possible execution paths, but keeping track of only a finite amount of information about data values
- Tracking more information (pointer aliasing, linear relations among integer variables, ...) increases precision, but at the cost of increased analysis time
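The precision loss can be seen on the false-error example f(x) shown earlier. The sketch below uses a hypothetical hand-built control-flow graph (not Orion's data structures) and runs a classic "definitely assigned" dataflow analysis: at a join point it keeps only facts that hold on every incoming edge and ignores branch conditions, so it reports a possible uninitialized use of y at x = y even though no execution can take that path.

```c
#include <stdio.h>

/* Hypothetical hand-built CFG of f():
     node 0: int y;        node 1: if (x > 0)    node 2: y = x;
     node 3: if (x > 3)    node 4: x = y;        node 5: return x;   */
enum { NNODES = 6 };
static const int assigns_y[NNODES] = { 0, 0, 1, 0, 0, 0 };
static const int uses_y[NNODES]    = { 0, 0, 0, 0, 1, 0 };
/* Predecessors of each node, terminated by -1. */
static const int preds[NNODES][3] = {
    { -1 }, { 0, -1 }, { 1, -1 }, { 1, 2, -1 }, { 3, -1 }, { 3, 4, -1 }
};

/* Must-analysis: in[n] is 1 iff y is assigned on every path reaching node n.
   Facts are merged with AND at joins; branch conditions are ignored. */
static int analyze(int in[NNODES]) {
    int out[NNODES];
    for (int n = 0; n < NNODES; n++) { in[n] = 1; out[n] = 1; }  /* optimistic start */
    for (int changed = 1; changed; ) {           /* iterate to a fixpoint */
        changed = 0;
        for (int n = 0; n < NNODES; n++) {
            int v = (n == 0) ? 0 : 1;            /* y is unassigned at entry */
            for (int k = 0; preds[n][k] != -1; k++)
                v = v && out[preds[n][k]];
            int o = v || assigns_y[n];
            if (v != in[n] || o != out[n]) changed = 1;
            in[n] = v;
            out[n] = o;
        }
    }
    int warnings = 0;
    for (int n = 0; n < NNODES; n++)
        if (uses_y[n] && !in[n]) {
            printf("node %d: y may be used uninitialized\n", n);
            warnings++;
        }
    return warnings;
}
```

The warning comes from node 3, where "y unassigned" (from the x <= 0 edge) meets "y assigned"; recording a relation between y's state and the value of x is the kind of extra information that would remove it, at the extra cost the slide mentions.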
Slide 11: Orion's Approach: 2-Phase Analysis
Parsing (GNU gcc) -> find potential error paths -> check feasibility of each path -> report errors and warnings
1. Find potential error paths: analysis with light-weight dataflow info
2. Check feasibility of each path: tunable tradeoff of speed vs. precision
   - fast, approximate internal solvers
   - slower, more precise external solvers
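The two phases can be sketched on the same example f. This is a minimal illustration under simplifying assumptions, not Orion's implementation: paths are enumerated explicitly as choices of branch outcomes, phase 1 flags any path that reads y without assigning it, and phase 2 uses a simple interval "solver" on x (x is not modified before either test) to discard infeasible paths.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical two-phase check of f(). A path is a pair of branch outcomes:
     b1: (x > 0), which runs y = x;     b2: (x > 3), which runs x = y;      */

typedef struct { long long lo, hi; } Interval;   /* possible values of x */

/* Phase 1: cheap check along the path, ignoring feasibility. */
static int potential_error(int b1, int b2) {
    return b2 && !b1;            /* y is read but was never assigned */
}

/* Phase 2: intersect the constraints the path places on x. */
static int feasible(int b1, int b2) {
    Interval x = { INT_MIN, INT_MAX };
    if (b1) { if (x.lo < 1) x.lo = 1; } else { if (x.hi > 0) x.hi = 0; }
    if (b2) { if (x.lo < 4) x.lo = 4; } else { if (x.hi > 3) x.hi = 3; }
    return x.lo <= x.hi;         /* empty interval means infeasible */
}

/* Returns the number of errors reported; *candidates gets the phase-1 count. */
static int check_f(int *candidates) {
    int reported = 0;
    *candidates = 0;
    for (int b1 = 0; b1 <= 1; b1++)
        for (int b2 = 0; b2 <= 1; b2++) {
            if (!potential_error(b1, b2)) continue;
            (*candidates)++;
            if (feasible(b1, b2)) {
                printf("path b1=%d b2=%d: use of uninitialized y\n", b1, b2);
                reported++;
            }
        }
    return reported;
}
```

Phase 1 finds one candidate path (x <= 0 and then x > 3); phase 2 finds its constraints unsatisfiable, so nothing is reported. A more expressive solver could replace the interval check where the precision setting calls for it.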
Slide 12: Orion on sendmail
- (-@precision 2)
- deliver.c:4269 (function deliver.c:putbody) use of un-initialized variable(s) buf
- domain.c:1070 (function domain.c:gethostalias) possible null pointer(s) p
- map.c:766 (function map.c:getcanonname) use of un-initialized variable(s) status
- mime.c:660 (function mime.c:mime8to7) use of un-initialized variable(s) bt
False error rate: 0%
Slide 13: Orion GUI
Slide 14: Orion on libxslt-1.1.9
1. CC="cco -@precision 2"
2. make -e
- namespaces.c:498 (function namespaces.c:xsltCopyNamespaceList) possible null pointer(s) node
- xsltutils.c:1138 (function xsltutils.c:xsltDefaultSortFunction) use of un-initialized variable(s) tempstype
- xsltutils.c:1143 (function xsltutils.c:xsltDefaultSortFunction) use of un-initialized variable(s) temporder
Slide 15: Developer's Feedback
------- Additional Comments From wbrack@mmm.com.hk 2004-10-23 12:42 -------
Good catch! The potential problem happens because the routine is meant to be used with or without a node specified. When the node is specified, additional checks are made to assure duplicates are not generated. When there is no node specified, however, those checks need to be bypassed.
I enhanced the code try to better avoid this problem - fixed code is in CVS.
Please take a look and see if you agree with the mod I made.
Thanks for the report.
Bill
------- You are receiving this mail because: -------
You reported the bug.
Slide 16: PART II: Dynamic Analysis
Slide 17: Dynamic Code Analysis
- Run executables on sample inputs
  - often have a binary-to-source code map
- Gather statistics, often correlated to source units:
  - time, memory, other resource usage
- Look for bad behavior:
  - uncaught exceptions
  - ASSERT failures
  - dynamic memory errors (failure to free, double free)
  - security problems
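Memory checkers of the kind listed on the next slide detect these errors by intercepting allocation. The sketch below is hypothetical (checked_malloc, checked_free and report_leaks are invented names; real tools usually replace malloc and free themselves rather than relying on callers to use wrappers): it keeps a table of live blocks, so a second free of the same pointer, or a block still live at exit, can be reported.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocation-tracking layer: a table of live blocks. */
#define MAX_BLOCKS 1024

static void *live[MAX_BLOCKS];   /* blocks currently allocated */
static int nlive;
static int double_frees;         /* frees of pointers not in the table */

void *checked_malloc(size_t size) {
    if (nlive == MAX_BLOCKS) return NULL;   /* table full: report failure */
    void *p = malloc(size);
    if (p != NULL) live[nlive++] = p;
    return p;
}

void checked_free(void *p) {
    if (p == NULL) return;                  /* free(NULL) is a no-op */
    for (int i = 0; i < nlive; i++)
        if (live[i] == p) {
            live[i] = live[--nlive];        /* drop from the table */
            free(p);
            return;
        }
    /* Not live: a double free, or a pointer malloc never returned.
       (A real tool would also quarantine freed addresses so that reuse
       by a later allocation cannot hide a double free.) */
    fprintf(stderr, "checked_free: invalid or double free of %p\n", p);
    double_frees++;
}

/* Call at exit: every block still in the table was never freed. */
int report_leaks(void) {
    for (int i = 0; i < nlive; i++)
        fprintf(stderr, "leak: block %p never freed\n", live[i]);
    return nlive;
}
```

Because the wrapper refuses to pass an unknown pointer to free, the second free is reported instead of corrupting the heap.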
Slide 18: Some Dynamic Analysis Tools
C/C++
Mpatrol
DART
Checker
Many
Memory Interceptor Library
Purify
Electric Fence
Valgrind
VeriSoft
Other Rational Tools
Microsoft Application Verifier
Slide 19: VeriSoft
- Tool for systematic testing of concurrent systems
- VeriSoft:
  - automatically searches for coordination problems (deadlocks, assertion violations)
  - single framework for test generation/execution/evaluation
  - interactive graphic simulator can drive existing debuggers
  - can quickly reveal behaviors that are virtually impossible to detect using conventional testing techniques
  - source code for all of the tested program is not necessary
- VeriSoft is by Patrice Godefroid (Bell Labs)
Slide 20: VeriSoft's Approach
- Controls and observes execution of the concurrent processes of the system under test by intercepting system calls (communication, assertion violations, etc.)
- Systematically drives the system along all paths (automatically generates, executes, and observes many scenarios)
- From a given initial state, guarantees complete coverage of the state space up to some finite depth
[Figure: VeriSoft controlling system processes A, B, and C, exploring from initial state s0 and reaching a deadlock]
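Systematic exploration can be illustrated on a toy model. This is a hypothetical sketch, not VeriSoft's mechanism (VeriSoft controls real processes by intercepting their system calls): two modeled processes acquire locks A and B in opposite orders, and a depth-first search over every interleaving counts the executions that end with some process unfinished but no process able to step, i.e., a deadlock.

```c
/* Hypothetical model: each process runs a fixed list of lock operations. */
enum { NPROC = 2, NSTEPS = 4, NLOCKS = 2, LOCK_A = 0, LOCK_B = 1 };
typedef struct { int lock; int acquire; } Op;   /* acquire: 1 = lock, 0 = unlock */

static Op prog[NPROC][NSTEPS];
static int pc[NPROC];        /* next operation of each process */
static int owner[NLOCKS];    /* process holding each lock, or -1 */
static int deadlocks;        /* executions that ended in a deadlock */

/* A process can step if it has operations left and is not blocked on a held lock. */
static int enabled(int p) {
    if (pc[p] == NSTEPS) return 0;
    Op op = prog[p][pc[p]];
    return !op.acquire || owner[op.lock] == -1;
}

/* Depth-first search over every interleaving from the current state. */
static void dfs(void) {
    int progress = 0, finished = 1;
    for (int p = 0; p < NPROC; p++) {
        if (pc[p] < NSTEPS) finished = 0;
        if (!enabled(p)) continue;
        progress = 1;
        Op op = prog[p][pc[p]];
        int saved = owner[op.lock];
        owner[op.lock] = op.acquire ? p : -1;   /* execute p's next step */
        pc[p]++;
        dfs();
        pc[p]--;                                /* backtrack */
        owner[op.lock] = saved;
    }
    if (!progress && !finished) deadlocks++;    /* stuck before completion */
}

/* Explores all interleavings of p0 and p1; returns the deadlocking executions. */
static int explore(const Op *p0, const Op *p1) {
    for (int s = 0; s < NSTEPS; s++) { prog[0][s] = p0[s]; prog[1][s] = p1[s]; }
    for (int p = 0; p < NPROC; p++) pc[p] = 0;
    for (int l = 0; l < NLOCKS; l++) owner[l] = -1;
    deadlocks = 0;
    dfs();
    return deadlocks;
}
```

With opposite lock orders, the two interleavings in which each process takes its first lock before the other's second request both deadlock; with the same order in both processes, no interleaving does. Because these processes terminate, the search is complete without the depth bound VeriSoft uses for nonterminating systems.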
Slide 21: Example: Air Conditioner Controller

    #include <string.h>

    void AC_controller()
    {
        char *message;
        int is_room_hot = 0;     /* initially, room is not hot */
        int is_door_closed = 1;  /* and door is closed */
        int ac = 0;              /* so, ac is off */

        while (1) {
            /* rcv_from_queue(), to_me and QSZ come from the test environment (not shown) */
            message = (char *) rcv_from_queue(to_me, QSZ);
            if (strcmp(message, "room_is_hot") == 0)
                is_room_hot = 1;
            if (strcmp(message, "room_is_cool") == 0)
                is_room_hot = 0;
            if (strcmp(message, "open_door") == 0) {
                is_door_closed = 0;
                ac = 0;
            }
            if (strcmp(message, "close_door") == 0) {
                is_door_closed = 1;
                if (is_room_hot)
                    ac = 1;
            }
            /* test */
            if (is_room_hot && is_door_closed)
                VS_assert(ac);
        }
    }
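Slide 22: Test Harness

    #include <stdio.h>
    #include <stdlib.h>

    void Environment()
    {
        char *message;
        message = (char *) malloc(100);
        if (message == NULL)
            return;
        while (1) {
            /* VS_toss(3) returns 0, 1, 2 or 3; VeriSoft explores every choice */
            switch (VS_toss(3)) {
            case 0: sprintf(message, "room_is_cool"); break;
            case 1: sprintf(message, "room_is_hot"); break;
            case 2: sprintf(message, "open_door"); break;
            case 3: sprintf(message, "close_door"); break;
            }
            /* send_to_queue(), from_me and QSZ come from the test environment (not shown) */
            send_to_queue(from_me, QSZ, message);
        }
    }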
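For this particular controller, the search VeriSoft performs can be mimicked in a single process. The sketch below is a hypothetical stand-alone model, not VeriSoft code: it replaces the message queue with a direct call, treats each VS_toss(3) outcome as one of the four messages in the harness's order, and uses iterative deepening so that the first violation found is a shortest one.

```c
/* Hypothetical single-process model of AC_controller driven by Environment. */
#define MAX_DEPTH 16

typedef struct { int is_room_hot, is_door_closed, ac; } State;

/* Message numbers follow the harness: 0 room_is_cool, 1 room_is_hot,
   2 open_door, 3 close_door. One call = one iteration of the controller loop. */
static void step(State *s, int msg) {
    if (msg == 1) s->is_room_hot = 1;
    if (msg == 0) s->is_room_hot = 0;
    if (msg == 2) { s->is_door_closed = 0; s->ac = 0; }
    if (msg == 3) { s->is_door_closed = 1; if (s->is_room_hot) s->ac = 1; }
}

/* The property checked by VS_assert(ac) in the controller. */
static int violated(const State *s) {
    return s->is_room_hot && s->is_door_closed && !s->ac;
}

static int trace[MAX_DEPTH];   /* messages on the current search path */
static int trace_len;          /* length of the violating sequence found */

/* Depth-first search over all message sequences of length <= bound. */
static int dfs(State s, int depth, int bound) {
    if (violated(&s)) { trace_len = depth; return 1; }
    if (depth == bound) return 0;
    for (int msg = 0; msg <= 3; msg++) {      /* every VS_toss(3) outcome */
        State next = s;
        step(&next, msg);
        trace[depth] = msg;
        if (dfs(next, depth + 1, bound)) return 1;
    }
    return 0;
}

/* Iterative deepening; returns the length of a shortest violating sequence, or 0. */
static int search(int max_depth) {
    State init = { 0, 1, 0 };   /* room not hot, door closed, ac off */
    if (max_depth > MAX_DEPTH) max_depth = MAX_DEPTH;
    for (int bound = 1; bound <= max_depth; bound++)
        if (dfs(init, 0, bound)) return trace_len;
    return 0;
}
```

On this model the search stops at depth 1 with the message room_is_hot: the door starts closed, the controller sets is_room_hot without turning the AC on, and the VS_assert(ac) check fails.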
Slide 23: The VeriSoft Simulator
Slide 24: VeriSoft Users and Applications
- VeriSoft 2.0 available outside Lucent since 1999
  - 100s of licenses in 25 countries, in academia and industry
  - free download at http://www.bell-labs.com/projects/verisoft
- A number of Lucent success stories
  - e.g., several hard-to-find races in the 4ESS heartbeat monitor
- Factors that limit its use:
  - need to close the environment
  - need to put in ASSERTs for properties to test
Slide 25: DART
- DART: Directed Automated Random Testing
- New approach to automatic generation of unit tests
- Approach:
  - automatic identification of the interface
  - automatic generation of a random test driver for that interface
  - directed search
DART authors: Patrice Godefroid, Nils Klarlund, Koushik Sen
No need to write test driver or harness code!
Slide 26: Directed Search
- Dynamic analysis of a random execution, and generation of new test inputs to drive the next execution along an alternative execution path:
  - collect symbolic constraints at branch points (whenever possible)
  - negate one constraint at a branch point to take the other branch (say b)
  - call a constraint solver to generate new test inputs
  - drive the next execution with these new test inputs to take the alternative branch b
  - check with dynamic instrumentation that branch b is indeed taken
Slide 27: Example of Directed Search by DART

    #include <assert.h>

    int is_room_hot = 0;     /* initially, room is not hot */
    int is_door_closed = 0;  /* and door is open */
    int ac = 0;              /* so, ac is off */

    void process_message(int message)
    {
        if (message == 0)
            is_room_hot = 1;
        if (message == 1)
            is_room_hot = 0;
        if (message == 2) {
            is_door_closed = 0;
            ac = 0;
        }
        if (message == 3) {
            is_door_closed = 1;
            if (is_room_hot)
                ac = 1;
        }
        /* test */
        if (is_room_hot && is_door_closed && !ac)
            assert(0);  /* property violated */
    }
- First execution:
  - process_message(8782590)
  - process_message(-5278291)
  - next input sequence: 8782590, 3
- Second execution:
  - process_message(0)
  - process_message(0)
  - next input sequence: 0, 3
- ...
  - next input sequence: 3, 0
- Some execution:
  - process_message(3)
  - process_message(0)
  - assertion violated!
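The directed search can be sketched for this two-message example. This is a heavily simplified, hypothetical illustration, not DART itself: branches are instrumented by hand, every symbolic constraint has the form input == constant, the "solver" only picks integers satisfying those equalities and disequalities, and all function names (branch_eq, run, solve, dart) are invented for the sketch.

```c
/* Hypothetical DART-style directed search for two calls of process_message. */
#define NINPUTS 2
#define MAXBR   16

typedef struct { int idx, k, eq; } Constraint;   /* input[idx] == k if eq, else != k */

static Constraint path[MAXBR];   /* symbolic constraints seen in one run */
static int plen;
static int is_room_hot, is_door_closed, ac;

/* Instrumented branch: evaluate concretely, record the symbolic outcome. */
static int branch_eq(int idx, int value, int k) {
    int taken = (value == k);
    if (plen < MAXBR) path[plen++] = (Constraint){ idx, k, taken };
    return taken;
}

static void process_message(int idx, int message) {
    if (branch_eq(idx, message, 0)) is_room_hot = 1;
    if (branch_eq(idx, message, 1)) is_room_hot = 0;
    if (branch_eq(idx, message, 2)) { is_door_closed = 0; ac = 0; }
    if (branch_eq(idx, message, 3)) { is_door_closed = 1; if (is_room_hot) ac = 1; }
}

/* One concrete execution; returns 1 if the property is violated. */
static int run(const int *in) {
    is_room_hot = 0; is_door_closed = 0; ac = 0; plen = 0;
    for (int i = 0; i < NINPUTS; i++) {
        process_message(i, in[i]);
        if (is_room_hot && is_door_closed && !ac) return 1;
    }
    return 0;
}

/* Solve c[0..n-1], keeping old input values where no constraint applies.
   Returns 0 (inputs untouched) if unsatisfiable; ignores integer overflow. */
static int solve(const Constraint *c, int n, int *in) {
    int out[NINPUTS];
    for (int i = 0; i < NINPUTS; i++) {
        int v = in[i], fixed = 0;
        for (int j = 0; j < n; j++)
            if (c[j].idx == i && c[j].eq) {
                if (fixed && v != c[j].k) return 0;   /* x == a && x == b */
                v = c[j].k; fixed = 1;
            }
        for (int bumped = 1; bumped; ) {              /* enforce every x != k */
            bumped = 0;
            for (int j = 0; j < n; j++)
                if (c[j].idx == i && !c[j].eq && v == c[j].k) {
                    if (fixed) return 0;              /* x == k && x != k */
                    v++; bumped = 1;
                }
        }
        out[i] = v;
    }
    for (int i = 0; i < NINPUTS; i++) in[i] = out[i];
    return 1;
}

/* After each run, negate the deepest branch whose other side is untried,
   solve for new inputs, and run again. */
static int dart(int *in, int max_runs, int *runs) {
    Constraint stack[MAXBR];
    int done[MAXBR], depth = 0;
    for (*runs = 1; *runs <= max_runs; (*runs)++) {
        if (run(in)) return 1;                        /* assertion violated */
        for (int j = depth; j < plen; j++) { stack[j] = path[j]; done[j] = 0; }
        depth = plen;
        for (;;) {
            int j = depth - 1;
            while (j >= 0 && done[j]) j--;
            if (j < 0) return 0;                      /* every path explored */
            stack[j].eq = !stack[j].eq;               /* take the other branch */
            done[j] = 1;
            depth = j + 1;
            if (solve(stack, depth, in)) break;       /* feasible: next run */
        }
    }
    return 0;
}
```

Starting from the slide's random inputs (8782590, -5278291), the sketch's first derived input sequence is (8782590, 3), as on the slide, and it reaches the violating sequence (3, 0) on its sixth run.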
Slide 28: DART Case Study
- Tested a C implementation of a security protocol (Needham-Schroeder) with a known attack
  - 406 lines of code
- Took less than 1 minute on a 2 GHz machine to discover the attack
- In contrast, a software model checker with a hand-written nondeterministic model of the attacker took hours to discover the attack
Slide 29: Bigger DART Case Study: oSIP
- Open Source SIP
  - 30,000 lines of C
  - 600 externally visible functions
- DART runs limited to 1,000 runs/function
- Crashed 65% of external functions (mostly via NULL arguments: the API is not well documented, though some functions do check for NULL arguments)
- Found a significant security vulnerability: a constructed message that crashes the SIP parser
Slide 30: Conclusions
- Static and dynamic analysis tools are advancing to:
  - decrease the false error rate
  - support user-customized analyses (e.g., locking protocols)
  - find harder bugs with a minimal amount of behavior specification from the user
  - search run-time possibilities more exhaustively
- Software developers should be using these tools to improve the quality of their code