Title: CS343: Advanced Technology for Software Productivity
1. CS343: Advanced Technology for Software Productivity
2. Program Errors are Rampant
- Catastrophic errors
  - Therac-25, Ariane-5, Mars Orbiter
- Numerous security vulnerabilities (CERT Coordination Center)
- Products ship with a long list of known errors
3. Software Reliability
- "1.2 bugs per 400 lines of code", DeMarco and Lister
- NT has 50M lines of code
- "50% of the people at Microsoft do testing and the rest of the 50% spend 50% of their time testing", Gates
- "All the people who wrote this code have left and I have to maintain the system", a CS 343 student
4. Improving Software Productivity
- Language support to reduce errors
  - garbage collection
- Tools for finding errors in existing programs
- Verification (diagram: expected behavior vs. program behavior)
  - Specification: programmers' resistance; a potential source of errors
  - Theorem proving: limited to small programs; success with partial correctness
5. Partially Correct Consistency Checkers
- Add consistency rules to compilers, e.g. free(p); *p
- Unsound, incomplete checkers
- Found numerous security holes in Linux, FreeBSD, etc. (meta-compilation)
- Found too many errors to fix in many commercial systems (PREfix)
- (Diagram: expected behavior vs. program behavior)
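The free(p) consistency rule on this slide can be sketched as a toy checker. This is only an illustration under strong assumptions: the program is straight-line code given as (operation, variable) pairs, and the operation names `free`, `use`, `assign` and the function `check_use_after_free` are hypothetical, not taken from meta-compilation or PREfix. It ignores aliasing and branches, so like the checkers described here it is both unsound and incomplete.

```python
def check_use_after_free(statements):
    """Flag uses (and repeated frees) of a pointer variable after free().

    statements: list of (op, var) pairs with op in {"free", "use", "assign"}.
    Returns a list of (statement index, message) warnings.
    """
    freed = set()      # variables whose current value has been freed
    warnings = []
    for i, (op, var) in enumerate(statements):
        if op == "free":
            if var in freed:
                warnings.append((i, f"double free of {var}"))
            freed.add(var)
        elif op == "use":
            if var in freed:
                warnings.append((i, f"use of {var} after free"))
        elif op == "assign":
            # The variable holds a new value, so the rule no longer applies.
            freed.discard(var)
        else:
            raise ValueError(f"unknown operation: {op}")
    return warnings


program = [
    ("assign", "p"),   # p = malloc(...)
    ("free", "p"),     # free(p)
    ("use", "p"),      # *p   (violates the rule)
    ("assign", "p"),   # p = malloc(...)
    ("use", "p"),      # *p   (fine again)
]
print(check_use_after_free(program))
```

A real checker would have to follow every path through the control-flow graph and account for aliases of p, which is where the deeper analyses discussed later come in.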
6. Confession
- Deep program analysis is cool!
- Overkill for optimizations!
- Software comprehension requires all the deep program analysis
7. Questions to Answer
- What kind of errors are there?
- How do we know what the expected behavior is?
- How do we check the program's behavior?
- What should / can be done by
  - the user
  - the compiler
  - the runtime system
- What are the existing static analysis techniques?
- What are the opportunities?
- Difference between optimizations and program comprehension
8. This Course
- Objective
  - a research proposal at the end of the term
  - knowledge of the state of the art
  - originality
  - an argument for why the idea is worth pursuing
  - propose an algorithm
  - can include preliminary experimental evidence (typical of research proposals)
- Pre-requisite
  - CS 243 or equivalent
9. Format
- Learn the state of the art through reading and discussing papers
- A group of 2 leads a discussion each class after meeting with me ahead of time
- Evaluation
  - a research proposal (a write-up and a presentation) at the end of the term
  - presentation of papers read
  - in-class discussion
  - no exams
10. What Kind of Errors Are There?
- What is the effect of the errors?
  - Buffer overruns (over 50% of the attacks)
- What are the sources of errors?
  - Consistency errors
    - language-related, e.g. malloc / free
    - application-related, e.g. lock / unlock
  - Misinterpretation of the specs
  - Unexpected inputs
  - Algorithmic errors
- Hidden errors vs. crashes
- What is the complexity in finding the errors?
11. How to Get the Expected Behavior?
- Hardwired: common errors related to a programming language
- Application-specific, hardwired into the compiler
- User specified
  - Assertions in program text
  - Extended typing: state carried by the variable type
  - Aspect programming: at all sites calling f, p is true
  - A program query language
- Automatically inferred
  - dynamically (statistical)
  - statically (statistical or confidence assignment)
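The "automatically inferred, statistical" option can be illustrated with a small sketch: guess that call `a` must be followed by call `b` if most code that calls `a` also calls `b` later, then report the exceptions as likely bugs. Everything here is an assumption made for illustration: the function name `infer_pairing`, the per-function call lists, and the 0.75 threshold are hypothetical and not drawn from any of the papers in the course.

```python
def infer_pairing(traces, a, b, threshold=0.75):
    """Infer the rule "a must be followed by b" from how often code obeys it.

    traces: dict mapping a function name to the ordered list of calls it makes.
    Returns (ratio, deviants): the fraction of functions calling `a` that call
    `b` after their first `a`, and the functions that do not. Deviants are
    reported only when the ratio reaches the threshold, i.e. when the rule
    looks like a real convention rather than a coincidence.
    """
    obey, deviants = 0, []
    for name, calls in traces.items():
        if a not in calls:
            continue                          # rule does not apply here
        first_a = calls.index(a)
        if b in calls[first_a + 1:]:
            obey += 1
        else:
            deviants.append(name)
    total = obey + len(deviants)
    ratio = obey / total if total else 0.0
    return ratio, (deviants if ratio >= threshold else [])


traces = {
    "f1": ["lock", "work", "unlock"],
    "f2": ["lock", "unlock"],
    "f3": ["work"],
    "f4": ["lock", "work", "unlock"],
    "f5": ["lock", "read", "unlock"],
    "f6": ["lock", "work"],                   # never unlocks
}
ratio, suspects = infer_pairing(traces, "lock", "unlock")
print(ratio, suspects)
```

No one had to write down that lock pairs with unlock; the rule is inferred from the majority, and the minority is flagged.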
12. Static/Dynamic Error Detection
- Static analysis
  - Finding inconsistencies by analyzing across infrequently executed paths and procedures
    - Intrinsa PREfix
    - Engler's meta-compilation (statistical inference of behavior: SOSP paper)
- Buffer overruns
  - A dynamic technique (Cowan)
  - How hard is the static analysis required? (assignment)
- Fully automatic dynamic error detection (Hangal)
  - Finds hard, hidden errors
  - Helps debugging
13. Deep Static Program Analysis
- Data flow: pointers
  - *p = 1
  - *q = 2
  - *p
  - Requires interprocedural analysis
  - Solution is large (unlike bit-vector problems)
- Control flow
  - Call graph: higher-order functions
    - C function pointers
    - virtual method invocations
  - Path sensitive
14. Concrete-Type Inference (topic 5)
- Determine the type of objects to determine the virtual method invoked
- Developed primarily to reduce the overhead
- Dynamic techniques
  - Profile and test for common cases, instead of a general virtual table lookup
- Static analysis
15. Type-Based Analysis
- Originally developed for functional languages like ML
  - Types are not declared
  - Better support for polymorphism
- Constants are typed; infer the most general type of variables
  - x = y + 3  =>  x, y are integers
  - a = b  =>  a, b have the same type
  - x = a  =>  x, y, a, b are all integers
- foo(x, f) = f(x)  =>  let type(x) = a and f return type b; then foo : a × (a -> b) -> b
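The inference steps on this slide can be reproduced with a few lines of unification. This is a minimal sketch, not the algorithm of any particular ML implementation: the names `TVar`, `find`, `unify`, and `show` are hypothetical, types are either the string "int", a type variable, or a tuple `(constructor, arg, ...)`, and the occurs check is omitted for brevity.

```python
class TVar:
    """A type variable; `ref` is the type it has been unified with, if any."""
    def __init__(self, name):
        self.name, self.ref = name, None


def find(t):
    """Follow bindings to an unbound variable or a concrete type."""
    while isinstance(t, TVar) and t.ref is not None:
        t = t.ref
    return t


def show(t):
    t = find(t)
    if isinstance(t, TVar):
        return t.name
    if isinstance(t, tuple):                  # (constructor, arg, arg, ...)
        return "(" + f" {t[0]} ".join(show(arg) for arg in t[1:]) + ")"
    return t                                  # a base type such as "int"


def unify(t1, t2):
    """Make t1 and t2 equal by binding variables (no occurs check)."""
    t1, t2 = find(t1), find(t2)
    if t1 is t2:
        return
    if isinstance(t1, TVar):
        t1.ref = t2
    elif isinstance(t2, TVar):
        t2.ref = t1
    elif (isinstance(t1, tuple) and isinstance(t2, tuple)
          and len(t1) == len(t2) and t1[0] == t2[0]):
        for left, right in zip(t1[1:], t2[1:]):
            unify(left, right)
    elif t1 != t2:
        raise TypeError(f"cannot unify {show(t1)} with {show(t2)}")


INT = "int"
x, y, a, b = (TVar(name) for name in "xyab")
unify(y, INT); unify(x, INT)   # x = y + 3: the constant 3 forces integers
unify(a, b)                    # a = b: same type, still unknown
unify(x, a)                    # x = a: now a and b are integers too
print([show(v) for v in (x, y, a, b)])

ta, tb, tf = TVar("a"), TVar("b"), TVar("f")
unify(tf, ("->", ta, tb))      # f is applied to x : a and returns b
foo_type = ("->", ("x", ta, tf), tb)   # "x" is the product constructor
print(show(foo_type))
```

The second example leaves a and b unbound, which is exactly the "most general type": foo works for any choice of a and b.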
16. Why is Type Analysis Interesting?
- Type inferencing algorithms are used for program analysis
  - Handles higher-order functions
  - Polymorphism: context-sensitive
- Algorithm expressed as
  - unification
  - semi-unification (theoretically undecidable)
- Originally flow-insensitive, has been extended to flow-sensitive analysis
17. Extended Types
- Different units
  - Integers representing a year (2002) should not be added to a dollar amount ($10,000)
  - Integers have different types as long as they are not used together
- Pointer aliases
  - Two pointers never pointing to the same type cannot be aliased
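The units idea can be sketched by giving years and dollar amounts distinct wrapper types. The class names `Year` and `Dollars` are hypothetical. Note the limitation: Python only detects the mix-up when the addition runs, whereas the extended type systems discussed in this course aim to reject it statically, before the program executes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Year:
    value: int          # deliberately has no arithmetic at all


@dataclass(frozen=True)
class Dollars:
    value: int

    def __add__(self, other):
        if not isinstance(other, Dollars):
            # Returning NotImplemented makes Python raise TypeError
            # for mixed-unit additions.
            return NotImplemented
        return Dollars(self.value + other.value)


print(Dollars(10_000) + Dollars(500))

try:
    Dollars(10_000) + Year(2002)
except TypeError as err:
    print("rejected:", err)
```

Both values are plain integers underneath; only the extra type information distinguishes them.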
18. Types and Pointer Intro (topic 6)
- Introduction (Schwartzbach)
  - Explains notations and summarizes basic results
- Context-insensitive and flow-insensitive pointer analysis
  - Basic (Steensgaard)
    - Probably the most well-read pointer alias paper
  - Ultra-fast aliasing analysis, improves Andersen's algorithm (considered too expensive in the past: O(n^3)) (Heintze)
    - A good trick, useful for other analyses
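To make the flavor of a context-insensitive, flow-insensitive pointer analysis concrete, here is a minimal unification-based sketch in the style of Steensgaard's analysis. It is an illustration under stated assumptions, not the paper's full algorithm: the class and method names are hypothetical, only four statement forms are handled, and function calls and structures are ignored. Each equivalence class of variables points to at most one other class, and assignments merge classes with union-find.

```python
class Steensgaard:
    def __init__(self):
        self.parent = {}    # union-find parent links
        self.target = {}    # class representative -> node that class points to
        self.fresh = 0

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]   # path halving
            v = self.parent[v]
        return v

    def pts(self, v):
        """Class that v's class points to (a placeholder is made if unknown)."""
        r = self.find(v)
        if r not in self.target:
            self.fresh += 1
            self.target[r] = self.find(f"$loc{self.fresh}")
        return self.find(self.target[r])

    def join(self, a, b):
        """Merge two classes and, recursively, the classes they point to."""
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        ta, tb = self.target.pop(a, None), self.target.pop(b, None)
        self.parent[a] = b
        if ta is not None and tb is not None:
            self.target[b] = tb
            self.join(ta, tb)
        elif ta is not None or tb is not None:
            self.target[b] = ta if ta is not None else tb

    def address_of(self, p, x):    # p = &x
        self.join(self.pts(p), x)

    def copy(self, p, q):          # p = q
        self.join(self.pts(p), self.pts(q))

    def load(self, p, q):          # p = *q
        self.join(self.pts(p), self.pts(self.pts(q)))

    def store(self, p, q):         # *p = q
        self.join(self.pts(self.pts(p)), self.pts(q))

    def may_alias(self, p, q):
        return self.pts(p) == self.pts(q)


s = Steensgaard()
s.address_of("p", "x")    # p = &x
s.address_of("q", "y")    # q = &y
s.address_of("r", "z")    # r = &z
s.copy("p", "q")          # p = q merges the classes of x and y
print(s.may_alias("p", "q"), s.may_alias("p", "r"))
```

After `p = q` the sketch also concludes that q may point to x, which an inclusion-based (Andersen-style) analysis would not. That loss of precision is the price of merging classes instead of tracking subset constraints.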
19. Applications of Type-Based Analysis (topic 7)
- Steensgaard's pointer analysis and concrete type inferencing (Liang)
- Using types to find bugs (Aiken's group)
20. User-Specified Types (topic 8)
- Instead of flagging potential errors in programs, can we guarantee that the program is correct?
- Create user-specified types to model the state of resources (DeLine, Fahndrich)
  - e.g. an open file and a closed file go in different variables.
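The open-file / closed-file example can be sketched with two classes, each exposing only the operations that are legal in its state. The class names and the in-memory "file" are hypothetical stand-ins chosen so the example is self-contained; this is not the notation used in the cited work. Unlike a type system that tracks resource state statically, nothing here stops code from reusing a stale `OpenFile` variable after calling `close()`.

```python
class ClosedFile:
    """A closed file: the only operation available is open()."""
    def __init__(self, name, contents=""):
        self.name, self._contents = name, contents

    def open(self):
        return OpenFile(self.name, self._contents)


class OpenFile:
    """An open file: it can be read or closed, but not opened again."""
    def __init__(self, name, contents):
        self.name, self._contents = name, contents

    def read(self):
        return self._contents

    def close(self):
        return ClosedFile(self.name, self._contents)


closed = ClosedFile("log.txt", "hello")
opened = closed.open()           # state change produces a new variable
data = opened.read()
closed_again = opened.close()
print(data, hasattr(closed_again, "read"))
```

Because each state lives in a different variable with a different type, reading from a closed file is a type error rather than a run-time surprise.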
21. Context-Sensitive Analysis (topics 9-10)
- Lackwit: a context-sensitive, type-based error checker (O'Callahan)
- A type-based context-sensitive pointer alias analysis (Fahndrich)
- Partial transfer functions: a context-sensitive, flow-sensitive pointer alias analysis (Wilson)
  - the most ambitious
- Both context-sensitive analyses are still impractical
22. Questions
- What kind of pointer alias analysis do we need for program comprehension?
  - The real tools are unsound!
- How do we selectively use context sensitivity?
- Different kinds of pointers require different treatments
- What are the algorithmic ideas in type analysis?
23. Additional Topics
- Flow-sensitive types: How to get the effect of flow sensitivity in type-based algorithms?
- Demand-driven analysis: Can we spend more time selectively?
- Object-oriented analysis: Can we take advantage of the organization of OO programs to get more advanced knowledge?
- A fully automatic C memory error detector
24. Additional Topics
- Symbolic analysis: How to reason about the values of programs?
- Path-sensitive analysis and model checking: How to reason about the correlations of paths?
- Garbage collection: What are the basic techniques?
25. Tentative Schedule
- (week of): Monday | Wednesday
- 4/1: 1. Intro
- 4/8: 2. Meta-comp, PREfix | 3. Buffer-ovf
- 4/15: 4. Dynamic techniques | 5. Concrete type inference
- 4/22: 6. Type ptr intro | 7. Applications of types
- 4/29: 8. User-specified types | 9. Context-sensitive
- 5/6: 10. Context-sens. ptr | 11. Flow-sensitive types
- 5/13: -- discussion -- | 12. Demand-driven analysis
- 5/20: 13. Symbolic analysis | 14. OO analysis
- 5/27: -- holiday -- | 15. Model chk/path sens.
- 6/3: 16. Garbage collection | -- presentation --
26. Next week
- Monday
  - Discuss Engler's meta-compilation and PREfix papers
- Wednesday
  - Buffer overrun (dynamic)
27. Assignment (by next Wed)
- One (or more) case studies of buffer overrun errors
- Enter your name and program studied to eliminate duplication: http://suif.stanford.edu/courses/cs343/
- Resources
  - http://www.kb.cert.org/
  - http://www.securityfocus.com/
  - http://www.securiteam.com/exploits/
  - http://www.tlsecurity.net/archive/vuln/linux/
- Describe the problem
- Include code snippets
- Discuss the static analysis required to identify the error automatically (or is it not possible?)