Static EXtended Checking for Cyclone - PowerPoint PPT Presentation

1
Static EXtended Checking for Cyclone
  • Greg Morrisett

2
Collaborators
  • Yanling Wang (Cornell)
  • Aleks Nanevski (Harvard)
  • Amal Ahmed (Toyota Technical Inst.)
  • Lars Birkedal (ITU Denmark)

3
Cyclone
  • A type-safe dialect of C.
  • Goals
  • well-typed program won't crash
  • looks and smells like C
  • familiar syntax and semantics
  • retain control over boxing, allocation,
    bit-twiddling, and to some degree, memory
    management.
  • some useful additions: polymorphism, subtyping,
    tagged unions, pattern matching, exceptions, etc.
  • plays nicely with C

4
Lots of other tools for C
  • e.g., LCLint, Jeckyll, Metal, ...
  • But CCured and Cyclone focus on soundness.
  • CCured ≈ Scheme
  • no annotation by programmer
  • insert meta-data & tests
  • use global optimizer to eliminate overheads
  • Cyclone ≈ Modula
  • programmer must annotate types
  • no run-time type tests
  • local analysis only

5
Cyclone/GCC vs. Java/GCC

6
  • Macro-benchmarks
  • Ported a variety of security-critical apps
  • little overhead (e.g., 2% for the Boa
    Webserver.)
  • lots of bugs found

7
AES C version
    int cipherUpdateRounds(cipherInstance *cipher,
                           keyInstance *key, BYTE *input,
                           int inputLen, BYTE *outBuffer,
                           int rounds) {
      int j, t;
      word8 block[4][MAXBC];
      if (cipher == NULL || key == NULL ||
          cipher->blockLen != key->blockLen)
        return BAD_CIPHER_STATE;
      for (j = 0; j < cipher->blockLen/32; j++)
        for (t = 0; t < 4; t++)
          block[t][j] = input[4*j+t] & 0xFF;
      switch (key->direction) {
      case DIR_ENCRYPT:
        rijndaelEncryptRound(block, key->keyLen, cipher->blockLen,
                             key->keySched, rounds);
        break;
      case DIR_DECRYPT:
        rijndaelDecryptRound(block, key->keyLen, cipher->blockLen,
                             key->keySched, rounds);
        break;
      default:
        return BAD_KEY_DIR;
      }
      for (j = 0; j < cipher->blockLen/32; j++)
        for (t = 0; t < 4; t++)
          outBuffer[4*j+t] = (BYTE) block[t][j];
    }

8
AES Cyclone version
    int cipherUpdateRounds(cipherInstance *cipher,
                           keyInstance *key, BYTE ?input,
                           int inputLen, BYTE ?outBuffer,
                           int rounds) {
      int j, t;
      word8 block[4][MAXBC];
      if (cipher == NULL || key == NULL ||
          cipher->blockLen != key->blockLen)
        return BAD_CIPHER_STATE;
      for (j = 0; j < cipher->blockLen/32; j++)
        for (t = 0; t < 4; t++)
          block[t][j] = input[4*j+t] & 0xFF;
      switch (key->direction) {
      case DIR_ENCRYPT:
        rijndaelEncryptRound(block, key->keyLen, cipher->blockLen,
                             key->keySched, rounds);
        break;
      case DIR_DECRYPT:
        rijndaelDecryptRound(block, key->keyLen, cipher->blockLen,
                             key->keySched, rounds);
        break;
      default:
        return BAD_KEY_DIR;
      }
      for (j = 0; j < cipher->blockLen/32; j++)
        for (t = 0; t < 4; t++)
          outBuffer[4*j+t] = (BYTE) block[t][j];
    }

9
Refined pointer qualifiers
  • Fat pointers: arbitrary arithmetic, but the
    representation is different (3 words)
  • char ? : a "fat" pointer to a sequence of
    characters.
  • numelts(s) returns number of elements in
    sequence s (0 when s == NULL)
  • Thin pointers: same representation as C, but
    restrictions on pointer arithmetic.
  • char * : NULL or a pointer to at least one
    character.
  • char @ : a pointer to at least one character.
  • char *@numelts(42) : pointer to a sequence of 42
    characters.

10
Fat pointers
  • To support dynamic checks, we must insert extra
    information (e.g., bounds for an array)
  • Similar to CCured's representation.
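
The three-word representation can be modelled in a few lines. This is only an illustrative sketch in Python, not Cyclone's actual runtime layout: the field names (base, curr, bound), the BoundsError exception, and the reading of numelts as "elements remaining from the current position" are all assumptions made for the example.

```python
from dataclasses import dataclass


class BoundsError(Exception):
    """Raised when a checked dereference fails (hypothetical name)."""


@dataclass(frozen=True)
class FatPtr:
    """Three-word fat pointer: backing storage, current index, element count."""
    base: list   # stands in for the address of the first element (None = NULL)
    curr: int    # current position; arithmetic may move it out of bounds
    bound: int   # number of elements in the underlying sequence

    def add(self, k: int) -> "FatPtr":
        # Arbitrary pointer arithmetic is allowed; nothing is checked here.
        return FatPtr(self.base, self.curr + k, self.bound)

    def deref(self):
        # The dynamic NULL and bounds checks happen only at the point of use.
        if self.base is None or not (0 <= self.curr < self.bound):
            raise BoundsError(f"index {self.curr} outside [0, {self.bound})")
        return self.base[self.curr]


def numelts(p: FatPtr) -> int:
    # Assumed meaning: elements remaining from curr; 0 for a NULL pointer.
    if p.base is None:
        return 0
    return max(0, p.bound - p.curr)
```

Every deref is a potential failure point and every pointer carries two extra words, which is the cost the following slides try to remove statically.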

11
Can we eliminate fat pointers?
  • Fat pointers are convenient, but
  • expensive
  • break compatibility
  • give us failure points
  • (make concurrency hard)

12
Aha! Dependent Types!
  • void f(int n, int A[n])
  • Good code: pass lengths around.
  • e.g., strcpy vs. strncpy
  • But dependency raises subtle issues and only
    solves part of the problem.

13
Problem 1: Imperative Code
    void f(int n, int A[n]) {
      n = n + 1;
      ...
      n = random();
      ...
    }
  • What's the type of A now?

14
Aha! Use const
    void f(const int n, int A[n]) {
      n = n + 1;
      ...
    }
  • Now we get a type-error.
  • Alternatively, use DML-style indices...

15
DML (Xi & Pfenning)
    void f<i>(int(i) n, int A[i]) {
      n = n + 1;
      ...
    }
  • Again, a type-error.

16
Indices vs. Dependency
  • Indices are verbose.
  • looks like we're duplicating the length
  • But they have advantages
  • can make index language small
  • can rule out side-effects
  • good for automating type-checking and constraint
    solving
  • don't always need the length at run-time

17
For Example...
    int cb(int f(int A[n]),
           int A[n]) {
      return f(A);
    }
    // Where is n bound?
    int g(int A[8]) { ... }
    int X[8] = {0,1,2,3,4,5,6,7};
    cb(g,X);

18
Indices are static
    int cb<i>(int f(int A[i]),
              int A[i]) {
      return f(A);
    }
    int g(int A[8]) { ... }
    int X[8] = {0,1,2,3,4,5,6,7};
    cb<8>(g,X);

19
Oops!
    void f<i>(int(i) n,
              int A[i]) {
      A = A + 1;
      ...
    }
  • Do we add index pointers?
  • And how do we track lower-bounds?

20
DML doesn't work here...
  • There are a bunch of other problems
  • How do I validate A[x] when A : int[i]?
  • Need to prove x < i
  • But x is usually a mutable loop variable
  • for (x = 0; x < n; x++) ...A[x]...
  • And what about pointer arithmetic?

21
Aha! Hoare Logic!
    int f(int n, int *A)
      @requires(n == numelts(A))
      @ensures(0 <= result < n)
  • Hoare-Style Specifications.
  • Say, like ESC/Java or SPLint.

22
Lots of Advantages
  • Type no longer requires length
  • so you don't have to pass it around just to have
    it in scope.
  • Relations between variables
  • i < n, n <= numelts(A)
  • awkward in dependent setting
  • Relations can vary at program point.
  • {n == numelts(A)} n++ {n-1 == numelts(A)}
  • {n == numelts(A)} A++ {n == numelts(A-1)}
  • so imperative updates not a problem

23
Static EXtended Checking
  • An old idea
  • Added support for specs
  • @requires, @ensures, @throws, @assert
  • @check, subset types
  • quantifier-free 1st-order, multi-sorted logic
  • Calculate verification conditions (VCs)
  • strongest post-conditions
  • monadic interpretation
  • care in representation of predicates
  • simple loop invariants
  • Throw the VCs at a naive prover

24
Example strcpy
    strcpy(char ?d, char ?s) {
      while (*s != 0) {
        *d = *s;
        s++;
        d++;
      }
      *d = 0;
    }

Run-time checks are inserted to ensure that s
and d are not NULL and in bounds. 6 words
passed in instead of 2.

25
Better
    strcpy(char ?d, char ?s) {
      unsigned i, n = numelts(s);
      assert(n < numelts(d));
      for (i = 0; i < n && s[i] != 0; i++)
        d[i] = s[i];
      d[i] = 0;
    }

This assert is dynamic. But its presence is
enough to eliminate the checks.

26
Even Better
    strncpy(char *d, char *s, uint n)
      @assert(n < numelts(d) && n <= numelts(s))
    {
      unsigned i;
      for (i = 0; i < n && s[i] != 0; i++)
        d[i] = s[i];
      d[i] = 0;
    }

No fat pointers or dynamic checks. But caller
must statically satisfy the pre-condition.

27
In Practice
    strncpy(char *d, char *s, uint n)
      @check(n < numelts(d) && n <= numelts(s))
    {
      unsigned i;
      for (i = 0; i < n && s[i] != 0; i++)
        d[i] = s[i];
      d[i] = 0;
    }

If caller can establish pre-condition, no
check. Otherwise, an implicit check is
inserted. Clearly, checks are a limited class of
assertions.

28
Throws Specs
    val_t lookup(key_t x)
      @ensures(x != NULL && result != NULL)
      @throws((x == NULL && exn == NullExn) ||
              exn == LookupFail);
    void insert(key_t k, val_t v)
      @requires(k != NULL && v != NULL)
      @throws(false);

29
Subset Types
    typedef @subset(int x | x > 0) pos_t;
    typedef struct {
      int n;
      int *A;
    } pf_t;
    typedef @subset(pf_t p | p.n == numelts(p.A))
      fat_t;

30
Subset Restrictions
  • Can't talk about contents of pointers
      typedef @subset(int *p | *p == 42) T;
      int x = 42;
      T p = (T)&x;
      x++;
  • No pointers into middle of subset
      int x[3] = {0,1,2};
      fat_t f = {3, x};
      int *i = &f.n;
      *i = *i + 1;

31
How about the prover?
  • For the 165 files (78 Kloc) that make up the
    standard libraries and compiler
  • CLibs: stdio, string, ...
  • CycLib: list, array, splay, dict, set, bignum, ...
  • Compiler: lex, parse, typing, analyze, xlate to
    C, ...
  • with almost no specifications, eliminated 96% of
    the (static) checks
  • null: 33,121 out of 34,437 (96%)
  • bounds: 13,402 out of 14,022 (95%)
  • 225s for bootstrap compared to 221s with all
    checks turned off (2% slower) on this laptop.
  • Optimization standpoint seems pretty good.

32
Not all Rosy
  • Don't do as well at array-intensive code.
  • For instance, on the AES reference
  • eliminated 75% of the checks (377 out of 504)
  • 2% slower than with all checks turned off.
  • 24% slower than original C code. (most of the
    overhead is fat pointers)
  • The primary culprits
  • loop invariants are too weak
  • lack of context (i.e., pre/post-conditions)
  • prover only understands limited constraints.

33
Challenges
  • Assumed I could use off-the-shelf technology.
  • But ran into a few problems
  • scalable VC generation
  • previously solved problem (see ESC guys.)
  • but entertaining to rediscover the solutions.
  • usable theorem provers
  • (not the real focus.)
  • some foundational issues
  • semantic model, soundness
  • see ICFP'06 paper

34
Verification-Condition Generation
  • We started with textbook strongest
    post-conditions
  • SP[x = e] A = A[a/x] ∧ x=e[a/x]   (a fresh)
  • SP[S1; S2] A = SP[S2](SP[S1] A)
  • SP[if (e) S1 else S2] A =
    SP[S1](A ∧ e≠0) ∨ SP[S2](A ∧ e=0)
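
The three textbook rules translate almost directly into code. The sketch below is an assumption-laden toy, not the Cyclone implementation: assertions and terms are nested Python tuples, the constructor tags ("var", "num", "and", "eq", ...) and the helper names subst and sp are made up for illustration, and assigned expressions are assumed to be pure.

```python
import itertools

_fresh = itertools.count()   # supplies the fresh names written "a" in the rules


def subst(t, x, a):
    """Rename program variable x to a everywhere inside term or assertion t."""
    if t == ("var", x):
        return ("var", a)
    if isinstance(t, tuple):
        return tuple(subst(u, x, a) for u in t)
    return t


def sp(stmt, A):
    """Textbook strongest post-condition of stmt under pre-condition A."""
    kind = stmt[0]
    if kind == "assign":                     # SP[x = e] A = A[a/x] ∧ x=e[a/x]
        _, x, e = stmt
        a = f"a{next(_fresh)}"               # fresh name for the old value of x
        return ("and", subst(A, x, a), ("eq", ("var", x), subst(e, x, a)))
    if kind == "seq":                        # SP[S1; S2] A = SP[S2](SP[S1] A)
        _, s1, s2 = stmt
        return sp(s2, sp(s1, A))
    if kind == "if":                         # both branches get a copy of A
        _, e, s1, s2 = stmt
        zero = ("num", 0)
        return ("or",
                sp(s1, ("and", A, ("ne", e, zero))),
                sp(s2, ("and", A, ("eq", e, zero))))
    raise ValueError(f"unknown statement kind: {kind}")
```

For x = x + 1 under the pre-condition x = 0, sp returns the conjunction "a = 0 ∧ x = a + 1" for a freshly generated a.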

35
1st Problem with Textbook SP
  • SP[x = e] A = A[a/x] ∧ x=e[a/x]
  • What if e has effects?
  • In particular, what if e is itself an assignment?
  • Solution: use a monadic interpretation
  • SP : Exp → Assn → Term × Assn
  • Terms are pure (i.e., indices!)

36
For Example
  • SP[x] A = (x, A)
  • SP[e1 + e2] A = let (t1,A1) = SP[e1] A
                        (t2,A2) = SP[e2] A1
                    in (t1 + t2, A2)
  • SP[x = e] A = let (t,A1) = SP[e] A
                  in (t[a/x], A1[a/x] ∧ x = t[a/x])

37
One Issue
  • Of course, this over-sequentializes the code.
  • C has very liberal order of evaluation rules
    which are hopelessly unusable for any sound
    analysis.
  • So we force the evaluation to be left-to-right
    and match our sequentialization.

38
Next Problem: Diamonds
  • SP[if (e1) S11 else S12;
        if (e2) S21 else S22;
        ...
        if (en) Sn1 else Sn2] A
  • Textbook approach explodes paths into a tree.
  • SP[if (e) S1 else S2] A =
    SP[S1](A ∧ e≠0) ∨ SP[S2](A ∧ e=0)
  • This simply doesn't scale.
  • e.g., one procedure had assn with 1.5B nodes.
  • WP has same problem. (see Flanagan & Leino)
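
The blow-up is easy to reproduce on a toy. The following sketch (hypothetical helper names, tuple-encoded assertions, branch bodies assumed to be skip) applies the conditional rule n times in sequence and compares the size of the result written out as a tree with the number of distinct sub-assertions it actually contains.

```python
def sp_if_skip(i, A):
    """SP[if (e_i) skip else skip] A = (A ∧ e_i≠0) ∨ (A ∧ e_i=0)."""
    e = ("var", f"e{i}")
    zero = ("num", 0)
    return ("or", ("and", A, ("ne", e, zero)), ("and", A, ("eq", e, zero)))


def sp_chain(n):
    """Post-condition of n conditionals in sequence, starting from true."""
    A = ("true",)
    for i in range(n):
        A = sp_if_skip(i, A)
    return A


def tree_size(a):
    """Nodes in the assertion when every occurrence is counted separately."""
    return 1 + sum(tree_size(c) for c in a[1:] if isinstance(c, tuple))


def dag_size(a, seen=None):
    """Nodes in the assertion when structurally equal subterms are shared."""
    seen = set() if seen is None else seen
    if a in seen:
        return 0
    seen.add(a)
    return 1 + sum(dag_size(c, seen) for c in a[1:] if isinstance(c, tuple))
```

With this encoding, ten conditionals give a tree of 10,231 nodes but only 62 distinct nodes; the tree size doubles with each added conditional while the shared form grows by a constant. Note that dag_size as written re-hashes whole subtrees, so it is only meant for small n.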

39
Hmmm... a lot like naïve CPS
Duplicate result of 1st conditional, which
duplicates the original assertion.
  • SP[if (e1) S11 else S12;
        if (e2) S21 else S22] A =
      SP[S21]((SP[S11](A ∧ e1≠0) ∨
               SP[S12](A ∧ e1=0)) ∧ e2≠0)
      ∨
      SP[S22]((SP[S11](A ∧ e1≠0) ∨
               SP[S12](A ∧ e1=0)) ∧ e2=0)

40
Aha! We need a let
  • SP[if (e) S1 else S2] A =
    let X=A in (e≠0 ∧ SP[S1]X) ∨ (e=0 ∧
    SP[S2]X)
  • Alternatively, make sure we physically share A.
  • Oops
  • SP[x = e] X = X[a/x] ∧ x=e[a/x]
  • This would require adding explicit substitutions
    to the assertion language to avoid breaking the
    sharing.

41
Handling Updates (Necula)
  • Factor out a local environment: A = {x=e1 ∧
    y=e2 ∧ …} ∧ B, where neither B nor ei contains
    program variables (i.e., x,y,…)
  • Only the environment needs to change on update:
    SP[x = 3]({x=e1 ∧ y=e2 ∧ …} ∧ B) =
    {x=3 ∧ y=e2 ∧ …} ∧ B
  • So most of the assertion (B) remains unchanged
    and can be shared.

42
So Now
  • SP : Exp → (Env × Assn) → (Term × Env × Assn)
  • SP[x] (E,A) = (E(x), E, A)
  • SP[e1 + e2] (E,A) =
      let (t1,E1,A1) = SP[e1] (E,A)
          (t2,E2,A2) = SP[e2] (E1,A1)
      in (t1 + t2, E2, A2)
  • SP[x = e] (E,A) =
      let (t,E1,A1) = SP[e] (E,A)
      in (t, E1[x:=t], A1)

43
Or as in Haskell
  • SP[x] = lookup x
  • SP[e1 + e2] = do t1 ← SP[e1]
                      t2 ← SP[e2]
                      return (t1 + t2)
  • SP[x = e] = do t ← SP[e]
                   set x t
                   return t
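
The same three equations can be mimicked without a monad library by threading a mutable environment through the recursion. This is a rough sketch under stated assumptions: a Python dict mutated in place stands in for the monadic state, expressions are tuples with made-up tags, and sp_exp is a hypothetical name.

```python
def sp_exp(e, env):
    """Return a pure term for expression e, updating env left to right.

    env maps each program variable to the pure term describing its
    current value.
    """
    kind = e[0]
    if kind == "num":
        return e
    if kind == "var":                 # SP[x] = lookup x
        return env[e[1]]
    if kind == "add":                 # SP[e1 + e2]: e1 first, then e2
        t1 = sp_exp(e[1], env)
        t2 = sp_exp(e[2], env)
        return ("add", t1, t2)
    if kind == "assign":              # SP[x = e]: evaluate e, then set x
        t = sp_exp(e[2], env)
        env[e[1]] = t
        return t
    raise ValueError(f"unknown expression kind: {kind}")
```

For (x = x + 1) + x starting from x ↦ x0, both operands come out as the pure term x0 + 1, because the assignment inside the left operand updates the environment before the right operand is read. No substitution into the existing assertion is needed.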

44
What about Memory?
  • As in ESC, use a functional array
  • terms: t ::= … | upd(tm,ta,tv) | sel(tm,ta)
  • with the environment tracking mem
  • SP[*e] = do a ← SP[e]
                m ← lookup mem
                return sel(m,a)
  • SP[*e1 = e2] = do a ← SP[e1]
                      b ← SP[e2]
                      m ← lookup mem
                      set mem upd(m,a,b)
                      return b
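
A functional-array memory can be sketched with two term constructors and one simplification rule. Everything here is an illustrative assumption rather than the prover's real term language: terms are tuples, the initial memory is an opaque ("mem0",) term, and the only aliasing reasoning is syntactic (equal address terms, or two distinct numeric constants).

```python
def upd(m, a, v):
    """Memory term equal to m except that address a holds v."""
    return ("upd", m, a, v)


def sel(m, a):
    """Read address a from memory term m, simplifying where it is safe."""
    while m[0] == "upd":
        _, m0, a0, v0 = m
        if a0 == a:
            return v0                  # read-over-write at the same address
        if a0[0] == "num" and a[0] == "num":
            m = m0                     # distinct constants cannot alias
            continue
        break                          # may alias: keep the read symbolic
    return ("sel", m, a)


def sp_load(addr, env):                # SP[*e], with addr = SP[e]
    return sel(env["mem"], addr)


def sp_store(addr, val, env):          # SP[*e1 = e2]
    env["mem"] = upd(env["mem"], addr, val)
    return val
```

A store followed by a load at the same address term returns the stored term directly; a load through an unknown pointer stays as a sel over the whole update chain.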

45
Note
  • Monadic encapsulation crucial from a software
    engineering point of view
  • actually have multiple out-going flow edges due
    to exceptions, return, etc.
  • (see Tan & Appel, VMCAI'06)
  • so the monad actually accumulates (Term × Env ×
    Assn) values for each edge.

46
Diamond Problem Revisited
  • SP[if (e) S1 else S2] ({x=e1 ∧ y=e2 ∧ …} ∧ B)
  • = (SP[S1] ({x=e1 ∧ y=e2 ∧ …} ∧ B ∧ e≠0)) ∨
      (SP[S2] ({x=e1 ∧ y=e2 ∧ …} ∧ B ∧ e=0))
  • = ({x=t1 ∧ y=t2 ∧ …} ∧ B1) ∨
      ({x=u1 ∧ y=u2 ∧ …} ∧ B2)
  • = {x=ax ∧ y=ay ∧ …} ∧
      ((ax=t1 ∧ ay=t2 ∧ … ∧ B1) ∨
       (ax=u1 ∧ ay=u2 ∧ … ∧ B2))

47
How does the environment help?
SP[if (a) x=3 else x=y; if (b) x=5 else
skip] ({x=e1 ∧ y=e2} ∧ B)
  = {x=v ∧ y=e2}
    ∧ ((b=0 ∧ v=t) ∨ (b≠0 ∧ v=5))
    ∧ ((a≠0 ∧ t=3) ∨ (a=0 ∧ t=e2))
    ∧ B

48
Tah-Dah!
  • I've rediscovered SSA.
  • monadic translation sequentializes and names
    intermediate results.
  • only need to add fresh variables when two paths
    compute different values for a variable.
  • so the added equations for conditionals
    correspond to φ-nodes.
  • Like SSA, worst-case O(n^2) but in practice O(n).
  • Best part: all of the VCs for a given procedure
    share the same assertion DAG.
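
The join at the bottom of a conditional can be sketched as follows. The function name join, the tuple encoding, and the naming scheme for fresh variables are assumptions for illustration; both branch environments are assumed to bind the same set of variables.

```python
import itertools


def join(env1, B1, env2, B2, fresh):
    """Merge the (environment, assertion) results of two branches."""
    env, eqs1, eqs2 = {}, [], []
    for x in env1:
        if env1[x] == env2[x]:
            env[x] = env1[x]                      # paths agree: nothing new
        else:
            a = ("sym", f"{x}_{next(fresh)}")     # plays the role of a φ-node
            env[x] = a
            eqs1.append(("eq", a, env1[x]))
            eqs2.append(("eq", a, env2[x]))
    # Each disjunct records which value the fresh variable took on that path.
    return env, ("or", ("and", *eqs1, B1), ("and", *eqs2, B2))
```

Only variables whose terms differ between the two branches receive a fresh name, so B1 and B2 are referenced once each and the assertion that existed before the conditional stays physically shared underneath them.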

49
Scaling

50
Space Scaling

51
So far so good
  • Of course, I've glossed over the hard bits
  • loops
  • memory
  • procedures
  • Let's talk about loops first

52
Widening
  • Given A ∨ B, calculate some C such that A ⇒ C and
    B ⇒ C and |C| < |A|, |B|.
  • Then we can compute a fixed-point for loop
    invariants iteratively
  • start with pre-condition P
  • process loop-test & body to get P'
  • see if P' ⇒ P. If so, we're done.
  • if not, widen P ∨ P' and iterate.
  • (glossing over variable scope issues.)
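
The iteration can be sketched with a deliberately crude assertion domain: a set of primitive facts read as a conjunction. Under that assumption, P' ⇒ P is a superset test and widening is set intersection. The function name loop_invariant, the string-valued facts, and the max_iters cap are all hypothetical; the body argument is a caller-supplied function that maps the facts holding at the loop head to the facts holding after the test and body.

```python
def loop_invariant(pre, body, max_iters=10):
    """Iterate to a set of facts that is preserved by the loop.

    pre  : iterable of facts known on loop entry (the pre-condition P)
    body : function from a set of facts at the loop head to the set of
           facts that hold after one pass through the test and body (P')
    """
    inv = frozenset(pre)
    for _ in range(max_iters):
        post = frozenset(body(inv))
        if post >= inv:          # P' ⇒ P: every fact of P survives the body
            return inv
        inv = inv & post         # widen P ∨ P': keep only the common facts
    return frozenset()           # give up: the invariant "true"
```

Because each widening step strictly shrinks a finite set, the loop stops after at most as many rounds as there are initial facts; the cap is only a safety net.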

53
Our Widening
  • Conceptually, to widen A ∨ B
  • Calculate the DNF
  • really only traverse assertion DAG by memoizing
  • Factor out syntactically common primitive
    relations
  • In practice, we do a bit of closure first.
  • e.g., normalize terms & relations.
  • e.g., x=e expands to x ≤ e ∧ x ≥ e.
  • Captures any primitive relation that was found on
    every path.
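
One way to read this recipe: the primitive relations found on every path are the union of the children's facts at a conjunction and the intersection at a disjunction, which avoids building the DNF explicitly. The sketch below follows that reading with a memo table keyed on node identity. The tuple encoding and the names facts and widen are assumptions; the only closure step included is the expansion of equalities.

```python
def facts(a, memo=None):
    """Primitive relations that hold on every path (disjunct) of assertion a."""
    memo = {} if memo is None else memo
    key = id(a)                        # shared DAG nodes are visited once
    if key in memo:
        return memo[key]
    tag = a[0]
    if tag == "true":
        out = frozenset()
    elif tag == "and":
        out = frozenset().union(*(facts(c, memo) for c in a[1:]))
    elif tag == "or":                  # assumes at least one disjunct
        out = frozenset.intersection(*(facts(c, memo) for c in a[1:]))
    elif tag == "eq":                  # closure: x=e expands to x≤e ∧ x≥e
        out = frozenset({("le", a[1], a[2]), ("ge", a[1], a[2])})
    else:
        out = frozenset({a})           # any other primitive relation
    memo[key] = out
    return out


def widen(a, b):
    """Facts common to both a and b, i.e. a weakening of a ∨ b."""
    return facts(("or", a, b))
```

Widening "i = 0 ∧ i < n" against "i ≥ 0 ∧ i < n" keeps i ≥ 0 and i < n and drops i ≤ 0, which is the kind of simple for-loop invariant the later slides rely on.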

54
Note on Explicit Substitution
  • Originally, we used explicit substitution.
  • widen S (Subst(S',a)) = widen (S ∘ S') a
  • widen S (x as Prim(…)) = {S(x)}
  • widen S (And(a1,a2)) = widen S a1 ∪ widen S a2
  • ...
  • Had to memoize w.r.t. both S and A.
  • rarely encountered same S and A.
  • result was that memoizing didn't help.
  • ergo, back to tree traversal.
  • Of course, you get more precision if you do the
    substitution (but it costs too much.)

55
Back to Loops
  • The invariants we generate aren't great.
  • worst case is that we get "true"
  • we do catch loop-invariant variables.
  • if x starts off at i, is incremented and is
    guarded by x < e <= MAXINT then we can get x >=
    i.
  • But
  • covers simple for-loops well
  • it's fast: only a couple of iterations
  • user can override with explicit invariant (note:
    only 2 loops in string library annotated this
    way, but plan to do more.)

56
Procedures
  • Originally, intra-procedural only
  • Programmers could specify pre/post-conditions.
  • Recently, extended to inter-procedural
  • Calculate SP's and propagate to callers.
  • If too large, we widen it.
  • Go back and strengthen pre-condition of
    (non-escaping) callees by taking "disjunction"
    of all call sites' assertions.

57
Summary of VC-Generation
  • Started with textbook strongest post-conditions.
  • Effects: Rewrote as monadic translation.
  • Diamond: Factored variables into an environment
    to preserve sharing (SSA).
  • Loops: Simple but effective widening for
    calculating invariants.
  • Memory: array-based approach.
  • Extended to inter-procedural summaries.

58
Proving
  • Original plan was to use off-the-shelf
    technology.
  • e.g., Simplify, SAT solvers, etc.
  • But haven't gotten around to it yet.
  • use very simple, custom prover

59
2 Prover(s)
  • Simple Prover
  • Given a VC A ⇒ C
  • Widen A to a set of primitive relns.
  • Calculate DNF for C and check that each
    disjunct is a subset of A.
  • (C is quite small so no blowup here.)
  • This catches a lot
  • all but about 2% of the checks we eliminate!
  • void f(int @x) { …*x… }
  • if (x != NULL) …*x…
  • for (i=0; i < numelts(A); i++) …A[i]…

60
2nd Prover
  • Given A ⇒ C, try to show A ∧ ¬C inconsistent.
  • Conceptually
  • explore DNF tree (i.e., program paths)
  • the real exponential blow up is here.
  • so we have a programmer-controlled throttle on
    the number of paths we'll explore (default 33).
  • accumulate a set of primitive facts.
  • at leaves, run difference constraint algorithm
  • care to treat overflow properly
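
The leaf step can be illustrated with the standard graph formulation of difference constraints: each fact x - y <= c becomes an edge from y to x with weight c, and the facts are inconsistent exactly when the graph has a negative cycle. The sketch below uses Bellman-Ford relaxation for that test. The function names and the triple encoding are assumptions, and because Python integers are unbounded the overflow care mentioned above is not modelled.

```python
def consistent(constraints):
    """constraints: list of (x, y, c), each meaning x - y <= c over integers.

    Returns False exactly when the constraint graph has a negative cycle.
    """
    nodes = {v for x, y, _ in constraints for v in (x, y)}
    dist = {v: 0 for v in nodes}           # implicit source with 0-weight edges
    for _ in range(len(nodes) + 1):
        changed = False
        for x, y, c in constraints:        # edge y -> x with weight c
            if dist[y] + c < dist[x]:
                dist[x] = dist[y] + c
                changed = True
        if not changed:
            return True                    # distances settled: satisfiable
    return False                           # still relaxing: negative cycle


def proves(assumptions, goal):
    """Show assumptions ⇒ goal by refuting assumptions ∧ ¬goal."""
    x, y, c = goal
    negated = (y, x, -c - 1)               # ¬(x - y <= c) is y - x <= -c - 1
    return not consistent(list(assumptions) + [negated])
```

With i < n encoded as ("i", "n", -1) and n <= len as ("n", "len", 0), proves confirms the bounds fact i < len, while the stronger goal i - len <= -5 is not derivable.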

61
Logic or Types?
  • What about dependent types?
  • A Cyclone expression has an indexed monadic type
    {P} r:T {Q}
  • same as Haskell's ST monad
  • except indexed by pre/post-conditions
  • And a Cyclone function is a dependent function
    that yields a monadic term: Π y:A. {P} r:T {Q}
  • And a subset is a sum: Σ y:A. P
  • So really, it's both.

62
Currently
  • Memory
  • The functional array encoding of memory doesn't
    work well.
  • e.g., can't accommodate malloc/free
  • doesn't yield modular specs (need modifies)
  • Can we adapt separation logic? Will it actually
    help?
  • What's the full type theory look like?
  • Work with A. Nanevski & L. Birkedal is a start.

63
False Positives
  • We still have 2,000 checks left.
  • I suspect that most are not needed.
  • How to draw the eye to the ones that are?
  • strengthen pre-conditions artificially (e.g.,
    assume no aliasing, overflow, etc.)
  • if we still can't prove the check, then it should
    be moved up to a "higher-rank" warning.

64
Lots of Borrowed Ideas
  • ESC (M3, Java)
  • Touchstone, Special-J, CCured
  • SPLint (LCLint)

65
More info...
http://cyclone.thelanguage.org