Title: Stacks, Heaps and Regions: One Logic to Bind Them
1Stacks, Heaps and Regions: One Logic to Bind Them
- David Walker
- Princeton University
- SPACE 2004
2Stacks, Heaps and Regions: One Logic to Bind Them
- David Walker
- Princeton University
- With Amal Ahmed and Limin Jia
3Certifying Compilers
Source Program
- Certifying compilers produce
- machine code
- safety proof
- type safety
- thread safety
- memory safety
- Uses
- trustworthy mobile code
- safety-critical systems
- compiler debugging
Certifying Compiler
Machine Code Safety Proof
4Certifying Compilers
- Low-level typing abstractions
- support diverse source languages
- support diverse implementation and optimization strategies
- clean interface between compiler and mechanical safety checkers
Java
C
ML
Transform, Optimize
Low-level Typing Abstractions
Machine Code Safety Proof (Typing Invariants
Encoded)
5TALx86 Lessons Morrisett et al.
- Checking control-flow safety is fairly easy
- State and memory management is the hard part
- new typing algorithms for each new compiler trick
- machine register state
- heap memory (pointers, structs, ...)
- stack memory (stack pointers, stack structs, ...)
- user-managed memory (more pointers, aliasing info, ...)
- Results
- complex, ad hoc axioms (type checker less trustworthy)
- repeated work
- abstractions not generally composable or reusable
6A Goal for SPACE 20...
- What we are looking for: a new proof-carrying code system / typed assembly language for safe memory management
- More uniform and more general
- Easier to understand (simpler semantics)
- Allows reuse and composition of abstractions
- A promising approach: search for new logics that can capture common storage invariants
- Following Ishtiaq, O'Hearn, Pym, Reynolds, and others: insights on storage semantics and separation logic
- And Pfenning, the CMU crew, and others: logical design techniques and work on logical frameworks
7This Talk
- What recurring properties of memory do we need to reason about in a proof-carrying code system?
- Internalizing storage properties in a modal substructural logic
- Semantics of formulae
- Using the logic to describe state in a low-level type system (briefly)
- Related and future work
- This talk is based on work at TLDI '03 and LICS '03
8Property 1 Separation
- The memory for the heap is separate from the memory for the stack
- The register EAX is separate from register EBX (and ECX, etc...)
- In general, memory A is separate from memory B if the domain of A does not overlap with the domain of B
[Figure: memory locations ℓ74, ℓ75, ℓ7, ℓ8, ℓ9, ℓ14, ℓ15 grouped into stack and heap, alongside registers EAX and EBX]
9Property 1 Separation
- The importance of separation
- If memory A is separate from memory B then updates to A have no impact on B
- E.g. updating the stack does not change values in the heap
- E.g. updating EAX does not change the contents of EBX
- E.g. deallocating region r1 has no impact on region r2 (if they are separate)
- Present in
- Linear type systems
- TALx86
- Ishtiaq, O'Hearn and Reynolds' separation logic
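The separation property can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a memory is modelled as a dict from location names to values, and the names `separate`, `update`, `stack` and `heap` are hypothetical, not taken from the talk.

```python
# Hypothetical model: a memory is a dict from location names to values.
def separate(a: dict, b: dict) -> bool:
    """Two memories are separate when their domains do not overlap."""
    return a.keys().isdisjoint(b.keys())

def update(mem: dict, loc: str, value) -> dict:
    """Return a copy of mem with an existing location overwritten."""
    if loc not in mem:
        raise KeyError(loc)
    return {**mem, loc: value}

stack = {"l7": 1, "l8": 2, "l9": 3}
heap = {"l14": 10, "l15": 20}
assert separate(stack, heap)

# Updating a stack location in the combined memory leaves the heap part unchanged.
whole = {**stack, **heap}
after = update(whole, "l8", 99)
heap_after = {loc: after[loc] for loc in heap}
assert heap_after == heap
```

Because the two domains are disjoint, the heap portion can be recovered unchanged after any update to the stack portion.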
10Property 2 Adjacency
- A struct is a sequence of adjacent locations
- An activation record is a sequence of adjacent locations
- A stack is a sequence of adjacent activation records
- In general, A is adjacent to B if the greatest location in A is next to the least location in B, and A is separate from B
[Figure: adjacent locations ℓ7, ℓ8, ℓ9; a stack drawn as activation records a1, a2, rest..., with a1 at the top]
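A minimal sketch of the adjacency definition, assuming locations are integers and that "next to" means the successor is the location plus one. Both choices, and the name `adjacent`, are assumptions made for illustration.

```python
def adjacent(a: dict, b: dict) -> bool:
    """A is adjacent to B: separate, and max(dom A) is directly before min(dom B)."""
    if not a or not b:
        return False  # assumption: empty memories are not treated as adjacent here
    # The disjointness test mirrors the definition; it is implied by the
    # ordering test but kept so the two conditions are visible.
    return a.keys().isdisjoint(b.keys()) and max(a) + 1 == min(b)

record1 = {7: "x", 8: "y"}
record2 = {9: "z"}
assert adjacent(record1, record2)
assert not adjacent(record2, record1)  # adjacency is ordered
```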
11Property 2 Adjacency
- The importance of adjacency
- If memory A is adjacent to memory B and we can access A then we can access B
- E.g. using a pointer to the beginning of a struct, we can access all of its elements
- E.g. using a pointer to the top of the stack, we can access the items in the current activation record
- Present in
- TALx86
- Foundational PCC (Appel et al.)
- Ordered type systems (Petersen et al.)
12Property 3 Containment
- Register EAX can contain an integer value (or a pointer value or other kinds of values)
- A memory location (say, ℓ7) can contain a sequence of 32 bits
- A user-managed memory region may contain a collection of memory locations.
[Figure: register EAX containing 3; location ℓ7 containing bits 31 ... 0 (on, on, off, ...); region R7 containing locations such as ℓ7 and ℓ22]
13Property 3 Containment
- The importance of containment
- If A is contained in memory region r and region r has property P then A has property P
- E.g. EAX may contain an integer; if so, we can add 3 to the contents of EAX
- E.g. memory region R1 may contain live data; if so, we can dereference pointers into that region
- Present in
- Tofte and Talpin's region calculus
- Cardelli, Gardner, Ghelli and Gordon's ambient, tree and graph logics
- TALx86 (registers, static data segment, stack, heap)
14Property 4 Aliasing
- Two pointers are aliases of one another if they refer to the same location.
- Aliasing information is important since, when x and y are aliases, changing memory at x changes memory at y
- Present in
- every system!!
- TALx86 reasoned about heap aliases and stack aliases
[Figure: pointers x and y (x = y) referring to the same location, which holds 3]
15This Talk
- What recurring properties of memory are convenient for reasoning in a proof-carrying code system?
- Internalizing storage properties in a modal substructural logic
- Semantics of formulae
- Using the logic to describe state in a low-level type system
- Related and future work
- This talk is based on work at TLDI '03 and LICS '03
16Preliminaries - Memories
- A memory is a mapping from locations to values.
- Each location may have a single successor.
- Successor relation gives rise to an ordering.
- Locations may be composite: ℓ.n, e.g. .R1.a7 and .R2.a14.b0
[Figure: example memory m with locations ℓ5, ℓ6, ℓ7, ℓ9, ℓ16, ℓ17 and their contents]
17Formulae
- Predicates q ::= t
- Formulae F ::= q
- Semantics of formulae given by m ⊨ F @ ℓ
- F describes memory m, whose contents are located in place ℓ (ℓ acts like a constraint on the memory)
- Simplest case
- m ⊨ t @ ℓ iff dom(m) = {ℓ} and ⊢ m(ℓ) : t
18Formulae
- Example
- m ⊨ int @ ℓ3 if m is the memory shown (notice ⊢ m(ℓ3) : int)
[Figure: memory m with the single location ℓ3 holding 5]
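The simplest case can be checked mechanically. The sketch below assumes a memory is a dict from location names to Python values and that the types `int` and `bool` map onto Python's own. The helper names `has_type` and `sat_pred` are hypothetical.

```python
def has_type(value, t: str) -> bool:
    """Value typing for a tiny, assumed set of types."""
    if t == "int":
        return isinstance(value, int) and not isinstance(value, bool)
    if t == "bool":
        return isinstance(value, bool)
    raise ValueError(f"unknown type {t}")

def sat_pred(mem: dict, t: str, loc: str) -> bool:
    """m satisfies t at loc iff dom(m) is exactly {loc} and m(loc) has type t."""
    return set(mem) == {loc} and has_type(mem[loc], t)

assert sat_pred({"l3": 5}, "int", "l3")
# A predicate describes the whole memory, so extra locations make it fail.
assert not sat_pred({"l3": 5, "l4": 6}, "int", "l3")
```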
19Formulae Separation
- Predicates q ::= t
- Formulae F ::= q | F1 ⊗ F2
- m ⊨ F1 ⊗ F2 @ ℓ iff there exist disjoint m1 and m2 such that
- m1 ⊨ F1 @ ℓ and m2 ⊨ F2 @ ℓ
- and m = m1 ⊎ m2
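One way to read the separation clause operationally is as a search over every division of the memory into two disjoint parts. The brute-force sketch below assumes formulae are represented as Python predicates on dict memories. `splits`, `sat_sep` and `int_at` are hypothetical names, and the exponential search is for illustration only.

```python
from itertools import combinations

def splits(mem: dict):
    """Yield every way to divide mem into two disjoint parts (m1, m2)."""
    locs = list(mem)
    for k in range(len(locs) + 1):
        for chosen in combinations(locs, k):
            m1 = {l: mem[l] for l in chosen}
            m2 = {l: mem[l] for l in locs if l not in chosen}
            yield m1, m2

def sat_sep(mem: dict, f1, f2) -> bool:
    """mem satisfies the separating conjunction iff some split satisfies f1 and f2."""
    return any(f1(m1) and f2(m2) for m1, m2 in splits(mem))

def int_at(loc):
    """Hypothetical leaf predicate: exactly one location, loc, holding an int."""
    return lambda m: set(m) == {loc} and isinstance(m[loc], int)

m = {"l3": 7, "l16": 5}
assert sat_sep(m, int_at("l3"), int_at("l16"))
assert not sat_sep(m, int_at("l3"), int_at("l3"))  # l3 cannot be used twice
```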
20Formulae Separation
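- Example
- m1 ⊨ F1 and m2 ⊨ F2 (both at the root place)
[Figure: two disjoint memories m1 and m2 over locations such as ℓ3, ℓ5, ℓ7, ℓ8, ℓ9, ℓ16, ℓ17]
21Formulae Separation
- Example
- m1 ⊎ m2 ⊨ F1 ⊗ F2 (at the root place)
[Figure: the combined memory m1 ⊎ m2 containing the locations of both m1 and m2]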
22Formulae Adjacency
- Predicates q ::= t
- Formulae F ::= q | F1 ⊗ F2 | F1 ∘ F2
- m ⊨ F1 ∘ F2 @ ℓ iff there exist adjacent (and disjoint) m1, m2 such that
- m1 ⊨ F1 @ ℓ and m2 ⊨ F2 @ ℓ
- and m = m1 ⊎ m2
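The adjacency clause can be sketched the same way. Because every location in the second part must lie above the greatest location of the first, it is enough to try each cut point of the sorted locations. The sketch assumes integer locations with successor +1 and non-empty parts. `sat_adj` and `cell` are hypothetical names.

```python
def sat_adj(mem: dict, f1, f2) -> bool:
    """mem satisfies the adjacency connective iff it cuts into a lower part
    satisfying f1 and a directly following upper part satisfying f2."""
    locs = sorted(mem)
    for k in range(1, len(locs)):
        if locs[k - 1] + 1 != locs[k]:
            continue  # the two parts would not be adjacent at this cut
        lower = {l: mem[l] for l in locs[:k]}
        upper = {l: mem[l] for l in locs[k:]}
        if f1(lower) and f2(upper):
            return True
    return False

def cell(loc):
    """Hypothetical leaf predicate: exactly one location, loc, holding an int."""
    return lambda m: set(m) == {loc} and isinstance(m[loc], int)

assert sat_adj({7: 1, 8: 2}, cell(7), cell(8))
assert not sat_adj({7: 1, 8: 2}, cell(8), cell(7))  # order matters: no exchange
assert not sat_adj({7: 1, 9: 2}, cell(7), cell(9))  # a gap breaks adjacency
```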
23Formulae Adjacency
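- Example
- m1 ⊨ F1 and m2 ⊨ F2 (both at the root place)
[Figure: two memories m1 and m2 over locations such as ℓ3, ℓ5, ℓ7, ℓ8, ℓ9, ℓ10, ℓ16, ℓ17, where m1 ends directly before m2 begins]
24Formulae Adjacency
- Example
- m1 ⊎ m2 ⊨ F1 ∘ F2 (at the root place)
[Figure: the combined memory m1 ⊎ m2, with the locations of m1 followed by those of m2]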
25Formulae Containment
- Predicates q ::= t
- Formulae F ::= q | F1 ⊗ F2 | F1 ∘ F2 | n[F]
- m ⊨ n[F] @ ℓ iff m ⊨ F @ ℓ.n
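The containment clause only moves the place one step deeper. A minimal sketch, assuming places are tuples of names (the empty tuple standing for the root), memories map full paths to values, and formulae are small tagged tuples. The encoding and the names `sat` and `TYPES` are assumptions for illustration.

```python
TYPES = {"int": int, "char": str}  # assumed mapping from type names to Python types

def sat(mem: dict, formula: tuple, place: tuple = ()) -> bool:
    kind = formula[0]
    if kind == "type":
        # base case: dom(mem) is exactly {place} and the value there has the type
        return set(mem) == {place} and isinstance(mem[place], TYPES[formula[1]])
    if kind == "in":
        # containment: check the inner formula at the extended place
        _, name, inner = formula
        return sat(mem, inner, place + (name,))
    raise ValueError(kind)

m = {("eax",): 5}
assert sat(m, ("in", "eax", ("type", "int")))
assert not sat(m, ("in", "ebx", ("type", "int")))
```

The first assertion mirrors the register example: the memory holds an int inside `eax` because it holds an int at the path `.eax`.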
26Formulae - Containment
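- Example
- m ⊨ eax[int] at the root place, since m ⊨ int @ .eax, since ⊢ m(.eax) : int
[Figure: memory m in which eax holds 5]
27Formulae - Containment
- Example
- m ⊨ eax[int] ⊗ ebx[char] at the root place
[Figure: memory m in which eax holds 5 and ebx holds a]
28Formulae - Containment
- Example
- m ⊨ eax[int] ⊗ ebx[char] at the root place
- since m1 ⊨ eax[int] and m2 ⊨ ebx[char] at the root place
[Figure: memory m in which eax holds 5 and ebx holds a]
29Formulae - Containment
- Example
- m ⊨ eax[int] ⊗ ebx[char] at the root place
- since m1 ⊨ eax[int] and m2 ⊨ ebx[char] at the root place
- since m1 ⊨ int @ .eax and m2 ⊨ char @ .ebx
[Figure: memory m in which eax holds 5 and ebx holds a]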
30Aliasing
- Types t ::= int | bool | S(ℓ) | ...
- Predicates q ::= t
- Formulae F ::= q | F1 ⊗ F2 | F1 ∘ F2 | n[F]
- ⊢ v : S(ℓ) iff v = ℓ (all values with type S(ℓ) are aliases of one another)
31Aliasing
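- Example
- m ⊨ eax[S(.a2)] ⊗ ebx[S(.a2)] ⊗ a2[int] at the root place
[Figure: memory m in which eax and ebx both hold the location .a2, which contains 7; the two registers are aliases]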
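Singleton types make aliasing visible in the type: a value has type S of a location exactly when it is that location. The sketch below assumes a flat dict memory with string location names. `has_singleton_type` is a hypothetical helper.

```python
def has_singleton_type(value, loc) -> bool:
    """v has the singleton type of loc iff v is exactly the location loc."""
    return value == loc

# Both registers hold the location a2, which contains an int.
m = {"eax": "a2", "ebx": "a2", "a2": 7}
assert has_singleton_type(m["eax"], "a2")
assert has_singleton_type(m["ebx"], "a2")

# Since both values have the same singleton type, a write through eax
# is observed by a read through ebx.
m[m["eax"]] = 42
assert m[m["ebx"]] == 42
```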
32One More Useful Predicate
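- Types t ::= int | bool | S(ℓ) | ...
- Predicates q ::= t | more← | more→
- Formulae F ::= q | F1 ⊗ F2 | F1 ∘ F2 | n[F] | ...
- m ⊨ more← and m' ⊨ more→
[Figure: two memories, each an unbounded sequence of adjacent locations (such as ℓ4 ... ℓ9 and ℓ14 ... ℓ19), one for each predicate]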
33Simple Machine Memory Layout
- (more← ∘ ℓhd[t] ∘ Ftail ∘ Fheap ∘ ℓap[t] ∘ more→) ⊗ r1[t1] ⊗ r2[t2] ⊗ . . . ⊗ sp[S(ℓhd)] ⊗ ap[S(ℓap)]
[Figure: a linear memory laid out as more←, ℓhd, Ftail, Fheap, ℓap, more→, with registers sp, r1, r2 and ap; sp points to ℓhd and ap points to ℓap]
34More logic
- Predicates q ::= t | more← | more→
- Formulae F ::= q | F1 ⊗ F2 | F1 ∘ F2 | n[F]
- | 1 | F1 -o F2
- | F1 & F2 | ⊤ | F1 ⊕ F2 | 0
- | f | ∀b. F | ∃b. F
- Bindings b ::= ℓ:L | n:N | a:T | f:F
- m ⊨ 1 iff dom(m) is empty
- m ⊨ F1 & F2 iff m ⊨ F1 and m ⊨ F2
- m ⊨ ⊤ (holds for any memory m)
- ....
35Logical Deduction
- Judgments have the form Θ ‖ Δ ⊢ F @ ℓ
- Θ is a variable context: a list of free variables and their kinds
- Δ is a bunched context: trees rather than lists
- (O'Hearn and Pym, 1999)
- Δ ::= · | (F @ ℓ) | Δ, Δ | Δ; Δ
- The context forms describe an object at a place, adjacent storage (no exchange property), and separate storage (exchange property)
36Logical Deduction
- The natural deduction rules are sound with respect to the storage semantics
- Semantics of contexts: m ⊨ Δ
- Theorem (Soundness)
- If m ⊨ Δ and · ‖ Δ ⊢ F @ ℓ then m ⊨ F @ ℓ.
37This Talk
- What recurring properties of memory are convenient for reasoning in a proof-carrying code system?
- Internalizing storage properties in a modal substructural logic
- Semantics of formulae
- Using the logic to describe state in a low-level type system
- Related and future work
- This talk is based on work at TLDI '03 and LICS '03
38Mini-KAM Simplified ML Kit Abstract Machine
- Registers r ::= acc1 | acc2 | sp
- Values v ::= ....
- Instructions i ::= immed1(v) | immed2(v) | add | sub | push | pop
- | selectStack(i) | storeStack(i)
- | select(i) | store(i)
- | letRgnInf | endRgnInf | alloc(i)
- The instructions are grouped on the slide as register ops, stack ops, and region ops
39Mini-KAM Types
- Types t ::= int | S(ℓ) | live | dead
- | (F @ ℓ) → 0
- Integers: 5 : int
- Places: ℓ : S(ℓ)
- Region status: live : live, dead : dead
- Code locations: c : (F @ ℓ) → 0
- Means it is safe to jump to c with a memory m such that m ⊨ F @ ℓ
40Mini-KAM Simplified ML Kit Abstract Machine
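[Figure: Mini-KAM store with registers acc1, acc2, sp, a stack, and regions R1 . . . Rn]
- Region: R1[live ⊗ F ⊗ (a[-] ∘ more→)], labelled on the slide with: live region, description of data in region, region allocation boundary
- Stack: stack[more← ∘ ak[-] ∘ . . ∘ a1[-] ∘ ...], labelled on the slide with: stack area, current activation record, stack tail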
41Using Formulae in Typing Rules
- Judgments of the form F @ ℓ can be used to describe the pre- and postconditions of instructions
- Instruction typing judgment: Θ ‖ F @ ℓ ⊢ i : F' @ ℓ'
42Using Formulae in Typing Rules
- Judgment: Θ ‖ F @ ℓ ⊢ i : F' @ ℓ'
- In J, look up the type of place ℓ.n:
- J(ℓ.n) = F if · ‖ J ⊢ (⊤ ⊗ n[F]) @ ℓ
- Rule for add instruction:
- premises: (F @ ℓ)(.acc1) = int and (F @ ℓ)(.acc2) = int
- conclusion: Θ ‖ F @ ℓ ⊢ add : F @ ℓ
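The shape of the add rule can be sketched as a small checker. This is a deliberately simplified model, not the talk's type system: the state description is assumed to be a dict from register names to type names rather than a formula, and the lookup is a plain dict access instead of a logical entailment. `check_add` is a hypothetical name.

```python
def check_add(state: dict) -> dict:
    """Check `add`: both accumulators must be described as int.

    The slide's rule gives the same description before and after the
    instruction, so the state description is returned unchanged.
    """
    for reg in ("acc1", "acc2"):
        found = state.get(reg)
        if found != "int":
            raise TypeError(f"{reg} must have type int, found {found!r}")
    return state

ok = {"acc1": "int", "acc2": "int", "sp": "S(.stack.n0)"}
assert check_add(ok) == ok
```

In the real system the premises are discharged by proving that the current formula entails the presence of an int in each accumulator, which is more flexible than a table lookup.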
43Using Formulae in Typing Rules
- Judgment: Θ ‖ J ⊢ i : J' (where J is of the form F @ p)
- Rule (storeStack):
- premises: J(.sp) = S(.stack.n0) and J(.acc1) = t
- conclusion: Θ ‖ J ⊢ storeStack(i) : J[.stack.n0 + i := t]
- In J, update the type of place ℓ.n0 + i:
- J[ℓ.n0 + i := t] = ((F1 ∘ n0[-] ∘ ··· ∘ ni[t] ∘ F2) ⊗ F3) @ ℓ if · ‖ J ⊢ ((F1 ∘ n0[-] ∘ ··· ∘ ni[-] ∘ F2) ⊗ F3) @ ℓ
44This Talk
- What recurring properties of memory are convenient for reasoning in a proof-carrying code system?
- Internalizing storage properties in a modal substructural logic
- Semantics of formulae
- Using the logic to describe state in a low-level type system
- Related and future work
- This talk is based on work at TLDI '03 and LICS '03
45Related Work
- Reasoning about adjacency
- Stack-based TAL (Morrisett et al., 1998)
- Foundational PCC reasoning about memory allocation (Appel et al.)
- λord calculus for reasoning about data layout at the frontier (Petersen et al., 2003)
- Reasoning about aliasing
- Long history . . . singleton types for aliasing (Smith, Walker and Morrisett) continue to be useful
- Spatial logics: separation and/or containment
- BI, separation logic (Ishtiaq, O'Hearn, Reynolds and others, 2000, 2001)
- Ambient logic (Cardelli and Gordon, 2000)
- Tree and graph logics (Cardelli, Gardner, Ghelli, 2002)
46Lots More Work to Do
- Add inductive definitions and syntactic rules for reasoning about arrays and recursive data structures
- Investigate encodings for common invariants
- stack-allocation algorithms
- region-allocation algorithms
- aliasing patterns
- Better understand the connection between modal (hybrid) logic and regions
47Conclusion
- Described a unified framework for reasoning about
- Separation
- Adjacency
- Containment
- Aliasing
- Semantics are sound, simple and uniform
- Logic forms the basis for a sound and flexible low-level type system
- See TLDI '03 and LICS '03 for details
50May Alias Formula
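- when two bits of storage (at a1 and a2) may alias
- ∃a1. ∃a2. (a1[int] ⊗ ⊤) & (a2[int] ⊗ ⊤)
- both memories satisfy the formula
[Figure: two memories, one in which a1 and a2 are distinct locations and one in which they are the same location a; the cells hold values such as 5 and 7]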
51Example Saving Temporaries on the Stack
- Code
- (b-stackgrow)(x 2)
- (b-unpack)(x 2)
- sub sp,sp,2
- st sp[0],r1
- st sp[1],r2
- <Code for A>
- ld r1,sp[0]
- ld r2,sp[1]
- add sp,sp,2
- Describing formulae
- Fpre = (more← ∘ ℓ1[a1] ∘ ℓ2[a2] ∘ ℓ[t] ∘ F1) ⊗ sp[S(ℓ)] ⊗ r1[t1] ⊗ r2[t2]
- Fpost = (more← ∘ ℓ1[a1] ∘ ℓ2[a2] ∘ ℓ[t] ∘ F1) ⊗ sp[S(ℓ1)] ⊗ r1[t1] ⊗ r2[t2]
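The save-and-restore sequence can be simulated directly. The sketch below is an assumption-laden model rather than the talk's machine: registers and memory are dicts, the stack grows toward lower addresses, and the intervening code (passed as the hypothetical callback `code_a`) is assumed to leave `sp` and the two saved slots untouched.

```python
def save_and_restore(regs: dict, mem: dict, code_a) -> tuple:
    """Run the save / <Code for A> / restore sequence on copies of the state."""
    regs, mem = dict(regs), dict(mem)
    regs["sp"] -= 2                     # sub sp,sp,2
    mem[regs["sp"] + 0] = regs["r1"]    # st sp[0],r1
    mem[regs["sp"] + 1] = regs["r2"]    # st sp[1],r2
    code_a(regs)                        # <Code for A>, free to clobber r1 and r2
    regs["r1"] = mem[regs["sp"] + 0]    # ld r1,sp[0]
    regs["r2"] = mem[regs["sp"] + 1]    # ld r2,sp[1]
    regs["sp"] += 2                     # add sp,sp,2
    return regs, mem

def scramble(regs: dict) -> None:
    """Hypothetical stand-in for code A: overwrites both temporaries."""
    regs["r1"] = 0
    regs["r2"] = 0

before = {"sp": 100, "r1": 11, "r2": 22}
after, memory = save_and_restore(before, {}, scramble)
assert after == before  # temporaries and stack pointer are restored
```

The describing formulae play the role of the assumptions stated here: they record that the two stack slots stay separate from whatever code A touches.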
52Formulae Wrapped in Types
- Types t ::= int | S(p) | (F @ p) → 0
- Informally, c : (F @ p) → 0 means it is safe to jump to c with a memory m such that m ⊨ F @ p
53Motivation Certifying Compilers
Source Program
Certifying Compiler
Safety Proof
Machine Code
54Motivation Certifying Compilers
Source Program
Parse, Typecheck
High-level Typed IL
Analysis, Optimization
Type- preserving Compiler
Medium-level Typed IL
Code Generation
Typed Assembly Language
Assembler
Hints
Prover
Safety Proof
Machine Code
55Motivation Certifying Compilers
Java
Java
ML
High TIL High TIL High TIL
Optimize Optimize Optimize
Type- preserving Compiler
Medium-level Typed IL
Code Generation
Typed Assembly Language
Assembler
Hints
Prover
Safety Proof
Machine Code
56Motivation Proof-Carrying Code
- The Princeton foundational PCC system (Appel et al.)
- Scaling PCC to production compilers and realistic languages
- Some requirements
- Multiple source languages, single target language
- Core proof system must be general and flexible
- support for general language features
- handle different implementation and optimization strategies
- Trusted computing base should be small
- to limit security bugs
57PCC System Layers of Abstraction
Compiler
High-level typing abstractions
Low-level typing abstractions
Semantics of types
Machine spec
Higher-order logic
58A Hard Problem (Semantics)
- Semantics of memory updates and memory reuse
- Semantic model of ML-style mutable references (Ahmed, Appel, Virga, 2002)
- To handle ML function closures
- extended model with mutable references to (impredicative) polymorphic types (Ahmed, Appel, Virga, 2003)
- To allow memory reuse
- extended model to support region-based memory management
59Motivation Certifying Compilers
Java
C
ML
High-level Typed IL
Analysis, Optimization
Medium-level Typed IL
Typing abstractions (TAL)
- Should be general and flexible; support many
- language features
- implementation and optimization strategies
Prover
Machine Code Safety Proof
60Typing Abstractions for Memory
- Reasoning about memory is complicated
- many different memory management strategies, aliasing patterns, data layout possibilities, etc.
- Systems for safe mobile code would benefit from
- a unified framework for reasoning about a variety of invariants
- convenient abstractions that help structure proofs of memory safety
61Abstractions for Memory?
62Abstractions for Memory?
Cornell Popcorn Cyclone
Cedilla Systems Special J
Princeton Foundational PCC
Source
Source
Source
High TIL
High TIL
High TIL
Medium TIL
Medium TIL
Medium TIL
TALx86
LTAL
VCGen Prover
Prover
Machine Code Safety Proof
Machine Code Safety Proof
Machine Code Safety Proof
63Abstractions for Memory?
- Reasoning about memory is complicated
- many different memory management strategies, aliasing patterns, data layout possibilities, etc.
64Typing Abstractions for Memory?
- Reasoning about memory is complicated
- many different memory management strategies,
aliasing patterns, data layout possibilities, etc.
66Lessons from Typed Assembly Language
- Lesson 1
- Much of the type theory designed for higher-level languages can be reused to help verify machine code.
- TAL is just the closed, continuation-passing style polymorphic lambda calculus ()
- Lesson 2
- The hard part is memory management and memory safety.
67One Logic to Bind Them
- New goals for general-purpose safe memory management
- composable abstractions
- reusable abstractions
- orthogonal abstractions
- comprehensible abstractions
- A unified composable framework for reasoning about
- separation of objects (memory blocks)
- adjacency of objects
- aliasing of pointers
- containment of one place in another
- Proof that deduction in our logic is sound with respect to the memory model
- Use logic in a type system for an IL for region-based memory management (Mini-KAM) and prove that the language is sound
68This Talk
- Logical formulae and the memory model
- Flat memory
- Hierarchical memory
- Type system for Mini-KAM (informally)
69A Logical Approach to Memory Management
- One logic for reasoning about key storage properties
- separation of objects (memory blocks)
- adjacency of objects
- containment of one place in another
- aliasing of pointers
- Logic comes with
- orthogonal connectives to internalize key properties
- syntactic proof rules
- sound store semantics
- Logic is incorporated into a typed abstract machine
- safe stack, heap and region-based memory management
70Formulae Multiplicative Unit
- Predicates q ::= t
- Formulae F ::= q | F1 ⊗ F2 | F1 ∘ F2 | 1
- m ⊨ 1 iff dom(m) is empty
[Figure: the empty memory m]
71Hierarchical Memories
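[Figure: a hierarchical memory m]
72Hierarchical Memories, Paths
[Figure: memory m drawn as regions R1 and R2, with locations ℓ7, ℓ8, ℓ9, ℓ14, ℓ15 placed inside the regions]
- Path/place p: the root place or p.n, e.g. .R1.ℓ7 and .R2.ℓ14
73Hierarchical Memories, Paths
[Figure: the same memory m with regions R1 and R2 and locations ℓ7, ℓ8, ℓ9, ℓ14, ℓ15]
- Path/place p: the root place or p.n, e.g. .R1.ℓ7 and .R2.ℓ14
- A hierarchical memory is a mapping from paths to values.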
74Formulae Containment
- Predicates q ::= t | more← | more→
- Formulae F ::= q | F1 ⊗ F2 | F1 ∘ F2 | 1
- | F1 & F2 | ⊤ | F1 ⊕ F2 | 0
- | f | ∀b. F | ∃b. F | n[F]
- Bindings b ::= p:P | n:N | a:T | f:F
- Semantics given by m ⊨ F @ p
75Formula Semantics Separation
- Formulae F ::= F1 ⊗ F2 | n[F]
- m ⊨ (F1 ⊗ F2) @ p iff there exist disjoint m1 and m2 such that
- m1 ⊨ F1 @ p and m2 ⊨ F2 @ p
- and m = m1 ⊎ m2
76Formula Semantics Separation
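- Example
- m1 ⊨ F1 and m2 ⊨ F2 (both at the root place)
- dom(m1) = {.R5.ℓ3} and dom(m2) = {.R5.ℓ4}
[Figure: m1 with region R5 containing ℓ3 holding 3; m2 with region R5 containing ℓ4 holding 3]
77Formula Semantics Separation
- Example
- m1 ⊨ F1 and m2 ⊨ F2 (both at the root place)
- dom(m1) = {.R5.ℓ3} and dom(m2) = {.R5.ℓ4}
- m1 ⊎ m2 ⊨ (F1 ⊗ F2) (at the root place)
[Figure: m1 ⊎ m2 with region R5 containing both ℓ3 and ℓ4, each holding 3]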
78Sample Deductive Rules
(hypothesis) Θ ‖ F @ p ⊢ F @ p
(n I) premise: Θ ‖ Δ ⊢ F @ p.n; conclusion: Θ ‖ Δ ⊢ n[F] @ p
(n E) premise: Θ ‖ Δ ⊢ n[F] @ p; conclusion: Θ ‖ Δ ⊢ F @ p.n
- Each connective is defined in terms of judgmental concepts only: no dependencies on other connectives
- Simpler to understand and manipulate