Title: Systematic Derivation of Concurrent Garbage Collectors
1Systematic Derivation of Concurrent Garbage Collectors
Martin Vechev
Eran Yahav
David Bacon
University of Cambridge
IBM T.J. Watson Research Center
2Garbage Collection
- Automatic reclamation of memory
- Approximates liveness by reachability
- A wide range of techniques
- Reference counting collectors
- Mark-Sweep collectors
- Copying collectors
3GC is Simple
- Live objects = roots + objects reachable from roots
- Step 1: garbage detection
- Separates live from dead objects
- Step 2: garbage reclamation
- Reclaim the space of dead objects
4Example Stop-the-world Tracing
[Figure, built up over slides 4-7: a heap with objects A, B, C, D shown at successive points in time. Final step: GC reclaims D.]
8GC is Useful
- Pros
- Modular design
- Elimination of bugs
- Reduces memory leaks
- Cons
- Overhead of computing reachability
- Pause time
9Concurrent GC is Useful
- GC interleaved with program execution
- Pros
- Modular design
- Elimination of bugs
- Reduces memory leaks
- Shorter pause times (even real time!)
- Cons
- Notoriously hard to get right
10Challenges
- Difficult to understand
- Largely ad-hoc approach to algorithms
- Difficult to get right
- Wrong algorithms: initial errors in Dijkstra and Steele collectors. Error in DG collector (POPL93)
- Difficult to verify
- Wrong proofs: Ben-Ari, Pixley
- No formal relation between algorithms
- Largely folklore (even very little empirical
comparison)
11Garbage Collection
[Figure: design space of garbage collectors, with labels: ref counting, tracing; stw, inc, concurrent; moving, nonmoving.]
12Current State
[Figure: map of prior work, grouped under the labels UNIFICATIONS, PROOFS, and ALGORITHMS. Entries shown: Demmers (STW) 89; Dewar (STW) 82; Bacon (STW) 04; Nieto (Isabelle/HOL) 02; Snepscheut 87; Havelund (PVS); Russinoff (BM); Pixley (S, E); Gries 78; Ben-Ari (L, S) 84; Ben-Ari (B, E); Azatchi 03; Boehm; Barabash; Dijkstra; Steele (C); Yuasa; DG; DLG; Domani.]
13Concurrent GC Current State
Boehm 91
Azatchi 03
Steele 75
DLG 94
Barabash 03
Yuasa 90
Domani 00
Dijkstra 78
14Vision
- Automatic generation of concurrent collectors
- Correctness by construction
- Controlling precision/performance
15This Work
- Concurrent
- Some atomicity constraints
- Single mutator, single collector (not parallel)
- Tracing
- Computes transitive closure from roots
- Non-Moving
- Collector does not relocate objects
- Exact
- Assumes pointers identified precisely
16Contributions
- A new parametric model of concurrent GC
- Correctness-preserving transformations
- Systematic exploration of algorithm space
- Better understanding
- Useful new algorithms
- Definition of relative precision between
algorithms
17Contributions Partial View
18Why is it Hard?
[Figure, built up over slides 18-21: a heap with objects A, B, C, D and references R1, R2, shown over time. Steps:
1. GC marks B
2. Mutator creates R2
3. Mutator removes R1
4. GC reclaims C and D, which are still live: WRONG!]
22Why is it Hard?
- Collector needs to account for concurrent mutations by mutator
- Collector requires mutator cooperation
- Mutator defensively and conservatively handles objects that may be lost
- e.g., mutator marks objects, mutator greys-out objects
- Write-barrier, allocation-barrier
- Mutator may require some collector information
- e.g., collector progress through the heap (wavefront)
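The lost-object scenario from the preceding slides can be replayed in a few lines. This is a minimal sketch under assumptions: the heap is modeled as a dictionary of fields, the collector traces one (object, field) pair at a time, and the field names `b`, `r1`, `r2`, `d` and all helper names are hypothetical, not taken from the slides.

```python
def trace_step(heap, marked, pending):
    """Collector: trace one (object, field) pair from the pending stack."""
    obj, fld = pending.pop()
    dst = heap[obj][fld]
    if dst is not None and dst not in marked:
        marked.add(dst)
        pending.extend((dst, f) for f in heap[dst])


def reachable(heap, root):
    """Ground truth: everything reachable from root in the current heap."""
    seen, stack = {root}, [root]
    while stack:
        for dst in heap[stack.pop()].values():
            if dst is not None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def lost_object_demo():
    # Assumed heap shape: A -> B, A -R1-> C, C -> D; B has an empty slot r2.
    heap = {
        "A": {"r1": "C", "b": "B"},
        "B": {"r2": None},
        "C": {"d": "D"},
        "D": {},
    }
    marked, pending = {"A"}, [("A", "r1"), ("A", "b")]
    trace_step(heap, marked, pending)   # GC traces A.b and marks B
    trace_step(heap, marked, pending)   # GC traces B.r2, which is still null
    heap["B"]["r2"] = "C"               # mutator creates R2 (B -> C)
    heap["A"]["r1"] = None              # mutator removes R1 (A -> C)
    while pending:                      # GC finishes: A.r1 is now null
        trace_step(heap, marked, pending)
    return reachable(heap, "A") - marked  # live objects the GC never marked
```

Calling `lost_object_demo()` returns the objects that are live but unmarked, which is exactly the set a naive collector would reclaim by mistake.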
23Ad hoc Solution 1 Deletion Protection
[Figure, built up over slides 23-26: the same heap (objects A, B, C, D; references R1, R2) shown over time. Steps:
1. GC marks B
2. Mutator creates R2
3. Mutator removes R1, defensively marks C
4. GC marks D: all is well!]
27Ad hoc Solution 1 Are We Done?
- Mutator defensively marking on deletion
- All objects reachable when cycle started considered as live
- Results in large amounts of floating garbage
- Dead objects that are not collected
- Can we do better?
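Deletion protection can be sketched as a write barrier that marks the overwritten target before the store. The code below is an illustrative model only: the dictionary heap, the field names `b`, `r1`, `r2`, `d`, and the helper names are assumptions made for this example.

```python
def mark(heap, marked, pending, obj):
    """Mark obj and schedule its fields for tracing (no-op if null or marked)."""
    if obj is not None and obj not in marked:
        marked.add(obj)
        pending.extend((obj, f) for f in heap[obj])


def write_deletion_barrier(heap, marked, pending, src, fld, new):
    """Mutator store: defensively mark the reference being overwritten."""
    mark(heap, marked, pending, heap[src][fld])
    heap[src][fld] = new


def trace(heap, marked, pending):
    while pending:
        obj, fld = pending.pop()
        mark(heap, marked, pending, heap[obj][fld])


def reachable(heap, root):
    seen, stack = {root}, [root]
    while stack:
        for dst in heap[stack.pop()].values():
            if dst is not None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def deletion_demo():
    heap = {"A": {"r1": "C", "b": "B"}, "B": {"r2": None},
            "C": {"d": "D"}, "D": {}}
    # State after "GC marks B": A and B are marked, A.r1 is not yet traced.
    marked, pending = {"A", "B"}, [("A", "r1")]
    write_deletion_barrier(heap, marked, pending, "B", "r2", "C")   # creates R2
    write_deletion_barrier(heap, marked, pending, "A", "r1", None)  # removes R1, marks C
    trace(heap, marked, pending)                                    # GC marks D
    # The mutator now drops R2 as well: C and D die but stay marked.
    write_deletion_barrier(heap, marked, pending, "B", "r2", None)
    floating = marked - reachable(heap, "A")
    return marked, floating
```

The second return value illustrates the cost noted above: objects that die during the cycle remain marked and survive as floating garbage until the next cycle.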
28Ad hoc Solution 2 Installation Protection
[Figure, built up over slides 28-31: the same heap (objects A, B, C, D; references R1, R2) shown over time. Steps:
1. GC marks B
2. Mutator creates R2, defensively marks C
3. Mutator removes R1
4. GC marks D: all is well!]
32Ad hoc Solution 2 Are We Done?
- Mutator defensively marking on installation
- Results in large amounts of floating garbage
- e.g., short-lived objects
- Already non-trivial implementation
- Mutator needs to know about collector progress through the heap (wavefront)
- Can we do better? Can we do best?
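Installation protection can be sketched in the same style, with the barrier marking the newly installed target instead of the overwritten one. This is a simplified model: it marks unconditionally and does not consult the wavefront, and the heap layout, the field names (`b`, `r1`, `r2`, `tmp`, `d`), the short-lived object `T`, and the helper names are all assumptions made for illustration.

```python
def mark(heap, marked, pending, obj):
    """Mark obj and schedule its fields for tracing (no-op if null or marked)."""
    if obj is not None and obj not in marked:
        marked.add(obj)
        pending.extend((obj, f) for f in heap[obj])


def write_installation_barrier(heap, marked, pending, src, fld, new):
    """Mutator store: defensively mark the reference being installed."""
    mark(heap, marked, pending, new)
    heap[src][fld] = new


def trace(heap, marked, pending):
    while pending:
        obj, fld = pending.pop()
        mark(heap, marked, pending, heap[obj][fld])


def reachable(heap, root):
    seen, stack = {root}, [root]
    while stack:
        for dst in heap[stack.pop()].values():
            if dst is not None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def installation_demo():
    heap = {"A": {"r1": "C", "b": "B"}, "B": {"r2": None, "tmp": None},
            "C": {"d": "D"}, "D": {}, "T": {}}
    # State after "GC marks B": A and B are marked, A.r1 is not yet traced.
    marked, pending = {"A", "B"}, [("A", "r1")]
    write_installation_barrier(heap, marked, pending, "B", "r2", "C")   # creates R2, marks C
    write_installation_barrier(heap, marked, pending, "A", "r1", None)  # removes R1
    # A short-lived object T is stored and dropped again right away.
    write_installation_barrier(heap, marked, pending, "B", "tmp", "T")
    write_installation_barrier(heap, marked, pending, "B", "tmp", None)
    trace(heap, marked, pending)                                        # GC marks D
    floating = marked - reachable(heap, "A")
    return marked, floating
```

Here C and D are kept alive correctly, while the short-lived object ends up as floating garbage because it was marked at installation time.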
33A Trace Model of Concurrent GC
- Why should mutator make eager decisions?
- Let collector make well-informed decisions
- Record interaction history between collector and mutator during tracing
- Collector exposes hidden objects based on entire interaction history
- Repeat re-tracing/exposing until no hidden objects remain
34STW vs. Concurrent
STW_collect:
    R ← addRoots()
    trace(R)
    reclaim()

CONC_collect:
    R ← addRoots()
    do
        trace(R)
        R ← expose(log)
    while (R ≠ ∅)
    reclaim()
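The difference between the two collection loops can be sketched with the phases passed in as callables. This is only a control-flow illustration: the parameter names and the way `log` is threaded through are assumptions, and the real trace and expose phases are not implemented here.

```python
def stw_collect(add_roots, trace, reclaim):
    """Stop-the-world: one trace from the roots, then reclaim."""
    trace(add_roots())
    reclaim()


def conc_collect(add_roots, trace, expose, reclaim, log):
    """Concurrent: re-trace from exposed objects until none remain."""
    r = add_roots()
    while True:            # models do ... while (R is non-empty)
        trace(r)
        r = expose(log)
        if not r:
            break
    reclaim()
```

For example, if `expose` first returns `{"E"}` and then the empty set, `trace` runs twice (once from the roots, once from `{"E"}`) before `reclaim` is called.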
35Collection Cycle
[Figure: timeline of one collection cycle. COLLECTOR: trace, expose, trace, expose, reclaim. MUTATOR: mutate, running alongside the collector.]
36Concurrent Synchronization Skeleton
trace:
    while (pending ≠ ∅)
        (obj, fld) ← removeElement(pending)
        atomic
            dst ← obj.fld
            log ← log · <T, obj, fld, dst, dst>
        if (dst ≠ NULL ∧ dst ∉ marked)
            marked ← marked ∪ {dst}
            pending ← pending ∪ fields(dst)

mutate (src, fld, new):
    atomic
        log ← log · <M, src, fld, src.fld, new>
        src.fld ← new
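The skeleton can be approximated in Python as follows. This is a sketch under assumptions: a `threading.Lock` stands in for the `atomic` sections, the heap is a dictionary of fields, log entries are plain 5-tuples, and the class and attribute names are hypothetical.

```python
import threading


class Skeleton:
    """Collector/mutator synchronization skeleton with a shared interaction log."""

    def __init__(self, heap):
        self.heap = heap              # object -> {field: target or None}
        self.log = []                 # entries (kind, obj, fld, old, new)
        self.marked = set()
        self.pending = []             # (object, field) pairs still to trace
        self.lock = threading.Lock()  # stand-in for the atomic sections

    def mutate(self, src, fld, new):
        """Mutator store: log the old and new value, then write."""
        with self.lock:
            self.log.append(("M", src, fld, self.heap[src][fld], new))
            self.heap[src][fld] = new

    def trace(self):
        """Collector: read each pending field atomically and log what was seen."""
        while self.pending:
            obj, fld = self.pending.pop()
            with self.lock:
                dst = self.heap[obj][fld]
                self.log.append(("T", obj, fld, dst, dst))
            if dst is not None and dst not in self.marked:
                self.marked.add(dst)
                self.pending.extend((dst, f) for f in self.heap[dst])
```

Only the field read (or write) and the log append are inside the critical section, so marking and worklist updates run without blocking the mutator.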
37Interaction Log
[Figure: heap with root r1 and objects A, B, C, D, E, connected through fields f1, f2, f3.]
Events, in log order:
(T1) (r1, f2)
(T2) (A, f1)
(T3) (r1, f3)
(M1) r1.f1 = B
(M2) A.f1 = B
(M3) r1.f3 = E
(M4) A.f2 = null
(M5) r1.f1 = null
(T4) (A, f2)
(T5) (r1, f1)
(M6) A.f3 = null
(M7) A.f1 = null
(T6) (A, f3)
Log:
<T, r1, f2, A, A>, <T, A, f1, null, null>, <T, r1, f3, null, null>,
<M, r1, f1, null, B>, <M, A, f1, null, B>, <M, r1, f3, null, E>,
<M, A, f2, C, null>, <M, r1, f1, B, null>, <T, A, f2, null, null>,
<T, r1, f1, null, null>, <M, A, f3, D, null>, <M, A, f1, B, null>,
<T, A, f3, null, null>
38Collector Wavefront
[Figure, built up in steps: the heap with root r1, objects A, B, C, D, E, and fields f1, f2, f3, redrawn after each event.]
Events shown: (T1) (r1, f2), (T2) (A, f1), (T3) (r1, f3), (M1) r1.f1 = B, (M2) A.f1 = B, (M3) r1.f3 = E, (T4) (A, f2), (T5) (r1, f1)
WF(Log) = (r1, f2), (A, f1), (r1, f3), (A, f2), (r1, f1), (A, f3)
39Apex Starting Point
[Figure, built up in steps: the heap with root r1, objects A, B, C, D, E, and fields f1, f2, f3, redrawn after each event.]
Events shown: (T1) (r1, f2), (T2) (A, f1), (T3) (r1, f3), (M1) r1.f1 = B, (M2) A.f1 = B, (M3) r1.f3 = E, (T4) (A, f2), (T5) (r1, f1)
- Rescan objects mutated behind the wavefront
- exposeApex(log) = {E}
- After marking E on next tracing, can reclaim B, C, D
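One reading of the rescanning rule can be checked against the interaction log from the earlier slide. The sketch below is an assumption-laden model, not the authors' definition: a field counts as "behind the wavefront" once a T entry for it appears in the log, rescanning reads the field's current value, and the function names and the 5-tuple log encoding are hypothetical.

```python
# The interaction log from the "Interaction Log" slide, as (kind, obj, fld, old, new).
LOG = [
    ("T", "r1", "f2", "A", "A"),
    ("T", "A", "f1", None, None),
    ("T", "r1", "f3", None, None),
    ("M", "r1", "f1", None, "B"),
    ("M", "A", "f1", None, "B"),
    ("M", "r1", "f3", None, "E"),
    ("M", "A", "f2", "C", None),
    ("M", "r1", "f1", "B", None),
    ("T", "A", "f2", None, None),
    ("T", "r1", "f1", None, None),
    ("M", "A", "f3", "D", None),
    ("M", "A", "f1", "B", None),
    ("T", "A", "f3", None, None),
]


def wavefront(log):
    """Fields the collector has traced, in tracing order."""
    return [(obj, fld) for kind, obj, fld, _, _ in log if kind == "T"]


def expose_apex(log, marked):
    """Rescan fields mutated behind the wavefront; return unmarked targets."""
    traced, current, dirty = set(), {}, set()
    for kind, obj, fld, _old, new in log:
        current[(obj, fld)] = new          # replay the log to get current values
        if kind == "T":
            traced.add((obj, fld))
        elif (obj, fld) in traced:         # mutation of an already traced field
            dirty.add((obj, fld))
    return {current[f] for f in dirty
            if current[f] is not None and current[f] not in marked}
```

With the marked set `{"r1", "A"}` reached by the first trace, the dirty fields are `(A, f1)` and `(r1, f3)`; only the latter currently holds a reference, so the exposed set is `{"E"}`, matching the slide.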
40Derivation Framework
- Apex: optimistic STW
- Very similar to Steele barriers
- All algorithms in our framework are derived from Apex
- Transformations
- any expose(L) function that exposes a superset of exposeApex(L) is safe
- Explore an algorithm-space by exploring safe expose functions
- Allow combined functions where different objects are handled differently
41Precision
- Intuition: an algorithm C1 is more precise than an algorithm C2 if it produces less floating garbage
- Results may vary due to different interleavings
- Given two algorithms in the framework, relate them by relating their expose functions over the same log
- C1 more precise than C2 when expose_C1(L) ⊆ expose_C2(L)
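The subset test can be illustrated on a single log with two hypothetical expose functions. Both functions below are assumptions made for this example: `expose_rescan` is a rescanning rule that reads current field values, and `expose_all_installed` is a coarser rule that exposes every target installed behind the wavefront. Filtering by the marked set is omitted for brevity, and one log is only a spot check, since the definition ranges over all logs.

```python
def expose_rescan(log):
    """Current values of fields that were mutated after being traced."""
    traced, current, dirty = set(), {}, set()
    for kind, obj, fld, _old, new in log:
        current[(obj, fld)] = new
        if kind == "T":
            traced.add((obj, fld))
        elif (obj, fld) in traced:
            dirty.add((obj, fld))
    return {current[f] for f in dirty if current[f] is not None}


def expose_all_installed(log):
    """Every target installed into an already traced field, even if overwritten."""
    traced, exposed = set(), set()
    for kind, obj, fld, _old, new in log:
        if kind == "T":
            traced.add((obj, fld))
        elif (obj, fld) in traced and new is not None:
            exposed.add(new)
    return exposed


def at_least_as_precise(expose_c1, expose_c2, log):
    """Check expose_c1(L) ⊆ expose_c2(L) on one given log L."""
    return expose_c1(log) <= expose_c2(log)


# A.f1 is traced while null, then B is installed and overwritten by E.
SAMPLE_LOG = [
    ("T", "A", "f1", None, None),
    ("M", "A", "f1", None, "B"),
    ("M", "A", "f1", "B", "E"),
]
```

On `SAMPLE_LOG` the rescanning rule exposes only E, while the coarser rule also exposes the already overwritten B, so the first is at least as precise as the second on this log but not the other way around.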
42Precision
- Useful property for relating algorithms
- Should be a reference point for practical comparisons
- no ad-hoc methods
- Hard to do manually: need a tool to provide insights
- Finding the right definition was harder than proving safety, yet simpler than defining "more concurrent"
43Transformations
- Correctness-Preserving
- Precision-Reducing
- At the object-level
44Algorithm-space Dimensions
- Wavefront: DW = ⟨FL, OL⟩, OL = U - FL
- Policy: DP = ⟨SR, LR⟩
- Threshold: DT = ⟨Cinf, ..., Ck, ..., C1⟩
- Protection: DR = ⟨IS, DS⟩
- Allocation: DA = ⟨WC, YC, BC⟩
- Ordered partitions over object-universe
45Wavefront Dimension (WF)
FL = {A}, OL = {G, E, F, B}
[Figure, built up over slides 45-48: heap with objects A, B, C, D, E, F, G and reference R2; the wavefront WF grows as the collector advances.]
WF = (G,), (E,), (F,), (B,)
WF = (G,), (E,), (F,), (B,), (A, f1)
WF = (G,), (E,), (F,), (B,), (A, f1), (A, f2)
WF = (G,), (E,), (F,), (B,), (A, f1), (A, f2), (A, f3)
49Wavefront and Precision
P: FL = {A}, OL = {G, E, F, B}
Q: FL = ∅, OL = {G, E, F, B, A}
[Figure, built up over slides 49-52: the heap (objects A, B, C, D, E, F, G; reference R2) drawn side by side for P and Q; in the last step several references are removed (X).]
Initially, for both P and Q:
WF = (G,), (E,), (F,), (B,)
Once A is reached, the two wavefronts differ:
WF = (G,), (E,), (F,), (B,), (A,)
WF = (G,), (E,), (F,), (B,), (A, f1)
(assuming installation protection)
53 Threshold Dimension (DT)
C4 = {A}
[Figure, built up over slides 53-58: heap with objects A, B, C, D, E, F, G; references to A are added and then removed (X). The value shown for M(A) at each step is 0, 1, 2, 1, 0, 0.]
59Threshold and Precision
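[Figure, built up over slides 59-61: the heap (objects A, B, C, D, E, F, G) drawn side by side for two algorithms P and Q. The values shown are M(A) = 0 and M(A) = 0, then M(A) = 1 and M(A) = 1, then M(A) = 0 and M(A) = 1. The last step asks how P and Q are related by precision.]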
62Apex
- DW = ⟨FL = U, OL = ∅⟩
- DP = ⟨SR = U, LR = ∅⟩
- DT = ⟨Cinf = U, ..., C1 = ∅⟩
- DR = ⟨IS = U, DS = ∅⟩
- DA = ⟨WC = U, YC = ∅, BC = ∅⟩
63Instantiating an Algorithm
[Figure: heap with objects A, B, C.]
- DW = ⟨FL = {A, B}, OL = {C}⟩
- DP = ⟨SR = {A, B}, LR = {C}⟩
- DT = ⟨Cinf = ∅, C2 = {A, B}, C1 = {C}⟩
- DR = ⟨IS = {A, B, C}, DS = ∅⟩
- DA = ⟨WC = {C}, YC = ∅, BC = ∅⟩
64Transformation Example
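- Move object B from FL to OL
Before: DW = ⟨FL = {A, B}, OL = {C}⟩
After: DW = ⟨FL = {A}, OL = {B, C}⟩
Unchanged: DP = ⟨SR = {A, B}, LR = {C}⟩, DT = ⟨Cinf = ∅, C2 = {A, B}, C1 = {C}⟩, DR = ⟨IS = {A, B, C}, DS = ∅⟩, DA = ⟨WC = {C}, YC = ∅, BC = ∅⟩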
65Expose Computation
CONC_collect:
    R ← addRoots()
    do
        trace(R)
        R ← expose(L)
    while (R ≠ ∅)
    reclaim()

expose(L) = expose_DW(L) ∪ expose_DP(L) ∪ expose_DT(L) ∪ expose_DR(L) ∪ expose_DA(L)

expose_DT(L) = { n | n = L_i.new ∧ M(n, L) > 0 ∧ n ∈ IS ∧ 0 ≤ i < |L| }
Instantiations given as tuples (DW, DP, DT, DR, DA):
APEX (U, U, U, U, )
DIJKSTRA (stacks, U, , U, )
HYBRID-YC (stacks, A, , , )
YUASA (stacks, A, , , U)
Other named instantiations: STEELE, STEELE-YC, STEELE-D, STEELE-D-YC, STEELE-BC, STEELE-D-BC, DIJKSTRA-OLD, DIJKSTRA-YC, DIJKSTRA-BC
67Termination
- Two causes of non-termination
- allocated objects vs. existing objects
- Question: how to guarantee termination?
- Existing objects: by-product of atomic in the skeleton
- Allocated: a new dimension DA and an expose_DA
68Rescanning Concurrency Issues
- A and B must be rescanned atomically
- Solution: expose called atomically
- too restrictive
[Figure: three snapshots of a heap with objects A, B, C.]
69Termination and Allocation
- Collector chasing allocations
- How do you guarantee termination?
[Figure: heap with objects A, B, C, D, E, F, G.]
70Allocation and Termination
- Answer: adjust relative speed -> YC and BC
- Insight: new nodes are leaves
DA = ⟨WC, YC, BC⟩
71Allocation and Termination
WC = {A}, YC = ∅, BC = ∅
WC = ∅, YC = {A}, BC = ∅
WC = ∅, YC = ∅, BC = {A}
[Figure: objects A and B, drawn once for each of the three cases.]
- YC and BC are not traced through by the collector
72Allocation
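[Figure: heap with objects A, B, C, D, E, F, G.]
73Allocation White
WC = {N}
[Figure: the same heap with a newly allocated object N; M(N) = 1.]
74Allocation Yellow
[Figure: the same heap with newly allocated objects N and T; M(N) = 1, M(T) = 1.]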
75Future Work From Mutator to Collector and Back
- Collector trace semantics back to state-based semantics
- Compute expose in write/alloc barriers
- Might affect concurrency/precision properties
76Future Work Cost Models and Tradeoffs
- Precision is only one dimension of the problem
- Concurrency transformations
- Algorithm C1 more concurrent than an algorithm C2
- Investigate trade-offs between concurrency and precision
- Intuition: concurrency increasing => precision decreasing
- Other cost models?
77Future Work Simulation and Evaluation
- Comparing algorithms side-by-side
- Intuition for precision / concurrency
- Possibly new insights
- Considering all/many interleavings is valuable
and impossible to do manually
78Future Work Performance Evaluation
- Evaluate derived algorithms for real
- But based on formal cost models
79Future Work The Next 700 Concurrent Collectors
[Figure: design space of garbage collectors, with labels: ref counting, tracing; stw, inc, concurrent; moving, nonmoving.]
80Future Work Roadmap
Can we relate other algorithms ?
Formal Spec Verification - equiv. classes
Performance Evaluation - based on formal desc.
Cost/Benefit concurrency vs. precision
Mutator computations
Property Tools Needed
Derive New Algorithms ?
This Work
81The End
Special thanks: Noam Rinetzki, Roman Manevich, PLDI reviewers
David F. Bacon
Martin Vechev
Eran Yahav
82Backup Slides
83Precision Relation
- Defn: given two collection algorithms C1 and C2,
- C1 is more precise than C2 when, for any global state of C2 with log L where pending is empty,
- expose_C1(L) ⊆ expose_C2(L), denoted by C1 ⊑ C2
- ⊑ is a pre-order relation
- transitive and reflexive, but not anti-symmetric
- Inter-relation between algorithms
- Space only
- Concurrency: formalize "more concurrent"?
84Precision Questions
- What did not work
- def: for all t1 ∈ C1 vs. for all t2 ∈ C2
- difficult to outline how to find a witness trace
- What worked, but was unsatisfactory
- compare best traces: the witness trace was in effect too far away. Need continuous.
- Nice thing would be
- precision -> correctness.
- (depends on precision definition)
85Relating Algorithms
Steele
Dijkstra
Hybrid
Yuasa
86Questions
87Why Transformations?
- FFUFHR
- Exploring parameters of high-level algorithm causes lower-level transformations
- If you look at barrier implementations, these are really code transformations
- This is only half of the story
- Concurrency transformations are really transformations
- (not explained in paper, but yellow objects require code change)
- Midway points are also useful
88Isn't that just playing with specifications?
- You can really implement Apex as-is
- Some real algorithms use write-barrier buffers
- The essence of Apex, though, is the synchronization skeleton: its atomicity constraints
- We have a method for taking these specs and auto-generating the barrier code
89Which algorithms are interesting?
- Combinations with the allocation dimension
- E.g., Dijkstra allocating black
90Why this precision def? What does it mean?
- Intuitively, floating garbage is something you'd like to minimize
- This is still just a mathematical definition, maybe a more intuitive one exists
- Hard to experiment without automated tools