Title: Ulterior Reference Counting: Fast GC Without The Wait
1. Ulterior Reference Counting: Fast GC Without The Wait
- Steve Blackburn, Kathryn McKinley
- Presented by Dimitris Prountzos
- Slides adapted from presentation by Steve Blackburn
2. Outline
- Throughput-Responsiveness problem
- Reference counting optimizations
- Ulterior in detail
- BG-RC in action
- Experimental evaluation
- Conclusion
3. Throughput/Responsiveness Trade-off
- GC and mutator share CPU
- Throughput: net GC/mutator ratio
- Responsiveness: length of GC pauses
4. The Ulterior approach
- Match mechanisms to object demographics
- Copying nursery (young space)
  - Highly mutated, high mortality young objects
  - Ignores most mutations
  - GC time proportional to survivors, space efficient
- RC mature space
  - Low mutation, low mortality old objects
  - GC time proportional to mutations, space efficient
- Generalize deferred RC to heap objects
  - Defer fields of highly mutated objects & enumerate them quickly
  - Reference count only infrequently mutated fields
5. Pure Reference Counting
- Tracks mutations: RCM(p)
- RCM(p) generates a decrement and an increment for the before and after values of p
  - RCM(p) -> RC(p_before)--, RC(p_after)++
- If RC = 0, free
[Figure: RC space with objects a and b, each with reference count 1]
6-8. Pure Reference Counting
[Figure sequence: objects a, b, and c in RC space; the final frame shows only a and c, each with reference count 1]
RCM(p) for every mutation is very expensive
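To make the cost concrete, here is a minimal sketch of pure reference counting in Java. It is a toy model only: `RcObject`, `PureRc`, and the `freed` flag are hypothetical names for illustration, not part of Jikes RVM or MMTk. Every pointer store goes through `write`, which plays the role of RCM(p).

```java
// Hypothetical toy heap object; not the Jikes RVM object model.
final class RcObject {
    final String name;
    int rc = 0;                 // reference count
    boolean freed = false;      // stands in for returning memory to the allocator
    final RcObject[] fields;    // pointer fields

    RcObject(String name, int numFields) {
        this.name = name;
        this.fields = new RcObject[numFields];
    }
}

final class PureRc {
    // RCM(p): store 'after' into src.fields[slot], incrementing the new
    // referent and decrementing the old one.
    static void write(RcObject src, int slot, RcObject after) {
        RcObject before = src.fields[slot];
        if (after != null) after.rc++;   // increment first so re-storing the same value is safe
        src.fields[slot] = after;
        if (before != null) decrement(before);
    }

    // RC(o)--; if the count reaches zero, free o and decrement its referents.
    static void decrement(RcObject o) {
        if (--o.rc == 0) {
            o.freed = true;
            for (RcObject child : o.fields) {
                if (child != null) decrement(child);
            }
        }
    }
}
```

In this sketch each store pays for two count updates and possibly a recursive free, which is the overhead the optimizations on the following slides try to avoid.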
9. RC Optimizations
- Buffering: apply RC(p)--, RC(p)++ later
- Coalescing: apply RCM(p) only for the initial and final values of p (coalesce intermediate values)
  - RCM(p), RCM(p_1), ... RCM(p_n) -> RC(p_initial)--, RC(p_final)++
- Deferral of RCM events
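The buffering and coalescing bullets can be sketched together as follows. This is an illustrative slot-level model, not the collector's actual data structures: `Cell`, `Slot`, `CoalescingBuffer`, and the `rcOperations` counter are all assumed names.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class Cell {            // hypothetical heap object with a reference count
    final String name;
    int rc;
    Cell(String name) { this.name = name; }
}

final class Slot {            // hypothetical pointer slot p
    Cell value;
}

final class CoalescingBuffer {
    // Initial value of each slot mutated since the last flush.
    private final Map<Slot, Cell> initial = new IdentityHashMap<>();
    int rcOperations = 0;     // number of count updates actually applied

    // Mutator side: record the initial value once, then just store.
    void write(Slot p, Cell newValue) {
        if (!initial.containsKey(p)) initial.put(p, p.value);
        p.value = newValue;
    }

    // Collector side: one decrement for the initial value and one increment
    // for the final value, regardless of the intermediate values.
    void flush() {
        for (Map.Entry<Slot, Cell> e : initial.entrySet()) {
            Cell before = e.getValue();
            Cell after = e.getKey().value;
            if (after != null) { after.rc++; rcOperations++; }
            if (before != null) { before.rc--; rcOperations++; }
        }
        initial.clear();
    }
}
```

If p is written n times between flushes, the sketch applies two count updates instead of 2n; intermediate referents are never touched.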
10. Deferred Reference Counting
Goal: Ignore RCM(p) for stacks & registers
- Deferral of p
  - A mutation of p does not generate an RCM(p)
- Correctness
  - For all deferred p: RCR(p) at each GC
- Retain Event RCR(p)
  - p -> o temporarily retains o regardless of RC(o)
  - Deutsch/Bobrow use a Zero Count Table
  - Bacon et al. use a temporary increment
11-12. Classic Deferral
In deferral phase: Ignore RCM(p) for stacks & registers
[Figure sequence: objects a, b, and then c in RC space with their reference counts]
Breaks the RC = 0 invariant
13. Classic Deferral (Bacon et al.)
- Divide execution in epochs
- Store information in buffers
  - Root buffer (RB): Store 1st level objects
  - Increment buffer (IB): Store increments to 1st level objects
  - Decrement buffer (DB): Store decrements to 1st level objects
- At GC time do
  - Look at RB and apply temporary increments to all objects there
  - Process IB of this epoch
  - Look at RB of previous epoch and apply decrements to all objects there
  - Process DB of previous epoch
  - During DB processing recycle o if RC(o) = 0
- Avoid race conditions by
  - Processing IB before DB
  - Processing DB of one epoch behind
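The GC-time steps listed above can be sketched in Java as below. This is a single-threaded simplification under stated assumptions: `Obj`, `EpochRc`, and the buffer field names are hypothetical, objects have no pointer fields (so there is no recursive decrement), and the real scheme runs concurrently with the mutator.

```java
import java.util.ArrayList;
import java.util.List;

final class Obj {             // hypothetical heap object
    final String name;
    int rc;
    boolean freed;
    Obj(String name) { this.name = name; }
}

final class EpochRc {
    List<Obj> incBuffer = new ArrayList<>();          // IB of the current epoch
    List<Obj> decBuffer = new ArrayList<>();          // DB of the current epoch
    private List<Obj> prevRoots = new ArrayList<>();  // RB of the previous epoch
    private List<Obj> prevDecs = new ArrayList<>();   // DB of the previous epoch

    // Runs at GC time; 'roots' are the objects currently referenced from
    // stacks and registers (the root buffer of this epoch).
    void collect(List<Obj> roots) {
        for (Obj o : roots) o.rc++;           // temporary increments for this epoch's RB
        for (Obj o : incBuffer) o.rc++;       // process IB of this epoch
        for (Obj o : prevRoots) release(o);   // undo the previous epoch's temporary increments
        for (Obj o : prevDecs) release(o);    // process DB of the previous epoch
        prevRoots = new ArrayList<>(roots);   // this epoch becomes the "previous" one
        prevDecs = decBuffer;
        incBuffer = new ArrayList<>();
        decBuffer = new ArrayList<>();
    }

    // Apply one decrement and recycle the object if its count drops to zero.
    private void release(Obj o) {
        if (--o.rc == 0) o.freed = true;
    }
}
```

Because all increments are applied before any decrement, and decrements lag one epoch behind, an object referenced only from a root keeps a count of at least one until the collection after the root disappears.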
14. Classic Deferral (Bacon et al.)
At GC time, RCR(p) for root pointers applies temporary increments.
[Figure: objects a, b, and c in RC space, with a root buffer and a decrement buffer]
15-16. Classic Deferral (Bacon et al.)
At next GC, apply decrements
Key: Efficient enumeration of deferred pointers
17. Classic Deferral (Bacon et al.)
Better, but not good enough!
18. Ulterior Reference Counting
- Idea: Extend deferral to select heap pointers
  - e.g. All pointers within nursery objects
- Deferral is not a fixed property of p
  - e.g. A nursery object gets promoted
- Integrate Event I(p)
  - Changes p from deferred to not deferred
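One way to picture the integrate event is the sketch below, which models promotion of a nursery object reached through a remembered old-to-young pointer. It is an assumption-laden toy: `HeapObj`, `Promotion`, and the `inNursery` flag are hypothetical, and copying into RC space is modelled by flipping a flag rather than moving memory.

```java
final class HeapObj {             // hypothetical heap object
    final String name;
    boolean inNursery = true;     // true while the object lives in the non-RC nursery
    int rc;                       // only maintained once the object is in RC space
    final HeapObj[] fields;

    HeapObj(String name, int numFields) {
        this.name = name;
        this.fields = new HeapObj[numFields];
    }
}

final class Promotion {
    // I(p) for a pointer p whose referent is o: p stops being deferred,
    // so it now contributes to RC(o).
    static void integrate(HeapObj o) {
        o.rc++;
        if (o.inNursery) {
            o.inNursery = false;  // "copy" o into RC space (modelled by a flag)
            // o's own pointer fields were deferred while it was in the
            // nursery; they become counted pointers now.
            for (HeapObj child : o.fields) {
                if (child != null) integrate(child);
            }
        }
    }
}
```

Nursery objects that are never reached by `integrate` keep `inNursery == true` and a count of zero, so in this model they are simply discarded when the nursery is reclaimed.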
19. BG-RC: Bounded Nursery Generational - RC
- Heap organization
  - Bounded copying nursery
    - Ignore mutations to nursery pointer fields
  - RC old space
    - Object remembering, coalescing, buffering
- Collection
  - Process roots
  - Nursery phase promotes live p to old space and I(p)
  - RC phase processes object buffer, dec buffer
20. View of heap in Ulterior RC
[Figure: objects a, b, d, e, r, s, and t across RC space and non-RC space, with pointers labelled "defer" or "remember"]
- How can we efficiently
  - Enumerate all deferred pointer fields?
  - Remember old to young pointers?
21. Bringing it Together
- Deferral
  - Defer nursery & roots
  - Perform I(p) on nursery promotion
  - Piggyback on copying nursery collection
- Coalescing
  - Remember mutated RC objects
  - Upon first mutation, dec each referent
  - At GC time, inc each referent
  - Piggyback remset onto this mechanism
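The coalescing bullets describe object-level remembering, which can be sketched as follows. The names `Node`, `ObjectRemembering`, `modBuffer`, and `decBuffer` are assumptions for illustration; the sketch omits freeing, the nursery, and any synchronization.

```java
import java.util.ArrayList;
import java.util.List;

final class Node {                // hypothetical object in RC space
    final String name;
    int rc;
    boolean logged;               // set once the object has been remembered
    final Node[] fields;

    Node(String name, int numFields) {
        this.name = name;
        this.fields = new Node[numFields];
    }
}

final class ObjectRemembering {
    final List<Node> modBuffer = new ArrayList<>();  // remembered (mutated) objects
    final List<Node> decBuffer = new ArrayList<>();  // pending decrements

    // On the first mutation of src, remember it and buffer a decrement for
    // each current referent; later writes to src only pay for the check.
    void write(Node src, int slot, Node target) {
        if (!src.logged) {
            src.logged = true;
            modBuffer.add(src);
            for (Node old : src.fields) {
                if (old != null) decBuffer.add(old);
            }
        }
        src.fields[slot] = target;
    }

    // At GC time, increment each referent of every remembered object, then
    // apply the buffered decrements.
    void collect() {
        for (Node src : modBuffer) {
            for (Node cur : src.fields) {
                if (cur != null) cur.rc++;
            }
            src.logged = false;
        }
        modBuffer.clear();
        for (Node old : decBuffer) old.rc--;
        decBuffer.clear();
    }
}
```

A referent that is still pointed to at GC time receives one buffered decrement and one increment, so its count is unchanged; only referents that were dropped or newly added see a net change.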
22. BG-RC Write Barrier
    private void writeBarrier(VM_Address srcObj,
                              VM_Address srcSlot,
                              VM_Address tgtObj)
        throws VM_PragmaInline {
      if (getLogState(srcObj) != LOGGED)   // unsync check for uniqueness
        writeBarrierSlow(srcObj);
      VM_Magic.setMemoryAddress(srcSlot, tgtObj);
    }

    private void writeBarrierSlow(VM_Address srcObj)
        throws VM_PragmaNoInline {
      if (attemptToLog(srcObj)) {
        modifiedBuffer.push(srcObj);
        enumeratePointersToDecBuffer(srcObj);  // trade-off for sparsely modified objects
        setLogState(srcObj, LOGGED);
      }
    }
23-29. BG-RC Mutation Phase
[Figure sequence: a, b, d, and e in RC space; b, d, and e are entered in the buffers; r, s, and t then appear one by one in non-RC space; root, object, and decrement buffers shown below]
30-32. BG-RC Nursery Collection: Scan Roots
[Figure sequence: s and then t appear in RC space as well as in non-RC space; s is added to the buffers]
33. BG-RC Nursery Collection: Process Object Buffer
[Figure: r now also appears in RC space]
34. BG-RC Nursery Collection: Reclaim Nursery
[Figure: the non-RC space is marked "Reclaim"]
35. BG-RC RC Collection: Process Decrement Buffer
[Figure: the reference count of d drops to 0]
36. BG-RC RC Collection: Recursive Decrement
[Figure: d is marked "free"]
37. BG-RC RC Collection: Process Decrement Buffer
[Figure: d is no longer in RC space]
38. BG-RC Collection Complete!
[Figure: RC space holds a, b, s, r, t, and e]
39. Controlling Pause Times
- Modest bounded nursery size
- Meta Data
  - Decrement and modified object buffers
  - Trigger a collection if too big
- RC time cap
  - Limits time recursively decrementing RC obj & in cycle detection
- Cycles - pure RC is incomplete
  - Use Bacon/Rajan trial deletion algorithm
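A time cap of this kind might look like the sketch below: pending work is processed until a deadline passes, and whatever remains is left for the next collection. `TimeCappedWork` and its injected clock are hypothetical, introduced so the behaviour can be checked without real timing; this is not how MMTk structures the work.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

final class TimeCappedWork {
    private final Deque<Runnable> pending = new ArrayDeque<>();
    private final LongSupplier clockMillis;   // injected clock, e.g. System::currentTimeMillis
    private final long capMillis;             // maximum time to spend per collection

    TimeCappedWork(LongSupplier clockMillis, long capMillis) {
        this.clockMillis = clockMillis;
        this.capMillis = capMillis;
    }

    // Queue one unit of work, such as a recursive decrement.
    void add(Runnable work) {
        pending.addLast(work);
    }

    int remaining() {
        return pending.size();
    }

    // Process queued work until the queue is empty or the cap is exceeded.
    // Returns the number of items processed; the rest stay queued.
    int run() {
        long deadline = clockMillis.getAsLong() + capMillis;
        int done = 0;
        while (!pending.isEmpty() && clockMillis.getAsLong() < deadline) {
            pending.removeFirst().run();
            done++;
        }
        return done;
    }
}
```

The cap is checked between work items, so a single long item can overshoot it; a finer-grained check would be needed for a hard bound.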
40. Experimental evaluation
- Jikes RVM with MMTk
- Compare MS, BG-MS, BG-RC, RC
- Examine various heap sizes
- Collection triggers
  - Each 4 MB of allocation for BG-RC (1 MB for RC)
  - Time cap of 60 ms
  - Cycle detection at 512 KB
41. Throughput/Pause time: Moderate Heap Size
42. Throughput & Responsiveness
43. Conclusion
- Ulterior design based on careful study of object demographics and making collector aware of them
- Extends deferred RC to heap objects
- Practically shows that high throughput & low pause times are compatible