Ulterior Reference Counting: Fast GC Without the Wait

1
Ulterior Reference Counting: Fast GC Without the Wait
  • Steve Blackburn, Kathryn McKinley
  • Presented by Dimitris Prountzos
  • Slides adapted from a presentation by Steve Blackburn

2
Outline
  • Throughput-Responsiveness problem
  • Reference counting optimizations
  • Ulterior in detail
  • BG-RC in action
  • Experimental evaluation
  • Conclusion

3
Throughput/Responsiveness Trade-off
  • GC and mutator share CPU
  • Throughput: net GC/mutator ratio
  • Responsiveness: length of GC pauses

4
The Ulterior approach
  • Match mechanisms to object demographics
  • Copying nursery (young space)
      • Highly mutated, high-mortality young objects
      • Ignores most mutations
      • GC time proportional to survivors, space efficient
  • RC mature space
      • Low-mutation, low-mortality old objects
      • GC time proportional to mutations, space efficient
  • Generalize deferred RC to heap objects
      • Defer fields of highly mutated objects and enumerate them quickly
      • Reference count only infrequently mutated fields

5
Pure Reference Counting
  • Tracks mutations: RCM(p)
  • RCM(p) generates a decrement and an increment for
    the before and after values of p
  • RCM(p) → RC(p_before)--, RC(p_after)++
  • If RC = 0, free the object

[Diagram: RC space with objects a and b and their reference counts]
6
Pure Reference Counting
[Diagram: RC space with objects a, b, and c and their reference counts]
7
Pure Reference Counting
[Diagram: RC space with objects a, b, and c and their reference counts]
8
Pure Reference Counting
[Diagram: RC space with objects a and c and their reference counts]
RCM(p) for every mutation is very expensive
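The pure reference counting steps on these slides can be sketched in a few lines of Java. This is a toy model, not the Jikes RVM implementation: `RcObject`, `PureRc`, and the single `field` slot are hypothetical names chosen for illustration, and "freeing" only sets a flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy heap object: one reference count and one pointer slot (hypothetical model). */
final class RcObject {
    final String name;
    int rc = 0;              // number of references to this object
    RcObject field;          // the single pointer slot p
    boolean freed = false;

    RcObject(String name) { this.name = name; }
}

final class PureRc {
    final List<String> freedLog = new ArrayList<>();

    /** RCM(p): store newTarget into src.field and adjust both counts immediately. */
    void write(RcObject src, RcObject newTarget) {
        RcObject old = src.field;
        if (newTarget != null) newTarget.rc++;   // increment first, so old == newTarget is safe
        src.field = newTarget;
        if (old != null) decrement(old);
    }

    /** RC(o)--; when the count reaches zero, free o and decrement its referent. */
    void decrement(RcObject o) {
        if (--o.rc == 0) {
            o.freed = true;
            freedLog.add(o.name);
            if (o.field != null) decrement(o.field);
            o.field = null;
        }
    }
}
```

Every call to `write` pays for one increment and one decrement, which is the per-mutation cost the slide calls very expensive.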
9
RC Optimizations
  • Buffering: apply RC(p)--, RC(p)++ later
  • Coalescing: apply RCM(p) only for the initial and
    final values of p (coalesce intermediate values)
  • RCM(p), RCM(p1), ... RCM(pn) → RC(p_initial)--,
    RC(p_final)++
  • Deferral of RCM events
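Buffering and coalescing can be illustrated with a small Java sketch. It assumes a hypothetical `Node` with a single pointer field and a `logged` flag. The first write to a node records its initial referent, later writes are ignored by the counter, and `collect` applies one increment and one decrement. Freeing at zero is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy object with a count, one pointer field, and a "logged" bit (hypothetical model). */
final class Node {
    final String name;
    int rc;
    Node field;
    boolean logged;

    Node(String name) { this.name = name; }
}

final class CoalescingRc {
    private final List<Node> modified = new ArrayList<>();   // objects mutated since last GC
    private final List<Node> decBuffer = new ArrayList<>();  // initial referents awaiting RC--
    int rcOperations = 0;                                     // count updates actually applied

    /** Mutator write: only the first write to src since the last GC does any logging. */
    void write(Node src, Node target) {
        if (!src.logged) {
            src.logged = true;
            modified.add(src);
            if (src.field != null) decBuffer.add(src.field);  // remember p_initial
        }
        src.field = target;
    }

    /** At GC time: RC(p_final)++ for each logged object, then the buffered RC(p_initial)--. */
    void collect() {
        for (Node src : modified) {
            if (src.field != null) { src.field.rc++; rcOperations++; }
            src.logged = false;
        }
        for (Node old : decBuffer) { old.rc--; rcOperations++; }
        modified.clear();
        decBuffer.clear();
    }
}
```

Three writes to the same field between collections cost two count updates instead of six, and the intermediate referents are never touched.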

10
Deferred Reference Counting
Goal: ignore RCM(p) for stacks and registers
  • Deferral of p
      • A mutation of p does not generate an RCM(p)
  • Correctness
      • For all deferred p: RCR(p) at each GC
  • Retain event: RCR(p)
      • p → o temporarily retains o regardless of RC(o)
      • Deutsch/Bobrow use a Zero Count Table
      • Bacon et al. use a temporary increment

11
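Classic Deferral
In deferral phase: ignore RCM(p) for stacks and registers
[Diagram: RC space with objects a and b and their reference counts]
12
Classic Deferral
Ignore RCM(p) for stacks and registers
[Diagram: RC space with objects a, b, and c and their reference counts]
Breaks the RC = 0 invariant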
13
Classic Deferral (Bacon et al.)
  • Divide execution into epochs
  • Store information in buffers
      • Root buffer (RB): stores 1st-level objects
      • Increment buffer (IB): stores increments to 1st-level objects
      • Decrement buffer (DB): stores decrements to 1st-level objects
  • At GC time:
      • Look at the RB and apply temporary increments to all objects there
      • Process the IB of this epoch
      • Look at the RB of the previous epoch and apply decrements to all objects there
      • Process the DB of the previous epoch
      • During DB processing, recycle o if RC(o) = 0
  • Avoid race conditions by
      • Processing the IB before the DB
      • Processing the DB one epoch behind
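The GC-time steps listed above can be sketched as follows. This is a single-threaded toy with hypothetical names (`Obj`, `EpochCollector`). Objects have no fields, so recursive decrements are omitted, and it is not Bacon et al.'s actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy object with only a count and a freed flag (hypothetical model). */
final class Obj {
    final String name;
    int rc;
    boolean freed;

    Obj(String name) { this.name = name; }
}

final class EpochCollector {
    // Filled by the mutator during the current epoch.
    final List<Obj> incBuffer = new ArrayList<>();
    final List<Obj> decBuffer = new ArrayList<>();
    // Carried over from the previous epoch.
    private List<Obj> prevRoots = new ArrayList<>();
    private List<Obj> prevDecs = new ArrayList<>();

    /** One collection: roots is the root buffer (RB) for this epoch. */
    void collect(List<Obj> roots) {
        for (Obj o : roots) o.rc++;              // temporary increments for this epoch's RB
        for (Obj o : incBuffer) o.rc++;          // process this epoch's IB
        for (Obj o : prevRoots) release(o);      // undo the previous epoch's temporary increments
        for (Obj o : prevDecs) release(o);       // process the previous epoch's DB
        prevRoots = new ArrayList<>(roots);
        prevDecs = new ArrayList<>(decBuffer);
        incBuffer.clear();
        decBuffer.clear();
    }

    /** RC(o)--; recycle o when the count reaches zero. */
    private void release(Obj o) {
        if (--o.rc == 0) o.freed = true;
    }
}
```

Because increments are applied before any decrement, and decrements run one epoch late, an object that is still referenced from a root never sees its count drop to zero.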

14
Classic Deferral (Bacon et al.)
At GC time, RCR(p) for root pointers applies temporary increments.
[Diagram: RC space with objects a, b, and c; root buffer and decrement buffer, entries a and b shown]
15
Classic Deferral (Bacon et al.)
At next GC, apply decrements
[Diagram: RC space with objects a, b, and c; root buffer and decrement buffer, entries a and b shown]
16
Classic Deferral (Bacon et al.)
Key: efficient enumeration of deferred pointers
At next GC, apply decrements
[Diagram: RC space with objects a, b, and c; root buffer and decrement buffer, entries a and b shown]
17
Classic Deferral (Bacon et al.)
Better, but not good enough!
[Diagram: RC space with objects a, b, and c; root buffer and decrement buffer, no entries shown]
18
Ulterior Reference Counting
  • Idea: extend deferral to select heap pointers
      • e.g. all pointers within nursery objects
  • Deferral is not a fixed property of p
      • e.g. a nursery object gets promoted
  • Integrate event: I(p)
      • Changes p from deferred to not deferred

19
BG-RC: Bounded Nursery Generational - RC
  • Heap organization
      • Bounded copying nursery
          • Ignore mutations to nursery pointer fields
      • RC old space
          • Object remembering, coalescing, buffering
  • Collection
      • Process roots
      • Nursery phase promotes live p to old space and performs I(p)
      • RC phase processes object buffer, dec buffer
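The nursery phase can be sketched as below. The sketch assumes a toy heap in which "copying" is modelled by flipping an `inNursery` flag, and `HeapObj` and `NurseryCollector` are hypothetical names. Root retention and the later RC phase are omitted. The point is only to show where I(p) happens: a pointer gains a count when its source stops being deferred.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy object: lives in the nursery until promoted; rc matters only after promotion. */
final class HeapObj {
    final String name;
    boolean inNursery;
    int rc;
    final List<HeapObj> fields = new ArrayList<>();

    HeapObj(String name, boolean inNursery) {
        this.name = name;
        this.inNursery = inNursery;
    }
}

final class NurseryCollector {
    final List<HeapObj> nursery = new ArrayList<>();

    HeapObj allocate(String name) {
        HeapObj o = new HeapObj(name, true);
        nursery.add(o);
        return o;
    }

    /** Promote a live nursery object and integrate each of its pointer fields. */
    private void promote(HeapObj o) {
        if (!o.inNursery) return;
        o.inNursery = false;                 // stands in for copying into the RC space
        for (HeapObj child : o.fields) integrate(child);
    }

    /** I(p): p is no longer deferred, so its referent gains a count. */
    private void integrate(HeapObj referent) {
        promote(referent);
        referent.rc++;
    }

    /** Nursery phase; returns how many dead nursery objects were reclaimed. */
    int collect(List<HeapObj> roots, List<HeapObj> modifiedRcObjects) {
        for (HeapObj root : roots) promote(root);          // roots stay deferred: no count
        for (HeapObj src : modifiedRcObjects) {
            for (HeapObj child : src.fields) integrate(child);
        }
        int reclaimed = 0;
        for (HeapObj o : nursery) {
            if (o.inNursery) reclaimed++;
        }
        nursery.clear();                     // the whole nursery is reused afterwards
        return reclaimed;
    }
}
```

Work in this phase is proportional to the survivors: dead nursery objects are never visited individually by `promote` or `integrate`.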

20
View of heap in Ulterior RC
[Diagram: RC space and non-RC space; objects a, b, d, e, r, s, t; pointers labeled "defer" and "remember"]
  • How can we efficiently
      • Enumerate all deferred pointer fields?
      • Remember old-to-young pointers?

21
Bringing it Together
  • Deferral
      • Defer nursery roots
      • Perform I(p) on nursery promotion
      • Piggyback on copying nursery collection
  • Coalescing
      • Remember mutated RC objects
      • Upon first mutation, dec each referent
      • At GC time, inc each referent
      • Piggyback remset onto this mechanism

22
BG-RC Write Barrier
private void writeBarrier(VM_Address srcObj,
                          VM_Address srcSlot,
                          VM_Address tgtObj)
    throws VM_PragmaInline {
  if (getLogState(srcObj) != LOGGED)   // unsync check for uniqueness
    writeBarrierSlow(srcObj);
  VM_Magic.setMemoryAddress(srcSlot, tgtObj);
}

private void writeBarrierSlow(VM_Address srcObj)
    throws VM_PragmaNoInline {
  if (attemptToLog(srcObj)) {
    modifiedBuffer.push(srcObj);
    enumeratePointersToDecBuffer(srcObj);  // trade-off for sparsely modified objects
    setLogState(srcObj, LOGGED);
  }
}
23
BG-RC: Mutation Phase
[Diagram: RC space and non-RC space; objects a, b, d, e; root, object, and decrement buffers]
24
BG-RC: Mutation Phase
[Diagram: objects a, b, d, e; buffer entries b, d, e shown]
25
BG-RC: Mutation Phase
[Diagram: objects a, b, d, e; buffer entries b, d, e shown]
26
BG-RC: Mutation Phase
[Diagram: objects a, b, d, e, r; buffer entries b, d, e shown]
27
BG-RC: Mutation Phase
[Diagram: objects a, b, d, e, r, s; buffer entries b, d, e shown]
28
BG-RC: Mutation Phase
[Diagram: objects a, b, d, e, r, s, t; buffer entries b, d, e shown]
29
BG-RC: Mutation Phase
[Diagram: objects a, b, d, e, r, s, t; buffer entries b, d, e shown]
30
BG-RC Nursery Collection: Scan Roots
[Diagram: RC space and non-RC space; objects a, b, d, e, r, s, t; buffer entries b, d, b, e shown]
31
BG-RC Nursery Collection: Scan Roots
[Diagram: s shown twice; buffer entries b, d, b, e, s shown]
32
BG-RC Nursery Collection: Scan Roots
[Diagram: s and t each shown twice; buffer entries b, d, b, e, s shown]
33
BG-RC Nursery Collection: Process Object Buffer
[Diagram: r, s, and t each shown twice; buffer entries b, d, b, e, s shown]
34
BG-RC Nursery Collection: Reclaim Nursery
[Diagram: label "Reclaim" shown; buffer entries d, b, e, s shown]
35
BG-RC RC Collection: Process Decrement Buffer
[Diagram: objects a, b, d, e, r, s, t with reference counts; buffer entries d, b, e, s shown]
36
BG-RC RC Collection: Recursive Decrement
[Diagram: label "free" shown beside d; buffer entries e, b, s shown]
37
BG-RC RC Collection: Process Decrement Buffer
[Diagram: objects a, b, e, r, s, t with reference counts; buffer entries e, b, s shown]
38
BG-RC: Collection Complete!
[Diagram: objects a, b, e, r, s, t with reference counts; buffer entries b and s shown]
39
Controlling Pause Times
  • Modest bounded nursery size
  • Meta data
      • Decrement and modified object buffers
      • Trigger a collection if too big
  • RC time cap
      • Limits time spent recursively decrementing RC objects and in cycle detection
  • Cycles: pure RC is incomplete
      • Use Bacon/Rajan trial deletion algorithm
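The three pause-time controls above (a bounded nursery, a metadata trigger, and an RC time cap) might be combined as in the following sketch. `PauseControl` and all of its parameters are hypothetical. The clock is injected so the cap can be exercised deterministically, and the metadata limit is an assumed tunable, since the slides only say "too big".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Hypothetical pause-time policy: collection triggers plus a time cap on RC work. */
final class PauseControl {
    private final long nurseryLimitBytes;
    private final long metaDataLimitBytes;   // assumed tunable; not given in the slides
    private final long rcTimeCapMillis;
    private final LongSupplier clockMillis;
    private final Deque<Runnable> pendingRcWork = new ArrayDeque<>();

    PauseControl(long nurseryLimitBytes, long metaDataLimitBytes,
                 long rcTimeCapMillis, LongSupplier clockMillis) {
        this.nurseryLimitBytes = nurseryLimitBytes;
        this.metaDataLimitBytes = metaDataLimitBytes;
        this.rcTimeCapMillis = rcTimeCapMillis;
        this.clockMillis = clockMillis;
    }

    /** Collect when the nursery is full or the buffers have grown too large. */
    boolean shouldCollect(long nurseryBytesUsed, long metaDataBytesUsed) {
        return nurseryBytesUsed >= nurseryLimitBytes
            || metaDataBytesUsed >= metaDataLimitBytes;
    }

    /** Queue a unit of RC work, e.g. one recursive decrement. */
    void enqueue(Runnable work) {
        pendingRcWork.add(work);
    }

    /** Run queued RC work until it is done or the cap is hit; returns items left over. */
    int runRcPhase() {
        long start = clockMillis.getAsLong();
        while (!pendingRcWork.isEmpty()
                && clockMillis.getAsLong() - start < rcTimeCapMillis) {
            pendingRcWork.poll().run();
        }
        return pendingRcWork.size();
    }
}
```

Work left over when the cap is reached stays queued for the next collection, so the cap bounds an individual pause without dropping any decrement.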

40
Experimental evaluation
  • Jikes RVM with MMTk
  • Compare MS, BG-MS, BG-RC, RC
  • Examine various heap sizes
  • Collection triggers
      • Every 4 MB of allocation for BG-RC (1 MB for RC)
      • Time cap of 60 ms
      • Cycle detection at 512 KB

41
Throughput/Pause Time: Moderate Heap Size
42
Throughput and Responsiveness
43
Conclusion
  • Ulterior design is based on a careful study of object
    demographics and on making the collector aware of them
  • Extends deferred RC to heap objects
  • Shows in practice that high throughput and low
    pause times are compatible