Title: Connectivity-Based Garbage Collection
1 Connectivity-Based Garbage Collection
- Martin Hirzel, University of Colorado at Boulder
- Collaborators: Amer Diwan, Michael Hind, Hal Gabow, Johannes Henkel, Matthew Hertz
2 Garbage Collection Benefits
- Garbage collection leads to simpler
  - Design → no complex deallocation protocols
  - Implementation → automatic deallocation
  - Maintenance → fewer bugs
- Benefits are widely accepted
  - Java, C#, Python, ...
3 Garbage Collection: Haven't we solved this problem yet?
- For a state-of-the-art garbage collector
  - Time: 14% of execution time
  - Space: 3x high watermark
  - Pauses: 0.8 seconds
- Can reduce any one cost
- Challenge: reduce all three costs
4 Example Heap
[Figure: example heap. Boxes are heap objects (o1 to o15), arrows are pointers, and the long box holds the stack (s1, s2) and global variables (g).]
5 Thesis
1. Objects form distinct data structures
2. Connected objects die together
3. Garbage collectors can exploit 1. and 2. to reclaim objects efficiently
[Figure: the example heap (objects o1 to o15, stack, globals).]
6 Experimental Infrastructure
- JikesRVM Research Virtual Machine
  - From IBM Research
  - Written in Java
  - Application and runtime system share heap
  - → Good garbage collection is even more important
- Benchmarks
  - SPECjvm98 suite and SPECjbb2000
  - Java Olden suite
  - xalan, ipsixql, nfc, jigsaw
7 Outline
- Garbage Collector Design Principles
- Family of Garbage Collectors
- Design Space Exploration
- Pointer Analysis for Java
8 Garbage Collector Design Principles: Do partial collections.
- Don't collect the full heap every time
- → Shorter pause times
[Figure: the example heap (objects o1 to o15, stack, globals).]
9 Garbage Collector Design Principles: Predict lifetime based on age.
- Generational hypothesis: Most objects die young
- Generational garbage collection
  - Partition by age
  - Collect young objects most often
  - → Low time overhead
- That's the state of the art.
[Figure: the example heap split into a young generation and an old generation.]
10 Garbage Collector Design Principles: Generational GC Problems
- Regular full collections → Long peak pause
- Old-to-young pointers → Need bookkeeping
- 37.5% long-lived objects → Pig in the python
[Figure: the example heap split into a young generation and an old generation.]
11 Garbage Collector Design Principles: Collect connected objects together.
Likelihood that two objects die at the same time:
Connectivity          Likelihood
Any pair              33.1%
Weakly connected      46.3%
Strongly connected    72.4%
Direct pointer        76.4%
[The "Example" column showed a small diagram of objects o1 and o2 for each kind of connectivity.]
12 Garbage Collector Design Principles: Focus on objects with few ancestors.
Lifetime    Median number of ancestor objects
Short       2 objects
Long        83,324 objects
→ Short-lived objects are easy to collect
13 Garbage Collector Design Principles: Predict lifetime based on roots.
                            Lifetime
Objects reachable           Short    Long
indirectly from stack       25.6%    16.2%
only directly from stack    32.9%     0.8%
from globals                 4.0%    20.5%
Total                       62.5%    37.5%
[Figure: a stack variable s and a global g pointing into objects o1 to o4.]
For details, see our ISMM'02 paper.
15 CBGC Family of Garbage Collectors: Connectivity-Based Garbage Collection
- Do partial collections.
- Collect connected objects together.
- Predict lifetime based on age.
- Focus on objects with few ancestors.
- Predict lifetime based on roots.
[Figure: the example heap divided into partitions p1 to p4.]
16 Family of Garbage Collectors: Components of CBGC
- Before allocation
  - Partitioning: Decide into which partition to put each object
- Collection algorithm
  - Estimator: Estimate dead and live objects for each partition
  - Chooser: Choose good set of partitions
  - Partial collection: Collect chosen partitions
17 Family of Garbage Collectors: Partitioning Problem
- Find fine-grained partitions, where
  - Partition edges respect pointers
  - Objects don't move between partitions
[Figure: the example heap divided into partitions p1 to p4.]
18 Family of Garbage Collectors: Partitioning Solutions
- Pointer analysis
  - Type-based [Harris]
    - o1 may point to o2 if o1 has a field of a type compatible with o2
  - Constraint-based [Andersen]
    - We will discuss this later in the talk
[Figure: the example heap divided into partitions p1 to p4.]
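As a rough illustration of type-based partitioning, the sketch below builds a "may point to" graph between types from declared field types, then groups mutually reachable types into one partition. The class names and the `FIELDS` table are hypothetical, subtype compatibility is ignored, and grouping by strongly connected components is an assumption of this sketch rather than a detail stated on the slide.

```python
# Hypothetical declared field types: type -> types its fields may reference.
FIELDS = {
    "List": ["Node"],
    "Node": ["List", "Item"],   # a node points back to its owning list
    "Item": ["String"],
    "String": [],
}

def strongly_connected_components(graph):
    """Tarjan's algorithm; returns a list of sets of nodes."""
    index, low, on_stack = {}, {}, set()
    stack, components = [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components

def partition_types(fields):
    """Map each type to a partition id and derive partition-level edges."""
    components = strongly_connected_components(fields)
    partition_of = {t: i for i, comp in enumerate(components) for t in comp}
    edges = set()
    for src, targets in fields.items():
        for dst in targets:
            if partition_of[src] != partition_of[dst]:
                edges.add((partition_of[src], partition_of[dst]))
    return partition_of, edges

partition_of, partition_edges = partition_types(FIELDS)
```

Here `List` and `Node` end up in the same partition because each may point to the other, while `Item` and `String` each get their own. An object would be placed in the partition of its type at allocation time, so it never has to move between partitions.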
19 Family of Garbage Collectors: Estimator Problem
- For each partition, guess
  - dead
    - Objects that can be reclaimed
    - Pay-off
  - live
    - Objects that must be traversed
    - Cost
[Figure: partition estimates p1: 1 dead, 2 live; p2: 3 dead, 3 live; p3: 2 dead, 0 live; p4: 2 dead, 2 live.]
20 Family of Garbage Collectors: Estimator Solutions
- Heuristics
  - Connected objects die together
  - Most objects die young
  - Objects reachable from globals live long
  - The past predicts the future
[Figure: partition estimates p1: 1 dead, 2 live; p2: 3 dead, 3 live; p3: 2 dead, 0 live; p4: 2 dead, 2 live.]
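To make two of these heuristics concrete, here is a minimal sketch of an estimator. It assumes each partition records how many objects it holds, what fraction survived its previous collection, and whether it is reachable from globals. The `PartitionStats` record, the `global_survival` default, and the way the heuristics are combined are all hypothetical; the slides do not give a formula.

```python
from dataclasses import dataclass

@dataclass
class PartitionStats:
    objects: int                  # objects currently allocated in the partition
    past_survival: float          # fraction that survived the previous collection
    reachable_from_globals: bool  # whether globals may point into the partition

def estimate(stats, global_survival=0.9):
    """Return (estimated_dead, estimated_live) for one partition."""
    # "The past predicts the future": start from the last observed survival rate.
    survival = stats.past_survival
    # "Objects reachable from globals live long": raise the survival estimate.
    if stats.reachable_from_globals:
        survival = max(survival, global_survival)
    live = round(stats.objects * survival)
    return stats.objects - live, live

stack_only = estimate(PartitionStats(10, 0.2, False))
global_reachable = estimate(PartitionStats(10, 0.2, True))
```

With these made-up inputs, the stack-only partition is estimated at 8 dead and 2 live, and the globally reachable one at 1 dead and 9 live.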
21 Family of Garbage Collectors: Chooser Problem
- Pick subset of partitions
  - Maximize total dead
  - Minimize total live
  - Closed under predecessor relation
  - → No bookkeeping for external pointers
[Figure: partition estimates p1: 1 dead, 2 live; p2: 3 dead, 3 live; p3: 2 dead, 0 live; p4: 2 dead, 2 live. The highlighted choice totals 7 dead, 5 live.]
22 Family of Garbage Collectors: Chooser Solutions
- Optimal algorithm based on network flow [TR]
- Simpler, greedy algorithm
[Figure: partition estimates p1: 1 dead, 2 live; p2: 3 dead, 3 live; p3: 2 dead, 0 live; p4: 2 dead, 2 live. The highlighted choice totals 7 dead, 5 live.]
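The chooser constraint can be sketched with a simple heuristic: for every partition, take the partition plus all of its predecessors, score that closed set, and keep the best one. This is one possible simple chooser, not necessarily the greedy algorithm from the talk and certainly not the optimal network-flow one. The dead and live counts come from the figure; the partition edges and the `live_weight` cost factor are hypothetical.

```python
# (dead, live) estimates per partition, taken from the slide's figure.
ESTIMATES = {"p1": (1, 2), "p2": (3, 3), "p3": (2, 0), "p4": (2, 2)}

# Hypothetical partition edges (source may point into target); the actual
# edges of the slide's figure are not recoverable from the transcript.
EDGES = [("p4", "p3"), ("p4", "p2"), ("p3", "p2"), ("p2", "p1")]

def predecessor_closure(partition, edges):
    """Return the partition plus every partition that can reach it."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    closure, worklist = {partition}, [partition]
    while worklist:
        current = worklist.pop()
        for pred in preds.get(current, ()):
            if pred not in closure:
                closure.add(pred)
                worklist.append(pred)
    return closure

def totals(partitions, estimates):
    """Sum the dead and live estimates over a set of partitions."""
    dead = sum(estimates[p][0] for p in partitions)
    live = sum(estimates[p][1] for p in partitions)
    return dead, live

def choose(estimates, edges, live_weight=0.6):
    """Pick the predecessor-closed set maximizing dead - live_weight * live."""
    best_set, best_score = set(), 0.0
    for partition in sorted(estimates):
        candidate = predecessor_closure(partition, edges)
        dead, live = totals(candidate, estimates)
        score = dead - live_weight * live
        if score > best_score:
            best_set, best_score = candidate, score
    return best_set

chosen = choose(ESTIMATES, EDGES)
chosen_dead, chosen_live = totals(chosen, ESTIMATES)
```

Under these assumed edges and this weight, the sketch selects p2, p3, and p4 for a total of 7 dead and 5 live. Because every candidate is closed under predecessors, no pointer from an unchosen partition can lead into the chosen set.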
23 Family of Garbage Collectors: Partial Collection Problem
- Look only at chosen partitions
- Traverse reachable objects
- Reclaim unreachable objects
[Figure: the chosen partitions p2, p3, and p4 with their objects; the remaining partition is shown as "rest of heap".]
24 Family of Garbage Collectors: Partial Collection Solutions
- Generalize canonical full-heap algorithms
  - Mark and sweep [McCarthy60]
  - Semi-space copying [Cheney70]
  - Treadmill [Baker92]
[Figure: the chosen partitions p2, p3, and p4 with their objects; the remaining partition is shown as "rest of heap".]
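A minimal sketch of how mark and sweep might be restricted to the chosen partitions is shown below. It assumes the chosen set is closed under predecessors, so every path from a root to an object in a chosen partition stays inside the chosen partitions and the rest of the heap never needs to be scanned. The heap representation and all object and partition names are hypothetical.

```python
def partial_mark_sweep(heap, partition_of, roots, chosen):
    """Collect only the chosen partitions.

    heap: object id -> list of object ids it points to
    partition_of: object id -> partition id
    roots: object ids referenced from the stack and globals
    chosen: set of partition ids, assumed closed under predecessors
    Returns the set of reclaimed object ids and removes them from heap.
    """
    marked = set()
    # Mark phase: start from roots that point into the chosen partitions.
    worklist = [o for o in roots if partition_of[o] in chosen]
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue
        marked.add(obj)
        for target in heap[obj]:
            # Objects outside the chosen partitions are never traversed.
            if partition_of[target] in chosen and target not in marked:
                worklist.append(target)
    # Sweep phase: reclaim unmarked objects, but only in chosen partitions.
    dead = {o for o in heap if partition_of[o] in chosen and o not in marked}
    for obj in dead:
        del heap[obj]
    return dead

# Hypothetical heap: p2 may point into p1, but not the other way round.
heap = {"a": ["b"], "b": ["r"], "c": ["b"], "r": [], "x": []}
partition_of = {"a": "p2", "b": "p2", "c": "p2", "r": "p1", "x": "p1"}
reclaimed = partial_mark_sweep(heap, partition_of, roots=["a", "x"], chosen={"p2"})
```

Only `c` is reclaimed here. Objects `r` and `x` belong to the unchosen partition, so they are neither traversed nor reclaimed, whatever their actual liveness.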
26 Design Space Exploration: Questions
- How good is a naïve CBGC?
- How good could CBGC be in 20 years?
- How well does CBGC do in a JVM?
27 Design Space Exploration: Simulator Methodology
- Garbage collection simulator (under GPL)
  - Uses traces of allocations and pointer writes from our benchmark runs
- Simulator advantages
  - Easier to implement a variety of collector algorithms
  - Know entire trace beforehand; can use that for "in 20 years" experiments
- Simulator disadvantages
  - No bottom-line performance numbers
  - Currently adding CBGC to JikesRVM
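A trace-driven simulator can be sketched as a replay loop that rebuilds the heap graph from recorded events. The event format below (`"alloc"` and `"write"` tuples) is a hypothetical stand-in; the actual trace format of the simulator is not described in the slides.

```python
def replay(trace):
    """Rebuild the heap graph from a trace.

    Events are ("alloc", obj) or ("write", src, field, dst); a dst of None
    clears the field. Returns object id -> {field: target object id}.
    """
    heap = {}
    for event in trace:
        if event[0] == "alloc":
            heap[event[1]] = {}
        elif event[0] == "write":
            _, src, field, dst = event
            if dst is None:
                heap[src].pop(field, None)
            else:
                heap[src][field] = dst    # overwrites any earlier pointer
        else:
            raise ValueError(f"unknown event: {event!r}")
    return heap

TRACE = [
    ("alloc", "o1"),
    ("alloc", "o2"),
    ("write", "o1", "next", "o2"),
    ("alloc", "o3"),
    ("write", "o1", "next", "o3"),        # o2 is no longer referenced by o1
]
final_heap = replay(TRACE)
```

Because the whole trace is available before the replay starts, an oracle partitioner or estimator could scan ahead in it, which is not possible in a running virtual machine.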
28 Design Space Exploration: How good is a naïve CBGC?
[Figure: three bar charts, cost in time, cost in space, and pause times (axis values shown: 0 to 1.72, 0 to 0.87, and 0 to 0.22), comparing:]
- Full-heap: Semi-space copying
- CBGC-naïve: Type-based partitioning [Harris], heuristics estimator
- Appel: Copying generational
29 Design Space Exploration: How good could CBGC be in 20 years?
[Figure: three bar charts, cost in time, cost in space, and pause times (axis values shown: 0 to 1.72, 0 to 0.87, and 0 to 0.22), comparing:]
- Full-heap: Semi-space copying
- CBGC-oracles: Partitioning and estimator based on trace
- Appel: Copying generational
30 Design Space Exploration: How good could CBGC be in 20 years?
- CBGC with oracles beats Appel
- We did not find a performance wall
- CBGC has potential
- The performance gap between CBGC with oracles and naïve CBGC is large
- Research challenges
31 How well does CBGC do in a Java virtual machine?
- Implementation in progress
- Need a pointer analysis for the partitioning
33 Pointer Analysis for Java: Which analysis do we need?
[Figure: bar chart of cost in time (axis 0 to 1.7), with one bar marked "Andersen", comparing:]
- Full-heap: Semi-space copying
- CBGC: Type-based partitioning [Harris]; Type-based partitioning (oracles); Allocation site partitioning (oracles)
- Appel: Copying generational
34 Pointer Analysis for Java: Andersen's Analysis
- Allocation-site granularity
- Set-inclusion constraints
- Flow and context insensitive
Can't analyze Java ahead of time!
What                      Purpose                     When
Constraint generation     Model flow of pointers      Ahead-of-time compilation
Constraint propagation    Find fixed-point solution   Ahead-of-time compilation
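The two phases in the table can be illustrated with a small set-inclusion solver. The sketch assumes the constraints have already been generated from four kinds of statements (allocation, copy, field load, field store) and propagates them by simple iteration until nothing changes. It works at allocation-site granularity and ignores statement order. The input encoding and the variable and site names are hypothetical, and a production solver would use a worklist rather than repeated full passes.

```python
from collections import defaultdict

def andersen(allocs, copies, loads, stores):
    """Solve set-inclusion constraints by iterating to a fixed point.

    allocs: (var, site)         for  var = new ...  at allocation site `site`
    copies: (dst, src)          for  dst = src
    loads:  (dst, base, field)  for  dst = base.field
    stores: (base, field, src)  for  base.field = src
    Returns (var_pts, heap_pts): variable -> sites, (site, field) -> sites.
    """
    var_pts = defaultdict(set)
    heap_pts = defaultdict(set)
    for var, site in allocs:
        var_pts[var].add(site)

    def include(target, source):
        """Make target a superset of source; report whether target grew."""
        before = len(target)
        target |= source
        return len(target) != before

    changed = True
    while changed:
        changed = False
        for dst, src in copies:
            changed |= include(var_pts[dst], var_pts[src])
        for dst, base, field in loads:
            for site in list(var_pts[base]):   # snapshot: dst may equal base
                changed |= include(var_pts[dst], heap_pts[(site, field)])
        for base, field, src in stores:
            for site in list(var_pts[base]):
                changed |= include(heap_pts[(site, field)], var_pts[src])
    return var_pts, heap_pts

# Constraints for:  a = new A;  b = new B;  a.f = b;  c = a;  d = c.f
var_pts, heap_pts = andersen(
    allocs=[("a", "A1"), ("b", "B1")],
    copies=[("c", "a")],
    loads=[("d", "c", "f")],
    stores=[("a", "f", "b")],
)
```

For this tiny program the solver concludes that `d` may point to objects from site `B1` and that field `f` of objects from site `A1` may point to `B1`. The `heap_pts` map holds the object-to-object "may point to" information that partitioning needs.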
35 Pointer Analysis for Java: Andersen for all of Java
- Do
  - as little as possible
  - as late as possible
What                      Purpose                          When
Constraint generation     Model flow of pointers           VM build and start-up
                                                           Class loading
                                                           Type resolution
                                                           Method compilation (JIT)
                                                           Execution of reflection
                                                           Execution of native code
Constraint propagation    Find next fixed-point solution   Points-to information used (before garbage collection)
36 Pointer Analysis for Java: Correctness Properties
[Figure: timeline showing constraint generation and constraint propagation over time.]
- If there is a pointer, then the results predict it
- Cannot do any better for Java!
37 Pointer Analysis for Java: Analysis Cost
             Constraint    Constraint propagation
             generation    Eager              At GC              At End
Benchmark    Seconds       Count    Seconds   Count    Seconds   Count    Seconds
compress     21.4            130    3.2           5    40.4          1     67.4
db           20.1            143    3.6           5    42.9          1     71.4
mtrt         20.3            265    2.1           5    46.2          1     68.1
mpegaudio    20.6            319    2.2           5    46.1          1     66.6
jack         21.2            397    4.2           7    49.0          1     78.2
jess         22.3            733    6.8           8    49.7          1     85.7
javac        21.1          1,107    5.9          10    87.4          1    187.6
xalan        20.1          1,728    4.9           8    85.7          1    215.7
→ Expensive, but once behavior stabilizes, costs diminish to zero
38 Pointer Analysis for Java: Validation
- Lots of corner cases
  - Dynamic class loading
  - Reflection
  - Native code
- Missing any one leads to nasty bugs
  - CBGC relies on conservative results
- We performed validation runs
  - Check analysis results against pointers in heap during garbage collection
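The validation check could look roughly like the sketch below: for every pointer actually found in the heap, confirm that the analysis predicted it at allocation-site granularity. The data layout (`site_of`, `heap_pts`) and all names are hypothetical and chosen only for illustration.

```python
def validate(heap_pointers, site_of, heap_pts):
    """Return the heap pointers that the analysis failed to predict.

    heap_pointers: iterable of (source object, field, target object)
    site_of: object -> allocation site
    heap_pts: (site, field) -> set of sites the field may point to
    """
    return [
        (src, field, dst)
        for src, field, dst in heap_pointers
        if site_of[dst] not in heap_pts.get((site_of[src], field), set())
    ]

site_of = {"o1": "A1", "o2": "B1", "o3": "C1"}
heap_pts = {("A1", "f"): {"B1"}}
violations = validate(
    [("o1", "f", "o2"),    # predicted: A1.f may point to B1
     ("o1", "g", "o3")],   # not predicted: no entry for A1.g
    site_of,
    heap_pts,
)
```

An empty result means the analysis was conservative for this heap snapshot. Any entry in `violations` points to an unhandled corner case such as reflection or native code.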
39 Wrapping Up
40 Related Work: Using Program Analysis for Garbage Collection
- Stack allocation [ParkGoldberg92, ...]
- Regions [TofteTalpin97, ...]
- Liveness analysis [AgesenDetlefsMoss98, ...]
- Early reclamation [Harris99]
- Thread-local heaps [Steensgaard00, ...]
- Object inlining [DolbyChien00]
- Write-barrier removal [ZeeRinard02, Shuf02]
41 Related Work: Pointer analyses for Java
- Andersen's analysis for static Java
  - [RountevMilanovaRyder01]
  - [LiangPenningsHarrold01]
  - [WhaleyLam02]
  - [LhotakHendren03]
- Weaker analyses with dynamic class loading
  - DOIT [PechtchanskiSarkar01]
  - XTA [QianHendren04]
  - Ruf's escape analysis [BogdaSingh01, King03]
- Demand-driven / incremental analysis
42 Other Research Interests
- Accuracy of Garbage Collection
  - [M.S. Thesis, ISMM'00, ECOOP'01, TOPLAS'02]
- Profiling
  - [FDDO'01, Patent'01a]
- Dynamic Optimizations, Prefetching
  - [PLDI'02, Patent'02b]
- Future directions
  - More techniques for performance improvement
  - Reducing bugs, improving productivity
43 Contributions presented in this talk
- Connectivity-based GC design principles
  - ISMM'02
- CBGC, a new family of garbage collectors
- Design space exploration with simulator
  - OOPSLA'03
- First non-trivial pointer analysis for Java
  - ECOOP'04 (to appear)
- http://www.cs.colorado.edu/hirzel