Title: Reconsidering Custom Memory Allocation
1Reconsidering Custom Memory Allocation
- Emery D. Berger
- Benjamin G. Zorn
- Kathryn S. McKinley
- November 2002
- Proceedings of the Conference on Object-Oriented
Programming Systems, Languages, and Applications
(OOPSLA) 2002
2Lecture Topics
- Custom memory allocators
- General purpose allocators
- Regions (good performance)
- Reaps (very good performance and more)
- Results and Conclusions
3Key Contributions of the paper
- A comprehensive evaluation of custom allocators
- Custom allocators vs. general-purpose allocators
(memory consumption and performance)
- Most programmers seeking faster memory allocation
should use the Lea allocator rather than writing
their own
4Key Contributions of the paper Cont.
- The custom allocators that do provide higher
performance use regions
- Reaps are even better
5Key Contributions of the paper Cont.
- If you need fast regions, use reaps
- Otherwise, use the Lea allocator rather than any
custom allocator
6Related Works
- Articles in the trade press claim custom
allocators are a good idea
- Effective C++
- The C++ Programming Language
- Benjamin Zorn in 1993 claims it to be a waste of
time
- Articles on region allocation (arenas, groups,
zones)
- We find that all of them are true
7General-purpose memory allocators
- Windows XP allocator
- Lea allocator (Linux)
8Lea Allocator
- An approximate best-fit allocator with different
behavior based on object size
- Small objects (< 64 bytes): allocated from exact-size
quicklists
- Medium objects (< 128K): coalesce quicklists
- Large objects: allocated and freed with mmap
- The best allocator known
9Our Benchmarks
10Emulating Custom Semantics
- Custom allocators often support semantics that
differ from the C interface
- Region emulator
- Full region semantics
- General allocator
- Records a pointer to each allocated object to
allow region deletion
- Pointers are recorded in an out-of-band array: no
impact on drag
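The region emulator described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the class name `RegionEmulator` and its methods are hypothetical, and the out-of-band array is assumed to be a `std::vector` of pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical region emulator: every object comes from the general-purpose
// allocator; the "region" only remembers pointers so it can free them all.
class RegionEmulator {
public:
  RegionEmulator() = default;
  RegionEmulator(const RegionEmulator&) = delete;
  RegionEmulator& operator=(const RegionEmulator&) = delete;
  ~RegionEmulator() { freeAll(); }

  void* allocate(std::size_t size) {
    // Reserve the out-of-band slot first so a failed push cannot leak memory.
    objects_.push_back(nullptr);
    void* p = std::malloc(size);
    if (p == nullptr) {
      objects_.pop_back();
      return nullptr;
    }
    objects_.back() = p;  // recorded outside the object: no per-object header
    return p;
  }

  // Region deletion: hand every recorded object back to the allocator.
  void freeAll() {
    for (void* p : objects_) std::free(p);
    objects_.clear();
  }

  std::size_t liveObjects() const { return objects_.size(); }

private:
  std::vector<void*> objects_;  // out-of-band array of object pointers
};
```

Because the bookkeeping lives outside the allocated objects, the objects themselves are laid out exactly as the general-purpose allocator would lay them out.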
11Custom memory allocators - Definition
- A memory allocation mechanism that differs from a
general-purpose allocator in at least one of two ways
- May provide more than one object for every
allocated chunk of memory
- May not immediately return objects to the
system/general-purpose allocator
- No wrappers
12Custom allocators widespread use
- Recommended as an optimization technique in the
trade press
- Apache web server, GCC, C++ STL
- Direct support in C++ (by overloading the new and
delete operators)
13Why programmers use Custom Allocators?
- Improving runtime performance
- Reducing memory consumption
- Improving software engineering (?)
14Improving runtime performance
- 16% (average) of the run time is spent in the memory
allocator
- The reason behind the custom allocators in most of our benchmarks
- Per-operation cost of general allocators
- In programs with intensive use of allocator
15Improving runtime performance Cont.
16Reducing memory consumption
17Improving software engineering (?)
- Memory allocated by a custom allocator can't be
managed by another allocator
- Calling free on a custom-allocated object may cause a
segmentation fault
- Difficult to understand the source of memory
consumption in the program
- No Purify
- No parallel allocator for SMP scalability
- No GC
- No shared multi-language heap
18Improving software engineering (!)
- Region-based allocators simplify memory
management
- A memory area can be deleted by a single call
- Separate memory areas
- Regions are good for multithreaded server
applications
- Memory space isolation
- Memory leaks prevention
- Apache web server
19A Taxonomy of Custom Allocators
- Apply your knowledge about some set of objects
- Use regions to free objects dead at the same time
- Take advantage of object sizes
- Use known allocation patterns
20Benchmark allocators characteristics
- Per-class allocators
- Regions
- Nested regions
- Obstack
- Custom patterns
21Per-class allocators
- Objects of the same size (type)
- Eliding size checks
- Freelist with objects of the specific type
- The same API as malloc and free
22Regions
- Allocation by incrementing a pointer into large
chunks of memory
- Only entire-region deletion: no deletion of
individual objects
- freeAll function
- Nested regions
- Nested object lifetime
- Obstack (Object Stack)
- Deletion of every object allocated after a
certain object
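A minimal sketch of the basic region scheme on this slide: allocation bumps a pointer through large chunks, and only `freeAll` releases memory. The class name `Region`, the default chunk size, and the use of a `std::vector` to track chunks are assumptions made for illustration. Nested regions and obstacks are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical bump-pointer region. Individual objects cannot be freed.
class Region {
public:
  explicit Region(std::size_t chunkSize = 4096) : chunkSize_(chunkSize) {}
  Region(const Region&) = delete;
  Region& operator=(const Region&) = delete;
  ~Region() { freeAll(); }

  void* allocate(std::size_t size) {
    // Round up so every object is suitably aligned for any type.
    const std::size_t align = alignof(std::max_align_t);
    size = (size + align - 1) / align * align;
    if (size == 0) size = align;
    if (size > remaining_) {
      // Current chunk exhausted: grab a new one big enough for the request.
      const std::size_t chunk = size > chunkSize_ ? size : chunkSize_;
      chunks_.push_back(nullptr);
      void* mem = std::malloc(chunk);
      if (mem == nullptr) {
        chunks_.pop_back();
        return nullptr;
      }
      chunks_.back() = mem;
      next_ = static_cast<char*>(mem);
      remaining_ = chunk;
    }
    void* result = next_;  // the allocation itself is just a pointer bump
    next_ += size;
    remaining_ -= size;
    return result;
  }

  // Delete the whole region with a single call.
  void freeAll() {
    for (void* c : chunks_) std::free(c);
    chunks_.clear();
    next_ = nullptr;
    remaining_ = 0;
  }

  std::size_t chunkCount() const { return chunks_.size(); }

private:
  std::size_t chunkSize_;
  std::vector<void*> chunks_;
  char* next_ = nullptr;
  std::size_t remaining_ = 0;
};
```

The space left at the end of a chunk is simply abandoned when a new chunk is started, which is one simple way a region can retain memory it never uses.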
23Custom patterns
- A general purpose allocator optimized for a
particular pattern of object behavior
24Custom allocators characteristics Cont.
25Problems with regions
- Excessive memory retention
- Unbounded memory consumption
- Unbounded buffers
- Dynamic arrays
- Producer-consumer patterns
- Complicated programming of server applications
(Apache)
26The ideal allocator
- Region Semantics
- General-Purpose Allocation (heap)
- Reaps
27Reaps
- Heaps: malloc, free
- Regions: malloc, freeAll
- Reaps: malloc, free, freeAll
28Reaps - Example
29Implementation Issues
- Initially behaves like a region
- Allocation by bumping a pointer
- Geometrically increasing chunks of memory
threaded onto a linked list
- A header for every allocated object
- Freed objects (reapFree) are placed in an
associated heap
- Allocations then use memory from this heap
30Reap allocation interface
- void reapCreate (void ** reap, void ** parent);
- void reapDestroy (void ** reap);
- void reapFreeAll (void ** reap); // clear
- void * reapMalloc (void ** reap, size_t size);
- void reapFree (void ** reap, void * object);
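The behavior behind this interface can be illustrated with a deliberately simplified sketch. It is not the paper's implementation. The class `Reap`, its method names, and the initial chunk size are hypothetical. The "associated heap" is replaced here by a plain first-fit freelist, and chunks are tracked in a `std::vector` rather than a linked list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical reap: region-style bump allocation plus individual free.
class Reap {
public:
  Reap() = default;
  Reap(const Reap&) = delete;
  Reap& operator=(const Reap&) = delete;
  ~Reap() { freeAll(); }

  void* allocate(std::size_t size) {
    size = roundUp(size == 0 ? 1 : size);
    // 1. Prefer memory from previously freed objects (first fit).
    Header** link = &freeList_;
    for (Header* h = freeList_; h != nullptr; h = h->next) {
      if (h->size >= size) {
        *link = h->next;  // unlink the block and hand it out again
        return h + 1;
      }
      link = &h->next;
    }
    // 2. Otherwise behave like a region: bump a pointer.
    const std::size_t need = sizeof(Header) + size;
    if (need > remaining_) {
      const std::size_t chunk = need > nextChunkSize_ ? need : nextChunkSize_;
      chunks_.push_back(nullptr);
      void* mem = std::malloc(chunk);
      if (mem == nullptr) {
        chunks_.pop_back();
        return nullptr;
      }
      chunks_.back() = mem;
      bump_ = static_cast<char*>(mem);
      remaining_ = chunk;
      nextChunkSize_ *= 2;  // geometrically increasing chunks
    }
    Header* h = new (bump_) Header{size, nullptr};  // per-object header
    bump_ += need;
    remaining_ -= need;
    return h + 1;
  }

  // Individual deletion: the object joins the freelist for later reuse.
  void free(void* object) {
    if (object == nullptr) return;
    Header* h = static_cast<Header*>(object) - 1;
    h->next = freeList_;
    freeList_ = h;
  }

  // Region-style deletion of everything at once.
  void freeAll() {
    for (void* c : chunks_) std::free(c);
    chunks_.clear();
    freeList_ = nullptr;
    bump_ = nullptr;
    remaining_ = 0;
    nextChunkSize_ = kInitialChunk;
  }

  std::size_t chunkCount() const { return chunks_.size(); }

private:
  enum : std::size_t { kInitialChunk = 1024 };  // assumed starting chunk size

  struct alignas(std::max_align_t) Header {
    std::size_t size;  // usable bytes that follow the header
    Header* next;      // freelist link, used only while the object is free
  };

  static std::size_t roundUp(std::size_t n) {
    const std::size_t a = alignof(std::max_align_t);
    return (n + a - 1) / a * a;
  }

  std::vector<void*> chunks_;
  Header* freeList_ = nullptr;
  char* bump_ = nullptr;
  std::size_t remaining_ = 0;
  std::size_t nextChunkSize_ = kInitialChunk;
};
```

The per-object header is what lets `free` work on objects that were originally handed out by pointer bumping. A pure region does not need one.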
31Design issues
32Design issues Cont.
Sbrk
RegionHeap
CoalesceableHeap
LeaHeap
ClearOptimizedHeap
NestedHeap
33Design issues Layers
- LeaHeap layer
- high speed
- low fragmentation
- NestedHeap layer
- ClearOptimizedHeap layer
- nothingOnHeap flag
- Fast allocations by pointer bumping on first heap
- Second heap after freeing an object
- CoalesceableHeap layer
- adds per-object metadata
- RegionHeap layer
- Linked list of allocated objects
- clear()
34Benchmark allocation statistics
35Benchmark allocation statistics Cont.
- Programs with general-purpose allocators
- Not allocation-intensive
- Spend little time in memory allocator
- Programs with custom allocators
- Tend to allocate many small objects
- More time in memory allocator
- Correct pinpointing of memory manager as a
significant factor in the performance
36Results
- Different memory management policies compared
(general, custom, reaps)
- Execution time
- Memory consumption
37Results - technicalities
- Run time: the best of three runs
- Compiled with Visual C++ 6.0
- 600 MHz Pentium III with 320 MB of RAM, running Windows XP
38Runtime Performance
39Runtime Performance Cont.
- Custom vs. Windows: justifies the use of a custom
allocator
- Lea provides almost the same performance as
custom, except for regions
- Reaps are comparable to Lea and to custom
40Memory Consumption
41Memory Consumption Cont.
- No Windows XP results: no equivalent way to keep track
of memory consumption
- Reaps don't use individual deletion
- Mixed results
- Region space advantage - misleading
42Evaluating Region Allocation
- Total drag: an average ratio of heap sizes with
and without immediate object deallocation
- Immediate free of every dead object: total drag
of 1
- Non-region allocators: minimal drag
- Region allocators: high drag, substantial
increase in memory consumption
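As a rough illustration of the drag metric, the sketch below assumes that total drag is the average heap size of the program as written, divided by the average heap size when every object is freed as soon as it is dead. The function names and the idea of feeding in sampled heap sizes are assumptions made for this example, not the paper's measurement tooling.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean of a series of heap-size samples in bytes; 0 for an empty series.
inline double averageHeapSize(const std::vector<std::size_t>& samples) {
  if (samples.empty()) return 0.0;
  const double total = std::accumulate(samples.begin(), samples.end(), 0.0);
  return total / static_cast<double>(samples.size());
}

// Total drag under the stated assumption: the run as written, relative to
// the same run with immediate deallocation of dead objects.
inline double totalDrag(const std::vector<std::size_t>& asWritten,
                        const std::vector<std::size_t>& immediateFree) {
  const double baseline = averageHeapSize(immediateFree);
  assert(baseline > 0.0);  // the baseline run must have a non-empty heap
  return averageHeapSize(asWritten) / baseline;
}
```

With this definition, a program that already frees everything immediately has a drag of exactly 1, and a region that holds dead objects until `freeAll` pushes the ratio above 1.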
43Evaluating Region Allocation Cont.
44Experimental Comparison to previous work
45Reaps in Apache
- Exploiting the space advantage of individual
deletion
- bc: an arbitrary-precision calculator language
- Apache region calls rerouted to reaps, plus a reapFree
(ap_pfree) call
- malloc and free redefined in bc
- Computing the 1000th prime consumes 7.4 MB without
ap_pfree and 240 KB with it
46Why programmers use custom allocators to no effect
- Recommended practice
- Premature optimization
- Drift
- Improved competition
47Conclusions
- Despite widespread belief, a custom allocator
doesn't always improve performance
- The Lea allocator is as fast or even faster
- The exception is region-based allocators
- Reaps: high performance and a reduction in memory
consumption
48Future plans
- Integrating reaps with the Hoard scalable memory
allocator
- Integrating reaps into a garbage-collected setting
49Questions
50The End
51Custom Allocator implementation
- Standard C++ way (inheritance)
- Significant overhead of virtual method dispatch
- Limits compiler optimizations
- Fixed relations between classes: a single
inheritance structure makes reuse difficult
52Mixins
- Mixins
- Can be reparented
- template <class Super>
- class Mixin : public Super {};
- No single class hierarchy
- class Composition1 : public A<B> {};
- class Composition2 : public A<C> {};
53Heap Layers
- Mixin
- Provides Malloc and Free
- Coding Guidelines
- Handle NULL returned by malloc() correctly
- Destructor must free any memory held by layer
- Top heaps: wrappers around system-provided memory
54Example Composing a Per-Class Allocator
- Per-class pool of memory
- Same-sized objects
- Singly-linked freelist for memory management
- No change of source code for the original class
- PerClassHeap utility class: adapts a class to
use a heap layer as its allocator
- FreeListHeap heap layer
55Example - PerClassHeap
    template <class Object, class SuperHeap>
    class PerClassHeap : public Object {
    public:
      inline void * operator new (size_t sz) {
        return getHeap().malloc (sz);
      }
      inline void operator delete (void * ptr) {
        getHeap().free (ptr);
      }
    private:
      // One shared heap per (Object, SuperHeap) pair.
      static SuperHeap & getHeap () {
        static SuperHeap theHeap;
        return theHeap;
      }
    };
56Example - FreeListHeap
57Example - Combination
- Foo subclass that uses per-class pools
    class FasterFoo :
      public PerClassHeap<Foo, FreelistHeap<mallocHeap> > {};
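To show how the pieces on the last few slides fit together, here is a self-contained sketch of the composition. `PerClassHeap` follows the slide. The bodies of `mallocHeap` and `FreelistHeap`, and the `Foo` class itself, are assumptions written for this example rather than the actual Heap Layers sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Top heap (assumed body): a thin wrapper around the system allocator.
class mallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* ptr) { std::free(ptr); }
};

// Mixin layer (assumed body): caches freed objects on a singly-linked
// freelist. Only valid when every request has the same size, as in a
// per-class pool.
template <class SuperHeap>
class FreelistHeap : public SuperHeap {
public:
  FreelistHeap() = default;
  ~FreelistHeap() {
    // A layer's destructor must release any memory it still holds.
    while (head_ != nullptr) {
      Node* next = head_->next;
      SuperHeap::free(head_);
      head_ = next;
    }
  }
  void* malloc(std::size_t sz) {
    if (head_ != nullptr) {
      void* ptr = head_;
      head_ = head_->next;
      return ptr;
    }
    // Ensure a freed object is always large enough to hold the link.
    return SuperHeap::malloc(sz < sizeof(Node) ? sizeof(Node) : sz);
  }
  void free(void* ptr) {
    if (ptr == nullptr) return;
    head_ = new (ptr) Node{head_};  // push the object onto the freelist
  }

private:
  struct Node { Node* next; };
  Node* head_ = nullptr;
};

// Utility class from the slide: routes new/delete of Object to SuperHeap.
template <class Object, class SuperHeap>
class PerClassHeap : public Object {
public:
  static void* operator new(std::size_t sz) {
    void* ptr = getHeap().malloc(sz);
    if (ptr == nullptr) throw std::bad_alloc();  // handle NULL from malloc
    return ptr;
  }
  static void operator delete(void* ptr) { getHeap().free(ptr); }

private:
  static SuperHeap& getHeap() {
    static SuperHeap theHeap;
    return theHeap;
  }
};

// Hypothetical original class; its source is left unchanged.
struct Foo {
  int x = 0;
  double y = 0.0;
};

class FasterFoo : public PerClassHeap<Foo, FreelistHeap<mallocHeap> > {};
```

Allocating a `FasterFoo` with `new`, deleting it, and allocating again returns the same address, because the second request is served from the freelist rather than from `malloc`. No virtual dispatch is involved: the layers are fixed at compile time by the template arguments.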
58The End!!!