Reconsidering Custom Memory Allocation

1
Reconsidering Custom Memory Allocation
  • Emery D. Berger
  • Benjamin G. Zorn
  • Kathryn S. McKinley
  • November 2002
  • Proceedings of the Conference on Object-Oriented
    Programming Systems, Languages, and Applications
    (OOPSLA) 2002

2
Lecture Topics
  • Custom memory allocators
  • General purpose allocators
  • Regions (good performance)
  • Reaps (very good performance and more)
  • Results and Conclusions

3
Key Contributions of the paper
  • A comprehensive evaluation of custom allocators
  • Custom allocators vs. general-purpose allocators
    (memory consumption and performance)
  • Most programmers seeking faster memory allocation
    should use the Lea allocator rather than writing
    their own

4
Key Contributions of the paper Cont.
  • The custom allocators that do provide higher
    performance use regions
  • Reaps are even better

5
Key Contributions of the paper Cont.
  • If you need fast regions, use reaps
  • Otherwise use the Lea allocator rather than any
    custom allocator

6
Related Works
  • Articles in the trade press claim custom
    allocators are a good idea
  • Effective C++
  • The C++ Programming Language
  • Benjamin Zorn (1993) claims they are a waste of
    time
  • Articles on region allocation (arenas, groups,
    zones)
  • We find that all of them are true

7
General-purpose memory allocators
  • Windows XP allocator
  • Lea allocator (Linux)

8
Lea Allocator
  • An approximate best-fit allocator with different
    behavior based on object size
  • Small objects (< 64 bytes): allocated from exact-size
    quicklists
  • Medium objects (< 128K): requests trigger coalescing
    of the quicklists
  • Large objects: allocated and freed with mmap
  • The best allocator known

9
Our Benchmarks
10
Emulating Custom Semantics
  • Custom allocators often support semantics that
    differ from the C allocation interface
  • Region emulator
  • Provides full region semantics on top of a
    general-purpose allocator
  • Records a pointer to each allocated object to
    allow region deletion
  • Pointers are recorded in an out-of-band array, so
    there is no impact on drag
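
The region emulator described above can be sketched roughly as follows. This is an illustration under assumptions, not the authors' code: the class name EmulatedRegion and its methods are hypothetical, std::malloc stands in for whichever general-purpose allocator is being measured, and a std::vector plays the role of the out-of-band pointer array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical region emulator: every object comes from the general-purpose
// allocator, and the region only remembers the pointers it handed out.
class EmulatedRegion {
public:
    EmulatedRegion() = default;
    EmulatedRegion(const EmulatedRegion&) = delete;
    EmulatedRegion& operator=(const EmulatedRegion&) = delete;
    ~EmulatedRegion() { freeAll(); }

    void* allocate(std::size_t size) {
        objects_.push_back(nullptr);   // reserve the record before allocating
        void* p = std::malloc(size);
        if (p == nullptr) {
            objects_.pop_back();
            return nullptr;
        }
        objects_.back() = p;           // out-of-band: nothing is stored in the object
        return p;
    }

    // Region deletion: free every recorded object individually.
    void freeAll() {
        for (void* p : objects_) std::free(p);
        objects_.clear();
    }

    std::size_t liveObjects() const { return objects_.size(); }

private:
    std::vector<void*> objects_;       // assumed stand-in for the out-of-band array
};
```

Because the bookkeeping lives outside the objects, the emulator never touches an object after the program's last use of it, which is the property the slide refers to as having no impact on drag.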

11
Custom memory allocators - Definition
  • Memory allocation mechanism that differs from a
    general-purpose allocator in at least one of two
    ways
  • May provide more than one object for every
    allocated chunk of memory
  • May not immediately return objects to the
    system/general-purpose allocator
  • Plain wrappers around the general-purpose
    allocator do not count

12
Custom allocators: widespread use
  • Recommended as an optimization technique in the
    trade press
  • Apache web server, GCC, C++ STL
  • Direct support in C++ (by overloading the new and
    delete operators)

13
Why do programmers use custom allocators?
  • Improving runtime performance
  • Reducing memory consumption
  • Improving software engineering (?)

14
Improving runtime performance
  • 16% (average) of the run time is spent in the
    memory allocator
  • The reason in most of our benchmarks
  • Per-operation cost of general-purpose allocators
  • Matters in programs that use the allocator
    intensively

15
Improving runtime performance Cont.
16
Reducing memory consumption
17
Improving software engineering (?)
  • Memory allocated by a custom allocator cannot be
    managed by another allocator
  • Calling free on a custom-allocated object may
    cause a segmentation fault
  • Difficult to understand the source of memory
    consumption in the program
  • No Purify
  • No parallel allocator for SMP scalability
  • No GC
  • No shared multi-language heap

18
Improving software engineering (!)
  • Region-based allocator simplifies memory
    management
  • Memory area can be deleted by a single call
  • Separate memory areas
  • Regions are good for multithreaded server
    applications
  • Memory space isolation
  • Memory leak prevention
  • Apache web server

19
A Taxonomy of Custom Allocators
  • Apply your knowledge about some set of objects
  • Use regions to free objects that die at the same time
  • Take advantage of object sizes
  • Use known allocation patterns

20
Benchmark allocators characteristics
  • Per-class allocators
  • Regions
  • Nested regions
  • Obstack
  • Custom patterns

21
Per-class allocators
  • Objects of the same size (type)
  • Eliding size checks
  • Freelist with objects of the specific type
  • Same API as malloc and free
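
A per-class allocator along these lines might look like the sketch below. The class Node and its freelist are hypothetical names chosen for illustration. The class is marked final so that every request has the same size, which is what lets the allocator skip size checks. Freed objects stay on the freelist and are never returned to the system, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class whose instances are recycled through a per-class freelist.
class Node final {
public:
    int value = 0;
    Node* next = nullptr;

    static void* operator new(std::size_t size) {
        // Every request is sizeof(Node), so no size-class lookup is needed.
        if (freeList_ != nullptr) {
            FreeCell* cell = freeList_;
            freeList_ = cell->next;
            return cell;
        }
        return ::operator new(size);   // fall back to the general-purpose allocator
    }

    static void operator delete(void* ptr) noexcept {
        if (ptr == nullptr) return;
        // Thread the dead object onto the freelist instead of releasing it.
        freeList_ = new (ptr) FreeCell{freeList_};
    }

private:
    struct FreeCell { FreeCell* next; };
    static FreeCell* freeList_;
};

static_assert(sizeof(Node) >= sizeof(void*), "object must be able to hold a freelist link");

Node::FreeCell* Node::freeList_ = nullptr;
```

Client code keeps writing new Node and delete p, so the interface matches the ordinary allocation API while the freelist handles reuse behind it.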

22
Regions
  • Allocation by incrementing a pointer into a large
    chunk of memory
  • Only entire region deletion - no deletion of
    individual objects
  • freeAll function
  • Nested regions
  • Nested object lifetime
  • Obstack (Object Stack)
  • Deletion of every object allocated after a
    certain object
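
A minimal region with pointer-bump allocation and a freeAll operation could be sketched as below. The class name, the default chunk size of 4096 bytes, and the decision to round every request up to the maximum alignment are assumptions made for the example. Nested regions and obstack-style partial deletion are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical bump-pointer region; names and chunk size are illustrative.
class Region {
public:
    explicit Region(std::size_t chunkSize = 4096) : chunkSize_(chunkSize) {}
    Region(const Region&) = delete;
    Region& operator=(const Region&) = delete;
    ~Region() { freeAll(); }

    void* allocate(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        if (size == 0) size = 1;
        if (size > SIZE_MAX - align) return nullptr;   // rounding would overflow
        size = (size + align - 1) / align * align;     // keep the bump pointer aligned
        if (size > remaining_) {
            // Current chunk is exhausted: get a new one from the system allocator.
            const std::size_t bytes = size > chunkSize_ ? size : chunkSize_;
            chunks_.push_back(nullptr);                // reserve the slot first
            char* chunk = static_cast<char*>(std::malloc(bytes));
            if (chunk == nullptr) { chunks_.pop_back(); return nullptr; }
            chunks_.back() = chunk;
            bump_ = chunk;
            remaining_ = bytes;
        }
        void* result = bump_;                          // allocation is just a pointer bump
        bump_ += size;
        remaining_ -= size;
        return result;
    }

    // The only way to release memory: drop every chunk at once.
    void freeAll() {
        for (char* c : chunks_) std::free(c);
        chunks_.clear();
        bump_ = nullptr;
        remaining_ = 0;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t chunkSize_;
    std::vector<char*> chunks_;
    char* bump_ = nullptr;
    std::size_t remaining_ = 0;
};
```

There is deliberately no per-object free here, which is what makes allocation cheap and also what causes the memory retention problems discussed later in the talk.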

23
Custom patterns
  • A general purpose allocator optimized for a
    particular pattern of object behavior

24
Custom allocators characteristics Cont.
25
Problems with regions
  • Excessive memory retention
  • Unbounded memory consumption
  • Unbounded buffers
  • Dynamic arrays
  • Producer-consumer patterns
  • Complicated programming of server applications
    (Apache)

26
The ideal allocator
  • Region Semantics
  • General-Purpose Allocation (heap)
  • Reaps

27
Reaps
  • Heaps: malloc, free
  • Regions: malloc, freeAll
  • Reaps: malloc, free, freeAll
28
Reaps - Example
29
Implementation Issues
  • Initially behaves like a region
  • Allocation by bumping a pointer
  • Geometrically-increasing chunks of memory
    threaded onto a linked list
  • Header for every allocated object
  • Freed objects (reapFree) are placed in an
    associated heap
  • Allocations use memory from this heap
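
The behavior listed above can be approximated by the following simplified sketch. It is not the authors' implementation: the associated heap is replaced here by a plain first-fit free list with no splitting or coalescing, and the class name, the starting chunk size, and the size limit are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical simplified reap: region-style bump allocation plus per-object free.
class Reap {
public:
    Reap() = default;
    Reap(const Reap&) = delete;
    Reap& operator=(const Reap&) = delete;
    ~Reap() { freeAll(); }

    void* malloc(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        if (size == 0) size = 1;
        if (size > static_cast<std::size_t>(-1) / 8) return nullptr;  // avoid overflow
        size = (size + align - 1) / align * align;
        // Prefer memory returned by free(): first fit, no splitting.
        for (Header** link = &freeList_; *link != nullptr; link = &(*link)->next) {
            if ((*link)->size >= size) {
                Header* h = *link;
                *link = h->next;
                return h + 1;
            }
        }
        // Otherwise behave like a region and bump the pointer.
        const std::size_t needed = sizeof(Header) + size;
        if (needed > remaining_ && !grow(needed)) return nullptr;
        Header* h = new (bump_) Header{size, nullptr};  // header precedes every object
        bump_ += needed;
        remaining_ -= needed;
        return h + 1;
    }

    void free(void* object) {
        if (object == nullptr) return;
        Header* h = static_cast<Header*>(object) - 1;
        h->next = freeList_;
        freeList_ = h;
    }

    void freeAll() {
        for (char* c : chunks_) std::free(c);
        chunks_.clear();
        freeList_ = nullptr;
        bump_ = nullptr;
        remaining_ = 0;
        nextChunkSize_ = kFirstChunk;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    struct alignas(std::max_align_t) Header {
        std::size_t size;
        Header* next;
    };
    static constexpr std::size_t kFirstChunk = 4096;   // assumed starting size

    bool grow(std::size_t needed) {
        std::size_t bytes = nextChunkSize_;
        while (bytes < needed) bytes *= 2;
        chunks_.push_back(nullptr);
        char* chunk = static_cast<char*>(std::malloc(bytes));
        if (chunk == nullptr) { chunks_.pop_back(); return false; }
        chunks_.back() = chunk;
        bump_ = chunk;
        remaining_ = bytes;
        nextChunkSize_ = bytes * 2;                    // geometric growth
        return true;
    }

    std::vector<char*> chunks_;
    Header* freeList_ = nullptr;
    char* bump_ = nullptr;
    std::size_t remaining_ = 0;
    std::size_t nextChunkSize_ = kFirstChunk;
};
```

As long as nothing is freed, the sketch allocates exactly like a region. The per-object header is the price paid for being able to free single objects later.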

30
Reap allocation interface
  • void reapCreate (void ** reap, void ** parent)
  • void reapDestroy (void ** reap)
  • void reapFreeAll (void ** reap) // clear
  • void * reapMalloc (void ** reap, size_t size)
  • void reapFree (void ** reap, void * object)

31
Design issues
  • Heap Layers
  • Mixins

32
Design issues Cont.
  • Sbrk
  • RegionHeap
  • CoalesceableHeap
  • LeaHeap
  • ClearOptimizedHeap
  • NestedHeap
33
Design issues Layers
  • LeaHeap layer: high speed, low fragmentation
  • NestedHeap layer
  • ClearOptimizedHeap layer: nothingOnHeap flag; fast
    allocation by pointer bumping on the first heap;
    second heap used after an object is freed
  • CoalesceableHeap layer: adds per-object metadata
  • RegionHeap layer: linked list of allocated
    objects; clear()

34
Benchmark allocation statistics
35
Benchmark allocation statistics Cont.
  • Programs with general-purpose allocators: not
    allocation-intensive; spend little time in the
    memory allocator
  • Programs with custom allocators: tend to allocate
    many small objects; spend more time in the
    memory allocator
  • Memory manager correctly pinpointed as a
    significant factor in performance

36
Results
  • Different memory management policies compared
    (general, custom, reaps)
  • Execution time
  • Memory consumption

37
Results - technicalities
  • Runtime: the best of three runs
  • Compiled with Visual C++ 6.0
  • 600 MHz Pentium III, 320 MB RAM, Windows XP

38
Runtime Performance
39
Runtime Performance Cont.
  • Custom vs. Windows: justifies the use of custom
    allocators
  • Lea provides almost the same performance as
    custom, except for regions
  • Reaps are comparable to Lea and to custom

40
Memory Consumption
41
Memory Consumption Cont.
  • Windows XP not shown: no equivalent way to keep
    track of memory consumption
  • Reaps don't use individual deletion
  • Mixed results
  • Region space advantage is misleading

42
Evaluating Region Allocation
  • Total drag: the average ratio of heap sizes with
    and without immediate object deallocation
  • Immediate free of every dead object: total drag
    of 1
  • Non-region allocators: minimal drag
  • Region allocators: high drag, substantial
    increase in memory consumption

43
Evaluating Region Allocation Cont.
44
Experimental Comparison to previous work
45
Reaps in Apache
  • Exploiting the space savings of individual
    deletion
  • bc: an arbitrary-precision calculator language
  • Apache regions rerouted to reaps, with an
    ap_pfree call that maps to reapFree
  • malloc and free redefined in bc
  • Computing the 1000th prime consumes 7.4 MB without
    ap_pfree and 240 KB with it

46
Why programmers use custom allocators to no effect
  • Recommended practice
  • Premature optimization
  • Drift
  • Improved competition

47
Conclusions
  • Despite widespread belief, a custom allocator
    does not always improve performance
  • The Lea allocator is as fast or even faster
  • The exception is region-based allocators
  • Reaps: high performance and reduced memory
    consumption

48
Future plans
  • Integrating reaps with the Hoard scalable memory
    allocator
  • Integrating reaps into a garbage-collected setting

49
Questions
  • ?

50
The End
51
Custom Allocator implementation
  • Standard C++ way (inheritance)
  • Significant overhead of virtual method dispatch
  • Limits compiler optimizations
  • Fixed relations between classes and a single
    inheritance structure make reuse difficult

52
Mixins
  • Mixins
  • Can be reparented
  • template <class Super>
  • class Mixin : public Super {};
  • No single class hierarchy
  • class Composition1 : public A<B> {};
  • class Composition2 : public A<C> {};
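
To make the reparenting idea concrete, here is a small self-contained sketch. The names Plain, Fancy and Bracketed are hypothetical stand-ins for the B, C and A of the slide. The same mixin is applied to two unrelated base classes, and all calls are resolved at compile time with no virtual dispatch.

```cpp
#include <cassert>
#include <string>

// Two unrelated base classes (stand-ins for B and C on the slide).
struct Plain { std::string describe() const { return "plain"; } };
struct Fancy { std::string describe() const { return "fancy"; } };

// A mixin: its superclass is a template parameter, so it can be reparented.
template <class Super>
class Bracketed : public Super {
public:
    std::string describe() const {
        // Statically bound call into whichever superclass was chosen.
        return "[" + Super::describe() + "]";
    }
};

// The same mixin layered over two different parents.
class Composition1 : public Bracketed<Plain> {};
class Composition2 : public Bracketed<Fancy> {};
```

Because the parent is chosen per instantiation, there is no single fixed class hierarchy, and the compiler can inline through the layers.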

53
Heap Layers
  • Mixin
  • Provides Malloc and Free
  • Coding Guidelines
  • Handle NULL returned by malloc() correctly
  • Destructor must free any memory held by layer
  • Top heaps: wrappers around system-provided memory

54
Example Composing a Per-Class Allocator
  • Per-class pool of memory
  • Same-sized objects
  • Singly-linked freelist for memory management
  • No change of source code for the original class
  • PerClassHeap Utility Class - to adapt a class to
    use heap layer as its allocator
  • FreeListHeap Heap Layer

55
Example - PerClassHeap
  • template <class Object, class SuperHeap>
  • class PerClassHeap : public Object {
  • public:
  •   inline void * operator new (size_t sz) {
  •     return getHeap().malloc (sz); }
  •   inline void operator delete (void * ptr) {
  •     getHeap().free (ptr); }
  • private:
  •   static SuperHeap& getHeap () {
  •     static SuperHeap theHeap;
  •     return theHeap; } };

56
Example - FreeListHeap
57
Example - Combination
  • Foo subclass that uses per-class pools
  • class FasterFoo :
  •   public PerClassHeap<Foo, FreelistHeap<mallocHeap> > {};
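
Putting the pieces together, the combination might be written out as in the sketch below. The slides do not show the bodies of mallocHeap or FreelistHeap, so both are assumptions written in the spirit of the layering described earlier. Foo is a hypothetical example class, and the null check in operator new is an addition, since operator new is not expected to return a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Top heap: a thin wrapper around system-provided memory (assumed body).
class mallocHeap {
public:
    void* malloc(std::size_t sz) { return std::malloc(sz); }
    void free(void* ptr) { std::free(ptr); }
};

// Heap layer: keeps freed objects on a singly-linked freelist (assumed body).
// Only valid when every object routed through it has the same size.
template <class SuperHeap>
class FreelistHeap : public SuperHeap {
public:
    ~FreelistHeap() {
        // The destructor frees any memory still held by this layer.
        while (head_ != nullptr) {
            Cell* c = head_;
            head_ = c->next;
            SuperHeap::free(c);
        }
    }
    void* malloc(std::size_t sz) {
        if (head_ != nullptr) {
            Cell* c = head_;
            head_ = c->next;
            return c;
        }
        return SuperHeap::malloc(sz < sizeof(Cell) ? sizeof(Cell) : sz);
    }
    void free(void* ptr) {
        if (ptr == nullptr) return;
        head_ = new (ptr) Cell{head_};
    }
private:
    struct Cell { Cell* next; };
    Cell* head_ = nullptr;
};

// Utility class from the slide: adapts Object to allocate from SuperHeap.
template <class Object, class SuperHeap>
class PerClassHeap : public Object {
public:
    static void* operator new(std::size_t sz) {
        void* p = getHeap().malloc(sz);
        if (p == nullptr) throw std::bad_alloc();   // added: handle NULL from malloc
        return p;
    }
    static void operator delete(void* ptr) noexcept { getHeap().free(ptr); }
private:
    static SuperHeap& getHeap() {
        static SuperHeap theHeap;
        return theHeap;
    }
};

// Hypothetical original class; its source is left untouched.
struct Foo { int x = 0; double y = 0.0; };

class FasterFoo : public PerClassHeap<Foo, FreelistHeap<mallocHeap> > {};
```

Code that previously created Foo objects only needs to name FasterFoo instead. Swapping FreelistHeap for another layer changes the allocation policy without editing Foo or PerClassHeap.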

58
The End!!!