Title: Hardware Support for Dynamic Storage Allocation
1. Hardware Support for Dynamic Storage Allocation
Steven M. Donahue
M.S. Thesis Defense, April 14th, 2003
Advisor: Dr. Ron K. Cytron
Center for Distributed Object Computing
Department of Computer Science, Washington University
Funded by the National Science Foundation under grant ITR-008124
2. Outline
- Motivation
- Background
- Related work
- Our approach
- Analysis and results
- Conclusions and future work
3. Motivation: Real Time
(Diagram: the real-time system stack, with example technologies at each layer)
- Application: Avionics
- Language: Java, Timber, Ada
- Operating System: LynxOS, VxWorks, QNX
- Hardware
4. Motivation: Real-Time Java
- Specify time properties
- Start
- Period
- Cost
- Deadline (relative)
- Storage allocation must be predictable and
reasonably bounded
(Diagram: a real-time task issuing an allocation request, new Foo())
5. Motivation: Intelligent RAM
(Diagram: storage management in an IRAM. The CPU, its cache, and the L2 cache sit on one side of the data bus; among the memory modules is an IRAM that pairs RAM with logic, and alloc/dealloc requests are serviced by that on-chip logic.)
6. Real-time Storage Allocation
- Fast is good
- Predictable is required
- Goal
- Develop a hardware allocation system that provides
- Performance
- Robustness
- Portability
- Quick development time
7. Related Work
- Simple hardware allocator [Puttkamer, 75]
- Customized allocators [Grunwald & Zorn, 92]
- Hardware allocator to eliminate fragmentation [Chang & Gehringer, 96; Cam et al., 99]
- Real-time allocator for System-on-Chip [Shalan & Mooney, 00]
8. Common Storage Allocation
- Sequential Fits
- Linked list of free blocks
- Search for desired fit
- Allocation worst-case?
- O(n) for n blocks in the list (see the first-fit sketch below)
(Figure: allocation times for the list allocator, in ns)
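For concreteness, here is a minimal first-fit sketch in C of the sequential-fits idea. It is an illustration only, not the JVM's actual allocator; the names (free_block, seqfit_alloc) are invented here, and splitting the remainder of an oversized block is omitted.

    #include <stddef.h>

    /* Free block header: the block's size plus a link to the next free block. */
    typedef struct free_block {
        size_t             size;
        struct free_block *next;
    } free_block;

    static free_block *free_list;   /* head of the singly linked free list */

    /* First-fit allocation: walk the list until a block is large enough.
     * The worst case visits every free block, hence O(n) for n blocks. */
    void *seqfit_alloc(size_t size)
    {
        free_block **prev = &free_list;
        for (free_block *b = free_list; b != NULL; prev = &b->next, b = b->next) {
            if (b->size >= size) {
                *prev = b->next;    /* unlink the chosen block */
                return b;
            }
        }
        return NULL;                /* no fit found */
    }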
9. Manual Storage Allocation
- Application specific
- Example: suppose the application allocates only blocks of size 135, 65, 23, and 5
- Have n free-list allocators, one for each size (see the sketch below)
- Allocation worst-case?
- O(1)
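A sketch of the scheme just described, using the sizes from the example; the names are invented for illustration. Because the number of size classes is a fixed constant, every step is bounded and allocation is O(1).

    #include <stddef.h>

    /* One free list per size the application is known to request. */
    static const size_t sizes[] = { 135, 65, 23, 5 };
    #define NSIZES (sizeof sizes / sizeof sizes[0])

    typedef struct node { struct node *next; } node;
    static node *lists[NSIZES];     /* head of each size-specific free list */

    /* O(1) allocation: map the request to its dedicated list and pop the head.
     * With a fixed, small set of sizes the lookup is a constant number of compares. */
    void *manual_alloc(size_t size)
    {
        for (size_t i = 0; i < NSIZES; i++) {
            if (sizes[i] == size) {
                node *b = lists[i];
                if (b != NULL)
                    lists[i] = b->next;
                return b;
            }
        }
        return NULL;    /* size not known a priori: this scheme cannot serve it */
    }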
10. Manual Storage Allocation
- Application specific
- Not general purpose
- A priori knowledge of allocated blocks
- Makes code ugly
- Sacrifices portability
- Can require extra storage
- Number of allocators times max-live
11. Ideal Allocator
- General Purpose
- Ratio of worst-case and average-case close to 1
- Find a block in constant time
- Overall speed is as fast as possible
- Minimizes memory overhead
12. Knuth's Buddy System
- Free lists segregated by size
- All requests input to the system are rounded up to a power of 2 (see the sketch below)
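A minimal C sketch of the rounding step, assuming sizes are powers of two indexed by their exponent; the helper name buddy_index is made up for this illustration.

    #include <stddef.h>

    /* Round a request up to the next power of two and return the index of the
     * free list that serves it (the exponent).  For example, a request for 23
     * is rounded to 32 and served from list 5. */
    unsigned buddy_index(size_t request, size_t *rounded)
    {
        size_t   size  = 1;
        unsigned index = 0;
        while (size < request) {    /* at most log2 of the heap size iterations */
            size <<= 1;
            index++;
        }
        *rounded = size;
        return index;
    }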
13. Knuth's Buddy System
- Allocation example
- The system begins with one large block of size 256
- Allocate a block of size 16
- Three operations: Find, Block, Return
(Figure: free lists segregated by size, 256 down to 1)
14. Knuth's Buddy System
- Allocation example: allocate a block of size 16
- Find: O(log M)
- Block: recursively subdivide, O(log M)
- Return: O(1)
(Figure: the free lists for sizes 256 down to 1, showing the size-256 block split down to satisfy the request for 16; a software sketch follows.)
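A software sketch of the three operations that matches the costs above. The array-of-free-lists layout and all names are assumptions made for this illustration, not the thesis implementation (hardware or JVM).

    #include <stddef.h>

    #define LEVELS 9                    /* sizes 1 .. 256, as in the example */

    typedef struct block { struct block *next; } block;
    static block *free_lists[LEVELS];   /* free_lists[k] holds blocks of size 2^k */

    /* Find:   locate the smallest non-empty list at or above the requested level.
     * Block:  recursively split the larger block, pushing the unused halves
     *         (the buddies) back onto the lower lists.
     * Return: hand the correctly sized block to the caller.
     * Both Find and Block are O(log M) in this software version. */
    void *buddy_alloc(unsigned level)
    {
        unsigned k = level;
        while (k < LEVELS && free_lists[k] == NULL)     /* Find */
            k++;
        if (k == LEVELS)
            return NULL;                                /* out of memory */

        block *b = free_lists[k];
        free_lists[k] = b->next;

        while (k > level) {                             /* Block: split down */
            k--;
            block *buddy = (block *)((char *)b + ((size_t)1 << k));
            buddy->next = free_lists[k];
            free_lists[k] = buddy;
        }
        return b;                                       /* Return */
    }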
15. General Architecture
16. Finding a Block
Allocate a block of size 16
(Figure: the free-list ladder, sizes 256 down to 1, shown for the software approach and for our hardware trick)
- Software approach: Find is O(log M)
- Our hardware trick: Find is O(1) in practice (a sketch of one possible mechanism follows)
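The slides do not spell out the trick, so the following is only one plausible mechanism consistent with an O(1) Find: keep a bit vector that records which free lists are non-empty and feed it to a priority encoder; in C the analogue is a single find-first-set. The names and the bit-vector bookkeeping are assumptions, and __builtin_ctz is a GCC/Clang builtin.

    /* Assumed mechanism, not taken from the thesis: bit k of `nonempty` is kept
     * set whenever free list k holds at least one block.  Hardware can run this
     * vector through a priority encoder in a single cycle; in software the same
     * idea is one find-first-set on the masked vector. */
    static unsigned nonempty;             /* maintained on every push and pop */

    int fast_find(unsigned level)
    {
        unsigned candidates = nonempty & ~((1u << level) - 1u); /* lists >= level */
        if (candidates == 0)
            return -1;                    /* no list at or above `level` has a block */
        return __builtin_ctz(candidates); /* lowest set bit = smallest adequate list */
    }

A pop that empties list k would clear bit k and a push would set it, so the vector stays consistent with the lists at constant cost.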
17. Return
Allocate a block of size 16
(Figure: timelines of the application and the allocator under both approaches)
- Software approach: the allocator performs Find, Block, and Return before the application's new Foo() completes
- Our hardware trick: the block is Returned right after Find, so the application resumes while the Block step finishes (a sketch of this reordering follows)
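A software sketch of this reordering, under the assumption that the split can simply be deferred: the hardware completes it in parallel with the application, while this C sketch pays for it at the start of the next request. All names are illustrative.

    #include <stddef.h>

    #define LEVELS 9                                /* sizes 1 .. 256, as above */
    typedef struct block { struct block *next; } block;
    static block *free_lists[LEVELS];

    /* A Block (split) step left over from the previous allocation. */
    static block   *pending_block;
    static unsigned pending_from, pending_to;
    static int      pending;

    /* Complete a deferred split.  The hardware overlaps this work with the
     * application; here it is paid at the start of the next request, so only
     * back-to-back allocations wait for it. */
    static void finish_block_step(void)
    {
        if (!pending)
            return;
        for (unsigned k = pending_from; k > pending_to; ) {
            k--;
            block *buddy = (block *)((char *)pending_block + ((size_t)1 << k));
            buddy->next = free_lists[k];
            free_lists[k] = buddy;
        }
        pending = 0;
    }

    /* Find the block, Return it immediately, and defer the Block step. */
    void *buddy_alloc_fast_return(unsigned level)
    {
        finish_block_step();
        unsigned k = level;
        while (k < LEVELS && free_lists[k] == NULL)     /* Find */
            k++;
        if (k == LEVELS)
            return NULL;
        block *b = free_lists[k];
        free_lists[k] = b->next;
        pending_block = b;                              /* record deferred Block */
        pending_from  = k;
        pending_to    = level;
        pending       = 1;
        return b;                                       /* Return first */
    }

The only requests that still wait for the split are those that arrive before it finishes, which the later slides show to be rare.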
18. Results and Analysis
- The Buddy algorithm was also implemented in the Java Virtual Machine
- The standard JVM allocator is sequential-fits
- Compare the algorithms in a software environment
- Two versions of the Buddy System
- Reference version
- Optimized version with Fast Find and Fast Return
- SPECjvm98 benchmark suite
19. Overview of Java Benchmarks
(Figure: allocation time as a percentage of execution time for each benchmark)
20. Buddy vs. Sequential-fits
21. Buddy vs. Sequential-fits
22. Buddy vs. Sequential-fits
- On average, software Buddy was slower than the sequential-fits implementation
- The benchmarks do not exhibit worst-case behavior for the sequential-fits allocator
- But a "devil" program, one that does exhibit the worst case, is 91x worse
- Can we improve using hardware?
23. Non-optimized Buddy System
24. Non-optimized Buddy System
25. Non-optimized Buddy System
26. Non-optimized Buddy System
- Improves on the mean allocation time of software Buddy
- Still behind the JVM software allocator
- Worst-case allocation time significantly improved
- 6x better than the JVM allocator, 7x better than software Buddy
- Much smaller range of allocation times (better bounds)
- 7x smaller
27. Optimization: Fast Find
28. Optimization: Fast Find
29. Optimization: Fast Find
30. Optimization: Fast Return
- Could Fast Return affect performance?
- In half of the benchmarks, the minimum inter-arrival time of allocation requests is greater than the Block time of the Buddy System
- The Block step could therefore complete in parallel with the application
(Figure: the application issues successive new Foo() requests; the allocator's Find and Return finish before the next request arrives, and Block overlaps the inter-arrival time)
31. Optimization: Fast Return
32. Optimization: Fast Return
- Effectiveness depends on the behavior of the application
- In the other half of the benchmarks, some allocations came too quickly
- How often?
33. Optimizations
- Fast Find
- Suffers a small (8 ns) increase in mean allocation times
- Significantly improves worst-case times (600 ns)
- Fast Return
- Applicability is dependent on the application
- Very few allocations (4 out of 997,059) were too quick
- Can offer significant improvement
- 2-3 orders of magnitude improvement in the worst case
34. Conclusions
- Let's revisit the ideal allocator
- Used without modification by any application
- Does not require an unreasonable amount of storage
- Difference between worst-case and average-case performance is small
- Finds a block in constant time
- Overall speed is as fast as possible
35. Future Work
- Improve the Block time, similar to the improvements for Find and Return
- Algorithms for defragmentation of the heap with the Buddy System
- Use the allocator as a building block
- Garbage Collection
- Goal of IRAM (intelligent storage)
36. Contributions
- Real-time hardware buddy allocator
- Donahue, Hampton, Cytron, Franklin, and Kavi. "Hardware Support for Fast and Bounded Time Storage Allocation," Workshop on Memory Processor Interfaces, May 2002
- Donahue, Hampton, Deters, Nye, Cytron, and Kavi. "Storage Allocation for Real-Time, Embedded Systems," First International Workshop on Embedded Software, May 2001
- IRAM environment
- Modularized allocator to be a sub-component for other systems
- Testing and simulation environment
37. Acknowledgements
- Dr. Ron K. Cytron
- Committee
- Dr. Roger Chamberlain
- Dr. Mark Franklin
- JVM Specialists
- Dante Cannarozzi
- Morgan Deters
- Matthew Hampton
- DOC Group
38. Thanks!
Steve Donahue (sd1@cse.wustl.edu)
www.cs.wustl.edu/doc
Department of Computer Science, Washington University
Box 1045, St. Louis, MO 63130 USA