Title: Hashtables for Realtime and Embedded Systems
1 Hashtables for Real-time and Embedded Systems
Anand Krishnan, April 17, 2003
Advisor: Dr. Ron K. Cytron
Guest Advisor: Dr. Douglas Niehaus
Center for Distributed Object Computing, Department of Computer Science, Washington University
Sponsored by DARPA under contract F33615-00-C-1697
2 Overview
- Motivation
- Hashtable Organization
- Real-time Hashtable model
- Behavioral Analysis
- Experiments and results
- Conclusions and Future Work
3 Real-time and Embedded Systems (RTES)
- Real-time systems
- Timing and predictability
- Embedded systems
- Typically part of a larger system with real-time requirements
- Space constraints
[Figure: layered stack of Hardware, Operating System, Programming Language, Libraries and Application, with Multimedia and Avionics as example application domains]
4 Collection Objects
- Abstract data types that store information
- Lists, sets, trees, hashtables, etc.
- Part of the library of a language specification
- Java Collections Framework
- The Standard Template Library (STL)
- Building blocks of software modules and applications
- Emphasis on average-case performance
- Necessity to migrate towards real-time behavior
- Focus of this work: hashtables
- Popular
- Excellent average-case performance
- An interesting case for real-time systems
6 Hashtable Organization
- "HASH: There is no definition for this word; nobody knows what hash is."
- Ambrose Bierce, The Devil's Dictionary
- Hashtable: provides an implementation of the dictionary interface
- Insertion, access and removal of entries
- Each entry is a (key, value) pair
- Operations on a given hashtable, $HT$
- GET(key)
- Returns value such that $(key, value) \in HT$
- PUT(key, value)
- $HT \leftarrow (HT \setminus \{(k, v) \mid k = key\}) \cup \{(key, value)\}$
- REMOVE(key)
- $HT \leftarrow HT \setminus \{(k, v) \mid k = key\}$
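The three operations can be read directly as set updates. The Java sketch below models $HT$ literally as a list of (key, value) pairs so that PUT and REMOVE mirror the set expressions line for line. The class name `PairSetDictionary` is hypothetical, and every operation is deliberately linear-time; it illustrates the semantics, not an efficient hashtable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical literal model: HT is a set of (key, value) pairs; each operation is O(n).
class PairSetDictionary<K, V> {
    private final class Pair {
        final K key;
        final V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<Pair> ht = new ArrayList<>();

    // GET(key): the value v with (key, v) in HT, or null when no such pair exists.
    V get(K key) {
        for (Pair p : ht) {
            if (Objects.equals(p.key, key)) return p.value;
        }
        return null;
    }

    // PUT(key, value): remove every pair whose key equals key, then add (key, value).
    void put(K key, V value) {
        remove(key);
        ht.add(new Pair(key, value));
    }

    // REMOVE(key): remove every pair whose key equals key.
    void remove(K key) {
        ht.removeIf(p -> Objects.equals(p.key, key));
    }

    int size() { return ht.size(); }
}
```

Because PUT first removes any pair with the same key, a second PUT on one key replaces the value instead of adding a duplicate entry.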
7 Hashtable Organization
- Hash function
- $h: Keys \rightarrow \mathbb{Z}_n$
- $\mathbb{Z}_n$ is the set of integers modulo $n$
- $n$ is the number of slots in the hashtable
- $h(key)$ hashes into index $i$ if $h(key) = i$, for $i \in \mathbb{Z}_n$
- Collision
- For two distinct keys $x$ and $y$, $h(x) = h(y)$
- Collision resolution
- Open addressing
- Successive probing for empty slots
- Concern: search time for an entry (element) can be $O(n)$
- Chaining
- Colliding entries are placed on a linked list
- Concern: length of the linked list, or bucket
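To make chaining concrete, here is a minimal separate-chaining table in Java. It assumes `Math.floorMod(key.hashCode(), n)` as the hash function $h$ into $\mathbb{Z}_n$ (the slides do not fix one). The class `ChainedTable` and its `longestBucket()` helper are hypothetical; the helper exposes the quantity the slide flags as the concern, since the cost of GET, PUT and REMOVE grows with the length of the bucket they examine.

```java
import java.util.LinkedList;

// Hypothetical separate-chaining table with n slots.
class ChainedTable<K, V> {
    private final class Entry {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int n) {
        buckets = (LinkedList<Entry>[]) new LinkedList[n];
        for (int i = 0; i < n; i++) buckets[i] = new LinkedList<>();
    }

    // h : Keys -> Z_n; floorMod keeps the index non-negative for negative hashCodes.
    int h(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    V get(K key) {
        for (Entry e : buckets[h(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    void put(K key, V value) {
        LinkedList<Entry> chain = buckets[h(key)];
        for (Entry e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // existing key: overwrite
        }
        chain.add(new Entry(key, value)); // collision or empty bucket: append to the chain
    }

    void remove(K key) {
        buckets[h(key)].removeIf(e -> e.key.equals(key));
    }

    // The real-time concern: the worst operation examines the longest chain.
    int longestBucket() {
        int max = 0;
        for (LinkedList<Entry> chain : buckets) max = Math.max(max, chain.size());
        return max;
    }
}
```

For example, with `n = 4` the Integer keys 1, 5 and 9 all hash to slot 1, so they share one chain of length 3.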
8 Hashtable Organization
- Load (factor): ratio of the number of entries to the number of buckets
- This reflects average performance, usually the measure of interest, but not for us
- Rehashing
- Grow the hashtable by increasing the number of slots
- Typically triggered when the current load exceeds a threshold load factor
- Performed as part of a hashtable operation
- A new hash function reassigns elements over the new space
- Desirable property of a hash function
- Delay rehashing by distributing elements evenly
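A conventional threshold policy can be sketched as follows. This hypothetical `ThresholdRehashTable` stores distinct int keys (duplicates are not checked), doubles the bucket count when the load exceeds the threshold, and records how many entries the triggering PUT had to move. It shows why the approach worries real-time systems: one operation absorbs the whole rehash, and the old and new tables are live at the same time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical non-real-time table: full rehash when load > threshold.
class ThresholdRehashTable {
    private List<List<Integer>> buckets = new ArrayList<>();
    private int size = 0;
    private final double threshold;
    int lastPutWork = 0; // entries moved by the most recent put (0 unless it rehashed)

    ThresholdRehashTable(int initialBuckets, double threshold) {
        this.threshold = threshold;
        for (int i = 0; i < initialBuckets; i++) buckets.add(new ArrayList<>());
    }

    double load() { return (double) size / buckets.size(); }

    int bucketCount() { return buckets.size(); }

    // Assumes key is not already present.
    void put(int key) {
        lastPutWork = 0;
        buckets.get(Math.floorMod(key, buckets.size())).add(key);
        size++;
        if (load() > threshold) rehash(2 * buckets.size());
    }

    // Allocate the new table, move every extant entry, then drop the old table:
    // the whole cost lands on the one operation that crossed the threshold.
    private void rehash(int newCount) {
        List<List<Integer>> fresh = new ArrayList<>();
        for (int i = 0; i < newCount; i++) fresh.add(new ArrayList<>());
        for (List<Integer> chain : buckets) {
            for (int key : chain) {
                fresh.get(Math.floorMod(key, newCount)).add(key);
                lastPutWork++;
            }
        }
        buckets = fresh; // old table becomes garbage only now
    }
}
```

With 4 buckets and a threshold of 0.75, the first three puts move nothing, while the fourth moves all 4 entries into an 8-bucket table.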
10 RTES Concerns for Hashtables
- Real-time issues
- Task provisioning
- Unbounded hashtable operation on rehash
- Over-provisioning (worst-case operation time)
- Under-provisioning (average-case operation time)
- Rehash triggered by the average length of a bucket
- What about the worst case?
- Embedded system constraints
- Hashtable expansion
- Allocate a new table
- Rehash extant elements into the new table
- Deallocate the old table
- Problems
- Storage blip
- Holes in the runtime storage heap, leading to excessive defragmentation
11 Amortized Rehashing
- When to rehash?
- Trigger based on the length of any given bucket
- Rehash Triggering Length (RTL)
- Space concerns
- Maintain two hash functions, H and H'
- H maps to the space 1..B; H' maps to the space 1..B'
- Allocate new buckets on demand
- When H' hashes an element to some bucket in the range B+1..B'
- Issue of garbage collection?
- Amortization during rehashing
- Rehash (clean) extant elements from the B buckets incrementally over each hashtable operation
- Cleaning: remapping all elements in a bucket using H'
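The two-function layout can be sketched in Java under some stated assumptions: H and H' are taken to be `key mod B` and `key mod B'` on int keys (shifted to 1..B and 1..B' as on the slide), rehash is flagged when an insert makes a single bucket reach RTL, and storage for a bucket beyond B is created only when H' first sends an element there. `DualHashLayout` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: H = key mod B, H' = key mod B', both shifted to 1-based buckets.
class DualHashLayout {
    final int b;       // B: range of H
    final int bPrime;  // B': range of H'
    final int rtl;     // Rehash Triggering Length
    private final List<Integer>[] buckets; // buckets[i] is bucket i (index 0 unused); null until allocated
    private int allocated = 0;

    @SuppressWarnings("unchecked")
    DualHashLayout(int b, int bPrime, int rtl) {
        this.b = b;
        this.bPrime = bPrime;
        this.rtl = rtl;
        buckets = (List<Integer>[]) new List[bPrime + 1]; // reference slots only
        for (int i = 1; i <= b; i++) allocate(i);         // the B old buckets exist up front
    }

    private void allocate(int i) {
        buckets[i] = new ArrayList<>();
        allocated++;
    }

    int h(int key)      { return Math.floorMod(key, b) + 1; }
    int hPrime(int key) { return Math.floorMod(key, bPrime) + 1; }

    // Stable-mode insert through H; true once this one bucket reaches RTL.
    boolean insertStable(int key) {
        List<Integer> bucket = buckets[h(key)];
        bucket.add(key);
        return bucket.size() >= rtl;
    }

    // New buckets are allocated on demand, when H' first maps an element to them.
    List<Integer> newBucketFor(int key) {
        int i = hPrime(key);
        if (buckets[i] == null) allocate(i);
        return buckets[i];
    }

    int allocatedBuckets() { return allocated; }
}
```

With B = 6, B' = 8 and RTL = 5, five keys that share an old bucket trigger rehash, and the first key that H' maps to bucket 7 or 8 allocates storage for that bucket alone.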
12 Amortized Rehashing
[Figure: incremental rehash from old to new (RTL = 5, B = 6, B' = 8); key k maps to an old bucket via H and to a new bucket via H']
13 Amortized Rehashing
[Figure: clean the old bucket]
14 Amortized Rehashing
[Figure: clean the old bucket (continued)]
15 Amortized Rehashing
[Figure: clean the new bucket]
16 Amortized Rehashing
[Figure: clean the new bucket (continued)]
17 Amortized Rehashing
[Figure: clean one gratuitous bucket]
18 Amortized Rehashing
[Figure: clean one gratuitous bucket (continued)]
19 Amortized Rehashing
[Figure: perform the hash operation at the new bucket]
20 Amortized Rehashing
- Duties of a hashtable operation
- Clean old bucket H(k), if necessary
- Clean or allocate new bucket H'(k), if necessary
- Perform the hashtable operation on new bucket H'(k)
- Perform incremental cleaning by cleaning a gratuitous bucket
- Hashtable modes
- Stable
- Rehash
- Issues resolved
- Rehash triggered by the worst-case length of a bucket
- Rehash distributed over multiple hashtable operations
- Storage blip avoided
- Worst-case operation time proportional to the length of the longest bucket
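The four duties can be combined into one operation routine. The sketch below is a hypothetical `IncrementalRehashSet` of int keys, written under assumptions the slides leave open: keys hash to themselves, H is `mod B`, H' is `mod B'`, B' = 2B, buckets are numbered from 0, and gratuitous buckets are chosen in index order. Every operation first cleans H(k), then H'(k), works on H'(k), and finally cleans up to `cleaningRate` further buckets; the table returns to stable mode once all B old buckets are clean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical amortized-rehash set (assumptions listed in the lead-in).
class IncrementalRehashSet {
    private final int rtl;           // Rehash Triggering Length
    private final int cleaningRate;  // gratuitous buckets cleaned per operation
    private final List<List<Integer>> buckets = new ArrayList<>(); // null = not yet allocated
    private int b;                   // B: range of the old function H
    private int bPrime;              // B': range of the new function H' (equals B when stable)
    private boolean rehashing = false;
    private boolean[] clean;         // clean[i]: old bucket i already remapped with H'
    private int cleaned, cursor, size;

    IncrementalRehashSet(int initialBuckets, int rtl, int cleaningRate) {
        this.b = this.bPrime = initialBuckets;
        this.rtl = rtl;
        this.cleaningRate = cleaningRate;
    }

    private int h(int key)      { return Math.floorMod(key, b); }
    private int hPrime(int key) { return Math.floorMod(key, bPrime); }

    // Buckets are allocated on demand, so there is no up-front copy of the table.
    private List<Integer> bucket(int i) {
        while (buckets.size() <= i) buckets.add(null);
        if (buckets.get(i) == null) buckets.set(i, new ArrayList<>());
        return buckets.get(i);
    }

    // Cleaning: remap every element of old bucket i using H'.
    private void cleanOld(int i) {
        if (!rehashing || i >= b || clean[i]) return;
        clean[i] = true;
        List<Integer> moved = new ArrayList<>(bucket(i));
        bucket(i).clear();
        for (int key : moved) bucket(hPrime(key)).add(key);
        if (++cleaned == b) { // every old bucket remapped: back to stable mode
            rehashing = false;
            b = bPrime;
        }
    }

    private void beginRehash() {
        rehashing = true;
        bPrime = 2 * b;
        clean = new boolean[b];
        cleaned = 0;
        cursor = 0;
    }

    // Duties 1-3: clean H(k), clean or allocate H'(k), return the bucket to operate on.
    private List<Integer> prepare(int key) {
        cleanOld(h(key));
        cleanOld(hPrime(key));
        return bucket(hPrime(key));
    }

    // Duty 4: clean up to cleaningRate gratuitous buckets, in index order.
    private void gratuitous() {
        for (int n = 0; rehashing && n < cleaningRate; n++) {
            while (clean[cursor]) cursor++;
            cleanOld(cursor);
        }
    }

    void put(int key) {
        List<Integer> target = prepare(key);
        if (!target.contains(key)) {
            target.add(key);
            size++;
            if (!rehashing && target.size() >= rtl) beginRehash(); // any single bucket triggers
        }
        gratuitous();
    }

    boolean contains(int key) {
        boolean found = prepare(key).contains(key);
        gratuitous();
        return found;
    }

    boolean remove(int key) {
        boolean removed = prepare(key).remove(Integer.valueOf(key));
        if (removed) size--;
        gratuitous();
        return removed;
    }

    int size() { return size; }
    boolean isRehashing() { return rehashing; }
}
```

Because the rehash is spread across operations, no single call remaps more than 2 + `cleaningRate` buckets, which is what keeps the per-operation cost tied to bucket length rather than table size.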
21 Overview
- Motivation
- Hashtable Organization
- Real-time Hashtable model
- Behavioral Analysis
- Stable and Rehash mode analysis
- Incremental Cleaning Mechanisms
- Optimization of hashtable operation duties
- Experiments and results
- Conclusions and Future Work
22 Analysis
Ridiculously Uniform Hash Assumption (RUHA)
[Figure: worst case and average case for a hashtable with B buckets]
23 Analysis
Ridiculously Uniform Hash Assumption (RUHA)
- Bucket increment
- Rehash successful if
- Bucket length bound
- At any instant, a bucket's length is strictly bounded by 2 x RTL
Citation: Friedman, Leidenfrost, Brodie, and Cytron. Hashtables for embedded and real-time systems. In Proceedings of the IEEE Workshop on Real-Time Embedded Systems, 2001.
Citation: Friedman, Krishnan, Leidenfrost, Brodie, Cytron, and Niehaus. Hashtables for embedded and real-time systems. Technical Report WUCS-03-15.
24 Analysis
[Figure: hash table with B buckets]
25 Stable mode model (analytical)
26 Bucket Length Bound
- Bucket length bound
- Denoted by K
- Typically, $K \le 2 \times RTL - 1$
- RTL = 5
- Various bounds (K)
- Peaks of the stable-mode curve
27 Rehash mode model
28 Rehash mode model (analytical)
- RTL = 5
- K = 10
- Bounded by the stable-mode curve
29 Incremental Cleaning Mechanisms
- Need-based incremental clean
- Cleaning rate: number of gratuitous buckets cleaned per operation
- Guaranteed cleaning rate
- List of unclean buckets
- Greedy cleaning
- Choose buckets in increasing or decreasing order of length
- Prioritized cleaning
- Schedule buckets that were observed but not visited if they are longer than the bound
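The cleaning policies differ mainly in which unclean bucket is chosen next. A hypothetical `CleaningScheduler` can combine two of them: a FIFO list of unclean buckets gives the guaranteed rate, and a separate lane holds buckets that operations observed to be longer than the bound K. A static `greedyOrder` helper orders buckets longest-first, one of the greedy variants. How the original implementation records observations is not described here, so `observe` is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical scheduler for gratuitous cleaning during one rehash.
class CleaningScheduler {
    private final Deque<Integer> unclean = new ArrayDeque<>();  // guaranteed rate: every old bucket, in order
    private final Deque<Integer> priority = new ArrayDeque<>(); // prioritized: observed buckets longer than K
    private final boolean[] done;
    private final int k;                                         // bucket length bound K

    CleaningScheduler(int oldBuckets, int k) {
        this.k = k;
        this.done = new boolean[oldBuckets];
        for (int i = 0; i < oldBuckets; i++) unclean.add(i);
    }

    // An operation looked at bucket i without cleaning it; schedule it early if too long.
    void observe(int i, int length) {
        if (!done[i] && length > k) priority.add(i);
    }

    // A bucket cleaned as part of an operation's own duties.
    void markCleaned(int i) { done[i] = true; }

    // Next gratuitous bucket to clean, or -1 when every old bucket is clean.
    int next() {
        while (!priority.isEmpty() || !unclean.isEmpty()) {
            int i = priority.isEmpty() ? unclean.poll() : priority.poll();
            if (!done[i]) {
                done[i] = true;
                return i;
            }
        }
        return -1;
    }

    // Greedy variant: visit buckets longest-first (ties keep index order).
    static int[] greedyOrder(int[] lengths) {
        Integer[] idx = new Integer[lengths.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Integer.compare(lengths[b], lengths[a]));
        int[] order = new int[idx.length];
        for (int i = 0; i < idx.length; i++) order[i] = idx[i];
        return order;
    }
}
```

With 4 old buckets and K = 3, observing bucket 2 at length 5 makes `next()` return 2 first, then fall back to 0, 1 and 3, and finally -1.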
30 Optimization of operation duties
- Problem: inefficiencies in performing a hashtable operation
- Examining elements that are already clean
- What if the element is found in the old bucket?
- Solution: maintain two lists for each bucket
- In-sync list and out-of-sync list
- Cost: extra space
31 Using Buckets with Two Lists
- Cleaning examines only the in-sync list of the old bucket
- Enables cleaning and the operation to be performed simultaneously
- Reduces the average time for an operation
- Useful for systems with little or no cache
- Our implementation did not show a difference in timings
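A bucket with two lists might look like the hypothetical `TwoListBucket` below. The mapping to the slide's in-sync and out-of-sync terms is an assumption: here `pending` holds elements still placed by the old function H and `settled` holds elements already placed by H'. Cleaning then walks only `pending`, so elements that are already clean are never re-examined, at the cost of a second list per bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical bucket split into two lists (naming is an assumption, see lead-in).
class TwoListBucket {
    final List<Integer> pending = new ArrayList<>(); // still placed by the old function H
    final List<Integer> settled = new ArrayList<>(); // already placed by H'

    // Cleaning walks only the pending list; settled elements are never re-examined.
    // Returns the number of elements examined.
    int clean(IntUnaryOperator hPrime, TwoListBucket[] table) {
        int examined = pending.size();
        for (int key : pending) table[hPrime.applyAsInt(key)].settled.add(key);
        pending.clear();
        return examined;
    }

    // A lookup may still have to check both lists.
    boolean contains(int key) {
        return settled.contains(key) || pending.contains(key);
    }
}
```

Cleaning a bucket holding one settled and three pending keys therefore examines three elements, not four.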
33 Real-time Readiness Ratio
- Ratio of worst-case to average-case operation time
- Reasonably bounded
[Figure: RT Hash behavior under Solaris]
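The readiness ratio is simple arithmetic over measured operation times. The hypothetical `ReadinessRatio.of` helper below takes per-operation timings in whatever unit the caller measured and returns worst case divided by mean; a ratio near 1 means the worst operation costs about as much as a typical one.

```java
// Hypothetical helper: worst-case operation time divided by average-case operation time.
class ReadinessRatio {
    static double of(long[] operationTimes) {
        if (operationTimes.length == 0) throw new IllegalArgumentException("no samples");
        long worst = Long.MIN_VALUE;
        double sum = 0;
        for (long t : operationTimes) {
            worst = Math.max(worst, t);
            sum += t;
        }
        return worst / (sum / operationTimes.length);
    }
}
```

For timings {2, 2, 2, 10} the mean is 4 and the worst case is 10, giving a ratio of 2.5. In practice the samples could come from `System.nanoTime()` readings taken around each operation.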
34 Experimental setup
- Keys: strings from the dictionary used by Unix spell
- Hash function
- Java's hashCode()
- Secondary hash function
- Involves insertion of elements into the hashtable
- Metrics of interest
- Measured fraction of buckets exceeding the bound at any instant
- Number of operations examining a bucket longer than the bound
- Load factor
- Space versus time tradeoff
Knuth hash (B is the number of buckets)
    int kHash(Object key) {
        double A = 0.618033988;  // approximately (sqrt(5) - 1) / 2
        int h = key.hashCode();
        double fractionPart = (h * A) - Math.floor(h * A);  // fractional part, in [0, 1)
        return (int) (fractionPart * B);  // scale to a bucket index in 0..B-1
    }
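A runnable version of the Knuth multiplicative hash, paired with the first metric listed, might look like this. `KnuthHashMetrics` is a hypothetical class; it passes B as a parameter instead of reading a field, and it measures a single snapshot of a plain B-bucket table with no rehashing, which is a simplification of the experiments described.

```java
import java.util.List;

// Hypothetical helper: Knuth multiplicative hash plus "fraction of buckets over K".
class KnuthHashMetrics {
    static final double A = 0.618033988; // approximately (sqrt(5) - 1) / 2

    static int kHash(Object key, int b) {
        int h = key.hashCode();
        double fractionPart = h * A - Math.floor(h * A); // in [0, 1), also for negative h
        return (int) (fractionPart * b);
    }

    // Place every key with kHash, then report the fraction of buckets longer than K.
    static double fractionOverBound(List<String> keys, int b, int k) {
        int[] lengths = new int[b];
        for (String key : keys) lengths[kHash(key, b)]++;
        int over = 0;
        for (int len : lengths) if (len > k) over++;
        return (double) over / b;
    }
}
```

Feeding it the words of a spell-checker dictionary for a chosen B and K gives one point of the "fraction of buckets exceeding the bound" measurement.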
35 Knuth Hash Function
- Why do we see what we see?
- Bucket bound (RUHA)
- Bucket increment (RUHA)
- Hash function?
36 Knuth Hash Function: Various Bounds
- RTL = 5
- K = 10
- Minimum bucket increment
- Cleaning rate = 3
37 Knuth Hash Function: Various Bounds
- RTL = 5
- K = 14
- Minimum bucket increment
- Cleaning rate = 3
38 Number of Violating Operations
- Real-time system concern
- Violating operations rather than hashtable state
- Operations that examine a bucket violating the bound
- RTL = 5, various bounds
- Minimum bucket increment
- Cleaning rates 3 and 1
- A higher cleaning rate is better
- Quicker rehash
- But more rehashes
- Bad for average performance
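39 Compensation Factor and Bucket Increment
- Hash Function Uniformity measure: $HFU = 1 - \frac{N}{(RTL + 1) \times B}$
- Bucket increment with compensation factor $C_B$: $B' = (1 + HFU \cdot C_B) \times \frac{B \times RTL + 1}{RTL - 1}$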
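As a worked check, take the bucket increment as $B' = (1 + HFU \cdot C_B) \times (B \times RTL + 1)/(RTL - 1)$ with $HFU = 1 - N/((RTL + 1) \times B)$. With a perfectly compensated hash ($HFU \cdot C_B = 0$), B = 6 and RTL = 5 give 31/4 = 7.75, i.e. 8 buckets, which matches the B' = 8 used in the example of slides 12-19. The Java helper below is a hypothetical sketch; rounding B' up to a whole bucket count is an assumption.

```java
// Hypothetical helper for the HFU and bucket-increment formulas; rounding up is assumed.
class BucketIncrement {
    static double hfu(int n, int rtl, int b) {
        return 1.0 - (double) n / ((rtl + 1) * (double) b);
    }

    static int newBucketCount(int b, int rtl, double hfu, double cb) {
        return (int) Math.ceil((1 + hfu * cb) * (b * (double) rtl + 1) / (rtl - 1));
    }
}
```

Raising $HFU \cdot C_B$ to 0.5 for the same B and RTL asks for ceil(1.5 x 7.75) = 12 buckets, which is how a less uniform hash function is compensated with extra buckets.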
40 Supplemental Hash Function
- Supplemental hash (Doug Lea)
- Used in java.util.HashMap in Sun's Java 1.4.1
- Better distribution of elements
- What about prioritized cleaning?
    int sHash(Object key) {
        int h = key.hashCode();
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return (h & 0x7FFFFFFF) % B;  // clear the sign bit so the index is in 0..B-1
    }
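The benefit of the supplemental hash shows up when hash codes share low-order structure. The hypothetical `SupplementalHash` class below reuses the mixing steps listed on the slide (to our understanding, those of `java.util.HashMap.hash` in Java 1.4.x) and compares the longest bucket with and without mixing. Reducing to B buckets via `(h & 0x7FFFFFFF) % B` is this sketch's choice.

```java
// Hypothetical comparison of plain vs. supplemental hashing into B buckets.
class SupplementalHash {
    static int mix(int h) {
        h += ~(h << 9);
        h ^= (h >>> 14);
        h += (h << 4);
        h ^= (h >>> 10);
        return h;
    }

    static int sHash(Object key, int b) {
        return (mix(key.hashCode()) & 0x7FFFFFFF) % b;
    }

    // Longest bucket when each hash code is reduced with or without mixing.
    static int longestBucket(int[] hashCodes, int b, boolean supplemental) {
        int[] lengths = new int[b];
        for (int h : hashCodes) {
            int x = supplemental ? mix(h) : h;
            lengths[(x & 0x7FFFFFFF) % b]++;
        }
        int max = 0;
        for (int len : lengths) max = Math.max(max, len);
        return max;
    }
}
```

Without mixing, 64 hash codes that are all multiples of 64 land in a single one of 64 buckets; the mixing steps fold higher bits into the low bits, so the same keys no longer collapse into one bucket.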
41 Violating Operations: Varying RTL
- Various RTL values, bound = 2 x RTL - 1
- Minimum bucket increment
- Cleaning rate = 1
- Boundary value: RTL = 2
- The supplemental hash lives up to its promise
- Load factor
- 85 for RTL = 10, 20
43 Conclusions
- Developed and analyzed a probabilistic model to characterize bounds on hashtable operations
- Increasing the permissible bound
- Compensating with additional buckets
- Varying RTL
- Prioritized cleaning
- Optimized the functionality of operations
- Targeted towards systems with no cache
44 Future Work
- Migrate towards an ACE implementation
- Explore the effect of hashtable size reduction
- Sequence of operations known
- Incremental work pattern
- Hashtables
- BSRB trees
- Incremental garbage collector
- Explore lock-free incremental work
45 Thanks
- Dr. Ron K. Cytron
- Dr. Doug Niehaus, University of Kansas
- Thesis Committee
- Dr. Chris Gill
- Dr. Chenyang Lu
- All members of the DOC Group
- Scott Friedman, Nick Leidenfrost and Ben Brodie
- Morgan Deters, Sharath Cholleti, Martin Linenweber
- Vignesh Nandakumar and Ravi Pratap
46 Questions
47 Technical Approach: Bounded Resources
- Solution: use a two-dimensional hash
48 Prioritized Cleaning (K = 9)
49 Supplemental Hash Function
50 Supplemental Hash + Prioritized Cleaning