Title: Wait-Free Reference Counting and Memory Management
1Wait-Free Reference Counting and Memory Management
2Outline
- Shared Memory
- Synchronization Methods
- Memory Management
- Garbage Collection
- Reference Counting
- Memory Allocation
- Performance
- Conclusions
3Shared Memory
- Uniform Memory Access (UMA)
[Figure: several CPUs, each with its own cache, connected to one shared memory]
- Non-Uniform Memory Access (NUMA)
[Figure: groups of CPUs, each group on its own cache bus with its own memory]
4Synchronization
- Shared data structures need synchronization!
- Accesses and updates must be coordinated to establish consistency.
[Figure: processes P1, P2 and P3]
5Hardware Synchronization Primitives
- Weak
- Atomic Read/Write
- Stronger
- Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap
- Universal
- Atomic Compare-And-Swap (CAS)
- Atomic Load-Linked/Store-Conditional (LL/SC)
[Figure: Read, Read, Write, and an atomic update of the form M=f(M,...)]
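As a rough illustration of the primitives listed above, the following sketch maps each one onto C++ `std::atomic` operations. The wrapper names (`test_and_set`, `fetch_and_add`, `atomic_swap`, `compare_and_swap`) are hypothetical and chosen only to mirror the slide; whether each call compiles to a single hardware instruction depends on the target architecture.

```cpp
#include <atomic>

// Test-And-Set: sets the flag and returns the value it had before.
inline bool test_and_set(std::atomic_flag& flag) {
    return flag.test_and_set();
}

// Fetch-And-Add: adds delta and returns the value held before the addition.
inline int fetch_and_add(std::atomic<int>& word, int delta) {
    return word.fetch_add(delta);
}

// Swap: stores desired and returns the value it replaced.
inline int atomic_swap(std::atomic<int>& word, int desired) {
    return word.exchange(desired);
}

// Compare-And-Swap: stores desired only if word still equals expected.
// Returns true when the store happened.
inline bool compare_and_swap(std::atomic<int>& word, int expected, int desired) {
    return word.compare_exchange_strong(expected, desired);
}
```

Unlike TAS, FAA and Swap, the CAS wrapper can fail, which is why algorithms built on it usually wrap it in a retry loop.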
6Mutual Exclusion
- Access to shared data is atomic because of the lock
- Reduced parallelism by definition
- Blocking; danger of priority inversion and deadlocks.
- Solutions exist, but with high overhead, especially for multi-processor systems
[Figure: processes P1, P2 and P3]
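For comparison with the non-blocking approaches, here is a minimal sketch of mutual exclusion built on Test-And-Set. The `SpinLock` and `Counter` types are hypothetical illustrations, not taken from the talk. A thread that is preempted while holding the lock blocks every other thread, which is the problem the bullets above describe.

```cpp
#include <atomic>

// Minimal TAS spinlock: lock() spins until the flag was previously clear.
class SpinLock {
    std::atomic_flag held = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (held.test_and_set(std::memory_order_acquire)) {
            // busy-wait: another thread holds the lock
        }
    }
    void unlock() { held.clear(std::memory_order_release); }
};

// Shared data protected by the lock; only one thread updates at a time.
struct Counter {
    SpinLock lock;
    int value = 0;

    void add(int n) {
        lock.lock();
        value += n;   // critical section
        lock.unlock();
    }
};
```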
7Non-blocking Synchronization
- Perform operation/changes using atomic primitives
- Lock-Free Synchronization
- Optimistic approach
- Retries until succeeding
- Wait-Free Synchronization
- Always finishes in a finite number of its own steps
- Coordination with all participants
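The difference between the two progress guarantees can be sketched as follows. The function names are hypothetical. The wait-free claim for the second function assumes hardware where Fetch-And-Add is a single instruction; on LL/SC machines `fetch_add` may itself be a retry loop.

```cpp
#include <atomic>

// Lock-free: optimistic read, compute, then CAS; retried until it succeeds.
// Some thread always makes progress, but this thread may retry without bound.
inline int lock_free_double(std::atomic<int>& value) {
    int seen = value.load();
    while (!value.compare_exchange_weak(seen, seen * 2)) {
        // CAS failed: seen now holds the current value, so just retry.
    }
    return seen;  // the value observed just before the successful update
}

// Wait-free (under the FAA assumption above): completes in a bounded
// number of the caller's own steps regardless of other threads.
inline int wait_free_increment(std::atomic<int>& value) {
    return value.fetch_add(1);
}
```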
8Memory Management
- Dynamic data structures need dynamic memory management
- Concurrent data structures need concurrent memory management!
9Concurrent Memory Management
- Concurrent Memory Allocation
- i.e. malloc/free functionality
- Concurrent Garbage Collection
- Questions (among many)
- When to re-use memory?
- How to de-reference pointers safely?
[Figure: processes P1, P2 and P3]
10Lock-Free Memory Management
- Memory Allocation
- Valois 1995: fixed block size, fixed purpose
- Michael 2004, Gidenstam et al. 2004: any size, any purpose
- Garbage Collection
- Valois 1995, Detlefs et al. 2001: reference counting
- Michael 2002, Herlihy et al. 2002: hazard pointers
11Wait-Free Memory Management
- Hesselink and Groote, "Wait-free concurrent memory management by create and read until deletion (CaRuD)", Dist. Comp. 2001
- Limited to the problem of shared static terms
- New Wait-Free Algorithm
- Memory Allocation: fixed block size, fixed purpose
- Garbage Collection: reference counting
12Wait-Free Reference Counting
- De-referencing links
- 1. Read the link contents, i.e. a pointer.
- 2. Increment (FAA) the reference count on the corresponding object.
- What if the link is changed between steps 1 and 2?
- Wait-Free solution
- The de-referencing operation should announce the link before reading.
- The operations that change that link should help the de-referencing operation.
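The two-step de-reference and its race window can be sketched as below. `Node`, `Link` and `naive_deref` are hypothetical names for illustration. The sketch deliberately shows the unsafe version, so it should not be used as-is in concurrent code.

```cpp
#include <atomic>

struct Node {
    std::atomic<int> refs{0};  // reference count of this object
};

using Link = std::atomic<Node*>;  // a shared pointer field

// Naive de-reference: correct only if nobody changes the link concurrently.
inline Node* naive_deref(Link& link) {
    Node* node = link.load();      // step 1: read the link contents
    // Race window: another thread may change the link here and drop the
    // last reference to node, so the object could be reclaimed before
    // the increment below takes effect.
    if (node != nullptr) {
        node->refs.fetch_add(1);   // step 2: FAA on the reference count
    }
    return node;
}
```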
13Wait-Free Reference Counting
- Announcing
- Writes the link address to a shared variable (one per thread and per new de-reference).
- Atomically removes the announcement and retrieves a possible answer (from helping) by Swap with null.
- Helping
- If an announcement matches the changed link, atomically answer with a proper pointer using CAS.
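A heavily simplified sketch of the announce-and-help idea follows. It is not the algorithm from the talk. It assumes one announcement slot per thread (rather than one per thread and per de-reference), a fixed hypothetical thread count `kThreads`, and nodes that are never returned to the operating system, so touching a stale reference count is harmless. It also assumes that a link's address never equals the address of a node it can point to, because the slot uses that difference to tell an announcement from an answer. Reclamation of nodes whose count reaches zero is omitted. All names are hypothetical.

```cpp
#include <atomic>

struct Node {
    std::atomic<int> refs{0};
};
using Link = std::atomic<Node*>;

constexpr int kThreads = 4;                      // hypothetical thread count
std::atomic<void*> announcement[kThreads] = {};  // null: nothing announced

inline void release(Node* node) {
    if (node != nullptr) node->refs.fetch_sub(1);
}

// De-reference with announcement. tid is the caller's thread index.
inline Node* deref(Link& link, int tid) {
    announcement[tid].store(&link);            // announce before reading
    Node* node = link.load();                  // step 1
    if (node != nullptr) node->refs.fetch_add(1);  // step 2
    // Remove the announcement and collect a possible answer in one Swap.
    void* seen = announcement[tid].exchange(nullptr);
    if (seen != &link) {                       // a helper answered
        release(node);                         // drop the possibly stale reference
        node = static_cast<Node*>(seen);       // answer already carries a reference
    }
    return node;
}

// Change the link, then help every thread that announced this link.
// The caller must hold its own reference to next.
inline void store_link(Link& link, Node* next) {
    if (next != nullptr) next->refs.fetch_add(1);  // reference owned by the link
    Node* old = link.exchange(next);
    for (int t = 0; t < kThreads; ++t) {
        void* expected = &link;
        if (announcement[t].load() != expected) continue;
        if (next != nullptr) next->refs.fetch_add(1);  // reference for the answer
        if (!announcement[t].compare_exchange_strong(expected, next)) {
            release(next);                     // announcement already withdrawn
        }
    }
    release(old);                              // the link no longer refers to old
}
```

The helper answers with the node it has just installed because it already holds a reference to that node, which makes the extra increment safe.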
14Wait-Free Memory Allocation
- Solution (lock-free): IBM freelists
- Create a linked list of the free nodes; allocate/reclaim using CAS
- How to guarantee that the CAS of an alloc/free operation eventually succeeds?
[Figure: freelist with Head followed by Mem 1, Mem 2, ..., Mem i; Allocate and Reclaim operate on the list, with a used node Used 1]
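A minimal sketch of a CAS-based freelist is shown below, with hypothetical names `FreeNode` and `FreeList`. It ignores the ABA problem, which a real implementation has to address, for example with a version tag next to the head pointer. Both loops can in principle retry forever under contention, which is the lock-free limitation the question above points at.

```cpp
#include <atomic>

struct FreeNode {
    FreeNode* next = nullptr;
};

struct FreeList {
    std::atomic<FreeNode*> head{nullptr};

    // Reclaim: push the node at the head of the list.
    void reclaim(FreeNode* node) {
        FreeNode* old_head = head.load();
        do {
            node->next = old_head;  // link in front of the current head
        } while (!head.compare_exchange_weak(old_head, node));
    }

    // Allocate: pop the head node, or return nullptr if the list is empty.
    // Not ABA-safe: assumes old_head->next is still valid when the CAS runs.
    FreeNode* allocate() {
        FreeNode* old_head = head.load();
        while (old_head != nullptr &&
               !head.compare_exchange_weak(old_head, old_head->next)) {
            // CAS failed: old_head was refreshed, retry.
        }
        return old_head;
    }
};
```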
15Wait-Free Memory Allocation
- Wait-Free Solution
- Create 2N freelists.
- Alloc operations concurrently try to allocate from the current (globally agreed on) freelist.
- When the current freelist is empty, the current freelist is changed in round-robin manner.
- The Free operation of thread i only works on freelist i or N+i.
- Alloc operations announce their interest.
- All free and alloc operations try to help announced alloc operations in round-robin order.
16Wait-Free Memory Allocation
[Figure: array of announcement variables indexed by thread id, each holding Null or a node X; a node is written with CAS and collected with SWAP]
- Announcing
- A value of null in the per-thread shared variable indicates interest.
- Alloc atomically announces and receives a possible answer by using Swap.
- Helping
- Which thread to help is globally agreed on; the choice is advanced in round-robin order once agreed.
- Free atomically answers the selected thread of interest with a free node using CAS.
- The first time that Alloc succeeds in getting a node from the current freelist, it tries to atomically answer the selected thread of interest with the node using CAS.
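The announcement variables described above could be sketched like this. It covers only the Swap/CAS handshake and leaves out the 2N freelists. The names `Block`, `announcement`, `help_target`, `announce_and_collect` and `try_answer`, the thread count `kThreads`, and the exact rule for advancing the helped thread are assumptions made for illustration.

```cpp
#include <atomic>

struct Block {
    Block* next = nullptr;
};

constexpr int kThreads = 4;                       // hypothetical thread count
std::atomic<Block*> announcement[kThreads] = {};  // null: thread wants a block
std::atomic<int> help_target{0};                  // globally agreed thread to help

// Alloc side: one Swap both announces interest (writes null) and collects
// a block that a helper may have left earlier. Returns nullptr if none.
inline Block* announce_and_collect(int tid) {
    return announcement[tid].exchange(nullptr);
}

// Free side (or a successful Alloc): try to hand a block to the selected
// thread. Returns true if the block was handed over; otherwise the caller
// keeps the block, for example by putting it on a freelist.
inline bool try_answer(Block* block) {
    int target = help_target.load();
    Block* expected = nullptr;
    bool answered =
        announcement[target].compare_exchange_strong(expected, block);
    // Advance the agreed target in round-robin; the CAS ensures that
    // concurrent helpers advance it only once per agreed value.
    help_target.compare_exchange_strong(target, (target + 1) % kThreads);
    return answered;
}
```

Because helpers move through the threads in round-robin order, every announcing thread is eventually selected, which is what bounds the number of retries an Alloc operation can suffer.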
17Performance
- Worst-case
- Needs analysis of the maximum execution path and application of known WCET techniques.
- e.g. 2N^2 maximum CAS retries for alloc.
- Average and Overhead
- Experiments in the scope of dynamic data structures (e.g. lock-free skip list)
- H. Sundell and P. Tsigas, "Fast and Lock-Free Concurrent Priority Queues for Multi-thread Systems", IPDPS 2003
- Performed on NUMA (SGI Origin 2000) architecture, full concurrency.
18Average Performance
19Conclusions
- New algorithms for concurrent dynamic Memory Management
- Wait-Free and Linearizable.
- Reference counting.
- Fixed-size memory allocation.
- To the best of our knowledge, the first wait-free memory management scheme that supports implementing arbitrary dynamic concurrent data structures.
- Will be available as part of the NOBLE software library, http://www.noble-library.org
- Future work
- Implement new wait-free dynamic data structures.
- Provide upper bounds on memory usage.
20Questions?
- Contact Information
- Address: Håkan Sundell, Computing Science, Chalmers University of Technology
- Email: phs_at_cs.chalmers.se
- Web: http://www.cs.chalmers.se/phs