Title: NOBLE: A Non-Blocking Inter-Process Communication Library
1 NOBLE: A Non-Blocking Inter-Process Communication Library
- Håkan Sundell
- Philippas Tsigas
- Computing Science
- Chalmers University of Technology
2 Systems
- Multi-processor systems: cache-coherent shared memory
- UMA
- NUMA
- Desktop computers
3 Synchronization
- A significant part of the work performed by today's parallel applications is spent on synchronization
- Mutual exclusion (Locks)
- Blocking
- Convoy effects
- Deadlocks
4 Convoy effects
- The slowdown of one process may cause the whole system to slow down
5 Research
- Non-blocking synchronization has been researched since the 70s
- Lock-free
- Wait-free
- Non-blocking algorithms are based on the use of
- atomic synchronization primitives
- shared memory
6 Non-blocking Synchronization
- Lock-Free Synchronization
- Retries until not interfered with by other operations
- Usually detects interference by using some kind of shared variable indicating busy-state or similar.
- Guarantees liveness (some operation always makes progress) but is not starvation-free.
Change flag to unique value, or remember current state ...
... do the operation while preserving the active structure ...
... check for same value or state and then validate changes, otherwise retry
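The remember / operate / validate-or-retry pattern above can be sketched with a compare-and-swap loop. This is a minimal illustration using C11 `<stdatomic.h>`, not code from NOBLE; `shared_value`, `add_if_below` and `current_value` are hypothetical names chosen for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared object: a value that must never exceed a limit. */
static _Atomic long shared_value = 0;

/* Lock-free "add if the result stays within limit".
   Returns true if the addition was applied. */
bool add_if_below(long delta, long limit)
{
    /* Remember the current state. */
    long seen = atomic_load(&shared_value);
    for (;;) {
        /* Do the operation on a private copy. */
        long wanted = seen + delta;
        if (wanted > limit)
            return false;   /* operation not applicable in this state */
        /* Validate: the store happens only if the value is still `seen`.
           On failure `seen` is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(&shared_value, &seen, wanted))
            return true;
    }
}

long current_value(void)
{
    return atomic_load(&shared_value);
}
```

A thread that keeps losing the race may retry indefinitely, which is why the pattern is lock-free but not starvation-free: each failed attempt implies that some other thread's update succeeded.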
7 Non-blocking Synchronization
- Wait-free synchronization
- All concurrent operations can proceed independently of the others.
- Every process always finishes the protocol in a bounded number of steps, regardless of interleaving
- No starvation
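As a small illustration of the bounded-steps property, here is a sketch of a single-producer/single-consumer ring buffer in C11. It is not taken from NOBLE; `SpscRing`, `ring_put`, `ring_get` and `RING_SLOTS` are hypothetical names. The sketch assumes exactly one producer thread and one consumer thread; under that assumption neither operation contains a loop, so each call finishes in a constant number of steps whatever the other thread does.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 8   /* hypothetical capacity; one slot is always kept empty */

typedef struct {
    void *slot[RING_SLOTS];
    _Atomic size_t head;   /* next slot to read; written only by the consumer */
    _Atomic size_t tail;   /* next slot to write; written only by the producer */
} SpscRing;

void ring_init(SpscRing *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. No retry loop: reports "full" instead of waiting. */
bool ring_put(SpscRing *r, void *item)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t next = (tail + 1) % RING_SLOTS;
    if (next == atomic_load_explicit(&r->head, memory_order_acquire))
        return false;                                   /* full */
    r->slot[tail] = item;
    atomic_store_explicit(&r->tail, next, memory_order_release);
    return true;
}

/* Consumer side. No retry loop: reports "empty" instead of waiting. */
bool ring_get(SpscRing *r, void **item)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&r->tail, memory_order_acquire))
        return false;                                   /* empty */
    *item = r->slot[head];
    atomic_store_explicit(&r->head, (head + 1) % RING_SLOTS, memory_order_release);
    return true;
}
```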
8 Practice
- Non-blocking synchronization is still not used in many practical applications
- Non-blocking solutions are often
- complex
- having non-standard or unclear interfaces
- impractical
- Many results show that non-blocking synchronization improves the performance of parallel applications significantly
9 Non-blocking Synchronization: Practice
- P. Tsigas, Y. Zhang: "Evaluating the Performance of Non-Blocking Synchronization on Modern Shared Memory Multiprocessors", ACM SIGMETRICS 2001
10 NOBLE: Brings Non-blocking closer to Practice
- Schedule
- Goals
- Design
- Examples
- Experiments
- Status
- Conclusions and Future work
11 Goals
- Create a non-blocking inter-process communication interface that has these properties
- Attractive functionality
- Programmer friendly
- Easy to adapt existing solutions
- Efficient
- Portable
- Adaptable for different programming languages
12 Design: Attractive functionality
- Data structures for multi-threaded usage
- Queues.
- Stacks.
- Singly linked lists.
- Snapshots.
- Data structures for multi-process usage
- Shared Register.
- Clear specifications
Queues: enqueue and dequeue
Stacks: push and pop
Singly linked lists: first, next, insert, delete and read
Snapshots: update and scan
Shared Register: read and write
13 Design: Programmer friendly
- Hide the complexity as much as possible!
- Just one include file
- Simple naming convention: every function name begins with the characters NBL
#include <Noble.h>
NBLQueueEnqueue()
NBLQueueDequeue()
14 Design: Easy to adapt solutions
- Support lock-based as well as non-blocking solutions.
- Several different create functions
- Unified functions for the operations, independent of the synchronization method
NBLQueue *NBLQueueCreateLF()
NBLQueue *NBLQueueCreateLB()
NBLQueueFree(handle)
NBLQueueEnqueue(handle,item)
NBLQueueDequeue(handle)
15 Design: Efficient
- Function pointers are used to minimize overhead
- In-line redirection
typedef struct NBLQueue {
  void *data;
  void (*free)(void *data);
  void (*enqueue)(void *data, void *item);
  void *(*dequeue)(void *data);
} NBLQueue;
#define NBLQueueFree(handle) (handle->free(handle->data))
#define NBLQueueEnqueue(handle,item) (handle->enqueue(handle->data,item))
#define NBLQueueDequeue(handle) (handle->dequeue(handle->data))
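To make the redirection concrete, the following self-contained sketch mirrors the handle layout from the slide with one stand-in backend. Everything named `Demo...`, `plain_...` or `..._fn` is hypothetical and not part of NOBLE; the backend is an unsynchronized fixed-size FIFO that exists only to show how a call travels through the macro and the function pointer. The sketch also assumes the handle itself is released together with the backend data, which the slide does not specify.

```c
#include <stddef.h>
#include <stdlib.h>

/* Handle: backend-private data plus one function pointer per operation. */
typedef struct DemoQueue {
    void *data;
    void (*free_fn)(void *data);
    void (*enqueue_fn)(void *data, void *item);
    void *(*dequeue_fn)(void *data);
} DemoQueue;

/* In-line redirection: each macro expands to a single indirect call. */
#define DemoQueueEnqueue(h, item) ((h)->enqueue_fn((h)->data, (item)))
#define DemoQueueDequeue(h)       ((h)->dequeue_fn((h)->data))
#define DemoQueueFree(h)          ((h)->free_fn((h)->data), free(h))

/* Hypothetical backend: fixed-size FIFO without any synchronization. */
#define DEMO_CAP 16
typedef struct {
    void *items[DEMO_CAP];
    size_t head, count;
} PlainFifo;

static void plain_free(void *data) { free(data); }

static void plain_enqueue(void *data, void *item)
{
    PlainFifo *q = data;
    if (q->count < DEMO_CAP) {              /* sketch: drops the item when full */
        q->items[(q->head + q->count) % DEMO_CAP] = item;
        q->count++;
    }
}

static void *plain_dequeue(void *data)
{
    PlainFifo *q = data;
    if (q->count == 0)
        return NULL;                        /* empty */
    void *item = q->items[q->head];
    q->head = (q->head + 1) % DEMO_CAP;
    q->count--;
    return item;
}

/* A create function selects the implementation by filling in the pointers. */
DemoQueue *DemoQueueCreatePlain(void)
{
    DemoQueue *h = malloc(sizeof *h);
    PlainFifo *q = calloc(1, sizeof *q);
    if (!h || !q) { free(h); free(q); return NULL; }
    h->data = q;
    h->free_fn = plain_free;
    h->enqueue_fn = plain_enqueue;
    h->dequeue_fn = plain_dequeue;
    return h;
}
```

Because callers only ever touch the macros and the handle, a second create function with different callbacks can be added without changing any call site.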
16 Design: Portable
- Exported definitions: identical on all platforms
- Platform independent: ...
- Platform dependent: SunHardware.asm, IntelHardware.asm, ...
- The platform-dependent files provide CAS, TAS, Spin-Locks, ...
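According to the slide, NOBLE keeps these primitives in per-platform assembly files. As a rough picture of what such a layer exports, the sketch below swaps in C11 `<stdatomic.h>` for the assembly; the names `demo_cas`, `demo_tas`, `DemoSpinLock`, `demo_lock` and `demo_unlock` are hypothetical and not NOBLE identifiers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* CAS: if *addr equals expected, store desired and return true. */
static inline bool demo_cas(_Atomic long *addr, long expected, long desired)
{
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

/* TAS: set the flag and return its previous value. */
static inline bool demo_tas(atomic_flag *flag)
{
    return atomic_flag_test_and_set(flag);
}

/* Spin-lock built on test-and-set. */
typedef struct { atomic_flag held; } DemoSpinLock;
#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static inline void demo_lock(DemoSpinLock *l)
{
    /* Busy-wait until the previous holder clears the flag. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) { }
}

static inline void demo_unlock(DemoSpinLock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Keeping the rest of the library written against a small set of primitives like these means a port only has to supply this one layer.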
17 Design: Adaptable for different programming languages
- Implemented in C, all compiled into a library file.
- C++ compatible include files and easy to make C++ wrappers
class NOBLEQueue {
private:
  NBLQueue *queue;
public:
  NOBLEQueue(int type) {
    if (type == NBL_LOCKFREE) queue = NBLQueueCreateLF();
    else ...
  }
  ~NOBLEQueue() { NBLQueueFree(queue); }
  inline void Enqueue(void *item) { NBLQueueEnqueue(queue, item); }
  ...
};
18 Examples
- First create a global variable handling the shared data object, for example a stack
- Create the stack with the appropriate implementation
- When the data structure is not in use anymore, free it
- When some thread wants to do some operation, call the corresponding function on the handle
19 Examples
- To change the synchronization mechanism, only one line of code has to be changed!
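The stack life cycle and the one-line switch can be illustrated with a self-contained stand-in. None of the names below (`DemoStack`, `DemoStackCreateLF`, `DemoStackCreateLB`, `DemoStackPush`, `DemoStackPop`, `DemoStackFree`) come from NOBLE; they only imitate its naming pattern. The lock-free variant is a simple CAS-based stack that parks popped nodes until the stack is freed, a simplification made for this sketch so that no node address is reused while threads are active.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* All names (DemoStack, DemoStackCreateLF, ...) are hypothetical stand-ins. */
typedef struct Node {
    void *item;
    struct Node *next;          /* link inside the stack */
    struct Node *retired_next;  /* link inside the retired list */
} Node;

typedef struct DemoStack {
    _Atomic(Node *) top;
    _Atomic(Node *) retired;    /* nodes popped by the lock-free variant */
    atomic_flag lock;           /* used by the lock-based variant only */
    void  (*push)(struct DemoStack *, void *);
    void *(*pop)(struct DemoStack *);
} DemoStack;

static Node *new_node(void *item)
{
    Node *n = calloc(1, sizeof *n);
    if (n) n->item = item;
    return n;
}

/* Lock-free variant: CAS retry loops on the top pointer. */
static void lf_push(DemoStack *s, void *item)
{
    Node *n = new_node(item);
    if (!n) return;             /* sketch: the item is dropped if allocation fails */
    n->next = atomic_load(&s->top);
    while (!atomic_compare_exchange_weak(&s->top, &n->next, n)) { /* retry */ }
}

static void *lf_pop(DemoStack *s)
{
    Node *n = atomic_load(&s->top);
    while (n && !atomic_compare_exchange_weak(&s->top, &n, n->next)) { /* retry */ }
    if (!n) return NULL;
    /* Park the node instead of freeing it: no address is reused while the
       stack lives, which sidesteps the ABA problem in this sketch. */
    n->retired_next = atomic_load(&s->retired);
    while (!atomic_compare_exchange_weak(&s->retired, &n->retired_next, n)) { }
    return n->item;
}

/* Lock-based variant: test-and-set spin-lock around the same list. */
static void lb_push(DemoStack *s, void *item)
{
    Node *n = new_node(item);
    if (!n) return;
    while (atomic_flag_test_and_set(&s->lock)) { /* spin */ }
    n->next = atomic_load(&s->top);
    atomic_store(&s->top, n);
    atomic_flag_clear(&s->lock);
}

static void *lb_pop(DemoStack *s)
{
    while (atomic_flag_test_and_set(&s->lock)) { /* spin */ }
    Node *n = atomic_load(&s->top);
    if (n) atomic_store(&s->top, n->next);
    atomic_flag_clear(&s->lock);
    if (!n) return NULL;
    void *item = n->item;
    free(n);                    /* safe: nodes are only reached under the lock */
    return item;
}

static DemoStack *create(void (*push)(DemoStack *, void *), void *(*pop)(DemoStack *))
{
    DemoStack *s = malloc(sizeof *s);
    if (!s) return NULL;
    atomic_init(&s->top, NULL);
    atomic_init(&s->retired, NULL);
    atomic_flag_clear(&s->lock);
    s->push = push;
    s->pop = pop;
    return s;
}

DemoStack *DemoStackCreateLF(void) { return create(lf_push, lf_pop); }
DemoStack *DemoStackCreateLB(void) { return create(lb_push, lb_pop); }
#define DemoStackPush(s, item) ((s)->push((s), (item)))
#define DemoStackPop(s)        ((s)->pop((s)))

/* Call only when no thread uses the stack anymore. */
void DemoStackFree(DemoStack *s)
{
    Node *n = atomic_load(&s->top);
    while (n) { Node *next = n->next; free(n); n = next; }
    n = atomic_load(&s->retired);
    while (n) { Node *next = n->retired_next; free(n); n = next; }
    free(s);
}
```

A program using this sketch would hold one global `DemoStack *stack`, assign it once with `stack = DemoStackCreateLF();`, call `DemoStackPush(stack, item)` and `DemoStackPop(stack)` from its threads, and finish with `DemoStackFree(stack)`. Switching to the lock-based variant means replacing only the create call with `DemoStackCreateLB()`.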
20 Experiment
- Set of 50000 random operations performed multithreaded on each data structure, with either low or high contention
- Comparing the different synchronization mechanisms and implementations available
- Varying number of threads from 1 to 30
- Performed on multiprocessors
- Sun Enterprise 10000 with 64 CPUs, Solaris
- Compaq PC with 2 CPUs, Win32
21 Experiments: Linked List
- Lock-Free nr.1: J. Valois, "Lock-Free Data Structures", Ph.D. thesis, 1995.
- Lock-Free nr.2: T. Harris, "A Pragmatic Implementation of Non-Blocking Linked Lists", 2001 Symposium on Distributed Computing.
- Lock-Based: Spin-locks (Test-And-Set).
22 Experiments: Linked List (high)
23 Experiments: Linked List (low)
24 Experiments: Linked List (high) - Threads
25 Experiments: Queues
- Lock-Free nr.1: J. Valois, "Lock-Free Data Structures", Ph.D. thesis, 1995.
- Lock-Free nr.2: P. Tsigas, Y. Zhang, "A Simple, Fast and Scalable Non-Blocking Concurrent FIFO queue for Shared Memory Multiprocessor Systems", ACM SPAA'01, 2001.
- Lock-Based: Spin-locks (Test-And-Set).
26 Experiments: Queues (high)
27 Experiments: Queues (low)
28 Experiments: Queues (high) - Threads
29 Status
- Multiprocessor support
- Sun Solaris (Sparc)
- Win32 (Intel x86)
- SGI (Mips): testing phase
- Linux (Intel x86): testing phase
- Extensive Manual
- Web site up and running, http://www.cs.chalmers.se/noble
30 Conclusions and Future work
- NOBLE: Easy to use, efficient and portable
- Non-blocking protocols always perform better than or similar to lock-based ones, especially on multi-processor systems.
- To do
- Use in real parallel applications
- Extend with more shared data object implementations
- Extend to other platforms, especially ones suitable for real-time systems