Title: Applications of Non-Blocking Data Structures to Real-Time Systems
1Applications of Non-Blocking Data Structures to
Real-Time Systems
- Seminar for the degree of Licentiate of Philosophy
- Håkan Sundell
- Computing Science
- Chalmers University of Technology
2Background
- ARTES project: Applications of wait/lock-free protocols to real-time systems
- Started in March 1999.
- One active Ph.D.-student.
- Project leader: Philippas Tsigas
3Schedule
- Introduction
- Real-Time Systems
- Synchronization
- Shared Data Objects: Snapshots
- Evaluation
- The Effect of Using Timing Information
- Snapshot
- Shared Register
- Software engineering part
- Conclusions and Future Work
4Real-Time Systems
- Uni- or Multi-processor system
- Interconnection Network
- e.g. The Controller Area Network (CAN).
[Figure: four CPUs connected by an interconnection network]
5Real-Time Systems
- Uniform Memory Access (UMA)
[Figure: CPUs, each with its own cache, sharing a single memory]
- Non-Uniform Memory Access (NUMA)
[Figure: CPUs attached to several cache buses, each bus with its own memory]
6Real-Time Systems
- Cooperating Tasks
- Timing Constraints
- Inter-task communication: shared data objects
- Needs Synchronization
[Figure: tasks T1, T2 and T3 accessing shared data]
8Synchronization
- Synchronization using Locks
- Uses semaphores, spinning, disabling interrupts
- Negative
- Blocking
- Priority inversion
- Risk of deadlock
- Positive
- Execution time guarantees are easy to derive, but pessimistic
Take lock ... do operation ... release lock
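The pattern in the caption above (take lock, do operation, release lock) can be sketched in C11 with a spin lock built on test-and-set. This is a minimal illustration and not code from the seminar: `locked_counter`, `LOCKED_COUNTER_INIT` and `locked_counter_add` are hypothetical names, and a real-time kernel would more likely use semaphores or interrupt masking than spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared object: a counter guarded by a test-and-set spin lock. */
typedef struct {
    atomic_flag busy;   /* set while some task holds the lock */
    long        value;  /* shared data protected by the lock */
} locked_counter;

#define LOCKED_COUNTER_INIT { ATOMIC_FLAG_INIT, 0 }

/* Adds delta to the counter and returns the new value. */
static long locked_counter_add(locked_counter *c, long delta) {
    assert(c != NULL);
    /* Take lock: spin until test-and-set reports that the flag was clear. */
    while (atomic_flag_test_and_set_explicit(&c->busy, memory_order_acquire)) {
        /* Blocking: no progress is made while another task holds the lock. */
    }
    c->value += delta;                  /* do operation */
    long result = c->value;
    atomic_flag_clear_explicit(&c->busy, memory_order_release); /* release lock */
    return result;
}
```

The empty loop body is where the drawbacks listed on the slide show up: a task preempted while holding the lock keeps every other task spinning.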
9Non-blocking Synchronization
- Lock-Free Synchronization
- Retries until the operation completes without interference from other operations
- Usually detects interference using a shared variable that indicates a busy state or similar.
Change flag to unique value, or remember current state ... do the operation while preserving the active structure ... check for same value or state and then validate changes, otherwise retry
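The retry pattern described above (remember the current state, compute the change, validate, otherwise retry) can be sketched with a C11 compare-and-swap loop. The function name `lockfree_max` and the choice of a "largest value seen so far" operation are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Lock-free "store the maximum seen so far". Returns the value held in
   *shared once this operation has taken effect. */
static long lockfree_max(_Atomic long *shared, long candidate) {
    assert(shared != NULL);
    long seen = atomic_load(shared);          /* remember current state */
    while (seen < candidate) {
        /* Validate: the swap succeeds only if *shared still equals seen.
           On failure, seen is refreshed with the current value and the
           loop retries. */
        if (atomic_compare_exchange_weak(shared, &seen, candidate)) {
            return candidate;
        }
    }
    return seen;  /* another operation already stored a value >= candidate */
}
```

No task ever waits for another task to release anything, but the number of retries for a single call is not bounded, which is the lack of execution time guarantees that lock-free synchronization is known for.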
10Non-blocking Synchronization
- Lock-Free Synchronization
- Negative
- No execution time guarantees; an operation can retry forever and thus starve
- Positive
- Avoids blocking and priority inversion
- Avoids deadlock
- Fast execution on average
11Non-blocking Synchronization
- Non-blocking Synchronization
- Uses atomic synchronization primitives
- Uses shared memory
- Wait-Free Synchronization
- An operation always finishes in a finite number of its own steps
- Negative
- Complex algorithms
- Memory consuming
[Figure labels: Test-and-Set, Compare-and-Swap; Copying, Helping, Announcing, Split operation, ???]
12Non-blocking Synchronization
- Wait-Free Synchronization
- Positive
- Execution time guarantees
- Fast execution
- Avoids blocking and priority inversion
- Avoids deadlock
- Avoids starvation
- Same implementation on both single- and
multiprocessor systems
14Shared Data Objects
- Correctness criterion for concurrent operations: linearizability
- Every concurrent execution can be transformed into an equivalent serial sequence of atomic operations that preserves the partial order of the original operations
15Snapshot
- Snapshot
- A consistent instantaneous state of a set of several shared variables that are logically related
- One reader (scanner)
- Reads the whole set of variables in one atomic step
- Many writers (updaters)
- Each write updates only one variable
16Snapshot Correctness
- Atomicity / Linearizability criteria
[Figure: three timelines (t) for component ci, each with two writes and one scanner read; the value returned by the scanner is marked YES, YES and NO respectively]
17Snapshot Correctness
- Atomicity / Linearizability criteria
[Figure: timelines (t) for components ci and cj with writes and a scanner read; the values returned by the scanner are marked NO in both cases]
19What are we evaluating?
- Wait-free snapshot algorithm by Ermedahl et al.
- 3 register copies for each component
- Uses the Test-and-Set atomic primitive for synchronization
[Figure: register copies, marked as used by reader or used by writer]
20Analysis
- Real-time system: measured schedulability
- Created realistic scenarios on a theoretical 68020 uni-processor system
- Real RTOS parameters
- Manual WCET analysis at cycle level
- 1 scanner (5 components), 24 updaters (10 real-time tasks, 15 interrupts)
- Fixed-priority response time analysis
- Schedulable without any synchronization
- Adding lock-free, wait-free or semaphore synchronization
21Analysis: Schedulability
[Chart: schedulability results from the analysis]
22Experiments
- Simulation
- RT-simulator written in Erlang by Ermedahl and Sjödin.
- Fixed-priority preemptive scheduler
- Semaphores
- Messages
- Subset of scenarios used in analysis
23Experiments: Schedulability
[Chart: schedulability results from the simulation]
24Experiments
- Multi-node simulation of a CAN bus at 1 MHz
- 10 nodes connected using messages
- Local snapshots on each node
- 1 super-snapshot task on 1 node
- Subset of scenarios used for single-node analysis
25Experiments: Rsnap for multi-node
27Timing Information
- Previously used by Chen and Burns in 1999.
- Assuming a system with periodic fixed-priority scheduling
- Notation from standard real-time response time analysis
- Use information about
- Periods, T
- Worst-case computation times, C
- Worst-case response times, R
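As a reminder of how T, C and R relate, the standard fixed-priority response time recurrence is R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j, iterated to a fixed point. The sketch below implements that textbook recurrence under simplifying assumptions that are not stated on the slide: deadlines equal periods, there is no blocking or release jitter, and the array is sorted from highest to lowest priority. The names `task` and `response_time` are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* One periodic task: period T and worst-case computation time C,
   both in the same time unit. */
typedef struct { long T; long C; } task;

/* Worst-case response time R of tasks[i], where tasks[0..i-1] have higher
   priority. Returns -1 if R would exceed the period of tasks[i]. */
static long response_time(const task *tasks, size_t i) {
    assert(tasks != NULL && tasks[i].T > 0 && tasks[i].C > 0);
    long R = tasks[i].C;
    for (;;) {
        long next = tasks[i].C;
        for (size_t j = 0; j < i; j++) {
            long releases = (R + tasks[j].T - 1) / tasks[j].T; /* ceil(R/Tj) */
            next += releases * tasks[j].C;   /* interference from task j */
        }
        if (next == R) return R;             /* fixed point reached */
        if (next > tasks[i].T) return -1;    /* misses its deadline */
        R = next;
    }
}
```

For the task set (T=7, C=3), (T=12, C=3), (T=20, C=5) the iteration gives response times 3, 6 and 20.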
29Snapshot
- Back to basics: unbounded memory protocol
- The reader increases the global index and scans backwards.
[Figure: one array of cells per component c1 ... ci ... cc along the time axis t, with cells marked v, ?, w and nil; legend: ? = previous values / nil, w = writer position; Snapshotindex marks the current position]
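The unbounded protocol above can be modelled as follows. This is a single-threaded sketch of the data layout and the scan direction only: it uses a fixed-size array in place of unbounded memory, plain variables in place of atomic registers, and assumes that writers store at the position given by the global index and that the scanner reads from the position just before the one it hands to the writers. All names (`snapshot_object`, `snapshot_update`, `snapshot_scan`, `NIL`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define COMPONENTS 3u   /* illustrative number of components */
#define SLOTS      16u  /* stands in for unbounded memory in this model */
#define NIL        (-1) /* marks a position that was never written */

typedef struct {
    int    slot[COMPONENTS][SLOTS]; /* slot[k][i]: component k at position i */
    size_t snapshot_index;          /* position writers currently write to */
} snapshot_object;

static void snapshot_init(snapshot_object *s, int initial) {
    for (size_t k = 0; k < COMPONENTS; k++) {
        for (size_t i = 0; i < SLOTS; i++) s->slot[k][i] = NIL;
        s->slot[k][0] = initial;  /* guarantees the backward scan terminates */
    }
    s->snapshot_index = 0;
}

/* Updater: write one component at the current writer position. */
static void snapshot_update(snapshot_object *s, size_t k, int value) {
    assert(k < COMPONENTS && value != NIL);
    s->slot[k][s->snapshot_index] = value;
}

/* Scanner: advance the global index, then scan each component backwards
   from the old position until a written value is found. */
static void snapshot_scan(snapshot_object *s, int out[COMPONENTS]) {
    assert(s->snapshot_index + 1 < SLOTS);  /* model limit only */
    size_t last = s->snapshot_index;
    s->snapshot_index = last + 1;           /* later updates use a fresh position */
    for (size_t k = 0; k < COMPONENTS; k++) {
        size_t i = last;
        while (s->slot[k][i] == NIL) i--;   /* slot[k][0] is never NIL */
        out[k] = s->slot[k][i];
    }
}
```

Because updates issued after the index was advanced land in a new position, they cannot overwrite the values the scanner is collecting, which is what makes the backward scan return a consistent set in this model.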
30Snapshot
- Bounded memory: cyclic buffers
- The needed buffer length depends on how fast the updaters are compared to the scanner
- Each component can have a different buffer length
31Timing Information
- Bounding
- Needed buffer length for component k
- Can be refined even further
[Equations: buffer-length bounds, where Ts is the period of the snapshot task and Tw is the period of the writer tasks]
32Experiments
- Using a Sun Enterprise 10000 multiprocessor computer
- 1 scanner task and 10 updater tasks, one on each CPU
- Comparing two wait-free snapshot algorithms
- Using timing information
- Using Test-and-Set synchronization
33Experiments
- Scenarios with different ratios between scanner and updater
- Measuring response time for scan versus update operations
34Experiments
- Scan operation: average response time
35Experiments
- Update operation: average response time
37Shared Register
- Target domain: shared memory (even without cache coherency)
- Wait-Free Atomic Shared Buffer by Vitanyi et al.
- A matrix of 1-reader 1-writer registers
- Each register contains a value/tag pair encoded as one value
[Figure: matrix of registers R11, R12, ..., R21, R22, ... with writers along the rows and readers along the columns; register Rij is written by processor i and read by processor j, and holds a tag and a value]
38Shared Register
- Algorithm
- Readers scan their column for the highest tag and return the corresponding value
- Writers scan their column and write the next tag together with the new value to their row
- Unbounded maximum size for the tag field in the value/tag pair
- Assume 8 writer tasks with 10 ms period
- Maximum tag after one hour is 2880000, which needs 22 bits!
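The read and write rules above can be modelled in a few lines of C. This is a single-threaded sketch of the matrix layout and the tag rule only: plain 64-bit words stand in for the 1-reader 1-writer registers, a 32/32-bit split of the tag/value pair is an arbitrary choice, and concurrent writers picking the same tag are not handled. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROCS 4u  /* illustrative number of processors */

/* cell[i][j] is written only by processor i and read only by processor j. */
typedef struct { uint64_t cell[PROCS][PROCS]; } shared_register;

/* Tag/value pair encoded as one value: tag in the high half. */
static uint64_t pack(uint32_t tag, uint32_t value) {
    return ((uint64_t)tag << 32) | value;
}
static uint32_t tag_of(uint64_t w)   { return (uint32_t)(w >> 32); }
static uint32_t value_of(uint64_t w) { return (uint32_t)w; }

/* Scan column j and return the word carrying the highest tag. */
static uint64_t newest_in_column(const shared_register *r, size_t j) {
    assert(j < PROCS);
    uint64_t best = r->cell[0][j];
    for (size_t i = 1; i < PROCS; i++) {
        if (tag_of(r->cell[i][j]) > tag_of(best)) best = r->cell[i][j];
    }
    return best;
}

/* Reader on processor j: value with the highest tag in its column. */
static uint32_t register_read(const shared_register *r, size_t j) {
    return value_of(newest_in_column(r, j));
}

/* Writer on processor i: next tag after the highest in its column,
   written together with the value to every register in its row. */
static void register_write(shared_register *r, size_t i, uint32_t value) {
    uint32_t next = tag_of(newest_in_column(r, i)) + 1;
    for (size_t j = 0; j < PROCS; j++) r->cell[i][j] = pack(next, value);
}
```

The tag grows by one on every write and never shrinks, which is why the unbounded version needs an ever larger tag field.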
39Timing Information
- Analyzing the maximum difference between tags that a task can observe at two consecutive invocations of the algorithm
- In any possible execution
[Equation: bound on the maximum tag difference]
- Tmax is the longest period
- Rmax is the longest response time
- Twr is the period of the writer tasks
- Recycling tags
- Newer tags can restart from zero when we reach a certain tag value
- In order to be able to decide whether tags are newer, we need to have
[Equation: condition not preserved; figure: tag values v1 ... v4 on a range from 0 to N]
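One common way to compare recycled tags is modular (wrap-around) comparison, sketched below. The exact condition from the slide is not recoverable from this transcript, so the sketch assumes that any two tags a task can observe differ by less than half the tag range; that assumption, the value of `TAG_RANGE` and the function names are illustrative and are not the author's stated scheme.

```c
#include <assert.h>

#define TAG_RANGE 1024u  /* hypothetical N: tags run from 0 to TAG_RANGE-1 */

/* Next tag, restarting from zero after the last value in the range. */
static unsigned tag_next(unsigned t) {
    assert(t < TAG_RANGE);
    return (t + 1) % TAG_RANGE;
}

/* Nonzero if tag a is newer than tag b. Valid only under the assumption
   that observable tags never differ by TAG_RANGE/2 or more. */
static int tag_newer(unsigned a, unsigned b) {
    assert(a < TAG_RANGE && b < TAG_RANGE);
    unsigned ahead = (a + TAG_RANGE - b) % TAG_RANGE; /* steps from b to a */
    return ahead != 0 && ahead < TAG_RANGE / 2;
}
```

With this rule a small tag issued just after the wrap-around still compares as newer than a large tag issued just before it, which is the property tag recycling needs.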
40Examples
- Example task scenario on 8 processors
- The unbounded algorithm would have reached tag 68400 in one hour, needing >16 bits
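The bit counts quoted on the slides can be checked with a short calculation. The sketch assumes that every writer task issues exactly one new tag per period, which reproduces the 2880000 figure given earlier for 8 writers with a 10 ms period; the helper names are illustrative.

```c
#include <assert.h>

/* Number of tags issued by `writers` tasks that each write once per period. */
static unsigned long long tags_issued(unsigned writers, unsigned period_ms,
                                      unsigned long long duration_ms) {
    assert(period_ms > 0);
    return (unsigned long long)writers * (duration_ms / period_ms);
}

/* Smallest number of bits able to represent max_tag. */
static unsigned bits_needed(unsigned long long max_tag) {
    unsigned bits = 0;
    while (max_tag > 0) {
        bits++;
        max_tag >>= 1;
    }
    return bits;
}
```

One hour is 3600000 ms, so 8 writers at 10 ms give 8 * 360000 = 2880000 tags, which lies between 2^21 and 2^22 and therefore needs 22 bits. The tag 68400 from the example scenario lies between 2^16 and 2^17, so it needs 17 bits.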
42Background
- Multithreaded programming needs communication.
- Communicating using shared data structures like stacks, queues, lists and so on.
- This needs synchronization!
- Locks (mutual exclusion) have several drawbacks, especially for real-time systems.
- Non-blocking solutions are often complex to implement and have non-standard interfaces.
43NOBLE: A Non-Blocking Inter-Process Communication Library
- Designed with the following properties
- Functionality: Stacks, Queues, Lists, Snapshot, Register with clear specifications
- Programmer friendly: #include <noble.h>, NBL<function>
- Easy to adapt existing solutions: provides locks as well as non-blocking synchronization
44NOBLE: A Non-Blocking Inter-Process Communication Library
- Designed with the following properties (cont.)
- Efficient: object-oriented design with virtual functions and inheritance, with base classes in C
- Portable: modular design, platform-dependent code separated
- Adaptable for different programming languages: C, C++, standard dynamically linked library
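An object-oriented design with virtual functions in plain C, as mentioned above, is usually built from a base struct of function pointers that each implementation fills in. The sketch below shows that general technique with a lock-based and a lock-free stack behind one interface. It is not NOBLE's source code or API: every name is hypothetical, and the lock-free variant is a simple Treiber-style stack that never frees popped nodes so that the sketch stays clear of memory reclamation and ABA issues.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct stack stack;
struct stack {                        /* "base class": virtual function table */
    void  (*push)(stack *self, void *item);
    void *(*pop)(stack *self);
};

typedef struct node { void *item; struct node *next; } node;

/* Lock-based variant: a spin lock around a linked list. */
typedef struct { stack base; atomic_flag lock; node *top; } stack_lb;

static void lb_push(stack *self, void *item) {
    stack_lb *s = (stack_lb *)self;   /* base is the first member */
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->item = item;
    while (atomic_flag_test_and_set(&s->lock)) { }
    n->next = s->top;
    s->top = n;
    atomic_flag_clear(&s->lock);
}

static void *lb_pop(stack *self) {
    stack_lb *s = (stack_lb *)self;
    while (atomic_flag_test_and_set(&s->lock)) { }
    node *n = s->top;
    if (n != NULL) s->top = n->next;
    atomic_flag_clear(&s->lock);
    if (n == NULL) return NULL;
    void *item = n->item;
    free(n);
    return item;
}

/* Lock-free variant: compare-and-swap on the top pointer. */
typedef struct { stack base; _Atomic(node *) top; } stack_lf;

static void lf_push(stack *self, void *item) {
    stack_lf *s = (stack_lf *)self;
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->item = item;
    n->next = atomic_load(&s->top);
    while (!atomic_compare_exchange_weak(&s->top, &n->next, n)) { }
}

static void *lf_pop(stack *self) {
    stack_lf *s = (stack_lf *)self;
    node *n = atomic_load(&s->top);
    while (n != NULL && !atomic_compare_exchange_weak(&s->top, &n, n->next)) { }
    /* The node is deliberately leaked: safe reclamation is out of scope. */
    return n != NULL ? n->item : NULL;
}

static stack *stack_create_lb(void) {
    stack_lb *s = malloc(sizeof *s);
    assert(s != NULL);
    s->base.push = lb_push;
    s->base.pop = lb_pop;
    atomic_flag_clear(&s->lock);
    s->top = NULL;
    return &s->base;
}

static stack *stack_create_lf(void) {
    stack_lf *s = malloc(sizeof *s);
    assert(s != NULL);
    s->base.push = lf_push;
    s->base.pop = lf_pop;
    atomic_init(&s->top, NULL);
    return &s->base;
}
```

Callers only use `s->push(s, item)` and `s->pop(s)`, so switching between the two synchronization mechanisms means changing the single line that calls the create function.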
45Examples
- #include <noble.h>
- First create a global variable handling the shared data object, for example a stack:
  NBLStack* stack;
  stack=NBLCreateStackLF(10000);
- When some thread wants to do some operation:
  NBLStackPush(stack, item);
  or
  item=NBLStackPop(stack);
46Examples
- When the data structure is not in use anymore:
  NBLStackFree(stack);
- To change the synchronization mechanism, only one line of code has to be changed!
  stack=NBLStackCreateLF(10000);
  replaced with
  stack=NBLStackCreateLB();
47Experiment
- Set of 50000 random operations performed multithreaded on each data structure, with either low or high contention.
- Comparing the different synchronization mechanisms and implementations available.
- Varying number of threads from 1 to 30.
- Performed on multiprocessors
- Sun Enterprise 10000 with 64 CPUs, Solaris
- Compaq PC with 2 CPUs, Win32
48Experiments: Linked List (high contention)
49Status
- Multiprocessor support
- Sun Solaris (Sparc)
- Win32 (Intel x86)
- SGI (Mips): evaluation stage
- Linux (Intel x86): evaluation stage
- Extensive manual
- Web site up and running, http://www.cs.chalmers.se/noble
51Conclusions
- Contributions
- Evaluations of snapshot
- Non-blocking performs better than lock-based in all cases. Lock-free performs best on uni-processor systems.
- The effect of using timing information
- Snapshot and Shared Register
- Algorithms can be simplified and their performance increased significantly.
- Efficient recycling of time-stamps is possible
52Conclusions
- Contributions (cont.)
- A library of non-blocking protocols
- Easy to use, efficient and portable
- Non-blocking protocols always perform better than lock-based ones, especially on multi-processor systems.
- Concluding judgment
- Non-blocking protocols are highly applicable to real-time systems. Lock-free protocols seem very promising and will be applicable to real-time systems with applied analysis
53Future work
- NOBLE
- Adapt to commercial RTOS (Enea OSE).
- Extend to embedded systems
- Simpler uni- and multi-processor systems, including 8-bit processors with, without or with different support for atomic synchronization primitives.
- Timing Information
- Create lock-free translations to fulfill real-time systems properties
- General time-stamp recycling scheme
- More non-blocking protocols