Title: Concurrent Programming Without Locks
1 Concurrent Programming Without Locks
- Based on the paper by Fraser and Harris
2 Motivation What's wrong with mutual exclusion locks?
- Scalable locking strategies are hard to design and therefore need special programming care.
- Certain interactions between locks cause errors such as deadlock, livelock and priority inversion (a high-priority thread is forced to wait for a low-priority thread holding a resource it needs).
- For good performance, programmers need to make sure software will not hold locks for longer than necessary.
- Also for high performance, programmers must balance the granularity at which locking operates against the time that the application will spend acquiring and releasing locks.
- Consider a blocked thread responsible for a real-time task - it can cause a lot of damage!
3 Solutions without using locks
- Still safe for use in multi-threaded, multi-processor shared-memory machines.
- Maintain several important characteristics.
- Reduce the programmer's responsibilities.
- The implementations we will see are non-blocking.
4 Definitions
- Non-blocking: even if any set of threads is stalled, the remaining threads can still make progress.
- Obstruction-free: the weakest guarantee. A thread is only guaranteed to make progress (and finish its operation in a bounded number of steps) so long as it doesn't contend with other threads for access to any location.
5 - Lock-freedom adds the requirement that the system as a whole makes progress, even if there is contention. A lock-free algorithm can be developed from an obstruction-free one by adding a helping mechanism.
- Helping mechanism: if a thread t2 encounters a thread t1 obstructing it, then t2 helps t1 to complete t1's operation, and then completes its own.
- Alternatively, a thread can decide to wait for the other thread to complete its operation, or even cause it to abort that operation.
- An obstruction-free transactional API requires transactions to eventually commit successfully if run in isolation, but allows a set of transactions to livelock aborting one another if they contend. A lock-free transactional API requires some transaction to eventually commit successfully even if there is contention.
6 Overview
- Goals of the design
- Introducing the APIs
- Programming with the APIs
- Design methods
- Practical design implementation of the APIs (shortly)
- Performance of data structures built over the APIs.
7 Goals
- Concreteness - this means we build from atomic single-word read, write and compare-and-swap (CAS) operations.
- Reminder: compare-and-swap
  atomically word CAS (word *a, word e, word n) {
    word x := *a;
    if ( x == e ) *a := n;
    return x;
  }
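As a simple illustration of how single-word CAS is typically used (a hypothetical example, not taken from the paper), a lock-free counter increment reads the current value and retries until its CAS succeeds:
  /* Hypothetical example: lock-free increment built on the CAS above.     */
  void atomic_increment (word *counter) {
    word old;
    do {
      old := *counter;                               /* read current value */
    } while ( CAS(counter, old, old + 1) != old );   /* retry on conflict  */
  }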
8 - Linearizability basically means that each operation appears to occur atomically (to other threads) at some point between when it is called and when it returns.
- Non-blocking progress - mentioned before.
- Disjoint-access parallelism - operations that access disjoint sets of words in memory should be able to execute in parallel.
- Dynamicity - our APIs should be able to support dynamically-sized data structures, such as lists and trees.
- Practicable space cost - space costs should scale well with the number of threads and the volume of data managed using the API (reserve no more than 2 bits in each word).
9 - Composability - if multiple data structures separately provide operations built with one of our APIs, then these should be composable to form a single compound operation which still occurs atomically (and which can itself be composed with others).
- Read parallelism - implementations of our APIs should allow shared data that is read on different CPUs to remain in shared mode in those CPUs' data caches (reduces overhead).
10 Common to all APIs
- Linearizable
- Non-blocking
- Support dynamically sized data structures
12 Overview
- Goals of the design
- Introducing the APIs
- Programming with the APIs
- Design methods
- Practical design implementation of the APIs (shortly)
- Performance of data structures built over the APIs.
13 The APIs
- Multi-word compare-and-swap (MCAS)
- Word-based software transactional memory (WSTM)
- Object-based software transactional memory (OSTM)
14 MCAS is defined to operate on N distinct memory locations (a_i), expected values (e_i), and new values (n_i): each a_i is updated to value n_i if and only if each a_i contains the expected value e_i before the operation. MCAS returns TRUE if these updates are made and FALSE otherwise.
  // Update locations a[0]..a[N-1] from e[0]..e[N-1] to n[0]..n[N-1]
  bool MCAS (int N, word **a, word *e, word *n);
This action atomically updates N memory locations.
15 Heap accesses to words which may be subject to a concurrent MCAS must be performed by calling MCASRead. Reason: the MCAS implementation places its own values in these locations while they are updated.
  // Read the contents of location a
  word MCASRead (word *a);
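As a hypothetical usage example (not from the paper), the following sketch atomically transfers an amount between two shared words: it reads both current values with MCASRead and then issues a two-location MCAS, retrying if either location changed in the meantime.
  /* Hypothetical sketch built on the MCAS API above. */
  void transfer (word *from, word *to, word amount) {
    bool done;
    do {
      word old_from := MCASRead(from);
      word old_to   := MCASRead(to);
      word *addrs[2] := { from, to };
      word exp[2]    := { old_from, old_to };
      word upd[2]    := { old_from - amount, old_to + amount };
      done := MCAS(2, addrs, exp, upd);   /* succeeds only if both locations still hold the expected values */
    } while ( !done );
  }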
16 Advantages
- Effective when we need to update a number of memory locations from one consistent state to another.
- Eases the burden of ensuring correct synchronization of updates.
17 Disadvantages
- It is a low-level API, so programmers must remember that the subsequent MCAS is not aware of locations that have merely been read. Therefore the programmer needs to keep lists of such locations and their values, and to pass them to MCAS to confirm that the values are still consistent (unlike transactions, which have a read-check phase).
- This API will not allow us to reach our goal of composability.
- No read parallelism, because reading also requires updating.
18 What can be done to address these disadvantages?
Transactions!
19 Reminder - software transactional memory: a concurrency control mechanism for controlling access to shared memory. It functions as an alternative to lock-based synchronization, and is implemented in a lock-free way.
Transaction - code that executes a series of reads and writes to shared memory. The operation in which the changes of a transaction are validated, and if validation is successful made permanent, is called commit. A transaction can abort at any time, causing all of its prior changes to be undone. If a transaction cannot be committed due to conflicting changes, it is aborted and re-executed until it succeeds.
20 The APIs related to STM that we will see follow an optimistic style: the core sequential code is wrapped in a loop which retries the operations until the commit succeeds. Every thread completes its modifications to shared memory without regard for what other threads might be doing. It is the reader's responsibility, after completing the transaction, to make sure other threads haven't concurrently made changes to memory that it accessed in the past.
Advantage: increased concurrency. No thread needs to wait for access to a resource, and disjoint-access parallelism is achieved.
Disadvantage: overhead in retrying transactions that failed. However, in realistic programs conflicts arise rarely.
21 The APIs
- Multi-word compare-and-swap (MCAS)
- Word-based software transactional memory (WSTM)
- Object-based software transactional memory (OSTM)
22 // Transaction management
  wstm_transaction *WSTMStartTransaction();
  bool WSTMCommitTransaction(wstm_transaction *tx);
  void WSTMAbortTransaction(wstm_transaction *tx);
  // Data access
  word WSTMRead(wstm_transaction *tx, word *a);
  void WSTMWrite(wstm_transaction *tx, word *a, word d);
- WSTMRead and WSTMWrite must be used when accessing words that may be subject to a concurrent WSTMCommitTransaction.
- The implementation allows a transaction to commit as long as no thread has committed an update to one of the locations we accessed.
- Transactions succeed or fail atomically.
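As a hypothetical usage example (not from the paper), a transfer between two shared words groups its reads and writes into one transaction and retries until the commit succeeds:
  /* Hypothetical sketch built on the WSTM API above. */
  void transfer_wstm (word *from, word *to, word amount) {
    bool done;
    do {
      wstm_transaction *tx := WSTMStartTransaction();
      word f := WSTMRead(tx, from);
      word t := WSTMRead(tx, to);
      WSTMWrite(tx, from, f - amount);
      WSTMWrite(tx, to, t + amount);
      done := WSTMCommitTransaction(tx);   /* fails if a conflicting commit occurred */
    } while ( !done );
  }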
23 What did we improve here?
- All relevant reads and writes are grouped into a transaction that is applied to the heap atomically (a read-check phase occurs during the commit operation).
- Composability is easy (why?).
- This implementation doesn't reserve space in each word - it allows acting on full word-size data rather than just on pointer-valued fields in which spare bits can be reserved.
24 The APIs
- Multi-word compare-and-swap (MCAS)
- Word-based software transactional memory (WSTM)
- Object-based software transactional memory (OSTM)
25 // Transaction management
  ostm_transaction *OSTMStartTransaction();
  bool OSTMCommitTransaction(ostm_transaction *tx);
  void OSTMAbortTransaction(ostm_transaction *tx);
  // Data access
  t *OSTMOpenForReading(ostm_transaction *tx, ostm_handle<t*> *o);
  t *OSTMOpenForWriting(ostm_transaction *tx, ostm_handle<t*> *o);
  // Storage management
  ostm_handle<void*> *OSTMNew(size_t size);
  void OSTMFree(ostm_handle<void*> *ptr);
- We add a level of indirection - OSTM objects are accessed through OSTM handles, rather than by accessing words individually.
- Before the data it contains can be accessed, an OSTM handle must be opened in order to obtain access to the underlying object.
- OSTMOpenForWriting's return value refers to a shadow copy of the underlying object, that is, a private copy on which the thread can work before attempting to commit its updates.
- The OSTM implementation may share the data of objects that have been opened for reading among multiple threads.
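As a hypothetical usage example (not from the paper), a counter held in an OSTM object is updated by opening its handle for writing, modifying the private shadow copy directly, and committing:
  /* Hypothetical sketch built on the OSTM API above. */
  typedef struct { word value; } counter;
  void counter_increment (ostm_handle<counter*> *c_h) {
    bool done;
    do {
      ostm_transaction *tx := OSTMStartTransaction();
      counter *c := OSTMOpenForWriting(tx, c_h);   /* private shadow copy            */
      c->value := c->value + 1;                    /* direct access, no per-word API */
      done := OSTMCommitTransaction(tx);
    } while ( !done );
  }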
26 - The OSTM interface leads to a different cost profile from WSTM: OSTM introduces a cost on opening objects for access and potentially producing shadow copies to work on, but subsequent data access is made directly (rather than through functions like WSTMRead and WSTMWrite).
- Furthermore, it admits a simplified non-blocking commit operation.
The API's implementation is responsible for correctly ensuring that conflicting operations do not proceed concurrently, and for preventing deadlock and priority inversion between concurrent operations. The API's caller remains responsible for ensuring scalability by making it unlikely that concurrent operations will need to modify overlapping sets of locations. However, this is a performance problem rather than a correctness or liveness one.
27 Some notes
- While locks require thinking about overlapping operations and demand a locking policy, in STM things are much simpler: each transaction can be viewed in isolation, as a single-threaded computation. The programmer does not need to worry about deadlock, for example.
- However, transactions must always be able to detect invalidity, and then decide on an action (help, wait, abort and retry).
28 Overview
- Goals of the design
- Introducing the APIs
- Programming with the APIs
- Design methods
- Practical design implementation of the APIs (shortly)
- Performance of data structures built over the APIs.
29
  typedef struct { int key; struct node *next; } node;
  typedef struct { node *head; } list;
  void list_insert_mcas (list *l, int k) {
    node *n := new node(k);
    do {
      node *prev := MCASRead( &(l->head) );
      node *curr := MCASRead( &(prev->next) );
      while ( curr->key < k ) {
        prev := curr;
        curr := MCASRead( &(curr->next) );
      }
      n->next := curr;
    } while ( !MCAS (1, [&prev->next], [curr], [n]) );
  }
Fig. 2. Insertion into a sorted list managed using MCAS.
  typedef struct { int key; struct node *next; } node;
  typedef struct { node *head; } list;
  void list_insert_single_threaded (list *l, int k) {
    node *n := new node(k);
    node *prev := l->head;
    node *curr := prev->next;
    while ( curr->key < k ) {
      prev := curr;
      curr := curr->next;
    }
    n->next := curr;
    prev->next := n;
  }
Fig. 1. Insertion into a sorted list.
30
  typedef struct { int key; struct node *next; } node;
  typedef struct { node *head; } list;
  void list_insert_wstm (list *l, int k) {
    node *n := new node(k);
    do {
      wstm_transaction *tx := WSTMStartTransaction();
      node *prev := WSTMRead(tx, &(l->head));
      node *curr := WSTMRead(tx, &(prev->next));
      while ( curr->key < k ) {
        prev := curr;
        curr := WSTMRead(tx, &(curr->next));
      }
      n->next := curr;
      WSTMWrite(tx, &(prev->next), n);
    } while ( !WSTMCommitTransaction(tx) );
  }
Fig. 3. Insertion into a sorted list managed using WSTM.
  typedef struct { int key; ostm_handle<node*> *next_h; } node;
  typedef struct { ostm_handle<node*> *head_h; } list;
  void list_insert_ostm (list *l, int k) {
    node *n := new node(k);
    ostm_handle<node*> *n_h := new ostm_handle(n);
    do {
      ostm_transaction *tx := OSTMStartTransaction();
      ostm_handle<node*> *prev_h := l->head_h;
      node *prev := OSTMOpenForReading(tx, prev_h);
      ostm_handle<node*> *curr_h := prev->next_h;
      node *curr := OSTMOpenForReading(tx, curr_h);
      while ( curr->key < k ) {
        prev_h := curr_h; prev := curr;
        curr_h := prev->next_h; curr := OSTMOpenForReading(tx, curr_h);
      }
      n->next_h := curr_h;
      prev := OSTMOpenForWriting(tx, prev_h);
      prev->next_h := n_h;
    } while ( !OSTMCommitTransaction(tx) );
  }
Fig. 4. Insertion into a sorted list managed using OSTM.
31 Some notes
- The APIs could be used directly by expert programmers.
- They can help build a layer inside a complete system (like language features).
- Runtime code generation can be added to support the level of indirection that OSTM objects need.
32 Overview
- Goals of the design
- Introducing the APIs
- Programming with the APIs
- Design methods
- Practical design implementation of the APIs (shortly)
- Performance of data structures built over the APIs.
33 The key problem in implementing the APIs is ensuring that a set of memory accesses appears to occur atomically when it is implemented by a series of individual instructions accessing one word at a time. We separate a location's physical contents in memory from its logical contents when accessed through one of the APIs. For each of the APIs there is only one operation which updates the logical contents of memory locations: MCAS, WSTMCommitTransaction and OSTMCommitTransaction. For each of the APIs we present our design in a series of four steps.
34 Memory formats
- Define the format of the heap and the temporary data structures that are used.
- All three of our implementations introduce descriptors, which (i) set out the before and after versions of the memory accesses that a particular commit operation proposes to make, and (ii) provide a status field indicating how far the commit operation has progressed.
- Descriptor properties: (i) they are managed by a garbage collector; (ii) a descriptor's contents are unchanged once it is made reachable from shared memory; (iii) once the outcome of a particular commit operation has been decided, the descriptor's status field remains constant.
- The first two properties mean that there is effectively a one-to-one association between descriptor references and the intent to perform a given atomic update.
35 Logical contents
- Each of our API implementations uses descriptors to define the logical contents of memory locations, by providing a mechanism for a descriptor to own a set of memory locations.
- The logical contents of all of the locations that a transaction needs to update are updated at once, even if the physical contents of these locations are not.
36 Uncontended commit operation
- We'll see how the commit operation arranges to atomically update the logical contents of a set of locations when it executes without interference from concurrent commit operations.
- A commit operation contains 3 stages (a minimal sketch follows the timeline figure below):
  - a) acquire phase: acquires exclusive ownership of the locations being updated.
  - b) read-check phase: ensures that locations that have been read but not updated hold the values expected in them. This is followed by the decision point, at which the outcome of the commit operation is decided and made visible to other threads through the descriptor's status field.
  - c) release phase: the thread relinquishes ownership of the locations being updated.
(Figure: commit-operation timeline showing start commit, read a1, acquire a2, read-check a1, decision point, release a2, finish commit; the figure marks the interval of exclusive access to location a2 and the interval during which location a1 is guaranteed valid.)
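A minimal pseudocode sketch of these three phases for the uncontended case (our own summary, not the paper's code; the field names and the helper routines acquire, read_check and release are assumptions, and helping and contention handling are omitted):
  bool commit (descriptor *d) {
    word outcome := SUCCESSFUL;
    /* (a) acquire exclusive ownership of every location being updated  */
    for ( int i := 0; i < d->N_updates; i++ )
      if ( !acquire(d, d->update[i]) ) { outcome := FAILED; goto decided; }
    d->status := READ_CHECK;
    /* (b) read-check locations that were read but not updated          */
    for ( int i := 0; i < d->N_reads; i++ )
      if ( !read_check(d, d->read[i]) ) { outcome := FAILED; goto decided; }
  decided:
    d->status := outcome;                 /* decision point              */
    /* (c) release ownership, installing new values only if SUCCESSFUL  */
    for ( int i := 0; i < d->N_updates; i++ ) release(d, d->update[i]);
    return (outcome == SUCCESSFUL);
  }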
37 - The descriptor's status field starts as UNDECIDED. If there is a read-check phase it becomes READ-CHECK. At the decision point it is set to SUCCESSFUL if all of the required ownerships were acquired and the read-checks succeeded; otherwise it is set to FAILED.
- In order to show that an entire commit operation appears atomic, we identify a linearization point within its execution at which it appears to operate atomically on the logical contents of the heap from the point of view of other threads.
- Unsuccessful commit operation: the linearization point is straightforward - the point at which failure is detected.
- Successful commit operation:
  - a) no read-check phase: the linearization point and the decision point coincide.
  - b) with a read-check phase: the linearization point occurs at the start of the read-check phase.
The linearization point comes before the decision point!? Problem?
38 Solution
- The logical contents are dependent on the descriptor's status field, and so updates are not revealed to other threads until the decision point is reached. We reconcile this definition with the use of a read-check phase by ensuring that concurrent readers help commit operations to complete, retrying the read operation once the transaction has reached its decision point. This means that the logical contents do not need to be defined during the read-check phase, because they are never required.
39 Contended commit operation
Consider a case where thread t2 encounters a location currently held by t1.
1) t1's status is already decided (SUCCESSFUL or FAILED). All of the designs rely on having t2 help t1 complete its work, using the information in t1's descriptor to do so.
2) t1's status is undecided and the algorithm does not include a READ-CHECK phase.
Strategy a) t2 causes t1 to abort if it has not yet reached its decision point - that is, if t1's status is still UNDECIDED. This leads to an obstruction-free progress property and the risk of livelock, unless contention management is employed to prevent t1 retrying its operation and aborting t2.
Strategy b) threads sort the locations that they require, and t2 helps t1 to complete its operation, even if the outcome is currently UNDECIDED.
3) t1's status is undecided and the algorithm does include a READ-CHECK phase. Here there is a constraint on the order in which locations are accessed, because a thread must acquire the locations it wants to update before it enters the read-check phase.
40 (Figure: two transactions, each with status READ-CHECK; one has read x and is updating y, while the other has read y and is updating x, so each is read-checking a location the other is updating - a cyclic dependency.)
Solution: abort at least one of the operations to break the cycle. However, care must be taken not to abort them all if we wish to ensure lock-freedom rather than obstruction-freedom. (In OSTM this can be done by imposing a total order on all operations, based on the machine address of each transaction's descriptor.)
41 Overview
- Goals of the design
- Introducing the APIs
- Programming with the APIs
- Design methods
- Practical design implementation of the APIs (shortly)
- Performance of data structures built over the APIs.
42 MCAS
The MCAS function can be defined sequentially as:
  atomically bool MCAS (int N, word **a, word *e, word *n) {
    for ( int i := 0; i < N; i++ )
      if ( *a[i] != e[i] ) return FALSE;
    for ( int i := 0; i < N; i++ ) *a[i] := n[i];
    return TRUE;
  }
The MCAS implementation uses CCAS (conditional compare-and-swap):
  atomically word CCAS (word *a, word e, word n, word *cond) {
    word x := *a;
    if ( (x == e) && (*cond == 0) ) *a := n;
    return x;
  }
43 Memory formats
  typedef struct {
    word status;
    int N;
    word *a[MAX_N], e[MAX_N], n[MAX_N];
  } mcas_descriptor;
The type of a value read from a heap location can be tested using the IsMCASDesc and IsCCASDesc predicates: if either predicate evaluates true, then the tested value is a pointer to the appropriate type of descriptor. Implementing these functions requires reserving 2 bits in each word in order to distinguish between descriptors and other values.
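One possible way to implement these predicates (our assumption; the paper only requires that 2 bits be reserved per word) is to tag descriptor references in the two low-order bits: descriptors are word-aligned, so those bits are free in their pointers, and application values are required not to use the reserved tag patterns:
  /* Hypothetical tagging scheme for distinguishing descriptor references. */
  #define MCAS_TAG 0x1
  #define CCAS_TAG 0x2
  bool IsMCASDesc (word v) { return (v & 0x3) == MCAS_TAG; }
  bool IsCCASDesc (word v) { return (v & 0x3) == CCAS_TAG; }
  word make_mcas_ref (mcas_descriptor *d) { return ((word) d) | MCAS_TAG; }
  mcas_descriptor *mcas_desc_of (word v)  { return (mcas_descriptor *) (v & ~(word) 0x3); }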
44 Logical contents
- A location holds an ordinary value: this value is the logical contents.
- The location refers to an UNDECIDED descriptor: the descriptor's old value (e_i) is the logical contents.
- The location refers to a FAILED descriptor: as in the previous case.
- The location refers to a SUCCESSFUL descriptor: the new value (n_i) is the logical contents of the location.
  word MCASRead (word *a) {
    word v;
  retry_read:
    v := CCASRead(a);
    if ( IsMCASDesc(v) )
      for ( int i := 0; i < v->N; i++ )
        if ( v->a[i] == a ) {
          if ( v->status == SUCCESSFUL ) {
            if ( CCASRead(a) == v ) return v->n[i];
          } else {
            if ( CCASRead(a) == v ) return v->e[i];
          }
          goto retry_read;
        }
    return v;
  }
The descriptor is searched for an entry relating to the address being read. The re-check of ownership ensures that the status field was not checked too late, once the descriptor had lost ownership of the location and was consequently no longer determining its logical contents.
45 Commit operations
46 (Figure not transcribed - presumably the MCAS commit / mcas_help pseudocode; the line numbers on the next slide refer to it.)
47 The conditional part of CCAS ensures that the descriptor's status is still UNDECIDED, meaning that e_i is correctly defined as the logical contents of a_i; this is needed in case a concurrent thread helps complete the MCAS operation via a call to mcas_help at line 18.
An unexpected non-descriptor value may be seen at line 17.
When the status field is updated (line 23), all of the locations a_i must hold references to the descriptor, and consequently the single status update changes the logical contents of all of the locations. This is because the update is made by the first thread to reach line 23 for the descriptor, and so no thread can yet have reached lines 25-27 and started releasing the addresses.
48 WSTM
- WSTM makes three improvements over MCAS:
- Space does not need to be reserved in each heap location.
- The WSTM implementation is responsible for tracking the locations accessed, instead of the caller (done in the read-check phase).
- Improved read parallelism: locations that are read but not updated can remain cached in shared mode.
The cost: the implementation is more complex! Therefore we will first see a lock-based framework and then an obstruction-free one.
Nothing comes for free!
49 Memory formats
WSTM is based on associating version numbers with locations in the heap. It uses these numbers to detect conflicts between transactions.
A table of ownership records (orecs) holds the information that WSTM uses to co-ordinate access to the application heap. The orec table has a fixed size. A hash function is used to map addresses to orecs.
Each orec holds a version number or, when an update is being committed, a reference to the descriptor of the transaction involved. A descriptor entry indicates that address a_i is being updated from value o_i at version number vo_i to value n_i at version number vn_i. For a read-only access o_i = n_i and vo_i = vn_i; for an update vn_i = vo_i + 1.
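A minimal sketch of the structures just described (our assumption of the layout, not the paper's exact definitions):
  /* Hypothetical layout of an ownership record (orec) and a descriptor entry. */
  typedef struct wstm_descriptor wstm_descriptor;
  typedef struct {
    union {
      word             version;   /* current version number, or ...                     */
      wstm_descriptor *owner;     /* ... owning transaction while a commit is in flight */
    } u;
  } orec;
  #define N_ORECS 65536
  orec orec_table[N_ORECS];
  orec *addr_to_orec (word *addr) {                  /* fixed-size table, hashed by address */
    return &orec_table[ ((word) addr >> 2) % N_ORECS ];
  }
  typedef struct {                                   /* one entry per word the transaction accesses */
    word *addr;
    word  old_value, old_version;                    /* (o_i, vo_i) */
    word  new_value, new_version;                    /* (n_i, vn_i) */
  } wstm_entry;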
50 The transaction descriptor is used to co-ordinate helping between transactions. We use the function IsWSTMDesc to determine whether an orec holds a reference to a transaction descriptor.
51 Logical contents
Here there are 3 cases to consider:
1. The orec holds a version number: the logical contents come directly from the application heap.
2. The orec refers to a descriptor that contains an entry for the address: that entry gives the logical contents. For instance, the logical contents of a1 is 100, because the descriptor's status is UNDECIDED.
3. The orec refers to a descriptor that does not contain an entry for the address: the logical contents come from the application heap. For instance, the logical contents of a101 is 300, because descriptor tx1 does not involve a101 even though it owns r1.
Note: we cannot determine the logical contents during a read-check phase, because the decision point comes only afterwards. Therefore we rely on threads encountering such a descriptor to help decide its outcome.
A descriptor is well formed if, for each orec, it either (i) contains at most one entry associated with that orec, or (ii) contains multiple entries associated with that orec, but the old version number and new version number are the same in all of them.
52 Lines 32-38 ensure that the descriptor will remain well formed when the new entry is added to it. This is done by searching for any other entries relating to the same orec (line 33). If there is an existing entry, then the old version numbers must match (line 34). If the numbers do not match, then a concurrent transaction has committed an update to a location involved with the same orec, so tx is doomed to fail (line 35). Line 36 ensures the descriptor remains well formed even if it is doomed to fail. Line 37 ensures that the new entry has the same new version as the existing entries, e.g. if there was an earlier WSTMWrite in the same transaction made to another address associated with this orec.
53 WSTMWrite updates the entry's new value (lines 48-50). It must ensure that the new version number indicates that the orec has been updated (lines 51-52), both in the entry for addr and, to ensure well-formedness, in any other entries for the same orec.
54 Uncontended commit operations
A read-check phase checks that the version numbers in orecs associated with reads are still current (lines 28-31), meaning that other transactions haven't changed the values we have read.
We can see here the steps taking place during a commit operation that we mentioned earlier. Notice how this function follows the requirement that setting the status to SUCCESSFUL atomically updates the logical contents of all of the locations written by the transaction.
55 The read-check phase uses read_check_orec to check that the current version number associated with an orec matches the old version in the entries in a transaction descriptor. If it encounters another transaction's descriptor, then it ensures that that transaction's outcome is decided (line 12) before examining it.
56 Line 9: the orec is already owned by the current transaction because its descriptor holds multiple entries for the same orec. Note that the loop at line 13 will spin while the value seen in the orec is another transaction's descriptor.
- This implementation uses orecs as mutual exclusion locks, allowing only one transaction to own an orec at a given time.
- A transaction owns an orec when the orec holds a pointer to that transaction's descriptor.
57 Graphic description of a commit operation
58 Obstruction-free contended commit operations
Designing a non-blocking way to resolve contention in WSTM is more complex than in MCAS. The problem with WSTM is that a thread cannot be helped while it is writing updates to the heap. If a thread is pre-empted just before one of the stores, then it can be re-scheduled at any time and perform that delayed update, overwriting updates from subsequent transactions. In order to make an obstruction-free WSTM, we make delayed updates safe by ensuring that an orec remains owned by some transaction while it is possible that delayed writes may occur to locations associated with it. This means that the logical contents of locations that may be subject to delayed updates are taken from the transaction currently owning the orec.
59 We will use a new field in the orec's data structure: a count field. An orec's count field is increased each time a thread successfully acquires ownership of it, and decreased each time a thread releases ownership in the obstruction-free variant of release_orec. A count field of zero therefore means that no thread is in its update phase for locations related to the orec.
In this implementation we allow ownership to transfer directly between two descriptors. The release_orec function takes care that the most recent updates are written back to the heap.
60 OSTM
- The design organizes memory locations into objects which act as the unit of concurrency and update.
- Another level of indirection: data structures contain references to OSTM handles.
- As a transaction runs, the OSTM implementation maintains sets holding the handles it has accessed, depending on the mode they were opened in (read-only or read-write). The list of writable objects includes pointers to the old and new versions of the data.
- This organization makes it easier for a conflicting transaction (one that wishes to access a currently acquired object) to find the data it needs. However, the conflicting transaction must traverse the read-write list of the owning transaction in order to find the current copy of a given object.
- Meaning: concurrent readers can still determine an object's current value by searching the sorted write list and returning the appropriate data-block depending on the owning transaction's status.
61 - Since no hash function is used, there is no risk of false contention due to hash collisions.
- OSTM implements a simple strategy for conflict resolution: if two transactions attempt to write the same object, the one that acquires the object first is considered the winner.
- To ensure non-blocking progress, the later-arriving thread reads the winner's metadata and recursively helps it complete its commit.
- This, in addition to sorting each transaction's read-write list and acquiring objects in that order, avoids circular dependencies and deadlock.
- While WSTM supports obstruction-free transactions, OSTM supports lock-free progress.
62 Memory formats
- The current contents of an OSTM object are stored within a data-block.
- We assume that a pointer uniquely identifies a particular use of a particular block of memory.
- Outside of a transaction context, shared references to an OSTM object point to a word-sized OSTM handle.
- The state of an incomplete transaction is encapsulated within a per-transaction descriptor. It contains the current status of the transaction and lists of objects that have been opened in read-only mode and in read-write mode.
- Ordinarily, OSTM handles refer to the current version of the object's data via a pointer to the current data-block.
- If a transaction is in the process of committing an update to the object, then the handle refers to the descriptor of the owning transaction.
We will use the predicate IsOSTMDesc to distinguish between references to a data-block and references to a transaction descriptor.
63 Figure (a) shows an example OSTM-based structure which might be used by a linked list. (Second figure: a transaction in progress.)
64 Logical contents
- Simpler than in WSTM, where there was a many-to-one relationship between heap words and orecs.
- There are 2 cases:
- 1. The OSTM handle refers to a data-block: that block forms the object's logical contents.
- 2. The OSTM handle refers to a transaction descriptor: we then take the descriptor's new value for the block if it is SUCCESSFUL, and its old value for the block if it is UNDECIDED or FAILED.
- As usual, if a thread encounters a descriptor in its read-check phase, we require it to help advance it to its decision point, where the logical contents are well defined.
65 Commit operations
As in WSTM, there are 3 stages in a transaction's commit operation:
Acquire phase: the handles of objects opened in read-write mode are acquired in some global total order (this is done using CAS to replace the data-block pointer with a pointer to the transaction's descriptor).
Read-check phase: the handles of objects opened in read-only mode are checked, to see whether they have been changed by other transactions since they were read.
Release phase: each updated object has its data-block pointer set to the data-block holding the correct value of the data.
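A minimal pseudocode sketch of this commit sequence (our own summary, not the paper's code; the field names write_list, read_list and the helper current_block are assumptions, and helping on contention is omitted):
  /* Hypothetical sketch of the three OSTM commit phases described above. */
  bool ostm_commit (ostm_transaction *tx) {
    word outcome := SUCCESSFUL;
    /* acquire phase: swing each written handle from its old block to tx, in a global total order */
    for ( each entry w in tx->write_list )
      if ( CAS(&w.handle->ref, w.old_block, tx) != w.old_block ) { outcome := FAILED; break; }
    /* read-check phase: read-only objects must be unchanged since they were opened               */
    if ( outcome == SUCCESSFUL )
      for ( each entry r in tx->read_list )
        if ( current_block(r.handle) != r.old_block ) { outcome := FAILED; break; }
    tx->status := outcome;                        /* decision point */
    /* release phase: install new blocks on success, restore old blocks on failure                */
    for ( each acquired entry w in tx->write_list )
      w.handle->ref := (outcome == SUCCESSFUL) ? w.new_block : w.old_block;
    return (outcome == SUCCESSFUL);
  }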
66 A note
- In both implementations transactions operate entirely in private, and descriptors are only revealed when they are ready to commit. Therefore contention, if it exists, is only discovered at the end of the transaction. This is the lazy approach mentioned before (remember?).
- It allows many short-running readers to co-exist with a long-running writer.
67 Overview
- Goals of the design
- Introducing the APIs
- Programming with the APIs
- Design methods
- Practical design implementation of the APIs (shortly)
- Performance of data structures built over the APIs.
68 Performance evaluation
- The experiment compared 14 set implementations: 6 based on red-black trees and 8 based on skip lists.
- Some of them are lock-based, and others are implemented using CAS, MCAS, WSTM and OSTM.
- The experiment compared the implementations both under low contention and under varying contention.
- The threads run operations such as lookup_key, add_key and remove_key.
- Each thread chooses its operations at random, with probabilities reflecting the fact that reads dominate writes in most cases.
69 Performance under low contention
- Parallel writers are extremely unlikely to update overlapping sections of the data structure.
- A well-designed algorithm which provides disjoint-access parallelism will avoid introducing contention between these logically non-conflicting operations.
- As expected, the STM-based implementations perform poorly compared with the other lock-free schemes.
- The reason: there are significant overheads associated with the read and write operations (in WSTM), or with maintaining the lists of opened objects and constructing shadow copies of updated objects (in OSTM).
- The lock-free CAS-based and MCAS-based designs perform extremely well, because they add only minor overheads on each memory access.
70 (Figures: performance of the skip-list implementations and of the red-black tree implementations.)
71 Performance under varying contention
- In non-blocking algorithms, when conflicts occur they are handled using a fair mechanism such as recursive helping or interaction with the thread scheduler.
- The poor performance of MCAS when contention is high is because many operations must retry several times before they succeed.
- We can see here the weakness of locks: the optimal granularity of locking depends on the level of contention.
- Lock-free techniques avoid the need to take this into consideration.
- In the red-black tree implementations, both lock-based schemes suffer contention for cache lines at the root of the tree, where most operations must acquire the multi-reader lock.
72 (Figures: performance of the skip-list implementations and of the red-black tree implementations.)
73 Conclusions
- The non-blocking implementations that we have seen can match or surpass the performance of lock-based alternatives.
- APIs like STM have benefits in ease of use compared with mutual exclusion locks.
- STM avoids the need to consider problems like the granularity of locking (which changes dynamically with the level of contention) and the order of locking (which can cause deadlock).
- Therefore, it is possible to use lock-free techniques in places where traditionally we would use lock-based synchronization.
74 Bibliography
- Keir Fraser (University of Cambridge Computer Laboratory) and Tim Harris (Microsoft Research Cambridge). Concurrent Programming Without Locks. ACM Journal Name, Vol. V, No. N, M 20YY, pages 1-48.
- Virendra J. Marathe, Michael F. Spear, Christopher Heriot, Athul Acharya, David Eisenstat, William N. Scherer III, and Michael L. Scott. Lowering the Overhead of Nonblocking Software Transactional Memory. Technical Report 893, Department of Computer Science, University of Rochester, March 2006.
- en.wikipedia.org
75 The End