Title: Mini-Project on Algorithms for Multicore Systems (20214181)
1. Mini-Project on Algorithms for Multicore Systems (20214181)
http://www.cs.bgu.ac.il/mpam092/Main
2. Mini-project Plan
- 20/4/09: An introductory lecture on shared-memory algorithms for multi-core systems
- 25/4/09: Team formation and paper selection
  - Teams of 1, 2, or 3 students
  - Each team should give (at least!) 3 priorities for papers they prefer
- 27/4/09: Paper allocation published
3. From the New York Times
4. The Future of Computing
- Speeding up uni-processors is harder and harder
- Intel, Sun, AMD, IBM now focusing on multi-core architectures
- Already, most computers are multiprocessors
How can we write correct and scalable algorithms for multiprocessors?
5. A fundamental problem of thread-level parallelism
Thread A:
  ...
  Account_i = Account_i - X
  Account_j = Account_j + X
  ...
Thread B:
  ...
  Account_i = Account_i - X
  Account_j = Account_j + X
  ...
But what if execution is concurrent?
Must avoid race conditions
6. Inter-thread synch. alternatives
- Coarse-grained locks
- Fine-grained locks
- Lock-free implementations
- Transactional memory
10. More on lock-free synchronization
- The use of locks can sometimes be avoided if the hardware supports stronger Read-Modify-Write operations and not just read and write:
  - Test-and-set
  - Fetch-and-add
  - Compare-and-swap

Test-and-set(w)
  atomically
    v := read from w
    w := 1
    return v

Fetch-and-add(w, delta)
  atomically
    v := read from w
    w := v + delta
    return v
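The two primitives above can be sketched in C++ on top of std::atomic. This is only an illustration, not part of the original slides: the wrapper names test_and_set and fetch_and_add are made up here, and whether the operations map to single hardware instructions depends on the target architecture.

```cpp
#include <atomic>
#include <cassert>

// Test-and-set(w): atomically read w, set it to 1, and return the old value.
// std::atomic<int>::exchange does the read and the write as one atomic step.
inline int test_and_set(std::atomic<int>& w) {
    return w.exchange(1);
}

// Fetch-and-add(w, delta): atomically add delta to w and return the old value.
inline int fetch_and_add(std::atomic<int>& w, int delta) {
    return w.fetch_add(delta);
}
```

With a shared std::atomic<int> counter, several threads can each call fetch_and_add(counter, 1) and every caller receives a distinct old value, so no increment is lost.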
11. The compare-and-swap (CAS) operation
Compare&swap(w, expected, new)
  atomically
    v := read from w
    if (v = expected)
      w := new
      return success
    else
      return failure

Motorola 680x0, IBM 370, Sun SPARC, 80X86, Pentium, MIPS, PowerPC, DEC Alpha
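A minimal C++ sketch of the CAS operation, assuming std::atomic<int> as the shared word. The names compare_and_swap and cas_fetch_and_add are hypothetical helpers introduced for illustration. The second function shows the typical usage pattern: read the current value, compute the new one, and retry if another thread changed the word in between.

```cpp
#include <atomic>
#include <cassert>

// Compare&swap(w, expected, desired): if w holds `expected`, store `desired`
// and return true; otherwise leave w unchanged and return false.
inline bool compare_and_swap(std::atomic<int>& w, int expected, int desired) {
    // compare_exchange_strong overwrites `expected` on failure;
    // `expected` is a by-value parameter here, so the caller is unaffected.
    return w.compare_exchange_strong(expected, desired);
}

// Fetch-and-add emulated with a CAS retry loop.
inline int cas_fetch_and_add(std::atomic<int>& w, int delta) {
    for (;;) {
        int v = w.load();                      // read the current value
        if (compare_and_swap(w, v, v + delta)) // succeeds only if w is still v
            return v;
        // another thread changed w in between: retry with a fresh read
    }
}
```

The retry loop never blocks other threads: a failed CAS means some other thread's update succeeded.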
12. An example CAS usage: Treiber's stack algorithm
[Figure: linked list of nodes, each with a val and a next field; Top points to the first node]

Push(int v, Stack S)
  n := new NODE                       // create node for new stack item
  n.val := v                          // write item value
  do forever                          // repeat until success
    node top := S.top
    n.next := top                     // next points to current top (LIFO order)
    if compare&swap(S.top, top, n)    // try to add new item
      return                          // return if succeeded
  od
13. Treiber's stack algorithm (cont'd)
[Figure: linked list of nodes, each with a val and a next field; Top points to the first node]

Pop(Stack S)
  do forever
    top := S.top
    if top = null
      return empty
    if compare&swap(S.top, top, top.next)
      return-val := top.val
      free top
      return return-val
  od
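The Push and Pop pseudocode can be sketched in C++ roughly as follows. This is an illustrative sketch under stated assumptions, not the author's implementation. The class name TreiberStack, the retired list, and the bool-returning pop are choices made here. One deliberate difference from the slides: popped nodes are not freed immediately but parked on a retired list until the stack is destroyed. Freeing a node while another thread may still read its next field is unsafe without a memory-reclamation scheme, and never reusing nodes also sidesteps the ABA problem.

```cpp
#include <atomic>
#include <cassert>

class TreiberStack {
    struct Node {
        int val;
        Node* next;          // link inside the stack
        Node* retired_next;  // link inside the retired list (sketch-only field)
    };
    std::atomic<Node*> top_{nullptr};
    std::atomic<Node*> retired_{nullptr};

    // Park a popped node until the destructor runs instead of freeing it at once.
    void retire(Node* n) {
        Node* r = retired_.load();
        do {
            n->retired_next = r;
        } while (!retired_.compare_exchange_weak(r, n));
    }

public:
    TreiberStack() = default;
    TreiberStack(const TreiberStack&) = delete;
    TreiberStack& operator=(const TreiberStack&) = delete;

    void push(int v) {
        Node* n = new Node{v, nullptr, nullptr};  // create node, write item value
        Node* top = top_.load();
        do {
            n->next = top;  // next points to current top (LIFO order)
        } while (!top_.compare_exchange_weak(top, n));  // retry until the CAS succeeds
    }

    // Returns false if the stack is empty; otherwise stores the popped value in out.
    bool pop(int& out) {
        Node* top = top_.load();
        for (;;) {
            if (top == nullptr) return false;
            Node* next = top->next;  // safe: nodes are never freed while the stack lives
            if (top_.compare_exchange_weak(top, next)) {
                out = top->val;
                retire(top);
                return true;
            }
            // on failure compare_exchange_weak reloaded `top`; loop and retry
        }
    }

    ~TreiberStack() {
        // Assumes no operation is still in flight when the stack is destroyed.
        for (Node* n = top_.load(); n != nullptr;) { Node* d = n; n = n->next; delete d; }
        for (Node* n = retired_.load(); n != nullptr;) { Node* d = n; n = n->retired_next; delete d; }
    }
};
```

A production version would replace the retired list with a proper reclamation scheme (for example hazard pointers or epoch-based reclamation), since memory here grows with the number of pops.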
14. More on lock-free synchronization (cont'd)
Lock-free synchronization algorithms are often complicated
15. Transactional Memory
- A transaction is a sequence of memory reads and writes, executed by a single thread, that either commits or aborts
- If a transaction commits, all the reads and writes appear to have executed atomically
- If a transaction aborts, none of its stores take effect
- Transaction operations aren't visible until they commit (if they do)
16. Transactional Memory Goals
- A new multiprocessor architecture
- The goal: implementing lock-free synchronization that is
  - efficient
  - easy to use compared with conventional techniques based on mutual exclusion
- Implemented by hardware support (such as straightforward extensions to multiprocessor cache-coherence protocols) and/or by software mechanisms
17. A Usage Example
Account_i = Account_i - X
Account_j = Account_j + X

Locks:
  Lock(L_i)
  Lock(L_j)
  Account_i = Account_i - X
  Account_j = Account_j + X
  Unlock(L_j)
  Unlock(L_i)

Transactional Memory:
  atomic {
    Account_i = Account_i - X
    Account_j = Account_j + X
  }
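The lock-based variant of the transfer can be sketched in C++ with one mutex per account. The Account struct and the transfer function are hypothetical names used only for this illustration. One caveat about taking the two locks in a fixed textual order (first L_i, then L_j): a concurrent transfer in the opposite direction would take them in the reverse order and the two threads could deadlock. The sketch therefore uses std::lock, which acquires both mutexes with a deadlock-avoidance algorithm. Standard C++ has no built-in atomic block, so the transactional variant is not shown.

```cpp
#include <cassert>
#include <mutex>

struct Account {
    long balance;
    std::mutex lock;  // one lock per account (fine-grained locking)
    explicit Account(long b) : balance(b) {}
};

// Move x from account `from` to account `to`.
inline void transfer(Account& from, Account& to, long x) {
    if (&from == &to) return;  // locking the same mutex twice would be undefined
    // Acquire both locks without risking deadlock, whatever the argument order.
    std::lock(from.lock, to.lock);
    std::lock_guard<std::mutex> g1(from.lock, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.lock, std::adopt_lock);
    from.balance -= x;
    to.balance += x;
    // both locks are released when g2 and g1 go out of scope
}
```

While both locks are held, no other thread using transfer can observe the state in which x has left one account but not yet arrived in the other.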
18. Back to lock-free and lock-based algorithms
19. Progress Conditions
- Wait-freedom: Each thread terminates its operation in a finite number of its steps
- Nonblocking (a.k.a. lock-freedom): After a finite number of steps, some thread terminates its operation
- Obstruction-freedom: If a thread runs by itself long enough, it will finish its operation
20. Correctness condition: linearizability
Linearizability (an intuitive definition): For each operation, we can find a point within its time interval at which the operation appears to take place, such that the resulting order of operations is legal.
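To make the definition concrete, here is a small brute-force checker for histories of a FIFO queue. It is a sketch under assumptions, not part of the original material: the Op struct, the integer timestamps, and the function names are invented for illustration, only completed operations are modelled, and a deq on an empty queue is not supported. The checker searches for an order of the operations that respects real-time precedence (an operation that finished before another started must come first) and is a legal sequential queue execution.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// One completed queue operation with its invocation and response times.
struct Op {
    bool is_enq;  // true: enq(value); false: a deq() that returned value
    int value;
    int start;    // invocation time
    int end;      // response time (start < end)
};

// Try to extend a partial linearization. done[i] marks operations already
// placed; q is the queue state after applying them in the chosen order.
static bool search(const std::vector<Op>& h, std::vector<bool>& done,
                   std::deque<int>& q, std::size_t placed) {
    if (placed == h.size()) return true;
    for (std::size_t i = 0; i < h.size(); ++i) {
        if (done[i]) continue;
        // Real-time order: op i may come next only if no other unplaced
        // operation finished before op i started.
        bool minimal = true;
        for (std::size_t j = 0; j < h.size(); ++j)
            if (!done[j] && j != i && h[j].end < h[i].start) { minimal = false; break; }
        if (!minimal) continue;
        if (h[i].is_enq) {
            q.push_back(h[i].value);
            done[i] = true;
            if (search(h, done, q, placed + 1)) return true;
            done[i] = false;   // backtrack
            q.pop_back();
        } else if (!q.empty() && q.front() == h[i].value) {
            q.pop_front();
            done[i] = true;
            if (search(h, done, q, placed + 1)) return true;
            done[i] = false;   // backtrack
            q.push_front(h[i].value);
        }
    }
    return false;
}

bool is_linearizable(const std::vector<Op>& history) {
    std::vector<bool> done(history.size(), false);
    std::deque<int> q;
    return search(history, done, q, 0);
}
```

For instance, a history in which enq(x) finishes before enq(y) begins and a later deq returns y is rejected, because every order that respects real time puts x at the head of the queue. If the two enq operations overlap in time, either order is allowed and the same deq becomes acceptable. The search is exponential in the number of operations, so it only suits small hand-written histories.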
21. Example: A queue
[Figure: timeline of queue operations q.enq(x), q.deq(y), q.enq(y), q.deq(x); verdict: linearizable]
22. Example
[Figure: timeline of queue operations q.enq(x), q.deq(y), q.enq(y); verdict: not linearizable]
23. Example
[Figure: timeline of queue operations q.enq(x), q.deq(x); verdict: linearizable]
24. Example
[Figure: timeline of queue operations q.enq(x), q.deq(y), q.enq(y), q.deq(x); verdict: linearizable, multiple orders OK]
25. Lin. points in Treiber's algorithm
Push(int v, Stack S)
  n := new NODE                       // create node for new stack item
  n.val := v                          // write item value
  do forever                          // repeat until success
    node top := S.top
    n.next := top                     // next points to current top (LIFO order)
    if compare&swap(S.top, top, n)    // try to add new item; a successful CAS is the linearization point
      return                          // return if succeeded
  od

Pop(Stack S)
  do forever
    top := S.top                      // linearization point if the stack is found empty
    if top = null
      return empty
    if compare&swap(S.top, top, top.next)    // a successful CAS is the linearization point
      return-val := top.val
      free top
      return return-val
  od
26. Development/deployment platforms
- You will test your algorithm on an 8-core Xeon machine, running Gentoo Linux (kernel 2.6.22)
  - We will set up accounts for you
- You can also develop (and do initial tests) on any PC (obviously, a multi-core is preferable). The easiest way to do that seems to be:
  - Install VMware
  - Run a compatible Linux on top of VMware (e.g., Ubuntu 8.10, Linux kernel 2.6.27 should work)
27. Suggested Projects
- A lock-free extensible hash table
- A lock-free open addressing hash table
- Yet another lock-free hash table
- A lock-free elimination-based stack
- A lock-based skip-list
- A lock-based heap
- A lock-based linked list
- A lock-free double-ended queue