1
Performing Tasks in Asynchronous Environments
  • Dariusz Kowalski
  • University of Connecticut & Warsaw University
  • Alex Shvartsman
  • University of Connecticut & MIT

2
Structure of the presentation
  • Model
  • problem of performing tasks,
  • bibliography,
  • asynchronous message-passing model,
  • message delay and modeling issues
  • Delay-sensitive lower bounds for Do-All
  • Progress-tree Do-All algorithms
  • Simulating shared-memory and Anderson-Woll (AW)
  • Asynch. message-passing progress-tree algorithm
  • Permutation Do-All algorithms

3
Do-All problem (DHW et al.)
  • DA(p,t) problem abstracts the basic problem of
    cooperation in a distributed setting
  • p processors must perform t tasks, and at least
    one processor must know about it
    [Dwork, Halpern, Waarts 92/98]
  • Tasks are
  • known to every processor
  • similar - each takes a similar number of local
    steps
  • independent - may be performed in any order
  • idempotent - may be performed concurrently

4
Do-All message-passing model
  • Message-passing model -- communication by
    exchanging messages
  • 1. Synchronous model with crashes -- problem well
    understood, results close to optimal
  • Dwork, C., Halpern, J., Waarts, O. Performing work
    efficiently in the presence of faults. SIAM Journal
    on Computing, 27 (1998)
  • De Prisco, R., Mayer, A., Yung, M. Time-optimal
    message-efficient work performance in the presence
    of faults. Proc. of 13th PODC (1994)
  • Chlebus, B., De Prisco, R., Shvartsman, A.A.
    Performing tasks on synchronous restartable
    message-passing processors. Distributed Computing,
    14 (2001)
  • 2. Asynchronous model -- no interesting solutions
    until recently

5
Related work
  • Shared-memory model -- communication by
    read/write --
  • widely studied, but solutions far from optimal
  • Kanellakis, P.C., Shvartsman, A.A.
    Fault-tolerant parallel computation. Kluwer
    Academic Publishers (1997)
  • Anderson, R.J., Woll, H. Algorithms for the
    certified Write-All problem. SIAM Journal on
    Computing, 26 (1997)
  • Kedem, Z., Palem, K., Raghunathan, A., Spirakis,
    P. Combining tentative and definite executions
    for very fast dependable parallel computing.
    Proc. of 23rd STOC, (1991)

6
Message-Passing - model and goals
  • p asynchronous processors with PIDs in {0, ..., p-1}
  • processors communicate by message passing
  • in one local step each processor can send a
    message to any subset of processors
  • messages incur delays between send and receive
  • processing of all received messages can be done
    during one local step
  • Properties
  • duration of a local step may be unbounded
  • message delays may be unbounded
  • information may not propagate -- send/recv depend
    on delay
  • Goal: understand the impact of message delay on the
    efficiency of algorithmic solutions for Do-All

7
Message-delay-sensitive approach
  • Even if message delays are bounded by d
    (d-adversary), cooperation may be difficult
  • Observation:
  • If d = Ω(t), then work must be Ω(t · p): with delay
    comparable to t, a processor may learn nothing from
    the others before it has had to perform essentially
    all t tasks on its own
  • This means that cooperation is difficult, and
    addressing scheduling alone is not enough --
    algorithm design and analysis must be d-sensitive
  • Message-delay-sensitive approach
  • C. Dwork, N. Lynch and L. Stockmeyer. Consensus
    in the presence of partial synchrony. J. of the
    ACM, 35 (1988)

8
Measures of efficiency
  • Termination time: the first time when all tasks
    are done and at least one processor knows about it
  • Used only to define work and message complexity
  • Not interesting on its own: if all processors but
    one are delayed, then trivially time is Ω(t)
  • Work: measures the sum, over all processors, of
    the number of local steps taken until termination
    time
  • Message complexity: measures the number of all
    point-to-point messages sent until termination
    time

9
Lower bound - randomized algorithms
  • Theorem: Any randomized algorithm solving DA with
    t tasks using p asynchronous message-passing
    processors performs expected work
  • Ω(t + p · d · log_{d+1} t)
  • against any d-adversary.
  • Proof (sketch):
  • The adversary partitions the computation into
    stages, each containing d time units, and
    constructs the delay pattern stage after stage:
  • it delays all messages sent in a stage to be
    received at the end of that stage,
  • it delays a linear number of processors (those that
    want to perform more than a (1 - 1/(3d)) fraction
    of the undone tasks) during the stage,
  • the selection is made on-line and has the desired
    properties with high probability

10
Simulating shared-memory algorithms
  • Write-All algorithm AWT
  • Anderson, R.J., Woll, H. Algorithms for the
    certified Write-All problem. SIAM Journal on
    Computing, 26 (1997)
  • Quorum systems and atomic memory services
  • Attiya, H., Bar-Noy, A., Dolev, D. Sharing
    memory robustly in message passing systems. J. of
    the ACM, 42 (1996)
  • Lynch, N., Shvartsman, A. RAMBO A
    Reconfigurable Atomic Memory Service. Proc. of
    16th DISC, (2002)
  • Emulating asynchronous shared-memory algorithms
  • Momenzadeh, M. Emulating shared-memory Do-All in
    asynchronous message passing systems. Master's
    Thesis, CSE, University of Connecticut (2003)

11
Progress tree algorithms BKRS, AW
  • Shared memory
  • p processors, t tasks (p = t)
  • q permutations of [q]
  • q-ary progress tree of depth log_q p
  • nodes are binary completion bits
  • Permutations establish the order in which
    the children are visited
  • p processors traverse the tree and use
    q-ary expansion of their PID to choose
    permutations
  • Anderson Woll

[Figure: q-ary progress tree; each internal node has q children labeled 1, 2, 3, ..., q.]
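The slides give no code for this structure; the following Python sketch is an illustrative reconstruction (names such as ProgressTree and pid_digit are assumptions, not from the presentation) of a flat-array q-ary tree of completion bits and of how the child-visiting permutation for a level can be picked from the q-ary expansion of the PID.

# Illustrative sketch of the q-ary progress tree used by AW-style algorithms.
# Assumption (not from the slides): p = t = q**height, and Psi is a list of
# q permutations of {0, ..., q-1} giving the possible child-visit orders.

class ProgressTree:
    """Complete q-ary tree of completion bits, stored as a flat array."""

    def __init__(self, q, height):
        self.q = q
        self.height = height
        self.size = (q ** (height + 1) - 1) // (q - 1)   # total number of nodes
        self.bits = [0] * self.size                      # 0 = unfinished, 1 = done

    def child(self, node, j):
        """Index of the j-th child (0 <= j < q) of an internal node."""
        return node * self.q + 1 + j

    def is_leaf(self, node):
        return self.child(node, 0) >= self.size


def pid_digit(pid, level, q):
    """level-th digit of the q-ary expansion of PID; it selects which
    permutation from Psi orders the children visited at that tree level."""
    return (pid // (q ** level)) % q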
12
Atomic memory is not required
  • We use q-ary progress trees as the main data
    structure that is written and read -- note
    that atomicity is not required
  • If the two writes shown below occur (each writing
    the entire tree), then a subsequent read may obtain
    a third value that was never written
  • Property of monotone progress
  • 1 at a tree node i indicates that all tasks
    attached to the leaves of the sub-tree rooted at
    i have been performed
  • If 1 is written at a node i in the progress tree
    of a processor, it remains 1 forever

[Figure: starting from an all-0 tree, two writes of the entire tree and a subsequent read; the read may return a combination of node values from both writes, i.e., a tree value that was never written as a whole.]
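Because the only transition at a node is from 0 to 1, a received progress tree can be folded into the local one with a bitwise OR and no atomicity is needed; a minimal sketch (the helper name merge_trees is an assumption of this sketch):

def merge_trees(local_bits, received_bits):
    """Merge a received progress tree into the local one.

    Since a node only ever changes from 0 to 1 (monotone progress), taking
    the bitwise OR is safe even when reads and writes interleave
    non-atomically: the merged tree never claims less progress than the
    local one, and every 1 it contains was written by some processor that
    really completed the corresponding sub-tree."""
    return [a | b for a, b in zip(local_bits, received_bits)]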
13
Algorithm DAq - traverse progress tree
  • Instead of using shared memory, processors
    broadcast their progress trees as soon as local
    progress is recorded
  • p = t = 9, q = 3
  • Ψ: a list of 3 schedules from S_3
  • T: a ternary progress tree with 9 leaves,
    node values in {0, 1}
  • PID(j): the j-th digit of the ternary
    representation of PID

[Figure: the ternary progress tree with nodes numbered 0-12 (root, 3 internal nodes, 9 leaves holding the tasks); schedule π_0 is used by PIDs 0, 3, 6, schedule π_1 by PIDs 1, 4, 7, and schedule π_2 by PIDs 2, 5, 8.]
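A sketch of the local loop a DAq processor might run, reusing ProgressTree, pid_digit, and merge_trees from the sketches above; broadcast, inbox, and perform_task are assumed stand-ins for the messaging and task layers, and this is an illustrative reconstruction rather than the presentation's code.

def daq_local_loop(pid, tree, Psi, q, broadcast, inbox, perform_task):
    """One processor's traversal of its local progress tree (illustrative).

    The processor walks the tree top-down; at each internal node it visits
    the children in the order given by the permutation selected via the
    q-ary digit of its PID for that level, skipping sub-trees already
    marked done.  Whenever it records new local progress, it broadcasts
    its whole tree, as described on this slide."""

    def traverse(node, level):
        # fold in any progress trees received so far (non-atomic OR merge)
        for other in inbox():
            tree.bits = merge_trees(tree.bits, other)
        if tree.bits[node]:
            return                                   # sub-tree already done
        if tree.is_leaf(node):
            perform_task(node)                       # do the task at this leaf
        else:
            order = Psi[pid_digit(pid, level, q)]    # child order for this PID
            for j in order:
                traverse(tree.child(node, j), level + 1)
        tree.bits[node] = 1                          # record local progress ...
        broadcast(tree.bits)                         # ... and tell everyone

    traverse(0, 0)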
14
Algorithm DAq - case p = t
15
Procedure DOWORK
16
Contention of permutations
  • S_n - the group of all permutations of [n],
    with composition ∘ and the identity permutation
  • σ, δ - permutations in S_n
  • Ψ - a set of q permutations from S_n
  • i is an lrm (left-to-right maximum) in σ if
    σ(i) > max_{j<i} σ(j)
  • LRM(σ) - the number of lrm in σ [Knuth]
  • Cont(Ψ, δ) = Σ_{σ∈Ψ} LRM(δ⁻¹ ∘ σ)
  • Contention of Ψ: Cont(Ψ) = max_δ Cont(Ψ, δ) [AW]
  • Theorem [AW]: For any n > 0 there exists a set Ψ
    of n permutations from S_n with Cont(Ψ) ≤ 3nH_n =
    O(n log n).
  • [Knuth] Knuth, D.E. The Art of Computer
    Programming, Vol. 3 (third edition).
    Addison-Wesley (1998)

[Figure: example permutation (10, 3, 5, 2, 4, 6, 1, 9, 7, 8, 11); its left-to-right maxima are 10 and 11.]
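These definitions translate directly into code; the following illustrative Python computes LRM and, by brute force over all δ (exponential, so for tiny n only), Cont(Ψ).

from itertools import permutations


def lrm_count(sigma):
    """Number of left-to-right maxima of the permutation sigma."""
    count, best = 0, float("-inf")
    for x in sigma:
        if x > best:
            count, best = count + 1, x
    return count


def compose(delta_inv, sigma):
    """(delta^-1 o sigma)(i) = delta_inv[sigma[i]], permutations of 0..n-1."""
    return [delta_inv[s] for s in sigma]


def contention(Psi):
    """Cont(Psi) = max over delta of the sum over sigma in Psi of
    LRM(delta^-1 o sigma).  Brute force over all delta."""
    n = len(Psi[0])
    best = 0
    for delta in permutations(range(n)):
        delta_inv = [0] * n
        for i, d in enumerate(delta):
            delta_inv[d] = i
        best = max(best,
                   sum(lrm_count(compose(delta_inv, sigma)) for sigma in Psi))
    return best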
17
Procedure Oblivious Do
  • n - the number of jobs and units
  • Ψ - a list of n schedules from S_n
    (π_PID denotes the PID-th schedule in Ψ)
  • Procedure Oblivious:
  • Forall processors PID = 0 to n-1
  •   for i = 1 to n do
  •     perform Job(π_PID(i))
  • An execution of Job(π_PID(i)) by processor PID is
    primary if job π_PID(i) has not been previously
    performed
  • Lemma [AW]: In algorithm Oblivious with n units,
    n jobs, and the list Ψ of n permutations from S_n,
    the number of primary job executions is at most
    Cont(Ψ).
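An illustrative way to check the lemma on small inputs is to simulate Oblivious under one fixed interleaving of processor steps and count the primary executions; the sequential-interleaving model below is an assumption of the sketch, not part of the slides.

def count_primary_executions(Psi, interleaving):
    """Run Oblivious for one interleaving and count primary executions.

    Psi          : list of n schedules, each a permutation of jobs 0..n-1
                   (Psi[PID] is processor PID's order).
    interleaving : sequence of PIDs; the k-th entry says which processor
                   takes its next local step at global time k."""
    n = len(Psi)
    next_step = [0] * n          # how far each processor is in its schedule
    done = set()                 # jobs already performed by someone
    primary = 0
    for pid in interleaving:
        if next_step[pid] >= n:
            continue             # this processor finished its schedule
        job = Psi[pid][next_step[pid]]
        next_step[pid] += 1
        if job not in done:      # primary execution: job not yet performed
            primary += 1
            done.add(job)
    return primary

For instance, with Psi = [[0, 1, 2], [1, 2, 0], [2, 0, 1]] and the round-robin interleaving [0, 1, 2, 0, 1, 2, 0, 1, 2], exactly three executions are primary (each job's first execution).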

18
Algorithm DAq - analysis
  • Modification of algorithm DAq for p < t
  • We partition the t tasks into p jobs of size t/p
    and let algorithm DAq work with these jobs.
  • It takes a processor O(t/p) work (instead of
    constant) to process such a job (job unit).
  • Since in each step a processor broadcasts at most
    one message to the p-1 other processors, we obtain:

Theorem 4: For any constant ε > 0 there is a
constant q such that algorithm DAq has work
W(p,t,d) = O(t·p^ε + p·d·⌈t/d⌉^ε) and message
complexity O(p · W(p,t,d)) against any d-adversary
(d = o(t)).
19
Permutation algorithms - case p ≤ t
  • Algorithms proceed in a loop:
  • select the next task using an ORDER/SELECT rule
  • perform the selected task
  • send messages, receive messages, and update state
  • ORDER/SELECT rules (see the sketch after this list):
  • PARAN1: ORD - initially processor PID permutes the
    tasks randomly
  •         SEL - PID selects the first task remaining
    on its schedule
  • PARAN2: ORD - no initial order
  •         SEL - PID selects a task at random from the
    remaining set
  • PADET:  ORD - initially processor PID chooses
    schedule π_PID from Ψ
  •         SEL - PID selects the first task remaining
    on schedule π_PID
  • Ψ - a list of p schedules from S_t
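An illustrative Python sketch of the three ORDER/SELECT rules (the bookkeeping of which tasks a processor still believes undone -- the remaining set -- is assumed to be maintained from received messages; none of these names come from the slides):

import random


def paran1_order(t, rng=random):
    """PARAN1 ORD: the processor permutes the t tasks uniformly at random, once."""
    schedule = list(range(t))
    rng.shuffle(schedule)
    return schedule


def padet_order(pid, Psi):
    """PADET ORD: processor PID takes its fixed schedule from the list Psi."""
    return Psi[pid]


def select_first_remaining(schedule, remaining):
    """SEL for PARAN1 / PADET: first task on the schedule not yet known done."""
    for task in schedule:
        if task in remaining:
            return task
    return None


def paran2_select(remaining, rng=random):
    """PARAN2 SEL: pick a task at random among those not yet known done."""
    return rng.choice(sorted(remaining)) if remaining else None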

20
d-Contention of permutations
  • We introduce the notion of d-Contention
  • i is a d-lrm in π if |{ j < i : π(i) < π(j) }| < d
  • LRM_d(π) - the number of d-lrm in π
  • Cont_d(Ψ, δ) = Σ_{σ∈Ψ} LRM_d(δ⁻¹ ∘ σ)
  • d-Contention of Ψ: Cont_d(Ψ) = max_δ Cont_d(Ψ, δ)

[Figure: the example permutation (10, 3, 5, 2, 4, 6, 1, 9, 7, 8, 11) with its d-lrm marked for d = 2.]
Theorem: For sufficiently large p and n, there is a
list Ψ of p permutations from S_n such that, for
every integer d > 1, Cont_d(Ψ) ≤ n log n +
5pd ln(en/d). Moreover, a random Ψ is good with
high probability.
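LRM_d has a direct transcription into Python (illustrative; combined with compose from the contention sketch above it also yields Cont_d for tiny instances):

def lrm_d_count(pi, d):
    """Number of d-lrm in pi: positions i such that fewer than d earlier
    elements of pi are larger than pi[i].  With d = 1 this is the ordinary
    count of left-to-right maxima."""
    count = 0
    for i, x in enumerate(pi):
        larger_before = sum(1 for y in pi[:i] if y > x)
        if larger_before < d:
            count += 1
    return count

For the example permutation in the figure above and d = 2, lrm_d_count returns 6.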
21
d-Contention and work
  • Lemma: For algorithms PADET and PARAN1, the
    worst-case work and the expected work,
    respectively, are at most
  • Cont_d(Ψ)
  • against any d-adversary.
  • Example:
  • p = 2, t = 11, d = 2

[Figure: example with p = 2, t = 11, d = 2; the order of tasks to perform is 1, 2, ..., 11, and the figure shows each processor's schedule and which of its task executions are primary.]
22
Permutation algorithms - results
  • Theorem: Randomized algorithms PARAN1 and PARAN2
    perform expected work
  • O(t·log p + p·d·log(t/d))
  • and have expected communication
  • O(t·p·log p + p^2·d·log(t/d))
  • against any d-adversary (d = o(t)).
  • Corollary: There exists a deterministic list of
    schedules Ψ such that algorithm PADET performs
    work
  • O(t·log p + p·min{t,d}·log(2t/d))
  • and has communication
  • O(t·p·log p + p^2·min{t,d}·log(2t/d))
  • when p ≤ t.

23
Conclusions and open problems
  • First message-delay-sensitive analysis of the
    Do-All problem for asynchronous processors in the
    message-passing model
  • lower bounds for deterministic and randomized
    algorithms
  • deterministic and randomized algorithms with
    subquadratic (in p and t) work for any message
    delay d, as long as d = o(t)
  • Among the interesting open questions are
  • for algorithm PADET: how to construct the list Ψ
    of permutations efficiently
  • closing the gap between the upper and the lower
    bounds
  • investigating algorithms that simultaneously
    control work and message complexity