Review Last Week - PowerPoint PPT Presentation
1
Review Last Week
  • Coordination
  • Mutual exclusion
  • Election algorithms
  • Multicasting

2
Today
  • Consensus/Agreement with faulty processes
  • Impossibility of consensus in asynchronous
    systems
  • Networks, task graphs and scheduling

3
Agreement
  • Agreement of processes on a value, after one or
    more of the processes has proposed what the
    value should be
  • We have seen some related examples
  • In mutual exclusion, processes agree on which
    process should enter the critical region (CR)
  • In elections, processes agree on who should be
    the coordinator
  • Some of these algorithms make strong assumptions
    about channel reliability and faulty processes

4
Consensus/Agreement
  • Requirements of a consensus algorithm
  • Termination: eventually every process sets its
    decision variable
  • Agreement: the decision value of all correct
    processes is the same
  • Validity: if a process decides on a value, then
    there was a process that started with that value

5
Agreement in faulty systems
  • We will study other forms of agreement in faulty
    systems under the assumption that communication
    channels are reliable and the system is
    synchronous

6
Agreement in Faulty Systems (1)
  • How does a process group deal with a faulty
    member?
  • The Byzantine Generals Problem Three generals
    must agree to attack or retreat.
  • 3 loyal generals and 1 traitor (the fault).
  • The generals announce their troop strengths (in
    units of 1 kilosoldier) to the other members of
    the group by sending a message.

7
Agreement in Faulty Systems (2)
  • The vectors (b) that each general assembles,
    based on the announcements in (a) from the first
    step. Each general knows their own strength.
    They then send their vectors to all the other
    generals.
  • (c) shows the vectors that each general receives
    in step 3.
  • It is clear to all that General 3 is the traitor.
    In each column, the majority value is assumed
    to be correct.
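The column-wise majority step can be sketched as follows; the troop-strength numbers and the traitor's inconsistent reports below are illustrative, not taken from the slides:

```python
from collections import Counter

def majority(values):
    """Return the majority value in a column, or None if no strict majority."""
    value, freq = Counter(values).most_common(1)[0]
    return value if freq > len(values) / 2 else None

# Vectors received by one loyal general in step 3 (columns = troop
# strengths of generals 1..4 in kilosoldiers). General 3, the traitor,
# relays inconsistent values, so column 3 disagrees across vectors.
received = [
    [1, 2, 5, 4],   # vector relayed by general 1
    [1, 2, 6, 4],   # vector relayed by general 2
    [1, 2, 7, 4],   # vector relayed by general 4
]

decided = [majority(col) for col in zip(*received)]
print(decided)  # [1, 2, None, 4] -- no majority in the traitor's column
```

The loyal generals agree on every column except the traitor's, which is exactly the situation shown on the slide.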

8
Agreement in Faulty Systems (3)
Unfortunately the algorithm does not always work!
  • The same algorithm except now with 2 loyal
    generals and 1 traitor.
  • It is no longer possible to determine the
    majority value in each column. The algorithm has
    failed to produce agreement.

9
Agreement in Faulty Systems (3)
  • Lamport showed that no solution exists if N ≤ 3f
    (with N the total number of processes and f the
    number of faulty processes)
  • They gave an algorithm that solves the Byzantine
    generals problem in a synchronous system if
    N ≥ 3f + 1

10
Agreement with unreliable communication
  • The two-army problem: two generals of the yellow
    army, with 3,000 soldiers each, must agree on
    attacking the blue army, which has 4,000
    soldiers.

11
Two army problem
  • Suppose general A sends the message to B:
  • "Attack at 12"
  • General A won't attack alone, since it would be
    defeated.
  • A doesn't know whether B has received the
    message.
  • B knows that A may not be sure the message was
    received, so B sends an agreement message.

12
Two army problem
Now A sends an ack to make sure that B knows he
got the confirmation. B gets the ack, but wants to
make sure A knows he got it; A gets that ack, but
wants to make sure B knows, and so on.
By sending and receiving acks both generals gain
more knowledge, but never common knowledge.
13
Reliable and bounded time communication
  • If A knows that B will receive any message that
    A sends within one minute of A's sending it,
    then if A sends
  • "Attack at 12"
  • A knows that within two minutes A and B will
    have the common knowledge that
  • A says "attack at 12"

14
Conclusion
  • Common knowledge is unattainable in systems with
    unreliable communication (or with unbounded
    delay)
  • Common knowledge is attainable in systems with
    reliable communication in bounded time

15
TDM and Common Knowledge
  • In a synchronous system with global clock, common
    knowledge can be gained by passage of time (no
    message passing)
  • For mutual exclusion - using time division
    multiplexing (TDM) - processes enter CR on their
    pre-assigned slots
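The TDM rule can be sketched in a few lines, assuming a shared global clock; the slot length is an illustrative choice:

```python
def tdm_owner(t, slot_len, processes):
    """Which process owns the critical region at global time t,
    given fixed pre-assigned slots of length slot_len."""
    return processes[(t // slot_len) % len(processes)]

procs = ["A", "B", "C"]
# With 10-tick slots, ownership rotates A, B, C, A, ... No messages
# are exchanged: each process derives the owner from the clock alone.
print([tdm_owner(t, 10, procs) for t in range(0, 40, 10)])  # ['A', 'B', 'C', 'A']
```

Because every process evaluates the same function of the shared clock, the current owner is common knowledge without any communication.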

(Diagram: repeating TDM slots A, B, C, A, ...)
16
Consensus/Agreement Problems
  • Problem 1: consensus, synchronous setting,
    unreliable communication: impossible.
  • Problem 2: consensus, asynchronous setting,
    unreliable communication: impossible.
  • (Problem 1 is a special case of Problem 2.)

17
FLP Impossibility Result
  • Fischer, Lynch and Paterson 1985
  • There is no deterministic algorithm solving the
    consensus problem in an asynchronous distributed
    system with a single crash failure

18
FLP Impossibility Result
  • A crashed process cannot be distinguished from a
    slow one.
  • Not even with a 100% reliable communication
    network
  • There is always a chance that some continuation
    of the processes' execution avoids consensus
    being reached.
  • No guarantee of consensus, but
  • Prob(consensus) > 0
  • Solutions to this problem are based on fault
    masking or failure detectors

19
Failure Masking
  • A service masks a failure either by hiding it or
    converting it into a more acceptable form
  • Checksums are used to mask corrupted messages
    (converting an arbitrary failure into an
    omission failure). Omission failures can be
    handled by retransmitting
  • Replication can be used to mask process
    failures, replacing the failed process and
    restoring its information from memory/disk

20
Failure Detection
  • Processes can agree to consider a process that
    has not responded for more than some bounded
    time to have failed
  • Even if the process eventually responds, its
    answer will be discarded, effectively turning
    the asynchronous system into a synchronous one
  • Timeouts can be adapted according to observed
    response times
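A sketch of such a timeout-based detector with adaptive timeouts; the class interface and the mean-plus-four-standard-deviations bound are illustrative choices, not part of the slides:

```python
class FailureDetector:
    """Suspect a process once it has been silent longer than its
    adaptive timeout, derived from observed response times."""

    def __init__(self):
        self.samples = {}    # process -> observed response times
        self.last_seen = {}  # process -> time of last message

    def record(self, proc, now, response_time):
        self.samples.setdefault(proc, []).append(response_time)
        self.last_seen[proc] = now

    def timeout(self, proc):
        # Adaptive bound: mean + 4 * stddev of observed response times.
        s = self.samples.get(proc, [1.0])
        mean = sum(s) / len(s)
        var = sum((x - mean) ** 2 for x in s) / len(s)
        return mean + 4 * var ** 0.5

    def suspects(self, now):
        """Processes silent longer than their current timeout."""
        return {p for p, t in self.last_seen.items()
                if now - t > self.timeout(p)}

fd = FailureDetector()
fd.record("p1", now=0.0, response_time=0.1)
fd.record("p2", now=0.0, response_time=0.1)
fd.record("p2", now=4.95, response_time=0.1)
print(fd.suspects(now=5.0))  # {'p1'} -- p2 responded recently
```

A late reply from a suspected process is simply discarded, which is what makes the system behave synchronously from the algorithm's point of view.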

21
FLP - Main results
  • Proves the impossibility of fault-tolerant
    consensus
  • Every asynchronous fault-tolerant consensus
    algorithm has an infinite run in which no process
    decides
  • It is possible to design asynchronous consensus
    algorithms that don't always terminate

22
The Failure Detector abstraction (Chandra and
Toueg, 1996)
  • Showed that FLP applies to many problems, not
    just consensus
  • In particular, they show that FLP applies to
    group membership, reliable multicast
  • So these practical problems are impossible in
    asynchronous systems, in a formal sense
  • Chandra/Toueg also look at the weakest
    conditions under which consensus can be solved
    for asynchronous systems with
  • reliable communication
  • fewer than N/2 process crashes

23
Distributed embedded systems
Transport layer provides a message-based
programming interface: send_msg(adrs, data1). Data
must be broken into packets at the source and
reassembled at the destination. Data-push
programming: PEs send data to the network when
ready.
(Diagram: PEs connected to a network, with a
sensor and an actuator attached. PEs may be CPUs
or ASICs.)
24
Bus arbitration
  • Fixed: same order of resolution every time.
  • Fair: every PE has the same access over long
    periods.
  • Round-robin: rotate top priority among PEs.

(Diagram: under fixed arbitration the grant order
is always A, B, C; under round-robin the top
priority rotates among A, B, C.)
25
Arbitration and delay
  • Fixed-priority arbitration introduces unbounded
    delay for all but the highest-priority device,
  • unless higher-priority devices are known to have
    limited rates that allow lower-priority devices
    to transmit.
  • Round-robin arbitration introduces bounded delay
    proportional to N.
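The bounded-delay property of round-robin arbitration can be shown with a small simulation; the arbiter function below is a hypothetical sketch, not a real bus protocol:

```python
def round_robin_grant(requests, n, top):
    """Grant the bus to the first requesting PE at or after `top`,
    the current top-priority PE; returns (winner, new_top)."""
    for i in range(n):
        pe = (top + i) % n
        if pe in requests:
            return pe, (pe + 1) % n   # rotate top priority past the winner
    return None, top

# Three PEs all requesting every cycle: each is granted once every
# 3 cycles, so the worst-case wait is bounded by N - 1 = 2 cycles.
top, grants = 0, []
for _ in range(6):
    winner, top = round_robin_grant({0, 1, 2}, 3, top)
    grants.append(winner)
print(grants)  # [0, 1, 2, 0, 1, 2]
```

With fixed priority the same loop would grant PE 0 every cycle and starve PEs 1 and 2 indefinitely.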

26
Multiprocessor networks
  • Multiple DSPs are often connected by high-speed
    networks for signal processing

(Diagram: four DSPs connected by a high-speed
network. SHARC DSP processors (21060) can be
connected in this way to improve processing
performance.)
27
Communication Analysis Message delay
  • Assume
  • single message
  • no contention.
  • Delay:
  • tm = tx + tn + tr
  • (transmitter overhead + network transmit time +
    receiver overhead)
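As a sketch of the delay model with illustrative numbers; the assumption that the network transmit time is message size divided by bandwidth is mine, not stated on the slide:

```python
def message_delay(bits, bandwidth_bps, tx_overhead, rx_overhead):
    """Single message, no contention: tm = tx + tn + tr,
    with the network transmit time tn = bits / bandwidth."""
    tn = bits / bandwidth_bps
    return tx_overhead + tn + rx_overhead

# A 1000-bit packet over a 1 Mbit/s link with 0.1 ms overhead per side:
tm = message_delay(1000, 1_000_000, 0.0001, 0.0001)
print(f"{tm * 1000:.1f} ms")  # 1.2 ms
```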

28
Multiple messages
  • If messages can interfere with each other,
    analysis is more complex.
  • Model total message delay:
  • ty = td + tm
  • (td = wait time for the network, tm = message
    delay)
  • Further complications
  • Acknowledgment time.
  • Transmission errors.

Message wait time and delay are normally random
variables.
29
Distributed Tasks
  • Task graph
  • Network

(Diagram: task graph with tasks P1, P2, P3 and
data dependencies d1, d2; network of PEs M1, M2,
M3.)
30
Initial schedule
(Gantt chart: P1 runs on M1, P2 on M2, P3 on M3;
d1 and d2 are sent over the network; time axis
from 0 to 20.)
31
New design
  • Modify P3
  • reads one packet of d1, one packet of d2
  • computes partial result
  • continues to next packet

32
New schedule
(Gantt chart: P3 now runs in several slices on M3,
interleaved with the packetized transfers of d1
and d2 over the network; time axis from 0 to 20.)
33
Priority inversion in networks
  • In many networks, a packet cannot be interrupted.
  • Result is priority inversion
  • low-priority message holds up higher-priority
    message.
  • Doesn't cause deadlock, but can slow down
    important communications.

34
System performance analysis
  • System analysis is difficult in general.
  • multiprocessor performance analysis is hard
  • communication performance analysis is hard.
  • Simple example: uncertainty in P1's finish time
    leads to uncertainty in P2's start time.

35
Lower bounds on system
  • Computational requirements
  • sum up process requirements over least-common
    multiple of periods, average over one period.
  • Communication requirements
  • Count all transmissions in one period.
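The computational lower bound can be sketched as follows; the periods and computation times are illustrative:

```python
from math import lcm
from functools import reduce

# Periodic processes: (period, computation time per period), in ticks.
procs = [(10, 2), (20, 5), (40, 4)]

# Sum process requirements over the least common multiple of the
# periods (the hyperperiod), then average over one period.
hyper = reduce(lcm, (p for p, _ in procs))
total = sum((hyper // p) * c for p, c in procs)
utilization = total / hyper
print(hyper, total, utilization)  # 40 22 0.55
```

A utilization above 1.0 would prove the workload infeasible on a single PE; this is a lower bound only, since it ignores scheduling and communication overheads.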

36
Hardware platform design
  • Need to choose
  • number and types of PEs
  • number and types of networks.
  • Evaluate a platform by allocating processes,
    scheduling processes and communication.

37
I/O-intensive systems
  • Start with I/O devices, then consider
    computation:
  • inventory the required devices
  • identify critical deadlines
  • choose devices that can share PEs
  • analyze communication times
  • choose PEs to go with the devices.

38
Computation-intensive systems
  • Start with shortest-deadline tasks
  • Put shortest-deadline tasks on separate PEs.
  • Check for interference on critical
    communications.
  • Allocate low-priority tasks to common PEs
    wherever possible.
  • Balance loads wherever possible.

39
Parallel/Distributed Systems
  • Distributed systems have similarities with
    parallel systems
  • Parallel systems are concerned mostly (only) with
    improving performance
  • Distributed systems are concerned with
    performance, fault tolerance, scalability,
    dependability etc.
  • Scheduling problem is similar and important in
    both
  • Distributed performance, real time deadlines
  • Parallel performance

40
Parallel Task Scheduling (PTS)
  • Divide Workload
  • Parallel Execution
  • Considers only the dependencies between tasks
    due to data
  • The problem: what is the best way to do this?

41
PTS Example
  • Computing n! when n is large
  • Idea: with 4 processors, divide the workload
    among them to lower the computation time
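One possible realization using Python's multiprocessing; the 4-way split mirrors the slide, while the chunk boundaries and helper names are illustrative choices:

```python
from math import prod
from multiprocessing import Pool

def partial_product(rng):
    """Multiply the integers in [lo, hi) -- one processor's share."""
    lo, hi = rng
    return prod(range(lo, hi))

def parallel_factorial(n, workers=4):
    """Split 1..n into `workers` contiguous ranges, multiply each
    range in a separate process, then combine the partial results
    (the combining step is the 'communication' in the diagram)."""
    step = n // workers
    bounds = [1 + i * step for i in range(workers)] + [n + 1]
    ranges = list(zip(bounds, bounds[1:]))
    with Pool(workers) as pool:
        parts = pool.map(partial_product, ranges)
    return prod(parts)

if __name__ == "__main__":
    from math import factorial
    assert parallel_factorial(20) == factorial(20)
```

The speedup is limited by the cost of starting the workers and combining the results, which is exactly the computation/communication tradeoff discussed later.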

(Diagram: processors P1..P4 compute partial
results and communicate to combine them.)
42
PTS Representation of the Problem
  • Precedence graph = job
  • Nodes = tasks
  • Blue: execution times
  • Red: communication times

43
PTS Communication Latency
  • Imposed constraint: communication latency
  • The problem becomes NP-hard

44
Parallel Task Scheduling (PTS)
45
Minimum Cut in a Graph
  • Statement of the problem:
  • G = (V, E)
  • Source s ∈ V
  • Sink t ∈ V
  • Find cut ⊆ E such that in G' = (V, E \ cut)
    there is no path from s to t, and the sum of the
    edge weights in cut is minimized
  • k-cut ⊆ E such that G' = (V, E \ k-cut) has k
    components and the sum of the edge weights in
    k-cut is minimized
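The s-t variant is solvable in polynomial time via max-flow (by the max-flow min-cut theorem). A compact Edmonds-Karp sketch with an illustrative graph:

```python
from collections import deque, defaultdict

def min_cut(graph, s, t):
    """s-t min cut via Edmonds-Karp max flow (max-flow = min-cut).
    `graph` maps u -> {v: capacity}; returns (cut value, source side)."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, edges in graph.items():
        for v, c in edges.items():
            cap[u][v] += c
    flow_value = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent, q = {s: None}, deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        # Find the bottleneck capacity along the path and augment.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow_value += bottleneck
    # Source side of the cut = vertices reachable in the residual graph.
    side, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v, c in cap[u].items():
            if c > 0 and v not in side:
                side.add(v)
                q.append(v)
    return flow_value, side

g = {"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}}
value, source_side = min_cut(g, "s", "t")
print(value, sorted(source_side))  # 4 ['a', 's']
```

The k-cut generalization needed for partitioning onto k processors is the NP-hard version mentioned on the latency slide (for fixed k it admits polynomial algorithms, but they are far more involved).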

46
Parallel Task Scheduling
  • Proposed Algorithm
  • Theory
  • Minimize the amount of communication
  • Partition the graph into separate components
  • One processor executes one component
  • Minimize arcs going between components → min
    k-cut!

47
Partition
Now each processor has to schedule its tasks
internally to comply with the dependencies
(critical path). Mapping too many tasks to a
single processor will decrease overall
performance. There is a tradeoff between
computation, communication and task granularity.