Title: Review Last Week
1. Review Last Week
- Coordination
- Mutual exclusion
- Election algorithms
- Multicasting
2. Today
- Consensus/Agreement with faulty processes
- Impossibility of consensus in asynchronous
systems - Networks, task graphs and scheduling
3. Agreement
- Agreement: processes agree on a value after one or more of the processes has proposed what that value should be
- We have seen some related examples:
  - in mutual exclusion, processes agree on which process should enter the critical region
  - in elections, processes agree on who should be the coordinator
- Some of these algorithms make strong assumptions about channel reliability and faulty processes
4. Consensus/Agreement
- Requirements of a consensus algorithm:
  - Termination: eventually every process sets its decision variable
  - Agreement: the decision value of all correct processes is the same
  - Validity: if a process decides on a value, then there was a process that started with that value
5. Agreement in faulty systems
- We will study other forms of agreement in faulty
systems under the assumption that communication
channels are reliable and the system is
synchronous
6. Agreement in Faulty Systems (1)
- How does a process group deal with a faulty member?
- The Byzantine Generals Problem: the generals must agree to attack or retreat.
- Here: 3 loyal generals and 1 traitor (the fault).
- The generals announce their troop strengths (in units of 1 kilosoldier) to the other members of the group by sending messages.
7. Agreement in Faulty Systems (2)
- (b) The vectors that each general assembles based on the announcements in (a). Each general knows its own strength. The generals then send their vectors to all the other generals.
- (c) The vectors that each general receives in step 3.
- It is clear to all that General 3 is the traitor: in each column, the majority value is assumed to be correct.
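The vector-exchange scheme above can be sketched as a toy simulation. The traitor's index, the troop strengths, and the specific lies it tells are hypothetical choices for illustration, not from the slides; the point is that per-column majority voting lets the 3 loyal generals agree on each other's values and expose the traitor's column.

```python
# Toy re-enactment of the four-general example (3 loyal, 1 traitor).
from collections import Counter

N = 4
TRAITOR = 2                   # 0-based index of the traitor (hypothetical)
strength = [1, 2, None, 4]    # kilosoldiers; the traitor's true value never matters

def announce(sender, receiver):
    """Step 1: each general announces its strength; the traitor lies differently to everyone."""
    if sender == TRAITOR:
        return 90 + receiver
    return strength[sender]

# Step 2: every general builds the vector of values it heard.
vectors = {g: [announce(s, g) for s in range(N)] for g in range(N)}
for g in range(N):
    vectors[g][g] = strength[g] if g != TRAITOR else 0

def relay(sender, receiver):
    """Step 3: generals exchange their vectors; the traitor lies again."""
    if sender == TRAITOR:
        return [70 + receiver + i for i in range(N)]
    return vectors[sender]

def decide(g):
    """Each loyal general takes the per-column majority over the relayed vectors."""
    received = [relay(s, g) for s in range(N) if s != g]
    decision = []
    for col in range(N):
        value, count = Counter(v[col] for v in received).most_common(1)[0]
        decision.append(value if count > 1 else None)  # no majority -> unknown
    return decision

for g in range(N):
    if g != TRAITOR:
        print(g, decide(g))   # every loyal general gets [1, 2, None, 4]
```

Each loyal general recovers the true strengths of the other loyal generals, while the traitor's column has no majority, marking it as faulty.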
8. Agreement in Faulty Systems (3)
Unfortunately, the algorithm does not always work!
- The same algorithm, except now with 2 loyal generals and 1 traitor.
- It is no longer possible to determine the majority value in each column; the algorithm has failed to produce agreement.
9. Agreement in Faulty Systems (4)
- Lamport et al. showed that no solution exists if N ≤ 3f (with N the total number of processes and f the number of faulty processes)
- They gave an algorithm that solves the Byzantine generals problem in a synchronous system if N ≥ 3f + 1
10. Agreement with unreliable communication
- The two-army problem: two generals of the yellow army, with 3,000 soldiers each, must agree on attacking the blue army, which has 4,000 soldiers.
11. Two-army problem
- Suppose general A sends the message "Attack at 12" to B
- General A won't attack alone, since it would be defeated.
- A doesn't know whether B has received the message.
- B knows that A may not be sure that B received the message, so B sends an agreement message.
12. Two-army problem
Now A sends an ack to make sure that B knows A got the confirmation. B gets the ack, but wants to make sure A knows he got the ack. A gets that ack, but wants to make sure B ...
By sending and receiving acks, both generals gain more knowledge, but never common knowledge.
13. Reliable and bounded-time communication
- If A knows that B will receive any message that A sends within one minute of A's sending it, then if A sends "Attack at 12"
- A knows that within two minutes A and B will have common knowledge: "A says attack at 12"
14. Conclusion
- Common knowledge is unattainable in systems with unreliable communication (or with unbounded delay)
- Common knowledge is attainable in systems with reliable communication in bounded time
15. TDM and Common Knowledge
- In a synchronous system with a global clock, common knowledge can be gained by the passage of time (no message passing)
- For mutual exclusion using time division multiplexing (TDM), processes enter the critical region in their pre-assigned slots
(Diagram: successive TDM slots assigned to A, B, C, A, ...)
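TDM-based mutual exclusion can be sketched in a few lines: given the shared clock, every process computes the same slot owner from the current time, so no messages are needed and no two processes can be in the critical region at once. The slot length and process names below are assumptions for illustration.

```python
# Minimal sketch of TDM-based mutual exclusion with a global clock.

SLOT = 10              # slot length in time units (hypothetical)
PROCS = ["A", "B", "C"]

def holder(t):
    """Which process may be in the critical region at global time t."""
    return PROCS[(t // SLOT) % len(PROCS)]

# Time 0-9 belongs to A, 10-19 to B, 20-29 to C, 30-39 to A again, ...
schedule = [holder(t) for t in (0, 10, 20, 30)]
print(schedule)   # ['A', 'B', 'C', 'A']
```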
16. Consensus/Agreement Problems
- Consensus in synchronous settings with unreliable communication: impossible.
- Consensus in asynchronous settings with unreliable communication: impossible.
- (Problem 1 is a special case of Problem 2.)
17. FLP Impossibility Result
- Fischer, Lynch and Paterson 1985
- There is no deterministic algorithm solving the
consensus problem in an asynchronous distributed
system with a single crash failure
18. FLP Impossibility Result
- A crashed process cannot be distinguished from a slow one.
- Not even with a 100% reliable communication network.
- There is always a chance that some continuation of the processes' execution avoids consensus being reached.
- No guarantee of consensus, but Prob(consensus) > 0
- Solutions to this problem are based on fault masking or failure detectors
19. Failure Masking
- A service masks a failure either by hiding it or by converting it into a more acceptable form
- Checksums are used to mask corrupted messages (converting arbitrary failures into omission failures). Omission failures can be handled by retransmitting
- Replication can be used to mask process failures, replacing the process and restoring its information from memory/disk
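The checksum conversion can be sketched as follows: the receiver verifies a checksum and silently drops corrupted messages, so an arbitrary (corruption) failure becomes an omission failure, which the sender masks by retransmitting. `zlib.crc32` here is just a stand-in for whatever checksum the link actually uses.

```python
# Sketch: converting corruption failures into omission failures via checksums.
import zlib

def frame(payload: bytes) -> bytes:
    """Append a CRC32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def receive(data: bytes):
    """Return the payload, or None (an omission) if the checksum fails."""
    payload, crc = data[:-4], int.from_bytes(data[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None

msg = frame(b"attack at 12")
assert receive(msg) == b"attack at 12"   # intact message delivered

corrupted = b"x" + msg[1:]               # corrupt the first byte in transit
assert receive(corrupted) is None        # corruption looks like an omission
```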
20. Failure Detection
- Processes can agree to believe that a process that has not responded for more than some bounded time has failed
- Even if the process eventually responds, its answer will be discarded, effectively turning the asynchronous system into a synchronous one
- Timeouts can be adapted according to observed response times
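A timeout-based failure detector with an adaptive bound can be sketched like this. The particular rule (mean plus four standard deviations of observed response times) and the initial default are assumptions for illustration; the slides only say the timeout adapts to observed response times.

```python
# Sketch of a timeout-based failure detector with an adaptive bound.
import statistics

class FailureDetector:
    def __init__(self):
        self.samples = []      # observed response times
        self.last_heard = {}   # process -> time of last reply

    def record(self, proc, now, response_time):
        self.samples.append(response_time)
        self.last_heard[proc] = now

    def timeout(self):
        if len(self.samples) < 2:
            return 1.0         # initial guess (hypothetical default)
        mean = statistics.mean(self.samples)
        std = statistics.pstdev(self.samples)
        return mean + 4 * std  # assumed adaptation rule

    def suspected(self, proc, now):
        """Suspect a process silent for longer than the adapted bound."""
        return now - self.last_heard[proc] > self.timeout()

fd = FailureDetector()
for t, rt in [(0, 0.10), (1, 0.12), (2, 0.11)]:
    fd.record("p2", t, rt)
print(fd.suspected("p2", 2.2))    # quiet well past the bound -> suspected
print(fd.suspected("p2", 2.05))   # within the bound -> not suspected
```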
21. FLP - Main results
- Proves the impossibility of fault-tolerant consensus
- Every asynchronous fault-tolerant consensus algorithm has an infinite run in which no process decides
- It is possible to design asynchronous consensus algorithms that don't always terminate
22. The Failure Detectors Abstraction (Chandra/Toueg 1996)
- Showed that FLP applies to many problems, not just consensus
- In particular, they show that FLP applies to group membership and reliable multicast
- So these practical problems are impossible in asynchronous systems, in a formal sense
- Chandra/Toueg also look at the weakest conditions under which consensus can be solved for asynchronous systems with:
  - reliable communication
  - fewer than N/2 crashed processes
23. Distributed embedded systems
The transport layer provides a message-based programming interface: send_msg(adrs, data1). Data must be broken into packets at the source and reassembled at the destination. Data-push programming: a PE sends data to the network when the data is ready.
(Diagram: PEs connected by a network, with a sensor attached to one PE and an actuator to another. PEs may be CPUs or ASICs.)
24. Bus arbitration
- Fixed: same order of resolution every time.
- Fair: every PE gets the same access over long periods.
  - Round-robin: rotate top priority among the PEs.
(Diagram: grant sequences under fixed and round-robin arbitration when A, B and C all request the bus.)
25. Arbitration and delay
- Fixed-priority arbitration introduces unbounded delay for all but the highest-priority device,
  - unless higher-priority devices are known to have limited rates that allow lower-priority devices to transmit.
- Round-robin arbitration introduces bounded delay proportional to N.
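The contrast between the two policies can be sketched directly: under fixed priority a low-priority PE starves while higher-priority PEs keep requesting, while round-robin rotation bounds each PE's wait to at most N-1 grants. The PE names and request pattern are illustrative assumptions.

```python
# Sketch contrasting fixed-priority and round-robin bus arbitration.

PES = ["A", "B", "C"]   # in fixed-priority order, A highest

def fixed(requests):
    """Grant the highest-priority (lowest-indexed) requester each cycle."""
    return [next(pe for pe in PES if pe in req) for req in requests]

def round_robin(requests):
    """Rotate top priority: start searching just after the last winner."""
    grants, start = [], 0
    for req in requests:
        for i in range(len(PES)):
            pe = PES[(start + i) % len(PES)]
            if pe in req:
                grants.append(pe)
                start = (PES.index(pe) + 1) % len(PES)
                break
    return grants

everyone = [{"A", "B", "C"}] * 6      # all PEs request on every cycle
print(fixed(everyone))                # A wins every cycle; C starves
print(round_robin(everyone))          # grants rotate; wait bounded by N-1
```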
26. Multiprocessor networks
- Multiple DSPs are often connected by high-speed
networks for signal processing
(Diagram: four DSPs connected by a high-speed network. SHARC DSP processors (21060) can be connected in this way to improve processing performance.)
27. Communication Analysis: Message delay
- Assume:
  - a single message
  - no contention.
- Delay:
  - tm = tx + tn + tr
  - (transmitter overhead + network transmission time + receiver overhead)
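A worked instance of the single-message formula, with hypothetical numbers just to show the bookkeeping:

```python
# Worked instance of tm = tx + tn + tr (values in ms, hypothetical).

tx = 0.2   # transmitter overhead
tn = 1.5   # network transmission time
tr = 0.3   # receiver overhead

tm = tx + tn + tr   # total message delay, single message, no contention
print(tm)           # 2.0
```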
28. Multiple messages
- If messages can interfere with each other, the analysis is more complex.
- Model of total message delay:
  - ty = td + tm
  - (wait time for network + message delay)
- Further complications:
  - acknowledgment time
  - transmission errors.
Message wait time and delay are normally random variables.
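Since the wait time td is a random variable, ty = td + tm is usually characterized by its distribution. A small Monte Carlo sketch makes this concrete; the exponential wait-time model and its mean are assumptions for illustration, not from the slides.

```python
# Monte Carlo sketch of ty = td + tm with a random wait time td.
import random

rng = random.Random(1)
tm = 2.0                                   # fixed message delay (ms)
samples = [rng.expovariate(1 / 0.5) + tm   # assumed td ~ Exp(mean 0.5 ms)
           for _ in range(100_000)]
mean_ty = sum(samples) / len(samples)
print(round(mean_ty, 2))                   # close to 2.0 + 0.5 = 2.5 ms
```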
29. Distributed Tasks
(Task graph: processes P1, P2, P3 on processors M1, M2, M3; P3 receives data d1 from P1 and d2 from P2.)
30. Initial schedule
(Gantt chart over time 0 to 20: M1 runs P1 and M2 runs P2; the network then carries d1 followed by d2; M3 runs P3 only after both transfers complete.)
31. New design
- Modify P3
- reads one packet of d1, one packet of d2
- computes partial result
- continues to next packet
32. New schedule
(Gantt chart over time 0 to 20: M1 runs P1 and M2 runs P2; the network interleaves packets of d1 and d2; M3 runs P3 in several partial executions, one per packet pair, finishing earlier than before.)
33. Priority inversion in networks
- In many networks, a packet cannot be interrupted.
- The result is priority inversion: a low-priority message holds up a higher-priority message.
- It doesn't cause deadlock, but it can slow down important communications.
34. System performance analysis
- System analysis is difficult in general:
  - multiprocessor performance analysis is hard
  - communication performance analysis is hard.
- Simple example: uncertainty in P1's finish time → uncertainty in P2's start time.
35. Lower bounds on the system
- Computational requirements:
  - sum up the process requirements over the least common multiple of the periods, then average over one period.
- Communication requirements:
  - count all transmissions in one period.
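The computational lower bound can be sketched as a short calculation: sum each process's demand over the hyperperiod (the LCM of the periods), then divide by the hyperperiod to get the required fraction of one PE. The task set below is hypothetical.

```python
# Sketch of the computational lower bound over the hyperperiod.
from math import lcm

tasks = [   # (period, execution time) per process, hypothetical values
    (10, 2),
    (15, 3),
    (30, 6),
]

hyper = lcm(*[p for p, _ in tasks])               # LCM of periods = 30
demand = sum((hyper // p) * c for p, c in tasks)  # 3*2 + 2*3 + 1*6 = 18
utilization = demand / hyper                      # 0.6 -> at least 60% of one PE
print(hyper, demand, utilization)
```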
36. Hardware platform design
- Need to choose
- number and types of PEs
- number and types of networks.
- Evaluate a platform by allocating processes,
scheduling processes and communication.
37. I/O-intensive systems
- Start with the I/O devices, then consider computation:
  - inventory the required devices
  - identify critical deadlines
  - choose devices that can share PEs
  - analyze communication times
  - choose PEs to go with the devices.
38. Computation-intensive systems
- Start with the shortest-deadline tasks:
  - put shortest-deadline tasks on separate PEs
  - check for interference on critical communications
  - allocate low-priority tasks to shared PEs wherever possible
  - balance loads wherever possible.
39. Parallel/Distributed Systems
- Distributed systems have similarities with parallel systems
- Parallel systems are concerned mostly (if not only) with improving performance
- Distributed systems are concerned with performance, fault tolerance, scalability, dependability, etc.
- The scheduling problem is similar and important in both:
  - distributed: performance, real-time deadlines
  - parallel: performance
40. Parallel Task Scheduling (PTS)
- Divide the workload
- Execute in parallel
- Considers only the dependencies between tasks due to data
- The problem: what is the best way to do this?
41. PTS Example
- Computing n! when n is large
- Idea: having 4 processors, divide the workload among them to lower the computation time
(Diagram: the workload is divided among processors P1, P2, P3, P4, with communication to combine their partial results.)
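The n! example can be sketched as follows: split the range 1..n into contiguous chunks, let each "processor" multiply its own chunk (the parallel work), then multiply the partial results together (the communication/combination step). The chunking scheme is one obvious choice, not the only one.

```python
# Sketch of the parallel n! example: 4 chunks, combined at the end.
from math import factorial, prod

def parallel_factorial(n, workers=4):
    bounds = [n * i // workers for i in range(workers + 1)]
    partials = [prod(range(lo + 1, hi + 1))   # work done on each PE
                for lo, hi in zip(bounds, bounds[1:])]
    result = 1
    for p in partials:                        # combine: the communication step
        result *= p
    return result

print(parallel_factorial(20) == factorial(20))   # True
```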
42. PTS: Representation of the Problem
- Precedence graph = job
  - nodes: tasks
  - blue: execution times
  - red: communication times
43. PTS: Communication Latency
- Imposed constraints:
  - communication latency
- The problem becomes NP-hard
44. Parallel Task Scheduling (PTS)
45. Minimum Cut in a Graph
- Statement of the problem:
  - G = (V, E)
  - source s ∈ V
  - sink t ∈ V
  - Find cut ⊆ E such that in G' = (V, E \ cut) there is no path from s to t, and the sum of the edge weights in cut is minimized
  - k-cut ⊆ E such that G' = (V, E \ k-cut) has k components, and the sum of the edge weights in k-cut is minimized
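The min s-t cut can be computed via max-flow (by the max-flow/min-cut theorem). The Edmonds-Karp algorithm below is one standard way to do this; the example graph is made up for illustration and is not the task graph from the slides.

```python
# Sketch of min s-t cut via Edmonds-Karp max-flow (max-flow = min-cut).
from collections import deque

def max_flow(cap, s, t):
    """cap: dict {u: {v: capacity}}; returns max flow = min cut value."""
    # Build a residual graph, adding zero-capacity reverse edges.
    res = {u: dict(adj) for u, adj in cap.items()}
    for u, adj in cap.items():
        for v in adj:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path from s to t.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck capacity and augment along the path.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
        flow += bottleneck

graph = {"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}, "t": {}}
print(max_flow(graph, "s", "t"))   # 4: the cut is {s->b (2), a->t (2)}
```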
46. Parallel Task Scheduling
- Proposed algorithm:
  - Theory:
    - minimize the amount of communication
    - partition the graph into separate components
    - one processor executes one component
    - minimizing the arcs going between components → min k-cut!
47. Partition
Now each processor has to schedule its tasks internally to comply with the dependencies.
Critical path
Mapping many tasks to a single processor will decrease overall performance.
There is a tradeoff between computation, communication and task granularity.
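The critical path mentioned above (the longest chain of execution times through the dependencies, which lower-bounds the schedule length) can be sketched as a short computation. The task graph and execution times below are hypothetical.

```python
# Sketch: critical-path length of a task precedence graph.

def critical_path(deps, time):
    """deps: task -> list of predecessors; time: task -> execution time."""
    finish = {}
    def earliest_finish(t):
        if t not in finish:
            start = max((earliest_finish(p) for p in deps[t]), default=0)
            finish[t] = start + time[t]
        return finish[t]
    return max(earliest_finish(t) for t in deps)

deps = {"T1": [], "T2": [], "T3": ["T1", "T2"], "T4": ["T3"]}
time = {"T1": 4, "T2": 2, "T3": 3, "T4": 1}
print(critical_path(deps, time))   # 8: the chain T1 -> T3 -> T4
```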