Title: Review Last Week
1. Review Last Week
- Coordination
- Mutual exclusion
- Election algorithms
- Multicasting
2. Today
- Consensus/Agreement with faulty processes
- Impossibility of consensus in asynchronous
systems - Networks, task graphs and scheduling
3. Agreement
- Agreement: processes agree on a value after one or more of the processes has proposed what that value should be
- We have seen some related examples:
  - in mutual exclusion, processes agree on which process should enter the critical region
  - in elections, processes agree on who should be the coordinator
- Some of these algorithms make strong assumptions about channel reliability and faulty processes
4. Consensus/Agreement
- Requirements of a consensus algorithm:
  - Termination: eventually every process sets its decision variable
  - Agreement: the decision value of all correct processes is the same
  - Validity: if a process decides on a value, then there was a process that started with that value
5. Agreement in faulty systems
- We will study other forms of agreement in faulty
systems under the assumption that communication
channels are reliable and the system is
synchronous
6. Agreement in Faulty Systems (1)
- How does a process group deal with a faulty member?
- The Byzantine Generals Problem: the generals must agree to attack or retreat.
- Here: 3 loyal generals and 1 traitor (the fault).
- The generals announce their troop strengths (in units of 1 kilosoldier) to the other members of the group by sending messages.
7. Agreement in Faulty Systems (2)
- (b) The vectors that each general assembles based on the announcements in (a). Each general knows its own strength. The generals then send their vectors to all the other generals.
- (c) The vectors that each general receives in step 3.
- It is clear to all that General 3 is the traitor: in each column, the majority value is assumed to be correct.
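The vector-exchange scheme above can be sketched as a toy simulation. The traitor's index, the troop strengths, and the specific lies it tells are hypothetical choices for illustration, not from the slides; the point is that per-column majority voting lets the 3 loyal generals agree on each other's values and expose the traitor's column.

```python
# Toy re-enactment of the four-general example (3 loyal, 1 traitor).
from collections import Counter

N = 4
TRAITOR = 2                   # 0-based index of the traitor (hypothetical)
strength = [1, 2, None, 4]    # kilosoldiers; the traitor's true value never matters

def announce(sender, receiver):
    """Step 1: each general announces its strength; the traitor lies differently to everyone."""
    if sender == TRAITOR:
        return 90 + receiver
    return strength[sender]

# Step 2: every general builds the vector of values it heard.
vectors = {g: [announce(s, g) for s in range(N)] for g in range(N)}
for g in range(N):
    vectors[g][g] = strength[g] if g != TRAITOR else 0

def relay(sender, receiver):
    """Step 3: generals exchange their vectors; the traitor lies again."""
    if sender == TRAITOR:
        return [70 + receiver + i for i in range(N)]
    return vectors[sender]

def decide(g):
    """Each loyal general takes the per-column majority over the relayed vectors."""
    received = [relay(s, g) for s in range(N) if s != g]
    decision = []
    for col in range(N):
        value, count = Counter(v[col] for v in received).most_common(1)[0]
        decision.append(value if count > 1 else None)  # no majority -> unknown
    return decision

for g in range(N):
    if g != TRAITOR:
        print(g, decide(g))   # every loyal general gets [1, 2, None, 4]
```

Each loyal general recovers the true strengths of the other loyal generals, while the traitor's column has no majority, marking it as faulty.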
8. Agreement in Faulty Systems (3)
Unfortunately, the algorithm does not always work!
- The same algorithm, except now with 2 loyal generals and 1 traitor.
- It is no longer possible to determine the majority value in each column; the algorithm has failed to produce agreement.
9. Agreement in Faulty Systems (4)
- Lamport et al. showed that no solution exists if N ≤ 3f (with N the total number of processes and f the number of faulty processes)
- They gave an algorithm that solves the Byzantine generals problem in a synchronous system if N ≥ 3f + 1
10. Agreement with unreliable communication
- The two-army problem: two generals of the yellow army, with 3,000 soldiers each, must agree on attacking the blue army, which has 4,000 soldiers.
11. Two-army problem
- Suppose general A sends the message "Attack at 12" to B
- General A won't attack alone, since it would be defeated.
- A doesn't know whether B has received the message.
- B knows that A may not be sure that B received the message, so B sends an agreement message.
12. Two-army problem
Now A sends an ack to make sure that B knows A got the confirmation. B gets the ack, but wants to make sure A knows he got the ack. A gets that ack, but wants to make sure B ...
By sending and receiving acks, both generals gain more knowledge, but never common knowledge.
13. Reliable and bounded-time communication
- If A knows that B will receive any message that A sends within one minute of A's sending it, then if A sends "Attack at 12"
- A knows that within two minutes A and B will have common knowledge: "A says attack at 12"
14. Conclusion
- Common knowledge is unattainable in systems with unreliable communication (or with unbounded delay)
- Common knowledge is attainable in systems with reliable communication in bounded time
15. TDM and Common Knowledge
- In a synchronous system with a global clock, common knowledge can be gained by the passage of time (no message passing)
- For mutual exclusion using time division multiplexing (TDM), processes enter the critical region in their pre-assigned slots
(Diagram: successive TDM slots assigned to A, B, C, A, ...)
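TDM-based mutual exclusion can be sketched in a few lines: given the shared clock, every process computes the same slot owner from the current time, so no messages are needed and no two processes can be in the critical region at once. The slot length and process names below are assumptions for illustration.

```python
# Minimal sketch of TDM-based mutual exclusion with a global clock.

SLOT = 10              # slot length in time units (hypothetical)
PROCS = ["A", "B", "C"]

def holder(t):
    """Which process may be in the critical region at global time t."""
    return PROCS[(t // SLOT) % len(PROCS)]

# Time 0-9 belongs to A, 10-19 to B, 20-29 to C, 30-39 to A again, ...
schedule = [holder(t) for t in (0, 10, 20, 30)]
print(schedule)   # ['A', 'B', 'C', 'A']
```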
16. Consensus/Agreement Problems
- Consensus in synchronous settings with unreliable communication: impossible.
- Consensus in asynchronous settings with unreliable communication: impossible.
- (Problem 1 is a special case of Problem 2.)
17. FLP Impossibility Result
- Fischer, Lynch and Paterson 1985
- There is no deterministic algorithm solving the
consensus problem in an asynchronous distributed
system with a single crash failure
18. FLP Impossibility Result
- A crashed process cannot be distinguished from a slow one.
- Not even with a 100% reliable communication network.
- There is always a chance that some continuation of the processes' execution avoids consensus being reached.
- No guarantee of consensus, but Prob(consensus) > 0
- Solutions to this problem are based on fault masking or failure detectors
19. Failure Masking
- A service masks a failure either by hiding it or by converting it into a more acceptable form
- Checksums are used to mask corrupted messages (converting arbitrary failures into omission failures). Omission failures can be handled by retransmitting
- Replication can be used to mask process failures, replacing the process and restoring its information from memory/disk
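The checksum conversion can be sketched as follows: the receiver verifies a checksum and silently drops corrupted messages, so an arbitrary (corruption) failure becomes an omission failure, which the sender masks by retransmitting. `zlib.crc32` here is just a stand-in for whatever checksum the link actually uses.

```python
# Sketch: converting corruption failures into omission failures via checksums.
import zlib

def frame(payload: bytes) -> bytes:
    """Append a CRC32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def receive(data: bytes):
    """Return the payload, or None (an omission) if the checksum fails."""
    payload, crc = data[:-4], int.from_bytes(data[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None

msg = frame(b"attack at 12")
assert receive(msg) == b"attack at 12"   # intact message delivered

corrupted = b"x" + msg[1:]               # corrupt the first byte in transit
assert receive(corrupted) is None        # corruption looks like an omission
```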
20. Failure Detection
- Processes can agree to believe that a process that has not responded for more than some bounded time has failed
- Even if the process eventually responds, its answer will be discarded, effectively turning the asynchronous system into a synchronous one
- Timeouts can be adapted according to observed response times
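A timeout-based failure detector with an adaptive bound can be sketched like this. The particular rule (mean plus four standard deviations of observed response times) and the initial default are assumptions for illustration; the slides only say the timeout adapts to observed response times.

```python
# Sketch of a timeout-based failure detector with an adaptive bound.
import statistics

class FailureDetector:
    def __init__(self):
        self.samples = []      # observed response times
        self.last_heard = {}   # process -> time of last reply

    def record(self, proc, now, response_time):
        self.samples.append(response_time)
        self.last_heard[proc] = now

    def timeout(self):
        if len(self.samples) < 2:
            return 1.0         # initial guess (hypothetical default)
        mean = statistics.mean(self.samples)
        std = statistics.pstdev(self.samples)
        return mean + 4 * std  # assumed adaptation rule

    def suspected(self, proc, now):
        """Suspect a process silent for longer than the adapted bound."""
        return now - self.last_heard[proc] > self.timeout()

fd = FailureDetector()
for t, rt in [(0, 0.10), (1, 0.12), (2, 0.11)]:
    fd.record("p2", t, rt)
print(fd.suspected("p2", 2.2))    # quiet well past the bound -> suspected
print(fd.suspected("p2", 2.05))   # within the bound -> not suspected
```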
21. FLP - Main results
- Proves the impossibility of fault-tolerant consensus
- Every asynchronous fault-tolerant consensus algorithm has an infinite run in which no process decides
- It is possible to design asynchronous consensus algorithms that don't always terminate
22. The Failure Detectors Abstraction (Chandra/Toueg 1996)
- Showed that FLP applies to many problems, not just consensus
- In particular, they show that FLP applies to group membership and reliable multicast
- So these practical problems are impossible in asynchronous systems, in a formal sense
- Chandra/Toueg also look at the weakest conditions under which consensus can be solved for asynchronous systems with:
  - reliable communication
  - fewer than N/2 crashed processes
23. Distributed embedded systems
The transport layer provides a message-based programming interface: send_msg(adrs, data1). Data must be broken into packets at the source and reassembled at the destination. Data-push programming: a PE sends data to the network when the data is ready.
(Diagram: PEs connected by a network, with a sensor attached to one PE and an actuator to another. PEs may be CPUs or ASICs.)
24. Bus arbitration
- Fixed: same order of resolution every time.
- Fair: every PE gets the same access over long periods.
  - Round-robin: rotate top priority among the PEs.
(Diagram: grant sequences under fixed and round-robin arbitration when A, B and C all request the bus.)
25. Arbitration and delay
- Fixed-priority arbitration introduces unbounded delay for all but the highest-priority device,
  - unless higher-priority devices are known to have limited rates that allow lower-priority devices to transmit.
- Round-robin arbitration introduces bounded delay proportional to N.
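The contrast between the two policies can be sketched directly: under fixed priority a low-priority PE starves while higher-priority PEs keep requesting, while round-robin rotation bounds each PE's wait to at most N-1 grants. The PE names and request pattern are illustrative assumptions.

```python
# Sketch contrasting fixed-priority and round-robin bus arbitration.

PES = ["A", "B", "C"]   # in fixed-priority order, A highest

def fixed(requests):
    """Grant the highest-priority (lowest-indexed) requester each cycle."""
    return [next(pe for pe in PES if pe in req) for req in requests]

def round_robin(requests):
    """Rotate top priority: start searching just after the last winner."""
    grants, start = [], 0
    for req in requests:
        for i in range(len(PES)):
            pe = PES[(start + i) % len(PES)]
            if pe in req:
                grants.append(pe)
                start = (PES.index(pe) + 1) % len(PES)
                break
    return grants

everyone = [{"A", "B", "C"}] * 6      # all PEs request on every cycle
print(fixed(everyone))                # A wins every cycle; C starves
print(round_robin(everyone))          # grants rotate; wait bounded by N-1
```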
26. Multiprocessor networks
- Multiple DSPs are often connected by high-speed
networks for signal processing
(Diagram: four DSPs connected by a high-speed network. SHARC DSP processors (21060) can be connected in this way to improve processing performance.)
27. Communication Analysis: Message delay
- Assume:
  - a single message
  - no contention.
- Delay:
  - tm = tx + tn + tr
  - (transmitter overhead + network transmission time + receiver overhead)
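A worked instance of the single-message formula, with hypothetical numbers just to show the bookkeeping:

```python
# Worked instance of tm = tx + tn + tr (values in ms, hypothetical).

tx = 0.2   # transmitter overhead
tn = 1.5   # network transmission time
tr = 0.3   # receiver overhead

tm = tx + tn + tr   # total message delay, single message, no contention
print(tm)           # 2.0
```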
28. Multiple messages
- If messages can interfere with each other, the analysis is more complex.
- Model of total message delay:
  - ty = td + tm
  - (wait time for network + message delay)
- Further complications:
  - acknowledgment time
  - transmission errors.
Message wait time and delay are normally random variables.
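Since the wait time td is a random variable, ty = td + tm is usually characterized by its distribution. A small Monte Carlo sketch makes this concrete; the exponential wait-time model and its mean are assumptions for illustration, not from the slides.

```python
# Monte Carlo sketch of ty = td + tm with a random wait time td.
import random

rng = random.Random(1)
tm = 2.0                                   # fixed message delay (ms)
samples = [rng.expovariate(1 / 0.5) + tm   # assumed td ~ Exp(mean 0.5 ms)
           for _ in range(100_000)]
mean_ty = sum(samples) / len(samples)
print(round(mean_ty, 2))                   # close to 2.0 + 0.5 = 2.5 ms
```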
29. Distributed Tasks
(Task graph: processes P1, P2, P3 on processors M1, M2, M3; P3 receives data d1 from P1 and d2 from P2.)
30. Initial schedule
(Gantt chart over time 0 to 20: M1 runs P1 and M2 runs P2; the network then carries d1 followed by d2; M3 runs P3 only after both transfers complete.)
31. New design
- Modify P3
- reads one packet of d1, one packet of d2
- computes partial result
- continues to next packet
32. New schedule
(Gantt chart over time 0 to 20: M1 runs P1 and M2 runs P2; the network interleaves packets of d1 and d2; M3 runs P3 in several partial executions, one per packet pair, finishing earlier than before.)
33. Priority inversion in networks
- In many networks, a packet cannot be interrupted.
- The result is priority inversion: a low-priority message holds up a higher-priority message.
- It doesn't cause deadlock, but it can slow down important communications.
34. System performance analysis
- System analysis is difficult in general:
  - multiprocessor performance analysis is hard
  - communication performance analysis is hard.
- Simple example: uncertainty in P1's finish time → uncertainty in P2's start time.
35. Lower bounds on the system
- Computational requirements:
  - sum up the process requirements over the least common multiple of the periods, then average over one period.
- Communication requirements:
  - count all transmissions in one period.
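The computational lower bound can be sketched as a short calculation: sum each process's demand over the hyperperiod (the LCM of the periods), then divide by the hyperperiod to get the required fraction of one PE. The task set below is hypothetical.

```python
# Sketch of the computational lower bound over the hyperperiod.
from math import lcm

tasks = [   # (period, execution time) per process, hypothetical values
    (10, 2),
    (15, 3),
    (30, 6),
]

hyper = lcm(*[p for p, _ in tasks])               # LCM of periods = 30
demand = sum((hyper // p) * c for p, c in tasks)  # 3*2 + 2*3 + 1*6 = 18
utilization = demand / hyper                      # 0.6 -> at least 60% of one PE
print(hyper, demand, utilization)
```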
36. Hardware platform design
- Need to choose
- number and types of PEs
- number and types of networks.
- Evaluate a platform by allocating processes,
scheduling processes and communication.
37. I/O-intensive systems
- Start with the I/O devices, then consider computation:
  - inventory the required devices
  - identify critical deadlines
  - choose devices that can share PEs
  - analyze communication times
  - choose PEs to go with the devices.
38. Computation-intensive systems
- Start with the shortest-deadline tasks:
  - put shortest-deadline tasks on separate PEs
  - check for interference on critical communications
  - allocate low-priority tasks to shared PEs wherever possible
  - balance loads wherever possible.
39. Parallel/Distributed Systems
- Distributed systems have similarities with parallel systems
- Parallel systems are concerned mostly (if not only) with improving performance
- Distributed systems are concerned with performance, fault tolerance, scalability, dependability, etc.
- The scheduling problem is similar and important in both:
  - distributed: performance, real-time deadlines
  - parallel: performance
40. Parallel Task Scheduling (PTS)
- Divide the workload
- Execute in parallel
- Considers only the dependencies between tasks due to data
- The problem: what is the best way to do this?
41. PTS Example
- Computing n! when n is large
- Idea: having 4 processors, divide the workload among them to lower the computation time
(Diagram: the workload is divided among processors P1, P2, P3, P4, with communication to combine their partial results.)
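The n! example can be sketched as follows: split the range 1..n into contiguous chunks, let each "processor" multiply its own chunk (the parallel work), then multiply the partial results together (the communication/combination step). The chunking scheme is one obvious choice, not the only one.

```python
# Sketch of the parallel n! example: 4 chunks, combined at the end.
from math import factorial, prod

def parallel_factorial(n, workers=4):
    bounds = [n * i // workers for i in range(workers + 1)]
    partials = [prod(range(lo + 1, hi + 1))   # work done on each PE
                for lo, hi in zip(bounds, bounds[1:])]
    result = 1
    for p in partials:                        # combine: the communication step
        result *= p
    return result

print(parallel_factorial(20) == factorial(20))   # True
```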
42. PTS: Representation of the Problem
- Precedence graph = job
  - nodes: tasks
  - blue: execution times
  - red: communication times
43. PTS: Communication Latency
- Imposed constraints:
  - communication latency
- The problem becomes NP-hard
44. Parallel Task Scheduling (PTS)
45. Minimum Cut in a Graph
- Statement of the problem:
  - G = (V, E)
  - source s ∈ V
  - sink t ∈ V
  - Find cut ⊆ E such that in G' = (V, E \ cut) there is no path from s to t, and the sum of the edge weights in cut is minimized
  - k-cut ⊆ E such that G' = (V, E \ k-cut) has k components, and the sum of the edge weights in k-cut is minimized
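The min s-t cut can be computed via max-flow (by the max-flow/min-cut theorem). The Edmonds-Karp algorithm below is one standard way to do this; the example graph is made up for illustration and is not the task graph from the slides.

```python
# Sketch of min s-t cut via Edmonds-Karp max-flow (max-flow = min-cut).
from collections import deque

def max_flow(cap, s, t):
    """cap: dict {u: {v: capacity}}; returns max flow = min cut value."""
    # Build a residual graph, adding zero-capacity reverse edges.
    res = {u: dict(adj) for u, adj in cap.items()}
    for u, adj in cap.items():
        for v in adj:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path from s to t.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck capacity and augment along the path.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
        flow += bottleneck

graph = {"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}, "t": {}}
print(max_flow(graph, "s", "t"))   # 4: the cut is {s->b (2), a->t (2)}
```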
46. Parallel Task Scheduling
- Proposed algorithm:
  - Theory:
    - minimize the amount of communication
    - partition the graph into separate components
    - one processor executes one component
    - minimizing the arcs going between components → min k-cut!
47. Partition
Now each processor has to schedule its tasks internally to comply with the dependencies.
Critical path
Mapping many tasks to a single processor will decrease overall performance.
There is a tradeoff between computation, communication and task granularity.
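The critical path mentioned above (the longest chain of execution times through the dependencies, which lower-bounds the schedule length) can be sketched as a short computation. The task graph and execution times below are hypothetical.

```python
# Sketch: critical-path length of a task precedence graph.

def critical_path(deps, time):
    """deps: task -> list of predecessors; time: task -> execution time."""
    finish = {}
    def earliest_finish(t):
        if t not in finish:
            start = max((earliest_finish(p) for p in deps[t]), default=0)
            finish[t] = start + time[t]
        return finish[t]
    return max(earliest_finish(t) for t in deps)

deps = {"T1": [], "T2": [], "T3": ["T1", "T2"], "T4": ["T3"]}
time = {"T1": 4, "T2": 2, "T3": 3, "T4": 1}
print(critical_path(deps, time))   # 8: the chain T1 -> T3 -> T4
```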