Title: Array Maximum Problem
1Concurrent Programming
2Model
- The model we are discussing is a computer with multiple processors but only one memory unit.
- All the processors are synchronized by the same clock.
- The processors are all connected to each other and to the memory.
- If more than one processor writes the same value to the same address in memory at the same time, the value is written correctly. If the values are not the same, any one of them may be written.
3Model
- More than one processor can read the same memory address at the same time.
- Other models
  - The processors are on different computers.
  - There is no shared memory for all the processors.
  - The processors do not use the same clock.
4Array Maximum Problem
- On a computer with one processor
  - Time: O(N).
  - Algorithm: go over the array while keeping the maximum seen so far.
- On a computer with K processors
  - Time: O(N/K).
  - Algorithm: each processor handles N/K elements of the array and finds their maximum. The maxima of the parts are then combined into the maximum of the whole array.
5Array Maximum Problem
- On a computer with O(N) processors.
  - Time: O(log(N)).
  - Algorithm: in the first round every processor compares 2 items and keeps the larger, so after the first round we have N/2 numbers. In the next round N/4 processors each take 2 of these numbers and keep the larger, so only N/4 results remain after the second round. After log(N) rounds we have the maximum of the array.
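The rounds described above can be simulated sequentially. This is a minimal sketch, assuming a non-empty input list; the name `parallel_max` is hypothetical, and each list comprehension stands in for one round in which all processors work at the same time.

```python
def parallel_max(values):
    """Simulate the round-based reduction; returns (maximum, rounds)."""
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # One round: each simulated processor compares one pair.
        # An unpaired last element moves to the next round unchanged.
        level = [max(level[i:i + 2]) for i in range(0, len(level), 2)]
        rounds += 1
    return level[0], rounds


print(parallel_max([1, 2, 3, 4, 5, 6, 7, 8]))
```

Replacing `max` with another associative function such as `min` or a sum gives the same round structure.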
6Array Maximum Problem
Example: 8 elements (1 2 3 4 5 6 7 8), time 3 = Log(8).
7Array Maximum Problem
- The number of comparisons performed is 7 (4 in the first round, 2 in the second and 1 in the last). This is the same number of comparisons as in the serial algorithm, but they are done in less time.
- This algorithm works for many other functions besides Max, such as Min, Sum and Avg (computed as Sum divided by N).
- It works for every associative function.
8Finding The Two Greatest Numbers
- Simple solution for O(N) processors.
  - Algorithm: find the maximum, remove it from the array and find the maximum again.
  - Time: 2 Log(N).
- Smart algorithm for O(N) processors.
  - Algorithm
    - First round: each processor handles 2 items, finds the max and puts the other item in a candidate group.
    - Rounds 2..log(n): each processor handles 2 of the results of the previous round. It compares the 2 max values and takes the larger as the new max. It then takes the candidate group of the new max and adds the max of the other group to it, forming the new candidate group.
9Finding The Two Greatest Numbers
- After the last round, the max is the maximum of the array and the second maximum is the maximum of the candidate group.
- Sample
  - Array: 7, 10, 1, 3, 100, 8, 55, 6.
10Finding The Two Greatest Numbers
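Round 3: max 100, candidates 8, 55, 10
Round 2: max 10, candidates 7, 3 | max 100, candidates 8, 55
Round 1: max 10, candidate 7 | max 3, candidate 1 | max 100, candidate 8 | max 55, candidate 6
Array:   7 10 | 1 3 | 100 8 | 55 6
Results: the maximum is the maximum of the array (100) and the second maximum is the maximum of the candidate group (55).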
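The candidate-group tournament can be sketched as a sequential simulation. This is an illustration only, assuming at least two elements; `two_largest` is a hypothetical name, and the final `max(candidates)` stands in for the second parallel reduction over the candidate group.

```python
def two_largest(values):
    """Return (max, second_max) using the candidate-group tournament."""
    # Each entry is (current max, values that lost directly to it).
    level = [(v, []) for v in values]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (m1, c1), (m2, c2) = level[i], level[i + 1]
            # The winner keeps its own candidates and adds the loser's max.
            nxt.append((m1, c1 + [m2]) if m1 >= m2 else (m2, c2 + [m1]))
        if len(level) % 2:
            nxt.append(level[-1])  # unpaired entry advances unchanged
        level = nxt
    best, candidates = level[0]
    return best, max(candidates)


print(two_largest([7, 10, 1, 3, 100, 8, 55, 6]))
```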
11Finding The Two Greatest Numbers
- Time
  - Log(N) + LogLog(N).
  - Log(N) to find the first maximum and the candidate group.
  - LogLog(N) to find the maximum in the candidate group.
- The candidate group grows by 1 in each round (the maximum of the other group), so at the end its size is Log(N).
12Merge problem
- Description: we have 2 sorted arrays B and C, each of size N, and we need to divide their elements into 2 new arrays A1 and A2 of size N, so that the N largest items of B and C together are in A1 and the N smallest are in A2.
- Simple solution: merge B and C into one sorted array A, then copy the N largest elements to A1 and the N smallest to A2. This algorithm cannot use multiple processors, so the cost is still O(N).
13Merge problem
- Smart algorithm for O(N) processors.
  - Processor i compares B_i with C_{n+1-i}; the larger of the two goes to A1 and the other to A2.
- Correctness proof.
  - If B_i >= C_{n+1-i} then B_i >= B_1..B_{i-1} and C_{n+1-i} >= C_1..C_{n-i}, so B_i is at least as large as N elements (i - 1 from B and n - i + 1 from C), and B_i belongs in A1.
  - If C_{n+1-i} > B_i then C_{n+1-i} is larger than N elements (n - i from C and i from B), so C_{n+1-i} belongs in A1.
14Merge problem
- Example:
  B = 1, 8, 10, 17
  C = 9, 12, 67, 100
  Pairs: (B1, Cn), (B2, Cn-1), (B3, Cn-2), (B4, Cn-3).
  A1 = 100, 67, 12, 17.
  A2 = 1, 8, 10, 9.
- Time: all the comparisons can be done at the same time, so the cost is O(1).
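A short sketch of the pairing rule, assuming both inputs are sorted in ascending order and have the same length. The function name `split_largest_smallest` is hypothetical, and the loop body is what each processor would do independently.

```python
def split_largest_smallest(b, c):
    """b and c are ascending lists of equal length n. Returns (a1, a2)."""
    n = len(b)
    a1, a2 = [], []
    for i in range(n):  # each iteration is one simulated processor
        # With 0-based indexes, B_i is paired with the mirrored element of C.
        x, y = b[i], c[n - 1 - i]
        a1.append(max(x, y))
        a2.append(min(x, y))
    return a1, a2


print(split_largest_smallest([1, 8, 10, 17], [9, 12, 67, 100]))
```

Note that A1 and A2 are not themselves sorted; the algorithm only guarantees which array each item lands in.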
15Prefix Problem
- Description: find the sums of all the prefixes of the elements:
  S11 = X1
  S12 = X1 + X2
  ...
  S1n = X1 + X2 + ... + Xn-1 + Xn
- Simple solution: compute the sums with N processors in time O(N Log N): N sums, where each one takes O(Log N).
16Prefix Problem
- Algorithm
  for i = 0 to n-1 doip
      S[i] = X[i]
  for j = 0 to log n do
      for i = 2^j to n-1 doip
          S[i] = S[i] + S[i - 2^j]
- doip means do in parallel on the different processors (in each parallel step all reads happen before the writes).
- At the end the results are in the array S.
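The doubling scheme above can be simulated in a few lines. This is a sketch under the assumption that every parallel step reads the old values before any processor writes, which the snapshot `prev` imitates; `parallel_prefix_sums` is a hypothetical name.

```python
def parallel_prefix_sums(x):
    """Simulate the doubling prefix-sum rounds on a list of numbers."""
    s = list(x)
    n = len(s)
    step = 1  # step equals 2^j in round j
    while step < n:
        prev = s[:]  # snapshot: all processors read before anyone writes
        for i in range(step, n):  # these updates run in parallel
            s[i] = prev[i] + prev[i - step]
        step *= 2
    return s


print(parallel_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))
```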
17Prefix Problem
- Example: with 8 numbers X1..X8, where Sij = Xi + Xi+1 + ... + Xj.
  Input:         X1  X2  X3  X4  X5  X6  X7  X8
  After round 1: S11 S12 S23 S34 S45 S56 S67 S78
  After round 2: S11 S12 S13 S14 S25 S36 S47 S58
  After round 3: S11 S12 S13 S14 S15 S16 S17 S18
18Prefix Problem
- Time: in each round the number of finished results S1i doubles, so after log(n) rounds we have all the results.
- To use this algorithm each processor needs to be connected to log(n) other processors.
19Prefix Problem
- Usage example
  - Problem: we have an arithmetic expression and we need to test whether its brackets are arranged legally.
  - Algorithm: create an array x with 1 for each ( and -1 for each ), and run the prefix algorithm on it. The results must satisfy S11 = 1, S11..S1n-1 >= 0 and S1n = 0.
  - Time: with N processors O(log N): log(N) for the prefix algorithm and O(1) for the test.
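The bracket test can be sketched as follows. The sketch checks that no prefix sum is negative and that the last one is zero. It uses the sequential `itertools.accumulate` as a stand-in for the parallel prefix algorithm, and the name `brackets_are_legal` is hypothetical.

```python
from itertools import accumulate


def brackets_are_legal(expression):
    """Check bracket nesting with prefix sums of +1 / -1."""
    x = [1 if ch == "(" else -1 for ch in expression if ch in "()"]
    if not x:
        return True  # no brackets at all, so nothing can be violated
    s = list(accumulate(x))  # s[i] = x[0] + ... + x[i]
    # Each comparison below is an O(1) test done by one processor.
    return all(v >= 0 for v in s) and s[-1] == 0


print(brackets_are_legal("(a+b)*(c)"), brackets_are_legal(")("))
```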
20Partition Problem
- Description: we have an array X in which some of the elements are signed (marked). We need to move all the signed elements to one array and the non-signed elements to another array.
- Simple solution: take 2 stacks, push the signed elements onto one stack and the non-signed elements onto the other. This takes O(N) time.
- Simple solution 2: take two indexes, one at the start of the array and one at the end. The first searches for a signed element and the second for a non-signed one. When both have found one, they exchange the items they point to and move on until they meet. This also takes O(N) time, but it is more parallel.
21Partition Problem
- Smart algorithm for O(N) processors
  - Create a new array B: if element i is signed then B_i = 1, else B_i = 0.
  - Create an array C with the prefix sums of B, that is C_i = B_1 + B_2 + ... + B_i.
  - If X_i is signed then Y1[C_i] = X_i.
  - If X_i is not signed then Y2[i - C_i] = X_i.
22Partition Problem
- Example (the signed elements are 4, 3, 10 and 15):
  X  = 2, 4, 7, 8, 1, 3, 10, 12, 15
  B  = 0, 1, 0, 0, 0, 1, 1, 0, 1
  C  = 0, 1, 1, 1, 1, 2, 3, 3, 4
  Y1 = 4, 3, 10, 15
  Y2 = 2, 7, 8, 1, 12
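The partition steps can be sketched as below. This is an illustration under assumptions: `partition` and the predicate `is_signed` are hypothetical names, the prefix sums are computed sequentially with `itertools.accumulate` rather than by the parallel prefix algorithm, and the 1-based slots from the slides are shifted to Python's 0-based lists.

```python
from itertools import accumulate


def partition(x, is_signed):
    """Split x into (signed items, non-signed items), keeping order."""
    b = [1 if is_signed(v) else 0 for v in x]
    c = list(accumulate(b))  # c[i] = b[0] + ... + b[i]
    y1 = [None] * (c[-1] if c else 0)
    y2 = [None] * (len(x) - len(y1))
    for i, v in enumerate(x):  # each i is an independent simulated processor
        if b[i]:
            y1[c[i] - 1] = v  # slot C_i, shifted to 0-based
        else:
            y2[i - c[i]] = v  # slot i - C_i, shifted to 0-based
    return y1, y2


signed = {4, 3, 10, 15}
print(partition([2, 4, 7, 8, 1, 3, 10, 12, 15], lambda v: v in signed))
```

Every processor writes to a distinct slot, so no two writes collide.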
23Partition Problem
- Time with O(N) processors:
  - Computing B: O(1).
  - Computing C: O(log(n)) using the prefix algorithm.
  - Computing Y1 and Y2: O(1).
  - Total: O(log(n)).
24Sorting Algorithm
- Description: sort array A using O(N^2) processors and put the result into array C.
- Simple algorithm: the serial algorithm for sorting an array takes a minimum of O(N log(N)) time.
- Smart algorithm
  - Create a matrix B of size N x N and initialize all its cells with zeroes.
  - We look at the N^2 processors as a matrix of processors. Processor P_{i,j} computes A_i >= A_j; if true then B_{i,j} = 1.
25Sorting Algorithm
- For each i from 1 to N: C[Sum(i)] = A_i, where Sum(i) is the sum of B_{i,1} to B_{i,N}.
- Example: A = 3, 5, 2, 9, 1
  Matrix B:
       1  2  3  4  5
    1  1  0  1  0  1
    2  1  1  1  0  1
    3  0  0  1  0  1
    4  1  1  1  1  1
    5  0  0  0  0  1
26Sorting Algorithm
- C = 1, 2, 3, 5, 9.
- Time
  - Using O(N^2) processors, finding the matrix B takes O(1) and finding C costs O(log(N)).
  - So the total cost of the algorithm is O(log(N)).
  - Using O(N) processors, finding B takes O(N) time and finding C takes O(N) time, so the total is O(N).
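The comparison-matrix sort can be sketched sequentially as follows. `rank_sort` is a hypothetical name. One detail is an addition that is not in the slides: ties are broken by index so that equal values receive different positions. For arrays with distinct values this is the same test as A_i >= A_j.

```python
def rank_sort(a):
    """Sort by counting, for each item, how many items come no later."""
    n = len(a)
    # b[i][j] = 1 when a[j] should be placed no later than a[i].
    # Tie-break by index (an assumption added here to handle duplicates).
    b = [[1 if (a[j] < a[i] or (a[j] == a[i] and j <= i)) else 0
          for j in range(n)] for i in range(n)]
    c = [None] * n
    for i in range(n):
        # Sum(i) is the 1-based position of a[i] in the sorted output.
        c[sum(b[i]) - 1] = a[i]
    return c


print(rank_sort([3, 5, 2, 9, 1]))
```

In the parallel setting each cell of `b` is computed by its own processor, and each row sum is a reduction that takes O(log(N)) rounds.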
27Sorting Algorithm
- Description: sort array A using O(N^2) processors and put the result into array C.
- Algorithm: merge sort. The largest cost in the merge sort algorithm is the cost of the merge. With a serial algorithm the cost of merging 2 sorted arrays is O(N) and the cost of the merge sort algorithm is O(N log(N)). We will use the regular algorithm but with a smarter merge algorithm.
28Sorting Algorithm
- Smart merge algorithm
  - Description: we need to merge two sorted arrays A, B into a sorted array R.
  - Algorithm: we describe a recursive algorithm Merge.
    C = merge(even(A), odd(B)).
    D = merge(odd(A), even(B)).
    Here odd(A) is all the items of A with an odd index, and even(A) is all the items of A with an even index.
29Sorting Algorithm
- Let C = C0, C1, C2, ..., Cn and D = D0, D1, D2, ..., Dn.
  E = C0, D0, C1, D1, ..., Cn, Dn.
  Compare each pair Ci, Di, and if Ci > Di then swap Ci and Di in array E.
  Array E is then the merge of C and D, that is, the sorted merge of A and B.
30Sorting Algorithm
- Example (indexes start at 1):
  A = 3, 5, 8, 10        B = 4, 7, 9, 12
  Even(A) = 5, 10        Odd(A) = 3, 8
  Even(B) = 7, 12        Odd(B) = 4, 9
  C = merge(Even(A), Odd(B)) = 4, 5, 9, 10
  D = merge(Odd(A), Even(B)) = 3, 7, 8, 12
  E = 4, 3, 5, 7, 9, 8, 10, 12
  After swapping in E:
  E = 3, 4, 5, 7, 8, 9, 10, 12
- Time: using O(N) processors the merge takes O(log(N)) time. The merge sort runs the merge algorithm log(N) times, so the total cost of the merge sort is O(log^2(N)).
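The recursive merge can be sketched as below. The sketch assumes both inputs are ascending and have the same power-of-two length; `smart_merge` is a hypothetical name. Odd and even refer to 1-based positions, so `a[0::2]` holds the odd-indexed items.

```python
def smart_merge(a, b):
    """Merge two ascending lists of the same power-of-two length."""
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    odd_a, even_a = a[0::2], a[1::2]  # 1-based odd / even positions
    odd_b, even_b = b[0::2], b[1::2]
    c = smart_merge(even_a, odd_b)  # the two recursive merges are
    d = smart_merge(odd_a, even_b)  # independent and can run in parallel
    e = []
    for ci, di in zip(c, d):  # each pair is one simulated processor
        e += [min(ci, di), max(ci, di)]  # swap the pair if ci > di
    return e


print(smart_merge([3, 5, 8, 10], [4, 7, 9, 12]))
```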
31Find Algorithm
- Description: if array X contains the value Val then Res needs to be True, else Res needs to be False.
- Simple algorithm: a serial algorithm takes O(N) time.
- Smart algorithm: using O(N) processors. Set Res = False. Each processor i tests whether X_i = Val; if true it sets Res = True.
- Time: O(1).
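A tiny sequential sketch of the find step, with the hypothetical name `parallel_find`. It relies on the write rule of the shared-memory model: all processors that write to Res write the same value, so simultaneous writes are safe.

```python
def parallel_find(x, val):
    """Return True when val occurs in x."""
    res = False
    for xi in x:  # each iteration stands for one processor
        if xi == val:
            # Every writer stores the same value, so the order of
            # the writes does not matter.
            res = True
    return res


print(parallel_find([7, 10, 1, 3], 1))
```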
32Model Description
- Many processors.
- Processors can send messages to each other through communication channels.
- We want each processor to have a unique identification.
- Since we have O(n) processors, we need O(log n) bits to represent the Id.
33Model Description
- Clean Net: a processor knows nothing about its neighbors, not even their Ids. It only knows how many neighbors it has.
- We will explicitly mention when we are dealing with a Clean Net; otherwise every processor has a unique Id.
34Model Description
- A message should include the sender and receiver Ids and some information, O(log n) bits in total.
- If X wants to send a message to Y through Z, it costs 2 steps to send the message.
X -> Z -> Y
35Model Description
- Local computation takes no time.
- We will analyze:
  - Time complexity: the number of steps the algorithm takes in the worst case.
  - Communication complexity: the total number of messages sent during the execution of the algorithm in the worst case.
36Distributed vs. Sequential
- Communication: needed in the distributed model but not in the sequential one.
- Partial knowledge: together the processors know everything, but no single processor necessarily knows everything.
- Processors or communication channels may be down.
37Distributed vs. Sequential
- Synchronization: we need to synchronize the processors.
38Synchronous Model
- There is a global clock.
- In every clock cycle each processor:
  - sends messages to its neighbors.
  - receives messages from its neighbors.
  - performs local computation in 0 time.
  - changes state.
39Asynchronous Model
- There is no global clock.
- If a message was sent it will eventually arrive at its destination (with no failures), but we cannot assume anything about the arrival time.
- We measure time from the beginning of the execution until the last processor has stopped.
40Asynchronous Model
- For time complexity calculations we assume that every message arrives within one time unit in the worst case.
41Model Representation
- We can represent the network of processors as a graph.
- Each node in the graph is a processor.
- There is an edge between two nodes if there is a
direct communication channel between the two
processors they represent.
42Complexity
- C(Π, G, I): communication complexity, the total number of messages sent during the execution in the worst case.
- T(Π, G, I): time complexity, the number of clock cycles the execution takes in the worst case.
- Here Π is the protocol, G is the graph and I is the input.
43Complexity - examples
- The following examples are on a complete graph with nodes 1, 2, ..., n.
44Complexity - example 1
- Protocol A: node 1 sends the message m to node 2.
- C(A, G, I) = 1.
- T(A, G, I) = 1.
1 -> 2 (message m)
45Complexity - example 2
- Protocol B: node 1 sends the message m_i to node i.
- C(B, G, I) = n.
- T(B, G, I) = 1.
1 -> i (message m_i), for every i in G
46Complexity - example 3
- Protocol C: node i sends the message m_i to node i+1.
- C(C, G, I) = n.
- T(C, G, I) = 1.
i -> i+1 (message m_i), for every i in G
47Complexity - example 4
- Protocol D: node i sends the message m to node i+1 in cycle i.
- C(D, G, I) = n.
- T(D, G, I) = n.
Cycle 1: 1 -> 2 (message m); cycle 2: 2 -> 3 (message m); ...
48Transmission Problem
- Input: there is a message m at node V0.
- Output: the message m is written in all the nodes of the graph.
- d_G(x,y): the length of the shortest path from x to y in graph G.
- D = Diameter(G) = max_{x,y ∈ V} d_G(x,y).
49Algorithms for the Transmission Problem
- Direct Delivery.
- Spanning Tree.
- DFS.
- Flooding.
50Direct Delivery
- Based on the assumptions:
  - There is a routing system, such that messages are sent along the shortest path.
  - V0 knows the addresses of all other nodes in the graph.
- V0 sends the message m n-1 times, each time to a different node.
51DD Communication Complexity
- V0 sends n-1 messages.
- Each message takes O(D) steps.
- C(DD, G, I) = O(nD).
52DD Time Complexity
- Under the assumptions:
  1. Synchronous model.
  2. V0 sends one new message in every clock cycle.
- There will be no collisions between messages, because messages travel along shortest paths, and therefore we cannot have more than one message at a given distance from V0.
53DD Time Complexity
- The last message is sent in cycle n-1.
- It takes O(D) steps for the last message to arrive.
- T(DD, G, I) = O(n + D).
54DD Time Complexity
- We can show the same time complexity even without assumption 2.
- If two messages in a node compete for the same edge, we send the message destined for the node with the smaller Id.
- The message for node i, at time t, must be at distance t-i+1 from V0 (or at Vi).
55Spanning Tree
- Assumptions: we have a spanning tree in the graph that all the nodes are aware of (each node knows which of its edges are part of the spanning tree).
- Each node that receives the message sends it on its spanning tree edges, other than the one the message arrived on.
56Spanning Tree Complexity
- We send the message once for each spanning tree edge.
- C(ST) = n-1.
- We need as many rounds as the depth of the tree until the last node receives the message.
- T(ST) = O(Depth(tree, V0)).
- If we choose a BFS tree, T(ST) = O(D).
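The message and round counts for the spanning tree broadcast can be checked with a small simulation. This is a sketch under assumptions: the tree is given as a hypothetical `children` dictionary rooted at V0, and `tree_broadcast` is an illustrative name.

```python
def tree_broadcast(children, root):
    """children: dict node -> list of child nodes in the spanning tree.

    Returns (messages, rounds) for a synchronous broadcast from root.
    """
    messages, rounds = 0, 0
    frontier = [root]  # nodes that received the message last round
    while True:
        nxt = [c for u in frontier for c in children.get(u, [])]
        if not nxt:
            return messages, rounds
        messages += len(nxt)  # one message per spanning tree edge
        rounds += 1
        frontier = nxt


# A tree with 4 nodes and depth 2.
print(tree_broadcast({0: [1, 2], 1: [3]}, 0))
```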
57Building a Spanning Tree
- If we do not have a spanning tree, we can build one using any algorithm A for Transmission.
- Execute algorithm A.
- Each node V chooses as its parent the node W from which it received the message for the first time.
58Building a Spanning Tree
- V informs W that W is its parent.
- The edge E(W,V) is marked as a spanning tree edge.
- Since the transmission algorithm delivers the message to all nodes, we know that all the nodes are in the spanning tree.
- We have no cycles since each V chooses only one parent.
59DFS
- We traverse the graph in DFS order.
- If we reach a new node we leave a copy of the message, mark the node and continue the traversal.
- If we reach a marked node we go back.
60DFS Complexity
- In the DFS algorithm we move on each edge exactly twice.
- C(DFS) = T(DFS) = O(|E|).
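The DFS traversal can be sketched as a single token walking over the graph. The sketch assumes a connected undirected graph without self-loops, given as a hypothetical adjacency dictionary, and it assumes that both endpoints remember an edge once the token has crossed it, which is what keeps the count at two moves per edge. `dfs_broadcast` is an illustrative name.

```python
def dfs_broadcast(graph, start):
    """graph: dict node -> list of neighbors. Returns (marked, moves)."""
    marked = {start}
    used = set()    # undirected edges the token has already crossed
    moves = 0
    path = [start]  # path from start to the token's current node
    while path:
        u = path[-1]
        v = next((w for w in graph[u] if frozenset((u, w)) not in used), None)
        if v is None:
            path.pop()  # all edges tried
            if path:
                moves += 1  # the token goes back to the parent
        else:
            used.add(frozenset((u, v)))
            if v in marked:
                moves += 2  # reach a marked node and go straight back
            else:
                marked.add(v)  # leave a copy of the message
                path.append(v)
                moves += 1
    return marked, moves


triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(dfs_broadcast(triangle, 1))
```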
61Flooding
- Each node that receives the message for the first time sends it to all of its neighbors.
- When a node receives the message again, it just discards it.
- Flooding is also effective in a Clean Net.
62Flooding Complexity
- The message passes over each edge twice, once in each direction.
- C(Flood) = O(|E|).
- After t time units the message has reached all the nodes whose distance from V0 is smaller than or equal to t.
- T(Flood) = O(D).
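Flooding in the synchronous model can be simulated round by round. This is a sketch, assuming an undirected graph given as a hypothetical adjacency dictionary; `flood` is an illustrative name. It also records, for each node, the neighbor it first heard from, which is the parent choice used when building a spanning tree.

```python
def flood(graph, start):
    """Synchronous flooding. Returns (parent, messages, rounds)."""
    parent = {start: None}
    frontier = [start]  # nodes that got the message for the first time
    messages = 0
    rounds = 0          # rounds until the last node has the message
    while frontier:
        received = {}   # node -> first sender seen in this round
        for u in frontier:
            for v in graph[u]:
                messages += 1  # u sends the message to every neighbor
                if v not in parent:
                    received.setdefault(v, u)
        if received:
            rounds += 1
        parent.update(received)
        frontier = list(received)
    return parent, messages, rounds


path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(flood(path_graph, 1))
```

On the path graph above the message count is 6, twice the number of edges, and the round count is 3, the distance from node 1 to node 4.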