Title: Distributed Control Algorithms for Artificial Intelligence
1. Distributed Control Algorithms for Artificial Intelligence
- by Avi Nissimov
- DAI seminar at HUJI, 2003
2. Control methods
- Goal: deliberation on the task that should be executed, and on the time when it should be executed.
- Control in centralized algorithms
- Loops, branches
- Control in distributed algorithms
- Control messages
- Control for distributed AI
- Search coordination
3. Centralized versus distributed computation models
- The default centralized computation model: the Turing machine.
- Open issues in distributed models:
- Synchronization
- Predefined structure of the network
- Processors' knowledge of the network graph structure
- Processor identification
- Processor roles
4. Notes about the proposed computational model
- Asynchronous (and therefore non-deterministic)
- Unstructured (connected) network graph
- No global knowledge: each processor knows its neighbors only
- Each processor has a unique id
- No server/client roles, but there is a computation initiator
5. Complexity measures
- Communication: number of exchanged messages
- Time: in terms of the slowest message (no weights on the network graph edges); local processing is ignored
- Storage: common number of bits/words required
6. Control issues
- Graph exploration: communication over the graph
- Termination detection: detection of the state when no node is running and no message is in transit
7. Graph exploration: tasks
- Routing of messages from node to node
- Broadcasting
- Connectivity determination
- Communication capacity usage
8. Echo algorithm
- Goal: spanning tree building
- Intuition: got a message? let it go on
- On first reception of the message, send it to all of the neighbors; ignore subsequent receptions
- Termination detection: after all the neighbors respond, send an echo message to the father
9. Echo alg. implementation
- receive(echo) from w; father := w
- received := 1
- for all (v in Neighbors - {w}) send(echo) to v
- while (received < |Neighbors|) do
- receive(echo); received++
- send(echo) to father
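The implementation above can be simulated in Python. This is a sketch of my own (not from the slides), with a FIFO message queue standing in for the asynchronous network; function and variable names are assumptions:

```python
from collections import deque

def echo(graph, root):
    """Simulate the echo algorithm; returns father pointers (a spanning tree).

    graph: dict node -> list of neighbor nodes (undirected, connected).
    Each edge carries exactly two messages, so node u receives deg(u)
    messages in total; the last one triggers its echo to the father.
    """
    father = {root: None}
    received = {v: 0 for v in graph}
    queue = deque((root, v) for v in graph[root])   # initiator floods

    while queue:
        sender, u = queue.popleft()
        received[u] += 1
        if u not in father:                 # first message: adopt a father
            father[u] = sender
            for v in graph[u]:
                if v != sender:
                    queue.append((u, v))
        if u != root and received[u] == len(graph[u]):
            queue.append((u, father[u]))    # all neighbors heard: echo up
    return father
```

The FIFO queue makes one particular execution order; under true asynchrony any spanning tree can come out, as slide 10 notes.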
10. Echo algorithm: properties
- Very useful in practice, since no faster exploration can happen
- Reasonable assumption: fast edges tend to stay fast
- The theoretical model allows the worst execution, since every spanning tree can be a result of the algorithm
11. DFS spanning tree algorithm: centralized version
- DFS(u, father)
- if (visited[u]) then return
- visited[u] := true
- father[u] := father
- for all (neigh in neighbors[u])
- DFS(neigh, u)
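The centralized version transcribes directly into Python (a sketch; the function name and dict-based father map are my own choices):

```python
def dfs_tree(graph, root):
    """Centralized DFS from the slide: returns the father pointer of each node."""
    visited = set()
    father = {}

    def dfs(u, f):
        if u in visited:
            return
        visited.add(u)
        father[u] = f                  # first arrival fixes the tree edge
        for neigh in graph[u]:
            dfs(neigh, u)

    dfs(root, None)
    return father
```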
12. DFS spanning tree algorithm: distributed version
- On reception of dfs from v:
- if (visited[u]) then
- send return to v
- status[v] := returned; return
- visited[u] := true; status[v] := father
- sendToNext()
13. DFS spanning tree algorithm: distributed version (cont.)
- On reception of return from v:
- status[v] := returned
- sendToNext()
- sendToNext():
- if there is w s.t. status[w] = unused then
- send dfs to w
- else send return to father
14. Discussion, complexity analysis
- Sequential in nature
- There are 2 messages on each edge, therefore
- Communication complexity is 2m
- All the messages are sent in sequence, therefore
- Time complexity is 2m as well
- Explicitly fails to utilize parallel execution
15. Awerbuch's linear-time algorithm for DFS tree
- Main idea: why send to a node that is already visited?
- Each node sends a visited message in parallel to all the neighbors
- Neighbors update their knowledge of the node's status before they are visited, in O(1) for each node (in parallel)
16. Awerbuch algorithm: complexity analysis
- Let (u,v) be an edge, and suppose u is visited before v. Then u sends a visit message on (u,v), and v sends back an ok message to u.
- If (u,v) is also a tree edge, dfs and return messages are sent too.
- Communication complexity: 2m + 2(n-1)
- Time complexity: 2n + 2(n-1) = 4n - 2
17. Relaxation algorithm: idea
- DFS-tree property: if (u,v) is an edge in the original graph, then v is on the path (root,...,u) or u is on the path (root,...,v).
- The union of lexically minimal simple paths (lmsp) satisfies this property.
- Therefore, all we need is to find the lmsp for each node in the graph.
18. Relaxation algorithm: implementation
- On arrival of (path, <path>):
- if (currentPath > (<path>, u)) then
- currentPath := (<path>, u)
- send (path, currentPath) to all neighbors
- // (in parallel, of course)
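A sequential Python sketch of the same relaxation, with a worklist replacing the message passing (names are my own). Paths are tuples compared lexicographically, so the test currentPath > (<path>, u) becomes an ordinary tuple comparison:

```python
def lmsp(graph, root):
    """Relax until each node holds its lexically minimal simple path from root."""
    INF = (float("inf"),)                    # compares greater than any real path
    path = {v: INF for v in graph}
    path[root] = (root,)
    work = [root]
    while work:
        u = work.pop()
        for v in graph[u]:
            if v in path[u]:
                continue                     # keep the candidate path simple
            cand = path[u] + (v,)
            if cand < path[v]:               # lexicographic relaxation step
                path[v] = cand
                work.append(v)
    return path
```

Note that the lmsp is not the shortest path: below, the lmsp to node 3 is (1, 2, 3) rather than (1, 3), because it is lexically smaller.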
19. Relaxation algorithm: analysis and conclusions
- Advantage: low complexity
- After k steps, all the nodes whose lmsp has length k are set up; therefore the time complexity is n
- Disadvantages:
- Unlimited message length
- Termination detection required (see further)
20. Other variations and notes
- Minimal spanning tree
- Requires weighting the edges, much like Kruskal's MST algorithm
- BFS
- Very hard, since there is no synchronization; much like iterative-deepening DFS
- Linear-message solution
- Like the centralized version: sends all the information to the next node; unlimited message length.
21. Connectivity Certificates
- Idea: let G be the network graph. Remove some edges from G, while preserving k paths (for each pair u,v) where available in G, and all the paths if G itself contains fewer than k paths.
- Applications:
- Network capacity utilization
- Transport reliability assurance
22. Connectivity certificate: goals
- The main idea of certificates is to use as few edges as possible; there is always the trivial certificate: the whole graph.
- Finding a minimal certificate is an NP-hard problem
- A sparse certificate is one that contains no more than kn edges
23. Sparse connectivity certificate: solution
- Let E(i) be a spanning forest in the graph G \ Union(E(j)) for 1 <= j <= i-1; then Union(E(i)) is a sparse connectivity certificate.
- Algorithm idea: calculate all the forests simultaneously; if an edge closes a cycle in a tree of the i-th forest, then add the edge to the (i+1)-th forest (the rank of the edge is i+1).
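A sequential Python sketch of this forest decomposition (the slides describe the distributed variant; here one union-find structure per forest assigns each edge its rank, and the edges of rank <= k then form a sparse k-connectivity certificate). All names are my own:

```python
class DSU:
    """Minimal union-find over arbitrary hashable node ids."""
    def __init__(self):
        self.p = {}

    def find(self, x):
        self.p.setdefault(x, x)
        while self.p[x] != x:
            self.p[x] = self.p[self.p[x]]    # path halving
            x = self.p[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False                     # edge would close a cycle
        self.p[ra] = rb
        return True

def edge_ranks(edges):
    """Rank of an edge = index of the first forest where it closes no cycle."""
    forests = []
    rank = {}
    for (u, v) in edges:
        i = 0
        while True:
            if i == len(forests):
                forests.append(DSU())        # open the next forest on demand
            if forests[i].union(u, v):
                rank[(u, v)] = i + 1
                break
            i += 1                           # cycle here: push to next forest
    return rank
```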
24. Distributed certificate algorithm
- Search(father):
- if (not visited) then
- for all neighbors v s.t. rank[v] = 0: send give_rank to v
- receive (ranked, <i>) from v
- rank[v] := i
- visited := true
25. Distributed certificate algorithm (cont.)
- Search(father) (cont.):
- for all w s.t. needs_search[w] and rank[w] > rank[father], in decreasing order:
- needs_search[w] := false
- send search to w; receive return
- send return to father
26. Distributed certificate algorithm (cont.)
- On receipt of give_rank from v:
- rank[v] := min(i) s.t. i > rank[w] for all w
- send (ranked, <rank[v]>) to v
- On receipt of search from father:
- Search(father)
27. Complexity analysis and discussion
- There is no reference to k in the algorithm: it calculates sparse certificates for all k's
- There are at most 4 messages on each edge; therefore time and communication complexity are at most 4m = O(m)
- Ranking the nodes in parallel, we can achieve 2n + 2m complexity
28. Termination detection: definition
- Problem: detect a state when all the nodes are waiting for messages in a passive state
- Similar to the garbage collection problem: determine the nodes that can no longer accept messages (until reallocated / reactivated)
- Two approaches: tracing vs. probe
29. Processor states of execution: global picture
- Send
- pre-condition: state = active
- action: send(message)
- Receive
- pre-condition: message queue is not empty
- action: state := active
- Finish activity
- pre-condition: state = active
- action: state := passive
30. Tracing
- Similar to the reference-counting garbage collection algorithm
- On sending a message, increase the children counter
- On receiving a finished_work message, decrease the children counter
- When a node finishes work and its children counter equals zero, it sends a finished_work message to its father
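The counters can be sketched in Python as a Dijkstra–Scholten-style skeleton (my own class and method names; reactivation of already-finished nodes is not modeled):

```python
class Node:
    """Tracing termination detection: children counter + finished_work report."""
    def __init__(self, name, active=False):
        self.name = name
        self.father = None    # the node whose message first activated us
        self.children = 0     # messages sent, not yet answered by finished_work
        self.active = active
        self.done = False

    def send_to(self, other):
        self.children += 1
        other.active = True
        if other.father is None:
            other.father = self           # first activation: adopt a father
        else:
            self.ack_from_child()         # later messages: answered at once

    def finish_work(self):
        self.active = False
        self._try_report()

    def ack_from_child(self):
        self.children -= 1
        self._try_report()

    def _try_report(self):
        # passive and no outstanding children: send finished_work to father
        if not self.active and self.children == 0 and not self.done:
            self.done = True
            if self.father is not None:
                self.father.ack_from_child()
```

When the initiator itself becomes done, the whole computation has terminated, which is the "immediate detection" property of the next slide.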
31. Analysis and discussion
- Main disadvantage: doubles (!!) the communication complexity
- Advantages: simplicity, immediate termination detection (because the message is initiated by the terminator).
- Variations: may send the finished_work message only on chosen messages; a so-called weak reference
32. Probe algorithms
- Main idea: once per some time period, "collect garbage": calculate the number of sent minus the number of received messages per processor
- If the sum of these numbers is 0, then there is no message in transit on the network.
- In parallel, find out if there is an active processor.
33. Probe algorithms: details
- We introduce a new role, the controller, and we assume it is in fact connected to each node.
- Once in some period (delta), the controller sends a request message to all the nodes.
- Each processor sends back its deficit: <sent_number - received_number>.
34. Think it works? Not yet
- Suppose U sends a message to V and becomes passive; then U receives a request message and replies (immediately) with deficit = 1.
- Next, processor W receives a request message; it replies with deficit = 0, since it got no message yet.
- Meanwhile V activates W by sending it a message, receives the reply from W and stops; then V receives the request and replies with deficit = -1.
- The deficits sum to 0, but W is still active.
35. How to work it out?
- As we saw, a message can pass "behind the back" of the controller, since the model is asynchronous.
- Yet, if we add an additional boolean variable on each of the processors, such as "was active since last request", we can deal with this problem.
- But that means we will detect termination only within 2*delta time after the termination actually occurs.
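The controller's check from slides 33–35 condenses into one Python function (a sketch with my own names; each node record carries its message counters and the "was active since last request" flag):

```python
def probe_round(nodes):
    """One probe round: sum the deficits and check the activity flags.

    nodes: list of dicts with keys 'sent', 'received', 'was_active'.
    Returns True iff this round certifies termination; resets the flags,
    so they cover exactly the period since the last request.
    """
    deficit = sum(n["sent"] - n["received"] for n in nodes)
    any_active = any(n["was_active"] for n in nodes)
    for n in nodes:
        n["was_active"] = False
    return deficit == 0 and not any_active
```

In the U/V/W scenario of slide 34 the deficits already sum to 0, but W's flag vetoes the first round; only a second quiet round succeeds, which is exactly the up-to-2*delta detection delay of slide 35.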
36. Variations, discussion, analysis
- If there is more than one edge between the controller and a node, the echo algorithm is used with initiator = controller, and the sum is calculated inline.
- Detection is not immediate; it is initiated by the controller.
- A small delta causes a communication bottleneck, while a large delta causes a long period before detection.
37. CSP and Arc Consistency
- Formal definition: find x(i) in D(i) such that if x(i) = v and x(j) = w, then Cij(v,w) holds
- The problem is NP-complete in general
- The arc-consistency problem is removing all redundant values: if Cij(v,w) = false for all w from D(j), then remove v from D(i)
- Of course, arc-consistency is just the primary step of a CSP solution
38. Sequential AC4 algorithm
- For all Cij, v in D[i], w in D[j]:
- if Cij(v,w) then
- count[i,v,j]++; Supp[j,w].insert(<i,v>)
- For all Cij, v in D[i]: checkRedundant(i,v,j)
- While not Q.empty:
- <j,w> := Q.deque()
- for all <i,v> in Supp[j,w]:
- count[i,v,j]--; checkRedundant(i,v,j)
39. Sequential AC4 algorithm: redundancy check
- checkRedundant(i,v,j):
- if (count[i,v,j] = 0) then
- Q.enque(<i,v>); D[i].remove(v)
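Slides 38–39 translate into the following Python sketch (my own naming; constraints are given as predicates for both orientations, i.e. both (i,j) and (j,i) appear as keys):

```python
from collections import defaultdict, deque

def ac4(domains, constraints):
    """Sequential AC4: count supports, then propagate removals via a queue."""
    D = {i: set(vs) for i, vs in domains.items()}
    count = defaultdict(int)      # count[i, v, j]: supports of (i, v) in D[j]
    supp = defaultdict(list)      # supp[j, w]: pairs (i, v) that w supports
    Q = deque()

    def check_redundant(i, v, j):
        if count[i, v, j] == 0 and v in D[i]:
            D[i].remove(v)
            Q.append((i, v))

    # initialization: one pass over all constraint/value combinations
    for (i, j), C in constraints.items():
        for v in D[i]:
            for w in D[j]:
                if C(v, w):
                    count[i, v, j] += 1
                    supp[j, w].append((i, v))
    for (i, j) in constraints:
        for v in list(D[i]):
            check_redundant(i, v, j)

    # propagation: a removed value withdraws its support
    while Q:
        j, w = Q.popleft()
        for (i, v) in supp[j, w]:
            count[i, v, j] -= 1
            check_redundant(i, v, j)
    return D
```

For the constraint x(1) < x(2) over {1,2,3}, value 3 of x(1) and value 1 of x(2) have zero supports and are removed in the initialization pass; here no further propagation is triggered.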
40. Distributed Arc consistency
- Assume that each variable x(i) is assigned to a separate processor, and that all mutually dependent variables are assigned to neighbors.
- The main idea of the algorithm: Supp[j,w] and count[i,v,j] reside on the processor of x(j), while D(i) is on processor i; if v is to be removed from D(i), then processor j sends a message.
41. Distributed AC4: initialization
- Initialization:
- For all Cij, v in D[i]:
- For all w in D[j]_initial:
- if Cij(v,w) then count[v,j]++
- if count[v,j] = 0 then Redundant(v)
- Redundant(v):
- if v in D[i]:
- D[i].remove(v); SendQueue.enque(v)
42. Distributed AC4: messaging
- On not SendQueue.empty:
- v := SendQueue.deque()
- for all Cji: send (remove, v) to j
- On reception of (remove, w) from j:
- for all v in D[i] such that Cij(v,w):
- count[v,j]--
- if count[v,j] = 0 then Redundant(v)
43. Distributed AC4: complexity
- Assume A = max |D[i]|, m = |{Cij}|.
- Sequential execution: both loops pass over all Cij, v in D[i] and w in D[j] => O(m*A^2)
- Distributed execution:
- Communication complexity: on each edge there can be at most A messages => O(m*A)
- Time complexity: each node sends each of its A messages in parallel => O(n*A).
- Local computation: O(m*A^2), because of the initialization
44. Dist. AC4: final details
- Termination detection is not obvious, and requires explicit implementation
- Usually a probe algorithm is preferred, because of the big quantity of messages
- AC4 ends in three possible states:
- Contradiction
- Solution
- Arc-consistent sub-set
45. Task assignment for AC4
- Our assumption was that each variable is assigned to a different processor.
- A special case is a multiprocessor computer, when all the resources are at hand
- In fact, it is an NP-hard problem to minimize the communication cost when the assignment has to be done by the computer => heuristic approximation algorithms.
46. From AC4 to CSP
- There are many heuristics, taught mainly in an introductory AI course (such as most restricted variable and most restricting value), that tell which variables should be assigned after arc-consistency is reached
- On contradiction termination: usage in back-tracking
47. Loop cut-set: example
- Definition: a pit in a loop L is a vertex of the directed graph such that both edges of L adjacent to it are incoming.
- Goal: break the loops in a directed graph.
- Formulation: let G = <V,E> be a graph; find C, a subset of V, such that any loop in G contains at least one non-pit vertex from C.
- Applications: belief network algorithms
48. Sequential solution
- It can be shown that finding a minimal cut-set is an NP-hard problem; therefore approximations are used instead
- The best-known approximation, by Suermondt and Cooper, is shown on the next slide
- Main idea: on each step, drop all leaves and then find a vertex with at most 1 incoming edge that is common to the maximal number of cycles
49. Suermondt and Cooper algorithm
- C := empty
- While not V.empty do
- remove all v such that deg(v) <= 1
- K := {v in V : indeg(v) <= 1}
- v := argmax {deg(v) : v in K}
- C.insert(v)
- V.remove(v)
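A Python sketch of this loop (names are my own; deg counts all incident edges, indeg only incoming ones, and the K-empty subtlety of the next slide is handled by bailing out):

```python
def loop_cutset(nodes, edges):
    """Suermondt & Cooper heuristic: trim leaves, then cut a max-degree
    vertex among those with at most one incoming edge."""
    V = set(nodes)
    E = set(edges)                       # directed edges (u, v)
    C = set()

    def deg(v):
        return sum(1 for e in E if v in e)

    def indeg(v):
        return sum(1 for (_, b) in E if b == v)

    def drop(v):
        nonlocal E
        V.discard(v)
        E = {e for e in E if v not in e}

    while V:
        trim = [v for v in V if deg(v) <= 1]
        while trim:                      # drop leaves, and newly exposed leaves
            for v in trim:
                drop(v)
            trim = [v for v in V if deg(v) <= 1]
        if not V:
            break
        K = [v for v in V if indeg(v) <= 1]
        if not K:                        # the edge case of the next slide
            break
        v = max(K, key=deg)
        C.add(v)
        drop(v)
    return C
```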
50. Edge case
- There is still one subtlety (that isn't described in Tel's article): what to do if K is empty while V is not (for example, if G is an Euler path on the octahedron)
51. Distributed version: ideas
- Four parts of the algorithm:
- Variables and leaf trim: removal of all leaf nodes
- Control tree construction: a tree for the search of the next cut node
- Cut node search: search for the best node to add to the cut
- Controller shift: an optimization, see later
52. Data structures for the distributed version
- Each node contains:
- its activity status (nas: yes, cut, non-cut)
- the activity status of all adjacent edges/links (las: yes, no)
- the control status of all adjacent links (lcs: basic, son, father, frond)
53. Leaf trim part
- Idea: remove all leaves from the graph (put them into the non-cut state).
- If the algorithm discovers that a node has 1 active edge left, it sends its unique neighbor a remove message
- Tracing-like termination detection
54. Leaf trim implementation
- Var: las[x] := yes; nas := yes
- procedure TrimTest:
- if |{x : las[x] = yes}| = 1 then
- nas := noncut; las[x] := no
- send remove to x, and receive return or remove back
- On reception of remove from x:
- las[x] := no; TrimTest; send return to x
55. Control tree search
- For this goal the echo algorithm is used (with an appropriate variation: now each father should know the list of its children)
- This task is completely independent of the previous one (leaf trim); therefore they can be executed in parallel
- During this task, the lcs variables are set up
56. Control tree construction implementation
- Procedure constructSubtree:
- for all x s.t. lcs[x] = basic do
- send (construct, father) to x
- while exists x s.t. lcs[x] = basic:
- receive (construct, <i>) from y
- lcs[y] := (<i> = father) ? frond : son
- On the first (construct, father) message from x:
- lcs[x] := father; constructSubtree; TrimTest
- send (construct, son) to x
57. Cut node search
- Idea: pass over the control tree and combine
- For this purpose we need to collect ("un-broadcast") over the control tree the maximal degree of a node in the sub-tree
- Note that only nodes with indeg <= 1 that are still active count (for this reason, Income represents the incoming edges/neighbors).
58. Cut node search implementation
- Procedure NodeSearch:
- my_degree := (|{x in Income : las[x]}| < 2) ?
- |{x : las[x]}| : 0
- best_degree := my_degree
- for all x : lcs[x] = son: send search to x
- do |{x : lcs[x] = son}| times:
- receive (best_is, d) from x
- if (best_degree < d) then
- best_degree := d; best_branch := x
- send (best_is, best_degree) to father
59. Controller shift
- This task has no parallel in the sequential algorithm and is only an optimization issue
- Idea: because the newly selected cut-node is the center of trim activity, the root of the control tree should move there.
- In fact, this part doesn't involve search, since at this stage we already know the path to the best degree along the best branches
60. Controller shift: root change
- On a change_root message from x:
- lcs[x] := son
- if (best_branch = u) then
- TrimFromNeighbors
- InitSearchCutnode
- else
- lcs[best_branch] := father
- send change_root to best_branch
61. Trim from neighbors
- TrimFromNeighbors:
- for all x : las[x]: send remove to x
- do |{x : las[x]}| times:
- receive return or remove from x
- las[x] := no
62. Complexity
- New measures: s = C.size, d = tree diameter
- Communication complexity:
- 2m for remove/return + 2m for construct + 2(n-1)(s+1) for search/best_is + sd for change_root => 4m + O(sn)
- Time complexity without trim:
- 2d + 2(s+1)d + sd = (3s+4)d
- Trim time complexity:
- Worst case 2(n-s)