Title: Link Failure Monitoring Using Network Coding
1Link Failure Monitoring Using Network Coding
- Hamed Firooz
- Sumit Roy, Linda Bai
- firooz,sroy,lyb3_at_u.washington.edu
2Outline
- Network Tomography
- Introduction (Network Monitoring)
- Approaches
- Deterministic vs. Stochastic
- Active vs Passive
- Challenges: Overhead, Identifiability
- Network Coding
- Applications to network monitoring: new method
- Optimization: speed/complexity tradeoffs
- OPNET Implementation
3Network Tomography
- Networks: set of nodes and links, modeled as a graph G(V,E)
- Network monitoring
- Involves collection of network performance statistics (link delay, link loss or failure status)
- Important for QoS guarantees (media streaming, interactive video applications)
- Challenges
- Choice of appropriate measurement technique and algorithms
4Measurement Methods
- Node-oriented: these methods are based on cooperation among network nodes, e.g. ping or traceroute
- Using ping, the round-trip delay to every node can be measured.
- Uses Internet Control Message Protocol (ICMP) packets
- Many routers do NOT respond to these packets
- Many service providers do not own the entire network
[Figure: routers R connected by links l1 and l2]
5Measurement Methods
- Edge-oriented: access is available to nodes at the edge only (and not to any in the interior)
- Does not require exchanging special control messages between interior nodes
- Inverse problem: estimate link-level status from end-to-end (path-level) measurements
[Figure: end hosts S at the edge of a network with unknown interior]
6Measurement Methods
- Active (sending probe packets)
- Adds overhead to normal data traffic by introducing new control packets
- Passive (in situ traffic analysis)
- No overhead, but temporal and spatial dependence might bias measurement
- Our method: edge-oriented, active network tomography
- Given a network and a limited number of end hosts, when can we infer the failure status of the links?
7End-to-End Probing
- Probes are inserted into a data stream, and end-to-end properties on that route are measured.
- Probes are exchanged between end nodes using the routing matrix of the graph
[Figure: End1, End2 and End3 connected through router1 by link1, link2 and link3, with routing matrix A]
8End-to-End Probes
- Routing matrix relates link attributes to route attributes
- For some parameters, like delay or path loss, this relation is linear under some assumptions
[Figure: End1, End2 and End3 connected through router R by links l1, l2 and l3]
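The linear relation can be illustrated with a small sketch for additive link delays. The star topology (three end hosts attached to one router by l1, l2, l3), the choice of paths, and the delay values are assumptions made for illustration only.

```python
# Rows: paths End1-End2, End1-End3, End2-End3; columns: links l1, l2, l3.
# Topology and delay values are illustrative assumptions, not from the slides.
A = [
    [1, 1, 0],  # End1 -> End2 uses l1 and l2
    [1, 0, 1],  # End1 -> End3 uses l1 and l3
    [0, 1, 1],  # End2 -> End3 uses l2 and l3
]
x = [5.0, 2.0, 7.0]  # hypothetical per-link delays in ms


def path_attributes(A, x):
    """Return y = A x: each path delay is the sum of its link delays."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


y = path_attributes(A, x)
print(y)  # [7.0, 12.0, 9.0]
```

Here each row of A selects the links on one path, so y collects the end-to-end delays that a probe would measure.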
9Deterministic
- Link attributes (e.g. delay) are considered unknown constants
- Goal: estimate the constants
- Link attributes are typically time varying
- ⇒ method is suitable for periods of local stationarity
10Stochastic
- Link attribute is specified by a suitable probability distribution
- e.g. link delay follows a Gaussian distribution
- Estimation problem: unknown model parameters
- based on path observations in the presence of additive noise
11Deterministic vs. Stochastic Methods
- Stochastic
- Bayesian - requires a prior distribution
- incorrect choice leads to biases in the estimates
- More computationally intensive
- Deterministic
- Lower complexity but suffers from generic non-identifiability
12Link Failure Model
[Figure: End1 and End2 connected through routers R1 and R2 by links l1, l2 and l3]
Define an indicator function for the status of each link
13Binary Deterministic Model
[Figure: End1 and End2 connected through routers R1 and R2 by links l1, l2 and l3]
y = Ax
- A: N-by-M binary routing matrix
- x: M-by-1 binary vector, the status of each link
- y: N-by-1 binary vector, the status of each path (measurements)
14Failure Monitoring
- Network G(V,E) with set of paths P
- x, y are binary vectors
- A path is congested if at least one of its links is congested
[Figure: End1, End2 and End3 connected through a router by links l1, l2 and l3]
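The rule "a path is congested if at least one of its links is congested" amounts to a Boolean matrix-vector product. A minimal sketch follows, assuming a hypothetical star topology with three paths over links l1, l2, l3; the matrix and the failure pattern are illustrative, not taken from the slides.

```python
def path_status(A, x):
    """Boolean product y = A x: a path is congested (1) if any link on it is congested."""
    return [int(any(a and xi for a, xi in zip(row, x))) for row in A]


# Hypothetical routing matrix: paths End1-End2, End1-End3, End2-End3 over l1, l2, l3.
A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
x = [0, 1, 0]  # assumed failure pattern: only l2 is congested

y = path_status(A, x)
print(y)  # [1, 0, 1]
```

Only the two paths that traverse l2 are reported as congested.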
15Identifiability: y = Ax
- Problem: estimate x from y, with
- A (N-by-M): binary routing matrix
- x (M-by-1): binary link failure status
- y (N-by-1): end-to-end measurements
- Identifiability: a network is identifiable if y = Ax has a unique solution
- Usually, M (# of links in network) >> N (# of measurements), so the network is generically NOT identifiable.
6 links, 3 end-to-end routes ⇒ M = 6, N = 3
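Non-identifiability can be made concrete by enumerating every link state x that is consistent with an observation y. The sketch below uses brute force on a hypothetical 2-path, 3-link routing matrix; the matrix is an assumption chosen for illustration, and the Boolean product follows the "path is congested if any of its links is congested" rule.

```python
from itertools import product


def path_status(A, x):
    """Boolean product y = A x."""
    return [int(any(a and xi for a, xi in zip(row, x))) for row in A]


def consistent_link_states(A, y):
    """Return every binary link vector x whose path observation equals y."""
    n_links = len(A[0])
    return [x for x in product((0, 1), repeat=n_links) if path_status(A, x) == y]


# Hypothetical network: 2 paths (N = 2) over 3 links (M = 3).
A = [[1, 1, 0], [0, 1, 1]]

print(consistent_link_states(A, [1, 0]))       # [(1, 0, 0)]
print(len(consistent_link_states(A, [1, 1])))  # 5
```

The observation y = [1, 1] is explained by five different link states, so x cannot be recovered uniquely from it.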
16Identifiability Binary Model
- Solution: limit the (maximum) number of failed links inside the network
- Suppose at most k links can fail simultaneously
- Defn: k-Identifiability
- Network is k-identifiable if
- from end-to-end observations it is possible to uniquely identify up to k congested links
Only one link can be congested
17Example of 1-identifiability
[Figure: example network with links l1 to l6]
--  l1  l2  l3  l4  l5  l6
0   1   1   0   0   0   0
0   1   0   1   1   0   1
0   0   0   0   0   1   1
18Example: k=2 identifiability
[Figure: network with links l1 to l6, annotated "Ambiguity"]
191-Identifiability
- A network with an intermediate degree-two node is not 1-identifiable
- If path End1→End2 is congested, it is impossible to determine which of the links l1 and l2 is congested.
- Necessary but not sufficient!
20k=1 Identifiability
- 1-identifiability Theorem: End-to-end probe-based measurements can detect a unique congested link in a network if and only if there are no two identical columns in the network routing matrix
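The column condition of the theorem is straightforward to check in code. The following sketch tests whether all columns of a binary routing matrix are distinct; both example matrices are hypothetical and only serve to show one passing and one failing case.

```python
def columns(A):
    """Return the columns of routing matrix A as tuples (one per link)."""
    return [tuple(row[j] for row in A) for j in range(len(A[0]))]


def is_1_identifiable(A):
    """True if no two columns of A are identical (the condition stated in the theorem)."""
    cols = columns(A)
    return len(set(cols)) == len(cols)


# Hypothetical case 1: l1 and l2 always appear together (a degree-two node between them).
A_series = [[1, 1, 0], [0, 0, 1]]
# Hypothetical case 2: star topology with three paths over l1, l2, l3.
A_star = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]

print(is_1_identifiable(A_series))  # False
print(is_1_identifiable(A_star))    # True
```

In the first matrix the columns for l1 and l2 coincide, which matches the degree-two-node ambiguity: the two links cannot be told apart from path measurements.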
21k-identifiability
- k-identifiability Theorem: End-to-end probe-based measurements can detect a unique congested link in a network only if there are no k+1 dependent columns in the network routing matrix
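For small networks, k-identifiability can also be checked directly from its definition rather than through the column condition. The sketch below is a brute-force check, assuming the Boolean path model (a path is congested if any of its links is) and a hypothetical star routing matrix; it is not the procedure used in the slides.

```python
from itertools import combinations


def observed(A, failed):
    """Path status vector when the links in `failed` (column indices) are congested."""
    return tuple(int(any(row[j] for j in failed)) for row in A)


def is_k_identifiable(A, k):
    """True if every set of 1..k congested links yields a distinct observation."""
    seen = set()
    n_links = len(A[0])
    for size in range(1, k + 1):
        for failed in combinations(range(n_links), size):
            y = observed(A, failed)
            if y in seen:
                return False  # two different failure sets look the same
            seen.add(y)
    return True


# Hypothetical star topology: three paths over links l1, l2, l3.
A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
print(is_k_identifiable(A, 1))  # True
print(is_k_identifiable(A, 2))  # False
```

With two simultaneous failures, the sets {l1, l2} and {l1, l3} both congest every path, so they are indistinguishable. The enumeration grows combinatorially with k, which is why a structural condition on the routing matrix is preferable.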
22Example: k=2 identifiability
[Figure: network with links l1 to l6, annotated "Ambiguity"]
23Shortest Path Routing Revisited
- Packets are sent on the shortest path between two end nodes
- Sub-graphs: tree starting from a boundary (source) node
- Node 4 has degree two in all graphs
- But node 4 has degree four in the original network
24Revisiting Shortest Path Routing
- What if we could change the routing matrix?
- Example: in place of shortest path routing, route packets through longer paths, e.g. n1 → l2 → l4 → n2
- Now the network is 1-identifiable!
- Intrinsic limitation for end-to-end measurement methods based on shortest path routes
- probes transmitted along such paths contain only minimum information
25Solution
- Look to exchange probes between boundary nodes via other (non-shortest) paths?
- Changing the routing tables violates the tomography assumption
- Use Network Coding: exploiting the broadcast nature of network coding, a transmitted probe will traverse almost every path between two boundary nodes
26Network Coding Short Review
- Present: routers just forward incoming packets, i.e. copy the packets on an input link onto the output links
- Proposed: what if each node in a network performs some computation on received data prior to forwarding?
27How does NC work? (1)
[Figure: butterfly network with sender s, intermediate nodes A, B, C, D and receivers t1, t2]
- Butterfly network: all links have the same capacity, 1 b/s
- s wants to send data bits a, b to both t1 and t2
- Bottleneck is C→D
28How does NC work?(2)
[Figure: butterfly network carrying a via A, b via B, and their XOR via D]
- Node C XORs the messages received on its incoming links
29How does NC work?(3)
[Figure: butterfly network carrying a via A, b via B, and their XOR via D]
- t1 and t2 know both a and b
- Now s can send data at rate 2 b/s/receiver
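The decoding step at the receivers can be sketched in a few lines. The wiring is an assumption read off the figure labels: t2 hears a directly via A, t1 hears b directly via B, and both hear the XOR forwarded from C through D.

```python
def butterfly(a, b):
    """Return the bit pairs (a, b) recovered at receivers t1 and t2."""
    coded = a ^ b  # node C XORs its two inputs; D forwards the coded bit

    # Assumed wiring: t1 gets b directly and recovers a from the coded bit.
    t1 = (coded ^ b, b)
    # Assumed wiring: t2 gets a directly and recovers b from the coded bit.
    t2 = (a, coded ^ a)
    return t1, t2


for a in (0, 1):
    for b in (0, 1):
        t1, t2 = butterfly(a, b)
        print(a, b, t1, t2)
```

Each receiver obtains both bits while the bottleneck link C→D carries only one coded bit per use.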
30Linear Network Coding
- Network coding is coding at layer three
- The coding is conducted over the finite field Fu, u = 2^q
- each coded symbol can be represented by q bits within an IP layer frame
- Signal Y(j) on an outgoing link j of node v is a linear combination of the signals Y(i) on the incoming links i of v
- We assume there is no process generated at node v
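A linear combination over a field with 2^q elements can be sketched as follows. The choice q = 3, the irreducible polynomial x^3 + x + 1, and the coefficient and symbol values are all assumptions for illustration; the slides do not specify them.

```python
Q = 3                 # assumed number of bits per symbol
IRREDUCIBLE = 0b1011  # x^3 + x + 1, one valid choice for GF(2^3)


def gf_mul(a, b):
    """Multiply two field elements (shift-and-add with polynomial reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a       # addition in GF(2^q) is bitwise XOR
        b >>= 1
        a <<= 1
        if a & (1 << Q):
            a ^= IRREDUCIBLE  # reduce back to q bits
    return result


def outgoing_symbol(coeffs, incoming):
    """Y(j) as a linear combination of the incoming symbols Y(i) at a node."""
    y = 0
    for c, s in zip(coeffs, incoming):
        y ^= gf_mul(c, s)
    return y


# Hypothetical node with two incoming links carrying symbols 5 and 7.
print(outgoing_symbol([2, 3], [5, 7]))  # 3
```

Because addition is XOR and every product is reduced to q bits, the outgoing symbol always fits in the same q-bit field as the incoming ones.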
31Received Symbols
- Pi: i-th route from source to destination
- Source sends a over Pi
- βi depends on topology G, hence βi(G)
[Figure: example network between source S and destination D]
32Received Symbols Linear Model
- ek: one of the source's outgoing links
- Pek: collection of all paths between source and destination that start at ek
- Source sends ak over ek. By superposition, the destination receives
[Figure: example network between source S and destination D; symbol a1 is sent over source link e1]
33Received Symbols Linear Model
- Source sends out symbols ak over ek; using superposition once more
- In vector format, y = a^T β(G)
- β(G) is the total network coding vector
[Figure: example network between source S and destination D; symbols a1 and a2 are sent over the source links]
34Received Symbols Linear Model
- Source sends symbols in M successive time slots
35Link Failure Model
- If a link is severely congested, packets are significantly delayed and assumed lost at the destination
- We model the network with link l in congestion state by its edge-deleted subgraph, denoted by Gl(V,El)
[Figure: example network between S and D with one link deleted]
36Link Failure Model
- Total network coding vector of Gl(V,El), β(Gl), is different from β(G)
- If the congested link doesn't belong to the i-th path from source to destination, Pi, it will not affect packets going through that path
- Otherwise βi(Gl) is zero
[Figure: example network between S and D with source links e1, e2 and congested link l1]
37Link Failure Model
- Training sequence is A
- yl: vector of symbols observed at the destination in M time slots with link l congested
- Potential for identifying: received symbols change uniquely in response to link congestion
38Example
[Figure: example network between source S and destination D with source links e1 and e2]
               --  e1  e2  l1  l2  l3
1st time slot  0   2   2   3   1   1
2nd time slot  2   3   1   0   1   3
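The example table can be read as a lookup from observed symbols to the congested link. The sketch below copies the values from the table; treating the "--" column as the no-congestion case is an assumption, as is the helper name `build_lookup`.

```python
# Symbols observed at D in the two time slots, one entry per table column.
# "--" is assumed to mean that no link is congested.
SIGNATURES = {
    "--": (0, 2),
    "e1": (2, 3),
    "e2": (2, 1),
    "l1": (3, 0),
    "l2": (1, 1),
    "l3": (1, 3),
}


def build_lookup(signatures):
    """Invert the table; fail if two link states produce the same observation."""
    lookup = {}
    for link, y in signatures.items():
        if y in lookup:
            raise ValueError(f"{link} and {lookup[y]} are indistinguishable")
        lookup[y] = link
    return lookup


lookup = build_lookup(SIGNATURES)
print(lookup[(3, 0)])  # l1
```

Since all six columns differ, the destination can name the congested link from the two received symbols alone.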
39Theorem 1 Sufficient Conditions
- If Rank(A) = deg(S), and
- for all Pek (set of paths between source and destination starting at ek)
- then
(more on next slide)
40Theorem 1
- Condition means
- For a set of paths having ek in common, Pek, the NC coefficients of the paths are independent!
[Figure: example network between S and D with source links e1 and e2; two groups of paths are annotated "Independent"]
41Example
[Figure: example network between source S and destination D with source links e1 and e2, annotated "Independent"]
               --  e1  e2  l1  l2  l3
1st time slot  0   2   2   3   1   1
2nd time slot  2   3   1   0   1   3
42Complexity/Speed
- First condition of Theorem 1
- In the previous example, M = 2 = deg(S)
- Number of time slots is at least the number of outgoing links of the source
- Is it possible to decrease the number of time slots? ⇒ faster monitoring
- Possible by increasing the number of bits in the LNC coefficients ⇒ more complexity
43Example
[Figure: example network between source S and destination D with source links e1 and e2]
               --  e1  e2  l1  l2  l3
1st time slot  6   4   2   5   7   1
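The two example tables can be compared numerically to see the tradeoff between time slots and symbol width. In this sketch the table values are copied from the slides, while reading "--" as the no-congestion case and inferring q from the largest symbol in each table are assumptions.

```python
# Observations per link state; "--" is assumed to mean no congested link.
two_slots = {"--": (0, 2), "e1": (2, 3), "e2": (2, 1),
             "l1": (3, 0), "l2": (1, 1), "l3": (1, 3)}
one_slot = {"--": (6,), "e1": (4,), "e2": (2,),
            "l1": (5,), "l2": (7,), "l3": (1,)}


def summarize(signatures):
    """Return (time slots M, bits q needed for the largest symbol, all distinct?)."""
    symbols = [s for y in signatures.values() for s in y]
    q = max(symbols).bit_length()  # assumed: smallest width that holds every symbol
    m = len(next(iter(signatures.values())))
    distinct = len(set(signatures.values())) == len(signatures)
    return m, q, distinct


print(summarize(two_slots))  # (2, 2, True)
print(summarize(one_slot))   # (1, 3, True)
```

Both tables separate all six link states, one with two slots of 2-bit symbols and the other with a single slot of 3-bit symbols.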
44Theorem 2 Complexity/Speed tradeoff
- Ni = |Pi|
- q bits per symbol are used in network coding
- M number of (desired) time slots
- Let Z = {1, 2, ..., K}
- K: degree of source
- ZM: collection of all partitions of Z of size M
- K = 3, M = 2 ⇒ Z = {1, 2, 3}
- ZM = { {{1,2},{3}}, {{1,3},{2}}, {{2,3},{1}} }
[Figure: source S with K outgoing links]
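The collection of partitions can be enumerated recursively. This sketch assumes that "partitions of Z of size M" means partitions of Z into exactly M non-empty blocks; the function name is hypothetical.

```python
def partitions_into_blocks(items, m):
    """Yield every partition of the list `items` into exactly m non-empty blocks."""
    if not items:
        if m == 0:
            yield []
        return
    first, rest = items[0], items[1:]
    # Case 1: `first` forms a block on its own.
    for part in partitions_into_blocks(rest, m - 1):
        yield [[first]] + part
    # Case 2: `first` joins one of the blocks of a partition of the rest.
    for part in partitions_into_blocks(rest, m):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


K, M = 3, 2
Z = list(range(1, K + 1))
for p in partitions_into_blocks(Z, M):
    print(p)
```

For K = 3 and M = 2 this yields three partitions, matching the example on the slide. The number of partitions grows quickly with K, which bounds how far the enumeration is practical.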
45Theorem 2 Complexity/speed tradeoff
- Network is 1-identifiable if
- Rank(A) = M
46Theorem 3 Random LNC
- Random linear network coding is a distributed approach achieving capacity asymptotically
- Intermediate nodes choose their NC coefficients uniformly from the elements of Fu (u = 2^q)
- Exponential increase with q (number of bits) and M (number of time slots)
- Quadratic decrease with size of network
47Multi-source Multi-destination
- So far, considered only single source, single destination
- Easily extendable to multi-source, multi-destination
48Simulation
- Simulation environment
- OPNET 14.5
- MATLAB 7.1 (finite field operations)
- Evaluation
- University of Washington's Electrical Engineering network
- Thirteen subnets
- 3 backbone routers
- Full Duplex Ethernet links
49Simulation Set-Up
- Implementation of Network Coding (NC) within OPNET
- We employ network coding at the transport layer (instead of the IP layer)
- Easier to implement
- Router model is modified to distinguish between non-NC/NC packets through the use of a flag bit within the UDP header
- NC packets are sent for separate processing
- non-NC packets are processed normally
- We assign a q-bit field, called the LNC field, within the TCP/UDP header for linear network coding.
50RECEIVE/SEND interface
- Inherently, network coding operates on unidirectional links
- Each interface within a router model is designated as a SEND or RECEIVE interface only for the network coded packets
- operating regularly with non-network coded packets
- Finite field operations are done in MATLAB
- Using MATLAB API within OPNET
51RECEIVE/SEND
52Evaluation
53UW EE Network
54UW EE Network-lookup table
55Thank you
56Network Tomography: A Stochastic Model [1]
- Passage of probes can be modeled as two stochastic processes Xl(i) and Zl(i) for each node k
- Zl(i): time delay process of link k
- Xl(i): called the bookkeeping process, cumulative probe from root to k
[1] V. Arya, N. Duffield, D. Veitch, "Temporal Delay Tomography," IEEE Infocom 2008
57Network Tomography Stochastic Method
- Discretize delay: D = {0, b, 2b, ..., mb, ∞}
- mb is the delay threshold
- Xl(i) = Xf(l)(i) + Zl(i), where f(l) denotes the parent of l