Title: Lecture Note on Survivability
1. Lecture Note on Survivability
2. Impact of Outages
[Figure: service outage impact versus outage duration. Time axis: 0, 50 msec, 200 msec, 2 sec, 10 sec, 5 min, 30 min. The axis is divided into a "Hit" region (APS) and the 1st through 6th Ranges. Impact labels: May Drop Voiceband Calls; Trigger Change-over of CCS Links; Call-Dropping; Private Line Disconnect; Packet (X.25) Disconnect; Social/Business Impacts; FCC Reportable.]
3. Market Drivers for Survivability
- Customer Relations
- Competitive Advantage
- Revenue
- Negative - Tariff Rebates
- Positive - Premium Services
- Business Customers
- Medical Institutions
- Government Agencies
- Impact on Operations
- Minimize Liability
4. Network Survivability
- Availability: 99.999% (5 nines) => less than 5 min downtime per year
- Since a network is made up of several components, the only way to reach 5 nines is to add survivability
- Survivability: continued services in the presence of failures
- Protection switching or restoration mechanisms are used to ensure survivability
- Add redundant capacity, detect faults, and automatically re-route traffic around the failure
- Restoration: related term, but on a slower time scale
- Protection: fast time scale, 10s to 100s of ms
- Implemented in a distributed manner to ensure fast restoration
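The claim that a multi-component network cannot reach 5 nines without redundancy can be illustrated with a short calculation. This is a minimal sketch that assumes component failures are independent; the component count and the 99.99% figure are hypothetical values chosen for illustration.

```python
from math import prod

def series_availability(availabilities):
    """All components must work, so the availabilities multiply."""
    return prod(availabilities)

def parallel_availability(availabilities):
    """Service survives as long as at least one redundant branch works."""
    return 1.0 - prod(1.0 - a for a in availabilities)

# Hypothetical path: 10 components in series, each 99.99% available.
unprotected = series_availability([0.9999] * 10)

# The same path duplicated over a disjoint protect route (assumed independent).
protected = parallel_availability([unprotected, unprotected])

print(f"unprotected path: {unprotected:.6f}")
print(f"protected path:   {protected:.8f}")
```

Under these assumptions the unprotected path falls below 5 nines even though every component exceeds 4 nines, while the duplicated path clears it.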
5. Failure Types
- Types of failure
  - Components: links, nodes, channels in WDM, active components, software
  - Human error: backhoe fiber cut
  - Systems: entire COs can fail due to catastrophic events
- Single failure vs. multiple concurrent failures
- Goal: mean repair time << mean time between failures
- Protection depends upon applications
  - SONET/SDH: 60 ms (legacy drop-calls threshold)
- Survivability provided at several layers
6. Network Survivability Architectures
Linear Protection Architectures
Ring Protection Architectures
Mesh Restoration Architectures
7. Network Availability and Survivability
Availability is the probability that a system is
able to perform its designed functions when
called upon to do so.
Availability = Reliability / (Reliability + Recovery)
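One common way to put numbers on this ratio is to measure "Reliability" as mean time between failures (MTBF) and "Recovery" as mean time to repair (MTTR). The slide does not name these quantities, so that reading, the function name, and the sample figures below are assumptions.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical element: fails every 10,000 hours on average, 4 hours to repair.
slow_repair = steady_state_availability(mtbf_hours=10_000, mttr_hours=4)

# Same element, but recovery is automated and takes 36 seconds (0.01 hours).
fast_recovery = steady_state_availability(mtbf_hours=10_000, mttr_hours=0.01)

print(f"manual repair:  {slow_repair:.6f}")
print(f"fast recovery:  {fast_recovery:.8f}")
```

The comparison shows why shortening recovery time raises availability even when the failure rate is unchanged.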
8. Quantification of Availability
Percent Availability | N-Nines | Approximate Downtime (Minutes/Year)
99%                  | 2-Nines | 5,000 Min/Yr
99.9%                | 3-Nines | 500 Min/Yr
99.99%               | 4-Nines | 50 Min/Yr
99.999%              | 5-Nines | 5 Min/Yr
99.9999%             | 6-Nines | 0.5 Min/Yr
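The downtime column is a rounded rule of thumb. A minimal sketch of the conversion, assuming a 365-day year, gives the unrounded figures; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def downtime_minutes_per_year(percent_availability):
    """Convert a percent availability into expected downtime per year."""
    return (1.0 - percent_availability / 100.0) * MINUTES_PER_YEAR

for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} min/yr")
```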
9. PSTN
- Individual elements have an availability of 99.99%
- One cut-off call in 8,000 calls (3 min for average call). Five ineffective calls in every 10,000 calls.
[Figure: end-to-end PSTN connection through NI, AN, LE, Facility Entrance, and LD segments, mirrored on both sides of the LD network. Each AN is labeled 0.01; the other values shown are 0.005 (four times) and 0.02.]
NI: Network Interface; LE: Local Exchange; LD: Long Distance; AN: Access Network
10. Service Requirements vs. Network Availability
11. IP Network Expectations
Service                                      | Delay | Jitter | Loss | Availability
Real Time Interactive (VoIP, Cell Relay ...) | L     | L      | L    | H
Layer 2 and Layer 3 VPNs (FR/Ethernet/AAL5)  | M     | L      | L    | H
Internet Service                             | H     | H      | M    | L
Video Services                               | L     | M      | M    | H
L = Low, M = Medium, H = High
12. Measuring Availability: Port Method
- Based on port count in network
- Does not take into account the bandwidth of ports (e.g. OC-192 and 64k are both ports)
- Good for dedicated access service because ports are tied to customers

Availability (%) = [(Total number of ports x sample period) - (Number of impacted ports x outage duration)] / (Total number of ports x sample period) x 100
13. Port Method Example
- Network with 10,000 active access ports
- Access router with 100 access ports fails for 30 minutes
- Total available port-hours: 10,000 x 24 = 240,000
- Total down port-hours: 100 x 0.5 = 50
- Availability for a single day: ((240,000 - 50) / 240,000) x 100 = 99.979166%
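The port-method arithmetic can be written as a small function. This is a sketch only; the function name and the list-of-outages argument are illustrative choices, and the inputs are the ones from this example.

```python
def port_method_availability(total_ports, sample_period_hours, outages):
    """Percent availability by the port method.

    outages is an iterable of (impacted_ports, outage_duration_hours).
    """
    total_port_hours = total_ports * sample_period_hours
    down_port_hours = sum(ports * hours for ports, hours in outages)
    return (total_port_hours - down_port_hours) / total_port_hours * 100.0

# 10,000 ports over one day; 100 ports down for half an hour.
availability = port_method_availability(10_000, 24, [(100, 0.5)])
print(f"{availability:.6f}%")
```

Taking a list of outages lets several failures in the same sample period be summed before the ratio is formed.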
14. Bandwidth Method
- Based on amount of bandwidth available in network
- Takes into account the bandwidth of ports
- Good for core routers

Availability (%) = [(Total amount of BW x sample period) - (Amount of BW impacted x outage duration)] / (Total amount of BW in network x sample period) x 100
15. Bandwidth Method Example
- Total capacity of network: 100 Gigabits/sec
- Access router with 1 Gigabit/sec of BW fails for 30 minutes
- Total BW available in network for a day: 100 x 24 = 2,400 Gbps-hours
- Total BW lost in outage: 1 x 0.5 = 0.5 Gbps-hours
- Availability for a single day: ((2,400 - 0.5) / 2,400) x 100 = 99.979166%
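The bandwidth method weights each outage by the capacity it removes, which is what distinguishes it from a plain port count. The sketch below reproduces this example and then adds a hypothetical second case (a 10 Gigabits/sec failure, not from the slides) to show the weighting.

```python
def bandwidth_method_availability(total_gbps, sample_period_hours, outages):
    """Percent availability by the bandwidth method.

    outages is an iterable of (impacted_gbps, outage_duration_hours).
    """
    total_gbps_hours = total_gbps * sample_period_hours
    lost_gbps_hours = sum(gbps * hours for gbps, hours in outages)
    return (total_gbps_hours - lost_gbps_hours) / total_gbps_hours * 100.0

# Slide example: 100 Gbps network, 1 Gbps lost for 30 minutes in one day.
slide_case = bandwidth_method_availability(100, 24, [(1, 0.5)])

# Hypothetical contrast: a 10 Gbps failure of the same duration.
large_failure = bandwidth_method_availability(100, 24, [(10, 0.5)])

print(f"1 Gbps outage:  {slide_case:.6f}%")
print(f"10 Gbps outage: {large_failure:.6f}%")
```

Under the port method both failures could count as a single port; here the larger one costs ten times as much availability.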
16. Defects Per Million Method
- Used in PSTN networks, defined as number of
blocked calls per one million calls averaged over
one year.
17. Defects Per Million Example
- Network with 10,000 active access ports
- Access router with 100 access ports fails for 30 minutes
- Total available port-hours: 10,000 x 24 = 240,000
- Total down port-hours: 100 x 0.5 = 50
- Daily DPM: (50 / 240,000) x 1,000,000 = 208
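The daily DPM figure is the down fraction of port-hours scaled to one million. A minimal sketch with the numbers from this example follows; the function name is illustrative.

```python
def daily_dpm(total_ports, sample_period_hours, outages):
    """Defects per million, computed from port-hours.

    outages is an iterable of (impacted_ports, outage_duration_hours).
    """
    total_port_hours = total_ports * sample_period_hours
    down_port_hours = sum(ports * hours for ports, hours in outages)
    return down_port_hours / total_port_hours * 1_000_000

dpm = daily_dpm(10_000, 24, [(100, 0.5)])
print(round(dpm))  # 208
```

For the same inputs, DPM and the port-method availability carry the same information: DPM equals (100 - availability in percent) x 10,000.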
18. Working and Protect Fibers
19. Protection Topologies - Linear
- Two nodes connected to each other with two or more sets of links
[Figure: two linear configurations, each with Working and Protect links: (1+1) and (1:n).]
20. Protection Topologies - Ring
- Two or more nodes connected to each other with a ring of links
- Line vs. Drop interfaces
- East vs. West interfaces
[Figure: ring of nodes showing East (E) and West (W) interfaces, Line (L) and Drop (D) interfaces, and Working and Protect rings.]
21. Protection Topologies - Mesh
- Three or more nodes connected to each other
- Can be sparse or complete meshes
- Spans may be individually protected with linear protection
- Overall edge-to-edge connectivity is protected through multiple paths
[Figure: mesh of nodes with Working and Protect paths.]
22. Ring Topologies
[Figure: a 2 Fiber Ring and a 4 Fiber Ring, each built from ADMs with DCC. Each line is full duplex.]
- Uni- vs. Bi-Directional: all traffic runs clockwise vs. either way
23. Automatic Protection Switching (APS)
[Figure: two chains of ADMs illustrating Line Protection Switching and Path Protection Switching.]
Line Protection Switching
- Uses TOH
- Trunk application
- Backup capacity is idle
- Supports 1:n, where n = 1-14
Path Protection Switching
- Uses POH
- Access line applications
- Duplicate traffic sent on protect
- 1+1
Automatic Protection Switching
- Line or path based
- Restoration times: 50 ms
- K1, K2 bytes signal change
24. Protection Switching Terminology
- 1+1 architectures: permanent bridge at the source, select at sink
- m:n architectures: m entities provide protection for n working entities, where m is less than or equal to n
  - allows unprotected extra traffic
  - most common: SONET linear 1+1 and 1:n
- Coordination protocol: provides coordination between controllers in source and sink
  - Required for all m:n architectures
  - Not required for 1+1 architectures unless they employ bi-directional protection switching
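The sharing in a 1:n architecture can be sketched as a small state machine: one protect line serves n working lines, so a second concurrent failure finds the protect line busy. This is a simplified model, not the SONET APS protocol; it ignores request priorities and the K1/K2 signaling, and the class and method names are hypothetical.

```python
class OneForNProtectionGroup:
    """1:n linear protection: one protect line shared by n working lines."""

    def __init__(self, n):
        self.n = n
        self.failed = set()      # working lines that are currently failed
        self.on_protect = None   # working line currently carried on protect

    def _check(self, line):
        if not 0 <= line < self.n:
            raise ValueError(f"line must be in 0..{self.n - 1}")

    def fail(self, line):
        self._check(line)
        self.failed.add(line)
        if self.on_protect is None:
            self.on_protect = line  # protect line was idle, so switch to it

    def repair(self, line):
        self._check(line)
        self.failed.discard(line)
        if self.on_protect == line:
            # Revertive behavior: traffic returns to the repaired working
            # line and the protect line goes to another failed line, if any.
            self.on_protect = min(self.failed) if self.failed else None

    def in_service(self, line):
        self._check(line)
        return line not in self.failed or self.on_protect == line

group = OneForNProtectionGroup(n=3)
group.fail(0)   # first failure is switched to the protect line
group.fail(1)   # second failure finds the protect line busy
print(group.in_service(0), group.in_service(1), group.in_service(2))
group.repair(0)
print(group.in_service(1))
```

The revert step in `repair` is what frees the shared protect line, which is one reason shared schemes are typically operated revertively.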
25. 1+1 vs 1:n
[Figure: two linear configurations with Working and Protect links: (1+1) and (1:n).]
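26. Linear 1+1 APS
[Figure: two nodes joined by a Working line and a Protection line in each direction. At the sending side a bridge (BR) feeds the transmitters (TX) on both lines; at the receiving side a switch (SW) selects between the two receivers (RX).]
TX: Transmitter; RX: Receiver; BR: Bridge; SW: Switch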
27. Protection Switching
- Dedicated vs. shared: a working connection is assigned dedicated or shared protection bandwidth
  - 1+1 is dedicated, 1:n is shared
- Revertive vs. non-revertive: after the failure is fixed, traffic is automatically or manually switched back
  - Shared protection schemes are usually revertive
- Uni-directional or bi-directional protection
  - Uni-directional: each direction of traffic is handled independently of the other. Fiber cut => only one direction is switched over to protection. Usually done with dedicated protection; no signaling required.
  - Bi-directional: transmission on fiber (full duplex) => requires bi-directional switching; signaling required
28. Ring Protection
Today: multiple stacked rings over DWDM (different λs)
29. Unidirectional Path Switched Ring (UPSR)
[Figure: failure-free state. Ring of nodes A, B, C, D with fiber 1 (working, W) and fiber 2 (protect, P). A-B and B-A traffic passes through a bridge at the sending node and path selection at the receiving node.]
- One fiber is working and the other is protecting at all nodes
- Traffic sent simultaneously on working and protect paths
- Protection done at path layer (like 1+1)
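The bridge-and-select behavior can be sketched in a few lines. This is a simplified model under stated assumptions: a lost signal is represented by None, and the selector only reacts to complete loss, whereas a real path selector also reacts to signal degradation. The function names are hypothetical.

```python
def bridge(signal):
    """Source side: permanently send the same signal on both fibers."""
    return {"fiber 1": signal, "fiber 2": signal}

def select_path(received, current="fiber 1"):
    """Sink side: stay on the current fiber while it delivers a signal,
    otherwise switch to the other fiber. Returns None if both are dark."""
    other = "fiber 2" if current == "fiber 1" else "fiber 1"
    if received[current] is not None:
        return current
    if received[other] is not None:
        return other
    return None

sent = bridge("A-B traffic")
after_cut = {**sent, "fiber 1": None}  # cut on the working fiber

print(select_path(sent))       # fiber 1
print(select_path(after_cut))  # fiber 2
```

The sink decides entirely on what it receives, which is why this scheme needs no signaling between nodes.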
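30. Unidirectional Path Switched Ring (UPSR)
[Figure: failure state. Ring of nodes A, B, C, D with fiber 1 (working, W) and fiber 2 (protect, P). After the failure, path selection at the receiving node picks the A-B and B-A traffic from the surviving fiber; the bridges at the sending nodes are unchanged.]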
31. UPSR Discussion
- Easily handles failures of links, transmitters, receivers or nodes
- Simple to implement: no signaling protocol or communication needed between nodes
- Drawback: does not spatially re-use the fiber capacity, because it is similar to the 1+1 linear protection model
  - No sharing of protection (as there is in the m:n model)
- BLSRs can support aggregate traffic capacities higher than the transmission rate
- UPSR is popular in lower-speed local exchange and access networks
- No specified limit on number of nodes or ring length of UPSR; only limited by the difference in delays of the paths
32. Bidirectional Line Switched Ring (BLSR/2)
[Figure: 2-fiber BLSR with working and protection capacity, showing A → C and C → A traffic between nodes A and C via node B.]
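33. Bi-directional Line Switched Ring (BLSR/2)
[Figure: 2-fiber BLSR with nodes A, B, C, D after a failure. The nodes next to the failure perform a ring switch, and the A → C and C → A traffic is moved from working to protection capacity.]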
34. Bi-directional Line Switched Ring (BLSR/2)
[Figure: 2-fiber BLSR after a node failure. Ring switches are performed on both sides of the failed node, and the A → C and C → A traffic is moved from working to protection capacity.]
35. Node Failures => Squelching
[Figure: 2-fiber BLSR serving Customer 1 and Customer 2, with a node failure. Ring switches are performed on both sides of the failed node for the A → C and C → A traffic.]
36. Bi-directional Line Switched Ring (BLSR/4)
[Figure: 4-fiber BLSR with separate working and protection fibers, showing A → C and C → A traffic.]
37. Bidirectional Line Switched Ring
[Figure: 4-fiber BLSR performing a span switch. The A → C and C → A traffic moves from the working fibers to the protection fibers on the same span.]
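38. Bidirectional Line Switched Ring
[Figure: 4-fiber BLSR with a node failure. Ring switches are performed on both sides of the failed node, and the A → C and C → A traffic is moved from the working fibers to the protection fibers.]
Also need to squelch any misconnected traffic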
39. BLSR Discussion
- BLSR/2 can be thought of as BLSR/4 with the protection fibers embedded in the same fiber
- One half of the capacity is used for protection purposes in each fiber
- Span switching and ring switching are possible only in BLSR, not in UPSR
- 1:n and m:n capabilities possible in BLSR
- More efficient in protecting distributed traffic patterns due to the sharing
- Ring management more complex in BLSR/4
  - K1/K2 bytes of SONET overhead are used to accomplish this
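The efficiency difference for distributed traffic can be illustrated by comparing the working load each ring type has to carry. This is a rough sketch under stated assumptions: demands are bidirectional, a UPSR demand occupies its bandwidth around the whole ring, and a BLSR routes each demand over its shorter arc. The function names and the sample demand set are hypothetical.

```python
def upsr_working_load(demands):
    """In a UPSR every demand occupies its bandwidth all the way around the
    ring, so the load on the working fiber is the sum of all demands."""
    return sum(units for _, _, units in demands)

def blsr_max_span_load(num_nodes, demands):
    """Route each demand on its shorter arc and return the busiest span.
    Span i connects node i and node (i + 1) % num_nodes."""
    load = [0] * num_nodes
    for src, dst, units in demands:
        clockwise_hops = (dst - src) % num_nodes
        if clockwise_hops <= num_nodes - clockwise_hops:
            spans = [(src + k) % num_nodes for k in range(clockwise_hops)]
        else:
            spans = [(dst + k) % num_nodes
                     for k in range(num_nodes - clockwise_hops)]
        for span in spans:
            load[span] += units
    return max(load)

# Hypothetical 4-node ring with one unit of demand between each adjacent pair.
demands = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]

print("UPSR working load:  ", upsr_working_load(demands))
print("BLSR busiest span:  ", blsr_max_span_load(4, demands))
```

With neighbor-to-neighbor traffic the BLSR reuses the same capacity on every span, while the UPSR load grows with the number of demands. When all traffic homes on a single hub node the two figures move much closer together.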
40. Deployment of UPSR and BLSR
[Figure: ring hierarchy with a Regional Ring (BLSR), two Intra-Regional Rings (BLSR), and Access Rings (UPSR).]
41. Mesh Restoration
[Figure: two mesh networks of DCS nodes. Reconfigurable (or Rerouting) Restoration Architecture: the DCSs are managed by a Central Controller. Self Healing Restoration Architecture: each DCS has its own Distributed Controller (DC).]
DC: Distributed Controller
42. Mesh Restoration
[Figure: two mesh networks of DCS nodes with a Working Path, one illustrating Line or Link Restoration and the other Path Restoration.]
- Control: Centralized or Distributed
- Route Calculation: Preplanned or Dynamic
- Type of Alternate Routing: Line or Path
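The difference between line (link) and path restoration can be sketched with a breadth-first search over a small mesh. This is an illustration only: the topology and node names are hypothetical, routes are computed dynamically rather than preplanned, and spare capacity is ignored.

```python
from collections import deque

def shortest_path(graph, src, dst, banned_links=frozenset()):
    """Breadth-first search; banned_links holds frozenset({u, v}) pairs."""
    queue, parent = deque([src]), {src: None}
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent and frozenset((node, nxt)) not in banned_links:
                parent[nxt] = node
                queue.append(nxt)
    return None

def link_restoration(graph, working_path, failed_link):
    """Patch a detour between the two end nodes of the failed link.
    failed_link is (u, v) in the order the working path visits them."""
    u, v = failed_link
    i = working_path.index(u)
    detour = shortest_path(graph, u, v, {frozenset(failed_link)})
    if detour is None:
        return None
    return working_path[:i] + detour + working_path[i + 2:]

def path_restoration(graph, working_path, failed_link):
    """Find a new end-to-end route that avoids the failed link."""
    return shortest_path(graph, working_path[0], working_path[-1],
                         {frozenset(failed_link)})

# Hypothetical six-node mesh of cross-connects.
graph = {
    "A": ["B", "E"], "B": ["A", "C", "E"], "C": ["B", "D", "F"],
    "D": ["C", "F"], "E": ["A", "B", "F"], "F": ["C", "D", "E"],
}
working = ["A", "B", "C", "D"]
print(link_restoration(graph, working, ("B", "C")))  # ['A', 'B', 'E', 'F', 'C', 'D']
print(path_restoration(graph, working, ("B", "C")))  # ['A', 'E', 'F', 'D']
```

In this example link restoration keeps the traffic anchored at B and C and so takes a longer route, while path restoration is free to pick a shorter end-to-end route but requires the end nodes to act.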
43. Mesh Restoration vs. Ring/Linear Protection