Title: Lecture no 18: Redundancy
1. Lecture no. 18: Redundancy
TDT4285 Planlegging og drift av IT-systemer (Planning and Operation of IT Systems)
Spring 2007, Anders Christensen, IDI
2. Definition
- Redundancy is using extra, superfluous capacity
to establish a safety margin against faults in
the system.
3. Types of redundancy
- Fail-over: seamless transition to another unit providing identical service.
- Backup: an extra copy of important data.
- Duplication: several units provide the same services and can cover for each other if one of them fails.
- Fall-back: an alternative, but usually less attractive, solution; often temporary.
4. Fail-over and full redundancy
- Two or more equivalent servers
- The principle is the same as for RAID
[Diagram: originally the client uses server1, which is mirrored to server2. After a fault on server1, automatic fail-over moves the client to server2.]
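The fail-over idea on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product does it: the function `call_with_failover`, the exception `AllServersFailed` and the two stand-in servers are all hypothetical names, and a fault is assumed to show up as a `ConnectionError`.

```python
from typing import Callable, Sequence


class AllServersFailed(Exception):
    """Raised when no server in the fail-over list could answer."""


def call_with_failover(servers: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each equivalent server in order and return the first answer."""
    errors = []
    for server in servers:
        try:
            return server(request)
        except ConnectionError as exc:
            # This server is down; move on to the next equivalent unit.
            errors.append(exc)
    raise AllServersFailed(f"{len(errors)} server(s) failed")


def server1(request: str) -> str:
    # Hypothetical primary that has suffered a fault.
    raise ConnectionError("server1 is down")


def server2(request: str) -> str:
    # Hypothetical mirror providing the identical service.
    return f"reply to {request!r}"


# The client never sees the fault on server1; server2 answers instead.
result = call_with_failover([server1, server2], "ping")
```

The sketch only works as fail-over if server2 really holds the same data as server1, which is what the mirroring in the diagram is for.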
5. Over-capacity and N+1 redundancy
- Needs three units
- Has four units
- Extra capacity is a useful side effect
- When a fault occurs, the system can manage with reduced capacity.
[Diagram: four servers (server1 to server4), one of them labelled as the extra unit.]
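The N+1 arithmetic from this slide (three units needed, four installed) can be checked with a small Python sketch. The function name `can_serve` and the constants are hypothetical and chosen only for illustration.

```python
def can_serve(total_units: int, failed_units: int, needed_units: int) -> bool:
    """Return True if the surviving units still cover the required capacity."""
    return total_units - failed_units >= needed_units


NEEDED = 3              # units required to carry the load
INSTALLED = NEEDED + 1  # N+1: exactly one extra unit

one_fault_ok = can_serve(INSTALLED, failed_units=1, needed_units=NEEDED)
two_faults_ok = can_serve(INSTALLED, failed_units=2, needed_units=NEEDED)
```

With one fault the system still has the three units it needs; a second simultaneous fault is not covered by N+1.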
6. Single point of failure
- Definition: a single point of failure (SPOF) is a component that the whole system relies on.
- Examples: power, computer networks
- Redundancy eliminates SPOFs.
[Diagram: two components labelled "spof".]
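One simple way to look for SPOFs is to count how many interchangeable instances exist of each component the system relies on. The sketch below assumes such an inventory is available as a dictionary; the function `find_spofs` and the inventory contents are hypothetical examples, not data from the lecture.

```python
def find_spofs(instances: dict[str, int]) -> list[str]:
    """Return the components that exist in exactly one instance."""
    return sorted(name for name, count in instances.items() if count == 1)


# Hypothetical inventory: component name -> number of interchangeable units.
inventory = {
    "power feed": 1,
    "network uplink": 1,
    "web server": 2,
}

spofs = find_spofs(inventory)
```

Here the power feed and the network uplink are flagged, matching the examples on the slide. A real analysis would also have to consider hidden shared dependencies, which a simple count does not reveal.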
7. Accumulation of down-time
- The down-time of a system consisting of several subunits depends on the down-time of these units.
- This is a probability calculation.
- Note that there is a difference between units that fail independently of each other and units that fail synchronously.
[Diagram: example system of subunits annotated with the percentages 80, 90, 50, 50, 90 and 100.]
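The probability calculation can be sketched as follows, assuming the units fail independently of each other. The function names are hypothetical. Units that all must work (a chain) multiply their availabilities; redundant units are down only when every one of them is down.

```python
from math import prod


def series_availability(availabilities):
    """All units must be up: multiply the individual availabilities."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """At least one unit must be up: 1 minus the chance that all are down."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Two 90 % units in a chain are up together about 81 % of the time.
chain = series_availability([0.9, 0.9])

# Two redundant 50 % units give about 75 %, if they fail independently.
pair = parallel_availability([0.5, 0.5])
```

If the redundant units instead fail synchronously, for example because they share the same bug or the same power feed, the pair is no better than a single unit and the parallel formula does not apply.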
8. SPOF may also be what you need...
- Allows you to control where a fault happens (like a fuse).
- Allows you to detect faults and keep an overview of them.
- Normal use also serves as testing.
- It's cheaper (unless you need the up-time and quality).
9. Example: RAID 5
Note: simplified model.
[Diagram: seven disks (Disk1 to Disk7) divided into data, parity and extra disks; the figure shows the rebuild of Disk2.]
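The rebuild of a lost disk relies on parity being the XOR of the data blocks. The sketch below follows the same kind of simplified model as the slide: one dedicated parity block per stripe, whereas a real RAID 5 array rotates parity across the disks. The function `xor_blocks` and the block contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical contents of one stripe on three data disks.
data = [b"disk1 --", b"disk2 --", b"disk3 --"]
parity = xor_blocks(data)

# Disk2 is lost: XOR the surviving data blocks with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, the surviving blocks plus parity reproduce exactly the missing block. A second simultaneous disk loss cannot be recovered this way.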
10. Full and N+1 redundancy
- N+1 redundancy
  - One extra unit of every component of a certain type
  - Usually HW
  - Used for selected parts of the system
- Full redundancy
  - The whole system is duplicated
  - Expensive (> 2x cost)
  - May be implemented for a whole site.
11. Redundancy traps
- Fail-over becomes double life-time
- Identical units with identical bugs
- Installed but untested
- Bought, but gives a false sense of safety
- No guarantee against human errors
- Can become a white elephant
12. Example: HVAC at IDI's computer room
[Diagram labels: fan, power, radiator, closed-loop cooling fluid, water supply, computer floor; a duplicated unit sits at the other side of the room.]
13. Costs
- Redundancy costs more (in operations!)
  - Extra equipment
  - Higher complexity
  - More maintenance
- Advantage of higher up-time
Note: only an approximate calculation.
[Diagram label: extra costs.]
14. Costs vs up-time
[Graph: costs plotted against up-time, with up-time marked at 90, 99 and 99.9 percent.]
Alternative strategy: recovery instead of redundancy.
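To see what the up-time levels 90, 99 and 99.9 percent mean in practice, they can be converted into hours of down-time per year. A minimal sketch, assuming a 365-day year and a hypothetical helper name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours per year the system is allowed to be down at a given up-time."""
    return (1.0 - uptime_percent / 100.0) * HOURS_PER_YEAR


for level in (90, 99, 99.9):
    hours = downtime_hours_per_year(level)
    print(f"{level} % up-time: about {hours:.2f} hours of down-time per year")
```

Each extra "nine" cuts the allowed down-time by a factor of ten: roughly 876 hours, 87.6 hours and 8.76 hours per year respectively.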
15. Return On Investment (ROI)
- Result with current solution (R1)
- Result with investment (R2)
- Cost of investment (Invest)
- ROI = (R2 - R1) / Invest
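Reading the ROI definition as the gain (R2 - R1) divided by the cost of the investment, a worked example might look like this. The function name and all amounts are hypothetical and only illustrate the arithmetic.

```python
def roi(r1: float, r2: float, invest: float) -> float:
    """Return on investment: gain from the investment divided by its cost."""
    if invest <= 0:
        raise ValueError("the investment cost must be positive")
    return (r2 - r1) / invest


# Hypothetical figures: the result rises from 100 000 to 130 000
# after spending 20 000 on redundant equipment.
example = roi(r1=100_000, r2=130_000, invest=20_000)  # 1.5
```

An ROI of 1.5 means every unit of money invested returns 1.5 units of improved result; a value below 1 means the investment costs more than it brings in.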
16. Example: A car
- Tires: 4 + spare tire (N+1 redundancy)
- Petrol: spare petrol tank (backup?)
- Brake lights: 2-3 units (full redundancy)
- Brakes: hand brake (fall-back)
- Taxi: can call for one (outsourcing)
- Seat belt and airbag: different implementation, common use (duplication)
- Head lights: can use parking lights (fall-back)