Title: Reliability in SuperJANET 5
1 Reliability in SuperJANET 5
- Roland Trice
- SuperJANET 5 Project
- r.trice_at_ulcc.ac.uk
2 Introduction
- Background to SJ5 project
- Reliability issues
- How might reliability be improved?
- Funding
- Reliability consultation
- Homework
3 Background to SJ5 project
- Requirements analysis: Jeremy Sharp
- Views from the community on
- Reliability
- Services
- Bandwidth
- Applications
- Etc., etc., etc.
- Investigations
- Carrier-class routers
- Reliability: Roland Trice
- Transmission technology: Henry Hughes
- Architecture: Duncan Rogerson
4 Background to SJ5 project
- Reporting in December
- Network Strategy Workshop
- JCN, JISC
- Basis for funding provision
- No Comprehensive Spending Review (CSR) this time around
- Can't assume how much, or over what period
5 Background to SJ5 project
- If funding is released
- Procure new bandwidth in 2004
- Procure routers in 2005
- Deploy in 2005
- Hopefully before clearing
- Must be done by the end of 2005, when the SJ4 contract ends
6 Background to SJ5 project
- Key features
- Reliability and availability
- Multiple service streams
- End-to-end delivery
- http://www.ja.net/SJ5
7 Work in progress
- Meeting with UK MANs
- Investigate ways in which UKERNA and RNOs can collaborate to increase reliability
- Reliability questionnaire
- Published on SJ5 web page
- Responses coming in
- The more the better
- Informal investigations
- Telco market
- Carrier-class routers
8 Reliability
- Seen as a key feature of the new network
- JANET is necessary for core business activities
- End-to-end reliability is needed for emerging services
9 What do we mean by reliability?
- Maximum availability
- Works first time, every time
- Minimum disruption
- Everyone hates downtime
- Predictable performance
- Any time of the day
- Any application
10 What do we mean by reliability?
- Everything works for everyone
- From e-mail and FTP to voice and video
- Big institutions down to small colleges
- Web browsing to petabyte file transfers
- Difficult to provide with a one-size-fits-all network
11 What has SJ4 achieved?
- A huge improvement on SJ III
- No user-affecting faults on the core
- Few RN access link failures
- Improvement plans to rectify problems
- No congestion anywhere on the backbone
- Into regional networks
- To UK ISPs
- To Europe
- To the Internet
12 What has SJ4 achieved?
- Freedom to develop new services
- Not continually firefighting
- New services bring new problems
- Market test
- Needed to prove someone would sell us
- 16 times more bandwidth
- Flexible contract
- Upgrade path
- Improvement plans
- Development infrastructure
13 You're never alone with schizophrenia
- Network must be reliable
- SLA failure will displease funding bodies
- Failure of credibility with user community
- We need to develop new services
- Need to add value where ISPs can't
- Failure of credibility if we don't
14 Maximise Reliability
- Choose a stable OS and keep to it
- Deploy ultra-reliable hardware
- Fault-tolerant leased lines
- Eliminate single points of failure
- Strict change control
- Avoid complexity
- Keep your sticky mitts off it!
- At the core, RN and institution
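Slide 14 applies the same rules at the core, the RN and the institution. A minimal Python sketch of why all three matter, using illustrative availability figures that are assumptions rather than JANET measurements: segments in series multiply, so the end-to-end figure can be no better than the weakest segment. The function names are hypothetical.

```python
# Hypothetical model: a user's JANET service needs the core, the regional
# network (RN) and the institution's own access link all up at once.

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def series_availability(*segments: float) -> float:
    """Availability of components in series: every one must be up."""
    result = 1.0
    for a in segments:
        result *= a
    return result


def downtime_hours(availability: float) -> float:
    """Expected hours of outage per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Illustrative values only, not measured JANET statistics.
    core, rn, institution = 0.99999, 0.9995, 0.999
    end_to_end = series_availability(core, rn, institution)
    print(f"end-to-end availability: {end_to_end:.5f}")
    print(f"expected downtime: {downtime_hours(end_to_end):.1f} hours/year")
```

Even with a near-perfect core, the end-to-end figure is dominated by the RN and institutional segments.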
15 Issues-1
- Choose a stable OS and keep to it
- Need to deploy new features
- Mix of features causes instability
- Deploy ultra-reliable equipment
- Cost
- Current generation of routers is inadequate
- Not carrier class
16 Issues-2
- Fault-tolerant leased lines
- Cost
- Diverse routing vs diversity of supplier
- Sub-contracting
- Duct swapping
- Eliminate single points of failure
- Costly
- Adds complexity
17 Issues-3
- Avoid complexity
- Resilience and/or more adventurous services all increase complexity
- Keep your sticky mitts off it!
- Need to develop new services
- At-risk periods are unavoidable
- Strict change control
- Does slow down the rate of change
- Temptation to cut corners
18 Issues-4
- Change is difficult to implement on an operational network
- Multicast, QoS etc. taking months or years to roll out
- Problems during deployment will affect reliability
19 Squaring the circle
- High reliability and leading-edge facilities are mutually exclusive on a one-size-fits-all network infrastructure
- Possibility that JANET fails to deliver both the required reliability and the desired innovation unless the infrastructure is radically different
20 Multiple independent services
- Commodity best-effort IP service
- Stable
- Always there
- Minimal at-risk time
- Stability, stability, stability
21 Multiple independent services
- Offer a greater range of leading-edge facilities
- Early adoption of emerging technology
- Special application environment
- Guaranteed latency and jitter
- Very high bandwidth
- Managed bandwidth
22 Multiple independent services
- Will almost certainly need more attention
- More frequent at-risk work
- Less stability
- Trickle down
- New services can be brought into production
- Once all the bugs have been ironed out
23 Multiple independent services
- Not just about reliability
- Allows greater exploitation of the network
- Allows support of a diverse community
- Needs of the many are not outweighed by the needs of the few or the one
- Needs of the few or the one are not stifled by the needs of the many
24 Provision
- MPLS
- Tag switching of different services
- Still relies on a one-size-fits-all network
- Complex
- Untried in our community
- Not flexible enough
- May have a place if no other way can be found
- Potential use in legacy networks
25 Provision
- Multiple virtual links from a Telco
- Lambda or SDH services
- Use optical muxes to apportion bandwidth
- Could provide raw bandwidth without IP
- Very high data rates available
- 10 Gbit/s now
- 40 Gbit/s shortly
- Possibly by the time we procure
26 Provision
- Dark or managed fibre
- As above, but we own and operate the fibre and optical kit
- Long-term cost benefit
- If we can get away from the 5-year procurement cycle
- Absolute control
- Maintenance issues
27 Routers, The Next Generation
- NG routers: 99.999% availability
- Better hardware
- More reliable
- Very high capacity
- Cunning features
- Caching of routing and forwarding
- ATM-like QoS control
- Virtual Routers
- Modular software
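To put the 99.999% figure in context, a minimal Python sketch converting an availability target into the outage it allows per year. It assumes a 365-day year; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum outage per year that still meets the given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Three, four and five nines side by side.
    for target in (0.999, 0.9999, 0.99999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target:.3%} available -> {minutes:8.1f} min/year")
```

Five nines leaves only a few minutes a year, which leaves very little room for at-risk work on the router itself.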
28 In the real world
- Core network is very reliable; access links are the weakest link
- Telco fault handling remains poor
- Lies, damn lies and the art of parking
- Duplication of links
- RN to JANET
- UKERNA and RNOs will look at resilience
- Institution to RN
- Institutions need to talk to their RNO
29 In the real world
- Institutions will have to fund duplicate links
- More effective than shouting at a Telco
- Institutional power problems are still a significant issue
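Duplicating links helps because service is lost only when every path fails at once. A minimal Python sketch, assuming link failures are independent; the hypothetical shared-risk helper shows how a common duct or carrier (the diverse-routing problem from slide 16) caps the gain. All figures are illustrative.

```python
def parallel_availability(*links: float) -> float:
    """Availability of redundant links: service is lost only if all fail.

    Assumes the links fail independently of one another.
    """
    all_down = 1.0
    for a in links:
        all_down *= 1.0 - a
    return 1.0 - all_down


def with_shared_risk(redundant: float, shared: float) -> float:
    """Combine a redundant pair with anything both links share.

    A shared duct, carrier or PoP sits in series with the pair,
    so its availability multiplies the pair's.
    """
    return redundant * shared


if __name__ == "__main__":
    single = 0.998  # illustrative availability of one access link
    pair = parallel_availability(single, single)
    print(f"one link:              {single:.6f}")
    print(f"two independent links: {pair:.6f}")
    # Hypothetical shared duct that is available 99.95% of the time.
    print(f"two links in one duct: {with_shared_risk(pair, 0.9995):.6f}")
```

The second link only delivers its full benefit if it is genuinely diverse; otherwise the shared element sets the ceiling.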
30 Funding, funding, funding
- Everyone says they want more reliability
- Not a free good
- Who wants to pay for more reliability?
- Ah well, err, maybe in a few months.
- Need to put a price on reliability
- Cost to institution of a loss of JANET service
- Indicate spending that mitigates risk
- LAN or WAN or both or none?
- Results could influence the spend if not the purse
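Putting a price on reliability can start as simple arithmetic. A minimal Python sketch in which every figure (hourly cost of losing JANET, link availabilities, yearly cost of a second link) is a hypothetical placeholder that an institution would replace with numbers from its own risk analysis:

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years


def expected_outage_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of lost JANET service at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR * cost_per_hour


def net_annual_benefit(before: float, after: float,
                       cost_per_hour: float, annual_spend: float) -> float:
    """Outage cost avoided by a mitigation, minus what it costs each year."""
    avoided = (expected_outage_cost(before, cost_per_hour)
               - expected_outage_cost(after, cost_per_hour))
    return avoided - annual_spend


if __name__ == "__main__":
    # Hypothetical institution: every figure here is an assumption.
    cost_per_hour = 2_000.0                 # pounds lost per hour without JANET
    single_link = 0.998                     # availability of one access link
    duplicated = 1 - (1 - single_link) ** 2  # two independent links
    spend = 15_000.0                        # yearly cost of the second link
    benefit = net_annual_benefit(single_link, duplicated, cost_per_hour, spend)
    print(f"net annual benefit: £{benefit:,.0f}")
```

A positive result suggests the spend pays for itself on expected outage cost alone; a negative one argues for spending elsewhere, on the LAN or on power, for example.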
31 Reliability consultation
- Available now on SJ5 web page
- To determine how important JANET actually is to institutions and to users
- To recommend ways in which reliability may be improved in a scalable and affordable way
- Cost-benefit analysis to maximise value for money
- Responses by mid November please
32 Key questions
- Dependence on the network
- How tolerant is your institution to outages?
- Have you identified a cost of network downtime?
- Most responses don't give a figure
- Investment priority
- LAN vs WAN
- Risk Analysis
- Resilience
- Spending
33 Key questions
- Balance between reliability and development
- Regional network reliability
- At-risk sessions
- Institutions' risk analysis
34 Homework
- Talk to your institution, UKERNA and your RNO
- Respond to the questionnaire
- If possible, with an institutional remit
- Highly desirable
- Risk analysis
- Cost implications
- Better reliability comes at a price
- Cost-benefit analysis
- Smart spending
35 Questions