1. Wide Area Networking requirements and challenges at CERN for the LHC era
- Presented at the NEC'2001 conference, 17 September 2001, Varna, Bulgaria
- Olivier H. Martin
- CERN, IT Division
- September 2001
- Olivier.Martin@cern.ch
2. Presentation Outline
- CERN connectivity update
- Academic & Research networking in Europe
  - TEN-155
  - GEANT project update
- Interconnections with Academic & Research networks worldwide
  - STAR TAP
  - STAR LIGHT
  - DataTAG project
- Internet today
  - What it is
  - Growth
  - Technologies
  - Trends
- Challenges ahead
  - QoS
  - Gigabit/second file transfer
  - Security architecture
3. Main Internet connections at CERN
[Diagram: CERN's main external connections, with the link speeds shown: SWITCH (Swiss National Research Network), IN2P3, the World Health Organization (WHO, mission-oriented), TEN-155 towards Europe (39/155 Mbps), GEANT (1.25/2500 Mbps), a link to the USA, the CIXP for commercial peering, and general-purpose Internet connections (Europe/USA/world); speeds shown include 45 Mbps, 155 Mbps, 1 Gbps and 2.5 Gbps.]
4. CERN's Distributed Internet Exchange Point (CIXP)
- Telecom operators & dark fibre providers:
  - Cablecom, COLT, diAx, France Telecom, Global Crossing, GTS/EBONE, KPNQwest, LTT, Deutsche Telekom/Multilink, MCI/WorldCom, SIG, Sunrise, Swisscom (Switzerland), Swisscom (France), Thermelec.
- Internet Service Providers include:
  - Infonet, AT&T Global Network Services (formerly IBM), Cablecom, C&W, Carrier1, Colt, DFI, Deckpoint, Deutsche Telekom, diAx (dplanet), Easynet, Ebone/GTS, EUnet/KPNQwest, France Telecom OpenTransit, Global-One, Globix, HP, INSnet/Wisper, InterNeXt, ISDnet/Ipergy, IS Internet Services (ISION), LTT, Madge.web, Network Communications (NWC), PSI Networks (IProlink), MCI/WorldCom, Petrel, Renater, Sita/Equant, Sunrise, Swisscom IP-Plus, SWITCH, TEN-155, Urbanet, VTX, UUNET.
[Diagram: multiple ISPs and telecom operators interconnect at the CIXP; the CERN internal network sits behind the CERN firewall.]
5. CERN co-location status (Chicago)
[Diagram: CERN co-location at the Qwest PoP in Chicago: CERN (Geneva) reaches the CERN-USA point of presence (CERNH9 router, LS1010 switch) over a KPNQwest STM-1; onward STM-1 and T3 circuits connect to Qwest-IP (21 Mbps over T3), ESnet and the STAR TAP.]
6. Academic & Research Networking in Europe
- Focus on Research & Education, but also includes access to the commodity Internet.
- The TERENA compendium of NRENs is an excellent source of information (e.g. funding, focus, budget, staff).
- Hierarchical structure, with a pan-European backbone co-funded by the European Union (EU) interconnecting the various national networks.
- TEN-155 (Trans-European 155 Mbps):
  - 155 Mbps ATM core initially, recently upgraded to 622 Mbps Packet over SONET (POS); still a strong ATM focus.
  - Managed Bandwidth Service (MBS) to support the special needs of European Union projects.
  - Will terminate at the end of November 2001.
  - Map available at http://www.dante.net/ten-155/ten155net.gif
7. (Figure-only slide; no transcript)
8. The GEANT Project
- 4-year project started on December 1, 2000:
  - call for tender issued June 2000, closed September 2000, adjudication made in June 2001,
  - first parts to be delivered in September 2001.
- Provisional cost estimate: 233 MEuro.
- EC contribution: 87 MEuro (37%), of which 62 MEuro will be spent during year 1 in order to keep the NRNs' contributions stable (i.e. 36 MEuro/year).
- Several 2.5/10 Gbps rings (mostly over unprotected lambdas), three main telecom providers (Colt/DT/Telia):
  - West ring (2 half rings) provided by COLT (UK-SE-DE-IT-CH-FR),
  - East legs and miscellaneous circuits provided by Deutsche Telekom (e.g. CH-AT, CZ-DE).
9. (Figure-only slide; no transcript)
10. The GEANT Project (cont.)
- Access mostly via 2.5 Gbps circuits.
- Routers: Juniper.
- CERN:
  - CERN & SWITCH will share a 2.5 Gbps access.
  - CERN should purchase 50% of the total capacity (i.e. 1.25 Gbps).
  - It is also expected that the access bandwidth to GEANT will double every 12-18 months.
  - The Swiss PoP of GEANT will be located at the Geneva airport (Ldcom @ Halle de Fret).
11. The GEANT Project (cont.)
- Connection to other world regions:
  - in principle via core nodes only; together they will form a European Distributed Access (EDA) point, conceptually similar to the STAR TAP.
- Projected services & applications:
  - standard (i.e. best effort) IP service,
  - premium IP service (diffserv's Expedited Forwarding (EF)),
  - guaranteed capacity service (GCS), using diffserv's Assured Forwarding (AF),
  - Virtual Private Networking (VPN), layer 2 (?) & layer 3,
  - native multicast.
- Various Grid projects (e.g. DataGrid) expected to be leading applications.
- Use of MPLS anticipated for traffic engineering, VPN (e.g. DataGrid, IPv6).
12. STAR TAP
- Science, Technology And Research Transit Access Point.
- International connection point for Research and Education networks at the Ameritech NAP in Chicago.
- Project goal: to facilitate the long-term interconnection and interoperability of advanced international networking in support of applications, performance measuring, and technology evaluations.
- Hosts the 6TAP, the IPv6 Meet Point.
- http://www.startap.net/
13. STAR TAP (cont.)
- One of three Internet eXchange points provided by AADS (Ameritech Advanced Data Services) out of a huge ATM switch, namely:
  - the Chicago NAP,
  - MREN (Metropolitan Research and Education Network), the local Internet2 GigaPoP,
  - STAR TAP.
- A by-product is a full mesh of ATM VCs with ALL the connected ISPs, thus making it easy to establish peerings and/or to buy commercial Internet services (e.g. NAP.NET).
- No transit issues because of its non-distributed nature.
14. (Figure-only slide; no transcript)
15. StarLight: The Optical STAR TAP
[Diagram: StarLight interconnections among SURFnet, STAR TAP, Purdue, NU Evanston (iCAIR), IUPUI, NU Chicago, IU Bloomington, CERN, I-WIRE/Optical MREN and CAnet4/Bell Nexxia (Chicago), via OC-12 and Gigabit Ethernet links. This diagram is subject to change.]
16. The STAR LIGHT
- Next generation STAR TAP, with the following main distinguishing features:
  - neutral location (Northwestern University),
  - 1/10 Gigabit Ethernet based,
  - multiple local loop providers,
  - optical switches for advanced experiments,
  - GMPLS, OBGP.
- The STAR LIGHT will provide a 2x622 Mbps ATM connection to the STAR TAP.
- Started in July 2001.
- Also hosting other advanced networking projects in Chicago & the State of Illinois.
17. StarLight Connections
- STAR TAP (AADS NAP) is connected via two OC-12c ATM circuits, now operational.
- The Netherlands (SURFnet) is bringing two OC-12c POS circuits from Amsterdam to StarLight on September 1, 2001, and a 2.5 Gbps lambda to StarLight on September 15, 2001.
- Abilene will soon connect via GigE.
- Canada (CAnet3/4) is connected via GigE, soon 10GigE.
- I-WIRE, a State-of-Illinois-funded dark-fiber multi-10GigE DWDM effort involving Illinois research institutions, is being built; 36 strands to the Qwest Chicago PoP are in.
- NSF Distributed Terascale Facility (DTF): a 4x10GigE network being engineered by PACI and Qwest.
- NORDUnet will be using StarLight's OC-12 ATM connection.
- CERN should come in March 2002 with an OC-12 from Geneva. A second 2.5 Gbps research circuit is also expected during the second half of 2002 (EU DataTAG project).
18. StarLight Infrastructure
- Soon, StarLight will be an optical switching facility for wavelengths.
19. Evolving StarLight Optical Network Connections
[Diagram: planned optical connections radiating from StarLight (Chicago): Asia-Pacific, SURFnet & CERN, CAnet4 (Vancouver, Seattle), Portland, San Francisco, NYC, U Wisconsin, PSC, IU, NCSA, Caltech, SDSC, Atlanta, AMPATH, and the DTF 40 Gb network; local partners include ANL, UIC, NU, UC, IIT and MREN.]
20. DataTAG project
- Main aims:
  - ensure maximum interoperability between USA and EU Grid projects,
  - transatlantic test bed for advanced, Grid-applied network research,
  - 2.5 Gbps circuit between CERN and StarLight (Chicago).
- Partners:
  - PPARC (UK),
  - University of Amsterdam (NL),
  - INFN (IT),
  - CERN (coordinating partner).
- Negotiations with the EU are well advanced:
  - expected project start: 1/1/2002,
  - duration: 2 years.
21. DataTAG project
[Diagram: the DataTAG 2.5 Gbps circuit from Geneva to StarLight (Chicago), alongside connections to Abilene, ESnet, MREN, STAR TAP and New York.]
22. CERN-USA access requirements (2002)
[Diagram: projected CERN-USA connectivity for 2002: a 622 Mbps production path from CERN to a CERN PoP in the USA (2xOC-12), with onward connections to Abilene, vBNS, ESnet, Canada, Japan, the commodity Internet, STAR TAP and MREN (ANL, FNAL); T3 and E3 circuits via the CIXP; plus the 2.5 Gbps DataTAG circuit to the StarLight co-location facility (NWU).]
23. (Figure-only slide; no transcript)
24. Internet Backbone Speeds
[Chart: evolution of Internet backbone speeds (Mbps): T1 lines, T3 lines, ATM VCs, OC-3c, OC-12c, IP over lambda.]
25. High Speed IP Network Transport
[Diagram: the B-ISDN-era protocol stack (IP over ATM over SONET/SDH over optical, plus signalling), with multiplexing, protection and management at every layer; removing layers yields higher speed and lower cost, complexity and overhead.]
26. (Figure-only slide; no transcript)
27. (Figure-only slide; no transcript)
28. Transmission Systems of the Recent Past
[Diagram: single-channel system: an electronic multiplexer feeds a transmitter (DFB laser); opto-electronic regenerative repeaters sit every 30-50 km along the fiber; a regenerative receiver and electronic demultiplexer deliver the low-rate data at the far end.]
- Single channel operation.
- Opto-electronic regenerative repeaters: one per 50 km per fiber.
- 30-50 km repeater spacing.
- Capacity upgrades: increased speed.
- Still found in legacy network systems.
29. Today's Transmission System
[Diagram: multi-channel WDM system: per-wavelength transmitters (lambda-1 ... lambda-n) feed an optical multiplexer; optical amplifiers sit every 80-140 km, with a regenerative repeater in mid-span; an optical demultiplexer and per-wavelength receivers terminate the link.]
- Multi-channel WDM operation.
- One amplifier supports many channels.
- 80-140 km amplifier (repeater) spacing; regeneration required every 200-300 km.
- Capacity upgrades: adding wavelengths (channels) & increasing speeds.
- However, regeneration is still very expensive and fixes the optical line rate.
30. Next Generation / The Now Generation
[Diagram: as above, but optically transparent over 1600 km: per-wavelength transmitters (lambda-1 ... lambda-n), optical multiplexer, amplifiers every 80-140 km, optical demultiplexer and receivers, with no regeneration inside the 1600 km span.]
- Multi-channel WDM operation.
- One amplifier supports many channels.
- 80-140 km amplifier (repeater) spacing; regeneration required only every 1600 km.
- Capacity upgrades: adding wavelengths (channels) & increasing speeds.
- An over-1000 km optically transparent research network has been tested on the Qwest network.
31. IAB Workshop
- The Internet Architecture Board (IAB) held a workshop on the state of the Internet network layer in July 1999; a number of problem areas and possible solutions were identified:
  - Network/Port Address Translators (NAT/PAT),
  - Application Level Gateways (ALG) and their impact on existing and future Internet applications,
  - end-to-end transport security requirements (IPSEC),
  - transparency (e.g. H.323),
  - Realm Specific IP (RSIP),
  - mobility (a completely different set of protocol requirements),
  - IPv6,
  - routing (growth of the routing table, route convergence),
  - DNS (renumbering).
32. Loss of End-to-End Transparency
- Loss of end-to-end transparency due to the proliferation of:
  - firewalls, NATs, PATs,
  - web caches, content engines, Content Distribution Networks (CDN),
  - application level gateways, proxies, etc.
- Cons:
  - violation of the end-to-end transport principle,
  - possible alteration of the data,
  - only partially fits the client-server model (i.e. the server must be outside).
- Pros:
  - better performance, service differentiation, SLAs,
  - cheaper to deliver services to a large number of recipients, etc.
33. Client/Server Architecture is Breaking Down
[Diagram: two private address realms on either side of the global addressing realm.]
- For web-based transactions, it is sufficient to allow clients in private address spaces to access servers in the global address space.
- For telephones and instant messaging, you need to use an address when you call them; such endpoints are therefore servers in a private realm.
34. Several major issues
- Quality of Service (QoS).
- High performance (i.e. wire speed) file transfer, end to end:
  - Will CDN technology help?
  - Is the evolution towards edge services likely to affect global GRID services?
- Impact of security.
- Internet fragmentation, one vs. several Internets (e.g. GPRS top level domain).
- Transition to IPv6 and long term coexistence between IPv4 & IPv6.
35. Quality of Service (QoS)
- Two approaches proposed by the IETF:
  - integrated services (intserv):
    - intserv is an end-to-end architecture based on RSVP that has poor scaling properties.
  - differentiated services (diffserv):
    - diffserv is a newer and simpler proposal that has much better chances of getting deployed, in some real Internet Service Provider environments at least (a host-side marking sketch follows this slide),
    - even though diffserv has good scaling properties and takes the right approach, namely that most of the complexity must be pushed to the edges of the network, there are considerable problems with large diffserv deployments.
- ATM is far from dead, but has serious scaling difficulties (e.g. TEN-155, Qwest/ATM).
- MPLS is extremely promising; today it looks like it is where the future lies (including ATM AAL5 emulation!).
36. Quality of Service (QoS)
- QoS is an increasing nightmare as the understanding of its implications grows:
  - delivering QoS at the edge, and only at the edge, is not sufficient to guarantee low-jitter, delay-bound communications,
  - therefore complex functionality must also be introduced in Internet core routers:
    - is it compatible with ASICs?
    - is it worthwhile?
- Is MPLS an adequate and scalable answer?
- Is circuit-oriented technology (e.g. dynamic wavelengths) appropriate? If so, for which scenarios?
37. Internet Backbone Technologies (MPLS/1)
- MPLS (Multi-Protocol Label Switching) is an emerging IETF standard that is gaining impressive acceptance, especially with the traditional telecom operators and the large Internet Tier 1s.
- Recursive encapsulation mechanism that can be mapped over any layer 2 technology (e.g. ATM, but also POS).
- Departure from the destination-based routing that has been plaguing the Internet since the beginning.
- Fast packet switching performed on source & destination labels, as well as ToS. Like ATM VP/VC, MPLS labels only have local significance.
- Better integration of layers 2 and 3 than in an IP over ATM network, through the use of RSVP or LDP (Label Distribution Protocol).
- Ideal for traffic engineering, QoS routing, VPNs, even IPv6.
38. Internet Backbone Technologies (MPLS/2)
- MPLS provides 2 levels of VPNs:
  - layer 3 (i.e. conventional VPNs),
  - layer 2 (i.e. encapsulation of various layer 2 frame formats), e.g.:
    - Ethernet,
    - ATM,
    - PPP,
    - ...
- MPLS can also be used for circuit and/or wavelength channel restoration:
  - MPLambdaS, GMPLS (Generalized MPLS).
39. Gigabit/second networking
- The start of a new era:
  - very rapid progress towards 10 Gbps networking in both the Local (LAN) and Wide Area (WAN) networking environments is being made,
  - 40 Gbps is in sight on WANs, but what comes after?
- The success of the LHC computing Grid critically depends on the availability of Gbps links between CERN and the LHC regional centers.
- What does it mean?
  - In theory:
    - a 1 GB file transferred in 11 seconds over a 1 Gbps circuit (*),
    - a 1 TB file transfer would still require 3 hours,
    - and a 1 PB file transfer would require 4 months.
  - In practice, major transmission protocol issues will need to be addressed (a worked check of these figures follows this slide).
  - (*) according to the 75% empirical rule.
40. Very high speed file transfer (1)
- High performance switched LAN assumed:
  - requires time & money.
- High performance WAN also assumed:
  - also requires money, but is becoming possible,
  - very careful engineering is mandatory.
- Will remain very problematic, especially over high bandwidth*delay paths:
  - might force the use of Jumbo Frames because of interactions between TCP/IP and link error rates,
  - could possibly conflict with strong security requirements (i.e. throughput, handling of TCP/IP options (e.g. window scaling)); a buffer-sizing sketch follows this slide.
41. Very high speed file transfer (2)
- The following formula was proposed by Matt Mathis (PSC) in "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm" to approximate the maximum TCP throughput under periodic packet loss:
  - (MSS/RTT) * (1/sqrt(p)),
  - where MSS is the maximum segment size (1460 bytes, in practice) and p is the packet loss rate.
- Are TCP's "congestion avoidance" algorithms compatible with high speed, long distance networks?
  - The "cut transmit rate in half on a single packet loss, then increase the rate additively (1 MSS per RTT)" algorithm may simply not work.
- New TCP/IP adaptations may be needed in order to better cope with "lfn"s, e.g. TCP Vegas (a numerical form of the formula is sketched below).
42. Very high speed file transfer (3)
- The Mathis formula shows the extreme variability of achievable TCP throughput in the presence of:
  - packet loss, even at small rates (i.e. less than 1%),
  - small packets vs. large packets (e.g. Jumbo frames),
  - delay (RTT): "long fat networks" (lfn), i.e. those with large bandwidth*delay products, hence the need for very large windows:
    - 3.3 MB over a 155 Mbps link to Caltech with 170 ms RTT,
    - and 53 MB over 2.5 Gbps to Caltech!
- Consider a 10 Gbps link with an RTT of 100 ms and a TCP connection operating at 10 Gbps:
  - a packet drop (due to a link error) will cut the rate to 5 Gbps; it will take 4 MINUTES for TCP to ramp back up to 10 Gbps.
- In order to stay in the regime of the TCP equation at 10 Gbit/s for a single stream of 1460-byte segments, a packet loss rate of about 1E-10 is required:
  - i.e. you should lose a packet no more than about once every five hours (both figures are checked numerically below).
43. Acceptable link error rates
(Table-only slide; content not transcribed.)
44. Very high speed file transfer (tentative conclusions)
- TCP/IP fairness only exists between similar flows, i.e.:
  - similar duration,
  - similar RTTs.
- TCP/IP congestion avoidance algorithms need to be revisited (e.g. Vegas rather than Reno/NewReno).
- Current ways of circumventing the problem, e.g. multi-stream & parallel sockets (see the sketch after this list): just bandages, or the practical solution to the problem?
- Web100, a 3 MUSD NSF project, might help enormously:
  - better TCP/IP instrumentation (MIB),
  - self-tuning,
  - tools for measuring performance,
  - improved FTP implementations.
- Non-TCP/IP based transport solutions? Use of Forward Error Correction (FEC), Explicit Congestion Notification (ECN) rather than active queue management techniques (RED/WRED)?
45. CERN's new firewall technology and topology
[Diagram: firewall built from Cabletron SSRs, a Cisco PIX and a Cisco RSP7000, plus a security monitor and a Dxmon FE / FDDI bridge, interconnected with Fast Ethernet, 100/1000 Ethernet and Gigabit Ethernet links between the CERN internal network and the outside.]