Title: ESnet: DOE's Science Network
1. ESnet: DOE's Science Network (GNEW, March 2004)
- William E. Johnston, ESnet Manager and Senior Scientist
- Michael S. Collins, Stan Kluz, Joseph Burrescia, and James V. Gagliardi, ESnet Leads
- and the ESnet Team
- Lawrence Berkeley National Laboratory
2. ESnet Provides
- High bandwidth backbone and connections for Office of Science Labs and programs
- High bandwidth peering with the US, European, and Japanese Research and Education networks
- SecureNet (DOE classified R&D) as an overlay network
- Science services - Grid and collaboration services
- User support - ESnet owns all network trouble tickets (even from end users) until they are resolved
  - one-stop shopping for user network problems
  - 7x24 coverage
  - both network and science services problems
3. ESnet Connects DOE Facilities and Collaborators
[Map: the ESnet IP backbone - a QWEST ATM / optical ring with hubs (e.g. SEA, SNV) serving 42 end user sites - together with its peering points: Abilene (multiple locations), CA*net4, CERN, MREN, Netherlands, Russia, StarTap, Taiwan (ASCC), Japan, the Chi NAP and NY-NAP, MAE-E, MAE-W, PAIX-E, PAIX-W, FIX-W, PNWG, and Equinix.]
Link legend: International (high speed); OC192 (10 Gb/s optical); OC48 (2.5 Gb/s optical); Gigabit Ethernet (1 Gb/s); OC12 ATM (622 Mb/s); OC12; OC3 (155 Mb/s); T3 (45 Mb/s); T1-T3; T1 (1.5 Mb/s).
4. ESnet is Driven by the Needs of DOE Science
- Workshop held August 13-15, 2002, organized by the Office of Science
  - Mary Anne Scott (chair), Dave Bader, Steve Eckstrand, Marvin Frazier, Dale Koelling, Vicky White
- Workshop panel chairs: Ray Bair and Deb Agarwal; Bill Johnston and Mike Wilde; Rick Stevens; Ian Foster and Dennis Gannon; Linda Winkler and Brian Tierney; Sandy Merola and Charlie Catlett
- Focused on the science requirements that drive
  - Advanced Network Infrastructure
  - Middleware Research
  - Network Research
  - Network Governance Model
- Available at www.es.net/research
5. Eight Major DOE Science Areas Analyzed at the August '02 Workshop
(Table columns: Discipline | Vision for the Future | Process of Science | Characteristics that Motivate High Speed Nets | Networking Requirements | Middleware Requirements)
- Climate (near term) | Analysis of model data by selected communities that have high speed networking (e.g. NCAR and NERSC) | A few data repositories, many distributed computing sites | NCAR 20 TBy, NERSC 40 TBy, ORNL 40 TBy | Authenticated data streams for easier site access through firewalls | Server-side data processing (computing and cache embedded in the net); information servers for global data catalogues
- Climate (5 yr) | Enable the analysis of model data by all of the collaborating community | Add many simulation elements/components as understanding increases | 100 TBy / 100 yr generated simulation data, 1-5 PBy/yr (just at NCAR); distribute large chunks of data to major users for post-simulation analysis | Robust access to large quantities of data | Reliable data/file transfer (across system/network failures)
- Climate (5 yr) | Integrated climate simulation that includes all high-impact factors | Add many diverse simulation elements/components, including from other disciplines - this must be done with distributed, multidisciplinary simulation | 5-10 PBy/yr (at NCAR) | Robust networks supporting distributed simulation - adequate bandwidth and latency for remote analysis and visualization of massive datasets; quality of service guarantees for distributed simulations | Virtualized data to reduce storage load; virtual data catalogues and work planners for reconstituting the data on demand
6. Evolving Qualitative Requirements for Network Infrastructure
- In the near term (1-3 yrs), applications need high bandwidth
- In 2-4 yrs, the requirement is for high bandwidth and QoS
- In 3-5 yrs, the requirement is for high bandwidth and QoS plus network-resident cache and compute elements
- In 4-7 yrs, the requirement is for high bandwidth, QoS, network-resident cache and compute elements, and robust bandwidth (multiple paths)
[Diagram: instrument (I), compute (C), and storage (S) elements linked by guaranteed bandwidth paths; end-to-end rates grow from 1-40 Gb/s in the near term to 100-200 Gb/s at 4-7 yrs, with cache-and-compute (CC) elements migrating into the network.]
7. Evolving Quantitative Science Requirements for Networks
Science Area | Today End2End Throughput | 5 years End2End Throughput | 5-10 Years End2End Throughput | Remarks
High Energy Physics | 0.5 Gb/s | 100 Gb/s | 1000 Gb/s | high bulk throughput
Climate (Data & Computation) | 0.5 Gb/s | 160-200 Gb/s | N x 1000 Gb/s | high bulk throughput
SNS NanoScience | not yet started | 1 Gb/s | 1000 Gb/s | QoS for control channel; remote control and time critical throughput
Fusion Energy | 0.066 Gb/s (500 MB/s burst) | 0.198 Gb/s (500 MB / 20 sec. burst) | N x 1000 Gb/s | time critical throughput
Astrophysics | 0.013 Gb/s (1 TBy/week) | NxN multicast | 1000 Gb/s | computational steering and collaborations
Genomics (Data & Computation) | 0.091 Gb/s (1 TBy/day) | 100s of users | 1000 Gb/s | QoS for control channel; high throughput and steering
8. New Strategic Directions to Address Needs of DOE Science
- Workshop held June 3-5, 2003, organized by the ESSC
  - Workshop chair: Roy Whitney, JLAB; report editors: Roy Whitney, JLAB, and Larry Price, ANL
- Workshop panel chairs: Wu-chun Feng, LANL; William Johnston, LBNL; Nagi Rao, ORNL; David Schissel, GA; Vicky White, FNAL; Dean Williams, LLNL
- Focused on what is needed to achieve the science-driven network requirements of the previous workshop
- Both workshop reports are available at www.es.net/research
9. ESnet Strategic Directions
- Developing a 5 yr. strategic plan for how to provide the required capabilities identified by the workshops
- Between DOE Labs and their major collaborators in the University community we must address
  - Scalable bandwidth
  - Reliability
  - Quality of Service
- Must also address an appropriate set of Grid and human-collaboration supporting middleware services
10. ESnet Connects DOE Facilities and Collaborators
[Repeats the backbone map and link legend from slide 3.]
11. While ESnet Has One Backbone Provider, There Are Many Local Loop Providers to Get to the Sites
[Map: local loops to sites such as LBNL/CalREN2, GTN, DOE-NNSA, and Pantex, color-coded by provider: Qwest contracted; Touch America (bankrupt); MCI contracted/owned; site contracted/owned.]
12. ESnet Logical Infrastructure Connects the DOE Community With its Collaborators
ESnet provides complete access to the Internet by managing the full complement of Global Internet routes (about 150,000) at 10 general/commercial peering points, plus high-speed peerings with Abilene and the international networks.
13. ESnet Traffic
Annual traffic growth over the past five years has increased from 1.7x per year to just over 2.0x per year.
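For a sense of what that compounding means, here is a small Python projection; the 100 TBy/month baseline is a made-up placeholder, not an ESnet measurement:

```python
# Illustrative compound-growth projection using the growth factors above;
# the baseline volume is a hypothetical placeholder, not an ESnet figure.
baseline_tby_per_month = 100.0

for factor in (1.7, 2.0):
    five_year = baseline_tby_per_month * factor ** 5
    print(f"{factor}x/yr for 5 years: {five_year:,.0f} TBy/month")
# 1.7x/yr -> ~1,420 TBy/month; 2.0x/yr -> 3,200 TBy/month
```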
14. Who Generates Traffic, and Where Does it Go?
[Figure: ESnet Inter-Sector Traffic Summary, Jan 2003 - traffic flows between the DOE sites, the commercial sector, the R&E networks (via the peering points), and the international networks, each shown as a percentage of total ESnet ingress (green) or egress (blue) traffic, including a flow labeled "DOE collaborator traffic, incl. data".]
- DOE is a net supplier of data, because DOE facilities are used by university and commercial researchers as well as by DOE researchers
- ESnet Appropriate Use Policy (AUP): all ESnet traffic must originate and/or terminate at an ESnet site (no transit traffic is allowed)
  - e.g., a commercial site cannot exchange traffic with an international site across ESnet
  - this is effected via routing restrictions (a sketch of the rule follows below)
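A minimal Python sketch of that transit rule as a predicate; the sector labels are illustrative, not ESnet's actual policy configuration:

```python
# A minimal sketch of the AUP transit rule; sector labels are illustrative.
def aup_allows(src_sector: str, dst_sector: str) -> bool:
    """Traffic must originate and/or terminate at an ESnet (DOE) site."""
    return "doe-site" in (src_sector, dst_sector)

assert aup_allows("doe-site", "commercial")           # allowed
assert aup_allows("international", "doe-site")        # allowed
assert not aup_allows("commercial", "international")  # transit - blocked
```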
15. ESnet Site Architecture
[Diagram: the ESnet backbone is an optical fiber ring with hubs (backbone routers and local loop connection points) at New York (AOA), Chicago (CHI), Washington, DC (DC), Atlanta (ATL), El Paso (ELP), and Sunnyvale (SNV); the hubs have lots of connections (42 in all). A local loop runs from a hub to each site, terminating on an ESnet border router; the backbone, hubs, local loop, and border router are ESnet's responsibility, while the site gateway router, DMZ, and site LAN are the site's responsibility.]
16. SecureNet
- SecureNet connects 10 NNSA (Defense Programs) Labs
- Essentially a VPN with special encrypters
  - The NNSA sites exchange encrypted ATM traffic
  - The data is unclassified when ESnet gets it, because it is encrypted before it leaves the NNSA sites with an NSA-certified encrypter
- Runs over the ESnet core backbone as a layer 2 overlay - that is, the SecureNet encrypted ATM is transported over ESnet's Packet-over-SONET infrastructure by encapsulating the ATM in MPLS using Juniper CCC
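The layering can be pictured with a small Python sketch; this is conceptual only, not a protocol implementation:

```python
# A conceptual picture (not a real protocol stack) of the SecureNet layering:
# the classified payload is encrypted at the NNSA site, carried as ATM,
# wrapped in MPLS by Juniper CCC, and transported over Packet-over-SONET.
def securenet_stack(payload: str) -> str:
    for layer in ("NSA-certified encryption", "ATM", "MPLS (Juniper CCC)",
                  "Packet-over-SONET"):
        payload = f"{layer}[{payload}]"   # wrap innermost layer first
    return payload

print(securenet_stack("classified payload"))
# Packet-over-SONET[MPLS (Juniper CCC)[ATM[NSA-certified encryption[...]]]]
```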
17. SecureNet Mid 2003
[Map: primary and backup SecureNet paths across the ESnet hubs (SNV, ELP, ATL, DC, AOA, CHI), connecting GTN, LLNL, SNLL, ORNL, KCP, DOE-AL, Pantex, LANL, SNLA, and SRS.]
SecureNet encapsulates its payload (encrypted ATM) in MPLS using the Juniper router Circuit Cross Connect (CCC) feature.
18. IPv6-ESnet Backbone
[Map: the IPv6 overlay backbone across Sunnyvale, Albuquerque, El Paso, Atlanta, Chicago, New York, and DC, reaching LBL, SLAC, ANL, FNAL, and BNL, with a distributed 6TAP at StarLight and PAIX and IPv6 peerings (18, 9, and 7 peers at the exchange points).]
- IPv6 is the next generation Internet protocol, and ESnet is working on addressing deployment issues
  - one big improvement: IPv4 has 32-bit addresses (about 4x10^9, which we are running short of), while IPv6 has 128-bit addresses (about 3.4x10^38, which we are not ever likely to run short of); the arithmetic is checked below
  - another big improvement is native support for encryption of data
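The address-space arithmetic behind that comparison, in Python:

```python
# The address-space arithmetic behind the IPv4 vs. IPv6 comparison above.
ipv4_addresses = 2 ** 32    # 32-bit addresses
ipv6_addresses = 2 ** 128   # 128-bit addresses

print(f"IPv4: {ipv4_addresses:.2e} addresses")  # about 4.29e+09
print(f"IPv6: {ipv6_addresses:.2e} addresses")  # about 3.40e+38
```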
19. Operating Science Mission Critical Infrastructure
- ESnet is a visible and critical piece of DOE science infrastructure
  - if ESnet fails, 10s of thousands of DOE and University users know it within minutes if not seconds
- This requires high reliability and high operational security in the ESnet operational services - the systems that are integral to the operation and management of the network
  - Secure and redundant mail and Web systems are central to the operation and security of ESnet
    - trouble tickets are handled by email
    - engineering communication is by email
    - the engineering database interface is via Web
  - Secure network access to hub equipment
  - Backup secure telephony access to hub equipment
  - 24x7 help desk (joint with NERSC)
  - 24x7 on-call network engineer
20. Disaster Recovery and Stability
- The network operational services must be kept available even if, e.g., the West Coast is disabled by a massive earthquake
- ESnet engineers are in four locations across the country
- Full and partial engineering databases and network operational service replicas are kept in three locations
- Telephone modem backup access to all hub equipment
- All core network hubs are located in commercial telecommunication facilities with high physical security and backup power
21. Disaster Recovery and Stability
- Engineers, a 24x7 NOC, and generator-backed power supporting
  - Spectrum (network management system)
  - DNS (name-to-IP-address translation)
  - engineering, load, and configuration databases
  - public and private Web
  - e-mail (server and archive)
  - PKI certificate repository and revocation lists
  - collaboratory authorization service
- Remote engineers with partial duplicate infrastructure (DNS), plus duplicate infrastructure elsewhere (currently deploying full replication of the NOC databases and servers and the Science Services databases)
- The ESnet backbone has operated without interruption through
  - the N. Calif. power blackout of 2000
  - the 9/11 attacks
  - the Sept. 2003 NE states power blackout
22. Maintaining Science Mission Critical Infrastructure in the Face of Cyberattack
- A phased security architecture is being implemented to protect the network and the sites
  - The phased response ranges from blocking certain site traffic to a complete isolation of the network, which allows the sites to continue communicating among themselves in the face of the most virulent attacks
  - Separates ESnet core routing functionality from external Internet connections by means of a peering router that can have a policy different from the core routers
  - Provides a rate-limited path to the external Internet that will ensure site-to-site communication during an external denial of service attack
  - Provides "lifeline" connectivity for downloading patches, exchanging e-mail, and viewing web pages (i.e. e-mail, dns, http, https, ssh, etc.) with the external Internet prior to full isolation of the network
23. Phased Response to Cyberattack
- Lab first response: filter incoming traffic at the Lab's ESnet gateway router
- ESnet first response: filters to assist a site
- ESnet second response: filter traffic from outside of ESnet
- ESnet third response: shut down the main peering path and provide only a limited bandwidth path for specific lifeline services
[Diagram: attack traffic entering through the peering router toward a Lab (e.g. LBNL), with X marks showing where each successive filter blocks it - at the Lab gateway and border routers, at the ESnet router, and finally at the peering router.]
The Sapphire/Slammer worm infection created almost a Gb/s traffic spike on the ESnet backbone until filters were put in place (both into and out of sites) to damp it out.
24. Future Directions - the 5 yr Network Strategy
- Elements
  - University connectivity
  - Scalable and reliable site connectivity
  - Provisioned circuits for high-impact science bandwidth
  - Close collaboration with the network R&D community
  - Services supporting science (Grid middleware, collaboration services, etc.)
25. 5 yr Strategy - Near Term Goal 1
- Connectivity between any DOE Lab and any major university should be as good as ESnet connectivity between DOE Labs and Abilene connectivity between universities
  - Partnership with I2/Abilene
  - Multiple high-speed peering points
  - Routing tailored to take advantage of this
  - Latency and bandwidth from DOE Lab to university should be comparable to intra-ESnet or intra-Abilene
  - Continuous monitoring infrastructure to verify this
26. 5 yr Strategy - Near Term Goal 2
- Connectivity between ESnet and R&D nets is a critical issue from the Roadmap workshop
  - UltraScienceNet and NLR for starters
  - Reliable, high bandwidth cross-connects
  - I-WIRE ring between the Qwest ESnet Chicago hub and StarLight
  - This is also critical for DOE lab connectivity to the DOE-funded LHCNet 10 Gb/s link to CERN
    - Both LHC Tier 1 sites in the US (Atlas and CMS) are at DOE Labs
  - ESnet ring between the Qwest ESnet Sunnyvale hub and the Level 3 Sunnyvale hub that houses the West Coast POP for NLR and UltraScienceNet
27. 5 yr Strategy - Near-Medium Term Goal
- Scalable and reliable site connectivity
  - Fiber / lambda ring based Metropolitan Area Networks
  - Preliminary engineering study completed for the San Francisco Bay Area and the Chicago area
  - Proposal submitted
  - At least one of these is very likely to be funded this year
- High-impact science bandwidth via provisioned circuits
28. ESnet Future Architecture
- Migrate site local loops to ring-structured Metropolitan Area Networks, and regional nets in some areas
- Goal is local rings that, like the backbone, provide multiple paths
- Dynamic provisioning of private circuits in the MAN and through the backbone to provide high impact science connections (a sketch of such a request follows this list)
- This should allow high bandwidth circuits to go around site firewalls to connect specific systems. The circuits are secure and end-to-end, so if the sites trust each other and have compatible security policies, they should allow direct connections, e.g. HPSS <-> HPSS
- Partnership with DOE UltraNet, Internet2 HOPI, and National Lambda Rail
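As a sketch of what such a provisioning request might carry, here is a hypothetical record in Python; every field name is invented for illustration:

```python
# A hypothetical request record for a dynamically provisioned MAN/backbone
# circuit of the kind described above; all field names are invented.
from dataclasses import dataclass

@dataclass
class CircuitRequest:
    src_system: str        # e.g. an HPSS endpoint at one Lab
    dst_system: str        # e.g. an HPSS endpoint at a trusted partner Lab
    bandwidth_gbps: float  # guaranteed bandwidth for the circuit
    start_utc: str         # requested start time
    duration_hours: float  # how long the circuit is held
    transport: str = "MPLS"  # initially MPLS paths, eventually lambda paths

req = CircuitRequest("hpss.lab-a.example", "hpss.lab-b.example",
                     bandwidth_gbps=10.0, start_utc="2004-06-01T00:00Z",
                     duration_hours=48.0)
print(req)
```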
29. ESnet Future Architecture
[Diagram: Metropolitan Area Networks built on one optical fiber pair, with DWDM providing point-to-point, unprotected circuits. Sites attach via layer 2 management equipment (e.g. a 10 Gigabit Ethernet switch) and layer 3 (IP) management equipment (a router) for production IP. Provisioned circuits run initially via MPLS paths and eventually via lambda paths: carried over lambdas on the core ring, and as tunnels through the ESnet IP backbone, with optical channel (?) management equipment on the ring.]
30. ESnet MAN Architecture - Example
[Diagram: a Chicago-area MAN ring connecting ANL and FNAL site equipment and site gateway routers to the ESnet core at the Qwest hub and to StarLight - a vendor-neutral facility hosting the DOE-funded CERN link and other international peerings.]
- Current DMZs are back-hauled to the core router, implemented via 2 VLANs - one in each direction around the ring
- At each site: an Ethernet switch, DMZ VLANs, and management of provisioned circuits
- ESnet-managed(?) circuit services are tunneled through the IP backbone via MPLS, alongside the ESnet production IP service
- ESnet management and monitoring partly compensate for the absence of a site router
31. Future ESnet Architecture
[Diagram: two sites, each with a site LAN, DMZ, and site gateway router on a MAN optical fiber ring that connects to the ESnet border; circuit cross-connects at each end allow a private circuit from one Lab to another across the ESnet backbone (hubs shown at New York (AOA), Washington, Atlanta (ATL), and El Paso (ELP)).]
32. Long-Term ESnet Connectivity Goal
- MANs for scalable bandwidth and redundant site access to the backbone
- Connecting MANs with two backbones to ensure against hub failure (for example, NLR is shown as the second backbone below)
[Map: major DOE Office of Science sites connected by MANs and local loops to both the Qwest and NLR backbones, with high-speed cross-connects to Internet2/Abilene and links to Japan, Europe, and CERN.]
33. Long-Term ESnet Bandwidth Goal
- Harvey Newman: "And what about increasing the bandwidth in the backbone?"
- Answer: technology progress
  - By 2008 (the next generation ESnet backbone) DWDM technology will be 40 Gb/s per lambda
  - And the backbone will be multiple lambdas
- Issues
  - End-to-end, end-to-end, and end-to-end
34. Science Services Strategy
- ESnet is in a natural position to be the provider of choice for a number of middleware services that support collaboration, collaboratories, Grids, etc.
- The characteristics of ESnet that make it a natural middleware provider are that ESnet
  - is the only computing-related organization that serves all of the Office of Science
  - is trusted and well respected in the OSC community
  - has the 7x24 infrastructure required to support critical services, and is a long-term stable organization
- The characteristics of the services for which ESnet is the natural provider are those that
  - require long-term persistence of the service or the data associated with the service
  - require high availability and a high degree of integrity on the part of the provider
  - are situated at the root of a hierarchy, so that the service scales in the number of people it serves by adding nodes that are managed by local organizations (so that ESnet does not have a large and constantly growing direct user base)
35. Science Services Strategy
- The DOE Grids CA, which provides X.509 identity certificates to support Grid authentication, is an example of this model
  - the service requires a highly trusted provider and a high degree of availability
  - it provides a centralized agent for negotiating trust relationships with, e.g., European CAs
  - it scales by adding site-based or Virtual Organization based Registration Agents that interact directly with the users
36. Science Services - Public Key Infrastructure
- Public Key Infrastructure supports the cross-site, cross-organization, and international trust relationships that permit sharing computing and data resources and other Grid services
- Digital identity certificates for people, hosts, and services are an essential core service for Grid middleware
  - PKI provides formal and verified trust management - an essential service for widely distributed, heterogeneous collaboration, e.g. in the international High Energy Physics community
- DOE Grids CA
  - Have recently added a second CA with a policy that permits bulk issuing of certificates with central private key management
  - Important for secondary issuers
    - NERSC will auto-issue certs when accounts are set up - this constitutes an acceptable identity verification
    - May also be needed for security domain gateways such as Kerberos <-> X.509, e.g. KX509
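For illustration, here is a minimal sketch of generating a key pair and an X.509 certificate signing request of the kind a Grid user submits to a CA such as DOE Grids, using the third-party Python "cryptography" package; the DN fields are made up:

```python
# A minimal sketch of creating an X.509 certificate signing request (CSR)
# for a Grid identity certificate; the subject DN fields are illustrative.
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([
        x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Example Grid"),
        x509.NameAttribute(NameOID.ORGANIZATIONAL_UNIT_NAME, "People"),
        x509.NameAttribute(NameOID.COMMON_NAME, "Jane Researcher"),
    ]))
    .sign(key, hashes.SHA256())  # signing proves possession of the private key
)
print(csr.subject.rfc4514_string())  # CN=Jane Researcher,OU=People,O=Example Grid
```

The CA (after a Registration Agent validates the requester against the Certificate Policy) would sign such a request and return the identity certificate.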
37. Science Services - Public Key Infrastructure
- The Policy Management Authority negotiates and manages the formal trust instrument (Certificate Policy - CP)
  - Sets and interprets procedures that are carried out by ESnet
  - Currently facing an important oversight situation involving potential compromise of user X.509 cert private keys
    - a Boys-from-Brazil style exploit placed a keyboard sniffer on several systems that housed Grid certs
    - Is there sufficient forensic information to say that the private keys were not compromised?
    - Is any amount of forensic information sufficient to guarantee this, or should the certs be revoked?
    - Policy is refined by experience
- Registration Agents (RAs) validate users against the CP and authorize the CA to issue digital identity certs
- This service was the basis of the first routine sharing of HEP computing resources between the US and Europe
38. Science Services - Public Key Infrastructure
- The rapidly expanding customer base of this service will soon make it ESnet's largest collaboration service by customer count
39. Voice, Video, and Data Collaboration Service
- The other highly successful ESnet Science Service is the audio, video, and data teleconferencing service that supports human collaboration
- Seamless voice, video, and data teleconferencing is important for geographically dispersed collaborators
- ESnet currently provides voice conferencing, videoconferencing (H.320/ISDN scheduled, H.323/IP ad-hoc), and data collaboration services to more than a thousand DOE researchers worldwide
40. Voice, Video, and Data Collaboration Service
- Heavily used services, averaging around
  - 4600 port hours per month for H.320 videoconferences
  - 2000 port hours per month for audio conferences
  - 1100 port hours per month for H.323 videoconferences
  - approximately 200 port hours per month for data conferencing
- Web-based registration and scheduling for all of these services
  - authorizes users efficiently
  - lets them schedule meetings
- Such an automated approach is essential for a scalable service - ESnet staff could never handle all of the reservations manually
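A rough average-concurrency check on those port-hour figures (a month is about 730 hours):

```python
# Average-concurrency check on the monthly port-hour figures above.
HOURS_PER_MONTH = 730
usage = {"H.320 video": 4600, "audio": 2000, "H.323 video": 1100, "data": 200}

for service, port_hours in usage.items():
    print(f"{service}: ~{port_hours / HOURS_PER_MONTH:.1f} ports busy on average")
# e.g. H.320: ~6.3 ports in continuous use (more during working hours)
```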
41. Science Services Strategy
- The Roadmap Workshop identified twelve high priority middleware services, and several of these fit the criteria for ESnet support. These include, for example
  - long-term PKI key and proxy credential management (e.g. an adaptation of the NSF's MyProxy service)
  - directory services that virtual organizations (VOs) can use to manage organization membership, member attributes, and privileges
  - perhaps some form of authorization service
  - in the future, some knowledge management services that have the characteristics of an ESnet service are also likely to be important
- ESnet is seeking the additional funding necessary to develop, deploy, and support these types of middleware services
42. Conclusions
- ESnet is an infrastructure that is critical to DOE's science mission and that serves all of DOE
  - Focused on the Office of Science Labs
- ESnet is evolving its architecture and services strategy to meet the stated requirements for bandwidth, reliability, QoS, and Grid and collaboration supporting services