1
ESNet UpdateJoint Techs Meeting, July 19, 2004
  • William E. Johnston, ESnet Dept. Head and Senior
    Scientist
  • R. P. Singh, Federal Project Manager
  • Michael S. Collins, Stan Kluz, Joseph Burrescia,
    and James V. Gagliardi, ESnet Leads
  • and the ESnet Team
  • Lawrence Berkeley National Laboratory

2
ESnet Connects DOE Facilities and Collaborators
[Network map: the ESnet core is a packet-over-SONET optical ring with hubs (e.g. SEA HUB, SNV HUB) plus an IPv6 backbone and numerous peering points (Chi NAP, NY-NAP, MAE-E, MAE-W, PAIX-E, PAIX-W, FIX-W, Equinix, QWEST ATM, PNWG), with connections to Abilene and to international networks (CAnet4, CERN, MREN, Netherlands, Russia, StarTap, Taiwan (ASCC), Japan).]

42 end user sites:
  • Office of Science sponsored (22)
  • NNSA sponsored (12)
  • Joint sponsored (3)
  • Other sponsored (NSF LIGO, NOAA)
  • Laboratory sponsored (6)

Link legend: International (high speed); OC192 (10 Gb/s optical); OC48 (2.5 Gb/s optical); Gigabit Ethernet (1 Gb/s); OC12 ATM (622 Mb/s); OC12; OC3 (155 Mb/s); T3 (45 Mb/s); T1-T3; T1 (1.5 Mb/s).
3
ESnet Peering (connections to other networks)

[Peering map: ESnet hubs peer with commercial, university, and international networks: Abilene (at several points, including a connection serving 7 universities near LBNL), CalREN2, CENIC/SDSC, MAX GPOP, PNW-GPOP, TECHnet, the Distributed 6TAP (19 peers), GEANT (Germany, France, Italy, UK, etc.), SInet (Japan)/KEK, KDDI (Japan), France, Russia (BINP), Australia, CAnet4, Taiwan (TANet2, ASCC), and Singaren; peer counts at individual exchange points (MAE-W, PAIX-W, FIX-W, the SEA, NYC, SNV, and ATL hubs, etc.) range from 1 to 39.]

ESnet provides complete access to the Internet by managing the full complement of global Internet routes (about 150,000) at 10 general/commercial peering points, plus high-speed peerings with Abilene and the international networks.
4
ESnet New Architecture Goal
  • MAN rings provide dual site and hub connectivity
  • A second backbone ring will multiply connect the
    MAN rings to protect against hub failure

5
First Step: SF Bay Area ESnet MAN Ring
  • Increased reliability and site connection bandwidth
  • Phase 1: connects the primary Office of Science Labs (Joint Genome Institute, LBNL, NERSC, SLAC) in a MAN ring
  • Phase 2: adds LLNL, SNL, and UC Merced
  • The ring should not connect directly into the ESnet SNV hub (still working on the physical routing for this)
  • Both legs of the mini ring have not yet been identified

[Topology figure, phase 1: the SF Bay Area MAN ring links the Joint Genome Institute, LBNL, NERSC, SLAC, the Qwest/ESnet hub, and the Level 3 hub (NLR / UltraScienceNet), with an SF Bay Area mini ring; the existing ESnet core ring continues to Seattle and Chicago in the north and to LA, San Diego, and El Paso in the south.]
6
Traffic Growth Continues
ESnet Monthly Accepted Traffic: ESnet is currently transporting about 250 terabytes/month.

[Chart: monthly accepted traffic, in TBytes/month.]

Annual growth in the past five years has increased from 1.7x annually to just over 2.0x annually (a worked example follows below).
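As a quick worked example of what 2.0x annual growth implies, here is an illustrative extrapolation in Python; this is arithmetic for intuition, not an ESnet forecast:

```python
# Compounding the ~250 TB/month figure at 2.0x per year.
# Purely illustrative arithmetic, not an ESnet projection.
tb_per_month = 250.0
for year in range(1, 6):
    tb_per_month *= 2.0
    print(f"year {year}: {tb_per_month:,.0f} TB/month")
# Five doublings take 250 TB/month to 8,000 TB/month.
```
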
7
Who Generates Traffic, and Where Does it Go?
ESnet Inter-Sector Traffic Summary, Jan 2003 / Feb 2004 (1.7X overall traffic increase, 1.9X OSC increase). The international traffic is increasing due to BaBar at SLAC and the LHC Tier 1 centers at FNAL and BNL.

[Flow diagram: Jan 2003 / Feb 2004 percentage pairs for traffic between DOE sites and the commercial, R&E (mostly universities), international, and peering-point sectors (pairs shown: 72/68, 21/14, 14/12, 25/18, 17/10, 10/13, 53/49, 9/26, 4/6, as percentages of total ingress or egress traffic); green marks traffic coming into ESnet, blue marks traffic leaving ESnet; the international flows include DOE collaborator traffic, incl. data.]

DOE is a net supplier of data because DOE facilities are used by universities and commercial entities, as well as by DOE researchers.

Note that more than 90% of the ESnet traffic is OSC traffic. ESnet Appropriate Use Policy (AUP): all ESnet traffic must originate and/or terminate on an ESnet site (no transit traffic is allowed).
8
ESnet Top 20 Data Flows, 24 hrs., 2004-04-20
A small number of science users account for a
significant fraction of all ESnet traffic
SLAC (US) → IN2P3 (FR) (about 1 terabyte/day)
Fermilab (US) → CERN
SLAC (US) → INFN Padova (IT)
Fermilab (US) → U. Chicago (US)
U. Toronto (CA) → Fermilab (US)
CEBAF (US) → IN2P3 (FR)
INFN Padova (IT) → SLAC (US)
DFN-WiN (DE) → SLAC (US)
Fermilab (US) → JANET (UK)
SLAC (US) → JANET (UK)
DOE Lab → DOE Lab
Argonne (US) → Level3 (US)
DOE Lab → DOE Lab
Fermilab (US) → INFN Padova (IT)
Argonne (US) → SURFnet (NL)
IN2P3 (FR) → SLAC (US)
9
Top 50 Traffic Flows Monitoring, 24 hrs, 2 International and 2 Commercial Peering Points
  • 10 flows > 100 GBy/day
  • More than 50 flows > 10 GBy/day
10
Disaster Recovery and Stability
  • Engineers, 24x7 Network Operations Center, generator-backed power
  • Spectrum (network management system)
  • DNS (name to IP address translation)
  • Engineering database
  • Load database
  • Config database
  • Public and private Web
  • E-mail (server and archive)
  • PKI cert. repository and revocation lists
  • Collaboratory authorization service
  • Remote engineer with partial duplicate infrastructure

Duplicate infrastructure: currently deploying full replication of the NOC databases and servers and of the Science Services databases in the NYC Qwest carrier hub.

  • The network must be kept available even if, e.g., the West Coast is disabled by a massive earthquake
  • High physical security for all equipment
  • Non-interruptible core: the ESnet core operated without interruption through
  • the N. Calif. power blackout of 2000,
  • the 9/11/2001 attacks, and
  • the Sept. 2003 NE states power blackout
  • Reliable operation of the network involves
  • remote Network Operations Centers (3)
  • replicated support infrastructure
  • generator-backed UPS power at all critical network and infrastructure locations

11
Disaster Recovery and Stability
  • Duplicate NOC infrastructure to AoA hub in two
    phases, complete by end of the year
  • 9 servers: dns, www, www-eng and noc5 (eng. databases), radius, aprisma (net monitoring), tts (trouble tickets), pki-ldp (certificates), mail

12
Maintaining Science Mission Critical Infrastructure in the Face of Cyberattack
  • A phased response to cyberattack is being implemented to protect the network and the ESnet sites
  • The phased response ranges from blocking certain site traffic to a complete isolation of the network, which allows the sites to continue communicating among themselves in the face of the most virulent attacks
  • Separates ESnet core routing functionality from external Internet connections by means of a peering router that can have a policy different from the core routers
  • Provides a rate-limited path to the external Internet that will ensure site-to-site communication during an external denial of service attack
  • Provides lifeline connectivity for downloading patches, exchanging e-mail, and viewing web pages (i.e. e-mail, dns, http, https, ssh, etc.) with the external Internet prior to full isolation of the network (the phases are sketched in code below)
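A minimal sketch of the escalation logic, assuming a simple severity-to-action table; the phase wording paraphrases the bullets above, and this is not ESnet's implementation:

```python
# Hedged sketch of the phased response policy described above.
# The phase table paraphrases the slides; it is not ESnet's code.
PHASES = {
    1: "Lab: filter incoming traffic at the site's ESnet gateway router",
    2: "ESnet: add filters at the peering router to assist the site",
    3: "ESnet: filter all traffic entering from outside of ESnet",
    4: ("ESnet: shut down the main peering paths; only rate-limited "
        "lifeline paths (e-mail, DNS, HTTP/S, SSH) remain"),
}

def respond(severity: int) -> str:
    """Map attack severity (1 = mild .. 4 = most virulent) to an action."""
    return PHASES[max(1, min(severity, 4))]

if __name__ == "__main__":
    for s in range(1, 5):
        print(s, "->", respond(s))
```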

13
Phased Response to Cyberattack
The response escalates in phases:
  • Lab first response: filter incoming traffic at their ESnet gateway router
  • ESnet first response: filters to assist a site
  • ESnet second response: filter traffic from outside of ESnet
  • ESnet third response: shut down the main peering paths and provide only limited-bandwidth paths for specific lifeline services

[Diagram: attack traffic arriving through the ESnet peering router toward a Lab (e.g. LBNL) border and gateway routers, with X marks showing where each response phase blocks it.]

  • The Sapphire/Slammer worm infection created a Gb/s of traffic on the ESnet core until filters were put in place (both into and out of sites) to damp it out.

14
Phased Response to Cyberattack
[Figure-only slide.]
15
Grid Middleware Services
  • ESnet is the natural provider for some science services, i.e. services that support the practice of science
  • ESnet is trusted, persistent, and has a large
    (almost comprehensive within DOE) user base
  • ESnet has the facilities to provide reliable
    access and high availability through assured
    network access to replicated services at
    geographically diverse locations
  • However, a service must be scalable in the sense that as its user base grows, ESnet's interaction with the users does not grow (otherwise it is not practical for a small organization like ESnet to operate)

16
Grid Middleware Requirements (DOE Workshop)
  • A DOE workshop examined science driven
    requirements for network and middleware and
    identified twelve high priority middleware
    services (see www.es.net/research)
  • Some of these services have a central management
    component and some do not
  • Most of the services that have central management fit the criteria for ESnet support. These include, for example:
  • Production, federated RADIUS authentication
    service
  • PKI federation services
  • Virtual Organization Management services to
    manage organization membership, member attributes
    and privileges
  • Long-term PKI key and proxy credential management
  • End-to-end monitoring for Grid / distributed
    application debugging and tuning
  • Some form of authorization service (e.g. based on
    RADIUS)
  • Knowledge management services that have the
    characteristics of an ESnet service are also
    likely to be important (future)

17
Science Services: PKI Support for Grids
  • Public Key Infrastructure supports cross-site, cross-organization, and international trust relationships that permit sharing computing and data resources and other Grid services
  • The DOEGrids Certification Authority service, which provides X.509 identity certificates to support Grid authentication, is an example of this model
  • The service requires a highly trusted provider and a high degree of availability
  • Federation: ESnet as service provider is a centralized agent for negotiating trust relationships, e.g. with European CAs
  • The service scales by adding site-based or Virtual Organization-based Registration Agents that interact directly with the users
  • See DOEGrids CA (www.doegrids.org); a certificate-inspection sketch follows below
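To illustrate what relying parties do with these identity certificates, here is a minimal sketch using the Python cryptography package; the file name is hypothetical, and this is not part of the DOEGrids tooling:

```python
# Minimal sketch: inspecting an X.509 Grid identity certificate with
# the Python "cryptography" package. "usercert.pem" is a hypothetical
# file name; DOEGrids itself does not ship this code.
from cryptography import x509
from cryptography.hazmat.primitives import hashes

with open("usercert.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

print("Subject:", cert.subject.rfc4514_string())   # the Grid identity
print("Issuer: ", cert.issuer.rfc4514_string())    # e.g. the DOEGrids CA
print("Expires:", cert.not_valid_after)
print("SHA-256:", cert.fingerprint(hashes.SHA256()).hex())
```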

18
ESnet PKI Project
  • DOEGrids Project Milestones
  • DOEGrids CA in production (June 2003)
  • Retirement of initial DOE Science Grid CA (Jan 2004)
  • Black rack installation completed for DOE Grids
    CA (Mar 2004)
  • New Registration Authorities
  • FNAL (Mar 2004)
  • LCG (LHC Computing Grid) catch-all near
    completion
  • NCC-EPA in progress
  • Deployment of NERSC myProxy CA
  • Grid Integrated RADIUS Authentication Fabric
    pilot

19
DOEGrids Security
[Architecture diagram: RAs and certificate applicants reach the PKI systems over the Internet through a firewall monitored by Bro intrusion detection; the CA equipment sits in secure racks in a secure data center, with an HSM and a vaulted root CA, inside successive layers of building security and LBNL site security.]
20
Science Services: Public Key Infrastructure
  • The rapidly expanding customer base of this service will soon make it ESnet's largest collaboration service by customer count

Registration Authorities: ANL, LBNL, ORNL, DOESG (DOE Science Grid), ESG (Climate), FNAL, PPDG (HEP), Fusion Grid, iVDGL (NSF-DOE HEP collab.), NERSC, PNNL
21
ESnet PKI Project (2)
  • New CA initiatives
  • FusionGrid CA
  • ESnet SSL Server Certificate CA
  • Mozilla browser CA cert distribution
  • Script-based enrollment
  • Global Grid Forum documents
  • Policy Management Authority Charter
  • OCSP (Online Certificate Status Protocol)
    Requirements For Grids
  • CA Policy Profiles

22
Grid Integrated RADIUS Authentication Fabric
  • RADIUS routing of authentication requests
  • Support One-Time Password initiatives
  • Gateway for Grid and collaborative uses: standard UI and API
  • Provide a secure federation point with O(n) agreements
  • Support multiple vendor / site OTP implementations
  • One token per user (an SSO-like solution) for OTP
  • A collaboration between ESnet, NERSC, and a RADIUS appliance vendor; PNNL and ANL are also involved, others welcome
  • White paper/report 01 Sep 2004 to support early implementers, then proceed to pilot
  • Project pre-proposal: http://www.doegrids.org/CA/Research/GIRAF.pdf (a realm-routing sketch follows below)
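A minimal sketch of the realm-based routing idea, assuming made-up realms and home servers; the pilot's actual configuration is not shown in the slides:

```python
# Sketch of realm-based routing at a RADIUS federation point, in the
# spirit of GIRAF. Realms and server names are made up for illustration.
HOME_SERVERS = {
    "nersc.gov": "radius.nersc.gov",   # hypothetical home OTP servers
    "pnl.gov":   "radius.pnl.gov",
    "anl.gov":   "radius.anl.gov",
}

def route_auth_request(user_name: str) -> str:
    """Route user@realm to the realm's home authentication server."""
    _, _, realm = user_name.partition("@")
    try:
        return HOME_SERVERS[realm]
    except KeyError:
        # O(n) federation agreements: unknown realms are rejected
        raise ValueError(f"no federation agreement for realm {realm!r}")

print(route_auth_request("alice@nersc.gov"))  # -> radius.nersc.gov
```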

23
Collaboration Service
  • H.323 videoconferencing is showing a dramatic increase in usage

24
Grid Network Services Requirements (GGF, GHPN)
  • The Grid High Performance Networking Research Group's "Networking Issues of Grid Infrastructures" (draft-ggf-ghpn-netissues-3) describes what networks should provide to Grids:
  • High performance transport for bulk data transfer
    (over 1Gb/s per flow)
  • Performance controllability to provide ad hoc
    quality of service and traffic isolation.
  • Dynamic Network resource allocation and
    reservation
  • High availability when expensive computing or
    visualization resources have been reserved
  • Security controllability to provide a trusted and
    efficient communication environment when required
  • Multicast to efficiently distribute data to group
    of resources.
  • Integrated wireless network and sensor networks
    in Grid environment

25
Priority Service
  • So, practically, what can be done?
  • With available tools we can provide a small number of provisioned, bandwidth-guaranteed circuits
  • secure and end-to-end (system to system)
  • various qualities of service are possible, including minimum latency
  • a certain amount of route reliability (if redundant paths exist in the network)
  • end systems can manage these circuits as single high-bandwidth paths or as multiple lower-bandwidth paths (with application-level shapers)
  • non-interfering with production traffic, so aggressive protocols may be used

26
Guaranteed Bandwidth as an ESnet Service
  • A DOE Network R&D funded project
  • Allocation will probably be relatively static and ad hoc
  • There will probably be service level agreements among transit networks allowing for a fixed amount of priority traffic, so the resource manager does minimal checking and no authorization
  • It will do policing, but only at the full bandwidth of the service agreement (for self protection); a token-bucket sketch follows below

[Diagram: user systems at site A and site B, a bandwidth broker providing authorization, and resource managers with a policer along the path, deployed in two phases.]
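A minimal token-bucket sketch of that self-protection policing; the rate and burst numbers are illustrative assumptions, not project parameters:

```python
# Token-bucket sketch of policing at the full bandwidth of the service
# agreement, as described above. All numbers are illustrative only.
import time

class Policer:
    def __init__(self, rate_bps: float, burst_bits: float):
        self.rate = rate_bps        # the service-agreement bandwidth
        self.capacity = burst_bits  # maximum burst allowed
        self.tokens = burst_bits
        self.last = time.monotonic()

    def allow(self, packet_bits: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True             # within the agreement: forward
        return False                # over the agreement: drop or demote

policer = Policer(rate_bps=100e6, burst_bits=1e6)  # e.g. a 100 Mb/s SLA
print(policer.allow(12_000))        # a 1500-byte packet passes
```
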
27
Network Monitoring System
  • Alarms data reduction:
  • From June 2003 through April 2004 the total number of NMS up/down alarms was 16,342, or 48.8 per day
  • Path-based outage reporting automatically isolated 1,448 customer-relevant events during this period, an average of 4.3 per day: more than a 10-fold reduction (a toy sketch follows below)
  • Based on total outage duration in 2004, approximately 63% of all customer-relevant events have been categorized as either Planned or Unplanned, and as one of ESnet, Site, Carrier, or Peer
  • This gives us a better handle on the availability metric
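A toy sketch of path-based alarm reduction under an assumed 5-minute collapse window; the field names and the window are assumptions, not the NMS's actual logic:

```python
# Toy sketch of alarm reduction: raw up/down alarms on the same
# customer path within a short window collapse into one event.
# The 5-minute window and (time, path) tuples are assumptions.
from collections import defaultdict

def reduce_alarms(alarms, window_s=300):
    """alarms: iterable of (timestamp_s, path_id); returns events."""
    by_path = defaultdict(list)
    for t, path in alarms:
        by_path[path].append(t)
    events = []
    for path, times in by_path.items():
        last = None
        for t in sorted(times):
            if last is None or t - last > window_s:
                events.append((path, t))   # a new customer-relevant event
            last = t
    return events

raw = [(0, "chi-lbl"), (30, "chi-lbl"), (90, "chi-lbl"), (4000, "chi-lbl")]
print(len(raw), "alarms ->", len(reduce_alarms(raw)), "events")  # 4 -> 2
```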

28
2004 Availability by Month
[Chart: Jan. through June 2004 unavailable minutes by site, corrected for planned outages; most sites were >99.9% available, some <99.9%. For reference, 99.9% availability allows about 43 unavailable minutes in a 30-day month. (More from Mike O'Connor.)]
29
ESnet Abilene Measurements
  • We want to ensure that the ESnet/Abilene cross
    connects are serving the needs of users in the
    science community who are accessing DOE
    facilities and resources from universities or
    accessing university facilities from DOE labs.
  • Measurement sites are in place (more from Joe Metzger)
  • 3 ESnet Participants
  • LBL
  • FERMI
  • BNL
  • 3 Abilene Participants
  • SDSC
  • NCSU
  • OSU

30
OWAMP One-Way Delay Tests Are Highly Sensitive
  • An NCSU metro DWDM reroute adds about 350 microseconds, as the sketch below illustrates

[Plot: one-way delay in ms (y-axis from 41.5 to 42.0) showing a clear step at the fiber re-route.]
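The arithmetic behind that sensitivity, as a tiny sketch; the 41.7 ms baseline is read off the plot, the timestamps are invented, and real OWAMP requires synchronized sender and receiver clocks:

```python
# One-way delay is simply receive time minus send time, which is why
# OWAMP needs synchronized clocks and resolves sub-millisecond path
# changes. The 41.7 ms baseline is read from the plot above; the
# timestamps are invented for illustration.
def one_way_delay_ms(t_send_s: float, t_recv_s: float) -> float:
    return (t_recv_s - t_send_s) * 1e3

base = one_way_delay_ms(0.0, 0.0417)             # ~41.7 ms baseline
rerouted = one_way_delay_ms(0.0, 0.0417 + 350e-6)
print(f"reroute adds {rerouted - base:.3f} ms")  # ~0.350 ms
```
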
31
ESnet Trouble Ticket System
  • TTS is used to track problem reports for the Network, ECS, DOEGrids, Asset Management, NERSC, and other services
  • Running a Remedy AR System server and an Oracle database on a Sun Ultra workstation
  • Total external tickets: 11,750 (1995-2004), approx. 1,300/year
  • Total internal tickets: 1,300 (1999-2004), approx. 250/year

32
Conclusions
  • ESnet is an infrastructure that is critical to DOE's science mission and that serves all of DOE, focused on the Office of Science Labs
  • ESnet is working to meet the DOE mission science networking requirements with several new initiatives and a new architecture
  • QoS service is hard, but we believe that we have enough experience to do pilot studies
  • Middleware services for large numbers of users are hard, but they can be provided if careful attention is paid to scaling