Title: Open Science Grid: Interoperation and Interoperability on a Grid among Grids
1 Open Science Grid: Interoperation and
Interoperability on a Grid among Grids
2 Accomplishments need Sustained System Use at
Scale
Physics communities are depending on large scale
interoperation across grids for data movement and
job placement.
3 Flexible Interfaces allow interoperation of
previously disjoint resources.
Engage the Resource Owners and Administrators
- 96 resources across production and integration infrastructures
Sustaining through OSG submissions: 3,000-4,000
simultaneous jobs, 10K jobs/day, 50K
CPU-hours/day. Peak test jobs of 15K a day.
20,000 CPUs (from 30 to 4000), 6 PB tape, 4 PB
shared disk
Using production research networks
20 Virtual Organizations, 6 operations. Includes
25% non-physics.
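A quick back-of-the-envelope check of the usage figures quoted on this slide. It only divides the slide's own (approximate) numbers; the variable names are illustrative, and the results are averages rather than anything the slide states directly.

```python
# Figures quoted on the slide; all of them are approximate.
jobs_per_day = 10_000
cpu_hours_per_day = 50_000
total_cpus = 20_000

# Average CPU time consumed by one job.
avg_cpu_hours_per_job = cpu_hours_per_day / jobs_per_day

# CPU-hours per day spread over 24 hours gives the average number of busy CPUs.
avg_busy_cpus = cpu_hours_per_day / 24

# Share of the quoted CPU count that this average represents.
share_of_capacity = avg_busy_cpus / total_cpus

print(f"{avg_cpu_hours_per_job:.1f} CPU-hours per job")
print(f"{avg_busy_cpus:.0f} CPUs busy on average")
print(f"{share_of_capacity:.1%} of the quoted CPU count")
```

On these numbers, OSG-submitted work averages about 5 CPU-hours per job and keeps roughly a tenth of the quoted CPUs busy.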
4 Engage the Science Communities
Common interfaces allow interoperation of
disparate storage implementations of different
performance, capacity and capabilities.
Over 2 months. Average of 1.6 Gigabits/sec. (25%
of 2008 needs)
1 Petabyte
Italy
Taiwan
Brazil
USA
T3 -- university departments
T2 -- university campus facilities
T1 -- regional, national facilities
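The transfer figures on this slide (1 Petabyte, 2 months, an average of 1.6 Gigabits/sec) can be cross-checked with a short calculation. This is a sketch under two assumptions that the slide does not state: a decimal petabyte (10^15 bytes) and "2 months" taken as 60 days.

```python
def average_rate_gbps(total_bytes: float, days: float) -> float:
    """Average transfer rate in gigabits per second (decimal units)."""
    seconds = days * 24 * 3600
    return total_bytes * 8 / seconds / 1e9


PETABYTE = 1e15  # assumption: decimal petabyte

# Assumption: "2 months" is taken as 60 days.
rate = average_rate_gbps(PETABYTE, days=60)
print(f"{rate:.2f} Gbit/s")
```

Under those assumptions the average comes out a little above 1.5 Gbit/s, which is in line with the 1.6 Gigabits/sec on the slide.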
5 Empower the Users
Submit locally, execute globally. Integration of
end-to-end environment for the user hides the
administrative and facility boundaries.
15,000 processing jobs/day. 27 sites. Handful
of submission points. (test jobs at 55K/day).
6 Challenge: management, operational and technical
cooperation across diverse legislative domains.
Biggest risk is a security incident.
7 Security: Need agreements and proactive response
across administrative interfaces
Accept and deal with local identity and security
needs and constraints of each grid
infrastructure. But note that incidents cross
grid boundaries.
- Case 1
- My site on A-Grid is running many X jobs from CN=YY, VO=W which look quite different from the rest of the jobs from W.
- If I look up the application it seems to be well outside the domain of W.
- I tried to check if this user really is in the W VO, but the database is not available. How do I find out that he is a genuine X user?
- Do VO managers have a way to trace and contact users quickly?
- How do I check with other sites and grids?
8 Mitigate Risk: need consistent policies and
agreements across legislative entities
- Case 2
- A machine offering grid services was compromised.
- The compromise turned out to not use grid methods.
- Nothing in the Grid was compromised.
- The Grid incident response team was notified and a security assessment made.
- A true attack against grid middleware might harvest grid proxies at one site and use them at another, just as sniffed passwords are used to attack other machines.
- How do we look for suspicious proxy use cross-site and cross-grid?
- How do we ensure good security practices by those offering services across many different control domains with different privacy and policy requirements and constraints?
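One possible way to look for suspicious proxy use across sites and grids is to pool usage records and flag any identity that appears at unusually many places within a short time. The sketch below is only an illustration of that idea. The record layout (`ProxyUse`), the one-hour window and the three-site threshold are all hypothetical, and a real deployment would need agreed record formats and per-user baselines.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class ProxyUse(NamedTuple):
    """Hypothetical pooled usage record; not an OSG or EGEE format."""
    subject: str     # identity the proxy was issued for
    grid: str
    site: str
    when: datetime


def flag_wide_use(records, window=timedelta(hours=1), max_sites=3):
    """Map each subject seen at more than `max_sites` places within `window`
    to the sorted list of (grid, site) pairs involved."""
    by_subject = defaultdict(list)
    for record in records:
        by_subject[record.subject].append(record)

    flagged = {}
    for subject, uses in by_subject.items():
        uses.sort(key=lambda r: r.when)
        start = 0
        for end in range(len(uses)):
            # Advance the left edge until the span fits inside `window`.
            while uses[end].when - uses[start].when > window:
                start += 1
            places = {(u.grid, u.site) for u in uses[start:end + 1]}
            if len(places) > max_sites:
                flagged[subject] = sorted(places)
                break
    return flagged


# Made-up records: one identity fans out quickly, the other does not.
t0 = datetime(2000, 1, 1, 12, 0)
records = [
    ProxyUse("/CN=YY", "A-Grid", "site1", t0),
    ProxyUse("/CN=YY", "A-Grid", "site2", t0 + timedelta(minutes=5)),
    ProxyUse("/CN=YY", "OSG", "site3", t0 + timedelta(minutes=10)),
    ProxyUse("/CN=YY", "OSG", "site4", t0 + timedelta(minutes=15)),
    ProxyUse("/CN=ZZ", "OSG", "site3", t0),
    ProxyUse("/CN=ZZ", "OSG", "site4", t0 + timedelta(hours=5)),
]
flagged = flag_wide_use(records)
print(flagged)
```

The check only works if sites and grids share records in the first place, which is why the slide asks for consistent policies and agreements.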
9 OSG starting to address and control risk.
84 Risks to date! Each has an understood
mitigation and control. Each needs understanding
of the interoperation interface and expectations.
10 Another Challenge: Get past sociological
mistrust
Support (for users and administrators)
must be transparent and reactive across
operational domains.
Need coordination between operations centers of
participating infrastructures. Must ensure all
problems are owned until solved. End-to-end
troubleshooting must involve people, software
and services from multiple infrastructures
and organizations.
11 How OSG is approaching Interoperation
12 1) Adaptors for differences in interfaces
[Diagram: a VO or User that acts across grids; OSG and A(nother) Grid, each with a Service-X; an Interface to Service-X; an Adaptor between OSG-X and AGrid-X]
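The adaptor idea on this slide can be sketched in a few lines: the VO or user programs against one interface to Service-X, and an adaptor translates that interface to the other grid's variant. Every class and method name below is hypothetical, chosen only to mirror the labels on the slide. None of them is a real OSG or EGEE API.

```python
from typing import Protocol


class ServiceX(Protocol):
    """Interface to Service-X that the VO or user programs against (hypothetical)."""

    def submit(self, request: str) -> str: ...


class OsgServiceX:
    """Stand-in for Service-X as offered on OSG."""

    def submit(self, request: str) -> str:
        return f"OSG accepted {request}"


class AGridServiceX:
    """Stand-in for the other grid's Service-X, with a different call shape."""

    def send_request(self, payload: dict) -> dict:
        return {"status": "accepted", "id": payload["name"]}


class AGridAdaptor:
    """Adaptor between OSG-X and AGrid-X: same interface, translated calls."""

    def __init__(self, backend: AGridServiceX) -> None:
        self._backend = backend

    def submit(self, request: str) -> str:
        reply = self._backend.send_request({"name": request})
        return f"AGrid {reply['status']} {reply['id']}"


def run_everywhere(request: str, services: list[ServiceX]) -> list[str]:
    """A VO or user acting across grids through the one interface."""
    return [service.submit(request) for service in services]


results = run_everywhere("job-1", [OsgServiceX(), AGridAdaptor(AGridServiceX())])
print(results)
```

The caller never sees `send_request`, so differences between the two grids stay inside the adaptor.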
13 e.g. OSG - EGEE Interoperation for WLCG Resource
Selection
[Diagram labels: VO UI; VO RB (x3); BDII; LDAP URLs / CEMON; BDII / CEMON; T2; Site; GRAM]
14 2) Sponsoring Common Interfaces, e.g. Storage
Resource Manager
SRM implementations for >6 Storage
Systems. Compatibility testing a must for
semantics as well as the interfaces! Challenges
in how to evolve in response to user needs and
new capabilities.
15 3) Principle of Site-VO Interoperability, e.g.
Security
[Flowchart labels, repeated for Site and for VO: Perceived Risk; Actual Risk? (y / n); Implement controls; Operate]
16 4) Middleware Interoperability
- Want to avoid unnecessary duplication while not stifling progress and evolution.
- Want integrated systems of software from diverse sources.
- OSG strategy: a common build and test infrastructure to foster free flow of software between the communities.
17 5) Grid of Grids - from Local to Global
National
Campus
Community
18 e.g. resources are accessible from multiple
infrastructures
- Resource administration has local autonomy to
apply policy, priorities, security across access
from any path.
[Diagram labels: OSG CE gateway; Campus Interface; TeraGrid gateway; Local Batch System; Local Computing Resource]
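The point about local autonomy can be illustrated with a small model: several gateways feed one local batch system, and the site's policy is enforced at the batch system, so it holds whichever path a job arrives by. This is a minimal sketch with hypothetical names (`Job`, `LocalBatchSystem`, `gateway`) and a made-up policy (a ban list plus per-VO priorities). It does not describe any actual gateway software.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Job:
    user: str
    vo: str
    via: str  # access path the job arrived by


@dataclass
class LocalBatchSystem:
    """Site-owned policy lives here, behind every gateway."""
    banned_users: set = field(default_factory=set)
    vo_priority: dict = field(default_factory=dict)
    queue: list = field(default_factory=list)

    def submit(self, job: Job) -> bool:
        # Security policy applies regardless of the access path.
        if job.user in self.banned_users:
            return False
        priority = self.vo_priority.get(job.vo, 0)
        self.queue.append((priority, job))
        # Stable sort: higher priority first, arrival order kept within a priority.
        self.queue.sort(key=lambda item: -item[0])
        return True


def gateway(name: str, batch: LocalBatchSystem):
    """A thin gateway: tags the access path and hands the job to the site."""
    def submit(user: str, vo: str) -> bool:
        return batch.submit(Job(user=user, vo=vo, via=name))
    return submit


batch = LocalBatchSystem(banned_users={"mallory"}, vo_priority={"local": 10, "W": 5})
osg_ce = gateway("osg", batch)
campus = gateway("campus", batch)
teragrid = gateway("teragrid", batch)

accepted = [
    osg_ce("alice", "W"),
    campus("bob", "local"),
    teragrid("mallory", "W"),
    osg_ce("mallory", "W"),
]
order = [job.user for _, job in batch.queue]
print(accepted, order)
```

Keeping the gateways thin is the design choice here: adding another infrastructure means adding a gateway, not re-implementing site policy.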
19 Local Grids have adaptors to the National Grids,
e.g. FermiGrid
[Diagram labels: Guest User; CDF User; OSG; Common Gateway (Adaptor) Central Services; Before FermiGrid; Existing]
20 Community (VO) environments are overlaid on the
physical infrastructure
- Encourage and help users/VOs to deploy their applications and place their data "on the fly".
- Support dynamic reservation and opportunistic sharing of resources within and across VOs.
- VO is responsible for intra-VO prioritization and access control to the resources.
21 Accept that a ubiquitous shared
cyberinfrastructure needs diversity and
heterogeneity; harmonization will enable it to
flourish and mature.