Title: Technical Roadmap and evolving Grid3 into OSG
- Rob Gardner
- University of Chicago
- rwg_at_hep.uchicago.edu
- OSG Workshop
- Harvard University
- Sep 9, 2004
2. Outline
- Snapshot of present Grid3 status
- OSG Service Categories from the Blueprint Group
- Materialization of OSG v0
3. Grid3 at a Glance
- Grid environment built from core Globus and Condor middleware, as delivered through the Virtual Data Toolkit (VDT)
- GRAM, GridFTP, MDS, RLS, VDS, VOMS
- Equipped with VO and multi-VO security, monitoring, and operations services
- Allowing federation with other Grids where possible, e.g. CERN LHC Computing Grid (LCG)
- USATLAS GriPhyN VDS execution on LCG sites
- USCMS storage element interoperability (SRM/dCache)
- Delivering the US LHC Data Challenges
4. Grid3 Design
- Simple approach
- Sites consisting of
- Computing element (CE)
- Storage element (SE)
- Information and monitoring services
- VO level, and multi-VO
- VO information services
- Operations (iGOC)
- Minimal use of grid-wide systems
- No centralized resource broker, replica/metadata catalogs, or command-line interface; these are to be provided by individual VOs
5. Site Services and Installation
- Goal is to install and configure with minimal human intervention
- Use Pacman tool and distributed software caches
- Registers site with VO and Grid3 level services
- Accounts, application install areas, working directories
Figure: four-hour install and validate via "pacman get iVDGLGrid3". A Grid3 site comprises a Compute Element (VDT, VO service, GIIS registration, info providers, Grid3 schema, log management), app and tmp areas, and Storage.
6. Multi-VO Security Model
- DOEGrids Certificate Authority
- PPDG and iVDGL registration authorities (RAs) with VO or site sponsorship
- Automated multi-VO authorization using VOMS
- Each VO manages a service and its members
- Each Grid3 site is able to generate a Globus gridmap file with an authenticated SOAP query to each VO service
- Site-specific adjustments or mappings
Figure: VOMS servers for SDSS, US CMS, US ATLAS, BTeV, LSC, and iVDGL are queried to build the Grid3 gridmap used by Grid3 sites.
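The gridmap generation step above can be illustrated with a minimal Python sketch. It assumes the authenticated SOAP query to each VO service has already returned that VO's member DNs (the query itself is not modelled), and that the site maps each supported VO to one local account; the function name `build_gridmap`, the account names, and the DNs are hypothetical. Each output line follows the grid-mapfile convention of a quoted certificate DN followed by a local account name, and site-specific mappings are applied last so they take precedence.

```python
"""Hypothetical sketch: build a multi-VO grid-mapfile at a site.

Assumption: the VO member DNs have already been fetched from each VO
service; the authenticated SOAP query itself is not modelled here.
"""
from typing import Dict, List, Optional


def build_gridmap(
    vo_members: Dict[str, List[str]],
    vo_accounts: Dict[str, str],
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    """Return grid-mapfile text mapping each member DN to a local account.

    vo_members:  {vo: [dn, ...]} as reported by each VO service
    vo_accounts: {vo: local_account}; VOs the site supports, in priority order
    overrides:   {dn: local_account}; site-specific mappings, applied last
    """
    mapping: Dict[str, str] = {}
    for vo, account in vo_accounts.items():   # dicts keep insertion order
        for dn in vo_members.get(vo, []):
            # A DN in several VOs keeps the account of the first VO listed.
            mapping.setdefault(dn, account)
    mapping.update(overrides or {})           # site adjustments take precedence
    # One line per DN: quoted subject DN, a space, then the local account.
    lines = [f'"{dn}" {user}' for dn, user in sorted(mapping.items())]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    members = {  # made-up example DNs
        "usatlas": ["/DC=org/DC=doegrids/OU=People/CN=Alice Example"],
        "uscms": [
            "/DC=org/DC=doegrids/OU=People/CN=Bob Example",
            "/DC=org/DC=doegrids/OU=People/CN=Alice Example",
        ],
    }
    accounts = {"usatlas": "usatlas1", "uscms": "uscms01"}  # hypothetical
    print(build_gridmap(members, accounts), end="")
```

Letting the first listed VO win when a DN appears in several VOs is only one possible policy; a site could instead map by VO group or role.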
7. Collective Monitoring Framework
Figure: monitoring data flows from producers through intermediaries to consumers; components shown include OS (syscall, /proc), log files, system config., job manager, GRIS, WWW, reports, MonALISA client, and MDViewer.
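As a rough illustration of the producer, intermediary, consumer layering, the Python sketch below has a hypothetical intermediary that stores the latest value per (site, metric) and serves it both push-style (subscriber callbacks) and pull-style (queries). The class, function, and metric names are assumptions for illustration only and do not reflect the actual interfaces of GRIS, MonALISA, or MDViewer.

```python
"""Hypothetical sketch of a producer -> intermediary -> consumer monitoring flow.

Names are illustrative only; they are not the GRIS, MonALISA or MDViewer APIs.
"""
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str, float]  # (site, metric name, value)


class Intermediary:
    """Collects producer records and serves them to consumers."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], float] = {}
        self._subscribers: List[Callable[[Record], None]] = []

    def publish(self, record: Record) -> None:
        site, name, value = record
        self._latest[(site, name)] = value  # keep only the newest value
        for callback in self._subscribers:  # push to each subscribed consumer
            callback(record)

    def subscribe(self, callback: Callable[[Record], None]) -> None:
        self._subscribers.append(callback)

    def query(self, name: str) -> Dict[str, float]:
        """Pull-style query: latest value of one metric at every site."""
        return {site: v for (site, n), v in self._latest.items() if n == name}


def busy_cpu_producer(site: str, busy: int) -> Record:
    # Stand-in for a real producer, e.g. a /proc reader or job-manager log parser.
    return (site, "busy_cpus", float(busy))


if __name__ == "__main__":
    hub = Intermediary()
    received: List[Record] = []
    hub.subscribe(received.append)  # push-style consumer, e.g. a report writer
    hub.publish(busy_cpu_producer("siteA", 10))
    hub.publish(busy_cpu_producer("siteB", 4))
    hub.publish(busy_cpu_producer("siteA", 12))
    print(hub.query("busy_cpus"))  # {'siteA': 12.0, 'siteB': 4.0}
    print(len(received))           # 3
```

Keeping only the newest value per key keeps the intermediary small; an archival service would instead retain the full history.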
8. I/O and Job Monitors
9. Grid3: A Snapshot
- 30 sites, multi-VO
- shared resources
- 3000 CPUs
10. Opportunistic Use of Grid3
Figure: events produced vs. day, with Grid3 non-CMS resources in blue and dedicated resources in red.
11. US ATLAS DC2
12. Overall, Last 6 Months
Figure: ATLAS DC2 and CMS DC04 over the last six months.
13. Grid3 and Testbed Infrastructures
Grid3:
- VDT 1.1.14
- Authentication Service: approved VOMS servers
- Monitoring Service: catalog, MonALISA, Ganglia, ACDC
- Stable software cache
- Running applications
- Operations support
Grid3dev:
- VDT 1.2.x
- Authentication Service: new VOMS server(s)
- Monitoring Service: GridCat (test), MonALISA (test), Ganglia (test), MDS (test)
- Policy information provider
- Development caches
- Less support
Also: VO-specific development testgrids and VDT testers
14. Grid3 Evolving to OSG v0
- Timeline
- Current Grid3 remains stable through 2004
- Updates mostly at the VDT level
- Service development continues
- Specialized test environments, small development testbeds
- Grid3dev platform
- Start with a stable VDT base
- Add new services (Storage, Policy) and configuration
- Service and application certification
- Mechanisms for updates
15. Service Categories and TGs
- Security
- Storage
- Just forming
- Policy
- Monitoring and Information Services
- Operations and Support Centers
- An OSG Technical Group (TG) is associated with each category
16. Federation of Grids/Interoperability
Figure: federation diagram with VO and campus grid components.
17. Materializing OSG v0
Figure: VO applications and the Technical Groups (TG MonInfo, TG Policy, TG Storage, TG Security, TG Support Centers), with their chairs, feed into OSG v0 integration.
18. Activities
- Initial set
- Security, Operations, Storage, Policy, Monitoring/Info
- Continued input from Architecture Blueprint (BP) activity
- Specific programs of work coordinated and advised by the TGs
- Activity Co-Chairs responsibilities
- Develop programs of work
- Arrange periodic meetings, provide minutes
- Participate in v0 weekly integration group meetings
19. Summary: Afternoon Session
- Review of Technical Group charters, organization
- Blueprint roadmap in more detail
- Breakout sessions for specific Activities
- Definition, scope in service categories
- Development of programs of work
- Opportunities for contributions
- Working structure of Activities
- Looking for volunteers to fill Activity chair positions
20. Appendix
21. Migration Strategies
Figure: a new service enters the dev software cache and reaches g3dev resources via "pacman update dev"; the prod software cache reaches grid3 and osg v0 resources via "pacman update prod".
22. Storage
- Mix of systems drawn from existing efforts
- Tier1 mass storage system (MSS) strategic storage
- Fermilab GLOW, Run II
- BNL, LBNL SRM/HRM
- USCMS Tier2 SRM/dCache
- SRM/NeST available for all sites
- SRM/DRM, SRM/NFS
- Closely coupled to VO requirements
- Grid-wide replica location service
- Links to monitoring and info services
23. Monitoring and Information Services
- Baseline Set
- Grid3 MDS, GridCat, MonALISA, Ganglia
- Augmentation
- New MDS with archival and dynamic providers, e.g. Hawkeye, trigger service, new web tools
- Troubleshooting
- Discovery Service
- Clarens and MonALISA based
- First instance to support CMS analysis
- Provisions for other services
- Accounting Service
24. Operations and Support Centers
- Multi-VO infrastructure, help, and troubleshooting
- Centers and communication, coordination
- iGOC
- VO and Tier1 centers
- Sites
- Specific models need to be developed
- BNL Tier1-iGOC operations model developed for DC2
25. Security I
- Authentication
- Current PKI acceptable as baseline
- Future scaling and operations issues to be studied
- Authorization layer not in place
- VO membership service
- Attributes attached to proxies
- Infrastructure available; user training in the use of attributes
- Acceptable Use Policy
- Goal is to minimize requirements
- Openness beyond the core
- LCG, TG policies under study
- Authorization Attributes
- VOMS is the de facto standard
- Commitment to move to the GGF OGSA-AuthZ standard
- Authorization decision/enforcement software is less well developed; no clear timeline
- Policy enforcement points
- GRAM authZ callout, GT4
- Security Audit
- The planned "mock incident" service challenge should help us understand status and needs.
More topics at end
26. Security II
- Restricted Proxies
- R&D required, not started
- Lifetime, VOMS group selection
- Intrusion detection
- Use existing infrastructure
- Monitoring and troubleshooting tools needed
- Need to get groups working together
- Testing services
- No coordinated testing plan yet
- Need to understand who is allowed to do what
- Must be consistent with acceptable use agreements
- Recovery Procedures
- Users and service providers need to understand what is involved
- Walk through full system recovery for a wide-scale breach
- Policy Management
- Manual for the near term
- Rely on advancements in authorization framework
- Publication and version control are immediate issues
- Incident Response
- A first draft of a response plan has just been released.
- The secure communications network has not yet been built, but design has begun.