Title: Operational Architecture of PLGrid project
1 Operational Architecture of PL-Grid project
- M. Radecki, T. Szepieniec, M. Krakowian, M. Tomanek
Cracow Grid Workshop, Cracow, 12.10.2009
- Customers in Grid and issues to be solved
- VO manager
- Resource Provider
- VO user
- Site Administrator
- PL-Grid approach to resource sharing
- Bazaar tool
- Other operational consequences
- Information monitoring system improvements
- Solving operational issues in PL-Grid
3 Grid Customers and Issues
- Virtual Organization Manager
  - How can I obtain (more) resources for my VO?
  - Does the QoS of the provided resources satisfy me?
- VO User
  - I want to use the right resources, i.e. those which are well configured and supported.
- Resource Provider
  - Are all VOs satisfied with the resources they get?
- Site Administrator
  - Oops, something failed in the grid middleware at my site; how do I fix it quickly?
4 Linking Customer with Provider
- A well-defined procedure by which VO Managers and Resource Providers agree on sharing resources is indispensable
5 Resource-related SLA and Bazaar tool
- PL-Grid will use the Bazaar tool to implement the process of resource negotiation between the VO Manager and the Resource Provider
- Example contract parameters
  - time boundaries of the contract
  - number of CPUs and disk space, with the Quality of Service (guaranteed, best effort, etc.)
  - availability/reliability of resources
  - average acknowledge/response time to trouble tickets
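The example contract parameters above could be captured in a small record type. This is only a minimal sketch and not the Bazaar data model: the class name `ResourceContract`, every field name, and all example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class QoS(Enum):
    """Quality of Service levels named on the slide."""
    GUARANTEED = "guaranteed"
    BEST_EFFORT = "best effort"


@dataclass(frozen=True)
class ResourceContract:
    """Hypothetical record of an SLA between a VO and a Resource Provider."""
    vo: str
    provider: str
    start: date               # time boundaries of the contract
    end: date
    cpus: int                 # number of CPUs
    disk_tb: float            # disk space (unit is an assumption)
    qos: QoS
    min_availability: float   # fraction in [0, 1]
    min_reliability: float    # fraction in [0, 1]
    max_ack_hours: float      # average acknowledge time to trouble tickets

    def is_active(self, day: date) -> bool:
        """True if `day` lies within the contract's time boundaries."""
        return self.start <= day <= self.end


# Illustrative values only; none of them come from the presentation.
contract = ResourceContract(
    vo="examplevo",
    provider="EXAMPLE-SITE",
    start=date(2010, 1, 1),
    end=date(2010, 12, 31),
    cpus=128,
    disk_tb=10.0,
    qos=QoS.GUARANTEED,
    min_availability=0.90,
    min_reliability=0.95,
    max_ack_hours=4.0,
)
```

Making the record immutable (`frozen=True`) reflects the idea that an agreed contract is renegotiated rather than edited in place; that choice is also an assumption.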
6 Benefits of having resource-related SLAs
- Communication channel between parties established
- Agreement can be monitored
  - need accounting data
  - monitoring availability/reliability of services
  - trouble ticket ack/response time
- First step towards a business model in grid
- Impact on operations
  - verification (certification) of resources for a particular contract
  - only certified resources accessible to VO users
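Monitoring an agreement, as listed on this slide, amounts to comparing measured values with agreed targets. The sketch below illustrates that comparison under assumptions: the function name `sla_violations`, the metric keys, and the numbers are hypothetical and do not describe any PL-Grid tool.

```python
def sla_violations(agreed: dict, measured: dict) -> list:
    """Return the names of metrics whose measured value misses the agreed target."""
    violations = []
    # Metrics where a higher measured value is better.
    for key in ("availability", "reliability", "cpu_hours"):
        if key in agreed and measured.get(key, 0.0) < agreed[key]:
            violations.append(key)
    # Metrics where a lower measured value is better.
    for key in ("ticket_ack_hours",):
        if key in agreed and measured.get(key, float("inf")) > agreed[key]:
            violations.append(key)
    return violations


# Illustrative targets and measurements only.
agreed = {"availability": 0.90, "reliability": 0.95, "ticket_ack_hours": 4.0}
measured = {"availability": 0.93, "reliability": 0.91, "ticket_ack_hours": 6.5}
print(sla_violations(agreed, measured))  # ['reliability', 'ticket_ack_hours']
```

A metric missing from `measured` is treated as a violation here, on the assumption that an unmonitored term cannot be shown to be met.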
7 Infrastructure Monitoring
- Typically one dedicated VO monitors all resources in a Grid (e.g. the ops VO in EGEE)
  - sites are required to support this VO for monitoring purposes
  - configured as a high-priority VO
  - does not always reflect the status of the site as seen by other VOs
- PL-Grid uses regular VOs for monitoring
  - special role configured within the VO
  - high priority for jobs executed with this role
  - requires registering a technician's certificate as a VO member
  - reflects the VO's status at a site
  - site's service availability/reliability is measured within real VOs
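Measuring availability and reliability within real VOs can be pictured as computing both figures separately for each VO's test results. The slides do not define the two metrics; the sketch below assumes the common convention in which scheduled downtime counts against availability but is excluded from reliability. The status labels, VO names, and sample data are hypothetical.

```python
from collections import Counter


def availability_reliability(statuses):
    """Compute (availability, reliability) from a list of test statuses.

    Assumed status labels: "ok", "fail", "scheduled_downtime".
    """
    counts = Counter(statuses)
    total = len(statuses)
    up = counts["ok"]
    # Reliability ignores samples taken during scheduled downtime.
    known = total - counts["scheduled_downtime"]
    availability = up / total if total else 0.0
    reliability = up / known if known else 0.0
    return availability, reliability


# Illustrative monitoring results for one site, grouped by the VO that ran the tests.
results_by_vo = {
    "vo_a": ["ok"] * 8 + ["fail"] + ["scheduled_downtime"],
    "vo_b": ["ok"] * 5 + ["fail"] * 5,
}
per_vo = {vo: availability_reliability(s) for vo, s in results_by_vo.items()}
```

With this sample data the same site looks healthy to `vo_a` and half-broken to `vo_b`, which is the kind of difference a single dedicated monitoring VO would not show.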
8 Grid Information System Improvements
- Use VO-level Information Systems instead of the global instance
  - VO scope makes sense for the user
  - better scalability: no longer a global grid service
  - easier to manage
  - can be handled by the VO like other VO services: VO Membership Service, File Catalogue, Resource Broker, etc.
  - big VOs may make the effort to establish a high-availability cluster for the information service; smaller ones will not need that
  - reduces network traffic by localizing it
- Include information only about sites which have an active contract with the VO and whose resources were verified (certified)
- Requires a separate instance of the Information System, including all sites, for testing/certification
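The rule for what a VO-level Information System publishes can be expressed as a simple filter over site records. This is a minimal sketch under assumptions: `SiteRecord`, its fields, and the site names are hypothetical, and a real information system publishes far richer data.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    """Hypothetical per-VO view of a site."""
    name: str
    has_active_contract: bool   # an SLA with this VO is currently in force
    certified: bool             # resources were verified for that contract


def vo_view(sites):
    """Sites a VO-level instance would list: contracted and certified only."""
    return [s.name for s in sites if s.has_active_contract and s.certified]


def certification_view(sites):
    """The separate testing/certification instance lists every site."""
    return [s.name for s in sites]


# Illustrative records only.
sites = [
    SiteRecord("SITE-A", has_active_contract=True, certified=True),
    SiteRecord("SITE-B", has_active_contract=True, certified=False),
    SiteRecord("SITE-C", has_active_contract=False, certified=False),
]
print(vo_view(sites))             # ['SITE-A']
print(certification_view(sites))  # ['SITE-A', 'SITE-B', 'SITE-C']
```

Keeping the two views as separate functions mirrors the slide's point that certification needs its own instance covering all sites.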
9 Solving Grid Operations Problems
- Site Administrator's perspective
  - many sources of data: wiki pages, GGUS knowledge base
  - many of them outdated, not providing a working recipe
  - customers (VOs) are pressing for availability/reliability of resources
  - need for quick problem solving
  - need for interactive support: e-mail is not always efficient
- PL-Grid support structures
  - Actors
    - Site Administrators
    - 1st line support team
    - Regional Operator on Duty (aka ROD)
10 Use case: Operations Problem Handling
[Timeline figure: Monday, 7 P.M.; Tuesday, 8 A.M.; Tuesday, 9 A.M.; Tuesday, 7 P.M. (24h passed); Wednesday, 8 A.M.]
- Identified need for a new operational tool for Resource Allocation
  - fills the gap between VO Managers and Resource Providers
- Improvements to the Information System related to RA and VO scope
  - to provide the VO User with a list of reliable, well-configured and supported resources
- Infrastructure monitoring should be realized within the real, existing VOs
- Procedures for fixing problems can be more efficient with
  - a knowledge base to find out whether somebody else has encountered the problem before: https://weblog.plgrid.pl/baza-wiedzy/
  - a 1st line support team to get interactive contact with an expert
- Polish NGI has been assigned a share in EGI global tasks related to
  - Coordination of resource allocation (O-E-10): Poland
  - Grid Operation and oversight (O-E-5): Netherlands and Poland
12 PL-Grid news: (tentative) user registration open!
- http://www.plgrid.pl/oferta/infrastruktura/wyprob