1
A Simple Virtual Organisation Model and Practical
Implementation
  • AusGrid05
  • Feb 2005
  • Lyle Winton, University of Melbourne

2
VO Origins and Definition
  • "The Anatomy of the Grid", Foster, Kesselman,
    and Tuecke
  • "The real problem underlying the Grid is
    coordinated resource sharing in dynamic,
    multi-institutional virtual organisations" (VOs)
  • Conditions and rules for resource sharing
  • negotiated between resource provider and consumer
  • They define the VO as the set of
    individuals/institutions for which these apply.
  • VO is only half the picture!
  • VO may have access to facilities they do not own
    or manage
  • Facilities are organisations with security and
    policy concerns.
  • Facilities may span multiple VOs and other user
    communities.
  • Observation: A Grid is a complex network of
    organisations and connecting policies.
  • user communities (traditionally called VOs)
  • resource facilities
  • negotiated policies for resource access and
    sharing
  • certifying authorities (CAs for GSI based)

3
VO Requirements
  • Requirements from the community (EU DataGrid WP6
    2002; Foster, Kesselman, and Tuecke)
  • Users may be members of any number of VOs
  • A resource can participate in one or more VOs
  • User may have any number of roles within a given
    VO
  • VOs must be able to specify membership policy
  • A user's VO membership must remain confidential
  • A resource owner must be able to allow
    authorisation by VO and VO role membership
  • It should be possible to list resources and
    actions to which a VO member or role has access
  • It should be possible to list resources to which
    a VO member or role has access to carry out
    specific actions
  • Authorisation decisions must be consistent within
    a VO
  • It must be possible to disable a user's VO
    authorisation
  • The VO must be able to specify security
    requirements on any resource for specific roles
  • A user must be able to select and deselect VOs
    and roles
  • It must be possible to assign job priorities
    within resources
  • Looking at the requirements we can identify some
    base objects
  • The VO, users/members, groups or roles,
    resources, priorities, authorities
  • Authorities and priorities are important areas,
    often overlooked

4
VO in Practice
  • Grid Middleware
  • In the literature the term VO is used to describe
    groups of users and sometimes resources
  • In a few cases information systems have been
    developed to represent the VO
  • VO Information Systems
  • Allows the development of tools to coordinate
    resource sharing
  • Configuration tools for resources
  • Complete authentication systems
  • Most do not take into account
  • Certifying authorities
  • Facility's policies and priorities (geared
    towards users)
  • VO's internal task and role priorities

5
Why CAs in VO - LCG Example
  • Users with Single Credential from Trusted Network
    of Authorities
  • LHC Computing Grid (99 Compute Elements)

6
Why CAs in VO - LCG Example
  • Users with Single Credential from Trusted Network
    of Authorities
  • LHC Computing Grid (28 Certificate Authorities)
  • Problem: A new CA is added to the network.
    Resource configuration for CAs is shipped with
    Grid middleware and updates.

7
Why CAs in VO - Generic Case
  • Users with Single Credential from Multiple
    Authorities
  • Problem: A new user/resource with a certificate
    from a different CA is added to the VO. The new
    CA trust policy must be deployed to all resources.

8
Why CAs in VO
  • CAs affect the conditions for resource sharing
  • VO-owned services/tools must trust all members'
    CAs
  • Participating resources must trust all VO
    members' CAs
  • VO services/tools and participating resources
    must trust all resources' CAs
  • Why should resources trust the VO's list of CAs?
  • If they want to participate, they must.
  • Generally, the VO knows its member institutions
    and who can best certify their members.
  • If a CA starts issuing bad certificates, the VO
    can untrust it. (until CRLs are fixed?)

9
Existing VO Implementations
  • EU DataGrid and NorduGrid VO
  • LDAP based VO Information System
  • LDAP structure represents Users, Groups, and
    Roles
  • mkGridmap tool queries the LDAP service, then
    generates the local resource grid-mapfile
  • NorduGridmap tool extended by NorduGrid community
  • Problems
  • Very simple VO model - ignores CAs and VO
    priorities
  • Limited resource policies - mapping of VO members
    to shared accounts or ranges of accounts
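The mkGridmap idea can be sketched roughly as below. The DNs, the pool account name, and the function name are illustrative assumptions; the real tool queries an LDAP service and supports richer mappings.

```python
# Sketch of the mkGridmap idea: turn VO membership entries (here a
# hard-coded list standing in for an LDAP query result) into the
# grid-mapfile lines a resource uses for authorisation.

def make_gridmap(members, local_account):
    """Map each member's certificate DN to a shared local account."""
    return "\n".join('"%s" %s' % (dn, local_account) for dn in members)

# Hypothetical member DNs and pool account name.
members = [
    "/C=AU/O=ExampleVO/CN=Alice Example",
    "/C=AU/O=ExampleVO/CN=Bob Example",
]
print(make_gridmap(members, "vo001"))
```

Each output line pairs a quoted certificate subject with the local Unix account it maps to, which is the grid-mapfile format Globus-based middleware reads.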

10
Existing VO Implementations
  • VOMS (VO Membership Service)
  • Extends Globus GSI security
  • 3 components
  • Server holds user, group, and role info
  • Client generates credentials (proxy) containing
    additional role information
  • VOMS enabled gatekeeper service
  • Authorisation split into 2 areas of
    responsibility
  • User's relationship with the VO (VOMS server
    level)
  • User's access and usage of resources (resource
    level)
  • Problems
  • No CA information is available from VOMS
  • Not clear - users' resource usage (eg. priorities)
    is determined solely by resource facilities
  • Urgent changes in VO priorities may require
    renegotiation with resource facilities.

11
Existing VO Implementations
  • CAS (Community Authorization Service)
  • 3 components
  • CAS server contains info on users, groups,
    resources, and access policies
  • Client requests authorisation from the server
    for a specific action/role (using proxy)
  • CAS enabled gatekeeper service
  • Server returns signed policy assertion embedded
    into new credentials (proxy), gatekeeper reads
    this
  • Policies consistently control access (to data
    etc.), eliminating the least-common-denominator
    problem
  • VOs can allocate/deallocate resource blocks to
    individuals and groups (coarse grained priority
    management)
  • Problems
  • Still need to manually configure CAs at each
    resource
  • Initially CAS delegated its own credentials on
    users' behalf
  • Claimed that allowing VOs to specify access/usage
    policies breaks the Grid model (a VOMS claim, but
    not true)

12
Experiences
  • Grid2003 HPC challenge
  • Project led by Raj Buyya
  • Attempt to construct the largest testbed
  • Grew from several resources at the University of
    Melbourne to 218 resources in 50 locations across
    21 countries.
  • Australian Belle Production Grid
  • Belle experiment, KEK B-factory in Japan.
  • Collaboration of 400 physicists from 50
    institutions.
  • Australia took part in 4×10⁹ event MC production
    during 2004

13
Experiences Grid2003
  • Updating user and CA configuration was time
    consuming and problematic.
  • Manual configuration led to errors.
  • Automation of this was recognised as desirable.

14
Experiences Belle Production
  • Accessible resources for Belle
  • Access to around 120 CPUs (over 2 GHz)
  • APAC, AC3, VPAC, ARC
  • not all Grid accessible
  • much production performed without Grid middleware
  • Access to ANUSF petabyte storage facility (via
    SRB)
  • Will request 10 TB for Belle data.
  • Problems
  • Simplest access method is to share one account at
    each facility. Some resource policies forbid
    this.
  • Each facility has an account application
    procedure. Typically, this is a manual process
    and requires intervention from several people.

15
A Simple VO Model
  • start with all information necessary
  • User/Service identification (certificate ID)
  • Groups and Roles (user and service collections)
  • Trusted user/service certifying bodies (CAs)
  • Trusted resource certifying bodies (CAs)
  • Untrusted certifying bodies and identities.
  • Priorities assigned to Users, Groups, Roles
  • extended the efforts from EU DataGrid and
    NorduGrid
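The information the model collects can be pictured as a single record per VO. A minimal sketch follows; the field names, the example CA name, and the `is_authorised` check are illustrative assumptions, not the actual schema of the implementation.

```python
from dataclasses import dataclass

@dataclass
class VORecord:
    # Fields mirror the bullet list above; names are illustrative.
    members: dict             # certificate DN -> issuing CA name
    groups: dict              # group/role name -> list of member DNs
    trusted_user_cas: set     # CAs trusted to certify users/services
    trusted_resource_cas: set # CAs trusted to certify resources
    untrusted: set            # banned CAs and identities
    priorities: dict          # DN, group, or role -> numeric priority

def is_authorised(vo, dn):
    """A member is authorised only if known, certified by a trusted
    CA, and neither it nor its CA is explicitly untrusted."""
    ca = vo.members.get(dn)
    if ca is None or dn in vo.untrusted or ca in vo.untrusted:
        return False
    return ca in vo.trusted_user_cas

# Hypothetical example VO.
vo = VORecord(
    members={"/O=Belle/CN=Alice": "ExampleGrid-CA"},
    groups={"analysis": ["/O=Belle/CN=Alice"]},
    trusted_user_cas={"ExampleGrid-CA"},
    trusted_resource_cas={"ExampleGrid-CA"},
    untrusted=set(),
    priorities={"analysis": 100},
)
print(is_authorised(vo, "/O=Belle/CN=Alice"))  # → True
```

Carrying the trusted and untrusted CA lists inside the VO record is what lets tools push CA trust configuration to resources automatically, instead of shipping it with middleware updates.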

16
A Simple VO Model
  • VO Information Service

[Diagram: VO Information Service tree - Belle; Uni. of Melbourne; Analysis Work Group]
17
A Simple VO Model
  • Resource Configuration Manager
  • Diagram???

18
A Simple VO Model
  • Resource Configuration Manager (GridMgr v3.0)
  • Rewritten from NorduGridmap, originally from EU
    DataGrid
  • Available resource usage policies
  • Map VO/groups/roles/users to shared accounts
    (existing func.)
  • Manually map individuals (existing func.)
  • Map VO/groups/roles/users to a range of accounts
  • Restriction of mappings to local Unix groups
  • Mapping of users to individual accounts via full
    name
  • Denial of access to VO/group/role/users
  • Security requirements
  • Valid full name matching only one account
  • No new or modified system accounts (without
    approval)
  • Admin approval of new/modified Users and CAs
  • Valid account group (eg. can specify non-root)
  • No shared accounts (optional)
  • Allow or Deny by pattern match
  • Reporting and notification of users failing
    requirements
  • Update of certificate revocation lists (CRLs)
    provided by CAs
  • Advanced notification of host certificate expiry
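Two of the policies listed above, allow/deny by pattern match and mapping users to a range of accounts, can be sketched as follows. The function name, patterns, and account names are hypothetical, not GridMgr's actual interface.

```python
import fnmatch

def map_user(dn, allow, deny, free_accounts):
    """Return a local account for dn, or None if denied or unmatched.
    Explicit denial wins over any allow rule."""
    if any(fnmatch.fnmatch(dn, p) for p in deny):
        return None  # matched a deny pattern
    if not any(fnmatch.fnmatch(dn, p) for p in allow):
        return None  # not covered by any allow rule
    # Assign the next free account from the configured range.
    return free_accounts.pop(0) if free_accounts else None

pool = ["belle%03d" % i for i in range(1, 51)]  # account range
print(map_user("/O=Belle/CN=Alice", ["/O=Belle/*"], [], pool))   # → belle001
print(map_user("/O=Other/CN=Carol", ["/O=Belle/*"], [], pool))   # → None
```

The deny-first ordering reflects the security requirement above that a facility can refuse access to a VO, group, role, or user regardless of what the VO's membership list says.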

19
A Simple VO Model
  • What we've got so far
  • VO Information System allows a VO to define its
    structure, authorisation, and security policy
    (limited) independent of resources.
  • Resource Configuration Manager allows a resource
    to easily maintain a range of local security and
    access policies for multiple VOs
  • What's left
  • How does a VO manage its priorities?
  • eg. Some tasks might be critical to the VO!
  • The problem is yet to become apparent
  • many production Grids are effectively single-user
  • within some Grids, resources are specifically
    allocated or underutilised

20
Managing Priorities
  • Traditional Cluster Computing
  • Locally managed queue determines users' job
    priorities
  • Jobs execute (are pulled from the queue) when
    resources become available
  • (Globus) Grid Computing
  • A resource's local priority cannot be determined
    until jobs are submitted/completed
  • Jobs are submitted (pushed) to the local resource
    queues
  • Problems with a push mechanism
  • Jobs waiting in long queues could potentially be
    run elsewhere. (submit each job to multiple
    places?)
  • Heavily utilised resources may never appear
    "free", but may have short queue times.
  • Large, fast, or apparently free resources may
    have a low local priority for your job.

21
Managing Priorities
  • Alternative mechanism
  • Allow job consumers to pull jobs when resources
    become available or queues become short.
  • Eliminates the need for users to determine local
    resource priority
  • Jobs consumed from a central VO Managed Queue
  • VO priorities can be managed by allowing some
    jobs to be consumed first
  • Jobs only leave the VO queue when they will be
    run. No idle time on resource queues; jobs can
    still be run anywhere.
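The pull-based central queue described above can be sketched as a priority queue that hands out the highest-priority job on each consumer request. This is an illustrative sketch, not the prototype's actual implementation.

```python
import heapq

class VOQueue:
    """Minimal central VO queue: jobs leave only when a resource
    consumer pulls them, highest priority first."""
    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker preserving submission order
    def submit(self, job, priority):
        # heapq is a min-heap, so negate priority: highest first.
        heapq.heappush(self._heap, (-priority, self._count, job))
        self._count += 1
    def pull(self):
        """Called by a resource when it has free slots or a short
        local queue; returns the next job, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

q = VOQueue()
q.submit("low", 10); q.submit("urgent", 500); q.submit("mid", 100)
print(q.pull())  # → urgent
```

Because jobs sit in the VO's queue rather than a resource's, reordering this heap is all the VO needs to do to change priorities, with no renegotiation with facilities.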

22
VO Managed Job Queue
  • VO Managed Queue Service (Prototype)
  • Web service with simple authentication
  • User submission of a job with an optional role
    (also job management - list jobs, status of jobs,
    delete jobs)
  • Overall job priority determined by VO Information
    Server - user or group priority, or specified
    role priority
  • facility for Resource to pull jobs, highest
    priority first
  • facility for Resource to reserve/release job for
    execution
  • and Resource can flag a job failed/completed
  • Resource level Job Consumer (Simulated)
  • Extract jobs and priorities from multiple VO
    queues
  • single VO can host multiple queues for
    scalability
  • VO Info Server ensures consistent priorities
  • Convert VO priority to local priority
  • each VO has a local priority range
  • priorities from 0 to infinity are attenuated to
    within the range
  • simple formula maps priorities below 200 to the
    lower 80% of the range
  • Simulate allocating successfully reserved job to
    CPU resources
  • Simulate job completion
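The slides do not give the attenuation formula, but one simple function with the stated properties (monotone, maps priorities in [0, infinity) into the local range, and sends priority 200 to the lower 80% of the range) is the saturating map p / (p + 50). The formula and the `knee` parameter are assumptions for illustration.

```python
def attenuate(vo_priority, lo, hi, knee=50.0):
    """Map a VO priority in [0, inf) into a resource's local
    priority range [lo, hi]. knee=50 makes priority 200 land at
    exactly 80% of the range; as vo_priority grows without bound
    the result approaches, but never exceeds, hi."""
    frac = vo_priority / (vo_priority + knee)
    return lo + (hi - lo) * frac

print(attenuate(200, 0, 100))  # → 80.0
```

Giving each VO its own (lo, hi) range is what keeps one VO's unbounded priorities from crowding out another VO's jobs on a shared resource.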

23
VO Queue - Simulation Results
  • Simple Simulation Run
  • 3 VOs with 10 Users each of varying priority
  • 1 VO Queue for each VO, each attached to a VO
    Info System
  • 10 resources of 10-50 job slots, accepting jobs
    from all VOs with varying local priority for each
  • Each user periodically submitted 10-50 jobs
  • Average of 3 times more jobs than slots!
  • Saturated Grid - queue time > 0

24
VO Queue - Simulation Results
[Plots: Completed jobs, all VOs, all Resources · Completed jobs, all VOs, one Resource · Completed jobs, all VOs, one Resource (VO at 27.5 is missing) · Incomplete (queued) jobs, all VOs, one Resource]
25
VO Queue - Simulation Results
  • Brief outline of results
  • Queue-time/queue-size vs. VO job priority was too
    complex to analyse.
  • Focusing on a single resource - low-VO-priority
    jobs tended to take longer
  • However, low-VO-priority jobs were more likely to
    be left in the queue and not complete!
  • Lock-out occurred for low priority jobs
  • In fact, looking at Mean Resource Priority, one
    resource (mean priority of 27.5) was locked out
    completely!

26
VO Queue - Future Development
  • Dynamic Job Priorities (perhaps Fairshare?)
  • VOs can specify target fraction of resource usage
    or job submission
  • Eg. 20% fairshare target for a particular Working
    Group
  • If few jobs are submitted by the group (< 20%),
    job priority is increased
  • If too many jobs have been submitted (> 20%), job
    priorities are decreased relatively
  • Facilities can specify a target fraction of
    resource usage for each VO
  • Advantages
  • help prevent job lock-out
  • VO can specify fine-grained allocation of
    resource, without allocating specific resources
  • Facilities can implement SLAs based on resource
    usage
  • VO Queue deficiencies
  • Advanced resource brokering is needed
  • Is a job appropriate or a good match for
    resources? (data access)
  • Do jobs constantly fail (silently) on a resource?
    (black hole effect)
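The fairshare idea above, boosting priorities when a group is under its target usage fraction and damping them when it is over, might be sketched as a multiplicative factor. The formula and function name are assumptions for illustration, not taken from the slides.

```python
def fairshare_factor(target_share, actual_share, strength=1.0):
    """Scale a job's base priority up when the group's actual usage
    fraction is below its target, down when it is above. A group
    with zero recent usage gets the maximum boost."""
    if actual_share <= 0:
        return 1.0 + strength
    return (target_share / actual_share) ** strength

base_priority = 100
# Group targeted at 20% of usage but only using 10% -> doubled
print(base_priority * fairshare_factor(0.20, 0.10))  # → 200.0
```

Because under-served groups see their priorities rise continuously, a scheme like this also addresses the lock-out observed in the simulation: a starved low-priority group eventually outbids the rest.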

27
VO Queue - Future Development
  • Integration with ATLAS (LCG) tools?
  • Don Quijote and Windmill (ATLAS)
  • Supervisor (central queue) and Executor (job
    consumer - EDG/LCG, Grid3, NorduGrid)
  • Executor requests number of jobs, Supervisor
    pushes jobs
  • ATLAS Data Challenge Production across 3 Grids
  • AtCom (ATLAS Commander)
  • submit multiple jobs - tightly coupled with AMIdb
  • EDG/LCG test plugin; NorduGrid production plugin
  • select data or param sweep -> select operation
    (transform)
  • No Resource Broker or Scheduler, allocate by
    hand!

28
Summary
  • The Simple VO Model
  • Easy to implement a VO Information System (via
    OpenLDAP)
  • Proved sufficient for development of tools aiding
    deployment and configuration
  • Provided encouraging results towards the
    independent management of VO and facility
    priorities.
  • Developed Tools (for VO Information System)
  • GridMgr v3.0 (production ready)
  • allows for a wide range of facility security
    policies to co-exist with VO membership policies
  • VO Managed Queueing System (prototype)
  • could help coordinate the use of resources based
    on VO priorities assigned to groups, roles, and
    users (supported by simulation)
  • future investigation required to prevent lock-out
    and better allocate resource fractions.