glexec on worker nodes

1
glexec on worker nodes
  • David Groep
  • NIKHEF

2
Why glexec? VO side
  • Background: some VOs prefer to use their own
    scheduling and job management
  • late binding of jobs to job slots
  • first establishing an overlay network
  • subsequent scheduling and starting of jobs is
    faster
  • hide details between the various grid flavours
  • implement VO priorities
  • full use of allocated slots, up to max wall clock
    time
  • but these VOs will need their own scheduler
  • some of them do have it already,
  • but then, others don't, so this must never be the
    only (or even the default) way of using resources

3
Sites: glexec-on-WN requirements
  • Basic principle
  • VO-supplied schedulers should comply with and
    implement
  • the same policies as corresponding functionality
    in the native batch systems and grid middleware
  • both now and in the future
  • Essential ingredients
  • Independent auditing on the VO actions
  • Accounting at the user level no longer to be done
    by the site
  • trusted way to get the user credentials from
    the VO

4
Current mode
  • Job submission in the gLite-CE
  • VO scheduler -> Condor-C -> BLAHP
  • VO scheduler on the head node changes to the end-user's
    identity (i.e. to the job owner in the VO job
    source)
  • On change, site policies are checked
  • Job on the batch queue has proper identity
  • Some current practice
  • several VOs submit placeholder jobs with
    (essentially) a single identity for all of the VO
  • The checkered placeholder then gets user jobs
    in some way and executes them with the
    placeholder's identity
  • The site does not see the original submitter

Of course, there are also classic submissions and proper
uid changes by Condor-C/BLAHP on the head node
5
VO scheduler on the node
  • Job submission in a glexec-on-WN scenario
  • VO scheduler submits a placeholder job to the
    batch system, and the VO placeholder job
    submitter is responsible for the placeholder
    behaviour; this might be a specific role in the
    VO, or a locally registered badged user at each
    site
  • The placeholder job is subject to the normal site
    policies for jobs
  • The placeholder obtains the true user job, and
    presents the user credentials and the job
    (executable name) to the site to request a
    decision
  • On success the site will set the uid/gid of the
    new user's job
  • On failure the glexec will return with an error,
    and the placeholder job can terminate or obtain
    another job
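
The placeholder flow above can be sketched roughly as follows. This is a minimal illustration, not glexec's actual interface: `UserJob`, `Decision`, `run_placeholder`, and the `site_decide` and `execute` callbacks are hypothetical names standing in for the site's decision step and for the identity switch that glexec performs on success.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class UserJob:
    """A user job as fetched by the placeholder (hypothetical structure)."""
    user_dn: str      # identity taken from the user's credential
    executable: str   # executable name presented to the site


@dataclass
class Decision:
    """Outcome of the site's decision (hypothetical structure)."""
    allowed: bool
    uid: Optional[int] = None
    gid: Optional[int] = None


def run_placeholder(
    jobs: Iterable[UserJob],
    site_decide: Callable[[UserJob], Decision],
    execute: Callable[[UserJob, int, int], None],
) -> List[str]:
    """Present each job to the site; run it only if the site allows it.

    Returns the identities whose jobs were actually started.
    """
    started = []
    for job in jobs:
        decision = site_decide(job)
        if not decision.allowed:
            # Failure: the request is refused, so the placeholder moves on
            # to the next job (it could equally well terminate here).
            continue
        # Success: the job runs under the uid/gid chosen by the site.
        execute(job, decision.uid, decision.gid)
        started.append(job.user_dn)
    return started
```

The point of the sketch is that the placeholder never picks the target identity itself; it only presents the credential and executable and acts on the site's answer.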

Proper uid changes by Condor-C/BLAHP on the
head node SHOULD REMAIN THE DEFAULT
6
Status today
  • glexec is part of gLite 3.0
  • based on the Apache HTTP Server suexec code base
  • uses the LCAS and LCMAPS for enforcement and
    mapping
  • library-based implementation
  • needs the gLite-flavour of LCAS/LCMAPS (not the
    LCG2.x versions)
  • New modules have been added
  • LCAS RSL (executable path) constraints
  • validation of cert chain and proxy lifetime
  • restrictions
  • policy should be located on local
    POSIX-accessible file systems
  • policy transport should be trustworthy
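
To make the two kinds of check named above concrete (executable path constraints and proxy lifetime validation), here is a small sketch. It does not reproduce the LCAS modules or their configuration; the function names, the allowed-directory list, and the maximum-lifetime rule are all assumptions chosen for illustration.

```python
import os.path
from datetime import datetime, timedelta, timezone
from typing import Sequence


def executable_allowed(executable: str, allowed_dirs: Sequence[str]) -> bool:
    """True if the normalised absolute path lies under an allowed directory.

    Assumes every entry in allowed_dirs is an absolute path.
    """
    if not os.path.isabs(executable):
        return False
    path = os.path.normpath(executable)  # collapses "..", does not follow symlinks
    for directory in allowed_dirs:
        base = os.path.normpath(directory)
        if os.path.commonpath([path, base]) == base:
            return True
    return False


def proxy_lifetime_ok(not_after: datetime, now: datetime,
                      max_remaining: timedelta) -> bool:
    """True if the proxy is still valid and not longer-lived than allowed."""
    remaining = not_after - now
    return timedelta(0) < remaining <= max_remaining


# Example with made-up values
now = datetime.now(timezone.utc)
print(executable_allowed("/opt/vo/bin/run", ["/opt/vo/bin"]))
print(proxy_lifetime_ok(now + timedelta(hours=12), now, timedelta(days=1)))
```

A real check would also have to deal with symbolic links and validate the full certificate chain, which this sketch leaves out.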

7
Still needed
  • Make the credential acquisition process work
    across the network, so there can be a
    site-central policy engine
  • enforcement will have to stay local
  • Same for LCAS
  • a changeover to standard callouts for both is
    needed
  • this is planned work, but it is work and will
    take time

8
Needed components, procedures
  • Auditing the VO placeholder job/scheduler on the
    WN
  • compare the number of fork-execs done by the
    placeholder with the number of glexec
    invocations; a discrepancy means the VO is
    cheating on you
  • check the VO placeholder job is not using too
    much CPU; the CPU-time / walltime ratio should be
    close to zero
  • credential mapping auditing/logging
  • JobRepository fits the bill
  • schema allows for recording and retrieving all
    aspects of credential mapping
  • records both user identity and any VO attributes
  • retains the credential mapping for each job or
    glexec invocation
  • JR is part of the stack, but not widely deployed
    yet
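
The two placeholder checks listed above can be expressed as a simple comparison. The sketch below assumes the counts and times have already been collected by some site-side mechanism; the `PlaceholderAudit` record, the `audit_findings` function, and the 5% CPU threshold are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlaceholderAudit:
    """Per-placeholder figures gathered by the site (hypothetical record)."""
    fork_execs: int          # fork-execs observed for the placeholder
    glexec_invocations: int  # glexec calls logged for the same placeholder
    cpu_seconds: float       # CPU time used by the placeholder itself
    wall_seconds: float      # wall-clock time of the placeholder


def audit_findings(audit: PlaceholderAudit,
                   max_cpu_ratio: float = 0.05) -> List[str]:
    """Return a list of problems; an empty list means nothing suspicious.

    max_cpu_ratio is an assumed threshold for "close to zero".
    """
    findings = []
    if audit.fork_execs != audit.glexec_invocations:
        findings.append("fork-exec count differs from glexec invocation count")
    if audit.wall_seconds > 0:
        ratio = audit.cpu_seconds / audit.wall_seconds
        if ratio > max_cpu_ratio:
            findings.append("placeholder CPU-time / walltime ratio too high")
    return findings
```

Returning a list of findings rather than a single pass/fail value keeps each discrepancy visible in the audit log.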

9
Needed auditing
  • Detailed auditing
  • enterprise class operating systems have some
    kind of auditing
  • system-call level auditing is typically part of
    EAL3 certification
  • LAuS for Linux systems, like RHEL3 and SLES
  • gives a wealth of information, even today without
    glexec-on-WN, so it's a good idea even now, and
    not too hard to do

10
Summary
  • We have to realise that some VOs are doing
    agent jobs today,
  • there is no effective enforcement against this
  • and some sites may simply not care yet, but
    others have hard requirements on auditability and
    regulatory compliance
  • Some VOs are given a specific target date for
    leaving this model
  • This glexec-on-WN model, giving the VOs the tools
    to comply with site requirements, seems a
    reasonable way forward
  • at least it makes things better than they are today
  • but many will miss the warm and fuzzy feeling of
    trust here
  • there has been a lot of discussion in the group,
    so have a look at the minutes for details and
    many more considerations