Title: UP Site Selector for Grid03
1UP Site Selector for Grid03
- Catalin Dumitrescu, Mike Wilde, Ian Foster
- The University of Chicago and Argonne National Laboratory
2Introduction
- Problem: Usage Policy (UP) based scheduling in Grid03
- Contention problems are an issue when:
- No resources are free
- Restrictions/Quotas are enforced at sites
!--------------------------------------------------------------------------------------!
! Site Name           !     ! iVDGL ! USAtlas ! Ligo  ! SDSS  ! bTEV  ! Gridex ! USCMS !
!--------------------------------------------------------------------------------------!
  cluster28.knu.ac.kr    71   16.17     16.17   16.17   16.17   16.17     1.06    1.91
  garlic.hep.wisc.edu   101    3.01      3.01    3.01    3.01    3.01        -    1.43
  nest.phys.uwm.edu     305    0.00      0.00    7.28    0.00    0.00        -    0.00
  t2cms0.sdsc.edu        76    0.62     24.74       -   24.74   24.74     0.00    0.40
  uscmstb0.ucsd.edu       3   11.68     11.68       -       -       -    11.68   11.68
  xena.hamptonu.edu       1   25.00     25.00       -       -       -        -   25.00
!--------------------------------------------------------------------------------------!
- Our Solution: an UP site selector for steering workloads in Grid03
3Enhancement
- Currently: Site Name, Class, Location, VO, Administrator, CPUs
- Our proposal: Site Name, Class, Location, VO, CPUs, Allocated to Grid03, Current UP per VO, Usages per VO
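The proposed per-site record can be pictured as a small data structure. The following is only a sketch: the field names, the use of percentages for UP and usage values, and the example values are assumptions made for illustration, not the Grid03 schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SiteRecord:
    # Fields taken from the proposal on this slide (names are hypothetical).
    name: str
    site_class: str
    location: str
    vo: str
    cpus: int
    # Fields the proposal adds to what is published today.
    allocated_to_grid: int = 0
    up_per_vo: Dict[str, float] = field(default_factory=dict)     # assumed: percent
    usage_per_vo: Dict[str, float] = field(default_factory=dict)  # assumed: percent

    def grid_share(self) -> float:
        """Fraction of the site's CPUs that is allocated to the Grid."""
        return self.allocated_to_grid / self.cpus if self.cpus else 0.0


# Made-up example values, for illustration only.
site = SiteRecord(
    name="example-site",
    site_class="example-class",
    location="example-location",
    vo="iVDGL",
    cpus=100,
    allocated_to_grid=40,
    up_per_vo={"USCMS": 25.0},
    usage_per_vo={"USCMS": 10.0},
)
print(site.grid_share())
```

Keeping the policy and usage values per VO in the same record lets a selector compare them without a second lookup.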
4Architecture Overview
Site Selector
select site where recommendation-honored and
condition
Start-time Predictor
Add start-time prediction to Choice-Table
Site Recommender (can be multi-level Grid, VO,
and Group)
Job description
UP DB
SiteRec (VO, ResRequest) -> Choice-Table
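The flow between the three components can be sketched as below. This is a toy illustration under stated assumptions: the `UP_DB` contents, the function names, and the "earliest predicted start wins" rule are all hypothetical and are not taken from the actual implementation.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical UP DB: site -> {VO: CPUs the policy still allows that VO to use}.
UP_DB: Dict[str, Dict[str, int]] = {
    "siteA": {"cms": 8, "atlas": 0},
    "siteB": {"cms": 2},
    "siteC": {"atlas": 16},
}


def site_rec(vo: str, cpus_requested: int) -> List[dict]:
    """Site Recommender: build a Choice-Table of sites whose policy admits the request."""
    table = []
    for site, allowance in sorted(UP_DB.items()):
        if allowance.get(vo, 0) >= cpus_requested:
            table.append({"site": site, "allowed_cpus": allowance[vo]})
    return table


def add_start_times(table: List[dict], predict: Callable[[str], int]) -> List[dict]:
    """Start-time Predictor: annotate every Choice-Table row with a predicted wait."""
    return [dict(row, start_in_s=predict(row["site"])) for row in table]


def select_site(table: List[dict], condition: Callable[[dict], bool]) -> Optional[str]:
    """Site Selector: among recommended rows meeting the condition, pick the earliest start."""
    candidates = [row for row in table if condition(row)]
    if not candidates:
        return None
    return min(candidates, key=lambda row: row["start_in_s"])["site"]


# Made-up predicted waits in seconds.
waits = {"siteA": 600, "siteB": 30, "siteC": 5}
table = add_start_times(site_rec("cms", 2), lambda site: waits[site])
best = select_site(table, lambda row: row["start_in_s"] < 3600)
print(best)
```

Only sites that the recommender returned can be selected, which is one way to read "recommendation-honored" in the selector's query.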
5UP Example
Resource allocation cases (relative to the UP quota)
- Under-allocation due to external causes
- Over-allocation without contention
- Over-allocation with contention
- Under-allocation due to internal causes
6UP Metrics Flow
- Planner, Work-Runner, or Application infrastructure invokes V-PEP to get a recommended execution site for each job to run. This recommendation is policy-cognizant (site and VO policies).
- V-PEP
- provides answers based on site policies, VO policies, job requirements, and sampled monitoring data
- gets queue conditions and local site policy from the monitoring system; gets job resource requirements from the job declaration and predictor data
- There can be one V-PEP per VO or one V-PEP per Grid, and V-PEPs can be multi-layer (Grid, VO, and Group)
- The place to do resource reservation, especially storage, needs to be worked out; there are many architectural alternatives
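One possible reading of the multi-layer arrangement is that each layer narrows the candidate sites in turn. The sketch below assumes that reading; the `PolicyLayer` class, the job fields (`vo`, `group`, `user`), and the site names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


class PolicyLayer:
    """One hypothetical V-PEP layer: maps a job attribute value to the sites it permits."""

    def __init__(self, name: str, key: str, permitted: Dict[str, Set[str]]):
        self.name = name            # "grid", "vo", or "group"
        self.key = key              # job attribute this layer decides on
        self.permitted = permitted  # attribute value -> permitted sites

    def filter(self, job: Dict[str, str], sites: Iterable[str]) -> List[str]:
        allowed = self.permitted.get(job.get(self.key, ""), set())
        return [s for s in sites if s in allowed]


def recommend(layers: List[PolicyLayer], job: Dict[str, str], sites: Iterable[str]) -> List[str]:
    """Apply the layers top-down; a site survives only if every layer permits it."""
    remaining = list(sites)
    for layer in layers:
        remaining = layer.filter(job, remaining)
        if not remaining:
            break
    return remaining


grid_layer = PolicyLayer("grid", "vo", {"cms": {"siteA", "siteB", "siteC"}})
vo_layer = PolicyLayer("vo", "group", {"higgs": {"siteB", "siteC"}})
group_layer = PolicyLayer("group", "user", {"alice": {"siteC", "siteD"}})

job = {"vo": "cms", "group": "higgs", "user": "alice"}
result = recommend([grid_layer, vo_layer, group_layer], job,
                   ["siteA", "siteB", "siteC", "siteD"])
print(result)
```

With a single layer, the same code degenerates to the one-V-PEP-per-Grid or one-V-PEP-per-VO case.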
7MAUI-like UP (semantic)
- Goal: fair-share resource allocation among users
- Provides support for fair-share rule specification
- Example: bettysue 15.5, billybob 10.0, jimbob 5.0-
- MAUI's formula:
- JobPriority = (Other Priority Components) + FairShareFactor
- FairShareFactor = FairShareWeight * ((UserFairShareWeight * UserFairShareDelta) + (GroupFairShareWeight * GroupFairShareDelta) + (AccountFairShareWeight * AccountFairShareDelta))
- Fair share is computed on intervals with DECAY factors
- http://www.hpc2n.umu.se/doc/maui/fairshare.html
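A small worked example may make the fair-share factor more concrete. It is a sketch under assumptions: it takes each delta to be the fair-share target minus the decayed usage percentage, weights older intervals by successive powers of the decay factor, and uses made-up weights and usage numbers. Only the 15.5 target for bettysue comes from the slide.

```python
from typing import Iterable, Sequence, Tuple


def decayed_usage_percent(user_usage: Sequence[float],
                          total_usage: Sequence[float],
                          decay: float) -> float:
    """Percent of decayed total usage consumed; index 0 is the most recent interval."""
    num = sum(u * decay ** i for i, u in enumerate(user_usage))
    den = sum(t * decay ** i for i, t in enumerate(total_usage))
    return 100.0 * num / den if den else 0.0


def fair_share_factor(fs_weight: float,
                      components: Iterable[Tuple[float, float]]) -> float:
    """components holds (weight, delta) pairs for user, group, and account."""
    return fs_weight * sum(weight * delta for weight, delta in components)


# Made-up usage history for bettysue over two intervals (most recent first).
usage_pct = decayed_usage_percent([20.0, 40.0], [100.0, 200.0], decay=0.5)
user_delta = 15.5 - usage_pct  # target from the slide minus decayed usage

# Made-up weights; group delta of 1.0 and account delta of 0.0 are placeholders.
factor = fair_share_factor(10.0, [(2.0, user_delta), (1.0, 1.0), (0.0, 0.0)])
print(usage_pct, user_delta, factor)
```

Under these assumptions a user who is over target gets a negative delta, so the factor lowers the job's priority.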
8UP Selector Service
- Specialized Grid Service (OGSI)
- Collects Abstract Usage Policies from sites
- Queries the Grid03 monitoring systems for various metrics
- Answers queries like:
- From the list L of sites, which is the subset S of sites where VO V's workload is allowed to run?
- Which is the best site X to send VO V's workload?
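The two example queries could be answered along the following lines. This is a minimal sketch: the `policies` and `usage` tables are invented, "allowed" is assumed to mean a non-zero UP entry, and "best" is assumed to mean the largest gap between policy share and current usage. The real service may rank sites differently.

```python
from typing import Dict, List, Optional

# Hypothetical abstract usage policies: site -> {VO: share in percent}.
policies: Dict[str, Dict[str, float]] = {
    "siteA": {"USCMS": 25.0, "USAtlas": 25.0},
    "siteB": {"USCMS": 10.0},
    "siteC": {"USAtlas": 50.0},
}

# Hypothetical monitoring sample: site -> {VO: current usage in percent}.
usage: Dict[str, Dict[str, float]] = {
    "siteA": {"USCMS": 24.0},
    "siteB": {"USCMS": 2.0},
}


def allowed_subset(sites: List[str], vo: str) -> List[str]:
    """Query 1: the subset S of L where the VO has a non-zero policy share."""
    return [s for s in sites if policies.get(s, {}).get(vo, 0.0) > 0.0]


def best_site(sites: List[str], vo: str) -> Optional[str]:
    """Query 2: the allowed site with the most unused policy share, if any."""
    candidates = allowed_subset(sites, vo)
    if not candidates:
        return None

    def unused_share(site: str) -> float:
        return policies[site][vo] - usage.get(site, {}).get(vo, 0.0)

    return max(candidates, key=unused_share)


L = ["siteA", "siteB", "siteC"]
print(allowed_subset(L, "USCMS"), best_site(L, "USCMS"))
```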
9Input Metrics / Decisions
- Input Metrics
- CPU capacities and usages: number of CPUs, available CPUs, used CPUs (in total and per VO), CPUs allocated to Grid03 and per VO
- Free and used disk spaces (APP, DATA directories, quotas)
- Network availability (latency, capacity)
- Decision Examples
- select best site for CMS
- list all sites available for CMS and their
characteristics
10UP Selector Text Client
- Used for interfacing with other user schedulers
- Accepts simple commands such as:
- List sites with characteristics
- Select best
- Commit usage
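A line-oriented client of this kind could dispatch commands roughly as follows. The command spellings (`list`, `select`, `commit`), the reply formats, and the free-CPU bookkeeping are all hypothetical; the slide names only the three kinds of command.

```python
from typing import Dict


class SelectorSession:
    """Toy stand-in for a text client session; not the real client protocol."""

    def __init__(self, free_cpus: Dict[str, int]):
        self.free_cpus = dict(free_cpus)

    def handle(self, line: str) -> str:
        parts = line.split()
        if not parts:
            return "ERROR empty command"
        cmd, args = parts[0].lower(), parts[1:]
        if cmd == "list":
            # List sites with one characteristic (free CPUs).
            return " ".join(f"{s}:{n}" for s, n in sorted(self.free_cpus.items()))
        if cmd == "select":
            # Select best: here simply the site with the most free CPUs.
            if not self.free_cpus:
                return "ERROR no sites"
            best = max(sorted(self.free_cpus), key=lambda s: self.free_cpus[s])
            return f"BEST {best}"
        if cmd == "commit":
            # Commit usage: record that CPUs at a site are now taken.
            if len(args) != 2 or args[0] not in self.free_cpus or not args[1].isdigit():
                return "ERROR usage: commit <site> <cpus>"
            site, cpus = args[0], int(args[1])
            if cpus > self.free_cpus[site]:
                return "ERROR not enough free CPUs"
            self.free_cpus[site] -= cpus
            return f"OK {site} {self.free_cpus[site]}"
        return f"ERROR unknown command: {cmd}"


session = SelectorSession({"siteA": 8, "siteB": 3})
for command in ["list", "select", "commit siteA 6", "select"]:
    print(command, "->", session.handle(command))
```

Committing usage between selections keeps later answers consistent with what the scheduler has already placed.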
11Usage Example Euryale
- Nice shell and Perl wrapper of the Java client
- ./run-client.sh (window1, before anything else)
- Provides, in addition, local access to enhanced selection
- Euryale-style request:
- ./mock-up /tmp/kss-gj4CUt.lof (window2)
- SOLUTION:Rice_Grid3
12UP Selector Graphic Client
13Conclusions
- The problem we are taking on: Usage Policy (UP) based scheduling in Grid03
- Our solution: an UP site selector for steering workloads in Grid03
14References
- S-POP.v3.0: http://people.cs.uchicago.edu/cldumitr/Grid03_policy/S-POP/v3.0/
- V-PEP.v3.0: http://people.cs.uchicago.edu/cldumitr/Grid03_policy/V-PEP/v3.0/
- S-Console: http://people.cs.uchicago.edu/cldumitr/Grid03_policy/S-Console/
- S-POP.data: http://griodine.uchicago.edu/cldumitr/data/
- Grid03_policy: http://people.cs.uchicago.edu/cldumitr/Grid03_policy/