Title: GangSim: A Simulator for Grid Scheduling Studies
1. GangSim: A Simulator for Grid Scheduling Studies
- Catalin L. Dumitrescu, The University of Chicago
- Ian Foster, Argonne National Laboratory and The University of Chicago
2. Talk Outline / Part I
- Part I
  - Introduction
  - Our Approach: GangSim, a discrete simulator
  - Motivating Scenarios
  - Architecture
  - Evaluation Criteria
- Part II
  - Simulation and Validation Results
  - Conclusions and Questions
3. Introduction
- Large distributed Grid systems pose new challenges:
  - Overwhelming resource characteristics
  - Complex workload characteristics
  - Complex interactions and resource allocations
- Analytical modeling is either impractical or impossible
4. Our Approach: GangSim
- Derived from the Ganglia monitoring toolkit
- Real-time simulator
- Focuses on local VO interactions
- Mixes simulations with real testbeds
- Provides simple means of visualizing results
- Interacts with various Resource Managers (RMs)
5. GangSim Novelty
- Simulates (and handles):
  - Sites with RMs
  - VOs and groups
  - Submission hosts
- Models usage allocations (SLAs) at several levels
- Can combine simulated results with real results collected from a real Grid
  - Useful for simulating future trends
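Usage allocations "at several levels" can be pictured as nested shares: a site grants each VO a fraction of its CPUs, and each VO splits that fraction among its groups. The Python sketch below is a hypothetical illustration of that idea only; the `Allocation` class, the VO and group names, and the share values are invented for the example, and GangSim's own SLA model may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    """A share of the parent's capacity, with optional nested allocations."""
    share: float
    children: dict[str, "Allocation"] = field(default_factory=dict)


def effective_share(root: Allocation, path: list[str]) -> float:
    """Multiply the shares along a path such as ["vo_a", "production"]."""
    node = root
    share = root.share
    for name in path:
        node = node.children[name]  # KeyError if the VO or group is unknown
        share *= node.share
    return share


# Hypothetical site: two VOs with equal shares; vo_a splits its share 3:1.
site = Allocation(1.0, {
    "vo_a": Allocation(0.5, {
        "production": Allocation(0.75),
        "analysis": Allocation(0.25),
    }),
    "vo_b": Allocation(0.5),
})

if __name__ == "__main__":
    cpus = 200
    print(effective_share(site, ["vo_a", "production"]) * cpus)  # 75.0
    print(effective_share(site, ["vo_b"]) * cpus)                # 100.0
```

Multiplying shares keeps each level's allocation relative to its parent, so a VO can rebalance its groups without touching the site-level agreement.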
6. Environment Overview
7. Environment Details
- Simulations target environments with large numbers of:
  - resources
  - resource owners
  - VOs
- A few examples:
  - Grid3
  - OSG
  - TeraGrid
  - DataGrid
8. Initial Research Problems
- What site usage policies are appropriate in a Grid environment, and how do these policies impact achieved site and VO performance?
- What usage policy may be applied at the VO level?
- What site selection policies are best suited for various Grid environments?
9. GangSim Details
10. GangSim Concepts
- Site: characterized by various metrics for CPU, disk space, and network connectivity
- VO: composed of groups and users
- External Schedulers, Local Schedulers, and Data Schedulers: scheduling decision points at various levels in the Grid
- Policy enforcement points (S-PEP and V-PEP): responsible for gathering usage and allocation information and for controlling how many jobs should run
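To make the S-PEP role concrete, here is a minimal Python sketch of a site enforcement point that answers "how many more jobs may this VO start?". The `SitePEP` class, its `fixed`/`extensible` modes, and the VO names are hypothetical; the two modes are only loosely modeled on the fixed and extensible limits named in these slides, and their exact semantics here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SitePEP:
    """Hypothetical S-PEP: decides how many more jobs a VO may start."""
    capacity: int               # total CPUs at the site
    shares: dict[str, float]    # fraction of capacity allocated to each VO
    mode: str = "fixed"         # "fixed" or "extensible" (assumed semantics)

    def allowed_new_jobs(self, vo: str, running: dict[str, int]) -> int:
        """Return how many more single-CPU jobs `vo` may start now."""
        if self.mode not in ("fixed", "extensible"):
            raise ValueError(f"unknown mode: {self.mode}")
        idle = self.capacity - sum(running.values())
        if idle <= 0:
            return 0
        if self.mode == "extensible":
            # Assumption: a VO may borrow any idle CPU, even beyond its share.
            return idle
        # Fixed: never exceed the VO's own share of the site.
        quota = int(self.shares.get(vo, 0.0) * self.capacity)
        return max(0, min(quota - running.get(vo, 0), idle))


if __name__ == "__main__":
    running = {"vo_a": 50, "vo_b": 20}          # 30 of 100 CPUs idle
    fixed = SitePEP(100, {"vo_a": 0.5, "vo_b": 0.5})
    print(fixed.allowed_new_jobs("vo_a", running))  # 0: vo_a is at its share
    print(fixed.allowed_new_jobs("vo_b", running))  # 30
    ext = SitePEP(100, {"vo_a": 0.5, "vo_b": 0.5}, mode="extensible")
    print(ext.allowed_new_jobs("vo_a", running))    # 30
```

The trade-off the sketch exposes: a fixed cap protects each VO's share but can leave CPUs idle, while borrowing keeps the site busy at the cost of temporarily exceeding agreed shares.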
11. GangSim Strategies
- Various algorithms can be used for scheduling:
  - Site usage policies
    - Simple fair share
    - Extensible fair share
    - Commitment fair share
    - Others
  - ES (External Scheduler) task assignment strategies
    - Least recently used (according to available allocations)
    - Least used (according to available allocations)
    - Round robin / random assignment
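The ES strategies above can be sketched as interchangeable site-selection rules. In this hypothetical Python version, "available allocation" is modeled as a per-site count of free slots for the submitting VO (`free_alloc`); the class, method names, site names, and tie-breaking (first site wins) are assumptions for illustration, not GangSim's implementation.

```python
import itertools
import random


class ExternalScheduler:
    """Hypothetical ES that picks a site for each task using one of several rules."""

    def __init__(self, sites: list[str], seed: int = 0):
        self.sites = list(sites)
        self._rr = itertools.cycle(self.sites)
        self._last_used = {s: -1 for s in self.sites}  # -1 means never assigned
        self._step = 0
        self._rng = random.Random(seed)

    def _record(self, site: str) -> str:
        """Remember when `site` was last chosen and return it."""
        self._last_used[site] = self._step
        self._step += 1
        return site

    def round_robin(self) -> str:
        return self._record(next(self._rr))

    def random_site(self) -> str:
        return self._record(self._rng.choice(self.sites))

    def least_used(self, free_alloc: dict[str, int]) -> str:
        """Site with the most unused allocation for the submitting VO."""
        return self._record(max(self.sites, key=lambda s: free_alloc.get(s, 0)))

    def least_recently_used(self, free_alloc: dict[str, int]) -> str:
        """Among sites with allocation left, the one chosen longest ago."""
        eligible = [s for s in self.sites if free_alloc.get(s, 0) > 0]
        if not eligible:
            raise RuntimeError("no site has allocation left")
        return self._record(min(eligible, key=lambda s: self._last_used[s]))


if __name__ == "__main__":
    es = ExternalScheduler(["site_a", "site_b", "site_c"])
    print([es.round_robin() for _ in range(4)])
    # ['site_a', 'site_b', 'site_c', 'site_a']
    print(es.least_used({"site_a": 1, "site_b": 7, "site_c": 3}))           # site_b
    print(es.least_recently_used({"site_a": 1, "site_b": 1, "site_c": 1}))  # site_c
```

Round robin and random assignment ignore allocations entirely, whereas the two allocation-aware rules need fresh usage data from the monitoring layer on every decision.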
12. Implementation Details
- Various Ganglia (and VO-Centric Ganglia) components were replaced
- New components:
  - Simulator modules: track client and provider states
  - Task assignment policies: various algorithms invoked during a run
  - Metric aggregators: monitoring sub-components used for scheduling decisions
  - Grid components: internal data structures
  - Interfaces: a set of remotely accessible CGI scripts
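Tracking provider state over simulated time is the core of a discrete-event simulator. The Python sketch below shows that generic technique for a single hypothetical site running jobs first-come, first-served; it is not GangSim's code, and the job tuples, the `simulate` function, and the single-site model are assumptions for illustration.

```python
import heapq
from collections import deque

ARRIVAL, COMPLETION = 0, 1


def simulate(jobs: list[tuple[float, float]], cpus: int) -> list[float]:
    """Run (submit_time, run_time) jobs FIFO on `cpus` CPUs at one site.

    Returns each job's response time (finish - submit), in input order.
    """
    # Event queue ordered by time; ties broken by event kind, then job index.
    events = [(submit, ARRIVAL, i) for i, (submit, _) in enumerate(jobs)]
    heapq.heapify(events)
    waiting: deque[int] = deque()
    free = cpus
    finish = [0.0] * len(jobs)
    while events:
        now, kind, i = heapq.heappop(events)
        if kind == ARRIVAL:
            waiting.append(i)
        else:  # COMPLETION frees one CPU
            free += 1
            finish[i] = now
        # Start as many waiting jobs as there are free CPUs.
        while free > 0 and waiting:
            j = waiting.popleft()
            free -= 1
            heapq.heappush(events, (now + jobs[j][1], COMPLETION, j))
    return [finish[i] - jobs[i][0] for i in range(len(jobs))]


if __name__ == "__main__":
    print(simulate([(0, 10), (0, 10), (5, 10)], cpus=2))  # [10, 10, 15]
```

Because the clock jumps from event to event instead of ticking, simulated time is decoupled from wall-clock time; a multi-site, multi-VO simulator would add more event kinds (for example, scheduling and monitoring updates) to the same loop.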
13. Interface Screenshot Example
14. Talk Outline / Part II
15. Achievable Results
- We are interested in three main aspects:
  - Task Assignment and Policies
  - Simulated Architecture Variations
  - Simulator Performance
16. Task Assignment and Policies
[Figure panels: Round Robin Assignment Policy; Least Used Site Assignment Policy; Round Robin Assignment Policy; Least Used Site Assignment Policy]
17. Analytical Results
- Automated performance metric computation
- Example: average response time over N jobs, $\mathrm{ART} = \frac{1}{N}\sum_{i=1}^{N} RT_i$, where $RT_i$ is the response time of job i

Table 2: Unsynchronized Workloads ART
Table 1: Synchronized Workloads ART

Policy / Limit   No limit   Fix-limit   Ext-limit
Round Robin      11.09      19.39       11.32
Least Used       13.25      15.14       15.06

Policy / Limit   No limit   Fix-limit   Ext-limit
Round Robin      7.78       14.82       9.34
Least Used       10.57      13.68       11.37
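"Automated performance metric computation" amounts to applying formulas like ART to the simulator's per-job records. A minimal Python sketch, assuming the records are (policy, response time) pairs; the function names, policy labels, and numbers are hypothetical and are not data from the tables.

```python
from collections import defaultdict


def average_response_time(response_times: list[float]) -> float:
    """ART = (1/N) * sum of RT_i over the N completed jobs."""
    if not response_times:
        raise ValueError("ART is undefined for an empty job set")
    return sum(response_times) / len(response_times)


def art_by_policy(records: list[tuple[str, float]]) -> dict[str, float]:
    """Group (policy, RT_i) records by policy and compute ART for each group."""
    groups: dict[str, list[float]] = defaultdict(list)
    for policy, rt in records:
        groups[policy].append(rt)
    return {policy: average_response_time(rts) for policy, rts in groups.items()}


if __name__ == "__main__":
    # Hypothetical response times, not values from the tables.
    print(average_response_time([4.0, 8.0, 12.0, 16.0]))  # 10.0
    print(art_by_policy([("round_robin", 6.0), ("round_robin", 10.0),
                         ("least_used", 9.0)]))
    # {'round_robin': 8.0, 'least_used': 9.0}
```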
18. Simulated Architectures
- Various architectures can be simulated
- Only a few parameters need to be changed
- New algorithms can be considered
[Figure panels: Analytical Approach in Site Selection; Observational Approach in Site Selection]
19. Simulator Performance
- It is important to find the simulator's limits
- A single GangSim instance can handle 15 VOs and 100 sites
[Figure: 15 VOs and 100 sites (6 VOs drawn)]
20. Validation Results
- Comparison of results: GangSim vs. Grid3
  - Site-level comparisons
  - VO-level comparisons
  - Quantitative comparisons
21. Site-Level Comparisons
- GangSim and Grid3 on a single site (FermiLab)
- Four identical workloads
- The GangSim and FermiLab (Grid3) executions completed in nearly the same time but showed rather different execution behavior
[Figure panels: Per-VO, FermiLab (Grid3); Per-VO, FermiLab (GangSim)]
22. VO-Level Comparisons
- GangSim and Grid3 runs across 12 sites
- Workload start times: iVDGL-1 at 20 s; BTEV-1 and USATLAS-1 at 200 s; LIGO-1 at 700 s; BTEV-2 at 800 s; iVDGL-2 at 1000 s; USATLAS-2 at 1500 s; LIGO-2 at 1700 s
[Figure panels: Per-VO, 12 sites (Grid3); Per-VO, 12 sites (GangSim)]
23. Quantitative Comparisons
- Aggregated resource utilization (ARU)
- Average response time (ART): $\mathrm{ART} = \frac{1}{N}\sum_{i=1}^{N} RT_i$
- Average starvation factor (ASF): $\mathrm{ASF} = \frac{\sum_{i=1}^{N} \min(ST_i, RT_i)}{\sum_{i=1}^{N} ET_i}$

Table 3: Simulation (S) vs. Grid3 (G) Metrics
Metric   Site (S)   Site (G)   VO (S)    VO (G)
ARU      0.12       0.16       0.07      0.06
ASF      2.36       3.9        9.88      5.09
ART      1521.31    1100.7     1824.25   639.5
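The ASF formula can be evaluated directly from per-job records. The slides do not spell out what $ST_i$ and $ET_i$ measure, so this hypothetical Python sketch simply treats them as per-job numbers alongside the response time $RT_i$; the `JobRecord` class and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    st: float  # ST_i from the ASF formula (meaning not defined on the slide)
    rt: float  # RT_i, the job's response time
    et: float  # ET_i from the ASF formula (meaning not defined on the slide)


def average_starvation_factor(jobs: list[JobRecord]) -> float:
    """ASF = sum_i min(ST_i, RT_i) / sum_i ET_i."""
    total_et = sum(j.et for j in jobs)
    if total_et == 0:
        raise ValueError("sum of ET_i must be positive")
    return sum(min(j.st, j.rt) for j in jobs) / total_et


if __name__ == "__main__":
    jobs = [JobRecord(st=30.0, rt=50.0, et=20.0),
            JobRecord(st=80.0, rt=60.0, et=40.0)]
    print(average_starvation_factor(jobs))  # (30 + 60) / (20 + 40) = 1.5
```

Capping each $ST_i$ at $RT_i$ keeps one pathological job from contributing more than its own response time to the numerator.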
24. Conclusions about GangSim
- A Grid simulator for analyzing different scheduling policies in a multi-site, multi-VO environment
- Designed around discrete simulation techniques and models of important system components
- Demonstrated through studies of different VO-level scheduling policies in the presence of different local site resource allocation policies
25. Addressed Questions
- What site usage policies are appropriate in a Grid environment, and how do these policies impact achieved site and VO performance?
- What usage policy may be applied at the VO level?
- What site selection policies are best suited for various Grid environments?
26. Thanks