Title: Performance evaluation on grid
1Performance evaluation on grid
- Zsolt Németh
- MTA SZTAKI Computer and Automation Research Institute
2Outline
- What is the grid?
- What is grid performance?
- Problems of performance evaluation
- WP3 ongoing work and further plans
- A proposed 'passive' benchmarking
- Proposed grid metrics
- Future directions
3Distributed applications
Application: cooperative processes
- Process control?
- Security?
- Naming?
- Communication?
- Input / output?
- File access?
Physical layer: computational nodes
4Distributed applications
Application: cooperative processes
Physical layer: computational nodes
5Conventional distributed environments and grids
- Distributed resources are virtually unified by a software layer
- A virtual machine is introduced between the application and the physical layer
- Provides a single system image to the application
- Types
- Conventional (PVM, some implementations of MPI)
- Grid (Globus, Legion)
6Conventional environments
- Processes
- Have resource requests
- Mapping
- Processes are mapped onto nodes
- Resource assignment is implicit
Physical level
7Grid
- Processes
- Have resource requirements
- Mapping
- Assign nodes to resources?
Physical layer
8Grid: the resource abstraction
- Processes
- Have resource needs
Physical layer
9Grid: the user abstraction
- Processes
- Belong to a user
- The user of the virtual machine is authorised to use the constituting resources
- Has no login access to the node the resource belongs to
- Physical layer
- Local, physical users (user accounts)
10Fundamental grid functionalities
- By formal modeling the essential functionalities can be identified
- Resource abstraction
- Physical resources can be assigned to virtual resource needs (matched by properties)
- Grid provides a mapping between virtual and physical resources
- User abstraction
- User of the physical machine may be different from the user of the virtual machine
- Grid provides a temporal mapping between virtual and physical users
11Conventional distributed environments and grids
12What is grid performance at all?
- Traditionally performance is
- Speed
- Throughput
- Bandwidth, etc.
- Using grids
- Quantitative reasons
- Qualitative reasons: QoS
- Economic aspects
13Grid performance analysis
- Performance is not characteristic of an application itself but of the interaction between the application and the infrastructure.
- The more complex and dynamic nature of a grid introduces more possible performance flaws.
- Usual metrics and characteristic parameters are not necessarily applicable to grids.
- The larger event data volume needs careful reduction, feature extraction and intelligent presentation.
- Due to the permanently changing environment, on-line and semi-on-line techniques are advantageous over post-mortem methods.
- Performance tuning is more difficult due to the dynamic environment and changing infrastructure.
- Observation, comparison and analysis are more complex due to the diversity and heterogeneity of resources.
14Interaction of application and the infrastructure
- Performance = application perf. × infrastructure perf.
- Signature model (Pablo group)
- Application signature
- e.g. instructions/FLOPs
- Scaling factor (capabilities of the resources)
- e.g. FLOPs/seconds
- Execution signature
- Execution signature = application signature × scaling factor
- E.g. instructions/second = instructions/FLOP × FLOPs/second
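The signature model on this slide composes a property of the application with a capability of the resource. A minimal sketch of that composition follows, with the units checked so that instructions/FLOP times FLOP/second gives instructions/second. The `Rate` class and all numbers are hypothetical and only illustrate the arithmetic; they are not taken from the Pablo group's work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """A quantity whose unit is written as numerator/denominator (hypothetical helper)."""
    value: float
    numerator: str
    denominator: str

    def __mul__(self, other: "Rate") -> "Rate":
        # Units must chain: (a/b) * (b/c) = a/c.
        if self.denominator != other.numerator:
            raise ValueError(
                f"cannot combine {self.numerator}/{self.denominator} "
                f"with {other.numerator}/{other.denominator}"
            )
        return Rate(self.value * other.value, self.numerator, other.denominator)


# Illustrative numbers only (assumptions, not measurements).
application_signature = Rate(2.5, "instructions", "FLOP")  # property of the code
scaling_factor = Rate(4.0e8, "FLOP", "second")             # capability of the resource

# Execution signature = application signature x scaling factor.
execution_signature = application_signature * scaling_factor
print(execution_signature)
```

The same application signature can be combined with the scaling factor of a different resource to obtain a different execution signature, which is the point of separating the two.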
15Possible performance problems in grids
- All that may occur in a distributed application
- Plus
- Effectiveness of resource brokering
- Synchronous availability of resources
- Resources may change during execution
- Various local policies
- Shared use of resources
- Higher costs of some activities
- The corresponding symptoms must be characterised
16Grid performance metrics
- Abstract representation of measurable quantities
- M = R1 × R2 × ... × Rn
- Usual metrics
- Speedup, efficiency
- Queue length
- Such strict values are not characteristic in a grid
- Cannot be interpreted
- Cannot be compared
- New metrics
- Local metrics and grid metrics
- Symbolic description / metrics
17Processing monitoring information
- Trace data reduction
- Proportional to time t, processes P, metrics dimension n
- Statistical clustering (reducing P)
- Similar temporal behaviours are classified
- Questionable whether it works for grids
- Representative processes are recorded for each class
- Statistical projection pursuit (reducing n)
- Reduces the dimension by identifying significant metrics
- Sampling frequency (reducing t)
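Two of the reductions listed on this slide, reducing P and reducing t, can be sketched as follows. The clustering here is a simple greedy threshold scheme standing in for statistical clustering, not the method used by any particular tool; the traces, the threshold and the function names are assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equally long metric traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_traces(traces, threshold):
    """Greedy clustering (reducing P): a process joins the first class whose
    representative is within `threshold`, otherwise it starts a new class.
    Returns a list of (representative_id, member_ids)."""
    classes = []
    for pid, trace in traces.items():
        for rep_id, members in classes:
            if distance(traces[rep_id], trace) <= threshold:
                members.append(pid)
                break
        else:
            classes.append((pid, [pid]))
    return classes


def downsample(trace, step):
    """Keep every `step`-th sample (reducing t)."""
    return trace[::step]


# Hypothetical CPU-utilisation traces for four processes.
traces = {
    "p0": [0.9, 0.8, 0.9, 0.7, 0.9, 0.8],
    "p1": [0.9, 0.9, 0.8, 0.7, 0.9, 0.9],
    "p2": [0.1, 0.2, 0.1, 0.3, 0.1, 0.2],
    "p3": [0.2, 0.2, 0.1, 0.2, 0.1, 0.1],
}

classes = cluster_traces(traces, threshold=0.3)
# Only one representative trace per class is recorded, at a lower sampling rate.
reduced = {rep: downsample(traces[rep], step=2) for rep, _ in classes}
print(classes)
print(reduced)
```

In a grid the membership of such classes may shift as resources change, which is one reason the slide questions whether clustering carries over.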
18Performance tuning, optimisation
- The execution cannot be reproduced
- Post-mortem optimisation is not viable
- On-line steering is necessary, though hard to realise
- Sensors and actuators
- Application and implementation dependent
- E.g. Autopilot, Falcon
- Average behaviour of applications can be improved
- Post-mortem tuning of the infrastructure (if possible)
- Brokering decisions
- Supporting services
22Running benchmarks
- Benchmarks are executed on a virtual machine
- The virtual machine may change (composed of different resources) from run to run
- Benchmark result is representative of one particular virtual machine
- What can it show about the entire grid?
23Benchmarking inside out
- Conventional benchmarking has a top-down view
- Assumes an unchanging infrastructure
- Cannot look behind the virtual level
- Not necessarily applicable to grids
- To look behind the virtual level a bottom-up view
is necessary
24Benchmarking inside out
- There is a well-defined set of benchmarks (e.g. NPB, Parkbench, etc.)
- System administrators (resource owners) run them from time to time
- Results are stored in a local database together with actual system parameters (CPU load, active users, etc.)
- When a new virtual machine is formed, based on the current system parameters, a benchmark result can be estimated
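The estimation step on this slide can be sketched as a lookup over the stored measurements. The example below uses inverse-distance weighting over the k most similar past records, which is only one possible estimator and is an assumption here; the slides do not specify a method. The record layout, the parameter choice (CPU load, active users) and all values are hypothetical, and a real implementation would normalise the parameters before computing distances.

```python
import math


def estimate_benchmark(history, current, k=2):
    """Estimate a benchmark score for `current` system parameters from the
    k most similar past measurements, weighted by inverse distance.

    history: list of (params, score), where params is a tuple of numbers.
    """
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    nearest = sorted(history, key=lambda rec: dist(rec[0], current))[:k]
    # An exact match in the history is returned directly.
    for params, score in nearest:
        if dist(params, current) == 0.0:
            return score
    weights = [1.0 / dist(params, current) for params, _ in nearest]
    return sum(w * s for w, (_, s) in zip(weights, nearest)) / sum(weights)


# Hypothetical local database: (CPU load, active users) -> benchmark score.
history = [
    ((0.1, 1), 950.0),
    ((0.5, 4), 610.0),
    ((0.9, 8), 240.0),
]

exact = estimate_benchmark(history, current=(0.5, 4))     # state seen before
estimate = estimate_benchmark(history, current=(0.3, 2))  # state between records
print(exact, estimate)
```

No benchmark is executed when the virtual machine is formed; only the stored records are consulted.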
25Benchmarking inside out
- A more or less precise performance figure can be obtained prior to executing an application
- Does not consume resources
- Performance is related to the virtual machine actually formed for executing the application
26Ongoing work
- Exploring the statistical properties of benchmarks and system parameters
- Exploring how benchmark results can be estimated from past measurements
- Finding the right set of benchmarks
27Proposed grid metrics
- A well-defined set of benchmarks can serve as metrics
- Multi-dimensional
- Comparable
- Interpretable
- Local resource metrics are transformed into global grid metrics
28Proposed grid metrics
- Applications show statistical similarities to benchmarks
- Based on these similarities, an application's signature can be created
- Application signature and resource signature can yield performance metrics
- Symbolic processing is advantageous
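One way to read the proposal on this slide is sketched below: an application signature is built from its similarity to each benchmark, and combined with the benchmark results of a virtual machine (its resource signature) to give a performance figure. Cosine similarity and the weighted sum are assumptions chosen for illustration; the slides do not say how similarity or the combination is computed. All profiles, names and scores are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def application_signature(app_profile, benchmark_profiles):
    """Normalised similarity of the application to each benchmark."""
    sims = {name: cosine(app_profile, prof)
            for name, prof in benchmark_profiles.items()}
    total = sum(sims.values())
    return {name: s / total for name, s in sims.items()}


def predicted_metric(app_sig, resource_sig):
    """Similarity-weighted combination of the resource's benchmark results."""
    return sum(app_sig[name] * resource_sig[name] for name in app_sig)


# Hypothetical profiles: (compute share, communication share, I/O share).
benchmark_profiles = {
    "compute_bound": (0.9, 0.05, 0.05),
    "comm_bound": (0.2, 0.7, 0.1),
}
app_profile = (0.8, 0.15, 0.05)

# Hypothetical resource signature: benchmark scores on one virtual machine.
resource_sig = {"compute_bound": 800.0, "comm_bound": 300.0}

app_sig = application_signature(app_profile, benchmark_profiles)
metric = predicted_metric(app_sig, resource_sig)
print(app_sig, metric)
```

Because the application signature does not depend on the resources, it can be computed once and reused for every virtual machine that is formed.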