On the Foundations of Artificial Workload - PowerPoint PPT Presentation

Slides: 31

Provided by: hari52

Transcript and Presenter's Notes

Title: On the Foundations of Artificial Workload

1
On the Foundations of Artificial Workload

Domenico Ferrari

Presentation by Hari Rangarajan
2
Outline of presentation

Introduction of workloads
Workload design methods
Problems of design
Artificial Interactive Workload Design
Conclusions

3
What is a Workload

Suppose we have a system S; we need numbers that
quantify how well it performs.

Performance indices = F(System, Workload)
[Figure: Workload (jobs, I/O requests) -> System S
(CPU, disk controller) -> Output]

Performance indices are usually response time,
throughput and resource utilisation, expressed as
a vector P.

4
Why Model a Workload

System S can be abstracted as an analytic or
simulation model.
Analysing system performance requires a workload,
and hence a model of the workload.
The system can also be real, in which case the
workload model is run on it and results are
measured; this is called the measurement approach.

5
Measurement Approach

Applicable only to real, measurable workloads
Natural workload
Sample of a real workload, chosen according to a
criterion, e.g. time of day
Artificial workload
Synthetic benchmarks and scripts
Must be executable on the systems

6
Artificial Workloads

Pros
More reproducible
Can model future workloads
Easier to port
Cons
More expensive to build
Potentially less accurate
Needs time on the target system

7
Generic Design method

How to construct an executable model of a real,
measurable workload
Identify the basic components of the workload,
e.g. job, interaction, command
Parametrize components by
Physical resources: CPUs, main memory, etc.
Logical resources: language processors, editors,
etc.
Functional resources: compiling, editing, etc.
Component = F(physical resources, logical
resources, functional resources)

8
Generic Design Method (contd..)

Measurement: taken from the real workload while it
executes on the system
Statistical analysis
Analyse parameter distributions, transforming
measurements within limits
Sampling: reduces processing time and storage
Static analysis: classification and partitioning
of workload components
Clustering, principal component analysis: reduce
the given data set to classes of homogeneous
components

9
Generic Design Methods (contd ..)

Statistical analysis
Dynamic analysis
Properties of the time series must be considered;
this is essential when the time-varying
characteristics of the workload are important.
Numerical fitting, statistical analysis of
non-stationary series of events and stochastic
process modelling are used.

10
Design Method Summary

The values of the parameters are measured for each
component, giving one tuple per component.
Statistical techniques (clustering and sampling)
are applied to the tuples, reducing the number of
tuples that represent the components.
Each remaining tuple is then replaced by a workload
component characterised by that tuple; together
these components constitute the workload model.
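
The reduction step summarised above can be sketched as follows. This is a minimal illustration, not the method used in the talk: it assumes each component is a tuple of (CPU seconds, I/O operations), swaps in plain k-means as the clustering technique, and uses made-up measurements. The function names `kmeans` and `representatives` are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means on parameter tuples; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every tuple to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def representatives(points, centroids, labels):
    """For each cluster, pick the measured tuple closest to the centroid."""
    reps = []
    for k, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            reps.append(min(members, key=lambda p: math.dist(p, c)))
    return reps

# Hypothetical measurements: (CPU seconds, I/O operations) per command.
tuples = [(0.1, 5), (0.2, 6), (0.15, 4), (2.0, 50), (2.2, 55), (1.9, 48)]
centroids, labels = kmeans(tuples, centroids=[tuples[0], tuples[3]])
model = representatives(tuples, centroids, labels)
print(labels, model)
```

In practice the parameters would be scaled before clustering so that no single resource dominates the distance; that step is omitted here for brevity.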

11
Representativeness of Model

How accurate is the workload model?
W is the real workload and P the performance
indices obtained when W runs on S
W' is the artificial workload and P' the
performance indices obtained when W' runs on S
The model is accurate if P' ≈ P

[Figure: W -> S -> P; W' -> S -> P']
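
A performance-oriented accuracy check of this kind could be sketched as below. The 5% tolerance, the index names and the numbers are all assumptions made for illustration; the slides do not define a specific metric or threshold.

```python
def relative_errors(p_real, p_model):
    """Relative error of each performance index of the model vs. the real workload."""
    return {
        name: abs(p_model[name] - p_real[name]) / abs(p_real[name])
        for name in p_real
    }

def is_representative(p_real, p_model, tolerance=0.05):
    """True if every index of the model is within `tolerance` of the real value."""
    return all(err <= tolerance for err in relative_errors(p_real, p_model).values())

# Hypothetical measurements of the same system S under W and under the model.
p_real = {"response_time": 2.0, "throughput": 10.0, "cpu_utilisation": 0.80}
p_model = {"response_time": 2.05, "throughput": 9.8, "cpu_utilisation": 0.79}

print(relative_errors(p_real, p_model))
print(is_representative(p_real, p_model))
```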
12
Problems in design methods

Parameters represent resource demands at various
levels of abstraction.
Which resources should be included in the
characterisation?
The necessity and sufficiency of the parameters
are not known.

13
Problems (contd)

The accuracy of a workload model is not defined.
No metric is available to qualify or quantify it.
Statistical techniques are applied to a population
of tuples that contain no temporal information,
so dynamic behaviour is ignored.

14
Design of Artificial Interactive workload

Interactive system with m users

Product-form solution
Performance indices can be obtained by solving
the network
Ignores the dynamics of the workload

[Figure: m user terminals (1, 2, ..., m) connected
to a central subsystem of N-1 stations; users type
a sequence of commands]
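
To make "solving the network" concrete, here is a minimal sketch using exact Mean Value Analysis (MVA), one standard way to solve a closed product-form network. It is a swapped-in technique for illustration, not necessarily the solution method used in the talk, and it assumes a single customer class, fixed-rate queueing stations in the central subsystem and a pure delay (think time) at the terminals. All numbers are made up.

```python
def mva(demands, think_time, users):
    """Exact MVA for a closed, single-class product-form network.

    demands    -- service demand (visit ratio x mean service time) per
                  queueing station of the central subsystem
    think_time -- mean time a user spends at the terminal between commands
    users      -- number of interactive users (m)
    Returns (throughput, response_time, mean_queue_lengths).
    """
    queue = [0.0] * len(demands)
    throughput = 0.0
    response = 0.0
    for n in range(1, users + 1):
        # An arriving customer sees the queue lengths of the network with n-1 users.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        queue = [throughput * r for r in residence]
    return throughput, response, queue

# Hypothetical system: CPU and disk demands in seconds, 5 s think time, 20 users.
x, r, q = mva(demands=[0.05, 0.08], think_time=5.0, users=20)
print(f"throughput={x:.3f}/s response={r:.3f}s queues={q}")
```

The result satisfies Little's law: the mean number of users thinking plus the mean queue lengths adds up to the number of users.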
15
Modelling the system

Identify the basic component of the system: job,
command or interaction
Measurement

16
User Behaviour Graph

Each state represents a command
Users type a sequence of commands, moving between
states with defined probabilities

[Figure: user behaviour graph; labelled states
include Dormant (user is not in the system), Login,
Logout of system and Quit]
17
Reflecting the graph in the model

There are R different command types in the graph
Model R different classes of customers with
defined probabilities of changing class
(executing jobs)
A user can change class only when moving from a
station in the central subsystem to Station 1.
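
A user behaviour graph with class-change probabilities can be treated as a Markov chain over command types, and the relative visit ratio of each command type is then the chain's stationary distribution. The sketch below assumes a three-command graph (the command names and probabilities are hypothetical) and uses simple power iteration, which requires the chain to be irreducible and aperiodic.

```python
def visit_ratios(graph, iterations=10_000, tol=1e-12):
    """Relative visit ratios of command types in a user behaviour graph.

    graph maps each command type to {next_command: branching_probability}.
    Uses power iteration, so the chain must be irreducible and aperiodic.
    """
    states = sorted(graph)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        nxt = {s: 0.0 for s in states}
        for src, edges in graph.items():
            for dst, prob in edges.items():
                nxt[dst] += pi[src] * prob
        if max(abs(nxt[s] - pi[s]) for s in states) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical user behaviour graph with R = 3 command types.
ubg = {
    "edit":    {"edit": 0.5, "compile": 0.5},
    "compile": {"edit": 0.4, "run": 0.6},
    "run":     {"edit": 1.0},
}
ratios = visit_ratios(ubg)
print(ratios)
```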

18
Illustration
[Figure: m users (1, 2, ..., m) and a central
subsystem of N-1 stations; users change between
Class/Command a and Class/Command b with a
branching probability]
19
Building on the model

Next step: generic design methods reduce the
number of command types in the workload
proportionally, ignoring sequential links
How valid is this assumption?
Theorem 1
The equilibrium state probabilities of the
queueing network are invariant to any change in
the user behaviour graph that does not modify
the visit ratios of the command types.
This proves the assumption is valid
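
The condition in Theorem 1 can be illustrated by building a second graph that discards all sequential links yet keeps the same visit ratios. The sketch below is an assumption-laden illustration, not a proof: the example graph is hypothetical, and the code only checks that the visit ratios match, which is the premise under which the theorem says the equilibrium state probabilities are unchanged.

```python
def visit_ratios(graph, iterations=100_000, tol=1e-13):
    """Stationary visit ratios via lazy power iteration (also works for periodic graphs)."""
    states = sorted(graph)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        step = {s: 0.0 for s in states}
        for src, edges in graph.items():
            for dst, prob in edges.items():
                step[dst] += pi[src] * prob
        # Averaging with the previous vector keeps the same fixed point.
        nxt = {s: 0.5 * pi[s] + 0.5 * step[s] for s in states}
        if max(abs(nxt[s] - pi[s]) for s in states) < tol:
            return nxt
        pi = nxt
    return pi

def memoryless_equivalent(ratios):
    """Graph with no sequential links: the next command is drawn from the
    visit ratios regardless of the current command."""
    return {src: dict(ratios) for src in ratios}

# Hypothetical user behaviour graph with sequential structure.
original = {
    "edit":    {"edit": 0.5, "compile": 0.5},
    "compile": {"edit": 0.4, "run": 0.6},
    "run":     {"edit": 1.0},
}
v_original = visit_ratios(original)
v_equivalent = visit_ratios(memoryless_equivalent(v_original))
print(v_original)
print(v_equivalent)
```

Both graphs yield the same visit ratios, so under Theorem 1 either could drive the workload model without changing the equilibrium state probabilities.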

20
Implication of Theorem 1

Replace the user behaviour graph by an equivalent one

21
Comments on Theorem 1

Workload models built by this method cannot be
implemented in an arbitrary way
The model will be accurate performance-wise if it
simulates the same number of users as in the
workload, or an equivalent graph
Problem: does this probabilistic graph
satisfactorily map the behaviour of all real
users in a system?
The model loses accuracy if we change the number
of users or the behaviour at each node

22
Static analysis on Workload Model

Apply a clustering technique
Each command type in the UBG is characterised by
its distribution of service times, branching
probabilities and distribution of terminal times.
Map the command types into a state space and group
neighbouring points into clusters
Clustering reduces the state space

23
Clustering on the UBG
[Figure: nine classes of commands clustered into
four superclasses]
24
Does clustering affect the validity of the model?

Define global performance indices: mean
throughput rate, mean response time,
utilisations, mean queue lengths and waiting
times
Theorem 2: validating clustering
The values of all global performance indices of
the queueing network are invariant with respect to
aggregations of classes with identical demands if
each superclass has a visit ratio in the UBG
equal to the sum of the visit ratios of its
members and each non-aggregated class retains
its previous visit ratio
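
The bookkeeping required by Theorem 2 can be sketched as follows. The class names, visit ratios and per-station demands are hypothetical, and the code checks only one simple consequence of the aggregation rule (the total load offered to each station is unchanged); it does not verify the theorem itself.

```python
def aggregate(visit_ratio, demand, superclasses):
    """Merge classes with identical per-station demands into superclasses.

    Each superclass gets the sum of its members' visit ratios; classes that
    are not aggregated keep their previous visit ratio.
    """
    new_ratio, new_demand, merged = {}, {}, set()
    for name, members in superclasses.items():
        first = demand[members[0]]
        if any(demand[m] != first for m in members):
            raise ValueError(f"{name}: members do not have identical demands")
        new_ratio[name] = sum(visit_ratio[m] for m in members)
        new_demand[name] = first
        merged.update(members)
    for cls in visit_ratio:
        if cls not in merged:
            new_ratio[cls] = visit_ratio[cls]
            new_demand[cls] = demand[cls]
    return new_ratio, new_demand

def station_load(visit_ratio, demand):
    """Visit-ratio-weighted demand offered to each station."""
    stations = len(next(iter(demand.values())))
    return [
        sum(visit_ratio[c] * demand[c][k] for c in visit_ratio)
        for k in range(stations)
    ]

# Hypothetical command classes with (CPU, disk) demands in seconds.
ratio = {"list": 0.3, "show": 0.2, "compile": 0.5}
demand = {"list": (0.1, 0.2), "show": (0.1, 0.2), "compile": (1.0, 0.5)}

agg_ratio, agg_demand = aggregate(ratio, demand, {"light": ["list", "show"]})
print(agg_ratio)
print(station_load(ratio, demand), station_load(agg_ratio, agg_demand))
```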

25
Theorem 2 - Observations

Clustering can produce very accurate results,
provided the number of users and their behaviour
remain unchanged from the original unclassed graph
One representative per cluster is enough
Clustering is better than other reduction methods
(based on a study)

26
Insights on modelling user workloads

Suppose the workload is described by a collection
of disjoint user behaviour graphs
Performance-oriented model accuracy can be
obtained only if each graph is dealt with
separately

[Figure: separate user behaviour graphs for
customer types A, B, C and D]
27
Summary of design
Artificial Workload Model

This artificial workload model is able to emulate
the characteristics of the original workload
in accordance with the set of performance criteria
we are interested in

28
Conclusions

Problem 1: choosing the parameters that have a
significant effect on performance
Solution
Characterise the workload with the resources that
are explicitly specified in the queueing model
The more resources the queueing model can take
into account for performance, the more resources
the workload model can characterise
Example: in our case, the number of users and the
command types

29
Conclusion (contd..)

Problem 2: accuracy of the workload model
Consider the global performance indices we are
interested in
The theorems prove the accuracy of the final
workload model, as long as it is representative
of the original user behaviour graph
Static analysis techniques such as sampling and
clustering are valid (in our case)

30
Conclusions

Problem 3: ignoring the dynamics of the workload
when doing static analysis
Dynamics need not always be considered
System performance indices do not depend on the
order of execution of commands (in our case)
Dynamics should be considered when
The order of execution is important, e.g. scheduling
The solution cannot be applied there
It violates the steady-state assumption
It may not satisfy a product-form queueing model