Title: On the Foundations of Artificial Workload
1 On the Foundations of Artificial Workload
Presentation by Hari Rangarajan
2 Outline of presentation
- Introduction of workloads
- Workload design methods
- Problems of design
- Artificial Interactive Workload Design
- Conclusions
3 What is a Workload
- Suppose we have a system S; we need a number to quantify how well it performs
- Performance indices = F(System, Workload)
[Figure: Workload (jobs, I/O requests) -> System S (CPU, disk controller) -> Output]
- Performance indices are usually response time, throughput, and resource utilisation, expressed as a vector P
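As a minimal sketch of how a vector of performance indices might be obtained from a measured workload, the Python below computes mean response time, throughput, and utilisation for a single server. The trace format and every name in it (Job, performance_indices, trace) are assumptions made for illustration; they do not come from the presentation.

```python
from dataclasses import dataclass

@dataclass
class Job:
    arrival: float  # time the job entered the system (s)
    start: float    # time service began (s)
    end: float      # time the job completed (s)

def performance_indices(jobs):
    """Return (mean response time, throughput, utilisation) for one server."""
    first_arrival = min(j.arrival for j in jobs)
    last_completion = max(j.end for j in jobs)
    window = last_completion - first_arrival
    mean_response = sum(j.end - j.arrival for j in jobs) / len(jobs)
    throughput = len(jobs) / window
    busy = sum(j.end - j.start for j in jobs)  # total time the server was serving
    utilisation = busy / window
    return mean_response, throughput, utilisation

# Hypothetical three-job trace.
trace = [Job(0.0, 0.0, 2.0), Job(1.0, 2.0, 3.0), Job(4.0, 4.0, 5.0)]
P = performance_indices(trace)
```

Here P plays the role of the vector of indices for one (system, workload) pair; a different workload run on the same system would give a different P.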
4 Why Model a Workload
- System S can be abstracted as an analytic or simulation model
- System performance needs to be analysed, which needs a workload, and hence a model of the workload
- The system can also be real, in which case the workload model can be run on it and results obtained; this is called the measurement approach
5 Measurement Approach
- Applicable only to real, measurable workloads
- Natural Workload
- Sample of a real workload, chosen according to a criterion, e.g. time of day
- Artificial Workload
- Synthetic benchmarks and scripts
- Must be executable on the systems
6 Artificial Workloads
- Pros
- More reproducible
- Can model future workloads
- Easier portability
- Cons
- More expensive to build
- Potentially less accurate
- Needs time on the target system
7 Generic Design Method
- How to construct an executable model of a real, measurable workload
- Identify basic components of the workload, e.g. job, interaction, command
- Parametrize components by
- Physical resources: CPUs, main memory, etc.
- Logical resources: language processors, editors, etc.
- Functional resources: compiling, editing, etc.
- Component = F(Phy Res, Log Res, Func Res)
8 Generic Design Method (contd.)
- Measurement from the real workload while executing on the system
- Statistical Analysis
- Analyse parameter distributions, transforming measurements within limits
- Sample: to reduce processing time and storage
- Static Analysis: classification and partitioning of workload components
- Clustering, principal component analysis: reduce the given data set into classes of homogeneous components
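The clustering step can be sketched with a plain k-means (Lloyd's algorithm). This is one common clustering technique, chosen here as an assumption; the presentation does not say which algorithm is used. The measurements, the choice of k, and the seeding from the first k points are all hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each component to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical (CPU seconds, I/O requests) measurements, one tuple per component.
components = [(0.1, 5), (9.0, 200), (0.2, 7), (8.5, 180), (0.15, 6)]
labels, centroids = kmeans(components, k=2)
```

Each centroid then stands for one class of homogeneous components. In practice the parameters would usually be scaled first so that no single resource dominates the distance.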
9 Generic Design Method (contd.)
- Statistical Analysis
- Dynamic analysis
- Properties of the time series must be considered; this is essential when the time-varying characteristics of the workload are important
- Numerical fitting, statistical analysis of non-stationary series of events, and stochastic process modelling are used
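One simple check for time-varying structure is the sample autocorrelation of a measured series; a large value at a non-zero lag suggests that order matters and that a purely static description loses information. The sketch below is an illustration only: the arrival counts are invented, and autocorrelation is just one of many possible dynamic-analysis tools.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum((series[t] - mean) * (series[t + lag] - mean)
                     for t in range(n - lag))
    return covariance / variance

# Hypothetical arrivals per interval, repeating with a period of 4 intervals.
arrivals = [10, 2, 1, 3] * 6
r4 = autocorrelation(arrivals, 4)  # lag equal to the period
r2 = autocorrelation(arrivals, 2)  # lag of half a period
```

For this series the correlation is strongly positive at the period and negative at half the period, which a set of unordered tuples would not reveal.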
10 Design Method Summary
- The values of the parameters are measured for each component, making a tuple
- Statistical techniques (clustering and sampling) are applied to the tuples, reducing the number of tuples representing the components
- Each tuple is then replaced by a workload component that is characterised by the tuple; together these components constitute the workload model
11 Representativeness of Model
- How accurate is the workload model?
- W is the real workload, P is its performance index
- W' is the artificial workload, P' is its performance index
- Model is ACCURATE if P = P'
[Figure: W -> S -> P; W' -> S -> P']
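A hypothetical way to make this comparison operational is to compare the index vector of the real workload (P) with that of the artificial workload (P_prime in the code) component by component. The relative-error criterion and the 5% tolerance below are assumptions for illustration; the presentation does not define such a metric.

```python
def max_relative_error(p_real, p_model):
    """Largest relative deviation between corresponding performance indices."""
    return max(abs(m - r) / abs(r) for r, m in zip(p_real, p_model))

def is_representative(p_real, p_model, tolerance=0.05):
    """Treat the model as accurate when every index is within `tolerance`
    (an assumed threshold of 5%)."""
    return max_relative_error(p_real, p_model) <= tolerance

# Hypothetical indices: (mean response time in s, throughput in jobs/s, CPU utilisation).
P = (2.0, 10.0, 0.80)
P_prime = (2.05, 9.9, 0.79)
accurate = is_representative(P, P_prime)
```

With exact equality required, the tolerance would be set to zero; a non-zero tolerance acknowledges measurement noise.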
12 Problems in design methods
- Parameters represent resource demands at various levels of abstraction
- Which resources should be included in the characterisation?
- The necessity and sufficiency of the parameters are not known
13 Problems (contd.)
- Accuracy of a workload model is not defined
- No metric is available to qualify or quantify it
- Statistical techniques are applied to a population of tuples that contain no temporal information, so dynamic behaviour is ignored
14 Design of Artificial Interactive Workload
- Interactive system with m users
- Product-form solution
- Performance indices can be obtained by solving the network
- Ignore the dynamics of the workload
[Figure: users 1, 2, ..., m connected to a central subsystem of (N-1) stations; users type a sequence of commands]
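To show what "solving the network" can look like, here is a minimal sketch of exact Mean Value Analysis, a standard algorithm for closed product-form networks. It is simplified to a single customer class, whereas the model in this presentation uses several classes; the service demands, think time, and number of users are hypothetical.

```python
def mva(demands, think_time, users):
    """Exact Mean Value Analysis for a single-class closed network.

    demands: service demand (s) at each queueing station of the central subsystem.
    think_time: mean user think time at the terminals (s).
    users: number of interactive users m.
    Returns (throughput, response time, mean queue lengths).
    """
    queue = [0.0] * len(demands)
    throughput = response = 0.0
    for n in range(1, users + 1):
        # Arrival theorem: an arriving job sees the network with n-1 users.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        queue = [throughput * r for r in residence]
    return throughput, response, queue

# Hypothetical demands: CPU 0.05 s, disk 0.08 s; 20 users thinking 5 s on average.
X, R, Q = mva([0.05, 0.08], think_time=5.0, users=20)
```

X, R, and Q correspond to the throughput, response time, and queue-length indices; utilisation at a station follows as throughput times its demand.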
15 Modelling the system
- Identify the basic component of the system: job, command, or interaction
- Measurement
16 User Behaviour Graph
- Each state represents a command
- Users type a sequence of commands with defined probabilities
[Figure: user behaviour graph; labels shown include Dormant (user is not in the system), Login, Logout (of system), Quit]
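A user behaviour graph of this kind can be sketched as a table of transition probabilities and walked to generate a command sequence. The command names and probabilities below are invented for illustration and are not taken from the presentation.

```python
import random

# Hypothetical user behaviour graph: state -> list of (next state, probability).
UBG = {
    "login":   [("edit", 0.7), ("compile", 0.3)],
    "edit":    [("edit", 0.5), ("compile", 0.4), ("logout", 0.1)],
    "compile": [("edit", 0.6), ("run", 0.3), ("logout", 0.1)],
    "run":     [("edit", 0.8), ("logout", 0.2)],
}

def generate_session(graph, rng, start="login", end="logout"):
    """Walk the graph from `start` until `end`, returning the command sequence."""
    state, session = start, [start]
    while state != end:
        successors, weights = zip(*graph[state])
        state = rng.choices(successors, weights=weights)[0]
        session.append(state)
    return session

# A seeded generator makes the artificial session reproducible.
session = generate_session(UBG, random.Random(42))
```

Running one such walk per simulated user yields an executable script of commands for that user.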
17 Reflecting the graph in the model
- There are R different command types in the graph
- Model R different classes of customers with defined probabilities of changing class (executing jobs)
- A user can change class only when moving from a station in the central subsystem to Station 1
18 Illustration
[Figure: users 1, 2, ..., m and a central subsystem of (N-1) stations; users change between Class/Command a and Class/Command b with a branching probability]
19 Building on the model
- Next step: generic design methods reduce the number of command types of the workload proportionally, ignoring sequential links
- How valid is this assumption?
- Theorem 1
- The equilibrium state probabilities of the queueing network are invariant to any change in the user behaviour graph which does not modify the visit ratios of the command types
- Proves the assumption is valid
20 Implication of Theorem 1
- Replace the user behaviour graph by an equivalent one
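As an illustration of an equivalent graph, the sketch below computes the visit ratios of a hypothetical three-command graph and then builds a second graph in which the next command is drawn independently of the current one, with those same visit ratios. The transition probabilities are invented, and the code only checks that the visit ratios match, which is the condition Theorem 1 requires; it does not solve the queueing network.

```python
def visit_ratios(matrix, iterations=200):
    """Relative visit frequencies of each command type (power iteration)."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(v[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return v

# Hypothetical user behaviour graph over three command types, with sequential links.
original = [
    [0.5, 0.4, 0.1],
    [0.6, 0.1, 0.3],
    [0.8, 0.1, 0.1],
]
v = visit_ratios(original)

# Equivalent graph: every row equals the visit-ratio vector, so the next
# command no longer depends on the current one (sequential links are dropped).
equivalent = [list(v) for _ in original]
v_equivalent = visit_ratios(equivalent)
```

The two graphs differ in their links but share the same visit ratios, so by Theorem 1 they would lead to the same equilibrium state probabilities.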
21 Comments on Theorem 1
- Workload models built by this method cannot be implemented in an arbitrary way
- The model will be performance-wise accurate if it simulates the same number of users as the workload, with the original or an equivalent graph
- Problem: does this probabilistic graph satisfactorily map the behaviour of all real users in a system?
- The model loses accuracy if we change the number of users or the behaviour at each node
22 Static analysis on Workload Model
- Apply a clustering technique
- Each command type in the user behaviour graph (UBG) is characterised by distributions of service times, branching probabilities, and the distribution of terminal times
- Map them into a state space, where clouds of neighbouring points form clusters
- Clustering reduces the state space
23 Clustering on the UBG
[Figure: UBG with nine classes of commands, grouped into four superclasses]
24 Does clustering affect validity of model
- Define global performance indices: mean throughput rate, mean response time, utilisations, mean queue lengths, and waiting times
- Theorem 2: Validating Clustering
- The values of all global performance indices of the queueing network are invariant with respect to aggregations of classes with identical demands, if each superclass has a visit ratio in the UBG equal to the sum of the visit ratios of its members, and each non-aggregated class retains its previous visit ratio
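The aggregation rule in Theorem 2 can be sketched as follows: classes with identical demands are merged into one superclass whose visit ratio is the sum of its members' visit ratios, and all other classes keep their own. The command names, visit ratios, and demands are hypothetical, and the check at the end covers only the visit-weighted mean demand at one station, not the full set of global indices.

```python
from collections import defaultdict

def aggregate_classes(classes):
    """Merge classes with identical demands, summing their visit ratios.

    classes: dict name -> (visit ratio, demand tuple per station).
    Returns dict demand tuple -> summed visit ratio (one superclass each).
    """
    superclasses = defaultdict(float)
    for visit_ratio, demands in classes.values():
        superclasses[demands] += visit_ratio
    return dict(superclasses)

def mean_demand(visits_and_demands, station):
    """Visit-weighted service demand per interaction at one station."""
    return sum(v * d[station] for v, d in visits_and_demands)

# Hypothetical command types: (visit ratio, (CPU demand s, disk demand s)).
classes = {
    "ls":   (0.25, (0.01, 0.02)),
    "cat":  (0.25, (0.01, 0.02)),
    "cc":   (0.125, (0.50, 0.30)),
    "edit": (0.375, (0.05, 0.01)),
}
merged = aggregate_classes(classes)
before = mean_demand(classes.values(), station=0)
after = mean_demand(((v, d) for d, v in merged.items()), station=0)
```

Here "ls" and "cat" collapse into one superclass, and the mean CPU demand per interaction is the same before and after aggregation.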
25 Theorem 2 - Observations
- Clustering can produce very accurate results provided the number of users and their behaviour remain unchanged from the original, unclustered graph
- One representative per cluster is enough
- Clustering is better than other reduction methods (based on study)
26 Insights on modelling user workloads
- Suppose the workload is described by a collection of disjoint user behaviour graphs
- Performance-oriented model accuracy can be obtained only if each graph is dealt with separately
[Figure: user behaviour graphs of A, B, C, D type customers]
27 Summary of design
Artificial Workload Model
- This artificial workload model is able to emulate the characteristics of the original workload in accordance with the set of performance criteria we are interested in
28 Conclusions
- Problem 1: choosing the parameters which have a significant effect on performance
- Solution
- Characterise the workload with resources that are explicitly specified in the queueing model
- The more resources the queueing model can take into account for performance, the more resources the workload can characterise
- Example: in our case, the number of users and the command types
29 Conclusions (contd.)
- Problem 2: accuracy of the workload model
- Consider the global performance indices which we are interested in
- The theorems prove the accuracy of the final workload model, as long as it is representative of the original user behaviour graph
- Static analysis techniques like sampling and clustering are valid (in our case)
30 Conclusions (contd.)
- Problem 3: ignoring the dynamics of the workload when doing static analysis
- Dynamics need not always be considered
- System performance indices do not depend on the order of execution of commands (in our case)
- Dynamics should be considered when
- Order of execution is important (e.g. scheduling)
- Solution cannot be applied here
- Violates steady-state assumption
- May not satisfy product-form queueing model