Title: Performability (Performance and Reliability) Modeling
1Performability (Performance and Reliability)
Modeling
- Conducted by Meng-Lai Yin, Ph.D.
- Specialty Engineering, Network Centric Systems
- Raytheon Company, Fullerton, California
- 714-446-3080, mlyin_at_raytheon.com
- Department of Electrical and Computer Engineering
- California State Polytechnic University, Pomona
- 909-869-2535, myin_at_csupomona.edu
2An Example
The purpose of this example is to show the
existence of performance degradable systems.
3An email received on July 20, 2005, 4:33 PM
- We are experiencing problems with the AIX user
account file systems. We need to take the AIX
system off-line immediately to fix the problem.
We expect the AIX file systems to be off line for
approximately an hour and a half. We hope to
have the file systems back on-line by 6:00 PM.
- Sorry for any inconvenience.
- Sys Admin Team
The system is completely off-line.
4Later that day: July 20, 2005, 6:26 PM
- All AIX file systems are back on-line except
wei_snoop which is in a rebuild stage. Wei_snoop
file system will be back on-line by 0600 tomorrow
morning.
- Thanks,
- Sys Admin Team
The system is in a degraded mode.
5Observations
- The system can operate without the wei_snoop file
system
More and more systems are becoming performance
degradable.
6Performance Degradable Systems
- Performance degradable systems have the
capability of continuing to operate failure-free
in the presence of certain faults or errors by
diminishing the level of quality of service [7].
Typical scenario: A system starts with all
components operational and performs at its
maximum capability. When a component fails, the
system reconfigures itself and operates with
degraded performance, and so on.
7Reasons for Performability Modeling
- Two separate measures
- Traditional dependability analysis assumes no
performance degraded states.
- Performance measures are always applied to the fully
operational state.
- Need an integrated, meaningful metric
- For performance degradable systems, where the
system can operate in many different states, how
do you address the system's performance with the
consideration of degraded performance situations?
- Traditional metrics (performance, reliability,
availability, etc.) and the corresponding
modeling techniques cannot capture the overall
performance of performance degradable
systems.
8The Beginning of Performability
- The term performability was introduced almost
three decades ago [4] by Prof. J. F. Meyer.
John F. Meyer, Professor Emeritus, Electrical
Engineering & Computer Science. Degree: Ph.D.,
U-Michigan. Address: 4111 EECS. Phone: (734)
763-0037. Fax: (734) 763-1503.
9A Tribute to M. D. Beaudry
- Before Dr. John F. Meyer gave the name
performability to the world, several works
had already addressed the
issue of providing appropriate metrics for
performance degradable systems.
- In particular, the work conducted by Danielle
Beaudry [1] has been referenced in many places.
- In [1], she addressed performance-related
reliability measures for gracefully degrading
systems (performance degradable systems).
10 Objectives
- At the conclusion of this tutorial, a participant
will be able to
- know the basic concepts of performability
- know how to
- conduct a basic dependability analysis using
Reliability Block Diagram (RBD) or Markov
techniques
- conduct a basic performance analysis using
queuing models
- conduct a basic performability analysis
11Approach
12Outline
Part 2
Part 1
13The Two Basic Questions
- What is performability?
- Why is performability needed?
- What is performability?
- Performability is a metric to evaluate the
performance over time
- Modeling performability is modeling the effect of
reliability on performance.
14Why is performability needed?
- The appearance of performance degradable systems
- Better design evaluations for systems
in which both performance and dependability must be considered
15Example 2
A Performance-Degradable, 3-processor System
The best performance occurs when all three
processors work correctly in parallel, as the
three processors share the workload. The system
is performance-degradable, meaning that if one or two
processors fail, the system continues working with
degraded performance.
The purpose of this example is to show that
traditional performance and dependability models
cannot give the overall picture of the system's
operation.
16Performance Model
The best performance occurs when all three
processors work correctly in parallel, as the
three processors share the workload. A typical
queuing model is used for performance assessment.
Job arrival rate: $\lambda$. Processor service rates:
$\mu_1$, $\mu_2$ and $\mu_3$. Solving this model yields
performance measures such as response time,
throughput, etc.
17Dependability Model
18Problems with Traditional Measures
- Pure performance measure: too optimistic! Outage-and-recovery
behaviors are not considered.
- Pure dependability measure: too conservative! Degraded
levels of performance are not considered (the
system is either working or failed).
19Outline
Part 2
Part 1
Basics
Dependability Analysis
Performance Analysis
20Reliability, Availability, Dependability
- They are all probabilities.
- What are the differences?
Definition of Reliability: "The probability of an
item to perform a required function under given
conditions for a given time interval."
Definition of Availability: "The probability of
an item to be in a state to perform a required
function at a given instant of time, assuming
that the external resources, if required, are
provided."
21Differences
[Figure: two timelines, each running from $t_0$ to $t$]
Reliability: the probability that the item
survives the duration $[t_0, t)$.
Availability: the probability that the item is
working at time $t$, given that the item was
working at time $t_0$.
22Picture the Differences
[Figure: a typical reliability curve (without repair) and a typical availability curve (with repair), with 1.0 marked on the vertical axis and the steady-state availability marked on the availability curve]
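23Calculating Reliability and Availability
- Let $\lambda$ be the failure rate for a component, and $\mu$
be the repair rate for that component.
- Assume exponential distribution for the failures
- Then reliability can be calculated as $R(t) = e^{-\lambda t}$
- SS (Steady-State) Availability can be assessed
as $A_{SS} = \frac{\mu}{\lambda + \mu}$
- or $A_{SS} = \frac{MTTF}{MTTF + MTTR}$, where $MTTF = 1/\lambda$ and $MTTR = 1/\mu$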
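The two formulas on this slide can be sketched in a few lines of Python. This is only an illustration under the slide's exponential assumption; the function names and the numeric rates are chosen here for the example and are not part of the tutorial.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for an exponentially distributed time to failure."""
    return math.exp(-failure_rate * t)


def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """A_SS = mu / (lambda + mu), which equals MTTF / (MTTF + MTTR)."""
    return repair_rate / (failure_rate + repair_rate)


# Illustrative rates only (assumed for this sketch).
lam = 1e-4  # failures per hour
mu = 2.0    # repairs per hour, i.e. MTTR = 30 minutes

for hours in (0, 1_000, 10_000):
    print(f"R({hours}) = {reliability(lam, hours):.6f}")
print(f"A_SS = {steady_state_availability(lam, mu):.8f}")
```

Without repair, reliability keeps decaying toward zero, while the availability of a repairable component settles at a constant value.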
24 Dependability Umbrella term
Courtesy of prof. Trivedi
25Dependability Analysis
Modeling Taxonomy: modeling is done either by simulation or by analytic modeling; analytic modeling uses either a non-state-space method (RBD) or a state-space method (Markov).
- Approaches discussed here
- Reliability Block Diagrams
- Markov Models
26Combinatorial Approach
- Consider
- a system of n components
- every component is either working or failed
- We can
- list out all the possible combinations
- calculate the probability for each combination
- sum up probabilities for all working conditions
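The three steps above can be sketched as a brute-force enumeration. This is a minimal illustration that assumes statistically independent components; the function name, the `works` predicate, and the component reliabilities are hypothetical.

```python
from itertools import product


def system_reliability(component_reliabilities, works):
    """Sum the probabilities of every combination in which the system works.

    component_reliabilities: probability that each component is working.
    works: predicate taking a tuple of booleans (True = working).
    Components are assumed to fail independently.
    """
    total = 0.0
    # List all 2**n working/failed combinations.
    for states in product((True, False), repeat=len(component_reliabilities)):
        # Probability of this particular combination.
        prob = 1.0
        for up, r in zip(states, component_reliabilities):
            prob *= r if up else (1.0 - r)
        # Keep it only if the system works in this combination.
        if works(states):
            total += prob
    return total


rel = [0.9, 0.8, 0.7]                        # hypothetical values
series = system_reliability(rel, all)        # every component needed
parallel = system_reliability(rel, any)      # any one component suffices
print(series, parallel)
```

The loop visits 2**n combinations, which is why the approach does not scale to large n.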
27Complexity Concerns
- How many possible combinations are there for the n
components?
- What can be done to manage the complexity?
- During model construction
- Need a more intelligent way to describe the
system's failure behavior
- Series and parallel RBD (Reliability Block
Diagram) approach
- During model solution
- Need more efficient and effective ways of
calculation, rather than counting individual
probabilities
28Structured Combinatorial Approach
- Reliability block diagrams
- Integrate certain probability events into a
module, which contains the information
- A probability of failure
- A failure rate
- A distribution of time to failure
- Steady-state and instantaneous unavailability
- Organize the modules in a structured way,
according to the effects of each module's failure
- Statistical independence assumption
- Failure independence
- Repair independence
29Series Systems
- Each component (block) is needed to make the
system work
- If any one of the components fails, the system
fails
- Example 3
The purpose of this example is to show how to
construct a simple series RBD model and solve it
using Excel.
30RBD Example for a Series System
- System Block Diagram for Example 3
31Reliability Block Diagram Model Reliability
Calculation
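[Figure: series RBD with three blocks: Processor, Monitor, Keyboard]
Let $\lambda_1$ be the failure rate for the monitor. Assume
exponential distribution for the failures;
then $R_{monitor}(t) = e^{-\lambda_1 t}$. Similarly,
$R_{processor}(t) = e^{-\lambda_2 t}$ and $R_{keyboard}(t) = e^{-\lambda_3 t}$.
$R_{system}(t) = R_{monitor}(t) \, R_{processor}(t) \, R_{keyboard}(t) = e^{-\lambda_1 t} \, e^{-\lambda_2 t} \, e^{-\lambda_3 t} = e^{-(\lambda_1 t + \lambda_2 t + \lambda_3 t)} = e^{-(\lambda_1 + \lambda_2 + \lambda_3) t}$
When an exponential failure distribution is
assumed, the failure rate of a series system is
the sum of the individual components' failure rates.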
32Excel Exercise 1
- Use an Excel spreadsheet to construct the above
series RBD
- Show the trend of reliability with regard to the
time factor
- Show the relationship between reliability and the
failure rate
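For readers without a spreadsheet at hand, the same exercise can be sketched in Python. The failure rates below are the ones listed later in Excel Exercise 2; the time points and scaling factors are arbitrary choices for illustration.

```python
import math


def series_reliability(failure_rates, t):
    """Series system with exponential failures: exp(-(sum of rates) * t)."""
    return math.exp(-sum(failure_rates) * t)


# Failure rates per hour for monitor, processor, keyboard.
rates = [1e-4, 1e-5, 4e-4]

# Trend of reliability with regard to time.
for t in (0, 100, 1_000, 10_000):
    print(f"t = {t:>6} h  R_system = {series_reliability(rates, t):.6f}")

# Relationship between reliability and failure rate at a fixed time.
t_fixed = 1_000
for scale in (0.5, 1.0, 2.0, 4.0):
    scaled = [scale * r for r in rates]
    print(f"rates x {scale}: R_system = {series_reliability(scaled, t_fixed):.6f}")
```

Both loops show the same exponential decay: reliability drops as either the mission time or the summed failure rate grows.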
33SS-Availability Calculation
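Let $\lambda_1$, $\lambda_2$, $\lambda_3$ be the failure rates and $\mu_1$, $\mu_2$,
$\mu_3$ be the repair rates for the monitor, processor
and keyboard. Then
- $A_{SS\text{-}monitor} = \frac{\mu_1}{\lambda_1 + \mu_1}$
- $A_{SS\text{-}processor} = \frac{\mu_2}{\lambda_2 + \mu_2}$
- $A_{SS\text{-}keyboard} = \frac{\mu_3}{\lambda_3 + \mu_3}$
$A_{SS\text{-}system\text{-}series} = A_{SS\text{-}monitor} \, A_{SS\text{-}processor} \, A_{SS\text{-}keyboard}$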
34Hierarchical Composition/Decomposition
- Problem: the size of the model grows with the
size of the system.
- Issue: fidelity vs. complexity
Trivedi: Hudson Professor of Electrical and Computer
Engineering, Duke University. Phone: (919)
660-5269. Fax: (919) 660-5293. Email:
kst_at_ee.duke.edu
35Parallel Systems
- A basic parallel system: only one of the N
identical components is required for the system
to function
- Example 4
36Example 4 Basic Parallel System
The purpose of example 4 is to show the parallel
RBD and the corresponding reliability/availability
calculations.
37RBD Example: Parallel System
- Reliability Block Diagram
38RBD using Hierarchical Composition/Decomposition
The Highest level (overall system level)
Computer
Computer
or
1 of 2
1 of 2
Usually indicate two different components
On the Computer level
Monitor
Processor
Keyboard
39Reliability Calculation
- The unreliability of the parallel system can be
computed as the probability that all N components
fail.
- Assume all N components have the same
failure rate $\lambda$, and the probability that a
component has failed by time t is $P_{fail}(t) = 1 - e^{-\lambda t}$
- $R_{parallel}(t) = 1 - \prod_{i=1}^{N} P_{fail}(t) = 1 - (P_{fail}(t))^N$
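A short sketch of the parallel formula, assuming N identical, independent components with exponential failures and no repair. The function name and the numbers are illustrative only.

```python
import math


def parallel_reliability(failure_rate: float, n: int, t: float) -> float:
    """1-of-N parallel system: fails only if all N components have failed."""
    p_fail = 1.0 - math.exp(-failure_rate * t)  # one component failed by time t
    return 1.0 - p_fail ** n                    # product over N independent components


lam = 1e-3   # hypothetical failure rate per hour
t = 100.0    # hypothetical mission time in hours
for n in (1, 2, 3):
    print(f"N = {n}: R_parallel = {parallel_reliability(lam, n, t):.6f}")
```

Raising the single-component failure probability to the power N is exactly where the independence assumption enters.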
40Independence Assumption
- Where in the above equation is the independence
assumption made?
- Just to remind you
- Failure/repair independence is often assumed
- RBD usually does not handle dependencies such
as
- Event-dependent failure
- Shared repair
41Availability Calculation
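- $A_{SS\text{-}monitor} = \frac{\mu_1}{\lambda_1 + \mu_1}$
- $A_{SS\text{-}processor} = \frac{\mu_2}{\lambda_2 + \mu_2}$
- $A_{SS\text{-}keyboard} = \frac{\mu_3}{\lambda_3 + \mu_3}$
$A_{SS\text{-}system\text{-}parallel} = 1 - (1 - A_{SS\text{-}monitor} \, A_{SS\text{-}processor} \, A_{SS\text{-}keyboard})^2$ for two identical computers in a 1-of-2 arrangement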
42Excel Exercise 2
- $\lambda_{monitor} = 1 \times 10^{-4}$ failures per hour
- $\lambda_{processor} = 1 \times 10^{-5}$ failures per hour
- $\lambda_{keyboard} = 4 \times 10^{-4}$ failures per hour
- $\mu = 2$ repairs per hour for all components (MTTR = 30
minutes)
- For the series system, $A_{SS}$ is
- For the parallel system (with 1-of-2 redundancy), $A_{SS}$ is
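The exercise can also be worked in Python instead of Excel. The sketch below uses the rates listed above and assumes that the parallel system consists of two identical computers, each a series of monitor, processor, and keyboard, with one of the two required; that structure is read from Example 4 and is an assumption of this sketch.

```python
def availability(failure_rate: float, repair_rate: float) -> float:
    """Steady-state availability of one repairable component."""
    return repair_rate / (failure_rate + repair_rate)


failure_rates = {"monitor": 1e-4, "processor": 1e-5, "keyboard": 4e-4}  # per hour
mu = 2.0  # repairs per hour for all components (MTTR = 30 minutes)

# Series system: every component must be available.
a_series = 1.0
for lam in failure_rates.values():
    a_series *= availability(lam, mu)

# Parallel system (assumed structure): two identical series computers, 1 of 2 needed.
a_parallel = 1.0 - (1.0 - a_series) ** 2

print(f"A_SS series   = {a_series:.10f}")
print(f"A_SS parallel = {a_parallel:.10f}")
```

Printing the unavailability `1 - a` alongside each value makes the gain from redundancy easier to compare than the availabilities themselves.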
43Parallel/Series System Example 5
Processor 1
Keyboard 1
Monitor 1
Bus 1
Bus 2
Computer 2
Keyboard 1
Monitor 1
What is the corresponding RBD ?
The purpose of Example 5 is to demonstrate a
simple design process using RBD
44Corresponding RBD
Assuming Buses are perfect
Monitor
Processor
Keyboard
Keyboard
Monitor
Processor
Compare to the RBD below, which one has better
reliability?
Monitor
Processor
Keyboard
Monitor
Processor
Keyboard
45 Modeling Steps
- Model construction
- Model parameterization
- Model solution
- Result interpretation
- Model validation
46N Modular Redundancy
- K of N System
- K of the total of N identical modules are
required to function, $K \le N$
- TMR (Triple Modular Redundancy) is a famous
example, where K is 2 and N is 3
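A K-of-N block with identical, independent modules can be evaluated with a binomial sum. This is a minimal sketch; the function name is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n identical, independent modules work.

    r is the reliability (or availability) of a single module.
    """
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


r = 0.999  # single-module reliability, illustrative
print("1 of 1:", k_of_n_reliability(1, 1, r))
print("2 of 3 (TMR, voter not included):", k_of_n_reliability(2, 3, r))
print("3 of 3 (series):", k_of_n_reliability(3, 3, r))
```

Setting K = N gives the series case and K = 1 gives the basic parallel case, so the same function covers both.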
47Example 6 RBD for TMR
[Figure: RBD for TMR: Module 1, Module 2, and Module 3 in a 2-of-3 arrangement, in series with the Voter, which is a single point of failure]
The purpose of example 6 is to
1. introduce TMR
2. show how to model a TMR component
3. show the impacts of a single point of failure
48TMR Reliability
[Figure: Module1, Module2, and Module3 (2 of 3) in series with the Voter]
- Cases for the TMR to be working
- all of the 3 modules are working
- any 2 modules are working, and 1 module is
failed
- Look at it from another way
- Cases for the TMR to be failed
- all 3 modules are failed
- any one module is working, but the other 2
are not working
- Remember, the voter is a single point of failure

One module   Voter   TMR        System
0.999        0.999   0.999997   0.998997005
49From this chart, you can see that the effect of a
single point of failure is much more
significant than that of a component with
redundancy.
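The numbers in the TMR table can be reproduced with a few lines. The sketch assumes independent modules and a voter in series with the 2-of-3 block; the function names are chosen for this example.

```python
def tmr_reliability(r_module: float) -> float:
    """2-of-3 block: all three modules work, or exactly two work and one has failed."""
    return r_module**3 + 3 * r_module**2 * (1.0 - r_module)


def tmr_system_reliability(r_module: float, r_voter: float) -> float:
    """The voter is in series with the 2-of-3 block (single point of failure)."""
    return r_voter * tmr_reliability(r_module)


r_module = 0.999
r_voter = 0.999
print(f"TMR block : {tmr_reliability(r_module):.9f}")
print(f"TMR system: {tmr_system_reliability(r_module, r_voter):.9f}")
```

With these values the complete system is slightly less reliable than a single module, because the non-redundant voter dominates the result.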
50Dependability Analysis Markov Modeling
Modeling Taxonomy
- Approaches (Discussed here)
- Reliability Block Diagrams
- RBD for Series Systems
- RBD for Parallel Systems
- Markov Models
51- Why Markov ?
- Who is Markov?
- How to construct a Markov model?
- How to solve a Markov model ?
- What are the issues to be considered ?
52Model Selection
- There is a wide range of models available; each
has its strengths and weaknesses.
- Combinatorial models (reliability block diagrams,
fault trees) are straightforward and easy to
understand.
- However, it is not easy to model non-independent
behavior using combinatorial models.
- Markov models can model the state changes
53Who was Markov?
- Andrei A. Markov graduated from Saint Petersburg
University in 1878 and subsequently became a
professor there.
- His early work dealt mainly with number theory and
analysis, etc.
- Markov is particularly remembered for his study
of Markov chains.
- These chains are sequences of random variables in
which the future behavior is determined by the
present state but is independent of how
the present state was reached.
54Markovian Property
Given the present state, the future is
independent of the past.
Definition of Markov Process: A stochastic
process $\{X(t), t \in T\}$ is called a Markov process
if for any $t_0 < t_1 < \dots < t_n < t$, the conditional
distribution of $X(t)$ for given values of $X(t_0)$,
$X(t_1)$, ..., $X(t_n)$ depends only on $X(t_n)$.
55A simple Markov Chain
- A continuous-time, discrete-state Markov process
Pure-birth process (Poisson process if $\lambda_0 = \lambda_1 = \lambda_2 = \dots$)
56Example 7
Non-identical p1 and p2: p1 has failure rate $\lambda_1$,
repair rate $\mu_1$; p2 has failure rate $\lambda_2$, repair
rate $\mu_2$.
States:
- State (1, 2): both p1 and p2 are working
- State (1): p1 is working, p2 is failed
- State (2): p2 is working, p1 is failed
- State (0): both p1 and p2 are failed
57What can be solved ?
- Basically, the probability of each state.
- The transient solution is the probability at a
certain point of time t.
- The steady-state solution is the steady-state
probability ($t \to \infty$).
58Analytical Solution on a 2-state Model
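[Figure: two-state Markov model: state W (working) moves to state F (failed) at failure rate $\lambda$, and F returns to W at repair rate $\mu$]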
59Analytical Solution -cont. 1
Solving (1) and (2) obtains
$P_W(s) = \frac{1}{s + (\lambda + \mu)} + \frac{\mu}{s(s + (\lambda + \mu))}$
$P_F(s) = \frac{\lambda}{s(s + (\lambda + \mu))}$
60Analytical Solution- cont. 2
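- use the inverse Laplace transform
- $p_W(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu) t}$
- $p_F(t) = \frac{\lambda}{\lambda + \mu} - \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu) t}$
- note that when t goes to infinity, the above gives
the steady-state solution (recall the steady-state
availability)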
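The transient solution of the two-state model is easy to tabulate. The sketch below assumes the system starts in the working state at t = 0, as the closed-form expressions imply; the rates are illustrative.

```python
import math


def p_working(lam: float, mu: float, t: float) -> float:
    """Probability of being in state W at time t, starting in W at t = 0."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def p_failed(lam: float, mu: float, t: float) -> float:
    """Probability of being in state F at time t, starting in W at t = 0."""
    total = lam + mu
    return lam / total - (lam / total) * math.exp(-total * t)


lam, mu = 0.01, 0.5  # hypothetical failure and repair rates per hour
for t in (0.0, 1.0, 5.0, 20.0, 100.0):
    print(f"t = {t:>5}: p_W = {p_working(lam, mu, t):.6f}, p_F = {p_failed(lam, mu, t):.6f}")
print("steady-state availability:", mu / (lam + mu))
```

The exponential term dies out at rate lambda + mu, so the instantaneous availability approaches mu / (lambda + mu).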
61A Simple way of Solving Steady-State irreducible
Markov chains
A Markov chain is irreducible if every state can
be reached from every other state.
- Name the probability for each state
- List the balance equations
- Add one more equation stating that the sum of all state
probabilities is one.
- You have the choice of deleting one balance
equation
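The recipe above can be sketched in plain Python for the four-state model of Example 7. The transition structure assumes each processor fails and is repaired independently of the other, and the numeric rates, helper names, and state ordering are all choices made for this illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def steady_state(q):
    """Steady-state probabilities for generator matrix q (q[i][j] = rate i -> j)."""
    n = len(q)
    # Balance equation for state j: sum over i of pi_i * q[i][j] = 0.
    a = [[q[i][j] for i in range(n)] for j in range(n)]
    # Delete one balance equation and use "probabilities sum to one" instead.
    a[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(a, b)


lam1, mu1 = 0.001, 0.1  # hypothetical rates for p1
lam2, mu2 = 0.002, 0.2  # hypothetical rates for p2
# State order: 0 = both working, 1 = only p1 working, 2 = only p2 working, 3 = both failed.
rates = {(0, 1): lam2, (0, 2): lam1, (1, 0): mu2, (1, 3): lam1,
         (2, 0): mu1, (2, 3): lam2, (3, 1): mu1, (3, 2): mu2}
q = [[0.0] * 4 for _ in range(4)]
for (i, j), rate in rates.items():
    q[i][j] = rate
    q[i][i] -= rate  # diagonal entry is minus the total outflow rate

pi = steady_state(q)
print("state probabilities:", pi)
print("P(at least one processor working):", 1.0 - pi[3])
```

Because failures and repairs are independent here, the result can be checked against the product of the two single-component availabilities.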
62Example 8 RBD Markov Approaches Comparison
RBD
Markov
(Only failures transitions are shown here. When
identical modules are assumed, the model can be
further reduced.)
63Reduced Markov Model for TMR
64Solve availability using RBD
Availability = Prob.(all 3 modules are working) +
Prob.(any two modules are working and one is
failed). With module availability $A$ and independent modules, this is $A^3 + 3A^2(1 - A)$.
65Discussion
- Why do they both reach the same results?
- Due to the assumptions of
- Independent repairs
- Independent failures
- Exponential distribution
- What's the implication?
- If the above assumptions are made, choose the
easier way
66State Explosion Problem
The largeness problem can be handled in 2 ways:
tolerated or avoided
67Largeness Avoidance and Hierarchical Models
Large models can be avoided by using hierarchical
model composition or decomposition.
References for this slide:
- P. J. Courtois, Decomposability: Queueing and Computer System Applications, Academic Press, Inc., 1977.
- R. A. Sahner, K. S. Trivedi, Reliability Modeling Using SHARPE, IEEE Trans. on Reliability, R-36, 2, June 1987.
- R. A. Sahner, K. S. Trivedi, Antonio Puliafito, Performance and Reliability Analysis of Computer Systems, Kluwer Academic Publishers, 1996.
68Example 9
The purpose of this example is to demonstrate the
Hierarchical Decomposition Method.
Required: 1 processor, 2 memories, 1 network
69A Demonstration of Hierarchical Decomposition
Modeling
Reliability Block Diagram
[Figure: RBD with Processor 1 and Processor 2 (1 of 2), Memory 1, Memory 2, and Memory 3 (2 of 3), and Network (1 of 1)]
Need to consider the effects of software errors,
and other dependencies
The Markov model with 6 components will have $2^6 = 64$ states
70Example 9 Continue
Highest level: Processor Subsystem (1 of 2), Memory Subsystem (2 of 3), Network Subsystem (1 of 1)
Sub-system level: one separate model each for the Network Subsystem, the Processor Subsystem, and the Memory Subsystem
A small-size model is handled every
time. Flexibility vs. Complexity
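The hierarchical idea in Example 9 can be sketched as two levels of small calculations. The sketch assumes independent components and ignores software errors and other dependencies; the component reliabilities and function names are hypothetical.

```python
from math import comb


def k_of_n(k: int, n: int, r: float) -> float:
    """At least k of n identical, independent components are working."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


# Sub-system level: each subsystem is solved on its own as a small model.
r_processor, r_memory, r_network = 0.99, 0.98, 0.999  # hypothetical values
processor_subsystem = k_of_n(1, 2, r_processor)  # 1 of 2 processors
memory_subsystem = k_of_n(2, 3, r_memory)        # 2 of 3 memories
network_subsystem = k_of_n(1, 1, r_network)      # 1 of 1 network

# Highest level: the three subsystems are all required, so they combine in series.
system = processor_subsystem * memory_subsystem * network_subsystem

print("processor subsystem:", processor_subsystem)
print("memory subsystem   :", memory_subsystem)
print("network subsystem  :", network_subsystem)
print("system             :", system)
```

Each step handles at most three components, instead of one flat model over all six.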
71Outline
Part 2
Part 1
Introduction
Reliability Modeling
Performance Modeling
72A Simple Performance Modeling Mechanism Task
Graph
Each task can be assigned a value $t_i$ to represent
the time the task takes
[Figure: task graph with tasks A, B, C, and D, labeled with times $t_A$, $t_B$, $t_C$, $t_D$]
Then you can calculate the time to complete all
tasks as
$t_A + \max(t_B, t_C) + t_D$
73 Independent Parallel Tasks
- Let $F_A(t)$, $F_B(t)$, $F_C(t)$, $F_D(t)$ be the
distribution functions for the time each task
takes
- Tasks B and C are executed in parallel. Denote
the probability that both of them
are finished by time t as $F_{BC}(t)$
- If B and C are independent tasks (no shared
resources), then $F_{BC}(t) = F_B(t) F_C(t)$
(two independent events)
74Serial Tasks
- When 2 tasks are executed serially, the
distribution function for the time until the
second job finishes is the convolution of the 2
distributions, $F_1(t) * F_2(t)$
- The overall distribution function for the time to
finish all tasks in the task graph is
- $F_A * F_{BC} * F_D$
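The task-graph reasoning can be checked numerically with a small Monte Carlo sketch. It assumes, purely for illustration, that all four task times are independent and exponentially distributed with mean 1; the tutorial does not prescribe any particular distribution.

```python
import math
import random

random.seed(0)
RATE = 1.0        # assumed: every task time is exponential with mean 1 / RATE
N = 100_000       # number of simulated runs of the task graph


def sample_run():
    """One run: A, then B and C in parallel, then D."""
    t_a, t_b, t_c, t_d = (random.expovariate(RATE) for _ in range(4))
    t_bc = max(t_b, t_c)            # both parallel tasks must finish
    return t_a + t_bc + t_d, t_bc   # serial tasks add up


runs = [sample_run() for _ in range(N)]
mean_total = sum(total for total, _ in runs) / N

# Compare the empirical F_BC(t) with the product F_B(t) * F_C(t) at one point.
t = 1.0
empirical_f_bc = sum(1 for _, t_bc in runs if t_bc <= t) / N
analytic_f_bc = (1.0 - math.exp(-RATE * t)) ** 2

print("mean completion time:", mean_total)
print("F_BC(1.0) empirical :", empirical_f_bc)
print("F_BC(1.0) analytic  :", analytic_f_bc)
```

Sampling the sum of serial task times is the simulation counterpart of convolving their distributions.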
75Contention for Resources
- The model above assumes no contention for
resources
- In real-world applications, limited resources
must be shared. Hence resource contention is
expected.
- Queuing models are useful in modeling this kind of
system
76Queuing Network
- A queuing network consists of service centers and
customers (often called jobs)
- A service center consists of one or more servers
and one or more queues to hold customers waiting
for service.
$\lambda$: customer arrival rate
$\mu$: service rate
77Interarrival time
- Interarrival time: the time between successive
customer arrivals
- Arrival process: the process that determines the
interarrival times
- It is common to assume that interarrival times
are exponentially distributed random variables.
In this case, the arrival process is a Poisson
process
78Service Time
- The service time depends on how much work the
customer needs and how fast the server is able to
perform the work
- Service times are commonly assumed to be
independent, identically distributed (iid) random
variables.
79A Queuing Model Example
[Figure: a queuing model in which arriving customers wait in a queue for servers (cpu1, cpu2, cpu3, Disk1, Disk2) grouped into Server Centers 1, 2, and 3; labeled intervals: queuing delay, service time, response time]
80Terminology
81Measures of Interests
- queue length
- response time
- throughput: the number of customers served per
unit of time
- utilization: the fraction of time that a service
center is busy
A relationship between throughput and response
time: Little's law.
82Little's Law
- The mean number of jobs in a queuing system in
the steady state is equal to the product of the
arrival rate and the mean response time:
$E[N] = \lambda \, E[R]$
$E[N]$: the average number of customers in a queuing
system
$E[R]$: the average response time
$\lambda$: the average arrival rate of customers
admitted to the system
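Little's law can be observed in a small simulation. The sketch below assumes a single first-come first-served server with exponential interarrival and service times; the rates and variable names are chosen for this illustration, and the law itself does not depend on these distributional choices.

```python
import random

random.seed(1)
LAM, MU, N_CUSTOMERS = 0.5, 1.0, 20_000  # hypothetical rates and run length

arrivals, departures = [], []
clock, server_free_at = 0.0, 0.0
for _ in range(N_CUSTOMERS):
    clock += random.expovariate(LAM)         # next arrival time
    start = max(clock, server_free_at)       # wait if the server is busy
    server_free_at = start + random.expovariate(MU)
    arrivals.append(clock)
    departures.append(server_free_at)

horizon = departures[-1]  # observe from time 0 until the last departure

# Time-average number in system: integrate N(t) by sweeping over all events.
events = sorted([(t, +1) for t in arrivals] + [(t, -1) for t in departures])
area, in_system, last_t = 0.0, 0, 0.0
for t, delta in events:
    area += in_system * (t - last_t)
    in_system += delta
    last_t = t

mean_in_system = area / horizon
arrival_rate = N_CUSTOMERS / horizon
mean_response = sum(d - a for a, d in zip(arrivals, departures)) / N_CUSTOMERS

print("E[N] measured        :", mean_in_system)
print("lambda * E[R] measured:", arrival_rate * mean_response)
```

The two printed values agree because every customer contributes exactly its response time to the area under N(t).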
83Notation
- A queuing model is usually described as X/Y/Z/K/L/D
(M denotes the exponential distribution, E for
Erlang, H for hyperexponential, D for
deterministic and G for general)
- X: arrival process
- Y: service process
- Z: number of servers at the service center
- K: buffer size
- L: population size
- D: the queuing discipline
K, L and D are often omitted, which means K and L
are $\infty$ and D is FCFS
84M/M/1 Queue
- M/M/1 queue
- The first M means the arrival process is
exponentially distributed
- The second M means the service process is
exponentially distributed
- 1 means the number of servers is 1
- Assuming buffer size and population size are
infinite
- First-Come First-Served discipline is applied
[Figure: a single queue and server with arrival rate $\lambda$ and service rate $\mu$]
This example shows how to solve the M/M/1 queue,
and many other aspects of queuing
models.
85Solving M/M/1 queue
- Solved for
- The steady-state probability in each state
- Server Utilization
- The expected number of customers in the system
- The average response time
Construct the corresponding Markov Chain
A typical birth-death process
86Birth-Death Process
Birth-Death Process is a Markov chain where the
transitions can occur only between adjacent
states. You can solve the Markov chain
analytically by solving the set of balance
equations.
87Solving M/M/1 Queue
Define the ratio $\rho = \lambda / \mu$,
called the traffic intensity.
When $\rho < 1$ (meaning $\lambda < \mu$), the system is called
stable, and the steady-state probabilities can
be determined by
$p_n = (1 - \rho) \rho^n$, $n = 0, 1, 2, \dots$
88M/M/1 Queue Property
The mean number of customers in the system:
$E[N] = \frac{\rho}{1 - \rho}$
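89M/M/1 Queue Property: Average Response Time
Little's formula: $E[N] = \lambda \, E[R]$
From the previous discussion: $E[N] = \frac{\rho}{1 - \rho}$ with $\rho = \lambda / \mu$
Average response time: $E[R] = \frac{E[N]}{\lambda} = \frac{1}{\mu - \lambda}$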
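The M/M/1 results can be collected in one small helper. This sketch uses the standard steady-state formulas for a stable M/M/1 queue; the function names and the example rates are illustrative.

```python
def mm1_metrics(lam: float, mu: float) -> dict:
    """Steady-state measures of an M/M/1 queue; requires lam < mu (stable)."""
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = lam / mu  # traffic intensity, also the server utilization
    return {
        "utilization": rho,
        "mean_customers": rho / (1.0 - rho),      # E[N]
        "mean_response_time": 1.0 / (mu - lam),   # E[R] = E[N] / lam
    }


def state_probability(lam: float, mu: float, n: int) -> float:
    """Steady-state probability of n customers in the system."""
    rho = lam / mu
    return (1.0 - rho) * rho**n


metrics = mm1_metrics(lam=0.5, mu=1.0)  # hypothetical rates
print(metrics)
print([round(state_probability(0.5, 1.0, n), 4) for n in range(5)])
```

As the arrival rate approaches the service rate, both the mean number in system and the mean response time grow without bound.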
90Example 10: How does Fast Lane work in
Disneyland?
- Construct an M/M/1 queue
- Solve for the average
response time, as a function of the inter-arrival
time
- Control the people flow to assure the
response time (refer to Excel)
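One way to read the steps above is as a simple admission-control calculation. The sketch below treats the attraction as an M/M/1 queue, which is the modeling assumption of this example; the service rate and target are made-up numbers, and the function names are hypothetical.

```python
def mean_response_time(mean_interarrival: float, mu: float) -> float:
    """M/M/1 average response time as a function of the mean inter-arrival time."""
    lam = 1.0 / mean_interarrival
    if lam >= mu:
        raise ValueError("unstable queue: guests arrive faster than they are served")
    return 1.0 / (mu - lam)


def min_interarrival_for_target(target_response: float, mu: float) -> float:
    """Smallest mean inter-arrival time that keeps E[R] at or below the target."""
    if target_response <= 1.0 / mu:
        raise ValueError("target is below the mean service time")
    max_lam = mu - 1.0 / target_response  # from E[R] = 1 / (mu - lam)
    return 1.0 / max_lam


mu = 2.0       # hypothetical: 2 guests served per minute
target = 5.0   # hypothetical: at most 5 minutes average response time
spacing = min_interarrival_for_target(target, mu)
print("admit one guest every", spacing, "minutes on average")
print("resulting E[R]:", mean_response_time(spacing, mu))
```

Spacing admissions further apart than this value lowers the average response time; admitting guests faster raises it.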
91Intermission
92References
- [1] M. Danielle Beaudry, Performance-Related Reliability Measures for Computing Systems, IEEE Transactions on Computers, Vol. C-27, No. 6, June 1978.
- [2] G. Ciardo and A. S. Miner, SMART: Simulation and Markovian Analyzer for Reliability and Timing, Proceedings of the IEEE International Computer Performance and Dependability Symposium (IPDS'96), September 1996.
- [3] G. Clark, T. Courtney, D. Daly, D. Deavours, S. Derisavi, J. M. Doyle, W. H. Sanders, and P. Webster (01CLA01), The Möbius Modeling Tool, Proceedings of the 9th International Workshop on Petri Nets and Performance Models, Aachen, Germany, September 11-14, 2001, pp. 241-250.
- [4] John F. Meyer, On Evaluating the Performability of Degradable Computer Systems, IEEE Transactions on Computers, Vol. C-29, No. 8, August 1980.
- [5] John F. Meyer, Performability: A Retrospective and Some Pointers to the Future, Performance Evaluation 14 (1992) 139-156.
- [6] John F. Meyer, William H. Sanders, Specification and Construction of Performability Models, Chapter 9 in the book Performability Modeling Techniques and Tools [7].
- [7] Performability Modeling Techniques and Tools, edited by B. R. Haverkort, R. Marie, G. Rubino, K. Trivedi, John Wiley & Sons, Inc., 2001. ISBN 0-471-49195-0.
- [8] Andrew L. Reibman, Modeling the Effect of Reliability on Performance, IEEE Transactions on Reliability, Vol. 39, No. 3, Aug. 1990.
- [9] Robin A. Sahner, Kishor S. Trivedi, Antonio Puliafito, Performance and Reliability Analysis of Computer Systems, Kluwer Academic Publishers, 1996, ISBN 0-7923-9650-2.
- [10] William H. Sanders and John F. Meyer, Stochastic Activity Networks: Formal Definitions and Concepts, in E. Brinksma, H. Hermanns, and J. P. Katoen (Eds.), Lectures on Formal Methods and Performance Analysis, First EEF/Euro Summer School on Trends in Computer Science, Berg en Dal, The Netherlands, July 3-7, 2000, Revised Lectures, Lecture Notes in Computer Science no. 2090, pp. 315-343. Berlin: Springer, 2001.
- [11] Ann Tai, John F. Meyer, and Algirdas Avizienis, Software Performability: From Concepts to Applications, Kluwer Academic Publishers, 1996, ISBN 0-923-9670-7.
- [12] Kishor S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, Wiley, 2002. ISBN 0-471-33341-7.
- [13] Meng-Lai Yin, Hierarchical-Compositional Performability Modeling for Fault-Tolerance Multiprocessor Systems, Ph.D. Dissertation, University of California, Irvine, 1995, UMI Dissertation Services.
- [14] Meng-Lai Yin, Douglas Blough, Lubomir Bic, A Dependability Analysis for Systems with Global Spares, IEEE Transactions on Computers, Sep. 2000, pp. 958-963.
93A Special Birth-Death Process: Poisson Process
- $\lambda(n) = \lambda$ for all $n \ge 0$
- $\mu(n) = 0$ for all $n \ge 0$
- Pure birth process
- Definition of the Poisson process
- The counting process $\{N(t), t \ge 0\}$ is said to be
a Poisson process having rate $\lambda$, $\lambda > 0$, if
- $N(0) = 0$
- The process has independent increments
- The number of events in any interval of length t
is Poisson distributed with mean $\lambda t$.
94Poisson Process
- For all $s, t \ge 0$
- $P\{N(t+s) - N(s) = n\} = e^{-\lambda t} \frac{(\lambda t)^n}{n!}$, $n = 0, 1, 2, \dots$
- $E[N(t)] = \lambda t$
- Poisson properties
- Inter-arrival times are exponentially distributed
- Memoryless property: the probability of occurrence of
an event is independent of how many events have
occurred in the past and the time since the last
event
95M/M/2 queue with Heterogeneous Servers
- The service rates of the two processors are not
identical
- Constructing the model
- Represent a state with 2 numbers (n, m)
- Where n is the number of jobs at server 1, m is
the number of jobs at server 2
Solving the model for steady-state prob.
96Two Servers in Tandem
[Figure: two servers in tandem with arrival rate $\lambda$ and service rates $\mu_1$ and $\mu_2$]
- Constructing the model
- Represent a state with 2 numbers (n, m)
- Where n is the number of jobs at server 1, m is
the number of jobs at server 2