Performability (Performance and Reliability) Modeling

NDIA Performability Modeling tutorial, Meng-Lai Yin, 9/21/09 (PowerPoint presentation).


1
Performability (Performance and Reliability)
Modeling

Conducted by Meng-Lai Yin, Ph.D.
Specialty Engineering, Network Centric Systems
Raytheon Company, Fullerton, California
714-446-3080, mlyin_at_raytheon.com
Department of Electrical and Computer Engineering
California State Polytechnic University, Pomona
909-869-2535, myin_at_csupomona.edu

2
An Example
The purpose of this example is to show the
existence of performance-degradable systems.
3
An email received on July 20, 2005, 4:33 PM

We are experiencing problems with the AIX user
account file systems. We need to take the AIX
system off-line immediately to fix the problem.
We expect the AIX file systems to be off line for
approximately an hour and a half. We hope to
have the file systems back on-line by 6:00 PM.
Sorry for any inconvenience.
Sys Admin Team

The system is completely off.
4
Later that day, July 20, 2005, 6:26 PM

All AIX file systems are back on-line except
wei_snoop which is in a rebuild stage. Wei_snoop
file system will be back on-line by 0600 tomorrow
morning.
Thanks,
Sys Admin Team

The system is in a degraded mode.
5
Observations

The system can operate without the wei_snoop file
system

More and more systems are becoming
performance-degradable.
6
Performance Degradable Systems

Performance-degradable systems have the
capability of continuing to operate failure-free
in the presence of certain faults or errors by
diminishing the level of quality of service [7].

Typical scenario: A system starts with all
components operational and performs at its
maximum capability. When a component fails, the
system reconfigures itself and operates with
degraded performance, and so on.
7
Reasons for Performability Modeling

Two separate measures
Traditional dependability analysis assumes no
performance-degraded states.
Performance measures are always applied to the
fully operational state.
Need an integrated, meaningful metric
For performance-degradable systems, where the
system can operate in many different states, how
do you address the system's performance while
taking degraded-performance situations into account?
Traditional metrics (performance, reliability,
availability, etc.) and the corresponding
modeling techniques cannot capture the overall
performance of performance-degradable
systems.

8
The Beginning of Performability

The term performability was introduced almost
three decades ago by Prof. J. F. Meyer [4].

John F. Meyer, Professor Emeritus, Electrical Engr. & Computer Science
Address: 4111 EECS; Phone: (734) 763-0037; Fax: (734) 763-1503
Degree: Ph.D., U-Michigan
9
A Tribute to M. D. Beaudry

Before Dr. John F. Meyer gave the name
performability to the world, several works
had already addressed the issue of providing
appropriate metrics for performance-degradable systems.
In particular, the work conducted by Danielle
Beaudry [1] has been referenced in many places.
In [1], she addressed performance-related
reliability measures for gracefully degrading
systems (performance-degradable systems).

10
Objectives

At the conclusion of this tutorial, a participant
will be able to:
know the basic concepts of performability
know how to
conduct a basic dependability analysis using
Reliability Block Diagram (RBD) or Markov
techniques
conduct a basic performance analysis using
queuing models
conduct a basic performability analysis

11
Approach
12
Outline (diagram): Part 1, Part 2
13
The Two Basic Questions

What is performability?
Why is performability needed?

What is performability?
Performability is a metric to evaluate the
performance over time
Modeling performability is modeling the effect of
reliability on performance.

14
Why is performability needed?

The appearance of performance-degradable systems
Better system designs & evaluations for systems
that consider both performance and dependability

15
Example 2
A Performance-Degradable, 3-Processor System
The best performance occurs when all three
processors work correctly in parallel, as the
three processors share the workload. The system
is performance-degradable, meaning that if one or two
processors fail, the system continues working with
degraded performance.
The purpose of this example is to show that
traditional performance and dependability models
cannot give the overall picture of the system's
operation.
16
Performance Model
The best performance occurs when all three
processors work correctly in parallel, as the
three processors share the workload. A typical
queuing model for performance assessment:
jobs arrive at rate $\lambda$; the processors' service rates
are $\mu_1$, $\mu_2$ and $\mu_3$. Solving this model yields
performance measures such as response time,
throughput, etc.
17
Dependability Model
18
Problems with Traditional Measures
Pure performance measure: too optimistic! Outage-and-recovery
behaviors are not considered.
Pure dependability measure: too conservative! Degraded
levels of performance are not considered (the
system is either working or failed).
19
Outline (diagram): Part 1, Part 2; Basics, Dependability Analysis, Performance Analysis
20
Reliability, Availability, Dependability

They are all probabilities.
What are the differences?

Definition of reliability: the probability that an
item performs a required function under given
conditions for a given time interval.
Definition of availability: the probability that
an item is in a state to perform a required
function at a given instant of time, assuming
that the external resources, if required, are
provided.
21
Differences
(Timeline figure: an interval from $t_0$ to $\tau$.)
Reliability: the probability that the item
survives the whole duration $[t_0, \tau)$.
Availability: the probability that the item is
working at time $\tau$, given that the item was
working at time $t_0$.
22
Picture the Differences
(Figure: a typical reliability curve (without repair) and a typical availability curve (with repair), both starting at 1.0; the availability curve levels off at the steady-state availability.)
23
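Calculating Reliability & Availability

Let $\lambda$ be the failure rate for a component, and $\mu$
be the repair rate for that component.
Assume an exponential distribution for the failures.
Then reliability can be calculated as $R(t) = e^{-\lambda t}$.
SS (steady-state) availability can be assessed
as $A_{SS} = \frac{\mu}{\lambda + \mu}$
or $A_{SS} = \frac{MTTF}{MTTF + MTTR}$, where $MTTF = 1/\lambda$ and $MTTR = 1/\mu$.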
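The two formulas can be evaluated directly. A minimal Python sketch, assuming exponential failure and repair times; the function names are ours, and the example rates are the monitor values from Excel Exercise 2.

```python
import math


def reliability(lam, t):
    """R(t) = exp(-lambda * t) for an exponentially distributed time to failure."""
    return math.exp(-lam * t)


def ss_availability(lam, mu):
    """Steady-state availability from failure rate lam and repair rate mu."""
    return mu / (lam + mu)


def ss_availability_from_means(mttf, mttr):
    """Equivalent form using MTTF = 1/lam and MTTR = 1/mu."""
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    lam, mu = 1e-4, 2.0  # per hour
    print(reliability(lam, 1000))                       # about 0.905
    print(ss_availability(lam, mu))                     # about 0.99995
    print(ss_availability_from_means(1 / lam, 1 / mu))  # same value
```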

24
Dependability: an umbrella term
(Figure courtesy of Prof. Trivedi)
25
Dependability Analysis
Modeling Taxonomy
(Figure: modeling divides into simulation and analytic modeling; analytic modeling divides into non-state-space methods, such as RBD, and state-space methods, such as Markov models.)

Approaches discussed here
Reliability Block Diagrams
Markov Models

26
Combinatorial Approach

Consider
a system of n components
every component is either working or failed
We can
list out all the possible combinations
calculate the probability for each combination
sum up probabilities for all working conditions
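The three steps above can be sketched in Python as a brute-force enumeration. This assumes independent components, and the "structure function" `works` (which says whether the system works in a given component state) is a hypothetical helper the caller supplies. The loop visits all $2^n$ states, which is exactly the complexity concern raised next.

```python
from itertools import product


def system_reliability(p_up, works):
    """Sum the probabilities of all 2**n component states in which the system works.

    p_up  -- per-component probabilities of being up (assumed independent)
    works -- structure function: tuple of booleans -> True if the system works
    """
    total = 0.0
    for state in product((True, False), repeat=len(p_up)):
        prob = 1.0
        for up, p in zip(state, p_up):
            prob *= p if up else 1 - p  # independence: multiply per-component terms
        if works(state):
            total += prob
    return total


if __name__ == "__main__":
    p = [0.9, 0.8, 0.95]  # hypothetical component reliabilities
    print(system_reliability(p, all))                      # series: every component needed
    print(system_reliability(p, any))                      # parallel: any one suffices
    print(system_reliability(p, lambda s: sum(s) >= 2))    # 2 of 3
```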

27
Complexity Concerns

How many possible combinations out of the n
components?
What can be done to manage the complexity?
During model construction
Need a more intelligent way to describe the
systems failure behavior
Series and parallel RBD (Reliability Block
Diagram) approach
During model solution
Need more efficient and effective ways of
calculations, rather than counting individual
probabilities

28
Structured Combinatorial Approach

Reliability block diagrams
Integrate certain probability events into a
module, which contains information such as:
a probability of failure
a failure rate
a distribution of time to failure
steady-state and instantaneous unavailability
Organize the modules in a structured way,
according to the effect of each module's failure
Statistical independence assumption:
independent failures
independent repairs

29
Series Systems

Each component (block) is needed to make the
system work
If any one of the components fails, the system
fails
Example 3

The purpose of this example is to show how to
construct a simple series RBD model and solve it
using Excel
30
RBD Example for a Series System

System Block Diagram for Example 3

31
Reliability Block Diagram Model: Reliability
Calculation

RBD for Example 3: Processor, Monitor and Keyboard blocks in series

Let $\lambda_1$ be the failure rate for the monitor. Assuming
an exponential distribution for the failures,
$R_{monitor}(t) = e^{-\lambda_1 t}$. Similarly,
$R_{processor}(t) = e^{-\lambda_2 t}$ and $R_{keyboard}(t) = e^{-\lambda_3 t}$.
$R_{system}(t) = R_{monitor}(t) R_{processor}(t) R_{keyboard}(t) = e^{-\lambda_1 t} e^{-\lambda_2 t} e^{-\lambda_3 t} = e^{-(\lambda_1 t + \lambda_2 t + \lambda_3 t)} = e^{-(\lambda_1 + \lambda_2 + \lambda_3) t}$
When an exponential failure distribution is
assumed, the failure rate of a series system is
the sum of the individual components' failure rates.
32
Excel Exercise 1

Use Excel Spreadsheet to construct the above
Series RBD
Show the trend of reliability with regard to the
time factor
Show the relationship between reliability and the
failure rate
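For readers without Excel, the same exercise can be sketched in Python. The failure rates below are borrowed from Excel Exercise 2 purely for illustration, and independent exponential failures are assumed, as in the derivation above.

```python
import math

# Failure rates (per hour) borrowed from Excel Exercise 2 for illustration.
RATES = {"monitor": 1e-4, "processor": 1e-5, "keyboard": 4e-4}


def series_reliability(rates, t):
    """Product of component reliabilities e^{-lambda_i t} (independent failures)."""
    r = 1.0
    for lam in rates:
        r *= math.exp(-lam * t)
    return r


if __name__ == "__main__":
    # Trend of reliability with respect to time
    for t in (0, 1000, 2000, 5000, 10000):
        print(f"t={t:6d} h  R_system={series_reliability(RATES.values(), t):.4f}")
    # Relationship between reliability and one component's failure rate, at t = 1000 h
    for lam_k in (1e-4, 2e-4, 4e-4, 8e-4):
        rates = [RATES["monitor"], RATES["processor"], lam_k]
        print(f"lambda_keyboard={lam_k:.0e}  R_system(1000)={series_reliability(rates, 1000):.4f}")
```

Multiplying the per-component terms and exponentiating the summed rate give the same number, which is the "failure rates add" result for series systems.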

33
SS-Availability Calculation
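Let $\lambda_1, \lambda_2, \lambda_3$ be the failure rates and $\mu_1, \mu_2, \mu_3$
be the repair rates for the monitor, processor
and keyboard. Then

$A_{SS,monitor} = \frac{\mu_1}{\lambda_1 + \mu_1}$
$A_{SS,processor} = \frac{\mu_2}{\lambda_2 + \mu_2}$
$A_{SS,keyboard} = \frac{\mu_3}{\lambda_3 + \mu_3}$

$A_{SS,system\text{-}series} = A_{SS,monitor} \cdot A_{SS,processor} \cdot A_{SS,keyboard}$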
34
Hierarchical Composition/Decomposition

Problem: the size of the model grows with the
size of the system.
Issue: fidelity vs. complexity

Trivedi: Hudson Professor of Electrical and Computer
Engineering, Duke University; Phone (919)
660-5269; Fax (919) 660-5293; Email
kst_at_ee.duke.edu
35
Parallel Systems

A basic parallel system: only one of the N
identical components is required for the system
to function
Example 4

36
Example 4 Basic Parallel System

System Block Diagram

The purpose of example 4 is to show the parallel
RBD and the corresponding reliability/availability
calculations.
37
RBD Example: Parallel System

Reliability Block Diagram

38
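RBD Using Hierarchical Composition/Decomposition
The highest level (overall system level): two Computer blocks in parallel (1 of 2).
(Figure: the two Computer blocks are drawn separately, or alternatively as a single block labeled "1 of 2"; drawing separate blocks usually indicates two different components.)
On the computer level: Monitor, Processor and Keyboard in series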
39
Reliability Calculation

The unreliability of the parallel system can be
computed as the probability that all N components
fail.
Assume all N components have the same
failure rate $\lambda$, and the probability that a
component has failed by time t is $P_{fail}(t)$
($= 1 - e^{-\lambda t}$ for exponentially distributed failures). Then
$R_{parallel}(t) = 1 - \prod_{i=1}^{N} P_{fail}(t) = 1 - [P_{fail}(t)]^N$
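A minimal Python sketch of the parallel formula, assuming N identical components with independent, exponentially distributed failures (the rate and mission time below are hypothetical):

```python
import math


def parallel_reliability(lam, t, n):
    """1-of-n parallel system of identical, independent components."""
    p_fail = 1 - math.exp(-lam * t)  # P_fail(t) for one component
    return 1 - p_fail ** n           # the system fails only if all n components fail


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(parallel_reliability(1e-3, 1000, n), 4))
```

The single line `p_fail ** n` is where independence enters: the probability that all n components have failed is taken to be the product of the individual failure probabilities.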

40
Independence Assumption

Where in the above equation is the independence
assumption made?
Just to remind you:

Failure/repair independence is often assumed
RBD usually does not handle dependencies such
as
event-dependent failures
shared repair

41
Availability Calculation

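$A_{SS,monitor} = \frac{\mu_1}{\lambda_1 + \mu_1}$
$A_{SS,processor} = \frac{\mu_2}{\lambda_2 + \mu_2}$
$A_{SS,keyboard} = \frac{\mu_3}{\lambda_3 + \mu_3}$

For two computers in parallel, each a series of monitor, processor and keyboard:
$A_{SS,system\text{-}parallel} = 1 - (1 - A_{SS,monitor} A_{SS,processor} A_{SS,keyboard})^2$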
42
Excel Exercise 2

$\lambda_{monitor} = 1 \times 10^{-4}$ failures per hour
$\lambda_{processor} = 1 \times 10^{-5}$ failures per hour
$\lambda_{keyboard} = 4 \times 10^{-4}$ failures per hour
$\mu = 2$ repairs per hour for all components (MTTR = 30
minutes)
For the series system, $A_{SS}$ is
For the parallel system (with 1:2 redundancy), $A_{SS}$ is
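A Python sketch of Excel Exercise 2 using the slide's rates. We read "parallel system" as two complete computers in 1:2 redundancy, as in the hierarchical RBD of Example 4; that reading, and the assumption of independent failures and repairs, are ours.

```python
# Exercise 2 parameters from the slide (per hour).
LAMBDAS = {"monitor": 1e-4, "processor": 1e-5, "keyboard": 4e-4}
MU = 2.0  # repair rate for every component (MTTR = 30 minutes)


def availability(lam, mu):
    """Steady-state availability of one component."""
    return mu / (lam + mu)


def series_availability(lams, mu):
    """All components needed; independent failures and repairs assumed."""
    a = 1.0
    for lam in lams:
        a *= availability(lam, mu)
    return a


def parallel_availability(a_unit, n=2):
    """1-of-n redundancy of identical units, each with availability a_unit."""
    return 1 - (1 - a_unit) ** n


if __name__ == "__main__":
    a_series = series_availability(LAMBDAS.values(), MU)
    print(f"series:   {a_series:.8f}")
    print(f"parallel: {parallel_availability(a_series):.10f}")
```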

43
Parallel/Series System: Example 5
(Figure: two computers, each with a processor, keyboard and monitor, connected by Bus 1 and Bus 2.)
What is the corresponding RBD?
The purpose of Example 5 is to demonstrate a
simple design process using RBD.
44
Corresponding RBD, assuming the buses are perfect
(Figure: an RBD built from two Monitor, two Processor and two Keyboard blocks.)
Compare it with the RBD below: which one has better
reliability?
(Figure: a second RBD built from the same six blocks, arranged differently.)
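The two diagrams are not recoverable from the transcript. A common comparison, which we assume here, is between system-level redundancy (two complete Monitor-Processor-Keyboard strings in parallel) and component-level redundancy (a parallel pair of each component type, with the pairs in series). A Python sketch with hypothetical, independent component reliabilities:

```python
def series(*rs):
    """All blocks needed."""
    out = 1.0
    for r in rs:
        out *= r
    return out


def parallel(*rs):
    """Any one block suffices."""
    q = 1.0
    for r in rs:
        q *= 1 - r  # probability that every branch has failed
    return 1 - q


def system_level(r_m, r_p, r_k):
    """Two complete Monitor-Processor-Keyboard strings in parallel."""
    string = series(r_m, r_p, r_k)
    return parallel(string, string)


def component_level(r_m, r_p, r_k):
    """A redundant pair of each component type, with the pairs in series."""
    return series(parallel(r_m, r_m), parallel(r_p, r_p), parallel(r_k, r_k))


if __name__ == "__main__":
    r = (0.90, 0.95, 0.85)  # hypothetical monitor, processor, keyboard reliabilities
    print(f"system-level:    {system_level(*r):.4f}")
    print(f"component-level: {component_level(*r):.4f}")
```

With independent components, redundancy at the component level is never worse than redundancy at the system level; the price is the interconnect (here, the buses assumed perfect) that lets any surviving monitor, processor and keyboard work together.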
45
Modeling Steps

Model construction
Model parameterization
Model solution
Result interpretation
Model validation

46
N Modular Redundancy

K-of-N system:
K of the total of N identical modules are
required to function, $K \le N$
TMR (Triple Modular Redundancy) is a famous
example, where K is 2 and N is 3
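The K-of-N rule is a binomial sum. A minimal Python sketch, assuming identical modules that fail independently, each with reliability r:

```python
from math import comb


def k_of_n_reliability(k, n, r):
    """P(at least k of n identical, independent modules work), each with reliability r."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    r = 0.999
    print(k_of_n_reliability(2, 3, r))  # TMR modules: 3r^2 - 2r^3
    print(k_of_n_reliability(1, 3, r))  # 1-of-3: plain parallel
    print(k_of_n_reliability(3, 3, r))  # 3-of-3: plain series
```

Series (K = N) and parallel (K = 1) systems fall out as the two extreme cases.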

47
Example 6: RBD for TMR
(Figure: TMR block diagram with Modules 1, 2 and 3 feeding a voter, and the corresponding RBD: a 2-of-3 block of Modules 1 to 3 in series with the voter, which is a single point of failure.)
The purpose of Example 6 is to (1) introduce
TMR, (2) show how to model a TMR component, and (3)
show the impact of a single point of failure.
48
TMR Reliability
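(Figure: RBD with Modules 1 to 3 in a 2-of-3 block, in series with the voter.)

Cases for the TMR to be working:
all 3 modules are working
any 2 modules are working, and 1 module has
failed
Looking at it another way,
cases for the TMR to be failed:
all 3 modules have failed
any one module is working, but the other 2
are not working
Remember, the voter is a single point of failure.
With identical module reliability $R$: $R_{TMR} = R^3 + 3R^2(1 - R) = 3R^2 - 2R^3$ and $R_{system} = R_{voter} \cdot R_{TMR}$.

One module: 0.999
Voter: 0.999
TMR (2 of 3 modules): 0.999997
TMR system (with voter): 0.998997005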
49
From this chart, you can see that the effect of a
single point of failure is much more
significant than that of a component with
redundancy.
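The chart values can be reproduced in a few lines of Python, assuming identical, independent modules and a voter in series with the 2-of-3 block. The last two lines are our own what-if, with hypothetical improved reliabilities, showing where an improvement pays off.

```python
def tmr_modules(r):
    """2-of-3 majority of identical modules: r^3 + 3 r^2 (1 - r) = 3r^2 - 2r^3."""
    return 3 * r**2 - 2 * r**3


def tmr_system(r_module, r_voter):
    """The voter is in series with the 2-of-3 module block (single point of failure)."""
    return r_voter * tmr_modules(r_module)


if __name__ == "__main__":
    print(f"{tmr_modules(0.999):.6f}")        # 0.999997
    print(f"{tmr_system(0.999, 0.999):.9f}")  # 0.998997005
    # What-if: improve the modules, or improve the voter, by the same factor.
    print(f"{tmr_system(0.9999, 0.999):.9f}")
    print(f"{tmr_system(0.999, 0.9999):.9f}")
```

Improving the already-redundant modules barely moves the system number, while improving the voter moves it almost one-for-one.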
50
Dependability Analysis Markov Modeling
Modeling Taxonomy

Approaches (Discussed here)
Reliability Block Diagrams
RBD for Series Systems
RBD for Parallel Systems
Markov Models

Why Markov ?
Who was Markov?

What is Markov ?

How to construct a Markov model?

How to solve a Markov model ?

What are the issues to be considered ?

52
Model Selection

There is a wide range of models available, each
with its own strengths and weaknesses.

Combinatorial models (reliability block diagrams,
fault trees) are straightforward and easy to
understand.

However, it is not easy to model non-independent
behavior using combinatorial models.
Markov models can model the state changes.

53
Who was Markov?

Andrei A. Markov graduated from Saint Petersburg
University in 1878 and subsequently became a
professor there.
His early work dealt mainly with number theory and
analysis.
Markov is particularly remembered for his study
of Markov chains.
These chains are sequences of random variables in
which the future behavior is determined by the
present state but is independent of how the
present state was reached.

54
Markovian Property

Markovian property

Given the present state, the future is
independent of the past.
Definition of a Markov process: a stochastic
process $\{X(t), t \in T\}$ is called a Markov process
if for any $t_0 < t_1 < \dots < t_n < t$, the conditional
distribution of $X(t)$ for given values of $X(t_0)$,
$X(t_1), \dots, X(t_n)$ depends only on $X(t_n)$.
55
A simple Markov Chain

A continuous-time, discrete-state Markov process

Pure-birth process (a Poisson process if $\lambda_0 = \lambda_1 = \lambda_2 = \cdots$)
56
Example 7
Non-identical processors p1 and p2: p1 has failure rate $\lambda_1$ and
repair rate $\mu_1$; p2 has failure rate $\lambda_2$ and repair
rate $\mu_2$.
States:
{1, 2}: both p1 and p2 are working
{1}: p1 is working, p2 has failed
{2}: p2 is working, p1 has failed
0: both p1 and p2 have failed
57
What can be solved ?

Basically, the probability of each state.

The transient solution is the probability at a
certain point of time t.

The steady-state solution is the steady-state
probability ($t \to \infty$).

Others

58
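Analytical Solution of a 2-State Model
(Figure: working state W and failed state F; W moves to F at failure rate $\lambda$, and F returns to W at repair rate $\mu$.)
$\frac{dp_W(t)}{dt} = -\lambda p_W(t) + \mu p_F(t)$ (1)
$\frac{dp_F(t)}{dt} = \lambda p_W(t) - \mu p_F(t)$ (2)
with $p_W(0) = 1$ and $p_F(0) = 0$.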
59
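Analytical Solution (cont. 1)
Solving (1) and (2) with Laplace transforms gives
$P_W(s) = \frac{1}{s + \lambda + \mu} + \frac{\mu}{s(s + \lambda + \mu)}$, $P_F(s) = \frac{\lambda}{s(s + \lambda + \mu)}$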
60
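Analytical Solution (cont. 2)

Using the inverse Laplace transform:
$p_W(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu) t}$
$p_F(t) = \frac{\lambda}{\lambda + \mu} - \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu) t}$
Note that as t goes to infinity, $p_W(t)$ approaches the
steady-state solution $\mu / (\lambda + \mu)$ (recall the steady-state
availability).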
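A quick numerical check of the closed form, as a Python sketch with hypothetical rates. It integrates the two-state model $dp_W/dt = -\lambda p_W + \mu p_F$ with $p_F = 1 - p_W$ and $p_W(0) = 1$ by forward Euler, a deliberately simple method chosen only as a cross-check.

```python
import math


def p_working(t, lam, mu):
    """Closed-form transient availability of the two-state model, p_W(0) = 1."""
    s = lam + mu
    return mu / s + lam / s * math.exp(-s * t)


def p_working_euler(t, lam, mu, dt=1e-4):
    """Forward-Euler integration of dp_W/dt = -lam*p_W + mu*(1 - p_W), as a check."""
    p = 1.0
    for _ in range(int(round(t / dt))):
        p += dt * (-lam * p + mu * (1 - p))
    return p


if __name__ == "__main__":
    lam, mu = 0.1, 1.0  # hypothetical failure and repair rates
    for t in (0.0, 1.0, 5.0, 50.0):
        print(t, round(p_working(t, lam, mu), 6), round(p_working_euler(t, lam, mu), 6))
    print("steady state:", mu / (lam + mu))
```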

61
A Simple Way of Solving Steady-State Irreducible
Markov Chains
A Markov chain is irreducible if every state can
be reached from every other state.

Name the probability for each state.
List the balance equations.
Add one more equation: the sum of all state
probabilities is one.
You may delete one balance equation (one of them
is redundant).
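The recipe above, applied to Example 7, as a self-contained Python sketch. It assumes each processor has its own repair facility (independent repairs), uses hypothetical rates, and solves the linear system with a small hand-written Gauss-Jordan routine so that no third-party library is needed.

```python
def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def steady_state(q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator matrix q."""
    n = len(q)
    # Balance equation for state j: sum_i pi_i * q[i][j] = 0 (flow in = flow out).
    a = [[q[i][j] for i in range(n)] for j in range(n)]
    b = [0.0] * n
    # Delete one (redundant) balance equation and use the normalization instead.
    a[-1] = [1.0] * n
    b[-1] = 1.0
    return solve_linear(a, b)


def example7_generator(l1, m1, l2, m2):
    """States: 0 = {1,2} both up, 1 = {1} only p1 up, 2 = {2} only p2 up, 3 = both failed."""
    q = [
        [0.0, l2, l1, 0.0],  # both up: p2 fails -> {1}, p1 fails -> {2}
        [m2, 0.0, 0.0, l1],  # {1}: p2 repaired -> both up, p1 fails -> both failed
        [m1, 0.0, 0.0, l2],  # {2}: p1 repaired -> both up, p2 fails -> both failed
        [0.0, m1, m2, 0.0],  # both failed: p1 repaired -> {1}, p2 repaired -> {2}
    ]
    for i, row in enumerate(q):
        row[i] = -sum(row)  # diagonal entry makes each row sum to zero
    return q


if __name__ == "__main__":
    pi = steady_state(example7_generator(0.01, 1.0, 0.02, 0.5))  # hypothetical rates
    print([round(p, 6) for p in pi])
```

Under the independent-repair assumption, the probability that both processors are up comes out as the product of the two single-processor availabilities, which is one way to check the model.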

62
Example 8: RBD & Markov Approaches Comparison

TMR with a perfect voter

(Figure: the RBD and the Markov model for the TMR. Only failure transitions are shown. When
identical modules are assumed, the Markov model can be
further reduced.)
63
Reduced Markov Model for TMR
64
Solve availability using RBD
Availability = P(all 3 modules are working) +
P(any two modules are working and one has
failed) = $A^3 + 3A^2(1 - A)$ for identical modules with availability $A$
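A Python sketch comparing the RBD answer with a reduced Markov model. We assume the reduced model tracks k, the number of working modules: state k fails to k-1 at rate kλ, and failed modules are repaired at rate μ each when repairs are independent. The `repairers` parameter is our own addition; setting it to 1 models a single shared repair facility, which breaks one of the assumptions discussed next.

```python
def tmr_availability_rbd(lam, mu):
    """P(all 3 up) + P(exactly 2 up), each module available mu/(lam+mu) independently."""
    a = mu / (lam + mu)
    return a**3 + 3 * a**2 * (1 - a)


def tmr_availability_markov(lam, mu, repairers=3):
    """Reduced birth-death chain on k = number of working modules (3, 2, 1, 0).

    State k fails to k-1 at rate k*lam; state j is repaired to j+1 at rate
    min(3 - j, repairers) * mu. repairers=3 means independent repairs.
    """
    n = 3
    w = {n: 1.0}  # unnormalized steady-state weights
    for k in range(n, 0, -1):
        repair_rate = min(n - (k - 1), repairers) * mu
        # Flow balance across the cut between states k and k-1.
        w[k - 1] = w[k] * k * lam / repair_rate
    return (w[3] + w[2]) / sum(w.values())


if __name__ == "__main__":
    lam, mu = 1e-3, 0.5  # hypothetical rates per hour
    print(tmr_availability_rbd(lam, mu))
    print(tmr_availability_markov(lam, mu))               # same as the RBD
    print(tmr_availability_markov(lam, mu, repairers=1))  # shared repair: lower
```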
65
Discussion

Why do they both reach the same results?
Due to the assumptions of
Independent repairs
Independent failures
Exponential distribution
What's the implication?
If the above assumptions hold, choose the
easier approach.

66
State Explosion Problem
The largeness problem can be handled in 2 ways:
it can be tolerated or avoided.
67
Largeness Avoidance and Hierarchical Models
Large models can be avoided by using hierarchical
model composition or decomposition.
References:
[1] P. J. Courtois, Decomposability: Queueing and Computer System Applications, Academic Press, 1977.
[2] R. A. Sahner and K. S. Trivedi, "Reliability Modeling Using SHARPE," IEEE Transactions on Reliability, Vol. R-36, No. 2, June 1987.
[3] R. A. Sahner, K. S. Trivedi, and Antonio Puliafito, Performance and Reliability Analysis of Computer Systems, Kluwer Academic Publishers, 1996.
68
Example 9
The purpose of this example is to demonstrate the
hierarchical decomposition method.
Required: 1 processor, 2 memories, 1 network
69
A Demonstration of Hierarchical Decomposition
Modeling
Reliability block diagram: Processors 1 and 2 (1 of 2), Memories 1, 2 and 3 (2 of 3), and the Network (1 of 1), in series
We need to consider the effects of software errors
and other dependencies.
A Markov model with 6 components will have $2^6 = 64$ states.
70
Example 9 (continued)
Highest level: Processor Subsystem (1 of 2), Memory Subsystem (2 of 3) and Network Subsystem (1 of 1), in series
Subsystem level: the Processor, Memory and Network Subsystems are each modeled separately
Only a small model is handled at a time. Flexibility vs. complexity
Outline (diagram): Part 1, Part 2; Introduction, Reliability Modeling, Performance Modeling
72
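A Simple Performance Modeling Mechanism: Task
Graph
(Figure: task A is followed by tasks B and C in parallel, which are followed by task D.)
Each task can be assigned a value $t_i$ to represent
the time the task takes ($t_A$, $t_B$, $t_C$, $t_D$).
Then you can calculate the time to complete all
tasks as
$t_A + \max(t_B, t_C) + t_D$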
73
Independent Parallel Tasks

Let $F_A(t)$, $F_B(t)$, $F_C(t)$, $F_D(t)$ be the
distribution functions for the time each task
takes.
Tasks B and C are executed in parallel. Denote
the probability that both of them
are finished by time t as $F_{BC}(t)$.
If B and C are independent tasks (no shared
resources), then $F_{BC}(t) = F_B(t) F_C(t)$

(two independent events)
74
Serial Tasks

When 2 tasks are executed serially, the
distribution function for the time until the
second task finishes is the convolution of the 2
distributions, $F_1(t) * F_2(t)$.
The overall distribution function for the time to
finish all tasks in the task graph is
$F_A * F_{BC} * F_D$
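Both rules can be checked numerically. A Python sketch, assuming exponentially distributed task times with hypothetical rates: the parallel rule is compared with a Monte Carlo estimate, and the serial convolution $P(T_1 + T_2 \le t) = \int_0^t F_1(t - x) f_2(x)\,dx$ is evaluated with the midpoint rule. For rates 1 and 2 it should match the known closed form $1 - 2e^{-t} + e^{-2t}$.

```python
import math
import random


def F_exp(rate, t):
    """CDF of an exponentially distributed task time."""
    return 1 - math.exp(-rate * t) if t > 0 else 0.0


def F_parallel(t, rate_b, rate_c):
    """Both independent parallel tasks B and C finished by t: F_B(t) * F_C(t)."""
    return F_exp(rate_b, t) * F_exp(rate_c, t)


def F_serial(t, rate_1, rate_2, steps=20000):
    """Numerical convolution: P(T1 + T2 <= t) = integral_0^t F_1(t - x) f_2(x) dx."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint rule
        total += F_exp(rate_1, t - x) * rate_2 * math.exp(-rate_2 * x) * h
    return total


if __name__ == "__main__":
    rng = random.Random(1)
    trials = 100_000
    hits = sum(max(rng.expovariate(1.0), rng.expovariate(2.0)) <= 1.5 for _ in range(trials))
    print("F_BC(1.5): formula", round(F_parallel(1.5, 1.0, 2.0), 4), "simulated", hits / trials)
    print("P(T1 + T2 <= 1):", round(F_serial(1.0, 1.0, 2.0), 6))
```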

75
Contention for Resources

The model above assumes no contention for
resources
In real world applications, limited resources
must be shared. Hence resource contention is
expected.
Queuing models are useful for modeling this kind of
system.

76
Queuing Network

A queuing network consists of service centers and
customers (often called jobs)
A service center consists of one or more servers
and one or more queues to hold customers waiting
for service.

$\lambda$: customer arrival rate
$\mu$: service rate
77
Interarrival time

Interarrival time: the time between successive
customer arrivals
Arrival process: the process that determines the
interarrival times
It is common to assume that interarrival times
are exponentially distributed random variables.
In this case, the arrival process is a Poisson
process.

78
Service Time

The service time depends on how much work the
customer needs and how fast the server is able to
perform the work
Service times are commonly assumed to be
independent, identically distributed (iid) random
variables.

79
A Queuing Model Example
(Figure: arriving customers wait in a queue and are served at Server Centers 1, 2 and 3, whose servers are cpu1, cpu2, cpu3, Disk1 and Disk2; response time = queuing delay + service time.)
80
Terminology
81
Measures of Interests

queue length
response time
throughput: the number of customers served per
unit of time
utilization: the fraction of time that a service
center is busy

A relationship between throughput and response
time: Little's law.
82
Little's Law

The mean number of jobs in a queuing system in
the steady state is equal to the product of the
arrival rate and the mean response time:
$E[N] = \lambda E[R]$
where $E[N]$ is the average number of customers in the queuing
system, $E[R]$ is the average response time, and
$\lambda$ is the average arrival rate of customers
admitted to the system.
83
Notation

A queuing model is usually described as X/Y/Z/K/L/D

(M denotes the exponential distribution, E
Erlang, H hyperexponential, D deterministic
and G general)
X: arrival process
Y: service process
Z: number of servers at the service center
K: buffer size
L: population size
D: queuing discipline
K, L and D are often omitted, which means K and L
are $\infty$ and D is FCFS.
84
M/M/1 Queue

M/M/1 queue:
The first M means the interarrival times are
exponentially distributed.
The second M means the service times are
exponentially distributed.
1 means the number of servers is 1.
The buffer size and population size are assumed
to be infinite.
The First-Come First-Served discipline is applied.

(Figure: single-server queue with arrival rate $\lambda$ and service rate $\mu$.)
This example shows how to solve the M/M/1 queue,
and many other aspects of queuing
models.
85
Solving M/M/1 queue

We solve for:
the steady-state probability of each state
the server utilization
the expected number of customers in the system
the average response time

Construct the corresponding Markov chain:
a typical birth-death process.
86
Birth-Death Process
Birth-Death Process is a Markov chain where the
transitions can occur only between adjacent
states. You can solve the Markov chain
analytically by solving the set of balance
equations.
87
Solving M/M/1 Queue
Define the ratio $\rho = \lambda / \mu$, called the traffic intensity.
When $\rho < 1$ (meaning $\lambda < \mu$), the system is called
stable, and the steady-state probabilities can
be determined by
$p_n = (1 - \rho)\rho^n$, $n = 0, 1, 2, \dots$
88
M/M/1 Queue Property
The mean number of customers in the system:
$E[N] = \sum_{n=0}^{\infty} n p_n = \frac{\rho}{1 - \rho}$
89
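M/M/1 Queue Property: Average Response Time
By Little's formula, $E[N] = \lambda E[R]$.
From the previous discussion, $E[N] = \rho / (1 - \rho)$.
Average response time: $E[R] = \frac{E[N]}{\lambda} = \frac{1}{\mu - \lambda}$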
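A Python sketch that evaluates the M/M/1 formulas and cross-checks $E[R] = 1/(\mu - \lambda)$ by simulation. The simulation uses Lindley's recursion for a FCFS single-server queue, $W_{n+1} = \max(0, W_n + S_n - A_{n+1})$, which is our choice of method rather than anything from the slides; the rates are hypothetical.

```python
import random


def mm1_measures(lam, mu):
    """Closed-form M/M/1 results (requires rho = lam/mu < 1)."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError("unstable queue: need lam < mu")
    en = rho / (1 - rho)  # E[N], mean number in the system
    er = en / lam         # Little's law: E[R] = E[N] / lambda = 1 / (mu - lam)
    return {"utilization": rho, "E[N]": en, "E[R]": er}


def simulate_mean_response(lam, mu, customers=200_000, seed=42):
    """FCFS single server via Lindley's recursion: W_{n+1} = max(0, W_n + S_n - A_{n+1})."""
    rng = random.Random(seed)
    wait, total_response = 0.0, 0.0
    for _ in range(customers):
        service = rng.expovariate(mu)
        total_response += wait + service  # response time = queuing delay + service time
        wait = max(0.0, wait + service - rng.expovariate(lam))
    return total_response / customers


if __name__ == "__main__":
    lam, mu = 0.5, 1.0  # hypothetical rates (customers per unit time)
    print(mm1_measures(lam, mu))            # utilization 0.5, E[N] 1.0, E[R] 2.0
    print(simulate_mean_response(lam, mu))  # close to 2.0
```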
90
Example 10: How does Fast Lane work in
Disneyland?
Construct an M/M/1 queue. Solve for the average
response time as a function of the interarrival
time. Control the flow of people to assure the
response time (refer to Excel).
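A Python sketch of the Example 10 spreadsheet. The service rate of 2 guests per minute and the 10-minute target are hypothetical, and treating the ride as a single M/M/1 server is the simplification the example itself makes. Since $E[R] = 1/(\mu - \lambda)$, keeping $E[R] \le T$ requires $\lambda \le \mu - 1/T$, that is, a mean interarrival time of at least $1/(\mu - 1/T)$.

```python
def mean_response_time(mean_interarrival, mu):
    """M/M/1 average response time as a function of the mean interarrival time."""
    lam = 1.0 / mean_interarrival
    if lam >= mu:
        return float("inf")  # unstable: the queue grows without bound
    return 1.0 / (mu - lam)


def min_interarrival_for_target(mu, target):
    """Smallest mean interarrival time with E[R] <= target (needs target > 1/mu)."""
    if target <= 1.0 / mu:
        raise ValueError("target is below the bare service time 1/mu")
    return 1.0 / (mu - 1.0 / target)


if __name__ == "__main__":
    mu = 2.0  # hypothetical: guests served per minute
    for ia in (0.51, 0.55, 0.6, 0.8, 1.0):
        print(f"interarrival={ia:.2f} min  E[R]={mean_response_time(ia, mu):.1f} min")
    print("admit at most one guest every",
          round(min_interarrival_for_target(mu, 10.0), 4), "minutes on average")
```

The response time climbs steeply as the interarrival time approaches $1/\mu$, which is why metering arrivals, rather than adding a little capacity, is an effective lever.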
91
Intermission
92
References

[1] M. Danielle Beaudry, "Performance-Related Reliability Measures for Computing Systems," IEEE Transactions on Computers, Vol. C-27, No. 6, June 1978.
[2] G. Ciardo and A. S. Miner, "SMART: Simulation and Markovian Analyzer for Reliability and Timing," Proceedings of the IEEE International Computer Performance and Dependability Symposium (IPDS'96), September 1996.
[3] G. Clark, T. Courtney, D. Daly, D. Deavours, S. Derisavi, J. M. Doyle, W. H. Sanders, and P. Webster, "The Möbius Modeling Tool," Proceedings of the 9th International Workshop on Petri Nets and Performance Models, Aachen, Germany, September 11-14, 2001, pp. 241-250.
[4] John F. Meyer, "On Evaluating the Performability of Degradable Computer Systems," IEEE Transactions on Computers, Vol. C-29, No. 8, August 1980.
[5] John F. Meyer, "Performability: A Retrospective and Some Pointers to the Future," Performance Evaluation 14 (1992), pp. 139-156.
[6] John F. Meyer and William H. Sanders, "Specification and Construction of Performability Models," Chapter 9 in Performability Modeling: Techniques and Tools [7].
[7] B. R. Haverkort, R. Marie, G. Rubino, and K. Trivedi (Eds.), Performability Modeling: Techniques and Tools, John Wiley & Sons, Inc., 2001, ISBN 0-471-49195-0.
[8] Andrew L. Reibman, "Modeling the Effect of Reliability on Performance," IEEE Transactions on Reliability, Vol. 39, No. 3, August 1990.
[9] Robin A. Sahner, Kishor S. Trivedi, and Antonio Puliafito, Performance and Reliability Analysis of Computer Systems, Kluwer Academic Publishers, 1996, ISBN 0-7923-9650-2.
[10] William H. Sanders and John F. Meyer, "Stochastic Activity Networks: Formal Definitions and Concepts," in E. Brinksma, H. Hermanns, and J. P. Katoen (Eds.), Lectures on Formal Methods and Performance Analysis, First EEF/Euro Summer School on Trends in Computer Science, Berg en Dal, The Netherlands, July 3-7, 2000, Revised Lectures, Lecture Notes in Computer Science No. 2090, pp. 315-343, Berlin: Springer, 2001.
[11] Ann Tai, John F. Meyer, and Algirdas Avizienis, Software Performability: From Concepts to Applications, Kluwer Academic Publishers, 1996, ISBN 0-923-9670-7.
[12] Kishor S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, Wiley, 2002, ISBN 0-471-33341-7.
[13] Meng-Lai Yin, "Hierarchical-Compositional Performability Modeling for Fault-Tolerance Multiprocessor Systems," Ph.D. Dissertation, University of California, Irvine, 1995, UMI Dissertation Services.
[14] Meng-Lai Yin, Douglas Blough, and Lubomir Bic, "A Dependability Analysis for Systems with Global Spares," IEEE Transactions on Computers, Sep. 2000, pp. 958-963.

93
A Special Birth-Death Process: The Poisson Process

$\lambda(n) = \lambda$ for all $n \ge 0$
$\mu(n) = 0$ for all $n \ge 0$
(a pure birth process)
Definition of the Poisson process:
The counting process $\{N(t), t \ge 0\}$ is said to be
a Poisson process having rate $\lambda$, $\lambda > 0$, if
$N(0) = 0$;
the process has independent increments;
the number of events in any interval of length t
is Poisson distributed with mean $\lambda t$.

94
Poisson Process

For all $s, t \ge 0$:
$P\{N(t + s) - N(s) = n\} = e^{-\lambda t} \frac{(\lambda t)^n}{n!}$, $n = 0, 1, 2, \dots$
$E[N(t)] = \lambda t$
Poisson properties:
Interarrival times are exponentially distributed.
Memoryless property: the probability of occurrence of
an event is independent of how many events have
occurred in the past and of the time since the last
event.
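The link between exponential interarrival times and Poisson counts can be checked with a short Python simulation. The sketch builds arrivals by summing exponential interarrival times (rate and interval length are hypothetical) and compares the empirical count distribution with the Poisson probabilities above.

```python
import math
import random


def poisson_pmf(n, lam, t):
    """P{N(t+s) - N(s) = n} = e^{-lam t} (lam t)^n / n!"""
    return math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)


def count_arrivals(lam, t, rng):
    """Count events in [0, t] by summing exponential interarrival times."""
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:
        n += 1
        clock += rng.expovariate(lam)
    return n


if __name__ == "__main__":
    rng = random.Random(7)
    lam, t, runs = 2.0, 3.0, 50_000
    counts = [count_arrivals(lam, t, rng) for _ in range(runs)]
    print("mean count:", sum(counts) / runs, " expected lam*t =", lam * t)
    for n in range(4):
        print(n, round(counts.count(n) / runs, 4), round(poisson_pmf(n, lam, t), 4))
```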

95
M/M/2 queue with Heterogeneous Servers