A Case Study In Reliability Analysis - PowerPoint PPT Presentation

Author: Lewis Sykalski
Slides: 29
Provided by: Lewi1175

Transcript and Presenter's Notes

1
A Case Study In Reliability Analysis

Lewis Sykalski

2
Background (cont.)

Net Centric Warfare Data Collector
Approximately 180 KLOC
Written in Java; heavily uses JDBC and RMI from the J2EE package
CMMI Level 1
Utilizes Oracle 9.2 Enterprise Edition off-the-shelf (OTS) DBMS
Required Reliability: Moderate

3
Background
[Figure: Global Vision Network (GVN) diagram linking LM Mission Sys (Colorado Springs, CO), Light House (Suffolk, VA), Integrated Warfare Development Center (Fort Worth, TX), and LM Sim Training (Orlando, FL); labeled nodes include FUSION, CAOC, DC, VBMS, WCS, JSAF, JTAC, JIMM, JABE, other simulators, and threat sims]
4
Design Diversity (Part I)

Part I: Oracle DBMS Design Diversity
Acquire 20 bug reports each from Oracle 9.2 and Oracle 10.0
Bugs had to be date independent, easy to reproduce, and type independent
Results are then classified by self-evidence (S.E. / N.S.E.) and divergence

5
Design Diversity Results: Oracle 9.2 Bugs
Bug      Type               9.2 S.E.  10.0 Fails?  10.0 S.E.  Divergent
2357784  Internal Error     X         NO           N/A        X
2299898  Performance/Hang   X         NO           N/A        X
2202561  Incorrect Results            NO           N/A
2221401  Incorrect Results            NO           N/A
2739068  Incorrect Results            NO           N/A
2683540  Incorrect Results            NO           N/A
2991842  Incorrect Results            NO           N/A
2200057  Internal Error     X         NO           N/A
2405258  Internal Error     X         NO           N/A
2716265  Internal Error     X         NO           N/A
2054241  Performance/Hang   X         NO           N/A
2485871  Internal Error     X         NO           N/A
2670497  Internal Error     X         NO           N/A
2659126  Internal Error     X         NO           N/A        X
2064478  Internal Error     X         NO           N/A
2624737  Internal Error     X         NO           N/A        X
1918751  Internal Error     X         NO           N/A
2286290  Incorrect Results            NO           N/A        X
2700474  Incorrect Results            NO           N/A
2576353  Internal Error     X         NO           N/A
6
Design Diversity Results: Oracle 10.0 Bugs
Bug      Type               10.0 S.E.  9.2 Fails?  9.2 S.E.  Divergent
5731063  Internal Error     X          NO          N/A
3664284  Incorrect Results             NO          N/A
4582808  Incorrect Results             NO          N/A
3895678  Internal Error     X          YES         X
3893571  Internal Error     X          YES         X
3903063  Incorrect Results             YES
3912423  Internal Error     X          NO          N/A
4029857  Engine Crash       X          YES         X
4156695  Incorrect Results             YES
2929556  Internal Error     X          YES         X         X
3255350  Performance/Hang   X          NO          N/A
3887704  Internal Error     X          NO          N/A
3405237  Engine Crash       X          YES         X
3952322  Feature Unusable   X          YES         X
4033889  Incorrect Results             NO          N/A
4060997  Internal Error     X          YES         X
4134776  Internal Error     X          NO          N/A
4149779  Incorrect Results             NO          N/A
2964132  Internal Error     X          YES         X
3361118  Internal Error     X          YES         X
7
Design Diversity More Analysis
8
Design Diversity Even More Analysis
Total bug scripts: 40
Total failures: 40
Only 1 of 2 products failing, S.E.: 18
Only 1 of 2 products failing, N.S.E.: 11
Both products failing, non-divergent, S.E.: 8
Both products failing, non-divergent, N.S.E.: 2
Both products failing, divergent, S.E.: 1
Both products failing, divergent, N.S.E.: 0

Bottom Line
Not a statistically valid sample (not enough time)
2/40 (5%) of failures would go undetected even when both products are run together (both fail, non-divergent, N.S.E.)
Of the 20 failures for Oracle 10.0, 6 were N.S.E.; 4 of these 6 would be resolved by using the past release in tandem with the future release
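The tallies in the slide-8 table follow mechanically from the per-bug columns of slides 5 and 6. The minimal Python sketch below assumes each bug script is recorded as a hypothetical `BugResult`. That record holds whether each product failed, whether each failure was self-evident, and whether the two failures diverged. The rule that a both-failing pair counts as S.E. when either failure is self-evident is also an assumption: every both-failing row in this data set has matching S.E. marks, so the data cannot settle it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BugResult:
    """Hypothetical record of one bug script run against both DBMS versions."""
    fails_a: bool
    se_a: bool               # product A's failure was self-evident
    fails_b: bool
    se_b: bool               # product B's failure was self-evident
    divergent: bool = False  # only meaningful when both products fail


def classify(r: BugResult) -> str:
    """Map one bug script onto the categories of the slide-8 table."""
    if r.fails_a and r.fails_b:
        kind = "divergent" if r.divergent else "non-divergent"
        # Assumption: the pair counts as S.E. if either failure is self-evident.
        se = "S.E." if (r.se_a or r.se_b) else "N.S.E."
        return f"both failing, {kind}, {se}"
    if r.fails_a or r.fails_b:
        se = r.se_a if r.fails_a else r.se_b
        return "1 of 2 failing, " + ("S.E." if se else "N.S.E.")
    return "neither failing"


def undetected_fraction(results) -> float:
    """Share of failures where both products fail alike and neither is self-evident."""
    tally = Counter(classify(r) for r in results)
    failures = sum(n for cat, n in tally.items() if cat != "neither failing")
    return tally["both failing, non-divergent, N.S.E."] / failures


if __name__ == "__main__":
    # Synthetic inputs reproducing the slide-8 tallies (18, 11, 8, 2, 1, 0).
    results = (
        [BugResult(True, True, False, False)] * 18
        + [BugResult(True, False, False, False)] * 11
        + [BugResult(True, True, True, True)] * 8
        + [BugResult(True, False, True, False)] * 2
        + [BugResult(True, True, True, True, divergent=True)]
    )
    print(undetected_fraction(results))  # prints 0.05
```

Only the 2 non-divergent, non-self-evident pairs escape detection. Every other failure is revealed either by the failure itself or by a disagreement between the two products.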

9
Reliability Analysis (Part II)

Part II: CASRE Reliability Analysis of NCW Data Collector
Extract the following from failure logs using JavaScript: time of program start, time of program termination, times of thread terminations, and exception or failure messages
Parse failures manually into CASRE input format
Categorize by severity using the chart on the next slide
Compare 2 consecutive events (CALOE08 and MAGTF08) as well as 2 consecutive lifecycle phases within the same event (Integration and Execution)
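The "parse failures into CASRE input format" step was done by hand here. The hedged sketch below shows one way to automate it. It assumes failure times have already been pulled from the logs as hours since program start, and that test intervals have a fixed length. The function name and the zero-count row for empty intervals are hypothetical choices, and Python stands in for the JavaScript used in the study.

```python
from collections import Counter


def count_failures(failures, interval_length, num_intervals, empty_severity=1):
    """Tally failures into fixed-length test intervals, per severity.

    failures: iterable of (hours_since_start, severity_code) pairs.
    Returns (interval_number, number_of_errors, interval_length, severity)
    rows with 1-based interval numbers. Failures outside the observed
    intervals are dropped. An interval with no failures gets one zero-count
    row tagged with `empty_severity` (an assumption; check what CASRE expects).
    """
    tally = Counter()
    for hours, severity in failures:
        interval = int(hours // interval_length) + 1
        if 1 <= interval <= num_intervals:
            tally[(interval, severity)] += 1

    rows = []
    for interval in range(1, num_intervals + 1):
        keys = sorted(key for key in tally if key[0] == interval)
        if not keys:
            rows.append((interval, 0.0, float(interval_length), empty_severity))
        for _, severity in keys:
            count = float(tally[(interval, severity)])
            rows.append((interval, count, float(interval_length), severity))
    return rows
```

For example, `count_failures([(1.0, 1), (3.5, 1), (12.0, 3), (45.0, 2)], 40.0, 3)` yields one row per (interval, severity) pair seen, plus a zero row for the empty third interval.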

10
Severity
Severity Code   Failure Description
9               Failure causes machine to be rebooted, causing catastrophic loss
8               Failure causes program abort
7               Failure causes program thread abort
5               Failure causes record not to be written; thread continues
3               Failure causes incorrect data to be written; thread continues
1               Failure is caught, handled, and recovers correctly
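The severity chart can be encoded directly. That makes it easy to restrict an analysis to failures at or above a chosen severity. A small sketch follows; the enum member names and the `at_least` helper are hypothetical labels for the codes on this slide.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity codes from the chart (member names are illustrative)."""
    RECOVERED = 1           # caught, handled, and recovers correctly
    BAD_DATA_WRITTEN = 3    # incorrect data written, thread continues
    RECORD_NOT_WRITTEN = 5  # record not written, thread continues
    THREAD_ABORT = 7        # program thread aborts
    PROGRAM_ABORT = 8       # program aborts
    MACHINE_REBOOT = 9      # machine must be rebooted; catastrophic loss


def at_least(codes, threshold):
    """Keep failures whose code is >= threshold, as Severity members.

    Raises ValueError if a kept code is not one of the charted values.
    """
    return [Severity(c) for c in codes if c >= threshold]
```

For instance, `at_least(codes, Severity.THREAD_ABORT)` keeps only thread aborts and worse. Because `IntEnum` members compare as integers, the filtered list can be fed straight into numeric tallies.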
11
Using CASRE
12
Using CASRE (cont.)
13
CASRE Input Format
Time-between-failures format: N/A
Failure-count format (columns):
Interval Number (int)   Number of Errors (float)   Interval Length (float)   Error Severity (int)
Example:
Hours
1 5.0 40.0 1
1 3.0 40.0 2
1 2.0 40.0 3
2 4.0 40.0 1
2 3.0 40.0 3
3 7.0 40.0 1
4 5.0 40.0 1
5 4.0 40.0 1
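Generating this format programmatically is straightforward. The minimal sketch below assumes the layout in the example: a line naming the time unit, then whitespace-separated interval number, number of errors, interval length, and error severity, with one row per interval and severity. The function names are hypothetical, and the output should be checked against the CASRE manual before use.

```python
def to_casre_failure_count(rows, time_unit="Hours"):
    """Render (interval, errors, length, severity) rows as failure-count text."""
    lines = [time_unit]
    for interval, errors, length, severity in rows:
        lines.append(f"{int(interval)} {float(errors)} {float(length)} {int(severity)}")
    return "\n".join(lines) + "\n"


def parse_casre_failure_count(text):
    """Inverse of to_casre_failure_count: return (time_unit, rows)."""
    lines = text.strip().splitlines()
    time_unit = lines[0].strip()
    rows = []
    for line in lines[1:]:
        interval, errors, length, severity = line.split()
        rows.append((int(interval), float(errors), float(length), int(severity)))
    return time_unit, rows


if __name__ == "__main__":
    # The example rows from this slide.
    example = [(1, 5, 40, 1), (1, 3, 40, 2), (1, 2, 40, 3), (2, 4, 40, 1),
               (2, 3, 40, 3), (3, 7, 40, 1), (4, 5, 40, 1), (5, 4, 40, 1)]
    print(to_casre_failure_count(example), end="")
```

Keeping a parser next to the writer gives a cheap round-trip check on any file produced automatically.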
14
CASRE Failure Counts
CALOE/MAGTF Execution
MAGTF Integration/Execution
15
CASRE Time Between Failures
CALOE/MAGTF Execution
MAGTF Integration/Execution
16
CASRE Failure Intensity
CALOE/MAGTF Execution
MAGTF Integration/Execution
17
CASRE Cumulative Failures
CALOE/MAGTF Execution
MAGTF Integration/Execution
18
CASRE Test Interval Length
CALOE/MAGTF Execution
MAGTF Integration/Execution
19
Detecting Reliability Trends

Running Average
Not as useful for failure count data (unless test intervals are of equal length)
Computes the running average of the time between successive failures for time-between-failures data, or the running average of the number of failures per interval for failure count data
For failure count data, a running average that decreases with time (fewer failures per test interval) indicates reliability growth; for time-between-failures data, an increasing running average (longer times between failures) indicates reliability growth
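The running average itself is just a cumulative mean. The sketch below is a plain Python version with a hypothetical function name. The sample input is the per-interval totals of the failure-count format example on slide 13 (10, 7, 7, 5, 4 failures in equal 40-hour intervals); those are format illustration values, not measured results.

```python
def running_average(values):
    """Return the mean of values[:n] for n = 1..len(values).

    Feed it failures per interval (failure count data) or times between
    successive failures (time-between-failures data).
    """
    averages = []
    total = 0.0
    for n, value in enumerate(values, start=1):
        total += value
        averages.append(total / n)
    return averages


if __name__ == "__main__":
    print(running_average([10, 7, 7, 5, 4]))  # prints [10.0, 8.5, 8.0, 7.25, 6.6]
```

Because these are failure counts over equal intervals, the steadily falling average is the pattern the slide reads as reliability growth. The same function applied to times between failures would need a rising average to signal growth.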
Laplace Test
Not as useful for failure count data (unless test intervals are of equal length)
Null hypothesis: failure occurrences follow a homogeneous Poisson process (no trend)
If the test statistic decreases with increasing failure number, the null hypothesis can be rejected in favor of reliability growth at an appropriate significance level; if it increases, the null hypothesis can be rejected in favor of reliability decrease
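For failure count data over k equal-length intervals, a commonly used form of the Laplace factor is given below, where n(i) is the number of failures in interval i.

u(k) = [ sum over i=1..k of (i - 1) n(i) - ((k - 1)/2) sum over i=1..k of n(i) ] / sqrt( ((k^2 - 1)/12) sum over i=1..k of n(i) )

Under the null hypothesis u(k) is approximately standard normal. Values below roughly -1.96 therefore suggest significant reliability growth at the 5% level, and values above +1.96 suggest significant decrease. The sketch below computes u(k) for each prefix of the data. It is an illustration of the textbook statistic, not CASRE's internal code; the function name is hypothetical.

```python
import math


def laplace_factors(counts):
    """Laplace factor u(k) for k = 2..len(counts), assuming equal-length intervals.

    counts[i] is the number of failures in interval i + 1.
    A downward drift (negative values) suggests reliability growth;
    an upward drift (positive values) suggests reliability decrease.
    """
    factors = []
    for k in range(2, len(counts) + 1):  # u(1) is undefined (zero denominator)
        window = counts[:k]
        total = sum(window)
        if total == 0:
            factors.append(float("nan"))  # no failures yet: statistic undefined
            continue
        # (i - 1) for 1-based i is the same as i for 0-based enumerate.
        weighted = sum(i * n for i, n in enumerate(window))
        numerator = weighted - (k - 1) / 2 * total
        denominator = math.sqrt((k * k - 1) / 12 * total)
        factors.append(numerator / denominator)
    return factors
```

With identical counts in every interval the numerator is exactly zero, which matches the "no trend" null hypothesis. With the slide-13 example totals `[10, 7, 7, 5, 4]` the factors are all negative and decreasing.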

20
Running Average
CALOE/MAGTF Execution
MAGTF Integration/Execution
21
Laplace Test
CALOE/MAGTF Execution
MAGTF Integration/Execution
22
CASRE Cumulative Failure Predictions
CALOE/MAGTF Execution
MAGTF Integration/Execution
23
CASRE Prediction Setup
CALOE/MAGTF Execution
MAGTF Integration/Execution
24
CASRE Reliability Prediction
CALOE/MAGTF Execution
MAGTF Integration/Execution
25
CASRE Prequential Likelihood
CALOE/MAGTF Execution
MAGTF Integration/Execution
26
CASRE Model Ranking
CALOE/MAGTF Execution
MAGTF Integration/Execution
27
Reliability Models

Haven't been able to get these to run yet
The instruction manual says many of the built-in models only work with time-between-failures data
Doubt there would be much utility with failure count data

28
Conclusion/Follow-Up

It would actually be quite easy to integrate auto-generation of failure count or time-between-failures output into my environment
This would facilitate quick trend analysis
Reliability trends, not the actual numbers, are what is important
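Here is a rough sketch of the time-between-failures side of that auto-generation. The function name is hypothetical. It assumes failure times across program restarts have already been placed on one cumulative execution-time clock, which the logged program start and termination times would allow.

```python
def times_between_failures(failure_times, start=0.0):
    """Turn absolute failure times into gaps between successive failures.

    failure_times: failure instants on one cumulative clock (e.g., hours of
    execution since the first program start). The first gap is measured
    from `start`.
    """
    gaps = []
    previous = start
    for t in sorted(failure_times):
        gaps.append(t - previous)
        previous = t
    return gaps
```

For example, failures at 2.0, 5.0, and 9.5 hours give gaps of 2.0, 3.0, and 4.5 hours. Lengthening gaps like these are the kind of trend that matters more than the absolute values.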
