Title: Quiz
1 Quiz
- T/F: TQM is a clearly defined quality management process standard.
- Define the following:
  - Defect Rate
  - FPA
  - Ratio Scale
  - OO
  - Ordinal Scale
- List at least 5 quality parameters/attributes used to measure software quality (from the customer perspective).
- Why is LOC a poor measure of code size?
2 Project Sample: OS X
- Project: Replaced Carbon
  - and NeXT and Yellow Box and...
- Developers
- Customers
- The Media
- iCEO
10 Software Quality Engineering - CS410
- Class 3a
- Measurement Theory
11 Measurement Theory
- "It is an undisputed statement that measurement is crucial to the progress of all sciences" (Kan 1995)
- "Scientific progress is made through observations and generalizations based on data and measurements, the derivation of theories as a result, and in turn the confirmation or refutation of theories via hypothesis testing" (Kan 1995)
12 Measurement Theory
- Basic measurement theory steps
  - Proposition - an idea is proposed
  - Definition - components of the idea are defined
  - Operational Definition - operational characteristics of the components are identified
  - Metric Definition - metrics are identified based on the operational definition
13 Measurement Theory
- Basic measurement theory steps (cont.)
  - Hypothesis Definition - hypotheses are drawn from a combination of the proposition and definitions
  - Testing and Metric Gathering - testing is performed and empirical data is collected
  - Confirmation or Refutation of Hypothesis - hypotheses are confirmed or refuted based on analysis of the empirical data
14 Measurement Theory
- Example
  - Proposition - the more rigorously the front end of the software development process is executed, the better the quality at the back end
  - Definitions
    - Front end of the SW process = design through unit test
    - Back end of the SW process = integration through system test
    - Rigorous implementation = total adherence to the process (assume the process designates 100% design and code inspections)
15 Measurement Theory
- Operational Definitions
  - Rigorous implementation can be measured by the amount of design inspection and lines of code (LOC) inspection
  - Back end quality means a low number of defects found in system test
- Metric Definitions
  - Design inspection coverage can be expressed as the percentage of designs inspected
  - LOC inspection coverage can be expressed as the percentage of LOC inspected
  - Back end quality can be expressed as defects per thousand lines of code (KLOC)
16 Measurement Theory
- Hypothesis definition(s)
  - The higher the percentage of designs and code inspected, the lower the defect rate will be at system test.
- Testing and metric gathering (multiple projects)
  - Track and record inspection coverage
  - Track and record defects found in system testing
- Confirmation or refutation of hypothesis
  - Analyze data
  - Hypothesis supported?
17 Measurement Theory
- The operationalization (definition) process produces metrics and indicators for which data can be collected, so the hypotheses can be tested empirically.
- In other words - you have to gather, analyze, and compare data to determine whether the hypothesis holds.
18 Level of Measurement
- How measurements are classified and compared
  - Nominal Scale
  - Ordinal Scale
  - Interval Scale
  - Ratio Scale
- Scales are hierarchical; each higher-level scale possesses all of the properties of the lower ones.
- Operationalization should take advantage of the highest-level scale possible (i.e., don't use low/medium/high if you can use 1-10)
19 Level of Measurement
- Nominal Scale
  - Lowest level scale
  - Classification of items (sort items into categories)
  - Two requirements
    - Jointly exhaustive (all items can be categorized)
    - Mutually exclusive (only one category applies)
  - Names of categories and their sequence order bear no assumptions about relationships between categories
  - Example
    - Categories of SW development: Waterfall, Spiral, Iterative, OO
    - Does not imply that Waterfall is better/greater than Spiral
20 Level of Measurement
- Ordinal Scale
  - Like nominal, except comparison can be applied
  - But - we cannot determine the magnitude of the difference
  - Example
    - Categories of SW dev orgs based on CMM levels (1-5)
    - We can state that dev orgs at level 2 are more mature than orgs at level 1, and so on...
    - But we cannot state how much better 2 is than 1, or 3 is than 2, or 3 is than 1, and so on
  - The Likert rating scale is often used with this scale
    - 1 = completely dissatisfied
    - 2 = somewhat dissatisfied
    - 3 = neutral
    - 4 = satisfied
    - 5 = completely satisfied
21 Level of Measurement
- Interval Scale
  - Like the ordinal scale, except now we can determine exact differences between measurement points
  - Can use addition/subtraction expressions
  - Requires establishment of a well-defined, repeatable unit of measurement
  - Example of an interval scale
    - Temperature in Fahrenheit (vs. cool, warm, hot)
    - Day 1's high temperature was 80 degrees
    - Day 2's high temperature was 87 degrees
    - Day 2 was 7 degrees warmer than day 1 (addition)
    - Day 1 was 7 degrees cooler than day 2 (subtraction)
22 Level of Measurement
- Ratio Scale
  - Interval scale with an absolute, non-arbitrary zero point
  - Highest level scale
  - Can use multiplication and division
  - Example
    - MBNQA scores
    - Company A scored 800 in the range of 0...1000
    - Company B scored 400 in the range of 0...1000
    - Company A doubled Company B's score (multiplication)
    - Company B scored half as well as Company A (division)
23 Basic Measures
- Measures are ways of analyzing and comparing data to extract meaningful information.
- Data vs. Information
  - Data - raw numbers or facts
  - Information
    - relevant - related to the subject
    - qualified - characteristics specified
    - reliable - dependable, high confidence level
- Basic measures
  - Ratio
  - Proportion
  - Percentage
  - Rate
24 Basic Measures
- Ratio
  - Result of dividing one quantity by another
  - Best use is with two distinct groups
  - Numerator and denominator are mutually exclusive
  - Example 1 (both examples sketched in code below)
    - Developers = 10, Testers = 5
    - Developer-to-Tester ratio = 10 / 5 x 100 = 200%
  - Example 2
    - Developers = 5, Testers = 10
    - Developer-to-Tester ratio = 5 / 10 x 100 = 50%
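A minimal sketch of the ratio arithmetic above (Python is used only for illustration; the developer and tester counts are the slide's hypothetical values):

```python
def ratio_percent(numerator, denominator):
    """Ratio of two mutually exclusive groups, expressed per 100."""
    return numerator / denominator * 100

# Example 1: 10 developers, 5 testers -> 200%
print(ratio_percent(10, 5))   # 200.0
# Example 2: 5 developers, 10 testers -> 50%
print(ratio_percent(5, 10))   # 50.0
```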
25 Basic Measures
- Proportion
  - Best use is with multiple categories within one group
  - For n categories (C) in the group (G): C1/G + C2/G + ... + Cn/G = 1
  - Proportion of a category = desired category / total group size
  - Example
    - Number of customers surveyed = 50
    - Number of satisfied customers = 30
    - Proportion of satisfied customers = 30 / 50 = .6
    - Proportion of unsatisfied customers = 20 / 50 = .4
    - satisfied (.6) plus unsatisfied (.4) = 1
26 Basic Measures
- Percentage
  - A proportion expressed in terms of per-hundred units
  - Percentages represent relative frequencies
  - Total number of cases should always be included
  - Total number of cases should be sufficiently large
  - Example (proportion and percentage sketched in code below)
    - 200 bugs found in 8 KLOC
    - 30 requirements bugs (30 / 200 x 100) = 15%
    - 50 design bugs (50 / 200 x 100) = 25%
    - 100 code bugs (100 / 200 x 100) = 50%
    - 20 other bugs (20 / 200 x 100) = 10%
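The proportion and percentage measures from the last two slides can be sketched the same way (illustrative Python; the survey and bug counts are the slides' hypothetical figures):

```python
def proportion(part, whole):
    """Fraction of one category within a single group (0..1)."""
    return part / whole

def percentage(part, whole):
    """A proportion expressed per hundred units."""
    return proportion(part, whole) * 100

# Proportion example: 30 of 50 surveyed customers are satisfied
print(proportion(30, 50), proportion(20, 50))   # 0.6 0.4 (the proportions sum to 1)

# Percentage example: 200 bugs classified by type
bugs = {"requirements": 30, "design": 50, "code": 100, "other": 20}
total = sum(bugs.values())
for kind, count in bugs.items():
    print(kind, percentage(count, total))        # 15.0, 25.0, 50.0, 10.0
```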
27 Basic Measures
- Rate
  - Associated with dynamic changes of a quantity over time
  - Changes in y per unit of x
    - x is usually a quantity of time
    - the time unit of x must be expressed
  - Example (sketched in code below)
    - Opportunities For Error (OFE) = 5000 (note 1: based on 5 KLOC)
    - Number of defects = 200 (note 2: after one year)
    - Defect rate = 200 / 5000 x 1K = 40 defects per KLOC
  - Notes
    - 1 - extremely hard to determine OFE
    - 2 - hard to know when to measure
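A small sketch of the defect-rate calculation, assuming (as the slide does) that opportunities for error are approximated by KLOC and that the time window is one year:

```python
def defect_rate_per_kloc(defects, loc):
    """Defects per thousand lines of code over the stated time period."""
    return defects / loc * 1000

# 200 defects observed in one year against 5 KLOC (5000 LOC)
print(defect_rate_per_kloc(200, 5000))   # 40.0 defects per KLOC
```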
28 Basic Measures
- Rate
  - Six Sigma
    - A specific defect rate of 3.4 defective parts per million (ppm), which has become an industry standard for the ultimate quality goal.
    - Sigma is the Greek symbol for standard deviation
    - By definition, if the variations in the process are reduced, then it is easier to obtain Six Sigma quality
    - Some problems arise in SW engineering
      - What are the "parts"?
        - lines of source code?
        - lines of assembly code?
29 Reliability
- Reliability - consistency of a number of measurements taken using the same measurement method on the same subject
- High degree of reliability - repeated measurements are consistent
- Low degree of reliability - repeated measurements have large variations
- Operational definitions (the specifics of how a measurement is taken) are key to achieving high degrees of reliability
30 Validity
- Validity is whether the measurement really measures what is intended to be measured
- Construct Validity - validity of a metric to represent a theory
  - Difficult to validate abstract concepts
  - Example
    - Concept - intelligent people attend college
    - Measurement - sum of college enrollment
    - Conclusion - the sum of the college enrollment is the number of intelligent people
    - Not valid
31 Validity
- Criterion-related (predictive) Validity - validity of a metric to predict a theory or relationship
  - Example
    - Concept - safe driving requires knowledge of the rules and regulations
    - Measurement - driver's license test
    - Conclusion - those that have low scores on driver's license tests are more likely to have an accident
- Content Validity - the degree to which a metric covers the meaning of the concept
  - Example - a general math knowledge test needs to include more than just addition and subtraction.
32 Measurement Errors
- Two types of measurement errors
  - Systematic Errors - errors associated with validity
  - Random Errors - errors associated with reliability
- Example
  - A bathroom scale which is off by 10 pounds
  - Each time the scale is used, the reading equals
    - actual weight + 10 pounds + variation
    - = true value + systematic error + random error
  - the systematic error makes the reading invalid
  - the random error makes the reading unreliable
33 Measurement Errors
- Ways of assessing reliability
  - Test/Retest - one or more retests are performed and the results compared to previous tests
    - May expose random errors
  - Alternative-form - acquire the same measurements using alternate testing means
    - May expose systematic errors
34 Correlation
- Correlation - a statistical method for assessing relationships among observed or empirical data sets
- If the correlation coefficient between two variables is weak, then there is no linear correlation (but there may be a non-linear one)
- Example - a negative linear relationship between LOC inspected and defects shipped (sketched in code below)
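A hedged illustration of checking for a linear relationship. The `inspected_loc_pct` and `shipped_defect_rate` values are invented for demonstration, and Pearson's r is only one common way to estimate linear correlation:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: higher inspection coverage, fewer shipped defects
inspected_loc_pct   = [20, 40, 60, 80, 95]
shipped_defect_rate = [9.0, 7.5, 5.0, 3.5, 2.0]
print(pearson_r(inspected_loc_pct, shipped_defect_rate))  # close to -1: strong negative linear relationship
```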
35 Causality
- Identification of cause-and-effect relationships in experiments
- Three criteria for cause-effect
  - 1. Cause must precede effect
  - 2. The two variables are empirically related (the relationship can be measured)
  - 3. The empirical relationship is direct (not coincidence, or in error)
36 Summary
- Operational definitions are valuable in determining the levels and types of metrics to use
- Scales and measures have different characteristics and different intended uses
  - Avoid using the wrong scale or measure
- Validity and Reliability represent measurement quality
- Correlation and Causality are goals of measurement (i.e. the quest to identify and prove a cause-effect relationship)
37 Follow-up
- List at least 5 quality parameters/attributes used to measure software quality from the customer perspective
38 Pop Quiz
- What is the difference between validity and reliability?
- Why are software development process models important to the study of software quality?
- Define Six Sigma
- Define MTTF
- T/F: Defect density and PUM combined represent a true measure of customer satisfaction.
- T/F: If a hypothesis is refuted, then the wrong metrics were used.
39 Software Quality Engineering - CS410
- Class 3b
- Product Quality Metrics
- Process Quality Metrics
- Function Point analysis
40 Software Quality Metrics
- Three kinds of software quality metrics
  - Product Metrics - describe the characteristics of the product
    - size, complexity, design features, performance, and quality level
  - Process Metrics - used for improving the software development/maintenance process
    - effectiveness of defect removal, pattern of defect arrival during testing, and response time of fixes
  - Project Metrics - describe the project characteristics and execution
    - number of developers, cost, schedule, productivity, etc.
    - fairly straightforward
41 Software Quality Metrics
- Product Metrics
  - Mean Time To Failure (MTTF)
  - Defect Density
  - Problems per User Month (PUM)
  - Customer Satisfaction
- Process/Project Metrics
  - Defect density during machine testing
  - Defect arrival patterns during machine testing
  - Phase-based defect removal
  - Defect removal effectiveness
42 Software Quality Metrics
- Some terminology
  - Error - a human mistake that results in incorrect (or incomplete) software
    - faulty requirement, design flaw, coding error
  - Fault (a.k.a. defect) - a condition within the system that causes a unit of the system to not function properly
    - GPF, abend, crash, lock-up, deadlock, error message, etc.
  - Failure - a required function (i.e. the goal) cannot be performed
  - An error results in a fault, which may cause one or more failures.
43 MTTF
- Mean Time To Failure (MTTF) - measures how long the software can run before it encounters a crash
- A difficult measurement to obtain because it is tied to the real use of the product
  - Easier to define requirements for special-purpose software than for general-use software
- MTTF is not widely used by commercial software developers for these reasons
44 Defect Density
- Defect Density (a.k.a. Defect Rate) - the estimated number of defects
  - Estimated, because defects are found throughout the entire life cycle of the product
- Important for cost and resource estimates for the maintenance phase of the life cycle
45 Defect Density
- More specifically
  - Defect Density (rate) = number of defects / opportunities for error, during a specified time frame
  - Number of defects can be approximated as the number of unique causes of observed failures
  - Opportunities for error can be expressed as KLOC
  - Time frame (life of product, or LOP) varies
46 Defect Density
- Defect Density Example (sketched in code below)
  - Product is one year old, and is 10 KLOC
  - Unique causes of observed failures = 50
  - Current Defect Density = 50 / 10K x 1K = 5 defects per KLOC per year
  - After the second year
    - Unique causes of observed failures = 75
    - Current Defect Density = 75 / 10K x 1K = 7.5 defects per KLOC per 2 years, or 3.75 defects per KLOC per year
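The two-year calculation above can be sketched as follows (illustrative Python; the failure counts and the 10 KLOC size come from the slide's example):

```python
def defect_density(unique_failure_causes, loc, years=1):
    """Defects per KLOC per year, using unique causes of observed failures."""
    per_kloc = unique_failure_causes / loc * 1000
    return per_kloc / years

print(defect_density(50, 10_000, years=1))   # 5.0 defects per KLOC per year
print(defect_density(75, 10_000, years=2))   # 3.75 defects per KLOC per year
```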
47 Defect Density
- Comparison Issues
  - How LOC is calculated
    - Count only executable lines
      - Note - what is an executable line? HLL vs. Assembler
    - Count executable lines, plus data definitions
    - Count executable lines, plus data definitions, plus comments
    - Count executable lines, plus data definitions, plus comments, plus job control language
    - Count physical lines
    - Count logical lines (e.g., lines terminated by ';')
  - Function Point Analysis (FPA) is an alternative measure of program size
48 Defect Density
- Comparison Issues (cont.)
  - Time frames must be the same
    - Cannot compare the (current) defect rate of a one-year-old product to the (current) defect rate of a four-year-old product
    - IBM considers the life of a product to be 4 years
  - Must account for new and modified code in the LOC count (otherwise the metric is skewed)
  - LOC counting must remain consistent
  - Defect rate should be calculated for each release (must use change flags)
49 Defect Density
- Change Flags Example
    /* Module A - Prolog                                  */
    /* Release 1.1 modifications  12/01/97        @R11    */
    /* Fix for problem report 1127  03/15/98      @F1127  */
    ...
    Total_Records = 0                            /* Init records            @R11A   */
    ...
    Bad_Records = Total_Records - Good_Records   /* Calculate num bad recs  @F1127C */
- Flags (a.k.a. Change Control) - CMM level 2
  - A - line added by release/fix
  - C - line changed by release/fix
  - M - line moved by release/fix
  - D - line deleted by release/fix (optional)
50 Defect Density
- IBM Example (defect-rate arithmetic sketched in code below)
  - SSI (current release) = SSI (previous release) + CSI - Deleted - Changed
    - SSI - Shipped Source Instructions
    - CSI - Changed (and new) Source Instructions
  - Defect Rate Metrics for the Current Release
    - TVUA/KSSI - all APARs (defects) reported on the total release (inclusive of the previous release)
    - TVUA/KCSI - all APARs (defects) reported on the new release code
    - APAR - Authorized Program Analysis Report (Severity 1-4)
    - TVUA - Total Valid Unique APARs
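A minimal sketch of the IBM-style bookkeeping, under the assumption (from the slide) that the current shipped size is the previous SSI plus CSI minus deleted and changed lines, and that defect rates are normalized per thousand instructions. The release sizes and APAR counts below are invented for illustration:

```python
def current_ssi(prev_ssi, csi, deleted, changed):
    """Shipped Source Instructions for the current release."""
    return prev_ssi + csi - deleted - changed

def rate_per_thousand(tvua, instructions):
    """Valid unique APARs per thousand instructions (KSSI or KCSI)."""
    return tvua / instructions * 1000

# Hypothetical release: 100K previous SSI, 20K changed/new, 2K deleted, 5K changed
ssi = current_ssi(100_000, 20_000, 2_000, 5_000)
print(rate_per_thousand(150, ssi))      # TVUA/KSSI for the whole release
print(rate_per_thousand(90, 20_000))    # TVUA/KCSI for the new release code only
```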
51 Customer Problem Metrics
- In addition to valid defects, other issues are viewed as problems by customers
  - Usability
  - Unclear documentation/information
  - Missing documentation/information
  - Duplicate problems (counted as invalid)
  - User errors (traps)
52 Customer Problem Metrics
- From the customer's perspective, the total problem space is the combination of the defect-oriented problems and the non-defect-oriented problems. They all impact the customer, regardless of how the SW company classifies them.
- Total problems can be expressed as Problems per User Month (PUM)
  - PUM = Total Problems / License-Months
  - License-Months = Total number of licenses x number of months in the calculation period
53 Customer Problem Metrics
- PUM example (sketched in code below)
  - Total problems = 75, Licenses = 50, Months = 6
  - PUM = 75 / (50 x 6) = .25 problems per user month
- PUM is usually calculated for each month after a software release, and averaged over the year.
- Note - PUM counts a defect multiple times, depending on how pervasive it is (i.e. mainstream function defects are costly)
- Ways to lower PUM
  - Improve the development process to reduce defects
  - Reduce non-defect-oriented problems (better documentation, usability, etc.)
  - Increase the number of licenses (?!)
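The PUM arithmetic, sketched with the slide's hypothetical figures (75 problems, 50 licenses, 6 months):

```python
def problems_per_user_month(total_problems, licenses, months):
    """PUM = total problems / (licenses x months in the calculation period)."""
    return total_problems / (licenses * months)

print(problems_per_user_month(75, 50, 6))   # 0.25 problems per user month
```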
54 Customer Satisfaction
- PUM and Defect Rate are not true measurements of customer satisfaction, but they do contribute.
- Timing, availability, company image, services, and (customized) customer solutions also contribute.
- Customer satisfaction is usually measured using the five-point Likert scale, via a customer survey
  - 1 - Very dissatisfied
  - 2 - Dissatisfied
  - 3 - Neutral
  - 4 - Satisfied
  - 5 - Very satisfied
55 Customer Satisfaction
- Common metrics for customer satisfaction
  - Percent of very satisfied customers
  - Percent of satisfied customers (very satisfied and satisfied)
  - Percent of dissatisfied customers (dissatisfied and very dissatisfied)
  - Percent of non-satisfied customers (neutral, dissatisfied, and very dissatisfied)
- Scope of the three quality metrics (defects, customer problems, customer satisfaction): Fig. 4.1, p. 94
56 Defect Density During Machine Testing
- Machine Testing - testing after code is integrated into the system library (i.e. integration testing, function testing, system testing, regression testing)
- Commonly held beliefs
  - There is a positive correlation between the defect rates found during testing and the number of defects injected during development.
  - There is a positive correlation between the defect rates found during testing and the defect rate once the product is released.
- Counter-argument: better testing will uncover more defects (i.e. maybe the code is just being tested better)
57 Defect Density During Machine Testing
- Release quality (decision logic sketched in code below)
  - If the defect rate during testing is the same as or lower than the previous release, then
    - If the current release's testing is worse, then
      - testing needs to be improved (inconclusive about quality)
    - Else, if the current release's testing is the same (or better), then
      - the quality is better than the previous release
  - If the defect rate during testing is higher than the previous release, then
    - If the testing process was improved, then
      - the quality is the same as or better than the previous release
    - Else, if the testing process was not improved, then
      - the quality is worse than the previous release (more defects)
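The comparison above can be written out as a small decision function. This is only a sketch of the slide's reasoning: the verdict strings and the `testing_vs_previous` flag ('worse', 'same', or 'better') are assumptions introduced for illustration:

```python
def release_quality_verdict(rate_now, rate_prev, testing_vs_previous):
    """Interpret the current release's test defect rate against the previous release."""
    if rate_now <= rate_prev:
        if testing_vs_previous == "worse":
            return "inconclusive - testing needs to be improved"
        return "quality is better than the previous release"
    if testing_vs_previous == "better":
        return "quality is the same as or better than the previous release"
    return "quality is worse than the previous release"

print(release_quality_verdict(4.0, 5.0, "same"))    # better quality
print(release_quality_verdict(6.5, 5.0, "worse"))   # worse quality
```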
58 Defect Arrival Rate During Machine Testing
- Defect arrival rate provides more information to supplement the defect density metric
- This metric is a view of the patterns and time between defects.
- Different arrival patterns (can) indicate different quality levels in the software.
- Objective - to see declining and stabilizing arrival rates over time
- Supports the idea of "shake-out" testing, where you attempt to find all of the highest-level bugs first so that additional testing is not impacted.
59 Defect Arrival Rate During Machine Testing
- Three different metrics for arrival rate
  - Raw defect arrivals (including duplicates and invalids) during the testing phase, per some time interval (day, week, month, etc.)
  - Valid defect arrivals during the testing phase, per some time interval
  - Defect backlog over time - this is a measure of workload, which could adversely affect quality.
60 Phase-Based Defect Removal Pattern
- An extension of the defect density metric.
- Defects are tracked at all (inspection/test) phases of the development cycle (design reviews, code reviews, unit test, integration test, function test, and system test).
- This metric can be correlated to inspection coverage and test coverage metrics.
- Helps to identify the overall defect removal ability of the development process.
- Fig. 4.3, p. 103
61 Defect Removal Effectiveness
- Defect Removal Effectiveness (DRE)
  - DRE = (defects removed in the phase / defects latent in the product) x 100
  - Where the latent defects can be calculated as the sum of all defects found in later phases and in the field (this is a constantly changing number)
62 Defect Removal Effectiveness
- Example - Defects per phase
  - HLD (I0) review: I0 = 5
    - (found = 5, latent = 4, total = 9), DRE = (5/9 x 100) = 55%
  - LLD (I1) review: I0 = 3, I1 = 4
    - (found = 4, latent = 6, total = 10), DRE = (4/10 x 100) = 40%
  - Code inspection (I2): I0 = 1, I1 = 1, I2 = 10
    - (found = 10, latent = 6, total = 16), DRE = (6/16 x 100) = 38%
  - Unit Test (UT): I0 = 0, I1 = 1, I2 = 5, UT = 3
    - (found = 3, latent = 1, total = 4), DRE = (3/4 x 100) = 75%
  - Component Test (CT): I0 = 0, I1 = 0, I2 = 1, UT = 1, CT = 3
    - (found = 3, latent = 1, total = 4), DRE = (3/4 x 100) = 75%
  - System Test (ST): I0 = 0, I1 = 0, I2 = 0, UT = 0, CT = 1, ST = 2
    - (found = 2, latent = 1, total = 3), DRE = (2/3 x 100) = 67%
  - Field (2): I0 = 0, I1 = 0, I2 = 0, UT = 0, CT = 0, ST = 1
63 Defect Removal Effectiveness
- Example - Defects per phase, corrected (DRE arithmetic sketched in code after the notes)
  - HLD (I0) review: I0 = 5
    - (found = 5, latent = 4, total = 9), DRE = (5/9 x 100) = 55%
  - LLD (I1) review: I0 = 3, I1 = 4
    - (found = 4, latent = 6, total = 10), DRE = (4/10 x 100) = 40%
  - Code inspection (I2): I0 = 1, I1 = 1, I2 = 10
    - (found = 10, latent = 6, total = 16), DRE = (10/16 x 100) = 62.5%
  - Unit Test (UT): I0 = 0, I1 = 1, I2 = 5, UT = 3
    - (found = 3, latent = 1, total = 4), DRE = (3/4 x 100) = 75%
  - Component Test (CT): I0 = 0, I1 = 0, I2 = 1, UT = 1, CT = 3
    - (found = 3, latent = 1, total = 4), DRE = (3/4 x 100) = 75%
  - System Test (ST): I0 = 0, I1 = 0, I2 = 0, UT = 0, CT = 1, ST = 2
    - (found = 2, latent = 1, total = 3), DRE = (2/3 x 100) = 67%
  - Field (2): I0 = 0, I1 = 0, I2 = 0, UT = 0, CT = 0, ST = 1
64 Defect Removal Effectiveness
- Notes
  - Must account for where a defect was introduced.
  - As the number of field bugs increases, DRE must be recalculated.
  - Latent - present, but not evident (at this phase).
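A small sketch of the DRE arithmetic from the formula on slide 61, using the found/latent pairs of the example above (the counts are the slides' hypothetical values):

```python
def dre(found_in_phase, latent_after_phase):
    """Share of the defects present that the phase removed, as a percentage."""
    total = found_in_phase + latent_after_phase
    return found_in_phase / total * 100

phases = {                       # (found, latent) pairs from the example
    "HLD review (I0)": (5, 4),
    "LLD review (I1)": (4, 6),
    "Code inspection": (10, 6),
    "Unit Test":       (3, 1),
    "Component Test":  (3, 1),
    "System Test":     (2, 1),
}
for name, (found, latent) in phases.items():
    print(f"{name}: DRE = {dre(found, latent):.1f}%")   # 55.6, 40.0, 62.5, 75.0, 75.0, 66.7
```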
65 Function Point Analysis (FPA)
- Alternative size measure to LOC
- Can measure productivity (function points per person) and quality (defects per function point)
- Idea: the defect rate should be measured against how many functions the software provides
- Functionality is independent of code size
66 Function Point Analysis (FPA)
- Function Points is a weighted total of five major components
  - External inputs x 4
  - External outputs x 5
  - Logical internal files x 10
  - External interface files x 7
  - External inquiries x 4
67 Function Point Analysis (FPA)
- Low and high weighting factors are used to account for complexity
  - External inputs: low = 3, high = 6
  - External outputs: low = 4, high = 7
  - Logical internal files: low = 7, high = 15
  - External interface files: low = 5, high = 10
  - External inquiries: low = 3, high = 6
- Function Count (FC) is then calculated
  - FC = sum of each weighted component
68 Function Point Analysis (FPA)
- 14 system characteristics are then assessed for impact, on a scale of 0 to 5
  - 1. Data communications
  - 2. Distributed functions
  - 3. Performance
  - 4. Heavily used configuration
  - 5. Transaction rate
  - 6. On-line data entry
  - 7. End-user efficiency
  - 8. On-line update
  - 9. Complex processing
  - 10. Reusability
  - 11. Installation ease
  - 12. Operational ease
  - 13. Multiple sites
  - 14. Facilitation of change
69 Function Point Analysis (FPA)
- Value Adjustment Factor (VAF) is then calculated (a.k.a. Processing Complexity Adjustment)
  - VAF = 0.65 + (0.01 x C)
  - where C = the sum of all the complexity ratings
- Then Function Points (FP) are calculated (full calculation sketched in code below)
  - FP = FC x VAF
- The resulting value is the function point rating for the software. This number can also be used to convert to a LOC rating for comparison purposes.
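Putting the FPA pieces together as a sketch: the component counts, complexity choices, and the 14 characteristic ratings below are invented for illustration; the weights and the VAF formula are the ones given on the previous slides.

```python
# (low, high) weights per component type, from slide 67
WEIGHTS = {
    "external_inputs":          (3, 6),
    "external_outputs":         (4, 7),
    "logical_internal_files":   (7, 15),
    "external_interface_files": (5, 10),
    "external_inquiries":       (3, 6),
}

def function_count(counts):
    """counts maps component type -> (number_of_items, 'low' or 'high' complexity)."""
    total = 0
    for kind, (n, complexity) in counts.items():
        low, high = WEIGHTS[kind]
        total += n * (low if complexity == "low" else high)
    return total

def value_adjustment_factor(ratings):
    """ratings: the 14 system characteristics, each rated 0..5."""
    return 0.65 + 0.01 * sum(ratings)

# Hypothetical system
counts = {
    "external_inputs":          (10, "low"),
    "external_outputs":         (6, "high"),
    "logical_internal_files":   (4, "low"),
    "external_interface_files": (2, "high"),
    "external_inquiries":       (8, "low"),
}
ratings = [3] * 14                           # middling impact across all 14 characteristics

fc = function_count(counts)                  # 10*3 + 6*7 + 4*7 + 2*10 + 8*3 = 144
fp = fc * value_adjustment_factor(ratings)   # 144 x (0.65 + 0.42) = 154.08
print(fc, fp)
```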
70 Summary
- Product Quality Metrics - focus on the quality aspects of the product, both intrinsic and from the customer's viewpoint
  - Mean Time To Failure
  - Defect Density
  - Problems per User Month
  - Customer Satisfaction
71 Summary (cont.)
- Process Quality Metrics - focus on the quality and effectiveness of the process
  - Defect density during machine testing
  - Defect arrival rate during machine testing
  - Phase-based defect removal
  - Defect removal effectiveness
- Function Point Analysis
  - An alternative method to LOC counting