Title: Evaluation of Information Systems Introduction and Overview
1. Evaluation of Information Systems: Introduction and Overview
2. Syllabus
- This class focuses on understanding the types of measurements which can support a software development or maintenance project
- We will use the statistics program SPSS to manipulate data and generate graphs
- The Kan text is supplemented by optional readings
3. My Biases
- DOD and FAA background
- Systems Engineering approach - because software doesn't live in a vacuum!
- Mostly work with long-lived systems, so maintenance issues get lots of attention
- Metrics focus on supporting decision making during a project
4. Why So Many Military Sources?
- They have vast experience with complex software and systems development and acquisition,
- Which was paid for with tax dollars, so
- Much of it's FREE!
5. Who Cares?
- ...about statistics and measuring software activities?
- The main models for guiding a software project, ISO 9000 and the Capability Maturity Model Integration (CMMI), both recommend use of statistical process control (SPC) techniques to help predict future performance by an organization
6. Software Crisis
- For every six new large-scale software systems put into operation, two others are canceled
- The average software development project overshoots its schedule by 50%
- Three quarters of all large-scale systems are operating failures that either do not function as intended or are not used at all
7. Software Crisis
- Most computer code is handcrafted from raw programming languages by artisans using techniques they neither measure nor are able to repeat consistently
- There is a desperate need to evaluate software product and process through measurement and analysis
- That's why we have required this course!
8. Waterfall Life Cycle Model
9. Waterfall Model
- Conceptual Development includes defining the overall purpose of the product, who would use it, and how it relates to other products
- Requirements Analysis includes definition of WHAT the product must do, such as performance goals, types of functionality, etc.
10. Waterfall Model
- Architectural Design, or high level design, determines the internal and external interfaces, component boundaries and structures, and data structures
- Detailed Design, or low level design, breaks the high level design down into detailed requirements for every module
11. Waterfall Model
- Coding is the actual writing of source code, scripts, macros, and other artifacts
- Unit Testing covers testing the functionality of each module against its requirements
- System Testing can include string or component tests of several related modules, integration testing of several major components, and full scale system testing
12. Waterfall Model
- After system testing, there may be early release options, such as alpha and beta testing, before official release of the product
- Early releases test the ability of your organization to deliver and support the product, respond to customer inquiries, and fix problems
13. Prototyping Life Cycle
- When requirements are very unclear, an iterative prototyping approach can be used to resolve interface and feature requirements before the rest of development is done
- Do preliminary requirements analysis
- Iterate Quick Design, Build Prototype, Refine Design until the customer is happy
14. Prototyping Life Cycle
- Then resume full scale development of the system using some other life cycle model
- It's critical to do quick development cycles during prototyping, or else you're just redeveloping the whole system over and over
15. Spiral Life Cycle
- Used for resolving severe risks before development begins, the spiral life cycle uses more types of techniques than just prototyping to resolve each big risk
- Then another life cycle is used to develop the system
16. Iterative Life Cycle
- Many modern techniques, such as the Rational Unified Process (RUP), advocate an iterative life cycle
- RUP has four major phases, defined by the maturity of the system rather than by traditional life cycle activities:
- Inception, Elaboration, Construction, and Transition
17. Iterative Life Cycle
- Like the spiral, iterative life cycles are driven by the need to resolve key risks, but here they are resolved all the way to implementation
- Much more focus on early implementation of the core system, then building on it with each iteration
18. Cleanroom Methodology
- The Cleanroom methodology is an extremely rigorous approach to software development
- Uses formal design specification, statistical testing, and no unit testing
- Produces software with certifiable levels of reliability
- Very rarely used
19. Life Cycle Standards
- The IEEE Software Engineering Standards are one source of information on many aspects of software development and maintenance
- The standard ISO/IEC 12207, Software Life Cycle Processes, has collected all major life cycle activities into one overall guidance document
You can download ISO/IEC 12207; see the IEEE instructions on my web site
20. Process Maturity Models
- Quality standards and goals are often embodied in process maturity standards, to guide organizations' process improvement efforts
- The primary software standard is the Software Engineering Institute's (SEI's) Capability Maturity Model Integration (CMMI)
21. CMMI
- Describes five maturity levels:
- 1. Initial: all processes are ad hoc, chaotic, not well defined. Do your own thing.
- 2. Repeatable: a project follows a set of defined processes for management and conduct of software development
22. CMMI
- 3. Defined: every project within the organization follows processes tailored from a common set of templates
- 4. Managed: statistical control over processes has been achieved
- 5. Optimizing: defect prevention and application of innovative new process methods are used
23. Other CMMs
- CMMI is based on the original CMM for Software (SW-CMM)
- The latter led to many other variations before the models were integrated circa 2000
24. Malcolm Baldrige
- The Malcolm Baldrige National Quality Award (MBNQA) is a US-based quality award created in 1988 by the Department of Commerce
- It has a broader scope, including customer satisfaction, strategic planning, and human resource management
25. ISO 9000
- The international standard for quality management of an organization is ISO 9000
- It now applies to almost every type of business, but was first used for manufacturing
- Hence it includes activities like calibration of tools
26. ISO 9000
- ISO 9000 is facility-based, whereas CMMI is organization-based
- It was revised and republished in December 2000
- Previous editions were dated 1987 and 1994
27. Enter Measurement
- Measurement is critical to all process and quality models (CMMI, ISO 9000, MBNQA, etc.)
- Need to define basic concepts of measurement so we can speak the same language
28. Engineering in a Nutshell
29. Engineering in a Nutshell
- So in order to create any Product, we need Resources to use Tools in accordance with some Processes
- Each of those major areas (Product, Resources, Tools, and Processes) can be a focus of measurement
30. Measurement Needs
- Statistical meaning: need a long set of measurements for one project, and/or many projects
- Could use measurement to test specific hypotheses
- Industry uses measurement to help make decisions and track progress
- Need scales to make measurements!
31. Measurement Scales
- The measurement scales form the French word for black, noir (as in film noir):
- Nominal (least useful)
- Ordinal
- Interval
- Ratio (most useful)
NOIR is just a mnemonic to remember their sequence
32. Nominal Scale
- A nominal (name) scale groups or classifies things into categories, which:
- Must be jointly exhaustive (cover everything)
- Must be mutually exclusive (can't be in two categories at once)
- Can be in any sequence (none better or worse)
33. Nominal Scale
- Common examples include:
- Gender, e.g. "This room contains 19 people, of whom 10 are female and 9 male"
- Portions of a system, e.g. suspension, drivetrain, body, etc.
- Job titles (though you could argue they're hierarchical)
34. Ordinal Scale
- This measurement ranks things in order
- Sequence is important, but the intervals between ranks are not defined numerically
- Rank is relative, such as greater than or less than
- E.g. grades, CMM maturity levels, inspection effectiveness ratings
35. Interval Scale
- An interval scale measures quantitative differences, not just relative order
- Addition and subtraction are allowed
- E.g. common temperature scales (°F or °C), or a single date (Feb 15, 1962)
- A zero point, if any, may be arbitrary (90 °F is not six times hotter than 15 °F!)
36. Ratio Scale
- A ratio scale is an interval scale with a non-arbitrary zero point
- Allows division and multiplication
- E.g. defect rates (defects/KSLOC), test scores, absolute temperature (K or °R)
- The best type of scale to use, whenever feasible
37. Scale Hierarchy
- Measurement scales are hierarchical: ratio (best) / interval / ordinal / nominal
- Lower-level scales can always be derived if data is from a higher scale
- E.g. defect rates (a ratio scale) could be converted to High/Medium/Low or Acceptable/Not Acceptable, which are ordinal scales
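As a small sketch of deriving a lower-level scale from a higher one, the snippet below maps ratio-scale defect rates onto an ordinal Low/Medium/High scale. The thresholds (2 and 5 defects/KSLOC) are invented for illustration, not standard values:

```python
def defect_rate_category(defects_per_ksloc):
    """Map a ratio-scale defect rate onto an ordinal Low/Medium/High scale.

    The thresholds (2 and 5 defects/KSLOC) are illustrative only.
    """
    if defects_per_ksloc < 2:
        return "Low"
    elif defects_per_ksloc < 5:
        return "Medium"
    return "High"

# A ratio measurement reduced to an ordinal rank (information is lost one way).
print(defect_rate_category(1.2))  # Low
```

Note the conversion only goes downward: the category "Low" cannot be turned back into the original 1.2 defects/KSLOC.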
38. Why Are Scales Important?
- The types of analysis which are possible depend on the type of scale used for the measurements
- In statistics, this is roughly broken into parametric tests (for interval or ratio scaled data) and non-parametric tests (for nominal or ordinal scaled data)
- Some tests are more specific about the data scale(s) needed
39. Internal vs External Attributes
- Internal: measured purely in terms of the entity itself by examining the entity on its own, separate from its behavior, e.g. code complexity
- External: measured with respect to how the entity relates to its environment; the behavior of the entity is important, e.g. response time
40. Internal vs External Attributes
- Users (and managers) are mostly interested in external attributes, but external attributes are measured late in the development process
- Can use internal attribute measurements to support decision-making about external attributes
- E.g. might select an architecture based on performance needs
41. Basic Measures - Ratio
- Used for two exclusive populations
- Ratio = (# of testers) / (# of developers)
- E.g. the tester-to-developer ratio is 1:4
42. Proportions and Fractions
- Used for multiple (> 2) populations
- Proportion = (Number of this population) / (Total number of the population)
- Sum of all proportions equals unity
- E.g. survey results
- Proportions are based on integer units, whereas fractions are based on real-numbered units
43. Percentage
- A proportion or fraction multiplied by 100 becomes a percentage
- Only cite percentages when N (total population measured) is above 30 to 50; always provide N for completeness
- Why? Statistical methods are meaningless for very small populations
44. Rate
- Rate conveys the change in a measurement, such as over time, dx/dt. Rate = (# of observed events / # of opportunities) × constant
- Rate requires exposure to the risk being measured
- E.g. defects per KSLOC = (# of defects) / (# of SLOC) × 1000
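The basic measures from the last few slides can be sketched in a few lines of Python; the counts used here (5 testers, 20 developers, 42 defects in 12,000 SLOC) are made-up examples:

```python
def ratio(a, b):
    """Ratio between two mutually exclusive populations, e.g. testers to developers."""
    return a / b

def proportion(part, total):
    """Share of one subpopulation; all proportions together sum to unity."""
    return part / total

def rate(events, opportunities, constant=1000):
    """Observed events per opportunity, scaled by a constant."""
    return events / opportunities * constant

testers, developers = 5, 20
print(ratio(testers, developers))       # 0.25, i.e. a 1:4 tester-to-developer ratio
print(proportion(10, 19))               # proportion of one group in a room of 19 people
print(proportion(10, 19) * 100)         # the same value expressed as a percentage
print(rate(42, 12_000))                 # 3.5 defects per KSLOC
```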
45. Data Analysis
- Raw data is collected, such as the date a particular problem was reported
- Refined data is extracted from one or more raw data items, e.g. the time it took a problem to be resolved
- Refined data is analyzed to produce derived data, such as the average time to resolve problems
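A minimal sketch of the raw → refined → derived progression, using invented problem-report dates:

```python
from datetime import date

# Raw data: the dates each problem was reported and resolved (invented examples).
problems = [
    {"reported": date(2024, 1, 3), "resolved": date(2024, 1, 10)},
    {"reported": date(2024, 1, 5), "resolved": date(2024, 1, 8)},
]

# Refined data: resolution time in days, extracted from the raw dates.
resolution_days = [(p["resolved"] - p["reported"]).days for p in problems]

# Derived data: the average time to resolve a problem.
average_days = sum(resolution_days) / len(resolution_days)
print(resolution_days, average_days)  # [7, 3] 5.0
```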
46. Models
- A model focuses on select elements of the problem at hand and ignores irrelevant ones
- May show how parts of the problem relate to each other
- May be expressed as equations, mappings, or diagrams
- May be derived before or after measurement (theoretical vs. empirical)
47. Models Examples
The simplest model of effort estimation: Effort = f(SLOC) (effort is some function of SLOC). There are many possible representations, such as:
- Effort = a(SLOC)^b, with b > 1 or 0 < b < 1
- Effort = a + b(SLOC)
[Figure: three plots of Effort versus SLOC showing these curve shapes]
48. Elasticity
- The elasticity of y with respect to x is the percentage change in y when x changes by 1 percent
- For a logarithmic model, with y as a function of two things, x1 and x2:
- ln(y) = a + b·ln(x1) + c·ln(x2)
- Then y changes by b percent if x1 changes by 1 percent; therefore, b is the elasticity of y with respect to x1
49. Elasticity
- Elasticity is also known as the slope of ln(y) with respect to, in this case, ln(x1)
- You often see this concept used to express a change in something, such as: if your blood alcohol level goes up 0.02, your accident rate goes up 32% (I made up those numbers)
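A quick numeric check of the elasticity interpretation, using a hypothetical log-log model with b = 1.2: bumping x1 by 1% should change y by about b percent.

```python
import math

# Hypothetical log-log model: ln(y) = a + b*ln(x1), with invented a and b.
a, b = 0.5, 1.2

def y(x1):
    return math.exp(a + b * math.log(x1))

# Increase x1 by 1% and measure the percentage change in y.
x1 = 100.0
pct_change = (y(x1 * 1.01) - y(x1)) / y(x1) * 100
print(round(pct_change, 2))  # 1.2, matching the elasticity b
```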
50. Exponential Notation
- You might see output of the form 2.78E-12
- This example means 2.78 × 10^-12
- A negative exponent, e.g. -12, makes it a small number
- The leading number, here 2.78, controls whether it is a positive or negative number
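Python uses the same E-notation, so the points above are easy to verify:

```python
x = 2.78e-12  # scientific-notation literal: 2.78 * 10**-12

print(f"{x:.2E}")   # formats back to "2.78E-12"
print(x > 0)        # True: the positive leading number makes the value positive
print(abs(x) < 1)   # True: the negative exponent makes the magnitude small
```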
51. Precision
- Keep your final output to a consistent level of precision, e.g. don't report one number as 12 and another as 11.8625125982351
- Pick a reasonable level of precision (significant digits) similar to the accuracy of your inputs
- Wait until the final answer to round off
52. Graphing
- A typical graph shows Y on the vertical axis and X on the horizontal axis
- Y is the dependent variable and X is independent, since you can pick any value of X and determine its matching value of Y
SPSS will sometimes ask for X and Y, other times for independent and dependent variables
53. What is R Squared?
- The coefficient of determination, R², is a measure of the goodness of fit
- R² ranges from 0 to 1
- R² = 1 is a perfect fit (all data points fall on the estimated line)
- R² = 0 means that the variable(s) have no explanatory power
- Having R² closer to 1 helps choose which math model is best suited to a problem
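R² can be computed directly from its definition, 1 − SS_residual / SS_total; a small sketch:

```python
def r_squared(y_actual, y_predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_actual) / len(y_actual)
    ss_total = sum((y - mean_y) ** 2 for y in y_actual)
    ss_residual = sum((y - yp) ** 2 for y, yp in zip(y_actual, y_predicted))
    return 1 - ss_residual / ss_total

# A perfect fit gives R^2 = 1; predicting the mean everywhere gives R^2 = 0.
print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0
```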
54. Linear Regression
[Figure: scatter plot of y versus x with the fitted line y = a + bx; each data point satisfies y = ŷ + e, where e is the residual]
Choose the best line by minimizing the sum of the squares of the vertical distances between the empirical points and the line
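The least-squares line has a closed-form solution; a minimal sketch using made-up data points that happen to lie exactly on a line:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, minimizing the sum of
    squared vertical distances between the points and the line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    # Intercept: the line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # points on y = 1 + 2x
print(a, b)  # 1.0 2.0
```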
55. Expressing Uncertainty
- We can show the uncertainty in our measurements by putting the standard error after each term
- A line is given in the form y = b0 + b1·x
- Show standard errors with y = (b0 ± se_b0) + (b1 ± se_b1)·x
- See example on next slide
56. Y = (6.2 ± 1.9) + (1.3 ± 0.42) X
- The numbers in parentheses are the estimated coefficients plus or minus the standard errors associated with those coefficients
- In the example, the constant parameter (here the y-intercept) was estimated to be 6.2 with a standard error of 1.9
- The parameter associated with x (here, the slope) was estimated to be 1.3 with a standard error of 0.42
57. Level of Confidence
- Generally, we can say that the actual value of a parameter lies within ± 2 standard errors of its estimated value, with a 95% level of confidence
- Thus the value of the constant parameter lies between 2.4 (i.e., 6.2 − 2×1.9) and 10.0 (i.e., 6.2 + 2×1.9) with a 95% level of confidence
- With this level of confidence, the parameter estimate associated with x (the slope) lies between 0.46 and 2.14
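The ± 2-standard-error intervals from the slide-56 example can be reproduced directly:

```python
def confidence_interval(estimate, std_error, k=2):
    """Approximate 95% confidence interval: estimate +/- k standard errors."""
    return estimate - k * std_error, estimate + k * std_error

# Intercept and slope estimates from the Y = (6.2 ± 1.9) + (1.3 ± 0.42) X example.
print(confidence_interval(6.2, 1.9))   # roughly (2.4, 10.0) for the intercept
print(confidence_interval(1.3, 0.42))  # roughly (0.46, 2.14) for the slope
```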
58. Level of Confidence
- The default level of confidence for statistical testing is 95%
- Life-and-death measurements (e.g. medical testing) use 99%
- Some wimpy software studies might use a level of confidence as low as 80%
59. The t Statistic
- The t-statistic is defined as: t = (parameter estimate) / (standard error)
- If |t| > 2 then the parameter estimate is significantly different from zero at the 95% level of confidence; in the example on slide 56:
- t = 6.2/1.9 = 3.26 for the constant term
- t = 1.3/0.42 = 3.1 for the slope
- Hence both terms are significant.
Note: For very precise work, use 1.96 instead of 2
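The t-test on the slide-56 coefficients, as a small sketch:

```python
def t_statistic(estimate, std_error):
    """t = parameter estimate divided by its standard error."""
    return estimate / std_error

def is_significant(estimate, std_error, threshold=1.96):
    """Significantly different from zero at ~95% confidence if |t| > ~2."""
    return abs(t_statistic(estimate, std_error)) > threshold

print(round(t_statistic(6.2, 1.9), 2))  # 3.26 for the constant term
print(is_significant(1.3, 0.42))        # True: the slope term is significant
```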
60. What Can We Ignore?
- If a parameter estimate associated with a variable is not significantly different from zero at the 95% level of confidence, then the variable should be omitted from the analysis
- This is important later for seeing if a curve fit is useful: if the coefficients pass the t-test, they may be used
61. The 95% Rule
- The 95% rule helps limit what is practically possible
- Any parameter with a measured mean and non-zero standard error could fall in any range of values, though that range may be very unlikely
- The 95% range lets us exclude everything outside of ± 2 standard errors as too unlikely
62. The Normal Distribution
- The normal or Gaussian distribution is the familiar bell-shaped curve
- The width of the distribution is measured by the standard deviation σ
- The area under the curve within ± 1σ of the middle covers 68.26% of all events
- Within ± 2σ covers 95.44%
63. The Normal Distribution
- Within ± 3σ covers 99.73%
- Within ± 4σ covers 99.9937%
- Within ± 5σ covers 99.999943%
- Within ± 6σ covers 99.9999998%
- The Six Sigma quality objective is to have the number of defects under a couple of parts per million (ppm)
- 3.4 ppm, for reasons covered in the text, instead of the value implied
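The coverage percentages above follow from the normal distribution; they can be reproduced with the error function in Python's math module:

```python
from math import erf, sqrt

def coverage(k):
    """Fraction of a normal distribution within +/- k standard deviations."""
    return erf(k / sqrt(2))

# Prints the coverage for 1, 2, 3, and 6 sigma as percentages.
for k in (1, 2, 3, 6):
    print(k, coverage(k) * 100)
```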
64. Six Sigma
- Six Sigma originated at Motorola
- It's best known for its Black and Green Belt certifications
- It focuses on the process improvements needed to consistently achieve extremely high levels of quality