Title: Evaluation of Information Systems Complexity Metrics and Models
Origin
- Complexity metrics were developed by computer scientists and software engineers
- Strongly based on empirical (real-world) measurement, with little theory
- Primarily broken into internal and external measures
Internal versus External
- Internal measures describe the complexity within a module (number of decisions, loops, calculations, etc.)
- External measures describe relationships among modules (program or function calls, external file activities, input/output, etc.)
Internal Measures
Internal Product Attributes
- Size measures
  - Input to prediction models
  - Normalizing factor for cost, productivity, etc.
  - Progress during development
- Typically use lines of code (LOC) or function point counts
- LOC is a better measure for predicting cost and schedule
Lines of Code
- Simple complexity metric, often based on the number of executable statements or instruction statements
- The highest defect rates often occur in small modules
- Larger modules tend to have lower defect rates (if they have defects at all), until they become too cumbersome
- Optimum module size is about 250 lines
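Counting rules for LOC differ between organizations, which is why the slide says "often". The minimal Python sketch below assumes one common convention (count non-blank lines that are not whole-line `#` comments) and uses LOC as the normalizing factor for a defects-per-KLOC rate. The function names are illustrative only.

```python
def count_loc(source: str, comment_prefix: str = "#") -> int:
    """Count non-blank lines that are not whole-line comments (one common LOC convention)."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def defects_per_kloc(defects: int, loc: int) -> float:
    """Defect density normalized by size: defects per thousand lines of code."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return defects / (loc / 1000)


sample = """
# compute a running total
total = 0
for x in range(10):

    total += x
print(total)
"""

print(count_loc(sample))          # 4
print(defects_per_kloc(3, 1500))  # 2.0
```

Computing this density per module and comparing it across module sizes is the kind of analysis behind the small-module and optimum-size observations above.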
Function Points
- Function points help avoid biases due to the programming language(s) used
- Provide a fairer basis for comparing different environments
- Focus on how much work the program accomplishes, not how concisely it is expressed
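The slide does not show how function points are counted. As a hedged illustration, the sketch below follows the IFPUG-style approach: weight counts of the five function types, then scale by a value adjustment factor built from 14 general system characteristics. It assumes every item is of average complexity (real counts rate each item low, average, or high), so treat the weights and the dictionary keys as illustrative.

```python
# Illustrative average-complexity weights for the five IFPUG function types.
# Assumption: every item is rated "average"; real counts rate each item
# low, average, or high and use different weights accordingly.
AVERAGE_WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_logical_files": 10,
    "external_interface_files": 7,
}


def unadjusted_fp(counts: dict[str, int]) -> int:
    """Sum of each function-type count multiplied by its weight."""
    return sum(AVERAGE_WEIGHTS[kind] * n for kind, n in counts.items())


def adjusted_fp(ufp: int, gsc_ratings: list[int]) -> float:
    """Scale by the value adjustment factor VAF = 0.65 + 0.01 * sum(ratings).

    gsc_ratings: the 14 general system characteristics, each rated 0..5.
    """
    if len(gsc_ratings) != 14 or not all(0 <= r <= 5 for r in gsc_ratings):
        raise ValueError("expected 14 ratings, each in the range 0..5")
    return ufp * (0.65 + 0.01 * sum(gsc_ratings))


counts = {
    "external_inputs": 10,
    "external_outputs": 7,
    "external_inquiries": 5,
    "internal_logical_files": 3,
    "external_interface_files": 2,
}
ufp = unadjusted_fp(counts)
print(ufp)                                   # 139
print(round(adjusted_fp(ufp, [3] * 14), 2))  # 148.73
```

Because the inputs describe what the system does rather than how many statements it takes, the count does not depend on the implementation language, which is the point made above.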
Halstead Metrics
- Also known as Software Science (Halstead, 1977)
- Examine the program as a sequence of compilable tokens
- Tokens are either operators (+, -) or operands (variables)
- Derive metrics such as Vocabulary, Length, Volume, Difficulty, etc.
- Not widely used
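The derived Software Science measures follow from four counts: distinct operators n1, distinct operands n2, and their total occurrences N1 and N2. A short sketch using the classic formulas (vocabulary n1 + n2, length N1 + N2, volume N log2 n, difficulty (n1/2)(N2/n2), effort D x V). It assumes the tokens have already been split into operators and operands, which is a language-specific step not shown here.

```python
import math


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict[str, float]:
    """Classic Software Science measures from four token counts.

    n1, n2: number of distinct operators and distinct operands
    N1, N2: total occurrences of operators and operands
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
    }


# The statement `x = a + a`: operators {=, +}, operands {x, a}, with a used twice.
metrics = halstead(n1=2, n2=2, N1=2, N2=3)
print(metrics)
```

For `x = a + a` this gives a vocabulary of 4, a length of 5, a volume of 10, a difficulty of 1.5, and an effort of 15.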
Data Structure (Halstead)
- Halstead's $\eta_2$: the number of distinct operands in a module
- Operands include the variables, unique constants, and labels
- Operand usage (OU)
  - $OU = \eta_2 / N_2$, where $N_2$ is the total number of operand references
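Operand usage can be computed directly from a list of operand references. A sketch, assuming the operands have already been extracted by a tokenizer (not shown); with the ratio as defined on the slide, a lower value means the same operands are reused more often.

```python
def operand_usage(operand_refs: list[str]) -> float:
    """OU = eta2 / N2: distinct operands divided by total operand references."""
    if not operand_refs:
        raise ValueError("need at least one operand reference")
    eta2 = len(set(operand_refs))  # distinct operands
    n2_total = len(operand_refs)   # total operand references (N2)
    return eta2 / n2_total


print(operand_usage(["x", "a", "a", "10", "x"]))  # 0.6
```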
Software Complexity
- A characteristic of software that influences the resources needed to build and maintain it
- Many different characteristics of software relate to complexity
- These complexity characteristics revolve around the structure of the software
Types of Structural Measures
- Control flow
  - Addresses the sequence in which instructions are executed
  - Iteration and looping
- Data flow
  - Follows the trail of data as it is created and handled
  - Depicts the behavior of data as it interacts with the program
Types of Structural Measures
- Data structure
  - Concerned with the organization of the data itself
  - Provides information about difficulties in handling data and in defining test cases
Control Flow
- Modeled by directed graphs (control flow graphs)
- Each node corresponds to a single program statement
- Arcs (directed edges) indicate flow of control from one statement to another
Control Flow
- Control flow graphs are useful for:
  - Analysis (estimating the number of defects)
  - Expressing complexity by a single value
  - Assessing testability and test coverage
Basic Control Constructs
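Cyclomatic Complexity
- McCabe, 1976
- Based on a program's control flow graph
- Related to the number of separate regions of the graph, or the number of linearly independent paths through the program
- Complexity $M = E - N + 2P$, where $E$ is the number of edges, $N$ the number of nodes, and $P$ the number of connected components (unconnected parts) of the graph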
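The formula can be applied mechanically once the control flow graph is written down. The sketch below assumes the graph is given as a dict mapping each node to its list of successors (a representation chosen here for illustration) and counts connected components by ignoring edge direction. The example graph is a hypothetical if/else followed by a while loop.

```python
def connected_components(graph: dict[str, list[str]]) -> int:
    """Count connected components, treating every edge as undirected."""
    undirected: dict[str, set[str]] = {}
    for node, succs in graph.items():
        undirected.setdefault(node, set())
        for s in succs:
            undirected[node].add(s)
            undirected.setdefault(s, set()).add(node)
    seen: set[str] = set()
    components = 0
    for start in undirected:
        if start in seen:
            continue
        components += 1
        stack = [start]  # depth-first walk over one component
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(undirected[n] - seen)
    return components


def cyclomatic_complexity(graph: dict[str, list[str]]) -> int:
    """McCabe's M = E - N + 2P for a control flow graph."""
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    edges = sum(len(succs) for succs in graph.values())
    return edges - len(nodes) + 2 * connected_components(graph)


# Hypothetical CFG: an if/else, then a while loop.
cfg = {
    "entry": ["if"],
    "if": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["while"],
    "while": ["body", "exit"],
    "body": ["while"],
    "exit": [],
}
print(cyclomatic_complexity(cfg))  # 3  (9 edges - 8 nodes + 2*1)
```

The result, 3, also equals the two decisions (the if and the while) plus one.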
Cyclomatic Complexity
- A complexity under 10 is generally desired
- M can also be found as the number of binary decisions (yes/no) plus one
- Multiple-choice decisions with n choices count as (n-1) binary decisions
- Ignores differences between specific types of control structures
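The decision-counting shortcut gives the same number without drawing the graph. A sketch, with the function names and the `threshold` parameter as illustrative assumptions; the default threshold of 10 follows the guideline above.

```python
from collections.abc import Sequence


def complexity_from_decisions(binary_decisions: int,
                              multiway_choices: Sequence[int] = ()) -> int:
    """M = (equivalent binary decisions) + 1, where an n-way decision counts as n - 1."""
    if any(n < 2 for n in multiway_choices):
        raise ValueError("a multi-way decision needs at least 2 choices")
    equivalent = binary_decisions + sum(n - 1 for n in multiway_choices)
    return equivalent + 1


def needs_review(m: int, threshold: int = 10) -> bool:
    """Flag modules at or above the commonly cited limit of 10."""
    return m >= threshold


# Two IF statements plus one four-way CASE: 2 + 3 + 1 = 6.
m = complexity_from_decisions(2, [4])
print(m, needs_review(m))  # 6 False
```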
Cyclomatic Complexity
- Uses of the complexity metric:
  - Identify complex modules needing detailed inspection or redesign
  - Identify simple modules needing minimal inspection and/or testing
  - Estimate programming, testing, and maintenance effort
  - Identify potentially troublesome code
Control Flow Representation of Programs
- Software programs can be represented by linear directed segments combined with the basic control flow constructs
- Control flow constructs may be nested, e.g., an IF statement can be inside a WHILE loop
Control Flow: Linearly Independent Paths
- Set of linearly independent paths:
  - b1: abcg
  - b2: abcbcg
  - b3: abefg
  - b4: adefg
  - b5: adfg
- Any arbitrary path is equal to a linear combination of the linearly independent paths listed above
- For example, path abcbefg is equal to b2 + b3 - b1
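To make "linear combination" concrete, each path can be treated as a vector counting how many times it traverses each edge. The sketch assumes the graph's edges are exactly those that appear in the five listed paths, checks the example combination, and also confirms that E - N + 2 for that graph equals the five basis paths.

```python
from collections import Counter


def edge_vector(path: str) -> Counter:
    """Count how many times a path traverses each edge (consecutive node pair)."""
    return Counter(zip(path, path[1:]))


def combine(*terms: tuple[int, Counter]) -> dict:
    """Linear combination sum(coeff * vector), dropping zero entries."""
    total: Counter = Counter()
    for coeff, vec in terms:
        for edge, count in vec.items():
            total[edge] += coeff * count
    return {edge: c for edge, c in total.items() if c != 0}


b = {name: edge_vector(p) for name, p in
     [("b1", "abcg"), ("b2", "abcbcg"), ("b3", "abefg"),
      ("b4", "adefg"), ("b5", "adfg")]}

target = dict(edge_vector("abcbefg"))
print(combine((1, b["b2"]), (1, b["b3"]), (-1, b["b1"])) == target)  # True

# Edges and nodes implied by the five paths: 10 edges, 7 nodes.
edges = set().union(*b.values())
nodes = {node for edge in edges for node in edge}
print(len(edges) - len(nodes) + 2)  # 5
```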
Knots - Control Flow Crossovers
- Knot measure: the total number of points at which control flow lines cross
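A knot count can be computed from the jumps in a program listing. This sketch assumes statements are numbered in listing order and each jump is a hypothetical `(from_line, to_line)` pair drawn as an arc beside the listing; two arcs cross when their line spans interleave. Not counting arcs that merely share an endpoint is a choice made here.

```python
from itertools import combinations


def knot_count(jumps: list[tuple[int, int]]) -> int:
    """Count crossings between control-transfer arcs drawn beside a listing.

    Each jump is (from_line, to_line). Two arcs cross when their spans
    interleave; arcs sharing an endpoint are not counted as crossing.
    """
    knots = 0
    for (a, b), (c, d) in combinations(jumps, 2):
        lo1, hi1 = sorted((a, b))
        lo2, hi2 = sorted((c, d))
        if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
            knots += 1
    return knots


print(knot_count([(2, 6), (4, 9)]))  # 1: the spans interleave
print(knot_count([(2, 6), (3, 5)]))  # 0: one span is nested in the other
```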
Syntactic Constructs
- Examine the effect of using specific control structures on defect rate
- Such analysis is, by definition, language-specific
- Can result in statistically significant relationships
  - e.g., Lo used this approach to show that DO WHILE should be avoided in COBOL
External Measures
Computational Complexity
- Examines algorithmic efficiency and use of machine resources (memory, I/O, storage)
- Studies quantitative aspects of solutions to computational problems
- Examples may include sorting efficiency for a database, managing I/O constraints across a large-scale network, etc.
Psychological Complexity
- Concerned with characteristics of software that affect human performance:
  - Injection of defects (when and why does a programmer make errors?)
  - Ease of building the software (effort required)
  - Ease of maintenance (effort required)
Data Structure (Database)
- Database size per program size (DBSPPS)
  - $DBSPPS = DBS / PS$
  - where DBS is the database size in bytes or characters
  - and PS is the program size in source instructions
- Used in the COCOMO model as a cost driver
  - An ordinal-scale measure derived from DBSPPS
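A sketch of the ratio and of one possible ordinal mapping. The rating bands (below 10, 10 to 100, 100 to 1000, 1000 and above) are an assumption based on the COCOMO II DATA cost driver; check them against the model definition before relying on them.

```python
def dbspps(db_size_bytes: float, program_size_sloc: float) -> float:
    """Database size per program size: DBS / PS."""
    if program_size_sloc <= 0:
        raise ValueError("program size must be positive")
    return db_size_bytes / program_size_sloc


def data_rating(ratio: float) -> str:
    """Map DBSPPS onto an ordinal scale (assumed COCOMO II DATA bands)."""
    if ratio < 10:
        return "Low"
    if ratio < 100:
        return "Nominal"
    if ratio < 1000:
        return "High"
    return "Very High"


ratio = dbspps(5_000_000, 20_000)
print(ratio, data_rating(ratio))  # 250.0 High
```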
Fan-in and Fan-out
- Focus is the interaction among code modules
- Fan-in = number of modules that call a given module
- Fan-out = number of modules that are called by a given module
- Or, more formally...
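The informal definitions translate directly into a pass over a call graph. This sketch uses only module calls (not the data-structure terms of the formal definition) and assumes a hypothetical caller-to-callees mapping in which each callee is listed once, so repeated calls to the same module count once.

```python
def fan_in_out(calls: dict[str, set[str]]) -> dict[str, tuple[int, int]]:
    """Return {module: (fan_in, fan_out)} from a caller -> callees mapping."""
    modules = set(calls) | {c for callees in calls.values() for c in callees}
    fan_in = {m: 0 for m in modules}
    for callees in calls.values():
        for callee in callees:
            fan_in[callee] += 1
    return {m: (fan_in[m], len(calls.get(m, ()))) for m in modules}


calls = {
    "main": {"read", "report"},
    "report": {"format", "read"},
    "read": set(),
}
for module, (fin, fout) in sorted(fan_in_out(calls).items()):
    print(module, fin, fout)
```

Here `read` has a fan-in of 2 (called by `main` and `report`), while `main` and `report` each have a fan-out of 2.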
Fan-in and Fan-out
- Fan-in of a module is the number of local flows terminating at the module, plus the number of data structures from which info is retrieved by the module
- Fan-out of a module is the number of local flows that emanate from the module, plus the number of data structures (tables, arrays) that are updated by the module
Fan-in and Fan-out
- Do fan-in and fan-out affect software quality?
  - Modules with large fan-in may be interpolation or look-up routines; no defect correlation
  - Large fan-out has a high correlation with defect rate
  - Modules with both large fan-in and large fan-out are clearly bad
Fan-in and Fan-out
- Information flow complexity
  - Henry and Kafura: $\text{Size} \times (\text{fan-in} \times \text{fan-out})^2$
  - Shepperd: $(\text{fan-in} \times \text{fan-out})^2$
- The Henry and Kafura measure helps predict the number of software maintenance problems
- The Shepperd measure correlates with software development time

Henry, S. and Kafura, D. (1981). IEEE Transactions on Software Engineering, SE-7(5), pp. 510-518.
Shepperd, M. (1990). Software Engineering Journal, 5(1) (January), pp. 3-10.
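Both information flow measures are simple products once fan-in and fan-out are known. A sketch, assuming size is measured in LOC (the slide does not fix the unit); the example module is hypothetical.

```python
def henry_kafura(size: int, fan_in: int, fan_out: int) -> int:
    """Henry and Kafura information flow complexity: size * (fan_in * fan_out)**2."""
    return size * (fan_in * fan_out) ** 2


def shepperd(fan_in: int, fan_out: int) -> int:
    """Shepperd's variant drops the size term: (fan_in * fan_out)**2."""
    return (fan_in * fan_out) ** 2


# Hypothetical module: 120 LOC, fan-in 3, fan-out 2.
print(henry_kafura(120, 3, 2))  # 4320
print(shepperd(3, 2))           # 36
```

Passing the cyclomatic complexity C in place of size gives the Henry and Selig structure metric, $HC = C \times (\text{fan-in} \times \text{fan-out})^2$.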
Structure Metrics
- Information flow metric (Henry and Selig)
  - $HC = C \times (\text{fan-in} \times \text{fan-out})^2$
  - where C is the cyclomatic complexity
Structure Metrics
- System complexity (Card and Glass)
  - Based on structural complexity (average fan-out squared) and data complexity (based on the number of I/O variables and fan-out)
  - Quantified the effect of complexity on error rate
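The slide names only the ingredients. The sketch below assumes the formulation usually attributed to Card and Glass: per-module structural complexity is fan-out squared, per-module data complexity is I/O variables divided by (fan-out + 1), each is averaged over the modules, and system complexity is their sum. The module data are hypothetical.

```python
def card_glass(modules: list[tuple[int, int]]) -> dict[str, float]:
    """System complexity C = S + D (assumed Card and Glass formulation).

    modules: (fan_out, io_variables) for each module.
    S = mean of fan_out**2; D = mean of io_variables / (fan_out + 1).
    """
    if not modules:
        raise ValueError("need at least one module")
    n = len(modules)
    s = sum(f ** 2 for f, _ in modules) / n
    d = sum(v / (f + 1) for f, v in modules) / n
    return {"structural": s, "data": d, "system": s + d}


result = card_glass([(2, 6), (0, 4), (3, 8)])
print({k: round(v, 3) for k, v in result.items()})
```

With these three modules, S = 13/3 and D = 8/3, so the system complexity is 7.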
Module Call Graph
- Module: a contiguous sequence of program statements, bounded by boundary elements, having an aggregate identifier
  - Or, a distinct, named group of LOC
- The module call graph shows which modules call each other, and what key information is passed among them
Module Coupling Measures
- Average number of calls per module (ANCPM)
- Fraction of modules that make calls (FMC)
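Both coupling measures fall out of the same call graph. A sketch under two stated assumptions: each entry in a caller's list is one call (so repeated calls count separately), and modules that only appear as callees still count in the denominator.

```python
def coupling_measures(calls: dict[str, list[str]]) -> tuple[float, float]:
    """Return (ANCPM, FMC) for a caller -> list-of-call-sites mapping.

    ANCPM: total calls divided by the number of modules.
    FMC: fraction of modules that make at least one call.
    """
    modules = set(calls) | {c for callees in calls.values() for c in callees}
    if not modules:
        raise ValueError("empty call graph")
    total_calls = sum(len(callees) for callees in calls.values())
    callers = sum(1 for callees in calls.values() if callees)
    return total_calls / len(modules), callers / len(modules)


calls = {"main": ["read", "report", "read"], "report": ["format"], "read": []}
print(coupling_measures(calls))  # (1.0, 0.5)
```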
Information Flow Measures
- Types of information flows
  - Local direct flow
    - A module invokes a second module and passes info to it
    - The invoked module returns a result to the caller
  - Local indirect flow
    - The invoked module returns info that is subsequently passed to a second invoked module
  - Global flow
    - Info flows from one module to another via a global data structure
IEEE-STD-982
- Number of Entries and Exits per Module, m
  - Like fan-in and fan-out
  - $m = \text{entries} + \text{exits}$
- Software Science measures
IEEE-STD-982
- Graph-Theoretic Complexity
  - Static complexity: $C = E - N + 1$, where $E$ is the number of edges and $N$ the number of nodes
  - Generalized static complexity: based on summing the resources needed for each module (e.g., storage, access time, etc.)
  - Dynamic complexity: complexity as it changes over time across a network
IEEE-STD-982
- Cyclomatic complexity
- Minimal Unit Test Case Determination
  - Determine the number of independent paths through a module, to get the minimum number of test cases for unit testing
- Data or information flow complexity
  - Fan-in and fan-out of variables
IEEE-STD-982
- Design Structure
  - A weighted average ($\sum_i w_i D_i$, with the weights summing to 1) of six parameters:
    - Whether designed top down (Y/N)
    - Module inter-dependence
    - Module dependence on prior processing
    - Database size (# of elements)
    - Database compartmentalization
    - Module single entrance and exit (Y/N)
  - Weighting chosen to meet project needs
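The design structure measure is a weighted average of the six parameters. A sketch, assuming each parameter has already been normalized to the range 0..1 (Y/N items coded as 0 or 1) and that the project has chosen weights summing to 1; the example values are hypothetical.

```python
def design_structure(derivatives: list[float], weights: list[float]) -> float:
    """Weighted average sum(w_i * D_i) of the six design-structure parameters."""
    if len(derivatives) != 6 or len(weights) != 6:
        raise ValueError("expected six parameters and six weights")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(w * d for w, d in zip(weights, derivatives))


# Hypothetical parameter values (0..1) and project-chosen weights.
d = [0, 0.2, 0.1, 0.5, 0.3, 0]
w = [0.3, 0.2, 0.1, 0.2, 0.1, 0.1]
print(round(design_structure(d, w), 3))  # 0.18
```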
Other Measures
- Compiler measures
  - Size (bytes of compiled code)
  - Number of symbols and variables
  - Cross-reference of all labels
  - Statement count
Other Measures
- Configuration Management Library Measures
  - Number of code modules
  - Number of versions of each module
  - History of change dates of each module
  - Module size
  - Number of related documents for each module
Availability Metrics
- Most information systems are critical to day-to-day operations
  - Witness the recent Google outage, which made the news even though the service was unavailable for only 15 minutes
- Availability depends on (1) how often the system goes down, and (2) how long it takes to restore it after a crash
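The two factors above are commonly combined as steady-state availability, MTTF / (MTTF + MTTR): mean time to failure captures how often the system goes down, and mean time to repair captures how long restoration takes. A minimal sketch with hypothetical figures:

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time up = MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical system: fails every 999 hours on average, 1 hour to restore.
print(availability(999, 1))  # 0.999
```

Halving the repair time improves availability just as much as doubling the time between failures, which is why restoration speed matters as much as failure frequency.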
Availability Metrics
- Perfect availability (100%) is nice to dream of, but realistically, higher reliability is more expensive
- Availability is often expressed by the number of 9s in the desired level
  - Two nines is 99%, three nines is 99.9%, four nines is 99.99%, etc.
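Each additional nine cuts the permitted downtime by a factor of ten. A sketch that converts a number of nines into the maximum downtime per year, assuming a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def nines_to_availability(nines: int) -> float:
    """For example, 3 nines -> 0.999."""
    return 1 - 10 ** (-nines)


def annual_downtime_minutes(nines: int) -> float:
    """Maximum downtime per year allowed at the given number of nines."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


for n in range(2, 6):
    pct = round(nines_to_availability(n) * 100, 3)
    print(f"{n} nines = {pct}%: {annual_downtime_minutes(n):.1f} min/year")
```

Two nines allow about 87.6 hours of downtime a year, three nines about 8.8 hours, and four nines under an hour.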
Achieving High Availability
- Many techniques are used to help ensure that high levels of availability are possible:
  - Duplicate systems (clustering)
  - RAID data duplication
  - Duplicate power supplies
  - Independent power supplies
  - Uninterruptible power supplies (UPS)
Availability and Code Quality
- Capers Jones demonstrated a clear connection between code quality (defect rate) and the corresponding mean time to failure (MTTF), which is a key aspect of availability
- Consistent methods for measurement and definitions of terms are needed for further refinement
Customer Outage Data
- To determine availability, the actual customer-visible system outage time needs to be collected
- To get this data, the customer must place a very high priority on availability
- This data could be used to identify the software components that most reduce availability
Availability
- We also expect the availability of a new system to increase over the first couple of years of its use
- Defect causal analysis can help identify and eliminate the root causes of defects, thereby improving availability