Title: Software Design and the Evolution of Evolvability
- Dissertation Talk
- Terry Van Belle
- May 3, 2004
2. Abstract
- Software is often hard to change
- Biology has already faced this problem
- Software archeology
- The SECO Model
- Analysis and Synthesis
- Code Factoring (Genetic Programming)
- Encapsulation (Toy Problems and Real Code)
- Module Optimization (Toy Problems and Real Code)
- New Metrics
3. Outline
- Introduction
- The Wagner/Altenberg Model
- The SECO Model
- Experiments
- Analyzing Code Factoring
- Analyzing Encapsulation
- Optimizing Modularity
- Contributions
- Conclusions
4. Introduction
- Engineering goes through stages
- Best practices
- Theory
- Further principles
- Software engineering is still largely at the first stage
- Evolution is an important but overlooked part of SE theory
5. Software Evolvability
- Software evolves?
- Software adapts to environmental changes
- Beyond version 1.0
- Environment = User Requirements
- Evolvability = Capacity to evolve
- Evolution driven by good mutations
- Equal fitness, different evolvabilities
- Software Evolvability
- Ability to change software in response to changes in requirements
- Short-term success vs. Long-term success
6. Biological Modularity
- A module is:
- A complex of genes
- Single purpose
- Limited influence on other modules
- How does biological modularity evolve?
- Wagner/Altenberg (1995)
- Modularity improves evolvability
- Allows for independently evolving traits
7. Wagner/Altenberg Example
8. Wagner/Altenberg Example
[Figure: genes A, B and C mapped to the traits Coloring and Leg Length, illustrating polygeny and pleiotropy]
9. Wagner/Altenberg Example
[Figure: genes A, B and C mapped to the traits Left Side and Right Side]
10. Code Polygeny and Pleiotropy
[Figure: the features Spell Checker and Italicize mapped to the code elements get_text(), check_word() and get_font(), illustrating code polygeny and code pleiotropy]
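The feature-to-code mapping can be written down as a small bipartite map. This is a minimal sketch: the exact links in the original figure are not recoverable, so the wiring below (Spell Checker uses get_text() and check_word(); Italicize uses get_text() and get_font()) is a hypothetical assumption. A feature that needs several code elements shows code polygeny; a code element that serves several features shows code pleiotropy.

```python
from collections import defaultdict

# Hypothetical wiring: the links in the original figure are an assumption.
FEATURE_TO_CODE = {
    "Spell Checker": {"get_text()", "check_word()"},
    "Italicize": {"get_text()", "get_font()"},
}


def polygenic_features(feature_to_code):
    """Features implemented by more than one code element (code polygeny)."""
    return {f for f, code in feature_to_code.items() if len(code) > 1}


def pleiotropic_code(feature_to_code):
    """Code elements that influence more than one feature (code pleiotropy)."""
    code_to_features = defaultdict(set)
    for feature, code in feature_to_code.items():
        for element in code:
            code_to_features[element].add(feature)
    return {c for c, feats in code_to_features.items() if len(feats) > 1}


if __name__ == "__main__":
    print(sorted(polygenic_features(FEATURE_TO_CODE)))  # ['Italicize', 'Spell Checker']
    print(sorted(pleiotropic_code(FEATURE_TO_CODE)))    # ['get_text()']
```

Under this assumed wiring, a change to get_text() can affect both features, the software analogue of a pleiotropic gene.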
11. Software Archeology
- In most software projects, we don't have access to traits (features)
- But we do have the code change history
- Exploit regularities in the change history
- Improve evolvability by
- Grouping together frequent changes
- Minimizing interactions between modules
- Metrics: Evolutionary vs. Static vs. Dynamic
12. The SECO Model
- A Model of Software Change
- Code divided into elements
- Non-overlapping subsets of code
- E = {e1, e2, ..., eN}
- Elements linked together via changes
- C = {c1, c2, ..., cn}
- Each ci is a subset of E
- Modeling Changes
- Evolutionary Computation
- Change Propagation and Correlation
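A minimal sketch of the SECO data model, assuming elements are identified by strings and each change ci is stored as a Python set. The function names are hypothetical; counting how often two elements appear in the same change is one simple way to express "elements linked together via changes".

```python
from collections import Counter
from itertools import combinations


def validate_changes(elements, changes):
    """Check the SECO assumption that every change c_i is a subset of E."""
    for i, change in enumerate(changes):
        if not change <= elements:
            raise ValueError(f"change {i} has unknown elements: {change - elements}")


def co_change_counts(changes):
    """Count how often each (sorted) pair of elements changes together."""
    counts = Counter()
    for change in changes:
        for a, b in combinations(sorted(change), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    E = {"e1", "e2", "e3", "e4"}
    C = [{"e1", "e2"}, {"e1", "e2", "e3"}, {"e4"}]
    validate_changes(E, C)
    print(co_change_counts(C))
```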
13. Change Propagation Example
[Figure: AreaAverager depends directly on the fields of Circle (radius), Square (side) and Rectangle (height, width)]
14. Change Propagation Example
[Figure: AreaAverager depends only on a Shape interface providing area(), which rarely changes; Circle, Square and Rectangle implement Shape]
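The refactored design can be sketched in Python using an abstract base class as the stable interface. Class and field names follow the figure; the method bodies and the choice of Python (rather than Java) are assumptions for illustration. Because AreaAverager calls only area(), renaming or restructuring a shape's fields does not propagate a change into it.

```python
import math
from abc import ABC, abstractmethod
from typing import Sequence


class Shape(ABC):
    """Stable interface: AreaAverager depends only on area()."""

    @abstractmethod
    def area(self) -> float:
        ...


class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class Rectangle(Shape):
    def __init__(self, height: float, width: float):
        self.height = height
        self.width = width

    def area(self) -> float:
        return self.height * self.width


class AreaAverager:
    def average(self, shapes: Sequence[Shape]) -> float:
        # Only the Shape interface is used here, so changes to radius, side,
        # height or width stay inside the concrete classes.
        if not shapes:
            raise ValueError("need at least one shape")
        return sum(s.area() for s in shapes) / len(shapes)
```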
15. Change Propagation Model
[Figure: elements e1 to e4, each labeled with probability Ps, connected by links labeled with probability Pt]
- Ps = Seed Probability
- Pt = Transmission Probability
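One plausible reading of the propagation model, sketched as a Monte Carlo simulation: each element is seeded with a change independently with probability Ps, and each directed link leaving a changed element passes the change on with probability Pt, tried once per link. This is an assumption about the model's exact rules, and the function names are hypothetical.

```python
import random


def simulate_change(n, edges, ps, pt, rng):
    """Return the set of elements changed in one simulated change.

    Assumed rules: elements 0..n-1 are seeded independently with
    probability ps; each edge (u, v) out of a changed element u then
    transmits the change to v with probability pt (each edge tried once).
    """
    out = {u: [] for u in range(n)}
    for u, v in edges:
        out[u].append(v)

    changed = {u for u in range(n) if rng.random() < ps}
    frontier = list(changed)
    while frontier:
        u = frontier.pop()
        for v in out[u]:
            if v not in changed and rng.random() < pt:
                changed.add(v)
                frontier.append(v)
    return changed


def mean_change_size(n, edges, ps, pt, trials=10_000, seed=0):
    """Average size of non-empty simulated changes."""
    rng = random.Random(seed)
    sizes = [len(c) for c in (simulate_change(n, edges, ps, pt, rng)
                              for _ in range(trials)) if c]
    return sum(sizes) / len(sizes) if sizes else 0.0
```

Comparing mean_change_size for a graph whose dependencies route through a rarely changing interface against one with direct dependencies is the kind of comparison the encapsulation experiments make.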
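16. Change Correlation Model
[Figure: elements e1, e2, e3 and e4 linked by pairwise change correlations r12, r13, r23 and r34]
- Contingency counts for e1 and e2 over the change history:
- A: e1 and e2 both changed; B: e1 changed, e2 not
- C: e2 changed, e1 not; D: neither changed
- $r_{12} = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$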
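The formula above is the phi coefficient of the 2×2 contingency table. A minimal sketch, assuming the change history is a list of sets of element names; returning 0.0 when a row or column of the table is empty is a convention chosen here, not something the slides specify.

```python
import math


def contingency(changes, e1, e2):
    """Count A, B, C, D for elements e1 and e2 over a change history.

    A: both changed, B: only e1, C: only e2, D: neither.
    """
    a = b = c = d = 0
    for change in changes:
        in1, in2 = e1 in change, e2 in change
        if in1 and in2:
            a += 1
        elif in1:
            b += 1
        elif in2:
            c += 1
        else:
            d += 1
    return a, b, c, d


def change_correlation(changes, e1, e2):
    """r = (AD - BC) / sqrt((A+B)(C+D)(A+C)(B+D))."""
    a, b, c, d = contingency(changes, e1, e2)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # undefined when a row or column is empty; 0 by convention
    return (a * d - b * c) / denom
```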
17. Outline (Experiments: Analyzing Code Factoring)
18. Evolvability in Code Factoring
- Can code pleiotropy help evolvability?
- Factoring code minimizes the number of necessary changes
- Can Genetic Programming in a changing environment discover this fact?
- Supply the genomes with an ADF (automatically defined function)
- Symbolic regression on y = A sin(Ax)
- A varies every five generations
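The changing environment can be sketched as a fitness function whose target y = A sin(Ax) redraws A every five generations. The range for A, the random seed, the sample points and the squared-error measure are all assumptions for illustration; the slides do not specify them.

```python
import math
import random


def amplitude_schedule(num_generations, period=5, low=1.0, high=4.0, seed=0):
    """Value of A for each generation; A is redrawn every `period` generations."""
    rng = random.Random(seed)
    schedule = []
    for g in range(num_generations):
        if g % period == 0:  # true at g == 0, so `a` is always set before use
            a = rng.uniform(low, high)
        schedule.append(a)
    return schedule


def fitness_error(program, a, xs):
    """Sum of squared errors against the current target y = A sin(Ax)."""
    return sum((program(x) - a * math.sin(a * x)) ** 2 for x in xs)
```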
19. Evolution of Evolvability over Time
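20. Typical solution
[Figure: a typical evolved program tree with an ADF0 branch and a result-producing branch (RPB), built from sin, cos, exp, - and /, calls to adf0, the input x, and constants such as 0.938, 0.645 and 0.610]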
21. Matching Correlations
[Figure: the highly correlated features Frequency and Amplitude of the Sine both map to the single ADF in the code ADF · sin(ADF · x)]
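Why the factored form helps can be shown by counting edits. This sketch models adapting to a new A as editing numeric constants, which is a simplification of how GP mutations actually act; all names are hypothetical. The unfactored program must update two constants, while ADF · sin(ADF · x) needs only one.

```python
import math


def unfactored(amp, freq):
    """Two independent constants must both track A."""
    return lambda x: amp * math.sin(freq * x)


def factored(adf0):
    """One ADF supplies both amplitude and frequency: ADF * sin(ADF * x)."""
    return lambda x: adf0 * math.sin(adf0 * x)


def constants_to_edit(old, new):
    """Number of constants that differ between two parameter tuples."""
    return sum(o != n for o, n in zip(old, new))


if __name__ == "__main__":
    # A changes from 2.0 to 3.0
    print(constants_to_edit((2.0, 2.0), (3.0, 3.0)))  # 2 (unfactored)
    print(constants_to_edit((2.0,), (3.0,)))          # 1 (factored)
```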
22. Drawbacks
- Unfortunately, EC with a dynamic fitness function doesn't scale well
- e.g. y = A sin(Ax) + B sin(Bx)
- Also, the change history is complicated and integrated with the EC process
- Study SECO using simpler models, in anticipation of applying the results to real code
23. Outline (Experiments: Analyzing Encapsulation)
24. Simple Models from SECO
- Two sets of experiments based more closely on SECO
- Encapsulation (splitting into interface/implementation)
- Change Propagation
- Interfaces reduce work, but lead to rare, catastrophic changes
- Good algorithms for finding modularities
- Change Correlation
- Compared several algorithms on simulated change sets
- Increasing difficulty in separating change sets
25. Real-World Software
- Applied results from the previous chapter to real code
- Three Open-Source Projects
- Jikes RVM
- a Java virtual machine
- Jakarta Tomcat
- a Java servlet container
- NetBeans
- an IDE based on JavaBeans
- Change history from CVS repositories
26. The Effectiveness of Encapsulation
- Interfaces improve evolvability, but
- They split work into small/frequent and rare/large changes
- Refine this idea
- Define elements as Java language elements, e.g.
- protected method body
- public interface
- static field
- Daily changes from CVS
- What types of language elements are touched each day?
27. Evolvability Metrics
- Likelihood
- Probability that an element is part of a change
- Impact
- Expected change size, given the element has changed
- Work
- Likelihood × Impact
- Acuteness
- Impact / Likelihood
- Acute Interfaces vs. Chronic Implementations
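The four metrics can be computed directly from a change history. A minimal sketch, assuming each change is a set of element names (for example, one set per day of CVS commits) and that change size means the number of elements in the change; the function name is hypothetical.

```python
def element_metrics(changes, element):
    """Likelihood, impact, work and acuteness of one element.

    changes: list of sets of elements, one set per change.
    """
    containing = [c for c in changes if element in c]
    if not containing:
        return {"likelihood": 0.0, "impact": 0.0, "work": 0.0, "acuteness": 0.0}
    likelihood = len(containing) / len(changes)                   # P(element in change)
    impact = sum(len(c) for c in containing) / len(containing)    # mean size given change
    return {
        "likelihood": likelihood,
        "impact": impact,
        "work": likelihood * impact,
        "acuteness": impact / likelihood,
    }


if __name__ == "__main__":
    history = [{"a", "b"}, {"a"}, {"b", "c", "d"}]
    print(element_metrics(history, "a"))
    print(element_metrics(history, "d"))
```

In this toy history, "d" changes rarely but only as part of a large change, so its acuteness is much higher than that of "a", which changes often in small changes: the acute-interface versus chronic-implementation pattern.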
28. Language Elements: Jikes
29. Evolution of Work: NetBeans
30. Highly Optimized Tolerance
- Doyle and Carlson 1999
- Engineering a system produces a heavy-tailed distribution of failures
- Conservation of Fragility
- Encapsulation corresponds to engineering the system
- Failure corresponds to change
- Programming by interfaces induces a Conservation of Change
31. Outline (Experiments: Optimizing Modularity)
32. Optimizing Package Structure
- Can we generate a package structure better than the existing one?
- Elements are files
- Changed if added, deleted, or touched
- Hourly granularity
- Partition files into packages
- Compare results with the current modularity, as expressed by unique directories
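Turning a CVS log into change sets at a fixed granularity can be sketched as bucketing file events by time window. This assumes the log has already been parsed into (timestamp, path) pairs; the parsing step and the function name are hypothetical. The same function covers the hourly granularity used here (hours=1) and the daily granularity used for the language-element study (hours=24).

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_changes(events, hours=1):
    """Group (timestamp, path) file events into change sets.

    Every file added, deleted or touched in the same window of `hours`
    hours forms one change set. Timestamps should be timezone-aware so
    window boundaries are reproducible; windows align to the Unix epoch (UTC).
    """
    buckets = defaultdict(set)
    for when, path in events:
        window = int(when.timestamp() // (hours * 3600))
        buckets[window].add(path)
    # Change sets in chronological order; windows with no activity are skipped.
    return [buckets[w] for w in sorted(buckets)]


if __name__ == "__main__":
    def at(hour, minute):
        return datetime(2004, 5, 3, hour, minute, tzinfo=timezone.utc)

    events = [(at(9, 5), "A.java"), (at(9, 50), "B.java"), (at(10, 1), "A.java")]
    print(bucket_changes(events))            # two hourly change sets
    print(bucket_changes(events, hours=24))  # one daily change set
```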
33. Modularity Metrics
- Why do we use modules?
- Aggregation
- Segregation
- Module design lies in the tension between these forces
- Two metrics to capture these forces
- Breadth: average number of modules touched per change
- Weight: average total size of the modules touched per change
34. Modularity Metrics, continued
- Breadth is trivially minimized by putting all files in one module
- Weight is trivially minimized by giving every file its own module
- Ideally we want to minimize both
[Figure: weight plotted against breadth; coarse-grained modularizations have low breadth and high weight, fine-grained ones the reverse, and the ideal minimizes both]
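Breadth and weight can be computed from a list of change sets and a file-to-module assignment. A minimal sketch, assuming module size is measured in files; the function name is hypothetical. The usage shows the trivial extremes: one big module gives breadth 1 but maximal weight.

```python
from collections import Counter


def breadth_and_weight(changes, module_of):
    """Breadth and weight of a modularization over a change history.

    changes:   list of sets of files (one set per change)
    module_of: dict mapping each file to a module id
    Breadth = average number of distinct modules touched per change.
    Weight  = average total size (in files) of the modules touched per change.
    """
    if not changes:
        raise ValueError("need at least one change")
    sizes = Counter(module_of.values())  # module id -> number of files
    total_breadth = total_weight = 0
    for change in changes:
        touched = {module_of[f] for f in change}
        total_breadth += len(touched)
        total_weight += sum(sizes[m] for m in touched)
    return total_breadth / len(changes), total_weight / len(changes)


if __name__ == "__main__":
    changes = [{"a"}, {"a", "c"}, {"b", "d"}]
    one_module = {f: 0 for f in "abcd"}
    own_module = {f: i for i, f in enumerate("abcd")}
    print(breadth_and_weight(changes, one_module))  # (1.0, 4.0): coarse-grained
    print(breadth_and_weight(changes, own_module))  # fine-grained extreme
```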
35. Clustering Algorithm
[Figure: elements linked by pairwise change correlations, with values ranging from -0.3 to 1.0]
36. Clustering Algorithm
37. ModPartition Algorithm
- Variant of the Kernighan-Lin algorithm
- A greedy algorithm, but able to move through fitness valleys
- Allows clusters to move across modules
- Adapted to generate module structure
- Use fitness instead of edge crossings
- Fitness ∝ breadth × weight
- Pre-set maximum number of modules
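A Kernighan-Lin style pass can be sketched as follows: every file is moved exactly once, each time to the best available module even if that makes fitness worse, and at the end only the best prefix of moves is kept. Accepting temporarily worse moves is what lets the search cross fitness valleys. This is a sketch under stated assumptions, not the dissertation's implementation: the fitness is assumed to be breadth × weight (lower is better), single files move rather than clusters, and all names are hypothetical.

```python
from collections import Counter


def fitness(changes, module_of):
    """Assumed fitness: breadth x weight (lower is better)."""
    sizes = Counter(module_of.values())
    breadth = weight = 0
    for change in changes:
        touched = {module_of[f] for f in change}
        breadth += len(touched)
        weight += sum(sizes[m] for m in touched)
    n = len(changes)
    return (breadth / n) * (weight / n)


def kl_pass(changes, module_of, k):
    """Move every file once (best move first), then keep the best prefix."""
    current = dict(module_of)
    start = fitness(changes, current)
    unlocked = set(current)
    history = []  # (file, old_module, fitness_after_move)
    while unlocked:
        best = None
        for f in sorted(unlocked):
            old = current[f]
            for m in range(k):
                if m == old:
                    continue
                current[f] = m
                score = fitness(changes, current)
                if best is None or score < best[0]:
                    best = (score, f, m, old)
            current[f] = old
        if best is None:  # k == 1: nowhere to move
            break
        score, f, m, old = best
        current[f] = m        # accept even if worse: crosses fitness valleys
        unlocked.discard(f)   # each file moves at most once per pass
        history.append((f, old, score))

    best_len, best_score = 0, start
    for i, (_, _, score) in enumerate(history, start=1):
        if score < best_score:
            best_len, best_score = i, score
    for f, old, _ in reversed(history[best_len:]):  # undo moves after best prefix
        current[f] = old
    return current, best_score


def mod_partition(changes, module_of, k, max_passes=20):
    """Repeat passes until a pass no longer improves fitness."""
    current, score = dict(module_of), fitness(changes, module_of)
    for _ in range(max_passes):
        candidate, new_score = kl_pass(changes, current, k)
        if new_score >= score:
            break
        current, score = candidate, new_score
    return current, score
```

Each pass evaluates the full fitness for every candidate move, which is simple but slow; that cost is consistent with the slides' remark that ModPartition is too slow for real code.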
38. FastModPartition Algorithm
- ModPartition is too slow for real code
- A quicker, recursive version of ModPartition
- First, divide into modules 0 and 1
- Divide module 0 into 0 and 2
- Divide module 1 into 1 and 3, and so on
- Stop after a predetermined limit, or when modules don't split anymore
- Two orders of magnitude faster than ModPartition
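The numbering scheme above (0 into 0 and 1; then 0 into 0 and 2, 1 into 1 and 3; and so on) can be generated mechanically, and a recursive driver can apply any two-way partitioner in that order. In this sketch, `bisect` is a hypothetical placeholder for ModPartition restricted to two modules; a split that leaves one side empty is treated as "the module does not split".

```python
def split_schedule(max_modules):
    """(parent, child) pairs: module m splits into m and m + width, level by level."""
    schedule = []
    width = 1  # number of modules before the current level
    while width < max_modules:
        for m in range(width):
            if m + width >= max_modules:
                break
            schedule.append((m, m + width))
        width *= 2
    return schedule


def fast_mod_partition(files, max_modules, bisect):
    """Recursive bisection driver.

    bisect(members) is a hypothetical two-way partitioner returning
    (stay, move); an empty side means the module does not split.
    """
    module_of = {f: 0 for f in files}
    for parent, child in split_schedule(max_modules):
        members = [f for f, m in module_of.items() if m == parent]
        if len(members) < 2:
            continue  # nothing left to split (or parent never created)
        stay, move = bisect(members)
        if not stay or not move:
            continue  # module did not split
        for f in move:
            module_of[f] = child
    return module_of


if __name__ == "__main__":
    print(split_schedule(4))  # [(0, 1), (0, 2), (1, 3)]
```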
39. Modularity Scores, Jikes
40. Jikes Evolution
[Figure: Jikes RVM change history, annotated with Package declarations, examples, jdp, on-stack replacement and JMTk]
41. Jikes Change Correlations
[Figure: change correlations in Jikes RVM, with groups labeled examples, on-stack replacement and JMTk]
42. Sample Module, Jikes RVM
- Module 4
- rvm/src/vm/arch/intel/runtime/VM_DynamicLinkerHelper.java
- rvm/src/vm/arch/powerPC/runtime/VM_DynamicLinkerHelper.java
- rvm/src/vm/compilers/optimizing/ir/util/OPT_BasicBlockEnumeration.java
- rvm/src/vm/compilers/optimizing/ir/util/OPT_IREnumeration.java
- rvm/src/vm/compilers/optimizing/ir/util/OPT_InstructionEnumeration.java
- Note the repeated names
- Intel and PowerPC architecture-specific files are grouped together
- Group by function, not implementation
43. Existing Structure
- vm/arch/powerPC/runtime/VM_DLH.java
- vm/arch/intel/runtime/VM_DLH.java
- ... (other architectures, directories and files)
(VM_DLH abbreviates VM_DynamicLinkerHelper)
44. Refactored Structure
- vm/runtime/VM_IntelDLH.java
- vm/runtime/VM_PowerPCDLH.java
- ... (other files)
45. Contributions
- Grounding software engineering in evolutionary history
- Made explicit the link between evolvability and code factoring, using EC
- Formed a link between HOT and software engineering
- Developed automated techniques that improved software modularity
- Devised new metrics for measuring evolvability
- Techniques discovered a package design principle
46. Conclusions
- FastModPartition is effective at optimizing the package structure of real code
- Group by functionality, not implementation
- Beyond the Conservation of Change
- Software evolvability is an important aspect of software design
- New insights in design principles
- New ways to optimize software
47. Biological and Software Evolvability
- Biological evolution: many generations, changing environment
- EC: many generations, changing fitness
- Human programmers: accumulated wisdom, changing requirements
[Figure: these processes linked to Design Principles, Evolvable Genotype, Software Evolvability, Biological Evolvability and Long-term Success]
48. Software Evolution Cycle
[Figure: cycle linking User Desire, Requirements, Desired Behavior, Software and Actual Behavior]
49. Effectiveness of Encapsulation
- By splitting elements into interface and implementation, can we reduce the expected change size?
- Undifferentiated Configuration
- 1000 elements, 7000 directed edges representing calls
- Split Configuration
- 2000 elements, 7000 call edges, 1000 implementation edges
- Average over 100,000 random graphs
50. Effectiveness of Encapsulation
[Figure: undifferentiated configuration of elements A to D, with probabilities 0.1 and 1/N]
51. Effectiveness of Encapsulation
[Figure: split configuration of elements A to D, with probabilities a1/N, (1-a1)/N, 0.5 and 1]
52. Encapsulation Minimizes Changes
- Undifferentiated
- Mean 3.254
- Median 2
- Split
- Mean 1.247
- Median 1
53. Likelihood vs. Impact Results
54. Acuteness Distributions
55. Optimizing Modularity
- Given different types of requirements, can we develop a good modularity?
- Requirements types
- Distinct
- Hierarchical
- Cross-Cutting
- Algorithms Compared
- Clustering
- ModPartition, FastModPartition
56. Types of Requirements
[Figure: change patterns (marked x) for Distinct, Hierarchical and Cross-Cutting requirements]
57. Hierarchical, N = 128, Pc = 0.1
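58. ModPartition Algorithm
[Figure: step with value 0.3]
59. ModPartition Algorithm
[Figure: step with value 0.2; one element marked X]
60. ModPartition Algorithm
[Figure: step with value -0.1; two elements marked X]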
61. Calculating Breadth
[Figure: grid of Files against Time, with legend Unchanged, Changed (x) and Module Changed]
- Modules touched at each time step: 2, 1, 2, 4, 1, 1, 3, 2
- Breadth = (2 + 1 + 2 + 4 + 1 + 1 + 3 + 2) / 8 = 16 / 8 = 2
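62. Calculating Weight
[Figure: grid of Files against Time, with legend Unchanged, Changed (x) and Module Changed]
- Total size of the touched modules at each time step: 5, 4, 3, 10, 4, 1, 6, 6
- Weight = (5 + 4 + 3 + 10 + 4 + 1 + 6 + 6) / 8 = 39 / 8 = 4.875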