Software Design and the Evolution of Evolvability

Transcript and Presenter's Notes

Title: Software Design and the Evolution of Evolvability


1
Software Design and the Evolution of Evolvability
  • Dissertation Talk
  • Terry Van Belle
  • May 3, 2004

2
Abstract
  • Software is often hard to change
  • Biology has already faced this problem
  • Software archeology
  • The SECO Model
  • Analysis and Synthesis
  • Code Factoring (Genetic Programming)
  • Encapsulation (Toy Problems and Real Code)
  • Module Optimization (Toy Problems and Real Code)
  • New Metrics

3
Outline
  • Introduction
  • The Wagner/Altenberg Model
  • The SECO Model
  • Experiments
  • Analyzing Code Factoring
  • Analyzing Encapsulation
  • Optimizing Modularity
  • Contributions
  • Conclusions

4
Introduction
  • Engineering goes through stages
  • Best practices
  • Theory
  • Further principles
  • Software engineering still largely on first stage
  • Evolution is an important but overlooked part of
    SE theory

5
Software Evolvability
  • Software evolves?
  • Software adapts to environmental changes
  • Beyond version 1.0
  • Environment = User Requirements
  • Evolvability = Capacity to evolve
  • Evolution driven by good mutations
  • Equal fitness, different evolvabilities
  • Software Evolvability
  • Ability to change software in response to changes
    in requirements
  • Short-term success vs. Long-term success

6
Biological Modularity
  • Module is
  • A complex of genes
  • Single purpose
  • Limited influence on other modules
  • How does biological modularity evolve?
  • Wagner/Altenberg (1995)
  • Modularity improves evolvability
  • Allows for independently evolving traits

7
Wagner/Altenberg Example
8
Wagner/Altenberg Example
[Figure: genes A, B, C mapped to the traits Coloring and Leg Length; labels: Polygeny, Pleiotropy]
9
Wagner/Altenberg Example
[Figure: genes A, B, C mapped to the traits Left Side and Right Side]
10
Code Polygeny and Pleiotropy
[Figure: code elements get_text(), check_word(), get_font() mapped to the features Spell Checker and Italicize; labels: Code Polygeny, Code Pleiotropy]
11
Software Archeology
  • In most software projects, we don't have access
    to traits (features)
  • But we do have the code change history
  • Exploit regularities in the change history
  • Improve evolvability by
  • Grouping together frequent changes
  • Minimizing interactions between modules
  • Metrics: Evolutionary vs. Static vs. Dynamic

12
The SECO Model
  • A Model of Software Change
  • Code divided into elements
  • Non-overlapping subsets of code
  • E = {e1, e2, ..., eN}
  • Elements linked together via changes
  • C = {c1, c2, ..., cn}
  • ci is a subset of E
  • Modeling Changes
  • Evolutionary Computation
  • Change Propagation and Correlation
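
The definitions on this slide can be written down directly as data. The following is a minimal sketch, not the author's implementation: the element names, the example changes, and the helpers `validate` and `co_change_counts` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Elements: non-overlapping pieces of code (names are hypothetical).
E = ["e1", "e2", "e3", "e4"]

# Changes: each change c_i is a subset of E (example data, made up).
C = [
    frozenset({"e1", "e2"}),
    frozenset({"e1", "e2", "e3"}),
    frozenset({"e3"}),
    frozenset({"e2", "e4"}),
]

def validate(elements, changes):
    """Check that every change is a non-empty subset of the element set."""
    universe = set(elements)
    return all(len(c) > 0 and c <= universe for c in changes)

def co_change_counts(changes):
    """Count how often each unordered pair of elements changes together."""
    counts = Counter()
    for c in changes:
        for a, b in combinations(sorted(c), 2):
            counts[(a, b)] += 1
    return counts

links = co_change_counts(C)
```

Here `links` is one simple way to read "elements linked together via changes": a pair of elements is linked as strongly as the number of changes that contain both.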

13
Change Propagation Example
[Figure: AreaAverager linked to Circle (radius), Square (side), and Rectangle (height, width)]
14
Change Propagation Example
[Figure: AreaAverager linked to Shape with area(), marked "rarely changes"; Circle, Square, and Rectangle shown under Shape]
15
Change Propagation Model
[Figure: change propagation graph over elements e1, e2, e3, e4; each element is labeled Ps and each edge Pt]
Ps = Seed Probability, Pt = Transmission Probability
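
One way to read this model is as a small stochastic simulation. The sketch below is an assumption-laden illustration: the edge list, the probability values, and the function name `simulate_change` are hypothetical, and the slide does not say whether a changed element gets more than one chance to transmit along an edge (here it gets exactly one).

```python
import random

def simulate_change(edges, elements, p_seed, p_transmit, rng):
    """Return the set of elements touched by one simulated change.

    edges maps an element to the elements a change can spread to.
    """
    # Seeding: each element independently starts a change with probability p_seed.
    changed = {e for e in elements if rng.random() < p_seed}
    frontier = list(changed)
    # Transmission: each newly changed element gets one try per outgoing edge.
    while frontier:
        src = frontier.pop()
        for dst in edges.get(src, ()):
            if dst not in changed and rng.random() < p_transmit:
                changed.add(dst)
                frontier.append(dst)
    return changed

elements = ["e1", "e2", "e3", "e4"]
# Hypothetical edges; the slide's actual graph is not recoverable.
edges = {"e1": ["e2", "e3"], "e2": ["e4"], "e3": ["e4"], "e4": []}

rng = random.Random(0)
sizes = [len(simulate_change(edges, elements, 0.25, 0.5, rng)) for _ in range(10_000)]
mean_size = sum(sizes) / len(sizes)
```

Averaging the change size over many runs, as in `mean_size`, gives an estimate of the expected work per change for a given graph.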
16
Change Correlation Model
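[Figure: correlation graph over elements e1, e2, e3, e4 with edge correlations r12, r13, r23, r34]
Co-change counts for e1 and e2:
          e2    !e2
  e1      A     B
  !e1     C     D
$r_{12} = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$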
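
The correlation on this slide has the form of the phi coefficient of a 2x2 contingency table. A minimal sketch of computing it from a list of change sets follows; the function name `phi` and the choice to return 0.0 when the denominator is zero are assumptions, not something the slides specify.

```python
from math import sqrt

def phi(changes, x, y):
    """Phi correlation between the change indicators of elements x and y."""
    a = sum(1 for c in changes if x in c and y in c)          # both changed
    b = sum(1 for c in changes if x in c and y not in c)      # only x changed
    c_ = sum(1 for c in changes if x not in c and y in c)     # only y changed
    d = sum(1 for c in changes if x not in c and y not in c)  # neither changed
    denom = sqrt((a + b) * (c_ + d) * (a + c_) * (b + d))
    # Assumption: treat the undefined case (an element that never or always
    # changes) as zero correlation.
    return (a * d - b * c_) / denom if denom else 0.0

history = [{"e1", "e2"}, {"e1", "e2"}, {"e3"}, {"e3"}]
r12 = phi(history, "e1", "e2")   # always change together
r13 = phi(history, "e1", "e3")   # never change together
```

In this made-up history, e1 and e2 always change together and e1 and e3 never do, so the two values sit at the extremes of the range.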
17
Outline
  • Introduction
  • The Wagner/Altenberg Model
  • The SECO Model
  • Experiments
  • Analyzing Code Factoring
  • Analyzing Encapsulation
  • Optimizing Modularity
  • Contributions
  • Conclusions

18
Evolvability in Code Factoring
  • Can code pleiotropy help evolvability?
  • Factoring code minimizes number of necessary
    changes
  • Can Genetic Programming in a changing environment
    discover this fact?
  • Supply the genomes with an ADF
  • Symbolic regression on y = A sin(Ax)
  • A varies every five generations
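
To make the setup concrete, here is a sketch of a changing target and of why a factored solution adapts with fewer edits. The schedule of A values, the error measure, and the names `amplitude_for`, `factored`, and `unfactored` are assumptions for illustration; the slides give only the target form and the five-generation period.

```python
import math

def amplitude_for(generation, schedule=(1.0, 2.0, 3.0), period=5):
    """Value of A in a given generation; it steps every `period` generations.

    The schedule values are made up for illustration.
    """
    return schedule[(generation // period) % len(schedule)]

def target(x, a):
    return a * math.sin(a * x)

def error(candidate, generation, xs):
    """Sum of absolute errors against the current target (lower is better)."""
    a = amplitude_for(generation)
    return sum(abs(candidate(x) - target(x, a)) for x in xs)

def factored(adf):
    # One shared value plays the role of the ADF: adf * sin(adf * x).
    return lambda x: adf * math.sin(adf * x)

def unfactored(amp, freq):
    # Amplitude and frequency are stored separately.
    return lambda x: amp * math.sin(freq * x)

xs = [i / 10 for i in range(-30, 31)]

# A steps from 1.0 to 2.0 at generation 5. Apply a single edit to each form.
one_edit_factored = error(factored(2.0), 5, xs)           # shared value updated
one_edit_unfactored = error(unfactored(2.0, 1.0), 5, xs)  # only amplitude updated
```

After one edit the factored form matches the new target exactly, while the unfactored form still needs a second edit to its frequency.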

19
Evolution of Evolvability over Time
20
Typical solution
[Figure: typical evolved solution, drawn as two expression trees labeled ADF0 and RPB; nodes include -, /, exp, sin, cos, adf0, x, and the constants 0.938, 0.645, 0.610]
21
Matching Correlations
[Figure: the features Frequency and Amplitude (highly correlated) and Sine, matched to the code ADF and ADF sin(ADF x)]
22
Drawbacks
  • Unfortunately, EC with dynamic fitness function
    doesn't scale well
  • y = A sin(Ax) + B sin(Bx)
  • Also, change history complicated, integrated with
    EC process
  • Study SECO using simpler models, in anticipation
    of applying results to real code

23
Outline
  • Introduction
  • The Wagner/Altenberg Model
  • The SECO Model
  • Experiments
  • Analyzing Code Factoring
  • Analyzing Encapsulation
  • Optimizing Modularity
  • Contributions
  • Conclusions

24
Simple Models from SECO
  • Two sets of experiments based more closely on
    SECO
  • Encapsulation: splitting into interface/implementation
  • Change Propagation
  • Interfaces reduce work, but lead to rare,
    catastrophic changes
  • Good algorithms for finding modularities
  • Change Correlation
  • Compared several algorithms on simulated change
    sets
  • Increasing difficulty in separating change sets

25
Real-World Software
  • Applied results from previous chapter to real
    code
  • Three Open-Source Projects
  • Jikes RVM
  • a Java virtual machine
  • Jakarta Tomcat
  • a Java servlet container
  • NetBeans
  • an IDE based on Java Beans
  • Change history from CVS repositories

26
The Effectiveness of Encapsulation
  • Interfaces improve evolvability, but
  • They split work into small/frequent and
    rare/large changes
  • Refine this idea
  • Define Elements as Java language elements, e.g.
  • protected method body
  • public interface
  • static field
  • Daily changes from CVS
  • What types of language elements are touched each
    day?

27
Evolvability Metrics
  • Likelihood
  • Probability that an element is part of a change
  • Impact
  • Expected change size, given element has changed
  • Work
  • Likelihood × Impact
  • Acuteness
  • Impact / Likelihood
  • Acute Interfaces vs Chronic Implementations
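
The four metrics can be computed from a list of change sets. This is a sketch under assumptions: it reads Work as Likelihood times Impact, measures change size as the number of elements in a change, and uses made-up element names and history; `evolvability_metrics` is a hypothetical helper.

```python
def evolvability_metrics(changes, element):
    """Likelihood, impact, work, and acuteness for one element.

    Returns None if the element never appears in a change.
    """
    hits = [c for c in changes if element in c]
    if not hits:
        return None
    likelihood = len(hits) / len(changes)            # P(element is in a change)
    impact = sum(len(c) for c in hits) / len(hits)   # mean size of those changes
    return {
        "likelihood": likelihood,
        "impact": impact,
        "work": likelihood * impact,
        "acuteness": impact / likelihood,
    }

# Made-up history: the implementation changes often and alone,
# the interface changes rarely but drags other elements with it.
history = [
    {"impl_a"},
    {"impl_a"},
    {"impl_a", "impl_b"},
    {"iface", "impl_a", "impl_b", "client"},
]

iface = evolvability_metrics(history, "iface")
impl = evolvability_metrics(history, "impl_a")
```

In this toy history the interface has low likelihood and high impact (acute), while the implementation has high likelihood and low impact (chronic).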

28
Language Elements - Jikes
29
Evolution of Work - NetBeans
30
Highly Optimized Tolerance
  • Doyle and Carlson 1999
  • Engineering a system produces a heavy-tailed
    distribution of failures
  • Conservation of Fragility
  • Encapsulation = Engineering the system
  • Failure = Change
  • Programming by interfaces induces a Conservation
    of Change

31
Outline
  • Introduction
  • The Wagner/Altenberg Model
  • The SECO Model
  • Experiments
  • Analyzing Code Factoring
  • Analyzing Encapsulation
  • Optimizing Modularity
  • Contributions
  • Conclusions

32
Optimizing Package Structure
  • Can we generate a package structure better than
    the existing one?
  • Elements are files
  • Changed if added, deleted, or touched
  • Hourly granularity
  • Partition files into packages
  • Compare results with current modularity, as
    expressed by unique directories

33
Modularity Metrics
  • Why do we use modules?
  • Aggregation
  • Segregation
  • Module design lies in the tension between these
    forces
  • Two metrics to capture these forces
  • Breadth: average number of modules touched
  • Weight: average total touched module size

34
Modularity Metrics, continued
  • Breadth is trivially minimized by putting all
    files in one module
  • Weight is trivially minimized by giving every
    file its own module
  • Ideally we want to minimize both

[Figure: plot of Weight against Breadth, marking Coarse-grained, Ideal, and Fine-grained module structures]
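
The two metrics and their trivial minimizers can be checked on a toy example. The sketch below assumes module size is measured in number of files and that both averages are taken per change; the file names, change history, and function name `breadth_and_weight` are hypothetical.

```python
from collections import Counter

def breadth_and_weight(module_of, changes):
    """Average modules touched per change, and average total size of those modules."""
    sizes = Counter(module_of.values())   # module id -> number of files
    breadths, weights = [], []
    for change in changes:
        touched = {module_of[f] for f in change}
        breadths.append(len(touched))
        weights.append(sum(sizes[m] for m in touched))
    n = len(changes)
    return sum(breadths) / n, sum(weights) / n

files = ["a", "b", "c", "d"]
changes = [{"a", "b"}, {"c"}, {"a", "c"}]

one_module = {f: 0 for f in files}                  # coarse-grained extreme
singletons = {f: i for i, f in enumerate(files)}    # fine-grained extreme
two_modules = {"a": 0, "b": 0, "c": 1, "d": 1}      # something in between

coarse = breadth_and_weight(one_module, changes)
fine = breadth_and_weight(singletons, changes)
middle = breadth_and_weight(two_modules, changes)
```

With everything in one module, breadth is at its minimum of 1 but weight equals the whole system; with one file per module, weight is at its minimum but breadth equals the number of files touched. The intermediate partition trades one against the other.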
35
Clustering Algorithm
[Figure: clustering example with pairwise values 0.5, -0.3, 0.1, 1.0, 0.7, 0.4, 0.7, 0.2, 0.9, -0.2, -0.1, 0.0]
36
Clustering Algorithm
37
ModPartition Algorithm
  • Variant of the Kernighan-Lin Algorithm
  • A greedy algorithm, but able to move through
    fitness valleys
  • Allows clusters to move across modules
  • Adapted to generate module structure
  • Use fitness instead of edge crossings
  • Fitness is computed from breadth and weight
  • Pre-set maximum number of modules

38
FastModPartition Algorithm
  • ModPartition is too slow for real code
  • A quicker, recursive version of ModPartition
  • First, divide into modules 0 and 1
  • Divide module 0 into 0 and 2
  • Divide module 1 into 1 and 3, and so on
  • Stop after predetermined limit, or when modules
    don't split anymore
  • Two orders of magnitude faster than ModPartition
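
The numbering scheme in these steps can be sketched as repeated bisection. This is only an outline of the recursion, not the algorithm itself: the real split step would be a two-module ModPartition run, whereas `halve` below is a hypothetical stand-in, and the rule "new id = old id + current step" is an assumption extrapolated from "0 and 2", "1 and 3".

```python
def recursive_bisect(items, bisect, max_modules):
    """Repeatedly split modules in two, numbering them as in the steps above.

    bisect(members) returns (keep, moved); if either part is empty,
    the module is treated as unsplittable.
    """
    modules = {0: list(items)}
    step = 1  # round 1: 0 -> (0, 1); round 2: 0 -> (0, 2), 1 -> (1, 3); ...
    while len(modules) < max_modules:
        split_any = False
        for mid in sorted(modules):          # snapshot of the current ids
            if len(modules) >= max_modules:
                break
            keep, moved = bisect(modules[mid])
            if keep and moved:
                modules[mid], modules[mid + step] = keep, moved
                split_any = True
        if not split_any:                    # modules don't split anymore
            break
        step *= 2
    return modules

def halve(members):
    """Stand-in splitter: cut the list in the middle (hypothetical)."""
    half = len(members) // 2
    return members[:half], members[half:]

four = recursive_bisect(list(range(8)), halve, max_modules=4)
full = recursive_bisect(list(range(8)), halve, max_modules=100)
```

Each round only ever works on two-way splits of existing modules, which is one plausible reason a recursive scheme like this would be much cheaper than optimizing over all modules at once.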

39
Modularity Scores, Jikes
40
Jikes Evolution
[Figure: Jikes change history, annotated with Package declarations, examples, jdp, on-stack replacement, JMTk]
41
Jikes Change Correlations
[Figure: Jikes change correlations, annotated with examples, on-stack replacement, JMTk]
42
Sample Module, Jikes RVM
  • Module 4
  • rvm/src/vm/arch/intel/runtime/VM_DynamicLinkerHelper.java
  • rvm/src/vm/arch/powerPC/runtime/VM_DynamicLinkerHelper.java
  • rvm/src/vm/compilers/optimizing/ir/util/OPT_BasicBlockEnumeration.java
  • rvm/src/vm/compilers/optimizing/ir/util/OPT_IREnumeration.java
  • rvm/src/vm/compilers/optimizing/ir/util/OPT_InstructionEnumeration.java
  • Note the repeated names
  • Intel and PowerPC architecture-specific files are
    grouped together
  • Group by function, not implementation

43
Existing Structure
vm
  arch
    powerPC
      runtime
        VM_DLH.java
        . . .
    intel
      runtime
        VM_DLH.java
        . . .
    . . .
44
Refactored Structure
vm
  runtime
    . . .
    VM_IntelDLH.java
    VM_PowerPCDLH.java
    . . .
45
Contributions
  • Grounding software engineering in evolutionary
    history
  • Made explicit the link between evolvability and
    code factoring, using EC
  • Formed a link between HOT and Software
    engineering
  • Developed automated techniques that improved
    software modularity
  • Devised new metrics for measuring evolvability
  • Techniques discovered a package design principle

46
Conclusions
  • FastModPartition is effective at optimizing
    package structure of real code
  • Group by functionality, not implementation
  • Beyond the Conservation of Change
  • Software Evolvability is an important aspect of
    Software Design
  • New insights in design principles
  • New ways to optimize software

47
Biological and Software Evolvability
  • Biological Evolution: many generations, changing environment
  • EC: many generations, changing fitness
  • Human programmers: accumulated wisdom, changing requirements
[Figure labels: Design Principles, Evolvable Genotype, Software Evolvability, Biological Evolvability, Long-term Success]
48
Software Evolution Cycle
[Figure: cycle linking User Desire, Requirements, Desired Behavior, Software, and Actual Behavior]
49
Effectiveness of Encapsulation
  • By splitting elements into interface and
    implementation, can we reduce the expected change
    size?
  • Undifferentiated Configuration
  • 1000 elements, 7000 directed edges representing
    calls
  • Split Configuration
  • 2000 elements, 7000 call edges, 1000
    implementation edges
  • Average over 100,000 random graphs

50
Effectiveness of Encapsulation
[Figure: undifferentiated configuration; elements A, B, C, D with probabilities 1/N and 0.1]
51
Effectiveness of Encapsulation
[Figure: split configuration; elements A, B, C, D each shown twice, with probabilities a1/N, (1-a1)/N, 0.5, and 1]
52
Encapsulation Minimizes Changes
  • Undifferentiated
  • Mean: 3.254
  • Median: 2
  • Split
  • Mean: 1.247
  • Median: 1

53
Likelihood vs. Impact Results
54
Acuteness Distributions
55
Optimizing Modularity
  • Given different types of requirements, can we
    develop a good modularity?
  • Requirements types
  • Distinct
  • Hierarchical
  • Cross-Cutting
  • Algorithms Compared
  • Clustering
  • ModPartition, FastModPartition

56
Types of Requirements
[Figure: example change patterns for Distinct, Hierarchical, and Cross-Cutting requirements]
57
Hierarchical, N = 128, Pc = 0.1
58
ModPartition Algorithm
[Figure: ModPartition example step; value shown: 0.3]
59
ModPartition Algorithm
[Figure: ModPartition example step; value shown: 0.2; one item marked X]
60
ModPartition Algorithm
[Figure: ModPartition example step; value shown: -0.1; two items marked X]
61
Calculating Breadth
[Figure: grid of Files against Time; legend: Unchanged, Changed (x), Module Changed]
Modules touched per time step: 2, 1, 2, 4, 1, 1, 3, 2
Breadth = 16 / 8 = 2
62
Calculating Weight
[Figure: grid of Files against Time; legend: Unchanged, Changed (x), Module Changed]
Total size of touched modules per time step: 5, 4, 3, 10, 4, 1, 6, 6
Weight = 39 / 8 = 4.875