Title: Refactoring Functional Programs
1Refactoring Functional Programs
- Simon Thompson
- with
- Huiqing Li
- Claus Reinke
- www.cs.kent.ac.uk/projects/refactor-fp
2Session 2
3Overview
- Review mini-project.
- Implementation of HaRe.
- Larger-scale examples.
- Case study.
4Mini-project feedback
- Refactorings performed.
- Refactorings and language features?
- Machine support feasible? Useful?
- Not-quite refactorings? Support possible here?
5Examples
- Argument permutations (NB partial application).
- (Un)group arguments.
- Slice function for a component of its result.
- Error handling / exception handling.
6More examples
- Introduce type synonym, selectively.
- Introduce branded type.
- Modify the return type of a function from T to Maybe T, Either T S, [T].
- Ditto for input types and modify variable names correspondingly.
7Implementing HaRe
8Proof of concept
- To show proof of concept it is enough to
- build a stand-alone tool,
- work with a subset of the language,
- pretty print the results of refactorings.
9 or a useful tool?
- Integrate with existing program development tools: the stand-alone program links to editors (emacs and vim); any other IDEs also possible.
- Work with the complete language: Haskell 98?
- Preserve the formatting and comments in the refactored source code.
- Allow users to extend and script the system.
10The refactorings in HaRe
- Rename
- Delete
- Lift / Demote
- Introduce definition
- Remove definition
- Unfold
- Generalise
- Add / remove params
- Move def between modules
- Delete / add to exports
- Clean imports
- Make imports explicit
- Data type to ADT
All these refactorings are module aware.
11The Implementation of HaRe
Information gathering
Pre-condition checking
Program transformation
Program rendering
12Information needed
- Syntax: replace the function called sq, not the variable sq (parse tree).
- Static semantics: replace this function sq, not all the sq functions (scope information).
- Module information: what is the traffic between this module and its clients (call graph).
- Type information: replace this identifier when it is used at this type (type annotations).
13Infrastructure decisions
- Build a tool that can interoperate with emacs and vim, yet act separately.
- Leverage existing libraries for processing Haskell 98 and for tree transformation, with as few modifications as possible.
- Be as portable as possible, in the Haskell space.
- Abstract interface to compiler internals?
14Haskell landscape (end 2002)
- Parser: many
- Type checker: few
- Tree transformations: few
- Difficulties
- Haskell 98 vs. Haskell extensions.
- Libraries: proof of concept vs. distributable.
- Source code regeneration.
- Real project
15Programatica
- Project at OGI to build a Haskell system with integral support for verification at various levels: assertion, testing, proof etc.
- The Programatica project has built a Haskell front end in Haskell, supporting syntax, static, type and module analysis.
- Freely available under BSD licence.
16The Implementation of HaRe
Information gathering
Pre-condition checking
Program transformation
Program rendering
17First steps lifting and friends
- Use the Haddock parser: full Haskell given in 500 lines of data type definitions.
- Work by hand over the Haskell syntax: 27 cases for expressions.
- Code for finding free variables, for instance.
18Finding free variables by hand
- instance FreeVbls HsExp where
-   freeVbls (HsVar v) = [v]
-   freeVbls (HsApp f e)
-     = freeVbls f ++ freeVbls e
-   freeVbls (HsLambda ps e)
-     = freeVbls e \\ concatMap paramNames ps
-   freeVbls (HsCase exp cases)
-     = freeVbls exp ++ concatMap freeVbls cases
-   freeVbls (HsTuple _ es)
-     = concatMap freeVbls es
-   etc.
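The by-hand style can be tried out on a much smaller syntax. The sketch below is not the Haddock data type: `Exp` and its four constructors are assumptions made for illustration, and each constructor still needs its own clause.

```haskell
import Data.List (nub)

-- A deliberately small, hypothetical expression type;
-- real Haskell syntax has many more cases.
data Exp
  = Var String
  | App Exp Exp
  | Lambda [String] Exp
  | Tuple [Exp]
  deriving (Show)

-- Free variables, written by hand: one clause per constructor.
freeVbls :: Exp -> [String]
freeVbls (Var v)       = [v]
freeVbls (App f e)     = nub (freeVbls f ++ freeVbls e)
freeVbls (Lambda ps e) = filter (`notElem` ps) (freeVbls e)
freeVbls (Tuple es)    = nub (concatMap freeVbls es)
```

Only the `Var` and `Lambda` clauses say anything specific about free variables; the other clauses just recurse, which is the boilerplate at issue.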
19This approach
- Boilerplate code: 1000 lines for 100 lines of significant code.
- Error prone: significant code lost in the noise.
- Want to generate the boilerplate and the tree traversals:
- DrIFT: Winstanley, Wallace
- Strafunski: Lämmel and Visser
20Strafunski
- Strafunski allows a user to write general (read generic), type safe, tree traversing programs, with ad hoc behaviour at particular points.
- Traversal schemes: top-down / bottom-up, type preserving / type unifying, full / stop / one.
21Strafunski in use
- Traverse the tree accumulating free variables from components, except in the case of lambda abstraction, local scopes, etc.
- Strafunski allows us to work within Haskell.
- Other options? Generic Haskell, Template Haskell, AG, etc.
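The idea of a generic traversal with ad hoc cases can be sketched without Strafunski. The example below swaps in `Data.Data` from the GHC base library (`gmapQ` and `cast`) rather than the Strafunski combinators, and the `Exp` type is a hypothetical stand-in for the real syntax tree.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, gmapQ)
import Data.List (nub)
import Data.Typeable (cast)

-- Hypothetical expression type, standing in for the full Haskell syntax.
data Exp
  = Var String
  | App Exp Exp
  | Lambda String Exp
  | Tuple [Exp]
  deriving (Show, Data)

-- Generic accumulation of free variables: only the two interesting
-- constructors are mentioned; every other node is handled by gmapQ,
-- which applies the function to all immediate children.
freeVbls :: Data a => a -> [String]
freeVbls node = case cast node of
  Just (Var v)      -> [v]
  Just (Lambda p e) -> filter (/= p) (freeVbls e)
  _                 -> nub (concat (gmapQ freeVbls node))
```

Adding a constructor to `Exp` needs no new clause unless it binds or mentions variables, which is the saving that generic traversal libraries aim for.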
22Rename an identifier
- rename :: (Term t) => PName -> HsName -> t -> Maybe t
- rename oldName newName = applyTP worker
-   where
-   worker = full_tdTP (idTP `adhocTP` idSite)
-
-   idSite :: PName -> Maybe PName
-   idSite v@(PN name orig)
-     | v == oldName
-       = return (PN newName orig)
-   idSite pn = return pn
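The role of the `orig` component can be seen in a small hand-written version. This is a sketch under assumptions: `PName` here pairs a name with a hypothetical integer identifying its binding site, and `Expr` is a toy syntax rather than the Programatica one.

```haskell
-- A name together with a (hypothetical) identifier for its binding site.
data PName = PN String Int
  deriving (Eq, Show)

data Expr
  = Var PName
  | App Expr Expr
  | Lam PName Expr
  deriving (Eq, Show)

-- Rename every occurrence that refers to the selected binding,
-- leaving other identifiers with the same spelling untouched.
rename :: PName -> String -> Expr -> Expr
rename old new = go
  where
    go (Var pn)   = Var (site pn)
    go (App f a)  = App (go f) (go a)
    go (Lam pn b) = Lam (site pn) (go b)

    site pn@(PN _ orig)
      | pn == old = PN new orig
      | otherwise = pn
```

Because identifiers are compared together with their binding site, a second variable that happens to be called `x` but is bound elsewhere keeps its name.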
23The coding effort
- Transformations: straightforward in Strafunski.
- The chore is implementing the conditions which ensure that the transformation preserves meaning.
- This is where much of our code lies.
24Move f from module A to B
- Is f defined at the top-level of B?
- Are the free variables in f accessible within module B?
- Will the move require recursive modules?
- Remove the definition of f from module A.
- Add the definition to module B.
- Modify the import/export lists in module A, B and the client modules of A and B if necessary.
- Change uses of A.f to B.f or f in all affected modules.
- Resolve ambiguity.
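The three questions at the top of the list are pre-conditions, and they can be phrased as a check that returns the reasons a move should be refused. The sketch below is not HaRe's code: the `Module` record, the `Problem` type and `checkMove` are hypothetical, and the recursive-module test only covers the case where A already imports B.

```haskell
-- Hypothetical summary of what is known about a module.
data Module = Module
  { modName  :: String
  , imports  :: [String]   -- names of imported modules
  , topLevel :: [String]   -- names defined at top level
  , visible  :: [String]   -- all names in scope (own and imported)
  }

data Problem
  = AlreadyDefinedInTarget
  | NotVisibleInTarget String
  | NeedsRecursiveModules
  deriving (Eq, Show)

-- checkMove f freeVs a b: problems with moving f (whose free variables
-- are freeVs) from module a to module b. An empty list means "go ahead".
checkMove :: String -> [String] -> Module -> Module -> [Problem]
checkMove f freeVs a b =
     [AlreadyDefinedInTarget | f `elem` topLevel b]
  ++ [NotVisibleInTarget v | v <- others, v `notElem` staysInA, v `notElem` visible b]
  ++ [NeedsRecursiveModules | not (null staysInA), modName b `elem` imports a]
  where
    others   = filter (/= f) freeVs
    -- Free variables defined in A: B would have to import A to see them.
    staysInA = filter (`elem` topLevel a) others
```

Returning a list of problems rather than a Boolean lets a tool report every failed condition to the user at once.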
25The Implementation of HaRe
Information gathering
Pre-condition checking
Program transformation
Program rendering
26Program rendering example
- -- This is an example
- module Main where
- sumSquares x y = sq x + sq y
-        where sq :: Int->Int
-              sq x = x ^ pow
-              pow = 2 :: Int
- main = sumSquares 10 20
- Promote the definition of sq to top level
27Program rendering example
- module Main where
- sumSquares x y
-   = sq pow x + sq pow y where pow = 2 :: Int
- sq :: Int->Int->Int
- sq pow x = x ^ pow
- main = sumSquares 10 20
- Using a pretty printer: comments lost and layout quite different.
28Program rendering example
- -- This is an example
- module Main where
- sumSquares x y = sq x + sq y
-        where sq :: Int->Int
-              sq x = x ^ pow
-              pow = 2 :: Int
- main = sumSquares 10 20
- Promote the definition of sq to top level
29Program rendering example
- -- This is an example
- module Main where
- sumSquares x y = sq pow x + sq pow y
-        where pow = 2 :: Int
- sq :: Int->Int->Int
- sq pow x = x ^ pow
- main = sumSquares 10 20
- Layout and comments preserved.
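The promotion itself can be checked in a compilable form. In this sketch the names `sumSquaresBefore` and `sumSquaresAfter` are introduced only so that both versions fit in one module; lifting `sq` turns its free variable `pow` into an extra parameter.

```haskell
-- Before: sq is local and uses pow from the enclosing where clause.
sumSquaresBefore :: Int -> Int -> Int
sumSquaresBefore x y = sq x + sq y
  where
    sq :: Int -> Int
    sq n = n ^ pow
    pow = 2 :: Int

-- After: sq lives at top level, so pow is passed explicitly.
sumSquaresAfter :: Int -> Int -> Int
sumSquaresAfter x y = sq pow x + sq pow y
  where
    pow = 2 :: Int

sq :: Int -> Int -> Int
sq pow n = n ^ pow
```

Both functions compute the same result for any pair of arguments, which is the meaning-preservation property the refactoring has to guarantee.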
30Token stream and AST
- White space and comments in the token stream.
- Modification of the AST guides the modification of the token stream.
- After a refactoring, the program source is extracted from the token stream, not the AST.
- Heuristics associate comments with program entities.
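A toy version of the token-stream idea is sketched below. The `Token` type and both functions are assumptions for illustration, far simpler than a real Haskell lexer: white space and comments are ordinary tokens, so rendering after an edit reproduces them exactly.

```haskell
-- Hypothetical token type: layout and comments are kept as tokens.
data Token
  = Ident String
  | Symbol String
  | White String
  | Comment String
  deriving (Eq, Show)

-- The source text is recovered by concatenating the token texts.
render :: [Token] -> String
render = concatMap text
  where
    text (Ident s)   = s
    text (Symbol s)  = s
    text (White s)   = s
    text (Comment s) = s

-- Rename an identifier in the stream; all other tokens are untouched.
renameIdent :: String -> String -> [Token] -> [Token]
renameIdent old new = map step
  where
    step (Ident s) | s == old = Ident new
    step t = t
```

Note that the identifier inside a comment is not renamed here; deciding what to do with such occurrences is the kind of heuristic the last bullet refers to.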
31Production tool
Programatica parser and type checker
Refactor using a Strafunski engine
Render code from the token stream and syntax tree.
32Production tool (optimised)
Programatica parser and type checker
Refactor using a Strafunski engine
Render code from the token stream and syntax tree.
Pass lexical information to update the syntax
tree and so avoid reparsing
33What have we learned?
- Emerging Haskell libraries make it practical(?)
- Efficiency and robustness
- type checking large systems,
- linking,
- editor script languages (vim, emacs).
- Limitations of editor interactions.
- Reflections on Haskell itself.
34Refactoring
- Refactoring comes in many forms:
- micro refactoring as a part of program development,
- major refactoring as a preliminary to revision,
- dealing with legacy code,
- as a part of debugging, understanding, etc.
35Reflections on Haskell
- Cannot hide items in an export list (cf import).
- Field names for prelude types?
- Scoped class instances not supported.
- Ambiguity vs. name clash.
- Tab is a nightmare!
- Correspondence principle fails
36Correspondence
- Operations on definitions and operations on expressions can be placed in one-to-one correspondence.
- (R.D. Tennent, 1980)
37Correspondence
- Definitions
- where
- f x y = e
- f x
-   | g1 = e1
-   | g2 = e2
- Expressions
- let
- \x y -> e
- f x = if g1 then e1 else if g2 ...
38Function clauses
- f x
-   | g1 = e1
- f x
-   | g2 = e2
- Can fall through a function clause: no direct correspondence in the expression language.
- f x = if g1 then e1 else if g2 ...
- No clauses for anonymous functions: no reason to omit them.
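Fall-through between clauses can be seen in a small runnable case. The function names below are invented for the illustration: when the guard of one clause fails, matching continues with the next clause, whereas the expression form has to nest the alternatives explicitly.

```haskell
-- Definition style: a failed guard falls through to the next clause.
classify :: Int -> String
classify n
  | n < 0 = "negative"
classify n
  | even n = "even"
classify _ = "odd"

-- Expression style: the same choices as nested conditionals.
classifyExpr :: Int -> String
classifyExpr n =
  if n < 0 then "negative" else if even n then "even" else "odd"
```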
39Work in progress
- Fold against definitions: find duplicate code.
- All, some or one? Effect on the interface:
- f x = ... e ... e ...
- Traditional program transformations
- Short-cut fusion
- Warm fusion
40Where next?
- Opening up to users: API or little language?
- Link with other IDEs (and front ends?).
- Detecting bad smells.
- More useful refactorings supported by us.
- Working without source code.
41API
Refactorings
Refactoring utilities
Strafunski
Haskell
42DSL
Combining forms
Refactorings
Refactoring utilities
Strafunski
Haskell
43Larger-scale examples
- More complex examples in the functional domain often link with data types.
- Dawning realisation that some refactorings are pretty powerful.
- Bidirectional: no right answer.
44Algebraic or abstract type?
data Tr a = Leaf a | Node a (Tr a) (Tr a)
flatten :: Tr a -> [a]
flatten (Leaf x) = [x]
flatten (Node _ s t) = flatten s ++ flatten t
Interface: Tr, Leaf, Node
45Algebraic or abstract type?
Interface: Tr, isLeaf, isNode, leaf, left, right, mkLeaf, mkNode
data Tr a = Leaf a | Node a (Tr a) (Tr a)
isLeaf = ...
isNode = ...
flatten :: Tr a -> [a]
flatten t
  | isLeaf t = [leaf t]
  | isNode t = flatten (left t) ++ flatten (right t)
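Both styles can be written out in one module for comparison. In this sketch the bodies of the discriminators, selectors and constructor functions are assumptions (the slides leave them elided), and `flattenPM` and `flattenADT` are names chosen so that the two versions can coexist.

```haskell
data Tr a = Leaf a | Node a (Tr a) (Tr a)

-- Algebraic style: clients pattern match on the constructors.
flattenPM :: Tr a -> [a]
flattenPM (Leaf x)     = [x]
flattenPM (Node _ s t) = flattenPM s ++ flattenPM t

-- Abstract style: clients see only these functions.
isLeaf, isNode :: Tr a -> Bool
isLeaf (Leaf _) = True
isLeaf _        = False
isNode = not . isLeaf

leaf :: Tr a -> a
leaf (Leaf x) = x
leaf _        = error "leaf: not a leaf"

left, right :: Tr a -> Tr a
left (Node _ l _) = l
left _            = error "left: not a node"
right (Node _ _ r) = r
right _            = error "right: not a node"

mkLeaf :: a -> Tr a
mkLeaf = Leaf

mkNode :: a -> Tr a -> Tr a -> Tr a
mkNode = Node

flattenADT :: Tr a -> [a]
flattenADT t
  | isLeaf t  = [leaf t]
  | otherwise = flattenADT (left t) ++ flattenADT (right t)
```

In a real program the abstract version would live in its own module whose export list omits `Leaf` and `Node`.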
46Algebraic or abstract type?
- In favour of the algebraic type:
- Pattern matching syntax is more direct
- but can achieve a considerable amount with field names.
- Other reasons? Simplicity (due to other refactoring steps?).
- In favour of the abstract type:
- Allows changes in the implementation type without affecting the client: e.g. might memoise.
- Problematic with a primitive type as carrier.
- Allows an invariant to be preserved.
47Outside or inside?
Interface: Tr, isLeaf, isNode, leaf, left, right, mkLeaf, mkNode
data Tr a = Leaf a | Node a (Tr a) (Tr a)
isLeaf = ...
isNode = ...
flatten :: Tr a -> [a]
flatten t
  | isLeaf t = [leaf t]
  | isNode t = flatten (left t) ++ flatten (right t)
48Outside or inside?
Interface: Tr, isLeaf, isNode, leaf, left, right, mkLeaf, mkNode, flatten
data Tr a = Leaf a | Node a (Tr a) (Tr a)
isLeaf = ...
isNode = ...
flatten = ...
49Outside or inside?
- In favour of outside:
- If inside and the type is reimplemented, need to reimplement everything in the signature, including flatten.
- The more outside the better, therefore.
- In favour of inside:
- If inside, can modify the implementation to memoise values of flatten, or to give a better implementation using the concrete type.
- Layered types possible: put the utilities in a privileged zone.
50Memoise flatten :: Tr a -> [a]
Before:
data Tree a
  = Leaf { val::a } |
    Node { val::a, left,right::(Tree a) }
leaf = Leaf
node = Node
flatten (Leaf x) = [x]
flatten (Node x l r) = (x : (flatten l ++ flatten r))
After:
data Tree a
  = Leaf { val::a, flatten::[a] } |
    Node { val::a, left,right::(Tree a), flatten::[a] }
leaf x = Leaf x [x]
node x l r = Node x l r (x : (flatten l ++ flatten r))
51Memoise flatten
- Invisible outside the implementation module, if tree type is already an ADT.
- Field names in Haskell make it particularly straightforward.
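The memoised representation compiles as it stands once type signatures are added. A minimal version: `flatten` becomes a field selector, and the smart constructors `leaf` and `node` compute the stored list once, when the tree is built.

```haskell
-- The flattened list is cached in every node as the field flatten.
data Tree a
  = Leaf { val :: a, flatten :: [a] }
  | Node { val :: a, left, right :: Tree a, flatten :: [a] }

-- Smart constructors: the only places where the cache is filled in.
leaf :: a -> Tree a
leaf x = Leaf x [x]

node :: a -> Tree a -> Tree a -> Tree a
node x l r = Node x l r (x : (flatten l ++ flatten r))
```

Clients that only use `leaf`, `node` and `flatten` cannot tell this version from the unmemoised one, provided the raw constructors are not exported.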
52Data type or existential type?
As a data type:
data Shape
  = Circle Float
  | Rect Float Float
area :: Shape -> Float
area (Circle r) = pi*r^2
area (Rect h w) = h*w
perim :: Shape -> Float
perim (Circle r) = 2*pi*r
perim (Rect h w) = 2*(h+w)
As an existential type:
data Shape
  = forall a. Sh a => Shape a
class Sh a where
  area :: a -> Float
  perim :: a -> Float
data Circle = Circle Float
instance Sh Circle where
  area (Circle r) = pi*r^2
  perim (Circle r) = 2*pi*r
data Rect = Rect Float Float
instance Sh Rect where
  area (Rect h w) = h*w
  perim (Rect h w) = 2*(h+w)
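The existential version needs a GHC extension to compile. The sketch below assumes `ExistentialQuantification`; `shapeArea` and `totalArea` are names added for the example to show how a client works with a heterogeneous list of shapes.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

class Sh a where
  area  :: a -> Float
  perim :: a -> Float

-- Any type with an Sh instance can be wrapped as a Shape.
data Shape = forall a. Sh a => Shape a

newtype Circle = Circle Float
data Rect = Rect Float Float

instance Sh Circle where
  area (Circle r)  = pi * r * r
  perim (Circle r) = 2 * pi * r

instance Sh Rect where
  area (Rect h w)  = h * w
  perim (Rect h w) = 2 * (h + w)

-- Unwrapping gives access only to the class operations.
shapeArea :: Shape -> Float
shapeArea (Shape s) = area s

totalArea :: [Shape] -> Float
totalArea = sum . map shapeArea
```

With the data type version a new operation is one new function and a new shape touches every function; with the existential version it is the other way round.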
53Constructor or constructor?
With plus as a derived function:
data Expr
  = Epsilon | .... |
    Then Expr Expr |
    Star Expr
plus e = Then e (Star e)
With Plus as a constructor:
data Expr
  = Epsilon | .... |
    Then Expr Expr |
    Star Expr |
    Plus Expr
54Monadification expressions
data Expr
  = Lit Integer        -- Literal integer value
  | Vbl Var            -- Assignable variables
  | Add Expr Expr      -- Expression addition: e1+e2
  | Assign Var Expr    -- Assignment: x:=e
type Var = String
type Store = [ (Var, Integer) ]
lookup :: Store -> Var -> Integer
lookup st x = head [ i | (y,i) <- st, y==x ]
update :: Store -> Var -> Integer -> Store
update st x n = (x,n):st
55Monadification evaluation
eval :: Expr -> Store -> (Integer, Store)
eval (Lit n) st
  = (n,st)
eval (Vbl x) st
  = (lookup st x,st)
evalST :: Expr -> State Store Integer
evalST (Lit n)
  = do
    return n
evalST (Vbl x)
  = do
    st <- get
    return (lookup st x)
56Monadification evaluation 2
eval :: Expr -> Store -> (Integer, Store)
eval (Add e1 e2) st
  = (v1+v2, st2)
    where
    (v1,st1) = eval e1 st
    (v2,st2) = eval e2 st1
eval (Assign x e) st
  = (v, update st' x v)
    where
    (v,st') = eval e st
evalST :: Expr -> State Store Integer
evalST (Add e1 e2)
  = do
    v1 <- evalST e1
    v2 <- evalST e2
    return (v1+v2)
evalST (Assign x e)
  = do
    v <- evalST e
    st <- get
    put (update st x v)
    return v
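The two evaluators can be run side by side. To keep the sketch self-contained it defines a minimal `State` type locally instead of importing one from a library; that local definition is an assumption of the example, not part of the slides.

```haskell
import Prelude hiding (lookup)

data Expr = Lit Integer | Vbl Var | Add Expr Expr | Assign Var Expr
type Var = String
type Store = [(Var, Integer)]

lookup :: Store -> Var -> Integer
lookup st x = head [i | (y, i) <- st, y == x]

update :: Store -> Var -> Integer -> Store
update st x n = (x, n) : st

-- Explicit store passing.
eval :: Expr -> Store -> (Integer, Store)
eval (Lit n) st = (n, st)
eval (Vbl x) st = (lookup st x, st)
eval (Add e1 e2) st = (v1 + v2, st2)
  where
    (v1, st1) = eval e1 st
    (v2, st2) = eval e2 st1
eval (Assign x e) st = (v, update st' x v)
  where
    (v, st') = eval e st

-- A minimal state monad, defined locally for the example.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad (State s) where
  State g >>= k = State $ \s -> let (a, s') = g s in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Monadic version: the store is threaded by the monad.
evalST :: Expr -> State Store Integer
evalST (Lit n) = return n
evalST (Vbl x) = do
  st <- get
  return (lookup st x)
evalST (Add e1 e2) = do
  v1 <- evalST e1
  v2 <- evalST e2
  return (v1 + v2)
evalST (Assign x e) = do
  v <- evalST e
  st <- get
  put (update st x v)
  return v
```

For any expression `e` and store `st`, `runState (evalST e) st` should agree with `eval e st`, which is the property the monadification refactoring has to preserve.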
57Classes and instances
- type Store = [(Var, Int)]
- empty :: Store
- empty = []
- get :: Var -> Store -> Int
- get v st = head [ i | (var,i) <- st, var==v ]
- set :: Var -> Int -> Store -> Store
- set v i = ((v,i):)
58Classes and instances
- type Store = [(Var, Int)]
- empty :: Store
- get :: Var -> Store -> Int
- set :: Var -> Int -> Store -> Store
- empty = []
- get v st = head [ i | (var,i) <- st, var==v ]
- set v i = ((v,i):)
59Classes and instances
- class Store a where
-   empty :: a
-   get :: Var -> a -> Int
-   set :: Var -> Int -> a -> a
- instance Store [(Var, Int)] where
-   empty = []
-   get v st = head [ i | (var,i) <- st, var==v ]
-   set v i = ((v,i):)
- Need newtype wrapper in Haskell 98
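The newtype wrapper mentioned in the last bullet can be written as follows. The wrapper name `ListStore` is an assumption; wrapping the association list gives the instance a head of the form Haskell 98 accepts.

```haskell
type Var = String

class Store a where
  empty :: a
  get   :: Var -> a -> Int
  set   :: Var -> Int -> a -> a

-- Hypothetical wrapper so that the instance head is a plain type constructor.
newtype ListStore = ListStore [(Var, Int)]

instance Store ListStore where
  empty = ListStore []
  get v (ListStore st)   = head [i | (var, i) <- st, var == v]
  set v i (ListStore st) = ListStore ((v, i) : st)
```

A client written against the `Store` class works unchanged if the list is later replaced by another representation with its own instance.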
60Not just programming
- Paper or presentation
- moving sections about; amalgamate sections; move inline code to a figure; animation; etc.
- Proof
- introduce lemma; remove, amalgamate hypotheses, etc.
- Program
- the topic of the lecture
61Evolving the evidence
- Dependable System Evolution is the software engineering grand challenge.
- Systems built with evidence of their dependability.
- But how to evolve the evidence with the system?
- Refactoring proofs, test coverage data etc.
62Understanding a program
- Take a working semantic tableau system written by an anonymous 2nd year student
- refactor to understand its behaviour.
- Nine stages of unequal size.
- Reflections afterwards.
63An example tableau
?((A?C)?((A?B)?C))
64v1 Name types
- Built-in types
- [Prop]
- [[Prop]]
- used for branches and tableaux respectively.
- Modify by adding
- type Branch = [Prop]
- type Tableau = [Branch]
- Change required throughout the program.
- Simple edit, but be aware of the order of substitutions: avoid
- type Branch = [Branch]
65v2 Rename functions
- Existing names
- tableaux
- removeBranch
- remove
- become
- tableauMain
- removeDuplicateBranches
- removeBranchDuplicates
- and add comments clarifying the (intended)
behaviour.
- Add test datum.
- Discovered some edits undone in stage 1.
- Use of the type checker to catch errors.
- test will be useful later?
66v3 Literate to normal script
- Change from literate form
- Comment
- > tableauMain tab
- > ...
- to
- -- Comment
- tableauMain tab
- ...
- Editing easier: implicit assumption was that it was a normal script.
- Could make the switch completely automatic?
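An automatic switch is easy to prototype for the simple case. The function below is a sketch assuming bird-track literate style only (code lines start with `>`); the name `unlit` is chosen for the example and it does not handle `\begin{code}` blocks.

```haskell
import Data.Char (isSpace)

-- Convert a bird-track literate script to a normal script:
-- code lines lose their "> " marker, text lines become comments,
-- blank lines are kept as they are.
unlit :: String -> String
unlit = unlines . map convert . lines
  where
    convert ('>' : ' ' : code) = code
    convert ('>' : code)       = code
    convert l
      | all isSpace l = l
      | otherwise     = "-- " ++ l
```

For example, the three-line script "Comment", blank line, "> tableauMain tab = ..." becomes "-- Comment", blank line, "tableauMain tab = ...".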
67v4 Modify function definitions
- From explicit recursion
- displayBranch
-   :: [Prop] -> String
- displayBranch [] = []
- displayBranch (x:xs)
-   = (show x) ++ "\n" ++
-     displayBranch xs
- to
- displayBranch
-   :: Branch -> String
- displayBranch
-   = concat . map (++"\n") . map show
- Abstraction: move from explicit list representation to operations such as map and concat which could be over any collection type.
- First time round added incorrect (but type correct) redefinition: only spotted at next stage.
- Version control: un/redo etc.
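Since a type-correct but wrong redefinition slipped through at this stage, it is worth checking the two versions against each other. In this sketch the functions are generalised to any `Show` type so that no `Prop` definition is needed, and the names `displayRec` and `displayMap` are introduced for the comparison.

```haskell
-- Explicit recursion over the list.
displayRec :: Show a => [a] -> String
displayRec []       = []
displayRec (x : xs) = show x ++ "\n" ++ displayRec xs

-- The same function as a pipeline of library operations.
displayMap :: Show a => [a] -> String
displayMap = concat . map (++ "\n") . map show
```

A test datum that exercises both definitions would have caught the incorrect redefinition immediately, rather than one stage later.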
68v5 Algorithms and types (1)
- removeBranchDup :: Branch -> Branch
- removeBranchDup [] = []
- removeBranchDup (x:xs)
-   | x == findProp x xs = [] ++ removeBranchDup xs
-   | otherwise = [x] ++ removeBranchDup xs
- findProp :: Prop -> Branch -> Prop
- findProp z [] = FALSE
- findProp z (x:xs)
-   | z == x = x
-   | otherwise = findProp z xs
69v5 Algorithms and types (2)
- removeBranchDup :: Branch -> Branch
- removeBranchDup [] = []
- removeBranchDup (x:xs)
-   | findProp x xs = [] ++ removeBranchDup xs
-   | otherwise = [x] ++ removeBranchDup xs
- findProp :: Prop -> Branch -> Bool
- findProp z [] = False
- findProp z (x:xs)
-   | z == x = True
-   | otherwise = findProp z xs
70v5 Algorithms and types (3)
- removeBranchDup :: Branch -> Branch
- removeBranchDup = nub
- findProp :: Prop -> Branch -> Bool
- findProp = elem
71v5 Algorithms and types (4)
- removeBranchDup :: Branch -> Branch
- removeBranchDup = nub
- Fails the test! Two duplicate branches output, with different ordering of elements.
- The algorithm used is the 'other' nub algorithm, nubVar:
- nub [1,2,0,2,1] = [1,2,0]
- nubVar [1,2,0,2,1] = [0,2,1]
- Code using lists in a particular order to represent sets.
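The difference between the two algorithms is whether the first or the last occurrence of each element survives. A minimal definition of `nubVar`, reconstructed from the behaviour of `removeBranchDup` on the earlier slides, is:

```haskell
import Data.List (nub)

-- Keep an element only if it does not occur later in the list,
-- so the LAST occurrence of each element survives.
nubVar :: Eq a => [a] -> [a]
nubVar [] = []
nubVar (x : xs)
  | x `elem` xs = nubVar xs
  | otherwise   = x : nubVar xs

-- Library nub keeps the FIRST occurrence of each element.
firstOccurrences :: Eq a => [a] -> [a]
firstOccurrences = nub
```

The two results contain the same elements, so the replacement is only a refactoring if no client depends on the order.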
72v6 Library function to module
- Add the definition
- nubVar
- to the module
- ListAux.hs
- and replace the definition by
- import ListAux
- Editing easier: implicit assumption was that it was a normal script.
- Could make the switch completely automatic?
73v7 Housekeeping
- Renamings: including foo and bar and contra (becomes notContra).
- An instance of filter,
- looseEmptyLists
- is defined using filter, and subsequently inlined.
- Put auxiliary function into a where clause.
- Generally cleans up the script for the next onslaught.
74v8 Algorithm (1)
- splitNotNot :: Branch -> Tableau
- splitNotNot ps = combine (removeNotNot ps) (solveNotNot ps)
- removeNotNot :: Branch -> Branch
- removeNotNot [] = []
- removeNotNot ((NOT (NOT _)):ps) = ps
- removeNotNot (p:ps) = p : removeNotNot ps
- solveNotNot :: Branch -> Tableau
- solveNotNot [] = [[]]
- solveNotNot ((NOT (NOT p)):_) = [[p]]
- solveNotNot (_:ps) = solveNotNot ps
75v8 Algorithm (2)
- splitXXX, removeXXX, solveXXX for each of nine rules.
- The algorithm applies rules in a prescribed order, using an integer value to pass information between functions.
- Aim: generic versions of split, remove, solve.
- Change order of rule application: effect on duplicates.
- Add map sort to top level pipeline before duplicate removal.
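One way to arrive at a generic split is to pass the rule in as a function. The sketch below is not the student's code or the refactored original: the `Prop` constructors other than NOT, the `Rule` type and the three sample rules are assumptions made for illustration.

```haskell
data Prop = Atom String | NOT Prop | AND Prop Prop | OR Prop Prop
  deriving (Eq, Show)

type Branch  = [Prop]
type Tableau = [Branch]

-- Hypothetical rule type: Nothing if the rule does not apply to a
-- proposition, otherwise the alternative lists of propositions
-- that replace it (one list per new branch).
type Rule = Prop -> Maybe [[Prop]]

-- Apply a rule to the first proposition it matches; the rest of the
-- branch is copied into every resulting branch.
split :: Rule -> Branch -> Tableau
split rule = go []
  where
    go seen []       = [reverse seen]
    go seen (p : ps) = case rule p of
      Just alts -> [alt ++ reverse seen ++ ps | alt <- alts]
      Nothing   -> go (p : seen) ps

notNotRule, andRule, orRule :: Rule
notNotRule (NOT (NOT p)) = Just [[p]]
notNotRule _             = Nothing
andRule (AND p q) = Just [[p, q]]
andRule _         = Nothing
orRule (OR p q) = Just [[p], [q]]
orRule _        = Nothing
```

With this shape each of the nine rules becomes a few lines of data, and the order of rule application is just the order of a list of rules.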
76v9 Replace lists by sets.
- Wholesale replacement of lists by a Set library.
- map becomes mapSet
- foldr becomes foldSet (careful!)
- filter becomes filterSet
- The library exposes the representation: pick, flatten.
- Use with discretion: further refactoring possible.
- Library needed to be augmented with
- primRecSet :: (a -> Set a -> b -> b) -> b -> Set a -> b
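The operations named above can be sketched over a list-backed set. This is not the Set library used in the case study: the representation (a sorted list without duplicates) and the function bodies are assumptions, and only the type of `primRecSet` comes from the slide.

```haskell
import Data.List (nub, sort)

-- Hypothetical representation: a sorted list without duplicates.
newtype Set a = Set [a]
  deriving (Eq, Show)

makeSet :: Ord a => [a] -> Set a
makeSet = Set . nub . sort

mapSet :: Ord b => (a -> b) -> Set a -> Set b
mapSet f (Set xs) = makeSet (map f xs)

filterSet :: (a -> Bool) -> Set a -> Set a
filterSet p (Set xs) = Set (filter p xs)

-- Careful: the result depends on element order unless the
-- function is insensitive to the order of its applications.
foldSet :: (a -> b -> b) -> b -> Set a -> b
foldSet f z (Set xs) = foldr f z xs

-- Primitive recursion: the step function also sees the remaining set.
primRecSet :: (a -> Set a -> b -> b) -> b -> Set a -> b
primRecSet _ z (Set [])       = z
primRecSet f z (Set (x : xs)) = f x (Set xs) (primRecSet f z (Set xs))
```

Because equality on `Set` compares normalised lists, two sets built from the same elements in different orders are equal, which removes the ordering problem met in stage 5.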
77v9 Replace lists by sets (2)
- Drastic simplification: no explicit worries about
- ordering (and equality), (removal of) duplicates.
- Hard to test intermediate stages: type change is all or nothing
- work with dummy definitions and the type checker.
- Further opportunities: why choose one rule from a set when could apply to all elements at once? Gets away from picking on one value (and breaking the set interface).
78Conclusions of the case study
- Heterogeneous process: some small, some large.
- Are all these stages strictly refactorings: some semantic changes always necessary too?
- Importance of type checking for hand refactoring and testing when any semantic changes.
- Undo, redo, reordering the refactorings: CVS.
- In this case, directional: not always the case.
79Teaching and learning design
- Exciting prospect of using a refactoring tool as an integral part of an elementary programming course.
- Learning a language: learn how you could modify the programs that you have written
- appreciate the design space, and
- the features of the language.
80Conclusions
- Refactoring functional programming: good fit.
- Real benefit from using available libraries, with work.
- Want to use the tool in building itself.
- Much more to do than we have time for.