Title: Scrap your boilerplate: generic programming in Haskell
1 Scrap your boilerplate: generic programming in Haskell
- Ralf Lämmel, Vrije University
- Simon Peyton Jones, Microsoft Research
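2 The problem: boilerplate code
[Figure: a company organisation tree. The Company node contains departments (Research, Production); each Dept has a Manager and sub-units, which are either Employees or nested Depts (Devt, Manuf). People carry salaries, e.g. Bill 15k and Fred 10k.]
Find all people in the tree and increase their salary by 10%.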
3 The problem: boilerplate code
    data Company  = C [Dept]
    data Dept     = D Name Manager [SubUnit]
    data SubUnit  = PU Employee | DU Dept
    data Employee = E Person Salary
    data Person   = P Name Address
    data Salary   = S Float
    type Manager  = Employee
    type Name     = String
    type Address  = String

    incSal :: Float -> Company -> Company
4 The problem: boilerplate code
    incSal :: Float -> Company -> Company
    incSal k (C ds) = C (map (incD k) ds)

    incD :: Float -> Dept -> Dept
    incD k (D n m us) = D n (incE k m) (map (incU k) us)

    incU :: Float -> SubUnit -> SubUnit
    incU k (PU e) = PU (incE k e)
    incU k (DU d) = DU (incD k d)

    incE :: Float -> Employee -> Employee
    incE k (E p s) = E p (incS k s)

    incS :: Float -> Salary -> Salary
    incS k (S f) = S (k*f)
5 Boilerplate is bad
- Boilerplate is tedious to write
- Boilerplate is fragile: it needs to be changed when the data type changes (schema evolution)
- Boilerplate obscures the key bits of code
6 Getting rid of boilerplate
- Use an un-typed language, with a fixed collection of data types
- Convert to a universal type and write (untyped) traversals over that
- Use reflection to query types and traverse child nodes
7 Getting rid of boilerplate
- Generic (aka polytypic) programming: define a function by induction over the (structure of the) type of its argument
- PhD required. Elegant only for totally generic functions (read, show, equality)
    generic inc<t> :: Float -> t -> t
    inc<1>   k Unit    = Unit
    inc<a+b> k (Inl x) = Inl (inc<a> k x)
    inc<a+b> k (Inr y) = Inr (inc<b> k y)
    inc<a*b> k (x, y)  = (inc<a> k x, inc<b> k y)
8 Our solution
- Generic programming for the rest of us
- Typed language
- Works for arbitrary data types: parameterised, mutually recursive, nested...
- No encoding to/from some other type
- Very modest language support
- Elegant application of Haskell's type classes
9 Our solution
    incSal :: Float -> Company -> Company
    incSal k = everywhere (mkT (incS k))

    incS :: Float -> Salary -> Salary
    incS k (S f) = S (k*f)
10 Two ingredients
    incSal :: Float -> Company -> Company
    incSal k = everywhere (mkT (incS k))

    incS :: Float -> Salary -> Salary
    incS k (S f) = S (k*f)
1. Build the function to apply to every node, from incS
2. Apply a function to every node in the tree
11 Type classes
    member :: a -> [a] -> Bool
    member x []                 = False
    member x (y:ys) | x==y      = True
                    | otherwise = member x ys
No! member is not truly polymorphic: it does not work for any type a, only for those on which equality is defined.
12 Type classes
    member :: Eq a => a -> [a] -> Bool
    member x []                 = False
    member x (y:ys) | x==y      = True
                    | otherwise = member x ys
The class constraint "Eq a" says that member only works on types that belong to class Eq.
13 Type classes
    class Eq a where
      (==) :: a -> a -> Bool

    instance Eq Int where
      (==) i1 i2 = eqInt i1 i2

    instance (Eq a) => Eq [a] where
      (==) []     []     = True
      (==) (x:xs) (y:ys) = (x == y) && (xs == ys)
      (==) xs     ys     = False

    member :: Eq a => a -> [a] -> Bool
    member x []                 = False
    member x (y:ys) | x==y      = True
                    | otherwise = member x ys
14 Implementing type classes
    data Eq a = MkEq (a->a->Bool)
    eq (MkEq e) = e

    dEqInt :: Eq Int
    dEqInt = MkEq eqInt

    dEqList :: Eq a -> Eq [a]
    dEqList (MkEq e) = MkEq el
      where el []     []     = True
            el (x:xs) (y:ys) = x `e` y && xs `el` ys
            el xs     ys     = False

    member :: Eq a -> a -> [a] -> Bool
    member d x []                   = False
    member d x (y:ys) | eq d x y    = True
                      | otherwise   = member d x ys
- Class witnessed by a "dictionary" of methods
- Instance declarations create dictionaries
- Overloaded functions take extra dictionary parameter(s)
15 Ingredient 1: type extension
- (mkT f) is a function that
  - behaves just like f on arguments whose type is compatible with f's,
  - behaves like the identity function on all other arguments
- So applying (mkT (incS k)) to all nodes in the tree will do what we want.
16 Type safe cast
    cast :: (Typeable a, Typeable b) => a -> Maybe b

    ghci> (cast 'a') :: Maybe Char
    Just 'a'
    ghci> (cast 'a') :: Maybe Bool
    Nothing
    ghci> (cast True) :: Maybe Bool
    Just True
17 Type extension
    mkT :: (Typeable a, Typeable b) => (a->a) -> (b->b)
    mkT f = case cast f of
              Just g  -> g
              Nothing -> id

    ghci> (mkT not) True
    False
    ghci> (mkT not) 'a'
    'a'
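The two slides above can be tried outside GHCi as well. What follows is a minimal sketch, assuming only `Data.Typeable` from GHC's `base` library, whose `cast` has the type shown on the slide. `mkT` is defined locally as on the slide, and the names `castExamples` and `mkTExamples` are hypothetical, introduced only for illustration.

```haskell
import Data.Typeable (Typeable, cast)

-- Type extension as on the slide: behave like f where the types match,
-- and like the identity function everywhere else.
mkT :: (Typeable a, Typeable b) => (a -> a) -> (b -> b)
mkT f = case cast f of
  Just g  -> g
  Nothing -> id

-- Hypothetical name: the three casts from the "Type safe cast" slide.
castExamples :: (Maybe Char, Maybe Bool, Maybe Bool)
castExamples = (cast 'a', cast 'a', cast True)

-- Hypothetical name: the two mkT applications from the slide.
mkTExamples :: (Bool, Char)
mkTExamples = (mkT not True, mkT not 'a')
```

The cast in `mkT` is applied to the function `f` itself, not to its argument: it asks whether the type `a -> a` equals `b -> b`.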
18 Implementing cast
    data TypeRep
    instance Eq TypeRep
    mkRep :: String -> [TypeRep] -> TypeRep

    class Typeable a where
      typeOf :: a -> TypeRep

    instance Typeable Int where
      typeOf i = mkRep "Int" []
- The argument of typeOf: an Int, perhaps
- typeOf is guaranteed not to evaluate its argument
19 Implementing cast
    class Typeable a where
      typeOf :: a -> TypeRep

    instance (Typeable a, Typeable b) => Typeable (a,b) where
      typeOf p = mkRep "(,)" [ta,tb]
        where ta = typeOf (fst p)
              tb = typeOf (snd p)
20 Implementing cast
    cast :: (Typeable a, Typeable b) => a -> Maybe b
    cast x = r
      where
        r = if typeOf x == typeOf (get r)
              then Just (unsafeCoerce x)
              else Nothing

        get :: Maybe a -> a
        get x = undefined
21 Implementing cast
- In GHC
  - Typeable instances are generated automatically by the compiler for any data type
  - The definition of cast is in a library
- Then cast is sound
- Bottom line: cast is best thought of as a language extension, but it is an easy one to implement. All the hard work is done by type classes
22 Two ingredients
    incSal :: Float -> Company -> Company
    incSal k = everywhere (mkT (incS k))

    incS :: Float -> Salary -> Salary
    incS k (S f) = S (k*f)
1. Build the function to apply to every node, from incS
2. Apply a function to every node in the tree
23 Ingredient 2: traversal
- Step 1: implement one-layer traversal
- Step 2: extend one-layer traversal to recursive traversal of the entire tree
24 One-layer traversal
    class Typeable a => Data a where
      gmapT :: (forall b. Data b => b -> b) -> a -> a

    instance Data Int where
      gmapT f x = x

    instance (Data a, Data b) => Data (a,b) where
      gmapT f (x,y) = (f x, f y)
(gmapT f x) applies f to the IMMEDIATE CHILDREN of x
25 One-layer traversal
    class Typeable a => Data a where
      gmapT :: (forall b. Data b => b -> b) -> a -> a

    instance (Data a) => Data [a] where
      gmapT f []     = []
      gmapT f (x:xs) = f x : f xs   -- !!!
gmapT's argument is a polymorphic function, so gmapT has a rank-2 type
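To see what "immediate children" means in practice, here is a small sketch that uses the `gmapT` method of the `Data` class that ships in GHC's `Data.Data` module, rather than the simplified class on the slide. `mkT` is defined locally as on the earlier slide; `bump`, `pairResult` and `listResult` are hypothetical names chosen for this illustration.

```haskell
import Data.Data (Data, gmapT)
import Data.Typeable (Typeable, cast)

-- Type extension, as defined on the "Type extension" slide.
mkT :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT f = case cast f of
  Just g  -> g
  Nothing -> id

-- Hypothetical generic transformation: add one to Ints, leave
-- every other type alone.
bump :: Data a => a -> a
bump = mkT ((+ 1) :: Int -> Int)

-- Both components of a pair are immediate children, so both change.
pairResult :: (Int, Int)
pairResult = gmapT bump (1, 2)

-- The children of (x:xs) are the head x and the tail xs. The tail has
-- type [Int], not Int, so bump leaves it untouched: only the head changes.
listResult :: [Int]
listResult = gmapT bump [1, 2, 3]
```

Under these definitions `pairResult` is `(2, 3)` and `listResult` is `[2, 2, 3]`, which is why a separate recursive step is needed to reach every node.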
26 Step 2: Now traversals are easy!
    everywhere :: Data a => (forall b. Data b => b -> b) -> a -> a
    everywhere f x = f (gmapT (everywhere f) x)
27 Many different traversals!
    everywhere, everywhere' :: Data a => (forall b. Data b => b -> b) -> a -> a

    everywhere  f x = f (gmapT (everywhere f) x)      -- Bottom up
    everywhere' f x = gmapT (everywhere' f) (f x)     -- Top down
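Putting both ingredients together gives a complete program. The sketch below assumes GHC with the `DeriveDataTypeable` and `RankNTypes` extensions and uses the `Data` class from `Data.Data` in `base`. `mkT`, `everywhere` and `everywhere'` are defined locally, following the slides, rather than imported from a library. The sample value `genCom` and its addresses are hypothetical.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, gmapT)
import Data.Typeable (Typeable, cast)

data Company  = C [Dept]                 deriving (Eq, Show, Data)
data Dept     = D Name Manager [SubUnit] deriving (Eq, Show, Data)
data SubUnit  = PU Employee | DU Dept    deriving (Eq, Show, Data)
data Employee = E Person Salary          deriving (Eq, Show, Data)
data Person   = P Name Address           deriving (Eq, Show, Data)
data Salary   = S Float                  deriving (Eq, Show, Data)
type Manager  = Employee
type Name     = String
type Address  = String

-- Ingredient 1: type extension.
mkT :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT f = case cast f of
  Just g  -> g
  Nothing -> id

-- Ingredient 2: recursive traversal built from the one-layer gmapT.
everywhere, everywhere' :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhere  f x = f (gmapT (everywhere f) x)    -- bottom up
everywhere' f x = gmapT (everywhere' f) (f x)   -- top down

incS :: Float -> Salary -> Salary
incS k (S f) = S (k * f)

incSal :: Float -> Company -> Company
incSal k = everywhere (mkT (incS k))

-- Hypothetical sample data, loosely based on the tree on slide 2.
genCom :: Company
genCom = C [ D "Research" (E (P "Fred" "London") (S 10000))
               [PU (E (P "Bill" "Paris") (S 15000))] ]
```

Here `k` is a multiplier, as in `incS` on the slides, so `incSal 1.1 genCom` is the 10% raise. Adding a constructor or a whole new data type to the schema requires no change to `incSal`, only a `deriving Data` clause.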
28 More perspicuous types
    everywhere :: Data a => (forall b. Data b => b -> b) -> a -> a

    everywhere :: (forall b. Data b => b -> b) -> (forall a. Data a => a -> a)

    type GenericT = forall a. Data a => a -> a
    everywhere :: GenericT -> GenericT
Aha!
29 What is "really going on"?
    inc :: Data t => Float -> t -> t
- The magic of type classes passes an extra argument to inc that contains
  - The function gmapT
  - The function typeOf
- A call of (mkT incS), done at every node in the tree, entails a comparison of the TypeRep returned by the passed-in typeOf with a fixed TypeRep for Salary; this is precisely a dynamic type check
30 Summary so far
- Solution consists of
  - A little user-written code
  - Mechanically generated instances for Typeable and Data for each data type
  - A library of combinators (cast, mkT, everywhere, etc)
- Language support
  - cast
  - rank-2 types
- Efficiency is so-so (a factor of 2-3 slower, with no optimisation effort)
31 Summary so far
- Robust to data type evolution
- Works easily for weird data types
    data Rose a = MkR a [Rose a]

    instance (Data a) => Data (Rose a) where
      gmapT f (MkR x rs) = MkR (f x) (f rs)

    data Flip a b = Nil | Cons a (Flip b a)
    -- Etc...
32 Generalisations
- With this same language support, we can do much more
  - generic queries
  - generic monadic operations
  - generic folds
  - generic zips (e.g. equality)
33 Generic queries
- Add up the salaries of all the employees in the tree
    salaryBill :: Company -> Float
    salaryBill = everything (+) (0 `mkQ` billS)

    billS :: Salary -> Float
    billS (S f) = f
1. Build the function to apply to every node, from billS
2. Apply the function to every node in the tree, and combine results with (+)
34 Type extension again
    mkQ :: (Typeable a, Typeable b) => d -> (b->d) -> a -> d
    (d `mkQ` q) a = case cast a of
                      Just b  -> q b
                      Nothing -> d

    ghci> (22 `mkQ` ord) 'a'
    97
    ghci> (22 `mkQ` ord) True
    22
Apply 'q' if its type fits, otherwise return 'd'. Here ord :: Char -> Int.
35 Traversal again
    class Typeable a => Data a where
      gmapT :: (forall b. Data b => b -> b) -> a -> a
      gmapQ :: forall r. (forall b. Data b => b -> r) -> a -> [r]
gmapQ: apply a function to all children of this node, and collect the results in a list
36 Traversal again
    class Typeable a => Data a where
      gmapT :: (forall b. Data b => b -> b) -> a -> a
      gmapQ :: forall r. (forall b. Data b => b -> r) -> a -> [r]

    instance Data Int where
      gmapQ f x = []

    instance (Data a, Data b) => Data (a,b) where
      gmapQ f (x,y) = f x : f y : []
37 The query traversal
    everything :: Data a => (r->r->r) -> (forall b. Data b => b -> r) -> a -> r
    everything k f x = foldl k (f x) (gmapQ (everything k f) x)
Note that the choice of foldr vs foldl is made in the traversal, not in gmapQ
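The query combinators can be exercised on the company example in the same way as the transformations. The following is a sketch under the same assumptions as before: GHC with `DeriveDataTypeable` and `RankNTypes`, the `Data` class from `Data.Data`, and `mkQ` and `everything` defined locally after the slides. The sample value `genCom` is hypothetical.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, gmapQ)
import Data.Typeable (Typeable, cast)

data Company  = C [Dept]                 deriving (Show, Data)
data Dept     = D Name Manager [SubUnit] deriving (Show, Data)
data SubUnit  = PU Employee | DU Dept    deriving (Show, Data)
data Employee = E Person Salary          deriving (Show, Data)
data Person   = P Name Address           deriving (Show, Data)
data Salary   = S Float                  deriving (Show, Data)
type Manager  = Employee
type Name     = String
type Address  = String

-- Query extension: apply q if its argument type fits, else return d.
mkQ :: (Typeable a, Typeable b) => r -> (b -> r) -> a -> r
mkQ d q a = case cast a of
  Just b  -> q b
  Nothing -> d

-- Query every node and combine the results with k. Note that k is
-- passed along in the recursive call.
everything :: Data a => (r -> r -> r) -> (forall b. Data b => b -> r) -> a -> r
everything k f x = foldl k (f x) (gmapQ (everything k f) x)

billS :: Salary -> Float
billS (S f) = f

salaryBill :: Company -> Float
salaryBill = everything (+) (0 `mkQ` billS)

-- Hypothetical sample data.
genCom :: Company
genCom = C [ D "Research" (E (P "Fred" "London") (S 10000))
               [PU (E (P "Bill" "Paris") (S 15000))] ]
```

Every node that is not a `Salary`, including each character of the name strings, contributes the default `0`, so `salaryBill genCom` sums exactly the two salaries.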
38 Looking for one result
- By making the result type be (Maybe r), we can find the first (or last) satisfying value (laziness)
    findDept :: String -> Company -> Maybe Dept
    findDept s = everything orElse (Nothing `mkQ` findD s)

    findD :: String -> Dept -> Maybe Dept
    findD s d@(D s' _ _) = if s==s' then Just d else Nothing
39 Monadic transforms
    class Typeable a => Data a where
      gmapT :: (forall b. Data b => b -> b) -> a -> a
      gmapQ :: forall r. (forall b. Data b => b -> r) -> a -> [r]
      gmapM :: Monad m => (forall b. Data b => b -> m b) -> a -> m a
40 Where do we stop?
- Happily, we can generalise all three gmaps into one
    data Employee = E Person Salary

    instance Data Employee where
      gfoldl k z (E p s) = (z E `k` p) `k` s
- We can define gmapT, gmapQ, gmapM in terms of (suitably parameterised) gfoldl
- The type of gfoldl hurts the brain (but the definitions are all easy)
41 Where do we stop?
    class Typeable a => Data a where
      gfoldl :: (forall a b. Data a => c (a -> b) -> a -> c b)
             -> (forall g. g -> c g)
             -> a -> c a
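As an illustration of "suitably parameterised", here is one way gmapT might be recovered from gfoldl: instantiate the type constructor `c` to an identity wrapper, so that the fold simply rebuilds the constructor application with `f` applied to each child. This is a sketch against the `gfoldl` method of `Data.Data` in GHC's `base`; the names `Id`, `gmapT'` and `bump` are hypothetical and local to this example.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Data (Data, gfoldl)
import Data.Typeable (Typeable, cast)

-- Identity wrapper used to instantiate the 'c' in gfoldl's type.
newtype Id a = Id { unId :: a }

-- gmapT recovered from gfoldl: start from the bare constructor (wrapped
-- by Id) and apply it to each child in turn, transformed by f.
gmapT' :: Data a => (forall b. Data b => b -> b) -> a -> a
gmapT' f x = unId (gfoldl k Id x)
  where
    k :: Data d => Id (d -> e) -> d -> Id e
    k (Id c) y = Id (c (f y))

-- Type extension, as on the earlier slide.
mkT :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT f = case cast f of
  Just g  -> g
  Nothing -> id

-- Hypothetical generic transformation used to try gmapT' out.
bump :: Data a => a -> a
bump = mkT ((+ 1) :: Int -> Int)
```

With these definitions `gmapT' bump ((1, 2) :: (Int, Int))` gives `(2, 3)`, matching the one-layer behaviour of gmapT on pairs.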
42 But we still can't do show!
- Want show :: Data a => a -> String
    show :: Data a => a -> String
    show t = ??? ++ concat (gmapQ show t)
The right-hand part shows the children and concatenates the results. But how to show the constructor?
43 Add more to class Data
    class Data a where
      toConstr :: a -> Constr

    data Constr   -- abstract
    conString :: Constr -> String
    conFixity :: Constr -> Fixity
- Very like typeOf :: Typeable a => a -> TypeRep, except only for data types, not functions
44 So here is show
    show :: Data a => a -> String
    show t = conString (toConstr t) ++ concat (gmapQ show t)
- Simple refinements to deal with parentheses, infix constructors etc
- toConstr on a primitive type (like Int) yields a Constr whose conString displays the value
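A runnable variant of this generic show can be written against the `Data.Data` module in GHC's `base`, where the function corresponding to the slide's conString is named `showConstr`. The sketch below adds the simplest possible refinement, parenthesising every node and separating children with spaces. The names `gshow`, `Tree` and `example` are chosen for this illustration.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, gmapQ, showConstr, toConstr)

-- Generic show: the constructor name followed by the shown children.
-- Every node is parenthesised, which is crude but unambiguous.
gshow :: Data a => a -> String
gshow t =
  "(" ++ showConstr (toConstr t) ++ concatMap (' ' :) (gmapQ gshow t) ++ ")"

-- Hypothetical data type used only to try gshow out.
data Tree = Leaf | Node Tree Int Tree
  deriving Data

example :: String
example = gshow (Node Leaf 3 Leaf)
```

For the `Int` child there are no sub-terms, and its constructor representation displays the value itself, as the last bullet above describes.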
45 Further generic functions
- read :: Data a => String -> a
- toBin :: Data a => a -> [Bit]
- fromBin :: Data a => [Bit] -> a
- testGen :: Data a => RandomGen -> a
    class Data a where
      toConstr   :: a -> Constr
      fromConstr :: Constr -> a
      dataTypeOf :: a -> DataType

    data DataType   -- Abstract
    stringCon    :: DataType -> String -> Maybe Constr
    indexCon     :: DataType -> Int -> Constr
    dataTypeCons :: DataType -> [Constr]
46 Conclusions
- Simple, elegant
- Modest language extensions
  - Rank-2 types
  - Auto-generation of Typeable, Data instances
- Fully implemented in GHC
- Shortcomings
  - Stop conditions
  - Types are a bit uninformative
Paper: http://research.microsoft.com/~simonpj