Title: F#
1. F#: Another ML compiler for .NET
2. Overview
- Why?
- F# = Caml.NET
- Language Choices (Basic)
- Language Choices (Interop)
- Usability (Optimization, Packaging)
3. Part 1: Motivation and a Taste
4. Why F#? Background
Is it efficient? Is it suitable for cross-language interop? Need a compiler to find out. SML.NET is not quite suitable.
- Generics
  - Type parameters for MS-IL, C#, etc.
- ILX
  - Standardize encodings for closures and data by adding them to MS-IL
- Abstract IL
  - Toolkit for manipulating IL very easily
- ILVerify
  - Verifier/validator for IL (written in Caml)
Need a quality compiler to set an example.
Written in Caml, so it can't be used from C# or anywhere else. Need a compiler to make this accessible.
Couldn't integrate this into the CLR even if we wanted to. An ML compiler would have been useful.
5. Why F#? Background
- Future?
  - IL analysis? (security? optimization?)
  - IL transformation?
  - Language design? (systems programming? XML? concurrency?)
- Hence Caml.NET, now F#
AbsIL is useful for all of these.
6. Why F#?
- F# = Caml.NET
- Why are ML languages so great?
  - Type checking
  - Simple bag of constructs
  - Type inference lets you hack out correct code quickly and still maintain it
- What's often wrong?
  - Traditionally hung up on performance
  - Slow, strange compilers, no debuggers, no profilers etc.
  - No libraries, no interop
- Why is Caml so great?
  - Simple, easy-to-understand compiler
  - Top-to-bottom design choices that make everything fit
- My hope is to repeat this, assuming a .NET world underneath
  - Unfortunately it won't be quite that easy
7. F# Overview
- The aim
  - Not for Windows or ASP.NET programming
  - But primarily for ML programming, with .NET in mind
  - All ML code immediately accessible from .NET
  - Semantics and operational behaviour of ML code/objects must be easily understood by C# programmers
- Language
  - Simple, extensible core language
  - Leave room for extension
- Tools
  - Fast separate compilation
  - Simple compiler (< 10K lines, excl. ILX)
  - Leverage other .NET tools
- Standards
  - Not interested in 100% Caml compatibility
  - Just a .NET language
  - Willing to change Caml if needed, but rarely
- Interop
  - Not .NET Consumer
    - can't easily import everything, may need to write a little C#
  - Not .NET Producer
    - can't write or extend C#-style frameworks
  - But .NET Accessible
    - everything that is written can be accessed from .NET
    - everything in .NET can be accessed somehow
8. F# = Caml.NET
- Core Caml is just a simple mixed imperative/functional programming language
- Functional
  - e.g. type inference + functions as values + type parameters + simple data makes iteration abstraction easy
  - Also built-in deep equality, lists, controlled recursion
- Imperative
  - Mutable data, while/for loops, arrays
  - Pointer equality (i.e. allocation semantics is visible for mutable data)
- Exceptions
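The difference between built-in deep equality and pointer equality on mutable data can be sketched in stock OCaml. This is an illustration only: the `counter` type and `sum_with` helper are made-up names, and the F# compiler described here may differ in details.

```ocaml
(* Deep (structural) equality compares contents; pointer (physical)
   equality compares allocations, which matters for mutable data. *)
type counter = { mutable count : int }

let a = { count = 0 }
let b = { count = 0 }

let deep_equal = (a = b)     (* same contents *)
let same_cell = (a == b)     (* two separate allocations *)

(* Mutating [a] leaves [b] untouched, so they stop being deep-equal. *)
let () = a.count <- a.count + 1
let deep_equal_after = (a = b)

(* Functions as values plus type parameters: one generic traversal. *)
let rec sum_with (f : 'a -> int) (xs : 'a list) : int =
  match xs with
  | [] -> 0
  | h :: t -> f h + sum_with f t

let total = sum_with String.length ["ml"; "net"]
```

Here `deep_equal` holds while `same_cell` does not, which is the sense in which allocation is visible for mutable data.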
9. F# = Caml.NET
    (* iter : ('a -> unit) -> 'a array -> unit *)
    let iter f arr =
      let len = length arr in
      for i = 0 to len - 1 do
        f arr.(i)
      done

    let rec map f x =
      match x with
      | [] -> []
      | (h::t) -> f h :: map f t

    type ('a,'b) tree =
      | Empty
      | Node of ('a,'b) node
    and ('a,'b) node =
      { key: 'a; val: 'b;
        left: ('a, 'b) tree; right: ('a, 'b) tree;
        height: int }

    type ('a,'b) mtree =
      | Empty
      | Node of ('a,'b) mnode
    and ('a,'b) mnode =
      { key: 'a; mutable val: 'b;
        left: ('a, 'b) tree; right: ('a, 'b) tree;
        height: int }
10. F# = Caml.NET
- Polymorphic hashing, equality, comparison
    val (=) : 'a -> 'a -> bool
    val (==) : 'a -> 'a -> bool
    val compare : 'a -> 'a -> int
    val (<) : 'a -> 'a -> bool
    val hash : 'a -> int
    val output_val : out_channel -> 'a -> unit
    val input_val : in_channel -> 'a
i.e. blast the object graph to disk
i.e. hash and equality automatically available for non-cyclic Caml types
pervasives.mli:
    val abs : int -> int
    val string_of_float : float -> string
    ...
    type out_channel
    val open_out_bin : string -> out_channel
    val output_value : 'a -> out_channel -> unit

    let find r =
      match x with
      | A | B | C _ -> x
      | D(_,ty1,_) | E(ty1) -> combine ty1 x
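As a rough illustration of what "automatically available" means, here is a small sketch in stock OCaml. It uses OCaml's own `compare`, `Hashtbl.hash` and `Marshal` as stand-ins for the `hash`, `output_val` and `input_val` signatures on the slide; the `shape` type is invented for the example.

```ocaml
type shape =
  | Circle of float
  | Rect of float * float

let s1 = Rect (2.0, 3.0)
let s2 = Rect (2.0, 3.0)

(* Equality, ordering and hashing need no per-type code on non-cyclic data. *)
let equal = (s1 = s2)
let ordering = compare (Circle 1.0) s1
let same_hash = (Hashtbl.hash s1 = Hashtbl.hash s2)

(* Serialize the whole value graph to bytes and read it back. Marshal is
   not type-safe: the annotation on [restored] is the caller's promise. *)
let blob = Marshal.to_string [s1; Circle 1.0] []
let (restored : shape list) = Marshal.from_string blob 0
let round_trip_ok = (restored = [s1; Circle 1.0])
```

The same calls would work unchanged on the `tree` type from the previous slide, since none of them depend on the shape of the data.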
11. F# <> OCaml.NET
- No objects (better to fit with .NET)
- No labels/defaults (there must be better solutions)
- No functors (much complexity, little added value)
- Also
  - No ocamllex/ocamlyacc
  - No camlp4 (macro processor)
  - No ocamldep (dependency analyzer)
12. Using the compiler
- bin\fsc.exe foo.ml
  - -c  Compile only (produce .cno/.cni)
  - -a  Build a DLL
  - -g  Debug. Can run against same library (unless you want to debug the library)
  - -O  Enable cross-module optimization
  - --unverifiable  Faster closures, no stupid casts, different library needed
13. High Fidelity, Binary Compatibility and Versioning
- Most ML compilers cannot create DLLs at all
  - Or can only create DLLs whose interface is C or COM
- High Fidelity: Can I access an ML DLL from an ML EXE in a completely transparent way?
- Binary Compatibility: Can I change the internals of a DLL and use it in place of existing DLLs?
  - OCaml: No.
  - F#: Yes. MS-IL compiled interfaces are stable
  - Caveat: cross-module inlining must not be used by client DLLs (i.e. do not ship .cnx files to clients)
- Versioning: Can you add functionality to a DLL and use it in place of existing DLLs?
  - OCaml: No.
  - F#: Some, e.g. can add (visible) bindings, can add (visible) types.
  - Even with cross-module inlining.
14. Part 2: Interop, Language Design
15. Language Design Choices
- Immutable Unicode strings and wchars
- Signatures are compilation unit boundaries, NOT module-value constraints
  - Can hide generated ML types
  - Can't constrain polymorphism
  - Can and should reveal arities
- Arities specified by parentheses
  - Not pretty, but efficient
  - Can be helpful documentation
- Gives mutually recursive modules cheaply
Value x2 is more polymorphic in the module than the signature
.mli:
    type mystring
    type myrecd
    type data
    type csdata
    type csdata2
    val x : int list
    val x2 : int list
    val f1 : int -> int -> int
    val f2 : int -> (int -> int)
    val f3 : int -> int -> int
    val f4 : int -> (int -> int)
.ml:
    type mystring = MyString of string list
    type myrecd = { a: int; b: string }
    type data = OtherData.data
    type csdata = ( CSharpProgram.data)
    type csdata2 = CSData of ( CSharpProgram.data)
    let x = ([] : int list)
    let x2 = ([] : 'a list)
    let f1 x y = x + y
    let f2 x y = x + y
    let f3 x = print "hello"; (fun y -> x + y)
    let f4 x = print "hello"; (fun y -> x + y)
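Why arity is worth revealing can be seen from the `f1`/`f3` pair above. The sketch below is stock OCaml, where both functions get the same type `int -> int -> int`, so the difference shows up only at run time; the `trace` list and `note` helper are made up for the example and stand in for the slide's `print`.

```ocaml
(* Records side effects so they can be inspected afterwards. *)
let trace : string list ref = ref []
let note msg = trace := msg :: !trace

(* Arity 2: nothing runs until both arguments arrive. *)
let f1 x y = x + y

(* Arity 1, returning a function: the effect runs on the first argument. *)
let f3 x = note "hello"; (fun y -> x + y)

let p1 = f1 1                        (* partial application, no effect *)
let after_f1 = List.length !trace
let p3 = f3 1                        (* the effect happens here *)
let after_f3 = List.length !trace
let r = p1 2 + p3 2
```

A compiler that knows `f1` takes two arguments at once can emit a plain two-argument method for it, whereas `f3` has to return a closure. Writing `int -> (int -> int)` in the signature is how the slide proposes to state that distinction.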
16. Interop
- F# from C#
- C# from F# -- Not yet done
- MS-IL from F# -- Gives baseline interop
- Also to consider
  - F# from F# (done: full fidelity)
  - Any ILX language from F# (not done: good fidelity possible)
17. Interop (The SML.NET approach)
C#/.NET Types    SML Types
int              int
string           string option
class Foo        Foo option
18. Interop (The F# Approach)
[Figure: F# types shown among the C#/.NET types. Labels: class Foo / Foo; int; byte; char; single; double (= float); 'a list (= class FSharp.list); 'a tree (= class FSharp.tree)]
19. F# from C#
- Calling code and opaque types
F# module Pervasives (pervasives.mli):
    val abs : int -> int
    val string_of_float : float -> string
    ...
    type out_channel
    val open_out_bin : string -> out_channel
    val output_value : 'a -> out_channel -> unit
- Every ML type is a C# type
- Every ML top-binding is accessible
- No signature file needed to access
C# module Test (test.cs):
    using Pervasives;
    class Test {
      static void Main() {
        int n = Pervasives.abs(-3);
        string s = Pervasives.string_of_float(3.1415);
        out_channel out = Pervasives.open_out_bin("out");
        Pervasives.output_value<double>(out, 3.1415);
      }
    }
test.ml:
    let n = abs(-3)
    let s = string_of_float(3.1415)
    let out = open_out_bin("out")
    output_value(3.1415)
Curried function values take tuples
20. F# from C#
- Accessing data (records and unions)
F# records compile to classes and can be accessed immediately
module Il (il.mli):
    type assembly = { assemMainModule: modul; assemAuxModules: modules }
    type types
    type type_def =
      | CLASS of class_def
      | INTERFACE of interface_def
      | VALUETYPE of valuetype_def
      ...
- F# datatypes are C# types
  - They conform to ILX standard
  - Accessed via helper functions (IsCLASS, GetCLASS) etc.
  - Independent of ILX representation
test.ml:
    Printf.printf "Num types = %d\n"
      (List.length (dest_tdefs assem.assemMainModule.modulTypeDefs))
Here is an example
    static bool isClass(Il.type_def t) { return t.IsCLASS(); }
    class Test {
      static void Main() {
        Il.types types = assem.assemMainModule.modulTypeDefs;
        FSharp.list<type_def> tys = Il.dest_tdefs(types);
        Console.WriteLine("Num types = {0}", List.length<Il.type_def>(tys));
      }
    }
Here is an example of a polymorphic type
Nb. ML types not very OO. May work on this.
21. F# from C#
F# function types become System.ILX.Func1
- ILX chooses the representation
- Not yet invisible to C# code
module List (list.mli):
    val filter : ('a -> bool) -> 'a list -> 'a list
No generics: function values pass object
But there is an implicit conversion from a delegate type to the type used by ILX for function values
    static object isClass(object t) { return (object) ((Il.type_def) t).IsCLASS(); }
    Console.WriteLine("Num classes = {0}",
      (List.length(List.filter(new System.Func(isClass),
        (Il.dest_tdefs(assem.assemMainModule.modulTypeDefs))))));
System.Func is a delegate type
22. F# from C#
module List (list.mli):
    val filter : ('a -> bool) -> 'a list -> 'a list
With generics
    static bool isClass(Il.type_def t) { return t.IsCLASS(); }
    Console.WriteLine("Num classes = {0}",
      (List.length(List.filter(new System.Func<Il.type_def,bool>(isClass),
        (types)))));
Parameters can be inferred
23. C# from F#
- Calling static members is easy, just quote the .NET type name and member using the . notation
    let findDLLs dir =
      (* call a static member in the System.IO.Directory class *)
      if (Directory.Exists dir) then
        let files = Directory.GetFiles(dir, "*.dll") in
        Arr.to_list files
      else []
- Instance members are accessed using an extended . notation. Sometimes type annotations are needed to resolve the type used for the . notation. These type annotations are propagated left-to-right, outside-in.
    let searchFile (pat: string) file =
      match (try Some (Assembly.LoadFrom(file)) with _ -> None) with
      | Some a ->
          let modules = a.GetModules() in
          let pat = pat.ToUpper() in
          ...
- Sometimes casts are needed to resolve overloading. Currently use (cast expr type)
Without the type annotation you get a "please supply a type annotation" error here
24. C# from F#
- Can create objects (new Type() or just Type()).
- Can create delegates (new EventHandler()). When creating delegates, provide an ML function of the right curried type.
- Can create value types and use .NET properties.
- Cannot mutate value types.
25. MS-IL from F#
- Embedded MS-IL
  - Cheap-shot way of implementing primitives
  - Parsed and included as part of the IL stream
  - Can be inlined, optimized, even instantiated
  - Also type and exception representations
    let (+) (x:int) (y:int) = (# "add" x y : int #)
    let sin (x:float) = (# "call default float64 [mscorlib]System.Math::Sin(float64)" x : float #)
    type obj = (# "class [mscorlib]System.Object" #)
    exception Not_found = (# "class [mscorlib]System.NotFoundException" #)
26. Part 3: Compiler, Perf etc.
27. Optimizations and Perf
- No optimizations
  - Top level function bindings still become methods
  - No inlining at all
  - Data layout unchanged
- Local optimizations
  - Eliminate unused bindings
  - Inline a little
  - Remove tuples when immediately destroyed
- Cross-module optimizations
  - Same as local except propagated across
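To make "remove tuples when immediately destroyed" concrete, here is a hand-written before/after pair in stock OCaml. It shows only the shape of the rewrite, not the compiler's actual output, and both function names are invented for the example.

```ocaml
(* Before: a pair is built and taken apart on the same line. *)
let product_before x y =
  let (s, d) = (x + y, x - y) in
  s * d

(* After: the same computation with the pair removed, so nothing is
   allocated for it. The components are pure, so order does not matter. *)
let product_after x y =
  let s = x + y in
  let d = x - y in
  s * d
```

The two functions agree on every input; the second simply never materializes the tuple.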
28. F# Compiler Architecture
[Diagram: Parser -> Typechecker -> Optimizer -> ILXGEN -> (ILX -> Generic IL) -> (Generic IL -> IL). Targets shown: CLR V1.1 and CLR V1.0, each with ilxlib.dll and fslib.dll.]
29. Some interesting bits
- Polymorphic comparison -> IComparable, IStructured
- Polymorphic equality -> Object::Equals
- Polymorphic hashing -> IStructured::GetHashCode, Object::GetHashCode
- Must generate new virtuals for each new type
- Problem: Good SExpr hash algorithms don't traverse whole term...
- Problem: value types get boxed
- ILX Choices
  - Datatype representations (many possibilities)
  - Closure representations (three choices)
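The remark that good hash algorithms don't traverse the whole term can be observed in stock OCaml, whose `Hashtbl.hash` looks at only a bounded number of values. The sketch below assumes that standard-library behaviour (at most 10 meaningful values by default); it says nothing about how the F# library's `hash` is implemented.

```ocaml
(* Two 20-element lists that differ only in the last element. *)
let xs = List.init 20 (fun i -> i)
let ys = List.init 19 (fun i -> i) @ [999]

let different = (xs <> ys)

(* The default hash stops after a bounded number of meaningful values,
   so a difference this late in the term is never inspected. *)
let default_collides = (Hashtbl.hash xs = Hashtbl.hash ys)

(* Raising the limits makes the hash inspect all 20 elements, at the
   cost of more work per call. *)
let deep_hash v = Hashtbl.hash_param 100 100 v
```

A per-type virtual that walks every field would behave like `deep_hash`, which is why a bounded traversal is the harder thing to express through generated `GetHashCode` methods.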
30. Some interesting bits
- Embedded MS-IL + inlining is very useful
  - Almost no primitives in compiler
  - Nb. inlining generic IL takes some care...
module Pervasives (pervasives.ml):
    let (+) (x:int) (y:int) = (# "add" x y : int #)
    let (-) (x:int) (y:int) = (# "sub" x y : int #)
    let ( * ) (x:int) (y:int) = (# "mul" x y : int #)
    let (/) (x:int) (y:int) = (# "div" x y : int #)
module Array (array.ml):
    let length (arr: 'a array) = (# "ldlen" arr : int #)
    let get (arr: 'a array) (n:int) = (# "ldelem.any !0" type ('a) arr n : 'a #)
    let set (arr: 'a array) (n:int) (x:'a) = (# "stelem.any !0" type ('a) arr n x #)
    let zero_create (n:int) = (# "newarr !0" type ('a) n : 'a array #)
31. Some interesting bits
- Type based conditional pragmas
  - Give a cheap way of generating good code
  - Optimizer chooses the right one when possible
  - For library use only
    let (=) (x: 'a) (y: 'a) =
      (inbuilt_poly_equality x y)
      when 'a : int    = (# "ceq" x y : bool #)
      when 'a : sbyte  = (# "ceq" x y : bool #)
      when 'a : int16  = (# "ceq" x y : bool #)
      when 'a : int32  = (# "ceq" x y : bool #)
      when 'a : int64  = (# "ceq" x y : bool #)
      when 'a : byte   = (# "ceq" x y : bool #)
      when 'a : uint16 = (# "ceq" x y : bool #)
      when 'a : uint32 = (# "ceq" x y : bool #)
      when 'a : uint64 = (# "ceq" x y : bool #)
      when 'a : float  = (# "ceq" x y : bool #)
      when 'a : char   = (# "ceq" x y : bool #)
32. Some interesting bits
- Mutable locals for library procedures
  - Why waste weeks implementing optimizations when you can get 80% of the effect in 1 hour?
  - Obvious restrictions
    let rev l =
      let mutable res = [] in
      let mutable curr = l in
      while nonnull curr do
        let (h::t) = curr in
        res <- h :: res;
        curr <- t
      done;
      res
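For comparison, the same loop can be written in stock OCaml, which has no `let mutable` and uses heap-allocated `ref` cells instead. This is a sketch of the equivalent code, not part of the F# library.

```ocaml
(* Imperative list reversal with explicit ref cells. *)
let rev l =
  let res = ref [] in
  let curr = ref l in
  while !curr <> [] do
    match !curr with
    | h :: t -> res := h :: !res; curr := t
    | [] -> ()                      (* unreachable: guarded by the loop test *)
  done;
  !res

let reversed = rev [1; 2; 3]
```

Each `ref` here is a separate allocation unless the compiler removes it, which is presumably the cost that mutable locals avoid without a dedicated optimization pass.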
33. Perf: A large symbolic processing app (large input)
34. Perf: A large symbolic processing app (small input, no install-time compilation)
35. Perf: Tailcall or not
36. Summary
- Essentially reached my aims
  - Usable compiler for writing accessible ML libraries
  - Performance not brilliant but good enough
  - Accessing .NET from ML now done, by extending the . syntax and using the simple "add type annotations until overloading is resolved" approach
- Results
  - .NET compilers can be simple and (I think) useful
  - Proof of .NET generics interop
  - Performance testing for generics
  - First ML compiler with high fidelity across DLLs, and good versioning/binary compat. properties?