Title: A Cluster of Languages for Mathematical Computing
1A Cluster of Languages for Mathematical Computing
- Stephen M. Watt
- Department of Computer Science, Western University
London, Ontario, Canada
DIKU University of Copenhagen 7 September 2012
2Moving Windows Around
- Add a border
- Add a scroll bar
- Respond to a button.
- Derp, derp,
- We have harder problems now.
3Declaration of Prejudices
- Key problem: how to cascade efficient, effective abstractions.
4Mathematics as a Programming Language Canary
5Why?
- Complex problems with many parts
- Complex interactions among the parts
- Many different levels of abstraction
- Precise definition
- Can tell if an answer is right or wrong
6Examples
- Garbage collection
- Lisp -> underground -> Java etc
- Algebraic expressions
- Fortran
- Big integer
- Crypto
- Generics
- -> Java, C,
7Computer Algebra
- Solve problems in terms of symbolic parameters, rather than numerically.
- Having the computer figure out the formulas rather than using formulas given by humans.
- Algorithms: computational mathematics
- Software: mathematical computation
8Computer Algebra
- Start with symbols and compute with symbols =>
- Exact results
- Hopefully, insightful results
9Finding an Answer
- One day an individual went to the horse races. Instead of counting the number of humans and horses, she counted 74 heads and 196 legs.
- How many humans and horses were there?
- humans + horses = 74
- humans * 2 + horses * 4 = 196
10Finding an Answer
- One day an individual went to the horse races. Instead of counting the number of humans and horses, she counted 74 heads and 196 legs.
- How many humans and horses were there?
- humans + horses = 74
- humans * 2 + horses * 4 = 196
- horses = 24, humans = 50
11Finding an Answer
- One day an individual went to the horse races. Instead of counting the number of humans and horses, she counted H heads and L legs.
- How many humans and horses were there?
- humans + horses = H
- humans * 2 + horses * 4 = L
- horses = -H + L/2, humans = 2 H - L/2
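The symbolic answer on this slide can be checked with a few lines of Python. This is only a sketch: the function name `solve_races` is an illustrative choice and not from the talk, and exact rational arithmetic stands in for a computer algebra system.

```python
from fractions import Fraction

def solve_races(heads, legs):
    """Solve humans + horses = heads, 2*humans + 4*horses = legs.

    Uses the symbolic solution from the slide:
    horses = -H + L/2, humans = 2H - L/2.
    """
    H, L = Fraction(heads), Fraction(legs)
    horses = -H + L / 2
    humans = 2 * H - L / 2
    return humans, horses

humans, horses = solve_races(74, 196)
print(humans, horses)  # 50 24
```

Having the formula in terms of H and L means any other head and leg count can be answered without solving the system again, which is the point the slide makes about symbolic parameters.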
12Computer Algebra
- A couple of research problems of personal interest
- Symbolic-numeric algorithms
- Symbolic exponents
13Approximate Polynomials
14Symbolic Exponents
15Examples
- Maple
- Axiom
- Aldor
- MathML
- InkML
- Warning: 3x too much stuff here. We will skip to what the audience wants.
16Language 1 Maple
- Waterloo 1980 on
- Geddes and Gonnet initiators.
- University, then company. Collaboration.
- Dynamically typed, interpreted language for
scripting computer algebra programs.
24An Example (small)
27Maple
- Compiled kernel, interpreted library
- What was compiled was hand-chosen
- Support many students on shared 1980s hw
- Easy to lay down code, quick library growth
- Language not very structured, so limitations
- Commercially viable project
- Company focus: education and CAE
28Example 2 Axiom
- 1984 moved from Waterloo to IBM Research
- Scratchpad in-house research project
- Jenks and Trager initiators.
- 1991 released as commercial product by NAG
29Axiom
- Main idea: code re-use through abstraction
- Generic algorithms based on structures of modern algebra (groups, rings, algebras, fields).
- The language is the thing
- Compiled programming language for writing libraries in the large
- Syntactically similar, dynamically typed interpreted language for scripting.
30Type Inference in Interpreter
31More Complicated Types
36Axiom
- Great concept for building well-structured and flexible libraries.
- Not enough dogfooding.
- Top-level tried to hide types from user, but was not sufficiently successful at doing that.
- Powerful and flexible, but too complex for most users.
- Now open source.
37Example 3 Aldor
- Re-design of Axiom language 1984 on.
- Initiator Watt.
- The language is the thing, writ large
- Efficiency, elegance, take no prisoners
- Nothing special about built-in types
- Dependent types everywhere
- Interoperability with C and Lisp
38Aldor and Its Type System
- Types and functions are values
- May be created dynamically
- Provide representations of mathematical sets and functions
- The type system has two levels
- Each value belongs to a unique type, its domain, known statically.
- This is an abstract data type that gives the representation.
- The domains are values with domain Domain.
- Each value may belong to any number of subtypes of its domain.
- Subtypes of Domain are called categories.
- Categories
- specify what exports (operations, constants) a domain provides.
- fill the role of OO interfaces or abstract base classes.
39Why Two Levels?
- OO inheritance problem with multi-argument functions
- class SG { "*": (SG, SG) -> SG }
- DoubleFloat extends SG ...
- Permutation extends SG ...
- x, y ∈ DoubleFloat ⊂ SG
- p, q ∈ Permutation ⊂ SG
- x * y ✓   p * q ✓
- p * y ✓ ??? Bad, Bad, Bad
40Why Two Levels?
- OO inheritance problem with multi-argument functions
- SG == ... *: (%, %) -> %
- DoubleFloat: SG == ...
- Permutation: SG == ...
- x, y ∈ DoubleFloat ∈ SG
- p, q ∈ Permutation ∈ SG
- x * y ✓   p * q ✓
- p * y ✗
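The binary-operation problem on these two slides can be imitated in Python. This is a rough sketch with assumptions: the class bodies and the helper `times` are invented for illustration, and the domain check here happens at run time, whereas the talk describes a static check.

```python
class Semigroup:
    """OO-style base class: times() promises to accept any Semigroup."""
    def times(self, other):
        raise NotImplementedError

class DoubleFloat(Semigroup):
    def __init__(self, v):
        self.v = v
    def times(self, other):
        # The inherited signature lets a Permutation arrive here.
        return DoubleFloat(self.v * other.v)

class Permutation(Semigroup):
    def __init__(self, images):
        self.v = tuple(images)
    def times(self, other):
        # Composition: apply 'other' first, then 'self'.
        return Permutation(self.v[i] for i in other.v)

def times(domain, a, b):
    """Two-level style: both operands must belong to the same domain."""
    if type(a) is not domain or type(b) is not domain:
        raise TypeError(f"both operands must be in {domain.__name__}")
    return a.times(b)

x, y = DoubleFloat(2.0), DoubleFloat(3.0)
p, q = Permutation([1, 0, 2]), Permutation([0, 2, 1])
print(times(DoubleFloat, x, y).v)   # 6.0
print(times(Permutation, p, q).v)   # (1, 2, 0)
```

With the inheritance-only version, `p.times(y)` is accepted by the interface and fails somewhere inside the method; with the domain-level check, the mixed product is rejected at the boundary.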
41Parametric Polymorphism
- PP is via category- and domain-producing functions.
- -- A function returning an integer.
- factorial(n: Integer): Integer == { if n = 0 then 1 else n*factorial(n-1) }
- -- Functions returning a category and a domain.
- Module(R: Ring): Category == Ring with { *: (R, %) -> % }
- Complex(R: Ring): Module(R) with {
- complex: (R, R) -> %; real: % -> R; imag: % -> R; conj: % -> %; ...
- } == add {
- Rep == Record(real: R, imag: R);
- 0: % == ...; 1: % == ...; (x: %) + (y: %): % == ...
- }
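A domain-producing function has a loose Python analogue in a function that builds and returns a class. The sketch below is an assumption-laden illustration, not Aldor semantics: `Complex` here only requires that `R` be a Python type supporting `+`, `-`, `*` and construction from small integers, and nothing checks that `R` really is a ring.

```python
from fractions import Fraction

def Complex(R):
    """Return a new class of complex numbers with coefficients in R."""
    class ComplexR:
        def __init__(self, real, imag):
            self.real, self.imag = R(real), R(imag)
        def __add__(self, o):
            return ComplexR(self.real + o.real, self.imag + o.imag)
        def __mul__(self, o):
            return ComplexR(self.real * o.real - self.imag * o.imag,
                            self.real * o.imag + self.imag * o.real)
        def conj(self):
            return ComplexR(self.real, -self.imag)
        def __eq__(self, o):
            return (self.real, self.imag) == (o.real, o.imag)
    ComplexR.zero = ComplexR(0, 0)   # the constants 0 and 1 of the new domain
    ComplexR.one = ComplexR(1, 0)
    ComplexR.__name__ = f"Complex({R.__name__})"
    return ComplexR

GaussianInteger = Complex(int)        # types are created dynamically
GaussianRational = Complex(Fraction)
i = GaussianInteger(0, 1)
print((i * i) == GaussianInteger(-1, 0))  # True
```

Each call produces a distinct type, so values of `Complex(int)` and `Complex(Fraction)` are kept apart, much as the slide's `Complex(R)` yields a different domain for each ring.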
42Dependent Types
- Give dynamic typing, e.g. f: (n: Integer, R: Ring, m: IntegerMod(n)) -> SqMatrix(n, R)
- Recover OO through dependent products
- prodl: List Record(S: Semigroup, s: S) == [[DoubleFloat, x], [Permutation, p], [DoubleFloat, y]]
- With categories, guarantee required operations available
- f(R: Ring)(a: R, b: R): R == a*b + b*a
43Multi-sorted Algebras
- Category signature as a dependent product type.
- ArithmeticModel: Category == with {
- Nat: IntegralDomain;
- Rat: Field;
- /: (Nat, Nat) -> Rat }
44Aldor and Its Type System
- Type producing expressions may be conditional
- UnivariatePolynomial(R: Ring): Module(R) with {
- coeff: (%, Integer) -> R;
- monomial: (R, Integer) -> %;
- if R has Field then EuclideanDomain;
- ...
- } == add {
- ...
- }
- Post facto extensions allow domains to belong to new categories after they have been initially defined.
45Without Post Facto Extension for Structuring Libraries
- DirectProduct(n: Integer, S: Set): Set with {
- component: (Integer, %) -> S;
- new: Tuple S -> %;
- if S has Semigroup then Semigroup;
- if S has Monoid then Monoid;
- if S has Group then Group;
- ...
- if S has Ring then Join(Ring, Module(S));
- if S has Field then Join(Ring, VectorField(S));
- ...
- if S has DifferentialRing then DifferentialRing;
- if S has Ordered then Ordered;
- ...
- } == add ...
46Post Facto Extension for Structuring Libraries
- DirectProduct(n: Integer, S: Set): Set with {
- component: (Integer, %) -> S;
- new: Tuple S -> %;
- } == add ...
- extend DirectProduct(n: Integer, S: Semigroup): Semigroup == ...
- extend DirectProduct(n: Integer, S: Monoid): Monoid == ...
- extend DirectProduct(n: Integer, S: Group): Group == ...
- ...
- extend DirectProduct(n: Integer, S: Ring): Join(Ring, Module(S)) == ...
- extend DirectProduct(n: Integer, S: Field): Join(Ring, VectorField(S)) == ...
- ...
- extend DirectProduct(n: Integer, S: DifferentialRing): DifferentialRing == ...
- extend DirectProduct(n: Integer, S: Ordered): Ordered == ...
- ...
- Normally these extensions would all be in separate files.
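Python offers a loose analogue of post facto extension: an existing class can be given a new operation and then registered with an abstract base class defined later. The sketch below uses the standard `abc` module; the class and method names are illustrative, and unlike the slide's `extend`, this version is not conditional on the component type.

```python
from abc import ABC, abstractmethod

class DirectProduct:
    """A domain defined once, with only its basic exports."""
    def __init__(self, *components):
        self.components = components
    def component(self, i):
        return self.components[i]

# Later, possibly in a separate file: a new category and an extension.
class Semigroup(ABC):
    @abstractmethod
    def times(self, other): ...

def _times(self, other):
    # Componentwise product; assumes the components support '*'.
    return DirectProduct(*(a * b for a, b in zip(self.components, other.components)))

DirectProduct.times = _times          # add the export after the fact
Semigroup.register(DirectProduct)     # declare membership in the category

u = DirectProduct(2, 3).times(DirectProduct(4, 5))
print(u.components)                   # (8, 15)
print(isinstance(u, Semigroup))       # True
```

The original definition of `DirectProduct` is never edited, which is the library-structuring benefit the slide is after.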
47Higher Order Operations
- E.g. Reorganizing constructions
- Polynomial(x) Matrix(n) Complex R to Complex Matrix(n) Polynomial(x) R
- Slightly simpler example
- List Array String R to String Array List R
48Higher Order Operations
- Ag ==> (S: BasicType) -> LinearAggregate S;
- swap(X: Ag, Y: Ag)(S: BasicType)(x: X Y S): Y X S == [[s for s in y] for y in x];
- al: Array List Integer := array(list(i+j-1 for i in 1..3) for j in 1..3);
- la: List Array Integer := swap(Array, List)(Integer)(al)
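The `swap` operation takes type constructors as arguments. A small Python sketch of the same idea follows, with `tuple` standing in for `Array` and `list` for `List`; these stand-ins, and dropping the element-type argument `S`, are assumptions made for the illustration.

```python
def swap(X, Y):
    """Given constructors X and Y, map an X of Ys to a Y of Xs.

    Mirrors [[s for s in y] for y in x]: the elements keep their order and
    only the two container types change places.
    """
    def convert(x):
        return Y(X(s for s in y) for y in x)
    return convert

# al plays the role of "Array List Integer": a tuple of lists.
al = tuple(list(i + j - 1 for i in range(1, 4)) for j in range(1, 4))
# la plays the role of "List Array Integer": a list of tuples.
la = swap(tuple, list)(al)
print(la)  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
```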
49Phew!
50Using Genericity
- LinearOrdinaryDifferentialOperator(
- A: DifferentialRing,
- M: LeftModule(A) with { differentiate: % -> % }
- ): MonogenicLinearOperator(A) with {
- D: %;
- apply: (%, M) -> M;
- ...
- if A has Field then {
- leftDivide: (%, %) -> (quotient: %, remainder: %);
- rightDivide: (%, %) -> (quotient: %, remainder: %);
- // rgcd, lgcd
- }
- ...
- }
51Using Genericity
- LinearOrdinaryDifferentialOperator(
- A: DifferentialRing,
- M: LeftModule(A) with { differentiate: % -> % }
- ): ...
- == SUP(A) add {
- ...
- if A has Field then {
- Op == OppositeOperator(%, A);
- DOdiv == NonCommutativeOperatorDivision(%, A);
- OPdiv == NonCommutativeOperatorDivision(Op, A);
- leftDivide(a, b) == leftDivide(a, b)$DOdiv;
- rightDivide(a, b) == leftDivide(a, b)$OPdiv;
- }
- ...
- }
52Design Principles I
- No compromises on flexibility
- No compromises on efficiency
- Use optimization to bridge the gap.
- Compilation. Separate compilation.
- Generated intermediate code is platform independent, even though word-sizes, etc, vary.
- Libraries can be distributed, if desired, as binary only.
- Be a good citizen in a multi-language framework.
- Call and be called by C/C++/Fortran/Lisp/Maple
- Functional arguments
- Cooperating memory management
53Design Principles II
- Language-defined types should have no privilege whatsoever over application-defined types.
- Syntax, semantics (e.g. in type exprs), optimization (e.g. constant folding)
- Language semantics should be independent of type.
- E.g. named constants overloaded, not functions
- Combining libraries should be easy, O(n), not O(n^2).
- Should be able to extend existing things with new concepts without touching old files or recompiling.
- Safety through optimization removing run-time checks, not by leaving off the checks in the first place.
54The Compiler as an Artefact
- Written primarily in C (C++ too immature in 1990)
- 1550 files, 295 K loc C + 65 K loc Aldor
- Intermediate code (FOAM)
- Primitive types: booleans, bytes, chars, numeric, arrays, closures
- Primitive operations: data access, control, data operations
- Runtime system
- Memory management
- Big integers
- Stack unwinding
- Export lookup from domains
- Dynamic linking
- Written in C and Aldor
55Example of Optimization
- From the domain Segment(E: OrderedAbelianMonoid)
- generator(seg: Segment E): Generator E == generate {
- (a, b) := (low seg, hi seg);
- while a <= b repeat { yield a; a := a + 1 }
- }
- From the domain List(S: Set)
- generator(l: List S): Generator S == generate {
- while not null? l repeat { yield first l; l := rest l }
- }
- Client code
- client() == {
- ar := array(...); l := list(...);
- s := 0;
- for i in 1..#ar for e in l repeat { s := s + ar.i * e }
- stdout << s
- }
56How Generators Work
- generator(seg: Segment Int): Generator Int == generate {
- a := lo seg;
- b := hi seg;
- while a <= b repeat { yield a; a := a + 1 }
- }
- client() == {
- ar := array(...);
- s := 0;
- for i in 1..#ar repeat { s := s + ar.i }
- stdout << s
- }
57Example of Optimization (again)
- From the domain Segment(E: OrderedAbelianMonoid)
- generator(seg: Segment E): Generator E == generate {
- (a, b) := (low seg, hi seg);
- while a <= b repeat { yield a; a := a + 1 }
- }
- From the domain List(S: Set)
- generator(l: List S): Generator S == generate {
- while not null? l repeat { yield first l; l := rest l }
- }
- Client code
- client() == {
- ar := array(...); l := list(...);
- s := 0; -- NOTE PARALLEL TRAVERSAL.
- for i in 1..#ar for e in l repeat { s := s + ar.i * e }
- stdout << s
- }
58Inlined
B0: ar := array(...); l := list(...); segment := 1..#ar; lab1 := B2; l2 := l; lab2 := B9; s := 0; goto B1
B1: goto @lab1
B2: a := segment.lo; b := segment.hi; goto B3
B3: if a > b then goto B6 else goto B4
B4: lab1 := B5; val1 := a; goto B7
B5: a := a + 1; goto B3
B6: lab1 := B7; goto B7
B7: if lab1 = B7 then goto B16 else goto B8
B8: i := val1; goto @lab2
B9: goto B10
B10: if null? l2 then goto B13 else goto B11
B11: lab2 := B12; val2 := first l2; goto B14
B12: l2 := rest l2; goto B10
B13: lab2 := B14; goto B14
B14: if lab2 = B14 then goto B16 else goto B15
B15: e := val2; s := s + ar.i * e; goto B1
B16: stdout << s
59Clone Blocks for 1st Iterator
60Dataflow
- lab1 = B2, lab1 = B5, lab1 = B7
61Resolution of 1st Iterator
62Clone Blocks for 2nd Iterator
63Resolution of 2nd Iterator
client() == {
ar := array(...); l := list(...);
l2 := l; s := 0;
a := 1; b := #ar;
if a > b then goto L2
L1: if null? l2 then goto L2
e := first l2
s := s + ar.a * e
a := a + 1
if a > b then goto L2
l2 := rest l2
goto L1
L2: stdout << s
}
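The effect of the optimization can be seen by writing both forms in Python: a client that traverses two generators in parallel, and a fused loop with no generator objects at all. This is a hand-written sketch of the before and after, not output of the Aldor compiler; the function names are illustrative.

```python
def segment(lo, hi):
    """Generator for lo..hi inclusive, like the Segment generator."""
    a, b = lo, hi
    while a <= b:
        yield a
        a += 1

def list_items(l):
    """Generator walking a list front to back, like the List generator."""
    pos = 0
    while pos < len(l):
        yield l[pos]
        pos += 1

def client_generators(ar, l):
    s = 0
    # Parallel traversal: stops as soon as either generator is exhausted.
    for i, e in zip(segment(1, len(ar)), list_items(l)):
        s += ar[i - 1] * e          # ar.i is 1-based in the slides
    return s

def client_inlined(ar, l):
    """Fused loop with the same exit conditions as the resolved code."""
    s, a, b, pos = 0, 1, len(ar), 0
    while a <= b and pos < len(l):
        s += ar[a - 1] * l[pos]
        a += 1
        pos += 1
    return s

print(client_generators([1, 2, 3], [4, 5, 6, 7]))  # 32
print(client_inlined([1, 2, 3], [4, 5, 6, 7]))     # 32
```

In the fused version the two iterator states have become plain local variables, which is what lets a compiler keep them in registers.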
64Aldor vs C (non-floating pt)
65Aldor vs C (floating point)
66Follow-on Research Projects
- Generic library inter-operability
- Localized garbage collection
- Dynamic abstract data types
- Performance analysis of generics
- Etc, etc
67Lessons Learned
- It is possible to be elegant, abstract and high-level without sacrificing significant efficiency.
- Well-known optimization techniques can be effectively adapted to the symbolic setting.
- Optimization of generated C code is not enough.
- Procedural integration, dataflow analysis, subexpression elimination and constant folding are the primary wins.
- Compile-time memory optimization, including data structure elimination, is important.
- Removes boxing/unboxing, closure creation, dynamic allocation of local objects, etc. Can move hot fields into registers.
68Aldor Lessons
- Language design 20 years old.
- In the meantime, many of the ideas are now mainstream.
- Many still are not.
- Mathematics is a valuable canary in the coal mine of general purpose software.
- The general world lags in recognizing needs.
- It has to be free.
- Free1 is the standard price.
- Free2 is required for engagement.
70Example 4 MathML
- First XML application, ever.
- Language for exchange of mathematical data.
- Initially
71Example 4 MathML
72MathML
- OpenMath effort initiated 1993 for data exchange.
- Unfulfilled <math> element in HTML 3.2 Jan 1997.
- Initial, unchartered Math WG defining microsyntax for <math>.
- Internecine rivalry between syntax and semantics camps coming from TeX, Mathematica and SGML.
73MathML
- Convened HTML-native math group to form unified proposal.
- First ever XML application.
- XML proposed recommendation December 1997.
- MathML proposed recommendation February 1998.
- Supported in major browsers, computer algebra systems, incorporated in HTML 5.
74Example 5 InkML
- Ink Messaging
- Annotation
- Archival
75Pen-Based Math
- Input for CAS and document processing.
- 2D editing.
- Computer-mediated collaboration.
76Pen-Based Math
- Does not require learning a special language
- \sum_{i=0}^r g_{r-i} X^i
- sum(g[r-i]*X^i, i = 0..r)
77Pen-Based Math
- Different than natural language recognition
- 2-D layout is a combination of writing and drawing.
- No fixed dictionary.
- Many similar few-stroke characters.
- Well segmented.
- Highly ambiguous
78Digital Ink Formats
- Collected by surface digitizer or camera
- Sequence of (x,y) points sampled at some known frequency
- Possibly other info (angles, pressure, etc)
- Grouping into traces, letters, words labelling
80InkML Concepts
- Traces, trace groups
- Device information: sampling rate, resolution, etc.
- Pre-defined and application defined channels
- Trace formats, coordinate transformations
- Streaming and archival
- Annotation text and XML
81InkML Evolution
- Started as low-level language for traces and hardware description. Explicitly disavowed semantics.
- Wanted base language sufficiently rich to support full range of digital ink applications. Semantic grouping added, annotation, etc.
- W3C Standard
- Built in to Microsoft Office 2010
82Various Language Projects
- Reflex
- Alma
- Java/Aldor/C interop
- Abstract Objects
- Local GC
- WWW GC
83Research Symbol Recognition
- Main idea: Represent coordinate curves as truncated orthogonal series.
- Advantages
- Compact: few coefficients needed
- Geometric: the truncation order is a property of the character set; gives a natural metric on the space of characters
- Algebraic: properties of curves can be computed algebraically (instead of numerically using heuristic parameters)
- Device independent: resolution of the device is not important
84Inner Product and Basis Functions
- Choose a functional inner product, e.g.
- \langle f, g \rangle = \int_a^b f(t) g(t) w(t) dt
- This determines an orthonormal basis in the subspace of polynomials of degree d. Determine using Gram-Schmidt (GS) on 1, t, t^2, t^3, ....
- Can then approximate functions in subspaces
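The Gram-Schmidt step can be sketched in exact arithmetic. The slide leaves the interval and weight open, so this example assumes a = -1, b = 1 and w(t) = 1 (the Legendre case); polynomials are coefficient lists with the constant term first, and the basis is left unnormalized so that all numbers stay rational. All function names are illustrative.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """<p, q> = integral over [-1, 1] of p(t) q(t) dt, with w(t) = 1 assumed."""
    prod = poly_mul(p, q)
    # Integral of t^k over [-1, 1] is 2/(k+1) for even k and 0 for odd k.
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(prod) if k % 2 == 0)

def gram_schmidt(d):
    """Orthogonal (unnormalized) basis for polynomials of degree <= d."""
    basis = []
    for n in range(d + 1):
        v = [Fraction(0)] * n + [Fraction(1)]            # the monomial t^n
        for b in basis:
            coef = inner(v, b) / inner(b, b)
            padded = b + [Fraction(0)] * (len(v) - len(b))
            v = [x - coef * y for x, y in zip(v, padded)]  # remove the b-component
        basis.append(v)
    return basis

basis = gram_schmidt(3)
print(basis[2])  # coefficients of t^2 - 1/3
```

Dividing each basis element by the square root of `inner(b, b)` would give the orthonormal basis the slide mentions; a different weight or interval only changes `inner`.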
85Like Symbols form Clouds
86Problems
- Want fast response: how to work while trace is being captured.
- Low RMS does not mean similar shape.
87Pb 1. On-Line Ink
- The main problem: In handwriting recognition, the human and the computer take turns thinking and sitting idle.
- We ask: Can the computer do useful work while the user is writing and thereby get the answer faster after the user stops writing?
- We show: The answer is Yes!
88On-Line Series Coefficients
- If we choose the right basis functions, then the series coefficients can be computed on line. [Golubitsky + SMW, CASCON 2008, ICFHR 2008]
- The series coefficients are linear combinations of the moments, which can be computed by numerical integration as the points are received.
- This is the Hausdorff moment problem (1921), shown to be unstable by Talenti (1987).
- It is just fine, however, for the orders we need.
89Pb 2. Shape vs Variation
- The corners are not in the right places.
- Work in a jet space to force coords and derivatives close.
- Use a Legendre-Sobolev inner product
- 1st jet space => set µ_i = 0 for i > 1. Choose µ_1 experimentally to maximize reco rate. Can also be done on-line.
- [Golubitsky + SMW 2008, 2009]
90Distance Between Curves
- Approximate the variation between curves by some fn of distances between points.
- May be coordinate curves or curves in a jet space.
- Sequence alignment
- Interpolation (resampling)
- Why not just calculate the area?
- This is very fast in ortho series representation.
91Distance Between Curves
92Comparison of Candidate to Models
- Use Euclidean distance in the coefficient space.
- Just as accurate as elastic matching.
- Much less expensive.
- Linear in d, the degree of the approximation. < 3 d machine instructions (30ns) vs several thousand!
- Can trace through SVM-induced cells incrementally.
- Normed space for characters gives other advantages.
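Comparing a candidate to models by Euclidean distance in coefficient space takes only a few lines. The coefficient vectors below are made up for illustration, and choosing the single nearest model is a simplification of the classifiers discussed in the talk.

```python
import math

def distance(u, v):
    """Euclidean distance between two coefficient vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_model(candidate, models):
    """Return the label of the model closest to the candidate."""
    return min(models, key=lambda label: distance(candidate, models[label]))

# Hypothetical series coefficients for two character models.
models = {"a": [0.9, 0.1, -0.3], "b": [-0.2, 0.8, 0.4]}
print(nearest_model([0.8, 0.0, -0.2], models))  # a
```

The cost of `distance` is one subtraction, one multiplication and one addition per coefficient, which matches the slide's point that the comparison is linear in d.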
93Distance-Based Classification
94Distance-Based Classification
95Geometry
- Linear homotopies within a class
C = (1 - t) A + t B
- Can compute distance of a sample to this line
- Convex hull of a set of models
- SVM separating planes
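The distance from a sample to the homotopy C = (1 - t) A + t B reduces to a projection. The sketch below clamps t to [0, 1] so the closest point stays between the two models; that clamping is an assumption, since the slide only says "line", and the function name is illustrative.

```python
def distance_to_segment(p, a, b):
    """Distance from p to C(t) = (1 - t) a + t b with 0 <= t <= 1.

    Returns (distance, t) for the closest point on the segment.
    """
    ab = [y - x for x, y in zip(a, b)]
    ap = [y - x for x, y in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Orthogonal projection of p onto the line through a and b.
    t = 0.0 if denom == 0 else sum(u * v for u, v in zip(ap, ab)) / denom
    t = max(0.0, min(1.0, t))          # clamp to stay between the two models
    closest = [(1 - t) * x + t * y for x, y in zip(a, b)]
    dist = sum((u - v) ** 2 for u, v in zip(p, closest)) ** 0.5
    return dist, t

print(distance_to_segment([1.0, 1.0], [0.0, 0.0], [2.0, 0.0]))  # (1.0, 0.5)
```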
96Distance-Based Classification
97Distance-Based Classification
98Error Rates as Fn of Distance
- SVM Convex Hull
- Error rate as fn of distance gives confidence measure for classifiers [MKM, Golubitsky + SMW]
99Recognition Summary
- Database of samples => set of LS points
- Character to recognize =>
- Integrate moments as being written
- Lin. trans. to obtain one point in LS space
- Classify by distance to convex hull of k-NN.
- InkML allows natural representation of annotated
database and real-time input.
100Overall Conclusions
- Mathematical problems provide excellent challenges for language design.
- Rich, complex, hard
- Well-defined
- Performance matters a lot!
- Don't be put off by the loud, confident proclamations of mass-market language designers.