Title: An Introduction to the Mozart Abstract Machine
1An Introduction to the Mozart Abstract Machine
- Per Brand and Konstantin Popov
2The Mozart System - Overview
- Mozart Compiler
- compiles Oz into an intermediate language
- written in Oz
- Mozart Virtual Machine
- executes intermediate code
- written in C
- Tcl/Tk interpreter (GUI)
- Emacs-based OPI (Emacs Lisp mode)
3Virtual Machines - why?
- Portability
- the same intermediate code runs everywhere
- of course, the VM has to be available on the target platform
- Easier to implement!
- The so-called semantic gap between the source language and machine language is filled by the intermediate language
- both the Mozart Compiler and the Mozart VM taken together are simpler than a potential Oz-to-machine-code compiler!
4Virtual Machines around...
- Historically: Lisp, Smalltalk
- Low-level, stack-based: Forth
- Logic programming: Prolog, etc.
- Functional programming: ML, Haskell, Erlang
- Modern imperative: Java
5The Mozart VM - the Idea
- The VM is a loop fetching and executing instructions.
- Instructions: creating data structures, conditionals, procedure calls, thread creation, etc.
- Values are stored in the Store.
- like in the language itself
- The VM has a program pointer and registers.
- registers refer to values in the Store
6Property of Mozart Virtual Machine
- Register-based virtual machine
- Temporaries and parameters are found in registers (so-called X registers)
- Java is stack-based
- Register-based vs stack-based
- closer to machine architecture: less work for the JIT
- X registers are either machine registers or at least in cache
- instructions are longer in a register-based than in a stack-based machine
- Multi-paradigm virtual machine
7Terminology
- X-registers
- a set of registers common to the whole virtual machine
- Y-registers
- correspond to stack variables (local variables) in conventional programming languages
- relative to the current frame
- G-registers
- closure references
- relative to the current procedure
8The Mozart VM - the Idea (I)
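[Figure: the Emulator's PC points into the Code Area (inst(x(0)) inst(g(1)) inst ...); the Registers refer to values in the Store, e.g. the atom a.]
Emulator:
emulator() {
  ...
  while (1) {
    op = fetch(PC);
    switch (op) {
      case call(X):
        ...
        inc(PC);
        continue;
      ...
    }
  }
}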
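The loop on this slide can be sketched in a few lines of Python. This is only a toy model: the instruction names follow the slides, but the tuple encoding, the list-based store and the `run` helper are assumptions made for illustration, not the Mozart emulator itself.

```python
# Toy emulator loop. Instruction names follow the slides; the tuple encoding,
# the list-based store and the run() helper are assumptions for illustration.

def run(code, x_size=4):
    """Execute instruction tuples; return (x_registers, store)."""
    x = [None] * x_size      # X registers hold store addresses
    store = []               # the Store: values live here
    pc = 0                   # program counter into the code area
    while True:
        op = code[pc]        # fetch(PC)
        pc += 1              # inc(PC)
        name = op[0]
        if name == "putConstant":      # putConstant(value x(i))
            store.append(op[1])
            x[op[2]] = len(store) - 1
        elif name == "move":           # move(x(i) x(j))
            x[op[2]] = x[op[1]]
        elif name == "return":
            return x, store
        else:
            raise ValueError(f"unknown instruction {name!r}")


program = [
    ("putConstant", "a", 0),   # create atom a in the Store, x(0) refers to it
    ("move", 0, 1),            # x(1) refers to the same node
    ("return",),
]
x, store = run(program)
print(store[x[1]])             # the value x(1) refers to
```

Registers hold addresses rather than values, so after the `move` both x(0) and x(1) refer to the same node in the store.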
9Hello World example (1)
declare P in
proc {P}
   {System.show 'hello world'}
end
{P}
- P is a procedure printing 'hello world'
- P's closure contains a reference to the System module!
10Hello World example (simplified)
- lbl(7) definition(x(0) 21 g(122))
- move(g(0) x(0))
- inlineDot(x(0) show x(1))
- putConstant('hello world' x(0))
- call(x(1))
- return
- endDefinition(7)
- lbl(21) unify(x(0) g(1024))
- call(g(1024))
The listing above is compiled from:
declare P in
proc {P}
   {System.show 'hello world'}
end
{P}
11Hello World example (2)
- definition(x(0) 21 g(122))
- move(g(0) x(0))
- inlineDot(x(0) show x(1))
- putConstant('hello world' x(0))
- call(x(1))
- return
- endDefinition(7)
- lbl(21) unify(x(0) g(1024))
- call(g(1024))
- definition(x(0) 21 g(122)) creates a procedure as a first-class value.
- x(0) is the register that will refer to the procedure
- 21 is the label after the definition
- g(122) is the register referring to the System module
12Hello World example (3)
- definition(x(0) 21 g(122))
- move(g(0) x(0))
- inlineDot(x(0) show x(1))
- putConstant('hello world' x(0))
- call(x(1))
- return
- endDefinition(7)
- lbl(21) unify(x(0) g(0))
- call(g(0))
- move(g(0) x(0)) moves the content of g(0) into x(0)
- g(0) contains a reference to the System module
- g(0) is local to the procedure and initialized by definition (discussed later!)
13Hello World example (4)
- definition(x(0) 21 g(122))
- move(g(0) x(0))
- inlineDot(x(0) show x(1))
- putConstant('hello world' x(0))
- call(x(1))
- return
- endDefinition(7)
- lbl(21) unify(x(0) g(0))
- call(g(0))
- inlineDot(x(0) show x(1)) retrieves the show procedure out of the System module
- x(0) is initialised above
- x(1) becomes a reference to the Show procedure
14Hello World example (5)
- definition(x(0) 21 g(122))
- move(g(0) x(0))
- inlineDot(x(0) show x(1))
- putConstant('hello world' x(0))
- call(x(1))
- return
- endDefinition(7)
- lbl(21) unify(x(0) g(0))
- call(g(0))
putConstant creates an atom 'hello world' in the Store and puts a reference to it into x(0).
15Hello World example (6)
- definition(x(0) 21 g(122))
- move(g(0) x(0))
- inlineDot(x(0) show x(1))
- putConstant('hello world' x(0))
- call(x(1))
- return
- endDefinition(7)
- lbl(21) unify(x(0) g(0))
- call(g(0))
Now, at call(x(1)), x(1) refers to the Show procedure and x(0) refers to 'hello world'. Show accesses the argument as x(0)!
16Hello World example (7)
- definition(x(0) 21 g(122))
- move(g(0) x(0))
- inlineDot(x(0) show x(1))
- putConstant('hello world' x(0))
- call(x(1))
- return
- endDefinition(7)
- lbl(21) unify(x(0) g(0))
- call(g(0))
- return hands control back to the place just after call(g(0))
- endDefinition is not used for execution per se
17Hello World example (8)
- definition(x(0) 21 g(122))
- move(g(0) x(0))
- inlineDot(x(0) show x(1))
- putConstant('hello world' x(0))
- call(x(1))
- return
- endDefinition(7)
- lbl(21) unify(x(0) g(0))
- call(g(0))
- continue...
Execution proceeds further...
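The whole walk-through can be replayed on a toy machine. The sketch below is a simplification under stated assumptions: the tuple encoding, the `Closure` class, the `halt` instruction, the `output` list and the dictionary standing in for the System module are all hypothetical, and the top-level `unify` step from the listing is left out.

```python
# Toy re-enactment of the Hello World listing (hypothetical encoding).

class Closure:
    def __init__(self, entry, g):
        self.entry = entry   # address of the first body instruction
        self.g = g           # environment array (the G registers)

output = []
system = {"show": lambda arg: output.append(arg)}   # stand-in for System

code = [
    ("definition", 0, 7, [0]),           # 0: x(0) := closure capturing g(0); skip to 7
    ("move_g_x", 0, 0),                  # 1: x(0) := g(0)
    ("inlineDot", 0, "show", 1),         # 2: x(1) := x(0).show
    ("putConstant", "hello world", 0),   # 3: x(0) := 'hello world'
    ("call_x", 1),                       # 4: call Show, argument is in x(0)
    ("return",),                         # 5: back to the caller
    ("endDefinition",),                  # 6: marker only, never executed
    ("call_x", 0),                       # 7: call P
    ("halt",),                           # 8: end of this toy program
]

def run(code, top_g):
    x, g, pc, tasks = [None, None], top_g, 0, []
    while True:
        op = code[pc]
        pc += 1
        name = op[0]
        if name == "definition":
            x[op[1]] = Closure(pc, [g[i] for i in op[3]])
            pc = op[2]                    # jump over the procedure body
        elif name == "move_g_x":
            x[op[2]] = g[op[1]]
        elif name == "inlineDot":
            x[op[3]] = x[op[1]][op[2]]
        elif name == "putConstant":
            x[op[2]] = op[1]
        elif name == "call_x":
            target = x[op[1]]
            if callable(target):          # builtin such as Show
                target(x[0])
            else:                         # user procedure: save return point
                tasks.append((pc, g))
                pc, g = target.entry, target.g
        elif name == "return":
            pc, g = tasks.pop()
        elif name == "halt":
            return

run(code, [system])
print(output)
```

Note how `definition` only builds the closure and jumps past the body; the body runs later, with its own G registers, when the closure is called.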
18Oz Data Types in VM
- A (partial) value in the VM is a graph such that
- nodes of primitive values (atoms, integers etc.) have no outgoing arcs
- nodes of compound values (e.g. records) do have outgoing arcs; we call them references
- variable nodes can be bound, after which they become transparent references
19Primitive Values
- Atoms - objects with strings inside
- Integers - objects with integers inside
- Boolean - objects with 0-1 values inside
- Conveniently, boxes are real C objects with
operations relevant to their types.
20Records
- A record in the VM is an object that refers to
- an atom (name) which is the record's label
- a sorted list of feature names
- record subtrees (stored in an array)
- For the sake of efficiency, records also refer to hash tables that map feature names to array indexes
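The record layout described above can be sketched as follows. The `Record` class and its attribute names are hypothetical; a Python dict plays the role of the hash table.

```python
# Hypothetical sketch of a record node: label, sorted features,
# subtree array, and a hash table from feature name to array index.

class Record:
    def __init__(self, label, fields):
        self.label = label
        self.features = sorted(fields)                        # sorted feature names
        self.subtrees = [fields[f] for f in self.features]    # array of subtrees
        self.index = {f: i for i, f in enumerate(self.features)}  # hash table

    def select(self, feature):
        """Feature access R.f: one hash lookup, then one array access."""
        return self.subtrees[self.index[feature]]


r = Record("label", {"f2": 1, "f1": "a"})   # R = label(f1:a f2:1)
print(r.features, r.select("f1"), r.select("f2"))
```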
21Records (II)
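R = label(f1:a f2:1)
[Figure: R refers to a record node with the label `label`, a Hash Table over the features f1 and f2, and the subtrees a and 1.]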
22Records (III)
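R = label(f1:a f2:R)
[Figure: the same record layout with a Hash Table over f1 and f2; the subtree under f1 is a and the subtree under f2 refers back to R itself.]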
23Cells
- A Cell is just a box with a reference to a value.
C = {Cell.new unit}
[Figure: C refers to a cell box whose content refers to unit.]
24Abstractions
- (Remember) Oz procedures can refer to values in their lexical scope: they are closures
- the environment of a procedure is an array of references (g-registers)
declare EnvVar Proc in
proc {Proc X}
   {EnvVar X}
end
[Figure: Proc refers to a closure holding a PC into the Code Area (inst() inst() return ...) and an environment entry referring to EnvVar.]
25Representation of Types of Nodes
- Types of nodes in the store are represented by
references which are typed. We call them tagged
references (3 or 4 bits).
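[Figure: the Emulator's registers hold tagged references (int, list) into a list structure in the store whose elements are the integers 1 and 2.]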
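A tagged reference packs a small type tag into the low bits of a word. The sketch below assumes 3 tag bits (the slide says 3 or 4), and the tag values and helper names are made up for illustration.

```python
# Tagged references: the low TAG_BITS bits carry the type, the rest the payload.
# Tag values and helper names are hypothetical.

TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1
TAG_INT, TAG_LIST, TAG_ATOM = 1, 2, 3

def make_ref(tag, payload):
    """Pack a non-negative payload (small int or address) with its tag."""
    return (payload << TAG_BITS) | tag

def tag_of(ref):
    return ref & TAG_MASK

def payload_of(ref):
    return ref >> TAG_BITS


ref = make_ref(TAG_INT, 42)
print(tag_of(ref) == TAG_INT, payload_of(ref))
```

The emulator can then dispatch on `tag_of(ref)` without touching the node the reference points to.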
26Representation of Types of Nodes-2
- Sometimes there needs to be a combination of a tagged reference (or pointer) and
- a tagged object (each object knows its size)
[Figure: the Emulator's registers refer to an object node (obj) that carries its own tag (ext).]
27Variables
- A variable is an object such that
- An unbound variable has no reference in it. Thus, it looks like a primitive value
- An unbound variable is recognised by the VM as such
- A bound variable object refers to another value
- A bound variable object is transparent for operations on values, i.e. it becomes a reference
- The VM can step through adjacent references. This is called dereferencing
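Dereferencing can be sketched as a loop that follows bound variables until it reaches a value or an unbound variable. The `Var` class and the `_UNBOUND` sentinel are illustrative assumptions.

```python
# Dereferencing: step through chains of bound variables.

_UNBOUND = object()          # sentinel: "no reference inside"

class Var:
    def __init__(self):
        self.ref = _UNBOUND

    def bind(self, value):
        self.ref = value

def deref(node):
    """Follow bound variables; stop at a value or an unbound variable."""
    while isinstance(node, Var) and node.ref is not _UNBOUND:
        node = node.ref
    return node


a, b = Var(), Var()
a.bind(b)                    # a -> b, b is still unbound
first = deref(a)             # stops at the unbound variable b
b.bind(42)
second = deref(a)            # a -> b -> 42
print(first is b, second)
```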
28Variables
29Compiling Data Structures
- Values from a program text need to be constructed in the Store.
- Primitive values are constructed with putInt, putConstant, etc.
X = atomEx
putConstant(atomEx x(2))
30Compiling Records
- Records are constructed in a top-down way, similar to Prolog's WAM
R = rec(f1:tup(a) f2:2)
putRecord(rec f1 f2 x(2))
unifyVariable(x(1))
unifyNumber(2)
getRecord(tup 1 x(1))
unifyLiteral(a)
31Compiling Records (2)
R = rec(f1:tup(a) f2:2)
putRecord(rec f1 f2 x(2))
unifyVariable(x(1))
unifyNumber(2)
getRecord(tup 1 x(1))
unifyLiteral(a)
putRecord(rec f1 f2 x(2)) creates a record node with subtrees which are unbound variables
32Compiling Records (3)
R = rec(f1:tup(a) f2:2)
putRecord(rec f1 f2 x(2))
unifyVariable(x(1))
unifyNumber(2)
getRecord(tup 1 x(1))
unifyLiteral(a)
unifyVariable(x(1)) unifies the first subtree (under f1) with a new variable in x(1)
33Compiling Records (4)
R = rec(f1:tup(a) f2:2)
putRecord(rec f1 f2 x(2))
unifyVariable(x(1))
unifyNumber(2)
getRecord(tup 1 x(1))
unifyLiteral(a)
unifyNumber(2) unifies the second subtree (under f2) with integer 2
34Compiling Records (5)
R = rec(f1:tup(a) f2:2)
putRecord(rec f1 f2 x(2))
unifyVariable(x(1))
unifyNumber(2)
getRecord(tup 1 x(1))
unifyLiteral(a)
getRecord(tup 1 x(1)) unifies x(1) with a new tuple
35Compiling Records (6)
R = rec(f1:tup(a) f2:2)
putRecord(rec f1 f2 x(2))
unifyVariable(x(1))
unifyNumber(2)
getRecord(tup 1 x(1))
unifyLiteral(a)
unifyLiteral(a) unifies the first (and sole) subtree of tup() with a
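The five construction steps can be replayed on a small model. This is a sketch under assumptions: the `Var` and `Rec` classes, the `build` and `show` helpers and the tuple encoding are hypothetical, the tuple feature is written as the integer 1, and `getRecord` is only modelled for the case used here, where its register holds a fresh unbound variable.

```python
# Top-down record construction for R = rec(f1:tup(a) f2:2) (toy model).

class Var:
    def __init__(self):
        self.value, self.bound = None, False

    def bind(self, value):
        self.value, self.bound = value, True

class Rec:
    def __init__(self, label, features):
        self.label, self.features = label, features
        self.args = [Var() for _ in features]   # subtrees start unbound

def show(node):
    if isinstance(node, Var):
        return show(node.value) if node.bound else "_"
    if isinstance(node, Rec):
        inner = " ".join(f"{f}:{show(a)}" for f, a in zip(node.features, node.args))
        return f"{node.label}({inner})"
    return str(node)

def build(code):
    x, cursor = {}, None     # cursor walks the subtrees of the current record
    for op, *args in code:
        if op == "putRecord":            # create a record node
            label, feats, reg = args
            x[reg] = Rec(label, feats)
            cursor = iter(x[reg].args)
        elif op == "getRecord":          # bind the variable in reg to a new record
            label, feats, reg = args
            rec = Rec(label, feats)
            x[reg].bind(rec)
            cursor = iter(rec.args)
        elif op == "unifyVariable":      # next subtree becomes visible in reg
            x[args[0]] = next(cursor)
        elif op in ("unifyNumber", "unifyLiteral"):
            next(cursor).bind(args[0])   # next subtree gets a primitive value
    return x


x = build([
    ("putRecord", "rec", ["f1", "f2"], 2),
    ("unifyVariable", 1),
    ("unifyNumber", 2),
    ("getRecord", "tup", [1], 1),
    ("unifyLiteral", "a"),
])
result = show(x[2])
print(result)
```

The outer record exists, with holes, before its nested tuple is built; that is what "top-down" means here.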
36Compiling Abstractions
- One specifies registers that are to be saved in
the closure
lbl(7) definition(x(0) 21 g(122))
...
return
endDefinition(7)
lbl(21) ...
declare EnvVar Proc in
proc {Proc X}
   {EnvVar X}
end
37Conditionals
- Check condition(s) and proceed in one of two branches.
- there is a branch <label> instruction
- very similar to C compiled for any RISC architecture!
38Conditionals (II)
% x(1) contains X, x(2) contains Show
testNumber(x(1) 1 22)
putConstant(ok x(0))
call(x(2))               % ok (x(0)) passed
branch(31)               % skip else clause
lbl(22) putConstant(no x(0))
call(x(2))               % no (x(0)) passed
lbl(31) ...
declare X in
if X == 1 then {Show ok} else {Show no} end
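The control flow of this listing can be checked with a toy interpreter. The tuple encoding and the `run` helper are assumptions; suspension on an unbound X is not modelled, and Show is replaced by a Python function that records its argument.

```python
# Toy execution of the compiled conditional (hypothetical encoding).

code = [
    ("testNumber", 1, 1, 22),     # if x(1) is not 1, jump to lbl(22)
    ("putConstant", "ok", 0),
    ("call", 2),                  # Show reads its argument from x(0)
    ("branch", 31),               # skip the else clause
    ("lbl", 22),
    ("putConstant", "no", 0),
    ("call", 2),
    ("lbl", 31),
]

def run(code, x_value):
    labels = {op[1]: i for i, op in enumerate(code) if op[0] == "lbl"}
    shown = []
    x = {1: x_value, 2: shown.append}     # x(1) = X, x(2) = Show stand-in
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op[0] == "testNumber":
            if x[op[1]] != op[2]:
                pc = labels[op[3]]
        elif op[0] == "putConstant":
            x[op[2]] = op[1]
        elif op[0] == "call":
            x[op[1]](x[0])
        elif op[0] == "branch":
            pc = labels[op[1]]
        # "lbl" is only a marker and does nothing at run time
    return shown


print(run(code, 1), run(code, 2))
```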
39Procedure Application
- Arguments are passed in X registers
- there is one single set of X registers
- Return point is saved in a task on the task stack
- the task stack is called so because it serves yet other purposes (e.g. exception handling)
- Procedure finishes with return
- pops the task from the stack
40Procedure Application (II)
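[Figure: the PC points into the code of the running procedure (inst() ... return ...); each entry on the Task Stack points just after a call() in a caller's code (call() inst() ...).]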
41Local Variables (II)
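[Figure: the PC points into the running procedure (move y(0) x(1) ... return ...); the Y registers belong to tasks on the Task Stack, and a caller's code continues after its call() with move y(1) x(1) ...]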
42Local Variables
- Local variables are kept in Y registers
- associated with tasks; only the topmost set is accessible for manipulation
- explicitly allocated and deallocated through allocate <N> and deallocate instructions
- Y registers are accessible through move <reg> <reg> instructions
43Tail-call optimization
- Task frame is only needed when a procedure contains either
- 2 or more call instructions
- 1 call instruction but other instructions follow
- Otherwise no frame is allocated
- Example: Partition (tail-recursive)
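The rule above can be written down as a small predicate over a procedure body. This is only an illustration of the rule as stated on the slide: the list-of-names encoding and the `needs_frame` helper are hypothetical, and the final `return` is not counted as an instruction that "follows" the call.

```python
# When does a procedure body need a task frame? (rule from the slide)

def needs_frame(body):
    """body: list of instruction names ending in 'return'."""
    calls = [i for i, op in enumerate(body) if op == "call"]
    if len(calls) >= 2:
        return True                       # 2 or more call instructions
    if len(calls) == 1:
        rest = body[calls[0] + 1:]
        return any(op != "return" for op in rest)   # work left after the call
    return False                          # no call at all


tail_call = ["move", "call", "return"]
non_tail = ["call", "move", "return"]
print(needs_frame(tail_call), needs_frame(non_tail))
```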
44Accessing Closure Variables
- A pointer to the abstraction's environment array is known to the VM as G registers
- set by the call <reg> instruction
- saved in the stack when a nested procedure is called
- accessible by move <reg> <reg> instructions
- Thus, a task in the task stack is a triple <PCret, Yregs, Gregs>
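A call and its matching return can be sketched around that triple. The `Machine` class, its method names and the concrete addresses are assumptions for illustration only.

```python
# A task is the triple <PCret, Yregs, Gregs>; call pushes it, return pops it.

from collections import namedtuple

Task = namedtuple("Task", "pc_ret yregs gregs")

class Machine:
    def __init__(self):
        self.pc, self.y, self.g = 0, None, None
        self.tasks = []

    def allocate(self, n):
        self.y = [None] * n                  # fresh Y registers for this frame

    def call(self, entry, closure_env):
        # save where to continue and the caller's Y and G registers
        self.tasks.append(Task(self.pc + 1, self.y, self.g))
        self.pc, self.y, self.g = entry, None, closure_env

    def ret(self):
        self.pc, self.y, self.g = self.tasks.pop()


m = Machine()
m.pc, m.g = 10, ["outer env"]
m.allocate(2)
m.y[0] = "local"
m.call(100, ["inner env"])       # callee sees its own G registers
inside = (m.pc, m.y, m.g)
m.ret()                          # caller's PC, Y and G are restored
print(inside, (m.pc, m.y, m.g))
```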
45Memory Management (overview)
- Values in the store can become garbage
- e.g. when an array of Y registers is deallocated
- Garbage collection reclaims garbage: it traces all live data and frees unused space
- Mozart (so far) uses a stop-and-copy collector: live data is copied into a new area, and the old area is freed (including garbage!)
- Nodes reachable through registers and the stack in the VM are considered live (i.e. they could be accessed in the future)
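The copying step can be sketched on a tiny heap. The representation is an assumption: a heap is a list of `(label, children)` nodes, children are heap addresses, and `collect` is a hypothetical helper that uses recursion where a real collector would scan iteratively.

```python
# Stop-and-copy in miniature: copy what the roots reach, drop the rest.

def collect(roots, old):
    """Return (new_roots, new_heap) containing only reachable nodes."""
    new, forward = [], {}            # forward: old address -> new address

    def copy(addr):
        if addr not in forward:
            forward[addr] = len(new)
            new.append(None)         # reserve the slot first (handles cycles)
            label, children = old[addr]
            new[forward[addr]] = (label, [copy(c) for c in children])
        return forward[addr]

    return [copy(r) for r in roots], new


old_heap = [
    ("a", []),           # 0: atom a
    ("garbage", []),     # 1: unreachable
    ("rec", [0, 2]),     # 2: record referring to a and to itself
]
new_roots, new_heap = collect([2], old_heap)
print(new_roots, new_heap)
```

The unreachable node is never visited, so it costs nothing to discard; the whole old area is simply dropped after the copy.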
46Threads
- Threads are created by means of the thread <lbl> instruction
thread E end ...
thread(33)
CE               (code for E)
lbl(33)
- E is executed concurrently in a new thread
47Threads (II)
- A thread consists essentially of the task stack
- A thread can be runnable, running, blocked or terminated.
- Blocked: cannot advance because of lack of information in the Store
- The VM contains a scheduler and a pool of runnable threads
- Question: how to manage blocked threads?
48Threads (III)
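- Blocked threads are associated with the variable(s) whose missing bindings caused the threads to block
declare X in
if X == 1 then ... end
[Figure: the unbound variable X in the Store holds a suspension list (terminated by nil); each suspension refers to a blocked Thread and its stack.]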
49Threads (IV)
- Suspensions are created by e.g. the test instructions used for compiling conditionals
- There can be many threads blocked on the same variable!
- Binding a variable involves scheduling all blocked threads for execution
- suspensions are deallocated
- threads are entered into the thread pool
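The bookkeeping described above fits in a few lines. The `Var` class, the `suspend` and `bind` helpers and the use of plain strings as thread handles are illustrative assumptions.

```python
# Suspensions: threads blocked on a variable are woken when it is bound.

from collections import deque

class Var:
    def __init__(self):
        self.bound, self.value = False, None
        self.suspensions = []        # threads waiting for this variable

runnable = deque()                   # pool of runnable threads

def suspend(thread, var):
    """Called when a test instruction finds var unbound."""
    var.suspensions.append(thread)

def bind(var, value):
    var.bound, var.value = True, value
    runnable.extend(var.suspensions)     # schedule all blocked threads
    var.suspensions = []                 # suspensions are deallocated


x = Var()
suspend("thread-1", x)
suspend("thread-2", x)
before = list(runnable)
bind(x, 1)
print(before, list(runnable))
```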
50Advanced Issues - Data Types
- Ports are objects with the send method that just adds elements to the port's stream
- no additional synchronisation primitive(s) are needed
- Objects have a highly-optimised built-in implementation
- dedicated (C) abstraction
- specialised VM instructions for method application, etc.
51Advanced Issues - Exceptions
- Remember that handling an exception means that
- remaining computation in the try clause is discarded
- a specified action is executed instead
- Note that the remaining computation is represented by a certain number of topmost tasks in the thread's stack
52Advanced Issues - Exceptions (II)
- The exception handling mechanism pushes a dedicated task onto the task stack: an exception handler
- it delimits the computation in the try clause
- when no exception occurs, the task is silently discarded when reached
- when an exception does occur, all tasks up to but not including the exception handler are discarded and the handler's action is executed
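Both cases can be sketched on a task stack modelled as a Python list. The task encoding (`("ret", address)` and `("handler", action)`) and the two helper functions are hypothetical.

```python
# Handler tasks delimit the try clause on the task stack (toy model).

HANDLER = "handler"

def raise_exception(tasks):
    """Discard tasks down to the topmost handler; return its action."""
    while tasks:
        kind, payload = tasks.pop()
        if kind == HANDLER:
            return payload
    return None                      # no handler: uncaught exception

def return_normally(tasks):
    """Pop the next return task, silently discarding handler tasks."""
    while tasks:
        task = tasks.pop()
        if task[0] != HANDLER:
            return task
    return None


stack = [("ret", 10), (HANDLER, "catch-clause"), ("ret", 20), ("ret", 30)]
action = raise_exception(stack)      # drops ret 30 and ret 20
print(action, stack)

quiet = [("ret", 10), (HANDLER, "catch-clause")]
resumed = return_normally(quiet)     # try body finished without raising
print(resumed, quiet)
```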
53Advanced Issues - Optimization
- Run-time code optimization
- {Obj m(X Y Z)}
- Compiled as move(? x(0)) move(? x(1)) move(? x(2)) sendMsg(m ObjectReg 3)
- At run time this is changed to direct access to the method in the method table (from Smalltalk)
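One common way to get this effect is a per-call-site cache, as in Smalltalk implementations. The sketch below assumes that is roughly what the slide means; the `SendSite` class is hypothetical and uses Python's own class lookup as the "method table".

```python
# A one-entry inline cache at a sendMsg call site (assumed technique).

class SendSite:
    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_class = None
        self.cached_method = None
        self.lookups = 0             # counts slow-path lookups

    def send(self, obj, *args):
        cls = type(obj)
        if cls is not self.cached_class:         # slow path: full lookup
            self.lookups += 1
            self.cached_method = getattr(cls, self.method_name)
            self.cached_class = cls
        return self.cached_method(obj, *args)    # fast path: direct call


class Obj:
    def m(self, x, y, z):
        return x + y + z


site = SendSite("m")
o = Obj()
results = [site.send(o, 1, 2, 3), site.send(o, 4, 5, 6)]
print(results, site.lookups)
```

The second send skips the lookup because the receiver's class matches the cached one.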
54Advanced Issues - Optimization -2
- Threaded code
- GNU compiler (not standard C)
- An instruction is actually the address of the case statement for that specific instruction
- 30% speedup
- increases code size
55Advanced Issues - Optimization -3
- Emulation is interpretation at very low level
- It is believed (based on experience with other VMs) that native code compilation would increase performance by 2-3 times on a RISC
- JIT or native code compilation of a stack-based VM (e.g. Java) gives more
- Stack-based virtual machines are slower
- Part of the improvement is stack-to-register transformations
- Dynamically typed language systems, or even statically typed ones with data flow, are slower than statically typed languages
56Topics NOT Mentioned At All
- Further nearly conservative (in spirit) extensions of the VM allow for
- constraint solving facilities, including the Oz search combinator
- the distributed programming extension
- covered in the next lecture