Title: OODL Runtime Optimizations
1. OODL Runtime Optimizations
- Jonathan Bachrach
- MIT AI Lab
- Feb 2001
2. Runtime Techniques
- Assume we can only write system-code turbochargers
- No sophisticated compiler available
- Can only minimally perturb user code
3. Q: What are the Biggest Inefficiencies?
- Imagine trying to get Proto to run faster
4. Hint: Most Popular Operations
5. Running Example
    (dg + ((x <num>) (y <num>) => <num>))
    (dm + ((x <int>) (y <int>) => <int>)
      (ib (i+ (iu x) (iu y))))
    (dm + ((x <flo>) (y <flo>) => <flo>)
      (fb (f+ (fu x) (fu y))))
    (dm x2 ((x <num>) => <num>)
      (+ x x))
    (dm x2 ((x <int>) => <int>)
      (+ x x))
6. A: What are the Biggest Inefficiencies?
- Boxing
- Method dispatch
- Type checks
- Slot access
- Object creation
- Today's topic: method dispatch
7. Outline
- Overview
- Inline call caches
- Table
- Decision tree
- Variations
- Open Problems
8. Method Distributions
- Distribution can be measured
- At generic
- At call site
- Distribution can be
- Monomorphic
- Polymorphic
- Megamorphic
- Distribution can be
- Peaked
- Uniform
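Measuring a distribution at a call site can be sketched by recording the class seen on each call and then classifying the result. This is a minimal Python sketch (not Proto). The function name `classify_site` and both thresholds are hypothetical, because the slide gives no cut-off numbers.

```python
from collections import Counter

# Hypothetical thresholds: the slide gives no numbers, so both are assumptions.
MEGAMORPHIC_THRESHOLD = 4   # more distinct classes than this => megamorphic
PEAK_SHARE = 0.9            # top class at or above this share => peaked


def classify_site(observed_classes):
    """Classify the receiver classes recorded at one call site."""
    if not observed_classes:
        raise ValueError("no calls recorded at this site")
    counts = Counter(observed_classes)
    distinct = len(counts)
    if distinct == 1:
        morphism = "monomorphic"
    elif distinct <= MEGAMORPHIC_THRESHOLD:
        morphism = "polymorphic"
    else:
        morphism = "megamorphic"
    # Share of calls that went to the single most common class.
    top_share = counts.most_common(1)[0][1] / len(observed_classes)
    shape = "peaked" if top_share >= PEAK_SHARE else "uniform"
    return morphism, shape
```

To measure at the generic instead, pool the observations from every call site of that generic before classifying.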
9. Expense of Dispatch
- Problem: expensive if computed naively
- Find applicable methods
- Sort applicable methods
- Call most applicable method
- Three outcomes
- One most applicable method => OK
- No applicable methods => not-understood error
- Several applicable methods with no unique most specific one => ambiguous error
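The naive three-step procedure and its three outcomes can be sketched in Python as follows. This is only an illustration. Representing a method as a `(specializers, function)` pair is a hypothetical choice, and "sorting" is reduced to finding a method at least as specific as every other applicable method.

```python
class NotUnderstood(Exception):
    pass


class Ambiguous(Exception):
    pass


def more_specific(m1, m2):
    """m1 <= m2 if each specializer of m1 is a subclass of m2's."""
    return all(issubclass(a, b) for a, b in zip(m1[0], m2[0]))


def naive_dispatch(methods, args):
    arg_types = tuple(type(a) for a in args)
    # 1. Find applicable methods.
    applicable = [m for m in methods
                  if len(m[0]) == len(args)
                  and all(issubclass(t, s) for t, s in zip(arg_types, m[0]))]
    if not applicable:
        raise NotUnderstood(arg_types)
    # 2. Order them: keep methods at least as specific as every other one.
    best = [m for m in applicable
            if all(more_specific(m, other) for other in applicable)]
    if len(best) != 1:
        raise Ambiguous(arg_types)
    # 3. Call the most applicable method.
    return best[0][1](*args)
```

Every call repeats the applicability test and the pairwise ordering over all methods. That cost is what the caching and decision-tree techniques in the following slides avoid.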
10. Mapping View of Dispatch
- Dispatch can be thought of as a mapping from argument types to a method: (t1, t2, ..., tn) => m
11. Solutions
12. Table-based Approach
- N-dimensional tables
- Keys are concrete classes of actual arguments
- Values are methods to call
- Must address size explosion
- Talk a bit about this later
- Nested tables
- Keys are concrete classes of actual arguments
- Values are either other tables or methods to call
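A minimal Python sketch of the nested-table variant, built on demand. The class name `NestedDispatchTable` and the `lookup` slow-path callback are hypothetical. The sketch assumes at least one argument. Level i is keyed by the class of argument i, and the last level holds methods.

```python
class NestedDispatchTable:
    """Nested tables: inner levels map a class to a subtable, the last to a method."""

    def __init__(self, lookup):
        self.lookup = lookup   # slow path: tuple of classes -> method (assumed given)
        self.root = {}
        self.misses = 0

    def find(self, args):
        classes = tuple(type(a) for a in args)
        table = self.root
        for cls in classes[:-1]:
            table = table.setdefault(cls, {})   # create subtables only when reached
        last = classes[-1]
        if last not in table:                   # demand-build the missing entry
            self.misses += 1
            table[last] = self.lookup(classes)
        return table[last]
```

The flat N-dimensional variant would instead key a single table by the whole class tuple. Building on demand also keeps the size explosion down, because only the class combinations actually seen ever get entries.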
13. Table Example One
14. Table Example Two
15. Table Example Three
16. Table-based Critique
- Pros
- Simple
- Amenable to profile guided reordering
- Cons
- Too many indirections
- Very big
- Demand-build it
- Share subtables
- Only works for class types
- Can use multiple tables
17. Engine Node Dispatch
- Glenn Burke and myself at Harlequin, Inc., circa 1996
- Partial Dispatch: Optimizing Dynamically-Dispatched Multimethod Calls with Compile-Time Types and Runtime Feedback, 1998
- Shared decision tree built out of executable engine nodes
- Incrementally grows trees on demand upon a miss
- Engine nodes are executed to perform some action, typically tail calling another engine node and eventually tail calling the chosen method
- Appropriate engine nodes can be used to handle the monomorphic, polymorphic, and megamorphic discrimination cases, corresponding to single, linear, and table lookup
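The engine-node idea can be sketched in Python with callable node objects. All names here (`DiscriminatorNode`, `MethodNode`, `make_generic`, `LINEAR_LIMIT`) are hypothetical and are not Harlequin's implementation. Each discriminator tests one argument position and grows on a miss. It starts as a single test, becomes a linear search, and switches to a table once it holds more than `LINEAR_LIMIT` entries. Python has no guaranteed tail calls, so an ordinary call stands in for them.

```python
LINEAR_LIMIT = 4  # hypothetical: above this many entries, switch to a table


class MethodNode:
    """Leaf engine node: invokes the chosen method."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, args):
        return self.fn(*args)


class DiscriminatorNode:
    """Branches on the class of args[pos]: single, linear, or table lookup."""

    def __init__(self, pos, grow):
        self.pos = pos
        self.grow = grow       # grow(node, args) builds the target on a miss
        self.entries = []      # (class, target) pairs for single/linear lookup
        self.table = None      # dict used once the node becomes megamorphic

    def kind(self):
        if self.table is not None:
            return "table"
        return "single" if len(self.entries) <= 1 else "linear"

    def __call__(self, args):
        cls = type(args[self.pos])
        if self.table is not None:
            target = self.table.get(cls)
        else:
            target = next((t for c, t in self.entries if c is cls), None)
        if target is None:                 # miss: grow the tree on demand
            target = self.grow(self, args)
            self.add(cls, target)
        return target(args)                # stands in for a tail call

    def add(self, cls, target):
        if self.table is not None:
            self.table[cls] = target
        else:
            self.entries.append((cls, target))
            if len(self.entries) > LINEAR_LIMIT:
                self.table = dict(self.entries)


def make_generic(arity, lookup):
    """lookup(classes) -> function is the slow path (an assumption here)."""
    def grow(node, args):
        if node.pos == arity - 1:
            return MethodNode(lookup(tuple(type(a) for a in args)))
        return DiscriminatorNode(node.pos + 1, grow)

    root = DiscriminatorNode(0, grow)
    return (lambda *args: root(args)), root
```

Each node is plain data plus a uniform calling convention, which is what makes the tree introspectable and shareable. It is also the source of the data and code indirections listed in the critique.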
18. Engine Node Dispatch Picture
    define method \+ (x :: <i>, y :: <i>) end
    define method \+ (x :: <f>, y :: <f>) end
    Seen (<i>, <i>) and (<f>, <f>) as inputs.
19. Engine Dispatch Critique
- Pros
- Portable
- Introspectable
- Code Shareable
- Cons
- Data and Code Indirections
- Sharing overhead
- Hard to inline
- Fewer partial-evaluation opportunities
20. Lookup DAG
- Input is argument values
- Output is method or error
- Lookup DAG is a decision tree with identical subtrees shared to save space
- Each interior node has a set of outgoing class-labeled edges and is labeled with an expression
- Each leaf node is labeled with a method, which is either user-specified, not-understood, or ambiguous
21. Lookup DAG Picture
- From Chambers and Chen OOPSLA-99
22. Lookup DAG Evaluation
- Formals start bound to actuals
- Evaluation starts from root
- To evaluate an interior node
- Evaluate its expression, yielding v
- Search its edges for the unique edge e whose label is the class of v
- Evaluate e's target node recursively
- To evaluate a leaf node
- Return its method
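The evaluation rule can be sketched in a few lines of Python. Here `Leaf`, `Interior`, and `evaluate` are hypothetical names, and the recursion on the chosen edge is written as a loop. The hand-built DAG for the running `+` example shares one not-understood leaf between both mixed-type paths, which is what makes it a DAG rather than a tree.

```python
class Leaf:
    def __init__(self, method):
        self.method = method


class Interior:
    def __init__(self, expr, edges):
        self.expr = expr      # function of the bound formals
        self.edges = edges    # dict: class label -> target node


def evaluate(node, formals):
    """Evaluate a lookup DAG; the recursive step is written as a loop."""
    while isinstance(node, Interior):
        v = node.expr(formals)
        node = node.edges[type(v)]   # the unique edge labeled with v's class
    return node.method


# The + example: both mixed-type paths share a single not-understood leaf.
not_understood = Leaf("m-not-understood")
dag = Interior(lambda f: f["x"], {
    int: Interior(lambda f: f["y"], {int: Leaf("int+"), float: not_understood}),
    float: Interior(lambda f: f["y"], {int: not_understood, float: Leaf("flo+")}),
})
```

A complete DAG has an edge for every class an expression can produce. A missing class therefore raises `KeyError` here instead of returning an error method.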
23. Lookup DAG Evaluation Picture
- From Chambers and Chen OOPSLA-99
24. Lookup DAG Construction
    function BuildLookupDag(DF: canonical dispatch function): lookup DAG
        G := create empty lookup DAG
        Memo := create empty table
        cs: set of Case := Cases(DF)
        G.root := buildSubDag(cs, Exprs(cs))
        return G

    function buildSubDag(cs: set of Case, es: set of Expr): node
        n: node
        if (cs, es) -> n in Memo then return n
        if empty?(es) then
            n := create leaf node in G
            n.method := computeTarget(cs)
        else
            n := create interior node in G
            expr: Expr := pickExpr(es, cs)
            n.expr := expr
            for each class in StaticClasses(expr) do
                cs': set of Case := targetCases(cs, expr, class)
                es': set of Expr := (es - {expr}) ∩ Exprs(cs')
                n': node := buildSubDag(cs', es')
                e: edge := create edge from n to n' in G
                e.class := class
            end for
        add (cs, es) -> n to Memo
        return n

    function computeTarget(cs: set of Case): Method
        methods: set of Method := min≤(Methods(cs))
        if |methods| = 0 then return m-not-understood
        if |methods| > 1 then return m-ambiguous
        return single element m of methods
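A runnable Python sketch of the construction above, under a deliberately simplified, hypothetical model. A Case is `(method_name, ((pos, cls), ...))`, meaning "argument `pos` is an instance of `cls`". An Expr is just an argument position. `pickExpr` always takes the lowest position, although the pseudocode leaves that choice open. The memo table is what turns the tree into a DAG.

```python
class Leaf:
    def __init__(self, method):
        self.method = method


class Interior:
    def __init__(self, expr):
        self.expr = expr
        self.edges = {}      # class label -> child node


def exprs(cs):
    """Exprs(cs): every position tested by some case in cs."""
    return frozenset(pos for _, tests in cs for pos, _ in tests)


def target_cases(cs, expr, cls):
    """targetCases: keep the cases whose test on expr (if any) accepts a cls."""
    return frozenset(c for c in cs
                     if all(issubclass(cls, t) for p, t in c[1] if p == expr))


def at_least_as_specific(c1, c2):
    """c1 <= c2 if c1 tests every position c2 tests, with a subclass."""
    t1 = dict(c1[1])
    return all(p in t1 and issubclass(t1[p], t) for p, t in c2[1])


def compute_target(cs):
    # min<=: cases that no other case is strictly more specific than.
    best = [c for c in cs
            if not any(at_least_as_specific(o, c) and not at_least_as_specific(c, o)
                       for o in cs)]
    if len(best) == 0:
        return "m-not-understood"
    if len(best) > 1:
        return "m-ambiguous"
    return best[0][0]


def build_lookup_dag(cases, static_classes):
    memo = {}

    def build_sub_dag(cs, es):
        if (cs, es) in memo:               # identical subproblem: share the node
            return memo[(cs, es)]
        if not es:
            n = Leaf(compute_target(cs))
        else:
            n = Interior(min(es))          # pickExpr: simply the lowest position
            for cls in static_classes:
                cs2 = target_cases(cs, n.expr, cls)
                es2 = (es - {n.expr}) & exprs(cs2)
                n.edges[cls] = build_sub_dag(cs2, es2)
        memo[(cs, es)] = n
        return n

    cs = frozenset(cases)
    return build_sub_dag(cs, exprs(cs))


def evaluate(node, args):
    """Walk the DAG using the class of the argument each node tests."""
    while isinstance(node, Interior):
        node = node.edges[type(args[node.expr])]
    return node.method
```

For the running `+` example, `build_lookup_dag([("int+", ((0, int), (1, int))), ("flo+", ((0, float), (1, float)))], [int, float])` produces a single shared not-understood leaf for both `(int, float)` and `(float, int)`.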
25. Single Dispatch Binary Search Tree
- Label classes with integers using an inorder walk, with the goal of making each class's subclasses form a contiguous range
- Implement the Class => Target map as a binary search tree balanced using execution frequency information
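A Python sketch of range-based single dispatch under single inheritance. For the numbering it uses a depth-first preorder walk, where a class is numbered before its subclasses. That is one way to give every subtree a contiguous ID range. The class→target map then becomes sorted runs searched with `bisect`. This is a plain binary search and is not balanced by execution frequency as the slide describes. The hierarchy and the function names are hypothetical.

```python
from bisect import bisect_right


def number_classes(children, root):
    """Depth-first preorder numbering: a class precedes its subclasses, so each
    class's subtree occupies the contiguous ID range (lo, hi)."""
    ids, ranges = {}, {}

    def walk(cls):
        lo = len(ids)
        ids[cls] = lo
        for sub in children.get(cls, []):
            walk(sub)
        ranges[cls] = (lo, len(ids) - 1)

    walk(root)
    return ids, ranges


def build_search_map(ids, ranges, methods):
    """Turn {specializer class: target} into sorted run starts and targets."""
    starts, targets = [], []
    for cid in range(len(ids)):
        # The most specific applicable method has the narrowest covering range.
        covering = [(ranges[c][1] - ranges[c][0], t)
                    for c, t in methods.items()
                    if ranges[c][0] <= cid <= ranges[c][1]]
        target = min(covering, key=lambda w: w[0])[1] if covering else "not-understood"
        if not targets or targets[-1] != target:
            starts.append(cid)
            targets.append(target)
    return starts, targets


def dispatch(starts, targets, cid):
    """Binary search for the run that contains class ID cid."""
    return targets[bisect_right(starts, cid) - 1]
```

With `children = {"obj": ["num", "str"], "num": ["int", "flo"], "int": ["small", "big"]}` and methods on `num` and `int`, the subclasses of `int` collapse into a single run. This is why contiguous numbering keeps the search structure small.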
26. Class Numbering
27. Binary Search Tree Picture
- From Chambers and Chen OOPSLA-99
28. Critique of Decision Tree
- Pros
- Efficient to construct and execute
- Can incorporate profile information to bias execution
- Amenable to on-demand construction
- Amenable to partial evaluation and method inlining
- Can easily incorporate static class information
- Amenable to inlining into call-sites
- Permits arbitrary predicates
- Mixes linear, binary, and array lookups
- Fast on modern CPUs
- Cons
- Requires code generation (a compiler) to produce the best ones
29. Inline Call Caches
- Assumption
- Method distribution is usually peaked and call-site specific
- Each call site has its own cache
- Use the call instruction as the cache
- Calls the last taken method
- Method prologue checks for correct arguments
- Calls slow lookup on a miss, which also patches the call instruction
- Deutsch and Schiffman, 1984
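Python cannot patch call instructions, so this sketch models the patched call as a per-site field. The class name `InlineCacheSite` and the `slow_lookup` callback are hypothetical. One simplification is intentional: in the Deutsch and Schiffman scheme the class check runs in the callee's prologue, whereas here the site performs it.

```python
class InlineCacheSite:
    """One call site; cached_cls/cached_fn play the role of the patched call."""

    def __init__(self, slow_lookup):
        self.slow_lookup = slow_lookup   # receiver class -> function (assumed given)
        self.cached_cls = None
        self.cached_fn = None
        self.hits = 0
        self.misses = 0

    def call(self, receiver, *args):
        # Stand-in for the prologue check: is this the class we cached?
        if type(receiver) is self.cached_cls:
            self.hits += 1
        else:
            self.misses += 1
            self.cached_cls = type(receiver)
            self.cached_fn = self.slow_lookup(self.cached_cls)   # "patch" the site
        return self.cached_fn(receiver, *args)
```

Only one class is remembered, so a site that alternates between two receiver classes misses on every call. That is the motivation for polymorphic inline caches.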
30. Inline Caching Example One
31. Inline Caching Example Two
32. Inline Caching Example Three
33. Inline Caching Critique
- Pros
- Fast dispatch sequence for hit
- Usually high hit rate (90-95% for Smalltalk)
- Cons
- Uses self-modifying code
- Slow for misses
- Depends on method distribution spike
- Might be less beneficial for multimethods
34. Polymorphic Inline Caching
- Handles polymorphically peaked distribution
- Generate call-site specific dispatch stub
- Hölzle et al., 1991
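A polymorphic inline cache keeps several (class, target) entries per site and tests them in order, like a generated stub. This Python sketch uses hypothetical names, and `max_entries` is an assumed limit. Once the stub is full, the site is marked megamorphic and new classes always take the slow path.

```python
class PolymorphicInlineCache:
    """Per-site stub: a short list of (class, function) checks tried in order."""

    def __init__(self, slow_lookup, max_entries=4):
        self.slow_lookup = slow_lookup   # receiver class -> function (assumed given)
        self.max_entries = max_entries   # hypothetical stub size limit
        self.entries = []
        self.megamorphic = False

    def call(self, receiver, *args):
        cls = type(receiver)
        for c, fn in self.entries:       # the stub's sequence of class tests
            if c is cls:
                return fn(receiver, *args)
        fn = self.slow_lookup(cls)       # miss: full lookup
        if len(self.entries) < self.max_entries:
            self.entries.append((cls, fn))   # extend the stub
        else:
            self.megamorphic = True          # stub full: stop growing it
        return fn(receiver, *args)
```

The linear scan is cheap when a few classes dominate. With a uniform distribution over many classes, most calls walk the whole stub or miss, which is the weakness noted in the critique.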
35. Polymorphic Inline Caching Example One
36. Polymorphic Inline Caching Example Two
37. Polymorphic Inline Caching Example Three
38. Polymorphic Inline Cache Critique
- Pros
- Faster for multiple peaked distributions
- Cons
- Slow for uniform distribution
- Requires runtime code generation
- Doesn't scale quite as well for multimethods and predicate types
39. Other Multimethod Approaches
- Hash table indexed by N keys
- Kiczales and Rodriguez 1989
- Compressed N1 dimensional dispatch table
- Amiel et al. 1994
- Pang et al. 1999
40. Variations
- Inline method bodies into leaves of decision tree
- Reorder decision tree based on method distributions
- Fold slot access into dispatch
41. Open Problems
- Feed static information into dynamic dispatch
- Smaller
- Faster
- More adaptive
42. Readings
- Deutsch and Schiffman 1984
- Kiczales and Rodriguez 1989
- Dussud 1989
- Moon and Cypher 19??
- Amiel et al. 1994
- Pang et al. 1999
- Hölzle and Ungar 1994
- Chen and Turau 1994
- Peter Lee, Advanced Language Implementation, 1991
43. Acknowledgements
- This lecture includes some material from Craig Chambers' OOPSLA course on OO language implementation.
44. Assignment 3 Hint
- Create methods with the following construction:
    (dm make-method ((n <int>) (types <lst>) (body <lst>) => <met>)
      (select n
        ((0) (fun () ...))
        ((1) (fun ((a0 (elt types 0))) ...))
        ((2) (fun ((a0 (elt types 0)) (a1 (elt types 1))) ...))
        ...))
45. Assignment 4
- Write an associative dispatch cache
- Use linear lookup
- Include profile-guided reordering
- Don't need to handle singleton dispatch
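One possible shape for such a cache is sketched below in Python rather than Proto. It is not a reference solution. The names `DispatchCache`, `slow_lookup`, and `reorder_every` are hypothetical, and periodic sorting by hit count is only one reordering policy among several. Entries are keyed by the tuple of argument classes, so singleton (value) dispatch is not handled.

```python
class DispatchCache:
    """Associative cache from argument-class tuples to methods, searched linearly."""

    def __init__(self, slow_lookup, reorder_every=100):
        self.slow_lookup = slow_lookup       # class tuple -> method (assumed given)
        self.reorder_every = reorder_every   # hypothetical tuning knob
        self.entries = []                    # [classes, method, hit_count]
        self.lookups = 0

    def find(self, args):
        key = tuple(type(a) for a in args)
        self.lookups += 1
        if self.lookups % self.reorder_every == 0:
            # Profile-guided reordering: move the hottest entries to the front.
            self.entries.sort(key=lambda e: e[2], reverse=True)
        for entry in self.entries:           # linear lookup
            if entry[0] == key:
                entry[2] += 1
                return entry[1]
        method = self.slow_lookup(key)       # miss: compute and remember
        self.entries.append([key, method, 1])
        return method
```

Sorting only every `reorder_every` lookups keeps the profiling overhead off the common path while still moving a peaked distribution's dominant entry to the front of the linear search.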