OODL Runtime Optimizations - PowerPoint PPT Presentation


Learn more at: http://www.ai.mit.edu


1
OODL Runtime Optimizations

Jonathan Bachrach
MIT AI Lab
Feb 2001

2
Runtime Techniques

Assume we can only write system code turbochargers
No sophisticated compiler available
Can only minimally perturb user code

3
Q: What are the Biggest Inefficiencies?

Imagine trying to get Proto to run faster

4
Hint: Most Popular Operations
5
Running Example

(dg + ((x <num>) (y <num>) => <num>))
(dm + ((x <int>) (y <int>) => <int>)
  (ib (i+ (iu x) (iu y))))
(dm + ((x <flo>) (y <flo>) => <flo>)
  (fb (f+ (fu x) (fu y))))
(dm x2 ((x <num>) => <num>)
  (+ x x))
(dm x2 ((x <int>) => <int>)
  (+ x x))

6
A: What are the Biggest Inefficiencies?

Boxing
Method dispatch
Type checks
Slot access
Object creation
Today

7
Outline

Overview
Inline call caches
Table
Decision tree
Variations
Open Problems

8
Method Distributions

Distribution can be measured
At generic
At call site
Distribution can be
Monomorphic
Polymorphic
Megamorphic
Distribution can be
peaked
uniform

9
Expense of Dispatch

Problem: expensive if computed naively
Find applicable methods
Sort applicable methods
Call most applicable method
Three outcomes
One most applicable method => ok
No applicable methods => not understood error
Many applicable methods => ambiguous error
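
The naive procedure above can be sketched in Python. This is an illustrative sketch only: the classes Num, Int, Flo and the method list are hypothetical stand-ins for the running example, and the sketch selects the single most specific method directly instead of fully sorting the applicable ones.

```python
class NotUnderstood(Exception):
    """No method is applicable to the arguments."""

class Ambiguous(Exception):
    """Several applicable methods, none more specific than all the others."""

def naive_dispatch(methods, args):
    """methods: list of (specializers, function); returns the chosen function."""
    classes = [type(a) for a in args]
    # Step 1: find applicable methods.
    applicable = [
        (specs, fn) for specs, fn in methods
        if len(specs) == len(classes)
        and all(issubclass(c, s) for c, s in zip(classes, specs))
    ]
    if not applicable:
        raise NotUnderstood(classes)

    # Step 2: keep methods at least as specific as every applicable method.
    def at_least_as_specific(a, b):
        return all(issubclass(x, y) for x, y in zip(a, b))

    best = [fn for specs, fn in applicable
            if all(at_least_as_specific(specs, other) for other, _ in applicable)]
    # Step 3: exactly one winner, otherwise report ambiguity.
    if len(best) != 1:
        raise Ambiguous(classes)
    return best[0]

# Hypothetical hierarchy mirroring <num>, <int>, <flo>.
class Num: pass
class Int(Num): pass
class Flo(Num): pass

add_methods = [
    ((Num, Num), lambda x, y: "num+"),
    ((Int, Int), lambda x, y: "int+"),
    ((Flo, Flo), lambda x, y: "flo+"),
]
```

Every call repeats the applicability scan and the specificity comparison, which is the cost the later techniques try to avoid.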

10
Mapping View of Dispatch

Dispatch can be thought of as a mapping from
argument types to a method
(t1, t2, ..., tn) => m

11
Solutions

Caching
Fast mapping

12
Table-based Approach

N-dimensional tables
Keys are concrete classes of actual arguments
Values are methods to call
Must address size explosion
Talk a bit about this later
Nested tables
Keys are concrete classes of actual arguments
Values are either other tables or methods to call
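
A minimal sketch of the nested-table variant, assuming Python dictionaries as tables and a hypothetical slow_lookup function that performs the full dispatch on a miss. Tables are built on demand, and the sketch assumes at least one argument per call.

```python
def make_cached_dispatcher(slow_lookup):
    """Wrap slow_lookup (tuple of classes -> method) with nested tables."""
    table = {}  # class of arg 1 -> class of arg 2 -> ... -> method

    def dispatch(*args):
        classes = [type(a) for a in args]
        node = table
        # Walk (or create) one table level per argument except the last.
        for cls in classes[:-1]:
            node = node.setdefault(cls, {})
        method = node.get(classes[-1])
        if method is None:
            # Miss: do the expensive lookup once and remember the result.
            method = slow_lookup(tuple(classes))
            node[classes[-1]] = method
        return method(*args)

    dispatch.table = table  # exposed for inspection only
    return dispatch

# Hypothetical slow path that records how often it is consulted.
slow_calls = []

def slow_lookup(classes):
    slow_calls.append(classes)
    if classes == (int, int):
        return lambda x, y: ("int+", x + y)
    if classes == (float, float):
        return lambda x, y: ("flo+", x + y)
    raise TypeError("not understood: %r" % (classes,))

add = make_cached_dispatcher(slow_lookup)
```

Each argument costs one table probe, which is the indirection overhead the critique that follows refers to.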

13
Table Example One
14
Table Example Two
15
Table Example Three
16
Table-based Critique

Pros
Simple
Amenable to profile guided reordering
Cons
Too many indirections
Very big; mitigate by building it on demand and sharing subtables
Only works for class types; can use multiple tables

17
Engine Node Dispatch

Glenn Burke and myself at Harlequin, Inc. circa 1996-
Partial Dispatch: Optimizing Dynamically-Dispatched Multimethod Calls with Compile-Time Types and Runtime Feedback, 1998
Shared decision tree built out of executable
engine nodes
Incrementally grows trees on demand upon miss
Engine nodes are executed to perform some action
typically tail calling another engine node
eventually tail calling chosen method
Appropriate engine nodes can be utilized to
handle monomorphic, polymorphic, and megamorphic
discrimination cases corresponding to single,
linear, and table lookup

18
Engine Node Dispatch Picture

Define method \+ (x <i>, y <i>)
end
Define method \+ (x <f>, y <f>)
end
Seen (<i>, <i>) and (<f>, <f>) as inputs.
19
Engine Dispatch Critique

Pros
Portable
Introspectable
Code Shareable

Cons
Data and Code Indirections
Sharing overhead
Hard to inline
Fewer partial evaluation opportunities

20
Lookup DAG

Input is argument values
Output is method or error
Lookup DAG is a decision tree with identical
subtrees shared to save space
Each interior node has a set of outgoing
class-labeled edges and is labeled with an
expression
Each leaf node is labeled with a method which is
either user specified, not-understood, or
ambiguous.

21
Lookup DAG Picture

From Chambers and Chen OOPSLA-99

22
Lookup DAG Evaluation

Formals start bound to actuals
Evaluation starts from root
To evaluate an interior node
evaluate its expression yielding v and
then search its edges for unique edge e whose
label is the class of the result v and then
edge's target node is evaluated recursively
To evaluate a leaf node
return its method
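
The evaluation rule above can be sketched as follows. This is an illustration under assumptions: the Leaf and Interior classes, the dictionary of bound formals, and the string method names are all hypothetical, and the recursion is written as a loop.

```python
class Leaf:
    def __init__(self, method):
        self.method = method  # user method, or a not-understood/ambiguous marker

class Interior:
    def __init__(self, expr, edges):
        self.expr = expr    # function from the bound formals to a value
        self.edges = edges  # class -> target node (one edge per class label)

def evaluate(node, formals):
    """Walk from node to a leaf and return that leaf's method."""
    while isinstance(node, Interior):
        v = node.expr(formals)      # evaluate the node's expression
        node = node.edges[type(v)]  # follow the edge labeled with v's class
    return node.method

# Lookup DAG for a two-argument generic with (int, int) and (float, float)
# methods; the not-understood leaf is shared by both mixed-class paths.
leaf_int = Leaf("int+")
leaf_flo = Leaf("flo+")
leaf_nu = Leaf("not-understood")
y_after_int = Interior(lambda f: f["y"], {int: leaf_int, float: leaf_nu})
y_after_flo = Interior(lambda f: f["y"], {int: leaf_nu, float: leaf_flo})
root = Interior(lambda f: f["x"], {int: y_after_int, float: y_after_flo})
```

For instance, evaluate(root, {"x": 1, "y": 2}) tests x, then y, and reaches the leaf holding "int+".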

23
Lookup DAG Evaluation Picture

From Chambers and Chen OOPSLA-99

24
Lookup DAG Construction

function BuildLookupDag(DF: canonical dispatch function): lookup DAG
  create empty lookup DAG G
  create empty table Memo
  cs: set of Case := Cases(DF)
  G.root := buildSubDag(cs, Exprs(cs))
  return G

function buildSubDag(cs: set of Case, es: set of Expr): node
  n: node
  if (cs, es)->n in Memo then return n
  if empty?(es) then
    n := create leaf node in G
    n.method := computeTarget(cs)
  else
    n := create interior node in G
    expr: Expr := pickExpr(es, cs)
    n.expr := expr
    for each class in StaticClasses(expr) do
      cs': set of Case := targetCases(cs, expr, class)
      es': set of Expr := (es - {expr}) ∩ Exprs(cs')
      n': node := buildSubDag(cs', es')
      e: edge := create edge from n to n' in G
      e.class := class
    end for
  add (cs, es)->n to Memo
  return n

function computeTarget(cs: set of Case): Method
  methods: set of Method := min<=(Methods(case) for each case in cs)
  if |methods| = 0 then return m-not-understood
  if |methods| > 1 then return m-ambiguous
  return single element m of methods
25
Single Dispatch Binary Search Tree

Label classes with integers using inorder walk
with goal to get subclasses to form a contiguous
range
Implement Class => Target Map as binary search
tree balanced using execution frequency information
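
A rough sketch of the idea, with several assumptions stated up front: it handles a single-inheritance hierarchy only, numbers classes with a depth-first preorder walk (which gives every class and its subclasses a contiguous id range), and uses a plain binary search over range boundaries instead of a frequency-balanced tree. The class names and targets are hypothetical.

```python
import bisect

def number_classes(root, children):
    """Return {class: (lo, hi)}: the class id lo and the last id in its subtree."""
    ranges = {}
    counter = 0

    def walk(cls):
        nonlocal counter
        lo = counter
        counter += 1
        for sub in children.get(cls, []):
            walk(sub)
        ranges[cls] = (lo, counter - 1)

    walk(root)
    return ranges

def build_search_table(ranges, targets, default="not-understood"):
    """Compress the class id -> target map into sorted interval starts."""
    painted = [default] * len(ranges)
    # Superclasses have smaller ids, so painting in id order lets
    # methods on subclasses override inherited ones.
    for cls in sorted(targets, key=lambda c: ranges[c][0]):
        lo, hi = ranges[cls]
        for i in range(lo, hi + 1):
            painted[i] = targets[cls]
    starts, results = [], []
    for i, target in enumerate(painted):
        if not results or results[-1] != target:
            starts.append(i)
            results.append(target)
    return starts, results

def lookup(starts, results, class_id):
    """Binary search for the interval containing class_id."""
    return results[bisect.bisect_right(starts, class_id) - 1]

children = {"Num": ["Int", "Flo"], "Int": ["Fixnum", "Bignum"]}
ranges = number_classes("Num", children)
starts, results = build_search_table(ranges, {"Num": "x2-num", "Int": "x2-int"})
```

With this hierarchy the five classes collapse into three intervals, so a lookup needs only a couple of integer comparisons.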

26
Class Numbering
27
Binary Search Tree Picture

From Chambers and Chen OOPSLA-99

28
Critique of Decision Tree

Pros
Efficient to construct and execute
Can incorporate profile information to bias
execution
Amenable to on demand construction
Amenable to partial evaluation and method
inlining
Can easily incorporate static class information
Amenable to inlining into call-sites
Permits arbitrary predicates
Mixes linear, binary, and array lookups
Fast on modern CPUs
Cons
Requires code gen / compiler to produce best ones

29
Inline Call Caches

Assumption
method distribution is usually peaked and
call-site specific
Each call-site has its own cache
Use call instruction as cache
Calls last taken method
Method prologue checks for correct arguments
Calls slow lookup on miss which also patches call
instruction
Deutsch and Schiffman, 1984
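
Python cannot patch a call instruction, so the following is only a simulation of the scheme: a mutable field on a hypothetical CallSite object stands in for the patched call target, and the class comparison stands in for the method prologue check. The lookup table and method bodies are illustrative assumptions.

```python
class CallSite:
    """One call site with a single-entry (monomorphic) cache."""

    def __init__(self, slow_lookup):
        self.slow_lookup = slow_lookup  # class -> method, the expensive path
        self.cached_class = None
        self.cached_method = None       # plays the role of the call target
        self.misses = 0

    def call(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:
            # Miss: run the slow lookup and "patch" the call site.
            self.misses += 1
            self.cached_method = self.slow_lookup(cls)
            self.cached_class = cls
        return self.cached_method(receiver, *args)

# Hypothetical methods for a one-argument generic like x2.
methods = {
    int: lambda x: ("int", x + x),
    float: lambda x: ("flo", x + x),
}
site = CallSite(methods.__getitem__)
```

A run of same-class receivers pays for one miss, while alternating classes miss on every call, which is the weakness the polymorphic variant addresses.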

30
Inline Caching Example One
31
Inline Caching Two
32
Inline Caching Three
33
Inline Caching Critique

Pros
Fast dispatch sequence for hit
Usually high hit rate (90-95% for Smalltalk)
Cons
Uses self-modifying code
Slow for misses
Depends on method distribution spike
Might be less beneficial for multimethods

34
Polymorphic Inline Caching

Handles polymorphically peaked distribution
Generate call-site specific dispatch stub
Hölzle et al., 1991
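
A simulated sketch of a call-site specific dispatch stub, modeled as a short list of (class, method) tests that grows on misses. The PolymorphicCallSite class, the max_entries limit, and the methods table are assumptions made for illustration; a real implementation would generate machine code for the stub.

```python
class PolymorphicCallSite:
    """One call site whose stub tests a few receiver classes in order."""

    def __init__(self, slow_lookup, max_entries=4):
        self.slow_lookup = slow_lookup  # class -> method, the expensive path
        self.max_entries = max_entries  # assumed cap before giving up on growth
        self.entries = []               # the stub: (class, method) pairs
        self.misses = 0

    def call(self, receiver, *args):
        cls = type(receiver)
        for cached_class, method in self.entries:
            if cached_class is cls:     # linear sequence of class tests
                return method(receiver, *args)
        # Miss: look up the method and extend the stub if there is room.
        self.misses += 1
        method = self.slow_lookup(cls)
        if len(self.entries) < self.max_entries:
            self.entries.append((cls, method))
        return method(receiver, *args)

# Hypothetical methods for a one-argument generic like x2.
pic_methods = {
    int: lambda x: ("int", x + x),
    float: lambda x: ("flo", x + x),
}
pic_site = PolymorphicCallSite(pic_methods.__getitem__)
```

Alternating int and float receivers now cost two misses in total instead of one miss per call.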

35
Polymorphic Inline Caching Example One
36
Polymorphic Inline Caching Example Two
37
Polymorphic Inline Caching Example Three
38
Polymorphic Inline Cache Critique