Intraprocedural Optimizations

1
Intraprocedural Optimizations
  • Jonathan Bachrach
  • MIT AI Lab

2
Outline
  • Goal: eliminate abstraction overhead using static
    analysis and program transformation
  • Topics
  • Intraprocedural type inference
  • Static method selection
  • Specialization and Inlining
  • Static class prediction
  • Splitting
  • Box/unboxing
  • Common Subexpression Elimination
  • Overflow and range checks
  • Partial evaluation revisited
  • Partially based on Chambers, "Efficient
    Implementation of Object-Oriented Programming
    Languages," OOPSLA Tutorial

3
Running Example
  • (dg + ((x <num>) (y <num>) => <num>))
  • (dm + ((x <int>) (y <int>) => <int>)
  • (%ib (%i+ (%iu x) (%iu y))))
  • (dm + ((x <flo>) (y <flo>) => <flo>)
  • (%fb (%f+ (%fu x) (%fu y))))
  • (dm x2 ((x <num>) => <num>)
  • (+ x x))
  • (dm x2 ((x <int>) => <int>)
  • (+ x x))
  • Anatomy of Pure Proto Arithmetic
  • Dispatch
  • Boxing
  • Overflow checks
  • Actual instruction
  • C Arithmetic
  • Actual instruction

4
Biggest Inefficiencies
  • Method dispatch
  • Method calls
  • Boxing
  • Type checks
  • Overflow and range checks
  • Slot access
  • Object creation

5
Intraprocedural Type Inference
  • Goal: determine the concrete class(es) of each
    variable and expression
  • Standard data flow analysis through the control flow graph
  • Propagate bindings b -> class
  • Sources are literals, isa expressions, results of
    some primitives, and type declarations
  • Form unions of bindings at merge points
  • Narrow sets after typecases
  • Assumes closed world (or at least final classes)
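
The rules above (union at merge points, narrowing after a type test) can be sketched in a few lines of Python. This is only an illustration: class sets are modeled as frozensets of class names, the hierarchy is assumed flat, and the helper names `merge` and `narrow` are hypothetical rather than taken from the slides.

```python
# Sketch: intraprocedural type inference over sets of concrete classes.
# An environment maps a variable name to the set of classes it may hold.

def merge(env_a, env_b):
    """Union the bindings of two environments at a control-flow merge point."""
    merged = {}
    for var in env_a.keys() | env_b.keys():
        merged[var] = env_a.get(var, frozenset()) | env_b.get(var, frozenset())
    return merged

def narrow(env, var, cls):
    """Split an environment on (isa? var cls): returns (then_env, else_env)."""
    then_env = dict(env)
    else_env = dict(env)
    then_env[var] = env[var] & {cls}   # test succeeded: only cls remains
    else_env[var] = env[var] - {cls}   # test failed: cls is ruled out
    return then_env, else_env

env = {"x": frozenset({"<tab>"}), "y": frozenset({"<int>", "<flo>"})}
# z = (if t x y): z may hold whatever x or y may hold.
env["z"] = env["x"] | env["y"]
then_env, else_env = narrow(env, "z", "<int>")
```

Merging `then_env` and `else_env` again recovers the original set for `z`, which is why precision gained inside a branch is lost after the branch unless the code is split.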

6
Type Inference Example
  • (set x (isa <tab>)) ;; x in {<tab>}
  • (set y (table-growth-factor x)) ;; y in {<int>, <flo>}
  • (set z (if t x y)) ;; z in {<tab>, <int>, <flo>}

7
Narrowing Type Precision
  • (if (isa? x <int>)
  • (+ x 1)
  • (+ x 37.0))
  • (if (isa? x <int>)
  • (let ((x <int> x))
  • (+ x 1))
  • (let ((x !<int> x))
  • (+ x 37.0)))

8
Static Method Selection
  • (set x (isa <tab>)) ;; x in {<tab>}
  • (set y (table-growth-factor x)) ;; y in {<int>, <flo>}
  • (print out y)
  • If only one class is statically possible, then the
    compiler can perform dispatch statically
  • (set y (<tab>:table-growth-factor x))
  • If a couple of classes are statically possible, then
    the compiler can insert a typecase
  • (sel (class-of y)
  • ((<int>) (<int>:print y))
  • ((<flo>) (<flo>:print y)))
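
A rough Python sketch of that decision follows, assuming forms are nested tuples. The `METHODS` table, the `class:method` naming, and the `max_cases` threshold are illustrative assumptions, not part of the slides.

```python
# Sketch: pick a dispatch strategy from the statically inferred class set.
# METHODS maps (generic name, class) -> name of the method to call directly.
METHODS = {
    ("print", "<int>"): "<int>:print",
    ("print", "<flo>"): "<flo>:print",
    ("table-growth-factor", "<tab>"): "<tab>:table-growth-factor",
}

def select(generic, arg, classes, max_cases=2):
    """Return a form (nested tuples) for calling `generic` on `arg`."""
    classes = sorted(classes)
    if classes and all((generic, c) in METHODS for c in classes):
        if len(classes) == 1:
            # Only one class possible: bind the call statically.
            return (METHODS[(generic, classes[0])], arg)
        if len(classes) <= max_cases:
            # A few classes possible: emit a typecase over direct calls.
            cases = tuple(((c,), (METHODS[(generic, c)], arg)) for c in classes)
            return ("sel", ("class-of", arg)) + cases
    # Otherwise fall back to ordinary dynamic dispatch.
    return (generic, arg)
```

For example, `select("print", "y", {"<int>", "<flo>"})` yields a `sel` form with one direct call per class, while an unknown class leaves the generic call untouched.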

9
Type Check Removal
  • Type inference can clearly be used to remove type
    checks and casts
  • (set x (isa <tab>)) ;; x in {<tab>}
  • (if (isa? x <tab>)
  • (go)
  • (stop))
  • =>
  • (set x (isa <tab>)) ;; x in {<tab>}
  • (go)
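
The rewrite above can be sketched as a small Python function over tuple-encoded forms. It assumes a flat class hierarchy (so `isa?` is decided by exact class membership) and that `types` holds the inferred class set of each variable; the name `simplify_if` is hypothetical.

```python
# Sketch: remove a type check whose outcome the inferred class set decides.

def simplify_if(expr, types):
    """Resolve (if (isa? var cls) then else) when the outcome is static."""
    if not (isinstance(expr, tuple) and len(expr) == 4 and expr[0] == "if"):
        return expr
    _, test, then_branch, else_branch = expr
    if not (isinstance(test, tuple) and len(test) == 3 and test[0] == "isa?"):
        return expr
    classes = types.get(test[1])
    if classes is None:
        return expr                      # nothing known about the variable
    if classes == {test[2]}:
        return then_branch               # test always succeeds
    if test[2] not in classes:
        return else_branch               # test always fails
    return expr                          # outcome depends on the run

check = ("if", ("isa?", "x", "<tab>"), ("go",), ("stop",))
```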

10
Intraprocedural Type Inference Critique
  • Pros
  • Simple
  • Fast
  • Fewer dependents
  • Cons
  • Limited type precision
  • No result types
  • Incoming arg types
  • No slot types
  • Etc.

11
Specialization
  • Q: How can we improve intraprocedural type
    inference precision?
  • A: Specialization, which is the cloning of methods
    with narrowed argument types
  • Improves type precision of the callee by
    contextualizing its body
  • (dm sqr ((x <num>) (y <num>)) (* x y))
  • =>
  • (dm sqr ((x <int>) (y <int>)) (* x y))
  • (dm sqr ((x <flo>) (y <flo>)) (* x y))
  • Must make sure super calls still mean the same thing
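
As a hedged illustration of why cloning helps, the Python sketch below clones a tuple-encoded method once per concrete class and then rebinds calls in the cloned body. The operator in `sqr` is assumed to be `*`, the closed world under `<num>` is assumed to be `<flo>` and `<int>`, and mixed-class clones are omitted, as on the slide.

```python
# Sketch: specialization clones a method per concrete argument class, then
# rebinds calls in the cloned body that the narrower types make unambiguous.
CONCRETE = ("<flo>", "<int>")          # assumed closed world under <num>

def bind(expr, types):
    """Prefix a call with its class when all arguments share one concrete class."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = tuple(bind(a, types) for a in args)
    classes = {types.get(a) if isinstance(a, str) else None for a in args}
    if len(classes) == 1 and next(iter(classes)) in CONCRETE:
        op = f"{next(iter(classes))}:{op}"
    return (op,) + args

def specialize(method):
    """Return one clone of `method` per concrete class replacing <num>."""
    name, params, body = method
    clones = []
    for cls in CONCRETE:
        narrowed = tuple((p, cls if c == "<num>" else c) for p, c in params)
        clones.append((name, narrowed, bind(body, dict(narrowed))))
    return clones

sqr = ("sqr", (("x", "<num>"), ("y", "<num>")), ("*", "x", "y"))
sqr_flo, sqr_int = specialize(sqr)
```

In the original method nothing can be bound, because `<num>` is not a concrete class; in each clone the same body resolves to a single method.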

12
Specialization of Constructors
  • Crucial to get object creation to be fast
  • Specialization can be used to build custom
    constructors
  • (def <thingy> (isa <any>))
  • (slot <thingy> thingy-x 0)
  • (slot (t <thingy>) thingy-tracker (+ (thingy-x t) 1))
  • (slot <thingy> thingy-cache (fab <tab>))
  • (df thingy-isa (x tracker cache)
  • (let ((thingy (clone <thingy>)))
  • (unless (= x nul) (set (slot-value thingy thingy-x) x))
  • (set (slot-value thingy thingy-tracker)
  • (if (= tracker nul) (+ (thingy-x thingy) 1) tracker))
  • (set (slot-value thingy thingy-cache)
  • (if (= cache nul) (fab <tab>) cache))))

13
Inlining
  • Q: Can we do better?
  • A: Inlining can improve on specialization by
    inserting the specialized body at the call site
  • Improves type precision at the call site by
    contextualizing the body (includes result types)
  • (dm f ((x <int>) (y <int>)) (+ (g x y) 1))
  • (dm g (x y) (+ x y))
  • =>
  • (dm f ((x <int>) (y <int>)) (+ (+ x y) 1))
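
A minimal Python sketch of the inlining step on tuple-encoded forms is shown below. It assumes the callee is non-recursive and that arguments are variables or side-effect-free expressions, so substituting them into the body is safe; `substitute` and `inline` are hypothetical helper names.

```python
# Sketch: inline calls by substituting arguments into the callee's body.

def substitute(expr, env):
    """Replace parameter names in `expr` according to `env`."""
    if isinstance(expr, str):
        return env.get(expr, expr)
    if isinstance(expr, tuple):
        return tuple(substitute(e, env) for e in expr)
    return expr                                   # literals

def inline(expr, defs):
    """Replace calls to functions listed in `defs` by their bodies."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [inline(a, defs) for a in args]        # inline inside arguments first
    if op in defs:
        params, body = defs[op]
        return inline(substitute(body, dict(zip(params, args))), defs)
    return (op, *args)

defs = {"g": (("x", "y"), ("+", "x", "y"))}       # (dm g (x y) (+ x y))
f_body = ("+", ("g", "x", "y"), 1)                # body of f before inlining
```

Once the body of `g` sits inside `f`, the `<int>` declarations on `f`'s parameters apply to it directly, which is the precision gain the slide describes.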

14
Synergy: Method Selection + Inlining
  • (df f ((x <int>) (y <int>))
  • (+ x y))
  • method selection
  • (df f ((x <int>) (y <int>))
  • (<int>:+ x y))
  • inlining
  • (df f ((x <int>) (y <int>))
  • (%ib (%i+ (%iu x) (%iu y))))

15
Pitfalls of Inlining and Specialization
  • Must control inlining and specialization
    carefully to avoid code bloat
  • Inlining can be driven purely by syntactic size,
    trying never to grow the code beyond the original call
  • Class-centric specialization usually works by
    copying down inherited methods and tightening up self
    references (harder for multimethods)
  • Can run inlining/specialization trials based on:
  • Final static size
  • Performance feedback

16
Class Centric Specialization
  • (def <point> (isa <any>))
  • (slot <point> (point-x <int>) 0)
  • (dm point-move ((p <point>) (offset <num>))
  • (set (point-x p) (+ (point-x p) offset)))
  • (def <color-point> (isa <point>))
  • =>
  • (dm point-move ((p <color-point>) (offset <num>))
  • (set (point-x p) (+ (point-x p) offset)))

17
Static Class Prediction
  • Can improve type precision in cases where, for a
    given generic, one particular method is called much
    more frequently than the others
  • Insert type check testing prediction
  • Can narrow type precision along then and else
    branches
  • Especially useful in combination with inlining

18
Static Class Prediction Example
  • (df f (x)
  • (let ((y (+ x 1)))
  • (+ y 2)))
  • (df f (x)
  • (let ((y (if (isa? x <int>)
  • (+ x 1)
  • (+ x 1))))
  • (if (isa? y <int>)
  • (+ y 2)
  • (+ y 2))))
  • (df f (x)
  • (let ((y (if (isa? x <int>)
  • (<int>:+ x 1)
  • (+ x 1))))
  • (if (isa? y <int>)
  • (<int>:+ y 2)
  • (+ y 2))))
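
The transformation in this example can be approximated in Python as follows. The sketch assumes tuple-encoded forms, a hypothetical `PREDICTED` table saying that `+` is most often called on `<int>`, and a guard on the first argument only.

```python
# Sketch: guard each predicted call with a class test and a bound fast path.
PREDICTED = {"+": "<int>"}      # assumption: <int> is the common case for +

def predict(expr):
    """Wrap predicted calls in (if (isa? arg cls) fast-call slow-call)."""
    if not isinstance(expr, tuple):
        return expr
    expr = tuple(predict(e) for e in expr)        # rewrite subforms first
    op = expr[0] if expr else None
    cls = PREDICTED.get(op) if isinstance(op, str) else None
    if cls is None or len(expr) < 2 or not isinstance(expr[1], str):
        return expr
    fast = (f"{cls}:{op}",) + expr[1:]            # statically bound call
    return ("if", ("isa?", expr[1], cls), fast, expr)

f_body = ("let", (("y", ("+", "x", 1)),), ("+", "y", 2))
predicted = predict(f_body)
```

Each call site gets its own test, which is the redundancy that splitting later removes.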

19
Synergy: Class Prediction + Method Selection + Inlining
  • (df f (x)
  • (let ((y (if (isa? x <int>)
  • (+ x 1)
  • (+ x 1))))
  • (if (isa? y <int>)
  • (+ y 2)
  • (+ y 2))))
  • method selection
  • (df f (x)
  • (let ((y (if (isa? x <int>)
  • (<int>:+ x 1)
  • (+ x 1))))
  • (if (isa? y <int>)
  • (<int>:+ y 2)
  • (+ y 2))))
  • inlining
  • (df f (x)
  • (let ((y (if (isa? x <int>)
  • (%ib (%i+ (%iu x) 1))
  • (+ x 1))))
  • (if (isa? y <int>)
  • (%ib (%i+ (%iu y) (%iu 2)))
  • (+ y 2))))

20
Splitting
  • Problem: class prediction often leads to a bunch
    of redundant type tests
  • Solution: split off whole sections of the control flow graph,
    specialized to a particular class of a variable
  • Can split off entire loops
  • Can specialize on other dataflow information

21
Splitting Example
  • (df f (x)
  • (let ((y (+ x 1)))
  • (+ y 2)))
  • (df f (x)
  • (if (isa? x <int>)
  • (let ((y (+ x 1)))
  • (+ y 2))
  • (let ((y (+ x 1)))
  • (+ y 2))))
  • (df f (x)
  • (if (isa? x <int>)
  • (let ((y (<int>:+ x 1)))
  • (<int>:+ y 2))
  • (let ((y (+ x 1)))
  • (+ y 2))))
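
The splitting step in this example can be sketched in Python as one test at the top, with the whole body duplicated and the copy on the `<int>` side rebound using the known class. The sketch handles only `+` and single-binding `let`, and assumes that `<int>:+` returns an `<int>`; `split` and `bind` are hypothetical names.

```python
# Sketch: split a body on one class test and specialize the <int> copy.

def bind(expr, types):
    """Bind + statically where every operand is a literal int or a known <int>."""
    if not isinstance(expr, tuple):
        return expr
    if expr[0] == "let":
        _, ((var, init),), body = expr
        init = bind(init, types)
        inner = dict(types)
        if isinstance(init, tuple) and init[0] == "<int>:+":
            inner[var] = "<int>"          # assumed result class of <int>:+
        else:
            inner.pop(var, None)          # shadowed with an unknown class
        return ("let", ((var, init),), bind(body, inner))
    op, *args = expr
    args = tuple(bind(a, types) for a in args)
    if op == "+" and all(isinstance(a, int) or types.get(a) == "<int>" for a in args):
        op = "<int>:+"
    return (op,) + args

def split(param, cls, body):
    """Duplicate `body` under a single class test on `param`."""
    return ("if", ("isa?", param, cls), bind(body, {param: cls}), body)

f_body = ("let", (("y", ("+", "x", 1)),), ("+", "y", 2))
split_body = split("x", "<int>", f_body)
```

Because the class of `y` follows from the class of `x` inside the specialized copy, the second test from class prediction is no longer needed.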

22
Splitting Downside
  • Splitting can also lead to code bloat
  • Must be intelligent about what to split
  • A priori knowledge (e.g., integers most frequent)
  • Actual performance

23
Box / Unboxing
  • (df + ((x <int>) (y <int>) => <int>)
  • (%ib (%i+ (%iu x) (%iu y))))
  • (df f ((a <int>) (b <int>) => <int>)
  • (+ (+ a b) a))
  • inlining
  • (df f ((a <int>) (b <int>) => <int>)
  • (%ib (%i+ (%iu (%ib (%i+ (%iu a) (%iu b))))
    (%iu a))))
  • remove box/unbox pair
  • (df f ((a <int>) (b <int>) => <int>)
  • (%ib (%i+ (%i+ (%iu a) (%iu b)) (%iu a))))
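
Removing a box/unbox pair is a local rewrite, which the Python sketch below applies bottom-up to tuple-encoded forms. The primitive names `%ib` (box), `%iu` (unbox), and `%i+` (raw add) are assumed spellings of the primitives on this slide.

```python
# Sketch: peephole removal of an unbox applied directly to a box.

def remove_box_unbox(expr):
    """Rewrite (%iu (%ib e)) to e everywhere in `expr`."""
    if not isinstance(expr, tuple):
        return expr
    expr = tuple(remove_box_unbox(e) for e in expr)   # children first
    if (len(expr) == 2 and expr[0] == "%iu"
            and isinstance(expr[1], tuple)
            and len(expr[1]) == 2 and expr[1][0] == "%ib"):
        return expr[1][1]                             # unbox(box(e)) == e
    return expr

inlined = ("%ib", ("%i+",
                   ("%iu", ("%ib", ("%i+", ("%iu", "a"), ("%iu", "b")))),
                   ("%iu", "a")))
cleaned = remove_box_unbox(inlined)
```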

24
Synergy: Splitting + Method Selection + Inlining + Box/Unboxing
  • (df f (x)
  • (if (isa? x <int>)
  • (let ((y (+ x 1)))
  • (+ y 2))
  • (let ((y (+ x 1)))
  • (+ y 2))))
  • method selection
  • (df f (x)
  • (if (isa? x <int>)
  • (let ((y (<int>:+ x 1)))
  • (<int>:+ y 2))
  • (let ((y (+ x 1)))
  • (+ y 2))))
  • (df f (x)
  • (if (isa? x <int>)
  • (<int>:+ (<int>:+ x 1) 2)
  • (let ((y (+ x 1)))
  • (+ y 2))))
  • inlining
  • (df f (x)
  • (if (isa? x <int>)
  • (%ib (%i+ (%iu (%ib (%i+ (%iu x) 1))) 2))
  • (let ((y (+ x 1)))
  • (+ y 2))))
  • box/unbox
  • (df f (x)
  • (if (isa? x <int>)
  • (%ib (%i+ (%i+ (%iu x) 1) 2))
  • (let ((y (+ x 1)))
  • (+ y 2))))

25
Common Subexpression Elimination (CSE)
  • Removes redundant computations
  • Constant slot or binding access
  • Stateless/side-effect-free function calls
  • Examples
  • (or (elt (cache x) a) (elt (cache x) b))
  • => (let ((t (cache x))) (or (elt t a) (elt t b)))
  • (if (< i 0) (if (< i 0) (go) (putz)) (dance))
  • => (if (< i 0) (go) (dance))
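
The first example can be sketched in Python as follows. The sketch hoists a single repeated call to a function assumed to be side-effect free (the `PURE` set is an illustrative assumption) and does not handle nested candidates or the redundant-test case.

```python
# Sketch: hoist one repeated side-effect-free call into a let binding.
from collections import Counter

PURE = {"cache"}        # assumption: calls known to be stateless

def subexprs(expr):
    """Yield every call form nested in `expr`, including `expr` itself."""
    if isinstance(expr, tuple):
        yield expr
        for e in expr[1:]:
            yield from subexprs(e)

def replace(expr, target, name):
    """Replace every occurrence of `target` in `expr` by the variable `name`."""
    if expr == target:
        return name
    if isinstance(expr, tuple):
        return tuple(replace(e, target, name) for e in expr)
    return expr

def cse(expr, temp="t"):
    """Bind the first repeated pure call to `temp` and reuse it."""
    counts = Counter(e for e in subexprs(expr) if e[0] in PURE)
    repeated = [e for e, n in counts.items() if n > 1]
    if not repeated:
        return expr
    target = repeated[0]
    return ("let", ((temp, target),), replace(expr, target, temp))

lookup = ("or", ("elt", ("cache", "x"), "a"), ("elt", ("cache", "x"), "b"))
```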

26
Overflow and Bounds Checks, aka Moon Challenge
  • Goal
  • Support mathematical integers and bounds checked
    collection access
  • Eliminate bounds and overflow checks
  • Strategy
  • Assume most integer arithmetic and collection
    accesses occur in restricted loop context where
    range can be readily inferred
  • Perform range analysis to remove checks
  • Bound variables from above by the size of the collection
  • Bound variables from below by zero
  • Induction step is 1

27
Range Check Example
  • (rep (((sum <int>) 0) ((i <int>) 0))
  • (if (< i (len v))
  • (let ((e (elt v i)))
  • (rep (+ sum e) (+ i 1)))
  • sum))
  • inlining bounds checks
  • (rep (((sum <int>) 0) ((i <int>) 0))
  • (if (< i (len v))
  • (let ((e (if (or (< i 0)
  • (>= i (len v)))
  • (sig ...)
  • (vref v i))))
  • (rep (+ sum e) (+ i 1)))
  • sum))
  • CSE
  • (rep (((sum <int>) 0) ((i <int>) 0))
  • (if (< i (len v))
  • (let ((e (if (< i 0)
  • (sig ...)
  • (vref v i))))
  • (rep (+ sum e) (+ i 1)))
  • sum))
  • range analysis
  • (rep (((sum <int>) 0) ((i <int>) 0))
  • (if (< i (len v))
  • (let ((e (vref v i)))
  • (rep (+ sum e) (+ i 1)))
  • sum))
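
The reasoning behind the last step can be sketched in Python with a small symbolic range. The `Range` class and both helpers are hypothetical, and the sketch assumes the collection length does not change inside the loop and that the index is only ever increased by a positive constant.

```python
# Sketch: decide that a bounds check is dead from an induction variable's range.
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    lo: int          # constant inclusive lower bound
    hi_sym: str      # symbolic upper bound, e.g. "(len v)"
    hi_off: int      # inclusive upper bound is hi_sym + hi_off

def induction_range(start, step, guard_sym):
    """Range of i inside (if (< i guard_sym) ...) when i starts at `start`
    and is only updated by adding the positive constant `step`."""
    if step <= 0:
        raise ValueError("only increasing induction variables are handled")
    return Range(lo=start, hi_sym=guard_sym, hi_off=-1)

def check_is_dead(rng, len_sym):
    """True when (or (< i 0) (>= i len_sym)) can never hold for i in rng."""
    below_ok = rng.lo >= 0                                  # i < 0 is impossible
    above_ok = rng.hi_sym == len_sym and rng.hi_off < 0     # i >= len is impossible
    return below_ok and above_ok

i_range = induction_range(0, 1, "(len v)")
```

Here `check_is_dead(i_range, "(len v)")` holds, so both halves of the inlined check can be dropped; a loop starting at -1, or one guarded by a different collection's length, keeps its check.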

28
Overflow Check Removal, aka Moon Challenge: Critique
  • Pros
  • simple analysis
  • Cons
  • could miss a number of cases
  • but then previous approaches (e.g., box/unbox)
    could be applied

29
Advanced Topic: Representation Selection
  • Embed objects in others to remove indirections
  • Change object representation over time
  • Use minimum number of bits to represent enums
  • Pack fields in objects

30
Advanced Topic: Algorithm Selection
  • Goal: compiler determines that one algorithm is
    more appropriate for given data
  • Sorted data
  • Biased data
  • Solution
  • Embed statistics gathering in runtime
  • Add guards to code and split

31
Rule-based Compilation
  • First millennium compilers were based on special
    rules for
  • Method selection
  • Pattern matching
  • Oft-used system functions like format
  • Problems
  • Error prone
  • Don't generalize to user code
  • Challenge
  • Minimize number of rules
  • Competitive compiler speed
  • Produce competitive code

32
Partial Evaluation to the Rescue
  • Holy grail idea
  • Optimizations are manifest in code
  • Do the previous optimizations with only partial evaluation
  • Simplify compiler based on limited moves
  • Static eval and folding
  • Inlining
  • Eliminate
  • Custom method selection
  • Custom constructor optimization
  • Etc.

33
Partial Eval Example
  • (dm format (port msg (args ...))
  • (rep nxt ((i 0) (ai 0))
  • (when (< i (len msg))
  • (let ((c (elt msg i)))
  • (if (= c \%)
  • (seq (print port (elt args ai))
  • (nxt (+ i 1) (+ ai 1)))
  • (seq (write port c)
  • (nxt (+ i 1) ai)))))))
  • (format out "%> " n)
  • First millennium solution is to have a custom
    optimizer for format
  • (seq (print port n) (write port "> "))
  • Second millennium solution with partial
    evaluation
  • (nxt 0 0)
  • (seq (print port n)
  • (nxt 1 1))
  • (seq (print port n)
  • (seq (write port \>)
  • (nxt 2 1)))
  • (seq (print port n)
  • (seq (write port \>)
  • (seq (write port \space))))
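
The unfolding above can be mimicked in Python by running the format loop over a message that is known at compile time and emitting residual operations instead of performing I/O. This is a sketch under assumptions: `%` is taken to be the only directive, and `format_residual` is a hypothetical name.

```python
# Sketch: partially evaluate format for a static message string.

def format_residual(msg, arg_names):
    """Unfold the format loop for a known `msg`; port and args stay dynamic."""
    code = []
    ai = 0                                  # index of the next argument
    for c in msg:
        if c == "%":
            # Directive: the residual code prints the next argument.
            code.append(("print", "port", arg_names[ai]))
            ai += 1
        else:
            # Ordinary character: the residual code writes it verbatim.
            code.append(("write", "port", c))
    return code

residual = format_residual("%> ", ["n"])
```

The loop, the index arithmetic, and the character tests all disappear; only the three output operations remain, matching the final form on the slide.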

34
Partial Eval Challenge
  • Inlining and static eval are slow
  • Running code through inlining
  • Need to compile oft-used optimizations
  • Residual code is not necessarily efficient
  • Sometimes algorithmic change is necessary for
    optimal efficiency
  • Example: method selection uses class numbering
    and a decision tree, whereas straightforward code
    does naïve method sorting
  • Perhaps there is a middle ground

35
Open Problems
  • Automatic inlining, splitting, and specialization
  • Efficient mathematical integers
  • Constant determination
  • Representation selection
  • Algorithmic selection
  • Efficient partial evaluation
  • Super compiler that runs for days

36
Reading List
  • Chambers, "Efficient Implementation of
    Object-Oriented Programming Languages," OOPSLA
    Tutorial
  • Chambers and Ungar, SELF papers
  • Chambers et al., Vortex papers