Transmeta and Dynamic Code Optimization - PowerPoint PPT Presentation

Provided by: ashwinrb
Learn more at: https://cs.login.cmu.edu

1
Transmeta and Dynamic Code Optimization
  • Ashwin Bharambe
  • Mahim Mishra
  • Matthew Rosencrantz

2
Stuff Compilers Don't (Can't?) Do
  • Instruction reordering
  • Common case detection and optimization
  • Branch prediction
  • Traces ( pre-fetching )
  • Optimizing traces
  • Why can't compilers do these optimizations?
      • No runtime statistics
      • Legacy code ( inertia to recompile )

3
Therefore Dynamic Code Optimization
  • Optimize on the fly ( runtime )
  • Current processors do it to some extent
      • Instruction reordering
      • Branch prediction
  • You can do much better

4
How Do You Implement This?
  • Hardware-intensive approach
      • Pentium Pro
          • Instruction translator: part of the critical
            path of the main processor
      • I-COP
          • Instruction-block optimizer: off the critical
            path
  • Non-hardware-intensive approach
      • Transmeta, DAISY, Java HotSpot
  • Trade-offs?

5
I-COP (Instruction Path Coprocessors)
  • What?
      • Add another processor that watches the
        instructions retire and can perform operations on
        them
  • Why?
      • Performance!
  • Principles
      • Keep the optimizations out of the critical path
      • Avoid slowdown due to software

6
Structure
  • Multiple VLIW processor slices make the
    I-COP simple, but still able to keep up
  • I-COP slices have 10 special instructions for
    pattern matching in addition to 12 normal RISC-type
    instructions

7
Applications of I-COP
  • Trace cache fill
      • Find long strings of instructions that are
        executed frequently
  • Pre-fetching
      • Find a load whose result is used later as an
        address in another load
  • Instruction trace optimizations
      • Register move optimization
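
The pre-fetching bullet can be illustrated with a small software model. This is only a sketch under assumptions: the `Instr` record, the register names, and the `find_dependent_loads` helper are hypothetical and not part of I-COP, which works on retired machine instructions in hardware. The sketch scans a trace and reports each load whose result register is later used as the address of another load, before that register is overwritten.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical, simplified instruction record (not an I-COP format)."""
    op: str                 # e.g. "load" or "add"
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read; for a load, srcs[0] is the address


def find_dependent_loads(trace: List[Instr]) -> List[Tuple[int, int]]:
    """Return (producer, consumer) index pairs where the value produced by
    one load is used as the address of a later load."""
    pairs = []
    last_load_writer = {}  # register -> index of the load that last wrote it
    for i, ins in enumerate(trace):
        if ins.op == "load" and ins.srcs and ins.srcs[0] in last_load_writer:
            pairs.append((last_load_writer[ins.srcs[0]], i))
        # Any write to a register ends the tracking of an earlier load result.
        last_load_writer.pop(ins.dest, None)
        if ins.op == "load":
            last_load_writer[ins.dest] = i
    return pairs


trace = [
    Instr("load", "r1", ("r0",)),        # r1 = mem[r0]
    Instr("add", "r2", ("r2", "r3")),
    Instr("load", "r4", ("r1",)),        # r4 = mem[r1], address came from a load
    Instr("add", "r1", ("r1", "r2")),    # r1 overwritten by a non-load
    Instr("load", "r5", ("r1",)),        # not reported: r1 is no longer a load result
]
print(find_dependent_loads(trace))
```

Each reported pair is a candidate for issuing a prefetch of the second address as soon as the first load completes.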

8
The I-COP Processor
  • Multiple VLIW slices allow multi-level statically
    scheduled and explicitly encoded parallelism
  • Predication and delay slots obviate branch
    prediction
  • 32 integer registers, 8 predicate registers
  • 22 instructions: 12 RISC-type and 10 special
      • Pattern matching, bit manipulation,
        instrumentation
  • Fill buffer collects instructions for analysis
  • Task queue acts as FIFO scheduler

9
The I-COP Processor Cont.

10
Examples Of Special Instructions
  • SearchReplace
      • Finds a given pattern and replaces it with
        another given pattern, returns the number of
        replacements accomplished
  • Subset
      • Tests if the bits set in a given register are a
        subset of those set in a second register
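
The behaviour described for these two instructions can be modelled in a few lines of software. This is a sketch under assumptions: the slides do not give operand encodings, so the function names, the use of Python integers for registers, and the use of lists for the searched data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def subset(a: int, b: int) -> bool:
    """True if every bit set in register value a is also set in b."""
    return (a & ~b) == 0


def search_replace(data: Sequence[str], pattern: Sequence[str],
                   replacement: Sequence[str]) -> Tuple[List[str], int]:
    """Replace non-overlapping occurrences of pattern in data.
    Returns the rewritten sequence and the number of replacements made."""
    out: List[str] = []
    count = 0
    i = 0
    n = len(pattern)
    while i < len(data):
        if n > 0 and list(data[i:i + n]) == list(pattern):
            out.extend(replacement)
            count += 1
            i += n            # skip past the matched pattern
        else:
            out.append(data[i])
            i += 1
    return out, count


print(subset(0b0101, 0b1101))   # bits of the first value all appear in the second
print(search_replace(["mov", "nop", "mov", "nop", "add"],
                     ["mov", "nop"], ["mov"]))
```

The `(a & ~b) == 0` test is the usual way to express a bit-subset check: it clears from `a` every bit that `b` has, and anything left over is a bit in `a` that `b` lacks.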

11
Transmeta Crusoe
  • The best example of a non-hardware-intensive
    approach
  • New (and fast!) 128-bit VLIW processor
  • Aimed at systems where power efficiency is
    important
      • Mobile systems
      • Dense servers
  • Therefore, small gate count
  • BUT, need x86 compatibility
  • AND, at reasonable performance too

12
So how do they do it?
  • Have a Code-Morphing software layer that runs
    on the processor
  • All x86 software (BIOS, OS, apps) runs above this
  • CM software translates x86 code at runtime into
    the VLIW processor's native instruction set
  • Also optimizes the translations!
  • So processor is fast and simple

13
Cheesy Marketing Image

14
Code-Morphing Software
  • Translates an entire basic block at once
  • Also does instruction re-ordering, branch
    prediction, register renaming
  • The translations are stored in a translation
    cache (part of main memory)
  • Instruments code to help with branch prediction,
    and detecting candidates for heavy optimizations
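
The translate-once, reuse-many-times idea with instrumentation can be sketched as follows. This is an illustration under assumptions, not Transmeta's implementation: the `TranslationCache` class, the `hot_threshold` parameter, and the toy `translate` function (which just upper-cases instruction strings) are hypothetical.

```python
from typing import Callable, Dict, List


class TranslationCache:
    """Toy model: translate each basic block once, count executions,
    and flag frequently executed blocks as optimization candidates."""

    def __init__(self, translate: Callable[[List[str]], List[str]],
                 hot_threshold: int = 3) -> None:
        self.translate = translate
        self.hot_threshold = hot_threshold
        self.cache: Dict[int, List[str]] = {}   # block address -> translation
        self.exec_count: Dict[int, int] = {}    # instrumentation counters
        self.translations = 0

    def run_block(self, addr: int, source_block: List[str]) -> List[str]:
        if addr not in self.cache:              # miss: translate and store
            self.cache[addr] = self.translate(source_block)
            self.translations += 1
        self.exec_count[addr] = self.exec_count.get(addr, 0) + 1
        return self.cache[addr]

    def hot_blocks(self) -> List[int]:
        """Blocks executed often enough to be worth heavier optimization."""
        return [a for a, n in self.exec_count.items()
                if n >= self.hot_threshold]


def translate(block: List[str]) -> List[str]:
    # Stand-in for real x86-to-VLIW translation.
    return [ins.upper() for ins in block]


tc = TranslationCache(translate)
for _ in range(4):
    tc.run_block(0x100, ["mov eax, 1", "add eax, ebx"])
tc.run_block(0x200, ["ret"])
print(tc.translations, tc.hot_blocks())
```

The cost of translation is paid once per block, so the more often a block runs, the more that cost is amortized, which is also why the execution counters are a natural signal for where to spend extra optimization effort.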

15
Code-Morphing Software (cont.)
  • Also has some help from the hardware
      • Shadowed and working register sets
      • Alias hardware (load-and-protect operations)
      • Translated bit for each page table entry
  • Performance of systems with Crusoe: 2-3 times
    longer battery life, with performance comparable to
    Intel mobile processors
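
The "shadowed and working register sets" item above can be pictured as a commit/rollback pair. The sketch below is a software analogy under assumptions: the slides give no detail, so the `ShadowedRegisters` class and its method names are hypothetical. Translated code updates the working copy; a commit saves it to the shadow copy, and a rollback restores the last committed state, for example when a speculative reordering turns out to be wrong.

```python
from typing import Dict, Iterable


class ShadowedRegisters:
    """Toy model of a working register set backed by a shadow copy."""

    def __init__(self, names: Iterable[str]) -> None:
        self.working: Dict[str, int] = {n: 0 for n in names}
        self.shadow: Dict[str, int] = dict(self.working)

    def commit(self) -> None:
        # Make the current working state the new known-good state.
        self.shadow = dict(self.working)

    def rollback(self) -> None:
        # Discard uncommitted changes and return to the known-good state.
        self.working = dict(self.shadow)


regs = ShadowedRegisters(["eax", "ebx"])
regs.working["eax"] = 1
regs.commit()               # state is now known-good
regs.working["eax"] = 99    # speculative update
regs.rollback()             # something went wrong: undo it
print(regs.working["eax"])
```

Keeping the known-good state separate lets the software reorder aggressively, since any mistake can be undone by returning to the last commit point.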