Title: Transmeta Crusoe Microprocessor
1Transmeta Crusoe Microprocessor
- Dr. Doug L. Hoffman
- Computer Science 330
- Spring 2002
2Transmeta Crusoe
"Today in RISC we have large design teams and
long design cycles. The performance story is also
much less clear now. The die sizes are no longer
small. It just doesn't seem to make as much
sense. Superscalar and out-of-order execution are
the biggest problem areas that have impeded
performance leaps. The MIPS R10,000 and HP
PA-8000 seem much more complex to me than today's
standard CISC architecture, which is the Pentium
II. So where is the advantage of RISC, if the
chips aren't as simple anymore?"
-- David Ditzel, Transmeta CEO
3Transmeta's 80x86 Architecture?
- Crusoe microprocessors can run the same
software that runs on IBM PC-compatible personal
computers.
- Smaller, simpler logic: only about half the logic
transistors of an x86 processor.
- Consumes between one-third and one-thirtieth the
power.
- Implements none of the x86 instructions in
hardware.
4X86 vs. Crusoe
The blue stuff is silicon, and the yellow is
software. Crusoe's blue part is smaller because the
branch prediction and out-of-order execution
(OOO) hardware has moved off the die and into
software. All of those functions are now done in
real time by a special program as the application
code is executing.
5Transmeta's Crusoe
The highest-performance Crusoe chip, the TM5400
6Crusoe Features
- Dynamic binary translation gives programs the
impression that they are running on an x86
machine.
- VLIW processor executes up to 4 instructions in
parallel.
- LongRun power control adjusts CPU power to the
tasks being performed.
7Transmeta-ese
- Individual instructions are called atoms.
- VLIW instruction groups are called molecules.
- Commit and rollback allow instructions to be
undone.
- Code Morphing
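The commit and rollback idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not Transmeta's implementation: the class `ShadowedRegisters`, the function `run_translated_block`, and the choice of modelling atoms as Python callables are all hypothetical. Work happens on a working copy of the registers; the shadow copy holds the last committed state and is restored if a fault occurs.

```python
class ShadowedRegisters:
    """Toy model: working registers plus a shadow copy of the last committed state."""

    def __init__(self, names):
        self.working = {name: 0 for name in names}
        self.shadow = dict(self.working)

    def commit(self):
        # Make the speculative working state permanent.
        self.shadow = dict(self.working)

    def rollback(self):
        # Discard everything done since the last commit.
        self.working = dict(self.shadow)


def run_translated_block(regs, atoms):
    """Run atoms speculatively; commit on success, roll back on a fault.

    Each atom is a hypothetical callable that mutates the working registers.
    """
    try:
        for atom in atoms:
            atom(regs.working)
    except ArithmeticError:
        regs.rollback()
        return False
    regs.commit()
    return True
```

In this model, a block that faults halfway through leaves no visible trace: the working registers return to the values they held at the last commit.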
8Transmeta-ese
9VLIW vs. Superscalar
A "traditional" VLIW machine does reordering and
parallelism hunting in software. For a
straight-ahead VLIW design like Intel's IA-64,
the piece of software that does all this is the
compiler. The compiler extracts the parallelism
from the code, looks for dependencies, etc., and
produces optimized code that the VLIW core can
run as fast as possible, in-order.
10Code Morphing
The x86 architecture is an ill-defined amoeba
containing such features as segmentation, ASCII
arithmetic, and variable-length instructions. The
square inside the blob is the VLIW processor and
its functions.
11Code Morphing
Since Crusoe is a VLIW machine that's made to run
code compiled for a superscalar machine, its
compilation and scheduling scheme is sort of a
hybrid of both approaches. Crusoe's Code Morphing
software actually takes a compiled x86 program
and recompiles it on the fly into Crusoe's native
VLIW instruction format. This recompilation uses
sophisticated compiler algorithms to extract
parallelism from the code, look for dependencies
and do all those things that a state-of-the-art
VLIW compiler does.
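To make "look for dependencies" concrete, here is a minimal sketch of packing atoms into molecules. It is a simplification under stated assumptions: the `Atom` fields and the `pack_molecules` function are hypothetical, the packer keeps atoms in program order rather than reordering them as a real scheduler would, and it ignores which execution unit each atom needs. It starts a new molecule whenever the current one is full or the next atom conflicts with a register already read or written in it.

```python
from dataclasses import dataclass

MOLECULE_WIDTH = 4  # the slides state that up to four atoms execute in parallel


@dataclass(frozen=True)
class Atom:
    op: str            # illustrative operation name
    dest: str          # register written
    srcs: tuple = ()   # registers read


def pack_molecules(atoms, width=MOLECULE_WIDTH):
    """Greedily group atoms, in order, into dependency-free molecules."""
    molecules = []
    current, written, read = [], set(), set()
    for atom in atoms:
        raw = any(src in written for src in atom.srcs)  # read-after-write
        waw = atom.dest in written                      # write-after-write
        war = atom.dest in read                         # write-after-read (conservative)
        if current and (len(current) >= width or raw or waw or war):
            molecules.append(current)
            current, written, read = [], set(), set()
        current.append(atom)
        written.add(atom.dest)
        read.update(atom.srcs)
    if current:
        molecules.append(current)
    return molecules


if __name__ == "__main__":
    program = [
        Atom("load", "r1", ("r9",)),
        Atom("load", "r2", ("r8",)),
        Atom("add", "r3", ("r1", "r2")),
        Atom("mul", "r4", ("r3",)),
        Atom("sub", "r5", ("r9",)),
    ]
    for molecule in pack_molecules(program):
        print([atom.op for atom in molecule])
```

A scheduler that is allowed to reorder could move the independent `sub` up into the first molecule; this sketch deliberately leaves that out to keep the dependency checks visible.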
12Code Morphing Details
- Takes x86 instructions and recompiles them on the
fly into VLIW instructions (atoms).
- As it recompiles them, it optimizes them, making
them run, in many cases, more efficiently than
the original x86 code.
- Finally, a scheduler reorders the atoms and
groups them into molecules.
- Once translated, the VLIW code is stored in a
special part of memory, accessible only by the
Code Morphing software, so that particular
program need not be translated again.
- But that's not all...
13Code Morphing Details
- Software continues to monitor how an application
is being used.
- If it finds that a process is spending a lot of
time in one part of the code, it turns on more
levels of optimization to make that part of the
program run faster.
- It only optimizes the parts of the code being
used. Things that are executed infrequently are
not optimized.
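One simple way to picture this monitoring is a per-block execution counter that raises the optimization level as a block gets hotter. This is a sketch only: the class `HotSpotMonitor` and the threshold values are hypothetical, and the slides do not say how many levels exist or what triggers them.

```python
HOT_THRESHOLDS = (10, 100)  # hypothetical execution counts that unlock levels 1 and 2


class HotSpotMonitor:
    """Toy profiler: count block executions and map counts to optimization levels."""

    def __init__(self, thresholds=HOT_THRESHOLDS):
        self.thresholds = thresholds
        self.counts = {}

    def record_execution(self, block_address):
        """Note one execution of a block and return its current optimization level."""
        self.counts[block_address] = self.counts.get(block_address, 0) + 1
        return self.optimization_level(block_address)

    def optimization_level(self, block_address):
        # Level 0 means no extra optimization; each threshold reached adds a level.
        count = self.counts.get(block_address, 0)
        return sum(1 for threshold in self.thresholds if count >= threshold)
```

Blocks that run rarely never reach the first threshold, so no optimization effort is spent on them.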
14Code Morphing
One of the challenges of creating the Code
Morphing software was to make the Crusoe
processor, in many cases, bug-compatible with the
x86 so that it would generate the so-called Blue
Screen of Death at many of the same times an x86
processor would.
15Processor Features
- Five execution units: two arithmetic-logic, a
load/store, a branch, and a floating-point.
- Can execute four instructions in a cycle.
- Sixty-four general-purpose and 32 floating-point
working registers shadowed by 48 general-purpose
and 16 floating-point registers.
- 64KB level one (L1) caches and a 256KB level two
(L2) cache.
- Even more important...
16What it doesn't have
- no superscalar decode, grouping, or issue logic.
- no register renaming.
- no segmentation hardware.
- no floating-point stack hardware.
- less interlock and bypassing logic than a
traditional central processing unit.
17Low Power Features
- If you have fewer transistors, you burn less
power.
- Only those functional units that are absolutely
needed to execute an instruction are turned on.
- LongRun hardware adjusts both
the supply voltage and the clock frequency so
that each application runs only as fast as it
must to get the job done.
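Why adjusting voltage as well as frequency matters can be shown with the standard first-order approximation for dynamic CMOS power, P = C * V^2 * f. The sketch below assumes that approximation and nothing more; the function name `relative_dynamic_power` and the scaling factors in the example are illustrative and are not Transmeta operating points.

```python
def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to full speed, using P proportional to V^2 * f.

    Both arguments are fractions of the full-speed value (1.0 = unchanged).
    """
    if voltage_scale <= 0 or frequency_scale <= 0:
        raise ValueError("scale factors must be positive")
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Halving the clock alone halves dynamic power.
    print(relative_dynamic_power(1.0, 0.5))
    # Halving the clock and also lowering the voltage to 75% saves much more.
    print(relative_dynamic_power(0.75, 0.5))
```

Because voltage enters squared, lowering it along with the clock saves far more power than slowing the clock alone.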
18Hardware and Software Architecture
Processor upgrades are simplified because the
layer of software between the applications and
the chip frees the designers to change the chip
architecture without forcing x86 software
developers to recompile their code.
Code Morphing software can be updated
independently of hardware by loading a software
upgrade into Flash memory.
19The Last Word
- "Considering the complexity of the project, it is
amazing how well it works, how fast it works, and
how low-power it is. For the end-user, this is
just a normal PC, but under the hood, it is a
technological marvel."
-- Marc Fleischmann, Transmeta
- "Revolutionary may be an overstatement, but they
are definitely different..."
-- Cahners Microprocessor Report