Title: Instruction-Level Parallelism for Low-Power Embedded Processors
1Instruction-Level Parallelism for Low-Power Embedded Processors
Ph.D. Thesis, Jean-Michel Puiatti, EPFL
- January 23, 2001
- Presented by Anup Gangwar
2Introduction
- Need for high performance low power processors
- Synergistic hardware-compiler design for EPIC or VLIW-like architectures
- A new variable instruction length scheme
- Full predication support in hardware
3Outline
- Instruction-Level Parallelism
- Power Consumption in VLSI Circuits
- A Look at Available Mobile and DSP Processors
- High-Level Evaluation of a Low-Power VLIW Processor
- The DEVIL Low-Power Processor
- A Step Towards Predicated Execution
- Conclusion
4ILP Concepts and Limitations
- Data Dependences
- Flow Dependence or RAW
- Anti Dependence or WAR
- Output Dependence or WAW
- Reduction of critical path
- Control Dependences
- Resource Conflicts
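The three data dependences listed above can be told apart mechanically by comparing the registers each instruction reads and writes. The following is a minimal sketch, not taken from the thesis; the `Instr` record and the `dependences` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, several sources."""
    dest: str      # register written
    srcs: tuple    # registers read


def dependences(first: Instr, second: Instr) -> set:
    """Classify the data dependences of `second` on the earlier `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # flow: second reads what first wrote
    if second.dest in first.srcs:
        found.add("WAR")  # anti: second overwrites what first still reads
    if first.dest == second.dest:
        found.add("WAW")  # output: both write the same register
    return found


i1 = Instr("r1", ("r2", "r3"))  # r1 = r2 + r3
i2 = Instr("r4", ("r1", "r5"))  # r4 = r1 * r5  (reads r1)
i3 = Instr("r2", ("r6",))       # r2 = r6       (overwrites r2)
i4 = Instr("r1", ("r7",))       # r1 = r7       (overwrites r1)

print(dependences(i1, i2), dependences(i1, i3), dependences(i1, i4))
```

Only the RAW case is a true dependence; WAR and WAW come from register reuse and can in principle be removed by renaming.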
6Achieving ILP: Pipelining
- Control dependencies affect pipelined execution
- Data dependencies affect pipelined execution
- Resource conflicts affect pipelined execution
7Achieving ILP: Superscalar Architectures
- In-order issue with in-order completion
- In-order issue with out-of-order completion
- Out-of-order issue with out-of-order completion
11Achieving ILP: VLIW Processors
- Lower circuit overhead than superscalar processors
- Limited number of resources
- Explicit insertion of NOPs increases code size
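The code-size cost of explicit NOPs can be illustrated with a small sketch. It assumes a fixed-width VLIW word with one slot per functional unit; the 4-slot width, the operation names, and the helper functions are made up for the example and do not come from the thesis.

```python
def pad_bundles(schedule, width):
    """Pad each cycle's operation list to `width` slots with explicit NOPs."""
    bundles = []
    for ops in schedule:
        if len(ops) > width:
            raise ValueError("more operations than issue slots in one cycle")
        bundles.append(ops + ["nop"] * (width - len(ops)))
    return bundles


def nop_fraction(bundles):
    """Fraction of all instruction slots that hold a NOP."""
    slots = [op for bundle in bundles for op in bundle]
    return slots.count("nop") / len(slots)


# Hypothetical schedule: operations the compiler could issue in each cycle.
schedule = [["add", "mul", "ld"], ["sub"], ["st", "add"]]
bundles = pad_bundles(schedule, width=4)
print(bundles)
print(nop_fraction(bundles))
```

In this made-up schedule half of the 12 slots are NOPs, which is the kind of overhead a variable-length instruction scheme tries to avoid.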
13Extracting ILP: Basic-Block Scheduling
14Extracting ILP: Superblock Scheduling
15Extracting ILP: Predicated Execution
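Predicated execution replaces a branch with guarded operations, a transformation usually called if-conversion. The sketch below is an illustration only: `branchy` and `predicated` are hypothetical functions, and the assembly-like comments use a generic notation rather than any real instruction set.

```python
def branchy(a, b):
    """Original control flow: one of two paths is taken."""
    if a > b:
        r = a - b
    else:
        r = b - a
    return r


def predicated(a, b):
    """If-converted form: both operations are issued, each under a guard."""
    p = a > b                     # cmp.gt p, a, b   (sets predicate p)
    r = 0
    r = (a - b) if p else r       # (p)  sub r, a, b  commits only if p
    r = (b - a) if not p else r   # (!p) sub r, b, a  commits only if not p
    return r


print(branchy(7, 3), predicated(7, 3))
print(branchy(2, 9), predicated(2, 9))
```

The predicated form has no branch, so there is no control dependence to mispredict, at the price of issuing operations whose results are discarded.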
16Power Consumption in CMOS Circuits: Parallelism for Energy Efficiency
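The usual textbook argument behind this slide title can be written down as follows. This is a sketch under stated assumptions, not a formula quoted from the thesis: alpha is the switching activity, C_L the switched capacitance, V_dd the supply voltage, f the clock frequency, and N the number of parallel units. It assumes that N units clocked at f/N deliver the same throughput and can run at a lower supply voltage V_par, and it ignores leakage and the overhead of the extra hardware.

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{dd}^{2} \, f
\qquad
\frac{P_{\text{par}}}{P_{\text{ref}}}
  = \frac{\alpha \,(N C_L)\, V_{\text{par}}^{2}\, (f/N)}
         {\alpha \, C_L \, V_{\text{ref}}^{2}\, f}
  = \left(\frac{V_{\text{par}}}{V_{\text{ref}}}\right)^{2}
```

Under these assumptions the power saving comes entirely from the quadratic dependence on supply voltage.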
18Available Mobile and VLIW Processors
- The ARM Family
- The ARM7 Generation
- The StrongARM
- The ARM Thumb Option
- The ARM Piccolo Option
- The ARM9 and ARM10
19Available Mobile and VLIW Processors
- The Motorola M-Core
- The LSI TinyRisc
- The Hitachi SuperH Family
- VLIW Processors
- The Motorola-Lucent StarCore
- The Philips TriMedia
- The HP/Intel IA-64
20High-Level Evaluation of a Low-Power VLIW Processor
- Energy consumption distribution
21High-Level Evaluation of a Low-Power VLIW Processor
- NOP Elimination in VLIW Processor
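One common way to eliminate NOPs is to store only the useful operations and mark where each VLIW word ends. The sketch below assumes such an end-of-bundle flag plus a slot index per operation; this encoding is an assumption chosen for illustration and is not necessarily the scheme evaluated in the thesis. The function names are hypothetical.

```python
def compress(bundles):
    """Drop NOPs; keep (slot, op, last_in_bundle) for each remaining operation."""
    stream = []
    for bundle in bundles:
        ops = [(slot, op) for slot, op in enumerate(bundle) if op != "nop"]
        if not ops:
            ops = [(0, "nop")]  # an empty cycle still needs one marker entry
        for i, (slot, op) in enumerate(ops):
            stream.append((slot, op, i == len(ops) - 1))
    return stream


def expand(stream, width):
    """Rebuild fixed-width bundles, as a fetch unit would before issue."""
    bundles, current = [], ["nop"] * width
    for slot, op, last in stream:
        current[slot] = op
        if last:
            bundles.append(current)
            current = ["nop"] * width
    return bundles


bundles = [
    ["add", "nop", "ld", "nop"],
    ["nop", "nop", "nop", "nop"],
    ["nop", "sub", "nop", "nop"],
]
stream = compress(bundles)
print(len(stream), "stored operations instead of", 3 * 4, "slots")
```

Expanding the stream restores the original bundles, so the processor still sees a regular VLIW word while code memory holds only the compressed form.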
22High-Level Evaluation of a Low-Power VLIW Processor
23High-Level Evaluation of a Low-Power VLIW Processor
24High-Level Evaluation of a Low-Power VLIW Processor
- Energy-Delay Product Comparison
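The energy-delay product multiplies the energy a program consumes by its execution time, so a design is not rewarded for saving energy simply by running slower. A minimal sketch follows; the two designs and all their numbers are invented for the example and are not results from the thesis.

```python
def energy_delay_product(avg_power_w, exec_time_s):
    """Return energy (J) times delay (s) for one program run."""
    energy_j = avg_power_w * exec_time_s
    return energy_j * exec_time_s


# Hypothetical designs: (average power in watts, execution time in seconds).
design_a = (0.5, 2.0)  # lower power, slower
design_b = (0.8, 1.0)  # higher power, faster

edp_a = energy_delay_product(*design_a)
edp_b = energy_delay_product(*design_b)
print(edp_a, edp_b)
```

With these made-up figures design B draws more power but has the lower energy-delay product, because its shorter run time outweighs the higher power.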
25The DEVIL Low-Power Processor
- Complexity in VLIW Architectures
- Hardware Duplication
- FUs and number of registers as well as ports
- Number of FUs versus type of FU
- Number of FUs versus available ILP
26The DEVIL Low-Power Processor
27The DEVIL Low-Power Processor
28The DEVIL Low-Power Processor
- Instruction Fetch Mechanism
29The DEVIL Low-Power Processor
- Branch Prediction Mechanism
30The DEVIL Low-Power Processor
- Performance with and without superscalar
optimizations
31The DEVIL Low-Power Processor
- Effect of superscalar optimizations on code size
32The DEVIL Low-Power Processor
- Effect of NOP elimination on code size
33The DEVIL Low-Power Processor
- Effect of NOP elimination on the number of
accesses to code memory
34The DEVIL Low-Power Processor
- Effect of instruction fetch mechanism on code size
35The DEVIL Low-Power Processor
- Code size comparison with existing mobile
processors
36A Step Towards Predicated Execution
- Compiler techniques for reducing predicate code size
- Reduction of number of control instructions
- Predicate promotion and Instruction merging
- Instruction reduction for advanced code generation
37A Step Towards Predicated Execution: Reduction of Number of Control Instructions
38A Step Towards Predicated Execution: Predicate Promotion and Instruction Merging
39A Step Towards Predicated Execution
- Introducing predication support into processor
- Effect on code size of full predication
- Predication code size and execution characteristics
- Prefix-based predication
40A Step Towards Predicated Execution
- Relative number of predicated instructions
41A Step Towards Predicated Execution
- Code expansion considering predication
42A Step Towards Predicated Execution
- Code reductions due to predicated execution
43Conclusions
- A synergistic hardware-compiler approach for low-power processors
- A new VLIW architecture that limits the increase in code size
- A prefix-based predicated execution architecture framework