Title: Compiler Issues for Embedded Processors
1. Compiler Issues for Embedded Processors
2. Contents
- Compiler Design Issues
- Problems of Compilers for Embedded Processors
- Structure of typical C compiler
- Front end
- IR optimizer
- Back end
- Embedded-code optimization
- Retargetable compiler
3. Compiler Design Issues
- For embedded systems, the use of compilers is less common.
  - Designers still use assembly language to program many embedded applications, at the price of:
    - Huge programming effort
    - Far less code portability
    - Poor maintainability
- Why is assembly programming still common?
  - The reason lies in the high efficiency requirements of embedded systems.
4. Problems of Compilers for Embedded Processors
- Embedded systems frequently employ application-specific instruction set processors (ASIPs).
  - ASIPs meet design constraints more efficiently than general-purpose processors.
  - E.g., performance, cost, and power consumption
- Building the required software development tool infrastructure for ASIPs is expensive and time-consuming.
  - This is especially true for efficient C and C++ compiler design, which requires a large amount of resources and expert knowledge.
  - Therefore, C compilers are often unavailable for newly designed ASIPs.
5. Problems of Compilers for Embedded Processors
- Many existing compilers for ASIPs (e.g., DSPs) generate low-quality code.
  - Compiled code may be several times larger and/or slower than handwritten assembly code.
  - Such poor code is virtually useless where efficiency matters.
6. Problems of Compilers for Embedded Processors
- The cause of the poor code quality is the highly specialized architecture of ASIPs, whose instruction sets can be incompatible with high-level languages and traditional compiler technology.
  - An instruction set is generally designed primarily from a hardware designer's viewpoint, and
  - the architecture is fixed before compiler issues are considered.
7. Problems of Compilers for Embedded Processors
- The problem of compiler unavailability must be solved, because:
  - Assembly programming will no longer meet short time-to-market requirements.
  - Future human programmers are unlikely to outperform compilers as processor architectures become increasingly complex (e.g., deep pipelining, predicated execution, and high parallelism).
  - Application programs should be machine-independent (e.g., written in C) to allow architecture exploration with various cost/performance tradeoffs.
8. Coarse structure of a typical C compiler
Source code
-> Front end (scanner, parser, semantic analyzer)
-> Intermediate representation (IR)
-> IR optimizer (constant folding, constant propagation, jump optimization, loop-invariant code motion, dead code elimination)
-> Optimized IR
-> Back end (code selection, register allocation, scheduling, peephole optimization)
-> Assembly code
9. Front end
- The front end translates the source program into a machine-independent IR.
  - The IR is stored in a simple format such as three-address code.
  - Each statement is either an assignment with at most three operands, a label, or a jump.
  - The IR serves as a common exchange format between the front end and the subsequent optimization passes, and also forms the back-end input.

Example IR (MIR code):

    L1: i  ← i + 1
        t1 ← i + 1
        t2 ← p + 4
        t3 ← *t2
        p  ← t2
        t4 ← t1 < 10
        *r ← t3
        if t4 goto L1
10. Front end
- Main components of the front end
  - Scanner
    - Recognizes certain character strings in the source code
    - Groups them into tokens
  - Parser
    - Analyzes the syntax according to the underlying source-language grammar
  - Semantic analyzer
    - Performs bookkeeping of identifiers, as well as additional correctness checks that the parser cannot perform
- Many tools (e.g., lex and yacc) that automate the generation of scanners and parsers are available.
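The scanner's job of grouping characters into tokens can be sketched in a few lines. This is a minimal illustration only, not the output of lex or of any particular compiler: the token classes in `TOKEN_SPEC` and the function name `scan` are assumptions made for the example.

```python
import re

# Hypothetical token classes for a tiny C-like language (illustration only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=<>]"),
    ("PUNCT", r"[()\[\]{};,]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Group the characters of `source` into (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = scan("A[2] = 3 * 5;")
```

A parser would then consume this token list according to the source-language grammar.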
11. IR optimizer
- The IR generated for a source program normally contains many redundancies,
  - such as multiple computations of the same value or jump chains, because the front end does not pay much attention to optimization issues.
  - Human programmers might also have built redundancies into the source code, which must be removed by subsequent optimization passes.
12. IR optimizer
- Constant folding
  - Replaces compile-time constant expressions with their respective values
- Constant propagation
  - Replaces variables known to carry a constant value with the respective constant
- Jump optimization
  - Simplifies jumps and removes jump chains
- Loop-invariant code motion
  - Moves loop-invariant computations out of the loop body
- Dead code elimination
  - Removes computations whose results are never needed
13. Example: Constant Folding

C example: an element of array A is assigned a constant.

    void f() {
        int A[10];
        A[2] = 3 * 5;
    }

Unoptimized IR with two compile-time constant expressions (C-like IR notation of the Lance compiler system):

    void f() {
        int A[10], t1, t3, *t5;
        char *t2, *t4;
        t1 = 3 * 5;         /* from the source code */
        t4 = (char *) A;
        t3 = 2 * 4;         /* array index 2 multiplied by the number of memory words */
        t2 = t4 + t3;
        t5 = (int *) t2;
        *t5 = t1;
    }

- Now the IR optimizer can apply constant folding to replace both constant expressions with constant numbers, thus avoiding expensive computations at program runtime.
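As a rough sketch of what such a pass does, the snippet below folds constant expressions in a list of three-address statements. The tuple encoding `(dest, op, operand, operand)` and the name `fold_constants` are assumptions made for this illustration; they are not the Lance IR format.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(ir):
    """Replace (dest, op, a, b) by (dest, 'const', value) when a and b are literals."""
    folded = []
    for stmt in ir:
        dest, op, *args = stmt
        if op in OPS and all(isinstance(a, int) for a in args):
            folded.append((dest, "const", OPS[op](*args)))
        else:
            folded.append(stmt)  # operands not known at compile time: keep as is
    return folded


# The two constant expressions from the slide's IR, plus one statement
# whose operands are variables and therefore cannot be folded.
ir = [
    ("t1", "*", 3, 5),        # from the source code
    ("t3", "*", 2, 4),        # array index 2 times the number of memory words
    ("t2", "+", "t4", "t3"),
]
optimized = fold_constants(ir)
```

After the pass, `t1` and `t3` are plain constants (15 and 8), so no multiplication remains to be executed at runtime.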
14. IR optimizer
- A good compiler consists of many such IR optimization passes.
  - Some of them are far more complex and require advanced code analysis.
- There are strong interactions and mutual dependences between these passes.
  - Some optimizations open up further opportunities for other optimizations.
  - The passes should therefore be applied repeatedly to be most effective.
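The interaction between passes can be illustrated with two deliberately simple passes, constant propagation and constant folding, applied until nothing changes. This is a sketch under stated assumptions: straight-line code only, a hypothetical tuple encoding `(dest, op, operands...)`, and illustrative function names; real optimizers work on control-flow graphs with proper dataflow analysis.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(ir):
    """Constant folding: evaluate operations whose operands are all literals."""
    result = []
    for dest, op, *args in ir:
        if op in OPS and all(isinstance(a, int) for a in args):
            result.append((dest, "const", OPS[op](*args)))
        else:
            result.append((dest, op, *args))
    return result


def propagate(ir):
    """Constant propagation: substitute variables known to hold a constant."""
    known = {}
    result = []
    for dest, op, *args in ir:
        args = [known.get(a, a) if isinstance(a, str) else a for a in args]
        if op == "const":
            known[dest] = args[0]
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
        result.append((dest, op, *args))
    return result


def optimize(ir):
    """Apply both passes repeatedly until the IR stops changing."""
    rounds = 0
    while True:
        new_ir = fold(propagate(ir))
        rounds += 1
        if new_ir == ir:
            return ir, rounds
        ir = new_ir


ir = [("a", "const", 3), ("b", "*", "a", 5), ("c", "+", "b", 1)]
one_round = fold(propagate(ir))
final_ir, rounds = optimize(ir)
```

One round folds `b` but leaves `c` untouched, because `b` only becomes a known constant after folding; a second round is needed to fold `c`, which is the effect the slide describes.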
15. Back end
- The back end (or code generator)
  - maps the machine-independent IR into a behaviorally equivalent machine-specific assembly program.
  - The statement-oriented IR is converted into a more expressive control/dataflow graph representation.
- Front-end and IR optimization technologies are quite mature, but the back end is often the most crucial compiler phase for embedded processors.
16. Major back end passes
- Code selection
  - Maps IR statements into assembly instructions
- Register allocation
  - Assigns symbolic variables and intermediate results to the physically available machine registers
- Scheduling
  - Arranges the generated assembly instructions in time slots
  - Considers inter-instruction dependencies and limited processor resources
- Peephole optimization
  - Relatively simple pattern-matching replacement of certain expensive instruction sequences by less expensive ones
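A peephole pass can be sketched as a scan over adjacent instructions. The single rule below (a load that directly follows a store of the same register to the same address is redundant) and the `(opcode, register, operand)` encoding are assumptions chosen for illustration; the mnemonics `LOD` and `STO` are borrowed from the register allocation example in these slides. The rule is only valid inside a basic block, where no jump can land between the two instructions.

```python
def peephole(code):
    """Drop a LOD that immediately follows a STO of the same register and address."""
    out = []
    for instr in code:
        previous = out[-1] if out else None
        if (previous is not None
                and instr[0] == "LOD"
                and previous[0] == "STO"
                and previous[1:] == instr[1:]):
            continue  # the register still holds the value that was just stored
        out.append(instr)
    return out


code = [
    ("MUL", "R1", "D"),
    ("STO", "R1", "A"),
    ("LOD", "R1", "A"),   # redundant reload
    ("SUB", "R1", "Temp2"),
]
shorter = peephole(code)
```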
17. Back end passes for embedded processors
- Code selection
  - To achieve good code quality, it must use complex instructions
    - multiply-accumulate (MAC), load-with-autoincrement, etc.
  - Or it must use subword-level instructions, which have no counterpart in high-level languages
    - SIMD and network processor architectures
- Register allocation
  - Must exploit a special-purpose register architecture to avoid too many stores and reloads between registers and memory
- If the back end uses only traditional code generation techniques, the resulting code quality may be unacceptable.
18. Example: Code selection with MAC instructions
[Figure: dataflow graph of a simple computation, its decomposition into two trees linked by a temporary variable, and a covering with two MAC instructions]
- Dataflow graph (DFG) representation of a simple computation
- Conventional tree-based code selectors must decompose the DFG into two separate trees, and so fail to exploit the MAC instructions.
- Covering all DFG operations with only two MAC instructions requires the code selector to consider the entire DFG.
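The benefit of selecting a complex instruction can be sketched on a single expression tree. The selector below is a simplified assumption for illustration: it matches only the pattern add(x, mul(a, b)) with the multiplication on the right, uses a hypothetical tuple encoding, and works on one tree, so it does not address the shared-node DFG case that the slide identifies as the hard part.

```python
def select(tree, use_mac=True):
    """Return (instructions, result) for an expression tree.

    A tree is a variable name or a tuple (op, left, right), op in {'add', 'mul'}.
    """
    code = []

    def emit(opcode, *operands):
        dest = f"t{len(code) + 1}"  # one fresh temporary per instruction
        code.append((opcode, dest, *operands))
        return dest

    def walk(node):
        if isinstance(node, str):
            return node
        op, left, right = node
        # add(x, mul(a, b)) is covered by one MAC: dest = x + a * b
        if use_mac and op == "add" and isinstance(right, tuple) and right[0] == "mul":
            acc = walk(left)
            a = walk(right[1])
            b = walk(right[2])
            return emit("MAC", acc, a, b)
        lhs = walk(left)
        rhs = walk(right)
        return emit(op.upper(), lhs, rhs)

    result = walk(tree)
    return code, result


# y + a*b + c*d
tree = ("add", ("add", "y", ("mul", "a", "b")), ("mul", "c", "d"))
with_mac, _ = select(tree, use_mac=True)
without_mac, _ = select(tree, use_mac=False)
```

With the MAC pattern enabled the tree is covered by two instructions; without it, four separate multiply and add instructions are emitted.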
19. Example: Register Allocation

Source program:

    A = B + C * D
    B = A - C * D

Simple register allocation:

    LOD R1, C
    MUL R1, D
    STO R1, Temp1
    LOD R1, B
    ADD R1, Temp1
    STO R1, A
    LOD R1, C
    MUL R1, D
    STO R1, Temp2
    LOD R1, A
    SUB R1, Temp2
    STO R1, B

Smart register allocation:

    LOD R1, C
    MUL R1, D      ; C * D
    LOD R2, B
    ADD R2, R1     ; B + C * D
    STO R2, A
    SUB R2, R1     ; A - C * D
    STO R2, B
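To check that the two instruction sequences are equivalent and to compare their memory traffic, they can be run on a toy interpreter. The interpreter below is an assumption made for illustration (integer values, operands are either register names already in use or memory variables); it is not a model of any real processor.

```python
def run(program, memory):
    """Execute LOD/STO/ADD/SUB/MUL on a copy of `memory`; count memory accesses."""
    mem = dict(memory)
    regs = {}
    accesses = 0

    def value(operand):
        nonlocal accesses
        if operand in regs:      # register operand: no memory access
            return regs[operand]
        accesses += 1            # memory operand
        return mem[operand]

    for op, reg, operand in program:
        if op == "LOD":
            regs[reg] = value(operand)
        elif op == "STO":
            mem[operand] = regs[reg]
            accesses += 1
        elif op == "ADD":
            regs[reg] += value(operand)
        elif op == "SUB":
            regs[reg] -= value(operand)
        elif op == "MUL":
            regs[reg] *= value(operand)
        else:
            raise ValueError(f"unknown opcode {op}")
    return mem, accesses


SIMPLE = [
    ("LOD", "R1", "C"), ("MUL", "R1", "D"), ("STO", "R1", "Temp1"),
    ("LOD", "R1", "B"), ("ADD", "R1", "Temp1"), ("STO", "R1", "A"),
    ("LOD", "R1", "C"), ("MUL", "R1", "D"), ("STO", "R1", "Temp2"),
    ("LOD", "R1", "A"), ("SUB", "R1", "Temp2"), ("STO", "R1", "B"),
]
SMART = [
    ("LOD", "R1", "C"), ("MUL", "R1", "D"), ("LOD", "R2", "B"),
    ("ADD", "R2", "R1"), ("STO", "R2", "A"), ("SUB", "R2", "R1"),
    ("STO", "R2", "B"),
]

start = {"A": 0, "B": 5, "C": 3, "D": 4}
simple_mem, simple_accesses = run(SIMPLE, start)
smart_mem, smart_accesses = run(SMART, start)
```

Both versions leave the same values in A and B, but the smart allocation needs 7 instructions and 5 memory accesses instead of 12 and 12, because the product C * D stays in a register.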
20. Embedded-code optimization
- Dedicated code optimization techniques
  - Single-instruction, multiple-data (SIMD) instructions
    - Recent multimedia processors use SIMD instructions, which operate at the subword level (e.g., Intel MMX).
  - Address generation units (AGUs)
    - Allow address computation in parallel with regular computations in the central datapath
    - Good use of AGUs is mandatory for high code quality.
  - Code optimization for low power
    - In addition to performance and code size, power efficiency is increasingly important.
    - Code must obey heat dissipation constraints and use battery capacity efficiently in mobile systems.
21. Embedded-code optimization
- Dedicated code optimization techniques (contd)
  - Code optimization for low power (contd)
    - Compilers can support power savings.
    - Generally, the shorter the program runtime, the less energy is consumed.
    - Energy-conscious compilers, armed with an energy model of the target machine, give priority to the lowest energy-consuming instruction sequences.
    - Since a significant portion of energy is spent on memory accesses, another option is to move frequently used blocks of program code or data into efficient cache or on-chip memory.
22. Retargetable compiler
- To support fast compiler design for new processors, and hence architecture exploration, researchers have proposed retargetable compilers.
- A retargetable compiler can be modified to generate code for different target processors with few changes to its source code.
23. Example: CHESS/CHECKERS Retargetable Tool Suites
- CHESS/CHECKERS
  - is a retargetable tool suite for flexible embedded processors in electronic systems.
  - supports both the design and the use of embedded processors. These processors form the heart of many advanced systems in competitive markets like telecom, consumer, or automotive electronics.
  - is developed and commercialized by Target Compiler Technologies.
http://www.retarget.com
26. ASIP (Application-Specific Instruction Set Processor) Design
27. Reference
- J.H. Yang et al., "MetaCore: An Application-Specific DSP Development System," 1998 DAC Proceedings, pp. 800-803.
- J.H. Yang et al., "MetaCore: An Application-Specific Programmable DSP Development System," IEEE Trans. VLSI Systems, vol. 8, April 2000, pp. 173-183.
- B.W. Kim et al., "MDSP-II: 16-bit DSP with Mobile Communication Accelerator," IEEE JSSC, vol. 34, March 1999, pp. 397-404.
28. Part I: ASIP in general
- An ASIP is a compromise between a GPP (General-Purpose Processor), which can be used anywhere but with low performance, and a full-custom ASIC, which fits only a specific application but with very high performance.
- In order of increasing performance and decreasing adaptability: GPP, DSP, ASIP, FPGA, ASIC (sea of gates), CBIC (standard cell-based IC), and full-custom ASIC.
- Recently, ASICs as well as FPGAs contain processor cores.
29. Cost, Performance, Programmability, and TTM (Time-to-Market)
- ASIP (Application-Specific Instruction set Processor)
  - An ASIP is a tradeoff between the advantages of a general-purpose processor (flexibility, short development time) and those of an ASIC (fast execution time).
[Figure: general-purpose processor, ASIP, and ASIC compared along the axes execution time, cost (NRE + chip area), rigidity, and development time, with the annotation "depends on volume of product"]
30. Comparison of Typical Development Time
- MetaCore (ASIP)
  - Chip manufacturer time: 20 months (MetaCore development)
  - Customer time: 3 months (core generation + application code development)
- General-purpose processor
  - Chip manufacturer time: 20 months (core generation)
  - Customer time: 2 months (application code development)
- ASIC
  - 10 months
31. Issues in ASIP Design
- For high execution speed, flexibility, and small chip area
  - An optimal selection of micro-architecture and instruction set is required, based on diverse exploration of the design space.
- For short design turnaround time
  - An efficient means of transforming a higher-level specification into a lower-level implementation is required.
- For friendly support of application program development
  - Fast development of a suite of supporting software, including a compiler and an ISS (Instruction Set Simulator), is necessary.
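32. Various ASIP Development Systems
Each entry lists instruction set customization (selection from predefined superset; user-defined instructions), application programming level, and year.
- RISC-like micro-architecture (register-based operation)
  - PEAS-I (Univ. Toyohashi): selection from predefined superset: yes; user-defined instructions: no; C language; 1991
  - ASIA (USC): generates proper instruction set based on predefined datapath; C language; 1993
- DSP-oriented micro-architecture (memory-based operation)
  - EPICS (Philips): selection from predefined superset: yes; user-defined instructions: no; assembly; 1993
  - CD2450 (Clarkspur): selection from predefined superset: yes; user-defined instructions: no; assembly; 1995
  - MetaCore (KAIST): selection from predefined superset: yes; user-defined instructions: yes; C language; 1997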
33. Part II: MetaCore System
- Verification with co-generated compiler and ISS
- MetaCore system
  - ASIP development environment
  - Reconfigurable fixed-point DSP architecture
  - Retargetable system software
    - C compiler, ISS, assembler
- MDSP-II: a 16-bit DSP targeted for GSM applications
34. The Goal of MetaCore System
- Support an efficient design methodology for ASIPs targeted at the DSP application field.
35. Overview: How to Obtain a DSP Core from MetaCore System
- System library
  - Architecture template: bus structure, data-path structure, pipeline model, ...
  - Functional blocks: adder, multiplier, shifter, ...
  - Instructions
    - Primitive class: add, sub, and, or, ...
    - Optional class: min, max, mac, ...
- Design flow
  - Select architectural parameters, instructions, and functional blocks.
  - Simulate with the benchmark programs.
  - OK? If no: modify the architecture, add or delete instructions, or add or delete functional blocks, then simulate again.
  - If yes: HDL code generation, followed by logic synthesis.
36. System Library and Generator Set: Key Components of MetaCore System
- Inputs: processor specification and benchmark programs
- Generator set
  - Compiler generator -> C compiler
  - ISS generator -> ISS
  - HDL generator -> synthesizable HDL code
- Flow: simulation with the generated C compiler and ISS, then evaluation. Modify the specification and repeat, or accept and generate the HDL code.
- System library (can be modified and added to)
  - Architecture template
    - bus structure
    - pipeline model
    - data-path structure
  - Set of instructions
    - instruction definitions
    - related functional block
  - Set of functional blocks
    - parameterized HDL code
    - I/O port information
    - gate count
37. Processor Specification (example)
- Specification of target core
  - defines the instruction set and hardware configuration.
  - is easy for the designer to use and modify due to its high-level abstraction.

Hardware configuration:

    //Specification of EM1
    (hardware
      ACC 1
      AR 4
      pmem 2k, 2047 0
    )

Instruction set definition:

    (def_inst ADD
      (operand type2 )
      (ACC <= ACC + S1 )
      (extension sign )
      (flag cvzn )
      (exestage 1 )
    )
38. Benchmark analysis
- is necessary for deciding the instruction set.
- produces information on
  - the frequency of each instruction, to obtain a cost-effective instruction set.
  - frequent sequences of contiguous instructions, to be reduced to application-specific instructions.

Example (operation; frequent sequence of contiguous instructions; code with application-specific instruction):

    a0 = |mem[ar1]|         abs a0, ar1      abs a0, ar1
    a1 = 0                  clr a1           clr a1
    a1 = a1 + mem[ar2]      add a1, ar2      add a1, ar2
    if (a1 > a0) pc = L1    cmp a1, a0       max a1, a0     ; a1 = max(a1, a0)
                            bgtz L1
    a1 = 0                  clr a1
    a1 = a1 + a0            add a1, a0
    L1:                     L1:

The frequent sequence cmp, bgtz, clr, add is replaced by the application-specific instruction max.
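The two kinds of information named above can be gathered with a simple count over an instruction listing. The sketch below is an assumption for illustration, not the MetaCore analysis tool: it works on a static listing (a real analysis would likely weight instructions by execution count), and the function name `analyze` and the sample trace are hypothetical.

```python
from collections import Counter


def analyze(trace, length=2):
    """Return (opcode frequencies, frequencies of contiguous opcode sequences)."""
    opcodes = [line.split()[0] for line in trace]
    frequency = Counter(opcodes)
    sequences = Counter(
        tuple(opcodes[i:i + length]) for i in range(len(opcodes) - length + 1)
    )
    return frequency, sequences


# Hypothetical listing: the code fragment from the slide, occurring twice.
block = [
    "abs a0, ar1", "clr a1", "add a1, ar2",
    "cmp a1, a0", "bgtz L1", "clr a1", "add a1, a0",
]
frequency, sequences = analyze(block * 2, length=4)
```

A sequence such as `("cmp", "bgtz", "clr", "add")` that shows up repeatedly is a candidate for a single application-specific instruction, while opcodes with a very low count are candidates for removal.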
39. HDL Code Generator
40. Design Example (MDSP-II)
- GSM (Global System for Mobile communication)
- Benchmark programs
  - C programs (one for each algorithm that makes up GSM)
- Procedure of design refinement
  - EM0: initial design containing all predefined instructions
  - EM0 -> EM1: remove infrequent instructions based on instruction usage count
  - EM1 -> EM2: turn frequent sequences of contiguous instructions into new instructions
  - EM2 (MDSP-II): final design containing application-specific instructions
41. Evolution of MDSP-II Core from the Initial Machine
Number of clock cycles (for processing 1 sec. of voice data) and gate count per machine:
- EM0 (initial): 53.0 million cycles, gate count 18.1K
- EM1 (intermediate): 53.1 million cycles, gate count 15.0K
- EM2 (MDSP-II): 27.5 million cycles, gate count 19.3K
[Figure: number of clock cycles (10M to 50M) versus gate count (5K to 20K) for EM0, EM1, and EM2 (MDSP-II)]
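The tradeoff in these figures is easier to read as relative changes against EM0. The short calculation below uses only the numbers given on the slide; the dictionary layout and the helper name `relative_change` are assumptions for the example.

```python
machines = {
    "EM0": {"cycles_millions": 53.0, "gates_k": 18.1},
    "EM1": {"cycles_millions": 53.1, "gates_k": 15.0},
    "EM2": {"cycles_millions": 27.5, "gates_k": 19.3},
}


def relative_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


base = machines["EM0"]
for name in ("EM1", "EM2"):
    m = machines[name]
    cycles = relative_change(m["cycles_millions"], base["cycles_millions"])
    gates = relative_change(m["gates_k"], base["gates_k"])
    print(f"{name}: cycles {cycles:+.1f}%, gate count {gates:+.1f}%")
```

Relative to EM0, EM1 saves about 17% of the gates at essentially the same cycle count, and EM2 roughly halves the cycle count (about 48% fewer cycles) for about 7% more gates.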
42. Design Turnaround Time (MDSP-II)
- Design turnaround time is significantly reduced due to the reduction of HDL design and functional simulation time.
  - Only hardware blocks for application-specific instructions, if any, need to be designed by the user.
43. Overview of EM2 (MDSP-II)
- 16-bit fixed-point DSP
- Optimized for GSM
- 0.6 µm CMOS (TLM), 9.7 mm x 9.8 mm
- 55 MHz @ 5.0 V
- MCAU (Mobile Comm. Acceleration Unit): consists of functional blocks for application-specific instructions
  - 16x16 multiplier
  - 32-bit adder
- DALU (Data Arithmetic Logic Unit)
  - 16x16 multiplier
  - 16-bit barrel shifter
  - 32-bit adder
- Data switch network
- PCU (Program Control Unit)
- AGU (Address Generation Unit): supports linear, modulo, and bit-reverse addressing modes
- PU (Peripheral Unit)
  - Serial I/O
  - Timer
44. Conclusions
- MetaCore, an effective ASIP design methodology for DSP, is proposed.
  - 1) Benchmark-driven, high-level abstraction of the processor specification enables performance/cost-effective design.
  - 2) A generator set with a system library enables short design turnaround time.
45. Grand Challenges and Opportunities laid by SoC for Korea
- SoC Conference
- Oct. 23-24, Coex Conference Center
46. Can the success story be continued?
47. Can the success story be continued?
- 60??? per capita GNP ? 100 ?? ? ?? ??.
- ??? ???, ???, ???? ?? IT ???? Global Player ? ??
??.
- We should be proud of our success despite all of today's agony in NASDAQ and the terrible political situation. However, we need to understand better why we succeeded and how this success can be continued.
48. What is critical for success in SoC business?
49?? ?? Game ??? Rule ? ???.
- ??? ??? ??? ????? ?????(game rule) ??? ??? ???
???? ?? ?? ???? ???. 50? ??? ?? ??. ????? ? ??
?????? ???.(3? ?? ?, ??, ??)
50. It's people, people!
- Internet ? ??? ??? ??? ??, ??/??? ??? ????,
TTM(Time-to-market) ? key value ? ???. Dynamic ?
???? ?? ?? ??? ??? resource (designer,IP,tool) ?
???? ???? deliver ?? ?? ??? ??? ???? ??? ?? ??
????? ?? ?? ???? ????? ???.
51 52?? ??? ??????
- ???? SoC ? ??? ??? ?? ??, ??? ?? ?? ??? ?? ??.
- ???, ?? ?? (???? ?? ????. ????? ????, ??? ??
3-6??? ????? synapse interconnection) - ???? ???? ?????? ?? ??.
- ???, ?? ?? (????/????, ????/?? ???)
- system house ?? ?? ?? (? ?? idea source)
53. SoC type
- ??? ??? ??(???),
- ?? ??? ? ??/????(????),
- ??? ???, ???? ??? ??? ??(?? ????)
54?? ??? ???? AND type,or OR type?
- AND or OR?
- ??? ?? ???. ?? ?? ?? ???. ?? ?? ???? ???? ??.
- ??? ??? ?? ?? ?? ??? ?? ? ? ??. ?????? dribble ??
?? ?? world cup ?? ? ???. - ??? ??? ??? ??. 2-??? ??? 2 ??? ???? ?? ????.
3-??? 3-9?? ??? ???. - (???,??),(???,??),(???,???),(???,???),(??? ???,??
??,??)
55. Can deep thinkers cooperate?
- ???? ???? ??? ? ????
- ??? ???. ???, ??? ? ? ??? reward/???? ?? ??.
- ??? ??? ??.(????, ?? ??? ?? ?? ?? ??? ?? ?.)
- ? Bismarck ??? ??? shy man.
- ?????????motivation ? ????? ??? ???????
- ?????? ???? ?? ???.
56. Can deep thinkers cooperate?
57. Big harvest comes later
58??? Fundamental ???
- Fundamental Growth Power ? ??
- Growth Power ??? X ??? X ???
- ??? ??? ????
59- ?? (???? ????) ????? ?????? ? ? ??? ??. ?????
??? ? ?? ??.
- To sustain growth in a dynamic environment, you need a sharp edge AND a stable enough bottom. (The bottom is a tool to let the edge, which is the objective, work better.)
60. Back to the Basics!
- There are unique roles to be played by each party: government, university, and industry.
  - Government must do long-term/global planning and evaluation/resource allocation, and maintain national research labs to perform research in the basic/health/environment/defense areas.
  - University must excel in fundamentals and future-oriented research.
  - Industry ??/??? ???
61. Strong interaction and cooperation between
- Government and private sector
- Industry and academia
- System industry and IC industry
- Hardware designers and software designers
- IC industries in the pre-competitive stage
62. Korean Semiconductor Industry
- Challenges now faced
63. Challenges now faced by Korean Semiconductor Industry
- SWOT analysis
  - Strength: zeal for learning, venture mind (can-do spirit)
  - Weakness: strong trend to escape from technology careers in favor of becoming a lawyer, doctor, or star
  - Threat: China's rush, protectionism in each region
  - What opportunities?
64. Future depends on dealing with OPPORTUNITIES of SoC
- System-on-Chip (SoC) is THE driver/market for the semiconductor AND system industry in the 21st century.
- Korea's expertise in DRAM and the memory business needs to be connected via SoC to system industries like communication, car, consumer, and medical/health.
65. Five reasons why SoC differs from ASIC
- SoC is not just a BIG ASIC (Application-Specific IC): SoC needs a culture change.
  - Design reuse
  - Software embedded in the IC chip
  - System-level design methodology
  - Heterogeneous mix of technologies
    - memory, processor, MEMS (Micro Electro-Mechanical System), and others
  - VDSM (Very Deep Sub-Micron) effects on design and manufacturing