Compiler Issues for Embedded Processors

Description:

Title: for the slide series SNS-Utrecht/'t Gooi; Author: Carla Otten; Last modified by: (none); Created Date: 4/19/1995 10:16:14 AM; Document presentation format: PowerPoint PPT presentation


Slides: 66

Provided by: Carla117


Transcript and Presenter's Notes

1
Compiler Issues for Embedded Processors
2
Contents

Compiler Design Issues
Problems of Compilers for Embedded Processors
Structure of a typical C compiler
Front end
IR optimizer
Back end
Embedded-code optimization
Retargetable compiler

3
Compiler Design Issues

For embedded systems, the use of compilers is less
common.
Designers still use assembly language to program
many embedded applications, despite its drawbacks:
Huge programming effort
Far less code portability
Poor maintainability
Why is assembly programming still common?
The reason lies in embedded systems'
high-efficiency requirements.

4
Problems of Compilers for Embedded Processors

Embedded systems frequently employ
application-specific instruction set processors
(ASIPs)
ASIPs meet design constraints more efficiently than
general-purpose processors
E.g., performance, cost, and power consumption
Building the required software development tool
infrastructure for ASIPs is expensive and
time-consuming
This is especially true for efficient C and C++ compiler
design, which requires a large amount of
resources and expert knowledge.
Therefore, C compilers are often unavailable for
newly designed ASIPs.

5
Problems of Compilers for Embedded Processors

Many existing compilers for ASIPs (e.g., DSPs)
generate low-quality code.
Compiled code may be several times larger and/or
slower than handwritten assembly code.
Such poor code is virtually useless for
efficiency reasons.

6
Problems of Compilers for Embedded Processors

The cause of the poor code quality is the highly
specialized architecture of ASIPs, whose
instruction sets can be incompatible with
high-level languages and traditional compiler
technology
This is because an instruction set is generally designed
primarily from a hardware designer's viewpoint, and
the architecture is fixed before compiler
issues are considered.

7
Problems of Compilers for Embedded Processors

Problems of compiler unavailability must be
solved, because
Assembly programming will no longer meet short
time-to-market requirements
Future human programmers are unlikely to
outperform compilers
As processor architectures become increasingly
complex (e.g., deep pipelining, predicated
execution, and high parallelism)
Application programs should be machine-independent
(e.g., written in C) to allow architecture exploration
with various cost/performance tradeoffs.

8
Coarse structure of a typical C compiler
Source code
→ Front end (scanner, parser, semantic analyzer)
→ Intermediate representation (IR)
→ IR optimizer (constant folding, constant propagation, jump optimization, loop-invariant code motion, dead code elimination)
→ Optimized IR
→ Back end (code selection, register allocation, scheduling, peephole optimization)
→ Assembly code
9
Front end

The front end translates the source program into
a machine-independent IR
The IR is stored in a simple format such as
three-address code
Each statement is either an assignment with at
most three operands, a label, or a jump
The IR serves as a common exchange format between
the front-end and the subsequent optimization
passes, and also forms the back-end input

Example IR (MIR code):
L1: i ← i + 1
    t1 ← i + 1
    t2 ← p + 4
    t3 ← *t2
    p ← t2
    t4 ← t1 < 10
    *r ← t3
    if t4 goto L1
10
Front end

Main components of the front end
Scanner
Recognizes certain character strings in the source
code
Groups them into tokens
Parser
Analyzes the syntax according to the underlying
source-language grammar
Semantic analyzer
Performs bookkeeping of identifiers, as well as
additional correctness checks that the parser
cannot perform
Many tools (e.g., lex and yacc) that automate the
generation of scanners and parsers are available
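The scanner's job can be made concrete with a small hand-written sketch. It is not lex output. The token set (identifiers, decimal integers, one-character operators) and the names `scan`, `Token`, and `TokenKind` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds of a toy scanner (hypothetical; a real C scanner has many more). */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_OP, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* first character of the lexeme in the source text */
    size_t len;        /* length of the lexeme */
} Token;

/* Groups the characters of src into tokens: identifiers, decimal integers, and
   single-character operators. Whitespace is skipped. Writes at most max tokens
   to out and returns how many were found (excluding the final TOK_END). */
size_t scan(const char *src, Token *out, size_t max) {
    size_t n = 0;
    const char *p = src;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) { p++; continue; }
        assert(n < max);                   /* caller must supply enough room */
        Token t = { TOK_OP, p, 1 };        /* default: one-character operator */
        if (isalpha((unsigned char)*p) || *p == '_') {
            t.kind = TOK_IDENT;
            while (isalnum((unsigned char)p[t.len]) || p[t.len] == '_') t.len++;
        } else if (isdigit((unsigned char)*p)) {
            t.kind = TOK_NUMBER;
            while (isdigit((unsigned char)p[t.len])) t.len++;
        }
        out[n++] = t;
        p += t.len;
    }
    if (n < max) out[n] = (Token){ TOK_END, p, 0 };
    return n;
}
```

For example, scanning `i = i + 1;` yields six tokens: identifier, operator, identifier, operator, number, operator.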

11
IR optimizer

The IR generated for a source program normally
contains many redundancies,
such as multiple computations of the same value
or jump chains, because the front end does not
pay much attention to optimization issues
Human programmers may also have built redundancies
into the source code, which must be removed by
subsequent optimization passes

12
IR optimizer

Constant folding
Replaces compile-time constant expressions with
their respective values
Constant propagation
Replaces variables known to carry a constant
value with the respective constant
Jump optimization
Simplifies jumps and removes jump chains
Loop-invariant code motion
Moves loop-invariant computations out of the loop
body
Dead code elimination
Removes computations whose results are never needed
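Loop-invariant code motion, one of the passes listed above, can be illustrated at the C source level. A compiler performs the same transformation on the IR. The two functions below (hypothetical names) only show the effect before and after the transformation.

```c
#include <assert.h>

/* Before: x * y does not change inside the loop, yet it is recomputed on
   every iteration. */
int scaled_sum_naive(const int *a, int n, int x, int y) {
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * (x * y);        /* loop-invariant subexpression */
    return sum;
}

/* After loop-invariant code motion: x * y is computed once, before the loop.
   Hoisting is safe here because the multiplication has no side effects. */
int scaled_sum_hoisted(const int *a, int n, int x, int y) {
    assert(n >= 0);
    int sum = 0;
    const int k = x * y;              /* hoisted out of the loop body */
    for (int i = 0; i < n; i++)
        sum += a[i] * k;
    return sum;
}
```

Both functions return the same result. The hoisted version executes one multiplication per iteration instead of two.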

13
ex) Constant Folding
C example: an element of array A is assigned a
constant
void f() { int A[10]; A[2] = 3 * 5; }
Unoptimized IR with two compile-time constant
expressions, in the C-like IR notation of the Lance compiler system:
void f()
{
  int A[10], t1, t3, *t5;
  char *t2, *t4;
  t1 = 3 * 5;        /* from the source code */
  t4 = (char *) A;
  t3 = 2 * 4;        /* array index 2 scaled by the number of memory words per int */
  t2 = t4 + t3;
  t5 = (int *) t2;
  *t5 = t1;
}

Now the IR optimizer can apply constant folding
to replace both constant expressions by constant
numbers, thus avoiding expensive computations at
program runtime
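Here is a minimal sketch of constant folding on a toy three-address IR. It assumes a simplified statement format; `Stmt`, `Operand`, `cst`, and `tmp` are hypothetical and are not the Lance API. Applied to the two constant expressions of the example, it turns t1 = 3 * 5 into t1 = 15 and t3 = 2 * 4 into t3 = 8.

```c
#include <assert.h>
#include <stdbool.h>

/* A toy three-address statement t[dst] = lhs op rhs; op '=' marks a plain copy
   of lhs. This IR is hypothetical, loosely modeled on the Lance example above. */
typedef struct { bool is_const; int value; int temp; } Operand;
typedef struct { int dst; char op; Operand lhs, rhs; } Stmt;

Operand cst(int v)  { return (Operand){ true, v, -1 }; }   /* constant operand  */
Operand tmp(int id) { return (Operand){ false, 0, id }; }  /* temporary operand */

/* Constant folding: evaluates every statement whose operands are both
   compile-time constants and turns it into a copy of the result.
   Returns the number of statements folded. */
int fold_constants(Stmt *code, int n) {
    int folded = 0;
    for (int i = 0; i < n; i++) {
        Stmt *s = &code[i];
        if (s->op == '=' || !s->lhs.is_const || !s->rhs.is_const)
            continue;
        int a = s->lhs.value, b = s->rhs.value, r;
        switch (s->op) {
        case '+': r = a + b; break;
        case '-': r = a - b; break;
        case '*': r = a * b; break;
        default:  continue;              /* unknown operator: leave it alone */
        }
        s->op = '=';
        s->lhs = cst(r);
        s->rhs = cst(0);                 /* unused by copies */
        folded++;
    }
    return folded;
}
```

Statements with a non-constant operand, such as t2 = t4 + t3, are left unchanged.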

14
IR optimizer

A good compiler consists of many such IR
optimization passes.
Some of them are far more complex and require
advanced code analysis.
There is strong interaction and mutual
dependence between these passes.
Some optimizations enable further opportunities
for other optimizations, so the passes
should be applied repeatedly to be most effective
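The interaction between passes can be sketched with two toy passes on a hypothetical three-address IR: constant propagation and constant folding, run alternately until nothing changes. Take the chain t1 = 3 * 5; t2 = t1 + 1; t3 = t2 * 2. Each folding step exposes a new constant that propagation can forward, so three productive rounds are needed. The single-assignment assumption and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TEMPS 16

/* Toy IR (hypothetical): t[dst] = lhs op rhs, or the copy t[dst] = lhs when op == '='. */
typedef struct { bool is_const; int value; int temp; } Operand;
typedef struct { int dst; char op; Operand lhs, rhs; } Stmt;

Operand cst(int v)  { return (Operand){ true, v, -1 }; }
Operand tmp(int id) { return (Operand){ false, 0, id }; }

/* Constant propagation over straight-line code: a use of a temporary whose
   definition is already a constant copy is replaced by that constant.
   Assumes each temporary is assigned exactly once. Returns the change count. */
static int propagate(Stmt *code, int n) {
    bool known[MAX_TEMPS] = { false };
    int value[MAX_TEMPS] = { 0 };
    int changes = 0;
    for (int i = 0; i < n; i++) {
        Operand *ops[2] = { &code[i].lhs, &code[i].rhs };
        for (int k = 0; k < 2; k++) {
            if (ops[k]->is_const) continue;
            assert(ops[k]->temp >= 0 && ops[k]->temp < MAX_TEMPS);
            if (known[ops[k]->temp]) {
                *ops[k] = cst(value[ops[k]->temp]);
                changes++;
            }
        }
        assert(code[i].dst >= 0 && code[i].dst < MAX_TEMPS);
        if (code[i].op == '=' && code[i].lhs.is_const) {
            known[code[i].dst] = true;
            value[code[i].dst] = code[i].lhs.value;
        }
    }
    return changes;
}

/* Constant folding: a statement with two constant operands becomes a copy. */
static int fold(Stmt *code, int n) {
    int folded = 0;
    for (int i = 0; i < n; i++) {
        Stmt *s = &code[i];
        if (s->op == '=' || !s->lhs.is_const || !s->rhs.is_const) continue;
        assert(s->op == '+' || s->op == '-' || s->op == '*');
        int a = s->lhs.value, b = s->rhs.value;
        int r = s->op == '+' ? a + b : s->op == '-' ? a - b : a * b;
        s->op = '=';
        s->lhs = cst(r);
        s->rhs = cst(0);                   /* unused by copies */
        folded++;
    }
    return folded;
}

/* Alternates the two passes until neither changes the code. Returns the
   number of rounds that made at least one change. */
int optimize_to_fixpoint(Stmt *code, int n) {
    int rounds = 0;
    for (;;) {
        int changed = propagate(code, n);  /* separate statements fix the pass order */
        changed += fold(code, n);
        if (changed == 0) return rounds;
        rounds++;
    }
}
```

A single combined pass could handle this particular chain in one sweep. The sketch deliberately keeps the passes separate to show why a pass pipeline is iterated.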

15
Back end

The back end (or code generator)
maps the machine-independent IR into a
behaviorally equivalent machine-specific assembly
program.
Statement-oriented IR is converted into a more
expressive control/dataflow graph representation.
Front-end and IR optimization technologies are
quite mature, but the back end is often the most
crucial compiler phase for embedded processors.

16
Major back end passes

Code selection
maps IR statements into assembly instructions
Register allocation
assigns symbolic variables and intermediate
results to the physically available machine
registers
Scheduling
arranges the generated assembly instructions in
time slots
considers inter-instruction dependencies and
limited processor resources
Peephole optimization
relatively simple pattern-matching replacement of
certain expensive instruction sequences by less
expensive ones.
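Here is a minimal peephole pass in the two-operand LOD/STO/ADD style of the register-allocation example in this presentation. The two patterns and the `Insn` encoding are illustrative choices. The sketch assumes straight-line code with no labels inside the window, and assumes that no instruction reads the condition flags set by `ADD`.

```c
#include <assert.h>
#include <string.h>

/* A two-operand instruction: mnemonic, register, and memory/immediate operand
   (an illustrative encoding, not a real assembler's data structure). */
typedef struct { const char *op; const char *reg; const char *arg; } Insn;

static int same(const char *a, const char *b) { return strcmp(a, b) == 0; }

/* Peephole optimization with two illustrative patterns:
     STO r, x ; LOD r, x  ->  STO r, x   (r still holds x)
     ADD r, #0            ->  (deleted)  (assumes the flags are not used later)
   Compacts the array in place and returns the new instruction count. */
int peephole(Insn *code, int n) {
    assert(n >= 0);
    int out = 0;
    for (int i = 0; i < n; i++) {
        const Insn cur = code[i];
        if (same(cur.op, "ADD") && same(cur.arg, "#0"))
            continue;
        if (out > 0 && same(cur.op, "LOD") && same(code[out - 1].op, "STO") &&
            same(code[out - 1].reg, cur.reg) && same(code[out - 1].arg, cur.arg))
            continue;
        code[out++] = cur;
    }
    return out;
}
```

The store itself is kept because the memory location may still be read later. Only the redundant reload is removed.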

17
Back end passes for embedded processors

Code selection
To achieve good code quality, it must use
complex instructions
multiply-accumulate (MAC), load-with-autoincrement,
etc.
Or it must use subword-level instructions (which have
no counterpart in high-level languages)
e.g., in SIMD and network processor architectures
Register allocation
Must utilize a special-purpose register architecture
to avoid too many stores and reloads
between registers and memory
If the back end uses only traditional code
generation techniques, the resulting code quality
may be unacceptable

18
Example: Code selection with MAC instructions
[Figure: DFG containing a temporary variable]

Dataflow graph (DFG) representation of a simple
computation
Conventional tree-based code selectors must
decompose the DFG into two separate trees and thus
fail to exploit the MAC instructions
Covering all DFG operations with only two MAC
instructions requires the code selector to consider
the entire DFG
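The difference between tree-based and DFG-based covering can be sketched by counting instructions. The DFG figure is missing from this transcript, so we assume the shape t = a * b; x = t + c; y = t + d, which fits both the "temporary variable" label and the two-MAC result. The node encoding and function name are hypothetical.

```c
#include <assert.h>

/* A DFG node: a leaf (input variable), a multiply, or an add.
   a and b are indices of the operand nodes (unused for leaves). */
typedef enum { LEAF, MUL, ADD } Kind;
typedef struct { Kind kind; int a, b; } Node;

#define MAX_NODES 64

/* Counts the instructions needed to cover the DFG with MUL, ADD, and MAC
   (an ADD whose operand is a MUL).
   tree_based != 0 models a tree-based selector: a MUL with more than one use
   is a tree root whose result goes to a temporary, so no MAC may absorb it.
   tree_based == 0 models a selector that sees the whole DFG: every ADD may
   absorb a MUL operand, even a shared one, by recomputing it inside a MAC. */
int count_instructions(const Node *g, int n, int tree_based) {
    int uses[MAX_NODES] = { 0 }, absorbed[MAX_NODES] = { 0 };
    int count = 0;
    assert(n <= MAX_NODES);
    for (int i = 0; i < n; i++)
        if (g[i].kind != LEAF) { uses[g[i].a]++; uses[g[i].b]++; }
    for (int i = 0; i < n; i++) {
        if (g[i].kind != ADD) continue;
        int m = g[g[i].a].kind == MUL ? g[i].a : g[g[i].b].kind == MUL ? g[i].b : -1;
        if (m >= 0 && (!tree_based || uses[m] == 1))
            absorbed[m]++;                 /* this ADD becomes a MAC */
        count++;                           /* one ADD or MAC instruction */
    }
    /* A MUL needs its own instruction unless every use was absorbed into a MAC;
       an unused MUL is a DFG output and always needs one. */
    for (int i = 0; i < n; i++)
        if (g[i].kind == MUL && (absorbed[i] < uses[i] || uses[i] == 0))
            count++;
    return count;
}
```

Under this assumption, the tree-based selector needs MUL, ADD, ADD (three instructions), while covering the whole DFG needs only two MACs.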

19
Example: Register Allocation
Source program:
A = B + C * D
B = A - C * D
Simple register allocation:
LOD R1, C
MUL R1, D
STO R1, Temp1
LOD R1, B
ADD R1, Temp1
STO R1, A
LOD R1, C
MUL R1, D
STO R1, Temp2
LOD R1, A
SUB R1, Temp2
STO R1, B
Smart register allocation:
LOD R1, C
MUL R1, D       ; R1 = C * D
LOD R2, B
ADD R2, R1      ; R2 = B + C * D
STO R2, A
SUB R2, R1      ; R2 = A - C * D
STO R2, B
20
Embedded-code optimization

Dedicated code optimization techniques
Single-instruction, multiple-data instructions
Recent multimedia processors use SIMD
instructions, which operate at the subword level
(e.g., Intel MMX)
Address generation units (AGUs)
Allow address computation in parallel with
regular computations in the central datapath
Good use of AGUs is mandatory for high code
quality
Code optimization for low power
In addition to performance and code size, power
efficiency is increasingly important
Code must obey heat dissipation constraints and
use battery capacity efficiently in mobile systems
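What "subword level" means can be shown in portable C. The sketch below emulates a four-lane 8-bit addition, the kind of operation an MMX-style packed-add instruction performs in one step, using the well-known SWAR ("SIMD within a register") masking trick. The function names are hypothetical. A compiler that maps such lane-wise code to a real SIMD instruction avoids all of this masking.

```c
#include <assert.h>
#include <stdint.h>

/* Adds four unsigned 8-bit lanes packed in 32-bit words. Each lane wraps
   around on overflow and never carries into its neighbor. */
uint32_t add_u8x4(uint32_t x, uint32_t y) {
    const uint32_t low7 = 0x7F7F7F7Fu;        /* low 7 bits of every lane */
    const uint32_t high = 0x80808080u;        /* top bit of every lane    */
    uint32_t sum = (x & low7) + (y & low7);   /* cannot carry out of a lane */
    return sum ^ ((x ^ y) & high);            /* add top bits without carry */
}

/* Reference version: one lane at a time, as plain C code would do it. */
uint32_t add_u8x4_ref(uint32_t x, uint32_t y) {
    uint32_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t s = ((x >> (8 * lane)) + (y >> (8 * lane))) & 0xFFu;
        r |= s << (8 * lane);
    }
    return r;
}
```

For example, adding 0x01FF7F10 and 0x01010102 lane by lane gives 0x02008012: the 0xFF lane wraps to 0x00 without disturbing its neighbors.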

21
Embedded-code optimization

Dedicated code optimization techniques (cont'd)
Code optimization for low power (cont'd)
Compilers can support power savings
Generally, the shorter the program runtime, the
less energy is consumed
Energy-conscious compilers, armed with an energy
model of the target machine, give priority to the
lowest-energy instruction sequences
Since a significant portion of energy is spent on
memory accesses, another option is to move
frequently used blocks of program code or data
into efficient cache or on-chip memory
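The selection step of an energy-conscious compiler can be sketched as choosing the cheapest of several behaviorally equivalent instruction sequences under an energy model. The energy values below are invented placeholders, not measurements of any real processor. The mnemonics follow the LOD/STO/ADD/MUL style used earlier, plus an assumed SHL (shift left).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-instruction energy model (arbitrary units). A real
   energy-conscious compiler would use measured or simulated values. */
typedef struct { const char *mnemonic; double energy; } EnergyEntry;

static const EnergyEntry model[] = {
    { "ADD", 1.0 }, { "SHL", 0.8 }, { "MUL", 3.5 }, { "LOD", 4.0 }, { "STO", 4.2 },
};

double insn_energy(const char *mnemonic) {
    for (size_t i = 0; i < sizeof model / sizeof model[0]; i++)
        if (strcmp(model[i].mnemonic, mnemonic) == 0)
            return model[i].energy;
    assert(!"mnemonic missing from the energy model");
    return 0.0;
}

/* A candidate instruction sequence (up to 4 instructions) for one IR fragment. */
typedef struct { const char *insns[4]; int len; } Candidate;

double sequence_energy(const Candidate *c) {
    double total = 0.0;
    for (int i = 0; i < c->len; i++)
        total += insn_energy(c->insns[i]);
    return total;
}

/* Returns the index of the lowest-energy candidate among equivalent sequences. */
int pick_lowest_energy(const Candidate *c, int count) {
    int best = 0;
    double best_energy = sequence_energy(&c[0]);
    for (int i = 1; i < count; i++) {
        double e = sequence_energy(&c[i]);
        if (e < best_energy) { best = i; best_energy = e; }
    }
    return best;
}
```

Suppose R1 * 2 can be computed by MUL, ADD R1, R1, or a shift. With these placeholder numbers the shift wins; a different energy model could well pick differently.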

22
Retargetable compiler

To support fast compiler design for new
processors, and hence support architecture
exploration, researchers have proposed
retargetable compilers
A retargetable compiler can be modified to
generate code for different target processors
with few changes in its source code.
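The idea of retargeting by description rather than by rewriting can be sketched as a table-driven emitter. Both targets and their mnemonics are made up. Real retargetable compilers use far richer processor models than a mnemonic table, covering registers, addressing modes, and timing as well.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Machine-independent IR operations handled by the code generator. */
typedef enum { IR_ADD, IR_MUL, IR_LOAD, IR_STORE, IR_NUM_OPS } IrOp;

/* A target description kept as data (hypothetical and greatly simplified). */
typedef struct {
    const char *name;
    const char *mnemonic[IR_NUM_OPS];
} TargetDesc;

/* Two made-up targets; retargeting means writing another table like these. */
const TargetDesc target_a = { "A", { "ADD", "MUL", "LOD", "STO" } };
const TargetDesc target_b = { "B", { "add", "mpy", "ld", "st" } };

/* Target-independent emitter: the same code serves every target. */
int emit(const TargetDesc *t, IrOp op, const char *reg, const char *arg,
         char *buf, size_t size) {
    assert(op < IR_NUM_OPS);
    return snprintf(buf, size, "%s %s, %s", t->mnemonic[op], reg, arg);
}
```

Switching targets changes only which table is passed to `emit`. The emitter's source code stays untouched.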

23
Example: CHESS/CHECKERS Retargetable Tool Suites

CHESS/CHECKERS
is a retargetable tool suite for flexible
embedded processors in electronic systems.
supports both the design and the use of embedded
processors. These processors form the heart of
many advanced systems in competitive markets like
telecom, consumer, or automotive electronics.
is developed and commercialized by Target
Compiler Technologies.

http://www.retarget.com
24
Example: CHESS/CHECKERS Retargetable Tool Suites
25
Example: CHESS/CHECKERS Retargetable Tool Suites
26
ASIP (Application-Specific Instruction Set Processor) Design
27
Reference

J. H. Yang et al., "MetaCore: An Application-Specific DSP Development System," 1998 DAC Proceedings, pp. 800-803.
J. H. Yang et al., "MetaCore: An Application-Specific Programmable DSP Development System," IEEE Trans. VLSI Systems, vol. 8, Apr. 2000, pp. 173-183.
B. W. Kim et al., "MDSP-II: 16-bit DSP with Mobile Communication Accelerator," IEEE JSSC, vol. 34, Mar. 1999, pp. 397-404.

28
Part I: ASIP in General

An ASIP is a compromise between a GPP (general-purpose
processor), which can be used anywhere but with low
performance, and a full-custom ASIC, which fits only
a specific application but with very high
performance.
In order of increasing performance and decreasing adaptability:
GPP, DSP, ASIP, FPGA, ASIC (sea of gates),
CBIC (standard cell-based IC), and full-custom
ASIC.
Recently, ASICs as well as FPGAs contain processor
cores.

29
Cost, Performance, Programmability, and
TTM (Time-to-Market)

ASIP (Application-Specific Instruction set
Processor)
An ASIP is a tradeoff between the advantages of
general-purpose processors (flexibility, short
development time) and those of ASICs (fast
execution time).

[Figure: general-purpose processor, ASIP, and ASIC compared by execution time, cost (NRE + chip area; depends on product volume), rigidity, and development time]
30
Comparison of Typical Development Time
Design style | Chip manufacturer time | Customer time
MetaCore (ASIP) | 20 months (MetaCore development) | 3 months (core generation + application code development)
General-purpose processor | 20 months (core generation) | 2 months (application code development)
ASIC | | 10 months
31
Issues in ASIP Design

For high execution speed, flexibility, and small
chip area:
An optimal selection of micro-architecture and
instruction set is required, based on broad
exploration of the design space.
For short design turnaround time:
An efficient means of transforming a higher-level
specification into a lower-level implementation is
required.
For friendly support of application program
development:
Fast development of a suite of supporting
software, including a compiler and an ISS (Instruction
Set Simulator), is necessary.

32
Various ASIP Development Systems
System | Instruction set customization: selection from predefined superset | User-defined instructions | Application programming level | Year
RISC-like micro-architecture (register-based operation):
PEAS-I (Univ. Toyohashi) | Yes | No | C language | 1991
ASIA (USC) | Generates a proper instruction set based on a predefined datapath | | C language | 1993
DSP-oriented micro-architecture (memory-based operation):
EPICS (Philips) | Yes | No | assembly | 1993
CD2450 (Clarkspur) | Yes | No | assembly | 1995
MetaCore (KAIST) | Yes | Yes | C language | 1997
33
Part II: MetaCore System

MetaCore system
ASIP development environment
Re-configurable fixed-point DSP architecture
Retargetable system software
C compiler, ISS, assembler
MDSP-II: a 16-bit DSP targeted for GSM
applications
Verification with the co-generated compiler and ISS

34
The Goal of MetaCore System

Support an efficient design methodology for ASIPs
targeted at the DSP application field.

35
Overview: How to Obtain a DSP Core from the MetaCore System
Instructions
Primitive class: add, sub, and, or, ...
Optional class: min, max, mac, ...
Functional blocks: adder, multiplier, shifter, ...
Architecture template: bus structure, data-path structure, pipeline model, ...
1) Select architectural parameters, instructions, and functional blocks
2) Simulate the benchmark programs
3) OK? If no: modify the architecture, add or delete instructions, or add or delete functional blocks, then simulate again
4) If yes: HDL code generation, followed by logic synthesis
36
System Library & Generator Set: Key Components of the MetaCore System
Inputs: processor specification, benchmark programs
Generator set:
ISS generator → ISS
Compiler generator → C compiler
HDL generator → synthesizable HDL code
System library (System Lib.):
Set of functional blocks: I/O port information, parameterized HDL code, gate count
Architecture template: bus structure, data-path structure, pipeline model
Set of instructions: instruction definitions, related functional blocks
Flow: the generated C compiler and ISS simulate the benchmark programs; after evaluation, the specification is modified (instructions or functional blocks added or modified) or accepted, and an accepted specification goes to the HDL generator.
37
Processor Specification (example)

The specification of the target core
defines the instruction set and hardware configuration.
is easy for the designer to use and modify thanks to its
high-level abstraction.

//Specification of EM1
(hardware                 // hardware configuration
  ACC 1
  AR 4
  pmem 2k, 2047 0
)
(def_inst ADD             // instruction set definition
  (operand type2)
  (ACC <- ACC + S1)
  (extension sign)
  (flag cvzn)
  (exestage 1)
)
38
Benchmark analysis

is necessary for deciding the instruction set.
produces information on
the frequency of each instruction, to obtain a
cost-effective instruction set.
frequent sequences of contiguous instructions,
which can be reduced to application-specific instructions.

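Original code:
a0 = |mem[ar1]|          abs a0, ar1
a1 = 0                   clr a1
a1 = a1 + mem[ar2]       add a1, ar2
if (a1 > a0) pc = L1     cmp a1, a0
                         bgtz L1
a1 = 0                   clr a1
a1 = a1 + a0             add a1, a0
L1:
With an application-specific instruction:
a0 = |mem[ar1]|          abs a0, ar1
a1 = 0                   clr a1
a1 = a1 + mem[ar2]       add a1, ar2
a1 = max(a1, a0)         max a1, a0
The frequent sequence of contiguous instructions (cmp, bgtz, clr, add) is replaced by the application-specific instruction max.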
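The two statistics named above can be gathered with a short profiling sketch over an executed instruction trace. The trace format (one mnemonic string per executed instruction) and all names are assumptions; this presentation does not describe MetaCore's actual benchmark analysis tools.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPS 32

/* Instruction statistics gathered from an executed instruction trace. */
typedef struct {
    const char *names[MAX_OPS];   /* distinct mnemonics seen so far             */
    int freq[MAX_OPS];            /* occurrences of each mnemonic               */
    int pair[MAX_OPS][MAX_OPS];   /* how often names[i] is followed by names[j] */
    int count;                    /* number of distinct mnemonics               */
} Profile;

/* Returns the index of mnemonic m in p, adding it if it is new. */
static int intern(Profile *p, const char *m) {
    for (int i = 0; i < p->count; i++)
        if (strcmp(p->names[i], m) == 0) return i;
    assert(p->count < MAX_OPS);
    p->names[p->count] = m;
    return p->count++;
}

/* Counts each instruction and each pair of contiguous instructions. */
void profile_trace(const char *const *trace, int len, Profile *p) {
    memset(p, 0, sizeof *p);
    int prev = -1;
    for (int i = 0; i < len; i++) {
        int cur = intern(p, trace[i]);
        p->freq[cur]++;
        if (prev >= 0) p->pair[prev][cur]++;
        prev = cur;
    }
}

/* Finds the most frequent pair of contiguous instructions, a candidate for
   fusion into one application-specific instruction. Returns its count. */
int most_frequent_pair(const Profile *p, const char **first, const char **second) {
    int best = 0;
    for (int i = 0; i < p->count; i++)
        for (int j = 0; j < p->count; j++)
            if (p->pair[i][j] > best) {
                best = p->pair[i][j];
                *first = p->names[i];
                *second = p->names[j];
            }
    return best;
}
```

Instructions with a low `freq` are candidates for removal. The most frequent pairs point at fusion candidates; the max example above fuses a longer sequence in the same spirit.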
39
HDL Code Generator
40
Design Example (MDSP-II)

GSM (Global System for Mobile communication)
Benchmark programs:
C programs (one for each algorithm that makes up GSM)
Procedure of design refinement:
EM0: initial design containing all predefined instructions
→ remove infrequent instructions based on instruction usage count →
EM1
→ turn frequent sequences of contiguous instructions into new instructions →
EM2 (MDSP-II): final design containing application-specific instructions

41
Evolution of MDSP-II Core from the Initial Machine
Machine | Number of clock cycles (for 1 s of voice data processing) | Gate count
EM0 (initial) | 53.0 million | 18.1K
EM1 (intermediate) | 53.1 million | 15.0K
EM2 (MDSP-II) | 27.5 million | 19.3K
[Figure: number of clock cycles versus gate count for EM0, EM1, and EM2 (MDSP-II)]
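The table supports a quick back-of-the-envelope check. The arithmetic below uses only the numbers above plus the 55 MHz clock rate quoted for EM2 (MDSP-II) in the chip overview. It is derived here from those figures and is not stated in the slides.

```latex
\begin{aligned}
\frac{53.0 - 27.5}{53.0} &\approx 0.481
  && \text{(EM2 needs about 48\% fewer cycles than EM0; speedup } 53.0/27.5 \approx 1.93\text{)}\\
\frac{19.3 - 18.1}{18.1} &\approx 0.066
  && \text{(for about 6.6\% more gates)}\\
\frac{27.5\times10^{6}\ \text{cycles}}{55\times10^{6}\ \text{cycles/s}} &= 0.5\ \text{s}
  && \text{(per second of voice data, i.e., about half the cycle budget at 55 MHz)}
\end{aligned}
```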
42
Design Turnaround Time (MDSP-II)

Design turnaround is significantly reduced due to
the reduction of HDL design and functional
simulation time.
Only hardware blocks for application-specific
instructions, if any, need to be designed by the
user.

43
Overview of EM2 (MDSP-II)
16-bit fixed-point DSP
Optimized for GSM
0.6 µm CMOS (TLM), 9.7 mm x 9.8 mm
55 MHz at 5.0 V
MCAU (Mobile Comm. Acceleration Unit): consists of functional blocks for application-specific instructions
  16x16 multiplier
  32-bit adder
DALU (Data Arithmetic Logic Unit)
  16x16 multiplier
  16-bit barrel shifter
  32-bit adder
Data switch network
PCU (Program Control Unit)
AGU (Address Generation Unit): supports linear, modulo, and bit-reverse addressing modes
PU (Peripheral Unit)
  Serial I/O
  Timer
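The modulo and bit-reverse addressing modes of the AGU can be expressed as address-update functions. In hardware, the AGU computes these updates in parallel with the datapath; the C below (hypothetical function names) only shows the arithmetic. Bit-reversed order is the access pattern of radix-2 FFTs.

```c
#include <assert.h>

/* Modulo (circular-buffer) addressing: post-increments an address within the
   buffer [base, base + length), wrapping around at the end. */
unsigned modulo_next(unsigned addr, unsigned step, unsigned base, unsigned length) {
    assert(length > 0 && addr >= base && addr < base + length && step <= length);
    addr += step;
    if (addr >= base + length)
        addr -= length;              /* one subtraction suffices since step <= length */
    return addr;
}

/* Bit-reverse addressing: reverses the low `bits` bits of an index. */
unsigned bit_reverse(unsigned index, unsigned bits) {
    unsigned r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (index & 1u);
        index >>= 1;
    }
    return r;
}
```

For an 8-point FFT (3 address bits), indices 0 to 7 map to the order 0, 4, 2, 6, 1, 5, 3, 7.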
44
Conclusions

MetaCore, an effective ASIP design methodology
for DSPs, is proposed.
1) Benchmark-driven, high-level abstraction of the
processor specification enables performance/cost-effective
design.
2) The generator set with the system library enables
short design turnaround time.

45
Grand Challenges and Opportunities laid by SoC
for Korea

SoC Conference
Oct. 23-24, Coex Conference Center

Can the success story be continued?

47
Can the success story be continued?

[Korean text not recoverable; legible terms: per capita GNP, IT, Global Player]
We need to be proud of our success despite all
of today's agony over NASDAQ and the terrible political
situation. However, we need to understand better why we
succeeded and how this success can be continued.

What is critical for success in SoC Business?

49
[Korean slide title and text not recoverable; legible terms: Game, Rule, game rule]

50
It's people, people!

[Korean text not recoverable; legible terms: Internet, TTM (time-to-market) as a key value, dynamic, resources (designers, IP, tools), deliver]


52
[Korean slide title and text not recoverable; legible terms: SoC, synapse interconnection, system house (as an idea source)]

53
SoC type


54
AND type or OR type?

AND or OR?
[Korean text not recoverable; legible terms: dribble, world cup]

55
Can deep thinkers cooperate?

[Korean text not recoverable; legible terms: reward, Bismarck, shy man, motivation]

56
Can deep thinkers cooperate?
57
Big harvest comes later
58
[Korean slide title and text not recoverable; legible terms: Fundamental, Fundamental Growth Power, Growth Power = ... X ... X ...]
To sustain growth in a dynamic environment, you
need a sharp edge AND a stable enough bottom.
(The bottom is a tool that lets the edge, which is the
objective, work better.)

60
Back to the Basics!

Government, universities, and industry each have
unique roles to play.
Government must do long-term/global planning and
evaluation/resource allocation, and maintain national
research labs to perform research in basic science,
health, environment, and defense.
Universities must excel in fundamentals and
future-oriented research.
Industry: [Korean text not recoverable]

Strong interaction and cooperation are needed between
Government and private sector
Industry and academia
System industry and IC industry
Hardware designers and software designers
IC industries in the pre-competitive stage

Korean Semiconductor Industry
Challenges now faced

63
Challenges now faced by Korean Semiconductor
Industry

SWOT Analysis
Strength: zeal for learning, venture mind
(can-do spirit)
Weakness: strong trend of leaving technology
careers in favor of becoming lawyers, doctors, or stars
Threat: China's rush, protectionism in each
region
What opportunities?

64
The future depends on dealing with the OPPORTUNITIES of
SoC

System-on-Chip (SoC) is THE driver/market for the
semiconductor AND system industries in the 21st
century.
Korea's expertise in DRAM and the memory business
needs to be connected via SoC to system
industries such as communications, automotive, consumer, and
medical/health.

65
Five reasons why SoC differs from ASIC