Compiler Issues for Embedded Processors - PowerPoint PPT Presentation

Description:

Title: voor dia serie SNS-Utrecth/'t Gooi Author: Carla Otten Last modified by: Created Date: 4/19/1995 10:16:14 AM Document presentation format – PowerPoint PPT presentation

Slides: 66
Provided by: Carla117
1
Compiler Issues for Embedded Processors
2
Contents
  • Compiler Design Issues
  • Problems of Compilers for Embedded Processors
  • Structure of typical C compiler
  • Front end
  • IR optimizer
  • Back end
  • Embedded-code optimization
  • Retargetable compiler

3
Compiler Design Issues
  • For embedded systems, the use of compilers is less
    common.
  • Designers still use assembly language to program
    many embedded applications, at the price of
  • Huge programming effort
  • Far less code portability
  • Poorer maintainability
  • Why is assembly programming still common?
  • The reason lies in embedded systems'
    high-efficiency requirements.

4
Problems of Compilers for Embedded Processors
  • Embedded systems frequently employ
    application-specific instruction set processors
    (ASIPs)
  • Meet design constraints more efficiently than
    general-purpose processors
  • E.g., performance, cost and power consumption
  • Building the required software development tool
    infrastructure for ASIPs is expensive and
    time-consuming
  • Especially true for efficient C and C++ compiler
    design, which requires a large amount of
    resources and expert knowledge.
  • Therefore, C compilers are often unavailable for
    newly designed ASIPs.

5
Problems of Compilers for Embedded Processors
  • Many existing compilers for ASIPs (e.g., DSPs)
    generate low-quality code.
  • Compiled code may be several times larger and/or
    slower than handwritten assembly code.
  • This poor code is virtually useless for
    efficiency reasons.

6
Problems of Compilers for Embedded Processors
  • The cause of the poor code quality is the highly
    specialized architecture of ASIPs, whose
    instruction sets can be incompatible with
    high-level languages and traditional compiler
    technology
  • This is because an instruction set is generally
    designed primarily from a hardware designer's
    viewpoint, and
  • the architecture is fixed before compiler issues
    are considered.

7
Problems of Compilers for Embedded Processors
  • Problems of compiler unavailability must be
    solved, because
  • Assembly programming will no longer meet short
    time-to-market requirements
  • Future human programmers are unlikely to
    outperform compilers
  • As processor architectures become increasingly
    complex (e.g., deep pipelining, predicated
    execution, and high parallelism)
  • Application programs should be machine-independent
    (e.g., written in C) to allow architecture
    exploration with various cost/performance tradeoffs.

8
Coarse structure of a typical C compiler
  1. Source code
  2. Front end (scanner, parser, semantic analyzer)
  3. Intermediate representation (IR)
  4. IR optimizer (constant folding, constant
    propagation, jump optimization, loop-invariant
    code motion, dead code elimination)
  5. Optimized IR
  6. Back end (code selection, register
    allocation, scheduling, peephole optimization)
  7. Assembly code
9
Front end
  • The front end translates the source program into
    a machine-independent IR
  • The IR is stored in a simple format such as
    three-address code
  • Each statement is either an assignment with at
    most three operands, a label, or a jump
  • The IR serves as a common exchange format between
    the front-end and the subsequent optimization
    passes, and also forms the back-end input

Example IR (MIR code)
    L1: i  <- i + 1
        t1 <- i + 1
        t2 <- p + 4
        t3 <- *t2
        p  <- t2
        t4 <- t1 < 10
        *r <- t3
        if t4 goto L1
10
Front end
  • The front end's main components
  • Scanner
  • Recognizes certain character strings in the source
    code
  • Groups them into tokens
  • Parser
  • Analyzes the syntax according to the underlying
    source-language grammar
  • Semantic analyzer
  • Performs bookkeeping of identifiers, as well as
    additional correctness checks that the parser
    cannot perform
  • Many tools (e.g., lex and yacc) that automate the
    generation of scanners and parsers are available
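
The scanner step described on this slide can be sketched in a few lines. This is only an illustration: the token classes below are hypothetical and cover a tiny C-like subset, whereas a real front end would typically be generated by a tool such as lex.

```python
import re

# Hypothetical token classes for a tiny C-like subset (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=<>]"),
    ("PUNCT",  r"[()\[\]{};,]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    """Group the characters of `source` into (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":          # whitespace is recognized but dropped
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(scan("A[2] = 3 * 5;"))
```

The parser would then consume this token list according to the source-language grammar.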

11
IR optimizer
  • The IR generated for a source program normally
    contains many redundancies
  • such as multiple computations of the same value
    or jump chains, because the front end does not
    pay much attention to optimization issues
  • The human programmer might also have built
    redundancies into the source code, which must be
    removed by subsequent optimization passes

12
IR optimizer
  • Constant folding
  • replaces compile-time constant expressions with
    their respective values
  • Constant propagation
  • Replaces variables known to carry a constant
    value with the respective constant
  • Jump optimization
  • Simplifies jumps and removes jump chains
  • Loop-invariant code motion
  • Moves loop-invariant computations out of the loop
    body
  • Dead code elimination
  • Removes computations whose results are never needed
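
Three of the passes listed above can be sketched on a toy three-address IR. This is a minimal illustration under stated assumptions, not how any particular compiler implements them: the tuple format `(dest, op, a, b)` is hypothetical, and the code is assumed to be straight-line with each temporary assigned once.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_and_propagate(code):
    """code: list of (dest, op, a, b); op is '+', '-', '*' or 'copy' (b unused).
    Assumes straight-line code in which each dest is assigned exactly once."""
    consts = {}
    out = []
    for dest, op, a, b in code:
        a = consts.get(a, a)               # constant propagation
        b = consts.get(b, b)
        if op in OPS and isinstance(a, int) and isinstance(b, int):
            a, op, b = OPS[op](a, b), "copy", None   # constant folding
        if op == "copy" and isinstance(a, int):
            consts[dest] = a               # dest is now a known constant
        out.append((dest, op, a, b))
    return out

def eliminate_dead_code(code, live_out):
    """Drop assignments whose results are never needed (backward scan)."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            kept.append((dest, op, a, b))
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
    return kept[::-1]

ir = [
    ("t1", "*", 3, 5),
    ("t2", "copy", "t1", None),
    ("t3", "+", "t2", "x"),
    ("t4", "*", "t1", 2),      # never used afterwards
]
optimized = eliminate_dead_code(fold_and_propagate(ir), live_out={"t3"})
print(optimized)
```

Here folding and propagation turn `t3` into `15 + x`, after which the assignments to `t1`, `t2`, and `t4` are dead and disappear.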

13
Example: Constant Folding
C example: an element of array A is assigned a
constant
    void f() { int A[10]; A[2] = 3 * 5; }
Unoptimized IR with two compile-time constant
expressions (C-like IR notation of the Lance compiler system)
    void f() {
      int A[10], t1, t3, *t5;
      char *t2, *t4;
      t1 = 3 * 5;         /* from the source code */
      t4 = (char *) A;
      t3 = 2 * 4;         /* array index 2 times the number of memory words */
      t2 = t4 + t3;
      t5 = (int *) t2;
      *t5 = t1;
    }
  • Now the IR optimizer can apply constant folding
    to replace both constant expressions by constant
    numbers, thus avoiding expensive computations at
    program runtime

14
IR optimizer
  • A good compiler consists of many such IR
    optimization passes.
  • Some of them are far more complex and require an
    advanced code analysis.
  • There are strong interactions and mutual
    dependences between these passes.
  • Some optimizations open up further opportunities
    for other optimizations,
  • so the passes should be applied repeatedly to be
    most effective

15
Back end
  • The back end (or code generator)
  • maps the machine-independent IR into a
    behaviorally equivalent machine-specific assembly
    program.
  • Statement-oriented IR is converted into a more
    expressive control/dataflow graph representation.
  • Front end and IR optimization technologies are
    quite mature but the back end is often the most
    crucial compiler phase for embedded processors.

16
Major back end passes
  • Code selection
  • maps IR statements into assembly instructions
  • Register allocation
  • assigns symbolic variables and intermediate
    results to the physically available machine
    registers
  • Scheduling
  • arranges the generated assembly instructions in
    time slots
  • considers inter-instruction dependencies and
    limited processor resources
  • Peephole optimization
  • relatively simple pattern-matching replacement of
    certain expensive instruction sequences by less
    expensive ones.
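
As an illustration of the last pass, a peephole optimizer can be sketched as a few pattern-matching rules over adjacent instructions. The mnemonics (`LOD`, `STO`, `MUL`, `ADD`, `SHL`), the tuple format, and the assumption that a shift is cheaper than a multiply are all hypothetical; condition flags are ignored.

```python
def peephole(instrs):
    """instrs: list of (opcode, dest, src) tuples for a hypothetical machine."""
    out = []
    for ins in instrs:
        op, dst, src = ins
        # Rule 1: reloading a register right after storing it to the same
        # location is redundant.
        if op == "LOD" and out and out[-1] == ("STO", dst, src):
            continue
        # Rule 2: adding the constant 0 has no effect (flags ignored here).
        if op == "ADD" and src == 0:
            continue
        # Rule 3: multiply by 2 becomes a shift left by 1 (assumed cheaper).
        if op == "MUL" and src == 2:
            ins = ("SHL", dst, 1)
        out.append(ins)
    return out

before = [
    ("LOD", "R1", "B"),
    ("MUL", "R1", 2),
    ("STO", "R1", "A"),
    ("LOD", "R1", "A"),
    ("ADD", "R1", 0),
    ("STO", "R1", "C"),
]
after = peephole(before)
print(after)
```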

17
Back end passes for embedded processors
  • Code selection
  • To achieve good code quality, it must use
    complex instructions
  • multiply-accumulate(MAC), load-with-autoincrement,
    etc.
  • Or it must use subword-level instructions (which have
    no counterpart in high-level languages)
  • SIMD and network processor architectures
  • Register allocation
  • Utilize a special-purpose register architecture
    to avoid having too many stores and reloads
    between registers and memory
  • If the back end uses only traditional code
    generation techniques, the resulting code quality
    may be unacceptable

18
Example: Code selection with MAC instructions
  1. Dataflow graph (DFG) representation of a simple
    computation
  2. Conventional tree-based code selectors must
    decompose the DFG into two separate trees (linked
    by a temporary variable) and therefore fail
    to exploit the MAC instructions
  3. Covering all DFG operations with only two MAC
    instructions requires the code selector to consider
    the entire DFG
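
The pattern-matching idea behind MAC selection can be sketched on a plain expression tree. This is a deliberately simplified illustration: it only handles trees in which the multiply is the right operand of the add, and it does not attempt the graph covering with shared nodes that the slide says is needed for a full DFG. The tuple encoding and temporary names are hypothetical.

```python
def select(node, code):
    """Emit instructions for an expression tree and return the name that
    holds the result. A node is a variable name (str) or a tuple
    (op, left, right) with op in {'+', '*'}."""
    if isinstance(node, str):
        return node
    op, left, right = node
    # Complex pattern: x + (a * b) is covered by a single MAC instruction.
    if op == "+" and isinstance(right, tuple) and right[0] == "*":
        acc = select(left, code)
        a = select(right[1], code)
        b = select(right[2], code)
        dest = f"t{len(code)}"
        code.append(("MAC", dest, acc, a, b))    # dest = acc + a * b
        return dest
    # Fallback: one simple instruction per operator.
    l = select(left, code)
    r = select(right, code)
    dest = f"t{len(code)}"
    code.append(("MUL" if op == "*" else "ADD", dest, l, r))
    return dest

# (acc + a*b) + c*d
expr = ("+", ("+", "acc", ("*", "a", "b")), ("*", "c", "d"))
code = []
result = select(expr, code)
print(code)
```

With the MAC pattern the four operators are covered by two instructions; without it, the same tree would need two multiplies and two adds.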

19
Example: Register Allocation
Source program
    A = B + C * D
    B = A - C * D
Simple register allocation
    LOD R1, C
    MUL R1, D
    STO R1, Temp1
    LOD R1, B
    ADD R1, Temp1
    STO R1, A
    LOD R1, C
    MUL R1, D
    STO R1, Temp2
    LOD R1, A
    SUB R1, Temp2
    STO R1, B
Smart register allocation
    LOD R1, C
    MUL R1, D      ; C * D
    LOD R2, B
    ADD R2, R1     ; B + C * D
    STO R2, A
    SUB R2, R1     ; A - C * D
    STO R2, B
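
To check that the two instruction sequences of this slide are equivalent, and to see where the saving comes from, the code can be run through a tiny interpreter. The instruction semantics are assumed (first operand is a register; second operand is a register or a memory name), and the input values are arbitrary.

```python
SIMPLE = """
LOD R1, C
MUL R1, D
STO R1, Temp1
LOD R1, B
ADD R1, Temp1
STO R1, A
LOD R1, C
MUL R1, D
STO R1, Temp2
LOD R1, A
SUB R1, Temp2
STO R1, B
"""

SMART = """
LOD R1, C
MUL R1, D
LOD R2, B
ADD R2, R1
STO R2, A
SUB R2, R1
STO R2, B
"""

def run(program, memory):
    """Execute the program; return (final memory, number of memory accesses)."""
    mem, regs, accesses = dict(memory), {}, 0
    for line in program.strip().splitlines():
        op, operands = line.split(None, 1)
        reg, src = (s.strip() for s in operands.split(","))
        if op == "STO":
            mem[src] = regs[reg]
            accesses += 1
            continue
        if src in regs:                    # register operand
            value = regs[src]
        else:                              # memory operand
            value = mem[src]
            accesses += 1
        if op == "LOD":
            regs[reg] = value
        elif op == "ADD":
            regs[reg] += value
        elif op == "SUB":
            regs[reg] -= value
        elif op == "MUL":
            regs[reg] *= value
    return mem, accesses

start = {"A": 0, "B": 2, "C": 3, "D": 4}
for name, prog in (("simple", SIMPLE), ("smart", SMART)):
    mem, accesses = run(prog, start)
    print(name, "A =", mem["A"], "B =", mem["B"], "memory accesses =", accesses)
```

Both versions compute the same A and B; the smart allocation keeps C * D in R1 and so avoids the temporaries and most of the memory traffic.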
20
Embedded-code optimization
  • Dedicated code optimization techniques
  • Single-instruction, multiple-data instructions
  • Recent multimedia processors use SIMD
    instructions, which operate at the subword level
    (e.g., Intel MMX)
  • Address generation units (AGUs)
  • Allow address computation in parallel with
    regular computations in the central datapath
  • Good use of AGUs is mandatory for high code
    quality
  • Code optimization for low power
  • In addition to performance and code size, power
    efficiency is increasingly important
  • Must obey heat dissipation constraints and make
    efficient use of battery capacity in mobile systems
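
What "operating at the subword level" means can be shown in software: the sketch below adds four unsigned 8-bit lanes packed into one 32-bit word, with each lane wrapping independently. It emulates the effect of a packed-add instruction with ordinary integer operations; it is not the MMX instruction set itself, and the function name is made up for this example.

```python
LOW7 = 0x7F7F7F7F    # low 7 bits of every 8-bit lane
HIGH = 0x80808080    # top bit of every 8-bit lane

def packed_add_u8(x, y):
    """Add four unsigned 8-bit lanes packed in 32-bit words, wrapping per lane."""
    # Adding only the low 7 bits of each lane cannot carry into the next lane;
    # the top bit of each lane is then fixed up with XOR.
    return ((x & LOW7) + (y & LOW7)) ^ ((x ^ y) & HIGH)

print(hex(packed_add_u8(0x01FF7F10, 0x01010120)))
```

A compiler has to discover that four independent byte additions in the source can be mapped onto one such operation, which is why the slide lists SIMD support as a dedicated optimization.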

21
Embedded-code optimization
  • Dedicated code optimization techniques (contd)
  • Code optimization for low power (contd)
  • Compiler can support power savings
  • Generally, the shorter the program runtime, the
    less energy is consumed
  • Energy-conscious compilers, armed with an energy
    model of the target machine, give priority to the
    lowest energy-consuming instruction sequences
  • Since a significant portion of energy is spent on
    memory accesses, another option is to move
    frequently used blocks of program code or data
    into efficient cache or on-chip memory
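
A minimal sketch of the energy-model idea: given several equivalent instruction sequences, pick the one with the lowest modeled energy. The per-instruction costs below are invented for illustration; a real energy model would come from measurements or a power model of the target machine.

```python
# Hypothetical per-instruction energy costs in nanojoules (illustrative only).
ENERGY_NJ = {"MUL": 4.0, "SHL": 1.0, "ADD": 1.2, "LOD": 3.5, "STO": 3.5}

def energy(sequence):
    """Modeled energy of a sequence of opcodes."""
    return sum(ENERGY_NJ[op] for op in sequence)

def pick_lowest_energy(candidates):
    """Return the candidate sequence with the lowest modeled energy."""
    return min(candidates, key=energy)

# Two ways to compute x * 5: one multiply, or (x << 2) + x.
candidates = [["MUL"], ["SHL", "ADD"]]
best = pick_lowest_energy(candidates)
print(best, energy(best))
```

The memory-access costs in the table also hint at the second point on the slide: sequences with fewer loads and stores score better under such a model.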

22
Retargetable compiler
  • To support fast compiler design for new
    processors and hence support architecture
    exploration, researchers have proposed
    retargetable compilers
  • A retargetable compiler can be modified to
    generate code for different target processors
    with few changes to its source code.

23
Example: CHESS/CHECKERS Retargetable Tool Suites
  • CHESS/CHECKERS
  • is a retargetable tool-suite for flexible
    embedded processors in electronic systems.
  • supports both the design and the use of embedded
    processors. These processors form the heart of
    many advanced systems in competitive markets like
    telecom, consumer or automotive electronics.
  • is developed and commercialized by Target
    Compiler Technologies.

http://www.retarget.com
26
ASIP (Application-Specific Instruction Set
Processor) Design
27
Reference
  • J.H. Yang et al., MetaCore: An Application-Specific
    DSP Development System, 1998 DAC Proceedings,
    pp. 800-803.
  • J.H. Yang et al., MetaCore: An Application-Specific
    Programmable DSP Development System, IEEE
    Trans. VLSI Systems, vol. 8, April 2000,
    pp. 173-183.
  • B.W. Kim et al., MDSP-II: 16-bit DSP with Mobile
    Communication Accelerator, IEEE JSSC, vol. 34,
    March 1999, pp. 397-404.

28
Part I: ASIP in General
  • An ASIP is a compromise between a GPP (General-Purpose
    Processor), which can be used anywhere but with low
    performance, and a full-custom ASIC, which fits only
    a specific application but with very high
    performance.
  • GPP, DSP, ASIP, FPGA, ASIC (sea of gates),
    CBIC (standard cell-based IC), and full-custom
    ASIC are listed in order of increasing performance
    and decreasing adaptability.
  • Recently, ASICs as well as FPGAs contain processor
    cores.

29
Cost, Performance, Programmability, and
TTM (Time-to-Market)
  • ASIP (Application-Specific Instruction set
    Processor)
  • ASIP is a tradeoff between the advantages of
    general-purpose processor (flexibility, short
    development time) and those of ASIC (fast
    execution time).

[Figure: general-purpose processor, ASIP, and ASIC compared along four axes: execution time, cost (NRE + chip area; depends on volume of product), rigidity, and development time]
30
Comparison of Typical Development Time
  • MetaCore (ASIP)
  • Chip manufacturer time: 20 months (MetaCore development)
  • Customer time: 3 months (core generation + application code development)
  • General-purpose processor
  • Chip manufacturer time: 20 months (core generation)
  • Customer time: 2 months (application code development)
  • ASIC
  • 10 months
31
Issues in ASIP Design
  • For high execution speed, flexibility and small
    chip area
  • An optimal selection of micro-architecture and
    instruction set is required, based on diverse
    exploration of the design space.
  • For short design turnaround time
  • An efficient means of transforming higher-level
    specification into lower-level implementation is
    required.
  • For friendly support of application program
    development
  • Fast development of a suite of supporting
    software, including a compiler and an ISS (Instruction
    Set Simulator), is necessary.

32
Various ASIP Development Systems
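Per system: instruction set customization (selection from predefined superset / user-defined instructions), application programming level, year
  • RISC-like micro-architecture (register-based operation)
  • PEAS-I (Univ. Toyohashi): selection from predefined superset: yes;
    user-defined instructions: no; C language; 1991
  • ASIA (USC): generates a proper instruction set based on a
    predefined datapath; C language; 1993
  • DSP-oriented micro-architecture (memory-based operation)
  • EPICS (Philips): selection from predefined superset: yes;
    user-defined instructions: no; assembly; 1993
  • CD2450 (Clarkspur): selection from predefined superset: yes;
    user-defined instructions: no; assembly; 1995
  • MetaCore (KAIST): selection from predefined superset: yes;
    user-defined instructions: yes; C language; 1997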
33
Part II: MetaCore System
  • Verification with co-generated compiler and ISS
  • MetaCore system
  • ASIP development environment
  • Re-configurable fixed-point DSP architecture
  • Retargetable system software
  • C-compiler, ISS, assembler
  • MDSP-II: a 16-bit DSP targeted for GSM
    applications.

34
The Goal of MetaCore System
  • Supports an efficient design methodology for ASIPs
    targeted at the DSP application field.

35
Overview: How to Obtain a DSP Core from MetaCore
System
  • Architecture template: bus structure, data-path
    structure, pipeline model, ...
  • Functional blocks: adder, multiplier, shifter, ...
  • Instructions
  • Primitive class: add, sub, and, or, ...
  • Optional class: min, max, mac, ...
  • Design flow
  1. Select architectural parameters, instructions, and
    functional blocks
  2. Simulate with benchmark programs
  3. If the result is not OK: modify the architecture, add
    or delete instructions, add or delete functional
    blocks, and simulate again
  4. If the result is OK: HDL code generation, followed by
    logic synthesis
36
System Library and Generator Set: Key Components
of MetaCore System
  • Design loop
  1. Inputs: processor specification and benchmark programs
  2. The compiler generator and ISS generator produce a
    C compiler and an ISS
  3. Simulation and evaluation
  4. If changes are needed: modify the specification, or
    add instructions or functional blocks, and repeat
  5. If accepted: the HDL generator produces
    synthesizable HDL code
  • Generator set: compiler generator, ISS generator, HDL generator
  • System library
  • Architecture template: bus structure, pipeline model,
    data-path structure
  • Set of instructions: instruction definitions, related
    functional blocks
  • Set of functional blocks: parameterized HDL code, I/O
    port information, gate count
37
Processor Specification (example)
  • Specification of target core
  • defines the instruction set and hardware configuration.
  • is easy for the designer to use and modify due to
    high-level abstraction.

    //Specification of EM1
    //Hardware configuration
    (hardware
      ACC 1
      AR 4
      pmem 2k, 2047 0
    )
    //Instruction set definition
    (def_inst ADD
      (operand type2)
      (ACC <= ACC + S1)
      (extension sign)
      (flag cvzn)
      (exestage 1)
    )
38
Benchmark analysis
  • is necessary for deciding the instruction set.
  • produces information on
  • the frequency of each instruction, to obtain a
    cost-effective instruction set.
  • frequent sequences of contiguous instructions,
    which can be merged into application-specific
    instructions.

Example (meaning; original code; code using the application-specific instruction)
    a0 = |mem[ar1]|         abs a0, ar1    abs a0, ar1
    a1 = 0                  clr a1         clr a1
    a1 = a1 + mem[ar2]      add a1, ar2    add a1, ar2
    compare a1 with a0      cmp a1, a0     max a1, a0    ; a1 = max(a1, a0)
    if (a1 > a0) pc = L1    bgtz L1
    a1 = 0                  clr a1
    a1 = a1 + a0            add a1, a0
    L1:                     L1:
Frequent sequence of contiguous instructions: cmp, bgtz, clr, add
Application-specific instruction that replaces it: max
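
The two kinds of information this analysis produces can be sketched with simple counting over an instruction trace. The trace format (a flat list of opcode mnemonics) and the window length are assumptions made for this illustration; the slides do not describe how MetaCore collects its statistics.

```python
from collections import Counter

def analyze(trace, n):
    """Return (usage count per opcode, count per contiguous n-opcode sequence)."""
    usage = Counter(trace)
    sequences = Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    return usage, sequences

# Hypothetical trace: the loop body from the slide executed three times.
trace = ["abs", "clr", "add", "cmp", "bgtz", "clr", "add"] * 3
usage, sequences = analyze(trace, n=4)

print(usage.most_common())
print(sequences[("cmp", "bgtz", "clr", "add")])
```

Opcodes with a low usage count are candidates for removal, and a sequence such as `cmp, bgtz, clr, add` that recurs often is a candidate for a new application-specific instruction.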
39
HDL Code Generator
40
Design Example (MDSP-II)
  • Target application: GSM (Global System for Mobile
    communication)
  • Benchmark programs
  • C programs (one for each algorithm that makes up GSM)
  • Procedure of design refinement
  1. EM0: initial design containing all predefined
    instructions
  2. EM0 to EM1: remove infrequent instructions based
    on instruction usage count
  3. EM1 to EM2: turn frequent sequences of contiguous
    instructions into new instructions
  4. EM2 (MDSP-II): final design containing
    application-specific instructions

41
Evolution of MDSP-II Core from the Initial Machine
    Machine               Clock cycles (for 1 sec. voice data processing)   Gate count
    EM0 (initial)         53.0 million                                      18.1K
    EM1 (intermediate)    53.1 million                                      15.0K
    EM2 (MDSP-II)         27.5 million                                      19.3K
[Figure: number of clock cycles (10M to 50M) plotted against gate count (5K to 20K) for EM0, EM1, and EM2 (MDSP-II)]
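
The trade-off in these numbers is easier to see as relative changes against EM0. The short calculation below uses only the cycle and gate counts given on this slide; the helper name is made up for the example.

```python
# (clock cycles for 1 s of voice data, gate count), taken from the slide
MACHINES = {
    "EM0": (53.0e6, 18.1e3),
    "EM1": (53.1e6, 15.0e3),
    "EM2": (27.5e6, 19.3e3),
}

def change_vs(baseline, machine):
    """Percentage change in (cycles, gates) of `machine` relative to `baseline`."""
    base_cycles, base_gates = MACHINES[baseline]
    cycles, gates = MACHINES[machine]
    return (100 * (cycles / base_cycles - 1), 100 * (gates / base_gates - 1))

for name in ("EM1", "EM2"):
    d_cycles, d_gates = change_vs("EM0", name)
    print(f"{name}: cycles {d_cycles:+.1f}%, gates {d_gates:+.1f}%")
```

Removing infrequent instructions (EM1) cuts the gate count by about 17% at almost no cycle cost, while adding application-specific instructions (EM2) roughly halves the cycle count for about 7% more gates than EM0.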
42
Design Turnaround Time (MDSP-II)
  • Design turnaround time is significantly reduced due
    to the reduction of HDL design and functional
    simulation time.
  • Only hardware blocks for application-specific
    instructions, if any, need to be designed by the
    user.

43
Overview of EM2 (MDSP-II)
  • 16-bit fixed-point DSP
  • Optimized for GSM
  • 0.6 µm CMOS (TLM), 9.7 mm x 9.8 mm
  • 55 MHz @ 5.0 V
  • MCAU (Mobile Comm. Acceleration Unit): consists of
    functional blocks for application-specific
    instructions
  • 16x16 multiplier
  • 32-bit adder
  • DALU (Data Arithmetic Logic Unit)
  • 16x16 multiplier
  • 16-bit barrel shifter
  • 32-bit adder
  • Data switch network
  • PCU (Program Control Unit)
  • AGU (Address Generation Unit): supports linear,
    modulo and bit-reverse addressing modes
  • PU (Peripheral Unit)
  • Serial I/O
  • Timer
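
The modulo and bit-reverse addressing modes mentioned for the AGU can be illustrated in software. This sketch shows the address arithmetic only; the function names and parameters are hypothetical and say nothing about how the MDSP-II hardware implements them.

```python
def modulo_next(addr, step, base, length):
    """Next address in a circular buffer occupying [base, base + length)."""
    return base + (addr - base + step) % length

def bit_reverse(index, bits):
    """Reverse the low `bits` bits of `index` (the reordering used by FFTs)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

# Walk a 4-word circular buffer at 0x100 with a step of 2, starting at 0x103.
print(hex(modulo_next(0x103, 2, base=0x100, length=4)))
# Access order for an 8-point FFT.
print([bit_reverse(i, 3) for i in range(8)])
```

An AGU performs this kind of update in parallel with the datapath, so filters and FFTs written with circular or bit-reversed indexing need no extra instructions for address computation.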
44
Conclusions
  • MetaCore, an effective ASIP design methodology
    for DSP, is proposed.
  • 1) Benchmark-driven high-level abstraction of
    processor specification enables performance/cost
    effective design.
  • 2) Generator set with system library enables
    short design turnaround time.

45
Grand Challenges and Opportunities laid by SoC
for Korea
  • SoC Conference
  • Oct. 23-24, Coex Conference Center

46
  • Can the success story be continued?

47
Can the success story be continued?
  • 60??? per capita GNP ? 100 ?? ? ?? ??.
  • ??? ???, ???, ???? ?? IT ???? Global Player ? ??
    ??.
  • We need to be proud of our success despite all of
    today's agony on NASDAQ and the terrible political
    situation. More importantly, however, we need to know
    why we succeeded and how this success can be continued.

48
  • What is critical for success in SoC Business?

49
?? ?? Game ??? Rule ? ???.
  • ??? ??? ??? ????? ?????(game rule) ??? ??? ???
    ???? ?? ?? ???? ???. 50? ??? ?? ??. ????? ? ??
    ?????? ???.(3? ?? ?, ??, ??)

50
It's people, people!
  • Internet ? ??? ??? ??? ??, ??/??? ??? ????,
    TTM(Time-to-market) ? key value ? ???. Dynamic ?
    ???? ?? ?? ??? ??? resource (designer,IP,tool) ?
    ???? ???? deliver ?? ?? ??? ??? ???? ??? ?? ??
    ????? ?? ?? ???? ????? ???.

51
  • ?? ??? ??/??? ??? ?

52
?? ??? ??????
  • ???? SoC ? ??? ??? ?? ??, ??? ?? ?? ??? ?? ??.
  • ???, ?? ?? (???? ?? ????. ????? ????, ??? ??
    3-6??? ????? synapse interconnection)
  • ???? ???? ?????? ?? ??.
  • ???, ?? ?? (????/????, ????/?? ???)
  • system house ?? ?? ?? (? ?? idea source)

53
SoC type
  • ??? ??? ??(???),
  • ?? ??? ? ??/????(????),
  • ??? ???, ???? ??? ??? ??(?? ????)

54
?? ??? ???? AND type,or OR type?
  • AND or OR?
  • ??? ?? ???. ?? ?? ?? ???. ?? ?? ???? ???? ??.
  • ??? ??? ?? ?? ?? ??? ?? ? ? ??. ?????? dribble ??
    ?? ?? world cup ?? ? ???.
  • ??? ??? ??? ??. 2-??? ??? 2 ??? ???? ?? ????.
    3-??? 3-9?? ??? ???.
  • (???,??),(???,??),(???,???),(???,???),(??? ???,??
    ??,??)

55
Can deep thinkers cooperate?
  • ???? ???? ??? ? ????
  • ??? ???. ???, ??? ? ? ??? reward/???? ?? ??.
  • ??? ??? ??.(????, ?? ??? ?? ?? ?? ??? ?? ?.)
  • ? Bismarck ??? ??? shy man.
  • ?????????motivation ? ????? ??? ???????
  • ?????? ???? ?? ???.

56
Can deep thinkers cooperate?
57
Big harvest comes later
58
??? Fundamental ???
  • Fundamental Growth Power ? ??
  • Growth Power ??? X ??? X ???
  • ??? ??? ????

59
  • ?? (???? ????) ????? ?????? ? ? ??? ??. ?????
    ??? ? ?? ??.
  • To sustain growth in a dynamic environment, you
    need a sharp edge AND a stable enough bottom.
    (The bottom is a tool that lets the edge, which is
    the objective, work better.)

60
Back to the Basics!
  • There are unique roles to be played by each party
    of Government, University, and Industry.
  • Government must do long-term/global planning and
    evaluation/resource allocation, and maintain national
    research labs to perform research in areas such as
    basic science, health, environment, and defense.
  • University must excel in fundamentals and
    future-oriented research.
  • Industry ??/??? ???

61
  • strong interaction and cooperation between
  • Government and private sector
  • Industry and academia
  • System industry and IC industry
  • Hardware designers and software designers
  • IC industries in the pre-competitive stage

62
  • Korean Semiconductor Industry
  • Challenges now faced

63
Challenges now faced by Korean Semiconductor
Industry
  • SWOT Analysis
  • Strength: zeal for learning, venture mind
    (can-do spirit)
  • Weakness: strong trend to escape from technology
    careers in favor of becoming a lawyer, doctor, or star
  • Threat: China's rush, protectionism in each
    region
  • What opportunities?

64
Future depends on dealing with OPPORTUNITIES of
SoC
  • System-on-Chip (SoC) is THE driver/market for the
    semiconductor AND system industry in the 21st
    century.
  • Korea's expertise in DRAM and memory business
    needs to be connected via SoC to system
    industries like communication, car, consumer,
    medical/health.

65
Five reasons why SoC differs from ASIC
  • SoC is not just a BIG ASIC (Application-Specific
    IC): SoC needs a culture change
  • Design Reuse
  • Software embedded in IC chip
  • System-level design methodology
  • Heterogeneous mix of technology
  • memory, processor, MEMS (Micro Electro-Mechanical
    System), and others
  • VDSM (Very Deep Sub-Micron) effects for design
    and manufacturing