Advanced Processor Architectures for Embedded Systems - PowerPoint PPT Presentation

About This Presentation
Title:

Advanced Processor Architectures for Embedded Systems

Slides: 34
Provided by: wittys
Learn more at: http://cse.unl.edu

Transcript and Presenter's Notes

Title: Advanced Processor Architectures for Embedded Systems


1
Advanced Processor Architectures for Embedded
Systems
  • Witawas Srisa-an
  • CSCE 496 Embedded Systems Design and
    Implementation

2
Objectives
  • Discuss ASIC, FPGA-based systems, and general
    purpose processors
  • Analyze the operating requirements for today's
    embedded processors
  • Observe the architectural differences between
    state-of-the-art processors for embedded systems
    and high-performance general purpose processors
  • Tensilica Xtensa
  • Stretch S5000

3
Embedded Processors Requirements
  • operate in memory-constrained environments
  • must be energy efficient
  • must be low cost
  • may have to be good at a common set of tasks
  • matrix multiplication,
  • encryption,
  • filtering (FIR),
  • network packet processing, etc.
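
As a rough illustration of one task from the list above, an FIR filter computes each output sample as a weighted sum of the most recent input samples. The sketch below is a plain software model; the function name `fir_filter` and the 3-tap coefficients are assumptions chosen for illustration and do not come from the slides.

```python
def fir_filter(samples, coeffs):
    """Return y[n] = sum over k of coeffs[k] * samples[n - k].

    Samples before n = 0 are treated as zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0
        # This multiply-accumulate (MAC) loop is the part that DSP units
        # or custom hardware typically accelerate.
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        output.append(acc)
    return output


# Illustrative integer coefficients (an unnormalized 3-tap moving sum).
print(fir_filter([1, 2, 3, 4], [1, 1, 1]))  # prints [1, 3, 6, 9]
```

The inner loop runs once per coefficient for every sample, which is why a task like this is a natural candidate for a function-specific component.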

4
Implications
  • low memory footprint
  • simplified instruction set
  • 16-bit, 24-bit
  • may not need support for VM
  • may lack hardware MMUs
  • energy efficient
  • less complex (smaller number of transistors)
  • simple pipeline stages
  • less cache memory on chips
  • simple floating point units
  • larger transistors and slower clocks
  • integrated function specific components for
    common tasks

5
Implications (cont.)
  • low cost
  • share IP cores to reduce development cost
  • ARM, MIPS, etc.
  • use older semiconductor process technologies
    (e.g. 250 nm instead of 90 nm)
  • task specific
  • built in DSP unit
  • wide data bus (more data per movement)
  • may need support for adding functions to the
    cores
  • may need field-reconfigurability

6
Rationales
from "The Death of Micro-Processors," Nick
Tredennick and Brion Shimamoto, Embedded Systems
Programming,
http://www.embedded.com/showArticle.jhtml?articleID=26807160
7
Rationales (cont.)
from "The Death of Micro-Processors," Nick
Tredennick and Brion Shimamoto, Embedded Systems
Programming,
http://www.embedded.com/showArticle.jhtml?articleID=26807160
8
Rationales (cont.)
  • Studies have shown that custom hardware
    components often require much less energy to
    complete their tasks than the same tasks running
    on general purpose processors. [1]
  • An ASIC is custom logic for a particular
    application. Custom logic can be orders of
    magnitude more efficient than microprocessor-based
    solutions. [2]

[1] Lach et al., "Power-Efficient Adaptable
Wireless Sensor Networks," Proceedings of the
International Conference on Military and
Aerospace Programmable Logic Devices (MAPLD),
September 2003.
[2] Tredennick and Shimamoto, "The Death of
Micro-Processors," Embedded Systems Programming,
http://www.embedded.com/showArticle.jhtml?articleID=26807160
9
Application Specific ICs (ASICs)
  • provide custom design solutions for particular
    problems
  • fixed solutions that need broad market acceptance
    (high volume) to reduce cost
  • require extensive knowledge of hardware design
  • not field-reconfigurable
  • can have large non-recurring engineering (NRE)
    cost

10
ASICs (cont.)
Technology   Mask cost (US$)
90 nm        1,000,000
180 nm       250,000
250 nm       120,000
350 nm       60,000
Source: Wayne Wolf, FPGA-Based System Design, Prentice
Hall, 2004
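
To make the NRE trade-off concrete, the sketch below estimates the production volume at which an ASIC becomes cheaper than an FPGA. It uses the mask costs from the table above as the only NRE and assumes the FPGA has none. The per-unit prices (5 for the ASIC, 25 for the FPGA, in the table's currency units) and the function name `break_even_volume` are hypothetical values for illustration only.

```python
import math


def break_even_volume(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Smallest volume at which total ASIC cost is no more than total FPGA cost.

    Solves asic_nre + n * asic_unit_cost <= n * fpga_unit_cost for n.
    """
    if fpga_unit_cost <= asic_unit_cost:
        return None  # the ASIC never recovers its NRE
    return math.ceil(asic_nre / (fpga_unit_cost - asic_unit_cost))


# Mask costs from the table above, keyed by process node in nm.
MASK_COST = {90: 1_000_000, 180: 250_000, 250: 120_000, 350: 60_000}

# Hypothetical unit costs: 5 per ASIC, 25 per FPGA.
for node_nm, nre in MASK_COST.items():
    print(node_nm, "nm:", break_even_volume(nre, 5, 25), "units")
```

Under these assumed prices the 90 nm design needs 50,000 units to break even, while the 350 nm design needs only 3,000, which is one reason low-cost embedded parts often stay on older process technologies.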
11
FPGA Based Systems
  • Field-programmable gate arrays (FPGAs)
  • are slower and require more power than custom
    design
  • are more expensive
  • but require no wait between completing a design
    and having a working chip
  • great for prototyping
  • are also reusable

12
FPGAs
  • SRAM based--volatile
  • Altera Flex, Stratix, Cyclone, Apex
  • Antifuse--one-time programmable
  • Actel
  • EEPROM--non-volatile
  • Altera Max

13
ASIC Design Approaches
  • Custom VLSI designs
  • are fabricated on a manufacturing line
  • take months to fabricate
  • masks are also expensive
  • operate much faster and consume less power than
    FPGA equivalents
  • can be cheaper if manufactured in large volume

14
ASIC Design Approaches (cont.)
  • Structured ASIC
  • is based on pre-designed logic fabric
    structurally embedded in the platform
  • fills the market gap between high-density FPGAs
    and standard cell ASICs
  • can greatly reduce development time and cost
  • reduces non-recurring engineering (NRE) cost
  • http://www.amis.com/asics/structured_asics/
  • http://www.altera.com/b/hardcopyii.html?WT.mc_idh2_sm_go_xx_tx_2_041WT.srch1

15
Structured ASICs
View Altera demo
16
Integrating ASICs with GPPs
  • Today's embedded systems can have complex
    software layers
  • OS
  • Virtual Machine
  • Applications
  • It is often preferable to pair GPPs with ASICs as
    co-processors

17
Integrating ASICs with GPPs (cont.)
  • So, we can have GPPs perform basic tasks and
    ASICs (co-processors) speed up compute-intensive
    functions
  • sounds simple but in reality, it is quite complex
  • basic hand-shaking is needed between the ASICs
    and the main processors
  • data exchange
  • shared memory
  • requires OS and architecture support
  • synchronous or asynchronous calls
  • cache coherency issue

18
ASICs and GPPs (cont.)
  • An example is using a hardware co-processor for
    cryptography
  • should the co-processor calls be synchronous?
  • main processor blocks on each call and waits for
    the response
  • or asynchronous?
  • calling process is blocked and swapped out
  • need interrupt support
  • need to maintain context
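
The two calling styles above can be sketched in software. In this minimal model a worker thread stands in for the crypto co-processor and a completion callback stands in for the interrupt handler. Everything here (`coprocessor_digest`, `call_sync`, `call_async`, and the use of SHA-256) is an assumption made for illustration, not an API from the slides.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib


def coprocessor_digest(data: bytes) -> str:
    """Stand-in for the hardware crypto engine (here plain software SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def call_sync(data: bytes) -> str:
    """Synchronous call: the caller does nothing else until the result is back."""
    return coprocessor_digest(data)


def call_async(executor, data: bytes, on_done):
    """Asynchronous call: returns at once; on_done plays the interrupt handler."""
    future = executor.submit(coprocessor_digest, data)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future


results = []
with ThreadPoolExecutor(max_workers=1) as executor:
    pending = call_async(executor, b"packet", results.append)
    other_work = sum(range(1000))  # the main processor keeps running meanwhile
    pending.result()               # wait only when the result is needed
# Leaving the with-block joins the worker thread, so the callback has run.
print(results[0] == call_sync(b"packet"))
```

The asynchronous path is where the extra requirements come from: something must signal completion (the callback here, an interrupt in hardware) and the caller's context must be kept until the result arrives.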

19
ASICs and GPPs (cont.)
  • Co-processor
  • shares bus with the main CPU
  • is a source for bus contention
  • can cause cache coherency issues
  • data cached by the main CPU may have been updated
    in memory by the co-processor
  • flush or invalidate the cache accordingly
  • should be equipped with DMA to relieve the main
    CPU from copying data
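
The coherency problem can be shown with a toy model: the CPU reads through a private cache, while the co-processor writes results straight to shared memory by DMA. The `ToySystem` class and its method names are hypothetical. Real hardware would use cache-maintenance instructions or a coherent interconnect instead.

```python
class ToySystem:
    """One CPU cache in front of a shared memory that a co-processor writes via DMA."""

    def __init__(self):
        self.memory = {}
        self.cpu_cache = {}

    def cpu_read(self, addr):
        if addr not in self.cpu_cache:      # miss: fill the line from memory
            self.cpu_cache[addr] = self.memory.get(addr, 0)
        return self.cpu_cache[addr]

    def dma_write(self, addr, value):
        self.memory[addr] = value           # bypasses the CPU cache entirely

    def invalidate(self, addr):
        self.cpu_cache.pop(addr, None)      # force the next read to refetch


system = ToySystem()
system.memory[0x100] = 1
first = system.cpu_read(0x100)   # caches the value 1
system.dma_write(0x100, 2)       # co-processor stores its result
stale = system.cpu_read(0x100)   # still 1: the cached copy is out of date
system.invalidate(0x100)
fresh = system.cpu_read(0x100)   # 2 once the stale line is discarded
print(first, stale, fresh)       # prints 1 1 2
```

The model covers only the read side. A write-back cache would also need dirty lines flushed to memory before the co-processor reads its input.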

20
Extending GPPs
  • Tensilica Xtensa
  • reconfigurable processor cores
  • supports native 16-bit and 24-bit instructions for
    higher code density
  • users can add/subtract components (MMU,
    Multipliers, FPUs)
  • users can reconfigure cache organization
  • users can select bus width (32, 64, or 128 bits)
  • user-defined instructions via an instruction
    extension language
  • users can create custom instructions to speed up
    commonly used functions
  • users can instantiate custom registers of
    different sizes

21
Tensilica Xtensa
from http://www.tensilica.com/html/tensilica_instruction_extensio.html
22
Tensilica Xtensa (cont.)
  • We will not go into great detail about the
    Xtensa.
  • However, we will study the Stretch S5000 engine,
    which is based on the Xtensa core.

23
Design Time Solutions
  • Up to now, we have only talked about design-time
    solutions!
  • logic designs are done in house
  • not very reconfigurable after the chip is made
  • even with FPGAs, someone has to come up with a
    new hardware design for it to change
  • the Xtensa needs about 1 hour to synthesize the
    instruction extension
  • What if we want to configure on the fly?
  • each application brings in CPU-intensive
    functions
  • these functions are not known in advance
  • Can we leave it up to the software developers to
    design fast co-processors?

24
Run-Time Configuration
25
(R)evolution of Processors
Ice Hard
Rock Hard
Playdough Hard
26
(R)evolution of Processors
Ice Hard
Hardwire, GPP
Perform well in most conditions but not extreme
conditions
Rock Hard
Playdough Hard
27
(R)evolution of Processors
Ice Hard
GPP with FPGAs
Custom designs perform well in some extreme
conditions. Require extensive knowledge of
hardware design
Rock Hard
Playdough Hard
28
(R)evolution of Processors
Ice Hard
Rock Hard
GPP with embedded programmable logic
Playdough Hard
Reconfiguration triggered by software
29
(R)evolution of Processors
  • Ice Hard
  • Contains ASIC (Application Specific IC) designs
  • Increases time-to-market
  • Takes time to reconfigure

30
Software Hotspots
  • In DSP
  • 80% of the processing load is spent on 20% of
    the code
  • Hand-tuned assembly that can take thousands of
    cycles to execute
  • Less portable
  • The remaining 80% of the code has complex system
    functions
  • Run well on most GPP
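
The 80/20 split above bounds how much a hotspot accelerator can help. The sketch below applies Amdahl's law, assuming the hotspot accounts for 80% of the processing load and that only the hotspot is accelerated. The function name `overall_speedup` and the speedup factors tried are illustrative assumptions.

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: overall speedup when only hot_fraction of the load is accelerated."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


# Accelerate the hotspot (80% of the load) by factors of 2, 10, and 100.
for s in (2, 10, 100):
    print(s, round(overall_speedup(0.8, s), 2))
# prints:
# 2 1.67
# 10 3.57
# 100 4.81
```

However fast the co-processor becomes, the overall gain stays below 5x because the remaining 20% of the load still runs on the general purpose processor.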

31
Software Hotspots Example
  • a 16-QAM modem (19.2 kbaud) implemented
    entirely in software
  • takes 177,000 instruction cycles to execute on
    a TI C6711

FPGA Co-processor (a few cycles)
32
Solving Hotspots
  • PROCESSOR FPGA

[Figure: block diagram with the labels MULTIPLE DSPs, DSP ENABLED PROCESSORS, FPGA, RISC PROCESSOR, and PROGRAMMABLE LOGIC; several blocks are marked P]
33
Solving Hotspots
[Figure: PERFORMANCE plotted against FLEXIBILITY and TTM (time-to-market) for SCP, ASIC, FPGA, DSP, and CPU]
SCP: Software Configurable Processor