Title: Advanced Processor Architectures for Embedded Systems
1Advanced Processor Architectures for Embedded
Systems
- Witawas Srisa-an
- CSCE 496 Embedded Systems Design and
Implementation
2Objectives
- Discuss ASICs, FPGA-based systems, and general-purpose processors
- Analyze the operating requirements for today's embedded processors
- Observe the architectural differences between state-of-the-art processors for embedded systems and high-performance general-purpose processors
- Tensilica Xtensa
- Stretch S5000
3Embedded Processors Requirements
- operate in memory-constrained environments
- must be energy efficient
- must be low cost
- may have to be good at a common set of tasks
- matrix multiplication,
- encryption,
- filtering (FIR),
- network packet processing, etc.
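FIR filtering is one of the common tasks listed above. The following is a minimal Python sketch of a direct-form FIR filter. The function name `fir_filter`, the 4-tap moving-average coefficients, and the sample values are illustrative assumptions, not taken from the slides. The inner multiply-accumulate loop is the kind of work that a built-in DSP unit or a function-specific component is meant to speed up.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k]."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:  # samples before x[0] are treated as zero
                acc += coeff * samples[n - k]
        output.append(acc)
    return output


# Hypothetical 4-tap moving-average filter and input signal.
taps = [0.25, 0.25, 0.25, 0.25]
signal = [4.0, 8.0, 12.0, 16.0, 20.0]
filtered = fir_filter(signal, taps)
print(filtered)
```

Each output sample costs one multiply and one add per tap, so the cycle count on a general-purpose processor grows with both the signal length and the number of taps.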
4Implications
- low memory footprint
- simplified instruction set
- 16-bit, 24-bit
- may not need support for VM
- may lack hardware MMUs
- energy efficient
- less complex (smaller number of transistors)
- simple pipeline stages
- less cache memory on chips
- simple floating point units
- larger transistors and slower clocks
- integrated function-specific components for common tasks
5Implications (cont.)
- low cost
- share IP cores to reduce development cost
- ARM, MIPS, etc.
- use older semiconductor process technologies (e.g., 250 nm instead of 90 nm)
- task specific
- built-in DSP unit
- wide data bus (more data per movement)
- may need support for adding functions to the cores
- may need field reconfigurability
6Rationales
from The Death of Micro-Processors, Nick
Tredennick and Brion Shimamoto, Embedded Systems
Programming,
http://www.embedded.com/showArticle.jhtml?articleID=26807160
7Rationales (cont.)
from The Death of Micro-Processors, Nick
Tredennick and Brion Shimamoto, Embedded Systems
Programming,
http://www.embedded.com/showArticle.jhtml?articleID=26807160
8Rationales (cont.)
- Studies have shown that custom hardware components often require much less energy to complete their tasks than the same tasks running on general-purpose processors. [1]
- An ASIC is custom logic for a particular application. Custom logic can be orders of magnitude more efficient than microprocessor-based solutions. [2]
[1] Lach et al., Power-Efficient Adaptable Wireless Sensor Networks, Proceedings of International Conference on Military and Aerospace Programmable Logic Devices (MAPLD), September 2003.
[2] Tredennick and Shimamoto, The Death of Micro-Processors, Embedded Systems Programming, http://www.embedded.com/showArticle.jhtml?articleID=26807160
9Application Specific ICs (ASICs)
- provide custom design solutions for particular problems
- are fixed solutions that require public acceptance to reduce cost
- require extensive knowledge of hardware design
- are not field-reconfigurable
- can have large non-recurring engineering (NRE) cost
10ASICs (cont.)
Technology    Mask cost (USD)
90 nm         1,000,000
180 nm        250,000
250 nm        120,000
350 nm        60,000
Source: Wayne Wolf, FPGA-Based System Designs, Prentice Hall, 2004
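To see why mask cost matters more at low volume, the sketch below spreads the mask costs from the table above over a production run. It assumes the table's figures are in US dollars. The production volumes are hypothetical, and the calculation ignores every other NRE and per-unit manufacturing cost, so it is only a rough illustration.

```python
# Mask costs from the table above, keyed by process node in nm.
# Assumption: the figures are US dollars.
MASK_COST = {90: 1_000_000, 180: 250_000, 250: 120_000, 350: 60_000}


def mask_cost_per_unit(process_nm, volume):
    """Mask cost spread evenly over `volume` manufactured chips."""
    return MASK_COST[process_nm] / volume


# Hypothetical production volumes, chosen only for illustration.
for volume in (1_000, 100_000, 10_000_000):
    row = ", ".join(
        f"{nm} nm: {mask_cost_per_unit(nm, volume):,.2f}"
        for nm in sorted(MASK_COST)
    )
    print(f"{volume:>10,} units -> {row}")
```

At a thousand units the 90 nm mask set alone adds 1,000 per chip, while at ten million units it adds 0.10, which is consistent with the point that fixed ASIC designs need high volume to reduce cost.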
11FPGA Based Systems
- Field-programmable gate arrays (FPGAs)
- are slower and require more power than custom designs
- are more expensive
- but provide no wait time from completing a design to making a chip
- great for prototyping
- are also reusable
12FPGAs
- SRAM based--volatile
- Altera Flex, Stratix, Cyclone, Apex
- Antifuse--one-time programmable
- Actel
- EEPROM--non-volatile
- Altera Max
13ASIC Design Approaches
- Custom VLSI designs
- are fabricated on a manufacturing line
- take months
- mask cost is also high
- operate much faster and consume less power than FPGA equivalents
- can be cheaper if manufactured in large volume
14ASIC Design Approaches (cont.)
- Structured ASIC
- is based on pre-designed logic fabric structurally embedded in the platform
- fills the market gap between high-density FPGAs and standard-cell ASICs
- can greatly reduce development time and cost
- reduces non-recurring engineering (NRE) cost
- http://www.amis.com/asics/structured_asics/
- http://www.altera.com/b/hardcopyii.html?WT.mc_id=h2_sm_go_xx_tx_2_041&WT.srch=1
15Structured ASICs
View Altera demo
16Integrating ASICs with GPPs
- Today's embedded systems can have complex software layers
- OS
- Virtual Machine
- Applications
- It is preferable to pair GPPs with ASICs as co-processors
17Integrating ASICs with GPPs (cont.)
- So, we can have GPPs perform basic tasks and ASICs (co-processors) speed up compute-intensive functions
- sounds simple, but in reality it is quite complex
- basic handshaking is needed between the ASICs and the main processors
- data exchange
- shared memory
- requires OS and architecture support
- synchronous or asynchronous calls
- cache coherency issues
18ASICs and GPPs (cont.)
- An example is to use a hardware co-processor for cryptography
- should the co-processor calls be synchronous?
- main processor blocks on the call and waits for the response
- or asynchronous?
- calling process is blocked and swapped out
- needs interrupt support
- needs to maintain context
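The difference between the two calling styles can be sketched in software. This is only an analogy under stated assumptions: a worker thread stands in for the co-processor, a completion callback stands in for the interrupt, and a byte-wise XOR stands in for real cryptography. All function names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def coprocessor_encrypt(data: bytes, key: int) -> bytes:
    """Stand-in for a crypto co-processor: XOR each byte with a key."""
    return bytes(b ^ key for b in data)


def call_synchronous(data: bytes, key: int) -> bytes:
    # The caller waits here until the co-processor returns.
    return coprocessor_encrypt(data, key)


def call_asynchronous(executor, data: bytes, key: int, on_done):
    # The request is handed off and the caller continues immediately.
    # The callback plays the role of the completion interrupt.
    future = executor.submit(coprocessor_encrypt, data, key)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future


completed = []
sync_result = call_synchronous(b"abc", 0x20)

with ThreadPoolExecutor(max_workers=1) as executor:
    pending = call_asynchronous(executor, b"abc", 0x20, completed.append)
    other_work = sum(range(1000))     # the main CPU keeps doing useful work
    async_result = pending.result()   # rejoin once the answer is needed

print(sync_result, async_result, completed)
```

In the asynchronous version the `pending` object is the context that has to be kept alive between issuing the request and receiving the completion notice.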
19ASICs and GPPs (cont.)
- Co-processor
- shares the bus with the main CPU
- is a source of bus contention
- can cause cache coherency issues
- data held in the main CPU cache may have been updated in memory by the co-processor
- flush or invalidate the cache accordingly
- should be equipped with DMA to relieve the main CPU from copying data
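The coherency problem can be illustrated with a toy model. The class below is a hypothetical sketch with a dictionary for main memory and another for the CPU cache, and it assumes no hardware coherency support. It is not a model of any particular processor.

```python
class ToySystem:
    """Main memory plus a CPU cache with no hardware coherency."""

    def __init__(self):
        self.memory = {}
        self.cpu_cache = {}

    def cpu_read(self, addr):
        # A cache hit returns the cached copy even if memory has changed.
        if addr not in self.cpu_cache:
            self.cpu_cache[addr] = self.memory[addr]
        return self.cpu_cache[addr]

    def dma_write(self, addr, value):
        # The co-processor writes straight to memory, bypassing the cache.
        self.memory[addr] = value

    def invalidate(self, addr):
        # Drop the cached line so the next read goes to memory.
        self.cpu_cache.pop(addr, None)


system = ToySystem()
system.memory[0x100] = "plaintext"
first = system.cpu_read(0x100)         # the line is now cached
system.dma_write(0x100, "ciphertext")  # co-processor updates memory
stale = system.cpu_read(0x100)         # still the old cached value
system.invalidate(0x100)
fresh = system.cpu_read(0x100)         # refetched from memory
print(first, stale, fresh)
```

The second read returns the old value because nothing told the cache that memory changed, which is why the software or the hardware has to invalidate the affected lines after a co-processor write.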
20Extending GPPs
- Tensilica Xtensa
- reconfigurable processor cores
- supports native 16-bit and 24-bit instructions for higher code density
- users can add or remove components (MMU, multipliers, FPUs)
- users can reconfigure cache organization
- users can select bus width (32, 64, or 128 bits)
- user-defined instruction extension language
- users can create custom instructions to speed up commonly used functions
- users can instantiate custom registers of different sizes
21Tensilica Xtensa
from http://www.tensilica.com/html/tensilica_instruction_extensio.html
22Tensilica Xtensa (cont.)
- We will not go into great detail about the Xtensa.
- However, we will study the Stretch S5000 engine, which is based on the Xtensa core.
23Design Time Solutions
- Up to now, we have only talked about design-time solutions!
- logic designs are done in house
- not very reconfigurable after the chip is made
- even with FPGAs, someone has to come up with a new hardware design for it to change
- the Xtensa needs about 1 hour to synthesize the instruction extension
- What if we want to configure on the fly?
- each application brings in CPU-intensive functions
- these functions are not known in advance
- Can we leave it up to the software developers to design fast co-processors?
24Run-Time Configuration
25(R)evolution of Processors
Ice Hard
Rock Hard
Playdough Hard
26(R)evolution of Processors
Ice Hard
Hardwire, GPP
Perform well in most conditions but not extreme
conditions
Rock Hard
Playdough Hard
27(R)evolution of Processors
Ice Hard
GPP with FPGAs
Custom designs perform well in some extreme
conditions. Require extensive knowledge of
hardware design
Rock Hard
Playdough Hard
28(R)evolution of Processors
Ice Hard
Rock Hard
GPP with embedded programmable logic
Playdough Hard
Reconfiguration triggered by software
29(R)evolution of Processors
- Ice Hard
- Contains ASIC (Application Specific IC) designs
- Increases time-to-market
- Takes time to reconfigure
30Software Hotspots
- In DSP
- 80% of the processing load is spent on 20% of the code
- Hand-tuned assembly that can take thousands of cycles to execute
- Less portable
- The remaining 80% of the code has complex system functions
- Runs well on most GPPs
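The 80/20 split above bounds how much a hotspot accelerator can help. The sketch below applies Amdahl's law, which the slides do not name and is brought in here as an assumption, to the 80% figure. The acceleration factors are hypothetical.

```python
def overall_speedup(hotspot_fraction, hotspot_speedup):
    """Amdahl's law: only the hotspot fraction of the load gets faster."""
    remaining = 1.0 - hotspot_fraction
    return 1.0 / (remaining + hotspot_fraction / hotspot_speedup)


# 80% of the processing load sits in the hotspots (figure from the slide).
# The acceleration factors below are hypothetical.
for factor in (2, 10, 100):
    speedup = overall_speedup(0.8, factor)
    print(f"hotspots {factor}x faster -> overall {speedup:.2f}x")
```

However fast the hotspot hardware becomes, the remaining 20% of the load caps the overall gain below 5x under this model, so the system functions that already run well on a GPP eventually dominate.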
31Software Hotspots Example
- when a 16-QAM modem (19.2 Kbaud) is implemented entirely in software
- it takes 177,000 instruction cycles to execute on a TI C6711
- with an FPGA co-processor, it takes a few cycles
32Solving Hotspots
[Figure: options for solving hotspots. Labels: multiple DSPs; DSP-enabled processors; FPGA; RISC processor; programmable logic]
33Solving Hotspots
[Figure: performance versus flexibility and time-to-market (TTM) for SCP, ASIC, FPGA, DSP, and CPU]
SCP: Software Configurable Processor