Title: Advanced Processor Architectures for Embedded Systems
1. Advanced Processor Architectures for Embedded Systems
- Witawas Srisa-an
- CSCE 496 Embedded Systems Design and Implementation
2. (R)evolution of Processors
Ice Hard
Rock Hard
Play-dough Hard
3. (R)evolution of Processors
Ice Hard
Hardwired, general-purpose processor (GPP)
Performs well in most conditions but not in extreme conditions
Rock Hard
Play-dough Hard
4. (R)evolution of Processors
Ice Hard
GPP with FPGAs
Custom designs perform well in some extreme conditions, but require extensive knowledge of hardware design
Rock Hard
Play-dough Hard
5. (R)evolution of Processors
Ice Hard
Rock Hard
GPP with embedded programmable logic
Play-dough Hard
Reconfiguration triggered by software
6. (R)evolution of Processors
- Ice Hard
- Contains ASIC (Application Specific IC) designs
- Increases time-to-market
- Takes time to reconfigure
7. Software Hotspots
- In DSP
- 80% of the processing load is spent on 20% of the code
  - Hand-tuned assembly that can take thousands of cycles to execute
  - Less portable
- The remaining 80% of the code implements complex system functions
  - Runs well on most GPPs
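The 80/20 split hints at how much an accelerator for the hotspot can help overall. The following is a minimal sketch, assuming Amdahl's law applies (the hotspot and the remaining code run one after the other, and only the hotspot is accelerated). The function name `overall_speedup` and the sample factors are illustrative and do not come from the slides.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction `hot` of the run time
 * (0.0 to 1.0) is accelerated by a factor `accel` (> 0). */
double overall_speedup(double hot, double accel)
{
    assert(hot >= 0.0 && hot <= 1.0 && accel > 0.0);
    return 1.0 / ((1.0 - hot) + hot / accel);
}
```

Under these assumptions, with `hot = 0.8` a hotspot that runs 10 times faster gives roughly a 3.6x overall speedup, and no accelerator can push the total beyond 5x because the other 20% of the run time is untouched.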
8. Software Hotspots Example
- When a 16-QAM modem (19.2 Kbaud) is implemented entirely in software
  - It takes 177,000 instruction cycles to execute on a TI C6711
- FPGA co-processor (a few cycles)
9. Solving Hotspots
[Figure labels: MULTIPLE DSPs; DSP ENABLED PROCESSORS; FPGA; P (six processor blocks); RISC PROCESSOR; PROGRAMMABLE LOGIC]
10. An Example of a Configurable Processor (Stretch S5000)
I/O Subsystem Tailored To Markets and Applications
32 x 128b Wide Registers; Flexible Wide Load/Store Instructions
300 MHz Xtensa-V 32-bit RISC Processor
Programmable Logic Data Path Inside The RISC Processor
S5 Engine Common To All S5000 Processors
11. Programmable Logic Architecture
[Figure: block diagram with Memory, the WR and AR register files, the RISC data path (RISC DP), and the Instruction Set Extension Fabric (ISEF), linked by 32-bit and 128-bit data paths]
12. ISEF Resources
- An ISEF includes
  - Computation resources
  - Routing resources
  - Pipeline resources
  - State register resources
- 2 types of computation resources
  - 4096 arithmetic units (AUs) for arithmetic and logic operations
  - 8192 multiplier units (MUs) for multiply and shift operations
- Example: a single ISEF may implement
  - 32 16x16 multipliers
  - 128 32-bit ALUs
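The example figures are consistent with a simple resource budget. The sketch below assumes, without confirmation from the slides, that an n-bit ALU consumes n AUs and that an a x b multiplier consumes a*b MUs. Under that assumed cost model the totals work out exactly: 128 x 32 = 4096 and 32 x 16 x 16 = 8192. The function names are hypothetical.

```c
#include <assert.h>

enum { ISEF_AUS = 4096, ISEF_MUS = 8192 };  /* totals quoted on the slide */

/* Assumed cost model: an n-bit ALU uses n arithmetic units. */
int max_alus(int width_bits)
{
    assert(width_bits > 0);
    return ISEF_AUS / width_bits;
}

/* Assumed cost model: an a x b multiplier uses a*b multiplier units. */
int max_multipliers(int a_bits, int b_bits)
{
    assert(a_bits > 0 && b_bits > 0);
    return ISEF_MUS / (a_bits * b_bits);
}
```

A budget of this kind is only a first estimate; routing and pipeline resources, which the slide also lists, are not modelled here.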
13. Wide Register
- The wide register (WR) file is used for holding WR data
  - 32 WR registers (128 bits each)
  - Divided into 2 banks of 16 registers (WRA and WRB)
- The WRA/WRB types associate a variable with WR bank A/B
  - WRA v1, v2, v3;
  - WRB w1, w2, w3;
- The WR type defaults to WRA
- Use WRA/WRB to avoid unnecessary register moves between the two WR banks
14. Extension Instructions (EIs)
- The power of the Software Configurable Processor (SCP) architecture is derived from the ability to define new and complex instructions that operate on very wide data
- Extension Instructions: 3 steps
  - EI definition: write a Stretch-C function
  - EI compilation: compile the Stretch-C function
  - EI use: call an EI through its intrinsic in the application code (C/C++)
15. Extension Instructions
- Define an Extension Instruction (writing Stretch-C)
    #include <stretch.h>
    SE_FUNC void V_AND8(WR v1, WR vMask, WR *vOut)
    {
        *vOut = v1 & vMask;
    }
- Compile and link the EI (Stretch-C source file, .xc)
- Use the EI in C/C++ application code (calling intrinsics)
    #include "vector.h"
    WR v1, vMask, vOut;
    ...
    WRL128I(v1, (WR *) memSrc1Ptr, 0);
    V_AND8(v1, vMask, &vOut);
    WRS128I(vOut, (WR *) memDstPtr, 0);
- Source file: vector.xc
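To make the load, AND, store sequence concrete, here is a minimal emulation in portable C. It is a sketch only: `wr128_t`, `wr_load`, `wr_store`, and `v_and8` are hypothetical stand-ins for the WR type, the WRL128I and WRS128I intrinsics, and the V_AND8 extension instruction. They are not the Stretch API, and the offset argument of the real intrinsics is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit WR register. */
typedef struct { uint8_t byte[16]; } wr128_t;

/* Emulates a 128-bit load from memory (the role WRL128I plays). */
wr128_t wr_load(const void *src)
{
    wr128_t v;
    memcpy(v.byte, src, sizeof v.byte);
    return v;
}

/* Emulates a 128-bit store to memory (the role WRS128I plays). */
void wr_store(wr128_t v, void *dst)
{
    memcpy(dst, v.byte, sizeof v.byte);
}

/* Emulates V_AND8: bitwise AND over all 128 bits, one byte at a time.
 * The output is written through a pointer, as in the EI definition. */
void v_and8(wr128_t v1, wr128_t vMask, wr128_t *vOut)
{
    assert(vOut != NULL);
    for (int i = 0; i < 16; i++)
        vOut->byte[i] = v1.byte[i] & vMask.byte[i];
}
```

On the real hardware the loop would collapse into a single instruction executing in the ISEF; the byte loop here only reproduces the result.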
16. Extension Instructions
- Extension Instructions
  - Are issued by the Xtensa
  - Read source operands from the 128-bit WR and/or 32-bit AR register files
  - Execute out of the ISEF
  - Write destination operands to WR
- Once the ISEF is configured with the new instruction, it may be
  - Called as an intrinsic from application C code
  - Used as an assembly instruction in an assembly source file
17. Writing Stretch-C Functions
    #include <stretch.h>
    SE_FUNC void V_AND128(WR v1, WR v2, WR *vOut)
    {
        *vOut = v1 & v2;
    }
- Include the stretch.h header file
- Stretch-C functions are identified by the keyword SE_FUNC void
- EI names are identified by the Stretch-C function name (for single-instruction functions)
- EI source and destination operands are defined by the Stretch-C function parameters
- The EI operation is defined by the statements in the Stretch-C function body
18. Extension Instruction Parameters 1
- Assembly
    result = a + b
    ADD result, a, b
- Stretch-C
    // RESULT = A + B
    V_ADD4(A, B, &RESULT);
- Extension Instructions are user-defined assembly instructions that use input and output operands
- An Extension Instruction can specify up to 3 parameters
  - 0, 1, 2, or 3 inputs
  - 0, 1, or 2 outputs
- Input and output parameters reside in register files
  - Inputs come from the WR or AR register files
  - Outputs may only be written to the WR register file
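The parameter convention (register inputs passed by value, one output written back) can be emulated in plain C. This sketch assumes that V_ADD4 adds four independent 32-bit lanes of a 128-bit register; the slides give only the name, so the lane width is a guess. `wr_lanes_t` and `v_add4` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit WR register viewed as four lanes. */
typedef struct { uint32_t lane[4]; } wr_lanes_t;

/* Emulates a lane-wise add: two register inputs, one output written
 * through a pointer. Unsigned arithmetic wraps modulo 2^32 per lane,
 * so a carry never crosses into the neighbouring lane. */
void v_add4(wr_lanes_t a, wr_lanes_t b, wr_lanes_t *result)
{
    assert(result != NULL);
    for (int i = 0; i < 4; i++)
        result->lane[i] = a.lane[i] + b.lane[i];
}
```

The point of the wide instruction is that all four additions count as one operation from the processor's point of view, where the scalar ADD shown on the slide handles a single pair of operands.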
19. Extension Instruction Parameters 2
- EI source operands (inputs) may include
  - Up to 3 WR inputs (use WR, WRA or WRB)
  - Up to 2 AR inputs (use int, short, etc.)
- EI destination operands (outputs) may include
  - Up to 2 WR outputs, each writing a separate WR bank
  - Use the C pointer notation for outputs
- A single WR parameter may be used as both an input and output operand
- Examples
    SE_FUNC void FOO(int c1, WR v1, WRB *vOut)
    SE_FUNC void FOO(WR v1, WRA *vOut1, WRB *vOut2)
    SE_FUNC void FOO(WR v1, WRA *vInOut1, WRB *vOut2)
20. Example of Stretch-C
- RGB2YCrCb
  - Y = 0.299 R + 0.587 G + 0.114 B
  - Cr = 0.701 R - 0.587 G - 0.114 B
  - Cb = -0.299 R - 0.587 G + 0.886 B
- Or, in integer arithmetic
  - Y = (77R + 150G + 29B) >> 8
  - Cb = (-43R - 85G + 128B + 32768) >> 8
  - Cr = (128R - 107G - 21B + 32768) >> 8
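The integer form can be checked on a single pixel in ordinary C. This is a minimal sketch with a hypothetical function name, `rgb_to_ycbcr`; it takes the sign of the 21B term in Cr as negative, matching the rgb2ycc code on the next slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts one 8-bit RGB pixel with the integer formulas.
 * Coefficients are scaled by 256; adding 32768 (= 128 << 8) before the
 * shift centres Cb and Cr on 128. For 8-bit inputs every intermediate
 * sum is non-negative, so the right shift is well defined. */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    assert(y != NULL && cb != NULL && cr != NULL);
    *y  = (uint8_t)(( 77 * r + 150 * g +  29 * b) >> 8);
    *cb = (uint8_t)((-43 * r -  85 * g + 128 * b + 32768) >> 8);
    *cr = (uint8_t)((128 * r - 107 * g -  21 * b + 32768) >> 8);
}
```

For white (255, 255, 255) the chroma coefficients cancel, giving Y = 255 and Cb = Cr = 128. Pure red (255, 0, 0) gives Y = 76, Cb = 85, Cr = 255.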
21. RGB2YCC
    SE_FUNC void rgb2ycc(WR A, WR *B)
    {
        se_sint<8> r[5], g[5], b[5];
        se_sint<8> y[5], cb[5], cr[5];
        int i, j;

        /* unpack A to RGB data, does not use any ISEF logic */
        for (i = 0; i < 5; i++) {
            j = i * 3 * 8;
            r[i] = A(j+7, j);
            g[i] = A(j+15, j+8);
            b[i] = A(j+23, j+16);
        }

        /* converting 5 pixels */
        for (i = 0; i < 5; i++) {
            y[i]  = ( 77*r[i] + 150*g[i] +  29*b[i]) >> 8;
            cb[i] = (-43*r[i] -  85*g[i] + 128*b[i] + 32768) >> 8;
            cr[i] = (128*r[i] - 107*g[i] -  21*b[i] + 32768) >> 8;
        }

        /* pack YCbCr to B */
        *B = (cr[4],cb[4],y[4],cr[3],cb[3],y[3],cr[2],cb[2],y[2],
              cr[1],cb[1],y[1],cr[0],cb[0],y[0]);
    }
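The unpack loop relies on selecting a bit range from the 128-bit operand. The following plain-C sketch emulates that selection under two assumptions that the slides do not state: bit 0 is the least significant bit, and each pixel occupies 24 consecutive bits (so the offset is j = i * 3 * 8). `wr128_t`, `wr_bits`, and `unpack_pixel` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit WR value; byte[0] holds bits 7..0. */
typedef struct { uint8_t byte[16]; } wr128_t;

/* Returns bits hi..lo (inclusive, at most 32 bits wide) of v,
 * mimicking the A(hi, lo) range notation. */
uint32_t wr_bits(const wr128_t *v, int hi, int lo)
{
    assert(v && 0 <= lo && lo <= hi && hi < 128 && hi - lo < 32);
    uint32_t out = 0;
    for (int bit = hi; bit >= lo; bit--) {
        uint32_t b = (v->byte[bit / 8] >> (bit % 8)) & 1u;
        out = (out << 1) | b;   /* build the result from the top bit down */
    }
    return out;
}

/* Unpacks pixel i (0..4), following the first loop of rgb2ycc. */
void unpack_pixel(const wr128_t *a, int i,
                  uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(i >= 0 && i < 5 && r && g && b);
    int j = i * 3 * 8;   /* 24 bits per pixel */
    *r = (uint8_t)wr_bits(a, j + 7,  j);
    *g = (uint8_t)wr_bits(a, j + 15, j + 8);
    *b = (uint8_t)wr_bits(a, j + 23, j + 16);
}
```

Five pixels use 120 of the 128 bits, which is why the loops in rgb2ycc stop at 5. In software this extraction costs a loop per field; the slide's comment notes that in the ISEF it uses no logic at all.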
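22. Stretch Compiler
[Figure: build flow. rgb2ycc.xc and <stretch.h> go through scc (Stretch compile), producing libei.h and libei.a; rgb2ycc.c goes through scc (compile), producing rgb2ycc.o; scc (link) produces rgb2ycc.exe, which is run on the target]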
23. Compiler Option
24. Summary
- Software Configurable Processor
- Describe hardware using C/C++
- But not trivial: a basic understanding of the architecture is needed
- Reconfiguration can take place in 150 microseconds
- 2 ISEFs per chip
- Can ping-pong
- Configuration files stored in SDRAM
- Use DMA to preload information
- ISEF is proprietary and NOT an FPGA
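The benefit of ping-ponging between two ISEFs can be estimated with a simple timing model. This is a sketch under stated assumptions, not a description of the actual hardware scheduler: each kernel needs its own configuration, every reconfiguration takes the quoted 150 microseconds, and one ISEF can be reconfigured while the other executes. The function names and the sample run times are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RECONFIG_US 150.0   /* reconfiguration time quoted in the summary */

/* Single ISEF: every kernel pays for reconfiguration, then runs. */
double serial_time_us(const double *run_us, size_t n)
{
    assert(run_us != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += RECONFIG_US + run_us[i];
    return total;
}

/* Two ISEFs in ping-pong: while kernel i runs on one ISEF, the other is
 * reconfigured for kernel i+1, so each step costs the longer of the two. */
double pingpong_time_us(const double *run_us, size_t n)
{
    assert(run_us != NULL || n == 0);
    if (n == 0)
        return 0.0;
    double total = RECONFIG_US;          /* the first load cannot be hidden */
    for (size_t i = 0; i + 1 < n; i++)
        total += run_us[i] > RECONFIG_US ? run_us[i] : RECONFIG_US;
    return total + run_us[n - 1];        /* nothing left to overlap with */
}
```

For kernels of 400, 100, and 300 microseconds, this model gives 1250 microseconds with one ISEF and 1000 microseconds with ping-pong. Reconfiguration is fully hidden only behind kernels that run for at least 150 microseconds.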