Advanced Processor Architectures for Embedded Systems - PowerPoint PPT Presentation

Slides: 26
Provided by: wittys
Learn more at: http://cse.unl.edu

Transcript and Presenter's Notes


1
Advanced Processor Architectures for Embedded
Systems
  • Witawas Srisa-an
  • CSCE 496 Embedded Systems Design and
    Implementation

2
(R)evolution of Processors
Ice Hard
Rock Hard
Play-dough Hard
3
(R)evolution of Processors
Ice Hard
Hardwire, GPP
Perform well in most conditions but not extreme
conditions
Rock Hard
Play-dough Hard
4
(R)evolution of Processors
Ice Hard
GPP with FPGAs
Custom designs perform well in some extreme
conditions. Requires extensive knowledge of
hardware design
Rock Hard
Play Dough Hard
5
(R)evolution of Processors
Ice Hard
Rock Hard
GPP with embedded programmable logic
Play-dough Hard
Reconfiguration triggered by software
6
(R)evolution of Processors
  • Ice Hard
  • Contains ASIC (Application Specific IC) designs
  • Increases time-to-market
  • Takes time to reconfigure

7
Software Hotspots
  • In DSP
  • 80% of the processing load is spent on 20% of
    the code
  • Hand-tuned assembly that can take thousands of
    cycles to execute
  • Less portable
  • The remaining 80% of the code implements complex
    system functions
  • Runs well on most GPPs
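
The 80/20 split above bounds how much a hotspot accelerator can help. The sketch below uses Amdahl's law, which the slides do not name; it is brought in here only to illustrate the point. The function name overall_speedup and its parameters are hypothetical.

```c
#include <assert.h>

/* Overall speedup when a fraction p of the processing load is accelerated
 * by a factor s, per Amdahl's law: 1 / ((1 - p) + p / s).
 * Illustrative helper, not part of any Stretch or TI tool chain. */
double overall_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0);  /* p is a fraction of total load */
    assert(s > 0.0);               /* s is the hotspot speedup factor */
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.8, a 10x faster hotspot gives roughly a 3.6x overall gain, and no amount of hotspot acceleration can exceed 5x, because the remaining 20% of the load still runs on the GPP.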

8
Software Hotspots Example
  • A 16-QAM modem (19.2 Kbaud) implemented
    entirely in software takes 177,000 instruction
    cycles to execute on a TI C6711

FPGA Co-processor (a few cycles)
9
Solving Hotspots
  • Processor + FPGA
  • Multiple DSPs
  • DSP-enabled processors
  • RISC processor with programmable logic
10
An Example of Configurable Processor (Stretch
S5000)
  • I/O subsystem tailored to markets and applications
  • 32 x 128b wide registers, flexible wide
    load/store instructions
  • 300 MHz Xtensa-V 32-bit RISC processor
  • Programmable logic data path inside the RISC
    processor
  • S5 engine common to all S5000 processors
11
Programmable Logic Architecture
[Figure: block diagram showing memory, the WR and AR register files, the RISC datapath (RISC DP), and the Instruction Set Extension Fabric (ISEF), connected by 32-bit and 128-bit paths]
12
ISEF Resources
  • An ISEF includes
  • Computation resources
  • Routing resources
  • Pipeline resources
  • State Register resources
  • 2 types of computation resources
  • 4096 arithmetic units (AUs) for arithmetic and
    logic operations
  • 8192 multiplier units (MUs) for multiply and
    shift operations
  • Example: a single ISEF may implement
  • 32 16x16 multipliers
  • 128 32-bit ALUs

13
Wide Register
  • Wide register file is used for holding WR data
  • 32 WR registers (128-bits each)
  • Divided into 2 banks of 16 registers (WRA and
    WRB)
  • The WRA/WRB types associate a variable with WR
    bank A/B
  • WRA v1, v2, v3
  • WRB w1, w2, w3
  • The WR type defaults to WRA
  • Use WRA/WRB to avoid unnecessary register moves
    between the two WR banks

14
Extension Instructions (EIs)
  • The power of the Software Configurable Processor
    (SCP) architecture is derived from the ability to
    define new and complex instructions that operate
    on very wide data
  • Extension Instructions: 3 steps
  • EI Definition: write a Stretch-C function
  • EI Compilation: compile the Stretch-C function
  • EI Use: call an EI through its intrinsic in the
    application code (C/C++)

15
Extension Instructions
  • Define an Extension Instruction (writing
    Stretch-C, file vector.xc)
  • #include <stretch.h>
  • SE_FUNC void V_AND8(WR v1, WR vMask, WR *vOut)
  • { *vOut = v1 & vMask; }
  • Compile and link the EI (Stretch-C source file, .xc)
  • Use the EI in C/C++ application code (calling
    intrinsics)
  • #include "vector.h"
  • WR v1, vMask, vOut;
  • WRL128I(v1, (WR *) memSrc1Ptr, 0);
  • V_AND8(v1, vMask, vOut);
  • WRS128I(vOut, (WR *) memDstPtr, 0);
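
The load, operate, store sequence above can be modeled in plain C for readers without the Stretch tools. This is only a sketch: the wr128 type and the helpers wr_load, wr_store, v_and8, and mask_block are hypothetical stand-ins for the WR type and the WRL128I, WRS128I, and V_AND8 intrinsics, not the vendor API, and they model behavior only, not timing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit WR register (16 bytes). */
typedef struct { uint8_t b[16]; } wr128;

/* Models a 128-bit wide load from memory into a WR (cf. WRL128I). */
static wr128 wr_load(const uint8_t *src)
{
    wr128 r;
    memcpy(r.b, src, sizeof r.b);
    return r;
}

/* Models a 128-bit wide store from a WR to memory (cf. WRS128I). */
static void wr_store(wr128 v, uint8_t *dst)
{
    memcpy(dst, v.b, sizeof v.b);
}

/* Models the V_AND8 extension instruction: AND across all 128 bits. */
static wr128 v_and8(wr128 v1, wr128 v_mask)
{
    wr128 out;
    for (int i = 0; i < 16; i++)
        out.b[i] = v1.b[i] & v_mask.b[i];
    return out;
}

/* Masks one 16-byte block: load both operands, AND them, store the result. */
void mask_block(const uint8_t *src, const uint8_t *mask, uint8_t *dst)
{
    assert(src != NULL && mask != NULL && dst != NULL);
    wr128 v1 = wr_load(src);
    wr128 v_mask = wr_load(mask);
    wr_store(v_and8(v1, v_mask), dst);
}
```

On the real processor the AND would execute in the ISEF as a single instruction on the whole 128-bit register; the byte loop here exists only because standard C has no 128-bit integer type.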
16
Extension Instructions
  • Extension Instructions
  • Are issued by the Xtensa
  • Read source operands from the 128-bit WR and/or
    32-bit AR register files
  • Execute out of the ISEF
  • Write destination operands to WR
  • Once the ISEF is configured with the new
    instruction, it may be
  • Called as an intrinsic from application C code
  • Used as an assembly instruction in an assembly
    source file

17
Writing Stretch-C Functions
  • #include <stretch.h>
  • SE_FUNC void V_AND128(
  • WR v1, WR v2, WR *vOut)
  • { *vOut = v1 & v2; }
  • Include the stretch.h header file
  • Stretch-C functions are identified by keyword
    SE_FUNC void
  • EI names are identified by the Stretch-C function
    name (for single instruction functions)
  • EI source and destination operands are defined by
    the Stretch-C function parameters
  • EI operation is defined by the Stretch-C function
    instructions

18
Extension Instruction Parameters 1
  • Assembly
  • result = a + b
  • ADD result, a, b
  • Stretch-C
  • // RESULT = A + B
  • V_ADD4(A, B, RESULT)
  • Extension Instructions are user defined assembly
    instructions that use input and output operands
  • An Extension Instruction can specify up to 3
    Parameters
  • 0, 1, 2, or 3 inputs
  • 0, 1 or 2 outputs
  • Input and output parameters reside in register
    files
  • Inputs come from the WR or AR register files
  • Outputs may only be written to the WR register
    file
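
As a rough illustration of the two-input, one-output case, the sketch below models V_ADD4 in plain C. The slides do not define V_ADD4; treating it as four independent 32-bit lane additions on a 128-bit register is an assumption based on the name alone, and the wr128 type and v_add4 function are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a 128-bit WR register as four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } wr128;

/* Assumed behavior of V_ADD4: lane-wise RESULT = A + B.
 * The two inputs are passed by value and the single output through a
 * pointer, mirroring the pointer notation used for EI outputs. */
void v_add4(wr128 a, wr128 b, wr128 *result)
{
    assert(result != 0);
    for (int i = 0; i < 4; i++)
        result->lane[i] = a.lane[i] + b.lane[i];  /* wraps modulo 2^32 */
}
```

Each lane wraps around independently on overflow, so a carry out of one lane never reaches the next.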

19
Extension Instruction Parameters 2
  • EI source operands (inputs) may include
  • Up to 3 WR inputs (use WR, WRA or WRB)
  • Up to 2 AR inputs (use int, short, etc.)
  • EI destination operands (outputs) may include
  • Up to 2 WR outputs, each writing a separate WR
    bank
  • Use the C pointer notation for outputs
  • A single WR parameter may be used as both an
    input and output operand
  • SE_FUNC void
  • FOO(int c1, WR v1, WRB *vOut)
  • SE_FUNC void
  • FOO(WR v1, WRA *vOut1, WRB *vOut2)
  • SE_FUNC void
  • FOO(WR v1, WRA *vInOut1, WRB *vOut2)

20
Example of Stretch-C
  • RGB2YCrCb
  • Y = 0.299 R + 0.587 G + 0.114 B
  • Cr = 0.701 R - 0.587 G - 0.114 B
  • Cb = -0.299 R - 0.587 G + 0.886 B
  • Or
  • Y = (77*R + 150*G + 29*B) >> 8
  • Cb = (-43*R - 85*G + 128*B + 32768) >> 8
  • Cr = (128*R - 107*G - 21*B + 32768) >> 8
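
The integer form can be checked in plain C. In the sketch below, the Y weights are the floating-point ones scaled by 256 and rounded (0.299 x 256 is about 77, 0.587 x 256 about 150, 0.114 x 256 about 29), and the 32768 term adds an offset of 128 after the shift. The chroma rows also appear to be rescaled to fit 8 bits, which is an observation about the numbers rather than something the slides state. The ycc_pixel type and rgb_to_ycc function are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One converted pixel; illustrative type, not from the slides. */
typedef struct { uint8_t y, cb, cr; } ycc_pixel;

/* Fixed-point RGB to YCbCr using the integer coefficients from the slide. */
ycc_pixel rgb_to_ycc(uint8_t r, uint8_t g, uint8_t b)
{
    int y  = ( 77 * r + 150 * g +  29 * b) >> 8;
    int cb = (-43 * r -  85 * g + 128 * b + 32768) >> 8;
    int cr = (128 * r - 107 * g -  21 * b + 32768) >> 8;

    /* For 8-bit inputs every result already fits in 0..255. */
    assert(y  >= 0 && y  <= 255);
    assert(cb >= 0 && cb <= 255);
    assert(cr >= 0 && cr <= 255);

    ycc_pixel p = { (uint8_t)y, (uint8_t)cb, (uint8_t)cr };
    return p;
}
```

White (255, 255, 255) maps to Y = 255 with both chroma values at the neutral 128, and pure red (255, 0, 0) maps to Y = 76, Cb = 85, Cr = 255.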

21
RGB2YCC
    SE_FUNC void rgb2ycc(WR A, WR *B)
    {
      se_sint<8> r[5], g[5], b[5];
      se_sint<8> y[5], cb[5], cr[5];
      int i, j;

      /* unpack A to RGB data, does not use any ISEF logic */
      for (i = 0; i < 5; i++) {
        j = i * 3 * 8;
        r[i] = A(j+7, j);
        g[i] = A(j+15, j+8);
        b[i] = A(j+23, j+16);
      }

      /* converting 5 pixels */
      for (i = 0; i < 5; i++) {
        y[i]  = ( 77*r[i] + 150*g[i] +  29*b[i]) >> 8;
        cb[i] = (-43*r[i] -  85*g[i] + 128*b[i] + 32768) >> 8;
        cr[i] = (128*r[i] - 107*g[i] -  21*b[i] + 32768) >> 8;
      }

      /* pack YCbCr to B */
      *B = (cr[4],cb[4],y[4],cr[3],cb[3],y[3],cr[2],cb[2],y[2],
            cr[1],cb[1],y[1],cr[0],cb[0],y[0]);
    }
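
The unpack, convert, and pack steps of rgb2ycc can be modeled in standard C on a 16-byte array. This sketch rests on two assumptions: bit 0 of the WR corresponds to the least significant bit of byte 0, so the bit range A(j+7, j) with j = 24*i is byte 3*i, and the unused top byte of the output is zero. The name rgb2ycc_model is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { PIXELS = 5, WR_BYTES = 16 };

/* Plain-C model of rgb2ycc: a holds 5 packed RGB pixels (15 bytes used),
 * b receives 5 packed YCbCr pixels in the same byte positions. */
void rgb2ycc_model(const uint8_t a[WR_BYTES], uint8_t b[WR_BYTES])
{
    assert(a != 0 && b != 0);
    uint8_t r[PIXELS], g[PIXELS], bl[PIXELS];

    /* Unpack: pixel i occupies bytes 3i (R), 3i+1 (G), 3i+2 (B). */
    for (int i = 0; i < PIXELS; i++) {
        int j = i * 3;
        r[i]  = a[j];
        g[i]  = a[j + 1];
        bl[i] = a[j + 2];
    }

    /* Convert and pack: y[0] lands in the lowest byte, cr[4] in byte 14,
     * matching the order of the concatenation (cr[4], ..., cb[0], y[0]). */
    for (int i = 0; i < PIXELS; i++) {
        int j = i * 3;
        b[j]     = (uint8_t)(( 77 * r[i] + 150 * g[i] +  29 * bl[i]) >> 8);
        b[j + 1] = (uint8_t)((-43 * r[i] -  85 * g[i] + 128 * bl[i] + 32768) >> 8);
        b[j + 2] = (uint8_t)((128 * r[i] - 107 * g[i] -  21 * bl[i] + 32768) >> 8);
    }
    b[15] = 0;  /* assumed: the 8 bits above the 120 packed bits are zero */
}
```

In the ISEF the five conversions would run side by side within one instruction, whereas this model loops over them sequentially.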
22
Stretch Compiler
  • rgb2ycc.xc, <stretch.h> -> Stretch compile (scc) -> libei.h, libei.a
  • rgb2ycc.c -> compile (scc) -> rgb2ycc.o
  • link (scc) -> rgb2ycc.exe
  • rgb2ycc.exe -> run on target
23
Compiler Option
24
Summary
  • Software Configurable Processor
  • Describe hardware using C/C++
  • But not trivial. Basic understanding of the
    architecture is needed
  • Reconfiguration can take place in 150
    microseconds
  • 2 ISEFs per chip
  • Can ping pong
  • Configuration files stored in SDRAM
  • Use DMA to preload information
  • ISEF is proprietary and NOT FPGAs
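
To put the reconfiguration time in perspective, the sketch below converts it to clock cycles and estimates how many calls an extension instruction needs before reconfiguring pays off. The 150 microseconds and 300 MHz figures come from the slides; the break-even model, which ignores ping-ponging between the two ISEFs and DMA preloading, and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles elapsed in a time interval: microseconds x MHz = cycles. */
uint64_t cycles_in(uint64_t microseconds, uint64_t clock_mhz)
{
    return microseconds * clock_mhz;
}

/* Smallest number of calls for which the cycles saved by using the EI
 * (sw_cycles - ei_cycles per call) cover the reconfiguration cost. */
uint64_t break_even_calls(uint64_t reconfig_cycles,
                          uint64_t sw_cycles, uint64_t ei_cycles)
{
    assert(sw_cycles > ei_cycles);  /* the EI must actually be faster */
    uint64_t saved = sw_cycles - ei_cycles;
    return (reconfig_cycles + saved - 1) / saved;  /* ceiling division */
}
```

At 300 MHz, 150 microseconds is 45,000 cycles. With hypothetical costs of 1,000 cycles in software and 10 cycles as an EI, the reconfiguration is amortized after 46 calls.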
