Title: Reconfigurable Architectures
1Reconfigurable Architectures
- Murali Jayapala
- 15 sept 2000
2Outline
- Introduction
- Reconfigurable Computing
- Reconfigurable Processors
- Tightly Coupled Reconfigurable Processors (RISP)
- Observations
- Work plan
3FPGAs
Field Programmable Gate Array -- schematic
4FPGA elements
Processing element (fine grained)
Single FPGA cell
Interconnect
5Configuration context
- One large instruction for a superscalar processor
- One VLIW word composed of instructions for the parallel units
- One horizontal microcode word
6How to switch context
7What can be done with these fine grained elements?
$Y = Ax^2 + Bx + C$
Computation in space
- Reconfigurable Computing
- High computational density
- bit operations per area-time
- Hardware flexibility
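Note: a minimal sketch of what "computation in space" means, reading the slide's expression as Y = A*x^2 + B*x + C (the operators are an assumption, since they are missing from the text). The function names and the step counts are illustrative only. One shared ALU needs a time step per operation, while a spatial mapping gives each operation its own operator, so independent operations complete in the same level.

```python
# Toy comparison of computation in time vs computation in space
# for Y = A*x^2 + B*x + C. Step counts are illustrative, not measured.

def eval_in_time(a, b, c, x):
    """One shared ALU: every operation occupies its own time step."""
    steps = 0
    t1 = x * x;   steps += 1
    t2 = a * t1;  steps += 1
    t3 = b * x;   steps += 1
    t4 = t2 + t3; steps += 1
    y = t4 + c;   steps += 1
    return y, steps


def eval_in_space(a, b, c, x):
    """One operator per operation: independent operators share a level."""
    # Level 1: x*x and b*x do not depend on each other.
    t1, t3 = x * x, b * x
    # Level 2: a*t1 and t3 + c do not depend on each other.
    t2, t4 = a * t1, t3 + c
    # Level 3: final addition.
    y = t2 + t4
    return y, 3


y_time, steps_time = eval_in_time(2, 3, 4, 5)      # (69, 5)
y_space, levels_space = eval_in_space(2, 3, 4, 5)  # (69, 3)
```

The spatial version uses five operators at once (more area) to finish in three levels instead of five steps, which is the area-time trade-off behind "bit operations per area-time".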
8Definitions
- Reconfigurable Logic
- configurable compute elements
- configurable Interconnect
- different configurations (context)
- Reconfiguration
- Process of changing the configuration (context)
- Dynamic Reconfiguration
- Process of changing the configuration during runtime
9Reconfigurable Processors
- Combine a Processor with Reconfigurable Logic
Reconfigurable Logic
Microprocessor
Reconfigurable Processor
10Types of Reconfigurable Processors
- Type is determined by processor coupling
- Attached processor
- Coprocessor
- Functional unit -> RISP
11Attached Processor
- Splash 2
- Arnold et al, The Splash 2 processor and
applications, FCCM 93
12Coprocessor
- Garp
- Hauser and Wawrzynek, Garp: A MIPS Processor with a Reconfigurable Coprocessor, FCCM 97
13RFU
- OneChip98
- Jacob and Chow, Memory interfacing and instruction specification for reconfigurable processors, FPGA 99
14Tightly coupled Reconfigurable Processor
Reconfigurable Instruction Set Processor (RISP)
- Placing reconfigurable logic in the data path of the Instruction Set Processor (ISP) gives an additional degree of flexibility
- Functional units can be created on the fly
- Specialized functional units can be created to enhance the performance of an application
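Note: a minimal sketch of the idea of a reconfigurable functional unit (RFU) in the data path. The class `ToyRISP`, its opcode names and its `configure_rfu` method are hypothetical and do not describe any real processor; reconfiguration is modeled simply as swapping a Python function.

```python
class ToyRISP:
    """Hypothetical ISP whose data path includes one reconfigurable unit."""

    def __init__(self):
        self.regs = [0] * 8
        # Fixed functional units, present in every configuration.
        self.fixed_ops = {
            "add": lambda a, b: a + b,
            "sub": lambda a, b: a - b,
        }
        self.rfu = None  # no configuration loaded yet

    def configure_rfu(self, function):
        """Load a new configuration: the RFU now implements `function`."""
        self.rfu = function

    def execute(self, op, rd, rs1, rs2):
        a, b = self.regs[rs1], self.regs[rs2]
        if op == "rfu":
            if self.rfu is None:
                raise RuntimeError("RFU used before it was configured")
            self.regs[rd] = self.rfu(a, b)
        else:
            self.regs[rd] = self.fixed_ops[op](a, b)


cpu = ToyRISP()
cpu.regs[1], cpu.regs[2] = 7, 10

# Configuration 1: absolute difference, created "on the fly".
cpu.configure_rfu(lambda a, b: abs(a - b))
cpu.execute("rfu", 3, 1, 2)   # regs[3] == 3

# Configuration 2: the same opcode now performs a different operation.
cpu.configure_rfu(lambda a, b: (a * b) & 0xFF)
cpu.execute("rfu", 4, 1, 2)   # regs[4] == 70
```

The same `rfu` opcode is reused for both specialized operations; only the loaded configuration changes.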
15Global Questions
- Who needs this processor, and where?
- Are there any hidden advantages?
- How does it evaluate from the programming point of view?
- Issues in programming and compilers
- Francisco Barat Quesada
- How does it evaluate from the hardware point of view?
- Commonly known parameters
- Power
- Speed
- Area
16Typical MPEG-4 application
17iHome applications
- Properties of MPEG-4 applications
- Every object has its own decoder algorithm
- Scalable decoders
- Variable number of objects
- Interaction with user
- Decoder algorithm downloaded together with data (SAOL/SASL, Java)
- Decoder lasts longer than the standards (future proof)
- Conclusion
- Much run-time flexibility is needed
- This leads to extensive use of Instruction Set Processors and dynamically reconfigurable hardware
18Modeling Issues (for H/W)
- Granularity of the Reconfigurable Logic
- Fine grained
- LUT kind of compute elements
- Coarse grained
- ALU kind of compute elements
- Granularity of the Operations
- Fine grained
- addition
- logic operations
- Coarse grained
- fixed point multiplication
- DCT
- Filtering
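Note: a minimal sketch contrasting the two granularities of reconfigurable logic. The `LUT` class, the `lut_add` helper and the `alu` function are hypothetical names for illustration. A fine-grained element is modeled as a k-input look-up table whose truth table is its configuration; a coarse-grained element is modeled as a small ALU selected by an opcode.

```python
class LUT:
    """Fine-grained element: k-input look-up table with 2**k config bits."""

    def __init__(self, k, truth_table):
        if len(truth_table) != 2 ** k:
            raise ValueError("a k-input LUT needs 2**k truth-table bits")
        self.k = k
        self.table = list(truth_table)

    def __call__(self, *bits):
        index = 0
        for bit in bits:  # first argument is the most significant index bit
            index = (index << 1) | (bit & 1)
        return self.table[index]


# A full adder built from two 3-input LUTs (inputs: a, b, carry-in).
SUM = LUT(3, [0, 1, 1, 0, 1, 0, 0, 1])    # a XOR b XOR cin
CARRY = LUT(3, [0, 0, 0, 1, 0, 1, 1, 1])  # majority(a, b, cin)


def lut_add(x, y, width=4):
    """Ripple-carry addition out of LUTs; the carry out is discarded."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= SUM(a, b, carry) << i
        carry = CARRY(a, b, carry)
    return result


def alu(op, x, y, width=4):
    """Coarse-grained element: word-level operation chosen by an opcode."""
    mask = (1 << width) - 1
    ops = {"add": x + y, "and": x & y, "or": x | y, "xor": x ^ y}
    return ops[op] & mask


fine = lut_add(5, 6)         # 11
coarse = alu("add", 5, 6)    # 11
```

In this toy model the 4-bit LUT adder is described by 2 tables of 8 bits per bit position, while the ALU needs only an opcode, at the cost of supporting only the operations built into it.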
19Modeling Issues (contd)
- Configuration Memory
- configurable compute element bits
- configurable interconnect bits
- Interconnect
- Scheme by which compute elements are interconnected
- Nearest neighbor
- hierarchical
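Note: a rough sketch of how configuration memory size could be estimated from the two kinds of configuration bits listed above. The model is a deliberate simplification and an assumption: each compute element is a k-input LUT (2**k bits), and each LUT input has one multiplexer choosing among a fixed number of sources. Real devices also spend bits on switch boxes, flip-flops and I/O, which are ignored here. All function names are hypothetical.

```python
def lut_config_bits(k):
    """Truth-table bits of a k-input LUT."""
    return 2 ** k


def mux_select_bits(n_sources):
    """Select bits needed to choose one of n_sources (n_sources >= 1)."""
    return (n_sources - 1).bit_length()


def array_config_bits(rows, cols, k, sources_per_input):
    """Total bits for a rows x cols array under the simplified model."""
    per_element = lut_config_bits(k) + k * mux_select_bits(sources_per_input)
    return rows * cols * per_element


# Nearest neighbor: each input picks from 4 neighbors (assumed N, E, S, W).
nearest = array_config_bits(8, 8, k=4, sources_per_input=4)   # 1536

# Richer interconnect: assume 4 neighbors plus 4 longer wires per input.
richer = array_config_bits(8, 8, k=4, sources_per_input=8)    # 1792
```

Under these assumptions the interconnect share of the configuration grows with the number of selectable sources, which is one reason the interconnect scheme and the configuration memory cannot be modeled independently.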
20Orthogonal
Dependent
21Some Observations
Orthogonal
Dependent
22Configuration context
- One large instruction for a superscalar processor
- One VLIW word composed of instructions for the parallel units
- One horizontal microcode word
- Context <-> Instruction bits for the RFU
23Coarse grained Compute Elements
coarse grained
Fine grained
Field Programmable Function Array (FPFA)
Duality: resembles an array of functional units with configurable interconnect
24Coarse grained FPFA
- Hints for power models from low-power Instruction Set Processors
- context Instruction
- FPFA Functional units
25Instruction Memory Hierarchy
Execution Core Or Functional Units
Main Instruction Cache (or unified cache)
Instruction decoder
MUX
Instruction Buffer Or Loop Cache
- In Compute Intensive applications,
- Dominated by instruction loops
- Current scheme to achieve low power
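Note: a minimal sketch of why an instruction buffer or loop cache helps in loop-dominated code. The function `count_fetches` and the FIFO replacement policy are assumptions made for illustration; real loop caches may be managed differently. Each hit in the small buffer is a fetch that does not go to the larger, more power-hungry main instruction cache.

```python
def count_fetches(trace, loop_cache_size):
    """Return (loop_cache_hits, main_cache_accesses) for an address trace."""
    cache = []  # FIFO of the most recently fetched instruction addresses
    hits = misses = 0
    for addr in trace:
        if addr in cache:
            hits += 1
        else:
            misses += 1              # served by the main instruction cache
            cache.append(addr)
            if len(cache) > loop_cache_size:
                cache.pop(0)         # evict the oldest entry
    return hits, misses


loop_body = list(range(100, 108))    # a loop of 8 instructions
trace = loop_body * 50               # executed 50 times

fits = count_fetches(trace, loop_cache_size=8)       # (392, 8)
too_small = count_fetches(trace, loop_cache_size=4)  # (0, 400)
```

When the loop body fits, only the first iteration touches the main cache; when the buffer is smaller than the loop, this FIFO model gets no benefit at all, so buffer size has to match the loop sizes of the application.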
26- Considerable potential remains to reduce power further
- By clustering the functional units
Cluster 1 Execution Core Or Functional Units
Instruction Buffer Or Loop Cache 1
Main Instruction Cache (or unified cache)
Instruction decoder
Instruction Buffer Or Loop Cache 2
Cluster 2 Execution Core Or Functional Units
- This is a new observation
- Not exploited in current low power architectures
- There are no compilers that exploit this
27Dualities
- Coarse grained FPFA and
- Instruction Set Processors
28Context id
ALU kind
memory
- Reconfigurable compute elements <-> Functional units
- Multiple contexts <-> Instruction loops or program loops
- Segmented with multiple contexts <-> Clusters with loop
- Configuration memory <-> Instruction memory
- Configuration bit compaction <-> Instruction coding
29Global Observation
Reconfigurable Instruction Set Processor (RISP)
GPP
DSP-P
flexibility
RISP
ASIP
ASIC
performance
30Work Plan
31Past work since March, 2000
- March to July: course work
- Digital Electronic Processors
- Design of Digital Integrated circuits
- Languages and Compiler Design
- July to September
- Literature study
- Categorizing h/w issues into Abstract groups
32Future Work (long term)
- Refine the Abstract groups
- Granularity of operations
- Granularity of reconfigurable logic
- Configuration memory
- Interconnect
- Build models to evaluate the Processor
- power
- speed
- area
- With Reconfigurable Logic in mind
- May end up with a Field Programmable Function Array (FPFA) suitable for RISP
33Future work (short term-literature)
- FPGA technology and development
- Interconnects
- compute logic
- configuration memory
- Interconnect schemes in current processor technology
- Granularity of the functional units and operations in current Instruction Set Processors (e.g., VLIW processors)
34Future work
- Based on the decisions taken in the above tasks, concentrate on configuration memory issues
Interconnect
Granularity of Reconfigurable Logic
Granularity of operations
Configuration Memory