Title: CRISP template architecture
1CRISP template architecture
- Francisco Barat
- Murali Jayapala
- Pieter Op de Beeck
2Motivation
- Target application domain: MPEG-4 apps
- Compute-intensive applications
- Many different algorithms
- Different Clusters of quality
- Standards evolving in time
- Enable research on reconfigurable instruction set processors
- Unifying model
3CRISP definition
- Configurable
- Reconfigurable
- Instruction
- Set
- Processor
- Design time configurable
- Parameterized architecture
- (CPU core, Memory)
- E.g. PlayDoh from HP Labs, ARC Cores
- Reconfigurable Instruction Set
- Instruction set is not 100% fixed at design time
- E.g. Chimaera, OneChip
4What is Reconfigurable Instruction Set Processor?
- Instruction set of the processor can be changed from one application to another
- Enabled by
- Memory Based Decoders
- Change the Decoder Memory
5A Simple Example
[Figure labels: Register File, Register Select, Decoder, bit string 1010101010101011101001010101001, Interconnect configuration, Opcode, Adder (Enable o/p), Multiplier (Enable o/p)]
- Instruction set
- Add R1, R2
- Mult R1, R2
6A Simple Example
[Figure labels: Register File, Register Select, Decoder, bit string 1010101010101011101001010101001, Interconnect configuration, Opcode, Adder (Enable o/p), Multiplier (Enable o/p)]
- Instruction set
- INCA R1, R2 (R1R2 R2)
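The two example slides can be illustrated with a minimal Python sketch of a memory-based decoder, assuming the decoder memory is a table from opcode to control signals. The names `ControlWord`, `MemoryBasedDecoder`, `execute` and the fused instruction `MADD` are hypothetical and are not taken from the slides; in particular `MADD` is not the slide's INCA instruction, whose exact semantics are not recoverable here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlWord:
    """Control signals produced by the decoder (illustrative names)."""
    enable_adder: bool
    enable_multiplier: bool
    chain_mult_into_adder: bool = False  # assumed interconnect configuration bit


class MemoryBasedDecoder:
    """Decoder whose opcode-to-control mapping lives in a writable memory."""

    def __init__(self, table):
        self.table = dict(table)

    def reconfigure(self, table):
        # Rewriting the decoder memory changes the instruction set.
        self.table = dict(table)

    def decode(self, opcode):
        return self.table[opcode]


def execute(decoder, regs, opcode, dst, src):
    """Run one instruction on a datapath with one adder and one multiplier."""
    ctrl = decoder.decode(opcode)
    a, b = regs[dst], regs[src]
    product = a * b if ctrl.enable_multiplier else None
    if ctrl.enable_adder:
        # When chained, the adder's second operand is the multiplier output.
        rhs = product if ctrl.chain_mult_into_adder else b
        regs[dst] = a + rhs
    elif ctrl.enable_multiplier:
        regs[dst] = product
    return regs


BASE_ISA = {
    "ADD": ControlWord(enable_adder=True, enable_multiplier=False),
    "MULT": ControlWord(enable_adder=False, enable_multiplier=True),
}
# Hypothetical fused instruction, used only to show reconfiguration.
FUSED_ISA = {
    "MADD": ControlWord(True, True, chain_mult_into_adder=True),
}

decoder = MemoryBasedDecoder(BASE_ISA)
regs = {"R1": 3, "R2": 4}
execute(decoder, regs, "ADD", "R1", "R2")    # R1 = 3 + 4
execute(decoder, regs, "MULT", "R1", "R2")   # R1 = 7 * 4
decoder.reconfigure(FUSED_ISA)               # same hardware, new instruction set
execute(decoder, regs, "MADD", "R1", "R2")   # R1 = 28 + 28 * 4
```

The adder and multiplier never change; only the contents of the decoder table do, which is the point made by the two slides.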
7CRISP, Design Time Configurable
CRISP template
Template
CRISP1
CRISP2
CRISPn
Instances
TI C62 (VLIW)
REMARC
Chimaera
- Design Space
- Reconfigurable instruction set
- VLIW
8CRISP Configurable Parameter Hierarchy
CRISP
Memory Cluster
Global Interconnect
CPU Core
Data Path
Decoder
Intra-Cluster Interconnect
Memory
Inter-cluster Interconnect
Cluster
Decoder Memory
Intra-cluster Interconnect
Functional Unit
Register File
Inter-segment Interconnect
Segment
Processing Element
Intra-segment Interconnect
9A CRISP Multiprocessor System
Main Memory
L2 Cache
L1 I Cache
L1 I Cache
L1 Loop Cache
CPU core
CPU core
L1 D Cache
10Example of an Instance
11CPU Core (Datapath)
12CPU Core (Datapath)
Datapath Cluster 1
Datapath Cluster 2
Datapath Cluster 3
Register File 1
Register File 2
Register File 3
FU1
FU2
FU3
FU4
FU5
FU6
13Parameters CPU Core (Datapath)
- Datapath Clusters (n)
- Interconnect between the clusters (type)
- Data Memory ports (n)
- (un)protected pipeline, predicated VLIW (yes/no)
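As a rough illustration, the parameter hierarchy and the datapath parameters above could be captured as nested Python dataclasses. This is a sketch under assumptions: the nesting (datapath, cluster, functional unit, segment, processing element) is one reading of the flattened hierarchy slide, the default values are placeholders, and the split of two FUs per cluster is assumed from the three-cluster, six-FU figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessingElement:
    kind: str                                  # e.g. "ALU", "LUT", "multiplier"


@dataclass
class Segment:
    pes: List[ProcessingElement]
    intra_segment_interconnect: str = "none"   # placeholder type name


@dataclass
class FunctionalUnit:
    name: str
    segments: List[Segment]
    inter_segment_interconnect: str = "none"


@dataclass
class RegisterFile:
    name: str
    read_ports: int = 2
    write_ports: int = 3


@dataclass
class Cluster:
    register_file: RegisterFile
    functional_units: List[FunctionalUnit]
    intra_cluster_interconnect: str = "bus"


@dataclass
class DataPath:
    clusters: List[Cluster]
    inter_cluster_interconnect: str = "bus"
    data_memory_ports: int = 1
    predicated: bool = False


def alu_fu(name):
    """Smallest functional unit: one segment holding a single ALU PE."""
    return FunctionalUnit(name, [Segment([ProcessingElement("ALU")])])


def three_cluster_datapath():
    """Instance with 3 clusters, 3 register files and 6 FUs (2 per cluster, assumed)."""
    clusters = []
    for c in range(3):
        fus = [alu_fu(f"FU{2 * c + 1}"), alu_fu(f"FU{2 * c + 2}")]
        clusters.append(Cluster(RegisterFile(f"Register File {c + 1}"), fus))
    return DataPath(clusters)


datapath = three_cluster_datapath()
fu_names = [fu.name for cl in datapath.clusters for fu in cl.functional_units]
print(fu_names)
```

Each CRISP instance is then just a different set of values for these fields, which is what makes the template design-time configurable.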
14Register File Model
[Figure: Register File with write ports O1, O2, O3 (Write A1, Write A2, Write A3) and read ports R1, R2 (Read A1, Read A2)]
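A behavioural sketch of this register file model in Python, assuming three write ports and two read ports as drawn. The class name, the `cycle` method and the rule that reads see the values from before the same-cycle writes are assumptions made for illustration; conflicting writes to one address are not modelled.

```python
class MultiPortRegisterFile:
    """Register file with a fixed number of read and write ports."""

    def __init__(self, size, read_ports=2, write_ports=3):
        self.regs = [0] * size
        self.read_ports = read_ports
        self.write_ports = write_ports

    def cycle(self, reads, writes):
        """Perform one cycle.

        reads:  list of read addresses, at most `read_ports` entries
        writes: list of (address, value) pairs, at most `write_ports` entries
        Returns the values read, in the order of `reads`.
        """
        if len(reads) > self.read_ports:
            raise ValueError("too many reads for the available read ports")
        if len(writes) > self.write_ports:
            raise ValueError("too many writes for the available write ports")
        # Assumption: reads observe the state before this cycle's writes.
        results = [self.regs[addr] for addr in reads]
        for addr, value in writes:
            self.regs[addr] = value
        return results


rf = MultiPortRegisterFile(size=8)
rf.cycle(reads=[], writes=[(0, 5), (1, 7)])
old_values = rf.cycle(reads=[0, 1], writes=[(0, 9)])   # sees 5 and 7
new_value = rf.cycle(reads=[0], writes=[])             # sees 9
```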
15Functional Unit Model
[Figure: Functional Unit with an Opcode input, input ports (operands) O1, O2, O3, optional memory ports M1, M2, and output ports (results) R1, R2]
16Parameters Functional Unit
- Within a Functional Unit
- Processing elements (PEs)
- Interconnect Routing between PEs
- allows spatial computation
- For Large Functional Units
- Segments (n)
- PE (type, granularity)
- Intra Segment Interconnect (type)
- Inter Segment Interconnect (type)
17Functional Unit Internal Model
[Figure: a functional unit (functional unit border) divided into segments (segment border) holding PEs (PE1, PE2, PE3), joined by Intra Segment Interconnect and Inter Segment Interconnect; external connections: opcode, inputs, outputs, memory ports]
18Some Examples of Functional Units
- One segment
- PE: ALU
- No interconnect
- One segment
- PEs: LUT
- Interconnect
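The second example (one segment of LUT PEs plus interconnect) can be sketched in Python as follows. This is only an illustration of spatial computation under assumed names: `LutPE`, `LutSegment` and the routing format are hypothetical, and the full adder is a textbook circuit chosen for the demo, not something shown on the slides.

```python
class LutPE:
    """Processing element implemented as a k-input lookup table."""

    def __init__(self, truth_table):
        self.table = truth_table           # 2**k output bits, index = input bits

    def eval(self, bits):
        index = 0
        for bit in bits:
            index = (index << 1) | bit     # first input is the most significant
        return self.table[index]


class LutSegment:
    """One segment: PEs plus an intra-segment interconnect (routing table)."""

    def __init__(self, pes, routing):
        # routing[i] lists the sources of PE i: ("in", j) is segment input j,
        # ("pe", j) is the output of an earlier PE j (j < i).
        self.pes = pes
        self.routing = routing

    def eval(self, inputs):
        outs = []
        for pe, sources in zip(self.pes, self.routing):
            bits = [inputs[j] if kind == "in" else outs[j] for kind, j in sources]
            outs.append(pe.eval(bits))
        return outs


XOR2, AND2, OR2 = [0, 1, 1, 0], [0, 0, 0, 1], [0, 1, 1, 1]

# 1-bit full adder laid out spatially over five 2-input LUTs.
full_adder = LutSegment(
    pes=[LutPE(XOR2), LutPE(XOR2), LutPE(AND2), LutPE(AND2), LutPE(OR2)],
    routing=[
        [("in", 0), ("in", 1)],   # pe0 = a xor b
        [("pe", 0), ("in", 2)],   # pe1 = sum
        [("in", 0), ("in", 1)],   # pe2 = a and b
        [("pe", 0), ("in", 2)],   # pe3 = (a xor b) and cin
        [("pe", 2), ("pe", 3)],   # pe4 = carry out
    ],
)


def add_bits(a, b, cin):
    outs = full_adder.eval([a, b, cin])
    return outs[1], outs[4]               # (sum, carry)
```

Replacing the truth tables or the routing list yields a different operation on the same PEs, which is how a LUT-based functional unit supports a changeable instruction set.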
19Types of processing elements
- Some examples
- Registers
- Computation elements
- ALU
- multiplier
- LUT
- subword shuffler
- Constant generators
20Memory
21CRISP Configurable Parameter Hierarchy
CRISP
Memory Cluster
Global Interconnect
CPU Core
Data Path
Inter-cluster Interconnect
Cluster
Intra-cluster Interconnect
Functional Unit
Register File
Inter-segment Interconnect
Segment
Processing Element
Intra-segment Interconnect
22A CRISP Multiprocessor System
Main Memory
L2 Cache
L1 I Cache
L1 I Cache
L1 Loop Cache
CPU core
CPU core
L1 D Cache
23Memory Cluster
Memory
Memory Selection Mechanism
Decoder
Memory Cluster 1
Memory Selection Mechanism
Memory
Memory
Memory Cluster 2
Decoder
Decoder
24Parameters Memory Cluster
- Within each Memory Cluster
- Memories (n)
- Decoders (n)
- Memory Cluster Interconnect
- I/p connected to Memory/Decoder
- O/p connected to Decoder/Memory
- Memory Selection Mechanism
- Address translators
25Parameters Memory
- For each Memory
- Size (width x height)
- Internal Memory Organization
- Local Memory Control
- Loop Control
- Cache Replacement Policies
- FIFO control
26Parameters Decoder
- Types
- Fixed Decoding
- Memory Based
- Size (width x height)
- Hybrid
- Combination of both
- Software
n bits
Multiplexer
Decoder Memory
Fixed Decoder
m bits
Fixed Decoding
Memory based Decoding
Hybrid
- Variation in n can be large
- Variation in m can be large
- Variation in n is small
- Variation in m is small
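A hybrid decoder, as described above, combines a fixed decoder and a decoder memory behind a multiplexer. The Python sketch below assumes that the top bit of the n-bit input word drives the multiplexer and that the remaining bits index the selected decoder; that selection scheme, the class names and the example control words are illustrative assumptions, not details from the slides.

```python
class FixedDecoder:
    """Hard-wired mapping, chosen at design time and never rewritten."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def decode(self, index):
        return self._mapping[index]


class DecoderMemory:
    """Memory-based decoder: rows can be rewritten between applications."""

    def __init__(self, depth, width_bits):
        self.width_bits = width_bits       # m: width of the control word
        self.rows = [0] * depth

    def write(self, index, control_word):
        if control_word >> self.width_bits:
            raise ValueError("control word wider than m bits")
        self.rows[index] = control_word

    def decode(self, index):
        return self.rows[index]


class HybridDecoder:
    """Multiplexer in front of a fixed decoder and a decoder memory."""

    def __init__(self, fixed, memory, n_bits):
        self.fixed = fixed
        self.memory = memory
        self.select_shift = n_bits - 1     # assumed: top bit is the mux select

    def decode(self, word):
        use_memory = (word >> self.select_shift) & 1
        index = word & ((1 << self.select_shift) - 1)
        source = self.memory if use_memory else self.fixed
        return source.decode(index)


fixed = FixedDecoder({0b000: 0x01, 0b001: 0x02})
memory = DecoderMemory(depth=8, width_bits=8)
memory.write(0b000, 0xA0)
hybrid = HybridDecoder(fixed, memory, n_bits=4)

fixed_result = hybrid.decode(0b0001)       # served by the fixed decoder
memory_result = hybrid.decode(0b1000)      # served by the decoder memory
```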
27An Example (Program Memory)
- Cluster
- Cluster 1: 1 output (8b)
- Memory: 1 input (4b), 1 output (4b)
- SDRAM, 1204b x 4b
- Decoder: input (4b), output (8b)
- Fixed
- Cluster 2: 2 inputs (4b, 4b), 2 outputs (8b, 8b)
- Memory 1: 1 input (4b), 1 output (4b)
- Cache
- Memory 2: 1 input (4b), 1 output (4b)
- Cache
- Decoder 1: input (4b), output (8b)
- Fixed
- Decoder 2: input (4b), output (8b)
- Fixed
- Loop Control
- Cluster 3: 3 inputs (4b, 4b, 4b, 4b), 4 outputs (8b, 4b, 6b, 8b)
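The first two clusters of this example can be written down as data and checked for width consistency. The sketch below assumes that each memory feeds exactly one decoder (a "lane") and that each decoder output is one cluster output; the class names and the `check` method are hypothetical. Cluster 3 and the loop control are left out because the slide gives only port widths for them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Memory:
    kind: str          # e.g. "SDRAM", "Cache"
    in_bits: int
    out_bits: int


@dataclass
class Decoder:
    kind: str          # e.g. "Fixed"
    in_bits: int
    out_bits: int


@dataclass
class MemoryCluster:
    name: str
    lanes: List[Tuple[Memory, Decoder]]   # assumed pairing: memory -> decoder
    output_bits: List[int]

    def check(self):
        """Return a list of width mismatches (empty if consistent)."""
        problems = []
        if len(self.lanes) != len(self.output_bits):
            problems.append(f"{self.name}: lane/output count mismatch")
        lanes = zip(self.lanes, self.output_bits)
        for i, ((mem, dec), out) in enumerate(lanes, start=1):
            if mem.out_bits != dec.in_bits:
                problems.append(f"{self.name} lane {i}: memory feeds "
                                f"{mem.out_bits}b into a {dec.in_bits}b decoder")
            if dec.out_bits != out:
                problems.append(f"{self.name} lane {i}: decoder gives "
                                f"{dec.out_bits}b, cluster expects {out}b")
        return problems


cluster1 = MemoryCluster(
    "Cluster 1",
    lanes=[(Memory("SDRAM", 4, 4), Decoder("Fixed", 4, 8))],
    output_bits=[8],
)
cluster2 = MemoryCluster(
    "Cluster 2",
    lanes=[(Memory("Cache", 4, 4), Decoder("Fixed", 4, 8)),
           (Memory("Cache", 4, 4), Decoder("Fixed", 4, 8))],
    output_bits=[8, 8],
)

for cluster in (cluster1, cluster2):
    print(cluster.name, cluster.check() or "consistent")
```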
28Summary of the Memory parameters
29Methodology
30Conclusions
- New template to do research
- Complete decoder hierarchy
- Final remark
- Good, isn't it?
31Motivation
- Power consumption due to Instruction/Data traffic
- Significant in
- Current Processors (VLIW)
- FPGAs: very, very long control word (configuration word: V2LIW or even RLIW)
- More significant when scaled
- More Functional Units in parallel
- To address this
- Control Memory Exploration