Title: Advanced Processor Architectures
1 Design flow
C6201 code
VLIW
2000200005290079000403E207BC11A10190CA0502807FAA0
63D2275073D82F7061829A04C
2002200007BC11A0033C426501BC62E6063D0267063D22E50
23401A1019C03594514005AE5 2002400005BCC26705BCE2E
5008D2078073D8267073DA2E406BD426706BD62E5013827E05
A 20026000053C8267053CA2E5000C036207BDD2E4051002B
600900234A0100234009813A2DB 00000001FF
[Figure: each VLIW address holds a fetch packet of eight 32-bit instructions (Instruction 1, Instruction 2, .., Instruction 8); the word 05290079 is one of them]
2 Design flow
Instruction analysis example: RA10 RA10 RA8 on L2 (destination RA10, operands RA10 and RA8)
05290079 = 0000 0101 0010 1001 0000 0000 0111 1001 (bits b31 down to b0)
b0: parallel bit
b1 to b4: unit used
b5 to b11: operation type
b13 to b17: first operand register (RA8)
b18 to b22: second operand register (RA10)
b23 to b27: destination register (RA10)
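The field list above can be checked mechanically. The sketch below extracts exactly those bit ranges from the example word 05290079; the function names are illustrative, not part of the authors' tools. On the C62x, b1 is, as far as we know from TI's instruction-format documentation, the side-select bit, so the "unit" value here mixes side and unit code; b12 and b28 to b31 are not decoded.

```python
# Sketch: split a C6201 instruction word into the fields listed on the slide.
# Field boundaries are the slide's; b12 and b28..b31 are deliberately ignored.

def bits(word: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of a 32-bit word as an unsigned integer."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def decode(word: int) -> dict:
    return {
        "parallel": bits(word, 0, 0),  # b0: parallel bit
        "unit": bits(word, 1, 4),      # b1..b4: unit used
        "op": bits(word, 5, 11),       # b5..b11: operation type
        "src1": bits(word, 13, 17),    # b13..b17: first operand register
        "src2": bits(word, 18, 22),    # b18..b22: second operand register
        "dst": bits(word, 23, 27),     # b23..b27: destination register
    }


word = 0x05290079
print(f"{word:032b}")  # 00000101001010010000000001111001
print(decode(word))    # {'parallel': 1, 'unit': 12, 'op': 3, 'src1': 8, 'src2': 10, 'dst': 10}
```

The decoded register numbers (src1 = 8, src2 = 10, dst = 10) match RA8, RA10 and RA10 on the slide.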
3 Design flow
Hardware set analysis results
Registers
Calculating units (CU)
Operations for each unit
Register in for each unit
Register out for each unit
Other elements
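As a minimal sketch of how these results could be accumulated, assuming each instruction has already been decoded into a unit, an operation, a destination and its sources (the `Instr` and `HardwareSet` types and the sample program are hypothetical), one pass over the program is enough:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instr:
    """One decoded instruction (hypothetical representation)."""
    unit: str    # calculating unit, e.g. "L1"
    op: str      # operation, e.g. "ADD"
    dst: str     # destination register
    srcs: tuple  # source registers


@dataclass
class HardwareSet:
    """Resources actually used by a program, grouped as on the slide."""
    registers: set = field(default_factory=set)
    units: set = field(default_factory=set)
    ops: dict = field(default_factory=lambda: defaultdict(set))      # unit -> operations
    reg_in: dict = field(default_factory=lambda: defaultdict(set))   # unit -> registers read
    reg_out: dict = field(default_factory=lambda: defaultdict(set))  # unit -> registers written


def analyse(program):
    """Accumulate the hardware set over every instruction of the program."""
    hw = HardwareSet()
    for ins in program:
        hw.units.add(ins.unit)
        hw.ops[ins.unit].add(ins.op)
        hw.reg_in[ins.unit].update(ins.srcs)
        hw.reg_out[ins.unit].add(ins.dst)
        hw.registers.update(ins.srcs)
        hw.registers.add(ins.dst)
    return hw


program = [  # hypothetical sample program
    Instr("L1", "ADD", "RA10", ("RA8", "RA10")),
    Instr("L1", "SUB", "RA5", ("RA0", "RA1")),
    Instr("L1", "ABS", "RA1", ("RA5",)),
    Instr("D2", "LDW", "RB7", ("RB3",)),
]
hw = analyse(program)
```

Only what appears in `hw` needs to be generated in hardware; everything else in the full C6201 can be left out.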
4 Design flow
Source written in C or C6201 assembler
Automatic processor VHDL model generation
Automatic hardware characteristics extraction
Functional software simulation
Synthesis and routing
Hardware database
C6201 code
FPGA bitstream
5 Design flow
VHDL model generation: 3 kinds of data
Analysis data
Configuration data
Constant data
[Figure: model elements labelled DSP, RA0, RA1, RA5, RB3, RB7, Peripheral 1, Peripheral 2, State registers, L1, L2, D1, D2, M1, PC and Memory management]
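6 Design flow
Automatic generation
[Figure: generated L1 unit with the Addition, Subtraction and Absolute value operations; input multiplexers of 5 inputs max and 32 inputs max; connected elements RA0, RA1, RA5, RB3, RB7 and D2]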
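To illustrate the principle of generating a unit with only the operations found by the analysis, here is a hypothetical generator that emits a VHDL entity whose operation selector covers just those operations. The entity, port and operation names are invented for the example, and the input multiplexer sizes shown on the slide are not modelled; this is not the authors' generator.

```python
# Hypothetical generator: one VHDL case branch per operation actually used.
VHDL_OPS = {
    "ADD": "std_logic_vector(signed(a) + signed(b))",
    "SUB": "std_logic_vector(signed(a) - signed(b))",
    "ABS": "std_logic_vector(abs(signed(a)))",
}


def generate_unit(name, ops):
    """Return (VHDL source, operation -> selector code) for a reduced unit."""
    ops = sorted(ops)
    codes = {op: i for i, op in enumerate(ops)}
    sel_width = max(1, (len(ops) - 1).bit_length())  # smallest selector that fits
    lines = [
        "library ieee;",
        "use ieee.std_logic_1164.all;",
        "use ieee.numeric_std.all;",
        "",
        f"entity unit_{name} is",
        "  port (",
        "    a, b : in  std_logic_vector(31 downto 0);",
        f"    sel  : in  std_logic_vector({sel_width - 1} downto 0);",
        "    y    : out std_logic_vector(31 downto 0));",
        f"end entity unit_{name};",
        "",
        f"architecture rtl of unit_{name} is",
        "begin",
        "  process (a, b, sel)",
        "  begin",
        "    case to_integer(unsigned(sel)) is",
    ]
    for op in ops:
        lines.append(f"      when {codes[op]} => y <= {VHDL_OPS[op]};  -- {op}")
    lines += [
        "      when others => y <= (others => '0');",
        "    end case;",
        "  end process;",
        "end architecture rtl;",
    ]
    return "\n".join(lines), codes


vhdl, codes = generate_unit("L1", {"ADD", "SUB", "ABS"})
print(vhdl)
```

With three operations the selector needs only 2 bits, instead of the 7-bit operation field of the full instruction set.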
7 Design flow
An example: the Sobel filter
- Assembler source from the TI library
- CCS (Code Composer Studio) simulation
- Analysis and model generation
- VHDL model test with ModelSim
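TI's library routine is C6000 assembler and is not reproduced here. The sketch below only shows, under the assumption of the usual |Gx| + |Gy| approximation, why a Sobel filter needs little more than addition, subtraction and absolute value (the factor 2 is written as p + p; the final saturation is a comparison):

```python
def sobel_magnitude(img):
    """|Gx| + |Gy| on interior pixels of a 2-D list, saturated to 255."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            def p(dy, dx):
                return img[y + dy][x + dx]
            # Horizontal gradient: right column minus left column (centre counted twice)
            gx = (p(-1, 1) + p(0, 1) + p(0, 1) + p(1, 1)) - (p(-1, -1) + p(0, -1) + p(0, -1) + p(1, -1))
            # Vertical gradient: bottom row minus top row (centre counted twice)
            gy = (p(1, -1) + p(1, 0) + p(1, 0) + p(1, 1)) - (p(-1, -1) + p(-1, 0) + p(-1, 0) + p(-1, 1))
            out[y][x] = min(abs(gx) + abs(gy), 255)  # saturate to 8 bits
    return out


edge = [[0, 0, 10, 10] for _ in range(4)]  # vertical edge between columns 1 and 2
print(sobel_magnitude(edge)[1])
```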
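8 Design flow
[Figure: generated model for the Sobel filter, with the operations Addition, Subtraction and Absolute value and the registers RA3, RA5, RA8, RA9, RA10, RA12, RA13, RA14, RB3, RB10 and RB11]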
9 Design flow
[Figure: generated connections involving the units L1 and S1 and the registers RB1 and RA2]
10 Design flow
For the Sobel filter:
- 14 operations out of 186 (8 %)
- 37 input connections out of 384 (10 %)
- 36 output connections out of 160 (22 %)
1610 slices after synthesis (15 % of a Virtex-II 2000)
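The percentages on this slide can be re-derived from the raw counts. The sketch assumes the Xilinx Virtex-II XC2V2000 has 10,752 slices, which reproduces the reported 15 %:

```python
# Assumption: the Virtex-II XC2V2000 device has 10,752 slices.
XC2V2000_SLICES = 10_752


def pct(part, whole):
    """Share of `whole` used by `part`, in percent."""
    return 100 * part / whole


usage = {
    "operations": pct(14, 186),
    "input connections": pct(37, 384),
    "output connections": pct(36, 160),
    "slices": pct(1610, XC2V2000_SLICES),
}
for name, value in usage.items():
    print(f"{name}: {value:.1f} %")
# operations: 7.5 %
# input connections: 9.6 %
# output connections: 22.5 %
# slices: 15.0 %
```

The same computation gives about 134 % for the 14361-slice full DSP reported in the multi-processor results, i.e. a complete C6201 model does not fit in one Virtex-II 2000, whereas the reduced Sobel model uses about 15 % of it.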
11 Plan
- Conclusion and perspectives
12 Experimental validations
We use the multimedia board:
Video codecs
FPGA Virtex II - 2000
UART
5 external memories
13 Experimental validations
Test platform architecture
[Figure: test platform with video input and output, input codec, acquisition, ZBT RAM, restitution, output codec, DMA channels, the DSP C6201 model, a Picoblaze and a serial link]
14 Experimental validations
Neural network recognition
Extraction (4032 pixels)
Filter
Pre-processing
Distance calculation
(9.6 µs)
(3.2 µs)
25 % of the slices (5 Picoblaze and other logic)
Multi-processor application possible
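The slides do not state which network or distance is used. Assuming prototype-based hidden neurons compared with a Manhattan (L1) distance, a common choice because it needs only subtraction, absolute value and addition, the distance step can be sketched as follows (all names and data are hypothetical):

```python
def manhattan(x, w):
    """Sum of absolute differences between an input vector and a prototype."""
    return sum(abs(a - b) for a, b in zip(x, w))


def distances(x, prototypes):
    """One distance per hidden neuron; each is independent of the others."""
    return [manhattan(x, w) for w in prototypes]


def classify(x, prototypes, labels):
    """Return the label of the closest prototype and all the distances."""
    d = distances(x, prototypes)
    best = min(range(len(d)), key=d.__getitem__)
    return labels[best], d


protos = [[0, 0, 0], [10, 10, 10], [5, 0, 5]]
label, d = classify([4, 1, 4], protos, ["a", "b", "c"])
print(label, d)  # c [9, 21, 3]
```

Because each neuron's distance is independent, the neuron set can be split between processors and the partial lists concatenated, which is what makes a multi-processor version possible.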
15 Experimental validations
[Figure: multi-processor test platform (video input and output, input codec, acquisition, ZBT RAM, restitution, output codec, DMA channels, Picoblaze and serial link) with four DSP C6201 instances, DSP 1 to DSP 4; tasks shown: pre-processing, distance calculation with 3 hidden neurons, distance calculation with 4 hidden neurons]
16 Experimental validations
Multi-processor synthesis results
A full DSP uses 14361 slices (134 % of a Virtex-II 2000)
7.8 images/second (100 MHz)
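Assuming 100 MHz is the clock of the processors, the frame rate translates into the following cycle and time budget per image:

```latex
\frac{100 \times 10^{6}\ \text{cycles/s}}{7.8\ \text{images/s}} \approx 1.28 \times 10^{7}\ \text{cycles per image},
\qquad
\frac{1}{7.8\ \text{images/s}} \approx 128\ \text{ms per image}
```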
17 Experimental validations
Design flow tested with current algorithms
Multi-processor application possible
Significant simplification factor
18 Experimental validations
Multi-processor applications
Need for tools to place and schedule tasks on the different processors
19 Plan
- Conclusion and perspectives
20 SynDEx
- SynDEx made
- Architecture Algorithm Adequacy (AAA)
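[Figure: an algorithm graph (tasks T1 to T5) and an architecture graph (processors P1, P2, P3) are combined by the adequacy step, which distributes and schedules the tasks on the processors]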
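SynDEx's own adequacy heuristics are not reproduced here. As a rough illustration of the kind of problem it solves, the sketch below is a generic greedy list scheduler: tasks are taken in dependency order and each is placed on the processor where it finishes earliest, paying a fixed, hypothetical communication cost when a predecessor ran elsewhere. The task durations are invented.

```python
def list_schedule(tasks, deps, n_procs, comm=1):
    """Greedy earliest-finish-time mapping of a task DAG (assumed acyclic).

    tasks: dict name -> duration; deps: dict name -> list of predecessors.
    Returns dict name -> (processor, start, end).
    """
    proc_free = [0] * n_procs  # time at which each processor becomes idle
    placed = {}
    remaining = dict(tasks)
    while remaining:
        # Ready tasks: all predecessors already placed (alphabetical tie-break)
        ready = sorted(t for t in remaining if all(p in placed for p in deps.get(t, [])))
        t = ready[0]
        best = None
        for p in range(n_procs):
            # Data arrives when the last predecessor ends, plus a transfer if remote
            data_ready = max((placed[q][2] + (0 if placed[q][0] == p else comm)
                              for q in deps.get(t, [])), default=0)
            start = max(proc_free[p], data_ready)
            end = start + tasks[t]
            if best is None or end < best[2]:
                best = (p, start, end)
        placed[t] = best
        proc_free[best[0]] = best[2]
        del remaining[t]
    return placed


durations = {"T1": 2, "T2": 3, "T3": 3, "T4": 2, "T5": 1}
deps = {"T2": ["T1"], "T3": ["T1"], "T4": ["T2"], "T5": ["T3", "T4"]}
for task, (proc, start, end) in list_schedule(durations, deps, n_procs=3).items():
    print(f"{task}: P{proc + 1} [{start}, {end})")
```

With these numbers only T3 is worth moving to a second processor: the communication cost makes a third processor useless, which is exactly the kind of trade-off the adequacy step has to evaluate.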
21 SynDEx
Generates many files:
- Architecture description (number of processors, communication links)
SynDEx provides code generation tools for some target processors
22 SynDEx
SynDEx-IC: SynDEx circuit extension proposed by the A²SI laboratory
(source: presentation by Thierry GRANDPIERRE, ESIEE-A²SI)
[Figure: SynDEx-IC as a link between SynDEx, the design flow and VHDL]
Goal: to establish a bridge between SynDEx and the design flow
23 SynDEx
[Figure: SynDEx-IC flow. Algorithm decomposition into processing functions gives the software graph; the architecture specification gives the hardware graph; with the processing code and function times, SynDEx performs function dispatching, function scheduling and communication management. Outputs: DSP code (simulator, compiler), control units VHDL generator, DSP VHDL generator, VHDL files fusion]
24 SynDEx
SynDEx macro-code to Picoblaze translator
[Figure: the SynDEx processor model (a calculating scheduler running tasks T1, T2, T3 and a communication scheduler handling communications, with synchronization between them) is mapped onto a processor architecture with a UC (Picoblaze communications) and a UT (Picoblaze processing), two ROMs, two DMA channels and a private RAM]
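The synchronization between the calculating scheduler and the communication scheduler can be pictured as a producer/consumer pair. The sketch below models it with two threads and two semaphores around a one-slot buffer; the names, the buffer and the sample tasks are assumptions for illustration, not SynDEx's generated macro-code.

```python
import threading


def run(tasks):
    """Processing side computes tasks in order; communication side forwards each result."""
    buffer = [None]                 # one-slot shared buffer (stands in for the private RAM)
    empty = threading.Semaphore(1)  # released when the slot is free
    full = threading.Semaphore(0)   # released when the slot holds a result
    sent = []

    def processing_unit():          # UT: runs the tasks in schedule order
        for name, fn, arg in tasks:
            result = fn(arg)
            empty.acquire()         # wait until the previous result was sent
            buffer[0] = (name, result)
            full.release()

    def communication_unit():       # UC: forwards each result (e.g. over a DMA channel)
        for _ in tasks:
            full.acquire()          # wait for a result to be available
            sent.append(buffer[0])
            empty.release()

    threads = [threading.Thread(target=processing_unit),
               threading.Thread(target=communication_unit)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent


print(run([("T1", lambda v: v + 1, 1), ("T2", lambda v: v * 2, 3), ("T3", abs, -4)]))
# [('T1', 2), ('T2', 6), ('T3', 4)]
```

Splitting processing and communication onto two units lets T2 start computing while the result of T1 is still being transferred.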
25 SynDEx
- Principle validated with the Shen algorithm
Processing functions on two processors
Communication through BlockRAM
VHDL implementation: 6 % of the slices of a Virtex-II 2000
26 Plan
- Conclusion and perspectives
27 Conclusion
Fast-prototyping FPGA development for non-electronics experts
Automatic generation of adapted processors: ASIP (Application-Specific Instruction-set Processor)
Co-design possible in combination with existing tools (Scilab/Scicos)
Multi-processor applications optimised through the SynDEx interface
28 Perspectives
- From Picoblaze to DSP in the UT
- More multi-processor applications with a simplified VHDL model