Title: DIMES: An Iterative Emulation Platform for Multiprocessor-System-On-Chip Designs
1DIMES: An Iterative Emulation Platform for Multiprocessor-System-On-Chip Designs
- Hirofumi Sakane, Levent Yakay, Vishal Karna,
- Clement Leung, Guang R. Gao
- Department of Electrical and Computer Engineering
- University of Delaware
- Agenda
- Background and Objectives
- Iterative Emulation Scheme
- DIMES/P Architecture
- Cyclops-E on DIMES/P
- Experimental Results
- Conclusion
2Background
- MpSOC (Multiprocessor System On Chip)
- High performance for computation-intensive applications
- Large amount of logic
- Simulation issues in development with software tools
3Background (contd)
- Logic emulation by hardware
- Much faster than logic simulation by software
- Clock level accuracy
- Problems in multiprocessor emulation
- Cost
- 1 FPGA for 1 PE
- So, 100 FPGAs for 100 PEs? 1M
- Scalability
- What if the number of PEs in the spec is changed?
4Objectives
- A Cost-effective emulation platform for MpSOC
- Emulation of a number of PEs within 1 FPGA
- Applicable to
- Logic verification
- Early software development
- Computer architecture research
5How to put a number of PEs into 1 FPGA?
6Solution: Iterative Emulation
- Large multiprocessor emulation on 1 FPGA
- Key idea: Time-multiplexing
- Single PE logic is shared and iteratively used
- Logic state for each PE is replaced cycle by cycle
- Effective use of memory devices for logic states
- Simple hardware
- Essential components
- An FPGA
- External memory chips
- DIMES: Delaware Iterative Multiprocessor Emulation System
7Iterative Emulation Scheme
[Figure: two sets of 16 PEs, numbered 0 to 15]
8Iterative Emulation Scheme (contd)
- Time-multiplexing
- Virtual cycle is divided into N iteration cycles (N = number of PEs)
- Each processor emulated in each iteration cycle
[Figure: timeline in which each virtual cycle is made up of the iteration cycles P0, P1, P2, P3, P4, ..., PN-1, after which the sequence repeats from P0]
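The time-multiplexing idea can be illustrated with a minimal Python sketch. It is not the DIMES implementation: `toy_pe_step` is a hypothetical stand-in for the shared PE logic, and the sketch assumes that PEs do not exchange signals within a virtual cycle, so updating them one after another gives the same result as updating them all at once.

```python
from typing import Callable, List, Tuple

PEStep = Callable[[int, int], int]  # (pe_id, current_state) -> next_state


def emulate_parallel(step: PEStep, states: List[int], virtual_cycles: int) -> List[int]:
    """Reference model: N copies of the PE logic, all updated in one cycle."""
    states = list(states)
    for _ in range(virtual_cycles):
        states = [step(pe_id, s) for pe_id, s in enumerate(states)]
    return states


def emulate_iterative(step: PEStep, states: List[int],
                      virtual_cycles: int) -> Tuple[List[int], int]:
    """One shared copy of the PE logic, reused once per iteration cycle."""
    state_memory = list(states)       # one logic-state slot per emulated PE
    n_pes = len(state_memory)
    iteration_cycles = 0
    for _ in range(virtual_cycles):
        for pe_id in range(n_pes):    # N iteration cycles = 1 virtual cycle
            state_memory[pe_id] = step(pe_id, state_memory[pe_id])
            iteration_cycles += 1
    return state_memory, iteration_cycles


def toy_pe_step(pe_id: int, state: int) -> int:
    """Hypothetical PE logic: an 8-bit counter that advances by pe_id + 1."""
    return (state + pe_id + 1) % 256
```

Running both models on four PEs for three virtual cycles gives the same final states; the iterative model needs 4 x 3 = 12 iteration cycles to get there, which is the speed cost paid for sharing one copy of the logic.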
9Iterative Emulation Scheme (contd)
[Figure: emulation cycles 0 to 15 within virtual cycle 0, followed by emulation cycles 0 to 15 within virtual cycle 1]
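As a small illustration of this schedule, the sketch below maps a running count of iteration steps to a (virtual cycle, PE) pair. The function name `schedule_slot` and the fixed round-robin PE order are assumptions made for illustration, not details taken from the slides.

```python
from typing import Tuple


def schedule_slot(step_index: int, n_pes: int) -> Tuple[int, int]:
    """Return (virtual_cycle, pe_id) for the step_index-th iteration step.

    Assumes PEs are visited in a fixed round-robin order 0 .. n_pes - 1.
    """
    if n_pes < 1:
        raise ValueError("n_pes must be at least 1")
    virtual_cycle, pe_id = divmod(step_index, n_pes)
    return virtual_cycle, pe_id
```

With 16 PEs, steps 0 to 15 fall in virtual cycle 0 and steps 16 to 31 in virtual cycle 1.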
10Logic composition
Regular memory structure
Logic states
Combinational logic
11Memory Elements in Iterative Emulation
Combinational logic
Memory element
12Memory Elements in Iterative Emulation
Combinational logic
Memory element
13Iterative Emulation using Shift Registers
- n-bit shift registers for n processors
- Replace every flip-flop with shift registers
[Figure: a flip-flop (register) attached to combinational logic, next to the same logic with the flip-flop replaced by a shift register holding the states of P0, P1, P2, P3]
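The flip-flop replacement can be sketched in Python as follows. This is a behavioral illustration only: the class and function names are hypothetical, and the shared `logic` function stands in for the combinational logic between registers.

```python
from collections import deque
from typing import Callable, Iterable, List


class ShiftRegisterState:
    """Stand-in for one flip-flop replaced by an n-stage shift register."""

    def __init__(self, initial: Iterable[int]) -> None:
        # Stage 0 is the output side; it holds the state of the PE
        # that is emulated in the current iteration cycle.
        self._stages = deque(initial)

    def output(self) -> int:
        return self._stages[0]

    def shift(self, next_value: int) -> None:
        # Drop the value just consumed and append the newly computed one,
        # so the next PE's state moves to the output.
        self._stages.popleft()
        self._stages.append(next_value)

    def snapshot(self) -> List[int]:
        return list(self._stages)


def run_virtual_cycles(reg: ShiftRegisterState, virtual_cycles: int,
                       logic: Callable[[int], int]) -> None:
    """Run the shared logic once per PE per virtual cycle."""
    n_pes = len(reg.snapshot())
    for _ in range(virtual_cycles * n_pes):
        reg.shift(logic(reg.output()))
```

After every full virtual cycle the stages are back in P0 .. P3 order, each advanced by one step of the shared logic. For example, starting from `[0, 10, 20, 30]` with `logic = lambda x: x + 1`, two virtual cycles leave `[2, 12, 22, 32]`.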
14Workflow
15DIMES/P Architecture
- Prototype of DIMES
- Off-the-shelf board (Alpha-Data ADM-XRC-II)
- One Xilinx XC2V8000
- SRAMs and DRAMs
- PCI
16Target: Cyclops-E Architecture
- An MpSOC core based on the IBM BlueGene/Cyclops architecture
- 8 PEs in original design
- 4 TUs (thread units) / PE
- Each TU is a simple 32-bit RISC
- Logic size of 2 or more PEs > 1 FPGA
17Adjusting Cyclops-E Into One FPGA
- Everything within the FPGA, to simplify the first trial
- All architectural memory structures and logic states go into the FPGA's embedded memory elements
- No external memory use
- Shift registers implemented by cascaded FFs
- Global memory, SPM, I-cache implemented by BRAMs
- Cut down the number of PEs to 2
- Due to the resource limit of memory elements
18Cyclops-E Emulator on DIMES/P
- 2 PE Cyclops-E
- All the memory fits into the embedded memory of the FPGA
- Iterative emulation modules
- PE
- Global memory logic and switch
[Figure: block diagram with PE 0 and PE 1, each with a 16KB SPM; Switch0 and Switch1; Global memory 0 and Global memory 1, 64KB each; a 32KB I-cache; a host interface; and a network]
19Experiment
- Synthesis and implementation
- 2-PE Cyclops-E
- Comparison
- Non-iterative emulation
- Iterative emulation
- FPGA development tool: Xilinx ISE 5.2i sp3
- Translation for iterative emulation
- Handwork for assignment of iterative emulation
- Automation for regular translation forms
20FPGA Resource Usage
- Successfully reduced the resource usage to fit within 1 FPGA
- From 111% to 75%
- 7.6MHz for virtual cycle
- 15.2MHz for iteration cycle
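The two clock rates are consistent with a 2-PE design in which one virtual cycle takes two iteration cycles. A minimal check, assuming no extra overhead cycles per virtual cycle (the function name is hypothetical):

```python
def virtual_clock_mhz(iteration_clock_mhz: float, n_pes: int) -> float:
    """Virtual-cycle rate when one virtual cycle takes n_pes iteration cycles.

    Assumes no overhead cycles beyond one iteration cycle per emulated PE.
    """
    if n_pes < 1:
        raise ValueError("n_pes must be at least 1")
    return iteration_clock_mhz / n_pes


# 2-PE Cyclops-E: a 15.2 MHz iteration clock gives a 7.6 MHz virtual clock.
two_pe_virtual_mhz = virtual_clock_mhz(15.2, 2)
```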
21Software for Cyclops
- Software research on cellular architecture
- Another project using DIMES
- Multithreaded programming model for cellular architecture
- Compiler
- Debugger
- Thread library
- C-thread
- Applications
- Runtime system
- Both on Cyclops and host
- Host interface
22Future Work
- Use of external memory
- Regular memory structures
- SPM, global memory, I-cache
- Main memory
- Embedded BRAMs for logic states
- Shift registers implemented by addressable memory
- 8-PE Cyclops-E
- Larger systems
- Cyclops-64
- 64 bits, 80 PEs / chip, 2 TUs / PE, 1 FPU / PE
- Multi-chip emulation
- Custom-designed DIMES PCB
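One way to read "shift registers implemented by addressable memory" is that the per-PE states stay in place in a RAM while an address counter rotates, instead of the data moving through cascaded flip-flops. The sketch below illustrates that reading only; the class name and interface are hypothetical and do not describe the planned DIMES design.

```python
from typing import Iterable, List


class MemoryBackedShiftRegister:
    """Shift-register behavior built from an addressable memory and a counter."""

    def __init__(self, initial: Iterable[int]) -> None:
        self._ram: List[int] = list(initial)  # one word per emulated PE
        self._addr = 0                        # PE served in this iteration cycle

    def output(self) -> int:
        return self._ram[self._addr]

    def shift(self, next_value: int) -> None:
        # Write the new state back in place, then point at the next PE.
        self._ram[self._addr] = next_value
        self._addr = (self._addr + 1) % len(self._ram)

    def snapshot(self) -> List[int]:
        return list(self._ram)
```

Only one word is read and one word written per iteration cycle, so the storage could sit in block RAM or external memory rather than in flip-flops.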
23Conclusion
- Developed DIMES/P
- A hardware emulator for multiprocessor
- Iterative emulation
- To reduce emulation resources
- Time-multiplexing
- Implemented 2-PE Cyclops-E
- Non-iterative: 111% (overflow)
- Iterative: 75% (fit)
24Headhunting
- Research opportunity at
- Computer Architecture and Parallel Systems Laboratory (CAPSL), led by Professor Gao,
- in the Department of Electrical and Computer Engineering,
- University of Delaware
- Looking for good
- Graduate students
- Post-docs