Hirofumi Sakane, Levent Yakay, Vishal Karna, - PowerPoint PPT Presentation

1
DIMES: An Iterative Emulation Platform for Multiprocessor-System-On-Chip Designs
  • Hirofumi Sakane, Levent Yakay, Vishal Karna, Clement Leung, Guang R. Gao
  • Department of Electrical and Computer Engineering
  • University of Delaware
Agenda
  • Background and Objectives
  • Iterative Emulation Scheme
  • DIMES/P Architecture
  • Cyclops-E on DIMES/P
  • Experimental Results
  • Conclusion

2
Background
  • MpSOC (Multiprocessor System-on-Chip)
    • High performance for computation-intensive applications
    • Large amount of logic
    • Slow logic simulation with software tools during development

3
Background (contd)
  • Logic emulation by hardware
    • Much faster than logic simulation by software
    • Clock-level accuracy
  • Problems in multiprocessor emulation
    • Cost
      • 1 FPGA for 1 PE
      • So, 100 FPGAs for 100 PEs? About $1M
    • Scalability
      • What if the number of PEs in the specification changes?

4
Objectives
  • A cost-effective emulation platform for MpSOC
    • Emulation of multiple PEs within 1 FPGA
  • Applicable to
    • Logic verification
    • Early software development
    • Computer architecture research

5
How can multiple PEs be fit into 1 FPGA?
6
Solution: Iterative Emulation
  • Large multiprocessor emulation on 1 FPGA
  • Key idea: time-multiplexing
    • A single copy of the PE logic is shared and used iteratively
    • The logic state held in that shared logic is replaced with the next PE's state every cycle
    • Memory devices are used effectively to hold the logic states
    • Simple hardware
  • Essential components
    • An FPGA
    • External memory chips
  • DIMES
    • Delaware Iterative Multiprocessor Emulation System

7
Iterative Emulation Scheme
(Figure: PEs numbered 0 to 15, shown twice)
8
Iterative Emulation Scheme (contd)
  • Time-multiplexing
    • Each virtual cycle is divided into N iteration cycles (N = number of PEs)
    • One processor is emulated in each iteration cycle

(Figure: timeline in which one virtual cycle consists of the iteration cycles P0, P1, P2, P3, P4, ..., PN-1, repeated for each successive virtual cycle)
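The time-multiplexing rule above can be modeled in software. This is a minimal sketch, not DIMES code: it assumes each PE's whole logic state fits in one integer and that PEs only observe each other's values as latched at the start of a virtual cycle. `StepFn`, `run_virtual_cycle`, `emulate`, and the toy `ring_add` logic are hypothetical names for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical interface for the shared PE logic: given a PE index, that PE's
# current state, and a read-only snapshot of every PE's state at the start of
# the virtual cycle, return the PE's next state.
StepFn = Callable[[int, int, Sequence[int]], int]


def run_virtual_cycle(states: List[int], step: StepFn) -> List[int]:
    """Emulate one virtual cycle as len(states) iteration cycles.

    One `step` function (the single shared copy of the PE logic) is applied
    to each PE in turn. Results go to a separate next-state buffer so every
    PE sees the same start-of-cycle values, as in the parallel hardware.
    """
    snapshot = tuple(states)            # state visible during this virtual cycle
    next_states = list(states)
    for pe in range(len(states)):       # iteration cycle `pe` emulates PE `pe`
        next_states[pe] = step(pe, snapshot[pe], snapshot)
    return next_states


def emulate(initial: List[int], step: StepFn, virtual_cycles: int) -> List[int]:
    """Run several virtual cycles, carrying each PE's state between them."""
    states = list(initial)
    for _ in range(virtual_cycles):
        states = run_virtual_cycle(states, step)
    return states


def ring_add(pe: int, own: int, snap: Sequence[int]) -> int:
    """Toy PE logic (hypothetical): add the left neighbor's value in a ring."""
    return own + snap[(pe - 1) % len(snap)]


if __name__ == "__main__":
    print(emulate([1, 0, 0, 0], ring_add, virtual_cycles=2))  # [1, 2, 1, 0]
```

The separate next-state buffer is the design choice to notice: updating `states` in place would let a later PE in the same virtual cycle see an earlier PE's new value, which a truly parallel design never would.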
9
Iterative Emulation Scheme (contd)
(Figure: emulation cycles numbered 0 to 15 for virtual cycle 0 and again for virtual cycle 1)
10
Logic composition
  • Regular memory structures
  • Logic states
  • Combinational logic
11
Memory Elements in Iterative Emulation
(Figure: combinational logic and memory elements)
13
Iterative Emulation Using Shift Registers
  • n-bit shift registers for n processors
  • Replace every flip-flop with an n-bit shift register

(Figure: combinational logic driven by a single flip-flop (register), versus the same combinational logic driven by a shift register holding the states of P0, P1, P2, P3)
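To make the flip-flop-to-shift-register replacement concrete, here is a minimal software model of one flip-flop shared by n PEs. It is a sketch under assumptions: a multi-bit register is modeled as one stage holding an int (in hardware each bit would get its own n-bit shift register), and `ShiftRegisterFF`, `counter4`, and `run` are hypothetical names, not DIMES code.

```python
from collections import deque
from typing import Callable, Deque, List


class ShiftRegisterFF:
    """An n-stage shift register replacing one flip-flop shared by n PEs.

    Stage 0 (the output end) holds the state of the PE emulated in the current
    iteration cycle. Each clock drops that stage and appends the newly
    computed value at the input end, so after n clocks every PE has advanced
    by one virtual cycle and the original PE order is restored.
    """

    def __init__(self, initial: List[int]) -> None:
        # At a virtual-cycle boundary, stages[i] is the state of PE i.
        self.stages: Deque[int] = deque(initial)

    def output(self) -> int:
        return self.stages[0]

    def clock(self, next_value: int) -> None:
        self.stages.popleft()
        self.stages.append(next_value)


def counter4(q: int) -> int:
    """Toy combinational logic (hypothetical): a 4-bit incrementer."""
    return (q + 1) % 16


def run(initial: List[int], logic: Callable[[int], int], virtual_cycles: int) -> List[int]:
    reg = ShiftRegisterFF(initial)
    for _ in range(virtual_cycles * len(initial)):  # n iteration cycles per virtual cycle
        reg.clock(logic(reg.output()))
    return list(reg.stages)


if __name__ == "__main__":
    print(run([0, 5, 10, 15], counter4, virtual_cycles=3))  # [3, 8, 13, 2]
```

The combinational logic never needs to know which PE it is serving: the shift register alone presents the right state at the right iteration cycle.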
14
Workflow
15
DIMES/P Architecture
  • Prototype of DIMES
  • Off-the-shelf board (Alpha-Data ADM-XRC-II)
    • One Xilinx XC2V8000
    • SRAMs and DRAMs
    • PCI

16
Target Cyclops-E Architecture
  • An MpSOC core based on the IBM BlueGene/Cyclops architecture
  • 8 PEs in the original design
  • 4 TUs (thread units) per PE
  • Each TU is a simple 32-bit RISC processor
  • The logic for 2 or more PEs exceeds the capacity of 1 FPGA

17
Adjusting Cyclops-E Into One FPGA
  • Everything kept within the FPGA to simplify the first trial
    • All architectural memory structures and logic states placed in the FPGA's embedded memory elements
    • No external memory used
    • Shift registers implemented by cascaded FFs
    • Global memory, SPM, and I-cache implemented in BRAMs
  • Number of PEs cut down to 2
    • Due to the limited memory-element resources

18
Cyclops-E Emulator on DIMES/P
  • 2-PE Cyclops-E
    • All the memory fits into the embedded memory of the FPGA
  • Iterative emulation modules
    • PE
    • Global memory logic and switch

(Figure: block diagram with PE 0 and PE 1, two 16KB SPMs, Global memory 0 and Global memory 1 (64KB each), Switch0 and Switch1, a network, a 32KB I-cache, and a host interface)
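Whether all the memory fits on chip comes down to the total embedded memory the design needs. A quick tally of the memory blocks listed in the block diagram, as a sketch: it counts only the architectural memories (the logic states live in cascaded FFs, per the previous slide) and assumes the I-cache is one shared 32KB instance because it appears only once. `MEMORY_KB` and the helper names are hypothetical.

```python
# Memory blocks listed in the 2-PE Cyclops-E block diagram, sizes in KB.
# Assumption: the 32 KB I-cache is a single instance shared by both PEs,
# since it appears only once in the diagram.
MEMORY_KB = {
    "Global memory 0": 64,
    "Global memory 1": 64,
    "SPM (PE 0)": 16,
    "SPM (PE 1)": 16,
    "I-cache": 32,
}


def total_kb(blocks: dict) -> int:
    """Sum the sizes of all architectural memory blocks, in KB."""
    return sum(blocks.values())


def total_kbit(blocks: dict) -> int:
    """Same total in kilobits (1 byte = 8 bits), the unit FPGA block RAM is usually quoted in."""
    return total_kb(blocks) * 8


if __name__ == "__main__":
    print(total_kb(MEMORY_KB), "KB =", total_kbit(MEMORY_KB), "Kbit")  # 192 KB = 1536 Kbit
```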
19
Experiment
  • Synthesis and implementation
    • 2-PE Cyclops-E
  • Comparison
    • Non-iterative emulation
    • Iterative emulation
  • FPGA development tool: Xilinx ISE 5.2i SP3
  • Translation for iterative emulation
    • Assignment of iterative emulation done by hand
    • Regular translation patterns automated

20
FPGA Resource Usage
  • Successfully reduced resource usage to fit within 1 FPGA
    • From 111% to 75%
  • 7.6MHz for the virtual cycle
  • 15.2MHz for the iteration cycle
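The two clock figures are consistent with the time-multiplexing rule of the iterative emulation scheme: each virtual cycle is split into N iteration cycles, so for the 2-PE design (N = 2) the iteration clock runs N times faster than the virtual clock. As a worked check, assuming the iteration cycle is the emulator's physical clock:

```latex
f_{\text{iter}} = N \cdot f_{\text{virtual}}
  \quad\Longrightarrow\quad
  f_{\text{iter}} = 2 \times 7.6\ \text{MHz} = 15.2\ \text{MHz}
```

Read the other way, if the iteration clock stayed fixed, the virtual-cycle rate would fall as 1/N, so emulating more PEs with the same logic trades emulation speed for capacity.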

21
Software for Cyclops
  • Software research on cellular architecture
    • Another project using DIMES
  • Multithreaded programming model for cellular architecture
  • Compiler
  • Debugger
  • Thread library
    • C-thread
  • Applications
  • Runtime system
    • On both Cyclops and the host
  • Host interface

22
Future Work
  • Use of external memory
    • Regular memory structures: SPM, global memory, I-cache
    • Main memory
  • Embedded BRAMs for logic states
    • Shift registers implemented by addressable memory
  • 8-PE Cyclops-E
  • Larger systems
    • Cyclops-64: 64-bit, 80 PEs per chip, 2 TUs per PE, 1 FPU per PE
    • Multi-chip emulation
    • Custom-designed DIMES PCB
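The future-work item of implementing shift registers with addressable memory can be sketched as a RAM used as a circular buffer, a common FPGA technique for long delay lines. This is an illustration under assumptions, not the planned DIMES design: `RamShiftRegister` and `run` are hypothetical names, and each entry holds one PE's state as an int.

```python
from typing import Callable, List


class RamShiftRegister:
    """An n-stage shift register built from an n-entry addressable memory.

    Rather than moving every stored value on each clock, one address pointer
    walks around the memory. The entry at `ptr` is the oldest value (the
    state of the PE being emulated); it is read, overwritten with the new
    value, and `ptr` advances modulo n.
    """

    def __init__(self, initial: List[int]) -> None:
        self.mem = list(initial)  # mem[i] = state of PE i while ptr == 0
        self.ptr = 0

    def output(self) -> int:
        return self.mem[self.ptr]

    def clock(self, next_value: int) -> None:
        self.mem[self.ptr] = next_value
        self.ptr = (self.ptr + 1) % len(self.mem)

    def contents(self) -> List[int]:
        """Stored values in shift-register order, output end first."""
        return self.mem[self.ptr:] + self.mem[:self.ptr]


def run(initial: List[int], logic: Callable[[int], int], virtual_cycles: int) -> List[int]:
    reg = RamShiftRegister(initial)
    for _ in range(virtual_cycles * len(initial)):  # n iteration cycles per virtual cycle
        reg.clock(logic(reg.output()))
    return reg.contents()


if __name__ == "__main__":
    print(run([0, 5, 10, 15], lambda q: (q + 1) % 16, virtual_cycles=3))  # [3, 8, 13, 2]
```

Only one entry is read and written per clock, which is what would let a block RAM or external SRAM stand in for a chain of cascaded FFs.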

23
Conclusion
  • Developed DIMES/P
    • A hardware emulator for multiprocessors
  • Iterative emulation
    • Reduces emulation resources
    • Time-multiplexing
  • Implemented 2-PE Cyclops-E
    • Non-iterative: 111% (overflow)
    • Iterative: 75% (fits)

24
Headhunting
  • Research opportunities at the Computer Architecture and Parallel Systems Laboratory (CAPSL), led by Professor Gao, in the Department of Electrical and Computer Engineering, University of Delaware
  • Looking for good
    • Graduate students
    • Post-docs