Performance and Overhead in a Hybrid Reconfigurable Computer

Slides: 29
Provided by: Osm74
Learn more at: https://www.ece.lsu.edu

1
Performance and Overhead in a Hybrid Reconfigurable Computer
  • O. D. Fidanci (1), D. Poznanovic (2), K. Gaj (3), T. El-Ghazawi (1), N. Alexandridis (1)
  • (1) George Washington University
  • (2) SRC Computers Inc.
  • (3) George Mason University

http://cpe02.gmu.edu/rcm/
2
Features of General-Purpose Reconfigurable Computers
  • composed of traditional microprocessors and Field Programmable Gate Arrays (FPGAs), closely integrated with each other
  • programming does not require knowledge of hardware design
  • permit run-time reconfiguration of FPGAs

3
Hardware Architecture and Programming Model of SRC-6E
4
SRC Hardware Architecture
5
SRC Hardware Architecture cont.
6
SRC Programming Model
7
Compilation Process of SRC-6E
[Flow diagram; labeled elements:]
  • Inputs: application sources (.c or .f files); macro sources (HDL sources, .vhd or .v files)
  • Tools: µP Compiler (Intel); MAP Compiler; logic synthesis (Synplicity); Place & Route (Xilinx); Linker
  • Intermediate files: object files (.o files); .v files; netlists (.ngo files); configuration bitstreams (.bin files)
  • Output: application executable
8
Application Case Study 1
High-throughput Triple DES encryption
9
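High-throughput encryption
[Figure: a stream of message blocks ..., Mi+2, Mi+1, Mi enters the 3 DES unit (key K0) and leaves as ciphertext blocks ..., Ci+2, Ci+1, Ci]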
10
Fully pipelined architecture of Triple DES
[Figure: three DES macros in series, covering pipeline stages 1-17, 18-34, and 35-51]
  • 51 pipeline stages
  • New input and new output every clock cycle
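
The effect of full pipelining on throughput can be sketched with a short calculation. The 51 stages and the one-block-per-cycle rate come from the slide, and 64 bits is the standard DES block size; the 100 MHz clock is an assumed value chosen only for illustration, not a figure from this presentation.

```python
# Sketch: cycle count and throughput of a fully pipelined Triple DES core.
PIPELINE_STAGES = 51       # from the slide: 3 DES macros x 17 stages each
BLOCK_BITS = 64            # DES / Triple DES block size
ASSUMED_CLOCK_HZ = 100e6   # ASSUMPTION for illustration, not from the slides


def pipeline_cycles(num_blocks: int, stages: int = PIPELINE_STAGES) -> int:
    """Cycles needed to push num_blocks through the pipeline.

    The first result appears after `stages` cycles; each later block
    adds one cycle because a new block enters on every clock.
    """
    if num_blocks <= 0:
        return 0
    return stages + num_blocks - 1


def throughput_bits_per_s(num_blocks: int,
                          clock_hz: float = ASSUMED_CLOCK_HZ) -> float:
    """Average computation-only throughput for a run of num_blocks blocks."""
    cycles = pipeline_cycles(num_blocks)
    if cycles == 0:
        return 0.0
    return num_blocks * BLOCK_BITS * clock_hz / cycles


if __name__ == "__main__":
    for n in (1, 1024, 500_000):
        rate = throughput_bits_per_s(n) / 1e9
        print(f"{n} blocks: {pipeline_cycles(n)} cycles, {rate:.2f} Gbit/s")
```

For long streams the 51-cycle fill latency becomes negligible, so the computation-only throughput approaches one 64-bit block per clock cycle. This figure excludes configuration and data transfer.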

11
Overhead of the data transfer
[Figure: system block diagram; labels: µP Board, Xeon µP, MAP Board, Private Memory, (6x)]
12
Timing Measurements
A three-level timing measurement scheme was employed:
  • end-to-end execution time (wall-clock time, HLL level): includes the configuration, data transfer, and data processing times
  • execution time w/o configuration (wall-clock time, HLL level): excludes the configuration time but includes the data transfer and data processing times
  • MAP time (clock counter, hardware level): includes only the data processing time
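
Because the three measurements are nested, the individual overheads follow by subtraction. The sketch below illustrates that bookkeeping; the function and field names are hypothetical, and the sample numbers are made up for the example rather than taken from the measurements.

```python
from dataclasses import dataclass


@dataclass
class TimingBreakdown:
    configuration_ms: float
    data_transfer_ms: float
    computation_ms: float


def breakdown(end_to_end_ms: float,
              without_config_ms: float,
              map_ms: float) -> TimingBreakdown:
    """Split the three nested measurements into separate components.

    end_to_end_ms     = configuration + data transfer + computation
    without_config_ms =                 data transfer + computation
    map_ms            =                                 computation
    """
    if not (end_to_end_ms >= without_config_ms >= map_ms >= 0):
        raise ValueError("measurements must be nested and non-negative")
    return TimingBreakdown(
        configuration_ms=end_to_end_ms - without_config_ms,
        data_transfer_ms=without_config_ms - map_ms,
        computation_ms=map_ms,
    )


if __name__ == "__main__":
    # Hypothetical sample values, not results from the presentation.
    print(breakdown(end_to_end_ms=100.0, without_config_ms=30.0, map_ms=10.0))
```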

13
Triple DES Encryption
[Chart: execution time in ms (axis 0 to 160), split into configuration, data transfer, and computation, versus number of encrypted blocks: 1024; 10,000; 25,000; 50,000; 100,000; 250,000; 500,000]
14
Problems
  • execution time dominated by
      - configuration of the MAP FPGA, and
      - data transfer between the System Common Memory and On-Board Memory
  • configuration time hiding techniques
      - preloading the configuration before execution
      - flip-flopping FPGAs during reconfiguration

15
Data transfer hiding techniques
  • Data transfer can be hidden by overlapping DMA
    time with the data processing time




[Figure: timeline in which Input DMA, Encryption, and Output DMA are overlapped across successive portions of the data]
Possible speed-up up to 33
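
The benefit of overlapping can be estimated with a simple three-stage pipeline model. This is a sketch under stated assumptions, not the authors' implementation: it assumes the data is split into equal chunks and that input DMA, encryption, and output DMA can all proceed concurrently on different chunks, which the slides do not confirm for the actual hardware. All names and timing values are hypothetical.

```python
def sequential_time(chunks: int, t_in: float, t_comp: float, t_out: float) -> float:
    """Total time when each chunk is transferred in, processed, and
    transferred out before the next chunk starts."""
    return chunks * (t_in + t_comp + t_out)


def overlapped_time(chunks: int, t_in: float, t_comp: float, t_out: float) -> float:
    """Total time when the three phases are overlapped across chunks.

    ASSUMPTION: all three phases can run at the same time on different
    chunks, so after the first chunk the slowest phase sets the pace.
    """
    if chunks <= 0:
        return 0.0
    return (t_in + t_comp + t_out) + (chunks - 1) * max(t_in, t_comp, t_out)


def overlap_speedup(chunks: int, t_in: float, t_comp: float, t_out: float) -> float:
    """Ratio of sequential to overlapped time for the same workload."""
    return (sequential_time(chunks, t_in, t_comp, t_out)
            / overlapped_time(chunks, t_in, t_comp, t_out))


if __name__ == "__main__":
    # Hypothetical per-chunk times in arbitrary units.
    print(overlap_speedup(chunks=100, t_in=1.0, t_comp=1.0, t_out=1.0))
    print(overlap_speedup(chunks=100, t_in=1.0, t_comp=4.0, t_out=1.0))
```

In this model the gain is largest when the three phases take equal time, and it is bounded by the sum of the phase times divided by the longest phase.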

16
Reference software implementations
Platform: Pentium 4, 1.8 GHz, 512 kB cache, 1 GB RAM
Software:
  • Optimized for encryption (but not for cipher breaking): Phil Karn's DES code; C and assembly language with look-up table precomputations; GNU gcc v. 2.96, -O4 optimization
  • Non-optimized: public domain code; C only; Intel C, -O3 optimization
17
Total execution time of Triple DES for Pentium 4 using optimized and non-optimized code
Optimized P4 code vs. non-optimized P4 code: ratio ≈ 4
18
Throughput results for SRC-6E and Pentium 4
19
SRC-6E vs. Pentium 4 speed-up
20
Application Case Study 2
DES cipher breaking
21
Secret-key breaking
[Figure: DES applied to M0 and C0 with candidate keys K1, K2, K3, ..., KN; the keys are generated by the DES breaker]
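
The exhaustive search pictured here can be sketched in a few lines. To keep the example self-contained and fast, it uses a toy 16-bit cipher as a stand-in; `toy_encrypt` is hypothetical and is not DES, whose 56-bit key space is far too large to search this way in software. Only the search loop illustrates the idea on the slide.

```python
from typing import Optional

KEY_BITS = 16                  # toy key size; real DES uses 56-bit keys
MASK = (1 << KEY_BITS) - 1


def toy_encrypt(block: int, key: int) -> int:
    """Toy stand-in cipher (NOT DES): XOR with the key, then rotate left by 3."""
    x = (block ^ key) & MASK
    return ((x << 3) | (x >> (KEY_BITS - 3))) & MASK


def break_key(m0: int, c0: int) -> Optional[int]:
    """Try every candidate key K1..KN until encrypting m0 yields c0.

    Only the pair (m0, c0) is needed as input; the candidate keys are
    generated inside the loop, so input/output stays minimal.
    """
    for key in range(1 << KEY_BITS):
        if toy_encrypt(m0, key) == c0:
            return key
    return None


if __name__ == "__main__":
    secret = 0xBEEF
    m0 = 0x1234
    c0 = toy_encrypt(m0, secret)
    print(hex(break_key(m0, c0)))
```

For this toy cipher each plaintext/ciphertext pair matches exactly one key, so the search recovers the secret key; with a real cipher a second known pair may be needed to rule out false matches.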
22
Keys generated in the User FPGA
[Figure: system block diagram; labels: µP Board, Xeon µP, MAP Board, Private Memory, (6x)]
23
DES breaking machine
[Chart: execution time in ms (axis 0 to 1,200), split into configuration, data transfer, and computation, versus number of tested keys: 128,000; 1,000,000; 100,000,000]
24
SRC-6E vs. Pentium 4 speed-up
25
Conclusions
  • Two different classes of applications developed and tested for SRC-6E and Pentium 4 PC
      - Triple DES encryption: real-time data streaming
      - DES breaking: minimal input/output

26
Conclusions cont.
Wall-clock speed-ups
  • 3 DES Encryption: 3.4 vs. P4 C code; 12.5 vs. P4 assembly code
  • DES Breaking: vs. P4 C code (larger for real-time input sizes)
Speed-ups without reconfiguration
  • 3 DES Encryption: 11 vs. P4 C code; 41 vs. P4 assembly code
  • DES Breaking: 1583 vs. P4 C code
27
Informal speed/cost comparison
  • Cost of the SRC machine / Cost of PC ≈ 100
  • Speed of the SRC machine / Speed of PC ≈ 1600 (with only one out of four FPGAs used in computations)
  • 16x improved speed/cost ratio
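
The 16x figure follows from dividing the speed ratio by the cost ratio. A minimal sketch of that arithmetic, using the two approximate ratios from the slide (the function name is hypothetical):

```python
def speed_cost_improvement(speed_ratio: float, cost_ratio: float) -> float:
    """Improvement in speed per unit cost of one machine over another."""
    if cost_ratio <= 0:
        raise ValueError("cost ratio must be positive")
    return speed_ratio / cost_ratio


if __name__ == "__main__":
    # Approximate ratios quoted on the slide: speed ~1600, cost ~100.
    print(speed_cost_improvement(speed_ratio=1600, cost_ratio=100))  # prints 16.0
```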
28
Conclusions: Overheads
Reconfiguration time
  • Most affected applications: short execution time, large resource requirements, frequent reconfiguration
  • Minimization techniques
      - preloading configuration
      - flip-flopping among multiple FPGAs

Data transfer time
  • Most affected applications: high-speed real-time input/output
  • Minimization techniques
      - overlapping data transfer with computations