Title: Frank Vahid, UCR 1
1. Building Fake Body Parts: Digital Mockups
- Frank Vahid
- Univ. of California, Riverside
Chen Huang (UC Riverside, now Amazon)
Bailey Miller (UC Riverside, intern at SpaceX)
Prof. Tony Givargis (UC Irvine)
Ting-Shuo Chou (UC Irvine)
Others...
Support provided by NSF, SRC, Dept. of Educ.
Also CareFusion, Xilinx, METI
2Bailey Miller, UCR 2
3. Models of the physical world that run in real time
Test cyber-physical systems
http://www.nhlbi.nih.gov/
4. Issue: Real-time is achieved at the cost of accuracy
5. PC, GPU
[Bar chart: Performance (ms), axis 0 to 1000, for Weibel, Neuron, Weibel gas, Weibel hemo, and Hemodynamic; series label PC(1); bar labels 1522, 1490, 1430, 1184]
Speedup vs. real-time: PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x
- Parallel computations
- Neighbor communication
→ Seems like a great match for FPGAs
6. FPGAs: Sw circuits (parallel)
C code for FIR filter:
for (i = 0; i < 128; i++) y[i] += c[i] * x[i];
.. .. ..
- 1000s of instructions
- Several thousand cycles
Circuit for FIR filter:
- 7 cycles (though slower clock)
- Speedup > 10x-100x
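A minimal sketch of the FIR loop on this slide as a complete C function. The function name, the integer sample type, and the accumulate-into-one-output form are assumptions for illustration, not the code used in the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Compute one FIR output sample: y = sum over i of c[i] * x[i].
 * c holds the filter coefficients and x the most recent input samples
 * (an assumed layout). The slide's loop uses 128 taps. */
long fir_sample(const int *c, const int *x, size_t taps)
{
    assert(c != NULL && x != NULL);
    long y = 0;
    for (size_t i = 0; i < taps; i++) {
        /* A CPU runs these multiply-accumulates one after another.
         * A circuit can compute all products at once and sum them
         * in an adder tree. */
        y += (long)c[i] * x[i];
    }
    return y;
}
```

On a CPU the loop body executes 128 times in sequence. A balanced adder tree over 128 products has log2(128) = 7 levels, which may be where the slide's cycle count comes from.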
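7. FPGAs 101 (A Quick Intro)
[Figure: an FPGA built from lookup tables (LUTs) and switch matrices (SMs). A LUT is a small memory (here a 4x2 memory with address inputs a1 a0 and data outputs d1 d0) whose stored bits implement logic functions, e.g. outputs F and G of inputs a, b, c with contents 1 1 1 0 1 1 0 0 and 0 0 0 0 0 0 1 0; further signals D and E]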
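To make the LUT idea concrete, here is a small sketch of a 4x2 memory used as logic. The type and function names and the half-adder contents are hypothetical; they are not the values shown in the slide's figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 4x2 LUT: 4 words of 2 bits each, addressed by inputs a1 a0. */
typedef struct {
    uint8_t word[4];
} Lut4x2;

/* Evaluating the logic is a single memory read.
 * Bit 1 of the result is d1, bit 0 is d0. */
uint8_t lut_read(const Lut4x2 *lut, int a1, int a0)
{
    assert(lut != NULL);
    int address = ((a1 & 1) << 1) | (a0 & 1);
    return lut->word[address] & 0x3;
}

/* Hypothetical configuration: a half adder.
 * d1 = carry (a1 AND a0), d0 = sum (a1 XOR a0). */
const Lut4x2 HALF_ADDER = { { 0x0, 0x1, 0x1, 0x2 } };
```

Reprogramming the FPGA amounts to writing different words into the memory; the read path stays the same.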
8. HLS
High-level synthesis: a compiler that converts a program to circuits
[Bar chart: Performance (ms), axis 0 to 1000, for Weibel, Neuron, Weibel gas, Weibel hemo, and Hemodynamic; series label PC(1); bar labels 1522, 1490, 1430, 1184]
Speedup vs. real-time: PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS/FPGA 3.2x
9. Network of synchronized PEs on FPGAs
- General Processing Element
- Iterative ODE solver (Euler/RK4)
- 0.1 ms / 0.01 ms timestep
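The two solver options named above can be sketched in software as follows. This is an illustrative sketch only: the function names and the test equation dy/dt = -y are assumptions, and a real PE would implement the update in hardware.

```c
#include <assert.h>

/* Right-hand side of an ODE dy/dt = f(t, y). */
typedef double (*OdeRhs)(double t, double y);

/* One forward-Euler step of size h. */
double euler_step(OdeRhs f, double t, double y, double h)
{
    return y + h * f(t, y);
}

/* One classical fourth-order Runge-Kutta (RK4) step of size h. */
double rk4_step(OdeRhs f, double t, double y, double h)
{
    double k1 = f(t, y);
    double k2 = f(t + h / 2.0, y + h / 2.0 * k1);
    double k3 = f(t + h / 2.0, y + h / 2.0 * k2);
    double k4 = f(t + h, y + h * k3);
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4);
}

/* Advance y over `steps` fixed timesteps, as an iterative solver would. */
double integrate(OdeRhs f, double (*step)(OdeRhs, double, double, double),
                 double t0, double y0, double h, int steps)
{
    assert(steps >= 0 && h > 0.0);
    double t = t0, y = y0;
    for (int i = 0; i < steps; i++) {
        y = step(f, t, y, h);
        t += h;
    }
    return y;
}

/* Hypothetical test equation: dy/dt = -y, exact solution y(t) = e^(-t). */
double decay(double t, double y)
{
    (void)t;
    return -y;
}
```

With h = 0.0001 (0.1 ms expressed in seconds), 10,000 steps simulate one second. Euler costs one evaluation of f per step and RK4 costs four, so RK4 trades extra work per step for accuracy.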
[Figure: a digital mockup mapped to a network of PEs on an FPGA; 1 PE: 300 MHz]
10. Synthesis tool
Phase 1: Maps ODEs to virtual PEs using simulated annealing (10K iterations)
Phase 2: Converts virtual PEs to physical circuits using FPGA place-and-route
11. General PEs
[Bar chart: Performance (ms), axis 0 to 1000, for Weibel, Neuron, Weibel gas, Hemodynamic, and Weibel hemo; series PC(1), PC(4), GPU, HLS; bar labels 1522, 1490, 1430, 1184]
Speedup vs. real-time: PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS 3.2x, General PEs 4.9x
12. Problem: More PEs → lower frequency
[Chart: real ODEs/sec vs. ODEs/sec lost due to frequency drop; 11-generation Weibel model, Virtex6 240T FPGA, general PEs]
13. Use model structure to improve
Avoid using FPGA placement (Phase 2)
Graph embedding: map a guest graph onto a host graph, minimizing the maximum wire length
Guest: virtual PEs
Host: physical PEs
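As a small illustration of why the choice of embedding matters, the sketch below places a chain-shaped guest graph on a 2D grid host in two ways and reports the maximum wire length of each. The chain guest, the Manhattan distance measure, and all names are assumptions chosen for brevity; this is not the placement tool from the talk.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int row, col; } GridPos;

/* Row-major placement: node k goes to (k / width, k % width). */
GridPos place_row_major(int k, int width)
{
    GridPos p = { k / width, k % width };
    return p;
}

/* Snake placement: odd rows run right to left, so consecutive
 * nodes of the chain always land on adjacent grid cells. */
GridPos place_snake(int k, int width)
{
    GridPos p = { k / width, k % width };
    if (p.row % 2 == 1)
        p.col = width - 1 - p.col;
    return p;
}

/* Manhattan wire length between two grid positions (an assumed metric). */
int wire_length(GridPos a, GridPos b)
{
    return abs(a.row - b.row) + abs(a.col - b.col);
}

/* Max wire length when the chain 0-1-2-...-(n-1) is embedded with `place`. */
int max_wire_length(int n, int width, GridPos (*place)(int, int))
{
    assert(n >= 1 && width >= 1);
    int worst = 0;
    for (int k = 0; k + 1 < n; k++) {
        int len = wire_length(place(k, width), place(k + 1, width));
        if (len > worst)
            worst = len;
    }
    return worst;
}
```

For 16 nodes on a 4-wide grid, row-major placement leaves a long wire at each row wrap, while the snake placement keeps every wire at length 1. The longest wire tends to limit clock frequency, which is why the embedding objective targets it.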
14. Phase 2: Map virtual PEs to physical PEs
[Figure: guest graph → embedding algorithm → host graph]
Embedding algorithms:
- H-tree embedding
- Linear embedding
- Direct map embedding
[1] Zienicke, P. 1990. Embeddings of Treelike Graphs into 2-Dimensional Meshes. (WG '90).
[2] Aleliunas, R., and Rosenberg, A.L. 1982. On Embedding Rectangular Grids in Square Grids. (Computers, 82).
[3] Berman, F., and Snyder, L. 1987. On Mapping Parallel Algorithms into Parallel Architectures. (PDC, 87).
15. 2D grid of physical PEs
Bypass FPGA placement
(Phase 1 may require "graph folding" first to reduce the number of PEs)
16. Compare/backup: Simulated annealing
Cost function:
C = w1*sum + w2*max + w3*gaps
- sum: sum of wire distances
- max: max wire length (Euclidean dist.)
- gaps: wires across architectural features
Neighbor function: swap PEs based on distance to neighbors
[Figure: PEs P1 and P2 before and after a swap]
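The weighted cost on this slide can be sketched as below. How "gaps" is modeled here (a single vertical column at x = gap_x that a wire crosses) is a hypothetical stand-in for architectural features, and the names, weights, and swap helper are illustrative assumptions. The sketch covers only the cost evaluation and the swap move, not the annealing schedule or the rule for choosing which PEs to swap.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } Point;
typedef struct { int from, to; } Wire;   /* indices into the PE position array */

/* Square root by Newton's method, so the sketch needs no math library. */
static double newton_sqrt(double v)
{
    if (v <= 0.0)
        return 0.0;
    double r = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + v / r);
    return r;
}

/* Hypothetical architectural feature: a vertical column at x = gap_x.
 * A wire crosses it when its endpoints lie on opposite sides. */
static int crosses_gap(Point a, Point b, double gap_x)
{
    return (a.x < gap_x) != (b.x < gap_x);
}

/* C = w1*sum + w2*max + w3*gaps, with Euclidean wire lengths. */
double placement_cost(const Point *pe, const Wire *wires, int num_wires,
                      double gap_x, double w1, double w2, double w3)
{
    assert(pe != NULL && wires != NULL && num_wires >= 0);
    double sum = 0.0, longest = 0.0;
    int gaps = 0;
    for (int i = 0; i < num_wires; i++) {
        Point a = pe[wires[i].from], b = pe[wires[i].to];
        double dx = a.x - b.x, dy = a.y - b.y;
        double len = newton_sqrt(dx * dx + dy * dy);
        sum += len;
        if (len > longest)
            longest = len;
        gaps += crosses_gap(a, b, gap_x);
    }
    return w1 * sum + w2 * longest + w3 * gaps;
}

/* Neighbor move: swap the positions of two PEs. */
void swap_pes(Point *pe, int i, int j)
{
    Point tmp = pe[i];
    pe[i] = pe[j];
    pe[j] = tmp;
}
```

An annealer would call swap_pes, re-evaluate placement_cost, and keep or undo the swap depending on the cost change and the current temperature.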
17. Results
[Figure: placement results for three strategies (no placement strategy, simulated annealing placement, embedding placement); panels labeled "4 generations shown", "5 generations shown", "5 generations shown"]
18. Results
2D Neuron model, 256 PEs, Xilinx Virtex6
[Figure annotation: Not routable]
Strategy   Total power (mW)   Dynamic power (mW)   Static power (mW)
None       15525              8744                 6481
SA         16604              10013                6590
Embed      19859              12999                6859
Strategy   LUTs     BRAM   DSP   Equivalent LUTs
None       58362    512    256   306682
SA         58567    512    256   306887
Embed      58569    512    256   306889
No impact on size
20% more power
19. Graph embedding (general PEs)
Speedup vs. real-time (avg): PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS 3.2x, General PE 4.9x, Graph emb. (GPE) 11.2x
Miller, B., F. Vahid, and T. Givargis.
Embedding-Based Placement of Processing element
Networks on FPGAs for Physical Model Simulation.
ACM Int. Symp. on FPGAs, 2013.
20. Custom Processing Element
- Custom datapath to solve a specific type of equation
V' = F1 - F2;  F' = (P1 - P2 - F*R) / L
Custom PE for each ODE type
Modified synthesis tool: first create custom PEs for the given ODEs, then synthesize the ODEs onto the PEs
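A software analogy for a custom datapath, under a loudly stated assumption: the equation pair is taken to have the form dV/dt = F1 - F2 and dF/dt = (P1 - P2 - F*R) / L, which is one reading of the slide's equation, not a confirmed one. The struct, the function name, and the Euler update are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* State and parameters for one element, under the ASSUMED equation form
 *   dV/dt = F1 - F2
 *   dF/dt = (P1 - P2 - F*R) / L                                        */
typedef struct {
    double volume;      /* V */
    double flow;        /* F */
    double resistance;  /* R */
    double inertance;   /* L */
} Branch;

/* Custom "datapath": the equation structure is fixed in the code,
 * so only the operand values change from step to step. */
void custom_pe_step(Branch *b, double f_in, double f_out,
                    double p_in, double p_out, double h)
{
    assert(b != NULL && b->inertance != 0.0);
    double dv = f_in - f_out;
    double df = (p_in - p_out - b->flow * b->resistance) / b->inertance;
    b->volume += h * dv;   /* forward-Euler update, chosen for brevity */
    b->flow   += h * df;
}
```

A general PE would instead decode which operations to perform for each ODE. Fixing the structure removes that overhead, at the price of handling only one equation type.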
21. Custom PEs
[Bar chart: Performance (ms), axis 0 to 1000, for Weibel, Neuron, Weibel gas, Hemodynamic, and Weibel hemo; series PC(1), PC(4), GPU, HLS; bar labels 1522, 1490, 1430, 1184]
Speedup vs. real-time (avg): PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS 3.2x, General PE 4.9x, Graph emb. (GPE) 11.2x, Custom PE 6.1x
Huang, Vahid, Givargis. Synthesis of networks of
custom processing elements for real-time physical
system emulation. Transactions on Design
Automation of Electronic Systems (TODAES), 2013
(to appear).
22. Networks of Heterogeneous PEs
- General PE
  - Slow, flexible (can solve any type of ODE)
- Custom PE
  - Fast, inflexible (solves only one type of ODE)
- Multi-Type PE
  - Combines multiple types of ODEs into a single custom PE
- Huge solution space
  - How to choose types of PEs?
  - How many PEs to allocate?
  - How to bind ODEs to PEs?
Huang, Miller, Vahid, Givargis. Synthesis of Heterogeneous Processing Elements for Physical System Emulation. CODES/ISSS 2012, Oct. 2012.
23. Automatic allocation and binding
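To show what a binding step has to decide, here is a deliberately simple greedy sketch: each ODE goes to a custom PE of its own type if one has room, otherwise to a general PE. This is a hypothetical heuristic for illustration and should not be read as the algorithm in the authors' tool; the struct fields, the capacity model, and the names are all assumptions.

```c
#include <assert.h>

#define GENERAL_PE (-1)   /* PE type that accepts any ODE type */

typedef struct {
    int type;       /* ODE type this PE solves (>= 0), or GENERAL_PE */
    int capacity;   /* max number of ODEs it can host per timestep */
    int load;       /* ODEs bound so far */
} Pe;

/* Bind each ODE to the first matching custom PE with room, falling back
 * to a general PE. binding[i] receives the PE index, or -1 if nothing
 * fits. Returns the number of ODEs bound. */
int bind_odes(const int *ode_type, int num_odes,
              Pe *pes, int num_pes, int *binding)
{
    assert(ode_type && pes && binding);
    int bound = 0;
    for (int i = 0; i < num_odes; i++) {
        binding[i] = -1;
        /* Pass 0 looks for a custom PE of the same type,
         * pass 1 for a general PE. */
        for (int pass = 0; pass < 2 && binding[i] < 0; pass++) {
            int wanted = (pass == 0) ? ode_type[i] : GENERAL_PE;
            for (int p = 0; p < num_pes; p++) {
                if (pes[p].type == wanted && pes[p].load < pes[p].capacity) {
                    pes[p].load++;
                    binding[i] = p;
                    bound++;
                    break;
                }
            }
        }
    }
    return bound;
}
```

Allocation (how many PEs of each type to instantiate) is the outer problem; a search would vary the PE array and rerun a binding step like this one to score each candidate.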
24. Heterogeneous PEs
[Bar chart: Performance (ms), axis 0 to 1000, for Weibel, Neuron, Weibel gas, Hemodynamic, and Weibel hemo; series PC(1), PC(4), GPU, HLS; bar labels 1522, 1490, 1430, 1184]
Speedup vs. real-time (avg): PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS 3.2x, General PE 4.9x, Graph emb. (GPE) 11.2x, Custom PE 6.1x, Heterog PE 34.5x
C. Huang, B. Miller, F. Vahid, T. Givargis.
Synthesis of Custom Networks of Heterogeneous
Processing Elements for Complex Physical System
Emulation. IEEE/ACM Conf on Hardware/Software
Codesign and System Synthesis (CODES/ISSS, part
of ESWEEK), Finland, Oct 2012.
25. Network of general/custom/heterogeneous PEs vs. HLS (regularity extraction)
[Chart: points labeled (Speed, Size)]
- Heterogeneous PE (10x, 1.1x)
- HLS (7x, 0.85x)
- General PE (6x, 1.35x)
- Custom PE
26. Speedup / dollar
- Heterogeneous PEs: 3x better than PC(4), 4.5x better than GPU
- FPGA: easier to build custom interfaces
Platform cost:
- CPU (I7-950, Intel X58 board): $480
- GPU (GTX460, I3-540, H55 board): $380
- FPGA (Xilinx Virtex6 240T-2 board): $1800
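The 3x and 4.5x claims can be checked with a few lines of arithmetic. The sketch assumes the three cost figures are platform prices in dollars and that the comparison uses the talk's average speedups (34.5x for heterogeneous PEs, 3.1x for PC(4), 1.6x for GPU); the function names are illustrative.

```c
#include <assert.h>

/* Speedup delivered per dollar of platform cost. */
double speedup_per_dollar(double speedup, double cost_dollars)
{
    assert(cost_dollars > 0.0);
    return speedup / cost_dollars;
}

/* How many times better platform A is than platform B on this metric. */
double advantage(double speedup_a, double cost_a,
                 double speedup_b, double cost_b)
{
    return speedup_per_dollar(speedup_a, cost_a)
         / speedup_per_dollar(speedup_b, cost_b);
}
```

Under those assumptions, advantage(34.5, 1800, 3.1, 480) comes to roughly 2.97 and advantage(34.5, 1800, 1.6, 380) to roughly 4.55, consistent with the slide's rounded 3x and 4.5x.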
27. Other projects
- Assistive monitoring
  - www.cs.ucr.edu/vahid/assistivemonitoring/
  - http://www.youtube.com/watch?feature=player_embedded&v=Sf8tU-78lXs
- Web-based learning
  - "Textbook is dead"
  - Multi-univ synergy
  - pcpp.zyante.com (C)
- Embedded systems educ.
  - New prog. model, virtual lab, programmingembeddedsystems.com
  - Also riosscheduler.org
- Drunk driving (DUI)
  - duicam.org
  - http://www.utsandiego.com/news/2013/feb/11/ucr-drunken-driving-app/
28. Summary
- FPGAs: Fastest cost-effective execution of physical models
  - http://www.youtube.com/watch?v=ThUKVhqoA3Q
- Future
  - Manycore device
  - Beyond testing CPS
  - Implement end-products
Speedup vs. real-time (avg): PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS 3.2x, General PE 4.9x, Graph emb. (GPE) 11.2x, Custom PE 6.1x, Heterog PE 34.5x (Graph emb. (HPE) 48.5x)
http://www.meti.com/
29. Questions?