An FPGA Implementation of SPME Reciprocal Sum Compute Engine - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
An FPGA Implementation of the Smooth Particle
Mesh Ewald Reciprocal Sum Compute Engine
(RSCE) Sam Lee
2
What is this Thesis about?
  • Implementation
    • Reciprocal Sum Compute Engine (RSCE).
    • FPGA-based.
    • Accelerates part of a Molecular Dynamics simulation.
    • Smooth Particle Mesh Ewald.
  • Investigation
    • Precision requirement.
    • Speedup capability.
    • Parallelization strategy.

3
Outline
  • What is Molecular Dynamics Simulation?
  • What calculations are involved?
  • How do we accelerate and parallelize the
    calculations?
  • What did we find out about precision?
  • What did we find out about speedup?
  • What is left to be done?

4
Molecular Dynamics Simulation
5
Molecular Dynamics Simulation
  • E = -∇V (Electric Field = -Gradient of Potential)
  • F = QE (Force = Charge x Electric Field)
  • F = ma (Force = Mass x Acceleration)
  • Time integration → New Positions and Velocities

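The update loop on this slide can be sketched as a velocity Verlet integrator (an illustrative stdlib-only Python sketch, not code from the thesis; the harmonic spring force is a made-up example):

```python
# Illustrative sketch (not thesis code): velocity Verlet time integration,
# i.e. the "F = ma -> new positions and velocities" step of an MD timestep.
def velocity_verlet(x, v, mass, force, dt, steps):
    """Advance one particle: a = F/m, then update x and v each timestep."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # new position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # new velocity
        a = a_new
    return x, v

# Made-up example: harmonic spring F = -x (k = m = 1), one period is ~6.28
x, v = velocity_verlet(1.0, 0.0, 1.0, lambda x: -x, 0.01, 628)
```

Velocity Verlet is the common MD choice because it is time-reversible and conserves energy well over long runs.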
6
MD Simulation
  • Problem scientists are facing:
  • SLOW!
  • O(N²).
  • N = 10⁵, time-span = 1 ns, timestep size = 1 fs
    → 10²² calculations.
  • A 3 GHz computer would take 5.8 × 10¹² days to finish!

7
Solution
  • Accelerate with an FPGA.
  • Especially:
  • The O(N²) calculations.
  • More specifically, the thesis addresses:
  • Reciprocal electrostatic energy and force
    calculations.
  • The Smooth Particle Mesh Ewald algorithm.

8
Previous Work
  • Software Implementations
  • Original PME Package written by Toukmaji.
  • NAMD2.
  • AMBER.
  • Hardware Implementations
  • No previous hardware implementations of SPME.
  • MD-Grape and MD-Engine used Ewald Summation.
  • Ewald Summation is O(N²); SPME is O(N log N)!

9
Calculations Involved
  • Smooth Particle Mesh Ewald

10
Electrostatic Interaction
  • Coulombic equation.
  • Under the Periodic Boundary Condition, the
    summation is only conditionally convergent.
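The equation itself did not survive the transcript; the Coulombic energy under PBC is presumably the standard pairwise sum over all periodic images (Gaussian units assumed; the primed sum omits i = j in the home box, n = 0):

```latex
E = \frac{1}{2} \sum_{\mathbf{n}} {\sum_{i=1}^{N} \sum_{j=1}^{N}}'
    \frac{q_i\, q_j}{\lvert \mathbf{r}_i - \mathbf{r}_j + \mathbf{n}L \rvert}
```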

11
Periodic Boundary Condition
  • To combat the Surface Effect.

(Figure: replication of the central simulation box)
12
Ewald Summation Used For PBC
  • To calculate the Coulombic interactions.
  • O(N²) Direct Sum + O(N²) Reciprocal Sum.

(Figure: direct-sum and reciprocal-sum contributions as a function of the separation r)
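The split shown on this slide is the standard Ewald decomposition of the 1/r kernel (β denotes the Ewald splitting parameter; notation assumed, as the slide's figure is lost):

```latex
\frac{1}{r}
= \underbrace{\frac{\operatorname{erfc}(\beta r)}{r}}_{\text{direct sum (short-ranged)}}
+ \underbrace{\frac{\operatorname{erf}(\beta r)}{r}}_{\text{reciprocal sum (smooth)}}
```

The erfc term decays fast enough to be cut off in real space, while the smooth erf term converges rapidly as a Fourier series.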
13
Smooth Particle Mesh Ewald
  • Shift the workload to the Reciprocal Sum.
  • Use the Fast Fourier Transform.
  • O(N) Real-space + O(N log N) Reciprocal.
  • The RSCE calculates the Reciprocal Sum using the
    SPME algorithm.

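The mesh-based reciprocal sum can be sketched in miniature: spread charges onto a grid, FFT, weight each mode by a reciprocal-space kernel, and accumulate the energy. This 1-D, nearest-grid-point toy (the real SPME uses B-spline charge spreading and a 3-D FFT; the `influence` kernel argument here is a hypothetical placeholder) only illustrates why the FFT makes the mesh cost O(K log K):

```python
import cmath

def fft(a):
    # Radix-2 Cooley-Tukey FFT -- the O(K log K) step SPME relies on.
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def mesh_reciprocal_energy(positions, charges, K, influence):
    """Toy 1-D mesh pipeline: spread charges -> FFT -> weight modes -> energy.
    (The real SPME uses B-spline spreading and a 3-D FFT.)"""
    Q = [0.0] * K
    for x, q in zip(positions, charges):
        Q[int(x * K) % K] += q               # nearest-grid-point spreading
    F = fft([complex(v) for v in Q])
    # E = 1/2 * sum_m G(m) * |F(m)|^2, G being a reciprocal-space kernel
    return 0.5 * sum(influence(m) * abs(F[m]) ** 2 for m in range(1, K))

# Hypothetical kernel G(m) = 1/m on an 8-point mesh with a +/- charge pair
energy = mesh_reciprocal_energy([0.1, 0.6], [1.0, -1.0], 8, lambda m: 1.0 / m)
```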
14
SPME Reciprocal Energy
(Figure: SPME reciprocal-energy expression, computed via FFTs)
15
SPME Reciprocal Force
16
Reciprocal Sum Compute Engine(RSCE)
17
RSCE Validation Environment
18
RSCE Architecture
19
RSCE Verification Testbench
20
RSCE SystemC Model
21
MD Simulations with the RSCE
22
RSCE Precision Goal
  • Goal: relative error < 10⁻⁵.
  • Two major calculation steps:
  • B-Spline Calculation.
  • 3D-FFT Calculation.
  • Due to limited logic resources, only a
    limited-precision FFT LogiCORE fits.
  • → The precision goal CANNOT be achieved.

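Why a limited-precision datapath misses a 10⁻⁵ target can be seen with a toy fixed-point quantizer (illustrative sketch; the bit widths are hypothetical, not the RSCE's actual ones): with few fractional bits, per-value quantization error already exceeds 10⁻⁵ before rounding even accumulates through the FFT stages.

```python
def quantize(x, frac_bits):
    # Round to a fixed-point grid with `frac_bits` fractional bits,
    # mimicking a limited-precision FFT datapath (bit widths hypothetical).
    scale = 1 << frac_bits
    return round(x * scale) / scale

def max_relative_error(values, frac_bits):
    # Worst-case relative error from quantizing each value once.
    return max(abs(quantize(v, frac_bits) - v) / abs(v)
               for v in values if v != 0)

# Made-up sample magnitudes spanning a couple of decades
vals = [0.1234567, 0.7654321, 0.0123456]
coarse = max_relative_error(vals, 10)   # few fractional bits
fine = max_relative_error(vals, 24)     # many fractional bits
```

Small-magnitude values suffer most, which is one motivation for the block-floating-point FFT listed under Future Work.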
23
MD Simulation with RSCE
  • RMS Energy Error Fluctuation

24
FFT Precision Vs. Energy Fluctuation
25
Speedup Analysis
  • RSCE vs. Software Implementation

26
RSCE Speedup
  • RSCE @ 100 MHz vs. Intel P4 @ 2.4 GHz.
  • Speedup: 3x to 14x.
  • RSCE Computation time

27
RSCE Speedup
  • Why is the speedup so modest?
  • QMM bandwidth limitation.
  • Sequential nature of the SPME algorithm.
  • Solution
  • Use more QMM memories.
  • Slight design modifications required.

28
Multi-QMM RSCE Speedup
  • NQ-QMM RSCE computation time.
  • The 4-QMM RSCE:
  • Speedup: 14x to 20x.
  • Assuming N is of the same order as K×K×K:
  • Speedup ≈ 3(NQ-1)x.

29
RSCE Speedup
N      P   K     Single-QMM speedup   4-QMM speedup      4-QMM speedup
                 vs. software         vs. Single-QMM     vs. software
20000  4   32    5.44x                3.37x              18x
20000  4   64    6.97x                2.10x              14x
20000  4   128   10.70x               1.46x              15x
20000  8   32    3.72x                3.90x              14x
20000  8   64    5.17x                3.37x              17x
20000  8   128   7.94x                2.10x              16x
30
Parallelization Strategy
  • When Multiple RSCEs are Used Together

31
RSCE Parallelization Strategy
  • Assume a 2-D simulation.
  • Assume P = 2, K = 8, N = 6.
  • Assume NumP = 4.

(Figure: an 8x8x8 mesh decomposed into four 4x4x4 mini-meshes)
32
RSCE Parallelization Strategy
  • Mini-meshes composed → 2D-IFFT.
  • 2D-IFFT = two passes of 1D-FFT (X and Y directions).

(Figure: Y-direction and X-direction 1D-FFT passes)
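The two-pass decomposition described above can be sketched directly (illustrative Python; a naive O(K²) DFT stands in for the 1-D FFT core):

```python
import cmath

def fft1d(a):
    # Naive O(K^2) DFT, standing in for a 1-D FFT core (illustrative only).
    n = len(a)
    return [sum(a[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]

def fft2d(mesh):
    """2-D FFT as two passes of 1-D transforms: all rows (X direction),
    transpose, all columns (Y direction), transpose back."""
    rows = [fft1d(row) for row in mesh]          # X-direction pass
    cols = [fft1d(list(c)) for c in zip(*rows)]  # Y-direction pass
    return [list(r) for r in zip(*cols)]         # back to row-major order
```

Because each pass touches whole rows or whole columns, each RSCE can transform its own rows locally, with one transpose (data exchange) between the passes; the same separability extends to 3-D.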
33
Parallelization Strategy
  • 2D-IFFT → Energy Calculation → 2D-FFT.
  • 2D-FFT → Force Calculation.

(Figure: pipeline of energy calculation, 2D-FFT, and force calculation)
34
Multi-RSCE System
35
Conclusion
  • Successful integration of the RSCE into NAMD2.
  • Single-QMM RSCE speedup: 3x to 14x.
  • NQ-QMM RSCE speedup: 14x to 20x.
  • When N ≈ K×K×K, NQ-QMM speedup ≈ 3(NQ-1)x.
  • A Multi-RSCE system is still a better alternative
    than a Multi-FPGA Ewald Summation system.

36
Future Work
  • Input precision analysis.
  • More in-depth FFT precision analysis.
  • Implementation of a block-floating-point FFT.
  • More investigation of how different simulation
    settings (K, P, and N) affect the RSCE speedup.
  • Investigation of how to better parallelize the
    SPME algorithm.

37
Questions?