SciDac2 Kickoff Meeting - PowerPoint PPT Presentation

About This Presentation

Title:

SciDac2 Kickoff Meeting

Description:

http://crash.ncac.gwu.edu/pradeep/Models.html ... Time accurate solution of the Navier-Stokes equations, overset (Chimera) grids ... – PowerPoint PPT presentation

Slides: 28

Provided by: kare137

Transcript and Presenter's Notes

Title: SciDac2 Kickoff Meeting

1
SciDac2 Kickoff Meeting

Karen A. Tomko
Electrical and Computer Engineering Department
email Karen.Tomko_at_uc.edu

2
Outline of talk

Other Application areas
Crash Worthiness
Computational Electromagnetics
Computational Fluid Dynamics
Research Interests
Application Performance Challenges
Reconfigurable High Performance Computing

3
Crash Worthiness

with S. Abraham, E.S. Davidson, Q. Stout on a Ford
Motor Co.-sponsored project
Finite Element Method
Newtonian physics, deformation models for
crumpling of car body
100,000 lines of Fortran 77

1996 Ford Taurus Model: http://crash.ncac.gwu.edu/pradeep/Models.html

Parallelization and performance enhancement for
shared memory and distributed memory systems
Weighted and multi-constraint domain
decomposition using graph partitioning algorithms
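
A minimal sketch of what a weighted, multi-constraint partitioner tries to balance. It only scores a given partition (edge cut and per-constraint load imbalance); it does not compute one, and it is not the project's code. The tiny mesh, the two weight components (element compute cost and a hypothetical contact-search cost) and all function names are assumptions made for illustration.

```python
def edge_cut(edges, part):
    """Total weight of edges whose endpoints fall in different parts."""
    return sum(w for u, v, w in edges if part[u] != part[v])


def imbalance(weights, part, nparts):
    """For each constraint, return max part load divided by average part load."""
    ncon = len(next(iter(weights.values())))
    result = []
    for c in range(ncon):
        loads = [0.0] * nparts
        for v, w in weights.items():
            loads[part[v]] += w[c]
        avg = sum(loads) / nparts
        result.append(max(loads) / avg)
    return result


# Hypothetical 4-element mesh; each vertex weight is
# (compute cost, contact-search cost).
weights = {0: (1, 0), 1: (1, 2), 2: (1, 0), 3: (1, 2)}
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]

balanced = {0: 0, 1: 0, 2: 1, 3: 1}   # each part gets one contact element
lopsided = {0: 0, 2: 0, 1: 1, 3: 1}   # all contact work lands in part 1

cut_balanced = edge_cut(edges, balanced)
imb_balanced = imbalance(weights, balanced, 2)
cut_lopsided = edge_cut(edges, lopsided)
imb_lopsided = imbalance(weights, lopsided, 2)
```

The second partition balances the first constraint perfectly but doubles the load on one part for the second constraint, which is the situation a multi-constraint formulation is meant to avoid.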

4
Computational Electromagnetics

with L. Katehi, C. Sarris et al.
Time-accurate wireless communication simulation
(the transients are what is of interest)
Solution of Maxwell's equations, based on Yee's
FDTD approach

Adaptive multi-resolution using the Haar-wavelet
transform
C with MPI, dynamic domain decomposition using
Zoltan [K. Devine et al.]
one level of a multi-resolution modeling problem
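
As an illustration of the Haar-wavelet idea behind the adaptive multi-resolution scheme, the sketch below applies one level of the orthonormal Haar transform to a 1D field and flags regions whose detail coefficient is large. The thresholding rule and the function names are assumptions for illustration; the solver's actual adaptivity criterion is not given on the slide.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform of an even-length list."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Rebuild the original samples from approximation and detail coefficients."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def refine_mask(detail, threshold):
    """Hypothetical criterion: keep fine resolution where detail is large."""
    return [abs(d) > threshold for d in detail]


field = [1.0, 1.0, 1.0, 5.0]          # smooth on the left, sharp jump on the right
approx, detail = haar_step(field)
mask = refine_mask(detail, threshold=0.1)
rebuilt = inverse_haar_step(approx, detail)
```

Here only the second pair of samples is flagged, so the fine-scale coefficients would be kept there and dropped in the smooth region.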

5
COSITE INTERFERENCE IN A VEHICULAR TRANSCEIVER
NETWORK WITHIN A FOREST ENVIRONMENT: A HYBRID
FDTD/MOM APPROACH

Problem Statement
The in-forest communication between multi-antenna
mobile transmit-receive units is considered.
Issues to address
Forest propagation and multi-path (FDTD modeling
requires enormous resources).
Effect of an arbitrary platform (MoM requires an
extremely complex Green's function).
Operation of transceiver electronics under
cosite interference conditions (MoM is incompatible
with SPICE-type solvers such as TRANSIM).

humvee.net

Modeling Approach
Use the Method of Moments [Sarabandi and Koh,
IEEE AP-49, Feb. 2001] to model wave propagation
through the forest.
Enclose the vehicular transceivers in an FDTD
mesh to model rigorously the effect of the
platform and the transceiver architecture, as in
[Sarris et al., Proc. 2001 IEEE AP-S].

Joint CEN-5/FCS work. Contributors:
CEN-5: C. D. Sarris, W. Thiel, L. P. Katehi
FCS: I.-S. Koh, K. Sarabandi
6
Computational Fluid Dynamics

with D. Rizzetta, P. Morgan, M. Visbal and also
with A. Hamed, D. Basu, Q. Liu
Time accurate solution of the Navier-Stokes
equations, overset (Chimera) grids
Unsteady and turbulent fluid flow and acoustics
modeling
Fortran 77, coarse level parallelization with MPI
Memory and cache analysis
Variety of numerical models implemented and
compared
New effort: turbomachinery modeling with M.
Turner
multi-scale, unsteady, discontinuities,
multi-domain modeling, periodicity and symmetry

7
Hybrid Turbulence Models
[Figure: cavity mid-span axial vorticity contours; panels: Baseline Grid, Fine Grid]
8
FPGA-based Reconfigurable High Performance
Computing

Field-programmable Gate Arrays (FPGA)
Programmable digital logic
Manufacturers: Xilinx, Altera, others
Trends that make FPGAs especially appealing
Computational capacity of FPGAs has been scaling
faster than that of CPUs
Current-generation chips are able to support
large numbers of floating-point units

[Figure labels: FPGA, Software, Hardware]
9
Programming the FPGA

FPGAs are programmed, or configured, with a
sequence of bits containing the contents of the
LUTs and the control bits determining the
connections between LUTs, flip-flops, Block RAM,
etc.
This sequence of bits is referred to as the
configuration or bit file.
Programmed/reprogrammed on the fly in
microseconds.

10
Xilinx Virtex-II Architecture
Figure from T. El-Ghazawi, K. Gaj, and D.
Pointer, "Reconfigurable Supercomputing Systems,"
tutorial, RSSI '05
11
Configurable Logic Block (CLB) of Xilinx
Virtex(TM) 2.5 V FPGA

4 logic cells
4-input look-up table (LUT)
Carry logic
D flip-flop

Figure from the Virtex(TM) 2.5 V FPGA datasheet by
Xilinx
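
To make the look-up table idea concrete: a 4-input LUT is just a 16-entry truth table, and "configuring" it means writing those 16 bits. The sketch below models that in software; the function names are illustrative and do not correspond to any vendor tool or bitstream format.

```python
def make_lut4(func):
    """Pack a 4-input Boolean function into a 16-bit truth table."""
    bits = 0
    for addr in range(16):
        inputs = [(addr >> k) & 1 for k in range(4)]   # inputs[0] is bit 0 (a)
        if func(*inputs):
            bits |= 1 << addr
    return bits


def lut4_eval(bits, a, b, c, d):
    """Evaluate the LUT: the four inputs form the address of one stored bit."""
    addr = a | (b << 1) | (c << 2) | (d << 3)
    return (bits >> addr) & 1


# Two different "configurations" of the same LUT hardware.
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)   # 4-bit parity
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)   # 4-input AND

parity_of_1011 = lut4_eval(xor4, 1, 0, 1, 1)
```

The same lookup hardware computes parity or AND depending only on the 16 stored bits, which is why a configuration file can redefine the logic without changing the silicon.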
12
Trends in FPGA Floating Point Capabilities
from V. Natoli, "A Computational Physicist's View
of Reconfigurable High Performance Computing,"
Stone Ridge Technology, RSSI, July '05
13
Xilinx XC4VLX200

32 bit Integer and Fixed Point
Thousands of Arithmetic Units
Floating Point
600 SP Floating Point Multipliers
100 SP Floating Point Dividers
100 DP Floating Point Multipliers
20 DP Floating Point Dividers
SP ≠ 2 × DP
Theoretical Peaks
SP Floating Point: 20-120 GFLOPs
DP Floating Point: 4-20 GFLOPs
Integer: 0.5-1 TOPS

90 nm process
200,448 logic cells
750 kB BRAM
96 18x18-bit multipliers
Clock up to 500 MHz
from V. Natoli, "A Computational Physicist's View
of Reconfigurable High Performance Computing,"
Stone Ridge Technology, RSSI, July '05
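
A worked check of the theoretical peaks, under two assumptions that are not stated on the slide: each floating-point unit delivers one result per cycle, and the units run at about 200 MHz. With those assumptions, the quoted ranges match the divider count at the low end and the multiplier count at the high end.

```python
ASSUMED_CLOCK_HZ = 200e6   # assumption inferred from the quoted peaks, not from the slide

# Unit counts as listed on the slide.
units = {
    "SP multipliers": 600,
    "SP dividers": 100,
    "DP multipliers": 100,
    "DP dividers": 20,
}


def peak_gflops(n_units, clock_hz=ASSUMED_CLOCK_HZ):
    """Peak rate in GFLOPs, assuming one result per unit per clock cycle."""
    return n_units * clock_hz / 1e9


peaks = {name: peak_gflops(n) for name, n in units.items()}
for name, value in peaks.items():
    print(f"{name}: {value:.0f} GFLOPs")
```

Under these assumptions the SP range is 20 to 120 GFLOPs and the DP range is 4 to 20 GFLOPs, consistent with the slide; a different clock rate would scale all four figures equally.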
14
An FPGA-based FDTD Solver for Reconfigurable High
Performance Computing
15
FDTD

Maxwell's equations were solved using integral
equations until Yee introduced the Finite-Difference
Time-Domain (FDTD) method.
The FDTD calculation is highly parallel, and is
currently employed in parallel simulations on
High Performance Computing (HPC) clusters.
Fairly linear speed-up as processors are added.
How can we get even further speed-up on HPC systems?

16
FDTD

Target System

[Figure: Beowulf cluster; three nodes, each with a CPU and an FPGA, connected by a network]

FPGA performs the computation
Host Software moves the data.
FPGA communication
HPC communication

17
FDTD

Relation of the Equations
$$H_x^{t}(i,j) = H_x^{t-1}(i,j) - \frac{dt}{\mu\,dy}\left(E_z^{t-0.5}(i+1,j+1) - E_z^{t-0.5}(i+1,j)\right)$$

Hx/Hy Calculations Transfers
Ez Calculations Transfers
18
FDTD

The FDTD calculations have both temporal and
spatial locality.

[Figure: Hx/Hy update datapath; blocks: Add, Delay, Multiply (by Constant), Add, Delay; input Ez(i,j); input and output Hx/Hy(i,j)]

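$$H_x^{t}(i,j) = H_x^{t-1}(i,j) - \frac{dt}{\mu\,dy}\left(E_z^{t-0.5}(i+1,j+1) - E_z^{t-0.5}(i+1,j)\right)$$
$$H_y^{t}(i,j) = H_y^{t-1}(i,j) + \frac{dt}{\mu\,dx}\left(E_z^{t-0.5}(i+1,j+1) - E_z^{t-0.5}(i,j+1)\right)$$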

19
FDTD

Ez calculation has more operations.

[Figure: Ez update datapath; blocks: Multiply (by Constant) x2, Add x4, Delay x2; inputs Hx(i,j), Hy(i,j), Hy(i-1,j), Ez(i,j); output Ez(i,j)]
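$$E_z^{t}(i,j) = E_z^{t-1}(i,j) + \frac{dt}{\varepsilon\,dx}\left(H_y^{t-0.5}(i,j-1) - H_y^{t-0.5}(i-1,j-1)\right) - \frac{dt}{\varepsilon\,dy}\left(H_x^{t-0.5}(i-1,j) - H_x^{t-0.5}(i-1,j-1)\right)$$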
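
A software sketch of the three update equations, following the index offsets and signs as they appear on the slides. This is a plain-Python reference model, not the FPGA design: the coefficient names (c_hx = dt/(mu dy), c_hy = dt/(mu dx), c_ex = dt/(eps dx), c_ey = dt/(eps dy)), the grid size and the untouched boundary cells are assumptions made for illustration.

```python
def update_h(hx, hy, ez, c_hx, c_hy):
    """Magnetic-field update: Hx and Hy both read only Ez, so they can run in parallel."""
    n, m = len(ez), len(ez[0])
    for i in range(n - 1):
        for j in range(m - 1):
            hx[i][j] -= c_hx * (ez[i + 1][j + 1] - ez[i + 1][j])
            hy[i][j] += c_hy * (ez[i + 1][j + 1] - ez[i][j + 1])


def update_ez(ez, hx, hy, c_ex, c_ey):
    """Electric-field update: reads neighbouring Hx and Hy values."""
    n, m = len(ez), len(ez[0])
    for i in range(1, n):
        for j in range(1, m):
            ez[i][j] += (c_ex * (hy[i][j - 1] - hy[i - 1][j - 1])
                         - c_ey * (hx[i - 1][j] - hx[i - 1][j - 1]))


def zeros(n, m):
    return [[0.0] * m for _ in range(m and n)]


# Hypothetical 5x5 grid with a unit impulse in Ez and all coefficients set to 0.5.
n = 5
ez, hx, hy = zeros(n, n), zeros(n, n), zeros(n, n)
ez[2][2] = 1.0

update_h(hx, hy, ez, c_hx=0.5, c_hy=0.5)     # half step: magnetic fields
update_ez(ez, hx, hy, c_ex=0.5, c_ey=0.5)    # half step: electric field
```

Each cell touches only its immediate neighbours and its own previous value, which is the spatial and temporal locality that makes the computation suitable for a pipelined hardware datapath.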
20
Cray XD1 System Architecture
Cray XD1 Chassis
21
Cray XD1-Expansion Module

AAP FPGA Xilinx Virtex II Pro (xc2vp50-7)
RAP RapidArray Processor

Cray XD1 Expansion Module
22
Baseline Implementation

Update engines created by Gandhi [2]
Floating-point units provided by Belanovic [3] at
NEU
Two clocks: system and update engines
Magnetic Updates in parallel (Hx and Hy)
Electric update (Ez) every 2 clock cycles
Multiple update cycles w/o host intervention
Local SRAMs for input and output data
SRAMs as ping-pong buffers
Slower than the Opterons alone
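
The ping-pong buffering mentioned above can be sketched as follows: while the update engine works on one buffer, the host fills the other, and the roles swap on every block. This is a sequential simulation with hypothetical names; on the real system the fill and the compute would overlap in time.

```python
def run_with_ping_pong(blocks, compute):
    """Process blocks using two alternating buffers (sequential simulation)."""
    buffers = [None, None]
    results = []
    if blocks:
        buffers[0] = list(blocks[0])               # host fills the first buffer
    for k in range(len(blocks)):
        work = k % 2                               # buffer the engine reads now
        idle = 1 - work                            # buffer the host may overwrite
        if k + 1 < len(blocks):
            buffers[idle] = list(blocks[k + 1])    # host preloads the next block
        results.append(compute(buffers[work]))     # engine processes current block
    return results


# Hypothetical data blocks; `sum` stands in for the FDTD update engine.
totals = run_with_ping_pong([[1, 2], [3, 4], [5]], sum)
```

Because the engine never reads the buffer the host is writing, several update cycles can proceed without the engine stalling on data transfers.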

23
FPGA Implementation in Cray XD1
[Block diagram. Modules: prog_clock_gen, app_fdtd, rt_core, rt_client, qdr2_core, qdr_fdtd, mux. Interfaces: Fabric Request Interface, User Request Interface, Host Processor Interface, Transmit Data Bus, Receive Data Bus, QDR 1-4 Interfaces, QDR II SRAM1-4 Interfaces, Clock Signals.]