Sparse Linear Solver for Power System Analysis using FPGA

Description:

To design an embedded FPGA-based multiprocessor system to perform high speed ... Tailor HW design to systems arising in Power Flow analysis. HPEC 2004 ...

1
Sparse Linear Solver for Power System
Analysis using FPGA
  • Jeremy Johnson, Prawat Nagvajara, Chika Nwankpa
  • Drexel University

2
Goal & Approach
  • To design an embedded FPGA-based multiprocessor
    system to perform high speed Power Flow Analysis.
  • To provide a single desktop environment to solve
    the entire package of Power Flow Problem
    (Multiprocessors on the Desktop).
  • Solve Power Flow equations using Newton-Raphson,
    with hardware support for sparse LU.
  • Tailor HW design to systems arising in Power Flow
    analysis.
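
The Newton-Raphson loop with an LU-based linear solve mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the 2x2 residual and Jacobian are a hypothetical toy system, not power flow equations, and the solve is dense rather than sparse. The function names are invented for this sketch.

```c
#include <assert.h>

#define N 2  /* toy size; real power flow Jacobians are large and sparse */

static double absd(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical toy system F(x) = 0: x0^2 + x1^2 = 4 and x0*x1 = 1. */
static void residual(const double x[N], double f[N]) {
    f[0] = x[0] * x[0] + x[1] * x[1] - 4.0;
    f[1] = x[0] * x[1] - 1.0;
}

static void jacobian(const double x[N], double J[N][N]) {
    J[0][0] = 2.0 * x[0]; J[0][1] = 2.0 * x[1];
    J[1][0] = x[1];       J[1][1] = x[0];
}

/* Solve A*y = b by Gaussian elimination with partial pivoting.
 * A and b are overwritten; the solution is left in b.
 * Returns 0 on success, -1 if a zero pivot is found. */
static int lu_solve(double A[N][N], double b[N]) {
    for (int k = 0; k < N; k++) {
        int p = k;                                /* pivot search in column k */
        for (int i = k + 1; i < N; i++)
            if (absd(A[i][k]) > absd(A[p][k])) p = i;
        if (A[p][k] == 0.0) return -1;
        if (p != k) {                             /* row swap */
            for (int j = 0; j < N; j++) {
                double t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (int i = k + 1; i < N; i++) {         /* eliminate below the pivot */
            double m = A[i][k] / A[k][k];
            for (int j = k; j < N; j++) A[i][j] -= m * A[k][j];
            b[i] -= m * b[k];
        }
    }
    for (int i = N - 1; i >= 0; i--) {            /* back substitution */
        double s = b[i];
        for (int j = i + 1; j < N; j++) s -= A[i][j] * b[j];
        b[i] = s / A[i][i];
    }
    return 0;
}

/* Newton-Raphson: repeatedly solve J(x) * dx = -F(x) and set x += dx.
 * Returns the number of iterations used, or -1 on failure. */
int newton_solve(double x[N], double tol, int max_iter) {
    assert(tol > 0.0);
    for (int it = 0; it < max_iter; it++) {
        double f[N], J[N][N];
        residual(x, f);
        double norm = 0.0;
        for (int i = 0; i < N; i++)
            if (absd(f[i]) > norm) norm = absd(f[i]);
        if (norm < tol) return it;
        jacobian(x, J);
        for (int i = 0; i < N; i++) f[i] = -f[i];
        if (lu_solve(J, f) != 0) return -1;
        for (int i = 0; i < N; i++) x[i] += f[i];
    }
    return -1;
}
```

The linear solve inside each Newton step is the part the slides propose to support in hardware; in this sketch it is the `lu_solve` call.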

3
Algorithm and HW/SW Partition
4
Results
  • Software solutions (sparse LU needed for Power
    Flow) using high-end PCs/workstations do not
    achieve efficient floating point performance and
    leave substantial room for improvement.
  • Coarse-grained parallelism will not significantly
    improve performance due to the granularity of the
    computation.
  • FPGA, with a much slower clock, can outperform
    PCs/workstations by devoting space to hardwired
    control, additional FP units, and utilizing
    fine-grained parallelism.
  • Benchmarking studies show that significant
    performance gain is possible.
  • A 10x speedup is possible using existing FPGA
    technology

5
Benchmark
  • Obtain data from power systems of interest

6
System Profile
7
System Profile
  • More than 80% of rows/cols have size < 30
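
A row-size statistic of this kind could be computed as in the sketch below. It assumes the matrix is held in compressed sparse row (CSR) form, which the slides do not state; `fraction_rows_below` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of rows whose non-zero count is below `limit`.
 * row_ptr is an assumed CSR row-pointer array of length n_rows + 1,
 * so row i holds row_ptr[i + 1] - row_ptr[i] non-zeros. */
double fraction_rows_below(const int *row_ptr, size_t n_rows, int limit) {
    assert(n_rows > 0);
    size_t count = 0;
    for (size_t i = 0; i < n_rows; i++) {
        int nnz = row_ptr[i + 1] - row_ptr[i];
        if (nnz < limit) count++;
    }
    return (double)count / (double)n_rows;
}
```

Running the same routine on the row pointers of the transposed matrix would give the corresponding figure for columns.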

8
Software Performance
  • Software platform
  • UMFPACK
  • Pentium 4 (2.6GHz)
  • 8KB L1 Data Cache
  • Mandrake 9.2
  • gcc v3.3.1

9
Hardware Model Requirements
  • Store row and column indices for non-zero entries
  • Use column indices to search for pivot. Overlap
    pivot search and division by pivot element with
    row reads.
  • Use multiple FPUs to do simultaneous updates
    (enough parallelism for 8-32, avg. col. size)
  • Use cache to store updated rows from iteration to
    iteration (70% overlap, memory ≈ 400KB -
    largest). Can be used for prefetching.
  • Total memory required ≈ 22MB (largest system)
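
In software terms, the pivot search and the division by the pivot element described above might look like the sketch below. The `Entry` layout, the `eliminated` flag array, and both function names are assumptions made for illustration; the slides do not give the actual data layout, and the overlap with row reads is a hardware scheduling matter that sequential C cannot show.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed storage for one non-zero of a sparse column. */
typedef struct { int row; double val; } Entry;

static double absd(double v) { return v < 0.0 ? -v : v; }

/* Partial pivoting: return the position in col[] of the entry with the
 * largest magnitude among rows not yet used as pivot rows, or -1 if
 * every candidate is zero or already eliminated. */
int find_pivot(const Entry *col, size_t n, const unsigned char *eliminated) {
    int best = -1;
    double best_mag = 0.0;
    for (size_t k = 0; k < n; k++) {
        if (eliminated[col[k].row]) continue;     /* reject used rows */
        double mag = absd(col[k].val);
        if (mag > best_mag) { best_mag = mag; best = (int)k; }
    }
    return best;
}

/* Divide the remaining candidate entries by the pivot value, turning
 * them into the multipliers used in the row updates. */
void scale_by_pivot(Entry *col, size_t n, const unsigned char *eliminated,
                    int pivot_pos) {
    assert(pivot_pos >= 0 && (size_t)pivot_pos < n);
    double pivot = col[pivot_pos].val;
    for (size_t k = 0; k < n; k++) {
        if ((int)k == pivot_pos || eliminated[col[k].row]) continue;
        col[k].val /= pivot;
    }
}
```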

10
Architecture
[Block diagram. Labeled components: SDRAM Memory, SRAM Cache, SDRAM Controller, Cache Controller, Processing Logic, FPGA]
11
Pivot Hardware
[Block diagram of the pivot logic. Labeled blocks and signals: Physical index, Translate to virtual, Index reject, Read colmap, Memory read, FP compare, Pivot index, Pivot value, Virtual index, Column value, Pivot column, Pivot]
12
Parallel FPUs
13
Update Hardware
[Block diagram. Labeled blocks and signals: Column Word, FMUL, Update Row, FADD, Select, Write Logic, Merge Logic, Memory read row, colmap Update, Submatrix Row]
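
The FMUL, FADD, and Merge Logic labels on this slide suggest a sparse row update of the form row = row - m * pivot_row. A minimal software sketch of such an update is given below, assuming both rows are stored as column-sorted (index, value) pairs. The `RowEntry` type and `update_row` are hypothetical names, and the mapping onto the hardware blocks is an interpretation, not something the slide states.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed storage for one non-zero of a sparse row, sorted by column. */
typedef struct { int col; double val; } RowEntry;

/* Compute out = row - m * piv by merging two column-sorted sparse rows.
 * out must have room for n_row + n_piv entries. Entries present only in
 * piv become fill-in. Entries that cancel to zero are kept as explicit
 * zeros. Returns the number of entries written. */
size_t update_row(const RowEntry *row, size_t n_row,
                  const RowEntry *piv, size_t n_piv,
                  double m, RowEntry *out) {
    assert(out != row && out != piv);
    size_t i = 0, j = 0, k = 0;
    while (i < n_row || j < n_piv) {
        if (j == n_piv || (i < n_row && row[i].col < piv[j].col)) {
            out[k++] = row[i++];                       /* only in row: copy */
        } else if (i == n_row || piv[j].col < row[i].col) {
            out[k].col = piv[j].col;                   /* only in pivot row */
            out[k].val = -m * piv[j].val;              /* multiply: fill-in */
            k++; j++;
        } else {
            out[k].col = row[i].col;                   /* in both rows */
            out[k].val = row[i].val - m * piv[j].val;  /* multiply and add */
            k++; i++; j++;
        }
    }
    return k;
}
```

Each row in the pivot column can be updated independently of the others, which is where multiple FPUs could work in parallel.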
14
Performance Model
  • C program which simulates the computation (data
    transfer and arithmetic operations) and estimates
    the architecture's performance (clock cycles and
    seconds).
  • Model Assumptions
  • Sufficient internal buffers
  • Cache write hits 100%
  • Simple static memory allocation
  • No penalty on cache write-back to SDRAM
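
A cycle-count model in this spirit could be sketched as below. It is not the authors' program: the cost formula (one cycle per pivot candidate compared, one cycle per pivot-row element per batch of rows) and every name and parameter are assumptions for illustration. In line with the model assumptions listed above, memory stalls and write-back penalties are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model parameters. */
typedef struct {
    unsigned num_fpus;   /* parallel update units (assumed) */
    double clock_hz;     /* FPGA clock frequency in Hz (assumed) */
} ModelParams;

/* Estimated cycles for one elimination step.
 * rows_to_update: rows with a non-zero below the pivot.
 * pivot_row_len:  non-zeros in the pivot row. */
unsigned long step_cycles(unsigned rows_to_update, unsigned pivot_row_len,
                          const ModelParams *p) {
    assert(p->num_fpus > 0);
    /* Pivot search: one compare per candidate, pivot included. */
    unsigned long search = (unsigned long)rows_to_update + 1;
    /* Rows are handled in batches of num_fpus (rounded up); each batch
     * streams the pivot row once, one element per cycle. */
    unsigned long batches = (rows_to_update + p->num_fpus - 1) / p->num_fpus;
    return search + batches * pivot_row_len;
}

/* Sum the per-step estimates over a whole factorization. */
unsigned long total_cycles(const unsigned *rows_to_update,
                           const unsigned *pivot_row_len,
                           size_t n_steps, const ModelParams *p) {
    unsigned long total = 0;
    for (size_t s = 0; s < n_steps; s++)
        total += step_cycles(rows_to_update[s], pivot_row_len[s], p);
    return total;
}

/* Convert a cycle count to seconds at the modeled clock rate. */
double cycles_to_seconds(unsigned long cycles, const ModelParams *p) {
    assert(p->clock_hz > 0.0);
    return (double)cycles / p->clock_hz;
}
```

The per-step sizes would come from tracing an actual factorization of the benchmark matrices; the sketch only shows how such a trace turns into cycles and seconds.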

15
Performance

16
GEPP Breakdown