Title: SKAMP/LFD Correlator
1. SKAMP/LFD Correlator
- FPGAs in Radioastronomy
- 5-8 February 2007, Hobart, Tasmania
- John Bunton
- CSIRO ICT Centre, Sydney
2. Molonglo
- 1960s
- Commissioned as 408 MHz One-Mile Mills Cross
- (1600 m long arms)
- 1980s
- Converted to the Molonglo Observatory Synthesis Telescope (MOST), an 843 MHz synthesis telescope, using just the E-W arm
- 2000s
- SKAMP
3. SKAMP
- A new low-frequency spectral line instrument.
- Features wide field of view imaging, polarisation, spectral line capability, RFI mitigation.
- Strategy: parallel 3-stage re-development of MOST
- Science technology prototyping for the Square Kilometre Array (SKA)
- 1% collecting area
- wide-field imaging
- Line feeds
- Signal transport
- Correlators
4. SKAMP stage 1 (2004-2006)
- Continuum correlator using existing IF
- Observing frequency 843 MHz
- IF 4.4 MHz wide at 11 MHz
- 96 inputs, 4560 baselines
- Processing in Xilinx Spartan II and 3e FPGAs
- After digitising: FPGA processing for delay, complex to real conversion and fringe stopping
- Correlator systolic array 16x16 (256 baselines at once)
- Array reuse to calculate all baselines (18 passes)
- Lesson: data duplication leads to an increase in routing resources
- Must store all correlations on chip (18 per MAC)
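The baseline and pass counts on this slide can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative, and the pass count assumes every pass fills all 256 cells of the array, which is a lower bound rather than a description of the actual scheduling.

```python
import math


def baseline_count(n_inputs: int) -> int:
    """Number of distinct cross-correlation pairs among n_inputs signals."""
    return n_inputs * (n_inputs - 1) // 2


def array_passes(n_baselines: int, rows: int, cols: int) -> int:
    """Passes needed if each pass computes rows * cols baselines (ideal packing)."""
    return math.ceil(n_baselines / (rows * cols))


baselines = baseline_count(96)              # 96 inputs from the slide
passes = array_passes(baselines, 16, 16)    # 16x16 systolic array
print(baselines, passes)
```

With 18 passes through the same array, each multiply-accumulate unit has to hold 18 partial results, which matches the "18 per MAC" storage figure.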
5. SKAMP stages 2 and 3 (2006 - )
- Stage 2
- New downconversion system, 100 MHz bandwidth
- 30 MHz useable, limited by existing feed
- Digitise at antenna, fibre optic transmission to correlator
- Spectral line correlator, 368 inputs, resolution 5-7 kHz
- Stage 3
- Broadband dual polarisation linefeed, 0.6-1.2 GHz
- New LNAs, analogue beamforming
- 100 MHz bandwidth
- Same digitisation, signal transport, and correlator
6. Common Correlator Design
- Correlator technology developed for SKAMP2/3 has now been adopted by LFD (MIT and Aust Uni Consortium); only modification needed: 4 extra GigE
- Prototype for the xNTD correlator
- Same correlator board useable for 30 to 512 antennas
- Team SKAMP / LFD correlator
- Uni Sydney / ATNF: Ludi de Souza
- Uni Sydney: Duncan Campbell-Wilson, Adrian Blake
- Domain 42: John Russel, Chris Weimann
- MIT: Roger Cappallo, Bart Kincaid
- ATNF / ICT Centre: John Bunton, Jaysri Joseph
7. SKAMP 3 specification
- 368 inputs, 67,712 signal pairs
- 100 MHz, 36.8 GHz total bandwidth
- Close to Australia Telescope upgrade
- 6.7 Tera complex multiply-accumulates per sec
- Similar to EVLA
- 5 kHz resolution, 20,000 channels
- average to get less resolution, extra programming for finer
- 1,354 Mega correlations per integration period
- Enormous data set, on-FPGA storage insufficient
- Image Processing !!!!!
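The headline numbers in this specification follow from three inputs: 368 signals, 100 MHz each, 20,000 channels. The sketch below reproduces them. Two points are assumptions inferred from the figures rather than stated on the slide: the pair count of 67,712 corresponds to N*N/2, and the multiply-accumulate rate assumes one complex sample per hertz of bandwidth per pair.

```python
N_INPUTS = 368
BANDWIDTH_HZ = 100e6
N_CHANNELS = 20_000

# 67,712 equals N*N/2; this counting convention is inferred from the slide's number.
signal_pairs = N_INPUTS * N_INPUTS // 2

total_bandwidth_hz = N_INPUTS * BANDWIDTH_HZ         # aggregate input bandwidth
cmac_per_second = signal_pairs * BANDWIDTH_HZ        # assumes complex sampling at the bandwidth
resolution_hz = BANDWIDTH_HZ / N_CHANNELS            # channel width
correlations = signal_pairs * N_CHANNELS             # accumulators per integration period

print(f"{signal_pairs:,} pairs, {total_bandwidth_hz / 1e9:.1f} GHz total")
print(f"{cmac_per_second / 1e12:.2f} Tera CMAC/s, {resolution_hz / 1e3:.0f} kHz channels")
print(f"{correlations / 1e6:,.0f} Mega correlations")
```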
8. FX correlators
- General principle: divide and conquer
- Filter data to multiple subbands and distribute a subset of subbands for all inputs to each correlator unit
- Main problem is
- Getting the right data to the right FPGA at the right time
9. 384 way Cross Connect
- Switch? 736 GigE ports
- Leads to custom design
- Too hard to do in one stage, so do in stages
- 1st stage (8-way)
- First FPGA Coarse Filter bank can handle 8 inputs
- 2nd stage (12-way)
- Backplane interconnection between 12 filterbank boards
- 3rd stage (2 way)
- Data aggregation after fine filterbanks
- 4th stage (2 way)
- Cables from two card cages
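The four stages multiply up to the 384-way fan-out in the slide title. A minimal check, with the stage labels paraphrased from the bullets above and the 368-input figure taken from the SKAMP 3 specification:

```python
from math import prod

# Fan-out of each cross-connect stage, in the order listed on the slide.
STAGES = {
    "coarse filterbank FPGA": 8,
    "backplane between filterbank boards": 12,
    "aggregation after fine filterbanks": 2,
    "cables between card cages": 2,
}

total_ways = prod(STAGES.values())
print(total_ways)

# The staged cross connect must cover at least the 368 SKAMP 3 inputs.
covers_all_inputs = total_ways >= 368
```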
10. System Overview
Diagram Ludi de Souza
11. Advanced TCA
- 12 filterbank boards, 12 way cross connect on backplane
12. Backplane Crossconnect
- Using Rocket I/O over backplane
- Two pairs to and from each pair of boards
- Implementation: three FX20
- Output mapping and null connection different for each board
- Send nine bands/sets to outer FPGA and re-route selected fourth band to centre FPGA
- Input to board invariant
Diagram Ludi de Souza
13. Filterbank/Crossconnect board
Diagram Ludi de Souza
14. First Stage Filterbank
- Have broken filtering into two stages
- Can't do eight 32k filterbanks in one SX35
- First stage needs to divide data into 12 sub-bands
- But 12 requires mixed radix FFT
- instead implement 256 real
- Gives 128 channels
- 8 boards get 11 channels and 4 get 10
- Standard processing: advance data by length of FFT at each operation of the FFT
- Critical sampling: corruption due to aliasing
- Solution: oversampled filterbank
Diagram Ludi de Souza
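The split of 128 channels over 12 boards on this slide is an uneven division: 128 is not a multiple of 12, so some boards take one extra channel. A small sketch of that allocation follows. Which boards receive the extra channel is an assumption here (the first eight); the slide only gives the counts.

```python
FFT_LEN_REAL = 256
N_BOARDS = 12

# A 256-point real FFT yields 128 usable frequency channels.
n_channels = FFT_LEN_REAL // 2

# Spread channels as evenly as possible; the first `extra` boards get one more.
base, extra = divmod(n_channels, N_BOARDS)
allocation = [base + 1 if board < extra else base for board in range(N_BOARDS)]

print(allocation)
```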
15. Oversampling
- Increase output data rate for each channel without altering channel characteristics
- 15% increase in data rate
- All data across 100 MHz can now be recovered without aliasing
- Method: for successive filter bank operations advance input data by less than FFT length
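The method amounts to choosing a hop between successive filterbank operations that is shorter than the FFT length. The sketch below only computes block start positions and the resulting output-rate ratio; it does not implement the filterbank itself. The hop of 224 samples is a hypothetical value chosen to give an increase of roughly 15%, not a figure from the slides.

```python
def frame_starts(n_samples: int, fft_len: int, hop: int) -> list[int]:
    """Start indices of successive FFT-length segments taken every `hop` samples."""
    return list(range(0, n_samples - fft_len + 1, hop))


def oversampling_ratio(fft_len: int, hop: int) -> float:
    """Output data rate relative to critical sampling (hop == fft_len)."""
    return fft_len / hop


FFT_LEN = 256
HOP = 224  # illustrative assumption: any hop < FFT_LEN oversamples

critical = frame_starts(2048, FFT_LEN, FFT_LEN)   # advance by the full FFT length
oversampled = frame_starts(2048, FFT_LEN, HOP)    # advance by less than the FFT length
ratio = oversampling_ratio(FFT_LEN, HOP)

print(len(critical), len(oversampled), f"{(ratio - 1) * 100:.1f}% more output")
```

Because each channel is now sampled faster than its nominal width, the transition bands of adjacent channels overlap in the output instead of aliasing back into the channel.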
16. Second stage filterbank
- Oversampling leads to data duplication at band edges
- Pass each band through second filterbank and discard duplicated data
- Hope to use SX25 in final implementation
- Example shows 12 channel first filterbank and 2048 second
Diagram Ludi de Souza
Diagram Ludi de Souza
17. Second stage problem
- In simplest implementation each second stage FPGA is processing 48 2048-band filter banks
- 12 stage FIR x (10+10) bits x 2048 = 491,520 bits, by 48 inputs
- Not enough on chip memory
- Use external RAM, need capacity of DRAM
- DRAM bandwidth limits performance
- 2 SX35 have sufficient processing but have 4 to get DRAM BW
- Must reorder data so that long sequences for each input are processed
- Process each antenna for 2048512 samples
- For a PFB with 12-point FIR, efficiency is 2048/2060 = 99%
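The memory figure on this slide can be reproduced directly. In the sketch below, the 20 bits per sample is read as 10-bit real plus 10-bit imaginary, which is the only split that gives 491,520 bits. Reading the denominator 2060 as 2048 useful steps plus 12 steps of FIR history is an interpretation, not something the slide states.

```python
FIR_TAPS = 12
BITS_PER_SAMPLE = 10 + 10      # assumed: 10-bit real + 10-bit imaginary
CHANNELS = 2048
INPUTS_PER_FPGA = 48

# State needed by one 2048-channel polyphase filterbank.
bits_per_filterbank = FIR_TAPS * BITS_PER_SAMPLE * CHANNELS

# State for all 48 filterbanks handled by one second-stage FPGA.
total_bits = bits_per_filterbank * INPUTS_PER_FPGA

# Efficiency when the FIR history is reloaded each time the input changes
# (interpretation: 12 overhead steps for every 2048 useful ones).
efficiency = 2048 / (2048 + FIR_TAPS)

print(f"{bits_per_filterbank:,} bits per filterbank")
print(f"{total_bits / 1e6:.1f} Mbit per FPGA, efficiency {efficiency:.1%}")
```

At roughly 23.6 Mbit per FPGA the filter state does not fit in on-chip block RAM, which is why the slide moves it to DRAM and then has to reorder data into long per-input runs.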
18. Fine Filter Bank Input Re-ordering
Diagram Ludi de Souza
Diagram Ludi de Souza
19. Input Ordering for Correlator
- Correlator needs data ordered so that it receives data for a single frequency channel for all inputs at one time
- Filter bank is generating data for one input at a time
- Second re-ordering operation needed
- Originally in same memory as input re-ordering
- Design modification
- Replace two FX20 by single FX60 and add DRAM
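The second re-ordering is a transpose (often called a corner turn): the filterbank emits all channels of one input, while the correlator wants all inputs of one channel. The toy example below shows the index swap on a small in-memory array. It is only an illustration of the ordering and says nothing about how the FX60 and DRAM implement it; the function name and array sizes are made up.

```python
def corner_turn(per_input: list[list]) -> list[list]:
    """Reorder samples indexed [input][channel] into [channel][input]."""
    return [list(column) for column in zip(*per_input)]


# Filterbank order: one row per input, each row holding that input's channels.
# Each sample is tagged (input, channel) so the reordering is visible.
filterbank_output = [[(inp, ch) for ch in range(4)] for inp in range(3)]

# Correlator order: one row per channel, each row holding every input.
correlator_input = corner_turn(filterbank_output)

print(correlator_input[2])
```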
20. Output re-ordering
Diagram Ludi de Souza
21. Correlator Task
- 368 inputs, 67,712 signal pairs
- (Even worse for LFD: 1024 inputs, 523,776 signal pairs)
- 20,000 frequency channels
- 1,354 Mega correlations
- DRAM for long term accumulations
- But each correlation occurs at a 5 kHz rate
- CMAC units operate at 300 MHz
- Each handles 60,000 correlations
- With hundreds of CMAC/FPGA still need hundreds of FPGAs
- Developed Correlation Cell concept to ease data flow
- 35,000 correlations at one time in a single SX35
- Two FPGAs to process a single channel
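The sizing argument on this slide is a chain of divisions, sketched below. The two pair-count formulas are chosen only because they reproduce the slide's figures (N*N/2 for SKAMP, N*(N-1)/2 for LFD); the variable names are illustrative.

```python
SKAMP_INPUTS = 368
LFD_INPUTS = 1024
N_CHANNELS = 20_000
CMAC_CLOCK_HZ = 300_000_000
CORRELATION_RATE_HZ = 5_000

# Pair counts as quoted on the slide (the two figures use different conventions).
skamp_pairs = SKAMP_INPUTS * SKAMP_INPUTS // 2
lfd_pairs = LFD_INPUTS * (LFD_INPUTS - 1) // 2

total_correlations = skamp_pairs * N_CHANNELS

# A CMAC running at 300 MHz can time-share across correlations updated at 5 kHz.
correlations_per_cmac = CMAC_CLOCK_HZ // CORRELATION_RATE_HZ

# Ceiling division: CMAC units needed for the whole SKAMP 3 correlator.
cmacs_needed = -(-total_correlations // correlations_per_cmac)

print(skamp_pairs, lfd_pairs, correlations_per_cmac, cmacs_needed)
```

With a few hundred CMAC units per FPGA, the resulting count of CMACs is consistent with the slide's conclusion that a very large number of FPGAs would be needed without a more efficient data flow.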
22. 67,712 correlations in two SX35
- Systolic array too inflexible
- 144 CMAC before new data required
- New data needed 470 times to process one time sample
- 24 values for each use of the array, total input 11,280 samples
- Problem is even worse for LFD
- Approach developed: Correlation Cell
- Combination of multiply-accumulate and storage
- Each cell handles 256 correlations at a time
- 37,000 correlations per FPGA simultaneously
- Total input data less than 1000 samples per time instance
- 512 time sample short integration on chip
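The input-traffic figures for the systolic array can be reproduced as follows. The 12 x 12 shape is an assumption: it is the square array that gives 144 CMACs and 12 + 12 = 24 edge inputs per use, which matches the slide's numbers.

```python
SIGNAL_PAIRS = 67_712
ARRAY_SIDE = 12                       # assumed: 144 CMACs arranged 12 x 12

array_cmacs = ARRAY_SIDE * ARRAY_SIDE
values_per_load = ARRAY_SIDE + ARRAY_SIDE   # one row set plus one column set

# Number of full array loads needed per time sample (the slide rounds to 470).
loads_per_sample = SIGNAL_PAIRS // array_cmacs
systolic_input_samples = loads_per_sample * values_per_load

# The correlation cell approach is quoted as needing under 1000 samples.
CELL_INPUT_BOUND = 1000
reduction_at_least = systolic_input_samples / CELL_INPUT_BOUND

print(loads_per_sample, systolic_input_samples, f"> {reduction_at_least:.0f}x less input")
```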
23. Correlation Cell
- Input 16 pairs of data
- 4-bit complex multiply in 18-bit multiplier
- Accumulation to block RAM
- Calculate 256 correlations, 512 successive time samples
- Data reordering in filterbank
- xNTD 4-7 cells for all correlations
- 30-70 MHz BW per FPGA
- All baselines: LFD 16, SKAMP3 2 FPGAs
- 1.2-1.5 MHz of bandwidth
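A software model helps show what one correlation cell computes: two sets of 16 inputs, 256 accumulators, and a short integration over 512 time samples before the results are dumped. This is a behavioural sketch only. The class and method names are hypothetical, the conjugation convention (x times the conjugate of y) is an assumption, and the hardware time-shares a single multiplier over 256 clock cycles rather than looping as written here.

```python
class CorrelationCell:
    """Behavioural model: multiply-accumulate plus storage for n x n products."""

    def __init__(self, n: int = 16):
        self.n = n
        self.acc = [[0j] * n for _ in range(n)]   # stands in for block RAM
        self.samples = 0

    def accumulate(self, xs: list[complex], ys: list[complex]) -> None:
        """Add one time sample: every x against every conjugated y."""
        for i, x in enumerate(xs):
            for j, y in enumerate(ys):
                self.acc[i][j] += x * y.conjugate()
        self.samples += 1

    def dump(self) -> list[list[complex]]:
        """Return the short-term sums and clear the cell for the next block."""
        result = self.acc
        self.acc = [[0j] * self.n for _ in range(self.n)]
        self.samples = 0
        return result


cell = CorrelationCell()
xs = [1 + 1j] * 16
ys = [1 - 1j] * 16
for _ in range(512):          # 512 successive time samples of one channel
    cell.accumulate(xs, ys)
short_term = cell.dump()      # would be passed on for long term accumulation
print(short_term[0][0], sum(len(row) for row in short_term))
```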
24. Board Manufacture Simplification (1)
- Manufacture of correlator board a major task
- Examples: SKAMP1, EVLA
- Correlation cell reduces input data rate into correlation chip
- For individual correlation cell, 2 sets of 16 inputs require 256 clock cycles to process.
- Data rate reduction up to a factor of 16
- This value approached for SKAMP3 and LFD
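The factor of 16 follows from comparing operands per clock cycle. The sketch assumes the baseline is a single CMAC that needs two fresh operands every clock, and that a cell performs one multiply-accumulate per clock; neither assumption is spelled out on the slide.

```python
SET_SIZE = 16

# Baseline (assumed): a bare CMAC consumes two new operands every clock cycle.
bare_cmac_inputs_per_cycle = 2

# Correlation cell: 2 sets of 16 inputs are reused for 16 * 16 products,
# taking 256 clock cycles at one multiply-accumulate per cycle (assumed).
cell_inputs = 2 * SET_SIZE
cell_cycles = SET_SIZE * SET_SIZE
cell_inputs_per_cycle = cell_inputs / cell_cycles

data_rate_reduction = bare_cmac_inputs_per_cycle / cell_inputs_per_cycle
print(cell_cycles, data_rate_reduction)
```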
25. Board Manufacture Simplification (2)
- Correlation cell also reduces data duplication
- SKAMP1 4x4 systolic array, EVLA 8x8 systolic array
- Data duplication 8 in EVLA, higher in SKAMP1 due to array reuse
- Each Correlation Cell processes 256 correlations at once
- Can reduce size of systolic array by sqrt(256) = 16
- No data duplication on board for up to 150 antennas
- Data duplication 1.5 for SKAMP3, and 4 for LFD; LFD 16 FPGAs, data input 1/sqrt(16) of total per FPGA
- Correlation cell leads to a large simplification of correlator board
26. Putting it Together: The SKAMP prototype
Correlations
Correlator interface
Control and data output for SKAMP
Long Term Accumulations
27. Input, Daisy Chain, Route, Autocorrelate
Infiniband 1X
- Two FX20
- Interconnection for high antenna number designs
- Input 16 Rocket I/O on unidirectional Infiniband
- Output 16 Rocket I/O unidirectional Infiniband
- Can daisy chain modules for reuse of data in further processing modules: beamforming, pulsar processor
Infiniband 4X
28. Compute Engine
- Eight SX35 FPGAs
- Input 16, Output 18 LVDS per FPGA; low I/O leads to reduction in support chips
- 1152 Correlation cells total
- Up to 294k correlations on board at a time (256 per cell)
- Data re-ordering in filterbank to achieve
- Processing of 512 time values for each frequency channel
- Then dump to LTA
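The "294k" figure is the cell count times the correlations held per cell. A one-line check, with names chosen for illustration:

```python
CELLS_PER_BOARD = 1152
CORRELATIONS_PER_CELL = 256
TIME_SAMPLES_PER_DUMP = 512

# Correlations resident on the board at any one time.
correlations_on_board = CELLS_PER_BOARD * CORRELATIONS_PER_CELL

# Multiply-accumulates performed between dumps to the long term accumulator.
cmacs_per_dump = correlations_on_board * TIME_SAMPLES_PER_DUMP

print(f"{correlations_on_board:,} correlations, {cmacs_per_dump:,} CMACs per dump")
```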
29. Long Term Accumulator, intermediate routing
- Number of correlations requires DRAM for storage
- Data rate requires a DIMM module per pair of SX35
30. Estimated Performance
- Correlator board: clock rate 330 MHz, 192 cells/FPGA, 6 FPGAs
- Board processing rate 400 GCMACs/s = 2.8 Tops/s
- Power consumption 100 W per board
- Power efficiency 0.25 W/GCMAC/s (4-bit FX)
- EVLA an order of magnitude higher (not pure FX)
- Filterbank board: 3.2 Gsample/s, two polyphase filterbanks, 32 operations per sample
- Board processing rate 100 Gops/s (18 bit)
- Power consumption 60 W
- Power efficiency 0.6 W/Gop
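The performance estimates can be rebuilt from the per-board parameters. The sketch assumes one complex multiply-accumulate per cell per clock. The slide's rounded rates (400 GCMAC/s and 100 Gops/s) are used for the efficiency figures so that they match the quoted values.

```python
# Correlator board
CLOCK_HZ = 330e6
CELLS_PER_FPGA = 192
N_FPGAS = 6
CORRELATOR_POWER_W = 100.0

# Assumes each cell completes one CMAC per clock cycle.
cmac_rate = CLOCK_HZ * CELLS_PER_FPGA * N_FPGAS       # about 380 G, quoted as 400 G
correlator_w_per_gcmac = CORRELATOR_POWER_W / 400.0   # uses the slide's rounded rate

# Filterbank board
SAMPLE_RATE = 3.2e9
OPS_PER_SAMPLE = 32
FILTERBANK_POWER_W = 60.0

filterbank_ops = SAMPLE_RATE * OPS_PER_SAMPLE         # about 102 Gops/s, quoted as 100
filterbank_w_per_gop = FILTERBANK_POWER_W / 100.0     # uses the slide's rounded rate

print(f"{cmac_rate / 1e9:.0f} GCMAC/s, {correlator_w_per_gcmac} W per GCMAC/s")
print(f"{filterbank_ops / 1e9:.1f} Gops/s, {filterbank_w_per_gop} W per Gop/s")
```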
31. Conclusion
- Common hardware, hardware modules and VHDL for SKAMP3 and LFD (prototyping for xNTD)
- SKAMP3 in the lead with filterbank and correlator hardware well on the way
- Initial manufacture this year
- Correlator common to all, using correlation cell to gain required flexibility
- Developing international project, distributed design team
- Concept and design: Sydney
- Hardware: Tasmania
- Correlator Verilog: MIT
- Filterbank VHDL: Sydney