Title: FPGA-based System-on-Chip Designs for Real-Time Applications in Particle Physics
1. FPGA-based System-on-Chip Designs for Real-Time Applications in Particle Physics
- Shebli Anvar, Olivier Gachelin, Pierre Kestener, Herve Le Provost, Irakli Mandjavidze
- DAPNIA, CEA Saclay, 91191 Gif-sur-Yvette, France
2. Overview
- Platform FPGAs
- Xilinx Virtex-II Pro devices
- Typical SoC architecture
- Example designs (ongoing projects)
  - Test bench for the ANTARES off-shore DAQ/SC board
  - Selective Read-out Processor (SRP) for the CMS ECAL
  - On-board Gamma Ray Burst DAQ/Trigger and alert system for the ECLAIRs microsatellite
- Concluding remarks on the use of the SoC approach
3. Platform FPGAs (Virtex-II Pro)
- Programmable logic cells
  - Combinatorial and synchronous
- Versatile I/Os
  - Single-ended (LVTTL, LVCMOS) and differential (LVDS)
- Hard IP cores
  - Clock management
  - Memory blocks
  - Serial transceivers (MGT)
  - Embedded processor(s)
- Plus various soft IP cores
  - Microcontrollers, network interfaces...
- Xilinx Virtex-II Pro 2VP30 (mid-range device)
  - 2 PowerPC 405 CPUs at 300 MHz
  - 8 RocketIO transceivers, up to 3.125 Gbit/s
  - 136 dual-port memory blocks of 18 kbit each (2.4 Mbit)
  - 644 configurable I/Os
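The block-RAM total quoted for the 2VP30 can be checked with a short calculation. This is only a sketch; it assumes that "Mbit" on the slide means 1024 kbit, and the variable names are illustrative.

```python
# Block RAM capacity of the 2VP30, from the figures on the slide.
BLOCKS = 136          # dual-port memory blocks
KBIT_PER_BLOCK = 18   # size of each block in kbit

total_kbit = BLOCKS * KBIT_PER_BLOCK   # 2448 kbit
total_mbit = total_kbit / 1024         # about 2.39, quoted as 2.4 Mbit
total_kbyte = total_kbit / 8           # 306 kbyte

print(f"{total_kbit} kbit = {total_mbit:.2f} Mbit = {total_kbyte:.0f} kbyte")
```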
4. SoC architecture on Virtex-II Pro
- IBM CoreConnect: standard on-chip bus-communication link
5. SoC example
[Block diagram labels: PowerPC (100 MHz), 32 kB memory, PLB, PLB/OPB bridge, OPB, 50 MHz (twice), RS232 console (19200 baud), user logic with slave interface (IPIF, IPIC), 32-bit R/W, reset and ID registers, data register, 256-byte memory; clock, reset, JTAG]
- Uses 12% of the BRAM and 7% of the logic cells of a mid-range 2VP30 device
- Plenty of resources left for much more sophisticated user cores
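As a rough plausibility check of the BRAM usage, the sketch below counts the blocks needed for the 32 kB memory of the example. It assumes that only this memory is mapped to block RAM and that each 18-kbit block provides 16 kbit of data (the remaining 2 kbit being parity); the actual design may allocate blocks differently.

```python
# How many 2VP30 block RAMs does a 32 kB memory need?
MEMORY_BYTES = 32 * 1024
DATA_BITS_PER_BLOCK = 16 * 1024   # assumption: 16 kbit data + 2 kbit parity per block
TOTAL_BLOCKS = 136                # block RAMs available on the 2VP30

blocks_needed = (MEMORY_BYTES * 8) // DATA_BITS_PER_BLOCK
usage_percent = 100 * blocks_needed / TOTAL_BLOCKS

print(f"{blocks_needed} blocks, {usage_percent:.1f}% of the device")
```

Under these assumptions the result is close to the roughly 12 percent of BRAM quoted on the slide.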
6. Prototyping and design
- A number of development kits with various Virtex-II Pro devices
  - 2VP7, 2VP30, 2VP50
  - 1 or 2 PowerPCs
  - 4 or 8 RocketIO transceivers
  - Pluggable optical modules
  - LVDS interfaces
  - RS232, Ethernet
  - 64 Mbyte external memory
- P160 extension module
  - RS232, Ethernet, Flash memory
- Soft IP cores
- Software libraries
[Figure: FF1152 development kit from Memec Inc.]
7. 1st SoC development example: Test bench for the ANTARES DAQ/SC board
- Production test bench for 350 Local Control Modules (LCM)
  - Electronics to be installed in the Mediterranean Sea, 2.5 km below the surface
- Fully automated, with test reports populating the quality control DB
- Several data and control interfaces with different I/O standards
- Test bench emulates the LCM environment
  - Stimulates inputs and analyzes responses
8. The test bench
- SoC-based tester board
  - Memec development kit with Xilinx 2VP30 FPGA
  - Supports hot-swappable DAQ/SC
  - Test duration: 15 minutes per LCM
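To give a feeling for the production campaign, the sketch below multiplies the test duration by the 350 modules to be produced. It assumes a single test bench, boards tested one after another, and no time for swapping boards or re-testing; these assumptions are not stated on the slides.

```python
# Total bench time for the production run, under the assumptions above.
MODULES = 350            # Local Control Modules to test
MINUTES_PER_MODULE = 15  # test duration per LCM

total_minutes = MODULES * MINUTES_PER_MODULE
total_hours = total_minutes / 60

print(f"{total_minutes} minutes, i.e. {total_hours} hours of pure test time")
```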
9. Test bench organization
- 3 interacting systems: control PC, LCM, SoC tester
- 200 MHz embedded PowerPC on the tester FPGA runs Linux
  - With NFS root file system on the control PC
  - Simple cross-compilation step to reuse and adapt the ANTARES DAQ software
  - Concentrate development efforts on the test functionalities
- Successions of tests initiated by the control PC
- Actions taken by C callback functions in the LCM tester
[Diagram labels: Test 1 ... Test n, each with its callbacks]
10. Firmware design of the SoC tester
- An IP core per test
  - C callback function addresses the IP core corresponding to the active test
- Simplified firmware development
  - Most IP cores are very simple: the test sequence is in software
  - Use of existing IP cores for Ethernet and RS232/RS485 interfaces
[Diagram labels: 66 MHz, Ethernet]
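The "one IP core per test, one callback per test" organization can be sketched as a dispatch table. The tester software described on the slides is written in C; the Python below only illustrates the pattern. The class `LoopbackCore`, the test names and the register behaviour are all hypothetical and are not taken from the ANTARES design.

```python
from typing import Callable, Dict, List


class LoopbackCore:
    """Hypothetical stand-in for one test IP core: the response register
    mirrors whatever was last written to the stimulus register."""

    def __init__(self, stuck_bits: int = 0) -> None:
        self.stuck_bits = stuck_bits   # bits forced to 1 to emulate a faulty board
        self.response = 0

    def write_stimulus(self, value: int) -> None:
        self.response = value | self.stuck_bits

    def read_response(self) -> int:
        return self.response


def loopback_callback(core: LoopbackCore) -> bool:
    """Callback for one test: drive a pattern and compare the response."""
    pattern = 0xA5
    core.write_stimulus(pattern)
    return core.read_response() == pattern


# One IP core and one callback per test, keyed by a (hypothetical) test name.
CORES: Dict[str, LoopbackCore] = {
    "serial_link": LoopbackCore(),
    "clock_input": LoopbackCore(stuck_bits=0x02),   # deliberately faulty
}
CALLBACKS: Dict[str, Callable[[LoopbackCore], bool]] = {
    "serial_link": loopback_callback,
    "clock_input": loopback_callback,
}


def run_sequence(requested: List[str]) -> Dict[str, bool]:
    """Run the tests requested by the control PC, in order."""
    return {name: CALLBACKS[name](CORES[name]) for name in requested}


report = run_sequence(["serial_link", "clock_input"])
print(report)
```

Keeping the test sequence in the callback and only simple stimulus/response registers in the core mirrors the division of work described above: the hardware stays small and the test logic can be changed by recompiling software.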
11. 2nd SoC development example: The Selective Read-out Processor (for CMS)
- Part of the CMS electromagnetic calorimeter read-out
  - Assists in on-line ECAL raw data reduction
- Asynchronous hard real-time system
[Diagram labels: trigger electronics, ECAL front-end electronics, L1 Accept, 100 kHz, raw data 1.5 Mbyte, read-out, Selective Read-out Processor with 5 µs timing budget, selected data 100 Kbyte, HLT, DAQ]
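The figures in the diagram imply the following data rates. This sketch assumes that 1.5 Mbyte and 100 Kbyte are per-event sizes and that 100 kHz is the L1 Accept rate; decimal units (1 Mbyte = 10^6 bytes) are also an assumption.

```python
# ECAL data reduction implied by the slide's figures (assumptions above).
L1_RATE_HZ = 100_000            # L1 Accept rate
RAW_EVENT_BYTES = 1_500_000     # raw ECAL data per event
SELECTED_EVENT_BYTES = 100_000  # data kept per event after selective read-out

raw_gbyte_per_s = L1_RATE_HZ * RAW_EVENT_BYTES // 10**9
selected_gbyte_per_s = L1_RATE_HZ * SELECTED_EVENT_BYTES // 10**9
reduction_factor = RAW_EVENT_BYTES // SELECTED_EVENT_BYTES

print(f"raw: {raw_gbyte_per_s} Gbyte/s, selected: {selected_gbyte_per_s} Gbyte/s, "
      f"reduction factor: {reduction_factor}")
```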
12. SRP Boards
- Single 6U VME crate
  - 12 conceptually identical VME64x-compliant boards
  - Up to 17 optical communication links at 1.6 Gbit/s each
- SRP tester: same hardware, modified firmware
[Board diagram labels: J0, VME buffers, power supply, Xilinx Virtex-II Pro xc2vp70-6-ff1704 (VME, serial links, algorithms, trigger IF), boundary-scan JTAG chain, FPROMs, memory, clock synthesizer, trigger interface, parallel optics, O/E, trigger, timing and control, aux. connector, throttling, TTS out, console, JTAG, Ethernet]
13. SRP Application IP core
- Seamless integration in an SoC based on Virtex-II Pro devices
  - Embedded processor accesses IP resources via a slave interface
- 80 MHz pipelined hardware logic to satisfy real-time requirements
- Standalone C software on the 100 MHz PowerPC for control and monitoring
[Diagram labels: VME, IPIF slave interface, arbiter, OPB, 50 MHz, Ethernet, RS232 console]
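The 80 MHz clock can be related to the 5 µs timing budget quoted for the SRP. The sketch below gives an upper bound only: it assumes the whole budget is available to the pipelined logic, whereas in practice part of it is presumably spent on link and serialization latency.

```python
# Clock cycles available within the SRP timing budget (upper bound).
CLOCK_HZ = 80_000_000   # pipelined hardware logic clock
BUDGET_US = 5           # SRP timing budget in microseconds

cycles_in_budget = CLOCK_HZ * BUDGET_US // 1_000_000

print(f"at most {cycles_in_budget} clock cycles per decision")
```

A budget of a few hundred cycles explains why the time-critical path sits in pipelined hardware while the 100 MHz PowerPC is used only for control and monitoring.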
14. SRP Prototyping
- Three development kits
  - 3 firmware designs
  - 3 standalone C applications
- Validate SRP latency and communication channels
- Advance in SRP firmware/software
[Diagram labels: trigger control system emulator (2VP7, also an SoC design); trigger signals over flat cables]
15. Summary
- Flexibility of SoC designs
  - Diversity of applications with substantially different requirements
- Comfortable development environment
  - Relatively short learning phase
  - Common kernel: large variety of IP cores and associated software
  - Well-defined interface with user logic
  - Tradeoff between hardware and software complexity
  - Running an OS on the embedded processor (VxWorks, Linux, Nucleus, RTEMS, ...)
- Ease of debugging and testing
  - Simulate individual modules
  - Debug the entire system running a test application on the embedded processor
- Performance of hardware and flexibility of software
16. 3rd example: On-board GRB Trigger and Alert System of the ECLAIRs microsatellite
- Gamma Ray Burst (GRB) study (4 to 50 keV)
- Compute in near real time the position of the GRB in the sky, with an accuracy of up to 10 arcmin
- Transmit this information to the ground in real time and distribute it as fast as possible to other observatories
- On-board 2-level trigger system
  - First level: counting histogram (hardware)
  - Second level: image processing to localize sources (software, FFT)
- SoC approach for hardware/software design of the DAQ/Trigger sub-system
- FPGA soft-core processor (MMU, FPU) with a real-time OS
  - MicroBlaze (Xilinx, 32-bit RISC)
  - LEON (open source, SPARC V8, ESA project)
17. ECLAIRs on-board trigger and data flow
[Data-flow diagram labels: CXG (16 ADC); EGCU; Config/Status/HK for CXG, SXC and UTS; TM/TC; position refine (request, answer); SXC (8 modules, overseer); all photons; AbsTime, SatPointing; DAQ; photon lists; bulk memory; freeze memory; Trigger; GRB alert; X-band; VHF; UTS]
18. Summary
- Flexibility of SoC designs
  - Diversity of applications with substantially different requirements
- Comfortable development environment
  - Relatively short learning phase
  - Common kernel: large variety of IP cores and associated software
  - Well-defined interface with user logic
  - Tradeoff between hardware and software complexity
  - Running an OS on the embedded processor
- Ease of debugging and testing
  - Simulate individual modules
  - Debug the entire system running a test application on the embedded processor
- Performance of hardware and flexibility of software
19. Xilinx EDK (Embedded Development Kit)
20. Virtex-II Pro Linux boot terminal
21. Deconvolution algorithm: computing speed study
- Decorrelate the data from the detector with the mask geometry
  - 2D decorrelation
- FFT (N^2 log(N)) versus direct decorrelation (N^4)
  - N = mask_size + detector_size = 120 + 80 = 200
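The gain of the FFT approach can be estimated from the two scaling laws for N = 200. The sketch below compares orders of magnitude only: constant factors are ignored, the logarithm is taken in base 2, and the padding of the image to a power-of-two size that an FFT would typically require is not modelled.

```python
import math

# Operation-count scaling for 2D decorrelation, N = mask_size + detector_size.
N = 120 + 80

direct_ops = N ** 4               # direct decorrelation: N^4
fft_ops = N ** 2 * math.log2(N)   # FFT-based: N^2 log(N), constants ignored
ratio = direct_ops / fft_ops

print(f"direct: {direct_ops:.2e}, FFT: {fft_ops:.2e}, ratio: about {ratio:.0f}")
```

With these assumptions the direct method costs a few thousand times more operations, which is why only the FFT route is considered for the on-board implementation.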
22. Hardware implementation of the FFT algorithm (FPGA via an IP core)
- Fixed-point computing: Xilinx IP core (datasheet DS260)
  - 1D FFT, 256 points, 24-bit data, Virtex-II at 200 MHz: 1 to 2 µs
  - Extrapolation to a 256x256 2D FFT: 500 to 1000 µs
  - Not easy to handle (truncation, numeric representation, etc.)
- Floating-point computing: IP core from Dillon or 4DSP
  - 1D FFT, 256 points, data 816 bits, Virtex-4 at 200 MHz: 4 µs
  - Extrapolation to Virtex-II at 100 MHz: 8 µs
  - Extrapolation to a 256x256 2D FFT: 2000 to 4000 µs
  - Expensive IP (19 to 26000 dollars)
  - Target-specific (VHDL sources unavailable)
- Write an FFT IP (floating point): several weeks
  - Design directly in VHDL
  - Co-design software, C-to-HDL (Handel-C)
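The 2D extrapolations are consistent with a row-column decomposition, in which a 256x256 2D FFT is computed as 256 row FFTs followed by 256 column FFTs. The sketch below assumes exactly that and ignores the time needed to transpose or move data between the two passes; the function name is illustrative.

```python
def fft2d_time_us(n: int, t_1d_us: float) -> float:
    """Time for an n x n 2D FFT built from 2*n one-dimensional FFTs."""
    return 2 * n * t_1d_us


# Fixed-point core: 1 to 2 µs per 256-point FFT.
fixed_point = (fft2d_time_us(256, 1), fft2d_time_us(256, 2))

# Floating-point core: 4 µs (Virtex-4 at 200 MHz) to 8 µs (Virtex-II at 100 MHz).
floating_point = (fft2d_time_us(256, 4), fft2d_time_us(256, 8))

print(f"fixed point: {fixed_point} µs, floating point: {floating_point} µs")
```

The results, 512 to 1024 µs and 2048 to 4096 µs, match the rounded figures of 500 to 1000 µs and 2000 to 4000 µs given on the slide.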
23. Software implementation of the FFT algorithm (C language)
- Advantage: easy development, test bench easy to develop
- Embedded processors: LEON (SPARC V8) or MicroBlaze (RISC)
- Compilation toolchain: GCC
- Library: FFTW
[Figure: FFTW benchmark, Pentium 4 2.2 GHz]
24. Software implementation of the FFT algorithm (C language), part 2
- Test of the FFTW3 library: ./bench irf 256x256
- Desktop machines (P4 2 GHz or Sparc 1.6 GHz): 3 ms (950 MFLOPS)
- Extrapolation for LEON on FPGA at 100 MHz: 50 ms to 100 ms (this assumes that the FPU can handle such a frequency)
- Problem with MicroBlaze: no compiler that can handle the FPU is available; Xilinx FPU performance is 33 MFLOPS (Virtex-4 at 200 MHz, usable with Virtex-II at 100 MHz)
- For comparison, Virtex-II Pro at 200 MHz (Linux, soft-emulated floating point): 1.94 seconds (1.4 MFLOPS)
- The total time of the source localization algorithm can be estimated at a few tenths of a second
- Memory: 12 x 256 kBytes = 3 Mbytes of RAM
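The timings above can be cross-checked from the MFLOPS figures. The sketch below assumes the operation count conventionally used by the FFTW benchmark for a real-data transform, 2.5 N log2(N) with N the total number of points, and assumes that "irf" denotes an in-place real forward transform. Both are assumptions about the benchmark, not statements from the slides.

```python
import math

# Nominal operation count of one 256x256 real FFT (benchmark convention, assumed).
n_points = 256 * 256
flops = 2.5 * n_points * math.log2(n_points)


def time_ms(mflops: float) -> float:
    """Execution time in milliseconds at a given sustained MFLOPS rate."""
    return flops / (mflops * 1e6) * 1e3


desktop_ms = time_ms(950)        # desktop machine
fpu_ms = time_ms(33)             # 33 MFLOPS hardware FPU
soft_float_s = time_ms(1.4) / 1e3  # soft-emulated floating point

ram_mbytes = 12 * 256 / 1024     # 12 buffers of 256 kBytes

print(f"{flops:.0f} flops: desktop {desktop_ms:.2f} ms, FPU {fpu_ms:.1f} ms, "
      f"soft float {soft_float_s:.2f} s, RAM {ram_mbytes:.0f} Mbytes")
```

Under these assumptions the desktop time comes out just under 3 ms, the 33 MFLOPS FPU lands inside the 50 to 100 ms range, and the soft-float time is close to the measured 1.94 s, so the slide's figures are mutually consistent.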