A parallel in-time analysis system for Virgo
Leone B. Bosi (1), on behalf of the VIRGO collaboration
(1) INFN Perugia (Italy); Dip. di Fisica, Univ. di Perugia (Italy)
The Problem
VIRGO is a detector built to discover gravitational waves coming from various sources. Coalescing binary neutron stars are among the best candidates. To detect the signal they emit, we plan to use the matched filter, a powerful but computationally demanding algorithm that compares the acquired data with a bank of reference signals called templates. One of the main goals for VIRGO is a reliable real-time observation strategy, so that the interferometer can be used as a gravitational-wave observatory. To reach this result, we carefully designed the computational strategy, taking into account the size of the problem, the computational power required and available, and the constraints imposed by the in-time requirement. Finally, we developed Merlino, a high-performance distributed signal analyzer.
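As a minimal sketch of the matched-filter idea, the Python code below correlates a data stretch with every template of a small bank and keeps the best (template, lag) pair. It assumes the data are already whitened, so that a plain correlation normalized by the template norm is the right statistic; it works in the time domain for clarity and uses toy tone/chirp templates rather than real inspiral waveforms. The functions `correlate` and `matched_filter_bank` are hypothetical names, not part of Merlino.

```python
import math


def correlate(data, template):
    """Sliding dot product of a template against the data (valid lags only)."""
    m = len(template)
    return [
        sum(data[t + i] * template[i] for i in range(m))
        for t in range(len(data) - m + 1)
    ]


def matched_filter_bank(data, bank):
    """Correlate (assumed whitened) data with every template in the bank.

    Returns (template_index, lag, score) for the best match; the score is the
    absolute correlation divided by the template norm.
    """
    best = (None, None, -math.inf)
    for idx, template in enumerate(bank):
        norm = math.sqrt(sum(v * v for v in template))  # unit-normalize template
        for lag, c in enumerate(correlate(data, template)):
            score = abs(c) / norm
            if score > best[2]:
                best = (idx, lag, score)
    return best


if __name__ == "__main__":
    # Two toy "templates": a pure tone and a chirp whose frequency increases.
    tone = [math.cos(0.3 * i) for i in range(50)]
    chirp = [math.cos(0.3 * i + 0.02 * i * i) for i in range(50)]
    data = [0.0] * 200
    for i, v in enumerate(chirp):  # inject the chirp starting at sample 30
        data[30 + i] += v
    print(matched_filter_bank(data, [tone, chirp]))
```

By the Cauchy-Schwarz inequality the score can only reach the template norm when the data segment is exactly proportional to the template, which is why the injected chirp is recovered at its true position.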
Merlino Performance
Merlino is installed on the online cluster at Cascina (PI), where VIRGO is located. This cluster consists of 64 Opteron 2200 CPUs. We ran performance tests on the various stages of the Merlino analysis. Start-up operations, such as template generation and the learning phase of the double whitening algorithm, introduce a time offset into the in-time analysis that is important to evaluate. Generating a grid of 10000 templates, together with the related template information (such as the band intervals used by the chi2 test), takes 320 s; this is the delay we must take into account whenever the template grid has to be regenerated because the sensitivity has changed. The double whitening learning time is the time the double whitening algorithm spends estimating the noise PSD. Usually 3000 parameters are needed, which takes 200-300 s.
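The double whitening procedure itself is the one referenced to E. Cuoco et al.; the sketch below only illustrates what a "learning" phase of this kind can look like, under loudly stated assumptions. It assumes the learned parameters are the coefficients of an autoregressive (AR) noise model fitted with the Levinson-Durbin recursion (so 3000 parameters would mean an AR model of order 3000; the toy run uses order 4), and it reads "double" whitening as filtering once forward and once with the time-reversed filter, so that in the frequency domain the data are weighted by |A(f)|^2, which is proportional to 1/PSD. All function names are hypothetical.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Fit an AR model of the given order from the autocorrelation r.

    Returns the prediction-error filter a (with a[0] = 1) and the residual
    (innovation) variance.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def whiten(x, a):
    """Apply the prediction-error filter e[n] = sum_j a[j] x[n-j] (zero history)."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(min(n, p) + 1)) for n in range(len(x))]


def double_whiten(x, a):
    """Filter forward, then with the time-reversed filter (zero-phase |A|^2)."""
    once = whiten(x, a)
    return whiten(once[::-1], a)[::-1]


if __name__ == "__main__":
    random.seed(0)
    # Toy colored noise: AR(1) process x[n] = 0.9 x[n-1] + w[n]
    x = [0.0]
    for _ in range(20000):
        x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))
    a, sigma2 = levinson_durbin(autocorrelation(x, 4), 4)
    print(a, sigma2)  # a[1] close to -0.9, higher orders close to 0
```

The cost of the learning phase grows with the model order, which is consistent with a fit of thousands of parameters taking minutes rather than seconds.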
Parallelization Schema
To detect gravitational waves coming from coalescing binaries, the matched filter essentially compares a time slice of data with a set of reference signals F, using the correlation algorithm. The number of filters can be as high as 20000. The figure shows an example of a serial analysis and how it is optimized. The problem is divided into sub-problems by partitioning the template space and distributing the load over different CPUs. Communication is based on the MPI protocol, and Merlino performs the analysis as a single distributed program. Running the analysis with 6677 templates, a starting frequency of 60 Hz, a sampling frequency of 4 kHz, a minimal match of 98% and a mass range of 0.9-10 solar masses requires 90 GB of memory, allocated across several processes. The network traffic depends on the trigger rate.
The relevant parameter describing the performance of the in-time analysis is the in-time factor: the ratio between the duration of a stretch of data and the time needed to process it. With 34 Dual Opteron 2200 CPUs, applying the matched filter and the chi2 test (15 bands) [2], we process the data 3 times faster than they are acquired.
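The chi2 test of B. Allen (reference 2) splits the template into p frequency bands that each carry an equal share of the expected signal-to-noise ratio, then checks whether the filter output is spread across the bands as a true signal would be: chi2 = p * sum_l (z_l - z/p)^2, where z_l is the filter output restricted to band l and z their sum. The sketch below is a minimal single-phase (real part only) version, assuming frequency-domain data `s`, template `h` and one-sided noise PSD `psd` sampled on the same bins, with p = 15 as in the run quoted above. The function names are hypothetical, and discrete bins only allow approximately equal band powers.

```python
import math


def band_edges(h, psd, p):
    """Split frequency bins into (at most) p contiguous bands, each carrying
    roughly 1/p of the template power sum_k |h_k|^2 / S_k."""
    weights = [abs(hk) ** 2 / sk for hk, sk in zip(h, psd)]
    total = sum(weights)
    edges, acc = [0], 0.0
    for k, w in enumerate(weights):
        acc += w
        # close the current band once the cumulative power reaches l/p of total
        if len(edges) < p and acc >= total * len(edges) / p:
            edges.append(k + 1)
    if edges[-1] != len(weights):
        edges.append(len(weights))
    return edges


def chi2_discriminator(s, h, psd, p=15):
    """Allen-style chi-squared test (single-phase form).

    Returns (chi2, z), where z is the matched-filter output normalized by the
    template norm sigma.
    """
    edges = band_edges(h, psd, p)
    sigma = math.sqrt(sum(abs(hk) ** 2 / sk for hk, sk in zip(h, psd)))
    # Contribution of each band to the normalized filter output
    z_bands = [
        sum((s[k] * h[k].conjugate() / psd[k]).real for k in range(lo, hi)) / sigma
        for lo, hi in zip(edges, edges[1:])
    ]
    q = len(z_bands)  # number of bands actually formed
    z = sum(z_bands)
    chi2 = q * sum((zl - z / q) ** 2 for zl in z_bands)
    return chi2, z
```

For data that are an exact scaled copy of the template every band contributes z/p and chi2 vanishes; a glitch that puts all its power in one band gives a large chi2 even when z itself is large, which is what makes the test useful as a veto.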
Parallel communication is optimized using broadcasts. The serial part of the computation is parallelized using a pipeline scheme with asynchronous communication. The results are posted independently to a single process that reconstructs and clusters the events. Merlino has been generalized to optimize data handling, and it works independently of the type of algorithm plugged into it at run time.
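The data flow just described can be illustrated with a small Python sketch in which threads and queues stand in for MPI processes and messages; this substitution is an assumption made only to keep the example self-contained. The loader sends every data chunk to all workers, emulating a broadcast; each worker holds a contiguous slice of the template bank, mirroring the partition of the template space; and a single collector receives the results posted asynchronously, playing the role of the process that reconstructs and clusters events. `run_pipeline` and its arguments are hypothetical.

```python
import queue
import threading


def run_pipeline(chunks, bank, n_workers, process):
    """Toy loader -> workers -> collector pipeline.

    Every data chunk is "broadcast" to all workers; worker w owns a contiguous
    slice of the template bank; one collector gathers all posted results.
    """
    per = -(-len(bank) // n_workers)  # ceiling division: templates per worker
    slices = [bank[w * per:(w + 1) * per] for w in range(n_workers)]
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    collected = []

    def worker(w):
        while True:
            item = inboxes[w].get()
            if item is None:  # end of the data stream
                results.put(None)
                return
            chunk_id, chunk = item
            for j, template in enumerate(slices[w]):
                results.put((chunk_id, w * per + j, process(chunk, template)))

    def collector():
        finished = 0
        while finished < n_workers:  # stop once every worker has signed off
            item = results.get()
            if item is None:
                finished += 1
            else:
                collected.append(item)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    threads.append(threading.Thread(target=collector))
    for t in threads:
        t.start()
    for chunk_id, chunk in enumerate(chunks):  # loader: broadcast each chunk
        for box in inboxes:
            box.put((chunk_id, chunk))
    for box in inboxes:
        box.put(None)
    for t in threads:
        t.join()
    return sorted(collected)


if __name__ == "__main__":
    # "Templates" are plain numbers and "processing" is a product, for illustration.
    print(run_pipeline([10, 20], [1, 2, 3, 4, 5], 2, lambda c, t: c * t))
```

Because of Python's global interpreter lock these threads would not speed up CPU-bound filtering; the sketch shows only the communication pattern, which Merlino carries over MPI between separate processes.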
Convolution using the overlap-add scheme
Some of the features studied and introduced to optimize the analysis are:
- The filters/templates are
  - generated,
  - FFT-transformed,
  - stored in memory.
- The data to process are double whitened [1] in the time domain, once and for all.
- The correlation/matched filter is implemented using the overlap-add method.
- Conceptually, the matched filter algorithm has been decomposed into several parts, each associated with a different process:
  - filters_gen.exe generates the filters/templates and transfers them to the memory of filters_exec.exe via MPI. Many instances of this process run to parallelize the generation.
  - Loader: the data conditioning (the so-called double whitening) and the FFT of the input data are common to all templates, so only one instance is needed.
  - filters_exec.exe: the data from the loader are sent here. The templates reside in the memory of these processes, where the matched filter and chi2 test algorithms are completed. The algorithms can be changed.
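Overlap-add lets a long data stream be filtered in fixed-size blocks: each block is convolved with the filter, and the overlapping tails of consecutive block results are summed into the output. The sketch below assumes real-valued samples and, for brevity, convolves each block directly rather than by FFT; in a setting like the one above, where templates are stored already FFT-transformed, the per-block product would be done in the frequency domain. Since correlation with a template equals convolution with the time-reversed template, the same routine yields the matched-filter output. The function names are hypothetical.

```python
def convolve(x, h):
    """Direct linear convolution; output length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, block):
    """Convolve a long stream x with filter h, block by block.

    Each block of `block` samples is convolved with h; the len(h) - 1 tail
    samples of each block result overlap the next block and are added in.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        # In practice this per-block convolution would be done with FFTs
        part = convolve(x[start:start + block], h)
        for k, v in enumerate(part):
            y[start + k] += v
    return y


def correlate_stream(x, template, block):
    """Matched-filter correlation as convolution with the reversed template.

    Element t + len(template) - 1 of the result equals
    sum_i x[t + i] * template[i].
    """
    return overlap_add(x, template[::-1], block)
```

The block size trades latency against efficiency: smaller blocks deliver results sooner, while larger blocks amortize the overlap of len(h) - 1 samples per block.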
On Line Schema
The goal of the Merlino project is to build a parallel facility designed to optimize and reduce all possible overhead due to communication, data handling and processing. The aim was to carry out one large analysis while making the best use of the available computing resources.
- The computational framework uses two additional processes to link to the online analysis:
  - FdMPIserver: a communication wrapper between the internal VIRGO communication protocol (Cm) and MPI. It is used to get data.
  - MSGserver: gets results and log information from Merlino and communicates with the Trigger Manager. It is used to post results and logs.
During the VIRGO commissioning run C5, Merlino ran inside the on-line chain, the first code ever to perform a massive analysis in-time. It ran continuously for 7 days. This was a technical test, and no malfunction was observed. It handled 10480 templates with an in-time factor of 1.8, meaning that in this configuration Merlino online processed data 1.8 times faster than the acquisition rate.
[1] E. Cuoco et al., Class. Quant. Grav. 21, S801-S806 (2004).
[2] B. Allen, A chi-squared time-frequency discriminator for gravitational wave detection, gr-qc/0405045.