Title: Efficient algorithms of multidimensional γ-ray spectra compression

Slide 1: Efficient algorithms of multidimensional γ-ray spectra compression
- V. Matoušek and M. Morháč
- Institute of Physics, Slovak Academy of Sciences, Bratislava, Slovakia
- Vladislav.Matousek_at_savba.sk, Miroslav.Morhac_at_savba.sk
- ACAT 2005, Zeuthen, May 22-27, 2005
Slide 2: The measurements of data in nuclear physics experiments are oriented towards gathering a large amount of multidimensional data.
- The data are collected in the form of events.
- In a typical experiment with spectrometers (Gammasphere, Euroball), each coincidence event consists of a set of n integers (e1, e2, ..., en), which are proportional to the energies of the coincident γ-rays.
- Such a coincidence specifies a point in an n-dimensional hypercube.
- Storing multidimensional data very frequently goes beyond the volume of the available storage media.
Slide 3: Multiparameter nuclear data taken from experiments are typically stored
- directly as events, indexing the coincidences - list mode storage,
- analyzed and stored as multidimensional histograms (hypercubes) - nuclear spectra.
- List-mode storage has several disadvantages:
  - an enormous amount of information that has to be written onto storage media (primarily tapes),
  - a long time needed to process the data.
Slide 4: Multidimensional histograms - nuclear spectra
- Advantages
  - Possibility of interactive handling of the data.
  - Slices of lower dimensionality can easily be created.
- Disadvantages
  - The multidimensional amplitude analysis must be done.
  - Storage requirements for multidimensional hypercubes are enormous - e.g. a 3-D γ-γ-γ coincidence nuclear spectrum with a resolution of 14 bits (16 384 channels) per axis and 2 bytes per channel requires 8 TB of memory.
  - Data often need to be stored in RAM for interactive handling.
- The multidimensional nuclear spectra therefore need to be compressed to the size of the available memory.
Slide 5: Suitable data compression techniques must satisfy these requirements
- Less storage space after compression of the multidimensional nuclear spectra.
- Preservation of as much information as possible - minimum data distortion.
- Fast enough to be suitable for on-line compression during the experiment.
- Constraints
  - The size of the original multidimensional spectrum goes beyond the capacity of the available memory.
  - Data from nuclear experiments are received as a train of events - they need to be analyzed and compressed separately, event by event.
  - Thus, the multidimensional amplitude analysis must be performed together with compression, event by event, in on-line acquisition mode.
Slide 6: Suitable methods widely used
- Binning - neighboring channels are summed together - loss of information.
- Employing natural properties of the data - e.g. symmetry removal from the multidimensional γ-ray spectra from Gammasphere - no loss of information.
- Use of fast orthogonal transformation algorithms.
- Storing the descriptors of events with counts of occurrences.
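The binning operation mentioned above can be sketched in a few lines; the 2 x 2 example below is a minimal illustration (the block size and the pure-Python layout are choices made here, not taken from the slides).

```python
def bin_spectrum_2d(spectrum, factor):
    """Sum neighboring channels of a 2-D spectrum in blocks of factor x factor.

    Reduces storage by factor**2 at the cost of resolution (lossy).
    """
    rows = len(spectrum)
    cols = len(spectrum[0])
    binned = [[0] * (cols // factor) for _ in range(rows // factor)]
    for i in range(rows):
        for j in range(cols):
            # each original channel contributes to exactly one binned channel
            binned[i // factor][j // factor] += spectrum[i][j]
    return binned
```

Total counts are preserved; only the position information within each block is lost.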
Slide 7: Symmetry removal
- For instance, in multidimensional γ-ray spectra from Gammasphere one can utilize the symmetry of the data. It holds
  - for 2-dimensional spectra: E(γ1, γ2) = E(γ2, γ1),
  - for 3-dimensional spectra: E(γ1, γ2, γ3) is invariant under any permutation of γ1, γ2, γ3.
Slide 8: Principle of storage of 2-dimensional symmetrical data
- Two-dimensional symmetrical spectra with resolution R = 4.
- The size of the reduced space can be simply expressed as R(R + 1)/2.
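The reduced storage of a symmetric 2-D spectrum can be indexed directly. A small sketch (function names are ours) of the triangular packing that keeps only channels with x >= y:

```python
def triangle_size(R):
    # number of stored channels for a symmetric 2-D spectrum of resolution R
    return R * (R + 1) // 2

def triangle_index(x, y):
    # linear index of channel (x, y) in the reduced (triangular) storage;
    # the symmetry E(x, y) = E(y, x) lets us keep only the half with x >= y
    if x < y:
        x, y = y, x
    return x * (x + 1) // 2 + y
```

For R = 4 this packs the 10 distinct channels of the 16-channel square into indices 0..9.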
Slide 9: By composition of triangles of the sizes R, R-1, ..., 2, 1 we get the geometrical shape called a tetrahedron.
- An example of the storage of 3-dimensional symmetrical data in the form of a tetrahedron.
- The size (volume) of the reduced space of the tetrahedron is R(R + 1)(R + 2)/6.
Slide 10: In the case of 4-dimensional data, by composition of tetrahedrons we obtain a hyperhedron of the 4-th order, here for R = 4.
- The volume of the hyperhedron of the 4-th order can be expressed as R(R + 1)(R + 2)(R + 3)/24.
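The triangle, tetrahedron and 4-th order hyperhedron volumes are all instances of one combinatorial formula: R(R+1)...(R+n-1)/n! is the number of multisets of n channels out of R. The sketch below (names are ours) checks that the resulting compression ratio approaches n! for large R:

```python
from math import comb

def reduced_volume(R, n):
    # volume of the order-n hyperhedron: the number of multisets of
    # n channels out of R, i.e. C(R + n - 1, n) = R(R+1)...(R+n-1)/n!
    return comb(R + n - 1, n)
```

For large R the ratio R**n / reduced_volume(R, n) tends to n!, which matches the compression ratios quoted on the next slide for n = 2 and n = 3.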
Slide 11: The achievable compression ratios and storage requirements for typical spectra (14-bit ADCs and 2 bytes per channel)

  Dimensionality of spectra | Compression ratio CR | Storage requirements (MB)
  --------------------------|----------------------|--------------------------
  2                         | 2                    | 256
  3                         | 6                    | 1.25 x 10^6
  n                         | n!                   |

- Radware package - the author combines utilizing the property of symmetry with binning. Three-fold coincidences are stored in the form of cubes with the sizes 8 x 8 x 8. Inside each cube the data are binned so that they span entirely the resolution of 8192 channels in each dimension.
Slide 12: Compression methods using orthogonal transformations
- The multidimensional array (hypercube) is transformed into a new data array in the transform domain, where the maximum amount of information is concentrated into a smaller number of elements.
- The basic premise is that the transform of a signal has an energy distribution more amenable to retaining the shape of the data than the spatial-domain representation.
- Because of the inherent element-to-element correlation, the energy of the signal in the transform domain tends to be clustered in a relatively small number of transform coefficients.
Slide 13: The advantages of using fast orthogonal transforms
- Existing fast algorithms allow their on-line implementation.
- Linearity of the transforms: the signal being compressed need not be stored statically in memory. Each event can be transformed separately in time, and the predetermined transform coefficients are summed (analysis with on-line compression).
Slide 14: Fixed-kernel orthogonal transforms usually employed in data compression
- Discrete Cosine, Walsh-Hadamard, Fourier, Hartley and other transforms.
- Haar transform - the first and simplest scaling function of the mother wavelet suitable for generating an orthonormal wavelet basis.
- The use of classical orthogonal transforms is very efficient provided that the form of the compressed data resembles the form of the transform base functions.
- The efficiency of the compression strongly depends on the nature of the experimental data.
- The Fourier transform and the DCT are well suited to compress cosine- and sine-shaped data, whereas the Walsh-Hadamard transform is suitable to compress rectangular shapes in the input data.
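As an illustration of this match between base functions and data shapes, a minimal fast Walsh-Hadamard transform (a standard textbook butterfly algorithm, not code from the slides) concentrates a rectangular pulse into very few coefficients:

```python
def fwht(data):
    """In-place style fast Walsh-Hadamard transform (unnormalized).

    The length of data must be a power of 2; applying fwht twice
    returns n times the original sequence.
    """
    a = list(data)
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                # butterfly: sum and difference of paired elements
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a
```

A rectangular pulse of length 8 transforms into only two nonzero coefficients, so nearly all of the signal energy survives even aggressive coefficient discarding.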
Slide 15: There arose an idea to modify the shape of the base functions of the orthogonal transform so that the maximum possible compression of the multidimensional spectra can be achieved
- We have proposed a fast orthogonal transform with a transform kernel adaptable to the reference vectors representing the processed data.
- The structure of the signal flow graph is of the Cooley-Tukey type.
- The principle of the method consists in the direct modification of the multiplicative coefficients a, b, c, d of the signal flow graph in such a way that the base functions approximate the shape of the reference vector.
Slide 16: Let us illustrate the method for the case of a transform of size N = 4.
- Signal flow graph of the fast adaptive orthogonal transform.
Slide 17: Basic element of the signal flow graph.
- The coefficients of the basic element of the signal flow graph are calculated as
  a = x0/r, b = x1/r, c = -x1/r, d = x0/r, where r = sqrt(x0^2 + x1^2),
- where x0, x1 are values of the reference vector.
- The values y0, y1 at the output are
  y0 = a·x0 + b·x1 = r, y1 = c·x0 + d·x1 = 0.
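The basic element can be sketched as a plane rotation determined by the reference pair (x0, x1); the function names are ours, and the rotation form is inferred from the requirement that the reference vector be mapped onto a single output point.

```python
from math import hypot

def adaptive_butterfly(x0, x1):
    # rotation coefficients chosen from the reference pair (x0, x1) so
    # that the pair itself maps to (r, 0), r = sqrt(x0**2 + x1**2)
    r = hypot(x0, x1)
    a, b = x0 / r, x1 / r
    c, d = -x1 / r, x0 / r
    return a, b, c, d

def apply_butterfly(coeffs, u0, u1):
    # apply the 2 x 2 basic element to an arbitrary input pair (u0, u1)
    a, b, c, d = coeffs
    return a * u0 + b * u1, c * u0 + d * u1
```

Because the element is orthogonal, arbitrary inputs keep their energy, while the reference shape is concentrated entirely into the first output.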
Slide 18: The transform coefficients are calculated in such a way that for the reference vector at the input they transform it into one point at the output.
- We have proposed a fast algorithm of on-line multidimensional amplitude analysis with compression using the adaptive Walsh transform:
  - it removes the necessity to store the whole spectrum before compression; compression is performed event by event,
  - it is optimized so that only a minimum number of operations is needed.
- The above-mentioned principle of adaptability can also be applied to other transform structures.
- The compression is achieved by discarding pre-selected elements in the transformed multidimensional array.
- Two basic methods for element selection:
  - zonal sampling,
  - threshold sampling.
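Threshold sampling can be sketched as keeping only the coefficients whose magnitude exceeds a threshold, stored as (index, value) pairs (a minimal illustration; the names and the pair representation are ours):

```python
def threshold_sample(coeffs, threshold):
    # keep only transform coefficients with magnitude above the threshold,
    # stored as (index, value) pairs; the rest are discarded
    return [(i, c) for i, c in enumerate(coeffs) if abs(c) > threshold]

def reconstruct(kept, n):
    # rebuild the length-n coefficient array, discarded entries become zero
    out = [0.0] * n
    for i, c in kept:
        out[i] = c
    return out
```

Zonal sampling differs only in that the kept indices are fixed in advance (a predefined zone) instead of being chosen per block by magnitude.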
Slide 19: Block data compression using orthogonal transforms with symmetry removal
- In the case of a 3-dimensional space, it is divided into cubes. Each cube of the size S x S x S will be compressed to a cube of the size C x C x C.
- We assume
  - the sizes of the cubes are equal in all dimensions,
  - the number of cubes in each dimension and their sizes S, C are powers of 2.
Slide 20: The number of cubes in the tetrahedron is (R/S)(R/S + 1)(R/S + 2)/6,
- where R is the number of channels (e.g. the resolution of the ADC) and S is the size of a cube before the compression.
- For each cube we have to define an adaptive transform and consequently we need to store its coefficients.
- The number of transform coefficients needed for one dimension is determined by the transform size S.
Slide 21: The elements are stored in the float format (4 bytes). The transform coefficients must be stored for each dimension; the storage needed for 3-dimensional compressed data, in bytes, follows from the chosen sizes S and C.
- Then, in general, the size of the memory needed for D-dimensional data follows analogously.
- We have to adhere to the following rules:
  - the size of the cube of original data, S, should be as small as possible,
  - the size of the cube after compression, C, should be as big as possible (C ≤ S), i.e., we desire the smallest possible compression,
  - the data volume for the chosen combination C, S must fit the size of the available memory.
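A small sizing helper, assuming the block counts follow the hyperhedron formula from the earlier slides; it counts only the C**D compressed elements per block, so the per-block transform-coefficient storage comes on top of this figure:

```python
from math import comb

def compressed_volume_bytes(R, S, C, D, bytes_per_element=4):
    """Data volume for block-transform compression with symmetry removal.

    Counts only the compressed C**D elements per block (4-byte floats);
    the transform-coefficient storage per block is not included here.
    """
    blocks_per_axis = R // S
    # number of blocks in the D-dimensional hyperhedron (symmetry removed)
    n_blocks = comb(blocks_per_axis + D - 1, D)
    return n_blocks * (C ** D) * bytes_per_element
```

For R = 16384, S = 256, C = 8, D = 3 this gives roughly 89 MB of compressed elements; the difference to the 366 MB quoted on the next slide would be accounted for by the stored transform coefficients.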
Slide 22: The following sizes of cubes were chosen for block transform compression of multidimensional γ-ray spectra of 16 384 channels per axis and 4 bytes for each channel.

  Dim. of spectra | S (channels) | C (channels) | Storage (MB) | Compression ratio CR
  ----------------|--------------|--------------|--------------|---------------------
  3               | 256          | 8            | 366          | 8010
  4               | 1024         | 8            | 189.5        | 63.3 x 10^6
  5               | 2048         | 8            | 168.4        | 2.33 x 10^11

- We have compressed histograms for 3-, 4-, 5-fold γ-ray coincidences of the event data from Gammasphere.
Slide 23: Examples achieved by employing compression on 3-fold γ-ray spectra with symmetry removal
- Slice from the original data (thin line) and decompressed slice (thick line) from data compressed by employing the binning operation (Radware).
Slide 24: Slice from the original data (thin line) and decompressed slice (thick line) from data compressed by employing the adaptive Walsh transform.
Slide 25: Two-dimensional slice from the original data.
Slide 26: Two-dimensional slice from data compressed by employing the binning operation (Radware).
Slide 27: Two-dimensional decompressed slice from data compressed via the adaptive Walsh transform.
Slide 28: Three-dimensional original spectrum (sizes of spheres are proportional to the counts the channels contain).
Slide 29: Three-dimensional spectrum decompressed from data compressed via the adaptive Walsh transform. Due to the smoothing effect of the adaptive transform some information is lost.
Slide 30: Similar experiments were done with 4-fold coincidence γ-ray spectra.
- One-dimensional slice from the original 4-dimensional spectrum (thin line) and the same slice decompressed from data compressed via the adaptive Walsh transform (thick line). Due to the enormous compression ratio the distortion of the data in some regions is considerable. On the other hand, in some regions the fidelity of the method is satisfactory.
Slide 31: Two-dimensional slice from the original 4-dimensional spectrum.
Slide 32: Two-dimensional slice decompressed from 4-dimensional data compressed via the adaptive Walsh transform.
Slide 33: Compression of multidimensional γ-ray coincidence spectra using a list of descriptors.
- The input data describing an external event can be expressed using a descriptor. Each descriptor fully describes the event.
- This method is based on maintaining a list of descriptors.
- The number of different descriptors which actually occurred during an experiment is much smaller than the number of all possible descriptors.
- So, the multidimensional space has empty regions.
- Conventional analyzer - the descriptor defines the location in the memory at which the count (number of occurrences of the descriptor) is stored. The range of descriptors is defined by the size of the memory.
Slide 34: An alternative technique - store only those descriptors that actually occurred in the experiment.
- The correspondence between the location and the descriptor is lost; it is necessary to store the descriptor as well as the associated count.
- When a new event comes, it must be sorted into its channel in the list by using its descriptor.
- The problem is to devise a procedure for assigning the descriptor location number so that the time needed to store or read out a descriptor is minimized.
- There exist several retrieval algorithms.
Slide 35: Sequential method - An obvious routine for searching the list in memory is to compare the descriptor of a new event with the descriptor in each location, starting at the first one. When a match is found, the associated count is increased by one. Such an algorithm is time consuming and cannot be accepted for on-line applications.
- Sequential retrieval of events.
Slide 36: Tree method - A considerable reduction of access time can be achieved by using a tree search algorithm. The descriptor of a new event is compared repeatedly with descriptors arranged in a tree. The main disadvantage of this technique is its complexity and the amount of redundant information taken up by address pointers.
- Tree search algorithm of event retrieval.
Slide 37: Partitioning and indexing method - It is a combination of the two previous methods and is implemented e.g. in the database Blue for high-fold γ-ray coincidence data.
- The hypercube is partitioned into high- and low-density regions. Each node of the tree represents a subvolume of the n-dimensional hypercube. The left and right child nodes represent the bisected volume of the parent. Associated with each leaf node is a sublist of descriptors falling into the appropriate geometric volume. They are arranged according to the sequential retrieval algorithm.
- Cromaz M. et al., Blue: a database for high-fold γ-ray coincidence data, NIM A 462 (2001) 519.
Slide 38: Pseudo-random transformation of the addresses of the locations of descriptors. Requirements:
- Uniform (or quasi-uniform) distribution of descriptors over memory addresses for any shape of the multidimensional spectra.
- Clusters of descriptors in the physical field (hypercube) must be spread over the whole range of possible addresses, and adjacent descriptors must go to addresses far away from each other.
- The transformation must be fast, so that it can be applied on-line in high-rate experiments.
- There is an unlimited number of methods of generating pseudorandom numbers:
  - residues of modulo operations, the Hamming code technique, transformation through the division of polynomials, etc.
Slide 39: One of the methods satisfying the above stated criteria and giving a pseudorandom distribution is based on the assignment of the inverse number (in the sense of modulo arithmetic) to each address in the original space,
  x' = x^(-1) mod M,
- where M is a prime.
- This operation can be carried out through the use of a look-up table of pre-computed inverse numbers.
- Through the transformation each descriptor uniquely derives its storage address.
- There is a possibility of more descriptors being transformed to the same address. To overcome this serious limitation, the transformation is used only to generate an address at which to start searching in the bucket of descriptors.
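The look-up table of modulo inverse numbers can be pre-computed, for instance with Fermat's little theorem (a standard construction; the slides do not specify how the table is built):

```python
def inverse_table(M):
    """Look-up table of modular inverses for the ring <1, M-1>, M prime.

    inv[x] * x == 1 (mod M); for prime M, Fermat's little theorem gives
    the inverse as x**(M-2) mod M.
    """
    return {x: pow(x, M - 2, M) for x in range(1, M)}
```

Adjacent inputs map to widely scattered outputs (e.g. for M = 601, the inverses of 1 and 2 are 1 and 301), which is exactly the scattering behavior shown on the later distance-spectrum slide.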
Slide 40: A list of successive locations, up to the depth of searching d, is checked:
- If the descriptor in a location coincides with the read-out descriptor, the count in this location is incremented.
- If no descriptor coincides with the read-out descriptor and there is an empty location within the search depth, the descriptor is written to this location and its count is set to 1.
- If there is no empty location within the search depth and no descriptor coincides with the read-out descriptor, additional processing is done.
- During the experiment, the events with higher counts (statistics) occur earlier and therefore there is a higher probability that free positions will be occupied by statistically relevant events.
- One can utilize additional information and store the events with the highest weights, i.e., the highest probability of occurrence.
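The bounded-depth search can be sketched as linear probing over a descriptor table (the slot layout and names are ours); the False return marks the case where the additional processing of slide 41 takes over:

```python
def store_event(table, start, d, descriptor):
    """Probe up to d successive locations from the start address.

    Each slot is either None (empty) or a [descriptor, count] pair.
    Returns True if the event was counted or inserted; False if all d
    locations are occupied by other descriptors, in which case additional
    processing (e.g. probability-based replacement) is needed.
    """
    n = len(table)
    for k in range(d):
        i = (start + k) % n
        if table[i] is None:
            # empty slot within search depth: insert with count 1
            table[i] = [descriptor, 1]
            return True
        if table[i][0] == descriptor:
            # descriptor already present: increment its count
            table[i][1] += 1
            return True
    return False
```

The eviction step of slide 41 would then scan the same d slots once more, comparing occurrence probabilities (for example products of marginal-spectrum values) before replacing the weakest occupant.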
Slide 41: Provided that all locations for the depth d are occupied and the descriptor did not occur in this region, we scan the region once more and find the event j with the smallest probability of occurrence, pj.
- Then we compare the probability pk of the processed event with pj. If pk > pj we replace the descriptor in position j with the descriptor of the processed event and we set the count of the event to 1. Otherwise, the processed event is ignored.
- How to determine the probabilities of the occurrences of events?
- Several approaches are possible in practice.
- One of them is to utilize the marginal (projection) spectra for each dimension. Then, for an n-dimensional event with the event values (e1, ..., en),
Slide 42: this probability can be defined as
  p = s1(e1) · s2(e2) · ... · sn(en),
- where si is the marginal spectrum for dimension i.
- However, many other definitions and approaches are possible.
- Example of 3-fold coincidence γ-ray spectra storing:
  - The descriptor of each event contains the addresses x, y, z and the count (short integers), i.e., each event takes 8 bytes.
  - We utilize again the property of symmetry of the multidimensional γ-ray spectra. The chosen prime module then has to satisfy the condition that the symmetry-reduced descriptor space, at 8 bytes per descriptor, fits into the available memory.
Slide 43: For the 384 MB memory we have chosen the prime module M = 601.
- Assignment between numbers from the ring <1, 600> and their modulo inverse numbers.
Slide 44: Spectrum of distances between two adjacent modulo inverse numbers.
- One can observe great scattering in these distances. This allows a quasi-uniform distribution of descriptors in the transformed area.
Slide 45: We utilize the property of symmetry in γ-ray coincidence spectra.
- The algorithm of calculation of the address of an event in the transformed area:
  - arrange the coordinates so that x ≥ y ≥ z,
  - calculate the modulo inverse numbers of the coordinates,
  - calculate the address in the transformed area from them.
- This defines the beginning position of the search for a given descriptor.
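The exact per-coordinate formulas are not reproduced on this slide, so the sketch below only illustrates the scheme: order the coordinates (symmetry removal), scramble each through the modulo-inverse table, and fold the results into one start address. The folding step and all names are assumptions of this illustration.

```python
def start_address(x, y, z, inv, M, table_size):
    # order coordinates so that x >= y >= z: any permutation of the same
    # gamma energies then yields the same start address (symmetry removal)
    x, y, z = sorted((x, y, z), reverse=True)
    # scramble each coordinate through the modulo-inverse look-up table;
    # folding coordinates into the ring <1, M-1> is an assumption of this
    # sketch, not the slides' formula
    a = inv[x % (M - 1) + 1]
    b = inv[y % (M - 1) + 1]
    c = inv[z % (M - 1) + 1]
    # fold the scrambled values into one start address for the bucket search
    return (a * M * M + b * M + c) % table_size
```

Whatever the exact folding, the essential properties hold: permuted coordinates collide deliberately (symmetry), while neighboring coordinates scatter widely across the table.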
Slide 46: The whole linear array of descriptors (36 361 808 items) has been mapped to a 16384-channel spectrum. One can observe a quasi-constant distribution, which indicates a quasi-uniform distribution of descriptors over all memory addresses in the transform domain.
- Distribution of descriptor counts in the transformed domain.
Slide 47: Prime module M, memory requirements and achieved compression ratio for 3-, 4- and 5-fold γ-ray spectra (16 384 channels in each dimension)

  Dim. of spectra | Prime module M | Storage (MB) | Compression ratio CR
  ----------------|----------------|--------------|---------------------
  3               | 601            | 290.9        | 30 239
  4               | 157            | 262.9        | 33 452
  5               | 73             | 237.0        | 37 100

- The searching depth in all cases is 1000 events.
Slide 48: Three-fold coincidence spectra.
- High-count region of a 1-dimensional slice from the original data (thick line) and the corresponding region from the compressed data (thin line).
Slide 49: Low-count region of a slice from the original data (thick line) and the corresponding region from the compressed data (thin line).
Slide 50: Influence of the searching depth on the quality of the decompressed spectra
- Increasing the length of the search in the buffer of compressed events improves the preservation of the peak heights.
- In all spectra the background was subtracted.
Slide 51: Narrow (one peak wide) 1-dimensional slice from the non-compressed original data (thick line) and the compressed 3-dimensional array (thin line), for the searching depth of 1000 events.
Slide 52: Two-dimensional slice from the original 3-dimensional data.
Slide 53: Reconstructed 2-dimensional slice from the compressed 3-dimensional data.
Slide 54: Three-dimensional slices from both original and compressed events.
- Original 3-dimensional data.
Slide 55: Decompressed 3-dimensional data.
Slide 56: Four-fold coincidence events.
- Part of 1-dimensional slices from the non-compressed original (thick line) and compressed 4-dimensional (thin line) arrays.
Slide 57: Two-dimensional slice from the original 4-dimensional data.
Slide 58: Two-dimensional slice from the compressed 4-dimensional data.
Slide 59: Three-dimensional slice from the original 4-dimensional data.
Slide 60: Three-dimensional slice from the compressed 4-dimensional data.
Slide 61: Examples of 4-dimensional slices from 4-fold coincidence data in pies display mode.
- Original 4-dimensional data. The sizes of the balls are proportional to the volumes, and the colors in the pies correspond to the content of the channels in the 4-th dimension (64 channels in the x, y, z dimensions and 16 channels in the v dimension).
Slide 62: Four-dimensional slice from the compressed 4-dimensional data.
- Decompressed slice. The big peaks correspond in both data; however, in the small peaks some differences can be observed.
Slide 63: Examples of applying the compression methods to 5-fold coincidence data.
- A part of one-dimensional slices from the original (thick line) and compressed (thin line) 5-fold coincidence events.
Slide 64: Two-dimensional slice from the original 5-dimensional data.
Slide 65: Two-dimensional slice from the compressed 5-dimensional data.
Slide 66: Conclusion
- In the talk, new methods of multidimensional coincidence γ-ray spectra compression were presented:
  - using fast adaptive orthogonal transforms,
  - using the method of retaining the list of descriptors scattered in the compressed area by the pseudorandom address transformation method.
- The processed data have the property of symmetry in their nature. In both cases the symmetry removal methods are implemented directly in the compression algorithms.
- A new class of adaptive transforms, with the transformation kernel modifiable to the reference vectors that reflect the shape of the compressed data, was presented.
Slide 67: Methods of compression used
- Orthogonal transforms - the compression is achieved by the removal of redundant and irrelevant data components.
- List of descriptors - the compression is achieved on account of the quasi-uniform distribution of the data in the transform domain space and thus by its more efficient utilization.
- The algorithms are designed to be employed for on-line compression during an experiment.
- After the experiment, the operator can decompress any slices of equal or lower dimensionality from the compressed data.
- For nuclear spectra, both methods proved to give better results than the classical ones and allow higher compression ratios to be achieved with less distortion.
Slide 68: Some relevant publications
- Morháč M., Matoušek V.: Multidimensional nuclear spectra compression using fast adaptive Fourier-based transforms, Computer Physics Communications 165 (2005) 127.
- Matoušek V., Morháč M., Kliman J., Turzo I., Krupa L., Jandel M.: Efficient storing of multidimensional histograms using advanced compression techniques, NIM A 502 (2003) 725.
- Morháč M., Matoušek V.: A new method of on-line multiparameter analysis with compression, NIM A 370 (1996) 499.
- Morháč M., Matoušek V., Turzo I.: Multiparameter data acquisition and analysis system with on-line compression, IEEE Transactions on Nuclear Science 43 (1996) 140.
- Morháč M., Kliman J., Matoušek V., Turzo I.: Integrated multi-parameter nuclear data analysis package, NIM A 389 (1997) 89.