Title: CBM DAQ and Event Selection
1CBM DAQ and Event Selection
- Walter F.J. Müller, GSI, Darmstadt
- for the CBM Collaboration
- Topical Workshop Advanced Instrumentation for
Future Accelerator Experiments, Bergen, Norway,
4-6 April 2005
2Outline
- CBM (very briefly)
- observables
- setup
- FEE/DAQ/Trigger
- requirements
- challenges
- strategies
3CBM at FAIR
SIS 100 Tm, SIS 300 Tm: U at 35 AGeV, p at 90 GeV
Compressed Baryonic Matter Experiment
4CBM Physics Topics and Observables
- In-medium modifications of hadrons
- → onset of chiral symmetry restoration at high ρB
- → measure ρ, ω, φ → e+e- (µ+µ-), open charm D0, D±
- Strangeness in matter
- → enhanced strangeness production
- → measure K, Λ, Σ, Ξ, Ω
- Indications for deconfinement at high ρB
- → anomalous charmonium suppression?
- → measure D0, D±, J/ψ → e+e- (µ+µ-)
- Critical point
- → event-by-event fluctuations
- → measure π, K
Good e/π separation
Vertex detector
Low cross sections → high interaction rates → selective triggers
Hadron identification
5CBM Setup
- Radiation-hard silicon pixel/strip detectors in a magnetic dipole field
- Electron detectors: RICH, TRD, ECAL; pion suppression up to 10^5
- Hadron identification: RPC, RICH
- Measurement of photons, π0, η, and muons: ECAL
6CBM and HADES
All you want to know about CBM: Technical Status Report (400 p.), now available under
http://www.gsi.de/documents/DOC-2005-Feb-447-1.pdf
7Meson Production in central Au+Au
W. Cassing, E. Bratkovskaya, A. Sibirtsev, Nucl.
Phys. A 691 (2001) 745
10 MHz interaction rate needed for 10-15 A GeV
SIS300
8A Typical Au+Au Collision
Central Au+Au collision at 25 AGeV (UrQMD + GEANT):
160 p, 170 n, 360 π-, 330 π+, 360 π0, 41 K+, 13 K-, 42 K0
→ 10^7 Au+Au interactions/sec → 10^9 tracks/sec to reconstruct for first level event selection
9CBM Trigger Requirements
assume archive rate: few GB/sec, 20 kevents/sec
- In-medium modifications of hadrons
- → onset of chiral symmetry restoration at high ρB
- → measure ρ, ω, φ → e+e-, open charm (D0, D±)
- Strangeness in matter
- → enhanced strangeness production
- → measure K, Λ, Σ, Ξ, Ω
- Indications for deconfinement at high ρB
- → anomalous charmonium suppression?
- → measure D0, D±, J/ψ → e+e-
- Critical point
- → event-by-event fluctuations
- → measure π, K
offline
trigger
trigger on displaced vertex
offline
drives FEE/DAQ architecture
trigger
trigger
trigger on high-pt e+e- pair
offline
10Open Charm Detection
- Example: D0 → K- π+ (3.9 %; cτ = 124.4 µm)
- reconstruct tracks
- find primary vertex
- find displaced tracks
- find secondary vertex
target
few 100 µm
5 cm
- high selectivity because combinatorics is reduced
first two planes of vertex detector
11CBM DAQ Requirements Profile
- D and J/ψ signal drives the rate capability requirements
- D signal drives FEE and DAQ/Trigger requirements
- Problem similar to B detection, like in LHCb or BTeV (rip)
- Adopted approach
- displaced vertex 'trigger' in first level, like in BTeV (rip)
- Additional problem
- DC beam → interactions at random times
- → time stamps with ns precision needed
- → explicit event association needed
- Current design for FEE and DAQ/Trigger
- Self-triggered FEE
- Data-push architecture
12Conventional FEE-DAQ-Trigger Layout
Especially instrumented detectors
Detector
L0 Trigger
fbunch
Trigger Primitives
Dedicated connections
FEE
Cave
Limited capacity
Shack
L1 Accept
DAQ
Modest bandwidth
L2 Trigger
L1 Trigger
Limited L1 trigger latency
Specialized trigger hardware
Standard hardware
Archive
13Limits of Conventional Architecture
Decision time for first level trigger is limited
(typ. max. latency 4 µs for LHC)
Not suitable for complex global triggers like
secondary vertex search
Only especially instrumented detectors can
contribute to first level trigger
Limits future trigger development
Large variety of very specific trigger hardware
High development cost
14The way out .. use Data Push Architecture
Especially instrumented detectors
Detector
L0 Trigger
fbunch
Trigger Primitives
fclock
Dedicated connections
FEE
Time distribution
Cave
Limited capacity
Shack
L1 Accept
DAQ
High bandwidth
Modest bandwidth
L1 Trigger
Limited L1 trigger latency
Specialized trigger hardware
Special hardware
Standard hardware
Archive
15The way out ... use Data Push Architecture
Detector
fclock
FEE
Cave
Shack
DAQ
High bandwidth
Special hardware
Archive
16The way out ... use Data Push Architecture
Detector
Self-triggered front-end: autonomous hit detection
fclock
FEE
No dedicated trigger connectivity: all detectors can contribute to L1
Cave
Shack
DAQ
Large buffer depth available: system is throughput-limited and not latency-limited
High bandwidth
Modular design: few multi-purpose rather than many special-purpose modules
Special hardware
Use term 'Event Selection'
Archive
17Front-End for Data Push Architecture
- Each channel detects autonomously all hits
- An absolute time stamp, precise to a fraction of the sampling period, is associated with each hit
- All hits are shipped to the next layer (usually concentrators)
- Association of hits with events done later using time correlation
- Typical parameters
- with few 1 % occupancy and 10^7/sec interaction rate
- some 100 kHz channel hit rate
- few MByte/sec per channel
- whole CBM detector 1 Tbyte/sec
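The typical parameters above follow from simple rate arithmetic. The sketch below is only an illustration: the 3 % occupancy (one reading of "few %") and the 8 bytes per hit are assumptions, not values taken from the slide.

```python
def channel_hit_rate(occupancy, interaction_rate_hz):
    """Mean hit rate of one channel: fraction of events in which it fires times the event rate."""
    return occupancy * interaction_rate_hz


def channel_data_rate(hit_rate_hz, bytes_per_hit):
    """Mean output bandwidth of one channel in bytes per second."""
    return hit_rate_hz * bytes_per_hit


INTERACTION_RATE_HZ = 1e7  # from the slide
OCCUPANCY = 0.03           # assumption: "few %" taken as 3 %
BYTES_PER_HIT = 8          # assumption: timestamp + address + amplitude

hit_rate = channel_hit_rate(OCCUPANCY, INTERACTION_RATE_HZ)
data_rate = channel_data_rate(hit_rate, BYTES_PER_HIT)
print(f"{hit_rate / 1e3:.0f} kHz per channel, {data_rate / 1e6:.1f} MByte/sec per channel")
```

With these assumed inputs the result lands at some 100 kHz and a few MByte/sec per channel, in line with the figures quoted on the slide.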
18Typical Self-Triggered Front-End
Use sampling ADC on each detector channel running
with appropriate clock
- Average 10 MHz interaction rate
- Not periodic like in collider
- On average 100 ns event spacing
[Figure: sampled pulse amplitude vs. time with a threshold line; two example hits with a = 126, t = 5.6 and a = 114, t = 22.2]
Time is determined to a fraction of the sampling period
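A minimal sketch of autonomous hit detection on a sampled channel. The slide does not say which time estimator is used; linear interpolation of the threshold crossing on the leading edge is an assumption chosen here because it yields a time with sub-sample resolution. The trace values and the function name `find_hits` are hypothetical.

```python
def find_hits(samples, threshold):
    """Return (amplitude, time) for every pulse that crosses the threshold.

    amplitude is the largest sample of the pulse; time is in units of the
    sampling period, linearly interpolated between the last sample below
    and the first sample at or above the threshold.
    """
    hits = []
    i = 1
    n = len(samples)
    while i < n:
        if samples[i - 1] < threshold <= samples[i]:
            below, above = samples[i - 1], samples[i]
            time = (i - 1) + (threshold - below) / (above - below)
            peak = above
            # follow the pulse until it falls below the threshold again
            while i < n and samples[i] >= threshold:
                peak = max(peak, samples[i])
                i += 1
            hits.append((peak, time))
        else:
            i += 1
    return hits


# hypothetical ADC trace with two pulses
trace = [0, 10, 80, 126, 90, 40, 10, 0, 20, 70, 114, 60, 20]
hits = find_hits(trace, threshold=50)
for amplitude, time in hits:
    print(f"a = {amplitude}, t = {time:.2f} sampling periods")
```

A real front-end would more likely fit the pulse shape after digital filtering; the interpolation only shows how a time finer than the sampling period can be obtained.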
19Toward Multi-Purpose FEE Chain
PreAmp → preFilter → ADC → digital Filter → Hit Finder → Backend Driver
- preFilter: anti-aliasing filter
- ADC: sample rate 10-100 MHz, dyn. range 8...12 bit
- digital Filter: 'shaping', 1/t tail cancellation, baseline restorer
- Hit Finder: hit parameter estimators (amplitude, time)
- Backend Driver: clustering, buffering, link protocol
see talk V. Lindenstruth, see talk L. Musa
All potentially in one mixed-signal chip
20CBM DAQ and Online Event Selection
- More than 50 % of total data volume relevant for first level event selection
- Aim for simplicity
- Ansatz
- do (almost) all processing after the build stage
- Simple two layer approach
- 1. event building
- 2. event processing
- Other scenarios are possible, putting more emphasis on
- do all processing as early as possible
- transfer data only when necessary
needed for D
needed for J/µ
useful for J/µ
STS, TRD, and ECAL data used in first level event selection
21Logical Data Flow
Concentrators multiplex channels to high-speed links
Time distribution
Buffers
Build Network
Processing resources for first level event selection, structured in small farms
Connection to 'high level' selection processing
22Bandwidth Requirements
Data flow 1 TB/sec
Gilder helps
Moore helps
1st level selection: 10^14-10^15 operations/sec
100 Sub-Farms
Data flow few 10 GB/sec
to archive few 1 GB/sec
23Focus on CNet
24Self-Triggered FEE Output Format I
FEE
Output of a FEE chip is a list of hits. Each hit has a timestamp plus other information.
Output of a single FEE chip:
17 15 ... 68 34 ... 134 18 ... 135 19 ... 1234 33 ...
TimeStamp
Channel address
other values: amplitudes, pulse shape
!! Time stamp values can increase forever !!
→ How to express absolute time efficiently?
25Handle the infinite Time Axis
1. Subdivide Time in Epochs
2. Express a time relative to an epoch (practical epoch length about 10 µs)
3. Introduce Epoch Markers
[Figure: time axis divided into Epoch 1 to Epoch 4, with example hits at (2, 137 ns) and (3, 314 ns); legend: a hit, an epoch marker]
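The three steps can be sketched as a pair of conversion functions. This assumes a time unit of 1 ns, an epoch length of exactly 10 µs, and epochs counted from 0; none of these details is fixed by the slide, and the function names are hypothetical.

```python
EPOCH_LENGTH_NS = 10_000  # "about 10 µs"; exact value is an assumption


def to_epoch_time(absolute_ns):
    """Split an absolute time into (epoch number, offset within the epoch)."""
    return divmod(absolute_ns, EPOCH_LENGTH_NS)


def to_absolute_time(epoch, offset_ns):
    """Inverse of to_epoch_time."""
    return epoch * EPOCH_LENGTH_NS + offset_ns


# number of bits a hit needs for its epoch-relative timestamp
OFFSET_BITS = (EPOCH_LENGTH_NS - 1).bit_length()

print(to_epoch_time(20_137))
print(to_absolute_time(3, 314))
print(OFFSET_BITS)
```

The point of the scheme is that each hit carries only the short offset, while the ever-growing epoch number is sent once per epoch in a marker.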
26Self-Triggered FEE Output Format II
Output of a FEE chip is a list of hits and epoch markers. Each hit has a timestamp plus other information.
FEE
M 1 H 17 15 ... H 68 34 ... H 134 18 ... H 135 19 ... H 1234 33 ... M 2 M 3 H 258 19 ...
Record type: H = hit, M = epoch marker
Hit with effective timestamp (3, 258)
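A minimal sketch of how a consumer could recover effective timestamps from such a record stream. The tuple representation `("M", epoch)` / `("H", time, channel)` is an assumption made for illustration; the "other values" of each hit are omitted.

```python
def decode(records):
    """Attach the current epoch to every hit: returns (epoch, time, channel) tuples."""
    epoch = None
    hits = []
    for record in records:
        kind = record[0]
        if kind == "M":
            epoch = record[1]
        elif kind == "H":
            if epoch is None:
                raise ValueError("hit record before first epoch marker")
            _, time, channel = record
            hits.append((epoch, time, channel))
        else:
            raise ValueError(f"unknown record type {kind!r}")
    return hits


# the example stream from the slide
stream = [
    ("M", 1), ("H", 17, 15), ("H", 68, 34), ("H", 134, 18), ("H", 135, 19),
    ("H", 1234, 33), ("M", 2), ("M", 3), ("H", 258, 19),
]
decoded = decode(stream)
print(decoded[-1])
```

The last hit decodes to epoch 3, time 258, matching the effective timestamp (3, 258) given on the slide.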
27Self-Triggered FEE Concentrators
M 1 H 18 2007 ... M 2 H 589 2134 ... M 3 H 258 2714 ...
time
address
FEE
FEE
M 1 H 17 15 ... H 68 34 ... H 134 18 ... H 135 19 ... H 1234 33 ... M 2 M 3 H 258 19 ...
M 1 H 17 15 ... H 18 2007 ... H 68 34 ... H 134 18 ... H 135 19 ... H 1234 33 ... M 2 H 589 2134 ... M 3 H 258 19 ... H 258 2714 ...
Seems prudent to keep data always sorted in time
A concentrator merges the data streams and eliminates redundant epoch markers
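The merge step can be sketched with a standard k-way merge of time-sorted inputs. The record tuples and function names are assumptions for illustration. As a simplification, this version emits a marker only for epochs that contain at least one hit, so markers of completely empty epochs are dropped; a real concentrator may need to keep them.

```python
import heapq


def hits_with_epoch(records):
    """Yield (epoch, time, channel) for every hit in an M/H record stream."""
    epoch = None
    for record in records:
        if record[0] == "M":
            epoch = record[1]
        else:
            if epoch is None:
                raise ValueError("hit record before first epoch marker")
            yield (epoch, record[1], record[2])


def concentrate(*streams):
    """Merge time-sorted streams into one, with a single marker per epoch."""
    merged = []
    current_epoch = None
    for epoch, time, channel in heapq.merge(*(hits_with_epoch(s) for s in streams)):
        if epoch != current_epoch:
            merged.append(("M", epoch))
            current_epoch = epoch
        merged.append(("H", time, channel))
    return merged


# the two input streams from the slide
fee_a = [("M", 1), ("H", 18, 2007), ("M", 2), ("H", 589, 2134), ("M", 3), ("H", 258, 2714)]
fee_b = [("M", 1), ("H", 17, 15), ("H", 68, 34), ("H", 134, 18), ("H", 135, 19),
         ("H", 1234, 33), ("M", 2), ("M", 3), ("H", 258, 19)]
output = concentrate(fee_a, fee_b)
print(output)
```

Because every input is already sorted in time, the merge needs only one pending record per input, which is why keeping the data sorted at every stage is attractive.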
28FEE Data Clusters I
- In many subsystems a particle causes correlated hits in physically neighboring detector cells (STS, TRD, ECAL)
- Depending on detector subsystem
- the cluster pattern is 1d or 2d
- contained in one FEE chip or not
- examples in CBM
- STS-MAPS: 2d, contained
- STS-Strip: 1d, mostly contained
- TRD: 1d mostly contained to 2d often uncontained, depending on pad geometry (varies inside → outside)
- RPC: t.b.d.
- ECAL: 2d, many uncontained
Note: for 2d a 16 (64) channel chip has ¾ (½) of channels on perimeter!
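The perimeter note can be checked with a short calculation, assuming the chip's channels cover a square block of pads (4 x 4 for 16 channels, 8 x 8 for 64). The square layout is an assumption; the slide gives only the channel counts.

```python
def perimeter_fraction(side):
    """Fraction of cells on the border of a side x side square array."""
    total = side * side
    interior = max(side - 2, 0) ** 2
    return (total - interior) / total


for side in (4, 8):
    print(f"{side * side} channels: {perimeter_fraction(side):.4f} on perimeter")
```

For 16 channels this gives exactly 3/4; for 64 channels it gives 7/16, which the slide rounds to ½.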
29FEE Data Clusters II
- Usually one wants to read very low amplitude hits in the tail of a cluster
- low channel hit threshold might give too much noise
- → read only low amplitude hit if in neighborhood of a big one
- → how to handle clusters crossing a chip border?
- use two thresholds
- high threshold determines particle hit and region of interest
- RoI communicated to all relevant neighbors
- low amplitude hits in RoI are validated and sent
- → this implies cross communication on CNet between FEE chips...
Better named FNet
If RoI are communicated, CNet becomes a real network !!
see talk V. Lindenstruth, see talk L. Musa
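A minimal 1d sketch of the two-threshold scheme. The threshold values, the RoI reach of one neighbor on each side, and the amplitudes are assumptions for illustration; the channel list is treated as one contiguous strip row regardless of which chip owns a channel.

```python
def select_hits(amplitudes, low, high, reach=1):
    """Return indices of channels to read out.

    Channels at or above `high` are seeds and define a region of interest
    (RoI) of `reach` neighbors on each side. Channels at or above `low`
    are kept only if they lie inside some RoI.
    """
    n = len(amplitudes)
    roi = set()
    for index, amplitude in enumerate(amplitudes):
        if amplitude >= high:
            roi.update(range(max(0, index - reach), min(n, index + reach + 1)))
    return [
        index
        for index, amplitude in enumerate(amplitudes)
        if amplitude >= high or (amplitude >= low and index in roi)
    ]


# hypothetical amplitudes on 10 strips; with 5 strips per chip,
# strip 4 belongs to the first chip and strips 5-6 to the second
amplitudes = [3, 12, 0, 0, 11, 60, 14, 2, 0, 13]
selected = select_hits(amplitudes, low=10, high=50)
print(selected)
```

Strips 1 and 9 exceed the low threshold but are isolated and therefore suppressed as noise, while strip 4 is kept because the seed on strip 5 of the neighboring chip puts it inside the RoI. This is the case that requires communication between chips.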
30Focus on BNet
31Event Building Alternatives
- Straight event-by-event approach
- data arrives on 1000 links
- 100 byte per event and link
- 10^10 packets/sec to handle...
- Handle time intervals or event intervals
- 10 µs or 100 events seems reasonable
- Very regular and fully controlled traffic pattern
- data traffic can be scheduled to avoid network congestion
- a large fraction of the switch bandwidth can be used
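The packet rates behind the two alternatives can be reproduced with simple arithmetic. The link count and bytes per event are from the slide; the 10^7/sec event rate is the interaction rate quoted earlier in the talk. The function names are hypothetical.

```python
LINKS = 1000
EVENT_RATE_HZ = 1e7             # interaction rate quoted earlier in the talk
BYTES_PER_EVENT_AND_LINK = 100


def packet_rate(events_per_packet):
    """Packets per second over all links when events are grouped."""
    return LINKS * EVENT_RATE_HZ / events_per_packet


def packet_size(events_per_packet):
    """Average packet payload in bytes."""
    return BYTES_PER_EVENT_AND_LINK * events_per_packet


total_bandwidth = LINKS * EVENT_RATE_HZ * BYTES_PER_EVENT_AND_LINK

for group in (1, 100):
    print(f"{group:3d} events/packet: {packet_rate(group):.0e} packets/sec "
          f"of {packet_size(group)} bytes")
print(f"total: {total_bandwidth:.0e} bytes/sec")
```

Grouping 100 events (about 10 µs at 10 MHz) cuts the packet rate by a factor of 100 and raises the packet size to about 10 kByte, while the total bandwidth stays at 1 TB/sec.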
32Networking I
- High-speed networking
- high density connectors
- 2.5 Gbps SerDes now 100 mW
- 480 Gbps InfiniBand switch on one chip
- DDR and QDR link speeds will come
- just wait and see
- Mellanox MTA47396 24 port InfiniBand switch
- 4x ports, 1 Gbyte/sec per port
- → 96 x 2.5 Gbps SERDES
- 480 Gbps aggregate B/W
- Single chip implementation
- 961 ball BGA
- 18 W power dissipation
- Double data rate version (5 Gbps per link) in
pipe....
33Networking II
- TODAY
- Voltaire ISR 9288 switch
- 288 4x ports non-blocking
- cost today 120 kEUR (or 400 EUR/port)
- 288 GByte/sec switching bandwidth
- likely in a few years
- 288 4x port QDR
- likely same or lower cost
- 1152 GByte/sec switching speeds
- adequate for CBM...
- Conclusion
- BNet switch is not a major issue
34Focus on PNet
35Network Characteristics
Data push: datagrams, errors marked but not recovered
Request/response and data push: transactions, errors recovered
36L1 Event Selection Farm Layout
- Current working hypothesis: CPU + FPGA hybrid system (proviso follows)
- Use programmable logic for cores of algorithms
- Use CPU for the non-parallelizable parts
- Use serial connection fabric (links and switches)
- Modular design (only few board types)
FPGA
37Network Summary
- 5 different networks with very different characteristics
- CNet
- medium distance, short messages, special requirements
- connects custom components (FEE ASICs)
- TNet
- broadcast time (and tags), special requirements
- BNet
- naturally large messages, Rack-2-Rack
- PNet
- short distance, most efficient if already 'built-in'
- connects standard components (FPGA, SoCs)
- HNet
- general purpose, to rest of world
FEE interfaces and CNet will be co-developed.
Depends on how clock/time distribution is done
Custom
Potentially built with CNet components
Custom
Probably uncritical
Ethernet, InfiniBand, ...
Look at emerging technologies. Stay open for changes and surprises. Cost efficiency is key here !!
PCIe, ASI, ...
Whatever the implementation is, it will be called Ethernet...
Ethernet
38Algorithms
- Performance of L1 feature extraction algorithms is essential
- critical in CBM: STS tracking + vertex reconstruction, TRD tracking and PID
- Look for algorithms which allow massive parallel implementation
- Hough Transform tracker: needs lots of bit level operations, well suited for FPGA
- Cellular Automaton tracker
- Other approaches to be evaluated
- Co-develop tracking detectors and analysis algorithms
- L1 tracking is necessarily speed optimized → more detector granularity and redundancy needed
- Aim for CBM: validate final hardware design with at least 2 trackers suitable for L1
39Algorithms an Example
- Hough Transform
- assume track comes from (close to) primary vertex
- map each measurement into 'Hough space'
- a peak in Hough space indicates a real track
- is a 'global' method
- needs substantial amount of calculation to fill and analyze the histograms
- Many, but very simple operations
- allows massively parallel implementation
40Hough-Transform Implementation
41Hough-Transform Implementation
Very suitable for implementation in programmable logic (FPGAs)
Other track finder approaches, like cellular automata tracker, also under investigation
42Interim Summary
- Event definition has changed
- now based on time stamps and time correlation
- Role of DAQ has changed
- DAQ is simply responsible to transport data from producers to consumers
- Role of 'Trigger' has changed
- filter events delivered by DAQ
- 'Online Event Selection' is better term
- System aspects
- 'online' / 'offline' boundary blurs
- more COTS (commercial off the shelf) components
- much more modular system
- much more adaptable system
- This is emerging technology in HEP, though baseline for ILC. However, it has been used for many years in nuclear structure
43Moore quo vadis ?
- Will price/performance of computing continue to improve?
- What are limits of CMOS technology?
- Where are the markets? What are market forces?
- Technology
- most of the gain comes from architecture anyway
- conventional designs, especially x86, reach their limits
- Markets
- end of the metal-box PC age? → Laptops, PDAs, all kinds of dedicated boxes (video, games)
- end of the binary compatibility age? → intermediate code, 'Just in Time' compilers (JIT)
There is life after Intel x86. A lot of architectural innovation ahead
44BlueGene vs Cell Processor
BlueGene: 121 mm2, 130 nm, 2.8/5.6 DP GFlop
STI Cell: 221 mm2, 90 nm, 256 SP GFlop, 30 DP GFlop, 25 GB/sec mem, 78 GB/sec IO
Finally presented at ISSCC 2005 (International Solid-State Circuits Conf.)
SPE: Synergistic Processing Element
45BlueGene vs Cell Processor
BlueGene: developed by IBM. Market: national security, science. Budget: 100 M
Cell: developed by Sony, Toshiba and IBM. Market: VIDEO GAMES. Budget: 500 M
High performance computing is now driven by embedded systems (games, video, ...)
→ Science is a spin-off, at best ...
46STI Cell Processor
- 'normal' PowerPC CPU
- 8 Synergistic Processing Elements (SPE), each with
- 256 kB memory
- 128 x 128 bit registers
- 4 SP floating point units
- own instruction stream
- 32 multiply/add per clock cycle
- runs at > 4 GHz
221 mm2 die size in 90 nm
47Game Processors as Supercomputers ?
Slide from CHEP'04: Dave McQueeney, IBM CTO US Federal
48CPU and FPGA paradigms merge
[Diagrams: conventional CPU (register, control, ALU); SIMD (single instruction multiple data) CPU (wide register, control, several ALUs); configurable instruction set CPU (wide register, control, arithmetic resources: an array of ALUs linked through PSMs, the configurable connection fabric)]
49Configurable Instruction Set Processor
- Example: Stretch S5xxx
- Hybrid design
- conventional fixed instruction set part
- plus configurable instruction set part
- C/C++ compiler analyses the kernel of algorithms
- generates custom instruction set
- generates code to use it
- The promise
- ease of use of C/C++
- performance of an FPGA
Stretch S5 engine
Fabric is the keyword
interconnected resources
from Stretch Inc. product brief
50CPU and FPGA paradigms merge
CPU
Processor industry world view
A lot of innovation in the years to come. Essential will be availability of efficient development tools
configurable logic
configurable logic
FPGA industry world view
Moore will go on! There are the technologies. There are the markets. Architectural changes ahead
CPU
CPU
51Summary
Substantial R&D needed
- Self-triggered FEE
- autonomous hit detection, time-stamping with ns precision
- sparsification, hit buffering, high output bandwidth
- High bandwidth event building network
- handle 10 MHz interaction rate in Au-Au
- also cope with few 100 MHz interaction rate in p-p, p-A
- likely be done in time slices or event slices
- L1 processor farm
- feasible with PC + FPGA + Moore (needed 2014)
- but look beyond today's PCs and FPGAs
- Efficient algorithms (10^9 tracks/sec)
- co-design of critical detectors and tracking software
Quite different from the current LHC style electronics
RII3-CT-2004-506078
52The End
Thanks for your attention
53CBM Collaboration: 41 institutions, 15 countries
- China: Hua-Zhong Univ., Wuhan
- Croatia: RBI, Zagreb
- Cyprus: Nicosia Univ.
- Czech Republic: Czech Acad. Science, Rez; Techn. Univ. Prague
- France: IReS Strasbourg
- Germany: Univ. Heidelberg, Phys. Inst.; Univ. HD, Kirchhoff Inst.; Univ. Frankfurt; Univ. Kaiserslautern; Univ. Mannheim; Univ. Marburg; Univ. Münster; FZ Rossendorf; GSI Darmstadt
- Russia: CKBM, St. Petersburg; IHEP Protvino; INR Troitzk; ITEP Moscow; KRI, St. Petersburg; Kurchatov Inst., Moscow; LHE, JINR Dubna; LPP, JINR Dubna; LIT, JINR Dubna; MEPhI, Moscow; Obninsk State Univ.; PNPI Gatchina; SINP, Moscow State Univ.; St. Petersburg Polytec. U.
- Spain: Santiago de Compostela Univ.
- Ukraine: Shevchenko Univ., Kiev
- Hungary: KFKI Budapest; Eötvös Univ. Budapest
- Korea: Korea Univ. Seoul; Pusan National Univ.
- Norway: Univ. Bergen
- Poland: Krakow Univ.; Warsaw Univ.; Silesia Univ. Katowice
- Portugal: LIP Coimbra
- Romania: NIPNE Bucharest
54FPGA Basic Building Block
CLB: Configurable Logic Block
[Diagram of a CLB: look-up table (LUT) and D flip-flop; signal labels F0, F1, F2, F3, X, XQ, D, Q, C, CLK]
Look-up table: universal logic gate, just a 4x1 RAM
D flip-flop: elementary storage unit
55FPGA Putting it together
[Diagram: array of CLBs (configurable logic blocks) connected by wiring through PSMs (programmable switch matrices), surrounded by I/O blocks]
Modern FPGAs: > 100,000 LUTs, 500 MHz