Title: High Speed Data Acquisition and Trigger
1 High Speed Data Acquisition and Trigger
2 Front End / DAQ / Trigger Requirements
- The overall architecture of front end electronics, DAQ, and trigger is mainly determined by the trigger requirements
- Main CBM triggers:
  - J/ψ
  - Open charm
  - Low mass di-leptons
  - Long lived resonances (e.g. ?)
- Trigger on 2e needs PID and tracking; near threshold this is a major challenge
- Trigger on D pairs via one semi-leptonic decay: D → Kπ with a displaced vertex
3 DAQ Architecture - Antique
Block diagram: Detector → Front end → Delay → ADC / TDC (gated) → DAQ (LAM readout); extra trigger signal path: Front end → Trigger → Gate
- The delay compensates the trigger latency; the max. latency is limited by the delay cable length
- Digitization happens only after the trigger (gate)
- The DAQ transports selected events to the archive
4 DAQ Architecture - Collider Style
Block diagram: Detector → Front end → ADC (clocked at the bunch crossing frequency) → Pipeline → DAQ; extra trigger data path: L1 Trigger → Accept
- Digitize each bunch crossing; dead time free
- The pipeline compensates the L1 trigger latency; often of limited size
- Often a fixed max. latency (CMS: 4 μsec)
- The DAQ transports L1-accepted events to the L2 trigger / archive
5 Triggered Front End - Limitations
- Sometimes difficult to build a proper L0
  - E.g. in nuclear decay spectroscopy, where a true minimum bias is needed to see delayed decays
- Buffering in the front end is a strong constraint that is difficult to change (and upgrade)
- Sometimes difficult to build an L1 with a short (and/or fixed) latency, especially when several triggers run in parallel
  - E.g. PANDA: triggers are on physics channels, there is no obvious fast selection criterion
6 DAQ Architecture - Future: Self-Triggered Data Push Architecture
Block diagram: Detector → Front end ADC (clock) → Buffer memory → Event builder and selector
- Self-triggered digitization; dead time free
- The clock provides an absolute time scale
- Each hit is transported as Address / Timestamp / Value (a minimal sketch follows below)
- The buffer memory compensates the builder/selector latency; practically unlimited size
- The event builder and selector use the time correlation of hits to define events, then select and archive
- The max. latency is uncritical; the average latency is what matters
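As a concrete illustration of the data-push idea, here is a minimal Python sketch of an Address/Timestamp/Value hit record and of grouping hits purely by time correlation. The field names, the clock tick, and the gap-based coincidence window are illustrative assumptions, not part of the proposal.

```python
# Minimal sketch of the Address/Timestamp/Value hit format and of grouping
# hits purely by time correlation. Field names, the clock tick and the
# coincidence window are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Hit:
    address: int    # channel / cell identifier
    timestamp: int  # absolute time in clock ticks (e.g. 10 ns per tick)
    value: int      # measured amplitude / charge

def build_events(hits, gap_ticks=10):
    """Group time-sorted hits into events: a new event starts whenever the
    gap to the previous hit exceeds gap_ticks. No trigger signal is used."""
    events, current = [], []
    for hit in sorted(hits, key=lambda h: h.timestamp):
        if current and hit.timestamp - current[-1].timestamp > gap_ticks:
            events.append(current)
            current = []
        current.append(hit)
    if current:
        events.append(current)
    return events

if __name__ == "__main__":
    hits = [Hit(3, 105, 42), Hit(7, 102, 17), Hit(3, 500, 99), Hit(1, 503, 12)]
    print([[h.address for h in ev] for ev in build_events(hits)])   # [[7, 3], [3, 1]]
```

The point is that no trigger signal enters the grouping: the time stamps alone define the events.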
7 Advantages of SDPA
- No special trigger data path
- No fixed maximal L1 trigger latency
- Leaner, more versatile front end
- Easier upgrade path for the back end
- All signals can contribute to all trigger levels (the limit is crunch power, not connectivity)
- Highest flexibility for trigger development
This is the main reason why PANDA is committed to use such an architecture.
8 Planned Experiments with SDPA
- AGATA (Advanced Gamma Tracking Array) - DC beam, completion 2008
  - 190 Ge detectors, 6780 channels
  - 300k events/sec @ M = 30
  - 1 GB/sec into the reconstruction farm
- BTeV (Charm / Beauty Decays at FNAL) - bunched beam, completion 2007
  - 2.2×10^7 pixels, RICH, ECAL, ...
  - 7.6 MHz bunch crossing
  - 1 TB/sec into the L1 buffer memories
  - L1 trigger on displaced vertices
9 CBM Challenges
- Large primary data rate: 1 TB/sec (≈ 10^8 ISDN lines, or 5000 × 2 Gbps links)
  - 10^7 int/sec × 200 part/int (2×10^9 part/sec, min. bias)
  - 50 layers × 10 byte/hit (mostly TRD)
- Large trigger decision rate: 10^7/sec
- Large computing power required
  - Assume 100 ops/byte
  - 100 Tops/sec or 10^14 ops/sec
  - 10^4 PCs with 10 GHz? 1 MW of power?
The rates are worked out in the short calculation below.
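A back-of-envelope check of the quoted rates, using only the numbers on this slide:

```python
# Back-of-envelope check of the rate numbers quoted on this slide.
int_rate      = 1e7    # interactions per second
part_per_int  = 200    # charged particles per minimum-bias interaction
layers        = 50     # detector layers (hits per particle)
bytes_per_hit = 10     # bytes per hit
ops_per_byte  = 100    # assumed processing cost

particles = int_rate * part_per_int                # 2e9 part/sec
data_rate = particles * layers * bytes_per_hit     # 1e12 byte/sec = 1 TB/sec
compute   = data_rate * ops_per_byte               # 1e14 ops/sec  = 100 Tops/sec
print(f"{particles:.1e} part/sec, {data_rate:.1e} byte/sec, {compute:.1e} ops/sec")
```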
10 CBM Time Scale
Added on 14.11.02
- SIS 200 completion > Q4 2011
- CBM installation > 2010/2011
- CBM production / test > 2008/2009
- Plan with the technology of 2007
- SIA forecast for 2007:
  - 0.065 μm process (0.13 μm today)
  - Logic a factor 4 denser and faster
- So expect, for example:
  - 10 Gbps serial link speed - connectivity (optical and Cu at low cost)
  - 1 GHz FPGA clock speed - speed
  - >100 kLC FPGAs - density (at acceptable cost; today ~5000)
11 Proposal for a CBM DAQ Architecture
- Inspired by (and scaled from) BTeV
  - Key difference: DC beam
  - CBM needs explicit event building: evaluate time correlation, tag hits with an event tag
  - BTeV (et al.) uses implicit event building: tag with the bunch crossing number in the front end
- Front end inspired by AGATA
- Meant to demonstrate feasibility - don't take the details too seriously!
12 Front End Electronics
Block diagram: Detector → Analog front end → Sampling ADC → Hit detector / Parameter estimate → Mux / Cluster finder → Data link interface (DDL); a Timing Link interface (TL) distributes a low-jitter 100 MHz clock
- The low-jitter clock provides the absolute time scale (good enough for ToF?)
- The hit detector / parameter estimate delivers t, q, ... per channel and is connected to its neighbor cells; more channels feed the same mux / cluster finder
- The mux / cluster finder sends x, y, t, q, q-1, q+1, ... over the DDL
13 Front End Essentials
- Where useful and feasible, determine the time stamp to a fraction of the clock cycle (see the interpolation sketch below)
  - Helps event building, reduces pile-up
- Do cluster finding
  - Helps to reduce the data volume, which is otherwise dominated by address and time stamp
- Enough bandwidth
  - The DDL should never saturate, and there is no trigger to throttle down the data flow
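One common way to obtain a time stamp to a fraction of the clock cycle from a sampling ADC is linear interpolation of the threshold crossing on the leading edge. The sketch below assumes that method, a 100 MHz sampling clock, and made-up sample values; the actual front end may well use a different estimator.

```python
# Sketch: estimate the hit time to a fraction of a clock cycle by linearly
# interpolating where the sampled pulse crosses a threshold.
# The 100 MHz clock, the threshold and the samples are illustrative assumptions.
CLOCK_NS = 10.0   # 100 MHz sampling clock -> 10 ns per sample

def hit_time_ns(samples, threshold, t0_ticks):
    """Return the absolute hit time in ns: the coarse time stamp of the first
    sample plus the interpolated sub-sample position of the threshold crossing."""
    for i in range(1, len(samples)):
        if samples[i - 1] < threshold <= samples[i]:
            frac = (threshold - samples[i - 1]) / (samples[i] - samples[i - 1])
            return (t0_ticks + i - 1 + frac) * CLOCK_NS
    return None   # no threshold crossing found

# Example: a pulse rising between the 2nd and 3rd sample of the waveform
print(hit_time_ns([5, 8, 40, 90, 60], threshold=30, t0_ticks=1000))   # -> 10016.875
```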
14 CBM Radiation Hardness Requirements I
- Assume 10^7 int/sec and 5×10^7 sec on-time (1.5 yr at design luminosity)
- Assume 2×10^7 h cm^-2 correspond to about 1 rad
- A flux of 1 h cm^-2 per central interaction → a fluence of 1.25×10^14 h cm^-2 over the lifetime → a total dose of 6 Mrad (reproduced in the calculation below)
- Total dose (TID) based on CDR numbers, fluxes per central interaction taken from ATLAS:

    Detector     2°        6°
    TRD (5 m)    225 krad  75 krad
    RPC (10 m)   55 krad   20 krad

- COTS CMOS fails after 100 krad
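The 6 Mrad figure can be reproduced from the stated assumptions. The 2.5×10^6 central interactions per second are taken from the flux-to-rate conversion quoted on the next slide; the implied ~25% central fraction of the 10^7 int/sec is inferred rather than stated.

```python
# Reproduce the 6 Mrad figure from the assumptions on this slide.
cent_int_rate = 2.5e6    # central interactions per second (inferred, see slide 15)
on_time       = 5e7      # seconds of operation (~1.5 yr at design luminosity)
rad_per_h_cm2 = 1 / 2e7  # 2e7 hadrons/cm^2 correspond to about 1 rad

def total_dose_krad(flux_per_cent_int):
    """Total ionising dose for a given hadron flux per central interaction."""
    fluence = flux_per_cent_int * cent_int_rate * on_time   # hadrons/cm^2 over lifetime
    return fluence * rad_per_h_cm2 / 1e3                    # krad

print(total_dose_krad(1.0))   # 1 h/cm^2 per central int. -> 6250 krad, i.e. ~6 Mrad
```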
15 CBM Radiation Hardness Requirements II
- Assume 10^7 int/sec
- Assume a Single Event Upset cross section σ_SEU = 10^-10 cm^2 per device (measured for Virtex FPGAs)
- A flux of 1 h cm^-2 per central interaction → 2.5×10^6 h cm^-2 s^-1 → 22 SEU/day (see the calculation below)
- SEUs per day and per FPGA (multiply by the number of FPGAs to get the system rate):

    Detector     2°    6°
    TRD (5 m)    0.80  0.26
    RPC (10 m)   0.20  0.07

- Mitigation: reconfigure after each spill
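The 22 SEU/day follow directly from the flux and the measured cross section; a minimal check:

```python
# Reproduce the SEU rate quoted on this slide.
SIGMA_SEU   = 1e-10     # SEU cross section per device in cm^2 (Virtex measurement)
SECONDS_DAY = 86400.0

def seu_per_day(flux_per_cm2_s):
    """Single event upsets per day for one FPGA exposed to the given hadron flux."""
    return flux_per_cm2_s * SIGMA_SEU * SECONDS_DAY

print(seu_per_day(2.5e6))   # -> 21.6, i.e. ~22 SEU/day as on the slide
```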
16 Some Assumptions for the Back End
- 1 TB/sec of data over 1024 × 10 Gbps DDLs
- 1000 FPGAs with 100 kLC sufficient for local and regional processing
  - 100 kLC allows 1000 parallel ops
  - At 1 GHz this gives 1 Tops/sec per FPGA
  - Or a total of 1 Pops/sec in the system (1000 ops per byte; checked below)
- 1000 DSP/FPGA pairs enough for global L1 processing (average L1 latency 100 μsec; 500 kops/part, 200 kcyc/evt)
- Put 4 FPGAs (or DSP/FPGA pairs) on a board
  - The power consumption should allow that
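A quick consistency check of these numbers; the 1000 ops per byte is simply the system throughput divided by the input data rate:

```python
# Consistency check of the compute assumptions on this slide.
n_fpga        = 1000
ops_per_cycle = 1000    # 100 kLC assumed to allow 1000 parallel ops
clock_hz      = 1e9     # 1 GHz FPGA clock (2007 technology assumption)
data_rate     = 1e12    # 1 TB/sec over the 1024 x 10 Gbps DDLs

per_fpga = ops_per_cycle * clock_hz   # 1e12 ops/sec = 1 Tops/sec per FPGA
system   = per_fpga * n_fpga          # 1e15 ops/sec = 1 Pops/sec in the system
print(system / data_rate)             # -> 1000.0 ops per byte, as on the slide
```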
17 Back End Processing
Block diagram: DDLs → Active Buffer → switch → L1 Farm → switch → L2 Farm → to archive
- Active Buffer (FPGAs): event tagging; data processing - local clustering, regional tracklet finding
- L1 Farm (FPGAs and DSPs): global tracking, L1 trigger
- L2 Farm (PCs): L2 trigger, raw data formatting
18 Back End Data Flow
Block diagram: DDLs from the FEE (1 TB/sec) → Active Buffer → L1 switch (200 GB/sec) → L1 Farm → L2 switch (10 GB/sec) → L2 Farm → to archive (1 GB/sec)
- Neighbor communication between Active Buffer boards for the regional algorithms
19 Nice Scheme...
- ... but how to implement the needed bandwidth for
  - near-neighbor communication
  - event building at the L1 level?
20 Crates and Backplanes
- Trend: use serial point-to-point links (10 Gbps SERDES in CMOS); parallel shared-media busses are obsolete
- Look for serial backplane fabrics
- Backplanes: what is available today/tomorrow?
  - PICMG 2.16: C-PCI, dual 1G Ethernet star (available)
  - PICMG 2.17: C-PCI, 4 × 622 Mbps star (available)
  - PICMG 2.20: C-PCI, 2.5 Gbps mesh (2.16 + 2.20 combinations announced)
  - VITA 41 (VXS): VME, 4 × 10 Gbps dual star (Infiniband over the P0 connector)
- What is in the pipe?
  - ATCA (Advanced Telecommunications Computing Architecture)
    - Base Interface: dual 1G Ethernet star
    - Fabric Interface: 8 × 10 Gbps star or mesh
21 Fabric Types
- Dual star: nodes communicate via a switch; 2n links needed
  - PICMG 2.16 (cPSB): 2 fabric slots for 24-port switches, 18 node slots, 72 Gbps bandwidth
- Full mesh: nodes communicate directly; n(n-1) links needed
  - PICMG 2.20 (cSMB): 16 slots full mesh, 2.5 Gbps links, 700 Gbps bandwidth
(The link counting is illustrated in the sketch below.)
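A small sketch of the link counting behind these numbers. The 72 Gbps of PICMG 2.16 is reproduced by counting each of the 36 node-to-switch links at 2 Gbps (full-duplex 1G Ethernet); for the mesh, the aggregate bandwidth depends on how lanes and duplex traffic are counted, so only the link count is shown.

```python
# Link counting behind the fabric numbers on this slide.
def dual_star_links(n_nodes):
    return 2 * n_nodes              # each node slot connects to both switch slots

def full_mesh_links(n_slots):
    return n_slots * (n_slots - 1)  # directed node-to-node links, as counted on the slide

# PICMG 2.16 (cPSB): 18 node slots, 1G Ethernet full duplex -> 2 Gbps per link
links_216 = dual_star_links(18)
print(links_216, "links,", links_216 * 2, "Gbps aggregate")   # 36 links, 72 Gbps

# PICMG 2.20 (cSMB): 16 slots full mesh; the aggregate bandwidth then depends on
# how lanes and duplex traffic are counted
print(full_mesh_links(16), "directed links")                  # 240 directed links
```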
22Active Buffer Board
DDL
DDL
DDL
DDL
L1L
FPGA
FPGA
FPGA
FPGA
FPGA
Mem
Mem
Mem
Mem
L2L
4
4
4
4
To 1G Ether dual star backplane
To serial mesh backplane
Assume cSMB and cPSB available
23 Active Buffer Crate
Crate diagram annotations:
- DDL: 64 Gb/sec input
- cSMB: 70 GB/sec internal bandwidth; L1L: 32 Gb/sec duplex
- cPSB: 8 GB/sec internal bandwidth; L2L: 1-2 Gb/sec output
24Event Building Part I
1. Stage Collect global or partial
timestamp histogram
DDL
DDL
DDL
Active Buffer
Active Buffer
Active Buffer
Active Buffer
2. Stage Peak find
3. Stage Tag all hits, use detector specific
time window
- Histogram dataflow is modest
- Tagging dataflow almost negligible
- Can be used to throttle L1 data flow
- L1 routing decided at tagging time
- One hit can be part of multiple events !!
Runs over mesh and L1 net
Prune after tracking
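The following Python sketch illustrates the three tagging stages. The bin width, the peak threshold, the detector names, and the per-detector time windows are illustrative assumptions, not CBM parameters.

```python
# Sketch of the three event-tagging stages described on this slide.
# Bin width, peak threshold, detector names and time windows are
# illustrative assumptions, not CBM parameters.
from collections import Counter

BIN_NS = 100                                    # histogram bin width
WINDOW_NS = {"STS": 30, "TRD": 60, "RPC": 100}  # detector-specific tag windows

def bin_centre(b):
    return b * BIN_NS + BIN_NS // 2

def timestamp_histogram(hits):
    """Stage 1: histogram all hit time stamps (here a single global histogram)."""
    return Counter(h["t"] // BIN_NS for h in hits)

def find_peaks(hist, threshold=3):
    """Stage 2: bins whose occupancy reaches the threshold define event candidates."""
    return sorted(b for b, n in hist.items() if n >= threshold)

def tag_hits(hits, peaks):
    """Stage 3: tag every hit with all event candidates it is compatible with.
    Note that one hit can carry several tags (pruned only after tracking)."""
    for h in hits:
        h["tags"] = [p for p in peaks
                     if abs(h["t"] - bin_centre(p)) <= WINDOW_NS[h["det"]]]
    return hits

if __name__ == "__main__":
    hits = [{"det": "STS", "t": 1030}, {"det": "TRD", "t": 1040},
            {"det": "RPC", "t": 1085}, {"det": "STS", "t": 5020}]
    peaks = find_peaks(timestamp_histogram(hits))
    print(peaks, [h["tags"] for h in tag_hits(hits, peaks)])
    # -> [10] [[10], [10], [10], []]
```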
25 Event Building - Part II
- Stage 1: collect locally, via the mesh backplane, into one Active Buffer board
- Stage 2: collect globally, via the L1 links, into one processor
- Reduces the number of L1 transfers (between crates, not boards)
- The route is fixed when the event is tagged
- Allows to factorize the L1 switch: use 8 × 256 Gbps switches, avoid 1 × 2 Tbps switch
Block diagram: Active Buffer boards on mesh backplanes → L1 switches → farm sectors 1-4
26 L1 Processor Board
Block diagram: L1L input → FPGA + memory pairs → DSPs; L2L to the 1G Ethernet dual star backplane; connection to the serial mesh backplane (not needed for event-parallel algorithms)
- Emphasize FPGA or DSP as you wish
- The mesh helps to factorize the L1 switch
27 L1 Processor Crate
Crate diagram annotations:
- cSMB: 70 GB/sec internal bandwidth; L1L: 32 Gb/sec duplex
- cPSB: 8 GB/sec internal bandwidth; L2L: 1-2 Gb/sec output
28 Back End Data Flow
- DDLs from the FEE: 1024 links × 10 Gbps
- Active Buffer: 256 boards in 16 crates
- L1 switch: 256 links × 10 Gbps in, 256 links × 10 Gbps out; can be factorized into 8 (16) switches with 64 (32) ports each, 10 Gbps per port
- L1 Farm: 256 boards in 16 crates
- L2 switch: 1 or a few switches with 48 × 10G Ethernet in (16-32 links from the Active Buffers, 16 links from the L1 Farm) and 20 × 10G Ethernet out to the L2 Farm
29 Back End Essentials I
- Use as few networks as possible
  - Detector Data Links
  - L1 network
  - L2 network
- Use as few protocol stacks as possible
  - Lightweight (on the DDLs and the L1 net)
  - Ethernet / IP (on the L2 net)
- Provide enough bandwidth
  - Then a versatile back end can be built from a few building blocks
30 Back End Essentials II
- Split processing into
  - Local (hit level)
  - Regional (cluster, tracklet)
  - Global (track, vertex)
- Gain density by using the most efficient compute platform (compared below)
  - FPGA: 20 mW per Gops/sec
  - DSP: 1 W per Gops/sec
  - PC: 20 W per Gops/sec
- High density automatically gives good connectivity in modern backplanes
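Scaling the per-Gops figures quoted above to the roughly 1 Pops/sec assumed on slide 16 shows why the platform choice matters:

```python
# Power needed to deliver the ~1 Pops/sec assumed on slide 16, per platform,
# using the per-Gops figures quoted on this slide.
WATTS_PER_GOPS = {"FPGA": 0.02, "DSP": 1.0, "PC": 20.0}
TOTAL_GOPS = 1e6   # 1 Pops/sec = 1e6 Gops/sec

for platform, watts in WATTS_PER_GOPS.items():
    print(f"{platform}: {TOTAL_GOPS * watts / 1e3:.0f} kW")
# FPGA: 20 kW,  DSP: 1000 kW,  PC: 20000 kW
```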
31 Conclusions
- An SDPA approach seems feasible for CBM, despite the daunting data rates
- There is a large synergy potential between CBM, PANDA, and the S-FRS
- AGATA already uses an SDPA; the rest of the S-FRS community will probably follow sooner or later
- PANDA is committed to using an SDPA (under the title S-DAQ)
- The experiments' time scales differ somewhat, but that can also be an opportunity
32 Conclusions
- The central element of such an architecture is the clock and time distribution
- Many other details, like link and crate technologies, can and will evolve with time
33 Main R&D Fronts
- Low-jitter clock distribution (ToF quality)
- Front end
  - ASICs are often needed (density, radhard), but avoid being too detector specific
- Back end hardware
  - Explore serial connection technologies (links, backplanes, switches, protocols)
  - Standardize, follow standards
  - Define a small set of building blocks (enough backbone bandwidth keeps the designs simple)
34 Main R&D Fronts (continued)
- Back end config/firm/software
  - Modern hardware is often tool limited
  - Investigate development tools
  - Develop parallelized algorithms (essential for using FPGAs efficiently)
  - Learn how to efficiently use a mix of
    - FPGAs (with embedded CPUs)
    - DSPs
    - PCs
- Handling of fault tolerance, monitoring, setup and slow control, ...