Title: High Speed Data Acquisition and Trigger
1 High Speed Data Acquisition and Trigger
2 Front End / DAQ / Trigger Requirements
- The overall architecture of front end electronics, DAQ, and trigger is mainly determined by the trigger requirements
- Main CBM triggers:
  - J/ψ
  - Open charm
  - Low mass di-leptons
  - Long lived resonances (e.g. ?)
- Trigger on 2e needs PID and tracking; near threshold this is a major challenge
- Trigger on D pairs via one semi-leptonic decay: D → Kπ with a displaced vertex
3 DAQ Architecture - Antique
Block diagram: Detector → Front end → Delay → ADC / TDC (gated) → DAQ (LAM readout); extra trigger signal path: Front end → Trigger → Gate
- The delay compensates the trigger latency; the max. latency is limited by the delay cable length
- Digitization happens only after the trigger (gate)
- The DAQ transports selected events to the archive
4 DAQ Architecture - Collider Style
Block diagram: Detector → Front end → ADC (clocked at the bunch crossing frequency) → Pipeline → DAQ; extra trigger data path: L1 Trigger → Accept
- Digitize each bunch crossing; dead time free
- The pipeline compensates the L1 trigger latency; often of limited size
- Often a fixed max. latency (CMS: 4 μsec)
- The DAQ transports L1-accepted events to the L2 trigger / archive
5 Triggered Front End - Limitations
- Sometimes difficult to build a proper L0
  - E.g. in nuclear decay spectroscopy, where a true minimum bias is needed to see delayed decays
- Buffering in the front end is a strong constraint that is difficult to change (and upgrade)
- Sometimes difficult to build an L1 with a short (and/or fixed) latency, especially when several triggers run in parallel
  - E.g. PANDA: triggers are on physics channels, there is no obvious fast selection criterion
6 DAQ Architecture - Future: Self-Triggered Data Push Architecture
Block diagram: Detector → Front end ADC (clock) → Buffer memory → Event builder and selector
- Self-triggered digitization; dead time free
- The clock provides an absolute time scale
- Each hit is transported as Address / Timestamp / Value (a minimal sketch follows below)
- The buffer memory compensates the builder/selector latency; practically unlimited size
- The event builder and selector use the time correlation of hits to define events, then select and archive
- The max. latency is uncritical; the average latency is what matters
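As a concrete illustration of the data-push idea, here is a minimal Python sketch of an Address/Timestamp/Value hit record and of grouping hits purely by time correlation. The field names, the clock tick, and the gap-based coincidence window are illustrative assumptions, not part of the proposal.

```python
# Minimal sketch of the Address/Timestamp/Value hit format and of grouping
# hits purely by time correlation. Field names, the clock tick and the
# coincidence window are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Hit:
    address: int    # channel / cell identifier
    timestamp: int  # absolute time in clock ticks (e.g. 10 ns per tick)
    value: int      # measured amplitude / charge

def build_events(hits, gap_ticks=10):
    """Group time-sorted hits into events: a new event starts whenever the
    gap to the previous hit exceeds gap_ticks. No trigger signal is used."""
    events, current = [], []
    for hit in sorted(hits, key=lambda h: h.timestamp):
        if current and hit.timestamp - current[-1].timestamp > gap_ticks:
            events.append(current)
            current = []
        current.append(hit)
    if current:
        events.append(current)
    return events

if __name__ == "__main__":
    hits = [Hit(3, 105, 42), Hit(7, 102, 17), Hit(3, 500, 99), Hit(1, 503, 12)]
    print([[h.address for h in ev] for ev in build_events(hits)])   # [[7, 3], [3, 1]]
```

The point is that no trigger signal enters the grouping: the time stamps alone define the events.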
7 Advantages of SDPA
- No special trigger data path
- No fixed maximal L1 trigger latency
- Leaner, more versatile front end
- Easier upgrade path for the back end
- All signals can contribute to all trigger levels (the limit is crunch power, not connectivity)
- Highest flexibility for trigger development
This is the main reason why PANDA is committed to use such an architecture.
8 Planned Experiments with SDPA
- AGATA (Advanced Gamma Tracking Array) - DC beam, completion 2008
  - 190 Ge detectors, 6780 channels
  - 300k events/sec @ M = 30
  - 1 GB/sec into the reconstruction farm
- BTeV (Charm / Beauty Decays at FNAL) - bunched beam, completion 2007
  - 2.2×10^7 pixels, RICH, ECAL, ...
  - 7.6 MHz bunch crossing
  - 1 TB/sec into the L1 buffer memories
  - L1 trigger on displaced vertices
9 CBM Challenges
- Large primary data rate: 1 TB/sec (≈ 10^8 ISDN lines, or 5000 × 2 Gbps links)
  - 10^7 int/sec × 200 part/int (2×10^9 part/sec, min. bias)
  - 50 layers × 10 byte/hit (mostly TRD)
- Large trigger decision rate: 10^7/sec
- Large computing power required
  - Assume 100 ops/byte
  - 100 Tops/sec or 10^14 ops/sec
  - 10^4 PCs with 10 GHz? 1 MW of power?
The rates are worked out in the short calculation below.
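A back-of-envelope check of the quoted rates, using only the numbers on this slide:

```python
# Back-of-envelope check of the rate numbers quoted on this slide.
int_rate      = 1e7    # interactions per second
part_per_int  = 200    # charged particles per minimum-bias interaction
layers        = 50     # detector layers (hits per particle)
bytes_per_hit = 10     # bytes per hit
ops_per_byte  = 100    # assumed processing cost

particles = int_rate * part_per_int                # 2e9 part/sec
data_rate = particles * layers * bytes_per_hit     # 1e12 byte/sec = 1 TB/sec
compute   = data_rate * ops_per_byte               # 1e14 ops/sec  = 100 Tops/sec
print(f"{particles:.1e} part/sec, {data_rate:.1e} byte/sec, {compute:.1e} ops/sec")
```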
10 CBM Time Scale
Added on 14.11.02
- SIS 200 completion > Q4 2011
- CBM installation > 2010/2011
- CBM production / test > 2008/2009
- Plan with the technology of 2007
- SIA forecast for 2007:
  - 0.065 μm process (0.13 μm today)
  - Logic a factor 4 denser and faster
- So expect, for example:
  - 10 Gbps serial link speed - connectivity (optical and Cu at low cost)
  - 1 GHz FPGA clock speed - speed
  - >100 kLC FPGAs - density (at acceptable cost; today ~5000)
11 Proposal for a CBM DAQ Architecture
- Inspired by (and scaled from) BTeV
  - Key difference: DC beam
  - CBM needs explicit event building: evaluate time correlation, tag hits with an event tag
  - BTeV (et al.) uses implicit event building: tag with the bunch crossing number in the front end
- Front end inspired by AGATA
- Meant to demonstrate feasibility - don't take the details too seriously!
12 Front End Electronics
Block diagram: Detector → Analog front end → Sampling ADC → Hit detector / Parameter estimate → Mux / Cluster finder → Data link interface (DDL); a Timing Link interface (TL) distributes a low-jitter 100 MHz clock
- The low-jitter clock provides the absolute time scale (good enough for ToF?)
- The hit detector / parameter estimate delivers t, q, ... per channel and is connected to its neighbor cells; more channels feed the same mux / cluster finder
- The mux / cluster finder sends x, y, t, q, q-1, q+1, ... over the DDL
13 Front End Essentials
- Where useful and feasible, determine the time stamp to a fraction of the clock cycle (see the interpolation sketch below)
  - Helps event building, reduces pile-up
- Do cluster finding
  - Helps to reduce the data volume, which is otherwise dominated by address and time stamp
- Enough bandwidth
  - The DDL should never saturate, and there is no trigger to throttle down the data flow
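One common way to obtain a time stamp to a fraction of the clock cycle from a sampling ADC is linear interpolation of the threshold crossing on the leading edge. The sketch below assumes that method, a 100 MHz sampling clock, and made-up sample values; the actual front end may well use a different estimator.

```python
# Sketch: estimate the hit time to a fraction of a clock cycle by linearly
# interpolating where the sampled pulse crosses a threshold.
# The 100 MHz clock, the threshold and the samples are illustrative assumptions.
CLOCK_NS = 10.0   # 100 MHz sampling clock -> 10 ns per sample

def hit_time_ns(samples, threshold, t0_ticks):
    """Return the absolute hit time in ns: the coarse time stamp of the first
    sample plus the interpolated sub-sample position of the threshold crossing."""
    for i in range(1, len(samples)):
        if samples[i - 1] < threshold <= samples[i]:
            frac = (threshold - samples[i - 1]) / (samples[i] - samples[i - 1])
            return (t0_ticks + i - 1 + frac) * CLOCK_NS
    return None   # no threshold crossing found

# Example: a pulse rising between the 2nd and 3rd sample of the waveform
print(hit_time_ns([5, 8, 40, 90, 60], threshold=30, t0_ticks=1000))   # -> 10016.875
```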
14 CBM Radiation Hardness Requirements I
- Assume 10^7 int/sec and 5×10^7 sec on-time (1.5 yr at design luminosity)
- Assume 2×10^7 h cm^-2 correspond to about 1 rad
- A flux of 1 h cm^-2 per central interaction → a fluence of 1.25×10^14 h cm^-2 over the lifetime → a total dose of 6 Mrad (reproduced in the calculation below)
- Total dose (TID) based on CDR numbers, fluxes per central interaction taken from ATLAS:

    Detector     2°        6°
    TRD (5 m)    225 krad  75 krad
    RPC (10 m)   55 krad   20 krad

- COTS CMOS fails after 100 krad
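The 6 Mrad figure can be reproduced from the stated assumptions. The 2.5×10^6 central interactions per second are taken from the flux-to-rate conversion quoted on the next slide; the implied ~25% central fraction of the 10^7 int/sec is inferred rather than stated.

```python
# Reproduce the 6 Mrad figure from the assumptions on this slide.
cent_int_rate = 2.5e6    # central interactions per second (inferred, see slide 15)
on_time       = 5e7      # seconds of operation (~1.5 yr at design luminosity)
rad_per_h_cm2 = 1 / 2e7  # 2e7 hadrons/cm^2 correspond to about 1 rad

def total_dose_krad(flux_per_cent_int):
    """Total ionising dose for a given hadron flux per central interaction."""
    fluence = flux_per_cent_int * cent_int_rate * on_time   # hadrons/cm^2 over lifetime
    return fluence * rad_per_h_cm2 / 1e3                    # krad

print(total_dose_krad(1.0))   # 1 h/cm^2 per central int. -> 6250 krad, i.e. ~6 Mrad
```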
15 CBM Radiation Hardness Requirements II
- Assume 10^7 int/sec
- Assume a Single Event Upset cross section σ_SEU = 10^-10 cm^2 per device (measured for Virtex FPGAs)
- A flux of 1 h cm^-2 per central interaction → 2.5×10^6 h cm^-2 s^-1 → 22 SEU/day (see the calculation below)
- SEUs per day and per FPGA (multiply by the number of FPGAs to get the system rate):

    Detector     2°    6°
    TRD (5 m)    0.80  0.26
    RPC (10 m)   0.20  0.07

- Mitigation: reconfigure after each spill
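The 22 SEU/day follow directly from the flux and the measured cross section; a minimal check:

```python
# Reproduce the SEU rate quoted on this slide.
SIGMA_SEU   = 1e-10     # SEU cross section per device in cm^2 (Virtex measurement)
SECONDS_DAY = 86400.0

def seu_per_day(flux_per_cm2_s):
    """Single event upsets per day for one FPGA exposed to the given hadron flux."""
    return flux_per_cm2_s * SIGMA_SEU * SECONDS_DAY

print(seu_per_day(2.5e6))   # -> 21.6, i.e. ~22 SEU/day as on the slide
```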
16 Some Assumptions for the Back End
- 1 TB/sec of data over 1024 × 10 Gbps DDLs
- 1000 FPGAs with 100 kLC sufficient for local and regional processing
  - 100 kLC allows 1000 parallel ops
  - At 1 GHz this gives 1 Tops/sec per FPGA
  - Or a total of 1 Pops/sec in the system (1000 ops per byte; checked below)
- 1000 DSP/FPGA pairs enough for global L1 processing (average L1 latency 100 μsec; 500 kops/part, 200 kcyc/evt)
- Put 4 FPGAs (or DSP/FPGA pairs) on a board
  - The power consumption should allow that
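A quick consistency check of these numbers; the 1000 ops per byte is simply the system throughput divided by the input data rate:

```python
# Consistency check of the compute assumptions on this slide.
n_fpga        = 1000
ops_per_cycle = 1000    # 100 kLC assumed to allow 1000 parallel ops
clock_hz      = 1e9     # 1 GHz FPGA clock (2007 technology assumption)
data_rate     = 1e12    # 1 TB/sec over the 1024 x 10 Gbps DDLs

per_fpga = ops_per_cycle * clock_hz   # 1e12 ops/sec = 1 Tops/sec per FPGA
system   = per_fpga * n_fpga          # 1e15 ops/sec = 1 Pops/sec in the system
print(system / data_rate)             # -> 1000.0 ops per byte, as on the slide
```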
17 Back End Processing
Block diagram: DDLs → Active Buffer → switch → L1 Farm → switch → L2 Farm → to archive
- Active Buffer (FPGAs): event tagging; data processing - local clustering, regional tracklet finding
- L1 Farm (FPGAs and DSPs): global tracking, L1 trigger
- L2 Farm (PCs): L2 trigger, raw data formatting
18 Back End Data Flow
Block diagram: DDLs from the FEE (1 TB/sec) → Active Buffer → L1 switch (200 GB/sec) → L1 Farm → L2 switch (10 GB/sec) → L2 Farm → to archive (1 GB/sec)
- Neighbor communication between Active Buffer boards for the regional algorithms
19 Nice Scheme...
- ... but how to implement the needed bandwidth for
  - near-neighbor communication
  - event building at the L1 level?
20 Crates and Backplanes
- Trend: use serial point-to-point links (10 Gbps SERDES in CMOS); parallel shared-media busses are obsolete
- Look for serial backplane fabrics
- Backplanes: what is available today/tomorrow?
  - PICMG 2.16: C-PCI, dual 1G Ethernet star (available)
  - PICMG 2.17: C-PCI, 4 × 622 Mbps star (available)
  - PICMG 2.20: C-PCI, 2.5 Gbps mesh (2.16 + 2.20 combinations announced)
  - VITA 41 (VXS): VME, 4 × 10 Gbps dual star (Infiniband over the P0 connector)
- What is in the pipe?
  - ATCA (Advanced Telecommunications Computing Architecture)
    - Base Interface: dual 1G Ethernet star
    - Fabric Interface: 8 × 10 Gbps star or mesh
21 Fabric Types
- Dual star: nodes communicate via a switch; 2n links needed
  - PICMG 2.16 (cPSB): 2 fabric slots for 24-port switches, 18 node slots, 72 Gbps bandwidth
- Full mesh: nodes communicate directly; n(n-1) links needed
  - PICMG 2.20 (cSMB): 16 slots full mesh, 2.5 Gbps links, 700 Gbps bandwidth
(The link counting is illustrated in the sketch below.)
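A small sketch of the link counting behind these numbers. The 72 Gbps of PICMG 2.16 is reproduced by counting each of the 36 node-to-switch links at 2 Gbps (full-duplex 1G Ethernet); for the mesh, the aggregate bandwidth depends on how lanes and duplex traffic are counted, so only the link count is shown.

```python
# Link counting behind the fabric numbers on this slide.
def dual_star_links(n_nodes):
    return 2 * n_nodes              # each node slot connects to both switch slots

def full_mesh_links(n_slots):
    return n_slots * (n_slots - 1)  # directed node-to-node links, as counted on the slide

# PICMG 2.16 (cPSB): 18 node slots, 1G Ethernet full duplex -> 2 Gbps per link
links_216 = dual_star_links(18)
print(links_216, "links,", links_216 * 2, "Gbps aggregate")   # 36 links, 72 Gbps

# PICMG 2.20 (cSMB): 16 slots full mesh; the aggregate bandwidth then depends on
# how lanes and duplex traffic are counted
print(full_mesh_links(16), "directed links")                  # 240 directed links
```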
22Active Buffer Board
DDL
DDL
DDL
DDL
L1L
FPGA
FPGA
FPGA
FPGA
FPGA
Mem
Mem
Mem
Mem
L2L
4
4
4
4
To 1G Ether dual star backplane
To serial mesh backplane
Assume cSMB and cPSB available
23 Active Buffer Crate
Crate diagram annotations:
- DDL: 64 Gb/sec input
- cSMB: 70 GB/sec internal bandwidth; L1L: 32 Gb/sec duplex
- cPSB: 8 GB/sec internal bandwidth; L2L: 1-2 Gb/sec output
24Event Building Part I
1. Stage Collect global or partial
timestamp histogram
DDL
DDL
DDL
Active Buffer
Active Buffer
Active Buffer
Active Buffer
2. Stage Peak find
3. Stage Tag all hits, use detector specific
time window
- Histogram dataflow is modest
- Tagging dataflow almost negligible
- Can be used to throttle L1 data flow
- L1 routing decided at tagging time
- One hit can be part of multiple events !!
Runs over mesh and L1 net
Prune after tracking
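The following Python sketch illustrates the three tagging stages. The bin width, the peak threshold, the detector names, and the per-detector time windows are illustrative assumptions, not CBM parameters.

```python
# Sketch of the three event-tagging stages described on this slide.
# Bin width, peak threshold, detector names and time windows are
# illustrative assumptions, not CBM parameters.
from collections import Counter

BIN_NS = 100                                    # histogram bin width
WINDOW_NS = {"STS": 30, "TRD": 60, "RPC": 100}  # detector-specific tag windows

def bin_centre(b):
    return b * BIN_NS + BIN_NS // 2

def timestamp_histogram(hits):
    """Stage 1: histogram all hit time stamps (here a single global histogram)."""
    return Counter(h["t"] // BIN_NS for h in hits)

def find_peaks(hist, threshold=3):
    """Stage 2: bins whose occupancy reaches the threshold define event candidates."""
    return sorted(b for b, n in hist.items() if n >= threshold)

def tag_hits(hits, peaks):
    """Stage 3: tag every hit with all event candidates it is compatible with.
    Note that one hit can carry several tags (pruned only after tracking)."""
    for h in hits:
        h["tags"] = [p for p in peaks
                     if abs(h["t"] - bin_centre(p)) <= WINDOW_NS[h["det"]]]
    return hits

if __name__ == "__main__":
    hits = [{"det": "STS", "t": 1030}, {"det": "TRD", "t": 1040},
            {"det": "RPC", "t": 1085}, {"det": "STS", "t": 5020}]
    peaks = find_peaks(timestamp_histogram(hits))
    print(peaks, [h["tags"] for h in tag_hits(hits, peaks)])
    # -> [10] [[10], [10], [10], []]
```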
25 Event Building - Part II
- Stage 1: collect locally, via the mesh backplane, into one Active Buffer board
- Stage 2: collect globally, via the L1 links, into one processor
- Reduces the number of L1 transfers (between crates, not boards)
- The route is fixed when the event is tagged
- Allows to factorize the L1 switch: use 8 × 256 Gbps switches, avoid 1 × 2 Tbps switch
Block diagram: Active Buffer boards on mesh backplanes → L1 switches → farm sectors 1-4
26 L1 Processor Board
Block diagram: L1L input → FPGA + memory pairs → DSPs; L2L to the 1G Ethernet dual star backplane; connection to the serial mesh backplane (not needed for event-parallel algorithms)
- Emphasize FPGA or DSP as you wish
- The mesh helps to factorize the L1 switch
27 L1 Processor Crate
Crate diagram annotations:
- cSMB: 70 GB/sec internal bandwidth; L1L: 32 Gb/sec duplex
- cPSB: 8 GB/sec internal bandwidth; L2L: 1-2 Gb/sec output
28 Back End Data Flow
- DDLs from the FEE: 1024 links × 10 Gbps
- Active Buffer: 256 boards in 16 crates
- L1 switch: 256 links × 10 Gbps in, 256 links × 10 Gbps out; can be factorized into 8 (16) switches with 64 (32) ports each, 10 Gbps per port
- L1 Farm: 256 boards in 16 crates
- L2 switch: 1 or a few switches with 48 × 10G Ethernet in (16-32 links from the Active Buffers, 16 links from the L1 Farm) and 20 × 10G Ethernet out to the L2 Farm
29 Back End Essentials I
- Use as few networks as possible
  - Detector Data Links
  - L1 network
  - L2 network
- Use as few protocol stacks as possible
  - Lightweight (on the DDLs and the L1 net)
  - Ethernet / IP (on the L2 net)
- Provide enough bandwidth
  - Then a versatile back end can be built from a few building blocks
30 Back End Essentials II
- Split processing into
  - Local (hit level)
  - Regional (cluster, tracklet)
  - Global (track, vertex)
- Gain density by using the most efficient compute platform (compared below)
  - FPGA: 20 mW per Gops/sec
  - DSP: 1 W per Gops/sec
  - PC: 20 W per Gops/sec
- High density automatically gives good connectivity in modern backplanes
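Scaling the per-Gops figures quoted above to the roughly 1 Pops/sec assumed on slide 16 shows why the platform choice matters:

```python
# Power needed to deliver the ~1 Pops/sec assumed on slide 16, per platform,
# using the per-Gops figures quoted on this slide.
WATTS_PER_GOPS = {"FPGA": 0.02, "DSP": 1.0, "PC": 20.0}
TOTAL_GOPS = 1e6   # 1 Pops/sec = 1e6 Gops/sec

for platform, watts in WATTS_PER_GOPS.items():
    print(f"{platform}: {TOTAL_GOPS * watts / 1e3:.0f} kW")
# FPGA: 20 kW,  DSP: 1000 kW,  PC: 20000 kW
```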
31 Conclusions
- An SDPA approach seems feasible for CBM, despite the daunting data rates
- There is a large synergy potential between CBM, PANDA, and the S-FRS
- AGATA already uses an SDPA; the rest of the S-FRS community will probably follow sooner or later
- PANDA is committed to using an SDPA (under the title S-DAQ)
- The experiments' time scales differ somewhat, but that can also be an opportunity
32 Conclusions
- The central element of such an architecture is the clock and time distribution
- Many other details, like link and crate technologies, can and will evolve with time
33 Main R&D Fronts
- Low-jitter clock distribution (ToF quality)
- Front end
  - ASICs are often needed (density, radhard), but avoid being too detector specific
- Back end hardware
  - Explore serial connection technologies (links, backplanes, switches, protocols)
  - Standardize, follow standards
  - Define a small set of building blocks (enough backbone bandwidth keeps the designs simple)
34 Main R&D Fronts (continued)
- Back end config/firm/software
  - Modern hardware is often tool limited
  - Investigate development tools
  - Develop parallelized algorithms (essential for using FPGAs efficiently)
  - Learn how to efficiently use a mix of
    - FPGAs (with embedded CPUs)
    - DSPs
    - PCs
- Handling of fault tolerance, monitoring, setup and slow control, ...