Title: (A Taste of) Data Acquisition, Triggers, and Controls
Gregory Dubois-Felsmann (Caltech), CHEP 2003
Routine disclaimer
- Much interesting material presented
- About 35 talks (13.5 hours, 100 MB as uploaded) from many experiments, covering a rich variety of issues. Many thanks to the speakers!
- Fitting this into 25 minutes requires procedures familiar to the DAQ community: feature extraction
- and, unfortunately, also triggering with a decidedly imperfect data acquisition system
- So I apologize in advance for the things I've missed or perhaps misunderstood!
Outline
- Overview of talks presented
- Some technological themes
- Trigger architectural issues
- The great challenge for the next few years: scaling
- Conclusions
Talks presented
W. Badgett CDF Run II Data Acquisition
R. Rechenmacher Run II DZERO DAQ / Level 3 Trigger System
S. Luitz The BaBar Event Building and Level-3 Trigger Farm Upgrade
R. Itoh Upgrade of Belle DAQ System
A. Polini The Architecture of the ZEUS Micro Vertex Detector DAQ and Second Level Global Track Trigger
J. Schambach STAR TOF Readout Electronics and DAQ
R. Divià Challenging the challenge: Handling data in the Gigabit/s range (ALICE)
J. Gutleber, L. Orsini XDAQ: Real Scale Application Scenarios (CMS et al.)
G. Lehmann The DataFlow of the ATLAS Trigger and Data Acquisition System
S. Stancu The use of Ethernet in the DataFlow of the ATLAS Trigger DAQ
S. Gadomski Experience with multi-threaded C++ applications in the ATLAS DataFlow
Talks presented
A. Ceseracciu A Modular Object Oriented Data Acquisition System for the Gravitational Wave AURIGA experiment
R. Mahapatra Cryogenic Dark Matter Search Remote Controlled DAQ
T. Steinbeck A Software Data Transport Framework for Trigger Applications on Clusters (ALICE)
T. Higuchi Development of PCI Bus Based DAQ Platform for Higher Luminosity Experiments (e.g., Super-Belle)
J. Mans Data Acquisition Software for CMS HCAL Testbeams
B. Lee An Impact Parameter Trigger for DØ
G. Comune The Algorithm Steering and Trigger Decision mechanism of the ATLAS High Level Trigger
V. Boisvert The Region of Interest Strategy for the ATLAS Second Level Trigger
S. Wheeler Supervision of the ATLAS High Level Triggers
Talks presented
- Thursday parallel session 5: DAQ and controls
J. Kowalkowski Understanding and Coping with Hardware and Software Failures in a Very Large Trigger Farm (BTeV)
M. Gulmini Run Control and Monitor System for the CMS Experiment
S. Kolos Online Monitoring Software Framework in the ATLAS Experiment
G. Watts DAQ Monitoring and Auto Recovery at DØ
K. Maeshima Online Monitoring for the CDF Run II Experiment
M. Gonzalez Berges The Joint COntrols Project Framework (CERN multi-expt., LHC et al.)
S. Lüders The Detector Safety System for LHC Experiments
J. Hamilton A Generic Multi-node State Monitoring System (BaBar)
V. Gyurjyan FIPA Agent Based Network Distributed Control System (JLAB)
M. Elsing Configuration of the ATLAS Trigger System
In parallel with session 5a, unfortunately
Talks presented
- Thursday parallel session 5a: first-level triggers
- Related plenary talks
G. Grastveit FPGA Co-processor for the ALICE High Level Trigger
B. Scurlock A 3-D Track-Finding Processor for the CMS Level-1 Muon Trigger
P. Chumney Level-1 Regional Calorimeter System for CMS
F. Meijers The CMS Event Builder
M. Grothe Architecture of the ATLAS High Level Trigger Event Selection Software
Projects represented
- Strong emphasis on LHC, continuing a recent trend (by number of talks)
Some technological themes
- Triumph of C++ for HEP DAQ confirmed
- Along with Java for GUIs
- Triumph of commodity computing hardware (Intel IA-32) and operating system (Linux)
- Large-farms-of-small-boxes model confirmed
- Near-triumph of commodity networking hardware
- Fast and Gigabit Ethernet, standard commercial switches
More technological themes
- Continuation of the long trend of reducing the scope of application of custom hardware
- Yet most DAQ software is still custom in present experiments
- Serious efforts to find what is generic in DAQ programming
- Not just to isolate patterns (knowledge applicable by others) but also actual programming toolkits
- Continuation of the trend of moving offline code and/or frameworks into high-level triggers
- New widespread use of XML for non-event information (configuration, monitoring)
- Many of the major new challenges relate to scaling to huge farms
- Performance
- Operability, control, monitoring
Programming languages
- C++ has to a large extent proven itself
- Some current experiments have DAQ systems written from scratch almost entirely in C++, including real-time code running in an embedded RTOS environment
- E.g., BaBar DataFlow, feature extraction, Level 3 trigger, and the rest of the online system written in serious OO C++ on VxWorks, Solaris, Linux
- Achieves virtually zero deadtime at 5.5 kHz L1 Accept rate on 1997-vintage 300 MHz Motorola SBCs and 1.4 GHz Linux P-IIIs [Luitz]
- New projects are fairly uniformly continuing to adopt it for code in the event data flow path
- Caveats remain, though usually worth the cost
- Executable size
- Dependency management seems to remain a challenge
- Non-trivial work in creating shareables
- Ease with which naïve users can write non-performant code
- Threading requires care (see below)
- Even sees use in hardware trigger FPGA coding (see below) [Scurlock, CMS]
BaBar online and DAQ system
Programming languages II
- Java has emerged as the other major player
- Especially for graphical applications
- E.g., run control GUIs
- Good points
- (Some say) ease of programming vs. C++
- Universal availability, including a rich GUI graphics library
- Simple API for remote object programming (RMI)
- Caveats
- Performance (although results vary considerably)
- JVM quality-of-implementation, platform (non-)independence
- No other real competitors on the horizon except for niche applications
- Saw appearances of Python, LISP, etc.
Computing hardware and operating systems
- Farms of Intel IA-32 / Linux 1- and 2-CPU machines are the coin of the realm today; tomorrow?
- Linux will continue to be!
- Speakers had little to say about CPU chips, except that they will buy the most cost-effective farms they can shortly before each major project goes into final commissioning
- The Intel Itanium line may get some traction by then
- Buy-late is a big Moore's Law win, but see scaling concerns below
- The Linux success is particularly striking
- In use in essentially every HEP role, HLT through laptops
- Even approaching in the embedded world (PCI DAQ component development for Super-KEK-B et al. [Higuchi])
- Still some lingering attachment to other Unix flavors for disk servers (though Linux-based IDE RAID storage is also becoming very common, in the offline world, too)
- Linux is so successful in HEP that cross-platform portability may erode
Linux in the embedded DAQ card world
Multithreading issues
- Language and library level
- Need to stay aware of serialization from locking mechanisms used in outside libraries
- Example: the C++ Standard Library containers' memory pool by default uses a single lock, found to produce a 2x penalty in ATLAS HLT tests [Gadomski]
- O/S level
- Linux is not a real-time operating system
- Still no full implementation of POSIX threads
- Implementation of the pthread yield operation interacts poorly with time slicing in the scheduler (can't reschedule immediately)
- Found to produce a 4x penalty in ATLAS tests [Gadomski]; kernel patch available
- See also under "offline code in the online world" below
Networking
- Commodity networking hardware!
- The various flavors of Ethernet (Fast, Gigabit, and beyond) have become the almost unchallenged fabric of higher-level triggering and event building (one major exception: CMS is still considering Myrinet)
- Standard protocols (TCP, UDP) also ascendant; some efforts to explore raw Ethernet
- All groups seem to be making some of the same discoveries, notably: network switches are not simple, transparent devices!
- Flow control and buffering behavior must be understood in detail
- Vendors can be cagey about the details (proprietary internal architecture)
- Need good tools to monitor traffic behavior
Networking adventures
Offline code in the online world
- Use of offline code in high-level software triggers
- the application framework
- or even offline reconstruction code
- Several current experiments and most future ones are doing this
- Problems
- Dependencies
- Performance: offline code has often not been exposed to the close scrutiny typical for online code, and may have axes of flexibility at odds with high performance (CPU cycles and memory utilization)
- Multithreading: offline code is almost never written to be thread-safe. This presents a problem when thread parallelism is needed in a high-level trigger
- Make a subset of the offline code, and its framework, thread-safe (ATLAS L2)?
- Replace threads with processes and a shared-memory data model (BaBar, D0)?
- The offline event loop model may not be directly usable (BaBar, D0, ATLAS)
- Benefits
- Greatly simplifies incorporation of trigger algorithms in simulations
- Eases development and validation of trigger algorithms
Genericity patterns and products
- We have always noticed: "The same ideas keep coming up, and the same problems have to be solved over and over again."
- We have always thought: "There must be something to be gained from applying that knowledge. Can generic problems be solved with generic tools?"
- We have tried in various ways
- Identifying patterns, learning how to think about these common problems, building up expert knowledge that can be applied to the next experiment, learning lessons
- That's what CHEP is all about
- But we also aspire to reuse: applying a software product in more than one place
Reusing concrete products
- Sounds great; has a mixed history
- In some places this has come to work well
- CERNLIB, GEANT4, ROOT are ubiquitous in HEP
- But there are real obstacles, chief among them:
- The difficulty of sharing code bases between experiments in different phases of development (example: the divergence of the BaBar and CDF versions of their originally shared application framework)
- The devil is in the details: often the high-level features of a system seem generic (the patterns), but the implementation picks up experiment-specific features
- Sometimes this is because of concern with compatibility with historical code
- Sometimes it arises when the high-level architecture turns out to need to be driven by some low-level optimization
- Perhaps in principle the high-level design could be extended to cover both users' needs, but the press of deadlines favors a quick hack that doesn't require renegotiation
Reuse in the online environment
- Reuse has been perhaps less successful on average in the online and DAQ worlds.
- Often online code is prepared later in the construction of an experiment (since simulation/reconstruction code is usually already required at the proposal stage), and thus under more time pressure.
- Online code tends to require more low-level optimizations. Often these come with serious tradeoffs that can limit the flexibility of a design, even to its in-house users.
- But perhaps the next round of experiments presents a rare opportunity to do better
- The LHC experiments are on the same time scale, and they still have a fair amount of time left.
- The use of a common language, O/S, and networking environment helps.
- There are some interesting projects under way!
Quest for generic online software
- Data acquisition
- XDAQ (arising from the CMS project) [Gutleber/Orsini]
- So far mostly being used to provide a common platform for several subdetectors' commissioning DAQ systems and to ease their integration into the main DAQ
- Exploring collaboration with other experiments
- Performance seems good.
- How CMS-free can it be kept over time, though?
Generic software for online
- Control and monitoring frameworks
- Lots of projects in this area; a couple of examples
- (see Thursday program for more)
- Inherently fault-tolerant architectures [Kowalkowski]
- Motivated by BTeV, but is at a very generic level, with CS collaborators viewing it as a general research project
- In early stages, but worth watching
- Generic monitoring frameworks
- Example: D0's XML-based distributed monitoring [Watts]
XML
- A fairly new trend: XML is cropping up all over in online configuration and monitoring applications
- Perhaps surprising? A not-very-compact, textual representation!
- But we are willing to spend some CPU and network bandwidth here (since other things in the systems require so much more)
- Benefits
- Avoids the "private toy language" problem, when combined with scripting tools
- No more hand-written "run card"-type parsers
- Easy to parse in many languages, and thus to pick an appropriate (and perhaps different) language for generating the data and for applying it
- Very easy to transmit over a network (byte stream)
- Aids in a) using existing generic tools (editors, validators) and b) allowing new tools we build to be more generic within HEP
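For illustration only, a run-configuration document in this style might look like the following (all element and attribute names are invented, not taken from any experiment):

```xml
<!-- Hypothetical run-configuration fragment: replaces a hand-parsed
     "run card" with a document any language can generate and validate. -->
<runConfig run="12345">
  <trigger level="3" prescale="1" table="physics_v7"/>
  <farm nodes="120" image="l3trig-2.4.1"/>
  <monitoring interval="10s" sink="monitor-collector"/>
</runConfig>
```

The cross-language benefit noted above is exactly this: the same document can be produced by a script, checked by a generic validator, and consumed by a C++ reader, with no private parser in sight.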
Triggering scope
Triggering
- Far too much information presented to cover in detail
- Remarkable things can now be done in commodity CPUs
- E.g., the ZEUS second-level tracking trigger / global tracking trigger [Polini]
- Silicon vertex tracking information is becoming absorbed into tracking triggers ahead of "Level 3"
- ZEUS
- Fermilab (for B physics efficiency) [Lee, D0 hardware]
ZEUS software tracking trigger
Hardware triggers
- Still indispensable at the first level
- Used in some places as adjuncts to the second level
- Good progress on testing and production for LHC experiments
- Scurlock: generating VHDL from C++ code eases production of a highly accurate board-level simulation of the trigger
B. Lee An Impact Parameter Trigger for DØ
G. Grastveit FPGA Co-processor for the ALICE High Level Trigger
B. Scurlock A 3-D Track-Finding Processor for the CMS Level-1 Muon Trigger
P. Chumney Level-1 Regional Calorimeter System for CMS
The CMS / ATLAS choice
F. Meijers The CMS Event Builder
G. Lehmann The DataFlow of the ATLAS Trigger and Data Acquisition System
V. Boisvert The Region of Interest Strategy for the ATLAS Second Level Trigger
- and many other talks on configuration and other details
- CMS baseline
- Build full events at the output of Level 1 (100 kHz, 1 MB events)
- Risk: this is a lot of data to handle. Able to fall back to a partial-readout Level 2 model
- ATLAS baseline
- L2 trigger operates on ROIs: nominally 2% of event data at the output of Level 1 (75 kHz, 1 MB events, 20 kB ROI data)
- Full event build at the L2 accept rate of 1 kHz, sent to the Event Filter (EF) farm
- Risk: not yet completely clear that small ROIs provide enough information. Able to shift the boundary between L2 and EF somewhat
- Both experiments finding present or readily foreseeable technology adequate
- at least at the level of individual subsystems; full-scale end-to-end tests beginning
Scaling
- Many issues remain to be confronted fully in building systems with many thousands of CPUs!
- Fault tolerance
- Overseeing huge, constantly-changing collections of active entities (too many for direct human oversight)
- Performance issues
- Image activation
- Calibration constant loading
- Configuration
- Global knowledge updates required to keep the system coherent
- File server and/or database contention?
An exotic tidbit
- A familiar reassurance to nervous newcomers: "Go ahead, type whatever you want; the worst that can happen is that we might have to reboot the computer."
- A cautionary tale from CDF
- Observed unexpected losses of silicon detector readout channels
- Proposed explanation: vibrational resonances due to Lorentz forces on digital power lines to the front-end chips
- Limits trigger rates
- Probably related to high deadtime setting up steady patterns
- Test stand results: simulating overloading the DAQ system, within a magnetic field
- Net result: physical damage can be caused by changing the trigger configuration!!!
(Figure: wire bonds up close, showing a good bond and a bond broken after enduring vibrational-resonance stress)
Regretfully omitted
- Developments in front-end DAQ electronics
- STAR TOF readout [Schambach]
- Importance of development tools
- Network performance monitoring (many)
- Thread debugging [Gadomski]
Conclusions
- Trigger/DAQ systems have kept up well with the demands of doing physics with the present generation of experiments
- Many new technologies and ideas have made this possible
- But we are entering an entirely new regime with experiments of the LHC scale, and must take care that we are not overwhelmed by complexity
- We should try hard to find ways to get realistic advance looks at systems integration and scaling before the last-minute bulk hardware buys
- Very interesting research and reduction-to-practice work lies ahead in the next few years; looking forward especially to CHEP 2003 + 2!