Title: ALICE DAQ Status, Interfaces and Perspectives
1. ALICE DAQ Status, Interfaces and Perspectives
- ALICE HLT review
- 13-14 January 2003
- P. Vande Vyvre - CERN/EP
2. Part I - DAQ Status
- TRG/DAQ/HLT hardware architecture
- Status and performance of the major components
- DAQ software architecture
- Results of the ALICE Data Challenge IV (ADC IV)
3. TRG/DAQ/HLT Architecture
[Diagram: the Central Trigger Processor and Local Trigger Crate distribute L1/L2a decisions over the TTC (TTC Rx) to the detector readout, and RoI data to the RoI Processor; event fragments leave via the DDL SIU, travel over the DDL to the DDL DIU and RORC in the LDC/FEP and on to the HLT farm; sub-events are assembled into events over the Event Building Network.]
4. DAQ Architecture
5. DAQ / HLT Data Flow
[Diagram: data flow from the Trigger through the HLT Network and HLT nodes, the Event Building Network, the GDCs and the Storage Network; the indicated rates are 25 GB/s, 2.5 GB/s and 1.25 GB/s at successive stages.]
6. Detector Data Link (DDL)
- Duplex point-to-point data link running at 100 MBytes/s
- During data-taking: open channel from detector to DAQ
- Current status
  - Hardware: SIU, DIU and PCI RORC boards
  - Software: DDL test software, DATE readout for the DDL
- Integration tools
  - VHDL simulation: DDL test module
  - Hardware test tools: SIU emulator, SIU extender
- Pre-series under production (200 MBytes/s)
7. DDL SIU
8. DDL Integration with FEE
- TPC RCU: first data transfer realized
- ITS Drift: 80,000 events transferred over DDL to DATE V4
[Diagram: readout chain CARLOS -> CARLOS_rx -> SIU -> DDL to RORC.]
9. Read-Out Receiver Card (RORC)
- DDL readout board
- Interfaces the DDL to the PCI bus of the LDC
- No local memory; fast transfer to PC memory
- DDL saturated for block sizes above 5 kBytes: 101 MBytes/s
- Event rate saturated for block sizes below 5 kBytes: 35000 events/s
- RORC handling overhead in the LDC: 28 µs (see the throughput model sketch below)
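As a rough illustration of these numbers (an assumption for illustration, not a measurement from the slides), a simple per-event cost model with a fixed 28 µs handling overhead and a 101 MBytes/s link reproduces the two regimes: rate-limited for small fragments, bandwidth-limited for large ones. In a pipelined DMA the overhead and the transfer overlap, so this serial model is only a lower bound.

```c
#include <stdio.h>

/*
 * Illustrative model only: per-event time is a fixed RORC/LDC handling
 * overhead plus the DDL transfer time of the fragment.
 */
int main(void)
{
    const double overhead_s = 28e-6;   /* per-event handling overhead, 28 us */
    const double link_bw    = 101e6;   /* DDL bandwidth, ~101 MBytes/s        */
    const double sizes[]    = { 512, 1024, 2048, 5120, 16384, 65536 };

    for (unsigned i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        double t_event    = overhead_s + sizes[i] / link_bw;  /* s per event */
        double rate       = 1.0 / t_event;                    /* events/s    */
        double throughput = sizes[i] * rate / 1e6;            /* MBytes/s    */
        printf("block %6.0f B : %8.0f events/s, %6.1f MBytes/s\n",
               sizes[i], rate, throughput);
    }
    return 0;
}
```

For very small blocks the rate approaches 1 / 28 µs, i.e. about 35000 events/s; for large blocks the throughput approaches the link bandwidth.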
10. RORC DDL DIU
11. LDC
- General-purpose computers
- Host one or more RORCs
- Large and cheap memory buffer
- Send the sub-events through the event-building network to the destination GDC
- Execute part of the event distribution and load balancing (see the sketch below)
- Scalability and staging
  - Multiplexing the DDLs inside 1 LDC
- Baseline: commodity twin-CPU PCs, using a widely-used operating system (currently Linux)
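The event distribution mentioned above has to make every LDC pick the same destination GDC for a given event without extra coordination traffic. A common way to do this, shown here as a hedged sketch and not as the actual DATE algorithm, is to derive the destination deterministically from the event number; the load-balancing variant and its inputs are assumptions.

```c
#include <stdint.h>

/* Plain round-robin: event N goes to GDC (N mod n_gdc). Every LDC computes
 * the same answer independently, so no protocol exchange is needed.        */
int destination_gdc(uint32_t event_number, int n_gdc)
{
    return (int)(event_number % (uint32_t)n_gdc);
}

/* Variant that skips GDCs flagged as unavailable (load balancing). The
 * enable flags would have to be distributed consistently to all LDCs,
 * e.g. by the run control.                                                 */
int destination_gdc_lb(uint32_t event_number, int n_gdc,
                       const int *gdc_enabled /* n_gdc flags */)
{
    int start = (int)(event_number % (uint32_t)n_gdc);
    for (int i = 0; i < n_gdc; i++) {
        int cand = (start + i) % n_gdc;
        if (gdc_enabled[cand])
            return cand;
    }
    return -1;  /* no GDC available: apply back-pressure upstream */
}
```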
12. GDC
- General-purpose computers
- Build whole events from the sub-events sent by the LDCs (see the sketch below)
- Ship them to the transient data storage
- Scalability and staging
  - Support for any number of GDCs without changing a single line of code
  - Capacity can be adjusted at any time to the experiment's requirements in terms of bandwidth and cost
- Baseline: commodity PCs, using a widely-used operating system (currently Linux)
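To make "build whole events from the sub-events" concrete, here is a hedged sketch of the bookkeeping a builder needs: collect sub-events keyed by event number and declare the event complete once one sub-event per participating LDC has arrived. The structure names and the fixed-size table are illustrative assumptions, not the DATE event builder.

```c
#include <stdint.h>
#include <string.h>

#define MAX_LDC   300        /* assumption: upper bound on LDCs        */
#define MAX_OPEN  1024       /* assumption: events being built at once */

struct open_event {
    uint32_t event_number;
    int      received;              /* sub-events collected so far     */
    int      expected;              /* LDCs participating in the event */
    int      have[MAX_LDC];         /* which LDCs already delivered    */
};

static struct open_event table[MAX_OPEN];

/* Returns 1 when the event identified by event_number is complete.
 * Slot collisions simply reinitialise the slot: fine for a sketch,
 * not for production.                                                 */
int add_subevent(uint32_t event_number, int ldc_id, int expected)
{
    struct open_event *e = &table[event_number % MAX_OPEN];

    if (e->event_number != event_number || e->expected == 0) {
        memset(e, 0, sizeof *e);    /* first fragment: open the event  */
        e->event_number = event_number;
        e->expected     = expected;
    }
    if (!e->have[ldc_id]) {
        e->have[ldc_id] = 1;
        e->received++;
    }
    return e->received == e->expected;
}
```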
13. Event Building Network (1)
- Event Building Network (EBN)
  - General-purpose network
  - Interconnects 300 LDCs to the GDC farm at 2.5 GBytes/s
  - No event-building features in itself, but used for data transfer
- Choice of communication media based on
  - Our own experience
    - Fast Ethernet (NA57)
    - R&D projects: ATM (RD31), SCI (RD24 and STAR)
  - Other experiments
    - ATM (CDF)
    - Fibre Channel (D0)
    - HiPPI (NA48)
    - Myrinet (STAR, CMS baseline)
    - Switched Ethernet (COMPASS, HARP, NA48 since 1998, BaBar; considered as baseline or fallback solution by all LHC experiments)
14. Event Building Network (2)
- Baseline: adopt broadly exploited standards
  - Communication medium: switched Ethernet
- Motivations
  - The performance of currently available Gigabit Ethernet switches is already adequate for the ALICE DAQ
  - Use of commodity items: network switches and interfaces
  - Easy (re)configuration and reallocation of resources
  - The same network can also be used for the DAQ services
15. Event Building Network (3)
[Diagram: switched-Ethernet event-building topology. Detector LDCs connect through per-detector switches (Sector 1-2 on Switch 1 up to Sector 35-36 on Switch 18, Pixel-Strips on Switch 19, Drift on Switch 20, Muon-PMD-TRG on Switch 21, TOF-HM-PHOS on Switch 22, TRD LDCs 211-224); GDCs and transient data storage (TDS) are served by 60 MB/s links, and a 1250 MB/s data link connects to the computing center.]
16. Event Building Network (4)
- Baseline: adopt broadly exploited standards
  - Transport protocol: TCP/IP (a minimal sender sketch follows this slide)
- Motivations
  - Reliable and stable transport service
    - Flow control handling
    - Lost packet handling
    - Congestion control
  - Verified for DAQ load during the ALICE Data Challenges
  - Industry mainstream
    - Guaranteed support from present and future industrial providers: operating systems, switches, interfaces
    - Constant improvements
  - Overhead already acceptable on the hosts' CPU
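For concreteness, a minimal sketch of what "sub-event transfer over TCP/IP" amounts to on an LDC: open a socket to the destination GDC and stream a length-prefixed block. The framing (a plain 32-bit length header) and the helper name are assumptions for illustration; they are not the DATE wire protocol.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hedged sketch: ship one sub-event to a GDC over TCP.
 * Returns 0 on success, -1 on error.                                   */
int send_subevent(const char *gdc_ip, uint16_t port,
                  const void *data, uint32_t size)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);
    if (inet_pton(AF_INET, gdc_ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    uint32_t hdr = htonl(size);                  /* length prefix       */
    const unsigned char *p = data;
    ssize_t n;

    if (write(fd, &hdr, sizeof hdr) != sizeof hdr)
        goto fail;
    for (uint32_t sent = 0; sent < size; sent += (uint32_t)n) {
        n = write(fd, p + sent, size - sent);    /* TCP handles flow    */
        if (n <= 0)                              /* control, retransmit */
            goto fail;                           /* and congestion      */
    }
    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

In practice a persistent connection per GDC would be kept open rather than reconnecting per sub-event; the point is only that flow control, retransmission and congestion control come for free from the transport layer.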
17. DAQ Software Architecture
- Protocol-less push-down strategy
  - System throttling by X-on/X-off signals
- Detector readout via a standard link (DDL) and standard software handler
- Light-weight multi-process synchronization (< 1 µs on a PIII 800 MHz); see the sketch below
- Common data-acquisition services
- Detector integration
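As an illustration of how sub-microsecond multi-process synchronization is achievable without any protocol exchange, here is a hedged sketch of a single-producer/single-consumer ring of buffer descriptors in shared memory, with an xoff flag for back-pressure (the push-down throttling). The layout and names are assumptions written with C11 atomics for brevity, not DATE internals.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 1024                 /* must be a power of two        */

/* One descriptor per data block already sitting in a shared pool.      */
struct descriptor {
    uint32_t offset;                   /* block offset in the data pool */
    uint32_t size;                     /* block size in bytes           */
};

/* Lives in shared memory, mapped by producer (readout) and consumer.   */
struct spsc_ring {
    _Atomic uint32_t head;             /* written by producer only      */
    _Atomic uint32_t tail;             /* written by consumer only      */
    _Atomic int      xoff;             /* consumer raises back-pressure */
    struct descriptor slot[RING_SIZE];
};

/* Producer side: push a descriptor; returns 0 if full or throttled.    */
int ring_push(struct spsc_ring *r, struct descriptor d)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (atomic_load_explicit(&r->xoff, memory_order_relaxed) ||
        head - tail == RING_SIZE)
        return 0;                      /* X-off or ring full            */

    r->slot[head & (RING_SIZE - 1)] = d;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side: pop the next descriptor; returns 0 if ring is empty.  */
int ring_pop(struct spsc_ring *r, struct descriptor *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return 0;
    *out = r->slot[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```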
18. DAQ Software Framework: DATE
- Development process
  - Target the system from the start
  - Release and document
- DATE (Data Acquisition and Test Environment)
  - Complete ALICE DAQ software framework
  - Data flow: detector readout, event building
  - Online physics data quality monitoring
  - System configuration, control and performance monitoring
  - Evolving with requirements and technology
- DATE current status
  - Used in the ALICE test beam areas (PS and SPS)
  - Performance tests during the ALICE Data Challenges (ADC)
  - DATE V4 released; documentation and training available
19. ADC IV Hardware Setup
[Diagram: test setup of Gigabit switches (3Com and Extreme Networks) interconnecting CPU and disk servers, with additional CPU servers on Fast Ethernet, the CERN backbone (4 Gbps) and 10 distributed tape servers. Total: up to 192 CPU servers (up to 96 on Gigabit Ethernet, 96 on Fast Ethernet), up to 36 disk servers, 10 tape servers.]
20. ADC IV DATE Run Control
21. Event Building Performance (1)
22. Event Building Performance (2)
- Event building with flat data traffic
- No recording
- 5 days non-stop
- 1750 MBytes/s sustained
23. Complete DAQ Performance
- Event building (DATE) and data recording (CASTOR)
- ALICE-like data traffic
- 7 days non-stop
- To tape
  - 200-300 MBytes/s sustained
  - 170 TBytes total (see the consistency check below)
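As a quick consistency check (not stated on the slide), a sustained rate near the upper end of the quoted range over 7 days indeed accumulates roughly the quoted total:

$$ 0.28\ \mathrm{GB/s} \times 7 \times 86400\ \mathrm{s} \approx 1.7 \times 10^{5}\ \mathrm{GB} \approx 170\ \mathrm{TB}. $$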
24. Part II - DAQ/HLT Interfaces
- HLT modes
- HLT functions
- HLT processing agents
- DAQ-HLT dataflow
- DAQ-HLT interfaces
- Switching modes of operation
25. HLT Modes and Functions
- HLT modes of operation
  - A: DAQ only, HLT disabled (monitoring possible)
  - B: DAQ, HLT parasitic
  - C: DAQ, HLT enabled
- HLT functions (see the sketch below)
  - Data reduction: modify the events by replacing pieces of information
  - Triggering: filter the events by accepting or rejecting full events
  - Data production
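To make the HLT functions concrete, here is a hedged sketch of the kind of decision record an HLT agent could attach to an event: accept/reject for triggering, optionally with a replacement payload for data reduction. The structure and field names are purely illustrative assumptions, not the ALICE HLT interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical HLT verdict attached to each event; not the real interface. */
enum hlt_verdict {
    HLT_ACCEPT,          /* triggering: keep the full raw event              */
    HLT_REJECT,          /* triggering: drop the event                       */
    HLT_REPLACE          /* data reduction: substitute processed data        */
};

struct hlt_decision {
    uint32_t         event_number;
    enum hlt_verdict verdict;
    const void      *replacement;      /* processed payload if HLT_REPLACE  */
    size_t           replacement_size; /* and its size, 0 otherwise         */
};

/* In mode C the dataflow would consult the decision before recording;
 * in mode B it would be computed but ignored; in mode A never produced.    */
int event_should_be_recorded(const struct hlt_decision *d)
{
    return d->verdict != HLT_REJECT;
}
```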
26. DATE V4
27. DAQ / HLT Data Flow
[Diagram: the dataflow chain with the software/firmware layer at each stage: detector readout (HLT firmware), DDL (DDL software), LDC with RORC (HLT software), HLT farm (HLT), Event Building Network (DATE), GDC (HLT software, ROOT I/O), Storage Network (CASTOR).]
28. What's new? (1)
- Cost-to-Completion
  - Construction cost funding shortfall
  - Large fraction covered by the funding agencies (FAs)
- Contingency plan
  - DAQ staging: 20% in 2006, 30% in 2007, 100% in 2008
  - Deferring 40% of the DAQ capacity: 1.1 MSFr
  - 40% loss in most statistics-limited signals
29. HLT Processing Agents
30. DAQ/HLT interface in the HLT-RORC
[Diagram: raw data arrive over the DDL at the DDL DIU of the HLT-RORC in the LDC and are delivered through the PCI interface; separate paths are shown for mode A and for modes B/C.]
31. Dataflow in the LDC
[Diagram: LDC dataflow in mode A.]
32. DAQ/HLT interface in the LDC
[Diagram: DAQ/HLT interface in the LDC for modes B/C.]
33. DAQ services for the HLT agent
34. DAQ/HLT interface in the GDC
35. Switching Mode of Operation
- Switching modes of operation
  - A: DAQ only, HLT disabled
  - B: DAQ, HLT parasitic
  - C: DAQ, HLT enabled
- No need for any (re)cabling or (re)routing
- Same binaries (see the sketch below)
  - To read out the front-end electronics
  - To handle the dataflow
- Same DAQ services
  - Control, bookkeeping, information logging
  - Load balancing
  - Data quality monitoring, performance measurement
- No compromises or run-time penalties as far as efficiency, flexibility and capabilities of the HLT and of the DAQ are concerned
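A hedged sketch of what "same binaries, mode chosen at run time" can look like: the mode is read from the run configuration at start of run and the dataflow code simply branches on it, so switching between A, B and C is a configuration change rather than a rebuild or recabling. The configuration key and function names are assumptions, not the DATE run-control interface.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* HLT operating modes as defined on the slide. */
enum hlt_mode { MODE_A_HLT_DISABLED, MODE_B_HLT_PARASITIC, MODE_C_HLT_ENABLED };

/* Hypothetical: read the mode from the run configuration at start of run,
 * so the very same binary serves all three modes.                         */
static enum hlt_mode get_hlt_mode(void)
{
    const char *m = getenv("HLT_MODE");          /* assumed config key     */
    if (m && strcmp(m, "B") == 0) return MODE_B_HLT_PARASITIC;
    if (m && strcmp(m, "C") == 0) return MODE_C_HLT_ENABLED;
    return MODE_A_HLT_DISABLED;                  /* default: DAQ only      */
}

/* Sketch of the per-event branch inside the dataflow. */
static void handle_event(enum hlt_mode mode, int hlt_accepted)
{
    switch (mode) {
    case MODE_A_HLT_DISABLED:                    /* never consult the HLT  */
        puts("record raw event");
        break;
    case MODE_B_HLT_PARASITIC:                   /* HLT sees a copy        */
        puts("record raw event; copy sent to HLT, decision ignored");
        break;
    case MODE_C_HLT_ENABLED:                     /* HLT decision counts    */
        puts(hlt_accepted ? "record event" : "discard event");
        break;
    }
}

int main(void)
{
    handle_event(get_hlt_mode(), /* hlt_accepted = */ 1);
    return 0;
}
```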
36. Part III - DAQ/HLT Perspectives
- Where are we?
- What's new?
- Implications for the ALICE DAQ and HLT
37. Where are we?
- Outcome of the HLT workshop in summer 2001
  - Physics case for the HLT
  - HLT running modes defined
  - DAQ to be upgraded to support the HLT
- Solomonic decision leading to the current DAQ/HLT architecture
  - Agreement on the software interface to the GDCs
  - Disagreement on the software interface in the LDCs
  - Separate farms for DAQ and HLT
  - Separate networks for DAQ and HLT
38. What's new? (2)
- Industry and the commodity market are ahead of the HEP needs
  - Millions of customers want to surf faster on the web (lighter TCP/IP)
  - Thousands of sites want to monitor large computer farms
- TCP/IP Offload Engine
  - Protocol developed 20 years ago (network bandwidth was a scarce resource compared to CPU cycles)
  - New generation of Network Interface Cards will execute part of the TCP/IP protocol
  - Products from Adaptec, Intel, SysKonnect, etc.
  - No change of architecture
  - Improvement of cost/performance
- Manageability and system health monitoring included in recent network chips (IPMI and ASF)
39. What's new? (3)
- Results of the Data Challenge
  - GBytes/s achievable today with existing commodity hardware and software (as predicted in the ALICE TP)
- 10 Gbit Ethernet
  - Standard (IEEE 802.3ae) approved in June 2002
  - Ethernet: 3 orders of magnitude in less than 10 years
  - Products available from several vendors
  - Enterasys products tested during ADC IV
- 10 Gbit Ethernet applications
  - LAN backbone (Computing Center to experimental pits)
  - WAN communication (Grids)
40. 10 Gigabit Ethernet test setup
[Diagram: ADC IV setup extended with 2 x 10 Gigabit switches (Enterasys) between the Gigabit switches (3Com, Extreme Networks), the CERN backbone (4 Gbps) and the 10 distributed tape servers. Total: up to 192 CPU servers (up to 96 on Gigabit Ethernet, 96 on Fast Ethernet), up to 36 disk servers, 10 tape servers.]
41. DAQ / HLT Components
[Diagram: component counts for the combined system: Trigger, HLT nodes, Event Building Network (565 ports), GDCs and Storage Network; the figures 400, 300, 265, 40 and 20 appear as counts for the individual components.]
42. DAQ/HLT Networking
[Diagram: combined DAQ/HLT network in which TPC LDCs and HLT nodes share the same switches (port allocations such as DAQ 1 / HLT 23 and DAQ 25 / HLT 1), with per-link rates of 60-100 MB/s; GDCs and transient data storage (TDS) connect to the 1250 MB/s data link to the computing center.]
43. Implications for DAQ and HLT
- Missing funding to reach the full ALICE performance
- No need for an ad-hoc and expensive network
  - Present DAQ and HLT traffic can be handled by one network
  - Output of the LDCs in heavy-ion running:
    - DAQ traffic to the GDCs (raw data): 2.5 GB/s
    - HLT traffic to the HLT farm (processed data): 2.2 GB/s
- Potential benefits
  - Huge savings on investment (780 kSFr) and human support
  - System more flexible
    - Resources reallocated by simple reconfiguration
    - New HLT options
  - Reliability improved
44. HLT in the GDC (1)
- The introduction of a cheaper commodity network will allow HLT after event building
  - Global pattern recognition
  - Correlation between detectors
- Number of GDCs completely flexible according to the bandwidth and CPU needs of the HLT algorithms
  - Resource allocation by reconfiguration
- Data access in the GDC (see the sketch below)
  - HLT on full events
  - HLT on a few sub-events (deferred event building)
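A hedged sketch of what "HLT on full events or on a few sub-events" could look like on the GDC side: the built event is handed to an HLT callback either whole or restricted to the sub-events of one detector. Structure and function names are illustrative assumptions only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a built event: an array of sub-events.          */
struct subevent {
    int         detector_id;           /* which detector produced it    */
    const void *data;
    size_t      size;
};

struct built_event {
    uint32_t               event_number;
    int                    n_subevents;
    const struct subevent *sub;
};

typedef int (*hlt_callback)(const struct subevent *sub, int n, void *ctx);

/* HLT on the full event: pass every sub-event to the algorithm.        */
int hlt_on_full_event(const struct built_event *ev,
                      hlt_callback algo, void *ctx)
{
    return algo(ev->sub, ev->n_subevents, ctx);
}

/* HLT on a few sub-events: pass only those of one detector, e.g. when
 * event building for the other detectors is deferred.                  */
int hlt_on_detector(const struct built_event *ev, int detector_id,
                    hlt_callback algo, void *ctx)
{
    struct subevent view[64];          /* assumption: bound on matches  */
    int n = 0;
    for (int i = 0; i < ev->n_subevents && n < 64; i++)
        if (ev->sub[i].detector_id == detector_id)
            view[n++] = ev->sub[i];
    return algo(view, n, ctx);
}
```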
45. HLT in the GDC (2)
46. HLT in the GDC (3)
47. Farm of clusters
48. Conclusions
- All major DAQ components developed, documented and demonstrated to work according to specs
- DAQ proposal of DAQ/HLT interfaces presented
  - Part of DATE V4
  - HLT possible at each stage of the DAQ dataflow
    - According to the physics needs
    - Without any performance compromise
    - Smooth mode switching
- No need for a special and expensive network
  - Commodity communication technologies: 10x performance gain
  - Dramatic cost reduction of the present HLT network possible
  - Open to new HLT options not considered so far