Title: The Performance and Scalability of the back-end DAQ sub-system
1. The Performance and Scalability of the back-end DAQ sub-system
- Igor SOLOVIEV
- CERN ATLAS DAQ/EF-1
2. Contents
- Introduction
  - ATLAS DAQ/EF P-1 project
  - back-end software overview and architecture
- Test Results
  - component test results
  - integrated back-end sub-system test results
- Summary and Future
3. ATLAS DAQ/EF P-1 Project
- Goal
  - to produce a prototype system representing a full slice of a DAQ, suitable for evaluating candidate technologies and architectures for the final ATLAS DAQ system
- Sub-systems
  - Detector Interface
  - Data-Flow
  - Event Filter
  - Back-end
- Status
  - Base-line system developed and working in a lab environment
  - Exploitation phase up to the TDR (2001)
  - To be used in the test beam (summer 2000)
4. Back-end Sub-system
- Is used to configure, control and monitor the DAQ system
- It excludes the management, processing and transportation of physics data
- It talks to all the other online systems
  - (glue of the experiment)
- More information
  - WWW pages: http://atddoc.cern.ch/Atlas/
  - "Impact of Software Review and Inspection", talk F331, today 17:50, Doris Burckhart
5. Back-end Architecture
- Components
  - split back-end software into groups with similar functionality (Core and TDAQ/detector integration components)
- Operational environment
  - heterogeneous collection of UNIX workstations, PCs and embedded systems (e.g. PPC on VME under real-time Lynx OS) connected via a local network
  - developed in C++ and ported to several compilers on Solaris, Linux, Lynx, HP-UX and Windows NT
- Design
  - use freeware and commercial software
  - Tools.h++, OODB, CORBA, CHSM, CLIPS, Motif/Java
6. Back-end core components
- Configuration Databases
  - describe all aspects of the configuration
- Information Service (IS)
  - general-purpose information exchange facility
- Message Reporting System (MRS)
  - allows software components to report messages in a distributed environment
- Process Manager (PMG)
  - performs distributed job control of components
- Run Control (RC)
  - controls configuration and data-taking operations
7. Component Unit Test Results
- Configuration Databases
  - used by many components during system start-up
  - tests done for different OKS configurations (single read-out crate, typical P-1 configuration, expected ATLAS DAQ configuration)
  - on an average workstation, loading the P-1 configuration, making a complete traversal and closing it takes about 1.5 s; on a PPC VME board the same test takes about 3 s
- Information Systems (IS and MRS)
  - used by many components during all phases of system operation (publish/subscribe facilities)
  - scalable (multiple servers to split the load)
  - benchmarks done on a single workstation and on several computers for different configurations (size, up to 5010 clients)
  - the response time is a few milliseconds
  - better results for distributed systems
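The "multiple servers to split the load" idea for a publish/subscribe facility can be sketched as follows. This is only an illustration under stated assumptions: the class names `InfoServer` and `PartitionedInfoService` are hypothetical, and routing by a CRC32 of the information name is an illustrative choice that is not taken from the slides or from the actual IS implementation.

```python
import zlib
from collections import defaultdict


class InfoServer:
    """Hypothetical information server: stores values, notifies subscribers."""

    def __init__(self, name):
        self.name = name
        self.values = {}
        self.subscribers = defaultdict(list)

    def subscribe(self, key, callback):
        self.subscribers[key].append(callback)

    def publish(self, key, value):
        self.values[key] = value
        # .get() avoids creating empty subscriber lists for unknown keys.
        for callback in self.subscribers.get(key, ()):
            callback(key, value)


class PartitionedInfoService:
    """Routes each information name to one of several servers (assumed scheme)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)

    def server_for(self, key):
        # crc32 is stable across runs, unlike the built-in hash() of strings.
        return self.servers[zlib.crc32(key.encode("utf-8")) % len(self.servers)]

    def subscribe(self, key, callback):
        self.server_for(key).subscribe(key, callback)

    def publish(self, key, value):
        self.server_for(key).publish(key, value)
```

Because publishers and subscribers compute the same server from the same name, each server only handles its own share of the names, which is one simple way the load could be divided.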
8. Component Unit Test Results
- Process Manager
  - used during system start-up and shutdown
  - results obtained on a single Solaris workstation
  - the time to start a process is a few hundred milliseconds and increases slowly with the number of managed processes
- Run Control
  - required to change the state of the system
  - scalable by changing the structure of the RC tree
  - tests done on all available workstations (up to 250 controllers)
  - changing the system's state with several tens of nodes takes from several hundred milliseconds up to a few seconds, depending on the state of the system
  - the time to change between the running and configured states is < 1 s
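To illustrate why the structure of the RC tree matters for scalability, here is a minimal sketch of a controller hierarchy in which a state change is propagated from the root to the leaves. The `Controller` class, the `build_tree` helper and the state name used in the example are hypothetical, and the propagation is sequential for simplicity; it is not the actual Run Control implementation.

```python
class Controller:
    """Hypothetical run controller that forwards state changes to its children."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])
        self.state = "initial"

    def change_state(self, new_state):
        """Change the state of the whole subtree; return controllers visited."""
        visited = 1
        for child in self.children:
            visited += child.change_state(new_state)
        self.state = new_state
        return visited

    def depth(self):
        """Number of levels in the subtree rooted at this controller."""
        return 1 + max((child.depth() for child in self.children), default=0)


def build_tree(n_leaves, fan_out):
    """Build a tree over n_leaves leaf controllers, at most fan_out children each."""
    if n_leaves < 1 or fan_out < 2:
        raise ValueError("need n_leaves >= 1 and fan_out >= 2")
    level = [Controller(f"leaf-{i}") for i in range(n_leaves)]
    tier = 0
    while len(level) > 1:
        tier += 1
        level = [
            Controller(f"ctrl-{tier}-{i // fan_out}", level[i:i + fan_out])
            for i in range(0, len(level), fan_out)
        ]
    return level[0]
```

With 250 leaf controllers, `build_tree(250, 250)` gives a two-level tree whose root must command 250 children itself, whereas `build_tree(250, 5)` gives five levels in which no controller commands more than five children. Which shape performs better in practice depends on how much of the work in different subtrees can proceed concurrently.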
9. Component Unit Tests: Conclusions
- Unit tests made for the back-end core components show that they are in accordance with the DAQ P-1 requirements
- Similar tests will be done for the back-end integration components
10. Back-end Sub-system Tests
- What
  - bring together all the core and several TDAQ/detector integration components
- Why
  - to simulate the control and configuration of data-taking sessions
- Where
  - back-end servers are running on a UNIX workstation
  - others (PMG agent, LDAQ emulator and RC Ctrl.) on a PC running Linux or a VME-based PowerPC CPU board running Lynx OS
11. Test Configurations
[Diagram: test configuration. Labels shown: Network, PMG Agent, G IPC, P IPC, RC IS, DF IS, PMG IS, RM, RDB, MRS, MRS L, DAQ Supervisor, IGUI, RC Root Ctrl]
12. Test Description
- Done by shell script
  - start communication services
  - launch configuration processes via DAQ supervisor
  - marshal the hierarchy of RC controllers through different states: I - L - C - R - C - R - C - L - I
  - stop DAQ supervisor and processes
  - stop servers
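The state sequence driven by the test script can be sketched as a small timing harness. The slide gives the states only as the letters I, L, C and R; reading them as Initial, Loaded, Configured and Running is an assumption, as are the transition table and the `run_sequence` function, which stand in for whatever the real shell script invokes.

```python
import time

# Transitions between neighbouring states. The expansions of the letters
# (I = Initial, L = Loaded, C = Configured, R = Running) are assumptions.
TRANSITIONS = {
    ("I", "L"), ("L", "C"), ("C", "R"),
    ("R", "C"), ("C", "L"), ("L", "I"),
}

# The sequence quoted on the slide.
SEQUENCE = ["I", "L", "C", "R", "C", "R", "C", "L", "I"]


def run_sequence(apply_transition, sequence=SEQUENCE, clock=time.perf_counter):
    """Walk through the sequence and time each transition.

    apply_transition(src, dst) is a caller-supplied callable that performs
    the actual state change. Returns a list of (src, dst, seconds) tuples.
    """
    timings = []
    for src, dst in zip(sequence, sequence[1:]):
        if (src, dst) not in TRANSITIONS:
            raise ValueError(f"transition {src} -> {dst} is not allowed")
        started = clock()
        apply_transition(src, dst)
        timings.append((src, dst, clock() - started))
    return timings
```

Passing a callable that commands the root controller would yield one timing per transition, eight in total for the sequence above, which is the kind of per-transition measurement the summary results refer to.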
13. Start-up and warm start/stop
[Plots: time (seconds) vs. number of processors/crates, one for PowerPC 100/200 MHz, 32/64 MB, Lynx OS and one for Pentium III 450 MHz, 128 MB, Linux]
14. Start-up and close
[Plots: time (seconds) vs. number of processors/crates, one for PowerPC 100/200 MHz, 32/64 MB, Lynx OS and one for Pentium III 450 MHz, 128 MB, Linux]
15. Back-end system test summary
- Results
  - the time to start/stop processes depends on the OS, computer architecture and configuration
  - once all processes are started, the time to change the system state remains constant (good distributed control)
  - the use of IS, MRS and the configuration database has a negligible effect on performance
  - the results, even for the largest configurations, are in an acceptable range (< 1 minute to start up on Linux)
- Known problems
  - PMG agents were started via RSH, with long delays (20 s)
  - the computers were not dedicated to the tests
16. Summary and Future
- Individual back-end component tests
  - done for the core components; they show that these are in accordance with the DAQ/EF P-1 requirements
  - similar tests have to be done for the integration components
- Integrated back-end system tests
  - performed employing the majority of the components
  - verified correct component inter-operation and the ability to work in a distributed multi-platform environment
  - gathered performance measurements
- Future
  - more statistics for larger configurations (more hosts)
  - script improvement and better start-up/shutdown synchronization
17. Appendix: Configuration Databases
- Importance
  - are used by many components during initialization
  - performance is important for system start-up
- Results (with OKS)
[Plot: time (s) vs. number of crates; marked points: 1 = single read-out crate, 10 = prototype -1, 200 = expected ATLAS DAQ]
18. Appendix: Information Service
- Importance
  - used by many components
  - performance is important during all phases of system operation
- Results
  - scalable (multiple servers to split the load)
  - results presented for updates of medium-size information (on a single host)
  - results for publish and remove are similar
[Plot: update time (ms) vs. number of sources]
19. Appendix: Message Reporting System
- Importance
  - used by many components
  - performance is important during all phases of system operation
- Results
  - presented tests obtained on a single host
  - better results obtained in a distributed environment
[Plot: report time per message (ms) vs. number of senders]
20. Appendix: Process Manager
- Importance
  - performance is important for system start-up and shutdown
- Results
  - obtained on a single Solaris workstation
[Plot: time per process (ms)]
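A measurement of this kind, time per started process as the number of live processes grows, can be approximated with a short script. This is a rough sketch only: it times local `subprocess.Popen` calls rather than the PMG itself, and the idle child command and the `time_process_starts` function are hypothetical stand-ins.

```python
import subprocess
import sys
import time

# Hypothetical child: a Python process that idles until its stdin is closed.
IDLE_CHILD = [sys.executable, "-c", "import sys; sys.stdin.read()"]


def time_process_starts(n_processes, command=IDLE_CHILD):
    """Start processes one by one and return each launch time in milliseconds.

    Earlier children stay alive while later ones start, so the number of
    managed processes grows during the measurement.
    """
    children, launch_ms = [], []
    try:
        for _ in range(n_processes):
            began = time.perf_counter()
            children.append(subprocess.Popen(command, stdin=subprocess.PIPE))
            launch_ms.append((time.perf_counter() - began) * 1000.0)
    finally:
        for child in children:
            child.communicate()  # closes stdin, so the idle child exits
    return launch_ms


if __name__ == "__main__":
    for index, ms in enumerate(time_process_starts(5), start=1):
        print(f"process {index}: {ms:.1f} ms")
```

The absolute numbers from such a script are not comparable with the PMG results, since no agent, network or database is involved; it only shows the shape of the measurement.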
21. Appendix: Run Control
- Importance
  - required to change the state of the system
- Results
  - scalable by changing the structure of the RC tree
  - tests done on all available workstations
[Plot: time (s) vs. number of controllers]