Conditions Database MySQL Implementation Status Report
1
Experience with the Open Source based
implementation for ATLAS Conditions Data
Management System  
A.Amorim, J.Lima, C.Oliveira, L.Pedro,
N.Barros - ATLAS-DAQ LISBON COLLABORATION -CHEP
2003
Jorge Lima - FCUL
21 March 2003
CHEP 2003
2
Outline
  • Goals
  • Conditions Data, what is it?
  • Open Source Choice
  • Runtime environment
  • Design and Implementation
  • Performance tests
  • Conclusions

3
GOALS
  • Study the feasibility of a Database Management
    System for Conditions Data based on an Open
    Source RDBMS
  • Use only standard SQL features
  • Keep implementation-dependent optimizations in a
    self-contained set of classes to improve
    portability
  • Provide example implementations and evaluate them:
    • MySQL based
    • Postgres based

4
Conditions Data
  • What is Conditions data?
    • Data that reflect the conditions in which the
      experiment was performed and the actual physics
      data were taken.
    • Calibration and alignment, robustness (DCS),
      detector description.
    • Slowly evolving data associated with the
      experiment, besides the event data itself.
  • The Conditions DB
    • A data management system for storing and
      retrieving conditions data.
    • Hides details of the underlying implementation
      or implementations.
    • Exposes a commonly agreed data model.
    • Provides one or more APIs for application
      programmers and a set of administration tools.
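The core idea above, slowly evolving data addressed by a validity interval rather than attached to each event, can be sketched in a few lines. This is an illustrative toy, not the ConditionsDB API; the names `Folder`, `CondObject`, `store` and `find` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class CondObject:
    since: int    # start of the interval of validity
    till: int     # end of the interval of validity
    payload: Any  # opaque data (a BLOB in the original design)

class Folder:
    """A folder holds conditions objects of one type; objects are
    stored with an interval of validity and looked up by time."""

    def __init__(self) -> None:
        self.objects: List[CondObject] = []

    def store(self, since: int, till: int, payload: Any) -> None:
        self.objects.append(CondObject(since, till, payload))

    def find(self, when: int) -> Any:
        # The most recently stored object valid at `when` wins,
        # i.e. later insertions shadow earlier ones (the HEAD view).
        for obj in reversed(self.objects):
            if obj.since <= when < obj.till:
                return obj.payload
        raise KeyError(f"no object valid at time {when}")
```

A client asking for the calibration valid at a given event time would call `find(event_time)` and never needs to know when or how the data was inserted.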

5
Why Open Source?
  • The panorama ten years ago:
    • Reliable DBMSs were available only from vendors
      like ORACLE.
    • RDBMS expertise was concentrated.
    • A common way of storing physics data was to use
      the file system directly.
  • The situation has evolved dramatically in recent
    years:
    • Robust, high-performance, open source
      implementations of RDBMSs have appeared.
    • A growing community of programmers has a
      reasonable understanding of DB technologies.
    • Open Source RDBMSs are being used very
      successfully in many fields, and the HEP
      community should not lose this opportunity.

6
Why Open Source?
  • Open Source also offers the following advantages:
    • Low cost (frequently free of charge).
    • Code availability allows fine tuning and
      specific optimizations when necessary.
    • Available for most common platforms.
    • Users can build and test their own setup.

7
Runtime environment
[Diagram: runtime environment — the Calib/Align/Rob DB serves offline
prompt reconstruction, offline analysis, calibrating nodes/processes
and sub-system control, through client and provider interfaces.]
8
Starting point
  • The legacy from CERN IT:
    • API specification (and example implementation in
      Objectivity).
    • The data model (with some similarities to the
      model for Conditions Data found in the BABAR
      experiment).
    • Nice, but far from complete:
      • No knowledge whatsoever about the object
        format (BLOBs).
      • Insufficient tag mechanism (only HEAD could be
        tagged; potential data loss).
  • There are new challenges posed by the ATLAS
    complex triggering system:
    • Many clients (both online and offline) must be
      able to access data efficiently.
    • They must understand (know how to read) each
      other's data.
    • Objects with different granularities (how to
      deal efficiently with extreme cases?)

9
Implementation
  • The usual development cycle:
    • Devise a relational database schema to cope with
      the data model.
    • Devise a clustering/replication model to cope
      with efficiency and scalability.
    • Code it and evaluate its performance.
  • On the other hand, we had to understand to what
    extent the current Interface Specification was
    appropriate:
    • Does it fulfil the requirements collected so far
      amongst the people working on the several
      detectors and subsystems?
    • If not, we'll need to extend or redesign the
      Interface.

10
Data Model
  • Three different variation axes:
    • Type (supported by the folder concept)
    • Interval of validity
    • Revision
  • Tags for grouping collections of objects
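The three axes map naturally onto a relational table, which is why only standard SQL is needed. Below is a minimal sketch using SQLite for illustration; the table and column names are assumptions, not the actual ATLAS schema.

```python
import sqlite3

# One row per stored object: folder (type axis), [since, till)
# (interval-of-validity axis), and revision (revision axis).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE cond_object (
        folder   TEXT,
        since    INTEGER,
        till     INTEGER,
        revision INTEGER,
        payload  BLOB
    )""")

def store(folder, since, till, revision, payload):
    con.execute("INSERT INTO cond_object VALUES (?, ?, ?, ?, ?)",
                (folder, since, till, revision, payload))

def find_head(folder, when):
    # HEAD = the highest revision whose interval of validity
    # covers `when` -- expressible in plain standard SQL.
    row = con.execute("""
        SELECT payload FROM cond_object
        WHERE folder = ? AND since <= ? AND ? < till
        ORDER BY revision DESC LIMIT 1""",
        (folder, when, when)).fetchone()
    return row[0] if row else None
```

Because the query uses only standard SQL, the same logic runs unchanged on MySQL or Postgres, in line with the portability goal.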

11
File-system-like hierarchical structure
12
Database schema
13
Clustering Model
  • Similar to Oracle's tablespaces:
    • Optimized for the conditions data problem.
    • Reconfigurable as the system grows.
  • The DB server knows nothing about it:
    • The model doesn't rely on special features of
      the underlying technology.
    • Can be used with many different backends in a
      heterogeneous environment.
  • Replication can also be used:
    • Mainly in the top-level databases, which can
      become a bottleneck.
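Since the server knows nothing about the clustering, the routing has to live in the client library. A minimal sketch of that idea, assuming a longest-prefix rule over folder paths (the rule and all names here are illustrative, not the actual implementation):

```python
# Client-side clustering: the library, not the DB server, maps
# folder paths onto backend databases, so the mapping can be
# reconfigured as the system grows and works with any backend.
class Cluster:
    def __init__(self, mapping):
        # mapping: folder-path prefix -> backend identifier,
        # tried longest prefix first
        self.mapping = sorted(mapping.items(),
                              key=lambda kv: -len(kv[0]))

    def backend_for(self, folder_path):
        for prefix, backend in self.mapping:
            if folder_path.startswith(prefix):
                return backend
        raise KeyError(f"no backend configured for {folder_path}")
```

With `Cluster({"/": "top-db", "/dcs": "dcs-db"})`, DCS folders are routed to their own server while everything else falls through to the top-level database.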

14
Tagging mechanism
  • Examples: insertion, tagging, retrieval.

[Diagrams: browsing objects in HEAD vs. browsing objects in tag TAG1]
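The HEAD-vs-tag distinction in those examples can be illustrated with a toy model: a tag freezes the set of objects visible in HEAD at tagging time, so later insertions change HEAD but leave the tag untouched. All names below are illustrative, not the ConditionsDB API.

```python
class TaggedFolder:
    def __init__(self):
        self.objects = {}   # object id -> payload
        self.head = {}      # interval -> object id (latest wins)
        self.tags = {}      # tag name -> frozen copy of head
        self._next_id = 0

    def store(self, since, till, payload):
        self._next_id += 1
        self.objects[self._next_id] = payload
        self.head[(since, till)] = self._next_id  # shadows older object

    def tag(self, name):
        self.tags[name] = dict(self.head)         # snapshot of HEAD

    def browse(self, tag=None):
        view = self.head if tag is None else self.tags[tag]
        return [self.objects[oid] for oid in view.values()]
```

After `store(...)`, `tag("TAG1")`, `store(...)`, browsing HEAD shows the new object while browsing TAG1 still shows the old one — which is exactly why tagging only HEAD is fragile when insertions and tagging run concurrently.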
15
Layered software layout
Upper layer - deals with the data model and
clustering issues.
Bottom layer - implementation specific: MySQL
server (MyISAM and InnoDB engines) or Postgres
server.
Some features found on the Postgres server are
being investigated.
16
What's wrong?...
  • From the requirements collected so far by Luis
    Pedro we can state that:
    • Objects with different granularities and
      different data rates must be stored and
      retrieved efficiently.
      • To cope with the object granularity problem
        efficiently, extensions to the API
        specification are required.
    • The object schema must be stored within the
      database and should be accessible to the users.
      • Data is presently stored as a BLOB, and the
        current API specification doesn't provide
        means to do it differently.
    • The tagging and versioning mechanisms are not
      adequate and can lead to data loss, especially
      in multi-client environments.
      • Only HEAD can be tagged, and the version or
        revision is meaningless to the user. It should
        be possible to group objects in more flexible
        ways. What if someone inserts a new object,
        thus changing HEAD, while tagging is in
        progress?

17
What's wrong?
  • We need a more flexible tagging mechanism:
    • It's probably enough to allow tagging based on
      some other criterion, for instance insertion
      time.
    • Version is useless. What about labelling
      objects at insertion time for later
      referencing?
  • The current model is inappropriate for DCS
    objects:
    • The model is not efficient when dealing with
      many small objects.
    • DCS objects don't need revision.
  • Storing data only as BLOBs will force users to
    implement their own solutions to store object
    schema:
    • Some will use XML data (online), some the
      Athena StoreGate services (offline), others
      plain integers.
    • Data will not be interchangeable.

18
We need to revise the API
  • API extensions can solve the problem:
    • Extend the tagging mechanism to support other
      tagging criteria.
    • Bind the object schema to the folder at folder
      creation time.
    • Support a limited number of built-in object
      types: basic types like integers and doubles,
      as well as arrays of integers or doubles.
      • Allow extension to user-defined types??
    • Use the most efficient internal representation
      for a given object type.
    • All types shall support the same interface,
      though with different degrees of effectiveness
      (orthogonality).
      • This effectively means some sort of embedded
        conversion service.
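The combination of a schema bound at folder-creation time, type-specific internal representations, and one common interface can be sketched as below. The codec table and every name here are assumptions for illustration, not the proposed API.

```python
import struct

# A codec per built-in type: (encode to bytes, decode from bytes).
CODECS = {
    "int":          (lambda v: struct.pack("<i", v),
                     lambda b: struct.unpack("<i", b)[0]),
    "double":       (lambda v: struct.pack("<d", v),
                     lambda b: struct.unpack("<d", b)[0]),
    "double_array": (lambda v: struct.pack("<%dd" % len(v), *v),
                     lambda b: list(struct.unpack(
                         "<%dd" % (len(b) // 8), b))),
}

class TypedFolder:
    def __init__(self, type_name):
        # The schema (object type) is fixed when the folder is
        # created, so every client decodes the data the same way.
        self.encode, self.decode = CODECS[type_name]
        self.rows = []

    def store(self, value):
        self.rows.append(self.encode(value))   # compact internal form

    def read(self, index):
        return self.decode(self.rows[index])   # same interface for all types
```

The `CODECS` lookup is the "embedded conversion service": `store`/`read` look identical for every folder, while each type gets its own efficient byte-level representation instead of an opaque BLOB.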

19
Results from performance tests
Test details:
  • Client computers: Intel PIII 1 GHz dual
    processor / Linux 2.4.18-18.7.x.cernsmp /
    376 MByte RAM / on-line software release
    00-18-01 / gcc 2.96 / Gigabit network between
    the clients.
  • Server computer: Intel PIV 2 GHz / Linux 2.4.18
    / 1 GByte RAM / gcc 2.95.4 / MySQL Distrib
    3.23.49 / Network?????
  • All tests were performed using the on-line
    software infrastructure for synchronization.
  • Each client (controller) establishes 3
    connections to the database server when
    performing a test.
  • Database sizes: 100 objects - 37.8 KByte;
    1,000 objects - 180.1 KByte; 10,000 objects -
    1.6 MByte; 100,000 objects - 15.4 MByte.
20
Retrieve performance (Single-DB)
21
Retrieve performance (multi-DB)
22
Comparisons (single client)
                       Oracle (local)  MySQL (local)  Oracle (remote)  MySQL (remote)
createFolder           0m15.173s       0m0.034s       0m11.857s        0m0.072s
storeData x 10         0m4.973s        0m1.447s       0m5.434s         0m1.127s
storeData x 100        0m9.749s        0m1.368s       0m15.820s        0m2.345s
storeData x 10,000     9m22.103s       0m23.175s      11m12.554s       0m56.929s
storeData x 100,000    109m40.878s     3m49.184s      04m13.283s       8m16.510s
storeData x 1,000,000  -               23m12.563s     -                40m48.256s
readData x 10          0m0.324s        0m0.025s       0m0.955s         0m0.058s
readData x 100         0m1.403s        0m0.050s       0m3.061s         0m0.135s
readData x 10,000      2m1.919s        0m2.851s       4m37.315s        0m7.926s
readData x 100,000     25m46.423s      0m27.273s      46m53.846s       1m18.850s
readData x 1,000,000   -               4m40.315s      -                10m26.124s
23
Conclusions
  • The feasibility of an Open Source based RDBMS
    for the ConditionsDB was demonstrated;
    furthermore:
    • The clustering model devised, which should
      scale well over data volume and time, can be
      used with almost any kind of RDBMS backend.
    • Porting between different backend
      implementations does not constitute a major
      effort, as verified while porting the original
      implementation to Postgres.
    • The MySQL implementation outperformed the
      Oracle-based one by a factor of 50 on the most
      critical operations, while being slightly
      slower in store operations.
    • Further tests are needed, mainly in
      multi-server and many-client environments.
  • Side effects:
    • An important effort to collect requirements was
      made, and several missing features were
      identified.
    • The community is adopting the MySQL
      ConditionsDB (they can easily play around with
      it).

24
Where to fetch the latest release
  • Software project for the MySQL Conditions
    Database implementation, available at the ATLAS
    offline software repository.
  • ATLAS offline software repository:
    • CVSROOT=:kserver:atlas-sw.cern.ch:/atlascvs
    • Packages: offline/Database/ConditionsDBMySQL,
      offline/Database/ConditionsDBTests,
      offline/Database/IConditionsDB
  • The same package and another test implementation
    based on Postgres are available from FCUL, to
    compile without CMT.
  • Available through CVS and FTP:
    • ftp://kdataserv.fis.fc.ul.pt/pub/Software
    • CVSROOT=:ext:kdataserv.fis.fc.ul.pt:/usr/local/cvsroot
    • CVS_RSH=ssh
    • Packages: ConditionsDB-MySQL, ConditionsDB-PgSQL
