Conditions Database MySQL Implementation Status Report
1
Experience with the Open Source based
implementation for ATLAS Conditions Data
Management System  
A.Amorim, J.Lima, C.Oliveira, L.Pedro,
N.Barros - ATLAS-DAQ LISBON COLLABORATION -CHEP
2003
Jorge Lima - FCUL
21 March 2003
CHEP 2003
2
Outline
  • Goals
  • Conditions Data, what is it?
  • Open Source Choice
  • Runtime environment
  • Design and Implementation
  • Performance tests
  • Conclusions

3
GOALS
  • Study the feasibility of a Database Management
    System for Conditions Data based on an Open
    Source RDBMS
  • Use only standard SQL features
  • Keep implementation-dependent optimizations in a
    self-contained set of classes to improve
    portability
  • Provide example implementations and evaluate them:
    • MySQL based
    • Postgres based

4
Conditions Data
  • What is Conditions data?
    • Data that reflect the conditions in which the
      experiment was performed and the actual physics
      data were taken.
    • Calibration and alignment, robustness (DCS),
      detector description.
    • Slowly evolving data associated with the
      experiment, besides the event data itself.
  • The Conditions DB
    • A data management system for storing and
      retrieving conditions data.
    • Hides details of the underlying implementation
      or implementations.
    • Exposes a commonly agreed data model.
    • Provides one or more APIs for application
      programmers and a set of administration tools.
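The core idea above, slowly evolving data addressed by a validity interval rather than attached to each event, can be sketched in a few lines. This is an illustrative toy, not the ConditionsDB API; the names `Folder`, `CondObject`, `store` and `find` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class CondObject:
    since: int    # start of the interval of validity
    till: int     # end of the interval of validity
    payload: Any  # opaque data (a BLOB in the original design)

class Folder:
    """A folder holds conditions objects of one type; objects are
    stored with an interval of validity and looked up by time."""

    def __init__(self) -> None:
        self.objects: List[CondObject] = []

    def store(self, since: int, till: int, payload: Any) -> None:
        self.objects.append(CondObject(since, till, payload))

    def find(self, when: int) -> Any:
        # The most recently stored object valid at `when` wins,
        # i.e. later insertions shadow earlier ones (the HEAD view).
        for obj in reversed(self.objects):
            if obj.since <= when < obj.till:
                return obj.payload
        raise KeyError(f"no object valid at time {when}")
```

A client asking for the calibration valid at a given event time would call `find(event_time)` and never needs to know when or how the data was inserted.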

5
Why Open Source?
  • The panorama ten years ago:
    • Reliable DBMSs were available only from vendors
      like ORACLE.
    • RDBMS expertise was concentrated.
    • A common way of storing physics data was to use
      the file system directly.
  • The situation has evolved dramatically in recent
    years:
    • Robust, high-performance, open source
      implementations of RDBMSs have appeared.
    • A growing community of programmers has a
      reasonable understanding of DB technologies.
    • Open Source RDBMSs are being used very
      successfully in many fields, and the HEP
      community should not lose this opportunity.

6
Why Open Source?
  • Open Source also offers the following advantages:
    • Low cost (frequently free of charge).
    • Code availability allows fine tuning and
      specific optimizations when necessary.
    • Available for most common platforms.
    • Users can build and test their own setup.

7
Runtime environment
[Diagram: runtime environment — the Calib/Align/Rob DB serves offline
prompt reconstruction, offline analysis, calibrating nodes/processes
and sub-system control, through client and provider interfaces.]
8
Starting point
  • The legacy from CERN IT:
    • API specification (and example implementation in
      Objectivity).
    • The data model (with some similarities to the
      model for Conditions Data found in the BABAR
      experiment).
    • Nice, but far from complete:
      • No knowledge whatsoever about the object
        format (BLOBs).
      • Insufficient tag mechanism (only HEAD could be
        tagged; potential data loss).
  • There are new challenges posed by the ATLAS
    complex triggering system:
    • Many clients (both online and offline) must be
      able to access data efficiently.
    • They must understand (know how to read) each
      other's data.
    • Objects with different granularities (how to
      deal efficiently with extreme cases?)

9
Implementation
  • The usual development cycle:
    • Devise a relational database schema to cope with
      the data model.
    • Devise a clustering/replication model to cope
      with efficiency and scalability.
    • Code it and evaluate its performance.
  • On the other hand, we had to understand to what
    extent the current Interface Specification was
    appropriate:
    • Does it fulfil the requirements collected so far
      amongst the people working on the several
      detectors and subsystems?
    • If not, we'll need to extend or redesign the
      Interface.

10
Data Model
  • Three different variation axes:
    • Type (supported by the folder concept)
    • Interval of validity
    • Revision
  • Tags for grouping collections of objects
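The three axes map naturally onto a relational table, which is why only standard SQL is needed. Below is a minimal sketch using SQLite for illustration; the table and column names are assumptions, not the actual ATLAS schema.

```python
import sqlite3

# One row per stored object: folder (type axis), [since, till)
# (interval-of-validity axis), and revision (revision axis).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE cond_object (
        folder   TEXT,
        since    INTEGER,
        till     INTEGER,
        revision INTEGER,
        payload  BLOB
    )""")

def store(folder, since, till, revision, payload):
    con.execute("INSERT INTO cond_object VALUES (?, ?, ?, ?, ?)",
                (folder, since, till, revision, payload))

def find_head(folder, when):
    # HEAD = the highest revision whose interval of validity
    # covers `when` -- expressible in plain standard SQL.
    row = con.execute("""
        SELECT payload FROM cond_object
        WHERE folder = ? AND since <= ? AND ? < till
        ORDER BY revision DESC LIMIT 1""",
        (folder, when, when)).fetchone()
    return row[0] if row else None
```

Because the query uses only standard SQL, the same logic runs unchanged on MySQL or Postgres, in line with the portability goal.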

11
File-system-like hierarchical structure
12
Database schema
13
Clustering Model
  • Similar to Oracle's tablespaces:
    • Optimized for the conditions data problem.
    • Reconfigurable as the system grows.
  • The DB server knows nothing about it:
    • The model doesn't rely on special features of
      the underlying technology.
    • Can be used with many different backends in a
      heterogeneous environment.
  • Replication can also be used:
    • Mainly in the top-level databases, which can
      become a bottleneck.
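Since the server knows nothing about the clustering, the routing has to live in the client library. A minimal sketch of that idea, assuming a longest-prefix rule over folder paths (the rule and all names here are illustrative, not the actual implementation):

```python
# Client-side clustering: the library, not the DB server, maps
# folder paths onto backend databases, so the mapping can be
# reconfigured as the system grows and works with any backend.
class Cluster:
    def __init__(self, mapping):
        # mapping: folder-path prefix -> backend identifier,
        # tried longest prefix first
        self.mapping = sorted(mapping.items(),
                              key=lambda kv: -len(kv[0]))

    def backend_for(self, folder_path):
        for prefix, backend in self.mapping:
            if folder_path.startswith(prefix):
                return backend
        raise KeyError(f"no backend configured for {folder_path}")
```

With `Cluster({"/": "top-db", "/dcs": "dcs-db"})`, DCS folders are routed to their own server while everything else falls through to the top-level database.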

14
Tagging mechanism
  • Examples: insertion, tagging, retrieval.

[Diagrams: browsing objects in HEAD vs. browsing objects in tag TAG1]
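The HEAD-vs-tag distinction in those examples can be illustrated with a toy model: a tag freezes the set of objects visible in HEAD at tagging time, so later insertions change HEAD but leave the tag untouched. All names below are illustrative, not the ConditionsDB API.

```python
class TaggedFolder:
    def __init__(self):
        self.objects = {}   # object id -> payload
        self.head = {}      # interval -> object id (latest wins)
        self.tags = {}      # tag name -> frozen copy of head
        self._next_id = 0

    def store(self, since, till, payload):
        self._next_id += 1
        self.objects[self._next_id] = payload
        self.head[(since, till)] = self._next_id  # shadows older object

    def tag(self, name):
        self.tags[name] = dict(self.head)         # snapshot of HEAD

    def browse(self, tag=None):
        view = self.head if tag is None else self.tags[tag]
        return [self.objects[oid] for oid in view.values()]
```

After `store(...)`, `tag("TAG1")`, `store(...)`, browsing HEAD shows the new object while browsing TAG1 still shows the old one — which is exactly why tagging only HEAD is fragile when insertions and tagging run concurrently.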
15
Layered software layout
Upper layer - deals with the data model and
clustering issues.
Bottom layer - implementation specific: MySQL
server (MyISAM and InnoDB engines) or Postgres
server.
Some features found on the Postgres server are
being investigated.
16
What's wrong?...
  • From the requirements collected so far by Luis
    Pedro we can state that:
    • Objects with different granularities and
      different data rates must be stored and
      retrieved efficiently.
      • To cope with the object granularity problem
        efficiently, extensions to the API
        specification are required.
    • The object schema must be stored within the
      database and should be accessible to the users.
      • Data is presently stored as a BLOB, and the
        current API specification doesn't provide
        means to do it differently.
    • The tagging and versioning mechanisms are not
      adequate and can lead to data loss, especially
      in multi-client environments.
      • Only HEAD can be tagged, and the version or
        revision is meaningless to the user. It should
        be possible to group objects in more flexible
        ways. What if someone inserts a new object,
        thus changing HEAD, while tagging is in
        progress?

17
What's wrong?
  • We need a more flexible tagging mechanism:
    • It's probably enough to allow tagging based on
      some other criterion, for instance insertion
      time.
    • Version is useless. What about labelling
      objects at insertion time for later
      referencing?
  • The current model is inappropriate for DCS
    objects:
    • The model is not efficient when dealing with
      many small objects.
    • DCS objects don't need revision.
  • Storing data only as BLOBs will force users to
    implement their own solutions to store object
    schema:
    • Some will use XML data (online), some the
      Athena StoreGate services (offline), others
      plain integers.
    • Data will not be interchangeable.

18
We need to revise the API
  • API extensions can solve the problem:
    • Extend the tagging mechanism to support other
      tagging criteria.
    • Bind the object schema to the folder at folder
      creation time.
    • Support a limited number of built-in object
      types: basic types like integers and doubles,
      as well as arrays of integers or doubles.
      • Allow extension to user-defined types??
    • Use the most efficient internal representation
      for a given object type.
    • All types shall support the same interface,
      though with different degrees of effectiveness
      (orthogonality).
      • This effectively means some sort of embedded
        conversion service.
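The combination of a schema bound at folder-creation time, type-specific internal representations, and one common interface can be sketched as below. The codec table and every name here are assumptions for illustration, not the proposed API.

```python
import struct

# A codec per built-in type: (encode to bytes, decode from bytes).
CODECS = {
    "int":          (lambda v: struct.pack("<i", v),
                     lambda b: struct.unpack("<i", b)[0]),
    "double":       (lambda v: struct.pack("<d", v),
                     lambda b: struct.unpack("<d", b)[0]),
    "double_array": (lambda v: struct.pack("<%dd" % len(v), *v),
                     lambda b: list(struct.unpack(
                         "<%dd" % (len(b) // 8), b))),
}

class TypedFolder:
    def __init__(self, type_name):
        # The schema (object type) is fixed when the folder is
        # created, so every client decodes the data the same way.
        self.encode, self.decode = CODECS[type_name]
        self.rows = []

    def store(self, value):
        self.rows.append(self.encode(value))   # compact internal form

    def read(self, index):
        return self.decode(self.rows[index])   # same interface for all types
```

The `CODECS` lookup is the "embedded conversion service": `store`/`read` look identical for every folder, while each type gets its own efficient byte-level representation instead of an opaque BLOB.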

19
Results from performance tests
Test details:
  • Client computers: Intel PIII 1 GHz dual
    processor / Linux 2.4.18-18.7.x.cernsmp /
    376 MByte RAM / on-line software release
    00-18-01 / gcc 2.96 / Gigabit network between
    the clients.
  • Server computer: Intel PIV 2 GHz / Linux 2.4.18
    / 1 GByte RAM / gcc 2.95.4 / MySQL Distrib
    3.23.49 / Network?????
  • All tests were performed using the on-line
    software infrastructure for synchronization.
  • Each client (controller) establishes 3
    connections to the database server when
    performing a test.
  • Database sizes: 100 objects - 37.8 KByte;
    1,000 objects - 180.1 KByte; 10,000 objects -
    1.6 MByte; 100,000 objects - 15.4 MByte.
20
Retrieve performance (Single-DB)
21
Retrieve performance (multi-DB)
22
Comparisons (single client)
                       Oracle (local)  MySQL (local)  Oracle (remote)  MySQL (remote)
createFolder           0m15.173s       0m0.034s       0m11.857s        0m0.072s
storeData x 10         0m4.973s        0m1.447s       0m5.434s         0m1.127s
storeData x 100        0m9.749s        0m1.368s       0m15.820s        0m2.345s
storeData x 10,000     9m22.103s       0m23.175s      11m12.554s       0m56.929s
storeData x 100,000    109m40.878s     3m49.184s      04m13.283s       8m16.510s
storeData x 1,000,000  -               23m12.563s     -                40m48.256s
readData x 10          0m0.324s        0m0.025s       0m0.955s         0m0.058s
readData x 100         0m1.403s        0m0.050s       0m3.061s         0m0.135s
readData x 10,000      2m1.919s        0m2.851s       4m37.315s        0m7.926s
readData x 100,000     25m46.423s      0m27.273s      46m53.846s       1m18.850s
readData x 1,000,000   -               4m40.315s      -                10m26.124s
23
Conclusions
  • The feasibility of an Open Source based RDBMS
    for the ConditionsDB was demonstrated;
    furthermore:
    • The clustering model devised, which should
      scale well over data volume and time, can be
      used with almost any kind of RDBMS backend.
    • Porting between different backend
      implementations does not constitute a major
      effort, as verified while porting the original
      implementation to Postgres.
    • The MySQL implementation outperformed the
      Oracle-based one by a factor of 50 on the most
      critical operations, while being slightly
      slower in store operations.
    • Further tests are needed, mainly in
      multi-server and many-client environments.
  • Side effects:
    • An important effort to collect requirements was
      made, and several missing features were
      identified.
    • The community is adopting the MySQL
      ConditionsDB (they can easily play around with
      it).

24
Where to fetch the latest release
  • Software project for the MySQL Conditions
    Database implementation, available at the ATLAS
    offline software repository.
  • ATLAS offline software repository:
    • CVSROOT=:kserver:atlas-sw.cern.ch:/atlascvs
    • Packages: offline/Database/ConditionsDBMySQL,
      offline/Database/ConditionsDBTests,
      offline/Database/IConditionsDB
  • The same package and another test implementation
    based on Postgres are available from FCUL, to
    compile without CMT.
  • Available through CVS and FTP:
    • ftp://kdataserv.fis.fc.ul.pt/pub/Software
    • CVSROOT=:ext:kdataserv.fis.fc.ul.pt:/usr/local/cvsroot
    • CVS_RSH=ssh
    • Packages: ConditionsDB-MySQL, ConditionsDB-PgSQL
