Title: The PHysics Analysis SERver Project (PHASER)


1
The PHysics Analysis SERver Project (PHASER)
M. Bowen, G. Landsberg, and R. Partridge
Brown University
  • CHEP 2000
  • Padova, Italy
  • February 7-11, 2000

2
What is the PHASER project?
  • Effort to substantially increase productivity of
    physicists analyzing multi-TB summary data sets
  • Our immediate focus is on the DØ experiment
  • 600 million data events/year starting in early
    2001
  • Summary data set expected to grow at rate of
    3TB/year
  • Concentrate on event selection and ntuple
    creation stage
  • Transition in data handling from monolithic
    reconstruction processing to the much more
    chaotic processing of summary data by many
    physicists
  • IO and CPU intensive due to need to apply latest
    calibration, particle ID, and event selection
    algorithms to several hundred million events

3
PHASER Architecture
  • Physics Object Database (POD) stores meta-data
    used by most physics analyses for their initial
    event selection
  • Physics Object and Particle ID tables in POD
    store calibrated 4-vectors, object quality
    variables, and results of particle ID algorithms
  • DVD storage of full summary (mDST) data set and
    useful subsets of larger DST and STA data sets

4
PHASER is PHast
  • New calibrations and particle ID algorithms can
    be quickly incorporated
  • Only the changes need to be imported
  • Regenerating the large mDST data set will only be
    done infrequently
  • Storage of up-to-date calibrations and particle
    ID algorithms avoids the need to re-apply these
    algorithms for each event selection pass
  • Particle ID tables are small, making it possible
    to quickly eliminate events not having the
    desired set of physics objects
  • Direct access to full mDST sample on DVD allows a
    mDST subset to be quickly generated for advanced
    analyses developing new algorithms not yet in the
    database
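The "only the changes need to be imported" point can be sketched as a keyed update. This is a minimal illustration using Python's built-in sqlite3 rather than the database used by the project; the table name `electron_id`, the `passes_id` column, and all sample values are hypothetical.

```python
import sqlite3


def apply_id_update(conn, changes):
    """Import only the changed particle ID results.

    Each change is (run, event, instance, passes_id). Rows whose key is
    not listed in `changes` are left untouched.
    """
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO electron_id "
            "(run, event, instance, passes_id) VALUES (?, ?, ?, ?)",
            changes,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE electron_id ("
    "run INTEGER, event INTEGER, instance INTEGER, passes_id INTEGER, "
    "PRIMARY KEY (run, event, instance))"
)
# Hypothetical initial load: three electrons, all passing the old ID.
conn.executemany(
    "INSERT INTO electron_id VALUES (?, ?, ?, ?)",
    [(1001, 7, 1, 1), (1001, 7, 2, 1), (1001, 8, 1, 1)],
)

# A revised (hypothetical) ID algorithm changes the verdict for one electron.
apply_id_update(conn, [(1001, 7, 2, 0)])

flags = conn.execute(
    "SELECT instance, passes_id FROM electron_id "
    "WHERE run = 1001 AND event = 7 ORDER BY instance"
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM electron_id").fetchone()[0]
```

Because the update is keyed on the primary key, the size of the import follows the number of changed rows, not the size of the table.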

5
The Physics Object Database (POD)
  • Stores fully calibrated meta-data associated with
    the various physics objects
  • leptons, photons, jets, missing ET, secondary
    vertices, triggers, etc.
  • for example, an electron object would have the
    energy, direction, and various quantities used in
    the electron ID algorithms stored
  • Each physics object associated with a table in a
    relational database
  • Primary key uniquely identifies each physics
    object and provides information needed to
    correlate physics objects from a single event
  • Currently use Run, Event, Instance (where
    appropriate) and row number from ntuple used to
    load database
  • Alternative: data source index, sequence number,
    and instance
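A minimal sketch of one physics-object table, assuming Python's built-in sqlite3 as a stand-in for the relational database. Only the Run, Event, Instance key comes from the slide; the table name `electron`, the `et`, `eta`, and `phi` columns, and the sample rows are hypothetical.

```python
import sqlite3


def create_electron_table(conn):
    """Create a hypothetical electron table keyed by (run, event, instance)."""
    conn.execute(
        """
        CREATE TABLE electron (
            run      INTEGER NOT NULL,
            event    INTEGER NOT NULL,
            instance INTEGER NOT NULL,  -- which electron within the event
            et       REAL NOT NULL,     -- calibrated transverse energy
            eta      REAL NOT NULL,
            phi      REAL NOT NULL,
            PRIMARY KEY (run, event, instance)
        )
        """
    )


def electrons_in_event(conn, run, event):
    """Return (instance, et) for every electron stored for one event."""
    rows = conn.execute(
        "SELECT instance, et FROM electron "
        "WHERE run = ? AND event = ? ORDER BY instance",
        (run, event),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_electron_table(conn)
conn.executemany(
    "INSERT INTO electron VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1001, 7, 1, 42.5, 0.3, 1.2),
        (1001, 7, 2, 38.1, -0.9, 4.3),
        (1001, 8, 1, 27.0, 1.1, 2.0),
    ],
)
```

The shared (run, event) prefix of the key is what lets objects from different tables be correlated back to a single event.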

6
Why use a Relational Database?
  • Physics objects typically have a fixed set of
    attributes used for event selection and analysis
  • Independence of tables aids loading, updating
    database
  • Data can be bulk loaded as long as primary key
    is provided in input data stream
  • Several vendors with quite capable products,
    large commercial market
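The bulk-loading point can be illustrated with a short sketch, again assuming sqlite3 and a CSV text stream in place of the actual loader and input format. The `jet` table, its columns, and the helper `bulk_load` are hypothetical; the only requirement taken from the slide is that every input row carries its own primary key.

```python
import csv
import io
import sqlite3


def bulk_load(conn, table, columns, stream):
    """Bulk load rows from a CSV stream whose rows carry their own key.

    `table` and `columns` are trusted, hard-coded names in this sketch;
    they are not safe to take from user input.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    reader = csv.DictReader(stream)
    rows = [tuple(record[c] for c in columns) for record in reader]
    with conn:  # one transaction for the whole batch
        conn.executemany(sql, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jet ("
    "run INTEGER, event INTEGER, instance INTEGER, et REAL, "
    "PRIMARY KEY (run, event, instance))"
)

# Hypothetical input stream: the key columns arrive with the data.
stream = io.StringIO(
    "run,event,instance,et\n"
    "1001,7,1,55.0\n"
    "1001,7,2,31.5\n"
)
loaded = bulk_load(conn, "jet", ["run", "event", "instance", "et"], stream)
stored = conn.execute("SELECT COUNT(*) FROM jet").fetchone()[0]
```

Since the `jet` table needs nothing from any other table to be loaded, each object type can be loaded or reloaded on its own schedule.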

7
Prototype POD
  • Use DØ Run 1 data (1992 - 1996 running period)
  • 62 million events loaded into the database
  • Entire All-Stream data set loaded
  • Data set used by almost all DØ physics analyses
  • Only files with special processing or trigger
    conditions excluded
  • Column-wise ntuple format used for
    importing/exporting data

8
DØ Run 1 POD
  • Including indexes, Run 1 POD occupies 100 GB
  • 58% physics object data
  • 18% indexes on object ET
  • 12% primary keys
  • 12% database overhead

9
POD Benchmarks
  • Z → e+e- candidate event selection
  • 7 seconds to identify 6k events
  • W → eν candidate event selection
  • 18 seconds to identify 86k events
  • Both benchmark times make use of particle ID
    tables
  • Event selection times compare very favorably with
    1000 CPU hours required to generate ntuples used
    in this study
  • Benchmark Hardware/Software
  • 450 MHz dual-processor Pentium II with 256 MB RAM
  • Database stored on six 36 GB disks in a RAID 0
    stripe set
  • MS SQL Server running on Windows NT 4.0

10
DVD Storage
  • Provide access to additional event information
    not included in POD
  • DVD-RAM has a number of unique capabilities
  • Less expensive than disk storage, doesn't require
    backup
  • Access to individual events is much faster than
    tape storage
  • Current disk capacity is 2.6 GB, 4.7 GB expected
    soon
  • Commercial DVD libraries hold up to 600 DVD disks
  • 2.8 TB capacity using 4.7 GB DVD-RAM disks
  • Average disk load time of 4.5 s, <1 hour to cycle
    through 600 disks
  • Up to 6 DVD-RAM drives give 10 MB/s IO rate
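The library figures follow from simple arithmetic on the numbers quoted in the slide, worked through below. The function names are illustrative, decimal units (1 TB = 1000 GB = 10^6 MB) are assumed, and the full-scan estimate ignores disk load time.

```python
DISKS = 600            # disks in a commercial DVD library
GB_PER_DISK = 4.7      # expected DVD-RAM capacity
LOAD_SECONDS = 4.5     # average disk load time
IO_RATE_MB_S = 10.0    # aggregate rate with up to 6 drives


def library_capacity_tb(disks, gb_per_disk):
    """Total library capacity in TB (decimal units assumed)."""
    return disks * gb_per_disk / 1000.0


def cycle_minutes(disks, load_seconds):
    """Time spent on disk loads alone when cycling through every disk."""
    return disks * load_seconds / 60.0


def full_scan_hours(capacity_tb, rate_mb_s):
    """Time to stream the whole library at the aggregate IO rate."""
    return capacity_tb * 1_000_000 / rate_mb_s / 3600.0


capacity = library_capacity_tb(DISKS, GB_PER_DISK)  # about 2.8 TB
cycle = cycle_minutes(DISKS, LOAD_SECONDS)          # 45 minutes, under 1 hour
scan = full_scan_hours(capacity, IO_RATE_MB_S)
```

Under these assumptions, streaming the full library takes on the order of days, while the disk loads for a full cycle take under an hour.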

11
Web Interface
  • Plan to develop web-based user interface
  • Interface modelled on 3-tier architecture
    widely used in commercial applications
  • Physicist will enter event selection requirements
    using a Java applet
  • Applet communicates request to Physics
    Intelligence middleware running on PHASER system
    (via CORBA)
  • Translate request to SQL for event selection
  • Verify that request can be accommodated within
    resource constraints
  • Produce the requested output files

12
PHASER Output
  • Several output options
  • List of run and event numbers satisfying the
    request
  • Ntuple created from POD information
  • mDST stream containing requested events from DVD
    library
  • Output files will generally be small enough to
    transfer over the network
  • Larger output files can be written to DVD and
    physically sent to physicist for further analysis

13
Conclusions
  • PHASER offers a way for experts, novices, and
    'dinosaurs' alike to quickly extract information
    about a particular class of events
  • Feasibility of loading Run 1 size physics
    object info into a relational database has been
    demonstrated
  • Significant improvements in event selection time
    have been observed for W/Z benchmarks
  • Expect these results will scale up to Run 2 data
    load
  • Database technology is also potentially useful
    for helping manage complex analyses and storing
    intermediate results