Computing in HEP: Transcript and Presenter's Notes
1
Computing in HEP
  • An Introduction to Data Analysis in High Energy
    Physics
  • Max Sang
  • Applications for Physics Infrastructure Group
  • IT Division, CERN, Geneva
  • max.sang@cern.ch

2
Introduction to HEP
  • Accelerators produce high intensity, high energy
    beams of particles like protons or electrons.
  • Detectors are huge, multi-layered electronic
    devices constructed around the points where the
    beams collide with targets or other beams.
  • Planned and constructed by multinational
    collaborations of hundreds of people over several
    years.
  • Once operational, they run for years (e.g. LEP
    program 1989-2000).

3
The Large Hadron Collider
  • At CERN: 27 km circumference, 100 m below the
    surface
  • Eight underground caverns for detectors
  • First beam: 2006
4
CMS
  • Under construction now - ready 2006
  • 21 m long, 15 m diameter
  • 12500 tons
  • As much iron as the Eiffel Tower
  • 1900 physicists from 31 countries

5
Introduction to HEP (II)
  • Events are like photographs of individual
    subatomic interactions taken by the detectors.
  • Events are produced at high rates (kHz-MHz) for
    months at a time with minimal human intervention.
    Analysis continues for years.
  • Fundamental physics processes are quantum
    (probabilistic). They are uncorrelated
    (consecutive events unconnected) but occur at a
    wide range of frequencies - some very rare. Some
    are more interesting than others...

6
Introduction to HEP (III)
  • Data are grouped into runs, periods, years.
    Calibrations, detector faults, beam conditions,
    etc. are associated with certain time periods,
    e.g. "The calorimeter was off during run 1234".
  • Event Generators simulate the collisions and
    produce the final state particles.
  • These are processed by simulated detectors to
    produce Monte Carlo data for comparison with
    what we see in the real thing. Iterative process
    of comparison, tuning, model verification.

7
Extracting the Data
  • Passage of particles through detector components
    produces ionisation which is amplified to a
    detectable level.
  • Front-end electronics turn pulses into "digits".
  • Hardware processing turns "digits" into "hits".
  • Software turns "hits" into "tracks", "clusters",
    etc.
  • A multi-level trigger/filter decides which events
    to keep (sometimes only one event in 10⁷).
  • Online reconstruction → storage.

8
The LEP Era (Started 1989)
  • Four detectors (300 people each), producing:
  • 50 kHz collision rate → 5 Hz storage rate.
  • Event size 100 kB, reconstructed by a small farm
    of O(10) very high-end workstations.
  • < 500 GB/year/experiment.
  • Stored on tape (with disk caching) at CERN.
  • Analysed on mainframes by remote batch jobs.
  • Ntuples (~100 MB) returned to the user for more
    (interactive) analysis and calculation. Plots
    produced for presentations and papers.

9
The LHC Era (Starts 2006)
  • 4 detectors (6k people in total)
  • 50 MHz collision rate → 100 Hz storage rate.
  • 500 GB/s raw data rate after triggering.
  • Event size 1-2 MB, reconstructed by a farm of
    1k PCs.
  • 1 PB/year/experiment in 2007, increasing rapidly.
    Total by 2015 for all detectors: 100 PB.
  • Searches may look for single events in 10⁷. Every
    user (in 30 countries) will want to eat millions
    of events at a single sitting, with reasonably
    democratic data access.

10
Physicists are also Programmers
  • All data analysis done using computers
  • The physicists are all programmers, but almost
    none of them have any formal CS training
  • Some will be very experienced (usually F77). Will
    write lots of code for reconstruction, triggering
    etc.
  • Others write more modest programs for their own
    data analysis.
  • Some will be fresh graduate students who've never
    written a line of code.
  • Our job is to help them do physics.

11
What Software do they Need?
  • Experiment-specific code
    - Triggering, data acquisition, slow controls,
      reconstruction, new physics code
    - Mostly written by the experimentalists without
      assistance
  • Event generators
    - Highly technical, constantly in flux
    - Written by phenomenologists
  • We don't help with these!

12
What Software do they Need? (II)
  • Specialised HEP tools
    - Detector simulation tools, relativistic
      kinematics, ...
  • General purpose scientific tools with a HEP slant
    - Data visualisation, histogramming, ...
  • General purpose technical libraries
    - Random numbers, matrices, geometry, analytical
      statistics, 2D and 3D graphics, ...
  • We do help with these!

13
The Situation in 1995
  • Millions of lines of F77, some of it very
    technical
  • Thousands of man-years of debugging
  • Users know and love/hate the software, and they
    don't want to change
  • Serious and unavoidable maintenance commitment
    for old code - F77 is here to stay!
  • Shrinking manpower in IT division
  • Not long until the start of the LHC programme.
    Change now or wait until 2020!

14
The Old Software
  • Largely home-grown in the 70s and 80s
  • Persistent storage and memory management: ZEBRA
  • Code management: PATCHY
  • Scripting: KUIP/COMIS
  • Histograms and ntuples: HBOOK
  • Detector simulation: GEANT 3
  • Fitting and minimisation: MINUIT
  • Mathematics, random numbers, kinematics: MATHLIB
  • Graphics: HIGZ/HPLOT
  • Visualisation and interactive analysis: PAW

15
The Anaphe Project
  • Provide a modern, object-oriented, more flexible,
    more powerful replacement for CERNLIB with fewer
    people in less time.
  • Identify areas where commercial and/or Open
    Source products can (or must) be used instead of
    home-grown solutions
  • Concentrate efforts on HEP-specific tasks
  • Use object-oriented techniques and plan for very
    long term maintenance and evolution
  • Detector simulation is a separate project (v. big)

16
Commodity Solutions
  • Luckily, computing has also evolved.
  • What can we get off-the-shelf?
  • Open Source tools
    - Code management (CVS)
    - Graphics (Qt, OpenGL)
    - Scripting (Python, Perl)
  • Commercial products
    - Persistency (Objectivity OODB)
    - Mathematics (NAG library, CERN edition)

17
HEP Community Developments
  • Not everything is being done solely at CERN!
  • CLHEP - C++ class libraries for HEP
    - Random numbers
    - 3D geometry, vectors, matrices, kinematics
    - Units and dimensions
    - Generic HEP classes (particles, decay chains,
      etc.)
  • Generators being moved (slowly) to C++
  • The competition (JAS, Open Scientist, Root)

18
Anaphe C++ Libraries (I)
  • Fitting: FML (fitting and minimisation library)
    - Flexible, extensible library based on the
      Gemini engine
    - Gemini: core fitting engine based on NAG or
      MINUIT
  • Histograms: HTL (histogram template library)
    - Histograms are statistical distributions of
      measured quantities - the workhorse of HEP
      analysis. Must be flexible, extensible and very
      efficient (see the sketch below).
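To make the idea concrete, here is a minimal Python sketch of
what a 1D histogram does (purely illustrative - this is not
the HTL API, and all names here are invented):

    class Histo1D:
        """A fixed-binning 1D histogram: accumulates weights per bin."""
        def __init__(self, nbins, xmin, xmax):
            self.nbins, self.xmin, self.xmax = nbins, xmin, xmax
            self.bins = [0.0] * nbins
            self.underflow = self.overflow = 0.0

        def fill(self, x, weight=1.0):
            # Route the entry to underflow, overflow, or its bin.
            if x < self.xmin:
                self.underflow += weight
            elif x >= self.xmax:
                self.overflow += weight
            else:
                width = (self.xmax - self.xmin) / self.nbins
                self.bins[int((x - self.xmin) / width)] += weight

    h = Histo1D(nbins=10, xmin=0.0, xmax=100.0)
    for energy in (12.5, 47.0, 47.3, 99.9):   # toy measurements
        h.fill(energy)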

19
Anaphe C++ Libraries (II)
  • QPlotter: graphics package
    - For drawing histograms and more
    - Based on Qt (superset of Motif)
  • NtupleTag
    - Extends the concept of the ntuple (a static
      table of data)
    - Can add new columns as you work
    - Can navigate back to original events
    - Smart clustering of data
    - See Zsolt's presentation...

20
Interactive Analysis
  • Analysis in HEP = data mining: extract parameters
    from large multi-dimensional samples.
  • Typical tasks (see the sketch below):
    - Plot one or more variables with cuts on yet
      other variables - exploring the variable space.
    - Perform statistical tests on distributions
      (fitting, moments, etc.)
    - Produce histograms etc. for papers or talks.
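The first task, made concrete as a minimal Python sketch
(illustrative only - this is not Anaphe/Lizard code, and the
toy events are invented):

    # Each event is a dict of measured quantities (toy data).
    events = [{"x": 1.2, "y": 7.0}, {"x": 3.1, "y": 2.0},
              {"x": 2.8, "y": 9.5}, {"x": 0.4, "y": 6.1}]

    # "Plot x for all events with y > 5": apply the cut, then
    # study the surviving x values (here via a simple statistic).
    selected = [ev["x"] for ev in events if ev["y"] > 5]
    mean = sum(selected) / len(selected)
    print(len(selected), "events pass the cut, mean x =", round(mean, 2))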

21
Interactive Analysis (II)
  • Almost all analyses begin as interactive
    "playing" with the data and progress organically
    to large, complex, CPU-intensive procedures
    (sketched after this list).
  • Step 1: single commands to a script interpreter,
    e.g. "plot x for all events with y > 5"
  • Step 2: multi-command scripts/macros
  • Step 3: procedures can be translated into C++
    functions and called interactively
  • Step 4: user can build new libraries and interact
    with them through the command line (etc...)
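The same progression, sketched in Python (illustrative only;
Lizard's actual command syntax is not shown here):

    # Step 1: a one-off interactive command at the prompt.
    xs = [x for x in (1.2, 3.1, 2.8, 0.4) if x > 1.0]

    # Step 2: the same logic saved as a reusable macro/function.
    def select(values, threshold):
        """Keep the values passing a simple cut."""
        return [v for v in values if v > threshold]

    # Steps 3-4: once the macro stabilises, the logic moves into a
    # compiled C++ library and is still called interactively through
    # a wrapper - the call below would look exactly the same.
    print(select((1.2, 3.1, 2.8, 0.4), 1.0))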

22
Interactive Analysis (III)
  • The progression from command line, to macro, to
    compiled library, should be smooth and simple.
  • Doing the easy things should be easy to allow
    rapid development and prototyping of algorithms.
  • Doing complex things then becomes significantly
    easier than starting from scratch in C++
  • Distributed analysis must also be possible (see
    Kuba's talk)

23
Lizard (I)
  • Interactive environment for data analysis using
    the other Anaphe components
  • First prototype (with limited functionality)
    available since CHEP 2000
  • Re-design started in April 2000
  • Beta version October 2000
  • Full version out since June 2001
  • Much more work and testing to do, but already
    approaching (and surpassing) PAW functionality
  • Embedded in Python

24
Lizard (II)
  • Architecture:
  • Everything interacts with everything else through
    abstract interfaces, so the implementation is
    hidden.
  • Commander C++ classes load the implementation
    classes at run time and become proxies for them.
  • SWIG is used to generate shadow classes from the
    Commander header files. These are compiled into
    the Python library and become accessible as new
    Python objects.
  • Swapping components at run time becomes trivial
    (see the sketch below).
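The proxy idea can be sketched in a few lines of plain Python
(this mimics what the SWIG-generated shadow classes do; it is
not SWIG output, and the class name is invented):

    import importlib

    class Commander:
        """Loads a concrete implementation class at run time and
        forwards every call to it, hiding the implementation."""
        def __init__(self, module_name, class_name, *args):
            impl = getattr(importlib.import_module(module_name), class_name)
            self._impl = impl(*args)

        def __getattr__(self, name):
            # Any attribute not found here is delegated to the
            # hidden implementation object.
            return getattr(self._impl, name)

    # Swapping components means changing two strings, nothing else.
    counts = Commander("collections", "Counter")
    counts.update(["pion", "kaon", "pion"])
    print(counts.most_common(1))        # [('pion', 2)]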

25
Lizard Screenshot
26
Behind the Scenes
[Architecture diagram: the user talks to Python; shadow classes
(automatically generated by SWIG) and Controller classes sit
behind the AIDA and C++ interfaces, which hide the Anaphe C++
implementations.]
27
AIDA
  • Use of abstract interfaces promotes weak coupling
    between components.
  • The AIDA (Abstract Interfaces for Data Analysis)
    project is extending this to community-wide
    standard interfaces which will allow use of C++
    components in Java and vice versa.
  • Developers only need to learn one way of
    interacting with a histogram, which works with
    all compliant implementations (see the sketch
    below).
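What "programming to the interface" means, as a small Python
sketch (loosely inspired by AIDA's histogram interfaces; the
Python rendering and the class names are our own invention):

    from abc import ABC, abstractmethod

    class IHistogram1D(ABC):
        """Abstract interface: analysis code depends only on this."""
        @abstractmethod
        def fill(self, x, weight=1.0): ...

    class ListHistogram(IHistogram1D):
        """One compliant implementation: simply records the entries."""
        def __init__(self):
            self.entries = []
        def fill(self, x, weight=1.0):
            self.entries.append((x, weight))

    def analyse(histo: IHistogram1D, data):
        # Works unchanged with *any* compliant implementation,
        # whatever library (or language binding) provides it.
        for x in data:
            histo.fill(x)

    h = ListHistogram()
    analyse(h, [0.3, 1.7, 2.9])
    print(len(h.entries))   # 3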

28
Summary
  • HEP has (and has always had) serious computing
    requirements
  • The old model (F77 monoliths) is no longer
    workable in the LHC era
  • New software in C++ and Java uses modern software
    design to plan for the long term
  • Anaphe is CERN IT division's contribution
  • Flexible, extensible, modular, efficient
  • The LHC is coming and we must be ready!

29
Further information
  • More information about the detectors and HEP in
    general:
    - http://cmsinfo.cern.ch
    - http://cern.ch/atlas
  • CERN IT Division:
    - http://cern.ch/IT
  • The Anaphe project:
    - http://cern.ch/Anaphe