Introduction to Analysis Methods Part I

1
Introduction to Analysis Methods Part I
  • Suyong Choi
  • SKKU

2
Outline
  • Steps of Data Analysis
  • Signal significance
  • Cut based and advanced analysis methods

3
Analysis and Analysis Methods
  • Extraction of physical parameters from data
  • Mass
  • Charge
  • Cross section
  • Branching fraction
  • What is the best method to use?
  • Unbiased: we can usually correct for this if the
    bias is known
  • Efficient: smallest variance or uncertainty
  • Biased estimator usually has less sensitivity

4
Data Analysis Steps
All Data
  • High efficiency for data
  • Detector/trigger problems
  • Not calibrated

Cleaned up Calibrated Data
  • Reject events with known detector problems
  • Reject unnormalizable data
  • Reject cosmic ray events
  • Apply detector calibration

Preselection
  • Select on triggers, loose selection
  • high efficiency signal
  • moderate background rejection
  • Understand background

Final Selection
5
Data Analysis Steps
  • Data clean up
  • Reject known bad detector problems
  • Reject unnormalizable data: problem with
    luminosity measurement (usually due to triggering
    problems)
  • Cosmic ray event rejection
  • Apply Calibration/correction
  • Tracker: known energy loss accommodated in
    software. Additional correction usually necessary
  • Calorimeter: extremely important, raw energy
    unusable.
  • For data/MC comparisons, efficiency correction
    made to the simulation

6
Data Analysis Steps: Preselection
  • Trigger selection
  • For the signal sample, use unprescaled triggers
  • Trigger efficiencies are measured on unbiased,
    well-known samples
  • Usually dictated by final state particles and
    kinematics
  • Offline selection
  • Triggered objects are necessarily dirty: the
    trigger favors speed and efficiency
  • Objects from offline reconstruction are closer to
    physics
  • Kinematic, geometric criteria
  • Topological (angular)

[Figure: triggered object vs. reconstructed object]
7
Data Analysis Steps: Selection
  • After preselection, data is dominated by
    backgrounds
  • This data should be used to check simulations of
    backgrounds extensively assuming the signal is
    small
  • The aim of selection is to reject background
    while keeping signal as much as possible
  • Isolate signal
  • How much is good enough? What we want is the best
    measurement, or the best possibility for
    discovery

8
Toy Example
  • A mix of signal (gaussian) and background (flat)
    in some phase space
  • Hypothetical signal is at 50, gaussian width of 5
  • Ratio of signal to background: 3:100
  • With 1000 events you don't see anything
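
A minimal sketch of how such a toy sample could be generated, using only the Python standard library. The x range of 0 to 100, the 50 bins, the seed, and the function names are assumptions made for illustration; the transcript only gives the signal position (50), its width (5), and the 3:100 signal-to-background ratio.

```python
import random

def generate_toy(n_events, sb_ratio=3 / 100, mean=50.0, width=5.0,
                 lo=0.0, hi=100.0, seed=1):
    """Draw a Gaussian signal on a flat background, mixed at S:B = sb_ratio."""
    rng = random.Random(seed)
    p_signal = sb_ratio / (1.0 + sb_ratio)  # chance that one event is signal
    events = []
    for _ in range(n_events):
        if rng.random() < p_signal:
            events.append(rng.gauss(mean, width))  # signal: Gaussian bump
        else:
            events.append(rng.uniform(lo, hi))     # background: flat
    return events

def histogram(events, n_bins=50, lo=0.0, hi=100.0):
    """Count events in equal-width bins; events outside [lo, hi) are dropped."""
    counts = [0] * n_bins
    bin_width = (hi - lo) / n_bins
    for x in events:
        if lo <= x < hi:
            counts[int((x - lo) / bin_width)] += 1
    return counts

# Compare the ten bins around x = 50 for the small and the 10x larger sample
for n in (1000, 10000):
    print(n, histogram(generate_toy(n))[20:30])
```

Printing the bins around x = 50 for both sample sizes gives a rough feel for when the bump starts to stand out from the fluctuations of the flat background.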

9
Toy Example with more statistics
  • With enough data, even with poor S/B the bump
    will eventually be seen
  • With x10 more data you clearly see things

10
Background Prediction
  • Don't define the signal region by looking at the
    data first!
  • Expected background events in signal region
    obtained by
  • Sideband method
  • MC Simulated events
  • ⇒ We get the mean number of expected bkg events, B
  • Expect that 68% of the time the actual number of
    bkg events is within $B \pm \sqrt{B}$
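
The sideband method can be sketched as below, assuming a flat background so that the expected count scales with interval width. The window (35 to 65) and sideband boundaries are hypothetical choices, and the function name is invented for this example; the statistical uncertainty of the sideband count itself is ignored here.

```python
import math
import random

def sideband_estimate(events, window=(35.0, 65.0),
                      sidebands=((5.0, 35.0), (65.0, 95.0))):
    """Predict the background in the signal window from the sidebands.

    Assumes the background is flat, so it scales with interval width.
    """
    n_side = sum(1 for x in events
                 for lo, hi in sidebands if lo <= x < hi)
    side_width = sum(hi - lo for lo, hi in sidebands)
    window_width = window[1] - window[0]
    b_expected = n_side * window_width / side_width
    # Poisson spread of the actual count around its mean is sqrt(B)
    return b_expected, math.sqrt(b_expected)

# Hypothetical flat background on 0..100 (not the presenter's data)
rng = random.Random(2)
background = [rng.uniform(0.0, 100.0) for _ in range(10000)]
b, spread = sideband_estimate(background)
print(f"expected bkg in window: {b:.0f} +/- {spread:.0f}")
```

Because the window is fixed before looking at the events inside it, the prediction does not depend on any fluctuation in the signal region.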

[Figure: signal window]
11
Figure of Merit Signal Significance
  • Let's say the total number of events from data in
    the signal window is N
  • Goal of event selection: make N sufficiently
    greater than the expected background (B)

Signal significance: $(N - B)/\sqrt{N}$
[Figure: signal window with expected background B]
12
Signal Significance and Measurement
  • Let's say our measurement is a cross section
  • Fractional error on the cross section measurement
  • If we can neglect other errors
  • ⇒ Maximizing signal significance minimizes
    measurement error.
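
One way to fill in the step, as a sketch rather than the presenter's own formula: assume the cross section is estimated from the observed count N and expected background B with an efficiency $\epsilon$ and integrated luminosity $L$ (both symbols are introduced here and are not in the transcript), and that only the Poisson fluctuation of N contributes to the error.

```latex
\sigma = \frac{N - B}{\epsilon L}, \qquad
\frac{\delta\sigma}{\sigma} = \frac{\sqrt{N}}{N - B}
  = \frac{1}{\text{signal significance}}
```

Under these assumptions the fractional error is the inverse of $(N - B)/\sqrt{N}$, so a larger significance means a smaller relative uncertainty.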

13
Signal Significance in Our Toy Example
  • 15 bins × 193 ≈ 3000
  • Sqrt(3300) ≈ 57
  • (3300 - 3000)/57 ≈ 5
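
The arithmetic above can be checked with a short sketch, reading the numbers as N = 3300 observed events and B = 3000 expected background events, with significance $(N - B)/\sqrt{N}$. The function name is made up for this example.

```python
import math

def significance(n_observed, b_expected):
    """Signal significance (N - B) / sqrt(N)."""
    return (n_observed - b_expected) / math.sqrt(n_observed)

print(round(math.sqrt(3300), 1))           # 57.4
print(round(significance(3300, 3000), 2))  # 5.22
```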

14
With Larger S/B
  • Same 300 signal events
  • Bkg: 15 bins × 94 = 1410
  • Sqrt(1700) ≈ 41
  • Significance: 270/41 ≈ 6.6
  • Optimal selection
  • If two different selections have different B for
    the same S, then the one with smaller B is better
  • Real-life selection reduces both B and S. In this
    case, the one with the greatest signal
    significance is the one to use
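
As an illustration of choosing between selections that change both S and B, the sketch below scans the half-width of a window centred on the Gaussian signal and keeps the one with the largest expected $S/\sqrt{S+B}$. The background density of 96.5 events per unit of x assumes 193 events per bin with bins 2 units wide; that bin width, like the function names, is an assumption and not stated in the transcript.

```python
import math

def expected_counts(half_width, s_total=300.0, sigma=5.0, bkg_per_unit=96.5):
    """Expected signal and background inside a window of +/- half_width."""
    # Fraction of a Gaussian within +/- half_width of its mean
    s = s_total * math.erf(half_width / (sigma * math.sqrt(2.0)))
    # Flat background grows linearly with the window width
    b = bkg_per_unit * 2.0 * half_width
    return s, b

def expected_significance(half_width):
    s, b = expected_counts(half_width)
    return s / math.sqrt(s + b)

widths = [0.5 * k for k in range(1, 41)]  # half-widths from 0.5 to 20
best = max(widths, key=expected_significance)
print("best half-width:", best)
print("significance there:", round(expected_significance(best), 2))
print("significance at +/-15:", round(expected_significance(15.0), 2))
```

A very wide window keeps nearly all the signal but lets in too much background, while a very narrow one throws signal away, so the maximum lies in between.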

Compare Errors with Previous page
15
Signal Significance and Discovery
  • With more luminosity, signal significance
    improves
  • For searches, we usually deal with S/B and
    sometimes you see the significance defined as
    follows
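
The luminosity statement can be made concrete with a small sketch: if S and B both grow in proportion to the luminosity, $S/\sqrt{S+B}$ grows like its square root. The second function shows $S/\sqrt{B}$, a form commonly quoted when S is much smaller than B; whether that is the definition shown on the original slide is not recoverable from the transcript, so it is included only as an assumption. The yields used are hypothetical.

```python
import math

def significance_vs_luminosity(s, b, scale):
    """S / sqrt(S + B) after scaling both yields by a luminosity factor."""
    return (scale * s) / math.sqrt(scale * (s + b))

def simple_search_significance(s, b):
    """S / sqrt(B): common approximation when S is much smaller than B."""
    return s / math.sqrt(b)

s, b = 30.0, 970.0  # hypothetical yields at the starting luminosity
for scale in (1, 10, 100):
    print(scale, round(significance_vs_luminosity(s, b, scale), 2))
print(round(simple_search_significance(s, b), 2))
```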

16
Cut Based Analysis
  • Series of selections to increase signal
    significance
  • From the simple to complicated topological
    variables
  • Finally, distribution of the most sensitive
    variable is made
  • Simple counting
  • Distribution fitting
  • Problems of cut-based analysis
  • Throws away signal
  • Which variables to cut on?
  • Which order to cut?

17
Samples with Different S/B
  • Phase space occupied by data is multidimensional
  • Measurements: pT, η, φ of leptons and jets
  • In this space, there are regions with different
    S/B
  • What is the best way to use this data to make
    the best measurement? Answer: use the regions
    separately

18
Optimal Analysis
  • Optimal decision theory tells us to analyze data
    after binning the data with different S/B or some
    monotonic function of S/B
  • Of course, we need ways to calculate S/B or
    S/(S+B) given an event
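
A small sketch of why binning by S/B helps, under the assumption that independent bins can be combined by adding their $S/\sqrt{S+B}$ values in quadrature (a Gaussian approximation, not something stated in the transcript). The (S, B) pairs are hypothetical.

```python
import math

def lumped_significance(bins):
    """Treat all bins as one counting experiment."""
    s = sum(s_i for s_i, _ in bins)
    b = sum(b_i for _, b_i in bins)
    return s / math.sqrt(s + b)

def combined_significance(bins):
    """Quadrature sum of the per-bin S / sqrt(S + B)."""
    return math.sqrt(sum(s_i ** 2 / (s_i + b_i) for s_i, b_i in bins))

# Hypothetical regions with the same S but very different S/B
bins = [(100.0, 100.0), (100.0, 1000.0), (100.0, 10000.0)]
print(round(lumped_significance(bins), 2))    # 2.81
print(round(combined_significance(bins), 2))  # 7.75
```

When every bin has the same S/B the two numbers coincide; the gain comes entirely from keeping regions of different purity apart.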

19
Signal and Background Likelihoods
  • Given a set of measurements in an event, we ask
    what is the probability that this event is due to
    signal?
  • $L_S$ ($L_B$) are called the signal (background)
    likelihood
  • Connection with Bayes' theorem
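
The connection can be written out as follows. The prior probabilities $P(S)$ and $P(B)$ are symbols introduced here for the sketch and do not appear in the transcript.

```latex
P(S \mid x) = \frac{L_S(x)\, P(S)}{L_S(x)\, P(S) + L_B(x)\, P(B)}
```

With equal priors this reduces to $L_S/(L_S + L_B)$, which depends only on the ratio $L_S/L_B$.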

20
Example
Absolute value of the likelihood is meaningless
[Figure: points x1 and x2 marked on the x axis]
  • x1: $P_S(x_1) = 0 \Rightarrow P(S|x_1) = 0$
  • x2: $P_S(x_2) = 2 P_B(x_2) \Rightarrow P(S|x_2) = 2/3$
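
The two points can be reproduced with a short sketch of Bayes' theorem, assuming equal prior probabilities for signal and background (which is what gives 2/3 at x2). The function name and the density values are hypothetical; only their ratio matters.

```python
def posterior_signal(p_s, p_b, prior_s=0.5):
    """P(S|x) from the signal and background densities at x."""
    numerator = p_s * prior_s
    denominator = numerator + p_b * (1.0 - prior_s)
    if denominator == 0.0:
        raise ValueError("both likelihoods vanish at this point")
    return numerator / denominator

print(posterior_signal(0.0, 0.3))  # x1: no signal density, so 0.0
print(posterior_signal(0.2, 0.1))  # x2: signal density twice the background
print(posterior_signal(2.0, 1.0))  # same ratio, ten times larger values
```

The last two calls give the same posterior, which is the sense in which the absolute value of the likelihood carries no information.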

21
Optimal Analysis in Multidimensions
  • Calculating the likelihoods in multidimensions is
    difficult
  • We must rely on multivariate analysis techniques
  • Multidimensional likelihoods
  • Neural networks
  • Boosted decision trees
  • Support vector machines
  • What do they do?
  • They try to reconstruct $P(S|x)$ from training
    samples
  • We treat multivariate analysis tools as mostly
    black boxes
  • ROOT v5 has TMVA class with many of these methods
    implemented

22
Multivariate Analysis
  • All we really need to know is the likelihood
    ratio, or posterior probability for signal given
    data
  • A general method is given by the Artificial
    Neural Network (ANN)
  • With 1 hidden layer, it can approximate any
    continuous function
  • With 2 hidden layers, it can approximate even
    discontinuous functions

23
NN Training
  • Important parts are the weights and thresholds
    of hidden nodes
  • And sometimes the non-linear response function g
  • We train to get these parameters
  • Backpropagation training
  • Prepare a training sample and a test sample of
    signal and background
  • For signal, desired output is 1. For bkg, 0
  • Minimize training error
  • Numerical methods are used
  • There are many implementations of ANN

24
Neural Network Example
  • 2-D
  • Signal points: r < 1
  • Background points: 1 < r < 2
  • NN architecture - 1 hidden node
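
A minimal sketch of this kind of example, written in plain Python rather than with any of the ANN packages the presenter may have used. It builds the 2-D sample (signal inside r < 1, background in 1 < r < 2), uses one hidden layer of sigmoid nodes, and trains by full-batch backpropagation on the squared error. The learning rate, epoch count, seeds, and all function names are assumptions, and only the training error is reported, with no separate test sample.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_sample(n, rng):
    """Signal (target 1) inside r < 1, background (target 0) in 1 < r < 2."""
    data = []
    while len(data) < n:
        x, y = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        r = math.hypot(x, y)
        if r < 1.0:
            data.append((x, y, 1.0))
        elif r < 2.0:
            data.append((x, y, 0.0))
    return data

def forward(net, x, y):
    """Return the hidden-node outputs and the network output for one point."""
    w1, w2 = net
    hidden = [sigmoid(w[0] * x + w[1] * y + w[2]) for w in w1]
    out = sigmoid(sum(v * h for v, h in zip(w2, hidden)) + w2[-1])
    return hidden, out

def mean_squared_error(net, data):
    return sum((forward(net, x, y)[1] - t) ** 2 for x, y, t in data) / len(data)

def train(data, n_hidden, epochs=200, rate=0.5, seed=0):
    """Full-batch gradient descent on the squared error (backpropagation)."""
    rng = random.Random(seed)
    # Each hidden node has two input weights and a threshold (bias)
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    # Output node: one weight per hidden node plus a threshold
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    net = (w1, w2)
    for _ in range(epochs):
        g1 = [[0.0, 0.0, 0.0] for _ in range(n_hidden)]
        g2 = [0.0] * (n_hidden + 1)
        for x, y, t in data:
            hidden, out = forward(net, x, y)
            d_out = (out - t) * out * (1.0 - out)  # error signal at the output
            for j, h in enumerate(hidden):
                g2[j] += d_out * h
                d_hid = d_out * w2[j] * h * (1.0 - h)  # propagated to node j
                g1[j][0] += d_hid * x
                g1[j][1] += d_hid * y
                g1[j][2] += d_hid
            g2[-1] += d_out
        step = rate / len(data)
        for j in range(n_hidden):
            for k in range(3):
                w1[j][k] -= step * g1[j][k]
        for j in range(n_hidden + 1):
            w2[j] -= step * g2[j]
    return net

if __name__ == "__main__":
    sample = make_sample(200, random.Random(1))
    for n_hidden in (1, 2, 7):
        net = train(sample, n_hidden, epochs=500)
        print(n_hidden, "hidden nodes: training error",
              round(mean_squared_error(net, sample), 3))
```

Plain gradient descent converges slowly, so more epochs or a better optimizer may be needed before the different numbers of hidden nodes separate clearly.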

25
1 Hidden Node Training Result
26
  • Large portion of the background is mistaken as
    signal

27
2 Hidden Nodes
  • Training error is reduced

28
Example
  • Result of the training

30
7 nodes
31
7 Hidden Node Result
  • How many hidden nodes?
  • Depends on how complicated the signal and
    background distributions in the multidimensional
    space are

32
Higgs Search Example at D0
33
Conclusion
  • In a cut-based analysis, the goal is not to
    improve S/B, but to improve signal significance
  • For optimal analysis, we should make use of all
    data where signal is present, no matter how small
  • Selection should be minimal, leave the rest up to
    multivariate tools

34
References
  • Particle Data Group's Statistics Review
    http://pdg.lbl.gov/2007/reviews/statrpp.pdf
  • Probability and Statistics in Experimental
    Physics, Springer-Verlag
  • Look up ROOT TMVA class in v5.12 or higher
    http://root.cern.ch/root/Reference.html
  • Phystat 2003 conference proceedings
    http://www.slac.stanford.edu/econf/C030908/
  • Advanced material