1
Probability, Statistics and Errors in High Energy
Physics
Wen-Chen Chang
Institute of Physics, Academia Sinica
2
Outline
  • Errors
  • Probability distributions: Binomial, Poisson,
    Gaussian
  • Confidence Level
  • Monte Carlo Method

3
Why do we do experiments?
  • Parameter determination: determine the numerical
    value of some physical quantity.
  • Hypothesis testing: test whether a particular
    theory is consistent with our data.

4
Why estimate errors?
  • We are concerned not only with the answer but
    also with its accuracy.
  • For example, the speed of light: 2.998×10^8 m/sec
  • (3.09 ± 0.15)×10^8
  • (3.09 ± 0.01)×10^8
  • (3.09 ± 2)×10^8

5
Source of Errors
  • Random (statistical) error: the inability of any
    measuring device to give infinitely accurate
    answers.
  • Systematic error: uncertainty.

6
Systematic Errors
"Systematic error: reproducible inaccuracy
introduced by faulty equipment, calibration, or
technique." (Bevington)
  • "Systematic effects is a general category which
    includes effects such as background, scanning
    efficiency, energy resolution, angle resolution,
    variation of counter efficiency with beam
    position and energy, dead time, etc. The
    uncertainty in the estimation of such a
    systematic effect is called a systematic error."
    (Orear)

Error = mistake?
Error = uncertainty?
7
Experimental Examples
  • Energy in a calorimeter: E = aD + b
  • a, b determined by calibration experiment
  • Branching ratio: B = N/(εN_T)
  • ε found from Monte Carlo studies
  • Steel rule calibrated at 15°C but used in a warm lab
  • If not spotted, this is a mistake
  • If temp. measured, not a problem
  • If temp. not measured: guess → uncertainty

Repeating measurements doesn't help.
8
The Binomial
A random process with exactly two possible
outcomes which occur with fixed probabilities.
  • n trials, r successes
  • Individual success probability p (failure
    probability q = 1 − p)

P(r) = [n!/(r!(n − r)!)] p^r (1 − p)^(n − r)
Mean: μ = <r> = Σ r P(r) = np
Variance: V = σ² = <(r − μ)²> = <r²> − <r>² = np(1 − p)
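As a quick numerical check, a minimal sketch in plain Python (the values n = 10, p = 0.2 are arbitrary illustrations):

from math import comb

def binomial_pmf(r, n, p):
    # Probability of exactly r successes in n trials
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.2
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
mean = sum(r * pr for r, pr in enumerate(probs))
var = sum(r**2 * pr for r, pr in enumerate(probs)) - mean**2
print(mean, n * p)           # both 2.0
print(var, n * p * (1 - p))  # both 1.6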
9
Binomial Examples
[Plots of binomial distributions for n = 5, 10, 20, 50 with p = 0.1, 0.2, 0.5, 0.8]
10
Poisson
  • Events in a continuum
  • The probability of observing r independent events
    in a time interval t, when the counting rate is ν
    and the expected number of events in the time
    interval is λ = νt:

P(r) = e^(−λ) λ^r / r!

[Plot of the Poisson distribution for λ = 2.5]
Mean: μ = <r> = Σ r P(r) = λ
Variance: V = σ² = <(r − μ)²> = <r²> − <r>² = λ
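A similar check that mean = variance = λ, sketched in plain Python (λ = 2.5 as in the plot; the infinite sum is truncated far out in the tail):

from math import exp, factorial

def poisson_pmf(r, lam):
    # Probability of observing exactly r events when lam are expected
    return exp(-lam) * lam**r / factorial(r)

lam = 2.5
rs = range(50)  # truncate the sum where the tail is negligible
mean = sum(r * poisson_pmf(r, lam) for r in rs)
var = sum(r**2 * poisson_pmf(r, lam) for r in rs) - mean**2
print(mean, var)  # both ~2.5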
11
More about Poisson
  • The binomial distribution approaches the Poisson
    distribution as n increases (with np held fixed).
  • The mean value of r for a variable with a Poisson
    distribution is λ, and so is the variance. This is
    the basis of the well-known n ± √n formula that
    applies to statistical errors in many situations
    involving the counting of independent events
    during a fixed interval.
  • As λ → ∞, the Poisson distribution tends to a
    Gaussian one.

12
Poisson Examples
[Plots of Poisson distributions for λ = 0.5, 1.0, 2.0, 5.0, 10, 25]
13
Examples
  • The number of particles detected by a counter in
    a time t, in a situation where the particle flux φ
    and detector are independent of time, and where
    the counter dead-time τ is such that φτ << 1.
  • The number of interactions produced in a thin
    target when an intense pulse of N beam particles
    is incident on it.
  • The number of entries in a given bin of a
    histogram when the data are accumulated over a
    fixed time interval.

14
Binomial and Poisson
  • From an exam paper
  • A student is standing by the road, hoping to
    hitch a lift. Cars pass according to a Poisson
    distribution with a mean frequency of 1 per
    minute. The probability of an individual car
    giving a lift is 1%. Calculate the probability
    that the student is still waiting for a lift
  • (a) After 60 cars have passed
  • (b) After 1 hour

(a) 0.99^60 = 0.5472
(b) e^(−0.6) × 0.6^0 / 0! = 0.5488
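The same two numbers in a plain Python sketch: a binomial with 60 fixed trials, and a Poisson with mean 60 × 0.01 = 0.6 lifts in the hour:

from math import exp

p_binomial = 0.99**60   # no success in 60 trials
p_poisson = exp(-0.6)   # r = 0 term: e^(-0.6) * 0.6^0 / 0!
print(round(p_binomial, 4))  # 0.5472
print(round(p_poisson, 4))   # 0.5488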
15
Gaussian (Normal)
  • Probability density:

P(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))
Mean: <x> = ∫ x P(x) dx = μ
Variance: V = <(x − μ)²> = <x²> − <x>² = σ²
16
Different Gaussians
There's only one!
Width scaling factor σ: falls to 1/e of peak at x = μ ± √2 σ
Normalisation 1/(σ√(2π)) (if required)
Location change: μ
17
Probability Contents
  • 68.27% within 1σ
  • 95.45% within 2σ
  • 99.73% within 3σ

90% within 1.645σ; 95% within 1.960σ; 99% within 2.576σ; 99.9% within 3.290σ
These numbers apply to Gaussians and only Gaussians.
Other distributions have equivalent values which you could use if you wanted.
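These contents follow from the Gaussian integral: the fraction within k standard deviations of the mean is erf(k/√2). A quick check in plain Python:

from math import erf, sqrt

for k in (1, 2, 3, 1.645, 1.960, 2.576, 3.290):
    # Fraction of a Gaussian within k sigma of the mean, in percent
    print(k, round(100 * erf(k / sqrt(2)), 2))
# 1 -> 68.27, 2 -> 95.45, 3 -> 99.73, 1.645 -> 90.0, ...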
18
Central Limit Theorem
  • Or why is the Gaussian Normal?
  • If a variable x is produced by the convolution of
    variables x1, x2 … xN:
  • (i) <x> = μ1 + μ2 + … + μN
  • (ii) V(x) = V1 + V2 + … + VN
  • (iii) P(x) becomes Gaussian for large N (see the
    sketch below)
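A minimal demonstration in plain Python: the sum of N = 12 uniform variables, each with mean 1/2 and variance 1/12, has mean N/2 and variance N/12; a histogram of the collected sums would already look Gaussian.

import random

N, trials = 12, 100_000
sums = [sum(random.random() for _ in range(N)) for _ in range(trials)]
mean = sum(sums) / trials
var = sum((s - mean)**2 for s in sums) / trials
print(mean, var)  # ~6.0 and ~1.0 (i.e. N/2 and N/12)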

19
Multidimensional Gaussian
P(x) = exp(−(x − μ)ᵀ V⁻¹ (x − μ)/2) / ((2π)^(n/2) |V|^(1/2))
20
Chi squared
  • Sum of squared discrepancies, scaled by expected
    error: χ² = Σ (xi − μi)²/σi²
  • Integrate all but 1-D of a multi-D Gaussian
22
About Estimation
Probability Calculus: Theory → Data
Given these distribution parameters, what can we
say about the data?

Statistical Inference: Data → Theory
Given this data, what can we say about the
properties or parameters or correctness of the
distribution functions?
23
What is an estimator?
  • An estimator (written with a hat) is a function
    of the data whose value, the estimate, is
    intended as a meaningful guess for the value of
    the parameter θ. (from the PDG)

24
What is a good estimator?
One often has to work with less-than-perfect
estimators.
  • A perfect estimator is:
  • Consistent
  • Unbiased
  • Efficient: minimum variance, attaining the
    Minimum Variance Bound
25
The Likelihood Function
Set of data x1, x2, x3, … xN. Each x may be
multidimensional (never mind). Probability depends
on some parameter a; a may be multidimensional
(never mind). Total probability (density):

P(x1; a) P(x2; a) P(x3; a) … P(xN; a) = L(x1, x2, x3, … xN; a)

the Likelihood
26
Maximum Likelihood Estimation
Given data x1, x2, x3, … xN, estimate a by
maximising the likelihood L(x1, x2, x3, … xN; a).

In practice one usually maximises ln L, as it's easier
to calculate and handle: just add up the ln P(xi). ML
has lots of nice properties; a numerical sketch follows below.
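A minimal sketch of ML in practice, assuming NumPy and SciPy are available (MINUIT plays the same role in HEP work): estimate the mean and width of a Gaussian sample by minimising −ln L, hence the minus sign below. The toy data are an assumption for illustration.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=0.5, size=1000)  # toy data

def neg_log_likelihood(params):
    # -ln L for a Gaussian, dropping constant terms
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    return 0.5 * np.sum(((x - mu) / sigma) ** 2) + len(x) * np.log(sigma)

result = minimize(neg_log_likelihood, x0=[1.0, 1.0], method="Nelder-Mead")
print(result.x)  # ~ [3.0, 0.5]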
27
Properties of ML estimation
  • It's consistent
  • (no big deal)
  • It's biased for small N
  • (may need to worry)
  • It's efficient for large N
  • (saturates the Minimum Variance Bound)
  • It's invariant
  • (if you switch to using u(a), then û = u(â))

[Plot: ln L vs u, with the peak at û = u(â)]
28
More about ML
  • It is not right. Just sensible.
  • It does not give the most likely value of a.
    It's the value of a for which this data is most
    likely.
  • Numerical methods are often needed
  • Maximisation / minimisation in >1 variable is not
    easy
  • Use MINUIT, but remember the minus sign (it
    minimises, so feed it −ln L)

29
ML does not give goodness-of-fit
  • ML will not complain if your assumed P(x; a) is
    rubbish
  • The value of L tells you nothing

Fit P(x) = a1 x + a0: this will give a1 = 0 and
constant P, so L = a0^N, just like you get from
fitting a constant.
30
Least Squares
[Plot: measurements of y at various x, with error bars, and the curve f(x; a)]
  • Measurements of y at various x, with errors σ and
    prediction f(x; a)
  • Probability: P(y) ∝ exp(−(y − f(x; a))²/(2σ²))
  • ln L = −(1/2) Σ (yi − f(xi; a))²/σi² + const
    = −χ²/2 + const
  • To maximise ln L, minimise χ²

So ML proves Least Squares. But what proves
ML? Nothing.
31
Least Squares The Really nice thing
  • Should get χ² ≈ 1 per data point
  • Minimising χ² makes it smaller: the effect is 1 unit
    of χ² for each variable adjusted. (Dimensionality
    of the multi-D Gaussian decreased by 1.)
  • N_degrees-of-freedom = N_data-points − N_parameters
  • Provides a goodness-of-agreement figure which
    allows a credibility check (see the sketch below)
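A minimal sketch of a straight-line least-squares fit with the χ²/N_DoF credibility check, assuming NumPy; the data points and errors are illustrative toys:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sigma = np.full_like(y, 0.2)  # measurement errors

# Weighted linear fit y = a*x + b (weights 1/sigma on the residuals)
a, b = np.polyfit(x, y, deg=1, w=1.0 / sigma)
chi2 = np.sum(((y - (a * x + b)) / sigma) ** 2)
ndof = len(x) - 2             # data points minus fitted parameters
print(a, b, chi2 / ndof)      # chi2/ndof ~ 1 signals a credible fit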

32
Chi Squared Results
  • Large χ² comes from:
  • Bad measurements
  • Bad theory
  • Underestimated errors
  • Bad luck
  • Small χ² comes from:
  • Overestimated errors
  • Good luck

33
Fitting Histograms
  • Often put the xi into bins
  • The data are then the bin contents nj
  • nj given by Poisson,
  • with mean fj = f(xj) = P(xj) Δx
  • 4 techniques:
  • Full ML
  • Binned ML
  • Proper χ²
  • Simple χ²

[Plots: the raw xi along the x axis, and the same data binned into a histogram]
34
What you maximise/minimise
  • Full ML: maximise Σ ln P(xi; a)
  • Binned ML: maximise Σ (nj ln fj − fj)
  • Proper χ²: minimise Σ (nj − fj)²/fj
  • Simple χ²: minimise Σ (nj − fj)²/nj
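A minimal sketch of the three binned objectives above, assuming NumPy (full ML needs the unbinned xi, so it is omitted); the bin contents and the flat model are illustrative toys:

import numpy as np

n = np.array([12, 9, 14, 8, 11])  # toy bin contents

def binned_ml(f):
    # Poisson log-likelihood per bin, dropping the ln(n!) constant
    return np.sum(n * np.log(f) - f)

def proper_chi2(f):
    # Errors taken from the theory: sqrt(f)
    return np.sum((n - f) ** 2 / f)

def simple_chi2(f):
    # Errors taken from the data: sqrt(n)
    return np.sum((n - f) ** 2 / n)

f = np.full(len(n), n.mean())  # flat model at the mean bin content
print(binned_ml(f), proper_chi2(f), simple_chi2(f))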

35
Confidence Level: Meaning of Error Estimates
  • How often do we expect to include the true fixed
    value of our parameter, P0, within our quoted
    range p ± δp, for a repeated series of
    experiments?
  • For the actual value P0, the probability that a
    measurement will give us an answer in a specific
    range of p is given by the area under the
    relevant part of the Gaussian curve. A conventional
    choice of this probability is 68%.

36
The Straightforward Example
Apples of different weights. Need to describe the
distribution: μ = 68 g, σ = 17 g.
All weights between 24 and 167 g (tolerance)
90% lie between 50 and 100 g
94% are less than 100 g
96% are more than 50 g
These are confidence level statements.

[Histogram of apple weights, with 50 g and 100 g marked]
37
Confidence Levels
  • Can quote at any level
  • (68%, 95%, 99%)
  • Upper or lower or two-sided
  • (x < U; x > L; L < x < U)
  • Two-sided has further choice
  • (central, shortest)

[Plots: shaded Gaussian tails for an upper limit U, a lower limit L, and a two-sided interval]
38
Maximum Likelihood and Confidence Levels
  • The ML estimator (large N) has variance given by
    the MVB
  • For large N, ln L is a parabola near its peak
    (L is a Gaussian)
  • ln L falls by 1/2 at â ± σ, and by 2 at â ± 2σ
  • Read off the 68%, 95% confidence regions

[Plot: ln L vs a, a parabola peaking at â, dropping by 1/2 and by 2]
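A minimal sketch of reading a 68% interval off a likelihood scan, assuming NumPy; the sample is a toy Gaussian with known width, so ln L should drop by 1/2 at the sample mean ± σ/√N:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(5.0, 1.0, size=400)
sigma = 1.0  # width assumed known

mus = np.linspace(4.7, 5.3, 601)
lnL = np.array([-0.5 * np.sum((x - m) ** 2) / sigma**2 for m in mus])
inside = mus[lnL >= lnL.max() - 0.5]  # points within the drop of 1/2
print(inside[0], inside[-1])  # ~ sample mean -/+ 0.05, i.e. sigma/sqrt(400)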
39
Monte Carlo Calculations
  • The Monte Carlo approach provides a method of
    solving probability theory problems in situations
    where the necessary integrals are too difficult
    to perform.
  • Crucial element: a random number generator.
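A minimal example of the idea in plain Python: estimate π by throwing random points into the unit square and counting hits inside the quarter circle, with no integral performed by hand:

import random

random.seed(42)
n_throws = 1_000_000
hits = sum(1 for _ in range(n_throws)
           if random.random()**2 + random.random()**2 < 1.0)
print(4.0 * hits / n_throws)  # ~3.14, statistical error ~ 1/sqrt(n)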

40
An Example
41
References
  • Lectures and Notes on Statistics in HEP,
    http://www.ep.ph.bham.ac.uk/group/locdoc/lectures/stats/index.html
  • Lecture notes of Prof. Roger Barlow,
    http://www.hep.man.ac.uk/u/roger/
  • Louis Lyons, Statistics for Nuclear and Particle
    Physicists, Cambridge University Press, 1986.
  • Particle Data Group,
    http://pdg.lbl.gov/2004/reviews/contents_sports.html#mathtoolsetc