Title: Probability, Statistics and Errors in High Energy Physics
1. Probability, Statistics and Errors in High Energy Physics
Wen-Chen Chang, Institute of Physics, Academia Sinica
2. Outline
- Errors
- Probability distributions: Binomial, Poisson, Gaussian
- Confidence Levels
- Monte Carlo Method
3. Why do we do experiments?
- Parameter determination: determine the numerical value of some physical quantity.
- Hypothesis testing: test whether a particular theory is consistent with our data.
4. Why estimate errors?
- We are concerned not only with the answer but also with its accuracy.
- For example, the speed of light is $2.998 \times 10^8$ m/s. Compare the measurements:
  - $(3.09 \pm 0.15) \times 10^8$
  - $(3.09 \pm 0.01) \times 10^8$
  - $(3.09 \pm 2) \times 10^8$
- The error decides what each one means: the first is consistent with the known value, the second disagrees by many standard deviations, and the third is consistent but tells us almost nothing.
5. Sources of Errors
- Random (statistical) error: the inability of any measuring device to give infinitely accurate answers.
- Systematic error: uncertainty arising from the measurement method or apparatus itself (see next slide).
6. Systematic Errors
- "Systematic error: reproducible inaccuracy introduced by faulty equipment, calibration, or technique." (Bevington)
- "Systematic effects is a general category which includes effects such as background, scanning efficiency, energy resolution, angle resolution, variation of counter efficiency with beam position and energy, dead time, etc. The uncertainty in the estimation of such a systematic effect is called a systematic error." (Orear)
Error = mistake? Error = uncertainty?
7. Experimental Examples
- Energy in a calorimeter: $E = aD + b$
  - $a$, $b$ determined by a calibration experiment
- Branching ratio: $B = N/(\epsilon N_T)$
  - efficiency $\epsilon$ found from Monte Carlo studies
- Steel rule calibrated at 15 C but used in a warm lab:
  - If not spotted, this is a mistake.
  - If the temperature is measured, it is not a problem.
  - If the temperature is not measured, guess the correction and assign an uncertainty.
Repeating measurements doesn't help with systematic errors.
8. The Binomial
A random process with exactly two possible outcomes which occur with fixed probabilities: success with probability $p$, failure with probability $q \equiv 1-p$.
- $n$ trials, $r$ successes
- Individual success probability $p$
- $P(r; n, p) = \binom{n}{r} p^r (1-p)^{n-r}$
Mean: $\mu = \langle r \rangle = \sum r P(r) = np$
Variance: $V = \sigma^2 = \langle (r-\mu)^2 \rangle = \langle r^2 \rangle - \langle r \rangle^2 = np(1-p)$
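A quick numerical check of these formulas, as a minimal sketch in Python (the choice n = 10, p = 0.2 is arbitrary):

```python
import math

def binomial_pmf(r, n, p):
    """P(r; n, p) = C(n,r) p^r (1-p)^(n-r)."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.2
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

mean = sum(r * P for r, P in enumerate(pmf))
var = sum(r**2 * P for r, P in enumerate(pmf)) - mean**2
print(mean, n * p)           # both 2.0
print(var, n * p * (1 - p))  # both 1.6
```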
9. Binomial Examples
[Plots: binomial distributions for p = 0.1, 0.2, 0.5, 0.8 with n = 5, 10, 20, 50]
10. Poisson
- Events in a continuum.
- The probability of observing $r$ independent events in a time interval $t$, when the counting rate is $\lambda$ and the expected number of events in the interval is $\mu = \lambda t$:
$P(r; \mu) = \frac{e^{-\mu} \mu^r}{r!}$
[Plot: Poisson distribution for $\mu = 2.5$]
Mean: $\langle r \rangle = \sum r P(r) = \mu$
Variance: $V = \sigma^2 = \langle (r-\mu)^2 \rangle = \langle r^2 \rangle - \langle r \rangle^2 = \mu$
11. More about Poisson
- The binomial approaches the Poisson distribution as $n$ increases with $np = \mu$ held fixed (as the sketch below illustrates).
- The mean value of $r$ for a variable with a Poisson distribution is $\mu$, and so is the variance. This is the basis of the well-known $n \pm \sqrt{n}$ formula that applies to statistical errors in many situations involving the counting of independent events during a fixed interval.
- As $\mu \to \infty$, the Poisson distribution tends to a Gaussian one.
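A minimal sketch of the binomial-to-Poisson limit in Python (the values of mu and r are illustrative): holding $np = \mu$ fixed while $n$ grows, the binomial probabilities converge to the Poisson ones.

```python
import math

def poisson_pmf(r, mu):
    return math.exp(-mu) * mu**r / math.factorial(r)

mu, r = 2.5, 3
for n in (10, 100, 1000):
    p = mu / n  # keep the expected count np = mu fixed
    binom = math.comb(n, r) * p**r * (1 - p)**(n - r)
    print(n, round(binom, 5), round(poisson_pmf(r, mu), 5))
# The binomial column approaches the Poisson value 0.21376 as n grows.
```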
12. Poisson Examples
[Plots: Poisson distributions for $\mu$ = 0.5, 1.0, 2.0, 5.0, 10, 25]
13. Examples
- The number of particles detected by a counter in a time $t$, in a situation where the particle flux $\lambda$ and the detector are independent of time, and where the counter dead time $\tau$ is such that $\lambda\tau \ll 1$.
- The number of interactions produced in a thin target when an intense pulse of $N$ beam particles is incident on it.
- The number of entries in a given bin of a histogram when the data are accumulated over a fixed time interval.
14. Binomial and Poisson
- From an exam paper:
- A student is standing by the road, hoping to hitch a lift. Cars pass according to a Poisson distribution with a mean frequency of 1 per minute. The probability of an individual car giving a lift is 1%. Calculate the probability that the student is still waiting for a lift
  - (a) after 60 cars have passed
  - (b) after 1 hour
(a) binomial: $(1-0.01)^{60} = 0.5472$
(b) Poisson: $e^{-0.6}\, 0.6^0 / 0! = 0.5488$
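A quick check of both answers in Python (a sketch; the numbers come straight from the problem):

```python
import math

# (a) Exactly 60 cars, each giving a lift with probability 0.01:
#     binomial probability of r = 0 successes in n = 60 trials.
p_a = (1 - 0.01) ** 60
# (b) After 1 hour the number of cars is itself Poisson (mean 60),
#     so lift offers are Poisson with mean 60 * 0.01 = 0.6.
p_b = math.exp(-0.6)
print(p_a)  # ~0.5472
print(p_b)  # ~0.5488
```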
15. Gaussian (Normal)
$P(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
Mean: $\langle x \rangle = \int x P(x)\,dx = \mu$
Variance: $V = \langle (x-\mu)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2 = \sigma^2$
16. Different Gaussians
There's only one!
- Location change: $\mu$ shifts the centre.
- Width scaling factor: $\sigma$; the curve falls to $e^{-1/2}$ of its peak value at $x = \mu \pm \sigma$.
- Normalisation (if required): $1/(\sigma\sqrt{2\pi})$.
17. Probability Contents
- 68.27% within 1$\sigma$
- 95.45% within 2$\sigma$
- 99.73% within 3$\sigma$
- 90% within 1.645$\sigma$; 95% within 1.960$\sigma$; 99% within 2.576$\sigma$; 99.9% within 3.290$\sigma$
These numbers apply to Gaussians, and only Gaussians (they follow from the error function; see the sketch below).
Other distributions have equivalent values which you could use if you wanted.
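A minimal check of these probability contents in Python, using the error function:

```python
import math

def coverage(k):
    """Probability that a Gaussian variable lies within k sigma of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 1.645, 1.960, 2.576, 3.290):
    print(f"{k:5.3f} sigma -> {100 * coverage(k):6.2f}%")
```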
18. Central Limit Theorem
- Or: why is the Gaussian "normal"?
- If a variable $x$ is produced by the convolution of variables $x_1, x_2, \ldots, x_N$, then (as the sketch below illustrates numerically):
  - (i) $\langle x \rangle = \mu_1 + \mu_2 + \cdots + \mu_N$
  - (ii) $V(x) = V_1 + V_2 + \cdots + V_N$
  - (iii) $P(x)$ becomes Gaussian for large $N$
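A minimal numerical illustration (the sample sizes are arbitrary): summing N = 12 uniform variables, each with mean 1/2 and variance 1/12, gives a distribution with mean 6, variance 1, and roughly Gaussian probability content.

```python
import random

random.seed(0)
N, trials = 12, 100_000
sums = [sum(random.random() for _ in range(N)) for _ in range(trials)]

mean = sum(sums) / trials
var = sum((s - mean) ** 2 for s in sums) / trials
within_1sigma = sum(abs(s - mean) < var**0.5 for s in sums) / trials
print(mean, var)       # close to N/2 = 6 and N/12 = 1
print(within_1sigma)   # close to the Gaussian 68.3%
```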
19. Multidimensional Gaussian
$P(\mathbf{x}) = \frac{1}{(2\pi)^{N/2} |\mathbf{V}|^{1/2}} \exp\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$, where $\mathbf{V}$ is the covariance matrix.
20. Chi Squared
- Sum of squared discrepancies, scaled by the expected error: $\chi^2 = \sum_i \frac{(x_i - \mu_i)^2}{\sigma_i^2}$
- Its distribution is obtained by integrating out all but one dimension of the multidimensional Gaussian.
22. About Estimation
Probability calculus runs from theory to data: given these distribution parameters, what can we say about the data?
Statistical inference runs from data to theory: given this data, what can we say about the properties or parameters or correctness of the distribution functions?
23. What is an estimator?
- An estimator $\hat{a}$ (written with a hat) is a function of the data whose value, the estimate, is intended as a meaningful guess for the value of the parameter $a$ (from the PDG).
24. What is a good estimator?
- A perfect estimator is:
  - Consistent
  - Unbiased
  - Efficient: it has the minimum possible variance, the Minimum Variance Bound
One often has to work with less-than-perfect estimators.
25. The Likelihood Function
Set of data $\{x_1, x_2, \ldots, x_N\}$ (each $x$ may be multidimensional; never mind). The probability depends on some parameter $a$ ($a$ may also be multidimensional; never mind). The total probability (density) is
$P(x_1; a)\, P(x_2; a) \cdots P(x_N; a) = L(x_1, x_2, \ldots, x_N; a)$,
the likelihood.
26. Maximum Likelihood Estimation
Given data $\{x_1, \ldots, x_N\}$, estimate $a$ by maximising the likelihood $L(x_1, \ldots, x_N; a)$.
In practice, one usually maximises $\ln L$, as it is easier to calculate and handle: just add the $\ln P(x_i)$. ML has lots of nice properties, illustrated in the sketch below.
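As an illustration (not from the original slides): a minimal sketch of ML estimation for an exponential decay, $P(t; \tau) = (1/\tau) e^{-t/\tau}$, where the ML estimate is known analytically to be the sample mean. A crude grid scan stands in for a real minimiser such as MINUIT.

```python
import math, random

random.seed(1)
true_tau = 2.0  # arbitrary choice for the toy data
data = [random.expovariate(1 / true_tau) for _ in range(1000)]

def neg_ln_L(tau):
    # -ln L = sum_i [ ln(tau) + t_i / tau ] for the exponential pdf
    return sum(math.log(tau) + t / tau for t in data)

taus = [0.5 + 0.005 * i for i in range(600)]  # scan tau from 0.5 to 3.5
tau_hat = min(taus, key=neg_ln_L)
print(tau_hat, sum(data) / len(data))  # agree to the grid resolution
```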
27. Properties of ML Estimation
- It's consistent (no big deal).
- It's biased for small $N$; may need to worry.
- It is efficient for large $N$: it saturates the Minimum Variance Bound.
- It is invariant: if you switch to using $u(a)$, then $\hat{u} = u(\hat{a})$.
[Plot: $\ln L$ as a function of $u$, with the maximum at $\hat{u}$]
28. More about ML
- It is not "right", just sensible.
- It does not give the most likely value of $a$; it gives the value of $a$ for which this data is most likely.
- Numerical methods are often needed:
  - Maximisation/minimisation in more than one variable is not easy.
  - Use MINUIT, but remember the minus sign (MINUIT minimises, so feed it $-\ln L$).
29. ML does not give goodness-of-fit
- ML will not complain if your assumed $P(x; a)$ is rubbish.
- The value of $L$ tells you nothing.
- Example: fitting $P(x) = a_1 x + a_0$ to data that are actually flat will give $\hat{a}_1 = 0$ and a constant $P$, with $L = a_0^N$, just like you get from fitting the correct flat distribution.
30. Least Squares
- Measurements of $y$ at various $x$, with errors $\sigma$ and prediction $f(x; a)$.
- Probability: $P(y) \propto \exp\left(-\frac{(y - f(x;a))^2}{2\sigma^2}\right)$
- $\ln L = -\frac{1}{2} \sum_i \frac{(y_i - f(x_i; a))^2}{\sigma_i^2} + \text{const}$
- To maximise $\ln L$, minimise $\chi^2 = \sum_i \frac{(y_i - f(x_i; a))^2}{\sigma_i^2}$ (a worked sketch follows below).
So ML "proves" least squares. But what proves ML? Nothing.
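A worked sketch of a straight-line $\chi^2$ fit, $f(x; a, b) = ax + b$, solved with the analytic normal equations (the data values here are invented for illustration):

```python
# Weighted straight-line fit: minimise chi2 = sum w_i (y_i - a*x_i - b)^2
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
sigmas = [0.2] * 5
w = [1 / s**2 for s in sigmas]

S = sum(w)
Sx = sum(wi * x for wi, x in zip(w, xs))
Sy = sum(wi * y for wi, y in zip(w, ys))
Sxx = sum(wi * x * x for wi, x in zip(w, xs))
Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))

D = S * Sxx - Sx**2
a = (S * Sxy - Sx * Sy) / D    # slope
b = (Sxx * Sy - Sx * Sxy) / D  # intercept
chi2 = sum(wi * (y - a * x - b)**2 for wi, x, y in zip(w, xs, ys))
print(a, b, chi2)  # compare chi2 with N_data - N_params = 3
```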
31. Least Squares: The Really Nice Thing
- Should get $\chi^2 \approx 1$ per data point.
- Minimising $\chi^2$ makes it smaller: the effect is 1 unit of $\chi^2$ for each parameter adjusted. (The dimensionality of the multi-D Gaussian is decreased by 1.)
- $N_\text{degrees of freedom} = N_\text{data points} - N_\text{parameters}$
- Provides a goodness-of-agreement figure which allows a credibility check.
32. Chi Squared Results
- Large $\chi^2$ comes from:
  - Bad measurements
  - Bad theory
  - Underestimated errors
  - Bad luck
- Small $\chi^2$ comes from:
  - Overestimated errors
  - Good luck
33. Fitting Histograms
- Often put the $x_i$ into bins.
- The data are then the bin contents $n_j$.
- $n_j$ is given by a Poisson distribution, with mean $f(x_j) = P(x_j)\,\Delta x$.
- 4 techniques:
  - Full ML
  - Binned ML
  - Proper $\chi^2$
  - Simple $\chi^2$
34. What you maximise/minimise
- Full ML: maximise $\ln L = \sum_i \ln P(x_i; a)$ over the individual measurements.
- Binned ML: maximise $\ln L = \sum_j (n_j \ln f_j - f_j)$, treating each bin content as Poisson (see the sketch after this list).
- Proper $\chi^2$: minimise $\sum_j (n_j - f_j)^2 / f_j$.
- Simple $\chi^2$: minimise $\sum_j (n_j - f_j)^2 / n_j$.
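A minimal sketch evaluating the binned quantities for a toy histogram (counts and model values invented for illustration; a full ML fit would use the unbinned $x_i$ directly):

```python
import math

n = [4, 7, 12, 9, 5]            # observed bin contents
f = [5.0, 8.0, 10.0, 8.0, 5.0]  # model prediction f_j for each bin

# Binned ML: Poisson ln L per bin, dropping the constant ln(n_j!) term
binned_lnL = sum(nj * math.log(fj) - fj for nj, fj in zip(n, f))
proper_chi2 = sum((nj - fj)**2 / fj for nj, fj in zip(n, f))  # variance = f_j
simple_chi2 = sum((nj - fj)**2 / nj for nj, fj in zip(n, f))  # variance ~ n_j
print(binned_lnL, proper_chi2, simple_chi2)
```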
35. Confidence Level: the Meaning of Error Estimates
- How often do we expect the true, fixed value of our parameter, $P_0$, to lie within our quoted range $p \pm \sigma_p$, for a repeated series of experiments?
- For the actual value $P_0$, the probability that a measurement will give us an answer in a specific range of $p$ is given by the area under the relevant part of the Gaussian curve. A conventional choice of this probability is 68%.
36. The Straightforward Example
Apples of different weights; we need to describe the distribution: $\mu$ = 68 g, $\sigma$ = 17 g.
- All weights lie between 24 and 167 g (tolerance).
- 90% lie between 50 and 100 g.
- 94% are less than 100 g.
- 96% are more than 50 g.
These are all confidence-level statements.
37. Confidence Levels
- Can quote at any level (68%, 95%, 99%).
- Upper or lower or two-sided ($x < U$; $x > L$; $L < x < U$).
- Two-sided has a further choice (central, shortest).
38. Maximum Likelihood and Confidence Levels
- The ML estimator (large $N$) has variance given by the MVB: at the peak, $V(\hat{a}) = -\left(\frac{d^2 \ln L}{da^2}\right)^{-1}$.
- For large $N$, $\ln L$ is a parabola ($L$ is a Gaussian).
- $\ln L$ falls by 1/2 at $\hat{a} \pm \sigma_{\hat{a}}$, and by 2 at $\hat{a} \pm 2\sigma_{\hat{a}}$.
[Plot: parabolic $\ln L$ versus $a$]
- Read off the 68% and 95% confidence regions, as in the sketch below.
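A minimal sketch of reading an error off the likelihood curve, for the simple case of a single Poisson count, where $\ln L(\mu) = n \ln \mu - \mu$ up to a constant:

```python
import math

n = 25  # observed count; mu-hat = n maximises ln L

def lnL(mu):
    return n * math.log(mu) - mu  # constant ln(n!) dropped

peak = lnL(n)
lo, hi = float(n), float(n)
while lnL(lo) > peak - 0.5:  # step down until ln L has fallen by 1/2
    lo -= 0.001
while lnL(hi) > peak - 0.5:  # step up likewise
    hi += 0.001
print(lo, hi)  # slightly asymmetric interval, roughly n +/- sqrt(n) = 25 +/- 5
```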
39. Monte Carlo Calculations
- The Monte Carlo approach provides a method of solving probability-theory problems in situations where the necessary integrals are too difficult to perform.
- Crucial element: a random number generator.
40. An Example
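A minimal illustrative sketch (not necessarily the example from the original slide): estimating $\pi$ by throwing random points into the unit square, with the statistical error following from the binomial variance introduced earlier.

```python
import math
import random

random.seed(42)
N = 1_000_000
# Count points falling inside the quarter circle x^2 + y^2 < 1
hits = sum(random.random()**2 + random.random()**2 < 1 for _ in range(N))

p = hits / N
estimate = 4 * p
error = 4 * math.sqrt(p * (1 - p) / N)  # binomial: V = N p (1-p)
print(estimate, "+/-", error)  # ~3.1416 +/- 0.0017
```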
41. References
- Lectures and Notes on Statistics in HEP, http://www.ep.ph.bham.ac.uk/group/locdoc/lectures/stats/index.html
- Lecture notes of Prof. Roger Barlow, http://www.hep.man.ac.uk/u/roger/
- Louis Lyons, Statistics for Nuclear and Particle Physicists, Cambridge University Press, 1986.
- Particle Data Group, http://pdg.lbl.gov/2004/reviews/contents_sports.html#mathtoolsetc