Title: Uncertainties of Parton Distribution Functions
1Uncertainties of Parton Distribution Functions
Daniel Stump
Michigan State University CTEQ
2High energy particles interact through their quark and gluon constituents, the partons.
Asymptotic freedom: the parton cross sections can be approximated by perturbation theory.
Factorization theorem: parton distribution functions in the nucleon are the link between the PQCD theory and measurements on nucleons.
3Parton distribution functions are important.
4The goals of QCD global analysis are
- to know the uncertainties of the PDFs
- to enable predictions, including uncertainties.
5The systematic study of uncertainties of PDFs developed slowly. Pioneers:
- J. Collins and D. Soper, CTEQ Note 94/01, hep-ph/9411214.
- C. Pascaud and F. Zomer, LAL-95-05.
- M. Botje, Eur. Phys. J. C 14, 285 (2000).
Today many groups and individuals are involved in this research.
6Current research on PDF uncertainties
- CTEQ group at Michigan State (J. Pumplin, D. Stump, W.-K. Tung, H.-L. Lai, P. Nadolsky, J. Huston, R. Brock) and others (J. Collins, S. Kuhlmann, F. Olness, J. Owens)
- MRST group (A. Martin, R. Roberts, J. Stirling, R. Thorne)
- Fermilab group (W. Giele, S. Keller, D. Kosower)
- S. I. Alekhin
- V. Barone, C. Pascaud, F. Zomer; add B. Portheault
- HERA collaborations: ZEUS (S. Chekanov et al.; A. Cooper-Sarkar), H1 (C. Adloff et al.)
7Outline of this talk (focusing on CTEQ results)
- General comments CTEQ6
- Our treatment of experimental systematic errors
- Compatibility of data sets
- Uncertainty analysis
- 2 case studies
- inclusive jet production in ppbar or pp
- strangeness asymmetry
8Global Analysis
of short-distance processes using perturbative
QCD (NLO)
The challenge of Global Analysis is to construct
a set of PDFs with good agreement between data
and theory, for many disparate experiments.
9The program of Global Analysis is not a routine
statistical analysis, because of systematic
differences between experiments. We must
sometimes use physics judgement in this complex
real-world problem.
10Parametrization
At a low scale $Q_0$, of order 1 GeV, each PDF is given a parametrized functional form in $x$.
The factor $P(x)$ has a few more parameters for increased flexibility. 20 free shape parameters.
The $Q$ dependence of $f(x,Q)$ is obtained by solving the QCD evolution equations (DGLAP).
11CTEQ6 -- Table of experimental data sets
- H1 (a): 96/97 low-x ep data
- ZEUS: 96/97 ep data
- H1 (b): 98/99 high-Q e-p data
- D0: $d^2\sigma/d\eta\, dp_T$
12Global Analysis data from many disparate
experiments
13The Parton Distribution Functions
14Different ways to plot the parton distributions
Linear
Logarithmic
$Q^2$ = 10 (solid) and 1000 (dashed) GeV$^2$
15In order to show the large and small x regions simultaneously, we plot $3x^{5/3} f(x)$ versus $x^{1/3}$.
The integral (the area under the curve) equals the momentum fraction.
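A short worked step shows why the area under such a curve is the momentum fraction. It assumes only the change of variable $u = x^{1/3}$ and that $f(x)$ denotes a single parton density at fixed $Q$.

```latex
% Change of variable u = x^{1/3}, so du = (1/3) x^{-2/3} dx.
\begin{align}
\int_0^1 3\,x^{5/3} f(x)\, d\!\left(x^{1/3}\right)
  &= \int_0^1 3\,x^{5/3} f(x)\,\tfrac{1}{3}\,x^{-2/3}\, dx \\
  &= \int_0^1 x\, f(x)\, dx .
\end{align}
% The right-hand side is the momentum fraction carried by the parton.
```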
16Comparison of CTEQ6 and MRST2002
Blue curves: CTEQ6M; black dots: MRST2002
Gluon and u quark at $Q^2 = 10$ GeV$^2$
17Our treatment of systematic errors
18What is a systematic error?
"This is why people are so frightened of systematic errors, and most other textbooks avoid the subject altogether. You never know whether you have got them and can never be sure that you have not -- like an insidious disease. The good news, however, is that despite popular prejudices and superstitions, once you know what your systematic errors are, they can be handled with standard statistical methods."
R. J. Barlow, Statistics
19Imagine that two experimental groups have measured a quantity $\theta$, with the results shown.
OK, what is the value of $\theta$?
This is closely analogous to what happens in global analysis of PDFs. But in the case of PDFs the systematic differences are only visible through the PDFs.
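20We use $\chi^2$ minimization with fitting of systematic errors.
For statistical errors define
$\chi^2 = \sum_{i=1}^{N} \frac{(D_i - T_i(a))^2}{\alpha_i^2}$, where $\alpha_i$ is the standard deviation (S. D.) of the data point $D_i$ and
$T_i = T_i(a_1, a_2, \ldots, a_d)$ is a function of $d$ theory parameters.
- Minimize $\chi^2$ w.r.t. $a_m$ $\to$ optimal parameter values $a_{0m}$.
- All this would be based on the assumption that $D_i = T_i(a_0) + \alpha_i r_i$ ($r_i$: a random number with unit standard deviation).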
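As a minimal sketch of the $\chi^2$ minimization described on this slide, the example below fits a one-parameter toy theory. The linear model $T_i(a) = a\,x_i$, the data values, and the function names are all invented for illustration; a real global analysis has about 20 parameters and NLO QCD predictions.

```python
# Toy chi^2 fit with statistical errors only.
# Hypothetical theory: T_i(a) = a * x_i (one free parameter).

def chi2(a, x, data, alpha):
    """chi^2(a) = sum_i (D_i - T_i(a))^2 / alpha_i^2."""
    return sum((d - a * xi) ** 2 / s ** 2 for xi, d, s in zip(x, data, alpha))

def best_fit(x, data, alpha):
    """Closed-form minimum of chi^2 for the linear model T_i = a * x_i."""
    num = sum(d * xi / s ** 2 for xi, d, s in zip(x, data, alpha))
    den = sum(xi ** 2 / s ** 2 for xi, s in zip(x, alpha))
    return num / den

# Invented illustration data (not from any experiment).
x = [1.0, 2.0, 3.0, 4.0]
data = [2.1, 3.9, 6.2, 7.8]
alpha = [0.2, 0.2, 0.3, 0.3]   # statistical standard deviations

a0 = best_fit(x, data, alpha)
print(f"a0 = {a0:.4f}, chi2(a0) = {chi2(a0, x, data, alpha):.3f}")
```

The closed form is available only because the toy theory is linear in its parameter; in general the minimum is found numerically.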
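21Treatment of the normalization error
In scattering experiments there is an overall normalization uncertainty from the uncertainty of the luminosity. We define
$\chi^2 = \sum_i \frac{(f_N D_i - T_i)^2}{\alpha_i^2} + \left(\frac{1 - f_N}{\sigma_N}\right)^2$,
where $f_N$ is the overall normalization factor, $\alpha_i$ is the statistical error of $D_i$, and $\sigma_N$ is the quoted normalization uncertainty.
Minimize $\chi^2$ w.r.t. both $a_m$ and $f_N$.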
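The sketch below illustrates fitting a normalization factor at fixed theory. It assumes the common form in which the data are rescaled by $f_N$ and a penalty $((1 - f_N)/\sigma_N)^2$ is added; that form, the symbol `sigma_norm`, and all numbers are assumptions made for illustration, not taken from the slide.

```python
# Assumed form: chi^2 = sum_i (f_N D_i - T_i)^2 / alpha_i^2 + ((1 - f_N)/sigma_norm)^2

def chi2_norm(f_n, data, theory, alpha, sigma_norm):
    """chi^2 including the normalization penalty, for a given f_N."""
    fit = sum((f_n * d - t) ** 2 / a ** 2 for d, t, a in zip(data, theory, alpha))
    return fit + ((1.0 - f_n) / sigma_norm) ** 2

def best_norm(data, theory, alpha, sigma_norm):
    """Solve d(chi^2)/d(f_N) = 0; chi^2 is quadratic in f_N."""
    w = 1.0 / sigma_norm ** 2
    num = sum(d * t / a ** 2 for d, t, a in zip(data, theory, alpha)) + w
    den = sum(d * d / a ** 2 for d, a in zip(data, alpha)) + w
    return num / den

# Invented numbers: the data sit slightly above the theory.
data = [10.2, 20.5, 30.4]
theory = [10.0, 20.0, 30.0]
alpha = [0.5, 0.5, 0.5]
sigma_norm = 0.02            # hypothetical 2% luminosity uncertainty

f_n = best_norm(data, theory, alpha, sigma_norm)
print(f"optimized f_N = {f_n:.4f}")
```

In a full fit this step would be repeated for each experiment and combined with the minimization over the shape parameters.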
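22A method for general systematic errors
$\alpha_i$: statistical error of $D_i$
$\beta_{ij}$: set of systematic errors ($j = 1 \ldots K$) of $D_i$
Define
$\chi'^2 = \sum_i \frac{\left(D_i - \sum_{j=1}^{K} s_j \beta_{ij} - T_i\right)^2}{\alpha_i^2} + \sum_{j=1}^{K} s_j^2$,
where the last sum is a quadratic penalty term.
Minimize $\chi'^2$ with respect to both the shape parameters $a_m$ and the optimized systematic shifts $s_j$.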
23Because $\chi'^2$ depends quadratically on $s_j$ we can solve for the systematic shifts analytically, $s \to s_0(a)$. Then let
$\chi^2(a) = \chi'^2(a, s_0(a))$
and minimize w.r.t. $a_m$.
The systematic shifts $s_j$ are continually optimized: $s \to s_0(a)$.
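The analytic solution for the shifts can be sketched as a small linear system. The example assumes the fitting function $\chi'^2 = \sum_i (D_i - \sum_j s_j\beta_{ij} - T_i)^2/\alpha_i^2 + \sum_j s_j^2$; setting $\partial\chi'^2/\partial s_k = 0$ gives $A s = B$ with $A_{kj} = \delta_{kj} + \sum_i \beta_{ik}\beta_{ij}/\alpha_i^2$ and $B_k = \sum_i \beta_{ik}(D_i - T_i)/\alpha_i^2$. The function names and all numbers are invented for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def optimal_shifts(data, theory, alpha, beta):
    """Shifts s_0 that minimize chi'^2 at fixed theory values."""
    K = len(beta[0])
    A = [[(1.0 if j == k else 0.0)
          + sum(b[j] * b[k] / a ** 2 for b, a in zip(beta, alpha))
          for k in range(K)] for j in range(K)]
    B = [sum(b[j] * (d - t) / a ** 2
             for b, d, t, a in zip(beta, data, theory, alpha))
         for j in range(K)]
    return solve(A, B)

def chi2_prime(shifts, data, theory, alpha, beta):
    """chi'^2 = sum_i (D_i - sum_j s_j beta_ij - T_i)^2 / alpha_i^2 + sum_j s_j^2."""
    total = sum(s * s for s in shifts)
    for d, t, a, b in zip(data, theory, alpha, beta):
        shifted = d - sum(sj * bj for sj, bj in zip(shifts, b))
        total += (shifted - t) ** 2 / a ** 2
    return total

# Invented example: 4 data points, K = 2 correlated systematic errors.
data = [10.3, 20.4, 30.5, 40.2]
theory = [10.0, 20.0, 30.0, 40.0]
alpha = [0.2, 0.2, 0.2, 0.2]
beta = [[0.2, 0.1], [0.2, -0.1], [0.3, 0.1], [0.3, -0.1]]

s0 = optimal_shifts(data, theory, alpha, beta)
print("optimized shifts:", [round(s, 3) for s in s0])
```

In a fit, `optimal_shifts` would be re-evaluated every time the theory values change, which is what "continually optimized" means here.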
24So, we have accounted for
- Statistical errors
- Overall normalization uncertainty (by fitting $f_{N,e}$)
- Other systematic errors (analytically)
We may make further refinements of the fit with weighting factors $w_e$ and $w_{N,e}$.
Default: $w_e = w_{N,e} = 1$
The spirit of global analysis is compromise: the PDFs should fit all data sets satisfactorily. If the default leaves some experiments unsatisfied, we may be willing to reduce the quality of fit to some experiments in order to fit another experiment better. (We use this sparingly!)
25Quality
How well does this fitting procedure work?
26Comparison of the CTEQ6M fit to the H1 data in
separate x bins. The data points include
optimized shifts for systematic errors. The error
bars are statistical only.
27Comparison of the CTEQ6M fit to the inclusive jet
data. (a) D0 cross section versus pT for 5
rapidity bins (b) CDF cross section for central
rapidity.
28How large are the optimized normalization factors?
Expt    $f_N$
BCDMS 0.976
H1 (a) 1.010
H1 (b) 0.988
ZEUS 0.997
NMC 1.011
CCFR 1.020
E605 0.950
D0 0.974
CDF 1.004
29We must always check that the systematic shifts are not unreasonably large.
10 systematic shifts, NMC data:
 j    $s_j$
 1    1.67
 2   -0.67
 3   -1.25
 4   -0.44
 5    0.00
 6   -1.07
 7    1.28
 8    0.62
 9   -0.40
10    0.21
11 systematic shifts, ZEUS data:
 j    $s_j$
 1    0.67
 2   -0.81
 3   -0.35
 4    0.25
 5    0.05
 6    0.70
 7   -0.31
 8    1.05
 9    0.61
10    0.26
11    0.22
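One simple way to make the "not unreasonably large" check concrete is sketched below, using the shift values listed on this slide. It assumes the shifts are measured in units of the quoted systematic standard deviations, so that $\sum_j s_j^2$ of order the number of shifts, with no single $|s_j|$ far above 2 to 3, would look unremarkable. The threshold reading is an illustrative assumption, not the authors' stated criterion.

```python
# Optimized systematic shifts quoted on the slide.
nmc = [1.67, -0.67, -1.25, -0.44, 0.00, -1.07, 1.28, 0.62, -0.40, 0.21]
zeus = [0.67, -0.81, -0.35, 0.25, 0.05, 0.70, -0.31, 1.05, 0.61, 0.26, 0.22]

def summarize(shifts):
    """Return (sum of s_j^2, largest |s_j|) for a list of shifts."""
    penalty = sum(s * s for s in shifts)
    largest = max(abs(s) for s in shifts)
    return penalty, largest

for name, shifts in (("NMC", nmc), ("ZEUS", zeus)):
    penalty, largest = summarize(shifts)
    print(f"{name}: K = {len(shifts)}, sum s_j^2 = {penalty:.2f}, max |s_j| = {largest:.2f}")
```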
30Comparison to NMC F2
without systematic shifts
31A study of compatibility
32Table of Data Sets
The PDFs are not exactly CTEQ6 but very close: a no-name generic set of PDFs for illustration purposes.
 #  Data set       N     $\chi^2$   $\chi^2/N$
 1  BCDMS F2p     339    366.1    1.08
 2  BCDMS F2d     251    273.6    1.09
 3  H1 (a)        104     97.8    0.94
 4  H1 (b)        126    127.3    1.01
 5  H1 (c)        129    108.9    0.84
 6  ZEUS          229    261.1    1.14
 7  CDHSW F2       85     65.6    0.77
 8  NMC F2p       201    295.5    1.47
 9  NMC d/p       123    115.4    0.94
10  CCFR F2        69     84.9    1.23
11  E605          119     94.7    0.80
12  E866 pp       184    239.2    1.30
13  E866 d/p       15      5.0    0.33
14  D0 jet         90     62.6    0.70
15  CDF jet        33     56.1    1.70
16  CDHSW F3       96     76.4    0.80
17  CCFR F3        87     26.8    0.31
18  CDF W Lasy     11      8.7    0.79
Ntot = 2291, $\chi^2_{\rm global}$ = 2368.
33The effect of setting all normalization constants to 1.
 #  Data set     $\Delta\chi^2$
 1  BCDMS F2p    186.5
 2  BCDMS F2d     27.6
 3  H1 (a)         7.3
 4  H1 (b)        10.1
 5  H1 (c)        24.0
 8  NMC F2p        4.0
11  E605          13.3
12  E866 pp       95.7
$\chi^2$ (opt. norm) = 2368.; $\chi^2$ (norm 1) = 2742.; $\Delta\chi^2$ = 374.0
34By applying weighting factors in the fitting function, we can test the compatibility of disparate data sets.
Example 1. The effect of giving the CCFR F2 data set a heavy weight.
 #  Data set     $\Delta\chi^2$
 3  H1 (a)         8.3
 7  CDHSW F2       6.3
 8  NMC F2p       18.1
10  CCFR F2      -19.7
12  E866 pp        5.5
14  D0 jet        23.5
$\Delta\chi^2$ (CCFR) = -19.7; $\Delta\chi^2$ (other) = 63.3
Giving a single data set a large weight is tantamount to determining the PDFs from that data set alone. The result is a significant improvement for that data set, but the resulting PDFs do not fit the others.
35Example 1b. The effect of giving the CCFR F2 data weight 0, i.e., removing the data set from the global analysis.
 #  Data set     $\Delta\chi^2$
 3  H1 (a)        -8.3
 6  ZEUS           6.9
 8  NMC F2p      -10.1
10  CCFR F2       40.0
$\Delta\chi^2$ (CCFR) = 40.0; $\Delta\chi^2$ (other) = -17.4
Imagine starting with the other data sets, not including CCFR. The result of adding CCFR is that $\chi^2_{\rm global}$ of the other sets increases by 17.4; this must be an acceptable increase of $\chi^2$.
36Example 5. Giving heavy weight to H1 and BCDMS
$\Delta\chi^2$ for all data sets
 #  Data set     $\Delta\chi^2$
 2  BCDMS F2d    -15.1
 3  H1 (a)       -12.4
 4  H1 (b)        -4.3
 6  ZEUS          27.5
 7  CDHSW F2      19.2
 8  NMC F2p        8.0
10  CCFR F2       54.5
14  D0 jet        22.0
16  CDHSW F3      11.0
17  CCFR F3        5.9
$\Delta\chi^2$ (H1 & BCDMS) = -38.7; $\Delta\chi^2$ (other) = 149.9
37Lessons from these reweighting studies
- Global analysis requires compromises: the PDF model that gives the best fit to one set of data does not give the best fit to others. This is not surprising because there are systematic differences between the experiments.
- The scale of acceptable changes of $\chi^2$ must be large. Adding a new data set and refitting may increase the $\chi^2$s of other data sets by amounts $\gg 1$.
38Clever ways to test the compatibility of disparate data sets
- Plot $\chi^2$ versus $\chi^2$: J. Collins and J. Pumplin (hep-ph/0201195)
- The Bootstrap Method: Efron and Tibshirani, Introduction to the Bootstrap (Chapman & Hall); Chernick, Bootstrap Methods (Wiley)
39 (I) Methods
Uncertainty Analysis
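40We continue to use $\chi^2_{\rm global}$ as figure of merit. Explore the variation of $\chi^2_{\rm global}$ in the neighborhood of the minimum.
The Hessian method: near the minimum,
$\chi^2_{\rm global} = \chi^2_0 + \frac{1}{2}\sum_{m,n} H_{mn}\,(a_m - a_{0m})(a_n - a_{0n})$, with $H_{mn} = \partial^2\chi^2_{\rm global}/\partial a_m\,\partial a_n$ evaluated at the minimum
($m, n = 1, 2, 3, \ldots, d$)
41Master Formula
Classical error formula for a variable $X(a)$:
$(\Delta X)^2 = 2\,\Delta\chi^2 \sum_{m,n} \frac{\partial X}{\partial a_m}\,(H^{-1})_{mn}\,\frac{\partial X}{\partial a_n}$
Obtain better convergence using eigenvectors of $H_{mn}$:
$\Delta X = \frac{1}{2}\left[\sum_{m=1}^{d}\left(X(S_m^{(+)}) - X(S_m^{(-)})\right)^2\right]^{1/2}$
$S_m^{(+)}$ and $S_m^{(-)}$ denote PDF sets displaced from the standard set, along the $\pm$ directions of the $m$th eigenvector, by distance $T = \sqrt{\Delta\chi^2}$ in parameter space. (Available in the LHAPDF format: $2d$ alternate sets.)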
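Given predictions computed with the $2d$ eigenvector sets, the uncertainty can be sketched with the symmetric form $\Delta X = \frac{1}{2}\sqrt{\sum_m [X(S_m^{(+)}) - X(S_m^{(-)})]^2}$. The function name and the input numbers below are invented; in practice each entry would come from evaluating the observable with one alternate PDF set.

```python
import math

def hessian_uncertainty(x_plus, x_minus):
    """Delta X = 0.5 * sqrt( sum_m (X(S_m^+) - X(S_m^-))^2 )."""
    if len(x_plus) != len(x_minus):
        raise ValueError("need one (+) and one (-) prediction per eigenvector")
    return 0.5 * math.sqrt(sum((p - m) ** 2 for p, m in zip(x_plus, x_minus)))

# Invented predictions for d = 3 eigenvector directions.
x_plus = [2.56, 2.55, 2.54]    # X evaluated with S_m^(+)
x_minus = [2.52, 2.53, 2.54]   # X evaluated with S_m^(-)

dx = hessian_uncertainty(x_plus, x_minus)
print(f"Delta X = {dx:.4f}")
```

The result corresponds to whatever tolerance $T$ was used to construct the displaced sets; it is not rescaled here.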
42The Lagrange Multiplier Method
for analyzing the uncertainty of PDF-dependent predictions.
The fitting function for constrained fits:
$F(a, \lambda) = \chi^2_{\rm global}(a) + \lambda\, X(a)$
Minimization of $F$ w.r.t. $a_m$, for a given value of $\lambda$, gives the best fit for the value $X(a_{\min,m})$ of the variable $X$. Hence, by varying $\lambda$, we obtain a curve of $\chi^2_{\rm global}$ versus $X$.
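The constrained-fit idea can be sketched on a toy problem, assuming the fitting function has the form $F = \chi^2_{\rm global}(a) + \lambda X(a)$. Here $\chi^2_{\rm global}$ is taken to be exactly quadratic in two parameters and $X$ linear, so each minimization has a closed form; all constants and names are invented for illustration.

```python
# Toy model: quadratic chi^2 in two parameters, linear observable X.
A0 = [1.0, -0.5]      # best-fit parameters a_0m (hypothetical)
SIGMA = [0.1, 0.2]    # widths of chi^2 along each parameter (hypothetical)
C = [2.0, 1.0]        # X(a) = sum_m C[m] * a[m]
CHI2_MIN = 50.0       # chi^2 at the global minimum (hypothetical)

def chi2_global(a):
    return CHI2_MIN + sum(((am - a0) / s) ** 2 for am, a0, s in zip(a, A0, SIGMA))

def observable(a):
    return sum(c * am for c, am in zip(C, a))

def constrained_fit(lam):
    """Minimize F = chi2_global + lam * X over a; return (X, chi2_global)."""
    # dF/da_m = 2 (a_m - a_0m) / sigma_m^2 + lam * c_m = 0
    a = [a0 - 0.5 * lam * c * s ** 2 for a0, s, c in zip(A0, SIGMA, C)]
    return observable(a), chi2_global(a)

# Scanning lambda traces out the curve of chi2_global versus X.
curve = [constrained_fit(lam) for lam in (-20.0, -10.0, 0.0, 10.0, 20.0)]
for x, chi2 in curve:
    print(f"X = {x:.4f}   chi2_global = {chi2:.3f}")
```

In a real analysis each value of $\lambda$ requires a full refit of the shape parameters, which is what makes the method robust against non-quadratic behavior.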
43The question of tolerance
$X$: any variable that depends on PDFs
$X_0$: the prediction in the standard set
$\chi^2(X)$: curve of constrained fits
For the specified tolerance ($\Delta\chi^2 = T^2$) there is a corresponding range of uncertainty, $\pm\Delta X$.
What should we use for $T$?
44Estimation of parameters in Gaussian error analysis would have
$T = 1$
We do not use this criterion.
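45Aside: The familiar ideal example
Consider $N$ measurements $\theta_i$ of a quantity $\theta$ with normal errors $\sigma_i$.
Estimate $\theta$ by minimization of $\chi^2$,
$\chi^2(\theta) = \sum_{i=1}^{N} \frac{(\theta_i - \theta)^2}{\sigma_i^2}$, which gives $\theta_{\rm combined} = \frac{\sum_i \theta_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}$.
The mean of $\theta_{\rm combined}$ is $\theta_{\rm true}$, the SD is
$\sigma_{\rm combined} = \left(\sum_i 1/\sigma_i^2\right)^{-1/2}$
and
$\chi^2(\theta_{\rm combined} \pm \sigma_{\rm combined}) = \chi^2_{\min} + 1$.
The proof of this theorem is straightforward. It does not apply to our problem because of systematic errors.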
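46Add a systematic error to the ideal model
(for simplicity suppose $\beta_i = \beta$ and $\sigma_i = \sigma$)
Estimate $\theta$ by minimization of $\chi'^2$,
$\chi'^2(\theta, s) = \sum_{i=1}^{N} \frac{(\theta_i - s\beta - \theta)^2}{\sigma^2} + s^2$
($s$ = systematic shift, $\theta$ = observable).
The mean of $\theta_{\rm combined}$ is again $\theta_{\rm true}$, and its variance is
${\rm SD}^2 = \sigma^2/N + \beta^2$.
Then, letting
$\chi^2(\theta) = \chi'^2(\theta, s_0(\theta))$, again $\chi^2(\theta_{\rm combined} \pm {\rm SD}) = \chi^2_{\min} + 1$.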
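A small numerical check of this idealized statement is sketched below. It assumes $\chi'^2(\theta, s) = \sum_i (\theta_i - s\beta - \theta)^2/\sigma^2 + s^2$ with a common statistical error $\sigma$ and a common systematic error $\beta$, minimizes over $s$ analytically, and confirms that $\Delta\chi^2 = 1$ is reached at a distance $\sqrt{\sigma^2/N + \beta^2}$ from the best estimate. The measurements and error sizes are invented.

```python
import math

def profile_chi2(theta, meas, sigma, beta):
    """chi'^2(theta, s) minimized analytically over the systematic shift s."""
    n = len(meas)
    # d(chi'^2)/ds = 0  =>  s0 = beta * sum_i (theta_i - theta) / (sigma^2 + n beta^2)
    s0 = beta * sum(m - theta for m in meas) / (sigma ** 2 + n * beta ** 2)
    return sum((m - s0 * beta - theta) ** 2 for m in meas) / sigma ** 2 + s0 ** 2

meas = [1.02, 0.97, 1.05, 0.99]   # invented measurements theta_i
sigma, beta = 0.04, 0.03          # invented statistical and systematic errors

theta_best = sum(meas) / len(meas)
sd = math.sqrt(sigma ** 2 / len(meas) + beta ** 2)
chi2_min = profile_chi2(theta_best, meas, sigma, beta)
delta_up = profile_chi2(theta_best + sd, meas, sigma, beta) - chi2_min
delta_dn = profile_chi2(theta_best - sd, meas, sigma, beta) - chi2_min
print(f"theta = {theta_best:.4f} +/- {sd:.4f}")
print(f"delta chi2 at +SD: {delta_up:.6f}, at -SD: {delta_dn:.6f}")
```

The check holds only because the shift is re-optimized at every value of $\theta$ and the errors are Gaussian with known sizes.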
47Still we do not apply the criterion $\Delta\chi^2 = 1$!
Reasons:
- We keep the normalization factors fixed as we vary the point in parameter space. The criterion $\Delta\chi^2 = 1$ requires that the systematic shifts be continually optimized versus $a_m$.
- Systematic errors may be nongaussian.
- The published standard deviations $\beta_{ij}$ may be inaccurate.
- We trust our physics judgement instead.
48To judge the PDF uncertainty, we return to the individual experiments.
Lumping all the data together in one variable, $\Delta\chi^2_{\rm global}$, is too constraining.
Global analysis is a compromise. All data sets should be fit reasonably well -- that is what we check. As we vary $a_m$, does any experiment rule out the displacement from the standard set?
49In testing the goodness of fit, we keep the normalization factors (i.e., optimized luminosity shifts) fixed as we vary the shape parameters.
End result: a tolerance on $\Delta\chi^2_{\rm global}$, e.g., 100 for 2000 data points.
This does not contradict the $\Delta\chi^2 = 1$ criterion used by other groups, because that refers to a different $\chi^2$ in which the normalization factors are continually optimized as the $a_m$ vary.
50Some groups do use the criterion of $\Delta\chi^2 = 1$ for PDF error analysis. Often they are using limited data sets, e.g., an experimental group using only their own data. Then the $\Delta\chi^2 = 1$ criterion may underestimate the uncertainty implied by systematic differences between experiments.
An interesting compendium of methods, by R. Thorne:
CTEQ6     $\Delta\chi^2 = 100$ (fixed norms)
ZEUS      $\Delta\chi^2 = 50$ (effective)
MRST01    $\Delta\chi^2 = 20$
H1        $\Delta\chi^2 = 1$
Alekhin   $\Delta\chi^2 = 1$
GKK       not using $\chi^2$
51(II) Results
Uncertainties of Parton Distributions
52Estimate the uncertainty on the predicted cross section for $p\bar{p} \to W X$ at the Tevatron collider.
global $\chi^2$
local $\chi^2$s
53Each experiment defines a prediction and a range. This figure shows the $\Delta\chi^2 = 1$ ranges.
54This figure shows broader ranges for each experiment based on the 90% confidence level (cumulative distribution function of the rescaled $\chi^2$).
55The final result is an uncertainty range for the prediction of $\sigma_W$.
Survey of $\sigma_W \cdot B_{l\nu}$ predictions (by R. Thorne)
PDF set    energy     $\sigma_W \cdot B_{l\nu}$ [nb]   PDF uncert.
Alekhin    Tevatron   2.73                 $\pm$ 0.05
MRST2002   Tevatron   2.59                 $\pm$ 0.03
CTEQ6      Tevatron   2.54                 $\pm$ 0.10
Alekhin    LHC        215.                 $\pm$ 6.
MRST2002   LHC        204.                 $\pm$ 4.
CTEQ6      LHC        205.                 $\pm$ 8.
56How well can we determine the value of $\alpha_S(M_Z)$ from Global Analysis?
For each value of $\alpha_S$, find the best global fit. Then look at the $\chi^2$ value for each experiment as a function of $\alpha_S$.
57Each experiment defines a prediction and a range. This figure shows the $\Delta\chi^2 = 1$ ranges.
Particle data group (shaded strip) is $0.117 \pm 0.002$.
The fluctuations are larger than expected for normal statistics. The vertical lines have $\Delta\chi^2_{\rm global} = 100$, $\alpha_S(M_Z) = 0.1165 \pm 0.0065$.
58Uncertainties of the PDFs themselves (only interesting to the model builders)
Gluon and u quark at $Q^2 = 10$ GeV$^2$.
59Comparing alternate sets
Red: CTEQ6.1
Blue: Fermi2002 (H1, BCDMS, E665)
Gluon at $Q^2 = 10$ GeV$^2$
u quark at $Q^2 = 10$ GeV$^2$
60CTEQ error band with MRST2002 superimposed
$Q^2 = 10$ GeV$^2$
61Uncertainties of LHC parton-parton luminosities
Provides simple estimates of PDF uncertainties at
the LHC.
62Outlook
- Necessary infrastructure for hadron colliders
- Tools exist to study uncertainties.
- This physics is data driven -- HERA II and Fermilab Run 2 will contribute.
- Ready for the LHC
63Cases
64Inclusive jet production and the search for new
physics
Inclusive jet cross section: D0 data and 40 alternate PDF sets
Fractional differences
(hep-ph/0303013)
65Is there room for new physics from Run Ib?
Contact interaction model with $\Lambda$ = 1.6, 2.0, 2.4 TeV
66The inclusive jet cross section versus pT for 3
rapidity bins at the LHC. Predictions of all 40
eigenvector basis sets are superimposed.
67Strangeness asymmetry
The NuTeV Collaboration has measured the cross sections for $\nu$-Fe and $\bar\nu$-Fe to $\mu^+\mu^- X$. A significant fraction of the cross section comes from $\nu s$ and $\bar\nu \bar{s}$ interactions.
We have added this data into the global fit to determine the strangeness asymmetry $s^-(x) = s(x) - \bar{s}(x)$.
68Figure 1. Typical strangeness asymmetry $s^-(x)$ and the associated momentum asymmetry $S^-(x)$. The axes are chosen such that both large and small x regions are adequately represented, and that the area under each curve equals the corresponding integral.
$S^-$ values: A: $0.312 \times 10^{-3}$; B: $0.160 \times 10^{-3}$; C: $0.103 \times 10^{-3}$
69Figure 2. Correlation between $\chi^2$ values and $S^-$
Red: dimuon cross section
Blue: other data sensitive to $s - \bar{s}$ (F3)
70Figure 3. Comparison of the $s^-(x)$ and $S^-(x)$ functions for three PDF sets: our central fit B (dot-dash), BPZ (blue), NuTeV (red)