Title: Chap 8: Estimation of parameters
Chap 8: Estimation of Parameters and Fitting of Probability Distributions
- Section 8.1 INTRODUCTION
- Unknown parameter value(s) must be estimated before fitting probability laws to data.
Section 8.2 Fitting the Poisson Distribution to Emissions of Alpha Particles (classical example)
- Recall: the probability mass function of a Poisson random variable $X$ is given by
  $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$, $k = 0, 1, 2, \dots$
- From the observed data, we must estimate a value for the parameter $\lambda$ (a sketch follows below).
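A minimal sketch, in Python, of estimating $\lambda$ from observed emission counts. The counts are simulated here rather than taken from p. 240, and the rate 3.9 and sample size 1200 are made-up values; Sections 8.4 and 8.5 show that the sample mean is both the MOM and the ML estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical alpha-particle counts per time interval
# (simulated stand-in for the data on p. 240).
counts = rng.poisson(lam=3.9, size=1200)

# Natural estimate of lambda: the sample mean of the counts.
lambda_hat = counts.mean()
print(f"estimated lambda = {lambda_hat:.3f}")
```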
What if the experiment is repeated?
- The estimate of $\lambda$ will be viewed as a random variable which has a probability distn referred to as its sampling distribution.
- The spread of the sampling distribution reflects the variability of the estimate.
- Chap 8 is about fitting the model to data.
- Chap 9 will deal with testing such a fit.
Assessing Goodness of Fit (GOF)
- Example: fit a Poisson distn to the emission counts (p. 240).
- Informally, GOF is assessed by comparing the Observed (O) and the Expected (E) counts, grouped (at least 5 expected in each) into the 16 cells.
- Formally, use a measure of discrepancy such as Pearson's chi-square statistic
  $X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
  to quantify the comparison of the O and E counts.
- In this example, the value of $X^2$ is computed from the grouped counts on p. 240 (a sketch of the computation, on simulated data, follows below).
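A sketch of the grouping and of Pearson's chi-square computation, assuming simulated Poisson counts in place of the real p. 240 data; the rate, sample size, and number of cells are made-up choices.

```python
import numpy as np
from scipy.stats import poisson, chi2

rng = np.random.default_rng(1)
data = rng.poisson(lam=3.9, size=1200)   # simulated stand-in for the observed counts
lam_hat = data.mean()                    # fitted Poisson parameter
n = data.size

# Group outcomes 0, 1, ..., k-1 plus a final "k or more" cell.
k = 10
observed = np.array([(data == j).sum() for j in range(k)] + [(data >= k).sum()])
cell_probs = np.append(poisson.pmf(np.arange(k), lam_hat), poisson.sf(k - 1, lam_hat))
expected = n * cell_probs

# Pearson chi-square statistic X^2 = sum (O - E)^2 / E
X2 = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 1 - 1               # cells - 1 - (one fitted parameter)
p_value = chi2.sf(X2, df)
print(f"X^2 = {X2:.2f}, df = {df}, p-value = {p_value:.3f}")
```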
Null distn
- $X^2$ is a random variable (as a function of the random counts) whose probability distn is called its null distribution. It can be shown that the null distn of $X^2$ is approximately the chi-square distn with degrees of freedom
  df = (no. of cells) $-$ (no. of independent parameters fitted) $-$ 1.
- Here: df = 16 (cells) $-$ 1 (parameter $\lambda$) $-$ 1 = 14.
- The larger the value of $X^2$, the worse the fit.
p-value
- Figure 8.1 on page 242 gives a nice feeling for what a p-value might be. The p-value measures the degree of evidence against the statement "the model fits the data well", i.e. that the Poisson is the true model.
- The smaller the p-value, the worse the fit, i.e. the more evidence against the model.
- A small p-value means rejecting the null, or saying that the model does NOT fit the data well.
- How small is "small"? Reject when p-value $< \alpha$, where $\alpha$ is the significance level (a sketch follows below).
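A tiny illustration of the decision rule, with a hypothetical statistic and degrees of freedom; only `chi2.sf` from SciPy is used.

```python
from scipy.stats import chi2

alpha = 0.05                 # significance level (conventional choice)
X2, df = 10.4, 14            # hypothetical chi-square statistic and degrees of freedom
p_value = chi2.sf(X2, df)    # right-tail area under the null distribution
print("reject the model" if p_value < alpha else "no strong evidence against the model")
```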
8.3 Parameter Estimation: MOM and MLE
- Let the observed data be a random sample, i.e. a sequence of I.I.D. random variables $X_1, \dots, X_n$ whose joint distribution depends on an unknown parameter $\theta$ (scalar or vector).
- An estimate $\hat{\theta}$ of $\theta$ will be a random variable, a function of the $X_i$'s, whose distn is known as its sampling distn.
- The standard deviation of the sampling distn will be termed its standard error.
8.4 The Method of Moments
- Definition: the $k$-th (popn) moment of a random variable $X$ is $\mu_k = E(X^k)$, and the $k$-th (sample) moment is $\hat{\mu}_k = \frac{1}{n}\sum_{i=1}^n X_i^k$.
- $\hat{\mu}_k$ is viewed as an estimate of $\mu_k$.
- Algorithm: MOM estimates parameter(s) by finding expressions for them in terms of the lowest possible (popn) moments and then substituting (sample) moments into the expressions (a sketch follows below).
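As an illustration of the algorithm, a hedged sketch of MOM for a gamma distribution with shape $\alpha$ and rate $\lambda$, using $E(X) = \alpha/\lambda$ and $\mathrm{Var}(X) = \alpha/\lambda^2$; the simulated sample and the true parameter values are arbitrary.

```python
import numpy as np

def gamma_mom(x):
    """Method-of-moments estimates for a gamma(shape=alpha, rate=lam) sample.

    Solving E[X] = alpha/lam and Var[X] = alpha/lam**2 for the parameters gives
    lam_hat = xbar / s2 and alpha_hat = xbar**2 / s2.
    """
    xbar = np.mean(x)
    s2 = np.mean((x - xbar) ** 2)        # second central sample moment
    return xbar ** 2 / s2, xbar / s2     # (alpha_hat, lam_hat)

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.0, scale=1 / 3.0, size=500)   # true alpha = 2, lam = 3
print(gamma_mom(x))
```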
8.5 The Method of Maximum Likelihood
- Setup: let $X_1, \dots, X_n$ be a sequence of I.I.D. random variables with density or mass function $f(x \mid \theta)$.
- The likelihood function is
  $\mathrm{lik}(\theta) = \prod_{i=1}^n f(X_i \mid \theta)$.
- The MLE of $\theta$ is that value of $\theta$ that maximizes the likelihood function, or equivalently its natural logarithm (since the logarithm is a monotonic function).
- The log-likelihood function $l(\theta) = \sum_{i=1}^n \log f(X_i \mid \theta)$ is then maximized to get the MLE (a numerical sketch follows below).
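A numerical sketch of maximum likelihood for the same gamma model, minimizing the negative log-likelihood with SciPy; the starting values and the choice of optimizer are illustrative, not prescribed by the text.

```python
import numpy as np
from scipy.stats import gamma
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=1 / 3.0, size=500)   # sample with alpha = 2, lam = 3

def neg_log_lik(params):
    alpha, lam = params
    if alpha <= 0 or lam <= 0:
        return np.inf                    # keep the search in the valid region
    # l(theta) = sum_i log f(x_i | theta); we minimize its negative
    return -np.sum(gamma.logpdf(x, a=alpha, scale=1 / lam))

# Start the search at the method-of-moments estimates (a common choice).
xbar, s2 = x.mean(), x.var()
start = [xbar ** 2 / s2, xbar / s2]
res = minimize(neg_log_lik, start, method="Nelder-Mead")
print("MLE (alpha, lam):", res.x)
```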
8.5.1 MLEs of Multinomial Cell Probabilities
- Suppose that $X_1, \dots, X_m$, the counts in cells $1, \dots, m$, follow a multinomial distribution with total count $n$ and cell probabilities $p_1, \dots, p_m$.
- Caution: the marginal distn of each $X_i$ is binomial,
- BUT the $X_i$ are not INDEPENDENT, i.e. their joint PMF is not the product of the marginal PMFs. The good news is that maximum likelihood still applies.
- Problem: estimate the $p_i$'s from the $x_i$'s.
8.5.1a MLEs of Multinomial Cell Probabilities (contd)
- To answer the question, we assume $n$ is given and we wish to estimate $p_1, \dots, p_m$.
- From the joint PMF $\frac{n!}{\prod_i x_i!} \prod_i p_i^{x_i}$, the log-likelihood becomes
  $l(p_1, \dots, p_m) = \log n! - \sum_i \log x_i! + \sum_i x_i \log p_i$.
- To maximize this log-likelihood subject to the constraint $\sum_i p_i = 1$, we use a Lagrange multiplier and obtain, after maximizing,
  $\hat{p}_i = \frac{x_i}{n}$ (see the sketch below).
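The resulting estimates are trivial to compute; the cell counts below are hypothetical.

```python
import numpy as np

counts = np.array([42, 18, 25, 15])     # hypothetical cell counts x_1, ..., x_m
n = counts.sum()
p_hat = counts / n                      # Lagrange-multiplier result: p_i-hat = x_i / n
print(p_hat, p_hat.sum())               # the estimates automatically sum to 1
```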
8.5.1b MLEs of Multinomial Cell Probabilities (contd)
- Deja vu: note that the sampling distn of the $\hat{p}_i = X_i / n$ is determined by the binomial distns of the $X_i$.
- Hardy-Weinberg Equilibrium (GENETICS)
- Here the multinomial cell probabilities are functions of other unknown parameters, that is, $p_i = p_i(\theta)$.
- Read example A on pages 260-261 (a sketch follows below).
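A sketch of the Hardy-Weinberg fit, assuming the standard parameterization with cell probabilities $(1-\theta)^2$, $2\theta(1-\theta)$, $\theta^2$; the genotype counts below are made up, not the textbook data of example A.

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([320, 140, 40])      # hypothetical genotype counts (x1, x2, x3)
n = x.sum()

def neg_log_lik(theta):
    # Multinomial log-likelihood with cell probabilities depending on theta.
    p = np.array([(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2])
    return -np.sum(x * np.log(p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
closed_form = (x[1] + 2 * x[2]) / (2 * n)   # calculus gives theta_hat = (x2 + 2*x3)/(2n)
print(res.x, closed_form)                    # the two estimates agree
```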
8.5.2 Large Sample Theory for MLEs
- Let $\hat{\theta}$ be an estimate of a parameter $\theta$ based on $X_1, \dots, X_n$.
- The variance of the sampling distn of many estimators decreases as the sample size $n$ increases.
- An estimate $\hat{\theta}$ is said to be a consistent estimate of a parameter $\theta$ if $\hat{\theta}$ approaches $\theta$ (in probability) as the sample size $n$ approaches infinity.
- Consistency is a limiting property that does not require any behavior of the estimator for a finite sample size.
8.5.2 Large Sample Theory for MLEs (contd)
- Theorem: under appropriate smoothness conditions on $f$, the MLE $\hat{\theta}$ from an I.I.D. sample is consistent, and the probability distn of $\sqrt{n I(\theta)}\,(\hat{\theta} - \theta)$ tends to $N(0, 1)$. In other words, the large sample distribution of the MLE is approximately normal with mean $\theta$ (say, the MLE is asymptotically unbiased) and its asymptotic variance is $\frac{1}{n I(\theta)}$,
- where the information about the parameter is $I(\theta) = E\left[\left(\frac{\partial}{\partial\theta} \log f(X \mid \theta)\right)^2\right]$ (a simulation sketch follows below).
8.5.3 Confidence Intervals for MLEs
- Recall that a confidence interval (as seen in Chap. 7) is a random interval containing the parameter of interest with some specified probability.
- Three (3) methods to get CIs for MLEs are (see the sketch after this list):
- Exact CIs
- Approximate CIs using Section 8.5.2
- Bootstrap CIs
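A sketch of the second and third methods for a Poisson sample: the approximate CI based on Section 8.5.2 and a parametric bootstrap percentile CI. The data are simulated and the bootstrap size is an arbitrary choice.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.poisson(2.0, size=200)
lam_hat = x.mean()                      # MLE of lambda
n = x.size

# Approximate 95% CI: lam_hat +/- z * sqrt(1 / (n * I(lam_hat))),
# with I(lam) = 1/lam for the Poisson, so the standard error is sqrt(lam_hat / n).
z = norm.ppf(0.975)
se = np.sqrt(lam_hat / n)
print("approximate CI:", (lam_hat - z * se, lam_hat + z * se))

# Parametric bootstrap CI: resample from Poisson(lam_hat), re-estimate, take percentiles.
boot = rng.poisson(lam_hat, size=(2000, n)).mean(axis=1)
print("bootstrap CI:  ", np.percentile(boot, [2.5, 97.5]))
```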
8.6 Efficiency and the Cramer-Rao Lower Bound
- Problem: given a variety of possible estimates, the best one to choose should have its sampling distribution highly concentrated about the true parameter.
- Because of its analytic simplicity, the mean squared error, $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$, will be used as a measure of such concentration.
8.6 Efficiency and the Cramer-Rao Lower Bound (contd)
- Unbiasedness means $E(\hat{\theta}) = \theta$.
- Definition: given two estimates, $\hat{\theta}_0$ and $\hat{\theta}_1$, of a parameter $\theta$, the efficiency of $\hat{\theta}_0$ relative to $\hat{\theta}_1$ is defined to be
  $\mathrm{eff}(\hat{\theta}_0, \hat{\theta}_1) = \frac{\mathrm{Var}(\hat{\theta}_1)}{\mathrm{Var}(\hat{\theta}_0)}$.
- Theorem (Cramer-Rao Inequality): under smoothness assumptions on the density $f(x \mid \theta)$ of the I.I.D. sequence $X_1, \dots, X_n$, when $\hat{\theta}$ is an unbiased estimate of $\theta$, we get the lower bound
  $\mathrm{Var}(\hat{\theta}) \ge \frac{1}{n I(\theta)}$ (a simulation sketch follows below).
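A simulation sketch of relative efficiency and the bound, comparing the sample mean and the sample median as estimators of a normal mean (both are unbiased here by symmetry); for the normal with known $\sigma$, the bound $\sigma^2/n$ is attained by the sample mean.

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n, reps = 0.0, 1.0, 100, 20000
samples = rng.normal(mu, sigma, size=(reps, n))

means = samples.mean(axis=1)            # unbiased estimator 1: sample mean
medians = np.median(samples, axis=1)    # unbiased estimator 2: sample median

crlb = sigma ** 2 / n                   # 1/(n I(mu)) for the normal with known sigma
print("Cramer-Rao bound:", crlb)
print("Var(mean):  ", means.var(), " (attains the bound)")
print("Var(median):", medians.var(), " (larger, roughly pi/2 times the bound)")
print("efficiency of the median relative to the mean:", means.var() / medians.var())
```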
8.7 Sufficiency
- Is there a function $T(X_1, \dots, X_n)$ containing all the information in the sample about the parameter $\theta$?
- If so, without loss of information the original data may be reduced to this statistic $T(X_1, \dots, X_n)$.
- Definition: a statistic $T = T(X_1, \dots, X_n)$ is said to be sufficient for $\theta$ if the conditional distn of $X_1, \dots, X_n$, given $T = t$, does not depend on $\theta$ for any value $t$.
- In other words, given the value of $T$, which is called a sufficient statistic, one can gain no more knowledge about the parameter $\theta$ from further investigation of the sample distn.
8.7.1 A Factorization Theorem
- How do we get a sufficient statistic?
- Theorem A: a necessary and sufficient condition for $T(X_1, \dots, X_n)$ to be sufficient for a parameter $\theta$ is that the joint PDF or PMF factors in the form
  $f(x_1, \dots, x_n \mid \theta) = g(T(x_1, \dots, x_n), \theta)\, h(x_1, \dots, x_n)$.
- Corollary A: if $T$ is sufficient for $\theta$, then the MLE is a function of $T$.
8.7.2 The Rao-Blackwell Theorem
- The following theorem gives a quantitative rationale for basing an estimator of a parameter on an existing sufficient statistic.
- Theorem (Rao-Blackwell): let $\hat{\theta}$ be an estimator of $\theta$ with $E(\hat{\theta}^2) < \infty$ for all $\theta$. Suppose that $T$ is sufficient for $\theta$,
- and let $\tilde{\theta} = E(\hat{\theta} \mid T)$.
- Then, for all $\theta$, $E[(\tilde{\theta} - \theta)^2] \le E[(\hat{\theta} - \theta)^2]$.
- The inequality is strict unless $\hat{\theta} = \tilde{\theta}$ (a simulation sketch follows below).
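A simulation sketch of the theorem for Bernoulli($p$) data: start from the crude unbiased estimator $X_1$, condition on the sufficient statistic $T = \sum_i X_i$ (which gives $E(X_1 \mid T) = T/n$, the sample mean), and compare mean squared errors. The values of $p$, $n$, and the number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 25, 20000
x = rng.binomial(1, p, size=(reps, n))           # Bernoulli(p) samples

crude = x[:, 0]                                  # unbiased but noisy estimator: just X_1
T = x.sum(axis=1)                                # sufficient statistic for p
rao_blackwellized = T / n                        # E(X_1 | T) = T/n, i.e. the sample mean

print("MSE of X_1:        ", np.mean((crude - p) ** 2))              # about p(1-p)
print("MSE of E(X_1 | T): ", np.mean((rao_blackwellized - p) ** 2))  # about p(1-p)/n
```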
8.8 Conclusion
- Some key ideas from Chap. 7, such as sampling distributions and confidence intervals, were revisited.
- MOM and MLE were presented, together with some distributional approximation theory.
- Theoretical concepts of efficiency, the Cramer-Rao lower bound, and sufficiency were discussed.
- Finally, some light was shed on parametric bootstrapping.