Bias and variance of estimators - PowerPoint PPT Presentation

Provided by: rud52

Transcript and Presenter's Notes

1
Tutorial 6

Bias and variance of estimators
The score and Fisher information
Cramer-Rao inequality

2
Estimators and their Properties

Let $\mathcal{F} = \{p(x|\theta) : \theta \in \Theta\}$ be a parametric
set of distributions. Given a sample $D = \{x_1, \ldots, x_n\}$
drawn i.i.d. from one of the
distributions in the set, we would like to
estimate its parameter $\theta$ (thus identifying the
distribution).
An estimator for $\theta$ w.r.t. $D$ is any function $\hat{\theta} = \hat{\theta}(D)$ of the sample.
Notice that an estimator is a
random variable.
How do we measure the quality of an estimator?
Consistency: An estimator $\hat{\theta}_n$ for $\theta$, based on $n$ samples, is
consistent if $\hat{\theta}_n \to \theta$ in probability as $n \to \infty$.
This is a (desirable) asymptotic property that
motivates us to acquire large samples. But we
should emphasize that we are also interested in
measures for finite (and small!) sample sizes.

3
Estimators and their Properties

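Bias: Define the bias of an estimator $\hat{\theta}$ to be
$b(\hat{\theta}) = E_\theta[\hat{\theta}] - \theta$.
Here, the expectation is
w.r.t. the distribution $p(x_1, \ldots, x_n|\theta)$ of the sample.
The estimator is unbiased if its bias is zero, i.e. $E_\theta[\hat{\theta}] = \theta$.
Example: estimators such as $\hat{\mu} = x_1$ and
$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, for the mean of a normal distribution, are
both unbiased. The
estimator $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ for its variance
is biased, whereas the estimator $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
is unbiased.
Variance: another important property of an
estimator is its variance $\mathrm{Var}_\theta(\hat{\theta}) = E_\theta[(\hat{\theta} - E_\theta[\hat{\theta}])^2]$. We
would like to find estimators with minimum bias
and variance.
Which is more important, bias or variance?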
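
The bias of the two variance estimators can be checked by simulation. The sketch below is an illustration only: the function names, the sample size, the value of sigma and the number of trials are assumptions, not taken from the slides. It averages the 1/n and 1/(n-1) estimators over many normal samples.

```python
import random

def variance_estimates(sample):
    """Return (biased, unbiased) variance estimates: divide by n and by n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    squared_dev = sum((x - mean) ** 2 for x in sample)
    return squared_dev / n, squared_dev / (n - 1)

def average_estimates(n=5, sigma=2.0, trials=50_000, seed=0):
    """Average both estimators over many simulated N(0, sigma^2) samples of size n."""
    rng = random.Random(seed)  # fixed seed (assumed) so the run is repeatable
    total_biased = total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        biased, unbiased = variance_estimates(sample)
        total_biased += biased
        total_unbiased += unbiased
    return total_biased / trials, total_unbiased / trials

if __name__ == "__main__":
    biased_avg, unbiased_avg = average_estimates()
    # True variance is sigma^2 = 4.0; the 1/n estimator has expectation (n-1)/n * 4.0 = 3.2.
    print(biased_avg, unbiased_avg)
```

With these assumed settings the 1/n average should settle near 3.2 and the 1/(n-1) average near 4.0, up to simulation noise.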

4
Risky Estimators

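Employ our decision-theoretic framework to
measure the quality of estimators.
Abbreviate $\hat{\theta} = \hat{\theta}(D)$ and consider the
square error loss function $\lambda(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$.
The conditional risk associated with $\hat{\theta}$ when
$\theta$ is the true parameter is $R(\hat{\theta}|\theta) = E_\theta[(\hat{\theta} - \theta)^2]$.
Claim: $R(\hat{\theta}|\theta) = \mathrm{Var}_\theta(\hat{\theta}) + b(\hat{\theta})^2$, where $b(\hat{\theta}) = E_\theta[\hat{\theta}] - \theta$ is the bias.
Proof: Write $\hat{\theta} - \theta = (\hat{\theta} - E_\theta[\hat{\theta}]) + b(\hat{\theta})$ and expand the square:
$E_\theta[(\hat{\theta} - \theta)^2] = E_\theta[(\hat{\theta} - E_\theta[\hat{\theta}])^2] + 2\, b(\hat{\theta})\, E_\theta[\hat{\theta} - E_\theta[\hat{\theta}]] + b(\hat{\theta})^2 = \mathrm{Var}_\theta(\hat{\theta}) + b(\hat{\theta})^2$,
since the cross term vanishes.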
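
The decomposition of the squared-error risk into variance plus squared bias can be checked numerically. The sketch below uses a deliberately biased estimator, a shrunken sample mean; this estimator, the function names and all parameter values are assumptions chosen for illustration. On simulated estimates the empirical risk equals the empirical variance plus the squared empirical bias, up to rounding.

```python
import random

def risk_decomposition(estimates, theta):
    """Return (risk, variance, bias) computed from a list of simulated estimates."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    risk = sum((t - theta) ** 2 for t in estimates) / m         # E[(theta_hat - theta)^2]
    variance = sum((t - mean_est) ** 2 for t in estimates) / m  # Var(theta_hat)
    bias = mean_est - theta                                     # E[theta_hat] - theta
    return risk, variance, bias

def simulate_shrunk_mean(theta=1.0, n=10, shrink=0.8, trials=20_000, seed=1):
    """Simulate the (hypothetical) estimator shrink * sample mean for N(theta, 1) data."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        estimates.append(shrink * sum(sample) / n)
    return estimates

if __name__ == "__main__":
    risk, variance, bias = risk_decomposition(simulate_shrunk_mean(), theta=1.0)
    print(risk, variance + bias ** 2)  # the two numbers agree up to rounding
```

For this assumed estimator the theoretical values are variance 0.8^2 / 10 = 0.064 and bias (0.8 - 1) * 1 = -0.2, so shrinking trades a lower variance for a nonzero bias.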

5
Bias vs. Variance

So, for a given level of conditional risk, there
is a tradeoff between bias and variance.
This tradeoff is among the most important facts
in pattern recognition and machine learning.
Classical approach: Consider only unbiased
estimators and try to find those with minimum
possible variance.
This approach is not always fruitful:
Unbiasedness only means that the average of
the estimator (w.r.t. $p(D|\theta)$) is $\theta$. It
doesn't mean the estimate will be near $\theta$ for a particular
sample (if the variance is large).
In general, an unbiased estimator is not
guaranteed to exist.

6
The Score

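The score of the family $\{p(x|\theta)\}$ is the
random variable
$V = \frac{\partial}{\partial\theta} \ln p(x|\theta) = \frac{\partial p(x|\theta)/\partial\theta}{p(x|\theta)}$.
It measures the sensitivity of $p(x|\theta)$ as a
function of the parameter $\theta$.
Claim: $E_\theta[V] = 0$.
Proof: $E_\theta[V] = \int \frac{\partial p(x|\theta)/\partial\theta}{p(x|\theta)}\, p(x|\theta)\, dx = \int \frac{\partial}{\partial\theta} p(x|\theta)\, dx = \frac{\partial}{\partial\theta} \int p(x|\theta)\, dx = \frac{\partial}{\partial\theta} 1 = 0$.
Corollary: $\mathrm{Var}_\theta(V) = E_\theta[V^2]$.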

7
The Score - Example

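Consider the normal distribution $N(\mu, \sigma^2)$ with known variance $\sigma^2$ and unknown mean $\mu$:
$p(x|\mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.
Clearly, $\ln p(x|\mu) = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}$
and $V = \frac{\partial}{\partial\mu} \ln p(x|\mu) = \frac{x-\mu}{\sigma^2}$.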
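
The score of a normal family can be checked numerically. The sketch below assumes the family N(mu, sigma^2) with known sigma and unknown mean mu; the function names and parameter values are illustrative assumptions. It compares the closed-form score (x - mu) / sigma^2 with a finite-difference derivative of the log-density, and estimates the mean of the score by simulation.

```python
import math
import random

def log_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def score(x, mu, sigma):
    """Closed-form score w.r.t. mu: d/dmu ln p(x | mu) = (x - mu) / sigma^2."""
    return (x - mu) / sigma ** 2

def numeric_score(x, mu, sigma, h=1e-5):
    """Central finite-difference approximation of d/dmu ln p(x | mu)."""
    return (log_pdf(x, mu + h, sigma) - log_pdf(x, mu - h, sigma)) / (2 * h)

def mean_score(mu=1.5, sigma=2.0, trials=100_000, seed=2):
    """Monte Carlo estimate of E[V] when x is drawn from N(mu, sigma^2)."""
    rng = random.Random(seed)
    return sum(score(rng.gauss(mu, sigma), mu, sigma) for _ in range(trials)) / trials

if __name__ == "__main__":
    print(score(3.0, 1.5, 2.0), numeric_score(3.0, 1.5, 2.0))  # should agree closely
    print(mean_score())  # should be close to zero
```

The simulated mean of the score should be close to zero, in line with the claim that the score has zero expectation under the true parameter.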

8
The Score - Vector Form

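In the case where $\theta = (\theta_1, \ldots, \theta_k)$ is a
vector, the score is the vector whose $i$th
component is $V_i = \frac{\partial}{\partial\theta_i} \ln p(x|\theta)$.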
Example
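
One possible instance of a vector score, assumed here for illustration rather than recovered from the slide, is the normal family with both parameters unknown, $\theta = (\mu, \sigma^2)$. Differentiating the log-density with respect to each component gives:

```latex
\ln p(x|\mu,\sigma^2) = -\tfrac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}

V_1 = \frac{\partial}{\partial\mu}\ln p(x|\mu,\sigma^2) = \frac{x-\mu}{\sigma^2}

V_2 = \frac{\partial}{\partial\sigma^2}\ln p(x|\mu,\sigma^2)
    = -\frac{1}{2\sigma^2} + \frac{(x-\mu)^2}{2\sigma^4}
```

Both components have zero expectation under the true parameters, since $E[x-\mu] = 0$ and $E[(x-\mu)^2] = \sigma^2$.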

9
Fisher Information

Fisher information: Designed to provide a measure
of how much information the parametric
probability law $p(x|\theta)$ carries about the
parameter $\theta$.
An adequate definition of such information
should possess the following properties:
The larger the sensitivity of $p(x|\theta)$ to
changes in $\theta$, the larger the
information should be.
The information should be additive: the
information carried by the combined law $p(x_1, x_2|\theta)$ of independent observations
should be the sum of those carried by $p(x_1|\theta)$
and $p(x_2|\theta)$.
The information should be insensitive to the sign
of the change in $\theta$, and preferably positive.
The information should be a deterministic
quantity; it should not depend on the specific
random observation.

10
Fisher Information

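Definition (scalar form): Fisher information
(about $\theta$) is the variance of the score:
$J(\theta) = \mathrm{Var}_\theta(V) = E_\theta\left[\left(\frac{\partial}{\partial\theta} \ln p(x|\theta)\right)^2\right]$.
Example: consider a random variable $x \sim N(\mu, \sigma^2)$ with known $\sigma^2$. Its score is $V = (x-\mu)/\sigma^2$, so
$J(\mu) = E_\mu[(x-\mu)^2]/\sigma^4 = 1/\sigma^2$.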

11
Fisher Information - Cntd.

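Whenever $\theta = (\theta_1, \ldots, \theta_k)$ is a vector,
Fisher information is the matrix $J(\theta) = [J_{ij}(\theta)]$,
where $J_{ij}(\theta) = \mathrm{Cov}_\theta(V_i, V_j) = E_\theta\left[\frac{\partial}{\partial\theta_i} \ln p(x|\theta)\, \frac{\partial}{\partial\theta_j} \ln p(x|\theta)\right]$.
Reminder: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$.
Remark: the Fisher information is only defined
whenever the distributions $p(x|\theta)$ satisfy
some regularity conditions. (For example, they
should be differentiable w.r.t. $\theta$, and all
the distributions in the parametric family must
have the same support set.)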

12
Fisher Information - Cntd.

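Claim: Let $x_1, \ldots, x_n$ be i.i.d. random
variables, $x_i \sim p(x|\theta)$. The score of $p(x_1, \ldots, x_n|\theta)$
is the sum of the individual scores.
Proof: $V(x_1, \ldots, x_n) = \frac{\partial}{\partial\theta} \ln \prod_{i=1}^{n} p(x_i|\theta) = \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \ln p(x_i|\theta) = \sum_{i=1}^{n} V(x_i)$.
Example: If $x_1, \ldots, x_n$ are i.i.d.
$N(\mu, \sigma^2)$, the score is $V(x_1, \ldots, x_n) = \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma^2}$.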

13
Fisher Information - Cntd.

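Based on $n$ i.i.d. samples, the Fisher
information about $\theta$ is
$J_n(\theta) = \mathrm{Var}_\theta\left(\sum_{i=1}^{n} V(x_i)\right) = \sum_{i=1}^{n} \mathrm{Var}_\theta(V(x_i)) = n J(\theta)$.
Thus, the Fisher information is additive w.r.t.
i.i.d. random variables.
Example: Suppose $x_1, \ldots, x_n$ are i.i.d.
$N(\mu, \sigma^2)$. From the previous example we know
that the Fisher information about the parameter $\mu$
based on one sample is $J(\mu) = 1/\sigma^2$.
Therefore, based on the entire sample, $J_n(\mu) = n/\sigma^2$.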
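
Additivity can be illustrated by estimating the variance of the total score from simulated samples. The sketch below assumes i.i.d. N(mu, sigma^2) data with known sigma; the function names and parameter values are illustrative assumptions. It compares the Monte Carlo variance of the summed score with n / sigma^2.

```python
import random

def total_score(sample, mu, sigma):
    """Score of an i.i.d. normal sample w.r.t. mu: sum of the individual scores."""
    return sum((x - mu) / sigma ** 2 for x in sample)

def fisher_information_mc(n, mu=0.0, sigma=2.0, trials=40_000, seed=3):
    """Monte Carlo estimate of the variance of the total score for samples of size n."""
    rng = random.Random(seed)
    scores = [
        total_score([rng.gauss(mu, sigma) for _ in range(n)], mu, sigma)
        for _ in range(trials)
    ]
    mean = sum(scores) / trials
    return sum((s - mean) ** 2 for s in scores) / trials

if __name__ == "__main__":
    # With sigma = 2 the theoretical values are 1/4 for n = 1 and 8/4 = 2 for n = 8.
    print(fisher_information_mc(1), fisher_information_mc(8))
```

Up to simulation noise, the estimate for n = 8 should be about eight times the estimate for n = 1.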

14
The Cramer-Rao Inequality

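Theorem: Let $\hat{\theta}$ be an unbiased estimator for
$\theta$. Then $\mathrm{Var}_\theta(\hat{\theta}) \ge \frac{1}{J(\theta)}$, where $J(\theta)$ is the Fisher information of the sample on which $\hat{\theta}$ is based.
Proof: Let $V$ be the score of the sample. Using $E_\theta[V] = 0$ we have
$\mathrm{Cov}_\theta(V, \hat{\theta}) = E_\theta[V\hat{\theta}] - E_\theta[V]\, E_\theta[\hat{\theta}] = E_\theta[V\hat{\theta}]$.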

15
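The Cramer-Rao Inequality - Cntd.

Writing the expectation as an integral over the sample $D$ and exchanging differentiation and integration (allowed under the regularity conditions),
$E_\theta[V\hat{\theta}] = \int \frac{\partial p(D|\theta)/\partial\theta}{p(D|\theta)}\, \hat{\theta}(D)\, p(D|\theta)\, dD = \frac{\partial}{\partial\theta} \int \hat{\theta}(D)\, p(D|\theta)\, dD = \frac{\partial}{\partial\theta} E_\theta[\hat{\theta}]$.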

16
The Cramer-Rao Inequality - Cntd.

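So, for an unbiased estimator, $\mathrm{Cov}_\theta(V, \hat{\theta}) = E_\theta[V\hat{\theta}] = \frac{\partial}{\partial\theta} E_\theta[\hat{\theta}] = \frac{\partial\theta}{\partial\theta} = 1$.
By the Cauchy-Schwarz inequality, $1 = \mathrm{Cov}_\theta(V, \hat{\theta})^2 \le \mathrm{Var}_\theta(V)\, \mathrm{Var}_\theta(\hat{\theta}) = J(\theta)\, \mathrm{Var}_\theta(\hat{\theta})$.
Therefore, $\mathrm{Var}_\theta(\hat{\theta}) \ge \frac{1}{J(\theta)}$.
For a biased estimator with bias $b(\theta)$ we have $E_\theta[\hat{\theta}] = \theta + b(\theta)$, hence $\mathrm{Var}_\theta(\hat{\theta}) \ge \frac{(1 + b'(\theta))^2}{J(\theta)}$.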

17
The Cramer-Rao General Case

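The Cramer-Rao inequality also holds in a general
(vector) form: the error covariance matrix for an unbiased estimator $\hat{\theta}$ is
bounded as follows:
$E_\theta[(\hat{\theta} - \theta)(\hat{\theta} - \theta)^T] \succeq J(\theta)^{-1}$,
i.e. the difference between the two sides is a positive semidefinite matrix.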

18
The Cramer-Rao Inequality - Cntd.

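Example: Let $x_1, \ldots, x_n$ be i.i.d.
$N(\mu, \sigma^2)$. From the previous example, the Fisher information of the sample is $J_n(\mu) = n/\sigma^2$.
Now let $\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ be an (unbiased)
estimator for $\mu$. Its variance is $\mathrm{Var}(\bar{x}) = \sigma^2/n = 1/J_n(\mu)$.
So $\bar{x}$ matches the
Cramer-Rao lower bound.
Def: An unbiased estimator whose covariance meets
the Cramer-Rao lower bound is called efficient.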
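
Efficiency can be illustrated by simulation. The sketch below assumes i.i.d. N(mu, sigma^2) data with known sigma; the function names, the parameter values and the choice of the sample median as a comparison estimator are illustrative assumptions. It estimates the variances of the sample mean and the sample median and compares them with the bound sigma^2 / n. The median is also unbiased here by symmetry, but it is not efficient.

```python
import random
import statistics

def _variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def compare_with_bound(n=25, mu=0.0, sigma=2.0, trials=20_000, seed=4):
    """Return (Var of sample mean, Var of sample median, Cramer-Rao bound)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    bound = sigma ** 2 / n  # 1 / J_n(mu), with J_n(mu) = n / sigma^2
    return _variance(means), _variance(medians), bound

if __name__ == "__main__":
    var_mean, var_median, bound = compare_with_bound()
    print(var_mean, var_median, bound)
```

The variance of the sample mean should sit close to the bound, while the variance of the median should be visibly larger.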

19
Efficiency

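Theorem (Efficiency): The unbiased estimator $\hat{\theta}$
is efficient, that is, $\mathrm{Var}_\theta(\hat{\theta}) = \frac{1}{J(\theta)}$,
iff $\frac{\partial}{\partial\theta} \ln p(D|\theta) = J(\theta)(\hat{\theta}(D) - \theta)$, i.e. the score of the sample is $V = J(\theta)(\hat{\theta} - \theta)$.
Proof (If): If $V = J(\theta)(\hat{\theta} - \theta)$,
then $J(\theta) = \mathrm{Var}_\theta(V) = J(\theta)^2\, \mathrm{Var}_\theta(\hat{\theta})$,
meaning $\mathrm{Var}_\theta(\hat{\theta}) = \frac{1}{J(\theta)}$.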

20
Efficiency

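Only if: Recall the cross covariance between the score $V$ and the unbiased estimator $\hat{\theta}$: $\mathrm{Cov}_\theta(V, \hat{\theta}) = E_\theta[V(\hat{\theta} - \theta)] = 1$.
The Cauchy-Schwarz inequality for random
variables says $\mathrm{Cov}_\theta(V, \hat{\theta})^2 \le \mathrm{Var}_\theta(V)\, \mathrm{Var}_\theta(\hat{\theta})$, with equality iff $V - E_\theta[V]$ and $\hat{\theta} - E_\theta[\hat{\theta}]$ are linearly dependent (almost surely).
If $\hat{\theta}$ is efficient, equality holds, thus $V = c(\theta)(\hat{\theta} - \theta)$ for some $c(\theta)$. Then $1 = E_\theta[V(\hat{\theta} - \theta)] = c(\theta)\, \mathrm{Var}_\theta(\hat{\theta}) = c(\theta)/J(\theta)$, so $c(\theta) = J(\theta)$.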

21
Cramer-Rao Inequality and ML - Cntd.

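Theorem: Suppose there exists an efficient
estimator $\hat{\theta}$ for all $\theta$. Then the ML
estimator is $\hat{\theta}$.
Proof: By assumption, $\mathrm{Var}_\theta(\hat{\theta}) = \frac{1}{J(\theta)}$ for all $\theta$.
By the previous claim, $V = J(\theta)(\hat{\theta} - \theta)$, or
$\frac{\partial}{\partial\theta} \ln p(D|\theta) = J(\theta)(\hat{\theta}(D) - \theta)$ for all $\theta$.
This holds at $\theta = \hat{\theta}_{ML}$, and since
this is a maximum point of the likelihood the left side is zero, so
$J(\hat{\theta}_{ML})(\hat{\theta} - \hat{\theta}_{ML}) = 0$, that is, $\hat{\theta} = \hat{\theta}_{ML}$ (given $J > 0$).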