Introduction to Estimation Theory: A Tutorial

Slides: 41

Provided by: volkan2

1
Introduction to Estimation Theory A Tutorial

Volkan Cevher

2
Outline

Introduction
Terminology and Preliminaries
Bayesian (Random) Parameter Estimation
Nonrandom Parameter Estimation
Questions

3
Introduction

Classical detection problem
Design of optimum procedures for deciding between
possible statistical situations given a random
observation
The model has the following components
Parameter Space (for parametric detection
problems)
Probabilistic Mapping from Parameter Space to
Observation Space
Observation Space
Detection Rule

4
Introduction

Parameter Space
Completely characterizes the output given the
mapping.
Each hypothesis corresponds to a point in the
parameter space. This mapping is one-to-one.
Probabilistic Mapping from Parameter Space to
Observation Space
The probability law that governs the effect of a
parameter on the observation.

Example 1
[Figure: probabilistic mapping from the parameter space]
5
Introduction

Observation Space
Finite dimensional, i.e. $Y \in \mathbb{R}^n$, where n is
finite.
Detection Rule
Mapping of the observation space into its
parameters in the parameter space is called a
detection rule.

6
Introduction

Classical estimation problem
Interested in not making a choice among several
discrete situations, but rather making a choice
among a continuum of possible states.
Think of a family of distributions on the
observation space, indexed by a set of
parameters.
Given the observation, determine as accurately as
possible the actual value of the parameter.
In this example, given the observations, the
parameter $\theta$ is being estimated. Its value is not
chosen among a set of discrete values, but rather
is estimated as accurately as possible.

Example 2
7
Introduction

Estimation problem also has the same components
as the detection problem.
Parameter Space
Probabilistic Mapping from Parameter Space to
Observation Space
Observation Space
Estimation Rule
Detection problem can be thought of as a special
case of the estimation problem.
There are a variety of estimation procedures
differing basically in the amount of prior
information about the parameter and in the
performance criteria applied.
Estimation theory is less structured than
detection theory: "Detection is science,
estimation is art" (Array Signal Processing by
Johnson and Dudgeon).

8
Introduction

Based on the a priori information about the
parameter, there are two basic approaches to
parameter estimation
Bayesian Parameter Estimation
Nonrandom Parameter Estimation
Bayesian Parameter Estimation
Parameter is assumed to be a random quantity
related statistically to the observation.
Nonrandom Parameter Estimation
Parameter is a constant without any probabilistic
structure.

9
Terminology and Preliminaries

Estimation theory relies on jargon to
characterize the properties of estimators. In
this presentation, the following definitions are
used
The set of n observations is represented by the
n-dimensional vector $y \in \Gamma$ (observation space).
The values of the parameters are denoted by the
vector $\theta \in \Lambda$ (parameter space).
The estimate of this parameter vector is denoted
by $\hat{\theta}(y)$, a mapping from $\Gamma$ to $\Lambda$.

10
Terminology and Preliminaries

Definitions (continued)
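The estimation error $\epsilon(y)$ ($\epsilon$ in short) is defined
by the difference between the estimate and the
actual parameter:
$$\epsilon(y) = \hat{\theta}(y) - \theta$$
The function $C[a,\theta]: \Lambda \times \Lambda \to \mathbb{R}$ is the cost of
estimating a true value of $\theta$ as a.
Given such a cost function C, the Bayes risk
(average risk) of the estimator is defined by the
following:
$$r(\hat{\theta}) = E\{C[\hat{\theta}(Y), \Theta]\}$$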

11
Terminology and Preliminaries

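Suppose we would like to minimize
the Bayes risk defined by
$$r(\hat{\theta}) = E\{C[\hat{\theta}(Y), \Theta]\} = E\{E\{C[\hat{\theta}(Y), \Theta] \mid Y\}\}$$
for a given cost function C.
By inspection, one can see that the Bayes
estimate of $\theta$ can be found (if it exists) by
minimizing, for each $y \in \Gamma$, the posterior cost
given Y = y:
$$E\{C[\hat{\theta}(y), \Theta] \mid Y = y\}$$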

Example 3
12
Terminology and Preliminaries

Definitions (continued)
An estimate is said to be unbiased if the
expected value of the estimate equals the true
value of the parameter:
$$E\{\hat{\theta}(Y) \mid \theta\} = \theta$$
Otherwise the estimate is
said to be biased. The bias $b(\theta)$ is usually
considered to be additive, so that
$$E\{\hat{\theta}(Y) \mid \theta\} = \theta + b(\theta)$$
An estimate is said to be asymptotically unbiased
if the bias tends to zero as the number of
observations tends to infinity.
An estimate is said to be consistent if the
mean-squared estimation error tends to zero as
the number of observations becomes large.
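
To make the bias definitions concrete, here is a minimal sketch (not from the slides) that computes the bias of two variance estimators exactly, by enumerating every outcome of n fair coin flips. The fair-coin model and the function names are illustrative assumptions. The 1/n estimator is biased but asymptotically unbiased, while the 1/(n-1) estimator is unbiased for every n.

```python
from fractions import Fraction
from itertools import product

TRUE_VAR = Fraction(1, 4)  # variance of a single fair coin flip (values 0 and 1)


def var_biased(y):
    """Sample variance with a 1/n normalisation."""
    n = len(y)
    mean = Fraction(sum(y), n)
    return sum((Fraction(v) - mean) ** 2 for v in y) / n


def var_unbiased(y):
    """Sample variance with a 1/(n-1) normalisation."""
    n = len(y)
    return var_biased(y) * n / (n - 1)


def expected_value(estimator, n):
    """Exact expectation over all 2**n equally likely outcomes."""
    outcomes = list(product([0, 1], repeat=n))
    return sum(estimator(y) for y in outcomes) / len(outcomes)


for n in (2, 4, 8):
    bias_biased = expected_value(var_biased, n) - TRUE_VAR
    bias_unbiased = expected_value(var_unbiased, n) - TRUE_VAR
    # The first bias equals -1/(4n) and shrinks with n; the second is zero.
    print(n, bias_biased, bias_unbiased)
```

Exact rational arithmetic is used so that the bias values are not blurred by simulation noise.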

13
Terminology and Preliminaries

Definitions (continued)
An efficient estimate has a mean-squared error
that equals a particular lower bound: the
Cramer-Rao bound. If an efficient estimate
exists, it is optimum in the mean-squared sense:
no other unbiased estimate has a smaller mean-squared
error.
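The following shorthand notations will also be used
for brevity: $p_\theta(y)$ for the density of Y when the
parameter value is $\theta$, and $E_\theta\{\cdot\}$ and $P_\theta(\cdot)$
for expectation and probability under $p_\theta$.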

14
Terminology and Preliminaries

The following definitions and theorems will be useful
later in the presentation.
Definition: Sufficiency
Suppose that $\Lambda$ is an arbitrary set. A function
T defined on $\Gamma$ is said to be a sufficient statistic for
the parameter set $\theta \in \Lambda$ if the distribution of Y
conditioned on T(Y) does not depend on $\theta$ for $\theta \in \Lambda$.
If knowing T(y) removes any further dependence
on $\theta$ of the distribution of y, one can conclude
that T(y) contains all the information in y that
is useful for estimating $\theta$. Hence, it is
sufficient.

15
Terminology and Preliminaries

Definition: Minimal Sufficiency
A function T on $\Gamma$ is said to be minimal
sufficient for the parameter set $\theta \in \Lambda$ if it is a
function of every other sufficient statistic for
$\theta$.
A minimal sufficient statistic represents the
furthest reduction in the observation without
destroying information about $\theta$.
A minimal sufficient statistic does not
necessarily exist for every problem. Even if it
exists, it is usually very difficult to identify
it.

16
Terminology and Preliminaries

The Factorization Theorem
Suppose that the parameter set $\theta \in \Lambda$ has a
corresponding family of densities $p_\theta$. A
statistic T is sufficient for $\theta$ iff there are
functions $g_\theta$ and h such that
$$p_\theta(y) = g_\theta[T(y)]\, h(y)$$
for all $y \in \Gamma$ and $\theta \in \Lambda$.
Refer to the supplement for a proof.

17
Terminology and Preliminaries

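(Poor) Consider the
hypothesis-testing problem $\Lambda = \{0, 1\}$ with densities
$p_0$ and $p_1$. Noting that
$$p_\theta(y) = \begin{cases} p_0(y), & \theta = 0 \\ \dfrac{p_1(y)}{p_0(y)}\, p_0(y), & \theta = 1 \end{cases}$$
the factorization
$$p_\theta(y) = g_\theta[T(y)]\, h(y)$$
is possible with
$$h(y) = p_0(y), \quad T(y) = \frac{p_1(y)}{p_0(y)} \equiv L(y), \quad g_\theta(t) = \begin{cases} 1, & \theta = 0 \\ t, & \theta = 1 \end{cases}$$
Thus the likelihood ratio L is a sufficient
statistic for the binary hypothesis-testing
problem.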

Example 4
18
Terminology and Preliminaries

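The Rao-Blackwell Theorem
Suppose that $\hat{g}(y)$ is an unbiased estimate of
$g(\theta)$ and that T is sufficient for $\theta$. Define
$$\tilde{g}[T(y)] = E_\theta\{\hat{g}(Y) \mid T(Y) = T(y)\}$$
Then $\tilde{g}[T(Y)]$ is also an unbiased estimate of
$g(\theta)$. Furthermore,
$$\mathrm{Var}_\theta(\tilde{g}[T(Y)]) \le \mathrm{Var}_\theta(\hat{g}(Y))$$
with equality iff
$$P_\theta(\hat{g}(Y) = \tilde{g}[T(Y)]) = 1$$
Refer to the supplement for a proof.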
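
As an illustration of the Rao-Blackwell theorem, the sketch below (an assumed example, not from the slides) takes n i.i.d. Bernoulli observations with success probability theta. The crude unbiased estimate is the first observation alone, the sufficient statistic is the sum of the observations, and conditioning the crude estimate on that sum gives the sample mean. Means and variances are computed exactly by enumeration.

```python
from fractions import Fraction
from itertools import product


def moments(estimator, n, theta):
    """Exact mean and variance of estimator(y) for n i.i.d. Bernoulli(theta) samples."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for y in product([0, 1], repeat=n):
        k = sum(y)
        prob = theta ** k * (1 - theta) ** (n - k)
        value = Fraction(estimator(y))
        mean += prob * value
        second_moment += prob * value ** 2
    return mean, second_moment - mean ** 2


def crude(y):
    """Unbiased but wasteful: uses only the first observation."""
    return y[0]


def rao_blackwellized(y):
    """E{Y_1 | sum(Y) = t} = t / n by symmetry, i.e. the sample mean."""
    return Fraction(sum(y), len(y))


theta = Fraction(3, 10)
n = 5
mean_crude, var_crude = moments(crude, n, theta)
mean_rb, var_rb = moments(rao_blackwellized, n, theta)
print(mean_crude, var_crude)  # same mean as below, larger variance
print(mean_rb, var_rb)
```

Both estimates have mean theta, and the conditioned estimate has variance theta(1 - theta)/n instead of theta(1 - theta).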

19
Terminology and Preliminaries

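Definition: Completeness
The parameter family $\theta \in \Lambda$ is said to be complete
if the condition $E_\theta\{f(Y)\} = 0$ for all $\theta \in \Lambda$ implies
that $P_\theta(f(Y) = 0) = 1$ for all $\theta \in \Lambda$.
(Poor) Suppose that $\Gamma = \{0, 1, \ldots, n\}$, $\Lambda = (0, 1)$,
and
$$p_\theta(y) = \binom{n}{y} \theta^y (1 - \theta)^{n - y}$$
For any function f on $\Gamma$, we have
$$E_\theta\{f(Y)\} = \sum_{y=0}^{n} f(y) \binom{n}{y} \theta^y (1 - \theta)^{n - y} = (1 - \theta)^n \sum_{y=0}^{n} a_y x^y, \quad x = \frac{\theta}{1 - \theta}, \quad a_y = f(y) \binom{n}{y}$$
The condition $E_\theta\{f(Y)\} = 0$ for all $\theta \in \Lambda$ implies
that
$$\sum_{y=0}^{n} a_y x^y = 0 \quad \text{for all } x > 0$$
However, an nth order polynomial has at most n
zeros unless all of its coefficients are zero.
Hence $f(y) = 0$ for $y = 0, \ldots, n$, and $\theta \in \Lambda$ is complete.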

Example 5
20
Terminology and Preliminaries

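Definition: Exponential Families
A class of distributions with parameter set $\theta \in \Lambda$
is said to be an exponential family if there are
real-valued functions $C, Q_1, \ldots, Q_m, T_1, \ldots, T_m$, and h
such that
$$p_\theta(y) = C(\theta) \exp\left\{ \sum_{l=1}^{m} Q_l(\theta) T_l(y) \right\} h(y)$$
$T(y) = [T_1(y), \ldots, T_m(y)]^T$ is a complete sufficient
statistic, provided the set of values taken by
$[Q_1(\theta), \ldots, Q_m(\theta)]$ contains an m-dimensional rectangle.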

21
Bayesian Parameter Estimation

For the random observation $Y \in \Gamma$, indexed by a
parameter $\theta \in \Lambda \subset \mathbb{R}^m$, our goal is to find a function
$\hat{\theta}: \Gamma \to \Lambda$ such that $\hat{\theta}(y)$ is the best guess
of the true value of $\theta$ given Y = y.
Bayesian estimators are the estimators that
minimize the Bayes risk function.
The following estimators are commonly used in
practice and can be distinguished by their cost
functions.

22
Bayesian Parameter Estimation

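Minimum-Mean-Squared-Error (MMSE)
Euclidean cost function:
$$C[a, \theta] = \|a - \theta\|^2 = \sum_{i=1}^{m} (a_i - \theta_i)^2$$
The posterior cost given Y = y is given by
$$E\{\|\hat{\theta}(y) - \Theta\|^2 \mid Y = y\} = \|\hat{\theta}(y)\|^2 - 2\, \hat{\theta}(y)^T E\{\Theta \mid Y = y\} + E\{\|\Theta\|^2 \mid Y = y\}$$
Minimizing this posterior cost for each y also minimizes the
Bayes risk $r(\hat{\theta})$. Hence, on differentiating
with respect to $\hat{\theta}(y)$, one can obtain the Bayes
estimate
$$\hat{\theta}_{MMSE}(y) = E\{\Theta \mid Y = y\}$$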

23
Bayesian Parameter Estimation

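Minimum-Mean-Absolute-Error (MMAE)
Absolute error cost function:
$$C[a, \theta] = \sum_{i=1}^{m} |a_i - \theta_i|$$
The posterior cost given Y = y is given by
$$E\left\{ \sum_{i=1}^{m} |\hat{\theta}_i(y) - \Theta_i| \,\Big|\, Y = y \right\} = \sum_{i=1}^{m} \int_0^\infty P(|\hat{\theta}_i(y) - \Theta_i| > x \mid Y = y)\, dx$$
Here we used the fact that if $P(X \ge 0) = 1$, then
$$E\{X\} = \int_0^\infty P(X > x)\, dx$$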

MMAE 1of3
24
Bayesian Parameter Estimation

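Further simplification is also possible:
$$\sum_{i=1}^{m} \left[ \int_{-\infty}^{\hat{\theta}_i(y)} P(\Theta_i < t \mid Y = y)\, dt + \int_{\hat{\theta}_i(y)}^{\infty} P(\Theta_i > t \mid Y = y)\, dt \right]$$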

MMAE 2of3
25
Bayesian Parameter Estimation

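Taking the derivative of the posterior cost with respect to each
$\hat{\theta}_i(y)$, one can see that
$$\frac{\partial}{\partial \hat{\theta}_i(y)} E\left\{ \sum_{j=1}^{m} |\hat{\theta}_j(y) - \Theta_j| \,\Big|\, Y = y \right\} = P(\Theta_i < \hat{\theta}_i(y) \mid Y = y) - P(\Theta_i > \hat{\theta}_i(y) \mid Y = y)$$
This derivative is a nondecreasing function of
$\hat{\theta}_i(y)$ that approaches $-1$ as $\hat{\theta}_i(y) \to -\infty$ and
$+1$ as $\hat{\theta}_i(y) \to +\infty$. Thus the posterior cost
achieves its minimum where its derivative
changes sign.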

MMAE 3of3
26
Bayesian Parameter Estimation

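Maximum A Posteriori Probability (MAP)
Uniform error cost function, for a small $\Delta > 0$:
$$C[a, \theta] = \begin{cases} 0, & \max_i |a_i - \theta_i| \le \Delta \\ 1, & \text{otherwise} \end{cases}$$
The posterior cost given Y = y is given by
$$1 - P\left(\max_i |\hat{\theta}_i(y) - \Theta_i| \le \Delta \mid Y = y\right)$$
Within some smoothness conditions, and for small $\Delta$, the estimator
that minimizes this posterior cost is given by
$$\hat{\theta}_{MAP}(y) = \arg\max_{\theta \in \Lambda} w(\theta \mid y)$$
where $w(\theta \mid y)$ is the posterior density of $\theta$ given Y = y.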

27
Bayesian Parameter Estimation

Observations
MMSE Estimator
The MMSE estimate of $\theta$ given Y = y is the
conditional mean of $\theta$ given Y = y.
MMAE Estimator
The MMAE estimate of $\theta$ given Y = y is the
conditional median of $\theta$ given Y = y.
MAP Estimator
The MAP estimate of $\theta$ given Y = y is the
conditional mode of $\theta$ given Y = y.
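
The link between cost function and estimator can be checked numerically. The sketch below is an illustration under assumed numbers: a made-up discrete posterior on the values 0 to 4, a grid of candidate estimates, and a small tolerance DELTA for the uniform cost. It minimizes each posterior cost by brute force and recovers the posterior mean, median, and mode.

```python
from fractions import Fraction

# Hypothetical posterior of a scalar parameter given Y = y (illustrative numbers).
values = [0, 1, 2, 3, 4]
posterior = [Fraction(p, 100) for p in (40, 5, 15, 20, 20)]

candidates = [Fraction(k, 4) for k in range(17)]  # estimates 0, 0.25, ..., 4
DELTA = Fraction(1, 10)  # half-width used by the uniform cost


def squared(a, t):
    return (a - t) ** 2


def absolute(a, t):
    return abs(a - t)


def uniform(a, t):
    return 0 if abs(a - t) <= DELTA else 1


def posterior_cost(cost, a):
    """Expected cost of announcing a, averaged over the posterior."""
    return sum(w * cost(a, t) for t, w in zip(values, posterior))


def bayes_estimate(cost):
    """Candidate with the smallest posterior cost."""
    return min(candidates, key=lambda a: posterior_cost(cost, a))


mmse_estimate = bayes_estimate(squared)   # posterior mean
mmae_estimate = bayes_estimate(absolute)  # posterior median
map_estimate = bayes_estimate(uniform)    # posterior mode
print(mmse_estimate, mmae_estimate, map_estimate)
```

The posterior was chosen to be skewed so that the three estimates differ, which is the situation in which the choice of cost function matters.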

28
Bayesian Parameter Estimation
Example 6

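(Poor) Given the following
conditional probability density function
$$p_\theta(y) = \begin{cases} \theta e^{-\theta y}, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
hence y has an exponential density with
parameter $\theta$. Suppose $\Theta$ is also an exponential random
variable with density
$$w(\theta) = \begin{cases} \alpha e^{-\alpha \theta}, & \theta \ge 0 \\ 0, & \theta < 0 \end{cases}$$
Then, the posterior distribution of $\Theta$ given Y = y
is given by
$$w(\theta \mid y) = (\alpha + y)^2\, \theta\, e^{-(\alpha + y)\theta}$$
for $\theta \ge 0$ and $y \ge 0$, and $w(\theta \mid y) = 0$ otherwise.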

29
Bayesian Parameter Estimation
Example 7

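(Continued.)
The MMSE estimate is the mean of this distribution:
$$\hat{\theta}_{MMSE}(y) = \frac{2}{\alpha + y}$$
The MMAE estimate is the median of this distribution:
$$\hat{\theta}_{MMAE}(y) = \frac{T_0}{\alpha + y}, \quad \text{where } (1 + T_0)\, e^{-T_0} = \tfrac{1}{2}, \; T_0 \approx 1.68$$
The MAP estimate is the mode of this distribution
(where it is maximum):
$$\hat{\theta}_{MAP}(y) = \frac{1}{\alpha + y}$$
To decide which one to use, one must decide which
of the three cost functions best suits the
problem at hand.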
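
A short numerical sketch of this example follows. It assumes the densities are $p_\theta(y) = \theta e^{-\theta y}$ and $w(\theta) = \alpha e^{-\alpha\theta}$, with $\alpha$ a hypothetical prior rate, so that the posterior is a gamma density with shape 2 and rate $\alpha + y$. The function name and the bisection bounds are illustrative choices.

```python
import math


def posterior_summaries(y, alpha):
    """Mean, median and mode of w(theta | y) = (alpha + y)^2 theta exp(-(alpha + y) theta)."""
    rate = alpha + y

    # The posterior CDF at theta is 1 - (1 + x) exp(-x) with x = rate * theta.
    # Solve CDF = 1/2 for x by bisection; the CDF is increasing in x.
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - (1.0 + mid) * math.exp(-mid) < 0.5:
            lo = mid
        else:
            hi = mid
    t0 = 0.5 * (lo + hi)

    return {
        "mmse": 2.0 / rate,  # conditional mean
        "mmae": t0 / rate,   # conditional median
        "map": 1.0 / rate,   # conditional mode
    }


summaries = posterior_summaries(y=1.0, alpha=1.0)
print(summaries)
```

For every y the three estimates are ordered as mode, median, mean, which reflects the right skew of the posterior.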

30
Nonrandom Parameter Estimation

Our goal is the same as in the Bayesian parameter
estimation problem: find $\hat{\theta}$.
Assume that the parameter set $\Lambda$ is real
valued. In the nonrandom parameter estimation
problem, we do not know anything about the true
value of $\theta$ other than the fact that it lies in $\Lambda$.
Hence, the question we would like to answer is:
given the observation Y = y, what is the
best estimate of $\theta$?

31
Nonrandom Parameter Estimation

The only averaging of cost that can be
done is with respect to the distribution of Y
given $\theta$, for a given cost function C:
$$R_\theta(\hat{\theta}) = E_\theta\{C[\hat{\theta}(Y), \theta]\}$$
A reasonable restriction to place on an estimate
of $\theta$ is that its expected value is equal to the
true parameter value:
$$E_\theta\{\hat{\theta}(Y)\} = \theta$$
For its tractability, the squared Euclidean norm
cost function will be used.

32
Nonrandom Parameter Estimation

When the squared-error cost is used, the risk
function is the following:
$$R_\theta(\hat{\theta}) = E_\theta\{(\hat{\theta}(Y) - \theta)^2\}$$
One cannot generally expect to minimize this
risk function uniformly for all $\theta \in \Lambda$. This is
easily seen for the squared error cost since for
any particular value of $\theta$, say $\theta_0$, the conditional
mean-squared error can be made zero by choosing
the estimate to be identically $\theta_0$ for all
observations $y \in \Gamma$.
However, if $\theta$ is not close to $\theta_0$, such an
estimate would perform poorly.

33
Nonrandom Parameter Estimation

With the unbiasedness restriction, the
conditional mean-squared error becomes the
variance of the estimate. Hence, these estimators
are termed minimum-variance unbiased estimators
(MVUEs).
The procedure for seeking MVUEs:
Find a complete sufficient statistic T for $\theta \in \Lambda$.
Find any unbiased estimator $\hat{g}(y)$ of $g(\theta)$.
Then,
$$\tilde{g}[T(y)] = E_\theta\{\hat{g}(Y) \mid T(Y) = T(y)\}$$
is an MVUE of $g(\theta)$.

34
Nonrandom Parameter Estimation
Example 8

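(Poor) Consider the model
$$Y_k = N_k + \mu s_k, \quad k = 1, \ldots, n$$
where $N_1, \ldots, N_n$ are i.i.d. $N(0, \sigma^2)$ noise samples,
and $s_k$ is a known signal for $k = 1, \ldots, n$. Our
objective is to estimate $\mu$ and $\sigma^2$.
1. The density of Y is given by
$$p_\theta(y) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{k=1}^{n} (y_k - \mu s_k)^2 \right\} = C(\theta) \exp\{\theta_1 T_1(y) + \theta_2 T_2(y)\}$$
where $\theta = [\theta_1\ \theta_2]^T = [\mu/\sigma^2\ \ {-1/(2\sigma^2)}]^T$ and
$$T_1(y) = \sum_{k=1}^{n} s_k y_k, \quad T_2(y) = \sum_{k=1}^{n} y_k^2, \quad C(\theta) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{\mu^2}{2\sigma^2} \sum_{k=1}^{n} s_k^2 \right\}$$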

35
Nonrandom Parameter Estimation
Example 9

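(Continued.) Note that
$T = [T_1\ T_2]^T$ is a complete
sufficient statistic for $\theta$.
2. We wish to estimate
$$g_1(\theta) = \mu = -\frac{\theta_1}{2\theta_2}, \quad g_2(\theta) = \sigma^2 = -\frac{1}{2\theta_2}$$
Assuming that $s_1 \ne 0$, the estimate $\hat{g}_1(y) = y_1/s_1$ is
an unbiased estimator of $g_1(\theta)$.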
Moreover, note that
and that
Hence,
is an unbiased estimate of
$g_2(\theta)$.

36
Nonrandom Parameter Estimation
Example 10

(Continued.)
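3. Since $T_1$ and $T_2$ are complete, the estimates
$$\tilde{g}_j[T(y)] = E_\theta\{\hat{g}_j(Y) \mid T(Y) = T(y)\}, \quad j = 1, 2$$
are MVUEs of $g_1(\theta)$ and $g_2(\theta)$. Note that $\hat{g}_1(Y)$ and $T_1(Y)$ are
both linear functions of Y and are jointly
Gaussian. Hence, MVUEs are
$$\tilde{g}_1[T(y)] = \frac{T_1(y)}{\sum_{k=1}^{n} s_k^2}, \quad \tilde{g}_2[T(y)] = \frac{1}{n - 1} \left[ T_2(y) - \frac{T_1^2(y)}{\sum_{k=1}^{n} s_k^2} \right]$$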
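
The unbiasedness of these estimates can be checked by simulation. The sketch below assumes the model $Y_k = N_k + \mu s_k$ with i.i.d. zero-mean Gaussian noise of variance $\sigma^2$, and uses the standard closed forms $T_1/\sum s_k^2$ for $\mu$ and $(T_2 - T_1^2/\sum s_k^2)/(n-1)$ for $\sigma^2$. The signal values, true parameters, seed, and trial count are arbitrary illustrative choices.

```python
import random


def mvue(y, s):
    """Estimates of (mu, sigma^2) built from T1 = sum(s_k y_k) and T2 = sum(y_k^2)."""
    t1 = sum(sk * yk for sk, yk in zip(s, y))
    t2 = sum(yk * yk for yk in y)
    energy = sum(sk * sk for sk in s)
    mu_hat = t1 / energy
    var_hat = (t2 - t1 * t1 / energy) / (len(y) - 1)
    return mu_hat, var_hat


random.seed(0)
s = [1.0, -2.0, 0.5, 3.0, 1.5]  # known signal (hypothetical values)
mu_true, sigma_true = 0.7, 2.0  # true parameters (hypothetical values)
trials = 20000

sum_mu = 0.0
sum_var = 0.0
for _ in range(trials):
    y = [mu_true * sk + random.gauss(0.0, sigma_true) for sk in s]
    mu_hat, var_hat = mvue(y, s)
    sum_mu += mu_hat
    sum_var += var_hat

avg_mu = sum_mu / trials    # should be close to mu_true
avg_var = sum_var / trials  # should be close to sigma_true ** 2
print(avg_mu, avg_var)
```

Averaging over many independent trials estimates the expected value of each estimator, so both averages should sit near the true parameter values.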

37
Nonrandom Parameter Estimation

Maximum-Likelihood (ML) Estimation
For many problems arising in practice, it is not
usually feasible to find MVUEs.
Another method for seeking good estimators is
needed.
ML is one of the most commonly used methods in
the signal processing literature.
Consider MAP estimation for $\theta \in \Lambda$:
$$\hat{\theta}_{MAP}(y) = \arg\max_{\theta \in \Lambda} p_\theta(y)\, w(\theta)$$
In the absence of any prior information about the
parameter $\theta$, we can assume that it is uniformly
distributed ($w(\theta)$ becomes a uniform distribution)
since this represents the worst case scenario.

38
Nonrandom Parameter Estimation

ML Estimation (Continued.)
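Hence, the MAP estimate for a given $y \in \Gamma$ is any
value of $\theta$ that maximizes $p_\theta(y)$ over $\Lambda$.
$p_\theta(y)$, viewed as a function of $\theta$, is usually called
the likelihood function.
Hence, the ML estimate is
$$\hat{\theta}_{ML}(y) = \arg\max_{\theta \in \Lambda} p_\theta(y)$$
Maximizing $p_\theta(y)$ is the same as maximizing
$\log p_\theta(y)$ (log-likelihood function). Therefore, a
necessary condition for the maximum-likelihood
estimate, when the maximum lies in the interior of $\Lambda$, is
$$\frac{\partial}{\partial \theta} \log p_\theta(y) \Big|_{\theta = \hat{\theta}_{ML}(y)} = 0$$
The above condition is also known as the
likelihood equation.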
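
As a small worked case of the likelihood equation, the sketch below assumes n i.i.d. observations with the exponential density $\theta e^{-\theta y}$ (an illustrative choice, with made-up data). The log-likelihood is $n \log\theta - \theta \sum y_k$, so the likelihood equation $n/\theta - \sum y_k = 0$ gives $\hat{\theta}_{ML} = n / \sum y_k$. A brute-force grid search over the log-likelihood is included as a cross-check.

```python
import math


def log_likelihood(theta, y):
    """log p_theta(y) for i.i.d. exponential observations with rate theta."""
    return len(y) * math.log(theta) - theta * sum(y)


def score(theta, y):
    """Derivative of the log-likelihood with respect to theta."""
    return len(y) / theta - sum(y)


def ml_estimate(y):
    """Closed-form root of the likelihood equation."""
    return len(y) / sum(y)


y = [0.5, 1.5, 1.0, 2.0]  # hypothetical observations
theta_ml = ml_estimate(y)

# Cross-check: maximise the log-likelihood over a grid of candidate rates.
grid = [0.01 * k for k in range(1, 501)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, y))
print(theta_ml, theta_grid, score(theta_ml, y))
```

The score is zero at the closed-form estimate and the grid maximizer lands on the same value, up to the grid spacing.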

39
Nonrandom Parameter Estimation

Cramer-Rao Bound
Let $\hat{\theta}(y)$ be some unbiased estimator of $\theta \in \Lambda$.
Then the error covariance matrix is bounded by
the Cramer-Rao bound (refer to the supplement).
If the Cramer-Rao bound can be satisfied with
equality, only the maximum likelihood estimate
achieves it. Hence, if an efficient estimate
exists, it is the maximum likelihood estimate.
Refer to the attached
paper "The Stochastic CRB for Array Processing:
A Textbook Derivation" by Stoica, Larsson, and
Gershman.
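
A simple case in which the bound is attained can be simulated directly. The sketch below is an assumed example, not taken from the slides or the cited paper: estimating the mean of n i.i.d. Gaussian samples with known standard deviation. The Fisher information is $n/\sigma^2$, the Cramer-Rao bound on the variance of any unbiased estimate is $\sigma^2/n$, and the sample mean (which is also the ML estimate here) attains it.

```python
import random


def crb_gaussian_mean(sigma, n):
    """Cramer-Rao bound for the mean of n i.i.d. N(mu, sigma^2) samples, sigma known."""
    return sigma ** 2 / n


def empirical_variance_of_sample_mean(mu, sigma, n, trials, seed=0):
    """Monte Carlo variance of the sample mean over repeated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        estimates.append(sample_mean)
    centre = sum(estimates) / trials
    return sum((e - centre) ** 2 for e in estimates) / (trials - 1)


bound = crb_gaussian_mean(sigma=2.0, n=10)
observed = empirical_variance_of_sample_mean(mu=1.0, sigma=2.0, n=10, trials=20000)
print(bound, observed)  # the two numbers should nearly coincide
```

The parameter values, seed, and trial count are arbitrary; the agreement between the two numbers is what illustrates efficiency.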

Example 11
40
Questions
