Introduction to Estimation Theory: A Tutorial


  • Volkan Cevher

  • Introduction
  • Terminology and Preliminaries
  • Bayesian (Random) Parameter Estimation
  • Nonrandom Parameter Estimation
  • Questions

  • Classical detection problem
  • Design of optimum procedures for deciding between
    possible statistical situations given a random
    observation
  • The model has the following components
  • Parameter Space (for parametric detection
    problems)
  • Probabilistic Mapping from Parameter Space to
    Observation Space
  • Observation Space
  • Detection Rule

  • Parameter Space
  • Completely characterizes the output given the
    mapping.
  • Each hypothesis corresponds to a point in the
    parameter space. This mapping is one-to-one.
  • Probabilistic Mapping from Parameter Space to
    Observation Space
  • The probability law that governs the effect of a
    parameter on the observation.

Example 1 (figure: parameter space and probabilistic mapping)

  • Observation Space
  • Finite dimensional, i.e. $Y \in \mathbb{R}^n$, where $n$ is
    the number of observations
  • Detection Rule
  • Mapping of the observation space into its
    parameters in the parameter space is called a
    detection rule.

  • Classical estimation problem
  • Interested not in making a choice among several
    discrete situations, but rather in making a choice
    among a continuum of possible states.
  • Think of a family of distributions on the
    observation space, indexed by a set of
    parameters.
  • Given the observation, determine as accurately as
    possible the actual value of the parameter.
  • In this example, given the observations, the
    parameter $\theta$ is being estimated. Its value is not
    chosen among a set of discrete values, but rather
    is estimated as accurately as possible.

Example 2
  • The estimation problem has the same components
    as the detection problem.
  • Parameter Space
  • Probabilistic Mapping from Parameter Space to
    Observation Space
  • Observation Space
  • Estimation Rule
  • The detection problem can be thought of as a special
    case of the estimation problem.
  • There are a variety of estimation procedures,
    differing basically in the amount of prior
    information about the parameter and in the
    performance criteria applied.
  • Estimation theory is less structured than
    detection theory. "Detection is science,
    estimation is art." (Array Signal Processing,
    Johnson and Dudgeon)

  • Based on the a priori information about the
    parameter, there are two basic approaches to
    parameter estimation:
  • Bayesian Parameter Estimation: the parameter is
    assumed to be a random quantity related
    statistically to the observation.
  • Nonrandom Parameter Estimation: the parameter is
    a constant without any probabilistic structure.

Terminology and Preliminaries
  • Estimation theory relies on jargon to
    characterize the properties of estimators. In
    this presentation, the following definitions are
    used:
  • The set of n observations is represented by the
    n-dimensional vector $y \in \Gamma$ (observation space).
  • The values of the parameters are denoted by the
    vector $\theta \in \Lambda$ (parameter space).
  • The estimate of this parameter vector is denoted
    by $\hat{\theta}(y)$.

Terminology and Preliminaries
  • Definitions (continued)
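  • The estimation error $\epsilon(y)$ ($\epsilon$ in short) is defined
    by the difference between the estimate and the
    actual parameter: $\epsilon(y) = \hat{\theta}(y) - \theta$.
  • The function $C[a, \theta]: \Lambda \times \Lambda \to \mathbb{R}$ is the cost of
    estimating a true value of $\theta$ as $a$.
  • Given such a cost function C, the Bayes risk
    (average risk) of the estimator is defined by the
    expectation $r(\hat{\theta}) = E\{C[\hat{\theta}(Y), \Theta]\}$.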

Terminology and Preliminaries
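  • Suppose we would like to minimize
    the Bayes risk defined by
    $r(\hat{\theta}) = E\{C[\hat{\theta}(Y), \Theta]\}$
  • for a given cost function C.
  • By inspection, one can see that the Bayes
    estimate of $\theta$ can be found (if it exists) by
    minimizing, for each $y \in \Gamma$, the posterior cost
    given $Y = y$: $E\{C[\hat{\theta}(y), \Theta] \mid Y = y\}$.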
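
The "by inspection" step can be spelled out with iterated expectation. The short derivation below is a sketch that assumes $Y$ has a marginal density, written here as $p(y)$ (a symbol introduced only for this illustration):

```latex
\begin{align*}
r(\hat{\theta})
  &= E\{C[\hat{\theta}(Y), \Theta]\} \\
  &= E\big\{ E\{C[\hat{\theta}(Y), \Theta] \mid Y\} \big\} \\
  &= \int_{\Gamma} E\{C[\hat{\theta}(y), \Theta] \mid Y = y\}\, p(y)\, dy .
\end{align*}
```

Since $p(y) \ge 0$, choosing $\hat{\theta}(y)$ to minimize the inner posterior cost separately for each $y$ makes the integrand as small as possible everywhere, and therefore minimizes the Bayes risk.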

Example 3
Terminology and Preliminaries
  • Definitions (continued)
  • An estimate is said to be unbiased if the
    expected value of the estimate equals the true
    value of the parameter,
    $E\{\hat{\theta}(Y) \mid \theta\} = \theta$. Otherwise the estimate is
    said to be biased. The bias $b(\theta)$ is usually
    considered to be additive, so that
    $E\{\hat{\theta}(Y) \mid \theta\} = \theta + b(\theta)$.
  • An estimate is said to be asymptotically unbiased
    if the bias tends to zero as the number of
    observations tends to infinity.
  • An estimate is said to be consistent if the
    mean-squared estimation error tends to zero as
    the number of observations becomes large.
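
The difference between biased and asymptotically unbiased estimates can be checked numerically. The sketch below is an illustration that is not part of the original slides: it assumes i.i.d. zero-mean Gaussian data with standard deviation `sigma` and compares the divide-by-n variance estimate (biased, with bias $-\sigma^2/n$) against the divide-by-(n-1) estimate (unbiased). The function names are hypothetical.

```python
import random


def variance_estimates(samples):
    """Return (divide-by-n, divide-by-(n-1)) variance estimates."""
    n = len(samples)
    mean = sum(samples) / n
    squared_dev = sum((x - mean) ** 2 for x in samples)
    return squared_dev / n, squared_dev / (n - 1)


def average_estimates(n, trials=20000, sigma=2.0, seed=0):
    """Monte Carlo average of both estimators over many data sets of size n."""
    rng = random.Random(seed)
    sum_n = sum_n1 = 0.0
    for _ in range(trials):
        samples = [rng.gauss(0.0, sigma) for _ in range(n)]
        est_n, est_n1 = variance_estimates(samples)
        sum_n += est_n
        sum_n1 += est_n1
    return sum_n / trials, sum_n1 / trials


if __name__ == "__main__":
    # True variance is sigma**2 = 4. The divide-by-n average should sit near
    # 4 * (n - 1) / n, so its bias shrinks as n grows.
    for n in (2, 10, 50):
        print(n, average_estimates(n, trials=5000))
```

The divide-by-n estimate is biased for every finite n but asymptotically unbiased, because its bias tends to zero as n grows.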

Terminology and Preliminaries
  • Definitions (continued)
  • An efficient estimate has a mean-squared error
    that equals a particular lower bound, the
    Cramer-Rao bound. If an efficient estimate
    exists, it is optimum in the mean-squared sense:
    no other unbiased estimate has a smaller
    mean-squared error.
  • The following shorthand notation will also be used
    for brevity

Terminology and Preliminaries
  • Following definitions and theorems will be useful
    later in the presentation
  • Definition: Sufficiency
  • Suppose that $\Lambda$ is an arbitrary set. A function
    $T$ on $\Gamma$ is said to be a sufficient statistic for
    the parameter set $\theta \in \Lambda$ if the distribution of $y$
    conditioned on $T(y)$ does not depend on $\theta$ for $\theta \in \Lambda$.
  • If knowing $T(y)$ removes any further dependence
    on $\theta$ of the distribution of $y$, one can conclude
    that $T(y)$ contains all the information in $y$ that
    is useful for estimating $\theta$. Hence, it is
    sufficient.
Terminology and Preliminaries
  • Definition: Minimal Sufficiency
  • A function $T$ on $\Gamma$ is said to be minimal
    sufficient for the parameter set $\theta \in \Lambda$ if it is a
    function of every other sufficient statistic for
    $\theta \in \Lambda$.
  • A minimal sufficient statistic represents the
    furthest reduction in the observation without
    destroying information about $\theta$.
  • A minimal sufficient statistic does not
    necessarily exist for every problem. Even if it
    exists, it is usually very difficult to identify.

Terminology and Preliminaries
  • The Factorization Theorem
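  • Suppose that the parameter set $\theta \in \Lambda$ has a
    corresponding family of densities $p_\theta$. A
    statistic $T$ is sufficient for $\theta$ iff there are
    functions $g_\theta$ and $h$ such that
    $p_\theta(y) = g_\theta[T(y)]\, h(y)$
  • for all $y \in \Gamma$ and $\theta \in \Lambda$.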
  • Refer to the supplement for a proof.

Terminology and Preliminaries
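  • (Poor) Consider the
    hypothesis-testing problem $\Lambda = \{0, 1\}$ with densities
    $p_0$ and $p_1$. Noting that
    $p_\theta(y) = p_0(y) \left[ p_1(y) / p_0(y) \right]^{\theta}$,
  • the factorization $p_\theta(y) = g_\theta[T(y)]\, h(y)$
    is possible with $h(y) = p_0(y)$,
    $T(y) = p_1(y) / p_0(y) \equiv L(y)$, and $g_\theta(t) = t^{\theta}$.
  • Thus the likelihood ratio $L$ is a sufficient
    statistic for the binary hypothesis-testing
    problem.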
Example 4
Terminology and Preliminaries
  • The Rao-Blackwell Theorem
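  • Suppose that $\hat{g}(y)$ is an unbiased estimate of
    $g(\theta)$ and that $T$ is sufficient for $\theta$. Define
    $\tilde{g}[T(y)] = E_\theta\{\hat{g}(Y) \mid T(Y) = T(y)\}$.
  • Then $\tilde{g}[T(Y)]$ is also an unbiased estimate of
    $g(\theta)$. Furthermore,
    $\mathrm{Var}_\theta(\tilde{g}[T(Y)]) \le \mathrm{Var}_\theta(\hat{g}(Y))$,
  • with equality iff $P_\theta(\hat{g}(Y) = \tilde{g}[T(Y)]) = 1$.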
  • Refer to the supplement for a proof.
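
The variance reduction promised by the Rao-Blackwell theorem can be seen in a small simulation. The sketch below is an illustration that is not taken from the slides: it assumes n i.i.d. Bernoulli observations with success probability `theta`, starts from the crude unbiased estimate $y_1$, and conditions on the sufficient statistic $T(y) = \sum_k y_k$, which gives $E\{Y_1 \mid T = t\} = t/n$. All function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def rao_blackwell_demo(theta=0.3, n=10, trials=20000, seed=1):
    """Simulate the crude estimate y_1 and its Rao-Blackwellized version."""
    rng = random.Random(seed)
    crude, improved = [], []
    for _ in range(trials):
        y = [1 if rng.random() < theta else 0 for _ in range(n)]
        crude.append(y[0])           # unbiased but noisy: uses one sample
        improved.append(sum(y) / n)  # E{Y_1 | T = t} = t / n
    return crude, improved


if __name__ == "__main__":
    crude, improved = rao_blackwell_demo()
    print("means:    ", mean(crude), mean(improved))
    print("variances:", variance(crude), variance(improved))
```

Both estimates average to roughly `theta`, while the conditioned estimate has a much smaller variance, in line with the inequality in the theorem.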

Terminology and Preliminaries
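  • Definition: Completeness
  • The parameter family $\theta \in \Lambda$ is said to be complete
    if the condition $E_\theta\{f(Y)\} = 0$ for all $\theta \in \Lambda$ implies
    that $P_\theta(f(Y) = 0) = 1$ for all $\theta \in \Lambda$.
  • (Poor) Suppose that $\Gamma = \{0, 1, \ldots, n\}$, $\Lambda = [0, 1]$,
    and $p_\theta(y) = \binom{n}{y} \theta^{y} (1 - \theta)^{n - y}$.
  • For any function $f$ on $\Gamma$, we have
    $E_\theta\{f(Y)\} = \sum_{y=0}^{n} f(y) \binom{n}{y} \theta^{y} (1 - \theta)^{n - y} = (1 - \theta)^{n} \sum_{y=0}^{n} a_y x^{y}$,
    with $x = \theta / (1 - \theta)$ and $a_y = \binom{n}{y} f(y)$.
  • The condition $E_\theta\{f(Y)\} = 0$ for all $\theta \in \Lambda$ implies
    $\sum_{y=0}^{n} a_y x^{y} = 0$ for all $x > 0$.
  • However, an nth order polynomial has at most n
    zeros unless all of its coefficients are zero.
    Hence, $f(y) = 0$ for every $y \in \Gamma$, and the family is complete.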

Example 5
Terminology and Preliminaries
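  • Definition: Exponential Families
  • A class of distributions with parameter set $\theta \in \Lambda$
    is said to be an exponential family if there are
    real-valued functions $C, Q_1, \ldots, Q_m, T_1, \ldots, T_m$, and $h$
    such that
    $p_\theta(y) = C(\theta) \exp\left\{ \sum_{l=1}^{m} Q_l(\theta) T_l(y) \right\} h(y)$
  • $T(y) = [T_1(y), \ldots, T_m(y)]^T$ is a complete sufficient
    statistic (under mild conditions on $\Lambda$).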
Bayesian Parameter Estimation
  • For the random observation $Y \in \Gamma$, indexed by a
    parameter $\theta \in \Lambda \subset \mathbb{R}^m$, our goal is to find a function
    $\hat{\theta}: \Gamma \to \Lambda$ such that $\hat{\theta}(y)$ is the best guess
    of the true value of $\theta$ given $Y = y$.
  • Bayesian estimators are the estimators that
    minimize the Bayes risk.
  • The following estimators are commonly used in
    practice and can be distinguished by their cost
    functions.
Bayesian Parameter Estimation
  • Minimum-Mean-Squared-Error (MMSE)
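  • Euclidean cost function:
    $C[a, \theta] = \|a - \theta\|^2$
  • The posterior cost given $Y = y$ is given by
    $E\{\|\hat{\theta}(y) - \Theta\|^2 \mid Y = y\}$
  • Minimizing this posterior cost for each $y$ also minimizes the
    Bayes risk $r(\hat{\theta})$. Hence, on differentiating
    with respect to $\hat{\theta}(y)$, one can obtain the Bayes
    estimate $\hat{\theta}_{MMSE}(y) = E\{\Theta \mid Y = y\}$.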
Bayesian Parameter Estimation
  • Minimum-Mean-Absolute-Error (MMAE)
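  • Absolute error cost function:
    $C[a, \theta] = \sum_{i=1}^{m} |a_i - \theta_i|$
  • The posterior cost given $Y = y$ is given by
    $\sum_{i=1}^{m} E\{|\hat{\theta}_i(y) - \Theta_i| \mid Y = y\} = \sum_{i=1}^{m} \int_0^\infty P(|\hat{\theta}_i(y) - \Theta_i| > x \mid Y = y)\, dx$
  • Here we used the fact that if $P(X \ge 0) = 1$, then
    $E\{X\} = \int_0^\infty P(X > x)\, dx$.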
MMAE 1 of 3
Bayesian Parameter Estimation
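  • Further simplification is also possible:
    $E\{|\hat{\theta}_i(y) - \Theta_i| \mid Y = y\} = \int_{-\infty}^{\hat{\theta}_i(y)} P(\Theta_i < x \mid Y = y)\, dx + \int_{\hat{\theta}_i(y)}^{\infty} P(\Theta_i > x \mid Y = y)\, dx$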

MMAE 2 of 3
Bayesian Parameter Estimation
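  • Taking the derivative with respect to each
    $\hat{\theta}_i(y)$, one can see that
    $\frac{\partial}{\partial \hat{\theta}_i(y)} E\{|\hat{\theta}_i(y) - \Theta_i| \mid Y = y\} = P(\Theta_i < \hat{\theta}_i(y) \mid Y = y) - P(\Theta_i > \hat{\theta}_i(y) \mid Y = y)$
  • This derivative is a nondecreasing function of
    $\hat{\theta}_i(y)$
  • that approaches $-1$ as $\hat{\theta}_i(y) \to -\infty$ and
    $+1$ as $\hat{\theta}_i(y) \to +\infty$. Thus the posterior cost
    achieves its minimum where its derivative
    changes sign, that is, at a median of the
    conditional distribution of $\Theta_i$ given $Y = y$.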

MMAE 3 of 3
Bayesian Parameter Estimation
  • Maximum A Posteriori Probability (MAP)
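  • Uniform error cost function:
    $C[a, \theta] = 0$ if $\|a - \theta\| \le \Delta$, and $1$ otherwise
  • The posterior cost given $Y = y$ is given by
    $1 - P(\|\hat{\theta}(y) - \Theta\| \le \Delta \mid Y = y)$
  • Under some smoothness conditions (and for small $\Delta$), the estimator
    that minimizes this cost function is given by
    $\hat{\theta}_{MAP}(y) = \arg\max_{\theta} w(\theta \mid y)$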

Bayesian Parameter Estimation
  • Observations
  • MMSE Estimator
  • The MMSE estimate of $\theta$ given $Y = y$ is the
    conditional mean of $\theta$ given $Y = y$.
  • MMAE Estimator
  • The MMAE estimate of $\theta$ given $Y = y$ is the
    conditional median of $\theta$ given $Y = y$.
  • MAP Estimator
  • The MAP estimate of $\theta$ given $Y = y$ is the
    conditional mode of $\theta$ given $Y = y$.
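
The link between each cost function and its estimator can be checked by brute force. The sketch below is an illustration under stated assumptions, not the slides' method: it discretizes a skewed scalar posterior proportional to $\theta e^{-\theta}$ on a grid (chosen only because its mean, median, and mode differ) and, for each cost, searches the grid for the value that minimizes the posterior cost. The half-width `delta` of the uniform cost and all function names are hypothetical.

```python
import math


def grid_posterior(step=0.02, upper=15.0):
    """Discretize an illustrative posterior proportional to theta * exp(-theta)."""
    count = int(round(upper / step)) + 1
    thetas = [i * step for i in range(count)]
    weights = [t * math.exp(-t) for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def bayes_estimate(thetas, probs, cost):
    """Return the grid value a minimizing sum_j cost(a, theta_j) * probs[j]."""
    def posterior_cost(a):
        return sum(cost(a, t) * p for t, p in zip(thetas, probs))
    return min(thetas, key=posterior_cost)


def squared_cost(a, t):
    return (a - t) ** 2


def absolute_cost(a, t):
    return abs(a - t)


def uniform_cost(a, t, delta=0.05):
    return 0.0 if abs(a - t) <= delta else 1.0


if __name__ == "__main__":
    thetas, probs = grid_posterior()
    print("squared cost  ->", bayes_estimate(thetas, probs, squared_cost))
    print("absolute cost ->", bayes_estimate(thetas, probs, absolute_cost))
    print("uniform cost  ->", bayes_estimate(thetas, probs, uniform_cost))
```

For this posterior the three minimizers land near the mean (2), the median (about 1.68), and the mode (1), matching the observations listed above.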

Bayesian Parameter Estimation
Example 6
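  • (Poor) Given the following
    conditional probability density function
    $p_\theta(y) = \theta e^{-\theta y}$ for $y \ge 0$ (and $0$ otherwise),
  • hence $y$ has an exponential density with
    parameter $\theta$. Suppose $\theta$ is also an exponential random
    variable with density
    $w(\theta) = \alpha e^{-\alpha \theta}$ for $\theta \ge 0$ (and $0$ otherwise).
  • Then, the posterior distribution of $\theta$ given $Y = y$
    is given by
    $w(\theta \mid y) = (\alpha + y)^2\, \theta\, e^{-(\alpha + y)\theta}$
  • for $\theta \ge 0$ and $y \ge 0$, and $w(\theta \mid y) = 0$ otherwise.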

Bayesian Parameter Estimation
Example 7
  • (Continued.)
  • The MMSE estimate is the mean of this distribution
  • The MMAE estimate is the median of this distribution
  • The MAP estimate is the mode of this distribution
    (where it is maximum)
  • To decide which one to use, one must decide which
    of the three cost functions best suits the
    problem at hand.
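
The three estimates for this example can be computed explicitly. The sketch below assumes the standard form of the example, which may differ in notation from the original slides: likelihood $p_\theta(y) = \theta e^{-\theta y}$, prior $w(\theta) = \alpha e^{-\alpha\theta}$ (the symbol `alpha` is an assumption), and hence posterior $w(\theta \mid y) = (\alpha + y)^2 \theta e^{-(\alpha + y)\theta}$. Under that assumption the mean is $2/(\alpha + y)$, the mode is $1/(\alpha + y)$, and the median has no closed form, so it is found here by bisection on the posterior CDF. Function names are hypothetical.

```python
import math


def posterior_cdf(theta, y, alpha):
    """CDF of w(theta | y) = b^2 * theta * exp(-b * theta), with b = alpha + y."""
    b = alpha + y
    return 1.0 - (1.0 + b * theta) * math.exp(-b * theta)


def mmse_estimate(y, alpha):
    return 2.0 / (alpha + y)   # posterior mean


def map_estimate(y, alpha):
    return 1.0 / (alpha + y)   # posterior mode


def mmae_estimate(y, alpha, tol=1e-12):
    """Posterior median, found by bisection on the CDF."""
    lo, hi = 0.0, 10.0 / (alpha + y)   # CDF(hi) = 1 - 11 * exp(-10), above 1/2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid, y, alpha) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    y, alpha = 1.0, 1.0
    print(map_estimate(y, alpha), mmae_estimate(y, alpha), mmse_estimate(y, alpha))
```

For any $y$ and `alpha` the ordering is MAP < MMAE < MMSE, because the posterior is skewed to the right; which value to report depends on the cost function that fits the application.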

Nonrandom Parameter Estimation
  • Our goal is the same as in the Bayesian parameter
    estimation problem: find $\hat{\theta}$.
  • Assume that the parameter $\theta \in \Lambda$ is real
    valued. In the nonrandom parameter estimation
    problem, we do not know anything about the true
    value of $\theta$ other than the fact that it lies in $\Lambda$.
    Hence, the question we would like to answer is:
    given the observation $Y = y$, what is the
    best estimate of $\theta$?

Nonrandom Parameter Estimation
  • The only averaging of the cost that can be
    done is with respect to the distribution of $Y$
    given $\theta$, for a given cost function $C$.
  • A reasonable restriction to place on an estimate
    of $\theta$ is that its expected value is equal to the
    true parameter value:
    $E_\theta\{\hat{\theta}(Y)\} = \theta$
  • For its tractability, the squared Euclidean norm
    cost function will be used.

Nonrandom Parameter Estimation
  • When the squared-error cost is used, the risk
    function is the following:
    $R_\theta(\hat{\theta}) = E_\theta\{\|\hat{\theta}(Y) - \theta\|^2\}$
  • One cannot generally expect to minimize this
    risk function uniformly for all $\theta \in \Lambda$. This is
    easily seen for the squared-error cost since for
    any particular value of $\theta$, say $\theta_0$, the conditional
    mean-squared error can be made zero by choosing
    the estimate to be identically $\theta_0$ for all
    observations $y \in \Gamma$.
  • However, if $\theta$ is not close to $\theta_0$, such an
    estimate would perform poorly.

Nonrandom Parameter Estimation
  • With the unbiasedness restriction, the
    conditional mean-squared error becomes the
    variance of the estimate. Hence, these estimators
    are termed minimum-variance unbiased estimators
    (MVUEs).
  • The procedure for seeking MVUEs:
  • 1. Find a complete sufficient statistic $T$ for $\theta \in \Lambda$.
  • 2. Find any unbiased estimator $\hat{g}(y)$ of $g(\theta)$.
  • 3. Then,
    $\tilde{g}[T(y)] = E_\theta\{\hat{g}(Y) \mid T(Y) = T(y)\}$
    is an MVUE of $g(\theta)$.

Nonrandom Parameter Estimation
Example 8
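  • (Poor) Consider the model
    $Y_k = \mu s_k + N_k$, $k = 1, \ldots, n$,
  • where $N_1, \ldots, N_n$ are i.i.d. $N(0, \sigma^2)$ noise samples,
    and $s_k$ is a known signal for $k = 1, \ldots, n$. Our
    objective is to estimate $\mu$ and $\sigma^2$.
  • 1. The density of $Y$ is given by
    $p_\theta(y) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{k=1}^{n} (y_k - \mu s_k)^2 \right\} = C(\theta) \exp\{ \theta_1 T_1(y) + \theta_2 T_2(y) \}$
  • where $\theta = [\theta_1\ \theta_2]^T$ and
    $\theta_1 = \mu / \sigma^2$, $\theta_2 = -1 / (2\sigma^2)$,
    $T_1(y) = \sum_{k=1}^{n} s_k y_k$, $T_2(y) = \sum_{k=1}^{n} y_k^2$,
    $C(\theta) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{\mu^2}{2\sigma^2} \sum_{k=1}^{n} s_k^2 \right\}$.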
Nonrandom Parameter Estimation
Example 9
  • (Continued.) Note that $T = [T_1\ T_2]^T$ is a complete
    sufficient statistic for $\theta$.
  • 2. We wish to estimate $g_1(\theta) = \mu$ and $g_2(\theta) = \sigma^2$.
  • Assuming that $s_1 \ne 0$, the estimate $\hat{g}_1(y) = y_1 / s_1$ is
    an unbiased estimator of $g_1(\theta)$.
  • Moreover, note that
  • and that
  • Hence,
    is an unbiased estimate of $g_2(\theta)$.

Nonrandom Parameter Estimation
Example 10
  • (Continued.)
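  • 3. Since $T = [T_1\ T_2]^T$ is complete, the estimates
    $\tilde{g}_i[T(y)] = E_\theta\{\hat{g}_i(Y) \mid T(Y) = T(y)\}$, $i = 1, 2$,
  • are MVUEs of $g_1(\theta)$ and $g_2(\theta)$. Note that $\hat{g}_1(y)$ and $T_1(y)$ are
    both linear functions of $Y$ and are jointly
    Gaussian. Hence, MVUEs are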
Nonrandom Parameter Estimation
  • Maximum-Likelihood (ML) Estimation
  • For many problems arising in practice, it is
    usually not feasible to find MVUEs.
  • Another method for seeking good estimators is
    needed.
  • ML is one of the most commonly used methods in
    the signal processing literature.
  • Consider MAP estimation for $\theta \in \Lambda$:
    $\hat{\theta}_{MAP}(y) = \arg\max_{\theta \in \Lambda} p_\theta(y)\, w(\theta)$
  • In the absence of any prior information about the
    parameter $\theta$, we can assume that it is uniformly
    distributed ($w(\theta)$ becomes a uniform distribution)
    since this represents the worst case scenario.

Nonrandom Parameter Estimation
  • ML Estimation (Continued.)
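  • Hence, the MAP estimate for a given $y \in \Gamma$ is any
    value of $\theta$ that maximizes $p_\theta(y)$ over $\Lambda$.
  • $p_\theta(y)$, viewed as a function of $\theta$, is usually
    called the likelihood function.
  • Hence, the ML estimate is
    $\hat{\theta}_{ML}(y) = \arg\max_{\theta \in \Lambda} p_\theta(y)$
  • Maximizing $p_\theta(y)$ is the same as maximizing
    $\log p_\theta(y)$ (log-likelihood function). Therefore, a
    necessary condition for the maximum-likelihood
    estimate (for a differentiable likelihood with an
    interior maximum) is
    $\frac{\partial}{\partial \theta} \log p_\theta(y) \Big|_{\theta = \hat{\theta}_{ML}(y)} = 0$
  • The above condition is also known as the
    likelihood equation.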
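
The likelihood equation can be worked through on a small case. The sketch below is an illustration that assumes n i.i.d. observations with the exponential density $p_\theta(y) = \theta e^{-\theta y}$, so the log-likelihood is $n \log\theta - \theta \sum_k y_k$ and the likelihood equation $n/\theta - \sum_k y_k = 0$ gives $\hat{\theta}_{ML} = n / \sum_k y_k$. The data values and function names are hypothetical.

```python
import math


def log_likelihood(theta, ys):
    """Log-likelihood of i.i.d. exponential data with rate theta."""
    return len(ys) * math.log(theta) - theta * sum(ys)


def score(theta, ys):
    """Derivative of the log-likelihood with respect to theta."""
    return len(ys) / theta - sum(ys)


def ml_estimate(ys):
    """Root of the likelihood equation score(theta, ys) = 0."""
    return len(ys) / sum(ys)


if __name__ == "__main__":
    ys = [0.5, 1.0, 1.5, 2.0]
    theta_ml = ml_estimate(ys)
    print("ML estimate:", theta_ml)
    print("score at ML estimate:", score(theta_ml, ys))
    # The log-likelihood should be lower on either side of the root.
    print(log_likelihood(0.9 * theta_ml, ys) < log_likelihood(theta_ml, ys))
    print(log_likelihood(1.1 * theta_ml, ys) < log_likelihood(theta_ml, ys))
```

Because this log-likelihood is strictly concave in `theta`, the single root of the likelihood equation is the global maximum; in general the equation is only a necessary condition and its roots still need to be checked.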

Nonrandom Parameter Estimation
  • Cramer-Rao Bound
  • Let $\hat{\theta}(y)$ be some unbiased estimator of $\theta \in \Lambda$.
    Then the error covariance matrix is bounded by
    the Cramer-Rao bound (refer to the supplement).
  • If the Cramer-Rao bound can be satisfied with
    equality, only the maximum likelihood estimate
    achieves it. Hence, if an efficient estimate
    exists, it is the maximum likelihood estimate.
  • Refer to the attached
    paper "The Stochastic CRB for Array Processing:
    A Textbook Derivation" by Stoica, Larsson, and
    Gershman.
Example 11
