1
Quantifying Uncertainty in Inverse Problems
  • Workshop on Statistical Methods for Inverse Problems
  • Institute for Pure and Applied Mathematics, University of California, Los Angeles, 5-6 November 2003
  • P.B. Stark
  • Department of Statistics
  • University of California
  • Berkeley, CA 94720-3860
  • www.stat.berkeley.edu/stark

2
Abstract
  • Some qualitative and quantitative statistical
    measures of uncertainty apply to inverse
    problems. Decision theory provides a framework
    for thinking about inverse problems. The
    intrinsic uncertainty in an inverse problem
    usually is not equal to the uncertainty of a
    solution to the inverse problem constructed by
    any particular technique. The intrinsic
    uncertainty depends crucially on the prior
    constraints on the unknown (including prior
    probability distributions in the case of Bayesian
    analyses), on the forward operator, on the
    distribution of the observational errors, and on
    the kinds of properties of the unknown one wishes
    to estimate. Some aspects of uncertainty in
    inverse problems can be understood geometrically.

3
My Assumptions about You
  • All of you know much more math than I do.
  • I know a little more statistics than some of you
    do.
  • You haven't heard me talk about this before.
  • You are more interested in theory than numerical
    algorithms or specific applications.

4
Outline
  • Inverse Problems as Statistics
  • Ingredients; Models
  • Forward and Inverse Problems: applied perspective
  • Statistical point of view
  • Some connections
  • Notation; linear problems; illustration
  • Example: geomagnetism from satellite observations
  • Qualitative uncertainty: identifiability and uniqueness
  • Sketch of identifiability and extremal modeling
  • Backus-Gilbert theory and extensions

5
Outline, cont'd.
  • Quantitative uncertainty: Decision Theory
  • Decision rules and estimators
  • Comparing decision rules: loss and risk
  • Example: shrinkage estimators and MSE risk
  • Strategies; Bayes/minimax duality
  • Mean distance error and bias
  • Illustration: bounded normal mean
  • Illustration: regularization
  • Illustration: minimax estimation of linear functionals
  • Example: Gauss coefficients of the magnetic field
  • Distinguishing models: metrics and consistency

6
Inverse Problems as Statistics
  • Measurable space X of possible data.
  • Set Θ of descriptions of the world: models.
  • Family P = {Pθ : θ ∈ Θ} of probability distributions on X, indexed by models θ.
  • Forward operator θ ↦ Pθ maps model θ into a probability measure on X.
  • X-valued data X are a sample from Pθ.
  • Pθ is everything: randomness in the "truth," measurement error, systematic error, censoring, etc.

7
Models
  • Θ usually has special structure.
  • Θ could be a convex subset of a separable Banach space T. (geomag, seismo, grav, MT, …)
  • Physical significance of Θ generally gives θ ↦ Pθ reasonable analytic properties, e.g., continuity.

8
Forward Problems in Physical Science
  • Often thought of as a composition of steps:
  • transform idealized model θ into perfect, noise-free, infinite-dimensional data (approximate physics)
  • keep a finite number of the perfect data, because we can only measure, record, and compute with finite lists
  • possibly corrupt the list with measurement error.
  • Equivalent to a single-step procedure, with the corruption on a par with the physics and the mapping incorporating the censoring.

9
Inverse Problems
  • Observe data X drawn from Pθ for some unknown θ ∈ Θ. (Assume Θ contains at least two points; otherwise, the data are superfluous.)
  • Use X and the knowledge that θ ∈ Θ to learn about θ; for example, to estimate a parameter g(θ) (the value g(θ) at θ of a continuous G-valued function g defined on Θ).

10
Example: Geomagnetism
11
Geomagnetic model parametrization
12
Geomagnetic inverse problem
13
Inverse Problems in Physical Science
  • Inverse problems in science are often solved using applied-math methods for ill-posed problems (e.g., Tikhonov regularization, analytic inversions).
  • Those methods are designed to answer different questions; they can behave poorly with data (e.g., bad bias and variance).
  • Inference ≠ construction: the statistical viewpoint may be more appropriate for interpreting real data with stochastic errors.

14
Elements of the Statistical View
  • Distinguish between characteristics of the problem and characteristics of the methods used to draw inferences.
  • One fundamental qualitative property of a parameter:
  • g is identifiable if for all η, ζ ∈ Θ,
  • g(η) ≠ g(ζ) ⇒ Pη ≠ Pζ.
  • In most inverse problems, g(θ) = θ is not identifiable, and few linear functionals of θ are identifiable.

15
Deterministic and Statistical Connections
  • Identifiability (distinct parameter values yield distinct probability distributions for the observables) is similar to uniqueness (the forward operator maps at most one model into the observed data).
  • Consistency (the parameter can be estimated with arbitrary accuracy as the number of data grows) is related to stability of a recovery algorithm (small changes in the data produce small changes in the recovered model).
  • ∃ quantitative connections too.

16
More Notation
  • Let T be a separable Banach space, T* its normed dual.
  • Write the pairing between T* and T as
  • ⟨·, ·⟩: T* × T → ℝ.

17
Linear Forward Problems
  • A forward problem is linear if:
  • Θ is a subset of a separable Banach space T
  • X = ℝⁿ, X = (Xj)j=1..n
  • for some fixed sequence (κj)j=1..n of elements of T*,
  • Xj = ⟨κj, θ⟩ + εj, θ ∈ Θ,
  • where ε = (εj)j=1..n is a vector of stochastic errors whose distribution does not depend on θ.
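To make the ingredients concrete, here is a minimal MATLAB sketch (mine, not from the talk; the Gaussian representers and all names are illustrative) of a discretized linear forward problem:

% Discretized linear forward problem X = K*theta + epsilon (illustrative).
n = 10; p = 200; sigma = 0.1;              % n data, p-point model discretization
t = linspace(0, 1, p);
K = zeros(n, p);
for j = 1:n
    K(j,:) = exp(-((t - j/n).^2)/0.01)/p;  % hypothetical smooth representers kappa_j
end
theta = sin(2*pi*t)';                      % hypothetical "true" model
X = K*theta + sigma*randn(n, 1);           % data: pairings <kappa_j, theta> plus noise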

18
Linear Forward Problems, cont'd.
  • Linear functionals {κj} are the "representers."
  • Distribution Pθ is the probability distribution of X.
  • Typically, dim(T) = ∞; at least, n < dim(T), so estimating θ is an underdetermined problem.
  • Define
  • K: Θ → ℝⁿ
  • θ ↦ (⟨κj, θ⟩)j=1..n.
  • Abbreviate the forward problem by X = Kθ + ε, θ ∈ Θ.

19
Linear Inverse Problems
  • Use X = Kθ + ε, and the constraint θ ∈ Θ, to estimate or draw inferences about g(θ).
  • The probability distribution of X depends on θ only through Kθ, so if there are two points
  • θ1, θ2 ∈ Θ such that Kθ1 = Kθ2 but
  • g(θ1) ≠ g(θ2),
  • then g(θ) is not identifiable.

20
Example: Sampling w/ systematic and random error
  • Observe
  • Xj = f(tj) + rj + εj, j = 1, 2, …, n, with
  • f ∈ C, a set of smooth functions on [0, 1]
  • tj ∈ [0, 1]
  • |rj| ≤ 1, j = 1, 2, …, n
  • εj iid N(0, 1).
  • Take Θ = C × [-1, 1]ⁿ, X = ℝⁿ, and θ = (f, r1, …, rn).
  • Then Pθ has density
  • (2π)^(-n/2) exp{-½ Σj=1..n (xj - f(tj) - rj)²}.
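A short MATLAB sketch (mine; f and the sample points are illustrative) that simulates this model once and evaluates the log of the density above at the observed data:

% Simulate X_j = f(t_j) + r_j + eps_j and evaluate the log-density at x.
n = 5; tj = ((1:n)' - 0.5)/n;           % sample points in [0, 1]
f = @(t) sin(2*pi*t);                   % hypothetical smooth f in C
r = 2*rand(n,1) - 1;                    % systematic errors with |r_j| <= 1
x = f(tj) + r + randn(n,1);             % one draw of the data
logdens = -n/2*log(2*pi) - 0.5*sum((x - f(tj) - r).^2)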

21
Sketch: Identifiability
[Figure annotations: Pζ = Pη, η ≠ ζ, so θ is not identifiable; g cannot be estimated with bounded bias; Pζ = Pη but g(η) ≠ g(ζ), so g is not identifiable.]
22
Backus-Gilbert Theory
Let Θ = T be a Hilbert space. Let g ∈ T* = T be
a linear parameter. Let {κj}j=1..n ⊂ T. Then g(θ)
is identifiable iff g = Λ·K for some 1×n
matrix Λ. If also Eε = 0, then Λ·X is
unbiased for g. If also ε has covariance matrix Σ =
Eεεᵀ, then the MSE of Λ·X is Λ·Σ·Λᵀ.
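A hedged numerical sketch of the theorem's content for a discretized problem (all matrices illustrative): solve g = Λ·K for Λ; then Λ·X is unbiased for g(θ) and its MSE is Λ·Σ·Λᵀ.

% Discretized Backus-Gilbert: solve g = Lambda*K, then the MSE of Lambda*X.
p = 50; n = 8;
K = randn(n, p);                 % rows play the role of the representers
g = randn(1, n) * K;             % a functional of the form Lambda*K, hence identifiable
Lambda = g / K;                  % least-squares solution of Lambda*K = g
Sigma = 0.01 * eye(n);           % error covariance E[epsilon*epsilon']
mse = Lambda * Sigma * Lambda'   % MSE of the unbiased estimator Lambda*X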
23
Sketch: Backus-Gilbert
24
Backus-Gilbert: Necessary conditions
  • Let g be an identifiable real-valued parameter. Suppose ∃ θ0 ∈ Θ, a symmetric convex set T̃ ⊂ T, c ∈ ℝ, and ĝ: T̃ → ℝ such that:
  • θ0 + T̃ ⊂ Θ
  • for t ∈ T̃, g(θ0 + t) = c + ĝ(t), and ĝ(-t) = -ĝ(t)
  • ĝ(a1 t1 + a2 t2) = a1 ĝ(t1) + a2 ĝ(t2) for t1, t2 ∈ T̃, a1, a2 ≥ 0, a1 + a2 = 1, and
  • sup{ĝ(t) : t ∈ T̃} < ∞.
  • Then ∃ a 1×n matrix Λ s.t. the restriction of ĝ to T̃ is the restriction of Λ·K to T̃.

25
Backus-Gilbert: Sufficient Conditions
  • Suppose g = (gi)i=1..m is an ℝᵐ-valued parameter that can be written as the restriction to Θ of Λ·K for some m×n matrix Λ.
  • Then:
  • g is identifiable.
  • If Eε = 0, Λ·X is an unbiased estimator of g.
  • If, in addition, ε has covariance matrix Σ = Eεεᵀ, the covariance matrix of Λ·X is Λ·Σ·Λᵀ, whatever be Pθ.

26
Decision Rules
  • A (randomized) decision rule
  • δ: X → M1(A)
  • x ↦ δx(·)
  • is a measurable mapping from the space X of possible data to the collection M1(A) of probability distributions on a separable metric space A of actions.
  • A non-randomized decision rule is a randomized decision rule that, to each x ∈ X, assigns a unit point mass at some value
  • a = a(x) ∈ A.

27
Why randomized rules?
  • In some problems, randomized rules have better behavior.
  • Allowing randomized rules can make the set of decisions convex (by allowing mixtures of different decisions), which makes the math easier.
  • If the risk is convex, the Rao-Blackwell theorem says that the optimal decision is not randomized. (More on this later.)

28
Example: randomization natural
  • Coin has chance 1/3 of landing with one side showing, chance 2/3 of the other showing. Don't know which side is which.
  • Want to decide whether P(heads) = 1/3 or 2/3.
  • Toss coin 10 times. X = #heads.
  • Toss fair coin once. U = #heads.

Use the data to pick the more likely scenario, but if the data don't help, decide by tossing a fair coin.
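A MATLAB sketch of one such rule (my reading of the slide: majority vote, with the tie at X = 5 broken by the auxiliary toss U):

% Decide whether P(heads) = 1/3 or 2/3 from 10 tosses; randomize on a tie.
p = 2/3;                          % nature's secret choice
X = sum(rand(10,1) < p);          % number of heads in 10 tosses
U = (rand < 0.5);                 % one toss of a fair coin
if X > 5
    guess = 2/3;
elseif X < 5
    guess = 1/3;
else
    guess = 1/3 + U/3;            % data don't help: let U decide
end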
29
Estimators
  • An estimator of a parameter g(θ) is a decision rule δ for which the space A of possible actions is the space G of possible parameter values.
  • ĝ = ĝ(X) is common notation for an estimator of g(θ).
  • Usually write a non-randomized estimator as a G-valued function of x instead of an M1(G)-valued function.

30
Comparing Decision Rules
  • Infinitely many decision rules and estimators.
  • Which one to use?
  • The best one!
  • But what does "best" mean?

31
Loss and Risk
  • 2-player game: Nature v. Statistician.
  • Nature picks θ from Θ. θ is secret, but the statistician knows Θ.
  • Statistician picks δ from a set D of rules. δ is secret.
  • Generate data X from Pθ; apply δ.
  • Statistician pays loss L(θ, δ(X)). L should be dictated by the scientific context, but …
  • Risk is expected loss: r(θ, δ) = Eθ L(θ, δ(X)).
  • Good rule δ has small risk, but what does "small" mean?

32
Strategy
  • Rare that one δ has the smallest risk ∀θ ∈ Θ.
  • δ is admissible if not dominated (no rule does at least as well for every θ and better for at least one θ).
  • Minimax decision minimizes ρΘ(δ) = supθ∈Θ r(θ, δ) over δ ∈ D.
  • Minimax risk is ρΘ = infδ∈D ρΘ(δ).
  • Bayes decision minimizes
  • rπ(δ) = ∫Θ r(θ, δ) π(dθ) over δ ∈ D
  • for a given prior probability distribution π on Θ.
  • Bayes risk is ρπ = infδ∈D rπ(δ).

33
Minimax is Bayes for least favorable prior
Pretty generally, for convex Θ and D and concave-convexlike risk r:
ρΘ = infδ∈D supθ∈Θ r(θ, δ) = supπ infδ∈D rπ(δ) = supπ ρπ.
  • If minimax risk >> Bayes risk, the prior π controls the apparent uncertainty of the Bayes estimate.

34
Common Risk: Mean Distance Error (MDE)
  • Let dG denote the metric on G, and let ĝ be an estimator of g.
  • MDE at θ of ĝ is
  • MDEθ(ĝ, g) = Eθ dG(ĝ(X), g(θ)).
  • If the metric derives from a norm, MDE is mean norm error (MNE).
  • If the norm is Hilbertian, the mean of the squared norm error is the mean squared error (MSE).

35
Shrinkage
  • Suppose X ~ N(θ, I) with dim(θ) = d ≥ 3.
  • X is not admissible for θ for squared-error loss (Stein, 1956).
  • Dominated by X(1 - a/(b + ‖X‖²)) for small a and big b.
  • James-Stein: better is δJS(X) = X(1 - a/‖X‖²), 0 < a ≤ 2(d-2).
  • Even better to take the positive part of the shrinkage factor:
  • δJS+(X) = X(1 - a/‖X‖²)+, 0 < a ≤ 2(d-2).
  • δJS isn't minimax for MSE, but close; see the simulation sketch below.
  • Implications for Backus-Gilbert estimates of d ≥ 3 linear functionals.
  • ∃ extensions to other distributions; see Evans & Stark (1996).
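The simulation sketch (mine; the true θ is illustrative, and the element-wise products assume MATLAB R2016b+ implicit expansion):

% MSE of X vs. positive-part James-Stein by Monte Carlo, d = 10.
d = 10; a = 2*(d-2); nsim = 10000;
theta = ones(d, 1);                        % hypothetical true mean
X = theta + randn(d, nsim);                % nsim independent data vectors
shrink = max(0, 1 - a./sum(X.^2, 1));      % positive-part shrinkage factor
JS = X .* shrink;                          % requires implicit expansion (R2016b+)
mseX = mean(sum((X - theta).^2, 1))        % approximately d
mseJS = mean(sum((JS - theta).^2, 1))      % noticeably smaller than mseX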

36
Bias
  • When G is a Banach space, can define the bias at θ of ĝ:
  • biasθ(ĝ, g) = Eθ ĝ - g(θ)
  • (when the expectation is well defined).
  • If biasθ(ĝ, g) = 0, say ĝ is unbiased at θ (for g).
  • If ĝ is unbiased at θ for g for every θ ∈ Θ, say ĝ is unbiased for g. If such a ĝ exists, say g is unbiasedly estimable.
  • If g is unbiasedly estimable, then g is identifiable.

37
Example: Bounded Normal Mean (BNM)
  • Observe X ~ N(θ, 1). Know a priori that θ ∈ [-τ, τ].
  • Want to estimate g(θ) = θ.
  • φ(·): standard normal density. Φ(·): standard normal cumulative distribution function.
  • Suppose we choose to use squared-error loss:
  • L(θ, d) = (θ - d)²
  • r(θ, δ) = Eθ L(θ, δ(X)) = Eθ (θ - δ(X))²
  • ρΘ(δ) = supθ∈Θ r(θ, δ) = supθ∈Θ Eθ (θ - δ(X))²
  • ρΘ = infδ∈D supθ∈Θ Eθ (θ - δ(X))²

38
Risk of X for bounded normal mean
  • Consider the simple (maximum likelihood) estimator
  • δ(X) = X.
  • EX = θ, so X is unbiased for θ, and θ is unbiasedly estimable.
  • r(θ, X) = Eθ (θ - X)² = Var(X) = 1.
  • Consider a uniform Bayesian prior to capture the constraint θ ∈ [-τ, τ]:
  • π = U[-τ, τ], the uniform distribution on the interval [-τ, τ].
  • rπ(X) = ∫[-τ,τ] r(θ, X) π(dθ) = ∫[-τ,τ] 1 · (2τ)⁻¹ dθ = 1.
  • In this example, the frequentist risk of X equals the Bayes risk of X for the uniform prior π (but X is not the Bayes estimator).

39
Truncation is better (but not best)
  • Easy to find an estimator better than X from both the frequentist and Bayes perspectives.
  • Truncation estimate: δT(X) = -τ if X < -τ; X if -τ ≤ X ≤ τ; τ if X > τ.

δT is biased, but it has smaller MSE than X, whatever be θ ∈ Θ. (δT is the constrained maximum likelihood estimate.)
40
Risk of δT
[Figure: sampling density f(x - θ) of X, with x, -τ, 0, θ, and τ marked; δT moves the mass Pθ(X < -τ) to -τ and the mass Pθ(X > τ) to τ.]
41
Minimax MSE Estimate of BNM
  • The truncation estimate is better than X, but it is neither minimax nor Bayes. (A Monte Carlo check of its maximum risk follows.)
  • Clear that ρΘ ≤ min(1, τ²): MSE(X) = 1, and ρΘ(0) = τ².
  • The minimax MSE estimator is a nonlinear shrinkage estimator.
  • Minimax MSE risk is τ²/(1+τ²).
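A quick Monte Carlo sketch (mine) of the maximum risk of the truncation estimate over θ ∈ [-τ, τ], for comparison with the benchmarks above:

% Max MSE of delta_T over a grid of theta in [-tau, tau], tau = 3.
tau = 3; nsim = 20000; maxrisk = 0;
for theta = -tau:0.1:tau
    X = theta + randn(nsim, 1);
    dT = min(max(X, -tau), tau);                  % truncation estimate
    maxrisk = max(maxrisk, mean((dT - theta).^2));
end
maxrisk                                            % < 1, the max risk of X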

42
Bayes estimation of BNM
  • Posterior density of θ given x is
  • π(θ | x) = φ(x - θ) · 1{-τ ≤ θ ≤ τ} / (Φ(τ - x) - Φ(-τ - x)).

43
Posterior Mean
  • The mean of the posterior density minimizes the Bayes risk when the loss is squared error:
  • E(θ | X = x) = x - [φ(τ - x) - φ(-τ - x)] / [Φ(τ - x) - Φ(-τ - x)].
44
Bayes estimator is also nonlinear shrinkage
MATLAB for the posterior mean (presenter's note):
function f = bayesUnif(x, tau)
% Posterior mean of theta given x under the U[-tau, tau] prior.
f = x - (normpdf(tau - x) - normpdf(-tau - x)) ./ ...
    (normcdf(tau - x) - normcdf(-tau - x));
return
[Figure: X, the truncation estimate δT, and the Bayes estimator δπ for τ = 3, plotted over x ∈ [-6, 6]; δπ is a smooth, nonlinear shrinkage of X toward 0.]
For τ = 3, Bayes risk ρπ ≈ 0.7 (by simulation). Minimax risk ρΘ ≈ 0.75.
45
Bayes/Minimax Risks
MATLAB to approximate the Bayes risk of the posterior mean by simulation (presenter's note):
nsim = 20000; risk = 0; tau = 5;
for theta = -tau:.01:tau
    pred = bayesUnif(theta + randn(nsim,1), tau);
    risk = risk + (pred - theta)'*(pred - theta)/nsim;
end
risk/length(-tau:.01:tau)   % average over the grid: approx Bayes risk for U[-tau, tau]

Difference between knowing θ ∈ [-τ, τ] and θ ~ U[-τ, τ].
46
Confidence Interval for BNM
  • Might be interested in a confidence set for θ instead of a point estimate. A 1-α confidence set I satisfies
  • Pθ(I(X) ∋ θ) ≥ 1-α, ∀θ ∈ Θ.
  • Actions are now sets, not points. Decision rules are probability measures on sets.
  • Sensible loss? Lebesgue measure of the confidence set. But consider the application!
  • Risk is the expected measure.

47
Some Confidence Procedures
  • Naïve: [X - zα/2, X + zα/2]
  • Truncated: [X - zα/2, X + zα/2] ∩ [-τ, τ]
  • Affine fixed-length: [aX + b - c/2, aX + b + c/2]
  • Nonlinear fixed-length: [f(X) - c/2, f(X) + c/2], f measurable
  • Variable-length: [l(X), u(X)], l, u measurable
  • Likelihood ratio test (LRT): include all η ∈ [-τ, τ] s.t.
  • f(X - η)/f(X) ≥ cη.
  • Choose cη to get 1-α coverage.
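A simulation sketch (mine) of the truncated procedure: since θ ∈ [-τ, τ], intersecting with [-τ, τ] never cuts away θ, so coverage matches the naïve interval while the expected length is smaller. (norminv is from the Statistics Toolbox, which the presenter's snippets also use.)

% Coverage and expected length of the truncated interval, tau = 3.
alpha = 0.05; z = norminv(1 - alpha/2);
tau = 3; theta = 2.5; nsim = 20000;
X = theta + randn(nsim, 1);
lo = max(X - z, -tau); hi = min(X + z, tau);  % naive interval cut to [-tau, tau]
coverage = mean(lo <= theta & theta <= hi)    % approx 0.95, same as naive
explen = mean(max(hi - lo, 0))                % < 3.92, the naive length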

48
Minimax Expected Length
  • Seek the procedure with minimum maximum expected length for θ ∈ [-τ, τ].
  • For τ ≤ 2zα, minimax is "Truncated Pratt": simple form. (Evans et al., 2003; Schafer & Stark, 2003.)
  • Table for α = 0.05 [table not reproduced].
  • Naïve interval has length 3.92.

49
Regularization
  • From the statistical perspective, regularization tries to exploit a bias/variance tradeoff to reduce MNE.
  • There are situations in which regularization is optimal; this depends on the prior information, the forward operator, the loss function, and the regularization functional. See, e.g., Donoho (1995), O'Sullivan (1986).
  • Generally need some prior information about θ to know whether a given amount of smoothing increases bias² by more than it decreases variance. (But cf. shrinkage.) See the sketch after this list.
  • Can think of regularization in dual ways: minimizing a measure of size subject to fitting the data adequately, or minimizing a measure of misfit subject to keeping the model small. Complementary interpretations of the Lagrange multiplier.
  • Generally, to get consistency, need the smoothing to decrease at the right rate as the number of data increases.
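A minimal Tikhonov sketch (mine; the operator and truth are illustrative) of the tradeoff: increasing the regularization parameter lowers the variance and raises the squared bias.

% Bias^2 / variance tradeoff for Tikhonov regularization on a toy problem.
n = 50; sigma = 0.5; lambda = 10;
K = gallery('moler', n); theta = sin((1:n)'/5);  % illustrative K and truth
R = (K'*K + lambda*eye(n)) \ K';                 % regularized inverse
bias2 = norm(R*K*theta - theta)^2                % grows as lambda grows
variance = sigma^2 * trace(R*R')                 % shrinks as lambda grows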

50
Sketch: Regularization
51
Consistency of "Occam's Inversion"
  • Common approach: minimize the norm (or another regularization functional) subject to mean data misfit ≤ 1.
  • Sometimes called "Occam's Inversion" (Constable et al., 1987): the simplest hypothesis consistent with the data.
  • In many circumstances, this estimator is inconsistent: as the number of data grows, there is a greater and greater chance that the estimator is 0. The allowable misfit grows faster than the norm of the noise-free data image.
  • In common situations, consistency of the general approach requires data redundancy and averaging.

52
Singular Value Decomposition, Linear Problems
  • Assume Θ ⊂ T, a separable Hilbert space; {εj}j=1..n iid N(0, σ²); {κj}j=1..n linearly independent.
  • K is compact: infinite-dimensional null space. Let K*: ℝⁿ → T be the adjoint operator to K.
  • ∃ n triples (νj, ξj, λj), with νj ∈ T, ξj ∈ X, and λj ∈ ℝ, such that
  • K νj = λj ξj,
  • K* ξj = λj νj.
  • {νj}j=1..n can be chosen to be orthonormal in T; {ξj}j=1..n can be chosen to be orthonormal in X.
  • λj > 0, ∀j. Order s.t. λ1 ≥ λ2 ≥ … > 0.
  • {(νj, ξj, λj)}j=1..n are a singular value decomposition of K.

53
Singular Value Weighting
  • Can write the minimum-norm model that fits the data exactly as
  • δMN(X) = Σj=1..n λj⁻¹ (ξj · X) νj.
  • Write θ = θ∥ + θ⊥ (components in span{νj} and in its orthocomplement).
  • Biasθ(δMN) = Eθ δMN(X) - θ = -θ⊥.
  • Varθ δMN = Eθ ‖Σj=1..n λj⁻¹ (ξj · ε) νj‖² = σ² Σj=1..n λj⁻².
  • Components associated with small λj make the variance big: those noise components are multiplied by λj⁻¹.
  • Singular value truncation: reconstruct θ using only the νj with λj > t:
  • δSVT(X) = Σj=1..m λj⁻¹ (ξj · X) νj, where m = max{k : λk > t}.
  • Mollifies the noise magnification, but increases bias.

54
Bias of SVT
  • Bias of SVT is bigger than that of the minimum-norm estimate by the projection of θ onto span{νj}j=m+1..n.
  • Variance of SVT is smaller by σ² Σj=m+1..n λj⁻².
  • With adequate prior information about θ (to control the bias), can exploit the bias-variance tradeoff to reduce MSE.
  • SVT ∈ a family of estimators that re-weight the singular functions in the reconstruction:
  • δw(X) = Σj=1..n w(λj) (ξj · X) νj.
  • Regularization using a norm penalty, with regularization parameter λ, corresponds to
  • w(u) = u/(u² + λ).
  • These tend to have norm smaller than the maximum-likelihood estimate: they can be viewed as shrinkage. A numerical sketch follows.
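The numerical sketch (mine; the ill-conditioned operator, truth, and truncation level are illustrative): minimum-norm, truncated, and norm-penalty reconstructions from one SVD.

% Reconstructions delta_w for three weightings of the singular values.
n = 20; sigma = 0.01;
[U, S, V] = svd(randn(n) * diag(1./(1:n).^2));  % illustrative ill-conditioned K
lam = diag(S);                                  % singular values, descending
theta = V * (1./(1:n)');                        % hypothetical truth
X = U*diag(lam)*V'*theta + sigma*randn(n, 1);   % data
c = U'*X;                                       % coefficients xi_j . X
dMN = V * (c./lam);                             % minimum norm: w(u) = 1/u
m = sum(lam > 0.05);                            % SVT: keep lambda_j > t = 0.05
dSVT = V(:,1:m) * (c(1:m)./lam(1:m));
gam = 1e-4;                                     % norm-penalty parameter
dTik = V * (c.*lam./(lam.^2 + gam));            % w(u) = u/(u^2 + gam)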

55
Examples of Singular Functions
  • Linear, time-invariant filters: complex sinusoids
  • Circular convolution: sinusoids
  • Band- and time-limiting: prolate spheroidal wavefunctions
  • Main-field geomagnetism: spherical harmonics, times radial polynomials in r⁻¹
  • Abel transform: Jacobi polynomials and Chebyshev polynomials
  • See Donoho (1995) for more examples and references.

56
Minimax Estimation of Linear Parameters
  • Observe X = Kθ + ε ∈ ℝⁿ, with
  • θ ∈ Θ ⊂ T, T a separable Hilbert space
  • Θ convex
  • {εi}i=1..n iid N(0, σ²).
  • Seek to learn about g(θ): Θ → ℝ, linear, bounded on Θ.
  • For a variety of risks (MSE, MAD, length of a fixed-length confidence interval), the minimax risk is controlled by the modulus of continuity of g, calibrated to the noise level.
  • The full problem is no harder than the hardest 1-dimensional subproblem; it reduces to the BNM (Donoho, 1994).

57
Example: Geomagnetism
  • Θ = {θ ∈ ℓ²(w) : Σl=1..∞ wl Σm=-l..l θlm² ≤ q}.
  • Estimate g(θ) = θlm.
  • Symmetry of Θ and linearity of K and g let us characterize the modulus.

Problem: maximize a linear functional of a vector in the intersection of two ellipsoids. In the main-field geomagnetism problem, as the data sampling becomes more uniform over the spherical idealization of a satellite orbit, both the norm (prior information) and the operator K are diagonalized by spherical harmonics.
58
Modulus of Continuity
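The slide's figure is not reproduced. For reference (my addition, following Donoho, 1994), the modulus in question is
ω(ε) = sup{ g(θ1) - g(θ2) : ‖K(θ1 - θ2)‖ ≤ ε, θ1, θ2 ∈ Θ },
the largest change in g between models whose noise-free data differ by at most ε; the minimax risks above scale with ω at a level set by the noise σ.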
59
Distinguishing two models
  • Data tell the difference between two models ζ and η if the L1 distance between Pζ and Pη is large.
60
L1 and Hellinger distances
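The slide's formulas are not reproduced. For reference (my addition), with p and q the densities of P and Q with respect to a dominating measure μ:
‖P - Q‖1 = ∫ |p - q| dμ;  H(P, Q) = ( ∫ (√p - √q)² dμ )^(1/2).
The two measure distinguishability comparably: H² ≤ ‖P - Q‖1 ≤ 2H.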
61
Consistency in Linear Inverse Problems
  • Xi = ⟨κi, θ⟩ + εi, i = 1, 2, 3, …; Θ a subset of a separable Banach space; {κi} ⊂ T*, linear, bounded on Θ; {εi} iid μ.
  • θ is consistently estimable w.r.t. the weak topology iff ∃ {Tk}, Tk a Borel function of X1, …, Xk, s.t. ∀θ ∈ Θ, ∀ε > 0, ∀κ ∈ T*,
  • limk Pθ{|⟨κ, Tk⟩ - ⟨κ, θ⟩| > ε} = 0.

62
Importance of the Error Distribution
  • μ a probability measure on ℝ; μa(B) = μ(B - a), a ∈ ℝ.
  • Pseudo-metric on T: [formula not reproduced]
  • If its restriction to Θ converges to a metric compatible with the weak topology, can estimate θ consistently in the weak topology.
  • For a given sequence of functionals {κi}, the rougher μ is, the easier consistent estimation becomes.

63
Summary
  • "Solving" an inverse problem means different things to different audiences.
  • The statistical viewpoint is a useful abstraction. Physics is in the mapping θ ↦ Pθ; prior information is in the constraint θ ∈ Θ.
  • There is more information in the assertion θ ~ π, with π supported on Θ, than there is in the constraint θ ∈ Θ.
  • Separating the model θ from the parameters g(θ) of interest is useful: Sabatier's "well-posed questions." Many interesting questions can be answered without knowing the entire model.
  • Thinking about measures of performance is useful.
  • Difficulty of the problem ≠ performance of a specific method.

64
References & Acknowledgements
  • Constable, S.C., R.L. Parker & C.G. Constable, 1987. Occam's inversion: A practical algorithm for generating smooth models from electromagnetic sounding data, Geophysics, 52, 289-300.
  • Donoho, D.L., 1994. Statistical Estimation and Optimal Recovery, Ann. Stat., 22, 238-270.
  • Donoho, D.L., 1995. Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition, Appl. Comput. Harm. Anal., 2, 101-126.
  • Evans, S.N. & P.B. Stark, 2002. Inverse Problems as Statistics, Inverse Problems, 18, R1-R43.
  • Evans, S.N., B.B. Hansen & P.B. Stark, 2003. Minimax expected measure confidence sets for restricted location parameters, Tech. Rept. 617, Dept. of Statistics, UC Berkeley.
  • Le Cam, L., 1986. Asymptotic Methods in Statistical Decision Theory, Springer-Verlag, NY, 742pp.
  • O'Sullivan, F., 1986. A statistical perspective on ill-posed inverse problems, Statistical Science, 1, 502-518.
  • Schafer, C.M. & P.B. Stark, 2003. Using what we know: inference with physical constraints, Tech. Rept., Dept. of Statistics, UC Berkeley.
  • Stark, P.B., 1992. Inference in infinite-dimensional inverse problems: Discretization and duality, J. Geophys. Res., 97, 14,055-14,082.
  • Stark, P.B., 1992. Minimax confidence intervals in geomagnetism, Geophys. J. Intl., 108, 329-338.

Created using TexPoint by G. Necula, http://raw.cs.berkeley.edu/texpoint