1
CHAPTER 13 MODELING CONSIDERATIONS AND
STATISTICAL INFORMATION
Slides for Introduction to Stochastic Search and
Optimization (ISSO) by J. C. Spall
  • "All models are wrong; some are useful." (George E. P. Box)
  • Organization of chapter in ISSO:
  • Bias-variance tradeoff
  • Model selection: cross-validation
  • Fisher information matrix: definition, examples, and efficient computation

2
Model Definition and MSE
  • Assume the model z = h(θ, x) + v, where z is the output, h(·) is some function, x is the input, v is noise, and θ is the vector of model parameters
  • h(·) may represent a simulation model
  • h(·) may represent a metamodel (response surface) of an existing simulation
  • A fundamental goal is to take n data points and estimate θ, forming the estimate θ̂
  • A common measure of effectiveness for the estimate is the mean of the squared model error (MSE) at a fixed x

3
Bias-Variance Decomposition
  • The MSE of the model at a fixed x can be decomposed as
  • E{[h(θ̂, x) − E(z|x)]² | x}
      = E{[h(θ̂, x) − E(h(θ̂, x))]² | x} + [E(h(θ̂, x)) − E(z|x)]²
      = variance at x + (bias at x)²
  • where the expectations are computed w.r.t. θ̂
  • Above implies:
  • Model too simple ⇒ high bias / low variance
  • Model too complex ⇒ low bias / high variance (illustrated numerically below)
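To make the decomposition concrete, here is a minimal Monte Carlo sketch. The sine-plus-noise process, cubic polynomial fit, sample size, and noise level are illustrative assumptions, not values from ISSO: the model is refit on many fresh data sets, and the MSE at a fixed x splits into variance plus squared bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative process z = f(x) + v and a polynomial model h(theta, x);
# refit on many fresh data sets and decompose the MSE at a fixed x0.
f = lambda x: np.sin(2 * np.pi * x)
x0, n, sigma, degree, n_reps = 0.5, 30, 0.3, 3, 2000

preds = np.empty(n_reps)
for r in range(n_reps):
    x = rng.uniform(0.0, 1.0, n)
    z = f(x) + sigma * rng.standard_normal(n)
    theta = np.polyfit(x, z, degree)      # theta-hat from this data set
    preds[r] = np.polyval(theta, x0)      # h(theta-hat, x0)

variance = preds.var()                    # E{[h - E(h)]^2 | x0}
bias_sq = (preds.mean() - f(x0)) ** 2     # [E(h) - E(z|x0)]^2
mse = ((preds - f(x0)) ** 2).mean()       # E{[h - E(z|x0)]^2 | x0}
print(f"variance {variance:.4f} + bias^2 {bias_sq:.4f} = MSE {mse:.4f}")
```

Rerunning with degree 1 versus degree 10 shows the pattern named above: the low-order fit has large bias and small variance, the high-order fit the reverse.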

4
Unbiased Estimator May Not Be Best (Example 13.1 from ISSO)
  • An unbiased estimator is one for which E[h(θ̂, x)] = E(z|x) (i.e., the mean of the prediction is the same as the mean of the data z)
  • Example: Let z̄ denote the sample mean of scalar i.i.d. data, used as an estimator of the true mean μ (h(θ, x) = θ in the notation above)
  • An alternative biased estimator of μ is r·z̄, where 0 < r < 1
  • The MSEs of the biased and unbiased estimators generally satisfy E[(r·z̄ − μ)²] < E[(z̄ − μ)²] for a well-chosen r
  • Biased estimate is better in the MSE sense
  • However, the optimal value of r requires knowledge of the unknown (true) μ (see the simulation below)
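The effect is easy to see in a small simulation. All numbers below are illustrative; note that the optimal shrinkage factor r is computed from the true μ and σ², which are exactly the quantities unknown in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, n_reps = 1.0, 2.0, 10, 100_000

# Sample mean zbar from many repeated i.i.d. N(mu, sigma^2) samples
zbar = rng.normal(mu, sigma, (n_reps, n)).mean(axis=1)

# Minimizing E[(r*zbar - mu)^2] over r gives the factor below; it
# depends on the unknown true mu, which is the catch noted above.
r_opt = mu**2 / (mu**2 + sigma**2 / n)

print("MSE(zbar)   =", ((zbar - mu) ** 2).mean())          # unbiased
print("MSE(r*zbar) =", ((r_opt * zbar - mu) ** 2).mean())  # biased, smaller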

5
Bias-Variance Tradeoff in Model Selection in Simple Problem (figure)
6
Example 13.2 in ISSO: Bias-Variance Tradeoff
  • Suppose the true process produces output according to z = f(x) + noise, where f(x) = (x + x²)^1.1
  • Compare linear, quadratic, and cubic
    approximations
  • Table in ISSO gives the average bias, variance, and MSE for each model
  • Overall pattern: bias decreases and variance increases with model order; the optimal tradeoff is the quadratic model

7
Model Selection
  • The bias-variance tradeoff provides a conceptual framework for determining a good model
  • But the bias-variance tradeoff is not directly usable in practice
  • Need a practical method for optimizing the bias-variance tradeoff
  • Practical aim is to pick a model that minimizes a criterion of the form
  • f1(fitting error from given data) + f2(model complexity)
  • where f1 and f2 are increasing functions
  • All methods are based on a tradeoff between fitting error (high variance) and model complexity (low bias)
  • The criterion above may or may not be explicitly used in a given method; one concrete instance is sketched below
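For instance, AIC (the first entry in the list on the next slide) is one such criterion: assuming i.i.d. Gaussian errors, f1 reduces to n·log(RSS/n) (the log-likelihood fitting term up to an additive constant) and f2 to 2k for a k-parameter model. A minimal sketch with illustrative data:

```python
import numpy as np

def aic_gaussian(z, z_hat, k):
    # f1: n*log(RSS/n), increasing in fitting error (Gaussian
    # log-likelihood up to constants); f2: 2k, increasing in complexity.
    n = len(z)
    rss = ((z - z_hat) ** 2).sum()
    return n * np.log(rss / n) + 2 * k

# Toy comparison of polynomial orders on noisy sine data
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 40)
z = np.sin(np.pi * x) + 0.2 * rng.standard_normal(40)
for d in (1, 3, 10):
    coef = np.polyfit(x, z, d)
    print(d, aic_gaussian(z, np.polyval(coef, x), d + 1))
```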

8
Methods for Model Selection
  • Among many popular methods are
  • Akaike Information Criterion (AIC) (Akaike, 1974)
  • Popular in time series analysis
  • Bayesian selection (Akaike, 1977)
  • Bootstrap-based selection (Efron and Tibshirani, 1997)
  • Cross-validation (Stone, 1974)
  • Minimum description length (Rissanen, 1978)
  • V-C dimension (Vapnik and Chervonenkis, 1971)
  • Popular in computer science
  • Cross-validation appears to be the most popular model selection method

9
Cross-Validation
  • Cross-validation is a simple, general method for comparing candidate models
  • Other specialized methods may work better in
    specific problems
  • Cross-validation uses the training set of data
  • Method is based on iteratively partitioning the
    full set of training data into training and test
    subsets
  • For each partition, estimate model from training
    subset and evaluate model on test subset
  • Number of training (or test) subsets = number of model fits required
  • Select model that performs best over all test
    subsets

10
Choice of Training and Test Subsets
  • Let n denote the total size of the data set and nT the size of each test subset, nT < n
  • A common strategy is leave-one-out: nT = 1
  • Implies n test subsets during the cross-validation process
  • Often better to choose nT > 1
  • Sometimes more efficient (sampling w/o replacement)
  • Sometimes more accurate model selection
  • If nT > 1, sampling may be with or without replacement
  • Sampling with replacement indicates that there are n-choose-nT possible test subsets, written C(n, nT)
  • With replacement may be prohibitive in practice: e.g., n = 30, nT = 6 implies C(30, 6) = 593,775, nearly 600K model fits!
  • Sampling without replacement reduces the number of test subsets to n/nT (disjoint test subsets); see the check below
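A two-line check of these counts for the n = 30, nT = 6 case:

```python
from math import comb

n, n_T = 30, 6
print(comb(n, n_T))  # 593775 possible test subsets with replacement
print(n // n_T)      # 5 disjoint test subsets without replacement
```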

11
Conceptual Example of Sampling-Without-Replacement Cross-Validation with 3 Disjoint Test Subsets (figure)
12
Typical Steps for Cross-Validation
  • Step 0 (initialization) Determine the size of the test subsets and the candidate model. Let i be the counter for the test subset being used.
  • Step 1 (estimation) For the i-th test subset, let the remaining data be the i-th training subset. Estimate θ from this training subset.
  • Step 2 (error calculation) Based on the estimate for θ from Step 1 (i-th training subset), calculate the MSE (or other measure) with the data in the i-th test subset.
  • Step 3 (new training and test subsets) Update i to i + 1 and return to Step 1. Form the mean of the MSE values when all test subsets have been evaluated.
  • Step 4 (new model) Repeat Steps 1 to 3 for the next model. Choose the model with the lowest mean MSE as best. (A code sketch of these steps follows.)
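A minimal sketch of Steps 1-3 for disjoint test subsets (sampling without replacement). The fit/predict callables and all names here are illustrative, not an API from ISSO:

```python
import numpy as np

def cross_validate(x, z, fit, predict, n_test):
    """Mean test-subset MSE for one candidate model (Steps 1-3)."""
    idx = np.arange(len(x))
    mses = []
    for start in range(0, len(x), n_test):
        test = idx[start:start + n_test]         # i-th test subset
        train = np.delete(idx, test)             # i-th training subset
        theta = fit(x[train], z[train])          # Step 1: estimate theta
        err = z[test] - predict(theta, x[test])  # Step 2: test-subset error
        mses.append(np.mean(err ** 2))
    return float(np.mean(mses))                  # Step 3: mean MSE
```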

13
Numerical Illustration of Cross-Validation
(Example 13.4 in ISSO)
  • Consider a true system whose output is a sine function of the input plus additive normally distributed noise
  • Consider three candidate models:
  • Linear (affine) model
  • 3rd-order polynomial
  • 10th-order polynomial
  • Suppose 30 data points are available, divided into 5 disjoint test subsets (sampling w/o replacement)
  • Based on RMS error (equivalent to MSE for ranking) over the test subsets, the 3rd-order polynomial is preferred
  • See the plot on the following slide (a code sketch of this comparison appears below)
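A rough reconstruction of this setup, reusing the cross_validate sketch from the previous slide. The noise level, input range, and seed are assumptions; Example 13.4's exact values may differ:

```python
import numpy as np

# Step 4: 30 sine-plus-noise points, 5 disjoint test subsets (n_test = 6),
# compared across three candidate polynomial models.
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 2 * np.pi, 30)
z = np.sin(x) + 0.1 * rng.standard_normal(30)

for d in (1, 3, 10):
    score = cross_validate(
        x, z,
        fit=lambda xt, zt, d=d: np.polyfit(xt, zt, d),
        predict=np.polyval,
        n_test=6,
    )
    print(f"order {d}: mean test MSE = {score:.5f}")
```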

14
Numerical Illustration (cont'd): Relative Fits for 3 Models with Low-Noise Observations (figure)
15
Fisher Information Matrix
  • Fundamental role of data analysis is to extract
    information from data
  • Parameter estimation for models is central to
    process of extracting information
  • The Fisher information matrix plays a central role in parameter estimation as a measure of information
  • The information matrix summarizes the amount of information in the data relative to the parameters being estimated

16
Problem Setting
  • Consider the classical statistical problem of estimating parameter vector θ from n data vectors z1, z2, …, zn
  • Suppose we have a probability density and/or mass function associated with the data
  • The parameters θ appear in the probability function and affect the nature of the distribution
  • Example: zi ~ N(mean(θ), covariance(θ)) for all i
  • Let l(θ | z1, z2, …, zn) represent the likelihood function, i.e., the p.d.f./p.m.f. viewed as a function of θ conditioned on the data

17
Information Matrix: Definition
  • Recall the likelihood function l(θ | z1, z2, …, zn)
  • The information matrix is defined as
  • Fn(θ) = E[(∂ log l/∂θ)(∂ log l/∂θ)ᵀ]
  • where the expectation is w.r.t. z1, z2, …, zn
  • Equivalent form based on the Hessian matrix:
  • Fn(θ) = −E[∂² log l/(∂θ ∂θᵀ)]
  • Fn(θ) is positive semidefinite of dimension p × p (p = dim(θ)); a numerical check of the definition follows
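As a sketch, the outer-product form can be checked by Monte Carlo against the known analytic information matrix for n i.i.d. N(μ, σ²) observations with θ = (μ, σ²); this is a standard textbook case, chosen only for convenience:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sig2, n, n_reps = 0.0, 2.0, 50, 100_000

z = rng.normal(mu, np.sqrt(sig2), (n_reps, n))

# Score vector d(log l)/d(theta) of the Gaussian log-likelihood
d_mu = (z - mu).sum(axis=1) / sig2
d_s2 = ((z - mu) ** 2).sum(axis=1) / (2 * sig2**2) - n / (2 * sig2)
score = np.stack([d_mu, d_s2], axis=1)

F_mc = score.T @ score / n_reps                       # E[score score^T]
F_exact = n * np.diag([1 / sig2, 1 / (2 * sig2**2)])  # analytic F_n(theta)
print(np.round(F_mc, 2))
print(F_exact)
```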

18
Information Matrix: Two Key Properties
  • Connection of Fn(θ) to the uncertainty in the estimate θ̂n is rigorously specified via two famous results (θ* = true value of θ):
  • 1. Asymptotic normality:
  • √n (θ̂n − θ*) → N(0, F̄⁻¹) in distribution
  • where F̄ ≡ lim n→∞ Fn(θ*)/n
  • 2. Cramér-Rao inequality (for unbiased θ̂n):
  • cov(θ̂n) ≥ Fn(θ*)⁻¹ in the matrix sense
  • Above two results indicate that greater variability of θ̂n corresponds to a "smaller" Fn(θ) (and vice versa); a quick check of the bound follows
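A quick numerical check of the Cramér-Rao bound in the scalar Gaussian-mean case, where the sample mean is unbiased and attains the bound Fn(θ*)⁻¹ = σ²/n exactly (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sig2, n, n_reps = 1.0, 2.0, 25, 100_000

# Empirical variance of the unbiased estimator zbar vs. the CR bound
zbar = rng.normal(mu, np.sqrt(sig2), (n_reps, n)).mean(axis=1)
print("var(zbar)     =", zbar.var())
print("CR bound s2/n =", sig2 / n)
```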

19
Selected Applications
  • The information matrix is a measure of performance for several applications. Four uses are:
  • 1. Confidence regions for parameter estimation
  • Uses asymptotic normality and/or Cramér-Rao
    inequality
  • 2. Prediction bounds for mathematical models
  • 3. Basis for D-optimal criterion for
    experimental design
  • Information matrix serves as a measure of how well θ can be estimated for a given set of inputs
  • 4. Basis for noninformative prior in Bayesian
    analysis
  • Sometimes used for objective Bayesian inference