Additional Topics in Prediction Methodology

Transcript and Presenter's Notes

1
Additional Topics in Prediction Methodology
2
Introduction
  • The predictive distribution for a random variable Y0 is
    meant to capture all the information about Y0
    that is contained in the training data Yn.
  • It does not completely specify Y0, but it does provide a
    probability distribution over the more likely and less
    likely values of Y0.
  • E[Y0 | Yn] is the best MSPE predictor of Y0 (see the
    justification below).
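A one-line justification of the last bullet, filled in here since it is standard: among all functions p(·) of the training data, the conditional mean minimizes the mean squared prediction error,

\[
\mathrm{E}\!\left[\bigl(Y_0 - p(Y^n)\bigr)^2\right]
\;\ge\;
\mathrm{E}\!\left[\bigl(Y_0 - \mathrm{E}[Y_0 \mid Y^n]\bigr)^2\right]
\quad\text{for every predictor } p(\cdot),
\]

so E[Y0 | Yn] is the minimum-MSPE predictor of Y0.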

3
Hierarchical models have two stages
  • x ∈ X ⊂ R^d
  • f0 = f(x0): known p × 1 vector
  • F = (fj(xi)): known n × p matrix
  • β: unknown p × 1 vector of regression coefficients
  • R = (R(xi − xj)): known n × n matrix of correlations among
    the training data Yn
  • r0 = (R(xi − x0)): known n × 1 vector of correlations of Y0
    with Yn (see the sketch below)
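A minimal numerical sketch of these ingredients, assuming a Gaussian correlation function R(h) = exp(−θ‖h‖²) and linear regressors f(x) = (1, x); the correlation family, θ, the training sites, and all variable names below are illustrative choices, not taken from the slides:

import numpy as np

def corr(h, theta=2.0):
    # Assumed stationary correlation R(h) for a lag vector h.
    return np.exp(-theta * np.sum(h ** 2))

def f(x):
    # Known regression functions, here f(x) = (1, x), so p = 2.
    return np.array([1.0, x[0]])

X = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])        # n = 5 training sites, d = 1
x0 = np.array([0.6])                                       # prediction site

F = np.array([f(x) for x in X])                            # n x p matrix (fj(xi))
f0 = f(x0)                                                 # p x 1 vector f(x0)
R = np.array([[corr(xi - xj) for xj in X] for xi in X])    # n x n matrix (R(xi - xj))
r0 = np.array([corr(xi - x0) for xi in X])                 # n x 1 vector (R(xi - x0))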

4
Predictive Distributions when σZ², R and r0 are Known
5
(No Transcript)
6
Interesting features of (a) and (b)
  • The non-informative prior is the limit of the normal
    prior as its variance τ² → ∞.
  • While the non-informative prior is not a proper
    distribution, the corresponding predictive
    distribution is proper.
  • The same conditioning argument can be applied to
    derive the posterior mean for both the non-informative
    prior and the normal prior.

7
The mean and variance of the predictive
distribution (mean)
  • μ0n(x0) and σ0n(x0) depend on x0 only through
    the regression vector f0 and the correlation vector
    r0.
  • μ0n(x0) is a linear unbiased predictor of Y(x0).
  • The continuity and other smoothness properties of
    μ0n(x0) are inherited from the correlation function
    R(·) and the regressors {fj(·)}, j = 1, …, p.

8
  • μ0n(x0) depends on the parameters σZ² and τ² only
    through their ratio.
  • μ0n(x0) interpolates the training data: when
    x0 = xi, f0 = f(xi) and r0ᵀR⁻¹ = eiᵀ, the ith unit
    vector (see the sketch below).
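A hedged sketch of this predictor in the non-informative-prior case, where μ0n(x0) reduces to the BLUP f0ᵀβ̂ + r0ᵀR⁻¹(Yn − Fβ̂) with β̂ the generalized least squares estimate; it reuses the illustrative X, F, f0, R, r0 from the sketch after slide 3, and the response yn below is made up:

import numpy as np

def blup(F, f0, R, r0, yn):
    # Generalized least squares estimate of beta.
    Rinv_F = np.linalg.solve(R, F)
    Rinv_y = np.linalg.solve(R, yn)
    beta_hat = np.linalg.solve(F.T @ Rinv_F, F.T @ Rinv_y)
    # Posterior mean: regression part plus correlation-weighted residuals.
    return f0 @ beta_hat + r0 @ np.linalg.solve(R, yn - F @ beta_hat)

yn = np.sin(2 * np.pi * X[:, 0])     # toy training responses
print(blup(F, f0, R, r0, yn))        # prediction at x0 = 0.6

Setting x0 equal to a training site xi makes r0 the ith column of R, so r0ᵀR⁻¹ = eiᵀ picks out the ith residual and the predictor returns yn[i] exactly; this is the interpolation property in the second bullet.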

9
(No Transcript)
10
The mean and variance of the predictive
distribution (Variance)
  • MSPE(μ0n(x0)) = σ0n²(x0).
  • The variance of the posterior of Y(x0) given Yn
    should be 0 whenever x0 = xi:
  • σ0n²(xi) = 0 (see the formula below).
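For the non-informative-prior case, σ0n²(x0) has the familiar universal-kriging form, stated here as a standard result since the slide's equations were not transcribed:

\[
\sigma_{0n}^{2}(x_0)
= \sigma_Z^{2}\Bigl[\,1 - r_0^{\top}R^{-1}r_0
+ h^{\top}\bigl(F^{\top}R^{-1}F\bigr)^{-1}h\Bigr],
\qquad
h = f_0 - F^{\top}R^{-1}r_0 .
\]

At a training site x0 = xi we have R⁻¹r0 = ei, so r0ᵀR⁻¹r0 = R(0) = 1 and h = f(xi) − f(xi) = 0; both correction terms collapse and σ0n²(xi) = 0, matching the bullets above.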

11
Most important use of Theorem 4.1.1
12
Predictive Distributions when R and r0 are Known
The posterior is a location-shifted and scaled
univariate t distribution whose degrees of
freedom are increased when there is
informative prior information for either β or σZ².
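In symbols (with νi the degrees of freedom and σi(x0) the scale factor for prior case i, both given on the next slides):

\[
\left.\frac{Y(x_0) - \mu_{0n}(x_0)}{\sigma_i(x_0)}\,\right|\,Y^n \;\sim\; T_{\nu_i},
\]

a standard Student t distribution with νi degrees of freedom, location-shifted by μ0n(x0) and scaled by σi(x0).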
13
(No Transcript)
14
(No Transcript)
15
Degrees of freedom
  • Base value for the degrees of freedom: νi = n − p.
  • p additional degrees of freedom when the prior on β is
    informative.
  • ν0 additional degrees of freedom when the prior on σZ² is
    informative.

16
Location shift
The centering value is the same as in Theorem 4.1.1 (known
σZ²). The non-informative prior gives the BLUP.
17
Scale factor σi²(x0) (compare (4.1.15) with (4.1.6))
  • An estimate of the scale factor σ0n²(x0).
  • Qi²/νi estimates σZ².
  • Qi² combines information about σZ² from the
    conditional distribution of Yn given σZ² with
    information from the prior of σZ².
  • σi²(xi) = 0 when xi is any of the training data points
    (see the note below).
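A hedged reading of the structure, since (4.1.15) itself was not transcribed: the scale factor replaces the unknown σZ² in (4.1.6) by the estimate Qi²/νi; in the non-informative case this gives

\[
\sigma_i^{2}(x_0)
= \frac{Q_i^{2}}{\nu_i}\Bigl[\,1 - r_0^{\top}R^{-1}r_0
+ h^{\top}\bigl(F^{\top}R^{-1}F\bigr)^{-1}h\Bigr],
\qquad
Q_i^{2} = \bigl(Y^n - F\hat{\beta}\bigr)^{\top}R^{-1}\bigl(Y^n - F\hat{\beta}\bigr),
\]

so the bracketed factor, and hence σi²(xi), vanishes at every training site, exactly as σ0n²(xi) does.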

18
Predictive Distributions when Correlation
Parameters are Unknown
  • What if the correlations among the observations are
    unknown (R and r0 are unknown)?
  • Assume Y(·) has a Gaussian prior with
    correlation function R(· | ψ), where ψ is an unknown
    vector of parameters.
  • Two issues:
  • the standard error of the plug-in predictor μ0n(x0 | ψ̂)
    obtained by substituting an estimate ψ̂ from MLE or
    REML (see the sketch below);
  • a Bayesian approach to the uncertainty in ψ, which is to
    model it by a prior distribution.
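A minimal sketch of the plug-in route, assuming (as in the earlier sketches) a Gaussian correlation with a single unknown parameter θ standing in for ψ. The profile negative log-likelihood below is the standard one; swapping in REML would replace n by n − p and add log det(FᵀR⁻¹F) to the criterion. X, F, yn are the illustrative objects from the earlier sketches:

import numpy as np
from scipy.optimize import minimize_scalar

def neg_profile_loglik(theta, X, F, yn):
    n, p = F.shape
    # Correlation matrix R(psi) under the assumed Gaussian correlation family.
    R = np.array([[np.exp(-theta * np.sum((xi - xj) ** 2)) for xj in X] for xi in X])
    R += 1e-10 * np.eye(n)                              # jitter for numerical stability
    Rinv_F = np.linalg.solve(R, F)
    beta = np.linalg.solve(F.T @ Rinv_F, F.T @ np.linalg.solve(R, yn))
    resid = yn - F @ beta
    sigma2 = resid @ np.linalg.solve(R, resid) / n      # profile MLE of sigma_Z^2
    _, logdet = np.linalg.slogdet(R)
    return 0.5 * (n * np.log(sigma2) + logdet)          # up to an additive constant

res = minimize_scalar(lambda t: neg_profile_loglik(t, X, F, yn),
                      bounds=(0.01, 50.0), method="bounded")
theta_hat = res.x   # psi_hat, to be plugged into mu_0n(x0 | psi_hat)

The plug-in standard error evaluates σ0n²(x0) at ψ̂ and therefore ignores the sampling variability of ψ̂, which is why it tends to understate the true prediction error; that is the first of the two issues above.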

19
Prediction of Multiple Response Models
  • Several outputs are available from a computer
    experiment.
  • Several codes are available for computing the
    same response (a fast code and a slow code).
  • Competing responses.
  • Several stochastic models for the joint response.
  • Use these models to describe the optimal
    predictor of one of the several computed
    responses.

20
Modeling Multiple Outputs
  • Zi(·): marginally mean-zero stationary Gaussian
    stochastic processes with unknown variance σi² and
    correlation function Ri.
  • Stationarity of Zi(·) implies that the correlation
    between Zi(x1) and Zi(x2) depends only on x1 − x2.
  • Assume Cov(Zi(x1), Zj(x2)) = σiσjRij(x1 − x2).
  • Rij(·): the cross-correlation function of Zi(·) and
    Zj(·).
  • Linear model: the global mean of the Yi process, with
    fi(·) known regression functions.
  • βi: unknown regression parameters.

21
Selection of correlation and cross-correlation
functions is complicated
  • Reason: for any input sites x_l^i, the multivariate
    normally distributed random vector (Z1(x_1^1), …)ᵀ
    must have a nonnegative definite covariance
    matrix.
  • Solution: construct the Zi(·) from a set of
    elementary processes (usually these processes are
    mutually independent); see the example below.
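Concretely, stacking all the training observations gives a covariance matrix with one block per pair of processes; the requirement in the first bullet is nonnegative definiteness of the whole stacked matrix (a restatement, not a transcribed formula):

\[
\Sigma = \bigl[\Sigma_{ij}\bigr]_{i,j=1}^{m},
\qquad
\Sigma_{ij} = \sigma_i\sigma_j\Bigl[R_{ij}\bigl(x_k^i - x_l^j\bigr)\Bigr]_{k,l},
\qquad
a^{\top}\Sigma\,a \ge 0 \ \text{ for all } a .
\]

Choosing the Rij functions independently of one another makes this easy to violate; building every Zi(·) from shared, mutually independent elementary processes guarantees it by construction.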

22
Example by Kennedy and O'Hagan
  • Yi(x): prior for the ith code level (i = m is the
    top-level code). The autoregressive model:
  • Yi(x) = ρi−1Yi−1(x) + δi(x), i = 2, …, m.
  • The output of each successively higher-level code
    i at x is related to the output of the less
    precise code i − 1 at x plus the refinement δi(x);
    the induced covariance is sketched below.
  • Cov(Yi(x), Yi−1(w) | Yi−1(x)) = 0 for all w ≠ x.
  • No additional second-order knowledge of code i at
    x can be obtained from the lower-level code i − 1
    once the value of code i − 1 at x is known (a Markov
    property on the hierarchy of codes).
  • Since there is no natural hierarchy of computer
    codes in such applications, we need to find something
    better.
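Under this autoregression, with the refinement δi(·) independent of Yi−1(·), the covariance of the level-i code follows by induction (a standard consequence of the model, filled in here for clarity):

\[
\operatorname{Cov}\bigl(Y_i(x), Y_i(w)\bigr)
= \rho_{i-1}^{2}\operatorname{Cov}\bigl(Y_{i-1}(x), Y_{i-1}(w)\bigr)
+ \operatorname{Cov}\bigl(\delta_i(x), \delta_i(w)\bigr),
\]

so each level contributes its own refinement covariance on top of a scaled copy of the previous level's.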

23
A more reasonable model
  • Each constraint function is modeled as a scaled
    objective function plus a refinement:
  • Yi(x) = ρiY1(x) + δi(x), i = 2, …, m + 1.
  • Ver Hoef and Barry:
  • formulated models in the environmental sciences
    that include an unknown smooth surface plus a random
    measurement error;
  • built from moving averages over white noise processes
    (see the formula below).
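The moving-average construction, stated here from the general Ver Hoef and Barry approach rather than from the slides: each process is a kernel average of a common white noise process W, as in

\[
Z_i(x) = \int_{\mathbb{R}^d} g_i(u - x)\,W(u)\,du,
\qquad
\operatorname{Cov}\bigl(Z_i(x_1), Z_j(x_2)\bigr)
= \int_{\mathbb{R}^d} g_i(u - x_1)\,g_j(u - x_2)\,du,
\]

which is nonnegative definite for any choice of kernels gi, sidestepping the validity problem of slide 21.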

24
Morris and Mitchell model
  • Prior information about y(x) is specified by a
    Gaussian process Y(·).
  • Prior information about the partial derivatives
    y(j)(x) is obtained by considering the
    derivative processes of Y(·):
  • Y1(·) = Y(·), Y2(·) = Y(1)(·), …, Y1+m(·) = Y(m)(·).
  • This is the natural prior for y(j)(x).
  • The covariances between Y(x1) and Y(j)(x2), and
    between Y(i)(x1) and Y(j)(x2), are given below.
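The covariance formulas on the original slide were not transcribed; for a stationary process with Cov(Y(x1), Y(x2)) = σ²R(h), h = x1 − x2, the standard results for differentiable Gaussian processes are

\[
\operatorname{Cov}\bigl(Y(x_1), Y^{(j)}(x_2)\bigr)
= -\,\sigma^{2}\frac{\partial R(h)}{\partial h_j},
\qquad
\operatorname{Cov}\bigl(Y^{(i)}(x_1), Y^{(j)}(x_2)\bigr)
= -\,\sigma^{2}\frac{\partial^{2} R(h)}{\partial h_i\,\partial h_j},
\]

both evaluated at h = x1 − x2.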

25
Optimal Predictors for Multiple Outputs
  • The best MSPE predictor of Y0 based on the training
    data is E[Y0 | Y1^n1, …, Ym^nm],
  • where Y0 = Y1(x0), Yi^ni = (Yi(x_1^i), …)ᵀ, and yi^ni is
    its observed value, for i = 1, …, m.

26
The joint distribution of (Y0, Y1^n1, …, Ym^nm) is the
multivariate normal distribution shown below.
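The conditioning formula that produces the optimal predictor is the standard multivariate normal one, supplied here since the slide's equations were not transcribed. Writing Y^N = (Y1^n1, …, Ym^nm) for all the training data,

\[
\begin{pmatrix} Y_0 \\ Y^{N} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} \mu_0 \\ \mu_N \end{pmatrix},
\begin{pmatrix} \sigma_0^{2} & \gamma_0^{\top} \\ \gamma_0 & \Sigma_N \end{pmatrix}
\right)
\quad\Longrightarrow\quad
\mathrm{E}\bigl[Y_0 \mid Y^{N}\bigr]
= \mu_0 + \gamma_0^{\top}\Sigma_N^{-1}\bigl(Y^{N} - \mu_N\bigr),
\]

where γ0 collects the covariances of Y0 with every training observation across all m outputs.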
27
Conditional expectation
  • In practice, this is unusable directly: it requires
    knowledge of the marginal correlation functions, the
    cross-correlation functions, and the ratios of all the
    process variances.
  • Empirical versions are of practical use:
  • as before, assume each of the correlation
    matrices Ri and cross-correlation matrices Rij
    is known up to a vector of parameters ψ;
  • estimate ψ by MLE or REML.

28
Example 1
  • The 14-point training data set has the feature that it
    allows us to learn about y over the entire input space;
    it is space-filling.
  • Compare two models:
  • using the predictor of y(·) based on y(·) alone;
  • using the predictor of y(·) based on (y(·),
    y(1)(·), y(2)(·)).
  • The second is both a visually better fit and has a 24%
    smaller ERMSPE.

29
(No Transcript)
30
Thank you!