1
Model Assessment and Selection, Akaike
Information Criterion (AIC)
  • Elements of Statistical Learning
  • Graduate Seminar (ENEE698A)
  • Presented by Zhanfeng Yue
  • Oct. 7, 2003

2
Talk overview
  • Introduction to model selection and assessment
  • Problem formulation
  • In-sample error and optimism
  • Akaike Information Criterion (AIC)
  • Kullback-Leibler information number
  • Reasons for AIC
  • Model selection using AIC
  • Effective number of parameters

3
Model Selection and Assessment
  • Two goals
  • Model selection: estimating the performance of
    different models in order to choose the best one.
  • Model assessment: having chosen a final model,
    estimating its prediction error on new data.
  • If we are in a data-rich situation, the best
    approach is to randomly divide the data set into
    three parts (a minimal split is sketched below)
  • Training set
  • Validation set
  • Test set
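
The slides contain no code; as a companion to the list above, here is a minimal Python sketch of such a three-way split. The 50/25/25 proportions and the helper name are illustrative assumptions, not something the slides prescribe.

```python
import numpy as np

def three_way_split(X, y, seed=0):
    """Randomly split (X, y) into training, validation, and test sets.

    The 50/25/25 proportions are illustrative assumptions; the slides
    only prescribe a random three-way split.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train, n_val = int(0.5 * len(y)), int(0.25 * len(y))
    train, val, test = (idx[:n_train],
                        idx[n_train:n_train + n_val],
                        idx[n_train + n_val:])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```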

4
Problem Formulation
  • We have a signal $Y = f(X) + \varepsilon$, where
  • $Y$ is the output,
  • $X$ is the input, and
  • $\varepsilon$ is the noise
  • We have a training set
    $\mathcal{T} = \{(x_1, y_1), \ldots, (x_N, y_N)\}$.
  • We would like to find a predictor $\hat{f}(X)$ for
    $Y$ given $X$, so that $\hat{f}(X) \approx Y$.
  • The training error is
    $\overline{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{f}(x_i))$.
  • Typically $\overline{\mathrm{err}}$ can be reduced
    by increasing the complexity of the model, as the
    sketch below illustrates.
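
As a quick illustration of the last point, this sketch (an assumed example, not from the slides) fits polynomials of increasing degree to synthetic data; the training error falls as the degree grows.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 30)
y = np.sin(3 * x) + 0.3 * rng.normal(size=30)   # synthetic signal plus noise

for deg in (1, 3, 9):
    resid = y - np.polyval(np.polyfit(x, y, deg), x)
    print(deg, np.mean(resid ** 2))   # training error shrinks as degree grows
```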

5
Optimism of Training Error
  • We focus on the in-sample error
    $\mathrm{Err_{in}} = \frac{1}{N} \sum_{i=1}^{N} E_{Y^0}[L(Y_i^0, \hat{f}(x_i)) \mid \mathcal{T}]$,
    where $L$ is the loss function and $Y_i^0$ is a new
    response observed at the training point $x_i$.
  • We define the optimism, op, as the expected
    difference between the in-sample error and the
    training error,
    $\omega \equiv E_y(\mathrm{Err_{in}} - \overline{\mathrm{err}})$.

6
$\mathrm{Err_{in}}$ for Linear Models
  • We can show for several common loss functions that
    $\omega = \frac{2}{N} \sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i)$.
  • Therefore
    $E_y(\mathrm{Err_{in}}) = E_y(\overline{\mathrm{err}}) + \frac{2}{N} \sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i)$.
  • We can show for a linear fit with $d$ inputs that
    $\sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i) = d\,\sigma_\varepsilon^2$, so
    $E_y(\mathrm{Err_{in}}) = E_y(\overline{\mathrm{err}}) + 2\,\frac{d}{N}\,\sigma_\varepsilon^2$.
  • The optimism increases linearly with the number $d$
    of inputs, but decreases as the training sample
    size $N$ increases; the simulation sketched below
    checks this numerically.
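
The following Monte Carlo sketch (the setup $N$, $d$, $\sigma$ is an assumption, not from the slides) checks the relation $\omega = 2\,\frac{d}{N}\,\sigma_\varepsilon^2$ for an ordinary least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma = 50, 5, 1.0                 # assumed simulation setup
X = rng.normal(size=(N, d))
f = X @ rng.normal(size=d)               # true regression function at the x_i

gaps = []
for _ in range(2000):
    y = f + sigma * rng.normal(size=N)       # training responses
    y_new = f + sigma * rng.normal(size=N)   # fresh responses at the same x_i
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    err_bar = np.mean((y - y_hat) ** 2)      # training error
    err_in = np.mean((y_new - y_hat) ** 2)   # draw of the in-sample error
    gaps.append(err_in - err_bar)

print(np.mean(gaps), 2 * d * sigma**2 / N)   # both should be close to 0.2
```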

7
A Fitting Problem
  • Data is generated according to an unknown
    parametric density $g(y)$. Let $f_k(y \mid \theta_k)$
    denote a $k$-dimensional parametric family of
    densities. Our goal is to search among a class of
    families $\{f_1, f_2, \ldots, f_K\}$ for the fitted
    model $f_k(y \mid \hat{\theta}_k)$ that best
    approximates $g(y)$.
  • To determine which of the fitted models
    $f_1(y \mid \hat{\theta}_1), \ldots, f_K(y \mid \hat{\theta}_K)$
    best resembles $g(y)$, we need a measure that
    gauges the disparity between the true model $g(y)$
    and an approximating model $f_k(y \mid \hat{\theta}_k)$.

8
Kullback-Leibler Information Number
  • Assume that our data follow the true distribution
    $g(y)$, and $f(y)$ is our statistical model used to
    approximate $g(y)$.
  • A ruler to measure the similarity between the
    statistical model and the true distribution is
    the Kullback-Leibler information number
    $I(g, f) = \int g(y) \log \frac{g(y)}{f(y)}\,dy = E_g[\log g(Y)] - E_g[\log f(Y)]$.
  • In Akaike's terminology, $-I(g, f)$ is the entropy
    of $g$ with respect to $f$; minimizing the
    Kullback-Leibler information number is therefore
    the same as maximizing this entropy.
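
As an illustration, this sketch (an assumed example) computes $I(g, f)$ for two Gaussians, once by numerical integration of the definition above and once by the known closed form.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

g = stats.norm(0.0, 1.0)   # true distribution g(y)
f = stats.norm(0.5, 1.2)   # approximating model f(y)

# Numerical integral of g(y) * log(g(y) / f(y))
I_num, _ = quad(lambda y: g.pdf(y) * (g.logpdf(y) - f.logpdf(y)), -10, 10)

# Closed form for two Gaussians:
# log(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 1/2
I_closed = np.log(1.2 / 1.0) + (1.0**2 + 0.5**2) / (2 * 1.2**2) - 0.5

print(I_num, I_closed)     # the two values agree
```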

9
Log-likelihood
  • The first term, $E_g[\log g(Y)]$, does not depend
    on the model, so only the second term,
    $E_g[\log f(Y)]$, is important in evaluating the
    statistical model $f(y)$. By the law of large
    numbers it can be approximated as
    $E_g[\log f(Y)] \approx \frac{1}{N} \sum_{i=1}^{N} \log f(y_i)$
    when $N \to \infty$.
  • Therefore, the log-likelihood can replace the
    Kullback-Leibler information number as a criterion
    for evaluating models.
  • Log-likelihood: $\ell = \sum_{i=1}^{N} \log f(y_i)$

10
Reasons for AIC
  • The maximum log-likelihood method cannot be used
    to compare different models without some
    correction.
  • Using $\frac{1}{N} \sum_{i=1}^{N} \log f(y_i \mid \hat{\theta})$
    as an estimate of $E_g[\log f(Y \mid \hat{\theta})]$
    produces a bias. The reason the bias occurs is
    that the same data are used both to estimate the
    parameters and to calculate the log-likelihood.
  • An unbiased estimate of $E_g[\log f(Y \mid \hat{\theta})]$
    yields the famous Akaike Information Criterion
    (AIC).
  • The expected value of the bias is approximately
    $k/N$, where $k$ is the number of estimated
    parameters.

11
AIC
  • When a log-likelihood loss function is used, the
    following relation holds as $N \to \infty$:
    $-2\,E[\log \Pr_{\hat{\theta}}(Y)] \approx -\frac{2}{N}\,E[\mathrm{loglik}] + 2\,\frac{d}{N}$,
  • where $\Pr_{\theta}(Y)$ is a family of densities
    for $Y$, $\hat{\theta}$ is the maximum-likelihood
    estimate of $\theta$, and
    $\mathrm{loglik} = \sum_{i=1}^{N} \log \Pr_{\hat{\theta}}(y_i)$
    is the maximized log-likelihood.
  • This leads to
    $\mathrm{AIC} = -\frac{2}{N}\,\mathrm{loglik} + 2\,\frac{d}{N}$.
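
A minimal sketch of the formula above for a Gaussian linear model. The helper name and the convention of counting the ML-estimated noise variance as one extra parameter are assumptions, not stated on the slide.

```python
import numpy as np

def gaussian_aic(y, y_hat, d):
    """AIC = -(2/N) * loglik + 2 * d_eff / N for a Gaussian linear model.

    Counting the ML-estimated noise variance as one extra parameter
    (d_eff = d + 1) is an assumed convention.
    """
    N = len(y)
    sigma2 = np.mean((y - y_hat) ** 2)   # ML estimate of the noise variance
    loglik = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1)
    return -2.0 / N * loglik + 2.0 * (d + 1) / N
```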

12
Model Selection Using AIC
  • Simply choose the model giving the smallest AIC
    over the set of models considered.
  • Given a set of models $f_\alpha(x)$ indexed by a
    tuning parameter $\alpha$, define
    $\mathrm{AIC}(\alpha) = \overline{\mathrm{err}}(\alpha) + 2\,\frac{d(\alpha)}{N}\,\hat{\sigma}_\varepsilon^2$,
    where $\overline{\mathrm{err}}(\alpha)$ is the
    training error and $d(\alpha)$ the effective
    number of parameters.
  • Our final chosen model is $f_{\hat{\alpha}}(x)$,
    where $\hat{\alpha}$ is the tuning parameter that
    minimizes $\mathrm{AIC}(\alpha)$; a worked sketch
    follows.
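
A hedged worked example: selecting a polynomial degree by minimizing $\mathrm{AIC}(\alpha)$ on synthetic data. Estimating $\hat{\sigma}_\varepsilon^2$ from the most complex candidate model is a common convention assumed here, not something the slide specifies.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
x = rng.uniform(-1, 1, N)
y = np.sin(3 * x) + 0.3 * rng.normal(size=N)      # synthetic data

degrees = range(1, 11)                            # candidate models
fits = {d: np.polyval(np.polyfit(x, y, d), x) for d in degrees}
errs = {d: np.mean((y - fits[d]) ** 2) for d in degrees}

# Low-bias noise-variance estimate from the most complex model
d_max = max(degrees)
sigma2_hat = errs[d_max] * N / (N - (d_max + 1))

# AIC(alpha) = err(alpha) + 2 * d(alpha) / N * sigma2_hat,
# with d(alpha) = deg + 1 coefficients for a degree-deg polynomial
aic = {d: errs[d] + 2 * (d + 1) / N * sigma2_hat for d in degrees}
print(min(aic, key=aic.get))                      # degree with smallest AIC
```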

13
Note
  • If we have a total of $p$ inputs, and we choose
    the best-fitting linear model with $d \leq p$
    inputs, the optimism will exceed
    $2\,\frac{d}{N}\,\sigma_\varepsilon^2$. Put another
    way, by choosing the best-fitting model with $d$
    inputs, the effective number of parameters fit is
    more than $d$.
  • The formula
    $E_y(\mathrm{Err_{in}}) = E_y(\overline{\mathrm{err}}) + \frac{2}{N} \sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i)$
  • holds exactly for linear models with additive
    errors and squared error loss, and approximately
    for linear models and log-likelihoods. In
    particular, the formula does not hold in general
    for 0-1 loss.

14
The Effective Number of Parameters
  • Stack the outcomes $y_1, \ldots, y_N$ into a
    vector $\mathbf{y}$, and similarly for the
    predictions $\hat{\mathbf{y}}$. Then a linear
    fitting method is one for which we can write
    $\hat{\mathbf{y}} = \mathbf{S}\mathbf{y}$, where
    $\mathbf{S}$ is an $N \times N$ matrix depending
    on the input vectors $x_i$ but not on the $y_i$.
  • The effective number of parameters is defined as
    $d(\mathbf{S}) = \mathrm{trace}(\mathbf{S})$.
  • If $\mathbf{S}$ is an orthogonal projection matrix
    onto a basis spanned by $M$ features, then
    $\mathrm{trace}(\mathbf{S}) = M$; both cases are
    checked in the sketch below.
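
A small sketch (an assumed example) computing $d(\mathbf{S}) = \mathrm{trace}(\mathbf{S})$ for ridge regression, whose smoother matrix is $\mathbf{S} = \mathbf{X}(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T$; at $\lambda = 0$ it reduces to the orthogonal projection case, where the trace equals the number of features.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 40, 8
X = rng.normal(size=(N, p))

for lam in (0.0, 1.0, 10.0):
    # Ridge smoother matrix S = X (X^T X + lam I)^{-1} X^T
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    print(lam, np.trace(S))   # equals p at lam = 0, shrinks as lam grows
```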

15
Models Like Neural Networks
  • For models like neural networks, in which we
    minimize an error function $R(\omega)$ with a
    weight decay penalty (regularization)
    $\alpha \sum_m \omega_m^2$, the effective number
    of parameters has the form
  • $d(\alpha) = \sum_{m=1}^{M} \frac{\theta_m}{\theta_m + \alpha}$,
  • where the $\theta_m$ are the eigenvalues of the
    Hessian matrix
    $\partial^2 R(\omega) / \partial \omega\, \partial \omega^T$.
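
A sketch evaluating $d(\alpha) = \sum_m \theta_m / (\theta_m + \alpha)$ from Hessian eigenvalues; the synthetic stand-in Hessian is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(6, 6))
H = A @ A.T                      # synthetic positive-definite Hessian (assumed)
theta = np.linalg.eigvalsh(H)    # eigenvalues theta_m

for alpha in (0.0, 1.0, 10.0):
    d_alpha = np.sum(theta / (theta + alpha))
    print(alpha, d_alpha)        # equals 6 at alpha = 0, decreases with alpha
```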

16
Conclusions
  • Model selection and assessment were briefly
    reviewed.
  • The Akaike Information Criterion (AIC) for model
    selection was derived.
  • The effective number of parameters was introduced.