1
Model Assessment and Selection, Akaike
Information Criterion (AIC)
  • Elements of Statistical Learning
  • Graduate Seminar (ENEE698A)
  • Presented by Zhanfeng Yue
  • Oct. 7, 2003

2
Talk overview
  • Introduction to model selection and assessment
  • Problem formulation
  • In-sample error and optimism
  • Akaike Information Criterion (AIC)
  • Kullback-Leibler information number
  • Reasons for AIC
  • Model selection using AIC
  • Effective number of parameters

3
Model Selection and Assessment
  • Two goals
  • Model selection: estimating the performance of
    different models in order to choose the best one.
  • Model assessment: having chosen a final model,
    estimating its prediction error on new data.
  • If we are in a data-rich situation, the best
    approach is to randomly divide the data set into
    three parts (a minimal split is sketched below)
  • Training set
  • Validation set
  • Test set
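
The slides contain no code; as a companion to the list above, here is a minimal Python sketch of such a three-way split. The 50/25/25 proportions and the helper name are illustrative assumptions, not something the slides prescribe.

```python
import numpy as np

def three_way_split(X, y, seed=0):
    """Randomly split (X, y) into training, validation, and test sets.

    The 50/25/25 proportions are illustrative assumptions; the slides
    only prescribe a random three-way split.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train, n_val = int(0.5 * len(y)), int(0.25 * len(y))
    train, val, test = (idx[:n_train],
                        idx[n_train:n_train + n_val],
                        idx[n_train + n_val:])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```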

4
Problem Formulation
  • We have a signal $Y = f(X) + \varepsilon$, where
  • $Y$ is the output,
  • $X$ is the input, and
  • $\varepsilon$ is the noise
  • We have a training set
    $\mathcal{T} = \{(x_1, y_1), \ldots, (x_N, y_N)\}$.
  • We would like to find a predictor $\hat{f}(X)$ for
    $Y$ given $X$, so that $\hat{f}(X) \approx Y$.
  • The training error is
    $\overline{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{f}(x_i))$.
  • Typically $\overline{\mathrm{err}}$ can be reduced
    by increasing the complexity of the model, as the
    sketch below illustrates.
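
As a quick illustration of the last point, this sketch (an assumed example, not from the slides) fits polynomials of increasing degree to synthetic data; the training error falls as the degree grows.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 30)
y = np.sin(3 * x) + 0.3 * rng.normal(size=30)   # synthetic signal plus noise

for deg in (1, 3, 9):
    resid = y - np.polyval(np.polyfit(x, y, deg), x)
    print(deg, np.mean(resid ** 2))   # training error shrinks as degree grows
```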

5
Optimism of Training Error
  • We focus on the in-sample error
    $\mathrm{Err_{in}} = \frac{1}{N} \sum_{i=1}^{N} E_{Y^0}[L(Y_i^0, \hat{f}(x_i)) \mid \mathcal{T}]$,
    where $L$ is the loss function and $Y_i^0$ is a new
    response observed at the training point $x_i$.
  • We define the optimism, op, as the expected
    difference between the in-sample error and the
    training error,
    $\omega \equiv E_y(\mathrm{Err_{in}} - \overline{\mathrm{err}})$.

6
$\mathrm{Err_{in}}$ for Linear Models
  • We can show for several common loss functions that
    $\omega = \frac{2}{N} \sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i)$.
  • Therefore
    $E_y(\mathrm{Err_{in}}) = E_y(\overline{\mathrm{err}}) + \frac{2}{N} \sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i)$.
  • We can show for a linear fit with $d$ inputs that
    $\sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i) = d\,\sigma_\varepsilon^2$, so
    $E_y(\mathrm{Err_{in}}) = E_y(\overline{\mathrm{err}}) + 2\,\frac{d}{N}\,\sigma_\varepsilon^2$.
  • The optimism increases linearly with the number $d$
    of inputs, but decreases as the training sample
    size $N$ increases; the simulation sketched below
    checks this numerically.
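
The following Monte Carlo sketch (the setup $N$, $d$, $\sigma$ is an assumption, not from the slides) checks the relation $\omega = 2\,\frac{d}{N}\,\sigma_\varepsilon^2$ for an ordinary least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma = 50, 5, 1.0                 # assumed simulation setup
X = rng.normal(size=(N, d))
f = X @ rng.normal(size=d)               # true regression function at the x_i

gaps = []
for _ in range(2000):
    y = f + sigma * rng.normal(size=N)       # training responses
    y_new = f + sigma * rng.normal(size=N)   # fresh responses at the same x_i
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    err_bar = np.mean((y - y_hat) ** 2)      # training error
    err_in = np.mean((y_new - y_hat) ** 2)   # draw of the in-sample error
    gaps.append(err_in - err_bar)

print(np.mean(gaps), 2 * d * sigma**2 / N)   # both should be close to 0.2
```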

7
A Fitting Problem
  • Data is generated according to an unknown
    parametric density $g(y)$. Let $f_k(y \mid \theta_k)$
    denote a $k$-dimensional parametric family of
    densities. Our goal is to search among a class of
    families $\{f_1, f_2, \ldots, f_K\}$ for the fitted
    model $f_k(y \mid \hat{\theta}_k)$ that best
    approximates $g(y)$.
  • To determine which of the fitted models
    $f_1(y \mid \hat{\theta}_1), \ldots, f_K(y \mid \hat{\theta}_K)$
    best resembles $g(y)$, we need a measure that
    gauges the disparity between the true model $g(y)$
    and an approximating model $f_k(y \mid \hat{\theta}_k)$.

8
Kullback-Leibler Information Number
  • Assume that our data follow the true distribution
    $g(y)$, and $f(y)$ is our statistical model used to
    approximate $g(y)$.
  • A ruler to measure the similarity between the
    statistical model and the true distribution is
    the Kullback-Leibler information number
    $I(g, f) = \int g(y) \log \frac{g(y)}{f(y)}\,dy = E_g[\log g(Y)] - E_g[\log f(Y)]$.
  • In Akaike's terminology, $-I(g, f)$ is the entropy
    of $g$ with respect to $f$; minimizing the
    Kullback-Leibler information number is therefore
    the same as maximizing this entropy.
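
As an illustration, this sketch (an assumed example) computes $I(g, f)$ for two Gaussians, once by numerical integration of the definition above and once by the known closed form.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

g = stats.norm(0.0, 1.0)   # true distribution g(y)
f = stats.norm(0.5, 1.2)   # approximating model f(y)

# Numerical integral of g(y) * log(g(y) / f(y))
I_num, _ = quad(lambda y: g.pdf(y) * (g.logpdf(y) - f.logpdf(y)), -10, 10)

# Closed form for two Gaussians:
# log(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 1/2
I_closed = np.log(1.2 / 1.0) + (1.0**2 + 0.5**2) / (2 * 1.2**2) - 0.5

print(I_num, I_closed)     # the two values agree
```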

9
Log-likelihood
  • The first term, $E_g[\log g(Y)]$, does not depend
    on the model, so only the second term,
    $E_g[\log f(Y)]$, is important in evaluating the
    statistical model $f(y)$. By the law of large
    numbers it can be approximated as
    $E_g[\log f(Y)] \approx \frac{1}{N} \sum_{i=1}^{N} \log f(y_i)$
    when $N \to \infty$.
  • Therefore, the log-likelihood can replace the
    Kullback-Leibler information number as a criterion
    for evaluating models.
  • Log-likelihood: $\ell = \sum_{i=1}^{N} \log f(y_i)$

10
Reasons for AIC
  • The maximum log-likelihood method cannot be used
    to compare different models without some
    correction.
  • Using $\frac{1}{N} \sum_{i=1}^{N} \log f(y_i \mid \hat{\theta})$
    as an estimate of $E_g[\log f(Y \mid \hat{\theta})]$
    produces a bias. The reason the bias occurs is
    that the same data are used both to estimate the
    parameters and to calculate the log-likelihood.
  • An unbiased estimate of $E_g[\log f(Y \mid \hat{\theta})]$
    yields the famous Akaike Information Criterion
    (AIC).
  • The expected value of the bias is approximately
    $k/N$, where $k$ is the number of estimated
    parameters.

11
AIC
  • When a log-likelihood loss function is used, the
    following relation holds as $N \to \infty$:
    $-2\,E[\log \Pr_{\hat{\theta}}(Y)] \approx -\frac{2}{N}\,E[\mathrm{loglik}] + 2\,\frac{d}{N}$,
  • where $\Pr_{\theta}(Y)$ is a family of densities
    for $Y$, $\hat{\theta}$ is the maximum-likelihood
    estimate of $\theta$, and
    $\mathrm{loglik} = \sum_{i=1}^{N} \log \Pr_{\hat{\theta}}(y_i)$
    is the maximized log-likelihood.
  • This leads to
    $\mathrm{AIC} = -\frac{2}{N}\,\mathrm{loglik} + 2\,\frac{d}{N}$.
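
A minimal sketch of the formula above for a Gaussian linear model. The helper name and the convention of counting the ML-estimated noise variance as one extra parameter are assumptions, not stated on the slide.

```python
import numpy as np

def gaussian_aic(y, y_hat, d):
    """AIC = -(2/N) * loglik + 2 * d_eff / N for a Gaussian linear model.

    Counting the ML-estimated noise variance as one extra parameter
    (d_eff = d + 1) is an assumed convention.
    """
    N = len(y)
    sigma2 = np.mean((y - y_hat) ** 2)   # ML estimate of the noise variance
    loglik = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1)
    return -2.0 / N * loglik + 2.0 * (d + 1) / N
```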

12
Model Selection Using AIC
  • Simply choose the model giving the smallest AIC
    over the set of models considered.
  • Given a set of models $f_\alpha(x)$ indexed by a
    tuning parameter $\alpha$, define
    $\mathrm{AIC}(\alpha) = \overline{\mathrm{err}}(\alpha) + 2\,\frac{d(\alpha)}{N}\,\hat{\sigma}_\varepsilon^2$,
    where $\overline{\mathrm{err}}(\alpha)$ is the
    training error and $d(\alpha)$ the effective
    number of parameters.
  • Our final chosen model is $f_{\hat{\alpha}}(x)$,
    where $\hat{\alpha}$ is the tuning parameter that
    minimizes $\mathrm{AIC}(\alpha)$; a worked sketch
    follows.
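
A hedged worked example: selecting a polynomial degree by minimizing $\mathrm{AIC}(\alpha)$ on synthetic data. Estimating $\hat{\sigma}_\varepsilon^2$ from the most complex candidate model is a common convention assumed here, not something the slide specifies.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
x = rng.uniform(-1, 1, N)
y = np.sin(3 * x) + 0.3 * rng.normal(size=N)      # synthetic data

degrees = range(1, 11)                            # candidate models
fits = {d: np.polyval(np.polyfit(x, y, d), x) for d in degrees}
errs = {d: np.mean((y - fits[d]) ** 2) for d in degrees}

# Low-bias noise-variance estimate from the most complex model
d_max = max(degrees)
sigma2_hat = errs[d_max] * N / (N - (d_max + 1))

# AIC(alpha) = err(alpha) + 2 * d(alpha) / N * sigma2_hat,
# with d(alpha) = deg + 1 coefficients for a degree-deg polynomial
aic = {d: errs[d] + 2 * (d + 1) / N * sigma2_hat for d in degrees}
print(min(aic, key=aic.get))                      # degree with smallest AIC
```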

13
Note
  • If we have a total of $p$ inputs, and we choose
    the best-fitting linear model with $d \leq p$
    inputs, the optimism will exceed
    $2\,\frac{d}{N}\,\sigma_\varepsilon^2$. Put another
    way, by choosing the best-fitting model with $d$
    inputs, the effective number of parameters fit is
    more than $d$.
  • The formula
    $E_y(\mathrm{Err_{in}}) = E_y(\overline{\mathrm{err}}) + \frac{2}{N} \sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i)$
  • holds exactly for linear models with additive
    errors and squared error loss, and approximately
    for linear models and log-likelihoods. In
    particular, the formula does not hold in general
    for 0-1 loss.

14
The Effective Number of Parameters
  • Stack the outcomes $y_1, \ldots, y_N$ into a
    vector $\mathbf{y}$, and similarly for the
    predictions $\hat{\mathbf{y}}$. Then a linear
    fitting method is one for which we can write
    $\hat{\mathbf{y}} = \mathbf{S}\mathbf{y}$, where
    $\mathbf{S}$ is an $N \times N$ matrix depending
    on the input vectors $x_i$ but not on the $y_i$.
  • The effective number of parameters is defined as
    $d(\mathbf{S}) = \mathrm{trace}(\mathbf{S})$.
  • If $\mathbf{S}$ is an orthogonal projection matrix
    onto a basis spanned by $M$ features, then
    $\mathrm{trace}(\mathbf{S}) = M$; both cases are
    checked in the sketch below.
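
A small sketch (an assumed example) computing $d(\mathbf{S}) = \mathrm{trace}(\mathbf{S})$ for ridge regression, whose smoother matrix is $\mathbf{S} = \mathbf{X}(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T$; at $\lambda = 0$ it reduces to the orthogonal projection case, where the trace equals the number of features.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 40, 8
X = rng.normal(size=(N, p))

for lam in (0.0, 1.0, 10.0):
    # Ridge smoother matrix S = X (X^T X + lam I)^{-1} X^T
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    print(lam, np.trace(S))   # equals p at lam = 0, shrinks as lam grows
```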

15
Models Like Neural Networks
  • For models like neural networks, in which we
    minimize an error function $R(\omega)$ with a
    weight decay penalty (regularization)
    $\alpha \sum_m \omega_m^2$, the effective number
    of parameters has the form
  • $d(\alpha) = \sum_{m=1}^{M} \frac{\theta_m}{\theta_m + \alpha}$,
  • where the $\theta_m$ are the eigenvalues of the
    Hessian matrix
    $\partial^2 R(\omega) / \partial \omega\, \partial \omega^T$.
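
A sketch evaluating $d(\alpha) = \sum_m \theta_m / (\theta_m + \alpha)$ from Hessian eigenvalues; the synthetic stand-in Hessian is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(6, 6))
H = A @ A.T                      # synthetic positive-definite Hessian (assumed)
theta = np.linalg.eigvalsh(H)    # eigenvalues theta_m

for alpha in (0.0, 1.0, 10.0):
    d_alpha = np.sum(theta / (theta + alpha))
    print(alpha, d_alpha)        # equals 6 at alpha = 0, decreases with alpha
```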

16
Conclusions
  • Model selection and assessment were briefly
    reviewed.
  • The Akaike Information Criterion (AIC) for model
    selection was derived.
  • The effective number of parameters was introduced.