Title: Statistical Learning Methods
1 Statistical Learning Methods
2 Introduction
- Agents can handle uncertainty by using the methods of probability theory and decision theory (probability theory + utility theory)
- But first they must learn their probabilistic theories of the world from experience...
3 Key Concepts
- Data: evidence, i.e., the instantiation of one or more random variables describing the domain
- Hypotheses: probabilistic theories of how the domain works
4 Outline
- Bayesian learning
- Maximum a posteriori and maximum likelihood learning
- Instance-based learning
- Intro to neural networks
5 Bayesian Learning
- Let D be all the data, with observed value d; the probability of a hypothesis hi follows from Bayes' rule: P(hi | d) = α P(d | hi) P(hi)
- For a prediction about quantity X: P(X | d) = Σi P(X | d, hi) P(hi | d) = Σi P(X | hi) P(hi | d) (the last step assumes X is independent of d given hi)
6 Bayesian Learning
- For a prediction about quantity X: P(X | d) = Σi P(X | d, hi) P(hi | d) = Σi P(X | hi) P(hi | d)
- No single best-guess hypothesis; all hypotheses are involved
7 Bayesian Learning
- Simply calculate the probability of each hypothesis given the data, and make predictions based on this
- I.e., predictions are based on all hypotheses, weighted by their posterior probabilities, rather than on a single best hypothesis
8 Candy
- Suppose there are five kinds of bags of candies:
- 10% are h1: 100% cherry candies
- 20% are h2: 75% cherry candies + 25% lime candies
- 40% are h3: 50% cherry candies + 50% lime candies
- 20% are h4: 25% cherry candies + 75% lime candies
- 10% are h5: 100% lime candies
- We observe candies drawn from some bag
9 More Candy
- We observe candies drawn from some bag
- Assume observations are i.i.d., e.g., because there are many candies in the bag
- Assume we don't like the green lime candies
- Important questions:
- What kind of bag is it? h1, h2, ..., h5?
- What flavor will the next candy be?
10 Posterior Probability of Hypotheses
11 Posterior Probability of Hypotheses
- The true hypothesis will eventually dominate the Bayesian prediction; the prior has no influence in the long run (see the sketch below)
- More importantly (maybe not for us?): the Bayesian prediction is optimal
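A minimal Python sketch of the Bayesian update and prediction for the candy example, assuming the priors and per-bag lime fractions listed above and that only lime candies are observed; the function names are illustrative, not from the slides:

```python
# Minimal sketch of Bayesian learning on the candy example.
priors = [0.1, 0.2, 0.4, 0.2, 0.1]          # P(h1)..P(h5)
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]        # P(lime | hi) for each bag type

def posteriors(num_limes_observed):
    """P(hi | d) after observing only lime candies (i.i.d. draws)."""
    unnorm = [p * (q ** num_limes_observed) for p, q in zip(priors, p_lime)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def predict_next_lime(num_limes_observed):
    """Full Bayesian prediction: P(next is lime | d) = sum_i P(lime | hi) P(hi | d)."""
    return sum(q * w for q, w in zip(p_lime, posteriors(num_limes_observed)))

for n in range(11):
    print(n, [round(w, 3) for w in posteriors(n)], round(predict_next_lime(n), 3))
```

Running it shows the posterior mass shifting toward h5 (the all-lime bag) as more lime candies are observed, which is the dominance effect described above.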
12 The Price for Being Optimal
- For real learning problems the hypothesis space is large, possibly infinite
- The summation / integration over hypotheses cannot be carried out
- Resort to approximate or simplified methods
13 Maximum A Posteriori
- Common approximation method: make predictions based on the single most probable hypothesis
- I.e., take the hi that maximizes P(hi | d)
- Such a MAP hypothesis is approximately Bayesian, i.e., P(X | d) ≈ P(X | hMAP); the more evidence, the better the approximation (a small sketch follows below)
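A small sketch of the MAP shortcut, assuming the posteriors() helper and p_lime list from the candy sketch above:

```python
# Minimal sketch of a MAP prediction: pick the single most probable
# hypothesis and predict with it alone (reuses the candy sketch above).
def predict_next_lime_map(num_limes_observed):
    post = posteriors(num_limes_observed)
    i_map = max(range(len(post)), key=lambda i: post[i])   # argmax_i P(hi | d)
    return p_lime[i_map]                                    # P(lime | h_MAP)
```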
14 Hypothesis Prior
- Both in Bayesian learning and in MAP learning, the hypothesis prior plays an important role
- If the hypothesis space is too expressive, overfitting can occur (cf. Chapter 18)
- The prior is used to penalize complexity instead of explicitly limiting the space: the more complex the hypothesis, the lower its prior probability
- If enough evidence is available, a complex hypothesis will eventually be chosen if necessary
15 Maximum Likelihood Approximation
- For enough data, the prior becomes irrelevant
- Maximum likelihood (ML) learning: choose the hi that maximizes P(d | hi)
- I.e., simply get the best fit to the data
- Identical to MAP for a uniform prior P(hi)
- Also reasonable if all hypotheses are of the same complexity
- ML is the standard non-Bayesian / classical statistical learning method
16 Example
- Bag from a new manufacturer: fraction θ of red cherry candies; any θ in [0, 1] is possible
- Suppose we unwrap N candies: c cherries and ℓ = N - c limes
- Likelihood: P(d | hθ) = θ^c (1 - θ)^ℓ
- Maximize for θ using the log likelihood (worked out below)
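A short worked derivation, assuming the θ^c (1 - θ)^ℓ likelihood above:

```latex
% Log likelihood and its maximization (sketch)
L(\mathbf{d} \mid h_\theta) = \log P(\mathbf{d} \mid h_\theta)
  = c \log \theta + \ell \log(1 - \theta)
% Setting the derivative with respect to theta to zero:
\frac{dL}{d\theta} = \frac{c}{\theta} - \frac{\ell}{1 - \theta} = 0
\quad\Longrightarrow\quad
\theta = \frac{c}{c + \ell} = \frac{c}{N}
```

So the ML estimate is simply the observed fraction of cherry candies.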
17 Example 2
- Gaussian model, often denoted by N(µ, σ²)
- The log likelihood of N i.i.d. samples is L = -(N/2) log(2πσ²) - Σj (xj - µ)² / (2σ²)
- If σ is known, find the maximum likelihood estimate for µ
- If µ is known, find the maximum likelihood estimate for σ (both worked out below)
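A sketch of the two maximizations, assuming N i.i.d. samples x1, ..., xN:

```latex
% Maximum-likelihood estimates for the Gaussian (sketch)
L = -\frac{N}{2}\log(2\pi\sigma^2) - \sum_{j=1}^{N}\frac{(x_j-\mu)^2}{2\sigma^2}
% Sigma known: set dL/dmu = 0
\frac{\partial L}{\partial \mu} = \sum_{j}\frac{x_j-\mu}{\sigma^2} = 0
\;\Longrightarrow\; \mu_{ML} = \frac{1}{N}\sum_j x_j
% Mu known: set dL/dsigma^2 = 0
\frac{\partial L}{\partial \sigma^2} = -\frac{N}{2\sigma^2}
  + \frac{\sum_j (x_j-\mu)^2}{2\sigma^4} = 0
\;\Longrightarrow\; \sigma^2_{ML} = \frac{1}{N}\sum_j (x_j-\mu)^2
```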
18 Halfway Summary and Additional Remarks
- Full Bayesian learning gives the best possible predictions but is intractable
- MAP selects the single best hypothesis; the prior is still used
- Maximum likelihood assumes a uniform prior; OK for large data sets
- Choose a parameterized family of models to describe the data
- Write down the likelihood of the data as a function of the parameters
- Write down the derivative of the log likelihood w.r.t. each parameter
- Find parameter values such that the derivatives are zero
- ML estimation may be hard / impossible; modern optimization techniques help (see the sketch after this list)
- In games, data often becomes available sequentially; it is not necessary to train in one go
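A minimal sketch of numerical ML estimation via an off-the-shelf optimizer (assumes NumPy and SciPy are available; the candy likelihood is reused as a toy example):

```python
# Minimal sketch of numerical ML estimation when a closed form is not
# available or convenient: minimize the negative log likelihood.
import numpy as np
from scipy.optimize import minimize_scalar

c, N = 3, 10                      # hypothetical counts: 3 cherries in 10 candies

def neg_log_likelihood(theta):
    """Negative log likelihood of theta^c * (1 - theta)^(N - c)."""
    return -(c * np.log(theta) + (N - c) * np.log(1.0 - theta))

# Maximize the log likelihood by minimizing its negative on (0, 1).
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)                   # close to the analytic estimate c / N = 0.3
```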
19 Outline
- Bayesian learning ✓
- Maximum a posteriori and maximum likelihood learning ✓
- Instance-based learning
- Intro to neural networks
20 Instance-Based Learning
- We saw statistical learning as parameter learning, i.e., given a specific parameter-dependent family of probability models, fit it to the data by tweaking the parameters
- Often simple and effective
- Fixed complexity
- May be good when there is very little data
21 Instance-Based Learning
- We saw statistical learning as parameter learning
- Nonparametric learning methods allow the hypothesis complexity to grow with the data
- The more data we have, the wigglier the hypothesis can be
22 Nearest-Neighbor Method
- Key idea: the properties of an input point x are likely to be similar to those of points in the neighborhood of x
- E.g., classification: estimate the unknown class of x using the classes of neighboring points
- Simple, but how does one define what a neighborhood is?
- One solution: find the k nearest neighbors
- But now the problem is how to decide what 'nearest' is...
23 k Nearest-Neighbor Classification
- Check the class / output label of your k neighbors and simply take, for example, (number of neighbors having class label x) / k as the posterior probability of having class label x (see the sketch below)
- When assigning a single label, take the MAP label!
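A minimal sketch of k-nearest-neighbor classification, assuming Euclidean distance as the notion of 'nearest'; the data and labels are made up for illustration:

```python
# Minimal sketch of k-nearest-neighbor classification.
import numpy as np
from collections import Counter

def knn_class_probabilities(X_train, y_train, x_query, k=3):
    """Return {label: fraction of the k nearest neighbors with that label}."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # 'nearest' = Euclidean
    nearest = np.argsort(dists)[:k]                      # indices of k closest points
    counts = Counter(y_train[i] for i in nearest)
    return {label: n / k for label, n in counts.items()}

def knn_predict(X_train, y_train, x_query, k=3):
    """MAP label: the class with the largest estimated posterior."""
    probs = knn_class_probabilities(X_train, y_train, x_query, k)
    return max(probs, key=probs.get)

# Tiny usage example with made-up 2-D points and labels.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array(["cherry", "cherry", "lime", "lime"])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))      # -> "cherry"
```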
24 kNN Probability Density Estimation
25 Kernel Models
- Idea: put a little density function (a kernel) at every data point and take the normalized sum of these (see the sketch below)
- Somewhat similar to kNN
- Often provides comparable performance
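A minimal sketch of one-dimensional kernel density estimation, assuming a Gaussian kernel and a hand-picked bandwidth (both choices are assumptions, not from the slides):

```python
# Minimal sketch of kernel density estimation with a Gaussian kernel.
import numpy as np

def kernel_density(x, data, bandwidth=0.5):
    """Estimate p(x) as the average of Gaussian kernels centered on the data."""
    u = (x - data) / bandwidth
    kernels = np.exp(-0.5 * u**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean()            # normalized sum: each kernel integrates to 1

# Usage: density estimate at a few points from a small made-up sample.
data = np.array([0.0, 0.2, 0.3, 1.5, 1.7])
for x in (0.2, 1.0, 1.6):
    print(x, round(kernel_density(x, data), 3))
```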
26 Probability Density Estimation
27 Outline
- Bayesian learning ✓
- Maximum a posteriori and maximum likelihood learning ✓
- Instance-based learning ✓
- Intro to neural networks
28–34 Neural Networks and Games
35 So First... Neural Networks
- According to Robert Hecht-Nielsen, a neural network is simply "a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs". Simply...
- We skip the biology for now
- And provide the bare basics (a small sketch follows below)
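As a preview of those bare basics, a minimal sketch of a single layer of such processing elements (weights, bias, sigmoid activation); all numbers and sizes are illustrative, not from the slides:

```python
# Minimal sketch of one layer of simple processing elements.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(inputs, weights, bias):
    """One layer: each unit computes sigmoid(w . x + b)."""
    return sigmoid(weights @ inputs + bias)

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])               # external inputs
W = rng.normal(size=(4, 3))                   # 4 units, 3 inputs each
b = np.zeros(4)
print(layer(x, W, b))                         # activations of the 4 units
```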