Title: Artificial Intelligence
1. Artificial Intelligence
- Statistical learning methods
- Chapter 20, AIMA
- (only ANNs and SVMs)
2. Artificial neural networks
- The brain is a pretty intelligent system.
- Can we copy it?
- There are approx. 10^11 neurons in the brain.
- There are approx. 23×10^9 neurons in the male cortex (females have about 15 % fewer).
3. The simple model
- The McCulloch-Pitts model (1943)
- y = g(w0 + w1·x1 + w2·x2 + w3·x3)
- Image from Neuroscience: Exploring the Brain by Bear, Connors, and Paradiso
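A minimal sketch of the unit described above; the weights, inputs, and the Heaviside choice of g below are made-up illustration values, not taken from the slide.

```python
import numpy as np

def mcculloch_pitts(x, w, w0, g):
    """Output y = g(w0 + w1*x1 + w2*x2 + ...) of a single unit."""
    return g(w0 + np.dot(w, x))

# Illustration with a Heaviside (threshold) transfer function.
heaviside = lambda z: 1.0 if z >= 0 else 0.0
y = mcculloch_pitts(x=np.array([1.0, 0.0, 1.0]),
                    w=np.array([0.5, -0.3, 0.8]),
                    w0=-0.6,
                    g=heaviside)   # -> 1.0, since -0.6 + 0.5 + 0.8 >= 0
```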
4. Neuron firing rate
5. Transfer functions g(z)
- The logistic function
- The Heaviside function
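Written out, the standard forms of these two transfer functions (the slide only shows their plots); with the {-1, +1} representation used on the next slide, the threshold unit is typically the sign function instead.

```latex
g_{\mathrm{logistic}}(z) = \frac{1}{1 + e^{-z}},
\qquad
g_{\mathrm{Heaviside}}(z) =
\begin{cases}
1, & z \ge 0,\\
0, & z < 0.
\end{cases}
```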
6. The simple perceptron
- With the {-1, +1} output representation.
- Traditionally (early 60s) trained with perceptron learning.
7. Perceptron learning
f(n) is the desired output (target) for example x(n).
- Repeat until no errors are made anymore:
  - Pick a random example (x(n), f(n)).
  - If the classification is correct, i.e. if y(x(n)) = f(n), then do nothing.
  - If the classification is wrong, then update the parameters (η, the learning rate, is a small positive number); see the sketch below.
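A minimal sketch of perceptron learning as described above. The slide's exact update formula was shown as an image; the code assumes the classic rule w ← w + η·f(n)·x(n) (with the bias folded in as w0) for misclassified examples.

```python
import numpy as np

def perceptron_learning(X, f, eta=0.3, max_epochs=100, rng=None):
    """Perceptron learning with the {-1, +1} representation.

    X: (N, d) inputs, f: (N,) desired outputs in {-1, +1}.
    Returns the weight vector [w0, w1, ..., wd] (bias first).
    """
    rng = rng or np.random.default_rng(0)
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend a constant 1 for the bias w0
    w = np.zeros(Xb.shape[1])                   # assumed initial values (not given on the slide)
    for _ in range(max_epochs):
        errors = 0
        for n in rng.permutation(len(Xb)):      # pick examples in random order
            y = 1.0 if Xb[n] @ w >= 0 else -1.0
            if y != f[n]:                       # wrong classification -> learning action
                w += eta * f[n] * Xb[n]
                errors += 1
        if errors == 0:                         # repeat until no errors are made anymore
            break
    return w

# The AND function from the following example slides:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
f = np.array([-1, -1, -1, 1], dtype=float)
w = perceptron_learning(X, f, eta=0.3)
```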
8. Example: Perceptron learning
The AND function:

  x1  x2   f
   0   0  -1
   0   1  -1
   1   0  -1
   1   1   1

Initial values shown in the figure; learning rate η = 0.3.
9. Example: Perceptron learning
This one is correctly classified, no action.
10. Example: Perceptron learning
This one is incorrectly classified, learning action.
11. Example: Perceptron learning
This one is incorrectly classified, learning action.
12. Example: Perceptron learning
This one is correctly classified, no action.
13. Example: Perceptron learning
This one is incorrectly classified, learning action.
14. Example: Perceptron learning
This one is incorrectly classified, learning action.
15. Example: Perceptron learning
Final solution for the AND function (shown in the figure).
16. Perceptron learning
- Perceptron learning is guaranteed to find a solution in finite time, if a solution exists.
- Perceptron learning cannot be generalized to more complex networks.
- Better to use gradient descent, based on formulating an error and using differentiable functions.
17. Gradient search
The learning rate (η) is set heuristically.
(Figure: the error surface E(W); from the current point W(k), go downhill.)
W(k+1) = W(k) + ΔW(k)
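A minimal gradient-descent sketch. The slide only shows the figure and the update equation; the code assumes the usual downhill step ΔW(k) = -η·∂E/∂W evaluated at W(k).

```python
import numpy as np

def gradient_descent(grad_E, W0, eta=0.1, steps=100):
    """Iterate W(k+1) = W(k) + dW(k) with dW(k) = -eta * gradient of E at W(k)."""
    W = np.asarray(W0, dtype=float)
    for _ in range(steps):
        W = W - eta * grad_E(W)    # go downhill on the error surface E(W)
    return W

# Toy example: E(W) = ||W||^2 has gradient 2W, so the minimum is at W = 0.
W_min = gradient_descent(lambda W: 2 * W, W0=[1.0, -2.0], eta=0.1, steps=200)
```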
18. The Multilayer Perceptron (MLP)
- Combine several single-layer perceptrons.
- Each single-layer perceptron uses a sigmoid function (differentiable), e.g. the logistic function.
- (Figure: network diagram from input to output.)
- Can be trained using gradient descent.
19. Example: One hidden layer
- Can approximate any continuous function.
- q(z): sigmoid or linear (output layer),
- f(z): sigmoid (hidden layer).
20. Training: Backpropagation (gradient descent)
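A minimal backpropagation sketch for a one-hidden-layer MLP, assuming sigmoid hidden units, a linear output, a squared-error cost, and plain batch gradient descent; the slide itself only names the method, so treat this as one possible concrete instance.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, t, n_hidden=3, eta=0.5, epochs=5000, seed=0):
    """One-hidden-layer MLP trained with backpropagation (gradient descent).

    X: (N, d) inputs, t: (N,) targets. Sigmoid hidden units, linear output,
    squared-error cost E = 0.5 * sum((y - t)^2).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    N, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d + 1, n_hidden))   # input -> hidden (bias row first)
    W2 = rng.normal(scale=0.5, size=(n_hidden + 1, 1))   # hidden -> output (bias row first)
    Xb = np.hstack([np.ones((N, 1)), X])
    for _ in range(epochs):
        # Forward pass
        h = sigmoid(Xb @ W1)                               # hidden activations
        hb = np.hstack([np.ones((N, 1)), h])
        y = hb @ W2                                        # linear output
        # Backward pass (gradients of E)
        delta_out = y - t                                  # dE/dy for squared error
        grad_W2 = hb.T @ delta_out
        delta_hid = (delta_out @ W2[1:].T) * h * (1 - h)   # backpropagate through the sigmoid
        grad_W1 = Xb.T @ delta_hid
        # Gradient-descent update
        W1 -= eta * grad_W1 / N
        W2 -= eta * grad_W2 / N
    return W1, W2
```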
21. Support vector machines
22. Linear classifier on a linearly separable problem
There are infinitely many lines that have zero training error. Which line should we choose?
23. Linear classifier on a linearly separable problem
There are infinitely many lines that have zero training error. Which line should we choose?
Choose the line with the largest margin: the large margin classifier.
(Figure: the margin.)
24. Linear classifier on a linearly separable problem
There are infinitely many lines that have zero training error. Which line should we choose?
Choose the line with the largest margin: the large margin classifier.
(Figure: the margin and the support vectors.)
25. Computing the margin
The plane separating the two classes is defined by wᵀx + a = 0. The dashed planes (through the closest points of each class) are given by wᵀx + a = ±b.
(Figure: the two classes, the weight vector w, and the margin.)
26. Computing the margin
Divide by b. Define new w = w/b and a = a/b, so the dashed planes become wᵀx + a = ±1.
(Figure: the weight vector w and the margin.)
We have thereby defined a scale for w and a.
27. Computing the margin
We have wᵀx + a = -1 at a point x on one dashed plane and wᵀ(x + λw) + a = +1 at the point x + λw straight across the margin, which gives λ·wᵀw = 2 and hence a margin of ‖λw‖ = 2/‖w‖.
(Figure: the points x and x + λw spanning the margin.)
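Written out, the margin computation sketched on the three slides above (my reconstruction; the slides' own equations were shown in the figures):

```latex
\begin{aligned}
 w^{T}x + a &= -1, \\
 w^{T}(x + \lambda w) + a &= +1
 \;\Rightarrow\; \lambda\, w^{T}w = 2
 \;\Rightarrow\; \lambda = \frac{2}{\lVert w\rVert^{2}}, \\[4pt]
 \text{margin} &= \lVert \lambda w \rVert = \frac{2}{\lVert w \rVert}.
\end{aligned}
```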
28. Linear classifier on a linearly separable problem
Maximizing the margin is equal to minimizing ‖w‖ subject to the constraints
  wᵀx(n) + a ≥ +1 for all n with f(n) = +1,
  wᵀx(n) + a ≤ -1 for all n with f(n) = -1.
This is a quadratic programming problem; the constraints can be included with Lagrange multipliers.
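Using the class labels f(n) = ±1 from the earlier slides, the two constraints combine into the standard quadratic program (a common rewriting, not verbatim from the slide):

```latex
\min_{w,\,a}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad \text{subject to} \quad
f(n)\left(w^{T}x(n) + a\right) \ge 1 \quad \text{for all } n .
```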
29. Quadratic programming problem
Minimize the cost (Lagrangian) Lp.
The minimum of Lp occurs at the maximum of LD (the Wolfe dual).
Only scalar products in the cost. IMPORTANT!
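The standard primal Lagrangian and Wolfe dual for this problem, reconstructed here since the slide's equations were shown as images (α(n) ≥ 0 are the Lagrange multipliers):

```latex
\begin{aligned}
L_{p} &= \tfrac{1}{2}\lVert w\rVert^{2}
        - \sum_{n} \alpha(n)\left[f(n)\left(w^{T}x(n) + a\right) - 1\right], \\
L_{D} &= \sum_{n} \alpha(n)
        - \tfrac{1}{2}\sum_{n}\sum_{m} \alpha(n)\,\alpha(m)\,f(n)\,f(m)\; x(n)^{T}x(m),
\end{aligned}
\qquad \text{with } \alpha(n) \ge 0,\ \ \sum_{n}\alpha(n) f(n) = 0 .
```

Note that the training data enter LD only through the scalar products x(n)ᵀx(m), which is exactly what the kernel trick two slides below exploits.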
30. Linear Support Vector Machine
Test phase: the predicted output is given by the expression below, where a is determined e.g. by looking at one of the support vectors. Still only scalar products in the expression.
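The standard form of the predicted output the slide refers to (reconstruction; the sum effectively runs over the support vectors, i.e. the examples with α(n) > 0):

```latex
y(x) = \operatorname{sign}\!\left(\sum_{n} \alpha(n)\, f(n)\, x(n)^{T}x \;+\; a\right)
```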
31. How to deal with the nonlinear case?
- Project the data into a high-dimensional space Z. There we know that it will be linearly separable (due to the VC dimension of the linear classifier).
- We don't even have to know the projection...!
32. Scalar product kernel trick
If we can find a kernel K that equals the scalar product of the projected points (see below), then we don't even have to know the mapping to solve the problem...
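The condition sketched on the slide, written out with z(·) denoting the (possibly unknown) projection into Z (my reconstruction of the equation shown as an image):

```latex
K\!\left(x(i), x(j)\right) = z\!\left(x(i)\right)^{T} z\!\left(x(j)\right)
```

Every x(n)ᵀx(m) in the Wolfe dual and in the decision function can then simply be replaced by K(x(n), x(m)).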
33. Valid kernels (Mercer's theorem)
Define the matrix K with entries Kij = K(x(i), x(j)).
If K is symmetric, K = Kᵀ, and positive semi-definite, then K(x(i), x(j)) is a valid kernel.
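A small sketch of this check on toy data, assuming a Gaussian kernel: build the Gram matrix and verify symmetry and positive semi-definiteness via its eigenvalues.

```python
import numpy as np

def gram_matrix(X, kernel):
    """Gram matrix K_ij = kernel(x_i, x_j) for the rows of X."""
    return np.array([[kernel(xi, xj) for xj in X] for xi in X])

gaussian = lambda x, y, gamma=1.0: np.exp(-gamma * np.sum((x - y) ** 2))

X = np.random.default_rng(0).normal(size=(5, 2))        # toy data
K = gram_matrix(X, gaussian)
symmetric = np.allclose(K, K.T)
psd = np.all(np.linalg.eigvalsh(K) >= -1e-10)           # PSD up to numerical noise
```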
34. Examples of kernels
First, the Gaussian kernel. Second, the polynomial kernel; with d = 1 we have the linear SVM. The linear SVM is often used with good success on high-dimensional data (e.g. text classification).
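Sketches of the two kernels named above. Their exact parameterizations on the slide were shown as images, so these are the common textbook forms (γ is the Gaussian width parameter, d the polynomial degree; both names are my choice here).

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def polynomial_kernel(x, y, d=2, c=1.0):
    """Polynomial kernel: (x.y + c)^d; with d = 1 this reduces to the linear case."""
    return (np.dot(x, y) + c) ** d
```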
35. Example: Robot color vision (Competition 1999)
Classify the Lego pieces into red, blue, and yellow. Classify white balls, black sideboard, and green carpet.
36. What the camera sees (RGB space)
(Figure: the pixel clusters in RGB space, labelled Yellow, Red, and Green.)
37. Mapping RGB (3D) to rgb (2D)
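The mapping itself was shown as an image; a common choice for normalized rgb, and presumably what is meant here, is chromaticity normalization, dividing each channel by the intensity sum and keeping two of the resulting coordinates:

```python
def rgb_normalize(R, G, B):
    """Map RGB (3D) to normalized rg chromaticity (2D): r = R/(R+G+B), g = G/(R+G+B)."""
    s = R + G + B + 1e-12          # avoid division by zero for black pixels
    return R / s, G / s            # b = B/s is redundant since r + g + b = 1
```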
38. Lego in normalized rgb space
Input is 2D (x1, x2 in the figure).
Output is 6D: red, blue, yellow, green, black, white.
39. MLP classifier
E_train = 0.21, E_test = 0.24
2-3-1 MLP, Levenberg-Marquardt
Training time (150 epochs): 51 seconds
40. SVM classifier
E_train = 0.19, E_test = 0.20
SVM with γ = 1000
Training time: 22 seconds