Transcript and Presenter's Notes

Title: COMP3170 Machine Learning


1
COMP3170 Machine Learning
  • Tutorial

2
Q (Lecture 1)
  • Describe informally, in one paragraph of English,
    the task of learning to recognize handwritten
    numerical digits.

3
  • A sample answer
  • Construct a learning system F(W), based on the
    training digit examples E(X, D), that predicts the
    classification F(W, X) of a new, unknown input
    digit X.

4
  • 2. Describe the various steps involved in
    designing a learning system to perform the task
    of question 1, giving as much detail as possible
    about the tasks that have to be performed in each
    step.

5
  • A sample answer
  • The main steps are given below; please refer to
    the details in lecture note 1 (pp. 10-21).
  • Step 1: Collect training examples.
  • Step 2: Represent the experience by a certain
    representation scheme.
  • Step 3: Choose a representation for the black
    box (F).
  • Step 4: Learn/adjust the parameters of F.
  • Step 5: Use/test the system.

6
  • 3. For the tasks of learning to recognize human
    faces and fingerprints respectively, redo
    questions 1 and 2.

7
  • 4. In the lecture, we used a very long binary
    vector to represent the handwritten digits. Can
    you think of other representation methods?

8
  • One possible approach by projection

9
Q (Lecture 2)
  • What are the weight values of a perceptron having
    the following decision surfaces?

10
  • (a)
  • (b)

11
  • 2. Design two-input perceptrons implementing the
    following boolean functions:
  • AND, OR, NAND, NOR

12
  • AND (one possible weight choice is checked in the
    sketch below)
  • X1 X2 D
  • -1 -1 -1
  • -1 1 -1
  • 1 1 1
  • 1 -1 -1
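One possible AND solution (the weights below are an assumed example, not given on the slide): w1 = w2 = 1 with bias b = -1 and a sign activation. A minimal sketch checking this choice against the truth table:

```python
# Sketch: verify that w1 = w2 = 1, b = -1 implements AND with a sign activation
# (an assumed, not unique, choice of weights).

def perceptron(x1, x2, w1=1.0, w2=1.0, b=-1.0):
    """Two-input perceptron: output +1 if w1*x1 + w2*x2 + b >= 0, else -1."""
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else -1

# Truth table from the slide: (x1, x2, desired)
and_table = [(-1, -1, -1), (-1, 1, -1), (1, 1, 1), (1, -1, -1)]

for x1, x2, d in and_table:
    y = perceptron(x1, x2)
    print(f"x1={x1:+d} x2={x2:+d} desired={d:+d} output={y:+d}")
```

Under the same assumptions, OR is obtained with b = +1, NAND by negating the AND weights and bias, and NOR by negating the OR weights and bias.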

13
  • 3. A single-layer perceptron is incapable of
    learning simple functions such as XOR (exclusive
    OR). Explain why this is the case (hint: use the
    decision boundary).

14
  • X1 X2 D
  • -1 -1 -1
  • -1 1 1
  • 1 1 -1
  • 1 -1 1
  • No separating hyperplane exists in this case!

15
  • A single-layer perceptron is as follows
  • Write down and plot the equation of the decision
    boundary of this device
  • Change the values of w1 and w2 so that the
    perceptron can separate the following two-class
    patterns
  • Class 1 patterns: (1, 2), (1.5, 2.5), (1, 3)
  • Class 2 patterns: (2, 1.5), (2, 1)

16
  • Class 1: W1 + 2W2 > 0, 1.5W1 + 2.5W2 > 0, W1 + 3W2 > 0
  • ⇒ W1 > -2W2, W1 > -(5/3)W2, W1 > -3W2
  • ⇒ W1 > -(5/3)W2
  • Class 2: 2W1 + 1.5W2 < 0, 2W1 + W2 < 0
  • ⇒ W1 < -(1.5/2)W2, W1 < -(1/2)W2
  • ⇒ W1 < -(1.5/2)W2
  • ⇒ -(5/3)W2 < W1 < -(1.5/2)W2 (taking W2 > 0)
  • e.g. W2 = 1 gives -5/3 < W1 < -3/4, so (W1, W2) = (-1, 1)
    is one valid choice (checked in the sketch below)
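A quick numeric check of the assumed choice (W1, W2) = (-1, 1), using the decision boundary w1*x1 + w2*x2 = 0 from the question (no bias term):

```python
# Sketch: check that the assumed weights w1 = -1, w2 = 1 separate the two classes.
w1, w2 = -1.0, 1.0

class1 = [(1, 2), (1.5, 2.5), (1, 3)]   # should give w1*x1 + w2*x2 > 0
class2 = [(2, 1.5), (2, 1)]             # should give w1*x1 + w2*x2 < 0

for x1, x2 in class1:
    print((x1, x2), w1 * x1 + w2 * x2, "(> 0 expected)")
for x1, x2 in class2:
    print((x1, x2), w1 * x1 + w2 * x2, "(< 0 expected)")
```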

17
Perceptron pseudo-code
  • Pseudo code (a runnable Python sketch follows)
  • Input: X, delta, max_iteration, Error_criterion
  • Output: w
  • Begin
  •   Initialize w
  •   Do loop max_iteration
  •     For each data x:
  •       step 1: calculate R (the output)
  •       step 2: update w
  •     error(iteration) = (# of misclassified data) / (# of data)
  •     if error < Error_criterion then Break
  •   Return w
  • End
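A runnable Python sketch of the pseudo-code above. Assumed details not fixed by the slide: a sign activation, labels in {-1, +1}, the update w <- w + delta * d * x applied only to misclassified samples, and a bias handled by appending a constant +1 input.

```python
import numpy as np

def train_perceptron(X, D, delta=0.1, max_iteration=100, error_criterion=0.0):
    """Perceptron training following the pseudo-code above.
    X: (N, n_inputs) data, D: (N,) desired outputs in {-1, +1}; returns w."""
    X = np.hstack([X, np.ones((len(X), 1))])          # append bias input of +1
    w = np.zeros(X.shape[1])                          # initialize w
    for iteration in range(max_iteration):
        misclassified = 0
        for x, d in zip(X, D):
            r = 1.0 if w @ x >= 0 else -1.0           # step 1: calculate R (output)
            if r != d:                                # step 2: update w
                w += delta * d * x
                misclassified += 1
        error = misclassified / len(X)                # (# misclassified) / (# of data)
        if error < error_criterion:
            break
    return w

# Example: learn AND from the truth table used earlier
X = np.array([[-1, -1], [-1, 1], [1, 1], [1, -1]], dtype=float)
D = np.array([-1, -1, 1, -1], dtype=float)
print(train_perceptron(X, D, delta=0.1, max_iteration=50, error_criterion=1e-3))
```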

18
Q (Lecture 3)
  • The k-nearest neighbor classifier has to store all
    training data, creating a high storage
    requirement. Can you think of ways to reduce the
    storage requirement without affecting the
    performance? (Hint: search the Internet; you will
    find many approximation methods.)

19
  • Use a training-set size reduction scheme (select a
    few prototypes); one such scheme is sketched below
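One well-known scheme of this kind is condensed nearest neighbour (Hart's rule), which keeps only the prototypes needed for 1-NN to still classify every training sample correctly. A minimal sketch, assuming numeric feature vectors:

```python
import numpy as np

def condensed_nn(X, y):
    """Hart's condensed nearest neighbour: retain a subset of prototypes so that
    1-NN over the retained set still classifies every training sample correctly."""
    keep = [0]                                    # start with one stored sample
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            # classify sample i with 1-NN over the currently kept prototypes
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            nearest = keep[int(np.argmin(d))]
            if y[nearest] != y[i] and i not in keep:
                keep.append(i)                    # absorb misclassified samples
                changed = True
    return X[keep], y[keep]

# Example usage (toy data)
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
Xp, yp = condensed_nn(X, y)
print(len(Xp), "prototypes kept out of", len(X))
```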

20
Q (Lecture 4)
  • An application of the k-means algorithm to a
    2-dimensional feature space has produced the
    following three cluster prototypes: M1 = (1, 2),
    M2 = (2, 1) and M3 = (2, 2). Determine which
    cluster each of the following feature vectors will
    be classified into (hint: minimum Euclidean
    distance; the distances are computed in the sketch
    below).
  • (i) X1 = (1, 1)
  • (ii) X2 = (2, 3)
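A minimal sketch of the distance computation (note that X1 = (1, 1) is equidistant from M1 and M2, so its assignment depends on the tie-breaking rule):

```python
import numpy as np

# Cluster prototypes and feature vectors from the question
M = {"M1": np.array([1.0, 2.0]), "M2": np.array([2.0, 1.0]), "M3": np.array([2.0, 2.0])}
X = {"X1": np.array([1.0, 1.0]), "X2": np.array([2.0, 3.0])}

for xname, x in X.items():
    d = {mname: float(np.linalg.norm(x - m)) for mname, m in M.items()}
    nearest = min(d, key=d.get)        # minimum Euclidean distance (ties -> first)
    print(xname, {k: round(v, 3) for k, v in d.items()}, "-> assigned to", nearest)
```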

21
  • The k-means algorithm has been shown to minimize
    the following cost function. Derive an online
    version of the k-means algorithm following similar
    ideas to the delta rule. In this case the
    prototype is updated every time a training sample
    is presented; this is in contrast to the (batch)
    k-means algorithm, where the prototypes are
    updated only after all samples have been
    presented.

22
Online K-means
  • 1. Initialize K group centroids.
  • 2. For each sample:
  •    a. Assign the sample to the group that has the
       closest centroid.
  •    b. Recalculate the position of the assigned
       centroid: m_k ← m_k + delta·(x - m_k), as in
       the sketch below.
  • 3. Repeat Step 2 until the centroids no longer
    move or the maximum iteration number has been
    reached.
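A minimal Python sketch of this online version, assuming the delta-rule-style step m_k <- m_k + delta * (x - m_k) with a small learning rate delta:

```python
import numpy as np

def online_kmeans(X, K, delta=0.1, max_iteration=100, tol=1e-4):
    """Online k-means: the closest centroid moves towards each sample as it arrives."""
    rng = np.random.default_rng(0)
    M = X[rng.choice(len(X), K, replace=False)].copy()         # 1. initialize K centroids
    for _ in range(max_iteration):
        M_old = M.copy()
        for x in X:                                            # 2. for each sample
            k = int(np.argmin(np.linalg.norm(M - x, axis=1)))  # 2a. closest centroid
            M[k] += delta * (x - M[k])                         # 2b. delta-rule update
        if np.linalg.norm(M - M_old) < tol:                    # 3. centroids settled?
            break
    return M

# Example usage on two well-separated clusters
X = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]], dtype=float)
print(online_kmeans(X, K=2))
```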

23
Q (Lecture 5)
  • Derive a gradient descent training rule for a
    single unit with output y (hint: batch)

24
(No Transcript)
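A standard batch derivation sketch, assuming a linear unit y_n = w·x_n and a squared-error cost over the whole training set:

```latex
% Batch gradient descent for a single linear unit (assumption: y_n = w^T x_n)
\begin{align*}
E(\mathbf{w}) &= \tfrac{1}{2}\sum_{n}\left(d_n - y_n\right)^2,
  \qquad y_n = \mathbf{w}^{\top}\mathbf{x}_n \\
\frac{\partial E}{\partial w_i} &= -\sum_{n}\left(d_n - y_n\right)x_{n,i} \\
\Delta w_i &= -\eta\,\frac{\partial E}{\partial w_i}
            = \eta\sum_{n}\left(d_n - y_n\right)x_{n,i}
\end{align*}
```

With a sigmoid output y_n = σ(wᵀxₙ), the same steps simply add a factor y_n(1 - y_n) inside the sum.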
25
  • 2. A network consisting of two ADALINE units, N1
    and N2, is shown as follows. Derive a delta
    training rule for all the weights (hint: online)

26
(No Transcript)
27
  • 3. The connection weights of a two-input ADALINE
    at time n have the following values:
  • w0(n) = -0.5, w1(n) = 0.1, w2(n) = -0.3
  • The training sample at time n is:
  • x1(n) = 0.6, x2(n) = 0.8
  • The corresponding desired output is d(n) = 1
  • a) Based on the Least-Mean-Square (LMS) algorithm,
    derive the learning equations for each weight at
    time n
  • b) Assuming a learning rate of 0.1, compute the
    weights at time (n+1):
  • w0(n+1), w1(n+1), and w2(n+1).

28
  • Similar to the last question (omitted); a numeric
    sketch follows
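A minimal numeric sketch for question 3, assuming a linear ADALINE output with the bias treated as weight w0 on a constant input x0 = +1, and the LMS update w_i(n+1) = w_i(n) + eta * e(n) * x_i(n):

```python
# LMS (Widrow-Hoff) update for the two-input ADALINE of question 3.
# Assumptions: linear output, bias input x0 = +1, learning rate eta = 0.1.
w = [-0.5, 0.1, -0.3]          # w0(n), w1(n), w2(n)
x = [1.0, 0.6, 0.8]            # x0 = 1 (bias), x1(n), x2(n)
d, eta = 1.0, 0.1

y = sum(wi * xi for wi, xi in zip(w, x))      # y(n) = w0 + w1*x1 + w2*x2 = -0.68
e = d - y                                     # e(n) = d(n) - y(n) = 1.68
w_next = [wi + eta * e * xi for wi, xi in zip(w, x)]
print("y(n) =", y, " e(n) =", e)
print("w(n+1) =", w_next)      # approximately [-0.332, 0.2008, -0.1656]
```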

29
Q (Lecture 6)
  • Assume that a system uses a three-layer
    perceptron neural network to recognize 10
    hand-written digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
    Each digit is represented by a 9 × 9-pixel binary
    image, and therefore each sample is represented by
    an 81-dimensional binary vector. The network uses
    10 neurons in the output layer, each of which
    signifies one of the digits. The network uses 120
    hidden neurons. Each hidden neuron and output
    neuron also has a bias input.
  • (i) How many connection weights does the network
    contain?
  • (ii) For the training samples from each of the 10
    digits, write down their possible corresponding
    desired output vectors.
  • (iii) Describe briefly how the backpropagation
    algorithm can be applied to train the network.
  • (iv) Describe briefly how a trained network will
    be applied to recognize an unknown input.

30
  • (i) (81 × 120 + 120) + (120 × 10 + 10) = 9840 +
    1210 = 11050 (checked in the sketch below)
  • (ii) 0000000001 → digit 0
  •      0000000010 → digit 1
  •      0000000100 → digit 2
  •      ...
  •      1000000000 → digit 9
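A quick check of the weight count and of one possible target coding (following the slide's convention, where digit k has its 1 in position k counting from the right):

```python
import numpy as np

# (i) Connection weights of an 81-120-10 MLP, counting the bias of every
#     hidden and output neuron as an extra weight.
n_in, n_hidden, n_out = 81, 120, 10
n_weights = (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)
print(n_weights)                                   # 9840 + 1210 = 11050

# (ii) One possible desired-output coding: digit k -> 1 in position k from the right.
targets = np.eye(n_out, dtype=int)[::-1]
for k in (0, 1, 2, 9):
    print("".join(map(str, targets[k])), "->", k)
```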

31
  • (iii)

32
  • (iv) Classification is based on the output y(k) of
    the multilayer perceptron given the unknown input:
    the digit whose output neuron responds most
    strongly is chosen.
33
  • The network shown in the Figure is a 3-layer
    feed-forward network. Neuron 1, Neuron 2 and
    Neuron 3 are McCulloch-Pitts neurons which use a
    threshold function as their activation function.
    All the connection weights and the biases of
    Neuron 1 and Neuron 2 are shown in the Figure.
    Find an appropriate value for the bias of Neuron
    3, b3, to enable the network to solve the XOR
    problem (assume bits 0 and 1 are represented by
    levels 0 and 1, respectively). Show your working.

34
(No Transcript)
35
  • Considering case 2, we conclude that b3 should be
    chosen in the range (-0.5, 0).

36
  • Consider a 3-layer perceptron with two inputs a
    and b, one hidden unit c and one output unit d.
    The network has five weights, which are all
    initialized to 0.1. Give their values after the
    presentation of each of the following training
    samples:
  • Input            Desired output
  • a = 1, b = 0     1
  • a = 0, b = 1     0

37
  • Y = sigmoid(sigmoid(0.1a + 0.1b + 0.1·1)·0.1 + 0.1)
  • Output: delta_O = y(1 - y)(d - y)
  • Hidden: delta_H = sigmoid(0.1a + 0.1b + 0.1·1)·
    [1 - sigmoid(0.1a + 0.1b + 0.1·1)]·0.1·delta_O

38
  • W_cd = 0.1 + eta·delta_O·sigmoid(0.1a + 0.1b + 0.1·1)
  • W_ac = 0.1 + eta·delta_H·a
  • W_bc = 0.1 + eta·delta_H·b
  • (one full training step is sketched below)
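A minimal sketch of the two sample presentations, assuming a logistic sigmoid, bias inputs fixed at +1, and an assumed learning rate eta = 0.1; the two biases are updated in the same way even though the slide lists only W_cd, W_ac and W_bc:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(a, b, d, w_ac, w_bc, b_c, w_cd, b_d, eta=0.1):
    """One online backpropagation step for the a,b -> c -> d network above."""
    c = sigmoid(w_ac * a + w_bc * b + b_c)        # hidden output
    y = sigmoid(w_cd * c + b_d)                   # network output
    delta_o = y * (1 - y) * (d - y)               # output delta
    delta_h = c * (1 - c) * w_cd * delta_o        # hidden delta (uses old w_cd)
    w_cd += eta * delta_o * c                     # W_cd update
    b_d  += eta * delta_o                         # output bias update
    w_ac += eta * delta_h * a                     # W_ac update
    w_bc += eta * delta_h * b                     # W_bc update
    b_c  += eta * delta_h                         # hidden bias update
    return w_ac, w_bc, b_c, w_cd, b_d

# All five weights start at 0.1; present the two samples in turn
params = (0.1, 0.1, 0.1, 0.1, 0.1)
params = train_step(1, 0, 1, *params)             # a=1, b=0, desired 1
print(params)
params = train_step(0, 1, 0, *params)             # a=0, b=1, desired 0
print(params)
```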

39
Q (Lecture 7)
  • Customers' responses to a market survey are as
    follows. The attributes are age, which takes the
    values young (Y), middle age (M) and old (O);
    income, which can be low (L) or high (H); and
    ownership of a credit card, which can be yes (Y)
    or no (N). Design a Naïve Bayesian classifier to
    decide whether customer David will respond or not.

40
  • David's response

41
  • P(response = Y) · P(Age = M | response = Y) ·
    P(Income = L | response = Y) · P(Credit = Y | response = Y)
    = 0.4 × 0.2 × 0.3 × 0.7 = 0.0168
  • P(response = N) · P(Age = M | response = N) ·
    P(Income = L | response = N) · P(Credit = Y | response = N)
    = 0.6 × 0.33 × 0.42 × 0.42 ≈ 0.0349
  • 0.0349 > 0.0168 (the comparison is reproduced in
    the sketch below)
  • ⇒ David's response should be NO!
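A minimal check of the comparison, using the probability values read from the slide:

```python
# Naive Bayes score for each class: prior times the product of the
# class-conditional probabilities of David's attribute values (from the slide).
p_yes = 0.4 * 0.2 * 0.3 * 0.7     # P(Y) * P(Age=M|Y) * P(Income=L|Y) * P(Credit=Y|Y)
p_no  = 0.6 * 0.33 * 0.42 * 0.42  # P(N) * P(Age=M|N) * P(Income=L|N) * P(Credit=Y|N)
print(f"score(respond=Yes) = {p_yes:.4f}")   # 0.0168
print(f"score(respond=No)  = {p_no:.4f}")    # 0.0349
print("Predicted response:", "Yes" if p_yes > p_no else "No")
```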