Discriminant Analysis - PowerPoint PPT Presentation

Provided by: lisa285

Title: Discriminant Analysis


1
Example 8.8
  • Discriminant Analysis

2
DISCRIM.XLS
  • This file contains the annual income and size of
    investment portfolio (both in thousands of
    dollars) for 84 people.
  • It also indicates whether each of these people
    subscribes or does not subscribe to the Wall
    Street Journal.
  • Using income and size of investment portfolio,
    determine a classification rule that maximizes
    the number of people correctly classified as
    subscribers or nonsubscribers.

3
Solution
  • The model is actually simpler than the one for
    cluster analysis.
  • Using appropriate weights, we create a
    discriminant score for each of the 84 customers.
  • Then based on a cutoff score, we classify each
    customer as a subscriber or nonsubscriber, and we
    tally the number of correct classifications.

4
Developing the Model
  • The model appears on the next slide. It can be
    formed as follows.
  • Customer data. Enter the customer data in the
    shaded range. This includes the data on the
    variables used for classification (income and
    investment amount), as well as an indication of
    which group each customer is in. These 84
    customers represent the training sample, so we
    know which group each of them is in.
  • Decision variables. The decision variables are
    the weights used to form discriminant scores and
    the cutoff value for classification. Enter any
    values for these in the Weights and Cutoff ranges.

6
Developing the Model -- continued
  • Discriminant scores. Each discriminant score
    is a weighted combination of the person's income
    and investment amount. To calculate these in
    column E, enter the formula
    =SUMPRODUCT(Weights,B12:C12) in cell E12 and copy
    it down.
  • Classifications. We will classify a person as a
    nonsubscriber if the person's discriminant score
    is below the cutoff value and as a subscriber
    otherwise. Therefore, enter the formula
    =IF(E12<Cutoff,"No","Yes") in cell F12 and copy
    it down.
  • Correct? Check whether each classification is
    correct by entering the formula
    =IF(D12=F12,"Yes","No") in cell G12 and copying
    it down.
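
The three spreadsheet columns described above (score, classification, correct?) can be sketched in Python. This is only an illustration: the customer rows, weights, and cutoff below are invented values, not data from DISCRIM.XLS, and the function names are hypothetical.

```python
# Hypothetical training rows: (income, investment, actual group).
# These numbers are invented for illustration; they are NOT from DISCRIM.XLS.
customers = [
    (60.0, 20.0, "No"),
    (45.0, 80.0, "Yes"),
    (90.0, 55.0, "Yes"),
    (30.0, 10.0, "No"),
]
weights = (0.2, 0.9)  # arbitrary trial values, like the Weights range
cutoff = 50.0         # arbitrary trial value, like the Cutoff cell


def discriminant_score(income, investment, weights):
    """Weighted combination, mirroring SUMPRODUCT(Weights, income:investment)."""
    return weights[0] * income + weights[1] * investment


def classify(score, cutoff):
    """Nonsubscriber if the score is below the cutoff, subscriber otherwise."""
    return "No" if score < cutoff else "Yes"


rows = []
for income, investment, actual in customers:
    score = discriminant_score(income, investment, weights)
    predicted = classify(score, cutoff)
    correct = "Yes" if actual == predicted else "No"  # the "Correct?" column
    rows.append((score, predicted, correct))

for row in rows:
    print(row)
```

Each tuple in `rows` corresponds to one spreadsheet row in columns E, F, and G.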

7
Developing the Model -- continued
  • Tallies. It is customary to tally the
    classifications in a classification matrix. Do
    this in the range B99:C100 by entering the
    formulas =COUNTIF(YesGroup,"Yes") and
    =COUNTA(YesGroup)-B99 in cells B99 and C99. Note
    that the YesGroup range contains the
    classifications for all subscribers. Then enter
    similar formulas in cells B100 and C100. These
    are based on the NoGroup range. Finally,
    calculate the objective, the percentage of all 84
    people classified correctly, in the PctCorrect
    cell with the formula
    =SUM(B99,C100)/SUM(B99:C100).
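
A minimal Python sketch of the classification matrix and the PctCorrect calculation follows. It assumes the matrix layout implied by the formulas (rows are actual groups, columns are "classified Yes" and "classified No"); the sample labels and function names are invented for illustration.

```python
def classification_matrix(actual, predicted):
    """Return ((yes_as_yes, yes_as_no), (no_as_yes, no_as_no)).

    Assumed layout: rows = actual group, columns = classified Yes / No.
    """
    yes_group = [p for a, p in zip(actual, predicted) if a == "Yes"]
    no_group = [p for a, p in zip(actual, predicted) if a == "No"]
    yes_as_yes = yes_group.count("Yes")      # like COUNTIF(YesGroup,"Yes")
    yes_as_no = len(yes_group) - yes_as_yes  # like COUNTA(YesGroup)-B99
    no_as_yes = no_group.count("Yes")        # same idea on the NoGroup range
    no_as_no = len(no_group) - no_as_yes
    return (yes_as_yes, yes_as_no), (no_as_yes, no_as_no)


def pct_correct(matrix):
    """Diagonal total divided by the grand total of the matrix."""
    (yy, yn), (ny, nn) = matrix
    return (yy + nn) / (yy + yn + ny + nn)


# Invented labels, only to exercise the functions.
actual = ["Yes", "Yes", "Yes", "No", "No", "No"]
predicted = ["Yes", "No", "Yes", "No", "No", "Yes"]
matrix = classification_matrix(actual, predicted)
rate = pct_correct(matrix)
print(matrix, rate)
```

The diagonal entries (subscribers classified as subscribers, nonsubscribers classified as nonsubscribers) are the correct classifications, which is why the objective sums only those two cells in the numerator.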

8
Using the Evolutionary Solver
  • First, note that the Evolutionary Solver is
    required because of the IF (and COUNTIF and
    COUNTA) functions used to make (and tally) the
    classifications.
  • The completed Solver dialog box appears here.
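
One way to see why IF-based tallies are awkward for a smooth optimizer: the fraction classified correctly is a step function of the cutoff, so small changes usually leave it unchanged. The sketch below illustrates this on invented scores (not the DISCRIM.XLS data); all names are hypothetical.

```python
# Invented (discriminant score, is_subscriber) pairs, for illustration only.
scores_and_groups = [
    (15.0, False),
    (30.0, False),
    (42.0, True),
    (55.0, False),
    (67.5, True),
    (81.0, True),
]


def fraction_correct(cutoff):
    """Share of people whose classification (score >= cutoff) is right."""
    hits = sum(
        (score >= cutoff) == is_sub for score, is_sub in scores_and_groups
    )
    return hits / len(scores_and_groups)


# Evaluate the objective on a fine grid of cutoffs: 0.0, 0.1, ..., 100.0.
cutoffs = [c / 10 for c in range(0, 1001)]
rates = [fraction_correct(c) for c in cutoffs]

# Only a handful of distinct objective values occur, and the objective
# changes only where the cutoff crosses one of the scores.
distinct_rates = sorted(set(rates))
changes = sum(1 for a, b in zip(rates, rates[1:]) if a != b)
print(len(cutoffs), "cutoffs tried;", changes, "changes in the objective")
```

Between jumps the objective is flat, so there is no slope for a gradient-based method to follow.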

9
Using the Evolutionary Solver -- continued
  • It is straightforward except for the lower and
    upper limits on the changing cells.
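  • There are no natural weights or cutoff values to
    use. However, we can always constrain the weights
    to be between -1 and 1, because multiplying all
    weights and the cutoff by the same positive
    constant does not change any classification.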
  • To obtain lower and upper limits on the cutoff
    value, we first calculated the maximum sum of
    income and investment amount for any customer,
    which is slightly less than 160.

10
Using the Evolutionary Solver -- continued
  • This means that the largest discriminant score,
    using weights of 1, is no more than 160.
  • Therefore, there is no need to consider cutoff
    values below -160 or above 160.
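
To illustrate how these bounds define the search space, here is a rough sketch that uses plain random search as a stand-in for the Evolutionary Solver. It is not the Solver's algorithm. The data rows are invented, and the bounds of -1 to 1 on the weights and -160 to 160 on the cutoff are taken from the reasoning above.

```python
import random

# Invented (income, investment, is_subscriber) rows; NOT from DISCRIM.XLS.
# Every income + investment sum is below 160, as in the slides.
data = [
    (60.0, 20.0, False),
    (45.0, 80.0, True),
    (90.0, 55.0, True),
    (30.0, 10.0, False),
    (70.0, 15.0, False),
    (50.0, 95.0, True),
]


def fraction_correct(w_income, w_invest, cutoff, data):
    """Share of rows classified correctly for the given weights and cutoff."""
    hits = 0
    for income, invest, is_sub in data:
        score = w_income * income + w_invest * invest
        if (score >= cutoff) == is_sub:
            hits += 1
    return hits / len(data)


def random_search(data, trials=2000, seed=0):
    """Plain random search inside the bounds (a stand-in, not the Solver)."""
    rng = random.Random(seed)
    best = (0.0, 0.0, 0.0)
    best_rate = fraction_correct(*best, data)
    for _ in range(trials):
        candidate = (
            rng.uniform(-1.0, 1.0),      # weight on income
            rng.uniform(-1.0, 1.0),      # weight on investment
            rng.uniform(-160.0, 160.0),  # cutoff
        )
        rate = fraction_correct(*candidate, data)
        if rate > best_rate:
            best, best_rate = candidate, rate
    return best, best_rate


best, best_rate = random_search(data)
print(best, best_rate)
```

Different seeds will generally return different weights and cutoffs with the same rate, since many solutions tie on a step-shaped objective.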

11
Solution
  • The solution shown is certainly not unique.
  • There are many other sets of weights and cutoff
    values that obtain the same 92.86% correct
    classification rate, and you will probably obtain
    a different solution from ours.
  • Note that only 6 of the 84 people are
    misclassified: 4 subscribers are classified as
    nonsubscribers and 2 nonsubscribers are
    classified as subscribers.
  • Also, we see from the weights that the
    classification is based primarily on the
    investment amount: people with large investment
    amounts are classified as subscribers.
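
A quick arithmetic check of the misclassification counts, which gives the 92.86% rate quoted for the training sample. The variable names are illustrative only.

```python
total_people = 84
subscribers_missed = 4     # subscribers classified as nonsubscribers
nonsubscribers_missed = 2  # nonsubscribers classified as subscribers

misclassified = subscribers_missed + nonsubscribers_missed
pct_correct = 100 * (total_people - misclassified) / total_people
print(f"{pct_correct:.2f}")  # prints 92.86
```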

12
Solution -- continued
  • Therefore, a subscriber such as person 81 is
    misclassified because his investment amount is
    abnormally small relative to other subscribers.
  • On the other hand, a nonsubscriber is
    misclassified if his investment amount is
    abnormally large relative to other
    nonsubscribers.
  • In a real application, we would use this analysis
    for people other than the 84 in the training
    sample.

13
Solution -- continued
  • That is, we would calculate a discriminant score
    for each such person and classify each as a
    nonsubscriber if the person's discriminant score
    is less than the cutoff value and as a subscriber
    otherwise.
  • However, the percentage correctly classified
    would typically be less, maybe even considerably
    less, than the 92.86% rate we obtained in the
    training sample.
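
Applying a fitted rule to people outside the training sample might look like the sketch below. The weights, cutoff, new people, and their known groups are all hypothetical placeholders; in practice the weights and cutoff would come from the optimized model.

```python
def classify_new(people, weights, cutoff):
    """Classify (income, investment) pairs with already-fitted weights."""
    labels = []
    for income, investment in people:
        score = weights[0] * income + weights[1] * investment
        labels.append("No" if score < cutoff else "Yes")
    return labels


# Hypothetical fitted values and new people, for illustration only.
weights = (0.1, 0.8)
cutoff = 40.0
new_people = [(55.0, 70.0), (80.0, 12.0)]
labels = classify_new(new_people, weights, cutoff)

# If the true groups of the new people are known (a validation sample),
# the out-of-sample rate can be compared with the training rate.
known_groups = ["Yes", "Yes"]
holdout_rate = sum(l == k for l, k in zip(labels, known_groups)) / len(labels)
print(labels, holdout_rate)
```

Comparing `holdout_rate` with the training-sample rate gives a rough sense of how much the rule was tuned to the 84 training customers.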