Business Intelligence and Decision Modeling - PowerPoint PPT Presentation

About This Presentation
Title:

Business Intelligence and Decision Modeling

Description:

Week 9 Customer Profiling ... Profiling/Decision Tree SPSS Direct Marketing Customer Profiling SPSS Analysis ... CHAID Tutorial Impact of Variable Measurement (1) ... – PowerPoint PPT presentation

Slides: 19
Provided by: RichardM166

Transcript and Presenter's Notes


1
Business Intelligence and Decision Modeling
  • Week 9
  • Customer Profiling Decision Trees (Part 2)
  • CHAID
  • CRT

2
Profiling/Decision Tree
  • SPSS Direct Marketing → Customer Profiling
  • SPSS Analysis → Classification → Decision
    Tree
  • CHAID (Chi-squared Automatic Interaction
    Detection)
  • CART (Classification and Regression Trees)

3
Use of Decision Trees
  • Classify observations from a target binary or
    nominal variable → Segmentation
  • Predictive response analysis from a target
    numerical variable → Behaviour
  • Decision support rules → Processing

4
Example: dmdata.sav
  • Underlying Theory
  • → χ² (chi-square)

5
CHAID Algorithm: Selecting Variables
  • Example
  • Regions (4), Gender (3, including Missing), Age
    (6, including Missing)
  • For each variable, collapse categories to
    maximize chi-square test of independence
    Ex.: Region (N, S, E, O) → (NEO, S)
  • Select most significant variable
  • Go to next branch and next level
  • Stop growing if estimated χ² < theoretical χ²
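The chi-square comparison on this slide can be illustrated with a minimal Python sketch. It is not the SPSS implementation: the counts are hypothetical, the function name `chi_square` is made up for illustration, and the critical value assumes a 5% significance level with 1 degree of freedom (a 2 × 2 table).

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts after collapsing Region into (NEO, S):
# rows = region groups, columns = (responded, did not respond)
table = [[30, 70],
         [50, 50]]

CRITICAL_05_DF1 = 3.841  # theoretical chi-square, alpha = 0.05, 1 degree of freedom

stat = chi_square(table)
keep_split = stat >= CRITICAL_05_DF1  # stop growing when the estimate falls below the critical value
print(round(stat, 3), keep_split)
```

With these made-up counts the estimated χ² exceeds the theoretical value, so the split would be kept.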

6
CART (Nominal Target)
  • Nominal Targets
  • GINI (Impurity Reduction or Entropy)
  • Squared probability of node membership
  • Gini = 0 when targets are perfectly classified.
  • Gini Index = 1 − Σ pᵢ²
  • Example
  • Prob.: Bus 0.4, Car 0.3, Train 0.3
  • Gini = 1 − (0.4² + 0.3² + 0.3²) = 0.660
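The Gini calculation above can be checked with a few lines of Python. This is only a sketch of the formula on the slide; the function name `gini` is chosen here for illustration.

```python
def gini(probabilities):
    """Gini index: 1 minus the sum of squared class probabilities in a node."""
    return 1.0 - sum(p * p for p in probabilities)


example = gini([0.4, 0.3, 0.3])  # Bus, Car, Train from the slide
pure = gini([1.0, 0.0, 0.0])     # a perfectly classified node

print(round(example, 3))
print(pure)
```

A node whose cases all belong to one category gives 0, matching the "perfectly classified" bullet.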

7
CART (Metric Target)
  • Continuous Variables
  • Variance Reduction (F-test)
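For a metric target, the quantity a split tries to increase can be sketched as the drop in variance from the parent node to its two children. The example below is a minimal illustration with hypothetical values and function names; it computes the reduction only and does not perform the F-test mentioned on the slide.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(left, right):
    """Parent variance minus the size-weighted variance of the two child nodes."""
    parent = left + right
    n = len(parent)
    weighted_children = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted_children


# Hypothetical target values falling into each child node after a split
left = [10, 12]
right = [20, 22]
reduction = variance_reduction(left, right)
print(reduction)
```

The larger the reduction, the more homogeneous the child nodes are compared with the parent.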

8
CHAID/CART Variables
  • CHAID
  • Dependent Variable: Nominal
  • Independent Variables: Nominal
  • Independent Variables: Continuous →
    Discretized
  • CART
  • Dependent Variable: Continuous
  • Independent Variables: Nominal/Continuous

9
Magic Bullet?
  • Simple to understand and interpret
  • Requires little data preparation
  • Able to handle both numerical and categorical data
  • Uses a white box model easily explained by
    Boolean logic.
  • Possible to validate a model using statistical
    tests
  • Robust

Wikipedia
10
CHAID Tutorial
  • http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp
  • Go to Case Studies
  • → Decision Trees Options

11
Impact of Variable Measurement (1)
  • Use tree_textdata.sav
  • Frequency run on the two variables
  • Run Classification Tree (CHAID)
  • Examine tree output
  • Redefine variable measurements as nominal
    variables
  • Run Classification Tree (CHAID)
  • Examine tree output.

12
Impact of Variable Measurement (2)
  • Nominal Variable Categories
  • Value Labels: 1 = yes, 2 = no
  • Run CHAID
  • Examine Output
  • Unnamed dependent variable is skipped

13
Decision Trees to Evaluate Credit Risk (1)
  • Use tree_credit.sav
  • Open CHAID
  • Enter all Variables
  • Select Bad Category
  • Criteria: 400 / 200 (minimum cases: parent node / child node)
  • Activate Tree and Plots Options
  • Save Terminal nodes and Predicted Values
  • Run CHAID
  • Examine Output

14
Decision Trees to Evaluate Credit Risk (2)
  • Refer to online tutorial for output explanations
    (i.e. Tree, Tables, Gains for Nodes, Gain Chart,
    Target Category, Risk Estimates, and added
    Variables in dataset)

15
Decision Trees to Evaluate Credit Risk (3)
  • Refining the Model → Options
  • Changing cost of misclassification
  • Cost of misclassifying a good client (1)
  • Cost of misclassifying a bad client (2)
  • Rerun CHAID
  • Examine Output. See change in node 9
    classification
  • Look at the change in the classification matrix
    and Risk factor
  • Why can't we use the Profits Tab?
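Why a node's classification can flip when misclassification costs change can be sketched in Python. The costs 1 and 2 come from the slide; the 40% share of bad clients in the node and the function name `assign_class` are hypothetical, and the sketch assumes the node is assigned to whichever category has the lower expected cost.

```python
def assign_class(p_bad, cost_good_as_bad=1.0, cost_bad_as_good=2.0):
    """Pick the node's predicted category by comparing expected misclassification costs.

    p_bad is the share of bad clients in the node (hypothetical input).
    """
    p_good = 1.0 - p_bad
    cost_if_predict_bad = p_good * cost_good_as_bad   # good clients wrongly labelled bad
    cost_if_predict_good = p_bad * cost_bad_as_good   # bad clients wrongly labelled good
    return "bad" if cost_if_predict_bad < cost_if_predict_good else "good"


# Hypothetical node: 40% bad clients, 60% good clients
with_equal_costs = assign_class(0.4, cost_good_as_bad=1.0, cost_bad_as_good=1.0)
with_slide_costs = assign_class(0.4)  # costs 1 and 2 as on the slide

print(with_equal_costs, with_slide_costs)
```

With equal costs the majority category wins; doubling the cost of missing a bad client is enough to relabel this hypothetical node.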

16
Decision Trees to Evaluate Credit Risk (4)
  • Let's take a look at the Validation option
  • Cross validation
  • Split sample
  • Random Split
  • Predefined Split Variables
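The two validation ideas listed above, a random split sample and cross-validation folds, can be sketched with the Python standard library. This is an illustration of the concepts, not of what SPSS does internally; the function names, the 50% training fraction, the 10 folds, and the fixed seed are all assumptions.

```python
import random


def split_sample(n_cases, train_fraction=0.5, seed=0):
    """Randomly split case indices into a training sample and a test sample."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    cut = int(n_cases * train_fraction)
    return indices[:cut], indices[cut:]


def k_folds(n_cases, k=10, seed=0):
    """Randomly assign case indices to k folds of (nearly) equal size."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


train, test = split_sample(100, train_fraction=0.5)
folds = k_folds(100, k=10)
print(len(train), len(test), len(folds))
```

In cross-validation each fold is held out once while the tree is grown on the remaining folds; a predefined split variable replaces the random assignment with a column already in the dataset.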

17
Building a Scoring Model (1)
  • Use tree_car.sav
  • Open Decision Tree → CRT
  • Enter Price of Car as DV
  • Output → Rules → SPSS Syntax
  • Assign values to cases
  • Include surrogates
  • Run CRT
  • Double Click Tree → Try icons on top
  • Examine Output

18
Building a Scoring Model (2)
  • Open tree_car_scoring.sav
  • Open the saved syntax file (.sps)
  • Highlight full syntax
  • Run syntax
  • Examine added Variable in file
  • Correlate actual and predicted car prices
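The last step, correlating actual and predicted prices, amounts to a Pearson correlation. A minimal Python sketch follows; the price values and the function name `pearson_r` are hypothetical, and in SPSS the same check would be done on the variable added by the scoring syntax.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical car prices (in thousands) and the tree's predicted node means
actual_price = [12.0, 18.5, 25.0, 31.0, 40.0]
predicted_price = [14.0, 17.0, 26.0, 29.0, 38.0]
r = pearson_r(actual_price, predicted_price)
print(round(r, 3))
```

A correlation close to 1 suggests the scoring rules reproduce the ordering and scale of the actual prices well.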