Multivariate Statistical Analysis: Analyzing Criterion-predictor Association

1
Multivariate Statistical Analysis: Analyzing
Criterion-predictor Association
  • Chapter 18

2
Knowing Which Techniques to Use
  • The type of analysis to be performed depends on
    several things
  • The number of variables
  • The existence of dependencies
  • Number of dependencies
  • Measurement of data (scaling: nominal, interval)

3
Covariation
  • In many marketing research studies, the goal is
    to explain variation in one set of variables
    (e.g., sales) with matching variation in another
    set of variables (e.g., advertising)
  • Identifying criterion and predictor variables
    (dependent and independent variables)
  • Creating a function or equation that estimates
    the criterion from predictor variables
  • Establishing confidence in the relationship
    between variables

4
Multiple Regression
  • Same as simple regression, but with more
    predictor terms
  • In this case, ß1 is the partial regression
    coefficient with respect to x1 (see the model
    written out below)
  • Because x1 is usually correlated with x2, the
    question is how to tell whether multiple
    regression is better than simple regression
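For reference, a minimal statement of the model the slide describes, written with two predictors (the intercept ß0 and error term ε are implied but not spelled out on the slide):

```latex
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon
```

Here \beta_1 is the partial regression coefficient on x_1, i.e., its estimated effect with x_2 held constant.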

5
Regression Output: Measures of Interest
  • Regression equation
  • R2, both sample-determined and population-adjusted
  • F-test for the overall equation
  • Individual t-tests and standard errors for testing
    each partial regression coefficient
  • Partial correlation coefficients
  • Accounted-for variance explained by each
    predictor (overlap goes to the preceding predictor)
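A minimal sketch of where these quantities live in a typical ordinary-least-squares fit, assuming the statsmodels library and predictors/response already loaded; the function name and return keys are illustrative, not from the slides:

```python
import statsmodels.api as sm

def regression_report(X, y):
    """Fit OLS and collect the output measures listed above."""
    res = sm.OLS(y, sm.add_constant(X)).fit()
    return {
        "coefficients": res.params,            # the regression equation
        "r2": res.rsquared,                    # sample-determined R^2
        "adj_r2": res.rsquared_adj,            # population-adjusted R^2
        "F": res.fvalue, "F_p": res.f_pvalue,  # overall F-test
        "t": res.tvalues, "se": res.bse,       # per-coefficient t-tests and SEs
    }
```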

6
Contribution to Explained Variance
  • Example
  • Bivariate regression
  • Y = 0.491 + 0.886 x1
  • Same data, multiple regression
  • Y = 0.247 + 0.493 x1 + 0.484 x2
  • Independent variables may be correlated, so
    looking at R2 values is critical
  • R2 of the first equation is 0.723, meaning 72% of
    the variation in Y is explained by variation in
    x1
  • R2 of the second equation is 0.783, meaning that
    adding x2 to the equation explains only 6%
    more of the variation
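The comparison above can be reproduced on any data set with a short least-squares helper; this sketch assumes NumPy and that x1, x2, and y are already loaded as arrays (the names mirror the slide but are otherwise illustrative):

```python
import numpy as np

def r_squared(X, y):
    """Fit y on X (plus an intercept) by least squares and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# r2_simple   = r_squared(x1.reshape(-1, 1), y)           # the 0.723 above
# r2_multiple = r_squared(np.column_stack([x1, x2]), y)   # the 0.783 above
# r2_multiple - r2_simple   # incremental variance explained by adding x2
```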

7
Multicollinearity
  • Problem in regressions when predictor variables
    are highly correlated
  • Distorts coefficients, making it tough to tell
    which predictor variables are most relevant
  • No standard as to how much collinearity is too
    much
  • 3 solutions for multicollinearity
  • Ignore it
  • Delete one or more correlated predictors
  • As a rule of thumb, some researchers drop one or
    more of the correlated variables when the
    correlation is 0.9 or more (see the sketch below)
  • Find predictor variable combinations that are
    uncorrelated
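A simple way to apply the 0.9 rule of thumb is to scan the predictor correlation matrix before fitting; this sketch assumes NumPy and that the predictors are columns of a 2-D array (names and the default threshold are illustrative):

```python
import numpy as np

def flag_collinear(X, names, threshold=0.9):
    """List predictor pairs whose absolute correlation meets the threshold.
    The 0.9 cutoff mirrors the slide's rule of thumb; it is not a standard."""
    corr = np.corrcoef(X, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                flagged.append((names[i], names[j], round(corr[i, j], 3)))
    return flagged

# Example call: flag_collinear(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
```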

8
Cross-validation
  • Randomly split cases into groups
  • Compute separate regressions for each group
  • Use equations to predict Y-values in other groups
  • Check coefficients across groups for agreement in
    algebraic sign and magnitude
  • Compute a regression for the entire sample, using
    only the variables that were similar between
    groups
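A compact sketch of the split-half procedure described above, assuming NumPy, a 2-D predictor array X, and a response vector y (all names are illustrative):

```python
import numpy as np

def split_sample_check(X, y, seed=0):
    """Fit OLS separately on two random halves, then use each half's
    equation to predict the other half's Y-values, as outlined above."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    halves = [idx[: len(y) // 2], idx[len(y) // 2:]]

    def fit(rows):
        Xr = np.column_stack([np.ones(len(rows)), X[rows]])
        coef, *_ = np.linalg.lstsq(Xr, y[rows], rcond=None)
        return coef

    coefs = [fit(h) for h in halves]              # compare sign and magnitude
    X_all = np.column_stack([np.ones(len(y)), X])
    cross_preds = [X_all[halves[1]] @ coefs[0],   # half-1 equation on half-2 cases
                   X_all[halves[0]] @ coefs[1]]   # half-2 equation on half-1 cases
    return coefs, cross_preds
```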

9
Stepwise Regression
  • Run a regression with many predictor variables
  • Review the results and remove the least
    important variables
  • Keep re-running the regression and removing
    variables until a solid model emerges
  • Can also start with a single variable and add
    more, keeping those that seem promising
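A backward-elimination pass (the "remove the least important variables" direction) can be sketched as below, assuming statsmodels and a pandas DataFrame of candidate predictors; the function name and the 0.10 cutoff are illustrative choices, not from the slides:

```python
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.10):
    """Repeatedly drop the least significant predictor until every
    remaining p-value falls below alpha; X is a DataFrame of predictors."""
    cols = list(X.columns)
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] < alpha:
            break                 # all remaining predictors are significant
        cols.remove(worst)        # discard the weakest predictor and refit
    return cols
```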

10
Downsides of Stepwise Regression
  • Several supposedly optimal equations can emerge,
    depending on which variables you start with and
    on the method (eliminating or adding variables)
  • Multicollinearity clouds which variables might
    make the most predictive model

11
Discriminant Analysis
  • Discriminant analysis is used when the predictor
    variables are interval-scaled data but the
    criterion is categorical
  • Example: A B2B company has used its database to
    segment its customers into groups that prefer
    ordering over the internet, through a catalog, or
    from a salesperson. With discriminant analysis,
    the company can use data on new customers (annual
    sales, number of employees) to predict which
    segment they might fall into.
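A minimal sketch of the classification step in that example, assuming scikit-learn; the function and variable names (annual sales and employee counts as columns of X) are illustrative:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def classify_segments(X_known, segments_known, X_new):
    """Fit a linear discriminant model on already-segmented customers and
    predict the preferred ordering channel for new ones.

    X_known: rows of [annual_sales, num_employees] for profiled customers
    segments_known: labels such as "internet", "catalog", "salesperson"
    X_new: the same measures for customers not yet segmented
    """
    lda = LinearDiscriminantAnalysis()
    lda.fit(X_known, segments_known)
    return lda.predict(X_new)
```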

12
Two Types of Discriminant Analysis
  • Discriminant predictive (or explanatory) analysis
  • Used to optimize predictive functions
  • Discriminant classification analysis
  • Like other grouping techniques, this analysis
    seeks to minimize differences within each
    group while maximizing differences between
    groups

13
Chi-square Automatic Interaction Detection (CHAID)
  • CHAID analysis is used to find dependencies
    between variables
  • Typically used in data mining
  • Useful when
  • No desire to make assumptions about relationships
  • Many potential explanatory variables
  • Large number of observations

14
CHAID Process
  • The computer takes a large data set and tests
    different ways the data can be split up, selecting
    the best available split
  • Groups can be further split
  • Because there are so many splits, CHAID requires
    very large samples
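A heavily simplified sketch of a single CHAID splitting step, assuming pandas and SciPy: cross-tabulate each candidate predictor against the response and keep the split with the most significant chi-square statistic. Real CHAID also merges categories and applies Bonferroni adjustments, which this omits; names are illustrative.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def best_split(df, response, predictors):
    """Return the categorical predictor whose cross-tab with the response
    gives the smallest chi-square p-value (one CHAID splitting step)."""
    p_values = {}
    for col in predictors:
        table = pd.crosstab(df[col], df[response])
        chi2, p, dof, _ = chi2_contingency(table)
        p_values[col] = p
    return min(p_values, key=p_values.get)
```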

15
CHAID Example
  • A bank is planning a direct-mail campaign to sign
    up new customers for a credit card.
  • CHAID analysis can be run on a customer database
    to find variables that will lead to a greater
    response.
  • CHAID analysis might show that households earning
    $50,000 per year or more are more likely to sign
    up. Within that group, women are more likely to
    respond, and women with a household size of 5 or
    more are more likely to respond.
  • Purchasing a mailing list from a credit reporting
    agency for women in large households where
    reported income is over $50,000 results in a
    greater response rate than past campaigns (4% for
    a targeted list compared to 1% for a broad
    mailing) at reduced cost.

16
Canonical Correlation
  • Generalization of multiple correlation to two or
    more criterion variables
  • Answers the question
  • How well does one group of variables predict
    another?
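One way to put a number on that question is to correlate paired canonical variates; this sketch assumes scikit-learn and two already-loaded variable sets X and Y (names and the component count are illustrative):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def canonical_correlations(X, Y, n_components=2):
    """Correlate each pair of canonical variates from the two variable sets;
    high leading correlations mean one set predicts the other well."""
    cca = CCA(n_components=n_components)
    U, V = cca.fit_transform(X, Y)       # canonical variates for X and Y
    return [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(n_components)]
```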

17
Correspondence Analysis
  • Specialized set of canonical correlation
    techniques
  • Useful for studying brands and attributes
  • Includes a visual display of results to help
    interpret data
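The visual display is typically a map built from the row and column coordinates of a brand-by-attribute contingency table; a minimal NumPy sketch of those coordinates follows (function name illustrative, standard SVD-based correspondence analysis assumed):

```python
import numpy as np

def correspondence_coords(table, n_dims=2):
    """Principal row (brand) and column (attribute) coordinates from a
    contingency table of counts, ready to plot as a correspondence map."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)                  # row/column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :n_dims] * s[:n_dims]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :n_dims] * s[:n_dims]) / np.sqrt(c)[:, None]
    return rows, cols
```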

18
Probit and Logit
  • Deal with the same type of problem as
    regression, but are designed for a nominal- or
    ordinal-scaled response
  • Probit: the response is assumed to be normally
    distributed
  • Logit: a logistic distribution is assumed
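Both models can be fit to the same binary response for comparison; this sketch assumes statsmodels and an already-coded 0/1 response (the function name is illustrative):

```python
import statsmodels.api as sm

def fit_probit_and_logit(X, y):
    """Fit probit (normal latent response) and logit (logistic distribution)
    models to the same 0/1 response and predictors."""
    Xc = sm.add_constant(X)
    probit_res = sm.Probit(y, Xc).fit(disp=0)
    logit_res = sm.Logit(y, Xc).fit(disp=0)
    return probit_res, logit_res
```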

19
Path Analysis / Causal Modeling
  • Hybrid of factor analysis and simultaneous
    equation regression
  • The goal is to measure links between observed
    measures and unobservable (latent) constructs
  • These advanced techniques require the right
    software and expertise on the part of the
    researcher