A1258565574URiuF - PowerPoint PPT Presentation

Provided by: michaelm99


1
Multivariate Data Analysis
Overview of Methods
2
Motivation
  • Sophisticated multivariate statistical methods
    are becoming standard practice in the physical,
    natural and social sciences, as well as in
    business
  • Variations of existing methods are being
    developed, existing techniques are being applied
    to new applications, and new methods continue to
    be designed

3
Motivation
  • The accelerated use of advanced multivariate
    techniques is being driven by
    • Growing complexity in the topics being addressed
    • Ever-larger data sets
    • Ability to apply computationally intensive
      methods through powerful computer tools
    • Academic training

4
Overview of Multivariate Data Analysis Methods
5
Vocabulary: Data Types
6
Vocabulary: Variable Types
  • Response vs. explanatory
    • Response or dependent variable
      • Variable to be modeled or predicted
    • Explanatory or independent variable
      • Variables used to predict or model the
        dependent variable
  • Importance of identifying data and variable types
    • Critical in determining analysis objectives and
      the appropriate analysis method
    • Helps avoid inappropriate variable operations

7
Classification of Methods
  • Dependence techniques
    • One variable or a set of variables is regarded
      as dependent
    • Objective is to predict or explain the value of
      the dependent variable(s) based on the values of
      a set of independent variables
    • Examples
      • What is the probability that a loan applicant
        will default?
      • What factors best differentiate people whose
        primary news source is the Internet?

8
Classification of Methods
  • Dependence techniques
    • Multiple regression
    • Logistic regression
    • Discriminant analysis
    • Canonical correlation
    • Structural equation modeling
    • Analysis of variance
    • Decision trees

9
Classification of Methods
  • Interdependence techniques
    • No single group of variables is defined as
      dependent or independent
    • Objective is to identify and characterize the
      underlying structure among the variables
    • Examples
      • What are the underlying factors that define a
        customer's perception of a brand?
      • Which signal returns arise from the same object,
        and how many objects are present?

10
Classification of Methods
  • Interdependence techniques
    • Factor analysis
    • Multidimensional scaling
    • Correspondence analysis
    • Cluster analysis

11
Classification of Methods
  • Interdependence techniques are valuable data
    reduction methods
  • Data reduction attempts to manage and interpret
    the large amounts of data gathered
    • One goal is to group cases measured over
      multiple variables into a relatively small number
      of understandable segments
    • Another is to group variables into categories of
      latent traits and then characterize cases with
      respect to this smaller number of traits
  • The reduced data variables are then often used as
    variables in dependence techniques

12
Multiple Regression
  • Multiple regression is a dependence technique
    used to model the relationship between the value
    of a single metric dependent variable and a set
    of metric independent variables
  • Categorical variables can be included as dummy
    variables
  • The model can be applied to predict how the
    dependent variable responds to changes in the
    independent variables
  • Standardized regression coefficients also indicate
    the relative importance of the independent
    variables in explaining the dependent variable

13
Multiple Regression
  • For example, a client may be interested in
    understanding the effect of price and promotional
    activity on a product's market share among both
    loyal and not-loyal customers
  • Technical result is a linear model of the form
    $Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$
  • The best visualizations of the results hold all
    but one (or two) of the independent variables
    fixed and examine how the value of the dependent
    variable changes with respect to the free
    independent variables
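As a concrete illustration of the linear model form, here is a minimal pure-Python sketch that estimates the coefficients by ordinary least squares via the normal equations. The data are invented (price, a 0/1 promotion dummy, and a market share built to follow 30 - 2*price + 5*promotion exactly), and `fit_ols` is a hypothetical helper; a real analysis would use a statistics package, which also reports standard errors and diagnostics.

```python
def fit_ols(X, y):
    """Fit Y = a0 + a1*X1 + ... + an*Xn by ordinary least squares.

    X is a list of rows (one per case, without an intercept column).
    Solves the normal equations (X'X)a = X'y by Gaussian elimination with
    partial pivoting; adequate for small, well-conditioned examples only.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(col + 1, p):
            factor = A[k][col] / A[col][col]
            for j in range(col, p + 1):
                A[k][j] -= factor * A[col][j]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * p
    for i in reversed(range(p)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (A[i][p] - s) / A[i][i]
    return coef


# Hypothetical data: (price, promotion dummy) -> market share (%).
X = [(5, 0), (6, 0), (7, 1), (8, 1), (5, 1), (9, 0)]
y = [20, 18, 21, 19, 25, 12]
coef = fit_ols(X, y)  # approximately [30.0, -2.0, 5.0]

# Hold promotion fixed at 1 and let price vary (one free variable).
line = [coef[0] + coef[1] * price + coef[2] * 1 for price in range(5, 10)]
```

Plotting `line` against price, and a second line with promotion held at 0, gives the kind of one-free-variable visualization described above.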

14
Multiple Regression
Market share for loyal customers
Market share for not-loyal customers
15
Multiple Regression
  • Properties
    • Single interval-scale dependent variable
    • Multiple independent variables, preferably on an
      interval scale
    • Familiar and useful technique
  • Issues
    • Assumes a linear relationship between the
      dependent and independent variables
    • Overused, and its assumptions are often not fully
      checked
    • Often misapplied to classification problems

16
Logistic Regression
  • Logistic regression is a dependence technique
    used to model the relationship between a single
    categorical dependent variable and a set of
    metric independent variables
    • Typically the dependent variable takes one of two
      values (e.g., success/failure, buy/do not buy)
    • Multinomial formulations extend the model to more
      than two categories
  • A logistic model gives the probability that the
    dependent variable takes a target value given the
    values of the independent variables
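To show what "gives the probability" means in practice, the sketch below evaluates a binary logistic model: the linear combination of the independent variables is treated as the log-odds and passed through the logistic function. The coefficients are hypothetical; in an actual analysis they would be estimated from data.

```python
import math


def logistic_probability(coef, x):
    """Probability that a binary outcome takes its target value, given x.

    coef = [b0, b1, ..., bn] and x = [x1, ..., xn]. The linear predictor
    b0 + b1*x1 + ... + bn*xn is the log-odds; the logistic function maps
    it onto the 0 to 1 probability scale.
    """
    log_odds = coef[0] + sum(b * xi for b, xi in zip(coef[1:], x))
    # Two algebraically equivalent forms, chosen so math.exp never overflows.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Hypothetical coefficients: intercept -1.0, one predictor with slope 0.5.
print(logistic_probability([-1.0, 0.5], [2.0]))  # log-odds 0 -> 0.5
print(logistic_probability([-1.0, 0.5], [4.0]))  # log-odds 1 -> about 0.731
```

Each one-unit increase in the predictor adds 0.5 to the log-odds, i.e. multiplies the odds by exp(0.5); the change in probability itself depends on where on the curve the case starts.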

17
Logistic Regression
  • For example, which credit and demographic factors
    best predict whether a customer will keep a loan
    current?
    • Dependent variable defined as whether the loan
      becomes 60 days past due or worse
    • Independent variables are credit and employment
      history, and demographic descriptors
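The slide defines the outcome but not how the model is estimated. A minimal sketch under stated assumptions: a single hypothetical predictor (count of prior delinquencies), invented days-past-due values, a target coded 1 when an account is 60 or more days past due, and coefficients fitted by plain gradient ascent on the log-likelihood. Statistical packages usually fit logistic regression with Newton-type methods such as iteratively reweighted least squares; gradient ascent is used here only because it is short.

```python
import math


def logistic_probability(coef, x):
    """Logistic model: log-odds b0 + b1*x1 + ... -> probability (overflow-safe)."""
    log_odds = coef[0] + sum(b * xi for b, xi in zip(coef[1:], x))
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


def fit_logistic(X, y, learning_rate=0.1, iterations=5000):
    """Estimate [b0, b1, ..., bn] by gradient ascent on the log-likelihood.

    The gradient of the average log-likelihood for each coefficient is the
    mean of (observed - predicted) times that coefficient's input
    (1 for the intercept).
    """
    coef = [0.0] * (len(X[0]) + 1)
    for _ in range(iterations):
        grad = [0.0] * len(coef)
        for xi, yi in zip(X, y):
            residual = yi - logistic_probability(coef, xi)
            grad[0] += residual
            for j, value in enumerate(xi, start=1):
                grad[j] += residual * value
        coef = [c + learning_rate * g / len(X) for c, g in zip(coef, grad)]
    return coef


# Hypothetical accounts: prior delinquencies and current days past due.
prior_delinquencies = [0, 0, 0, 1, 1, 2, 2, 3, 3, 4]
days_past_due = [0, 15, 0, 30, 75, 10, 90, 120, 45, 65]

# Dependent variable: 1 if the loan is 60 days past due or worse, else 0.
y = [1 if d >= 60 else 0 for d in days_past_due]
X = [[d] for d in prior_delinquencies]

coef = fit_logistic(X, y)
low_risk = logistic_probability(coef, [0])   # no prior delinquencies
high_risk = logistic_probability(coef, [4])  # four prior delinquencies
```

With real credit data the predictor list would hold several credit, employment, and demographic variables, and the fitted model would need validation on held-out accounts.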

18
Logistic Regression
  • Properties
    • Powerful technique for predicting group
      membership and identifying important independent
      variables
    • Becoming more widely used
    • Procedures and results similar to linear
      regression
  • Issues
    • Adequate data
    • Model validation
    • Communicating probabilistic concepts

19
Decision Trees
  • Decision trees are a dependence technique used to
    develop a model to classify the value of a
    single dependent variable based on a set of
    independent variables
  • Dependent and independent variables can be any
    data type
  • The typical product of CART (Classification and
    Regression Trees), a widely used decision-tree
    algorithm, is a straightforward, easily
    interpretable set of segmentation rules
  • For example, classify existing customers as high
    or low likelihood buyers of a new product based
    on demographics and historical purchasing
    behavior. The classification could then be used
    to focus an advertising campaign
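CART grows a tree by repeatedly choosing the split that most reduces node impurity. The sketch below shows one such split search, assuming the Gini impurity criterion and a single numeric predictor; the purchase counts and high/low buyer labels are invented, and a full tree would repeat the search over every predictor and recursively within each resulting segment.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, weighted_impurity) for the best rule 'x <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values
    of the predictor; returns (None, gini(ys)) if no split improves purity.
    """
    values = sorted(set(xs))
    best = (None, gini(ys))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical customers: number of past purchases and buyer likelihood.
purchases = [0, 1, 1, 2, 5, 6, 7, 8]
buyer = ["low", "low", "low", "low", "high", "high", "high", "high"]
print(best_split(purchases, buyer))  # (3.5, 0.0)
```

The resulting rule, "past purchases <= 3.5 means low likelihood", is the kind of easily interpretable segmentation rule the slide refers to.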

20
Decision Trees
  • Decision trees can also be used to examine
    profiles of different market segments with
    respect to underlying demographic and
    psychographic variables
  • For example, what are the most significant
    demographic variables determining whether the
    Internet is a person's most important information
    source?

21
Decision Trees
22
Decision Trees
  • Properties
    • Single dependent variable of any scale
    • Multiple independent variables of any scale
    • Free of the distributional and linearity
      assumptions typical of other dependence
      techniques
    • Powerful statistical learning algorithm able to
      identify complex variable interactions
  • Issues
    • Less familiar than regression methods
    • Standard inferential statistics not applicable
    • Often leads to asymmetric relationships

23
Factor Analysis
  • Factor analysis is an interdependence technique
    used to identify a set of underlying latent
    traits (factors) that explain the correlations
    among a large number of variables
  • Data summarization
    • Derive a set of underlying concepts that
      summarize a larger set of variables
  • Data reduction
    • Develop a set of factor variables that serves as
      a more parsimonious description of the data
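The slides do not name an extraction method. The sketch below assumes the principal-component method (one of several; principal-axis and maximum-likelihood factoring are common alternatives), extracts only the first factor, and applies no rotation. The correlation matrix and its item names are hypothetical.

```python
import math


def first_factor_loadings(R, iterations=500):
    """Loadings of each variable on the first factor of correlation matrix R.

    Principal-component extraction: the largest eigenvalue and its unit
    eigenvector are found by power iteration, and the loadings are
    sqrt(eigenvalue) * eigenvector. Starting from an all-positive vector
    assumes the dominant eigenvector is not orthogonal to it, which holds
    when all correlations are positive.
    """
    p = len(R)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv of the unit vector v estimates the eigenvalue.
    Rv = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(vi * rvi for vi, rvi in zip(v, Rv))
    return [math.sqrt(eigenvalue) * x for x in v]


# Hypothetical correlations among three destination ratings
# (e.g. "reliable", "secure", "professional").
R = [
    [1.0, 0.7, 0.6],
    [0.7, 1.0, 0.5],
    [0.6, 0.5, 1.0],
]
loadings = first_factor_loadings(R)
```

High loadings on all three items would suggest a single underlying trait (something like "trustworthiness"); naming such a factor is the interpretation step the slides list as an issue.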

24
Factor Analysis
  • For example, a client is interested in defining
    the underlying dimensions that influence the
    perception of online destinations
  • Survey respondents are asked to rate a set of
    destinations (including the client's) with
    respect to a number of traits
  • Factor analysis can be applied to develop a
    succinct set of perception dimensions
  • This manageable set of dimensions can be used to
    characterize the client's site and to develop a
    focused plan to reposition it

25
Factor Analysis
  • Factors can then be used to provide a visual
    summary of the data

Factor dimensions: Competence, Sophistication, Trustworthy, Exciting
26
Factor Analysis
  • Properties
    • Very useful in identifying structure and
      relationships in data
    • Provides a tractable set of concepts for both
      managerial and analytical uses
    • Provides opportunities for visualization
  • Issues
    • Questionnaire design
    • Variable selection
    • Factor interpretation and validity

27
Cluster Analysis
  • Cluster analysis is an interdependence technique
    used to segment cases into homogeneous groups
    based on a specified set of variables
  • Data reduction
    • Develop a more parsimonious description of cases,
      which can then be used in analytical
      classification methods
  • Identify similarities between cases with respect
    to the clustering variables
  • Characterize clusters with respect to other sets
    of variables
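Many clustering algorithms exist; the sketch below uses k-means (Lloyd's algorithm) purely as one familiar example, assuming numeric clustering variables and a number of clusters k chosen in advance. The pilot ratings are invented.

```python
def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means: returns (labels, centroids) for k clusters.

    Centroids start at the first k points so the result is reproducible;
    practical implementations use random or k-means++ starts and several
    restarts.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical pilot shows rated on two traits (e.g. humor, suspense), 1-10 scale.
pilots = [(2, 3), (8, 8), (1, 2), (9, 9), (2, 2), (8, 9)]
labels, centroids = kmeans(pilots, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because k must be fixed up front and results depend on the starting centroids, choosing the number of clusters and validating them remain separate steps.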

28
Cluster Analysis
  • For example, a client wants to identify and then
    characterize similar groups of TV pilot shows
    based on survey responses rating the shows on
    various traits
  • For one or two traits it may be possible to do
    this subjectively; cluster analysis provides an
    objective method when there are many traits
  • Clusters can be characterized with respect to
    variables not used in the analysis, such as show
    success, and cluster membership can be used as a
    dependent variable in a classification method
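Characterizing clusters by a variable that was not used to form them amounts to comparing that variable across clusters. A minimal sketch, assuming cluster labels are already available and the outside variable is a hypothetical 0/1 indicator of whether each pilot was picked up as a series; the `profile_clusters` helper is also hypothetical.

```python
def profile_clusters(labels, outcome):
    """Mean of an outside variable (not used in clustering) for each cluster."""
    totals = {}
    for lab, value in zip(labels, outcome):
        s, n = totals.get(lab, (0.0, 0))
        totals[lab] = (s + value, n + 1)
    return {lab: s / n for lab, (s, n) in sorted(totals.items())}


# Hypothetical cluster labels and pick-up indicators for six pilots.
labels = [0, 1, 0, 1, 0, 1]
picked_up = [0, 1, 0, 1, 1, 1]
print(profile_clusters(labels, picked_up))  # cluster 0 -> 1/3, cluster 1 -> 1.0
```

The same `labels` list could also serve as the dependent variable for a classification method, as the slide suggests.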

29
Cluster Analysis
Cluster 1: low likelihood of success
Cluster 2: moderate likelihood of success
Cluster 3: high likelihood of success
30
Cluster Analysis
  • Properties
    • Many clustering techniques are available for data
      of all scales
    • Can identify structure in large data sets that
      may be difficult to discover in any other way
    • Provides an objective segmentation method
  • Issues
    • Selecting an appropriate clustering method
    • Determining the appropriate number of clusters
    • Validating clusters