Quantitative Data Analysis - PowerPoint PPT Presentation

Slides: 40
Provided by: mica80

1
Quantitative Data Analysis
Edouard Manet In the Conservatory, 1879
2
  • Quantification of Data
  • Introduction
  • To conduct quantitative analysis, responses to
    open-ended questions in survey research and the
    raw data collected using qualitative methods must
    be coded numerically.

3
  • Quantification of Data
  • Introduction (Continued)
  • Most responses to survey research questions
    already are recorded in numerical format.
  • In mailed and face-to-face surveys, responses are
    keypunched into a data file.
  • In telephone and internet surveys, responses are
    automatically recorded in numerical format.

4
  • Quantification of Data
  • Developing Code Categories
  • Coding qualitative data can use an existing
    scheme or one developed by examining the data.
  • Coding qualitative data into numerical categories
    sometimes can be a straightforward process.
  • Coding occupation, for example, can rely upon
    numerical categories defined by the Bureau of the
    Census.

5
  • Quantification of Data
  • Developing Code Categories (Continued)
  • Coding most forms of qualitative data, however,
    requires much effort.
  • This coding typically requires using an iterative
    procedure of trial and error.
  • Consider, for example, coding responses to the
    question, "What is the biggest problem in
    attending college today?"
  • The researcher must develop a set of codes that
    are
  • exhaustive of the full range of responses.
  • mutually exclusive (mostly) of one another.

6
  • Quantification of Data
  • Developing Code Categories (Continued)
  • In coding responses to the question, "What is the
    biggest problem in attending college today?", the
    researcher might begin, for example, with a list
    of 5 categories, then realize that 8 would be
    better, then realize that it would be better to
    combine categories 1 and 5 into a single category
    and use a total of 7 categories.
  • Each time the researcher makes a change in the
    coding scheme, it is necessary to restart the
    coding process to code all responses using the
    same scheme.
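
The recoding step described above can be sketched in Python. This is only an illustration, not the presenter's procedure: the name RECODE_MAP, the renumbering of the remaining categories, and the sample codes are all hypothetical. It merges categories 1 and 5 of an eight-category scheme and re-codes every response under the resulting seven-category scheme.

```python
# Hypothetical map from an 8-category scheme to a 7-category scheme
# in which old categories 1 and 5 are combined into new category 1.
RECODE_MAP = {1: 1, 5: 1, 2: 2, 3: 3, 4: 4, 6: 5, 7: 6, 8: 7}


def recode(old_codes):
    """Re-code every response so that all of them use the same scheme."""
    return [RECODE_MAP[code] for code in old_codes]


old_codes = [1, 5, 3, 8, 2, 5]  # illustrative codes, not real data
new_codes = recode(old_codes)
print(new_codes)
```

Keeping the map in one place makes it easier to rerun the whole coding pass each time the scheme changes.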

7
  • Quantification of Data
  • Developing Code Categories (Continued)
  • Suppose one wanted to code more complex
    qualitative data (e.g., videotape of an
    interaction between husband and wife) into
    numerical categories.
  • How does one code the many statements, facial
    expressions, and body language inherent in such
    an interaction?
  • One can realize from this example that coding
    schemes can become highly complex.

8
  • Quantification of Data
  • Developing Code Categories (Continued)
  • Complex coding schemes can take many attempts to
    develop.
  • Once developed, they undergo continuing
    evaluation.
  • Major revisions, however, are unlikely.
  • Rather, new coders are required to learn the
    existing coding scheme and undergo continuing
    evaluation for their ability to correctly apply
    the scheme.

9
  • Quantification of Data
  • Codebook Construction
  • The end product of developing a coding scheme is
    the codebook.
  • This document describes in detail the procedures
    for transforming qualitative data into numerical
    responses.
  • The codebook should include notes that describe
    the process used to create codes, detailed
    descriptions of codes, and guidelines to use when
    uncertainty exists about how to code responses.
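
One possible way to hold a codebook in software is sketched below. The CodebookEntry class, its field names, and the three sample categories are assumptions made for illustration; the slides do not prescribe any particular layout.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    code: int                 # numerical code assigned to the response
    label: str                # short name of the category
    description: str          # detailed description of the code
    guidelines: list = field(default_factory=list)  # notes for unclear cases


# Hypothetical entries for a question about problems in attending college.
codebook = {
    1: CodebookEntry(1, "Cost", "Mentions tuition, fees, or debt.",
                     ["If two problems are named, code the first one."]),
    2: CodebookEntry(2, "Time", "Mentions workload or scheduling."),
    9: CodebookEntry(9, "Other", "Any response not covered above."),
}


def label_for(code):
    """Return the category label for a numeric code (KeyError if unknown)."""
    return codebook[code].label


print(label_for(1))
```

A catch-all entry such as "Other" is one simple way to keep a scheme exhaustive.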

10
  • Quantification of Data
  • Data Entry
  • Data recorded in numerical format can be entered
    by keypunching or the use of sophisticated
    optical scanners.
  • Typically, responses to internet and telephone
    surveys are entered directly into a numerical
    database.
  • Cleaning Data
  • Logical errors in responses must be reconciled.
  • Errors of entry must be corrected.
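
A minimal sketch of a cleaning pass follows. The records, field names, and limits are hypothetical and chosen only to show the two kinds of problems named above: values that suggest an entry error and responses that contradict one another.

```python
# Hypothetical survey records; field names and limits are illustrative only.
records = [
    {"id": 1, "age": 19, "years_in_college": 1},
    {"id": 2, "age": 20, "years_in_college": 25},  # logically inconsistent
    {"id": 3, "age": 200, "years_in_college": 2},  # likely entry error
]


def find_problems(record):
    """Return a list of problems found in one record."""
    problems = []
    if not 0 <= record["age"] <= 120:
        problems.append("age out of range")
    if record["years_in_college"] > record["age"]:
        problems.append("years in college exceeds age")
    return problems


# Keep only the records that have at least one problem.
flagged = {r["id"]: find_problems(r) for r in records if find_problems(r)}
print(flagged)
```

Flagging rather than silently changing values leaves the decision about how to reconcile each case with the researcher.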

11
  • Univariate Analysis
  • Distributions
  • Data analysis begins by examining distributions.
  • One might begin, for example, by examining the
    distribution of responses to a question about
    formal education, where responses are recorded
    within six categories.
  • A frequency distribution will show the number and
    percent of responses in each category of a
    variable.
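
A frequency distribution of this kind might be computed as follows. The coded responses are hypothetical; the sketch only shows how the number and percent in each category can be obtained.

```python
from collections import Counter

# Hypothetical coded responses (1-6) to a formal-education question.
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]

counts = Counter(responses)
total = len(responses)

# Print the number and percent of responses in each category.
for category in sorted(counts):
    n = counts[category]
    print(f"{category}: n={n}, {100 * n / total:.1f}%")
```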

12
  • Univariate Analysis
  • Central Tendency
  • A common measure of central tendency is the
    average, or mean, of the responses.
  • The median is the value of the middle case when
    all responses are rank-ordered.
  • The mode is the most common response.
  • When data are highly skewed, meaning pulled
    toward one end of the distribution by extreme
    values, the median or mode might better represent
    the most common or centered response.

13
  • Univariate Analysis
  • Central Tendency (Continued)
  • Consider this distribution of respondent ages:
  • 18, 19, 19, 19, 20, 20, 21, 22, 85
  • The mean equals 27. But this number does not
    adequately represent the common respondent
    because the one person who is 85 skews the
    distribution toward the high end.
  • The median equals 20.
  • This measure of central tendency gives a more
    accurate portrayal of the middle of the
    distribution.
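
The figures on this slide can be checked with Python's standard statistics module. The ages are the ones listed above; the mode is included only for comparison.

```python
import statistics

ages = [18, 19, 19, 19, 20, 20, 21, 22, 85]

print(statistics.mean(ages))    # 27
print(statistics.median(ages))  # 20
print(statistics.mode(ages))    # 19
```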

14
  • Univariate Analysis
  • Dispersion
  • Dispersion refers to the way the values are
    distributed around some central value, typically
    the mean.
  • The range is the distance separating the lowest
    and highest values (e.g., the ages listed
    previously run from 18 to 85, a range of 67).
  • The standard deviation is an index of the amount
    of variability in a set of data.
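
As a rough sketch, both measures can be computed for the ages listed earlier. The choice of the sample standard deviation (dividing by n - 1) is an assumption; the slides do not say which version is intended.

```python
import statistics

ages = [18, 19, 19, 19, 20, 20, 21, 22, 85]

value_range = max(ages) - min(ages)    # 85 - 18 = 67
sample_sd = statistics.stdev(ages)     # sample standard deviation (n - 1)

# Dropping the 85-year-old shows how much one extreme value adds.
sd_without_85 = statistics.stdev(ages[:-1])

print(value_range, round(sample_sd, 2), round(sd_without_85, 2))
```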

15
  • Univariate Analysis
  • Dispersion (Continued)
  • The standard deviation represents dispersion with
    respect to the normal (bell-shaped) curve.
  • Assuming a set of numbers is normally
    distributed, then each standard deviation equals
    a certain distance from the mean.
  • Successive standard deviations (1, 2, etc.) are
    equally spaced along the bell-shaped curve, but
    each represents a declining percentage of
    responses because of the shape of the curve (see
    Chapter 7).

16
  • Univariate Analysis
  • Dispersion (Continued)
  • For example, the first standard deviation
    accounts for 34.1% of the values on each side of
    the mean.
  • The figure 34.1% is derived from probability
    theory and the shape of the curve.
  • Thus, approximately 68% of all responses fall
    within one standard deviation of the mean.
  • The second standard deviation accounts for the
    next 13.6% of the responses on each side of the
    mean (27.2% of all responses), and so on.
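
These percentages can be reproduced from the standard normal curve. The sketch below uses NormalDist from Python's standard library and assumes a perfectly normal distribution.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

first_sd_one_side = z.cdf(1) - z.cdf(0)    # about 0.341
within_one_sd = z.cdf(1) - z.cdf(-1)       # about 0.683
second_sd_one_side = z.cdf(2) - z.cdf(1)   # about 0.136

print(f"{first_sd_one_side:.3f} {within_one_sd:.3f} {second_sd_one_side:.3f}")
```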

17
  • Univariate Analysis
  • Dispersion (Continued)
  • If the responses are approximately normally
    distributed and the range of responses is low,
    meaning that most responses fall close to the
    mean, then the standard deviation will be small.
  • The standard deviation of professional golfers'
    scores on a golf course will be low.
  • The standard deviation of amateur golfers' scores
    on a golf course will be high.

18
  • Univariate Analysis
  • Continuous and Discrete Variables
  • Continuous variables have responses that form a
    steady progression (e.g., age, income).
  • Discrete (i.e., categorical) variables have
    responses that are considered to be separate from
    one another (e.g., sex of respondent, religious
    affiliation).

19
  • Univariate Analysis
  • Continuous and Discrete Variables
  • Sometimes, it is a matter of debate within the
    community of scholars about whether a measured
    variable is continuous or discrete.
  • This issue is important because the statistical
    procedures appropriate for continuous-level data
    are more powerful, easier to use, and easier to
    interpret than those for discrete-level data,
    especially as related to the measurement of the
    dependent variable.

20
  • Univariate Analysis
  • Continuous and Discrete Variables (Continued)
  • Example: Suppose one measures amount of formal
    education within five categories (less than high
    school, high school, 2 years of vocational
    school/college, college, post-college).
  • Is this measure continuous (i.e., 1-5) or
    discrete?
  • In practice, five categories seems to be a cutoff
    point for considering a variable as continuous.
  • Using a seven-point response scale will give the
    researcher a greater chance of deeming a variable
    to be continuous.

21
  • Subgroup Comparisons
  • Collapsing Response Categories
  • Sometimes the researcher might want to analyze a
    variable by using fewer response categories than
    were used to measure it.
  • In these instances, the researcher might want to
    collapse one or more categories into a single
    category.
  • The researcher might want to collapse categories
    to simplify the presentation of the results or
    because few observations exist within some
    categories.

22
  • Subgroup Comparisons
  • Collapsing Response Categories Example
  • Response                       Frequency
  • Strongly disagree                      2
  • Disagree                              22
  • Neither agree nor disagree            45
  • Agree                                 31
  • Strongly agree                         1

23
  • Subgroup Comparisons
  • Collapsing Response Categories Example
  • One might want to collapse the extreme responses
    and work with just three categories:
  • Response                       Frequency
  • Disagree                              24
  • Neither agree nor disagree            45
  • Agree                                 32
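
The collapsing shown on these two slides can be sketched in Python. The frequencies are the ones given above; the COLLAPSE mapping is simply one way to express which categories are merged.

```python
from collections import Counter

frequencies = {
    "Strongly disagree": 2,
    "Disagree": 22,
    "Neither agree nor disagree": 45,
    "Agree": 31,
    "Strongly agree": 1,
}

# Each extreme response is folded into its neighboring category.
COLLAPSE = {
    "Strongly disagree": "Disagree",
    "Strongly agree": "Agree",
}

collapsed = Counter()
for response, n in frequencies.items():
    collapsed[COLLAPSE.get(response, response)] += n

print(dict(collapsed))
```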

24
  • Subgroup Comparisons
  • Handling "Don't Knows"
  • When asking about knowledge of factual
    information ("Does your teenager drink alcohol?")
    or opinions on a topic the subject might not know
    much about ("Do school officials do enough to
    discourage teenagers from drinking alcohol?"), it
    is wise to include a "don't know" category as a
    possible response.
  • Analyzing "don't know" responses, however, can be
    a difficult task.

25
  • Subgroup Comparisons
  • Handling "Don't Knows" (Continued)
  • The research-on-research literature regarding
    this issue is complex and without clear-cut
    guidelines for decision-making.
  • The decisions about whether to use "don't know"
    response categories and how to code and analyze
    them tend to be idiosyncratic to the research
    and the researcher.

26
  • Bivariate Analysis
  • Introduction
  • Bivariate analysis refers to an examination of
    the relationship between two variables.
  • We might ask these questions about the
    relationship between two variables:
  • Do they seem to vary in relation to one another?
    That is, as one variable increases in size does
    the other variable increase or decrease in size?
  • What is the strength of the relationship between
    the variables?

27
  • Bivariate Analysis
  • Bivariate Tables
  • Divide the cases into groups according to the
    attributes of the independent variable (e.g., men
    and women).
  • Describe each subgroup in terms of attributes of
    the dependent variable (e.g., what percent of men
    approve of sexual equality and what percent of
    women approve of sexual equality).

28
  • Bivariate Analysis
  • Bivariate Tables (Continued)
  • Read the table by comparing the independent
    variable subgroups with one another in terms of a
    given attribute of the dependent variable (e.g.,
    compare the percentages of men and women who
    approve of sexual equality).
  • Bivariate analysis gives an indication of how the
    dependent variable differs across levels or
    categories of an independent variable.
  • This relationship does not necessarily indicate
    causality (see Chapter 15).
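
A bivariate table of this kind can be sketched as follows. The cases are made up for illustration and are not real survey results; the point is only that percentages are computed within each category of the independent variable and then compared across categories.

```python
from collections import Counter

# Hypothetical (sex, approves) pairs; not real survey results.
cases = [
    ("men", True), ("men", True), ("men", False), ("men", False),
    ("women", True), ("women", True), ("women", True), ("women", False),
]

# Base of the percentages: number of cases in each subgroup.
group_sizes = Counter(sex for sex, _ in cases)
approvals = Counter(sex for sex, approves in cases if approves)

percent_approving = {
    sex: 100 * approvals[sex] / group_sizes[sex] for sex in group_sizes
}
print(percent_approving)
```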

29
  • Bivariate Analysis
  • Contingency Tables
  • Tables that compare responses to a dependent
    variable across levels/categories of an
    independent variable are called contingency
    tables (or sometimes, crosstabs).
  • When writing a research report, it is common
    practice, even when conducting highly
    sophisticated statistical analysis, to present
    contingency tables also to give readers a sense
    of the distributions and bivariate relationships
    among variables.

30
  • Bivariate Analysis
  • Contingency Tables (Continued)
  • A table should have a title that succinctly
    describes what is contained in the table.
  • If a table lists information about a scale or
    index, then it or a prior table should list the
    statements used to measure the scale or index.
  • The attributes of each variable should be clearly
    indicated.
  • The base of percentages should be reported.
  • Notes should be provided about missing data.

31
  • Multivariate Analysis
  • Introduction
  • Although informative, bivariate analysis can
    mislead the researcher regarding cause and
    effect.
  • Multivariate analysis (see Ch. 15-16) often is
    needed to gain a better understanding of cause
    and effect among variables.
  • Multivariate analysis can involve the
    introduction of a third variable into a
    contingency table, or it can involve more
    sophisticated analysis and presentation of
    relationships among variables.

32
  • Multivariate Techniques
  • Factor Analysis
  • Factor analysis indicates the extent to which a
    set of variables measures the same underlying
    concept.
  • This procedure assesses the extent to which
    variables are highly correlated with one another
    compared with other sets of variables.
  • Consider the table of correlations (i.e., a
    correlation matrix) on the following slide

33
  • Multivariate Techniques
  • Factor Analysis (Continued)

          X1    X2    X3    X4    X5    X6
    X1     1   .52   .60   .21   .15   .09
    X2   .52     1   .59   .12   .13   .11
    X3   .60   .59     1   .08   .10   .10
    X4   .21   .12   .08     1   .72   .70
    X5   .15   .13   .10   .72     1   .73
    X6   .09   .11   .10   .70   .73     1

34
  • Multivariate Techniques
  • Factor Analysis (Continued)
  • Note that variables X1-X3 are moderately
    correlated with one another, but have weak
    correlations with variables X4-X6.
  • Similarly, variables X4-X6 are moderately
    correlated with one another, but have weak
    correlations with variables X1-X3.
  • The figures in this table indicate that variables
    X1-X3 go together and variables X4-X6 go
    together.
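
The grouping can be made visible with a very simple rule. The sketch below is not factor analysis: it only links variables whose correlation meets an assumed cutoff of .50, which is a crude stand-in chosen for illustration. The correlations are the off-diagonal values from the matrix.

```python
# Off-diagonal correlations from the matrix on the previous slide.
correlations = {
    ("X1", "X2"): .52, ("X1", "X3"): .60, ("X1", "X4"): .21,
    ("X1", "X5"): .15, ("X1", "X6"): .09, ("X2", "X3"): .59,
    ("X2", "X4"): .12, ("X2", "X5"): .13, ("X2", "X6"): .11,
    ("X3", "X4"): .08, ("X3", "X5"): .10, ("X3", "X6"): .10,
    ("X4", "X5"): .72, ("X4", "X6"): .70, ("X5", "X6"): .73,
}

THRESHOLD = 0.5  # assumed cutoff, chosen only for this illustration


def group_variables(correlations, threshold):
    """Put variables in the same group when they are linked by a
    correlation at or above the threshold (not a factor analysis)."""
    groups = []
    for (a, b), r in correlations.items():
        if r < threshold:
            continue
        # Merge the pair with any existing groups that share a variable.
        touching = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    return sorted(sorted(g) for g in groups)


print(group_variables(correlations, THRESHOLD))
```

A real factor analysis would also report loadings and other statistics; this sketch only reproduces the visual impression that X1-X3 and X4-X6 form two clusters.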

35
  • Multivariate Techniques
  • Factor Analysis (Continued)
  • Factor analysis would separate variables X1-X3
    into Factor 1 and variables X4-X6 into Factor
    2.
  • Suppose variables X1-X3 were designed by the
    researcher to measure self-esteem and variables
    X4-X6 were designed to measure marital
    satisfaction.

36
  • Multivariate Techniques
  • Factor Analysis (Continued)
  • The researcher could use the results of factor
    analysis, including the statistics produced by
    it, to evaluate the construct validity of using
    X1-X3 to measure self-esteem and using X4-X6 to
    measure marital satisfaction.
  • Thus, factor analysis can be a useful tool for
    confirming the validity of measures of latent
    variables.

37
  • Multivariate Techniques
  • Factor Analysis (Continued)
  • Factor analysis can be used also for exploring
    groupings of variables.
  • Suppose a researcher has a list of 20 statements
    that measure different opinions about same-sex
    marriage.
  • The researcher might wonder whether the 20
    opinions reflect a smaller number of basic
    opinions.

38
  • Multivariate Techniques
  • Factor Analysis (Continued)
  • Factor analysis of responses to these statements
    might indicate, for example, that they can be
    reduced into three latent variables, related to
    religious beliefs, beliefs about civil rights,
    and beliefs about sexuality.
  • Then, the researcher can create scales of the
    grouped variables to measure religious beliefs,
    beliefs about civil rights, and beliefs about
    sexuality to examine support for same-sex
    marriage.

39
Questions?