Title: Quantitative Data Analysis
1 Quantitative Data Analysis
- Descriptive statistics
- Bivariate relationships
2 Bivariate statistics
- Describe the relationship between two variables
- e.g. water pollution and health
- education and health-promoting behaviour
- gender and liberalism
- The statistical analysis of two variables tells us about the relationship between these variables
3 Covariance and independence
- Two important concepts for understanding bivariate statistical relationships
- Covariance means that two things go together: they are associated with each other.
- changes in one variable are reflected in changes in the other variable (they covary).
- higher incomes, higher life expectancy
- higher education, higher job prestige
4
- Independence means that the variables are not related: there is no association
- opposite of covariation
- The two variables are independent/unrelated.
- changes in one variable are not associated with changes in the other variable.
- number of siblings does not influence life expectancy
5 Causal statements/hypotheses
- In inferential statistics (discussed later), independence and covariance are reflected in the null hypothesis and the hypothesis.
- The null hypothesis: the variables are not related
- independence
- A hypothesis: there is a causal relationship
- covariance
- you predict the variables are related (they covary)
6 Bivariate techniques
- Scattergram
- a graph of the relationship
- Cross-tabulation or contingency table
- display the relationship in a table
- Measures of association
- statistical measures (the amount of covariation is expressed in terms of a value)
- also called a correlation coefficient
7 Bivariate Contingency Table: Cross-tabulation
- A bivariate table cross-tabulates or cross-classifies two (or more) variables
- used for categorical or grouped data (i.e. to condense interval or ratio level data)
- Called a contingency table: the distribution of cases in one category of a variable is contingent upon the categories of the second variable
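A cross-tabulation can be sketched in a few lines of Python. This is only an illustration: the two variables, their categories, and the six cases are invented for the example and do not come from the course text.

```python
from collections import Counter

# Hypothetical cases: (education level, exercises regularly). Invented data.
cases = [
    ("low", "no"), ("low", "no"), ("low", "yes"),
    ("high", "yes"), ("high", "yes"), ("high", "no"),
]

def crosstab(pairs):
    """Count how many cases fall in each (first variable, second variable) cell."""
    return Counter(pairs)

table = crosstab(cases)

# Print one line per cell of the contingency table.
for (education, exercise), count in sorted(table.items()):
    print(f"education={education:<4} exercise={exercise:<3} count={count}")
```

Each key of `table` is one cell of the contingency table, and its value is the raw count of cases in that cell.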
8 Percentage Table (see Table 11.9)
- Contingency tables report (or count) the number of cases in each cell: they report raw data
- Researchers convert raw count tables into percentage tables. Why?
- Constructing cross-tabulation tables (see Rules on p.371)
- run percentages toward the independent variable
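One reason to convert counts to percentages is that groups of different sizes become directly comparable. The sketch below assumes that "toward the independent variable" means computing percentages within each category of the independent variable, so that every category sums to 100%. The counts and category names are invented, and the function assumes every category has at least one case.

```python
def percentage_table(counts):
    """Convert raw cell counts to percentages within each independent-variable category.

    counts maps independent category -> {dependent category: raw count}.
    Assumes every independent category contains at least one case.
    """
    result = {}
    for category, cells in counts.items():
        total = sum(cells.values())
        result[category] = {dep: 100.0 * n / total for dep, n in cells.items()}
    return result

# Hypothetical raw counts (invented): education is the independent variable.
raw_counts = {
    "low education": {"exercises": 10, "does not exercise": 30},
    "high education": {"exercises": 45, "does not exercise": 15},
}

percentages = percentage_table(raw_counts)
for category, cells in percentages.items():
    print(category, {dep: f"{pct:.0f}%" for dep, pct in cells.items()})
```

With these made-up counts the two education groups have 40 and 60 cases, so the raw counts 10 and 45 are hard to compare, while the percentages can be read side by side.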
9 Bivariate Tables: Comparing Means
- Cross-tabulation is used when variables are categorical (nominal/ordinal) or when interval or ratio level data are grouped.
- When the independent variable is nominal and the dependent variable is interval/ratio, we compare the means for the two (or more) categories (see Table 11.19, p. 376)
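Comparing means across categories might look like the following sketch. The nominal variable (region), the ratio variable (years of schooling), and all values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from collections import defaultdict
from statistics import mean

def group_means(cases):
    """Return the mean of the dependent variable for each category of the independent variable."""
    groups = defaultdict(list)
    for category, value in cases:
        groups[category].append(value)
    return {category: mean(values) for category, values in groups.items()}

# Hypothetical cases (invented): (region, years of schooling)
cases = [("urban", 14), ("urban", 16), ("rural", 11), ("rural", 13), ("rural", 12)]

means = group_means(cases)
for region, avg in means.items():
    print(f"{region}: mean years of schooling = {avg}")
```

The table of means has one row per category of the independent variable, which is why this approach suits a nominal independent variable paired with an interval/ratio dependent variable.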
10 Correlation: scattergram (see p.383)
- researcher plots all the cases on a graph, with each axis representing one variable
- used for interval/ratio level data
- not used for nominal or ordinal data
- place the independent variable on the horizontal (or X) axis and the dependent variable on the vertical (or Y) axis
- lowest value in lower left corner; highest values at the top (for Y) and far right (for X)
11 1. Form of the relationship: independence, linear, and curvilinear
- Independence: no relationship
- Cases form a random scatter: no pattern
- Linear relationship
- Cases are located around an imaginary straight line from one corner to another
- Curvilinear relationship
- Cases form a U curve, an inverted U curve, or an S curve
12 2. Direction of the relationship: positive or negative
- Positive relationship: the higher the value of the X variable, the higher the value of the Y variable (and vice versa). Your examples?
- higher education, higher income
- lower education, lower income
- the cases form a diagonal pattern (line) from the lower left-hand corner to the upper right
13 2. Direction of the relationship: positive or negative
- Negative relationship: the higher the value on X, the lower the value on Y. Examples?
- higher education, lower number of arrests
- greater social integration, lower depression
- the cases form a diagonal pattern (line) from the top left-hand side of the graph to the lower right-hand side of the graph
- can have a shallow or steep slope
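The direction of a linear relationship can also be read from the sign of the sample covariance. The slides do not give a formula, so the sketch below uses the standard textbook definition as an assumption. The education, income, and arrest figures are invented.

```python
def covariance(xs, ys):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def direction(xs, ys):
    """Label the direction of a linear relationship from the sign of the covariance."""
    cov = covariance(xs, ys)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"

# Hypothetical data (invented): years of education, income in thousands, number of arrests.
education = [8, 10, 12, 14, 16]
income = [20, 24, 31, 38, 45]
arrests = [5, 4, 2, 1, 0]

print("education and income:", direction(education, income))
print("education and arrests:", direction(education, arrests))
```

The sign gives only the direction. The size of the covariance depends on the units of measurement, so it does not by itself say how strong the relationship is.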
14 3. The degree of precision
- Precision refers to the spread of points on the graph: the amount of spread
- High precision: cases hug the line (not spread)
- Low precision: considerable dispersion (spread) of cases around the line
- Scattergram: researchers eyeball precision
- Can also use advanced statistics to measure the degree of precision of a relationship
15 Measures of Association
- A measure of association is a statistical computation which produces a single value or number that indicates the strength of the relationship between two variables.
- It indicates the degree to which the two variables go together or covary
- Is there an association between the variables?
- Are they correlated?
- Is there a strong or weak relationship?
16 Measures of Association
- There are many measures of association (lambda, gamma, tau, chi-squared, rho)
- the correct choice depends on the level of measurement of the variables
- interpretation depends upon the measure used
17 Five most commonly used measures of association
- Measure | Type of data | High association | Independence
- Lambda | Nominal | 1.0 | 0
- Gamma | Ordinal | 1.0, -1.0 | 0
- Tau | Ordinal | 1.0, -1.0 | 0
- Rho | Interval, ratio | 1.0, -1.0 | 0
- Chi-squared | Nominal, ordinal | Infinity | 0
18 Summary of Measures of Association
- Rho (Pearson's r) is the most commonly used measure of association
- tells how well the data fit the (regression) line on a scattergram (-1 to 1; 0 = independence)
- rho measures linear relationships only: 0 can mean no relationship or a curvilinear relationship
- Chi-squared is used as a descriptive statistic (measure of association) and as an inferential statistic
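The point that rho captures only linear relationships can be checked with a small sketch. It uses the standard formula for Pearson's r (covariation divided by the square root of the product of the two sums of squares), which the slides do not spell out. The data are invented: one perfectly linear set and one perfect U curve.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariation of x and y scaled by the spread of each variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: the same x values paired with a straight line and with a U curve.
xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]   # every case lies exactly on a rising line
u_curve = [x ** 2 for x in xs]     # y is fully determined by x, but not linearly

r_linear = pearson_r(xs, linear)
r_curve = pearson_r(xs, u_curve)
print("linear:", r_linear)
print("U curve:", r_curve)
```

In the U-curve data y is completely determined by x, yet r comes out at zero, which is why a scattergram is worth inspecting before reading r = 0 as independence.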
19 Interpreting measures of association: proportion reduction in error
- How much does information about one variable reduce the errors that are made when guessing the values of the other variable?
- Independence
- The measure of association = 0
- knowing about one variable will not help you to guess the value of the other variable
20 Interpreting measures of association: proportion reduction in error
- Correlation/Association
- knowing about one variable reduces the error in predicting the values of the other variable
- strong association: few errors in predicting the second variable on the basis of the first
- weak association: proportion of errors is larger
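Lambda is a convenient way to see proportion reduction in error at work. The sketch below uses the standard Goodman and Kruskal definition, lambda = (E1 - E2) / E1, where E1 counts the errors made by always guessing the most common category of the dependent variable and E2 counts the errors made when guessing the most common category within each category of the independent variable. The tables are invented, and the function assumes E1 is greater than zero.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the dependent variable from the independent variable.

    table maps independent category -> {dependent category: count}.
    Assumes the dependent variable has cases in more than one category (E1 > 0).
    """
    # Marginal totals of the dependent variable, ignoring the independent variable.
    dep_totals = {}
    for cells in table.values():
        for dep, n in cells.items():
            dep_totals[dep] = dep_totals.get(dep, 0) + n
    total = sum(dep_totals.values())

    # E1: errors when always guessing the overall modal category.
    errors_without = total - max(dep_totals.values())
    # E2: errors when guessing the modal category within each independent category.
    errors_with = sum(sum(cells.values()) - max(cells.values()) for cells in table.values())
    return (errors_without - errors_with) / errors_without

# Hypothetical tables (invented counts).
related = {
    "low education": {"exercises": 10, "does not exercise": 30},
    "high education": {"exercises": 45, "does not exercise": 15},
}
independent = {
    "group a": {"x": 30, "y": 10},
    "group b": {"x": 30, "y": 10},
}

lambda_related = goodman_kruskal_lambda(related)
lambda_independent = goodman_kruskal_lambda(independent)
print(f"related: {lambda_related:.2f}")
print(f"independent: {lambda_independent:.2f}")
```

In the first table, knowing education cuts the guessing errors from 45 to 25. In the second, knowing the group changes nothing, so lambda is 0.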
21 Understanding Rho
- 0 = the variables are not correlated at all
- 1 = perfect positive correlation
- -1 = perfect negative correlation
- Interpret the following
- The correlation between women's full-time salaries and men's full-time salaries is .70
- height and intelligence is .02