Title: Correlation
1 Correlation
- K300 Class 26
- April 16, 2009
2 Overview
- Types of analyses
- Correlation and regression
- Scattergrams
- Correlation
- Correlation and causation
3 Types of analyses
- One-way analysis of variance (ANOVA)
  - Whether the mean of a quantitative (interval or ratio) variable varies with category (a nominal or ordinal variable)
- Chi-square test of independence
  - Whether the distribution across one set of categories depends on the distribution across another set of categories (two nominal or ordinal variables)
4 Types of analyses
- So we have quantitative with categorical variables (ANOVA)
- And categorical with categorical variables (chi-square test of independence)
- What about quantitative with quantitative variables?
- That's correlation and regression
5 Correlation and regression
- Correlation addresses whether one variable varies as another varies
- Regression addresses the ability of one variable to predict another variable
- Both are part of the relationship of one quantitative variable to another quantitative variable
6 Scattergrams
- Start with a plot of the values of one variable versus the values of the other variable
- See if there is a pattern
- Does one variable tend to increase (or decrease) as the other variable increases?
7 Scattergrams
- Looking at large metropolitan areas
- Is the median rent people pay for rental housing related to the median family income in the metropolitan area?
8 Rent versus income
9 Correlation
- Measure of the extent to which one variable varies with the other in a linear (straight-line) relationship
10 Simple correlation example
- Is there a relationship between the number of radio ads aired per week and the amount of sales (in thousands)?
11 Scattergram
12 Correlation coefficient r
13 Calculating correlation coefficient
14 Calculating correlation coefficient
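For reference, a standard computational form of the Pearson product-moment correlation coefficient for n paired observations (x, y) is sketched below. The notation is an assumption here and may differ from the form used in class.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[\,n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[\,n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```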
15 Hypothesis test about the correlation coefficient
- Is there a correlation, a relationship, in the population?
- Or is there no correlation, no relationship, in the population? (null hypothesis)
16 Step 1: state hypotheses
- H0: ρ = 0 (population correlation = 0)
- H1: ρ ≠ 0 (population correlation ≠ 0)
17 Step 2: critical value
- α = 0.05
- d.f. = n - 2 = 6 - 2 = 4
- C.V.: t = ±2.776
18 Step 3: test value
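A minimal sketch of the test value calculation, assuming the usual t statistic for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)), with r = 0.988 and n = 6 taken from this example. The function name `t_from_r` is hypothetical.

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = 0.988               # sample correlation from the radio ads example
n = 6                   # number of cases
critical_value = 2.776  # two-tailed, alpha = 0.05, d.f. = 4 (from the t table)

t = t_from_r(r, n)
print(f"t = {t:.2f}")
print("reject H0" if abs(t) > critical_value else "do not reject H0")
```

With these inputs the test value comes out near 12.8, well beyond the critical value.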
19 Step 4: make decision
- The t test value is greater than the critical value
- Reject the null hypothesis
20 Step 5: summarize the results
- There is a significant linear relationship between sales and the number of radio ads aired per week
21 Alternative hypothesis test for correlation
- The t test statistic for the correlation coefficient depends only on the correlation coefficient r and the number of cases n, which determines the number of degrees of freedom
- So we can develop a table of critical values for the correlation coefficient r itself; no need to compute a separate test statistic
- Use Table I, Critical Values for PPMC (Pearson Product-Moment Correlation Coefficient)
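Why such a table is possible can be sketched by solving the t statistic for r: for a critical t with d.f. = n - 2, the matching critical r is t / sqrt(t^2 + d.f.). The sketch below assumes that relationship and the two-tailed critical t of 2.776 for d.f. = 4. The function name `critical_r` is hypothetical.

```python
import math

def critical_r(t_crit, df):
    """Convert a critical t value into the equivalent critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

r_crit = critical_r(2.776, 4)
print(round(r_crit, 3))  # -> 0.811
```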
22 Step 1: state hypotheses
- H0: ρ = 0 (population correlation = 0)
- H1: ρ ≠ 0 (population correlation ≠ 0)
23 Step 2: critical value
- α = 0.05
- d.f. = n - 2 = 6 - 2 = 4
- C.V.: r = ±0.811
24 Step 3: test value
25 Step 4: make decision
- The r value of 0.988 is greater than the critical value of 0.811
- Reject the null hypothesis
26 Step 5: summarize the results
- There is a significant linear relationship between sales and the number of radio ads aired per week
27 Some other issues
- Direction of relationship
- Assumption of linear relationship
- Assumption of bivariate normal distribution for doing the hypothesis test
28 Direction of relationship
- An increase in one variable with an increase in the other gives a positive correlation
  - 0 < r ≤ 1
- A decrease in one variable with an increase in the other gives a negative correlation
  - -1 ≤ r < 0
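A small illustration of the sign of r, using made-up data (not from the class example) and a hand-written helper. The name `pearson_r` is hypothetical; it uses the deviation-score form of the Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_up = [2, 3, 5, 4, 6]    # tends to rise as x rises
y_down = [6, 4, 5, 3, 2]  # tends to fall as x rises

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(round(r_up, 3), round(r_down, 3))  # -> 0.9 -0.9
```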
29 Assumption of linear relationship
- Correlation assumes the relationship between one variable and the other is linear
  - Straight line on the scatterplot
  - Change in one variable is always proportional to change in the other
- Perfect nonlinear relationships can produce zero correlation
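The last point can be checked with a made-up example: y = x^2 over values of x symmetric about zero is a perfect (but nonlinear) relationship, yet its Pearson correlation is zero. The helper name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]   # y is completely determined by x

r = pearson_r(x, y)
print(r)  # positive and negative deviations cancel, so r is 0
```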
30 Assumption of bivariate normal distribution
- Doing the hypothesis test for the correlation coefficient requires the assumption of a bivariate normal distribution
  - Both variables are normally distributed
  - The combined distribution (in three dimensions) is also normal (a normal distribution "hat")
- The hypothesis test is robust with respect to this assumption, however
  - This means it will still generally give reasonable results even when the assumption is violated
31 Correlation and causation
- Correlation does not necessarily imply causation
- That one variable increases along with another does not necessarily mean that it causes the other variable to increase
32 Correlation and causation
- Correlation between two variables can be caused by their relationships to a third variable
- The number of births in towns in England may be correlated with the number of storks, but