Bivariate Regression and Correlation - PowerPoint PPT Presentation

About This Presentation

Title:

Bivariate Regression and Correlation

Slides: 20

Provided by: homeUc

Transcript and Presenter's Notes

Title: Bivariate Regression and Correlation

1
Bivariate Regression and Correlation

Lecture 5

2
Analytic Tool

Answer the following:
Suppose that you have a sample of 64 individuals
with a sample mean of 20 and a sample standard
deviation of 16.
Can you reject the null hypothesis that the
population mean is less than or equal to zero at
the .05 significance level?
Can you reject the null hypothesis that the
population mean is less than or equal to 18 at
the .05 significance level?

3
Answer to the Analytic Tool

Suppose that you have a sample of 64 individuals
with a sample mean of 20 and a sample standard
deviation of 16.
The one-tailed critical value of the t-statistic
at the .05 significance level with 63 degrees of
freedom is about 1.67.
To determine whether the mean is significantly
greater than zero, we calculate
$t = (20 - 0) / (16 / \sqrt{64}) = 20 / 2 = 10$
Since $t > 1.67$, we can reject the null
hypothesis.
To determine whether the mean is significantly
greater than 18, we calculate
$t = (20 - 18) / (16 / \sqrt{64}) = 2 / 2 = 1$
Since $t < 1.67$, we cannot reject the null
hypothesis.
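
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `one_sample_t` and the constant `CRITICAL_T` are illustrative names, and the critical value is hard-coded as an approximation rather than looked up from a t-distribution.

```python
import math

def one_sample_t(sample_mean, sample_sd, n, null_value):
    """t-statistic for the null hypothesis that the mean is <= null_value."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - null_value) / standard_error

# Assumption: approximate one-tailed critical value at the .05 level
# with 63 degrees of freedom, hard-coded rather than computed.
CRITICAL_T = 1.67

t_zero = one_sample_t(20, 16, 64, 0)       # null value of 0
t_eighteen = one_sample_t(20, 16, 64, 18)  # null value of 18

print(t_zero, t_zero > CRITICAL_T)          # reject when t exceeds the critical value
print(t_eighteen, t_eighteen > CRITICAL_T)
```

The first test rejects the null hypothesis and the second does not, matching the answers on the slide.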

4
Agenda

Today we will begin to learn how to investigate
the relationship between two continuous
variables.
You will learn:
1) how to graphically present the relationship
between two variables
2) how to measure the correlation between two
variables

5
Review

Thus far, we have learned how to conduct three
general types of hypothesis tests.
1) Hypothesis tests concerning the mean of a
continuous random variable.
e.g. Is the mean income in the U.S. greater
than $40,000?
2) Hypothesis tests concerning the difference in
the means of two samples of a continuous random
variable.
e.g. Is the mean income for men greater than
the mean income for women?
3) Hypothesis tests concerning the independence
of two categorical variables.
e.g. Are race and vote choice independent?

6
Introduction

Suppose that you have two continuous variables
measured on each observation in a sample.
You would like to know whether these two
variables are related.
How would you proceed based on what you've been
taught so far?

7
Possible Methods (based on what we've covered so
far)

Option 1. Split one of the variables at some
critical value (say the median). Then test for a
difference in the means of the second variable
between the observations above and below the
critical value.
e.g. Test whether cities with large percent
increases in government expenditures had higher
or lower percent changes in unemployment than
cities with small percent changes in government
expenditures.
Option 2. Divide both variables into categories
(say quartiles or quintiles). Then perform a
chi-squared test to see if those categories are
independent.
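
As a rough illustration of Option 1, the sketch below splits one variable at its median and computes a two-sample t-statistic for the other. The city data are invented for illustration, the function name `median_split_t` is hypothetical, and the unequal-variance form of the t-statistic is one reasonable choice rather than the only one.

```python
import statistics

# Hypothetical data for ten made-up cities: percent change in government
# expenditures (x) and percent change in unemployment (y).
expenditure_change = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
unemployment_change = [4.0, 3.5, 4.2, 3.0, 3.1, 2.0, 2.5, 1.5, 1.8, 1.0]

def median_split_t(x, y):
    """Split y by whether x is above the median of x, then return the
    two-sample t-statistic (unequal variances) for high minus low."""
    cutoff = statistics.median(x)
    low = [yi for xi, yi in zip(x, y) if xi <= cutoff]
    high = [yi for xi, yi in zip(x, y) if xi > cutoff]
    standard_error = (statistics.variance(low) / len(low)
                      + statistics.variance(high) / len(high)) ** 0.5
    return (statistics.mean(high) - statistics.mean(low)) / standard_error

t_stat = median_split_t(expenditure_change, unemployment_change)
print(round(t_stat, 2))
```

A negative statistic here means that the cities with larger expenditure increases had lower average changes in unemployment in this invented sample.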

8
Graphical Representation of the Data

One way to analyze relationships between two
continuous variables is with a scatterplot.
A scatterplot is a type of diagram that displays
the covariation of two continuous variables as a
set of points on a Cartesian coordinate system.

9
Interpretation of the Scatterplot
A Positive Relationship between the two variables
occurs when an increase in the variable
represented on the x-axis corresponds to an
increase in the variable represented on the
y-axis.
A Negative Relationship between the two variables
exists when an increase in the variable
represented on the x-axis corresponds to a
decrease in the variable represented on the
y-axis.
10
Interpretation of the Scatterplot
A curvilinear relationship exists if a change in
the variable on the x-axis has a different effect
on the variable represented along the y-axis,
depending on the value of x (or y).
No relationship exists if a change in the
variable represented along the x-axis does not
correspond to a change in the variable along the
y-axis.
11
Group Analytic Tool

How would you devise a statistic to determine
whether there was a positive or negative
relationship?

12
Covariance

Covariance is a statistical measure of the
relationship between two variables measured on
the same sample.
$\mathrm{Cov}(X, Y) = \frac{\sum_i (Y_i - \mathrm{Mean}(Y)) (X_i - \mathrm{Mean}(X))}{N - 1}$
If your relationship is positive, then the
covariance will be positive: large values of X
will be associated with large values of Y and
small values of X will be associated with small
values of Y.
If your relationship is negative, then the
covariance will be negative: large values of X
will be associated with small values of Y and
small values of X will be associated with large
values of Y.
If there is no relationship, then the covariance
will be close to zero: large values of X will be
associated with both large and small values of Y
and small values of X will be associated with
both large and small values of Y.
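
The three cases can be seen in a small sketch of the sample covariance. The data sets and the function name `sample_covariance` are made up for illustration.

```python
def sample_covariance(x, y):
    """Sample covariance with N - 1 in the denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross_products / (n - 1)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # large X goes with large Y
falling = [10, 8, 6, 4, 2]  # large X goes with small Y
mixed = [5, 1, 3, 1, 5]     # large and small Y at both ends of X

print(sample_covariance(x, rising))   # positive
print(sample_covariance(x, falling))  # negative
print(sample_covariance(x, mixed))    # zero
```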

13
Computing the Covariance

Note that the equation for covariance can be
defined multiple ways.
The intuitive expression
$\mathrm{Cov}(X, Y) = \frac{\sum_i (Y_i - \mathrm{Mean}(Y)) (X_i - \mathrm{Mean}(X))}{N - 1}$
is equivalent to
$\mathrm{Cov}(X, Y) = \frac{N \sum_i X_i Y_i - (\sum_i X_i)(\sum_i Y_i)}{(N - 1) N}$
The second expression may be easier to use for
calculations in Excel.
Note: It is useful to calculate results yourself
rather than with Excel's canned function
because (at least with my version), Excel treats
the data as a full population and uses N as the
denominator rather than N - 1.
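
A minimal sketch comparing the two expressions, and the effect of dividing by N instead of N - 1, might look like this. The function names and the `ddof` parameter are illustrative choices (the name is borrowed from a common convention), and the data are invented.

```python
def covariance_definitional(x, y, ddof=1):
    """Sum of cross-products of deviations, divided by N - ddof.
    ddof=1 gives the sample covariance; ddof=0 divides by N."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (n - ddof)

def covariance_computational(x, y):
    """Equivalent form: (N * sum(XY) - sum(X) * sum(Y)) / ((N - 1) * N)."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return (n * sum_xy - sum(x) * sum(y)) / ((n - 1) * n)

x = [2, 4, 6, 8]
y = [1, 3, 2, 6]

sample_cov = covariance_definitional(x, y)              # divides by N - 1
population_cov = covariance_definitional(x, y, ddof=0)  # divides by N
print(sample_cov, covariance_computational(x, y), population_cov)
```

The two sample versions agree, while the population version is smaller by a factor of (N - 1) / N, which matters most in small samples.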

14
Comments on Covariance

Covariance is a very good indicator of the
direction of the relationship between two
variables.
Covariance is not a good measure of the
'magnitude' of the relationship. This is because
covariance is sensitive to the scale of the
variables under investigation.
Note: you can see this if you simply multiply
both variables by a constant and compare the
covariances.
So, it is not proper to compare the covariances
from two different data sets to see if the
relationship is stronger in one case than the
other.
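
The scale sensitivity is easy to demonstrate. In this sketch (invented data, illustrative function name), multiplying both variables by 10 multiplies the covariance by 100 even though the relationship itself is unchanged.

```python
def sample_covariance(x, y):
    """Sample covariance with N - 1 in the denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Rescale both variables by the same constant (e.g. a change of units).
x_scaled = [10 * xi for xi in x]
y_scaled = [10 * yi for yi in y]

original = sample_covariance(x, y)
rescaled = sample_covariance(x_scaled, y_scaled)
print(original, rescaled, rescaled / original)
```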
How would you improve covariance to make your
findings less sensitive to scale?

15
Correlation

Correlation is a statistical measure of
association closely related to covariance.
The correlation coefficient, denoted $R_{XY}$ or
just R, is defined as
$R_{XY} = \frac{\sum_i (Y_i - \mathrm{Mean}(Y)) (X_i - \mathrm{Mean}(X))}{\sqrt{\sum_i (X_i - \mathrm{Mean}(X))^2 \sum_i (Y_i - \mathrm{Mean}(Y))^2}} = \frac{\mathrm{Cov}(X, Y)}{\text{Standard Deviation}(X) \cdot \text{Standard Deviation}(Y)}$
$R_{XY}$ by definition can only take values
between -1 and 1.
The larger the absolute value of $R_{XY}$, the
stronger the linear relationship between X and Y.
If $R_{XY} = 1$, then X and Y are positively
related and X is a perfect predictor of Y.
If $R_{XY} = -1$, then X and Y are negatively
related and X is a perfect predictor of Y.
If $R_{XY} = 0$, then X and Y are linearly
unrelated.
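
As a sketch of the definition, the correlation coefficient can be computed directly from the sums of squares and cross-products. The data and the function name `correlation` are invented for illustration. Unlike covariance, the result does not change when the variables are rescaled by positive constants.

```python
import math

def correlation(x, y):
    """Correlation coefficient: the sum of cross-products of deviations
    divided by the square root of the product of the sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)

x = [2, 4, 6, 8]
y = [1, 3, 2, 6]

r = correlation(x, y)
# Rescale X and Y by different positive constants (a change of units).
r_rescaled = correlation([100 * xi for xi in x], [0.5 * yi for yi in y])
print(round(r, 4), round(r_rescaled, 4))
```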

16
Correlation cont.

$R_{XY}$ by definition can only take values
between -1 and 1.
The larger the absolute value of $R_{XY}$, the
stronger the linear relationship between X and Y.
If $R_{XY} = 1$, then X and Y are positively
related and X is a perfect predictor of Y (and Y
is a perfect predictor of X).
If $R_{XY} = -1$, then X and Y are negatively
related and X is a perfect predictor of Y (and Y
is a perfect predictor of X).
If $R_{XY} = 0$, then X and Y are linearly
unrelated.
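
The boundary cases can be illustrated with a short sketch using invented data and an illustrative `correlation` function. Note that a correlation of zero rules out only a linear relationship: in the curvilinear case described on the scatterplot slides, Y can be completely determined by X and the correlation can still be zero.

```python
import math

def correlation(x, y):
    """Correlation coefficient computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)

x = [-2, -1, 0, 1, 2]
increasing_line = [3 * xi + 1 for xi in x]   # exact positive linear relation
decreasing_line = [-2 * xi + 5 for xi in x]  # exact negative linear relation
parabola = [xi ** 2 for xi in x]             # exact curvilinear relation

r_up = correlation(x, increasing_line)
r_down = correlation(x, decreasing_line)
r_curve = correlation(x, parabola)
print(r_up, r_down, r_curve)
```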

17
Comments on Correlation

The correlation coefficient provides a very
useful summary of the relationship between X and
Y.
But it takes real effort to use knowledge of
the correlation coefficient and the value of X
(or Y) to make a prediction about the value of Y
(or X).
Additionally, correlation does not imply
causation.
e.g. In our original example, do changes in
government expenditures cause changes in
unemployment, or do changes in unemployment cause
changes in government expenditures?

18
Getting at Causation

When we do statistical analyses, we generally
have to make assumptions about what constitutes a
cause and what constitutes an effect.
That is, we make a formal statement about our
hypothesized relationship like
$Y_i = f(\text{stuff})$, where Y is the dependent
variable and stuff is the set of independent
variables.
If we are clever, we can estimate the effect of
stuff (and that's what we will be talking about
for the next few weeks) to test whether it has a
statistically significant influence on Y.
If we are really clever, can we test for
causality as well? How?

19
Getting at Causation cont.

In order for a variable to be a cause, it is
necessary (but not sufficient) for the variable
to occur prior to the effect.
Possible Research Designs to Examine Causality:
- For a dependent variable that doesn't change
much:
Measure a stable set of individual-level
characteristics (e.g. race, gender, parents' value
for the dependent variable), then examine which
stable characteristics explain variation in your
sample.
- For a dependent variable that does change:
Measure the dependent variable and the
independent variables at time t, retake the
measurements for the same sample at time t+1,
then examine whether changes (stability) in the
independent variables led to changes (stability)
in the dependent variable. (Note: ideally, you'd
show that changes in X occurred before changes in
Y.)
- For a dependent variable that does change:
Cohort Analysis