Title: Bivariate Relationships
1 Bivariate Relationships
- Standardization and Confidence
2 Standardized variable review
- Z scores are linear transformations of variables
- Z score: $z(x) = \frac{x - \bar{x}}{s_x}$, where $\bar{x}$ is the mean of x and $s_x$ is the standard deviation of x
3 Z scores
- Z scores always have
  - a mean of zero
  - a standard deviation of 1
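A minimal Python sketch of the two properties above. The happiness codes are made-up values for illustration, and the sketch assumes the sample standard deviation (n - 1 denominator), which is what most statistics packages report.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize: subtract the mean, then divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical happiness codes (1 = Not Too Happy, 2 = Pretty Happy, 3 = Very Happy)
happy = [1, 2, 2, 3, 2, 1, 3, 2, 2, 3, 2]
zhappy = z_scores(happy)

print(mean(zhappy))   # essentially 0 (tiny floating-point error aside)
print(stdev(zhappy))  # essentially 1
```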
4 Histogram of Happiness
[Figure: histogram of General Happiness, coded 1 = Not Too Happy, 2 = Pretty Happy, 3 = Very Happy; y-axis: Percent (0 to 60)]
5 Histogram of z score of happiness
[Figure: the same histogram with the x-axis in z scores: Not Too Happy = -1.3, Pretty Happy = .12, Very Happy = 1.6; y-axis: Percent (0 to 60)]
6 Descriptives Syntax
desc vars happy zhappy.
7 Descriptives of Happiness and z score of happiness
8 What is the correlation between zhappy and happy?
9 Normal distribution review
10
- Approximately 68 percent of the area under a normal curve lies within one standard deviation of the mean (between the mean minus one standard deviation and the mean plus one standard deviation).
11
- Approximately 95 percent of the area lies within 2 standard deviations of the mean.
12
- Approximately 99.7 percent of the area lies within 3 standard deviations of the mean.
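These three percentages can be computed rather than memorized. For a normal curve, the share of the area within k standard deviations of the mean is erf(k / sqrt(2)); a short sketch using only the standard library:

```python
import math

def area_within(k):
    """Share of a normal curve lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The exact figure for 2 standard deviations is a little above 95 percent, which is why the slides say "approximately".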
13 What is the mean of a z score?
14 What is the null hypothesis?
- What are we trying to reject in a null hypothesis?
15 Not Zero
- So, the question is how confident we can be that the slope is not zero.
- We know that approximately 2.5 percent of the area under the normal curve lies more than 2 standard deviations above zero (the mean).
- We also know that approximately 2.5 percent of the area lies more than 2 standard deviations below the mean, in the other direction.
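A quick check of the tail areas with `statistics.NormalDist` (Python 3.8 or later). It shows that each tail beyond 2 standard deviations holds about 2.3 percent, and that the cutoff leaving exactly 2.5 percent in a tail is about 1.96 standard deviations.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
upper_tail = 1 - std_normal.cdf(2)  # area more than 2 SD above the mean
lower_tail = std_normal.cdf(-2)     # area more than 2 SD below the mean
print(round(upper_tail, 4), round(lower_tail, 4))  # 0.0228 0.0228

# Cutoff that leaves exactly 2.5 percent in the upper tail
print(round(std_normal.inv_cdf(0.975), 2))  # 1.96
```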
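16 T statistic
[Figure: normal curve centered at 0 (the mean), with 95 percent of the area in the middle and 2.5 percent in each tail]
If the slope falls outside the range of two standard deviations from 0, then we can say that we are 95 percent confident that the relationship is not zero.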
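17 Formula for t
- $t = \frac{\text{slope}}{\text{standard error}}$
- If the absolute value of t is at least 2, then the slope is two standard deviations from the mean of the curve, which is zero (why is it 0?), and we are 95 percent confident.
- Significance is derived from the t statistic using the theory of the normal curve (strictly, the t distribution). It is not a linear transformation: the larger the absolute value of t, the smaller the significance value.
- Also known as p values.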
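A sketch of how t and its p value fit together, using the slope (-.0380391) and standard error (.0209348) from the happy-on-prestg80 regression output later in these slides, with df = n - 2 = 9. The standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule; it also prints the normal-curve approximation for comparison.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|): one minus twice the area between 0 and |t| (Simpson's rule)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 1 - 2 * (total * h / 3)

slope, se, df = -0.0380391, 0.0209348, 9
t = slope / se
print(round(t, 2))                  # -1.82
print(round(two_sided_p(t, df), 3)) # about 0.103, as in the output
print(round(2 * (1 - NormalDist().cdf(abs(t))), 3))  # normal approximation, smaller
```

With only 11 cases the t distribution has fatter tails than the normal curve, so the same t gives a larger p value.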
18 How confident are we?
If the slope falls within two standard deviations of zero, then it is difficult to say that we are confident. Because the theory of the normal curve lets us state precisely how likely a slope this far from zero would be if the relationship in the population were zero, we know how confident we are.
19
- Approximately 68 percent of the area under a normal curve lies within one standard deviation of the mean.
If t = 1, then we are 68 percent confident. That is not very confident.
20
- Approximately 99.7 percent of the area lies within 3 standard deviations of the mean. If t = 3, then we are 99.7 percent confident.
21 One tailed versus two tailed test
[Figure: normal curve with 95 percent of the area in the middle and 2.5 percent in each tail]
You can use theory to rule out one of the areas covering 2.5 percent. If you know the slope should be positive, then you can cross out the 2.5 percent on the left. Then you are 97.5 percent confident that the relationship is not zero.
22 One tailed versus two tailed test
[Figure: normal curve with 95 percent of the area in the middle and 2.5 percent in each tail]
You can use theory to rule out one of the areas covering 2.5 percent. If you know the slope should be negative, then you can cross out the 2.5 percent on the right. Then you are 97.5 percent confident that the relationship is not zero.
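A small sketch of the difference, using the normal-curve approximation: a two-tailed p value counts the area beyond |t| in both tails, while a one-tailed p value counts only the tail that theory predicted. The `predicted` parameter is an illustrative name, not part of any package.

```python
from statistics import NormalDist

def two_tailed_p(t):
    """Area in both tails beyond |t| under the normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

def one_tailed_p(t, predicted="positive"):
    """Area beyond t in the single tail theory predicts ("positive" or "negative")."""
    z = t if predicted == "positive" else -t
    return 1 - NormalDist().cdf(z)

print(round(two_tailed_p(2.0), 4))              # 0.0455
print(round(one_tailed_p(2.0, "positive"), 4))  # 0.0228
print(round(one_tailed_p(-2.0, "negative"), 4)) # 0.0228
```

If the slope comes out in the direction opposite to the prediction, the one-tailed p value is above 0.5, so crossing out a tail only helps when theory fixes the direction in advance.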
23 Covariation
- When x tends to be above its mean when y is above its mean, AND x tends to be below its mean when y is below its mean, there is positive covariation.
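Covariance turns this idea into a number: multiply each case's deviation from the mean of x by its deviation from the mean of y, then average. Cases where both are above (or both below) their means add positive products. A sketch with made-up data, using the sample (n - 1) denominator:

```python
from statistics import mean

def covariance(x, y):
    """Sample covariance: sum of products of deviations from each mean, over n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical data: y tends to be above its mean when x is above its mean
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(covariance(x, y))  # 2.0
```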
24 Plot showing positive covariance
[Figure: scatterplot with reference lines at the mean of urban and the mean of female literacy]
25 Expected value
- But we may want more specific knowledge than that: we may want to know the expected value of y at each value of x.
- I may know the mean of everyone's height in class.
- But if I know gender, then I can generate two expected values.
- If you remember, we are always trying to do better than the mean.
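The height example can be sketched directly. With made-up class data, predicting everyone's height with the overall mean (one expected value) leaves more squared error than predicting with the mean for each gender (two expected values):

```python
from statistics import mean

# Hypothetical class data: (gender, height in inches)
students = [("F", 64), ("F", 66), ("F", 63), ("M", 70), ("M", 72), ("M", 68)]

overall = mean(h for _, h in students)  # one expected value for everyone
by_gender = {g: mean(h for sex, h in students if sex == g) for g in ("F", "M")}

def sse(predict):
    """Sum of squared prediction errors when predict(gender) is the guess."""
    return sum((h - predict(g)) ** 2 for g, h in students)

print(overall, by_gender)
print(sse(lambda g: overall), sse(lambda g: by_gender[g]))
```

Knowing gender does better than the mean; regression extends the same idea to an expected value of y at every value of x.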
26 Regression analysis: important to know the substantive effect
- For every 10K dollars given in humanitarian aid, there is a 3K dollar increase in spending on weapons.
- This is different from: for every 10K dollars given in humanitarian aid, there is a .5K dollar increase in spending on weapons.
- This is also different from: for every 10K dollars given in humanitarian aid, there is an 8K dollar increase in spending on weapons.
- Unit of analysis?
27 Life Happiness and Occupational Prestige
28 Life Happiness and Prestige
[Figure: scatterplot of Happiness (y-axis, 1 to 3) against R's Occupational Prestige Score (1980) (x-axis, 20 to 50) with a fitted linear regression line; R Sq Linear = 0.268]
29 Regression equation
- $Y = a + bx + e$
- $\hat{y} = a + bx$
- $\hat{y}$ is also known as yhat
- y is the dependent variable value
- yhat is the predicted value
- a is the intercept
- b is the slope, and e is the error (residual)
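A minimal least-squares sketch of the equation above: b is the sum of cross-products of deviations divided by the sum of squared x deviations, and a is chosen so the line passes through the two means. The data are made up for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 1, 4, 3, 5]  # hypothetical dependent variable
a, b = fit_line(x, y)
yhat = [a + b * xi for xi in x]           # predicted values
e = [yi - yh for yi, yh in zip(y, yhat)]  # residuals: y = yhat + e
print(a, b)  # about 0.6 and 0.8
```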
30 Regression Syntax
- Syntax is regr DV IV
- regr happy prestg80, beta
  - Reports beta coefficients, which are the same as Pearson's r (when there is only one independent variable)
- regr happy prestg80
  - Reports confidence intervals instead of betas
31 Regression results
      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =    3.30
       Model |  1.31753739     1  1.31753739           Prob > F      =  0.1026
    Residual |  3.59155351     9  .399061502           R-squared     =  0.2684
-------------+------------------------------           Adj R-squared =  0.1871
       Total |  4.90909091    10  .490909091           Root MSE      =  .63171

------------------------------------------------------------------------------
       happy |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
    prestg80 |  -.0380391   .0209348    -1.82   0.103                -.518061
       _cons |   3.330371   .8050567     4.14   0.003                        .
------------------------------------------------------------------------------
Why are we not that confident in our results? Why
is the beta so much larger than the coefficient
for the slope?
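The beta is larger in magnitude because it rescales the slope by $s_x / s_y$, and prestige varies far more (SD about 9.5) than happiness (SD about 0.7). A sketch that recovers the reported beta from numbers in these slides: the slope above, the standard deviation of prestg80 from the `sum prestg80` output, and the standard deviation of happy taken as the square root of the Total MS (the variance of happy).

```python
import math

b = -0.0380391                     # slope of happy on prestg80
sd_prestige = 9.542251             # Std. Dev. from `sum prestg80`
sd_happy = math.sqrt(0.490909091)  # Total MS = variance of happy

beta = b * sd_prestige / sd_happy
print(round(beta, 4))     # -0.5181, matching the Beta column
print(round(beta**2, 4))  # about the R-squared of 0.2684
```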
32 Regression results
      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =    3.30
       Model |  244.378788     1  244.378788           Prob > F      =  0.1026
    Residual |  666.166667     9  74.0185185           R-squared     =  0.2684
-------------+------------------------------           Adj R-squared =  0.1871
       Total |  910.545455    10  91.0545455           Root MSE      =  8.6034

------------------------------------------------------------------------------
    prestg80 |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
       happy |  -7.055556    3.88302    -1.82   0.103                -.518061
       _cons |   50.83333   7.853795     6.47   0.000                        .
------------------------------------------------------------------------------
Why is the coefficient so much bigger? What
happens to the confidence? Why is the beta the
same?
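Swapping the dependent and independent variables changes the units of the slope but not the correlation, so t, the p value, and beta stay the same. A small check with the two slopes reported above: their product equals R-squared, and their ratio equals the ratio of the two variances (each variance taken from the Total MS of its own regression).

```python
b_happy_on_prestige = -0.0380391
b_prestige_on_happy = -7.055556

r_squared = b_happy_on_prestige * b_prestige_on_happy
print(round(r_squared, 4))  # 0.2684, the R-squared in both outputs

var_prestige, var_happy = 91.0545455, 0.490909091
slope_ratio = b_prestige_on_happy / b_happy_on_prestige
print(round(slope_ratio, 1), round(var_prestige / var_happy, 1))  # both about 185.5
```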
34 What happens to the confidence if we keep the slope the same but double the n?
35 What happens to the confidence if we keep the doubled n but decrease the variance of occupational prestige?
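For one independent variable, the standard error of the slope is sqrt(Residual MS / sum of squared x deviations), and that sum equals (n - 1) times the variance of x. The sketch below plugs in numbers from the outputs above, then changes n and the variance of prestige. Holding the slope and Residual MS fixed while n doubles or the variance halves is a hypothetical assumption, made only to isolate each effect.

```python
import math

def slope_se(resid_ms, n, var_x):
    """Standard error of a bivariate OLS slope: sqrt(MSE / ((n - 1) * var(x)))."""
    return math.sqrt(resid_ms / ((n - 1) * var_x))

slope = -0.0380391     # happy on prestg80
resid_ms = 0.399061502 # Residual MS from the happy-on-prestg80 output
var_x = 91.0545455     # variance of prestg80 (Total MS in the reversed regression)

scenarios = [("original n = 11", 11, var_x),
             ("doubled n = 22", 22, var_x),
             ("n = 22, half the variance", 22, var_x / 2)]
for label, n, vx in scenarios:
    se = slope_se(resid_ms, n, vx)
    print(f"{label}: SE = {se:.4f}, t = {slope / se:.2f}")
```

Doubling n shrinks the standard error and raises |t|; shrinking the variance of prestige works in the opposite direction.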
36 Syntax
- if prestg80
- if prestg80 32 occpres 2.
- if prestg80 45 occpres 3.
- execute.
- add value labels occpres 1 "Not that Prestigious" 2 "Pretty Prestigious" 3 "Very Prestigious".
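The same recode can be sketched in Python. The comparison operators did not survive on the slide, so the boundaries here are an assumption: scores below 32 become 1, 32 to 44 become 2, and 45 or higher become 3. The function name `occpres` simply mirrors the new variable.

```python
def occpres(prestg80):
    """Hypothetical three-category recode of prestg80 (assumed cutoffs 32 and 45)."""
    if prestg80 >= 45:
        return 3
    if prestg80 >= 32:
        return 2
    return 1

labels = {1: "Not that Prestigious", 2: "Pretty Prestigious", 3: "Very Prestigious"}
for score in (22, 37, 51):  # min, roughly the mean, and max from `sum prestg80`
    print(score, labels[occpres(score)])
```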
37
. sum prestg80

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    prestg80 |        11    37.36364    9.542251         22         51
40 Regression results
      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =    3.30
       Model |  244.378788     1  244.378788           Prob > F      =  0.1026
    Residual |  666.166667     9  74.0185185           R-squared     =  0.2684
-------------+------------------------------           Adj R-squared =  0.1871
       Total |  910.545455    10  91.0545455           Root MSE      =  8.6034

------------------------------------------------------------------------------
    prestg80 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       happy |  -7.055556    3.88302    -1.82   0.103    -15.83956    1.728447
       _cons |   50.83333   7.853795     6.47   0.000     33.06681    68.59985
------------------------------------------------------------------------------
Why are we not that confident in our results?
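The interval is the coefficient plus or minus a critical value times its standard error. With 9 residual degrees of freedom the 95 percent critical value of t is about 2.262 (taken here from a t table, not computed). A sketch that rebuilds the happy row of the interval and checks whether it contains zero:

```python
b, se = -7.055556, 3.88302  # happy coefficient and standard error
t_crit = 2.262157           # 97.5th percentile of t with 9 df (t-table value)

low, high = b - t_crit * se, b + t_crit * se
print(low, high)          # about -15.8396 and 1.7284
print(low < 0 < high)     # True: zero is a plausible value of the slope
```

Because the interval includes zero, we cannot be 95 percent confident that the relationship is not zero.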