1 Properties of Estimators
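2 OLS review
Ordinary Least Squares minimizes the sum of squared
errors, the vertical distances of the observations from
the regression line (the slope). The standard error of the
regression is, roughly, the typical deviation from that line.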
3 Residuals
[Figure: scatterplot of Political Tolerance (y-axis, 0 to 6) against Education (x-axis, 0 to 6), showing the slope, the mean, and the residuals.]
4 Residuals review
- Residuals of OLS analysis (errors of the slope)
have a mean of zero
- This is true by construction when the model includes a
constant: the residuals come from minimizing the squared errors.
- We also assume that they are distributed
normally.
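The zero-mean property can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the education and tolerance values are made up for illustration, and the slope and intercept come from the usual closed-form OLS formulas.

```python
# Illustrative data only (NOT from the slides): education and tolerance scores.
x = [0, 1, 2, 3, 4, 5, 6]
y = [1.0, 1.5, 2.5, 2.0, 3.5, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form OLS estimates for a model with one predictor and a constant.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus predicted y (signed, not squared).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / n

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"mean of residuals = {mean_residual:.2e}")
```

The printed mean is zero up to floating-point rounding, whatever data are used, as long as the model includes the constant.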
5 Regression results review
------------------------------------------------------------------
       happy |      Coef.   Std. Err.      t    P>|t|         Beta
-------------+----------------------------------------------------
    prestg80 |  -.0380391   .0209348    -1.82   0.103     -.518061
       _cons |   3.330371   .8050567     4.14   0.003            .
------------------------------------------------------------------
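6 Residuals are variables
For each observation, the residual is the signed vertical
distance between the observed value and the value predicted
by the slope. (OLS squares these distances only when minimizing them.)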
7 Residuals and OLS
- Residuals are therefore assumed to follow a normal
distribution with a mean of zero.
- The standard deviation is not necessarily 1 (so the distribution
is not necessarily standard normal), but it is assumed to be
constant across all values of x.
- Foreshadowing: if this assumption does not hold,
you are not advised to use OLS.
8 What is the question that we ask in scientific analysis?
- Are we wrong about our theory?
- Or how likely is it that we are wrong about our
theory?
- Is there a non-zero relationship?
- How much better than the mean have we done in
predicting the dependent variable from the
independent variable?
9 The Null Hypothesis
- The null hypothesis is that the relationship is
zero, that the slope is zero, that we are doing
no better than the mean.
- We are trying to reject the null hypothesis.
10 Confidence in point estimates
- We have a point estimate of y for each value of
x
- The set of predicted values is a variable
- Predicted values form a slope, but the
values of the slope are only true for our sample
- We do not observe the population, so we do not know its true slope.
11 Error in estimation
- So, we know that there is error in our estimate.
We put bounds around that estimate.
- So, to reject the null hypothesis, the interval between the
lower and upper bound of our estimate must not
contain zero.
12 Strange question to ask
- How likely is it that the true value from the
population is zero? (not different from the mean
of y)
- How likely is it that the true value of the slope
is NOT zero?
13 A Caveat
- Standardization review
- Z scores
- Normal distribution
- Standard normal distribution
14 Standardized variable review
- Z scores are linear transformations of variables
- Z score (x) = (x - mean of x) / (standard deviation
of x)
15 Z scores
- Z scores always have
- a mean of zero
- a standard deviation of 1
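The two properties can be verified directly. A short Python sketch, under one loud assumption: the 11 values of happy are reconstructed from the frequency table of zhappy in these slides (three 1s, six 2s, two 3s), not read from the original data file.

```python
import statistics

# Assumption: happy = 1 three times, 2 six times, 3 twice (reconstructed
# from the zhappy frequency table, not from the original dataset).
happy = [1] * 3 + [2] * 6 + [3] * 2

mean_happy = statistics.mean(happy)
sd_happy = statistics.stdev(happy)  # sample standard deviation (divides by n - 1)

# Z score: subtract the mean, then divide by the standard deviation.
zhappy = [(h - mean_happy) / sd_happy for h in happy]

mean_z = statistics.mean(zhappy)
sd_z = statistics.stdev(zhappy)

print(f"happy:  mean = {mean_happy:.6f}, sd = {sd_happy:.6f}")
print(f"zhappy: mean = {mean_z:.6f}, sd = {sd_z:.6f}")
```

With the exact mean and standard deviation, the z scores have a mean of zero and a standard deviation of 1 up to rounding; rounded inputs such as 1.9 and .7 give values that are only close to 0 and 1.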
16 Histogram of Happiness
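17 Creating a zscore
. sum happy

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
       happy |        11    1.909091     .700649          1          3

. generate zhappy = happy - 1.9/.7
. sum zhappy happy

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
      zhappy |        11   -.8051948    .7006491  -1.714286   .2857143
       happy |        11    1.909091     .700649          1          3

- Note: as typed, the command divides before it subtracts
(happy - 2.714286), so this zhappy has a mean of -.805 and the
same standard deviation as happy. A z score needs parentheses:
generate zhappy = (happy - 1.9)/.7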
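The minimum of -1.714286 and maximum of .2857143 in the summary above are what results when the division is evaluated before the subtraction. A small Python sketch of that precedence effect (Python and Stata both evaluate division before subtraction); the variable names here are illustrative only.

```python
# The three values that happy takes, per the summary output (min 1, max 3).
happy_values = [1, 2, 3]

# Without parentheses: division binds first, so this is h - (1.9 / .7).
without_parens = [h - 1.9 / .7 for h in happy_values]

# With parentheses: subtract the (rounded) mean, then divide by the (rounded) sd.
with_parens = [(h - 1.9) / .7 for h in happy_values]

for h, a, b in zip(happy_values, without_parens, with_parens):
    print(f"happy = {h}: without parentheses {a:.6f}, with parentheses {b:.6f}")
```

The first column reproduces the -1.714286 and .2857143 shown in the summary; the second reproduces the values -1.285714, .1428571 and 1.571429 that appear in the frequency table of zhappy.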
18 Frequency of zhappy
. tab zhappy

     zhappy |      Freq.     Percent        Cum.
------------+-----------------------------------
  -1.285714 |          3       27.27       27.27
   .1428571 |          6       54.55       81.82
   1.571429 |          2       18.18      100.00
------------+-----------------------------------
      Total |         11      100.00
19 Histogram of z score of happiness
20 Descriptives Syntax
sum happy zhappy
21 What is the correlation between zhappy and happy?
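Because a z score is a linear transformation with a positive multiplier, the correlation should be exactly 1. A Python sketch that checks this, assuming the 11 values of happy implied by the frequency table (three 1s, six 2s, two 3s); the `pearson` helper is written here for illustration and is not a library function.

```python
import math

# Assumption: data reconstructed from the zhappy frequency table.
happy = [1] * 3 + [2] * 6 + [3] * 2
zhappy = [(h - 1.9) / .7 for h in happy]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

r = pearson(happy, zhappy)
print(f"correlation between happy and zhappy = {r:.6f}")
```

Standardizing changes the units of a variable, not its relationship to anything else, so r is 1 up to floating-point rounding.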
23 Normal distribution review
24 - Approximately 68 percent of the area under a
normal curve lies within one standard deviation of the
mean (between the mean minus one standard deviation and
the mean plus one standard deviation).
25 - Approximately 95 percent of the area lies within 2
standard deviations of the mean.
26 - Approximately 99.7 percent lies within 3 standard
deviations of the mean.
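These three areas can be computed rather than memorized. A minimal Python sketch using the standard identity that the area within k standard deviations of the mean of a normal curve equals erf(k / sqrt(2)); the function name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Share of the area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

areas = {k: area_within(k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} standard deviation(s): {100 * area:.2f} percent")
```

The results are about 68.27, 95.45 and 99.73 percent, which is why exactly 95 percent corresponds to 1.96 standard deviations rather than 2.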
27 Standard normal distribution
28 Attributes of standard normal
- Mean is zero
- Standard deviation is 1
- Approximately 68 percent of the area lies between -1 and 1
- Approximately 95 percent of the area lies between -2 and 2
- Approximately 99.7 percent of the area lies between -3 and 3
29 95 percent confidence interval
- Generally, we want to be at least 95 percent confident
that the interval around our estimate does not include zero.
- So, to be 95 percent confident, the slope must be at least
about two standard errors away from zero, which is the mean of the
standard normal curve.
30 Review: Central limit theorem
- The central limit theorem is based on a theory of
repeated samples
- A 95 percent confidence interval means that if this
process of estimation occurred in 100 samples
from the same population, about 95 of the 100 intervals
would contain the true population value and about 5 would not.
31 We are trying to reject the hypothesis that the relationship is zero
- So, we want to be confident that the
slope is not zero.
- We know that the area under the normal curve beyond 2
standard deviations above zero (the mean) is
approximately 2.5 percent of the area of the curve.
- We also know that the area beyond 2 standard deviations from
the mean in the other direction is approximately 2.5 percent of the
area of the curve.
32 T statistic
If the slope falls outside the range of 2 standard
errors from 0, then we can say that we are 95 percent
confident that the relationship is not zero.
33 Formula for t
- t = slope / standard error
- If the absolute value of t is at least 2, then the slope is two standard
errors from the mean of the curve, which is
zero (why is it 0?), and we are 95 percent confident
that the relationship is NOT zero
- Significance is a transformation of the t
statistic based on the theory of the normal
curve.
- Also known as probability values (p).
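The formula can be checked against the regression of happy on prestg80 reported in these slides. A short Python sketch that recomputes t from the printed coefficients and standard errors; only the variable names are new.

```python
# Coefficients and standard errors as printed in the slides' regression output.
coef_prestg80, se_prestg80 = -0.0380391, 0.0209348
coef_cons, se_cons = 3.330371, 0.8050567

# t = slope / standard error
t_prestg80 = coef_prestg80 / se_prestg80
t_cons = coef_cons / se_cons

print(f"t for prestg80 = {t_prestg80:.2f}")
print(f"t for _cons    = {t_cons:.2f}")
```

The values round to -1.82 and 4.14, matching the t column of the output. The absolute value of t for prestg80 is below 2, which is consistent with its reported p value of 0.103.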
34 How confident are we?
- If the slope falls within two standard
errors of zero, then we have a difficult
time saying that we are confident.
- Since the theory of repeated samples lets us state precisely how
likely a slope this far from zero would be if the relationship
in the population were zero,
we can estimate how confident we are.
35 T = 1
- Approximately 68 percent of the area under a
normal curve lies within one standard deviation of the
mean.
- If t = 1, then we are about 68 percent confident.
- That is not very confident.
36 T = 3
Approximately 99.7 percent lies within 3 standard
deviations of the mean. If t = 3, then the
theory (from which theorem?) is that if the true slope were
zero and we repeated samples, a sample slope this far from
zero would appear only about 0.3 percent of the time.
37 One tailed versus two tailed test
[Figure: normal curve with 95 percent of the area in the middle and 2.5 percent in each tail.]
You can use theory to rule out one of the areas
covering 2.5 percent. If you know the slope should be
positive, then you can cross out the 2.5 percent on the
left. Then you are 97.5 percent confident that the
relationship is not zero.
38 One tailed versus two tailed test
[Figure: normal curve with 95 percent of the area in the middle and 2.5 percent in each tail.]
You can use theory to rule out one of the areas
covering 2.5 percent. If you know the slope should be
negative, then you can cross out the 2.5 percent on the
right. Then you are 97.5 percent confident that the
relationship is not zero.
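The tail areas behind the one-tailed and two-tailed figures can be computed with Python's standard library. This sketch uses the standard normal curve, as the slides do; with only 11 observations a t distribution would give somewhat larger tails.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - std_normal.cdf(z)

for z in (1.96, 2.0):
    one_tail = upper_tail(z)
    two_tails = 2 * one_tail  # the curve is symmetric, so both tails are equal
    print(f"z = {z}: one tail {100 * one_tail:.2f} percent, "
          f"two tails {100 * two_tails:.2f} percent")
```

Ruling out one tail on theoretical grounds halves the area counted against the estimate, which is where the move from 95 to 97.5 percent comes from.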
39 Defining the meaning of 95 percent confidence
If a certain interval is a 95 percent confidence
interval, then we can say that if we repeated the
procedure of drawing random samples and computing
confidence intervals over and over again, 95 percent of
those confidence intervals would include the true value
from the population. This is not to say that we
are 95 percent confident that the true value lies
between the upper and lower bound of this one interval.
40 Defining the meaning of 95 percent confidence
- Instead, I am 95 percent confident that a confidence
interval covers the true value from the
population, based not on this single confidence
interval from this single test,
- but rather
- as a result of what would happen were I to repeat
the process of drawing samples and doing this
test over and over again.
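The repeated-sampling idea can be simulated. The sketch below is a simplified illustration, not the slides' regression: it assumes a hypothetical normal population with a known standard deviation and builds a 95 percent interval around each sample mean instead of a slope. All constants are made up.

```python
import random
import statistics

random.seed(12345)   # fixed seed so the run is repeatable

TRUE_MEAN = 2.0      # hypothetical population mean
SIGMA = 1.0          # hypothetical population sd, treated as known
N = 25               # observations per sample
REPS = 2000          # number of repeated samples
Z = 1.96             # cutoff for a 95 percent interval

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    sample_mean = statistics.fmean(sample)
    half_width = Z * SIGMA / N ** 0.5
    # Does this sample's interval contain the true population value?
    if sample_mean - half_width <= TRUE_MEAN <= sample_mean + half_width:
        covered += 1

coverage = covered / REPS
print(f"share of intervals covering the true mean: {coverage:.3f}")
```

Any single interval either contains the true value or it does not; the 95 percent describes the share of intervals that cover it across the repeated samples.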
41 Happiness and occupational prestige
. regr happy prestg80

      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =    3.30
       Model |  1.31753739     1  1.31753739           Prob > F      =  0.1026
    Residual |  3.59155351     9  .399061502           R-squared     =  0.2684
-------------+------------------------------           Adj R-squared =  0.1871
       Total |  4.90909091    10  .490909091           Root MSE      =  .63171

------------------------------------------------------------------------------
       happy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    prestg80 |  -.0380391   .0209348    -1.82   0.103     -.085397    .0093187
       _cons |   3.330371   .8050567     4.14   0.003     1.509207    5.151536
------------------------------------------------------------------------------
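Several numbers in this output follow from others by simple arithmetic. A Python sketch that recomputes R-squared, F, and the 95 percent confidence interval for prestg80 from the printed values. One input is not in the slides: the critical value 2.262157 is the 97.5th percentile of the t distribution with 9 degrees of freedom, taken from standard t tables, which is slightly larger than the rule-of-thumb 2 because the sample is small.

```python
# Values as printed in the regression output.
ss_model, ss_total = 1.31753739, 4.90909091
ms_model, ms_residual = 1.31753739, 0.399061502
coef, se = -0.0380391, 0.0209348

# Assumption (not in the slides): t critical value for 9 df, 95 percent two-tailed.
T_CRIT_9DF = 2.262157

r_squared = ss_model / ss_total     # share of variation explained by the model
f_stat = ms_model / ms_residual     # model mean square over residual mean square
ci_lower = coef - T_CRIT_9DF * se   # lower bound of the 95 percent interval
ci_upper = coef + T_CRIT_9DF * se   # upper bound of the 95 percent interval

print(f"R-squared = {r_squared:.4f}")
print(f"F(1, 9)   = {f_stat:.2f}")
print(f"95 percent CI for prestg80: {ci_lower:.6f} to {ci_upper:.6f}")
```

The interval runs from a negative lower bound to a positive upper bound, so it contains zero and the null hypothesis cannot be rejected at the 95 percent level, in line with the p value of 0.103.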
42 Effect of Index of Signals on the Number of Cases on the U.S. Supreme Court Agenda, 1953-1995
[Figure: estimate with the upper and lower bounds of the 95 percent confidence interval, plotted by Lag Year (1 to 6); the y-axis runs from -2 to 8. Data labels shown: 4.62, 3.85, 2.11, 1.27, 1.19, 1.34.]
43 The Effect of Supreme Court Signals on Amicus Briefs at Courts of Appeals
44 [Figure: point estimate of the slope with the upper and lower bounds of the 95 percent confidence interval.]
45 [Figure: point estimate of the slope with the upper and lower bounds of the 95 percent confidence interval.]