Title: Statistics II: An Overview of Statistics
1 Statistics II: An Overview of Statistics
2 Outline for Statistics II Lecture
SPSS Syntax: Some Examples
Normal Distribution Curve
Sampling Distribution
Hypothesis Testing
Type I and Type II Errors
Linking Z to Alpha and Hypothesis Testing
Bivariate Measures of Association
Bivariate Regression/Correlation
3 The Normal Distribution
4 The standard normal distribution is a bell-shaped,
symmetrical distribution with a mean of 0 and a
standard deviation of 1.
14 Z scores. A z score (or standard score) is a
transformed score expressed as a deviation from
an expected value, with the standard deviation
as its unit of measurement. It is a standard score
belonging to the standard normal distribution.
$z = \frac{Y - \mu}{\sigma}$
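As a quick illustration of the z-score definition above, here is a minimal sketch. The function name `z_score` and the example numbers (a score of 130 on a scale with mean 100 and standard deviation 15) are hypothetical and not taken from the lecture.

```python
def z_score(y, mu, sigma):
    """Return how many standard deviations y lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (y - mu) / sigma


# Hypothetical example: a score of 130, population mean 100, standard deviation 15.
z = z_score(130, 100, 15)
print(z)  # 2.0
```

A z of 2.0 says the score lies two standard deviations above the mean.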
15 Sampling Distribution
19 That is, the spread of the sampling distribution
depends on the sample size n and on the spread of
the population distribution. As the sample size
n increases, the standard error decreases. The
reason is that the denominator of the ratio
σ/√n increases as n increases, whereas the
numerator is the population standard deviation σ,
which is a constant and does not depend on the
value of n.
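The relationship between sample size and standard error can be checked numerically. The sketch below assumes the usual formula for the standard error of the sample mean, σ/√n; the population standard deviation of 10 and the sample sizes are hypothetical.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma divided by the square root of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


sigma = 10.0  # hypothetical population standard deviation (held constant)
for n in (25, 100, 400):
    print(n, standard_error(sigma, n))
# 25 2.0
# 100 1.0
# 400 0.5
```

Quadrupling n halves the standard error, because only the denominator changes.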
20 Central Limit Theorem: For random sampling, as
the sample size n grows, the sampling
distribution of the sample mean Ȳ approaches a normal
distribution. The approximate normality of the
sampling distribution applies no matter what the
shape of the population distribution.
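One way to see the Central Limit Theorem at work is a small simulation. The sketch below is illustrative only: the exponential population (mean 1, standard deviation 1, strongly right-skewed), the sample sizes, the number of draws, and the helper names `sample_means` and `skewness` are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(n, draws=2000):
    """Means of `draws` random samples of size n from a skewed population."""
    # Exponential population with mean 1 and standard deviation 1.
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(draws)
    ]


def skewness(xs):
    """Simple moment-based skewness; it is 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


for n in (2, 30, 200):
    means = sample_means(n)
    print(
        n,
        round(statistics.fmean(means), 3),  # stays near the population mean of 1
        round(statistics.stdev(means), 3),  # shrinks roughly like 1 / sqrt(n)
        round(skewness(means), 3),          # moves toward 0 as n grows
    )
```

As n grows, the sample means stay centered on the population mean, their spread shrinks, and their skewness moves toward 0, the value for a symmetric distribution such as the normal.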
23 Hypothesis Testing
24 Steps of a Statistical Significance Test
1. Assumptions: type of data, form of population,
method of sampling, sample size.
2. Hypotheses: null hypothesis, Ho (parameter value
for no effect); alternative hypothesis, Ha
(alternative parameter values).
3. Test statistic: compares the point estimate to the
null-hypothesized parameter value.
4. P-value: weight of evidence about Ho; a smaller P
is more contradictory to Ho.
25 5. Conclusion: report the P-value; make a formal decision.
26 Alpha or significance levels: The α-level is
a number such that one rejects Ho if the
P-value is less than or equal to it. The α-level
is also called the significance level of
the test. The most common α-levels are .05 and
.01.
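The steps of a significance test and the α-level decision rule can be sketched with a one-sample z test. This is a minimal illustration that assumes a known population standard deviation and a two-sided alternative. The function name `z_test_two_sided` and all the numbers are hypothetical.

```python
import math


def z_test_two_sided(sample_mean, mu0, sigma, n):
    """Return the z statistic and two-sided P-value for Ho: mu = mu0."""
    se = sigma / math.sqrt(n)        # standard error of the sample mean
    z = (sample_mean - mu0) / se     # test statistic: estimate vs. null value
    # Two-sided P-value under the standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


alpha = 0.05  # significance level chosen before looking at the data
z, p = z_test_two_sided(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
decision = "reject Ho" if p <= alpha else "do not reject Ho"
print(round(z, 2), round(p, 4), decision)  # 2.0 0.0455 reject Ho
```

With α = .01 instead, the same P-value would not lead to rejection, which is why the α-level should be fixed in advance.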
27 Type I and Type II Errors: A Type I error
occurs when Ho is rejected even though it is
true. A Type II error occurs when Ho is not
rejected even though it is false.
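The two error types can be made concrete by simulation. The sketch below is a hedged illustration, not part of the lecture. It assumes a two-sided z test with known σ, and every parameter (sample size 25, true mean 0.5 for the case where Ho is false, 4000 trials) is a hypothetical choice.

```python
import math
import random

random.seed(42)  # fixed seed so the run is repeatable


def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided z test with known sigma; True if Ho: mu = mu0 is rejected."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value <= alpha


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05, trials=4000):
    """Share of simulated samples in which Ho is rejected."""
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(true_mu, sigma) for _ in range(n)]
        hits += rejects_h0(sample, mu0, sigma, alpha)
    return hits / trials


# Ho is true (true mean equals mu0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mu=0.0)
# Ho is false (true mean is 0.5): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mu=0.5)
print(round(type_1_rate, 3), round(type_2_rate, 3))
```

When Ho is true, the rejection rate should land close to the chosen α of .05. The Type II rate depends on how far the true mean is from the null value and on the sample size.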
30 Bivariate Statistics
31 PROPORTIONAL REDUCTION IN ERROR (PRE): All good
measures of association use a proportionate
reduction in error (PRE) approach. The PRE
family of statistics is based on comparing the
errors made in predicting the dependent variable
with knowledge of the independent variable to
the errors made without information about the
independent variable. In other words, PRE
measures indicate how much knowing the values of the
independent variable (first variable) increases
our ability to accurately predict the dependent
variable (second variable).
32 $\text{PRE statistic} = \frac{\text{Error without decision rule} - \text{Error with decision rule}}{\text{Error without decision rule}}$
33 Another way of stating this is
$\text{PRE value} = \frac{E_1 - E_2}{E_1}$
where $E_1$ = number of errors made by the first
prediction method and $E_2$ = number of errors made
by the second prediction method.
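The PRE formula is simple enough to compute directly. In the sketch below, the function name `pre` and the error counts (40 errors without the independent variable, 30 with it) are hypothetical.

```python
def pre(e1, e2):
    """Proportional reduction in error: (E1 - E2) / E1.

    e1: errors made by the first prediction method (no decision rule).
    e2: errors made by the second prediction method (with decision rule).
    """
    if e1 <= 0:
        raise ValueError("e1 must be positive")
    return (e1 - e2) / e1


# Hypothetical counts: 40 errors without the independent variable, 30 with it.
print(pre(40, 30))  # 0.25
```

Here, knowing the independent variable removes a quarter of the prediction errors.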
34 PRE measures are more versatile and more
informative than the chi-square-based
measures. All PRE measures are normed: they
use a standardized scale where the value 0 means
there is no association and 1 means there is
perfect association. Any value between these
extremes indicates the relative degree of
association in a ratio-comparison sense. E.g., a
PRE measure with a value of .50 represents an
association that is twice as strong as one that
has a PRE value of .25. The number of cases,
the table size, and the variables being measured
do not interfere with the interpretation that can
be given to PRE values.
35 Chi Square: The chi-square test examines whether
two nominal variables are associated. It is NOT
a PRE measure. The chi-square test is based on a
comparison between the frequencies that are
observed in the cells of a cross-classification
table and those that we would expect to observe
if the null hypothesis were true. The hypotheses
for the chi-square test are:
Ho: the variables are statistically independent.
Ha: the variables are statistically dependent.
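The comparison of observed and expected frequencies can be sketched as follows. This is a minimal illustration using the standard Pearson chi-square statistic, where each expected count is (row total × column total) / n. The 2×2 table and the function name `chi_square` are hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected in this cell if Ho (independence) were true.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical cross-classification of two nominal variables.
observed = [[10, 20],
            [30, 40]]
stat, df = chi_square(observed)
print(round(stat, 4), df)  # 0.7937 1
```

A statistic this small with 1 degree of freedom is well below the .05 critical value of 3.84, so Ho (independence) would not be rejected for this table.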
36 Goodman and Kruskal's Gamma (G): A measure of
association for data grouped in ordered
categories. G is a PRE measure. G compares two
prediction rules. The first randomly
predicts each untied pair to be either in
agreement or disagreement. The second predicts all
untied pairs to be of the same type. Agreement
or disagreement is determined by the direction of
the bivariate distribution. For a positive
pattern we expect untied pairs to be in
agreement. For a negative pattern we expect
untied pairs to be in disagreement.
37 Pa: we find the number of agreement pairs by
multiplying the frequency for each cell by the
sum of the frequencies from all cells that are
both to the right of and below it, then adding up
these products. Pd, the number of disagreement
pairs, is found the same way, by
multiplying the frequency for each cell in the
table by the sum of the frequencies from all
cells that are both to the left of and below it.
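The counting rules for Pa and Pd can be sketched in code. The example assumes the standard definition G = (Pa − Pd) / (Pa + Pd), which the slides do not show, and a table whose rows and columns are both ordered from low to high. The table values and function names are hypothetical.

```python
def agreement_disagreement_pairs(table):
    """Count agreement (Pa) and disagreement (Pd) pairs in an ordered table."""
    pa = pd_ = 0
    for i, row in enumerate(table):
        below = table[i + 1:]  # all rows below the current cell
        for j, freq in enumerate(row):
            # Cells both to the right of and below (i, j) form agreement pairs.
            pa += freq * sum(sum(r[j + 1:]) for r in below)
            # Cells both to the left of and below (i, j) form disagreement pairs.
            pd_ += freq * sum(sum(r[:j]) for r in below)
    return pa, pd_


def gamma(table):
    """Goodman and Kruskal's gamma, assuming G = (Pa - Pd) / (Pa + Pd)."""
    pa, pd_ = agreement_disagreement_pairs(table)
    return (pa - pd_) / (pa + pd_)


# Hypothetical 2x2 table with ordered categories (low to high on both axes).
table = [[20, 10],
         [5, 15]]
print(agreement_disagreement_pairs(table))  # (300, 50)
print(round(gamma(table), 3))               # 0.714
```

The positive value reflects a positive pattern: most untied pairs are in agreement.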
38 Bivariate Regression and Correlation
39 BIVARIATE REGRESSION AND CORRELATION
WHY AND WHEN TO USE REGRESSION/CORRELATION?
WHAT DOES REGRESSION/CORRELATION MEAN?
40 You should be able to interpret:
The least squares equation.
R² and adjusted R².
F and significance.
The unstandardized regression coefficient.
The standardized regression coefficient.
t and significance.
The 95% confidence interval.
A graph of the regression line.
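Several of these quantities can be computed by hand for a small data set. The sketch below is illustrative only: it covers the least squares equation, the unstandardized coefficient (slope), R², and the standardized coefficient, which in the bivariate case equals Pearson's r. The data and the function name `bivariate_regression` are hypothetical, and F, t, and the confidence interval are left out.

```python
import math


def bivariate_regression(x, y):
    """Least squares fit of y = a + b*x, plus r and R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                   # unstandardized regression coefficient (slope)
    a = mean_y - b * mean_x         # intercept
    r = sxy / math.sqrt(sxx * syy)  # standardized coefficient in the bivariate case
    return {"intercept": a, "slope": b, "r": r, "r_squared": r * r}


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = bivariate_regression(x, y)
print({name: round(value, 3) for name, value in fit.items()})
# {'intercept': 2.2, 'slope': 0.6, 'r': 0.775, 'r_squared': 0.6}
```

The least squares equation here is Ŷ = 2.2 + 0.6X: each one-unit increase in X is associated with a predicted increase of 0.6 in Y, and X accounts for 60% of the variance in Y.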
41 ASSUMPTIONS UNDERLYING REGRESSION/CORRELATION
NORMALITY OF VARIANCE IN Y FOR EACH VALUE OF X: For
any fixed value of the independent variable X,
the distribution of the dependent variable Y is
normal.
NORMALITY OF VARIANCE FOR THE ERROR TERM: The error
term is normally distributed.
(Many authors argue that this is more important
than normality in the distribution of Y.)
THE INDEPENDENT VARIABLE IS UNCORRELATED WITH THE
ERROR TERM
42 ASSUMPTIONS UNDERLYING REGRESSION/CORRELATION
(Continued)
HOMOSCEDASTICITY: It is assumed that the variance
of Y is equal for each fixed value of X.
LINEARITY: The relationship between X and Y is linear.
INDEPENDENCE: The Ys are statistically independent
of each other.