Title: Final review - statistics Spring 03
1 Final review - statistics Spring 03
- Also, see final review - research design
2 Statistics
Descriptive Statistics: statistics that summarize and describe the data we collected
Inferential Statistics: statistics that make inferences from samples to populations
3 Descriptive Statistics
A summary of your data
- Measures of Center / Central Tendency: indicate a central value for the variable
- Measures of Dispersion (Variability / Spread): indicate how much participants' scores vary from each other
- Measures of Association: indicate how much variables go together
(Shown in tables, graphs, and distributions)
4 Measures of Center
- Mode: the value with the highest frequency (the most common value)
- Median: the middle score
- Mean: the average
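As a small illustration of the three measures of center, here is a sketch using Python's standard `statistics` module. The `scores` list is made-up data, not taken from the course.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration only
scores = [2, 3, 3, 4, 5, 5, 5, 7]

center = {
    "mode": multimode(scores),   # list of the most frequent value(s)
    "median": median(scores),    # middle score (average of the two middle values here)
    "mean": mean(scores),        # arithmetic average
}
print(center)
```

`multimode` returns a list because a distribution can have more than one mode.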
5 WHY are LEVELS / SCALES of MEASUREMENT IMPORTANT?
- Because you need to match the statistic you use
to the kind of variable you have
6 Measures of Central Tendency (Center)
Nominal
Ordinal
Interval/Ratio: Mean
7 Summary
Level of Measurement: info the values carry
- Nominal: Difference
- Ordinal: Difference, Order
- Interval: Difference, Order, Equal Interval
- Ratio: Difference, Order, Equal Interval, Meaningful Zero
8 Why Does Equal Distance Matter?
- If the distances between values are equal (as in interval or ratio data), you are able to calculate with the values (add, subtract, multiply, divide)
- You can get a mean only for interval/ratio variables
- A wider variety of statistical tests is available for interval/ratio variables
9 (Figure: a distribution plotted over the values 4, 5, 6, 7, 8, 9, 10)
What are the Mean, Median, and Mode for this
distribution?
What is this distribution shape called?
10 Types of Measures of Dispersion (Variability / Spread)
- Frequencies / Percentages
- Range
- The distance between the highest score and the lowest score (highest - lowest)
- Standard deviation / Variance
11 Variance / Standard Deviation
- Variance (S-squared): an approximate average of the squared deviations from the mean
- Standard Deviation (S or SD): the square root of the variance
- The larger the variance / SD, the more variability the data have: scores vary more widely from the mean.
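The definitions above can be traced step by step. This is a sketch with made-up scores; it assumes the sample formula (dividing by n - 1 rather than n), which is one reading of why the slide calls variance an "approximate" average.

```python
from statistics import mean, stdev, variance

# Hypothetical scores, invented for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(scores)
squared_deviations = [(x - m) ** 2 for x in scores]

# Sample variance: sum of squared deviations divided by n - 1 (assumed formula)
sample_variance = sum(squared_deviations) / (len(scores) - 1)

# Standard deviation: square root of the variance
sample_sd = sample_variance ** 0.5

# The standard library gives the same values
print(sample_variance, variance(scores))
print(sample_sd, stdev(scores))
```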
13 Measures of Dispersion
Nominal
Ordinal
Interval/Ratio: Standard Deviation, Variance
14 CORRELATION
- Co-relation
- 2 variables tend to go together
- Indicates how strongly and in which direction two variables are correlated with each other
- Correlation does NOT EQUAL cause
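To make "how strongly and in which direction" concrete, here is a sketch of Pearson's correlation coefficient r written out by hand. The function name `pearson_r` and the data are illustrative assumptions, not course material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y, scaled to lie between -1 and +1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: one variable rises with x, the other falls with x
hours = [1, 2, 3, 4, 5]
rises = [2, 4, 6, 8, 10]
falls = [10, 8, 6, 4, 2]

r_positive = pearson_r(hours, rises)
r_negative = pearson_r(hours, falls)
print(r_positive, r_negative)
```

The two results have the same size and opposite signs: the sign gives the direction, the absolute value gives the strength.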
15 SIGN
- 0: No systematic relationship
16 Correlation Coefficient
(Figure: scale running from -1 (negative) through 0 to +1 (positive))
17 SIZE
- Ranges from -1 to +1
- 0 or close to 0 indicates NO relationship
- +/- .2 - .4 weak
- +/- .4 - .6 moderate
- +/- .6 - .8 strong
- +/- .8 - .9 very strong
- +/- 1.00 perfect
- Negative relationships are NOT weaker!
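The size bands above can be turned into a small lookup. This is only a sketch: the function name `describe_r` is hypothetical, and it makes three assumptions the slide does not state. Each band includes its lower bound, values below .2 count as "close to 0", and values between .9 and 1.00 count as "very strong".

```python
def describe_r(r):
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.8:
        strength = "very strong"   # assumption: also covers .9 up to 1.00
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        strength = "no relationship"  # assumption: below .2 is "close to 0"
    if strength == "no relationship":
        direction = "none"
    else:
        direction = "positive" if r > 0 else "negative"
    return direction, strength

print(describe_r(0.7))
print(describe_r(-0.7))
```

Because strength is read from the absolute value, r = -.7 and r = +.7 get the same label, which is the point of the last bullet.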
18 Significance Test
- The correlation coefficient also comes with a significance test (p-value)
- p = .05: if there were no correlation in the population, a correlation this strong would show up by chance in only 5% of samples; 5% risk of TYPE I Error; 95% confidence level
- If p < .05, reject H0 and support Ha at the 95% confidence level
19 Inferential Statistics
- Infer characteristics of a population from the characteristics of samples.
- Hypothesis Testing
- Statistical Significance
- The Decision Matrix
20 Inferential Statistics
Sample Statistics: X̄, SD, n
Population Parameters: μ, σ, N
21 Inferential Statistics
- Assess: are the sample statistics good indicators of the population parameters?
- Did differences between 2 groups happen by chance?
- What effect do random sampling errors have on our results?
22 Random sampling error
- Random sampling error
- Difference between the sample characteristics and the population characteristics caused by chance
- Sampling bias
- Difference between the sample characteristics and the population characteristics caused by biased (non-random) sampling
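The contrast between the two kinds of error can be simulated. Everything in this sketch is an assumption made for illustration: the population is artificial (normally distributed scores), and "biased sampling" is modeled by drawing only from the top half of the population.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Artificial population of 10,000 scores (hypothetical, not real data)
population = [rng.gauss(100, 15) for _ in range(10_000)]
population_mean = mean(population)

def sampling_error(sample):
    """Difference between a sample mean and the population mean."""
    return mean(sample) - population_mean

# Random sampling: each error is nonzero, but errors scatter around zero
random_errors = [sampling_error(rng.sample(population, 50)) for _ in range(200)]

# Biased sampling: only the top half of the population can be selected
top_half = sorted(population)[len(population) // 2:]
biased_errors = [sampling_error(rng.sample(top_half, 50)) for _ in range(200)]

print("average random sampling error:", mean(random_errors))
print("average error under biased sampling:", mean(biased_errors))
```

Random sampling errors tend to cancel out across repeated samples; errors from biased sampling all lean the same way and do not.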
23 Probability
- Probability (p) ranges between 0 and 1
- p = 1 means that the event would occur in every trial
- p = 0 means the event would never occur in any trial
- The closer the probability is to 1, the more likely that the event will occur
- The closer the probability is to 0, the less likely the event will occur
24 p > .05 means that
- The means of the two groups fall within the central 95% area of the normal distribution around one population mean
(Figure: Mean 1 and Mean 2 both inside the central 95% of a single distribution)
25 p < .05 means that
- The means of the two groups do NOT fall within the central 95% area of the normal distribution around one population mean, so it is more reasonable to assume that they belong to different populations
(Figure: two distributions with means μ1 and μ2)
26 Null Hypothesis
- Says IV has no influence on DV
- There is no difference between the two variables.
- There is no relationship between the two
variables.
27 Null Hypothesis
- States there is NO true difference between the groups
- If sample statistics show any difference, it is due to random sampling error
- Referred to as H0
- (Research Hypothesis = Ha)
- If you can reject H0, you can support Ha
- If you fail to reject H0, you cannot support Ha
28 Be conservative.
- What are the chances I would get these results if the null hypothesis is true?
- Only if the pattern is highly unlikely (p ≤ .05) do you reject the null hypothesis and support your hypothesis
- Since you cannot be 100% sure your conclusion is correct, you take up to a 5% risk.
- Your p-value tells you the risk / the probability of making a TYPE I Error
29 True state
- True state: wrong person to marry; you think it's the right person to marry: Type I error
- True state: right person to marry; you think it's the wrong person to marry: Type II error
30 True state
- True state: no fire; alarm: Type I error
- True state: fire; no alarm: Type II error
31 True State
- Ho: null hypothesis, there is NO fire
- Ha: alternative hypothesis, there IS a FIRE
You decide...
- True state Ho (no fire); you reject Ho (alarm): Type I error
- True state Ha (fire); you accept Ho (no alarm): Type II error
32 Easy ways to LOSE points
- Use the word "prove"
- Better to say "support the hypothesis" or "consistent with the hypothesis"
- Tentative statements acknowledge the possibility of making a Type I or Type II error
- Use the word "random" incorrectly
33 Significance Test
- A significance test examines the probability of a TYPE I error (falsely rejecting H0)
- A significance test examines how probable it is that the observed difference is caused by random sampling error
- Reject the null hypothesis if the probability is < .05 (the probability of a TYPE I error is smaller than .05)
34 Principal Logic
p < .05
Reject Null Hypothesis (H0)
Support Your Hypothesis (Ha)
35 Logic of Hypothesis Testing
Statistical tests used in hypothesis testing deal with the probability of a particular event occurring by chance.
Is the result a common or a rare occurrence if only chance is operating?
A score (or the result of a statistical test) is significant if it is unlikely to occur on the basis of chance alone.
36 Level of Significance
The Level of Significance is a cutoff point for determining significantly rare or unusual scores.
Scores outside the middle 95% of a distribution are considered rare when we adopt the standard 5% Level of Significance. This level of significance can be written as p = .05.
37 Decision Rules
- Reject Ho (accept Ha) when the sample statistic is statistically significant at the chosen p level; otherwise accept Ho (reject Ha).
- Possible errors
- You reject the Null Hypothesis when in fact it is true: a Type I Error, or Error of Rashness.
- You accept the Null Hypothesis when in fact it is false: a Type II Error, or Error of Caution.
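The decision rule and the two possible errors can be written as two small functions. This is a sketch: the names `decide` and `outcome` are hypothetical, the cutoff of .05 is the standard level used in these slides, and the code says "fail to reject H0" where the slide says "accept Ho".

```python
ALPHA = 0.05  # the standard 5% level of significance

def decide(p_value, alpha=ALPHA):
    """Apply the decision rule to a p-value."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

def outcome(decision, h0_is_true):
    """Cross the decision with the true state (which is unknown in real research)."""
    if decision == "reject H0":
        return "Type I error" if h0_is_true else "correct decision"
    return "correct decision" if h0_is_true else "Type II error"

for p in (0.03, 0.01, 0.89):
    print(p, decide(p))

print(outcome("reject H0", h0_is_true=True))           # false alarm
print(outcome("fail to reject H0", h0_is_true=False))  # missed a real difference
```

In practice only `decide` can be run on real data; `outcome` needs the true state, which is exactly what the researcher does not know.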
38 True state
Your decision
?
39 Parametric Tests
To compare two groups on mean scores, use a t-test.
For more than 2 groups, use Analysis of Variance (ANOVA).
Nonparametric Tests
You can't get a mean from nominal or ordinal data.
Chi Square tests the difference in frequency distributions of two or more groups.
40 Parametric Tests
- Used with data that have a mean score or standard deviation.
- t-test, ANOVA, and Pearson's correlation r.
- Use a t-test to compare mean differences between two groups (e.g., male/female and married/single).
41 Parametric Tests
- Use ANalysis Of VAriance (ANOVA) to compare more than two groups (such as age groups or family income groups) and get a probability score for the overall group differences.
- Use Post Hoc Tests to identify which subgroups differ significantly from each other.
42 When comparing two groups on MEAN SCORES use the
t-test.
43 T-test
- If p < .05, we conclude that the two groups are drawn from populations with different distributions (reject H0) at the 95% confidence level
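A t-test on two small groups can be sketched by hand. The assumptions here go beyond the slide: the two groups are made-up data, the function `pooled_t` is a hypothetical helper that uses the equal-variance (pooled) form of the independent-samples t-test, and instead of computing an exact p-value the sketch compares |t| with 2.228, the standard table value for a two-tailed test at the .05 level with 10 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Independent-samples t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df

# Hypothetical ratings from two groups of 6 participants each
group_a = [2, 3, 3, 4, 4, 5]
group_b = [5, 6, 6, 7, 7, 8]

t, df = pooled_t(group_a, group_b)

T_CRITICAL_DF10 = 2.228  # two-tailed, alpha = .05, df = 10 (from a t table)
significant = abs(t) > T_CRITICAL_DF10  # equivalent to p < .05 for this df

print(t, df, significant)
```

A statistics package would report the exact p-value directly; the critical-value comparison is used here only to keep the sketch within the standard library.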
44 Our Research Hypothesis: hair length leads to different perceptions of a person.
The Null Hypothesis: there will be no difference between the pictures.
When comparing two groups on MEAN SCORES use the
t-test.
45 "I think she is one of those people who quickly earns respect."
Short Hair: Mean = 2.2, SD = 1.9, n = 100
Long Hair: Mean = 4.1, SD = 1.8, n = 100
p = .03
Accept Ha: Mean scores come from different distributions.
Accept Ho: Mean scores reflect just chance differences from a single distribution.
46 "In my opinion, she is a mature person."
Short Hair: Mean = 1.6, SD = 1.7, n = 100
Long Hair: Mean = 3.6, SD = 1.2, n = 100
p = .01
Accept Ha: Mean scores come from different distributions.
Accept Ho: Mean scores reflect just chance differences from a single distribution.
47 "I think we are quite similar to one another."
Short Hair: Mean = 3.7, SD = 1.8, n = 100
Long Hair: Mean = 3.9, SD = 1.5, n = 100
p = .89
Accept Ha: Mean scores come from different distributions.
Accept Ho: Mean scores are just chance differences from a single distribution.
48 A nonsignificant result may be caused by a
- A. low sample size.
- B. very cautious significance level.
- C. weak manipulation of independent variables.
- D. true null hypothesis.
49 When to use various statistics
- Parametric
- Interval or ratio data
- Non-parametric
- Ordinal and nominal data
50 Chi-Square (χ²)
- Chi Square tests the difference in frequency distributions of two or more groups.
- Test of Significance
- of two nominal variables or
- of a nominal variable and an ordinal variable
- Used with a cross tabulation table
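As a sketch of how the chi-square statistic is built from a cross tabulation table: the expected count for each cell is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The function name `chi_square` and the 2 × 2 table are made up for illustration; 3.841 is the standard table value for the .05 level with 1 degree of freedom.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a cross tabulation (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical cross tabulation: rows = two groups, columns = yes / no answers
observed_counts = [
    [30, 20],
    [15, 35],
]

chi2, df = chi_square(observed_counts)

CHI2_CRITICAL_DF1 = 3.841  # alpha = .05, df = 1 (from a chi-square table)
significant = chi2 > CHI2_CRITICAL_DF1

print(chi2, df, significant)
```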