Title: Statistics
1Statistics
- Psych 231 Research Methods in Psychology
2Statistics
- 2 General kinds of Statistics
- Descriptive statistics
- Used to describe, simplify, organize data sets
- Describing distributions of scores
- Inferential statistics
- Used to test claims about the population, based on data gathered from samples
- Takes sampling error into account: are the results above and beyond what you'd expect by random chance?
3Statistics
- Why do we use them?
- Descriptive statistics
- Used to describe, simplify, organize data sets
- Describing distributions of scores
- Inferential statistics
- Used to test claims about the population, based on data gathered from samples
- Takes sampling error into account: are the results above and beyond what you'd expect by random chance?
4Inferential Statistics
- Purpose: To make claims about populations based on data collected from samples
- What's the big deal?
- Example Experiment
- Group A - gets treatment to improve memory
- Group B - gets no treatment (control)
- After treatment period, test both groups for memory
- Results
- Group A's average memory score is 80
- Group B's is 76
- Is the 4-point difference a real difference (statistically significant) or is it just sampling error?
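The question on this slide can be explored with a small simulation. The sketch below draws both groups from a single population, so any difference between the two sample means is sampling error alone. The population mean (78), standard deviation (10), and group size (20) are hypothetical values chosen for illustration; the lecture does not give them.

```python
import random
import statistics

def mean_difference_by_chance(pop_mean, pop_sd, n, trials, seed=0):
    """Draw two samples of size n from ONE population, many times, and
    return the absolute difference between the two sample means each time."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        group_a = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        group_b = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        diffs.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return diffs

# Hypothetical population values (assumptions, not from the lecture).
diffs = mean_difference_by_chance(pop_mean=78, pop_sd=10, n=20, trials=2000)

# How often does chance alone produce a difference of 4 points or more?
share_at_least_4 = sum(d >= 4 for d in diffs) / len(diffs)
print(f"Share of chance-only differences of 4 or more: {share_at_least_4:.2f}")
```

Under these assumed values a 4-point gap is not rare even with no treatment effect, which is why an inferential test is needed before calling the difference real.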
5Testing Hypotheses
- Step 1: State your hypotheses
- Step 2: Set your decision criteria
- Step 3: Collect your data from your sample(s)
- Step 4: Compute your test statistics
- Step 5: Make a decision about your null hypothesis
- Reject H0
- Fail to reject H0
6Testing Hypotheses
- Step 1: State your hypotheses
- Null hypothesis (H0): There are no differences (effects)
- Alternative hypothesis(es): Generally, not all groups are equal
- You aren't out to prove the alternative hypothesis (although it feels like this is what you want to do)
- If you reject the null hypothesis, then you're left with support for the alternative(s) (NOT proof!)
7Testing Hypotheses
- Step 1: State your hypotheses
- In our memory example experiment
- Null H0: mean of Group A = mean of Group B
- Alternative HA: mean of Group A ≠ mean of Group B
- (Or more precisely: Group A > Group B)
- It seems like our theory is that the treatment should improve memory.
- That's the alternative hypothesis. That's NOT the one that we'll test with inferential statistics.
- Instead, we test the H0
8Testing Hypotheses
- Step 1: State your hypotheses
- Step 2: Set your decision criteria
- Your alpha level will be your guide for when to
- reject the null hypothesis
- fail to reject the null hypothesis
- This could be the correct conclusion or the incorrect conclusion
- Two different ways to go wrong
- Type I error: saying that there is a difference when there really isn't one (probability of making this error is the alpha level)
- Type II error: saying that there is not a difference when there really is one
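The link between the alpha level and the Type I error rate can be checked by simulation. The sketch below assumes H0 is true (both groups come from the same population) and counts how often a two-tailed test rejects anyway. It uses a z-test with a known population standard deviation to keep the code short; the population values and the function name `false_alarm_rate` are illustrative assumptions, not part of the lecture.

```python
import random
from statistics import NormalDist, mean

def false_alarm_rate(alpha, n, sims, seed=1):
    """Proportion of simulated experiments that reject H0 even though
    H0 is true (both groups drawn from the same population)."""
    rng = random.Random(seed)
    pop_mean, pop_sd = 78.0, 10.0             # hypothetical population
    se = pop_sd * (2 / n) ** 0.5              # standard error of the mean difference
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    rejections = 0
    for _ in range(sims):
        group_a = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        group_b = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        z = (mean(group_a) - mean(group_b)) / se
        if abs(z) > critical_z:
            rejections += 1                   # a Type I error
    return rejections / sims

rate = false_alarm_rate(alpha=0.05, n=20, sims=4000)
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen alpha, which is the sense in which alpha is the probability of a Type I error.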
9Error types
Real world (truth) vs. experimenter's conclusions
Experimenter's conclusion | H0 is correct | H0 is wrong
Reject H0 | Type I error | Correct decision
Fail to reject H0 | Correct decision | Type II error
10Error types Courtroom analogy
Real world (truth) vs. jury's decision
Jury's decision | Defendant is innocent | Defendant is guilty
Find guilty | Type I error | Correct decision
Find not guilty | Correct decision | Type II error
11Error types
- Type I error: concluding that there is an effect (a difference between groups) when there really isn't.
- The probability of this error (alpha) is sometimes called the significance level
- We try to minimize this (keep it low)
- Pick a low level of alpha
- Psychology: 0.05 and 0.01 most common
- Type II error: concluding that there isn't an effect, when there really is.
- Related to the Statistical Power of a test
- How likely are you to be able to detect a difference if it is there?
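Statistical power can be made concrete with a short calculation. The sketch below computes the power of a two-tailed, two-group z-test, assuming a known population standard deviation and equal group sizes. The effect size (4 points), standard deviation (10), and the function name `power_two_group_z` are hypothetical choices for illustration.

```python
from statistics import NormalDist

def power_two_group_z(effect, sd, n_per_group, alpha=0.05):
    """Probability of rejecting H0 when the true mean difference is `effect`
    (two-tailed z-test, known sd, equal group sizes)."""
    std_normal = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5         # standard error of the difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / se                        # true difference in standard-error units
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail

# Power rises with sample size for a fixed (hypothetical) effect of 4 points.
for n in (10, 20, 50, 100):
    print(n, round(power_two_group_z(effect=4, sd=10, n_per_group=n), 2))
```

The probability of a Type II error is one minus the power, so larger samples make it less likely that a real difference goes undetected.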
12Testing Hypotheses
- Step 1: State your hypotheses
- Step 2: Set your decision criteria
- Step 3: Collect your data from your sample(s)
- Step 4: Compute your test statistics
- Descriptive statistics (means, standard deviations, etc.)
- Inferential statistics (t-tests, ANOVAs, etc.)
- Step 5: Make a decision about your null hypothesis
- Reject H0: statistically significant differences
- Fail to reject H0: not statistically significant differences
- When you reject your null hypothesis
- Essentially this means that the observed difference is above what you'd expect by chance
13Generic statistical test
- Tests the question
- Are there differences between groups due to a
treatment?
Two possibilities in the real world
One population
Two sample distributions
14Generic statistical test
- Tests the question
- Are there differences between groups due to a
treatment?
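Two possibilities in the real world
- H0 is true (no treatment effect): one population; the sample means of 76 (Group B) and 80 (Group A) differ only by chance
- H0 is false (there is a treatment effect): two populations, one behind each sample mean (76 and 80)
- People who get the treatment change; they form a new population (the treatment population)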
15Generic statistical test
- Why might the samples be different? (What is the source of the variability between groups?)
- ER: Random sampling error
- ID: Individual differences (if between-subjects factor)
- TR: The effect of a treatment
16Generic statistical test
- ER: Random sampling error
- ID: Individual differences (if between-subjects factor)
- TR: The effect of a treatment
- The generic test statistic is a ratio of sources of variability
Computed test statistic = Observed difference / Difference from chance = (TR + ID + ER) / (ID + ER)
17Statistical significance
- Statistically significant differences
- When you reject your null hypothesis
- Essentially this means that the observed difference is above what you'd expect by chance
- Chance is determined by estimating how much sampling error there is
- Factors affecting chance
- Sample size
- Population variability
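These two factors appear together in the standard formula for the standard error of the mean. The symbols are the usual textbook ones and are not used on the slides: \sigma is the population standard deviation and n is the sample size.

```latex
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
```

A larger n or a smaller \sigma both shrink the standard error, which is the amount of difference expected by chance.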
18Sampling error
Figure: a sample of size n = 1 drawn from the population
19Sampling error
Figure: a sample of size n = 2 drawn from the population
20Sampling error
- Generally, as the sample size increases, the sampling error decreases
Figure: a sample of size n = 10 drawn from the population
21Sampling error
- Typically, the narrower the population distribution, the narrower the range of possible samples, and the smaller the sampling error (chance)
22Sampling error
- These two factors combine to impact the distribution of sample means.
- The distribution of sample means is a distribution of all possible sample means of a particular sample size that can be drawn from the population
Samples of size n
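The distribution of sample means can be approximated by drawing many samples and recording each mean. The sketch below does this for the sample sizes shown on the earlier sampling-error slides (n = 1, 2, 10), using a hypothetical normal population (mean 78, standard deviation 10) that is not given in the lecture.

```python
import random
import statistics

def sd_of_sample_means(n, pop_mean=78.0, pop_sd=10.0, samples=3000, seed=2):
    """Approximate the spread of the distribution of sample means by
    drawing many samples of size n and taking the SD of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

# The spread of sample means narrows as the sample size grows.
for n in (1, 2, 10):
    print(f"n = {n:2d}: SD of sample means is about {sd_of_sample_means(n):.2f}")
```

With these assumed values the spread shrinks roughly in proportion to one over the square root of n.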
23Generic statistical test
- The generic test statistic distribution
- To reject the H0, you want a computed test statistic that is large
- reflecting a large Treatment Effect (TR)
- What's large enough? The alpha level gives us the decision criterion
Distribution of the test statistic
Distribution of sample means
α-level determines where these boundaries go
24Generic statistical test
Distribution of the test statistic
Reject H0
Fail to reject H0
25Generic statistical test
Distribution of the test statistic
One-tailed test: sometimes you know to expect a particular difference (e.g., improve memory performance)
Reject H0
Fail to reject H0
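The way the alpha level sets the rejection boundaries can be sketched numerically. The example below assumes the test statistic follows a standard normal distribution (a simplification; t and F statistics use their own distributions) and compares the two-tailed and one-tailed cutoffs. The function names are illustrative.

```python
from statistics import NormalDist

def critical_values(alpha=0.05):
    """Return (two_tailed, one_tailed) cutoffs for a standard normal statistic."""
    z = NormalDist()
    two_tailed = z.inv_cdf(1 - alpha / 2)   # alpha split across both tails
    one_tailed = z.inv_cdf(1 - alpha)       # all of alpha in the upper tail
    return two_tailed, one_tailed

def decide(test_statistic, alpha=0.05, one_tailed=False):
    """Apply the decision rule: reject H0 if the statistic falls past the cutoff."""
    two, one = critical_values(alpha)
    if one_tailed:
        reject = test_statistic > one       # only a predicted (positive) difference counts
    else:
        reject = abs(test_statistic) > two
    return "Reject H0" if reject else "Fail to reject H0"

two, one = critical_values(0.05)
print(round(two, 2), round(one, 2))
print(decide(1.8), "|", decide(1.8, one_tailed=True))
```

A statistic of 1.8 falls short of the two-tailed cutoff but passes the one-tailed cutoff, which shows why the direction of the prediction must be set before looking at the data.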
26Significance
- A statistically significant difference means
- the researcher is concluding that there is a difference above and beyond chance
- with the probability of making a Type I error at 5% (assuming an alpha level = 0.05)
- Note: statistical significance is not the same thing as theoretical significance.
- Only means that there is a statistical difference
- Doesn't mean that it is an important difference
27Non-Significance
- Failing to reject the null hypothesis
- Generally, not interested in accepting the null hypothesis (remember, we can't prove things, only disprove them)
- Usually check to see if you made a Type II error (failed to detect a difference that is really there)
- Check the statistical power of your test
- Sample size is too small
- Effects that you're looking for are really small
- Check your controls, maybe too much variability
28Summary
- Example Experiment
- Group A - gets treatment to improve memory
- Group B - gets no treatment (control)
- After treatment period, test both groups for memory
- Results
- Group A's average memory score is 80
- Group B's is 76
H0: µA = µB
H0: there is no difference between Grp A and Grp B
- Is the 4-point difference a real difference (statistically significant) or is it just sampling error?
Two sample distributions
29Some inferential statistical tests
- 1 factor with two groups
- T-tests
- Between groups: 2-independent samples
- Within groups: Repeated measures samples (matched, related)
- 1 factor with more than two groups
- Analysis of Variance (ANOVA) (either between groups or repeated measures)
- Multi-factorial
- Factorial ANOVA
30T-test
- Design
- 2 separate experimental conditions
- Degrees of freedom
- Based on the size of the sample and the kind of t-test
- Formula
- Computation differs for between and within t-tests
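The slide's formula did not survive in this transcript, so the sketch below uses the standard textbook computations rather than whatever the slide showed: a pooled-variance independent-samples t and a repeated-measures t on difference scores. The function names and the tiny data sets are hypothetical.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Between-groups t: mean difference over its pooled standard error."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, df

def repeated_measures_t(first, second):
    """Within-groups t: mean of the paired difference scores over its standard error."""
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Made-up scores, for illustration only.
print(independent_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
print(repeated_measures_t([1, 2, 3, 4], [2, 4, 5, 4]))
```

Note how the degrees of freedom differ: total participants minus 2 for the between-groups test, and number of pairs minus 1 for the within-groups test.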
31T-test
- Reporting your results
- The observed difference between conditions
- Kind of t-test
- Computed T-statistic
- Degrees of freedom for the test
- The p-value of the test
- "The mean of the treatment group was 12 points higher than the control group. An independent samples t-test yielded a significant difference, t(24) = 5.67, p < 0.05."
- "The mean score of the post-test was 12 points higher than the pre-test. A repeated measures t-test demonstrated that this difference was significant, t(12) = 5.67, p < 0.05."
32Analysis of Variance
- Designs
- More than two groups
- 1 Factor ANOVA, Factorial ANOVA
- Both Within and Between Groups Factors
- Test statistic is an F-ratio
- Degrees of freedom
- Several to keep track of
- The number of them depends on the design
33Analysis of Variance
- More than two groups
- Now we can't just compute a simple difference score since there is more than one difference
- So we use variance instead of simply the difference
- Variance is essentially an average difference
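The idea of using variance in place of a single difference can be sketched as a one-way, between-groups F-ratio: variability between group means divided by variability within groups. This is the standard textbook computation, assumed here rather than taken from the slides; the function name and the scores are made up.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-factor between-groups design."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the scores around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Three hypothetical groups of three scores each.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

The two degrees-of-freedom values returned are the ones reported in parentheses after F.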
34 1 factor ANOVA
- 1 Factor, with more than two levels
- Now we can't just compute a simple difference score since there is more than one difference
- A - B, B - C, A - C
35 1 factor ANOVA
- Null hypothesis
- H0: all the groups are equal
- Alternative hypotheses
- HA: not all the groups are equal
36 1 factor ANOVA
- Planned contrasts and post-hoc tests
- Further tests used to rule out the different alternative hypotheses
- Test 1: A ≠ B
- Test 2: A ≠ C
- Test 3: B = C
37 1 factor ANOVA
- Reporting your results
- The observed differences
- Kind of test
- Computed F-ratio
- Degrees of freedom for the test
- The p-value of the test
- Any post-hoc or planned comparison results
- "The mean score of Group A was 12, Group B was 25, and Group C was 27. A 1-way ANOVA was conducted and the results yielded a significant difference, F(2,25) = 5.67, p < 0.05. Post hoc tests revealed that the differences between groups A and B and A and C were statistically reliable (respectively t(1) = 5.67, p < 0.05; t(1) = 6.02, p < 0.05). Groups B and C did not differ significantly from one another."
38Factorial ANOVAs
- We covered much of this in our experimental design lecture
- More than one factor
- Factors may be within or between
- Overall design may be entirely within, entirely between, or mixed
- Many F-ratios may be computed
- An F-ratio is computed to test the main effect of each factor
- An F-ratio is computed to test each of the potential interactions between the factors
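To see how quickly the number of F-ratios grows, the sketch below lists every main effect and interaction for a set of factors. The factor names and the function name `f_ratio_effects` are placeholders chosen for illustration.

```python
from itertools import combinations

def f_ratio_effects(factors):
    """List each effect that gets its own F-ratio: one per factor (main
    effects), then one per combination of two or more factors (interactions)."""
    effects = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            effects.append(" x ".join(combo))
    return effects

print(f_ratio_effects(["A", "B"]))
print(len(f_ratio_effects(["A", "B", "C"])))
```

A two-factor design yields three F-ratios (two main effects and one interaction); adding a third factor raises the count to seven.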
39Factorial ANOVAs
- Reporting your results
- The observed differences
- Because there may be a lot of these, may present them in a table instead of directly in the text
- Kind of design
- e.g. "2 x 2 completely between factorial design"
- Computed F-ratios
- May see separate paragraphs for each factor, and for interactions
- Degrees of freedom for the test
- Each F-ratio will have its own set of dfs
- The p-value of the test
- May want to just say "all tests were tested with an alpha level of 0.05"
- Any post-hoc or planned comparison results
- Typically only the theoretically interesting comparisons are presented
40Non-Experimental designs
- Sometimes you just can't perform a fully controlled experiment
- Because of the issue of interest
- Limited resources (not enough subjects, observations are too costly, etc.)
- Surveys
- Correlational
- Quasi-Experiments
- Developmental designs
- Small-N designs
- This does NOT imply that they are bad designs
- Just remember the advantages and disadvantages of each
41Quasi-experiments
- What are they?
- Almost "true" experiments, but with an inherent confounding variable
- General types
- An event occurs that the experimenter doesn't manipulate
- Something not under the experimenter's control
- (e.g., flashbulb memories for traumatic events)
- Interested in subject variables
- high vs. low IQ, males vs. females
- Time is used as a variable
42Quasi-experiments
- Program evaluation
- Research on programs that are implemented to achieve some positive effect on a group of individuals.
- e.g., does an abstinence-from-sex program work in schools?
- Steps in program evaluation
- Needs assessment - is there a problem?
- Program theory assessment - does the program address the needs?
- Process evaluation - does it reach the target population? Is it being run correctly?
- Outcome evaluation - are the intended outcomes being realized?
- Efficiency assessment - was it worth it? Are the benefits worth the costs?
43Quasi-experiments
- Nonequivalent control group designs
- with pretest and posttest (most common)
- (think back to the second control lecture)
- But remember that the results may be
compromised because of the nonequivalent control
group (review threats to internal validity)
44Quasi-experiments
- Advantages
- Allows applied research when experiments are not possible
- Threats to internal validity can be assessed (sometimes)
- Disadvantages
- Threats to internal validity may exist
- Designs are more complex than traditional experiments
- Statistical analysis can be difficult
- Most statistical analyses assume randomness
46Developmental designs
- Used to study changes in behavior that occur as a function of age changes
- Age typically serves as a quasi-independent variable
- Three major types
- Cross-sectional
- Longitudinal
- Cohort-sequential
47Developmental designs
- Cross-sectional design
- Groups are pre-defined on the basis of a pre-existing variable
- Study groups of individuals of different ages at the same time
- Use age to assign participants to group
- Age is a subject variable treated as a between-subjects variable
48Developmental designs
- Advantages (cross-sectional design)
- Can gather data about different groups (i.e., ages) at the same time
- Participants are not required to commit for an extended period of time
49Developmental designs
- Disadvantages (cross-sectional design)
- Individuals are not followed over time
- Cohort (or generation) effect: individuals of different ages may be inherently different due to factors in the environment
- Are 5-year-olds different from 15-year-olds just because of age, or can factors present in their environment contribute to the differences?
- Imagine a 15-year-old saying "back when I was 5 I didn't have a Wii, my own cell phone, or a netbook"
- Does not reveal development of any particular individuals
- Cannot infer causality due to lack of control
50Developmental designs
- Longitudinal design
- Follow the same individual or group over time
- Age is treated as a within-subjects variable
- Rather than comparing groups, the same individuals are compared to themselves at different times
- Changes in dependent variable likely to reflect changes due to aging process
- Changes in performance are compared on an individual basis and overall
51Longitudinal Designs
- Example
- Wisconsin Longitudinal Study (WLS)
- Began in 1957 and is still on-going (50 years)
- 10,317 men and women who graduated from Wisconsin high schools in 1957
- Originally studied plans for college after graduation
- Now it can be used as a test of aging and maturation
52Developmental designs
- Advantages (longitudinal design)
- Can see developmental changes clearly
- Can measure differences within individuals
- Avoid some cohort effects (participants are all
from same generation, so changes are more likely
to be due to aging)
53Developmental designs
- Disadvantages (longitudinal design)
- Can be very time-consuming
- Can have cross-generational effects
- Conclusions based on members of one generation may not apply to other generations
- Numerous threats to internal validity
- Attrition/mortality
- History
- Practice effects
- Improved performance over multiple tests may be due to practice taking the test
- Cannot determine causality
54Developmental designs
- Cohort-sequential design
- Measure groups of participants as they age
- Example: measure a group of 5-year-olds, then the same group 10 years later, as well as another group of 5-year-olds
- Age is both a between- and within-subjects variable
- Combines elements of cross-sectional and longitudinal designs
- Addresses some of the concerns raised by other designs
- For example, allows us to evaluate the contribution of cohort effects
55Developmental designs
Time of measurement: 1975, 1985, 1995
Cohort A: 1970s
Cohort B: 1980s
Cohort C: 1990s
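The cells of this cohort-by-time table did not survive in the transcript. The sketch below shows how such a grid can be filled in, assuming each cohort was born at the start of its decade (1970, 1980, 1990). Those birth years are an assumption for illustration; the slide gives only the decade labels.

```python
def age_grid(birth_years, measurement_years):
    """For each cohort, compute its age at every time of measurement.
    None marks a measurement that happens before the cohort is born."""
    grid = {}
    for cohort, born in birth_years.items():
        grid[cohort] = [year - born if year >= born else None
                        for year in measurement_years]
    return grid

# Assumed birth years (hypothetical; the slide lists only "1970s", "1980s", "1990s").
cohorts = {"Cohort A": 1970, "Cohort B": 1980, "Cohort C": 1990}
grid = age_grid(cohorts, [1975, 1985, 1995])
for cohort, ages in grid.items():
    print(cohort, ages)
```

Reading along a row follows one cohort as it ages (the longitudinal part); reading down a column compares different ages at a single time (the cross-sectional part).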
56Developmental designs
- Advantages (cohort-sequential design)
- Get more information
- Can track developmental changes to individuals
- Can compare different ages at a single time
- Can measure generation effect
- Less time-consuming than longitudinal (maybe)
- Disadvantages (cohort-sequential design)
- Still time-consuming
- Need lots of groups of participants
- Still cannot make causal claims
57Small N designs
- What are they?
- Historically, these were the typical kind of design used until the 1920s, when there was a shift to using larger sample sizes
- Even today, in some sub-areas, using small N designs is commonplace
- (e.g., psychophysics, clinical settings, expertise, etc.)
58Small N designs
- One or a few participants
- Data are typically not analyzed statistically; rather, researchers rely on visual interpretation of the data
- Observations begin in the absence of treatment (BASELINE)
- Then treatment is implemented and changes in frequency, magnitude, or intensity of behavior are recorded
59Small N designs
- Baseline experiments: the basic idea is to show
- when the IV occurs, you get the effect
- when the IV doesn't occur, you don't get the effect (reversibility)
- Before introducing treatment (IV), baseline needs to be stable
- Measure level and trend
60Small N designs
- Level: how frequent (how intense) is the behavior?
- Are all the data points high or low?
- Trend: does the behavior seem to increase (or decrease)?
- Are data points flat or on a slope?
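Level and trend are usually judged by eye, but they can also be summarized numerically. The sketch below treats level as the mean of the baseline observations and trend as the least-squares slope across sessions. This is one reasonable way to quantify them, not a method the lecture prescribes, and the data are made up.

```python
import statistics

def level_and_trend(observations):
    """Return (level, trend) for a series of session-by-session observations:
    level is the mean, trend is the least-squares slope per session."""
    n = len(observations)
    sessions = range(1, n + 1)
    level = statistics.mean(observations)
    mean_session = statistics.mean(sessions)
    numerator = sum((s - mean_session) * (y - level)
                    for s, y in zip(sessions, observations))
    denominator = sum((s - mean_session) ** 2 for s in sessions)
    return level, numerator / denominator

# Hypothetical baselines: one rising steadily, one flat.
print(level_and_trend([2, 4, 6, 8]))
print(level_and_trend([3, 3, 3, 3]))
```

A slope near zero suggests a stable baseline; a clear slope means the behavior was already changing before the treatment began.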
61ABA design
- ABA design (baseline, treatment, baseline)
- The reversibility is necessary; otherwise something else may have caused the effect other than the IV (e.g., history, maturation, etc.)
62Small N designs
- Advantages
- Focus on individual performance, not fooled by group averaging effects
- Focus is on big effects (small effects typically can't be seen without using large groups)
- Avoid some ethical problems, e.g., with non-treatments
- Allows us to look at unusual (and rare) types of subjects (e.g., case studies of amnesics, experts vs. novices)
- Often used to supplement large N studies, with more observations on fewer subjects
63Small N designs
- Disadvantages
- Effects may be small relative to the variability of the situation, so you NEED more observations
- Some effects are by definition between subjects
- Treatment leads to a lasting change, so you don't get reversals
- Difficult to determine how generalizable the effects are
64Small N designs
- Some researchers have argued that Small N designs are the best way to go.
- The goal of psychology is to describe the behavior of an individual
- Looking at data collapsed over groups looks in the wrong place
- Need to look at the data at the level of the individual