Transcript and Presenter's Notes

Title: Statistics


1
Statistics
  • Psych 231: Research Methods in Psychology

2
Statistics
  • 2 General kinds of Statistics
  • Descriptive statistics
  • Used to describe, simplify, organize data sets
  • Describing distributions of scores
  • Inferential statistics
  • Used to test claims about the population, based
    on data gathered from samples
  • Takes sampling error into account: are the results above and beyond what you'd expect by random chance?

3
Statistics
  • Why do we use them?
  • Descriptive statistics
  • Used to describe, simplify, organize data sets
  • Describing distributions of scores
  • Inferential statistics
  • Used to test claims about the population, based
    on data gathered from samples
  • Takes sampling error into account: are the results above and beyond what you'd expect by random chance?

4
Inferential Statistics
  • Purpose: To make claims about populations based on data collected from samples
  • What's the big deal?
  • Example experiment:
  • Group A - gets treatment to improve memory
  • Group B - gets no treatment (control)
  • After the treatment period, test both groups for memory
  • Results:
  • Group A's average memory score is 80
  • Group B's is 76
  • Is the 4-point difference a real difference (statistically significant) or is it just sampling error?
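A minimal sketch of this comparison in Python, assuming two lists of memory scores (the values below are made up for illustration); an independent-samples t-test asks whether the difference between the group means exceeds what sampling error alone would produce:

```python
from scipy import stats

# Hypothetical memory scores (illustrative values, not real data)
group_a = [82, 79, 85, 78, 81, 77, 84, 80, 76, 78]  # treatment
group_b = [75, 78, 74, 77, 79, 73, 76, 78, 75, 74]  # control

print("Mean A:", sum(group_a) / len(group_a))
print("Mean B:", sum(group_b) / len(group_b))

# Is the observed difference bigger than expected by chance?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```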

5
Testing Hypotheses
  • Step 1: State your hypotheses
  • Step 2: Set your decision criteria
  • Step 3: Collect your data from your sample(s)
  • Step 4: Compute your test statistics
  • Step 5: Make a decision about your null hypothesis
  • Reject H0
  • Fail to reject H0

6
Testing Hypotheses
  • Step 1: State your hypotheses
  • Null hypothesis (H0): there are no differences (effects)
  • Alternative hypothesis(es) (HA): generally, not all groups are equal
  • You aren't out to prove the alternative hypothesis (although it feels like this is what you want to do)
  • If you reject the null hypothesis, then you're left with support for the alternative(s) (NOT proof!)

7
Testing Hypotheses
  • Step 1: State your hypotheses
  • In our memory example experiment:
  • Null H0: mean of Group A = mean of Group B
  • Alternative HA: mean of Group A ≠ mean of Group B
  • (Or more precisely: Group A > Group B)
  • It seems like our theory is that the treatment should improve memory.
  • That's the alternative hypothesis. That's NOT the one that we'll test with inferential statistics.
  • Instead, we test the H0

8
Testing Hypotheses
  • Step 1: State your hypotheses
  • Step 2: Set your decision criteria
  • Your alpha level will be your guide for when to
  • reject the null hypothesis
  • fail to reject the null hypothesis
  • This could be the correct conclusion or an incorrect conclusion
  • Two different ways to go wrong
  • Type I error: saying that there is a difference when there really isn't one (the probability of making this error is the alpha level)
  • Type II error: saying that there is not a difference when there really is one

9
Error types
                                 Real world (truth)
                                 H0 is correct        H0 is wrong
Experimenter's conclusions
  Reject H0                      Type I error         (correct)
  Fail to reject H0              (correct)            Type II error
10
Error types Courtroom analogy
                                 Real world (truth)
                                 Defendant is innocent   Defendant is guilty
Jury's decision
  Find guilty                    Type I error            (correct)
  Find not guilty                (correct)               Type II error
11
Error types
  • Type I error: concluding that there is an effect (a difference between groups) when there really isn't one.
  • Sometimes called the significance level
  • We try to minimize this (keep it low)
  • Pick a low level of alpha
  • In psychology, 0.05 and 0.01 are most common
  • Type II error: concluding that there isn't an effect when there really is one.
  • Related to the statistical power of a test
  • How likely are you to detect a difference if it is there?
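A minimal simulation sketch of the two error types, assuming normally distributed scores (all the numbers are illustrative): when H0 is really true, rejections are Type I errors and occur at roughly the alpha rate; when there is a real difference, the rejection rate is the power, and 1 - power is the Type II error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, n_sims = 0.05, 25, 5000

def rejection_rate(mean_a, mean_b):
    """Fraction of simulated experiments in which H0 is rejected."""
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(mean_a, 10, n)
        b = rng.normal(mean_b, 10, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

# H0 true (no real difference): rejections are Type I errors, rate ~ alpha
print("Type I error rate:", rejection_rate(76, 76))

# H0 false (a real 4-point difference): the rejection rate is the power;
# 1 - power is the Type II error rate
print("Power:", rejection_rate(80, 76))
```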

12
Testing Hypotheses
  • Step 1: State your hypotheses
  • Step 2: Set your decision criteria
  • Step 3: Collect your data from your sample(s)
  • Step 4: Compute your test statistics
  • Descriptive statistics (means, standard deviations, etc.)
  • Inferential statistics (t-tests, ANOVAs, etc.)
  • Step 5: Make a decision about your null hypothesis
  • Reject H0: statistically significant differences
  • Fail to reject H0: no statistically significant differences
  • When you reject your null hypothesis
  • Essentially this means that the observed difference is above what you'd expect by chance
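Putting the five steps together for the memory example, a sketch in Python (the scores are assumed, illustrative values):

```python
from scipy import stats

# Step 1: H0: mean(A) = mean(B);  HA: mean(A) != mean(B)
# Step 2: set the decision criterion
alpha = 0.05
# Step 3: data collected from the two samples (illustrative values)
group_a = [82, 79, 85, 78, 81, 77, 84, 80]   # treatment
group_b = [75, 78, 74, 77, 79, 73, 76, 78]   # control
# Step 4: descriptive and inferential statistics
t_stat, p_value = stats.ttest_ind(group_a, group_b)
# Step 5: make a decision about H0
if p_value < alpha:
    print("Reject H0: a statistically significant difference")
else:
    print("Fail to reject H0: no statistically significant difference")
```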

13
Generic statistical test
  • Tests the question
  • Are there differences between groups due to a
    treatment?

[Figure: two possibilities in the real world; one population, two sample distributions]
14
Generic statistical test
  • Tests the question
  • Are there differences between groups due to a
    treatment?

[Figure: two possibilities in the real world. If H0 is true (no treatment effect), the two sample means (76 and 80) come from one population; if H0 is false (there is a treatment effect), they come from two populations. People who get the treatment change; they form a new population (the treatment population).]
15
Generic statistical test
  • ER = random sampling error
  • ID = individual differences (if a between-subjects factor)
  • TR = the effect of a treatment
  • Why might the samples be different?
  • (What is the source of the variability between groups?)

16
Generic statistical test
  • ER = random sampling error
  • ID = individual differences (if a between-subjects factor)
  • TR = the effect of a treatment
  • The generic test statistic is a ratio of sources of variability:

Computed test statistic = Observed difference / Difference expected by chance
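A sketch of this ratio computed by hand for two independent samples and checked against scipy (the data are illustrative); here "difference expected by chance" is the standard error of the difference between means:

```python
import numpy as np
from scipy import stats

a = np.array([82, 79, 85, 78, 81, 77, 84, 80])   # treatment
b = np.array([75, 78, 74, 77, 79, 73, 76, 78])   # control

observed_difference = a.mean() - b.mean()

# "Difference expected by chance" = standard error of the difference
# (pooled-variance form; samples here are the same size)
pooled_var = (a.var(ddof=1) + b.var(ddof=1)) / 2
chance_difference = np.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

t_by_hand = observed_difference / chance_difference
t_scipy = stats.ttest_ind(a, b).statistic
print(t_by_hand, t_scipy)   # the two values agree
```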
17
Statistical significance
  • Statistically significant differences
  • When you reject your null hypothesis
  • Essentially this means that the observed difference is above what you'd expect by chance
  • Chance is determined by estimating how much
    sampling error there is
  • Factors affecting chance
  • Sample size
  • Population variability

18
Sampling error
[Figure: a single score (x) sampled from the population, n = 1]
19
Sampling error
[Figure: two scores (x) sampled from the population, n = 2]
20
Sampling error
  • Generally, as the sample size increases, the
    sampling error decreases

[Figure: a sample of n = 10 scores]
21
Sampling error
  • Typically the narrower the population
    distribution, the narrower the range of possible
    samples, and the smaller the chance

22
Sampling error
  • These two factors combine to impact the
    distribution of sample means.
  • The distribution of sample means is a
    distribution of all possible sample means of a
    particular sample size that can be drawn from the
    population

[Figure: the distribution of sample means, built from repeated samples of size n]
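A simulation sketch of this idea, assuming a normal population (the parameters are illustrative): as n grows, the spread of the distribution of sample means (the standard error) shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 76, 10          # illustrative population parameters

for n in (1, 2, 10, 100):
    # Draw many samples of size n and record each sample mean
    sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:3d}: SD of sample means = {sample_means.std():.2f} "
          f"(theory: sigma/sqrt(n) = {sigma / np.sqrt(n):.2f})")
```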
23
Generic statistical test
  • The generic test statistic distribution
  • To reject the H0, you want a computed test statistic that is large
  • reflecting a large Treatment Effect (TR)
  • What's large enough? The alpha level gives us the decision criterion

[Figure: the distribution of the test statistic (based on the distribution of sample means); the alpha level determines where the decision boundaries go]
24
Generic statistical test
  • The generic test statistic distribution
  • To reject the H0, you want a computed test statistic that is large
  • reflecting a large Treatment Effect (TR)
  • What's large enough? The alpha level gives us the decision criterion

[Figure: the distribution of the test statistic, marked with Reject H0 and Fail to reject H0 regions]
25
Generic statistical test
  • The generic test statistic distribution
  • To reject the H0, you want a computed test statistic that is large
  • reflecting a large Treatment Effect (TR)
  • What's large enough? The alpha level gives us the decision criterion

[Figure: a one-tailed test; the Reject H0 region sits in a single tail of the test statistic distribution. Sometimes you know to expect a particular direction of difference (e.g., improved memory performance).]
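A sketch of how the alpha level sets those boundaries, using the t distribution (the degrees of freedom are illustrative):

```python
from scipy import stats

alpha, df = 0.05, 24        # illustrative degrees of freedom

# Two-tailed test: split alpha across both tails
two_tailed_cut = stats.t.ppf(1 - alpha / 2, df)
# One-tailed test: put all of alpha in the expected tail
one_tailed_cut = stats.t.ppf(1 - alpha, df)

print(f"two-tailed: reject H0 if |t| > {two_tailed_cut:.2f}")
print(f"one-tailed: reject H0 if t > {one_tailed_cut:.2f}")
```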
26
Significance
  • A statistically significant difference means
  • the researcher is concluding that there is a difference above and beyond chance
  • with the probability of making a Type I error at 5% (assuming an alpha level of 0.05)
  • Note: statistical significance is not the same thing as theoretical significance.
  • It only means that there is a statistical difference
  • It doesn't mean that it is an important difference

27
Non-Significance
  • Failing to reject the null hypothesis
  • Generally, we're not interested in accepting the null hypothesis (remember, we can't prove things, only disprove them)
  • Usually check to see if you made a Type II error (failed to detect a difference that is really there)
  • Check the statistical power of your test
  • Sample size is too small
  • Effects that you're looking for are really small
  • Check your controls; maybe there is too much variability

28
Summary
  • Example experiment:
  • Group A - gets treatment to improve memory
  • Group B - gets no treatment (control)
  • After the treatment period, test both groups for memory
  • Results:
  • Group A's average memory score is 80
  • Group B's is 76

H0: µA = µB
H0: there is no difference between Group A and Group B
  • Is the 4-point difference a real difference (statistically significant) or is it just sampling error?

[Figure: two sample distributions]
29
Some inferential statistical tests
  • 1 factor with two groups
  • T-tests
  • Between groups: 2 independent samples
  • Within groups: repeated-measures samples (matched, related)
  • 1 factor with more than two groups
  • Analysis of Variance (ANOVA) (either between
    groups or repeated measures)
  • Multi-factorial
  • Factorial ANOVA

30
T-test
  • Design
  • 2 separate experimental conditions
  • Degrees of freedom
  • Based on the size of the sample and the kind of
    t-test
  • Formula: test statistic = observed difference between conditions / difference expected by chance

The computation differs for between-subjects and within-subjects t-tests
31
T-test
  • Reporting your results
  • The observed difference between conditions
  • Kind of t-test
  • Computed T-statistic
  • Degrees of freedom for the test
  • The p-value of the test
  • "The mean of the treatment group was 12 points higher than the control group. An independent samples t-test yielded a significant difference, t(24) = 5.67, p < 0.05."
  • "The mean score of the post-test was 12 points higher than the pre-test. A repeated measures t-test demonstrated that this difference was significant, t(12) = 5.67, p < 0.05."
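A sketch of both kinds of t-test and the pieces such a report needs (the scores are illustrative, sized so the degrees of freedom match the example sentences above):

```python
from scipy import stats

# Between-subjects: two independent samples (13 participants per group)
control   = [61, 58, 64, 60, 59, 62, 57, 63, 60, 61, 58, 62, 59]
treatment = [73, 70, 75, 72, 71, 74, 69, 76, 72, 73, 70, 74, 71]
res = stats.ttest_ind(treatment, control)
df = len(treatment) + len(control) - 2          # df for an independent-samples t-test
print(f"t({df}) = {res.statistic:.2f}, p = {res.pvalue:.3f}")

# Within-subjects: the same 13 people measured twice (pre/post)
pre  = [60, 55, 58, 62, 57, 59, 61, 56, 60, 58, 63, 57, 59]
post = [72, 66, 70, 75, 68, 71, 73, 69, 72, 70, 76, 68, 71]
res = stats.ttest_rel(post, pre)
df = len(pre) - 1                               # df for a repeated-measures t-test
print(f"t({df}) = {res.statistic:.2f}, p = {res.pvalue:.3f}")
```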

32
Analysis of Variance
  • Designs
  • More than two groups
  • 1 Factor ANOVA, Factorial ANOVA
  • Both Within and Between Groups Factors
  • Test statistic is an F-ratio
  • Degrees of freedom
  • Several to keep track of
  • The number of them depends on the design

33
Analysis of Variance
  • More than two groups
  • Now we can't just compute a simple difference score, since there is more than one difference
  • So we use variance instead of simply the difference
  • Variance is essentially an average difference

34
1 factor ANOVA
  • 1 Factor, with more than two levels
  • Now we can't just compute a simple difference score, since there is more than one difference
  • A - B, B - C, A - C

35
1 factor ANOVA
  • Null hypothesis
  • H0: all the groups are equal

Alternative hypotheses
HA: not all the groups are equal
36
1 factor ANOVA
Planned contrasts and post-hoc tests - further tests used to rule out the different alternative hypotheses
Test 1: A ≠ B
Test 2: A ≠ C
Test 3: B ≠ C
37
1 factor ANOVA
  • Reporting your results
  • The observed differences
  • Kind of test
  • Computed F-ratio
  • Degrees of freedom for the test
  • The p-value of the test
  • Any post-hoc or planned comparison results
  • "The mean score of Group A was 12, Group B was 25, and Group C was 27. A 1-way ANOVA was conducted and the results yielded a significant difference, F(2,25) = 5.67, p < 0.05. Post hoc tests revealed that the differences between groups A and B and between A and C were statistically reliable (respectively t(1) = 5.67, p < 0.05; t(1) = 6.02, p < 0.05). Groups B and C did not differ significantly from one another."

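A sketch of a one-way ANOVA with follow-up pairwise comparisons (the group scores are illustrative; a real analysis would also correct the post-hoc tests for multiple comparisons):

```python
from itertools import combinations
from scipy import stats

groups = {
    "A": [12, 10, 14, 11, 13, 12, 9, 15, 12],
    "B": [25, 23, 27, 24, 26, 25, 22, 28, 25],
    "C": [27, 25, 29, 26, 28, 27, 24, 30, 27],
}

# Omnibus test: are all the group means equal (H0)?
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Follow-up pairwise comparisons to see which groups differ
for (name1, data1), (name2, data2) in combinations(groups.items(), 2):
    t_stat, p = stats.ttest_ind(data1, data2)
    print(f"{name1} vs {name2}: t = {t_stat:.2f}, p = {p:.4f}")
```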
38
Factorial ANOVAs
  • We covered much of this in our experimental
    design lecture
  • More than one factor
  • Factors may be within or between
  • Overall design may be entirely within, entirely
    between, or mixed
  • Many F-ratios may be computed
  • An F-ratio is computed to test the main effect of
    each factor
  • An F-ratio is computed to test each of the
    potential interactions between the factors

39
Factorial ANOVAs
  • Reporting your results
  • The observed differences
  • Because there may be a lot of these, may present
    them in a table instead of directly in the text
  • Kind of design
  • e.g., a 2 x 2 completely between-subjects factorial design (a code sketch follows after this list)
  • Computed F-ratios
  • May see separate paragraphs for each factor, and
    for interactions
  • Degrees of freedom for the test
  • Each F-ratio will have its own set of dfs
  • The p-value of the test
  • May want to just say all tests were tested with
    an alpha level of 0.05
  • Any post-hoc or planned comparison results
  • Typically only the theoretically interesting
    comparisons are presented
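A sketch of a 2 x 2 completely between-subjects factorial ANOVA using statsmodels, assuming the data sit in long format with one row per participant (the column names and values are hypothetical):

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical long-format data: one row per participant
df = pd.DataFrame({
    "score":    [12, 14, 11, 13, 22, 24, 21, 23, 15, 17, 14, 16, 30, 32, 29, 31],
    "factor_a": ["low"] * 8 + ["high"] * 8,
    "factor_b": (["old"] * 4 + ["new"] * 4) * 2,
})

# One F-ratio per main effect and one for the A x B interaction
model = ols("score ~ C(factor_a) * C(factor_b)", data=df).fit()
print(anova_lm(model, typ=2))
```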

40
Non-Experimental designs
  • Sometimes you just can't perform a fully controlled experiment
  • Because of the issue of interest
  • Limited resources (not enough subjects, observations are too costly, etc.)
  • Surveys
  • Correlational
  • Quasi-Experiments
  • Developmental designs
  • Small-N designs
  • This does NOT imply that they are bad designs
  • Just remember the advantages and disadvantages of
    each

41
Quasi-experiments
  • What are they?
  • Almost true experiments, but with an inherent
    confounding variable
  • General types
  • An event occurs that the experimenter doesn't manipulate
  • Something not under the experimenter's control
  • (e.g., flashbulb memories for traumatic events)
  • Interested in subject variables
  • high vs. low IQ, males vs. females
  • Time is used as a variable

42
Quasi-experiments
  • Program evaluation
  • Research on programs that are implemented to achieve some positive effect on a group of individuals
  • e.g., does an abstinence-from-sex program work in schools?
  • Steps in program evaluation
  • Needs assessment - is there a problem?
  • Program theory assessment - does the program address the needs?
  • Process evaluation - does it reach the target population? Is it being run correctly?
  • Outcome evaluation - are the intended outcomes being realized?
  • Efficiency assessment - was it worth it? Were the benefits worth the costs?

43
Quasi-experiments
  • Nonequivalent control group designs
  • with pretest and posttest (most common)
  • (think back to the second control lecture)
  • But remember that the results may be
    compromised because of the nonequivalent control
    group (review threats to internal validity)

44
Quasi-experiments
  • Advantages
  • Allows applied research when experiments are not possible
  • Threats to internal validity can be assessed
    (sometimes)
  • Disadvantages
  • Threats to internal validity may exist
  • Designs are more complex than traditional
    experiments
  • Statistical analysis can be difficult
  • Most statistical analyses assume randomness

45
Non-Experimental designs
  • Sometimes you just can't perform a fully controlled experiment
  • Because of the issue of interest
  • Limited resources (not enough subjects, observations are too costly, etc.)
  • Surveys
  • Correlational
  • Quasi-Experiments
  • Developmental designs
  • Small-N designs
  • This does NOT imply that they are bad designs
  • Just remember the advantages and disadvantages of
    each

46
Developmental designs
  • Used to study changes in behavior that occur as a
    function of age changes
  • Age typically serves as a quasi-independent
    variable
  • Three major types
  • Cross-sectional
  • Longitudinal
  • Cohort-sequential

47
Developmental designs
  • Cross-sectional design
  • Groups are pre-defined on the basis of a
    pre-existing variable
  • Study groups of individuals of different ages at
    the same time
  • Use age to assign participants to group
  • Age is a subject variable treated as a between-subjects variable

48
Developmental designs
  • Cross-sectional design
  • Advantages
  • Can gather data about different groups (i.e.,
    ages) at the same time
  • Participants are not required to commit for an
    extended period of time

49
Developmental designs
  • Cross-sectional design
  • Disadvantages
  • Individuals are not followed over time
  • Cohort (or generation) effect individuals of
    different ages may be inherently different due to
    factors in the environment
  • Are 5-year-olds different from 15-year-olds just because of age, or do factors present in their environment contribute to the differences?
  • Imagine a 15-year-old saying, "Back when I was 5 I didn't have a Wii, my own cell phone, or a netbook"
  • Does not reveal development of any particular
    individuals
  • Cannot infer causality due to lack of control

50
Developmental designs
  • Longitudinal design
  • Follow the same individual or group over time
  • Age is treated as a within-subjects variable
  • Rather than comparing groups, the same
    individuals are compared to themselves at
    different times
  • Changes in dependent variable likely to reflect
    changes due to aging process
  • Changes in performance are compared on an
    individual basis and overall

51
Longitudinal Designs
  • Example
  • Wisconsin Longitudinal Study (WLS)
  • Began in 1957 and is still ongoing (50 years)
  • 10,317 men and women who graduated from Wisconsin
    high schools in 1957
  • Originally studied plans for college after
    graduation
  • Now it can be used as a test of aging and
    maturation

52
Developmental designs
  • Longitudinal design
  • Advantages
  • Can see developmental changes clearly
  • Can measure differences within individuals
  • Avoid some cohort effects (participants are all
    from same generation, so changes are more likely
    to be due to aging)

53
Developmental designs
  • Longitudinal design
  • Disadvantages
  • Can be very time-consuming
  • Can have cross-generational effects
  • Conclusions based on members of one generation
    may not apply to other generations
  • Numerous threats to internal validity
  • Attrition/mortality
  • History
  • Practice effects
  • Improved performance over multiple tests may be
    due to practice taking the test
  • Cannot determine causality

54
Developmental designs
  • Cohort-sequential design
  • Measure groups of participants as they age
  • Example: measure a group of 5-year-olds, then the same group 10 years later, as well as another group of 5-year-olds
  • Age is both between and within subjects variable
  • Combines elements of cross-sectional and
    longitudinal designs
  • Addresses some of the concerns raised by other
    designs
  • For example, it allows evaluation of the contribution of cohort effects

55
Developmental designs
  • Cohort-sequential design

[Table: cohorts A (born in the 1970s), B (1980s), and C (1990s) crossed with times of measurement 1975, 1985, and 1995]
56
Developmental designs
  • Cohort-sequential design
  • Advantages
  • Get more information
  • Can track developmental changes to individuals
  • Can compare different ages at a single time
  • Can measure generation effect
  • Less time-consuming than longitudinal (maybe)
  • Disadvantages
  • Still time-consuming
  • Need lots of groups of participants
  • Still cannot make causal claims

57
Small N designs
  • What are they?
  • Historically, these were the typical kind of design used until the 1920s, when there was a shift to using larger sample sizes
  • Even today, in some sub-areas, using small N designs is commonplace
  • (e.g., psychophysics, clinical settings, expertise, etc.)

58
Small N designs
  • One or a few participants
  • Data are typically not analyzed statistically; rather, researchers rely on visual interpretation of the data
  • Observations begin in the absence of treatment
    (BASELINE)
  • Then treatment is implemented and changes in
    frequency, magnitude, or intensity of behavior
    are recorded

59
Small N designs
  • Baseline experiments: the basic idea is to show
  • when the IV occurs, you get the effect
  • when the IV doesn't occur, you don't get the effect (reversibility)
  • Before introducing the treatment (IV), the baseline needs to be stable
  • Measure level and trend

60
Small N designs
  • Level: how frequent (or how intense) is the behavior?
  • Are all the data points high or low?
  • Trend: does the behavior seem to increase (or decrease)?
  • Are the data points flat or on a slope?
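A sketch of how the level and trend of a baseline might be quantified before visual inspection (the observations are illustrative):

```python
import numpy as np

# Hypothetical baseline observations: frequency of a behavior per session
baseline = np.array([7, 8, 6, 7, 9, 8, 7, 8])

level = baseline.mean()                       # how frequent/intense overall
sessions = np.arange(len(baseline))
trend = np.polyfit(sessions, baseline, 1)[0]  # slope per session

print(f"level = {level:.1f} responses per session, trend = {trend:+.2f} per session")
# A stable baseline has a trend near zero, so treatment effects are easier to see
```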

61
ABA design
  • ABA design (baseline, treatment, baseline)
  • The reversibility is necessary; otherwise something other than the IV (e.g., history, maturation, etc.) may have caused the effect

62
Small N designs
  • Advantages
  • Focus on individual performance; not fooled by group averaging effects
  • Focus is on big effects (small effects typically can't be seen without using large groups)
  • Avoid some ethical problems (e.g., with non-treatments)
  • Allows looking at unusual (and rare) types of subjects (e.g., case studies of amnesics, experts vs. novices)
  • Often used to supplement large N studies, with more observations on fewer subjects

63
Small N designs
  • Disadvantages
  • Effects may be small relative to the variability of the situation, so you NEED more observations
  • Some effects are by definition between subjects
  • Treatment leads to a lasting change, so you don't get reversals
  • Difficult to determine how generalizable the
    effects are

64
Small N designs
  • Some researchers have argued that Small N designs
    are the best way to go.
  • The goal of psychology is to describe behavior of
    an individual
  • Looking at data collapsed over groups means looking in the wrong place
  • Need to look at the data at the level of the
    individual