Intro to Statistics for the Behavioral Sciences PSYC 1900

Provided by: DavidD218

1
Intro to Statistics for the Behavioral
Sciences PSYC 1900
  • Lecture 5: Probability and Hypothesis Testing

2
Probability
  • Relative Frequency Perspective
  • Probability of some event is the limit of the
    relative frequency of occurrence as the number of
    draws (i.e., samples) approaches infinity.
  • If we have 8 blue marbles and 2 red marbles, the
    probability of drawing a red is 2/10 = 20% on any
    trial (i.e., analytic perspective).
  • Across repeated trials, we would find that about
    20% of them produce a red marble.
  • Note that we're sampling with replacement.
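
The relative-frequency idea can be checked with a short simulation. This is a minimal sketch in Python, assuming the 8 blue and 2 red marbles above; the function name and the fixed seed are illustrative choices, not part of the lecture.

```python
import random

def red_relative_frequency(n_draws, seed=0):
    """Draw with replacement from 8 blue + 2 red marbles; return the share of reds."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    marbles = ["blue"] * 8 + ["red"] * 2
    reds = sum(rng.choice(marbles) == "red" for _ in range(n_draws))
    return reds / n_draws

# The relative frequency should settle near 2/10 as the number of draws grows.
for n in (10, 1_000, 100_000):
    print(n, red_relative_frequency(n))
```

With few draws the observed share can sit well away from .20; with many draws it should stay close to it.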

3
Terminology
  • Sampling with replacement
  • After an event, the draw or event goes back into
    the pool.
  • Sampling in which an item drawn on trial N is
    replaced before the drawing on trial N+1.
  • Event
  • The outcome of a trial
  • Independent events
  • Events where the occurrence of one has no effect
    on the probability of the occurrence of others
  • Voting behavior of random citizens, marble draw
  • Mutually exclusive events
  • Two events are mutually exclusive when the
    occurrence of one precludes the occurrence of the
    other.
  • Gender, religion, handedness

4
Basic Laws of Probability
  • Probabilities range from 0 to 1, where a 1 means
    the event must occur.
  • Additive Rule
  • Gives the probability of occurrence of one or
    another of a set of mutually exclusive events.
  • 30 red marbles, 15 blue, 55 green = 100 total
  • p(red) = .30, p(blue) = .15, p(green) = .55
  • Probability of drawing a red or blue?
  • Given a set of mutually exclusive events, the
    probability of one event or the other equals the
    sum of their separate probabilities.
  • p(red) + p(blue) = .30 + .15 = .45
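
The additive rule for the marble example can be written out as a small calculation. This is a sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Marble counts from the slide: 30 red, 15 blue, 55 green.
counts = {"red": 30, "blue": 15, "green": 55}
total = sum(counts.values())
p = {color: Fraction(n, total) for color, n in counts.items()}

# Red and blue are mutually exclusive on a single draw, so probabilities add.
p_red_or_blue = p["red"] + p["blue"]
print(float(p_red_or_blue))
```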

5
Basic Laws of Probability
  • Multiplicative Law
  • Gives the probability of the joint occurrence of
    independent events.
  • 30 red marbles, 15 blue, 55 green = 100 total
  • p(red) = .30, p(blue) = .15, p(green) = .55
  • Probability of drawing a red on the first trial
    and a red on the second?
  • The prob of a joint occurrence of two or more
    independent events equals the product of their
    individual probabilities.
  • p(red) x p(red) = .3 x .3 = .09
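
The multiplicative law can be verified by listing every equally likely pair of draws. This sketch assumes sampling with replacement, so the two draws are independent; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

marbles = ["red"] * 30 + ["blue"] * 15 + ["green"] * 55

# Multiplicative law: p(red, then red) = p(red) x p(red).
p_red = Fraction(marbles.count("red"), len(marbles))
p_red_red = p_red * p_red

# Cross-check by counting over all ordered pairs of draws (100 x 100 of them).
pairs = list(product(marbles, repeat=2))
both_red = sum(first == "red" and second == "red" for first, second in pairs)
print(float(p_red_red), both_red, len(pairs))
```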

6
  • Sequence of coin flips
  • H,H,T,H,T,T,T,H,T,T, __
  • What is the probability of H on the next flip?
  • Prob = .5; events are independent
  • What is the probability of H and H on the next
    two flips?
  • Prob = .5 x .5 = .25; events are independent
  • Conditional probability of independent
    events
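
Independence here means that knowing the previous flip does not change the probability of heads. A minimal simulation sketch, assuming a fair coin and an arbitrary fixed seed:

```python
import random

rng = random.Random(1)                      # fixed seed for repeatability
flips = [rng.choice("HT") for _ in range(200_000)]

# Keep only the flips that come right after a tail, then see how often they are heads.
after_tail = [nxt for prev, nxt in zip(flips, flips[1:]) if prev == "T"]
p_head_after_tail = after_tail.count("H") / len(after_tail)
p_head_overall = flips.count("H") / len(flips)

print(round(p_head_overall, 3), round(p_head_after_tail, 3))
```

Both values should be close to .5: for independent events the conditional probability equals the unconditional one.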

7
Joint Probabilities
  • The probability of the co-occurrence of two or
    more events
  • Probability of sampling a red cube from a sample
    of red and blue marbles and cubes
  • p(red, cube) = p(red) x p(cube)
  • If the events are independent
  • If not independent (i.e., a correlation among
    events), computation of prob is more complex

8
Conditional Probabilities
  • The prob of one event given the occurrence of
    another event
  • The prob that a person will fracture a bone given
    that he/she has osteoporosis
  • p(fracture | osteoporosis) = Y
  • If the null hypothesis is true, the probability
    of obtaining a difference between sample means of
    size X

9
Bone Density      No Fracture   Fracture   Total
Normal                153          24       177
  Row %                86          14        49
  Column %             59          24
  Cell %               43           7

Osteoporosis          105          76       181
  Row %                58          42        51
  Column %             41          76
  Cell %               29          21

Total                 258         100       358
  % of total           72          28
  • p(no fracture) = 258/358 = .72
  • p(norm den, no frac) = 153/358 = .43
  • Why not p(norm) x p(no frac) = .49 x .72 = .35?
  • p(frac | osteo) = .42; p(frac | norm) = .14
  • Other conditional prob examples?
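
The marginal, joint, and conditional probabilities on this slide can all be recomputed from the four cell counts. A sketch with illustrative variable names:

```python
# Cell counts from the bone density table.
counts = {
    ("normal", "no fracture"): 153,
    ("normal", "fracture"): 24,
    ("osteoporosis", "no fracture"): 105,
    ("osteoporosis", "fracture"): 76,
}
n = sum(counts.values())                                      # 358

n_normal = sum(v for (d, _), v in counts.items() if d == "normal")
n_osteo = sum(v for (d, _), v in counts.items() if d == "osteoporosis")
n_no_frac = sum(v for (_, f), v in counts.items() if f == "no fracture")

p_no_frac = n_no_frac / n                                     # marginal
p_normal = n_normal / n                                       # marginal
p_joint = counts[("normal", "no fracture")] / n               # joint, from the cell
p_product = p_normal * p_no_frac                              # valid only under independence
p_frac_given_osteo = counts[("osteoporosis", "fracture")] / n_osteo
p_frac_given_normal = counts[("normal", "fracture")] / n_normal

print(round(p_joint, 2), round(p_product, 2))
print(round(p_frac_given_osteo, 2), round(p_frac_given_normal, 2))
```

The product of the marginals (about .36 unrounded, .35 when the rounded .49 and .72 are multiplied) falls short of the observed joint probability of .43. Bone density and fracture are not independent, so the simple multiplicative law does not apply.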

10
Discrete vs. Continuous Probability Distributions
  • For discrete distributions, we can calculate
    probs for specific events.
  • p(Harvard, vanilla) = 7/20 = .35

11
Discrete vs. Continuous Probability Distributions
  • For continuous distributions, case is slightly
    different.
  • Prob that baby will crawl at 35 weeks?
  • Almost zero at 35.00001 weeks.
  • Events at a very specific point are infrequent.
  • Density gives probability for specific range
  • 35 weeks means from 34.5 to 35.5 weeks.
  • Integrate to find area under curve which provides
    a probability as a function of proportion of
    interval area to entire area under curve (where
    total area is set to equal 1)
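
The area-under-the-curve idea can be sketched numerically. The lecture gives no distribution for crawling age, so the normal curve with mean 35 weeks and sd 4 weeks below is purely an assumption for illustration, as are the function names.

```python
from statistics import NormalDist

# Assumed parameters, for illustration only (not from the lecture).
crawl_age = NormalDist(mu=35, sigma=4)

def interval_prob(dist, low, high):
    """Probability of falling in [low, high], from the cumulative distribution."""
    return dist.cdf(high) - dist.cdf(low)

def riemann_area(dist, low, high, steps=1_000):
    """Approximate the same area by summing thin rectangles under the density."""
    width = (high - low) / steps
    return sum(dist.pdf(low + (i + 0.5) * width) for i in range(steps)) * width

p_week = interval_prob(crawl_age, 34.5, 35.5)       # "35 weeks" as a one-week range
p_tiny = interval_prob(crawl_age, 35.0, 35.00001)   # a near-point event

print(round(p_week, 3), round(riemann_area(crawl_age, 34.5, 35.5), 3), p_tiny)
```

The one-week interval has a sizeable probability, while the near-point interval has a probability close to zero.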

12
Sampling Distributions and Hypothesis Testing
  • Until now, we have primarily focused on
    descriptive statistics.
  • Although such statistics are quite useful for
    assessing the characteristics of samples, they
    cannot answer questions related to inference.
  • Is the difference between two means likely to
    represent chance variation?
  • To answer such questions, the remainder of this
    course will focus on the statistical process of
    inference.

13
Basic Form of Inference
  • The most basic question is one in which we might
    compare the means of two groups.
  • If one group has a mean of 50 and the other a
    mean of 42 following some manipulation, can we
    infer that the manipulation lowered the score?

14
Sampling Error
  • To answer this question, we have to understand
    sampling error.
  • Sampling error is the variability of a statistic
    from sample to sample due to chance.
  • If I took samples from a population, the
    descriptives of the samples would cluster around,
    but not always equal the parameters of the
    population.
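
Sampling error is easy to see by drawing a few samples from one population. In this sketch the population is assumed to be normal with mean 50 and sd 10, and the sample size of 25 is also an assumption.

```python
import random
import statistics

rng = random.Random(42)                 # fixed seed for repeatability
POP_MEAN, POP_SD, N = 50, 10, 25        # assumed values, for illustration only

sample_means = []
for _ in range(5):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

# The five means cluster around 50, but none is expected to equal it exactly.
print([round(m, 1) for m in sample_means])
```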

15
Hypothesis Testing
  • The basic question in hypothesis testing is
  • Is the given difference large enough that it does
    not likely stem from sampling error?
  • Hypothesis Testing
  • A process by which decisions are made regarding
    the values of parameters.

16
Sampling Distributions
  • The distribution of a statistic over repeated
    sampling from a specified population.
  • Both descriptive and inferential statistics
    (e.g., t, F, r) have sampling distributions.
  • Tell us what values we might expect given certain
    conditions.
  • A conditional probability

17
Sampling Distribution of the Mean
  • To determine if the difference between two means
    is likely due to sampling error, we need to know
    the sd of a distribution of means from the
    population.
  • Standard Error of the Mean
  • sd of a sampling distribution of means
  • Sampling distribution of the mean is the
    distribution of means collected from repeated
    sampling of the same population.
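
The standard error of the mean can be approximated by building a sampling distribution through simulation. The sketch below assumes a normal population with mean 50 and sd 10 and samples of size 25, and compares the result with the standard formula sd / sqrt(n), which the slide does not state explicitly.

```python
import math
import random
import statistics

rng = random.Random(7)                             # fixed seed for repeatability
POP_MEAN, POP_SD, N, REPS = 50, 10, 25, 20_000     # assumed values

# Approximate the sampling distribution of the mean with many repeated samples.
means = [
    statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(REPS)
]

empirical_se = statistics.stdev(means)             # sd of the simulated means
theoretical_se = POP_SD / math.sqrt(N)             # 10 / 5 = 2.0

print(round(empirical_se, 2), theoretical_se)
```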

18
Distribution of Sample Means

19
Hypothesis Testing
  • Sampling distributions allow us to test
    hypotheses.
  • Sampling distributions can be derived
    mathematically.
  • If the aggression mean of kids viewing a violent
    video is 6.5, and the normal population mean
    for kids is 5.65, does this difference imply that
    such videos increase aggressive thoughts?

20
Logic of Hypothesis Testing
  • Set up relevant null hypothesis H0
  • Sample (i.e., kids who watch violent videos)
    represents same population.
  • Mean should equal population mean of 5.65
  • Calculate mean of sample
  • Mean = 6.5
  • Obtain sampling distribution and standard error
  • Determine probability of obtaining a mean at
    least as large as the actual sample mean
  • On that basis, decide whether to reject or fail
    to reject the null hypothesis

21
The Null Hypothesis
  • At its heart, the null states that parameters are
    the same.
  • For example, 2 means are equal
  • The difference between the means is zero
  • Any differences reflect sampling error
  • Why use the null?
  • Excellent starting place
  • What would the alternative be?
  • We'd have to specify sampling distributions for
    exact alternative parameter values.

22
Test Statistics and Sampling Distributions
  • The same logic applies to test statistics as well
    as means.
  • t's, F's, r's
  • A sampling distribution can be calculated for
    each statistic and used to evaluate the
    corresponding null.
  • For t, a sampling distribution when H0 is true
    would consist of t values from an infinite number
    of paired samples.
  • Compare current t to sampling distribution to
    determine viability of null.

23
Using Normal Distribution to Test Hypotheses
  • The normal distribution can be used to test
    hypotheses involving individual scores or sample
    means.
  • Assumes scores or sampling distributions of the
    mean are normally distributed
  • Going back to our example
  • Mean of kids watching violent videos = 6.5
  • Population parameters
  • Mean = 5.65, sd = .45

24
Using Normal Distribution to Test Hypotheses
  • Convert 6.5 to a z score
  • applet
  • p(6.5 | N(5.65, 0.45)) = .06
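
The z-score conversion for this example can be worked through as follows. This is a sketch that treats .45 as the standard deviation of the relevant normal distribution, as the slide does; the variable names are illustrative.

```python
from statistics import NormalDist

sample_mean, pop_mean, pop_sd = 6.5, 5.65, 0.45

# z = (observed value - mean) / sd
z = (sample_mean - pop_mean) / pop_sd

std_normal = NormalDist()                    # mean 0, sd 1
p_one_tailed = 1 - std_normal.cdf(z)         # area at or above z
p_two_tailed = 2 * p_one_tailed              # area this extreme in either tail

print(round(z, 2), round(p_one_tailed, 2), round(p_two_tailed, 2))
```

Here z is about 1.89. The area above it is about .03, and doubling it for both tails gives about .06, which matches the value on the slide.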

25
Terminology
  • Significance Level
  • Probability with which we are willing to reject
    null when it is in fact correct
  • Also called alpha level
  • Rejection Region
  • Set of outcomes that will lead to rejection of
    null
  • Alternative Hypothesis
  • Hypothesis that is adopted when null is rejected
  • Usually the research hypothesis

26
Type I and Type II Errors
  • As we've seen, determining whether a difference
    is real or due to sampling error requires a
    choice of a critical value or significance level.
  • Because we are making a choice, there is always
    the chance that the choice will be incorrect.

27
Type I and Type II Errors
  • If we use a significance level of .05
  • 5% of the time we will reject the null hypothesis
    when it is true
  • Type I Error
  • p(Type I) = alpha
  • If we feel this amount of error is too large,
    what can we do to minimize Type I errors?

28
Type I and Type II Errors
  • Use a more stringent alpha level to reduce Type I
    errors
  • Alpha = .01: only a 1% chance of error in
    rejecting the null
  • This strategy has a trade-off
  • Failing to reject the null when it is false is a
    Type II error
  • p(Type II) = beta
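
The trade-off between the two error types can be illustrated by simulation. This sketch reuses the population mean 5.65 and sd .45 from the earlier example; the sample size of 20, the true effect of 0.25 under the false-null scenario, the two-tailed z test, and the function name are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(3)                              # fixed seed for repeatability
POP_MEAN, POP_SD, N, REPS = 5.65, 0.45, 20, 10_000  # N and REPS are assumed
SE = POP_SD / math.sqrt(N)

def rejection_rate(true_mean, alpha):
    """Share of simulated two-tailed z tests that reject H0: mean == POP_MEAN."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(REPS):
        m = fmean(rng.gauss(true_mean, POP_SD) for _ in range(N))
        if abs((m - POP_MEAN) / SE) > z_crit:
            rejections += 1
    return rejections / REPS

results = {}
for alpha in (0.05, 0.01):
    type_1 = rejection_rate(POP_MEAN, alpha)             # null true: rejections are Type I errors
    type_2 = 1 - rejection_rate(POP_MEAN + 0.25, alpha)  # null false: misses are Type II errors
    results[alpha] = (type_1, type_2)
    print(alpha, round(type_1, 3), round(type_2, 3))
```

The Type I rate should track alpha, while the Type II rate rises when alpha is tightened from .05 to .01.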

29
Decision               True State of World
                       Null True           Null False
Reject Null            Type I Error        Correct Decision
Fail to Reject Null    Correct Decision    Type II Error

30
One-Tailed vs. Two-Tailed Tests
  • Two-tailed (nondirectional) tests are most common
  • Look for extremes in both tails (i.e., positive
    or negative deviations from the mean)
  • Alpha = .05 has .025 null rejection area in each
    tail of sampling distribution
  • Used because one might never truly be sure what
    outcome to expect

31
One-Tailed vs. Two-Tailed Tests
  • One-tailed (directional) tests are less commonly
    used
  • Look for extreme parameter values in only 1 tail
  • Researcher predicts direction of difference
  • Alpha = .05 places total .05 null rejection area
    in a single tail
  • What is the benefit in terms of power?
  • Smaller differences will be viewed as significant
    due to increased null rejection area
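
The difference in rejection areas shows up directly in the critical values. This sketch assumes a standard normal sampling distribution and reuses the z of about 1.89 from the violent-video example; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# Two-tailed: alpha is split, .025 in each tail.
two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)
# One-tailed: the full .05 sits in a single tail.
one_tailed_crit = std_normal.inv_cdf(1 - alpha)

z = (6.5 - 5.65) / 0.45        # z from the earlier example, about 1.89
print(round(two_tailed_crit, 2), round(one_tailed_crit, 3))
print(z > one_tailed_crit, z > two_tailed_crit)
```

The one-tailed cutoff (about 1.645) is lower than the two-tailed cutoff (about 1.96), so the same z of 1.89 falls in the rejection region for the one-tailed test but not for the two-tailed test.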