Zen and the Art of Significance Testing

Slides: 12
Provided by: HitEnt

Transcript and Presenter's Notes

1
Zen and the Art of Significance Testing
  • At the center of it all: the sampling
    distribution.
  • The task: learn something about an unobserved
    population on the basis of an observed sample.
    Not much of this can be done directly; sample
    statistics serve as best guesses for the
    corresponding parameters in the population.
  • To know accuracy and confidence level, the sample
    isn't enough: we need the sampling distribution.
  • The sampling distribution is a bridge between
    sample and population: pieces of information
    about each come together so that we can make
    statements about the population with varying
    degrees of confidence and accuracy.

2
Frequency Distribution
  • A sampling distribution is simply another type of
    frequency distribution. A frequency distribution
    is a depiction of the number of times each value
    of a variable occurs in a sample or population.
  • A frequency distribution is often depicted as a
    curve above a horizontal axis, such that the
    height of a point above the axis depicts the
    number of times that score occurred in the
    collectivity.
  • The height can also represent relative frequency,
    that is, the proportion of times a value occurs.
    AHA! The sum of all the relative frequencies
    always equals 1.0, or 100%.
  • Two components: scores on the variable concerned
    (horizontal axis), and frequency or relative
    frequency, given by height above the axis.
    Distributions can be infinite in either direction.
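
The two components of a frequency distribution can be sketched in a few lines of Python. This is only an illustration: the scores are made up, and the names (scores, frequency, relative_frequency) are not from the presentation.

```python
from collections import Counter

# Hypothetical scores for a small sample (made up for illustration).
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]

# First component: the values. Second component: how often each occurs.
frequency = Counter(scores)

# Relative frequency: the proportion of times each value occurs.
n = len(scores)
relative_frequency = {
    value: count / n for value, count in sorted(frequency.items())
}

for value, share in relative_frequency.items():
    # The row of stars is a crude sideways "curve": longer means more frequent.
    stars = "*" * frequency[value]
    print(f"{value}: count={frequency[value]} share={share:.3f} {stars}")

# The relative frequencies always add up to 1.0 (100%).
total = sum(relative_frequency.values())
print(f"sum of relative frequencies = {total:.1f}")
```

Swapping in any other list of scores changes the shape of the distribution, but the relative frequencies still sum to 1.0.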

3
Sampling Distribution U of A
  • A sampling distribution's U of A (unit of
    analysis) is a collectivity, but not in the sense
    that a large number of collectivities were
    observed (such as classes, groves of trees, soil
    samples).
  • The collectivity in this case is the sample.
  • Since only one sample is observed, how can it be
    an individual? That's where the Zen comes in...
    hang on a minute.
  • What are the variables, or description, of such
    an individual? A statistic. So:
  • IF there were a lot of samples (not going to
    happen), they could each be characterized by
    their own score (statistic) and in this nonreal
    sense a statistic can be a variable, and have a
    distribution.

5
Deep breath in... Distribution of a sample
  • At the moment before you sample (I mean random,
    always), you can have in mind a statistic you
    will calculate once the sample is pulled. Say,
    the mean (income, intelligence, whatever).
  • That mean will depend on the sample you pull, and
    likely change with repeated samples.
  • At the moment before you sample, then, there is a
    set of possibilities for the outcome of the
    statistic. It is these possibilities for the
    calculated statistic that are ranged along the
    horizontal axis. Note that depending on what you
    are sampling, this range will differ: there are
    many sampling distributions.
  • The second component of the distribution is the
    number of times each value occurs. In this case
    the occurrences do not happen in actuality at
    all; they are merely possibilities.

6
Hold that breath... Distribution of a sample
  • Possibilities: imagine drawing samples of the
    same size over and over, for fun, forever. The
    same sample might come up, or the same mean
    derived from different samples. In this sense,
    the second component of the distribution is
    infinite.
  • Some numbers are more probable than others. With
    infinitely many draws there is no sense in
    counting frequency, so we think of the relative
    frequency (or probability) of the means
    occurring. This depends on the nature of the
    population and the size of the hypothetical
    sample.
  • So, a sampling distribution is a compendium of the
    probabilities of calculated outcomes when one is
    about to draw a sample of size n from a
    population and calculate a certain statistic.
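
The "over and over, forever" thought experiment can be approximated by simulation. The sketch below is only an illustration under stated assumptions: the population is made up (a skewed, exponential one), 5,000 repetitions stand in for "forever", and the names population, sample_mean and num_samples are hypothetical.

```python
import random
import statistics
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values from a skewed distribution.
population = [random.expovariate(1 / 50) for _ in range(10_000)]


def sample_mean(pop, n):
    """Draw one simple random sample of size n and return its mean."""
    return statistics.fmean(random.sample(pop, n))


n = 30
num_samples = 5_000  # stands in for "forever"
means = [sample_mean(population, n) for _ in range(num_samples)]

# Relative frequency of the sample means, grouped into bins of width 5.
bins = Counter(int(m // 5) * 5 for m in means)
relative = {b: c / num_samples for b, c in sorted(bins.items())}

for b, share in relative.items():
    print(f"{b:3d}-{b + 5:3d}: {share:.3f} " + "#" * round(share * 100))
```

The printed bars approximate the sampling distribution of the mean for samples of size 30 from this particular population. Changing n or the population changes the picture, which is the sense in which there are many sampling distributions.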

7
Let it go... ahhhh, Probability and Randomness
  • How is this hypothesized distribution useful?
  • We can know the probabilities in a sampling
    distribution by knowing relevant facts about the
    population. Even rough, summary information. Why?
  • The mathematics of probability. And the
    importance of the random sample. Without it, the
    specifics of the sampling distribution would be
    unknown. Only for random samples do the rules of
    probability yield a predetermined sampling
    distribution (our hypothesized one).
  • Everything we are doing is based on a random
    sampling procedure, or one close enough to amount
    to the same thing.

8
Relaxed? Good, let's talk Curves
  • Predetermined sampling distributions are given by
    mathematical equations. The normal curve is one
    such formula.
  • Many statistics we are concerned with happen to
    have a normal sampling distribution. It just is
    that way. Be with it.
  • Another curve is the t-distribution, and many
    statistics are found to have a t-distribution. It
    just is that way. Be with it.
  • They look kind of similar. They are actually
    defined by totally different formulae. They both
    have bell-shaped curves.
  • Both are symmetric about the mean, taper to two
    tails, and are asymptotic to the horizontal axis.
  • Can't arrive at the curves empirically, as the
    second component is infinite. Thank you,
    mathematicians, for figuring this out.
  • The relationship between the sampling
    distribution of the mean and the normal curve is
    one of great mathematical convenience and
    empirical plausibility. It just is that way. Be
    with it.

9
Normal curve as foundational
  • From the normal curve, all other major curves in
    statistical inference follow mathematically. That
    is, the t-distribution, the F-distribution, and
    Chi-squared are all curves that specify sampling
    distributions of different statistics, and all
    are based on the normal distribution.
  • The whole makes a remarkable, close-knit family
    that is wonderful in mathematical elegance and
    momentous in practical utility. I couldn't even
    make that stuff up.
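
One way to see the family resemblance is to build the other curves out of normal draws. The sketch below relies on two standard results rather than anything stated in the slides: a chi-squared variable with k degrees of freedom is the sum of k squared independent standard normals, and a t variable is a standard normal divided by the square root of an independent chi-squared over its degrees of freedom. The function names and the choice k = 5 are illustrative.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable


def chi_squared_draw(k):
    """One chi-squared(k) draw: the sum of k squared standard normal draws."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))


def t_draw(k):
    """One t(k) draw: a standard normal over sqrt(chi-squared(k) / k)."""
    return random.gauss(0, 1) / math.sqrt(chi_squared_draw(k) / k)


k = 5
chi_draws = [chi_squared_draw(k) for _ in range(20_000)]
t_draws = [t_draw(k) for _ in range(20_000)]

# Theory: chi-squared(k) has mean k; t(k) has mean 0 and,
# for k > 2, variance k / (k - 2).
chi_mean = statistics.fmean(chi_draws)
t_mean = statistics.fmean(t_draws)
t_var = statistics.pvariance(t_draws)

print(f"mean of chi-squared draws: {chi_mean:.2f} (theory {k})")
print(f"mean of t draws: {t_mean:.2f} (theory 0)")
print(f"variance of t draws: {t_var:.2f} (theory {k / (k - 2):.2f})")
```

The simulated values should land close to the theoretical ones, though not exactly on them, since only 20,000 draws are used.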

10
Dancing to the Central Limit Theorem
  • As a sample is to be drawn, and we are interested
    in a mean, x-bar, there is conceptually a
    sampling distribution of x-bar.
  • The Central Limit Theorem demonstrates that this
    distribution is approximately normal. And, the
    mean of x-bar is the mean of the population. A
    wonderfully convenient result.
  • It means that, on average, a sample mean is the
    same as the population mean.
  • CLT also tells us that the standard deviation of
    the sampling distribution is the population
    standard deviation divided by the square root of
    the sample size. We call this the standard error.
  • CLT tells us, in sum: 1) the sampling distribution
    is approximately normal, 2) it has a certain
    mean, and 3) it has a certain std dev.
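
The claims about the mean and the standard deviation can be checked numerically. This is a rough sketch, assuming a made-up uniform population and 4,000 repeated samples of size 100; the variable names are illustrative, not from the presentation.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50,000 uniform values between 0 and 100.
population = [random.uniform(0, 100) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 100
# random.choices samples with replacement, so the draws are independent.
means = [
    statistics.fmean(random.choices(population, k=n)) for _ in range(4_000)
]

standard_error = sigma / math.sqrt(n)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)

print(f"population mean        : {mu:.2f}")
print(f"mean of sample means   : {mean_of_means:.2f}")
print(f"sigma / sqrt(n)        : {standard_error:.2f}")
print(f"std dev of sample means: {sd_of_means:.2f}")
```

The first two printed numbers should nearly agree, and so should the last two. The population here is flat rather than bell-shaped, so the agreement does not depend on the population itself being normal.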

11
Dancing to the Central Limit Theorem, Part II
  • That mean thing is neat, but what about the
    normal and std dev?
  • They help determine how far off from the
    population mean our sample mean is likely to be,
    once we actually draw it.
  • A sampling distribution of a mean is approximately
    normal as long as the sample is large (> 100),
    because a large sample, randomly drawn from a
    population, tends to mirror the population.
  • That means that of all the possibilities for
    x-bar, the most likely are those near the true
    mean.
  • As for standard deviation, the sampling
    distribution is much narrower than the population
    (its s.d. is 1/sqrt(n) times the population
    s.d.). Why? Although extreme values are possible,
    most of the hypothetical sample means will be
    near the true mean. That is, central values are
    far more probable than extreme values, so the
    average distance off center is small.
  • So, once our sample is drawn, if our mean is off
    at all it is likely not off by much, and the
    probability of being off by a lot is low.
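
How unlikely "off by a lot" is can be put into numbers with a normal approximation to the sampling distribution. The figures below are assumptions chosen only for illustration (a population s.d. of 15 and a sample of 100), and prob_off_by_more_than is a hypothetical helper name.

```python
import math
from statistics import NormalDist

# Hypothetical values, chosen only for illustration.
population_sd = 15.0
n = 100
standard_error = population_sd / math.sqrt(n)  # 15 / 10 = 1.5

# Normal approximation to the sampling distribution of the mean,
# centered at 0 so that values are distances from the population mean.
sampling_dist = NormalDist(mu=0.0, sigma=standard_error)


def prob_off_by_more_than(distance):
    """Two-tailed probability that x-bar misses by more than `distance`."""
    return 2 * (1 - sampling_dist.cdf(distance))


# Distances of 1, 2 and 3 standard errors.
for distance in (1.5, 3.0, 4.5):
    p = prob_off_by_more_than(distance)
    print(f"P(|x-bar - mu| > {distance}) = {p:.4f}")
```

Under these assumptions the probabilities come out to roughly 0.32, 0.05 and 0.003: missing by one standard error is common, missing by three is rare.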