Zen and the Art of Significance Testing - PowerPoint PPT Presentation

Slides: 12

Provided by: HitEnt

Learn more at: http://courses.washington.edu

Transcript and Presenter's Notes

1
Zen and the Art of Significance Testing

At the center of it all: the sampling
distribution.
The task: learn something about an unobserved
population on the basis of an observed sample.
Not much of this can be done directly; our best
guesses for parameters in the population are the
corresponding sample statistics.
To know accuracy and confidence level, the sample
isn't enough; we need the sampling distribution.
The sampling distribution is a bridge between
sample and population, in that pieces of
information about each come together such that we
can make statements about the population with
varying degrees of confidence and accuracy.

2
Frequency Distribution

A sampling distribution is simply another type of
frequency distribution. A frequency distribution
is a depiction of the number of times each value
of a variable occurs in a sample or population.
An FD is often depicted as a curve above a
horizontal axis, such that the height of a point
above the axis depicts the number of times that
score occurred in the collectivity.
The height can also represent relative frequency,
that is, the proportion of times a value occurs.
AHA! The sum of all the relative frequencies
always equals 1.0, or 100%.
Two components: scores on the variable concerned
(horizontal axis), and frequency/relative
frequency, given by height above the axis.
Distributions can be infinite in either direction.
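
To make the two components concrete, here is a minimal sketch in Python. The scores are invented purely for illustration and the helper name `relative_frequencies` is hypothetical; the point is only that the relative frequencies add up to 1.0.

```python
from collections import Counter

def relative_frequencies(scores):
    """Return {value: proportion of times it occurs} for a list of scores."""
    counts = Counter(scores)      # frequency distribution: value -> count
    total = len(scores)
    return {value: count / total for value, count in sorted(counts.items())}

# Hypothetical scores, made up for this example only.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 3]
rel = relative_frequencies(scores)

for value, proportion in rel.items():
    print(value, proportion)      # horizontal axis value, height above it

# The heights sum to 1.0 (up to floating-point rounding).
print("sum of relative frequencies:", sum(rel.values()))
```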

3
Sampling Distribution: Unit of Analysis (U of A)

A sampling distribution's unit of analysis is a
collectivity, but not in the sense that a large
number of collectivities were observed (such as
classes, groves of trees, soil samples).
The collectivity in this case is the sample.
Since only one sample is observed, how can it be
an individual? That's where the Zen comes in; hang
on a minute.
What is the variable, or description, of such
an individual? A statistic. So:
IF there were a lot of samples (not going to
happen), they could each be characterized by
their own score (a statistic), and in this nonreal
sense a statistic can be a variable and have a
distribution.

5
Deep breath in: Distribution of a sample

At the moment before you sample (I mean random,
always), you can have in mind a statistic you
will calculate once the sample is pulled. Say,
the mean (income, intelligence, whatever).
That mean will depend on the sample you pull, and
will likely change with repeated samples.
At the moment before you sample, then, there is a
set of possibilities for the outcome of the
statistic; it is these possibilities for the
calculated statistic that are ranged along the
horizontal axis. Note that depending on what you
are sampling, this range will differ; there are
many sampling distributions.
The second component of the distribution is the
number of times each value occurs. In this case
the occurrences do not happen in actuality at
all; they are merely possibilities.

6
Hold that breath: Distribution of a sample

Possibilities: imagine drawing samples of the
same size over and over, for fun, forever. The
same sample might come up, or the same mean
derived from different samples. In this sense,
the second component of the distribution is
infinite.
Some numbers are more probable than others. With
infinitely many draws there is no sense in
counting frequency, so we think of the relative
frequency (or probability) of each mean occurring.
This depends on the nature of the population and
the size of the hypothetical sample.
So, a sampling distribution is a compendium of the
probabilities of calculated outcomes when one is
about to draw a sample of size n from a
population and calculate a certain statistic.
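
The "over and over, forever" thought experiment can be approximated on a computer. The sketch below is an illustration only, not part of the original slides: the population (the six faces of a die), the sample size, and the number of repetitions are all assumptions chosen for the example, and a finite number of draws can only approximate the true probabilities.

```python
import random
import statistics
from collections import Counter

def simulate_sample_means(population, n, draws, seed=0):
    """Draw `draws` random samples of size n (with replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)     # fixed seed so the run is repeatable
    return [sum(rng.choices(population, k=n)) / n for _ in range(draws)]

# Hypothetical population: the faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
means = simulate_sample_means(population, n=5, draws=10_000)

# First component: possible values of the statistic (horizontal axis).
# Second component: relative frequency of each value (height).
rel_freq = Counter(round(m, 1) for m in means)
for value in sorted(rel_freq):
    print(value, rel_freq[value] / len(means))

print("average of the simulated means:", statistics.mean(means))
```

Means near the middle of the range should turn up far more often than means near 1 or 6, which is the sense in which some outcomes are more probable than others.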

7
Let it go, ahhhh: Probability and Randomness

How is this hypothesized distribution useful?
We can know the probabilities in the sampling
distribution by knowing relevant facts about the
population, even rough, summary information. Why?
The mathematics of probability, and the
importance of the random sample. Without it, the
specifics of the sampling distribution would be
unknown. Only for random samples do the rules of
probability yield a predetermined sampling
distribution (our hypothesized one).
Everything we are doing is based on a random
sampling procedure, or one close enough to
amount to the same-o, same-o.

8
Relaxed? Good, let's talk Curves

Predetermined sampling distributions are given by
mathematical equations. The normal curve is one
such formula.
Many statistics we are concerned with happen to
have a normal sampling distribution. It just is
that way. Be with it.
Another curve is the t-distribution, and many
statistics are found to have a t-distribution. It
just is that way. Be with it.
They look kind of similar. They are actually
defined by totally different formulae. They both
have bell-shaped curves.
Both are symmetric about the mean, taper to two
tails, and are asymptotic to the horizontal axis.
We can't arrive at the curves empirically, as the
second component is infinite. Thank you,
mathematicians, for figuring this out.
The relationship between the sampling
distribution of the mean and the normal curve is
one of great mathematical convenience and
empirical plausibility. It just is that way. Be
with it.
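
For the curious, the two formulae can be written out directly. This sketch uses the standard textbook densities for the standard normal and Student's t; the slides do not give the formulae themselves, so treat the function names and the choice of 5 degrees of freedom as assumptions made for illustration.

```python
import math

def normal_pdf(x):
    """Standard normal density: exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Student's t density with df degrees of freedom.
    lgamma (log-gamma) is used so that large df does not overflow."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

# Heights of the two curves at a few points on the horizontal axis.
for x in (0.0, 1.0, 2.0, 3.0):
    print(x, round(normal_pdf(x), 4), round(t_pdf(x, df=5), 4))
```

Both curves are symmetric and bell-shaped, but the t curve sits lower in the middle and higher in the tails; as the degrees of freedom grow, it moves toward the normal curve.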

9
Normal curve as foundational

From the normal curve, all other major curves in
statistical inference follow mathematically. That
is, the t-distribution, the F-distribution, and
Chi-squared are all curves that specify sampling
distributions of different statistics, and all
are based on the normal distribution.
The whole makes a remarkable, close-knit family
that is wonderful in mathematical elegance and
momentous in practical utility. I couldn't even
make that stuff up.
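
One family tie can be checked by simulation. A chi-squared variable with k degrees of freedom is the sum of k squared independent standard normal variables, so its mean is k and its variance is 2k. The sketch below is illustrative only; the helper name, the seed, and the choice of k = 3 are assumptions, not something from the slides.

```python
import random
import statistics

def chi_squared_draws(df, reps, seed=2):
    """Build chi-squared draws from the normal curve: each draw is the
    sum of `df` squared standard normal values."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
            for _ in range(reps)]

draws = chi_squared_draws(df=3, reps=20_000)

# Theory says the mean should be near df and the variance near 2 * df.
print("simulated mean:    ", round(statistics.mean(draws), 2))
print("simulated variance:", round(statistics.variance(draws), 2))
```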

10
Dancing to the Central Limit Theorem

As a sample is to be drawn, and we are interested
in a mean, x-bar, there is conceptually a
sampling distribution of x-bar.
The Central Limit Theorem demonstrates that this
distribution is approximately normal. And, the
mean of x-bar is the mean of the population. A
wonderfully convenient result.
It means that, on average, a sample mean is the
same as the population mean.
The CLT also tells us that the standard deviation
of the sampling distribution is the population
standard deviation divided by the square root of
the sample size. We call this the standard error.
The CLT tells us, in sum: 1) the sampling
distribution is approximately normal, 2) it has a
certain mean, and 3) it has a certain std dev.
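
The three claims can be checked roughly by simulation. In this sketch the population is assumed to be exponential with mean 1 and standard deviation 1, chosen only because it is clearly not bell-shaped; the sample size, repetition count, and seed are likewise arbitrary choices for illustration.

```python
import math
import random
import statistics

def simulate_means(n, reps, seed=1):
    """Draw `reps` random samples of size n from an exponential
    population (mean 1, s.d. 1) and return each sample's mean."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]

POP_MEAN, POP_SD = 1.0, 1.0       # known facts about the assumed population
n = 50
means = simulate_means(n, reps=5000)

standard_error = POP_SD / math.sqrt(n)   # population s.d. / sqrt(sample size)

print("population mean:      ", POP_MEAN)
print("mean of sample means: ", round(statistics.mean(means), 3))
print("sigma / sqrt(n):      ", round(standard_error, 3))
print("s.d. of sample means: ", round(statistics.stdev(means), 3))
```

The mean of the simulated sample means should land close to the population mean, and their standard deviation close to the standard error, even though the population itself is skewed.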

11
Dancing to the Central Limit Theorem, Part II

That mean thing is neat, but what about the
normal and std dev?
They help determine how far off from the
population mean our sample mean is likely to be,
once we actually draw it.
A sampling distribution of a mean is approximately
normal as long as the sample is large (> 100),
because a large sample, randomly drawn from a
population, tends to mirror the population.
That means that of all the possibilities for
x-bar, the most likely are those near the true
mean.
As for the standard deviation, that of the
sampling distribution is much narrower than that
of the population (1/sqrt(n) times the population
s.d.). Why? Although extreme values are possible,
most of the hypothetical sample means will be
near the true mean. That is, central values are
far more probable than extreme values, so the
average distance off center is small.
So, once our sample is drawn, if our mean is off
at all it is likely not off by much, and the
probability of being off by a lot is low.
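
"Not off by much" can be put into numbers once the standard error is known. The sketch below assumes the sampling distribution of the mean is exactly normal and that the population standard deviation is known; the values 15 and 100 are hypothetical, and `prob_within` is a helper name made up for this example.

```python
import math
from statistics import NormalDist

def prob_within(distance, pop_sd, n):
    """Probability that a sample mean lands within `distance` of the
    population mean, assuming a normal sampling distribution."""
    standard_error = pop_sd / math.sqrt(n)
    z = distance / standard_error         # distance in standard-error units
    std_normal = NormalDist()             # mean 0, s.d. 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

# Hypothetical numbers: population s.d. of 15, sample of 100,
# so the standard error is 15 / 10 = 1.5.
close = prob_within(3, pop_sd=15, n=100)     # within 2 standard errors
far = 1 - prob_within(6, pop_sd=15, n=100)   # off by more than 4 standard errors

print("P(off by at most 3):", round(close, 4))
print("P(off by more than 6):", far)
```

Under these assumptions the sample mean falls within 3 points of the true mean about 95% of the time, while missing by more than 6 points is extremely rare.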