II. The World: Probability Theory


Slides: 33

Provided by: SocialSc2


Chapter 9

I. Data: Descriptive Statistics (Ch. 2-5)
II. The World: Probability Theory (Ch. 6-8)
III. Drawing Conclusions: Statistical Inference (Ch. 9-11)

In real life, we often do not know various
details about the world, and we want to use our
data sets to help us determine these details.
An important part of doing good empirical science
is that we must do this carefully:
We must understand what kind of evidence our
(particular uses of our) data sets are providing.
We must understand how strong this evidence is.
We must understand on what grounds we are drawing
any conclusions that we do draw.
In short, we want to quantify our degree of
uncertainty about any conclusions we may choose
to draw.

EXAMPLE: What kind of weekly salaries do American
adults make?
What is the average salary?
How spread out is this distribution?
Are there issues of skewness, kurtosis, etc.?
We have a sample of current American salaries.
But how likely is it that our estimates from the
sample (e.g., x̄) will be close to the actual
population parameters (e.g., μ)?
And what do we mean by "close to," anyway?

The first thing to note about sampling (i.e.,
generating a data set) is that we always try to
get data sets with more than one observation.
Why?
Suppose I want to determine the average weekly
salary of US adults.
Option 1: Ask just one adult, and assume this salary
is representative of the whole.
But if there is this kind of variation possible
in checking just one person, why can't/won't
there be the same kind of variation (or even
more?) if I use a bunch of people?

At this point we must start distinguishing
sample statistics from random variables (and
their distributions).
We use statistics derived from our actual data
sets
e.g., x̄ = 3.167, from 1, 2, 7, 8, -3, 4
to try to determine features of the
distribution of the (theoretical) population
e.g., μ
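As a quick check of the sample statistic quoted above, here is a minimal sketch that computes the sample mean of the six observations. The variable names are illustrative only.

```python
from statistics import mean

# The six observations listed on the slide.
data = [1, 2, 7, 8, -3, 4]

# Sample mean: (1 + 2 + 7 + 8 - 3 + 4) / 6 = 19 / 6.
x_bar = mean(data)
print(round(x_bar, 3))  # 3.167
```

Note that x_bar here is a single fixed number computed from data already in hand.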

Similarly, we need to distinguish
x̄, which is a particular number, and thus has no mean
or variance
(although it is a sample mean),
from
X̄, which is a random variable (built out of other
random variables), and which does have a mean and a
variance.

Here is a useful way to think about the
difference between notions like x̄ and X̄:
x̄ is a particular number that exists only after
you have collected your data.

In contrast, X̄ is not any particular number.
Instead, X̄ is built out of all the ways your
data could turn out. Intuitively, X̄ is what you
have before you collect your data.

Typically, not all data sets are equally likely;
instead, they have a distribution. This
distribution is what we are trying to discover.

X̄ is determined by all the possible ways that
the Score column could be filled out, and the
probabilities that it will actually be filled out
in those ways.
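To make this concrete, the sketch below enumerates every way a column of n scores could be filled out and tallies the probability of each resulting sample mean. The toy population (three equally likely scores) and the function name sampling_distribution are assumptions chosen for illustration, not part of the original slides.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical toy population: three equally likely scores (an assumption).
values = [0, 1, 2]
n = 2  # sample size

def sampling_distribution(values, n):
    """Return {possible sample mean: probability} for n i.i.d. draws."""
    # Probability of any one particular filled-in column.
    p_each = Fraction(1, len(values)) ** n
    dist = defaultdict(Fraction)
    # Every way the column of n scores could be filled out.
    for column in product(values, repeat=n):
        dist[Fraction(sum(column), n)] += p_each
    return dict(dist)

dist = sampling_distribution(values, n)
for m in sorted(dist):
    print(m, dist[m])
```

The resulting table is the distribution of the random variable, as opposed to the single number obtained from one collected data set.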

Let's use our CPS data regarding weekly incomes
to think about the underlying logic of random
sampling.
To simplify things, let's pretend that the
population of interest is only the persons in the
sample.
We'll pretend we don't know what our population's
mean weekly income is,
i.e., we don't know what μ is.
Instead, we're going to try to estimate this
population mean by collecting a random sample from it.

Random sampling does not make assumptions about
people and their incomes.
The sampling procedure does not assume that,
after we have selected Jon, Lisa, Ming, Pete,
Casey, ... to be used in our sample,
each one of them
could have had a weekly salary of $0, with a
probability around 28/14380, AND
could have had a weekly salary of $1, with a
probability around 11/14380, AND
..., AND
could have had a weekly salary of $2884.61, with
a probability around 215/14380.

We decide (in advance) to collect a sample of
size n.
E.g., maybe n = 50, or 100, or 1000.
So we have slots X1, X2, X3, ..., Xn to fill in.
We fill in these slots by selecting persons at
random to be in these slots.
Here P is the size of the population, and wk is the
number of persons in it whose weekly income is $k.
So there is about a w0/P chance of filling in any
given slot with someone whose weekly income is
$0, AND
there is about a w1/P chance of filling in any
given slot with someone whose weekly income is
$1, AND
..., AND
there is about a w2884.61/P chance of filling in
any given slot with someone whose weekly income
is $2884.61.
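The slot-filling procedure can be sketched in a few lines. The income counts below are made up for illustration (they are not the CPS figures), and draw_sample is a hypothetical helper. The sketch assumes sampling with replacement, so each slot is filled independently and the chance of income k is exactly wk/P for every slot.

```python
import random

# Hypothetical counts: income -> number of people with that income (wk).
# These numbers are invented for illustration; they are not the CPS data.
counts = {0: 3, 500: 5, 1000: 2}
P = sum(counts.values())  # population size

def draw_sample(counts, n, rng):
    """Fill n slots; each slot independently gets income k with probability wk / P."""
    incomes = list(counts)
    weights = [counts[k] for k in incomes]
    return rng.choices(incomes, weights=weights, k=n)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = draw_sample(counts, 50, rng)
print(sample[:10])
print(sum(sample) / len(sample))  # one realized sample mean
```

Rerunning with a different seed fills the slots differently, which is exactly why the sample mean is treated as a random variable before the data are collected.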

This underlying logic has three parts.
We get values for X1, X2, X3, ..., Xn by
independently drawing from
one population, which is
the correct population of interest.
It's important to observe all three of these
parts.
Violating any one of these requirements can
easily lead to numerical data which can be the
basis of impressive-looking empirical results,
but which have little or no relevance to the
empirical issue you are studying!

The i.i.d. requirement

Random sampling involves a set X1, ..., Xn of
i.i.d. random variables.
Remember: i.i.d. = Independent and Identically
Distributed.
One observation doesn't affect the distribution
of any other observation.
E.g., we don't wait to collect a low score just
because the last 10 scores we collected were
high.
E.g., we don't insist that each subsequent score
be higher than all the previous ones.
E.g., we don't start out with a preconceived idea
about what value x̄ should be, and then fill in
our slots X1, ..., Xn with values that make our
prediction come out right.

Each Xi comes from just one population out in
the world.
E.g., to learn about American incomes, we don't
sample 30 Americans and 20 Iraqis.
E.g., we also don't ask 30 Americans about both
their weekly incomes and their monthly
rent/mortgage, and treat these as 60 data points.

That one population is the right one for our
purposes.
E.g., to learn about American incomes, we don't
just sample South Dakotans.
This would be biased sampling.
We would be focusing on a subpopulation, which
would be problematic for our purposes.
Our sample would probably be biased low.
E.g., we also don't ask Iraqis about their
incomes.
That is the wrong population for our study.
Similarly, we don't ask women only, or
Hispanics only, or Senators only.
These too are interesting populations, but they
are not right for the question we are trying to
answer.

Let's begin by taking a random sample of weekly
wages.
Afterwards, we will consider the underlying
statistical properties of the variable X̄ and
explore them in relation to the underlying
variable of interest, X.
X̄ is the variable that will yield our sample
average of American weekly wages.
X is the variable that yields these wages.
We are interested in the distribution of X,
because that is what determines a large part of
our economy!
We typically care about X̄ only insofar as it
helps us understand X.

To get X̄ we need a random sample X1, ..., X100,
where Xi ~ X (each Xi has the same distribution as X).
Ultimately, we are interested in the distribution
of X.
Let's begin by looking at the relationship
between the mean of X and the mean of X̄.

In short, the mean of X̄ is the same as the
mean of X.
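One way to see why this holds is the short derivation below. The notation μ for the mean of X is an assumption here, since the original slide's formula did not survive. The only facts used are linearity of expectation and that each Xi has the same distribution as X. Independence is not needed for this step.

```latex
\begin{align*}
E[\bar{X}] &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
            = \frac{1}{n}\sum_{i=1}^{n} E[X_i] \\
           &= \frac{1}{n}\cdot n\mu = \mu .
\end{align*}
```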

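Now let's explore the variance of X̄.
Since the variables X1, ..., Xn are i.i.d., we can
make use of what we saw in Chapter 7, namely that
the variance of a sum of independent random
variables is the sum of their variances.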


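In short, the variance of X̄ is 1/n-th the
variance of X.
So, the standard deviation of X̄ is 1/√n the size
of the standard deviation of X.
To sum up (writing μ and σ for the mean and standard deviation of X):
E(X̄) = μ, Var(X̄) = σ²/n, SD(X̄) = σ/√n.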
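The variance result can be written out step by step as follows. This is a reconstruction under the usual assumptions (the Xi are i.i.d. with variance σ², notation assumed here), not a copy of the original slide, which was not transcribed. The annotations mark where each half of the i.i.d. requirement is used.

```latex
\begin{align*}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) \\
  &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
     && \text{(independence)} \\
  &= \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}
     && \text{(identical distribution)}, \\
\operatorname{SD}(\bar{X}) &= \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}} .
\end{align*}
```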

In other words, the standard deviation of the
average of our sample (our data set) is only 1/√n
the size of the standard deviation of a sample
of just one.
So if you want your estimate to have 10 times
less variability (measured by σ) in it than is
present in American adults in general, you should
take the average of 100 adults, instead of just
measuring one adult.
Thus, the reliability of our estimate increases
as n increases: its standard deviation shrinks in
proportion to 1/√n.
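The factor-of-10 claim can be checked by simulation. The sketch below uses a made-up population of incomes (uniform between 0 and 2000, purely an assumption and not the CPS data) and a hypothetical helper sd_of_sample_mean. It draws many samples of size 100 with replacement and compares the spread of their means to the spread of single observations.

```python
import random
from statistics import pstdev

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of weekly incomes (invented, not the CPS data).
population = [rng.uniform(0, 2000) for _ in range(10_000)]
sigma = pstdev(population)  # spread of a "sample of just one"

def sd_of_sample_mean(population, n, reps, rng):
    """Estimate the standard deviation of the sample mean from reps samples of size n."""
    means = []
    for _ in range(reps):
        sample = rng.choices(population, k=n)  # i.i.d. draws (with replacement)
        means.append(sum(sample) / n)
    return pstdev(means)

sd_100 = sd_of_sample_mean(population, 100, 2000, rng)
ratio = sigma / sd_100
print(ratio)  # should come out close to sqrt(100) = 10
```

Because the ratio is estimated from a finite number of repetitions, it will not be exactly 10, only close to it.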

Notice that in our calculation of Var(X̄)
it was crucial that the Xi's were all i.i.d.
(independent and identically distributed).
If the Xi's were not independent, then we couldn't
have made the step
Var(X1 + ... + Xn) = Var(X1) + ... + Var(Xn).
If the Xi's were not identically distributed, then
we couldn't have made the step
Var(X1) + ... + Var(Xn) = n Var(X).

Notice also that by the Central Limit Theorem,
as n gets large, the distribution of X̄ will
approximate a Gaussian distribution.
Knowing that X̄ has an (approximately) normal
distribution gives us much useful information
about how close to the mean of X̄ (and
hence to the mean of X) we are likely to get with
a given random sample.
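As a rough illustration of the kind of information the normal approximation provides, the sketch below estimates the probability that the sample mean lands within a tolerance d of the population mean. The function prob_within and the numbers plugged into it (σ = 500, n = 100, d = 100) are hypothetical, and the calculation assumes n is large enough for the approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(sigma, n, d):
    """Approximate P(|X-bar - mu| <= d), treating X-bar as Normal(mu, sigma / sqrt(n))."""
    se = sigma / sqrt(n)   # standard deviation of the sample mean
    z = NormalDist()       # standard normal distribution
    return z.cdf(d / se) - z.cdf(-d / se)

# Hypothetical inputs: sigma = 500, n = 100, tolerance d = 100.
# Then se = 50, so d is two standard deviations of the sample mean.
p = prob_within(500, 100, 100)
print(round(p, 3))  # 0.954
```

With these inputs the tolerance equals two standard deviations of the sample mean, which gives the familiar figure of roughly 95%.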