Title: Probability
1Probability
- Probability
- Sampling Distribution of the Mean
- Standard Error of the Mean
- Representativeness of the Sample Mean
2Probability Frequency View
- Probability is long run relative frequency
- Same as relative frequency in the population
- Dice toss: p(1) = p(2) = ... = p(6) = 1/6
- Coin flip: p(Head) = p(Tail) = .5
3Probability Decision Making
- Decision making is like gambling: go with what is likely.
- Lady tasting tea in England. Milk first or second?
- 5 cups of tea to taste. What is the probability she gets it right?
4If you cannot tell the difference, how likely
will you be right on all cups?
Cup   Probability correct (this cup and all before it)
1     ½ = .5
2     ½ × ½ = .25
3     ½ × ½ × ½ = .125
4     ½ × ½ × ½ × ½ = .0625
5     ½ × ½ × ½ × ½ × ½ = .03125
How many cups would it take to convince you?
The convention in social science is a probability of
.05. By this standard, she would have to get all 5
right to be convincing in her ability, because .03125
is the first value below .05. She did, and they were
convinced.
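The arithmetic behind the cup table can be sketched in a few lines of Python. This is only an illustration: the function names `prob_all_correct` and `cups_needed` are made up for this example, and the sketch assumes each cup is an independent 50/50 guess.

```python
def prob_all_correct(cups, p_guess=0.5):
    """Probability of guessing every one of `cups` cups correctly by chance."""
    return p_guess ** cups


def cups_needed(alpha=0.05, p_guess=0.5):
    """Smallest number of cups whose all-correct probability falls below alpha."""
    cups = 1
    while prob_all_correct(cups, p_guess) >= alpha:
        cups += 1
    return cups


# Reproduce the cup-by-cup probabilities.
for n in range(1, 6):
    print(n, prob_all_correct(n))

# How many cups are needed under the .05 convention?
print("cups needed at .05:", cups_needed())
```

Changing `alpha` shows how a stricter standard would require more cups.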
5Frequency Distribution of the Mean
- What is the distribution of means if we roll a die once?
- What is the distribution of means if we roll the die twice and take the average?
- Three times?
- (See Excel File dice)
6Dice
Sampling Distributions of Means
Distribution        M     SD
1 Die (raw data)    3.5   1.87
Ave of 2 Dice       3.5   1.23
Ave of 3 Dice       3.5   .99
Notice the mean, standard deviation, and shape of
the distributions.
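The dice comparison can be checked by listing every equally likely outcome instead of using a spreadsheet. The sketch below is not the Excel file the slides refer to, and `averages_of_dice` is a name invented here. It uses the population formula for the SD, so its values should land close to the ones listed for this slide, with small differences possible from dividing by N versus N - 1.

```python
from itertools import product
from statistics import mean, pstdev


def averages_of_dice(n_dice):
    """Average of every equally likely outcome when rolling n_dice dice."""
    return [sum(roll) / n_dice for roll in product(range(1, 7), repeat=n_dice)]


for n_dice in (1, 2, 3):
    avgs = averages_of_dice(n_dice)
    # The mean stays put; the spread shrinks as more dice are averaged.
    print(n_dice, "dice: M =", round(mean(avgs), 2), "SD =", round(pstdev(avgs), 2))
```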
7Sampling Distribution
- Notion of trials, experiments, replications
- Coin toss example (5 flips, heads)
- Repeated estimation of the mean
- Sampling distribution is a distribution of a
statistic (not raw data) over all possible
samples. Same as distribution over infinite
number of trials. Recall dice example.
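For the coin toss example, "all possible samples" is small enough to list. The sketch below enumerates every sequence of 5 fair flips and tabulates the statistic (number of heads). `heads_distribution` is an illustrative name, and a fair coin is assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_flips=5):
    """Exact sampling distribution of the number of heads in n_flips fair flips."""
    outcomes = list(product("HT", repeat=n_flips))
    counts = Counter(seq.count("H") for seq in outcomes)
    return {k: Fraction(counts[k], len(outcomes)) for k in sorted(counts)}


for heads, prob in heads_distribution().items():
    print(heads, "heads:", prob)
```

Each row is the long-run relative frequency of that number of heads over repeated trials of 5 flips.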
8Estimator
- We use statistics to estimate parameters
- Most often, the sample mean (M) is used to estimate the population mean.
- Suppose we want to estimate the mean height of students at USF. Sample students, compute M.
- Accuracy of the estimate depends mostly upon N and SD.
9Example of Height
Hypothetical data.
Note that graph shows the population.
10Raw Data vs. Sampling Distribution
Note middle and spread of the two distributions.
How do they compare?
11Definition of Bias
- Statisticians have worked out properties of sampling distributions.
- Middle and spread of the sampling distribution are known.
- If the mean of the sampling distribution equals the parameter, the statistic is unbiased (otherwise, it is biased). The sample mean is unbiased.
- Best estimate of the population mean (μ) is the sample mean (X-bar).
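A small simulation can make the unbiasedness claim concrete. Everything numeric here is an assumption for illustration: the population mean of 68 and SD of 4 are hypothetical values (not taken from the slides), heights are assumed to be normally distributed, and 5000 replications stands in for "all possible samples".

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 68.0, 4.0   # hypothetical population values
N, REPLICATIONS = 50, 5000     # sample size and number of repeated samples


def sample_mean(n):
    """Mean of one random sample of size n from the hypothetical population."""
    return sum(random.gauss(POP_MEAN, POP_SD) for _ in range(n)) / n


sample_means = [sample_mean(N) for _ in range(REPLICATIONS)]
mean_of_means = sum(sample_means) / REPLICATIONS

# For an unbiased statistic, the middle of its sampling distribution
# sits on the parameter it estimates.
print("population mean:", POP_MEAN)
print("mean of sample means:", round(mean_of_means, 2))
```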
12Definition of Standard Error
- The standard deviation of the sampling
distribution is the standard error. For the
mean, it indicates the average distance of the
statistic from the parameter.
Standard error of the mean.
13Formula Standard Error of Mean
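- To compute the SEM, use SEM = SD / √N (the population standard deviation divided by the square root of the sample size).
- For our example: standard error = SD of means = .57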
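The standard error of the mean is the population SD divided by the square root of N, which is a one-line function. The name `standard_error_of_mean` and the input values below are illustrative only. They are not the numbers behind the .57 in the height example.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the mean: population SD divided by sqrt(sample size)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


# Quadrupling the sample size cuts the standard error in half.
print(standard_error_of_mean(10, 25))   # 10 / 5
print(standard_error_of_mean(10, 100))  # 10 / 10
```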
14Review
- What is a sampling distribution?
- What is bias?
- What is the standard error of a statistic?
- Suppose we repeatedly sampled 100 people at a time instead of 50 for height at USF.
- What would be the mean of the sampling distribution?
- What would be the standard deviation of the sampling distribution?
15Definition
- A sampling distribution is a distribution of _____?
- 1 parameters
- 2 samples
- 3 statistics
- 4 variables
16Definition
- What is the standard error of the mean?
- 1 average distance of standard from the error
- 2 average distance of raw data (X) from the data average (X-bar)
- 3 square root of the sampling distribution of the variance
- 4 standard deviation of the sampling distribution of the mean
17Computation
- If the population mean is 50, the population standard deviation is 2, and the sample size is 100, what is the standard error of the mean?
- 1 .2
- 2 .5
- 3 2
- 4 10
18Deciding whether a Sample represents a Population
Representativeness: the degree to which the sample
distribution resembles the population
distribution.
We can use the normal distribution to figure the
probability of a sample mean. If the sample mean
is very unlikely (has a low probability) we
conclude the sample does not represent the
population. If it is likely, we conclude it does.
Suppose we grab a sample of 49 students and their
mean GPA is 3.7. We know the population mean is
3.1 and the population SD is .35. Is the sample
representative?
19Likely?
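Area beyond z = 10? About p = 7.6 × 10^-24, far smaller
than anything a z table lists. (With the numbers given,
z = (3.7 - 3.1) / (.35 / √49) = .6 / .05 = 12, which is
rarer still.)
Recall that anything beyond z = 2 is rare;
anything beyond z = 3 is remote.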
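The GPA example can be worked through directly: convert the sample mean to a z score using the standard error, then find the area beyond it under the normal curve. The function names below are illustrative. The tail area uses the complementary error function from Python's standard library in place of a printed z table, and only the upper tail is computed.

```python
import math


def z_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z score of a sample mean, using the standard error of the mean."""
    sem = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / sem


def upper_tail_area(z):
    """Area under the standard normal curve beyond z (one tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Numbers from the GPA example: N = 49, sample mean 3.7,
# population mean 3.1, population SD .35.
z = z_for_sample_mean(3.7, 3.1, 0.35, 49)
print("z =", round(z, 2))
print("area beyond z:", upper_tail_area(z))
print("area beyond z = 10:", upper_tail_area(10))
```

Either way the probability is vanishingly small, so the sample would not be judged representative.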
20Rejection Region
The place in the curve that is unlikely if the
scenario is true. Area totals to the probability.
The convention is p = .05: the 5 percent of the area
least likely to occur if the scenario is true is
the rejection region. In most cases, the
extremes of both tails are the places for the
rejection region. The sample is unrepresentative
if it falls far from the center. For z, the
border is +/- 1.96 for p = .05 with 2 tails. For
1 tail, it is 1.65.
Bottom 2.5 pct
Top 2.5 pct
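The borders of the rejection region can be recovered from the normal curve instead of memorized. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). `critical_z` and `in_rejection_region` are names made up for this example, and the one-tailed case is assumed to use the upper tail.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, two_tailed=True):
    """Border of the rejection region for a standard normal curve."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)


def in_rejection_region(z, alpha=0.05, two_tailed=True):
    """True if z falls in the rejection region (upper tail when one-tailed)."""
    border = critical_z(alpha, two_tailed)
    return abs(z) > border if two_tailed else z > border


print("two-tailed border:", round(critical_z(), 2))
print("one-tailed border:", round(critical_z(two_tailed=False), 3))
print("z = 2.5 rejected?", in_rejection_region(2.5))
print("z = 1.5 rejected?", in_rejection_region(1.5))
```

The one-tailed border comes out near 1.645, which the slide rounds to 1.65.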
21Review
We know the population mean is 50 and the
population standard deviation is 10. We grab 100
people at random and find the mean of the sample
is 45. Does the sample represent the population?