Title: Inferential Statistics
1Inferential Statistics
2Sample and Population
[Diagram: a POPULATION (real or hypothetical), described by a PARAMETER, and a SAMPLE, described by STATISTICS, drawn by random/probability or non-random methods]
3Why? and How?
- Why?
- To INFER from the sample data (STATISTIC), what
we have, to the population (PARAMETER), what we want
- To account for the problem of SAMPLING ERROR
- To make decisions in the face of uncertainty: is
something going on?
4How?
- Hypothesis Testing
- By determining the probability that the result
obtained (based on the Descriptive Statistics)
can be accounted for by ERROR (caused by the
sampling method)
- via testing Statistical Hypotheses
- using theoretical models (Sampling distributions) that
represent what would be likely to happen simply
due to ERROR.
5How?
- Estimation
- By determining the precision of our ESTIMATE (we
don't formally test a Hypothesis)
- via establishing CONFIDENCE INTERVALS around
SAMPLE STATISTICS (where the range of the
interval is primarily a function of the magnitude
of the sampling error)
- What is the population value likely to be?
6The concept of theoretical SAMPLING DISTRIBUTION
- A sampling distribution is a theoretical
distribution of all possible sample values of a
given size (n), under the assumption that the
null hypothesis is true (i.e., under the
assumption that the variability in the sample
values is due to ERROR).
- The variability in the sample values is
represented by the SE (Standard Error)
[Figure: SAMPLING DISTRIBUTION, the distribution of all possible SAMPLE values of a given size, with the Mean marked]
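A sampling distribution can be approximated by simulation. The sketch below is only an illustration: the population (10,000 made-up scores with mean near 100 and SD near 15), the sample size n = 25, and the helper name `sampling_distribution_of_means` are all assumptions, not part of the slides. It draws many random samples, keeps each sample mean, and compares the spread of those means with sigma / sqrt(n).

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 made-up scores (mean near 100, SD near 15).
population = [random.gauss(100, 15) for _ in range(10_000)]


def sampling_distribution_of_means(pop, n, draws=2_000):
    """Approximate the sampling distribution of the mean by repeated random sampling."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(draws)]


n = 25
means = sampling_distribution_of_means(population, n)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
empirical_se = statistics.pstdev(means)   # spread of the simulated sample means
theoretical_se = pop_sd / n ** 0.5        # sigma / sqrt(n)

print(f"population mean      : {pop_mean:.2f}")
print(f"mean of sample means : {statistics.fmean(means):.2f}")
print(f"empirical SE         : {empirical_se:.2f}")
print(f"sigma / sqrt(n)      : {theoretical_se:.2f}")
```

A true sampling distribution covers all possible samples; 2,000 draws only approximate it, so the two SE values should be close but not identical.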
7How we use a SAMPLING DISTRIBUTION and STANDARD
ERROR (SE)
- We test the model represented by the sampling
distribution by finding the relative position of
our (one and only) real SAMPLE STATISTIC in
this theoretical sampling distribution
8The Logic
- Common outcome
- If the model is true, then the relative position
of our sample value in the model should not be
far out (since the further out, the less
probable)
- vs
- Rare outcomes
- If the relative position of our sample value is
far out (assuming it is, in fact, a random
representative sample from the population),
then the model (which, recall, represents ERROR
only) is probably not true.
[Figure: sampling distribution showing variability due to error, with the relative position of our one, real Sample Statistic marked]
9Variability due to error
[Figure: relative position of our one, real Sample Statistic in the sampling distribution]
- If this model were true, then our sample statistic
is an improbable event. Since it is real, i.e.,
since our sample value is an empirical fact, the
model is probably not true. So we reject the
model.
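The "relative position" of a sample statistic can be expressed as a z value, the number of standard errors it lies from the center of the model. The sketch below assumes a normal sampling distribution and uses made-up numbers (model centered at 100, SE of 2, observed sample mean of 105); the function names are hypothetical.

```python
from statistics import NormalDist


def relative_position(sample_stat, model_mean, standard_error):
    """z: how many standard errors the sample statistic lies from the model's center."""
    return (sample_stat - model_mean) / standard_error


def two_tailed_probability(z):
    """Probability, under a normal error-only model, of a value at least this far out."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up numbers for illustration only.
z = relative_position(105, 100, 2)
p = two_tailed_probability(z)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The further out the sample statistic sits, the larger |z| and the smaller the probability that the error-only model would produce it.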
10Sampling distribution
- Inferential statistics estimates the population
parameters from the sample values
- Characteristics
- A batch of means, called the sampling distribution
of means
- The mean of the sampling distribution equals the
mean of the population
- The standard error of the mean is the standard
deviation of the sampling distribution (not of
the population): $\sigma_{\bar{X}} = \sigma/\sqrt{N}$
- The batch of sample means would be normally
distributed around the mean of the distribution,
$\mu$, with a standard deviation of $\sigma/\sqrt{N}$
11Central Limit Theorem
- If a population has a mean $\mu$ and a standard
deviation $\sigma$, then the distribution of sample
means drawn from this population approaches a
normal distribution as N increases, with a
mean $\mu$ and standard deviation $\sigma/\sqrt{N}$
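The theorem can be checked informally by simulation. The sketch below assumes a flat (uniform on 0 to 1) population, chosen only because it is clearly not normal and its mean (0.5) and standard deviation (sqrt(1/12)) are known. For each N it compares the spread of simulated sample means with sigma / sqrt(N) and counts how many means fall within 1.96 standard errors of the population mean; for a normal distribution that share is about 95%.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Assumed population: uniform on [0, 1], which is flat rather than normal.
MU = 0.5
SIGMA = (1 / 12) ** 0.5  # standard deviation of a uniform(0, 1) variable


def sample_means(n, draws=4_000):
    """Means of `draws` random samples, each of size n."""
    return [statistics.fmean([random.random() for _ in range(n)]) for _ in range(draws)]


results = {}
for n in (2, 10, 50):
    means = sample_means(n)
    se = SIGMA / n ** 0.5  # sigma / sqrt(N)
    within = sum(abs(m - MU) <= 1.96 * se for m in means) / len(means)
    results[n] = (statistics.pstdev(means), se, within)
    print(f"N={n:>2}  SD of means={results[n][0]:.4f}  "
          f"sigma/sqrt(N)={se:.4f}  share within 1.96 SE={within:.3f}")
```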
12Significance Level
- When do we draw the line?
- At what point do we say that our result is a rare
occurrence and that it is highly unlikely to be
sampling error?
- The significance level is the cutoff on the
probability of obtaining a result like ours
through sampling error alone; if this
probability is small enough, we reject the notion
that sampling error is the cause.
- .05 significance level: if the probability that
our result happened by chance is .05 or less, we
say that our results are significant at the .05
level.
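A significance level can be translated into a cutoff on z. The sketch below assumes a normal sampling distribution and a two-tailed test; `critical_z` and `is_significant` are hypothetical helper names. It recovers the familiar cutoffs of about 1.96 for the .05 level and about 2.58 for the .01 level.

```python
from statistics import NormalDist


def critical_z(alpha):
    """Two-tailed cutoff: |z| beyond this has total probability alpha under the normal model."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def is_significant(p_value, alpha=0.05):
    """True when the probability of the result under the error-only model is alpha or less."""
    return p_value <= alpha


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject the model when |z| > {critical_z(alpha):.2f}")
```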
13Estimation
- The population mean is a fixed value, and it is
the sample means that deviate about this fixed
value.
- Instead of talking of possible values that $\mu$ may
take, given our sample mean $\bar{X}$, we set up a
confidence interval in which the true mean probably lies.
- 95% confidence interval: $\bar{X} \pm 1.96\, s_{\bar{X}}$
- $\bar{X}$ = the sample mean and $s_{\bar{X}}$ = the standard error of the
mean
- Example: sample mean 69.2 and $s_{\bar{X}}$ of 0.3
- $\bar{X} \pm 1.96\, s_{\bar{X}} = 69.2 \pm 1.96(0.3)$
- $= 69.2 \pm 0.59$
- 68.61 to 69.79
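The interval arithmetic on this slide can be reproduced with a few lines of code. This is a minimal sketch; the function name `confidence_interval` is hypothetical, and the default multiplier 1.96 is the 95% value used on the slide.

```python
def confidence_interval(sample_mean, standard_error, z=1.96):
    """Return (low, high) for sample_mean plus or minus z standard errors."""
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Values from the slide: sample mean 69.2, standard error of the mean 0.3.
low, high = confidence_interval(69.2, 0.3)
print(f"{low:.2f} to {high:.2f}")
```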
14Confidence interval - Examples
- Estimate of population mean
- N = 200, $\bar{X}$ = 102, S = 12
- 95% confidence interval: $\bar{X} \pm 1.96\, s_{\bar{X}}$
- First calculate the standard error of the mean: $s_{\bar{X}} = 0.85$
- $\bar{X} \pm 1.96\, s_{\bar{X}} = 102 \pm 1.96(0.85)$
- $= 102 \pm 1.67$
- 100.33 to 103.67
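The slide gives the standard error (0.85) without showing how it is computed. The sketch below is an assumption about that step: textbooks divide S by either sqrt(N) or sqrt(N - 1), and for N = 200 both conventions round to 0.85. The function name and the `use_n_minus_1` flag are hypothetical.

```python
import math


def standard_error(s, n, use_n_minus_1=False):
    """Standard error of the mean: s / sqrt(n), or s / sqrt(n - 1) under the other convention."""
    denom = n - 1 if use_n_minus_1 else n
    return s / math.sqrt(denom)


# Values from the slide: N = 200, S = 12, sample mean 102.
se_n = standard_error(12, 200)
se_n_minus_1 = standard_error(12, 200, use_n_minus_1=True)
print(f"S/sqrt(N)   = {se_n:.4f}")
print(f"S/sqrt(N-1) = {se_n_minus_1:.4f}")

se = round(se_n, 2)   # the slide works with the rounded value
margin = 1.96 * se    # 95% multiplier
print(f"{102 - margin:.2f} to {102 + margin:.2f}")
```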
15- Claim that college women are taller than 10 years
ago.
- Claimed: today's average height is 66
- We take a random sample of 50 college women
- We find a Mean of 65, S = 2.5
- Calculate SE: $s_{\bar{X}} = 0.36$
- 99% confidence interval: $\bar{X} \pm 2.58\, s_{\bar{X}} = 65 \pm 2.58(0.36) = 65 \pm 0.93$,
i.e. 64.07 to 65.93
- We have to reject the claim that the average is 66,
since the population mean probably lies in the
interval 64.07 to 65.93 and 66 lies outside that interval.
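The check of the claim can be sketched as follows. The standard error 0.36 is taken directly from the slide; it is consistent with 2.5 / sqrt(49), about 0.357, whereas 2.5 / sqrt(50) is about 0.354 and would give a slightly narrower interval with the same conclusion. The function names are hypothetical.

```python
def interval(mean, se, z):
    """Confidence interval: mean plus or minus z standard errors."""
    return mean - z * se, mean + z * se


def claim_is_plausible(claimed_mean, mean, se, z=2.58):
    """True when the claimed population mean falls inside the interval."""
    low, high = interval(mean, se, z)
    return low <= claimed_mean <= high


SE = 0.36  # value used on the slide (an input here, not recomputed)
low, high = interval(65, SE, 2.58)  # 2.58 is the 99% multiplier
print(f"{low:.2f} to {high:.2f}")
print("claim of 66 plausible:", claim_is_plausible(66, 65, SE))
```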