Title: A Review of Basic Concepts
1A Review of Basic Concepts
2Definition 1.1
- Statistics is the science of data. This involves
collecting, classifying, summarizing, organizing,
analyzing, and interpreting data.
3Definition 1.2
- An experimental unit is an object (person or
thing) upon which we collect data.
4Definition 1.3
- A variable is a characteristic (property) of the
experimental unit with outcomes (data) that vary
from one observation to the next.
5Definition 1.4
- Quantitative data are observations measured on a
naturally occurring numerical scale.
6Definition 1.5
- Nonnumerical data that can only be classified
into one of a group of categories are said to be
qualitative data.
7Definition 1.6
- A population data set is a collection (or set) of
data measured on all experimental units of
interest to you.
8Definition 1.7
- A sample is a subset of data selected from a
population.
9Definition 1.8
- A statistical inference is an estimate,
prediction, or some other generalization about a
population based on information contained in a
sample.
10Definition 1.9
- A measure of reliability is a statement (usually
quantified with a probability value) about the
degree of uncertainty associated with a
statistical inference.
11Definition 1.10
- A representative sample exhibits characteristics
typical of those possessed by the population.
12Definition 1.11
- A random sample of n experimental units is one
selected from the population in such a way that
every different sample of size n has an equal
probability (chance) of selection.
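A minimal sketch of drawing a random sample in the sense of Definition 1.11, using Python's standard `random.sample`. The population of 100 unit labels, the sample size, and the seed are assumptions made only for illustration.

```python
import random

# Hypothetical population: labels for 100 experimental units.
population = list(range(1, 101))

random.seed(0)  # fixed seed only so the sketch is repeatable
n = 10

# random.sample draws without replacement, so every different
# subset of size n has the same chance of being selected.
sample = random.sample(population, k=n)
print(sorted(sample))
```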
13Describing Quantitative Data Numerically
14Definition 1.15
- The mean of a sample of n measurements $y_1, y_2, \ldots, y_n$ is
$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$
15Notation
- Sample mean: $\bar{y}$
- Population mean: $\mu$
16Definition 1.16
- The range of a sample of n measurements is
the difference between the largest and smallest
measurements in the sample.
17Definition 1.17
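- The variance of a sample of n measurements $y_1, y_2, \ldots, y_n$
is defined to be
$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$
where $\bar{y}$ is the sample mean.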
18Definition 1.18
- The standard deviation of a set of measurements
is equal to the square root of their variance.
Thus, the standard deviations of a sample and a
population are
- Sample standard deviation: $s$
- Population standard deviation: $\sigma$
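The numerical measures in Definitions 1.15 to 1.18 can be sketched in a few lines of Python. The function name `describe` and the five measurements are hypothetical; the variance uses the usual sample divisor (n - 1).

```python
import math

def describe(sample):
    """Return the mean, range, variance, and standard deviation of a sample."""
    n = len(sample)
    mean = sum(sample) / n
    value_range = max(sample) - min(sample)
    # Sample variance: sum of squared deviations divided by (n - 1).
    variance = sum((y - mean) ** 2 for y in sample) / (n - 1)
    std_dev = math.sqrt(variance)
    return mean, value_range, variance, std_dev

# Hypothetical measurements, used only for illustration.
data = [5, 7, 1, 2, 4]
mean, value_range, variance, std_dev = describe(data)
print(mean, value_range, round(variance, 3), round(std_dev, 3))
```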
19Guidelines for Interpreting a Standard Deviation
- For any data set (population or sample), at least
three-fourths of the measurements will lie within
2 standard deviations of their mean.
- For most data sets of moderate size (say, 25 or more
measurements) with a mound-shaped distribution,
approximately 95% of the measurements will lie
within 2 standard deviations of their mean.
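Both guidelines can be checked empirically. The sketch below simulates one mound-shaped data set and one skewed data set (both are assumptions chosen for the demonstration, as are the seed and the helper name `fraction_within`) and counts the share of measurements within 2 standard deviations of the mean.

```python
import random
import statistics

def fraction_within(data, k=2):
    """Fraction of measurements within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    inside = [y for y in data if abs(y - mean) <= k * sd]
    return len(inside) / len(data)

random.seed(1)  # fixed seed only so the sketch is repeatable

# Simulated data sets (assumptions for the demo, not real measurements).
mound_shaped = [random.gauss(50, 10) for _ in range(1000)]
skewed = [random.expovariate(1.0) for _ in range(1000)]

frac_mound = fraction_within(mound_shaped)   # should be near 0.95
frac_skewed = fraction_within(skewed)        # guaranteed to be at least 0.75
print(frac_mound, frac_skewed)
```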
20Definition 1.19
- Numerical descriptive measures of a population
are called parameters.
21Definition 1.20
- A sample statistic is a quantity calculated from
the observations in a sample.
22Standard normal distribution
23Definition 1.21
- The sampling distribution of a sample statistic
calculated from a sample of n measurements is the
probability distribution of the statistic.
24Theorem 1.1
- If $y_1, y_2, \ldots, y_n$ represent a random sample of n
measurements from a large (or infinite) population
with mean $\mu$ and standard deviation $\sigma$, then,
regardless of the form of the population relative
frequency distribution, the mean and standard
error of estimate of the sampling distribution of
$\bar{y}$ will be
- Mean: $\mu_{\bar{y}} = \mu$
- Standard error of estimate: $\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$
25The Central Limit Theorem
- For large sample sizes, the mean $\bar{y}$ of a sample
from a population with mean $\mu$ and standard
deviation $\sigma$ has a sampling distribution that is
approximately normal, regardless of the
probability distribution of the sampled
population. The larger the sample size, the better
will be the normal approximation to the sampling
distribution of $\bar{y}$.
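A small simulation can illustrate Theorem 1.1 and the Central Limit Theorem together. The sketch assumes a skewed (exponential) population with mean 1 and standard deviation 1; the population, the seed, the number of repetitions, and the function name `sample_means` are all choices made for the demonstration. The simulated sample means should center near the population mean, with a spread near the population standard deviation divided by the square root of n.

```python
import random
import statistics

random.seed(42)  # fixed seed only so the sketch is repeatable

def sample_means(n, repetitions=2000):
    """Means of repeated samples of size n from a skewed population."""
    # Assumed population: exponential with mean 1 and standard deviation 1.
    return [
        statistics.mean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]

for n in (5, 30, 100):
    means = sample_means(n)
    # Columns: n, mean of the sample means, their standard deviation,
    # and the theoretical standard error sigma / sqrt(n) with sigma = 1.
    print(n,
          round(statistics.mean(means), 3),
          round(statistics.stdev(means), 3),
          round(1 / n ** 0.5, 3))
```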
26Estimating a Population Mean
- If the mean of the sampling distribution of a
statistic equals the parameter we are
estimating, we say that the statistic is an
unbiased estimator of the parameter. If not, we
say that it is biased.
27Fig. 1.17 Sampling distribution of
- See Applet
- (http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html)
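28Large-Sample 100(1-$\alpha$)% Confidence Interval for $\mu$
- $\bar{y} \pm z_{\alpha/2}\,\sigma_{\bar{y}} = \bar{y} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
- where $z_{\alpha/2}$ is the z value with an area $\alpha/2$ to
its right (see Figure 1.18) and $\sigma_{\bar{y}} = \sigma/\sqrt{n}$. The parameter
$\sigma$ is the standard deviation of the sampled
population and n is the sample size. If $\sigma$ is
unknown, its value may be approximated by the
sample standard deviation s. The approximation is valid
for large samples (e.g., $n \geq 30$) only.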
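As a rough sketch, the large-sample interval (sample mean plus or minus the z value times s divided by the square root of n) can be computed with the standard library's `statistics.NormalDist`. The function name `large_sample_ci` and the summary statistics are hypothetical, and s stands in for the unknown population standard deviation.

```python
import math
from statistics import NormalDist

def large_sample_ci(y_bar, s, n, confidence=0.95):
    """Large-sample confidence interval for a population mean."""
    alpha = 1 - confidence
    # z value with an area of alpha/2 to its right under the standard normal.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / math.sqrt(n)
    return y_bar - half_width, y_bar + half_width

# Hypothetical summary statistics from a sample of n = 64 measurements.
low, high = large_sample_ci(y_bar=20.0, s=4.0, n=64)
print(round(low, 2), round(high, 2))
```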
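29Small-Sample Confidence Interval for $\mu$
- $\bar{y} \pm t_{\alpha/2}\,s_{\bar{y}} = \bar{y} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
- where $s_{\bar{y}} = s/\sqrt{n}$ and $t_{\alpha/2}$ is a t value based on (n - 1)
degrees of freedom, such that the probability
that $t > t_{\alpha/2}$ is $\alpha/2$.
- Assumptions: The relative frequency distribution
of the sampled population is approximately normal.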
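A sketch of the small-sample interval follows. Python's standard library has no t distribution, so the critical value is passed in from a t table; the function name `small_sample_ci` and the summary statistics are hypothetical.

```python
import math

def small_sample_ci(y_bar, s, n, t_crit):
    """Interval y_bar +/- t_crit * s / sqrt(n); t_crit must match (n - 1) df."""
    half_width = t_crit * s / math.sqrt(n)
    return y_bar - half_width, y_bar + half_width

# Hypothetical summary statistics for n = 10 measurements.
# 2.262 is the tabled t value with an area of 0.025 to its right for 9 df,
# which gives a 95% confidence interval.
low, high = small_sample_ci(y_bar=10.0, s=2.0, n=10, t_crit=2.262)
print(round(low, 3), round(high, 3))
```

The interval is only trustworthy when the sampled population is approximately normal, as the assumption above states.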
30Testing a Hypothesis About a Population Mean
- A null hypothesis, denoted by the symbol $H_0$, which
is the hypothesis that we postulate is true.
- An alternative (or research) hypothesis, denoted
by the symbol $H_a$, which is counter to the null
hypothesis and is what we want to support.
- A test statistic, calculated from the sample
data, that functions as a decision maker.
- A rejection region, values of a test statistic
for which we reject the null hypothesis and
accept the alternative hypothesis.
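32Large-Sample ($n \geq 30$) Test of Hypothesis About $\mu$
- TWO-TAILED TEST: $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$
- Test statistic: $z = \frac{\bar{y} - \mu_0}{\sigma_{\bar{y}}} \approx \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
- Rejection region: $|z| > z_{\alpha/2}$
- where $z_{\alpha/2}$ is chosen so that $P(z > z_{\alpha/2}) = \alpha/2$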
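The two-tailed large-sample test can be sketched as follows, assuming the usual test statistic (sample mean minus hypothesized mean, divided by s over the square root of n) and a rejection region in both tails. The function name `z_test_two_tailed` and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(y_bar, mu_0, s, n, alpha=0.05):
    """Return the z statistic, the critical value, and the reject decision."""
    z = (y_bar - mu_0) / (s / math.sqrt(n))
    # Critical value with an area of alpha/2 to its right.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit

# Hypothetical summary statistics from a sample of n = 64 measurements.
z, z_crit, reject = z_test_two_tailed(y_bar=20.9, mu_0=20.0, s=4.0, n=64)
print(round(z, 2), round(z_crit, 2), reject)
```

With these numbers the statistic falls short of the critical value, so the null hypothesis is not rejected at the 0.05 level.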
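33Type I and Type II error
Small-Sample Test of Hypothesis About $\mu$
- TWO-TAILED TEST: $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$
- Test statistic: $t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
- Rejection region: $|t| > t_{\alpha/2}$
- where $t_{\alpha/2}$ is based on (n - 1) df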
34Reporting Test Results as p-Values: How to Decide
Whether to Reject $H_0$
- Choose the maximum value of $\alpha$ that you are
willing to tolerate.
- If the observed significance level (p-value) of
the test is less than the maximum value of $\alpha$,
then reject the null hypothesis.
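The decision rule can be sketched for a two-tailed z test, where the p-value is twice the upper-tail area beyond the observed statistic. The function names `two_tailed_p_value` and `decide` and the observed z value are hypothetical.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Observed significance level for a two-tailed z test."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha):
    """Reject H0 only when the p-value is below the chosen alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

z = 1.8  # hypothetical observed test statistic
p = two_tailed_p_value(z)
print(round(p, 4), decide(p, alpha=0.05), decide(p, alpha=0.10))
```

The same p-value leads to different decisions depending on the maximum tolerated error rate, which is why that rate is chosen before looking at the result.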
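35Large-Sample Confidence Interval for ($\mu_1 - \mu_2$):
Independent Samples
- $(\bar{y}_1 - \bar{y}_2) \pm z_{\alpha/2}\,\sigma_{(\bar{y}_1 - \bar{y}_2)} = (\bar{y}_1 - \bar{y}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
- Assumptions: The two samples are randomly and
independently selected from the two populations.
The sample sizes, $n_1$ and $n_2$, are large enough so
that $\bar{y}_1$ and $\bar{y}_2$ each have approximately normal
sampling distributions and so that $s_1^2$ and $s_2^2$
provide good approximations to $\sigma_1^2$ and $\sigma_2^2$. This
will be true if $n_1 \geq 30$ and $n_2 \geq 30$.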
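36Large-Sample Test of Hypothesis About ($\mu_1 - \mu_2$):
Independent Samples
- TWO-TAILED TEST: $H_0: (\mu_1 - \mu_2) = D_0$, $H_a: (\mu_1 - \mu_2) \neq D_0$
- where $D_0$ = Hypothesized difference between the
means (this is often 0)
- Test statistic: $z = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sigma_{(\bar{y}_1 - \bar{y}_2)}}$
- where $\sigma_{(\bar{y}_1 - \bar{y}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
- Rejection region: $|z| > z_{\alpha/2}$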
37Small-Sample Confidence Interval for ($\mu_1 - \mu_2$):
Independent Samples
- $(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
- where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
is a pooled estimate of the common population
variance and $t_{\alpha/2}$ is based on $(n_1 + n_2 - 2)$ df.
- Assumptions:
- Both sampled populations have relative frequency
distributions that are approximately normal.
- The population variances are equal.
- The samples are randomly and independently
selected from the populations.
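A sketch of the pooled-variance interval for the difference between two means. The pooled estimate weights each sample variance by its degrees of freedom. The function names, the summary statistics, and the choice of six measurements per sample are hypothetical; the t value comes from a t table because the standard library has no t distribution.

```python
import math

def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Pooled estimate of the common population variance."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def small_sample_diff_ci(y1_bar, y2_bar, n1, s1_sq, n2, s2_sq, t_crit):
    """Interval for (mu1 - mu2); t_crit must match (n1 + n2 - 2) df."""
    sp_sq = pooled_variance(n1, s1_sq, n2, s2_sq)
    half_width = t_crit * math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    diff = y1_bar - y2_bar
    return diff - half_width, diff + half_width

# Hypothetical summary statistics for two samples of 6 measurements each.
# 2.228 is the tabled t value with an area of 0.025 to its right for 10 df.
sp_sq = pooled_variance(6, 4.0, 6, 6.0)
low, high = small_sample_diff_ci(12.0, 10.0, 6, 4.0, 6, 6.0, t_crit=2.228)
print(sp_sq, round(low, 3), round(high, 3))
```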
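38Small-Sample Test of Hypothesis About ($\mu_1 - \mu_2$):
Independent Samples
- TWO-TAILED TEST: $H_0: (\mu_1 - \mu_2) = D_0$, $H_a: (\mu_1 - \mu_2) \neq D_0$
- Test statistic: $t = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
- Rejection region: $|t| > t_{\alpha/2}$
- where $t_{\alpha/2}$ is based on $(n_1 + n_2 - 2)$ df and $s_p^2$ is the
pooled estimate of the common population variance
- Assumptions: Same as for the small-sample
confidence interval for ($\mu_1 - \mu_2$) in the previous box.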
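39Paired Difference Confidence Interval for $\mu_d = \mu_1 - \mu_2$
- LARGE SAMPLE: $\bar{y}_d \pm z_{\alpha/2}\frac{\sigma_d}{\sqrt{n_d}} \approx \bar{y}_d \pm z_{\alpha/2}\frac{s_d}{\sqrt{n_d}}$
- Assumption: The sample differences are randomly
selected from the population of differences.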
40Continued
- SMALL SAMPLE: $\bar{y}_d \pm t_{\alpha/2}\frac{s_d}{\sqrt{n_d}}$
- where $t_{\alpha/2}$ is based on $(n_d - 1)$ degrees of freedom
- Assumptions:
- The relative frequency distribution of the
population of differences is normal.
- The sample differences are randomly selected from
the population of differences.
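41Paired Difference Test of Hypothesis for $\mu_d = \mu_1 - \mu_2$
- TWO-TAILED TEST: $H_0: \mu_d = D_0$, $H_a: \mu_d \neq D_0$
- LARGE SAMPLE
- Test statistic: $z = \frac{\bar{y}_d - D_0}{\sigma_d/\sqrt{n_d}} \approx \frac{\bar{y}_d - D_0}{s_d/\sqrt{n_d}}$
- Rejection region: $|z| > z_{\alpha/2}$
- Assumption: The differences are randomly selected
from the population of differences.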
42Continued
- SMALL SAMPLE
- Test statistic: $t = \frac{\bar{y}_d - D_0}{s_d/\sqrt{n_d}}$
- Rejection region: $|t| > t_{\alpha/2}$
- where $t_{\alpha/2}$ is based on $(n_d - 1)$ degrees of freedom
- Assumptions:
- The relative frequency distribution of the
population of differences is normal.
- The differences are randomly selected from the
population of differences.
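A sketch of the small-sample paired difference test: form the pairwise differences, then treat them as a single sample. The function name `paired_t_statistic` and the five pairs of measurements are hypothetical, and the critical value is taken from a t table.

```python
import math
import statistics

def paired_t_statistic(sample1, sample2, d0=0.0):
    """t statistic for H0: mu_d = d0, computed from paired measurements."""
    differences = [a - b for a, b in zip(sample1, sample2)]
    n_d = len(differences)
    d_bar = statistics.mean(differences)   # mean of the differences
    s_d = statistics.stdev(differences)    # sample std. dev. of the differences
    return (d_bar - d0) / (s_d / math.sqrt(n_d))

# Hypothetical paired measurements on 5 experimental units.
y1 = [12, 15, 10, 17, 12]
y2 = [10, 12, 9, 14, 11]

t = paired_t_statistic(y1, y2)
# 2.776 is the tabled t value with an area of 0.025 to its right for 4 df.
t_crit = 2.776
reject = abs(t) > t_crit
print(round(t, 3), reject)
```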