Title: MATH 401 Probability and Statistics
1MATH 401 Probability and Statistics
2Basis for Inferential Statistics
- We assume that an unknown population is described by a random variable.
- In that case, a histogram and an ogive of relative frequencies based on a sample approximate the shape of the PDF and of the cumulative distribution function (CDF), respectively.
- Other important characteristics of a random variable are the expectation and the variance.
- Summarizing data is essentially an initial attempt to estimate these parameters.
3Population Mean
- For a population of size $N$ with values $x_1, \dots, x_N$, its mean, $\mu$, is given by
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
4Sample Mean
- For a sample of size $n$ with values $x_1, \dots, x_n$, the sample mean is given by
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
5The Population Variance
- The population variance, with $\mu$ the population mean, is given by
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
6Sample Variance
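- For a better estimate (why it is better is addressed later, in the analysis of the sample variance), the sample variance is defined with the divisor $n - 1$ rather than $n$:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$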
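The four definitions above can be sketched in a few lines of Python. This is only an illustration: the data values and the helper names `mean_by_hand` and `variance_by_hand` are made up for the example and do not come from the lecture.

```python
# Hypothetical data: a small "population" and a sample drawn from it.
population = [2, 4, 4, 4, 5, 5, 7, 9]
sample = [2, 4, 5, 9]


def mean_by_hand(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def variance_by_hand(values, divisor_offset):
    """Sum of squared deviations from the mean, divided by (count - divisor_offset).

    divisor_offset = 0 gives the population variance (divisor N);
    divisor_offset = 1 gives the sample variance (divisor n - 1).
    """
    m = mean_by_hand(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - divisor_offset)


mu = mean_by_hand(population)              # population mean
sigma2 = variance_by_hand(population, 0)   # population variance, divisor N
x_bar = mean_by_hand(sample)               # sample mean
s2 = variance_by_hand(sample, 1)           # sample variance, divisor n - 1

print(mu, sigma2, x_bar, s2)
```

The standard library functions `statistics.pvariance` and `statistics.variance` use the same two divisors, so they can be used to cross-check the hand-written versions.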
7Parameter Estimation
8Parameters and Statistics
- A parameter is a population measure (e.g. $\mu$, $\sigma^2$).
- A statistic is a function of the sample (e.g. sample mean, sample variance).
- Hence, statistics may be regarded as random variables.
- Statistics are used to estimate parameters and are then called point estimators.
- A point estimate of a parameter is a single numerical value of the respective estimator.
- The standard deviation of an estimator is called the standard error.
9Good Point Estimator
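- Let $\theta$ be a parameter and $A_n$ a statistic estimating $\theta$ (based on a sample of size $n$). $A_n$ is a good estimator of $\theta$ if:
- It is unbiased: $E(A_n) = \theta$.
- It is consistent: $A_n$ converges in probability to $\theta$ as $n \to \infty$.
- It is relatively efficient: it has the smallest variance among the estimators under consideration.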
10Estimating the Mean and Variance
- Given a population of size $N$, we need to find its mean $\mu$ and variance $\sigma^2$.
- The number $N$ is too big, so we pick a sample of reasonable (?) size $n$.
- Find the sample mean $\bar{x}$ and the sample variance $s^2$.
- How good is $\bar{x}$ as an estimate of $\mu$?
- How good is $s^2$ as an estimate of $\sigma^2$?
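11Reminder 1: Scaling of Expectation and Variance
- Let $a$ be a real number.
- Then $aX$ is a new random variable whose distribution is that of $X$ rescaled by the factor $a$.
- We observe that
$$E(aX) = a\,E(X), \qquad \operatorname{Var}(aX) = a^2\operatorname{Var}(X)$$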
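12Reminder 2: Sum of Independent RV
- Let $X_k$, $k = 1, \dots, n$, be independent random variables (RV). Then
$$E\left(\sum_{k=1}^{n} X_k\right) = \sum_{k=1}^{n} E(X_k), \qquad \operatorname{Var}\left(\sum_{k=1}^{n} X_k\right) = \sum_{k=1}^{n} \operatorname{Var}(X_k)$$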
13Conclusion
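- Let $X_k$, $k = 1, \dots, n$, be independent RV and let $a_1, \dots, a_n$ be real numbers. Then
$$E\left(\sum_{k=1}^{n} a_k X_k\right) = \sum_{k=1}^{n} a_k E(X_k), \qquad \operatorname{Var}\left(\sum_{k=1}^{n} a_k X_k\right) = \sum_{k=1}^{n} a_k^2 \operatorname{Var}(X_k)$$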
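As a worked instance of the scaling and sum rules, here is a sketch for the sample mean. It assumes that the $X_k$ are independent with a common mean $\mu$ and a common variance $\sigma^2$, writes $\bar{X}$ for the sample mean, and takes the scaling factor to be $1/n$.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{k=1}^{n} X_k\right)
            = \frac{1}{n}\sum_{k=1}^{n} E(X_k)
            = \frac{1}{n}\cdot n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{k=1}^{n} X_k\right)
            = \frac{1}{n^2}\sum_{k=1}^{n} \operatorname{Var}(X_k)
            = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}.
\end{align*}
```

Under these assumptions the sample mean is an unbiased estimator of $\mu$, and its standard error is $\sigma/\sqrt{n}$.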
14Analysis of Sample Variance
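- Let $X_1, \dots, X_n$ be independent, identically distributed normal random variables (i.i.d. NRV) with mean $\mu$ and variance $\sigma^2$. One can show that
$$\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$$
15Analysis of Sample Variance
- We compute the expectation of the sum
$$E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \sum_{i=1}^{n} E\left[(X_i - \mu)^2\right] - n\,E\left[(\bar{X} - \mu)^2\right] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2$$
16Analysis of Sample Variance
- It remains to notice that
$$E(S^2) = \frac{1}{n-1}\,E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \sigma^2,$$
so the sample variance $S^2$ is an unbiased estimator of $\sigma^2$.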
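The effect of the divisor can also be checked by simulation. The sketch below is an illustration only: the values of `MU`, `SIGMA`, `n`, `TRIALS` and the seed are arbitrary choices, not taken from the lecture. It averages the sum of squared deviations over many normal samples, once divided by $n - 1$ and once by $n$.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Arbitrary illustrative settings (assumptions, not from the slides).
MU, SIGMA, n, TRIALS = 10.0, 2.0, 5, 20000

sum_unbiased = 0.0  # accumulates estimates using divisor n - 1
sum_biased = 0.0    # accumulates estimates using divisor n

for _ in range(TRIALS):
    xs = [random.gauss(MU, SIGMA) for _ in range(n)]
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)  # sum of squared deviations
    sum_unbiased += ss / (n - 1)
    sum_biased += ss / n

avg_unbiased = sum_unbiased / TRIALS
avg_biased = sum_biased / TRIALS

print(f"true variance:          {SIGMA ** 2:.3f}")
print(f"average with n - 1:     {avg_unbiased:.3f}")
print(f"average with n:         {avg_biased:.3f}")
```

With these settings the first average should land near $\sigma^2 = 4$, while the second should land near $(n-1)\sigma^2/n = 3.2$, which is the systematic underestimate that the divisor $n - 1$ removes.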
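17Reminder 3: Normal Distribution
- A continuous RV is said to be normally distributed with mean $\mu$ and variance $\sigma^2$ if its PDF is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$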
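18Reminder 4: Standard Normal Distribution
- A non-standard normal distribution (ND) can be standardized by
$$Z = \frac{X - \mu}{\sigma}$$
- That is, if $X$ is normal with mean $\mu$ and variance $\sigma^2$, then $Z$ is normal with mean 0 and variance 1, i.e. it has the standard normal distribution (SND).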
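19Reminder 5: Sum of Normal Distributions
- Let $X_k$, $k = 1, \dots, n$, be independent normally distributed RV with means $\mu_k$ and variances $\sigma_k^2$. Then their sum is normally distributed with
$$E\left(\sum_{k=1}^{n} X_k\right) = \sum_{k=1}^{n} \mu_k, \qquad \operatorname{Var}\left(\sum_{k=1}^{n} X_k\right) = \sum_{k=1}^{n} \sigma_k^2$$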
20Distribution of a Normal Sample Mean
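- Suppose all $X_i$ are independent and identically normally distributed with mean $\mu$ and variance $\sigma^2$.
- Then the sample mean $\bar{X}$ is normally distributed with
$$E(\bar{X}) = \mu, \qquad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}$$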
21Standardization of Sample Mean
- Hence, the random variable
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
- has a SND.
22Interval Estimates
- An interval estimate of a parameter is an interval within which the parameter is estimated to lie.
- The confidence level of an interval estimate is the probability that the interval contains the parameter.
- Notation: an interval estimate with a confidence level $1-\alpha$ is referred to as a $1-\alpha$ confidence interval.
23Interval Estimates on the Population Mean
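- An interval estimate on the mean is an interval centered at the sample mean, $(\bar{x} - \varepsilon,\ \bar{x} + \varepsilon)$, where $\varepsilon$ is the maximum error of estimation.
- Saying that $\bar{x} - \varepsilon < \mu < \bar{x} + \varepsilon$ is equivalent to saying that $|\bar{x} - \mu| < \varepsilon$.
- How confident we are in this statement depends on $1-\alpha$, the confidence level of the interval.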
24The Error and the Confidence Level
- Recall that
$$P(\bar{X} - \varepsilon < \mu < \bar{X} + \varepsilon) = 1 - \alpha$$
- It is clear that $1-\alpha$ grows as $\varepsilon$ increases.
- You have a better chance of capturing the population mean if you widen the interval around the sample mean.
- We would like to know the exact relation between $\alpha$ and $\varepsilon$.
- For example, what would the error be if you would like to be 99% confident in your interval estimate of $\mu$?
25Note
26Observations
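- If the original variable $X$ is normally distributed, then the sample mean $\bar{X}$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$.
- We're interested in
$$P(x_1 < \bar{X} < x_2),$$
- where $x_1 = \mu - \varepsilon$ and $x_2 = \mu + \varepsilon$.
- The corresponding z-values are
$$z_1 = \frac{x_1 - \mu}{\sigma/\sqrt{n}} = -\frac{\varepsilon\sqrt{n}}{\sigma}, \qquad z_2 = \frac{x_2 - \mu}{\sigma/\sqrt{n}} = \frac{\varepsilon\sqrt{n}}{\sigma}$$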
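27The Relation between $\alpha$ and $\varepsilon$
- Thus,
$$1 - \alpha = P\left(-\frac{\varepsilon\sqrt{n}}{\sigma} < Z < \frac{\varepsilon\sqrt{n}}{\sigma}\right) = 2\,P\left(0 < Z < \frac{\varepsilon\sqrt{n}}{\sigma}\right)$$
- That is, $1-\alpha$ is twice the area under the SND curve between 0 and $\varepsilon\sqrt{n}/\sigma$.
- Hence, $\varepsilon\sqrt{n}/\sigma = z_{\alpha/2}$, the $100(1-\alpha/2)$ percentage point of the standard ND variable, so that $\varepsilon = z_{\alpha/2}\,\sigma/\sqrt{n}$.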
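The relation between the confidence level and the maximum error can be evaluated in both directions without a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the function names `z_alpha_half`, `max_error` and `confidence_level` are hypothetical helpers introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_alpha_half(alpha):
    """The 100(1 - alpha/2) percentage point of the standard normal variable."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


def max_error(alpha, sigma, n):
    """Maximum error of estimation for confidence level 1 - alpha."""
    return z_alpha_half(alpha) * sigma / sqrt(n)


def confidence_level(eps, sigma, n):
    """Confidence level 1 - alpha for a given maximum error eps:
    twice the area under the standard normal curve between 0 and eps*sqrt(n)/sigma."""
    z = eps * sqrt(n) / sigma
    return 2 * (STD_NORMAL.cdf(z) - 0.5)


print(round(z_alpha_half(0.05), 2))  # the familiar table value for 95%
print(confidence_level(max_error(0.01, 3.0, 60), 3.0, 60))  # round trip, close to 0.99
```

Widening the interval (a larger `eps`) raises the value returned by `confidence_level`, which matches the observation that $1-\alpha$ grows with the maximum error.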
28The Central Limit Theorem
- Let $X_1, X_2, \dots, X_n$ be a sequence of independent identically distributed random variables, each having mean $\mu$ and variance $\sigma^2$.
- Then, for large values of $n$, the distribution of
- $X = X_1 + X_2 + \dots + X_n$
- is approximately normal
- with mean $n\mu$ and variance $n\sigma^2$.
29Implications of the Central Limit Theorem
- For large $n$:
- The distribution of the sum of independent identically distributed random variables is approximately normal, although the variables themselves need not be normally distributed.
- The distribution of the sample mean is approximately normal, with mean $\mu$ and variance $\sigma^2/n$.
- In many practical examples a sample of size 40 or more will be sufficient for the normal approximation to work well. In some cases the Central Limit Theorem will work even if $n < 40$.
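A small simulation can make these implications concrete. The sketch below is an illustration under assumed settings: it draws samples of size 40 from a Uniform(0, 1) distribution (which is clearly not normal, with mean 1/2 and variance 1/12) and examines the resulting sample means. The seed, the number of trials and the choice of distribution are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, pvariance

random.seed(1)  # fixed seed so the run is reproducible

n, TRIALS = 40, 10000
MU, SIGMA2 = 0.5, 1 / 12  # mean and variance of Uniform(0, 1)

# One sample mean per trial, each from n uniform draws.
sample_means = [sum(random.random() for _ in range(n)) / n for _ in range(TRIALS)]

mean_of_means = mean(sample_means)
var_of_means = pvariance(sample_means)

# If the sample mean is approximately normal, about 95% of the sample means
# should fall within 1.96 standard errors of MU.
se = sqrt(SIGMA2 / n)
coverage = sum(abs(m - MU) < 1.96 * se for m in sample_means) / TRIALS

print(f"mean of sample means:     {mean_of_means:.4f} (theory {MU})")
print(f"variance of sample means: {var_of_means:.5f} (theory {SIGMA2 / n:.5f})")
print(f"fraction within 1.96 se:  {coverage:.3f} (normal theory 0.95)")
```

The three printed values should come out close to their theoretical counterparts, even though each individual observation is uniformly distributed.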
30Example
- The president of a large university wishes to estimate the average age of students presently enrolled. From past studies, the standard deviation is known to be 2 years. A sample of 50 students is selected, and the mean is found to be 23.2 years. Find the 95% confidence interval of the population mean.
31Solution
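- We need to find $\varepsilon$ such that
- $P(23.2 - \varepsilon < \mu < 23.2 + \varepsilon) = 0.95 = 1 - \alpha$
- Hence, with $\sigma = 2$ and $n = 50$,
$$P\left(-\frac{\varepsilon\sqrt{50}}{2} < Z < \frac{\varepsilon\sqrt{50}}{2}\right) = 0.95$$
- Thus, we need to find $z$ (a.k.a. $z_{\alpha/2}$) such that
$$P(0 < Z < z) = \frac{0.95}{2} = 0.475$$
32Solution
- From the standard normal distribution table, we get $z = 1.96$, so
$$\varepsilon = z\,\frac{\sigma}{\sqrt{n}} = 1.96\cdot\frac{2}{\sqrt{50}} \approx 0.55,$$
which is 0.6 to one decimal place.
33Solution
- Hence, the 95% confidence interval of the population mean is
- $(23.2 - 0.6,\ 23.2 + 0.6)$
- $= (22.6,\ 23.8)$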
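The arithmetic of this example can be reproduced as follows. This is a sketch that replaces the table lookup with `statistics.NormalDist().inv_cdf`; the variable names are chosen for the illustration.

```python
from math import sqrt
from statistics import NormalDist

# Data from the example: sample mean, known standard deviation, sample size, level.
x_bar, sigma, n, level = 23.2, 2.0, 50, 0.95

alpha = 1 - level
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96
eps = z * sigma / sqrt(n)                # maximum error of estimation

lower, upper = x_bar - eps, x_bar + eps
print(f"z = {z:.2f}, eps = {eps:.3f}, interval = ({lower:.1f}, {upper:.1f})")
```

Carrying the unrounded error through and rounding only the endpoints gives the same interval, (22.6, 23.8), as rounding the error to 0.6 first.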
34Example
- A college president wishes to estimate the average age of students presently enrolled. How large a sample is necessary? The president would like to be 99% confident that the estimate is accurate to within 1 year. From a previous study the standard deviation of the ages is known to be 3 years.
35Solution
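- Here we are given the following:
- $1 - \alpha = 0.99$
- $\sigma = 3$
- $\varepsilon = 1$
- We would like to know the sample size, $n$, such that
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{\varepsilon}\right)^2,$$
- where $P(0 < Z < z_{\alpha/2}) = (1-\alpha)/2 = 0.495$.
36Solution
- From the table, $z_{\alpha/2} = 2.58$.
- Thus, $n = \left(\dfrac{(2.58)(3)}{1}\right)^2 \approx 59.9$,
- which is rounded up to 60.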
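The sample-size calculation can be sketched as a small function. The name `sample_size` is a hypothetical helper for this illustration; it uses `statistics.NormalDist().inv_cdf` in place of the table and always rounds up, since a smaller sample would not guarantee the requested accuracy.

```python
from math import ceil
from statistics import NormalDist


def sample_size(level, sigma, eps):
    """Smallest n for which the maximum error at the given confidence level
    does not exceed eps, assuming a known standard deviation sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    return ceil((z * sigma / eps) ** 2)


n_from_inverse_cdf = sample_size(0.99, 3.0, 1.0)  # uses the unrounded z value
n_from_table = ceil((2.58 * 3.0 / 1.0) ** 2)      # uses the table value 2.58

print(n_from_inverse_cdf, n_from_table)
```

Both routes lead to a sample of 60 students for this example; the table value 2.58 is slightly larger than the unrounded percentage point, so it errs on the safe side.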
37Estimating the Variance
- Another parameter which often needs to be estimated is the variance $\sigma^2$.
- Its natural estimator is the sample variance $S^2$.
- In order to construct an interval estimate on the population variance we shall require a more detailed analysis of $S^2$.
38Analysis of Sample Variance
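- Let $X_1, \dots, X_n$ be independent, identically distributed normal random variables (i.i.d. NRV) with mean $\mu$ and variance $\sigma^2$. We showed that
$$\sum_{i=1}^{n} (X_i - \mu)^2 = (n-1)S^2 + n(\bar{X} - \mu)^2$$
39Analysis of Sample Variance
- Standardization of ND (dividing both sides by $\sigma^2$) implies that
$$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$$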
40Chi-Square Distribution
- Let $Z_1, \dots, Z_n$ be independent standard normally distributed random variables.
- The random variable
$$\chi^2 = Z_1^2 + Z_2^2 + \dots + Z_n^2$$
- is said to have a chi-square distribution with $n$ degrees of freedom (d.f.).
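The definition can be tried out by simulation. The sketch below is an illustration with arbitrary settings (`n`, `TRIALS` and the seed are assumptions): it builds chi-square values directly as sums of squared standard normal draws. Since each squared standard normal variable has expectation 1, the sum rule for expectations suggests that the average of the simulated values should be close to the number of degrees of freedom.

```python
import random

random.seed(2)  # fixed seed so the run is reproducible

n, TRIALS = 5, 20000  # degrees of freedom and number of simulated values


def chi_square_draw(df):
    """One draw: the sum of df squared independent standard normal values."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))


draws = [chi_square_draw(n) for _ in range(TRIALS)]
average = sum(draws) / TRIALS

print(f"average of {TRIALS} draws with {n} d.f.: {average:.3f}")
print(f"smallest draw: {min(draws):.3f}")  # a sum of squares is never negative
```

The draws are all non-negative and their distribution is skewed to the right, which is why this distribution is not handled with the symmetric normal table.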
41Future Plans
- In the next meeting we are going to study the chi-square distribution in detail.
- This will enable us to construct confidence intervals on the population variance.
- So the next lecture is on
- CONFIDENCE INTERVALS ON VARIANCE.
42Thank you