Title: Law of large numbers.
Lecture 9.
- Law of large numbers.
- Central limit theorem.
- Confidence interval.
- Hypothesis testing. Types of errors.
Some practice with the Normal distribution (please
open Lect9_ClassPractice)
1. Math aptitude scores are distributed as
N(500,100). Find (a) the probability that an
individual score exceeds 600 and (b) the
probability that an individual score exceeds 600
given that it exceeds 500.
2. What mathematics SAT score can be expected to
occur with probability 0.90 or greater?
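Question 1 can be checked numerically. The sketch below is one possible reading, not the official class solution: it assumes that N(500,100) means mean 500 and standard deviation 100 (if 100 is meant as the variance, use `sigma=10`), and the variable names are illustrative.

```python
from statistics import NormalDist

# Assumption: N(500, 100) is read as mean 500 and standard deviation 100.
scores = NormalDist(mu=500, sigma=100)

# (a) P(X > 600)
p_above_600 = 1 - scores.cdf(600)

# (b) P(X > 600 | X > 500) = P(X > 600) / P(X > 500),
# because the event {X > 600} is contained in {X > 500}.
p_above_500 = 1 - scores.cdf(500)
p_conditional = p_above_600 / p_above_500

print(f"P(X > 600)           = {p_above_600:.4f}")
print(f"P(X > 600 | X > 500) = {p_conditional:.4f}")
```

Conditioning on a score above the mean doubles the probability, because exactly half of the distribution lies above 500.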
The (weak) law of large numbers
(The following result is also known as
Bernoulli's theorem.)
Let $X_1, X_2, \ldots, X_n$ be a sequence of
independent and identically distributed (i.i.d.)
random variables, each having mean $E X_i = \mu$ and
standard deviation $\sigma$. Define a new
variable, the sample mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{9.1}$$
One should keep in mind that $\bar{X}$ is also a
random variable:
it changes randomly between different samples
of the same size n, and those changes are
sensitive to the size of the sample, n.
Example (Lect9_LargeNumbers.nb).
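The notebook itself is not reproduced here. As a stand-in, the following Python sketch (an illustration, not a translation of Lect9_LargeNumbers.nb) shows how the sample mean of fair die rolls varies from sample to sample and how that variation shrinks as n grows. The function name and the seed are arbitrary choices.

```python
import random

def sample_mean_of_dice(n, rng):
    """Return the sample mean of n rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(9)  # fixed seed so the run is reproducible

for n in (10, 100, 10_000):
    # Five independent samples of the same size n give five different means.
    means = [sample_mean_of_dice(n, rng) for _ in range(5)]
    print(n, [round(m, 3) for m in means])
```

For small n the five means scatter widely around 3.5, while for large n they cluster tightly around it.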
Using the properties of expectations it is easy
to prove that
$$E(\bar{X}) = \mu \tag{9.2}$$
Using the properties of variances (see Lect.
8) we can also prove that
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \tag{9.3}$$
Then, using the Chebyshev inequality (we use it
without proof), one can find
$$P\left(|\bar{X} - \mu| \ge \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2} \tag{9.4}$$
where $\varepsilon$ is any (but typically small) positive
number. Let us discuss this result. The left side
contains the probability that the deviation
between the sample average $\bar{X}$ and the mean
value of the individual random variable, $\mu$,
exceeds $\varepsilon$. The right-hand side shows that
this probability is bounded by a quantity that
decreases as 1/n for large n. Thus, one can say
that $\bar{X}$ converges to $\mu$ in probability.
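The Chebyshev step can be written out explicitly. This is a sketch that assumes the standard form of Chebyshev's inequality for a random variable $Y$ with finite variance, applied to $Y = \bar{X}$, with $\varepsilon > 0$ the allowed deviation and $\sigma^2$ the variance of a single $X_i$:

```latex
\mathrm{Var}(\bar{X})
  = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
  = \frac{\sigma^2}{n},
\qquad
P\bigl(|Y - E(Y)| \ge \varepsilon\bigr) \le \frac{\mathrm{Var}(Y)}{\varepsilon^2}
\;\Longrightarrow\;
P\bigl(|\bar{X} - \mu| \ge \varepsilon\bigr)
  \le \frac{\mathrm{Var}(\bar{X})}{\varepsilon^2}
  = \frac{\sigma^2}{n\,\varepsilon^2}.
```

The first equality uses the independence of the $X_i$. For any fixed $\varepsilon$ the bound tends to zero as $n \to \infty$.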
(Weak) Law of Large Numbers. Suppose $X_1, X_2,
\ldots, X_n$ are i.i.d. with finite $E(X_i) = \mu$. Then, as
$n \to \infty$,
$$\bar{X} \to \mu \quad \text{in probability.} \tag{9.5}$$
(9.5) could be called the fundamental theorem of
statistics because it says that the sample mean
is close to the mean $\mu$ of the underlying
population when the sample is large. It implies
that we do not have to measure the weights of all
American 30-year-old males in order to find their
average weight. We can use the weights of 1000
individuals chosen at random. Later on we will
discuss how close the mean of a sample of this
size will be to the mean of the underlying
population.
[Figure: Dice average]
[Figure: Coins average (relative to 1/2)]
[Figure: Coins average (2)]
Experiments with dice and coins show that the
rate of convergence to the average value is quite
slow. Using (9.4) we can understand why this is
so. Suppose that we flip a coin n = 1000 times
and let $X_i$ be 1 if the i-th toss is Heads and 0
otherwise. The $X_i$ are Bernoulli random
variables with $p = P(X_i = 1) = 0.5$, so $E X_i = p = 0.5$ and
$$\mathrm{Var}\,X_i = (1-p)^2 p + p^2(1-p) = (1-p)\,p\,(1-p+p) = p(1-p) = 1/4$$
Taking $\varepsilon = 0.05$ (which is 10% of the expected
value), we find
$$P\left(|\bar{X} - 0.5| \ge 0.05\right) \le \frac{\sigma^2}{n\varepsilon^2} = \frac{0.25}{1000 \cdot (0.05)^2} = 0.1$$
We can see that the bound on the probability of a
comparatively large fluctuation (10% of the mean
value) is not too small. This helps explain the large
irregularities that we can find in the pictures.
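Chebyshev's inequality only gives an upper bound, so it can be instructive to compare it with the exact binomial probability for the same coin example. The sketch below assumes the bound has the form $\sigma^2/(n\varepsilon^2)$ with $\sigma^2 = p(1-p)$. The variable names are illustrative.

```python
from math import comb

n, p, eps = 1000, 0.5, 0.05

# Chebyshev bound sigma^2 / (n * eps^2) with sigma^2 = p(1 - p)
chebyshev_bound = p * (1 - p) / (n * eps**2)

# Exact probability that the fraction of heads differs from 0.5 by at
# least eps, i.e. that the number of heads k satisfies |k - 500| >= 50.
center, margin = n // 2, round(n * eps)
favourable = sum(comb(n, k) for k in range(n + 1) if abs(k - center) >= margin)
exact_probability = favourable / 2**n  # each outcome has probability 1 / 2^n

print(f"Chebyshev bound   = {chebyshev_bound:.4f}")
print(f"exact probability = {exact_probability:.6f}")
```

The exact probability is much smaller than the bound: Chebyshev's inequality holds for every distribution with finite variance, so it is conservative for any particular one.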
Central Limit Theorem.
The Law of Large Numbers indicates that the
sample average converges in probability to the
population average. Now our concern is the
probability distributions. Suppose that we know the
distribution of the i.i.d. $X_i$. We wonder how the
distribution of the sum of the $X_i$ is related to the
individual distributions. To reiterate: assuming
we know pdf($X_i$), what can we tell about the pdf of
their sum $S_n = X_1 + X_2 + \cdots + X_n$ for large n?
Central Limit Theorem (first formulation). Suppose
$X_1, X_2, \ldots, X_n$ are i.i.d. with $E(X_i) = \mu$ and finite
$\mathrm{Var}(X_i) = \sigma^2$. Then, as $n \to \infty$,
$$P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right) \to P(\chi \le x) \tag{9.5}$$
where $\chi$ denotes a random variable with the
standard normal distribution.
Another formulation (which is essentially the
same, but provides some additional insight).
Let $S_n = X_1 + X_2 + \cdots + X_n$ be the sum of n discrete
i.i.d. random variables with $E(X_i) = \mu$ and
$\mathrm{Var}(X_i) = \sigma^2$. Then, for any x,
$$\lim_{n\to\infty} P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-z^2/2}\,dz \tag{9.6}$$
We already know from previous lectures that
$\int_{-\infty}^{x} f(z)\,dz$ is the probability function
(see also cumulative distribution function) of a
continuous distribution with density $f$.
Let us now compare equations (9.6) and (A).
Please notice that z in these equations is a
dummy ("silent") variable. It can be renamed to any
other letter without affecting the outcome of
the integration. In my experience this naming
issue often causes confusion.
Now we are well equipped for the final step.
$$P(X \le x) = \int_{-\infty}^{x} f(z)\,dz \tag{A}$$
Comparing (9.6) and (A) we discover that the
distribution function of the standardized random
variable
$$S_n^* = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
asymptotically approaches the standard normal
distribution.
Comment: $S_n$ can be a dimensional variable (height,
weight, gas pressure, etc.). Try proving that
$S_n^*$ is always dimensionless.
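The convergence can be illustrated by simulation. The sketch below uses exponential variables with mean 1 and standard deviation 1 as an arbitrary non-normal example (this choice, the seed and the sample sizes are assumptions made for illustration). It standardizes each sum as $(S_n - n\mu)/(\sigma\sqrt{n})$ and compares the empirical distribution function with the standard normal one.

```python
import random
from math import sqrt
from statistics import NormalDist

def standardized_sum(n, rng):
    """Return (S_n - n*mu) / (sigma*sqrt(n)) for n exponential(1) variables."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of exponential(1)
    s_n = sum(rng.expovariate(1.0) for _ in range(n))
    return (s_n - n * mu) / (sigma * sqrt(n))

rng = random.Random(2024)  # fixed seed so the run is reproducible
n, trials = 200, 5000
samples = [standardized_sum(n, rng) for _ in range(trials)]

for x in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(s <= x for s in samples) / trials
    normal = NormalDist().cdf(x)
    print(f"x = {x:+.1f}  empirical {empirical:.3f}  normal {normal:.3f}")
```

Even though a single exponential variable is strongly skewed, the empirical values for the standardized sum should land close to the normal ones.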
Examples
Example 1. Suppose we flip a coin 900 times. What
is the probability of getting at least 465 heads?
Check it!
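One way to check Example 1 numerically is sketched below. It assumes a fair coin, so that the number of heads has mean 450 and standard deviation 15. The continuity-corrected estimate and the exact binomial sum are added here for comparison and are not part of the original example.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 900, 0.5, 465
mean = n * p                    # 450 heads expected
sd = sqrt(n * p * (1 - p))      # standard deviation of S_n, equal to 15

# Normal approximation from the CLT: P(S_n >= 465) ~ P(Z >= (465 - 450) / 15)
clt_estimate = 1 - NormalDist().cdf((k - mean) / sd)

# Same approximation with a continuity correction (an added refinement)
clt_corrected = 1 - NormalDist().cdf((k - 0.5 - mean) / sd)

# Exact binomial probability for comparison
exact = sum(comb(n, j) for j in range(k, n + 1)) / 2**n

print(f"CLT estimate          = {clt_estimate:.4f}")
print(f"CLT with correction   = {clt_corrected:.4f}")
print(f"exact binomial value  = {exact:.4f}")
```

The plain CLT estimate corresponds to one standard deviation above the mean, that is $1 - \Phi(1)$.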
Meditation about the CLT
Can a large group of insane individuals
behave as reasonably
as a randomly chosen group
of IBM or NASA engineers?
As a matter of fact, the CLT indicates that such an
assumption is not completely unreasonable.
According to the CLT, the sums of independent random
variables tend to look normal no matter what
crazy distribution the individual variables have
(GS, p. 362).
Question for practice (GS, 9.3.7). Extra credit.
- Choose independently 25 numbers from the interval
[0,1] with the probability density f(x) given
below and compute their sum $s_{25}$. Repeat this
experiment 100 times and make a bar graph of the
results. How well does the normal density fit your
bar graph in each case?
- $f(x) = 1$
- $f(x) = 3x^2$
- $f(x) = 2 - 4|x - 1/2|$
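A possible starting point for this exercise is sketched below. The sampling methods are choices made here, not prescribed by GS: inverse-transform sampling for $3x^2$ and the average of two uniforms for the triangular density $2 - 4|x - 1/2|$. Plotting is left out. Instead, the script reports the fraction of sums within one standard deviation of the mean, which is about 0.68 for a normal density.

```python
import random
from math import sqrt

rng = random.Random(7)  # fixed seed so the run is reproducible

# Samplers for the three densities on [0, 1].
samplers = {
    "f(x) = 1": lambda: rng.random(),
    # The CDF is x^3, so the inverse-transform sample is U^(1/3).
    "f(x) = 3x^2": lambda: rng.random() ** (1 / 3),
    # The average of two independent uniforms has density 2 - 4|x - 1/2|.
    "f(x) = 2 - 4|x - 1/2|": lambda: (rng.random() + rng.random()) / 2,
}
# (mean, variance) of a single draw, computed by hand from each density.
moments = {
    "f(x) = 1": (1 / 2, 1 / 12),
    "f(x) = 3x^2": (3 / 4, 3 / 80),
    "f(x) = 2 - 4|x - 1/2|": (1 / 2, 1 / 24),
}

def simulate_s25(draw, repeats=100, n=25):
    """Return `repeats` independent values of the sum of n draws."""
    return [sum(draw() for _ in range(n)) for _ in range(repeats)]

for name, draw in samplers.items():
    sums = simulate_s25(draw)
    mu, var = moments[name]
    sd = sqrt(25 * var)
    within = sum(abs(s - 25 * mu) <= sd for s in sums) / len(sums)
    print(f"{name}: mean of s25 = {sum(sums) / len(sums):.2f} "
          f"(theory {25 * mu:.2f}), within 1 sd: {within:.2f}")
```

The list returned by `simulate_s25` can be passed to any histogram routine to produce the bar graph the exercise asks for.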
CLT in statistics
Confidence Interval
In statistics we always deal with limited
samples of a population. Usually, the goal is to
draw inferences about a population from a sample.
This approach is called inferential statistics.
Suppose that we are interested in the mean number
of words that can be remembered by a high school
student. First, we have to select a random sample
from the population. Suppose that a group of 15
students is chosen. We can use the sample mean as
an unbiased estimate of $\mu$, the population
mean. However, given the small size of the
sample, it will certainly not be a perfect
estimate. By chance it is bound to be at least
a little too high or a little too
low.
For the estimate of $\mu$ to be of value, one must
have some idea of how precise it is. That is, how
close to $\mu$ is the estimate likely to be? The
CLT helps answer this question. It tells us
that, for large n, the sample mean $\bar{X}$ is a random
variable that is approximately normally distributed
around the population mean. This can help us derive a
probabilistic estimate of $\mu$ based on the sample mean. An
excellent way to specify the precision is to
construct a confidence interval.
4.1 Confidence interval, $\sigma$ is known
Let us assume for
simplicity that we have to estimate $\mu$ while $\sigma$
is known (this case is not very realistic but can
give us an idea).
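It can be shown (check it with Mathematica) that
for a random variable $\chi$ with the standard normal distribution
$$P(-2 \le \chi \le 2) \approx 0.95$$
Noticing now that the standardized variable is
$$\chi = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
we have
$$P\left(-2 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 2\right) \approx 0.95$$
More precisely, the 95% range is (-1.96, 1.96),
but using (-2, 2) is good enough for all practical
purposes.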
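The same check can be done in Python instead of Mathematica. This is a minimal sketch using the standard library's `statistics.NormalDist`. The variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

prob_within_2 = z.cdf(2) - z.cdf(-2)            # P(-2 <= Z <= 2)
prob_within_196 = z.cdf(1.96) - z.cdf(-1.96)    # P(-1.96 <= Z <= 1.96)
exact_cutoff = z.inv_cdf(0.975)                 # leaves 2.5% in each tail

print(f"P(|Z| <= 2)    = {prob_within_2:.4f}")
print(f"P(|Z| <= 1.96) = {prob_within_196:.4f}")
print(f"cutoff for 95% = {exact_cutoff:.4f}")
```

The interval (-2, 2) covers slightly more than 95% of the probability, which is why it is a safe rounding of (-1.96, 1.96).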
In other words, with probability 0.95, $\mu$
belongs to the interval
$$\left(\bar{X} - \frac{2\sigma}{\sqrt{n}},\ \bar{X} + \frac{2\sigma}{\sqrt{n}}\right)$$
We say that this is a 95% confidence interval for $\mu$.
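Example 1. Suppose that we have a random sample
of size 100 from a population with standard
deviation 3 and we observe a sample mean
$\bar{X}$. What can we tell about $\mu$?
In this case
$$\frac{2\sigma}{\sqrt{n}} = \frac{2 \cdot 3}{\sqrt{100}} = 0.6$$
As a result, with probability 0.95,
$$\bar{X} - 0.6 \le \mu \le \bar{X} + 0.6$$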
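The calculation in Example 1 can be wrapped in a small helper. This is a sketch: the function name `confidence_interval_95` is made up for illustration, and the sample mean 10.0 used in the call is a hypothetical placeholder rather than a value from the lecture.

```python
from math import sqrt

def confidence_interval_95(sample_mean, sigma, n, z=2.0):
    """Return (low, high) = sample_mean -/+ z * sigma / sqrt(n)."""
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# sigma = 3 and n = 100 come from Example 1; the sample mean 10.0 is a
# hypothetical placeholder, not a value from the lecture.
low, high = confidence_interval_95(sample_mean=10.0, sigma=3, n=100)
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```

Passing `z=1.96` gives the slightly narrower interval based on the more precise cutoff.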
Example 2. How large a sample must be selected
from a normal distribution with standard
deviation 12 in order to estimate $\mu$ to within 2
units?
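One way to read Example 2 is sketched below. It assumes that "to within 2 units" means the half-width of the 95% confidence interval should not exceed 2, and it uses the same $2\sigma/\sqrt{n}$ rule as above:

```latex
\frac{2\sigma}{\sqrt{n}} \le 2
\;\Longrightarrow\;
\sqrt{n} \ge \frac{2 \cdot 12}{2} = 12
\;\Longrightarrow\;
n \ge 144 .
```

With the more precise cutoff 1.96 in place of 2, the same argument gives $n \ge (1.96 \cdot 12 / 2)^2 \approx 138.3$, so a sample of 139 would suffice.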
In the previous examples it was assumed that the
variance is known. It is much more common that
neither the mean nor the variance is known when the
sampling is done. As a rough estimate, one can
use the sample variance
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
This leads to a normalized variable
$$T_n = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
which is a function of two random variables, $\bar{X}$ and $S$.
It was shown by W.S. Gosset that, for a sample of
size n from a normal population, the density
function of $T_n$ is not normal but rather a
t-density with n - 1 degrees of freedom. For large n the
t-density approaches the normal density. In
general, the t-density is closely related to the
Chi-squared distribution, whose density is
$$f(x) = \frac{x^{n/2-1}\,e^{-x/2}}{2^{n/2}\,\Gamma(n/2)}, \qquad x > 0$$
This is called the Chi-squared distribution with
n degrees of freedom.
This distribution describes the sum of squares of
n independent standard normal random variables. It is very
important for comparing experimental data with a
theoretical discrete distribution to see whether
the data support the theoretical model. Please
read Chapter 7.2 of GS for details. Here we
consider only simple examples of hypothesis
testing.
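The sum-of-squares description can be checked by simulation. The sketch below relies on the standard facts that a Chi-squared variable with n degrees of freedom has mean n and variance 2n. The choice n = 5, the seed and the number of samples are arbitrary.

```python
import random
from statistics import mean, variance

def chi_squared_sample(n, rng):
    """Return the sum of squares of n independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(3)  # fixed seed so the run is reproducible
n = 5
samples = [chi_squared_sample(n, rng) for _ in range(20_000)]

# For n degrees of freedom the mean is n and the variance is 2n.
print(f"sample mean     = {mean(samples):.2f} (theory {n})")
print(f"sample variance = {variance(samples):.2f} (theory {2 * n})")
```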
Hypothesis testing
We now consider two types of hypothesis
testing: testing the mean and testing the
difference between two means.
Example 1. Suppose we run a casino and we wonder
if our roulette wheel is biased. To rephrase our
question in statistical terms, let p be the
probability that red comes up and introduce two
hypotheses:
H0: $p = 18/38$ (null hypothesis)
H1: $p \ne 18/38$ (alternative hypothesis)
To test whether the null hypothesis is true, we
spin the roulette wheel n times and let $X_i = 1$ if red
comes up on the i-th trial and 0 otherwise, so
that $\bar{X}$ (the observed mean value) is the
fraction of times red came up in the first n
trials.
How do we decide whether H0 is correct? It is
reasonable to suggest that large deviations of
$\bar{X}$ from the fair value 18/38 would indicate
that H0 fails. But how do we decide which
deviations should be considered large? The
concept of the confidence interval can help
answer this question. As we know, with
probability 95% $\bar{X}$ should belong to the
confidence interval
$$\left(\mu - \frac{2\sigma}{\sqrt{n}},\ \mu + \frac{2\sigma}{\sqrt{n}}\right)$$
Reminder: n is the sample size, i.e. the number of
i.i.d. (independent and identically distributed)
variables added together, and $\mu$ and $\sigma$ are
their individual mean and standard deviation
values.
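Rejecting H0 when it is true is called a type I
error. Accepting H0 when the alternative
hypothesis is true is called a type II error.
In this test we have set the probability of a type I
error to be 5%. Thus, we can say that if $\bar{X}$
falls outside the chosen interval,
then H0 can be rejected with a possible error of
less than 5%.
Let us now find the individual standard
deviation. The variable $X_i$ takes only two values,
0 and 1, with Probability(1) = 18/38 and Probability(0)
= 20/38. Using this, we can find
$$\langle X \rangle = 1\cdot\frac{18}{38} + 0\cdot\frac{20}{38} = \frac{18}{38}$$
(I use the notation $\langle\ \rangle$ for
the mean.) In the same way, $\langle X^2 \rangle = 18/38$, so
$$\sigma^2 = \langle X^2 \rangle - \langle X \rangle^2 = \frac{18}{38}\cdot\frac{20}{38} \approx 0.2493, \qquad \sigma \approx 0.4993$$
Thus, $2\sigma \approx 1$ and the test can now be formulated
as: reject H0 if
$$\left|\bar{X} - \frac{18}{38}\right| > \frac{1}{\sqrt{n}}$$
or, in terms of the total number of reds $S_n = X_1 +
X_2 + \cdots + X_n$,
$$\left|S_n - \frac{18}{38}\,n\right| > \sqrt{n}$$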
Suppose that we spin the wheel 3800 times and get
red 1869 times. Is the wheel biased? The
expected number of reds is (18/38)n = 1800. Does the
deviation from this value, $|S_n - 1800| = 69$,
indicate that the wheel is biased? Given the
large number of trials, this deviation might not
seem large. However, it must be compared with
$\sqrt{n} = \sqrt{3800} \approx 61.6$. Given that 69 > 61.6, we
have to reject H0, with the understanding that if H0 were
correct then we would see an observation this
far from the mean less than 5% of the
time. Obviously, this does not prove that the wheel is
biased, but it shows that bias is
likely.
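The roulette test can be written as a short function. This is a sketch of the two-sigma rule described above. The name `two_sigma_test` is made up for illustration, and the threshold uses the unrounded value of $2\sigma\sqrt{n}$ rather than the approximation $\sqrt{n}$.

```python
from math import sqrt

def two_sigma_test(successes, n, p0):
    """Return (deviation, threshold, reject) for H0: p = p0 (2-sigma rule)."""
    sigma = sqrt(p0 * (1 - p0))          # standard deviation of one 0/1 trial
    deviation = abs(successes - n * p0)  # |S_n - n*p0|
    threshold = 2 * sigma * sqrt(n)      # close to sqrt(n) when p0 = 18/38
    return deviation, threshold, deviation > threshold

deviation, threshold, reject = two_sigma_test(successes=1869, n=3800, p0=18 / 38)
print(f"deviation = {deviation:.1f}, threshold = {threshold:.1f}, "
      f"reject H0: {reject}")
```

The same function can be reused for any yes/no experiment by changing `p0`, for instance `p0=0.5` for a coin.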
Example 2. Do married college students with
children do less well because they have less time
to study, or do they do better because they are
more serious? The average GPA for all students
was 2.48. To answer the question, we formulate
two hypotheses:
H0: $\mu = 2.48$ (null hypothesis)
H1: $\mu \ne 2.48$ (alternative hypothesis)
The records of 25 married college students with
children indicate that their average GPA was
2.36 with a standard deviation of 0.5. Using
the last example, we see that to derive a
conclusion with a type I error of 5% we should
reject H0 if
$$\left|\bar{X} - 2.48\right| > \frac{2\sigma}{\sqrt{n}} = \frac{2(0.5)}{\sqrt{25}} = 0.2$$
We can see that the observed deviation $|\bar{X} - 2.48| = 0.12$
is not large enough to reject the null hypothesis. In other words, we
cannot be 95% certain that married students with
children do either worse or better than
students without children.
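The arithmetic of this example is sketched below. It follows the lecture's two-sigma rule and treats the sample standard deviation 0.5 as if it were the known $\sigma$. With only 25 observations, a t-based cutoff (about 2.06 for 24 degrees of freedom) would be slightly stricter, which does not change the conclusion here.

```python
from math import sqrt

n, sample_mean, sample_sd, mu0 = 25, 2.36, 0.5, 2.48

deviation = abs(sample_mean - mu0)     # observed |X-bar - mu0|
threshold = 2 * sample_sd / sqrt(n)    # 2 * sigma / sqrt(n)
reject_h0 = deviation > threshold

print(f"deviation = {deviation:.2f}, threshold = {threshold:.2f}, "
      f"reject H0: {reject_h0}")
```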
Testing the difference of two means
Suppose that we want to compare the test results
for two independent random samples of sizes $n_1$ and
$n_2$ taken from two different populations. A
typical example would be testing a drug, with one
sample taken from the population taking the drug
and another from the reference group. Suppose
now that the population means are unknown. For
example, we do not know what percentage of the population
would be infected by a certain disease if the
drug is not taken. Notice that in the previous
case we were able to calculate the mean for the
null hypothesis. Now we cannot.
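H0: $\mu_1 = \mu_2$ (null hypothesis)
H1: $\mu_1 \ne \mu_2$ (alternative hypothesis)
From the CLT, for large samples $\bar{X}_1$ is approximately
normal with mean $\mu_1$ and variance $\sigma_1^2/n_1$, and $\bar{X}_2$ is
approximately normal with mean $\mu_2$ and variance $\sigma_2^2/n_2$.
Since the samples are independent, the difference
$\bar{X}_1 - \bar{X}_2$ is approximately normal with
$$E\left(\bar{X}_1 - \bar{X}_2\right) = \mu_1 - \mu_2, \qquad \mathrm{Var}\left(\bar{X}_1 - \bar{X}_2\right) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
If H0 is correct, then $\bar{X}_1 - \bar{X}_2$ is approximately normal
with mean 0 and the same variance.
Based on the last result, if we want a test with
a type I error of 5%, then we should reject H0
if
$$\left|\bar{X}_1 - \bar{X}_2\right| > 2\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$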
Example: a study of passive smoking reported in
NEJM. The size of lung airways was measured for 200
female nonsmokers who were in a smoky environment
and for 200 who were not. For the first group the
average was 2.72 with $\sigma = 0.71$, while for the
second group the corresponding values were 3.17
and 0.74.
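Based on these data we find
$$2\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 2\sqrt{\frac{(0.71)^2}{200} + \frac{(0.74)^2}{200}} \approx 0.145$$
while
$$\left|\bar{X}_1 - \bar{X}_2\right| = |2.72 - 3.17| = 0.45$$
Since 0.45 > 0.145, H0 is rejected.
This means that the conclusions are convincing
(should be taken seriously).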
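The two-sample comparison can be checked with a few lines of Python. This is a sketch of the two-sigma rule for the difference of two means, using the numbers quoted for the passive smoking study. The function name `two_sample_test` is made up for illustration.

```python
from math import sqrt

def two_sample_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (difference, threshold, reject) for H0: mu1 = mu2 (2-sigma rule)."""
    difference = abs(mean1 - mean2)
    threshold = 2 * sqrt(sd1**2 / n1 + sd2**2 / n2)
    return difference, threshold, difference > threshold

difference, threshold, reject = two_sample_test(2.72, 0.71, 200, 3.17, 0.74, 200)
print(f"difference = {difference:.3f}, threshold = {threshold:.3f}, "
      f"reject H0: {reject}")
```

The observed difference is roughly three times the threshold, so the rejection does not hinge on using 2 rather than 1.96 as the cutoff.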