Transcript and Presenter's Notes

Title: Continuous Probability Distributions
1
Chapter 8
  • Continuous Probability Distributions

2
Continuous Probability Distributions
  • Continuous probability distributions are
    typically associated with ratio scales
  • Height: how likely is it that a child in the
    class is 1.7 meters tall?
  • Finance: what are the chances that the ratio of
    First- to Second-Quarter profits will be ≥ 1.25?
  • Vision Science: at what wavelength (measured in
    nm) in the electromagnetic spectrum are the M
    human photoreceptors maximally receptive?
  • Physics: the magnetic moment of the electron is
    1.001159652201 ± 0.000000000030

3
  • Similar to discrete probability distributions,
    continuous probability distributions identify the
    events in a probability space with sets of
    numbers on the number line.

4
  • A function f is a probability density function
    (pdf) if
  • f(x) ≥ 0, for every number x
  • the total area under f is 1, i.e.,
    ∫ f(x) dx = 1, where the integral is taken over
    the entire real line

5
  • If f is the pdf of a continuous distribution,
    then F, defined by F(x) = ∫ f(t) dt with the
    integral taken over (-∞, x], is the cumulative
    distribution function (cdf) of that distribution

6
  • We define the probability of the event of the
    random variable yielding a value less than some
    number a as
  • Pr(X < a) = F(a)
  • Similarly, the probability of X being greater
    than a is
  • Pr(X > a) = 1 - F(a)

7
  • We define the probability of the event of the
    random variable yielding a value in the interval
    [a, b] as
  • Pr(a ≤ X ≤ b) = F(b) - F(a)
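These two identities are all we need to compute interval probabilities numerically. Here is a minimal sketch using SciPy's standard normal cdf (the choice of scipy.stats.norm is just for illustration; any cdf would do):

```python
from scipy.stats import norm  # standard normal N(0, 1) as an example distribution

F = norm.cdf  # cumulative distribution function

a, b = 0.5, 1.0
print("Pr(X < a)       =", F(a))          # F(a)
print("Pr(X > a)       =", 1 - F(a))      # 1 - F(a)
print("Pr(a <= X <= b) =", F(b) - F(a))   # F(b) - F(a)
```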

8
  • We would like to understand a continuous
    probability distribution like

9
  • So let's approximate it with a discrete
    distribution
  • We ignore the extreme values whose absolute
    value exceeds 4
  • We use cell marks (cf. chap. 3) to estimate the
    probability of falling within a given range

10
  • Because of the way we take cell marks, with only
    a few categories, our accuracy is limited
    With more categories, we'll be more accurate

11
  • Now we can figure out the probabilities of being
    in one of these categories
  • Just as in the previous chapter, we can represent
    these probabilities precisely and completely with
    a histogram.
  • At this point it is crucial to remember that
    histograms express probabilities of events (i.e.,
    the probability of being in one of these
    intervals) as the area of the histogram
    corresponding to the event.

12
  • The probability that X yields a value between .5
    and 1

13
  • In probability theory, we demand that the total
    area of the histogram = 1
  • (This contrasts with how Eviews handles sample
    distributions.)
  • So we have
  • Let's check our accuracy on Eviews.

14
  • We now have a probability distribution for 9
    possible categories
  • Each category is an interval of possible values
  • We trimmed off the extremities: the values
    greater than 4 or less than -4.
  • We'll leave these extremities alone for now.
  • But why stop with just 9 categories?
  • Let's make a more fine-grained histogram, one
    with 20 categories
  • Howzabout 40 categories? 100? 1000?
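A quick way to see this refinement at work is to compare the grouped-cell approximation with the underlying curve for several category counts. A minimal sketch, assuming the curve being approximated is the standard normal pdf restricted to (-4, 4):

```python
import numpy as np
from scipy.stats import norm

a, b = -4.0, 4.0
for n in (9, 20, 100):                      # number of categories
    edges = np.linspace(a, b, n + 1)        # cell boundaries c_0, ..., c_n
    length = edges[1] - edges[0]            # Length(c_i), same for every cell
    # probability mass of each cell, computed from the cdf
    mass = norm.cdf(edges[1:]) - norm.cdf(edges[:-1])
    height = mass / length                  # bar heights, so area = probability
    mids = (edges[:-1] + edges[1:]) / 2     # cell marks
    # worst disagreement between bar height and the true pdf at the cell mark
    print(n, "cells, max |height - f(mid)| =", np.abs(height - norm.pdf(mids)).max())
```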

15
  • If you remember your calculus, you can see what
    we're doing here.
  • We are creating increasingly fine-grained
    (discrete) approximations of a continuous curve.
  • We finish off (this part of) our project by going
    whole-hog
  • We don't stop with n = 100, or n = 10 million
  • Instead, we let n go to infinity.

16
  • Let's look at this situation a bit more
    carefully.
  • For any number n you like (for ease, let's assume
    n > 10)
  • We create a partition of the interval (a, b) by
    specifying n + 1 points, all equally spaced
    apart
  • Thus a = c0, b = cn, and for every ci (i ≤ n),
    ci = a + i(b - a)/n

17
  • Thus, intuitively speaking, our probability
    distribution (leaving out the extremities for
    now) turns out to be represented by the histogram
    of the grouped data for n groups, but with n → ∞,
    and each category containing a single number.

18
  • Let's use Length(ci) and Height(ci) to denote the
    length of category ci (= ci - ci-1) and the
    height of the bar associated with ci.

19
  • In our current example, a = -4 and b = 4.
  • f is the pdf of the continuous distribution
  • It characterizes how probabilities are
    distributed across the infinitely many numbers in
    the interval (a, b).
  • It replaces the probability function
    pr(ci-1 < X ≤ ci) used in our discrete
    distributions.

20
Extending the distribution to the entire line
of real numbers
  • Let's now turn to those numbers outside of (a, b)
    that we've ignored so far
  • To make the situation visually more obvious,
    let's pretend we were working with the interval
    (-1.5, 1.5), instead of (-4, 4).

21
  • So far, we've seen how to go from

22
  • to

23
  • Notice that by working with (-1.5, 1.5) our
    estimate of the curve is forced to be more
    inaccurate
  • This is because the total area under the curve
    must still be 1.

24
  • But now what about those extremities that we've
    been ignoring?
  • We want our theory to allow every number to be a
    possible value, not just those between a and b.
  • So we need to extend our theory just a little bit
    more
  • We will do what we just did, but we will extend
    each boundary by some quantity m
  • (a - m, b + m)
  • E.g., (-1.5 - .5, 1.5 + .5)
  • So our new interval will be (-2, 2)

25
  • Now we go from

26
  • to

27
  • Notice also how our approximation improves

28
  • Lets make m even larger, and go from

29
  • to

30
  • Now our approximation is getting pretty good

31
  • The remaining probabilities that we haven't yet
    accounted for
  • pr(X ≤ -3), pr(X ≥ 3)
  • are rather small, but that doesn't matter here
  • We can continue extending our probability space
    by setting
  • m = 5
  • m = 6
  • m = 60
  • m = 10,000

32
  • In short, we go from

33
  • to

34
  • Let's examine three features of the pdf f
  • Our construction of f ensures that pr(X = c) = 0,
    for any number c.
  • f is a derivative.
  • f is not a probability function.

35
  • Some Preliminaries.
  • Recall

36
  • More specifically, for any appropriate n and i,
    such as n = 100 and i = 32

37
  • 1. pr(X = k) = 0 for all numbers k.
  • Earlier we showed that
  • Hence

38
But as n gets very large, the length of every
cell ci (= ci - ci-1) gets very small
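In other words, a sketch of the step the lost equations presumably carried: the probability of the cell containing k is the area of a single bar, and that area shrinks to zero as the cells shrink (assuming, as in the construction above, that the bar heights stay close to the finite value f(k)):

```latex
\Pr(c_{i-1} < X \le c_i)
  \;=\; \mathrm{Height}(c_i)\cdot\mathrm{Length}(c_i)
  \;\longrightarrow\; 0
  \quad\text{as } n \to \infty,
  \qquad\text{since } \mathrm{Length}(c_i) = \frac{b-a}{n} \to 0 .
```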
39
  • 2. f is a derivative
  • From our Preliminaries, we have
  • Hence

40
  • Recall that
  • Notice also
  • So we can argue

41
  • So, in conclusion, we have
  • But this means that f is a derivative

42
  • Question: Is it possible to put this last
    equality in the form for derivatives given by my
    calculus book?
  • here, f = F'

43
  • Let h = (b - a)/n
  • So h is determined by n, and as n gets large, h
    gets arbitrarily small.
  • For each h, we can define a function
  • where ci-1 < x ≤ ci.

44
  • Now we define
  • So we have

45
(No Transcript)
46
  • There is another way that we can tell that f is a
    derivative
  • From the Fundamental Theorem of Calculus, we have
    the relationship F(b) - F(a) = ∫ f(x) dx, with
    the integral taken from a to b

47
  • 3. f is not a probability function.
  • Notice that pdfs take single numbers as their
    arguments, while probability functions take sets
    of numbers as their arguments.

48
  • A concrete (counter-)example
  • Sometimes f takes on values greater than 1.
  • Probability functions, by definition, cannot do
    this!
  • But for any 0 ≤ a ≤ b ≤ 1, pr(a ≤ X ≤ b) ≤ 1
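A concrete instance of this counterexample (the uniform distribution on (0, 0.5) is just one convenient choice): its density is 2 everywhere on that interval, yet every interval probability is still at most 1.

```python
from scipy.stats import uniform

# Uniform distribution on (0, 0.5): loc = left endpoint, scale = width
X = uniform(loc=0, scale=0.5)

print(X.pdf(0.25))              # 2.0  -> the density exceeds 1
print(X.cdf(0.5) - X.cdf(0.0))  # 1.0  -> but probabilities never exceed 1
```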

49
The Uniform Distribution
  • The uniform distribution is simple but important.
  • The uniform distribution over the interval (a,
    b) is defined by the pdf f(x) = 1/(b - a) for
    a < x < b, and f(x) = 0 otherwise

50
Here is the uniform distribution on (0, 1)
51
Here is the uniform distribution on (-2, 14)
52
Here are the cdfs of the two distributions. Why
is the cdf F(x) = (x - a)/(b - a) (for a < x < b)?
53
Here are the cdfs of the two distributions.
54
In general, the cdf of U(a, b) (i.e., the uniform
distribution on the interval from a to b) is
F(x) = 0 for x ≤ a, F(x) = (x - a)/(b - a) for
a < x < b, and F(x) = 1 for x ≥ b
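A small sketch of U(a, b) using scipy.stats.uniform, which is parameterized by the left endpoint and the width, so U(a, b) corresponds to loc=a, scale=b-a:

```python
import numpy as np
from scipy.stats import uniform

a, b = 0.0, 1.0                 # endpoints (illustrative; matches the U(0, 1) plot)
X = uniform(loc=a, scale=b - a)

xs = np.array([a - 1, (a + b) / 2, b + 1])
print(X.pdf(xs))                # 1/(b - a) inside (a, b), 0 outside
print(X.cdf(xs))                # 0 below a, (x - a)/(b - a) inside, 1 above b
```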
55
  • The uniform distribution is useful in cases where
    a number is known (or assumed) to fall within a
    definite finite interval, and you have no further
    information about what that number might be
  • Since you have no reason to treat one number as
    more likely than another, you give them all the
    same density

56
  • The uniform distribution appears when all the
    data must appear in some fixed interval, but
    there is absolutely no further information or
    structure that would bias the random variable
    to take one value rather than another.

57
Example
  • The uniform distribution is often used as a kind
    of null or default hypothesis regarding the
    distribution of probabilities within a
    population.
  • E.g., in a situation where people have varying
    degrees of tendency to visit McDonald's over
    Burger King, the least informative hypothesis
    would be a uniform distribution of probabilities
    (on the interval [0, 1])

58
  • The uniform distribution on (a, b) is a
    probability distribution: its pdf is nonnegative,
    and its total area is (b - a) · 1/(b - a) = 1

59
Expectations
  • Expectations are defined similarly to those for
    discrete random variables.
  • If X is a continuous random variable whose pdf is
    f, then E(X) = ∫ x f(x) dx, with the integral
    taken over the entire real line

60
Expectations
  • Using this definition, we can also define the
    variance, standard deviation, etc. of X

61
Expectations
  • You should be able to calculate that if
    X ~ U(a, b), then E(X) = (a + b)/2 and
    Var(X) = (b - a)²/12
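A short derivation of the mean (the variance follows the same pattern), assuming the U(a, b) pdf given on the earlier slide:

```latex
E(X) = \int_a^b x \cdot \frac{1}{b-a}\, dx
     = \frac{1}{b-a}\cdot\frac{b^2 - a^2}{2}
     = \frac{a+b}{2}
```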

62
Expectations
  • Importantly, everything we have proven about
    expectations for discrete random variables holds
    for continuous random variables.
  • The linearity of expectations holds

64
Expectations
  • Whether we use the linearity of expectations or
    calculate directly from our definitions, for any
    continuously distributed random variable X whose
    mean is μ and standard deviation is σ, we have
    E(Z) = 0 and Var(Z) = 1
  • where Z = (X - μ)/σ is the standardization of X

65
The Normal Distribution
  • The normal distribution (aka the Gaussian
    distribution) is probably the most common
    distribution in all of science.

N(0, 1)
66
  • The pdf of the normal distribution is
    f(x) = (1/(σ√(2π))) exp(-(x - μ)²/(2σ²))
  • In the case where μ = 0, σ = 1, this equation
    simplifies to (and has a special name, φ)
    φ(x) = (1/√(2π)) exp(-x²/2)

67
  • The cdf of the normal distribution is the
    integral of this pdf from -∞ to x
  • In the case where μ = 0, σ = 1, it has a special
    name, Φ(x)

68
  • We often use expressions like
  • N(μ, σ²),
  • which is shorthand for "the normal distribution
    with a mean of μ, and a variance of σ²."
  • We also write things like
  • X ~ N(μ, σ²),
  • which is shorthand for "X is normally
    distributed, with a mean of μ, and a variance of
    σ²."

69
X ~ N(0, 1)
70
X ~ N(3, 1)
71
X ~ N(6, 1)
72
X ~ N(16, 1)
73
X ~ N(3.1, 1)
74
X ~ N(0, 1)
75
X ~ N(0, 3)
76
X ~ N(0, 5)
77
  • X ~ N(0, 1), Y ~ N(0, 3), Z ~ N(0, 5)
  • For any N(μ, σ²), where is the high point of the
    pdf?

78
  • For any N(μ, σ²)
  • the standardized skew = 0
  • the standardized kurtosis = 3

79
  • Notice that the pdf for N(μ, σ²) can be seen as
    using the standardization of X
  • where z = (x - μ)/σ

80
  • It is easy to turn one normal distribution into
    another.
  • If X ~ N(0, 1), and Y = a + bX (b ≠ 0), then
  • Y ~ N(a, b²)
  • If X ~ N(μ, σ²), and Y = (X - μ)/σ, then
  • Y ~ N(0, 1)
  • If X ~ N(μ, σ²), and Y = a + bX (b ≠ 0), then
  • Y ~ N(a + bμ, (bσ)²)
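A quick empirical check of these rules is a sketch like the following; the particular values a = 2, b = 3 are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # X ~ N(0, 1)

a, b = 2.0, 3.0
y = a + b * x                        # should be ~ N(a, b^2) = N(2, 9)
print(y.mean(), y.var())             # close to 2 and 9

z = (y - a) / b                      # standardizing recovers ~ N(0, 1)
print(z.mean(), z.var())             # close to 0 and 1
```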

81
  • Two numbers that you will encounter are

1.64: For any normally distributed X, there is a
95% chance that the value of X will be less than
μ + (1.64 σ)
82
  • Two numbers that you will encounter are

1.64: For any normally distributed X, there is a
95% chance that the value of X will be greater
than μ - (1.64 σ)
83
  • Two numbers that you will encounter are

1.96: For any normally distributed X, there is a
95% chance that the value of X will be within
1.96 standard deviations of the mean.
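These constants come straight from the standard normal cdf, so they are easy to confirm; a sketch using scipy.stats.norm:

```python
from scipy.stats import norm

print(norm.cdf(1.64))                     # ~0.95: Pr(Z < 1.64)
print(norm.cdf(1.96) - norm.cdf(-1.96))   # ~0.95: Pr(-1.96 < Z < 1.96)
print(norm.ppf(0.95), norm.ppf(0.975))    # ~1.645 and ~1.960, the exact quantiles
```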
84
The Central Limit Theorem
  • The CLT is a big part of why the normal
    distribution is so important to science.
  • Given the way we typically do empirical science,
    by collecting random samples, in a certain sense,
    regardless of what distribution they are coming
    from, as the sample size gets large, the mean of
    the random sample becomes approximately normally
    distributed.

85
  • A random sample is a collection of n many i.i.d.
    random variables X1, ..., Xn
  • Notice the capital letters: these are random
    variables, not known quantities.
  • The i.i.d. part is important.
  • Independent and Identically Distributed
  • But let the distribution that they all have in
    common be any probability distribution in the
    world that you like
  • Then as n gets very large, the sum of the
    standardizations of the Xi's (divided by n^(1/2))
    approaches a normal distribution with a mean of 0
    and a variance of 1.

86
  • This is called the Central Limit Theorem.
  • More carefully, it says
  • If X1, ..., Xn are i.i.d. random variables from a
    distribution with mean μ and variance σ², then
    (X1 + ... + Xn - nμ)/(σ n^(1/2)) approaches
    N(0, 1) as n gets large
  • In short, the Central Limit Theorem says that the
    variability in the whole of any (large) random
    sample is approximately distributed as N(0, 1).
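A small simulation of this statement; this is a sketch, and the choice of an exponential parent distribution is arbitrary (the point is that it is far from normal):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0          # mean and sd of the Exponential(1) parent distribution
n, reps = 500, 20_000         # sample size and number of simulated samples

samples = rng.exponential(scale=1.0, size=(reps, n))
w = (samples.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))   # CLT statistic

print(w.mean(), w.var())                        # close to 0 and 1
print((np.abs(w) < 1.96).mean(), "vs", norm.cdf(1.96) - norm.cdf(-1.96))
```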

87
  • What this means is that if you randomly sample a
    population
  • children, cities, cancer patients, purchases,
    etc.,
  • then if you standardize these measurements and
    add them up,
  • the result, divided by n^(1/2), can of course
    still vary some
  • you could've sampled different children,
    patients, ...
  • but it will nevertheless vary with a distribution
    that is similar to N(0, 1), especially as n gets
    large.

88
  • Notice that the only sources of uncertainty
    come from the i.i.d. random variables X1, ..., Xn.
  • Thus, the Central Limit Theorem tells us that the
    sum of all these random variables is
    approximately normally distributed
  • This distribution will be some N(μ, σ²), not
    necessarily N(0, 1).

89
  • The CLT explains why normal distributions are
    fairly common
  • When a population is made up of individuals who
    are all of the same general type, and who differ
    from one another due to a large number of
    influences that are themselves mutually
    independent, the resulting population will often
    be (approximately) normally distributed.

90
  • The CLT explains why normal distributions are
    fairly common
  • E.g., people are roughly the same height,
    but heights differ due to many largely
    independent influences
  • diet, various genetic propensities, illness
    during adolescence, age, amputation, etc.
  • Thus, this population (of human heights) might
    naturally be modeled as a random variable Y
  • where Y = Y1 + ... + Yn, and
  • Y is (approximately) normally distributed.

91
  • Since Y = Y1 + ... + Yn, the CLT tells us that Y
    will be approximately normally distributed
  • And if we know the mean and variance of the Yi's,
    then if we can approximate n, we can approximate
    the precise distribution of Y.
  • In our height example, the Yi's might be
  • Y1 = quality of diet during adolescence
  • Y2 = racial/ethnic background (on a good scale)
  • Y3 = degree of height propensity from some given
    genetic type
  • Y4 = amount of mercury in local water supply
  • Y5 = severity of measles in childhood.
  • Y6 = severity of mumps in childhood.
  • ETC.

92
  • More generally, if our measurement Y is the
    combination of some other variables, etc., along
    with the Xi's, then we may have a situation where,
    e.g.
  • Y = a + bZ + (X1 + ... + Xn)
  • Y = a + bZ + ε
  • Here the single random variable (X1 + ... + Xn)
    is approximately normally distributed
  • Although Y and/or Z may not be.
  • The CLT is one of the reasons why the error in
    our models frequently turns out to be a random
    variable ε ~ N(0, σ²)
  • A nice visualization of this phenomenon is at
    http://www.inf.ethz.ch/personal/gut/lognormal/

93
  • Notice that the CLT can be seen as involving the
    standardization of a (big) random variable
  • (X1 + ... + Xn - nμ)/(σ n^(1/2)) is the very same
    thing as the standardization of the sum
    X1 + ... + Xn, whose mean is nμ and whose
    standard deviation is σ n^(1/2)

94
(No Transcript)
95
Chebyshev's Inequality
  • Let X be a random variable with any distribution
    you like, with a mean μ and standard deviation σ.
  • Chebyshev's theorem then says that for any c > 0
  • pr(|X - μ| ≥ cσ) ≤ 1/c²
  • In other words, regardless of X's distribution,
    the probability of X yielding a value more than c
    standard deviations away from X's mean is always
    less than 1/c².

96
  • So regardless of X's actual distribution,
  • The probability that X yields a value more than 2
    standard deviations from the mean is less than
    1/4 = .25.
  • The probability that X yields a value more than 5
    standard deviations from the mean is less than
    1/25 = .04.
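A quick sketch comparing the Chebyshev bound with the actual tail probabilities of one particular distribution (exponential, chosen arbitrarily); the bound holds, but it is usually far from tight:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # any distribution would do
mu, sigma = x.mean(), x.std()

for c in (2, 5):
    tail = (np.abs(x - mu) > c * sigma).mean()   # observed tail probability
    print(c, tail, "<=", 1 / c**2)               # Chebyshev's bound 1/c^2
```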

97
  • The core of classical statistical inference
    involves finding data which is simply too
    unlikely to have come from a certain
    distribution.
  • E.g., often our data sets x1, ..., xn produce a
    certain number b
  • Often our experimental design allows us to create
    a complex random variable W out of the others
    that generated the data set
  • X1, ..., Xn
  • We then see whether the probability that W would
    produce b is below a certain threshold
  • pr(W ≤ -b or W ≥ b) < .05 ???

98
  • In theory, we could merely use our threshold (.05
    in our example) to figure out how extreme our
    data had to be to allow us to draw this
    conclusion.
  • If we use Chebyshev's inequality, W will have to
    be farther than σ/(.05)^(1/2) away from the mean
    of W.
  • Although this boundary is rather remarkable,
    because it holds for any random variable, it is
    rather inefficient.
  • If we can obtain more information about the null
    hypothesis that we are testing, we may be able
    to draw stronger conclusions from less extreme
    data.

99
  • For example: Suppose our null hypothesis
    distribution is N(0, 1), and our threshold is
    .05.
  • From Chebyshev's inequality, we can calculate
    pr(|X| ≥ c) ≤ 1/c²
  • So we solve for c: setting 1/c² = .05 gives
    c = 1/(.05)^(1/2) ≈ 4.472

100
  • Since our null hypothesis distribution is N(0,
    1), we can continue
  • Thus, to draw a statistical inference using
    Chebyshev's inequality, our random variable would
    have to yield a value more extreme than ±4.472
  • As we've seen, this hardly ever occurs under
  • N(0, 1)!

101
  • In short, it can be rather hard to draw
    inferences using Chebyshev's inequality
  • This is the price you pay for the fact that the
    inequality is so general.

102
  • But what if we made use of the information that
    our null hypothesis was N(0, 1)?
  • This amounts to utilizing more information in the
    experimental design.
  • As you will learn later, if you do use this
    information, then you can draw an inference (at
    the .05 level) even if your data isn't more
    extreme than ±4.472
  • Instead, it only needs to be more extreme than
    ±1.96

103
  • In sum, there is a kind of trade-off
  • Chebyshev's inequality requires no (significant)
    background assumptions, and so applies everywhere
  • But it is very inefficient.
  • The techniques we will explore later require some
    significant background assumptions, and so cannot
    apply to all situations.
  • But they are much more efficient.