Last Lecture: - PowerPoint PPT Presentation

Slides: 33
Provided by: johnstau

1
Last Lecture
  • Histograms
  • Definition
  • Interpretation in terms of probability
  • Estimate of distribution function
  • Sample Means, Sample Medians, and Sample
    Variances / Standard Deviations (also known as
    statistics)
  • Definitions
  • Interpretations
  • Estimates of true values
  • This Thursday, see www.math.umass.edu/jstauden/
    for homework 2.
  • In the HW, you'll also learn about percentiles
    and boxplots (from Chapters 2 and 3).
  • We'll learn a lot more about the stuff in the
    first 3 chapters later in the semester.

2
Economic Growth Rate is an Estimate
  • If 100 economists were asked to estimate the
    growth of the economy last quarter, a histogram
    of the estimates might look like this

Sample mean growth estimate is about 0.002 (0.2%)
Min = -0.002, Max = 0.006, Range = 0.008
Distribution is bell shaped.
Fact: Most of the data falls within mean +/- 2 std dev for bell-shaped distributions.
As a result, the sample std dev of the estimates is about 0.008/4 = 0.002.
(Could also calculate s = 0.00173 in this case.)
3
Probability (starting chapter 4)
  • A probability is a number between zero and one
    that is assigned to an event.
  • The higher the probability, the more likely the
    event.
  • Notation: Pr( event occurs )
  • If the experiment that generates the event were
    repeated many times, the probability describes
    the fraction of the time it would occur.

4
One way to think about probability
Box represents all possible events
Event 1
Total area of the box is one (think about why)
Pr( Event 1 occurs ) = area of oval
Pr( Event 1 does not occur ) = 1 - Pr( Event 1 occurs )
5
Suppose there are 2 events that are
independent (we'll define independence later)
Box represents all possible events
Event 1
Event 2
Pr( Event 1 and Event 2 ) = area of overlap = Pr( Event 1 ) × Pr( Event 2 )
Pr( Event 1 or Event 2 ) = Pr( Event 1 ) + Pr( Event 2 ) - Pr( Event 1 and Event 2 )
                         = Pr( Event 1 ) + Pr( Event 2 ) - Pr( Event 1 ) × Pr( Event 2 )
6
Example with dice
  • Pr( rolling a 1 on 1 die in one roll ) = 1/6
  • Pr( rolling a 2 on 1 die in one roll ) = 1/6
  • Pr( rolling a 3 on 1 die in one roll ) = 1/6
  • Pr( rolling a 4 on 1 die in one roll ) = 1/6
  • Pr( rolling a 5 on 1 die in one roll ) = 1/6
  • Pr( rolling a 6 on 1 die in one roll ) = 1/6
  • Pr( rolling less than 4 in one roll )
    = Pr( rolling 1 or 2 or 3 in one roll )
    = Pr( rolling 1 in one roll )
      + Pr( rolling 2 in one roll )
      + Pr( rolling 3 in one roll )
      - Pr( rolling a 1 and a 2 and a 3 in one roll )
    = 1/6 + 1/6 + 1/6 - 0 = 1/2

7
Example with 2 dice
                      Outcome on die 1
                   1    2    3    4    5    6
               1                            x
               2                       x
Outcome        3                  x
on die 2       4             x
               5        x
               6   x
Each square is a possible event.
Pr( any specific event ) = 1/6 × 1/6 = 1/36
Pr( rolling a seven in total ) = 6/36 = 1/6
(squares with x's in them are 7s)
8
Example with 2 dice (a related interpretation)
  • Pr( rolling a seven in total )
    = (number of ways to roll a seven) / (number of possible outcomes)
  • In general, when the ways the event could occur are
    equally probable, Pr( event )
    = (number of ways that the event could occur) / (number of possible outcomes)
  • What's a simple expression for Pr( event doesn't
    occur )?
  • (hint: it involves Pr( event ))
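
The counting rule for two dice can be checked by listing all 36 outcomes. This is a minimal sketch; the helper name `pr` is made up for illustration and is not from the slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def pr(event):
    """Pr(event) = (# outcomes where the event holds) / (# possible outcomes)."""
    ways = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(ways, len(outcomes))

pr_seven = pr(lambda o: o[0] + o[1] == 7)
pr_not_seven = pr(lambda o: o[0] + o[1] != 7)

print(pr_seven)      # 1/6
print(pr_not_seven)  # 5/6
```

Counting only works this way because every square in the grid is equally probable.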

9
Aside: Odds
  • Odds are related to probabilities. More
    specifically, the odds of an event are
    Pr( event does not occur ) / Pr( event occurs ) to 1
  • At the start of last football season, the odds
    that the Patriots would win the Superbowl were
    250 to 1.
    (1 - pr(Pats win)) / pr(Pats win) = 250
    1 - pr(Pats win) = 250 pr(Pats win)
    1 = 251 pr(Pats win)
    pr(Pats win) = 1/251
  • Q: How were odds determined?
  • A: Doing that well is how casinos make money. The
    precise methods are proprietary. One way is to
    try to estimate the probabilities from historical
    data.
  • i.e. the odds = what some casino in NV thought.
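
The conversion between odds and probability can be sketched in a few lines. The function names below are hypothetical, and "odds" here means odds against the event, as defined on this slide.

```python
from fractions import Fraction

def probability_from_odds(odds_to_one):
    """If Pr(not occur) / Pr(occur) = odds_to_one, then Pr(occur) = 1 / (1 + odds_to_one)."""
    return 1 / (1 + Fraction(odds_to_one))

def odds_from_probability(p):
    """Odds (to 1) of an event with probability p, as defined on this slide."""
    p = Fraction(p)
    return (1 - p) / p

print(probability_from_odds(250))              # 1/251
print(odds_from_probability(Fraction(1, 251)))  # 250
```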

10
  • Example 2
  • Researcher mates 2 fruit flies and observes the
    traits of 300 offspring
  • Observed counts (eye color by wing size):
                             wing size
                          normal   miniature
      eye color normal      140        6
                vermilion     3      151
  • What is pr(a fly in the experiment has normal eye
    color and normal wing size)?
  • What is pr(a fly in the experiment has vermillion
    eyes)?
  • What is pr(a fly in the experiment has vermillion
    eyes, miniature wings, or both)?
  • WAYS EVENT CAN OCCUR / TOTAL NUMBER OF EVENTS
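
One way to apply "ways event can occur / total number of events" to the fly table is sketched below. The dictionary layout and the helper `pr` are assumptions made for illustration; the counts are the ones on the slide.

```python
from fractions import Fraction

# counts[(eye color, wing size)] = number of offspring
counts = {
    ("normal", "normal"): 140,
    ("normal", "miniature"): 6,
    ("vermilion", "normal"): 3,
    ("vermilion", "miniature"): 151,
}
total = sum(counts.values())  # 300 offspring

def pr(event):
    """Fraction of flies in the experiment for which event(eye, wing) is true."""
    ways = sum(n for (eye, wing), n in counts.items() if event(eye, wing))
    return Fraction(ways, total)

p_normal_eye_and_wing = pr(lambda eye, wing: eye == "normal" and wing == "normal")
p_vermilion_eye = pr(lambda eye, wing: eye == "vermilion")
p_vermilion_or_miniature = pr(lambda eye, wing: eye == "vermilion" or wing == "miniature")

print(p_normal_eye_and_wing, p_vermilion_eye, p_vermilion_or_miniature)
```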

11
Independence
  • Definition
  • events A and B are independent if
  • Pr( A and B ) = Pr(A) × Pr(B)
  • Idea
  • Does whether A occurs or not give you any
    information about whether B occurs or not?
  • If yes, then A and B are not independent.

12
Independence example
Consider the following example from a latex glove
manufacturer. Each number represents an 8 hour
manufacturing shift.
Weather        8 hour shifts that      8 hour shifts that
               produce defects         produce no defects
Raining               90                     9200
Not Raining           80                    15000
Q: Are defects and weather independent?
13
Pr( rain ) = (# shifts w/ rain) / (total # shifts)
           = (90 + 9200)/(90 + 9200 + 80 + 15000) = .38
Pr( defect ) = (# shifts w/ defect) / (total # shifts)
             = (90 + 80)/(90 + 9200 + 80 + 15000) = .00698
Pr( defect and rain ) = (# shifts w/ defects and rain) / (total # shifts)
                      = 90/(90 + 9200 + 80 + 15000) = .00369
.00369 does not equal (.38)(.00698), which is about .0027.
So the events are not independent in this sample.
(Humidity is related to defects)
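
The independence check for the glove data can be reproduced as follows. This is only a sketch: the dictionary keys are labels chosen here, and the counts are the ones from the example.

```python
# Number of 8 hour shifts for each (weather, outcome) combination.
shifts = {
    ("rain", "defect"): 90,
    ("rain", "no defect"): 9200,
    ("no rain", "defect"): 80,
    ("no rain", "no defect"): 15000,
}
total = sum(shifts.values())

pr_rain = (shifts[("rain", "defect")] + shifts[("rain", "no defect")]) / total
pr_defect = (shifts[("rain", "defect")] + shifts[("no rain", "defect")]) / total
pr_defect_and_rain = shifts[("rain", "defect")] / total

print(round(pr_rain, 2))             # Pr(rain)
print(round(pr_defect, 5))           # Pr(defect)
print(round(pr_defect_and_rain, 5))  # Pr(defect and rain)
print(round(pr_rain * pr_defect, 5)) # what Pr(defect and rain) would be under independence
```

If the last two printed numbers were equal, the sample would be consistent with independence; here they differ.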
14
Random Variables
Let X be a number whose value depends on the
outcome of a chance event.
Examples:
A poll is asked of 100 people. X = 0 if person 1 answers no
and 1 if yes, or X = total number of yeses
X = measurement of a board with a ruler
X = weight of a randomly selected cat
15
A probability distribution function (pdf) is
associated with every random variable.
  • Assume for now that X is discrete (takes values
    mappable to the integers or a subset of the
    integers). The probability distribution function
    is
  • Pr( X = a number ) (argument is a number,
    output is a probability)
  • p(k) = Pr( X = k )

Capital letter = random variable
Lower case letter = number
16
Properties of pdfs
  • p(k) is greater than or equal to 0 for any k
  • p(k) less than or equal to 1 for any k
  • sum of p(k) over all possible k's = 1
  • The pdf is a model for how X behaves.
  • Note that histograms estimate pdfs from data.
  • Histograms, sample means, sample variances etc
    show how observations of X actually behave.

17
Ways to determine PDFs
  • Given in a table
  • Given by a formula. There are famous ones:
    binomial, Poisson, hypergeometric, ...

18
PDF probabilities in a table
Let X = coffee cart line length at 10am
Line length (k)   Pr(k)
0                 0.10
1                 0.20
2                 0.25
3                 0.30
4                 0.15
(Suppose more than 4 people is impossible.)
If you observe the line length on several days
and make a histogram, then it will be close
to pr(k). It gets closer as the number of days
increases.
Pr( X 0) Pr( X 3) Pr( X 1) Pr( X
19
Associated with PDFs are true Means and Variances
(and true std devs). Idea: the pdf provides a model. True
means and variances are attributes of the model.
  • True Mean = sum( k p(k) ) where the sum is over all
    the possible k's.
  • True Variance = sum( p(k) (k - mean)^2 ) where the sum
    is over all possible k's.
  • Line length
  • Mean = E(X) = 0(.1) + 1(.2) + 2(.25) + 3(.3) + 4(.15)
    = 2.2
  • Variance = Var(X)
  • = (0-2.2)^2(.1) + (1-2.2)^2(.2) + (2-2.2)^2(.25)
    + (3-2.2)^2(.3) + (4-2.2)^2(.15) = 1.46
  • Sample means and Sample variances are calculated
    from datasets.
  • True means and True variances are part of the
    theoretical model for the data.
  • KEY IDEA as the size of the dataset becomes
    larger, the Sample means and variances get
    closer to the true means and variances
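
The true mean and variance of the line-length model, and the key idea that sample statistics approach them, can be illustrated with a small simulation. This is a sketch under the assumption that each day's line length is an independent draw from the table; `sample_stats` and the seed are choices made here.

```python
import random

# pdf of the coffee cart line length: k -> p(k)
pdf = {0: 0.10, 1: 0.20, 2: 0.25, 3: 0.30, 4: 0.15}

# True mean and variance are attributes of the model.
true_mean = sum(k * p for k, p in pdf.items())
true_var = sum(p * (k - true_mean) ** 2 for k, p in pdf.items())

def sample_stats(n_days, rng):
    """Simulate n_days of line lengths; return (sample mean, sample variance)."""
    data = rng.choices(list(pdf), weights=list(pdf.values()), k=n_days)
    mean = sum(data) / n_days
    var = sum((x - mean) ** 2 for x in data) / (n_days - 1)
    return mean, var

print(round(true_mean, 2), round(true_var, 2))  # 2.2 1.46

rng = random.Random(0)  # fixed seed so the run is repeatable
for n_days in (10, 1000, 100000):
    print(n_days, sample_stats(n_days, rng))
```

With more simulated days the sample mean and variance should settle near 2.2 and 1.46.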

20
Powerball Example
# Winners      0     1     2     3     4     5     6
Probability    8%    21%   26%   21%   13%   9%    2%
(These are estimates based on historical data, but assume
that they are the truth for the sake of the example.)
Probability that I am a winner if I buy 1 ticket = 1/80 million (= 1/80M).
Jackpot (pre tax) = $200 million. (If > 1 person wins, the jackpot is divided.)
Assume whether or not I win is independent of the number of winners.
Let X = millions of dollars I win from one ticket.
PDF
x       0    200         100         66.7        50          40          33.3
pr(x)   ?    .2283/80M   .2826/80M   .2283/80M   .1413/80M   .0978/80M   .0217/80M
1) What does ? equal? (and how did I compute the other pr(x)'s?)
   (see next slide for ans)
2) mu = E(X) = sum( x p(x) ). In dollars this is about $1.26.
   (you can confirm this)
3) Var(X) = sum( (x - mu)^2 pr(x) )
4) Interpretation: If I play powerball a lot when there is a $200
   million jackpot, then I can expect to win $1.26 on average.
5) If tickets are a dollar each, why doesn't Powerball lose money?
   (These numbers are all based on real data.)
21
Answers to question on previous slide
  • The ? = (80 million - 1)/80 million
  • You know this is true since the probability that
    I do not win is one minus the probability that I
    win (and the probability that I win is given to
    be 1/80 million).
  • How did I compute the pr(x)'s?
  • The probability that I win $200 million
    = Pr(I win and there is only one winner given that
      there is at least 1 winner)
    = Pr(I win) × Pr(there is only one winner given that
      there is at least one winner)
    = (1/80 million) × (0.21/(.21 + .26 + .21 + .13 + .09 + .02))

Uses independence
Uses the rule for conditional probability on page
141.
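
The Powerball PDF and its mean can be rebuilt from the winners table. This sketch follows the slide's model (my win is independent of the number of winners, and the winner probabilities are conditioned on at least one winner); the variable names are made up here.

```python
# Pr(number of winners = k), from the Powerball slide
winners_pdf = {0: 0.08, 1: 0.21, 2: 0.26, 3: 0.21, 4: 0.13, 5: 0.09, 6: 0.02}
p_win = 1 / 80_000_000  # Pr(I win) with one ticket
jackpot = 200.0         # millions of dollars, pre tax

p_at_least_one = sum(p for k, p in winners_pdf.items() if k >= 1)

# pdf of X (millions won): payout -> probability
pdf_x = {0.0: 1 - p_win}
for k, p in winners_pdf.items():
    if k >= 1:
        # Pr(I win) * Pr(k winners given at least one winner); jackpot split k ways
        pdf_x[jackpot / k] = p_win * p / p_at_least_one

mu = sum(x * p for x, p in pdf_x.items())               # millions of dollars
var = sum((x - mu) ** 2 * p for x, p in pdf_x.items())  # (millions of dollars)^2

print(round(mu * 1e6, 2))  # expected winnings per ticket, in dollars
```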
22
Cumulative Probability
  • A cumulative probability is the probability that
    X is less than or equal to some number
  • Ex: powerball (X = number of winners)
  • Pr(there are 3 or fewer winners) = Pr(X <= 3)
    = Pr(X=0 or X=1 or X=2 or X=3)
    = Pr(no winners) + Pr(1 winner) + Pr(2 winners) + Pr(3 winners)
    = 8% + 21% + 26% + 21% = 76%
  • Notation: F(3) = Pr(X <= 3) (F is called the
    Cumulative Distribution Function or CDF)
  • If this helps, think of F(k) as the integral of
    the PDF from 0 to k.
  • Note: Pr(X > 3) = 1 - Pr(X <= 3)
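
For a discrete random variable the CDF is a running sum of the PDF. A short sketch using the Powerball winner percentages (kept as whole percents so the arithmetic is exact):

```python
from itertools import accumulate

# Pr(X = k) in percent for k = 0, 1, ..., 6 winners
winners_pct = [8, 21, 26, 21, 13, 9, 2]

# F(k) = Pr(X <= k) in percent: running total of the pdf
cdf_pct = list(accumulate(winners_pct))

print(cdf_pct[3])        # F(3) = 76
print(100 - cdf_pct[3])  # Pr(X > 3) = 24
```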

23
Graphically
Pr( X <= 3 ) = 1 - Pr( X >= 4 ) = 1 - sum of the
areas of the white regions
PDF for the random variable that represents the
number of winners
24
Famous PDFs
  • Binomial: X ~ bin(n,p)
  • Setup
  • Let X = number of successes out of n identical
    trials
  • n identical independent trials
  • Each trial results in a success w/ probability p
    or failure with probability q = 1-p
  • X could possibly be 0,...,n
  • PDF
  • Pr(X = k) = (n choose k) p^k q^(n-k)
  • (n choose k) = number of ways to choose k things
    from n things = n! / (k! (n-k)!)
  • Note that n! = n(n-1)...(2)(1)
  • Also, 0! = 1
  • Expectation: E(X) = np
    Variance: Var(X) = npq
    StdDev = sqrt(Var(X))

25
  • Example
  • Suppose each person in a 5 person class comes
    with probability 0.85.
  • Let X = number of people in class on a given day.
  • What's the probability 4 people show up one day?
  • X ~ bin(5,0.85)
  • Pr(X = 4) = (5 choose 4) 0.85^4 0.15^1
    = 5 × 0.85^4 × 0.15^1 = 0.3915047
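
The attendance example can be checked with the binomial formula directly. A minimal sketch; `binom_pdf` is a name chosen here, and `math.comb` needs Python 3.8 or later.

```python
from math import comb, sqrt

def binom_pdf(k, n, p):
    """Pr(X = k) for X ~ bin(n, p): (n choose k) p^k q^(n-k) with q = 1 - p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.85
pr_4 = binom_pdf(4, n, p)

mean = n * p           # E(X) = np
var = n * p * (1 - p)  # Var(X) = npq
std_dev = sqrt(var)

print(round(pr_4, 7))  # 0.3915047
print(mean, round(var, 4), round(std_dev, 4))
```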

26
Why the binomial pdf is correct
  • Example
  • 5 students. Each attends with probability 0.85.
    What's the probability of exactly 4 successes?
  • There are 5 choose 4 ( = 5!/(4!1!) = 5 ) possible
    configurations of students (YYYYN, YYYNY, etc).
  • Each configuration has probability 0.85^4 × 0.15^1
    (0.85^4 = probability of 4 people coming,
    0.15^1 = probability of 1 person not coming;
    remember the and rule for independent events)
  • Pr(X = 4) = 5 × 0.85^4 × 0.15^1 = 39% (we're using the
    or rule here: person 1 doesn't come, or person 2
    doesn't come, or ...)

27
Famous PDFs
  • Poisson: X ~ Pois(r)
  • Setup
  • Let X = number of occurrences of an event in time
    or space
  • Events are expected to occur at rate r
  • X could possibly be 0,1,2,...
  • PDF
  • Pr(X = k) = r^k e^(-r) / k!
  • e is 2.718...
  • Note that 0! = 1
  • Expectation: E(X) = r
    Variance: Var(X) = r
    StdDev = sqrt(Var(X))

One could show why the Poisson PDF is correct,
but the math is more involved. If you're
interested, come talk to me sometime.
28
  • Example
  • Inspect an experimental rat's brain for tumorous
    cells. You expect 10 tumorous cells in 60mm^3 of
    brain. What's the probability that you see either
    2 or 3 tumorous cells in 10mm^3?
  • X = # tumors found in 10mm^3 of brain. X ~ Poisson(5/3)
    (rate per 60mm^3 is 10, so rate per 10mm^3 is 10/6
    = 5/3)
  • Pr(X = 2 or 3) = Pr(X = 2) + Pr(X = 3)
    = (5/3)^2 e^(-5/3)/2! + (5/3)^3 e^(-5/3)/3! = 41%
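
The tumor-cell calculation can be verified numerically. A small sketch; `poisson_pdf` is a helper name introduced here.

```python
from math import exp, factorial

def poisson_pdf(k, r):
    """Pr(X = k) for X ~ Pois(r): r^k e^(-r) / k!"""
    return r**k * exp(-r) / factorial(k)

r = 10 / 6  # expected tumorous cells per 10 mm^3 (10 per 60 mm^3)
pr_2_or_3 = poisson_pdf(2, r) + poisson_pdf(3, r)

print(round(pr_2_or_3, 2))  # 0.41
```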

29
Famous PDFs
  • Hypergeometric: X ~ Hyp(N,M,n)
  • Setup
  • There are a total of N items. M are of type A and
    N-M are of type B. n items are chosen at random
    without replacement.
  • Let X = number of chosen items that are type A
  • Pr(X = k) = (M choose k)(N-M choose n-k)/(N
    choose n)
  • Remember: (n choose k) = number of ways to choose
    k things from n things = n! / (k! (n-k)!)
  • Note that 0! = 1
  • Note that binomial is like the hypergeometric,
    but the binomial is with replacement (which
    results in a fixed p)

30
Hypergeometric Example
  • Cards: probability of being dealt a flush in
    hearts in a hand of poker (flush = all cards of
    same suit)
  • X = number of hearts in the hand
  • N = 52
  • M = 52/4 = 13
  • n = 5
  • Want Pr(X = 5) = (13 choose 5)(39 choose 0)/(52
    choose 5)
  • = 1287 × 1 / 2598960
  • = 0.0004951981 (NOTE THAT THIS NUMBER IS DIFFERENT
    FROM WHAT I WROTE ON THE BOARD IN THE CLASS)
  • What's the probability of getting a flush in any suit?
  • (see minitab: calc > Probability Distributions >
    Hypergeometric)
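
As an alternative to Minitab, the hypergeometric probability can be computed from the formula. This is a sketch with a hypothetical helper `hypergeom_pdf`; the any-suit figure assumes the slide's definition of a flush (all five cards of one suit, so straight flushes are counted too).

```python
from math import comb

def hypergeom_pdf(k, N, M, n):
    """Pr(X = k) for X ~ Hyp(N, M, n): (M choose k)(N-M choose n-k) / (N choose n)."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

# Flush in hearts: all 5 cards drawn from the 13 hearts.
pr_heart_flush = hypergeom_pdf(5, N=52, M=13, n=5)

# The four suits give mutually exclusive events, so the or rule adds them.
pr_any_flush = 4 * pr_heart_flush

print(pr_heart_flush)
print(pr_any_flush)
```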

31
  • For each of the following
  • What is the random variable?
  • What is its distribution and what are numbers
    for its parameters?
  • What is the probability that is being asked for?
  • How can it be computed from the probability
    distribution function?

32
More Examples
  • There are 4 security checkpoints. The probability
    of being searched at any one is 0.2. You may be
    searched more than once and all searches are
    independent. What's the probability of being
    searched at least one time?
  • 50 geese in a flock of 200 are tagged by a
    wildlife biologist. The next year, 10 geese from
    the flock are captured. Assume the flock still
    has 200 geese and no tags are lost. What's the
    probability that at least 5 of the recaptured
    geese have tags?
  • Suppose a written test has 5 True/False
    questions. Passing requires at least 3 correct
    answers and the test can be taken at most 3 times.
    (Assume no learning occurs between tests if one
    fails!)
  • If one randomly guesses, what's the probability of
    passing?
  • What's the probability that someone who randomly
    guesses will eventually pass?
  • An overloaded server receives an average of 25
    emails per second at 12:00PM. If it receives
    more than 30 emails in a second, it will crash.
    What's the probability of a crash at 12:00PM on a
    given day (based on the traffic in the previous 1
    second)?
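
One possible way to set up these four problems is sketched below. The choice of distribution for each problem is an interpretation, not an answer key from the slides: checkpoints as bin(4, 0.2), tagged birds as Hyp(200, 50, 10), the True/False test as bin(5, 0.5) with up to 3 independent attempts, and the server as Pois(25). The helper names are made up here.

```python
from math import comb, exp, factorial

def binom_pdf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeom_pdf(k, N, M, n):
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def poisson_pdf(k, r):
    return r**k * exp(-r) / factorial(k)

# Checkpoints: X = number of searches, X ~ bin(4, 0.2); want Pr(X >= 1) = 1 - Pr(X = 0)
pr_searched = 1 - binom_pdf(0, 4, 0.2)

# Tagged birds: X = tagged among the 10 captured, X ~ Hyp(200, 50, 10); want Pr(X >= 5)
pr_at_least_5_tags = sum(hypergeom_pdf(k, 200, 50, 10) for k in range(5, 11))

# Test: X = correct guesses on one attempt, X ~ bin(5, 0.5); pass if X >= 3
pr_pass_once = sum(binom_pdf(k, 5, 0.5) for k in range(3, 6))
# Eventually pass = not failing all 3 attempts (attempts assumed independent)
pr_pass_eventually = 1 - (1 - pr_pass_once) ** 3

# Server: X = emails in one second, X ~ Pois(25); want Pr(X > 30) = 1 - F(30)
pr_crash = 1 - sum(poisson_pdf(k, 25) for k in range(31))

print(pr_searched, pr_at_least_5_tags, pr_pass_once, pr_pass_eventually, pr_crash)
```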