Title: Last Lecture:
1Last Lecture
- Histograms
- Definition
- Interpretation in terms of probability
- Estimate of distribution function
- Sample Means, Sample Medians, and Sample Variances / Standard Deviations (also known as statistics)
- Definitions
- Interpretations
- Estimates of true values
- This Thursday, see www.math.umass.edu/jstauden/ for homework 2.
- In the HW, you'll also learn about percentiles and boxplots (from Chapters 2 and 3).
- We'll learn a lot more about the stuff in the first 3 chapters later in the semester
2Economic Growth Rate is an Estimate
- If 100 economists were asked to estimate the growth of the economy last quarter, a histogram of the estimates might look like this
- Sample mean growth estimate is about 0.002 (0.2%)
- Min = -0.002, Max = 0.006, Range = 0.008
- Distribution is bell shaped.
- Fact: Most of the data falls within mean +/- 2 std dev for bell shaped distributions
- As a result, the range covers about 4 std devs, so the sample std dev of the estimates is about 0.008/4 = 0.002 (Could also calculate s = 0.00173 in this case)
3Probability (starting chapter 4)
- A probability is a number between zero and one that is assigned to an event.
- The higher the probability, the more likely the event.
- Notation: Pr( event occurs )
- If the experiment that generates the event were repeated many times, the probability describes the fraction of the time the event would occur.
4One way to think about probability
Box represents all possible events
Event 1 (an oval inside the box)
Total area of the box is one (think about why)
Pr( Event 1 occurs ) = area of oval
Pr( Event 1 does not occur ) = 1 - Pr( Event 1 occurs )
5Suppose there are 2 events that are independent (we'll define independence later)
Box represents all possible events
Event 1
Event 2
Pr( Event 1 and Event 2 ) = area of overlap = Pr( Event 1 ) × Pr( Event 2 )
Pr( Event 1 or Event 2 ) = Pr( Event 1 ) + Pr( Event 2 ) - Pr( Event 1 and Event 2 ) = Pr( Event 1 ) + Pr( Event 2 ) - Pr( Event 1 ) × Pr( Event 2 )
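The "and" and "or" rules for two independent events can be sketched in a few lines of Python. The function names and the probabilities 1/2 and 1/3 are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

def pr_and_independent(p1, p2):
    """Pr(Event 1 and Event 2) when the two events are independent."""
    return p1 * p2

def pr_or_independent(p1, p2):
    """Pr(Event 1 or Event 2) = Pr(1) + Pr(2) - Pr(1 and 2)."""
    return p1 + p2 - pr_and_independent(p1, p2)

# Hypothetical probabilities, used only for illustration.
p1 = Fraction(1, 2)
p2 = Fraction(1, 3)

print(pr_and_independent(p1, p2))  # 1/6
print(pr_or_independent(p1, p2))   # 2/3
```

Subtracting the overlap keeps the region shared by both events from being counted twice.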
6Example with dice
- Pr( rolling a 1 on 1 die in one roll ) = 1/6
- Pr( rolling a 2 on 1 die in one roll ) = 1/6
- Pr( rolling a 3 on 1 die in one roll ) = 1/6
- Pr( rolling a 4 on 1 die in one roll ) = 1/6
- Pr( rolling a 5 on 1 die in one roll ) = 1/6
- Pr( rolling a 6 on 1 die in one roll ) = 1/6
- Pr( rolling less than 4 in one roll ) = Pr( rolling 1 or 2 or 3 in one roll )
  = Pr( rolling 1 in one roll ) + Pr( rolling 2 in one roll ) + Pr( rolling 3 in one roll ) - Pr( rolling a 1 and a 2 and a 3 in one roll )
  = 1/6 + 1/6 + 1/6 - 0 = 1/2
  (one roll shows only one face, so the overlap has probability 0)
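As a quick check of the one-die calculation, here is a minimal Python sketch that adds up the probabilities of the faces that make the event true. It assumes a fair six-sided die; the variable names are illustrative only.

```python
from fractions import Fraction

faces = range(1, 7)        # the six equally likely faces of a fair die
pr_face = Fraction(1, 6)   # Pr(any single face)

# The faces are mutually exclusive, so their probabilities simply add.
pr_less_than_4 = sum(pr_face for f in faces if f < 4)

print(pr_less_than_4)  # 1/2
```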
7Example with 2 dice
                 Outcome on die 1
                 1   2   3   4   5   6
Outcome      1                       x
on die 2     2                   x
             3               x
             4           x
             5       x
             6   x

Each square is a possible event.
Pr( any specific event ) = 1/6 × 1/6 = 1/36
Pr( rolling a seven in total ) = 6/36 = 1/6 (squares with x's in them are 7s)
8Example with 2 dice (a related interpretation)
- Pr( rolling a seven in total ) = (number of ways to roll a seven) / (number of possible outcomes)
- In general, when all possible outcomes are equally probable, Pr( event ) = (number of ways that the event could occur) / (number of possible outcomes)
- What's a simple expression for Pr( event doesn't occur )? (hint: it involves Pr( event ))
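The counting interpretation is easy to check by listing every outcome. The sketch below assumes two fair dice, enumerates the 36 equally likely (die 1, die 2) pairs, and counts the ones that total seven. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# The ways the event "total is seven" can occur.
sevens = [o for o in outcomes if sum(o) == 7]

pr_seven = Fraction(len(sevens), len(outcomes))
pr_not_seven = 1 - pr_seven   # complement: 1 - Pr(event)

print(len(sevens), len(outcomes))  # 6 36
print(pr_seven)                    # 1/6
print(pr_not_seven)                # 5/6
```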
9Aside Odds
- Odds are related to probabilities. More specifically, the odds of an event are Pr( event does not occur ) / Pr( event occurs ) to 1
- At the start of last football season, the odds that the Patriots would win the Super Bowl were 250 to 1.
  (1 - pr(Pats win)) / pr(Pats win) = 250
  1 - pr(Pats win) = 250 pr(Pats win)
  1 = 251 pr(Pats win)
  pr(Pats win) = 1/251
- Q: How were the odds determined?
- A: Doing that well is how casinos make money. The precise methods are proprietary. One way is to try to estimate the probabilities from historical data, i.e. the odds = what some casino in NV thought.
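The algebra that turns "250 to 1" into a probability can be wrapped in a small helper. This is only a sketch using the slide's definition of odds (odds against the event); the function names are made up for illustration.

```python
from fractions import Fraction

def odds_against_to_probability(odds):
    """Given odds of 'odds to 1' against an event, return Pr(event occurs).

    Solves (1 - p) / p = odds for p, which gives p = 1 / (odds + 1).
    """
    return Fraction(1, 1) / (odds + 1)

def probability_to_odds_against(p):
    """Return Pr(event does not occur) / Pr(event occurs)."""
    return (1 - p) / p

p_pats = odds_against_to_probability(250)
print(p_pats)                               # 1/251
print(probability_to_odds_against(p_pats))  # 250
```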
10- Example 2
- Researcher mates 2 fruit flies and observes the traits of 300 offspring

                        wing size
  eye color        normal    miniature
  normal             140         6
  vermilion            3       151

- What is pr(a fly in the experiment has normal eye color and normal wing size)?
- What is pr(a fly in the experiment has vermilion eyes)?
- What is pr(a fly in the experiment has vermilion eyes, miniature wings, or both)?
- Pr( event ) = WAYS EVENT CAN OCCUR / TOTAL NUMBER OF EVENTS
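One way to work the fruit fly questions is to store the four counts and apply "ways the event can occur / total" directly. The sketch below assumes the counts are laid out as eye color (rows) by wing size (columns); the helper name `pr` is hypothetical.

```python
from fractions import Fraction

# counts[(eye color, wing size)] for the 300 offspring
counts = {
    ("normal", "normal"): 140,
    ("normal", "miniature"): 6,
    ("vermilion", "normal"): 3,
    ("vermilion", "miniature"): 151,
}
total = sum(counts.values())  # 300

def pr(event):
    """Pr(event) = number of flies for which event(eye, wing) holds / total."""
    ways = sum(n for (eye, wing), n in counts.items() if event(eye, wing))
    return Fraction(ways, total)

pr_normal_both = pr(lambda eye, wing: eye == "normal" and wing == "normal")
pr_vermilion = pr(lambda eye, wing: eye == "vermilion")
pr_verm_or_mini = pr(lambda eye, wing: eye == "vermilion" or wing == "miniature")

print(pr_normal_both)   # 7/15   (140/300)
print(pr_vermilion)     # 77/150 (154/300)
print(pr_verm_or_mini)  # 8/15   (160/300)
```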
11Independence
- Definition
- Events A and B are independent if Pr( A and B ) = Pr(A) Pr(B)
- Idea
- Does whether A occurs or not give you any information about whether B occurs or not?
- If yes, then A and B are not independent.
12Independence example
Consider the following example from a latex glove manufacturer. Each number is a count of 8 hour manufacturing shifts.

                                           Weather
Defects                                    Raining    Not Raining
8 hour shifts that produce defects            90          80
8 hour shifts that produce no defects       9200       15000

Q: Are defects and weather independent?
13Pr( rain ) = shifts w/ rain / total shifts = (90 + 9200) / (90 + 9200 + 80 + 15000) = .38
Pr( defect ) = shifts w/ defect / total shifts = (90 + 80) / (90 + 9200 + 80 + 15000) = .00698
Pr( defect and rain ) = shifts w/ defects and rain / total shifts = 90 / (90 + 9200 + 80 + 15000) = .00369
.00369 does not equal (.38)(.00698), which is about .0027, so the events are not independent in this sample. (Humidity is related to defects)
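The independence check for the glove data can be reproduced with a short script. This is a descriptive comparison of Pr(defect and rain) against Pr(defect) × Pr(rain) in this sample, not a formal statistical test; the dictionary layout and names are assumptions made for the sketch.

```python
# Shift counts keyed by (defect status, weather)
shifts = {
    ("defect", "rain"): 90,
    ("no defect", "rain"): 9200,
    ("defect", "no rain"): 80,
    ("no defect", "no rain"): 15000,
}
total = sum(shifts.values())  # 24370 shifts

pr_rain = (shifts[("defect", "rain")] + shifts[("no defect", "rain")]) / total
pr_defect = (shifts[("defect", "rain")] + shifts[("defect", "no rain")]) / total
pr_defect_and_rain = shifts[("defect", "rain")] / total

# Under independence these two numbers would be (nearly) the same.
product_of_marginals = pr_rain * pr_defect

print(round(pr_rain, 2))               # 0.38
print(round(pr_defect, 5))             # 0.00698
print(round(pr_defect_and_rain, 5))    # 0.00369
print(round(product_of_marginals, 5))  # 0.00266
```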
14Random Variables
Let X be a number whose value depends on the outcome of a chance event.
Examples
- A poll is asked of 100 people. X = 0 if person 1 answers no and 1 if yes, or X = total number of yeses
- X = measurement of a board with a ruler
- X = weight of a randomly selected cat
15A probability distribution function (pdf) is
associated with every random variable.
- Assume for now that X is discrete (takes values mappable to the integers or a subset of the integers).
- The probability distribution function is Pr( X = a number ) (argument is a number, output is a probability)
- p(k) = Pr( X = k )
Capital letter = random variable
Lower case letter = number
16Properties of pdfs
- p(k) is greater than or equal to 0 for any k
- p(k) is less than or equal to 1 for any k
- sum of p(k) over all possible k's = 1
- The pdf is a model for how X behaves.
- Note that histograms estimate pdfs from data.
- Histograms, sample means, sample variances etc
show how observations of X actually behave.
17Ways to determine PDFs
- Given in a table
- Given by a formula. There are famous ones: binomial, Poisson, hypergeometric, ...
18PDF probabilities in a table
Let X = coffee cart line length at 10am

Line length (k)    Pr(k)
0                  0.10
1                  0.20
2                  0.25
3                  0.30
4                  0.15

Suppose a line of more than 4 people is impossible.
If you observe the line length on several days and make a histogram, then it will be close to pr(k). It gets closer as the number of days increases.
Pr( X 0) Pr( X 3) Pr( X 1) Pr( X
19Associated with PDFs are true Means and Variances (and true std devs)
Idea: the pdf provides a model. True means and variances are attributes of the model.
- True Mean = sum( k p(k) ) where the sum is over all the possible k's.
- True Variance = sum( p(k) (k - mean)^2 ) where the sum is over all possible k's.
- Line length
- Mean = E(X) = 0(.1) + 1(.2) + 2(.25) + 3(.3) + 4(.15) = 2.2
- Variance = Var(X) = (0-2.2)^2(.1) + (1-2.2)^2(.2) + (2-2.2)^2(.25) + (3-2.2)^2(.3) + (4-2.2)^2(.15) = 1.46
- Sample means and Sample variances are calculated from datasets.
- True means and True variances are part of the theoretical model for the data.
- KEY IDEA: as the size of the dataset becomes larger, the Sample means and variances get closer to the true means and variances
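Here is a minimal Python sketch of the true mean and variance for the coffee cart line-length pdf (k = 0..4 with probabilities 0.10, 0.20, 0.25, 0.30, 0.15), followed by a small simulation of the key idea. The simulation is an illustration only: the seed and the sample size of 100,000 are arbitrary choices, not part of the lecture.

```python
import random

# pdf of the line length: k -> p(k)
pdf = {0: 0.10, 1: 0.20, 2: 0.25, 3: 0.30, 4: 0.15}

# True mean and variance are sums over all possible k.
true_mean = sum(k * p for k, p in pdf.items())
true_var = sum(p * (k - true_mean) ** 2 for k, p in pdf.items())

print(round(true_mean, 2))  # 2.2
print(round(true_var, 2))   # 1.46

# Simulate many days of observations and compare with the true values.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.choices(list(pdf.keys()), weights=list(pdf.values()), k=100_000)
sample_mean = sum(sample) / len(sample)
sample_var = sum((x - sample_mean) ** 2 for x in sample) / (len(sample) - 1)

print(sample_mean, sample_var)  # close to 2.2 and 1.46
```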
20Powerball Example

Winners        0    1    2    3    4    5    6
Probability    8%   21%  26%  21%  13%  9%   2%

(These are estimates based on historical data, but assume that they are the truth for the sake of the example.)
Probability that I am a winner if I buy 1 ticket = 1/80 million ( = 1/80M).
Jackpot (pre tax) = $200 million. (If more than 1 person wins, the jackpot is divided.)
Assume whether or not I win is independent of the number of winners.
Let X = millions of dollars I win from one ticket.

PDF
x        0    200         100         66.7        50          40          33.3
pr(x)    ?    .2283/80M   .2826/80M   .2283/80M   .1413/80M   .0978/80M   .0217/80M

1) What does ? equal? (and how did I compute the other pr(x)s)? (see next slide for ans)
2) mu = E(X) = sum( x pr(x) ). In dollars this is about $1.26. (you can confirm this)
3) Var(X) = sum( (x - mu)^2 pr(x) )
4) Interpretation: If I play powerball a lot when there is a $200 million jackpot, then I can expect to win $1.26 per ticket on average.
5) If tickets are a dollar each, why doesn't Powerball lose money? (These numbers are all based on real data.)
21Answers to question on previous slide
- The ? = (80million - 1)/80million
- You know this is true since the probability that I do not win is one minus the probability that I win (and the probability that I win is given to be 1/80 million).
- How did I compute the pr(x)s?
- The probability that I win 200 million = Pr( I win and there is only one winner given that there is at least 1 winner )
  = Pr( I win ) × Pr( there is only one winner given that there is at least one winner )
  = (1/80million) × (0.21 / (.21 + .26 + .21 + .13 + .09 + .02))
- The first step uses independence. The second uses the rule for conditional probability on page 141.
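The Powerball pdf and its mean can be rebuilt from the winner probabilities with a few lines of Python. This sketch follows the reasoning on the slide (independence plus conditioning on at least one winner); the variable names are hypothetical, and the result is only as good as the assumed inputs.

```python
# Pr(number of winners), taken as the truth for the example
winners_pdf = {0: 0.08, 1: 0.21, 2: 0.26, 3: 0.21, 4: 0.13, 5: 0.09, 6: 0.02}

PR_I_WIN = 1 / 80_000_000   # Pr(my one ticket wins)
JACKPOT = 200.0             # pre-tax jackpot, in millions of dollars

# Pr(at least one winner) = .21 + .26 + .21 + .13 + .09 + .02 = 0.92
pr_at_least_one = sum(p for w, p in winners_pdf.items() if w >= 1)

# pdf of X = millions of dollars I win: payout -> probability
x_pdf = {0.0: 1 - PR_I_WIN}   # the "?" entry
for w, p in winners_pdf.items():
    if w >= 1:
        # Pr(I win) * Pr(w winners given at least one winner)
        x_pdf[JACKPOT / w] = PR_I_WIN * p / pr_at_least_one

mu_millions = sum(x * p for x, p in x_pdf.items())
mu_dollars = mu_millions * 1_000_000

print(round(mu_dollars, 2))  # about 1.26
```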
22Cumulative Probability
- A cumulative probability is the probability that X is less than or equal to some number
- Ex: powerball
- Pr( there are 3 or fewer winners ) = Pr( X = 0 or X = 1 or X = 2 or X = 3 ) = Pr( no winners ) + Pr( 1 winner ) + Pr( 2 winners ) + Pr( 3 winners ) = 8% + 21% + 26% + 21% = 76%
- Notation: F(3) = Pr( X <= 3 ) (F is called the Cumulative Distribution Function or CDF)
- If this helps, think of F(k) as the integral of the PDF from 0 to k.
- Note: Pr( X > 3 ) = 1 - Pr( X <= 3 )
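A running sum of the pdf gives the CDF. The sketch below uses the Powerball winner probabilities (8%, 21%, 26%, 21%, 13%, 9%, 2% for 0 through 6 winners); the list layout, with the index equal to the number of winners, is a choice made for illustration.

```python
from itertools import accumulate

# winners_pdf[k] = Pr(X = k) for k = 0..6 winners
winners_pdf = [0.08, 0.21, 0.26, 0.21, 0.13, 0.09, 0.02]

# cdf[k] = F(k) = Pr(X <= k), the running total of the pdf
cdf = list(accumulate(winners_pdf))

F3 = cdf[3]               # Pr(3 or fewer winners)
pr_more_than_3 = 1 - F3   # Pr(X > 3) = 1 - Pr(X <= 3)

print(round(F3, 2))              # 0.76
print(round(pr_more_than_3, 2))  # 0.24
```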
23Graphically
[Figure: PDF for the random variable that represents the number of winners. Pr( X <= 3 ) = sum of the areas of the regions for 0 through 3 winners = 1 - Pr( X >= 4 ) = 1 - sum of the areas of the white regions.]
24Famous PDFs
- Binomial: X ~ bin(n,p)
- Setup
- Let X = number of successes out of n identical trials
- n identical independent trials
- Each trial results in a success w/ probability p or failure with probability q = 1-p
- X could possibly be 0, ..., n
- PDF
- Pr( X = k ) = (n choose k) p^k q^(n-k)
- (n choose k) = number of ways to choose k things from n things = n! / (k! (n-k)!)
- Note that n! = n(n-1)...(2)(1)
- Also, 0! = 1
- Expectation: E(X) = np
- Variance: Var(X) = npq
- StdDev = sqrt(Var(X))
25- Example
- Suppose each person in a 5 person class comes with probability 0.85 (independently of the others).
- Let X = number of people in class on a given day.
- What's the probability 4 people show up one day?
- X ~ bin(5, 0.85)
- Pr( X = 4 ) = (5 choose 4) 0.85^4 0.15^1 = 5 × 0.85^4 × 0.15^1 = 0.3915047
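The binomial formula Pr( X = k ) = (n choose k) p^k q^(n-k) translates directly into Python using the standard library's `math.comb`. The function name `binomial_pmf` is just a label chosen for this sketch.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr(X = k) for X ~ bin(n, p): (n choose k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.85
pr_4 = binomial_pmf(4, n, p)
print(round(pr_4, 7))  # 0.3915047

# Sanity checks: the pdf sums to 1 and its mean is n*p.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
print(round(total, 10), round(mean, 10))  # 1.0 4.25
```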
26Why the binomial pdf is correct
- Example
- 5 Students. Each attends with probability 0.85. What's the probability of exactly 4 successes?
- There are 5 choose 4 ( = 5!/(4!1!) = 5 ) possible configurations of students (YYYYN, YYYNY, etc).
- Each configuration has probability 0.85^4 × 0.15^1 (remember the "and" rule for independent events), where 0.85^4 is the probability of 4 people coming and 0.15^1 is the probability of 1 person not coming.
- Pr( X = 4 ) = 5 × 0.85^4 × 0.15^1 = 39% (we're using the "or" rule here: person 1 doesn't come or person 2 doesn't come, or ...)
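The counting argument can be checked by brute force: list every attendance pattern for 5 students, keep those with exactly 4 present, and add up their probabilities. This is a sketch under the same assumptions as the example (independent students, each attending with probability 0.85).

```python
from itertools import product

p = 0.85   # Pr(a given student attends)

configs = []
pr_exactly_4 = 0.0
for attendance in product("YN", repeat=5):   # all 2^5 = 32 patterns
    if attendance.count("Y") == 4:
        configs.append("".join(attendance))
        # "and" rule: 4 students come and 1 does not
        pr_exactly_4 += p**4 * (1 - p) ** 1  # "or" rule: add over patterns

print(configs)                 # ['YYYYN', 'YYYNY', 'YYNYY', 'YNYYY', 'NYYYY']
print(round(pr_exactly_4, 4))  # 0.3915
```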
27Famous PDFs
- Poisson: X ~ Pois(r)
- Setup
- Let X = number of occurrences of an event in time or space
- Events are expected to occur at rate r
- X could possibly be 0, 1, 2, ...
- PDF
- Pr( X = k ) = r^k e^(-r) / k!
- e is about 2.718
- Note that 0! = 1
- Expectation: E(X) = r
- Variance: Var(X) = r
- StdDev = sqrt(Var(X))
One could show why the Poisson PDF is correct, but the math is more involved. If you're interested, come talk to me sometime.
28- Example
- Inspect an experimental rat's brain for tumorous cells. You expect 10 tumorous cells in 60mm^3 of brain. What's the probability that you see either 2 or 3 tumorous cells in 10mm^3?
- X = # tumors found in 10mm^3 of brain. X ~ Poisson(5/3) (rate per 60mm^3 is 10, so rate per 10mm^3 is 10/6 = 5/3)
- Pr( X = 2 or 3 ) = Pr( X = 2 ) + Pr( X = 3 ) = (5/3)^2 e^(-5/3)/2! + (5/3)^3 e^(-5/3)/3! = 41%
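The tumor example can be confirmed numerically with the Poisson formula Pr( X = k ) = r^k e^(-r) / k!. The function name `poisson_pmf` is chosen for this sketch; only the standard library is used.

```python
from math import exp, factorial

def poisson_pmf(k, r):
    """Pr(X = k) for X ~ Pois(r): r^k e^(-r) / k!."""
    return r**k * exp(-r) / factorial(k)

# 10 cells expected per 60 mm^3, so 10/6 = 5/3 expected per 10 mm^3.
rate = 10 / 6

pr_2_or_3 = poisson_pmf(2, rate) + poisson_pmf(3, rate)
print(round(pr_2_or_3, 2))  # 0.41
```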
29Famous PDFs
- Hypergeometric: X ~ Hyp(N,M,n)
- Setup
- There are a total of N items. M are of type A and N-M are of type B. n items are chosen at random without replacement.
- Let X = number of chosen items that are type A
- Pr( X = k ) = (M choose k)(N-M choose n-k) / (N choose n)
- Remember: (n choose k) = number of ways to choose k things from n things = n! / (k! (n-k)!)
- Note that 0! = 1
- Note that the binomial is like the hypergeometric, but the binomial is with replacement (which results in a fixed p)
30Hypergeometric Example
- Cards: probability of being dealt a flush in hearts in a hand of poker (flush = all cards of same suit)
- X = number of hearts in the hand
- N = 52
- M = 52/4 = 13
- n = 5
- Want Pr( X = 5 ) = (13 choose 5)(39 choose 0) / (52 choose 5) = 1287 × 1 / 2598960 = 0.0004951981 (NOTE THAT THIS NUMBER IS DIFFERENT FROM WHAT I WROTE ON THE BOARD IN THE CLASS)
- What's the probability of getting a flush in any suit?
- (see Minitab: Calc, Probability Distributions, Hypergeometric)
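For those without Minitab at hand, the hearts-flush number can be reproduced with Python's `math.comb`. This is a sketch of the hypergeometric formula Pr( X = k ) = (M choose k)(N-M choose n-k) / (N choose n); the function name is illustrative.

```python
from math import comb

def hypergeometric_pmf(k, N, M, n):
    """Pr(X = k): k type-A items when n are drawn without replacement
    from N items, of which M are type A."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

# N = 52 cards, M = 13 hearts, n = 5 cards dealt, want all 5 to be hearts.
pr_heart_flush = hypergeometric_pmf(5, 52, 13, 5)

print(comb(13, 5), comb(52, 5))   # 1287 2598960
print(round(pr_heart_flush, 10))  # 0.0004951981
```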
31- For each of the following
- What is the random variable?
- What is its distribution and what are the numbers for its parameters?
- What is the probability that is being asked for?
- How can it be computed from the probability distribution function?
32More Examples
- There are 4 security checkpoints. The probability of being searched at any one is 0.2. You may be searched more than once and all searches are independent. What's the probability of being searched at least one time?
- 50 ducks in a flock of 200 are tagged by a wildlife biologist. The next year, 10 ducks from the flock are captured. Assume the flock still has 200 ducks and no tags are lost. What's the probability that at least 5 of the recaptured ducks have tags?
- Suppose a written test has 5 True/False questions. Passing requires at least 3 correct answers, and the test can be taken at most 3 times. (Assume no learning occurs between tests if one fails!)
- If one randomly guesses, what's the probability of passing?
- What's the probability that someone who randomly guesses will eventually pass?
- An overloaded server receives an average of 25 emails per second at 12:00PM. If it receives more than 30 emails in a second, it will crash. What's the probability of a crash at 12:00PM on a given day (based on the traffic in the previous 1 second)?