Title: EMAT 20205 Data Analysis WEEK 2
1 EMAT 20205 Data Analysis, WEEK 2
2 Axioms of Probability
- The probability law (assigning a number to each event E) must satisfy the following axioms
- Nonnegativity: P(E) ≥ 0 for every event E
- Additivity: if E and F are two disjoint events, then the probability of their union satisfies P(E ∪ F) = P(E) + P(F)
- Normalization: the probability of the entire sample space is equal to 1, P(Ω) = 1
3 Some comments
- The maximum value for the probability of an event is 1 (the probability of the entire sample space). This means that the event is CERTAIN
- P(Ω) = 1 means the outcome will be one of the possible outcomes (obviously): e.g. the die roll will certainly give outcome 1, 2, 3, 4, 5 or 6
4 Comments
- An event E is IMPOSSIBLE if it has zero probability, P(E) = 0
- An event is CERTAIN if it has probability P(E) = 1
- The interesting things happen in between
5 Comments on Additivity
- Additivity: if E and F are two disjoint events, then the probability of their union satisfies P(E ∪ F) = P(E) + P(F)
- The probability of E or F is P(E) + P(F)
- E.g. in a die roll, the probability of 1 or 2 is P(1) + P(2)
6 Consequences
- If we use a sample space Ω = {O1, O2, O3, O4, ..., On}, the probabilities of the outcomes Oi must satisfy P(O1) + P(O2) + ... + P(On) = 1
- We will write this sum as Σi P(Oi) = 1
7 Consequences
- From these axioms we can see that the probability of the empty event is 0, P(∅) = 0 (so there MUST be an outcome; think of the die roll example)
8 Probability Law
- We have seen 3 axioms that must be satisfied by the probability assignment to the outcomes (sample space), and some of their consequences
- BUT who gives us the probabilities?
- They are largely an arbitrary design choice (although we will see practical methods)
9 Example
- Think again of the case of the die roll.
- Given our knowledge of physics, and the symmetry of a die, we see no reason why a certain outcome should be more likely than another. So we want P(1) = P(2) = P(3) = P(4) = P(5) = P(6)
- The normalization axiom gives P(i) = 1/6 for each of them
- We can then use these probabilities and the axioms to compute probabilities of more complex events
10 Example
- Coin toss.
- Again no reason to prefer one outcome over another, so P(H) = P(T) = 1/2
- Unless...
11 Frequency information
- Unless we actually know that specific coin (or die) and we know the exact frequency of the outcomes in the last 1000s of experiments
- Possibly the coin is not fair, and we observe 80 head, 20 tail outcomes
- We can incorporate this in the model, assigning P(H) = 0.8, P(T) = 0.2. In the first case we have used our knowledge of the situation; in the second case we have estimated the probabilities by using frequencies
12 Probabilistic Model of Coin Toss
- Sample space is Ω = {H, T}
- Possible events are all subsets: {H, T}, {H}, {T}, ∅ (empty)
- Fair coin? P(H) = P(T) = 0.5
- P({H, T}) = P(H) + P(T) = 1
- P(∅) = 0
- So we have assigned a probability to EACH possible event based on the probabilities of the outcomes, in a way that satisfies all axioms
13 Model: Toss of Three Coins
- Sample space (8 possible outcomes): Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
- We assume they are all equally likely, so we assign to each of them probability 1/8
- The probability law should assign probabilities to EVERY POSSIBLE EVENT
14 Tossing Three Coins
- A possible event: 2 heads occur
- How many outcomes are in this event? {HHT, HTH, THH}
- These are 3 disjoint events, so their union has probability equal to the sum of their probabilities
- P({HHT, HTH, THH}) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8
15Tossing Three Coins
- We can calculate similarly the probability of all
possible events, and this gives a probability law
that satisfies the axioms. - We can see that obtaining 3 heads has probability
1/8, less than observing 2 heads (3/8), and so on
16 Probability law for finite sample spaces
- For finite sample spaces, we specify the probability law by just assigning probabilities to the individual outcomes
- Often the outcomes are equiprobable; then P(E) = (number of outcomes in E) / (total number of outcomes), as in the sketch below
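A minimal MATLAB sketch of this counting rule (MATLAB is the language used later in this deck); the event E = {1, 2} is an assumed example:

    % Equiprobable outcomes: P(E) = (outcomes in E) / (total outcomes)
    Omega = 1:6;                        % sample space of a fair die roll
    E = [1 2];                          % example event: outcome is 1 or 2
    P_E = length(E) / length(Omega)     % returns 1/3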
17 Continuous Sample Space
- In the case of the dart and target, things are different
- If each outcome is a point, its probability cannot be bigger than zero, else the total probability would exceed one
- Solution: outcomes must be (infinitesimally) small areas, not points
- Do not worry too much about this for now
18 Properties of Probability Law
Assume: area of set = probability of event!
19 Using Probabilistic Models
- Say we want to model an uncertain situation (e.g. an experiment)
- We first decide on a sample space and a probability law. This step is somewhat arbitrary, and fully specifies the model.
- Then, operating within the model, we derive the probabilities of the events of interest, or other properties. This is fully unambiguous.
20 Example
- We want to choose a day in 2009 on which to organize a picnic
- We want to avoid rain, cold and traffic
- These are three possible events (day = rain, day = cold, day = traffic), not mutually exclusive
21 Assume this is a generic month. A random day will have values for R, C, T; we can compute the probability for R (rain), or for nT (not traffic), but also for R AND T
22Event RAIN
23Event COLD
24Event TRAFFIC
25 Unions and Intersections of Events
- We may want to calculate the probability of randomly selecting a day that is both not-rainy and not-cold
- Today we talk of probabilities of COMBINATIONS of events
26 Intersection of Events
- Probability that BOTH events occur simultaneously
- We DEFINE A NEW EVENT consisting of the outcomes that are in both events E and F, and we calculate its probability
- New event: G = E ∩ F
- The probability of both events occurring is P(E ∩ F)
27 Intersection of Events
- The probability of this event is the sum of the probabilities of the outcomes that are both in E and in F (e.g. the fraction of days that are both R and T)
- Two events are mutually exclusive (or disjoint) if their intersection is empty (e.g. R and nR are disjoint)
28 [Venn diagram: event "Rain AND Cold" as the intersection of the rain and cold sets]
29 Union of Events
- We want to calculate the probability that at least one of the events E and F occurs
- This is the probability of the union event G = E ∪ F
- The probability of G is the sum of the probabilities of the outcomes that are in either E or F (e.g. the number of days that are either R or C)
30 [Venn diagram: event "Rain OR Cold" as the union of the rain and cold sets]
31 Other combinations
- We can consider the probability of being in E and not in F by considering the probability of being in E and in F^C (the complement of F)
32 Dice Example
- Event E = {1, 2, 3}: outcome is small (at most 3)
- Event F = {2, 4, 6}: outcome is an even number
- Probability of being either even OR small?
- Probability of being even AND small? (worked out below)
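A worked answer, sketched with the counting rule from slide 16 (note the shared outcome 2 is counted only once in the union):

$$E \cup F = \{1,2,3,4,6\},\quad P(E \cup F) = \tfrac{5}{6}; \qquad E \cap F = \{2\},\quad P(E \cap F) = \tfrac{1}{6}.$$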
33 Day counts for the 30-day month:
R,C,T: 6 (6/30)     nR,C,T: 0 (0/30)
R,nC,T: 16 (16/30)  nR,nC,T: 1 (1/30)
R,C,nT: 0 (0/30)    nR,C,nT: 6 (6/30)
R,nC,nT: 1 (1/30)   nR,nC,nT: 0 (0/30)
34 Important
- Calculate the joint probabilities from the table
- P(R,C,nT) = 0/30
- P(R,C,T) = 6/30
- P(R,C) = P(R,C,nT) + P(R,C,T) = 6/30
35 Conditional Probability
- What is the probability of rain in this month? (count all rainy days and divide by 30)
- P(R) = (number of rainy days) / (number of days)
- What is the probability of rain given that it is cold?
36 Conditional Probability
- Outcomes of the experiment: days
- Being a cold day is an event
- Being a rainy day is an event
- Probability of being cold AND rainy?
- Cold AND NOT rainy?
- NOW: is it more likely to be cold on rainy days?
- What about COLD given that it is RAINY?
37 Day counts again:
R,C,T: 6 (6/30)     nR,C,T: 0 (0/30)
R,nC,T: 16 (16/30)  nR,nC,T: 1 (1/30)
R,C,nT: 0 (0/30)    nR,C,nT: 6 (6/30)
R,nC,nT: 1 (1/30)   nR,nC,nT: 0 (0/30)
- P(d is rainy | d is cold)?
- P(d is cold) = 12/30
- P(d is rainy) = 23/30
- P(d is rainy and cold) = 6/30
38 Is it more likely to have rain on cold days?
- P(rain) = 23/30
- What is the rain probability AMONG THE COLD DAYS?
- The probability of rain given cold is P(rain|cold) = P(rain AND cold) / P(cold)
- P(rain|cold) = 6/12 = 0.5 (see the sketch below)
39 Definition
- We define the conditional probability of E given F as P(E|F) = P(E ∩ F) / P(F)
- Given that F is true, what is the probability of E?
- In a way, we restrict to the case when only F exists; F is the universe here
40 Conditional probability
- We can consider the conditional probability P(E|F) as a new probability law defined on a new universe, F
- P(F|F) = 1
- All other axioms also remain valid
41Properties of Conditional Probability
- It satisfies all the axioms to be a probability
law
42 Properties of Conditional Probability
- Definition: P(E|F) = P(E ∩ F) / P(F)
- This can be seen as a new probability law in the restricted universe F
- For finite sample spaces with equiprobable outcomes: P(E|F) = (number of outcomes in both E and F) / (number of outcomes in F)
43 Independent Events
- We define 2 independent events as follows
- Independent events: P(E|F) = P(E)
44 Independent Events
- 2 independent events: rain and Monday
- 2 dependent events: rain and January
- 2 dependent (?) events: traffic and Monday
- 2 independent events: January and Monday
In theory (not sure about our finite dataset)
45 Bayes Theorem
- Calculation
- P(cold) = 12/30 = 2/5
- P(traffic) = 6/30 = 1/5
- P(cold AND traffic) = 6/30 = 1/5
- P(cold|traffic) = 1
- P(traffic|cold) = 1/2
46 Bayes Theorem
P(cold|traffic) P(traffic) = P(cold AND traffic)
- Calculation
- P(cold) = 12/30 = 2/5
- P(traffic) = 6/30 = 1/5
- P(cold AND traffic) = 6/30 = 1/5
- P(cold|traffic) = 1
- P(traffic|cold) = 1/2
1 × 1/5 = 1/5
P(traffic|cold) P(cold) = P(traffic AND cold)
1/2 × 2/5 = 1/5
P(cold|traffic) P(traffic) = P(traffic|cold) P(cold)
47 Bayes Theorem
- P(cold|traffic) P(traffic) = P(traffic|cold) P(cold)
- P(cold|traffic) = P(traffic|cold) P(cold) / P(traffic)
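In symbols, the two product forms above (with C = cold, T = traffic) combine into Bayes' theorem:

$$P(C \mid T)\,P(T) \;=\; P(C \cap T) \;=\; P(T \mid C)\,P(C) \quad\Longrightarrow\quad P(C \mid T) \;=\; \frac{P(T \mid C)\,P(C)}{P(T)}.$$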
48 Independent Events
- P(E|F) = P(E)
- E independent of F
49 Independent Events
- Since it was P(E|F) = P(E ∩ F) / P(F)
- And we are assuming P(E|F) = P(E)
- it follows that for independent events P(E ∩ F) = P(E) P(F)
50 Independent Events
- If E and F are independent, so are E and F^C
51 Independence of 3 events
- E, F, G are independent if every subset of these 3 events is independent
- E, F are independent
- E, G are independent
- F, G are independent
- And P(E ∩ F ∩ G) = P(E) P(F) P(G)
52 Independent Events
- We can decompose joint probabilities as P(E,F,G) = P(E) P(F) P(G) if they are independent
- Otherwise, we should write P(E,F,G) = P(E|F,G) P(F|G) P(G)
53 Bernoulli Trials
- Toss a coin N times
- Probability of starting with H: 1/2
- Probability of starting with HH: 1/2 × 1/2
- ...
- Probability of N consecutive H: (1/2)^N
54MATLAB INTERLUDE
- INTERSECT Set intersection.
- INTERSECT(A,B) when A and B are vectors returns
the values common to both A and B. The result
will be sorted. A and B can be cell arrays of
strings.
55MATLAB INTERLUDE
- UNION Set union.
- UNION(A,B) when A and B are vectors returns the
combined values from A and B but with no
repetitions. The result will be sorted.
56 MATLAB INTERLUDE
- FIND Find indices of nonzero elements.
- I = FIND(X) returns the linear indices corresponding to the nonzero entries of the array X.
- X may be a logical expression.
- So you can find elements in a set with a given property, and make a new set
57MATLAB INTERLUDE
- LENGTH Length of vector.
- LENGTH(X) returns the length of vector X. It is
equivalent to MAX(SIZE(X)) for non-empty arrays
and 0 for empty ones.
58 MATLAB INTERLUDE
- You can use these set commands to count the elements in various sets, and hence to compute probabilities, as in the sketch below
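A sketch combining the four commands on the dice example from slide 32 (the variable names are illustrative):

    % Compute P(small OR even) and P(small AND even) for a fair die
    Omega = 1:6;                          % sample space
    E = find(Omega <= 3);                 % event "small" (outcome at most 3)
    F = find(mod(Omega, 2) == 0);         % event "even"
    P = @(A) length(A) / length(Omega);   % equiprobable-outcome law
    P_union = P(union(E, F))              % returns 5/6
    P_inter = P(intersect(E, F))          % returns 1/6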
59 Topics
- Modeling with Random Variables
- Discrete Random Variables
- Events and Probability Mass Functions
- Examples of RVs
- Bernoulli
- Binomial
- Geometric
- The concept of Expectation
60 RANDOM VARIABLES
- We have studied Probabilistic Models in general, and the notions of outcome, sample space and event.
- Now an important special case: in many probabilistic models the outcomes are NUMBERS, or can be associated with numbers
61 RANDOM VARIABLES
- Examples of numerical outcomes
- How many people showed up today?
- How many are sitting next to a statistics major?
- How many days of rain in January?
- Temperature on a given day?
- OR we can ASSOCIATE numerical values to non-numerical outcomes
62 RANDOM VARIABLES
- Associating numerical values to non-numerical outcomes: HOMEWORK EXPERIMENT
- Outcome: the homework
- Sample space: set of all possible answers you COULD have given
- Associated numerical value: the GRADE
63 RANDOM VARIABLES
- Easier model: multiple-choice quiz
- 10 questions, 3 choices each (A, B, C)
- Experiment: give the test to a student
- Outcome: a string of 10 symbols
- Sample space: set of all possible 10-symbol strings
- Numeric value: the grade assigned to each string (some form of distance to the correct string)
64RANDOM VARIABLES
- We call RANDOM VARIABLE a real-valued function of
the outcome of an experiment - Given an experiment, and the corresponding set of
possible outcomes, a random variable associates a
particular number with each outcome
65 RANDOM VARIABLES
- Example
- Sample space: {AAA, AAB, AAC, ...}
- Random variable: AAA → 3, AAB → 2, AAC → 3, ...
- This could be a model of grading a test
66 RANDOM VARIABLES
- Why are RANDOM VARIABLES important?
- They allow us to model uncertain situations in a quantitative way; we will talk about the EXPECTED temperature on January 25, or the EXPECTED number of students that will pass the test, etc.
- We can also talk about expected deviations from this estimate
67 RANDOM VARIABLES (continuous vs discrete)
- A random variable is called discrete if its range (the set of values it can take) is finite or COUNTABLY infinite
- It is called continuous if, for example, its range is the real axis (but we will not deal with this case today)
68 RANDOM VARIABLES
- Examples of discrete random variables
- Number of things (number of tails in 1000 coin tosses)
- Number of minutes this class will last
- Roll of 2 dice: the sum or product of the outputs is a discrete random variable
69 RANDOM VARIABLES
- The 2-dice example
- Let us call the six faces of a die A, B, C, D, E, F
70 RANDOM VARIABLES
- Let us consider the following random variable N associated with one die: N(A) = 1, N(B) = 2, N(C) = 3, N(D) = 4, N(E) = 5, N(F) = 6
71 RANDOM VARIABLES
- Sample space of the 2-dice experiment:
AA, AB, AC, AD, AE, AF,
BA, BB, BC, BD, BE, BF,
CA, CB, CC, CD, CE, CF,
DA, DB, DC, DD, DE, DF,
EA, EB, EC, ED, EE, EF,
FA, FB, FC, FD, FE, FF
72 RANDOM VARIABLES
- Sum random variable: AA → 1+1 = 2 = S(AA); AB → 1+2 = 3 = S(AB); ...; FF → 6+6 = 12 = S(FF)
- Range of the random variable: {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
73 RANDOM VARIABLES
- Similarly we can define the random variable PRODUCT, etc.
- So after the same experiment (rolling 2 dice) we may define different random variables (sum, absolute difference, product, max, min, etc. of the two individual outcomes)
- Whatever attaches a numeric value to the OUTCOME of the experiment is a RANDOM VARIABLE
74 RANDOM VARIABLES: important concepts
- A discrete random variable is a real-valued function of the outcome of the experiment that can take a finite or countably infinite number of values
- A function of a discrete random variable defines another random variable
- We will define the MEAN and VARIANCE of a random variable
- We will define independence and all the other concepts we defined in the previous classes
75 RANDOM VARIABLES
- For discrete random variables we will define PROBABILITY MASS FUNCTIONS, which are probability laws that assign a probability to each possible numerical value the random variable can assume
- It will be analogous to what we have done so far
76 RANDOM VARIABLES: notation
- We will denote by uppercase letters (X) the random variable, and by lowercase letters (x) the actual value it assumes in a given experiment
- So we will talk, for example, about the probability that X = x, and we will write it P(X = x)
77 RANDOM VARIABLES
- Look at the website of the course, where we publish the statistics about the past homework
- Random variable: GRADE, G
- A particular grade: g
- For example we can talk about P(G = 27)
83 Probability Mass Function (PMF)
- The most important way to characterize a random variable is through the probabilities of the values that it can take
- For the random variable X, these are given by the PMF of X, denoted pX
- If x is any possible value of X, the probability mass of x, pX(x), is the probability of the event {X = x}, consisting of all outcomes that give rise to a value of X equal to x
- pX(x) = P(X = x)
84 PMF
- Example experiment: tossing 2 fair coins
- Random Variable X: number of heads obtained (range 0, 1, 2)
- Compute the PMF of X:
pX(x) = 1/4 if x = 0; 1/2 if x = 1; 1/4 if x = 2; 0 otherwise (impossible)
85 PMF
- Event X = 0 ↔ corresponding outcome TT
- Event X = 1 ↔ corresponding outcomes HT or TH
- Event X = 2 ↔ corresponding outcome HH
- Each outcome has probability 1/4
- Hence the probabilities given before
- (grouping outcomes based on the value of the random variable is a way to define events)
86 PMF
- Some properties: since the events corresponding to each value of the random variable must be disjoint, and form a partition of the sample space,
- from the probability axioms we obtain Σx pX(x) = 1
87 PMF
- By a similar argument, we have for any set S of possible values of X: P(X ∈ S) = Σ_{x ∈ S} pX(x)
- In the coin example before, we can say the probability of at least 1 head is 3/4 (prob. of 1 head + prob. of 2 heads)
91Functions of Random Variables
- One can generate new random variables as
functions of random variables
92 CALCULATION OF THE PMF OF A RANDOM VARIABLE X
- For each possible value x of X:
- Collect all the possible outcomes that give rise to the event {X = x}
- Add their probabilities to obtain pX(x)
- THIS IS IMPORTANT!! (see the sketch below)
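A MATLAB sketch of this recipe for the two-coin example of slide 84 (the encoding of outcomes as character rows is an assumption of the sketch):

    % PMF of X = number of heads in 2 fair coin tosses
    outcomes = ['HH'; 'HT'; 'TH'; 'TT'];   % sample space, one outcome per row
    pOutcome = ones(4,1) / 4;              % equiprobable outcomes
    X = sum(outcomes == 'H', 2);           % value of X for each outcome
    pX = zeros(1,3);
    for x = 0:2
        pX(x+1) = sum(pOutcome(X == x));   % add probabilities of outcomes with X = x
    end
    pX                                     % returns [0.25 0.5 0.25]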
93 Example
- Probability of having HW grade larger than 30?
- P(G > 30) = P(G = 31) + P(G = 32) + ... + P(G = 40)
- For each probability: count the number of outcomes and divide by the total sample space size
94 Expectation
- The PMF of a random variable provides us with several numbers: the probabilities of all possible values of X
- We would like to summarize this in a few numbers that represent the PMF
- One such number is the EXPECTATION
95 Expectation
- Expected value of X = weighted average of all possible values of X (using probabilities as weights)
96 Expectation
- Suppose you roll a die many times, and each time you receive as many dollars as the outcome of the die roll
- How much money would you expect for each roll?
- We need to specify these terms
97 Expectation
- Suppose you roll the die K times, and Ki is the number of times the outcome is i
- Sample space: {1, 2, 3, 4, 5, 6}
- The total amount of money you receive is K1·1 + K2·2 + ... + K6·6
98 Expectation
- The total amount in K rolls is Σi i·Ki
- So the amount per roll is (1/K) Σi i·Ki
99 Expectation
- If we have been rolling the die many times (K is very large), we can approximate the probability of an outcome with its frequency: pi ≈ Ki/K
- Then we can write the expected amount of money as Σi i·pi
100 Expectation
- We define the expected value (expectation, or mean) of a random variable X, with PMF pX, by E[X] = Σx x·pX(x)
101 Expectation
- Remark: we can consider this as the center of gravity of the distribution (see the sketch below)
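A one-line MATLAB sketch of the weighted average for the fair-die example:

    % Expected value of a fair die roll
    x = 1:6;                  % possible values
    px = ones(1,6) / 6;       % PMF of a fair die
    EX = sum(x .* px)         % returns 3.5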
102 Variance
- Another important quantity that describes a PMF.
- Expectation: we know the average behavior of the random variable
- But how often does the random variable deviate from the average behavior?
103 Variance
- Let us create a NEW random variable describing the deviation of X from its mean E[X], and let us study it
- What is the expected value of the random variable (X − E[X])²?
104 Variance
- New random variable: (X − E[X])²
- Its expectation E[(X − E[X])²] = Var(X) is called the variance of X
- It is always nonnegative
- It provides a measure of the dispersion of X around its mean
105Variance
106 Variance
- Another related measure of dispersion is the standard deviation of X, defined as the square root of the variance
- From a practical viewpoint, the STD is easier to use because it has the same units as X (i.e. if X is in meters, the STD will be in meters, while Var(X) is in square meters)
107 Calculation of Variance
- We can just study the expectation of the R.V. Z = (X − E[X])²
- Tabulate the values of X and the corresponding values of Z
- Var(X) = E[Z]
108 Expected Value of Functions of Random Variables
- Let X be a random variable with PMF pX(x), and let g(X) be a function of X
- The expected value of the random variable g(X) is E[g(X)] = Σx g(x)·pX(x)
109 Variance
- So the variance can be calculated as Var(X) = Σx (x − E[X])²·pX(x)
110 Properties of Mean and Variance
- Let X be a random variable and let us consider the linear function Y = aX + b, where a, b are given scalars. Then
- E[Y] = a·E[X] + b
- Var(Y) = a²·Var(X)
- THIS holds ONLY if g(X) is linear!!
111 A useful relation (variance as a function of moments)
- Var(X) = E[(X − E[X])²]
- Var(X) = E[X²] − (E[X])²
- Proof: SEE LATER SLIDES FOR THE FULL PROOF. Use the relation E[g(X)] = Σx g(x)·pX(x)
114 A useful relation
- Var(X) = E[(X − E[X])²]
- Var(X) = E[X²] − (E[X])²
- Proof: either as HW or with the TAs. Use the relation E[g(X)] = Σx g(x)·pX(x)
115 Variance Calculation
- Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²
We will use this a lot; a sketch of the proof follows
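The proof is two lines, expanding the square and using linearity of expectation (E[X] is a constant):

$$\operatorname{Var}(X) = E\!\left[X^2 - 2X\,E[X] + (E[X])^2\right] = E[X^2] - 2\,E[X]\,E[X] + (E[X])^2 = E[X^2] - (E[X])^2.$$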
116 Covariance of 2 RVs
- In probability theory and statistics, covariance is a measure of how much two variables change together (variance is a special case of covariance, when the two variables are identical).
- If two variables tend to vary together (that is, when one of them is above its expected value, the other variable tends to be above its expected value too), then the covariance between the two variables will be positive. On the other hand, if when one of them is above its expected value the other variable tends to be below its expected value, then the covariance between the two variables will be negative.
- from Wikipedia
117 Covariance of 2 RVs
- The covariance between two real-valued random variables X and Y, with expected values E(X) = m and E(Y) = n, is defined as
- Cov(X, Y) = E[(X − m)(Y − n)]
118 In Matlab
- COV Covariance matrix.
- COV(X), if X is a vector, returns the variance. For matrices, where each row is an observation and each column a variable, COV(X) is the covariance matrix. DIAG(COV(X)) is a vector of variances for each column, and SQRT(DIAG(COV(X))) is a vector of standard deviations. COV(X,Y), where X and Y are matrices with the same number of elements, is equivalent to COV(X(:), Y(:)).
119 Correlation Coefficient
- ρ(X, Y) = Cov(X, Y) / (σX σY), where σX and σY are the standard deviations of X and Y (from Wikipedia)
120 Correlation Coefficient Between 2 Random Variables
- CORRCOEF Correlation coefficients.
- R = CORRCOEF(X) calculates a matrix R of correlation coefficients for an array X, in which each row is an observation and each column is a variable.
- R = CORRCOEF(X, Y), where X and Y are column vectors, is the same as R = CORRCOEF([X Y]).
- If C is the covariance matrix, C = COV(X), then CORRCOEF(X) is the matrix whose (i,j)'th element is C(i,j)/SQRT(C(i,i)*C(j,j)).
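A short usage sketch of both commands; the data are randomly generated for illustration:

    % Sample covariance and correlation of two related vectors
    x = randn(100, 1);          % hypothetical data
    y = 2*x + randn(100, 1);    % positively related to x
    C = cov(x, y);              % 2x2 matrix; C(1,2) is Cov(x,y)
    R = corrcoef(x, y);         % R(1,2) = C(1,2)/sqrt(C(1,1)*C(2,2)), close to +1 here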
122 EXTRA MATERIAL BELOW THIS POINT
- WHAT FOLLOWS IS EXTRA MATERIAL FOR REFERENCE
- Not covered in class 1 of week 2 (refers to class 2 of week 2)
126 Bernoulli Random Variable
- Consider the toss of a (generally not fair) coin: probability of H = p, probability of T = 1 − p
- The BERNOULLI random variable is an RV that takes the two values 0 or 1 depending on whether the outcome is H or T (remember: an RV is a function of the outcome)
- X = 1 if the outcome is H; X = 0 if the outcome is T
127 Bernoulli Random Variable
- The PMF of this Bernoulli RV is
- pX(x) = p if x = 1; 1 − p if x = 0
- A very important RV for modeling any generic situation with just 2 outcomes, e.g. the outcome of the football match on Sunday, ...
128 Binomial Random Variable
- Experiment: n coin tosses, each one with prob(H) = p, prob(T) = 1 − p
- The random variable X is the number of heads in the n-toss sequence
- We refer to X as a BINOMIAL RANDOM VARIABLE WITH PARAMETERS n AND p
129 Binomial Random Variable
- The PMF of X consists of the binomial probabilities we saw some time ago: pX(k) = C(n,k) p^k (1−p)^(n−k)
- Two parts:
- p^k (1−p)^(n−k): probability of a sequence with k heads and n−k tails
- C(n,k): number of sequences with k heads and n−k tails
130 Binomial Random Variable
- The normalization property can be written as Σ_{k=0}^{n} C(n,k) p^k (1−p)^(n−k) = 1 (checked numerically below)
- We will study this more in the future
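A MATLAB sketch of the binomial PMF and its normalization (n and p are arbitrary example values):

    % Binomial PMF: pX(k) = C(n,k) p^k (1-p)^(n-k)
    n = 10;  p = 0.3;
    k = 0:n;
    pX = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k);
    sum(pX)                     % normalization check: returns 1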
131 Geometric Random Variable
- We repeatedly toss the same coin as before.
- RV: number of tosses until the first head comes up
- TTTTTTTH
- TTH
- H
- TTTTTTTTTTTTTTTTTTTTTTTTTTTTH
132 Geometric Random Variable
- PMF: two parts, the probability of the prefix of k − 1 tails and the probability of the final H: pX(k) = (1−p)^(k−1) p
- Normalization: Σ_{k≥1} (1−p)^(k−1) p = 1 (see below)
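The normalization is a geometric series:

$$\sum_{k=1}^{\infty} (1-p)^{k-1}\,p \;=\; p \cdot \frac{1}{1-(1-p)} \;=\; 1.$$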
133 Geometric Random Variable
- This can model the process of you trying to connect with a modem to an internet service provider (how many failures before 1 success?)
134Poisson Random Variable
138Conclusion
- Random Variables
- Probability Mass Functions
- How to calculate PMFs
- Bernoulli
- Binomial
- Geometric
- Poisson ?
149 Mean and Variance of the Bernoulli
- E[X] = 1·p + 0·(1−p) = p
- E[X²] = 1²·p + 0²·(1−p) = p
- Var(X) = E[X²] − (E[X])² = p − p² = p(1−p)
150 Uniform Distribution: die roll
159 Two Important Series
- We do not derive them here. We will apply these to calculations of variance (the formulas, lost in this transcript, are presumably the two standard sums given below)
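The two series used in the uniform-variance calculation that follows appear to be the standard sums:

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}.$$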
161 Uniform Distribution: die roll
- Discrete uniform PMF over {a, ..., b} (the case of the die roll): pX(k) = 1/(b − a + 1) for k = a, ..., b
162 Uniform
The expectation is E[X] = (a + b)/2. This can be seen directly, since the PMF is symmetric around (a + b)/2. Or use the series given before...
Die example: 1 + 2 + 3 + 4 + 5 + 6 = 21. Direct computation of the expectation: 21/6 = 3.5. The formula says (1 + 6)/2 = 3.5
163 Variance of the Discrete Uniform
- We first study the case where a = 1, b = n; the general case will reduce to this
- We will use the relation Var(X) = E[X²] − (E[X])², with E[X²] = (1/n) Σ_{k=1}^{n} k² = (n+1)(2n+1)/6 (you can verify the sum formula by induction, or just believe it)
164 Variance of the Discrete Uniform
This works out to Var(X) = (n² − 1)/12. Notice we are still working with the special case a = 1, b = n
165 Variance of the Discrete Uniform
- Now we can study the general case by SHIFTING: shifting a distribution does not change its variance (so we can study the {a, ..., b} case by studying the variance of the {1, ..., b − a + 1} case)
- So setting n = b − a + 1 in the previous equation gives the general case: Var(X) = ((b − a + 1)² − 1)/12
166 Variance of the Discrete Uniform
- Example: I get $1 for each point on the die; I can expect $3.50 at each roll, with a standard deviation of sqrt(35/12) ≈ 1.7
170 QUESTION
- There are 94 students
- Each has probability 1/3 of getting an A
- The number of students that get an A is a random variable
- What is its mean? (how many are expected to get an A?)
171Mean of the Binomial
- If we want the mean of the binomial, we first
need to learn how to handle JOINT PMFs of
MULTIPLE RANDOM VARIABLES
172 JOINT PMFs of MULTIPLE RANDOM VARIABLES
- Consider 2 discrete random variables, X and Y, associated with the same experiment
- The probabilities of the values that X and Y can take are captured by the JOINT PMF of X and Y, written pX,Y
- pX,Y(x, y) = P(X = x, Y = y)
173 JOINT PMF of 2 RVs
- (if we consider the pair X, Y as a random variable, all ideas transfer)
- If A is an event (a set of pairs (x, y) that have a certain property) then
P((X, Y) ∈ A) = Σ_{(x,y) ∈ A} pX,Y(x, y)
174 Students
- Consider the random variable Xi that is 1 if student i gets an A, and 0 otherwise
- With n students, each with probability p, the expected number of A grades is E[X1 + ... + Xn] = n·p (see the sketch below)
175Conclusion
- Mean of Random Variables
- Variance of Random Variables
- Properties, relations for variance and moments
- Bernoulli
- Discrete Uniform,
- General Methods for variance calculation
177 Topics
- Some probability distributions
- Some real applications: decision making, modeling clashes between ants, modeling the distribution of ping times
178 Marginalization
- For a fixed value y, pY(y) = Σx pX,Y(x, y)
- Using the definition of conditional probability, we have pX|Y(x|y) = pX,Y(x, y) / pY(y)
179Random Variables
- Joint probability
- Conditional probability
- Independence
180 Joint Probability
- It is common for several random variables to be defined on the same sample space. If X and Y are random variables, the function f(x, y) = Pr{X = x and Y = y} is the joint probability mass function of X and Y.
181 Independent Random Variables
- We define two random variables X and Y to be independent if, for all x and y, the events {X = x} and {Y = y} are independent or, equivalently, if for all x and y we have Pr{X = x and Y = y} = Pr{X = x} Pr{Y = y}.
182Functions of Random Variables
- Given a set of random variables defined over the
same sample space, one can define new random
variables as sums, products, or other functions
of the original variables.
183 Expected value of a random variable
- The simplest and most useful summary of the distribution of a random variable is the "average" of the values it takes on. The expected value (or, synonymously, expectation or mean) of a discrete random variable X is E[X] = Σx x·Pr{X = x}
184 Expectation of joint RVs
- Given random variables X and Y, and given their joint PMF P(X = x and Y = y), what is the expectation of their product, E[XY]?
- Easy if they are independent
185 Expectation of Joint Independent RVs: E[XY] = E[X]·E[Y]
186 In general
- In general, when n random variables X1, X2, ..., Xn are mutually independent, E[X1 X2 ··· Xn] = E[X1] E[X2] ··· E[Xn].
187 More about independent RVs
- When X and Y are independent random variables,
- Var[X + Y] = Var[X] + Var[Y]
- (whereas for ANY random variables the expectation of the sum is the sum of the expectations, that is, E[X + Y] = E[X] + E[Y])
188 The Geometric Distribution
- A coin flip is an instance of a Bernoulli trial, which is defined as an experiment with only two possible outcomes: success, which occurs with probability p, and failure, which occurs with probability q = 1 − p.
- When we speak of Bernoulli trials collectively, we mean that the trials are mutually independent and that each has the same probability p of success.
- Two important distributions arise from Bernoulli trials: the geometric distribution and the binomial distribution.
189 Geometric Distribution
- Take a sequence of Bernoulli trials, each with a probability p of success and a probability q = 1 − p of failure.
- How many trials occur before we obtain a success?
190 Geometric Distribution
- Let the random variable X be the number of trials needed to obtain a success. Then X has values in the range {1, 2, ...}, and Pr{X = k} = q^(k−1) p (for k > 0), since we have k − 1 failures before the one success.
- A probability distribution satisfying this equation is said to be a geometric distribution.
191 Geometric Distribution
- This is the geometric distribution (picture taken from Cormen, Leiserson and Rivest's book on Algorithms)
- In this case, the coin has probability p = 1/3 of success and a probability q = 1 − p of failure
192 Geometric distribution
- Expectation: we can use the relation Σ_{k≥1} k·x^(k−1) = 1/(1 − x)²
- which holds when the summation is infinite and x < 1 (applied below)
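Applying that relation with x = q gives the expectation:

$$E[X] = \sum_{k=1}^{\infty} k\,q^{k-1}\,p = p \cdot \frac{1}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p}.$$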
193 Geometric Distribution
The expectation of the distribution is 1/p = 3 (for p = 1/3).
194 Geometric Distribution
- The variance, which can be calculated similarly, is Var[X] = q/p²
- Example: repeatedly roll two dice until we obtain either a seven or an eleven. Of the 36 possible outcomes, 6 yield a seven and 2 yield an eleven. Thus, the probability of success is p = 8/36 = 2/9, and we must roll 1/p = 9/2 = 4.5 times on average to obtain a seven or an eleven.
- NEXT WEEK we will implement things like this.
195 BINOMIAL DISTRIBUTION
- How many successes occur during n Bernoulli trials, where a success occurs with probability p and a failure with probability q = 1 − p?
196 Binomial Distribution
- Define the random variable X to be the number of successes in n trials. Then X has values in the range {0, 1, ..., n}, and for k = 0, ..., n, Pr{X = k} = C(n,k) p^k q^(n−k),
- since there are C(n,k) ways to pick which k of the n trials are successes, and the probability that each such sequence occurs is p^k q^(n−k). A probability distribution satisfying this equation is said to be a binomial distribution.
197 Binomial Distribution
- Let Xi be the random variable describing the number of successes in the ith trial. Then E[Xi] = p·1 + q·0 = p, and by linearity of expectation, the expected number of successes for n trials is E[X] = Σi E[Xi] = n·p
198 Binomial Distribution
- Similarly we can proceed for the variance, exploiting the relation Var[X] = E[X²] − E²[X]
- Since Xi only takes on the values 0 and 1, we have E[Xi²] = E[Xi] = p
- And hence Var[Xi] = p − p² = pq
- Then we can use independence to move from Var[Xi] to the variance of the binomial: Var[X] = Σi Var[Xi] = n·p·q
- The binomial distribution increases as k runs
from 0 to n until it reaches the mean np, and
then it decreases. - Picture from cormen, leiserson, rivests book
200Binomial Distribution
201 Conclusion
- Conditional PMFs of RVs
- Independence
- Expectation and Variance for RVs
- Geometric distribution
- Binomial distribution
- Next we will implement all of these ideas
202- EXTRA MATERIAL (NOT COVERED IN CLASS)
203 Cards
[table of the 52 cards: 4 suits (♠ ♥ ♦ ♣) × 13 ranks: Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King]
205 Counting
- Probability of generating a growing sequence of cards (1, 2, 3, 4, 5, 6, 7, 8, 9, ...)
- Probability of starting with a 1 × probability of then having a 2 × ... × probability of having a king
206COUNTING METHODS
- How many ways to obtain K heads and N-K tails in
N coin tosses ? - How many ways to have a 4-of-a-kind ?
207 Basic Counting
- Two experiments are performed. The first one can have any one of N possible outcomes, the second one any of M possible outcomes.
- There are M·N possible outcomes for the two experiments considered together
208 Basic Counting
- How many different arrangements of the letters A, B, C are possible?
- ABC, ACB, BAC, BCA, CAB, CBA
- Each arrangement is known as a PERMUTATION.
- There are 6 possible permutations of a set of 3 objects
- There are N! permutations of a set of N objects: N! = N·(N−1)·(N−2)···3·2·1
209 Combinations
- How many different groups of M objects can I form from a total of N objects? (e.g. how many groups of 5 cards can I form from a deck of 52?)
- (there are 52 ways to select the first card, 51 to select the second, ...; but we count each group once for each of its possible orderings, so we need to correct for this)
- (52·51·50·49·48)/(5·4·3·2·1)
210 Combinations
- Ways of choosing k elements out of a set of n elements: C(n, k) = n! / (k!(n − k)!) (see the sketch below)
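In MATLAB the binomial coefficient is nchoosek; a few values used in this deck:

    nchoosek(10, 3)    % sets of 3 numbers out of 10:      120
    nchoosek(10, 5)    % committees of 5 out of 10 people: 252
    nchoosek(52, 5)    % 5-card poker hands:               2598960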
211 Combinations and Permutations
- How many ways to put N balls in K boxes?
- OOO11O1O1OOO11 — example
- A 1 is a boundary between boxes: we use G = K − 1 ones
- An O is a ball: we use N Os
- Arranging all the symbols gives (N+G)! orderings
- Correct for permutations of the 1s and of the Os: (N+G)! / (N! G!)
- If we set M = N + G, this is M! / ((M−G)! G!), same as before
In the example: G = 6, K = 7, N = 8, M = 14
212COUNTING METHODS
- Combinations VS permutations
- How many sets of 3 numbers out of 10 ?
- How many ordered sets of 3 numbers ?
213 Pascal's Triangle
214 Pascal's Triangle
215 Binomial Coefficient and Pascal's Triangle
- A number in the triangle can be found by nCr (n Choose r), where n is the number of the row and r is the element in that row. For example, in row 3, 1 is the zeroth element, 3 is element number 1, the next 3 is the 2nd element, and the last 1 is the 3rd element. The formula for nCr is
- nCr = n! / (r!(n − r)!)
216 Examples
- How many ways to select 5 cards from the deck?
- How many ways to have 4 equal cards in a set of 5?
- Probability of selecting 5 cards containing a poker (four of a kind)?
217 Poker Probabilities
- Deck of 52 cards, ranked: ace, king, queen, jack, 10, 9, 8, 7, 6, 5, 4, 3, 2 (and ace again; it can be either high or low)
- 4 suits: spades, hearts, diamonds and clubs
- 5-card draw: 5 cards make up a poker hand
- The highest hand wins
- Hands are ranked as follows
218 Poker Probabilities
- Royal flush: 10, J, Q, K, A of the same suit
- Four of a kind: 4 cards of the same RANK
- Full house: 3 cards of one rank + 2 cards of another rank
- Flush: 5 cards of the same suit
219 Poker Probabilities
- How many poker hands? C(52, 5) = 2,598,960
220 Poker Probabilities
- How many combinations of royal flush? 4 (probability ≈ 0.00000154)
- How many combinations of 4-of-a-kind? 624 (checked below)
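A sketch checking these counts (13 ranks for the quad, times 48 choices for the fifth card):

    hands = nchoosek(52, 5);      % 2,598,960 possible hands
    fourKind = 13 * 48            % returns 624
    p_fourKind = fourKind / hands % about 2.4e-4
    p_royal = 4 / hands           % about 1.54e-6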
221
- Consider a number of experiments with poker cards
- Write down the SAMPLE SPACE
- Count the possible outcomes for each experiment (see book, or handouts)
[table of the 52 cards: 4 suits × 13 ranks]
222 Kinds of questions
- Probability of having the King of ♠ at the first draw?
- Probability of having 4 kings?
- Probability of having any set of 4 equal cards?
- When we ask you to write the sample space for the 5-card experiment, we do not mean to list all of the outcomes (there are about 2.5 million), just to show you know what the sample space is, e.g. "all hands of 5 cards" drawn from {2S, 2C, 2D, 2H, 3S, ..., KS, KC, KD, KH, AC, ...}
223 How to do the homework
- Always write down the probabilistic model
- Use one of the 3 formulae we have for COUNTING the number of events of a certain type, or of outcomes
- Use definitions like P(event) = (outcomes in the event) / (possible outcomes)
224 Combinations
- Ways of choosing k elements out of a set of n elements: C(n, k) = n! / (k!(n − k)!)
- HOW MANY COMMITTEES OF 5 PEOPLE CAN WE MAKE OUT OF A CLASS OF 10 PEOPLE? C(10, 5) = 252