Title: EMAT 20205 Data Analysis WEEK 2
1 EMAT 20205 Data Analysis, WEEK 2
2 Axioms of Probability
- The probability law (assigning a number to each event E) must satisfy the following axioms
- Nonnegativity: P(E) ≥ 0 for every event E
- Additivity: if E and F are two disjoint events, then the probability of their union satisfies P(E ∪ F) = P(E) + P(F)
- Normalization: the probability of the entire sample space is equal to 1, P(Ω) = 1
3 Some comments
- The maximum value for the probability of an event is 1 (the probability of the entire sample space). This means that the event is CERTAIN
- P(Ω) = 1 means the outcome will be one of the possible outcomes (obviously): e.g. the die roll will certainly give outcome 1, 2, 3, 4, 5 or 6
4 Comments
- An event E is IMPOSSIBLE if it has zero probability, P(E) = 0
- An event is CERTAIN if it has probability P(E) = 1
- The interesting things happen in between
5 Comments on Additivity
- Additivity: if E and F are two disjoint events, then the probability of their union satisfies P(E ∪ F) = P(E) + P(F)
- The probability of E or F is P(E) + P(F)
- E.g. in a die roll, the probability of 1 or 2 is P(1) + P(2)
6 Consequences
- If we use a sample space Ω = {O1, O2, O3, O4, ..., On}, the probabilities of the outcomes Oi must satisfy P(O1) + P(O2) + ... + P(On) = 1
- We will write this sum as Σi P(Oi) = 1
7 Consequences
- From these axioms we can see that the probability of the empty event is 0, P(∅) = 0 (so there MUST be an outcome; think of the die roll example)
8 Probability Law
- We have seen 3 axioms that must be satisfied by the probability assignment to the outcomes (sample space), and some of their consequences
- BUT who gives us the probabilities?
- They are largely an arbitrary design choice (although we will see practical methods)
9 Example
- Think again of the case of the die roll.
- Given our knowledge of physics, and the symmetry of a die, we see no reason why a certain outcome should be more likely than another. So we want P(1) = P(2) = P(3) = P(4) = P(5) = P(6)
- The normalization axiom gives P(i) = 1/6 for each of them
- We can then use these probabilities and the axioms to compute probabilities of more complex events
10 Example
- Coin toss.
- Again no reason to prefer one outcome over another, so P(H) = P(T) = 1/2
- Unless...
11 Frequency information
- Unless we actually know that specific coin (or die) and we know the exact frequency of the outcomes in the last 1000s of experiments
- Possibly the coin is not fair, and we observe 80 head, 20 tail outcomes
- We can incorporate this in the model, assigning P(H) = 0.8, P(T) = 0.2. In the first case we have used our knowledge of the situation; in the second case we have estimated the probabilities by using frequencies
12 Probabilistic Model of Coin Toss
- Sample space is Ω = {H, T}
- Possible events are all subsets: {H, T}, {H}, {T}, ∅ (empty)
- Fair coin? P(H) = P(T) = 0.5
- P({H, T}) = P(H) + P(T) = 1
- P(∅) = 0
- So we have assigned a probability to EACH possible event based on the probabilities of the outcomes, in a way that satisfies all axioms
13 Model: Toss of Three Coins
- Sample space (8 possible outcomes): Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
- We assume they are all equally likely, so we assign to each of them probability 1/8
- The probability law should assign probabilities to EVERY POSSIBLE EVENT
14 Tossing Three Coins
- A possible event: 2 heads occur
- How many outcomes are in this event? {HHT, HTH, THH}
- These are 3 disjoint events, so their union has probability equal to the sum of their probabilities
- P({HHT, HTH, THH}) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8
15Tossing Three Coins
- We can calculate similarly the probability of all
possible events, and this gives a probability law
that satisfies the axioms. - We can see that obtaining 3 heads has probability
1/8, less than observing 2 heads (3/8), and so on
16 Probability law for finite sample spaces
- For finite sample spaces, we specify the probability law by just assigning probabilities to the individual outcomes
- Often the outcomes are equiprobable; then P(E) = (number of outcomes in E) / (total number of outcomes), as in the sketch below
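A minimal MATLAB sketch of this counting rule (MATLAB is the language used later in this deck); the event E = {1, 2} is an assumed example:

    % Equiprobable outcomes: P(E) = (outcomes in E) / (total outcomes)
    Omega = 1:6;                        % sample space of a fair die roll
    E = [1 2];                          % example event: outcome is 1 or 2
    P_E = length(E) / length(Omega)     % returns 1/3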
17 Continuous Sample Space
- In the case of the dart and target, things are different
- If each outcome is a point, its probability cannot be bigger than zero, else the total probability would exceed one
- Solution: outcomes must be (infinitesimally) small areas, not points
- Do not worry too much about this for now
18 Properties of Probability Law
Assume: area of set = probability of event!
19 Using Probabilistic Models
- Say we want to model an uncertain situation (e.g. an experiment)
- We first decide on a sample space and a probability law. This step is somewhat arbitrary, and fully specifies the model.
- Then, operating within the model, we derive the probabilities of the events of interest, or other properties. This is fully unambiguous.
20 Example
- We want to choose a day in 2009 on which to organize a picnic
- We want to avoid rain, cold and traffic
- These are three possible events (day = rain, day = cold, day = traffic), not mutually exclusive
21 Assume this is a generic month. A random day will have values for R, C, T; we can compute the probability for R (rain), or for nT (not traffic), but also for R AND T
22Event RAIN
23Event COLD
24Event TRAFFIC
25 Unions and Intersections of Events
- We may want to calculate the probability of randomly selecting a day that is both not-rainy and not-cold
- Today we talk of probabilities of COMBINATIONS of events
26 Intersection of Events
- Probability that BOTH events occur simultaneously
- We DEFINE A NEW EVENT consisting of the outcomes that are in both events E and F, and we calculate its probability
- New event: G = E ∩ F
- The probability of both events occurring is P(E ∩ F)
27 Intersection of Events
- The probability of this event is the sum of the probabilities of the outcomes that are both in E and in F (e.g. the fraction of days that are both R and T)
- Two events are mutually exclusive (or disjoint) if their intersection is empty (e.g. R and nR are disjoint)
28 [Venn diagram: event "Rain AND Cold" as the intersection of the rain and cold sets]
29 Union of Events
- We want to calculate the probability that at least one of the events E and F occurs
- This is the probability of the union event G = E ∪ F
- The probability of G is the sum of the probabilities of the outcomes that are in either E or F (e.g. the number of days that are either R or C)
30 [Venn diagram: event "Rain OR Cold" as the union of the rain and cold sets]
31 Other combinations
- We can consider the probability of being in E and not in F by considering the probability of being in E and in F^C (the complement of F)
32 Dice Example
- Event E = {1, 2, 3}: outcome is small (at most 3)
- Event F = {2, 4, 6}: outcome is an even number
- Probability of being either even OR small?
- Probability of being even AND small? (worked out below)
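A worked answer, sketched with the counting rule from slide 16 (note the shared outcome 2 is counted only once in the union):

$$E \cup F = \{1,2,3,4,6\},\quad P(E \cup F) = \tfrac{5}{6}; \qquad E \cap F = \{2\},\quad P(E \cap F) = \tfrac{1}{6}.$$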
33 Day counts for the 30-day month:
R,C,T: 6 (6/30)     nR,C,T: 0 (0/30)
R,nC,T: 16 (16/30)  nR,nC,T: 1 (1/30)
R,C,nT: 0 (0/30)    nR,C,nT: 6 (6/30)
R,nC,nT: 1 (1/30)   nR,nC,nT: 0 (0/30)
34 Important
- Calculate the joint probabilities from the table
- P(R,C,nT) = 0/30
- P(R,C,T) = 6/30
- P(R,C) = P(R,C,nT) + P(R,C,T) = 6/30
35 Conditional Probability
- What is the probability of rain in this month? (count all rainy days and divide by 30)
- P(R) = (number of rainy days) / (number of days)
- What is the probability of rain given that it is cold?
36 Conditional Probability
- Outcomes of the experiment: days
- Being a cold day is an event
- Being a rainy day is an event
- Probability of being cold AND rainy?
- Cold AND NOT rainy?
- NOW: is it more likely to be cold on rainy days?
- What about COLD given that it is RAINY?
37 Day counts again:
R,C,T: 6 (6/30)     nR,C,T: 0 (0/30)
R,nC,T: 16 (16/30)  nR,nC,T: 1 (1/30)
R,C,nT: 0 (0/30)    nR,C,nT: 6 (6/30)
R,nC,nT: 1 (1/30)   nR,nC,nT: 0 (0/30)
- P(d is rainy | d is cold)?
- P(d is cold) = 12/30
- P(d is rainy) = 23/30
- P(d is rainy and cold) = 6/30
38 Is it more likely to have rain on cold days?
- P(rain) = 23/30
- What is the rain probability AMONG THE COLD DAYS?
- The probability of rain given cold is P(rain|cold) = P(rain AND cold) / P(cold)
- P(rain|cold) = 6/12 = 0.5 (see the sketch below)
39 Definition
- We define the conditional probability of E given F as P(E|F) = P(E ∩ F) / P(F)
- Given that F is true, what is the probability of E?
- In a way, we restrict to the case when only F exists; F is the universe here
40 Conditional probability
- We can consider the conditional probability P(E|F) as a new probability law defined on a new universe, F
- P(F|F) = 1
- All other axioms also remain valid
41Properties of Conditional Probability
- It satisfies all the axioms to be a probability
law
42 Properties of Conditional Probability
- Definition: P(E|F) = P(E ∩ F) / P(F)
- This can be seen as a new probability law in the restricted universe F
- For finite sample spaces with equiprobable outcomes: P(E|F) = (number of outcomes in both E and F) / (number of outcomes in F)
43 Independent Events
- We define 2 independent events as follows
- Independent events: P(E|F) = P(E)
44 Independent Events
- 2 independent events: rain and Monday
- 2 dependent events: rain and January
- 2 dependent (?) events: traffic and Monday
- 2 independent events: January and Monday
In theory (not sure about our finite dataset)
45 Bayes Theorem
- Calculation
- P(cold) = 12/30 = 2/5
- P(traffic) = 6/30 = 1/5
- P(cold AND traffic) = 6/30 = 1/5
- P(cold|traffic) = 1
- P(traffic|cold) = 1/2
46 Bayes Theorem
P(cold|traffic) P(traffic) = P(cold AND traffic)
- Calculation
- P(cold) = 12/30 = 2/5
- P(traffic) = 6/30 = 1/5
- P(cold AND traffic) = 6/30 = 1/5
- P(cold|traffic) = 1
- P(traffic|cold) = 1/2
1 × 1/5 = 1/5
P(traffic|cold) P(cold) = P(traffic AND cold)
1/2 × 2/5 = 1/5
P(cold|traffic) P(traffic) = P(traffic|cold) P(cold)
47 Bayes Theorem
- P(cold|traffic) P(traffic) = P(traffic|cold) P(cold)
- P(cold|traffic) = P(traffic|cold) P(cold) / P(traffic)
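In symbols, the two product forms above (with C = cold, T = traffic) combine into Bayes' theorem:

$$P(C \mid T)\,P(T) \;=\; P(C \cap T) \;=\; P(T \mid C)\,P(C) \quad\Longrightarrow\quad P(C \mid T) \;=\; \frac{P(T \mid C)\,P(C)}{P(T)}.$$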
48 Independent Events
- P(E|F) = P(E)
- E independent of F
49 Independent Events
- Since it was P(E|F) = P(E ∩ F) / P(F)
- And we are assuming P(E|F) = P(E)
- it follows that for independent events P(E ∩ F) = P(E) P(F)
50 Independent Events
- If E and F are independent, so are E and F^C
51 Independence of 3 events
- E, F, G are independent if every subset of these 3 events is independent
- E, F are independent
- E, G are independent
- F, G are independent
- And P(E ∩ F ∩ G) = P(E) P(F) P(G)
52 Independent Events
- We can decompose joint probabilities as P(E,F,G) = P(E) P(F) P(G) if they are independent
- Otherwise, we should write P(E,F,G) = P(E|F,G) P(F|G) P(G)
53 Bernoulli Trials
- Toss a coin N times
- Probability of starting with H: 1/2
- Probability of starting with HH: 1/2 × 1/2
- ...
- Probability of N consecutive H: (1/2)^N
54MATLAB INTERLUDE
- INTERSECT Set intersection.
- INTERSECT(A,B) when A and B are vectors returns
the values common to both A and B. The result
will be sorted. A and B can be cell arrays of
strings.
55MATLAB INTERLUDE
- UNION Set union.
- UNION(A,B) when A and B are vectors returns the
combined values from A and B but with no
repetitions. The result will be sorted.
56 MATLAB INTERLUDE
- FIND Find indices of nonzero elements.
- I = FIND(X) returns the linear indices corresponding to the nonzero entries of the array X.
- X may be a logical expression.
- So you can find elements in a set with a given property, and make a new set
57MATLAB INTERLUDE
- LENGTH Length of vector.
- LENGTH(X) returns the length of vector X. It is
equivalent to MAX(SIZE(X)) for non-empty arrays
and 0 for empty ones.
58 MATLAB INTERLUDE
- You can use these set commands to count the elements in various sets, and hence to compute probabilities, as in the sketch below
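A sketch combining the four commands on the dice example from slide 32 (the variable names are illustrative):

    % Compute P(small OR even) and P(small AND even) for a fair die
    Omega = 1:6;                          % sample space
    E = find(Omega <= 3);                 % event "small" (outcome at most 3)
    F = find(mod(Omega, 2) == 0);         % event "even"
    P = @(A) length(A) / length(Omega);   % equiprobable-outcome law
    P_union = P(union(E, F))              % returns 5/6
    P_inter = P(intersect(E, F))          % returns 1/6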
59 Topics
- Modeling with Random Variables
- Discrete Random Variables
- Events and Probability Mass Functions
- Examples of RVs
- Bernoulli
- Binomial
- Geometric
- The concept of Expectation
60 RANDOM VARIABLES
- We have studied Probabilistic Models in general, and the notions of outcome, sample space and event.
- Now an important special case: in many probabilistic models the outcomes are NUMBERS, or can be associated with numbers
61 RANDOM VARIABLES
- Examples of numerical outcomes
- How many people showed up today?
- How many are sitting next to a statistics major?
- How many days of rain in January?
- Temperature on a given day?
- OR we can ASSOCIATE numerical values to non-numerical outcomes
62 RANDOM VARIABLES
- Associating numerical values to non-numerical outcomes: HOMEWORK EXPERIMENT
- Outcome: the homework
- Sample space: set of all possible answers you COULD have given
- Associated numerical value: the GRADE
63 RANDOM VARIABLES
- Easier model: multiple-choice quiz
- 10 questions, 3 choices each (A, B, C)
- Experiment: give the test to a student
- Outcome: a string of 10 symbols
- Sample space: set of all possible 10-symbol strings
- Numeric value: the grade assigned to each string (some form of distance to the correct string)
64RANDOM VARIABLES
- We call RANDOM VARIABLE a real-valued function of
the outcome of an experiment - Given an experiment, and the corresponding set of
possible outcomes, a random variable associates a
particular number with each outcome
65 RANDOM VARIABLES
- Example
- Sample space: {AAA, AAB, AAC, ...}
- Random variable: AAA → 3, AAB → 2, AAC → 3, ...
- This could be a model of grading a test
66 RANDOM VARIABLES
- Why are RANDOM VARIABLES important?
- They allow us to model uncertain situations in a quantitative way; we will talk about the EXPECTED temperature on January 25, or the EXPECTED number of students that will pass the test, etc.
- We can also talk about expected deviations from this estimate
67 RANDOM VARIABLES (continuous vs discrete)
- A random variable is called discrete if its range (the set of values it can take) is finite or COUNTABLY infinite
- It is called continuous if, for example, its range is the real axis (but we will not deal with this case today)
68 RANDOM VARIABLES
- Examples of discrete random variables
- Number of things (number of tails in 1000 coin tosses)
- Number of minutes this class will last
- Roll of 2 dice: the sum or product of the outputs is a discrete random variable
69 RANDOM VARIABLES
- The 2-dice example
- Let us call the six faces of a die A, B, C, D, E, F
70 RANDOM VARIABLES
- Let us consider the following random variable N associated with one die: N(A) = 1, N(B) = 2, N(C) = 3, N(D) = 4, N(E) = 5, N(F) = 6
71 RANDOM VARIABLES
- Sample space of the 2-dice experiment:
AA, AB, AC, AD, AE, AF,
BA, BB, BC, BD, BE, BF,
CA, CB, CC, CD, CE, CF,
DA, DB, DC, DD, DE, DF,
EA, EB, EC, ED, EE, EF,
FA, FB, FC, FD, FE, FF
72 RANDOM VARIABLES
- Sum random variable: AA → 1+1 = 2 = S(AA); AB → 1+2 = 3 = S(AB); ...; FF → 6+6 = 12 = S(FF)
- Range of the random variable: {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
73 RANDOM VARIABLES
- Similarly we can define the random variable PRODUCT, etc.
- So after the same experiment (rolling 2 dice) we may define different random variables (sum, absolute difference, product, max, min, etc. of the two individual outcomes)
- Whatever attaches a numeric value to the OUTCOME of the experiment is a RANDOM VARIABLE
74 RANDOM VARIABLES: important concepts
- A discrete random variable is a real-valued function of the outcome of the experiment that can take a finite or countably infinite number of values
- A function of a discrete random variable defines another random variable
- We will define the MEAN and VARIANCE of a random variable
- We will define independence and all the other concepts we defined in the previous classes
75 RANDOM VARIABLES
- For discrete random variables we will define PROBABILITY MASS FUNCTIONS, which are probability laws that assign a probability to each possible numerical value the random variable can assume
- It will be analogous to what we have done so far
76 RANDOM VARIABLES: notation
- We will denote by uppercase letters (X) the random variable, and by lowercase letters (x) the actual value it assumes in a given experiment
- So we will talk, for example, about the probability that X = x, and we will write it P(X = x)
77 RANDOM VARIABLES
- Look at the website of the course, where we publish the statistics about the past homework
- Random variable: GRADE, G
- A particular grade: g
- For example we can talk about P(G = 27)
83 Probability Mass Function (PMF)
- The most important way to characterize a random variable is through the probabilities of the values that it can take
- For the random variable X, these are given by the PMF of X, denoted pX
- If x is any possible value of X, the probability mass of x, pX(x), is the probability of the event {X = x}, consisting of all outcomes that give rise to a value of X equal to x
- pX(x) = P(X = x)
84 PMF
- Example experiment: tossing 2 fair coins
- Random Variable X: number of heads obtained (range 0, 1, 2)
- Compute the PMF of X:
pX(x) = 1/4 if x = 0; 1/2 if x = 1; 1/4 if x = 2; 0 otherwise (impossible)
85 PMF
- Event X = 0 ↔ corresponding outcome TT
- Event X = 1 ↔ corresponding outcomes HT or TH
- Event X = 2 ↔ corresponding outcome HH
- Each outcome has probability 1/4
- Hence the probabilities given before
- (grouping outcomes based on the value of the random variable is a way to define events)
86 PMF
- Some properties: since the events corresponding to each value of the random variable must be disjoint, and form a partition of the sample space,
- from the probability axioms we obtain Σx pX(x) = 1
87 PMF
- By a similar argument, we have for any set S of possible values of X: P(X ∈ S) = Σ_{x ∈ S} pX(x)
- In the coin example before, we can say the probability of at least 1 head is 3/4 (prob. of 1 head + prob. of 2 heads)
91Functions of Random Variables
- One can generate new random variables as
functions of random variables
92 CALCULATION OF THE PMF OF A RANDOM VARIABLE X
- For each possible value x of X:
- Collect all the possible outcomes that give rise to the event {X = x}
- Add their probabilities to obtain pX(x)
- THIS IS IMPORTANT!! (see the sketch below)
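A MATLAB sketch of this recipe for the two-coin example of slide 84 (the encoding of outcomes as character rows is an assumption of the sketch):

    % PMF of X = number of heads in 2 fair coin tosses
    outcomes = ['HH'; 'HT'; 'TH'; 'TT'];   % sample space, one outcome per row
    pOutcome = ones(4,1) / 4;              % equiprobable outcomes
    X = sum(outcomes == 'H', 2);           % value of X for each outcome
    pX = zeros(1,3);
    for x = 0:2
        pX(x+1) = sum(pOutcome(X == x));   % add probabilities of outcomes with X = x
    end
    pX                                     % returns [0.25 0.5 0.25]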
93 Example
- Probability of having HW grade larger than 30?
- P(G > 30) = P(G = 31) + P(G = 32) + ... + P(G = 40)
- For each probability: count the number of outcomes and divide by the total sample space size
94 Expectation
- The PMF of a random variable provides us with several numbers: the probabilities of all possible values of X
- We would like to summarize this in a few numbers that represent the PMF
- One such number is the EXPECTATION
95 Expectation
- Expected value of X = weighted average of all possible values of X (using probabilities as weights)
96 Expectation
- Suppose you roll a die many times, and each time you receive as many dollars as the outcome of the die roll
- How much money would you expect for each roll?
- We need to specify these terms
97 Expectation
- Suppose you roll the die K times, and Ki is the number of times the outcome is i
- Sample space: {1, 2, 3, 4, 5, 6}
- The total amount of money you receive is K1·1 + K2·2 + ... + K6·6
98 Expectation
- The total amount in K rolls is Σi i·Ki
- So the amount per roll is (1/K) Σi i·Ki
99 Expectation
- If we have been rolling the die many times (K is very large), we can approximate the probability of an outcome with its frequency: pi ≈ Ki/K
- Then we can write the expected amount of money as Σi i·pi
100 Expectation
- We define the expected value (expectation, or mean) of a random variable X, with PMF pX, by E[X] = Σx x·pX(x)
101 Expectation
- Remark: we can consider this as the center of gravity of the distribution (see the sketch below)
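A one-line MATLAB sketch of the weighted average for the fair-die example:

    % Expected value of a fair die roll
    x = 1:6;                  % possible values
    px = ones(1,6) / 6;       % PMF of a fair die
    EX = sum(x .* px)         % returns 3.5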
102 Variance
- Another important quantity that describes a PMF.
- Expectation: we know the average behavior of the random variable
- But how often does the random variable deviate from the average behavior?
103 Variance
- Let us create a NEW random variable describing the deviation of X from its mean E[X], and let us study it
- What is the expected value of the random variable (X − E[X])²?
104 Variance
- New random variable: (X − E[X])²
- Its expectation E[(X − E[X])²] = Var(X) is called the variance of X
- It is always nonnegative
- It provides a measure of the dispersion of X around its mean
105Variance
106 Variance
- Another related measure of dispersion is the standard deviation of X, defined as the square root of the variance
- From a practical viewpoint, the STD is easier to use because it has the same units as X (i.e. if X is in meters, the STD will be in meters, while Var(X) is in square meters)
107 Calculation of Variance
- We can just study the expectation of the R.V. Z = (X − E[X])²
- Tabulate the values of X and the corresponding values of Z
- Var(X) = E[Z]
108 Expected Value of Functions of Random Variables
- Let X be a random variable with PMF pX(x), and let g(X) be a function of X
- The expected value of the random variable g(X) is E[g(X)] = Σx g(x)·pX(x)
109 Variance
- So the variance can be calculated as Var(X) = Σx (x − E[X])²·pX(x)
110 Properties of Mean and Variance
- Let X be a random variable and let us consider the linear function Y = aX + b, where a, b are given scalars. Then
- E[Y] = a·E[X] + b
- Var(Y) = a²·Var(X)
- THIS holds ONLY if g(X) is linear!!
111 A useful relation (variance as a function of moments)
- Var(X) = E[(X − E[X])²]
- Var(X) = E[X²] − (E[X])²
- Proof: SEE LATER SLIDES FOR THE FULL PROOF. Use the relation E[g(X)] = Σx g(x)·pX(x)
114 A useful relation
- Var(X) = E[(X − E[X])²]
- Var(X) = E[X²] − (E[X])²
- Proof: either as HW or with the TAs. Use the relation E[g(X)] = Σx g(x)·pX(x)
115 Variance Calculation
- Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²
We will use this a lot; a sketch of the proof follows
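The proof is two lines, expanding the square and using linearity of expectation (E[X] is a constant):

$$\operatorname{Var}(X) = E\!\left[X^2 - 2X\,E[X] + (E[X])^2\right] = E[X^2] - 2\,E[X]\,E[X] + (E[X])^2 = E[X^2] - (E[X])^2.$$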
116 Covariance of 2 RVs
- In probability theory and statistics, covariance is a measure of how much two variables change together (variance is a special case of covariance, when the two variables are identical).
- If two variables tend to vary together (that is, when one of them is above its expected value, the other variable tends to be above its expected value too), then the covariance between the two variables will be positive. On the other hand, if when one of them is above its expected value the other variable tends to be below its expected value, then the covariance between the two variables will be negative.
- from Wikipedia
117 Covariance of 2 RVs
- The covariance between two real-valued random variables X and Y, with expected values E(X) = m and E(Y) = n, is defined as
- Cov(X, Y) = E[(X − m)(Y − n)]
118 In Matlab
- COV Covariance matrix.
- COV(X), if X is a vector, returns the variance. For matrices, where each row is an observation and each column a variable, COV(X) is the covariance matrix. DIAG(COV(X)) is a vector of variances for each column, and SQRT(DIAG(COV(X))) is a vector of standard deviations. COV(X,Y), where X and Y are matrices with the same number of elements, is equivalent to COV(X(:), Y(:)).
119 Correlation Coefficient
- ρ(X, Y) = Cov(X, Y) / (σX σY), where σX and σY are the standard deviations of X and Y (from Wikipedia)
120 Correlation Coefficient Between 2 Random Variables
- CORRCOEF Correlation coefficients.
- R = CORRCOEF(X) calculates a matrix R of correlation coefficients for an array X, in which each row is an observation and each column is a variable.
- R = CORRCOEF(X, Y), where X and Y are column vectors, is the same as R = CORRCOEF([X Y]).
- If C is the covariance matrix, C = COV(X), then CORRCOEF(X) is the matrix whose (i,j)'th element is C(i,j)/SQRT(C(i,i)*C(j,j)).
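A short usage sketch of both commands; the data are randomly generated for illustration:

    % Sample covariance and correlation of two related vectors
    x = randn(100, 1);          % hypothetical data
    y = 2*x + randn(100, 1);    % positively related to x
    C = cov(x, y);              % 2x2 matrix; C(1,2) is Cov(x,y)
    R = corrcoef(x, y);         % R(1,2) = C(1,2)/sqrt(C(1,1)*C(2,2)), close to +1 here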
122 EXTRA MATERIAL BELOW THIS POINT
- WHAT FOLLOWS IS EXTRA MATERIAL FOR REFERENCE
- Not covered in class 1 of week 2 (refers to class 2 of week 2)
126 Bernoulli Random Variable
- Consider the toss of a (generally not fair) coin: probability of H = p, probability of T = 1 − p
- The BERNOULLI random variable is an RV that takes the two values 0 or 1 depending on whether the outcome is H or T (remember: an RV is a function of the outcome)
- X = 1 if the outcome is H; X = 0 if the outcome is T
127 Bernoulli Random Variable
- The PMF of this Bernoulli RV is
- pX(x) = p if x = 1; 1 − p if x = 0
- A very important RV for modeling any generic situation with just 2 outcomes, e.g. the outcome of the football match on Sunday, ...
128 Binomial Random Variable
- Experiment: n coin tosses, each one with prob(H) = p, prob(T) = 1 − p
- The random variable X is the number of heads in the n-toss sequence
- We refer to X as a BINOMIAL RANDOM VARIABLE WITH PARAMETERS n AND p
129 Binomial Random Variable
- The PMF of X consists of the binomial probabilities we saw some time ago: pX(k) = C(n,k) p^k (1−p)^(n−k)
- Two parts:
- p^k (1−p)^(n−k): probability of a sequence with k heads and n−k tails
- C(n,k): number of sequences with k heads and n−k tails
130 Binomial Random Variable
- The normalization property can be written as Σ_{k=0}^{n} C(n,k) p^k (1−p)^(n−k) = 1 (checked numerically below)
- We will study this more in the future
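A MATLAB sketch of the binomial PMF and its normalization (n and p are arbitrary example values):

    % Binomial PMF: pX(k) = C(n,k) p^k (1-p)^(n-k)
    n = 10;  p = 0.3;
    k = 0:n;
    pX = arrayfun(@(kk) nchoosek(n, kk), k) .* p.^k .* (1-p).^(n-k);
    sum(pX)                     % normalization check: returns 1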
131 Geometric Random Variable
- We repeatedly toss the same coin as before.
- RV: number of tosses until the first head comes up
- TTTTTTTH
- TTH
- H
- TTTTTTTTTTTTTTTTTTTTTTTTTTTTH
132 Geometric Random Variable
- PMF: two parts, the probability of the prefix of k − 1 tails and the probability of the final H: pX(k) = (1−p)^(k−1) p
- Normalization: Σ_{k≥1} (1−p)^(k−1) p = 1 (see below)
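The normalization is a geometric series:

$$\sum_{k=1}^{\infty} (1-p)^{k-1}\,p \;=\; p \cdot \frac{1}{1-(1-p)} \;=\; 1.$$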
133 Geometric Random Variable
- This can model the process of you trying to connect with a modem to an internet service provider (how many failures before 1 success?)
134Poisson Random Variable
138Conclusion
- Random Variables
- Probability Mass Functions
- How to calculate PMFs
- Bernoulli
- Binomial
- Geometric
- Poisson ?
149 Mean and Variance of the Bernoulli
- E[X] = 1·p + 0·(1−p) = p
- E[X²] = 1²·p + 0²·(1−p) = p
- Var(X) = E[X²] − (E[X])² = p − p² = p(1−p)
150 Uniform Distribution: die roll
159 Two Important Series
- We do not derive them here. We will apply these to calculations of variance (the formulas, lost in this transcript, are presumably the two standard sums given below)
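The two series used in the uniform-variance calculation that follows appear to be the standard sums:

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}.$$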
161 Uniform Distribution: die roll
- Discrete uniform PMF over {a, ..., b} (the case of the die roll): pX(k) = 1/(b − a + 1) for k = a, ..., b
162 Uniform
The expectation is E[X] = (a + b)/2. This can be seen directly, since the PMF is symmetric around (a + b)/2. Or use the series given before...
Die example: 1 + 2 + 3 + 4 + 5 + 6 = 21. Direct computation of the expectation: 21/6 = 3.5. The formula says (1 + 6)/2 = 3.5
163 Variance of the Discrete Uniform
- We first study the case where a = 1, b = n; the general case will reduce to this
- We will use the relation Var(X) = E[X²] − (E[X])², with E[X²] = (1/n) Σ_{k=1}^{n} k² = (n+1)(2n+1)/6 (you can verify the sum formula by induction, or just believe it)
164 Variance of the Discrete Uniform
This works out to Var(X) = (n² − 1)/12. Notice we are still working with the special case a = 1, b = n
165 Variance of the Discrete Uniform
- Now we can study the general case by SHIFTING: shifting a distribution does not change its variance (so we can study the {a, ..., b} case by studying the variance of the {1, ..., b − a + 1} case)
- So setting n = b − a + 1 in the previous equation gives the general case: Var(X) = ((b − a + 1)² − 1)/12
166 Variance of the Discrete Uniform
- Example: I get $1 for each point on the die; I can expect $3.50 at each roll, with a standard deviation of sqrt(35/12) ≈ 1.7
170 QUESTION
- There are 94 students
- Each has probability 1/3 of getting an A
- The number of students that get an A is a random variable
- What is its mean? (how many are expected to get an A?)
171Mean of the Binomial
- If we want the mean of the binomial, we first
need to learn how to handle JOINT PMFs of
MULTIPLE RANDOM VARIABLES
172 JOINT PMFs of MULTIPLE RANDOM VARIABLES
- Consider 2 discrete random variables, X and Y, associated with the same experiment
- The probabilities of the values that X and Y can take are captured by the JOINT PMF of X and Y, written pX,Y
- pX,Y(x, y) = P(X = x, Y = y)
173 JOINT PMF of 2 RVs
- (if we consider the pair X, Y as a random variable, all ideas transfer)
- If A is an event (a set of pairs (x, y) that have a certain property) then
P((X, Y) ∈ A) = Σ_{(x,y) ∈ A} pX,Y(x, y)
174 Students
- Consider the random variable Xi that is 1 if student i gets an A, and 0 otherwise
- With n students, each with probability p, the expected number of A grades is E[X1 + ... + Xn] = n·p (see the sketch below)
175Conclusion
- Mean of Random Variables
- Variance of Random Variables
- Properties, relations for variance and moments
- Bernoulli
- Discrete Uniform,
- General Methods for variance calculation
177 Topics
- Some probability distributions
- Some real applications: decision making, modeling clashes between ants, modeling the distribution of ping times
178 Marginalization
- For a fixed value y, pY(y) = Σx pX,Y(x, y)
- Using the definition of conditional probability, we have pX|Y(x|y) = pX,Y(x, y) / pY(y)
179Random Variables
- Joint probability
- Conditional probability
- Independence
180 Joint Probability
- It is common for several random variables to be defined on the same sample space. If X and Y are random variables, the function f(x, y) = Pr{X = x and Y = y} is the joint probability mass function of X and Y.
181 Independent Random Variables
- We define two random variables X and Y to be independent if, for all x and y, the events {X = x} and {Y = y} are independent or, equivalently, if for all x and y we have Pr{X = x and Y = y} = Pr{X = x} Pr{Y = y}.
182Functions of Random Variables
- Given a set of random variables defined over the
same sample space, one can define new random
variables as sums, products, or other functions
of the original variables.
183 Expected value of a random variable
- The simplest and most useful summary of the distribution of a random variable is the "average" of the values it takes on. The expected value (or, synonymously, expectation or mean) of a discrete random variable X is E[X] = Σx x·Pr{X = x}
184 Expectation of joint RVs
- Given random variables X and Y, and given their joint PMF P(X = x and Y = y), what is the expectation of their product, E[XY]?
- Easy if they are independent
185 Expectation of Joint Independent RVs: E[XY] = E[X]·E[Y]
186 In general
- In general, when n random variables X1, X2, ..., Xn are mutually independent, E[X1 X2 ··· Xn] = E[X1] E[X2] ··· E[Xn].
187 More about independent RVs
- When X and Y are independent random variables,
- Var[X + Y] = Var[X] + Var[Y]
- (whereas for ANY random variables the expectation of the sum is the sum of the expectations, that is, E[X + Y] = E[X] + E[Y])
188 The Geometric Distribution
- A coin flip is an instance of a Bernoulli trial, which is defined as an experiment with only two possible outcomes: success, which occurs with probability p, and failure, which occurs with probability q = 1 − p.
- When we speak of Bernoulli trials collectively, we mean that the trials are mutually independent and that each has the same probability p of success.
- Two important distributions arise from Bernoulli trials: the geometric distribution and the binomial distribution.
189 Geometric Distribution
- Take a sequence of Bernoulli trials, each with a probability p of success and a probability q = 1 − p of failure.
- How many trials occur before we obtain a success?
190 Geometric Distribution
- Let the random variable X be the number of trials needed to obtain a success. Then X has values in the range {1, 2, ...}, and Pr{X = k} = q^(k−1) p (for k > 0), since we have k − 1 failures before the one success.
- A probability distribution satisfying this equation is said to be a geometric distribution.
191 Geometric Distribution
- This is the geometric distribution (picture taken from Cormen, Leiserson and Rivest's book on Algorithms)
- In this case, the coin has probability p = 1/3 of success and a probability q = 1 − p of failure
192 Geometric distribution
- Expectation: we can use the relation Σ_{k≥1} k·x^(k−1) = 1/(1 − x)²
- which holds when the summation is infinite and x < 1 (applied below)
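Applying that relation with x = q gives the expectation:

$$E[X] = \sum_{k=1}^{\infty} k\,q^{k-1}\,p = p \cdot \frac{1}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p}.$$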
193 Geometric Distribution
The expectation of the distribution is 1/p = 3 (for p = 1/3).
194 Geometric Distribution
- The variance, which can be calculated similarly, is Var[X] = q/p²
- Example: repeatedly roll two dice until we obtain either a seven or an eleven. Of the 36 possible outcomes, 6 yield a seven and 2 yield an eleven. Thus, the probability of success is p = 8/36 = 2/9, and we must roll 1/p = 9/2 = 4.5 times on average to obtain a seven or an eleven.
- NEXT WEEK we will implement things like this.
195 BINOMIAL DISTRIBUTION
- How many successes occur during n Bernoulli trials, where a success occurs with probability p and a failure with probability q = 1 − p?
196 Binomial Distribution
- Define the random variable X to be the number of successes in n trials. Then X has values in the range {0, 1, ..., n}, and for k = 0, ..., n, Pr{X = k} = C(n,k) p^k q^(n−k),
- since there are C(n,k) ways to pick which k of the n trials are successes, and the probability that each such sequence occurs is p^k q^(n−k). A probability distribution satisfying this equation is said to be a binomial distribution.
197 Binomial Distribution
- Let Xi be the random variable describing the number of successes in the ith trial. Then E[Xi] = p·1 + q·0 = p, and by linearity of expectation, the expected number of successes for n trials is E[X] = Σi E[Xi] = n·p
198 Binomial Distribution
- Similarly we can proceed for the variance, exploiting the relation Var[X] = E[X²] − E²[X]
- Since Xi only takes on the values 0 and 1, we have E[Xi²] = E[Xi] = p
- And hence Var[Xi] = p − p² = pq
- Then we can use independence to move from Var[Xi] to the variance of the binomial: Var[X] = Σi Var[Xi] = n·p·q
- The binomial distribution increases as k runs
from 0 to n until it reaches the mean np, and
then it decreases. - Picture from cormen, leiserson, rivests book
200Binomial Distribution
201 Conclusion
- Conditional PMFs of RVs
- Independence
- Expectation and Variance for RVs
- Geometric distribution
- Binomial distribution
- Next we will implement all of these ideas
202- EXTRA MATERIAL (NOT COVERED IN CLASS)
203 Cards
[table of the 52 cards: 4 suits (♠ ♥ ♦ ♣) × 13 ranks: Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King]
205 Counting
- Probability of generating a growing sequence of cards (1, 2, 3, 4, 5, 6, 7, 8, 9, ...)
- Probability of starting with a 1 × probability of then having a 2 × ... × probability of having a king
206COUNTING METHODS
- How many ways to obtain K heads and N-K tails in
N coin tosses ? - How many ways to have a 4-of-a-kind ?
207 Basic Counting
- Two experiments are performed. The first one can have any one of N possible outcomes, the second one any of M possible outcomes.
- There are M·N possible outcomes for the two experiments considered together
208 Basic Counting
- How many different arrangements of the letters A, B, C are possible?
- ABC, ACB, BAC, BCA, CAB, CBA
- Each arrangement is known as a PERMUTATION.
- There are 6 possible permutations of a set of 3 objects
- There are N! permutations of a set of N objects: N! = N·(N−1)·(N−2)···3·2·1
209 Combinations
- How many different groups of M objects can I form from a total of N objects? (e.g. how many groups of 5 cards can I form from a deck of 52?)
- (there are 52 ways to select the first card, 51 to select the second, ...; but we count each group once for each of its possible orderings, so we need to correct for this)
- (52·51·50·49·48)/(5·4·3·2·1)
210 Combinations
- Ways of choosing k elements out of a set of n elements: C(n, k) = n! / (k!(n − k)!) (see the sketch below)
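In MATLAB the binomial coefficient is nchoosek; a few values used in this deck:

    nchoosek(10, 3)    % sets of 3 numbers out of 10:      120
    nchoosek(10, 5)    % committees of 5 out of 10 people: 252
    nchoosek(52, 5)    % 5-card poker hands:               2598960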
211 Combinations and Permutations
- How many ways to put N balls in K boxes?
- OOO11O1O1OOO11 — example
- A 1 is a boundary between boxes: we use G = K − 1 ones
- An O is a ball: we use N Os
- Arranging all the symbols gives (N+G)! orderings
- Correct for permutations of the 1s and of the Os: (N+G)! / (N! G!)
- If we set M = N + G, this is M! / ((M−G)! G!), same as before
In the example: G = 6, K = 7, N = 8, M = 14
212COUNTING METHODS
- Combinations VS permutations
- How many sets of 3 numbers out of 10 ?
- How many ordered sets of 3 numbers ?
213 Pascal's Triangle
214 Pascal's Triangle
215 Binomial Coefficient and Pascal's Triangle
- A number in the triangle can be found by nCr (n Choose r), where n is the number of the row and r is the element in that row. For example, in row 3, 1 is the zeroth element, 3 is element number 1, the next 3 is the 2nd element, and the last 1 is the 3rd element. The formula for nCr is
- nCr = n! / (r!(n − r)!)
216 Examples
- How many ways to select 5 cards from the deck?
- How many ways to have 4 equal cards in a set of 5?
- Probability of selecting 5 cards containing a poker (four of a kind)?
217 Poker Probabilities
- Deck of 52 cards, ranked: ace, king, queen, jack, 10, 9, 8, 7, 6, 5, 4, 3, 2 (and ace again; it can be either high or low)
- 4 suits: spades, hearts, diamonds and clubs
- 5-card draw: 5 cards make up a poker hand
- The highest hand wins
- Hands are ranked as follows
218 Poker Probabilities
- Royal flush: 10, J, Q, K, A of the same suit
- Four of a kind: 4 cards of the same RANK
- Full house: 3 cards of one rank + 2 cards of another rank
- Flush: 5 cards of the same suit
219 Poker Probabilities
- How many poker hands? C(52, 5) = 2,598,960
220 Poker Probabilities
- How many combinations of royal flush? 4 (probability ≈ 0.00000154)
- How many combinations of 4-of-a-kind? 624 (checked below)
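A sketch checking these counts (13 ranks for the quad, times 48 choices for the fifth card):

    hands = nchoosek(52, 5);      % 2,598,960 possible hands
    fourKind = 13 * 48            % returns 624
    p_fourKind = fourKind / hands % about 2.4e-4
    p_royal = 4 / hands           % about 1.54e-6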
221
- Consider a number of experiments with poker cards
- Write down the SAMPLE SPACE
- Count the possible outcomes for each experiment (see book, or handouts)
[table of the 52 cards: 4 suits × 13 ranks]
222 Kinds of questions
- Probability of having the King of ♠ at the first draw?
- Probability of having 4 kings?
- Probability of having any set of 4 equal cards?
- When we ask you to write the sample space for the 5-card experiment, we do not mean to list all of the outcomes (there are about 2.5 million), just to show you know what the sample space is, e.g. "all hands of 5 cards" drawn from {2S, 2C, 2D, 2H, 3S, ..., KS, KC, KD, KH, AC, ...}
223 How to do the homework
- Always write down the probabilistic model
- Use one of the 3 formulae we have for COUNTING the number of events of a certain type, or of outcomes
- Use definitions like P(event) = (outcomes in the event) / (possible outcomes)
224 Combinations
- Ways of choosing k elements out of a set of n elements: C(n, k) = n! / (k!(n − k)!)
- HOW MANY COMMITTEES OF 5 PEOPLE CAN WE MAKE OUT OF A CLASS OF 10 PEOPLE? C(10, 5) = 252