EMAT 20205 Data Analysis WEEK 2 - PowerPoint PPT Presentation

Slides: 227
Provided by: NEL61

1
EMAT 20205 Data Analysis, WEEK 2
  • Nello Cristianini

2
Axioms of Probability
  • The probability law (assigning a number P(E) to
    each event E) must satisfy the following axioms
  • Nonnegativity: $P(E) \ge 0$ for every event E
  • Additivity: if E and F are two disjoint events,
    then the probability of their union satisfies
    $P(E \cup F) = P(E) + P(F)$
  • Normalization: the probability of the entire
    sample space $\Omega$ is equal to 1,
    $P(\Omega) = 1$

3
Some comments
  • The maximum value for the probability of an event
    is 1 (probability of the entire sample space)
  • This means that that event is CERTAIN
  • $P(\Omega) = 1$ means the outcome will be one of
    the possible outcomes (obviously), e.g. the dice
    roll will certainly give outcome 1 or 2 or 3 or
    4 or 5 or 6

4
Comments
  • An event E is IMPOSSIBLE if it has zero
    probability: P(E) = 0
  • An event is CERTAIN if it has probability
    P(E) = 1
  • The interesting things happen in between

5
Comments on Additivity
  • Additivity: if E and F are two disjoint events,
    then the probability of their union satisfies
    $P(E \cup F) = P(E) + P(F)$
  • Probability of E or F is P(E) + P(F)
  • E.g. in a dice roll, the probability of 1 or 2 is
    P(1) + P(2)

6
Consequences
  • If we use a sample space
    $\Omega = \{O_1, O_2, O_3, O_4, \dots, O_n\}$,
    the probabilities of the outcomes $O_i$ must
    satisfy $P(O_1) + P(O_2) + \dots + P(O_n) = 1$
  • We will write this sum as
    $\sum_{i=1}^{n} P(O_i) = 1$

7
Consequences
  • From this axiom we can see that the probability
    of the empty event is 0, $P(\emptyset) = 0$ (so
    there MUST be an outcome, think of the dice roll
    example)

8
Probability Law
  • We have seen 3 axioms that must be satisfied by
    the probability assignment to the outcomes
    (sample space) and some of their consequences
  • BUT who gives us the probabilities ?
  • They are largely an arbitrary design choice
    (although we will see practical methods)

9
Example
  • Think again of the case of the dice roll.
  • Given our knowledge of physics, and the symmetry
    of a dice, we see no reason why a certain outcome
    should be more likely than another. So we want
    P(1) = P(2) = P(3) = P(4) = P(5) = P(6)
  • The normalization axiom gives P(i) = 1/6 for
    each of them
  • We can then use these probabilities and the
    axioms to compute probabilities of more complex
    events

10
Example
  • Coin toss.
  • Again no reason to prefer one outcome over
    another, so P(H) = P(T) = 1/2
  • Unless ...

11
Frequency information
  • Unless we actually know that specific coin (or
    dice) and we know the exact frequency of the
    outcomes in the last 1000s of experiments
  • Possibly the coin is not fair, and we observe 80
    head, 20 tail outcomes
  • We can incorporate this in the model, assigning
    P(H) = 0.8, P(T) = 0.2
  • In the first case we have used our knowledge of
    the situation; in the second case we have
    estimated the probabilities by using frequencies

12
Probabilistic Model of Coin Toss
  • Sample space is $\Omega = \{H, T\}$
  • Possible events are all subsets: {H,T}, {H},
    {T}, $\emptyset$ (empty)
  • Fair coin → P(H) = P(T) = 0.5
  • P({H,T}) = P(H) + P(T) = 1
  • $P(\emptyset) = 0$
  • So we have assigned a probability to EACH
    possible event based on the probabilities on the
    outcomes, in a way to satisfy all axioms

13
Model Toss of Three Coins
  • Sample space (8 possible outcomes):
    $\Omega$ = {HHH, HHT, HTH, HTT, TTT, THH, THT,
    TTH}
  • We assume they are all equally likely, so we
    assign to each of them probability 1/8
  • The probability law should assign probabilities
    to EVERY POSSIBLE EVENT

14
Tossing Three Coins
  • A possible event: 2 heads occur
  • How many outcomes are in this event? {HHT, HTH,
    THH}
  • 3 disjoint events, their union has probability
    equal to the sum of their probabilities
  • P({HHT, HTH, THH}) = P(HHT) + P(HTH) + P(THH)
    = 1/8 + 1/8 + 1/8 = 3/8
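
The calculation above can be reproduced by enumerating the sample space. This is a small Python sketch, not part of the original slides; the helper name prob is an assumption made for illustration, and all 8 outcomes are taken as equally likely, as on the slide.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 8 sequences of three tosses, assumed equally likely.
sample_space = ["".join(seq) for seq in product("HT", repeat=3)]
p_outcome = Fraction(1, len(sample_space))

def prob(event):
    """Probability of an event (a set of outcomes), using additivity."""
    return sum(p_outcome for outcome in sample_space if outcome in event)

# Events are just subsets of the sample space.
two_heads = {o for o in sample_space if o.count("H") == 2}
three_heads = {o for o in sample_space if o.count("H") == 3}

print(sorted(two_heads), prob(two_heads))  # ['HHT', 'HTH', 'THH'] 3/8
print(prob(three_heads))                   # 1/8
```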

15
Tossing Three Coins
  • We can calculate similarly the probability of all
    possible events, and this gives a probability law
    that satisfies the axioms.
  • We can see that obtaining 3 heads has probability
    1/8, less than observing 2 heads (3/8), and so on

16
Probability law for finite sample spaces
  • For finite sample spaces, we specify the
    probability law by just assigning probabilities
    to the individual outcomes
  • Often the outcomes are equiprobable, then
    P(E) = number of outcomes in E / total number
    of outcomes

17
Continuous Sample Space
  • In the case of the dart and target, things are
    different
  • If each outcome is a point, its probability
    cannot be bigger than zero, else the total
    probability will exceed one
  • Solution: outcomes must be (infinitesimally)
    small areas, not points
  • Do not worry too much about this for now

18
Properties of Probability Law
Assume area of set = probability of event!
19
Using Probabilistic Models
  • Say we want to model an uncertain situation (e.g.
    an experiment)
  • We first decide a sample space and a probability
    law. This step is somewhat arbitrary, and fully
    specifies the model.
  • Then operating within the model we derive the
    probabilities of the events of interest, or other
    properties. This is fully unambiguous.

20
Example
  • We want to choose a day in 2009 when to organize
    a picnic
  • We want to avoid: rain, cold and traffic
  • These are three possible events (day = rain,
    day = cold, day = traffic), not mutually
    exclusive

21
Assume this is a generic month. A random day will
have values for R, C, T: we can compute the
probability for R (rain), or for nT (not
traffic), but also for R AND T
22
Event RAIN
23
Event COLD
24
Event TRAFFIC
25
Unions and Intersections of Events
  • We may want to calculate the probability to
    randomly selecting a day that is both not-rainy
    and not-cold
  • Today we talk of probabilities of COMBINATIONS of
    events

26
Intersection of Events
  • Probability that BOTH events occur simultaneously
  • We DEFINE A NEW EVENT consisting of the outcomes
    that are in both events E and F and we calculate
    its probability
  • New event: $E \cap F$
  • The probability of both events occurring is
    $P(E \cap F)$

27
Intersection of Events
  • The probability of this event is the sum of the
    probabilities of the outcomes that are both in E
    and in F (e.g. fraction of days that are both R
    and T)
  • Two events are mutually exclusive (or disjoint)
    if their intersection is empty (e.g. R and nR
    are disjoint)

28
rain
cold
Event: Rain and cold
29
Union of Events
  • We want to calculate the probability that at
    least one of the events E and F occurs
  • This is the probability of the union event
    $G = E \cup F$
  • The probability of G is the sum of the
    probabilities of the outcomes that are in either
    E or F (e.g. fraction of days that are either R
    or C)

30
rain
cold
Event: Rain OR cold
31
Other combinations
  • We can consider the probability of being in E and
    not in F by considering the probability of being
    in E and in $F^C$ (the complement of F)

32
Dice Example
  • Event E = {1,2,3}: outcome is small
    (no larger than 3)
  • Event F = {2,4,6}: outcome is an even number
  • Probability of being either even OR small ?
  • Probability of being even AND small ?
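
The two questions above can be answered by counting outcomes. A minimal Python sketch, assuming a fair dice with equiprobable outcomes; the names omega and prob are illustrative and not from the slides.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # sample space of a fair dice
E = {1, 2, 3}                # outcome is small
F = {2, 4, 6}                # outcome is even

def prob(event):
    """Equiprobable outcomes: count outcomes in the event, divide by total."""
    return Fraction(len(event & omega), len(omega))

p_union = prob(E | F)        # even OR small: {1, 2, 3, 4, 6}
p_inter = prob(E & F)        # even AND small: {2}
p_E_not_F = prob(E - F)      # small and NOT even: {1, 3}

print(p_union, p_inter, p_E_not_F)  # 5/6 1/6 1/3
```

E and F are not disjoint here, so P(E) + P(F) = 1 overcounts the outcome 2; counting the union directly avoids that.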

33
Event        Days   Probability
R, C, T         6   6/30
nR, C, T        0   0/30
R, nC, T       16   16/30
nR, nC, T       1   1/30
R, C, nT        0   0/30
nR, C, nT       6   6/30
R, nC, nT       1   1/30
nR, nC, nT      0   0/30
34
Important
  • Calculate the joint probabilities from the table
  • P(R,C,nT) = 0/30
  • P(R,C,T) = 6/30
  • P(R,C) = P(R,C,nT) + P(R,C,T) = 6/30

35
Conditional Probability
  • What is the probability of rain in this
    month? (count all rainy days and divide by 30)
  • P(R) = (number of rainy days) / (number of days)
  • What is the probability of rain given that it is
    cold ?

36
Conditional Probability
  • Outcomes of experiment: days
  • Being a cold day is an event
  • Being a rainy day is an event
  • Probability of being cold AND rainy ?
  • Cold AND NOT rainy ?
  • NOW: is it more likely to be cold on rainy days ?
  • What about COLD given that it is RAINY ?

37
Event        Days   Probability
R, C, T         6   6/30
nR, C, T        0   0/30
R, nC, T       16   16/30
nR, nC, T       1   1/30
R, C, nT        0   0/30
nR, C, nT       6   6/30
R, nC, nT       1   1/30
nR, nC, nT      0   0/30
  • P(d is rainy | d is cold) = ?
  • P(d is cold) = 12/30
  • P(d is rainy) = 23/30
  • P(d is rainy and cold) = 6/30

38
Is it more likely to have rain on cold days ?
  • P(rain) = 23/30
  • What is the rain probability IN THE COLD DAYS ?
  • Probability of rain given cold is
    P(rain | cold) = P(rain AND cold) / P(cold)
  • P(rain | cold) = 6/12 = 0.5
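
The numbers above can be recomputed from the day counts in the R, C, T table. A minimal Python sketch; the dictionary layout and the helper prob are assumptions made for illustration, only the counts come from the slides.

```python
from fractions import Fraction

# Day counts from the table; keys are (rain, cold, traffic) flags.
counts = {
    (True,  True,  True):  6,
    (False, True,  True):  0,
    (True,  False, True):  16,
    (False, False, True):  1,
    (True,  True,  False): 0,
    (False, True,  False): 6,
    (True,  False, False): 1,
    (False, False, False): 0,
}
total_days = sum(counts.values())  # 30

def prob(predicate):
    """Sum the counts of all day types satisfying the predicate."""
    hits = sum(n for flags, n in counts.items() if predicate(*flags))
    return Fraction(hits, total_days)

p_rain = prob(lambda r, c, t: r)                  # 23/30
p_cold = prob(lambda r, c, t: c)                  # 12/30
p_rain_and_cold = prob(lambda r, c, t: r and c)   # 6/30

# Conditional probability: restrict the universe to the cold days.
p_rain_given_cold = p_rain_and_cold / p_cold
print(p_rain, p_rain_given_cold)  # 23/30 1/2
```

In this month rain is less likely on cold days (1/2) than on a generic day (23/30).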

39
Definition
  • We define the conditional probability of E given
    F: $P(E \mid F) = P(E \cap F) / P(F)$, assuming
    $P(F) > 0$
  • Given that F is true, what is the probability of
    E ?
  • In a way, restrict to the case when only F
    exists: F is the universe here

40
Conditional probability
  • We can consider the conditional probability
    $P(E \mid F)$ as a new probability law defined on
    a new universe, F
  • $P(F \mid F) = 1$
  • All other axioms also remain valid

41
Properties of Conditional Probability
  • It satisfies all the axioms to be a probability
    law

42
Properties of Conditional Probability
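  • Definition: $P(E \mid F) = P(E \cap F) / P(F)$
  • This can be seen as a new probability law in the
    restricted universe F
  • For finite sample spaces with equally likely
    outcomes: $P(E \mid F)$ = (number of outcomes in
    $E \cap F$) / (number of outcomes in F)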

43
Independent Events
  • We define 2 independent events as follows

Independent events: $P(E \mid F) = P(E)$
44
Independent Events
  • 2 independent events: rain and Monday
  • 2 dependent events: rain and January
  • 2 dependent (?) events: traffic and Monday
  • 2 independent events: January and Monday

In theory (not sure about our finite dataset)
45
Bayes Theorem
  • Calculation
  • P(cold) = 12/30 = 2/5
  • P(traffic) = 6/30 = 1/5
  • P(cold AND traffic) = 6/30 = 1/5
  • P(cold | traffic) = 1
  • P(traffic | cold) = 1/2

46
Bayes Theorem
  • Calculation
  • P(cold) = 12/30 = 2/5
  • P(traffic) = 6/30 = 1/5
  • P(cold AND traffic) = 6/30 = 1/5
  • P(cold | traffic) = 1
  • P(traffic | cold) = 1/2
  • P(cold | traffic) P(traffic) = P(cold AND traffic):
    1 × 1/5 = 1/5
  • P(traffic | cold) P(cold) = P(traffic AND cold):
    ½ × 2/5 = 1/5
  • Hence P(cold | traffic) P(traffic)
    = P(traffic | cold) P(cold)
47
Bayes Theorem
  • P(cold | traffic) P(traffic)
    = P(traffic | cold) P(cold)

P(cold | traffic) = P(traffic | cold) P(cold) / P(traffic)
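
A quick numerical check of this identity, using the three probabilities quoted on the Bayes Theorem slides as given inputs. This is only a Python sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Values as stated on the Bayes Theorem slides.
p_cold = Fraction(12, 30)
p_traffic = Fraction(6, 30)
p_cold_and_traffic = Fraction(6, 30)

# Conditional probabilities from the definition.
p_cold_given_traffic = p_cold_and_traffic / p_traffic   # 1
p_traffic_given_cold = p_cold_and_traffic / p_cold      # 1/2

# Bayes: recover P(cold | traffic) from P(traffic | cold).
bayes = p_traffic_given_cold * p_cold / p_traffic
print(p_cold_given_traffic, p_traffic_given_cold, bayes)  # 1 1/2 1
```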
48
Independent Events
  • $P(E \mid F) = P(E)$
  • E is independent of F

49
Independent Events
  • Since it was $P(E \mid F) = P(E \cap F) / P(F)$
  • And we are assuming $P(E \mid F) = P(E)$
  • it follows that for independent events
    $P(E \cap F) = P(E) P(F)$

50
Independent Events
  • If E and F are independent, so are E and $F^C$

51
Independence of 3 events
  • E,F,G are independent if every subset of these 3
    events is independent
  • E,F are independent
  • E,G are independent
  • F,G are independent
  • And P(E,F,G) = P(E) P(F) P(G)

52
Independent Events
  • We can decompose joint probabilities:
    P(E,F,G) = P(E) P(F) P(G) if they are independent
  • Otherwise, we should write
    P(E,F,G) = P(E | F,G) P(F | G) P(G)

53
Bernoulli Trials
  • Toss a coin N times
  • Probability of starting with H: ½
  • Probability of starting with HH: ½ × ½
  • Probability of N consecutive H: (½)^N

54
MATLAB INTERLUDE
  • INTERSECT Set intersection.
  • INTERSECT(A,B) when A and B are vectors returns
    the values common to both A and B. The result
    will be sorted. A and B can be cell arrays of
    strings.

55
MATLAB INTERLUDE
  • UNION Set union.
  • UNION(A,B) when A and B are vectors returns the
    combined values from A and B but with no
    repetitions. The result will be sorted.

56
MATLAB INTERLUDE
  • FIND Find indices of nonzero elements.
  • I = FIND(X) returns the linear indices
    corresponding to the nonzero entries of the array
    X.
  • X may be a logical expression.
  • So you can find elements in a set with a given
    property, and make a new set

57
MATLAB INTERLUDE
  • LENGTH Length of vector.
  • LENGTH(X) returns the length of vector X. It is
    equivalent to MAX(SIZE(X)) for non-empty arrays
    and 0 for empty ones.

58
MATLAB INTERLUDE
  • You can use these set commands to count the
    elements in various sets, and hence to compute
    probabilities
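
The same counting idea can be sketched with Python sets (used here instead of the MATLAB commands above, purely as an illustration). The particular day numbers are hypothetical: they are chosen only so that the counts match the month in the earlier example (23 rainy days, 12 cold days, 6 days that are both).

```python
from fractions import Fraction

days = set(range(1, 31))     # a 30-day month
rainy = set(range(1, 24))    # hypothetical: days 1..23 are rainy
cold = set(range(18, 30))    # hypothetical: days 18..29 are cold

rainy_and_cold = rainy & cold                          # like INTERSECT
rainy_or_cold = rainy | cold                           # like UNION
rainy_not_cold = {d for d in rainy if d not in cold}   # like FIND on a condition

def prob(event):
    """Count the elements of the event (like LENGTH) and divide by the total."""
    return Fraction(len(event), len(days))

print(prob(rainy_and_cold))                      # 1/5
print(prob(rainy_or_cold))                       # 29/30
print(Fraction(len(rainy_and_cold), len(cold)))  # 1/2, rain given cold
```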

59
Topics
  • Modeling with Random Variables
  • Discrete Random Variables
  • Events and
  • Probability Mass Function
  • Examples of RV
  • Bernoulli
  • Binomial
  • Geometric
  • The concept of Expectation

60
RANDOM VARIABLES
  • We have studied Probabilistic Models in general,
    the notions of outcome, sample space and event.
  • Now an important special case: in many
    probabilistic models the outcomes are NUMBERS, or
    can be associated to numbers

61
RANDOM VARIABLES
  • Examples of numerical outcome
  • How many people showed up today ?
  • How many are sitting next to a statistics major?
  • How many days of rain in january ?
  • Temperature on a given day ?
  • OR we can ASSOCIATE numerical values to
    non-numerical outcomes

62
RANDOM VARIABLES
  • Associating numerical values to non-numerical
    outcomes: HOMEWORK EXPERIMENT
  • Outcome: the homework
  • Sample space: set of all possible answers you
    COULD have given
  • Associated numerical value: the GRADE

63
RANDOM VARIABLES
  • Easier model: multiple choice quiz
  • 10 questions, 3 choices each (A,B,C)
  • Experiment: give the test to a student
  • Outcome: a string of 10 symbols
  • Sample space: set of all possible 10-symbol
    strings
  • Numeric value: the grade assigned to each string
    (some form of distance to the correct string)

64
RANDOM VARIABLES
  • We call RANDOM VARIABLE a real-valued function of
    the outcome of an experiment
  • Given an experiment, and the corresponding set of
    possible outcomes, a random variable associates a
    particular number with each outcome

65
RANDOM VARIABLES
  • Example
  • Sample space: {AAA, AAB, AAC, ...}
  • Random variable: AAA → 3, AAB → 2, AAC → 3
  • This could be a model of grading a test

66
RANDOM VARIABLES
  • Why are RANDOM VARIABLES important ?
  • They allow us to model uncertain situations in a
    quantitative way: we will talk about the
    EXPECTED temperature on January 25, or the
    EXPECTED number of students that will pass the
    test, etc.
  • We can also talk about expected deviations from
    this estimate

67
RANDOM VARIABLES(continuous vs discrete)
  • A random variable is called discrete if its range
    (the set of values it can take) is finite or
    COUNTABLY infinite
  • It is called continuous if, for example, its
    range is the real axis (but we will not deal with
    this case today)

68
RANDOM VARIABLES
  • Examples of discrete random variables
  • Number of things (number of tails in 1000 coin
    tosses)
  • Number of minutes this class will last
  • Roll of 2 dice, sum or product of the outputs is
    a discrete random variable

69
RANDOM VARIABLES
  • The 2-dice example
  • Let us call the six faces of a dice A, B, C, D,
    E, F

70
RANDOM VARIABLES
  • Let us consider the following random variable N
    associated to one dice: N(A) = 1, N(B) = 2,
    N(C) = 3, N(D) = 4, N(E) = 5, N(F) = 6

71
RANDOM VARIABLES
  • Sample space of the 2 dice experiment:
    {AA,AB,AC,AD,AE,AF, BA,BB,BC,BD,BE,BF,
    CA,CB,CC,CD,CE,CF, DA,DB,DC,DD,DE,DF,
    EA,EB,EC,ED,EE,EF, FA,FB,FC,FD,FE,FF}

72
RANDOM VARIABLES
  • Sum random variable S: AA → 1+1 = 2 = S(AA),
    AB → 1+2 = 3 = S(AB), ..., FF → 6+6 = 12 = S(FF)
  • Range of random variable:
    {2,3,4,5,6,7,8,9,10,11,12}

73
RANDOM VARIABLES
  • Similarly we can define the random variable
    PRODUCT, etc
  • So after the same experiment (rolling 2 dice) we
    may define different random variables (sum,
    absolute difference, product, max, min, etc of
    the two individual outcomes )
  • Whatever attaches a numeric value to the OUTCOME
    of the experiment is a RANDOM VARIABLE

74
RANDOM VARIABLES: important concepts
  • A discrete random variable is a real valued
    function of the outcome of the experiment that
    can take a finite or countably infinite number of
    values
  • A function of a discrete random variable defines
    another random variable
  • We will define MEAN and VARIANCE of a random
    variable
  • We will define independence and all other
    concepts we defined in the previous classes

75
RANDOM VARIABLES
  • For discrete random variables we will define
    PROBABILITY MASS FUNCTIONS, that are probability
    laws that assign a probability to each possible
    numerical value the random variable can assume
  • It will be analogous to what we have done so far

76
RANDOM VARIABLES: notation
  • We will denote by uppercase letters (X) the
    random variable, by lowercase letters (x) the
    actual value it assumes in a given experiment
  • So we will talk about the probability that X = x,
    for example, and we will write it P(X = x)

77
RANDOM VARIABLES
  • Look at the website of the course, where we
    publish the statistics about the past homeworks
  • Random variable GRADE, G
  • A particular grade g
  • For example we can talk about P(G = 27)

83
Probability Mass Function (PMF)
  • The most important way to characterize a random
    variable is through the probabilities of the
    values that it can take
  • For the random variable X, these are given by the
    PMF of X, denoted $p_X$.
  • If x is any possible value of X, the probability
    mass of x, $p_X(x)$, is the probability of the
    event {X = x}, consisting of all outcomes that
    give rise to a value of X equal to x
  • $p_X(x) = P(X = x)$

84
PMF
  • Example experiment: tossing 2 fair coins
  • Random variable X: number of heads obtained
    (range {0,1,2})
  • Compute the PMF of X
  • $p_X(x)$ = ¼ if x = 0, ½ if x = 1, ¼ if x = 2,
    0 otherwise (impossible)
85
PMF
  • Event X = 0 → corresponding outcome: TT
  • Event X = 1 → corresponding outcomes: HT or TH
  • Event X = 2 → corresponding outcome: HH
  • Each outcome has probability ¼
  • → hence the probabilities given before
  • (grouping outcomes based on the value of the
    random variable is a way to define events)

86
PMF
  • Some properties: since the events corresponding
    to each value of the random variable must be
    disjoint, and form a partition of the sample
    space,
  • From probability axioms we obtain
    $\sum_x p_X(x) = 1$

87
PMF
  • By a similar argument, we have for any set S of
    possible values of X:
    $P(X \in S) = \sum_{x \in S} p_X(x)$
  • In the coin example before, we can say the
    probability of at least 1 head is ¾ (sum of
    prob. of 1 head + prob. of 2 heads)

88
PMF
91
Functions of Random Variables
  • One can generate new random variables as
    functions of random variables

92
CALCULATION OF PMF OF A RANDOM VARIABLE X
  • For each possible value x of X
  • Collect all the possible outcomes that give rise
    to the event {X = x}
  • Add their probabilities to obtain pX(x)
  • THIS IS IMPORTANT !!
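
The recipe above translates almost line by line into code. A minimal Python sketch; the function name pmf and its arguments are assumptions made for illustration. It is applied to the two-coin example and to the sum of two fair dice.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def pmf(outcome_probs, random_variable):
    """For each value x, add up the probabilities of the outcomes mapped to x."""
    table = defaultdict(Fraction)   # missing values start at probability 0
    for outcome, p in outcome_probs.items():
        table[random_variable(outcome)] += p
    return dict(table)

# Two fair coins, X = number of heads.
coins = {"".join(s): Fraction(1, 4) for s in product("HT", repeat=2)}
heads = pmf(coins, lambda outcome: outcome.count("H"))
print(heads[0], heads[1], heads[2])   # 1/4 1/2 1/4

# Two fair dice, S = sum of the two outcomes.
dice = {(i, j): Fraction(1, 36) for i, j in product(range(1, 7), repeat=2)}
total = pmf(dice, lambda outcome: outcome[0] + outcome[1])
print(total[2], total[7], total[12])  # 1/36 1/6 1/36
```

The probability of at least one head is then heads[1] + heads[2] = 3/4, as stated on the PMF slides.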

93
Example
  • Probability of having HW grade larger than 30 ?
  • Prob(G > 30) = prob(G = 31) + ... + prob(G = 40)
  • Each probability: count number of outcomes,
    divide by total sample space size

94
Expectation
  • The PMF of a random variable provides us with
    several numbers: the probabilities of all
    possible values of X
  • We would like to summarize this in few numbers
    that represent the PMF
  • One such number is the EXPECTATION

95
Expectation
  • Expected value of X = weighted average of all
    possible values of X (using probabilities as
    weights)

96
Expectation
  • Suppose you roll a dice many times, and each time
    you receive as many dollars as the outcome of the
    dice-roll
  • How much money would you expect for each roll ?
  • We need to specify these terms

97
Expectation
  • Suppose you roll the dice K times, and $K_i$ is
    the number of times the outcome is i
  • Sample space: {1,2,3,4,5,6}
  • The total amount of money you receive is
    $1 K_1 + 2 K_2 + \dots + 6 K_6$

98
Expectation
  • The total amount in K rolls is
    $\sum_{i=1}^{6} i K_i$
  • So the amount per roll is
    $\frac{1}{K} \sum_{i=1}^{6} i K_i$

99
Expectation
  • If we have been rolling the dice many times (K
    is very large), we can approximate the
    probability of an outcome with its frequency:
    $p_i \approx K_i / K$
  • Then we can write the expected amount of money
    as $\sum_{i=1}^{6} i \, p_i$

100
Expectation
  • We define the expected value (expectation, or
    mean) of a random variable X, with PMF $p_X$, by
    $E[X] = \sum_x x \, p_X(x)$

101
Expectation
  • Remark: we can consider this as the center of
    gravity of the distribution
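
The dollars-per-roll argument can be checked numerically. A minimal Python sketch, assuming a fair dice; the seed and the number of rolls K are arbitrary choices for the illustration.

```python
import random
from fractions import Fraction

die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf):
    """Weighted average of the values, with probabilities as weights."""
    return sum(x * p for x, p in pmf.items())

print(expectation(die_pmf))   # 7/2, i.e. 3.5 dollars per roll

# Frequency view: the average payoff over many simulated rolls.
rng = random.Random(0)        # fixed seed so the run is repeatable
K = 100_000
rolls = [rng.randint(1, 6) for _ in range(K)]
average_payoff = sum(rolls) / K
print(average_payoff)         # close to 3.5 when K is large
```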

102
Variance
  • Another important quantity to describe the PMF.
  • Expectation: we know the average behavior of
    the random variable
  • But how much does the random variable deviate
    from the average behavior ?

103
Variance
  • Let us create a NEW random variable describing
    the deviation of X from its mean $E[X]$, and let
    us study it
  • What is the expected value of the random variable
    $(X - E[X])^2$ ?

104
Variance
  • New random variable $(X - E[X])^2$
  • Its expectation
    $E[(X - E[X])^2] = \mathrm{Var}(X)$ is called
    the variance of X
  • It is always nonnegative
  • Provides a measure of dispersion of X around its
    mean

105
Variance
106
Variance
  • Another related measure of dispersion is the
    standard deviation of X, defined as the square
    root of the variance
  • From a practical viewpoint, the STD is easier to
    use because it has the same units as X (i.e. if
    X is in meters, STD will be in meters, Var(X) in
    square meters)

107
Calculation of Variance
  • Can just study the expectation of the R.V.
    $Z = (X - E[X])^2$
  • X
  • Z
  • $\mathrm{Var}(X) = E[Z]$

108
Expected Value of Functions of Random Variables
  • Let X be a random variable with PMF p(x), and let
    g(X) be a function of X
  • → The expected value of the random variable g(X)
    is $E[g(X)] = \sum_x g(x) \, p(x)$

109
Variance
  • So the variance can be calculated as
    $\mathrm{Var}(X) = \sum_x (x - E[X])^2 \, p_X(x)$

110
Properties of Mean and Variance
  • Let X be a random variable and let us consider
    the linear function Y = aX + b, where a, b are
    given scalars. Then
  • $E[Y] = a E[X] + b$
  • $\mathrm{Var}(Y) = a^2 \mathrm{Var}(X)$
  • THIS ONLY if g(X) is linear !!
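
Both properties, and the warning about non-linear functions, can be checked on the dice. A minimal Python sketch; the scalars a = 3, b = 2 are hypothetical values chosen for the illustration.

```python
from fractions import Fraction

pmf_X = {x: Fraction(1, 6) for x in range(1, 7)}   # fair dice

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

# Linear function Y = aX + b (a, b are hypothetical example values).
a, b = 3, 2
pmf_Y = {a * x + b: p for x, p in pmf_X.items()}
print(mean(pmf_Y), a * mean(pmf_X) + b)    # 25/2 25/2
print(var(pmf_Y), a ** 2 * var(pmf_X))     # 105/4 105/4

# Non-linear function g(X) = X^2: E[g(X)] is NOT g(E[X]).
second_moment = sum(x ** 2 * p for x, p in pmf_X.items())
print(second_moment, mean(pmf_X) ** 2)     # 91/6 49/4
```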

111
A useful relation (variance as a function of
moments)
  • $\mathrm{Var}(X) = E[(X - E[X])^2]$
  • $\mathrm{Var}(X) = E[X^2] - (E[X])^2$
  • Proof: SEE LATER SLIDES FOR FULL PROOF. Use
    the relation
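
One standard way to obtain the second form from the first is to expand the square and use the linearity of expectation, treating $E[X]$ as a constant. This is a sketch of the usual argument and not necessarily the proof given on the later slides.

```latex
\begin{aligned}
\mathrm{Var}(X) &= E\big[(X - E[X])^2\big] \\
                &= E\big[X^2 - 2X\,E[X] + (E[X])^2\big] \\
                &= E[X^2] - 2\,E[X]\,E[X] + (E[X])^2 \\
                &= E[X^2] - (E[X])^2
\end{aligned}
```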

114
A useful relation
  • $\mathrm{Var}(X) = E[(X - E[X])^2]$
  • $\mathrm{Var}(X) = E[X^2] - (E[X])^2$
  • Proof: either as HW or with the TAs. Use the
    relation

115
Variance Calculation
  • $\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$

We will use this a lot
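
Both forms of the variance give the same number, which is easy to confirm on the fair dice. A minimal Python sketch; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair dice

mean = sum(x * p for x, p in pmf.items())                      # E[X]   = 7/2
second_moment = sum(x ** 2 * p for x, p in pmf.items())        # E[X^2] = 91/6

var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
var_moments = second_moment - mean ** 2

print(var_definition, var_moments)   # 35/12 35/12
print(sqrt(var_moments))             # standard deviation, about 1.71
```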
116
Covariance of 2 RVs
  • In probability theory and statistics, covariance
    is a measure of how much two variables change
    together (variance is a special case of the
    covariance when the two variables are identical).
  • If two variables tend to vary together (that is,
    when one of them is above its expected value,
    then the other variable tends to be above its
    expected value too), then the covariance between
    the two variables will be positive. On the other
    hand, when one of them is above its expected
    value the other variable tends to be below its
    expected value, then the covariance between the
    two variables will be negative.
  • (from Wikipedia)

117
Covariance of 2 RVs
  • The covariance between two real-valued random
    variables X and Y, with expected values E(X) = m
    and E(Y) = n, is defined as
  • $\mathrm{Cov}(X, Y) = E[(X - m)(Y - n)]$
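
A minimal Python sketch of this definition, assuming a small hypothetical list of (x, y) pairs that are all equally likely. Note that this follows the expectation-based definition and divides by the number of pairs; a sample covariance routine may use a different normalization.

```python
from fractions import Fraction

# Hypothetical data: four equally likely (x, y) pairs.
pairs = [(1, 2), (2, 4), (3, 5), (4, 9)]

def covariance(pairs):
    """Cov(X, Y) = E[(X - m)(Y - n)] with equally likely pairs."""
    count = len(pairs)
    m = Fraction(sum(x for x, _ in pairs), count)   # E(X)
    n = Fraction(sum(y for _, y in pairs), count)   # E(Y)
    return sum((x - m) * (y - n) for x, y in pairs) / count

print(covariance(pairs))                        # 11/4, X and Y vary together
print(covariance([(x, x) for x, _ in pairs]))   # 5/4, Cov(X, X) = Var(X)
```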

118
In Matlab
  • COV Covariance matrix.
  • COV(X), if X is a vector, returns the variance.
    For matrices, where each row is an observation,
    and each column a variable, COV(X) is the
    covariance matrix. DIAG(COV(X)) is a vector of
    variances for each column, and
    SQRT(DIAG(COV(X))) is a vector of standard
    deviations.
  • COV(X,Y), where X and Y are matrices with the
    same number of elements, is equivalent to
    COV([X(:) Y(:)]).

119
Correlation Coefficient
From wikipedia
120
Correlation Coefficient Between 2 Random Variables
  • CORRCOEF Correlation coefficients.
  • R = CORRCOEF(X) calculates a matrix R of
    correlation coefficients for an array X, in
    which each row is an observation and each column
    is a variable.
  • R = CORRCOEF(X,Y), where X and Y are column
    vectors, is the same as R = CORRCOEF([X Y]).
  • If C is the covariance matrix, C = COV(X), then
    CORRCOEF(X) is the matrix whose (i,j)'th element
    is C(i,j)/SQRT(C(i,i)*C(j,j)).

121
(No Transcript)
122
EXTRA MATERIAL BELOW THIS POINT
  • WHAT FOLLOWS IS EXTRA MATERIAL FOR REFERENCE
  • Not covered in class 1 of week 2 (refers to
    class 2 of week 2)

123
(No Transcript)
124
(No Transcript)
125
(No Transcript)
126
Bernoulli Random Variable
  • Consider the toss of a (generally not fair) coin,
    with P(H) = p and P(T) = 1 - p
  • The BERNOULLI random variable is a RV that takes
    the two values 1 or 0 depending on whether the
    outcome is H or T (remember: a RV is a function
    of the outcome)
  • X = 1 if outcome is H, X = 0 if outcome is T

127
Bernoulli Random Variable
  • The PMF of this Bernoulli RV is
  • $p_X(x)$ = p if x = 1, 1 - p if x = 0
  • Very important RV in modeling any generic
    situation with just 2 outcomes, e.g. outcome of
    the football match on Sunday
128
Binomial Random Variable
  • Experiment: n coin tosses, each one with
    prob(H) = p, prob(T) = 1 - p
  • The random variable X is the number of heads in
    the n-toss sequence
  • We refer to X as a BINOMIAL RANDOM VARIABLE WITH
    PARAMETERS n AND p

129
Binomial Random Variable
  • The PMF of X consists of the binomial
    probabilities we have seen some time ago:
    $p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$
  • Two parts
  • probability of a sequence with k heads and n-k
    tails: $p^k (1-p)^{n-k}$
  • Number of sequences with k heads and n-k tails:
    $\binom{n}{k}$

130
Binomial Random Variable
  • The normalization property can be written as
    $\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1$
  • We will study this more in the future
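
The binomial PMF and its normalization can be checked for a small case. A minimal Python sketch, assuming the standard form of the binomial probabilities; with n = 3 and p = 1/2 it reproduces the three-coin example from earlier in the deck.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """(number of sequences with k heads) x (probability of one such sequence)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 3, Fraction(1, 2)
pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

print(pmf[0], pmf[1], pmf[2], pmf[3])   # 1/8 3/8 3/8 1/8
print(sum(pmf.values()))                # 1 (normalization)
```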

131
Geometric Random Variable
  • We repeatedly toss the same coin as before.
  • RV: number of tosses needed for the first head to
    come up
  • TTTTTTTH
  • TTH
  • H
  • TTTTTTTTTTTTTTTTTTTTTTTTTTTTH

132
Geometric Random Variable
  • PMF: two parts, probability of the prefix of
    k-1 tails, and probability of the final H:
    $p_X(k) = (1-p)^{k-1} p$, for k = 1, 2, ...
  • Normalization:
    $\sum_{k=1}^{\infty} (1-p)^{k-1} p = 1$
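
A minimal Python sketch of the geometric PMF, assuming X counts the tosses up to and including the first head (a prefix of k-1 tails followed by H). The infinite normalization sum is only approximated here by a partial sum; p = 1/2 is an example value.

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """Probability of k-1 tails followed by the first head."""
    return (1 - p) ** (k - 1) * p

p = Fraction(1, 2)
print(geometric_pmf(1, p))   # 1/2, sequence H
print(geometric_pmf(3, p))   # 1/8, sequence TTH

# Partial sum over k = 1..20; what is missing is exactly (1 - p)^20.
partial_sum = sum(geometric_pmf(k, p) for k in range(1, 21))
print(float(partial_sum))    # very close to 1
```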

133
Geometric Random Variable
  • This can model the process of you trying to
    connect with the modem to an internet service
    provider (how many fails before 1 success ?)

134
Poisson Random Variable
137
Expectation
  • Expected value of X = weighted average of all
    possible values of X (using probabilities as
    weights)
  • Next time we will develop this and other
    concepts

138
Conclusion
  • Random Variables
  • Probability Mass Functions
  • How to calculate PMFs
  • Bernoulli
  • Binomial
  • Geometric
  • Poisson ?

139
(No Transcript)
141
PMF
  • Example experiment: tossing 2 fair coins
  • Random variable X: number of heads obtained
    (range {0,1,2})
  • Each outcome has probability ¼
  • → hence the probabilities are
  • Event X = 0 → corresponding outcome: TT
  • Event X = 1 → corresponding outcomes: HT or TH
  • Event X = 2 → corresponding outcome: HH
142
PMF
  • Compute the PMF of X
  • $p_X(x)$ = ¼ if x = 0, ½ if x = 1, ¼ if x = 2,
    0 otherwise (impossible)
149
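Mean and Variance of the Bernoulli RV
  • $E[X] = 1 \cdot p + 0 \cdot (1-p) = p$
  • $E[X^2] = 1^2 \cdot p + 0^2 \cdot (1-p) = p$
  • $\mathrm{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)$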
150
Uniform Distribution: dice roll
  • see later slides

159
Two Important Series
  • $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$ and $\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$
  • We do not derive them here. We will apply these
    to calculations of variance

161
Uniform Distribution: dice roll
  • Discrete uniform PMF over $[a, b]$ (the case of dice
    rolls): $p_X(k) = \frac{1}{b-a+1}$ for $k = a, a+1, \ldots, b$, and 0 otherwise

162
Uniform
The expectation is $E[X] = \frac{a+b}{2}$. This can be seen
directly, since the PMF is symmetric around
$(a+b)/2$. Or use the series given before.
Dice example: $1+2+3+4+5+6 = 21$. Direct computation of
expectation: $21/6 = 3.5$. Formula says $(1+6)/2 = 3.5$.

163
Variance of Discrete Uniform
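  • We first study the case where $a = 1$, $b = n$; the general
    case will reduce to this
  • We will use the relation $\mathrm{Var}(X) = E[X^2] - (E[X])^2$

$E[X^2] = \frac{1}{n} \sum_{k=1}^{n} k^2 = \frac{(n+1)(2n+1)}{6}$. One can verify this by induction, or just believe it.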
164
Variance of Discrete Uniform
$\mathrm{Var}(X) = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n^2-1}{12}$. Notice we are still working with the special case
$a = 1$, $b = n$.
165
Variance of Discrete Uniform
  • Now we can study the general case: by SHIFTING a
    distribution, its variance does not change (so we
    can study the $[a, b]$ case by studying the variance of
    the $[1, b-a+1]$ case)
  • So setting $n = b-a+1$ in the previous equation
    gives the general case: $\mathrm{Var}(X) = \frac{(b-a+1)^2 - 1}{12} = \frac{(b-a)(b-a+2)}{12}$

166
Variance of Discrete Uniform
  • Example: I get 1 dollar for each point on the die. I
    can expect 3.5 dollars at each roll, with a
    standard deviation of $\sqrt{35/12} \approx 1.7$
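The dice numbers can be reproduced by direct summation over the PMF. This is a sketch only; the helper name `discrete_uniform_stats` is a hypothetical choice, and the result is compared with the closed form (n^2 - 1)/12 with n = b - a + 1.

```python
from math import sqrt

def discrete_uniform_stats(a, b):
    """Mean and variance of the discrete uniform distribution on a, a+1, ..., b."""
    values = range(a, b + 1)
    n = len(values)
    mean = sum(values) / n                                # E[X]
    variance = sum(v**2 for v in values) / n - mean**2    # E[X^2] - (E[X])^2
    return mean, variance

mean, variance = discrete_uniform_stats(1, 6)
print(mean)            # 3.5
print(sqrt(variance))  # roughly 1.7, i.e. sqrt(35/12)
```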

170
QUESTION
  • There are 94 students
  • Each has probability 1/3 to get an A
  • The number of students that get an A is a random
    variable
  • What is its mean ? (how many are expected to get
    an A)

171
Mean of the Binomial
  • If we want the mean of the binomial, we first
    need to learn how to handle JOINT PMFs of
    MULTIPLE RANDOM VARIABLES

172
JOINT PMFs of MULTIPLE RANDOM VARIABLES
  • Consider 2 discrete random variables, X and Y
    associated with the same experiment
  • The probabilities of the values that X and Y can
    take, are captured by the JOINT PMF of X and Y,
    written $p_{X,Y}$
  • $p_{X,Y}(x,y) = P(X = x, Y = y)$

173
JOINT PMF of 2 RV
  • (if we consider the pair X,Y as a random
    variable, all ideas transfer )
  • If A is an event (set of pairs (x,y) that have a
    certain property) then

$P((X,Y) \in A) = \sum_{(x,y) \in A} p_{X,Y}(x,y)$
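As an illustration, here is a sketch of a joint PMF stored as a dictionary, with the probability of an event obtained by summing over the pairs it contains. The two-dice example and the names `joint` and `prob` are assumptions made for this sketch, not taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Joint PMF of two fair dice (X, Y): every pair (x, y) has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def prob(event):
    """P((X, Y) in A): sum the joint PMF over the pairs that satisfy `event`."""
    return sum(p for (x, y), p in joint.items() if event(x, y))

print(prob(lambda x, y: x + y == 7))  # 1/6
```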
174
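students
  • Consider the random variable $X_i$ that is 1 if
    student i gets an A, and 0 otherwise
  • The number of A's is $X = X_1 + \cdots + X_n$, so
    $E[X] = E[X_1] + \cdots + E[X_n]$
  • With n students, each with probability p, this is np
    (for 94 students and $p = 1/3$: $94/3 \approx 31.3$)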
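A rough simulation of the class example, summing one indicator variable per student. The seed, the number of simulated classes, and the helper name `number_of_as` are arbitrary assumptions for this sketch.

```python
import random

n, p = 94, 1 / 3
exact_mean = n * p  # linearity of expectation: E[X_1] + ... + E[X_n] = n * p

rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility

def number_of_as():
    """One simulated class: the sum of n indicator variables X_i."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

runs = 2000
estimate = sum(number_of_as() for _ in range(runs)) / runs
print(exact_mean, estimate)  # the estimate should land near 31.3
```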

175
Conclusion
  • Mean of Random Variables
  • Variance of Random Variables
  • Properties, relations for variance and moments
  • Bernoulli
  • Discrete Uniform
  • General Methods for variance calculation

177
topics
  • Some probability distributions
  • Some real applications: decision making, modeling
    clashes between ants
  • Modeling the distribution of ping times

178
Marginalization
  • For a fixed value y,
  • Using the definition of conditional probability,
    we have

179
Random Variables
  • Joint probability
  • Conditional probability
  • Independence

180
Joint Probability
  • It is common for several random variables to be
    defined on the same sample space. If X and Y are
    random variables, the function
    $f(x, y) = \Pr\{X = x \text{ and } Y = y\}$
    is the joint probability mass function of X and Y.

181
Independent Random Variables
  • We define two random variables X and Y to be
    independent if for all x and y, the events $X = x$
    and $Y = y$ are independent or, equivalently, if
    for all x and y, we have
    $\Pr\{X = x \text{ and } Y = y\} = \Pr\{X = x\} \Pr\{Y = y\}$.

182
Functions of Random Variables
  • Given a set of random variables defined over the
    same sample space, one can define new random
    variables as sums, products, or other functions
    of the original variables.

183
Expected value of a random variable
  • The simplest and most useful summary of the
    distribution of a random variable is the
    "average" of the values it takes on. The expected
    value (or, synonymously, expectation or mean) of
    a discrete random variable X is
    $E[X] = \sum_x x \Pr\{X = x\}$

184
Expectation of joint RVs
  • Given random variables X and Y, and given their
    joint PMF $\Pr\{X = x \text{ and } Y = y\}$, what is the
    expectation of their product, $E[XY]$?
  • Easy if they are independent: $E[XY] = E[X] E[Y]$

185
Expectation of Joint Independent RVs
186
In general
  • In general, when n random variables $X_1, X_2, \ldots, X_n$
    are mutually independent,
    $E[X_1 X_2 \cdots X_n] = E[X_1] E[X_2] \cdots E[X_n]$.

187
More about independent RVs
  • When X and Y are independent random variables,
  • $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$.
  • (whereas for ANY random variables the expectation
    of the sum is the sum of their expectations, that
    is,
  • $E[X + Y] = E[X] + E[Y]$)
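Both properties can be checked exactly on a small case. The sketch below assumes two independent fair dice; the helpers `expect` and `var` are hypothetical names introduced for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 36)  # joint PMF of two independent fair dice

def expect(f):
    """E[f(X, Y)] under the joint PMF."""
    return sum(f(x, y) * p for x, y in product(faces, repeat=2))

def var(f):
    """Var[f(X, Y)] = E[f^2] - (E[f])^2."""
    return expect(lambda x, y: f(x, y) ** 2) - expect(f) ** 2

e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)
print(expect(lambda x, y: x * y) == e_x * e_y)  # True
print(var(lambda x, y: x + y) == var(lambda x, y: x) + var(lambda x, y: y))  # True
```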

188
The Geometric Distribution
  • A coin flip is an instance of a Bernoulli trial,
    which is defined as an experiment with only two
    possible outcomes: success, which occurs with
    probability p, and failure, which occurs with
    probability $q = 1 - p$.
  • When we speak of Bernoulli trials collectively,
    we mean that the trials are mutually independent
    and that each has the same probability p for
    success.
  • Two important distributions arise from Bernoulli
    trials: the geometric distribution and the
    binomial distribution.

189
Geometric Distribution
  • Take a sequence of Bernoulli trials, each with a
    probability p of success and a probability $q = 1 - p$
    of failure.
  • How many trials occur before we obtain a success?

190
Geometric Distribution
  • Let the random variable X be the number of trials
    needed to obtain a success. Then X has values in
    the range $\{1, 2, \ldots\}$, and $\Pr\{X = k\} = q^{k-1} p$
    (for $k \geq 1$), since
    we have k - 1 failures before the one success.
  • A probability distribution satisfying this
    equation is said to be a geometric distribution.

191
Geometric Distribution
  • This is the geometric distribution (picture
    taken from Cormen, Leiserson and Rivest's book on
    algorithms)
  • In this case, the coin has probability $p = 1/3$ of
    success and a probability $q = 1 - p$ of failure

192
Geometric distribution
  • Expectation: we can use the relation
    $\sum_{k=0}^{\infty} k x^k = \frac{x}{(1-x)^2}$
  • That holds when the summation is infinite and $|x| < 1$

193
Geometric Distribution
The expectation of the distribution is $1/p = 3$.
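For reference, one way to reach 1/p is sketched below. It assumes the standard series $\sum_{k \ge 0} k x^k = x/(1-x)^2$ for $|x| < 1$, applied with $x = q$.

```latex
\begin{align*}
E[X] &= \sum_{k=1}^{\infty} k \, q^{k-1} p
      = \frac{p}{q} \sum_{k=0}^{\infty} k \, q^{k} \\
     &= \frac{p}{q} \cdot \frac{q}{(1-q)^2}
      = \frac{p}{p^2}
      = \frac{1}{p}
\end{align*}
```

With $p = 1/3$ this gives $E[X] = 3$, matching the value above.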
194
Geometric Distribution
  • The variance, which can be calculated similarly,
    is $\mathrm{Var}[X] = q/p^2$
  • Example: repeatedly roll two dice until we
    obtain either a seven or an eleven. Of the 36
    possible outcomes, 6 yield a seven and 2 yield an
    eleven. Thus, the probability of success is
    $p = 8/36 = 2/9$, and we must roll $1/p = 9/2 = 4.5$
    times on average to obtain a seven or eleven.
  • NEXT WEEK we will implement things like this.
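A possible simulation of the two-dice example, as a sketch only. The seed, the number of trials, and the function name `rolls_until_seven_or_eleven` are assumptions made for illustration.

```python
import random

def rolls_until_seven_or_eleven(rng):
    """Roll two dice until the sum is 7 or 11; return the number of rolls used."""
    rolls = 0
    while True:
        rolls += 1
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total in (7, 11):
            return rolls

rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
trials = 100_000
average = sum(rolls_until_seven_or_eleven(rng) for _ in range(trials)) / trials
print(average)  # should be close to 1/p = 4.5
```

The sample average fluctuates from seed to seed, so it should only be expected to approach 4.5 as the number of trials grows.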

195
BINOMIAL DISTRIBUTION
  • How many successes occur during n Bernoulli
    trials, where a success occurs with probability p
    and a failure with probability $q = 1 - p$?

196
Binomial Distribution
  • Define the random variable X to be the number of
    successes in n trials. Then X has values in the
    range $\{0, 1, \ldots, n\}$, and for $k = 0, \ldots, n$,
    $\Pr\{X = k\} = \binom{n}{k} p^k q^{n-k}$
  • since there are $\binom{n}{k}$ ways to pick which k of
    the n trials are successes, and the probability
    that each occurs is $p^k q^{n-k}$. A probability
    distribution satisfying this equation is said to
    be a binomial distribution.

197
Binomial Distribution
  • Let $X_i$ be the random variable describing the
    number of successes in the ith trial. Then
    $E[X_i] = p \cdot 1 + q \cdot 0 = p$, and by linearity of expectation,
    the expected number of successes for n trials is
    $E[X] = E[\sum_{i=1}^{n} X_i] = \sum_{i=1}^{n} E[X_i] = np$

198
Binomial Distribution
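  • We can proceed similarly for the variance, exploiting
    the relation $\mathrm{Var}[X] = E[X^2] - E^2[X]$
  • Since $X_i$ only takes on the values 0 and 1, we
    have $E[X_i^2] = E[X_i] = p$
  • And hence $\mathrm{Var}[X_i] = p - p^2 = pq$.
  • Then we can use independence to move from
    $\mathrm{Var}[X_i]$ to the variance of the binomial:
    $\mathrm{Var}[X] = \sum_{i=1}^{n} \mathrm{Var}[X_i] = npq$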

199
Binomial Distribution
  • The binomial distribution increases as k runs
    from 0 to n until it reaches the mean np, and
    then it decreases.
  • Picture from Cormen, Leiserson and Rivest's book

200
Binomial Distribution
201
Conclusion
  • Conditional PMF in RVs
  • Independence
  • Expectation and Variance for RVs
  • Geometric distribution
  • Binomial distribution
  • → next we will implement all of these ideas

202
  • EXTRA MATERIAL (NOT COVERED IN CLASS)

203
Cards
♠ ♥ ♦ ♣
Ace 2 3 4 5 6 7 8 9 10 Jack Queen King
205
Counting
  • Probability of generating a growing sequence of
    cards (1, 2, 3, 4, 5, 6, 7, 8, 9, ...)
  • Probability of starting with a 1 × probability of
    having a 2 × ... × probability of having a king

206
COUNTING METHODS
  • How many ways to obtain K heads and N-K tails in
    N coin tosses ?
  • How many ways to have a 4-of-a-kind ?

207
Basic Counting
  • Two experiments are performed. The first one can
    have any one of N possible outcomes, the second
    one any of M possible outcomes.
  • → there are M × N possible outcomes for the two
    experiments considered together

208
Basic Counting
  • How many different arrangements of the letters
    A,B,C are possible ?
  • ABC, ACB, BAC, BCA, CAB, CBA
  • Each arrangement is known as a PERMUTATION.
  • There are 6 possible permutations of a set of 3
    objects
  • There are N! permutations of a set of N
    objects: $N! = N(N-1)(N-2) \cdots 3 \cdot 2 \cdot 1$
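The six arrangements can be listed with the Python standard library, as a quick check of the N! count:

```python
from itertools import permutations
from math import factorial

arrangements = ["".join(p) for p in permutations("ABC")]
print(arrangements)  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(len(arrangements) == factorial(3))  # True
```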

209
Combinations
  • How many different groups of M objects can I form
    from a total of N objects? (e.g. how many groups
    of 5 cards can I form from a deck of 52?)
  • (there are 52 ways to select the first, 51 to
    select the second, ... but we are counting each
    group each time we see one of its possible
    orderings; we need to correct for this)
  • $(52 \cdot 51 \cdot 50 \cdot 49 \cdot 48)/(5 \cdot 4 \cdot 3 \cdot 2 \cdot 1) = 2{,}598{,}960$
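The correction for orderings can be spelled out in a few lines. This sketch uses `math.perm` and `math.comb`, which are available from Python 3.8; the variable names are illustrative.

```python
from math import comb, factorial, perm

ordered = perm(52, 5)             # 52 * 51 * 50 * 49 * 48 ordered selections
groups = ordered // factorial(5)  # each group of 5 was counted 5! times
print(groups, groups == comb(52, 5))  # 2598960 True
```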

210
Combinations
  • Ways of choosing k elements out of a set of n
    elements: $\binom{n}{k} = \frac{n!}{k!(n-k)!}$

211
Combinations and Permutations
  • How many ways to put N balls in K boxes ?
  • OOO11O1O1OOO11 → example
  • 1 is the boundary of a box → we will use $G = K-1$
    1s
  • O is a ball → we will use N Os
  • $(N+G)!$ orderings of all the symbols
  • Correct for permutations of the 1s and of the Os
  • $(N+G)!/(N! \, G!)$
  • If we define $M = N+G$
  • $M!/((M-G)! \, G!)$, same as before

In the example: $G = 6$, $K = 7$, $N = 8$, $M = 14$
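A short sketch of the balls-in-boxes count, checked against brute-force enumeration on a small case. Both function names are hypothetical, and the brute-force version is only practical for small inputs.

```python
from itertools import product
from math import comb

def ways_balls_in_boxes(n_balls, k_boxes):
    """Identical balls in distinct boxes: C(N + G, G) with G = K - 1 boundaries."""
    g = k_boxes - 1
    return comb(n_balls + g, g)

def brute_force(n_balls, k_boxes):
    """Count tuples of box occupancies that sum to n_balls (small inputs only)."""
    return sum(1 for counts in product(range(n_balls + 1), repeat=k_boxes)
               if sum(counts) == n_balls)

print(ways_balls_in_boxes(8, 7))                       # 3003
print(ways_balls_in_boxes(3, 3) == brute_force(3, 3))  # True
```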
212
COUNTING METHODS
  • Combinations VS permutations
  • How many sets of 3 numbers out of 10 ?
  • How many ordered sets of 3 numbers ?

213
Pascal's Triangle
214
Pascal's Triangle
215
Binomial Coefficient and Pascal's Triangle
  • A number in the triangle can be found by nCr (n
    choose r), where n is the number of the row and r
    is the position of the element in that row. For example, in row
    3, 1 is the zeroth element, 3 is element number
    1, the next 3 is the 2nd element, and the
    last 1 is the 3rd element. The formula for nCr
    is
  • $nCr = \frac{n!}{r!(n-r)!}$
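A small sketch that builds a row of the triangle by repeated addition of neighbours and compares it with nCr. The function name `pascal_row` is an illustrative choice.

```python
from math import comb

def pascal_row(n):
    """Row n of Pascal's triangle, with rows and elements counted from zero."""
    row = [1]
    for _ in range(n):
        # Each new entry is the sum of the two entries above it.
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

print(pascal_row(3))  # [1, 3, 3, 1]
print(pascal_row(3) == [comb(3, r) for r in range(4)])  # True
```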

216
Examples
  • How many ways to select 5 cards from the deck ?
  • How many ways to have 4 equal cards in a set of 5
    ?
  • Probability of selecting 5 cards containing a
    poker ?

217
Poker Probabilities
  • Deck of 52 cards, ranked: ace, king, queen, jack,
    10, 9, 8, 7, 6, 5, 4, 3, 2 (and ace again: it can be
    either high or low)
  • 4 suits: spades, hearts, diamonds and clubs
  • 5-card draw: 5 cards make up a poker hand
  • The highest hand wins
  • Hands are ranked as follows

218
Poker Probabilities
  • Royal flush → 10, J, Q, K, A of the same suit
  • Four of a kind → 4 cards of the same RANK
  • Full house → 3 cards of the same rank + 2 cards of
    another rank
  • Flush → 5 cards of the same suit

219
Poker Probabilities
  • How many poker hands? $\binom{52}{5} = 2{,}598{,}960$

220
Poker Probabilities
  • How many combinations of royal flush? 4
    (probability $4/2{,}598{,}960 \approx 0.00000154$)
  • How many combinations of 4-of-a-kind? $13 \times 48 = 624$
221
  • Consider a number of experiments with poker cards
  • Write down SAMPLE SPACE
  • Count possible outcomes for each experiment (see
    book, or handouts)

222
Kind of questions
  • Probability of having King of ? at first draw ?
  • Probability of having 4 kings ?
  • Probability of having any set of 4 equal cards ?
  • When we ask you to write the sample space for the 5-card
    experiment, we do not mean to list all of the
    outcomes (there are about 2.6 million), just to
    show you know what the sample space is, e.g. all
    hands of 5 cards, or 2S, 2C, 2D,
    2H, 3S, ..., KS, KC, KD, KH, AC, ...

223
How to do the homework
  • Always write down probabilistic model
  • Use one of the 3 formulae we have for COUNTING the
    number of events of a certain type, or of
    outcomes
  • Use definitions like P(event) = (number of outcomes in
    event) / (number of possible outcomes)

224
Combinations
  • Ways of choosing k elements out of a set of n
    elements: $\binom{n}{k} = \frac{n!}{k!(n-k)!}$
  • HOW MANY COMMITTEES OF 5 PEOPLE CAN WE MAKE OUT
    OF A CLASS OF 10 PEOPLE ?
