Title: Basic Probability Theory
1. Berhanu Tameru, Center for Computational Epidemiology, Bioinformatics and Risk Analysis, Tuskegee University
Berhanu Tameru, CCEBRA, Tuskegee University
2. General Risk Modelling Approach
3. Risk Analysis
- Process Initiation
- Risk Assessment: science based
- Risk Management: policy based
- Risk Communication: interactive exchange of information and opinions concerning risks
4. Risk Analysis
- Process Initiation
- Risk Assessment: science based
- Risk Management: policy based
- Problem Identification Triad
- Problem Management Triad, Risk Mitigations
- Risk Communication: interactive exchange of information and opinions concerning risks
5. What is Risk Assessment?
- Part of the process of Risk Analysis
- A scientifically based process composed of 4 steps: hazard identification, exposure assessment, hazard characterization, and risk characterization
- An estimate of the probability of occurrence and severity of the consequences of exposure to a pathogen on human health
6. Risk Assessment
7. Terminology: Hazards
- Hazard: potential to cause disease/injury/harm/damage/loss
8. How is Risk Assessment done?
Risk Assessment consists of answering these questions:
- 1. What can go wrong that could lead to an outcome of hazard exposure?
- 2. How likely is this to happen?
- 3. If it happens, what consequences are expected?
9. Question 1: What can go wrong that could lead to an outcome of hazard exposure?
- Given a system/process with defined goals/methodologies, failures of components/procedures can occur.
- The process of identifying all possible hazards is called Hazard Analysis.
- The process of defining the possible scenarios that lead to the outcomes or events of interest is called Scenario Analysis.
- The graphic depiction of all the events (failures of procedures/components) that lead to the outcome of hazard exposure is called an Event Tree, Scenario Tree, or Risk Pathway Tree.
10. Terminology: Uncertainty
- Uncertainty refers to lack of knowledge about specific factors, parameters, or models. It includes:
- Parameter uncertainty (measurement, sampling, and systematic errors)
- Model uncertainty (due to necessary simplification of real-world processes, mis-specification of model structure, model misuse, use of inappropriate surrogate variables)
- Scenario uncertainty (descriptive errors, aggregation errors, errors in professional judgment, incomplete analysis)
- Further measurement or study reduces uncertainty.
11. Terminology: Variability
- Variability refers to differences attributable to true heterogeneity or diversity in a population or parameter. Sources are a result of natural random processes and stem from environmental/lifestyle/genetic differences among humans, animals, plants, cells, or microbes. Examples of variability include:
- Plant, animal or human physiological variation (age, body weight, height, blood pressure, heart rate, drinking water intake rates)
- Climatic variability, variation in soil types, differences in contaminant concentrations
- Further measurement or study reduces uncertainty and better characterizes variability.
12. Decision-making under Uncertainty
- The purpose of risk assessment is to help decision-makers make informed decisions about regulations.
- Risk assessments should inform decision-makers about the level of risk associated with contemplated or proposed regulatory options.
- Risk assessments should also inform decision-makers about the level/degree of uncertainty surrounding the risk values presented.
13. Decision-making under Uncertainty
- Science-based decision making depends on several analyses (risk, economic, environmental).
- The validity, quality and objectivity of these analyses is essential.
- The data and assumptions in all analytical documents used to support decision-making must be consistent.
- Decision makers need to understand and have confidence in these analyses.
- The analyses should stand up to scrutiny.
14. Why Probability?
15. RISK ASSESSMENT MODEL
- QUALITATIVE
- QUANTITATIVE: Deterministic (point estimate) or Probabilistic (range, distribution)
16. IMPORT RISK ANALYSIS (RA) MODEL
- COUNTRY A FROM MULTIPLE COUNTRIES IMPORT RA
- COUNTRY A TO COUNTRY B IMPORT RA
- GENERIC IMPORT RA
17. DETERMINISTIC
EXAMPLE: If on average a cow gives 5 L of milk per day, how much milk can you harvest in 30 days?
5 L per day $\times$ 30 days = 150 L
Both the input and the output are expressed as single values.
INPUT (point values: 5 L per day, 30 days) → RISK ASSESSMENT → OUTPUT (point value: 150 L)
18. Scenario Analysis
What if the cow ...
In real problems there are many factors affecting a process, which implies that the various states of a phenomenon occur with some uncertainty and variability.
19. PROBABILISTIC
INPUT (probability distribution) → RISK ASSESSMENT → OUTPUT (probability distribution)
- When uncertainty and variability are present, the chance of any outcome occurring is described by probability distributions.
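The contrast between a point estimate and a distribution can be sketched with a small Monte Carlo simulation of the milk example. This is only an illustration under stated assumptions: the normal distribution for daily yield, its standard deviation of 1 L, and the function name `simulate_total_milk` are not from the slides.

```python
import random
import statistics

def simulate_total_milk(days=30, mean_l=5.0, sd_l=1.0, runs=10_000, seed=1):
    """Return simulated milk totals in litres, one per Monte Carlo run.

    Assumption: daily yield is independent and normally distributed
    (mean 5 L, sd 1 L); negative draws are clipped to zero.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        total = sum(max(0.0, rng.gauss(mean_l, sd_l)) for _ in range(days))
        totals.append(total)
    return totals

totals = simulate_total_milk()
point_estimate = 5.0 * 30  # deterministic model: a single value
cuts = statistics.quantiles(totals, n=20)  # 19 cut points: 5%, 10%, ..., 95%

print(f"deterministic point estimate: {point_estimate:.0f} L")
print(f"simulated mean:               {statistics.mean(totals):.1f} L")
print(f"5th to 95th percentile:       {cuts[0]:.1f} to {cuts[-1]:.1f} L")
```

The deterministic model returns one number, while the probabilistic model returns a range of plausible totals around it.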
20.
- Probability is conventionally expressed on a scale from 0 to 1: a rare event has a probability close to 0, a very common event has a probability close to 1.
- An outcome is the result of an experiment or other situation involving uncertainty.
- The set of all possible outcomes of a probability experiment is called a sample space, which is usually denoted by $S$.
- Any subset of the sample space is an event.
21. Classical Definition
- If an event $E$ can occur in $r$ mutually exclusive, equally likely ways, then the relative frequency is given by $r/n$, where
- $r$ is the number of times the event $E$ occurs, and
- $n$ is the total number of times an experiment is carried out.
- The probability of an event $E$, denoted by $P(E)$, can be thought of as the limiting value of its relative frequency when the experiment is carried out many times.
22. Example 1
- Experiment (a posteriori)
- Tossing a fair coin 50 times ($n = 50$)
- Event $E$ = 'heads'
- Result: 30 heads, 20 tails, so $r = 30$
- Relative frequency: $r/n = 30/50 = 3/5 = 0.6$
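The idea that probability is the limiting value of the relative frequency can be illustrated by simulating coin tosses. This is a minimal sketch; the function name `relative_frequency` and the fixed seed are choices made for the example, not part of the slides.

```python
import random

def relative_frequency(n, p_heads=0.5, seed=0):
    """Toss a simulated coin n times and return r/n, the share of heads."""
    rng = random.Random(seed)
    r = sum(1 for _ in range(n) if p_heads > rng.random())
    return r / n

# With few tosses r/n can be far from 0.5 (as with 30/50 = 0.6);
# with many tosses it settles close to the true probability.
for n in (50, 500, 5_000, 50_000):
    print(n, relative_frequency(n))
```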
23. Example 2
- Classical (a priori)
- A herd of 500 animals has 250 infected animals.
- The total possible number of outcomes is 500 ($n = 500$).
- Event $E$ = infected animal
- The number of ways event $E$ can occur is 250 ($r = 250$), so $P(E) = r/n = 250/500 = 0.5$.
24. Subjective Probability
- A subjective probability describes an individual's personal judgment about how likely a particular event is to occur. It is not based on any precise computation but is often a reasonable assessment by a knowledgeable person.
- A person's subjective probability of an event describes his/her degree of belief in the event.
25.
- The empty set $\emptyset$ is an impossible event: $P(\emptyset) = 0$.
- $S$ (the sample space) is an event that is certain to occur, thus $P(S) = 1$.
- Rules of probability:
- 1. $0 \le P(x) \le 1$ for all values of $x$.
- 2. $\sum_{\text{all } x} P(x) = 1$
26. Combining probability
- Set theory is used to represent relationships among events.
- In general, if $A$ and $B$ are two events in the sample space $S$, then
- $A \cup B$ ($A$ union $B$) = 'either A or B occurs or both occur'
[Figure: Venn diagram of $A \cup B$]
27.
- $A \cap B$ ($A$ intersection $B$) = 'both A and B occur'
[Figure: Venn diagram of $A \cap B$]
- Independent events: two events are independent if the occurrence of one of the events gives us no information about whether or not the other event will occur; that is, the events have no influence on each other. Thus, $P(A \cap B) = P(A)\,P(B)$.
28. Mortality and product rule example
- Assume that the mortality rate of animals in a certain herd due to a disease is 15%. Thus, the probability that an animal will die from this disease is 0.15 and that it will survive is 0.85. If you buy two animals from this herd, what is the probability that both will die? (Assume that they are independent.)
- Let $A$ be the event that the first animal dies and $B$ the event that the second animal dies. So we want to find $P(A \cap B)$.
$P(A \cap B) = P(A)\,P(B) = 0.15 \times 0.15 = 0.0225$
We can extend this to any number of animals. Let $p = 0.15$; then for two animals we have $p \cdot p = p^2$, for three animals $p \cdot p \cdot p = p^3$, and in general for $n$ animals we have $p^n$.
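The product rule for independent events can be sketched as a one-line function. The name `prob_all_die` is hypothetical; the value 0.15 is the mortality probability from the example.

```python
def prob_all_die(p, n):
    """Probability that all n independent animals die, each with probability p."""
    return p ** n

for n in (1, 2, 3, 5):
    print(n, round(prob_all_die(0.15, n), 6))
```

For $n = 2$ this reproduces the 0.0225 computed in the example.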
29. Additive Rule of Probability
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
- If $A$ and $B$ are mutually exclusive events (i.e. if one occurs the other cannot), then $A \cap B = \emptyset$ (the empty set) and $P(A \cap B) = 0$,
- hence $P(A \cup B) = P(A) + P(B)$
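The additive rule can be checked by enumerating a small sample space. The die-roll events below are an illustration chosen for this sketch; they do not appear in the slides.

```python
from fractions import Fraction

sample_space = set(range(1, 7))                      # one roll of a fair six-sided die
A = {x for x in sample_space if x % 2 == 0}          # event A: the roll is even
B = {x for x in sample_space if x >= 4}              # event B: the roll is at least 4

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(sample_space))

lhs = prob(A | B)                                    # P(A union B) by direct counting
rhs = prob(A) + prob(B) - prob(A & B)                # additive rule
print(lhs, rhs)
```

Both sides come out to 2/3, because the outcomes 4 and 6 belong to both events and must not be counted twice.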
30.
- $A'$ (the complement of $A$) = 'event A does not occur'
- $P(A') = 1 - P(A)$
- If we wanted to calculate the probability that there is at least one (i.e. one or more) infected animal among a group of size $n$, selected at random from a herd with prevalence $p$, then we proceed as follows:
- We want to find the complement of the event 'none is infected'. The probability that a randomly selected animal is not infected is $1 - p$.
- From the mortality and product rule example, the probability that none of the $n$ animals is infected is $(1 - p)^n$.
- The probability that at least one of the $n$ animals is infected is $1 - (1 - p)^n$.
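The "at least one" calculation can be sketched as follows. The function name `prob_at_least_one` and the prevalence and group sizes used in the loop are hypothetical values for illustration.

```python
def prob_at_least_one(p, n):
    """P(at least one infected among n animals) = 1 - (1 - p)^n.

    Assumes each animal is infected independently with probability p.
    """
    return 1 - (1 - p) ** n

# Hypothetical prevalence of 5% and increasing group sizes.
for n in (1, 5, 10, 50):
    print(n, round(prob_at_least_one(0.05, n), 4))
```

Even at a low prevalence, the chance of including at least one infected animal rises quickly with group size.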
31. Conditional Probability
- How the outcome of one event is influenced by that of another event.
- The conditional probability that event $A$ occurs given that event $B$ occurs can be calculated by dividing the probability that both $A$ and $B$ occur by the probability that $B$ occurs, that is
- $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
- $P(A \cap B) = P(B)\,P(A \mid B)$ or
- $P(A \cap B) = P(A)\,P(B \mid A)$
[Figure: Venn diagram of the sample space $S$ showing $A$, $B$ and $A \cap B$]
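32. Thus, we get Bayes' Theorem:
$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
And in general, we have
$\Pr(A_i \mid B) = \dfrac{\Pr(A_i)\,\Pr(B \mid A_i)}{\sum_{\text{all } j} \Pr(B \mid A_j)\,\Pr(A_j)}$
- Prior probability: $\Pr(A_i)$
- Likelihood: $\Pr(B \mid A_i)$
- Posterior probability: $\Pr(A_i \mid B)$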
33. Bayes' Theorem
- $A$ = person has cancer, $p(A) = 0.1$ (prior)
- $B$ = person is a smoker, $p(B) = 0.5$
- $p(B \mid A) = 0.8$ (likelihood)
- What is $p(A \mid B)$ (posterior), i.e. the probability that the person has cancer, given that the person is a smoker?
- $p(A \mid B) = \dfrac{p(B \mid A)\,p(A)}{p(B)} = \dfrac{0.8 \times 0.1}{0.5}$
So $p(A \mid B) = 0.16$
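The cancer and smoker example can be reproduced with a short sketch of Bayes' theorem. The function names are hypothetical, and the value of $p(B \mid \text{not } A)$ used for the general form is not given in the slides; it is derived here from the stated numbers so that $p(B) = 0.5$.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem for one hypothesis: P(A|B) = P(B|A) P(A) / P(B)."""
    return likelihood * prior / evidence

def posterior_from_partition(priors, likelihoods, i):
    """General form: the evidence is summed over all hypotheses A_j."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[i] * likelihoods[i] / evidence

print(round(posterior(prior=0.1, likelihood=0.8, evidence=0.5), 4))  # 0.16

# Derived assumption: P(B | not A) chosen so that P(B) = 0.8*0.1 + P(B|not A)*0.9 = 0.5.
p_b_given_not_a = (0.5 - 0.8 * 0.1) / 0.9
print(round(posterior_from_partition([0.1, 0.9], [0.8, p_b_given_not_a], 0), 4))
```

Both forms give the same posterior, since the sum in the denominator of the general form is just $p(B)$.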
34. We wish to know the probability that an animal will be infected, $D^+$, given that it passes, $T^-$, a specific veterinary check.
Objective: to find $\Pr(D^+ \mid T^-)$.
35. From Bayes' Theorem
- $\Pr(D^+ \mid T^-) = \dfrac{P(T^- \mid D^+)\,P(D^+)}{P(T^- \mid D^+)\,P(D^+) + P(T^- \mid D^-)\,P(D^-)} = \dfrac{(1 - Se)\,p}{(1 - Se)\,p + Sp\,(1 - p)}$
- where $p = P(D^+)$ is the true prevalence, and thus $P(D^-) = 1 - p$
- $P(T^+ \mid D^+)$ is the sensitivity of the test, $Se$; hence $P(T^- \mid D^+)$ = false negative = $1 - Se$
- $P(T^- \mid D^-)$ is the specificity of the test, $Sp$; hence $P(T^+ \mid D^-)$ = false positive = $1 - Sp$
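The probability that an animal is infected despite a negative test can be sketched directly from the formula in terms of prevalence, sensitivity and specificity. The function name and the numeric values (10% prevalence, Se = 0.90, Sp = 0.95) are hypothetical.

```python
def prob_infected_given_negative(p, se, sp):
    """Pr(D+ | T-) = (1 - Se) p / ((1 - Se) p + Sp (1 - p))."""
    false_negative = (1 - se) * p      # infected and test negative
    true_negative = sp * (1 - p)       # not infected and test negative
    return false_negative / (false_negative + true_negative)

# Hypothetical test characteristics, for illustration only.
print(round(prob_infected_given_negative(p=0.10, se=0.90, sp=0.95), 4))
```

A perfectly sensitive test (Se = 1) drives this probability to zero, which is a quick sanity check on the formula.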
36. Homework: from the Big Book, pages 212-214, 218 and 219
37. The Probability Distribution
38. Counting
- Experiments?
- Coin toss, die roll, DNA sequencing
- Basic Principle of Counting
- For $r$ experiments
- The first experiment can have $n_1$ possible outcomes, the second experiment $n_2$, the third $n_3$, ...
- The total number of possible outcomes is $n_1 \cdot n_2 \cdot n_3 \cdots n_r$
39. Factorials
- $m! = m(m-1)(m-2)(m-3)\cdots 3 \cdot 2 \cdot 1$
- $5! = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$
- $0! = 1$
- $1! = 1$
- $100! = 100 \cdot 99 \cdot 98 \cdots 3 \cdot 2 \cdot 1 \approx 9 \times 10^{157}$
40. Permutations
- Groups of ordered arrangements of things
- How many 3-letter permutations of the letters a, b, c are there?
- abc, acb, bac, bca, cba, cab: 6 total
- Can use the Basic Principle of Counting: $3 \cdot 2 \cdot 1 = 6$
General Formula
$_nP_r = \dfrac{n!}{(n-r)!} = n(n-1)(n-2)\cdots(n-(r-1))$
- $n$ = total number of things
- $r$ = size of the groups you are taking
- $r \le n$
- $3!/(3-3)! = 6$
41. Permutations
- For example: find the number of permutations of 7 from 10.
- $_{10}P_7 = \dfrac{10!}{(10-7)!} = \dfrac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = 604800$
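The permutation count can be checked with a short sketch. The helper `n_permutations` is a hypothetical name; `math.perm` and `itertools.permutations` are from the Python standard library (Python 3.8 or later for `math.perm`).

```python
import math
from itertools import permutations

def n_permutations(n, r):
    """Number of ordered arrangements of r things taken from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(n_permutations(3, 3))                         # 3-letter permutations of a, b, c
print(n_permutations(10, 7))                        # permutations of 7 from 10
print(math.perm(10, 7))                             # same count from the standard library
print(["".join(p) for p in permutations("abc")])    # explicit listing
```

Dividing $10! = 3628800$ by $3! = 6$ gives 604800 for the permutations of 7 from 10.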
42. Permutations
- Remember: in permutations, order is important!
- A permutation is a way of determining the number of elements in the set of outcomes of an experiment.
- Elements in a permutation are selected from a set without replacement.
43. Permutations
- What if some of the things are identical?
- How many permutations of the letters a, a, b, c, c, c are there?
$\dfrac{n!}{n_1!\,n_2!\cdots n_r!}$
where $n_1, n_2, \ldots, n_r$ are the numbers of objects that are alike.
6 letters: 2 a's, 3 c's and one b. Hence
$6!/(3!\,2!) = 60$
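The count for arrangements with identical items can be verified both by formula and by brute force. The function name `distinct_arrangements` is hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(items):
    """n! divided by the factorial of each group of identical items."""
    total = factorial(len(items))
    for count in Counter(items).values():
        total //= factorial(count)
    return total

letters = "aabccc"
print(distinct_arrangements(letters))        # formula: 6! / (2! 3! 1!)
print(len(set(permutations(letters))))       # brute force: distinct orderings only
```

Both approaches give 60 for the letters a, a, b, c, c, c.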
44. Combinations
- Groups of things (order doesn't matter)
- How many 3-letter combinations of the letters a, b, c are there? 1: abc
- How many 2-letter combinations of the letters a, b, c are there? 3: ab, ac, bc
- ab = ba, ac = ca, bc = cb: order doesn't matter
45. Combinations
General Formula
$_nC_k = \binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$ ("n choose k")
- $n$ = total number of things
- $k$ = size of the groups you are taking
- $k \le n$
- Remember: in combinations, order means nothing!
- Select: combination
- Order: permutation
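The two letter examples for combinations can be reproduced with the standard "n choose k" formula. The helper name `n_combinations` is hypothetical; `math.comb` and `itertools.combinations` are standard-library functions.

```python
import math
from itertools import combinations

def n_combinations(n, k):
    """Number of unordered groups of size k from n things: n! / (k! (n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(n_combinations(3, 3), n_combinations(3, 2))      # 1 3
print(math.comb(3, 2))                                 # 3
print(["".join(c) for c in combinations("abc", 2)])    # ['ab', 'ac', 'bc']
```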
46.
- Random variable: a function that assigns one and only one numerical value to each simple event of an experiment.
- Random variables that can assume a countable number of values are called discrete.
- Random variables that can assume values corresponding to any of the points contained in one or more intervals are called continuous.
47. Random Variables
- Types of variables (defined by state type)
- Discrete labeled, discrete numeric
- Continuous
- Finite or infinite
- If $X$ is a random variable with discrete states $x_1, x_2, \ldots, x_n$
48. Random Variables
Summary statistics for discrete RVs
- Expected value (mean or arithmetic average): $E(X) = \sum_i x_i\,P(X = x_i)$
- Variance: $V(X) = \sum_i (x_i - \mu)^2\,P(X = x_i)$, where $\mu = E(X)$
- Median (state value where cumulative probability = ½)
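The summary statistics of a discrete random variable can be sketched using the standard definitions of expected value, variance and median. The fair-die distribution and the function names below are assumptions made for the example.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all states x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = sum of (x - mu)^2 * P(X = x), with mu = E(X)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def median(pmf):
    """Smallest state whose cumulative probability reaches 1/2."""
    cumulative = 0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= Fraction(1, 2):
            return x

# Hypothetical example: one roll of a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die), variance(die), median(die))
```

Exact fractions are used so that the results (7/2 and 35/12 for the die) are free of rounding error.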
49. Section 5: Special Discrete Distributions
- In this workshop we study the probabilistic models (probability distributions) of several discrete random variables. These random variables may serve to model a vast number of applications in many diverse sciences. Among these distributions are:
- The Bernoulli distribution.
- The binomial distribution.
- The Poisson distribution and Poisson process.
- The geometric distribution.
- The negative-binomial (Pascal) distribution.
- The hypergeometric distribution.
50. Bernoulli Trials
- The Bernoulli trials process is named after James Bernoulli. Essentially, the process is the mathematical abstraction of coin tossing, but because of its wide applicability, it is usually stated in terms of a sequence of generic trials that satisfy the following assumptions:
- Each trial has two possible outcomes, generically called success and failure.
- The trials are independent. Intuitively, the outcome of one trial has no influence over the outcome of another trial.
- On each trial, the probability of success is $p$ and the probability of failure is $1 - p$.
51. Important Distribution
52. BINOMIAL DISTRIBUTION
53. Binomial Process
- A binomial experiment or process is a sequence of Bernoulli trials with the following characteristics:
- The experiment consists of $n$ identical trials.
- Each trial results in one of two outcomes (a success or a failure).
- The probability of success on a single trial is equal to $p$ and remains the same from trial to trial.
- The trials are independent (not influenced by the results of previous trials).
- The interest is in $x$, the number of successes observed in $n$ trials, for $x = 0, 1, 2, \ldots, n$.
54. Binomial Distribution
- The probability of $x$ successes in $n$ trials of an experiment, where the probability of success in each trial, $p$, is constant, is equal to the product of
- the number of different ways of getting $x$ successes in $n$ trials, $\binom{n}{x}$, and
- the probability of $x$ successes, $p^x$, and
- the probability of $n - x$ failures, $(1-p)^{n-x}$:
$P(X = x) = \binom{n}{x}\,p^x\,(1-p)^{n-x}$
To determine the probability of no successes:
$P(X = 0) = (1-p)^n$
55. The probability of at least one success is the complement of no success, i.e. 1 - probability of no success.
$P(X \ge 1) = 1 - P(X = 0) = 1 - (1-p)^n$
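The binomial probabilities described above can be sketched with the standard binomial mass function, built as the product of the number of ways, the probability of the successes and the probability of the failures. The function names and the values $n = 10$, $p = 0.15$ are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): ways to place x successes, times p^x, times (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def prob_at_least_one_success(n, p):
    """Complement of no success: 1 - P(X = 0) = 1 - (1 - p)^n."""
    return 1 - binomial_pmf(0, n, p)

n, p = 10, 0.15   # hypothetical values for illustration
for x in range(4):
    print(x, round(binomial_pmf(x, n, p), 4))
print("at least one:", round(prob_at_least_one_success(n, p), 4))
```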
56. Number of trials needed to achieve $s$ successes
Sometimes we know how many successes $s$ we wish to have, we know the probability $p$, and would like to know the number of trials that we will have to complete in order to achieve the $s$ successes, assuming we stop once the $s$-th success has occurred.
Let $x$ be the total number of failures; then the total number of trials is $(x + s)$, and by trial $(x + s - 1)$ we must have achieved $(s - 1)$ successes, with the $s$-th success occurring on the final trial. The probability of $(s - 1)$ successes in $(x + s - 1)$ trials is given by the binomial distribution; multiplying it by $p$ for the final success gives
$P(X = x) = \binom{x+s-1}{s-1}\,p^s\,(1-p)^x = P_{NegBin}(X = x)$
$n = s + x = s + \text{NegBin}(s, p)$
57. The pmf of Negative Binomial
The pmf of the negative binomial r.v. $X$ = number of trials until we observe the $r$-th success, with parameters $r$ = number of successes and $p$ = probability of success, is
$P(X = x) = \binom{x-1}{r-1}\,p^r\,(1-p)^{x-r}, \quad x = r, r+1, \ldots$
$E(X) = r/p$, $V(X) = r(1-p)/p^2$
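The negative binomial probabilities can be checked numerically. The sketch below counts failures before the $s$-th success and then adds $s$ to obtain the number of trials; the function name and the values $s = 3$, $p = 0.4$ are hypothetical, and the infinite sum is truncated at 200 failures.

```python
from math import comb

def negbin_pmf(x, s, p):
    """P(x failures occur before the s-th success)."""
    return comb(x + s - 1, s - 1) * p ** s * (1 - p) ** x

s, p = 3, 0.4                 # hypothetical values for illustration
xs = range(200)               # truncation; the remaining tail is negligible here

total = sum(negbin_pmf(x, s, p) for x in xs)
mean_trials = sum((x + s) * negbin_pmf(x, s, p) for x in xs)

print(round(total, 6))                     # probabilities sum to 1
print(round(mean_trials, 6), s / p)        # mean number of trials vs r/p
```

The mean number of trials agrees with $E(X) = r/p$ when $X$ counts trials rather than failures.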
58. Binomial Process for unknown parameters
- IF the assumptions of the binomial process can be satisfied, AND two of the values $n$, $p$ or $x$ are known, THEN the third value can be estimated from the following distributions:
- The binomial distribution is used to model the number of successes $x$: $x = \text{Binomial}(n, p)$
- The beta distribution is used to model the probability of success $p$: $p = \text{Beta}(x + 1, n - x + 1)$
- The negative binomial distribution is used to model the number of trials $n$ undertaken before $x$ successes have occurred: $n = x + \text{Negative binomial}(x, p)$
59. Geometric and Negative Binomial Distributions
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (iid) Geom($p$).
A negative binomial random variable can be represented as a sum of geometric random variables.
60. HYPERGEOMETRIC DISTRIBUTION
61. Hypergeometric Distribution
- The hypergeometric distribution returns the number of items with a particular characteristic when $n$ items are sampled without replacement from a population of $M$ items, of which $D$ are known to have this characteristic.
62. The equation for the hypergeometric distribution is
$P(X = x) = \dfrac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}$
where $x$ = number of defective items in the selected sample, $n$ = number of items sampled, $D$ = number of defective items in the population, and $N$ = total population size (written $M$ on the other slides).
HYPGEOMDIST is used in sampling without replacement from a finite population.
63. Hypergeometric Distribution Example
- Suppose 1000 animals are to be imported in a consignment and 50 are to be tested for a pathogen. We estimate that if any are infected, at least 7 will be infected because of cross-infection by the time the consignment arrives at port.
64. Hypergeometric Distribution
- $M$ (total) = 1000
- $D$ (infected) = 7
- $n$ (selected) = 50
65. Hypergeometric Distribution Example
- If there is any infection in the consignment, the minimum number of infected animals that will be tested is estimated as follows:
$X = \text{Hypergeo}(50, 7, 1000)$
66. Hypergeometric Distribution Example
[Figure: the Hypergeometric(50, 7, 1000) distribution]
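The Hypergeometric(50, 7, 1000) example can be evaluated with the standard hypergeometric mass function. This is a sketch under the example's stated numbers; the function name `hypergeom_pmf` and the argument order (sample size, number infected, population size) are assumptions that follow the slide notation.

```python
from math import comb

def hypergeom_pmf(x, n, D, M):
    """P(X = x) when n items are drawn without replacement from M, D of which are infected."""
    return comb(D, x) * comb(M - D, n - x) / comb(M, n)

n, D, M = 50, 7, 1000   # sample size, infected animals, consignment size

p_none = hypergeom_pmf(0, n, D, M)
print(f"P(no infected animal in the sample) = {p_none:.4f}")
print(f"P(at least one infected in sample)  = {1 - p_none:.4f}")
for x in range(4):
    print(x, round(hypergeom_pmf(x, n, D, M), 4))
```

The probability of sampling no infected animal is the chance that testing 50 animals would miss an infected consignment entirely.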
67. Hypergeometric Distribution
- Suppose a company has a stock of 2000 tiles which is known to contain 70 faulty tiles. The tiles, unfortunately, are all mixed. A customer orders 800 tiles. The probability of getting faulty tiles (0 to 70) can be calculated from
- $X = \text{Hypergeometric}(800, 70, 2000)$
- where $X$ = number of faulty tiles (0-70), $n = 800$ (customer order), $D = 70$ (total faulty tiles), $M = 2000$ (total no. of tiles)
68. Hypergeometric Distribution
- The binomial distribution approximates the hypergeometric distribution when the sample size $n$ is small relative to the population size $M$.
69. The Poisson Distribution
70. The Poisson Process
- A Poisson process has the following characteristics:
- It models the number of events $x$ that occur in an interval $t$ of time or space.
- It is characterized by one parameter, lambda ($\lambda$), the average number of events per unit interval of space or time ($\lambda = 1/\beta$), where $\beta$ is the mean interval between events.
- There is a constant and continuous probability of an event occurring per unit interval.
- The number of events that occur in any one interval is independent of the number that occur in any other interval. It does not matter how far apart the events are in space or time. For instance, an event may have only just been observed or there may have been a considerable interval between them.
- The interval $t$ is measured in either space (per L, kg, km) or time (per hour/day/year).
71. Poisson Distribution
- Average number of events that occur per unit of exposure = $\lambda$
- Random variable $X$ = number of events that occur in $t$ units of exposure.
- Probability mass function: $\Pr(x \mid \lambda, t) = \dfrac{(\lambda t)^x e^{-\lambda t}}{x!}$
- $x = 0, 1, 2, \ldots$
- $\lambda t > 0$
- Mean number of events in $t$ units of exposure = $\lambda t$
- Variance = $\lambda t$
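The Poisson mass function and its mean and variance can be checked numerically. The function name `poisson_pmf` and the rate and exposure values are hypothetical, and the infinite sum is truncated at 60 events.

```python
import math

def poisson_pmf(x, lam, t):
    """P(x events in t units of exposure) with average rate lam per unit."""
    mean = lam * t
    return mean ** x * math.exp(-mean) / math.factorial(x)

lam, t = 3.0, 2.0     # hypothetical: 3 events per unit of exposure, 2 units
probs = [poisson_pmf(x, lam, t) for x in range(60)]

mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
print(round(sum(probs), 6), round(mean, 6), round(var, 6))
```

Both the mean and the variance come out equal to $\lambda t$, as stated for the Poisson distribution.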
72. Poisson Distribution, cont.
$\Pr(x \mid \lambda, t) = \dfrac{(\lambda t)^x e^{-\lambda t}}{x!}$, with $x = 0, 1, 2, \ldots$ and $\lambda t > 0$
- The outcome is the number of events in an amount of exposure. Events do not have an equivalent of failure and success. An event might be earthquakes in 1 year; there is no meaningful definition of a non-event.
- The medium of exposure is continuous (e.g., time, volume, etc.), not discrete like a trial in the binomial.
- There is no upper bound to the random variable, as there is in the binomial.
- Note that the syntax is different in Excel and @RISK.
73. Poisson Distribution, syntax in Excel, @RISK
- Syntax used in most texts: $\Pr(x \mid \lambda, t) = \dfrac{(\lambda t)^x e^{-\lambda t}}{x!}$
- where $\lambda$ = average number of events per unit exposure, e.g. 3 infestations per year
- $\beta$ = average interval between events = $1/\lambda$
- Syntax in Excel: POISSON(x, expected value = $\lambda t$, 0)
- and in @RISK: Poisson(expected value = $\lambda t$)
74. Poisson Process distributions
- The Gamma distribution is used to model
- a distribution of $\lambda$, the average number of events per unit interval: $\lambda = \text{Gamma}(x, 1/t)$
- a distribution of the time until the next $x$ events have occurred: $t_x = \text{Gamma}(x, 1/\lambda)$
- When $x = 1$, this is the exponential: $t_{next} = \text{Gamma}(1, 1/\lambda) = \text{Expon}(1/\lambda)$
- Thus the lower bound for $\beta$ is also given by $1/t_{next} = 1/\text{Gamma}(1, 1/\lambda) = 1/\text{Expon}(1/\lambda)$
75. Poisson Process for unknown parameters
- IF the assumptions of the Poisson process can be satisfied:
$x = \text{Poisson}(\lambda t)$
$\lambda = \text{Gamma}(x, 1/t)$
$t_x = \text{Gamma}(x, 1/\lambda)$
76. The Poisson Process
- Examples of quantities modelled by a Poisson process:
- The number of bacteria per liter of water (per area of dish).
- The number of outbreaks of disease per year.
- The number of earthquakes per decade.
77. The Poisson Process
- Historical information indicates an outbreak occurs on average every 24 months: $\beta = 24$, $\lambda = 1/24 \approx 0.04$ outbreaks per month. The number of outbreaks in the next 6 months is then modelled as Poisson($6 \times 0.04$).
- If we observed 3 outbreaks over a period of 36 months, we could estimate the average number of outbreaks per month as $\lambda = \text{Gamma}(x, 1/t) = \text{Gamma}(3, 1/36)$.
- Probability of at least one event in an interval: $P(x \ge 1) = 1 - \exp(-t/\beta)$.
- What is the probability of at least one outbreak during the next 6 months, given that the mean interval between outbreaks is 24 months?
- $P(x \ge 1) = 1 - \exp(-6/24) \approx 0.22$
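The outbreak calculations can be reproduced with a short sketch. It assumes that Gamma(x, 1/t) denotes a gamma distribution with shape x and scale 1/t, which matches the parameter order of `random.gammavariate(alpha, beta)`; the variable names, the seed and the sample size are choices made for the example.

```python
import math
import random

beta = 24.0          # mean interval between outbreaks, in months
lam = 1.0 / beta     # average number of outbreaks per month
t = 6.0              # length of the period of interest, in months

def poisson_pmf(x, mean):
    """P(exactly x events) for a Poisson distribution with the given mean."""
    return mean ** x * math.exp(-mean) / math.factorial(x)

p_none = poisson_pmf(0, lam * t)
p_at_least_one = 1.0 - math.exp(-t / beta)
print(round(p_at_least_one, 2))   # 0.22

# Uncertainty about the rate after observing 3 outbreaks in 36 months,
# assuming Gamma(shape=3, scale=1/36).
rng = random.Random(42)
lam_samples = [rng.gammavariate(3, 1 / 36) for _ in range(20_000)]
lam_mean = sum(lam_samples) / len(lam_samples)
print(round(lam_mean, 3))         # sample mean of the rate; close to 3/36
```

The Poisson probability of zero outbreaks and the exponential expression describe the same event, so `p_none + p_at_least_one` equals 1.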
78. INTRODUCTION TO TOXICOLOGY
WHAT TOXICOLOGISTS DO
- involved in the recognition, identification, and quantitation of hazard
- develop standards and regulations to protect health and the environment
- involved in safety assessment and use of data as a basis for regulatory control of hazards
- determine risk associated with use of chemicals
79. INTRODUCTION TO TOXICOLOGY
INTERRELATED COMPONENTS OF THE RISK ASSESSMENT
- chemical or physical agent
- biological system
- effect or response
- exposure situation
80. INTRODUCTION TO TOXICOLOGY
AREAS OF TOXICOLOGY (FIELDS OF SPECIALTY)
- descriptive
- mechanistic
- regulatory
- forensic
- clinical
- environmental
81. INTRODUCTION TO TOXICOLOGY
FACTORS CONSIDERED IN DETERMINING ACCEPTABLE RISK
- benefits
- availability of substitutes
- anticipated public use
- employment considerations
- economic considerations
- effects on environmental quality
- conservation of natural resources
82. INTRODUCTION TO TOXICOLOGY
MAJOR FACTORS THAT INFLUENCE TOXICITY
- route of administration
- duration and frequency of exposure
- dose or concentration
83. INTRODUCTION TO TOXICOLOGY
RAPIDITY OF RESPONSE WITH RESPECT TO ROUTE OF EXPOSURE
- intravenous
- inhalation
- intraperitoneal
- subcutaneous
- intramuscular
- intradermal
- topical
84. INTRODUCTION TO TOXICOLOGY
DOSE RESPONSE: ASSUMPTIONS
- the response is due to the chemical administered
- the response is related to the dose
- there is a receptor site with which the chemical interacts
85. INTRODUCTION TO TOXICOLOGY
DOSE RESPONSE: ASSUMPTIONS (contd)
- the degree of response is related to the concentration at the site
- the concentration at the site is related to the dose administered
- there is a quantifiable method of measuring and a precise means of expressing the toxicity