Title: Bernoulli Trials
1. Bernoulli Trials
A Bernoulli trial is an experiment with only two outcomes:
- Success: this event occurs with probability p
- Failure: this event occurs with probability q = 1 - p
A coin flip is the classic example of a Bernoulli trial.
In the context of finding a thymine or a C/T in DNA, it could mean:
- T is success, any other nucleotide is failure
- A pyrimidine is success, a purine is failure
When we refer to trials in the plural, it's assumed they are independent and that they have the same probabilities.
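As a rough illustration of the two DNA examples above, the sketch below labels each base of a sequence as a success or a failure. The function name `bernoulli_outcomes`, the random seed, and the simulated sequence (bases drawn independently and uniformly) are assumptions made for this example, not part of the original slides.

```python
import random

def bernoulli_outcomes(sequence, success_bases):
    """Label each base as a success (True) or a failure (False)."""
    return [base in success_bases for base in sequence]

# Simulated sequence: each base drawn independently and uniformly (an assumption).
rng = random.Random(0)
sequence = "".join(rng.choice("ACGT") for _ in range(1000))

t_outcomes = bernoulli_outcomes(sequence, {"T"})                 # T is success
pyrimidine_outcomes = bernoulli_outcomes(sequence, {"C", "T"})   # C or T is success

# Observed success fractions; for this simulated sequence they should
# land near p = 1/4 and p = 1/2 respectively.
print(sum(t_outcomes) / len(t_outcomes))
print(sum(pyrimidine_outcomes) / len(pyrimidine_outcomes))
```

The same sequence yields two different Bernoulli processes depending only on which bases count as a success.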
2. Geometric Distribution
The geometric distribution follows immediately from the idea of conducting multiple Bernoulli trials.
Q: What's the probability that it takes k trials to get a success?
- Before we can succeed at trial k, we must first have had k-1 failures!
- Each failure occurred with probability q, so there is a term $q^{k-1}$
- Finally, a single success occurs with probability p, so there is a term $p^1$
Because the trials are mutually independent, we can multiply these terms and write $\Pr(X = k) = q^{k-1} p$
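The formula above can be checked numerically with a few lines of plain Python. This is a minimal sketch; the function name `geometric_pmf` and the choice p = 0.2 are assumptions for illustration.

```python
def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
    q = 1.0 - p
    return q ** (k - 1) * p  # k-1 failures, then one success

p = 0.2
for k in range(1, 6):
    print(k, geometric_pmf(k, p))

# Summed over all possible k, the probabilities should come to (very nearly) 1.
total = sum(geometric_pmf(k, p) for k in range(1, 500))
print(total)
```

Each additional trial multiplies the probability by q, which is why the distribution decays geometrically.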
3. Geometric Distribution
[Figure: geometric distribution for p = 1/5. Image from Encyclopedia of Math.]
$\Pr(X = k) = q^{k-1} p$
What is the probability that the random variable X takes on the value k? X represents the number of trials required to get a success.
4. Binomial Distribution
The binomial distribution also arises naturally from the idea of conducting multiple Bernoulli trials.
Q: What's the probability that we'll get k successes in n trials?
- Again, trials are mutually independent, so we can write $q^{n-k} p^k$
BUT WAIT! We're not done!
5. Binomial Distribution
We also need to consider how many different ways we can generate those k successes from n trials.
Here we show all the ways you can get 3 Ts in a total of 5 nucleotides (the symbol V here is the non-T nucleotide ambiguity code, a failure in our Bernoulli trials):
'TTTVV', 'TTVTV', 'TTVVT', 'TVTTV', 'TVTVT', 'TVVTT', 'VTTTV', 'VTTVT', 'VTVTT', 'VVTTT'
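The enumeration above can be reproduced by choosing which positions hold the successes. The sketch below uses `itertools.combinations` for that; the helper name `arrangements` is an assumption for illustration.

```python
from itertools import combinations
from math import comb

def arrangements(n, k, success="T", failure="V"):
    """All length-n strings that contain exactly k successes."""
    results = []
    for positions in combinations(range(n), k):
        chars = [failure] * n
        for i in positions:
            chars[i] = success  # place a success at each chosen position
        results.append("".join(chars))
    return results

ways = arrangements(5, 3)
print(ways)

# The number of arrangements equals the binomial coefficient "n choose k".
print(len(ways), comb(5, 3))
```

Counting arrangements this way is exactly what the binomial coefficient does without listing them.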
6. Binomial Distribution
Putting it all together:
$\Pr(X = k) = \binom{n}{k} q^{n-k} p^k$
What is the probability that the random variable X takes on the value k? X represents the number of trials, k out of a total of n, that were successes.
One small fly in the ointment: DNA has four bases, not just two, so really we want a multinomial distribution, a generalization of the binomial distribution. But close enough for government work, eh?
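As a worked instance of the binomial formula, the sketch below computes the probability of seeing exactly 3 Ts in 5 nucleotides. The function name `binomial_pmf` and the value p = 0.25 (all four bases equally likely) are assumptions for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    q = 1.0 - p
    return comb(n, k) * q ** (n - k) * p ** k

# Exactly 3 Ts in 5 nucleotides, assuming each base is T with probability 0.25.
print(binomial_pmf(3, 5, 0.25))

# Sanity check: the probabilities over k = 0..n should sum to 1.
print(sum(binomial_pmf(k, 5, 0.25) for k in range(6)))
```

Under these assumptions the answer is 10 x 0.75^2 x 0.25^3, or roughly 0.088.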
7. Binomial Distribution
[Figure: binomial distribution histograms, including n = 100, p = 0.5, with a normal curve with the same mean and SD drawn over top. Image from zoonek2.free.fr]
What is the expected value of these distributions?
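One way to explore the question of the expected value is to compute it directly from the probabilities and compare against the standard closed forms np (mean) and sqrt(npq) (SD). Those closed forms are standard results rather than something stated on the slide, and the variable names below are assumptions for illustration.

```python
from math import comb, sqrt

n, p = 100, 0.5
q = 1 - p

# Binomial probabilities for k = 0..n.
pmf = [comb(n, k) * q ** (n - k) * p ** k for k in range(n + 1)]

# Mean and variance computed directly from their definitions.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
sd = sqrt(variance)

print(mean, n * p)            # numeric mean next to n*p
print(sd, sqrt(n * p * q))    # numeric SD next to sqrt(n*p*q)
```

These are the mean and SD that an overlaid normal curve would share with the histogram.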
8. Poisson Distribution
Another common limiting case of the binomial arises when we have large N and small p such that the expected (mean) value is a moderate number (roughly between 0 and 5-10). Then the distribution is close to a Poisson distribution.
[Figure: Binomial(10, 0.1) compared with Poisson(1)]
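To get a feel for how close the two distributions are, the sketch below compares binomial and Poisson probabilities with the same mean. The helper names and the cutoff `k_max` are assumptions for illustration; the Poisson formula used is the standard exp(-mean) * mean^k / k!.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * (1 - p) ** (n - k) * p ** k

def poisson_pmf(k, mean):
    return exp(-mean) * mean ** k / factorial(k)

def max_difference(n, p, k_max=20):
    """Largest gap between the two pmfs over k = 0..min(n, k_max)."""
    mean = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mean))
               for k in range(min(n, k_max) + 1))

# Same mean (1) in both cases; the second has larger N and smaller p.
print(max_difference(10, 0.1))
print(max_difference(1000, 0.001))
```

With the mean held fixed, the gap shrinks as N grows and p shrinks, which is the sense in which the Poisson is a limiting case.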
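9. Characteristics of Poisson
- Single parameter (mean): $\lambda = Np$
- $P(k \mid \lambda) = e^{-\lambda} \lambda^k / k!$
- Variance = Mean = $\lambda$; SD = $\sqrt{\lambda}$
- For $\lambda > 10$, the Normal approximation $N(\lambda, \lambda)$ is fine
[Figure: Poisson distributions for $\lambda = 3$ and $\lambda = 7$]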
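The claim that the variance equals the mean can be checked numerically. This is a minimal sketch; the function names and the truncation point `k_max` (beyond which the probabilities are negligible for these means) are assumptions for illustration.

```python
from math import exp, factorial, sqrt

def poisson_pmf(k, lam):
    """P(k | lam) = exp(-lam) * lam**k / k!"""
    return exp(-lam) * lam ** k / factorial(k)

def poisson_moments(lam, k_max=100):
    """Mean and variance computed from the pmf, truncated at k_max."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
    return mean, variance

for lam in (3, 7):
    mean, variance = poisson_moments(lam)
    print(lam, mean, variance, sqrt(variance))
```

For both parameter values the computed mean and variance should agree with the parameter itself to within rounding error.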
10. Scientific Computing in Python
SciPy, NumPy, Matplotlib ... and more
http://scipy.org
11. Scientific Computing in Python
NumPy implements very efficient low-level n-dimensional array processing and other basic numerical routines.
Our interest in NumPy is mostly restricted to the fact that both the SciPy library and Matplotlib depend on NumPy.
http://docs.scipy.org/doc/scipy-0.13.0/reference/
12. Scientific Computing in Python
SciPy is the name for both the whole ecosystem and a specific scientific computing library!
The SciPy library has many numerical algorithms, but also domain-specific toolboxes. Our interest is primarily in the statistics toolbox.
13. Distributions in scipy.stats
SciPy supports both continuous and discrete random variables and their associated distributions. Each distribution in turn supports a number of methods, e.g.:
- rvs: Random variates
- pdf: Probability Density Function (continuous distributions; discrete ones provide pmf instead)
- cdf: Cumulative Distribution Function
- sf: Survival Function (1 - CDF)
- ppf: Percent Point Function (inverse of CDF)
- isf: Inverse Survival Function (inverse of SF)
- stats: Return mean, variance, (Fisher's) skew, or (Fisher's) kurtosis
- moment: Non-central moments of the distribution
http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html
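The methods listed above can be tried out on a pair of "frozen" distributions. The sketch below uses `binom` for the discrete case and `norm` for the continuous one; the parameter values and variable names are assumptions for illustration.

```python
from scipy.stats import binom, norm

n, p = 100, 0.5
discrete = binom(n, p)                # frozen discrete distribution
continuous = norm(loc=50, scale=5)    # normal with mean 50 and SD 5

samples = discrete.rvs(size=5, random_state=0)   # rvs: random variates
density_at_mean = continuous.pdf(50)             # pdf (continuous only)
mass_at_mean = discrete.pmf(50)                  # pmf (discrete counterpart)
lower_tail = discrete.cdf(50)                    # cdf: Pr(X <= 50)
upper_tail = discrete.sf(50)                     # sf: Pr(X > 50) = 1 - cdf
median = discrete.ppf(0.5)                       # ppf: inverse of cdf
median_from_sf = discrete.isf(0.5)               # isf: inverse of sf
mean, variance = discrete.stats(moments="mv")    # stats: mean and variance
second_moment = discrete.moment(2)               # moment: non-central E[X^2]

print(samples, mass_at_mean, density_at_mean)
print(lower_tail, upper_tail, median, median_from_sf)
print(mean, variance, second_moment)
```

Freezing a distribution once and then calling its methods avoids repeating the parameters in every call.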
14. Distributions in Python
The scipy.stats library supports a rich collection of distributions and their methods. One example:

    from scipy.stats import binom

    n, p = 100, 0.5
    myList = []
    random_var = binom(n, p)   # binomial distribution with fixed n and p
    for k in range(100):       # range replaces Python 2's xrange
        myList.append(random_var.pmf(k))   # .pmf is the probability mass function

myList here will contain the probabilities associated with the first 100 values of k (k = 0 to 99), and if plotted would recapitulate the earlier binomial distribution histogram.
http://docs.scipy.org/doc/scipy/reference/stats.html
15. The matplotlib Python library
A very powerful tool for professional-quality plots.
Many usage examples are given in the documentation.
http://matplotlib.org/
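As one possible usage example, the sketch below plots binomial probabilities for n = 100, p = 0.5 as a bar chart. The axis labels, the non-interactive "Agg" backend, and the output filename `binomial_pmf.png` are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = 100, 0.5
ks = list(range(n + 1))
probabilities = binom(n, p).pmf(ks)   # pmf accepts a whole sequence of k values

fig, ax = plt.subplots()
ax.bar(ks, probabilities)             # one bar per value of k
ax.set_xlabel("k (number of successes)")
ax.set_ylabel("Pr(X = k)")
ax.set_title("Binomial distribution, n = 100, p = 0.5")
fig.savefig("binomial_pmf.png")       # hypothetical output filename
```

In an interactive session, `plt.show()` can be used instead of saving to a file, provided the backend line is removed.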