Title: DISTRIBUTIONS - e.g. MENDEL's PEAS
1. DISTRIBUTIONS - e.g. MENDEL's PEAS
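2. P.D.F./C.D.F.
- If X is a R.V. with a finite or countable set of possible outcomes x1, x2, ..., then the discrete probability distribution of X is $p_X(x_i) = P(X = x_i)$, and the D.F. or C.D.F. is $F_X(x) = P(X \le x) = \sum_{x_i \le x} p_X(x_i)$.
- Similarly, for X a R.V. taking any value along an interval of the real number line, $F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(u)\,du$.
- So if the first derivative exists, then $f_X(x) = dF_X(x)/dx$ is the continuous p.d.f., with $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.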
3. EXPECTATION/VARIANCE
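4. Moments and M.G.F.s
- For a R.V. X and any non-negative integer k, the kth moment about the origin is defined as the expected value of $X^k$: $\mu'_k = E[X^k]$.
- Central moments (about the mean): $\mu_k = E[(X - \mu)^k]$. The first is 0, i.e. $E[X] = \mu$; the second is the variance.
- To obtain moments, use the Moment Generating Function.
- If X has a p.d.f. f(x), the m.g.f. is the expected value of $e^{tX}$: $M_X(t) = E[e^{tX}]$.
- For a continuous variable, then $M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$.
- For a discrete variable, then $M_X(t) = \sum_i e^{t x_i} p(x_i)$.
- Generally, the rth moment of the R.V. is the rth derivative of the m.g.f. evaluated at t = 0.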
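As a rough numerical check of the derivative rule, the sketch below differentiates an m.g.f. at t = 0 by central finite differences. The exponential m.g.f. b/(b - t), the value b = 2 and the helper names are illustrative assumptions, not part of the slides.

```python
def mgf_exponential(t, b=2.0):
    """M.g.f. of an exponential variable with rate b (valid for t < b)."""
    return b / (b - t)

def first_moment(mgf, h=1e-4):
    """Approximate M'(0) with a central difference."""
    return (mgf(h) - mgf(-h)) / (2 * h)

def second_moment(mgf, h=1e-4):
    """Approximate M''(0) with a second-order central difference."""
    return (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / (h * h)

m1 = first_moment(mgf_exponential)    # close to 1/b
m2 = second_moment(mgf_exponential)   # close to 2/b**2
variance = m2 - m1 ** 2               # close to 1/b**2
print(m1, m2, variance)
```

Finite differences are used only to keep the sketch dependency-free; a symbolic derivative would give the moments exactly.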
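5. Examples - m.g.f.s
- Suppose X has the exponential p.d.f. $f(x) = b e^{-bx}$, $x \ge 0$.
- Then $M_X(t) = E[e^{tX}] = \int_0^{\infty} e^{tx}\, b e^{-bx}\,dx$,
- i.e. the m.g.f. for the exponential is $M_X(t) = b/(b - t)$, for $t < b$.
- Suppose p.d.f.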
6. PROPERTIES - Expectation/Variance etc. Distribution Functions
- As for R.V.s generally. For X a discrete R.V. with p.d.f. $p_X$, then for any real-valued function g, $E[g(X)] = \sum_x g(x)\,p_X(x)$.
- e.g. $E[X + Y] = E[X] + E[Y]$. Applies for more than 2 R.V.s also.
- Variance again has similar properties to previously, e.g. $\mathrm{VAR}[aX + b] = a^2\,\mathrm{VAR}[X]$.
7. MENDEL's Example
- Let X record the no. of dominant A alleles in a randomly chosen genotype; then X is a R.V. with sample space S = {0, 1, 2}.
- Outcomes in S correspond to the events {aa}, {Aa}, {AA}.
- Note: further, any function of X is also a R.V., e.g. Z = 0 if X = 0 and Z = 1 if X = 1 or 2,
- where Z is a variable for seed character phenotype.
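A minimal sketch of the variables X and Z. It assumes (this is not stated on the slide) an F2 cross Aa x Aa with genotype probabilities 1/4, 1/2, 1/4, and a dominant phenotype coded Z = 1. All names are illustrative.

```python
from fractions import Fraction

# Assumed F2 genotype probabilities from an Aa x Aa cross (1:2:1 ratio).
p_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def phenotype(x):
    """Z = 1 (dominant phenotype) if at least one A allele, else 0."""
    return 1 if x >= 1 else 0

# Distribution of Z, a function of X, obtained by summing over outcomes of X.
p_Z = {}
for x, p in p_X.items():
    z = phenotype(x)
    p_Z[z] = p_Z.get(z, Fraction(0)) + p

mean_X = sum(x * p for x, p in p_X.items())
var_X = sum((x - mean_X) ** 2 * p for x, p in p_X.items())
print(p_Z, mean_X, var_X)
```

Exact fractions are used so that the 3:1 phenotype ratio appears without rounding.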
8. Example contd.
- So that, for Mendel's data,
- and
with - and
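9. JOINT/MARGINAL DISTRIBUTIONS
- Joint cumulative distribution of X and Y, marginal cumulative for X without regard to Y, and joint distribution (p.d.f.) of X and Y are then, respectively,
  (1) $F(x, y) = P(X \le x, Y \le y)$
  (2) $F_X(x) = P(X \le x) = \sum_{x_i \le x} \sum_{j} p(x_i, y_j)$
  (3) $p(x_i, y_j) = P(X = x_i, Y = y_j)$
- where similarly for the continuous case, e.g. (2) becomes $F_X(x) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f(u, v)\,dv\,du$.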
10. Example: Backcross 2-locus model (AaBb × aabb)
Observed and Expected frequencies, shown as observed(expected).
Genotypic S.R. 1:1; Expected S.R. crosses 1:1:1:1.

Genotype        Cross 1     Cross 2    Cross 3     Cross 4     Pooled
Frequency
  AaBb          310(300)    36(30)     360(300)    74(60)      780(690)
  Aabb          287(300)    23(30)     230(300)    50(60)      590(690)
  aaBb          288(300)    23(30)     230(300)    44(60)      585(690)
  aabb          315(300)    38(30)     380(300)    72(60)      805(690)
Marginal A
  Aa            597(600)    59(60)     590(600)    124(120)    1370(1380)
  aa            603(600)    61(60)     610(600)    116(120)    1390(1380)
Marginal B
  Bb            598(600)    59(60)     590(600)    118(120)    1365(1380)
  bb            602(600)    61(60)     610(600)    122(120)    1395(1380)
Sum             1200        120        1200        240         2760
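The marginal rows are obtained by summing the joint counts over the other locus. The sketch below does this for the observed Cross 1 counts; the dictionary layout and variable names are illustrative choices.

```python
# Observed joint counts for Cross 1, keyed by (A-locus genotype, B-locus genotype).
joint = {
    ("Aa", "Bb"): 310,
    ("Aa", "bb"): 287,
    ("aa", "Bb"): 288,
    ("aa", "bb"): 315,
}

total = sum(joint.values())

# Marginal counts: sum the joint counts over the other locus.
marginal_A = {}
marginal_B = {}
for (a, b), count in joint.items():
    marginal_A[a] = marginal_A.get(a, 0) + count
    marginal_B[b] = marginal_B.get(b, 0) + count

# Relative frequencies estimate the joint probabilities.
joint_freq = {k: v / total for k, v in joint.items()}
print(total, marginal_A, marginal_B)
```

The same loop applies unchanged to the other crosses and to the pooled column.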
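11. CONDITIONAL DISTRIBUTIONS
- Conditional distribution of X, given that Y = y: $p_{X|Y}(x \mid y) = p(x, y) / p_Y(y)$,
- where for X and Y independent, $p_{X|Y}(x \mid y) = p_X(x)$ and $p_{Y|X}(y \mid x) = p_Y(y)$.
- Example: Mendel's expt. Probability that a round seed (Z = 1) is a homozygote AA, i.e. (X = 2): $P(X = 2 \mid Z = 1) = P(X = 2, Z = 1) / P(Z = 1)$.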
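The round-seed example can be worked through numerically. This sketch assumes 1:2:1 genotype probabilities for X (an F2 cross, not stated on the slide) and that Z = 1 whenever at least one A allele is present. The function names are hypothetical.

```python
from fractions import Fraction

# Assumed 1:2:1 genotype probabilities for X = number of A alleles.
p_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def z_of(x):
    """Phenotype indicator: 1 for round (at least one A allele), else 0."""
    return 1 if x >= 1 else 0

# Joint distribution of (X, Z); Z is fully determined by X.
joint = {(x, z_of(x)): p for x, p in p_X.items()}

def conditional_x_given_z(x, z):
    """P(X = x | Z = z) = P(X = x, Z = z) / P(Z = z)."""
    p_z = sum(p for (_, zz), p in joint.items() if zz == z)
    return joint.get((x, z), Fraction(0)) / p_z

prob = conditional_x_given_z(2, 1)
print(prob)  # prints 1/3 under the assumed 1:2:1 ratio
```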
12. Standard Statistical Distributions
Importance:
- Modelling practical applications
- Mathematical properties are known
- Described by few parameters, which have natural interpretations.
Bernoulli Distribution. This is used to model a trial which gives rise to two outcomes: success/failure, male/female, 0/1. Let p be the probability that the outcome is one and q = 1 - p that the outcome is zero.
$E[X] = p(1) + (1 - p)(0) = p$
$\mathrm{VAR}[X] = p(1)^2 + (1 - p)(0)^2 - (E[X])^2 = p(1 - p)$.
[Figure: Bernoulli probability function; Prob (1 - p and p) against outcomes 0 and 1, with the mean p marked.]
13. Standard distributions - Binomial
Binomial Distribution. Suppose that we are interested in the number of successes X in n independent repetitions of a Bernoulli trial, where the probability of success in an individual trial is p. Then
$\mathrm{Prob}\{X = k\} = \binom{n}{k} p^k (1 - p)^{n - k}$, (k = 0, 1, ..., n)
$E[X] = np$, $\mathrm{VAR}[X] = np(1 - p)$
[Figure: binomial probability function for n = 4, p = 0.2; Prob against k, with the mean np marked.]
This is the appropriate distribution to use in modelling e.g. the number of recombinant gametes produced by a heterozygous parent for a 2-locus model. The extension for ≥ 3 loci is multinomial.
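A short sketch evaluating the binomial probabilities for the case n = 4, p = 0.2 and checking the mean and variance formulas numerically. The function name is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Prob{X = k} for n independent Bernoulli(p) trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.2
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(pmf)
print(mean, n * p)            # both close to 0.8
print(var, n * p * (1 - p))   # both close to 0.64
```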
14. Standard distributions - Poisson
- Poisson Distribution. The Poisson distribution arises as a limiting case of the binomial distribution, where $n \to \infty$, $p \to 0$ in such a way that $np \to \lambda$ (constant).
- $P\{X = k\} = \exp(-\lambda)\,\lambda^k / k!$ (k = 0, 1, 2, ...). $E[X] = \lambda$, $\mathrm{VAR}[X] = \lambda$.
- Poisson is used to model the no. of occurrences of a certain phenomenon in a fixed period of time or space, e.g.
  o particles emitted by a radioactive source in a fixed direction for a period ΔT
  o people arriving in a queue in a fixed interval of time
  o genomic mapping functions, e.g. crossing over as a random event
[Figure: Poisson probability function; Prob against X.]
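The limiting relationship with the binomial can be seen numerically. The sketch below holds np = λ fixed and compares the two probabilities at one value of k as n grows; λ = 2 and k = 3 are arbitrary illustrative choices.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Prob{X = k} for a binomial with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Prob{X = k} for a Poisson with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 2.0
k = 3
gaps = []
for n in (10, 100, 1000, 10000):
    # Keep n * p fixed at lam while n grows and p shrinks.
    gaps.append(abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)))
print(gaps)  # the gap shrinks as n increases
```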
15. Standard distributions - Geometric and Negative Binomial in brief
- Geometric. This arises in the time or no. of steps k to the first success in a series of independent Bernoulli trials. The density is $\mathrm{Prob}\{X = k\} = p(1 - p)^{k - 1}$ (k = 1, 2, ...). $E[X] = 1/p$, $\mathrm{VAR}[X] = (1 - p)/p^2$.
- Negative Binomial. This is used to model the number of failures k that occur before the rth success in a series of independent Bernoulli trials. The density is $\mathrm{Prob}\{X = k\} = \binom{r + k - 1}{k} p^r (1 - p)^k$ (k = 0, 1, 2, ...). $E[X] = r(1 - p)/p$, $\mathrm{VAR}[X] = r(1 - p)/p^2$.
- Added note: an alternative form, based directly on the no. of successes, also exists - see Tables.
[Figure: probability function; Prob against X.]
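A sketch checking the two mean formulas by direct summation of the probability functions. The values p = 0.3, r = 3 and the truncation point are illustrative assumptions.

```python
from math import comb

def geometric_pmf(k, p):
    """Prob{X = k}: first success occurs on trial k (k = 1, 2, ...)."""
    return p * (1 - p) ** (k - 1)

def negative_binomial_pmf(k, r, p):
    """Prob{X = k}: k failures before the r-th success (k = 0, 1, ...)."""
    return comb(r + k - 1, k) * p ** r * (1 - p) ** k

p, r = 0.3, 3
K = 500  # truncation point; the neglected tail is negligible for p = 0.3
geo_mean = sum(k * geometric_pmf(k, p) for k in range(1, K))
nb_mean = sum(k * negative_binomial_pmf(k, r, p) for k in range(K))
print(geo_mean, 1 / p)
print(nb_mean, r * (1 - p) / p)
```

With r = 1 the negative binomial counts failures before the first success, so it is the geometric shifted down by one.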
16. Standard distributions - Hypergeometric
- Consider a population of M items, of which W are deemed to be successes. Let X be the number of successes that occur in a sample of size n, drawn without replacement from the population. The density is
- $\mathrm{Prob}\{X = k\} = \binom{W}{k}\binom{M - W}{n - k} \Big/ \binom{M}{n}$ (k = 0, 1, 2, ...)
- Then $E[X] = nW/M$, $\mathrm{VAR}[X] = nW(M - W)(M - n) / [M^2 (M - 1)]$
- Sampling without replacement from a finite population
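A sketch checking the hypergeometric mean and variance by summing over the support. The population values M = 20, W = 7, n = 5 are made up for illustration.

```python
from math import comb

def hypergeometric_pmf(k, M, W, n):
    """Prob{X = k}: k successes in a sample of n drawn without replacement."""
    return comb(W, k) * comb(M - W, n - k) / comb(M, n)

M, W, n = 20, 7, 5  # illustrative values
# k cannot exceed the sample size or the number of successes available.
support = range(max(0, n - (M - W)), min(n, W) + 1)
pmf = {k: hypergeometric_pmf(k, M, W, n) for k in support}
mean = sum(k * q for k, q in pmf.items())
var = sum((k - mean) ** 2 * q for k, q in pmf.items())
print(mean, n * W / M)
print(var, n * W * (M - W) * (M - n) / (M ** 2 * (M - 1)))
```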
17. Standard p.d.f.s - Gamma and Exponential in brief
- The Gamma distribution, e.g. from queueing theory: time to the arrival of the nth customer in a single-server queue (mean arrival rate $\lambda$). P.d.f. written in terms of the gamma function, $f(x) = \lambda^n x^{n-1} e^{-\lambda x} / \Gamma(n)$, $x \ge 0$,
- or directly, for integer n, $f(x) = \lambda (\lambda x)^{n-1} e^{-\lambda x} / (n - 1)!$,
  with $E[X] = n/\lambda$ and $\mathrm{VAR}[X] = n/\lambda^2$.
- Exponential: special case of the Gamma distribution with n = 1, used e.g. to model inter-arrival time of customers, or time to arrival of first customer, in a simple queue, fragment lengths in genome mapping etc.
- The p.d.f. is $f(x) = \lambda \exp(-\lambda x)$, $x \ge 0$, $\lambda > 0$; 0 otherwise.
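A sketch checking the Gamma mean and variance by numerical integration of the density. The shape/rate form of the p.d.f. used here, the values n = 3, λ = 2, and the integration cutoff are assumptions made for illustration.

```python
from math import exp, gamma

def gamma_pdf(x, n, lam):
    """Gamma density with shape n and rate lam, for x >= 0."""
    if x < 0:
        return 0.0
    return lam ** n * x ** (n - 1) * exp(-lam * x) / gamma(n)

def integrate(f, a, b, steps=100_000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

n, lam = 3, 2.0
upper = 40.0  # the density is negligible beyond this point for these values
mean = integrate(lambda x: x * gamma_pdf(x, n, lam), 0.0, upper)
second = integrate(lambda x: x * x * gamma_pdf(x, n, lam), 0.0, upper)
variance = second - mean ** 2
print(mean, n / lam)
print(variance, n / lam ** 2)
```

Setting n = 1 in `gamma_pdf` gives the exponential density λ exp(-λx).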
18. Standard p.d.f.s - Gaussian/Normal
- A random variable X has a normal distribution with mean $\mu$ and standard deviation $\sigma$ if it has density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, $-\infty < x < \infty$, and $E[X] = \mu$, $\mathrm{VAR}[X] = \sigma^2$.
- Arises naturally as the limiting distribution of the average of a set of independent, identically distributed random variables with finite variances.
- Plays a central role in sampling theory and is a good approximation to a large class of empirical distributions. Default assumption in many empirical studies is that each observation is approx. Normally distributed.
- Statistical tables of the Normal distribution are of great importance in analysing practical data sets. X is said to be a Standardised Normal variable if $\mu = 0$ and $\sigma = 1$.
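Normal tables are tabulated for the Standardised Normal, so a general Normal probability is found by standardising first. The sketch below does the same with the error function from the standard library; μ = 100 and σ = 15 are arbitrary illustrative values.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Phi(z) for the Standardised Normal, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), by standardising to Z = (X - mu) / sigma."""
    return standard_normal_cdf((x - mu) / sigma)

mu, sigma = 100.0, 15.0  # illustrative values
within_two_sd = normal_cdf(mu + 2 * sigma, mu, sigma) - normal_cdf(mu - 2 * sigma, mu, sigma)
print(within_two_sd)  # roughly 0.954, whatever mu and sigma are
```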
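19. Standard p.d.f.s - Student's t-distribution
- A random variable X has a t-distribution with n d.o.f. ($t_n$) if it has density $f(x) = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)} \left(1 + \frac{x^2}{n}\right)^{-(n+1)/2}$, $-\infty < x < \infty$. Symmetrical about the origin, with $E[X] = 0$, $\mathrm{VAR}[X] = n/(n - 2)$ for n > 2.
- For small n, the $t_n$ distribution is very flat. For n ≥ 25, the $t_n$ distribution ≈ standard normal curve.
- Suppose Z is a standard Normal variable, W has a $\chi^2_n$ distribution, and Z and W are independent; then the r.v. $T = Z / \sqrt{W/n}$ has a $t_n$ distribution.
- If x1, x2, ..., xn is a random sample from $N(\mu, \sigma^2)$, and if we define $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$, then $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ has a $t_{n-1}$ distribution.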
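A sketch of the one-sample t statistic built from the sample mean and sample standard deviation. The data and the hypothesised mean `mu0` are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """(xbar - mu0) / (s / sqrt(n)), with s using the n - 1 divisor."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (xbar - mu0) / (s / sqrt(n))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative sample
t = t_statistic(data, mu0=4)
print(t, "on", len(data) - 1, "d.o.f.")
```

Under the Normal sampling assumption, the value would be referred to t tables with n - 1 degrees of freedom.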
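20. Chi-Square Distribution
- A r.v. X has a Chi-square distribution with n degrees of freedom (n a positive integer) if it is a Gamma distribution with shape n/2 and $\lambda = 1/2$, so its p.d.f. is $f(x) = \frac{x^{n/2 - 1} e^{-x/2}}{2^{n/2}\,\Gamma(n/2)}$, $x > 0$; 0 otherwise.
- $E[X] = n$, $\mathrm{VAR}[X] = 2n$
- Two important applications:
  - If X1, X2, ..., Xn are a sequence of independently distributed Standardised Normal random variables, then the sum of squares $X_1^2 + X_2^2 + \dots + X_n^2$ has a $\chi^2$ distribution (n degrees of freedom).
  - If x1, x2, ..., xn is a random sample from $N(\mu, \sigma^2)$, then with $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$, the quantity $(n - 1)s^2/\sigma^2$ has a $\chi^2$ distribution, n - 1 d.o.f., with r.v.s $\bar{x}$ and $s^2$ independent.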
[Figure: $\chi^2_n(x)$ density; Prob against X.]
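21. F-Distribution
- A r.v. X has an F distribution with m and n d.o.f. if it has a density function built on a ratio of gamma functions, $f(x) = \frac{\Gamma((m+n)/2)}{\Gamma(m/2)\,\Gamma(n/2)} \left(\frac{m}{n}\right)^{m/2} x^{m/2 - 1} \left(1 + \frac{m}{n}x\right)^{-(m+n)/2}$ for x > 0, and 0 otherwise.
- $E[X] = n/(n - 2)$ for n > 2, and $\mathrm{VAR}[X] = \frac{2n^2(m + n - 2)}{m(n - 2)^2(n - 4)}$ for n > 4.
- For X and Y independent r.v.s, $X \sim \chi^2_m$ and $Y \sim \chi^2_n$, then $\frac{X/m}{Y/n}$ has an $F_{m,n}$ distribution.
- One consequence: if x1, x2, ..., xm (m ≥ 2) is a random sample from $N(\mu_1, \sigma_1^2)$, and y1, y2, ..., yn (n ≥ 2) a random sample from $N(\mu_2, \sigma_2^2)$, then $\frac{s_x^2/\sigma_1^2}{s_y^2/\sigma_2^2}$ has an $F_{m-1,\,n-1}$ distribution, where $s_x^2$ and $s_y^2$ are the two sample variances.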