1
EXTREME VALUES, COPULAS AND GENETIC MAPPING
  • Bojan Basrak
  • Department of Mathematics,
  • University of Zagreb, Croatia

EVA 2005, Gothenburg
2
Genetic mapping
  • Genetic map gives the relative positions of genes
    on the chromosomes with distances between them
    typically measured in centimorgans (cM)
  • Linkage analysis aims to find approximate
    location of genes associated with certain traits
    in plants and animals.
  • It is a statistical method that compares genetic
    similarity between two individuals (at a marker)
    to similarity of their physical or psychological
    traits (phenotype).
  • Among the most studied traits are inheritable
    diseases.

3
QTL
  • Quantitative trait: a measurable trait that shows
    continuous variation, e.g. skin pigmentation,
    height, cholesterol, etc.
  • Quantitative traits are normally influenced by
    several genes and the environment.
  • QTL, or quantitative trait locus: a locus (or a
    gene) affecting a quantitative trait.
  • There is even The Journal of Quantitative Trait
    Loci.

4
  • Genetic similarity between two individuals at a
    given locus is typically measured by a number
    called identity by descent (IBD) status.
  • Two genes of two different people are IBD if one
    is a physical copy of the other, or if they are
    both copies of the same ancestral gene.
  • For any two people IBD status is a number in the
    set {0, 1, 2}. In real life, this number typically
    needs to be estimated.

5
  • Linkage analysis is very effective with Mendelian
    inheritance.
  • Mapping genes involved in inheritable diseases
    can be done by comparing IBD status of affected
    relatives (e.g. breast cancer)
  • Mapping QTLs in animals or plants is performed by
    arranging a cross between two inbred strains,
    which are substantially different in a
    quantitative trait (e.g. tomato fruit mass or pH).

6
IBD status of two half sibs
[Figure: the mother's chromosomes and the chromosomes of two half sibs (Sib 1, Sib 2), after two meioses and some other developments. X(t) = number of alleles identical by descent; horizontal axis: distance in Morgans. At the two marked positions t and s, X(t) = 0 and X(s) = 1.]
7
  • Recombinations, or more specifically, locations
    of crossovers in meiosis, are frequently modelled
    by a stochastic process (the standard choice is the
    Poisson process, suggested by Haldane in 1919).
  • The process (X(t)) is an ON-OFF process in the
    case of half sibs, or a sum of two independent such
    processes in the case of siblings.
  • In particular, under the Poisson process model,
    (X(t)) is a stationary Markov process. Moreover,
    X(t) is Bernoulli distributed for each t in the
    case of half sibs.

8
  • In the Haldane model, we have, for half sibs,
    P(X(t) = X(s)) = θ^2 + (1 - θ)^2,
  • where
    θ = (1 - exp(-2|t - s|)) / 2
  • is the recombination probability.
  • For simplicity, we assume that IBD status is
    known at each marker (i.e. markers are completely
    genetically informative).
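
The Haldane model for half sibs can be checked numerically. The sketch below is an illustration, not code from the talk. It assumes that crossovers in each of the two meioses follow independent Poisson processes with rate 1 per Morgan, so that the IBD status flips at rate 2 per Morgan, and that X(0) is Bernoulli(1/2). The function names (`haldane_theta`, `ibd_agreement_prob`, `simulate_ibd_path`, `ibd_at`) and the chosen positions are hypothetical.

```python
import math
import random


def haldane_theta(d):
    """Recombination probability between two loci d Morgans apart (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def ibd_agreement_prob(d):
    """P(X(t) = X(s)) for half sibs: both meioses recombine, or neither does."""
    theta = haldane_theta(d)
    return theta ** 2 + (1.0 - theta) ** 2


def simulate_ibd_path(length, rng):
    """Simulate the ON-OFF process on [0, length] (Morgans).

    Returns the initial state and the sorted flip locations. Flips arrive
    at rate 2 per Morgan (two meioses, each with rate 1).
    """
    x0 = rng.randint(0, 1)
    flips = []
    pos = rng.expovariate(2.0)
    while pos < length:
        flips.append(pos)
        pos += rng.expovariate(2.0)
    return x0, flips


def ibd_at(x0, flips, t):
    """IBD status at position t: initial state plus number of flips, mod 2."""
    n_flips = sum(1 for f in flips if f <= t)
    return (x0 + n_flips) % 2


rng = random.Random(0)
d = 0.25           # distance between the two loci, in Morgans
n_pairs = 20000
agree = 0
for _ in range(n_pairs):
    x0, flips = simulate_ibd_path(1.0, rng)
    agree += ibd_at(x0, flips, 0.5) == ibd_at(x0, flips, 0.5 + d)
empirical = agree / n_pairs

print("simulated P(X(t) = X(s)):", round(empirical, 3))
print("Haldane formula:         ", round(ibd_agreement_prob(d), 3))
```

The simulated agreement frequency should be close to θ^2 + (1 - θ)^2, which under the Haldane map function equals (1 + exp(-4d)) / 2.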

9
  • The human genome consists of over 3 × 10^9 base pairs
    (in two copies) on 23 chromosomes. The average
    length of a chromosome is 140 cM.
  • Total length of the female (autosomal) genome is
    4296 cM.
  • Total length of the male genome is 2851 cM.
  • That is, there is 1 expected crossover per 105
    Mb in males and per 88 Mb in females. Thus, on the
    human genome, 1 cM approximately equals 1 Mb.

10
Data
  • From n sib-pairs we observe
  • - a sequence of iid phenotype pairs (Y_i1, Y_i2),
    i = 1, ..., n, with continuous marginal distribution
  • and
  • - a sequence of iid IBD processes (X_i(t)), i = 1, ..., n.

11
[Figure: two panels, "IBD 1 at t" and "IBD 0 at t".]
12
Haseman-Elston
  • In 1972, Haseman and Elston suggested testing whether
    there is a linear regression with negative slope between
    the squared phenotype difference (Y1 - Y2)^2 and the
    IBD status X(t) at the marker.
  • Soon, this became the standard tool for mapping
    QTLs in human genetics.
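
A small simulation illustrates why the slope is expected to be negative. This is a sketch under stated assumptions, not the analysis from the talk: phenotype pairs are standard bivariate normal, the IBD status at the marker is 0 or 1 with probability 1/2 each (half sibs), and the correlations in `RHO_BY_IBD` are invented for illustration. Under these assumptions E[(Y1 - Y2)^2 | x] = 2(1 - ρ(x)), so the regression slope should be close to -1.

```python
import math
import random


def simulate_pair(rho, rng):
    """Draw (y1, y2) from a standard bivariate normal with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)
RHO_BY_IBD = {0: 0.1, 1: 0.6}   # hypothetical correlations given IBD status

ibd, sq_diff = [], []
for _ in range(20000):
    x = rng.randint(0, 1)                     # IBD status at the marker
    y1, y2 = simulate_pair(RHO_BY_IBD[x], rng)
    ibd.append(x)
    sq_diff.append((y1 - y2) ** 2)

slope = ols_slope(ibd, sq_diff)
print("Haseman-Elston slope estimate:", round(slope, 2))
```

A marker linked to the QTL gives a clearly negative slope, while a marker with the same correlation for both IBD states would give a slope near zero.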

13
Variance Components Model
  • The variance components model (Fulker and Cherny)
    essentially assumes that the joint distribution
    of the phenotypes is bivariate normal, conditionally
    on the IBD status x, with the same marginal
    distributions and a correlation that depends on x.

14
Linkage Analysis
  • The main question
  • Does higher IBD status mean stronger dependence
    between the two trait values?
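  • In the variance components model this translates into
    the test of H0: the correlation does not depend on
    the IBD status,
  • against HA: the correlation increases with the IBD status.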

15
Test statistic
  • Statistical test is based on the log-likelihood
    ratio statistic
  • Or (equivalently) on the efficient score statistic

16
  • Where
  • is the score function, and
  • is the appropriate entry of the Fisher information matrix
    and
  • needs to be estimated in practice.

17
[Figure: plot of Z(t) against position t, with the point tmax marked.]
18
Significance in genome-wide scans
  • If we have more than one marker, we need to deal
    with the issue of multiple testing. The solution
    of this problem depends on the intermarker
    spacings and the sample size.
  • One could use permutation tests or other
    simulation-based methods to obtain p-values.
  • If the sample size is large, one can apply a nice
    asymptotic theory that determines significance
    thresholds from the analysis of extremes of
    certain Gaussian processes (see Lander and
    Botstein; Siegmund et al.).

19
  • For an illustration, we assume that the markers
    are dense, that is, IBD status is measured
    continuously along the genome. It turns out that
    under our assumptions and the null hypothesis one
    can show that the test statistic process converges
    in distribution to a limit process Z,
  • where Z is an Ornstein-Uhlenbeck process with mean
    zero and covariance function exp(-r|t - s|)
  • over each chromosome.

20
  • Now, approximate thresholds for a given
    significance level can be obtained by studying
    extremes of the Ornstein-Uhlenbeck process (cf.
    Leadbetter et al.) over a finite interval.
  • Hence, for 23 human chromosomes with average length of
    140 cM and significance level 0.05, we get the
    threshold b = 4.08 (3.62 on the LOD scale).
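
The quoted threshold can be reproduced with a standard approximation for the maximum of an Ornstein-Uhlenbeck process. The sketch below is an assumption about the calculation, since the formula itself is not preserved in this transcript. It uses P(max Z > b) ≈ 1 - exp(-[C (1 - Φ(b)) + β L b φ(b)]), with C chromosomes of total length L Morgans, and takes β = 4 per Morgan, the decay rate of the IBD autocorrelation for half sibs under the Haldane model. The LOD conversion is b^2 / (2 ln 10). All function names are hypothetical.

```python
import math


def normal_pdf(b):
    """Standard normal density."""
    return math.exp(-0.5 * b * b) / math.sqrt(2.0 * math.pi)


def normal_tail(b):
    """Standard normal upper tail probability 1 - Phi(b)."""
    return 0.5 * math.erfc(b / math.sqrt(2.0))


def exceedance_prob(b, n_chrom, total_morgans, rate):
    """Approximate P(max Z > b) over all chromosomes (dense markers)."""
    mu = n_chrom * normal_tail(b) + rate * total_morgans * b * normal_pdf(b)
    return 1.0 - math.exp(-mu)


def threshold(alpha, n_chrom, total_morgans, rate, lo=1.0, hi=10.0):
    """Solve exceedance_prob(b) = alpha by bisection on [lo, hi].

    The approximation is decreasing in b on this interval.
    """
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if exceedance_prob(mid, n_chrom, total_morgans, rate) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# 23 chromosomes of average length 140 cM = 1.40 Morgans, level 0.05.
b = threshold(alpha=0.05, n_chrom=23, total_morgans=23 * 1.40, rate=4.0)
lod = b * b / (2.0 * math.log(10.0))
print("threshold b:", round(b, 2), " LOD:", round(lod, 2))
```

Under these assumptions the computed values agree with the b = 4.08 and LOD 3.62 quoted on the slide to two decimals.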

21
Other models
  • The asymptotic theory does not change for other
    more realistic models of the recombination
    process (e.g. Kosambi model or chi squared
    model), since the asymptotic results for extremes
    of Gaussian processes depend only on the local
    behavior of the autocorrelation function of the
    process.
  • However, for all of these models it holds that
    corr(X_s, X_t) ≈ 1 - r|t - s| as |t - s| converges to 0. So
    in the limit we obtain a Gaussian process with the
    same behavior of autocorrelations.
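
The common local behavior can be illustrated by comparing two map functions. This is a sketch assuming half sibs, for whom the two meioses are independent and X(t) is Bernoulli(1/2), so that corr(X_s, X_t) = (1 - 2θ)^2 with θ the recombination probability at distance |t - s|. The Haldane and Kosambi map functions are standard; the helper names are hypothetical.

```python
import math


def haldane_theta(d):
    """Haldane map function: recombination probability at d Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi_theta(d):
    """Kosambi map function: recombination probability at d Morgans."""
    return 0.5 * math.tanh(2.0 * d)


def half_sib_ibd_corr(theta):
    """corr(X_s, X_t) for half sibs, given recombination probability theta."""
    return (1.0 - 2.0 * theta) ** 2


def local_slope(map_function, d=1e-4):
    """Estimate r in corr ≈ 1 - r d from a small distance d (Morgans)."""
    return (1.0 - half_sib_ibd_corr(map_function(d))) / d


slopes = {
    "haldane": local_slope(haldane_theta),
    "kosambi": local_slope(kosambi_theta),
}
for name, r in slopes.items():
    print(name, "local decay rate r ≈", round(r, 3))
```

Both map functions give r close to 4, which is why the limiting extreme value thresholds coincide even though the correlations differ at larger distances.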

22
Disadvantages
  • Normality assumption is frequently questionable
  • Correlation can be a very bad measure of
    dependence if this assumption does not hold
  • Risch and Zhang (1995) show how
  • "The majority of such pairs provide little power
    to detect linkage; only pairs that are concordant
    for high values, low values, or extremely
    discordant pairs (for example, one in the top 10
    percent and other in the bottom 10 percent of the
    distribution) provide substantial power."

23
Copula
  • The copula of a random pair (Y1, Y2) is the
    distribution function C of the random vector
    (F1(Y1), F2(Y2)),
  • where we assume that the marginal distributions
    F1 and F2 of Y1 and Y2 are invertible. Hence the
    marginal distributions of the copula are both
    uniform on [0, 1].
  • It is well known that the distribution of a
    random pair splits into two marginal
    distributions and the copula. Also, the copula is
    invariant under continuous increasing
    transformations.

24
  • It is straightforward to check that
    P(Y1 ≤ y1, Y2 ≤ y2) = C(F1(y1), F2(y2)),
  • i.e. the distribution of a random pair splits
    into two marginal distributions and the copula.
  • The copula is invariant under monotone
    transformations, that is, (Y1, Y2) and (h(Y1), h(Y2))
  • have the same copula, for any increasing function h.
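
The invariance property has a simple empirical counterpart: rank-based pseudo-observations, which estimate (F1(Y1), F2(Y2)), do not change when an increasing function is applied to the data. The sketch below is only an illustration; the simulated data, the choice h = exp, and the helper `pseudo_observations` are assumptions made for this example.

```python
import math
import random


def pseudo_observations(values):
    """Map each value to rank / (n + 1), an empirical version of F(Y)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


rng = random.Random(2)
y1 = [rng.gauss(0.0, 1.0) for _ in range(500)]
y2 = [0.7 * a + rng.gauss(0.0, 1.0) for a in y1]   # dependent on y1

h = math.exp   # an increasing transformation

u_before = list(zip(pseudo_observations(y1), pseudo_observations(y2)))
u_after = list(zip(pseudo_observations([h(v) for v in y1]),
                   pseudo_observations([h(v) for v in y2])))

print("pseudo-observations unchanged by h:", u_before == u_after)
```

Because the transformed sample has exactly the same ranks, any statistic computed from the pseudo-observations depends on the data only through the copula.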

25
Basic Examples
26
Linkage analysis rephrased
  • The main question
  • Does higher IBD status mean stronger dependence
    between the two trait values?
  • could be rephrased as
  • Does higher IBD status mean that the two trait
    values have a more diagonalized copula?
  • Note: marginal distributions do not change with
    IBD status.

27
Normal Copula
  • The normal copula is the copula of a normally
    distributed random vector. Thus, if (Y1, Y2) is
    bivariate normal with correlation ρ,
  • then the random vector (Y1, Y2) has the bivariate
    normal copula.
  • Since it depends only on ρ, we denote it by C_ρ.
28
Bivariate Normal Copula
29
New Model
  • Assume that the pair (Y1, Y2) has
  • the same copula as in the variance components
    model, i.e. the bivariate normal copula,
  • conditionally on the IBD status x,
  • and the same (but arbitrary) continuous marginal
    distribution, i.e. F1 = F2.

30
  • The model is not so new after all: equivalently,
    there is an h such that (h(Y1), h(Y2))
  • satisfies the assumption of the v.c. model.
  • Suppose that Φ is the standard normal
    distribution function; then h = Φ^(-1) ∘ F1.
  • That is, h(Y1) = Φ^(-1)(F1(Y1)) has the standard
    normal distribution.

31
  • We can proceed in two ways:
  • we could guess (estimate) h, or
  • we could guess (estimate) F1.
  • The first method is already frequently applied in
    practice, while the second one is easier to justify
    using the empirical distribution function of the
    phenotypes.
  • To estimate F1 we may use data from a larger
    sample if available.

32
Transformation
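  • In practice we might have only 2n sib-pairs to
    estimate the marginal distribution. So we could use
    the empirical distribution function of the observed
    phenotypes as an estimate of F1.
  • Transformed phenotypes are then obtained by applying
    the standard normal quantile function to this
    estimate, evaluated at each phenotype.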
33
  • If , one can show the following
  • Theorem
  • as
  • Observe that we essentially use the van der Waerden
    normal scores rank correlation coefficient to
    measure dependence between the traits.
  • Klaassen and Wellner (1997) showed that this is
    an asymptotically efficient estimator of the
    correlation parameter in the bivariate normal copula
    model.
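
A normal scores correlation of this kind can be sketched in a few lines. The code below is an illustration under assumptions, not the implementation used in the talk: the pooled empirical distribution function is rescaled by 1 / (2n + 1) so that the normal quantile stays finite, the data are simulated with a normal copula (ρ = 0.6) and lognormal marginals, and the names `normal_scores` and `correlation` are hypothetical.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist


def normal_scores(y1, y2):
    """Transform both phenotype lists to normal scores via pooled ranks."""
    pooled = sorted(y1 + y2)
    denom = len(pooled) + 1            # 2n + 1 keeps the argument inside (0, 1)
    inv_cdf = NormalDist().inv_cdf     # standard normal quantile function

    def score(y):
        return inv_cdf(bisect_right(pooled, y) / denom)

    return [score(y) for y in y1], [score(y) for y in y2]


def correlation(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(3)
rho = 0.6                              # hypothetical copula correlation
y1, y2 = [], []
for _ in range(5000):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    y1.append(math.exp(z1))            # lognormal marginals,
    y2.append(math.exp(z2))            # normal copula with correlation rho

s1, s2 = normal_scores(y1, y2)
r_scores = correlation(s1, s2)
print("normal scores correlation:", round(r_scores, 3))
print("Pearson correlation of raw data:", round(correlation(y1, y2), 3))
```

The normal scores coefficient should recover the copula parameter despite the skewed marginals, whereas the Pearson correlation of the raw phenotypes generally will not equal it.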

34
  • Hence, it is also an efficient estimator of the
    maximum correlation coefficient.
  • For a pair of random variables Y1 and Y2, the
    maximum correlation coefficient is defined as
    sup corr(a(Y1), b(Y2)),
  • where the supremum is taken over all real
    transformations a and b such that a(Y1) and b(Y2)
    have finite nonzero variance.

35
Simulation study
36
Application - Lp(a)
  • Twin data on lipoprotein levels, collected in 4
    populations in three countries (Australia, the
    Netherlands, Sweden).
  • Analysis was performed using the variance
    components method and published by Beekman et al.
    (2003).

37
Ad hoc transformation
38
Lp(a) - chromosome 1
39
Lp(a) - chromosome 6
40
Discussion
  • The normal copula based method has correct
    critical levels under the null hypothesis for any
    marginal distribution. Its power seems to be
    close to optimal.
  • The method easily extends to general pedigrees,
    discrete data, multiple QTLs, etc.
  • It is straightforward to implement in any
    existing software.
  • Other families of copulas (Clayton, Gumbel, etc.)
    could be more suitable in certain applications.

41
Discrete data
  • In biomedical applications, phenotypes are
    frequently measured on some ordinal scale, that
    is, they take values in {1, ..., l} for some
    natural number l.
  • If we want to detect whether higher IBD status
    translates into more similar phenotypic values, we
    may apply nonparametric methods or discretize
    some parametric family of copulas, and test whether
    the parameters change with IBD status.

42
Discrete data
43
Acknowledgments
  • C. Klaassen (UvA, Eurandom)
  • D. Boomsma (VUA)
  • M. Beekman (LUMC)
  • N. Martin (Australia)