Title: Gene Mapping Quantitative Traits using IBD sharing
1Gene Mapping Quantitative Traits using IBD sharing
References
- Introduction to Quantitative Genetics, by D.S. Falconer and T.F.C. Mackay (1996), Longman Press
- Chapter 5, Statistics in Human Genetics, by P. Sham (1998), Arnold Press
- Chapter 8, Mathematical and Statistical Methods for Genetic Analysis, by K. Lange (2002), Springer
2What is a Quantitative Trait?
A quantitative trait has numerical values that
can be ordered from highest to lowest. Examples
include height, weight, cholesterol level,
reading scores, etc. Values may be discrete,
differing by a fixed amount, or continuous, where
the difference between two values can be
arbitrarily small. Most methods for
quantitative traits assume that the data are
continuous (at least approximately).
3Why use quantitative traits?
- (1) More power. Fewer subjects may need to be
examined (phenotyped) if one uses the
quantitative trait rather than dichotomizing it
to create a qualitative trait.
[Figure: individuals v, w, x, y and z placed along the trait-value scale, with a threshold separating affecteds from unaffecteds.]
Individuals w and x have similar trait values,
yet w is grouped with z and x is grouped with y.
Note that even among affecteds, knowing the trait
value is useful (v and z are more similar than v
and w).
4Why use Quantitative Traits?
- (2) The genotype to phenotype relationship may be
more direct. Affection with a disease could be
the culmination of many underlying events
involving gene products, environmental factors
and gene-environment interactions. The
underlying events may differ among people,
resulting in heterogeneity.
5Why use quantitative traits?
- (3) End stage disease may be too late. If the
disease is late onset, then parents may not be
available anymore. However if there is a
quantitative trait that is known to predict
increased risk of the disease, then it might be
measured earlier in a person's lifetime. Their
parents may also be available for genotyping
resulting in more information.
6Why not use quantitative traits?
- (1) The quantitative trait does not meet the
assumptions of the proposed statistical method.
For example, many methods assume the quantitative
traits are unimodal, but not all quantitative
traits are unimodal.
- (2) The values of the quantitative trait might be
very unreliable.
- (3) There are no good intermediate quantitative
phenotypes for a particular disease. The
quantitative traits available are not telling the
whole story.
7Components of the Phenotypic Variance of a
Quantitative Trait
The total variance in a quantitative trait,
termed the phenotypic variance, can be
partitioned into the variance due to genetic
components, the environmental components and
gene-environment interaction components.
8Components of Phenotypic Variance of a
Quantitative Trait
- Often we make simplifying assumptions, for
example that there is no variance component due
to interactions, that there is no shared
environment and that all genes are acting
independently.
In this case we can write the phenotypic
variance, VP, as the sum of the genetic variance,
VG, and the environmental variance, VE:
VP = VE + VG
9The Additive and Dominance Components of Variance
VG = VA + VD
VA, the additive genetic variance, is
attributed to the inheritance of individual
alleles. VD, the dominance genetic variance, is
attributed to the alleles acting together as
genotypes.
VG / VP = heritability in the broad sense.
VA / VP = heritability in the narrow sense.
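As a small worked illustration of the two ratios, the sketch below computes broad-sense and narrow-sense heritability from variance components. The function name and the numerical values are hypothetical and chosen only for illustration; the calculation assumes the simplified decomposition VP = VA + VD + VE with no interaction terms.

```python
def heritabilities(v_a, v_d, v_e):
    """Return (broad-sense, narrow-sense) heritability.

    Assumes the simplified model VP = VG + VE with VG = VA + VD
    (no interaction or shared-environment components).
    """
    v_g = v_a + v_d          # total genetic variance
    v_p = v_g + v_e          # phenotypic variance
    if v_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    return v_g / v_p, v_a / v_p


# Hypothetical variance components, for illustration only.
broad, narrow = heritabilities(v_a=0.4, v_d=0.1, v_e=0.5)
print(f"broad-sense  H2 = {broad:.2f}")
print(f"narrow-sense h2 = {narrow:.2f}")
```

Narrow-sense heritability can never exceed broad-sense heritability, since VA is one part of VG.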
10The degree of correlation between two relatives
depends on the theoretical kinship coefficient
- An important measure of family relationship is
the theoretical kinship coefficient.
- It is the probability that two alleles, at a
randomly chosen locus, one chosen randomly from
individual i and one from j, are identical by
descent.
- The kinship coefficient does not depend on the
observed genotype data.
11Covariance between relatives under a polygenic
model depends on the theoretical kinship
coefficient and the probability that, at any
arbitrary autosomal locus, the pair share both
genes IBD

Relationship       Kinship coefficient   P(IBD2)   Covariance
parent-offspring   1/4                   0         1/2 VA
full siblings      1/4                   1/4       1/2 VA + 1/4 VD
uncle-nephew       1/8                   0         1/4 VA
first cousins      1/16                  0         1/8 VA

Note: This does not depend on any measured
genotype effects (marker information).
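Every covariance entry for these relationships follows one rule: twice the kinship coefficient multiplies VA, and P(IBD2) multiplies VD. The sketch below reproduces the coefficients with exact fractions. The function and dictionary names are illustrative, not taken from any package.

```python
from fractions import Fraction as F

# (kinship coefficient, P(IBD2)) for each relationship.
RELATIVES = {
    "parent-offspring": (F(1, 4), F(0)),
    "full siblings": (F(1, 4), F(1, 4)),
    "uncle-nephew": (F(1, 8), F(0)),
    "first cousins": (F(1, 16), F(0)),
}


def covariance_coefficients(kinship, p_ibd2):
    """Return the multipliers of (VA, VD) in the trait covariance."""
    return 2 * kinship, p_ibd2


def expected_covariance(kinship, p_ibd2, v_a, v_d):
    """Expected covariance: 2 * kinship * VA + P(IBD2) * VD."""
    coef_a, coef_d = covariance_coefficients(kinship, p_ibd2)
    return coef_a * v_a + coef_d * v_d


for name, (phi, delta) in RELATIVES.items():
    coef_a, coef_d = covariance_coefficients(phi, delta)
    print(f"{name:17s} cov = {coef_a} VA + {coef_d} VD")
```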
12Covariance among relatives also depends upon
the allele sharing at a trait locus
Allele Sharing: Identity-by-Descent (IBD)
13The proportion of alleles shared IBD is
equivalent to twice the conditional kinship
coefficient.
The conditional kinship coefficient is the
probability that a gene chosen randomly from
person i at a specific locus matches a gene
chosen randomly from person j given the available
genotype information at markers.
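A minimal sketch of this relationship, assuming a non-inbred pair whose probabilities of sharing 1 or 2 alleles IBD at the locus have already been estimated from marker data. The function names are hypothetical.

```python
def proportion_ibd(p_ibd1, p_ibd2):
    """Expected proportion of alleles shared IBD at the locus."""
    return 0.5 * p_ibd1 + p_ibd2


def conditional_kinship(p_ibd1, p_ibd2):
    """Conditional kinship coefficient: half the proportion shared IBD."""
    return proportion_ibd(p_ibd1, p_ibd2) / 2.0


# With no marker information, siblings have P(IBD1) = 1/2 and P(IBD2) = 1/4,
# which recovers the theoretical kinship coefficient of 1/4.
print(conditional_kinship(0.5, 0.25))

# Siblings known to share both alleles IBD at the locus.
print(conditional_kinship(0.0, 1.0))
```

A parent and child always share exactly one allele IBD, so `conditional_kinship(1.0, 0.0)` is 1/4 whatever the marker data show.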
14We expect two siblings with similar, extreme
trait values to share more alleles IBD at the
trait locus than two siblings who have dissimilar
extreme trait values.
15The dependence of the trait covariance on the
IBD sharing at a marker is a function of the
distance between the trait and the marker loci as
well as the strength of the QTL. As the map
distance increases, the covariance of the trait
values becomes less dependent on IBD sharing at
the marker and so the apparent QTL variance
component will decrease.
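One way to see the size of this effect is the standard sib-pair result that the correlation between the proportions of alleles shared IBD at two loci is (1 - 2θ)², where θ is the recombination fraction between them. The sketch below assumes Haldane's map function to convert map distance to θ; that choice, and the function names, are assumptions made for illustration. The apparent QTL variance component at the marker shrinks roughly in line with this factor.

```python
import math


def recombination_fraction(d_morgans):
    """Haldane's map function (assumes no crossover interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def sib_ibd_correlation(theta):
    """Correlation of sib-pair IBD sharing at two loci with recombination fraction theta."""
    return (1.0 - 2.0 * theta) ** 2


for d in (0.0, 0.01, 0.05, 0.10, 0.50):
    theta = recombination_fraction(d)
    print(f"d = {d:4.2f} M  theta = {theta:.4f}  "
          f"attenuation = {sib_ibd_correlation(theta):.4f}")
```

Under Haldane's function the factor simplifies to exp(-4d), so it falls steadily as the marker moves away from the trait locus.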
16We expect two siblings with similar, extreme
trait values to share more alleles IBD at the
trait locus than two siblings who have dissimilar
extreme trait values. Or, another way to think about
it, we expect that the correlation among trait
values will depend on IBD sharing.
21QTL mapping using a variance component model
Another way to test whether the covariance among
relatives' trait values is correlated with the IBD
sharing at a locus is to use a variance component
model.
22 A simple variance component model has one major
trait locus, a polygenic effect, environmental
factors that are independent of genetic effects
and independent across family members (no
household effects). The major gene and
polygenic effects are also independent.
23Mathematically, $Y_i = \mu + \beta^{T} a_i + g_i + q_i + e_i$, where $\mu$ is
the population mean, $a_i$ are the environmental
predictor variables (with coefficients $\beta$), $q_i$ is the effect of the major trait locus,
$g_i$ is the polygenic effect, and $e_i$ is the residual
error.
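With independent major-gene, polygenic and environmental effects, a model of this kind implies a simple form for the covariance between two relatives. The sketch below uses the commonly used expression in which the QTL terms are weighted by the conditional (marker-based) sharing and the polygenic term by the theoretical kinship coefficient. The function and argument names are hypothetical, and the numbers are made up for illustration.

```python
def trait_covariance(cond_kinship, cond_p_ibd2, kinship,
                     v_a_qtl, v_d_qtl, v_polygenic):
    """Expected covariance between two different, non-inbred relatives.

    cond_kinship, cond_p_ibd2: conditional kinship and P(IBD2) at the map
        location, given the marker data.
    kinship: theoretical kinship coefficient of the pair.
    """
    return (2.0 * cond_kinship * v_a_qtl
            + cond_p_ibd2 * v_d_qtl
            + 2.0 * kinship * v_polygenic)


def trait_variance(v_a_qtl, v_d_qtl, v_polygenic, v_env):
    """Variance of one person's trait value (all components contribute)."""
    return v_a_qtl + v_d_qtl + v_polygenic + v_env


# Illustrative variance components.
va, vd, vg = 1.0, 0.5, 2.0

# Parent-child: always exactly one allele IBD, so the conditional kinship
# is 1/4 and P(IBD2) is 0 at every map location.
print(trait_covariance(0.25, 0.0, 0.25, va, vd, vg))

# Siblings sharing 2 versus 0 alleles IBD at the map location.
print(trait_covariance(0.5, 1.0, 0.25, va, vd, vg))
print(trait_covariance(0.0, 0.0, 0.25, va, vd, vg))
```

Only the sibling covariance changes with IBD sharing at the locus, which is why sib pairs carry linkage information in this model and parent-child pairs alone do not.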
25 Some things to consider
The estimates of VA and VD are not the actual
variances due to the QTL; they depend on the
sampled data and on how far the map location is
from the QTL.
For a parent-child pair, cov(Yi, Yj) = 1/2 VA + 1/2 VG
for any map location. Why is the conditional
kinship coefficient always 1/4? Why is the
dominance variance missing from this equation?
For two siblings i and j,
28As the number of traits increases the complexity
of the log-likelihood also increases
The log-likelihood is maximized using a steepest
ascent algorithm. It becomes more and more
difficult to find the global maximum as multiple
local maxima exist. One solution is to use
several starting points for the maximization.
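The multiple-starting-point idea can be sketched on a toy problem. The objective below is a made-up bimodal function standing in for a log-likelihood, not a real variance component likelihood. The fixed step size and numerical gradient are simplifications chosen to keep the example short.

```python
import math


def objective(x):
    """Toy 'log-likelihood' with a local maximum near -1 and a global one near 2."""
    return math.exp(-(x + 1.0) ** 2) + 2.0 * math.exp(-(x - 2.0) ** 2)


def gradient(f, x, h=1e-6):
    """Central-difference approximation to the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def steepest_ascent(f, x0, step=0.1, iters=2000):
    """Fixed-step steepest ascent; converges to whichever maximum is uphill from x0."""
    x = x0
    for _ in range(iters):
        x += step * gradient(f, x)
    return x


def multistart(f, starts):
    """Run steepest ascent from every start and keep the best end point."""
    candidates = [steepest_ascent(f, x0) for x0 in starts]
    return max(candidates, key=f)


print(steepest_ascent(objective, -3.0))   # stuck at the local maximum
print(multistart(objective, [-3.0, -1.5, 0.0, 1.5, 3.0]))
```

A single run started at -3 stops at the smaller peak. The multi-start run also tries starts to the right of the valley and so reaches the higher peak.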
29Besides the usual commands
- PREDICTOR Grand Trait1
- PREDICTOR SEX Trait1
- PREDICTOR AGE Trait1
- PREDICTOR BMI Trait1
- COEFFICIENT_FILE Coefficient19b.in  <- IBD info from sibwalk
- QUANTITATIVE_TRAIT Trait1
- COVARIANCE_CLASS Additive  <- polygenic
- COVARIANCE_CLASS Environmental
- COVARIANCE_CLASS Qtl  <- now specify an additive QTL
- GRID_INCREMENT 0.005  <- spacing of the map points
- ANALYSIS_OPTION Polygenic_Qtl
- VARIABLE_FILE Variable19b.in
- PROBAND 1
- PROBAND_FACTOR PROBAND
30Results
- Get a summary file and a full output file.
- The summary file looks like:

MARKER     MAP        LOCATION   AIC       NUMBER OF
           DISTANCE   SCORE                FACTORS
Marker01   0.0000     1.5892     6.6816    1
Marker02   0.0010     1.5679     6.7798    1
--         0.0050     1.6693     6.3126    1
--         0.0100     1.8603     5.4329    1
--         0.0150     2.1112     4.2778    1
--         0.0200     2.4028     2.9346    1
Marker03   0.0228     2.5740     2.1463    1
Marker04   0.0238     2.5757     2.1383    1
--         0.0250     2.5666     2.1804    1
--         0.0300     2.4896     2.5351    1

- AIC = -2 ln(L(Z)) + 2n. The smaller the AIC, the
better the fit.
- n = (number of parameters) - (number of constraints)
- Factors will be explained in a little while.
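For reference, the AIC is simple to compute from a maximized log-likelihood. The sketch below assumes n is the number of free parameters, taken here as parameters minus constraints. The function name and the log-likelihood values are hypothetical.

```python
def aic(log_likelihood, n_parameters, n_constraints=0):
    """Akaike information criterion: -2 ln L + 2n.

    n is taken to be the number of parameters minus the number of
    constraints (an assumption about how free parameters are counted).
    """
    n = n_parameters - n_constraints
    return -2.0 * log_likelihood + 2.0 * n


# Hypothetical maximized log-likelihoods for two nested models.
small_model = aic(log_likelihood=-10.0, n_parameters=3)
large_model = aic(log_likelihood=-9.8, n_parameters=4)
print(small_model, large_model)
```

In this made-up case the extra parameter raises the log-likelihood by only 0.2, which does not offset the penalty of 2, so the smaller model has the lower AIC.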
31There is more information in the output file,
including parameter estimates. However, the
estimates of locus-specific additive variance and
narrow-sense heritability obtained from
genome-wide scans are upwardly biased. Therefore these
estimates could lead one to overestimate the
importance of the QTL in determining trait values
(Göring et al., 2001, AJHG 69:1357-1369).
33Using more than one quantitative trait in the
analysis
- The model extends so that multiple traits can be
considered at the same time.
- The phenotypic variance is now a matrix.
- The variance components get more complicated.
Instead of one term per variance component, there
are 1 + ... + n = (n + 1)n/2 terms, where n is the
number of quantitative traits.
- As an example, consider two traits X and Y.
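The count of terms is the number of distinct entries in a symmetric n x n covariance matrix. A short sketch, with illustrative function names, lists them for the two-trait case:

```python
def n_covariance_terms(n_traits):
    """Distinct entries of a symmetric n x n matrix: n(n + 1)/2."""
    return n_traits * (n_traits + 1) // 2


def term_labels(traits):
    """List the distinct (trait, trait) pairs, variances first in each row."""
    return [(a, b) for i, a in enumerate(traits) for b in traits[i:]]


print(n_covariance_terms(2))
print(term_labels(["X", "Y"]))
```

For traits X and Y each variance component therefore has three terms: the variance of X, the covariance of X and Y, and the variance of Y.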
34For technical reasons it is better to
reparameterize the variances using a factor
analytic approach
- Factor refers to hidden underlying variables that
capture the essence of the data.
- Each variance component is parameterized in terms of
factors.
- We will illustrate with the additive genetic
variance matrix for two traits X and Y (in
principle any number of traits or any of the
components could have been used).
- There exists a matrix
such that
35Factors can be used to search for pleiotropic
effects?
- Could a single factor explain the QTL variance
component?
- A single factor is consistent with pleiotropy,
although there may be other explanations for a single
factor.
- When there are more than two traits we could have
reduced numbers of factors.
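To make the single-factor idea concrete, the sketch below builds a two-trait covariance matrix as the product of a loadings matrix with its transpose. The lower-triangular layout of the loadings and all numerical values are assumptions made for illustration, not the parameterization of any particular program.

```python
def covariance_from_factors(loadings):
    """Return L L^T, where loadings[t][k] is the loading of trait t on factor k."""
    n = len(loadings)
    return [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
             for j in range(n)]
            for i in range(n)]


def correlation(cov, i, j):
    """Correlation implied by a covariance matrix."""
    return cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5


# Two factors, lower-triangular loadings: 3 free parameters.
two_factor = [[0.8, 0.0],
              [0.3, 0.4]]
# One factor: 2 free parameters.
one_factor = [[0.8],
              [0.5]]

print(correlation(covariance_from_factors(two_factor), 0, 1))
print(correlation(covariance_from_factors(one_factor), 0, 1))
```

With a single factor the implied correlation between the two traits' QTL effects is exactly 1 (or -1 if the loadings differ in sign). That is what we would expect if one locus drives both traits, and it saves one parameter relative to the two-factor form.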
36Reduction in Parameters
Recall the original factor matrix for the QTL
37Modifications to the control file
- QUANTITATIVE_TRAIT Trait1
- QUANTITATIVE_TRAIT Trait2
- PREDICTOR Grand Trait1
- PREDICTOR SEX Trait1
- PREDICTOR AGE Trait1
- PREDICTOR BMI Trait1
- PREDICTOR Grand Trait2
- PREDICTOR SEX Trait2
- PREDICTOR AGE Trait2
- PREDICTOR BMI Trait2
- COVARIANCE_CLASS Additive
- COVARIANCE_CLASS Environmental
- COVARIANCE_CLASS Qtl
38One factor explains the results as well as two

MARKER     MAP        LOCATION   AIC       NUMBER OF
           DISTANCE   SCORE                FACTORS
Marker01   0.0000     1.6533     24.3863   1
Marker02   0.0010     1.6492     24.4052   1
--         0.0050     1.7558     23.9143   1
--         0.0100     1.9508     23.0161   1
...
Marker10   0.0931     0.5852     29.3049   1
Marker11   0.0941     0.5111     29.6464   1

2 factors
Marker01   0.0000     1.6605     26.3529   2
Marker02   0.0010     1.6520     26.3925   2
--         0.0050     1.7568     25.9098   2
--         0.0100     1.9508     25.0162   2
...
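The comparison can be checked directly from the AIC values listed for the first four map positions. The sketch below uses only numbers from the two summaries; the variable names are illustrative.

```python
# AIC by map distance, copied from the one-factor and two-factor summaries.
one_factor_aic = {0.0000: 24.3863, 0.0010: 24.4052, 0.0050: 23.9143, 0.0100: 23.0161}
two_factor_aic = {0.0000: 26.3529, 0.0010: 26.3925, 0.0050: 25.9098, 0.0100: 25.0162}

for position in sorted(one_factor_aic):
    difference = two_factor_aic[position] - one_factor_aic[position]
    preferred = 1 if one_factor_aic[position] <= two_factor_aic[position] else 2
    print(f"{position:.4f}  AIC difference = {difference:.4f}  "
          f"preferred number of factors = {preferred}")
```

At each position the two-factor AIC is higher by roughly 2. Since AIC adds 2 per parameter, this is consistent with the second factor adding one parameter while leaving the log-likelihood essentially unchanged, so the one-factor model is preferred.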
39Summary
- Variance component models can be used to
understand the correlations among traits in
families.
- They can also be used to map QTLs.
- Variance component models provide a powerful
approach for multivariate quantitative trait
data.