Title: Powerful Regression-based Quantitative Trait Linkage Analysis of General Pedigrees
Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis
2. The Problem
- Maximum likelihood variance components linkage analysis
  - Powerful (Fulker & Cherny 1996), but
  - Not robust in selected samples or with non-normal traits
  - Conditioning on trait values (Sham et al 2000) improves robustness but is computationally intensive in large pedigrees
- Haseman-Elston regression
  - More robust, but
  - Less powerful
  - Applicable only to sib pairs
3. Aim
- To develop a regression-based method that
  - has the same power as maximum likelihood variance components for sib-pair data
  - generalises to general pedigrees
4.
- Penrose (1938)
  - quantitative trait locus linkage for sib-pair data
- Simple regression-based method
  - dependent variable: squared pair trait difference
  - independent variable: proportion of alleles shared identical by descent (IBD)
5. Haseman-Elston regression
[Figure: squared sib-pair trait difference $(X - Y)^2$ plotted against the number of alleles shared IBD (0, 1, 2)]
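The regression on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: `simulate_sib_pairs`, `ols_slope` and `he_sd` are hypothetical helpers, and the sketch assumes standardized traits, complete IBD information, and a sib covariance given the IBD proportion $\pi$ of $Q\pi + s$ (additive QTL variance $Q$, shared residual variance $s$). Under that model the expected slope of $(X - Y)^2$ on $\pi$ is $-2Q$.

```python
import random

def simulate_sib_pairs(n_pairs, q, s, seed=1):
    """Hypothetical simulator: standardized sib traits with Cov(X, Y | pi) = q*pi + s.

    Assumes complete IBD information, an additive QTL with variance q and a
    shared residual variance s (requires q + s <= 1).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Share 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2, 1/4.
        pi = rng.choice([0.0, 0.5, 0.5, 1.0])
        c = q * pi + s                    # sib covariance (unit trait variances)
        z_shared = rng.gauss(0.0, 1.0)    # component common to both sibs
        x = c ** 0.5 * z_shared + (1 - c) ** 0.5 * rng.gauss(0.0, 1.0)
        y = c ** 0.5 * z_shared + (1 - c) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((x, y, pi))
    return pairs

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def he_sd(pairs):
    """Original HE: regress (X - Y)^2 on pi; the slope estimates -2Q."""
    pis = [pi for _, _, pi in pairs]
    sq_diffs = [(x - y) ** 2 for x, y, _ in pairs]
    return -ols_slope(pis, sq_diffs) / 2.0   # estimate of Q

pairs = simulate_sib_pairs(20000, q=0.2, s=0.2)
print(round(he_sd(pairs), 2))   # should be near the simulated Q = 0.2
```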
6. Sums versus differences
- Wright (1997), Drigalenko (1998)
  - the phenotypic difference discards some of the sib-pair QTL linkage information
  - the squared pair trait sum provides extra information for linkage
  - this information is independent of the information from HE-SD (the squared-difference regression)
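Why the sum carries its own information can be seen from the expected values. A minimal sketch, assuming standardized traits and $\mathrm{Cov}(X, Y \mid \pi) = Q\pi + s$ as above; `expected_squares` is a hypothetical helper. As IBD sharing rises, the expected squared sum increases with slope $+2Q$ while the expected squared difference falls with slope $-2Q$.

```python
def expected_squares(pi, q, s):
    """E[(X+Y)^2 | pi] and E[(X-Y)^2 | pi] for standardized sibs.

    Assumes Cov(X, Y | pi) = q*pi + s (additive QTL variance q,
    shared residual variance s) and unit trait variances.
    """
    cov = q * pi + s
    return 2 + 2 * cov, 2 - 2 * cov

q, s = 0.2, 0.1
sum0, diff0 = expected_squares(0.0, q, s)
sum1, diff1 = expected_squares(1.0, q, s)
# Change from pi = 0 to pi = 1: +2q for the squared sum, -2q for the squared difference.
print(sum1 - sum0, diff1 - diff0)
```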
7.
- New dependent variable to increase power
  - mean-corrected cross-product (HE-CP)
  - but this was found to be less powerful than the original HE when the sib correlation is high
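The cross-product is an equally weighted combination of the squared mean-corrected sum and the squared difference, since $(a+b)^2 - (a-b)^2 = 4ab$. A small sketch with hypothetical helper names checks this identity. Equal weights ignore that the two components have different variances when the sib correlation is non-zero, which is consistent with the loss of power at high correlation reported here.

```python
def cross_product(x, y, mu):
    """HE-CP dependent variable: mean-corrected cross-product of sib traits."""
    return (x - mu) * (y - mu)

def cross_product_via_sum_diff(x, y, mu):
    """Same quantity as (squared mean-corrected sum - squared difference) / 4."""
    squared_sum = (x + y - 2 * mu) ** 2
    squared_diff = (x - y) ** 2
    return (squared_sum - squared_diff) / 4.0

print(cross_product(1.3, -0.4, 0.2), cross_product_via_sum_diff(1.3, -0.4, 0.2))
```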
8.
- Clarify the relative efficiencies of existing HE methods
- Demonstrate equivalence between a new HE method and variance components methods
- Show application to the selection and analysis of extreme, selected samples
9. NCPs for H-E regressions
10. Weighted H-E
- Squared sums and squared differences
  - orthogonal components in the population
- Optimal weighting
  - inverse of their variances
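For standardized bivariate normal sibs with correlation $r$, the sum is $N(0, 2(1+r))$ and the difference is $N(0, 2(1-r))$; they are uncorrelated (hence independent under normality), and the square of an $N(0, \sigma^2)$ variable has variance $2\sigma^4$. A minimal sketch of the resulting inverse-variance weights, under those normality assumptions and with hypothetical function names:

```python
def null_variances(r):
    """Variances of (X+Y)^2 and (X-Y)^2 for standardized bivariate normal
    sibs with correlation r, using Var(Z^2) = 2*sigma^4 for Z ~ N(0, sigma^2)."""
    return 8 * (1 + r) ** 2, 8 * (1 - r) ** 2

def inverse_variance_weights(r):
    """Weights for squared sum and squared difference, inversely proportional to their variances."""
    var_sum, var_diff = null_variances(r)
    return 1 / var_sum, 1 / var_diff

w_sum, w_diff = inverse_variance_weights(0.5)
# The squared difference gets (1+r)^2 / (1-r)^2 times the weight of the squared sum.
print(w_diff / w_sum)
```

At $r = 0$ the weights are equal, and as $r$ grows the squared difference dominates.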
11. Weighted H-E
- NCP is a function of
  - the square of the QTL variance
  - marker informativeness (complete information: $\mathrm{Var}(\hat{\pi}) = 1/8$)
  - the sibling correlation
- Equivalent to variance components
  - to a second-order approximation
  - Rijsdijk et al (2000)
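The dependence on these three quantities can be sketched as follows, assuming standardized bivariate normal sib traits with overall correlation $r$, QTL variance $Q$, $\mathrm{Cov}(X, Y \mid \pi) = r + Q(\pi - \tfrac{1}{2})$, an estimated IBD proportion $\hat\pi$ with $E[\pi \mid \hat\pi] = \hat\pi$, and residual variances approximated by their null values. The per-pair NCP of each regression is slope$^2 \times \mathrm{Var}(\hat\pi)$ / residual variance, and because the two components are orthogonal their NCPs add:

$$
\begin{aligned}
E[(X+Y)^2 \mid \hat\pi] &= 2(1+r) + 2Q(\hat\pi - \tfrac{1}{2}), & \mathrm{Var}_0[(X+Y)^2] &= 8(1+r)^2,\\
E[(X-Y)^2 \mid \hat\pi] &= 2(1-r) - 2Q(\hat\pi - \tfrac{1}{2}), & \mathrm{Var}_0[(X-Y)^2] &= 8(1-r)^2,\\
\lambda &\approx \frac{(2Q)^2\,\mathrm{Var}(\hat\pi)}{8(1+r)^2} + \frac{(2Q)^2\,\mathrm{Var}(\hat\pi)}{8(1-r)^2} \\
&= \frac{Q^2\,\mathrm{Var}(\hat\pi)\,(1+r^2)}{(1-r^2)^2}.
\end{aligned}
$$

With complete information, $\mathrm{Var}(\hat\pi) = 1/8$ and the per-pair NCP is approximately $Q^2(1+r^2)/[8(1-r^2)^2]$.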
12. Combining into one regression
- New dependent variable
  - a linear combination of
    - the squared sum
    - the squared difference
  - each inversely weighted by its variance
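One way to write such a dependent variable, as a minimal sketch rather than the authors' exact scaling: divide each squared term by its null variance (up to the common factor 8), subtract the squared-difference term so that both components rise with IBD sharing, and remove the population mean. It assumes standardized traits with overall sib correlation $r$ and $\mathrm{Cov}(X, Y \mid \pi) = r + Q(\pi - \tfrac{1}{2})$; `combined_dv` and `expected_slope` are hypothetical names.

```python
import random

def combined_dv(x, y, r):
    """Hypothetical single DV for standardized sib traits x, y with sib correlation r.

    Squared sum and squared difference are divided by (1 +/- r)^2, i.e. their
    null variances 8(1 +/- r)^2 up to the constant 8; the difference term is
    subtracted so both rise with IBD sharing. The population mean
    2/(1+r) - 2/(1-r) is removed.
    """
    raw = (x + y) ** 2 / (1 + r) ** 2 - (x - y) ** 2 / (1 - r) ** 2
    return raw - (2 / (1 + r) - 2 / (1 - r))

def expected_slope(q, r):
    """Expected slope of combined_dv on pi-hat if Cov(x, y | pi) = r + q*(pi - 1/2)."""
    return 2 * q * (1 / (1 + r) ** 2 + 1 / (1 - r) ** 2)

# Monte Carlo check that the DV averages about 0 when there is no linkage.
rng = random.Random(3)
r = 0.3
vals = []
for _ in range(50000):
    z = rng.gauss(0, 1)
    x = r ** 0.5 * z + (1 - r) ** 0.5 * rng.gauss(0, 1)
    y = r ** 0.5 * z + (1 - r) ** 0.5 * rng.gauss(0, 1)
    vals.append(combined_dv(x, y, r))
print(sum(vals) / len(vals))   # near 0
```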
13. Simulation
- Single QTL simulated
  - accounts for 10% of trait variance
  - 2 equifrequent alleles, additive gene action
  - complete IBD information assumed at the QTL
- Residual variance
  - shared and nonshared components
  - residual sibling correlation from 0 to 0.5
- 10,000 sibling pairs
  - 100 replicates
  - 1,000 replicates under the null
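A sketch of how one sib pair could be generated under this design. It is an illustration under stated assumptions, not the simulation code used here: `simulate_sib_pair` and its parameters are hypothetical, the QTL is biallelic and additive with allele frequency `p`, IBD is read off which parental chromosome each sib inherited, and total trait variance is 1. The residual sibling correlation is `shared_var / (1 - qtl_var)`, so a target residual correlation $\rho$ corresponds to `shared_var = rho * (1 - qtl_var)`.

```python
import random
import statistics

def simulate_sib_pair(rng, qtl_var=0.10, shared_var=0.20, p=0.5):
    """Hypothetical sketch of one sib pair: additive biallelic QTL explaining
    qtl_var of trait variance, residual split into shared (shared_var) and
    nonshared parts so total variance is 1; IBD at the QTL known exactly."""
    # Parental QTL alleles (1 = trait-increasing allele).
    mother = [int(rng.random() < p) for _ in range(2)]
    father = [int(rng.random() < p) for _ in range(2)]
    # Each sib inherits one maternal and one paternal chromosome at random.
    inherited = [(rng.randrange(2), rng.randrange(2)) for _ in range(2)]
    n_ibd = ((inherited[0][0] == inherited[1][0])
             + (inherited[0][1] == inherited[1][1]))
    # Scale the additive effect so genotypic variance equals qtl_var
    # (allele count has variance 2p(1-p) under Hardy-Weinberg).
    a = (qtl_var / (2 * p * (1 - p))) ** 0.5
    shared = rng.gauss(0.0, shared_var ** 0.5)
    nonshared_sd = (1.0 - qtl_var - shared_var) ** 0.5
    traits = []
    for m, f in inherited:
        count = mother[m] + father[f]
        traits.append(a * (count - 2 * p) + shared + rng.gauss(0.0, nonshared_sd))
    return traits[0], traits[1], n_ibd / 2

rng = random.Random(42)
sample = [simulate_sib_pair(rng) for _ in range(10000)]
print(statistics.variance([x for x, _, _ in sample]))   # should be close to 1
```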
14. Unselected samples
15. Sample selection
- A sib pair's squared mean-corrected dependent variable (DV) is proportional to its expected NCP
- Equivalent to the variance-components-based selection scheme
  - Purcell et al (2000)
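Because the squared mean-corrected DV tracks each pair's expected NCP, selecting the most informative pairs reduces to ranking by that square. A minimal sketch; `select_most_informative` is a hypothetical helper and assumes the DVs have already been mean-corrected.

```python
def select_most_informative(dvs, fraction):
    """Indices of the pairs with the largest squared mean-corrected DV.

    dvs: mean-corrected dependent variable for each sib pair (assumed already
    centred). Ranking by the square ranks pairs by expected informativeness.
    """
    k = round(fraction * len(dvs))
    order = sorted(range(len(dvs)), key=lambda i: dvs[i] ** 2, reverse=True)
    return order[:k]

print(select_most_informative([0.1, -3.0, 2.0, 0.5], 0.5))  # [1, 2]
```

With 10,000 pairs and `fraction=0.05`, this keeps 500 pairs.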
16. Sample selection
17. Analysis of selected samples
- The 500 (5%) most informative pairs selected
[Figure panels: r = 0.05; r = 0.60]
18. Selected samples: $H_0$
19. Selected samples: $H_A$
20. Extension to General Pedigrees
- Multivariate regression model
- Weighted least squares estimation
- Weight matrix based on IBD information
21. Switching Variables
- To obtain unbiased estimates in selected samples (selection acts on the trait, so the trait becomes the independent variable)
  - dependent variables: IBD
  - independent variables: trait
22. Dependent Variables
- Estimated IBD sharing of all pairs of relatives
- Example
23. Independent Variables
- Squares and cross-products of trait values
  - (equivalent to the non-redundant squared sums and differences)
- Example
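For a hypothetical sibship of three, the variables on these two slides can be laid out as below. The sketch assumes mean-corrected, standardized traits `z`; all function names are illustrative. The IBD dependent variables are indexed by the same relative pairs as the cross-products, and each pair's squared sum and difference can be rebuilt from the squares and cross-products via $(z_i \pm z_j)^2 = z_i^2 + z_j^2 \pm 2 z_i z_j$.

```python
from itertools import combinations

def relative_pairs(n):
    """All pairs of relatives (i, j), i < j, in a family of n members."""
    return list(combinations(range(n), 2))

def trait_products(z):
    """Squares z_i^2 and cross-products z_i * z_j (i < j) of mean-corrected,
    standardized trait values z."""
    squares = [zi * zi for zi in z]
    cross = [z[i] * z[j] for i, j in relative_pairs(len(z))]
    return squares, cross

def sum_diff_squares(z):
    """Squared sum and squared difference for each pair, rebuilt from the products."""
    squares, cross = trait_products(z)
    out = []
    for (i, j), c in zip(relative_pairs(len(z)), cross):
        out.append((squares[i] + squares[j] + 2 * c, squares[i] + squares[j] - 2 * c))
    return out

z = [0.8, -0.3, 1.1]            # hypothetical sibship of three
print(relative_pairs(len(z)))   # [(0, 1), (0, 2), (1, 2)]
```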
24. Covariance Matrices
- Obtained from the prior (p) and posterior (q) IBD distributions given marker genotypes
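The single-pair building block can be sketched as follows; the full matrix for a pedigree also needs the joint IBD distributions of different pairs, which this does not cover. With a prior $p$ and a posterior $q$ over 0, 1 or 2 alleles IBD, $\hat\pi$ is the posterior mean of $\pi = k/2$. By the law of total variance, $\mathrm{Var}(\hat\pi)$ equals the prior variance minus the average posterior variance, which gives $1/8$ for sibs with complete information. The function names and the posterior used here are hypothetical.

```python
def ibd_mean_var(dist):
    """Mean and variance of pi = k/2 under dist = (P(k=0), P(k=1), P(k=2))."""
    mean = sum(pk * k / 2 for k, pk in enumerate(dist))
    second_moment = sum(pk * (k / 2) ** 2 for k, pk in enumerate(dist))
    return mean, second_moment - mean ** 2

def var_pihat(posteriors, prior):
    """Var(pi-hat) = prior Var(pi) - mean posterior Var(pi), assuming the
    posteriors are a representative sample of families' marker data."""
    prior_var = ibd_mean_var(prior)[1]
    post_vars = [ibd_mean_var(q)[1] for q in posteriors]
    return prior_var - sum(post_vars) / len(post_vars)

prior_sibs = (0.25, 0.5, 0.25)       # prior (p) for full sibs
posterior = (0.1, 0.6, 0.3)          # hypothetical posterior (q) given markers
print(ibd_mean_var(prior_sibs))      # prior mean 1/2, variance 1/8
print(ibd_mean_var(posterior)[0])    # pi-hat: posterior mean of pi
```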
25. Covariance Matrices
- Independent variables
  - obtained from properties of the multivariate normal distribution
  - under the specified mean, variance and correlations
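For a zero-mean multivariate normal vector, Isserlis' theorem gives $\mathrm{Cov}(Z_i Z_j, Z_k Z_l) = \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}$, which yields the covariance matrix of the squares and cross-products directly. This is a minimal sketch assuming the traits have been centred at the specified mean and scaled by the specified variance, with `sigma` the specified correlation matrix; the function names and the value of `r` are hypothetical.

```python
from itertools import combinations

def product_terms(n):
    """Index pairs for squares (i, i) and cross-products (i, j), i < j."""
    return [(i, i) for i in range(n)] + list(combinations(range(n), 2))

def product_covariance(sigma):
    """Covariance matrix of squares and cross-products of a zero-mean
    multivariate normal vector with covariance sigma (Isserlis' theorem)."""
    terms = product_terms(len(sigma))
    return [[sigma[i][k] * sigma[j][l] + sigma[i][l] * sigma[j][k]
             for (k, l) in terms] for (i, j) in terms]

r = 0.5                          # hypothetical sib correlation
sigma = [[1.0, r], [r, 1.0]]     # standardized sib pair
for row in product_covariance(sigma):
    print(row)
```

For a sib pair this reproduces $\mathrm{Var}(Z_1^2) = 2$, $\mathrm{Cov}(Z_1^2, Z_2^2) = 2r^2$, $\mathrm{Cov}(Z_1^2, Z_1 Z_2) = 2r$ and $\mathrm{Var}(Z_1 Z_2) = 1 + r^2$.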
26. Estimation
- For a family, the regression model is:
- Estimate Q by weighted least squares, and obtain its sampling variance, family by family
- Combine estimates across families, inversely weighted by their variances, to give an overall estimate and its sampling variance
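The two estimation steps can be sketched for a scalar $Q$. The family-level model is not reproduced here, so this assumes a generic linear form $y = Qx + e$ in which `x` is a placeholder for the family's design vector and `w` is the (symmetric) inverse covariance matrix of `e`. The 1-df Wald-type chi-square $\hat{Q}^2/\mathrm{Var}(\hat{Q})$ is likewise an assumption. All names are hypothetical.

```python
def family_estimate(x, y, w):
    """Weighted least-squares estimate of scalar Q in y = Q*x + e for one
    family; w is the symmetric inverse covariance matrix of e.
    Returns (Q_hat, sampling variance)."""
    n = len(x)
    wx = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
    xwx = sum(x[i] * wx[i] for i in range(n))
    xwy = sum(y[i] * wx[i] for i in range(n))   # equals x'Wy since w is symmetric
    return xwy / xwx, 1.0 / xwx

def combine_families(estimates):
    """Inverse-variance weighted combination of per-family (Q_hat, variance)
    pairs; returns overall Q_hat, its variance and a Wald chi-square."""
    total_weight = sum(1.0 / v for _, v in estimates)
    q_hat = sum(q / v for q, v in estimates) / total_weight
    var = 1.0 / total_weight
    return q_hat, var, q_hat ** 2 / var

print(family_estimate([1.0, 2.0], [2.0, 4.0], [[1.0, 0.0], [0.0, 1.0]]))  # (2.0, 0.2)
```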
27. Average chi-squared statistics: fully informative marker NOT linked to a 20% QTL
[Figure: average chi-square by sibship size; N = 1000 individuals, heritability = 0.5, 10,000 simulations]
28. Average chi-squared statistics: fully informative marker linked to a 20% QTL
[Figure: average chi-square by sibship size; N = 1000 individuals, heritability = 0.5, 2000 simulations]
29. Average chi-squared statistics: poorly informative marker NOT linked to a 20% QTL
[Figure: average chi-square by sibship size; N = 1000 individuals, heritability = 0.5, 10,000 simulations]
30. Average chi-squared statistics: poorly informative marker linked to a 20% QTL
[Figure: average chi-square by sibship size; N = 1000 individuals, heritability = 0.5, 2000 simulations]
31. Average chi-squares, selected sib pairs, NOT linked to a 20% QTL
[Figure: average chi-square by selection scheme; 20,000 simulations, 10% of 5,000 sib pairs selected]
32. Average chi-squares, selected sib pairs, linkage to a 20% QTL
[Figure: average chi-square by selection scheme; 2,000 simulations, 10% of 5,000 sib pairs selected]
33. Mis-specification of the mean: 2000 random sib quads, 20% QTL
[Figure panel: "Not linked, full"]
34. Mis-specification of the covariance: 2000 random sib quads, 20% QTL
[Figure panel: "Not linked, full"]
35. Mis-specification of the variance: 2000 random sib quads, 20% QTL
[Figure panel: "Not linked, full"]
36. Cousin pedigree
37. Average chi-squares for 200 cousin pedigrees, 20% QTL
              Poor marker information    Full marker information
              REG       VC               REG       VC
Not linked    0.49      0.48             0.53      0.50
Linked        4.94      4.43             13.21     12.56
(REG = regression approach; VC = maximum likelihood variance components)
38. Conclusion
- The regression approach
  - can be extended to general pedigrees
  - is slightly more powerful than maximum likelihood variance components in large sibships
  - can handle imperfect IBD information
  - is easily applicable to selected samples
  - provides an unbiased estimate of the QTL variance
  - provides a simple measure of family informativeness
  - is robust to minor deviations from normality
- But
  - it assumes knowledge of the mean, variance and heritability of the trait distribution in the population
39. The End