Introduction to Linkage and Association for Quantitative Traits

Description:

Title: PowerPoint Presentation Author: Outreach Publications Last modified by: Michael C Neale Created Date: 2/18/2002 6:01:30 PM Document presentation format –

Slides: 46

Provided by: Outrea3

Learn more at: http://ibgwww.colorado.edu

Transcript and Presenter's Notes

1
Introduction to Linkage and Association for Quantitative Traits

Michael C Neale
Boulder, Colorado Workshop, March 2, 2009

2
Overview

A brief history of SEM
Regression
Maximum likelihood estimation
Models
Twin data
Sib pair linkage analysis
Association analysis

3
Origins of SEM

Regression analysis
Reversion: Galton 1877 (biological phenomenon)
Yule 1897, Pearson 1903: general statistical context
Initially Gaussian X and Y; Fisher 1922: Y|X
Path Analysis
Sewall Wright 1918, 1921
Path diagrams of regression and covariance relationships

4
Structural Equation Modeling Basics

Two kinds of relationships
Linear regression: X -> Y (single-headed)
Unspecified covariance: X <-> Y (double-headed)
Four kinds of variable
Squares: observed variables
Circles: latent, not observed variables
Triangles: constant (zero variance) for specifying means
Diamonds: observed variables used as moderators (on paths)

5
Linear Regression Covariance SEM
[Path diagram: X -> Y with regression coefficient b; variance Var(X) on X and residual variance Res(Y) on Y]
Models covariances only
Of historical interest
6
Linear Regression SEM with means
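[Path diagram: X -> Y with regression coefficient b; variance Var(X) on X and residual variance Res(Y) on Y; means Mu(x) and Mu(y) as paths from the constant 1]
Models means and covariances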
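As a rough illustration of what "models means and covariances" amounts to, the sketch below computes the means and covariance matrix implied by a regression of Y on X. The function name and the intercept argument `a` are assumptions made for this example (the diagram labels the means Mu(x) and Mu(y)); it is not taken from the slides.

```python
def implied_moments(var_x, res_y, b, mu_x, a):
    """Means and covariance matrix implied by Y = a + b*X + residual.

    var_x: Var(X); res_y: residual variance Res(Y); b: regression slope;
    mu_x: mean of X; a: intercept of Y (hypothetical name for this sketch).
    """
    cov_xy = b * var_x                # Cov(X, Y) = b * Var(X)
    var_y = b * b * var_x + res_y     # Var(Y) = b^2 * Var(X) + Res(Y)
    means = (mu_x, a + b * mu_x)      # E[Y] = a + b * E[X]
    cov = ((var_x, cov_xy), (cov_xy, var_y))
    return means, cov


means, cov = implied_moments(var_x=2.0, res_y=1.0, b=0.5, mu_x=10.0, a=3.0)
print(means)
print(cov)
```

Fitting the model then means choosing Var(X), Res(Y), b and the means so that these implied moments match the observed ones as closely as possible.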
7
Linear Regression SEM Individual-level
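$Y_i = a + bX_i$
[Path diagram with nodes Yi, D and the constant 1, paths a and b, definition variable Xi, and residual variance Res(Y)]
Models mean and covariance of Y only
Must have raw (individual-level) data
Xi is a definition variable
Mean of Y different for every observation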
8
Single Factor Covariance Model
9
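Two Factor Model with Covs & Means
[Path diagram: factors F1 and F2 (variances 1.00, means mF1 and mF2 from the constant 1) load on observed variables S1, S2, S3, ..., Sm through loadings l1, l2, l3, ..., lm; residuals e1, e2, e3, e4; observed means mS1, mS2, mS3, ..., mSm from the constant 1]
N.B. Not identified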
10
Factor model essentials

In SEM the factors are typically assumed to be
normally distributed
May have more than one latent factor
The error variance is typically assumed to be
normal as well
May be applied to binary or ordinal data
Threshold model

11
Multifactorial Threshold Model
Normal distribution of liability. Affected when liability x > t
[Figure: normal density of liability x (x-axis from -4 to 4, density from 0 to 0.5) with the threshold t marked]
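The threshold model above can be sketched numerically: with standard normal liability, the proportion affected is the area of the curve beyond t. This is a minimal illustration using Python's standard library; the function names are made up for this example.

```python
from statistics import NormalDist

# Liability is assumed standard normal (mean 0, variance 1).
liability = NormalDist(mu=0.0, sigma=1.0)


def prevalence(t):
    """Proportion affected: P(liability x > t)."""
    return 1.0 - liability.cdf(t)


def threshold_for(prev):
    """Threshold t that gives the requested prevalence."""
    return liability.inv_cdf(1.0 - prev)


print(prevalence(1.0))      # area above t = 1
print(threshold_for(0.10))  # threshold for a 10% prevalence
```

In practice the direction is usually the second one: an observed prevalence is converted to a threshold on the liability scale.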
12
Measuring Variation

Distribution
Population
Sample
Observed measures
Probability density function (pdf)
Smoothed out histogram
f(x) > 0 for all x

13
Flipping Coins
4 coins, 5 outcomes
[Bar chart: probability (0 to 0.4) of each outcome HHHH, HHHT, HHTT, HTTT, TTTT]
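The bar heights for the four-coin example follow from the binomial distribution with fair coins. A small sketch (the variable names are chosen for this example, and outcomes are grouped by number of heads regardless of order):

```python
from math import comb

n_coins = 4
outcome_probs = {}
for heads in range(n_coins, -1, -1):
    # Label each outcome by its count of heads and tails, e.g. "HHTT".
    label = "H" * heads + "T" * (n_coins - heads)
    # Number of orderings with this many heads, out of 2^4 equally likely sequences.
    outcome_probs[label] = comb(n_coins, heads) / 2 ** n_coins

for label, p in outcome_probs.items():
    print(label, p)
```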
14
Bank of China Coin Toss

Infinite outcomes
[Figure: normal density curve; x-axis Heads-Tails from -4 to 4, y-axis from 0 to 0.5]
De Moivre 1733, Gauss 1827
15
Variance: Average squared deviation
Normal distribution
[Figure: normal curve showing the deviation di of an observation xi from the mean; x-axis from -3 to 3]
Variance = $\sum_i d_i^2 / N$
16
Deviations in two dimensions
[Figure: scatterplot of x and y with the means μx and μy marked]

17
Deviations in two dimensions: dx × dy
[Figure: scatterplot with the means μx and μy marked, showing the deviations dx and dy of one point]
18
Covariance

Measure of association between two variables
Closely related to variance
Useful to partition variance
Analysis of Variance term coined by Fisher
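To make the link between variance and covariance concrete, here is a minimal sketch that computes both as averages of deviation products, dividing by N as on the variance slide (not N - 1). The data values are invented for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Average product of deviations dx * dy (divisor N)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


def variance(x):
    """Variance is the covariance of a variable with itself."""
    return covariance(x, x)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(variance(x), variance(y), covariance(x, y))
```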

19
Variance covariance matrix
Univariate Twin/Sib Data
Var(Twin1)        Cov(Twin1,Twin2)
Cov(Twin2,Twin1)  Var(Twin2)
Suitable for modeling when no missing data
Good conceptual perspective
20
Maximum Likelihood Estimates Nice Properties

1. Asymptotically unbiased
Large sample estimate of p -> population value
2. Minimum variance (efficient)
Smallest variance of all estimates with property 1
3. Functionally invariant
If g(a) is a one-to-one function of parameter a and $\mathrm{MLE}(a) = \hat{a}$, then $\mathrm{MLE}(g(a)) = g(\hat{a})$
See http://wikipedia.org

21
Full Information Maximum Likelihood (FIML)
Calculate height of curve for each raw data vector
22
Height of normal curve: μx = 0
Probability density function
[Figure: normal curve centered at μx, with the height φ(xi) marked at data point xi; x-axis from -3 to 3]
φ(xi) is the likelihood of data point xi for particular mean and variance estimates
23
Height of normal curve at xi: μx = .5
Function of mean
[Figure: normal curve centered at μx, with the height φ(xi) marked at data point xi; x-axis from -3 to 3]
Likelihood of data point xi increases as μx approaches xi
24
Likelihood of xi as a function of μ
Likelihood function
[Figure: likelihood L(xi) plotted against μx (x-axis from -3 to 3), with the MLE marked at the peak]
L(xi) is the likelihood of data point xi for particular mean and variance estimates
25
Height of normal curve at x1
Function of variance
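[Figure: normal curves with mean μx and variances 1, 2 and 3, showing the heights φ(xi | var = 1), φ(xi | var = 2) and φ(xi | var = 3) at data point xi; x-axis from -3 to 3]
Likelihood of data point xi changes as variance of distribution changes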
26
Height of normal curve at x1 and x2
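[Figure: normal curves with mean μx and variances 1 and 2, showing the heights φ(x1 | var = 1), φ(x1 | var = 2), φ(x2 | var = 1) and φ(x2 | var = 2) at data points x1 and x2; x-axis from -3 to 3]
x1 has higher likelihood with var = 1 whereas x2 has higher likelihood with var = 2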
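The "height of the curve" slides can be checked numerically with the normal density formula. In this sketch the data points x1 = 0.5 and x2 = 2.5 are arbitrary choices for illustration, not values read from the figures.

```python
from math import exp, pi, sqrt


def normal_height(x, mean, var):
    """Height of the normal curve at x: the likelihood of x given mean and variance."""
    return exp(-(x - mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)


x1, x2 = 0.5, 2.5  # one point near the mean, one far from it (illustrative)

# Function of the mean: likelihood rises as the mean approaches the data point.
print(normal_height(x1, mean=0.0, var=1.0), normal_height(x1, mean=0.5, var=1.0))

# Function of the variance: the near point prefers var = 1, the far point var = 2.
for x in (x1, x2):
    print(x, normal_height(x, 0.0, 1.0), normal_height(x, 0.0, 2.0))
```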
27
Height of bivariate normal density function
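Likelihood varies as a function of the distribution's parameters (means, variances and correlation)
[Figure: bivariate normal density surface over x and y]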
28
Likelihood of Independent Observations

Chance of getting two heads
$L(x_1, \dots, x_n) = \prod_{i=1}^{n} L(x_i)$
L(xi) typically < 1
Avoid vanishing L(x1 ... xn)
Computationally convenient log-likelihood
$\ln(ab) = \ln(a) + \ln(b)$
Minimization more manageable than maximization
Minimize -2 ln(L)
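The vanishing-likelihood problem is easy to reproduce. The sketch below uses 2000 made-up observations and a standard normal model: the raw product underflows to zero in double precision, while the sum of log-likelihoods stays well behaved. The data pattern and names are assumptions for this example.

```python
from math import exp, isfinite, log, pi, prod, sqrt


def normal_density(x, mean, var):
    return exp(-(x - mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)


def normal_log_density(x, mean, var):
    # Log of the density, computed directly so that no tiny number is ever formed.
    return -0.5 * log(2.0 * pi * var) - (x - mean) ** 2 / (2.0 * var)


# Illustrative data: values cycling through -1.5, -1.0, ..., 1.5.
data = [(i % 7 - 3) * 0.5 for i in range(2000)]

likelihood = prod(normal_density(x, 0.0, 1.0) for x in data)
minus2lnl = -2.0 * sum(normal_log_density(x, 0.0, 1.0) for x in data)

print(likelihood)  # underflows: every factor is below 0.4
print(minus2lnl)   # finite, and suitable for minimization
```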

29
Likelihood Ratio Tests

Comparison of likelihoods
Consider ratio L(data, model 1) / L(data, model 2)
$\ln(a/b) = \ln(a) - \ln(b)$
Log-likelihood difference: lnL(data, model 1) - lnL(data, model 2)
Useful asymptotic feature when model 2 is a submodel of model 1
$2\,[\ln L(\text{data, model 1}) - \ln L(\text{data, model 2})] \sim \chi^2$
df = # parameters of model 1 - # parameters of model 2
BEWARE of gotchas!
Estimates of a², q² etc. have implicit bound of zero
Test statistic is then distributed as a 50:50 mixture of 0 and $\chi^2_1$
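A sketch of a one-parameter likelihood ratio test follows, including the boundary "gotcha": when the dropped parameter (such as a² or q²) is bounded at zero, the usual 1 df p-value is halved. The function names and the -2lnL inputs are hypothetical; the chi-square tail uses the identity P(χ² with 1 df > x) = erfc(sqrt(x/2)).

```python
from math import erfc, sqrt


def chi2_1df_tail(x):
    """P(chi-square with 1 df > x), via the standard normal tail."""
    return erfc(sqrt(x / 2.0))


def lrt_one_parameter(minus2lnl_sub, minus2lnl_full, bounded_at_zero):
    """Test a submodel that drops one parameter from the full model.

    Returns (test statistic, p-value). With bounded_at_zero=True the null
    distribution is taken as a 50:50 mixture of 0 and chi-square(1).
    """
    stat = max(minus2lnl_sub - minus2lnl_full, 0.0)
    p = chi2_1df_tail(stat)
    if bounded_at_zero and stat > 0.0:
        p *= 0.5
    return stat, p


# Hypothetical fit values: the submodel fits worse by 3.0 on the -2lnL scale.
print(lrt_one_parameter(1003.0, 1000.0, bounded_at_zero=False))
print(lrt_one_parameter(1003.0, 1000.0, bounded_at_zero=True))
```

Under the mixture, a difference of about 2.71 already reaches the 5% level, compared with 3.84 for the naive 1 df test.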
30

Two Group ACE Model for twin data
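[Path diagram: latent A, C and E (each with variance 1) for each twin, with paths a, c and e to phenotypes PT1 and PT2; correlation between the twins' A factors is 1 (MZ) or .5 (DZ); correlation between their C factors is 1; mean m from the constant 1]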
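The two-group ACE model implies a different expected covariance matrix for MZ and DZ pairs. A minimal sketch under the path tracing rules, with path values chosen arbitrarily for illustration:

```python
def ace_expected_cov(a, c, e, zygosity):
    """Expected 2x2 covariance matrix of PT1 and PT2 under the ACE model.

    a, c, e are path coefficients; the A factors correlate 1 in MZ pairs
    and .5 in DZ pairs, and the C factors correlate 1 in both groups.
    """
    r_a = 1.0 if zygosity == "MZ" else 0.5
    var = a * a + c * c + e * e
    cov = r_a * a * a + c * c
    return ((var, cov), (cov, var))


# Illustrative path coefficients, not estimates from any data set.
print(ace_expected_cov(1.0, 0.5, 0.5, "MZ"))
print(ace_expected_cov(1.0, 0.5, 0.5, "DZ"))
```

The only difference between the groups is the weight on a² in the covariance, which is what lets the two groups together separate A from C.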
31
Linkage vs Association

Linkage
Family-based
Matching/ethnicity generally unimportant
Few markers for genome coverage (300-400 STRs)
Can be weak design
Good for initial detection; poor for fine-mapping
Powerful for rare variants

Association
Families or unrelated individuals
Matching/ethnicity crucial
Many markers required for genome coverage (10^5 to 10^6 SNPs)
Powerful design
OK for initial detection; good for fine-mapping
Powerful for common variants; rare variants generally impossible

32
Identity by Descent (IBD)
Number of alleles shared IBD at a locus, parents AB and CD
Three subgroups of sibpairs
(rows: genotype of one sib; columns: genotype of the other sib)
        AC  AD  BC  BD
AC      2   1   1   0
AD      1   2   0   1
BC      1   0   2   1
BD      0   1   1   2
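The IBD table can be generated by enumerating the four equally likely genotypes each sib can inherit from parents AB and CD. This sketch also counts how often each IBD value occurs; the names are chosen for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each sib receives one allele from parent AB and one from parent CD.
sib_genotypes = [p + m for p in "AB" for m in "CD"]  # AC, AD, BC, BD


def ibd(sib1, sib2):
    """Number of alleles shared IBD: compare the AB-parent and CD-parent alleles."""
    return sum(x == y for x, y in zip(sib1, sib2))


counts = Counter(ibd(s1, s2) for s1, s2 in product(sib_genotypes, repeat=2))
ibd_probs = {k: Fraction(n, 16) for k, n in counts.items()}

for s1 in sib_genotypes:
    print(s1, [ibd(s1, s2) for s2 in sib_genotypes])
print(ibd_probs)
```

The 16 cells split 4 : 8 : 4, which gives the three sib-pair subgroups their expected proportions of 1/4, 1/2 and 1/4.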
33
Partitioned Twin Analysis

Nance & Neale (1989) Behav Genet 19(1)
Separate DZ pairs into subgroups
IBD=0, IBD=1, IBD=2
Correlate Q with 0, .5 and 1 coefficients
Compute statistical power

34
Partitioned Twin Analysis Three DZ groups
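[Path diagrams for the IBD=0, IBD=1 and IBD=2 groups: for each twin, latent A, C, D, E and Q influence phenotypes P1 and P2. In every group the cross-twin correlations are .5 for A, 1 for C and .25 for D; the Q1-Q2 correlation is 0 in the IBD=0 group, .5 in the IBD=1 group and 1 in the IBD=2 group]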
35
Problem 1 with Partitioned Twin Analysis: Low Power

Power is low

36
Problem 2: IBD is not known with certainty

Markers may not be fully informative
Only so much heterozygosity in e.g., 20 allele
microsatellite marker
Less in a SNP
Unlikely to have typed the exact locus we are
looking for
Genome is big!

37
IBD pairs vary in similarity
Effect of selecting concordant pairs
[Figure: trait distributions for IBD=2, IBD=1 and IBD=0 pairs, each with the selection threshold t marked]
38
Improving Power for Linkage

Increase marker density (yaay SNP chips)
Change design
Families
Larger Sibships
Selected samples
Multivariate data
More heritable traits with less error

39
Problem 2: IBD is not known with certainty

Markers may not be fully informative
Only so much heterozygosity in e.g., 20 allele
microsatellite marker
Less in a SNP
Unlikely to have typed the locus that causes
variation
Genome is big!
"The Universe is Big. Really big. It may seem like a long way to the corner chemist, but compared to the Universe, that's peanuts." - D. Adams

41
Using Merlin/Genehunter etc

Several Faculty experts
Goncalo Abecasis
Sarah Medland
Stacey Cherny
Possible to use Merlin via Mx GUI

42
Pi-hat approach

1 Pick a putative QTL location
2 Compute p(IBD=0), p(IBD=1), p(IBD=2) given marker data
Use Mapmaker/sibs or Merlin
3 Compute $\hat{\pi}_i = p(\mathrm{IBD}=2) + .5\,p(\mathrm{IBD}=1)$
4 Fit model
Repeat 1-4 as necessary for different locations across genome

Elston & Stewart
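Step 3 of the pi-hat approach is a one-line calculation once the IBD probabilities are available. In this sketch the probabilities are simply passed in as numbers; in practice they would come from a program such as Mapmaker/sibs or Merlin, whose output formats are not modeled here.

```python
def pi_hat(p_ibd0, p_ibd1, p_ibd2):
    """Estimated proportion of alleles shared IBD at the putative QTL location."""
    total = p_ibd0 + p_ibd1 + p_ibd2
    if abs(total - 1.0) > 1e-8:
        raise ValueError("IBD probabilities must sum to 1")
    return p_ibd2 + 0.5 * p_ibd1


print(pi_hat(0.25, 0.50, 0.25))  # uninformative markers: prior probabilities
print(pi_hat(0.0, 0.0, 1.0))     # pair known to share both alleles IBD
```

With uninformative markers every pair gets the same value, so there is nothing for the model to exploit; informative markers spread the values toward 0, .5 and 1.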
43
Basic Linkage (QTL) Model
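$\hat{\pi}_i = p(\mathrm{IBD}_i = 2) + .5\,p(\mathrm{IBD}_i = 1)$ (individual-level)
[Path diagram: latent Q1, F1, E1 and Q2, F2, E2 (each with variance 1) with paths q, f and e to phenotypes P1 and P2; Q1-Q2 correlation Pihat; F1-F2 correlation 1]
Q: QTL additive genetic; F: family environment; E: random environment
3 estimated parameters: q, f and e
Every sibship may have different model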
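Because pi-hat differs between sibships, each pair gets its own expected covariance matrix and contributes its own term to -2lnL. The sketch below shows both pieces for one sib pair; the common mean argument is an assumption added for the example (the slide lists only q, f and e), and the function names are hypothetical.

```python
from math import log, pi


def qtl_expected_cov(q, f, e, pihat):
    """Expected variance and sib covariance for one pair under the linkage model."""
    var = q * q + f * f + e * e
    cov = pihat * q * q + f * f
    return var, cov


def pair_minus2lnl(y1, y2, mean, var, cov):
    """-2 ln(likelihood) of one sib pair under a bivariate normal model."""
    det = var * var - cov * cov  # determinant of the 2x2 covariance matrix
    d1, d2 = y1 - mean, y2 - mean
    quad = (var * d1 * d1 - 2.0 * cov * d1 * d2 + var * d2 * d2) / det
    return 2.0 * log(2.0 * pi) + log(det) + quad


# Illustrative values: same parameters, different pi-hat for each pair.
for pihat in (0.0, 0.5, 1.0):
    var, cov = qtl_expected_cov(q=1.0, f=0.5, e=0.5, pihat=pihat)
    print(pihat, var, cov, pair_minus2lnl(1.0, 0.8, 0.0, var, cov))
```

Summing `pair_minus2lnl` over all pairs gives the function that would be minimized over q, f and e at each putative location.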
44
Association Model
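[Path diagram: genotypes Geno1 and Geno2 (G1, G2) predict LDL1 and LDL2 with slope b; intercept a from the constant 1; residual variances R and covariance C]
$\mathrm{LDL}_{1i} = a + b\,\mathrm{Geno}_{1i}$
$\mathrm{Var}(\mathrm{LDL}_i) = R$
$\mathrm{Cov}(\mathrm{LDL}_1, \mathrm{LDL}_2) = C$
C may be $f(\hat{\pi}_i)$ in joint linkage & association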
45
Between/Within Fulker Association Model
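[Path diagram: Geno1 and Geno2 (G1, G2) combine into a sum S (paths 0.50 and 0.50) and a difference D (paths 0.50 and -0.50); S feeds the between-pair effect B through b and D feeds the within-pair effect W through w; B loads 1.00 on both LDL1 and LDL2, W loads 1.00 on LDL1 and -1.00 on LDL2; means m from the constant M; residual variances R and covariance C]
Model for the means
$\mathrm{LDL}_{1i} = .5b\,\mathrm{Geno}_1 + .5b\,\mathrm{Geno}_2 + .5w\,\mathrm{Geno}_1 - .5w\,\mathrm{Geno}_2 = .5\,(b(\mathrm{Geno}_1 + \mathrm{Geno}_2) + w(\mathrm{Geno}_1 - \mathrm{Geno}_2))$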
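The between/within decomposition of the means can be written out directly. In the sketch below, the expression for sib 1 follows the slide's equation; the expression for sib 2 is inferred by symmetry (the within-pair term changes sign) and should be read as an assumption, as should the function name and genotype coding.

```python
def fulker_means(geno1, geno2, b, w):
    """Genotype contribution to the expected trait of each sib.

    between: pair-average genotype, weighted by b.
    within:  each sib's deviation from the pair average, weighted by w.
    """
    between = 0.5 * (geno1 + geno2)
    within = 0.5 * (geno1 - geno2)
    mu1 = b * between + w * within  # .5( b(G1 + G2) + w(G1 - G2) )
    mu2 = b * between - w * within  # assumed by symmetry for sib 2
    return mu1, mu2


# Genotypes coded as allele counts (illustrative coding).
print(fulker_means(2, 0, b=1.0, w=1.0))  # b = w: reduces to b * own genotype
print(fulker_means(2, 0, b=1.0, w=0.0))  # only the between-pair effect remains
```

When b and w are equal the model collapses to the simple association model; a difference between them is what signals effects that vary between families, such as stratification.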
