Introduction to Linkage and Association for Quantitative Traits

Title: PowerPoint Presentation
Author: Outreach Publications
Last modified by: Michael C Neale
Created Date: 2/18/2002 6:01:30 PM

Slides: 46
1
Introduction to Linkage and Association for
Quantitative Traits
  • Michael C Neale
  • Boulder Colorado Workshop March 2 2009

2
Overview
  • A brief history of SEM
  • Regression
  • Maximum likelihood estimation
  • Models
  • Twin data
  • Sib pair linkage analysis
  • Association analysis

3
Origins of SEM
  • Regression analysis
  • Reversion: Galton 1877, biological phenomenon
  • Yule 1897, Pearson 1903: general statistical context
  • Initially Gaussian X and Y; Fisher 1922: Y|X
  • Path Analysis
  • Sewall Wright 1918, 1921
  • Path Diagrams of regression and covariance
    relationships

4
Structural Equation Modeling Basics
  • Two kinds of relationships
  • Linear regression: X -> Y, single-headed
  • Unspecified covariance: X <-> Y, double-headed
  • Four kinds of variable
  • Squares: observed variables
  • Circles: latent, not observed variables
  • Triangles: constant (zero variance) for specifying means
  • Diamonds: observed variables used as moderators (on paths)

5
Linear Regression Covariance SEM
[Path diagram: X -> Y with regression path b; Var(X) on X and Res(Y) on Y.]
Models covariances only. Of historical interest.
6
Linear Regression SEM with means
[Path diagram: X -> Y with regression path b; Var(X) on X and Res(Y) on Y; a constant 1 with paths Mu(x) to X and Mu(y) to Y.]
Models means and covariances.
7
Linear Regression SEM Individual-level
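$Y_i = a + b X_i$
[Path diagram with labels Res(Y), X_i, b, D, Y_i, a and constants 1: the mean of Y_i is built from a and b X_i.]
Models mean and covariance of Y only.
Must have raw (individual-level) data.
X_i is a definition variable.
Mean of Y is different for every observation.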
8
Single Factor Covariance Model
9
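Two Factor Model with Covs & Means
[Path diagram: latent factors F1 and F2 (labels 1.00, 1.00; factor means mF1, mF2) with loadings l1, l2, l3, ..., lm on observed variables S1, S2, S3, ..., Sm; residuals e1, e2, e3, e4; observed means mS1, mS2, mS3, ..., mSm from a constant 1.]
N.B. Not identified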
10
Factor model essentials
  • In SEM the factors are typically assumed to be
    normally distributed
  • May have more than one latent factor
  • The error variance is typically assumed to be
    normal as well
  • May be applied to binary or ordinal data
  • Threshold model

11
Multifactorial Threshold Model
Normal distribution of liability. Affected when liability x > t.
[Figure: normal density of liability x (x axis from -4 to 4, density from 0 to 0.5) with the threshold t marked.]
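The threshold idea can be sketched numerically. This is a minimal illustration that assumes liability is standard normal (mean 0, variance 1); the function names are illustrative and do not come from the slides.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def proportion_affected(threshold):
    """P(liability > threshold) when liability ~ N(0, 1)."""
    return 1.0 - normal_cdf(threshold)

# A higher threshold t leaves a smaller affected proportion.
for t in (0.0, 1.0, 2.0):
    print(f"t = {t:.1f}: proportion affected = {proportion_affected(t):.4f}")
```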
12
Measuring Variation
  • Distribution
  • Population
  • Sample
  • Observed measures
  • Probability density function (pdf)
  • Smoothed out histogram
  • f(x) > 0 for all x

13
Flipping Coins
4 coins, 5 outcomes
[Bar chart: probability (0 to 0.4) of each outcome: HHHH, HHHT, HHTT, HTTT, TTTT.]
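The five bar heights follow from the binomial distribution. A short sketch, assuming fair coins and counting outcomes by number of heads regardless of order:

```python
from math import comb

n_coins = 4

# Probability of each count of heads, from 4 heads (HHHH) down to 0 (TTTT).
probs = [comb(n_coins, heads) / 2 ** n_coins for heads in range(n_coins, -1, -1)]

for heads, p in zip(range(n_coins, -1, -1), probs):
    print(f"{heads} heads, {n_coins - heads} tails: {p:.4f}")
```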
14
Bank of China Coin Toss

Infinite outcomes
[Figure: density (0 to 0.5) plotted against Heads-Tails from -4 to 4.]
De Moivre 1733, Gauss 1827
15
Variance: average squared deviation
[Figure: normal distribution with mean $\mu$, x axis from -3 to 3; observation $x_i$ lies at deviation $d_i$ from $\mu$.]
$\mathrm{Variance} = \sum d_i^2 / N$
16
Deviations in two dimensions
[Figure: points plotted in two dimensions around the means $\mu_x$ and $\mu_y$.]
17
Deviations in two dimensions: $d_x \times d_y$
[Figure: a point's deviations $d_x$ from $\mu_x$ and $d_y$ from $\mu_y$.]
18
Covariance
  • Measure of association between two variables
  • Closely related to variance
  • Useful to partition variance
  • "Analysis of Variance": term coined by Fisher

19
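Variance covariance matrix
Univariate Twin/Sib Data
$\begin{pmatrix} \mathrm{Var}(\mathrm{Twin1}) & \mathrm{Cov}(\mathrm{Twin1},\mathrm{Twin2}) \\ \mathrm{Cov}(\mathrm{Twin2},\mathrm{Twin1}) & \mathrm{Var}(\mathrm{Twin2}) \end{pmatrix}$
Suitable for modeling when no missing data. Good conceptual perspective.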
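As a rough sketch, the twin variance covariance matrix can be built from average squared deviations and average cross-products of deviations. The scores below are made-up illustrative numbers, and the divisor is N, as in the variance definition on the earlier slide.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Average product of deviations (divisor N); covariance(x, x) is the variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# Hypothetical scores for four complete twin pairs (no missing data).
twin1 = [1.0, 2.0, 3.0, 4.0]
twin2 = [2.0, 1.0, 4.0, 3.0]

cov_matrix = [
    [covariance(twin1, twin1), covariance(twin1, twin2)],
    [covariance(twin2, twin1), covariance(twin2, twin2)],
]
print(cov_matrix)
```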
20
Maximum Likelihood Estimates: Nice Properties
  • 1. Asymptotically unbiased
  • Large sample estimate of p -> population value
  • 2. Minimum variance: efficient
  • Smallest variance of all estimates with property 1
  • 3. Functionally invariant
  • If g(a) is a one-to-one function of parameter a
  • and $\mathrm{MLE}(a) = \hat{a}$
  • then $\mathrm{MLE}(g(a)) = g(\hat{a})$
  • See http://wikipedia.org

21
Full Information Maximum Likelihood (FIML)
Calculate height of curve for each raw data vector
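A minimal sketch of the raw-data idea for twin pairs, assuming a bivariate normal model. Each data vector contributes the log height of the density for whatever is observed: the bivariate density for a complete pair, the univariate density when one score is missing (coded as None here). The function names and the None coding are illustrative assumptions, not the notation of any particular package.

```python
import math

def univariate_loglik(x, mean, var):
    """Log height of the normal curve at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def bivariate_loglik(x, mean, cov):
    """Log height of the bivariate normal density at the pair x."""
    (v1, c12), (_, v2) = cov
    det = v1 * v2 - c12 ** 2
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form d' * inverse(cov) * d, written out for the 2x2 case.
    quad = (v2 * d1 ** 2 - 2 * c12 * d1 * d2 + v1 * d2 ** 2) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

def fiml_minus2lnl(rows, mean, cov):
    """Sum -2 ln L over raw data vectors, using only the observed scores."""
    total = 0.0
    for row in rows:
        observed = [i for i, value in enumerate(row) if value is not None]
        if len(observed) == 2:
            total += bivariate_loglik(row, mean, cov)
        elif len(observed) == 1:
            i = observed[0]
            total += univariate_loglik(row[i], mean[i], cov[i][i])
    return -2.0 * total

# Hypothetical data: two complete pairs and one pair with twin 2 missing.
rows = [(0.3, 0.1), (-1.2, -0.4), (0.8, None)]
print(fiml_minus2lnl(rows, mean=(0.0, 0.0), cov=((1.0, 0.5), (0.5, 1.0))))
```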
22
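Height of normal curve: $\mu_x = 0$
Probability density function
[Figure: normal curve centred on $\mu_x$, x axis from -3 to 3; the height of the curve at $x_i$ is $\phi(x_i)$.]
$\phi(x_i)$ is the likelihood of data point $x_i$ for particular mean & variance estimates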
23
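Height of normal curve at $x_i$: $\mu_x = .5$
Function of mean
[Figure: normal curve centred on $\mu_x$, x axis from -3 to 3; the height of the curve at $x_i$ is $\phi(x_i)$.]
Likelihood of data point $x_i$ increases as $\mu_x$ approaches $x_i$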
24
Likelihood of $x_i$ as a function of $\mu$
Likelihood function
[Figure: $L(x_i)$ plotted against $\mu_x$ from -3 to 3; the MLE is at the peak.]
$L(x_i)$ is the likelihood of data point $x_i$ for particular mean & variance estimates
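The likelihood function for a single data point can be traced by evaluating the normal curve at that point over a grid of candidate means. A small sketch, assuming variance 1 and a hypothetical observation at 1.0:

```python
import math

def normal_pdf(x, mean, var):
    """Height of the normal curve at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

xi = 1.0  # hypothetical data point
candidate_means = [m / 10 for m in range(-30, 31)]  # grid from -3.0 to 3.0
likelihoods = [normal_pdf(xi, m, 1.0) for m in candidate_means]

# The grid value with the greatest likelihood is the estimate of the mean.
best_mean = candidate_means[likelihoods.index(max(likelihoods))]
print(best_mean)
```

With one observation the likelihood peaks where the candidate mean equals the data point itself.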
25
Height of normal curve at x1
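Function of variance
[Figure: normal curves with mean $\mu_x$ and variances 1, 2 and 3, x axis from -3 to 3; the heights at $x_i$ are $\phi(x_i \mid \mathrm{var} = 1)$, $\phi(x_i \mid \mathrm{var} = 2)$ and $\phi(x_i \mid \mathrm{var} = 3)$.]
Likelihood of data point $x_i$ changes as variance of distribution changes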
26
Height of normal curve at x1 and x2
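[Figure: two normal curves with mean $\mu_x$ and variances var1 and var2, x axis from -3 to 3; heights $\phi(x_1 \mid \mathrm{var1})$, $\phi(x_1 \mid \mathrm{var2})$, $\phi(x_2 \mid \mathrm{var1})$ and $\phi(x_2 \mid \mathrm{var2})$ at the points $x_1$ and $x_2$.]
$x_1$ has higher likelihood with var1 whereas $x_2$ has higher likelihood with var2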
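A quick numerical check of this point, assuming mean 0, a smaller variance var1 = 1, a larger variance var2 = 4, one observation near the mean and one far from it. All values are illustrative choices, not taken from the slide.

```python
import math

def normal_pdf(x, mean, var):
    """Height of the normal curve at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

var1, var2 = 1.0, 4.0   # assumed variances
x1, x2 = 0.5, 2.5       # x1 is close to the mean, x2 is far from it

for label, x in (("x1", x1), ("x2", x2)):
    h1 = normal_pdf(x, 0.0, var1)
    h2 = normal_pdf(x, 0.0, var2)
    print(f"{label}: height with var1 = {h1:.4f}, with var2 = {h2:.4f}")
```

The observation near the mean favours the narrow curve, and the distant observation favours the wide one, which is why the variance estimate must balance all data points.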
27
Height of bivariate normal density function
Likelihood varies as $f(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$
[Figure: bivariate normal density surface over x and y.]
28
Likelihood of Independent Observations
  • Chance of getting two heads
  • $L(x_1 \ldots x_n) = \mathrm{Product}(L(x_1), L(x_2), \ldots, L(x_n))$
  • $L(x_i)$ typically < 1
  • Avoid vanishing $L(x_1 \ldots x_n)$
  • Computationally convenient log-likelihood
  • $\ln(a \times b) = \ln(a) + \ln(b)$
  • Minimization more manageable than maximization
  • Minimize $-2 \ln(L)$
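The vanishing-product problem is easy to reproduce. In this sketch the 200 individual likelihoods of 0.01 are arbitrary illustrative values; their product underflows double precision, while the sum of logs stays well behaved.

```python
import math

# Two independent fair coin tosses: chance of two heads.
two_heads = 0.5 * 0.5
print(two_heads)

# Hypothetical likelihoods for 200 independent observations.
likelihoods = [0.01] * 200

product = math.prod(likelihoods)                   # underflows to 0.0
log_lik = sum(math.log(l) for l in likelihoods)    # sum of logs is stable
minus2lnl = -2.0 * log_lik                         # quantity to minimize

print(product, log_lik, minus2lnl)
```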

29
Likelihood Ratio Tests
  • Comparison of likelihoods
  • Consider ratio L(data, model 1) / L(data, model 2)
  • $\ln(a/b) = \ln(a) - \ln(b)$
  • Log-likelihood difference: lnL(data, model 1) - lnL(data, model 2)
  • Useful asymptotic feature when model 2 is a submodel of model 1
  • $-2\,(\ln L(\text{data, model 2}) - \ln L(\text{data, model 1})) \sim \chi^2$
  • df = # parameters of model 1 - # parameters of model 2
  • BEWARE of gotchas!
  • Estimates of $a^2$, $q^2$, etc. have implicit bound of zero
  • Distributed as 50:50 mixture of 0 and $\chi^2_1$
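A small sketch of the test for one bounded parameter, assuming model 2 is the submodel that fixes that parameter at zero. The log-likelihood values are made up, the function names are illustrative, and the 1-df chi-square tail probability is computed with the identity P(chi-square > x) = erfc(sqrt(x / 2)).

```python
import math

def lrt_statistic(lnl_full, lnl_sub):
    """-2 times (lnL of the submodel minus lnL of the full model); never negative for nested models."""
    return -2.0 * (lnl_sub - lnl_full)

def chi2_1df_pvalue(stat):
    """Upper tail probability of a chi-square with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))

def mixture_pvalue(stat):
    """p-value under a 50:50 mixture of a point mass at 0 and a 1-df chi-square."""
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2_1df_pvalue(stat)

# Hypothetical log-likelihoods for the full model and the submodel.
stat = lrt_statistic(lnl_full=-100.0, lnl_sub=-101.92)
print(stat, chi2_1df_pvalue(stat), mixture_pvalue(stat))
```

Ignoring the boundary and using the plain chi-square p-value is conservative here: the mixture halves it.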
30

Two Group ACE Model for twin data
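[Path diagram: latent A, C and E for each twin (variances 1) with paths a, c and e to phenotypes PT1 and PT2; the two A factors correlate 1 (MZ) or .5 (DZ); the two C factors correlate 1; means m from a constant 1.]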
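The expected covariance matrices implied by this diagram can be sketched directly from the path coefficients. The parameter values below are arbitrary illustrations and the function name is an assumption.

```python
def ace_expected_covariance(a, c, e, zygosity):
    """Expected 2x2 twin covariance matrix under the ACE model."""
    r_a = 1.0 if zygosity == "MZ" else 0.5   # correlation between the twins' A factors
    variance = a ** 2 + c ** 2 + e ** 2
    covariance = r_a * a ** 2 + c ** 2       # C is fully shared; E is not shared
    return [[variance, covariance], [covariance, variance]]

# Hypothetical path coefficients.
a, c, e = 0.6, 0.4, 0.5
print("MZ:", ace_expected_covariance(a, c, e, "MZ"))
print("DZ:", ace_expected_covariance(a, c, e, "DZ"))
```

The two groups share the same parameters and differ only in the A correlation, which is what allows a, c and e to be estimated jointly.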
31
Linkage vs Association
  • Linkage
  • Family-based
  • Matching/ethnicity generally unimportant
  • Few markers for genome coverage (300-400 STRs)
  • Can be weak design
  • Good for initial detection; poor for fine-mapping
  • Powerful for rare variants
  • Association
  • Families or unrelated individuals
  • Matching/ethnicity crucial
  • Many markers required for genome coverage ($10^5$ to $10^6$ SNPs)
  • Powerful design
  • OK for initial detection; good for fine-mapping
  • Powerful for common variants; rare variants generally impossible

32
Identity by Descent (IBD)
Number of alleles shared IBD at a locus, parents AB and CD. Three subgroups of sib pairs.

        AC  AD  BC  BD
  AC     2   1   1   0
  AD     1   2   0   1
  BC     1   0   2   1
  BD     0   1   1   2
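The table can be reproduced by enumeration. This sketch assumes parents AB and CD with four distinct alleles and equally likely transmissions, and counts how many alleles each possible sib pair shares.

```python
from collections import Counter
from itertools import product

father, mother = ("A", "B"), ("C", "D")

# Each child receives one paternal and one maternal allele: AC, AD, BC, BD.
offspring = list(product(father, mother))

def ibd_sharing(sib1, sib2):
    """Number of alleles shared IBD: paternal match plus maternal match."""
    return (sib1[0] == sib2[0]) + (sib1[1] == sib2[1])

counts = Counter(ibd_sharing(s1, s2) for s1 in offspring for s2 in offspring)
proportions = {k: counts[k] / len(offspring) ** 2 for k in (0, 1, 2)}
print(proportions)
```

The 16 equally likely combinations split 4 : 8 : 4, so sib pairs are expected to fall into the IBD 0, 1 and 2 subgroups in proportions 1/4, 1/2 and 1/4.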
33
Partitioned Twin Analysis
  • Nance & Neale (1989) Behav Genet 19:1
  • Separate DZ pairs into subgroups
  • IBD=0, IBD=1, IBD=2
  • Correlate Q with coefficients of 0, .5 and 1
  • Compute statistical power

34
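Partitioned Twin Analysis: Three DZ groups
[Path diagrams for the IBD0, IBD1 and IBD2 groups: latent A, C, D, E and Q for each twin with paths to phenotypes P1 and P2. In every group A1-A2 correlate .5, C1-C2 correlate 1 and D1-D2 correlate .25. Q1-Q2 correlate 0 in the IBD0 group, .5 in the IBD1 group and 1 in the IBD2 group.]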
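A sketch of the expected DZ covariance in each subgroup, assuming path coefficients a, c, d and q for the A, C, D and Q factors and the correlations given in the diagrams (.5, 1, .25, and 0, .5 or 1 for Q). The parameter values are arbitrary.

```python
def dz_expected_covariance(a, c, d, q, ibd):
    """Expected covariance between DZ twins who share `ibd` alleles at the QTL."""
    q_correlation = ibd / 2.0   # 0, .5 or 1 for IBD 0, 1 or 2
    return 0.5 * a ** 2 + 1.0 * c ** 2 + 0.25 * d ** 2 + q_correlation * q ** 2

# Hypothetical path coefficients.
a, c, d, q = 0.5, 0.3, 0.2, 0.4
for ibd in (0, 1, 2):
    print(f"IBD{ibd} group: expected covariance = {dz_expected_covariance(a, c, d, q, ibd):.4f}")
```

Only the Q term differs between groups, so the evidence for linkage comes entirely from the three groups differing in covariance by steps of half the QTL variance.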
35
Problem 1 with Partitioned Twin analysis: Low Power
  • Power is low

36
Problem 2: IBD is not known with certainty
  • Markers may not be fully informative
  • Only so much heterozygosity in, e.g., a 20-allele microsatellite marker
  • Less in a SNP
  • Unlikely to have typed the exact locus we are
    looking for
  • Genome is big!

37
IBD pairs vary in similarity
Effect of selecting concordant pairs
[Figure: pair distributions for the IBD2, IBD1 and IBD0 groups, with a threshold t on each axis.]
38
Improving Power for Linkage
  • Increase marker density (yaay SNP chips)
  • Change design
  • Families
  • Larger Sibships
  • Selected samples
  • Multivariate data
  • More heritable traits with less error

39
Problem 2: IBD is not known with certainty
  • Markers may not be fully informative
  • Only so much heterozygosity in, e.g., a 20-allele microsatellite marker
  • Less in a SNP
  • Unlikely to have typed the locus that causes
    variation
  • Genome is big!
  • The Universe is Big. Really big. It may seem like
    a long way to the corner chemist, but compared to
    the Universe, that's peanuts. - D. Adams

40
(No Transcript)
41
Using Merlin/Genehunter etc
  • Several Faculty experts
  • Goncalo Abecasis
  • Sarah Medland
  • Stacey Cherny
  • Possible to use Merlin via Mx GUI

42
Pi-hat approach
  • 1. Pick a putative QTL location
  • 2. Compute p(IBD=0), p(IBD=1), p(IBD=2) given marker data (use Mapmaker/sibs or Merlin)
  • 3. Compute $\hat{\pi}_i = p(\mathrm{IBD}=2) + .5\,p(\mathrm{IBD}=1)$
  • 4. Fit model
  • Repeat 1-4 as necessary for different locations across genome


Elston Stewart
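Step 3 of the pi-hat approach is a one-line calculation once the IBD probabilities are available. In this sketch the probabilities are typed in by hand as illustrative values; in practice they would come from a program such as Merlin, whose output format is not modelled here.

```python
def pi_hat(p_ibd0, p_ibd1, p_ibd2):
    """Estimated proportion of alleles shared IBD at the putative QTL location."""
    total = p_ibd0 + p_ibd1 + p_ibd2
    if abs(total - 1.0) > 1e-8:
        raise ValueError("IBD probabilities must sum to 1")
    return p_ibd2 + 0.5 * p_ibd1

# Hypothetical sib pairs: uninformative markers, then increasingly informative ones.
print(pi_hat(0.25, 0.50, 0.25))
print(pi_hat(0.05, 0.25, 0.70))
print(pi_hat(0.00, 0.00, 1.00))
```

A pair with uninformative markers keeps the prior expectation of .5, so it contributes little to distinguishing linkage from background sib resemblance.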
43
Basic Linkage (QTL) Model
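$\hat{\pi}_i = p(\mathrm{IBD}_i = 2) + .5\,p(\mathrm{IBD}_i = 1)$ (individual-level)
[Path diagram: latent F1, Q1, E1 and F2, Q2, E2 (variances 1) with paths f, q and e to phenotypes P1 and P2; F1-F2 correlate 1; Q1-Q2 correlate Pihat.]
Q: QTL additive genetic. F: family environment. E: random environment.
3 estimated parameters: q, f and e.
Every sibship may have a different model.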
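A minimal sketch of why every sibship may have a different model: the expected covariance matrix depends on that pair's own pi-hat, so the likelihood is evaluated pair by pair. It assumes a bivariate normal model with a common mean; the function names and data are illustrative, and no optimizer is included.

```python
import math

def qtl_expected_covariance(q, f, e, pihat):
    """Expected 2x2 covariance matrix for one sib pair with its own pi-hat."""
    variance = q ** 2 + f ** 2 + e ** 2
    covariance = pihat * q ** 2 + f ** 2
    return [[variance, covariance], [covariance, variance]]

def pair_minus2lnl(y1, y2, mean, cov):
    """-2 ln of the bivariate normal density at (y1, y2)."""
    v1, c12, v2 = cov[0][0], cov[0][1], cov[1][1]
    det = v1 * v2 - c12 ** 2
    d1, d2 = y1 - mean, y2 - mean
    quad = (v2 * d1 ** 2 - 2 * c12 * d1 * d2 + v1 * d2 ** 2) / det
    return 2 * math.log(2 * math.pi) + math.log(det) + quad

def sample_minus2lnl(pairs, q, f, e, mean=0.0):
    """Sum over pairs, each evaluated against its own expected covariance matrix."""
    return sum(
        pair_minus2lnl(y1, y2, mean, qtl_expected_covariance(q, f, e, pihat))
        for y1, y2, pihat in pairs
    )

# Hypothetical (phenotype 1, phenotype 2, pi-hat) triples.
pairs = [(0.4, 0.6, 1.0), (-0.2, 1.1, 0.0), (0.9, 0.3, 0.5)]
print(sample_minus2lnl(pairs, q=0.5, f=0.5, e=0.7))
```

Fitting the model would mean minimizing this sum over q, f and e, then comparing against the fit with q fixed at zero.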
44
Association Model
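[Path diagram: a constant 1 and genotypes Geno1 and Geno2 (G1, G2) with paths a and b to LDL1 and LDL2; residual variances R and covariance C.]
$\mathrm{LDL1}_i = a + b\,\mathrm{Geno1}_i$
$\mathrm{Var}(\mathrm{LDL}_i) = R$
$\mathrm{Cov}(\mathrm{LDL1}, \mathrm{LDL2}) = C$
C may be $f(\hat{\pi}_i)$ in joint linkage & association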
45
Between/Within Fulker Association Model
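[Path diagram: genotypes Geno1 and Geno2 (G1, G2) feed S and D with weights 0.50, 0.50 and 0.50, -0.50; paths b and w (B, W) carry these to LDL1 and LDL2 with weights 1.00, 1.00 and 1.00, -1.00; constant M with means m; residual variances R and covariance C.]
Model for the means:
$\mathrm{LDL1}_i = .5b\,\mathrm{Geno1} + .5b\,\mathrm{Geno2} + .5w\,\mathrm{Geno1} - .5w\,\mathrm{Geno2}$
$= .5\,(\,b(\mathrm{Geno1} + \mathrm{Geno2}) + w(\mathrm{Geno1} - \mathrm{Geno2})\,)$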
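The between/within decomposition of the means can be sketched as follows. The expression for sib 1 is the one on the slide; the expression for sib 2, with the sign of the within term reversed, is an assumption made here by symmetry. Genotype codes and effect sizes are illustrative, and any intercept is left out.

```python
def fulker_means(geno1, geno2, b, w):
    """Genotype contribution to each sib's expected LDL under the between/within model."""
    between = 0.5 * (geno1 + geno2)   # sib-pair mean genotype
    within = 0.5 * (geno1 - geno2)    # sib 1's deviation from the pair mean
    mean1 = b * between + w * within
    mean2 = b * between - w * within  # assumed: mirror image for sib 2
    return mean1, mean2

# Hypothetical genotype scores (number of copies of one allele) and effects.
print(fulker_means(2, 0, b=1.0, w=0.5))
print(fulker_means(2, 0, b=1.0, w=1.0))
```

When b equals w the model collapses to a single effect per genotype copy; a difference between b and w is the usual signal that the between-family effect is inflated by stratification.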