Parametric linkage analysis and lod scores presentation


1
Parametric linkage analysis and lod scores

Steve Horvath
Depts. of Human Genetics & Biostatistics
UCLA

2
Contents

the big picture: meiotic mapping techniques
genetic distances and genetic maps
map functions
LOD (log of the odds) score analysis
2-point analysis
testing for linkage between a marker and an affectation status locus
example: rare, fully penetrant, dominant Mendelian disease
more general disease models
parameters in parametric linkage analysis
multipoint analysis algorithms for LOD scores
significance levels, thresholds and false positives

3
The big picture: locating (mapping) disease genes
4
Meiotic mapping allows one to identify DNA segments that contain disease genes
[Figure: reverse genetics, going from a trait (traits 1, 2, 3) to DNA]

Mapping is part of the positional cloning strategy.
works well for Mendelian diseases, which correspond to rare, highly penetrant disease alleles

5
Different ways of expressing the goal of genomics

goal: find stretches of DNA that are risk factors for a disease
known as reverse genetics if you start with the phenotype (e.g. affectation status)
a.k.a. positional cloning (Collins FS)
3-step procedure (adapted):
first, meiotic mapping (linkage, linkage disequilibrium)
second, physical mapping (includes sequencing)
third, find the mutation and verify its functional role

6
Different kinds of meiotic mapping methods

parametric (better: model-based) lod score analysis
single point
multipoint
non-parametric (better: model-free) linkage analysis
allele sharing methods
key concept: identity by descent
confusing factoid: non-parametric methods are sometimes equivalent to parametric methods (Knapp M, 1993?)
association studies, linkage disequilibrium mapping
family-based methods (TDT, FBAT)
population-based methods (chi-square test, log-linear model)

7
What do meiotic mapping methods have in common?

based on meiosis
made possible through the violation of Mendel's law of independent assortment
crossing-over effects, recombination, ...
recombination fraction $\theta$
requires genetic markers, and sometimes the distances between them (genetic map)
usually test the hypothesis of no linkage, $H_0: \theta = 1/2$
but sometimes test for no linkage disequilibrium

8
What is parametric linkage analysis?

A meiotic mapping technique based on constructing a disease gene transmission model to explain the inheritance of a disease in pedigrees.
The meaning will become clear below...

9
Genetic markers

desirable properties of genetic markers:
locus-specific
polymorphic in the studied population
many heterozygotes
easily genotyped
quality measures for markers:
heterozygosity (homozygotes are uninformative!)
or Polymorphism Information Content (PIC):
probability that the parent is heterozygous × probability that the offspring is informative
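Both quality measures can be computed from allele frequencies. A minimal sketch, assuming Hardy-Weinberg equilibrium and using the standard formulas $H = 1 - \sum_i p_i^2$ and (Botstein et al. 1980) $\mathrm{PIC} = H - \sum_{i<j} 2p_i^2p_j^2$; the function names `heterozygosity` and `pic` are hypothetical:

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2), assuming Hardy-Weinberg equilibrium."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism Information Content, Botstein et al. (1980) formula."""
    # Subtract the cases where both parents share the same heterozygous genotype i/j
    # and the child inherits i/j too: then we cannot tell which parent gave which allele.
    correction = sum(2 * pi**2 * pj**2 for pi, pj in combinations(freqs, 2))
    return heterozygosity(freqs) - correction


snp = [0.5, 0.5]              # best case for a bi-allelic SNP
microsatellite = [0.125] * 8  # hypothetical 8-allele marker with equal frequencies
print(round(heterozygosity(snp), 3), round(pic(snp), 3))                        # 0.5 0.375
print(round(heterozygosity(microsatellite), 3), round(pic(microsatellite), 3))  # 0.875 0.861
```

For a bi-allelic SNP, $H = 2p(1-p)$ peaks at 0.5 when $p = 0.5$, which is the 50 percent limit quoted for SNPs, while its PIC never exceeds 0.375.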

10
Important co-dominant genetic markers

microsatellites
variations in the number of tandem repeats
high level of polymorphism
even distribution across the genome
2nd generation map
SNPs
single nucleotide polymorphisms
bi-allelic codominant markers
heterozygosity is at most 50 percent
3rd generation map

11
Genetic distances and genetic maps
These will be very relevant for multipoint linkage studies.
12
The recombination fraction is a measure of
distance between 2 loci

recombination fraction $\theta$ = the probability that a recombinant gamete is transmitted
If two loci are on different chromosomes, they will segregate independently:
recombination fraction $\theta = 0.5$
if two loci are right next to each other, they will segregate together during meiosis:
recombination fraction $\theta \approx 0$
terminology:
$\theta < 0.5$: the loci are linked
$\theta = 0.5$: the loci are far apart (they are not linked)

13
Genetic distance (unit is Morgan) = expected no. of crossover points per gamete

notation: let a and b be 2 points in the genome.
$N_{ab}$ = number of chiasmata between them
chiasmata = crossing-over points
Definition: the genetic (map) distance is $d = E(N_{ab})/2$
Why the factor of 2? We want the no. of crossovers per gamete: each chiasma involves only 2 of the 4 chromatids, so a given gamete carries on average half of the chiasmata.
Example: if on average 49 crossovers occur per cell in meiosis,
then the total genetic map distance = 49/2 = 24.5 Morgans
1 Morgan = 100 centimorgans

14
There is a relationship between crossing over and recombination fraction

Mather's formula: $\theta = 0.5\,P(N_{ab} > 0)$
for small distances, $d$ is approximately equal to $\theta$, since in this case $E(N_{ab}) \approx P(N_{ab} > 0)$
$P(N_{ab} > 0)$ is related to $d = E(N_{ab})/2$
different probability models for $N_{ab}$ lead to different relationships between $\theta$ and $d$
each sensible relationship between $\theta$ and $d$ is called a map function
Great reference: Lange K, Mathematical and Statistical Methods for Genetic Analysis (book), Springer
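A worked example of how a probability model for $N_{ab}$ yields a map function: assume (no interference) that the number of chiasmata between a and b is Poisson with mean $E(N_{ab}) = 2d$. Mather's formula then gives Haldane's map function:

```latex
\begin{aligned}
P(N_{ab} > 0) &= 1 - e^{-2d}, \\
\theta &= \tfrac{1}{2}\,P(N_{ab} > 0) = \tfrac{1}{2}\left(1 - e^{-2d}\right), \\
d &= -\tfrac{1}{2}\ln(1 - 2\theta) \qquad (d \text{ in Morgans}).
\end{aligned}
```

For small $d$, $1 - e^{-2d} \approx 2d$, so $\theta \approx d$, as stated above.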

15
The mathematical relationship between recombination fraction and genetic distance is called a mapping function

Haldane's mapping function
$d = -0.5\,\ln(1 - 2\theta)$
the distance $d$ is measured in Morgans
perfect if crossovers occur at random (no interference)
Kosambi's mapping function
$d = 0.25\,\ln\frac{1 + 2\theta}{1 - 2\theta}$
again, the distance is measured in Morgans
suitable if there is (crossover) interference:
one crossover prevents another from taking place nearby
widely used

Note for both mapping functions:
if $\theta = 0.5$, $d = \infty$ Morgans (infinite distance)
if $\theta = 0$, $d = 0$ M (0 distance)
if $\theta = 0.27$, Haldane: $d = 0.39$ M = 39 cM; Kosambi: $d = 0.30$ Morgans = 30 cM
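The two map functions and their inverses, as a minimal Python sketch (function names are hypothetical; distances in Morgans). It reproduces the $\theta = 0.27$ example:

```python
import math


def haldane_distance(theta):
    """Haldane map distance in Morgans: d = -0.5 * ln(1 - 2*theta)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5) for a finite distance")
    return -0.5 * math.log(1.0 - 2.0 * theta)


def haldane_theta(d):
    """Inverse of Haldane's function: theta = 0.5 * (1 - exp(-2d))."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi_distance(theta):
    """Kosambi map distance in Morgans: d = 0.25 * ln((1 + 2*theta) / (1 - 2*theta))."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5) for a finite distance")
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


def kosambi_theta(d):
    """Inverse of Kosambi's function: theta = 0.5 * tanh(2d)."""
    return 0.5 * math.tanh(2.0 * d)


theta = 0.27
print(f"Haldane: {haldane_distance(theta):.3f} M, Kosambi: {kosambi_distance(theta):.3f} M")
# Haldane: 0.388 M, Kosambi: 0.302 M
```

Kosambi's function gives a shorter distance for the same $\theta$ because it allows for interference (fewer double crossovers), so distances from the two functions should not be mixed on one map.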

17
Men are genetically shorter than women

Total male map length = 2851 cM
Total female map length = 4296 cM (excluding the X)
Thus, over the 3000 Mb (megabases) autosomal genome:
1 male cM averages 1.05 Mb
1 female cM averages 0.70 Mb

18
Meiotic versus physical maps

meiotic maps measure distances in genetic units, i.e. centimorgans
pretty coarse and often inaccurate
problem 1: which marker order?
problem 2: which mapping function?
physical maps measure distances in base pairs
extremely high resolution: allows you to find the actual mutation
Connection between the 2 maps
rule of thumb: 1 cM corresponds to about 1 million base pairs
but this thumb is very crooked!!!

19
Computing the lod score
20
The likelihood

likelihood = probability of the data given the parameters
likelihoods are useful for estimation and for testing
example: phase-known, fully informative case
observed data: $R$ = no. of recombinants, $NR$ = no. of non-recombinants
parameter: the recombination fraction $\theta = \Pr(\text{recombination})$
the likelihood is proportional to $\theta^{R}(1-\theta)^{NR}$
maximum likelihood estimate: $\hat\theta = R/(R + NR)$ (restricted to $\theta \le 1/2$)
use the log of the likelihood for mathematical convenience
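For the phase-known, fully informative case the log-likelihood and its maximizer can be checked numerically. A minimal sketch with hypothetical counts R = 2, NR = 18, comparing the closed-form MLE $\hat\theta = R/(R+NR)$ (capped at 1/2) with a grid search over $[0, 0.5]$:

```python
import math


def log_likelihood(theta, r, nr):
    """Natural-log likelihood log[theta^R * (1 - theta)^NR] (phase-known, fully informative)."""
    if r > 0 and theta == 0.0:
        return -math.inf  # a recombinant cannot occur when theta = 0
    ll = 0.0
    if r > 0:
        ll += r * math.log(theta)
    if nr > 0:
        ll += nr * math.log(1.0 - theta)
    return ll


def mle_theta(r, nr):
    """Closed-form MLE R / (R + NR), restricted to the parameter space [0, 0.5]."""
    return min(r / (r + nr), 0.5)


r, nr = 2, 18  # hypothetical counts
grid = [i / 1000 for i in range(501)]  # theta = 0.000, 0.001, ..., 0.500
best = max(grid, key=lambda t: log_likelihood(t, r, nr))
print(mle_theta(r, nr), best)  # 0.1 0.1
```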

21
Advantages of max. likelihood estimation

advantages
asymptotically most efficient: high precision
asymptotically consistent: it converges to the true value as the sample size grows
asymptotically unbiased
the corresponding likelihood ratio test enjoys similar optimality properties

22
How to compute lod scores?
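Lod scores are computed for each pedigree $i$ as $Z_i(\theta) = \log_{10}\left[L_i(\theta)/L_i(1/2)\right]$.
For a given value of $\theta$, the pedigree-specific lod scores are summed across the $F$ families to yield an overall lod score, $Z(\theta) = \sum_{i=1}^{F} Z_i(\theta)$.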
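A minimal sketch of the summation, assuming each family is phase-known and fully informative, so that $Z_i(\theta) = \log_{10}[\theta^{R_i}(1-\theta)^{NR_i} / (1/2)^{R_i + NR_i}]$; the family counts are hypothetical:

```python
import math


def family_lod(theta, r, nr):
    """log10[theta^R (1-theta)^NR / (1/2)^(R+NR)] for one phase-known family."""
    if r > 0 and theta == 0.0:
        return -math.inf  # a recombinant is impossible at theta = 0
    lod = (r + nr) * math.log10(2.0)  # dividing by (1/2)^(R+NR)
    if r > 0:
        lod += r * math.log10(theta)
    if nr > 0:
        lod += nr * math.log10(1.0 - theta)
    return lod


def total_lod(theta, families):
    """Overall lod: sum of pedigree-specific lods evaluated at the same theta."""
    return sum(family_lod(theta, r, nr) for r, nr in families)


families = [(0, 4), (1, 5), (0, 6)]  # hypothetical (R, NR) per family
for theta in (0.0, 0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"theta={theta:.2f}  lod={total_lod(theta, families):.2f}")
```

Summing log10 likelihood ratios multiplies the families' likelihood ratios, i.e. it assumes independent families. Here the total lod is maximized at the pooled estimate $\hat\theta = 1/16$.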

23
Example lod score calculation

[Pedigree drawing] Message: disease status is not required...
24
2 point parametric linkage analysis
25
2 point parametric linkage analysis

Setting:
the genotypes at 1 marker locus are known for family members
the genotypes at the other locus (disease susceptibility locus) are unknown
but the disease locus phenotype (affectation status) is known
GOAL:
test whether the disease locus and the marker are linked
Q: Why is it important?
A: If they are linked, the disease locus must be close to the marker, i.e. we have localized the disease gene.

26
Test for linkage is carried out in 3 steps

Step 1: use the disease status to infer the underlying disease locus genotypes
Step 2: count the number of recombinants and non-recombinants for the different possible paternal phases
Step 3: compute the lod score and check whether it is bigger than 3.0
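Steps 2 and 3 for a parent of unknown phase, as a minimal sketch. Assuming (the usual convention) that both paternal phases are equally likely a priori, the likelihood averages the two phase-specific likelihoods, $L(\theta) = \tfrac12[\theta^{k}(1-\theta)^{n-k} + \theta^{n-k}(1-\theta)^{k}]$, where $k$ is the number of recombinants among $n$ informative offspring under one phase; the counts are hypothetical:

```python
import math


def phase_known_lod(theta, k, n):
    """lod for k recombinants among n informative meioses under one fixed phase."""
    return math.log10(theta ** k * (1 - theta) ** (n - k) / 0.5 ** n)


def phase_unknown_lod(theta, k, n):
    """lod averaging over the two equally likely paternal phases."""
    phase1 = theta ** k * (1 - theta) ** (n - k)  # k recombinants under phase 1
    phase2 = theta ** (n - k) * (1 - theta) ** k  # recombinants and non-recombinants swap
    return math.log10(0.5 * (phase1 + phase2) / 0.5 ** n)


k, n = 1, 6  # hypothetical: 1 of 6 offspring recombinant under phase 1
for theta in (0.1, 0.2, 0.3):
    print(theta, round(phase_known_lod(theta, k, n), 3), round(phase_unknown_lod(theta, k, n), 3))
```

Since $L \geq \tfrac12 L_1$, the phase-unknown lod is never more than $\log_{10} 2 \approx 0.30$ below the phase-known lod of the more likely phase; this is the price of not knowing the phase.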
27
DATA for a single pedigree

rare, fully penetrant, dominant disease
Grandpa: unaffected, marker genotype 2/2; Grandma: affected, marker genotype 1/1
father: affected
28
Steps 1-3

STEP 1
we assume that the disease locus has 2 alleles
since the disease is dominant and fully penetrant, the genotypes of the unaffected individuals must be dd
the genotype of the grandma is Dd or DD. Since the disease is rare, it is probably Dd.
thus we get the same pedigree as described earlier
Steps 2-3 were already carried out earlier.

29
Parameters in parametric linkage analysis
30
Glitch for non-Mendelian diseases

the relation between disease locus genotypes and affectation status is in general very complex and can no longer be resolved by inspection
need powerful statistical and computational methods
start with the likelihood (easy to write down)
compute the likelihood (hard)

31
Most general form of the likelihood of pedigree
data

index $j$ runs over all founders (requires specifying allele frequencies)
the product over $(k,l,m)$ is taken over all parent-offspring triples
the transmission probabilities depend on $\theta$
for multiple markers (multipoint analysis), one also needs to specify a mapping function, e.g., Kosambi's
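The general form is usually written (following Elston and Stewart; the symbols $x_i$, $g_i$ and $\mathcal{F}$ are chosen here for illustration) as a sum over all joint genotype assignments $g_1, \ldots, g_n$ of the $n$ pedigree members, with $x_i$ the phenotype of person $i$ and $\mathcal{F}$ the set of founders:

```latex
L(\theta) = \sum_{g_1} \cdots \sum_{g_n}
  \prod_{i=1}^{n} P(x_i \mid g_i)
  \prod_{j \in \mathcal{F}} P(g_j)
  \prod_{(k,l,m)} P(g_m \mid g_k, g_l; \theta)
```

$P(x_i \mid g_i)$ holds the penetrances, $P(g_j)$ the founder genotype probabilities computed from allele frequencies, and $P(g_m \mid g_k, g_l; \theta)$ the probability that parents $k$ and $l$ transmit genotype $g_m$ to child $m$, which is where $\theta$ enters. The number of terms in the nested sums grows exponentially with pedigree size, which is why computing the likelihood is the hard step.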

32
Marker parameters

notation: marker alleles are denoted here by 1, 2, ...
the relation between marker genotype and phenotype is usually known (example: ABO blood group)
SNPs and microsatellites are codominant: the relation is trivial
allele frequencies $p_1, p_2, \ldots$
if parents are unavailable, the results may depend critically on getting them right; the same holds for homozygosity mapping
they vary between different populations
but can be estimated from the pedigree data
genetic marker map for multiple markers:
marker order
genetic distance
increasingly accurate because of DNA sequencing

33
Disease locus parameters

notation: often 2 alleles, D (bad) and d (normal)
allele frequencies $p_D$ and $p_d$
penetrances = $P(\text{affected} \mid \text{genotype})$
$f_{DD} = P(\text{affected} \mid DD)$
$f_{Dd} = P(\text{affected} \mid Dd)$
$f_{dd} = P(\text{affected} \mid dd)$
liability classes
fancy terminology for letting penetrances vary between individuals
example: different penetrances for men and women,
or age dependence: young versus old

34
The biology is modeled through penetrance values

fully penetrant, dominant disease, no phenocopies:
$f_{DD} = f_{Dd} = 1$, $f_{dd} = 0$
fully penetrant, recessive disease, no phenocopies:
$f_{DD} = 1$, $f_{Dd} = f_{dd} = 0$
no effect:
$f_{DD} = f_{Dd} = f_{dd}$
incomplete penetrance: $f_{DD} < 1$
definition: phenocopies are affected individuals without disease genes
phenocopies are present if $f_{dd} > 0$
for the experts: imprinting is modeled by using 4 penetrances and keeping track of maternally and paternally transmitted alleles
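The penetrance models above can be written as vectors $(f_{DD}, f_{Dd}, f_{dd})$. A minimal sketch, assuming Hardy-Weinberg equilibrium at the disease locus, that computes the prevalence $K = p_D^2 f_{DD} + 2p_Dp_d f_{Dd} + p_d^2 f_{dd}$ and, by Bayes' rule, $P(\text{genotype} \mid \text{affected})$. The allele frequency 0.001 and the penetrances of the third model are hypothetical values. The last line shows why, in Step 1 of the example, an affected grandma is taken to be Dd:

```python
def genotype_probs(p_D):
    """Hardy-Weinberg genotype frequencies at the disease locus; p_D = freq of allele D."""
    p_d = 1.0 - p_D
    return {"DD": p_D ** 2, "Dd": 2 * p_D * p_d, "dd": p_d ** 2}


def prevalence(p_D, pen):
    """P(affected) = sum over genotypes of P(genotype) * penetrance."""
    g = genotype_probs(p_D)
    return sum(g[geno] * pen[geno] for geno in g)


def genotype_given_affected(p_D, pen):
    """Bayes' rule: P(genotype | affected) = P(genotype) * f_genotype / P(affected)."""
    g = genotype_probs(p_D)
    k = prevalence(p_D, pen)
    return {geno: g[geno] * pen[geno] / k for geno in g}


dominant = {"DD": 1.0, "Dd": 1.0, "dd": 0.0}   # fully penetrant, no phenocopies
recessive = {"DD": 1.0, "Dd": 0.0, "dd": 0.0}  # fully penetrant, no phenocopies
incomplete_with_phenocopies = {"DD": 0.8, "Dd": 0.8, "dd": 0.01}  # hypothetical values

print(prevalence(0.001, incomplete_with_phenocopies))
print(round(genotype_given_affected(0.001, dominant)["Dd"], 4))  # 0.9995
```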

35
2-point versus multipoint linkage analysis
36
Two point mapping

computerized lod score analysis is the best way to analyze complex pedigrees for linkage with Mendelian traits
use computer software, e.g., Mendel
the result of a linkage analysis is a table of lod scores at various recombination fractions
the result can be plotted to give curves: regions with lod > 3 are linked and those with lod < -2 are excluded
the curve will peak at the most likely recombination fraction

37
Output of a 2 point linkage analysis
[Figure: lod score curve with regions marked "significant" and "excluded"]

Equivalently, consider the table:
recombination fraction:  0.01  0.10  0.20  0.30  0.35  0.40  0.45  0.50
lod score:               -5.0  -2.0   1.0   3.3   4.0   3.0   1.0   0.0
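Reading the table programmatically, a minimal sketch using the classical thresholds lod > 3 (linkage) and lod < -2 (exclusion); the constant names are hypothetical:

```python
# lod scores from the table above (illustrative values)
thetas = [0.01, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.50]
lods = [-5.0, -2.0, 1.0, 3.3, 4.0, 3.0, 1.0, 0.0]

SIGNIFICANT = 3.0  # lod > 3: evidence for linkage
EXCLUDED = -2.0    # lod < -2: linkage excluded at that theta

peak_lod, peak_theta = max(zip(lods, thetas))
significant = [t for t, z in zip(thetas, lods) if z > SIGNIFICANT]
excluded = [t for t, z in zip(thetas, lods) if z < EXCLUDED]
print(peak_theta, peak_lod)  # 0.35 4.0
print(significant)           # [0.3, 0.35]
print(excluded)              # [0.01]
```

Note that $\theta = 0.10$ sits exactly on the exclusion boundary, and that the peak is only located to the resolution of the grid of recombination fractions.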

38
Multipoint mapping is more efficient than two-point mapping

idea: analyze data for more than 2 loci simultaneously
helps overcome the limited informativeness of markers
especially relevant for SNPs
peak heights depend crucially on the precise distances between markers and on the mapping function (problematic)
the highest peak marks the most likely location
powerful method for scanning the genome in 20-Mb segments

39
Standard lod score analysis is not without problems

genotyping errors and misdiagnosis: loss of power
they lead to spurious recombinants, which inflate the length of the genetic map
multi-locus maps can detect such errors by checking for double recombinants
locus heterogeneity is always a pitfall
mutations in unlinked loci may produce the same clinical phenotype
use Genehunter or Homog to test for homogeneity
computational difficulties limit the pedigrees that can be analyzed (nah, not really...)
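Homogeneity tests of the kind done by Homog are commonly based on the admixture model, in which only a fraction $\alpha$ of families is linked: $L_i(\alpha,\theta) = \alpha L_i(\theta) + (1-\alpha) L_i(1/2)$, so that the heterogeneity lod is $\mathrm{HLOD} = \max_{\alpha,\theta} \sum_i \log_{10}[\alpha\,10^{Z_i(\theta)} + 1 - \alpha]$. A minimal grid-search sketch at a single fixed $\theta$, with hypothetical per-family lods; it illustrates the statistic and is not the Homog or Genehunter code:

```python
import math


def hlod(family_lods, steps=1000):
    """Maximize the admixture lod over alpha in [0, 1] by grid search (theta held fixed)."""
    best, best_alpha = 0.0, 0.0  # alpha = 0 (no linked families) gives lod 0
    for i in range(steps + 1):
        alpha = i / steps
        score = sum(math.log10(alpha * 10 ** z + (1 - alpha)) for z in family_lods)
        if score > best:
            best, best_alpha = score, alpha
    return best, best_alpha


lods = [2.0, 2.0, -3.0]  # hypothetical: two linked-looking families, one clearly unlinked
score, alpha = hlod(lods)
print(f"homogeneous lod = {sum(lods):.2f}, HLOD = {score:.2f} at alpha = {alpha:.2f}")
```

A single unlinked family drags the homogeneous lod down to 1.0, while the admixture model recovers a much larger HLOD by assigning that family to the unlinked fraction.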

40
Comparing different multipoint linkage analysis
algorithms
41
Limitations of the different methods

Slide from webpage http://watson.hgen.pitt.edu/docs/simwalk2.html

42
Computation times of the algorithms.
General-Pedigree Linkage Analysis Packages

43
Critical values for linkage tests
44
Distinction between pointwise (nominal) and
genome-wide significance

pointwise p-value = probability of exceeding the observed value at a given point, under $H_0: \theta = 1/2$
genome-wide p-value = probability that the observed value will be exceeded anywhere in the genome
reality check about p-values:
if the p-value < 0.05, the finding is significant
the smaller the p-value, the higher the statistical significance
genome-wide p-value $\geq$ pointwise p-value

45
Lod score thresholds should ensure a .05 genome-wide false positive rate

genome-wide false positive rate $\alpha$ = chance of a false positive result occurring anywhere during a whole-genome scan
for single point, classically want lod > 3.0
multipoint threshold for a Mendelian disease: 3.3 (Lander and Schork 1994)
multipoint threshold for a complex disease: 3.3-4.0 (depends on the study design; Lander and Kruglyak 1995)
pointwise p-value for significant linkage: $5 \times 10^{-5}$
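The lod thresholds translate into pointwise p-values through the asymptotic distribution of the likelihood ratio statistic. A minimal sketch, assuming that $2\ln(10)\,\mathrm{lod}$ is asymptotically a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$ (because $\theta$ is restricted to $\le 1/2$), so that $p = P(Z > \sqrt{2\ln(10)\,\mathrm{lod}})$ for a standard normal $Z$; the function name is hypothetical:

```python
import math

LN10 = math.log(10.0)


def pointwise_p(lod):
    """One-sided asymptotic p-value for a lod score.

    Assumes 2*ln(10)*lod ~ 50:50 mixture of chi-square(0) and chi-square(1), so
    p = P(Z > sqrt(2*ln(10)*lod)) = 0.5 * erfc(sqrt(ln(10)*lod)).
    """
    if lod <= 0:
        return 1.0  # the point mass at 0 means no evidence against H0
    return 0.5 * math.erfc(math.sqrt(LN10 * lod))


for lod in (3.0, 3.3, 3.6):
    print(lod, f"{pointwise_p(lod):.1e}")
```

Under this approximation, lod 3.0 corresponds to $p \approx 10^{-4}$ and lod 3.3 to $p \approx 5 \times 10^{-5}$, consistent with the pointwise value quoted for significant linkage.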

46
How to relate the pointwise ($\alpha_P$) to the genome-wide false positive rate ($\alpha_G$).

conservative Bonferroni correction:
$\alpha_P = \alpha_G / (\text{no. of potential pointwise tests})$
Example: no. of potential pointwise tests = no. of potential SNPs = 1 million, $\alpha_G = 0.05$, so $\alpha_P = 5 \times 10^{-8}$
ignores dependencies (linkage) between markers
Lander and Kruglyak 1995 found the asymptotic relation
$\alpha_G(T) \approx (C + 9.2\,\rho\,G\,T)\,\alpha_P(T)$
$T$ = threshold lod score
$C$ = number of chromosomes = 23
$\rho$ = crossover rate, depends on the relationship being studied, e.g., sibs
$G$ = length of the genome in Morgans = 33
for sib pairs use 3.6 for IBD testing and 4.0 for IBS testing
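A minimal sketch that solves the Lander and Kruglyak relation for the threshold $T$ giving $\alpha_G = 0.05$, using $C = 23$ and $G = 33$ from above and the same normal-tail approximation for $\alpha_P(T)$ as for single lods. The value $\rho = 2$ for sib pairs is our assumption; it approximately reproduces the 3.6 quoted above, while $\rho = 1$ approximately reproduces the 3.3 used for Mendelian diseases. Function names are hypothetical:

```python
import math

LN10 = math.log(10.0)


def pointwise_alpha(T):
    """Asymptotic pointwise p-value at lod threshold T (one-sided normal tail)."""
    return 0.5 * math.erfc(math.sqrt(LN10 * T))


def genomewide_alpha(T, rho, C=23, G=33.0):
    """Lander-Kruglyak approximation: alpha_G(T) ~ (C + 9.2 * rho * G * T) * alpha_P(T)."""
    return (C + 9.2 * rho * G * T) * pointwise_alpha(T)


def lod_threshold(rho, alpha_g=0.05, lo=1.0, hi=10.0):
    """Solve genomewide_alpha(T, rho) = alpha_g by bisection (alpha_G decreases in T)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if genomewide_alpha(mid, rho) > alpha_g:
            lo = mid  # still too many false positives: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(f"Bonferroni alpha_P for 1 million tests: {0.05 / 1_000_000:.0e}")
for rho in (1, 2):  # rho = 2 for sib pairs is an assumption (see text)
    print(rho, round(lod_threshold(rho), 2))
```

The Lander and Kruglyak thresholds are far less extreme than a Bonferroni correction over all markers would suggest, because nearby markers are strongly dependent.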