Title: Genetic analysis of complex traits
- Studying the inheritance of traits that show no clear Mendelian inheritance but cluster in families
  - Clustering suggests there is a genetic component
- Usually a mix of genetic and environmental factors
- Several models:
  - A single gene modified by the environment
  - Several genes, each making a significant contribution (oligogenes)
  - Many genes, each making a small contribution (polygenes)
  - The last two are forms of multifactorial inheritance
Problems of complex diseases
- Identification of the genes involved is hampered by:
  - Genetic heterogeneity: alleles at more than one locus can trigger a specific disease
  - Reduced penetrance: some individuals with a predisposing genotype remain unaffected
  - Phenocopy: the disease is triggered by environmental factors in the absence of a predisposing genotype
Mapping using a parametric model
- In monogenic disorders, mapping was achieved through pedigree analysis: linkage is sought between markers and the disease gene, using LOD score analysis to combine data from multiple families
- In complex diseases, there is often a major locus that segregates in a Mendelian fashion (often with incomplete penetrance)
- In some families the disease shows such an inheritance pattern and can be mapped with LOD score analysis
LOD (logarithm of the odds) scores
- Used because human families are too small to provide the large numbers of progeny needed to measure recombination frequency accurately
- Tests a genetic model stating that the disease locus and a genetic marker are linked with a recombination fraction of θ; the model requires these parameters to be specified:
  - Mode of inheritance (dominant or recessive)
  - Degree of penetrance
  - Allele frequencies
- Z is the ratio of the odds of the observed outcome if the two loci are linked with recombination fraction θ to the odds of the observed outcome if they are unlinked (θ = 0.5); the LOD score is its base-10 logarithm:
  - $Z(\theta = x) = \frac{\text{odds of observed results if } \theta = x}{\text{odds of observed results if } \theta = 0.5}$
  - $\mathrm{LOD}(\theta = x) = \log_{10} Z(\theta = x)$
LOD scores
- The calculation can be repeated using different values of θ (i.e. different distances between gene and marker)
- The result is a number measuring the odds that the two loci are linked at a given recombination fraction (i.e. distance), compared with the chance that they are unlinked
- The threshold for declaring linkage is a LOD score of 3 or more (i.e. $Z \geq 1000$)
  - In other words, odds of 1000:1 that the locus is linked at a given distance (θ) rather than unlinked
- A single family is unlikely to give a LOD ≥ 3, but data from a number of families can be combined
- MLS: maximum LOD score
  - The LOD score for the most likely of a series of alternatives
  - The model with the highest LOD score is judged most likely to be correct
- Because parameters have to be specified in these calculations, the analysis is called parametric
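A minimal Python sketch of a LOD calculation and an MLS scan over several values of θ, with scores from independent families added together. It assumes phase-known, fully penetrant meioses that can each be scored as recombinant or non-recombinant, so it leaves out the penetrance and allele-frequency parameters a real parametric analysis would specify. The function name `lod` and the family counts are hypothetical.

```python
import math

def lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """LOD score for phase-known meioses (a simplifying assumption).

    Likelihood if linked at theta: theta^R * (1 - theta)^N
    Likelihood if unlinked (theta = 0.5): 0.5^(R + N)
    """
    if theta == 0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    # 0.0 ** 0 == 1.0 in Python, so theta = 0 with no recombinants is fine
    linked = theta ** recombinants * (1 - theta) ** nonrecombinants
    unlinked = 0.5 ** (recombinants + nonrecombinants)
    return math.log10(linked / unlinked)

# Hypothetical families: (recombinants, non-recombinants) in each
families = [(1, 9), (0, 6), (2, 10)]
thetas = [0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]

# LOD scores are logarithms, so scores of independent families add
total = {t: sum(lod(t, r, n) for r, n in families) for t in thetas}
best_theta = max(total, key=total.get)
print(f"MLS = {total[best_theta]:.2f} at theta = {best_theta}")
```

Only a coarse grid of θ values is scanned here; a finer grid or a numerical optimizer would locate the maximum more precisely.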
Example of LOD score analysis
- Are loci A and B linked?
- All children inherited aB from II-2
- Their other chromosome came from II-1
- If θ = 0 (complete linkage), the chance of the observed result for each child is 0.5, since the child receives one of II-1's two chromosomes
- For the 3 children, the odds of the observed results are $(0.5)^3 = 0.125$
- If θ = 0.5 (loci unlinked), there is a 0.25 chance of the observed genotype for each child (independent assortment), so the odds are $(0.25)^3 \approx 0.0156$
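LOD score analysis
- $Z(\theta = 0) = 0.125 / 0.0156 = 8$
- $\mathrm{LOD}(\theta = 0) = \log_{10} 8 \approx 0.9$
- Try again using a different value, θ = 0.25 (linked, with a recombination fraction of 0.25)
- Now the odds of the observed result for each child are $0.5 \times 0.75 = 0.375$, and for the family $(0.375)^3 \approx 0.053$
- Why 0.5 × 0.75? The probability of inheriting a non-recombinant chromosome (as seen in every child) is $1 - \theta = 0.75$, and this is split between the two possible parental chromosomes that could be inherited, so the particular one observed has probability 0.5 × 0.75
- $Z(\theta = 0.25) = 0.053 / 0.0156 \approx 3.4$ (exactly $(0.375 / 0.25)^3 = 3.375$)
- $\mathrm{LOD}(\theta = 0.25) = \log_{10} 3.375 \approx 0.53$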
LOD score analysis
- Comparing the two scenarios: with θ = 0, meaning the loci are completely linked, the LOD score is 0.9
- With θ = 0.25, meaning the loci are separated by recombination 25% of the time, the LOD score is 0.53
- Because θ = 0 gives the higher score, it is the MLS: the loci are most likely linked, although a LOD < 3 means the result is not statistically significant
- Because LOD scores are logarithmic, they can be added: 4 similar families would combine to give a score of 3.6 at θ = 0
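The worked example can be reproduced in a few lines. This sketch assumes, as the reasoning in the example implies, that II-1 is the doubly heterozygous parent and that every child shows the same non-recombinant chromosome from II-1; `family_lod` is a hypothetical helper name.

```python
import math

def family_lod(theta: float, n_children: int) -> float:
    """LOD for the example family: every child received the same
    non-recombinant chromosome from the doubly heterozygous parent II-1."""
    # One child, if linked: that particular chromosome (0.5) and no crossover (1 - theta)
    p_linked = 0.5 * (1 - theta)
    # If unlinked, each of the four possible gametes has probability 0.25
    p_unlinked = 0.25
    z = (p_linked / p_unlinked) ** n_children  # odds ratio Z for the family
    return math.log10(z)

for theta in (0.0, 0.25):
    print(f"theta = {theta}: LOD = {family_lod(theta, 3):.2f}")
# Four similar families: LOD scores add
print(f"4 families at theta = 0: LOD = {4 * family_lod(0.0, 3):.2f}")
```

This prints LOD scores of 0.90 and 0.53 for the single family and 3.61 for four families, in line with the rounded figures in the example.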
Parametric models have mapped disease genes
- BRCA1
  - 23 families that showed clustering, indicating a familial mode of inheritance, were analyzed
  - Linkage to a specific marker gave a LOD score of 2.35
  - When age of onset was considered, analysis was restricted to the 7 families with age of onset < 45
  - This gave a LOD score of 5.98
- Alzheimer disease
- Hereditary non-polyposis colorectal cancer
- Familial adenomatous polyposis
- Non-insulin-dependent diabetes
Psychiatric disorders
- Schizophrenia and manic depression show familial clustering but not Mendelian inheritance
- Attempts to map these traits have not been successful
  - Many reported linkages could not be replicated
- Problems encountered:
  - Assessment of phenotype is variable
  - Genetic heterogeneity
  - LOD scores are very sensitive to the status of a few key individuals in a family: if their phenotype is reclassified, the outcome of the analysis changes greatly
Multifactorial inheritance
- Traits that involve two or more genes and a strong environmental influence
- Show continuous variation
- Controlled by polygenes
  - Each involved gene contributes additively to the phenotype
- Phenotypic expression of multifactorial traits varies widely because of gene interactions and environmental factors
- Polygenic traits:
  - Height
  - Weight
  - Skin colour
Multifactorial inheritance
- Other polygenic traits:
  - Neural tube defects
  - Cleft palate
  - Clubfoot
  - Diabetes
  - Hypertension
  - Behavioural disorders
- The distribution of phenotypes in the F2 varies depending on whether 2, 3 or 4 genes are involved
- As the number of loci increases, the number of phenotypic classes increases and the phenotypic difference between adjacent classes becomes smaller
- Environmental influence smooths out the variation between genotypes even more
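A sketch of how the number of F2 phenotypic classes grows with the number of loci. It assumes genes of equal, additive effect that are unlinked, and F1 parents heterozygous at every locus, so the number of contributing alleles in an F2 individual follows a binomial distribution; `f2_class_frequencies` is a hypothetical name.

```python
from math import comb

def f2_class_frequencies(n_genes: int) -> list[float]:
    """Frequencies of F2 phenotypic classes for n additive, unlinked genes.

    Assumes an AaBb... x AaBb... cross, so the number of contributing
    alleles in an offspring ~ Binomial(2n, 1/2).
    """
    n_alleles = 2 * n_genes
    return [comb(n_alleles, k) / 2 ** n_alleles for k in range(n_alleles + 1)]

for n in (2, 3, 4):
    freqs = f2_class_frequencies(n)
    print(f"{n} genes: {len(freqs)} classes; each extreme class = {freqs[0]}")
```

With 2, 3 and 4 genes this gives 5, 7 and 9 classes, and the extreme classes shrink from 1/16 to 1/64 to 1/256 of the F2.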
Model for inheritance of height
- Trait controlled by 3 genes, each with 2 alleles (A/a, B/b, C/c)
- Each dominant allele contributes equally to the phenotype; recessive alleles make no contribution
- The effect of each dominant allele is additive
- The genes are not linked
- Assume:
  - Base height of 5 feet
  - Each dominant allele adds 3 inches
- An aabbcc individual is 5'0"; an AABBCC individual is 6'6"
- If the environment affects everyone equally, individuals with 3 dominant alleles (the most frequent class) would be of average height, 5'9"
- Genotype reflects the genetic potential for height; the environment (e.g. nutrition) affects how fully that genotype is expressed
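The height model can be tabulated directly. This sketch additionally assumes the individuals are offspring of an AaBbCc × AaBbCc cross, which the model does not state explicitly; with that assumption it reproduces the 5'0", 5'9" and 6'6" values and shows why 3 dominant alleles is the most frequent class.

```python
from math import comb

BASE_INCHES = 60       # aabbcc individual: 5 feet
INCHES_PER_ALLELE = 3  # each dominant allele adds 3 inches
N_ALLELES = 6          # 3 unlinked genes x 2 alleles each

def feet_inches(inches: int) -> str:
    """Format a height in inches as feet'inches"."""
    return f"{inches // 12}'{inches % 12}\""

# Assumption: AaBbCc x AaBbCc cross, so each of the 6 allele positions
# carries a dominant allele with probability 1/2
table = []
for k in range(N_ALLELES + 1):
    height = BASE_INCHES + INCHES_PER_ALLELE * k
    n_ways = comb(N_ALLELES, k)  # number of the 64 combinations in this class
    table.append((k, feet_inches(height), n_ways))
    print(f"{k} dominant alleles: {feet_inches(height)} ({n_ways}/{2 ** N_ALLELES})")
```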
Environmental effects
- 3 genes give rise to 7 phenotypic classes (0 to 6 dominant alleles)
- The environment blurs the distinction between classes, so continuous variation is actually observed
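One way to see the blurring is to treat each of the 7 classes of the height model as a normal distribution around its mean height and add them up, weighted by class frequency. The 2-inch environmental standard deviation and the AaBbCc × AaBbCc class weights are illustrative assumptions, not values from the slides.

```python
from math import comb
from statistics import NormalDist

# Class means from the height model (60 in + 3 in per dominant allele)
CLASS_MEANS = [60 + 3 * k for k in range(7)]
# Assumed AaBbCc x AaBbCc class frequencies: 1, 6, 15, 20, 15, 6, 1 out of 64
CLASS_WEIGHTS = [comb(6, k) / 64 for k in range(7)]
ENV_SD = 2.0  # hypothetical environmental standard deviation, in inches

def phenotype_density(height: float) -> float:
    """Mixture density: each genotypic class blurred by a normal environmental effect."""
    return sum(w * NormalDist(mu, ENV_SD).pdf(height)
               for mu, w in zip(CLASS_MEANS, CLASS_WEIGHTS))

# Density at a class mean (69 in) versus halfway between two classes (70.5 in)
print(round(phenotype_density(69.0), 4), round(phenotype_density(70.5), 4))
```

With this amount of environmental noise the density halfway between two classes is almost as high as at a class mean, so the distinct classes disappear into a continuous distribution.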
Liability
- Take the height example and apply it to a disease state
- If genetic risk is modified by the environment, susceptibility (or liability) is normally distributed
- How many predisposing alleles are present? If three bi-allelic loci are involved, then 0 to 6
- Some traits do not show continuous variation in phenotype (a person is either affected or not); whether someone is affected depends on whether their liability exceeds a threshold
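A sketch of the liability-threshold idea using the three-locus example: liability is taken as the number of predisposing alleles plus a normally distributed environmental term, and a person is affected when liability exceeds a threshold. The allele frequency, environmental SD and threshold are hypothetical values chosen only for illustration.

```python
from math import comb
from statistics import NormalDist

# Hypothetical parameters for illustration only
N_LOCI = 3            # three bi-allelic loci -> 0..6 predisposing alleles
P_PREDISPOSING = 0.5  # assumed frequency of each predisposing allele
ENV_SD = 1.0          # environmental contribution to liability (SD, in allele units)
THRESHOLD = 5.0       # liability above this value -> affected

def risk_given_alleles(k: int) -> float:
    """P(affected | k predisposing alleles), with liability ~ Normal(k, ENV_SD)."""
    return 1 - NormalDist(k, ENV_SD).cdf(THRESHOLD)

def population_prevalence() -> float:
    """Average the risk over the binomial distribution of allele counts."""
    n = 2 * N_LOCI
    return sum(comb(n, k) * P_PREDISPOSING ** k * (1 - P_PREDISPOSING) ** (n - k)
               * risk_given_alleles(k) for k in range(n + 1))

for k in range(2 * N_LOCI + 1):
    print(k, round(risk_given_alleles(k), 3))
print("prevalence:", round(population_prevalence(), 3))
```

The trait itself is all-or-none, yet the risk rises smoothly with the number of predisposing alleles because the underlying liability is continuous.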
Linkage and association
- Linkage studies use individual families with affected members and attempt to demonstrate linkage between the occurrence of the disease and genetic markers (this creates associations within families, but not among unrelated people)
- Association studies are based on populations and attempt to show an association between a particular allele and susceptibility to disease (a statistical statement about the co-occurrence of alleles or phenotypes)
Non-parametric linkage analysis
- Parametric methods require a genetic model and are only useful for complex traits with a single major genetic component
- Model-free, or non-parametric, linkage analysis:
  - Ignores unaffected people
  - Looks for alleles or chromosomal segments that are shared between affected people (within families, or in whole populations for association studies)
  - If a gene contributes to the disease, the genomic region containing it will be co-inherited from a common ancestor by affected members of a pedigree more often than expected by chance
Genome scan
- Follows the inheritance of polymorphic microsatellite markers in members of a pedigree
- If affected members co-inherit the same allele more often than expected by chance, that genomic region may contain a gene that contributes to susceptibility
- Usually based on affected sibling pairs
- Involves genotyping markers spread evenly throughout the genome
- Sibs are expected to share 0, 1 or 2 alleles in the ratio 25% : 50% : 25%
IBS and IBD
- Need to distinguish DNA segments that are identical by descent (IBD) from those that are identical by state (IBS)
  - IBS alleles look the same but are not derived from a known common ancestor
  - IBD alleles are copies of the same ancestral (usually parental) allele
- Analysis is most informative if there are multiple alleles, or a multi-locus, multi-allele haplotype
- IBD studies require parental samples
IBS versus IBD
- Both sibs share allele A1 identical by state; whether it is also IBD is only obvious if the parental genotypes are known
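The IBS/IBD distinction can be made concrete with a hypothetical family in which both parents carry allele A1. IBS only compares allele labels; IBD needs to know which parental chromosome each sib received, which is why parental genotypes are required. The helper names, chromosome labels and family are illustrative assumptions.

```python
from collections import Counter

def ibs_shared(genotype1, genotype2):
    """Alleles shared identical by state (0, 1 or 2): same allele, regardless of origin."""
    return sum((Counter(genotype1) & Counter(genotype2)).values())

def ibd_shared(origin1, origin2):
    """Alleles shared identical by descent, given each sib's (paternal, maternal)
    chromosome labels, e.g. ("F1", "M2"). Requires the parental genotypes."""
    return int(origin1[0] == origin2[0]) + int(origin1[1] == origin2[1])

# Hypothetical family: father A1/A2 (chromosomes F1, F2), mother A1/A3 (M1, M2)
allele_on = {"F1": "A1", "F2": "A2", "M1": "A1", "M2": "A3"}
sib1, sib2 = ("F1", "M2"), ("F2", "M1")
g1 = [allele_on[c] for c in sib1]  # A1/A3
g2 = [allele_on[c] for c in sib2]  # A2/A1
print("IBS:", ibs_shared(g1, g2), "IBD:", ibd_shared(sib1, sib2))
```

Both sibs carry A1, so they share one allele IBS, but one copy came from the father and the other from the mother, so they share nothing IBD at this locus.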
Sib pair analysis
(Figure: the four possible combinations of parental haplotypes in a sib pair: shares 2, shares 1, shares 1, shares 0)
- By random segregation, sibs share 2 (1/4), 1 (1/2) or 0 (1/4) parental haplotypes
- Pairs of sibs affected by a dominant condition share 1 or 2 haplotypes at the disease locus
- Pairs of sibs affected by a recessive condition share both parental haplotypes for the affected region
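The 1/4 : 1/2 : 1/4 expectation follows from enumerating which parental chromosome each sib receives. This sketch assumes each parent transmits either of their two chromosomes with probability 1/2, independently for each child; the labels F1/F2 and M1/M2 are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each sib receives one paternal (F1 or F2) and one maternal (M1 or M2)
# chromosome; all four combinations are equally likely.
gametes = list(product(["F1", "F2"], ["M1", "M2"]))

def ibd_shared(sib1, sib2):
    """Number of parental chromosomes the two sibs both received (0, 1 or 2)."""
    return sum(a == b for a, b in zip(sib1, sib2))

# Enumerate all 4 x 4 equally likely sib pairs and tally IBD sharing
counts = Counter(ibd_shared(s1, s2) for s1, s2 in product(gametes, repeat=2))
total = sum(counts.values())
dist = {k: Fraction(counts[k], total) for k in (0, 1, 2)}
print(dist)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```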
Genome scanning to identify commonly inherited areas
- Assemble families containing 2 affected sibs
- Collect DNA samples from the sibs and their parents
- Genotype microsatellites using PCR with fluorescently tagged primers
Genome scanning (continued)
- Can analyze 18 loci in one lane and up to 500 on the same gel
- The initial scan covers the whole genome at roughly 10 cM intervals
- Each person has two alleles at each locus, distinguished by the length of the PCR product
- Parental and sib genotypes can then be determined
- Determine which alleles are IBD and how the observed sharing differs from what is predicted from allele frequencies
- The big question: how do you decide whether excessive allele sharing is statistically significant?
  - Use LOD score analysis, with a lower MLS threshold of 1 as a starting point
  - Try linkage disequilibrium studies
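One common way to turn allele-sharing counts into an MLS is to compare the observed proportions of affected sib pairs sharing 0, 1 or 2 alleles IBD with the null 1/4 : 1/2 : 1/4. This sketch uses the unconstrained maximum-likelihood form (the estimated proportions are not restricted to genetically possible values); `asp_mls` and the counts are hypothetical.

```python
import math

# Null expectation for sib pairs sharing 0, 1 or 2 alleles IBD
EXPECTED = (0.25, 0.5, 0.25)

def asp_mls(n0: int, n1: int, n2: int) -> float:
    """MLS for affected sib pairs from counts sharing 0, 1 and 2 alleles IBD.

    Sharing proportions are estimated by the observed fractions (unconstrained
    maximum likelihood) and compared with the null 1/4 : 1/2 : 1/4.
    """
    counts = (n0, n1, n2)
    total = sum(counts)
    # Empty classes contribute nothing (n * log term -> 0 as n -> 0)
    return sum(n * math.log10((n / total) / a)
               for n, a in zip(counts, EXPECTED) if n > 0)

# Hypothetical marker: 100 affected sib pairs with some excess sharing
print(round(asp_mls(15, 45, 40), 2))  # 2.78
```

A value of 2.78 would pass a starting threshold of 1 but fall short of the conventional LOD 3, so such a region would be followed up rather than declared linked.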
Association studies
- A statistical statement about the co-occurrence of alleles or phenotypes
- Allele A is associated with disease D if people who have D also have A more (or less) often than would be predicted from the individual frequencies of A and D in the population
  - e.g. HLA-DR4 is found in 36% of the UK population, but in 78% of people with rheumatoid arthritis
- Population associations depend on population history
  - In the UK, two unrelated people share a common ancestor about 22 generations (500 years) ago, i.e. they are 44 meioses apart
  - If they inherit a disease susceptibility allele from that common ancestor, recombination during the many intervening meioses will have reduced the shared segment to a small region; only tightly linked loci will still be shared
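The HLA-DR4 figures can be turned into a rough ratio of disease risk in carriers versus non-carriers with Bayes' rule, because the unknown disease prevalence cancels. This assumes the 36% population figure approximates the carrier frequency among everyone (cases included); `risk_ratio` is a hypothetical helper.

```python
def risk_ratio(freq_in_cases: float, freq_in_population: float) -> float:
    """Risk in carriers of allele A divided by risk in non-carriers, via Bayes' rule.

    P(D | A)  = P(A | D) * P(D) / P(A)
    P(D | ~A) = P(~A | D) * P(D) / P(~A)
    P(D) cancels in the ratio.
    """
    carriers = freq_in_cases / freq_in_population
    non_carriers = (1 - freq_in_cases) / (1 - freq_in_population)
    return carriers / non_carriers

print(round(risk_ratio(0.78, 0.36), 1))  # 6.3
```

Under these assumptions, HLA-DR4 carriers would be roughly six times as likely as non-carriers to have rheumatoid arthritis; an allele with no association gives a ratio of 1.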
Linkage disequilibrium
- Non-random association in a population of alleles at two closely linked loci (so one allele is closely linked to the disease)
- Based on having a common ancestor
  - Alleles that are closely linked tend to be inherited together
- Over time, however, disequilibrium disappears because of recombination (i.e. haplotype frequencies approach the products of the individual allele frequencies)
- If two alleles 1 Mb apart are in disequilibrium, the disequilibrium will decay by about 50% in 70 generations
- Disequilibrium between more distantly located alleles decays faster, giving a gradient of disequilibrium with the highest value closest to the gene
- LD is influenced by the population: the effect is greatest in small, homogeneous populations with the greatest chance of founder effects (e.g. Finland), and smallest in large, heterogeneous populations (e.g. USA)
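The 50% figure follows from the standard decay rule $D_t = D_0 (1 - \theta)^t$. Converting 1 Mb to θ ≈ 0.01 uses the rough human average of about 1 cM per Mb, which is an assumption; local recombination rates vary. `remaining_ld` is a hypothetical helper.

```python
import math

def remaining_ld(theta: float, generations: int) -> float:
    """Fraction of the initial disequilibrium left: D_t / D_0 = (1 - theta)^t."""
    return (1 - theta) ** generations

# Rule-of-thumb assumption: 1 Mb ~ 1 cM, i.e. recombination fraction ~ 0.01
theta_1mb = 0.01
print(round(remaining_ld(theta_1mb, 70), 3))               # 0.495
# Generations needed to halve D: solve (1 - theta)^t = 0.5
print(round(math.log(0.5) / math.log(1 - theta_1mb), 1))  # 69.0
```

A larger θ (a more distant marker) shrinks the remaining fraction faster, which is the gradient of disequilibrium described above.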
Other reasons for associations
- Direct causation
  - Having allele A makes you susceptible to disease D (increases its likelihood)
  - The same allele A would be expected to be associated with the disease in any population, because the association does not depend on a common ancestor
- Natural selection
  - People with the disease may have a competitive advantage if they also carry allele A
- These explanations are unlikely if the associated variant lies in non-coding DNA
- Studies in ethnically diverse populations are useful for distinguishing between these causes and LD
Transmission disequilibrium test (TDT)
- Used to check the results of an association study
- Confirms whether a parent heterozygous for an associated and a non-associated allele transmits the associated allele to affected offspring more often than expected by chance
- Starts with couples who have one or more affected offspring
- Select a parent who is heterozygous for marker allele M1
- The test compares the number of such parents who transmit M1 to their affected offspring with the number who transmit the other allele
- Can be used if only one parent is available
- Fundamentally a test of association, not linkage
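A sketch of the usual TDT statistic: with b heterozygous parents transmitting M1 and c transmitting the other allele, $(b - c)^2 / (b + c)$ is compared with a chi-square distribution on 1 degree of freedom. The counts are hypothetical, and the function assumes b + c > 0.

```python
import math

def tdt(b: int, c: int) -> tuple[float, float]:
    """Transmission disequilibrium test.

    b = heterozygous parents who transmitted M1 to the affected child
    c = such parents who transmitted the other allele
    Under the null (no transmission distortion), (b - c)^2 / (b + c)
    is approximately chi-square with 1 df.
    """
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square(1 df): P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = tdt(60, 35)  # hypothetical transmission counts
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

Equal transmission (b = c) gives a statistic of 0 and p = 1, whereas a consistent excess of M1 transmissions to affected offspring drives the p-value down.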