Title: Linkage analysis
1. Linkage analysis
2. Finding causal mutations
- 2 opposing strategies
- sequence then select
- select then sequence
- Sequencing
- traditional Sanger sequencing: only possible after selection
- massively parallel sequencing: possible prior to or after selection
- RNA sequencing
- exome sequencing
- genome sequencing
3. Finding causal mutations
- Selection
- positional (prior to sequencing)
- linkage analysis
- GWAS
- structural variations (e.g. microdeletions)
- functional (prior to or after sequencing)
- candidate genes selected based on known function or involvement in related disorders
- filtering of variants based on functional predictions
- overlap (after sequencing)
- looking for genes / variants that occur in multiple independent patients
- mostly a combination is used
4. Exome sequencing
5. Aims
- Interpret microsatellite results
- Add genotypes to pedigrees
- Create pedigree and genotype files
- Calculate and interpret LOD-scores
- Delineate linkage intervals
-
- Basic principles of linkage analysis
- Analyze other types of markers
- Association studies
- Learn how to work with specific pedigree programs
6. Starting linkage analysis
7. Preparations
- Clearly define the phenotype
- If it is not specific enough, you may end up analyzing different disorders that map to different genomic loci
- LOD scores are additive
- Find suitable families
- larger is better
- more patients is better
- Collect genomic DNA from as many family members as possible
- Determine the type of inheritance
- Calculate the power to prove linkage with the available material (SLink, not part of this course)
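Because LOD scores are additive, evidence from several families can be pooled by summing their scores at the same recombination fraction. The sketch below illustrates this under a strong simplification that is not on the slide: all meioses are phase-known and fully informative. The function names and the recombinant counts are hypothetical.

```python
import math

def two_point_lod(recombinants, non_recombinants, theta):
    """LOD at recombination fraction theta, assuming phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    n = recombinants + non_recombinants
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out theta = 0
    # log10 likelihood under linkage at theta
    log_linked = non_recombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        log_linked += recombinants * math.log10(theta)
    # log10 likelihood under no linkage (theta = 0.5)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked

def combined_lod(families, theta):
    """Sum the per-family LOD scores at one value of theta."""
    return sum(two_point_lod(r, nr, theta) for r, nr in families)

# Hypothetical counts: (recombinants, non-recombinants) per family
families = [(0, 6), (1, 7)]
for theta in (0.05, 0.1, 0.2):
    print(theta, round(combined_lod(families, theta), 2))
```

Real pedigrees are rarely phase-known, which is why dedicated linkage programs are used for the actual calculation.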
8. Linkage analysis types
- Directed linkage analysis
- Evaluate linkage at a specific locus such as a candidate gene
- Common approach: evaluate an intragenic, a 5' and a 3' marker (often microsatellites)
- Genome-wide linkage analysis
- Screen for linkage with markers spread across the entire genome
- Microsatellites: 400 markers spaced at about 10cM
- SNPs: 500k SNP array
- Homozygosity mapping
- Screen only affected individuals in inbred families
- Select homozygous markers (typically SNP markers)
- Very efficient technology
- Fine mapping
- Some linked markers are known, but the borders of the linkage interval still need to be defined
9. Exercise Part 1
- 2 inbred families with a recessive disorder
- Homozygosity mapping based on 500k SNP arrays identified 2 candidate regions
- Chromosome 4
- Patient 1 homozygous for
- 6.052Mb - 14.488Mb
- 21.008Mb - 37.477Mb
- Patient 2 homozygous for
- 11.186Mb - 37.219Mb
- Task: find microsatellite markers to confirm linkage
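One way to read the two candidate regions is as the chromosome 4 segments that are homozygous in both patients. The sketch below intersects the intervals listed above; treating the candidate regions as this intersection is an assumption, and the function name is hypothetical.

```python
def shared_regions(regions_a, regions_b):
    """Return the (start, end) intervals in Mb present in both lists."""
    shared = []
    for start_a, end_a in regions_a:
        for start_b, end_b in regions_b:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only real overlaps
                shared.append((start, end))
    return sorted(shared)

# Homozygous stretches on chromosome 4 from the exercise (Mb)
patient1 = [(6.052, 14.488), (21.008, 37.477)]
patient2 = [(11.186, 37.219)]

candidates = shared_regions(patient1, patient2)
print(candidates)  # [(11.186, 14.488), (21.008, 37.219)]
```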
10. Find additional flanking markers
- Find physical position of marker in NCBI > UniSTS
- NCBI map viewer: http://www.ncbi.nlm.nih.gov/mapview/
- Go to Homo sapiens and to the right chromosome
- Maps & options: show
- DeCode, Généthon, Marshfield (genetic maps)
- Genes
- Set region, e.g. 2Mb up- and downstream of your marker
- Click "Data as table view"
- Click on STS behind a marker to see its details
- Select markers that
- locate to only 1 genomic location
- have a PCR product with an extended size range (one size → not polymorphic)
11-13. http://www.ncbi.nlm.nih.gov/projects/mapview
14. Exercise Part 1 > possible solution
- Markers in 1st candidate region
- D4S3017 (21.078Mb)
- D4S3044 (25.189Mb)
- D4S1618 (33.857Mb)
- D4S3350 (33.857Mb)
- D4S2988 (36.889Mb)
- Markers in 2nd candidate region
- D4S1582 (10.311Mb)
- D4S2906 (12.321Mb)
- D4S2944 (13.141Mb)
- D4S1602 (14.059Mb)
- D4S2960 (15.437Mb)
- → Order primers and analyze them on all family members
15. Analyzing microsatellite data
16. Microsatellites > basics
- Repeats of short sequences (e.g. 2bp): NNNNAC(AC)nACNNNN
- Number of repeats is variable (unstable sequence)
- Number of repeats determines the allele
- Number of repeats corresponds to a specific length of PCR product
- allele 1: NNNNACACACACACNNNN (5 AC → 18bp)
- allele 2: NNNNACACACACACACNNNN (6 AC → 20bp)
- allele 3: NNNNACACACACACACACNNNN (7 AC → 22bp)
- ...
- Determine length to know the allele (sequencer)
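The link between repeat number and product length can be sketched in a few lines of Python. The 4bp flanks and the AC repeat mirror the toy alleles above; the constant and function names are hypothetical.

```python
FLANK_5 = "NNNN"  # toy 5' flanking sequence from the slide
FLANK_3 = "NNNN"  # toy 3' flanking sequence from the slide
REPEAT = "AC"     # dinucleotide repeat unit

def pcr_product(n_repeats):
    """Build the toy PCR product for a given number of repeats."""
    return FLANK_5 + REPEAT * n_repeats + FLANK_3

def repeats_from_length(length_bp):
    """Infer the repeat count from the measured product length."""
    repeat_bp = length_bp - len(FLANK_5) - len(FLANK_3)
    if repeat_bp < 0 or repeat_bp % len(REPEAT) != 0:
        raise ValueError("length does not fit a whole number of repeats")
    return repeat_bp // len(REPEAT)

for n in (5, 6, 7):
    product = pcr_product(n)
    print(n, product, len(product))
```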
17. Microsatellites > basics
18. Microsatellites > determine size
- Use internal size standard (other color)
(Figure labels: 220bp, 225bp, 230bp)
19. Microsatellites > heterozygotes
(Figure labels: 220bp, 223bp, 225bp, 230bp)
20. Microsatellites > stutter peaks
- Repeats are difficult to copy → polymerase slips
- Some amplicons have 1 repeat less; a few even lose multiple repeats
- Small repeats are more prone to slippage and show more pronounced stutter peaks
- Largest product is the correct one
- Distance between peaks = length of a repeat
21. Microsatellites > stutter peaks
(Figure labels: allelic peak, 1st stutter peak, 2nd stutter peak)
22. Microsatellites > stutter peaks
- Allelic peaks are the highest
- Stutter peaks are lower
(Figure labels: A1, A2)
23. Microsatellites > stutter peaks
(Figure labels: A1, A2)
24. Microsatellites > +A peaks
- Taq polymerase tends to add an extra A at the 3' end
- Variable degree of products with or without this extra A
- Do not confuse with stutter peaks (only 1bp difference)
(Figure labels: allelic peak, allelic peak +A, 1st stutter peak, 1st stutter peak +A, 2nd stutter peak, 2nd stutter peak +A)
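To keep stutter and +A peaks apart when reading a plot, it can help to list where each peak is expected. The sketch below assumes the allele size refers to the product without the extra A, and that stutter products are whole repeats shorter; the function name and defaults are hypothetical.

```python
def expected_peaks(allele_bp, repeat_bp=2, n_stutter=2):
    """Map peak labels to expected sizes (bp) for one allele."""
    peaks = {}
    for k in range(n_stutter + 1):
        label = "allelic peak" if k == 0 else f"stutter peak {k}"
        size = allele_bp - k * repeat_bp  # stutter loses whole repeats
        peaks[label] = size
        peaks[label + " +A"] = size + 1   # extra A adds 1bp
    return peaks

for label, size in expected_peaks(220).items():
    print(f"{size}bp  {label}")
```

For a dinucleotide repeat, the +A product of one stutter peak lies only 1bp below the next peak up, which is why these plots can look crowded.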
25. Microsatellites > complex plots (stutter + A)
(Figure labels: A1, A2)
26. Microsatellites > multiplex
- Combine multiple markers in a single analysis
- Different size range
- Multicolor
- Commercial kits, e.g. 16 markers / lane
27. Microsatellite plots: examples
28-35. (No transcript)
36. Genotyping pedigrees
37. Genotyping pedigrees
- Screen one or multiple markers for some or all family members
- For every marker
- Make a list of all occurring allele sizes
- Due to technical variation in sizing, the same allele can have a slightly different size in different measurements (-0.4bp to +0.4bp). Give all alleles within this range the same allele number
- Add the allele numbers to the pedigree at the corresponding individual/marker combination
- Find the right phase
- Advanced software like GeneMapper can generate tables with allele numbers for every sample / marker
- Advanced pedigree programs like Progeny can store genotype information for family members
- Verify inheritance
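Assigning allele numbers to measured sizes can be sketched as a simple binning step. The rule used here is an assumption: a size joins the current bin if it lies within twice the tolerance of the smallest size in that bin, since two measurements of the same allele can each be off by up to 0.4bp. The function name and the example sizes are hypothetical.

```python
def assign_allele_numbers(sizes, tolerance=0.4):
    """Map each measured size (bp) to an allele number, smallest first."""
    numbers = {}
    bin_start = None
    allele = 0
    for size in sorted(set(sizes)):
        # Open a new bin when the size is too far from the bin's first member
        if bin_start is None or size - bin_start > 2 * tolerance:
            allele += 1
            bin_start = size
        numbers[size] = allele
    return numbers

measured = [220.1, 219.9, 222.0, 222.3, 224.1]  # hypothetical sizes
for size, allele in assign_allele_numbers(measured).items():
    print(size, "-> allele", allele)
```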
38. Exercise Part 2
- Genotype 3 markers in all available individuals of 2 families
- Pedigrees and microsatellite plots in ExercisePart2-GenotypingData.pdf
- Add allele numbers for the 3 markers to the pedigree
- Interpret the genotyped pedigrees: linked?
39. Family 1
40. Family 2
41. Exercise Part 2 > Conclusions
- D4S1582
- Mendelian error → cannot be interpreted
- D4S2944
- Linked
- D4S3017
- Not linked: unaffected individuals with the same genotype as a patient
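A Mendelian error like the one seen for D4S1582 means a child carries alleles that cannot come from one allele of each parent. The check below is a minimal sketch for a single marker; the function name is hypothetical, and treating 0 as "not typed" follows the unknown-allele coding used in the genotype files. Dedicated tools such as PedCheck handle whole pedigrees.

```python
def mendelian_consistent(child, father, mother):
    """True if the child's two alleles can come one from each parent."""
    if 0 in child + father + mother:
        return True  # untyped alleles (0) cannot be checked
    first, second = child
    return (first in father and second in mother) or (
        second in father and first in mother
    )

# Hypothetical allele numbers for one marker
print(mendelian_consistent((1, 2), (1, 3), (2, 4)))
print(mendelian_consistent((3, 3), (1, 2), (3, 4)))
```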
42. Calculate LOD scores
43. EasyLinkage
- EasyLinkage: UI for linkage analysis
- http://genetik.charite.de/hoffmann/easyLINKAGE/index.html#start
- Bioinformatics. 2005 Feb 1;21(3):405-7. PMID 15347576
- Bioinformatics. 2005 Sep 1;21(17):3565-7. PMID 16014370
- Interface for many linkage analysis programs
- Input
- Pedigree file (linkage format)
- Genotype file(s)
- Marker information (already provided for popular markers)
- Settings
44. Pedigree file
- Naming requirements for EasyLinkage: p_xxx.pro → e.g. p_SMMD.pro
- Format
- Tab delimited text file
- 1 individual per row
- Columns
- 1 → family ID
- 2 → person ID
- 3 → father ID
- 4 → mother ID
- 5 → sex (1=male, 2=female, 0=unknown)
- 6 → affection status (1=unaffected, 2=affected, 0=unknown)
- 7 → DNA availability (optional, relevant for power calculations)
- 8 → liability class (to be provided if multiple liability classes are used)
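The column layout above can be sketched as a small Python script that checks a pedigree and renders it as tab-delimited text. The family is hypothetical, only the six mandatory columns are shown, and coding the parents of founders as 0 is an assumption based on common linkage-format usage; the EasyLinkage manual remains the reference.

```python
import csv
import io

# Hypothetical nuclear family:
# family, person, father, mother, sex, affection status
PEDIGREE = [
    (1, 1, 0, 0, 1, 1),  # father, unaffected (founder: parents coded 0)
    (1, 2, 0, 0, 2, 1),  # mother, unaffected (founder)
    (1, 3, 1, 2, 1, 2),  # son, affected
    (1, 4, 1, 2, 2, 1),  # daughter, unaffected
]

def check_pedigree(rows):
    """Return a list of problems found in the pedigree rows."""
    ids = {(fam, pid) for fam, pid, *_ in rows}
    problems = []
    for fam, pid, father, mother, sex, status in rows:
        if sex not in (0, 1, 2):
            problems.append(f"person {pid}: invalid sex code {sex}")
        if status not in (0, 1, 2):
            problems.append(f"person {pid}: invalid affection status {status}")
        for parent in (father, mother):
            if parent != 0 and (fam, parent) not in ids:
                problems.append(f"person {pid}: parent {parent} not in pedigree")
    return problems

def to_tab_delimited(rows):
    """Render the rows as tab-delimited text, one individual per line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()

print(check_pedigree(PEDIGREE))
print(to_tab_delimited(PEDIGREE), end="")
```

Writing the returned text to a file named after the p_xxx.pro pattern would give a file in the shape described on the slide.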
45. Genotype files
- Person IDs have to match exactly with those provided in the pedigree file
- Naming requirements for EasyLinkage: MarkerName_xxx.abi → e.g. D1S1609_SMMD.abi
- Format
- Tab delimited text file
- 1 individual per row
- Columns (for microsatellite based analysis)
- 1 → marker (same as in file name and matching a marker in an available marker set)
- 2 → custom information (content doesn't matter, but column must be present)
- 3 → individual ID (match person ID in pedigree file)
- 4 and 5 → genotypes for 2 alleles (unknown=0)
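A genotype file for one marker can be sketched in the same spirit, including a check that every person ID also occurs in the pedigree file. The allele numbers, the "-" placeholder for column 2 and all function names are hypothetical.

```python
def genotype_filename(marker, project):
    """File name following the MarkerName_xxx.abi pattern."""
    return f"{marker}_{project}.abi"

def genotype_lines(marker, genotypes, info="-"):
    """One tab-delimited line per individual: marker, info, ID, allele 1, allele 2."""
    lines = []
    for person_id, (allele1, allele2) in genotypes.items():
        fields = [marker, info, str(person_id), str(allele1), str(allele2)]
        lines.append("\t".join(fields))
    return lines

def ids_missing_from_pedigree(genotypes, pedigree_ids):
    """Person IDs in the genotype data that the pedigree file does not know."""
    return sorted(set(genotypes) - set(pedigree_ids))

# Hypothetical allele numbers; (0, 0) marks an untyped individual
genotypes = {"1": (1, 2), "2": (2, 3), "3": (2, 2), "4": (0, 0)}
pedigree_ids = ["1", "2", "3", "4"]

print(genotype_filename("D4S2944", "SMMD"))
print(ids_missing_from_pedigree(genotypes, pedigree_ids))
print("\n".join(genotype_lines("D4S2944", genotypes)))
```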
46. Marker information
- Contains information on the chromosome and position of every marker
- Already available for a number of commercial SNP arrays and for the microsatellite markers from
- Généthon
- Marshfield
- DeCode
- Custom marker sets can be created (see manual)
47. EasyLinkage settings
- Choose a program
- FastLink → Parametric, single-point
- SuperLink → Parametric, single-/multipoint
- SPLink → Nonparametric, single-point
- Genehunter → Nonpara-/parametric, single-/multipoint
- Genehunter Plus → Nonpara-/parametric, single-/multipoint
- Genehunter MOD → Nonpara-/parametric, single-/multipoint
- Genehunter Imprinting → Nonpara-/parametric, single-/multipoint
- GeneHunter TwoLocus → Parametric, two-locus, single-/multipoint
- Merlin → Nonpara-/parametric, single-/multipoint
- SimWalk → Nonparametric, single-/multipoint
- Allegro → Nonpara-/parametric, single-/multipoint, simulation, single-/multi-point
- PedCheck → Mendelian error check
- FastSLink → Simulation, single-/multi-point
48. EasyLinkage settings
- Parametric <-> non-parametric
- Single point <-> multipoint
- Frequency of the disease allele
- Penetrance vectors (wt/wt, wt/mt, mt/mt)
- Standard dominant: 0 1 1
- Standard recessive: 0 0 1
- Reduced penetrance: replace 1 by penetrance (e.g. 0.9)
- Phenocopy: replace 0 by the fraction of phenocopies (e.g. 0.1)
- Example: 0.01 0.9 0.99
- 1% chance to show a similar phenotype despite a normal genotype
- 90% chance to show the phenotype when 1 allele is mutant (dominant with incomplete penetrance)
- 99% likelihood to present with the phenotype if both alleles are mutant
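The replacement rules for penetrance vectors can be sketched as a small helper. The function name and its parameters are hypothetical and only cover the simple cases listed above, with one penetrance value shared by all at-risk genotypes.

```python
def penetrance_vector(model, penetrance=1.0, phenocopy=0.0):
    """Return (wt/wt, wt/mt, mt/mt) penetrances for a simple model."""
    for value in (penetrance, phenocopy):
        if not 0.0 <= value <= 1.0:
            raise ValueError("penetrance and phenocopy must lie between 0 and 1")
    if model == "dominant":
        # One mutant allele is enough to be at risk
        return (phenocopy, penetrance, penetrance)
    if model == "recessive":
        # Only mt/mt is at risk; carriers behave like wt/wt
        return (phenocopy, phenocopy, penetrance)
    raise ValueError("model must be 'dominant' or 'recessive'")

print(penetrance_vector("dominant"))
print(penetrance_vector("recessive", penetrance=0.9))
print(penetrance_vector("dominant", penetrance=0.9, phenocopy=0.01))
```

A vector such as 0.01 0.9 0.99, with different values for wt/mt and mt/mt, has to be entered by hand because this helper uses a single penetrance.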
49. Evaluate calculated LOD-scores
- Maximum LOD-scores can be seen in EasyLinkage
- Details about LOD-scores at different recombination fractions can be found in text files generated by EasyLinkage → process in Excel (generate graphs, ...)
- Standard rules for LOD-scores
- >3 → significant linkage
- 2<LOD<3 → suggestive linkage
- -2<LOD<2 → uninformative
- <-2 → significant absence of linkage
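These rules translate directly into a small classifier. The slide leaves the exact boundary values (3, 2 and -2) unassigned; placing them in the less extreme category is an assumption of this sketch, as is the function name.

```python
def classify_lod(lod):
    """Classify a LOD score using the thresholds from the slide."""
    if lod > 3:
        return "significant linkage"
    if lod > 2:
        return "suggestive linkage"  # includes exactly 3 (assumption)
    if lod >= -2:
        return "uninformative"       # includes exactly 2 and -2 (assumption)
    return "significant absence of linkage"

for score in (3.6, 2.4, 0.7, -2.5):  # hypothetical LOD scores
    print(score, "->", classify_lod(score))
```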
50. Interpreting LOD plots
51. Exercise Part 3
- Generate one pedigree file containing all family members of both families (use Global IDs)
- Generate a genotype file for each of the tested markers
- Run SuperLink analysis with the right settings
- Evaluate results
52. Exercise Part 3 > Results
53. Strengthen the evidence
- Analyze more family members
- Analyze more families
- Analyze flanking markers
- Look for more informative markers that result in higher LOD-scores
- A series of flanking markers allows for multipoint linkage analysis
- A series of linked markers gives more confidence (subjective)
- Flanking markers can also be used to fine-map the linkage interval
54. Determine the linkage interval
55. Exercise 2: find the linkage interval
56. Post linkage
- Create a list of all the genes within the linkage interval
- NCBI map viewer
- UCSC (also for non-coding RNAs)
- Evaluate known gene functions for relevance to the investigated phenotype
- Sequence genes
- Start with those that seem the most relevant to the disorder
- Start with the coding regions
- Screen the entire region with capture sequencing
- Finding a mutation and proving its causality is the ultimate proof