Title: Linkage analysis
1Linkage analysis
6
Chapter
2Aims
- Interprete microsatellite results
- Add genotypes to pedigrees
- Create pedigree and genotype files
- Calculate and interprete LOD-scores
- Delineate linkage intervals
-
- Basic principles of linkage analysis
- Analyze other types of markers
- Association studies
- Learn how to work with specific pedigree programs
3Starting linkage analysis
4Preparations
- Clearly define the phenotype
- If not specific enough than you may analyze
different disorders that can map to different
genomic loci - Find suitable families
- larger is better
- more patients is better
- Collect genomic DNA from as much family members
as possible - Determine the type of inheritance
- Calculate the power to proove linkage with the
available material (SLink not part of this
course)
5Linkage analysis types
- Directed linkage analysis
- Evaluate linkage at a specific locus such as a
candidate gene - Common approach evaluate an intragenic, 5 and
3 marker - Genome wide linkage analysis
- Screen for linkage for markers spread across the
entire genome - Microsatellites 400 markers spaced at about
10cM - SNPs 500k SNP array
- Homozygosity mapping
- Screen only affected individuals in inbred
families - Select homozygous markers
- Very efficient technology
6Exercise Part 1
- 2 inbred families with a recessive disorder
- With a homozygosity mapping based on 500k SNP
arrays 2 candidate regions could be identified
- Chromosome 4
- Patient 1 homozygous for
- 6.052Mb - 14.488Mb
- 21.008Mb 37.477Mb
- Patient 2 homozygous for
- 11.186Mb 37.219Mb
- Task find microsatellite markers to confirm
linkage
7Find additional flanking markers
- Find physical position of marker in NCBI gt UniSTS
- NCBI map viewer http//www.ncbi.nlm.nih.gov/mapvi
ew/ - Go to Homo sapiens and to the right chromosome
- Maps options show
- DeCode, Généthon Marshfield (genetic maps)
- Genes
- Set region e.g. 2Mb up- and downstream of your
marker - Click Data as table view
- Click on STS behind a marker to see its details
- Select markers that
- locate to only 1 genomic location
- have a PCR product with an extended size
rangeone size ? not polymorphic
8Exercise Part 1 gt possible solution
- Markers in 1st candidate region
- D4S3017 (21.078Mb)
- D4S3044 (25.189Mb)
- D4S1618 (33.857Mb)
- D4S3350 (33.857Mb)
- D4S2988 (36.889Mb)
- Markers in 2nd candidate region
- D4S1582 (10.311Mb)
- D4S2906 (12.321Mb)
- D4S2944 (13.141Mb)
- D4S1602 (14.059Mb)
- D4S2960 (15.437Mb)
- ? Order primers analyze them on all family
members
9Analyzing microsatellite data
10Microsatellites gt basics
- Repeats of short sequences (e.g.
2bp)NNNNAC(AC)nACNNNN - Number of repeats is variable (instable sequence)
- Number of repeats determines the allele
- Number of repeats corresponds to specific length
of PCR product - allel 1 NNNNACACACACACNNNN (5AC ? 18bp)
- allel 2 NNNNACACACACACACNNNN (6AC ? 20bp)
- allel 3 NNNNACACACACACACACNNNN (7AC ? 22bp)
- ...
- Determine length to know the allele (sequencer)
11Microsatellites gt basics
12Microsatellites gt determine size
- Use internal size standard (other color)
230bp
220bp
225bp
13Microsatellites gt heterozygotes
230bp
220bp
225bp
223bp
14Microsatellites gt stutter peaks
- Repeats are difficult to copy ? polymerase slips
- Some amplicons have 1 repeat lessa few even
loose multiple repeats - Small repeats are more prone to slippage and show
more pronounced stutter peaks - Largest product is the correct one
- Distance between peaks length of a repeat
15Microsatellites gt stutter peaks
allelic peak
1st stutter peak
2nd stutter peak
16Microsatellites gt stutter peaks
- Allelic peaks are the heighest
- Stutter peaks are lower
A1
A2
17Microsatellites gt stutter peaks
A1
A2
18Microsatellites gt A peaks
- Taq polymerase tends to add an extra A at the 3
end - Variable degree of products with or without this
extra A - Do not confuse with stutter peaks (only 1bp
difference)
allelic peak
allelic peak A
1st stutter peak
1st stutter peak A
2nd stutter peak
2nd stutter peak A
19Microsatellites gt complex plots (stutter A)
A1
A2
20Microsatellites gt mutliplex
- Combine multiple markers in a single analysis
() - Different size range
- Multicolor
- Commercial kits e.g. 16 markers / lane
21Microsatellite plots examples
22(No Transcript)
23(No Transcript)
24(No Transcript)
25(No Transcript)
26(No Transcript)
27(No Transcript)
28(No Transcript)
29(No Transcript)
30Genotyping pedigrees
31Genotyping pedigrees
- Screen one or multiple markers for some or all
family members - For every marker
- Make a list of all occuring allele sizes
- Due to technical variation on sizing the same
allele can have a slightly different size in
different measurements (-0.4bp _ 0.4bp). Give
all alleles within this range the same allele
number - Add the allele numbers to the pedigree at the
corresponding individual/marker combination - Find the wright phase
- Advanced software like GeneMapper can generate
tables with allele numbers for every sample /
marker - Advanced pedigree programs like Progeny can store
genotype information for family members - Verify inheritance
32Exercise Part 2
- Genotype 3 markers in all available individuals
of 2 families - Pedigrees microsatellite plots
inExercisePart2-GenotypingData.pdf - Add allele numbers for the 3 markers to the
pedigree - Interprete the genotyped pedigrees linked?
33Family 1
34Family 2
35Exercise Part 2 gt Conclusions
- D4S1582
- Mendelian error ? can not be interpreted
- D4S2944
- Linked
- D4S3017
- Not-linked unaffected individuals with the same
genotype as a patient
36Calculate LOD scores
37EasyLinkage
- EasyLinkage UI for linkage analysis
- http//genetik.charite.de/hoffmann/easyLINKAGE/ind
ex.htmlstart - Bioinformatics. 2005 Feb 121(3)405-7 PMID
15347576 - Bioinformatics. 2005 Sep 121(17)3565-7 PMID
16014370 - Interface for many linkage analysis programs
- Input
- Pedigree file (linkage format)
- Genotype file(s)
- Marker information (already provided for popular
markers) - Settings
38Pedigree file
- Naming requirements for EasyLinkagep_xxx.pro ?
e.g. p_SMMD.pro - Format
- Tab delimited text file
- 1 individual per row
- Columns
- 1 ? family ID
- 2 ? person ID
- 3 ? father ID
- 4 ? mother ID
- 5 ? sex (1male, 2female, 0unknown)
- 6 ? affection status (1unaffected, 2affected,
0unknown) - 7 ? DNA availability (optional, relevant for
power calculations) - 8 ? liability class (to be provided if multiple
liability classes are used)
39Genotype files
- Person IDs have to match exactly with those
provided in the pedigree file - Naming requirements for EasyLinkageMarkerName_xx
x.abi ? e.g. D1S1609_SMMD.abi - Format
- Tab delimited text file
- 1 individual per row
- Columns (for microsatellite based analysis)
- 1 ? marker (same as in file name and matching a
marker in an available marker set) - 2 ? custom information (content doesnt matter,
but column must be present) - 3 ? individual ID (match person ID in pedigree
file) - 4 5 ? genotypes for 2 alleles (unknown0)
40Marker information
- Contains information on the chromosome and
position of every marker - Already available for a number of commercial
SNP-arrays and for the microsatellite markers
from - Genethon
- Marshfield
- DeCode
- Custom marker sets can be created (see manual)
41EasyLinkage settings
- Choose a program
- FastLink ? Parametric, single-point
- SuperLink ? Parametric, single-/multipoint
- SPLink ? Nonparametric, single-point
- Genehunter ? Nonpara-/parametric,
single-/multipoint - Genehunter Plus ? Nonpara-/parametric,
single-/multipoint - Genehunter MOD ? Nonpara-/parametric,
single-/multipoint - Genehunter Imprinting ? Nonpara-/parametric,
single-/multipoint - GeneHunter TwoLocus ? Parametric, two-locus,
single-/multipoint - Merlin ? Nonpara-/parametric, single-/multipoint
- SimWalk ? Nonparametric, single-/multipoint
- Allegro ? Nonpara-/parametric, single-/multipoint
simulation, single-/multi-point - PedCheck ? Mendelian error check
- FastSLink ? Simulation, single-/multi-point
42EasyLinkage settings
- Parametric lt-gt non-parametric
- Single point lt-gt multipoint
- Frequency of the disease allele
- Penetrance vectors (wt/wt, wt/mt, mt/mt)
- Standard dominant 0 1 1
- Standard recessive 0 0 1
- Reduced penetrance replace 1 by penetrance (e.g.
0.9) - Phenocopy replace 0 by percentage of phenocopy
(e.g. 0.1) - Example 0.01 0.9 0.991 chance to show a
similar phenotype despite a normal genotype90
chance to show the phenotype when 1 mutant allele
(dominant with incomplete penetrance)99
likelihood to present with the phenotype if both
alleles are mutant
43Evaluate calculated LOD-scores
- Maximum LOD-scores can be seen in EasyLinkage
- Details about LOD-scores at different
recombination fractions can be fount in text
files generated by EasyLinkage ? process in Excel
(generate graphs, ...) - Standard rules for LOD-scores
- gt3 ? significant linkage
- 2ltLODlt3 ? suggestive linkage
- -2ltLODlt2 ? uninformative
- lt-2 ? significant absence of linkage
44Interpreting LOD plots
45Exercise Part 3
- Generate one pedigree file containing all family
members of both families (use Global IDs) - Generate a genotype file for each of the tested
markers - Run SuperLink analysis with the right settings
- Evaluate results
46Exercise Part 3 gt Results
47Strengthen the evidence
- Analyze more family members
- Analyze more families
- Analyze flanking markers
- Look for more informative markers that result in
higher LOD-scores - A series of flanking markers allow for multipoint
linkage analysis - A series of linked markers gives more confidence
(subjective) - Flanking markers can also be used to fine-map the
linkage interval
48Determine the linkage interval
49Exercise 2 find the linkage interval
50Post linkage
- Create a list of all the genes within the linkage
interval - NCBI map viewer
- UCSC (also for non-coding RNAs)
- Evaluate known gene functions for relevance to
the investigated phenotype - Sequence genes
- Start with those that seem the most relevant to
the disorder - Start with the coding regions
- Finding a mutation and proving its causality is
the ultimate proof