Linkage analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Linkage analysis


Slides: 57
Provided by: pet5193

Transcript and Presenter's Notes

Title: Linkage analysis


1
Linkage analysis
6
  • Jan Hellemans

2
Finding causal mutations
  • 2 opposing strategies
  • sequence then select
  • select then sequence
  • Sequencing
  • traditional Sanger sequencing only possible after
    selection
  • Massively parallel sequencing possible prior to
    or after selection
  • RNA sequencing
  • exome sequencing
  • genome sequencing

3
Finding causal mutations
  • Selection
  • positional (prior to sequencing)
  • linkage analysis
  • GWAS
  • structural variations (e.g. microdeletions)
  • functional (prior to or after sequencing)
  • candidate genes selected based on known function
    or involvement in related disorders
  • filtering of variants based on functional
    predictions
  • overlap (after sequencing)
  • looking for genes / variants that occur in
    multiple independent patients
  • mostly a combination is used

4
exome sequencing
5
Aims
  • Interpret microsatellite results
  • Add genotypes to pedigrees
  • Create pedigree and genotype files
  • Calculate and interpret LOD-scores
  • Delineate linkage intervals
  • Basic principles of linkage analysis
  • Analyze other types of markers
  • Association studies
  • Learn how to work with specific pedigree programs

6
Starting linkage analysis
7
Preparations
  • Clearly define the phenotype
  • If it is not specific enough, you may be analyzing
    different disorders that map to different
    genomic loci
  • LOD scores are added across families, so families
    with a different disease locus dilute the evidence
  • Find suitable families
  • larger is better
  • more patients is better
  • Collect genomic DNA from as many family members
    as possible
  • Determine the type of inheritance
  • Calculate the power to prove linkage with the
    available material (SLink not part of this
    course)

8
Linkage analysis types
  • Directed linkage analysis
  • Evaluate linkage at a specific locus such as a
    candidate gene
  • Common approach: evaluate an intragenic, a 5′ and a
    3′ flanking marker (often microsatellites)
  • Genome wide linkage analysis
  • Screen for linkage for markers spread across the
    entire genome
  • Microsatellites: about 400 markers spaced at about
    10cM
  • SNPs: 500k SNP array
  • Homozygosity mapping
  • Screen only affected individuals in inbred
    families
  • Select homozygous markers (typically SNP markers)
  • Very efficient technology
  • Fine mapping
  • Some linked markers are known, but the borders of
    the linkage interval still need to be defined
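Homozygosity mapping rests on the expectation that, in an inbred family, an affected individual is homozygous over a stretch of chromosome around the recessive mutation. A minimal sketch that scans ordered SNP calls for runs of consecutive homozygous genotypes is given here. The "AA"/"AB"/"BB" encoding, the `homozygous_runs` function and the `min_markers` threshold are hypothetical; real tools also tolerate occasional genotyping errors and usually apply a minimum run length in Mb.

```python
from typing import List, Optional, Tuple

# Hypothetical SNP encoding: "AA"/"BB" = homozygous, "AB" = heterozygous.
# Any other call (e.g. "NN", a no-call) also ends a run in this sketch.
HOMOZYGOUS = {"AA", "BB"}


def homozygous_runs(positions_mb: List[float], genotypes: List[str],
                    min_markers: int = 3) -> List[Tuple[float, float]]:
    """Return (start_mb, end_mb) for runs of consecutive homozygous SNPs."""
    runs: List[Tuple[float, float]] = []
    start: Optional[float] = None
    end: Optional[float] = None
    count = 0
    for pos, gt in zip(positions_mb, genotypes):
        if gt in HOMOZYGOUS:
            if start is None:          # a new run begins at this SNP
                start, count = pos, 0
            end = pos
            count += 1
        else:                          # a non-homozygous call ends the run
            if start is not None and count >= min_markers:
                runs.append((start, end))
            start = None
    if start is not None and count >= min_markers:   # run reaching the last SNP
        runs.append((start, end))
    return runs


positions = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
calls = ["AA", "BB", "AA", "AB", "AA", "AA", "BB", "AA"]
print(homozygous_runs(positions, calls))  # [(1.0, 3.0), (5.0, 8.0)]
```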

9
Exercise Part 1
  • 2 inbred families with a recessive disorder
  • With homozygosity mapping based on 500k SNP
    arrays, 2 candidate regions could be identified
  • Chromosome 4
  • Patient 1 homozygous for
  • 6.052Mb - 14.488Mb
  • 21.008Mb - 37.477Mb
  • Patient 2 homozygous for
  • 11.186Mb - 37.219Mb
  • Task: find microsatellite markers to confirm
    linkage
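Assuming both families carry mutations in the same gene, the candidate regions are the stretches where both patients are homozygous, i.e. the overlaps of their homozygous intervals. A minimal sketch that intersects the chromosome 4 coordinates listed above (the `shared_regions` helper is hypothetical):

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_mb, end_mb)


def shared_regions(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlaps between two lists of homozygous intervals."""
    overlaps = []
    for s1, e1 in a:
        for s2, e2 in b:
            start, end = max(s1, s2), min(e1, e2)
            if start < end:              # keep only true overlaps
                overlaps.append((start, end))
    return sorted(overlaps)


patient1 = [(6.052, 14.488), (21.008, 37.477)]
patient2 = [(11.186, 37.219)]
print(shared_regions(patient1, patient2))
# [(11.186, 14.488), (21.008, 37.219)]
```

Under that assumption this gives the 2 candidate regions: about 11.2-14.5 Mb and 21.0-37.2 Mb.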

10
Find additional flanking markers
  • Find the physical position of a marker in NCBI >
    UniSTS
  • NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/
  • Go to Homo sapiens and to the right chromosome
  • Maps & Options: show
  • DeCode, Généthon, Marshfield (genetic maps)
  • Genes
  • Set the region, e.g. 2Mb up- and downstream of your
    marker
  • Click Data as table view
  • Click on STS behind a marker to see its details
  • Select markers that
  • locate to only 1 genomic location
  • have a PCR product with an extended size range
    (only one size → not polymorphic)

11
http://www.ncbi.nlm.nih.gov/projects/mapview
12
http://www.ncbi.nlm.nih.gov/projects/mapview
13
http://www.ncbi.nlm.nih.gov/projects/mapview
14
Exercise Part 1 > possible solution
  • Markers in 1st candidate region
  • D4S3017 (21.078Mb)
  • D4S3044 (25.189Mb)
  • D4S1618 (33.857Mb)
  • D4S3350 (33.857Mb)
  • D4S2988 (36.889Mb)
  • Markers in 2nd candidate region
  • D4S1582 (10.311Mb)
  • D4S2906 (12.321Mb)
  • D4S2944 (13.141Mb)
  • D4S1602 (14.059Mb)
  • D4S2960 (15.437Mb)
  • → Order primers and analyze them on all family
    members

15
Analyzing microsatellite data
16
Microsatellites > basics
  • Repeats of short sequences (e.g. 2bp):
    NNNNAC(AC)nACNNNN
  • Number of repeats is variable (unstable sequence)
  • Number of repeats determines the allele
  • Number of repeats corresponds to a specific length
    of the PCR product
  • allele 1: NNNNACACACACACNNNN (5 × AC → 18bp)
  • allele 2: NNNNACACACACACACNNNN (6 × AC → 20bp)
  • allele 3: NNNNACACACACACACACNNNN (7 × AC → 22bp)
  • ...
  • Determine the length to know the allele (sequencer)
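The link between repeat number and product length is plain arithmetic. A toy sketch using the example alleles above (4 bp flanks on each side, AC repeat); the function names are hypothetical, and real products have longer, primer-defined flanks:

```python
def product_length(n_repeats: int, unit: str = "AC",
                   flank5: str = "NNNN", flank3: str = "NNNN") -> int:
    """Length (bp) of a PCR product carrying n_repeats copies of the repeat unit."""
    return len(flank5) + n_repeats * len(unit) + len(flank3)


def repeats_from_length(length_bp: int, unit_len: int = 2, flank_total: int = 8) -> int:
    """Invert product_length: number of repeat units in a product of length_bp."""
    n, remainder = divmod(length_bp - flank_total, unit_len)
    if remainder:
        raise ValueError("length is not consistent with a whole number of repeats")
    return n


print([product_length(n) for n in (5, 6, 7)])  # [18, 20, 22]
print(repeats_from_length(20))                  # 6
```

Measured sizes are fractional, so in practice alleles are assigned by binning measured sizes rather than by this exact inversion.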

17
Microsatellites > basics
18
Microsatellites > determine size
  • Use an internal size standard (labeled with a
    different dye color)

[Figure: electropherogram with peaks labeled 220bp, 225bp and 230bp]
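Because the size-standard fragments have known lengths, a sample peak's migration position can be converted into base pairs by interpolating between the two standard peaks that bracket it. A simplified sketch using linear interpolation; the scan positions and the `size_peak` helper are hypothetical, and dedicated software such as GeneMapper offers more elaborate sizing methods:

```python
from typing import List, Tuple


def size_peak(scan: float, standard: List[Tuple[float, float]]) -> float:
    """Estimate a peak's size (bp) from internal size-standard peaks.

    standard: (scan_position, known_size_bp) pairs of the size-standard peaks,
    sorted by scan position. Linear interpolation between the two standard
    peaks that bracket the sample peak.
    """
    for (x0, s0), (x1, s1) in zip(standard, standard[1:]):
        if x0 <= scan <= x1:
            return s0 + (scan - x0) * (s1 - s0) / (x1 - x0)
    raise ValueError("peak lies outside the size-standard range")


ladder = [(3900.0, 210.0), (4000.0, 220.0), (4200.0, 230.0)]  # hypothetical positions
print(size_peak(4100.0, ladder))  # 225.0
```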
19
Microsatellites > heterozygotes
[Figure: electropherogram with peaks labeled 220bp, 223bp, 225bp and 230bp]
20
Microsatellites > stutter peaks
  • Repeats are difficult to copy → the polymerase slips
  • Some amplicons have 1 repeat less; a few even
    lose multiple repeats
  • Short repeat units are more prone to slippage and
    show more pronounced stutter peaks
  • The largest product of a stutter series is the
    correct allele
  • Distance between peaks = length of one repeat
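The rules above (stutter lies one or more repeat units below the allele and is lower) can be turned into a simple filter. A heuristic sketch only: the `remove_stutter` function, the size tolerance and the 0.7 height ratio are assumptions, since real stutter ratios differ per marker. Two alleles of similar height one repeat apart are both kept by this rule.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (size_bp, height)


def remove_stutter(peaks: List[Peak], repeat_len: float = 2.0,
                   tol: float = 0.5, max_ratio: float = 0.7) -> List[Peak]:
    """Drop peaks that look like stutter of a larger, taller peak.

    A peak counts as stutter when another peak lies 1 or 2 repeat units
    above it and the peak is lower than max_ratio times that peak's height.
    """
    def is_stutter(size: float, height: float) -> bool:
        return any(
            abs(other_size - k * repeat_len - size) <= tol
            and height < max_ratio * other_height
            for other_size, other_height in peaks
            for k in (1, 2)
        )

    return [(s, h) for s, h in peaks if not is_stutter(s, h)]


peaks = [(225.0, 1000), (223.0, 300), (221.0, 80), (231.0, 900), (229.0, 250)]
print(remove_stutter(peaks))  # [(225.0, 1000), (231.0, 900)]
```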

21
Microsatellites > stutter peaks
[Figure: electropherogram with an allelic peak, a 1st stutter peak and a 2nd stutter peak]
22
Microsatellites > stutter peaks
  • Allelic peaks are the highest
  • Stutter peaks are lower

[Figure: electropherogram with two alleles labeled A1 and A2]
23
Microsatellites > stutter peaks
[Figure: electropherogram with two alleles labeled A1 and A2]
24
Microsatellites > A peaks
  • Taq polymerase tends to add an extra A at the 3′
    end
  • A variable proportion of the products carries this
    extra A
  • Do not confuse with stutter peaks (only 1bp
    difference)

[Figure: electropherogram labeling the allelic peak and the 1st and 2nd stutter peaks, each with its +A peak]
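Since the extra A makes each product appear as a pair of peaks 1 bp apart, one way to simplify a plot is to merge every +A peak with its partner without the A before interpreting stutter. A sketch under stated assumptions: the `merge_plus_a` function is hypothetical, and reporting the +A size is an arbitrary but consistent convention (alleles are binned afterwards anyway). Pairing starts from the largest peak, because the +A product of the allele is the largest product in the cluster.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (size_bp, height)


def merge_plus_a(peaks: List[Peak], tol: float = 0.3) -> List[Peak]:
    """Merge each peak with a peak about 1 bp below it (the product without the extra A).

    Works from the largest size downwards; a merged peak keeps the larger (+A)
    size and the summed height. Unpaired peaks are kept as they are.
    """
    remaining = sorted(peaks, reverse=True)   # largest size first
    merged: List[Peak] = []
    i = 0
    while i < len(remaining):
        size, height = remaining[i]
        if i + 1 < len(remaining) and abs(size - remaining[i + 1][0] - 1.0) <= tol:
            merged.append((size, height + remaining[i + 1][1]))
            i += 2                            # both peaks consumed
        else:
            merged.append((size, height))
            i += 1
    return sorted(merged)


raw = [(222.0, 100), (223.0, 150), (224.0, 400), (225.0, 600)]
print(merge_plus_a(raw))  # [(223.0, 250), (225.0, 1000)]
```

The merged peaks can then be checked for stutter as on the previous slides.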
25
Microsatellites > complex plots (stutter + A)
[Figure: electropherogram with two alleles labeled A1 and A2]
26
Microsatellites > multiplex
  • Combine multiple markers in a single analysis
  • Different size ranges
  • Multiple dye colors
  • Commercial kits, e.g. 16 markers per lane

27
Microsatellite plots examples
28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
Genotyping pedigrees
37
Genotyping pedigrees
  • Screen one or multiple markers for some or all
    family members
  • For every marker
  • Make a list of all occurring allele sizes
  • Due to technical variation in sizing, the same
    allele can have a slightly different size in
    different measurements (-0.4bp to +0.4bp). Give
    all alleles within this range the same allele
    number
  • Add the allele numbers to the pedigree at the
    corresponding individual/marker combination
  • Find the right phase
  • Advanced software like GeneMapper can generate
    tables with allele numbers for every sample /
    marker
  • Advanced pedigree programs like Progeny can store
    genotype information for family members
  • Verify inheritance
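Giving all sizes within the ±0.4bp tolerance the same allele number is a small binning step. A minimal sketch, assuming that measurements of one allele stay within a 0.8bp window starting at the smallest measurement in the bin and that allele numbers run from the shortest to the longest allele; both conventions and the `bin_alleles` function are hypothetical (software such as GeneMapper works with predefined bins instead).

```python
from typing import Dict, List, Optional


def bin_alleles(sizes: List[float], window: float = 0.4) -> List[int]:
    """Give measured sizes of the same allele one allele number.

    Sorted sizes within 2 * window bp of a bin's smallest size share that bin;
    bins are numbered 1, 2, 3, ... from the shortest to the longest allele.
    """
    number: Dict[float, int] = {}
    bin_start: Optional[float] = None
    current = 0
    for size in sorted(set(sizes)):
        if bin_start is None or size - bin_start > 2 * window:
            current += 1                 # open a new bin
            bin_start = size
        number[size] = current
    return [number[s] for s in sizes]    # allele numbers in input order


measured = [222.8, 223.1, 225.0, 224.7, 222.9, 231.2]
print(bin_alleles(measured))  # [1, 1, 2, 2, 1, 3]
```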

38
Exercise Part 2
  • Genotype 3 markers in all available individuals
    of 2 families
  • Pedigrees and microsatellite plots: in
    ExercisePart2-GenotypingData.pdf
  • Add allele numbers for the 3 markers to the
    pedigree
  • Interpret the genotyped pedigrees: linked?

39
Family 1
40
Family 2
41
Exercise Part 2 > Conclusions
  • D4S1582
  • Mendelian error → cannot be interpreted
  • D4S2944
  • Linked
  • D4S3017
  • Not linked: unaffected individuals have the same
    genotype as a patient
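A Mendelian error, as seen for D4S1582, means a child carries an allele that one of its parents cannot have transmitted. A simplified single-marker trio check is sketched here; the function names are hypothetical, 0 marks an unknown allele as in the genotype files, and whole-pedigree checking is what PedCheck (listed among the EasyLinkage programs) is meant for.

```python
from typing import Tuple

Genotype = Tuple[int, int]  # allele numbers; 0 = unknown


def can_transmit(parent: Genotype, allele: int) -> bool:
    """True if the parent could have passed on this allele (untyped parent: always)."""
    return 0 in parent or allele in parent


def mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Check one child against its parents for a single autosomal marker."""
    if 0 in child:
        return True                      # untyped child: nothing to check
    a, b = child
    return ((can_transmit(father, a) and can_transmit(mother, b))
            or (can_transmit(father, b) and can_transmit(mother, a)))


print(mendelian_consistent((1, 2), (1, 3), (2, 4)))  # True
print(mendelian_consistent((2, 2), (1, 2), (1, 3)))  # False: mother has no allele 2
```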

42
Calculate LOD scores
43
EasyLinkage
  • EasyLinkage: user interface for linkage analysis
  • http://genetik.charite.de/hoffmann/easyLINKAGE/index.html#start
  • Bioinformatics. 2005 Feb 1;21(3):405-7. PMID:
    15347576
  • Bioinformatics. 2005 Sep 1;21(17):3565-7. PMID:
    16014370
  • Interface for many linkage analysis programs
  • Input
  • Pedigree file (linkage format)
  • Genotype file(s)
  • Marker information (already provided for popular
    markers)
  • Settings

44
Pedigree file
  • Naming requirement for EasyLinkage: p_xxx.pro,
    e.g. p_SMMD.pro
  • Format
  • Tab delimited text file
  • 1 individual per row
  • Columns
  • 1: family ID
  • 2: person ID
  • 3: father ID
  • 4: mother ID
  • 5: sex (1 = male, 2 = female, 0 = unknown)
  • 6: affection status (1 = unaffected, 2 = affected,
    0 = unknown)
  • 7: DNA availability (optional, relevant for
    power calculations)
  • 8: liability class (to be provided if multiple
    liability classes are used)
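A pedigree file in this layout can be generated from a table of individuals. A sketch writing only the 6 mandatory columns; the nuclear family is hypothetical, the helper names are illustrative, and the use of 0 as father/mother ID for founders follows the usual LINKAGE convention (an assumption here, check the EasyLinkage manual).

```python
from typing import List, Tuple

# (family, person, father, mother, sex, affection); "0" = founder / parent not in file
PedRow = Tuple[str, str, str, str, int, int]


def format_pedigree(rows: List[PedRow]) -> str:
    """Render rows as tab-delimited pedigree text after basic sanity checks."""
    ids = {(r[0], r[1]) for r in rows}
    for fam, person, father, mother, sex, status in rows:
        if sex not in (0, 1, 2) or status not in (0, 1, 2):
            raise ValueError(f"invalid sex/affection code for {person}")
        for parent in (father, mother):
            if parent != "0" and (fam, parent) not in ids:
                raise ValueError(f"parent {parent} of {person} is not listed")
    return "".join("\t".join(str(v) for v in row) + "\n" for row in rows)


def write_pedigree(path: str, rows: List[PedRow]) -> None:
    """Write the pedigree text to path with Unix line endings."""
    with open(path, "w", newline="\n") as fh:
        fh.write(format_pedigree(rows))


rows = [
    ("1", "1", "0", "0", 1, 1),   # father, unaffected
    ("1", "2", "0", "0", 2, 1),   # mother, unaffected
    ("1", "3", "1", "2", 1, 2),   # affected son
    ("1", "4", "1", "2", 2, 1),   # unaffected daughter
]
print(format_pedigree(rows), end="")
# write_pedigree("p_SMMD.pro", rows)   # file name following the p_xxx.pro rule
```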

45
Genotype files
  • Person IDs have to match exactly with those
    provided in the pedigree file
  • Naming requirement for EasyLinkage:
    MarkerName_xxx.abi, e.g. D1S1609_SMMD.abi
  • Format
  • Tab delimited text file
  • 1 individual per row
  • Columns (for microsatellite based analysis)
  • 1: marker (same as in the file name and matching a
    marker in an available marker set)
  • 2: custom information (content doesn't matter,
    but the column must be present)
  • 3: individual ID (matching a person ID in the
    pedigree file)
  • 4, 5: genotypes for the 2 alleles (unknown = 0)
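The genotype file follows the same pattern, one file per marker. A sketch for the column layout above; the genotypes are hypothetical, "-" as filler for the custom-information column is an arbitrary choice, and the helper names are illustrative.

```python
from typing import Dict, Tuple


def genotype_filename(marker: str, project: str) -> str:
    """File name following the MarkerName_xxx.abi rule, e.g. D1S1609_SMMD.abi."""
    return f"{marker}_{project}.abi"


def format_genotypes(marker: str, genotypes: Dict[str, Tuple[int, int]],
                     info: str = "-") -> str:
    """One tab-delimited row per individual: marker, custom info, ID, allele 1, allele 2.

    Person IDs must match the pedigree file exactly; 0 marks an unknown allele.
    """
    lines = [f"{marker}\t{info}\t{pid}\t{a1}\t{a2}"
             for pid, (a1, a2) in genotypes.items()]
    return "\n".join(lines) + "\n"


calls = {"1": (1, 3), "2": (2, 4), "3": (1, 2), "4": (0, 0)}  # hypothetical genotypes
print(genotype_filename("D1S1609", "SMMD"))
print(format_genotypes("D1S1609", calls), end="")
```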

46
Marker information
  • Contains information on the chromosome and
    position of every marker
  • Already available for a number of commercial
    SNP-arrays and for the microsatellite markers
    from
  • Genethon
  • Marshfield
  • DeCode
  • Custom marker sets can be created (see manual)

47
EasyLinkage settings
  • Choose a program
  • FastLink: Parametric, single-point
  • SuperLink: Parametric, single-/multipoint
  • SPLink: Nonparametric, single-point
  • Genehunter: Nonpara-/parametric,
    single-/multipoint
  • Genehunter Plus: Nonpara-/parametric,
    single-/multipoint
  • Genehunter MOD: Nonpara-/parametric,
    single-/multipoint
  • Genehunter Imprinting: Nonpara-/parametric,
    single-/multipoint
  • GeneHunter TwoLocus: Parametric, two-locus,
    single-/multipoint
  • Merlin: Nonpara-/parametric, single-/multipoint
  • SimWalk: Nonparametric, single-/multipoint
  • Allegro: Nonpara-/parametric, single-/multipoint
    simulation, single-/multi-point
  • PedCheck: Mendelian error check
  • FastSLink: Simulation, single-/multi-point

48
EasyLinkage settings
  • Parametric vs. non-parametric
  • Single-point vs. multipoint
  • Frequency of the disease allele
  • Penetrance vectors (wt/wt, wt/mt, mt/mt)
  • Standard dominant: 0 1 1
  • Standard recessive: 0 0 1
  • Reduced penetrance: replace 1 by the penetrance
    (e.g. 0.9)
  • Phenocopies: replace 0 by the phenocopy rate
    (e.g. 0.1)
  • Example 0.01 0.9 0.99: 1% chance to show a
    similar phenotype despite a normal genotype; 90%
    chance to show the phenotype with 1 mutant allele
    (dominant with incomplete penetrance); 99%
    likelihood to present with the phenotype if both
    alleles are mutant
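The disease allele frequency q and the penetrance vector (f0, f1, f2) together fix how common the phenotype is expected to be. Under Hardy-Weinberg equilibrium, with p = 1 - q, the probability of being affected is K = p²·f0 + 2pq·f1 + q²·f2. A small sketch of that formula; q = 0.001 is an illustrative value, not one from the course, and `prevalence` is a hypothetical helper.

```python
from typing import Tuple


def prevalence(q: float, penetrance: Tuple[float, float, float]) -> float:
    """P(affected) in a population under Hardy-Weinberg equilibrium.

    q: disease allele frequency; penetrance: (f_wt/wt, f_wt/mt, f_mt/mt).
    """
    f0, f1, f2 = penetrance
    p = 1.0 - q
    return p * p * f0 + 2 * p * q * f1 + q * q * f2


print(prevalence(0.01, (0, 0, 1)))                      # standard recessive: q**2
print(round(prevalence(0.001, (0.01, 0.9, 0.99)), 6))   # 0.011779
```

With these illustrative numbers the phenocopy term (p²·0.01, about 0.00998) makes up most of K, which shows how strongly a phenocopy rate shapes the model.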

49
Evaluate calculated LOD-scores
  • Maximum LOD-scores can be seen in EasyLinkage
  • Details about LOD-scores at different
    recombination fractions can be found in text
    files generated by EasyLinkage → process in Excel
    (generate graphs, ...)
  • Standard rules for LOD-scores
  • > 3 → significant linkage
  • 2 < LOD < 3 → suggestive linkage
  • -2 < LOD < 2 → uninformative
  • < -2 → significant absence of linkage
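For intuition about these thresholds, the textbook two-point LOD score for phase-known, fully informative meioses with R recombinants and NR non-recombinants is LOD(θ) = log10[θ^R (1 - θ)^NR / 0.5^(R + NR)]. A sketch under that assumption; the meiosis counts are hypothetical, and programs such as SuperLink compute likelihoods over complete pedigrees, so their values will differ. It also shows how LOD scores of independent families add up.

```python
import math


def lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return -math.inf                     # a recombinant excludes theta = 0
    log_l = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_l += recombinants * math.log10(theta)
    # likelihood under free recombination (theta = 0.5) is 0.5 ** n
    return log_l + (recombinants + nonrecombinants) * math.log10(2.0)


def interpret(score: float) -> str:
    """Apply the rule-of-thumb LOD thresholds from the slide."""
    if score > 3:
        return "significant linkage"
    if score > 2:
        return "suggestive linkage"
    if score >= -2:
        return "uninformative"
    return "significant absence of linkage"


family_a = lod(0.0, 0, 10)   # 10 meioses, no recombinant
family_b = lod(0.0, 0, 5)    # 5 meioses, no recombinant
print(round(family_a, 3), interpret(family_a))                        # 3.01 significant linkage
print(round(family_a + family_b, 3), interpret(family_a + family_b))  # 4.515 significant linkage
```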

50
Interpreting LOD plots
51
Exercise Part 3
  • Generate one pedigree file containing all family
    members of both families (use Global IDs)
  • Generate a genotype file for each of the tested
    markers
  • Run SuperLink analysis with the right settings
  • Evaluate results

52
Exercise Part 3 > Results
53
Strengthen the evidence
  • Analyze more family members
  • Analyze more families
  • Analyze flanking markers
  • Look for more informative markers that result in
    higher LOD-scores
  • A series of flanking markers allows for
    multipoint linkage analysis
  • A series of linked markers gives more confidence
    (subjective)
  • Flanking markers can also be used to fine-map the
    linkage interval
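With a series of ordered flanking markers, the linkage interval is bounded on each side by the nearest marker at which a recombination is observed. A minimal sketch, assuming the per-marker "recombinant" flags have already been derived from the haplotypes in the pedigree; the marker names, positions, flags and the `linkage_interval` function are all hypothetical.

```python
from typing import List, Optional, Tuple

Marker = Tuple[str, float, bool]  # (name, position_mb, recombinant_observed)
Flank = Optional[Tuple[str, float]]


def linkage_interval(markers: List[Marker], core: str) -> Tuple[Flank, Flank]:
    """Return the closest recombinant marker on each side of a linked core marker.

    None on a side means no recombinant marker was found there, so the
    interval is still open on that side.
    """
    ordered = sorted(markers, key=lambda m: m[1])
    idx = [m[0] for m in ordered].index(core)
    left = next(((n, p) for n, p, rec in reversed(ordered[:idx]) if rec), None)
    right = next(((n, p) for n, p, rec in ordered[idx + 1:] if rec), None)
    return left, right


markers = [("M1", 10.0, True), ("M2", 12.0, False), ("M3", 13.5, False),
           ("M4", 15.0, False), ("M5", 18.0, True), ("M6", 20.0, True)]
print(linkage_interval(markers, "M3"))  # (('M1', 10.0), ('M5', 18.0))
```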

54
Determine the linkage interval
55
Exercise 2: find the linkage interval
56
Post linkage
  • Create a list of all the genes within the linkage
    interval
  • NCBI map viewer
  • UCSC (also for non-coding RNAs)
  • Evaluate known gene functions for relevance to
    the investigated phenotype
  • Sequence genes
  • Start with those that seem the most relevant to
    the disorder
  • Start with the coding regions
  • Screen the entire region with capture sequencing
  • Finding a mutation and proving its causality is
    the ultimate proof