Title: MOLECULAR MARKER TECHNOLOGIES
1 MOLECULAR MARKER TECHNOLOGIES
- Training Workshop on Forest Biodiversity
- 5-16 June 2006
Lee Soon Leong, Forest Research Institute Malaysia
2 Outline
- Organization and flow of genetic information
- Molecular techniques to reveal genetic variation
- Types of molecular markers
- Which marker for what purpose
- Microsatellite markers
- Case study 1: using microsatellites to estimate gene flow via pollen
- Case study 2: using microsatellites for individual-specific DNA fingerprints
3 FLOW OF GENETIC INFORMATION
4 Deoxyribonucleic Acid (DNA): the molecule that encodes genetic information
A pairs with T; C pairs with G
The DNA molecule consists of two strands that wrap around each other to resemble a twisted ladder
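The pairing rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the reversal step assumes the standard convention that the two strands run antiparallel, which the slide does not state.

```python
# Watson-Crick pairing rules stated on the slide: A-T and C-G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction.

    Assumes antiparallel strands (standard convention, not on the slide).
    """
    return complement(strand)[::-1]


print(complement("CTCTGA"))          # GAGACT
print(reverse_complement("CTCTGA"))  # TCAGAG
```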
6 - Organisms' genomic DNA is subject to mutation as a result of normal cellular operations or interactions with the environment
7 - Mutations in genomic DNA can be classified into several categories
8 Through long evolutionary accumulation, many different instances of the mutation types mentioned above should exist in any given species
The number and degree of the various types of mutations define the genetic diversity within a species
It is widely recognized that loss of genetic diversity is a major threat to the maintenance and adaptive potential of species
10 - For many plant species, ex situ and in situ conservation strategies have been developed to safeguard the extant genetic diversity
- To manage this genetic diversity effectively, the ability to identify genetic variation is indispensable
- In addition, for this variation to be useful, it must be heritable and discernible, either as recognizable phenotypic variation or as genetic mutation distinguishable through molecular marker technologies
11 Definition of molecular markers
A sequence of DNA or protein that can be screened to reveal key attributes of its state or composition, and thus used to reveal genetic variation
12 - Four major molecular techniques are commonly applied to reveal genetic variation:
- Polymerase chain reaction (PCR)
- Electrophoresis
- Hybridization
- DNA sequencing
13 POLYMERASE CHAIN REACTION
The method was invented by Kary Banks Mullis in 1983, for which he received the Nobel Prize in Chemistry ten years later
Three temperature-controlled steps per cycle: denaturation, annealing and extension
14 ELECTROPHORESIS
A technique for separating the components of a mixture of charged molecules (proteins, DNA, or RNA) in an electric field within a gel or other support
Migration rate depends on electrical charge and size
15 HYBRIDIZATION
One of the most commonly used nucleic acid hybridization techniques is Southern blot hybridization
Southern blotting is named after Edward M. Southern, who developed the procedure at Edinburgh University in 1975
16 SEQUENCING
In 1977, 24 years after the discovery of the structure of DNA, two separate methods for sequencing DNA were developed: the chain termination method and the chemical degradation method
17 Recent detection techniques
TaqMan: a probe used to detect specific sequences in PCR products by employing the 5' to 3' exonuclease activity of Taq DNA polymerase
Pyrosequencing: sequencing by synthesis, a simple-to-use technique for accurate analysis of DNA sequences
Microarray technology: a high-throughput screening technique based on hybridization between oligonucleotide probes (genomic DNA or cDNA) and either DNA or mRNA
18 TYPES OF MOLECULAR MARKERS
- Due to rapid developments in the field of molecular genetics, a variety of molecular markers have emerged during the last few decades
19 Allozyme (biochemical marker)
- Alternative forms of a particular protein, visualized on a gel as bands of different mobility. Polymorphism is due to mutation: when an amino acid has been replaced, the net electric charge of the protein may have been altered
Techniques: electrophoresis and enzyme staining
20 RFLP (non-PCR-based marker)
- Targets variation in DNA restriction sites and in DNA restriction fragments. Sequence variation affecting the occurrence (absence or presence) of endonuclease recognition sites is considered to be the main cause of length polymorphisms
Techniques: electrophoresis and hybridization
21 RAPD (PCR-based marker)
- Uses primers of random sequence to amplify DNA fragments by PCR. Polymorphisms are considered to be primarily due to variation in the primer annealing sites, but they can also be generated by length differences in the amplified sequence between primer annealing sites
Techniques: PCR and electrophoresis
22 AFLP (PCR-based marker)
- A variant of RAPD. Following restriction enzyme digestion of DNA, a subset of DNA fragments is selected for PCR amplification and visualization
Techniques: PCR and electrophoresis
23 Microsatellite (PCR-based marker)
- Targets tandem repeats of a small (1-6 base pairs) nucleotide repeat motif. Polymorphism is due to the number of tandem repeats
Techniques: PCR and electrophoresis
24 - Other markers
- Cleaved Amplified Polymorphic Sequence (CAPS/PCR-RFLP)
- Inter Simple Sequence Repeat (ISSR)
- Single-Strand Conformation Polymorphism (SSCP)
- Sequence Characterized Amplified Region (SCAR)
- More recent markers
- Single-Nucleotide Polymorphism (SNP)
- Retrotransposon-based markers
- Sequence-Specific Amplified Polymorphism (S-SAP)
- Inter-Retrotransposon Amplified Polymorphism (IRAP)
- Retrotransposon-Microsatellite Amplified Polymorphism (REMAP)
- Retrotransposon-Based Insertional Polymorphism (RBIP)
25 Weising, K., Nybom, H., Wolff, K. and Kahl, G. 2005. DNA Fingerprinting in Plants: Principles, Methods, and Applications. 2nd Edition. CRC Press, Boca Raton, Florida, USA.
Spooner, D., van Treuren, R. and de Vicente, M.C. 2005. Molecular markers for genebank management. IPGRI Technical Bulletin No. 10. International Plant Genetic Resources Institute, Rome, Italy.
Henry, R.J. 2001. Plant Genotyping: The DNA Fingerprinting of Plants. CAB International Publishing, Wallingford, U.K.
26 Markers differ with respect to important features
27 Dominant marker: a marker that shows dominant inheritance, with homozygous dominant individuals indistinguishable from heterozygous individuals
Codominant marker: a marker in which both alleles are expressed, so heterozygous individuals can be distinguished from either homozygous state
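A minimal sketch of what the two definitions mean for scoring, assuming a single locus with a band-producing allele "A" and a non-amplifying allele "a" (both labels and both function names are hypothetical). The dominant score collapses AA and Aa into the same observation, while the codominant score keeps all three genotypes apart.

```python
def codominant_score(genotype):
    """Codominant marker: both alleles are visible, so the full genotype is scored."""
    a, b = sorted(genotype)
    return f"{a}/{b}"


def dominant_score(genotype, band_allele="A"):
    """Dominant marker: only presence (1) or absence (0) of the band is scored."""
    return 1 if band_allele in genotype else 0


# Hypothetical genotypes at one locus.
for g in [("A", "A"), ("A", "a"), ("a", "a")]:
    print(g, codominant_score(g), dominant_score(g))
```

With the dominant score, the first two genotypes both read as 1, which is why heterozygosity cannot be observed directly from markers such as RAPD or AFLP.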
28 None of the available techniques is superior to all others for a wide range of applications; the key question is rather which marker to use in which situation
- Within- and among-population variation: allozyme, SSR, AFLP and RAPD
- Mating system study: allozyme or microsatellite
- Estimating gene flow via pollen and seed: microsatellite (SSR)
- Clonal identification: AFLP or RAPD
- Polyploidy: multilocus dominant marker (AFLP)
- Genetic linkage mapping: AFLP, RAPD, allozyme, RFLP, SSR, CAPS, SNP
- Phylogenetic study: markers conserved within species (DNA sequencing)
29 - A framework for selecting appropriate techniques for plant genetic resources conservation can be found in:
Karp, A., Kresovich, S., Bhat, K.V., Ayad, W.G. and Hodgkin, T. 1997. Molecular Tools in Plant Genetic Resources Conservation: A Guide to the Technologies. IPGRI Technical Bulletin No. 2. International Plant Genetic Resources Institute, Rome, Italy.
30 Microsatellite markers
- What are microsatellites?
- Where are microsatellites found?
- How do microsatellites mutate?
- Abundance in genome
- Why do microsatellites exist?
- Models of mutation
- Development of microsatellite primers
- Genotyping procedure
- Advantages
- Disadvantages
- Applications
31 What are microsatellites?
- Tandemly repeated sequences with a repeat motif of 1-6 bp
- Dinucleotide: (CT)6 - CTCTCTCTCTCT
- Trinucleotide: (CTG)4 - CTGCTGCTGCTG
- Tetranucleotide: (ACTC)4 - ACTCACTCACTCACTC
- Synonymous with SSR (simple sequence repeat) and STR (short tandem repeat). Depending on the nature of the repeat tract, SSRs can be further divided into four categories:
Perfect repeat: the repeat tract is pure for one motif, e.g. CTCTCTCTCTCT
Compound SSR: the repeat tract is pure for two motifs, e.g. CTCTCTCACACA
Imperfect SSR: the repeat tract contains a single base substitution, e.g. CTCTCTACTCTCT
Region of cryptic simplicity: complex but repetitive structure, e.g. GTGTCACAGAGT
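As a rough illustration of the definition above, perfect repeats can be located with a regular expression that captures a 1-6 base motif and requires it to recur in tandem. The function name and the `min_repeats` cutoff are assumptions made for this sketch; dedicated SSR-mining tools use more careful rules.

```python
import re


def find_perfect_ssrs(seq, min_repeats=4):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    min_repeats is an illustrative cutoff, not a value from the slides.
    """
    # Shortest possible motif of 1-6 bases, followed by further tandem copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


print(find_perfect_ssrs("AAGCTCTCTCTCTCTGG"))             # (CT)6 embedded in flanking sequence
print(find_perfect_ssrs("CTCTCTACTCTCT", min_repeats=3))  # imperfect SSR splits into two tracts
```

The second call shows why imperfect SSRs need separate handling: the single inserted base breaks the tract into two shorter perfect runs.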
32 Where are microsatellites found?
The majority are in non-coding regions
33 How do microsatellites mutate?
- Microsatellite alleles change rather quickly over time
- E. coli: 10^-2 events per locus per replication
- Drosophila: 6 x 10^-6 events per locus per generation
- Human: 10^-3 events per locus per generation
34 Abundance in genome
35 Why do microsatellites exist?
- The majority are found in non-coding regions, thought to be under no selective pressure: "junk" DNA?
- They may regulate gene expression and protein function, e.g., human diseases caused by expansions of polymorphic trinucleotide repeats in genes: fragile X and myotonic dystrophy
- In plants, a high density of SSRs has been found in close proximity to coding regions, suggesting regulatory properties
- High level of polymorphism: a necessary source of genetic variation
36 Models of Mutation
- Several statistics based on estimates of allele frequencies (e.g., Fst and Rst) rely explicitly on a mutation model
- Size matters when doing statistical tests of population substructuring
37 Development of microsatellite primers
- Can be time consuming and expensive. Microsatellites may be obtained by screening sequences in databases or screening libraries of clones
- Standard method to isolate microsatellites from clones:
- Creation of a small-insert genomic library
- Library screening by hybridization
- DNA sequencing of positive clones
- Primer design and PCR analysis
- Identification of polymorphisms
- This approach can be extremely tedious and inefficient for species with low microsatellite frequencies
38 - Alternative strategies to overcome this:
- Selective hybridization using nylon membrane
- Selective hybridization using streptavidin-coated beads
- RAPD-based
- Primer extension
39 Genotyping procedure
PCR
40 - The use of fluorescently labeled primers, combined with automated electrophoresis systems, has greatly simplified the analysis of microsatellite allele sizes
41 Genotypes: 120/120, 122/122, 120/122, 120/124, 120/126, 120/128
43 Advantages
- Low quantities of template DNA required (10-100 ng)
- High genomic abundance
- Random distribution throughout the genome
- High level of polymorphism
- Band profiles can be interpreted in terms of loci and alleles
- Codominance of alleles
- Allele sizes can be determined with an accuracy of 1 bp, allowing accurate comparison across different gels
- High reproducibility
- Different SSRs may be multiplexed in PCR or on gel
- Wide range of applications
- Amenable to automation
44 Disadvantages
- High development costs if primers are not yet available. Primers might be species-specific
- Heterozygotes may be misclassified as homozygotes when null alleles occur due to mutation in the primer annealing sites
- Stutter bands on gels may complicate accurate scoring of polymorphisms
- Underlying mutation model (infinite alleles model or stepwise mutation model) largely unknown
- Homoplasy due to different forward and backward mutations may cause genetic divergence to be underestimated
45 Applications
Generally, their high mutation rate makes them informative and suitable for intraspecific studies but unsuitable for studies involving higher taxonomic levels
- Population genetics: investigations within a genus of centers of origin, genetic diversity, population structures and relationships among species
- Parentage analysis: seed orchard monitoring, mating systems and gene flow via pollen and seed
- Fingerprinting: clone confirmation and individual-specific fingerprints
- Genome mapping: constructing full-coverage or QTL maps
- Comparative mapping: genome structure, framework maps, or transferring trait and marker data among species
46 Case study 1: Using microsatellites to estimate gene flow via pollen
48 Shorea parvifolia
Shorea leprosula
49 Methodology
50 Microsatellite Loci
No. of clones sequenced | No. of clones with SSR (%) | No. of unique SSR clones (%) | Core sequence (no. of clones; %; repeat times)
624 | 592 (94.9) | 315 (53.2) | CT/GA (266; 84.4; 6-78), GT/CA (29; 9.2; 8-46), Others (20; 6.4; 6-40)

Locus | Primer sequence (5'-3') | Repeat motif | Length | N | Size range | He | PIC
lep074a | F: ATC ACC AAG TAC CTA TCA TCA; R: GCA ATG GCA CAC AGT CTA TC | (CT)11 | 124 | 11 | 110-130 | 0.824 | 0.791
lep079 | F: GTT GTC TGT TCT TAC CAG GAA G; R: GCA TAA GTA TCG TCG CCA | (CT)11 | 162 | 13 | 155-198 | 0.830 | 0.798
lep111a | F: GGA AAC TAC TGG AGC AGA GAC; R: GGT GGG TTA TGG AGA ATG AG | (GA)14 | 152 | 12 | 138-154 | 0.855 | 0.821
lep118 | F: AAA GCG TAC AAA TTC ATC A; R: CTA TTG GTT GGG TCA GAA GG | (GA)16 | 170 | 15 | 145-176 | 0.892 | 0.861
lep280 | F: GCA ACT AAA ATG GAC CAG A; R: GAG TAA GGT GGC AGA TAT AGA G | (CT)7 | 119 | 11 | 107-137 | 0.851 | 0.816
lep384 | F: CCA AGA CAA CTC AAT CCT CA; R: AGA TGA AGG TGT TGC TGT G | (CT)13 | 206 | 14 | 191-219 | 0.657 | 0.632
lep562 | F: TGA TTT GGG TGG TTG TAG; R: TAT TAC ATT TTT CAA GTC AAG TC | (GT)8 | 164 | 12 | 154-180 | 0.883 | 0.852

Lee, S.L. et al. 2004. Isolation and characterization of 21 microsatellite loci in an important tropical tree Shorea leprosula and their applicability to S. parvifolia. Molecular Ecology Notes 4: 222-225
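The He and PIC columns presumably denote expected heterozygosity and polymorphic information content. A minimal sketch of how both are commonly computed from allele frequencies is given below; the frequencies are hypothetical, and the formulas are the usual textbook forms (He without small-sample correction), which may differ from the estimator used in the cited paper.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross


# Hypothetical allele frequencies at one locus (must sum to 1).
freqs = [0.4, 0.3, 0.2, 0.1]
print(round(expected_heterozygosity(freqs), 4))  # 0.7
print(round(pic(freqs), 4))                      # 0.6454
```

PIC is always somewhat lower than He for the same frequencies, which matches the pattern in the table.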
51 50-ha demographic plot in Pasoh Forest Reserve
52 Pasoh Forest Reserve - 50-ha plot (190 individuals of S. leprosula and 102 of S. parvifolia, size cutoff 27 cm dbh, within the 50-ha plot)
56 - Shorea leprosula: 9 loci (Pe 0.999)
- lep074a, lep384, lep111a, lep118, lep280, lep267, lep294, lep475 and lep562
- PCR (500 x 9 = 4500 reactions)
- Shorea parvifolia: 6 loci (Pe 0.999)
- lep074a, lep384, lep111a, lep118, lep280 and lep294
- PCR (360 x 6 = 2160 reactions)
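A combined exclusion probability (Pe) across loci is usually obtained by multiplying the per-locus non-exclusion probabilities, assuming the loci are independent. The sketch below uses hypothetical per-locus values, since the slide reports only the combined figure, and also rechecks the reaction counts.

```python
from math import prod


def combined_exclusion(per_locus_pe):
    """Combined Pe = 1 - product(1 - Pe_i), assuming independent loci."""
    return 1.0 - prod(1.0 - pe for pe in per_locus_pe)


# Hypothetical per-locus exclusion probabilities for nine loci.
nine_loci = [0.55] * 9
print(round(combined_exclusion(nine_loci), 5))

# Reaction counts from the slide: samples x loci.
print(500 * 9, 360 * 6)  # 4500 2160
```

Even loci of moderate individual power combine quickly, which is why a handful of highly polymorphic microsatellites suffices for paternity assignment.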
57 S. leprosula (SL48)
MT48
58 S. parvifolia (SP35)
MT35
59 Mother tree (no. of seed analyzed) | Mean distance between MT (m) | % outcrossing (no. of seed) | % pollen outside plot (no. of seed) | Mean pollen flow distance (m)
Shorea leprosula
SL048 (45) | 267.1 ± 136.2 | 93.3 (42) | 20.0 (9) | 152.9 ± 99.6
SL062 (44) | 363.2 ± 151.6 | 88.6 (39) | 20.5 (9) | 302.6 ± 188.9
SL074 (48) | 259.2 ± 151.2 | 85.4 (41) | 18.8 (9) | 148.6 ± 187.2
SL075 (43) | 292.6 ± 145.8 | 67.4 (29) | 18.6 (8) | 173.1 ± 103.8
SL084 (46) | 512.6 ± 228.3 | 82.6 (38) | 23.9 (11) | 448.2 ± 245.3
SL109 (45) | 343.7 ± 158.8 | 95.6 (43) | 33.3 (15) | 285.0 ± 154.5
SL160 (44) | 567.1 ± 243.1 | 81.8 (36) | 31.8 (14) | 580.3 ± 288.4
Mean | 372.2 ± 121.6 | 85.0 ± 9.3 | 23.8 ± 6.2 | 298.7 ± 164.0
Shorea parvifolia
SP009 (32) | 309.0 ± 166.5 | 59.4 (19) | 9.4 (3) | 61.9 ± 100.5
SP014 (48) | 307.7 ± 165.1 | 62.5 (30) | 14.6 (7) | 105.1 ± 140.9
SP020 (42) | 348.7 ± 172.2 | 85.6 (36) | 33.3 (14) | 194.0 ± 146.7
SP022 (47) | 239.6 ± 133.2 | 72.3 (34) | 21.3 (10) | 148.2 ± 125.0
SP025 (46) | 376.2 ± 192.4 | 56.5 (26) | 19.6 (9) | 317.1 ± 277.0
SP035 (44) | 244.2 ± 139.9 | 22.7 (10) | 2.3 (1) | 185.0 ± 159.7
Mean | 304.2 ± 54.7 | 59.8 ± 21.1 | 16.8 ± 10.7 | 168.6 ± 88.1
60 Mother tree (no. of seed analyzed) | Breeding unit size (individuals) | Breeding unit area (ha) | Breeding unit radius (m)
Shorea leprosula
SL048 (45) | 203.6 | 63.6 | 450.1
SL062 (44) | 208.0 | 65.0 | 454.9
SL074 (48) | 205.0 | 64.1 | 451.6
SL075 (43) | 221.0 | 69.0 | 468.8
SL084 (46) | 225.2 | 70.4 | 473.3
SL109 (45) | 245.7 | 76.8 | 494.4
SL160 (44) | 261.8 | 81.8 | 510.3
Mean | 224.3 ± 22.1 | 70.1 ± 6.9 | 471.9 ± 23.0
Shorea parvifolia
SP009 (32) | 81.9 | 59.4 | 434.7
SP014 (48) | 90.0 | 65.2 | 455.6
SP020 (42) | 112.9 | 81.8 | 510.3
SP022 (47) | 97.8 | 70.8 | 474.8
SP025 (46) | 105.5 | 76.5 | 493.4
SP035 (44) | 76.7 | 55.6 | 420.5
Mean | 94.1 ± 13.9 | 68.2 ± 10.1 | 464.9 ± 34.5
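The area and radius columns appear to be linked by the area of a circle, and the size column by a constant density per hectare. This is an inference from the numbers rather than something the slide states; the short check below reproduces a few rows under that assumption.

```python
import math


def circle_area_ha(radius_m):
    """Area of a circle of the given radius, in hectares (1 ha = 10,000 m^2)."""
    return math.pi * radius_m ** 2 / 10_000


# (tree, size in individuals, area in ha, radius in m) taken from the table.
rows = [
    ("SL048", 203.6, 63.6, 450.1),
    ("SL160", 261.8, 81.8, 510.3),
    ("SP009", 81.9, 59.4, 434.7),
]
for tree, size, area, radius in rows:
    # Computed area from radius, and implied density (individuals per ha).
    print(tree, round(circle_area_ha(radius), 1), round(size / area, 2))
```

Under this reading, the implied densities are about 3.2 trees per hectare for S. leprosula and about 1.4 for S. parvifolia.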
61 Negative exponential curve
$y = a e^{-x/c}$
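One simple way to estimate a and c for such a curve is a least-squares fit on the log scale, since ln(y) = ln(a) - x/c is linear in x. The sketch below uses synthetic data generated from known parameters; the function name and data are hypothetical, and the slides do not say which fitting procedure was actually used.

```python
import math


def fit_negative_exponential(xs, ys):
    """Fit y = a * exp(-x / c) by linear least squares on ln(y).

    Requires all y values to be strictly positive.
    """
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    slope = sxy / sxx                      # slope equals -1 / c
    a = math.exp(mean_l - slope * mean_x)  # intercept equals ln(a)
    c = -1.0 / slope
    return a, c


# Synthetic example: distance classes (m) and frequencies from a = 20, c = 200.
distances = [50, 150, 250, 350, 450]
counts = [20 * math.exp(-d / 200) for d in distances]
a, c = fit_negative_exponential(distances, counts)
print(round(a, 3), round(c, 3))
```

Here c acts as the distance scale of dispersal: the frequency falls by a factor of e for every c metres.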
62 Conclusion
- Moderate pollen flow (150-300 m); thrips as pollinators
- Predominantly outcrossing (85%, S. leprosula) to mixed mating (60%, S. parvifolia)
- Model for pollen dispersal: negative exponential model
- Optimum population size for conservation: breeding unit area and breeding unit size obtained (about 70 ha)
63 Case study 2: Using microsatellites for individual-specific DNA fingerprints
64 In forensic applications in forestry and chain-of-custody certification, two types of databases are required
65 DNA markers to match an illegal log to its original stump
66 - However, in DNA testimony, it is necessary to provide an estimate of the weight of the evidence
- Three possible outcomes of a DNA test: no match, inconclusive, or MATCH between samples examined
- If MATCH, it would not be scientifically justifiable to speak of a match as proof of identity in the absence of underlying data that permit some reasonable estimate of how rare the matching characteristics actually are
- Therefore, in forensic casework, a population database must be established for statistical evaluation of the evidence, to estimate the probability of a random match
67 Neobalanocarpus heimii
68 Methodology
69 Sample collection
70 SSR screening
- 51 SSR primer pairs developed for dipterocarps:
- Neobalanocarpus heimii (6) (Iwata et al. 2000)
- Shorea lumutensis (2) (Lee et al. 2006)
- Shorea leprosula (21) (Lee et al. 2004a)
- Hopea bilitonensis (15) (Lee et al. 2004b)
- Shorea curtisii (7) (Ujino et al. 1998)
71 Specific amplification
72 Mode of inheritance
Qualitative observations (each progeny possessed at least one maternal allele) support the postulated single-locus mode of inheritance
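The qualitative check described above can be expressed as a simple test that every progeny genotype shares at least one allele with the mother tree. The genotypes and function names below are hypothetical, used only to show the logic.

```python
def shares_maternal_allele(mother, progeny):
    """True if the progeny genotype carries at least one of the mother's alleles."""
    return bool(set(mother) & set(progeny))


def inconsistent_progeny(mother, progeny_list):
    """Return progeny genotypes that carry no maternal allele at this locus."""
    return [g for g in progeny_list if not shares_maternal_allele(mother, g)]


# Hypothetical allele sizes at one locus.
mother = (120, 124)
seeds = [(120, 122), (124, 124), (122, 126)]
print(inconsistent_progeny(mother, seeds))  # [(122, 126)]
```

A progeny flagged by this check could point to a null allele, a scoring error, or a mislabelled seed, so it is a prompt for re-examination rather than a conclusion.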
73 Null alleles
- Homozygote excess (MICROCHECKER; Van Oosterhout et al. 2004)
- Examine patterns of inheritance
- Check whether any individuals repeatedly fail to amplify any alleles at just one locus while other loci amplify normally
74 Repeat motif
Dinucleotide repeats (CT)n to mononucleotide repeats (A)n
75 Size homoplasy
77 - Which model to use: product rule or subpopulation models?
Pasoh Forest Reserve (231 individuals)
- Perform statistical tests to check:
- Hardy-Weinberg equilibrium for allele independence
- Linkage equilibrium for locus independence
- Results clearly showed that the population deviates from HWE
- Population substructuring
- Inbreeding
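A quick descriptive way to see a departure from Hardy-Weinberg proportions at one locus is to compare observed heterozygosity (Ho) with expected heterozygosity (He) and compute FIS = 1 - Ho/He. The genotypes below are hypothetical, and this sketch is not a formal test of HWE; exact or chi-square tests would be used for that.

```python
from collections import Counter


def locus_summary(genotypes):
    """Return (Ho, He, FIS) for a list of diploid genotypes at one locus.

    Assumes the locus is polymorphic (He > 0).
    """
    n = len(genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total_alleles = 2 * n
    he = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    ho = sum(1 for a, b in genotypes if a != b) / n
    fis = 1.0 - ho / he
    return ho, he, fis


# Hypothetical sample with a deficit of heterozygotes.
genotypes = [(120, 120)] * 4 + [(122, 122)] * 4 + [(120, 122)] * 2
ho, he, fis = locus_summary(genotypes)
print(round(ho, 3), round(he, 3), round(fis, 3))  # 0.2 0.5 0.6
```

A positive FIS indicates fewer heterozygotes than expected, the pattern produced by inbreeding or by pooling substructured populations.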
78 - Random match probability needs to be calculated using a subpopulation model and corrected for the coancestry (FST) and inbreeding (FIS) coefficients
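As a hedged illustration of a subpopulation model, the sketch below uses the widely cited Balding-Nichols conditional genotype frequencies with a coancestry coefficient theta (FST). Setting theta to 0 recovers the product rule. The allele frequencies are hypothetical, the additional FIS (inbreeding) correction mentioned above is not included, and the slides do not specify which exact formulas were applied.

```python
def genotype_freq_theta(theta, p, q=None):
    """Conditional genotype frequency under the Balding-Nichols model.

    Pass only p for a homozygote, or p and q for a heterozygote.
    With theta = 0 this reduces to p^2 and 2pq (the product rule).
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom


def random_match_probability(profile, theta=0.0):
    """Multiply per-locus genotype frequencies, assuming independent loci."""
    rmp = 1.0
    for locus_freqs in profile:
        rmp *= genotype_freq_theta(theta, *locus_freqs)
    return rmp


# Hypothetical profile: one homozygous locus (p = 0.1), one heterozygous (0.2, 0.3).
profile = [(0.1,), (0.2, 0.3)]
print(random_match_probability(profile))              # product rule
print(random_match_probability(profile, theta=0.05))  # subpopulation-corrected
```

The corrected value is larger than the product-rule value, so the correction is conservative: it makes a coincidental match look less rare.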
79 Population structure of N. heimii throughout Peninsular Malaysia
80 DNA fingerprinting databases of N. heimii throughout Peninsular Malaysia
Hardy-Weinberg equilibrium for allele independence
Linkage equilibrium for locus independence
81 Applications of the databases
82 Locus | Genotypes | Genotypes
Nhe004 | 262/262 | 262/262
Nhe005 | 129/129 | 129/129
Nhe011 | 176/186 | 176/186
Nhe015 | 143/181 | 143/181
Nhe018 | 141/169 | 141/169
Nhe019 | 214/220 | 214/220
Hbi016 | 140/141 | 140/141
Hbi161 | 102/105 | 102/105
Sle111a | 137/140 | 137/140
Sle392 | 187/189 | 187/189
Sle605 | 120/120 | 120/120
Slu044a | 148/148 | 148/148
Shc03 | 131/139 | 131/139
Shc04 | 85/117 | 85/117
Shc07 | 169/169 | 169/169
Shc09 | 190/201 | 190/201
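Comparing two multilocus profiles like those in the table amounts to a locus-by-locus check. The sketch below uses three of the listed loci; the decision rule (any mismatch means no match, no shared loci means inconclusive) is an assumption made for illustration, not the laboratory's stated protocol.

```python
def normalize(genotype):
    """Turn '176/186' into a sorted tuple of allele sizes, e.g. (176, 186)."""
    return tuple(sorted(int(a) for a in genotype.split("/")))


def compare_profiles(evidence, reference):
    """Return 'MATCH', 'no match' or 'inconclusive' plus the mismatching loci."""
    shared = evidence.keys() & reference.keys()
    if not shared:
        return "inconclusive", []
    mismatches = sorted(
        locus for locus in shared
        if normalize(evidence[locus]) != normalize(reference[locus])
    )
    return ("no match" if mismatches else "MATCH"), mismatches


# Genotypes taken from three rows of the table; which sample is which is assumed.
log = {"Nhe004": "262/262", "Nhe011": "176/186", "Shc04": "85/117"}
stump = {"Nhe004": "262/262", "Nhe011": "186/176", "Shc04": "85/117"}
print(compare_profiles(log, stump))  # ('MATCH', [])
```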
83 Using the database to estimate the probability of a random match