Title: Sequence Diversity in Evolution and Crop Improvement
1Sequence Diversity in Evolution and Crop
Improvement
Teosinte
Maize Landraces
Inbreds/Hybrids
- Sherry Flint-Garcia
- Research Geneticist
- USDA-ARS
- MU Division of Plant Sciences
Photos courtesy J. Doebley
2Sequence Diversity
- Evolution
- What are the forces that cause evolution?
- Speciation and hybridization
- Uncovering evolutionary history
- Crop Improvement
- The teosinte-maize story
3The Four Forces of Evolution
- Mutation: spontaneous changes in the DNA of gametes. Prerequisite to all other evolution.
- Natural selection: genetically based differences in survival or reproduction that lead to genetic change in a population.
- Gene flow: movement of genes between populations. In plants this can be accomplished by pollen or seed dispersal.
- Genetic drift: random changes in gene frequency. This is very important in small populations.
4Mutation: Generation of New Alleles
- Mutations result from mistakes in DNA replication, exposure to UV light or to certain chemicals (mutagens), and other causes.
- Point mutations
- changing one nucleotide to another
- e.g., C -> T
5Sickle Cell Anemia
A single point mutation causes a dramatic change
in phenotype.
6Other types of mutations
- Indels
- insertions/deletions
- Cause frame-shifts, usually premature stops
- Gene duplication
- May lead to new functions
- Chromosomal mutations
- Inversions, translocations, deletions
- Polyploidy
- Very common in plants
- May lead to new species in one step
7Most point mutations have no effect or almost no
effect. Why?
Most of the genome seems to be junk; at least, it doesn't code for proteins.
Many mutations within the protein-coding regions of genes don't change the amino acid specified, i.e., there is redundancy in the genetic code.
For example, 6 different codons specify the
amino acid leucine.
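The redundancy described above can be checked directly against the standard genetic code. The sketch below is an illustration only: the table layout (bases in TCAG order) and the helper names `codons_for` and `synonymous_fraction` are choices made here, not part of the original slides. It counts the leucine codons, shows the GAG -> GTG (glutamate to valine) change behind sickle cell anemia as an example of a point mutation that does alter the protein, and estimates how often a single-base change within a codon leaves the amino acid unchanged.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return every codon that specifies the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def synonymous_fraction():
    """Fraction of single-base changes in sense codons that keep the amino acid."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                same += CODON_TABLE[mutant] == aa
    return same / total


leucine_codons = codons_for("L")
print(len(leucine_codons), leucine_codons)
print("GAG ->", CODON_TABLE["GAG"], "| GTG ->", CODON_TABLE["GTG"])
print(f"synonymous fraction of single-base changes: {synonymous_fraction():.2f}")
```

The fraction printed covers coding sequence only; it says nothing about mutations in the non-coding majority of the genome.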
8The Four Forces of Evolution
- Mutation: spontaneous changes in the DNA of gametes. Prerequisite to all other evolution.
- Natural selection: genetically based differences in survival or reproduction that lead to genetic change in a population.
- Gene flow: movement of genes between populations. In plants this can be accomplished by pollen or seed dispersal.
- Genetic drift: random changes in gene frequency. This is very important in small populations.
9Natural Selection
- Peppered moth (Biston betularia) evolution during the industrial revolution in England
- Early 1800s (pre-industrial)
- Tree bark was white
- Almost all moths were of the typica form
- 1895 (industrial era)
- Tree bark was covered in black soot
- 98% of moths were of the carbonaria form
- Today (clean air laws enforced)
- Prevalence of the carbonaria form is declining
[Photos: typica form; carbonaria form]
11Brassica oleracea
12The Four Forces of Evolution
- Mutation: spontaneous changes in the DNA of gametes. Prerequisite to all other evolution.
- Natural selection: genetically based differences in survival or reproduction that lead to genetic change in a population.
- Gene flow: movement of genes between populations. In plants this can be accomplished by pollen or seed dispersal.
- Genetic drift: random changes in gene frequency. This is very important in small populations.
13Gene Flow
- Tends to homogenize populations.
- Rates of gene flow depend on the spatial
arrangement of populations.
Directional movement of alleles
Migration occurs at random among a group of
equivalent populations.
14Migration along a linear set of populations
Populations are continuous.
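The homogenizing effect of gene flow can be seen in a minimal two-population sketch. It assumes symmetric migration at a hypothetical rate `m` per generation and ignores selection, drift, and mutation; the function name `migrate` and the starting frequencies are illustrative choices, not values from the slides.

```python
def migrate(p1, p2, m, generations):
    """Track one allele's frequency in two populations that each receive a
    fraction m of their gene pool from the other every generation."""
    history = [(p1, p2)]
    for _ in range(generations):
        # Both updates use the previous generation's frequencies.
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
        history.append((p1, p2))
    return history


history = migrate(p1=0.9, p2=0.1, m=0.05, generations=50)
for gen in (0, 10, 50):
    a, b = history[gen]
    print(f"generation {gen:2d}: p1={a:.3f}  p2={b:.3f}  difference={a - b:.3f}")
```

Under these assumptions the difference between the populations shrinks by a factor of (1 - 2m) each generation, while the average frequency stays the same. Other spatial arrangements, such as a linear chain of populations, would need a different update rule.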
16The Four Forces of Evolution
- Mutation: spontaneous changes in the DNA of gametes. Prerequisite to all other evolution.
- Natural selection: genetically based differences in survival or reproduction that lead to genetic change in a population.
- Gene flow: movement of genes between populations. In plants this can be accomplished by pollen or seed dispersal.
- Genetic drift: random changes in gene frequency. This is very important in small populations.
18Founder effect: Gene flow and genetic drift are responsible for the limited genetic variation on islands, relative to mainland populations.
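Why drift matters most in small populations, such as a founding island population, can be sketched with a simple Wright-Fisher style simulation. This is a toy model under stated assumptions: a diploid population of constant size `pop_size`, no selection, mutation, or migration, and a fixed random seed so the run is repeatable. The function names are hypothetical.

```python
import random


def wright_fisher(pop_size, p0, generations, rng):
    """Simulate one allele's frequency under pure drift: each generation,
    2N gene copies are drawn at random from the previous generation."""
    copies = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        count = sum(rng.random() < p for _ in range(copies))
        p = count / copies
        trajectory.append(p)
    return trajectory


def expected_heterozygosity(pop_size, h0, generations):
    """Expected heterozygosity under drift: H_t = H_0 * (1 - 1/(2N))**t."""
    return h0 * (1 - 1 / (2 * pop_size)) ** generations


rng = random.Random(1)  # fixed seed (an arbitrary choice) for repeatability
for n in (10, 1000):
    final = wright_fisher(n, 0.5, 100, rng)[-1]
    h = expected_heterozygosity(n, 0.5, 100)
    print(f"N={n:4d}: final frequency {final:.2f}, expected heterozygosity {h:.3f}")
```

The expected-heterozygosity formula shows the pattern without any randomness: after the same number of generations, the small population is expected to have lost far more of its variation than the large one.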
19Speciation and Hybridization
- Speciation: how do new species arise?
- What is a species, anyway?
- Most species were originally described by their morphology.
- The problem: convergence
- Similar features in unrelated organisms due to evolution of traits that work in similar environments
20Convergent structures in the ocotillo (left) from the American Southwest, and in Alluaudia (right) from Madagascar.
21Nectar feeders have converged on this hovering
long-tongued morphology.
22Speciation and Hybridization
- Biological Species Concept (BSC)
- Based on reproductive compatibility
- Natural spatial, temporal, and morphological discontinuities generally correspond to fertility barriers
- The problem: in plants, many named species can hybridize.
23Most dandelions are asexual, so the biological species concept (BSC) doesn't apply. How can you name species depending on who can mate with whom when the organisms do not mate at all?!
24Scarlet and Black oaks can hybridize and inhabit
the same range -- but they have different
microhabitat preferences so hybridization is rare.
25These pines can also hybridize but they shed
their pollen at different times of the season
26Speciation by Hybridization
Hybridization often shows how difficult it is
to apply the BSC to plants. The hybrid in this
case is a new species. The rearrangements of its
chromosomes make it infertile with either parent.
hybrid
27As the climate becomes drier the desert splits
the range of this hypothetical tree species.
This reduces gene flow between the now isolated
populations and sets the stage for speciation.
28Evolution of species that are geographically
separated. Genetic drift plays a significant
role.
Edge effect where evolution of reproductive
barriers occurs between neighboring populations.
Requires considerable selection pressure.
Establishment of a new population with a
different ecological niche within the same
geographical range of the parental population
29Uncovering Evolutionary History
- Taxonomy vs. Systematics
- Estimating Phylogeny
- Distance Methods
- Maximum Parsimony Methods
- Maximum Likelihood Methods
30Taxonomy vs. Systematics
- Taxonomy
- Discovering
- Describing
- Naming
- Classifying
- Systematics
- Figuring out the evolutionary relationships of species
- Summarizing the evolutionary history of a group
31Plant Taxonomy
- taxon: any group at any rank
- common name: corn
- kingdom: Plantae (Viridiplantae)
- division (phylum): Anthophyta
- class: Liliopsida
- order: Commelinales
- family: Poaceae
- genus: Zea (always capitalized)
- species: Zea mays (the specific epithet is never capitalized)
32Plant Systematics
- A phylogenetic tree is used to illustrate systematic relationships
- Modern taxonomic groups generally correspond to clades on a phylogenetic tree (i.e., cladogram)
- Example: phylogenetic tree of the grass family
Mathews et al. 2000 American Journal of Botany
33Angiosperm Phylogeny Group Tree: Dicots are not a monophyletic group.
34Data Types that can be used to Estimate a
Phylogeny
- Cross Compatibility
- Uses the Biological Species Concept
- Morphological
- Continuous traits
- Meristic (countable) traits
- Cytological
- Chromosome number
- Chromosome features
- Pairing in hybrids
- Molecular data
- Secondary chemicals
- Proteins
- DNA
- Allele frequencies at many loci (isozymes, SSR)
- DNA sequences, considered as a whole
- DNA sequences, considered site-by-site
35Maximum Parsimony (Minimum Evolution) Methods
- The process of attaching preference to the pathway that requires the invocation of the smallest number of mutational events.
- Most effective when examining sequences with strong similarity
- Underlying premises
- Mutations are exceedingly rare events.
- The more unlikely events a model invokes, the less likely the model is to be correct.
36Using only trait 1
Traits must have discrete character states.
Must have same character state in at least 2 taxa.
37But traits 3 and 4 disagree with trait 1.
[Figure: tree of sp1 to sp5 annotated with the character changes Red <-> blue and A <-> G]
38- Every possible tree is considered individually for each informative site (computationally intensive).
- After all informative sites have been considered, the tree that invokes the smallest total number of substitutions is the most parsimonious.
[Figure: two candidate trees for taxa 1 to 5 with character states (Red/Blue, A/G, 0/1) mapped onto them; one requires 4 substitutions, the other 5.]
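Counting the changes a tree requires can be sketched with Fitch's small-parsimony procedure, which is one common way to do it and is not necessarily the method used to prepare the slides. The character matrix and the two candidate trees below are hypothetical, invented for illustration; they do not reproduce the figure's trees or its counts of 4 and 5.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one
    character on a rooted binary tree given as nested 2-tuples of taxon names."""
    if isinstance(tree, str):                  # leaf: observed state, no change
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                                 # children can agree: no new change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, characters):
    """Total number of changes the tree requires across all characters."""
    return sum(fitch(tree, states)[1] for states in characters)


# Hypothetical discrete characters for five taxa (not the slide's data).
characters = [
    {"sp1": "red", "sp2": "red", "sp3": "blue", "sp4": "blue", "sp5": "blue"},
    {"sp1": "A", "sp2": "A", "sp3": "A", "sp4": "G", "sp5": "G"},
    {"sp1": "0", "sp2": "1", "sp3": "1", "sp4": "0", "sp5": "0"},
]
tree_a = (("sp1", "sp2"), ("sp3", ("sp4", "sp5")))
tree_b = (("sp1", "sp4"), ("sp2", ("sp3", "sp5")))

for name, tree in (("tree A", tree_a), ("tree B", tree_b)):
    print(name, "requires", tree_length(tree, characters), "changes")
```

A full parsimony search would repeat this count for every possible tree and keep the shortest, which is what makes the method computationally intensive.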
39Distance-based approaches
Compare each taxon to every other taxon to
estimate a distance matrix
Distances are then clustered to estimate a
phylogenetic tree.
      Sp2   Sp3   Sp4   Sp5
Sp1   d12   d13   d14   d15
Sp2         d23   d24   d25
Sp3               d34   d35
Sp4                     d45
40Distance-based approaches
Compare each taxon to every other taxon to
estimate a distance matrix
Example: DNA sequence considered as a whole
             10         20         30         40         50
Sp1  GTGCTGCACG GCTCAGTATA GCATTTACCC TTCCATCTTC AGATCCTGAA
Sp2  ACGCTGCACG GCTCAGTGCG GTGCTTACCC TCCCATCTTC AGATCCTGAA
Sp3  GTGCTGCACG GCTCGGCGCA GCATTTACCC TCCCATCTTC AGATCCTATC
Sp4  GTATCACACG ACTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCCTAAA
Sp5  GTATCACATA GCTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCTAAAA
Number of differences between each pair of sequences:
      Sp2   Sp3   Sp4   Sp5
Sp1     9     8    12    15
Sp2          11    15    18
Sp3                10    13
Sp4                       5
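The pairwise distances in this example are simple counts of differing sites between aligned sequences. The sketch below recomputes them from the five 50-base sequences on the slide; the function name `count_differences` is a choice made here. It applies no correction for multiple substitutions at the same site, which a model-based distance would add.

```python
from itertools import combinations

# The five aligned sequences from the slide, in blocks of ten bases.
sequences = {
    "Sp1": "GTGCTGCACG" "GCTCAGTATA" "GCATTTACCC" "TTCCATCTTC" "AGATCCTGAA",
    "Sp2": "ACGCTGCACG" "GCTCAGTGCG" "GTGCTTACCC" "TCCCATCTTC" "AGATCCTGAA",
    "Sp3": "GTGCTGCACG" "GCTCGGCGCA" "GCATTTACCC" "TCCCATCTTC" "AGATCCTATC",
    "Sp4": "GTATCACACG" "ACTCAGCGCA" "GCATTTGCCC" "TCCCGTCTTC" "AGATCCTAAA",
    "Sp5": "GTATCACATA" "GCTCAGCGCA" "GCATTTGCCC" "TCCCGTCTTC" "AGATCTAAAA",
}


def count_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


distances = {
    (i, j): count_differences(sequences[i], sequences[j])
    for i, j in combinations(sequences, 2)
}
for (i, j), d in distances.items():
    print(f"{i}-{j}: {d}")
```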
41Distance-based approaches
Distances are then clustered to estimate a
phylogenetic tree.
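Example: UPGMA algorithm (Unweighted Pair-Group Method using Arithmetic means)
The smallest distance in the matrix is identified and those two taxa are combined. Their distances to every other taxon are averaged, and the matrix is recalculated. This iteration is repeated.
      Sp2   Sp3   Sp4   Sp5
Sp1     9     8    12    15
Sp2          11    15    18
Sp3                10    13
Sp4                       5
[Figure: Sp4 and Sp5, the closest pair (distance 5), are joined first, each on a branch of length 2.5.]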
42Distance-based approaches
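      Sp1   Sp2   Sp3   4-5
Sp1     0     9     8  13.5
Sp2           0    11  16.5
Sp3                 0  11.5
4-5                       0
[Figure: tree so far, labeled with branch lengths 2.5 and 2.5 (Sp4, Sp5) and 4 and 4 (Sp1, Sp3, the next pair to be joined at distance 8).]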
43Distance-based approaches
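      Sp2   1-3   4-5
Sp2     0    10  16.5
1-3           0  12.5
4-5                 0
[Figure: tree so far for taxa 1 to 5, labeled with branch lengths 2.5, 2.5, 4, 4, and 5; Sp2 joins the 1-3 cluster at distance 10.]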
44Distance-based approaches
        1-2-3   4-5
1-2-3       0  12.5
4-5               0
[Figure: final tree for taxa 1 to 5, labeled with branch lengths 6.5, 6.5, 2.5, 2.5, 4, 4, and 5.]
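The clustering steps walked through on these slides can be sketched as a short UPGMA routine. This is a minimal illustration, not the presenter's implementation: the function name `upgma`, the string labels for merged clusters, and the return format are choices made here. It uses the standard size-weighted average when clusters merge, so its first three merges match the slides (distances 5, 8, and 10), but its final distance comes out near 13.8 (the mean of all six distances between the 1-2-3 and 4-5 taxa) rather than the 12.5 shown on the last slide.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA. `dist` maps frozenset({a, b}) to a distance.
    Returns the merges in order as (cluster_a, cluster_b, distance, height)."""
    size = {name: 1 for name in names}
    dist = dict(dist)
    merges = []
    while len(size) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        d = dist[frozenset((a, b))]
        merged = f"({a},{b})"
        for other in size:
            if other in (a, b):
                continue
            # Size-weighted mean = average over all original taxon pairs.
            dist[frozenset((merged, other))] = (
                size[a] * dist[frozenset((a, other))]
                + size[b] * dist[frozenset((b, other))]
            ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        merges.append((a, b, d, d / 2))
    return merges


# Pairwise differences taken from the example distance matrix.
pairwise = {
    ("Sp1", "Sp2"): 9, ("Sp1", "Sp3"): 8, ("Sp1", "Sp4"): 12, ("Sp1", "Sp5"): 15,
    ("Sp2", "Sp3"): 11, ("Sp2", "Sp4"): 15, ("Sp2", "Sp5"): 18,
    ("Sp3", "Sp4"): 10, ("Sp3", "Sp5"): 13, ("Sp4", "Sp5"): 5,
}
taxa = ["Sp1", "Sp2", "Sp3", "Sp4", "Sp5"]
merges = upgma(taxa, {frozenset(pair): d for pair, d in pairwise.items()})
for a, b, d, height in merges:
    print(f"join {a} + {b} at distance {d:.2f} (node height {height:.2f})")
```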
45Maximum Likelihood Methods
- Best suited for DNA and protein sequence data
- Requires a model of evolution
- Each nucleotide/amino acid substitution has an associated likelihood
- A function is derived to represent the likelihood of the data given the tree, branch lengths, and additional parameters
- Function is maximized
46
1  A C G C G T T G G G
2  A C G C G T T G G G
3  A C G C A A T G A A
4  A C A C A G G G A A
L(Tree 1) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 5 x 10^-13
47
1  A C G C G T T G G G
2  A C G C G T T G G G
3  A C G C A A T G A A
4  A C A C A G G G A A
Repeat for each node assignment and each site in the alignment. The probability of that unrooted tree is the sum over all individual trees (node assignments). Repeat for each unrooted tree and choose the tree with the highest likelihood.
L(Tree 1) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 5 x 10^-13
L(Tree 2) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 1 x 10^-18
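The procedure above (sum over internal-node assignments at each site, multiply across sites, compare trees) can be sketched for the four-sequence alignment on the slide. Several things here are assumptions and not taken from the slides: the Jukes-Cantor substitution model, equal base frequencies, and a single fixed branch length of 0.1 on every branch. Because of that, the numbers it prints will not match the 5 x 10^-13 and 1 x 10^-18 values quoted above; only the ranking logic is illustrated. A real analysis would also optimize branch lengths for each tree.

```python
import math
from itertools import product

ALIGNMENT = {
    "1": "ACGCGTTGGG",
    "2": "ACGCGTTGGG",
    "3": "ACGCAATGAA",
    "4": "ACACAGGGAA",
}
BASES = "ACGT"


def jc69(x, y, t):
    """Jukes-Cantor probability that base x is base y after branch length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def site_likelihood(pair1, pair2, column, t):
    """Likelihood of one alignment column on the tree ((a,b),(c,d)), summed
    over every assignment of bases to the two internal nodes u and v."""
    (a, b), (c, d) = pair1, pair2
    total = 0.0
    for u, v in product(BASES, repeat=2):
        total += (0.25 * jc69(u, column[a], t) * jc69(u, column[b], t)
                  * jc69(u, v, t)
                  * jc69(v, column[c], t) * jc69(v, column[d], t))
    return total


def log_likelihood(pair1, pair2, t=0.1):
    """Sum of per-site log-likelihoods (the log of the product across sites)."""
    total = 0.0
    for i in range(len(ALIGNMENT["1"])):
        column = {name: seq[i] for name, seq in ALIGNMENT.items()}
        total += math.log(site_likelihood(pair1, pair2, column, t))
    return total


# The three possible unrooted trees for four taxa.
topologies = {
    "((1,2),(3,4))": (("1", "2"), ("3", "4")),
    "((1,3),(2,4))": (("1", "3"), ("2", "4")),
    "((1,4),(2,3))": (("1", "4"), ("2", "3")),
}
scores = {name: log_likelihood(*pairs) for name, pairs in topologies.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: log-likelihood {score:.3f}")
print("highest likelihood:", best)
```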
48The Teosinte-Maize Story
- The practical side of sequence diversity
- PLANT BREEDING!
- Sequence Diversity in Teosinte
- Sequence Diversity in Maize
- Selection During Domestication and Improvement
49Sequence Diversity and Plant Breeding
- Genetic diversity within a crop species is the raw material for current plant breeding
- Genetic diversity is the insurance policy that enables plant breeders to adapt crops to changing environments
50The Problem
- To what degree is limited genetic diversity inhibiting genetic improvement in corn?
51Two Views of the Problem
- Most of the corn germplasm in use in the USA today is derived from mixtures of only two major races out of 300 races total (Wallace and Brown, 1956). The simplest means of correcting this situation and of increasing the genetic diversity of this important crop is to introduce unrelated sources of germplasm (Brown and Goodman, 1977, Races of Corn, in Corn and Corn Improvement)
- From a project comparing sequence diversity in 21 genes of nine U.S. inbred lines with 16 diverse maize landraces: "We found that our sample of U.S. inbreds contained a level of SNP diversity that was 77% the level of diversity in our landrace sample." (Tenaillon et al., 2001, PNAS 98:9161-9166)
52Sequence Diversity in Maize
- How has selection shaped sequence diversity in maize?
- Survey SNPs from 1800 genes in diverse maize and teosinte germplasm
- Screen 4000 candidate genes for evidence of selection
- Practical goal: identify genes exhibiting selection
- Domestication, agronomic improvement, and local adaptation
53Allele Frequencies
teosinte
Domestication
landraces
Plant Breeding
modern inbreds
Unselected Gene
Domestication Gene
Improvement Gene
54Can we develop genomic screens to identify genes
that have undergone selection?
1. Invariant SSR approach
2. Direct sequencing approach
What proportion of genomic sequences that have low allelic diversity among inbreds result from selection for domestication?
Contrast sequence diversity among teosintes, landraces, and inbreds
55Screening SSR primers against 12 inbred lines
1,772 total SSRs
1,053 were polymorphic (Class I)
719 were invariant (Class II)
Invariant SSR primers
56Invariant SSR Screening
- 470 invariant SSR primer sets
- 321 monomorphic throughout
- 60 polymorphic in both exotics and teosintes
- 14 polymorphic only in exotics
- 75 polymorphic only in teosintes (Class II-E)
Vigouroux et al. 2002. PNAS 99:9650
57Analysis of Class II-E SSRs
- 31 Class I SSRs and 44 Class II-E SSRs
- 44 teosinte and 45 landrace accessions
- Tested for selection (loss of diversity)
- 0 Class I SSRs showed evidence of selection
- 15 Class II-E SSRs showed evidence of selection
- Extrapolated back to the 1772 total SSRs
- 1.4% of genes have been selected
58Direct Sequencing Approach
- Purpose: to develop a SNP resource for the maize community
- Result: a LOT of data!!!
59Distribution of SNP Haplotypes (patterns)
470 maize unigenes in 14 maize lines
Mean number of haplotypes per unigene: 4.46
> 80% of unigenes have 2 to 7 haplotypes
For each gene, a few haplotypes account for much of the diversity
60Are genes with low inbred diversity enriched for
domestication and improvement candidates?
(Masanori Yamasaki, post-doc in McMullen Lab)
36 genes with no diversity among a 14-inbred set.
Sequenced the same region in 16 landraces, 16 teosintes, and a Tripsacum dactyloides sample.
Test for selection on inbreds, landraces, and teosintes compared to four neutral genes.
61Selection Tests for 33 (of 36) Genes
5 genes were significant in both the inbreds and the landraces (evidence for domestication genes).
7 genes were significant in the inbreds but not the landraces (evidence for improvement genes).
1 additional gene was classified as either domestication or improvement depending on the test.
13 out of 33 genes = 39%!!
Yamasaki et al. submitted
62Selection on a Genomic Scale
- Sequenced 774 maize unigenes in 14 maize inbreds and 16 teosinte accessions
- Tested for selection using coalescent simulations
- Result: 2-4% had experienced artificial selection
- Assume 59,000 genes in maize
- 59,000 x 2% = about 1200 selected genes
Wright et al. 2005 Science 308:1310
63Where are we going with this?
- Before genomics, 11 genes had been identified as selected by population genetic approaches.
- By sequencing 1000 genes, we have 50 novel candidates.
- We need
- 1. to completely sequence the maize genome to identify ALL genes.
- 2. to resequence all remaining genes in multiple maize inbreds and teosinte accessions.
1140 more!
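The arithmetic behind these estimates can be laid out explicitly. The sketch below is a reading of the slides, not a calculation stated on them: it assumes the "1140 more" figure is the rounded estimate of about 1200 selected genes minus the 11 genes known before genomics and the 50 novel candidates. The variable names are illustrative.

```python
assumed_genes = 59_000                    # gene count assumed on the slides
selected_low = assumed_genes * 2 // 100   # 2% of genes under selection
selected_high = assumed_genes * 4 // 100  # 4% of genes under selection

rounded_estimate = 1200                   # the slides round the 2% figure up
known_before_genomics = 11
novel_candidates = 50
# Assumption: "1140 more" = rounded estimate minus genes already identified.
remaining = rounded_estimate - known_before_genomics - novel_candidates

print(f"2% of genes: {selected_low}, 4% of genes: {selected_high}")
print(f"selected genes still to be found (approx.): {remaining}")
```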
64Signatures of Selection
- If selected genes were important in past improvement, continued manipulation might contribute to future gain.
- If selected genes suffered a loss of diversity because of selection, they are prime candidates for introgressive breeding from wild relatives.
- Hypothesis: manipulation of the expression of domestication and improvement genes will alter key agronomic traits
65Selection for Amino Acid Content?
- Four genes that show evidence of selection are
involved in amino acid biosynthesis
66Selection for Amino Acid Content?
- Are there more genes in amino acid pathways that have been selected?
- Sequenced 16 genes in 28 maize inbreds, 16 teosinte, and 2 Tripsacum.
- Result: we found 4 genes that may have been selected during domestication/improvement.
67The Ultimate Selection Project
B73 inbred line
69Molecular Diversity
SNP: single nucleotide polymorphism
InDel: insertion/deletion
SNPs and indels are used as markers for genetic analysis