Title: Molecular Systematics
One of the most exciting developments in the
past decade has been the application of nucleic
acid data to problems in systematics. The term
molecular systematics is used to mean
macromolecular systematics: the use of DNA and
RNA to infer relationships among organisms. Here
we will discuss the kinds of data available, the
molecules and genomes most commonly used, and
some aspects of data analysis that are unique to
molecular data. Molecular data have
revolutionized our view of phylogenetic
relationships, not because of any innate
difference between these characters and
traditional systematic characters, but because
there are so many more potential characters in
the DNA than in morphology and anatomy and their
interpretation is generally easier.
In many cases the molecular data have supported
the monophyly of groups that were recognized on
morphological grounds (e.g., Poaceae, Fabaceae,
Rosaceae). More importantly, molecular data
often have allowed systematists to choose among
competing hypotheses of relationships (e.g., to
decide what group is the sister group of the
Asteraceae or the Poaceae). In other cases,
molecular data have allowed the placement of taxa
whose relationships were known to be problematic.
For example, although the Hydrangeaceae were
traditionally placed in or near the
Saxifragaceae, it was clear the two were
unrelated. Only with molecular data, however,
was there a strong alternative hypothesis for the
placement of the Hydrangeaceae in the order
Cornales.
Plant Genomes
The plant cell contains three different genomes:
those of the chloroplast, the mitochondrion, and
the nucleus. Systematists have used data from
all three.
Like the eubacteria from which they are derived,
mitochondria and chloroplasts have circular
genomes. The order of genes in the mitochondrion
is variable, and they are separated by large
regions of noncoding DNA. The mitochondrial
genome rearranges itself frequently, so many
rearranged forms can occur in the same cell. The
chloroplast genome, in contrast, is stable, both within
cells and within species. The most obvious
feature of the chloroplast genome is the presence
of two regions that encode the same genes but in
opposite directions; these are known as inverted
repeats. Between them are a small single-copy
region and a large single-copy region.
Rearrangements of the chloroplast genome are rare
enough in evolution that they can be used to
demarcate major groups.
The order of genes in the nuclear genome is also
presumed to be stable, at least within species,
and it may be stable across groups of species as
well. Studies of nuclear gene order are still in
their infancy, but gene order may become an
important systematic character in the near
future. DNA sequences change at a rate
different from the rate of genomic rearrangement.
Chloroplast genes tend to accumulate mutations
more rapidly than do mitochondrial genes in
plants. It is more difficult to generalize about
nuclear genes, which is hardly surprising because
there are so many of them.
Generating Molecular Data
Molecular systematics has been and remains
technique driven: as new molecular methods become
available, they expand the kinds and amounts of
systematic data that can be extracted from
nucleic acids. If useful comparisons are going
to be made across many taxa, the technique
applied has to be fast and easy. This is why
molecular systematics was barely possible until
the invention of recombinant DNA, became easier
as sequencing techniques were improved, and took
another leap forward with the invention of the
polymerase chain reaction (PCR) technique.
Restriction Site Mapping - Most molecular
systematic studies were initially done with
restriction site analysis. This technique can be
used to generate maps of individual genes or
entire genomes. Much of what we know about
chloroplast and mitochondrial genome structure
comes from such studies (at least prior to
sequencing of whole genomes!). In restriction
site analysis, DNA is extracted from a plant and
is then cut with restriction enzymes, which are
enzymes that cut DNA at a particular sequence. The
enzyme known as BamHI, for example, cuts DNA
everywhere it finds the sequence GGATCC, and
EcoRI cuts at GAATTC. A map is constructed by
first cutting the DNA with one enzyme and
examining the resulting pattern of fragment
sizes, then cutting it with a second enzyme, and
finally cutting it with both enzymes together.
This process creates a sort of puzzle from which
the order of restriction sites can be reconstructed.
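The single and double digests described above can be sketched in a few lines of Python. This is a minimal illustration, not a mapping tool: the example sequence is invented, the molecule is treated as linear, and each cut is placed at the start of the recognition sequence (real enzymes cut at a fixed position inside the site). Only the recognition sequences for BamHI and EcoRI come from the text; the names `find_sites` and `fragment_sizes` are hypothetical.

```python
# Recognition sequences quoted in the text.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}


def find_sites(dna, recognition):
    """Return the 0-based start position of every occurrence of a recognition sequence."""
    positions = []
    start = dna.find(recognition)
    while start != -1:
        positions.append(start)
        start = dna.find(recognition, start + 1)
    return positions


def fragment_sizes(dna, enzymes):
    """Fragment lengths, in order along a linear molecule, after digestion.

    Simplifying assumption: each cut falls at the start of the recognition site.
    """
    cuts = sorted({p for name in enzymes for p in find_sites(dna, SITES[name])})
    bounds = [0] + cuts + [len(dna)]
    # Drop zero-length pieces that arise when a site sits at the very start.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Invented 56-base example with one BamHI site and one EcoRI site.
dna = "A" * 10 + "GGATCC" + "T" * 20 + "GAATTC" + "C" * 14

bam_only = fragment_sizes(dna, ["BamHI"])
eco_only = fragment_sizes(dna, ["EcoRI"])
double = fragment_sizes(dna, ["BamHI", "EcoRI"])
print(bam_only, eco_only, double)
```

In a real experiment only the fragment sizes are observed, not the site positions; the map is worked out by finding the one arrangement of sites that is consistent with all three digests.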
Gene Sequencing - DNA sequencing of genes, parts
of genes, or noncoding regions is now widely used
in systematics. Sequencing determines the
precise order of nucleotides, adenine (A), cytosine
(C), guanine (G), or thymine (T), in a stretch of
DNA. The central difficulty of sequencing has
always been obtaining enough DNA to work
with. The initial approach was to clone genes
into bacteria and allow the bacteria to replicate
the genes along with their own genomes. This
method is quite slow, but it is reliable and
avoids some of the possible artifacts of more
efficient methods. This laborious approach was
later replaced by the polymerase chain reaction
(PCR) technique, in which DNA is replicated
enzymatically, allowing omission of the cloning
step.
PCR requires some knowledge of the sequence to
be studied. Small pieces of single-stranded DNA
(primers) are produced to match the DNA sequence
at either end of the region of interest. These
primers are placed in a tube with DNA from the
organism, a DNA polymerase, and free nucleotides.
The mix is then subjected to repeated heating
and cooling. As it heats, the double-stranded
organismal DNA denatures and becomes
single-stranded. Then as it cools, the primers
bind to their complementary sequence at either
end of the target region. The temperature is
then raised to the point at which the polymerase
becomes active. It binds to the DNA-primer
complex and begins synthesizing a complementary
strand using the free nucleotides in the
solution. This process is repeated numerous
times to create large amounts of DNA for
sequencing.
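As a rough sketch of what the primers do, the Python below locates the region that a primer pair would amplify on a template and computes the copy number after a given number of cycles. It assumes exact primer matches and perfect doubling in every cycle, which real reactions only approximate; the sequences and the names `amplicon` and `ideal_copies` are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template, forward_primer, reverse_primer):
    """Return the stretch of the template's top strand bounded by the two primers.

    The forward primer matches the top strand directly. The reverse primer is
    complementary to the top strand, so its binding site reads as the primer's
    reverse complement. Returns None if either site is missing.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]


def ideal_copies(initial_copies, cycles):
    """Copy number after the given cycles, assuming every cycle doubles the target."""
    return initial_copies * 2 ** cycles


# Invented template and primers.
template = "TTTT" + "ATGCGT" + "AAAACCCC" + "GGATTA" + "GGGG"
product = amplicon(template, "ATGCGT", "TAATCC")
print(product, ideal_copies(1, 30))
```

Under the doubling assumption, 30 cycles turn a single template molecule into roughly a billion copies, which is why PCR removes the need for the cloning step.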
This rapid method has allowed systematists to
study the same region in many species of a
particular group. One disadvantage of PCR is
that the polymerase itself introduces occasional
errors. As the major genome sequencing projects
progress, the technology for gene sequencing is
improving and becoming more automated.
Types of Molecular Data
Virtually all molecular phylogenetics is now
done with either genome rearrangements or
sequences of DNA as characters.
Genome Rearrangements
Investigators study chloroplast and mitochondrial
genomes by constructing restriction site maps,
which reveal the order of genes in the genome.
One of the early successes of molecular
systematics was the identification of the
earliest-diverging members of the Compositae
(sunflower family) by Jansen and Palmer (1987).
Using restriction site mapping, they found that
almost all members of the family have a unique
order of genes in the large single-copy region of
the chloroplast genome. This order could be
explained by a single inversion of the DNA. All
other angiosperms lack the inversion. The few
composites that have the ancestral arrangement of
the genome are members of the subtribe
Barnadesiinae, a South American group with
bilabiate corollas. This finding strongly
suggests that the Barnadesiinae is the sister
group to the rest of the enormous sunflower
family, and that the latter is monophyletic. The
context of this discovery is important. Many
previous researchers had speculated on what the
ancestral composite might have looked like, and
they had suggested several extant groups that
might represent the earliest lineages. The
Barnadesiinae was one of several possibilities
and had been supported by cladistic analyses of
morphological data.
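The idea that one gene order can be explained by a single inversion of another can be made concrete with a small Python sketch. Gene orders are written as lists of (gene, strand) pairs; an inversion reverses a contiguous block and flips the strand of every gene in it. The gene names A to E are placeholders, not the actual chloroplast genes involved, and the function names are hypothetical.

```python
def invert(order, i, j):
    """Invert genes i..j (inclusive): reverse their order and flip each strand."""
    segment = [(gene, -strand) for gene, strand in reversed(order[i:j + 1])]
    return order[:i] + segment + order[j + 1:]


def single_inversion(ancestral, derived):
    """Return (i, j) if one inversion turns `ancestral` into `derived`, else None."""
    n = len(ancestral)
    for i in range(n):
        for j in range(i, n):
            if invert(ancestral, i, j) == derived:
                return (i, j)
    return None


# Placeholder gene orders: +1 and -1 mark the strand on which a gene is read.
ancestral = [("A", 1), ("B", 1), ("C", 1), ("D", 1), ("E", 1)]
derived = [("A", 1), ("D", -1), ("C", -1), ("B", -1), ("E", 1)]

print(single_inversion(ancestral, derived))
```

A derived order shared by many taxa that differs from the ancestral order by one such event is a single character change, which is what makes rare rearrangements useful for demarcating major groups.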
Analysis of Molecular Data
There is a huge literature on the use of DNA
sequences in phylogeny reconstruction. Issues
involved in phylogeny reconstruction include (1)
mutation rate, (2) alignment, (3) analytic
technique, and (4) the relationship between the
history of the gene and the history of the
organisms (gene trees versus species
trees). Genes accumulate mutations at different
rates, in part because gene products differ in
how many changes they can tolerate and still
function. This affects how we choose what genes
to use to reconstruct phylogenies at different
levels. If a gene that evolves slowly is used to
reconstruct the phylogeny of organisms that are
very closely related, we will not get resolution
of relationships due to too little change in the
gene.
Alternatively, if the gene chosen evolves quickly
and we are studying deep phylogenetic
relationships, there might be too much change in
the gene to accurately reconstruct
relationships. Alignment is needed because, to
reconstruct phylogeny by comparing DNA sequences,
the sequences must be lined up next to each other
to determine where the changes in the DNA have
occurred. This involves determining homologous
DNA positions. For many genes this is not a
problem, but for quickly evolving genes, DNA that
does not code for a gene product, and comparisons
between very distantly related taxa, it can
become difficult.
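To show what lining sequences up involves, here is a minimal Python sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). The text does not name an alignment method, so this choice, the scoring values (match +1, mismatch -1, gap -2), and the example sequences are all assumptions made for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two bases
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best = global_align("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(best)
```

Each column of the result is a hypothesis that the two bases occupy homologous positions. When sequences are very divergent, several different alignments can score almost equally well, which is the difficulty the text describes.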
Homoplasy and long branches are conditions that
negatively affect phylogeny reconstruction.
Multiple mutations at a site, convergence, and
reversals can all undermine our ability to
reconstruct phylogenies accurately.
Long branch attraction is the condition where
there have been a large number of changes to the
gene on two branches and by chance some of the
changes might be the same. These changes might
be inferred to be synapomorphies rather than
parallelisms, placing the two branches together
rather than in their proper locations. Chapter 2
covered three different means of reconstructing
phylogenies: parsimony, distance methods, and
maximum likelihood (we did not cover this section,
and these are not the only three possible ways of
reconstructing phylogenies!).
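The effect of multiple mutations at one site can be illustrated with a short Python sketch based on the Jukes-Cantor model. That model is not mentioned in the text and is used here only as a simple assumption (all substitutions equally likely). Under it, the observed proportion of differing sites p underestimates the true number of substitutions per site d, and p levels off near 0.75 however much change has occurred.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return diffs / len(seq1)


def jukes_cantor(p):
    """Estimated substitutions per site from p, assuming the Jukes-Cantor model."""
    if p >= 0.75:
        raise ValueError("sequences are saturated; distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


def expected_p(d):
    """Expected observed difference after d substitutions per site (Jukes-Cantor)."""
    return 0.75 * (1 - math.exp(-4 * d / 3))


p = p_distance("ACGTACGT", "ACGTACGA")
print(p, jukes_cantor(p))

# Observed difference flattens out as the true amount of change grows.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(d, round(expected_p(d), 3))
```

This is one way to see why a quickly evolving gene loses information at deep levels: once most sites have changed more than once, further change barely alters the observed difference between sequences, and chance similarities on long branches become more likely.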
Gene Trees Versus Species Trees
If a species has a single history, then we
expect all parts of the plant to reflect that
history. We also then expect any phylogeny based
on any gene to reflect the history of the
organisms bearing the genes, but we know this is
not necessarily true. Nuclear genes may or may
not track the history of the nucleus, and
chloroplasts and mitochondria may or may not have
a history different from that of the nucleus.
There are three main reasons for these
differences:
1. Mutation is a random process; therefore the
phylogeny reconstructed for a particular gene may
differ from that of other genes by chance alone.
2. Hybridization or introgression may transfer
some DNA into a different lineage. This is
particularly true for organelles, which are not
linked to particular nuclear genomes.
3. Polymorphisms in an ancestral species can be
lost in descendant species. By chance, this can
result in a history of the genes that is actually
different from the history of the organisms.