Applications and Discoveries - PowerPoint PPT Presentation

Provided by: theresa81
1
Applications and Discoveries
2
Contents
  • Use of model plants
  • What we've learned
  • Gene numbers, alternative splicing
  • Genome sizes
  • Function of non-genic regions
  • Duplications
  • Predicting gene function
  • Microarrays
  • Use of genomics in evolutionary biology,
    conservation, etc.
  • Future of comparative genomics

3
Use of model plants
  • As mentioned, one of the benefits of comparative
    genomics is the ability to use information
    generated in one species to facilitate discovery
    in other plant species. This has encouraged the
    use of model plants: plants that are studied
    extensively in the hope of extrapolating the
    findings to other plants.
  • There are several advantages to using model
    organisms:
  • More efficient use of resources (financial,
    material, human)
  • The ability to take advantage of certain aspects
    of particular species (smaller genome size,
    shorter generation time, etc.; see the next slide
    for the example of Arabidopsis)
  • Consolidation of research and therefore more
    comprehensive knowledge (i.e. it is sometimes
    better to understand developmental processes by
    studying them intensely in one plant rather than
    superficially in many plants).

4
Model plants: Example, Arabidopsis
Arabidopsis thaliana, more commonly called
simply Arabidopsis, was the first high-profile
model plant. Research on Arabidopsis began as
long ago as the late 1800s, but the 1980s and
1990s saw a dramatic increase in the emphasis on
Arabidopsis as a model plant, and it became the
first plant to have its genome fully sequenced,
despite having no commercial usefulness.
  • The advantages of using Arabidopsis as a model
    are well documented:
  • It has a very small genome: 125 Mb
  • It has a short generation time: approximately 6
    weeks
  • It is easy to grow, very small, and produces a
    lot of seed
  • It is easily transformable

Image and information from The Arabidopsis
Information Resource (TAIR),
http://www.arabidopsis.org/. See Table 1 in the
first section to compare this with other organisms.
5
Model plants: Example, Arabidopsis
Arabidopsis certainly has some disadvantages as
well, in particular the fact that it does not
produce fleshy fruit. It is also a dicot, whereas
many of the world's food staples are monocots.
There are therefore limits to the amount of
information we can extrapolate to other plants.
6
Lessons from Arabidopsis: What we've learned
  • Arabidopsis was the first plant genome to be
    fully sequenced, in 2000 (The Arabidopsis Genome
    Initiative). Many informative discoveries came
    out of the analysis, just a few of which are
    noted here:
  • There were just over 25,000 genes, far fewer
    than expected
  • The absolute number of gene families and
    singletons (types) was in the same range as in
    other multicellular eukaryotes, indicating that a
    proteome of 11,000 to 15,000 types is sufficient
    for a wide diversity of multicellular life
    (Arabidopsis had more duplicated genes and
    multi-gene families)
  • Although Arabidopsis had been thought to be one
    of the simplest, ancestral plants, large numbers
    of duplications were found, signifying that it
    has itself undergone two rounds of duplication
    during its evolutionary history (Ku et al. 2000;
    Vision et al. 2000). Indeed, polyploidy events
    seem to have played an important role in the
    evolutionary history of most if not all plants
    (Adams and Wendel 2005)

7
The power of community: another lesson from
Arabidopsis
An interesting side note to the Arabidopsis story
is the exemplary way that the community of
Arabidopsis researchers worked together on this
project. The funding of the project was an
international collaboration and, in the first
report, the researchers involved decided to list
the author of the paper as "The Arabidopsis
Genome Initiative" rather than let a
controversy over author order cloud the
publication.
Arabidopsis Genome Initiative (2000) Analysis of
the genome sequence of the flowering plant
Arabidopsis thaliana. Nature 408: 796-815
"Arabidopsis: the best characterized weed on
Earth". From "Genomics and Beyond", BBSRC life, UK,
http://www.bbsrc.ac.uk/life/genomics/whatis.html
Photo: Plant Laboratory, Department of Biology,
University of York
8
Ongoing and future work: Arabidopsis 2010
  • There are now extensive resources available for
    Arabidopsis, including:
  • whole genome sequence
  • high-resolution genetic and physical maps
  • thousands of mutants documented and publicly
    available
  • microarray information
  • Ongoing work continues. One goal in particular is
    to identify the function of every single gene by
    the year 2010.

http://www.arabidopsis.org: browse the 2010
projects for updates and ongoing projects
9
Model plants: Rice
Because Arabidopsis is only distantly related to
the crops that are most important to the
world's food supply, the grains (or cereals), the
next plant species to be sequenced was rice. The
data from this project are extensive and
will continue to be analyzed for some time, but
some interesting discoveries already include:
  • Although 80.6% of predicted Arabidopsis thaliana
    genes had a homolog in rice, only 49.4% of
    predicted rice genes had a homolog in A. thaliana
  • Although gene order is not conserved between
    Arabidopsis and rice, many gene functions are
    conserved (light receptors, flowering pathways,
    stress responses, developmental pathways, etc.)
  • There appear to be nearly 50,000 genes in the
    rice genome, more than in the human genome

(Goff et al. 2002; Rensink & Buell 2004; Yu et
al. 2002)
10
Comparative genomics: Example from the Solanaceae
In Doganlar et al. (2002), the genomic locations
of QTL for fruit shape, fruit weight, color and
other traits were compared between eggplant,
tomato, potato, and pepper (all members of the
Solanaceae).
(continued on the next slide)
From Doganlar S, A Frary, M-C Daunay, RN Lester &
SD Tanksley (2002). Conservation of Gene
Function in the Solanaceae as Revealed by
Comparative Mapping of Domestication Traits in
Eggplant. Genetics 161: 1713-1726
11
Lessons learned from the Solanaceae
The results of this study suggest that
domestication of the Solanaceae has been driven
by mutations in just a few loci that have major
phenotypic effects, such as the fruit weight
locus on chromosome 2 seen in the previous
slide. This is a good example of how the use of
comparative genomics allows information to be
shared and compared from one species to another,
combining to give answers to problems that
information from one species alone could not.
This sharing of information is especially
important in this case. Tomato is currently being
sequenced through an international collaboration
(see the SOL Genomics Network,
http://www.sgn.cornell.edu/), whereas eggplant
is not likely to be sequenced in the near future.
However, an understanding of the eggplant-tomato
relationship allows eggplant researchers to take
advantage of the new tomato genome sequence.
12
Comparing genomes: Example from the grasses
This is now one of the best-known figures in
plant comparative genomics.
This consensus comparative map of 7 grasses shows
how the genomes can be aligned in terms of rice
linkage blocks (Gale and Devos 1998). Any radial
line starting at rice, the smallest genome and
innermost circle, will pass through regions of
similar gene content in each of the other
species. A gene on a chromosome of one grass
species can therefore be expected at a
predictable location on a specific chromosome of
a number of other species in the grass family.
This has facilitated much sharing among
researchers working on any of these species and
on others that may also be related (Phillips &
Freeling 1998).
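The way such a consensus map is used can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the block names, species names ("grass_A", "grass_B"), chromosome labels and gene names below are all invented placeholders, not data from Gale and Devos (1998), and the function predict_location is hypothetical.

```python
# Hypothetical consensus map: each rice linkage block points to the chromosome
# segment expected to carry the same genes in two other grasses.
# Every name and location here is an invented placeholder.
CONSENSUS_MAP = {
    "rice_block_1a": {"grass_A": "chr3 (short arm)", "grass_B": "chr8"},
    "rice_block_1b": {"grass_A": "chr3 (long arm)", "grass_B": "chr3"},
}

# Hypothetical assignment of mapped genes to rice linkage blocks.
GENE_TO_RICE_BLOCK = {"gene_x": "rice_block_1a", "gene_y": "rice_block_1b"}


def predict_location(gene, species):
    """Return the predicted segment for a gene in another grass, or None.

    The lookup goes gene -> rice linkage block -> syntenic segment, which is
    the "radial line" idea of the consensus map.
    """
    block = GENE_TO_RICE_BLOCK.get(gene)
    if block is None:
        return None  # gene not placed on the rice map
    return CONSENSUS_MAP.get(block, {}).get(species)


prediction = predict_location("gene_x", "grass_B")
```

A prediction of this kind is only a starting point for mapping work; its resolution depends on how well gene order is conserved in the region concerned.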
13
Information gained
  • This seminal consensus map demonstrated several
    points that are fundamental to our understanding
    of plant genomes:
  • The level of conservation of gene order among
    these grasses is high enough that this consensus
    can be used to predict the locations of genes
    from one crop species to another (the resolution
    varies depending on the crop and the region;
    recall the explanation of macrosynteny vs.
    microsynteny in the previous section).
  • It appears that rice is the ancestral genome and
    major chromosomal rearrangements have taken place
    during the evolution of the other grasses.

Millet (photo: R. Nelson)
Key point: The extensive resources developed in
rice and other mainstream crop species, such as
sequence and other types of data, can be used by
those working in related species for which fewer
resources have been developed.
14
Comparing genomes: Limitations
On the other hand, although the level of synteny
discovered among the grasses has greatly
facilitated research in these crops, there are
definite limits to the amount of synteny we can
expect among more distantly related species. A
review by Brendel et al. (2002) described a clear
lack of conservation of gene order (microsynteny
or micro-colinearity) between Arabidopsis and
maize, even though approximately 90% of maize
proteins had a homolog in Arabidopsis. Sequencing
many more plant genomes will therefore be not
only helpful but necessary.
15
Gene numbers
An early discovery of genomics research was that
the number of genes each organism contains is
much lower than was expected. In addition, the
number of genes an organism has does not
necessarily correlate with how complex the
organism seems to be (according to our
preconceptions of complexity) (Claverie 2001).
For example, the human genome was estimated
early on to contain over 100,000 genes. Current
estimates are only around 35,000, about the same
number as the mouse. How can this be?
Approximate gene numbers of some organisms
Organism                         Number of genes
Yeast                            5,800
Caenorhabditis elegans (worm)    19,000
Fruit fly                        13,000-14,000
Arabidopsis                      25,498
Human                            30,000-40,000
16
Alternative splicing
Alternative splicing is one of the ways an
organism can produce a large number of proteins
from a small set of genes. Recall that, after
transcription, the introns are removed (spliced
out) from the RNA, leaving just the exons
(expressed sequences) to be joined and
translated. However, as depicted below, changing
the splicing positions can produce a completely
different mRNA, and therefore a different
protein, from the same gene. This mechanism makes
it possible for an organism to produce many more
proteins than would seem possible from a linear
reading of the genes.
[Figure: a DNA sequence of alternating exons and
introns with its splice sites marked, the mRNA
sequence produced from it, and an alternative mRNA
sequence produced from the same DNA sequence with
different splicing locations.]
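The combinatorial effect of alternative splicing can be illustrated with a small Python sketch. It is a deliberately simplified model that assumes only one kind of event, exon skipping, with the first and last exon always kept; the exon sequences are made up and the function names (splice, isoforms) are hypothetical.

```python
from itertools import combinations


def splice(exons, keep):
    """Join the exons at the given indices, in genomic order, into one mRNA."""
    return "".join(exons[i] for i in sorted(keep))


def isoforms(exons):
    """List every mRNA obtainable by including or skipping internal exons.

    Simplifying assumption: the first and last exon are always retained and
    exon skipping is the only alternative-splicing event considered.
    """
    first, last = 0, len(exons) - 1
    internal = range(1, last)
    result = []
    for size in range(len(internal) + 1):
        for chosen in combinations(internal, size):
            result.append(splice(exons, (first, *chosen, last)))
    return result


# Hypothetical toy gene with four exons (sequences are invented).
EXONS = ["AUGGCU", "GAAUUC", "CCGGAU", "UGGUAA"]
all_mrnas = isoforms(EXONS)
```

With two internal exons that can each be kept or skipped, this toy gene yields four distinct mRNAs from a single DNA sequence; real genes use further mechanisms, such as alternative splice sites within exons, that the sketch does not model.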
17
Genome sizes
  • When it became clear that a bigger genome does
    not equal more genes, questions arose:
  • Why are some genomes much bigger than others?
  • What is the extra DNA?
  • What causes, and what is the function (if any)
    of, the extra DNA?
  • It was found that as genomes increase in size,
    gene density decreases and more repetitive
    elements are found between genes.
  • For a good summary, see Walbot V, Petrov DA
    (2001) Gene galaxies in the maize genome. PNAS
    98: 8163-8164
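Gene density is simple arithmetic, sketched here in Python. The Arabidopsis figures (125 Mb, 25,498 genes) are the ones quoted earlier in this presentation; the second, larger genome is a hypothetical example invented purely to show the trend, not a measurement of any real species.

```python
def genes_per_mb(gene_count, genome_size_mb):
    """Average number of genes per megabase of genome."""
    return gene_count / genome_size_mb


def average_spacing_bp(gene_count, genome_size_mb):
    """Average number of base pairs of genome per gene."""
    return genome_size_mb * 1_000_000 / gene_count


# Figures quoted in this presentation for Arabidopsis.
arabidopsis_density = genes_per_mb(25_498, 125)
arabidopsis_spacing = average_spacing_bp(25_498, 125)

# Hypothetical genome: the same number of genes spread over a genome
# twenty times larger (2,500 Mb). Invented for illustration only.
large_genome_density = genes_per_mb(25_498, 2_500)
```

Arabidopsis comes out at roughly 204 genes per Mb, or about one gene every 4.9 kb, whereas the hypothetical large genome has only about 10 genes per Mb; the remaining space is where the repetitive elements sit.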

18
Mechanisms of genome expansion
  • Several mechanisms can, over time, cause the
    genome of an organism to expand:
  • Duplication of either all or part of the genome
    through polyploidy events
  • Unequal recombination and/or nonreciprocal
    translocations
  • Retrotransposable elements (that leave the
    original copy behind while inserting additional
    copies throughout the genome).

Some wheat species have undergone polyploidy
events during hybridizations and now have 4 or 6
sets of chromosomes rather than the original 2
sets.
19
Functionality of extra DNA
It is now known that much of an organism's
genome, even up to 98%, is not directly involved
in the production of proteins (see cartoon
below). This non-coding DNA was originally termed
"junk DNA", as it was thought to serve no
purpose.
However, recent studies imply that this extra
DNA may indeed have functions, including
structural stability (helping to stabilize the
DNA molecule), moderating gene expression, and
possibly others, including evolutionary ones
(Andolfatto 2005). Much ongoing research is
now focusing on this important "junk DNA".
Cartoon (2001) from Genome Research Limited (GRL),
http://www.yourgenome.org/primer/all/
For a good review, see
http://www.eurekalert.org/pub_releases/2005-07/cshl-lcg071205.php
See also Andolfatto 2005
20
Types of genes in the genome
Although plants have evolved and thus diverged
over millions of years and clearly are very
diverse, studies of sequences in rice,
Arabidopsis, tomato, and other plants have shown
that all flowering plants contain a similar set
of genes (Somerville & Somerville 1999; van der
Hoeven et al. 2002).
Figure: Distribution of tomato unigenes
whose putative functions could be assigned
through annotation (from van der Hoeven et al.
2002). This is very similar to the percentages of
gene types found in other plants.
21
Gene function: not always due to the gene itself
Other elements affect the function and expression
of a gene, such as promoters, enhancers,
silencers and others. Frary et al. (2000)
describe a quantitative trait locus (QTL)
associated with the difference in fruit weight
between wild and cultivated varieties of
tomato. They found that the effect of this gene
was most likely due to changes upstream in the
promoter region rather than in the sequence of
the encoded protein itself.
22
Evolution of genic regions over time
By comparing genomes of various species, it has
been discovered that different regions of the
genome evolve (change) at different rates: exons
change at a slower rate over time, while introns
(the intervening sequences) change faster. This
has important implications for genomic studies
(see next slide).
Over time, the similarity between the 2
sequences becomes less and less in the introns,
while the exons stay more similar.
Figure. Evolution of functionally important
regions over time. Immediately after a speciation
event, the two copies of the genomic region are
100% identical. Over time, regions under little
or no selective pressure, such as introns, are
saturated with mutations, whereas regions under
negative selection, such as most exons, retain a
higher percent identity.
Image from Miller W, Makova KD, Nekrutenko A,
Hardison RC (2004) Comparative genomics. Annu
Rev Genomics Hum Genet 5: 15-56
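Percent identity, the measure used in the figure caption, can be computed directly from two aligned sequences. The Python sketch below assumes the sequences are already aligned and of equal length, without gaps; the exon and intron sequences are invented toy data chosen so that the intron pair has diverged more, and percent_identity is a hypothetical helper name.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences agree.

    Assumes the sequences are pre-aligned, gap-free and of equal length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Invented toy orthologous regions from two species.
exon_species_1 = "ATGGCCAAGT"
exon_species_2 = "ATGGCTAAGT"    # one substitution
intron_species_1 = "GTAAGTTTCA"
intron_species_2 = "GTCAGAATCG"  # four substitutions

exon_identity = percent_identity(exon_species_1, exon_species_2)
intron_identity = percent_identity(intron_species_1, intron_species_2)
```

In this toy case the exons are 90% identical and the introns 60% identical, mirroring the pattern described above: sequence under negative selection stays more similar than sequence under little selective pressure.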
23
Implications: levels of polymorphism
One example of where this difference in
divergence between introns and exons becomes
important is in the development of what are
called universal primers: primers that will
amplify orthologous sequences in different
species. For instance, Wu et al. (2006)
developed sets of primers flanking exons that
would amplify not only in tomato, potato,
eggplant and other closely related Solanaceae
species but also in the more distantly related
tobacco and coffee. However, when high levels of
polymorphism were needed, e.g. for mapping
purposes or for studying more closely related
organisms, designing primers that flanked or
included introns was more successful, as the rate
of divergence (as measured by number of base
substitutions) of introns was found to be nearly
3 times higher. The figure below demonstrates how
different sets of primers (denoted by arrows)
might be developed to flank exons or introns,
respectively.
[Figure: gene diagram of alternating exons and
introns, with primer pairs shown as arrows.]
Since the blue primers would amplify a region
encompassing an intron, they would be more likely
to amplify products that highlighted differences
between 2 species, even if they were closely
related.
24
Predicting gene function
Many scientists are interested in knowing the
function of a particular gene. However,
uncovering and confirming a definite function can
be very difficult. Sometimes researchers can
infer a putative (possible) function by searching
for similar genes (under the assumption that
genes that have similar sequences may have
similar functions). But this is not an absolute
confirmation.
This figure shows that, as of 2003, large
proportions of the genes in these organisms still
had unknown functions, and another large
proportion had only been characterized by
similarity. In the end, all functions must be
confirmed experimentally to be known
definitively.
Image from Koonin EV, Galperin MY (2002)
Sequence - Evolution - Function: Computational
Approaches in Comparative Genomics
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=sef.TOC&depth=1
25
Gene annotation: a caution
As mentioned, a gene in a database is frequently
associated with a particular function through a
process now commonly known as annotation
(technically speaking, this term simply means the
addition of other information). However, as seen
in the previous slide, most gene functions have
not yet been proved experimentally; they are
often assumed because of sequence similarity.
Researchers using this information need to be
aware that until a gene function has been proven
in the laboratory, its annotation is just a
guess.
For example, the figure below shows a BLAST
result for a tomato EST sequence (SGN-E247181).
Note the phrase "similar to cell division control
protein" (in rice). This tomato sequence may or
may not turn out to have this function. It may be
a good place for an interested researcher to
start investigating, but at this point the
function is not known.
From http://www.sgn.cornell.edu
26
The use of association studies
Association studies, a method of
discovering genome regions correlated with
particular traits, seem to hold great potential
for identifying the genes responsible for
quantitative traits, with the caveat that
stringent statistical analyses must take into
account issues of population structure and a high
level of false positives. With the development
of new statistical tools, this method has been
particularly useful in maize, mapping such traits
as flowering time, sweet taste, etc. (Yu &
Buckler 2006; Yu et al. 2006).
See http://www.maizegenetics.net/ for more about
this topic, including the related statistics
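To see why false positives are a concern when many markers are tested at once, consider the following Python sketch. It uses the Bonferroni correction, which is one standard and conservative adjustment and not necessarily the method used in the maize studies cited above; it does not address population structure at all. The marker count, significance level and p-values are invented for illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false hits if no marker is truly associated."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test p-value cutoff that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


N_MARKERS = 10_000   # hypothetical number of markers tested
ALPHA = 0.05         # conventional significance level

naive_false_hits = expected_false_positives(N_MARKERS, ALPHA)
per_test_cutoff = bonferroni_threshold(N_MARKERS, ALPHA)

# Invented p-values for three markers.
p_values = {"marker_1": 2e-7, "marker_2": 0.003, "marker_3": 0.04}
significant = [name for name, p in p_values.items() if p < per_test_cutoff]
```

Testing 10,000 markers at an uncorrected 0.05 level would be expected to flag about 500 markers by chance alone, whereas the corrected cutoff of 0.000005 leaves only marker_1 in this toy data set.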
27
The use of microarrays
As discussed in the previous section, microarrays
are a way to study the expression of many genes
or even the whole genome at once. Analyzing the
results of a microarray experiment, however, can
be quite complex. A number of software programs
are available to assist with this, but they are
complicated and most are expensive.
The next slide shows an example of a microarray
that was developed to compare the expression of
genes related to fruit development in tomato and
pepper.
A single microarray experiment can generate tens
of thousands of data points.
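At its simplest, each data point from a two-sample microarray comparison is reduced to a ratio of signal intensities. The Python sketch below uses the common convention of a log2 ratio with a two-fold change cutoff; this convention is an assumption here, not something stated in the presentation, and the gene names and intensities are invented. Real analyses also involve background correction, normalization and statistical testing, none of which is shown.

```python
import math


def log2_ratio(sample_intensity, reference_intensity):
    """log2 of the intensity ratio; +1 means a two-fold increase."""
    return math.log2(sample_intensity / reference_intensity)


# Invented spot intensities: (sample, reference) for three features.
spots = {
    "gene_A": (800.0, 200.0),   # four-fold higher in the sample
    "gene_B": (150.0, 600.0),   # four-fold lower in the sample
    "gene_C": (500.0, 500.0),   # unchanged
}

ratios = {name: log2_ratio(s, r) for name, (s, r) in spots.items()}

# Flag features whose expression differs at least two-fold in either direction.
changed = sorted(name for name, value in ratios.items() if abs(value) >= 1.0)
```

Using the logarithm makes increases and decreases symmetric: a four-fold increase gives +2 and a four-fold decrease gives -2, while unchanged expression gives 0.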
28
Microarray: tomato example
Comparison of gene expression in tomato and
pepper fruit using a cDNA microarray prepared
from tomato EST clones. The TOM1 array contains
12,899 features derived from the tomato genome
(Alba et al. 2004). The white square outlines a
single sub-grid (420 cDNA features) that is
enlarged and shown in (b).
From Alba R, Fei Z, Payton P, Liu Y, Moore SL,
Debbie P, Cohn J, D'Ascenzo M, Gordon JS, Rose
JKC, Martin G, Tanksley SD, Bouzayen M, Jahn MM,
Giovannoni J (2004) ESTs, cDNA microarrays, and
gene expression profiling: tools for dissecting
plant physiology and development. The Plant
Journal 39(5): 697-714. For more information see
the Tomato Expression Database
(http://ted.bti.cornell.edu/)
29
Analysis of microarrays
  • Much of this data is still being analyzed, but
    some of the goals of this and other analyses from
    this research group are to:
  • Compare expression of loci in breaker-stage
    pericarp between tomato and pepper
  • Identify regulatory mechanisms governing the
    ripening process that determine fruit quality
    and/or nutritional characteristics
  • Other examples of discoveries using microarrays
    in plants include:
  • The identification of the pathways involved in
    cold tolerance in Arabidopsis (Fowler & Thomashow
    2002)
  • The identification of an acyl transferase that
    contributes to flavor development in strawberry
    (Aharoni et al. 2000)

30
Systems Biology: the next step
Systems biology is a term used to describe
research that tries to integrate data from many
sources, which may include DNA sequence, gene
expression, functional, metabolite, genotype,
phenotype and other data, to produce a more
comprehensive understanding of a biological
process or processes. This is the next logical
step in genomics research, and could lead to such
an understanding of how organisms function that
we will be better able to tackle the world's
large problems, such as hunger, fuel and energy
reserves, and freshwater supplies (Minorsky
2003). One of the goals towards this end is the
creation of a so-called "in silico plant", a
computer-modeled virtual plant that would be
available online to researchers around the world
to aid in understanding all plant functions
(for example, see the 2010 project on The
Arabidopsis Information Resource website,
http://www.arabidopsis.org/portals/masc/projects.jsp).
31
The rest of the story
Even after the whole sequence of an organism's
genome is known, there are still many things
unknown about how the organism functions, and
many steps involved in these processes. Below is
a brief summary of the various steps, the level
of study, and the name of the study of each of
these steps. This work is ongoing.
DNA sequence (genomics)
RNA (transcriptomics, expression profiling)
Protein (proteomics)
Metabolites (metabolomics)
Biological activity (functional genomics)
32
Future of Comparative Genomics
Miller et al. (2004) give a very comprehensive
overview of what we have learned from comparative
genomics and what the future holds. Their wish
list for future advances includes:
  • Alignment Software that Automatically and
    Accurately Handles a Wider Spectrum of
    Evolutionary Operations
  • Better Tools for Identifying Well-Conserved
    Regions within Long Alignments
  • Precise and Comprehensive Formulations of the
    Genome-Comparison Problem (e.g. whole genome
    alignment)
  • Improved Methods to Evaluate Genome-Alignment
    Software
  • Improved Tools for Linking Alignments to Other
    Sequence-Based Information

Miller W, Makova KD, Nekrutenko A, Hardison RC
(2004) Comparative genomics. Annu Rev Genomics
Hum Genet 5: 15-56
33
Ethical considerations
As with any scientific research, there are many
ethical (as well as legal and social)
implications that need to be considered with new
developments in plant genomics. Many of these
have to do mainly with the production of
transgenic crops (genetically modified
organisms). Some of the ethical questions
concerning genomic research are:
  • Who will benefit, and will those who need it the
    most have access?
  • Are GM foods and other products safe to humans
    and the environment?
  • How will these technologies affect developing
    nations' dependence on the West?
  • Who owns genes and other pieces of DNA?
  • Will patenting DNA sequences limit their
    accessibility and development into useful
    products?

For more information about these questions and
current legislative issues, see the Human Genome
Project Ethical, Legal and Social Issues page,
http://www.ornl.gov/sci/techresources/Human_Genome/elsi/elsi.shtml
34
Credit in science
  • The increasing number of multi-institutional
    projects leads to questions about who gets credit
    for the research.
  • Who contributed to the work? (it may be hundreds
    of people)
  • Have they all contributed equally?
  • Who had the key ideas?
  • Which institution and author should be listed
    first?

This publication had 100 authors (Venter et al.
2001)
35
Accountability and fraud: the risks of community
data
GenBank (maintained by the National Center for
Biotechnology Information, NCBI) receives over 3
million new DNA sequences per month from a large
number of dispersed contributors. Although they
are not common, erroneous, low-quality or even
fraudulent data are a risk that must be
considered, and they would be difficult to
identify. The error rate in GenBank, as an
example, has remained at a low 0.1% over many
years, but sequence data in the database exceeded
100 gigabases in 2005, so even a low rate
corresponds to a large absolute number of errors.
You must be ready to personally confirm ALL
data!
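The scale of the problem follows from a one-line calculation, sketched here in Python. It assumes that the error rate quoted above is 0.1 percent (one base in a thousand) and that it applies uniformly per base; both are simplifying assumptions made for illustration.

```python
# Figures from the slide: database size in 2005 and the quoted error rate.
TOTAL_BASES = 100 * 10**9      # 100 gigabases
ERRORS_PER_THOUSAND_BASES = 1  # 0.1 percent, assumed to apply per base

# Integer arithmetic avoids floating-point rounding.
expected_erroneous_bases = TOTAL_BASES * ERRORS_PER_THOUSAND_BASES // 1000
```

Under these assumptions roughly 100 million bases in the database would be wrong, which is about the size of the entire 125 Mb Arabidopsis genome.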
36
Impact of comparative genomics
The impact of comparative genomics will be
far-reaching. For example:
"The genomic revolution is having a tremendous
impact on the study of natural variation. It is
making it possible finally to discover the
molecular basis of complex traits, a fundamental
question in evolutionary biology, and a question
of immense practical importance in many other
fields." (Borevitz & Nordborg 2003) This will
not only help us understand biology better, but
also aid in our exploitation of natural diversity
for crop improvement, plant breeding efforts and
biodiversity conservation: all things
important to the quality of life on this planet.
Borevitz JO, Nordborg M (2003) The impact of
genomics on the study of natural variation in
Arabidopsis. Plant Physiology 132: 718-725.