Human Genome: Mapping, sequencing, Techniques Diseases - PowerPoint PPT Presentation

1
Human Genome: Mapping, sequencing, Techniques
Diseases
Lecture 6 BINF 7580
2
In the last lectures we discussed biological evolution.
The definition: Biological evolution encompasses small-scale
evolution (changes in gene frequency in a population from one
generation to the next) and large-scale evolution (the descent of
different species from a common ancestor over many generations).
The central idea of biological evolution is that all life on Earth
shares a common ancestor, so all of us, humans and oak trees,
butterflies and whales, are distant cousins.
Over a large number of years, evolution produces tremendous
diversity in forms of life.
A genealogy illustrates change with inheritance over a small
number of years.
3
We know four basic mechanisms of evolutionary change: mutation,
migration, genetic drift and natural selection.
H.A. Briefly describe these processes.
Today we will talk about coevolution. First, some definitions:
coevolution is a change in the genetic composition of one species
(or group) in response to a genetic change in another. So it is
reciprocal evolutionary change in interacting species.
For example, an evolutionary change in the morphology of a plant
might affect the morphology of an herbivore that eats the plant,
which in turn might affect the evolution of the plant, which might
affect the evolution of the herbivore, and so on.
4
  • Coevolution is likely to happen when different
    species have close ecological interactions with
    one another. These ecological relationships
    include:
  • 1. Predator/prey and parasite/host
  • 2. Competitive species
  • 3. Mutualistic species
  • Predation is used here to include all (or almost
    all) interactions in which one organism consumes
    all or part of another. For example, all animals
    must eat to survive. With predators always on the
    lookout for a meal, prey must constantly avoid
    being eaten.
  • Predation is an important evolutionary force:
    natural selection favors more effective predators
    and more evasive prey.

5
The Development of Predation Theory: Mathematical models of
predation are amongst the oldest in ecology. The central idea is
that a coupled system of predator and prey would cycle (the
predator-prey cycle).
The Lotka-Volterra equations, also known as the predator-prey
equations, describe two interacting species, one a predator and
one its prey. They are a pair of first-order, non-linear
differential equations:
\[ \frac{dx}{dt} = x(\alpha - \beta y), \qquad \frac{dy}{dt} = -y(\gamma - \delta x) \]
where
y is the number of some predator (for example, wolves),
x is the number of its prey (for example, rabbits),
dy/dt and dx/dt represent the growth of the two populations
against time,
t represents the time, and
α, β, γ and δ are parameters representing the interaction of the
two species.
6
The equations have periodic solutions, i.e. solutions that repeat
their values after some definite period.
An example problem: Suppose there are two species of animals, a
baboon (prey) and a cheetah (predator). If the initial conditions
are 80 baboons and 40 cheetahs, one can plot the progression of
the two species over time. Time is treated as dimensionless.
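The example problem can be explored numerically. The sketch below integrates the predator-prey equations dx/dt = αx - βxy and dy/dt = δxy - γy with a fixed-step fourth-order Runge-Kutta scheme, starting from 80 prey and 40 predators. The slides do not give parameter values, so α = 1.0, β = 0.02, γ = 1.0 and δ = 0.01 are assumptions chosen only for illustration, as are the step size and the function names.

```python
import math


def derivatives(x, y, alpha, beta, gamma, delta):
    """Right-hand side of the Lotka-Volterra equations (x = prey, y = predator)."""
    dx = alpha * x - beta * x * y
    dy = delta * x * y - gamma * y
    return dx, dy


def simulate(x0, y0, alpha, beta, gamma, delta, dt=0.01, steps=5000):
    """Integrate with classical RK4 and return the prey and predator series."""
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(steps):
        k1x, k1y = derivatives(x, y, alpha, beta, gamma, delta)
        k2x, k2y = derivatives(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y,
                               alpha, beta, gamma, delta)
        k3x, k3y = derivatives(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y,
                               alpha, beta, gamma, delta)
        k4x, k4y = derivatives(x + dt * k3x, y + dt * k3y,
                               alpha, beta, gamma, delta)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        y += dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        xs.append(x)
        ys.append(y)
    return xs, ys


def invariant(x, y, alpha, beta, gamma, delta):
    """Quantity that stays constant along an exact solution (closed orbit)."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


# Assumed, illustrative parameters; only the initial conditions come from the slide.
PARAMS = dict(alpha=1.0, beta=0.02, gamma=1.0, delta=0.01)

if __name__ == "__main__":
    prey, predators = simulate(80.0, 40.0, **PARAMS)
    print("prey range:", min(prey), max(prey))
    print("predator range:", min(predators), max(predators))
```

Plotting `prey` and `predators` against the step index shows the two populations oscillating out of phase; the near-constant value of `invariant` is a quick check that the numerical orbit stays closed.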
7
Logic and mathematical theory suggest that when
prey are numerous their predators increase in
numbers, reducing the prey population, which in
turn causes predator number to decline. The prey
population eventually recovers, starting a new
predator-prey cycle.
H.A. Do you think predation theory is only an ecological problem,
or is it investigated in evolutionary theory as well? Why?
But sometimes a predator-prey cycle doesn't work.
TOO MANY DEER? There are few natural predators around, such as
wolves, bears, and lynx, to keep deer populations in check. The
hunting season has begun in Princeton, NJ. It's part of a
five-year plan to reduce the township's deer population from 1600
to just 400. Deer wander onto roadways and cause traffic accidents
that frequently result in injury and even death.
8
2. Competitive species
Competition between members of a species for resources such as
food, water, and territory is the driving force behind evolution
and natural selection. Likewise, each species competes with the
others to gain resources. As a result, species less suited to
compete for the resources may either adapt or die out.
For example, a smaller tree will receive less sunlight when it is
shaded by a larger adjacent tree, so the larger tree is competing
with the smaller one.
Gause's Law of competitive exclusion states that
two species that compete for the exact same
resources cannot stably coexist. One of the two
competitors will always have an ever so slight
advantage over the other that leads to extinction
of the second competitor in the long run.
9
3. Mutualistic species: both of the interacting species profit
from the interaction.

Such interactions within a species are known as co-operation.

Mutualistic arrangements are most likely to develop between
organisms with widely differing living requirements.
Examples of mutualism: Mushrooms aid the tree in absorbing water
from the soil, increase the stability of the root system, and
protect the roots from drying out and from the effects of heavy
metals. In return the tree provides sugars and starches that the
mushrooms use in their metabolism.
10
The paper "Mutualistic Webs of Species" (Science, 2006, Vol. 312,
no. 5772, pp. 372-373,
http://www.sciencemag.org/cgi/content/full/312/5772/372).
The following points are made by J. Thompson:
1. No multicellular eukaryotic organism is capable of surviving
and reproducing using only its nuclear genes and the gene products
it makes. The central importance of mutualistic associations can
be presented in a simple thought experiment: try to imagine a
plant that can survive and reproduce in a real ecosystem without
using, in addition to its nuclear genome, most of the following:
a mitochondrial genome (to convert energy); a chloroplast genome
(to regulate photosynthesis); mycorrhizal fungal genomes (to
improve nutrient and water uptake); the genomes of pollinators (to
assist in reproduction); and the genomes of a few birds, mammals,
or ants (to move seeds around the ecosystem). Each plant is part
of a complex web of interacting mutualists.
H.A. Read this paper and present other statements and ideas of the
paper.
11
It might seem that because all organisms interact with other
organisms, everything is involved in coevolution. But we
definitely can't say that any evolutionary change is a result of
"evolution together". In fact, in many cases there is no good
evidence for coevolution, because it has not been shown that the
changes are reciprocal, i.e. that they are a result of, or evolved
from, the interaction between the two species.
The classic example: a plant has chemical defenses, an insect
evolves the biochemistry to detoxify these compounds, and the
plant in turn evolves new defenses that the insect in turn "needs"
to further detoxify. This looks like coevolution; is that correct?
H.A. Why? Give any examples.
12
However, to have evidence for these types of reciprocal
adaptations, we need evidence of the relative timing of the
evolution of the various traits that appear to be part of the
coevolution. If the presumed reciprocally induced, sequential
traits actually evolved in the plant (host) before the insect
(parasite) became associated with it, we should not call it
coevolution.
Thus the presence of a parasite on a host does not by itself
constitute evidence for coevolution.
13
Summary: In order to survive and reproduce, each organism must
adapt to its physical environment, as well as to the other
organisms in that environment with which it shares relationships.
You can watch a video about the different types of symbiotic
relationships that exist between species (commensalism, mutualism,
predator and prey, and competition) in Australia's Great Barrier
Reef:
http://www.teachersdomain.org/resources/tdc02/sci/life/eco/coralreefconnections/index.html
14
The concept of correlated mutations is one of the fundamental
ideas behind the theory of coevolution.
Once a residue is changed, this mutation can be compensated by an
additional mutation of a complementary residue in a protein. Let
us think about how many questions we have to ask to understand
this statement.
Q1. How do we know that these mutations of residues are
correlated?
A1. Coevolution is evident at the molecular level when mutations
exhibit similar evolution rates.

(Fraser et al., "Evolutionary rate in the protein interaction
network", Science, 2002, 296:750)
15
Q2. How do we know that these residues are complementary to each
other in a correlated mutation?
A2. See answer 1: both complementary residues have the same
evolution rates.
Q3. Where is residue correlation observed?
A3. Coevolution is not restricted to certain sites in a protein
structure. It can be observed both in the interior of a protein,
where residue pairs stabilize the protein fold, and on its
surface. Functional constraints are expected to limit amino acid
mutation, resulting in a higher conservation of functional sites
with respect to the rest of the protein surface. Once a residue is
changed, given the functional constraints operating on it, this
mutation can be compensated by an additional mutation of a
complementary residue across the interface. This enables the
coevolution of two proteins, which can lead to high specificity
and high affinity.
H.A. Do you have more questions about this statement? Try to ask
and answer them.
16
Biological Mechanisms Generating Correlated Mutations (Halperin et
al., Proteins, 2006)
Two major sources of covariation (correlated mutations) are
currently known: mutations and gene conversion.
Mutations: two independently occurring single mutations are
preserved by positive selection. This process should be
independent of the genomic distance between the mutated positions
(if the effect of mutational hot spots is neglected). It is
expected to play an equal role in intra- and intergenic correlated
mutations.
Gene conversion: it can only contribute to intragenic correlated
mutations.
H.A. Explain the comment about the effect of hot spots.
17
Gene conversion is a nonreciprocal transfer of genetic
information.
Figure: (a) Two DNA molecules. (b) Gene conversion: the red DNA
donates part of its genetic information (e-e' region) to the blue
DNA. The blue DNA uses the invaded segment (e') as a template to
"correct" the mismatch, resulting in gene conversion. (c) DNA
crossover: the two DNAs exchange part of their genetic information
(f-f' and F-F').
H.A. Explain the difference between gene conversion and crossover.
18
  • Applications of correlated mutations:
  • 1) RNA secondary structure prediction, RNA folding
    and RNA-protein interface prediction.
  • 2) Protein folding prediction, guided mutagenesis
    experiments and fold recognition are based on the
    analysis of correlated residue positions that are
    in the same gene (intragenic).
  • 3) Protein-protein interaction prediction is
    based on the analysis of intergenic correlated
    residue pairs.

H.A. How can correlated mutations be used for RNA prediction?
Hint: strands form in RNA because of the corresponding
Watson-Crick (W-C) pairing. Correlated mutations of nucleotides
give us...
19
Correlated mutations: how to measure
To measure correlated mutations we have to use concepts from the
theory of probability and statistics. A correlation coefficient
measures the departure of two variables from independence. The
best known is the Pearson correlation coefficient (r).
What is a correlation? To get a measure of how strongly X and Y
values are related, we will use the correlation coefficient.
Correlation is concerned with trends: if X increases, does Y tend
to increase or decrease? How much? How strong is this tendency?
A correlation is a number between -1 and 1 that measures the
degree of association between two variables. A positive value for
the correlation implies a positive association (large values of X
tend to be associated with large values of Y, and small values of
X tend to be associated with small values of Y). A negative value
for the correlation implies a negative or inverse association
(large values of X tend to be associated with small values of Y
and vice versa).
20
r = -1: data lie on a perfect straight line with a negative slope
r = 0: no linear relationship between the variables
r = 1: data lie on a perfect straight line with a positive slope
21
The formula for the Pearson correlation: Suppose we have a series
of n measurements of X and Y written as Xi and Yi, where
i = 1, 2, ..., n, with means Xm and Ym respectively and standard
deviations SX and SY respectively.
The standard deviation measures how widely spread the values in a
data set are. Formally, the standard deviation is the root mean
square (RMS) deviation of values from their arithmetic mean.
For example, in the population {4, 8}, the mean is 6 and the
deviations from the mean are {-2, 2}. Those deviations squared are
{4, 4}, the average of which (the variance) is 4. Therefore, the
standard deviation is 2. In this case 100% of the values in the
population are at one standard deviation from the mean.
Then the Pearson coefficient can be used to estimate the
correlation between X and Y:
\[ r = \frac{1}{n} \sum_{i=1}^{n} \frac{(X_i - X_m)(Y_i - Y_m)}{S_X S_Y} \]
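As a minimal sketch of this calculation, the Python below computes the RMS (population) standard deviation and then the Pearson coefficient as the average product of deviations from the means, divided by SX·SY. The function names are illustrative choices, not part of the lecture.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def population_sd(values):
    """Root-mean-square deviation of the values from their mean."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(xs, ys):
    """Pearson correlation of paired measurements xs[i], ys[i]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    xm, ym = mean(xs), mean(ys)
    sx, sy = population_sd(xs), population_sd(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a series is constant")
    # Average product of deviations, scaled by both standard deviations.
    covariance = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / len(xs)
    return covariance / (sx * sy)


if __name__ == "__main__":
    print(population_sd([4, 8]))             # the {4, 8} example from the slide
    print(pearson_r([1, 2, 3], [2, 4, 6]))   # points on a rising straight line
    print(pearson_r([1, 2, 3], [6, 4, 2]))   # points on a falling straight line
```

Because the same divisor n is used in the standard deviations and in the average of the products, the result is identical to what the sample (n - 1) convention would give.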
22
When will a correlation be positive? Suppose that an X value was
above average, and that the associated Y value was also above
average. Then the product (Xi - Xm)(Yi - Ym) would be the product
of two positive numbers, which would be positive. If the X value
and the Y value were both below average, then the product above
would be of two negative numbers, which would also be positive.
Therefore, a positive correlation is evidence of a general
tendency that large values of X are associated with large values
of Y and small values of X are associated with small values of Y.
By the same reasoning, a correlation will be negative when large
values of X are associated with small values of Y.
23
Studying residue coevolution in proteins (Bioinformatics, 2008,
290)
Coevolving residues in a protein are detected in a two-step
process: (1) a multiple sequence alignment (MSA) of the protein
and its homologs is constructed; (2) a coevolution score is
calculated for each pair of sites in the MSA.

There are two main difficulties in this process. First, it can be
difficult to choose from the large number of scoring functions, as
they exhibit subtle yet significant differences.
Second, coevolution analyses can be confounded by uneven sequence
representation, insufficient evolutionary divergence and the
presence of gaps in the MSA.
A successful coevolution study has to take all these details into
account.
24
Coevolution analysis of protein residues: getting started
A complete analysis of a protein involves the following steps:
1. Data gathering
  • Multiple sequence alignment (MSA): it provides the basic
    information for computing the coevolution signal between site
    pairs.
  • Phylogenetic tree: it is used in tree-based sequence
    weighting. (The phylogenetic tree will be analyzed in our next
    lectures.)
  • Structure: if the crystal structure of the reference protein
    is known, it can be used in the analysis to inspect the
    correlation between the coevolution scores and the
    inter-residue distances.
2. Inter-residue distance calculation: the distance between two
residues is defined as the distance between the two closest atoms
of the residues minus their van der Waals radii.
3. Coevolution score calculation: it is based on the specified
score function and preprocessing methods. By convention, a higher
score implies a stronger coevolution signal.
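The inter-residue distance in step 2 can be sketched as follows. This is one reading of the definition, not the integrated system's own code: find the closest pair of atoms (by centre-to-centre distance) between the two residues, then subtract the van der Waals radii of those two atoms. The residue representation (a list of `(element, (x, y, z))` tuples in ångströms) and the radius table (commonly quoted Bondi values for C, N, O and S) are assumptions made for illustration.

```python
import math

# Assumed van der Waals radii in angstroms (Bondi values); extend as needed.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def inter_residue_distance(residue_a, residue_b):
    """Distance between the closest atom pair minus their van der Waals radii.

    Each residue is a list of (element, (x, y, z)) tuples; this layout is a
    hypothetical one chosen for the example.
    """
    closest = min(
        ((math.dist(pos_a, pos_b), elem_a, elem_b)
         for elem_a, pos_a in residue_a
         for elem_b, pos_b in residue_b),
        key=lambda item: item[0],
    )
    centre_distance, elem_a, elem_b = closest
    return centre_distance - VDW_RADII[elem_a] - VDW_RADII[elem_b]


if __name__ == "__main__":
    res_a = [("C", (0.0, 0.0, 0.0)), ("N", (1.0, 0.0, 0.0))]
    res_b = [("O", (5.0, 0.0, 0.0)), ("C", (9.0, 0.0, 0.0))]
    # Closest pair is N-O at 4.0 angstroms between centres.
    print(inter_residue_distance(res_a, res_b))
```

A value near or below zero means the two atoms are in van der Waals contact.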
25
4. Coevolution score plotting: this step produces scatterplots of
the inter-residue distances against the coevolution scores to give
a rough illustration of the correlation between the two.
5. Coevolution score analysis: this step analyzes the
effectiveness of the coevolution score functions and the
preprocessing methods in predicting the inter-residue distances.
Coevolution score function: For a pair of sites i and j in an MSA,
the correlation score is a weighted correlation of substitution
scores over pairs of sequences, where sikl is the score for
substituting the i-th residue of sequence k by that of sequence l,
the mean and SD are those of the substitution scores at site i,
N is the number of sequences in the MSA and wkl is the weight for
the sequence pair k, l. If the two sites are coevolving, then
radical substitutions at the first site are accompanied by radical
substitutions at the second site, so the correlation is high.
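A correlation score of this kind can be sketched in Python. The exact normalization is not given in the transcript, so the form used here is an assumption: the weighted products of the standardized substitution scores at sites i and j are averaged over all unordered sequence pairs (k, l). The substitution function `toy_substitution_score` is a hypothetical stand-in (1 for identical residues, 0 otherwise) and is not the McLachlan matrix; the default weights wkl = 1 are likewise an assumption.

```python
import math
from itertools import combinations


def toy_substitution_score(a, b):
    """HYPOTHETICAL substitution score: 1 if residues match, else 0."""
    return 1.0 if a == b else 0.0


def site_scores(msa, site, score=toy_substitution_score):
    """Substitution scores s_kl at one site for every sequence pair k < l."""
    return [score(msa[k][site], msa[l][site])
            for k, l in combinations(range(len(msa)), 2)]


def correlation_score(msa, i, j, weights=None, score=toy_substitution_score):
    """Weighted correlation of substitution scores at sites i and j.

    weights, if given, must list w_kl in the same pair order as
    itertools.combinations(range(N), 2); by default every w_kl is 1.
    """
    s_i = site_scores(msa, i, score)
    s_j = site_scores(msa, j, score)
    n_pairs = len(s_i)
    if weights is None:
        weights = [1.0] * n_pairs
    mean_i, mean_j = sum(s_i) / n_pairs, sum(s_j) / n_pairs
    sd_i = math.sqrt(sum((s - mean_i) ** 2 for s in s_i) / n_pairs)
    sd_j = math.sqrt(sum((s - mean_j) ** 2 for s in s_j) / n_pairs)
    if sd_i == 0 or sd_j == 0:
        return 0.0  # a fully conserved site carries no covariation signal
    total = sum(w * (a - mean_i) * (b - mean_j)
                for w, a, b in zip(weights, s_i, s_j))
    return total / (n_pairs * sd_i * sd_j)


if __name__ == "__main__":
    # Toy alignment: site 0 and site 1 change together, site 2 does not follow.
    msa = ["ALL", "ALI", "VIL", "VII"]
    print(correlation_score(msa, 0, 1))
    print(correlation_score(msa, 0, 2))
```

In the toy alignment, sites 0 and 1 always change together and get the maximal score, while site 2 changes independently of site 0 and scores lower.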
26
The calculation uses the classical McLachlan matrix, which scores
substitutions based on the physicochemical properties of the
residues, as well as scores based on residue volume, pI and
hydropathy index.
The isoelectric point (pI): Amino acids which make up proteins may
be positive, negative, neutral or polar in nature, and together
give a protein its overall charge. Proteins are affected by the pH
of their surrounding environment and can become more positively or
negatively charged due to the loss or gain of protons (H+).
pI is the pH at which a molecule carries no net electrical charge.
At a pH below their pI, proteins carry a net positive charge.
Above their pI they carry a net negative charge.
The hydropathy index of an amino acid is a number representing its
hydrophilic or hydrophobic properties. This is a very important
parameter in protein structure.
H.A. Why is the hydropathy index a very important parameter in
protein structure?
27
A little more detail about the matrices of residue substitutions:
Volume change: it assumes that coevolving sites should be
compensatory in terms of volume.
pI change: it assumes that coevolving sites should not only
exhibit correlated substitutions in terms of pI changes, but that
the substitutions should also be compensatory in terms of pI.
Hydropathy index change: it assumes that coevolving sites should
not only exhibit correlated hydropathy index changes, but that the
substitutions should also be compensatory in terms of hydropathy
index.
28
Coevolution score analysis
Protein structure: The protein structure is used to evaluate the
effectiveness of the score functions in predicting the
inter-residue distances. The distance between two residues is
defined as the shortest distance between the two residues with the
corresponding residue numbers.
Sequence filtering
  • Maximum gaps per sequence: if a sequence contains very many
    gaps, it will be removed from the MSA before the calculation
    of coevolution scores.
  • Maximum similarity between two sequences: the similarity
    between two sequences is defined as the fraction of sites at
    which the two sequences have exactly the same non-gap
    residues. For any pair of sequences, if the similarity between
    them is higher than the parameter, one of them will be
    filtered out.
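The two sequence filters can be sketched as a small preprocessing function. This is an illustration under stated assumptions, not the integrated system's implementation: gaps are written as "-", the gap limit is expressed as a fraction of the alignment length, similarity is divided by the total number of sites, and when two sequences are too similar the one that appears later in the MSA is dropped. The threshold values are arbitrary examples.

```python
GAP = "-"


def gap_fraction(sequence):
    """Fraction of alignment positions in the sequence that are gaps."""
    return sequence.count(GAP) / len(sequence)


def similarity(seq_a, seq_b):
    """Fraction of sites where both sequences have the same non-gap residue."""
    same = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != GAP)
    return same / len(seq_a)


def filter_msa(msa, max_gap_fraction=0.2, max_similarity=0.9):
    """Drop gappy sequences, then drop the later one of any too-similar pair."""
    kept = []
    for sequence in msa:
        if gap_fraction(sequence) > max_gap_fraction:
            continue  # too many gaps
        if any(similarity(sequence, other) > max_similarity for other in kept):
            continue  # nearly identical to a sequence already kept
        kept.append(sequence)
    return kept


if __name__ == "__main__":
    msa = ["ACDEFGHIKL", "ACDEFGHIKL", "A--------L", "ACDEYYYYYY"]
    print(filter_msa(msa))
```

Filtering before scoring addresses two of the confounders named earlier: gaps in the MSA and uneven sequence representation.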
29
H.A. Perform a coevolution analysis of protein residues using the
integrated system:
http://coevolution.gersteinlab.org/coevolution/index.jsp
Click "Load example" and explain the results.