Title: Molecular Evolution and Phylogeny
1Molecular Evolution and Phylogeny
2Weakly deleterious mutations
- Weakly deleterious mutations can reach high
frequencies in local populations and, thus, may
contribute significantly to genetic variance in
disease susceptibility.
3Sequencing of human polymorphisms
- A team at Celera Genomics used exon-specific
polymerase chain reaction (PCR) amplification to
sequence 20,362 loci in 20 European Americans,
19 African Americans and one male chimpanzee.
The initial aim was to find novel nonsynonymous
single nucleotide polymorphisms (SNPs), based on
their 2001 build of the human genome.
4Divergence between human and chimpanzee
- A total of 34,099 fixed synonymous differences
between the 39 humans and the chimpanzee yield a
genomic average synonymous divergence of
dS = 1.02%.
- 20,467 non-synonymous differences (dN = 0.242%)
across 11.81 megabases (Mb) of aligned coding DNA.
5Polymorphisms
- 15,750 synonymous and 14,311 non-synonymous SNPs
among the human subjects, yielding average
synonymous and non-synonymous SNP densities of
pS = 0.470% and pN = 0.169%.
6Polymorphisms are more than divergence
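- There is a highly significant excess of amino
acid variation (polymorphism) relative to
divergence: pN/pS (0.169/0.470, about 0.36) is
larger than dN/dS (0.242/1.02, about 0.24).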
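The comparison of polymorphism with divergence can be sketched as a 2x2 table in the style of a McDonald-Kreitman test. The sketch below is in Python rather than Matlab and uses only the counts quoted on the slides. Pooling all loci into one table and using a plain Pearson chi-square are simplifying assumptions for illustration, not the analysis carried out in the original study, and the function names are hypothetical.

```python
# Counts quoted on the slides: fixed human-chimpanzee differences and human SNPs.
fixed_syn, fixed_nonsyn = 34099, 20467
poly_syn, poly_nonsyn = 15750, 14311


def neutrality_index(pn, ps, dn, ds):
    """Return (Pn/Ps) / (Dn/Ds).

    Values above 1 indicate an excess of nonsynonymous polymorphism
    relative to nonsynonymous divergence.
    """
    return (pn / ps) / (dn / ds)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


ni = neutrality_index(poly_nonsyn, poly_syn, fixed_nonsyn, fixed_syn)
chi2 = chi_square_2x2(poly_nonsyn, poly_syn, fixed_nonsyn, fixed_syn)
print(f"neutrality index = {ni:.2f}")
print(f"chi-square = {chi2:.1f} (critical value at p = 0.05, 1 d.f., is 3.84)")
```

A neutrality index above 1 together with a chi-square far above 3.84 is what "a highly significant excess" means in this simplified setting.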
7Can you comment on the following?
- Evolution of human populations since sharing a
last common ancestor with chimps
- Type of nonsynonymous mutations (very deleterious
or mildly deleterious) in human populations
- Positive selection
- Negative selection
- Disease associations?
8Non-neutral evolution
- dN/dS = 1: neutral evolution
- dN/dS > 1: positive selection
- dN/dS < 1: negative selection
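As a minimal sketch of how the three cases could be applied to numbers, the Python function below classifies a dN/dS ratio. The function name and the tolerance around 1 are hypothetical choices made for illustration; in practice a statistical test decides whether a ratio differs from 1.

```python
def classify_dn_ds(dn, ds, tolerance=0.05):
    """Classify a gene by its dN/dS ratio.

    tolerance is an arbitrary, illustrative band around 1 within which
    the ratio is treated as neutral; it is not a statistical test.
    """
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        label = "neutral evolution"
    elif omega > 1.0:
        label = "positive selection"
    else:
        label = "negative selection"
    return omega, label


# Genome-wide averages quoted earlier in the slides (in percent).
omega, label = classify_dn_ds(0.242, 1.02)
print(f"dN/dS = {omega:.3f}: {label}")
```

With the genome-wide averages the ratio is well below 1, so the average coding site is under negative selection, even though individual genes may behave differently.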
9Accelerated evolution of genes
10What makes us a vertebrate?
- Neural crest?
- Highly sophisticated nervous system?
- Bones/cartilage?
- Vertebrate specific genes?
11Origin of bilateria
- Some vertebrate genes date prior to the origin of
bilateria
12Bilateria
- Bilateria: a monophyletic group of metazoan
animals characterized by bilateral symmetry.
13Radial symmetry
- Bilateria excludes the Cnidaria, Ctenophora (sea
gooseberries), Porifera (sponges) and Placozoa.
14A little taxonomy
15Cnidaria
- Cnidaria: a basal phylum with two body layers and
radial symmetry, at the tissue grade of
morphological organization.
- There are two basic morphologies: the sessile
polyp and the swimming medusa or jellyfish.
- The phylum contains four classes; examples
include jellyfish, sea anemones and hydra.
16Body Axis
- Oral-aboral axis: the single obvious body axis of
the two radiate phyla (Cnidaria and
Ctenophora), marked at one end by the mouth or
oral pore.
17Wnts signaling
http://www.stanford.edu/~rnusse/reviews/NaVReviewFinal438747a.pdf
18Wnt Signaling
- In the Wnt signalling pathway, ligand binding
triggers the formation of a receptor complex, and
protein kinases modify the receptor tails,
leading to recruitment of cytoplasmic factors.
- In other signalling pathways, receptor-induced
protein phosphorylation amplifies the signal, and
the receptor-associated kinase acts as a catalyst
for the modification of many substrate molecules.
19Wnt genes
- Mammals have 19 Wnt genes
- The sea anemone has 12
- Nematostella vectensis, a diploblast
Kusserow A, Pang K, Sturm C, Hrouda M, Lentfer J,
Schmidt HA, Technau U, von Haeseler A, Hobmayer B,
Martindale MQ, Holstein TW (2005) Unexpected
complexity of the Wnt gene family in a sea
anemone. Nature 433:156-160.
20Nematostella vectensis
http://www.nematostella.org/
21Phylogenetic tree of wnts
22Expression of wnts
The original bilaterian was equipped with a
fairly elaborate set of molecular tools.
23Endoderm, ectoderm, mesoderm
- For example, the Nematostella ectodermal genes
NvWnt1, NvWnt2, NvWnt4 and NvWnt7 correspond to
the neuroectodermal Wnt genes in the higher
Bilateria.
- NvWnt5, NvWnt6 and NvWnt8 are expressed in the
endoderm, whereas the corresponding genes in
deuterostomes are all expressed in the mesoderm.
24Collagen
- Bone is significantly linked to cartilage, both
in development and evolution, with earlier forms
having a cartilaginous skeleton that is replaced
by bone. In vertebrates, cartilage also contains
threads of collagen running through it.
25Collagen
- Bone is a living tissue that is continually
remodeled. Its mineral matrix is threaded with
fibers of a protein, collagen (mainly type I in
bone), which gives it strength.
26Collagen
- Collagen is an ancient protein (800 million years
ago?).
- There are about 27 different types of collagen in
at least a dozen different classes.
- http://web.indstate.edu/thcme/mwking/extracellularmatrix.html
- One particular type, type II collagen, is the
major matrix protein of cartilage.
27A primitive jawless fish from the late Devonian,
around 370 million years ago. Do lampreys have
collagen?
28Initially it was thought lampreys don't have
collagen
- Zhang et al. screened a library of lamprey
sequences and isolated two forms of collagen II,
Col2a1a and Col2a1b.
- The presence of a collagen homolog related to
human collagen II implies that the gene arose
before the split between lampreys (jawless) and
gnathostomes (true jaws).
- Col2a1 is used in the developing branchial
cartilaginous skeleton.
Zhang G, Miyamoto MM, Cohn MJ (2006) Lamprey
type II collagen and Sox9 reveal an ancient
origin of the vertebrate collagenous skeleton.
Proc Natl Acad Sci U S A, 2006 Feb 21.
29but they do!
30Collagen phylogeny
31Bootstrapping
- The bootstrap is a procedure that involves
choosing random samples with replacement from a
data set and analyzing each sample the same way.
32Bootstrapping
- Sampling with replacement means that every sample
is returned to the data set after sampling. So a
particular data point from the original data set
could appear multiple times in a given bootstrap
sample.
33Bootstrapping
- The number of elements in each bootstrap sample
equals the number of elements in the original
data set. The range of sample estimates we obtain
allows us to establish the uncertainty of the
quantity we are estimating.
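The procedure described on the last three slides can be sketched in a few lines. The example is in Python rather than Matlab, the data values are made up for illustration, and `bootstrap_estimates` is a hypothetical helper name, not a library function.

```python
import random
import statistics


def bootstrap_estimates(data, statistic, n_boot=1000, seed=0):
    """Return n_boot values of statistic, each computed on a sample of
    len(data) points drawn from data with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Each draw picks any original point, so points can repeat.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    return estimates


data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]  # made-up measurements
means = bootstrap_estimates(data, statistics.mean)
print("observed mean:", statistics.mean(data))
print("bootstrap standard error of the mean:", statistics.stdev(means))
```

The spread of the 1000 bootstrap means is the estimate of the uncertainty in the observed mean.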
34Reliability of a tree
- One way to assess the reliability of an estimated
tree is to examine the reliability of each
interior branch.
35Bootstrap
- The reliability of an inferred tree is examined
by using Efron's bootstrap resampling technique.
- A set of nucleotide sites is randomly sampled
with replacement from the original set, and this
random set is used for constructing a new
phylogenetic tree.
- This process is repeated many times, and the
proportion of replications in which a given
sequence cluster appears is computed.
- If this proportion (PB) is high (say, PB > 0.95)
for a sequence cluster, this cluster is
considered to be statistically significant.
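The steps above can be sketched on a toy alignment. This Python example makes strong simplifying assumptions: the four sequences are made up, and instead of building a full tree for each replicate it uses a stand-in criterion, counting a pair as a cluster when the two sequences are closer to each other (by the proportion of differing sites) than either is to any other sequence. All function names are hypothetical.

```python
import random

# Toy alignment: made-up sequences of equal length, no gaps.
alignment = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGAACGTACGT",
    "C": "TCGAACCTAGGTACTTACGA",
    "D": "TCGAACCTAGGTTCTTACGA",
}


def p_distance(s1, s2):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def is_cluster(seqs, x, y):
    """Stand-in for tree building: True if x and y are strictly closer
    to each other than either is to any other sequence."""
    d_xy = p_distance(seqs[x], seqs[y])
    others = [name for name in seqs if name not in (x, y)]
    return all(
        d_xy < p_distance(seqs[x], seqs[o]) and d_xy < p_distance(seqs[y], seqs[o])
        for o in others
    )


def bootstrap_support(seqs, x, y, n_boot=1000, seed=0):
    """Proportion of bootstrap replicates (sites resampled with
    replacement) in which x and y form a cluster."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    hits = 0
    for _ in range(n_boot):
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(s[i] for i in cols) for name, s in seqs.items()}
        hits += is_cluster(resampled, x, y)
    return hits / n_boot


support_ab = bootstrap_support(alignment, "A", "B")
print("PB for cluster (A, B):", support_ab)
```

A real analysis would rebuild the tree with the chosen method (distance, parsimony or likelihood) for every replicate; only the column resampling and the counting step are the same.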
36Bootstrap values
37Bootstrapping
- Open Matlab
- Open Help
- Type bootstrap and read
38Example
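- >> load lawdata          % loads the vectors lsat and gpa
- >> plot(lsat,gpa,'+')    % scatter plot of gpa against lsat
- >> lsline                % add a least-squares line to the plot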
39Plot of lsat vs. gpa
40Calculate correlation between lsat and gpa
- >> rhohat = corrcoef(lsat,gpa);
- >> rhohat
- 1.0000 0.7764
- 0.7764 1.0000
41Is 0.78 significant?
- Now we have a number, 0.7764, describing the
positive connection between LSAT and GPA, but
though 0.7764 may seem large, we still do not
know if it is statistically significant.
42Bootstrp function
- Using the bootstrp function we can resample the
lsat and gpa vectors as many times as we like and
consider the variation in the resulting
correlation coefficients.
43Generate 1000 lsat and gpa vectors by resampling
from the original vectors
- rhos1000 = bootstrp(1000,'corrcoef',lsat,gpa);
- hist(rhos1000(:,2),30)   % column 2 holds the lsat-gpa correlation
44What is the uncertainty associated with the
observed correlation?
- >> mean(rhos1000(:,2))
- ans =
- 0.7711
- >> std(rhos1000(:,2))
- ans =
- 0.1350
- >> 0.1350*1.96
- ans =
- 0.2646
- Approximate 95% interval: mean ± 1.96*std, here
0.7711 ± 0.2646
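For readers without Matlab, the same calculation can be sketched in Python with the standard library only. The 15 data pairs below were typed in by hand and are assumed to match the lsat and gpa vectors in Matlab's lawdata; the helper names are hypothetical. The sketch also reports a percentile interval, which, unlike mean ± 1.96*std, cannot extend beyond 1.

```python
import math
import random

# Law-school data for 15 schools, typed in by hand; assumed to match
# the lsat and gpa vectors loaded by "load lawdata" in Matlab.
lsat = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]
gpa = [3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13,
       3.12, 2.74, 2.76, 2.88, 2.96]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlations(x, y, n_boot=1000, seed=0):
    """Resample (x, y) pairs with replacement and return n_boot correlations."""
    rng = random.Random(seed)
    n = len(x)
    rhos = []
    while len(rhos) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip the (very rare) resample in which one variable is constant.
        if len(set(xs)) > 1 and len(set(ys)) > 1:
            rhos.append(pearson(xs, ys))
    return rhos


rho_hat = pearson(lsat, gpa)
rhos = bootstrap_correlations(lsat, gpa)
mean_rho = sum(rhos) / len(rhos)
std_rho = math.sqrt(sum((r - mean_rho) ** 2 for r in rhos) / (len(rhos) - 1))
ordered = sorted(rhos)
lower, upper = ordered[25], ordered[974]  # approximate 2.5th and 97.5th percentiles

print(f"observed correlation: {rho_hat:.4f}")
print(f"bootstrap mean and std: {mean_rho:.4f}, {std_rho:.4f}")
print(f"normal-approximation interval: {mean_rho - 1.96 * std_rho:.4f} to {mean_rho + 1.96 * std_rho:.4f}")
print(f"percentile interval: {lower:.4f} to {upper:.4f}")
```

The bootstrap mean and standard deviation depend on the random draws, so they will be close to, but not identical with, the Matlab values shown on the slide.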
45You have data on the expression pattern of two
genes
- HOXA1 and CDK6 expression values in different
tissues are collected.
- Open the Excel file named data.xls
- Copy and paste the two numerical data columns
into the workspace, naming the data a, as follows:
- >> a = [   (paste here, then close the bracket with ])
46Calculate the uncertainty associated with the
correlation between the HOXA1 and CDK6 genes
- Plot the expression values (x, HOXA1 and y,
CDK6).
- Place a lsline on the data
- Calculate the correlation coefficient between the
genes
- Generate 1000 bootstrapped samples to estimate
the sample correlation coefficient.
- Determine the 95% confidence interval around the
bootstrapped correlation coefficient.
47Bootstrap of align2.m
- Generate 1000 bootstrapped samples of the
alignment score and its 95% confidence interval
using the bootstrp function.
48Bayesian Inference
- Three basic methods have been used to estimate
phylogeny: distance, maximum parsimony (MP), and
maximum likelihood (ML).
- Bayesian statistics differs in that, in addition
to the current data, prior knowledge is included
in the testing of the hypothesis.
49Medical tests and Bayesian Stats
- Assume that previous studies have evaluated the
accuracy of this test and have shown that, if you
are in fact ill, there is a 99% likelihood that
the test will give a true positive result (and
thus, a 1% likelihood that the test will give a
false negative).
50Medical tests and Bayesian Stats
- It was also found that if you are healthy, there
is a 0.1% likelihood of a false positive result
from the test. If we were simply using the data
(i.e., the test result), we would then conclude
that a positive test result had approximately a
99% chance of being correct.
51Medical tests and Bayesian Stats
- If we were to examine this question in a Bayesian
framework, we could incorporate prior
knowledge, in this case that other studies have
shown that the base rate of this illness is 0.1%
in the population.
- Thus, of a population of 100,000 individuals, 100
would be ill and 99,900 would be healthy.
52Medical tests and Bayesian Stats
- Using the likelihood values mentioned above, we
could conclude that a positive test result would
be seen in 99% of the ill individuals (99 true
positives) and 0.1% of the healthy individuals
(approximately 100 false positives).
53Medical tests and Bayesian Stats
- This leaves us with the conclusion that if a
person has a positive test result, there is a
99/199 or approximately 50% chance that the test
is correct and this person is actually ill.
Therefore, by including prior knowledge of the
base rate of the illness in the population, the
perceived chance that a positive result indicates
that an individual actually has the illness drops
from 99% to 50%.
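The same result follows directly from Bayes' theorem, P(ill | positive) = P(positive | ill) P(ill) / P(positive). A minimal Python sketch using the three rates from the slides is given below; the function and parameter names are hypothetical.

```python
def posterior_ill_given_positive(prevalence, sensitivity, false_positive_rate):
    """Bayes' theorem for a diagnostic test: P(ill | positive result)."""
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Rates from the slides: 0.1% base rate, 99% true positive rate,
# 0.1% false positive rate.
p = posterior_ill_given_positive(
    prevalence=0.001, sensitivity=0.99, false_positive_rate=0.001
)
print(f"P(ill | positive test) = {p:.3f}")

# The same calculation with counts in a population of 100,000.
ill = 100_000 * 0.001                  # 100 ill individuals
true_pos = ill * 0.99                  # 99 true positives
false_pos = (100_000 - ill) * 0.001    # 99.9, about 100 false positives
print(f"from counts: {true_pos / (true_pos + false_pos):.3f}")
```

Both routes give a value just under 0.5, which is the "99/199, approximately 50%" quoted on the slide.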