Title: HW Clarifications
1 HW Clarifications
Identity and Homology
- Homology implies shared ancestry
- Partial sequence identity does not necessarily imply homology
- A high coverage of sequence identity can imply homology
2 HW Clarifications
Insertions and Deletions
3 Prediction of functional/structural sites in a protein using conservation and hyper-variation (ConSeq, ConSurf, Selecton)
4 Empirical findings of conservation variation among sites
Functional/structural sites evolve more slowly than nonfunctional/nonstructural sites
5 Conservation = functional/structural importance
6 Histone 3 protein
7 Alignment pre-pro-insulin
Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
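The pre-pro-insulin alignment above can be summarized as a percent identity. The following is a minimal sketch, assuming identity is counted only over columns where neither sequence has a gap (other conventions exist); the function name `percent_identity` is hypothetical and not part of any tool mentioned in these slides.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of identical residues over alignment columns without a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# The two aligned sequences from the slide, block by block.
xenopus = (
    "MALWMQCLP-LVLVLLFSTPNTEALANQHL"
    "CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ"
    "AQVNGPQDNELDG-MQFQPQEYQKMKRGIV"
    "EQCCHSTCSLFQLENYCN"
)
bos = (
    "MALWTRLRPLLALLALWPPPPARAFVNQHL"
    "CGSHLVEALYLVCGERGFFYTPKARREVEG"
    "PQVG---ALELAGGPGAGGLEGPPQKRGIV"
    "EQCCASVCSLYQLENYCN"
)

print(f"{percent_identity(xenopus, bos):.1f}% identity over ungapped columns")
```

Printing the value for each block separately would show how unevenly identity is distributed along the protein.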
10 Conservation-based inference
- Conserved sites
  - Important for the function or structure
  - Not allowed to mutate
  - Slow-evolving sites = low rate of evolution
- Variable sites
  - Less important (usually)
  - Change more easily
  - Fast-evolving sites = high rate of evolution
11 Detecting conservation: evolutionary rates
- Rate = distance / time
- Distance = number of substitutions per site
- Time = 2 × years (doubled because the sequences evolved independently)
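The rate definition above can be written out as a short calculation. This is only a sketch: the distance and divergence time below are hypothetical illustration values, not measurements from the slides.

```python
def evolutionary_rate(distance: float, divergence_time_years: float) -> float:
    """Rate = distance / (2 * time).

    distance: number of substitutions per site between two sequences.
    divergence_time_years: time since the two lineages split; it is doubled
    because both lineages evolved independently since then.
    """
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2.0 * divergence_time_years)


# Hypothetical values, for illustration only.
d = 0.2      # substitutions per site
t = 100e6    # years since divergence
print(f"{evolutionary_rate(d, t):.2e} substitutions per site per year")
# prints "1.00e-09 substitutions per site per year"
```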
12 Rate computation
MSA
Phylogeny
Evolutionary Model
13 http://conseq.tau.ac.il
Site-specific rate computation tool
14 Locating the active site of pyruvate kinase
Glycolysis pathway
18 Conservation scores
- The scores are standardized: the average score of all residues is 0, and the standard deviation is 1
- Negative values: slowly evolving (= low evolutionary rate), conserved sites
- The most conserved site in the protein has the lowest score
- Positive values: rapidly evolving (= fast evolutionary rate), variable sites
- The most variable site in the protein has the highest score
Scores are relative to the protein and cannot be compared between different proteins!
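The standardization described above amounts to a z-score transform of the per-site rates. The sketch below assumes the population standard deviation is used (whether ConSeq uses the population or sample form is not stated in the slides), and the raw rates are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rates):
    """Shift and scale rates so that their mean is 0 and their SD is 1."""
    mu = mean(rates)
    sigma = pstdev(rates)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all rates are identical; scores are undefined")
    return [(r - mu) / sigma for r in rates]


# Hypothetical per-site evolutionary rates for a five-residue protein.
raw_rates = [0.2, 0.5, 1.0, 1.1, 3.2]
scores = standardize(raw_rates)

# Lowest score = most conserved site, highest score = most variable site.
most_conserved = min(range(len(scores)), key=scores.__getitem__)
most_variable = max(range(len(scores)), key=scores.__getitem__)
print(f"most conserved site index: {most_conserved}")
print(f"most variable site index: {most_variable}")
```

Because the mean and standard deviation come from one protein's own sites, the same raw rate can map to different scores in different proteins, which is why scores are not comparable across proteins.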
21 Combining protein structure
- Each protein has a particular 3D structure that determines its function
- Protein structure is better conserved than protein sequence and more closely related to function
- Analyzing a protein structure is more informative than analyzing its sequence for function inference
22 Conservation in the structure
Protein core: structurally constrained - usually conserved
Active site: functionally constrained - usually conserved
Surface: tolerant to mutations - usually variable
(Figure labels: active site, surface, core)
23 http://consurf.tau.ac.il
Same algorithm as ConSeq, but here the results are projected onto the 3D structure of the protein
24 The structure-function of the potassium channel
(Figure labels: transmembrane region, cytoplasm)
29 ConSeq/ConSurf user intervention (advanced options)
- Choosing the method for calculating the amino-acid conservation scores (Bayesian/Maximum Likelihood)
- Entering your own MSA file
- Performing the MSA using (MUSCLE/CLUSTALW)
- Collecting the homologs from (SWISS-PROT/UniProt)
- Max. number of homologs (50)
- No. of PSI-BLAST iterations (1)
- PSI-BLAST E-value cutoff (0.001)
- Model of substitution for proteins (JTT/Dayhoff/mtREV/cpREV/WAG)
- Entering your own PDB file
- Entering your own TREE file
30 Codon-level selection
- ConSeq/ConSurf
  - Compute the evolutionary rate of amino-acid sites → the data are amino acids
  - Compute only the rate of non-synonymous substitutions
UUU → UUC (Phe → Phe): synonymous
UUU → CUU (Phe → Leu): non-synonymous
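The two codon examples above can be checked mechanically. The sketch below assumes the standard genetic code and builds the codon table from the usual compact 64-letter string; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon: str) -> str:
    """Return the one-letter amino acid for a DNA or RNA codon."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_substitution(before: str, after: str) -> str:
    """Label a codon change as synonymous or non-synonymous."""
    if translate_codon(before) == translate_codon(after):
        return "synonymous"
    return "non-synonymous"


print(classify_substitution("UUU", "UUC"))  # Phe -> Phe
print(classify_substitution("UUU", "CUU"))  # Phe -> Leu
```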
31 Synonymous vs. non-synonymous substitutions
For most proteins, the rate of synonymous substitutions is much higher than the non-synonymous rate.
This is called purifying selection (= conservation in ConSeq/ConSurf).
32 Synonymous vs. non-synonymous substitutions
There are rare cases where the non-synonymous rate is much higher than the synonymous rate.
This is called positive (Darwinian) selection.
33 Positive Selection
- The hypothesis
  - Positive selection promotes the fitness of the organism
- Examples
  - Pathogen proteins evading the host immune system
  - Proteins of the immune system detecting pathogen proteins
  - Pathogen proteins that are drug targets
  - Proteins that are products of gene duplication
  - Proteins involved in the reproductive system
34 Computing synonymous and non-synonymous rates
Phylogeny
Codon MSA
Evolutionary Model
35 Inferring positive selection
- Look at the ratio between the non-synonymous rate (Ka) and the synonymous rate (Ks)
36 Inferring positive selection
- Ka/Ks < 1: purifying selection
- Ka/Ks > 1: positive selection
- Ka/Ks = 1: no selection (neutral)
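The three cases above can be expressed as a small decision rule. This is a minimal sketch: the function name and the `tolerance` parameter are hypothetical, and a real analysis would rely on a statistical test rather than a raw comparison of the ratio with 1.

```python
def selection_regime(ka: float, ks: float, tolerance: float = 1e-9) -> str:
    """Classify a site (or gene) by its Ka/Ks ratio.

    ka: non-synonymous substitution rate.
    ks: synonymous substitution rate (must be positive).
    tolerance: hypothetical margin within which the ratio counts as exactly 1.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


# Hypothetical rate pairs, for illustration only.
print(selection_regime(0.05, 0.50))  # ratio 0.1
print(selection_regime(0.90, 0.30))  # ratio 3.0
print(selection_regime(0.40, 0.40))  # ratio 1.0
```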
37
- Our evolutionary model assumes there is positive selection in the data
- By chance alone we expect our model to find a few sites with Ka/Ks > 1
- Is this really indicative of positive selection or plain randomness?
- Maybe there's no positive selection after all?
38 Solution: statistically compare between hypotheses
- H0: There's no positive selection
- H1: There is positive selection
- Perform a statistical test to accept or reject H0 (likelihood ratio test)
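A likelihood ratio test of H0 against H1 can be sketched as follows. It assumes the two models are nested and that the test statistic is compared with a chi-square distribution; the log-likelihood values and the choice of degrees of freedom below are hypothetical, and this is not necessarily the exact procedure Selecton applies.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")


def likelihood_ratio_test(lnl_h0: float, lnl_h1: float, df: int,
                          alpha: float = 0.05):
    """Return (statistic, p_value, reject_h0) for nested models H0 and H1."""
    statistic = max(0.0, 2.0 * (lnl_h1 - lnl_h0))
    p_value = chi2_sf(statistic, df)
    return statistic, p_value, p_value < alpha


# Hypothetical log-likelihoods: H1 (with positive selection) fits better.
stat, p, reject = likelihood_ratio_test(lnl_h0=-1000.0, lnl_h1=-990.0, df=2)
print(f"statistic = {stat:.1f}, p = {p:.2e}, reject H0: {reject}")
```

If H0 is rejected, the sites with Ka/Ks > 1 are taken as evidence of positive selection rather than chance.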
39 Note: saturation of synonymous substitutions
(Figure labels: Syn., Nonsyn.)
Human and wheat are too evolutionarily remote: synonymous substitutions are saturated.
Pick closer sequences for positive selection analysis.
40 http://selecton.tau.ac.il
41 Selecton input
Codon-level sequences!
- Coding sequences - only ORFs
- No stop codons
- If an MSA is provided it must be codon aligned (RevTrans)
- The user must provide the sequences: no PSI-BLAST option
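The input requirements above can be checked before submitting sequences. The following is a minimal sketch for unaligned coding sequences, assuming the standard stop codons; the function name is hypothetical and this is not Selecton's own validation code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def check_coding_sequence(seq: str) -> list:
    """Return a list of problems found in an unaligned coding sequence."""
    problems = []
    s = seq.upper().replace("U", "T")
    if set(s) - set("ACGT"):
        problems.append("contains characters other than A, C, G, T/U")
    if len(s) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Scan complete codons in frame, starting at the first base.
    for i in range(0, len(s) - len(s) % 3, 3):
        codon = s[i:i + 3]
        if codon in STOP_CODONS:
            problems.append(f"stop codon {codon} at codon {i // 3 + 1}")
    return problems


print(check_coding_sequence("ATGGCCAAA"))  # no problems expected
print(check_coding_sequence("ATGTAAGC"))   # bad length and an in-frame stop
```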
42 Positive selection in the primate TRIM5α
43 Primate TRIM5α
TRIM5α from humans, rhesus monkeys, and African green monkeys are all unable to restrict retroviruses isolated from their own species, yet are able to restrict retroviruses from the other species.
TRIM5α is an important natural barrier to cross-species retrovirus transmission.
TRIM5α is in an antagonistic conflict with the retroviral capsid proteins.
TRIM5α is under positive selection.
44 Positive selection analysis
45 Positive selection analysis in Selecton
(Figure labels: H1, H0)
46 Comparing H0 and H1 in Selecton
47 Comparing H0 and H1 in Selecton
49 Selecton results
51 Results
Human-rhesus swaps at sites 332, 335-340 (SPRY) significantly elevate human resistance to HIV and rhesus resistance to SIV.