1-Month Practical Master Course

1
1-Month Practical Master Course: Genome Analysis
Jaap Heringa
Centre for Integrative Bioinformatics VU (IBIVU)
Vrije Universiteit Amsterdam, The Netherlands
www.ibivu.cs.vu.nl
heringa_at_cs.vu.nl
2
(No Transcript)
3
Biological Sequence Analysis
Pairwise sequence alignment
Residue exchange matrices
Multiple sequence alignment
Phylogeny
4
DNA sequence
.....acctc ctgtgcaaga acatgaaaca nctgtggttc
tcccagatgg gtcctgtccc aggtgcacct gcaggagtcg
ggcccaggac tggggaagcc tccagagctc aaaaccccac
ttggtgacac aactcacaca tgcccacggt gcccagagcc
caaatcttgt gacacacctc ccccgtgccc acggtgccca
gagcccaaat cttgtgacac acctccccca tgcccacggt
gcccagagcc caaatcttgt gacacacctc ccccgtgccc
ccggtgccca gcacctgaac tcttgggagg accgtcagtc
ttcctcttcc ccccaaaacc caaggatacc cttatgattt
cccggacccc tgaggtcacg tgcgtggtgg tggacgtgag
ccacgaagac ccnnnngtcc agttcaagtg gtacgtggac
ggcgtggagg tgcataatgc caagacaaag ctgcgggagg
agcagtacaa cagcacgttc cgtgtggtca gcgtcctcac
cgtcctgcac caggactggc tgaacggcaa ggagtacaag
tgcaaggtct ccaacaaagc aaccaagtca gcctgacctg
cctggtcaaa ggcttctacc ccagcgacat cgccgtggag
tgggagagca atgggcagcc ggagaacaac tacaacacca
cgcctcccat gctggactcc gacggctcct tcttcctcta
cagcaagctc accgtggaca agagcaggtg gcagcagggg
aacatcttct catgctccgt gatgcatgag gctctgcaca
accgctacac gcagaagagc ctctc.....
5
Genome size

Organism                  Number of base pairs
φX-174 virus              5,386
Epstein-Barr virus        172,282
Mycoplasma genitalium     580,000
Haemophilus influenzae    1.8 × 10^6
Yeast (S. cerevisiae)     12.1 × 10^6
Human                     3.2 × 10^9
Wheat                     16 × 10^9
Lilium longiflorum        90 × 10^9
Salamander                100 × 10^9
Amoeba dubia              670 × 10^9

6
Three main principles

DNA makes RNA makes Protein
Structure more conserved than sequence
Sequence → Structure → Function

7
Regulation, signalling cascades, chaperonins,
compartmentalisation
8
How to go from DNA to protein sequence
A piece of double-stranded DNA:
5' attcgttggcaaatcgcccctatccggc 3'
3' taagcaaccgtttagcggggataggccg 5'
DNA direction is from 5' to 3'.
9
How to go from DNA to protein sequence
6-frame translation using the codon table (last lecture):
5' attcgttggcaaatcgcccctatccggc 3'
3' taagcaaccgtttagcggggataggccg 5'
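A 6-frame translation reads three frames on the given strand and three on its reverse complement. The sketch below assumes the standard genetic code (the codon table from the last lecture is not reproduced here) and labels the reverse frames -1 to -3 from the 5' end of the reverse complement, which is one common convention. The function names are hypothetical, and codons containing an ambiguous letter such as n are translated as X.

```python
# Standard genetic code; bases ordered T, C, A, G so that
# codon index = 16*first + 4*second + third.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# Complement each base; N (unknown) stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks a stop, 'X' an unknown codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Frames +1..+3 read the given strand, -1..-3 its reverse complement."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for frame, protein in six_frame_translation("attcgttggcaaatcgcccctatccggc").items():
    print(frame, protein)
```

For the strand on the slide, frame +1 reads IRWQIAPIR; any trailing one or two bases that do not form a full codon are ignored.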
10
Evolution and three-dimensional protein structure information
Isocitrate dehydrogenase: the distance from the active site (in yellow) determines the rate of evolution (red: fast evolution; blue: slow evolution).
Dean, A. M. and G. B. Golding, Pacific Symposium on Biocomputing 2000
11
Protein Sequence-Structure-Function
Sequence → Structure → Function
Sequence → Structure: ab initio prediction and folding; threading
Structure → Function: function prediction from structure
Sequence → Function: homology searching (BLAST)
12
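PSI-BLAST: a widely used tool for homology detection

A heuristic tool that cuts down the computation required for database searching (1M sequences in the database)
Sensitivity is gained iteratively: the hits (local alignments) found in one round are used to build a position-specific scoring matrix (PSSM), and the search is repeated with it

[Figure: iterative search scheme; labels: Q, hits, DB, T, PSSM]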
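The step that makes the iteration work, turning one round's hits into a profile for the next, can be sketched as below. This is a deliberately simplified illustration and not PSI-BLAST's actual procedure: it assumes the hits are already aligned without gaps, uses a uniform background frequency and a fixed pseudocount, and all names and example hits are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def build_pssm(aligned_hits: list, pseudocount: float = 1.0) -> list:
    """One log-odds column per alignment position, from gap-free, equal-length hits."""
    length = len(aligned_hits[0])
    if any(len(hit) != length for hit in aligned_hits):
        raise ValueError("hits must be aligned to the same length")
    pssm = []
    for pos in range(length):
        counts = Counter(hit[pos] for hit in aligned_hits)
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        # log2(observed frequency / background): positive = favoured at this position
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
                     for aa in AMINO_ACIDS})
    return pssm


def score_segment(pssm: list, segment: str) -> float:
    """Sum the position-specific scores of a segment placed against the profile."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


hits = ["MKV", "MKI", "MRV"]  # hypothetical hits from one search round
profile = build_pssm(hits)
print(round(score_segment(profile, "MKV"), 2), round(score_segment(profile, "WWW"), 2))
```

Unlike a single substitution matrix, the profile scores the same residue differently at each position, which is where the extra sensitivity of later rounds comes from.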
13
Threading
[Figure: a query sequence is fitted onto a template structure (shown with its template sequence), and the fit is rated with a compatibility score]
15
Fold recognition by threading
[Figure: the query sequence is threaded through each fold in a library (Fold 1, Fold 2, Fold 3, ..., Fold N), giving one compatibility score per fold]
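Fold recognition comes down to scoring the query against every fold and keeping the best. The sketch below assumes a toy, gapless threading in which each template position is labelled only as buried (B) or exposed (E), and hydrophobic residues are rewarded in buried positions; real threading methods use gapped alignment and far richer environment-specific potentials. The residue set, fold library and scoring rule are hypothetical.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic residue set


def compatibility_score(query: str, environments: str) -> int:
    """Gapless threading: +1 if a residue suits its template environment, -1 otherwise."""
    if len(query) != len(environments):
        raise ValueError("this toy version needs query and template of equal length")
    score = 0
    for residue, env in zip(query, environments):
        # Hydrophobic residues fit buried positions, polar residues fit exposed ones.
        fits = (residue in HYDROPHOBIC) == (env == "B")
        score += 1 if fits else -1
    return score


def recognise_fold(query: str, fold_library: dict) -> tuple:
    """Thread the query through every fold; return the best fold and all scores."""
    scores = {name: compatibility_score(query, envs) for name, envs in fold_library.items()}
    best = max(scores, key=scores.get)
    return best, scores


library = {"Fold 1": "BBEE", "Fold 2": "EEBB", "Fold N": "BEBE"}
best_fold, all_scores = recognise_fold("LIKD", library)
print(best_fold, all_scores)
```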
16
Bioinformatics

"Nothing in biology makes sense except in the light of evolution" (Theodosius Dobzhansky, 1900-1975)
Nothing in bioinformatics makes sense except in the light of biology

17
Divergent evolution

Ancestral sequence: ABCD
ACCD (mutation: B → C)
ABD (deletion: C → ø)
Pairwise alignment, two possibilities:
ACCD      ACCD
AB-D  or  A-BD
18
Divergent evolution
Of the two possible alignments, ACCD over AB-D is the true alignment: its columns follow the actual history (B mutated to C, and C was deleted).
19
Mutations under divergent evolution
[Figure: an ancestral sequence diverging into sequence 1 and sequence 2, with four scenarios at a single site:
(a) one substitution, one visible
(b) two substitutions, one visible
(c) back mutation, not visible
(d) two substitutions, none visible]
Sequence 1: ACCTGTAATC
Sequence 2: ACGTGCGATC
D = 3/10 (fraction of differing sites, i.e. nucleotides)
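The distance D above, the fraction of differing sites (often called the p-distance), can be computed directly. This is a minimal sketch with a hypothetical function name. Because it counts only visible differences, scenarios (b) to (d) mean it can underestimate the number of substitutions that actually occurred.

```python
from fractions import Fraction


def p_distance(seq1: str, seq2: str) -> Fraction:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return Fraction(differences, len(seq1))


print(p_distance("ACCTGTAATC", "ACGTGCGATC"))  # 3/10
```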
20
Convergent evolution

Often involves shorter motifs (e.g. active sites)
The motif (function) has evolved independently more than once, e.g. starting from two very different sequences that adopt different folds
The sequences and associated structures remain different, but the (functional) motif can become identical
Classical example: the serine proteinases subtilisin and chymotrypsin

21
Serine proteinase (subtilisin) and chymotrypsin

Different evolutionary origins; no sequence similarity
Similarities in the reaction mechanisms: chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common. Serine acts as the nucleophile, histidine as a general base, and aspartate orients the histidine and stabilises its protonated form.
The geometric orientations of the catalytic residues are similar between families, despite different protein folds.
The linear arrangements of the catalytic residues reflect different family relationships. For example, the catalytic triad is ordered HDS in the chymotrypsin clan (SA), DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).

22
A protein sequence alignment
MSTGAVLIY--TSILIKECHAMPAGNE-----
---GGILLFHRTHELIKESHAMANDEGGSNNS
A DNA sequence alignment
attcgttggcaaatcgcccctatccggccttaa
att---tggcggatcg-cctctacgggcc----

23
What can sequence tell us about structure? (HSSP)
Sander & Schneider, 1991
24
Searching for similarities
What is the function of the new gene?
The lazy investigation (i.e., no biological experiments, just bioinformatics techniques):
Find a set of protein sequences similar to the unknown sequence
Identify similarities and differences
For long proteins, identify domains first
25

Evolutionary and functional relationships
Reconstruct evolutionary relation:
Based on sequence
-Identity (simplest method)
-Similarity
Homology (common ancestry: the ultimate goal)
Other (e.g., 3D structure)
Functional relation:
Sequence → Structure → Function
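Identity, the simplest of these measures, can be read off a pairwise alignment. Definitions differ in how gap columns are handled; this sketch assumes identity is computed over the columns where neither sequence has a gap, and the function name is hypothetical.

```python
def percent_identity(row1: str, row2: str) -> float:
    """Percentage of identical residues among the ungapped columns of a pairwise alignment."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    aligned = [(x, y) for x, y in zip(row1.upper(), row2.upper())
               if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    identical = sum(1 for x, y in aligned if x == y)
    return 100.0 * identical / len(aligned)


print(percent_identity("attcgttggcaaatcgcccctatccggccttaa",
                       "att---tggcggatcg-cctctacgggcc----"))
```

Counting gap columns in the denominator instead would give a lower figure for the same alignment, so it is worth stating which convention is used when reporting identity.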

26
Searching for similarities
Common ancestry is more interesting: it makes it more likely that genes share the same function.
Homology (sharing a common ancestor) is a binary property (yes/no), and it is a nice tool: when an unknown gene X is homologous to a known gene G, we gain a lot of information on X, because what we know about G can be transferred to X as a good suggestion.
27
Biological definitions for related sequences

Homologues are similar sequences that have been derived from a common ancestor sequence. Homologues can be described as either orthologues or paralogues.
Orthologues are similar sequences in two different organisms that have arisen due to a speciation event. Orthologues typically retain identical or similar functionality throughout evolution.
Paralogues are similar sequences within a single organism that have arisen due to a gene duplication event.
Xenologues are similar sequences whose relationship stems from horizontal transfer events (through symbiosis, viruses, etc.) rather than from vertical descent.

28
How to evolve

Important distinction:
Orthologues: homologous proteins in different species (all deriving from the same ancestor)
Paralogues: homologous proteins in the same species (internal gene duplication)
In practice, to recognise orthology, the bi-directional best hit is used in conjunction with a database search program (this is called an operational definition)
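The bi-directional (reciprocal) best hit test can be sketched as follows, assuming the all-against-all search scores between the genes of two species are already available as nested dictionaries. The gene names and scores are hypothetical; in practice they would come from a database search program.

```python
def best_hits(scores: dict) -> dict:
    """For every query gene, pick the subject gene with the highest search score."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b: dict, b_vs_a: dict) -> list:
    """Pairs (a, b): b is a's best hit in species B and a is b's best hit in species A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical search scores between genes of species A and species B.
a_vs_b = {"geneA1": {"geneB1": 250, "geneB2": 90},
          "geneA2": {"geneB1": 120, "geneB2": 60}}
b_vs_a = {"geneB1": {"geneA1": 245, "geneA2": 118},
          "geneB2": {"geneA1": 88, "geneA2": 59}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('geneA1', 'geneB1')]
```

Here geneA2's best hit is geneB1, but geneB1's best hit is geneA1, so geneA2 (for example a paralogue of geneA1) is not reported as an orthologue.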

29
So this means
Source: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html
30
Example today: pairwise sequence alignment needs a sense of evolution
Global dynamic programming
Sequence 1: MDAGSTVILCFVG
Sequence 2: MDAASTILCGS (related to sequence 1 by evolution)
Inputs: amino acid exchange matrix; gap penalties (open, extension)
Search matrix → alignment:
MDAGSTVILCFVG-
MDAAST-ILC--GS
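Filling the search matrix for a global alignment with affine gap penalties can be sketched with three matrices (a Gotoh-style recursion, named here as one standard technique rather than the course's exact method). The sketch assumes gp(k) = open + k × extension and a caller-supplied exchange score; it returns only the optimal score, and a traceback through the matrices would recover an alignment such as the one above. The ±1 scorer is a hypothetical stand-in for a real amino acid exchange matrix.

```python
import math
from typing import Callable

NEG_INF = -math.inf


def global_affine_score(a: str, b: str, score: Callable[[str, str], float],
                        gap_open: float, gap_extend: float) -> float:
    """Optimal global alignment score with gap penalty gp(k) = gap_open + k * gap_extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + i * gap_extend)  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + j * gap_extend)  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open - gap_extend,  # open a new gap
                          X[i - 1][j] - gap_extend,             # extend the gap
                          Y[i - 1][j] - gap_open - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open - gap_extend,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


def simple_score(x: str, y: str) -> int:
    """Hypothetical stand-in for an exchange matrix: +1 match, -1 mismatch."""
    return 1 if x == y else -1


print(global_affine_score("MDAGSTVILCFVG", "MDAASTILCGS", simple_score,
                          gap_open=2, gap_extend=1))
```

Keeping a separate matrix for "currently in a gap" is what lets the extension cost be charged instead of a fresh opening cost, while the run time stays proportional to n × m.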
31
How to determine similarity
Frequent evolutionary events at the DNA level:
1. Substitution
2. Insertion, deletion
3. Duplication
4. Inversion
We will restrict ourselves to these events.
32
Nucleotide one-letter code: a DNA sequence alignment
attcgttggcaaatcgcccctatccggccttaa
att---tggcggatcg-cctctacgggcc----
Amino acid one-letter code: a protein sequence alignment
MSTGAVLIY--TSILIKECHAMPAGNE-----
---GGILLFHRTHELIKESHAMANDEGGSNNS
33
Dynamic programming: scoring alignments
Substitution (or match/mismatch) scores: DNA, proteins
Gap penalty $gp(k)$ for a gap of length $k$:
Linear: $gp(k) = ak$
Affine: $gp(k) = b + ak$
Concave, e.g. $gp(k) = \log(k)$
The score for an alignment is the sum of the scores over all alignment columns.
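The three gap penalty shapes can be compared side by side. In this sketch the default values of a and b are hypothetical, and the concave penalty assumes the natural logarithm, since the base is not specified.

```python
import math


def linear_gap(k: int, a: float = 1.0) -> float:
    """gp(k) = a*k: every gap position costs the same."""
    return a * k


def affine_gap(k: int, b: float = 10.0, a: float = 1.0) -> float:
    """gp(k) = b + a*k: a one-off opening cost b plus a per-position cost a."""
    return b + a * k


def concave_gap(k: int) -> float:
    """gp(k) = log(k): each additional gap position costs less than the previous one."""
    return math.log(k)


print(" k  linear  affine  concave")
for k in range(1, 6):
    print(f"{k:2d}  {linear_gap(k):6.1f}  {affine_gap(k):6.1f}  {concave_gap(k):7.3f}")
```

With an affine or concave penalty, one long gap is cheaper than several short gaps of the same total length, which matches the idea that a single insertion or deletion event can remove several residues at once.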
34
Dynamic programming: scoring alignments
$S = \sum S_{a,b} - \sum gp(k)$, with affine gap penalties $gp(k) = \text{gapinit} + k \cdot \text{gapextension}$
35
DNA: define a score for match/mismatch of letters

Simple:
      A    C    G    T
A     1   -1   -1   -1
C    -1    1   -1   -1
G    -1   -1    1   -1
T    -1   -1   -1    1

Used in genome alignments:
        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91
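Both DNA matrices can be stored as lookup tables and used to score an ungapped alignment column by column. The values below are copied from the two tables on this slide; the helper names are hypothetical.

```python
BASES = "ACGT"

# Rows and columns in the order A, C, G, T.
SIMPLE = [[1, -1, -1, -1],
          [-1, 1, -1, -1],
          [-1, -1, 1, -1],
          [-1, -1, -1, 1]]
GENOME = [[91, -114, -31, -123],
          [-114, 100, -125, -31],
          [-31, -125, 100, -114],
          [-123, -31, -114, 91]]


def as_lookup(rows: list) -> dict:
    """Turn a 4x4 table into a {(base1, base2): score} dictionary."""
    return {(x, y): rows[i][j] for i, x in enumerate(BASES) for j, y in enumerate(BASES)}


def ungapped_score(seq1: str, seq2: str, matrix: dict) -> int:
    """Sum of column scores for two equal-length, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(matrix[(x, y)] for x, y in zip(seq1.upper(), seq2.upper()))


simple, genome = as_lookup(SIMPLE), as_lookup(GENOME)
print(ungapped_score("ACGT", "ACGA", simple))  # 2
print(ungapped_score("ACGT", "ACGA", genome))  # 168
```

In the genome-alignment matrix, A/G and C/T mismatches (transitions) cost only -31, far less than the other mismatches, whereas the simple matrix treats all mismatches alike.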
36
Dynamic programming: scoring alignments
T D W V T A L K
T D W L - - I K
Amino acid exchange matrix (20 × 20); affine gap penalties (open, extension), shown in the figure as 10 and 1
$Score = s(T,T) + s(D,D) + s(W,W) + s(V,L) - P_o - 2P_x + s(L,I) + s(K,K)$
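The score of this alignment can be computed by summing column scores and charging each gap run gp(k) = Po + k·Px, so the two-position gap costs Po + 2Px. The sketch assumes BLOSUM62 as the exchange matrix (only the six entries needed are included) and Po = 10, Px = 1 as read from the figure; the helper names are hypothetical.

```python
# Only the BLOSUM62 entries this example needs (assumed choice of exchange matrix).
BLOSUM62_SUBSET = {
    ("T", "T"): 5, ("D", "D"): 6, ("W", "W"): 11,
    ("V", "L"): 1, ("L", "I"): 2, ("K", "K"): 5,
}
GAP_OPEN, GAP_EXTEND = 10, 1  # Po and Px (assumed from the figure)


def exchange(x: str, y: str) -> int:
    """Symmetric lookup; raises KeyError for pairs not listed in the subset."""
    if (x, y) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(x, y)]
    return BLOSUM62_SUBSET[(y, x)]


def alignment_score(top: str, bottom: str) -> int:
    """Sum exchange scores over aligned columns; subtract Po + k*Px per gap run of length k."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total, i = 0, 0
    while i < len(top):
        if top[i] != "-" and bottom[i] != "-":
            total += exchange(top[i], bottom[i])
            i += 1
            continue
        gapped = top if top[i] == "-" else bottom  # row that carries this gap
        k = 0
        while i < len(top) and gapped[i] == "-":
            k += 1
            i += 1
        total -= GAP_OPEN + k * GAP_EXTEND
    return total


print(alignment_score("TDWVTALK", "TDWL--IK"))  # 18
```

Term by term this is 5 + 6 + 11 + 1 - (10 + 2) + 2 + 5 = 18, mirroring the formula on the slide.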
