Title: Computational methods in biochemistry
1Computational methods in biochemistry
2Contents
- Protein Structure and Dynamics
- Bioinformatics
- Comparative modeling
- Other methods
3Protein structure and dynamics
- Time scale in biological phenomena
- Newtonian mechanics
- Force field
- CHARMM
- AMBER
- Energy minimization
- Molecular Dynamics
- Example
4Time scale in biological phenomena
(Figure: time-scale axis for biological phenomena, running from fs (10^-15 s) through ps, ns, µs, ms and s up to hr)
6Force field
- ??? ???? ? ???? ??-????? ???? ??.
- ? ?? ??? ??? ???? ?? ???? ???? ?? ????? ?????
???? ??? ??.
7Newtonian mechanics
- $F = ma$
- $v = v_0 + at = f(t)$
- $s = v_0 t + \frac{1}{2} a t^2 = g(t)$
- $E = \frac{1}{2} m v^2$
?? ???? ??? ??? ??? ??? ??, ???? ???
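The constant-acceleration relations on this slide can be checked numerically. The sketch below is only an illustration: the function names and the sample values are assumptions, not part of the slides. It also steps the same motion forward in small time increments, which is the basic idea a molecular dynamics integrator builds on.

```python
def velocity(v0, a, t):
    """v = v0 + a*t for constant acceleration a."""
    return v0 + a * t


def displacement(v0, a, t):
    """s = v0*t + a*t^2/2 for constant acceleration a."""
    return v0 * t + 0.5 * a * t ** 2


def kinetic_energy(m, v):
    """E = m*v^2/2."""
    return 0.5 * m * v ** 2


def integrate(v0, a, t_total, n_steps):
    """Advance position and velocity in n_steps small time steps.

    With constant acceleration this reproduces the closed-form result;
    in a real simulation the acceleration would be recomputed from the
    forces at every step.
    """
    dt = t_total / n_steps
    s, v = 0.0, v0
    for _ in range(n_steps):
        s += v * dt + 0.5 * a * dt ** 2
        v += a * dt
    return s, v


if __name__ == "__main__":
    # Hypothetical values: v0 = 1 m/s, a = 2 m/s^2, t = 3 s, m = 2 kg
    print(velocity(1.0, 2.0, 3.0))        # 7.0
    print(displacement(1.0, 2.0, 3.0))    # 12.0
    print(kinetic_energy(2.0, 7.0))       # 49.0
    print(integrate(1.0, 2.0, 3.0, 1000))
```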
11Energy minimization
12Energy minimization
??? ???!!
13Molecular Dynamics
14Molecular Dynamics
15CHemistry at HARvard Macromolecular Mechanics
- CHARMm forcefields
- CHARMm, which derives from CHARMM (CHemistry at
HARvard Macromolecular Mechanics), is a highly
flexible molecular mechanics and dynamics program
originally developed in the laboratory of Dr.
Martin Karplus at Harvard University. It was
parameterized on the basis of ab initio energies
and geometries of small organic models.
- Applicability
- CHARMm performs well over a broad range of
calculations and simulations, including
calculation of geometries, interaction and
conformation energies, local minima, barriers to
rotation, time-dependent dynamic behavior, free
energy, and vibrational frequencies (Momany &
Rone, 1992). CHARMm is designed to give good (but
not necessarily "the best") results for a wide
variety of modelled systems, from isolated small
molecules to solvated complexes of large
biological macromolecules; however, it is not
applicable to organometallic complexes.
16Assisted Model Building with Energy Refinement
- AMBER forcefield
- The standard AMBER forcefield (Weiner et al.
1984, 1986) is parameterized to small organic
constituents of proteins and nucleic acids. Only
experimental data were used in parameterization.
- However, AMBER has been widely used not only for
proteins and DNA, but also for many other classes
of models, such as polymers and small molecules.
For the latter classes of models, various authors
have added parameters and extended AMBER in other
ways to suit their calculations. The AMBER
forcefield has also been made specifically
applicable to polysaccharides (Homans 1990, and
see Homans' carbohydrate forcefield).
- AMBER is used mainly for modeling proteins and
nucleic acids. It is generally lower in accuracy
and has a limited range of applicability. The use
of AMBER is recommended mainly for those
customers who are familiar with AMBER and have
developed their own AMBER-specific parameters. It
generally gives reasonable results for gas-phase
model geometries, conformational energies,
vibrational frequencies, and solvation free
energies.
17Application
- protein motion
- protein folding
- enzyme mechanism
- model optimization
18In silico protein folding
1 µs = 1,000,000,000 fs (or steps); 644 steps/sec on a
256-CPU CRAY machine
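The figures on this slide imply a wall-clock cost that is easy to work out. The short calculation below uses only the two numbers quoted above and the slide's own equivalence of one step per femtosecond; the function and variable names are illustrative.

```python
SIM_STEPS = 1_000_000_000   # 1 microsecond of simulated time at 1 fs per step
STEPS_PER_SEC = 644         # throughput quoted on the slide
SECONDS_PER_DAY = 86_400


def wall_clock_days(n_steps, steps_per_sec):
    """Days of computing needed to run n_steps at the given throughput."""
    return n_steps / steps_per_sec / SECONDS_PER_DAY


days = wall_clock_days(SIM_STEPS, STEPS_PER_SEC)
print(f"{days:.1f} days")   # roughly 18 days for a single microsecond
```

At that rate, a single microsecond of simulated folding takes about two and a half weeks of continuous running.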
19Simulation of the travel of potassium
21Bioinformatics
- Introduction
- Sequence alignment
- Pairwise sequence alignment
- BLAST
- Multiple sequence alignment
- CLUSTALW
- T-COFFEE
- Scoring matrix
- Structure Alignment
- Example
22Pairwise alignment
- Smith-Waterman Algorithm
- BLAST: heuristic local alignment
- FASTA: heuristic local alignment
23Smith-Waterman Algorithm
Align S1 = ATCTCGTATGATG, S2 = GTCTATCAC
(Figure: dynamic-programming score matrix for S1 and S2, annotated "-1, -1"; cell scores start at 0 and reach a maximum of 10)
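A minimal sketch of the Smith-Waterman recurrence, applied to S1 and S2 from this slide. The scoring scheme is an assumption: match +2, mismatch -1 and a linear gap penalty of -1 are inferred from the cell values on the slide rather than stated there, and the function name is illustrative.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned s1 segment, aligned s2 segment)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # H[i][j] = best score of a local alignment ending at s1[i-1], s2[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # align the two residues
                          H[i - 1][j] + gap,       # gap in s2
                          H[i][j - 1] + gap)       # gap in s1
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest cell until a zero cell is reached.
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = smith_waterman("ATCTCGTATGATG", "GTCTATCAC")
print(score)    # 10
print(top)
print(bottom)
```

Resetting negative cells to 0 is what makes the alignment local: a poor region cannot drag down a good one that starts after it.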
24BLAST
- Basic Local Alignment Search Tool
- Altschul, S.F., Gish, W., Miller, W.,
- Myers, E.W. & Lipman, D.J.
- Journal of Molecular Biology
- v. 215, 1990, pp. 403-410
- Used to search sequence databases for local
alignments to a query
25BLAST algorithm
- Keyword search of all words of length w from the
query of length n in a database of length m,
keeping words with score above a threshold
- w = 11 for nucleotide queries, 3 for proteins
- Do local alignment extension for each found
keyword
- Extend result until longest match above threshold
is achieved
- Running time O(nm)
26BLAST algorithm (contd)
Query: KRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKIFLENVIRD
Keyword: GVK
Neighborhood words: GVK 18, GAK 16, GIK 16, GGK 14, GLK 13, GNK 12, GRK 11, GEK 11, GDK 11
Neighborhood score threshold (T = 13)
Extension:
Query  22 VLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLK 60
             DN  G     IR L    G K I  L  E  RG  K
Sbjct 226 IIKDNGRGFSGKQIRNLNYGIGLKVIADLV-EKHRGIIK 263
High-scoring Pair (HSP)
27Original BLAST
- Dictionary
- All words of length w
- Alignment
- Ungapped extensions until score falls below some
threshold
- Output
- All local alignments with score > statistical
threshold
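The dictionary, extension and output steps can be sketched as follows. This is a simplified illustration, not the BLAST implementation: it uses exact word matches only, a +1/-1 scoring scheme and a fixed drop-off of 2, all of which are assumptions, and every function name is hypothetical. The two strings come from the w = 4 example that follows; db is that example's second sequence written in reverse order, the orientation in which it contains the word GGTC.

```python
def build_dictionary(query, w):
    """Map every word of length w in the query to its start positions."""
    words = {}
    for i in range(len(query) - w + 1):
        words.setdefault(query[i:i + w], []).append(i)
    return words


def extend_one_side(query, db, q, d, step, match, mismatch, x_drop):
    """Walk along the diagonal in one direction without gaps.

    Returns (score gained, number of positions kept) for the best
    extension; stops once the score falls x_drop below the best seen.
    """
    gain = best_gain = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= d < len(db):
        gain += match if query[q] == db[d] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain >= x_drop:
            break
        q += step
        d += step
    return best_gain, best_len


def seed_and_extend(query, db, w=4, match=1, mismatch=-1, x_drop=2):
    """Find exact word hits and extend each one in both directions."""
    words = build_dictionary(query, w)
    hits = set()
    for d in range(len(db) - w + 1):
        for q in words.get(db[d:d + w], []):
            r_gain, r_len = extend_one_side(
                query, db, q + w, d + w, 1, match, mismatch, x_drop)
            l_gain, l_len = extend_one_side(
                query, db, q - 1, d - 1, -1, match, mismatch, x_drop)
            score = w * match + l_gain + r_gain
            hits.add((score,
                      query[q - l_len:q + w + r_len],
                      db[d - l_len:d + w + r_len]))
    return sorted(hits, reverse=True)


result = seed_and_extend("ACGAAGTAAGGTCCAGT", "AGCGTTAGGTCCTAGTC", w=4)
print(result)   # [(7, 'GTAAGGTCC', 'GTTAGGTCC')]
```

Several overlapping words on the same diagonal extend to the same segment pair, which is why the hits are collected in a set before sorting.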
28Original BLAST Example
(Figure: dot matrix comparing A C G A A G T A A G G T C C A G T with C T G A T C C T G G A T T G C G A, one sequence per axis)
- w = 4
- Exact keyword match of GGTC
- Extend diagonals with mismatches until score is
under 50%
- Output result
- GTAAGGTCC
- GTTAGGTCC
From lectures by Serafim Batzoglou (Stanford)
29ClustalW
- Popular multiple alignment tool today
- Several heuristics to improve accuracy
- Sequences are weighted by relatedness
- Scoring matrix can be chosen on the fly
- Position-specific gap penalties
30ClustalW (contd)
- Often used for protein alignment
- W stands for weighted
- Different parts of alignment are weighted.
- Position/residue specific gap penalties.
- Three-step process
- 1.) Pairwise alignment
- 2.) Build Guide Tree
- 3.) Progressive Alignment
31Step 1 Pairwise Alignment
- Aligns each sequence against every other, giving a
distance matrix
- Distance = exact matches / sequence length
(percent identity)
(.17 means 17% identical)
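The measure described here can be sketched for sequences that have already been aligned pairwise. The toy sequences and function names below are assumptions for illustration. Note also that a tree-building step would typically work with 1 minus this identity, so that similar sequences end up close together.

```python
def percent_identity(a, b):
    """Exact matches divided by alignment length for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def identity_matrix(seqs):
    """All-against-all identities, keyed by (name, name)."""
    return {(p, q): percent_identity(seqs[p], seqs[q])
            for p in seqs for q in seqs}


# Hypothetical aligned toy sequences
toy = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "TCGAAC"}
matrix = identity_matrix(toy)
for (p, q), value in matrix.items():
    print(p, q, round(value, 2))
```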
32Step 2 Guide Tree
- Create Guide Tree using the distance matrix
- ClustalW uses the neighbor-joining method
- Guide tree roughly reflects evolutionary relations
33Step 2 Guide Tree (contd)
(Figure: guide tree with leaves in the order S1, S3, S4, S2)
Calculate:
s1,3 = consensus(s1, s3)
s1,3,4 = consensus((s1,3), s4)
s1,2,3,4 = consensus((s1,3,4), s2)
34Step 3 Progressive Alignment
- Align the two most similar sequences
- Following the guide tree, add in the next
sequences, aligning to the existing alignment
- Insert gaps as necessary
Sample output
FOS_RAT     PEEMSVTS-LDLTGGLPEATTPESEEAFTLPLLNDPEPK-PSLEPVKNISNMELKAEPFD
FOS_MOUSE   PEEMSVAS-LDLTGGLPEASTPESEEAFTLPLLNDPEPK-PSLEPVKSISNVELKAEPFD
FOS_CHICK   SEELAAATALDLG----APSPAAAEEAFALPLMTEAPPAVPPKEPSG--SGLELKAEPFD
FOSB_MOUSE  PGPGPLAEVRDLPG-----STSAKEDGFGWLLPPPPPPP-----------------LPFQ
FOSB_HUMAN  PGPGPLAEVRDLPG-----SAPAKEDGFSWLLPPPPPPP-----------------LPFQ
Dots and stars show how well-conserved a column
is.
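A simplified sketch of how such a conservation line can be derived from the sample alignment. It marks only fully conserved columns with a star; the marks for partially conserved columns depend on residue similarity groups and are left out here. The function name is illustrative.

```python
def conservation_line(aligned):
    """Return '*' under every column where all sequences share one residue."""
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        fully_conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if fully_conserved else " ")
    return "".join(marks)


alignment = {
    "FOS_RAT":    "PEEMSVTS-LDLTGGLPEATTPESEEAFTLPLLNDPEPK-PSLEPVKNISNMELKAEPFD",
    "FOS_MOUSE":  "PEEMSVAS-LDLTGGLPEASTPESEEAFTLPLLNDPEPK-PSLEPVKSISNVELKAEPFD",
    "FOS_CHICK":  "SEELAAATALDLG----APSPAAAEEAFALPLMTEAPPAVPPKEPSG--SGLELKAEPFD",
    "FOSB_MOUSE": "PGPGPLAEVRDLPG-----STSAKEDGFGWLLPPPPPPP-----------------LPFQ",
    "FOSB_HUMAN": "PGPGPLAEVRDLPG-----SAPAKEDGFSWLLPPPPPPP-----------------LPFQ",
}

for name, seq in alignment.items():
    print(f"{name:<12}{seq}")
print(" " * 12 + conservation_line(list(alignment.values())))
```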
35Scoring Matrix
36PAM
- Percentage of Acceptable point Mutations per 10^8
years
- ?? ????? ??? ?????? ?? ? ?? ??? ???? score ??
- PAM matrices are based on global alignments of
closely related proteins. The PAM 1 is the matrix
calculated from comparisons of sequences with no
more than 1% divergence. Scores are derived from
a mutation probability matrix where each element
gives the probability of the amino acid in column
X mutating to the amino acid in row Y after a
particular evolutionary time, for example after 1
PAM, or 1% divergence. A PAM matrix is specific
for a particular evolutionary distance, but may
be used to generate matrices for greater
evolutionary distances by multiplying it
repeatedly by itself. However, at large
evolutionary distances the information present in
the matrix is essentially degenerated. It is rare
that a PAM matrix would be used for an
evolutionary distance any greater than 256 PAMs.
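The repeated multiplication can be illustrated on a toy case. The 3-letter alphabet and the probabilities below are assumptions chosen so that each residue has a 1% chance of changing per step; a real PAM 1 matrix is 20 x 20 and estimated from data. Following the slide's convention, element [row][column] is the probability that the column residue mutates to the row residue, so each column sums to 1.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Multiply m by itself n times, starting from the identity matrix."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical "PAM 1" for a 3-letter alphabet: 99% unchanged, 1% mutated
toy_pam1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

toy_pam250 = mat_pow(toy_pam1, 250)
for row in toy_pam250:
    print([round(p, 4) for p in row])
```

After 250 steps every entry of the toy matrix is close to 1/3, which is the loss of information at large evolutionary distances that the slide describes.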
37BLOSUM
- Local alignment? ???? ?? ??
- BLOcks SUbstitution Matrix
- ????? ??? ???? ?? ???? ? ??? ???? ??? ????
scoring matrix??
- BLOSUM 62? ??? 62 ??? ???? ??? ??? ?
38Position Specific Scoring Matrix
- ??? ????? ?? ????? ???? ?? ????? ?? ??? ??????
??? ??? - PSI-BLAST?? ???? ??
- ???? ???? ??? ??? ???? ?? ????? ??
39Homology/Comparative modeling
- Introduction
- Method
- Example
40Introduction
- ??? ??? ?? ???? ??? ??? ??? ??.
- Ex) hemoglobin/myoglobin, ubiquitin/ubiquitin-like
proteins, serine proteases,
thioredoxin/glutaredoxin
41Method
- 30 ??? homology? ?? ??? ? ??? ?? ? ??
- Pairwise or multiple sequence alignment
- Alignment? ???? ??? ???? distance constraint??.
- Model ???
42Example: Modeling of malonyl-CoA synthetase
43Firefly luciferase
Malonyl-CoA synthetase
44Other Methods
- Simulated Annealing
- Monte Carlo method
- Docking