Title: Proteins are long chains of amino acids (AA)
1Introduction
- Proteins are long chains of amino acids (AAs)
- There are 20 different AAs
2(No Transcript)
3(No Transcript)
4Structure and Function
5Experimental methods
- Crystallography
- Performed by X-ray diffraction and neutron diffraction
- Nuclear Magnetic Resonance (NMR)
- Very expensive and time-consuming
6Methods for Protein folding
- Homology modeling
- When one can find a protein of known structure with good sequence similarity (over 30%) to the protein we wish to fold
- Protein threading
- Less conclusive similarity
- Ab initio
- No homology is available
7Protein threading
- Form a database of known folds
- Given a sequence, find the most likely structure from the database
- Thread the sequence through the structure
8If we have an efficient threading algorithm
- Form a database of known folds
- Given a sequence, thread this new sequence through all the models in the library
- See which one does best
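The loop described on this slide can be sketched as follows. This is a minimal illustration, not a real threading program: `best_fold`, `toy_thread`, and the three-entry `library` are hypothetical names, and the scorer simply counts identical residues, where a real method would align the sequence to a structural model.

```python
from typing import Callable, Dict, Tuple

def best_fold(sequence: str,
              library: Dict[str, str],
              thread: Callable[[str, str], float]) -> Tuple[str, float]:
    """Thread `sequence` through every model and return the best (name, score)."""
    scored = {name: thread(sequence, model) for name, model in library.items()}
    best = max(scored, key=scored.get)  # assumes a higher score is better
    return best, scored[best]

def toy_thread(sequence: str, model: str) -> float:
    """Hypothetical stand-in scorer: count positions where residues agree."""
    return float(sum(a == b for a, b in zip(sequence, model)))

# Hypothetical fold library: model name -> representative sequence.
library = {"fold_a": "ACDEFG", "fold_b": "ACDQQQ", "fold_c": "WWWWWW"}
name, score = best_fold("ACDEFG", library, toy_thread)
print(name, score)  # fold_a 6.0
```

The cost is one threading run per library model, which is why the efficiency of the threading algorithm itself matters.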
9Protein threading
- Profile method
- Core threading method: branch and bound
- Dynamic programming: divide and conquer
10Core threading
11Protein threading problem definition
- Input: a given protein sequence A, a core structural model M, and score functions g1, g2
- Output: a threading T
- In short: align A to model M
- Given:
- A: protein sequence of length n: a1, a2, a3, ..., an
- M: m core segments C1, C2, C3, ..., Cm
- c1, c2, c3, ..., cm: lengths of the core segments
- l1, l2, l3, ..., lm-1: loop regions connecting the core segments
- l1max, l2max, l3max, ..., lm-1max: maximum lengths of the loop regions
- l1min, l2min, l3min, ..., lm-1min: minimum lengths of the loop regions
- Properties of each amino acid
- F, g1, g2: score functions to evaluate a threading
12Score function
13Output
- T: t1, t2, t3, ..., tm, the start locations for the core segments
Here g1 and g2 are based on the given model M. g1 shows how each segment corresponds to core segment i in the model, and g2 deals with the interactions between segments. So to solve the threading problem, we have to decide on t1, t2, t3, ..., tm so that the overall score is maximum. Thus the threading problem, or alignment problem, is converted to an optimization problem.
14Threading constraints
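The definitions of ci, limin, and limax imply ordering and spacing constraints on the start locations: the first core must start inside the sequence, the last core must end inside it, and the loop between cores i and i+1 must have a length between limin and limax. A minimal checker, assuming 1-based positions (the function name and argument layout are hypothetical):

```python
def is_valid_threading(t, core_len, loop_min, loop_max, n):
    """Check start locations t (1-based) against length and loop constraints."""
    m = len(core_len)
    if len(t) != m or len(loop_min) != m - 1 or len(loop_max) != m - 1:
        return False
    # The first core starts inside the sequence; the last core ends inside it.
    if t[0] < 1 or t[-1] + core_len[-1] - 1 > n:
        return False
    for i in range(m - 1):
        # Residues left between the end of core i and the start of core i+1.
        loop = t[i + 1] - (t[i] + core_len[i])
        if not loop_min[i] <= loop <= loop_max[i]:
            return False
    return True

# Two cores of lengths 3 and 2 in a sequence of length 10; loop length in [1, 4].
print(is_valid_threading([1, 6], [3, 2], [1], [4], 10))  # True
print(is_valid_threading([1, 4], [3, 2], [1], [4], 10))  # False
```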
15(No Transcript)
16Branch and Bound
- The set of all possible threadings is defined by initial position bounds
- Divide the possible threadings into smaller sets, and compute new position bounds for each set
- Compute a quick score lower bound for each set of threadings
- Keep re-dividing the set with the smallest lower bound until the set size is 1
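The four steps above can be sketched with a priority queue. This is a sketch under stated assumptions, not the published algorithm: it assumes the score is additive in g1 and g2 and that a lower score is better (as the lower-bound wording suggests; negate the score if higher is better), it uses a simple term-by-term minimum as the bound, and it splits the widest interval at its midpoint. All function names and the toy data are hypothetical.

```python
import heapq
from itertools import count

def lower_bound(bounds, g1, g2):
    """Minimize every score term independently within the position bounds."""
    m = len(bounds)
    lb = sum(min(g1(i, x) for x in range(lo, hi + 1))
             for i, (lo, hi) in enumerate(bounds))
    for i in range(m):
        for j in range(i + 1, m):
            lb += min(g2(i, j, x, y)
                      for x in range(bounds[i][0], bounds[i][1] + 1)
                      for y in range(bounds[j][0], bounds[j][1] + 1))
    return lb

def branch_and_bound(bounds, g1, g2, valid):
    """Return (threading, score) with the minimum score among valid threadings."""
    tie = count()  # tie-breaker so the heap never has to compare bound lists
    heap = [(lower_bound(bounds, g1, g2), next(tie), list(bounds))]
    while heap:
        lb, _, cur = heapq.heappop(heap)  # set with the smallest lower bound
        if all(lo == hi for lo, hi in cur):
            t = tuple(lo for lo, _ in cur)
            if valid(t):
                return t, lb  # for a single threading the bound is its score
            continue  # a lone invalid threading is discarded
        # Split the segment with the widest interval at its midpoint.
        k = max(range(len(cur)), key=lambda i: cur[i][1] - cur[i][0])
        lo, hi = cur[k]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = cur[:k] + [part] + cur[k + 1:]
            heapq.heappush(heap, (lower_bound(child, g1, g2), next(tie), child))
    return None, float("inf")

# Toy data (hypothetical): negative match counts act as an energy to minimize.
seq = "GGHHHGGEEG"
core_len = [3, 2]
preferred = ["H", "E"]
g1 = lambda i, x: -sum(r == preferred[i] for r in seq[x:x + core_len[i]])
g2 = lambda i, j, x, y: 0
valid = lambda t: t[0] + core_len[0] <= t[1]  # cores in order, no overlap

result = branch_and_bound([(0, 7), (0, 8)], g1, g2, valid)
print(result)  # ((2, 7), -5)
```

Because every set's bound never exceeds the score of any threading it contains, the first valid single threading taken from the queue is optimal.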
17Branch and Bound
18Branch and Bound
- Given a set of threadings defined by position
bounds, one possible score lower bound is
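As an illustration, and not necessarily the expression shown on the slide, a simple bound that is always valid when the score is additive in g1 and g2 minimizes each term independently. The notation [b_i, d_i] for the position bounds of segment i and the additive form of F are assumptions introduced here.

```latex
\[
LB \;=\; \sum_{i=1}^{m} \min_{b_i \le x \le d_i} g_1(i, x)
\;+\; \sum_{i<j} \; \min_{\substack{b_i \le x \le d_i \\ b_j \le y \le d_j}} g_2(i, j, x, y)
\;\;\le\;\; \min_{T \,:\, b_i \le t_i \le d_i} F(T),
\qquad
F(T) = \sum_{i=1}^{m} g_1(i, t_i) + \sum_{i<j} g_2(i, j, t_i, t_j).
\]
```

The inequality holds because each term of F(T) is at least the minimum of that term over its bounds; the bound is tight when the set contains a single threading.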
19Branch and Bound-Issues
- Constructing score function
- Calculating lower bound
- Choosing split segment
- Choosing split point
20Dynamic Programming
21Dynamic Programming
- Detect local regions of high similarity between the target and the template sequence
- Local alignment
- Exploit sequence as well as structural signals
22Dynamic Programming
- Any pair of locally aligned segments divides the unmatched regions of both proteins into two parts
- They can be processed independently with the same approach: divide and conquer
- After dividing, the changed structural features of the template are recorded
23Dynamic Programming
- The algorithm proceeds recursively until the local alignment step finds no more significantly similar segment pairs, e.g. when only ONE core structure remains
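The recursion can be sketched as follows. This is a simplified illustration that assumes ungapped local matches scored on sequence identity alone (match +1, mismatch -1) and a hypothetical significance threshold `min_score`; it does not use the structural signals or record template feature changes as the method described above does.

```python
def best_local_segment(a, b, match=1, mismatch=-1):
    """Return (score, start_a, start_b, length) of the best ungapped local match."""
    best = (0, 0, 0, 0)
    prev = [(0, 0)] * (len(b) + 1)  # (score, run length) ending in the previous row
    for i in range(1, len(a) + 1):
        cur = [(0, 0)] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s, ln = prev[j - 1]  # extend the run along the diagonal
            s += match if a[i - 1] == b[j - 1] else mismatch
            if s > 0:
                cur[j] = (s, ln + 1)
                if s > best[0]:
                    best = (s, i - (ln + 1), j - (ln + 1), ln + 1)
            # otherwise the run is reset to (0, 0)
        prev = cur
    return best

def divide_and_conquer(a, b, off_a=0, off_b=0, min_score=3):
    """Recursively collect (start_a, start_b, length) of significant segment pairs."""
    score, sa, sb, ln = best_local_segment(a, b)
    if ln == 0 or score < min_score:
        return []  # no significant pair left in this region
    # The matched pair splits both sequences into a left part and a right part.
    left = divide_and_conquer(a[:sa], b[:sb], off_a, off_b, min_score)
    right = divide_and_conquer(a[sa + ln:], b[sb + ln:],
                               off_a + sa + ln, off_b + sb + ln, min_score)
    return left + [(off_a + sa, off_b + sb, ln)] + right

target = "AAKVLAGGGHHEEAA"
template = "KVLATTHHEE"
pairs = divide_and_conquer(target, template)
print(pairs)  # [(2, 0, 4), (9, 6, 4)]
```

Each tuple gives 0-based start positions in the target and the template plus the segment length, so the result lists the aligned segments in sequence order.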
24Dynamic Programming
25Dynamic Programming
- We can give more than one candidate while doing local alignment
- This produces a tree
- At the end, we assemble the respective threading alignments and compute their scores
26PROSPECT
- Uses divide and conquer
- References
- Xu, Y., D. Xu, and E.C. Uberbacher, "An Efficient Computational Method for Globally Optimal Threading", Journal of Computational Biology, 5 (3), 597-614, 1998.
- Xu, Y. and D. Xu, "Protein Threading using PROSPECT: design and evaluation", Proteins: Structure, Function, and Genetics, Vol 40, pp 343-354, 2000.
27PROSPECT
- Energy function
- In the early version, all weighting factors in the energy function are set to 1
28(No Transcript)