Title: Constraint logic programming approach to protein structure prediction
1. Constraint logic programming approach to protein structure prediction
A. Dal Palù, A. Dovier, and F. Fogolari. BMC Bioinformatics, 2004.
- Presented by Morshed Osmani
2. INTRODUCTION
- Protein structure prediction is one of the most challenging problems in the biological sciences.
- However, it can be cast as an optimization problem.
- Constraint Logic Programming (CLP), a declarative programming paradigm, can be used to solve this combinatorial optimization problem.
- The Face-Centered Cubic (FCC) lattice model is used here.
3. Background
- Faster methods for protein structure prediction have evolved along two lines:
  - Assembling the structure of a protein from structural fragments of similar sequences available in the protein structure repository, and then screening the feasibility of the resulting structures using energetic criteria.
  - Representing the protein chain by a highly simplified model which is, hopefully, tractable.
- This paper follows the second strategy.
4. Background (Contd.)
- The simplified approach has several advantages:
  - There are fewer variables, so the linkage between the kinetics and thermodynamics of the protein folding process and the intramolecular interactions is easier to address.
  - The simplified model supports the idea that the details of atomic interactions between amino-acid residues are less important than the overall character of these interactions.
  - Generating a conformation and evaluating its energy is easy and takes little time.
5. Constraint Logic Programming
- It is a declarative programming paradigm particularly well suited for encoding combinatorial minimization problems.
- CLP is the natural merger of two declarative paradigms: constraint solving and logic programming.
- The problem modeling is kept independent of the search strategy.
6. CLP (Contd.)
- It is a constrain-and-generate technique, in contrast to the classical generate-and-test technique.
- The first phase is deterministic and produces a set of constraints.
- In the second phase, the solution space is explored non-deterministically, subject to those constraints.
- Some languages have built-in support for CLP (for example, SICStus Prolog).
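The difference between the two techniques can be sketched on a toy problem (not taken from the paper): find integers x < y < z below n that sum to a given total. The function names and the problem itself are illustrative assumptions. A real CLP(FD) solver also propagates constraints over finite domains, which this sketch only imitates by checking constraints while values are being chosen.

```python
from itertools import product

def generate_and_test(n, total):
    """Build every complete assignment first, then test it."""
    visited = 0
    solutions = []
    for x, y, z in product(range(n), repeat=3):
        visited += 1
        if x < y < z and x + y + z == total:
            solutions.append((x, y, z))
    return solutions, visited

def constrain_and_generate(n, total):
    """Apply the constraints while choosing values, so dead branches are never expanded."""
    visited = 0
    solutions = []
    for x in range(n):
        visited += 1
        for y in range(x + 1, n):      # x < y is enforced during generation
            visited += 1
            z = total - x - y          # the sum constraint determines z
            if y < z < n:
                solutions.append((x, y, z))
    return solutions, visited
```

Both functions return the same solutions; the second visits far fewer candidates because the constraints are posted before values are enumerated.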
7. Lattice Models
8. Lattice Models (cont.)
- Each side of the cube is 2 units long.
- Points at Euclidean distance $\sqrt{2}$ are linked; this distance is called the lattice unit.
- For linked points i and j, it holds that $|x_i - x_j| + |y_i - y_j| + |z_i - z_j| = 2$, with each absolute difference being 0 or 1.
- A contact is defined between two non-adjacent residues placed on the two vertices of a side (edge) of a cube, i.e. at Euclidean distance 2.
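The link and contact conditions above can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and `in_contact` tests only the geometric condition (distance 2), not whether the two residues are non-adjacent in the chain.

```python
from itertools import product

def fcc_neighbor_offsets():
    """Offsets (dx, dy, dz) with each |d| in {0, 1} and |dx| + |dy| + |dz| == 2."""
    return [d for d in product((-1, 0, 1), repeat=3)
            if sum(abs(c) for c in d) == 2]

def squared_distance(p, q):
    """Squared Euclidean distance, kept in integers to avoid rounding issues."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def is_linked(p, q):
    """True if p and q are one lattice unit (sqrt(2)) apart."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    return max(diffs) <= 1 and sum(diffs) == 2

def in_contact(p, q):
    """True if p and q are at Euclidean distance exactly 2."""
    return squared_distance(p, q) == 4
```

Each lattice point has 12 linked neighbours, all at squared distance 2.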
9. Mathematical Formalization
- Given a sequence $S = s_1 \ldots s_n$, with each $s_i$ being an amino acid residue, a fold of $S$ is a function $\omega : \{1, \ldots, n\} \to \mathrm{FCC}$ such that $|\omega(i) - \omega(i+1)| = \sqrt{2}$ and $|\omega(i) - \omega(j)| \geq 2$ for non-consecutive $i \neq j$.
- The protein folding problem can be reduced to the optimization problem of finding the fold $\omega$ of $S$ that minimizes the energy $E(\omega)$,
- where $\mathrm{contact}(\omega(i), \omega(j))$ is 1 if $|\omega(i) - \omega(j)| = 2$, and 0 otherwise.
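The energy expression itself did not survive on this slide. A minimal sketch, assuming the usual contact-energy form $E(\omega) = \sum_{1 \le i < n} \sum_{i+2 \le j \le n} \mathrm{contact}(\omega(i), \omega(j)) \cdot \mathrm{Pot}(s_i, s_j)$, where Pot is a pairwise contact potential: the `toy_pot` table below is a hypothetical stand-in, not the potential used in the paper.

```python
def contact(p, q):
    """1 if the two lattice points are at Euclidean distance 2, else 0."""
    return 1 if sum((a - b) ** 2 for a, b in zip(p, q)) == 4 else 0

def energy(fold, sequence, pot):
    """Sum pot(s_i, s_j) over all non-consecutive residue pairs that are in contact."""
    n = len(fold)
    total = 0
    for i in range(n - 2):
        for j in range(i + 2, n):
            total += contact(fold[i], fold[j]) * pot(sequence[i], sequence[j])
    return total

def toy_pot(a, b):
    """Hypothetical potential: -1 for two hydrophobic ('h') residues, 0 otherwise."""
    return -1 if a == "h" and b == "h" else 0

# Three residues on linked lattice points; the first and third are at distance 2.
example_fold = [(0, 0, 0), (1, 1, 0), (2, 0, 0)]
example_energy = energy(example_fold, "hph", toy_pot)
```

With this toy potential the only contact is between the first and third residues, so `example_energy` is -1.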
10. Important Features
- Secondary structure information (alpha helices, beta sheets) is used.
- A simulated annealing strategy is used.
- Globularity of the simulated protein is enforced by a harmonic constraint on the radius of gyration.
- The size of the solution space for a chain of length N is approximately $1.26\,N^{0.162}\,(10.0364)^N$.
- Imposing additional constraints can reduce this search space substantially.
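To get a feel for these quantities, the estimate above and the radius of gyration can be computed directly. This is a sketch under stated assumptions: the function names are illustrative, and the radius of gyration is taken as the root-mean-square distance of the residues from their centroid, which is the standard definition rather than one given on the slide.

```python
import math

def fcc_walk_estimate(n):
    """Estimated number of conformations: 1.26 * n^0.162 * 10.0364^n."""
    return 1.26 * n ** 0.162 * 10.0364 ** n

def log10_fcc_walk_estimate(n):
    """Same estimate in log10 form, convenient for long chains."""
    return math.log10(1.26) + 0.162 * math.log10(n) + n * math.log10(10.0364)

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(sum((c[k] - center[k]) ** 2 for k in range(3))
                  for c in coords) / n
    return math.sqrt(mean_sq)
```

For a chain of 63 residues, `log10_fcc_walk_estimate(63)` lies between 63 and 64, i.e. the unconstrained search space is on the order of 10^63 conformations.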
11. Main Program Predicate
fcc_pf(ID, Time, Compact) :-
    initialization,
    protein(ID, Primary, Secondary),
    constrain(Primary, Secondary, Indexes, Tertiary,
              Energy, Matrix, Freq, Compact),
    writetime,
    solution_search(Time, Primary, Secondary,
                    Indexes, Tertiary, Energy, Matrix, Freq),
    print_results(ID, Time, Primary, Secondary,
                  Tertiary, Compact).
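12. Constraint Example
:- use_module(library(clpfd)).

% Walk the flat coordinate list and link each point to the next one.
next_constraints([X1, Y1, Z1, X2, Y2, Z2 | C]) :-
    next(X1, Y1, Z1, X2, Y2, Z2),
    next_constraints([X2, Y2, Z2 | C]).
next_constraints([_, _, _]).

% Linked points differ by 0 or 1 in each coordinate, and the
% absolute differences sum to 2.
next(X1, Y1, Z1, X2, Y2, Z2) :-
    domain([Dx, Dy, Dz], 0, 1),
    Dx #= abs(X1 - X2),
    Dy #= abs(Y1 - Y2),
    Dz #= abs(Z1 - Z2),
    Dx + Dy + Dz #= 2.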
13. Protein Definition
protein('1YPA', Primary, Secondary) :-
    Primary = [m,k,t,e,w,p,e,l,v,g,k,a,v,a,a,a,k,k,v,i,
               l,q,d,k,p,e,a,q,i,i,v,l,p,v,g,t,i,v,t,m,e,y,r,i,d,
               r,v,r,l,f,v,d,k,l,d,n,i,a,q,v,p,r,v],
    Secondary = [helix(13,23), strand(28,33),
                 strand(45,51), strand(61,63)].
14. Some more constraints
- Distance_constraint ensures that two non-consecutive residues are separated by more than one lattice unit (consecutive residues are exactly one lattice unit apart).
- Compact_constraint ensures that, for every pair of amino acids, the norm of the projection of their distance onto each of the x, y, and z coordinates is smaller than CompactFactor × N.
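Both constraints can be written as plain checks on a candidate fold. This is a sketch, not the paper's CLP(FD) encoding: the function names are illustrative, the distance constraint is read as applying to non-consecutive residues (consecutive ones are exactly one lattice unit apart by construction), and the bound is assumed to be the product of the compact factor and the chain length N.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def satisfies_distance(coords):
    """Non-consecutive residues must be more than one lattice unit (sqrt(2)) apart."""
    n = len(coords)
    for i in range(n):
        for j in range(i + 2, n):
            if squared_distance(coords[i], coords[j]) <= 2:
                return False
    return True

def satisfies_compact(coords, compact_factor):
    """Every per-axis separation must stay below compact_factor * N."""
    n = len(coords)
    bound = compact_factor * n
    for i in range(n):
        for j in range(i + 1, n):
            if any(abs(a - b) >= bound for a, b in zip(coords[i], coords[j])):
                return False
    return True
```

In the CLP setting these conditions are posted as constraints before the search, so violating folds are pruned rather than generated and rejected afterwards.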
15. Some more constraints (contd.)
- The FCC lattice allows bend angles of 60°, 90°, 120°, and 180°; however, protein folds favor angles between 90° and 150°, so the 60° and 180° angles are excluded.
- Indexes stores torsional angles. An alpha helix assumes an angle of 300° and a beta sheet 120°. These angles are also treated as constraints.
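The set of bend angles follows directly from the 12 FCC link directions. The sketch below is illustrative only (names are assumptions, and torsional angles are not modelled): it computes the angle at a residue between the bonds to its two chain neighbours and filters the next steps that remain once 60° and 180° are excluded.

```python
import math
from itertools import product

# The 12 FCC link directions: two coordinates are +/-1, one is 0.
OFFSETS = [d for d in product((-1, 0, 1), repeat=3)
           if sum(abs(c) for c in d) == 2]

def bend_angle(prev_step, next_step):
    """Angle in degrees at the middle residue between its two bonds."""
    # Reverse prev_step so that both vectors point away from the middle residue.
    dot = -sum(a * b for a, b in zip(prev_step, next_step))
    return round(math.degrees(math.acos(dot / 2.0)))  # each link has length sqrt(2)

def allowed_next_steps(prev_step, allowed=(90, 120)):
    """Next steps whose bend angle is in the allowed set."""
    return [s for s in OFFSETS if bend_angle(prev_step, s) in allowed]
```

An angle of 0° would mean stepping back onto the previous residue, which self-avoidance already rules out; with 60° and 180° also removed, 6 of the 12 directions remain for each step.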
16. Results
17. RELATED WORK
- This work is further extended by Dovier et al. [1].
- Similar work can be found in Backofen's paper [2].
18. References
- [1] Alessandro Dal Palù, Agostino Dovier, Enrico Pontelli. Heuristics, optimizations, and parallelism for protein structure prediction in CLP(FD). PPDP 2005: 230-241.
- [2] R. Backofen. The protein structure prediction problem: A constraint optimization approach using a new lower bound. Constraints, 6(2-3):223-255, 2001.
- [3] A. Dal Palù, A. Dovier, and F. Fogolari. Constraint logic programming approach to protein structure prediction. BMC Bioinformatics, 5(186), 2004.