Title: Evolving L-Systems to Capture Protein Structure Native Conformations
1. Evolving L-Systems to Capture Protein Structure Native Conformations
- Gabi Escuela1, Gabriela Ochoa2 and Natalio Krasnogor3
- 1,2 Department of Computer Science, Universidad Simon Bolivar, Caracas, Venezuela
- 1gabiescuela_at_netuno.net.ve, 2gabro_at_ldc.usb.ve
- 3 School of Computer Science and I.T., University of Nottingham
- Natalio.Krasnogor_at_nottingham.ac.uk
2. Content
- Proteins
- Protein Structure Prediction (PSP)
- The HP model
- EA approaches to PSP: current encoding
- L-Systems
- Why a grammatical encoding?
- Methods and Results
- Discussion and Future Work
[Figure: 3D structure of myoglobin, showing coloured alpha helices.]
3. Proteins
- Linear chains of 30-400 units from 20 different amino acids
- Fold into a unique functional structure: the native state or tertiary structure
- Show repeated substructures: alpha helices and beta sheets
[Figure: 1A8M 3-D structure]
4. Protein Structure Prediction (PSP)
- Goal: determine the 3D structure of proteins from their amino acid sequences
- Strategy: find an amino acid chain's state of minimum energy
- A solution will have practical consequences in medicine, drug development and agriculture
5. The 2D HP Model
- Hydrophobic effect is the main force governing folding
- Two amino acid types: hydrophobic (H) and polar or hydrophilic (P)
- $q \in \{H, P\}^{+}$; each letter of q has to be placed on a vertex of a given lattice L (at each point, turn 90° left or right, or continue ahead)
- Scoring function adds -1 for each contact between two Hs that are adjacent in the lattice but not consecutive in q
- Example on the square lattice: HPHPPHHPHPPHPHHPPHPH, 9 H-H bonds, score -9
- Objective: find the organization (embedding) of q in L with minimum score (maximum contacts)
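
The scoring rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names `embed` and `hp_score` are hypothetical, the fold is given as a string of absolute moves (U, D, L, R), and "U" is assumed to point toward +y. The fold used is the one shown later in the deck for this 20-residue sequence.

```python
# Illustrative sketch (hypothetical helper names): HP-model score of a fold
# on the 2D square lattice, with the fold given as absolute moves.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def embed(moves):
    """Place residues on the lattice, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def hp_score(sequence, moves):
    """Return -1 per H-H lattice contact between non-consecutive residues."""
    path = embed(moves)
    if len(path) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    if len(set(path)) != len(path):
        raise ValueError("fold is not self-avoiding")
    index_at = {pos: i for i, pos in enumerate(path)}
    score = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                score -= 1
    return score


print(hp_score("HPHPPHHPHPPHPHHPPHPH", "RDDLULDLDLUURULURRD"))  # -9
```

Mirroring the lattice (swapping U and D) leaves the score unchanged, so the orientation assumption does not affect the result.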
6. EA approaches to PSP: Current (Direct) Encoding
- EAs and other stochastic methods: global optimization of a suitable energy function
- Encoding: Cartesian coordinates, distance geometries, internal coordinates
- Absolute: structure encoded as a string of symbols. For example, in the 2D square lattice:
  - $s \in \{\text{Up}, \text{Down}, \text{Left}, \text{Right}\}^{+}$
- Relative: each move is interpreted in terms of the previous one
  - $s \in \{\text{Forward}, \text{TurnLeft}, \text{TurnRight}\}^{+}$
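7. Protein: HPHPPHHPHPPHPHHPPHPH (L = 20)
- Absolute encoding: RDDLULDLDLUURULURRD (L = 19)
  - First position is fixed
  - [Figure: fold on the lattice with moves labelled R, D, D, L]
- Relative encoding: RFRRLLRLRRFRLLRRFR (L = 18)
  - First and second positions are fixed
  - [Figure: fold on the lattice with moves labelled R, R, F, R]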
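
The relationship between the two encodings can be sketched as follows. This is a minimal sketch with hypothetical function names, and it assumes "U" points toward +y, so that going from R to D is a right turn. Under that assumption, the absolute string on this slide converts exactly to the relative string shown.

```python
# Illustrative sketch (hypothetical names): convert between absolute and
# relative move strings on the 2D square lattice.
HEADINGS = "RULD"  # counter-clockwise order, assuming U points toward +y
TURN_TO_STEP = {"F": 0, "L": 1, "R": -1}
STEP_TO_TURN = {0: "F", 1: "L", 3: "R"}


def absolute_to_relative(absolute):
    """Describe each move as a turn relative to the previous move."""
    relative = []
    for prev, cur in zip(absolute, absolute[1:]):
        step = (HEADINGS.index(cur) - HEADINGS.index(prev)) % 4
        if step == 2:
            raise ValueError("a move cannot step back onto the previous residue")
        relative.append(STEP_TO_TURN[step])
    return "".join(relative)


def relative_to_absolute(relative, first_move="R"):
    """Rebuild absolute moves; the first move must be fixed by convention."""
    heading = HEADINGS.index(first_move)
    absolute = [first_move]
    for turn in relative:
        heading = (heading + TURN_TO_STEP[turn]) % 4
        absolute.append(HEADINGS[heading])
    return "".join(absolute)


print(absolute_to_relative("RDDLULDLDLUURULURRD"))  # RFRRLLRLRRFRLLRRFR
```

The relative string is one symbol shorter than the absolute one because the direction of the first move carries no shape information.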
8. L-Systems (Lindenmayer, 1968)
- A model of morphogenesis, based on formal grammars
- Rewriting: define complex objects by replacing parts of a simple object using a set of productions
- Symbols: F, f, +, -, [, ]
- Axiom (S)
- Production (replacement) rules
[Figure: production rules r1 and r2, and the derivation from the start symbol over steps 1 to 3]
9. Why a Grammatical Encoding?
- Specifies how to construct the phenotype
- Can achieve greater scalability through self-similar and hierarchical structure
- Proteins exhibit a high degree of regularity and repeated motifs
- Current encoding may not be suitable for crossover and building block transfer between individuals
[Figure: a protein structure next to a 3D L-system]
10. Method
- Proof of principle: can a folded protein be captured (encoded) by an L-system?
- How to find that L-system: an EA is used to evolve an L-system that captures a folded protein (inverse problem)
- Input: folded structure in relative coordinates, RFRRLLRLRRFRLLRRFR
- EA
- Output: an L-system that, once derived, will produce the target string RFRRLLRLRRFRLLRRFR
- Example output: axiom 01F; rules 0 → RFR1, 1 → 2L2, 2 → R0L
11. Proposed Grammatical Encoding
- D0L-system (deterministic and context free)
- Alphabet: $\Sigma = \Sigma_t \cup \Sigma_{nt}$
  - $\Sigma_t = \{F, L, R\}$: terminal symbols (relative coordinates)
  - $\Sigma_{nt} = \{0, 1, 2, \ldots, m-1\}$: non-terminal symbols (rewriting rules), where m is the maximum number of rules
- Axiom: $a \in \Sigma^{*}$
- Rewriting rules: $i \to w_i$, where $i \in \Sigma_{nt}$ and $w_i \in \Sigma^{*}$
- Example: axiom R2; rules 0 → R03F, 1 → R01L, 2 → F310, 3 → LRL3
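
To make the rewriting concrete, here is a minimal sketch of D0L derivation applied to the example grammar on this slide. The names `rewrite` and `derive` are hypothetical. The sketch assumes single-character non-terminals (so m is at most 10) and that any symbol without a rule, including every terminal, is copied unchanged.

```python
# Illustrative sketch (hypothetical names): parallel rewriting in a D0L-system.
TERMINALS = set("FLR")


def rewrite(word, rules):
    """Apply one derivation step: rewrite every symbol simultaneously."""
    return "".join(rules.get(symbol, symbol) for symbol in word)


def derive(axiom, rules, steps):
    """Apply `steps` derivation steps, starting from the axiom."""
    word = axiom
    for _ in range(steps):
        word = rewrite(word, rules)
    return word


# Example grammar from the slide: axiom R2 and four rules.
rules = {"0": "R03F", "1": "R01L", "2": "F310", "3": "LRL3"}
for steps in range(3):
    print(steps, derive("R2", rules, steps))
# 0 R2
# 1 RF310
# 2 RFLRL3R01LR03F
```

Because every rule here reintroduces non-terminals, the word keeps growing with each step, which is where the self-similar structure of the phenotype comes from.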
12. Evolutionary Algorithm
- Generational, with rank-based selection
- Randomly generated initial population
- Prefixed maximum number of rules
- Axiom and rules: randomly generated strings of prefixed maximum length
- Genetic operators:
  - Uniform-like (homologous) recombination (rate 1.0): complete production rules are interchanged
  - Per-symbol mutation in both axioms and rules (deletion 30%, insertion 10%, modification 60%)
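
The operators listed above could look roughly like the sketch below. Several details are assumptions, because the slide does not give them: the genotype layout (a dict with an axiom and a list of rules indexed by non-terminal), the per-symbol mutation rate, exchanging the axiom during recombination, linear rank weights for selection, and truncating strings that grow past the maximum length. All function names are hypothetical.

```python
# Illustrative sketch (hypothetical names and parameters), not the authors' code.
import random

TERMINALS = "FLR"


def recombine(a, b, rng):
    """Uniform-like crossover: each whole rule comes from one of the parents."""
    rules = [rng.choice(pair) for pair in zip(a["rules"], b["rules"])]
    axiom = rng.choice((a["axiom"], b["axiom"]))  # assumption: axiom is exchanged too
    return {"axiom": axiom, "rules": rules}


def mutate_string(word, alphabet, rate, max_len, rng):
    """Per-symbol mutation: 30% deletion, 10% insertion, 60% modification."""
    out = []
    for symbol in word:
        if rng.random() >= rate:
            out.append(symbol)  # symbol left untouched
            continue
        kind = rng.choices(("delete", "insert", "modify"), weights=(30, 10, 60))[0]
        if kind == "insert":
            out.extend((symbol, rng.choice(alphabet)))
        elif kind == "modify":
            out.append(rng.choice(alphabet))
        # "delete": the symbol is simply dropped
    return "".join(out)[:max_len]  # assumption: overlong strings are truncated


def mutate(genotype, rate, max_len, rng):
    """Mutate the axiom and every rule with the same per-symbol rate."""
    alphabet = TERMINALS + "".join(str(i) for i in range(len(genotype["rules"])))
    return {
        "axiom": mutate_string(genotype["axiom"], alphabet, rate, max_len, rng),
        "rules": [mutate_string(r, alphabet, rate, max_len, rng)
                  for r in genotype["rules"]],
    }


def rank_select(population, fitnesses, rng):
    """Rank-based selection with linear weights (worst gets 1, best gets n)."""
    ranked = sorted(range(len(population)), key=lambda i: fitnesses[i])
    weights = list(range(1, len(ranked) + 1))
    return population[rng.choices(ranked, weights=weights)[0]]


rng = random.Random(0)
parent_a = {"axiom": "01F", "rules": ["RFR1", "2L2", "R0L"]}
parent_b = {"axiom": "R2", "rules": ["R0F", "R01L", "F10"]}
child = mutate(recombine(parent_a, parent_b, rng), rate=0.1, max_len=8, rng=rng)
print(child)
```

Exchanging complete rules keeps each rule aligned with its non-terminal in both parents, which is what makes the recombination homologous.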
13. Derivation and Fitness Function
- Derivation: from genotype (axiom and rules) to phenotype (folded structure)
- Post-processing: pruning of non-terminal symbols
- Fitness calculation: number of matches between the target string and the solution. Min. 0; max. the length of the desired folding.
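
The three steps above can be traced on the L-system from the Method slide (axiom 01F; rules 0 → RFR1, 1 → 2L2, 2 → R0L). The sketch below uses hypothetical function names and makes two assumptions the slides do not state: the derivation runs for three steps, and symbols beyond the target length are ignored when counting matches.

```python
# Illustrative sketch (hypothetical names): derive, prune, then score.
def derive(axiom, rules, steps):
    """Rewrite every symbol in parallel for the given number of steps."""
    word = axiom
    for _ in range(steps):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


def prune(word, terminals="FLR"):
    """Post-processing: drop all non-terminal symbols."""
    return "".join(symbol for symbol in word if symbol in terminals)


def fitness(candidate, target):
    """Count position-wise matches; zip ignores any surplus symbols."""
    return sum(c == t for c, t in zip(candidate, target))


target = "RFRRLLRLRRFRLLRRFR"
rules = {"0": "RFR1", "1": "2L2", "2": "R0L"}

derived = derive("01F", rules, steps=3)  # assumption: three derivation steps
phenotype = prune(derived)
print(derived)                     # RFRR0LLR0LRRFR1LLRRFR1LF
print(phenotype)                   # RFRRLLRLRRFRLLRRFRLF
print(fitness(phenotype, target))  # 18, the maximum for this target
```

Under these assumptions the first 18 pruned symbols reproduce the target exactly, so this L-system reaches the maximum fitness.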
14. Results (1)
15. Results (2)
[Figure: evolutionary progression towards the target structure]
16. Discussion
- The proposed EA discovered L-systems that capture a target folding under the HP model in 2D lattices
- We are not solving the PSP yet, but...
- We are proposing a novel and potentially useful generative encoding for evolutionary approaches to PSP
17. Future work
- Incorporate problem knowledge about secondary structures (beta turn, beta sheet, alpha helix)
- Explore longer chains and 3D lattices