In Silico Protein Structure Prediction - PowerPoint PPT Presentation

About This Presentation

Title:

In Silico Protein Structure Prediction

Slides: 41

Provided by: muizzh

Transcript and Presenter's Notes

Title: In Silico Protein Structure Prediction

1
In Silico Protein Structure Prediction
New data creates new opportunity
New drug / biological data
New sequence
New structure
Novel drug development
2
The promise of genome projects
Human Genome Watch, 05 Feb. 2003
Draft 3.0%, Finished 95.8%, Total 98.8%
http://www.ncbi.nlm.nih.gov/genome/seq/
3
The promise of genome projects
The genomes of some 800 organisms have now been
either completely or partially sequenced, and
this number will double within the next year
4
The promise of genome projects

Diagnosing and Predicting Disease and Disease
Susceptibility
Disease intervention
Solutions at the DNA level
Gene Therapy. Although promising,
it is not straightforward

5
Proteins: The workhorses of living organisms

Proteins are the primary components of the
networks that conduct the flows of mass, energy
and information.
Protein function
Biological (phenotypic, cellular)
Biochemical (molecular)
The amino acid sequence and the 3D structure
confer unique functionality.

6
Disease prevention and management at the protein
level

Diseases can be linked to the aberrant activity
of proteins (enzymes / receptors).
Pharmaceutical research is based on the search for
molecules that will interact in a desirable
manner with the therapeutic target.

7
Drug Discovery and Development
8
Protein Sequence-Structure-Function
Sequence
ALA - PHE - CYS - LYS - GLU - GLN - PRO - MET -
TRP - TYR - GLY - ARG
Structure
Function
Reaction / substrate
Interactions
Metabolic pathway
9
Protein Structure Initiative (PSI)

The Structural Genomics Project aims at
determination of the 3D structure of all
proteins. This aim can be achieved in four steps:
Organize known protein sequences into families.
Select family representatives as targets.
Solve the 3D structure of targets by X-ray
crystallography or NMR spectroscopy.
Build models for other proteins by homology to
solved 3D structures.
www.structuralgenomics.org

10
Protein Structure Initiative (PSI)

Unfortunately, only a few hundred protein
structures can be determined every year
experimentally.
There are possibly hundreds of thousands of
protein molecules just in humans.
There is a need for computational/bioinformatics
strategies to accelerate protein structure
prediction

11
Computer simulations to fold proteins

Under physiological conditions, biomolecules undergo
a variety of motions and structural changes.
The time scales of these motions are diverse,
ranging from a few femtoseconds to a few seconds.

12
Computer simulations to fold proteins

Newton's second law of motion
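The equation on this slide did not survive transcription. Newton's second law for atom i is F_i = m_i a_i, where in MD the force is the negative gradient of the potential energy. As a minimal sketch of how a simulation integrates that equation, assuming a velocity Verlet scheme and a one-dimensional harmonic spring as the test system (both are illustrative choices, not taken from the slides):

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = force(x) with the velocity Verlet scheme."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Hypothetical test system: unit mass on a unit-stiffness spring.
K_SPRING = 1.0
MASS = 1.0


def harmonic_force(x):
    return -K_SPRING * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x


traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                       mass=MASS, dt=0.01, n_steps=1000)
energy_drift = abs(total_energy(*traj[-1]) - total_energy(*traj[0]))
```

A real protein simulation applies the same update to every atom in three dimensions, with forces from a molecular force field and a time step on the femtosecond scale, which is why long folding times are costly to reach.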

13
We can fold small peptides using Molecular
Dynamics
14
We cannot fold proteins using MD
PDB structure of CDK_2
20 ns simulation result
15
Sequence-Structure Theoretical Relationship
Folding time scales. Physicochemical principles.
Simulations
ALA - PHE - CYS - LYS - GLU - GLN - PRO - MET-
TRP - TYR - GLY - ARG
Evolutionary time scales. Knowledge-based.
Structural bioinformatics
16
Protein Structure Prediction Using Bioinformatics
Secondary structure prediction is accurate
ALA - PHE - CYS - LYS - GLU - GLN - PRO - MET-
TRP - TYR - GLY - ARG
17
Protein Structure Prediction Using Bioinformatics
It is not straightforward to fold the secondary
structural elements in 3D space
?
18
Protein Structure Prediction Using Bioinformatics
Can we predict pairs of amino acids, distant in
sequence but proximal in structure?
19
Correlated Mutations

Goal: Manipulate multiple sequence alignments of
protein families to identify residues that are
close in 3D space.
Hypothesis: During evolution, residues that are in
proximity in 3D space mutate in a covariant
fashion, so as to retain the structural and
functional properties of the protein.

20
Covariant Mutations

Seq. 57 132
1 MENFQAVEKI..GEGTYGVWY
2 KERNKATGEV..VALKKIRWM
3 TETEGAPSTA..IREISAFWR
4 MEGEGAYRNE..VVATAIIWA
5 MENFIALDPV..PSTAIREWI
6 REPSTFIREI..SFALPRFHI
7 MENGHFTNKH..FCDIGEGHI
8 MEALKFVRLT..ETRCVGPHT

21
Measure of Covariance

The correlation coefficient r_ij between positions i
and j is computed from q_il and q_jl, the values of
some amino acid physicochemical property (volume,
hydrophobicity, etc.) for sequence l at positions
i and j. m_i, m_j and s_i, s_j are the mean values
and the standard deviations of that property at
positions i and j.
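The slide's equation was an image and is missing from the transcript. The symbols it defines are those of a Pearson correlation, r_ij = (1/N) sum over l of (q_il - m_i)(q_jl - m_j) / (s_i s_j), with N the number of sequences. The sketch below assumes that unweighted form; the original may have included sequence weights, which the transcript does not mention. Returning 0.0 for a fully conserved column is a convention chosen here, not something the slides specify.

```python
import math


def correlation(q_i, q_j):
    """Pearson correlation between property values at two alignment columns.

    q_i[l] and q_j[l] are the property values of sequence l at positions i and j.
    """
    if len(q_i) != len(q_j) or not q_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(q_i)
    m_i = sum(q_i) / n
    m_j = sum(q_j) / n
    s_i = math.sqrt(sum((q - m_i) ** 2 for q in q_i) / n)
    s_j = math.sqrt(sum((q - m_j) ** 2 for q in q_j) / n)
    if s_i == 0.0 or s_j == 0.0:
        # A conserved column has no variance, so r is undefined (assumed 0.0 here).
        return 0.0
    covariance = sum((a - m_i) * (b - m_j) for a, b in zip(q_i, q_j)) / n
    return covariance / (s_i * s_j)
```

Values near +1 or -1 indicate that the property changes in step at the two positions; values near 0 indicate no linear relationship.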

22
Covariant Mutations

Seq. 57 132
1 MENFQAVEKI..GEGTYGVWY
2 KERNKATGEV..VALKKIRWM
3 TETEGAPSTA..IREISAFWR
4 MEGEGAYRNE..VVATAIIWA
5 MENFIALDPV..PSTAIREWI
6 REPSTFIREI..SFALPRFHI
7 MENGHFTNKH..FCDIGEGHI
8 MEALKFVRLT..ETRCVGPHT

Seq.   57      132
1      52.60   135.4
2      52.60   135.4
3      52.60   135.4
4      52.60   135.4
5      52.60   135.4
6      113.9   91.90
7      113.9   91.90
8      113.9   91.90
rc(57,132) = 1.0
23
Covariant Mutations

Seq. 57 132
1 MENFQAVEKI..GEGTYGVWY
2 KERNKATGEV..VALKKIRWM
3 TETEGAPSTA..IREISAFWR
4 MEGEGAYRNE..VVATAIIWA
5 MENFIALDPV..PSTAIREWI
6 REPSTFIREI..SFALPRFHI
7 MENGHFTNKH..FCDIGEGHI
8 MEALKFVRLT..ETRCVGPHT
9 TETEGFPSTA..IREISAFTR

Seq.   57      132
1      52.60   135.4
2      52.60   135.4
3      52.60   135.4
4      52.60   135.4
5      52.60   135.4
6      113.9   91.90
7      113.9   91.90
8      113.9   91.90
9      113.9   71.20
rc(57,132) = 0.97
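The two tabulated examples can be checked numerically. The sketch below takes the property values at positions 57 and 132 from the slides and assumes an unweighted Pearson correlation, which is an interpretation of the transcript rather than a confirmed formula.

```python
from statistics import mean, pstdev


def correlation(q_i, q_j):
    """Unweighted Pearson correlation between two alignment columns."""
    m_i, m_j = mean(q_i), mean(q_j)
    s_i, s_j = pstdev(q_i), pstdev(q_j)
    covariance = mean([(a - m_i) * (b - m_j) for a, b in zip(q_i, q_j)])
    return covariance / (s_i * s_j)


# Property values for sequences 1-9, copied from the slide tables.
col_57 = [52.60] * 5 + [113.9] * 4
col_132 = [135.4] * 5 + [91.90] * 3 + [71.20]

r_8 = correlation(col_57[:8], col_132[:8])   # first eight sequences
r_9 = correlation(col_57, col_132)           # with sequence 9 added

print(round(abs(r_8), 2), round(abs(r_9), 2))
```

The signed coefficients are negative, because the value at position 132 drops when the value at position 57 rises. Their magnitudes match the 1.0 and 0.97 shown on the slides, which suggests the slides report the strength of the covariation rather than its sign.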
24
Physicochemical Properties

AAindex: a collection of published amino acid
properties.
Kawashima S. et al. Nucleic Acids Research, 27,
368, 1999
e.g.

25
Physicochemical Descriptors

142 descriptors used. Redundancy eliminated by
calculating principal components.
12 principal components explain 97% of the
variance. PCA_1 is associated with
hydrophobicity, PCA_2 represents a measure of
size, PCA_3 is related to AA electronic
properties, etc.
Four different measures of distance: Cα-Cα,
Cβ-Cβ, minimum distance, COM-COM. Two residues are
considered proximal if the distance is < 6 Å.
Which correlation coefficient best predicts
proximity?
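The proximity criterion can be sketched as a simple distance test. The code below assumes one representative point per residue (for example the Cα atom) and uses the 6 Å cutoff from the slide. The minimum sequence separation and the toy hairpin coordinates are hypothetical, added only to keep trivially adjacent residues out of the result.

```python
import math

CUTOFF = 6.0          # Å, proximity threshold from the slide
MIN_SEPARATION = 4    # assumed: ignore pairs closer than this along the sequence


def find_contacts(coords, cutoff=CUTOFF, min_separation=MIN_SEPARATION):
    """Return index pairs (i, j) that are far apart in sequence but within the cutoff."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts


# Toy hairpin: two antiparallel strands 5 Å apart, 3.8 Å between neighbors.
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]
contacts = find_contacts(toy_coords)
```

Swapping in Cβ coordinates, side-chain centers of mass, or the minimum interatomic distance gives the other three distance measures listed above.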

26
Validation Model-System CDK-2

Cyclin Dependent Kinase 2
CDKs are the switches that regulate the cell
cycle.
CDKs control gene transcription and coordinate
proliferation. Important therapeutic targets.

27
CDK-2 Multiple Sequence Alignment
28
Correlation Coefficient as Diagnostic Test
29
Accuracy of Diagnostic Tests
Accuracy = TP / (TP + FP)
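As a minimal sketch, the measure can be computed by comparing predicted residue pairs against the contacts observed in a known structure. The quantity TP / (TP + FP) is also commonly called precision or positive predictive value. The function names and the example pairs are hypothetical.

```python
def count_tp_fp(predicted_pairs, true_contacts):
    """Count predicted pairs that are (TP) or are not (FP) real contacts."""
    true_set = {tuple(sorted(pair)) for pair in true_contacts}
    tp = sum(1 for pair in predicted_pairs if tuple(sorted(pair)) in true_set)
    fp = len(predicted_pairs) - tp
    return tp, fp


def accuracy(tp, fp):
    """Fraction of predicted contacts that are correct: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("no predictions to evaluate")
    return tp / (tp + fp)


# Hypothetical prediction set: three pairs, two of which are real contacts.
predicted = [(10, 45), (72, 12), (30, 90)]
observed = [(10, 45), (12, 72), (5, 60)]
tp, fp = count_tp_fp(predicted, observed)
score = accuracy(tp, fp)
```

With this definition, a set of 10 predictions containing 6 TP and 4 FP scores 0.6, and a set of 10 true positives scores 1.0.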
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
CMA-based constraints for MD
Start with a stretched protein conformation
35
CMA-based constraints for MD
Predict secondary structural elements
Add spring forces between pairs predicted to be
proximal
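One way to picture the spring forces is as harmonic restraints between the predicted pairs. The sketch below assumes the energy E = 0.5 k (r - r0)^2 per pair; the force constant k, the rest length r0, and the function name are hypothetical, since the slides do not give the functional form or parameters actually used.

```python
import math


def spring_forces(coords, pairs, k, r0):
    """Return the total restraint energy and the per-residue force vectors."""
    forces = [[0.0, 0.0, 0.0] for _ in coords]
    energy = 0.0
    for i, j in pairs:
        delta = [a - b for a, b in zip(coords[i], coords[j])]
        r = math.sqrt(sum(c * c for c in delta))
        if r == 0.0:
            continue  # direction undefined for coincident points
        energy += 0.5 * k * (r - r0) ** 2
        scale = -k * (r - r0) / r  # negative gradient of E with respect to coords[i]
        for axis in range(3):
            forces[i][axis] += scale * delta[axis]
            forces[j][axis] -= scale * delta[axis]
    return energy, forces


# Two residues 10 Å apart, restrained toward an assumed rest length of 6 Å.
energy, forces = spring_forces(
    coords=[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    pairs=[(0, 1)], k=2.0, r0=6.0,
)
```

In this example the two residues are pulled toward each other with equal and opposite forces, which is the behavior needed to collapse a stretched starting conformation toward the predicted contacts.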
36
CMA-based constraints for MD
37
Fold CDK-2 with 10 constraints
With 6 TP and 4 FP: RMSD 14 Å. No secondary
structure
With 10 true positives: RMSD 5 Å. No secondary
structure
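For reference, RMSD is the root mean square of the distances between matching atoms in two structures. The sketch below assumes the two coordinate sets are already superimposed; an optimal superposition step (for example the Kabsch algorithm) is normally applied first and is omitted here. The slides do not say how the reported values were computed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical example: every atom displaced by the vector (3, 4, 0).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
displaced = [(x + 3.0, y + 4.0, z) for x, y, z in reference]
value = rmsd(reference, displaced)
```

Without superposition a rigid shift like the one above counts as deviation, so this function overestimates the RMSD of structures that differ only in position or orientation.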
38
Summary

We need protein structures to harness the
promise of genome projects
Further, we need computational tools to
accelerate protein structure determination
Use bioinformatics to guide computer simulations
Proximal residues do mutate in a covariant fashion

39
Next steps

Improve predictive ability of CMA.
Systematically study the influence of constraints
on folding.
Develop additional methods for non-local contact
prediction (free energy based techniques)

40
Acknowledgements
Spyros Vicatos (CEMS)
Himanshu Khandelia (CEMS)
Eric Fauman (Pfizer)
Sangtae Kim (Eli Lilly)
Biotechnology Institute
Digital Technology Center
Minnesota Supercomputing Institute
