In Silico Protein Structure Prediction

Provided by: muizzh
1
In Silico Protein Structure Prediction
New data creates new opportunity
New drug / biological data
New sequence
New structure
Novel drug development
2
The promise of genome projects
Human Genome Watch, 05 Feb. 2003: Draft 3.0%, Finished 95.8%, Total 98.8%
http://www.ncbi.nlm.nih.gov/genome/seq/
3
The promise of genome projects
The genomes of some 800 organisms have now been
either completely or partially sequenced, and
this number will double within the next year
4
The promise of genome projects
  • Diagnosing and predicting disease and disease
    susceptibility
  • Disease intervention
  • Solutions at the DNA level
  • Gene therapy. Although promising, it is not
    straightforward.

5
Proteins: The workhorses of living organisms
  • Proteins are the primary components of the
    networks that conduct the flows of mass, energy
    and information.
  • Protein function
  • Biological (phenotypic, cellular)
  • Biochemical (molecular)
  • The amino acid sequence and the 3D structure
    confer unique functionality.

6
Disease prevention and management at the protein
level
  • Diseases can be linked to the aberrant activity
    of proteins (enzymes / receptors).
  • Pharmaceutical research is based on the search for
    molecules that will interact in a desirable
    manner with the therapeutic target.

7
Drug Discovery and Development
8
Protein Sequence-Structure-Function
Sequence: ALA - PHE - CYS - LYS - GLU - GLN - PRO - MET -
TRP - TYR - GLY - ARG
Structure
Function:
  • Reaction / substrate
  • Interactions
  • Metabolic pathway

9
Protein Structure Initiative (PSI)
  • The Structural Genomics Project aims to determine
    the 3D structure of all proteins. This aim can be
    achieved in four steps:
  • Organize known protein sequences into families.
  • Select family representatives as targets.
  • Solve the 3D structure of targets by X-ray
    crystallography or NMR spectroscopy.
  • Build models for other proteins by homology to
    solved 3D structures.
  • www.structuralgenomics.org

10
Protein Structure Initiative (PSI)
  • Unfortunately, only a few hundred protein
    structures can be determined every year
    experimentally.
  • There are possibly hundreds of thousands of
    protein molecules just in humans.
  • There is a need for computational/bioinformatics
    strategies to accelerate protein structure
    prediction

11
Computer simulations to fold proteins
  • At physiological conditions, biomolecules undergo
    several movements and changes.
  • The time scales of the motions are diverse,
    ranging from a few femtoseconds to a few seconds.

12
Computer simulations to fold proteins
  • Newton's second law of motion
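
A minimal sketch of how a simulation can advance atoms under Newton's second law, F = m a. It integrates a single particle on a one-dimensional harmonic spring with the velocity Verlet scheme. The potential, mass, spring constant, and time step are illustrative assumptions and are not taken from the slides; a real molecular dynamics code applies the same update to every atom using a full force field.

```python
# Sketch: integrate F = m * a for one particle on a 1D harmonic spring.
# All parameter values below are illustrative assumptions.

def force(x, k=1.0):
    """Force from the harmonic potential V(x) = 0.5 * k * x**2."""
    return -k * x

def velocity_verlet(x, v, mass=1.0, dt=0.01, n_steps=1000, k=1.0):
    """Advance position x and velocity v by n_steps steps of size dt."""
    a = force(x, k) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x, k) / mass           # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
    return x, v

def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy, used here as a sanity check."""
    return 0.5 * mass * v * v + 0.5 * k * x * x

x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0)
print(total_energy(x0, v0), total_energy(x1, v1))
```

The total energy should stay nearly constant over the run, which is a common first check that the integrator and time step are reasonable.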

13
We can fold small peptides using Molecular
Dynamics
14
We cannot fold proteins using MD
PDB structure of CDK_2
20 ns simulation result
15
Sequence-Structure Theoretical Relationship
Folding time scales. Physicochemical principles.
Simulations
ALA - PHE - CYS - LYS - GLU - GLN - PRO - MET-
TRP - TYR - GLY - ARG
Evolutionary time scales. Knowledge-based.
Structural bioinformatics
16
Protein Structure Prediction Using Bioinformatics
Secondary structure prediction is accurate
ALA - PHE - CYS - LYS - GLU - GLN - PRO - MET-
TRP - TYR - GLY - ARG
17
Protein Structure Prediction Using Bioinformatics
It is not straightforward to fold the secondary
structural elements in 3D space
?
18
Protein Structure Prediction Using Bioinformatics
Can we predict pairs of amino acids, distant in
sequence but proximal in structure?
19
Correlated Mutations
  • Goal: Manipulate multiple sequence alignments of
    protein families to identify residues that are
    close in 3D space.
  • Hypothesis: During evolution, residues that are in
    proximity in 3D space mutate in a covariant
    fashion, so as to retain structural and
    functional properties of the protein.

20
Covariant Mutations
  Seq.       57           132
  1     MENFQAVEKI..GEGTYGVWY
  2     KERNKATGEV..VALKKIRWM
  3     TETEGAPSTA..IREISAFWR
  4     MEGEGAYRNE..VVATAIIWA
  5     MENFIALDPV..PSTAIREWI
  6     REPSTFIREI..SFALPRFHI
  7     MENGHFTNKH..FCDIGEGHI
  8     MEALKFVRLT..ETRCVGPHT

21
Measure of Covariance
  • Correlation coefficient between positions i and
    j, where q_il and q_jl are the values for some
    amino acid physicochemical vector (volume,
    hydrophobicity, etc.) for sequence l at positions
    i and j, and m_i, m_j, s_i and s_j are the
    corresponding mean values and standard deviations.
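
The definitions above match a standard Pearson correlation, so a minimal sketch can be written under that assumption (the slide's own equation is not in the transcript). The property values are the ones shown in the example tables on the neighbouring slides; mapping them to residues A, F, W, H and T is inferred from the alignment columns at positions 57 and 132, and the function name is hypothetical.

```python
from math import sqrt

# Property values as they appear in the slides' example tables.
# The residue-to-value mapping is inferred from the alignment columns.
VOLUME = {"A": 52.6, "F": 113.9, "W": 135.4, "H": 91.9, "T": 71.2}

def correlation(col_i, col_j, prop=VOLUME):
    """Pearson correlation of a residue property between two alignment columns.

    col_i, col_j: strings with one residue letter per sequence.
    """
    qi = [prop[aa] for aa in col_i]
    qj = [prop[aa] for aa in col_j]
    n = len(qi)
    mi, mj = sum(qi) / n, sum(qj) / n                 # mean values m_i, m_j
    si = sqrt(sum((q - mi) ** 2 for q in qi) / n)     # standard deviation s_i
    sj = sqrt(sum((q - mj) ** 2 for q in qj) / n)     # standard deviation s_j
    cov = sum((a - mi) * (b - mj) for a, b in zip(qi, qj)) / n
    return cov / (si * sj)

# Columns 57 and 132 of the 8-sequence and 9-sequence example alignments.
r8 = correlation("AAAAAFFF", "WWWWWHHH")
r9 = correlation("AAAAAFFFF", "WWWWWHHHT")
print(round(r8, 2), round(r9, 2))
```

With the signed formula the two examples give about -1.0 and -0.97: the larger residue at position 57 pairs with the smaller one at position 132. The magnitudes agree with the 1.0 and 0.97 reported on the slides, which suggests the slides quote the strength of the correlation rather than its sign, although the transcript does not say so.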

22
Covariant Mutations
  Seq.       57           132
  1     MENFQAVEKI..GEGTYGVWY
  2     KERNKATGEV..VALKKIRWM
  3     TETEGAPSTA..IREISAFWR
  4     MEGEGAYRNE..VVATAIIWA
  5     MENFIALDPV..PSTAIREWI
  6     REPSTFIREI..SFALPRFHI
  7     MENGHFTNKH..FCDIGEGHI
  8     MEALKFVRLT..ETRCVGPHT

  Seq.   57       132
  1      52.60    135.4
  2      52.60    135.4
  3      52.60    135.4
  4      52.60    135.4
  5      52.60    135.4
  6      113.9    91.90
  7      113.9    91.90
  8      113.9    91.90
  rc(57,132) = 1.0
23
Covariant Mutations
  Seq.       57           132
  1     MENFQAVEKI..GEGTYGVWY
  2     KERNKATGEV..VALKKIRWM
  3     TETEGAPSTA..IREISAFWR
  4     MEGEGAYRNE..VVATAIIWA
  5     MENFIALDPV..PSTAIREWI
  6     REPSTFIREI..SFALPRFHI
  7     MENGHFTNKH..FCDIGEGHI
  8     MEALKFVRLT..ETRCVGPHT
  9     TETEGFPSTA..IREISAFTR

  Seq.   57       132
  1      52.60    135.4
  2      52.60    135.4
  3      52.60    135.4
  4      52.60    135.4
  5      52.60    135.4
  6      113.9    91.90
  7      113.9    91.90
  8      113.9    91.90
  9      113.9    71.20
  rc(57,132) = 0.97
24
Physicochemical Properties
  • AAindex: a collection of published amino acid
    properties.
  • Kawashima S. et al., Nucleic Acids Research, 27,
    368, 1999

25
Physicochemical Descriptors
  • 142 descriptors used. Redundancy eliminated by
    calculating principal components.
  • 12 principal components explain 97% of the
    variance. PCA_1 is associated with
    hydrophobicity, PCA_2 represents a measure of
    size, PCA_3 is related to AA electronic
    properties, etc.
  • Four different measures of distance: Cα-Cα,
    Cβ-Cβ, minimum distance, COM-COM. Proximity if
    < 6 Å.
  • Which correlation coefficient best predicts
    proximity?
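
A minimal sketch of the proximity criterion, assuming one representative point per residue (for example its Cα atom or its centre of mass) and the 6 Å cutoff quoted above. The coordinates are toy values, not protein data, and the `min_separation` filter on sequence distance is a hypothetical addition.

```python
from math import dist

CUTOFF = 6.0  # angstroms, the proximity threshold quoted on the slide

def contact_pairs(coords, cutoff=CUTOFF, min_separation=1):
    """Return index pairs (i, j), i < j, whose points lie closer than cutoff.

    coords: list of (x, y, z) tuples, one representative point per residue.
    min_separation: hypothetical filter, minimum |i - j| along the sequence.
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs

# Toy coordinates for four residues (illustrative only).
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.0, 0.0)]
print(contact_pairs(toy))
print(contact_pairs(toy, min_separation=2))
```

Switching between the four distance measures only changes which point, or which set of atoms, represents each residue; the threshold test stays the same.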

26
Validation Model System: CDK-2
  • Cyclin-dependent kinase 2
  • CDKs are the switches that regulate the cell
    cycle.
  • CDKs control gene transcription and coordinate
    proliferation. Important therapeutic targets.

27
CDK-2 Multiple Sequence Alignment
28
Correlation Coefficient as Diagnostic Test
29
Accuracy of Diagnostic Tests
Accuracy = TP/(TP + FP)
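
A minimal sketch of using the correlation coefficient as a diagnostic test: pairs whose correlation magnitude reaches a threshold are predicted to be proximal, and the accuracy defined above is the fraction of those predictions that are true contacts (a ratio often called precision elsewhere). The function names, the threshold, and the toy scores are hypothetical.

```python
def predict_contacts(scores, threshold):
    """Pairs whose correlation magnitude is at least the threshold."""
    return {pair for pair, r in scores.items() if abs(r) >= threshold}

def accuracy(predicted, true_contacts):
    """TP / (TP + FP) over the set of predicted pairs."""
    tp = len(predicted & true_contacts)   # predicted and truly proximal
    fp = len(predicted - true_contacts)   # predicted but not proximal
    return tp / (tp + fp) if (tp + fp) else 0.0

# Toy scores and contacts (made up for illustration, not from the slides).
toy_scores = {(57, 132): 0.97, (10, 80): 0.91, (25, 140): 0.40, (33, 60): 0.88}
toy_true = {(57, 132), (33, 60)}

predicted = predict_contacts(toy_scores, threshold=0.85)
print(accuracy(predicted, toy_true))
```

For a prediction set with 6 true positives and 4 false positives, the same function gives 6/10 = 0.6.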
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
CMA-based constraints for MD
Start with a stretched protein conformation
35
CMA-based constraints for MD
Predict secondary structural elements
Add spring forces between pairs predicted to be proximal
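
A minimal sketch of one such spring force, assuming a harmonic restraint E = 0.5 k (d - d0)^2 between two points. The slides do not give the functional form, so the spring constant `k` and the rest length `d0` are hypothetical parameters.

```python
from math import dist

def spring_force(pos_a, pos_b, k=1.0, d0=6.0):
    """Forces on points a and b from a harmonic restraint of rest length d0.

    Energy: E = 0.5 * k * (d - d0)**2, with d the distance between a and b.
    Returns (force_on_a, force_on_b) as 3-tuples.
    """
    d = dist(pos_a, pos_b)
    if d == 0.0:
        # Direction is undefined for coincident points; apply no force.
        zero = (0.0, 0.0, 0.0)
        return zero, zero
    magnitude = -k * (d - d0)  # negative when stretched, so a is pulled toward b
    unit = tuple((a - b) / d for a, b in zip(pos_a, pos_b))  # unit vector from b to a
    f_a = tuple(magnitude * u for u in unit)
    f_b = tuple(-f for f in f_a)  # equal and opposite
    return f_a, f_b

# Two residues 10 angstroms apart in a stretched chain (toy coordinates).
f_a, f_b = spring_force((10.0, 0.0, 0.0), (0.0, 0.0, 0.0))
print(f_a, f_b)
```

In a simulation these forces would be added to the force-field terms for every predicted pair at each time step.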
36
CMA-based constraints for MD
37
Fold CDK-2 with 10 constraints
With 6 TP and 4 FP: RMSD 14 Å. No secondary structure.
With 10 true positives: RMSD 5 Å. No secondary structure.
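
For reference, a minimal sketch of the RMSD measure between a folded model and a reference structure. It assumes the two coordinate sets have the same atoms in the same order and are already superimposed; the slides do not say how the structures were aligned, and the coordinates below are toy values.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate lists.

    No superposition is performed; the inputs are assumed to be aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))

# Toy example: a single atom displaced by (3, 4, 0).
print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))
```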
38
Summary
  • We need protein structures to harness the
    promise of genome projects
  • Further, we need computational tools to
    accelerate protein structure determination
  • Use bioinformatics to guide computer simulations
  • Proximal residues do mutate in a covariant fashion

39
Next steps
  • Improve predictive ability of CMA.
  • Systematically study the influence of constraints
    on folding.
  • Develop additional methods for non-local contact
    prediction (free energy based techniques)

40
Acknowledgements
Spyros Vicatos (CEMS)
Himanshu Khandelia (CEMS)
Eric Fauman (Pfizer)
Sangtae Kim (Eli Lilly)
Biotechnology Institute
Digital Technology Center
Minnesota Supercomputing Institute