3D Protein - PowerPoint PPT Presentation

About This Presentation

Title:

3D Protein

Description:

Homepage: http://www.ice.mbt.cuhk.edu.hk ... FOOD POISONING/DISEASES OUTBREAKS IN HONG KONG. Human Genome Project ... In the Twilight Zone (< 25%) How to proceed? ...

Slides: 78

Provided by: saimin

Transcript and Presenter's Notes

Title: 3D Protein

1
3D Protein
Prof. Sai Ming Ngai, Ice
Office: EG05 East Block, Science Centre
Lab: EG08 East Block, Science Centre
Tel: 2609 6025
Email: smngai_at_cuhk.edu.hk
Homepage: http://www.ice.mbt.cuhk.edu.hk
2
Comparison between Biotechnology and Computer
(electronics)

High volume demand
Mass market
Advancement fostered by entrepreneurial companies

Abelson, PE (1983) Science
3
FOOD POISONING/DISEASES OUTBREAKS IN HONG KONG
4
Human Genome Project
5
TAGCTTTAGCAAAATCCGTCAAGCAAAATACATCTTCAGTGGGGCAGAAG
ATTATTAAAGATGATATAAA ATCACTTCAGTGTAAACAAAAAGATTTGG
AAAACAGGCTTGCATCTGCTAAGCAGGAGATGGAATGTTGT
CTCAACAACATTCTCAAATCAAAACGCTCAACAGAAAAGAAAGGAAAGTT
TACTCTGCCAGGCAGAGAGA AGCAGGCCACTTCTGATGTGCAGGAGTCT
ACTCAGGAATCAGCTACAGTGGAAAAGTTGGAGGAAGACTG
GGAAATAAACAAGGATTCAGCTGTGGAAATGGCTATGTCAAAACAACTTT
CTCTTAATGCTCAAGAAAGC ATGAAAAACACTGAAGATGAGCGGAAAGT
CAATGAGCTGCAAAATCAACCTTTAGAATTAGATACTATGT
TAAGAAATGAACAATTAGAAGAGATAGAGAAATTATATACCCAGTTGGAA
GCAAAGAAAGCAGCCATTAA GCCACTGGAACAAACAGAATGTCTTAACA
AAACAGAAACTGGGGCCTTGGTTCTCCACAATATAGGATAT
TCGGCACAGCATTTGGACAATTTGCTTCAGGCACTTATTACTTTGAAGAA
AAACAAAGAAAGCCAATATT GTGTCCTCAGAGATTTTCAGGAATACCTT
GCTGCAGTTGAATCTTCAATGAAAGCCTTGTTGACAGACAA
GGAAAGTCTTAAAGTAGGACCACTGGACAGTGTAACGTATCTGGACAAAA
TTAAAAAATTCATAGCATCC ATAGAAAAAGAGAAAGATTCTTTAGGCAA
CTTGAAAATCAAATGGGAGAATTTATCAAACCACGTGACTG
ACATGGATAAGAAATTGTTGGAAAGCCAGATTAAGCAACTTGAACATGGT
TGGGAACAAGTGGAACAGCA GATTCAAAAGAAGTATTCTCAGCAGGTAG
TGGAATATGATGAATTTACAACCCTCATGAATAAGGTACAG
GACACTGAGATTTCTCTGCAACAGCAGCAGCAACATCTACAGTTAAGGCT
GAAGTCTCCAGAAGAACGGG CAGGGAACCAAAGCATGATTGCCTTGACC
ACTGACCTCCAGGCTACCAAGCATGGATTTTCTGTTTTAAA
GGGGCAAGCTGAACTTCAGATGAAGAGGATTTGGGGAGAAAAAGAAAAGA
AGAATTTGGAGGATGGAATA AATAACTTGAAGAAACAATGGGAAACATT
GGAGCCATTACACTTAGAAGCAGAAAATCAGATTAAGAAGT
GTGACATAAGGAACAAGATGAAAGAGACTATCTTATGGGCCAAGAATTTG
TTGGGTGAACTTAATCCCTC CATTCCCCTTCTCCCAGATGACATTCTTT
CACAGATCAGAAAGTGCAAAGTGACACATGATGGCATTCTA
GCTAGGCAGCAGTCTGTGGAATCGTTGGCTGAAGAGGTCAAAGATAAGGT
TCCTAGCCTTACAACCTATG AGGGCGGTGATTTAAATAATACCCTAGAG
GACTTACGGAATCAATACCAAATGCTGGTTTTAAAATCAAC
TCAAAGATCACAGCAATTAGAATTTAAGTTGGAAGAAAGAAGCAATTTTT
TTGCTATAATAAGGAAGTTT CAACTTATGGTTCAAGAAAGTGAAACACT
GATAATTCCCAGGGTGGAGACAGCTGCCACGGAAGCTGAAC
TAAAACATCACCATGTTACTTTGGAGGCATCTCAGAAGGAATTGCAAGAA
ATTGACAGTGGAATCTCAAC ACATCTTCAGGAGCTAACAAACATCTATG
AGGAGCTGAATGTGTTTGAAAGATTATTTCTGGAAGATCAG
TTGAAAAATCTTAAGATTAGGACCAACAGAATACAAAGATTCATTCAGAA
TACATGTAATGAAGTGGAAC ACAAGGTAAAGTTTTGCAGACAATTCCAT
GAAAAAACATCAGCGCTTCAGGAGGAGGCTGACAGTATACA
GCGCAATGAACTATTACTTAATCAAGAAGTAAATAAAGGTGTTAAAGAGG
AGATCTATAATCTTAAAGAC AGACTCACCGCTATTAAGTGTTGCATCTT
ACAGGTATTGAAACTTAAAAAAGTGTTTGACTATATTGGAC
TAAACTGGGATTTTTCACAACTTGACCAATTACAAACCCAAGTATTTGAA
AAAGAAAAGGAACTTGAAGA AAAAATTAAGCAGTTGGACACATTTGAGG
AAGAACATGGCAAATATCAGGCATTATTAAGTAAAATGAGA
GCTATTGATTTGCAAATTAAGAAAATGACTGAAGTAGTACTAAAAGCTCC
TGATAGCTCTCCGGAAAGCA
6
Genome (DNA)
- Total DNA content of the haploid cell
- 1/2 DNA content of a diploid cell
Proteome (Protein)
- Structural and Relationship (3D / Function)
- The complete protein content of a cell/organism (at a given time)
7
Bioinformatics
Computational methods employed in studying life sciences
Structural Genomics and Functional Genomics
Proteomics
Protein profile studies
Protein-Protein Interactions
Methodology Development
8
The Cell
9
Watson and Crick describe the structure of DNA (1953)
10
Central Dogma of Molecular Biology
11
DNA → mRNA → Protein
Transcription
Reverse Transcription
Translation
Post-Translational Modification (PTM)
Protein
12
A T(U) G C
13
(No Transcript)
14
Codon Table
R.W. Holley
H.G. Khorana
M.W. Nirenberg
The Nobel Prize in Physiology or Medicine 1968
The mystery underlying the genetic code was
deciphered between 1961-66.
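
As a rough illustration of how a codon table is used, the sketch below builds the standard genetic code from its usual compact 64-letter encoding and translates a coding-strand DNA sequence in reading frame 1. The function names and the choice to stop at the first stop codon are assumptions made for this example, not part of the original slides.

```python
from itertools import product

BASES = "TCAG"
# One letter per codon, codons enumerated in TCAG order for each position.
# Standard genetic code; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate coding-strand DNA in frame 1 up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(transcribe("ATGGCCTGGTAA"))
    print(translate("ATGGCCTGGTAA"))
```

A real pipeline would also consider the other reading frames and the reverse strand; this sketch handles a single frame only.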
15
(No Transcript)
16
Cellular Biology: the study of the chemistry of life
Chemical structure of biomolecules
Interactions of biomolecules
Synthesis and degradation of biomolecules (metabolism)
Conservation and use of energy
Mechanisms for organizing biomolecules and coordinating their activities
Storage, transmission and expression of genetic information
17
(No Transcript)
18
The goal of the Gene Ontology Consortium: to
produce a dynamic controlled vocabulary that can
be applied to all organisms, even as knowledge of
gene and protein roles in cells is accumulating
and changing.
19
(No Transcript)
20
(No Transcript)
21
Genome (DNA)
- Total DNA content of the haploid cell
- 1/2 DNA content of a diploid cell
Proteome (Protein)
- Structural and Relationship (3D / Function)
- The complete protein content of a cell/organism (at a given time)
22
3D Protein Modeling: Concepts and Protocols
23
Overview

Homology Modeling
Hands-on exercise
Modeling using spdbv
Modeling using InsightII

24
Background - Protein-protein interaction

Drug target
Identify binding interface(s)
Active site(s) investigation
Drug design

25
Background

Interface anatomy has been extensively studied
Mutation studies
Energy calculations

26
Bond Energy (covalent bond)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
Non-Bond Energy (non-covalent bond)

Electrostatic interaction (Coulomb's law)

Hydrogen bonds (dipole-dipole)
Van der Waals interaction (hydrophobic)
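
The two non-bonded terms that molecular mechanics force fields most commonly evaluate can be sketched as below: Coulomb's law for point charges and a 12-6 Lennard-Jones potential for the van der Waals term. The slides do not give functional forms, so the Lennard-Jones form, the unit convention (kcal/mol, Angstrom, elementary charges) and all parameter names are assumptions for illustration.

```python
# Coulomb constant in kcal*Angstrom/(mol*e^2), a conventional value
# when charges are in units of e and distances in Angstrom.
COULOMB_CONSTANT = 332.0637


def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Electrostatic energy (kcal/mol) of two point charges r Angstrom apart."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def lennard_jones_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the functions.
    print(coulomb_energy(+1.0, -1.0, 3.0))
    print(lennard_jones_energy(4.0, epsilon=0.1, sigma=3.5))
```

The Lennard-Jones term is zero at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6) * sigma, which is a quick sanity check on any implementation.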

31
Force fields

CHARMM19, 27
Cff
MM2, 3, 4

32
Objectives

Find out intrinsic and extrinsic physicochemical
factors
Visualize and potentially utilize such factors
for protein-protein recognition site
identification

33
Determination of dominant thermodynamic factors
$\Delta G = RT \ln K_d$
$\Delta G = \Delta H - T\Delta S$
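
A small worked example may help here. Using the relations ΔG = RT ln Kd and ΔG = ΔH − TΔS, the sketch below converts a dissociation constant into a binding free energy and back-calculates the entropic term from a measured enthalpy. The function names, the default temperature of 298.15 K and the kcal/mol unit choice are assumptions for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Delta G of binding (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


def entropic_term(delta_g, delta_h):
    """Return T*dS, rearranged from dG = dH - T*dS."""
    return delta_h - delta_g


if __name__ == "__main__":
    dg = binding_free_energy(1e-9)  # a 1 nM complex, roughly -12.3 kcal/mol
    print(dg)
    # Hypothetical calorimetric enthalpy of -20 kcal/mol.
    print(entropic_term(dg, -20.0))
```

Comparing the sizes of ΔH and TΔS in this way indicates whether binding is mainly enthalpy driven or entropy driven.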
34
Determination of dominant thermodynamic factors

35
NMR and X-ray

NMR
Dynamic
Multiple Models (Each conformation is a model)
Aqueous environment
Limitations
Size of molecule
< 30 kD
Examples
1BLQ, 1UBA

X-ray
Static
Only one model
Crystal
Limitations
Not limited by size
Examples
7LYZ

36
3D Structure Database

PDB
Brookhaven National Laboratories
Research Collaboratory for Structural
Bioinformatics (RCSB): collaborative effort of NIST,
Rutgers and San Diego Super Computing Facility
http://www.rcsb.org
Publicly available 3-D structures of proteins,
proteins + nucleic acids (DNA), proteins
complexed with metals and inhibitors
Experimental methods: X-ray and NMR

37
Why 3D Modeling?

Rate of structure solving through NMR or X-ray is
slow compared to the deposition of DNA and
protein sequences
Crystallization is the bottleneck (time in
months/years). No generic recipe for
crystallization
Swiss-Prot Release 42.4 of 14-Nov-2003: 138,347
entries
PDB as of 11-Nov-03: 23,188 structures
Membrane proteins are difficult to crystallize:
30% of the proteome of living things
Knowledge of 3D structure is essential for
understanding protein function
Structural information enhances our understanding
of protein-protein or protein-DNA interactions

38
Comparing Homologous enzymes
Family: Ubiquitin-conjugating enzyme
1QCQ   Arabidopsis thaliana   2AAK   Baker's yeast
Sequence identity: 43%
Russell et al., JMB 269, 423-439 (1997)
39
Overview of Homology Modeling
Sequence from experiment
Experiments
X-ray, NMR, e-Diffraction
Physicochemical Simulations
Comparative Modeling / Knowledge-Based
Modeling
40
http://au.expasy.org/spdbv/text/download.htm
41
Proteomics: quantitative and physical mapping of
cellular proteins
A General Concept
42
2D Gel Electrophoresis
43
Contemporary Proteomic Processes
2D-PAGE (1D-PAGE): 3 days (1 day)
Visualization: 1-3 hours
In-gel digestion: overnight
MALDI-TOF MS analysis: < 5 minutes/sample
Database search using peptide masses: 10 minutes/sample
Outcome: identified or not identified
If not identified, peptide sequencing (PSD or MS/MS): 1 hour to 1 day
Database search using peptide sequences: 10 minutes/sample
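
The "database search using peptide masses" step can be sketched as an in-silico digest followed by a mass comparison. The example below assumes trypsin as the digestion enzyme (the slides do not name one), uses standard monoisotopic residue masses, and matches neutral peptide masses within a tolerance. All function names and the tolerance value are hypothetical.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance=0.2):
    """Count observed masses that match a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )
```

A search engine would repeat `count_matches` over every database sequence and rank candidates by the number of matched peptides; real tools also handle missed cleavages, modifications and charge states, which this sketch ignores.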
44
MALDI ToF Mass Spectrometer
45
Welcome!!
46
Applications of Homology Modeling

Ion Channel proteins
Transmembrane region-no 3D structure available
Used Homology Modeling to build a model for the
channel protein
Used InsightII (Ludi) to model the binding of
inhibitors
Docking to study the drug-receptor interaction

47
Homologous Proteins

Homologous Proteins
Having a common evolutionary origin
Evolved from a common ancestor
Many of the essential proteins (key regulators)
present in humans are also present in other
living organisms (e.g. rat, bacteria)
These essential proteins have to conserve their
functionality throughout evolution
DNA polymerases
DNA replication
Necessary for all organisms
MHC: Major Histocompatibility Complex
Antigen presentation to trigger an immune
response
Present in higher eukaryotes, rats and humans

48
Sequence Dissimilarity and Structural Similarity

What we already know about homologous proteins
Core region is largely conserved (main
secondary structural features)
Most dissimilarity is observed in the surface
(loop) regions
Within homologous proteins, secondary-structure elements
can move relative to each other or even disappear,
but neither order nor orientation will differ (α
becoming β etc.)
Sequence is less conserved than
structure

49
Homology Modeling: Terminology and Basic Assumptions

Terminology
Protein sequence we are modeling is called the
Target
Homologous protein used in the modeling is called
the Template
Basic Assumptions
Similar sequences have similar conformations
Core regions provide excellent template for
modeling the target protein. If the Core regions
share 50% identity, then the two proteins can
almost always be superimposed with an RMSD of 1 Å
or less
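
The RMSD quoted above is the root-mean-square distance between equivalent atoms of two structures. A minimal sketch follows, assuming the two coordinate sets are already optimally superimposed and listed in the same atom order (for example, the Cα atoms of aligned core residues). The function name and input layout are assumptions for this example.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as the input) between two equal-length lists of
    (x, y, z) tuples that are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates: the second set is shifted by 1 Angstrom in x.
    a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    b = [(1.0, 0.0, 0.0), (2.0, 1.0, 1.0)]
    print(rmsd(a, b))
```

Modeling packages first find the rotation and translation that minimise this value before reporting it; that superposition step is omitted here.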

50
Overview of Homology Modeling
Bioinformatics Basics, Rashidi and Buehler
51
Database mining

Why Sequence Comparison?
Search for potential homolog
Identification of evolutionary relationship is
easy when similarity level is high (> 50%)
In a Gene Family how many members are known?
For Comparative/Homology Modeling
two sequences related by divergence from a common
ancestor
What kind of alignment is this?
Global Alignment
Overall alignment: sequence homologs with known
3-D structure
Local Alignment
Best for searching local domains
Gaps cannot be introduced endlessly: biologically
meaningless
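
The difference between global and local alignment can be made concrete with a small dynamic-programming sketch. The function below computes a Needleman-Wunsch style global score, or a Smith-Waterman style local score when `local=True`. The scoring values (match +1, mismatch -1, linear gap penalty -2) are arbitrary assumptions; the slides do not prescribe them, and real searches use substitution matrices and affine gap penalties.

```python
def alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2, local=False):
    """Best global (default) or local alignment score of two sequences."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            cell = max(
                score[i - 1][j - 1] + (match if same else mismatch),
                score[i - 1][j] + gap,   # gap in seq_b
                score[i][j - 1] + gap,   # gap in seq_a
            )
            if local:
                cell = max(cell, 0)      # local alignments may restart
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"))
    print(alignment_score("TTTACGTTT", "GGACGGG", local=True))
```

The negative gap penalty is what keeps gaps from being "introduced endlessly": every gap has to be paid for by enough matches to remain worthwhile.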

52
PAM250 Matrix (identities at 20% level)
Tryptophan: highly conserved, hydrophobic core
residue, important for the structure, difficult to
mutate. W->F, W->Y (aromatic amino acids are the next
choice to replace W)
Cysteine: well known for S-S
linkage, important for structure
Unitary Matrix
53
Searching for Templates

Do a BLAST/FASTA search, or use programs within GCG
(Align, gap, bestfit, etc.), for sequence
alignment. Restrict the search to the PDB database
(why PDB?)
Potentially suitable templates
BLAST score (E-value) < 0.001 (protein), < 10^-6
(nucleotide)
Safe threshold is > 25-30% identity
In the Twilight Zone (< 25%): how to proceed?
Usually more than one protein is chosen as
templates?
Avoid biasing, to model variants (loops etc),
side chain conformations
Final model will be done using one representative
template (called reference)
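
The template thresholds above can be applied as a simple filter over search hits. The sketch below assumes each hit has already been parsed into a dictionary with hypothetical keys `pdb_id`, `evalue` and `identity` (percent); that layout is invented for this example and is not the output format of BLAST or FASTA.

```python
def select_templates(hits, max_evalue=1e-3, min_identity=30.0):
    """Keep hits below the E-value cutoff and above the identity threshold,
    ordered from most to least similar. The first entry is a natural
    candidate for the reference template."""
    passing = [
        hit for hit in hits
        if hit["evalue"] < max_evalue and hit["identity"] >= min_identity
    ]
    return sorted(passing, key=lambda hit: hit["identity"], reverse=True)


if __name__ == "__main__":
    # Made-up hits, for illustration only.
    example_hits = [
        {"pdb_id": "tmplA", "evalue": 1e-30, "identity": 45.0},
        {"pdb_id": "tmplB", "evalue": 1e-5, "identity": 22.0},   # twilight zone
        {"pdb_id": "tmplC", "evalue": 0.5, "identity": 60.0},    # not significant
        {"pdb_id": "tmplD", "evalue": 1e-12, "identity": 31.0},
    ]
    for hit in select_templates(example_hits):
        print(hit["pdb_id"], hit["identity"])
```

Hits that fall in the twilight zone are dropped here; in practice they would need other evidence, such as threading or conserved functional residues, before being used.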

54
Structurally Conserved Region (SCR) Modeling

After identifying template(s), the next task is
to identify the SCR
What are SCRs?
Inner core (not the surface exposed loops)
How do we identify them?
Multiple Sequence Alignments, secondary structure
elements
The next step is to align the Structurally
aligned templates with the unknown sequence
No gaps are allowed within the SCR regions
Special sequence alignment algorithm used which
discourages gaps within SCR.

55
Structurally Variable Region (SVR) Modeling (3
methods)

If the reference protein has similar loops then
they can be copied
Perform a database (derived from PDB) search for
structures with loops
Criterion is the conserved residues flanking the
loop area and the number of loop residues
Software usually keeps a loop database derived
from PDB.
De novo method of building and constrained
minimization
If the number of residues in the template and the
reference differ

56
Modeling Side Chains

Given that each side chain can be in one of many
different conformations: multiple-minima problem
Following options are generally used
If the residues are the same
Copy the same conformation (why? see scoring
matrices)
If they are different
Use built-in libraries based on known info (PDB)
Random conformations without any collisions

57
Homology Modeling By Example
58
Template Alignment

5 template lysozyme proteins (only Cα shown)
Structurally uncorrected multiple sequence
alignment
Reference: red
Query sequence: violet

59
Studying the corrected template alignment

Look at Cys. How about the structural
conservation?
Which regions show structural variation?

60
Structurally corrected MSA
Made using InsightII, Accelrys
Do you see the location of the variable region
(core or surface)?
RMS deviation is kept to a minimum (< 1 Å)
Structurally corrected MSA
61
Target Core Modeling

Target sequence is aligned with the template or
Structurally Corrected Multiple Sequence
alignment (in case of templates)
Which residues can be aligned to the conserved
block region of the multiple sequence alignment
of the reference protein so that one can copy the
coordinates from the reference to the sequence
Do a sequence alignment using a chosen matrix,
gap penalty etc. of the reference with the model
sequence

62
Target Core Modeling

Target sequence is now aligned with the template
or Structurally Corrected Multiple Sequence
alignment (in case of templates)

Made using InsightII, Accelrys
63
Sequence Alignment
Before Aligning the model sequence to the template
Are these insertions reasonable?
Gap insertion, conserved region split
After Aligning the model sequence to the template
Made using InsightII, Accelrys
Gap insertion
64
Suspect the alignment

Look at the alignment; if the gaps introduced
are not in surface-exposed regions, re-examine
the parameters of the alignment (gap penalty
etc.)
If the deletions occur at a terminus, are
surface exposed, and are not in any recognized secondary
structure, then they may be valid deletions
Finally, copy the coordinates from each conserved
group of one of the most similar template
sequences to the model sequence.

1) Before alignment 2) Wrong alignment parameters 3)
Correct alignment parameters (higher gap penalty)
67
Loop Modeling
68
Side chains will be added if the template has
identical residues. Make sure side chains are not
clashing with the backbone
69
Final Model
70
Homology Model Evaluation

Most automated Homology Modeling software
provides a model, even with an inappropriate
template
How to judge the quality of the model?
Absence of R-factors: no way to evaluate the model
Correct models usually have atomic positions
within the experimental uncertainty limit

71
Final Step Energy Minimization

Why? The final model now has backbone, side chains
and loops generated from the template(s)
Has atom clashes and non-optimal conformations
Choose a program to perform Energy Minimization
to repair the model structure (bad contacts)
Swiss-Model uses GROMOS
How many steps of Minimization ?
Vacuum (non-solvent)
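
To show what "steps of minimization" means, the sketch below runs a plain steepest-descent loop with a finite-difference gradient on a toy energy: a single harmonic bond between two atoms on a line. This is only an illustration under stated assumptions; it is not the GROMOS force field or the Swiss-Model protocol, and the force constant, step size and step count are arbitrary.

```python
def bond_energy(coords, k=100.0, r0=1.5):
    """Harmonic bond energy k*(r - r0)^2 for two atoms on a line."""
    x1, x2 = coords
    return k * (abs(x2 - x1) - r0) ** 2


def steepest_descent(energy, coords, step=0.001, n_steps=200, h=1e-6):
    """Move coordinates downhill along the negative numerical gradient."""
    coords = list(coords)
    for _ in range(n_steps):
        base = energy(coords)
        gradient = []
        for i in range(len(coords)):
            shifted = coords[:]
            shifted[i] += h
            gradient.append((energy(shifted) - base) / h)
        coords = [c - step * g for c, g in zip(coords, gradient)]
    return coords


if __name__ == "__main__":
    # Start from a compressed bond of 1.0 (ideal length 1.5).
    x1, x2 = steepest_descent(bond_energy, [0.0, 1.0])
    print(abs(x2 - x1))
```

The step size has to be small relative to the stiffness of the energy function: with `k=100`, a step of 0.01 would overshoot and diverge, which is one reason production minimizers use adaptive steps or conjugate gradients.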

72
Identifying Incorrect Models

Hydrophobic residues exposed
Buried polar or ionic residues without the
charges satisfied (H-bonds, salt-bridge etc)
Clashes
Unusual bond-lengths, bond-angles
Sequence alignment is not-optimal
Very large RMSD among the templates
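
One of the checks listed above, atom clashes, is easy to sketch: flag every pair of atoms that sits closer than a chosen cutoff. The example assumes the atoms passed in are not covalently bonded to each other (bonded pairs are legitimately close), and the 2.2 Å cutoff and the input layout are assumptions for illustration.

```python
import math
from itertools import combinations


def find_clashes(atoms, min_distance=2.2):
    """Return (label_a, label_b, distance) for non-bonded atom pairs that
    are closer than min_distance. atoms: list of (label, (x, y, z))."""
    clashes = []
    for (label_a, pos_a), (label_b, pos_b) in combinations(atoms, 2):
        distance = math.dist(pos_a, pos_b)
        if distance < min_distance:
            clashes.append((label_a, label_b, distance))
    return clashes


if __name__ == "__main__":
    # Hypothetical atoms: the first two overlap, the third is far away.
    atoms = [
        ("LYS12:CB", (0.0, 0.0, 0.0)),
        ("PHE40:CZ", (1.0, 0.0, 0.0)),
        ("GLU77:CD", (5.0, 0.0, 0.0)),
    ]
    for clash in find_clashes(atoms):
        print(clash)
```

This all-pairs loop scales quadratically with the number of atoms; validation tools typically use a spatial grid or neighbour list, and also take atom radii into account.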

73
Quality of Models

Procheck: stereochemical quality of the protein
and residue-by-residue analysis in figures
http://www.biochem.ucl.ac.uk/roman/procheck/procheck.html
PDBREPORT: http://www.cmbi.kun.nl/gv/pdbreport

74
CASP Test of the Models

Critical Assessment of Techniques for Protein
Structure Prediction
http://predictioncenter.llnl.gov/
Showcase for the latest methods in the structure
prediction area
Held once every two years
Competition open in three areas
Homology Modeling, Threading and ab initio
CASP 1998, 2000 and 2002 showed the reliability of
Homology Modeling when suitable templates are
available (> 30%, above the Twilight Zone)

75
Database of Homology Models

Project: 3D-Crunch (1998)
Project submitted all sequences of Swiss-Prot and
trEMBL to SWISS MODEL server
The resulting homology models (64000) are stored
and available to public from SWISS-MODEL
Repository
Database contains Final models, Entire modeling
projects including aligned coordinates of
templates

76
Database of Homology Models