Title: BCB 444/544
1 BCB 444/544
- Lecture 21
- Protein Structure Visualization, Classification & Comparison
- Secondary Structure Prediction
- 21_Oct10
2 Required Reading (before lecture)
- Mon Oct 8 - Lecture 20
- Protein Secondary Structure Prediction
- Chp 14 - pp 200 - 213
- Wed Oct 10 - Lecture 21
- Protein Tertiary Structure Prediction
- Chp 15 - pp 214 - 230
- Thurs Oct 11 & Fri Oct 12 - Lab 7 & Lecture 22
- Protein Tertiary Structure Prediction
- Chp 15 - pp 214 - 230
3 Assignments & Announcements
- ALL: HomeWork 3
- ✓ Due Mon Oct 8 by 5 PM
- HW544: HW544Extra 1
- ✓ Due Task 1.1 - Mon Oct 1 by noon
- Due Task 1.2 & Task 2 - Fri Oct 12 by 5 PM
- 444 "Project-instead-of-Final" students should also submit HW544Extra 1
- ✓ Due Task 1.1 - Mon Oct 8 by noon
- Due Task 1.2 - Fri Oct 12 by 5 PM
- <Task 2 NOT required for BCB444 students>
4 Seminars this Week - Thurs
- BCB List of URLs for Seminars related to Bioinformatics
- http://www.bcb.iastate.edu/seminars/index.html
- Oct 11 Thurs
- Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar
- The Computational Microscope
- 2:10 PM in E164 Lagomarcino
- http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
- Dr. Dan Gusfield (UC Davis) - Computer Science Colloquium
- ReCombinatorics: Combinatorial Algorithms for Studying History of Recombination in Populations
- 3:30 PM in Howe Hall Auditorium
- http://www.cs.iastate.edu/colloq/new/gusfield.shtml
5 Seminars this Week - Fri
- BCB List of URLs for Seminars related to Bioinformatics
- http://www.bcb.iastate.edu/seminars/index.html
- Oct 12 Fri
- Dr. Edward Yu (Physics/BBMB, ISU) - BCB Faculty Seminar
- TBA "Structural Biology" (see URL below)
- 2:10 PM in 102 Sci
- http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/30/webnewsfilefield_abstract/Dr.-Ed-Yu.pdf
- Dr. Srinivas Aluru (ECprE, ISU) - GDCB Seminar
- Consensus Genetic Maps: A Graph Theoretic Approach
- 4:10 PM in 1414 MBB
- http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/35/webnewsfilefield_abstract/Dr.-Srinivas-Aluru.pdf
6 Chp 12 - Protein Structure Basics
- SECTION V STRUCTURAL BIOINFORMATICS
- Xiong Chp 12 Protein Structure Basics
- Amino Acids
- Peptide Bond Formation
- Dihedral Angles
- Hierarchy
- Secondary Structures
- Tertiary Structures
- Determination of Protein 3-Dimensional Structure
- Protein Structure DataBank (PDB)
7 Protein Structure & Function
- Protein structure - primarily determined by sequence
- Protein function - primarily determined by structure
- Globular proteins: compact hydrophobic core & hydrophilic surface
- Membrane proteins: special hydrophobic surfaces
- Folded proteins are only marginally stable
- Some proteins do not assume a stable "fold" until they bind to something = Intrinsically disordered
- Predicting protein structure and function can be very hard -- & fun!
8 6 Main Classes of Protein Structure
- 1) α-Domains
- Bundles of helices connected by loops
- 2) β-Domains
- Mainly antiparallel sheets, usually 2 sheets forming sandwich
- 3) α/β-Domains
- Mainly parallel sheets with intervening helices, also mixed sheets
- 4) α+β-Domains
- Mainly segregated helices and sheets
- 5) Multidomain (α & β)
- Containing domains from more than one class
- 6) Membrane & cell-surface proteins
9 Protein Structure Databases
- PDB - Protein Data Bank
- http://www.rcsb.org/pdb/
- (RCSB) - THE protein structure database
- MMDB - Molecular Modeling Database
- http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
- (NCBI Entrez) - has "added" value
- MSD - Molecular Structure Database
- http://www.ebi.ac.uk/msd
- Especially good for interactions & binding sites
10 PDB (RCSB) - recently "remediated" http://www.rcsb.org/pdb
11 Structure at NCBI http://www.ncbi.nlm.nih.gov/Structure
12 MMDB at NCBI http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
13 MMDB: Molecular Modeling Data Base
- Derived from PDB structure records
- "Value-added" to PDB records includes
- Integration with other ENTREZ databases & tools
- Conversion to parseable ASN.1 data description language
- Data also available in mmCIF & XML (also true for PDB now)
- Correction of numbering discrepancies in structure vs sequence
- Validation
- Explicit chemical graph information (covalent bonds)
- Integrated tool for identifying structural neighbors: Vector Alignment Search Tool (VAST)
- http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html
14 MSD: Molecular Structure Database http://www.ebi.ac.uk/msd/
15 wwPDB: World Wide PDB http://www.wwpdb.org
16 Experimental Determination of 3D Structure
- 2 Major Methods to obtain high-resolution structures
- X-ray Crystallography (most PDB structures)
- Nuclear Magnetic Resonance (NMR) Spectroscopy
- Note: Advantages & Limitations of each method
- (See your lecture notes & textbook)
- For more info: http://en.wikipedia.org/wiki/Protein_structure
- Other methods (usually lower resolution, at present)
- Electron Paramagnetic Resonance (EPR - also called ESR, EMR)
- Electron microscopy (EM)
- Cryo-EM
- Scanning Probe Microscopies (AFM - Atomic Force Microscopy)
- http://www.uweb.engr.washington.edu/research/tutorials/SPM.pdf
- Circular Dichroism (CD), several other spectroscopic methods
17 Chp 13 - Protein Structure Visualization, Comparison & Classification
- SECTION V STRUCTURAL BIOINFORMATICS
- Xiong Chp 13
- Protein Structure Visualization, Comparison & Classification
- Protein Structural Visualization
- Protein Structure Comparison
- Protein Structure Classification
18 Protein Structure Visualization
- RASMOL & descendants: PyMol, MolMol
- http://www.umass.edu/microbio/rasmol/index2.htm
- Cn3D - esp. good for structural alignments
- http://www.biosino.org/mirror/www.ncbi.nlm.nih.gov/Structure/cn3d/
- CHIME (Protein Explorer)
- http://www.umass.edu/microbio/chime/getchime.htm
- MolviZ.Org
- http://www.umass.edu/microbio/chime
- Deep View = Swiss-PDB Viewer
- http://www.expasy.org/spdbv
19 PyMol http://pymol.sourceforge.net/
20 Cn3D http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
21 Cn3D: Displaying 3° Structures
Chloroquine
22 Cn3D: Structural Alignments
NADH
23 Protein Explorer (Chime) http://www.umass.edu/microbio/chime/pe_beta/pe/protexpl/frntdoor.htm
24 Protein Structure Comparison Methods
We will skip this for now
- 3 Basic Approaches for Aligning Structures
- Intermolecular
- Intramolecular
- Combined
- DALI/FSSP (most commonly used)
- Fully automated structure alignments
- DALI server http://www.ebi.ac.uk/dali/index.html
- DALI Database (fold classification) http://ekhidna.biocenter.helsinki.fi/dali/start
25 Protein Structure Classification
- SCOP = Structural Classification of Proteins
- Levels reflect both evolutionary and structural relationships
- http://scop.mrc-lmb.cam.ac.uk/scop
- CATH = Classification by Class, Architecture, Topology & Homology
- http://cathwww.biochem.ucl.ac.uk/latest/
- DALI - (recently moved to EBI & reorganized)
- DALI Database (fold classification) http://ekhidna.biocenter.helsinki.fi/dali/start
Each method has strengths & weaknesses.
26 SCOP - Structure Classification http://scop.mrc-lmb.cam.ac.uk/scop/
27 CATH - Structure Classification http://www.cathdb.info/latest/index.html
28 Chp 14 - Secondary Structure Prediction
- SECTION V STRUCTURAL BIOINFORMATICS
- Xiong Chp 14
- Protein Secondary Structure Prediction
- Secondary Structure Prediction for Globular Proteins
- Secondary Structure Prediction for Transmembrane Proteins
- Coiled-Coil Prediction
29 Secondary Structure Prediction
- Has become highly accurate in recent years (>85%)
- Usually 3 (or 4) state predictions
- H = α-helix
- E = β-strand
- C = coil (or loop)
- (T = turn)
30 Secondary Structure Prediction Methods
- 1st Generation methods
- Ab initio - used relatively small dataset of structures available
- Chou-Fasman - based on amino acid propensities (3-state)
- GOR - also propensity-based (4-state)
- 2nd Generation methods
- based on much larger datasets of structures now available
- GOR II, III, IV, SOPM
- 3rd Generation methods
- Homology-based & Neural network based
- PHD, PSIPRED, SSPRO, PROF, HMMSTR
- Meta-Servers
- combine several different methods
- Consensus & Ensemble based
- JPRED, PredictProtein
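A minimal Python sketch of the propensity idea behind the 1st generation methods. It is not the Chou-Fasman or GOR procedure itself: the real methods add nucleation/extension rules or information-theoretic terms that are omitted here. The training sequence, its labels, the window size and the function names are all made up for illustration. The propensity of residue a for state s is estimated as (fraction of a found in s) / (fraction of all residues found in s), and each position gets the state with the highest window-averaged propensity.

```python
from collections import Counter

def propensities(sequences, structures, states="HEC"):
    """Return table[state][aa] = (n(aa, state) / n(aa)) / (n(state) / N)."""
    aa_counts, state_counts, pair_counts = Counter(), Counter(), Counter()
    for seq, ss in zip(sequences, structures):
        for aa, state in zip(seq, ss):
            aa_counts[aa] += 1
            state_counts[state] += 1
            pair_counts[(aa, state)] += 1
    total = sum(aa_counts.values())
    table = {state: {} for state in states}
    for state in states:
        if state_counts[state] == 0:
            continue  # state never observed; its table stays empty
        background = state_counts[state] / total
        for aa, n in aa_counts.items():
            table[state][aa] = (pair_counts[(aa, state)] / n) / background
    return table

def predict(seq, table, window=5):
    """Assign each residue the state with the highest mean propensity
    over a window centred on it (unseen residues count as neutral, 1.0)."""
    half = window // 2
    states = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]

        def mean_propensity(state):
            return sum(table[state].get(aa, 1.0) for aa in segment) / len(segment)

        states.append(max(table, key=mean_propensity))
    return "".join(states)

# Hypothetical toy training data: one sequence with its 3-state labels.
toy_table = propensities(["AAAAGGVVVV"], ["HHHHCCEEEE"])
print(predict("AAAAA", toy_table))
```

With so little training data the table is meaningless biologically. The point is only the shape of the calculation: count, normalise, then average over a window.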
31 Secondary Structure Prediction Servers
- Prediction Evaluation?
- Q3 score - % of residues correctly predicted (3-state)
- in cross-validation experiments
- Best results? Meta-servers
- http://expasy.org/tools/ (scroll for 2° structure prediction)
- http://www.russell.embl-heidelberg.de/gtsp/secstrucpred.html
- JPred www.compbio.dundee.ac.uk/www-jpred
- PredictProtein http://www.predictprotein.org/ Rost, Columbia
- Best individual programs? ??
- CDM http://gor.bb.iastate.edu/cdm/ Sen & Jernigan, ISU
- GOR V http://gor.bb.iastate.edu/ Kloczkowski & Jernigan, ISU
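The Q3 score described above can be computed directly from a predicted and an observed state string. A minimal Python sketch, assuming both strings use the same 3-state alphabet (H, E, C) and are aligned residue by residue; the function name is illustrative.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# 5 of 6 residues agree in this made-up example.
print(round(q3("HHHEEC", "HHHEEE"), 1))
```

In a cross-validation experiment this would be evaluated only on proteins held out from training, then averaged.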
32 Consensus Data Mining (CDM)
- Developed by Jernigan Group at ISU
- Basic premise: combination of 2 complementary methods can enhance performance by harnessing distinct advantages of both methods; combines FDM & GOR V
- FDM - Fragment Data Mining - exploits availability of sequence-similar fragments in the PDB, which can lead to highly accurate prediction - much better than GOR V - for such fragments, but such fragments are not available for many cases
- GOR V - Garnier, Osguthorpe, Robson V - predicts secondary structure of less similar fragments with good performance; these are protein fragments for which FDM method cannot find suitable structures
- For references & additional details: http://gor.bb.iastate.edu/cdm/
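The slide does not say how CDM chooses between the two methods at each position, so the following Python sketch is only one plausible reading of the premise, not the published algorithm. It assumes a per-residue similarity score from the fragment search (None where no fragment was found) and a cutoff; both the score and the cutoff value are hypothetical.

```python
def combine_predictions(fdm_states, fdm_similarity, gor_states, cutoff=0.5):
    """Use the fragment-based call where similarity is high enough,
    otherwise fall back to the GOR-style call (cutoff is an assumed value)."""
    if not (len(fdm_states) == len(fdm_similarity) == len(gor_states)):
        raise ValueError("all inputs must cover the same residues")
    combined = []
    for fdm, similarity, gor in zip(fdm_states, fdm_similarity, gor_states):
        use_fdm = similarity is not None and similarity >= cutoff
        combined.append(fdm if use_fdm else gor)
    return "".join(combined)

# Made-up inputs: good fragments for the first two residues only.
print(combine_predictions("HHHH", [0.9, 0.9, 0.1, None], "CCEE"))
```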
33 Secondary Structure Prediction for Different Types of Proteins/Domains
- For Complete proteins
- Globular Proteins - use methods previously described
- Transmembrane (TM) Proteins - use special methods
- (next slides)
- For Structural Domains: many under development
- Coiled-Coil Domains (Protein interaction domains)
- Zinc Finger Domains (DNA binding domains),
- others
34 SS Prediction for Transmembrane Proteins
- Transmembrane (TM) Proteins
- Only a few in the PDB - but ~30% of cellular proteins are membrane-associated!
- Hard to determine experimentally, so prediction important
- TM domains are relatively 'easy' to predict!
- Why? constraints due to hydrophobic environment
- 2 main classes of TM proteins
- α-helical
- β-barrel
35 SS Prediction for TM α-Helices
- α-Helical TM domains
- Helices are 17-25 amino acids long (span the membrane)
- Predominantly hydrophobic residues
- Helices oriented perpendicular to membrane
- Orientation can be predicted using "positive inside" rule
- Residues at cytosolic (inside or cytoplasmic) side of TM helix, near hydrophobic anchor, are more positively charged than those on lumenal (inside an organelle in eukaryotes) or periplasmic side (space between inner & outer membrane in gram-negative bacteria)
- Alternating polar & hydrophobic residues provide clues to interactions among helices within membrane
- Servers?
- TMHMM or HMMTOP - 70% accuracy - confused by hydrophobic signal peptides (short hydrophobic sequences that target proteins to the endoplasmic reticulum, ER)
- Phobius - 94% accuracy - uses distinct HMM models for TM helices & signal peptide sequences
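The servers listed above are HMM-based. As a much simpler illustration of why TM helices are relatively 'easy' to find, here is a Python sketch of a sliding-window hydropathy scan using the Kyte-Doolittle scale, followed by a crude "positive inside" check that counts Lys/Arg in the flanks. The 19-residue window and 1.6 cutoff are the commonly quoted Kyte-Doolittle settings; the function names and the 10-residue flank are assumptions made for this example.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window, indexed by window start."""
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping high-scoring windows into (start, end) spans (end exclusive)."""
    spans = []
    for start, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        if spans and start <= spans[-1][1]:
            spans[-1] = (spans[-1][0], start + window)
        else:
            spans.append((start, start + window))
    return spans

def flank_charges(seq, start, end, flank=10):
    """Count K/R residues just before and just after a segment."""
    before = seq[max(0, start - flank):start]
    after = seq[end:end + flank]
    return (sum(before.count(c) for c in "KR"),
            sum(after.count(c) for c in "KR"))

# Made-up sequence: basic N-terminal flank, 20 leucines, acidic C-terminal flank.
toy = "KRKRDE" + "L" * 20 + "DEDEDE"
print(candidate_tm_segments(toy))
print(flank_charges(toy, 6, 26))
```

Under the positive-inside rule, the flank with more K/R would be guessed to lie on the cytoplasmic side. Window averaging blurs the segment boundaries, and a scan like this cannot tell a TM helix from a hydrophobic signal peptide, which is the weakness noted for TMHMM and HMMTOP above.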
36 SS Prediction for TM β-Barrels
- β-Barrel TM domains
- β-strands are amphipathic (partly hydrophobic, partly hydrophilic)
- Strands are 10 - 22 amino acids long
- Every 2nd residue is hydrophobic, facing lipid bilayer
- Other residues are hydrophilic, facing "pore" or opening
- Servers? Harder problem, fewer servers
- TBBPred - uses NN or SVM (more on these ML methods later)
- Accuracy ?
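The "every 2nd residue is hydrophobic" pattern can be turned into a simple score. The Python sketch below is not how TBBPred works (that server uses NN or SVM models); it only measures how well a segment fits a strict hydrophobic/hydrophilic alternation, trying both registers. The set of residues treated as hydrophobic is an assumption chosen for the example.

```python
# Assumed hydrophobic set for illustration; other groupings are in common use.
HYDROPHOBIC = set("AVLIMFWYC")

def alternation_score(segment):
    """Fraction of positions that fit the better of the two alternating registers
    (hydrophobic on even positions, or hydrophobic on odd positions)."""
    if not segment:
        raise ValueError("empty segment")
    fits_even = sum((aa in HYDROPHOBIC) == (i % 2 == 0)
                    for i, aa in enumerate(segment))
    return max(fits_even, len(segment) - fits_even) / len(segment)

print(alternation_score("VSLTVNIS"))  # perfectly alternating made-up strand
print(alternation_score("LLLLLLLL"))  # uniformly hydrophobic, no alternation
```

A scan would apply this to windows of 10 to 22 residues, the strand lengths given above.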
37 Prediction of Coiled-Coil Domains
- Coiled-coils
- Superhelical protein motifs or domains, with two or more interacting α-helices that form a "bundle"
- Often mediate inter-protein (& intra-protein) interactions
- 'Easy' to detect in primary sequence
- Internal repeat of 7 residues (heptad)
- 1 & 4 hydrophobic (facing helical interface)
- 2,3,5,6,7 hydrophilic (exposed to solvent)
- Helical wheel representation - can be used to manually detect these, based on amino acid sequence
- Servers?
- Coils, Multicoil - probability-based methods
- 2Zip - for Leucine zippers = special type of CC in TFs - characterized by Leu-rich motif L-X(6)-L-X(6)-L-X(6)-L
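Both patterns above are simple enough to check with a few lines of Python. The sketch below searches for the L-X(6)-L-X(6)-L-X(6)-L motif with a regular expression and finds the heptad register in which positions 1 & 4 are most often hydrophobic. It is not what Coils, Multicoil or 2Zip do (those are probability-based); the hydrophobic residue set and function names are assumptions made for the example.

```python
import re

# Lookahead so that overlapping matches are all reported.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

# Assumed hydrophobic set for illustration.
HYDROPHOBIC = set("AILMFVWY")

def find_leucine_zippers(seq):
    """0-based start positions of every L-X(6)-L-X(6)-L-X(6)-L match."""
    return [m.start() for m in ZIPPER.finditer(seq)]

def best_heptad_register(seq):
    """Return (offset, fraction): the register in which heptad positions
    1 and 4 are hydrophobic most often (lowest offset wins ties)."""
    best_offset, best_fraction = 0, 0.0
    for offset in range(7):
        core = [aa for i, aa in enumerate(seq) if (i - offset) % 7 in (0, 3)]
        if not core:
            continue
        fraction = sum(aa in HYDROPHOBIC for aa in core) / len(core)
        if fraction > best_fraction:
            best_offset, best_fraction = offset, fraction
    return best_offset, best_fraction

# Made-up sequences for demonstration.
print(find_leucine_zippers("MK" + "LAAAAAA" * 3 + "L" + "GG"))
print(best_heptad_register("LKELEEK" * 4))
```

The regular expression alone matches many sequences that are not leucine zippers, so a real predictor would also require the region to look like a coiled coil.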
38 Chp 15 - Tertiary Structure Prediction
- SECTION V STRUCTURAL BIOINFORMATICS
- Xiong Chp 15
- Protein Tertiary Structure Prediction
- Methods
- Homology Modeling
- Threading and Fold Recognition
- Ab Initio Protein Structural Prediction
- CASP