Title: DNA/Protein structure-function analysis and prediction
1DNA/Protein structure-function analysis and
prediction
- Protein-protein Interaction (PPI)
- Protein-protein Interaction
- Interfaces
- Solvation
- Energetics
- Conformational change
- Allostery
- Prediction
- Gene Cluster
- Phylogenetic Profile
- Rosetta Stone
- Sequence co-evolution
- Random Decision forest
- Docking
- Examples
2PPI Characteristics
- Universal
- Cell functionality based on protein-protein interactions
- Cytoskeleton
- Ribosome
- RNA polymerase
- Numerous
- Yeast
- 6,000 proteins
- at least 3 interactions each
- 18,000 interactions
- Human
- estimated 100,000 interactions
- Network
- simplest: homodimer (two)
- common: hetero-oligomer (more)
- holistic: protein network (all)
3Interface Area
- Contact area
- usually > 1100 Å²
- each partner > 550 Å²
- each partner loses 800 Å² of solvent accessible surface area
- 20 amino acids lose 40 Å² each
- 100-200 J per Å²
- Average buried accessible surface area
- 12% for dimers
- 17% for trimers
- 21% for tetramers
- 83-84% of all interfaces are flat
- Secondary structure
- 50% α-helix
- 20% β-sheet
- 20% coil
- 10% mixed
- Less hydrophobic than core, more hydrophobic than
exterior
4Complexation Reaction
- $A + B \rightleftharpoons AB$
- $K_a = [AB] / ([A][B])$ → association
- $K_d = [A][B] / [AB]$ → dissociation
- Free energy
- $\Delta G_d = -RT \ln K_d$
- $K_d = \exp(-\Delta G_d / RT)$
- ($R = 8.3144$ J mol$^{-1}$ K$^{-1}$)
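The free-energy relation on this slide can be checked numerically. The sketch below is only an illustration: the temperature of 298.15 K, the function names, and the convention that Kd is expressed in mol/L relative to a 1 M standard state are assumptions, not part of the slides.

```python
import math

R = 8.3144  # gas constant in J mol^-1 K^-1, the value quoted on the slide


def dissociation_free_energy(kd, temperature=298.15):
    """Return dG_d = -RT ln(Kd) in J/mol (Kd in mol/L, 1 M standard state assumed)."""
    return -R * temperature * math.log(kd)


def kd_from_free_energy(dg_d, temperature=298.15):
    """Invert the relation: Kd = exp(-dG_d / RT)."""
    return math.exp(-dg_d / (R * temperature))


# A nanomolar complex as a worked example.
dg = dissociation_free_energy(1e-9)
print(f"dG_d for Kd = 1 nM: {dg / 1000:.1f} kJ/mol")
print(f"round trip Kd: {kd_from_free_energy(dg):.2e} mol/L")
```

With these assumptions a nanomolar complex comes out at roughly 51 kJ/mol, and each tenfold change in Kd shifts the free energy by about 5.7 kJ/mol at room temperature.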
5Experimental Methods
- 2D (polyacrylamide) gel electrophoresis → mass spectrometry
- Liquid chromatography
- e.g. gel permeation chromatography
- Binding study with one immobilized partner
- e.g. surface plasmon resonance
- In vivo by two-hybrid systems or FRET
- Binding constants by ultra-centrifugation, micro-calorimetry or competition experiments with labelled ligand
- e.g. fluorescence, radioactivity
- Role of individual amino acids by site-directed mutagenesis
- Structural studies
- e.g. NMR or X-ray
6PPI Network
http://www.phy.auckland.ac.nz/staff/prw/biocomplexity/protein_network.htm
7Protein-protein interactions
- Complexity
- Multibody interaction
- Diversity
- Various interaction types
- Specificity
- Complementarity in shape and binding properties
8Binding vs. Localization
[Figure: interaction types arranged by binding strength (strong to weak) and localization (co-expressed to different places)]
- Obligate oligomers
- Non-obligate permanent, e.g. antibody-antigen
- Non-obligate triggered transient, e.g. GTP·PO4-
- Non-obligate co-localised, e.g. in membrane
- Non-obligate weak transient
9Some terminology
- Transient interactions
- Associate and dissociate in vivo
- Weak transient
- dynamic oligomeric equilibrium
- Strong transient
- require a molecular trigger to shift the equilibrium
- Obligate PPI
- protomers not stable structures on their own
- (functionally obligate)
10Strong medium weak
- Nanomolar to sub-nanomolar
- $K_d < 10^{-9}$
- Micromolar to nanomolar
- $10^{-6} > K_d > 10^{-9}$
- Millimolar to micromolar
- $10^{-3} > K_d > 10^{-6}$
- $A + B \rightleftharpoons AB$
- $K_d = [A][B] / [AB]$ → dissociation
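The three affinity classes can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, Kd is assumed to be given in mol/L, and the treatment of values that fall exactly on a boundary or above 10^-3 is an assumption, since the slide does not specify it.

```python
def classify_affinity(kd):
    """Map a dissociation constant (mol/L) to the slide's strong/medium/weak classes."""
    if kd <= 0:
        raise ValueError("Kd must be positive")
    if kd < 1e-9:
        return "strong"   # nanomolar to sub-nanomolar
    if kd < 1e-6:
        return "medium"   # between 10^-9 and 10^-6
    if kd < 1e-3:
        return "weak"     # between 10^-6 and 10^-3
    return "outside the listed ranges"


for kd in (5e-10, 1e-7, 1e-4):
    print(kd, classify_affinity(kd))
```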
11Analysis of 122 Homodimers
- 70 interfaces single patched
- 35 have two patches
- 17 have three or more
12Patches
- Cluster in different domains
- (structurally defined units often with specific function)
[Figure: two domains, anticodon-binding and catalytic]
13Interfaces
14Interface
[Figure: interface divided into rim and core]
15Interface composition
- Composition of interface essentially the same as core
- But surface area can be quite different!
16Propensities
- Interface vs. surface propensities
- as $\ln(f_{int} / f_{surf})$
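A minimal sketch of how such a propensity could be computed from residue lists. The function name and the toy residue strings are hypothetical, and the choice to skip residue types that are missing from either set (where the logarithm is undefined) is an assumption rather than something the slide prescribes.

```python
import math
from collections import Counter


def interface_propensities(interface_residues, surface_residues):
    """Return ln(f_int / f_surf) for each residue type seen in both sets."""
    int_counts = Counter(interface_residues)
    surf_counts = Counter(surface_residues)
    n_int = len(interface_residues)
    n_surf = len(surface_residues)
    propensities = {}
    for aa, count in int_counts.items():
        if surf_counts[aa] == 0:
            continue  # ratio undefined without a surface frequency
        f_int = count / n_int
        f_surf = surf_counts[aa] / n_surf
        propensities[aa] = math.log(f_int / f_surf)
    return propensities


# Toy data: leucine is enriched in the interface, lysine is depleted.
props = interface_propensities("LLLK", "LLKKKKKK")
for aa, value in sorted(props.items()):
    print(aa, round(value, 2))
```

A positive value means the residue type is more frequent in the interface than on the rest of the surface, a negative value means it is less frequent.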
17Conformational Change
- Chaperones
- extreme conformational changes upon complexation
- ligand unfolds within the chaperone GroEL/GroES
- Allosteric proteins
- conformational change at 'active' site
- ligand binds to 'regulating' site
- Peptides
- often adopt 'bound' conformation
- different from the 'free' conformation
18Allostery 1
- Regulation by 'remote' modulation of binding affinity (complex strength)
www.blc.arizona.edu/courses/181gh/rick/energy/allostery.html
19Allostery 2
- Substrate binding is cooperative
- Binding of first substrate at first active site
- stimulates active shape
- promotes binding of second substrate
20Allostery 3
- Committed step of metabolic pathway
- regulated by an allosteric enzyme
- Pathway end product
- can regulate the allosteric enzyme for the first committed step
- Inhibitor binding favors inactive form
21DNA/Protein structure-function analysis and
prediction
- Protein-protein Interaction (PPI)
- Protein-protein Interaction
- Interfaces
- Solvation
- Energetics
- Conformational change
- Allostery
- Prediction
- Gene Cluster
- Phylogenetic Profile
- Rosetta Stone
- Sequence co-evolution
- Random Decision forest
- Docking
- Examples
22Predicting Protein-Protein Interactions
- Gene Cluster
- Gene neighborhood
- Phylogenetic Profile
- Co-occurrence across species/genomes
- Rosetta Stone
- Occurrence of a protein in which the domains are linked (fused)
- Sequence co-evolution
- Tree correlation indicates functional relation
- Random Decision forest
- Using data on domain interactions
- Shoemaker & Panchenko, PLoS Comput Biol 2007, 3: e43
23Gene Cluster / Neighborhood
24Gene Cluster / Neighborhood
- Genes with closely related functions encoding potentially interacting proteins
- transcribed as a single unit (operon) in bacteria
- co-regulated in eukaryotes
- Operons can be predicted from intergenic distance
- Neutral evolution tends to shuffle gene order between distantly related organisms
- but gene clusters or operons that encode co-regulated genes are usually conserved
- operons found by gene neighbor methods provide additional evidence about functional linkage
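Predicting operons from intergenic distance could look roughly like the sketch below. The 50 bp gap threshold, the gene tuple layout, and the toy coordinates are all assumptions chosen for illustration; published methods tune the threshold per genome and add further evidence.

```python
def predict_operons(genes, max_gap=50):
    """Group adjacent same-strand genes whose intergenic gap is at most max_gap.

    genes: list of (name, start, end, strand), sorted by start on one replicon.
    Returns a list of candidate transcription units (lists of gene names).
    """
    units = []
    current = [genes[0]] if genes else []
    for prev, gene in zip(genes, genes[1:]):
        gap = gene[1] - prev[2]          # distance from previous end to next start
        if gene[3] == prev[3] and gap <= max_gap:
            current.append(gene)         # close and co-oriented: same unit
        else:
            units.append(current)
            current = [gene]
    if current:
        units.append(current)
    return [[g[0] for g in unit] for unit in units]


toy_genes = [
    ("a", 100, 400, "+"),
    ("b", 420, 900, "+"),    # 20 bp after a, same strand
    ("c", 1500, 2000, "+"),  # 600 bp gap
    ("d", 2010, 2500, "-"),  # close to c but on the opposite strand
]
operons = predict_operons(toy_genes)
print(operons)
```

Units containing two or more genes are the operon candidates; gene pairs within such a unit are then candidates for functional linkage.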
25Phylogenetic Profile
26Phylogenetic Profile
- hypothesis that functionally linked and potentially interacting nonhomologous proteins co-evolve and have orthologs in the same subset of organisms
- components of complexes and pathways should be present simultaneously in order to perform their functions.
- phylogenetic profile is a vector of N elements (number of genomes)
- presence/absence of protein in genome is 1 or 0 at each position of a profile.
- clustered using bit-distance measure
- proteins in a cluster are considered functionally related.
- also for protein domains instead of entire proteins
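The profile comparison can be sketched as follows. Two points are assumptions: the bit-distance is interpreted here as the Hamming distance between profiles, and proteins are linked by a simple distance threshold instead of a full clustering step. The profiles themselves are made-up toy data.

```python
from itertools import combinations


def bit_distance(profile_a, profile_b):
    """Number of genomes in which presence/absence differs (Hamming distance)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def linked_pairs(profiles, max_distance=1):
    """Return protein pairs whose profiles differ in at most max_distance genomes."""
    pairs = []
    for name_a, name_b in combinations(sorted(profiles), 2):
        if bit_distance(profiles[name_a], profiles[name_b]) <= max_distance:
            pairs.append((name_a, name_b))
    return pairs


# One position per genome: 1 = ortholog present, 0 = absent.
toy_profiles = {
    "P1": (1, 0, 1, 1, 0, 1),
    "P2": (1, 0, 1, 1, 0, 1),
    "P3": (0, 1, 0, 0, 1, 1),
    "P4": (1, 0, 1, 0, 0, 1),
}
print(linked_pairs(toy_profiles))
```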
27Rosetta Stone
28Rosetta Stone
- infer protein interactions from sequences in different genomes
- some interacting proteins/domains have homologs that are fused into one protein
- a so-called Rosetta Stone protein
- Apparently, gene fusion can occur to optimize co-expression of genes encoding interacting proteins.
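A minimal sketch of the Rosetta Stone search, assuming that each protein has already been reduced to a set of domain (homology family) labels. The function name, the labels, and the toy proteomes are hypothetical; a real pipeline would derive the labels from sequence searches.

```python
from itertools import combinations


def rosetta_stone_pairs(query_proteins, other_proteome):
    """Find protein pairs whose domains co-occur in one fused protein elsewhere.

    Both arguments map protein name -> set of domain labels.
    Returns (protein_a, protein_b, fused_protein) triples.
    """
    predictions = []
    for name_a, name_b in combinations(sorted(query_proteins), 2):
        dom_a, dom_b = query_proteins[name_a], query_proteins[name_b]
        if not dom_a or not dom_b or dom_a & dom_b:
            continue  # skip unannotated proteins and pairs that share a domain
        for fused_name in sorted(other_proteome):
            fused_domains = other_proteome[fused_name]
            if dom_a <= fused_domains and dom_b <= fused_domains:
                predictions.append((name_a, name_b, fused_name))
    return predictions


# Toy data: two separate proteins in genome X, one fused protein in genome Y.
genome_x = {"gyrA": {"GyrA_dom"}, "gyrB": {"GyrB_dom"}, "kinX": {"kinase"}}
genome_y = {"topoII": {"GyrA_dom", "GyrB_dom"}, "kinY": {"kinase"}}
hits = rosetta_stone_pairs(genome_x, genome_y)
print(hits)
```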
29Sequence co-evolution
30Sequence co-evolution
- interacting proteins often co-evolve so changes in one protein leading to the loss of function or interaction can be compensated by changes in the other
- orthologs of coevolving proteins also tend to interact
- infer unknown interactions in other genomes
- similarity between phylogenetic trees of two non-homologous interacting protein families
- correlation coefficient between the distance matrices
- requires correspondence between the matrix elements / tree branches (i.e. ortholog relations)
- align distance matrices to minimize difference
- predicted interactions correspond to aligned columns
- max. 30 proteins in a family
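The tree-similarity score can be sketched as a Pearson correlation between the upper triangles of two distance matrices. This assumes the rows and columns of both matrices are already in the same species order (the ortholog correspondence is known); the matrix alignment step for unknown correspondences is not shown. Function names and distances are illustrative.

```python
import math


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def tree_similarity(dist_a, dist_b):
    """Correlate two family distance matrices given in the same species order."""
    if len(dist_a) != len(dist_b):
        raise ValueError("matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


family_a = [[0.0, 0.2, 0.6], [0.2, 0.0, 0.5], [0.6, 0.5, 0.0]]
family_b = [[0.0, 0.3, 0.9], [0.3, 0.0, 0.8], [0.9, 0.8, 0.0]]  # similar tree shape
family_c = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.5], [0.1, 0.5, 0.0]]  # different tree shape
print(round(tree_similarity(family_a, family_b), 3))
print(round(tree_similarity(family_a, family_c), 3))
```

A correlation close to 1 suggests that the two families have evolved in step; what counts as high enough is a threshold that has to be calibrated on known interacting pairs.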
31Classification / Random Decision Forest
32Random Decision Forest
- Decision trees based on domains of interacting and non-interacting proteins
- All possible combinations of interacting domains
- vector of length N (different domain types or features)
- 2, 1, or 0 if found in both, one, or no protein of pair
- experimental training set of interacting protein pairs
- decision tree (many trees)
- defines the best splitting feature at each node
- from a randomly selected feature subspace
- best feature is selected based on goodness of fit, i.e. how well it can discriminate interacting and non-interacting pairs
- stops growing the tree when all pairs at a given node are well-separated
- Traverse the tree to classify an unknown protein pair
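The feature encoding and a single node split can be sketched as below. Several things are assumptions: Gini impurity stands in for the goodness-of-fit measure (the slide does not name one), the domain labels and training pairs are toy data, and only one split of one tree is shown, not a full forest.

```python
import random


def pair_features(domains_a, domains_b, domain_types):
    """Encode a protein pair: 2, 1 or 0 if a domain is in both, one, or neither."""
    return [int(d in domains_a) + int(d in domains_b) for d in domain_types]


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly separated)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(vectors, labels, n_candidates, rng):
    """Pick the best (impurity, feature, threshold) from a random feature subspace."""
    candidates = rng.sample(range(len(vectors[0])), n_candidates)
    best = None
    for feature in candidates:
        for threshold in (0, 1):  # split on value <= threshold vs value > threshold
            left = [y for x, y in zip(vectors, labels) if x[feature] <= threshold]
            right = [y for x, y in zip(vectors, labels) if x[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


domain_types = ["SH3", "PRM", "kinase", "zinc_finger"]
training = [  # (domains of protein 1, domains of protein 2, interacting?)
    ({"SH3"}, {"PRM"}, 1),
    ({"SH3", "kinase"}, {"PRM"}, 1),
    ({"kinase"}, {"zinc_finger"}, 0),
    ({"zinc_finger"}, {"zinc_finger"}, 0),
]
vectors = [pair_features(a, b, domain_types) for a, b, _ in training]
labels = [label for _, _, label in training]
split = best_split(vectors, labels, n_candidates=2, rng=random.Random(0))
print(vectors, split)
```

A forest repeats this at every node of many trees, each grown on a resampled training set, and classifies an unknown pair by majority vote over the trees.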
33DNA/Protein structure-function analysis and
prediction
- Protein-protein Interaction (PPI) and Docking
- Protein-protein Interaction
- Interfaces
- Solvation
- Energetics
- Conformational change
- Allostery
- Prediction
- Docking
- Search space
- Docking methods
- Examples
34Docking - ZDOCK
- Protein-protein docking
- 3-dimensional (3D) structure of protein complex
- starting from 3D structures of receptor and ligand
- Rigid-body docking algorithm (ZDOCK)
- pairwise shape complementarity function
- all possible binding modes
- using Fast Fourier Transform algorithm
- Refinement algorithm (RDOCK)
- top 2000 predicted structures
- three-stage energy minimization
- electrostatic and desolvation energies
- molecular mechanical software (CHARMM)
- statistical energy method (Atomic Contact Energy)
- 49 non-redundant unbound test cases
- near-native structure (< 2.5 Å) for 37% of test cases
- for 49% within top 4
35Protein-protein docking
- Finding correct surface match
- Systematic search
- 2 times 3D space!
- Define functions
- 1 on surface
- ρ or δ inside
- 0 outside
36Protein-protein docking
- Correlation function
- $c_{\alpha,\beta,\gamma} = \frac{1}{N^3} \sum_{o} \sum_{p} \sum_{q} \exp[2\pi i (o\alpha + p\beta + q\gamma)/N] \cdot C_{o,p,q}$
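The grid-based surface match can be illustrated on a toy example. The sketch below scores every translation of a ligand grid against a receptor grid twice: once by direct summation and once through discrete Fourier transforms, taking C as the product of the conjugated receptor transform and the ligand transform. The grid size, the interior values (-15 for the receptor, 1 for the ligand), and the plus-shaped toy molecules are assumptions. A naive transform is used to keep the example dependency-free; real programs use an FFT and also scan over rotations.

```python
import cmath
from itertools import product

N = 6                       # grid points per axis (toy size)
RHO, DELTA = -15.0, 1.0     # interior values for receptor / ligand (illustrative)
CELLS = list(product(range(N), repeat=3))


def plus_shape(center, interior_value):
    """Toy molecule: one interior cell surrounded by six surface cells (value 1)."""
    grid = {cell: 0.0 for cell in CELLS}
    cx, cy, cz = center
    grid[center] = interior_value
    for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
        grid[((cx + dx) % N, (cy + dy) % N, (cz + dz) % N)] = 1.0
    return grid


def direct_correlation(a, b):
    """c[shift] = sum over cells of a[cell] * b[cell + shift], with cyclic wrap-around."""
    return {
        s: sum(a[(l, m, n)] * b[((l + s[0]) % N, (m + s[1]) % N, (n + s[2]) % N)]
               for l, m, n in CELLS)
        for s in CELLS
    }


def dft3(grid, sign):
    """Naive 3D discrete Fourier transform; sign=-1 forward, sign=+1 inverse (unnormalised)."""
    return {
        (o, p, q): sum(grid[(l, m, n)] * cmath.exp(sign * 2j * cmath.pi * (o * l + p * m + q * n) / N)
                       for l, m, n in CELLS)
        for o, p, q in CELLS
    }


def fourier_correlation(a, b):
    """Same correlation via transforms: C = conj(A) * B, then inverse transform / N^3."""
    A, B = dft3(a, -1), dft3(b, -1)
    C = {k: A[k].conjugate() * B[k] for k in CELLS}
    inverse = dft3(C, +1)
    return {k: (inverse[k] / N ** 3).real for k in CELLS}


receptor = plus_shape((1, 1, 1), RHO)
ligand = plus_shape((4, 4, 4), DELTA)
direct = direct_correlation(receptor, ligand)
fourier = fourier_correlation(receptor, ligand)
best_shift = max(direct, key=direct.get)
max_error = max(abs(direct[k] - fourier[k]) for k in CELLS)
print(best_shift, direct[best_shift], max_error)
```

Shifts that bring surface cells into contact score positively, while shifts that push the ligand into the receptor interior are penalised by the large negative interior value. Both routes give the same scores up to rounding error.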
37Docking Programs
- ZDOCK, RDOCK
- AutoDock
- Bielefeld Protein Docking
- DOCK
- DOT
- FTDock, RPScore and MultiDock
- GRAMM
- Hex 3.0
- ICM Protein-Protein docking
- KORDO
- MolFit
- MPI Protein Docking
- Nussinov-Wolfson Structural Bioinformatics Group