Adaptive Probabilistic Approach

Slides: 59
Provided by: Mar5327
2
Adaptive Probabilistic Approach:
Applications to Rapid and Robust NMR Structure
Determination
Hamid R. Eghbalnia, Department of Biochemistry and
Mathematics, University of Wisconsin-Madison
3
People who made it all happen
Marco Tonelli, Klaas Hallenga, Gabriel Cornilescu,
Liya Wang,
Fariba Assadi-Porter, Claudia Cornilescu, Shanteri
Singh, Rob Tyler, Anna Füzery, Nick Reiter, CESG,
Arash Barhami,
John Markley, Milo Westler, Eldon Ulrich, Jurgen
Doreleijers, Mark Anderson
4
brazzein - 53 a.a.
ubiquitin - 76 a.a.
flavodoxin - 176 a.a.
HNCO
HN(CO)CA
HNCA
CBCA(CO)NH
HN(CA)CB
98% of spin systems assigned with PINE
95% of spin systems assigned with PINE
96% of spin systems assigned with PINE
HNCACB
14 h
48 h
12 h
Total time to obtain complete backbone
information - assignment, 2° structure, other
corrections.
5
  • Central ideas in the adaptive probabilistic
    approach
  • Implementation of these ideas as various tools
  • New tools and extensions to our approach
  • A preview of almost-published and unpublished

6
For rapid and robust NMR structure determination,
we need
  • To find a formulation for the problem that is
  • Consequential
  • Addresses the current and future challenges posed
  • Robust
  • Tractable
  • Measurable
  • merit for the solution can be stated.

Rapid and robust NMR structure determination:
Automation
7
The (simplified) big picture
Predictions from sequence and available data:
chemical shifts, appropriate strategy
Construct design; protein production and
labeling; screening for suitability
Data deposition and publication
Data collection
Structure refinement and validation
Chemical shift assignments
Structure determination
Secondary structure and other constraint
determinations
8
The big picture (in reality)
Predictions from sequence and available data:
chemical shifts, appropriate strategy
Construct design; protein production and
labeling; screening for suitability
Data deposition and publication
Data collection
Structure refinement and validation
Chemical shift assignments
Structure determination
Secondary structure and other constraint
determinations
9
Decision trees
The basic paradigm for translating an expert's
approach into a computer algorithm is to use the
analogy of decision trees.
Decision variables
Decision options
What are the challenges?
10
A useful analogy to 20 questions
Choose a number between 1 and 1000000
> 500000?
< 500000?
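The analogy can be made concrete with a short sketch. The function name and parameters below are illustrative and do not come from the slides. Because 2^20 = 1,048,576 exceeds 1,000,000, twenty truthful "greater than?" questions are always enough, and the first question asked is the "> 500000?" shown above.

```python
def guess_number(secret, low=1, high=1_000_000):
    """Find `secret` in [low, high] using only 'is it > x?' questions."""
    questions = 0
    while low < high:
        mid = (low + high) // 2   # first question: "> 500000?"
        questions += 1
        if secret > mid:          # truthful answer: yes
            low = mid + 1
        else:                     # truthful answer: no, so secret <= mid
            high = mid
    return low, questions
```

Each question halves the remaining range, which is the decision-tree view of the problem: one decision variable (the threshold) and two decision options per node.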
11
More challenging version of 20 questions!
Responses to the queries are not yes/no
answers, and they are not always the truth!
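One way to picture this harder game is a minimal sketch in which each yes/no answer is truthful only with probability `p_truth`. The function name, the noise model, and the 0.99 stopping threshold are assumptions made for illustration; the slides do not specify them, and real data are messier than a simple lying oracle. The sketch keeps a probability for every candidate, asks the question that splits that probability most evenly (the adaptive part), and reweights candidates after each answer (the probabilistic part).

```python
import random


def noisy_twenty_questions(secret, n=1000, p_truth=0.8,
                           max_questions=200, seed=0):
    """Locate `secret` in 1..n when each answer is true with prob. p_truth."""
    rng = random.Random(seed)
    belief = [1.0 / n] * n          # belief[i] = P(secret == i + 1)
    best = 0
    for asked in range(1, max_questions + 1):
        # Adaptive step: choose the threshold t in 1..n-1 that splits the
        # current belief most evenly, then ask "is the number > t?".
        cum, t, best_gap = 0.0, 1, float("inf")
        for cand in range(1, n):
            cum += belief[cand - 1]
            gap = abs(cum - 0.5)
            if gap < best_gap:
                t, best_gap = cand, gap
        truth = secret > t
        answer = truth if rng.random() < p_truth else not truth
        # Probabilistic step: Bayes update. Candidates that agree with the
        # answer are weighted by p_truth, the others by 1 - p_truth.
        for i in range(n):
            agrees = ((i + 1) > t) == answer
            belief[i] *= p_truth if agrees else 1.0 - p_truth
        total = sum(belief)
        belief = [b / total for b in belief]
        best = max(range(n), key=belief.__getitem__)
        if belief[best] > 0.99:     # confident enough to stop
            return best + 1, asked
    return best + 1, max_questions
```

A single wrong answer no longer eliminates the true number; it only lowers its probability, and later answers can restore it. With `p_truth=1.0` the sketch reduces to ordinary bisection.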
12
Local to global structures
The challenge: put together local data into
globally coherent information
13
What is local to global?
HN(CO)CA
Local information
These examples from NMR structure determination
are representative of a more general phenomenon in
biology.
14
Integrating data collection and analysis --
automation
  • Automating analysis in biology is less like
    automating a factory or a sample changer
  • We do not assemble the same product over and
    over. Interesting proteins are unique.
  • Automating analysis in biology is more like
    creating a smart robot to deal with new
    situations as they arise
  • We give the robot the flexibility to interpret
    unknown situations and adapt as needed.
  • For typical fuzzy real-world situations, a
    probabilistic approach provides flexibility and a
    decision-based approach provides adaptability

15
Integrating data collection and analysis in NMR
  • The strategy may depend on
  • Size of the protein (e.g., relaxation)
  • Folds and fold topology (e.g., how much overlap)
  • hetero/homo-multi/mono(mer) (e.g., degeneracy)
  • Existence of homologs (e.g., a priori knowledge)
  • Required resolution (e.g., desired accuracy)
  • etc.
  • Successful strategies generate more value from
    a given quantity of data.

16
The larger impact of generating more value from
data
  • The idea of generating more value from data is
    emerging as a key problem in biological
    investigations
  • Today, analyzing biological systems remains a
    challenging, sometimes ad hoc, and human
    knowledge-intensive endeavor
  • Most existing methods fail to scale when
    presented with large systems-oriented data sets
  • Robust, reusable, and computationally feasible
    approaches are needed that require little
    subjective intervention but offer tools for
    scientific interpretation
  • This is a tough target

17
Adaptive probabilistic approach: generating more
value from data
  • The adaptive probabilistic paradigm offers a
    novel and promising approach to obtaining more
    value from available data (database and
    experimental). It has the potential of becoming
    a key approach for addressing important
    biological questions.
  • Data collection and analysis
  • Protein structure and refinement
  • Dynamics of molecules
  • Function, binding and interaction
  • Fingerprinting and profiling metabolites
  • RNA structure determination and refinement

18
Adaptive probabilistic approach
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
19
A rigorous Adaptive Probabilistic
  • Non-deterministic or randomized
  • No idea what nature has in store.
  • Try options without any preferences
  • Probabilistic
  • Have observed nature and collected data
  • Use statistics to guide my decisions
  • Use models on top of statistics
  • Adaptive
  • Adjust the cost of decisions based on what is
    known

20
Example: existing tools for NMR
21
Adaptive probabilistic approach
MLAAKEGAAVSNTPLKK
22
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
Data collection
23
Recording multidimensional experiments is time
consuming
Reduced dimensionality (RD) can be viewed as an
alternative sampling strategy that leads to
collecting less data. All RD experiments lose
information. The adaptive and probabilistic
approach taken by HIFI minimizes the information
loss - not simply convergence to n peaks!
t1
t2
t3
128 x 128 = 16,384 FIDs; 136.5 h
High-resolution Iterative Frequency
Identification (HIFI)
simultaneously evolving indirect frequencies are
extracted from 2D RD spectra
multiple tilted planes are used
angle of each tilted plane is chosen adaptively
in real time
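A rough sketch of what adaptive angle selection could look like follows. It relies on the standard projection relation for tilted planes, in which a peak with indirect frequencies (w1, w2) appears at w1 cos(theta) + w2 sin(theta). The selection rule used here (pick the candidate angle that keeps the closest pair of projected peaks furthest apart) is a simplified stand-in chosen for illustration, not the criterion HIFI actually uses; all function names and the candidate grid are assumptions.

```python
import math


def project(peak, theta_deg):
    """Position of a (w1, w2) peak on a plane tilted by theta_deg degrees."""
    w1, w2 = peak
    theta = math.radians(theta_deg)
    return w1 * math.cos(theta) + w2 * math.sin(theta)


def min_separation(peaks, theta_deg):
    """Smallest gap between neighbouring projected peaks (needs >= 2 peaks)."""
    positions = sorted(project(p, theta_deg) for p in peaks)
    return min(b - a for a, b in zip(positions, positions[1:]))


def choose_next_angle(peaks, candidates=range(5, 90, 5)):
    """Pick the tilt angle whose projection best separates the current peaks."""
    return max(candidates, key=lambda theta: min_separation(peaks, theta))
```

In use, `peaks` would hold the current best estimates of the peak positions, and the chosen angle would determine the next plane to record. Two peaks that differ only in w1 are best separated at a small angle, and two that differ only in w2 at a large one, which is why a fixed set of angles is generally not optimal for every protein.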
24
Reduced dimensionality techniques
RD planes
tilted planes of multidimensional spectra
25
HIFI on CBCA(CO)NH
Combined peaks from HIFI planes are in magenta
Hand picked peaks from 3D spectrum in green
26
Simplified description of the HIFI NMR approach
Eghbalnia et al. (2005) JACS 127:12528
27
Summary: HIFI
  • HIFI versions of nearly all backbone experiments
    are available
  • HIFI is being developed for sidechain experiments
  • HIFI NOE data have been collected -- analysis is
    proceeding
  • HIFI is now completely automated for backbone
    data collection by six robust backbone
    experiments
  • Automation of additional backbone experiments can
    be implemented very easily

28
HIFI applications
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
Restraint generation: HIFI-RDC
29
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
Automated assignment
30
PISTACHIO (Probabilistic Identification of Spin
Systems and their Assignments including
Coil-Helix Inference as Output)
  • Steps
  • Parse peak lists associated with particular
    experiments into the set of all possible
    tripeptide spin systems specified by the peptide
    sequence
  • Compute probability scores for matching the
    chemical shifts of the overlapping tripeptide
    spin systems to residues in the sequence (this
    makes use of our prior analysis of the BMRB
    database of chemical shifts)
  • Assemble the overlapping tripeptides to match the
    sequence and to achieve the maximum probability
    for correct assignments (the approach used is
    similar to ones used in problems of statistical
    physics and combinatorial optimization)

Use existing data to predict an assignment
configuration.
Use prediction to postulate a configuration
distribution
Compare postulated local configuration to
globally minimal solutions
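The scoring and assembly steps can be illustrated with a toy sketch. It is not PISTACHIO: it scores single residues rather than overlapping tripeptides, it uses brute-force enumeration in place of the combinatorial optimization mentioned above, and the (mean, standard deviation) values are approximate numbers included only for illustration, not the BMRB-derived statistics the server uses. All names are hypothetical.

```python
import math
from itertools import permutations

# Approximate, illustrative (mean, sd) chemical shift values in ppm.
SHIFT_STATS = {
    "Ala": {"CA": (53.1, 2.0), "CB": (19.0, 1.9)},
    "Ser": {"CA": (58.7, 2.1), "CB": (63.8, 1.5)},
    "Thr": {"CA": (62.2, 2.6), "CB": (69.7, 1.7)},
}


def log_likelihood(shifts, residue):
    """Gaussian log-likelihood of observed shifts for one residue type."""
    total = 0.0
    for nucleus, value in shifts.items():
        mean, sd = SHIFT_STATS[residue][nucleus]
        z = (value - mean) / sd
        total += -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))
    return total


def residue_probabilities(shifts):
    """Normalize the likelihoods into a probability for each residue type."""
    logs = {res: log_likelihood(shifts, res) for res in SHIFT_STATS}
    top = max(logs.values())            # subtract the max for stability
    weights = {res: math.exp(v - top) for res, v in logs.items()}
    norm = sum(weights.values())
    return {res: w / norm for res, w in weights.items()}


def best_assignment(sequence, spin_systems):
    """Return `order` such that spin_systems[order[i]] goes to sequence[i].

    Assumes one spin system per residue; exhaustive search is only
    feasible for very short sequences.
    """
    def score(order):
        return sum(log_likelihood(spin_systems[j], sequence[i])
                   for i, j in enumerate(order))
    return max(permutations(range(len(spin_systems))), key=score)
```

For example, a spin system with CA near 52.8 ppm and CB near 19.3 ppm receives nearly all of its probability from Ala, and `best_assignment(["Ala", "Ser", "Thr"], systems)` returns the ordering of `systems` with the highest total score.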
31
http://bija.nmrfam.wisc.edu/PISTACHIO/
PISTACHIO is run by uploading files in XEASY or
NMR-STAR format. Data from up to 15 standard
double- and triple-resonance experiments can be
used as input. Other types of data can be
accommodated on request.
Aug 2005 to June 2006
32
PISTACHIO uses a new data format for
probabilistic assignments
An NMR-STAR data format for probabilistically
assigned protein NMR data has been developed in
collaboration with BMRB. The PISTACHIO server
outputs data in this format. Algorithms under
development carry the probabilistic assignments
forward and refine them as the structure
determination proceeds. BMRB accepts data
depositions in this format. A graphical interface
for PISTACHIO / LACS / PECAN results is nearing
completion and will be released soon.
33
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
Validation of 13C referencing; detection of
possible misassignments
34
Carbon chemical shifts irrespective of structure
can be represented by three Gaussian distributions
Data for all alanines in RefDB
Data for 13Cα as a function of δ13Cα - δ13Cβ
35
Linear Analysis of Chemical Shifts (LACS) plot
Data for all valine residues in RefDB
L. Wang et al. (2005) J. Biomol. NMR, 32:13
36
LACS of a single protein can be used to identify
problems with referencing and possible assignment
outliers
This intercept should be at (0,0) for properly
referenced data
L. Wang et al. (2005) J. Biomol. NMR, 32:13-22
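The intercept idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published LACS procedure: it fits a single least-squares line to the CA secondary shifts plotted against the CA minus CB secondary-shift difference, and reads the intercept as the referencing offset. A constant 13C referencing error shifts CA and CB equally, so the x values are unchanged while every y value moves by the error. The function names and the use of one line rather than a more elaborate fit are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def referencing_offset(delta_ca, delta_cb):
    """Estimate a 13C referencing offset (ppm) from secondary shifts.

    delta_ca, delta_cb: observed minus random-coil shifts per residue.
    Properly referenced data should give an intercept near zero.
    """
    xs = [ca - cb for ca, cb in zip(delta_ca, delta_cb)]
    _, intercept = fit_line(xs, delta_ca)
    return intercept
```

Residues that fall far from the fitted line are the natural candidates to flag as possible assignment outliers.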
37
We have used LACS to re-reference the BMRB
database
  • 11% (1.0 ppm)
  • 26% (0.5 ppm)
  • 46% (0.3 ppm)

L. Wang et al. (2005) J. Biomol. NMR, 32:13-22
38
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
2° structure determination
39
PECAN (Protein Energetic Conformational Analysis
from NMR chemical shifts): analysis of secondary
structure from assigned chemical shifts and the
protein sequence. Energy model for a particular
protein (bmr 4083)
Color key: helix, strand, non-helix / non-strand
Axes: energy vs. residue number
PECAN's average accuracy is better than 90% across
all structural regions, measured on the largest
data set to date.
Eghbalnia et al. (2005) J. Biomol. NMR 32:71-81
40
PECAN (Protein Energetic Conformational Analysis
from NMR chemical shifts): analysis of secondary
structure from assigned chemical shifts and the
protein sequence. Example of output
helix
transition region
strand
Eghbalnia et al. (2005) J. Biomol. NMR 32:71-81
41
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
PINE
42
PINE
PINE combines information in order to refine
probabilities that reflect our state of knowledge
  • Use existing data to predict an assignment
    configuration.
  • Use existing data to predict a secondary
    structure configuration.
  • Use assignments to predict a chemical shift
    configuration.
In each of the three cases
  • Use prediction to postulate a configuration
    distribution
  • Compare postulated local configuration to
    globally minimal solutions
43
Integration of probabilistic tools: current
version of PINE combines PISTACHIO, LACS, and
PECAN
Example: HIFI data for ubiquitin
In PINE, assignments from PISTACHIO are validated
by LACS, 2° structure is assigned by PECAN, and
inconsistent values are flagged and given lower
probability in the next round of PISTACHIO. The
process is repeated until consistency is achieved.
PISTACHIO alone
PINE
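The repeat-until-consistent loop described on this slide can be sketched generically. Everything here is hypothetical: `assign` stands in for an assignment step, `validate` for the consistency checks, and the halving `penalty` is an arbitrary choice; none of these names or values come from PINE itself.

```python
def refine_until_consistent(weights, assign, validate,
                            penalty=0.5, max_rounds=20):
    """Alternate assignment and validation, down-weighting flagged choices.

    weights:  dict mapping (spin_system, residue) -> probability-like weight
    assign:   callable(weights) -> {spin_system: residue}
    validate: callable(assignment) -> list of flagged (spin_system, residue)
    """
    weights = dict(weights)             # work on a copy
    assignment = {}
    for round_no in range(1, max_rounds + 1):
        assignment = assign(weights)
        flagged = validate(assignment)
        if not flagged:                 # consistent: stop iterating
            return assignment, round_no
        for key in flagged:             # lower probability for next round
            weights[key] *= penalty
    return assignment, max_rounds


def pick_most_probable(weights):
    """Toy assignment step: take the highest-weight residue per spin system."""
    best = {}
    for (system, residue), w in weights.items():
        if system not in best or w > weights[(system, best[system])]:
            best[system] = residue
    return best


# Toy usage: the check rejects assigning spin system s1 to residue A5.
toy_weights = {("s1", "A5"): 0.6, ("s1", "A9"): 0.4}
rejected = {("s1", "A5")}
toy_validate = lambda a: [(s, r) for s, r in a.items() if (s, r) in rejected]
result = refine_until_consistent(toy_weights, pick_most_probable, toy_validate)
```

In the toy run the first round picks A5, the check flags it, its weight drops from 0.6 to 0.3, and the second round settles on A9.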
44
brazzein - 53 a.a.
ubiquitin - 76 a.a.
flavodoxin - 176 a.a.
HNCO
HN(CO)CA
HNCA
CBCA(CO)NH
HN(CA)CB
98% of spin systems assigned with PINE
95% of spin systems assigned with PINE
96% of spin systems assigned with PINE
HNCACB
14 h
48 h
12 h
Total time to obtain complete backbone
information - assignment, 2° structure, other
corrections.
45
(No Transcript)
46
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
Chemical shift prediction
47
Chemical shift prediction
  • Chemical shifts are the most easily obtained and
    most precisely measured observables in
    biomolecular NMR
  • Chemical shifts are highly sensitive to structure
  • Chemical shifts are coordinate free
  • Our prediction of chemical shifts simulates an
    adaptive probabilistic walk on the space of
    known chemical shifts (BMRB) using a simple
    principle
  • Protein folding is a multi-rate stochastic
    process that is rate-insensitive in each rate
    domain
  • Results can be used to decide where to look for
    unobserved peaks, to derive approximate values
    for missing peaks, and to produce restraints for
    structural refinement

48
Chemical shift prediction -- TONES
Time Evolution
Manuscript in preparation
49
Summary: adaptive probabilistic tools for NMR
  • HIFI-NMR, a probabilistic approach to data
    collection, aims to extract multidimensional NMR
    peak positions in an optimally efficient manner
  • PISTACHIO turns peak lists associated with a
    protein sequence into probabilistic backbone and
    side chain assignments
  • LACS provides the means for checking data sets
    for possible referencing problems and
    misassignments in advance of a structure
    determination
  • PECAN offers a reliable probabilistic analysis of
    protein secondary structure
  • PINE incorporates PISTACHIO, LACS and PECAN
  • Work in progress promises further insights into
    connections between chemical shifts and structure
  • These algorithms and associated software are
    being made available from the NMRFAM website
    (www.nmrfam.wisc.edu)

50
Near-term: this year
  • HIFI-NMR
  • Disseminate HIFI algorithms for backbone
    experiments. Add visualization.
  • Disseminate HIFI-RDC. Incorporate side-chain
    experiments into HIFI package.
  • PINE
  • Make experimental PINE server available. Make
    faster server visualizations available (in
    collaboration with BMRB)
  • HIFI-NMR
  • Faster, better resolved
  • Larger proteins
  • Other applications
  • ALMOND
  • Probabilistic restraint model coupled to chemical
    shifts
  • TONES
  • Disseminate, with additional application
  • PINE
  • More detailed information about secondary
    structure

51
Adaptive probabilistic approach
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
52
Progress toward automated probabilistic structure
determination
BACKBONE
SIDE CHAINS
NOESY
STRUCTURE REFINEMENT
53
Almonds: a probabilistic relationship between
chemical shifts and conformation space
  • We have carefully refined the relationship of
    sequence and torsion angles - specifically
    triples.
  • To establish the probabilistic relationship, we
    need a more precise understanding of the
    relationship between chemical shifts and the
    assembly of tripeptides. This is particularly
    crucial in the difficult parts of the 2°
    structure.
  • We have made a lot of progress in
    deconstructing the relationship between
    sequence, chemical shifts, and torsion angles.

54
What do we mean by random coil?
The region of (φ,ψ)-space sampled in the absence
of any dominant stabilizing interactions. The
experimental random coil state is the
energy-weighted distribution of the ensemble of
such conformational states. We can use the LACS
approach to remove bias in the reference state
introduced by stabilizing interactions. Result:
unbiased random coil chemical shift (uRCCS) values
55
Derivation of unbiased random coil chemical shift
(uRCCS) values: LACS plot of adjusted RefDB
values for Val
L. Wang, H. R. Eghbalnia et al., manuscript in
press
56
Stepwise refinement of the model
57
Establishing a simple model to incorporate into
ALMONDS
We want to build a simple model where the
parameters are related to the observed effects. A
multivariate fitting of database values will not
be useful for our application.
Wang et al., J. Biomol. NMR, in press
58
Acknowledgments
NIH Grants 1K22 LM8992
NIH Grants U54 GM074901, P50 GM64598
NIH Grant P41 RR02301