Adaptive Probabilistic Approach:

About This Presentation


Slides: 59

Provided by: Mar5327


Transcript and Presenter's Notes


1
(No Transcript)
2
Adaptive Probabilistic Approach
Applications to Rapid and Robust NMR Structure
Determination
Hamid R. Eghbalnia, Department of Biochemistry and
Mathematics, University of Wisconsin-Madison
3
People who made it all happen
Marco Tonelli, Klaas Hallenga, Gabriel Cornilescu,
Liya Wang
Fariba Assadi-Porter, Claudia Cornilescu, Shanteri
Singh, Rob Tyler, Anna Füzery, Nick Reiter, CESG
Arash Bahrami
John Markley, Milo Westler, Eldon Ulrich, Jurgen
Doreleijers, Mark Anderson
4
brazzein - 53 a.a.
ubiquitin - 76 a.a.
flavodoxin - 176 a.a.
HNCO
HN(CO)CA
HNCA
CBCA(CO)NH
HN(CA)CB
98% of spin systems assigned with PINE
95% of spin systems assigned with PINE
96% of spin systems assigned with PINE
HNCACB
14 h
48 h
12 h
Total time to obtain complete backbone
information - assignment, 2° structure, other
corrections.
5

Central ideas in the adaptive probabilistic
approach
Implementation of these ideas as various tools
New tools and extensions to our approach
A preview of almost-published and unpublished

6
For rapid and robust NMR structure determination,
we need

To find a formulation for the problem that is
Consequential
Addresses the current and future challenges posed
Robust
Tractable
Measurable
merit for the solution can be stated.

Rapid and robust NMR structure determination: Automation
7
The (simplified) big picture
Predictions from sequence and available data:
chemical shifts, appropriate strategy
Construct design; protein production and
labeling; screening for suitability
Data deposition and publication
Data collection
Structure refinement and validation
Chemical shift assignments
Structure determination
Secondary structure and other constraint
determinations
8
The big picture (in reality)
Predictions from sequence and available data:
chemical shifts, appropriate strategy
Construct design; protein production and
labeling; screening for suitability
Data deposition and publication
Data collection
Structure refinement and validation
Chemical shift assignments
Structure determination
Secondary structure and other constraint
determinations
9
Decision trees
The basic paradigm for translating an expert's
approach into a computer algorithm is to use the
analogy of decision trees.
Decision variables
Decision options
What are the challenges?
10
A useful analogy to 20 questions
Choose a number between 1 and 1000000
> 500000?
< 500000?
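The yes/no game on this slide can be sketched as a binary search. This is a minimal illustration, not code from the talk; the function name `guess_number` and its parameters are hypothetical. Each question halves the remaining range, so 20 questions are enough for one million candidates because 2^20 = 1,048,576.

```python
import math

def guess_number(secret, low=1, high=1_000_000):
    """Find `secret` in [low, high] by asking 'is it greater than mid?'."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if secret > mid:   # answer "yes": keep the upper half
            low = mid + 1
        else:              # answer "no": keep the lower half
            high = mid
    return low, questions

# Upper bound on the number of questions for one million candidates.
max_questions = math.ceil(math.log2(1_000_000))
```

With truthful yes/no answers the decision tree is fixed in advance; the next slide removes exactly that guarantee.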
11
More challenging version of 20 questions!
Responses to the queries are not yes/no
answers, and they are not always the truth!
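One way to picture the harder game is to keep a probability for every candidate and update it with Bayes' rule after each unreliable answer. The sketch below is an assumption-laden toy, not the method used in the talk: the function `noisy_search`, the truth probability `p_truth`, and the stopping `threshold` are all hypothetical. It assumes at least two candidates.

```python
import random
from itertools import accumulate

def noisy_search(secret, n=1000, p_truth=0.8, threshold=0.99,
                 seed=0, max_questions=500):
    """Guess a number in 1..n when each answer is truthful with probability p_truth."""
    rng = random.Random(seed)
    posterior = [1.0 / n] * n                      # uniform prior over 1..n
    best = 0
    for asked in range(1, max_questions + 1):
        # Ask "is the number <= split?" with the split that divides the
        # current probability mass as evenly as possible.
        cum = list(accumulate(posterior))          # cum[i] = P(number <= i + 1)
        split = min(range(1, n), key=lambda s: abs(cum[s - 1] - 0.5))
        truth = secret <= split
        answer = truth if rng.random() < p_truth else not truth
        # Bayes update: candidates that agree with the answer are weighted
        # by p_truth, the others by 1 - p_truth.
        for i in range(n):
            agrees = ((i + 1) <= split) == answer
            posterior[i] *= p_truth if agrees else 1.0 - p_truth
        total = sum(posterior)
        posterior = [p / total for p in posterior]
        best = max(range(n), key=posterior.__getitem__)
        if posterior[best] >= threshold:           # confident enough to stop
            return best + 1, asked
    return best + 1, max_questions
```

With `p_truth=1.0` this reduces to ordinary bisection. With unreliable answers, more questions are needed and the result is a confidence level rather than a certainty.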
12
Local to global structures
The challenge: put together local data into
globally coherent information
13
What is local to global?
HN(CO)CA
Local information
These examples from NMR structure determination
are representative of a more general phenomenon in
biology.
14
Integrating data collection and analysis --
automation

Automating analysis in biology is less like
automating a factory or a sample changer
We do not assemble the same product over and
over. Interesting proteins are unique.
Automating analysis in biology is more like
creating a smart robot to deal with new
situations as they arise
We give the robot the flexibility to interpret
unknown situations and adapt as needed.
For typical fuzzy real-world situations, a
probabilistic approach provides flexibility and a
decision-based approach provides adaptability

15
Integrating data collection and analysis in NMR

The strategy may depend on
Size of the protein (e.g., relaxation)
Folds and fold topology (e.g., how much overlap)
hetero/homo-multi/mono(mer) (e.g., degeneracy)
Existence of homologs (e.g., a priori knowledge)
Required resolution (e.g., desired accuracy)
etc.
Successful strategies generate more value from
a given quantity of data.

16
The larger impact of generating more value from
data

The idea of generating more value from data is
emerging as a key problem in biological
investigations
Today, analyzing biological systems remains a
challenging, sometimes ad hoc, and human
knowledge-intensive endeavor
Most existing methods fail to scale when
presented with large systems-oriented data sets
Robust, reusable, and computationally feasible
approaches are needed that require little
subjective intervention but offer tools for
scientific interpretation
This is a tough target

17
Adaptive probabilistic approach: generating more
value from data

The adaptive probabilistic paradigm offers a
novel and promising approach to obtaining more
value from available data (database and
experimental). It has the potential of becoming
a key approach for addressing important
biological questions.
Data collection and analysis
Protein structure and refinement
Dynamics of molecules
Function, binding and interaction
Fingerprinting and profiling metabolites
RNA structure determination and refinement

18
Adaptive probabilistic approach
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
19
A rigorous Adaptive Probabilistic

Non-deterministic or randomized
No idea what nature has in store.
Try options without any preferences
Probabilistic
Have observed nature and collected data
Use statistics to guide my decisions
Use models on top of statistics
Adaptive
Adjust the cost of decisions based on the
known

20
Example: existing tools for NMR
21
Adaptive probabilistic approach
MLAAKEGAAVSNTPLKK
22
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
Data collection
23
Recording multidimensional experiments is time
consuming
Reduced dimensionality (RD) can be viewed as an
alternative sampling strategy that leads to
collecting less data. All RD experiments lose
information. The adaptive and probabilistic
approach taken by HIFI minimizes the information
loss - not simply convergence to n peaks!
(Figure: sampling along the time axes t1, t2, t3)
128 x 128 = 16,384 FIDs; 136.5 h
High-resolution Iterative Frequency
Identification (HIFI)
simultaneously evolving indirect frequencies are
extracted from 2D RD spectra
multiple tilted planes are used
angle of each tilted plane is chosen adaptively
in real time
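The frequency-extraction step can be illustrated with a small least-squares calculation. The slide does not give the projection formula. The sketch below assumes the convention commonly used in projection-based NMR, where a peak with indirect frequencies (w1, w2) appears in a plane tilted by angle theta at w1*cos(theta) + w2*sin(theta). The function names are hypothetical, and HIFI's adaptive angle selection and probabilistic peak identification are not modeled here.

```python
import math

def projected_frequency(w1, w2, angle_deg):
    """Peak position in a plane tilted by angle_deg (assumed projection rule)."""
    a = math.radians(angle_deg)
    return w1 * math.cos(a) + w2 * math.sin(a)

def recover_frequencies(observations):
    """Least-squares estimate of (w1, w2) from [(angle_deg, projected_freq), ...]."""
    scc = scs = sss = bc = bs = 0.0
    for angle_deg, freq in observations:
        c = math.cos(math.radians(angle_deg))
        s = math.sin(math.radians(angle_deg))
        scc += c * c
        scs += c * s
        sss += s * s
        bc += c * freq
        bs += s * freq
    # Solve the 2x2 normal equations [[scc, scs], [scs, sss]] [w1, w2] = [bc, bs].
    det = scc * sss - scs * scs
    if abs(det) < 1e-12:
        raise ValueError("need at least two distinct tilt angles")
    w1 = (bc * sss - bs * scs) / det
    w2 = (bs * scc - bc * scs) / det
    return w1, w2
```

Two distinct angles determine a single peak exactly. Further planes add redundancy, which is what helps when several peaks overlap in one projection.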
24
Reduced dimensionality techniques
RD planes
tilted planes of multidimensional spectra
25
HIFI on CBCA(CO)NH
Combined peaks from HIFI planes are in magenta
Hand picked peaks from 3D spectrum in green
26
Simplified description of the HIFI NMR approach
Eghbalnia et al. (2005) JACS 127:12528
27
Summary: HIFI

HIFI versions of nearly all backbone experiments
are available
HIFI is being developed for sidechain experiments
HIFI NOE data have been collected -- analysis is
proceeding
HIFI is now completely automated for backbone
data collection by six robust backbone
experiments
Automation of additional backbone experiments can
be implemented very easily

28
HIFI applications
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
Restraint generation: HIFI RDC
29
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
Automated assignment
30
PISTACHIO (Probabilistic Identification of Spin
Systems and their Assignments including
Coil-Helix Inference as Output)

Steps
Parse peak lists associated with particular
experiments into the set of all possible
tripeptide spin systems specified by the peptide
sequence
Compute probability scores for matching the
chemical shifts of the overlapping tripeptide
spin systems to residues in the sequence (this
makes use of our prior analysis of the BMRB
database of chemical shifts)
Assemble the overlapping tripeptides to match the
sequence and to achieve the maximum probability
for correct assignments (the approach used is
similar to ones used in problems of statistical
physics and combinatorial optimization)

Use existing data to predict an assignment
configuration.
Use prediction to postulate a configuration
distribution
Compare postulated local configuration to
globally minimal solutions
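The scoring and assembly steps above can be illustrated with a toy example. This is a sketch under stated assumptions and not the PISTACHIO algorithm: the 13Ca means and standard deviations in `CA_STATS` are rough placeholder values rather than the BMRB-derived priors, the function names are hypothetical, and brute-force enumeration stands in for the combinatorial optimization mentioned on the slide. Each spin system carries its own Ca shift and the Ca shift of the preceding residue, and there is one spin system per sequence position.

```python
import math
from itertools import permutations

# Placeholder 13Ca statistics (mean, sd in ppm); illustrative values only.
CA_STATS = {"A": (53.1, 2.0), "G": (45.3, 1.3), "S": (58.6, 2.1), "V": (62.5, 2.9)}

def log_gauss(x, mean, sd):
    """Log of a Gaussian density."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def assignment_score(sequence, order, spin_systems, link_sd=0.2):
    """Score for placing spin_systems[order[k]] at sequence position k."""
    score = 0.0
    for k, idx in enumerate(order):
        ca, ca_prev = spin_systems[idx]
        # How well the shift matches the residue type at this position.
        score += log_gauss(ca, *CA_STATS[sequence[k]])
        # How well the "preceding residue" shift matches the neighbour.
        if k > 0 and ca_prev is not None:
            score += log_gauss(ca_prev, spin_systems[order[k - 1]][0], link_sd)
    return score

def best_assignment(sequence, spin_systems):
    """Brute-force search; only feasible for very short sequences."""
    return max(permutations(range(len(spin_systems))),
               key=lambda order: assignment_score(sequence, order, spin_systems))

# Spin systems in arbitrary order: (own Ca, preceding Ca or None).
systems = [(58.9, 52.8), (45.1, None), (62.0, 58.9), (52.8, 45.1)]
order = best_assignment("GASV", systems)   # order[k] = spin system at position k
```

Enumeration grows factorially with sequence length, which is why a real implementation needs the kind of optimization methods the slide alludes to.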
31
http://bija.nmrfam.wisc.edu/PISTACHIO/
PISTACHIO is run by uploading files in XEASY or
NMR-STAR format. Data from up to 15 standard
double- and triple-resonance experiments can be
used as input. Other types of data can be
accommodated on request.
Aug 2005 to June 2006
32
PISTACHIO uses a new data format for
probabilistic assignments
An NMR-STAR data format for probabilistically
assigned protein NMR data has been developed in
collaboration with BMRB. The PISTACHIO server
outputs data in this format. Algorithms under
development carry the probabilistic assignments
forward and refine them as the structure
determination proceeds. BMRB accepts data
depositions in this format. A graphical interface
for PISTACHIO / LACS / PECAN results is nearing
completion and will be released soon.
33
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
Validation of 13C referencing; detection of
possible mis-assignments
34
Carbon chemical shifts irrespective of structure
can be represented by three Gaussian distributions
Data for all alanines in RefDB
Data for 13Cα as a function of δ13Cα − δ13Cβ
35
Linear Analysis of Chemical Shifts (LACS) plot
Data for all valine residues in RefDB
L. Wang et al. (2005) J. Biomol. NMR 32:13
36
LACS of a single protein can be used to identify
problems with referencing and possible assignment
outliers
This intercept should be at (0,0) for properly
referenced data
L. Wang et al. (2005) J. Biomol. NMR 32:13-22
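The intercept idea on this slide can be illustrated with an ordinary least-squares fit. This is a simplified sketch and not the published LACS procedure: it fits a single straight line to the Cα secondary shift plotted against the difference between Cα and Cβ secondary shifts, and reads the intercept as the referencing offset. The reasoning assumed here is that a common referencing error shifts Cα and Cβ equally, so it cancels in the difference on the x-axis but moves every point on the y-axis. Function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def referencing_offset(d_ca, d_cb):
    """Estimate a 13C referencing offset (ppm) from secondary shifts.

    d_ca, d_cb: per-residue secondary shifts of Ca and Cb in ppm.
    """
    xs = [a - b for a, b in zip(d_ca, d_cb)]   # offset cancels on this axis
    _, intercept = fit_line(xs, d_ca)          # offset shows up as intercept
    return intercept
```

A residue that lies far from the fitted line is a candidate for the assignment outliers mentioned on the slide. A robust fit would be preferable to plain least squares in that case.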
37
We have used LACS to re-reference the BMRB
database

11 ( 1.0 ppm )
26 ( 0.5 ppm )
46 ( 0.3 ppm )

L. Wang et al. (2005) J. Biomol. NMR 32:13-22
38
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
2° structure determination
39
PECAN (Protein Energetic Conformational Analysis
from NMR chemical shifts): analysis of secondary
structure from assigned chemical shifts and the
protein sequence; energy model for a particular
protein (bmr 4083)
Color key: helix, strand, non-helix / non-strand
Energy
Residue number
PECAN's average accuracy is better than 90% across
all structural regions, measured on the largest
data set to date.
Eghbalnia et al. (2005) J. Biomol. NMR 32:71-81
40
PECAN (Protein Energetic Conformational Analysis
from NMR chemical shifts): analysis of secondary
structure from assigned chemical shifts and the
protein sequence; example of output
helix
transition region
strand
Eghbalnia et al. (2005) J. Biomol. NMR 32:71-81
41
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
PINE
42
PINE
PINE combines information in order to refine
probabilities that reflect our state of knowledge
Use existing data to predict an assignment
configuration.
Use existing data to predict a secondary
structure configuration.
Use assignments to predict a chemical shift
configuration.
Use prediction to postulate a configuration
distribution
Use prediction to postulate a configuration
distribution
Use prediction to postulate a configuration
distribution
Compare postulated local configuration to
globally minimal solutions
Compare postulated local configuration to
globally minimal solutions
Compare postulated local configuration to
globally minimal solutions
43
Integration of probabilistic tools: current
version of PINE combines PISTACHIO, LACS, and
PECAN
Example: HIFI data for ubiquitin
In PINE, assignments from PISTACHIO are validated
by LACS, 2° structure is assigned by PECAN, and
inconsistent values are flagged and given lower
probability in the next round of PISTACHIO. The
process is repeated until consistency is achieved.
PISTACHIO alone
PINE
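The flag-and-repeat loop described on this slide can be sketched as follows. This is a toy illustration and not the PINE implementation: the function `refine_until_consistent`, the multiplicative `penalty`, and the `is_consistent` callback (standing in for the LACS and PECAN checks) are all hypothetical.

```python
def refine_until_consistent(probs, is_consistent, penalty=0.5, max_rounds=50):
    """Iteratively down-weight assignments that fail a consistency check.

    probs: {position: {candidate: probability}}
    is_consistent: callable (position, candidate) -> bool
    Returns the chosen candidates and the number of rounds used.
    """
    probs = {pos: dict(cands) for pos, cands in probs.items()}  # work on a copy
    picks = {}
    for round_no in range(1, max_rounds + 1):
        # Current best guess at every position.
        picks = {pos: max(cands, key=cands.get) for pos, cands in probs.items()}
        flagged = [pos for pos, cand in picks.items()
                   if not is_consistent(pos, cand)]
        if not flagged:
            return picks, round_no
        for pos in flagged:
            probs[pos][picks[pos]] *= penalty        # lower the probability
            total = sum(probs[pos].values())
            for cand in probs[pos]:
                probs[pos][cand] /= total            # renormalise
    return picks, max_rounds

start = {1: {"a": 0.6, "b": 0.4}, 2: {"c": 0.9, "d": 0.1}}
picks, rounds = refine_until_consistent(start, lambda pos, cand: cand in {"b", "c"})
```

In this toy run the first round flags candidate "a" at position 1, and the second round settles on "b" and "c". A flagged value is made less likely rather than discarded, so it can still win if every alternative turns out to be worse.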
44
brazzein - 53 a.a.
ubiquitin - 76 a.a.
flavodoxin - 176 a.a.
HNCO
HN(CO)CA
HNCA
CBCA(CO)NH
HN(CA)CB
98% of spin systems assigned with PINE
95% of spin systems assigned with PINE
96% of spin systems assigned with PINE
HNCACB
14 h
48 h
12 h
Total time to obtain complete backbone
information - assignment, 2° structure, other
corrections.
45
(No Transcript)
46
Implementation of our adaptive, probabilistic
strategy
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
Chemical shift prediction
47
Chemical shift prediction

Chemical shifts are the most easily obtained and
most precisely measured observables in
biomolecular NMR
Chemical shifts are highly sensitive to structure
Chemical shifts are coordinate free
Our prediction of chemical shifts simulates an
adaptive probabilistic walk on the space of
known chemical shifts (BMRB) using a simple
principle
Protein folding is a multi-rate stochastic
process that is rate-insensitive in each rate
domain
Results can be used to decide where to look for
unobserved peaks, to derive approximate values
for missing peaks, and to produce restraints for
structural refinement

48
Chemical shift prediction -- TONES
Time Evolution
Manuscript in preparation
49
Summary: adaptive probabilistic tools for NMR

HIFI-NMR, a probabilistic approach to data
collection, aims to extract multidimensional NMR
peak positions in an optimally efficient manner
PISTACHIO turns peak lists associated with a
protein sequence into probabilistic backbone and
side chain assignments
LACS provides the means for checking data sets
for possible referencing problems and
misassignments in advance of a structure
determination
PECAN offers a reliable probabilistic analysis of
protein secondary structure
PINE incorporates PISTACHIO, LACS and PECAN
Work in progress promises further insights into
connections between chemical shifts and structure
These algorithms and associated software are
being made available from the NMRFAM website
(www.nmrfam.wisc.edu)

50
Near-term: This year

HIFI-NMR
Disseminate HIFI for backbone experiments
algorithms. Add visualization.
Disseminate HIFI-RDC. Incorporate side-chain
experiments into HIFI package.
PINE
Make experimental PINE server available. Make
faster server visualizations available (in
collaboration with BMRB).

HIFI-NMR
Faster, better resolved
Larger proteins
Other applications
ALMOND
Probabilistic restraint model coupled to chemical
shifts
TONES
Disseminate, with additional application
PINE
More detailed information about secondary
structure

51
Adaptive probabilistic approach
Combine informatics, modeling, and experimental
data to achieve fast and robust analysis of
biological systems
Integrating data collection and analysis
52
Progress toward automated probabilistic structure
determination
BACKBONE
SIDE CHAINS
NOESY
STRUCTURE REFINEMENT
53
Almonds: A probabilistic relationship between
chemical shifts and conformation space

We have carefully refined the relationship of
sequence and torsion angles - specifically
triples.
To establish the probabilistic relationship, we
need a more precise understanding of the
relationship between chemical shifts and the
assembly of tripeptides. This is particularly
crucial in the difficult parts of the 2°
structure.
We have made a lot of progress in
deconstructing the relationship between
sequence, chemical shifts, and torsion angles.

54
What do we mean by random coil?
The region of (φ,ψ)-space sampled in the absence
of any dominant stabilizing interactions. The
experimental random coil state is the
energy-weighted distribution of the ensemble of
such conformational states. We can use the LACS
approach to remove bias in the reference state
introduced by stabilizing interactions ---
result: unbiased random coil chemical shift
(uRCCS) values
55
Derivation of unbiased random coil chemical shift
(uRCCS) values: LACS plot of adjusted RefDB
values for Val
L. Wang, H. R. Eghbalnia et al., manuscript in
press
56
Stepwise refinement of the model
57
Establishing a simple model to incorporate into
ALMONDS
We want to build a simple model where the
parameters are related to the observed effects. A
multivariate fitting of database values will not
be useful for our application.
Wang et al., J. Biomol. NMR, in press
58
Acknowledgments
NIH Grant 1K22 LM8992; NIH Grants U54 GM074901,
P50 GM64598; NIH Grant P41 RR02301
