Title: Artificial Intelligence Research Laboratory
1MSCBB 2007
Artificial Intelligence Research
Laboratory Bioinformatics and Computational
Biology Program Computational Intelligence,
Learning, and Discovery Program Department of
Computer Science
Prediction of RNA-Protein interfaces Using
Structural Features
Fadi Towfic, David C. Gemperline, Cornelia
Caragea, Feihong Wu, Drena Dobbs, and Vasant
Honavar
Abstract RNA-protein interactions play a critical
role in gene expression From splicing to
translation, proteins must be able to recognize
and interact with specific sites of RNA in order
to perform their respective functions. In this
paper, 147 different chains from RNA-binding
proteins in the Protein Databank were
characterized according to multiple structural
features and the type of RNA bound to each
protein chain. Furthermore, Naive Bayes
classifiers were constructed to predict
protein-RNA interfaces on the surface residues of
the proteins. The three structural features used
in this study were surface roughness, solid angle
and CX value.
A possible reason for the aforementioned
discrepancy is that the preliminary clustering
using ANOVA may have not been sophisticated
enough to identify subclusters that lie within
each group. The poor clustering may have
contributed to the poor classification
performance by Naïve Bayes. However, it is
appropriate to note that each of the structural
features had at least one cluster where
classification performance was increased compared
to the No clustering baseline. This result
demonstrates the potential of using more
sophisticated clustering as well as
classification algorithms to improve the
performance of RNA-protein interface prediction
algorithms.
Dataset and Classification The protein chains in
the RB147 dataset available from the RNAbindr
website (http//bindr.gdcb.iastate.edu/) were
classified according to the type of RNA bound by
each chain. Each type of RNA was then clustered
using ANOVA as described by Towfic et al. (Towfic
et al., 2007) as shown in Table 1. A Naïve-Bayes
classification algorithm with 10-fold
cross-validation with a window size of 12 (Witten
and Frank, 2005) was then used to classify each
of the groups shown in table 1.
Table 1 Clustering of each RNA-binding type
based on ANOVA analysis of the propensities for
each chain.
Results As shown in table 2, the clustering of
the RNA types seems to improve the prediction
accuracy, correlation, sensitivity and
specificity in some cases (alpha carbon group2,
roughness value group 1, solid angle value group
1) while contributing to poor performance in
others (alpha carbon group 3, roughness value
group 2, solid angle value group 2) compared to
the classifiers that do not use clustering.
Table 2 Comparison of the performance of the
Naïve Bayes classifier with and without
clustering.
References F. Towfic, D. C. Gemperline, C.
Caragea, F. Wu, D. Dobbs, and V. Honavar .
Structural Characterization of RNA-Binding Sites
of Proteins Preliminary Results. Computational
Structural Bioinformatics Workshop proceedings,
2007. In Press. I. H. Witten and E. Frank . Data
Mining Practical Machine Learning Tools and
Techniques, 2nd Edition, Morgan Kaufmann, 2005
Acknowledgements This research was supported in
part by a grant from the National Institutes of
Health (GM066387) to Vasant Honavar and Drena
Dobbs, an Integrative Graduate Education and
Research Training (IGERT) fellowship to Fadi
Towfic, funded by the National Science Foundation
grant (DGE 0504304) to Iowa State University, and
a Bioengineering and Bioinformatics Summer
Institute (BBSI) fellowship to David Gemperline,
funded by a National Science Foundation award
(EEC 0608769) to Iowa State University. This work
has benefited from discussions with Dr. Robert
Jernigan of Iowa State University.