Computational Analysis of Protein-DNA Interactions - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Computational Analysis of Protein-DNA Interactions

1
Computational Analysis of Protein-DNA Interactions

Changhui (Charles) Yan
Department of Computer Science
Utah State University

2
Problem I

Identifying amino acid residues involved in
protein-DNA interactions from sequence

3
Materials And Methods

56 double-stranded DNA binding proteins
previously used in the study of Jones et al.
(2003)
Encoding

4
Materials And Methods
5
Naïve Bayes Classifier

Leave-one-out cross-validation

Naïve Bayes
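
The slides name the classifier and the validation scheme but do not spell out the encoding, so the following is only a minimal sketch under stated assumptions. Each residue is described by the identities of the residues in a short sequence window around it (the window size `WINDOW` is an illustrative choice), probabilities use add-one smoothing, and one whole protein is held out per round. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

WINDOW = 2   # residues on each side of the target (assumed, not from the slides)
PAD = "-"    # padding symbol beyond the sequence ends


def windows(seq, labels):
    """Yield (window, label) for every residue of one protein."""
    padded = PAD * WINDOW + seq + PAD * WINDOW
    for i, label in enumerate(labels):
        yield padded[i:i + 2 * WINDOW + 1], label


def train(proteins):
    """Count classes and (class, window position, residue) occurrences."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(int)
    for seq, labels in proteins:
        for win, label in windows(seq, labels):
            class_counts[label] += 1
            for pos, aa in enumerate(win):
                feature_counts[(label, pos, aa)] += 1
    return class_counts, feature_counts


def predict(model, win, alphabet_size=21):
    """Return the class with the highest naive Bayes log-posterior."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for pos, aa in enumerate(win):
            count = feature_counts.get((label, pos, aa), 0)
            score += math.log((count + 1) / (n + alphabet_size))  # add-one smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out(proteins):
    """Hold out each protein in turn, train on the rest, predict its residues."""
    predictions = []
    for i, (seq, labels) in enumerate(proteins):
        model = train(proteins[:i] + proteins[i + 1:])
        predictions.append([predict(model, win) for win, _ in windows(seq, labels)])
    return predictions


# Toy data: (sequence, per-residue labels), 1 = DNA-binding, 0 = non-binding.
toy = [("KRAKL", [1, 1, 0, 1, 0]),
       ("GKRSA", [0, 1, 1, 0, 0]),
       ("ALKRG", [0, 0, 1, 1, 0])]
for (seq, _), preds in zip(toy, leave_one_out(toy)):
    print(seq, preds)
```

Holding out whole proteins, rather than individual residues, keeps windows from the same chain out of both the training and the test set.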
7
Leave-One-Out Cross-Validations
                          Sequence-based         Sequence/structure-based
                          ID     ID + entropy    ID + rASA    ID + rASA + entropy
Correlation coefficient   0.25   0.29            0.28         0.30
Accuracy (%)              77     75              76           77
Specificity (%)           37     37              36           39
Sensitivity (%)           43     53              51           52
ID: amino acid identities
8
Predictions in The Context of 3-D Structures
Figure panels: Actual, Predicted

Pit-1, PDB 1au7
TP 30
FP 16
TN 86
FN 14
CC 0.51 (2nd)
Accuracy 79%

9
Predictions in The Context of 3-D Structures
Figure panels: Predicted, Actual

λ-Cro, PDB 6cro
TP 10
FP 5
TN 34
FN 10
CC 0.37 (19th)
Accuracy 73%

10
Predictions Compared With PROSITE Motifs

Predicted binding sites substantially overlap with 34 of the 37 DNA-binding PROSITE motifs.
In 52 of the 56 proteins, the predictor identifies at least 20% of the DNA-binding residues.
28 of the 56 proteins contain no PROSITE motifs that are annotated as DNA-binding.

11
Comparison With Previous Study
Method                    Naïve Bayes classifier   Ahmad and Sarai method
Correlation coefficient   0.26                     0.23
Accuracy (%)              80                       66
Specificity (%)           29                       21
Sensitivity (%)           48                       68

Ahmad, S. and Sarai, A. (2005) PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics, 6, 33.
12
Summary

A simple sequence-based Naïve Bayes classifier predicts interface residues in DNA-binding proteins with 75% accuracy, 37% specificity, 53% sensitivity, and a correlation coefficient of 0.29.
Predicted binding sites correctly indicate the locations of actual binding sites and substantially overlap with known PROSITE motifs.

13
Problem II

Identification of Helix-Turn-Helix (HTH)
DNA-binding motifs

14
HTH Motifs

Sequences sharing low similarities can fold into
a similar HTH structure
Identifying HTH motifs from sequence is extremely
challenging

15
Trick 1

Including more information
Amino acid sequence
Secondary structure

16
Hidden Markov Model (HMM)
LQQITHIANQL-GLE----KDVVRVWF
17
Hidden Markov Model (HMM_AA_SS)
LQQITHIANQL-GLE----KDVVRVWF
HHHEEHEEEHMHE----HHEEMMEH
18
Trick 2

There are similarities among the 20 naturally occurring amino acids
Reduced alphabets

19
Reduced Alphabets

Schemes for reducing amino acid alphabet based on
the BLOSUM50 matrix by Henikoff and Henikoff
(1992) derived by grouping and averaging the
similarity matrix elements as described in the
text. (Murphy et al. 2000)
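
Encoding a sequence in a reduced alphabet amounts to replacing each amino acid with a representative of its group. The sketch below illustrates this with a 10-letter grouping as it is commonly quoted from Murphy et al. (2000); the exact grouping should be checked against that paper, and the names `MURPHY_10_GROUPS`, `build_mapping`, and `reduce_sequence` are hypothetical. Gap characters are passed through unchanged.

```python
# 10-letter grouping as commonly quoted from Murphy et al. (2000);
# treat it as an assumption and verify against the original paper.
MURPHY_10_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_mapping(groups):
    """Map every amino acid to the first letter of its group."""
    return {aa: group[0] for group in groups for aa in group}


def reduce_sequence(seq, mapping):
    """Re-encode a sequence; symbols outside the mapping (e.g. gaps) are kept."""
    return "".join(mapping.get(ch, ch) for ch in seq)


MURPHY_10 = build_mapping(MURPHY_10_GROUPS)

# Aligned sequence shown on the HMM slide.
print(reduce_sequence("LQQITHIANQL-GLE----KDVVRVWF", MURPHY_10))
# prints LEELSHLAEEL-GLE----KELLKLFF
```

With fewer symbols, each emission distribution in a model has fewer parameters to estimate, at the cost of no longer distinguishing residues within a group.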

20
Cross-Families Evaluations
Method                       True positives [1]   False positives [2]
HMM_AA                       3                    0
HMM_AA_SS (20 letters) [3]   227                  0
HMM_AA_SS (Murphy_15) [3]    474                  0
HMM_AA_SS (Murphy_10) [3]    470                  3
HMM_AA_SS (Murphy_8) [3]     431                  5

[1] True positive: HTH motifs that are correctly identified as such.
[2] False positive: non-HTH motifs that are identified as HTH motifs.
[3] The alphabet used to encode amino acid sequences.

21
Questions
22
Within-family Three-Fold Cross-Validations
Family (number of HTH motifs in the family)   HMM_AA   HMM_AA_SS (Murphy_15)
PF00126 (1635)                                1594     1622
PF00165 (90)                                  63       80
PF00196 (30)                                  26       30
PF04545 (164)                                 137      164
PF01022 (42)                                  39       39
PF00046 (189)                                 176      188
PF03965 (48)                                  48       48
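
In three-fold cross-validation, the motifs of one family are split into three parts, and each part is used once for testing while the other two are used for training. The slides do not say how the folds were formed, so the following is only a sketch that assumes a seeded random shuffle; the function name and the placeholder motif identifiers are hypothetical.

```python
import random


def three_fold_splits(items, seed=0):
    """Yield (train, test) pairs so that each item is tested exactly once."""
    items = list(items)
    random.Random(seed).shuffle(items)         # assumed: random fold assignment
    folds = [items[i::3] for i in range(3)]    # three near-equal folds
    for k in range(3):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test


# Placeholder identifiers for a family with 30 motifs (the size listed for PF00196).
motifs = [f"motif_{i}" for i in range(30)]
for train, test in three_fold_splits(motifs):
    print(len(train), len(test))
```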
23
Comparisons of HMM_AA_SS with FFAS03 in Cross-Family Evaluations
Total HTH motifs                          563
Recognized by both FFAS03 and HMM_AA_SS   135
Recognized by FFAS03 only                 24
Recognized by HMM_AA_SS only              71
24
Putative HTH motifs in Ureaplasma parvum
Protein                 Location   Annotation from UniProt
sp|Q9PQE5|SCPB_UREPA    176-214    Participates in chromosomal partition during cell division
sp|Q9PQV6|RPOB_UREPA    540-587    DNA-directed RNA polymerase
sp|Q9PR27|SYY_UREPA     340-380    Tyrosyl-tRNA synthetase
sp|Q9PQC2|SYA_UREPA     217-265    Alanyl-tRNA synthetase
sp|Q9PQ74|DPO3A_UREPA   365-400    DNA polymerase III subunit alpha
sp|Q9PQX7|Y166_UREPA    507-553    Hypothetical protein
