1
Ubiquitination Sites Prediction
  • Dah Mee Ko
  • Advisor: Dr. Predrag Radivojac
  • School of Informatics
  • Indiana University
  • May 22, 2009

2
Outline
  • Ubiquitination
  • Machine Learning
  • Decision Tree
  • Support Vector Machines
  • Prediction of ubiquitination sites
  • Influence of sequence
  • Influence of structure
  • Influence of evolutionary information

3
Ubiquitin
  • A small protein that occurs in all eukaryotic
    cells.
  • Highly conserved among eukaryotic species.
  • Consists of 76 amino acids and has a molecular
    mass of 8.5 kDa.
  • Key features
  • its C-terminal tail and Lys residues
  • Human ubiquitin sequence
  • MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQL
    EDGRTLSDYNIQKESTLHLVLRLRGG

4
Ubiquitination
  • Post-translational modification of a protein
  • Covalent attachment of one or more ubiquitin
    monomers to Lys residues
  • Reversible
  • Target proteins for degradation by the proteasome

5
Functions of Ubiquitination
  • Monoubiquitination
  • Histone regulation
  • DNA repair
  • Endocytosis
  • Budding of retroviruses from the plasma membrane
  • Polyubiquitination
  • Protein kinase activation

6
Machine Learning
  • Machine learning is programming computers to
    optimize a performance criterion using data and
    past experience.
  • Learn general models from a data set of
    particular examples.
  • Build a model that is a good and useful
    approximation to the data.

7
Machine Learning
  • Supervised learning
  • Learn input/output mappings from examples whose
    correct outputs are given.
  • Split the data into a training set and a test set.
  • Train a model on the training data.
  • Evaluate performance on the test data (see the
    sketch below).
  • Unsupervised learning
  • Learn patterns in the inputs without known outputs.
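A minimal sketch of this supervised workflow, using scikit-learn and synthetic data (the data set, model, and split sizes here are illustrative, not those used in this work):

# Supervised learning on synthetic data: split, train, evaluate.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic example: 200 samples, 10 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Split the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Train a model on the training data.
model = LogisticRegression().fit(X_train, y_train)

# Evaluate performance on the held-out test data.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))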

8
Machine Learning: Decision Tree
  • A classification algorithm
  • Each internal node tests the value of a feature
    and branches according to the result of the test.
  • Each leaf node assigns a class label (see the
    sketch below).
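A minimal decision-tree sketch, using scikit-learn's DecisionTreeClassifier on its bundled iris data (an illustrative data set, not the ubiquitination data):

# A small decision tree: internal nodes test feature values, leaves assign classes.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the learned feature tests (internal nodes) and class labels (leaves).
print(export_text(tree))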

9
Machine Learning: Random Forest
  • A machine learning ensemble classifier
  • Consists of many decision trees
  • Each tree is constructed using a bootstrap sample
    of training data.
  • After a large number of trees are generated, each
    tree casts a unit vote for the most popular class
    (see the sketch below).
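A minimal random-forest sketch under the same assumptions (scikit-learn, synthetic data): with bootstrap=True each tree is fit on a bootstrap sample of the training data, and the forest aggregates the trees' predictions.

# Random forest: an ensemble of decision trees built on bootstrap samples.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X, y)

# Averaged per-class tree predictions (a soft version of the majority vote).
print(forest.predict_proba(X[:1]))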

10
Machine Learning: Support Vector Machines
  • Viewing the input data as two sets of vectors in an
    n-dimensional space, a support vector machine
    constructs a separating hyperplane in that
    space.
  • The hyperplane maximizes the margin between the
    two data sets.

11
Machine Learning: Support Vector Machines
  • In the slide's figure of three candidate hyperplanes
  • H3 does not separate the classes.
  • H1 separates them, but only with a small margin.
  • H2 separates them with the maximum margin.
  • If a data set is not linearly separable, map it into
    a higher-dimensional space using the kernel approach
    (see the sketch below).
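A minimal sketch of the kernel idea, assuming scikit-learn and a synthetic data set (concentric circles) that is not linearly separable in its original two dimensions:

# Linear vs. RBF-kernel SVM on data that is not linearly separable.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)   # looks for a flat separating hyperplane
rbf_svm = SVC(kernel="rbf").fit(X, y)         # implicit map to a higher-dimensional space

print("linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))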

12
Data Sets for Prediction
  • 334 protein sequences from yeast
  • Positive and negative sites are 25-amino-acid
    windows centered at a lysine residue.
  • Remove all positive and negative sites that share
    more than 40% identity within the data sets
    (a sketch of such a filter follows).
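One possible form of such a redundancy filter is sketched below; the exact identity computation and filtering procedure used in this work are not given here, so the function names and the greedy filtering order are assumptions.

# Hypothetical redundancy filter for fixed-length 25-residue windows:
# drop any site sharing more than 40% identical positions with a kept site.
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length windows."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def filter_redundant(windows, threshold=0.40):
    kept = []
    for w in windows:
        if all(identity(w, k) <= threshold for k in kept):
            kept.append(w)
    return kept

# Toy usage with 25-residue windows centered at lysine (K).
sites = ["A" * 12 + "K" + "A" * 12,
         "A" * 12 + "K" + "A" * 11 + "G",   # nearly identical to the first
         "G" * 12 + "K" + "G" * 12]
print(filter_redundant(sites))              # keeps the first and the last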

13
Features: Sequence Information
  • Relative amino acid frequencies, Entropy, Net
    charge, Total charge, Aromatics,
    Charge-hydrophobicity ratio, Protein disorder
    probability, Vihinen's flexibility, Hydrophobic
    moments, B-factors
  • → 64 × 4 = 256 features
  • Relative amino acid frequencies
  • Window size 11 (computed as in the sketch after
    the table below)
  • A 1/11   G 0/11   M 0/11   S 0/11
  • C 0/11   H 0/11   N 1/11   T 1/11
  • D 2/11   I 0/11   P 3/11   V 1/11
  • E 0/11   K 1/11   Q 0/11   W 0/11
  • F 0/11   L 0/11   R 0/11   Y 1/11
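The frequency computation itself is straightforward; the sketch below uses a hypothetical 11-residue window (chosen only so that its counts match the example above, with the lysine at the center):

# Relative amino-acid frequencies over an 11-residue window centered at K.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_frequencies(window: str) -> dict:
    """Relative frequency of each of the 20 standard amino acids."""
    return {aa: window.count(aa) / len(window) for aa in AMINO_ACIDS}

window = "DPANDKPPTVY"   # hypothetical 11-mer: D x2, P x3, A, N, K, T, V, Y
freqs = aa_frequencies(window)
print(f"D: {freqs['D']:.3f}  P: {freqs['P']:.3f}  A: {freqs['A']:.3f}")
# D: 0.182  P: 0.273  A: 0.091   (i.e. 2/11, 3/11, 1/11)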

14
Features: Evolutionary Information
  • Position Specific Scoring Matrix (PSSM)
  • → 21 × 4 = 84 features
  • Window size 11
  • 256 (Seq) + 84 (Evol) = 340 features
    (one plausible PSSM window computation is sketched
    below)
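How the PSSM is condensed into 21 × 4 = 84 features is not spelled out in the slides. The sketch below only illustrates one plausible step: extracting an 11-row window from a PSSM (assumed here to be a NumPy array with 21 columns, one row per residue) and taking a column-wise summary.

# One plausible PSSM window feature: column-wise mean over an 11-row window.
import numpy as np

def pssm_window_features(pssm: np.ndarray, center: int, window: int = 11) -> np.ndarray:
    """Column-wise mean of the PSSM rows inside the window centered at `center`."""
    half = window // 2
    rows = pssm[max(0, center - half): center + half + 1]
    return rows.mean(axis=0)

# Toy usage with a random stand-in "PSSM" for a 100-residue sequence.
pssm = np.random.default_rng(0).normal(size=(100, 21))
print(pssm_window_features(pssm, center=50).shape)   # (21,)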

15
Features: Structure Information
  • BLAST each sequence against the PDB database.
  • Select alignments with greater than 30% identity.
  • For each mapped site, five shells with 1.5, 3,
    4.5, 6, 7.5 Å radial boundaries are constructed
    around the residue's alpha-carbon atom, using X, Y,
    Z coordinates from the PDB.
  • Amino acid at the center site → 20 features
  • e.g. K → A C D E F G H I K L M N P Q R S T V W Y
  •         0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
  • Each shell contributes 24 features
  • 4 for counts of C, N, O, S atoms and 20 for counts
    of amino acids
  • 20 + 24 × 5 = 140 features
  • 60 of the 245 positive sites could be mapped → 24%
  • 3239 of the 12906 negative sites could be mapped → 25%
  • A 1 × 140 zero vector is used for all other sites.
  • 256 (Seq) + 84 (Evol) + 140 (Str) = 480 features
    (a sketch of the shell counts follows)
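A hypothetical sketch of the per-shell counts follows. The exact counting rules (for example, which atoms place a residue type in a shell) are assumptions, and parsing the PDB coordinates into the `atoms` list is taken as already done elsewhere (e.g. with Biopython).

# Counts of C/N/O/S atoms and of the 20 amino-acid types in five radial
# shells (1.5, 3, 4.5, 6, 7.5 Å) around the site's alpha-carbon.
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ELEMENTS = "CNOS"
SHELLS = [1.5, 3.0, 4.5, 6.0, 7.5]   # outer radius of each shell, in Å

def shell_features(ca_xyz, atoms):
    """atoms: list of (element, residue_one_letter, xyz) tuples.
    Returns 5 shells x (4 + 20) counts = 120 features."""
    feats = np.zeros((len(SHELLS), 4 + 20))
    inner = 0.0
    for s, outer in enumerate(SHELLS):
        for element, res_aa, xyz in atoms:
            d = np.linalg.norm(np.asarray(xyz) - np.asarray(ca_xyz))
            if inner < d <= outer:
                if element in ELEMENTS:
                    feats[s, ELEMENTS.index(element)] += 1
                if res_aa in AMINO_ACIDS:
                    feats[s, 4 + AMINO_ACIDS.index(res_aa)] += 1
        inner = outer
    return feats.ravel()

# Toy usage: a single nitrogen atom from an alanine residue 2.0 Å away.
feats = shell_features((0.0, 0.0, 0.0), [("N", "A", (2.0, 0.0, 0.0))])
print(feats.shape, feats.sum())   # (120,) 2.0

Concatenated with the 20-dimensional one-hot encoding of the central residue, this gives the 20 + 24 × 5 = 140 structure features.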

16
Prediction Results: Random Forest
17
Prediction Results: Random Forest
18
Prediction Results: SVM
19
Prediction Results: SVM
20
Feature Selection
  • Rank features using correlation coefficients (r),
    as in the sketch below.
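A minimal sketch of this ranking, assuming a plain Pearson correlation between each feature column and the binary class label:

# Rank features by the absolute Pearson correlation with the label.
import numpy as np

def rank_features_by_correlation(X: np.ndarray, y: np.ndarray):
    """Return feature indices sorted by |r| (largest first) and the r values."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.argsort(-np.abs(r)), r

# Toy usage: feature 0 is informative, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)
X = np.column_stack([y + 0.3 * rng.normal(size=200), rng.normal(size=200)])
order, r = rank_features_by_correlation(X, y)
print(order, np.round(r, 2))   # feature 0 ranked first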

21
Conclusions
  • Ubiquitination sites are predictable.
  • The accuracy is modest. Likely reasons:
  • Long-range interactions
  • Flexibility of the structure
  • Noise in the positive sites
  • Small data set
  • The sequence features are the most important.

22
Acknowledgements
  • Prof. Predrag Radivojac
  • Wyatt Clark
  • Arunima Ram
  • Nils Schimmelmann
  • Prof. Sun Kim
  • Linda Hostetter
  • School of Informatics

23
Thank you!