Protein Classification Using Averaged Perceptron SVM

About This Presentation
Title:

Protein Classification Using Averaged Perceptron SVM

Description:

CS6772 Project Presentation, 12/03/2003. Protein Classification Using Averaged Perceptron SVM. Eugene Ie. Protein Sequence Classification: Protein = (Σ)*, |Σ| = 20 amino ...

Slides: 12
Provided by: Euge80

Transcript and Presenter's Notes



1
Protein Classification Using Averaged Perceptron SVM
CS6772 Project Presentation 12/03/2003
  • Eugene Ie

2
Protein Sequence Classification
  • Protein = (Σ)*, |Σ| = 20 amino acids
  • Easy to sequence proteins, difficult to obtain
    structure

Sequence:
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKL
LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
3D Structure: ?
Class: Globin family, Globin-like superfamily
Function: Oxygen transport
3
Sequence Alignment vs. Classification
  • Sequence similarity through alignment

(Figure: example aligned pair, on a scale from close homology to distant homology)
SGFIEEDELKLFL
SGFIEEEELKFVL
  • Sequence classification for remote homology

Classifier
4
Structural Hierarchy of Proteins
SCOP hierarchy: Fold > Superfamily > Family
(Figure labels: Positive Training Set, Positive Test Set, Negative Training Set, Negative Test Set)
  • Remote homologs
  • Structure and function conserved
  • Low sequence similarity

5
Remote Homology Detection
  • Discriminative supervised learning approach to
    protein classification

Approach: Support Vector Machines with String Kernels
C. Leslie, E. Eskin, J. Weston, and W. Noble, Mismatch String Kernels for SVM Protein Classification.
C. Leslie and R. Kuang, Fast Kernels for Inexact String Matching.
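As an illustration of the string-kernel idea, here is a minimal Python sketch of a (k, m)-mismatch kernel: each k-mer in a sequence contributes to every k-mer within Hamming distance m, and the kernel value is the inner product of the two resulting count vectors. The function names, the default k = 3 and m = 1, and the brute-force neighborhood enumeration are assumptions made for this example, not details taken from the slides or the cited papers.

```python
from collections import Counter
from functools import lru_cache
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


@lru_cache(maxsize=None)
def mismatch_neighborhood(kmer, m, alphabet=AMINO_ACIDS):
    """Return all k-mers within Hamming distance <= m of `kmer`."""
    neighbors = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                neighbors.add("".join(chars))
    return frozenset(neighbors)


def mismatch_features(seq, k, m, alphabet=AMINO_ACIDS):
    """Sparse feature map: k-mer -> number of sequence k-mers it is near."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for neighbor in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            features[neighbor] += 1
    return features


def mismatch_kernel(x, y, k=3, m=1, alphabet=AMINO_ACIDS):
    """Inner product of the two sparse mismatch feature maps."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    if len(fx) > len(fy):
        fx, fy = fy, fx  # iterate over the smaller map
    return sum(count * fy[kmer] for kmer, count in fx.items())
```

With m = 0 this reduces to counting shared exact k-mers; with m = 1, the 3-mers ACD and ACE share the 20 neighbors of the form AC?.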
6
QP SVM Training
Sequence Training Data (total: n sequences, n labels):
>VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFK
LLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>TYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVD
PVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
QP Solver (slow)
Learned Weights and Bias
From KKT
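The slide's own equations are not preserved in the transcript. For orientation, the textbook kernel SVM dual can be written as below, assuming the 1-norm soft-margin form with penalty C (the formulation used in the talk may differ). The weights are the dual variables and the bias is recovered from the KKT conditions.

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j) \\
\text{s.t.}\quad & 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 \\[4pt]
\text{KKT:}\quad & b = y_i - \sum_{j=1}^{n} \alpha_j y_j K(x_j, x_i)
  \quad \text{for any } i \text{ with } 0 < \alpha_i < C \\[4pt]
f(x) \;=\; & \operatorname{sign}\Big( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \Big)
\end{aligned}
```

Solving this requires the full n x n kernel matrix, which is one reason a generic QP solver is slow on large training sets.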
7
Averaged Perceptron SVM Training
Training Algorithm
Y. Freund and R. Schapire, Large Margin Classification Using the Perceptron Algorithm.
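The training algorithm itself is not preserved in the transcript. A minimal Python sketch of the kernelized voted perceptron described by Freund and Schapire might look like the following; the function name, the return format, and the choice to treat a zero score as a mistake are assumptions for illustration only.

```python
def train_kernel_perceptron(xs, ys, kernel, epochs=1):
    """Kernel voted-perceptron training (sketch).

    xs: training examples, ys: labels in {-1, +1},
    kernel: function K(a, b) -> number.

    Returns (mistakes, survival):
      mistakes[j-1] is the training index of the j-th mistake;
      survival[j] counts the examples classified correctly by the
      prediction vector obtained after j mistakes (its voting weight).
    """
    mistakes = []
    survival = [0]  # survival[0] belongs to the initial zero vector
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, ys)):
            # The current prediction vector is the sum of y_j * phi(x_j)
            # over past mistakes, so its score needs only kernel values.
            score = sum(ys[j] * kernel(xs[j], x) for j in mistakes)
            if y * score > 0:
                survival[-1] += 1      # correct: current vector survives
            else:
                mistakes.append(i)     # mistake: start a new vector
                survival.append(1)
    return mistakes, survival
```

Only the mistake indices and the survival counts need to be stored, so the size of the learned model grows with the number of mistakes rather than with n.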
8
Averaged Perceptron SVM Training
Iterate t Epochs: Run Perceptron Algorithm
Sequence Training Data (total: n sequences, n labels):
>VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFK
LLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>TYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVD
PVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Final Weight Vector, Voting Weights
Generalized Bound for k
s = no. of dimensions in feature space
k = no. of mistakes made during perceptron run
SCOP experiments show: for average n = 1000, average k = 50-60
9
Averaged Perceptron SVM Classification
Testing Algorithm
Note: Only k kernel products with the unknown sequence x need to be computed.
Recurrence relation (M is the set of mistake indices)
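The recurrence on the slide is not preserved in the transcript. One standard form, from Freund and Schapire's voted perceptron, is v_{j+1} · x = v_j · x + y_i K(x_i, x), where i is the index of the j-th mistake. The Python sketch below applies it; the function names and the argument layout (`mistakes` as the ordered list of indices in M, `survival` as the voting weights with one extra leading entry for the initial zero vector) are assumptions for illustration.

```python
def sign(v):
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def predict(x, xs, ys, mistakes, survival, kernel, voted=False):
    """Classify x with an averaged (default) or voted kernel perceptron.

    Each step adds a single kernel product to the running score, so a
    prediction costs len(mistakes) kernel evaluations in total.
    """
    score = 0.0  # score of the initial zero vector
    total = 0.0
    for j, i in enumerate(mistakes, start=1):
        score += ys[i] * kernel(xs[i], x)  # recurrence step
        total += survival[j] * (sign(score) if voted else score)
    return 1 if total > 0 else -1
```

For example, with a single stored mistake on the point (1, 1) labelled +1 and a linear kernel, the point (3, 2) is classified as +1 and (-1, -3) as -1.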
10
Implementation Details
  • Built on top of protclass (Protein
    Classification) platform
  • Java Platform
  • Classification Task
  • Hash table scan instead of Mismatch Trie
  • Generate mismatch mappings once using shifts
  • Dynamic kernel matrix storage
  • Still needs debugging
  • Speed/Space Performance
  • 80% reduction in space requirement
  • 50% reduction in training time
  • 50% reduction in testing time
  • Mainly from simple online algorithm
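The slides do not say how "dynamic kernel matrix storage" was implemented (the original was on the Java platform). One plausible reading is that kernel entries are computed on demand and cached, so entries the online algorithm never asks for are never stored. The Python sketch below shows that idea; the class name `LazyKernelMatrix` and its interface are hypothetical.

```python
class LazyKernelMatrix:
    """Symmetric kernel matrix whose entries are computed on first use."""

    def __init__(self, sequences, kernel):
        self.sequences = sequences
        self.kernel = kernel
        self._cache = {}  # (i, j) with i <= j -> kernel value

    def __call__(self, i, j):
        # Exploit symmetry: K(i, j) and K(j, i) share one cache slot.
        key = (i, j) if i <= j else (j, i)
        if key not in self._cache:
            a, b = self.sequences[key[0]], self.sequences[key[1]]
            self._cache[key] = self.kernel(a, b)
        return self._cache[key]

    def stored_entries(self):
        """Number of kernel values currently held in memory."""
        return len(self._cache)
```

Calling `K(0, 1)` and then `K(1, 0)` on such an object evaluates the underlying kernel once and stores a single entry.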

11
(No Transcript)