Title: Bioinformatics Research Overview
1. Bioinformatics Research Overview
Li Liao
Develop new algorithms and (statistical) learning methods
- Capable of incorporating domain knowledge
- Effective, expressive, interpretable
2. Motivations
- Understanding correlations between genotype and phenotype
- Predicting genotype <-> phenotype
- Phenotypes
- Protein function
- Drug/therapy response
- Drug-drug interactions for expression
- Drug mechanism
- Interacting pathways of metabolism
3. Projects
- Homology detection, protein family classification
- (funded by a DuPont SE award)
- Support Vector Machines
- Hidden Markov models
- Graph theoretic methods
- Probabilistic modeling for biosequences (funded by NIH)
- HMMs, and beyond
- Motifs finding
- Secondary structure
- Comparative Genomics
- Identify genome features for diagnostic and therapeutic purposes
- (funded by an Army grant)
- Evolution of metabolic pathways
- Tree and graph comparisons
4. Detect remote homologues
- Attributes to be looked at
- Sequence similarity, aggregate statistics (e.g., protein families), pattern/motif, and more attributes (presence in a phylogenetic tree).
- How to incorporate domain-specific knowledge into the model so a classifier can be more accurate?
- Results
- Quasi-consensus based comparison of profile HMMs for protein sequences (submitted to Bioinformatics)
- Using extended phylogenetic profiles and support vector machines for protein family classification (SNPD 04)
- Combining Pairwise Sequence Similarity and Support Vector Machines for Detecting Remote Protein Evolutionary and Structural Relationships (JCB 2003)
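One way to read the last result (combining pairwise sequence similarity with support vector machines) is that each protein is represented by its vector of similarity scores against a fixed set of reference sequences, and the classifier works on that vector. A minimal sketch, assuming a toy k-mer overlap score as a stand-in for a real alignment score; `kmer_similarity`, `pairwise_features`, and the sequences are hypothetical and not taken from the slides:

```python
def kmers(seq, k=2):
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b, k=2):
    """Toy similarity: Jaccard overlap of k-mer sets.

    This is only a stand-in for a real pairwise alignment score.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def pairwise_features(seq, references, k=2):
    """Represent seq by its similarity to every reference sequence."""
    return [kmer_similarity(seq, ref, k) for ref in references]


# Hypothetical reference (training) sequences and query.
references = ["MKTAYIAK", "MKTAYLAK", "GGSGGSGG"]
query = "MKTAYIAR"
features = pairwise_features(query, references)
# `features` is the fixed-length vector that would be handed to an SVM.
```

The design point is that sequences of different lengths all map to vectors of the same length (one entry per reference sequence), which is what a standard SVM expects as input.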
5. Support Vector Machines
6. Data: phylogenetic profiles
- How to account for correlations among profile components?
- Profile extension (Narra and Liao, SNPD 04)
[Figure: example profiles x = 0 1 1 1 1, y = 1 1 1 1 1, z = 1 1 1 1 0, compared by Hamming distance and tree-based distance; values shown in the figure: 3 and 0.1, 3 and 0.5]
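The Hamming distance named on this slide can be sketched for the example profiles x, y, and z. The profile values are copied as they appear in the transcript and may be truncated relative to the original figure; the tree-based distance is not reproduced because its definition is not given here.

```python
def hamming(p, q):
    """Number of positions at which two equal-length profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(p, q))


# Profiles as they appear in the transcript (1 = present, 0 = absent).
x = [0, 1, 1, 1, 1]
y = [1, 1, 1, 1, 1]
z = [1, 1, 1, 1, 0]

d_xy = hamming(x, y)
d_yz = hamming(y, z)
d_xz = hamming(x, z)
```

Hamming distance counts every mismatching component equally and independently of the others, which is the limitation the question about correlations among profile components points at.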
7. Quasi-consensus based comparison of HMMs
- From MSA to profile HMMs using existing packages (SAM-T99 or HMMER)
- Generation of quasi-consensus sequence from the model
- Alignment of consensus sequence of a model with the other model
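The consensus-generation step can be illustrated with a simplified sketch that takes the most probable residue at each match state of a profile HMM. This is only an assumption about what a consensus looks like; the actual quasi-consensus procedure is not spelled out in the slides, and `match_emissions` is a hypothetical toy model, not output from SAM-T99 or HMMER.

```python
def consensus_from_match_states(match_emissions):
    """Pick the highest-probability residue at each match state."""
    return "".join(max(dist, key=dist.get) for dist in match_emissions)


# Hypothetical emission distributions, one dict per match state.
match_emissions = [
    {"A": 0.7, "G": 0.2, "S": 0.1},
    {"L": 0.5, "I": 0.4, "V": 0.1},
    {"K": 0.6, "R": 0.4},
]
consensus = consensus_from_match_states(match_emissions)
```

The resulting sequence would then be aligned against the other model, so that two models are compared through a sequence-to-model alignment rather than directly.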
9. Sequence Models (HMMs and beyond)
- Motivations: What is responsible for the function?
- Patterns/motifs
- Secondary structure
- To capture long-range correlations of biosequences
- Transporter proteins
- RNA secondary structure
- Methods: generative versus discriminative
- Linear dependent processes
- Stochastic grammars
- Model equivalence
10. TMMOD: An improved hidden Markov model for predicting transmembrane topology (to appear in IEEE ICTAI 04)
11.
Model    Reg.  Data set  Correct topology  Correct location  Sensitivity  Specificity
TMMOD 1  (a)   S-83      65 (78.3)         67 (80.7)         97.4         97.4
TMMOD 1  (b)   S-83      51 (61.4)         52 (62.7)         71.3         71.3
TMMOD 1  (c)   S-83      64 (77.1)         65 (78.3)         97.1         97.1
TMMOD 2  (a)   S-83      61 (73.5)         65 (78.3)         99.4         97.4
TMMOD 2  (b)   S-83      54 (65.1)         61 (73.5)         93.8         71.3
TMMOD 2  (c)   S-83      54 (65.1)         66 (79.5)         99.7         97.1
TMMOD 3  (a)   S-83      70 (84.3)         71 (85.5)         98.2         97.4
TMMOD 3  (b)   S-83      64 (77.1)         65 (78.3)         95.3         71.3
TMMOD 3  (c)   S-83      74 (89.2)         74 (89.2)         99.1         97.1
TMHMM    -     S-83      64 (77.1)         69 (83.1)         96.2         96.2
PHDtm    -     S-83      (85.5)            (88.0)            98.8         95.2
TMMOD 1  (a)   S-160     117 (73.1)        128 (80.0)        97.4         97.0
TMMOD 1  (b)   S-160     92 (57.5)         103 (64.4)        77.4         80.8
TMMOD 1  (c)   S-160     117 (73.1)        126 (78.8)        96.1         96.7
TMMOD 2  (a)   S-160     120 (75.0)        132 (82.5)        98.4         97.2
TMMOD 2  (b)   S-160     97 (60.6)         121 (75.6)        97.7         95.6
TMMOD 2  (c)   S-160     118 (73.8)        135 (84.4)        98.4         97.2
TMMOD 3  (a)   S-160     120 (75.0)        133 (83.1)        97.8         97.6
TMMOD 3  (b)   S-160     110 (68.8)        124 (77.5)        94.5         98.1
TMMOD 3  (c)   S-160     135 (84.4)        143 (89.4)        98.3         98.1
TMHMM    -     S-160     123 (76.9)        134 (83.8)        97.1         97.7
Counts are followed by percentages in parentheses; PHDtm is reported as percentages only.
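The parenthesized values in the table are consistent with counts expressed as a percentage of the data set size, reading S-83 as 83 proteins and S-160 as 160 proteins (that reading of the set names is an assumption). A quick check for two entries, with `percent` as a hypothetical helper:

```python
def percent(count, total):
    """Percentage of `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# TMMOD 1 (a), correct topology on S-83: 65 proteins out of 83.
topology_s83 = percent(65, 83)
# TMMOD 3 (c), correct topology on S-160: 135 proteins out of 160.
topology_s160 = percent(135, 160)
```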
12. Genomics study of enterobacterial BT agents (funded by the US Army via Center for Biological Defense, USF)
- Goals
- Identification of genes and sequence tags as targets for novel diagnosis and therapy
- BT agents (Yersinia pestis, Salmonella, Escherichia coli O157:H7)
- Methods
- Various bioinformatics tools and databases
13. Comparative Genomics
- Motivation
- Evolution of metabolic pathways
- Gene functions
- De novo (alternative pathways)
- Genetic engineering
- Drug discovery
- Methods
- Put data into a context: knowledge/data representation
- Trees, graphs, etc.
- Learning models/methods
14. Profiling attribute-value pairs
15. What we found
- Informative way to compare genomes
- The majority of pathways (or rather their enzyme components) evolve in congruence with species
16. What we do next
- Database and search engine
- Off-line self-consistent iteration
- Pathways in a network
- Graph comparisons
- Identify key components of networks
- Small world topology
- Cross-level interactions with regulatory networks
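Small-world topology is commonly characterized by high local clustering together with short average path lengths. A minimal sketch of both measures on a toy undirected graph; the graph and its node names are hypothetical and are not data from this work:

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = graph[node]
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / len(pairs)


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, count = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count if count else 0.0


# Hypothetical toy network; nodes could stand for enzymes or metabolites.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
avg_clustering = sum(clustering_coefficient(graph, n) for n in graph) / len(graph)
avg_path = average_path_length(graph)
```

Comparing these two numbers against those of a random graph with the same number of nodes and edges is the usual way to decide whether a network looks small-world.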