Title: talk in bioitworld2002
1 From Informatics to Bioinformatics
Limsoon Wong
Laboratories for Information Technology, Singapore
2 What is Bioinformatics?
3 Themes of Bioinformatics
Bioinformatics = Data Mgmt + Knowledge Discovery
Data Mgmt = Integration + Transformation + Cleansing
Knowledge Discovery = Statistics + Algorithms + Databases
4 Benefits of Bioinformatics
- To the patient: better drug, better treatment
- To the pharma: save time, save cost, make more
- To the scientist: better science
5 From Informatics to Bioinformatics
8 years of bioinformatics R&D in Singapore
- MHC-Peptide Binding (PREDICT)
- Protein Interactions Extraction (PIES)
- Gene Expression & Medical Record Datamining (PCL)
- Cleansing & Warehousing (FIMM)
- Gene Feature Recognition (Dragon)
- Integration Technology (Kleisli)
- Venom Informatics
Timeline axis: 1994, 1996, 1998, 2000, 2002
Institutions: ISS, KRDL, LIT
6 Data Integration
A DOE "impossible query": for each gene on a given cytogenetic band, find its non-human homologs.
7 Data Integration Results
sybase-add (name: "GDB", ...);
create view L from locus_cyto_location using GDB;
create view E from object_genbank_eref using GDB;
select accn: g.genbank_ref, nonhuman-homologs: H
from L as c, E as g,
     {select u
      from g.genbank_ref.na-get-homolog-summary as u
      where not(u.title string-islike "%Human%")
      andalso not(u.title string-islike "%H.sapien%")} as H
where c.chrom_num = "22"
      andalso g.object_id = c.locus_id
      andalso not (H = { });
- Using Kleisli
  - Clear
  - Succinct
  - Efficient
- Handles
  - heterogeneity
  - complexity
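The query on this slide can be read as a join with a nested sub-select. The sketch below mirrors that logic in Python over small in-memory tables. It is not Kleisli code: the sample rows, the `SUMMARIES` lookup standing in for `na-get-homolog-summary`, and the function name are all hypothetical, and the `string-islike` tests are approximated with substring checks.

```python
# Hypothetical stand-ins for the GDB tables and the remote homolog lookup.
LOCI = [
    {"locus_id": "L1", "chrom_num": "22"},
    {"locus_id": "L2", "chrom_num": "7"},
    {"locus_id": "L3", "chrom_num": "22"},
]
EREFS = [
    {"object_id": "L1", "genbank_ref": "ACC1"},
    {"object_id": "L2", "genbank_ref": "ACC2"},
    {"object_id": "L3", "genbank_ref": "ACC3"},
]
SUMMARIES = {
    "ACC1": [{"title": "Human gene X"}, {"title": "Mouse gene X"}],
    "ACC2": [{"title": "Rat gene Y"}],
    "ACC3": [{"title": "H.sapiens gene Z"}],
}


def nonhuman_homologs(loci, erefs, summaries, chrom="22"):
    """For each gene on the given chromosome, collect its non-human homologs."""
    results = []
    for c in loci:
        if c["chrom_num"] != chrom:
            continue
        for g in erefs:
            if g["object_id"] != c["locus_id"]:
                continue
            # Nested sub-select: keep summaries whose title mentions neither
            # "Human" nor "H.sapien".
            homologs = [
                u for u in summaries.get(g["genbank_ref"], [])
                if "Human" not in u["title"] and "H.sapien" not in u["title"]
            ]
            if homologs:  # corresponds to: not (H = { })
                results.append(
                    {"accn": g["genbank_ref"], "nonhuman_homologs": homologs}
                )
    return results


print(nonhuman_homologs(LOCI, EREFS, SUMMARIES))
```

The result is itself nested (a list of records, each holding a list), which is the shape a flat relational query cannot return directly.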
8 Data Warehousing
(uid: 6138971,
 title: "Homo sapiens adrenergic ...",
 accession: "NM_001619",
 organism: "Homo sapiens",
 taxon: 9606,
 lineage: ["Eukaryota", "Metazoa", ...],
 seq: "CTCGGCCTCGGGCGCGGC...",
 feature: {(name: "source", continuous: true,
            position: [(accn: "NM_001619", start: 0, end: 3602, negative: false)],
            anno: [(anno_name: "organism", descr: "Homo sapiens"), ...]),
           ...})
- Motivation
  - efficiency
  - availability
  - denial of service
  - data cleansing
- Requirements
  - efficient to query
  - easy to update
  - model data naturally
9 Data Warehousing Results
! Log in
oracle-cplobj-add (name: "db", ...);
! Define table
create table GP (uid: "NUMBER", detail: "LONG") using db;
! Populate table with GenPept reports
select uid: x.uid, detail: x into GP
from aa-get-seqfeat-general "PTP" as x using db;
! Map GP to that table
create view GP from GP using db;
! Run a query to get title of 131470
select x.detail.title from GP as x where x.uid = 131470;
A relational DBMS alone is insufficient because it forces
us to fragment data into 3NF. Kleisli turns a flat
relational DBMS into a nested relational DBMS: it can use
a flat relational DBMS such as Sybase, Oracle, MySQL, etc.
as its updatable complex object store.
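The idea of keeping a whole nested record in one row of a flat table, as in the GP table with its uid and detail columns, can be sketched in Python. This is only an illustration under assumptions: SQLite and JSON stand in for the DBMS and for Kleisli's own complex object encoding, and the helper names are hypothetical. The record values are the ones shown on the Data Warehousing slide.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory flat table: one key column, one serialized-object column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE GP (uid INTEGER PRIMARY KEY, detail TEXT)")
    return conn


def put_record(conn, record):
    """Store (or overwrite) a nested record without fragmenting it into 3NF tables."""
    conn.execute(
        "INSERT OR REPLACE INTO GP (uid, detail) VALUES (?, ?)",
        (record["uid"], json.dumps(record)),
    )
    conn.commit()


def get_title(conn, uid):
    """Fetch one record by key and project a field out of the nested object."""
    row = conn.execute("SELECT detail FROM GP WHERE uid = ?", (uid,)).fetchone()
    return None if row is None else json.loads(row[0])["title"]


record = {
    "uid": 6138971,
    "title": "Homo sapiens adrenergic ...",
    "accession": "NM_001619",
    "organism": "Homo sapiens",
    "taxon": 9606,
    "lineage": ["Eukaryota", "Metazoa"],
    "feature": [
        {
            "name": "source",
            "continuous": True,
            "position": [
                {"accn": "NM_001619", "start": 0, "end": 3602, "negative": False}
            ],
            "anno": [{"anno_name": "organism", "descr": "Homo sapiens"}],
        }
    ],
}

conn = open_store()
put_record(conn, record)
print(get_title(conn, 6138971))
```

The nested structure is reassembled on read, so a query such as the title lookup on the slide needs no joins across fragment tables.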
10 Epitope Prediction
TRAP-559AA
MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE
EVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLN
LNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRS
LLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVIL
TDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNR
FLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEK
TASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQ
CEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENI
IDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQ
KPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDN
QNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGN
RHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHE
KPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVP
GAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
11 Epitope Prediction Results
- Prediction by our ANN model for HLA-A11
  - 29 predictions
  - 22 epitopes
  - 76% specificity
- Prediction by BIMAS matrix for HLA-A*1101
  - Number of experimental binders, by rank by BIMAS: 19 (52.8%), 5 (13.9%), 12 (33.3%)
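The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 76 is the share of the 29 ANN predictions that were epitopes, and that the three BIMAS counts partition one set of 36 experimental binders; both readings are assumptions, and the variable and function names are hypothetical.

```python
def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to the given number of digits."""
    return round(100.0 * part / whole, digits)


# ANN model: 22 epitopes among 29 predictions.
ann_predictions = 29
ann_epitopes = 22
ann_rate = percent(ann_epitopes, ann_predictions, 0)

# BIMAS: counts of experimental binders in the three rank bins on the slide.
bimas_bins = [19, 5, 12]
bimas_total = sum(bimas_bins)
bimas_shares = [percent(n, bimas_total) for n in bimas_bins]

print(ann_rate, bimas_total, bimas_shares)
```

Under those assumptions the computed values agree with the numbers printed on the slide.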
12 Transcription Start Prediction
13 Transcription Start Prediction Results
14 Medical Record Analysis
- Looking for patterns that are
  - valid
  - novel
  - useful
  - understandable
15 Gene Expression Analysis
- Classifying gene expression profiles
  - find stable differentially expressed genes
  - find significant gene groups
  - derive coordinated gene expression
16 Medical Record & Gene Expression Analysis Results
- PCL, a novel emerging pattern method
- Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
- Works well for gene expressions
Cancer Cell, March 2002, 1(2)
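An emerging pattern is, in the usual definition, an itemset whose support differs sharply between two classes. The sketch below computes support and growth rate for a candidate pattern. It is not the PCL method itself, only the basic notion it builds on, and the discretized gene labels and sample sets are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def growth_rate(itemset, background, target):
    """Ratio of support in the target class to support in the background class."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0.0:
        # Absent from the background: infinite growth if present in the target.
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


# Hypothetical discretized expression profiles, one set of items per sample.
normal = [
    {"geneA_low", "geneB_high"},
    {"geneA_low", "geneB_low"},
    {"geneA_high", "geneB_low"},
    {"geneA_low", "geneB_low"},
]
tumor = [
    {"geneA_high", "geneB_high"},
    {"geneA_high", "geneB_high"},
    {"geneA_high", "geneB_low"},
    {"geneA_high", "geneB_high"},
]

print(growth_rate({"geneA_high"}, normal, tumor))
print(growth_rate({"geneA_high", "geneB_high"}, normal, tumor))
```

A pattern with a large or infinite growth rate is a candidate discriminator between the two classes; how such patterns are combined into a classifier is outside this sketch.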
17 Protein Interaction Extraction
What are the protein-protein interaction
pathways from the latest reported discoveries?
18 Protein Interaction Extraction Results
- Rule-based system for processing free texts in scientific abstracts
- Specialized in
  - extracting protein names
  - extracting protein-protein interactions
[Figure; surviving label: Jak1]
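A rule-based extractor of this kind can be illustrated with a single pattern rule. The sketch below is a toy under stated assumptions, not the system described on the slide: the protein-name heuristic (a capitalized token containing a digit), the verb list, and the example sentence are all hypothetical.

```python
import re

# Crude, hypothetical name rule: capital letter, letters, at least one digit.
PROTEIN = r"[A-Z][A-Za-z]*\d+[A-Za-z0-9]*"
# Hypothetical interaction verbs.
VERBS = ("interacts with", "binds", "phosphorylates", "activates", "inhibits")
VERB_ALT = "|".join(re.escape(v) for v in VERBS)
PATTERN = re.compile(r"\b(" + PROTEIN + r")\s+(" + VERB_ALT + r")\s+(" + PROTEIN + r")\b")


def extract_interactions(text):
    """Return (protein, verb, protein) triples matched by the single rule above."""
    return [(m.group(1), m.group(2), m.group(3)) for m in PATTERN.finditer(text)]


sentence = "Jak1 phosphorylates Stat3, and Stat3 binds Pias3."
print(extract_interactions(sentence))
```

A single rule like this misses passive constructions, multi-word names, and negation, which is why a practical system needs a larger rule set and a name recognizer.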
19 Behind the Scene
- Allen Chong
- Judice Koh
- SPT Krishnan
- Huiqing Liu
- Seng Hong Seah
- Soon Heng Tan
- Guanglan Zhang
- Zhuo Zhang
- Vladimir Bajic
- Vladimir Brusic
- Jinyan Li
- See-Kiong Ng
- Limsoon Wong
- Louxin Zhang
and many more students, folks from
geneticXchange, MolecularConnections, and other
collaborators.