talk in taiwan, 27/8/2001 - PowerPoint PPT Presentation

About This Presentation
Title: talk in taiwan, 27/8/2001
Author: Limsoon Wong
Last modified by: KRDL
Created Date: 5/9/1998 10:45:39 PM
Document presentation format: On-screen Show

Transcript and Presenter's Notes

1
From Informatics to Bioinformatics
Limsoon Wong, Kent Ridge Digital Labs, Singapore
2
What is Bioinformatics?
3
What are the Themes of Bioinformatics?
  • Bioinformatics = Data Mgmt + Knowledge Discovery
  • Data Mgmt = Integration + Transformation + Cleansing
  • Knowledge Discovery = Statistics + Algorithms + Databases
4
What are the Benefits of Bioinformatics?
  • To the patient: better drug, better treatment
  • To the pharma: save time, save cost, make more
  • To the scientist: better science

5
Data Integration
  • A DOE "impossible query"
  • For each gene on a given cytogenetic band, find
    its non-human homologs.

6
Data Integration Results
sybase-add (name: "GDB", ...)
create view L from locus_cyto_location using GDB
create view E from object_genbank_eref using GDB
select
  accn: g.genbank_ref, nonhuman-homologs: H
from
  L as c, E as g,
  (select u
   from g.genbank_ref.na-get-homolog-summary as u
   where not(u.title string-islike "Human") andalso
         not(u.title string-islike "H.sapien")) as H
where
  c.chrom_num = "22" andalso
  g.object_id = c.locus_id andalso
  not (H = { })

  • Using Kleisli: clear, succinct, efficient
  • Handles heterogeneity and complexity
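
The shape of this query (join two local tables, call a remote homolog lookup per row, keep only rows with a non-empty nested result) can be sketched in plain Python. This is only an illustration under stated assumptions: the rows, the accession strings, and the `get_homolog_summary` helper are invented stand-ins, and `string-islike` is approximated by a substring test.

```python
# Toy stand-ins for the two GDB tables. All rows are invented for illustration.
locus_cyto_location = [
    {"locus_id": 1, "chrom_num": "22"},
    {"locus_id": 2, "chrom_num": "22"},
    {"locus_id": 3, "chrom_num": "7"},
]
object_genbank_eref = [
    {"object_id": 1, "genbank_ref": "ACC_A"},
    {"object_id": 2, "genbank_ref": "ACC_B"},
    {"object_id": 3, "genbank_ref": "ACC_C"},
]
# Invented homolog summaries keyed by accession.
HOMOLOG_SUMMARIES = {
    "ACC_A": [{"title": "Human example gene"}, {"title": "Mouse example gene"}],
    "ACC_B": [{"title": "H.sapiens example gene"}],
    "ACC_C": [{"title": "Rat example gene"}],
}


def get_homolog_summary(accession):
    """Hypothetical stand-in for the remote na-get-homolog-summary call."""
    return HOMOLOG_SUMMARIES.get(accession, [])


def nonhuman_homologs(accession):
    """Inner select: drop summaries whose title mentions a human source."""
    return [
        u for u in get_homolog_summary(accession)
        if "Human" not in u["title"] and "H.sapien" not in u["title"]
    ]


def genes_with_nonhuman_homologs(chrom_num):
    """Outer select: join L and E, attach the nested set H, keep non-empty H."""
    results = []
    for c in locus_cyto_location:
        if c["chrom_num"] != chrom_num:
            continue
        for g in object_genbank_eref:
            if g["object_id"] != c["locus_id"]:
                continue
            h = nonhuman_homologs(g["genbank_ref"])
            if h:  # corresponds to: not (H = { })
                results.append({"accn": g["genbank_ref"], "nonhuman_homologs": h})
    return results


result = genes_with_nonhuman_homologs("22")
```

Each output row carries a nested list of homologs, which is the kind of result a flat relational join cannot return directly.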

7
Data Warehousing
(uid: 6138971,
 title: "Homo sapiens adrenergic ...",
 accession: "NM_001619",
 organism: "Homo sapiens",
 taxon: 9606,
 lineage: "Eukaryota", "Metazoa", ...,
 seq: "CTCGGCCTCGGGCGCGGC...",
 feature: (name: "source", continuous: true,
           position: (accn: "NM_001619", start: 0, end: 3602, negative: false),
           anno: (anno_name: "organism", descr: "Homo sapiens"), ...),
          ...)
  • Motivation: efficiency, availability, denial of service, data cleansing
  • Requirements: efficient to query, easy to update, model data naturally

8
Data Warehousing Results
  • Relational DBMS is insufficient because it forces
    us to fragment data into 3NF.
  • Kleisli turns flat relational DBMS into nested
    relational DBMS. It can use flat relational DBMS
    such as Sybase, Oracle, MySQL, etc. to be its
    updatable complex object store. It can even use
    all of these systems simultaneously!

! Log in
oracle-cplobj-add (name: "db", ...)
! Define table
create table GP (uid: "NUMBER", detail: "LONG") using db
! Populate table with GenPept reports
select uid: x.uid, detail: x
into GP
from aa-get-seqfeat-general "PTP" as x
using db
! Map GP to that table
create view GP from GP using db
! Run a query to get title of 131470
select x.detail.title
from GP as x
where x.uid = 131470
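
The idea of using a flat table as a complex object store can be sketched with Python's standard `sqlite3` and `json` modules. This is not how Kleisli is implemented. It only mirrors the two-column layout above (a key plus one long column holding the whole nested record), and the report contents are invented.

```python
import json
import sqlite3

# An invented nested report, loosely shaped like the GenPept record shown earlier.
report = {
    "uid": 131470,
    "title": "example GenPept report",
    "feature": [
        {"name": "source",
         "anno": [{"anno_name": "organism", "descr": "example organism"}]},
    ],
}

# A flat two-column table: the key, plus the whole nested object serialized.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GP (uid INTEGER PRIMARY KEY, detail TEXT)")
conn.execute(
    "INSERT INTO GP (uid, detail) VALUES (?, ?)",
    (report["uid"], json.dumps(report)),
)


def get_title(connection, uid):
    """Fetch one nested object by key and navigate into it, like x.detail.title."""
    row = connection.execute(
        "SELECT detail FROM GP WHERE uid = ?", (uid,)
    ).fetchone()
    if row is None:
        return None
    return json.loads(row[0])["title"]


title = get_title(conn, 131470)
```

The record is never split across normalized tables, so updating or retrieving one report touches a single row.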
9
Epitope Prediction
TRAP-559AA
MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE
EVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLN
LNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRS
LLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVIL
TDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNR
FLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEK
TASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQ
CEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENI
IDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQ
KPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDN
QNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGN
RHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHE
KPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVP
GAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
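
A predictor of this kind typically scores short peptides cut from the antigen sequence. The sketch below enumerates overlapping windows from the first 40 residues of the sequence above. The window length of 9 is an assumption for illustration (the slides do not state a peptide length), and `candidate_peptides` is a hypothetical helper name.

```python
def candidate_peptides(sequence, length=9):
    """Return every overlapping window of the given length, left to right."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# First 40 residues of the TRAP sequence shown above.
prefix = "MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE"
peptides = candidate_peptides(prefix)
```

Each window would then be passed to the scoring model, and the highest-scoring peptides reported as predicted binders.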
10
Epitope Prediction Results
  • Prediction by our ANN model for HLA-A11
  • 29 predictions
  • 22 epitopes
  • 76% specificity
  • Prediction by BIMAS matrix for HLA-A1101

Chart: number of experimental binders by BIMAS rank: 19 (52.8%), 5 (13.9%), 12 (33.3%)
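
The figures on this slide can be checked with a few lines of arithmetic. The sketch assumes that the 76 figure is the share of the 29 predictions that were confirmed as epitopes, and that the three binder counts are shares of their own total. Both readings are inferred from the numbers, not stated on the slide.

```python
# ANN model: share of predictions that turned out to be epitopes.
predictions = 29
confirmed_epitopes = 22
percent_confirmed = round(100 * confirmed_epitopes / predictions)

# BIMAS chart: each bin as a percentage of all experimental binders.
binders_per_bin = [19, 5, 12]
total_binders = sum(binders_per_bin)
bin_percentages = [round(100 * b / total_binders, 1) for b in binders_per_bin]
```

Under these assumptions the computed values agree with the 76, 52.8, 13.9 and 33.3 shown on the slide, with 36 experimental binders in total.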
11
Gene Expression Analysis
  • Clustering gene expression profiles
  • Classifying gene expression profiles
  • find stable differentially expressed genes

12
Gene Expression Analysis Results
  • The Discovery System
  • Correlation test
  • Voter selection
  • Class prediction
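
The three steps listed above could look roughly like the following. This is a generic weighted-voting sketch using a signal-to-noise score, not the Discovery System itself, whose details are not given in the slides. The expression values, class labels, and function names are all invented for illustration.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """Correlation-style score: large magnitude means the gene separates the classes."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def select_voters(samples, labels, k):
    """Score every gene and keep the k genes with the largest absolute score."""
    scored = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == "A"]
        b = [s[g] for s, y in zip(samples, labels) if y == "B"]
        midpoint = (mean(a) + mean(b)) / 2
        scored.append((g, signal_to_noise(a, b), midpoint))
    scored.sort(key=lambda t: abs(t[1]), reverse=True)
    return scored[:k]


def predict(voters, sample):
    """Each voter gene casts a weighted vote; the sign of the sum picks the class."""
    total = sum(score * (sample[g] - midpoint) for g, score, midpoint in voters)
    return "A" if total > 0 else "B"


# Invented profiles: gene 0 is high in class A, gene 1 is high in class B,
# gene 2 carries no class signal.
samples = [
    [9.0, 1.0, 5.0], [8.0, 2.0, 4.0], [10.0, 1.5, 6.0],
    [2.0, 8.0, 5.5], [1.0, 9.0, 4.5], [3.0, 7.0, 5.0],
]
labels = ["A", "A", "A", "B", "B", "B"]

voters = select_voters(samples, labels, k=2)
prediction_1 = predict(voters, [8.5, 1.2, 5.0])
prediction_2 = predict(voters, [1.5, 8.5, 5.0])
```

The uninformative gene is never selected as a voter, so it cannot dilute the prediction.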

13
Protein Interaction Extraction
What are the protein-protein interaction
pathways from the latest reported discoveries?
14
Protein Interaction Extraction Results
  • Rule-based system for processing free texts in
    scientific abstracts
  • Specialized in
  • extracting protein names
  • extracting protein-protein interactions

Jak1
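
A rule-based extractor of this kind can be illustrated with a single pattern. The sketch below is a deliberately tiny example, not the system described in the slides: the protein-name rule, the verb list, and the example sentences are all invented, and apart from Jak1 the protein names are placeholders.

```python
import re

# Assumed naming rule: a capital letter, optional letters, then at least one digit.
PROTEIN = r"[A-Z][A-Za-z]*\d+[A-Za-z0-9]*"
# Assumed interaction verbs; a real rule set would be far larger.
VERBS = r"(?:interacts with|binds(?: to)?|phosphorylates|activates|inhibits)"
PATTERN = re.compile(rf"\b({PROTEIN})\s+({VERBS})\s+({PROTEIN})\b")


def extract_interactions(text):
    """Return (protein, verb, protein) triples matched by the single rule above."""
    return [(m.group(1), m.group(2), m.group(3)) for m in PATTERN.finditer(text)]


# Made-up text for illustration only; it makes no biological claim.
abstract = (
    "In this made-up sentence, Jak1 interacts with Abc2. "
    "Separately, Xyz3 phosphorylates Abc2."
)
interactions = extract_interactions(abstract)
```

Chaining the extracted pairs across many abstracts is what would turn individual interactions into candidate pathways.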
15
Transcription Start Prediction
16
Transcription Start Prediction Results
17
Medical Record Analysis
  • Looking for patterns that are
  • valid
  • novel
  • useful
  • understandable

18
Medical Record Analysis Results
  • DeEPs, a novel emerging pattern method
  • Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI
    benchmarks
  • Works for gene expressions
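
An emerging pattern is a set of attribute values whose support grows sharply from one class of records to another. The sketch below shows only that growth-rate idea by brute-force enumeration. It is not the DeEPs algorithm, and the records, attribute names, and thresholds are invented.

```python
from itertools import combinations


def support(itemset, records):
    """Fraction of records that contain every item in the itemset."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)


def growth_rate(itemset, background, target):
    """Support in the target class divided by support in the background class."""
    s_bg = support(itemset, background)
    s_t = support(itemset, target)
    if s_bg == 0:
        return float("inf") if s_t > 0 else 0.0
    return s_t / s_bg


def emerging_patterns(background, target, min_growth, max_size=2):
    """Enumerate small itemsets and keep those whose growth rate is high enough."""
    items = sorted(set().union(*target))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            rate = growth_rate(frozenset(combo), background, target)
            if rate >= min_growth:
                found.append((combo, rate))
    return found


# Invented records: each one is a set of attribute=value items.
healthy = [
    frozenset({"fever=no", "cough=no"}), frozenset({"fever=no", "cough=yes"}),
    frozenset({"fever=yes", "cough=no"}), frozenset({"fever=no", "cough=no"}),
]
ill = [
    frozenset({"fever=yes", "cough=yes"}), frozenset({"fever=yes", "cough=yes"}),
    frozenset({"fever=yes", "cough=no"}), frozenset({"fever=no", "cough=yes"}),
]

patterns = emerging_patterns(healthy, ill, min_growth=3)
```

The combined pattern has an infinite growth rate here because it never occurs in the background class, which is what makes such patterns easy to read as rules.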

19
Behind the Scene
  • Research: Vladimir Bajic, Vladimir Brusic, Jinyan Li, See-Kiong Ng, Limsoon Wong, Louxin Zhang
  • Business: Peter Saunders
  • Industry Assignees: Hao Han (gX), Rahul Despande (MC)
  • Engineering: Allen Chong, Judice Koh, SPT Krishnan, Seng Hong Seah, Guanglan Zhang, Zhuo Zhang
  • Students: Huiqing Liu, Song Zhu, Kun Yu