Title: Semantic Web Technologies for Analysis of Transcriptome
1 Semantic Web Technologies for Analysis of Transcriptome
- Rose Dieng-Kuntz (1), Khaled Khelif (1), Olivier Corby (1), Pascal Barbry (2)
- (1) INRIA Sophia Antipolis, ACACIA project, http://www.inria.fr/acacia, http://www.inria.fr/acacia/corese
- (2) IPMC, Sophia Antipolis, http://www.ipmc.fr
2 Outline
- Context: Memory of Biochip Experiments
- The MEAT Project
- Semi-automatic generation of semantic annotations
- Conclusions: Requirements for Semantic Web
3 Context: Biochip experiments
- DNA microarrays (gene chips, biochips) make it possible to measure
simultaneously the expression level and transcription rate of various
genes in an organism.
- Applications in biology, medicine, pharmacology:
- Gene discovery
- Disease diagnosis or prognosis
- Drug discovery: pharmacogenomics
- Toxicological research: toxicogenomics
4 Towards Biochip Experiment Memory
[Diagram labels: Biologist, Experiment sheets, Experiment DB, Documents, Domain Ontologies]
- Need for knowledge management for a community of
biologists → Biochip Experiment Memory
- Need for support for the validation and interpretation
of the results of biochip experiments
5 The MEAT Project
[Diagram labels: MEDIANTE, MEAT-AnnotSearch, MEAT-Miner, MEAT-Onto, UMLS, Gene Onto]
6 Phases before experiment
The biologist checks and validates the probes available on
the biochip and selects a subset
7 Phases after experiment
Storage of the experiment description and of its
results in MEDIANTE, according to the Array Express
format
8 MEAT-AnnotSearch
ARRAY-EXPRESS
- Experiment description
- Result description
9 MEATAnnot: Technical Choices
- NLP tools: term extractor, relation extractor
- Extraction from texts of terms corresponding to UMLS
ontology concepts
- Extraction from texts of the relations between them
- Automatic generation of a semantic annotation and
its representation in RDF
10 Relationship extraction
[Diagram labels: Test corpus, Syntex]
- Syntex (Bourigault D. 2000): corpus syntactic
analyser
- Used to reveal the verb syntagms commonly used in
the biochip domain
11 Relationship extraction
- Choosing the potential relationships revealed by
Syntex
- Writing a relationship extraction grammar using
JAPE

    {Tag.lemme == "play"}
    {SpaceToken}
    ({Token.string == "a"} | {Token.string == "an"})?
    ({SpaceToken})?
    ({Token.string == "vital"} | {Token.string == "important"} |
     {Token.string == "critical"} | {Token.string == "some"} |
     {Token.string == "unexpected"} | {Token.string == "multifaceted"} |
     {Token.string == "major"})?
    ({SpaceToken})?
    {Tag.lemme == "role"}
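
For readers without a GATE installation, the "play ... role" pattern can be approximated with a plain regular expression. This is only a sketch, not the grammar used in MEAT: the slide matches on lemmas produced by a tagger, whereas here the lemmas "play" and "role" are approximated by listing a few inflected forms, and the names `PLAY_ROLE` and `find_play_role` are hypothetical.

```python
import re

# Optional modifiers listed in the JAPE pattern on the slide.
MODIFIERS = ["vital", "important", "critical", "some",
             "unexpected", "multifaceted", "major"]

# Assumption: no lemmatizer is available, so the lemmas "play" and "role"
# are approximated by enumerating inflected forms.
PLAY_ROLE = re.compile(
    r"\b(?:play|plays|played|playing)\s+"          # lemma "play" + space
    r"(?:(?:a|an)\s+)?"                            # optional article
    r"(?:(?:" + "|".join(MODIFIERS) + r")\s+)?"    # optional modifier
    r"roles?\b",                                   # lemma "role"
    re.IGNORECASE,
)


def find_play_role(sentence):
    """Return the matched 'play ... role' span, or None if absent."""
    match = PLAY_ROLE.search(sentence)
    return match.group(0) if match else None


print(find_play_role("HGF plays an important role in lung development"))
# prints: plays an important role
```

A lemma-based grammar stays more robust than this enumeration, since it covers every inflected form the tagger recognises.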
12 System architecture
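13 Example
HGF plays an important role in lung
development
The information extracted from this sentence is:
- HGF, an instance of Amino_Acid_Peptide_or_Protein
- lung development, an instance of Organ_or_Tissue_Function
- the relation play_role between them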
14 RDF Annotation Generated

    <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
             xmlns:m='http://www.inria.fr/acacia/meat#'
             xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>
      <m:Amino_Acid_Peptide_or_Protein rdf:about='HGF'>
        <m:play_role>
          <m:Organ_or_Tissue_Function rdf:about='lung_development'/>
        </m:play_role>
      </m:Amino_Acid_Peptide_or_Protein>
    </rdf:RDF>
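
An annotation of this shape can be produced from one extracted (subject, relation, object) triple. The sketch below uses only the Python standard library and is not the MEAT implementation. The function name `build_annotation` is hypothetical, and the trailing `#` on the `m` namespace is an assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the 'm' namespace ends with '#', like the RDF namespace.
MEAT_NS = "http://www.inria.fr/acacia/meat#"

# Ask ElementTree to serialise with the prefixes used on the slide.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("m", MEAT_NS)


def build_annotation(subj_type, subj_id, relation, obj_type, obj_id):
    """Build an RDF/XML string for one typed subject-relation-object triple."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    subject = ET.SubElement(
        root, f"{{{MEAT_NS}}}{subj_type}", {f"{{{RDF_NS}}}about": subj_id}
    )
    rel = ET.SubElement(subject, f"{{{MEAT_NS}}}{relation}")
    ET.SubElement(
        rel, f"{{{MEAT_NS}}}{obj_type}", {f"{{{RDF_NS}}}about": obj_id}
    )
    return ET.tostring(root, encoding="unicode")


annotation_xml = build_annotation(
    "Amino_Acid_Peptide_or_Protein", "HGF",
    "play_role",
    "Organ_or_Tissue_Function", "lung_development",
)
print(annotation_xml)
```

The concept types are passed in as strings, so the same function covers any pair of UMLS types the term extractor returns.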
15 CORESE: Semantic search engine
16 Ontology-based query
[Diagram labels: Annotation Base, UMLS]
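
The idea behind an ontology-based query is that a query on a general concept should also retrieve annotations typed with more specific concepts. The toy sketch below illustrates this with simple subsumption over a hand-written hierarchy. It is not how Corese itself is implemented, and the `PARENT` table is an illustrative assumption loosely modelled on the UMLS semantic network, not taken from the slides. The only annotation used is the HGF example above.

```python
# Toy "is-a" hierarchy (child -> parent). Illustrative assumption only.
PARENT = {
    "Organ_or_Tissue_Function": "Physiologic_Function",
    "Physiologic_Function": "Biologic_Function",
    "Amino_Acid_Peptide_or_Protein": "Organic_Chemical",
}

# Annotation base: (subject, subject type, relation, object, object type).
ANNOTATIONS = [
    ("HGF", "Amino_Acid_Peptide_or_Protein", "play_role",
     "lung_development", "Organ_or_Tissue_Function"),
]


def is_a(concept, ancestor):
    """True if concept equals ancestor or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def query(subject_type, relation, object_type):
    """Return (subject, relation, object) triples whose types specialise the query types."""
    return [
        (s, r, o)
        for (s, s_type, r, o, o_type) in ANNOTATIONS
        if r == relation and is_a(s_type, subject_type) and is_a(o_type, object_type)
    ]


# A query on general types still finds the more specific annotation.
print(query("Organic_Chemical", "play_role", "Biologic_Function"))
# prints: [('HGF', 'play_role', 'lung_development')]
```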
17 Semantic Web requirements
- Adaptation of the Corese semantic search engine to
OWL
- Corese query language vs SPARQL
- Contextual annotations → need to express
multiple contexts / viewpoints
- Temporal queries on the base of past biochip
experiments; temporally evolving ontologies and
annotations
- Scalability of NLP tools: articles stemming from
scientific watch on the open (Semantic) Web
18 Many thanks to
- ACACIA team, in particular Khaled Khelif,
Laurent Alamarguy, Olivier Corby, Alain Giboin
- IPMC: Pascal Barbry, Kevin Le Brigand, Hélène,
Chimène, Yves
- Bayer Crop Science: Rémi Bars
- Didier Bourigault (ERSS), developer of Syntex
- The developers of GATE (Sheffield Univ.)
19 Support to health network
[Diagram labels: Medical Ontology, Semantic Annotations, Translator, Life Line, Corese search engine, Virtual Staff, Member of the health network, Nautilus DB]
20 Virtual Staff Architecture