Title: Image Bioinformatics
1Image Bioinformatics
- Image semantics in life sciences research
Graham Klyne, Image Bioinformatics Research Group
of the Oxford e-Science Centre, Department of
Zoology, University of Oxford, UK
2Outline
- Introduction: Image Bioinformatics Research Group
- BioImage database: key technologies
- Drosophila Testis Gene Expression Database Project
- Future plans and aspirations
- Round-up and discussion
3Introduction: Image Bioinformatics Research Group
- We are David Shotton, Chris Catton, Graham Klyne and Liz Mellings, based in the Zoology Department at Oxford
- Backgrounds in cell biology, microscopy, animal behaviour, video, database design, ontology, Internet and Web standards, Semantic Web, data curation
- Part of the Oxford e-Science Centre
- Drawing on expertise and standards in biology, computing and ontologies, applied to life-sciences research
4Image Bioinformatics
- Images (and other media) are fundamental records in bioscience
- Vast amounts of raw data, not readily amenable to automatic interpretation or indexing
- Acquisition is often costly and time consuming; metadata increases value
- Only summaries can be published in traditional journals
- Increasingly, bioinformatics research is in silico, mining data from diverse online sources
- Alternative routes to publication are needed for research data
5The current state of the art
- Concerning the image data you requested - this
is a tough one. The image was recorded about ten
years ago, and I never managed to write a paper
about the work so it was never published. The
original data (if they still exist) must be on
some magneto-optical disk in one of many boxes in
my flat - quite hopeless to find at short notice.
All I can promise is that I'll look into this
once I am back from my travels but that will
take a few months. Whether anyone still has
hardware capable of reading the disc is quite
another matter! Sorry about this. - anon
6Technical Goals
- Storage technology is not a goal for us
- Assembly of systems to capture and publish research images and metadata with associated high-level descriptions
- Preserving the association between raw data and high-level descriptions
- Provide access to data in terms of research domain concepts
- Combine research image data with other online resources (gene databases, literature databases, etc.)
- Web-style interoperability and evolvability
7The BioImage Project
- Images are semantic instruments for capturing aspects of the real world, and form a vital part of the scientific record, for which words are no substitute
- In the post-genomic world, attention is now focused on the organization and integration of information within cells, for functional analyses of gene products
- In a month a single active cell biology lab may generate between 10 and 100 Gbytes of multidimensional image data
8So we built a database
9Key Technologies
- Ontologies
- "An ontology is a formal, explicit specification of a shared conceptualisation" (Studer 1998, after Gruber)
- Controlled vocabulary for expressing high-level biological concepts (BioImage uses this to construct a user interface)
- Formal constraints (based on Description Logics) capture elements of biological domain knowledge
- Inference can confirm existing knowledge and suggest new facts
- Semantic Web
- RDF provides a standard format and formal semantics for exchanging ground facts
- OWL is a standard for ontology definitions
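The points about ground facts and inference can be illustrated with a minimal Python sketch. It is not BioImage or Jena code: facts are plain (subject, predicate, object) tuples, the `ex:` names and the helper `infer_types` are hypothetical, and only one rule (subclass membership) is applied until no new facts appear.

```python
# Hypothetical vocabulary; triples are plain (subject, predicate, object) tuples.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

facts = {
    ("ex:image42", "ex:depicts", "ex:cell7"),
    ("ex:cell7", TYPE, "ex:Spermatocyte"),
    ("ex:Spermatocyte", SUBCLASS, "ex:GermCell"),
    ("ex:GermCell", SUBCLASS, "ex:Cell"),
}


def infer_types(triples):
    """Apply (x type C), (C subClassOf D) => (x type D) until a fixed point."""
    known = set(triples)
    while True:
        new = {
            (x, TYPE, d)
            for (x, p, c) in known if p == TYPE
            for (c2, q, d) in known if q == SUBCLASS and c2 == c
        }
        if new <= known:      # nothing new was derived: stop
            return known
        known |= new


closure = infer_types(facts)
suggested = sorted(closure - facts)   # facts that were not stated explicitly
print(suggested)
```

Here the stated facts never say that `ex:cell7` is an `ex:Cell`; that fact is derived. A production system would use an OWL reasoner rather than a hand-written rule.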
10Other Technologies
- Java, Jena Semantic Web toolkit
- PostgreSQL
- Apache Tomcat, Java Servlets, Struts
- XML, XSLT, STXX, SiteMesh
- Protégé ontology builder, inference systems
- Agile development, JUnit, Cactus, etc.
- Also applied to information design
- etc.
11BioImage overview
12Drosophila Testis Gene Expression Database
(DTGED) Project
- Research the function of genes whose expression is dependent on specific (aly-class) proteins
- PIs: Dr Helen White-Cooper and Dr David Shotton
- We are working closely with the DTGED research team based in the Zoology Department, University of Oxford
- Genes code for the production of complex chemicals (enzymes, proteins, etc.) used in biological processes
- But the expression of any gene is dependent on the cell environment, including the presence of other gene products
- Observable biological consequences (phenotypes) may result from subtle interactions between many gene products and other factors
- This project aims to document such interactions in Drosophila (fruit fly) spermatogenesis
15Images of Expression Patterns
- To an expert observer these images clearly show gene expression at different stages of spermatogenesis
- Each image corresponds to a different combination of a gene and a strain of Drosophila
- These in situ hybridization images are the end game: the final stage of a non-trivial process of screening and preparation
- Reproducibility and interpretation require that the preparatory steps are recorded along with the images
Image panels: CG2247 wt; CG2247 topi; CG12907 aly; CG12907 topi
16So how is it done? (1)
17So how is it done? (2)
18DTGED Experimental Data Flows
19DTGED Technologies Used
- Minimalist approach to development: working with available web-based tools, etc.
- Protégé and Racer (DL reasoner) for design and testing of ontologies for experimental data
- Note that expert annotations are open-ended
- BioImage for ontology-directed capture and staging of annotations and observations
- Extends original purpose of BioImage
- Open Microscopy Environment (OME) for capture and staging of images and image metadata
- Haskell for conditioning Excel spreadsheet data and combining it with other data sources
- BioImage for publication of images and metadata
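The spreadsheet conditioning step might look roughly like the following. The project used Haskell for this; the sketch is in Python purely for illustration and does not reproduce the project's code. The column names, the in-memory CSV standing in for an exported Excel sheet, the `gene_notes` lookup and the `condition` helper are all assumptions.

```python
import csv
import io

# Stand-in for a sheet exported from Excel as CSV; column names are assumptions.
sheet = io.StringIO(
    "Gene , Strain,Image File\n"
    " CG2247,wt ,img_001.tif\n"
    "CG12907 ,TOPI,img_002.tif\n"
    ",,\n"
)

# Stand-in for a second data source keyed by gene identifier (hypothetical).
gene_notes = {"CG2247": "note A"}


def condition(rows):
    """Trim whitespace, lower-case strain names and drop empty rows."""
    for row in rows:
        clean = {k.strip(): (v or "").strip() for k, v in row.items()}
        if not any(clean.values()):
            continue                      # skip rows with no content
        clean["Strain"] = clean["Strain"].lower()
        yield clean


# Combine each cleaned row with the second source; a missing note stays None.
records = [
    {**row, "Note": gene_notes.get(row["Gene"])}
    for row in condition(csv.DictReader(sheet))
]
for record in records:
    print(record)
```

Keeping the cleaning step separate from the combining step means each can be checked on its own before the data is staged for annotation.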
20Future Work and Aspirations
- OntoImage
- Kaleidoscope
- V-Lab
- The Ontogenesis Network: Evolving Community Ontologies
- Standard Animal Behaviour Ontology (SABO)
- Feedback to open standards communities
21OntoImage
- Extend BioImage with enhanced queries
- Incorporate knowledge from external ontologies
- Cross-reference data from multiple sources
- Composite query planning
- Multi-faceted query presentation (Kaleidoscope)
- Additional data curation
- Plants: A Prototype Arabidopsis Image Database
- Animal behaviour: A Video Collection of Mouse Behaviours
- Mammals: Genes of the Mammalian Secretory Pathway
22Kaleidoscope
- Interface and faceted presentation for queries over multi-source data
- (Former working name: ImageBLAST)
- Early proposed interface design illustrations follow
- Front page
- Hypersearch entry form
- Search results - Drosophila gene database (FlyBase)
- Search results - BioImage
- Search results - Protein folding simulations (PDB)
23Kaleidoscope - proposed front page
24Kaleidoscope - hypersearch interface
25Kaleidoscope - search results
26Kaleidoscope - search results
27Kaleidoscope - search results
28V-Lab
- Collaborative video annotation
- Biological applications include animal behaviour, bacteriology, embryonic cell division; also, possible applications in medicine, sports, psychology, education, ergonomics...
- Capture expert observations about video contents
- Attach annotations to specific frames and/or regions of a video sequence
- Software agents as assistants
- e.g. motion tracking
- Publication via BioImage
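One possible data structure for annotations attached to frames and regions is sketched below in Python. V-Lab's actual design is not described here, so the `Annotation` class, its field names and the `annotations_at` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    author: str            # a human observer or a software agent
    text: str
    first_frame: int
    last_frame: int        # inclusive
    # (x, y, width, height) in pixels; None means the whole frame
    region: Optional[Tuple[int, int, int, int]] = None

    def covers(self, frame: int) -> bool:
        return self.first_frame <= frame <= self.last_frame


def annotations_at(annotations: List[Annotation], frame: int) -> List[Annotation]:
    """Return the annotations whose frame range includes the given frame."""
    return [a for a in annotations if a.covers(frame)]


notes = [
    Annotation("observer-1", "grooming begins", 120, 180),
    Annotation("tracker-agent", "subject bounding box", 150, 150,
               region=(40, 60, 32, 32)),
]
print([a.text for a in annotations_at(notes, 150)])
```

Treating expert observers and software agents (such as a motion tracker) as authors of the same kind of record lets their output be queried and published together.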
29Evolving Ontologies: The Ontogenesis network
- Currently with participants from: Oxford, Manchester, Cambridge, FreshwaterLife (Cumbria)
- Engineering usable ontologies
- Dealing with advances in domain knowledge
- Re-evaluating old data in light of new knowledge
- Reasoning with provenance of information
- Resolving semantic conflicts
30Summary
- Image Bioinformatics deploys a collection of tools for annotation and publication of multidimensional image data
- We aim to assemble a diverse open-source toolkit, using existing components as much as possible
- Semantic rigour is needed for interoperable capture of expert observations
- Information requirements are open-ended: extensibility and evolvability are key goals
- Information design as much as software design
- We are a small, application-focused group seeking
to work with appropriate technical expert
collaborators
31Questions, discussion
http://bioimage.ontonet.org/moin/IbrgPresentations?action=AttachFile&do=get&target=20050727-IBRG-presentation.ppt
32Other notes
- Why RDF?
- "Missing isn't broken" (Brickley)
- Aggregation is free
- Evolvability
- Formal semantic framework
- basis for meaning- (or truth-) preserving transformations
- "A little inference goes a long way" (Hendler)
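Two of these points, free aggregation and tolerance of missing data, can be shown with a small Python sketch that models RDF statements as tuples. The identifiers and the `describe` helper are hypothetical, and a real system would use an RDF toolkit rather than plain sets.

```python
# Two independently produced sets of facts (hypothetical identifiers).
lab_facts = {
    ("ex:image1", "ex:gene", "CG2247"),
    ("ex:image1", "ex:strain", "wt"),
}
curator_facts = {
    ("ex:image1", "ex:stage", "ex:stageA"),
    ("ex:image2", "ex:gene", "CG12907"),   # no strain recorded yet
}

# Aggregation is set union: no schema migration, and duplicates collapse.
merged = lab_facts | curator_facts


def describe(graph, subject):
    """Collect whatever properties are present; absent ones are just not listed."""
    props = {}
    for s, p, o in graph:
        if s == subject:
            props.setdefault(p, set()).add(o)
    return props


print(describe(merged, "ex:image1"))
print(describe(merged, "ex:image2"))
```

The second image has no strain, yet nothing fails: the description is simply shorter, and the missing fact can be added later by another union.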
33BioImage key features
- Concepts vs keywords
- abstract away from representation
- Model-View-Controller architecture
- separate data model from presentation and processing logic
- Link to separately defined domain knowledge
- Gene Ontology (GO), Microarray (MGED), NCBI taxonomy, etc.
- Evolution
- adopting new ontologies
- re-purposing old data
- Truth-preserving aggregation
- Ontology-guided user interface
- for submission and query
- Image and non-image data
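The "concepts vs keywords" point can be sketched as follows. This is an illustration in Python, not BioImage code: the keyword table, the concept identifiers and the helpers `to_concepts` and `find` are hypothetical.

```python
# Hypothetical mapping from free-text keywords to concept identifiers.
KEYWORD_TO_CONCEPT = {
    "fruit fly": "taxon:Drosophila",
    "drosophila": "taxon:Drosophila",
    "testis": "anatomy:Testis",
    "testes": "anatomy:Testis",
}


def to_concepts(keywords):
    """Map keywords to concepts, ignoring case; unknown keywords are dropped."""
    return {
        KEYWORD_TO_CONCEPT[k.lower()]
        for k in keywords
        if k.lower() in KEYWORD_TO_CONCEPT
    }


# Images are indexed by concept, whatever wording the submitter used.
images = {
    "img_a": to_concepts(["Fruit fly", "testes"]),
    "img_b": to_concepts(["Drosophila"]),
}


def find(concept):
    """Return the names of images indexed under the given concept."""
    return sorted(name for name, concepts in images.items() if concept in concepts)


print(find("taxon:Drosophila"))
```

Because the index holds concepts rather than strings, a query finds both images even though one was described as "Fruit fly" and the other as "Drosophila".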