Title: Representation of Ontology Annotation Information in Grid Computing Prospectus
1Representation of Ontology Annotation Information
in Grid Computing Prospectus
- November 18, 2004
- David A. Gaitros
- Department of Computer Science
- Florida State University
2Overview
- Background on Annotation
- The Problem
- Research Goals
- Research Objectives
- Projected Accomplishments
- Projected Activities
- Graphical Annotation
- Generic Annotation
- Biological Database Problems
- Proposed Web Services Implementation
- Morphbank Database Schema
- Expected Challenges
- Masters Thesis/Projects
- Conclusion
3Background on Annotation
- Scientific Annotation Middleware (SAM)
- "We are creating a Scientific Annotation Middleware (SAM) system that will provide researchers and developers with the capabilities necessary to manage the complexity resulting from the collaborative, cross-disciplinary, compute-intensive research. SAM will include components and services that enable researchers, applications, problem solving environments (PSE) and software agents to create metadata and annotations about data objects and the semantic relationships between them. Human access to the middleware will be through a researcher's notebook interface available via desktop computers and PDA devices."
- Source: http://collaboratory.emsl.pnl.gov/docs/collab/sam/samprojoverview.html
5Background on Annotation
- Garlic (IBM)
- "Garlic is a project being developed by members of the database group in Computer Science. The goal of Garlic is to enable large-scale multimedia information systems: large scale in that they involve lots of data, with multimedia taken as broadly as possible to mean data of many types. We are particularly concerned about situations in which there is enough data of sufficiently specialized types that users have already made decisions about how to manage it, and have stored it in separate repositories that are specifically adapted to data of that type."
- Source: http://www.almaden.ibm.com/cs/garlic/
6Background on Annotation
- Figure: Garlic architecture. Labels: Query tool; C API; Garlic Schema; Object Oriented Middleware; Metadata; Image Wrapper, Relational Wrapper, Document Wrapper; RDBMS, Document Store, Image Store.
- Source: http://www.almaden.ibm.com/cs/garlic/
7Background on Annotation
- Data Annotation in Collaborative Research Environments
- Michael Gertz, Department of Computer Science, University of California at Davis, "Concept-based data annotation techniques for scientific databases"
- "It is well accepted that the creation, management, and utilization of different forms of metadata play a major role in realizing information system infrastructures that are able to provide rich data query, sharing, and management techniques."
- "We claim there is still a major gap between the creation of such semantic rich structures and the usage of these structures to actually enrich various forms of data."
8Background on Annotation
- Concept-Based Data Annotations
- Figure labels: Concepts (base concepts and relationship type concepts); Data Annotation; Web-accessible data; Scientific Data at Site A; Scientific Data at Site B.
- Source: Dr. Michael Gertz, UC Davis
9The Problem
- The discovery of information relies on the ability of scientists to find and access the correct data
- As such, grids and grid computing have emerged as an increasingly common means of sharing large amounts of information among collaborating organizations.
- Searches conducted on annotation or metadata are still limited because most database and grid applications still use ad hoc data storage and retrieval techniques.
- Searches on information still rely on a scientist's intimate knowledge of data location, format, and how to use specific applications
- An annotation tool capable of satisfying the requirements of the biological community does not currently exist
10Research Goals
- Improve the ability of biological researchers to search annotated databases for the information they need to support their research or findings
- Suggest that such improvements can be applied to other scientific applications
11Research Objectives
- Examine current methods of annotation
- Categorize general features of annotation
- Define systematic techniques that can be applied
to current ad hoc annotation methods
12Projected Accomplishments
- Identify the functional areas within the data grid community
- Define a relationship model that applies to all scientific annotation
- Develop a transformation model whereby any annotation can be expressed as an object (data and operations)
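The idea of expressing an annotation as an object (data plus operations) can be illustrated with a minimal sketch. The class name, its fields, and its two methods are hypothetical choices made for illustration and are not taken from the prospectus; they only show one way the data and the operations could travel together.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Annotation:
    """Hypothetical annotation object: fields are the data, methods the operations."""
    author: str
    created: date
    target_id: str   # id of the annotated object (image, specimen, other annotation)
    text: str
    keywords: list = field(default_factory=list)

    def matches(self, term: str) -> bool:
        """Operation: case-insensitive search over the text and the keywords."""
        needle = term.lower()
        in_text = needle in self.text.lower()
        in_keywords = any(needle == k.lower() for k in self.keywords)
        return in_text or in_keywords

    def to_record(self) -> dict:
        """Operation: flatten the annotation to a plain dict for storage or exchange."""
        return {
            "author": self.author,
            "created": self.created.isoformat(),
            "target_id": self.target_id,
            "text": self.text,
            "keywords": list(self.keywords),
        }


# Illustrative values only.
note = Annotation("example_author", date(2004, 11, 18), "image:42",
                  "Wing venation clearly visible", ["wing", "venation"])
print(note.matches("WING"))
print(note.to_record()["created"])
```

Because every annotation exposes the same operations, a search tool could call `matches` without knowing how a particular annotation type stores its content.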
13Projected Activities
- Initial plan is to use the new MorphBank database to prove the concept
- Develop a new MorphBank schema
- Develop a new MorphBank website
- Develop a reliable and more capable multi-annotation software tool to replace the I-Note 1.0 annotation package
- Develop the methods and schemas that will allow scientists to extract different annotations from biological images and other objects
14Graphical Annotation
- There may be more than one image or object associated with a specimen
- No practical upper limit can be defined
- Standards are still being defined
- Each image or object may hold several pieces of information.
- No practical upper limit can be defined
- Automating annotation is still in the early stages.
- Searching the images themselves for data is not feasible in large database systems.
- Searching large strings of free-entry text is also inefficient
15Graphical Annotation(cont)
- Initially used the I-NOTE software to define the requirements for a new piece of software that works with Morphbank and runs on Windows XP/Linux.
- The new tool will, at a minimum, be able to annotate any addressable object in Morphbank, to show that annotations can be mixed.
16Morphology Publication Example
- Figure: Riccardi, Annotation, Nov 5, 2004
17Example of Extensible Annotation
- Figure: Riccardi, Annotation, Nov 5, 2004
20Source: http://www.iath.virginia.edu/inote/
22Limitations of I-note Software
- Currently not supported
- University of Virginia has cut funding for the project.
- Used University programmers for development
- Works only on a Windows 95 platform
- Code is not maintainable; development was accomplished in a Java Development Environment
- Development project was not documented.
- Could not attach other objects or documents
- Only worked on certain graphic images.
- Annotations were not scalable with the image
- Annotations were overlay images and had to be stored as full images.
- Cannot address multiple objects.
23Generic Annotation
- Need to develop a method to store different annotations as objects
- Need to develop a method to search different annotations for similar or associated information
- Replace ad hoc queries with more systematic methods
- Higher level of ontology for annotations
- Need to determine the minimum amount of information needed to represent and access this object
24Generic Annotation
- General Requirements
- Platform and architecture independent
- Stand-alone application that can function as a web service
- Looking at both server-side and client-side applications
- Exchange of information must be done using web service features such as XML documents
- Annotation on images must include
- Multiple annotations per image/object
- Must not alter the original image/object
- Must include references to points and areas
- Must include text, graphics, and voice
- Must include the ability to make general annotation remarks
- Must be able to associate multiple objects with an annotation, including other annotations
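Several of these requirements can be sketched together: annotations kept apart from the image they describe, references to points and areas, targets that may be other annotations, and exchange as an XML document. The element and attribute names below (`annotation`, `target`, `region`, `ref`) are assumptions made for this illustration, not a MorphBank or web-service format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Region:
    """A point (radius 0) or a circular area, in image pixel coordinates."""
    x: int
    y: int
    radius: int = 0


@dataclass
class Annotation:
    """Stored separately from the image, so the original is never altered."""
    annotation_id: str
    text: str
    targets: list = field(default_factory=list)  # ids of images, objects, or annotations
    regions: list = field(default_factory=list)  # empty for a general remark


def to_xml(annotation: Annotation) -> str:
    """Serialize one annotation as an XML document for exchange."""
    root = ET.Element("annotation", id=annotation.annotation_id)
    ET.SubElement(root, "text").text = annotation.text
    for target in annotation.targets:
        ET.SubElement(root, "target", ref=target)
    for region in annotation.regions:
        ET.SubElement(root, "region", x=str(region.x), y=str(region.y),
                      radius=str(region.radius))
    return ET.tostring(root, encoding="unicode")


# Illustrative data: the second annotation targets the first one and another image.
first = Annotation("a1", "Seta on hind tibia",
                   targets=["image:7"], regions=[Region(120, 85, 10)])
reply = Annotation("a2", "Also visible in the dorsal view",
                   targets=["a1", "image:8"])
print(to_xml(first))
print(to_xml(reply))
```

Treating every target as an opaque identifier is what lets an annotation point at an image, a document, or another annotation without a separate code path for each.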
25MorphBank Annotation
- Figure labels: Morphbank; XML; Morphbank Viewer; Morphbank Browser; Annotation Applet; XML; MorphBank RDBMS; Image Files.
26Biological Database Problems
- Taxonomy terms and definitions are not universally defined
- Any database system would have to accommodate different taxonomic structures
- Darwin Core standard is not sufficient to solve this problem
- Each biological study group develops its own character codes and states
- There is no standardization
- Any database system would have to accommodate different character codes and states
- There is currently not enough justification for the different biological communities to develop tight integration standards
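One way a database could accommodate group-specific character codes and states is to key every definition by the study group that owns it. The sketch below is only an illustration of that idea; the class, its method names, and the sample groups and descriptions are all hypothetical.

```python
from collections import defaultdict


class CharacterStateCatalog:
    """Hypothetical catalog in which each study group keeps its own codes."""

    def __init__(self):
        # (group, character_code) -> {state_code: description}
        self._states = defaultdict(dict)

    def define(self, group, character_code, state_code, description):
        """Register one state of one character for one study group."""
        self._states[(group, character_code)][state_code] = description

    def describe(self, group, character_code, state_code):
        """Return the description, or None if that group never defined the code."""
        return self._states.get((group, character_code), {}).get(state_code)


# Two groups reuse character code "12" with unrelated meanings.
catalog = CharacterStateCatalog()
catalog.define("group_a", "12", "0", "wing absent")
catalog.define("group_a", "12", "1", "wing present")
catalog.define("group_b", "12", "0", "antenna short")

print(catalog.describe("group_a", "12", "0"))
print(catalog.describe("group_b", "12", "0"))
```

Scoping codes by group avoids forcing the communities to agree on a shared code list, which matches the observation that there is not yet enough justification for tight integration standards.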
27Proposed MorphBank WebServices
- Figure: layered web services diagram, labels in reading order
- Insertion and Update; Biological Data Analysis; World Browse; Search Discovery; Administration; Data Display
- HIGH LEVEL WEBSERVICES
- Annotation Discovery; Annotation Query; Biological Query; Metadata Annotation; User Validation Security; Biological Data Display; Data Validation; Annotation Aggregation; Bio Data Discovery; Annotation Data Display
- CORE WEBSERVICES
- Web Services Access (update, insert, delete, query)
- SERVICE TRANSLATION LIBRARY
- METADATA HOLDINGS
- Other Bio DB; Character State Catalog; MorphBank XML Files; Image XML Files; Image Files; MorphBank DB
- Based upon the Earth Systems Grid (ESG) Model
28MorphBank Website
- Figure: website map, labels in reading order
- Intro Screen; Info/Help; Login; WEB/DB Administration; World Browse
- Restricted User: Add, Update, Delete, Annotate, RU/Browse
- Browse: DS3, DS2, DS1
- World Read; Under Review; Working Data Set
29Specimen Table
--
-- Table structure for table 'specimen'
--
CREATE TABLE specimen (
  MorphBankSpecimenID int(32) NOT NULL AUTO_INCREMENT,
  CatalogNumber varchar(128) NOT NULL,
  DateLastModified date NOT NULL default '0000-00-00',
  InstitutionCode varchar(128),
  CollectionCode varchar(128),
  ScientificName varchar(128),
  BasisOfRecord char(1),
  TSN int(32),
  CollectionNumber varchar(128),
  FieldNumber varchar(128),
  CollectorName varchar(128),
  DateCollected date NOT NULL default '0000-00-00',
  TimeofDate time,
  ContinentOcean varchar(128),
30Specimen Table cont.
-- Continued from previous slide
  Country varchar(56),
  StateProvince varchar(56),
  County varchar(56),
  Locality varchar(56),
  Latitude double,
  Longitude double,
  CoordinatePrecision int(8),
  MinimumElevation int(32),
  MaximumElevation int(32),
  MinimumDepth int(32),
  MaximumDepth int(32),
  Sex varchar(8),
  PreparationType varchar(255),
  IndividualCount int(32),
  PreviousCatalogNumber varchar(128),
  RelationshipType varchar(128),
  RelatedCatalogItem varchar(128),
  DevelopmentalStage varchar(128),
31Image Table
--
-- Table structure for table 'image'
--
CREATE TABLE image (
  ImageID int(32) NOT NULL AUTO_INCREMENT,
  MorphBankSpecimenID int(32),
  ViewNumber int(32),
  ImageScale varchar(64),
  XDimensionPixels int(32) NOT NULL,
  YDimensionPixels int(32) NOT NULL,
  ResolutionInPixelsPerInch int(32) NOT NULL,
  OriginalFileName varchar(255) NOT NULL,
  Magnification varchar(128),
  ImageFileType varchar(128),
  PRIMARY KEY (ImageID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
32Viewtable
--
-- Table structure for table 'viewtable'
--
CREATE TABLE viewtable (
  ViewNumber int(32) NOT NULL,
  ImagingTechnique varchar(128),
  ImagingPreparationTechnique varchar(128),
  SpecimenPart varchar(128),
  ViewAngle varchar(128),
  Sex varchar(8),
  DevelopmentalStage varchar(128),
  PRIMARY KEY (ViewNumber)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
33Objectannotation Table
--
-- Table structure for table 'imageannotation'
--
CREATE TABLE imageannotation (
  AcessionNumber int(32),
  ImageAnnotationSeqNo int(32) NOT NULL AUTO_INCREMENT,
  CatalogNumber varchar(128) NOT NULL,
  AnnotationLocX int(32),
  AnnotationLocy int(32),
  AnnotationRadius int(32),
  AnnotationTypeid int(32),
  PhylogeneticCharacterID int(32),
  PhylogeneticCharacterStateID int(32),
  AnnotationAuthor varchar(128),
  AnnotationDate date DEFAULT '0000-00-00',
  ImageID int(32),
  AnnotationObject varchar(255),
  PRIMARY KEY (ImageAnnotationSeqNo)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
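To show how the image and imageannotation tables could answer "which annotations exist for this image", here is a minimal sketch. It makes several assumptions: SQLite (through Python's sqlite3 module) stands in for MySQL, only a subset of the columns is kept, the foreign-key reference and the helper `annotations_for` are additions for illustration, and the sample rows are invented.

```python
import sqlite3

# In-memory SQLite database standing in for the MorphBank MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (
    ImageID INTEGER PRIMARY KEY,
    OriginalFileName TEXT NOT NULL
);
CREATE TABLE imageannotation (
    ImageAnnotationSeqNo INTEGER PRIMARY KEY,
    ImageID INTEGER REFERENCES image(ImageID),
    AnnotationLocX INTEGER,
    AnnotationLocy INTEGER,
    AnnotationRadius INTEGER,
    AnnotationAuthor TEXT
);
""")

# Invented sample data: one image carrying two annotations.
conn.execute("INSERT INTO image (ImageID, OriginalFileName) VALUES (1, 'wing.jpg')")
conn.executemany(
    "INSERT INTO imageannotation "
    "(ImageID, AnnotationLocX, AnnotationLocy, AnnotationRadius, AnnotationAuthor) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 120, 85, 10, "author_a"), (1, 300, 40, 5, "author_b")],
)


def annotations_for(connection, image_id):
    """Return (file name, x, y, radius, author) for every annotation on one image."""
    return connection.execute(
        """
        SELECT i.OriginalFileName, a.AnnotationLocX, a.AnnotationLocy,
               a.AnnotationRadius, a.AnnotationAuthor
        FROM imageannotation AS a
        JOIN image AS i ON i.ImageID = a.ImageID
        WHERE i.ImageID = ?
        ORDER BY a.ImageAnnotationSeqNo
        """,
        (image_id,),
    ).fetchall()


for row in annotations_for(conn, 1):
    print(row)
```

Each annotation is its own row holding a location and radius, so an image can carry any number of annotations while the image file itself stays unchanged.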
34AnnotationType Table
--
-- Table structure for table 'annotationtype'
--
CREATE TABLE annotationtype (
  annotationtypeID int(32) NOT NULL AUTO_INCREMENT,
  annotationtitle varchar(25),
  keywords varchar(255),
  description varchar(128),
  PRIMARY KEY (annotationtypeID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
35PhylogeneticCode Table
--
-- Table structure for table 'phylogeneticcharacter'
--
CREATE TABLE phylogeneticcharacter (
  PhylogeneticCharID int(32) NOT NULL AUTO_INCREMENT,
  CharacterNumber int(32),
  PublicationID int(32),
  TSN int(32),
  CharacterDescription varchar(128),
  ViewID int(32),
  Sex varchar(8),
  Stage varchar(128),
  SimilarEntries varchar(128),
  RelatedCharacterID int(32),
  RelationType varchar(128),
  SuggestedTaxonRange varchar(128),
  PRIMARY KEY (PhylogeneticCharID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
36Phylogeneticstate Table
--
-- Table structure for table 'phylogeneticstate'
--
CREATE TABLE phylogeneticstate (
  StateID int(32) NOT NULL AUTO_INCREMENT,
  phylogeneticcharID int(32) NOT NULL,
  Description varchar(128),
  ImageID int(32),
  AnnotationSequenceNumber int(32),
  PRIMARY KEY (StateID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
37SpecimenPhyChar Table
--
-- Table structure for table 'SpecimenPhyChar'
--
CREATE TABLE SpecimenPhyChar (
  SpecimenPhyCharID int(32) NOT NULL AUTO_INCREMENT,
  SpecimenID int(32) NOT NULL,
  PhylogeneticCharID int(32),
  ImageID int(32),
  ImageAnnotationSeqNo int(32),
  PRIMARY KEY (SpecimenPhyCharID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
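The SpecimenPhyChar table links specimens to phylogenetic characters, which is what makes a systematic query such as "which specimens show this character" possible. The sketch below again assumes SQLite in place of MySQL and a reduced set of columns; the helper `specimens_with_character` and all sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MorphBank MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phylogeneticcharacter (
    PhylogeneticCharID INTEGER PRIMARY KEY,
    CharacterDescription TEXT
);
CREATE TABLE SpecimenPhyChar (
    SpecimenPhyCharID INTEGER PRIMARY KEY,
    SpecimenID INTEGER NOT NULL,
    PhylogeneticCharID INTEGER REFERENCES phylogeneticcharacter(PhylogeneticCharID),
    ImageID INTEGER,
    ImageAnnotationSeqNo INTEGER
);
""")

# Invented sample data: two characters, two specimens.
conn.executemany(
    "INSERT INTO phylogeneticcharacter (PhylogeneticCharID, CharacterDescription) "
    "VALUES (?, ?)",
    [(1, "forewing venation"), (2, "antenna segments")],
)
conn.executemany(
    "INSERT INTO SpecimenPhyChar "
    "(SpecimenID, PhylogeneticCharID, ImageID, ImageAnnotationSeqNo) "
    "VALUES (?, ?, ?, ?)",
    [(10, 1, 100, 1), (11, 1, 101, 2), (11, 2, 102, 3)],
)


def specimens_with_character(connection, description):
    """Return the sorted ids of specimens linked to the described character."""
    rows = connection.execute(
        """
        SELECT DISTINCT s.SpecimenID
        FROM SpecimenPhyChar AS s
        JOIN phylogeneticcharacter AS c
          ON c.PhylogeneticCharID = s.PhylogeneticCharID
        WHERE c.CharacterDescription = ?
        ORDER BY s.SpecimenID
        """,
        (description,),
    ).fetchall()
    return [row[0] for row in rows]


print(specimens_with_character(conn, "forewing venation"))
print(specimens_with_character(conn, "antenna segments"))
```

Because each link row also records an ImageID and ImageAnnotationSeqNo, the same join could be extended to return the image and annotation that document the character on each specimen.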
38Publication Table
--
-- Table structure for table 'publicationtable'
--
CREATE TABLE publicationtable (
  PublicationID int(32) NOT NULL AUTO_INCREMENT,
  PublicationAuthor varchar(128),
  PublicationYear char(4),
  PublicationJournal varchar(128),
  PublicationTitle varchar(128),
  PublicationPagesFrom int(32),
  PublicationPagesto int(32),
  PRIMARY KEY (PublicationID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
39UserTable
--
-- Table structure for table 'usertable'
--
CREATE TABLE usertable (
  UserID int(32) NOT NULL AUTO_INCREMENT,
  Level int(8),
  UIN int(8),
  PIN int(16),
  Name varchar(128),
  Email varchar(128),
  Affiliation varchar(128),
  Address varchar(255),
  Country varchar(128),
  GroupID int(32),
  PRIMARY KEY (UserID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
40GroupTable
--
-- Table structure for table 'grouptable'
--
CREATE TABLE grouptable (
  GroupID int(32) NOT NULL,
  GroupName varchar(128) NOT NULL,
  User int(32),
  PRIMARY KEY (GroupID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
41Expected Challenges
- The effort is contingent upon development of a reliable annotation toolset
- Development of a generic biological schema
- Integration of web services with the new MorphBank system and other biological database systems
- Obtaining consensus among the different participants on basic biology ontology issues
- Possible use of a general biological thesaurus
42Masters Thesis/Projects
- MorphBank Requirements Analysis (Thesis/Project)
- MorphBank Module Implementation (Project)
- MorphBank Security (Thesis/Project)
- MorphBank Mirror Site Implementation (Thesis/Project)
- MorphBank Operational Site Procedures (Project)
43Masters Thesis/Projects
- Biological Image eXchangE System (BIXES)
- A method and associated software to allow heterogeneous biological image database systems to exchange images and metadata (Project/Thesis)
- Biological Image Search technique (School of Computation Sciences research project/thesis)
44Conclusion
- More efficient search on large scientific data systems
- Demonstrate that this approach works for biological databases
- Show that it is feasible for any scientific application
- Provide a new and supported annotation tool set that can be used across the web.