Representation of Ontology Annotation Information in Grid Computing Prospectus

Slides: 45
Provided by: davidg97
Learn more at: https://www.cs.fsu.edu

1
Representation of Ontology Annotation Information
in Grid Computing Prospectus
  • November 18, 2004
  • David A. Gaitros
  • Department of Computer Science
  • Florida State University

2
Overview
  • Background on Annotation
  • The Problem
  • Research Goals
  • Research Objectives
  • Projected Accomplishments
  • Projected Activities
  • Graphical Annotation
  • Generic Annotation
  • Biological Database Problems
  • Proposed Web Services Implementation
  • Morphbank Database Schema
  • Expected Challenges
  • Masters Thesis/Projects
  • Conclusion

3
Background on Annotation
  • Scientific Annotation Middleware (SAM)
  • We are creating a Scientific Annotation
    Middleware (SAM) system that will provide
    researchers and developers with the capabilities
    necessary to manage the complexity resulting from
    the collaborative, cross-disciplinary,
    compute-intensive research. SAM will include
    components and services that enable researchers,
    applications, problem solving environments (PSE)
    and software agents to create metadata and
    annotations about data objects and the semantic
    relationships between them. Human access to the
    middleware will be through a researcher's
    notebook interface available via desktop
    computers and PDA devices.
  • Source: http://collaboratory.emsl.pnl.gov/docs/collab/sam/samprojoverview.html

4
Background on Annotation
5
Background on Annotation
  • Garlic (IBM)
  • Garlic is a project being developed by members
    of the database group in Computer Science. The
    goal of Garlic is to enable large-scale
    multimedia information systems: "large scale" in
    that they involve lots of data, and "multimedia"
    taken as broadly as possible to mean data of many
    types. We are particularly concerned about
    situations in which there is enough data of
    sufficiently specialized types that users have
    already made decisions about how to manage it,
    and have stored it in separate repositories that
    are specifically adapted to data of that type.
  • Source: http://www.almaden.ibm.com/cs/garlic/

6
Background on Annotation
  • The Garlic Approach

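[Diagram: a query tool and a C API sit on top of the Garlic object-oriented middleware, which holds the Garlic schema and metadata. The middleware reaches the RDBMS, the document store, and the image store through a relational wrapper, a document wrapper, and an image wrapper.]

Source: http://www.almaden.ibm.com/cs/garlic/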
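
The layering on this slide (a query front end over middleware, with one wrapper per specialized store) can be sketched in a few lines of Python. This is only an illustration of the wrapper idea, not Garlic's actual API: the names `Wrapper`, `RelationalWrapper`, `ImageWrapper`, `Middleware`, and the `search` and `query` methods are all hypothetical, and the in-memory lists stand in for real repositories.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Hypothetical common interface exposed by every repository wrapper."""

    @abstractmethod
    def search(self, term):
        """Return a list of result dicts whose name contains `term`."""


class RelationalWrapper(Wrapper):
    def __init__(self, rows):
        self.rows = rows  # list of dicts, stands in for an RDBMS table

    def search(self, term):
        return [{"source": "rdbms", **row}
                for row in self.rows if term in row["name"]]


class ImageWrapper(Wrapper):
    def __init__(self, captions):
        self.captions = captions  # file name -> caption, stands in for an image store

    def search(self, term):
        return [{"source": "image", "file": name, "name": caption}
                for name, caption in self.captions.items() if term in caption]


class Middleware:
    """Fans one query out to every wrapper and merges the results."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, term):
        results = []
        for wrapper in self.wrappers:
            results.extend(wrapper.search(term))
        return results


# Made-up sample data for illustration only.
middleware = Middleware([
    RelationalWrapper([{"name": "wing length"}, {"name": "body mass"}]),
    ImageWrapper({"img001.jpg": "left wing, dorsal view"}),
])
hits = middleware.query("wing")
```

Each store keeps its own storage format; only the wrapper needs to know how to translate a middleware query into something that store understands.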
7
Background on Annotation
  • Data Annotation in Collaborative Research
    Environments
  • Michael Gertz, Department of Computer Science,
    University of California at Davis, Concept-based
    data annotation techniques for scientific
    databases
  • It is well accepted that the creation,
    management, and utilization of different forms of
    metadata play a major role in realizing
    information system infrastructures that are able
    to provide rich data query, sharing, and
    management techniques.
  • We claim there is still a major gap between the
    creation of such semantically rich structures and
    the usage of these structures to actually enrich
    various forms of data.

8
Background on Annotation
  • Concept Based Data Annotations

[Diagram: concepts (base concepts and relationship-type concepts) are attached through data annotations to web-accessible data, here scientific data at Site A and Site B.]
Source: Dr. Michael Gertz, UC Davis
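
One way to picture concept-based annotation is as a small graph: concepts are related to each other, and annotations tie a concept to a web-accessible data item. The sketch below is an assumption-laden illustration, not Dr. Gertz's design. The class names, the single "narrower" relationship type, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    name: str


@dataclass
class Annotation:
    concept: Concept
    data_url: str  # the web-accessible data item being annotated
    note: str = ""


@dataclass
class ConceptGraph:
    # Maps a concept to its narrower concepts (one hypothetical relationship type).
    narrower: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)

    def add_narrower(self, broad, narrow):
        self.narrower.setdefault(broad, set()).add(narrow)

    def annotate(self, concept, data_url, note=""):
        self.annotations.append(Annotation(concept, data_url, note))

    def find(self, concept):
        """Return URLs annotated with `concept` or any narrower concept."""
        wanted, stack = set(), [concept]
        while stack:
            current = stack.pop()
            if current not in wanted:
                wanted.add(current)
                stack.extend(self.narrower.get(current, ()))
        return sorted(a.data_url for a in self.annotations if a.concept in wanted)


# Made-up example: data at two sites annotated with related concepts.
graph = ConceptGraph()
wing, vein = Concept("wing"), Concept("wing vein")
graph.add_narrower(wing, vein)
graph.annotate(vein, "http://site-a.example/img1")
graph.annotate(wing, "http://site-b.example/img2")
graph.annotate(Concept("antenna"), "http://site-a.example/img3")
```

Searching by concept rather than by location means a query for "wing" finds data at both sites without the user knowing where either item lives.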
9
The Problem
  • The discovery of information relies on the
    ability of scientists to find and access the
    correct data.
  • As such, grids and grid computing have emerged as
    an increasingly common means of sharing large
    amounts of information among collaborating
    organizations.
  • Searches conducted on annotations or metadata are
    still limited because most database and grid
    applications still use ad hoc data storage and
    retrieval techniques.
  • Searches on information still rely on a
    scientist's intimate knowledge of data location,
    format, and how to use specific applications.
  • An annotation tool capable of satisfying the
    requirements of the biological community does not
    currently exist.

10
Research Goals
  • Improve the ability of biological researchers to
    search annotated databases for information they
    need to support their research or findings
  • Suggest that such improvements can be applied to
    other scientific applications

11
Research Objectives
  • Examine current methods of annotation
  • Categorize general features of annotation
  • Define systematic techniques that can be applied
    to current ad hoc annotation methods

12
Projected Accomplishments
  • Identify the functional areas within the data
    grid community
  • Define a relationship model that applies to all
    scientific annotation
  • Develop a transformation model whereby any
    annotation can be expressed as an object (data
    and operations)
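
Reading the last bullet as "an annotation is an object that bundles data with operations" (an interpretation, since the slide text is terse), a minimal sketch might look like the following. The class `PointAnnotation` and its methods are hypothetical. The fields loosely mirror the location and radius columns of the imageannotation table shown later in the deck.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class PointAnnotation:
    # Data: where the annotation sits on an image and what it says.
    image_id: int
    x: int
    y: int
    radius: int
    text: str

    # Operations: behavior that travels with the data.
    def contains(self, px, py):
        """True if pixel (px, py) falls inside the annotated circle."""
        return hypot(px - self.x, py - self.y) <= self.radius

    def matches(self, keyword):
        """Case-insensitive keyword test on the annotation text."""
        return keyword.lower() in self.text.lower()

    def scaled(self, factor):
        """Return a copy whose geometry follows a resized image."""
        return PointAnnotation(
            self.image_id,
            round(self.x * factor),
            round(self.y * factor),
            round(self.radius * factor),
            self.text,
        )


# Made-up example values.
note = PointAnnotation(image_id=7, x=100, y=50, radius=10, text="Wing vein fork")
doubled = note.scaled(2)
```

Because the geometry is stored as coordinates rather than as an overlay image, the annotation can be rescaled with the image and the original image is never modified.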

13
Projected Activities
  • The initial plan is to use the new MorphBank
    database to prove the concept
  • Develop a new MorphBank Schema
  • Develop a new MorphBank Website
  • Develop a reliable and more capable
    multi-annotation software tool to replace the
    I-Note 1.0 annotation package
  • Develop the methods and schemas that will allow
    scientists to extract different annotations from
    Biological images and other objects

14
Graphical Annotation
  • There may be more than one image or object
    associated with a specimen
  • No practical upper limit can be defined
  • Standards are still being defined
  • Each image or object may hold several pieces of
    information.
  • No practical upper limit can be defined
  • Automating annotation is still in the early
    stages.
  • Searching the images themselves for data is not
    feasible in large database systems.
  • Searching large strings of free-entry text is
    also inefficient.

15
Graphical Annotation (cont.)
  • Initially used the I-NOTE software to define the
    requirements for the development of a new piece
    of software to work with Morphbank and on Windows
    XP/Linux.
  • At a minimum, the new tool will be able to
    annotate any addressable object in Morphbank, to
    show that annotations can be mixed.

16
Morphology Publication Example
Riccardi, Annotation Nov 5, 2004
17
Example of Extensible Annotation
Riccardi, Annotation Nov 5, 2004
18
Example of Extensible Annotation
Riccardi, Annotation Nov 5, 2004
19
Example of Extensible Annotation
Riccardi, Annotation Nov 5, 2004
20
Source: http://www.iath.virginia.edu/inote/
21
Source: http://www.iath.virginia.edu/inote/
22
Limitations of I-note Software
  • Currently not supported
  • University of Virginia has cut funding for the
    project.
  • Used University programmers for development
  • Works only on a Windows 95 platform
  • Code is not maintainable; development was done
    in a Java development environment
  • Development project was not documented.
  • Could not attach other objects or documents
  • Only worked on certain graphic images.
  • Annotations were not scalable with the image
  • Annotations were overlay images and had to be
    stored as full images.
  • Cannot address multiple objects.

23
Generic Annotation
  • Need to develop a method to store different
    annotations as objects
  • Need to develop a method to search different
    annotations for similar or associated information
  • Replace ad hoc queries with more systematic
    methods
  • Higher level of ontology for annotations
  • Need to determine the minimum amount of
    information needed to represent and access this
    object

24
Generic Annotation
  • General Requirements
  • Platform and architecture independent
  • Stand-alone application that can function as a
    web service
  • Looking at both server and client side
    applications
  • Exchange of information must be done using web
    service features such as XML documents
  • Annotation on images must include:
  • Multiple annotations per image/object
  • Must not alter the original image/object
  • Must include references to points and areas
  • Must include text, graphics, and voice
  • Must include the ability to make general
    annotation remarks
  • Must be able to associate multiple objects with
    an annotation including other annotations

25
MorphBank Annotation
[Diagram components: Morphbank, Morphbank Viewer, Morphbank Browser, and Annotation Applet, exchanging XML; MorphBank RDBMS and image files.]
26
Biological Database Problems
  • Taxonomy terms and definitions are not
    universally defined
  • Any database system would have to accommodate
    different taxonomic structures
  • Darwin Core standard is not sufficient to satisfy
    this problem
  • Each Biological study group develops their own
    character codes and states
  • There is no standardization
  • Any database system would have to accommodate
    different character codes and states
  • There is currently not enough justification for
    the different Biological communities to develop
    tight integration standards
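
To make "accommodate different character codes and states" concrete, one possible approach is a catalog that maps each study group's local codes to a shared internal identifier without forcing the groups to agree on a standard. The sketch below is hypothetical throughout: the class `CharacterCatalog`, its methods, the group names, and the codes are invented for illustration.

```python
class CharacterCatalog:
    """Maps each group's local character codes to a shared internal ID."""

    def __init__(self):
        self._shared_id = {}    # (group, local code) -> shared ID
        self._description = {}  # shared ID -> description

    def register(self, group, local_code, shared_id, description):
        self._shared_id[(group, local_code)] = shared_id
        self._description.setdefault(shared_id, description)

    def describe(self, group, local_code):
        """Return the description behind a group's local code, or None."""
        shared = self._shared_id.get((group, local_code))
        return self._description.get(shared)

    def translate(self, from_group, local_code, to_group):
        """Return to_group's code for the same character, or None if unknown."""
        shared = self._shared_id.get((from_group, local_code))
        if shared is None:
            return None
        for (group, code), candidate in self._shared_id.items():
            if group == to_group and candidate == shared:
                return code
        return None


# Made-up codes from two hypothetical study groups.
catalog = CharacterCatalog()
catalog.register("group-a", "C12", 1, "forewing vein count")
catalog.register("group-b", "wv-3", 1, "forewing vein count")
```

Each group keeps its own vocabulary; only the mapping is shared, which fits a setting where the communities have little incentive to adopt tight integration standards.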

27
Proposed MorphBank WebServices
  • High-level web services: insertion and update,
    biological data analysis, world browse, search
    discovery, administration, data display
  • Core web services: annotation discovery,
    annotation query, biological query, metadata
    annotation, user validation security, biological
    data display, data validation, annotation
    aggregation, bio data discovery, annotation data
    display
  • Web services access (update, insert, delete,
    query)
  • Service translation library
  • Metadata holdings: other bio DB, character state
    catalog, MorphBank XML files, image XML files,
    image files, MorphBank DB

Based upon the Earth System Grid (ESG) model
28
MorphBank Website
[Site diagram components: intro screen, info/help, login, web/DB administration, world browse, restricted user (add, update, delete, annotate, RU/browse), browse, data sets DS1 to DS3, world read, under review, working data set.]
29
Specimen Table
  • Table structure for table 'specimen'
  • CREATE TABLE specimen (
  • MorphBankSpecimenID int(32) NOT NULL AUTO_INCREMENT,
  • CatalogNumber varchar(128) NOT NULL,
  • DateLastModified date NOT NULL default '0000-00-00',
  • InstitutionCode varchar(128),
  • CollectionCode varchar(128),
  • ScientificName varchar(128),
  • BasisOfRecord char(1),
  • TSN int(32),
  • CollectionNumber varchar(128),
  • FieldNumber varchar(128),
  • CollectorName varchar(128),
  • DateCollected date NOT NULL default '0000-00-00',
  • TimeofDate time,
  • ContinentOcean varchar(128),

30
Specimen Table cont.
  • CONTINUED FROM PREVIOUS PAGE.
  • Country varchar(56),
  • StateProvince varchar(56),
  • County varchar(56),
  • Locality varchar(56),
  • Latitude double,
  • Longitude double,
  • CoordinatePrecision int(8),
  • MinimumElevation int(32),
  • MaximumElevation int(32),
  • MinimumDepth int(32),
  • MaximumDepth int(32),
  • Sex varchar(8),
  • PreparationType varchar(255),
  • IndividualCount int(32),
  • PreviousCatalogNumber varchar(128),
  • RelationshipType varchar(128),
  • RelatedCatalogItem varchar (128),
  • DevelopmentalStage varchar (128),

31
Image Table
  • Table structure for table 'image'
  • CREATE TABLE image (
  • ImageID int(32) NOT NULL AUTO_INCREMENT,
  • MorphBankSpecimenID int(32),
  • ViewNumber int(32),
  • ImageScale varchar(64),
  • XDimensionPixels int(32) NOT NULL,
  • YDimensionPixels int(32) NOT NULL,
  • ResolutionInPixelsPerInch int(32) NOT NULL,
  • OriginalFileName varchar(255) NOT NULL,
  • Magnification varchar(128),
  • ImageFileType varchar(128),
  • PRIMARY KEY (ImageID))
  • ENGINE=MyISAM DEFAULT CHARSET=latin1;

32
Viewtable
  • Table structure for table 'viewtable'
  • CREATE TABLE viewtable (
  • ViewNumber int(32) NOT NULL,
  • ImagingTechnique varchar(128),
  • ImagingPreparationTechnique varchar(128),
  • SpecimenPart varchar(128),
  • ViewAngle varchar(128),
  • Sex varchar(8),
  • DevelopmentalStage varchar(128),
  • PRIMARY KEY (ViewNumber))
  • ENGINE=MyISAM DEFAULT CHARSET=latin1;

33
Objectannotation Table
  • Table structure for table 'imageannotation'
  • CREATE TABLE imageannotation (
  • AcessionNumber int(32),
  • ImageAnnotationSeqNo int(32) NOT NULL AUTO_INCREMENT,
  • CatalogNumber varchar(128) NOT NULL,
  • AnnotationLocX int(32),
  • AnnotationLocy int(32),
  • AnnotationRadius int(32),
  • AnnotationTypeid int(32),
  • PhylogeneticCharacterID int(32),
  • PhylogeneticCharacterStateID int(32),
  • AnnotationAuthor varchar(128),
  • AnnotationDate date DEFAULT '0000-00-00',
  • ImageID int(32),
  • AnnotationObject varchar(255),
  • PRIMARY KEY (ImageAnnotationSeqNo))
  • ENGINE=MyISAM DEFAULT CHARSET=latin1;

34
AnnotationType Table
  • Table structure for table 'annotationtype'
  • CREATE TABLE annotationtype (
  • annotationtypeID int(32) NOT NULL AUTO_INCREMENT,
  • annotationtitle varchar(25),
  • keywords varchar(255),
  • description varchar(128),
  • PRIMARY KEY (annotationtypeID))
  • ENGINE=MyISAM DEFAULT CHARSET=latin1;
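
The imageannotation and annotationtype tables make it possible to search annotations by a coded type instead of scanning free text. The sketch below shows such a lookup using Python's built-in `sqlite3` module. It is an approximation: SQLite stands in for MySQL, only a few columns from each table are kept, and the sample rows and the helper names `build_demo_db` and `annotations_of_type` are made up.

```python
import sqlite3

# Simplified SQLite stand-ins for two of the tables above; only the columns
# needed for the query are kept, and the types are SQLite's, not MySQL's.
SCHEMA = """
CREATE TABLE annotationtype (
    annotationtypeID INTEGER PRIMARY KEY,
    annotationtitle  TEXT
);
CREATE TABLE imageannotation (
    ImageAnnotationSeqNo INTEGER PRIMARY KEY,
    ImageID              INTEGER,
    AnnotationTypeid     INTEGER,
    AnnotationAuthor     TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO annotationtype VALUES (?, ?)",
        [(1, "character state"), (2, "general remark")],
    )
    conn.executemany(
        "INSERT INTO imageannotation VALUES (?, ?, ?, ?)",
        [(1, 10, 1, "author-a"), (2, 10, 2, "author-b"), (3, 11, 1, "author-a")],
    )
    return conn


def annotations_of_type(conn, title):
    """Return (sequence number, image ID) pairs for one annotation type."""
    cursor = conn.execute(
        """
        SELECT a.ImageAnnotationSeqNo, a.ImageID
        FROM imageannotation AS a
        JOIN annotationtype AS t ON t.annotationtypeID = a.AnnotationTypeid
        WHERE t.annotationtitle = ?
        ORDER BY a.ImageAnnotationSeqNo
        """,
        (title,),
    )
    return cursor.fetchall()


conn = build_demo_db()
character_state_hits = annotations_of_type(conn, "character state")
```

The join on an integer type ID is the kind of systematic query the deck argues for, in contrast to searching large free-entry text strings.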

35
PhylogeneticCode Table
  • Table structure for table 'phylogeneticcharacter'
  • CREATE TABLE phylogeneticcharacter (
  • PhylogeneticCharID int(32) NOT NULL AUTO_INCREMENT,
  • CharacterNumber int(32),
  • PublicationID int(32),
  • TSN int(32),
  • CharacterDescription varchar(128),
  • ViewID int(32),
  • Sex varchar(8),
  • Stage varchar(128),
  • SimilarEntries varchar(128),
  • RelatedCharacterID int(32),
  • RelationType varchar(128),
  • SuggestedTaxonRange varchar(128),
  • PRIMARY KEY (PhylogeneticCharID))
  • ENGINE=MyISAM DEFAULT CHARSET=latin1;

36
Phylogeneticstate Table
  • Table structure for table 'phylogeneticstate'
  • CREATE TABLE phylogeneticstate (
  • StateID int(32) NOT NULL AUTO_INCREMENT,
  • phylogeneticcharID int(32) NOT NULL,
  • Description varchar(128),
  • ImageID int(32),
  • AnnotationSequenceNumber int(32),
  • PRIMARY KEY (StateID))
  • ENGINE=MyISAM DEFAULT CHARSET=latin1;

37
SpecimenPhyChar Table
  • Table structure for table 'SpecimenPhyChar'
  • CREATE TABLE SpecimenPhyChar (
  • SpecimenPhyCharID int(32) NOT NULL AUTO_INCREMENT,
  • SpecimenID int(32) NOT NULL,
  • PhylogeneticCharID int(32),
  • ImageID int(32),
  • ImageAnnotationSeqNo int(32),
  • PRIMARY KEY (SpecimenPhyCharID))
  • ENGINE=MyISAM DEFAULT CHARSET=latin1;

38
Publication Table
  • Table structure for table 'publicationtable'
  • CREATE TABLE publicationtable (
  • PublicationID int(32) NOT NULL AUTO_INCREMENT,
  • PublicationAuthor varchar(128),
  • PublicationYear char(4),
  • PublicationJournal varchar(128),
  • PublicationTitle varchar(128),
  • PublicationPagesFrom int(32),
  • PublicationPagesto int(32),
  • PRIMARY KEY (PublicationID))
  • ENGINE=MyISAM DEFAULT CHARSET=latin1;

39
UserTable
  • Table structure for table 'usertable'
  • CREATE TABLE usertable (
  • UserID int(32) NOT NULL AUTO_INCREMENT,
  • Level int(8),
  • UIN int(8),
  • PIN int(16),
  • Name varchar(128),
  • Email varchar(128),
  • Affiliation varchar(128),
  • Address varchar(255),
  • Country varchar(128),
  • GroupID int(32),
  • PRIMARY KEY (UserID))
  • ENGINE=MyISAM DEFAULT CHARSET=latin1;

40
GroupTable
  • Table structure for table 'grouptable'
  • CREATE TABLE grouptable (
  • GroupID int(32) NOT NULL,
  • GroupName varchar(128) NOT NULL,
  • User int(32),
  • PRIMARY KEY (GroupID))
  • ENGINE=MyISAM DEFAULT CHARSET=latin1;

41
Expected Challenges
  • The effort is contingent upon development of a
    reliable annotation toolset
  • Development of a generic biological schema
  • Integration of web services with the new
    MorphBank system and other Biological Database
    Systems
  • Obtaining consensus among the different
    participants on basic biology ontology issues
  • Possible use of a general biological thesaurus

42
Masters Thesis/Projects
  • MorphBank Requirements Analysis (Thesis/Project)
  • MorphBank Module Implementation (Project)
  • MorphBank Security (Thesis/Project)
  • MorphBank Mirror Site Implementation
    (Thesis/Project)
  • MorphBank Operational Site Procedures (Project)

43
Masters Thesis/Projects
  • Biological Image eXchangE System (BIXES)
  • A method and associated software to allow
    heterogeneous Biological Image Database Systems
    to exchange images and metadata (project/thesis)
  • Biological Image Search technique (School of
    Computation Sciences research project/thesis)

44
Conclusion
  • More efficient search on large scientific data
    systems
  • Demonstrate that this approach works for
    biological databases
  • Show that it is feasible for any scientific
    application
  • Provide a new and supported annotation tool set
    that can be used across the web.