DISCOVERYspace - PowerPoint PPT Presentation

DISCOVERYspace: A platform for gene expression
analysis
Varhol RJ, Leung D, Robertson N, Oveisi-Fordoei
M, Fjell C, Zuyderduyn S, Siddiqui A, Marra M,
Jones S
DISCOVERYplatform (www.bcgsc.ca/discoveryspace)
Gene Expression
CMOST DISCOVERYspace Plug-in
The CMOST plug-in acts as a user interface for
the CMOST database. It uses the DISCOVERYspace
framework to allow the user to map experimentally
derived SAGE tags using the CMOST approach. It
can also draw upon the many resources and
features that DISCOVERYspace has to offer.
Serial analysis of gene expression (SAGE) is a
technique that allows a cell's transcriptome
to be globally characterized. By identifying
genes that are abnormally abundant or absent in
two differing cell types (e.g. normal vs.
cancer), researchers hope to identify the
molecular mechanisms of cancer and find potential
diagnostic or treatment targets.
  • The availability of large amounts of gene
    expression data, improving technologies and the
    increasing number of specialized databases
    provide the potential for greater understanding
    of the molecular characteristics of a cell.
    Differential expression studies can focus on the
    localization or synchronization aspects of gene
    expression as well as atypical alterations of a
    cell's transcriptome as a result of disease.
  • Both the complexity of a cell's transcriptome and
    the amount of data that is available can make
    data manipulation very challenging for
    non-bioinformaticians.
  • We have made an attempt to eliminate many of the
    hardships involved in gene expression analysis
    by building an application that is visual,
    flexible and comprehensive.
  • This is accomplished by:
      • Systematically warehousing existing biological
        knowledge
      • Assigning visual context to raw knowledge and
        the resulting analysis
      • Providing a robust yet flexible underlying
        architecture for rapidly developing software
        plugins for specific experimental approaches
  • The DISCOVERY platform consists of a data
    warehousing system, DISCOVERYdb, which stores
    various public/private datasources and
    provides additional, pre-computed annotations.
    The client, DISCOVERYspace, accesses the data
    through an ontology layer which provides an
    intuitive abstraction of the underlying
    relational structures. This abstraction allows
    the client to navigate the qualities and links of
    data objects without requiring knowledge of the
    storage mechanisms.

GO Browser
The SAGE plugin offers functionality to analyze
SAGE data within DISCOVERYspace by providing
tools to identify statistically significant
observations. Expression profiles can be compared
and visualized for pair-wise and multi-comparison
analysis, using a Venn Table, Venn Diagram or Two
Dimensional Expression Viewer.
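The pair-wise comparison behind a Venn Table can be illustrated with a minimal Python sketch. It is not the plugin's code: the function name venn_partition, the library names and the tag counts are hypothetical, and each library is assumed to be a simple mapping from tag sequence to observed count.

```python
from typing import Dict, Set


def venn_partition(lib_a: Dict[str, int], lib_b: Dict[str, int]) -> Dict[str, Set[str]]:
    """Split the tags of two libraries into the three regions of a two-set Venn diagram."""
    tags_a, tags_b = set(lib_a), set(lib_b)
    return {
        "only_a": tags_a - tags_b,   # tags seen only in the first library
        "both": tags_a & tags_b,     # tags shared by both libraries
        "only_b": tags_b - tags_a,   # tags seen only in the second library
    }


# Hypothetical tag counts for two libraries (illustrative values only).
normal = {"AAAAAAAAAA": 12, "CCCCCCCCCC": 3, "GGGGGGGGGG": 7}
cancer = {"CCCCCCCCCC": 9, "GGGGGGGGGG": 1, "TTTTTTTTTT": 4}

regions = venn_partition(normal, cancer)
for name in ("only_a", "both", "only_b"):
    print(name, sorted(regions[name]))
```

A statistical test on the counts of the shared tags would be the natural next step; the poster does not say which test the plugin applies, so none is sketched here.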
Tag to Gene Mapping
SAGE: Serial Analysis of Gene Expression. An
experimental technique that splices a short
(10 bp) nucleotide fragment from the 3'-most
NlaIII site of cDNAs derived from mRNA
transcripts. These tags are randomly concatenated
into long nucleotide chains suitable for
sequencing, providing a profile of the cell's
transcriptome.
Transcriptome: All sequences transcribed from the
nuclear and mitochondrial genomes.
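The tag definition above can be sketched in Python as follows. This is only an illustration, not code from DISCOVERYspace: the function name extract_tag and the example sequence are hypothetical, the sequence is assumed to be given 5' to 3', and returning None when fewer than 10 bases follow the site is an assumption made for simplicity. CATG is the NlaIII recognition sequence.

```python
from typing import Optional

ANCHOR_SITE = "CATG"  # NlaIII recognition sequence
TAG_LENGTH = 10       # tag length in bases, as stated in the glossary


def extract_tag(cdna: str, tag_length: int = TAG_LENGTH) -> Optional[str]:
    """Return the tag that follows the 3'-most NlaIII site, or None if there is none."""
    seq = cdna.upper()
    pos = seq.rfind(ANCHOR_SITE)  # rfind gives the last (3'-most) occurrence
    if pos == -1:
        return None
    start = pos + len(ANCHOR_SITE)
    tag = seq[start:start + tag_length]
    # Assumption: a site too close to the 3' end yields no usable tag.
    return tag if len(tag) == tag_length else None


# Hypothetical cDNA with two NlaIII sites; only the last one defines the tag.
example = "GGCATGAAACCCGGGTTTCATGACGTACGTACTTTT"
print(extract_tag(example))
```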
The Gene Ontology Consortium publishes a database
of biological terms organized into a hierarchy.
This database has become increasingly referenced
by other biological databases such as LocusLink
and third-party annotations such as GOA. The GO
Browser is an integrated component within the
DISCOVERYspace platform. It allows the user to
characterize sets of biological records, even
very large ones, by reference to associated Gene
Ontology terms. The GO Browser supports records
from any biological database that relates to GO
terms. Currently RefSeq, LocusLink, InterPro,
Swiss-Prot, MGC and UniGene are supported. After
finding the
terms directly associated with a given set of
biological records, the Browser navigates the GO
hierarchy to score terms based upon indirect,
ancestral relationships. These ancestral
associations allow the user to make deductions
about commonalities within the set of biological
records being analyzed.
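The idea of scoring terms through direct and ancestral associations can be sketched in Python. The miniature hierarchy, the record identifiers and the function names below are hypothetical, and scoring the ancestral associations as a percentage of the record set is an assumption; this is not the GO Browser's implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical miniature hierarchy: term -> its direct parents.
PARENTS: Dict[str, Set[str]] = {
    "oxidoreductase activity": {"catalytic activity"},
    "kinase activity": {"catalytic activity"},
    "catalytic activity": {"molecular function"},
    "molecular function": set(),
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Collect every term reachable by walking up the hierarchy from `term`."""
    found: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found


def score_terms(records: Dict[str, Set[str]],
                parents: Dict[str, Set[str]]) -> Dict[str, Tuple[float, float]]:
    """Map each term to (direct %, ancestral %) of the record set."""
    total = len(records)
    if total == 0:
        return {}
    direct: Dict[str, int] = {}
    ancestral: Dict[str, int] = {}
    for terms in records.values():
        inherited: Set[str] = set()
        for term in terms:
            direct[term] = direct.get(term, 0) + 1
            inherited |= ancestors(term, parents)
        for term in inherited:  # each record counts once per ancestral term
            ancestral[term] = ancestral.get(term, 0) + 1
    return {
        term: (100.0 * direct.get(term, 0) / total,
               100.0 * ancestral.get(term, 0) / total)
        for term in set(direct) | set(ancestral)
    }


# Hypothetical record set: record id -> directly associated terms.
records = {
    "rec1": {"oxidoreductase activity"},
    "rec2": {"oxidoreductase activity"},
    "rec3": {"kinase activity"},
    "rec4": {"catalytic activity"},
    "rec5": {"kinase activity"},
}
scores = score_terms(records, PARENTS)
for term, (direct_pct, ancestral_pct) in sorted(scores.items()):
    print(f"{term}: direct {direct_pct:.0f}%, ancestral {ancestral_pct:.0f}%")
```

Keeping the two percentages separate lets a broad term such as "catalytic activity" show both how many records name it directly and how many reach it only through more specific terms.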
One of the limitations of the SAGE technique is
the difficulty of associating biological meaning
with each of the generated SAGE tags, a step
referred to as tag mapping. Annotation is
commonly assigned to SAGE data by mapping each
tag directly to a limited selection of databases.
These approaches, however, do not account for
insertions, deletions, SNPs, sequencing errors,
alternative transcripts, antisense tags and other
anomalies. They also discard tags that appear
only once in a SAGE library (singletons), which
reduces the potential for gene discovery. Our
Comprehensive Mapping of SAGE Tags (CMOST)
approach improves upon current techniques by
accounting for all of the above-mentioned
sequence variations: experimental SAGE tags
(sense and antisense) are perturbed and mapped
to virtual tags. The virtual tags are extracted
from seven different datasources and stored
within DISCOVERYdb; a comprehensive mapping is
then attempted against seven publicly available
databases, allowing for a greater chance of
discovering new genes.
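Tag perturbation of this kind can be sketched in Python. This is an illustration under stated assumptions rather than the CMOST implementation: all function names are hypothetical, only single-base changes are generated, and the insertion and deletion variants are not re-trimmed or extended back to the original tag length.

```python
from typing import Set

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(tag: str) -> str:
    """Return the antisense version of a tag."""
    return tag.translate(COMPLEMENT)[::-1]


def substitutions(tag: str) -> Set[str]:
    """All tags that differ from `tag` at exactly one position."""
    return {tag[:i] + b + tag[i + 1:]
            for i in range(len(tag)) for b in BASES if b != tag[i]}


def deletions(tag: str) -> Set[str]:
    """All tags obtained by removing one base."""
    return {tag[:i] + tag[i + 1:] for i in range(len(tag))}


def insertions(tag: str) -> Set[str]:
    """All tags obtained by inserting one base at any position."""
    return {tag[:i] + b + tag[i:]
            for i in range(len(tag) + 1) for b in BASES}


def perturb(tag: str) -> Set[str]:
    """Sense and antisense tag plus every single-base variant of each."""
    variants: Set[str] = set()
    for strand in (tag, reverse_complement(tag)):
        variants.add(strand)
        variants |= substitutions(strand) | deletions(strand) | insertions(strand)
    return variants


# Hypothetical 10-base experimental tag.
candidates = perturb("ACGTACGTAC")
print(len(candidates), "candidate sequences to look up among the virtual tags")
```

Each candidate would then be looked up among the stored virtual tags; how matches from different perturbations are ranked against one another is not described in the poster.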
The platform has been used to assist
investigations in model organisms, such as the
elucidation of cell death in early Drosophila
development and the mechanisms of ageing in
C. elegans, as well as human studies of
early-stage lung cancer and telomerase-induced
cell immortality.
DISCOVERYdb
The underlying foundation of the DISCOVERY
platform is a database warehouse consisting of
over 20 publicly available datasources that
describe genes, pathways, clinical information,
functional genetic components, and biological
vocabularies, all of which are accessible within
DISCOVERYspace. The collection of datasources can
be rapidly expanded through the addition of
public/private databases by means of
datasource-specific administration tools. Once a
datasource has been assimilated into the network,
it is updated automatically on a schedule
specified by the database administrator. Most
major datasources cross-reference one another,
allowing queries to traverse the vast knowledge
space with relative ease. However, when
datasources do not internally reference other
datasources, an analytical linkage is created,
generating new relationships. These large-scale
computational analyses are based on similarity
approaches, and their results are stored within
the database to allow for quick access.
Virtual SAGE tags (sense and antisense) are
extracted from RefSeq, MGC, Golden Path, Ensembl
(Transcripts, EST Transcripts, Transcription
units) and GenBank (Mitochondria, Non-protein
coding genes) and stored within DISCOVERYdb.
Availability: DISCOVERYspace is an open source
effort. Platform builds are available for
download at www.bcgsc.ca/discoveryspace. The
software has been developed and tested on both
Linux and Windows platforms.
Records from previous analyses can be selectively
isolated from other tools within DISCOVERYspace
and imported into the GO Browser, for further
characterization.
Tag modification
By mapping tags through this comprehensive
approach, which uses a broad range of data
sources, includes antisense tags and takes tag
modification anomalies into account, the
tag-to-gene mapping rate is increased from 40-50%
with common methods to 80-90%. This methodology is
used extensively for most mapping projects,
including Mouse Atlas.
Single Base Modification (e.g. 4th base)
GO terms that have direct associations within a
selected sub-set can be identified, providing the
ability to determine the type of biological
activity for each item in the sub-set. Each term
is scored by the percentage of the selected
record set that is associated with the particular
term. This percentage is indicated by the blue
bar. For example, the top row indicates that 40%
of the selection is directly associated with the
term 'oxidoreductase activity'. Four of these
terms also have ancestral associations with the
REFSEQ set in addition to the direct
associations, indicated by the red bars.
DISCOVERYspace
Single Base Insertion (e.g. into the 2nd base
position)
Single Base Deletion (e.g. the 10th base position)
The DISCOVERYspace software obtains much of its
information from DISCOVERYdb and is designed to
facilitate a fast, flexible and intuitive
visualization of experimental and genomic data.
The display and organization of the data are
user-directed, allowing for specialized
visualization tools to be displayed for added
functionality. The application contains a number
of useful features such as cut-and-paste
functionality between common MS Office software
packages, keyword searches and the ability to
cache relevant information for quick resumption
of analysis.
Ancestrally associated terms can be utilized to
provide a macro overview of how experimentally
associated data are biologically characterized.
Each term is scored by strength of association to
the selected set.
A comparison of different tag-to-gene mapping
methods for three pooled Mouse Atlas libraries.