Making Sense of Public Domain Expression Data - GeneVestigator



1
Making Sense of Public Domain Expression Data-
GeneVestigator
2
On the Agenda:
  • Microarray database characteristics
  • Pros and cons
  • Examples
  • GEO and ArrayExpress
  • GeneVestigator - a meta-analytical approach

3
Meta-data in Microarray Experiments
Gene expression studies generate large amounts of
data!
Source: http://titan.biotec.uiuc.edu/cs491jh/slides/cs491jh-Yong.ppt
(slide "Capturing Data and Meta-data in Microarray Experiments")
4
Properties of High-throughput Data
Microarray databases can accept, store and export
(share) large quantities of data. The stored data
cover:
  • Many genes
  • Many samples
  • Various organisms/tissues
  • A variety of biological phenomena
  • Time courses
  • Replicates
  • Different technologies
  • Various data formats
Data retrieval: user-friendly web-based interfaces
Links to analysis tools
5
Gene Expression Matrix
The final gene expression matrix (on the right)
is needed for higher-level analysis and mining.
[Figure: gene expression matrix; labels: Samples, Genes, Gene expression levels]
Source: http://titan.biotec.uiuc.edu/cs491jh/slides/cs491jh-Yong.ppt
(slide "Gene Expression Matrix")
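To make the matrix concrete, here is a minimal Python sketch that assembles a genes x samples matrix from individual (gene, sample, value) measurements. The record layout, the gene and sample names, and the use of None for unmeasured cells are illustrative assumptions, not a format used by any particular repository.

```python
from typing import List, Optional, Tuple

# One measurement: (gene ID, sample ID, expression level). Hypothetical layout.
Record = Tuple[str, str, float]


def build_matrix(
    records: List[Record],
) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Return (genes, samples, matrix) with matrix[i][j] holding the level of
    genes[i] in samples[j]; cells that were never measured stay None."""
    genes = sorted({gene for gene, _, _ in records})
    samples = sorted({sample for _, sample, _ in records})
    row = {gene: i for i, gene in enumerate(genes)}
    col = {sample: j for j, sample in enumerate(samples)}
    matrix: List[List[Optional[float]]] = [[None] * len(samples) for _ in genes]
    for gene, sample, value in records:
        matrix[row[gene]][col[sample]] = value
    return genes, samples, matrix


if __name__ == "__main__":
    demo = [
        ("geneA", "sample1", 5.2),
        ("geneA", "sample2", 7.9),
        ("geneB", "sample1", 3.1),
    ]
    print(build_matrix(demo))
```

Rows are genes and columns are samples, so each row is one gene's expression profile, which is the unit most downstream mining (clustering, similarity search) works on.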
6
Microarray Data Precision and Loss
[Image: electron microscopy]
Only provided in 0.1% of public experiments.
Processed data loses precision!
90% of CEL files generated from microarray
experiments have never been deposited in any
repository. Stokes et al., BMC Bioinformatics 2008,
9(Suppl 6):S18
http://www.bio-miblab.org/arraywiki
7
Microarray Data Formats
  • A. Raw image data: the intensity of the signal at
    each spot is roughly proportional to the expression
    level of the gene under test.
  • Image intensities are quantified using image
    analysis software.
  • B. Raw numerical data (signal intensities).
  • C. Processed data.

[Figure panels A, B and C]
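The step from format B to format C can be illustrated with a deliberately simple Python sketch: log2-transform the raw signal intensities, then subtract the array median so that arrays sit on a comparable scale. This transform is an illustrative assumption; real pipelines (for example RMA or MAS5 for Affymetrix data) also perform background correction and probe summarization. It also shows why processed data loses information: only the centred values are kept, so the original intensities cannot be recovered from them.

```python
import math
from statistics import median
from typing import List


def process_array(raw_intensities: List[float]) -> List[float]:
    """Format B -> format C (illustrative only): log2 transform,
    then centre on the array median."""
    if any(x <= 0 for x in raw_intensities):
        raise ValueError("intensities must be positive to take log2")
    logged = [math.log2(x) for x in raw_intensities]
    centre = median(logged)
    # The median itself is discarded here, so the raw scale is lost.
    return [value - centre for value in logged]


if __name__ == "__main__":
    print(process_array([100.0, 400.0, 1600.0]))  # approximately [-2, 0, 2]
```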
8
Problem: Raw Data
  • A complete description of complex experiments is
    desired.
  • We don't always know what's important.
  • Noise probes could end up being informative
    (e.g. detection of a splice variant).
  • The future:
      • Better (more accurate) summarization algorithms
        will emerge.
      • New uses for raw data may emerge.
  • Challenge: store the raw data in an accessible form.

Different labs have different needs; a central
system is needed!
9
Complexity and Categories of Data
and the 6 Parts of MIAME
The MIAME (Minimum Information About a Microarray
Experiment) guidelines specify the minimum
information that should be published with a
microarray experiment. Brazma et al. (2001),
Nature Genetics 29(4), 365-71
  • Publication
  • Experimental design
  • Sample: source, treatment, preparation, labelling
  • Source (e.g., taxonomy)
  • Array design
  • Normalization
  • Data measurements
Source: http://www.ict.ox.ac.uk/odit/projects/digitalrepository/docs/workshop/Helen_Parkinson-RDMW0608.ppt
(slide 18)
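A checklist like the one above maps naturally onto a record type. The Python sketch below is a hypothetical completeness check built from the categories listed on this slide; the field names are assumptions, and this is not the official MIAME schema (formal MIAME-compliant submissions use formats such as MAGE-TAB).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ExperimentRecord:
    """Hypothetical record mirroring the slide's categories (not official MIAME)."""
    publication: Optional[str] = None
    experimental_design: Optional[str] = None
    sample_description: Optional[str] = None  # source, treatment, prep., labelling
    source_taxonomy: Optional[str] = None
    array_design: Optional[str] = None
    normalization: Optional[str] = None
    data_measurements: Optional[str] = None


def missing_categories(record: ExperimentRecord) -> List[str]:
    """Return the names of categories that were left empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


if __name__ == "__main__":
    rec = ExperimentRecord(experimental_design="time course", array_design="custom array")
    print(missing_categories(rec))
```

A repository could run such a check at submission time and refuse records with missing categories, which is the practical point of a "minimum information" standard.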
10
Microarray Database Repositories are Biased
The relative size of each pie corresponds to the
number of experiments contained in each
repository.
[Figure annotations: "All human data"; "Mostly old data"; "Mostly custom arrays"; "Mostly human data"; "Mainly Affy chips"]
Stokes et al., BMC Bioinformatics 2008, 9(Suppl 6):S18
http://www.biomedcentral.com/1471-2105/9/S6/S18
11
Overlaps of Data Between Repositories
Stokes et al., BMC Bioinformatics 2008, 9(Suppl 6):S18
http://www.biomedcentral.com/1471-2105/9/S6/S18
[Figure: total experiments: 2376; dates: August 2005, June 2006]
12
User-Friendly Microarray Databases
  • Many gene expression databases exist, both
    commercial and non-commercial.
  • Most focus on a particular technology, a
    particular organism, or both.
  • We will discuss the most promising ones:
      • ArrayExpress, EBI (AE)
      • Gene Expression Omnibus, NCBI (GEO)
      • GeneVestigator

13
http://www.ncbi.nlm.nih.gov/geo/
The Gene Expression Omnibus (GEO) is a public
repository, part of NCBI's Entrez system, that holds
high-throughput gene expression data. It is hosted
at the National Library of Medicine (NIH). GEO was
designed to accommodate diverse types of data.
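Because GEO DataSets are indexed in Entrez, they can also be searched programmatically. The Python sketch below only builds a query URL for NCBI's E-utilities ESearch service against the GEO DataSets database (`db=gds`); E-utilities is not mentioned on this slide, so treat the endpoint choice and the example search term as assumptions, and check NCBI's usage guidelines before sending real requests.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed here as the programmatic entry point).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gds_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that searches the GEO DataSets ('gds') database."""
    params = {"db": "gds", "term": term, "retmax": retmax}
    return EUTILS_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Only prints the URL; no network request is made.
    print(gds_search_url("dystrophin muscular dystrophy"))
```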
14
Gene Expression Omnibus - Experiment-centered view
(GDS)
15
Gene Expression Omnibus - Gene-centered view
Expression profile of the Dystrophin gene in a
DataSet examining skeletal muscle biopsies from
12 Duchenne muscular dystrophy patients and 12
normal subjects.
  • Red bars: level of abundance of an individual
    transcript across the Samples that make up a
    DataSet. Values are presented in arbitrary units:
    for single-channel arrays they are normalized
    signal count data; for dual-channel arrays the
    submitted values are normalized log ratios.
  • Blue squares: rank order, indicating where the
    expression of that gene falls relative to all
    other genes on that array (enrichment).
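The two value types above can be sketched in a few lines of Python. The percentile rank is one plausible way to express "where a gene falls relative to all other genes on the array"; GEO's exact rank computation is not described on this slide, so the definition below is an assumption. The dual-channel helper computes a plain log2 ratio of the two channel signals (e.g. Cy5/Cy3), before any normalization.

```python
import math
from typing import List


def percentile_rank(value: float, array_values: List[float]) -> float:
    """Percentage (0-100) of values on the array that are <= value."""
    if not array_values:
        raise ValueError("array_values must not be empty")
    at_or_below = sum(1 for v in array_values if v <= value)
    return 100.0 * at_or_below / len(array_values)


def log2_ratio(channel1: float, channel2: float) -> float:
    """Unnormalized log2 ratio of two channel signals, e.g. Cy5 / Cy3."""
    if channel1 <= 0 or channel2 <= 0:
        raise ValueError("channel signals must be positive")
    return math.log2(channel1 / channel2)


if __name__ == "__main__":
    print(percentile_rank(30.0, [10.0, 20.0, 30.0, 40.0]))  # 75.0
    print(log2_ratio(8.0, 2.0))  # 2.0
```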
16
http://www.ebi.ac.uk/microarray-as/ae/
Metsada Pasmanik-Chor, TAU Bioinformatics Unit,
19/3/09
17
Query ArrayExpress
[Screenshot labels: Annotations; Experiments and description; Click; Condition; Gene name; Species]
Results: a list of all experiments, ordered by
p-value. For each experiment: a short description,
the experimental factors and the gene expression.
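The result ordering described above amounts to sorting experiment hits by p-value. The Python sketch below assumes a hypothetical result record (the field names are not ArrayExpress's) and adds an optional p-value cut-off, which is an illustration rather than a documented ArrayExpress feature; the default of 1.0 keeps every experiment, matching the "list of all experiments" behaviour.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExperimentHit:
    """One result row; field names are hypothetical."""
    accession: str
    description: str
    experimental_factor: str
    p_value: float


def order_by_p_value(hits: List[ExperimentHit], max_p: float = 1.0) -> List[ExperimentHit]:
    """Keep hits with p_value <= max_p, most significant first."""
    kept = [hit for hit in hits if hit.p_value <= max_p]
    return sorted(kept, key=lambda hit: hit.p_value)


if __name__ == "__main__":
    demo = [
        ExperimentHit("EXP1", "heat shock", "time", 0.04),
        ExperimentHit("EXP2", "drug series", "dose", 0.001),
    ]
    for hit in order_by_p_value(demo):
        print(hit.accession, hit.p_value)
```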
18
Query ArrayExpress: similarly expressed genes
Select the "find 3 closest genes" option. IER2,
FOS and JUN have expression similar to nfkbia.
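A "find k closest genes" search can be sketched as ranking every other gene by the correlation of its expression profile with the query gene's profile. The slide does not say which similarity measure ArrayExpress uses; Pearson correlation is assumed here, and the gene names and values in the usage example are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation undefined, treat as unrelated
    return cov / (sx * sy)


def closest_genes(query: str, profiles: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k genes whose profiles correlate most strongly with the query."""
    target = profiles[query]
    scored = [(gene, pearson(target, p)) for gene, p in profiles.items() if gene != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    demo = {
        "query": [1.0, 2.0, 3.0, 4.0],
        "g1": [2.0, 4.0, 6.0, 8.0],
        "g2": [1.0, 2.0, 3.0, 5.0],
        "g3": [4.0, 3.0, 2.0, 1.0],
        "g4": [1.0, 1.0, 2.0, 2.0],
    }
    print(closest_genes("query", demo))  # g1, g2, g4; g3 is anti-correlated
```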
19
HeatMap Atlas Output
[Figure labels: Number of up/down-regulated genes; Experimental condition]
http://www.ebi.ac.uk/microarray-as/atlas/qr?q_genesaa4q_updnupdnq_orgnMUSMUSCULUSq_expt28allconditions29viewheatmapview
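The heatmap summarizes, per experimental condition, how many genes are up- or down-regulated. A minimal Python sketch of that tally is shown below; the input layout (gene, condition, log2 fold change, p-value) and the 0.05 significance cut-off are assumptions, not the Atlas's documented statistics. Conditions with no significant genes simply do not appear in the output.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (gene, condition, log2 fold change, p-value): a hypothetical input layout.
Result = Tuple[str, str, float, float]


def count_up_down(results: Iterable[Result], p_cutoff: float = 0.05) -> Dict[str, Dict[str, int]]:
    """Count significantly up- and down-regulated genes per condition."""
    counts: Dict[str, Counter] = {}
    for _gene, condition, log2_fc, p_value in results:
        if p_value > p_cutoff or log2_fc == 0:
            continue  # not significant, or no change in either direction
        direction = "up" if log2_fc > 0 else "down"
        counts.setdefault(condition, Counter())[direction] += 1
    return {cond: {"up": c["up"], "down": c["down"]} for cond, c in counts.items()}


if __name__ == "__main__":
    rows = [
        ("g1", "heat", 1.5, 0.01),
        ("g2", "heat", -2.0, 0.001),
        ("g3", "heat", 0.8, 0.2),
        ("g1", "cold", 1.1, 0.03),
    ]
    print(count_up_down(rows))
```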
20
GeneVestigator: a reference expression database
and meta-analysis system
21
Genevestigator: a system for the meta-analysis
of microarray data
A database and Web-browser data-mining interface
for Affymetrix GeneChip data, based on the new
concept of Meta-Profiles, which rely on reference
expression databases. It allows biologists to study
the expression and regulation of genes in a broad
variety of contexts by summarizing information
from hundreds of manually curated microarray
experiments. Workspaces and views can be saved
to files and re-opened in a later analysis
session (.gvw, which stands for
Genevestigator Workspace).
[Figure labels: Application server; Java application; Analysis output]
Source: http://bar.utoronto.ca/ICAR19/ICAR19_BioinfoWorkshop%20-%20Genevestigator.ppt
(slide "Overview of the Genevestigator system")
22
Database Content and Quality
  • Database consist of large and various manually
    curated and quality-controlled Affymetrix chips
  • Quality control of EACH experiment is manually
    done by Genevestigator curators using a pipeline
    of Bioconductor packages performing
    normalization and probe-level analysis.
  • Low quality arrays are characterized by
  • fall out of range relative to the other arrays
    from the same experiment,
  • exhibit higher RNA degradation,
  • particularly noisy,
  • do not correlate with replicate samples.
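The last criterion, poor correlation with replicates, is easy to sketch. The Python function below flags an array when even its best Pearson correlation with any other replicate falls below a threshold. This is a simplified stand-in, not the curators' Bioconductor pipeline; the 0.9 threshold and the "best correlation" rule are assumptions (the best match is used so that one bad array does not drag down the scores of the good replicates).

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def flag_poor_replicates(replicates: Dict[str, List[float]], min_r: float = 0.9) -> List[str]:
    """Flag arrays whose best correlation with any other replicate is below min_r."""
    flagged = []
    for name, values in replicates.items():
        others = [pearson(values, v) for other, v in replicates.items() if other != name]
        if others and max(others) < min_r:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    reps = {
        "r1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "r2": [1.1, 2.0, 3.2, 3.9, 5.1],
        "r3": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
    print(flag_poor_replicates(reps))  # ['r3']
```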

23
User Hardware Requirements
  • Genevestigator is a web-based application running
    in Java.
  • A Java applet provides several advantages:
      • users don't have to install any software,
      • users always work with the latest software
        release,
      • Java is more powerful than HTML/JavaScript for
        data manipulation.
  • To run the application, client machines must have
    the Java Runtime Environment (JRE version 1.4.2 or
    higher) installed (usually available by default on PCs).
  • The JRE is freely available for download from Sun
    Microsystems (http://www.Java.com).
  • To work optimally with the Genevestigator
    application, we recommend:
      • a screen resolution of 1024 x 768 or higher,
      • preferably 512 MB of RAM or more.

24
GeneVestigator Species Availability
Mammals
  Species   Arrays                                Number of arrays
  Human     Human Genome 10k, 20k, 47k (133_2)    1109, 3786, 2782
  Mouse     Mouse Genome 12k, 40k                 3071, 1967
  Rat       Rat Genome 8k, 31k                    2146, 858

Plants (Arabidopsis, Barley, Rice, Soybean)
  Species       Arrays                    Number of arrays
  Arabidopsis   Arabidopsis Genome 22k    3110
  Barley        Barley Genome 22k         706
  Rice          Rice Genome 22k           -
25
Data Sources and Referencing
The Genevestigator analysis platform comprises a
large database of manually curated microarray
experiments collected from the public domain or
from individual contributors. The array
annotations necessary for data analysis were
retrieved from public repositories and/or, where
insufficient, from the authors
themselves. Genevestigator contains data from
the following repositories and databases:
  • Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/
  • ArrayExpress: http://www.ebi.ac.uk/arrayexpress/
  • ChipperDB: http://chipperdb.chip.org/adb/adb-home
  • The Arabidopsis Information Resource (TAIR): http://www.arabidopsis.org/
  • MUSC Microarray Database: httpproteogenomics.musc.eduma
  • Public Expression Profiling Resource (PEPR): http://pepr.cnmcresearch.org
  • NASC Microarray Database (NASCArrays): http://affymetrix.arabidopsis.info/narrays/experimentbrowse.pl
  • NIH Neuroscience Microarray Consortium: http://arrayconsortium.tgen.org/np2/home.do
  • Gene Expression Open Source System (GEOSS): https://genes.med.virginia.edu/intro to geoss.html
  • RNA Abundance Database (RAD): http://www.cbil.upenn.edu/RAD/php/index.php
26
GeneVestigator focuses on gene expression in the
context of:
  1. Time (gene expression during stages of
    development/life cycle).
  2. Space (tissue-specific expression).
  3. Response (expression caused by stimuli: biotic
    stress, abiotic stress, chemicals, hormones,
    light, drug treatment, disease).

Users can query the database to retrieve the
expression patterns of individual genes
across chosen environmental conditions,
growth stages, or organs. Conversely, mining
tools allow users to identify genes specifically
expressed during selected stresses, growth
stages, or in particular organs.
Access: free / by license
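The two query directions described above (gene to conditions, and condition to genes) can be sketched over a simple nested dictionary of expression values grouped by condition. This Python example is an assumption-laden illustration, not Genevestigator's actual Meta-Profile computation: it averages a gene's values per condition, and it scores "specific" genes by their mean in the chosen condition minus their mean in all other conditions. The gene and condition names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# expression[gene][condition] -> values from arrays annotated with that condition
Expression = Dict[str, Dict[str, List[float]]]


def meta_profile(expression: Expression, gene: str) -> Dict[str, float]:
    """Gene -> conditions: mean level of one gene in each condition."""
    return {cond: mean(vals) for cond, vals in expression[gene].items()}


def specific_genes(expression: Expression, condition: str, top: int = 5) -> List[Tuple[str, float]]:
    """Condition -> genes: rank genes by (mean in condition) - (mean elsewhere)."""
    scores = []
    for gene, by_cond in expression.items():
        if condition not in by_cond:
            continue
        others = [v for cond, vals in by_cond.items() if cond != condition for v in vals]
        if not others:
            continue  # no other conditions to compare against
        scores.append((gene, mean(by_cond[condition]) - mean(others)))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]


if __name__ == "__main__":
    expr = {
        "g1": {"root": [8.0, 9.0], "leaf": [2.0, 3.0]},
        "g2": {"root": [5.0, 5.0], "leaf": [5.0, 5.0]},
    }
    print(meta_profile(expr, "g1"))
    print(specific_genes(expr, "root"))
```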
27
http://sbw.kgi.edu/
28
Thank you!
Dr. Metsada Pasmanik-Chor Bioinformatics
Unit, Life Science, TAU Tel x 6992 E-mail
metsada_at_bioinfo.tau.ac.il Bioinfo. Unit webpage
http://bioinfo.tau.ac.il
Bioinformatics Intro, 15/12/2008, Metsada
Pasmanik-Chor