Title: Microarray experiments. Database and Analysis Tools.
1. Microarray experiments. Database and Analysis Tools.
Kate Milova, cDNA Microarray Facility
March 24, 2005
2. Outline.
- Microarray platforms and services available at AECOM
  - cDNA
  - Long Oligo
  - Affymetrix
- Database (cDNA and Long Oligo) structure and content
  - Printing information
  - Chip layout
  - Annotation
- Annotation algorithms and data mining
- On-line Analysis Tools
  - Normalization
  - Signal filtering
  - Comparison
- Statistical packages and Analysis software
- Summary
3. Microarray Platforms at AECOM.
4. How to choose a microarray platform.
5. Before starting your microarray experiment.
6. cDNA Microarray Facility. Home page.
Standard and Custom Arrays. Description and Prices
Hybridization, labeling, bioinformatics, workshops
Database for cDNA and Long Oligo Arrays. Analysis Pipeline
AECOM cDNA microarray facility. Supported publications
Useful links to analysis tools
7. Database for Analysis of Microarrays at AECOM. Contents.
Gene Annotation
- Accession
- Clone ID
- Clone end
- Vector name
- Clone name
- UniGene cluster ID
- Best blast hit
- Main blast parameters (score, E-value, identity, blast date, etc.)
- Gene ID
- Gene symbol
- Gene synonyms
- Chromosome
- Map location
- GO IDs
- GO Annotation
Chip layout
- Chip name
- Spot information (accession, clone ID or bacterial control)
- Spot location
- Library name
- Clone location on 384 plate
- Clone location on 96 plate
Printing Information
- Chip name
- Species
- Number of spots
- Number of controls
- Number of pen domains
- Number of slides
- Printing pattern
- Distance between spots
- Number of rows
- Number of columns
- Printing date
- Master chip
8. Annotation sources: NCBI.
NCBI
- UniGene
  - UniGene ID ← Accession
  - UniGene ID ← Blast against UniGene clusters
- Entrez Gene
  - UniGene ID → Gene ID → GO ID
- Blast Software
- Blast Search
  - RefSeq and NT databases → Annotation
9. Annotation sources: NCBI.
- NCBI → UniGene → UniGene ID
  - The UniGene ID for cDNA arrays is obtained from the UniGene source file for each particular accession number of the clone.
- NCBI → UniGene → Blast
  - The UniGene ID for Long Oligo arrays is obtained from blast results.
  - The blast search was done with the set of oligo sequences against UniGene clusters, with cutoffs of 99% for sequence identity and 90% for overlap.
  - An oligo hitting multiple UniGene clusters is marked with an Ambiguous cluster ID.
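The UniGene assignment rule for Long Oligo arrays described above can be sketched roughly as follows. This is a minimal illustration, not the facility's actual code: the function name, the tuple layout of a hit, the `"Ambiguous"` label and the use of inclusive (`>=`) cutoffs are all assumptions.

```python
AMBIGUOUS = "Ambiguous"  # hypothetical label for oligos hitting several clusters


def assign_unigene_id(hits, min_identity=99.0, min_overlap=90.0):
    """Pick a UniGene cluster ID for one oligo from its blast hits.

    hits: iterable of (cluster_id, identity_percent, overlap_percent).
    Returns the cluster ID, AMBIGUOUS, or None when no hit passes the cutoffs.
    """
    # Keep only clusters with at least one hit passing both cutoffs
    # (whether the original cutoffs were inclusive is an assumption).
    clusters = {
        cluster_id
        for cluster_id, identity, overlap in hits
        if identity >= min_identity and overlap >= min_overlap
    }
    if not clusters:
        return None
    if len(clusters) > 1:
        return AMBIGUOUS
    return next(iter(clusters))
```

For example, an oligo with passing hits in both `Hs.1` and `Hs.2` would be flagged as ambiguous, while several passing hits inside the same cluster still yield that single cluster ID.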
10. Annotation sources: NCBI.
- UniGene ID → Gene ID
  - All information retrieved from the Entrez Gene project is based on the UniGene cluster ID and the corresponding Gene ID.
  - The connection between Gene ID and UniGene cluster ID can be ambiguous.
  - A parsing filter was used to eliminate ambiguous Gene IDs.
- Gene ID → GO ID
  - For each Gene ID, the corresponding Gene Ontology IDs were retrieved from the Entrez Gene source file.
  - A Gene ID may have anywhere from a few to more than 10 different GO IDs. All of them are collected.
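A rough sketch of the two steps above, under stated assumptions: the slide does not define "ambiguous", so here a UniGene cluster linked to more than one Gene ID is treated as ambiguous and dropped. The function names and the input row layouts are hypothetical and do not reflect the actual Entrez Gene file format.

```python
from collections import defaultdict


def unambiguous_gene_ids(cluster_gene_pairs):
    """Map each UniGene cluster ID to its Gene ID, skipping ambiguous clusters.

    cluster_gene_pairs: iterable of (unigene_cluster_id, gene_id).
    Assumption: a cluster linked to several distinct Gene IDs is ambiguous.
    """
    genes_by_cluster = defaultdict(set)
    for cluster_id, gene_id in cluster_gene_pairs:
        genes_by_cluster[cluster_id].add(gene_id)
    return {
        cluster_id: next(iter(gene_ids))
        for cluster_id, gene_ids in genes_by_cluster.items()
        if len(gene_ids) == 1
    }


def collect_go_ids(gene_go_pairs):
    """Collect every GO ID per Gene ID, keeping first-seen order, no repeats.

    gene_go_pairs: iterable of (gene_id, go_id).
    """
    go_by_gene = defaultdict(list)
    for gene_id, go_id in gene_go_pairs:
        if go_id not in go_by_gene[gene_id]:
            go_by_gene[gene_id].append(go_id)
    return dict(go_by_gene)
```

No limit is placed on the number of GO IDs per gene, matching the statement that all of them are collected.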
11. Annotation sources: NCBI.
- The Blast software package is installed on the microarray server.
  - This software makes it possible to format databases and run batch homology searches for any combination of custom databases and query sequences.
- RefSeq and NT databases → Annotation
  - The databases are loaded, formatted and periodically updated on the microarray server.
  - When the databases are updated, we run a blast search of the cDNA and Long Oligo sequences.
  - Blast results are parsed using our algorithm for annotation extraction.
12. Annotation Extraction Algorithm.
Raw Data
Sequences
Homology search against RefSeq and NT
Alignment quality check (cutoffs: 90% identity, 80% overlap)
13. Annotation sources: Gene Ontology.
Gene Ontology: Biological process, Molecular function, Cellular component
- Gene Ontology.
  - Multiple GO IDs for each Gene ID are retrieved in the previous step from Entrez Gene (if available).
  - Gene Ontology annotation for all GO IDs is kept in three different information fields: biological process, molecular function and cellular component. For each field, all available annotation was prefiltered with a redundancy check and concatenated.
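The redundancy check and concatenation step can be illustrated with a short sketch. The field names, the `"; "` separator and the input layout are assumptions made for the example; the slide does not say how the stored fields are actually delimited.

```python
GO_FIELDS = ("biological_process", "molecular_function", "cellular_component")


def concatenate_go_annotation(go_records, separator="; "):
    """Build one text field per GO category for a single gene.

    go_records: iterable of (category, term_name), where category is one of
    GO_FIELDS. Repeated terms within a category are kept only once.
    """
    terms_by_field = {field: [] for field in GO_FIELDS}
    for category, term in go_records:
        terms = terms_by_field[category]
        if term not in terms:  # redundancy check
            terms.append(term)
    return {field: separator.join(terms) for field, terms in terms_by_field.items()}
```

A gene with no annotation in a category simply ends up with an empty string in that field.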
14. cDNA Microarray Facility. Database.
15. Database Search.
16. Microarray Data Analysis Pipeline.
17. Pipeline. LOWESS Normalization.
18. cDNA Microarray Facility. Pipeline. Filtering.
19. Pipeline. Data set Comparison.
20. Summary
21. cDNA Microarray Facility. Services.
22. Annotation Extraction Algorithm.
- Database of cDNA and Long Oligo sequences
- Blast search against RefSeq and NT databases
- All hits are examined with an alignment quality check
  - Only hits with >90% identity are kept
  - All hits are divided into two groups: 1. >80% overlap and 2. <80% overlap (partially similar)
- All hits then go through a linguistic filter
- Hits which pass both tests are defined as Good Hits
- The Best blast Hit is
  - First good RefSeq hit from group 1 OR
  - First good NT hit from group 1 OR
  - First good RefSeq hit from group 2 OR
  - First good NT hit from group 2
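The selection of the best blast hit can be sketched as below. This is an illustrative reading of the slide, not the facility's parser: the `Hit` record, the word list used for the linguistic filter and the treatment of exactly 80% overlap are assumptions, and "first" is taken to mean first in blast output order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    subject_id: str
    database: str      # "refseq" or "nt" (hypothetical labels)
    identity: float    # percent identity of the alignment
    overlap: float     # percent of the query covered by the alignment
    description: str


# Assumption: the slide does not say what the linguistic filter rejects.
UNINFORMATIVE_WORDS = ("hypothetical", "unknown", "unnamed")


def passes_linguistic_filter(hit: Hit) -> bool:
    text = hit.description.lower()
    return not any(word in text for word in UNINFORMATIVE_WORDS)


def best_blast_hit(hits: List[Hit]) -> Optional[Hit]:
    """Return the best hit, or None if no hit passes both tests."""
    # Good hits pass the identity cutoff and the linguistic filter.
    good = [h for h in hits if h.identity > 90 and passes_linguistic_filter(h)]
    group1 = [h for h in good if h.overlap > 80]
    # Hits with exactly 80% overlap are placed in group 2 here (assumption).
    group2 = [h for h in good if h.overlap <= 80]
    # Priority: RefSeq in group 1, NT in group 1, RefSeq in group 2, NT in group 2.
    for group in (group1, group2):
        for database in ("refseq", "nt"):
            for hit in group:  # input order is preserved
                if hit.database == database:
                    return hit
    return None
```

With this ordering, a well-overlapping NT hit is preferred over a partially similar RefSeq hit, which matches the group-first priority in the list above.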
23. cDNA Microarray Facility. Arrays.
24. cDNA Microarray Facility. Publications.
25. Annotation Extraction Algorithm.
Raw Data
Sequences
Homology search against RefSeq and NT
26. Before starting your microarray experiment.
27. Microarray Experts.
28. Microarray Platforms at AECOM.