Title: MDMS: A Web Tool to Manage and Analyze Gene Expression Microarray Data
1MDMS: A Web Tool to Manage and Analyze Gene
Expression Microarray Data
2Overview
- Steps in analysis of Gene Expression Microarray Data
- Preprocessing
- Filtering
- Statistical Analysis
- Machine Learning and Data Mining (Clustering)
- Functional Analysis
- Data Analysis features in MDMS
- Workflow in MDMS
- Analysis of Early Lung Development dataset using MDMS
- MDMS Demo
3Steps in Microarray Data Analysis
Analysis of Data: deriving a knowledge base from the data and mining
information from that knowledge base
4Steps in Microarray Data Analysis
- Image Quantification
- Check for artifacts, Segmentation
- Extraction of expression values of genes
- Preprocessing
- Background Correction
- Normalization
- Summarization
- MAS5, RMA, GC-RMA, DChip
www.swegene.org/SWEGENE_microarray_eng.php?Id18
5Steps in Microarray Data Analysis
- Filtering
- About 10-50% of the genome is not expressed in a given tissue
- Aim is to isolate the genes that are expressed
- Also improves the accuracy of statistical significance tests
- Specific and non-specific filtering
- Filter on Presence/Absence calls
- Filter on expression signal, variability in gene expression
6Steps in Microarray Data Analysis
- Statistical Analysis
- Many genes will be expressed to perform many routine tasks in the cell
- Aim is to isolate genes responsible for phenotypic variation
- Interesting vs. random
- Various significance tests: t-test, ANOVA
- Multiple testing correction
7Steps in Microarray Data Analysis
- Machine Learning Approaches and Data Mining
- Small changes in gene expression, which by themselves may not be statistically significant, can collectively regulate an important pathway
- Statistical analysis is limited by having few replicates and by fitting approximate models to the data
- Aim is to find significant patterns in the data set
- Periodic, time-lagged, cyclic
- Machine learning approaches mine data for information (data mining) using computational and statistical techniques (e.g., clustering)
8Functional Analysis
- Functional Analysis
- Given a statistically significant pattern or list of significant genes, how significant is it biologically?
- Aim is to find genes that are responsible for the phenotypic condition
- Extracting annotations and finding functionally similar genes
- Gene Ontology
- Gene set enrichment, relating genes to known pathways
http://cardioserve.nantes.inserm.fr/ptf-puce/images/camembert_go.gif
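One common way to ask whether a gene list is enriched for an annotation (for example a Gene Ontology term) is a hypergeometric test. The sketch below is a minimal illustration of that idea, not the method used by MDMS or GOAPhAR; the function name and all numbers in the usage example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, hits):
    """P(X >= hits) for a hypergeometric draw.

    population: number of genes on the array (background)
    annotated:  background genes that carry the annotation
    selected:   size of the gene list being tested
    hits:       genes in the list that carry the annotation
    """
    max_hits = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical example: 34 selected genes, 5 of them annotated with a term
# that covers 40 of 1000 background genes.
p_value = hypergeom_enrichment_p(population=1000, annotated=40, selected=34, hits=5)
print(f"enrichment p-value: {p_value:.4g}")
```

A small p-value suggests the annotation appears in the list more often than expected by chance; when many terms are tested, the p-values still need multiple testing correction.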
9Data Analysis Features in MDMS
- All data analysis features in MDMS are implemented through Bioconductor packages (http://www.bioconductor.org)
- Covers many aspects of data analysis for gene expression, SNP, and custom-made arrays
- Many different tests for quality control, preprocessing, filtering, statistical analysis, machine learning and functional analysis
- Large user community, helpful mailing lists, used by many labs in many countries
- Tutorials are available on the website and hands-on training is also available
- Broader coverage of data analysis aspects than other available packages
- Open source
10Data Analysis Features in MDMS
- MDMS supports Affymetrix Gene Expression arrays
- No Image Quantification (usually done at the microarray facility)
- Quality Control
- 3'/5' bias
- Detection calls
- Background signals
- Correlation coefficients between arrays
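The correlation check between arrays can be illustrated with a short sketch. This is an assumed, simplified version of such a quality control step, not the MDMS implementation: the function names, the 0.9 threshold, and the toy intensities are all hypothetical.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_outlier_arrays(arrays, min_median_r=0.9):
    """Return names of arrays whose median correlation to the others is low.

    arrays: dict mapping array name -> list of log2 expression values.
    min_median_r: hypothetical cut-off, chosen only for illustration.
    """
    flagged = []
    for name, values in arrays.items():
        others = [pearson(values, v) for other, v in arrays.items() if other != name]
        if median(others) < min_median_r:
            flagged.append(name)
    return flagged


# Toy data: three arrays that agree and one that does not.
toy_arrays = {
    "chip_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "chip_b": [1.1, 2.0, 3.2, 3.9, 5.1],
    "chip_c": [5.0, 1.0, 4.0, 2.0, 3.0],
    "chip_d": [0.9, 2.1, 2.9, 4.2, 4.8],
}
print(flag_outlier_arrays(toy_arrays))
```

The median is used rather than the mean so that a single bad array does not drag down the score of the good ones.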
11MDMS - Preprocessing
- Preprocessing
- MAS5: default Affymetrix normalization
- RMA: Robust Multi-array Average
- GC-RMA, DChip (Li-Wong)
- MAS5 and RMA are highly recommended
- Available literature shows significant advantages of RMA over MAS5
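RMA's normalization step is quantile normalization, which forces every array to share the same intensity distribution. The sketch below shows only that one step on toy data; it omits RMA's background correction and median-polish summarization, and the function name is illustrative.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists.

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position rather than averaged,
    which is a simplification.
    """
    n_values = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean intensity at each rank across all arrays.
    rank_means = [
        sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n_values)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n_values), key=lambda i: a[i])
        out = [0.0] * n_values
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy example with two arrays of three probes each.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization each array contains the same set of values, only in a different order.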
12MDMS - Filtering
- Filtering
- Expression value cut-off
- E.g., all genes with expression > 200
- Detection calls
- E.g., all genes that are detected as Present
- Fold change
- E.g., all genes with a fold change > 2 or < -2
- Inter-quartile range (1st and 3rd quartiles)
- For genes that show higher variability
- All analysis is done on a log2 scale
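The filters listed above can be combined into a single per-gene check. The following is a minimal sketch under stated assumptions, not the MDMS code: it assumes raw (unlogged) positive intensities as input, applies the expression cut-off to the group means, and treats the IQR filter as optional. The function and parameter names are hypothetical.

```python
from math import log2
from statistics import mean, quantiles


def passes_filters(control, treated, min_expr=200.0, min_fold=2.0, min_iqr=None):
    """Return True if one gene passes the expression, fold-change and IQR filters.

    control, treated: lists of raw (unlogged, positive) expression values.
    """
    mean_control, mean_treated = mean(control), mean(treated)

    # Expression cut-off (assumption: at least one group mean must exceed it).
    if max(mean_control, mean_treated) <= min_expr:
        return False

    # On the log2 scale, more than 2-fold up or down means |log2 ratio| > 1.
    log_ratio = log2(mean_treated) - log2(mean_control)
    if abs(log_ratio) <= log2(min_fold):
        return False

    # Optional variability filter: IQR of the log2 values across all samples.
    if min_iqr is not None:
        q1, _, q3 = quantiles([log2(v) for v in control + treated], n=4)
        if q3 - q1 < min_iqr:
            return False
    return True


print(passes_filters([100, 120], [400, 440]))   # up-regulated gene
print(passes_filters([300, 310], [320, 330]))   # expressed but unchanged
```

Working with the log2 ratio makes up- and down-regulation symmetric: a 2-fold increase is +1 and a 2-fold decrease is -1.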
13MDMS - Statistical Analysis
- Significance Tests
- LIMMA (Linear Models for Microarray Data)
- SAM (Significance Analysis of Microarrays)
- EBAM (Empirical Bayes Analysis of Microarrays)
- Correction for Multiple Testing
- FDR, Bonferroni, Holm correction
- Machine Learning
- Clustering
- Hierarchical Clustering, K-Means, Self-Organizing Maps
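The three multiple testing corrections named on this slide can be written as adjusted p-value calculations. The sketch below is a plain-Python illustration, taking the Benjamini-Hochberg procedure as the FDR method (an assumption, since the slide does not say which FDR procedure is meant); MDMS itself relies on Bioconductor for these corrections.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment (controls the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        value = min(1.0, (m - rank) * pvals[index])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[index] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (controls the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvals[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(holm(raw))
print(benjamini_hochberg(raw))
```

Bonferroni is the most conservative of the three, Holm is never less powerful than Bonferroni, and the FDR adjustment is usually the most permissive, which is why it is popular for gene lists.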
14MDMS-Functional Analysis
- Functional Analysis through GOAPhAR
- Gene Annotation
- Protein Annotation
- Biological Pathways
- Gene Ontology Annotation
- Protein Interaction Evidence
- All gene lists generated using the data analysis
options can be saved in the database for future
use. They can also be downloaded as text files.
15MDMS-WORKFLOW
[Workflow diagram. Labels: Microarray Core; User; Data Repository; Software (Rat2302, Hg133U); MDMS Database; Preprocessing; Filtering; Statistical Analysis / Machine Learning; GOAPhAR; Annotation]
16Data Analysis Example
- Data set specifications (GSE3541)
- The aim of the study is to find genes involved in early lung development
- Mechanical stress was applied to fetal type II epithelial cells taken from 19-day-old rat embryos
- Data set processing
- Data was preprocessed by MAS5
- Expression > 200; invariant change between pairs of control and experiment samples > 50 (75 filtered)
- SAM statistical method was used to find significant genes (92 genes: 63 up-regulated and 29 down-regulated)
- 34 up-regulated genes were selected for further analysis
17Biological Significance of Clusterings
- K-Means was applied to the 34 genes, with K = 2, 3, 4, ..., 29
- Random clusterings were generated for K = 2, 3, 4, ..., 29 to compare the statistical clusterings with random ones
- Biological significance scores were calculated for all clusterings
- A z-score and P-value were calculated for each K value
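The comparison against random clusterings can be sketched as follows. The slides do not define the biological significance score, so the score used here (the fraction of same-annotation gene pairs that share a cluster) is a hypothetical stand-in, as are the function names and toy annotations. Random clusterings are generated by shuffling the cluster labels, which keeps cluster sizes fixed; that is also an assumption.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def random_clustering_zscore(labels, score_fn, n_random=1000, seed=0):
    """Compare a clustering's score with scores of label-shuffled clusterings.

    Returns (z_score, empirical_p_value).
    """
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    null_scores = []
    for _ in range(n_random):
        rng.shuffle(shuffled)
        null_scores.append(score_fn(shuffled))
    sigma = stdev(null_scores)
    z = (observed - mean(null_scores)) / sigma if sigma > 0 else float("nan")
    # Empirical p-value with the usual +1 correction.
    p = (1 + sum(s >= observed for s in null_scores)) / (n_random + 1)
    return z, p


# Hypothetical annotations for 12 genes and a clustering that matches them.
annotations = ["transport"] * 4 + ["synthesis"] * 4 + ["other"] * 4
cluster_labels = [0] * 4 + [1] * 4 + [2] * 4


def shared_annotation_score(labels):
    """Fraction of same-annotation gene pairs placed in the same cluster."""
    pairs = [
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if annotations[i] == annotations[j]
    ]
    return sum(labels[i] == labels[j] for i, j in pairs) / len(pairs)


z_score, p_value = random_clustering_zscore(cluster_labels, shared_annotation_score)
print(f"z = {z_score:.2f}, p = {p_value:.4f}")
```

Repeating this for each K gives one z-score and one P-value per K, which can then be compared across values of K.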
18
Probeset | Cluster | Gene Symbol | Gene Title
1377064_at | 0 | Dusp6 | dual specificity phosphatase 6
1386908_at | 0 | Glrx1 | glutaredoxin 1 (thioltransferase)
1367811_at | 1 | Phgdh | 3-phosphoglycerate dehydrogenase
1368489_at | 1 | Fosl1 | fos-like antigen 1
1368789_at | 1 | Acpp | acid phosphatase, prostate
1375213_at | 1 | Pck2_predicted | phosphoenolpyruvate carboxykinase 2 (mitochondrial)
1387925_at | 1 | Asns | asparagine synthetase
1368990_at | 2 | Cyp1b1 | cytochrome P450, family 1, subfamily b, polypeptide 1
1370690_at | 2 | Hspa9a_predicted | heat shock 70kDa protein 9A
1375025_at | 2 | Gm963_predicted | Gene model 963
1376134_at | 2 | RGD1307789 | similar to hypothetical protein MGC3207
1387408_at | 2 | Siah2 | seven in absentia 2
1367741_at | 3 | Herpud1 | homocysteine-inducible, endoplasmic reticulum stress-inducible, ubiquitin-like domain member 1
1368391_at | 3 | Slc7a1 | solute carrier family 7 (cationic amino acid transporter, y+ system), member 1
1372665_at | 3 | Psat1 | phosphoserine aminotransferase 1
1375964_at | 3 | Psph | phosphoserine phosphatase
1368203_at | 4 | Scnn1a | sodium channel, nonvoltage-gated 1 alpha
1369868_at | 4 | Iag2 | implantation-associated protein
1371900_at | 4 | Cugbp1 | CUG triplet repeat, RNA binding protein 1
1373412_at | 4 | Nt5c3_predicted | 5'-nucleotidase, cytosolic III (predicted)
1387088_at | 4 | Gal | galanin
1389725_at | 5 | Tm7sf2 | transmembrane 7 superfamily member 2
1367795_at | 6 | Ifrd1 | interferon-related developmental regulator 1
1368582_at | 6 | Slc7a3 | solute carrier family 7 (cationic amino acid transporter, y+ system), member 3
1369772_at | 6 | Slc6a9 | solute carrier family 6 (neurotransmitter transporter, glycine), member 9
1370080_at | 6 | Hmox1 | heme oxygenase (decycling) 1
1372042_at | 6 | Cmtm3_predicted | CKLF-like MARVEL transmembrane domain containing 3
1377112_at | 6 | Cda_predicted | cytidine deaminase
1398771_at | 6 | Slc3a2 | solute carrier family 3 (activators of dibasic and neutral amino acid transport), member 2
1374034_at | 7 | Cars_predicted | cysteinyl-tRNA synthetase
1374221_at | 7 | Slc29a3 | Solute carrier family 29 (nucleoside transporters), member 3
1374324_at | 7 | --- | Transcribed locus
1372601_at | 8 | Atf5 | activating transcription factor 5
1386888_at | 8 | Eif4ebp1 | eukaryotic translation initiation factor 4E binding protein 1
19Biological Significance of Clusterings
- The study found that genes related to amino acid synthesis, amino acid transport and sodium ion transport contributed to lung development
- 1 gene for sodium ion transport
- 4 genes for amino acid transport were found in 2 clusters
- 4 genes for amino acid synthesis were found in 2 clusters
20MDMS
- Demonstration - Using MDMS to analyze data
21MDMS
- Questions, comments, suggestions