1
MDMS - A Web Tool to Manage & Analyze Gene
Expression Microarray Data
  • Sachin Mathur

2
Overview
  • Steps in analysis of Gene Expression Microarray
    Data
  • Preprocessing
  • Filtering
  • Statistical Analysis
  • Machine Learning & Data Mining (Clustering)
  • Functional Analysis
  • Data Analysis features in MDMS
  • Workflow in MDMS
  • Analysis of Early Lung Development dataset using
    MDMS
  • MDMS Demo

3
Steps in Microarray Data Analysis
Analysis of data: deriving a knowledge base from
the data and mining information from that
knowledge base
4
Steps in Microarray Data Analysis
  • Image Quantification
  • Check for artifacts, Segmentation
  • Extraction of expression values of genes
  • Preprocessing
  • Background Correction
  • Normalization
  • Summarization
  • MAS5, RMA, GC-RMA, DChip

www.swegene.org/SWEGENE_microarray_eng.php?Id=18
5
Steps in Microarray Data Analysis
  • Filtering
  • About 10-50% of the genome is not expressed in a
    given tissue
  • Aim is to isolate the genes that are expressed
  • Also improves the accuracy of statistical
    significance tests
  • Specific & non-specific filtering
  • Filter on Presence/Absence calls
  • Filter on expression signal, Variability in gene
    expression

6
Steps in Microarray Data Analysis
  • Statistical Analysis
  • Many genes will be expressed to perform many
    routine tasks in the cell
  • Aim is to isolate genes responsible for
    phenotypic variation
  • Interesting vs. random
  • Various significance tests: t-test, ANOVA
  • Multiple Testing Correction
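
The idea of separating "interesting" from "random" differences can be sketched for a single gene as follows. This is only an illustration under stated assumptions: the expression values are made up, and a permutation test stands in for the t-distribution lookup so that the example needs nothing beyond the Python standard library. It is not the procedure MDMS itself uses.

```python
import random
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent groups (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical log2 expression values for one gene.
control = [7.1, 7.3, 7.2]
treated = [9.0, 9.4, 9.2]
print(welch_t(treated, control), permutation_p_value(treated, control))
```

With only three replicates per group the permutation p-value cannot become very small, which is one reason few replicates limit what significance tests can detect.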

7
Steps in Microarray Data Analysis
  • Machine Learning Approaches & Data Mining
  • Small changes in gene expression, which by
    themselves may not be statistically
    significant, can collectively regulate an
    important pathway
  • Limitations with fewer replicates and fitting
    approximate models on data during statistical
    analysis
  • Aim is to find significant patterns in the data
    set.
  • Periodic, Time-lagged, cyclic
  • Machine learning approaches mine data for
    information (data mining) using computational
    and statistical techniques (e.g., clustering)
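
As a rough illustration of clustering as a pattern-finding step, here is a minimal K-Means sketch (Lloyd's algorithm) in plain Python. The expression profiles are invented, and the function name and parameters are assumptions made for this example, not part of MDMS or Bioconductor.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=50, seed=0):
    """Cluster profiles into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct profiles
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical profiles: two rising genes and two falling genes.
profiles = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [3.0, 2.0, 1.0], [3.1, 2.1, 0.9]]
labels, _ = kmeans(profiles, k=2)
print(labels)
```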

8
Functional Analysis
  • Functional Analysis
  • Given a statistically significant pattern or list
    of significant genes, how significant is it
    biologically?
  • Aim is to find genes that are responsible for the
    phenotypic condition
  • Extracting annotations and finding functionally
    similar genes.
  • Gene Ontology
  • Gene set enrichment, relating genes to known
    pathways
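
One common way to score gene set enrichment for a Gene Ontology term is a one-sided hypergeometric test. The sketch below assumes that approach and uses invented counts. The slides do not say which enrichment statistic MDMS or GOAPhAR applies.

```python
from math import comb

def hypergeom_enrichment_p(n_genome, n_annotated, n_selected, n_overlap):
    """P(at least n_overlap annotated genes) when n_selected genes are drawn
    without replacement from n_genome genes, n_annotated of which carry the term."""
    total = comb(n_genome, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_genome - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 10 genes on the array, 4 carry the term,
# 3 genes are in the significant list, and 2 of those carry the term.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

A small p-value suggests the term appears in the gene list more often than chance alone would explain. When many terms are tested, these p-values need a multiple testing correction as well.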

http://cardioserve.nantes.inserm.fr/ptf-puce/images/camembert_go.gif
9
Data Analysis Features in MDMS
  • All data analysis features in MDMS are
    implemented through Bioconductor packages
    (http://www.bioconductor.org)
  • Covers many aspects of data analysis for
    Gene-Expression, SNP, Custom made arrays
  • Many different tests for quality control,
    preprocessing, filtering, statistical analysis,
    machine learning and functional analysis
  • Large user community, helpful mailing lists, used
    by many labs in many countries
  • Tutorials are available on the website and
    hands-on training is also available.
  • Better than all available packages in terms of
    coverage of data analysis aspects.
  • Open Source

10
Data Analysis Features in MDMS
  • MDMS supports Affymetrix Gene Expression arrays
  • No Image Quantification (usually done at
    microarray facility)
  • Quality Control
  • 3'/5' bias
  • Detection calls
  • Background signals
  • Correlation coefficients between arrays
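
The correlation check between arrays can be sketched as follows: compute the Pearson correlation for every pair of arrays and flag any array whose median correlation with the others is low. The array names, values, and the 0.9 threshold are assumptions for illustration, not MDMS defaults.

```python
from math import sqrt
from statistics import median

def pearson(x, y):
    """Pearson correlation coefficient between two arrays of expression values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def flag_outlier_arrays(arrays, threshold=0.9):
    """Names of arrays whose median correlation with the other arrays is below threshold."""
    flagged = []
    for name, values in arrays.items():
        others = [pearson(values, v) for other, v in arrays.items() if other != name]
        if median(others) < threshold:
            flagged.append(name)
    return flagged

# Hypothetical log2 signals for four probesets on four arrays.
arrays = {
    "ctrl_1": [5.0, 7.0, 9.0, 11.0],
    "ctrl_2": [5.1, 7.2, 8.9, 11.1],
    "ctrl_3": [4.9, 7.1, 9.2, 10.8],
    "bad": [11.0, 5.0, 9.0, 6.0],
}
print(flag_outlier_arrays(arrays))
```

The median is used here, rather than the mean, so that a single poor array does not pull down the score of every good array it is compared with.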

11
MDMS - Preprocessing
  • Preprocessing
  • MAS5: default Affymetrix normalization
  • RMA: Robust Multi-array Average
  • GC-RMA, dChip (Li-Wong)
  • MAS5 and RMA are highly recommended
  • Available literature shows significant advantages
    of RMA over MAS5
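
RMA's normalization step is quantile normalization, which forces every array to share the same distribution of intensities. The sketch below shows only that step, on made-up values. It leaves out RMA's background correction and median-polish summarization, and it breaks ties by position instead of averaging them.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of intensities per chip)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across all arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by the reference at its rank
        normalized.append(out)
    return normalized

# Hypothetical intensities for three probes on two chips.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```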

12
MDMS - Filtering
  • Filtering
  • Expression value cut-off
  • E.g., all genes with expression > 200
  • Detection calls
  • E.g., all genes that are detected as Present
  • Fold Change
  • E.g., all genes with a fold change > 2 or
    < -2
  • Inter-Quartile Range (1st & 3rd quartiles)
  • For genes that show higher variability
  • All analysis is done on a log2 scale
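
The filters above can be sketched for a single gene as below. Several details are assumptions for illustration only: the cut-off is applied as "any array above 200", the fold change is taken as the difference of mean log2 values (so a 2-fold change corresponds to a log2 difference of 1), and the 0.5 IQR threshold is invented. The function names are hypothetical and are not MDMS or Bioconductor calls.

```python
from math import log2
from statistics import mean, quantiles

def expressed(values, cutoff=200.0):
    """Expression cut-off: keep a gene if any array exceeds the cutoff (raw scale)."""
    return max(values) > cutoff

def log2_fold_change(control, treated):
    """Difference of mean log2 expression (treated minus control)."""
    return mean(log2(v) for v in treated) - mean(log2(v) for v in control)

def iqr_log2(values):
    """Inter-quartile range (3rd minus 1st quartile) of log2 expression."""
    q1, _, q3 = quantiles([log2(v) for v in values], n=4)
    return q3 - q1

def keep_gene(control, treated, fold=2.0, min_iqr=0.5):
    """Apply all three filters in sequence."""
    values = control + treated
    return (
        expressed(values)
        and abs(log2_fold_change(control, treated)) > log2(fold)
        and iqr_log2(values) > min_iqr
    )

# Hypothetical raw signals for one gene (3 control and 3 treated arrays).
print(keep_gene([100, 110, 105], [400, 420, 410]))
```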

13
MDMS - Statistical Analysis
  • Significance Tests
  • LIMMA (Linear Models for Microarray Data)
  • SAM (Significance Analysis of Microarrays)
  • EBAM (Empirical Bayes Analysis of Microarrays)
  • Correction for Multiple Testing
  • FDR, Bonferroni, Holm correction
  • Machine Learning
  • Clustering
  • Hierarchical Clustering, K-Means, Self Organizing
    Maps.
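
The multiple testing corrections named on this slide can be sketched as adjusted p-value calculations. The slide does not say which FDR procedure is meant, so Benjamini-Hochberg is assumed here. The input p-values are made up.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjustment (controls the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max  # enforce monotonicity
    return adjusted

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank among ascending p-values
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw), holm(raw), benjamini_hochberg(raw))
```

Bonferroni is the most conservative of the three, Holm is never more conservative than Bonferroni, and the FDR adjustment is usually the most permissive. That is why FDR control is common when thousands of genes are tested at once.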

14
MDMS - Functional Analysis
  • Functional Analysis through GOAPhAR
  • Gene Annotation
  • Protein Annotation
  • Biological Pathways
  • Gene Ontology Annotation
  • Protein Interaction Evidence
  • All gene lists generated using the data analysis
    options can be saved in the database for future
    use. These can be also downloaded as text files.

15
MDMS-WORKFLOW
[Workflow diagram. Components shown: Microarray Core; User; Data Repository; Software (Rat2302, Hg133U); MDMS Database; Preprocessing; Filtering; Statistical Analysis & Machine Learning; GOAPhAR; Annotation]
16
Data Analysis Example
  • Data set specifications (GSE3541)
  • The aim of the study is to find genes involved in
    early lung development.
  • Mechanical stress was applied to fetal type II
    epithelial cells taken from 19-day-old rat
    embryos
  • Data set Processing
  • Data was preprocessed by MAS5
  • Expression > 200; invariant change between pairs
    of control & experiment samples > 50 (75%
    filtered)
  • SAM statistical method was used to find
    significant genes (92 genes, 63 up and 29
    down-regulated)
  • 34 up-regulated genes were selected for further
    analysis

17
Biological Significance of Clusterings
  • K-Means was applied to 34 genes, with K = 2, 3,
    4, ..., 29
  • Random clusterings were generated for K = 2, 3,
    4, ..., 29 to compare the statistical
    clusterings to random
  • Biological significance scores were calculated
    for all clusterings.
  • A z-score and P-value were calculated for each K
    value
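
The comparison against random clusterings can be sketched as follows. The slides do not define the biological significance score or say how the P-value was derived, so this example assumes a generic score where higher is better and a one-sided P-value from a normal approximation. All numbers are invented.

```python
from math import erfc, sqrt
from statistics import mean, stdev

def z_and_p(observed_score, random_scores):
    """z-score of the observed clustering against random clusterings for one K,
    with a one-sided p-value from a normal approximation (an assumption)."""
    mu = mean(random_scores)
    sigma = stdev(random_scores)
    z = (observed_score - mu) / sigma
    p = 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for a standard normal Z
    return z, p

# Hypothetical scores for one value of K.
z, p = z_and_p(0.9, [0.4, 0.5, 0.6])
print(z, p)
```

Repeating this for each K gives one z-score and one P-value per K, which is one way to judge which number of clusters is the most biologically meaningful.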

18
Probeset    Cluster  Gene Symbol       Gene Title
1377064_at  0        Dusp6             dual specificity phosphatase 6
1386908_at  0        Glrx1             glutaredoxin 1 (thioltransferase)
1367811_at  1        Phgdh             3-phosphoglycerate dehydrogenase
1368489_at  1        Fosl1             fos-like antigen 1
1368789_at  1        Acpp              acid phosphatase, prostate
1375213_at  1        Pck2_predicted    phosphoenolpyruvate carboxykinase 2 (mitochondrial)
1387925_at  1        Asns              asparagine synthetase
1368990_at  2        Cyp1b1            cytochrome P450, family 1, subfamily b, polypeptide 1
1370690_at  2        Hspa9a_predicted  heat shock 70kDa protein 9A
1375025_at  2        Gm963_predicted   Gene model 963
1376134_at  2        RGD1307789        similar to hypothetical protein MGC3207
1387408_at  2        Siah2             seven in absentia 2
1367741_at  3        Herpud1           homocysteine-inducible, endoplasmic reticulum stress-inducible, ubiquitin-like domain member 1
1368391_at  3        Slc7a1            solute carrier family 7 (cationic amino acid transporter, y+ system), member 1
1372665_at  3        Psat1             phosphoserine aminotransferase 1
1375964_at  3        Psph              phosphoserine phosphatase
1368203_at  4        Scnn1a            sodium channel, nonvoltage-gated 1 alpha
1369868_at  4        Iag2              implantation-associated protein
1371900_at  4        Cugbp1            CUG triplet repeat, RNA binding protein 1
1373412_at  4        Nt5c3_predicted   5'-nucleotidase, cytosolic III (predicted)
1387088_at  4        Gal               galanin
1389725_at  5        Tm7sf2            transmembrane 7 superfamily member 2
1367795_at  6        Ifrd1             interferon-related developmental regulator 1
1368582_at  6        Slc7a3            solute carrier family 7 (cationic amino acid transporter, y+ system), member 3
1369772_at  6        Slc6a9            solute carrier family 6 (neurotransmitter transporter, glycine), member 9
1370080_at  6        Hmox1             heme oxygenase (decycling) 1
1372042_at  6        Cmtm3_predicted   CKLF-like MARVEL transmembrane domain containing 3
1377112_at  6        Cda_predicted     cytidine deaminase
1398771_at  6        Slc3a2            solute carrier family 3 (activators of dibasic and neutral amino acid transport), member 2
1374034_at  7        Cars_predicted    cysteinyl-tRNA synthetase
1374221_at  7        Slc29a3           Solute carrier family 29 (nucleoside transporters), member 3
1374324_at  7        ---               Transcribed locus
1372601_at  8        Atf5              activating transcription factor 5
1386888_at  8        Eif4ebp1          eukaryotic translation initiation factor 4E binding protein 1
19
Biological Significance of Clusterings
  • The study found that genes related to amino acid
    synthesis, amino acid transport and sodium ion
    transport contributed to lung development.
  • 1 gene for sodium ion transport
  • 4 genes for amino acid transport were found in 2
    clusters
  • 4 genes for amino acid synthesis were found in 2
    clusters

20
MDMS
  • Demonstration - Using MDMS to analyze data

21
MDMS
  • Questions, comments, suggestions