Microarray Data Analysis - PowerPoint PPT Presentation

Title: Microarray Data Analysis
Author: Kath
Last modified by: Janet Murray
Created: 10/14/2003 5:52:40 PM
Presentation format: On-screen Show (PowerPoint)

Slides: 56
Provided by: kath208
Learn more at: https://www.uvm.edu

1
Microarray Data Analysis
  • The Bioinformatics side of the bench

2
The anatomy of your data files from Affymetrix
array analysis
  • .DAT: image file (about 10^7 pixels)
  • .CEL: measured cell intensities
  • .CDF: cell description file (identifies probe
    sets and probe pairs)
  • .CHP: calculated probe set data
  • .RPT: report generated from .CHP

3
Quality Control (QC) of the chip: visual
inspection
  • Look at the .DAT file or the .CHP file image
  • Scratches? Spots?
  • Corners and outside border checkerboard
    appearance (B2 oligo)
  • Positive hybridization control
  • Used by software to place grid over image
  • Array name is written out in oligos!

5
Chip defects
6
Internal controls
  • B. subtilis genes (added poly-A tails)
  • Assessment of quality of sample preparation
  • Also as hybridization controls
  • Hybridization controls (bioB, bioC, bioD, cre)
  • E. coli and P1 bacteriophage biotin-labeled cRNAs
  • Spiked into the hybridization cocktail
  • Assess hybridization efficiency
  • Actin and GAPDH: assess RNA sample/assay quality
  • Compare signal values from the 3' end to signal
    values from the 5' end
  • 3'/5' ratio generally should not exceed 3
  • Percent genes present (P)
  • Replicate samples should show similar
    percent-present (P) values
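
The 3'/5' check above can be sketched in a few lines of Python. This is only an illustration: the function names and the signal values are made up, and the cutoff of 3 is the rule of thumb from the slide, not a fixed standard.

```python
def three_to_five_ratio(signal_3p: float, signal_5p: float) -> float:
    """Ratio of the 3' probe set signal to the 5' probe set signal."""
    if signal_5p <= 0:
        raise ValueError("5' signal must be positive")
    return signal_3p / signal_5p


def qc_flags(controls: dict[str, tuple[float, float]],
             max_ratio: float = 3.0) -> dict[str, bool]:
    """Map each control gene to True when its 3'/5' ratio is within max_ratio."""
    return {
        gene: three_to_five_ratio(s3, s5) <= max_ratio
        for gene, (s3, s5) in controls.items()
    }


# Hypothetical (3' signal, 5' signal) pairs for one chip.
example = {"GAPDH": (12000.0, 9500.0), "Actin": (8000.0, 2000.0)}
print(qc_flags(example))
```

A control that fails the check (Actin in this made-up case, with a ratio of 4) would be a reason to look again at RNA quality or sample preparation for that chip.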

7
Microarray Data Process/Outline
  • Experimental Design
  • Image Analysis: scan to intensity measures (raw
    data)
  • Normalization: clean data
  • More low-level analysis: fold change, ANOVA,
    data filtering
  • Data mining: how to interpret > 6000 measures
  • Databases
  • Software
  • Techniques: clustering, pattern recognition, etc.
  • Comparing to prior studies, across platforms?
  • Validation

8
Experimental Design
  • A good microarray design has 4 elements
  • A clearly defined biological question or
    hypothesis
  • Treatment, perturbation and observation of
    biological materials should minimize systematic
    bias
  • Simple and statistically sound arrangement that
    minimizes cost and gains maximal information
  • Compliance with MIAME (Minimum Information About
    a Microarray Experiment)
  • The goal of statistics is to find signals in a
    sea of noise
  • The goal of exp. design is to reduce the noise so
    signals can be found with as small a sample size
    as possible

9
Observational Study vs. Designed Experiment
  • Observational study-
  • Investigator is a passive observer who measures
    variables of interest, but does not attempt to
    influence the responses
  • Designed Experiment-
  • Investigator intervenes in natural course of
    events
  • What type is our DMSO exp?

10
Experimental Replicates
  • Why?
  • In any exp. system there is a certain amount of
    noise, so even 2 identical processes yield
    slightly different results
  • Sources?
  • In order to understand how much variation there
    is, it is necessary to repeat an exp. a number
    of independent times
  • Replicates allow us to use statistical tests to
    ascertain if the differences we see are real
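
As a rough sketch of what "use statistical tests" can mean here, the snippet below computes Welch's t statistic for one gene measured on replicate chips. The log2 signal values are invented, and the choice of Welch's test is an assumption for illustration. Converting t to a p-value needs the t distribution, which a statistics package would supply.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and its approximate degrees of freedom.

    Each group needs at least 2 replicates that are not all identical,
    otherwise the variance terms are undefined or zero.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log2 signals for one gene, three replicate chips per group.
control = [8.1, 8.3, 8.2]
treated = [9.4, 9.1, 9.6]
t, df = welch_t(treated, control)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With a single chip per group the variance cannot be estimated at all, which is the practical reason replicates are required.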

12
Technical vs. Biological Replicates
As we progress from the starting material to the
scanned image, we move from a system dominated by
biological effects to one dominated by chemistry
and physics noise. Within the Affy platform the
dominant variation is usually biological, so the
best strategy is to produce replicates as high up
the experimental tree as possible.
13
Low level data analysis / pre-processing
  • Varying biological or cellular composition among
    sample types.
  • Differences in sample preparation, labeling or
    hybridization
  • Non specific cross-hybridization of target to
    probes.
  • These lead to systematic differences between
    individual arrays
  • Raw Data Quality Control
  • Scaling
  • Normalization and filtering.

14
Image Analysis - Raw Data
15
From probe level signals to gene abundance
estimates
The job of the expression summary algorithm is to
take a set of Perfect Match (PM) and Mis-Match
(MM) probes, and use these to generate a single
value representing the estimated amount of
transcript in solution, as measured by that
probeset.
To do this, .DAT files containing array images
are first processed to produce a .CEL file, which
contains measured intensities for each probe on
the array. It is the .CEL files that are
analyzed by the expression calling algorithm.
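
To make "generate a single value" concrete, here is a minimal sketch of a one-step Tukey biweight, the kind of robust average that the MAS 5.0 documentation describes for summarizing a probe set. The constants c = 5 and eps = 0.0001 are the commonly quoted defaults and are assumptions here, the input values are invented, and the background and mismatch adjustment steps are skipped entirely.

```python
from math import log2
from statistics import median


def tukey_biweight(values: list[float], c: float = 5.0, eps: float = 1e-4) -> float:
    """One-step Tukey biweight: a weighted mean that down-weights outliers."""
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)           # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical adjusted probe-pair values for one probe set; the last is an outlier.
adjusted = [890.0, 1100.0, 1020.0, 960.0, 1180.0, 23000.0]
log_values = [log2(v) for v in adjusted]
signal = 2 ** tukey_biweight(log_values)
print(round(signal, 1))
```

The outlying probe pair receives zero weight, so the summary stays close to the bulk of the probe set instead of being dragged upward as a plain mean would be.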
16
MAS 5.0 output files
  • For each transcript (gene) on the chip
  • signal intensity
  • a present or absent call (presence call)
  • p-value (significance value) for making that call
  • Each gene associated with GenBank accession
    number (NCBI database)

17
How are transcripts determined to be present or
absent?
  • Probe pair (PM vs. MM) intensities
  • generate a detection p-value
  • assign Present, Absent, or Marginal call
    for transcript
  • Every probe pair in a probe SET has a potential
    vote for presence call
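
The voting idea can be sketched as follows. This follows the published description of the MAS 5.0 detection call as far as I know it: each probe pair contributes a discrimination score (PM - MM) / (PM + MM), and a one-sided Wilcoxon signed-rank test asks whether the scores sit above a small threshold tau. The defaults tau = 0.015, alpha1 = 0.04 and alpha2 = 0.06 are assumptions taken from commonly quoted settings, the intensities are invented, and details such as saturated probes are ignored.

```python
def signed_rank_p_greater(diffs: list[float]) -> float:
    """Exact one-sided Wilcoxon signed-rank p-value for H1: median > 0."""
    d = [x for x in diffs if x != 0]            # zero differences carry no sign
    if not d:
        return 1.0
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks2 = [0] * len(d)                        # doubled mid-ranks stay integers
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = (i + 1) + (j + 1)  # 2 * mean of ranks i+1..j+1
        i = j + 1
    observed = sum(r for r, x in zip(ranks2, d) if x > 0)
    # counts[s] = number of sign assignments whose positive rank sum equals s
    counts = {0: 1}
    for r in ranks2:
        nxt = dict(counts)
        for s, n in counts.items():
            nxt[s + r] = nxt.get(s + r, 0) + n
        counts = nxt
    favourable = sum(n for s, n in counts.items() if s >= observed)
    return favourable / 2 ** len(d)


def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return (call, p_value) for one probe set."""
    scores = [(p - m) / (p + m) for p, m in zip(pm, mm)]
    p_value = signed_rank_p_greater([r - tau for r in scores])
    if p_value < alpha1:
        return "Present", p_value
    if p_value < alpha2:
        return "Marginal", p_value
    return "Absent", p_value


# Hypothetical intensities for an 11-pair probe set with PM well above MM.
pm = [1200, 980, 1500, 1100, 1320, 870, 1450, 1010, 990, 1250, 1130]
mm = [300, 410, 280, 350, 500, 330, 420, 390, 310, 450, 370]
print(detection_call(pm, mm))
```

When every probe pair "votes" the same way the p-value reaches its minimum of 1 / 2^n, which is why a probe set with more pairs can support a more confident call.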

18
PM and MM Probes
  • The purpose of each MM probe is to provide a
    direct measure of background and stray-signal
    (perhaps due to cross-hybridization) for its
    perfect-match partner. In most situations the
    signal from each probe-pair is simply the
    difference PM - MM.
  • For some probe-pairs, however, the MM signal is
    greater than the PM value, so we have an
    apparently impossible measure of background.
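
A tiny sketch of the PM - MM calculation shows where the problem appears. The intensities are invented for illustration.

```python
def pm_minus_mm(pm: list[float], mm: list[float]) -> list[float]:
    """Background-corrected signal per probe pair: PM - MM."""
    return [p - m for p, m in zip(pm, mm)]


# Hypothetical intensities for four probe pairs; the third has MM > PM.
pm = [1200.0, 950.0, 400.0, 1500.0]
mm = [300.0, 410.0, 520.0, 280.0]

diffs = pm_minus_mm(pm, mm)
negative = [i for i, d in enumerate(diffs) if d <= 0]  # pairs with no usable signal
print(diffs, negative)
```

A zero or negative difference cannot be log-transformed, so an algorithm has to handle these pairs somehow. The MAS 5.0 documentation describes substituting an adjusted "ideal mismatch" value in such cases; that step is not reproduced here.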

19
Thank goodness for software!!!
  • MAS 5.0 does these calculations for you
  • .CHP file
  • Basic analysis in MAS 5.0, but it won't handle
    replicates
  • Import MAS 5.0 (.CHP) data into other software,
    Genesifter, GCOS, SpotFire, and many others

20
Signal Intensity
  • Following these calculations, the MAS 5.0
    algorithm now has a measure of the signal for
    each probe in a probeset.
  • Other algorithms, e.g. RMA, GCRMA, dChip, PLIER
    and others, have been developed by academic
    teams to improve the precision and accuracy of
    this calculation
  • In our experiment we will use RMA and GCRMA

21
How do we want to analyze this data?
  • Pairwise analysis is most appropriate
  • Control vs. DMSO
  • List of genes that are upregulated or
    downregulated
  • Determine fold up or down cutoffs
  • What is significant?
  • 1.5 fold up/down?
  • 2 fold up/down?
  • 10 fold up/down?
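
The effect of the cutoff choice is easy to see in a small sketch. Gene names and mean signals are made up, and working on the log2 scale is a common convention assumed here rather than something the slides prescribe.

```python
from math import log2


def log2_fold_change(treated_mean: float, control_mean: float) -> float:
    """log2(treated / control): +1 means 2-fold up, -1 means 2-fold down."""
    return log2(treated_mean / control_mean)


def classify(lfc: float, fold_cutoff: float = 2.0) -> str:
    """Label a gene as up, down or unchanged for a given fold cutoff."""
    threshold = log2(fold_cutoff)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


def calls_at(genes: dict[str, tuple[float, float]], cutoff: float) -> dict[str, str]:
    """Classify every gene; values are (control mean, DMSO mean)."""
    return {g: classify(log2_fold_change(t, c), cutoff) for g, (c, t) in genes.items()}


# Hypothetical mean signals: (control, DMSO).
genes = {"geneA": (500.0, 1600.0), "geneB": (1200.0, 1000.0), "geneC": (900.0, 300.0)}
for cutoff in (1.5, 2.0, 10.0):
    print(cutoff, calls_at(genes, cutoff))
```

In this made-up case the gene list is the same at 1.5-fold and 2-fold but empty at 10-fold, which shows how strongly the list depends on the cutoff. Fold change alone says nothing about variability between replicates.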

22
Normalization - clean data
  • Normalizing data allows comparisons ACROSS
    different chips
  • Intensity of fluorescent markers might be
    different from one batch to the other
  • Normalization allows us to compare those chips
    without altering the interpretation of changes in
    GENE EXPRESSION
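
One simple way to make chips comparable is global scaling: multiply every signal on a chip by a constant so that all chips share the same average. The sketch below assumes a plain mean and an arbitrary target of 500; real software may use a trimmed mean and a different target, and the chip values are invented.

```python
from statistics import mean


def scale_to_target(signals: list[float], target: float = 500.0) -> list[float]:
    """Rescale one chip so that its mean signal equals the target value."""
    factor = target / mean(signals)
    return [s * factor for s in signals]


# Hypothetical chips: chip_b was scanned twice as bright as chip_a.
chip_a = [100.0, 200.0, 300.0, 400.0]
chip_b = [200.0, 400.0, 600.0, 800.0]

print(scale_to_target(chip_a))
print(scale_to_target(chip_b))
```

After scaling, the two chips agree gene by gene, so the overall brightness difference no longer looks like a change in expression. Ratios between genes on the same chip are untouched.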

23
  • Why Normalize Data?
  • The experimental goal is to identify biological
    variation (expression changes between samples)
  • Technical variation can hide the real data
  • Unavoidable systematic bias should be recognized
    and corrected
  • Normalization is necessary to effectively make
    comparisons between chips, and sometimes within
    a single chip.
  • There are different methods of normalization;
    the assumptions about where variation exists
    will determine the normalization technique used.
  • Always look at data before and after
    normalization
  • Spike in controls can help show which method may
    be best
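
As one example of a normalization method, here is a minimal sketch of quantile normalization, the approach RMA is documented to use. It assumes every chip should have the same overall intensity distribution. The values are invented and ties are handled naively (by position), which a production implementation would treat more carefully.

```python
def quantile_normalize(chips: list[list[float]]) -> list[list[float]]:
    """Give every chip the same intensity distribution.

    The shared distribution is the mean of the sorted values across chips;
    each value is replaced by the reference value of the same rank.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    reference = [sum(col) / len(chips) for col in zip(*sorted_chips)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # indices from low to high
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical signals for three genes on two chips.
before = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
after = quantile_normalize(before)
print(before)
print(after)
```

Printing the data before and after, as done here, is the smallest version of the advice to always look at both. The rank order of genes within each chip is preserved while the distributions become identical.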

24
Caveat
  • There is NO standard way to analyze microarray
    data
  • Still figuring out how to get the best answers
    from microarray experiments
  • Best to combine knowledge of biology, statistics,
    and computers to get answers

25
Venn Diagrams
MAS 5.0
GCRMA
RMA
26
Data processing is completed, now what? Fold
change, ANOVA, data filtering
35
Where are we now?
  • Ran analysis, output is a GENE LIST
  • List indicates what genes are up or down
    regulated
  • p values for t-test
  • Graphs of signal levels
  • Absolute numbers not as important here as the
    trends you see
  • Now what????

36
What is the first set of genes on our chips that
will be filtered out?
37
Follow the links
  • Click on a gene
  • Find links to other databases
  • Follow links to discover what the protein does
  • Now the fun part begins.

38
Back to Biology
  • Do the changes you see in gene expression make
    sense BIOLOGICALLY?
  • If they don't make sense, can you hypothesize as
    to why those genes might be changing?
  • Leads to many, many more experiments

39
The Gene Ontologies
A Common Language for Annotation of Genes from
Yeast, Flies and Mice
and Plants and Worms
and Humans
and anything else!
40
Gene Ontology Objectives
  • GO represents concepts used to classify specific
    parts of our biological knowledge
  • Biological Process
  • Molecular Function
  • Cellular Component
  • GO develops a common language applicable to any
    organism
  • GO terms can be used to annotate gene products
    from any species, allowing comparison of
    information across species

41
Srinija Srinivasan, Chief Ontologist, Yahoo!
"The ontology. Dividing human knowledge into a
clean set of categories is a lot like trying to
figure out where to find that suspenseful black
comedy at your corner video store. Questions
inevitably come up, like are Movies part of Art
or Entertainment? (Yahoo! lists them under the
latter.)" - Wired Magazine, May 1996
42
The 3 Gene Ontologies
  • Molecular Function: elemental activity/task
  • the tasks performed by individual gene products;
    examples are carbohydrate binding and ATPase
    activity
  • Biological Process: biological goal or objective
  • broad biological goals, such as mitosis or purine
    metabolism, that are accomplished by ordered
    assemblies of molecular functions
  • Cellular Component: location or complex
  • subcellular structures, locations, and
    macromolecular complexes; examples include
    nucleus, telomere, and RNA polymerase II
    holoenzyme

43
Example Gene Product: hammer
Function (what)            Process (why)
Drive nail (into wood)     Carpentry
Drive stake (into soil)    Gardening
Smash roach                Pest Control
Clown's juggling object    Entertainment
44
Biological Examples
Molecular Function
Biological Process
Cellular Component
45
Validation
  • Not enough to just do microarrays
  • Usually validate microarray results via some
    other technique
  • RT-PCR
  • TaqMan
  • Northern analysis
  • Protein level analysis
  • No technique is perfect

46
Yeast Genome and Data Mining
47
Dynamic Nature of Yeast Genome
  • eORF: essential
  • kORF: known
  • hORF: homology identified
  • shORF: short
  • tORF: transposon identified
  • qORF: questionable
  • dORF: disabled
  • First published sequence claimed 6274 genes, a
    number that has been revised many times. Why?
48
6603 4373 1410 820
The Affy detection oligonucleotide sequences are
frozen at the time of synthesis, how does this
impact downstream data analysis?
49
Terms, Definitions, IDs
  • term: MAPKKK cascade (mating sensu
    Saccharomyces)
  • goid: GO:0007244
  • definition: MAPKKK cascade involved in
    transduction of mating pheromone signal, as
    described in Saccharomyces
  • definition_reference: PMID:9561267
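
The fields on this slide map naturally onto a small record type. The sketch below is only an illustration: the class name and the validation rule are assumptions, with GO identifiers taken to have the usual form "GO:" followed by seven digits.

```python
import re
from dataclasses import dataclass

# GO identifiers are assumed to look like "GO:" plus seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class GOTerm:
    """One Gene Ontology term with the fields shown on the slide."""
    goid: str
    term: str
    definition: str
    definition_reference: str

    def __post_init__(self) -> None:
        if not GO_ID_PATTERN.match(self.goid):
            raise ValueError(f"not a GO identifier: {self.goid!r}")


mating_cascade = GOTerm(
    goid="GO:0007244",
    term="MAPKKK cascade (mating sensu Saccharomyces)",
    definition=(
        "MAPKKK cascade involved in transduction of mating pheromone "
        "signal, as described in Saccharomyces"
    ),
    definition_reference="PMID:9561267",
)
print(mating_cascade.goid, "-", mating_cascade.term)
```

Because the identifier, not the wording of the term, is what stays stable, gene lists are usually linked to GO through the goid field.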
50
SGD
53
SGD public microarray data sets available for
public query
54
Homework
  1. Go to http://www.yeastgenome.org/ and find 3
    candidate genes of known function and one of
    undefined function that you might predict to be
    altered by DMSO treatment
  2. What GO biological processes and molecular
    mechanisms are associated with your candidate
    genes?
  3. Where, subcellularly, does the protein reside?
  4. What other proteins are known or inferred to
    interact with yours? How was this interaction
    determined? Is this a genetic or physical
    interaction?
  5. Find the expression of at least one of your known
    genes in another publicly deposited microarray
    data set.
  6. What is the name of the data set, and how did
    you find it?
  7. What is the largest Fold change observed for this
    gene in the public study?
  8. Now that you are microarray technology experts,
    can you give me 3 reasons why the observed
    transcript level difference may not be confirmed
    through a second technology like RT-qPCR?

55
Suggested Reading