Microarray Data Processing for Affymetrix arrays

1
Microarray Data Processing for Affymetrix arrays
  • Ben Bolstad
  • Biostatistics
  • University of California, Berkeley
  • www.stat.berkeley.edu/bolstad

2
Goals of this session
  • To understand and use some of the tools for
    exploring and pre-processing Affymetrix data.
  • This session has two parts:
    • Theory: discussion of methodology
    • Hands-on experimentation with BioC (Bioconductor) tools

3
Affymetrix GeneChip arrays
  • High density oligonucleotide array technology as
    developed by Affymetrix
  • www.affymetrix.com

Overview images courtesy of Affymetrix unless
otherwise specified
4
Probes and Probesets
5
Two Probe Types
Reference Sequence
  • TAGGTCTGTATGACAGACACAAAGAAGATG
  • CAGACATAGTGTCTGTGTTTCTTCT
  • CAGACATAGTGTGTGTGTTTCTTCT

PM: the Perfect Match
MM: the Mismatch
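A small sketch of how the two probe types relate. It assumes the standard Affymetrix design, which this slide does not spell out: each MM probe is the 25-mer PM probe with its middle (13th) base replaced by its complement. The function name `mismatch_probe` is hypothetical.

```python
# Complementary base for each nucleotide.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm: str) -> str:
    """Build the MM probe from a 25-mer PM probe by complementing the middle base."""
    if len(pm) != 25:
        raise ValueError("expected a 25-mer PM probe")
    mid = len(pm) // 2  # index 12, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


pm = "CAGACATAGTGTCTGTGTTTCTTCT"
print(mismatch_probe(pm))  # CAGACATAGTGTGTGTGTTTCTTCT
```

Applied to the PM sequence on this slide, the sketch reproduces the MM sequence shown.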
6
Constructing the Chip
Source: Lipshutz et al. (1999), Nature Genetics
Supplement, The Chipping Forecast
7
Focusing on a Single GeneChip Cell Location
8
Sample Preparation
9
Hybridization to the Chip
10
The Chip is Scanned
11
Chip dat file: checkered board, close-up, pixel
selection
12
Chip cel file: checkered board
Courtesy F. Colin
13
Pre-processing Affymetrix Microarrays
  • Take the 500K probe intensities and turn them
    into 15K gene expression measures
  • Computing expression measures:
    • Background adjustment
    • Normalization
    • Summarization
  • I will discuss the steps of the RMA algorithm in
    more detail

14
Background/Signal Adjustment
  • A method that does some or all of the following:
    • Corrects for background noise and processing effects
    • Adjusts for cross-hybridization
    • Adjusts estimated expression values to fall on
      the proper scale
  • Probe intensities are used in background
    adjustment to compute the correction (unlike cDNA
    arrays, where the area surrounding the spot might be used)

15
RMA Background Approach
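  • Convolution model: the observed PM intensity is the
    sum of an exponentially distributed signal and
    normally distributed noise, independent of each other

$$O = S + N, \qquad S \sim \mathrm{Exp}(\alpha), \qquad N \sim \mathcal{N}(\mu, \sigma^2)$$

  • Observed: O; Signal: S; Noise: N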
16
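Correction is given by

$$E[S \mid O = o] = a + b\,\frac{\phi(a/b) - \phi\big((o - a)/b\big)}{\Phi(a/b) + \Phi\big((o - a)/b\big) - 1}$$

where $a = o - \mu - \sigma^2\alpha$ and $b = \sigma$; $\mu$ and $\sigma$ are the mean and standard deviation of the normal noise, $\alpha$ is the rate of the exponential signal, and $\phi$, $\Phi$ are the standard normal density and distribution function.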
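The RMA correction replaces each observed PM intensity o by E[S | O = o] under the convolution model (exponential signal with rate α, normal noise with mean μ and standard deviation σ). A minimal sketch follows, assuming μ, σ and α have already been estimated; RMA estimates them from the PM distribution, which is not shown here. The name `rma_bg_correct` is hypothetical, not the affy function.

```python
import math


def _phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _Phi(x: float) -> float:
    """Standard normal CDF, via erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def rma_bg_correct(o: float, mu: float, sigma: float, alpha: float) -> float:
    """Expected signal given observed intensity o (RMA convolution model)."""
    a = o - mu - sigma ** 2 * alpha
    b = sigma
    num = _phi(a / b) - _phi((o - a) / b)
    # Phi(a/b) + Phi((o-a)/b) - 1 rewritten as Phi(a/b) - Phi(-(o-a)/b):
    # algebraically identical, but avoids cancellation when both terms are near 0.
    den = _Phi(a / b) - _Phi(-(o - a) / b)
    # This is the mean of a N(a, b^2) distribution truncated to [0, o].
    return a + b * num / den
```

Because the result is the mean of a distribution truncated to [0, o], the corrected value stays positive even when o is below the noise mean, unlike simple subtraction of a background estimate.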
17
Other background correction methods
  • MAS 5.0
    • Location-specific gridding
    • Subtraction of mismatch
  • GCRMA
    • Uses sequence information to derive a background
      adjustment

18
Normalization
  • "Non-biological factors can contribute to the
    variability of data ... In order to reliably
    compare data from multiple probe arrays,
    differences of non-biological origin must be
    minimized."[1]
  • Normalization is a process of reducing unwanted
    variation across chips. It may use information
    from multiple chips.
  • [1] GeneChip 3.1 Expression Analysis Algorithm
    Tutorial, Affymetrix technical support

19
Non-Biological Variability
5 scanners for 6 dilution groups
20
Non-linear normalization needed
(Figure panels: Unnormalized; Scaled; A Non-linear Normalization)
21
Quantile Normalization
  • Normalize so that the quantiles of each chip are
    equal. Simple and fast algorithm. Goal is to
    give same distribution to each chip.

(Figure: Original Distribution and Target Distribution)
22
Sort each column of the original matrix
Take averages across rows
Set the average as the value for all elements in the row
Unsort each column of the matrix to its original order
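The four quantile normalization steps can be sketched in plain Python, assuming each chip's probe intensities are stored as one column (a list of equal length). Ties are broken by original position, which is a simplification; `quantile_normalize` is a hypothetical name, not the Bioconductor function.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of chips (columns) of equal length."""
    n = len(chips[0])
    if any(len(c) != n for c in chips):
        raise ValueError("all chips must have the same number of probes")
    # 1. Sort each column, remembering the original row of each sorted value.
    orders = [sorted(range(n), key=lambda i: c[i]) for c in chips]
    # 2. Average across rows of the sorted matrix (one mean per rank).
    means = [
        sum(c[o[k]] for c, o in zip(chips, orders)) / len(chips)
        for k in range(n)
    ]
    # 3 and 4. Give every element of rank k the k-th mean, then unsort
    # by writing it back to its original row.
    out = []
    for o in orders:
        col = [0.0] * n
        for k, i in enumerate(o):
            col[i] = means[k]
        out.append(col)
    return out


print(quantile_normalize([[1, 3], [4, 2]]))  # [[1.5, 3.5], [3.5, 1.5]]
```

After normalization every chip contains exactly the same set of values (the row means), which is what "give same distribution to each chip" means.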
23
It Reduces Variability
(Figure axes: Fold change; Expression Values)
Also no serious bias effects. For more, see
Bolstad et al. (2003).
24
Other normalization methods
  • Scaling
  • Non-linear with baseline
  • Cyclic Loess
  • Contrast
  • VSN

25
Summarization
  • Problem: calculating gene expression values.
  • How do we reduce the 11-20 probe intensities for
    each probeset to a gene expression value?
  • Our approach:
    • RMA: a robust multi-chip linear model fit on the
      log scale

26
The RMA Model
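$$y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$$

  • where
  • $y_{ij}$ is the background-adjusted, normalized
    log2 PM intensity of probe i on array j
  • $\alpha_i$ is a probe effect, $i = 1, \ldots, I$
  • $\beta_j$ is the chip effect ($\mu + \beta_j$ is the
    log2 gene expression on array j), $j = 1, \ldots, J$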

27
Median Polish Algorithm
Imposes Constraints
Sweep Rows
Sweep Columns
Iterate
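A minimal sketch of median polish for one probeset, assuming a matrix with probes as rows (i) and arrays as columns (j) of pre-processed log2 PM values. It follows the row-sweep/column-sweep scheme of R's `medpolish`, which imposes the constraint that the row and column effects have median zero; the stopping rule used here (all sweep medians below `tol`) is a simplifying assumption. The RMA-style expression estimate for array j is the overall effect plus the column effect.

```python
from statistics import median


def median_polish(y, max_iter=10, tol=1e-6):
    """Fit y[i][j] = overall + row[i] + col[j] + resid[i][j] by median polish."""
    n_rows, n_cols = len(y), len(y[0])
    resid = [[float(v) for v in row] for row in y]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep rows: move each row's median into the row effect.
        rmed = [median(row) for row in resid]
        for i in range(n_rows):
            resid[i] = [v - rmed[i] for v in resid[i]]
            row_eff[i] += rmed[i]
        m = median(col_eff)  # keep column effects centred at median 0
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep columns: move each column's median into the column effect.
        cmed = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            for j in range(n_cols):
                resid[i][j] -= cmed[j]
        col_eff = [c + d for c, d in zip(col_eff, cmed)]
        m = median(row_eff)  # keep row effects centred at median 0
        row_eff = [r - m for r in row_eff]
        overall += m
        # Iterate until a full pass moves nothing.
        if max(abs(v) for v in rmed + cmed) < tol:
            break
    return overall, row_eff, col_eff, resid


overall, probe_eff, array_eff, resid = median_polish([[5, 15], [6, 16], [7, 17]])
print([overall + b for b in array_eff])  # [6.0, 16.0]
```

Every sweep preserves the identity y = overall + row + column + residual, so the loop only redistributes the data between the effects; using medians rather than means keeps single outlying probes from dragging the estimates.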
28
Other summarization approaches
  • Single chip
    • AvDiff (Affymetrix): no longer recommended for
      use due to many flaws
    • MAS 5.0 (Affymetrix): uses a one-step Tukey
      biweight to combine the probe intensities on the
      log scale
  • Multiple chip
    • MBEI (Li-Wong, dChip): a multiplicative model on
      the natural scale
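A sketch of a one-step Tukey biweight, the kind of estimator MAS 5.0 uses to combine log-scale probe values. The tuning constants c = 5 and eps = 0.0001 are the values I recall from Affymetrix's algorithm description and should be treated as an assumption; the MAS 5.0 preprocessing of the inputs (ideal-mismatch adjustment) is not reproduced. `tukey_biweight` is a hypothetical name.

```python
from statistics import median


def tukey_biweight(x, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate of log-scale values x.

    c and eps are assumed defaults, not confirmed by the source slides.
    """
    m = median(x)
    s = median(abs(v - m) for v in x)  # median absolute deviation
    num = den = 0.0
    for v in x:
        u = (v - m) / (c * s + eps)  # standardized distance from the median
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # bisquare weight
        num += w * v
        den += w
    return num / den  # a single weighted mean: no iteration ("one step")
```

Values far from the median (|u| ≥ 1) get weight zero, so one wild probe such as the 100 in `[1, 2, 3, 100]` is ignored rather than pulling the estimate up as it would for a plain mean.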

29
(No Transcript)
30
RMA mostly does well in practice
(Figure: Detecting Differential Expression, RMA vs. MAS 5.0;
annotation: not noisy in low intensities)
31
One Drawback
(Figure: RMA vs. MAS 5.0)
Some fixes for this are being developed; see GCRMA
(Irizarry and Wu, JHU).
32
For more comparisons, see affycomp.
33
Probe Level Modelling
  • Robust regression using M-estimation
  • In this talk, we will use Huber's influence
    function. The software handles many more.
  • The fitting algorithm is IRLS (iteratively
    reweighted least squares), with weights that
    depend on the current residuals
  • Software for fitting such models is part of the
    affyPLM package of Bioconductor
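To show the IRLS idea with Huber weights in isolation, here is a deliberately simplified sketch that fits a single location parameter instead of the full probe-plus-array model. The tuning constant k = 1.345 and the MAD-based scale (held fixed after the first step) are assumptions of this sketch, not necessarily affyPLM's settings; `huber_weight` and `huber_location` are hypothetical names.

```python
from statistics import median


def huber_weight(u, k=1.345):
    """Huber weight: 1 inside [-k, k], k/|u| outside."""
    a = abs(u)
    return 1.0 if a <= k else k / a


def huber_location(x, k=1.345, max_iter=50, tol=1e-8):
    """Huber M-estimate of location by IRLS (scale fixed at the rescaled MAD)."""
    mu = median(x)
    s = median(abs(v - mu) for v in x) / 0.6745  # robust scale estimate
    if s == 0:
        return float(mu)
    for _ in range(max_iter):
        # Weights depend on the current residuals (v - mu) / s.
        w = [huber_weight((v - mu) / s, k) for v in x]
        new = sum(wi * v for wi, v in zip(w, x)) / sum(w)  # weighted LS step
        if abs(new - mu) < tol:
            return new
        mu = new
    return mu
```

Each iteration is an ordinary weighted least-squares fit; only the weights change. In the full PLM the same loop re-solves the weighted two-way model for all probe and array effects at once.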

34
We Will Focus on the Summarization PLM
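  • Array effect model:

$$y_{ij} = \beta_j + \alpha_i + \epsilon_{ij}$$

  • With constraint:

$$\sum_{i=1}^{I} \alpha_i = 0$$

where $y_{ij}$ is the pre-processed log PM intensity, $\beta_j$ is the array effect and $\alpha_i$ is the probe effect.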
35
Quality Assessment using PLM
  • PLM quantities useful for assessing chip quality
  • Weights
  • Residuals
  • Standard Errors
  • Expression values relative to median chip

36
Pseudo-chip images
(Figure panels: Weights; Residuals; Positive Residuals; Negative Residuals)
37
An Image Gallery
(Example images: Crop Circles; Tricolor; Ring of Fire)
http://www.stat.berkeley.edu/bolstad/PLMImageGallery/
38
NUSE Plots
  • NUSE: Normalized Unscaled Standard Errors
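NUSE standardizes, for each probeset, the standard error of each array's estimated effect by the median standard error for that probeset across arrays, so a typical array sits near 1 and a poor-quality array stands out with elevated values. The sketch assumes the standard errors from a PLM fit are already available as a probesets × arrays nested list; `nuse` and `per_array_median` are hypothetical helpers.

```python
from statistics import median


def nuse(se):
    """se[g][j]: standard error for probeset g on array j."""
    # Divide each row by its median so the median SE per probeset is 1.
    return [[s / median(row) for s in row] for row in se]


def per_array_median(mat):
    """Median of each column (array) across probesets."""
    return [median(col) for col in zip(*mat)]


values = nuse([[1.0, 2.0, 4.0], [2.0, 2.0, 2.0]])
print(values)                    # [[0.5, 1.0, 2.0], [1.0, 1.0, 1.0]]
print(per_array_median(values))  # [0.75, 1.0, 1.5]
```

A NUSE plot is a box plot of each array's column of these values; in this toy example the third array would be the one to inspect.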

39
RLE Plots
Relative Log Expression
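RLE compares each array's log expression value for a probeset with the median value of that probeset across all arrays (the "expression values relative to median chip" listed among the PLM quality quantities). A minimal sketch, assuming log2 expression values in a probesets × arrays nested list; `rle` is a hypothetical name.

```python
from statistics import median


def rle(expr):
    """expr[g][j]: log2 expression of probeset g on array j."""
    # Subtract each probeset's median across arrays.
    return [[e - median(row) for e in row] for row in expr]


print(rle([[7.0, 8.0, 10.0], [5.0, 5.0, 6.0]]))
# [[-1.0, 0.0, 2.0], [0.0, 0.0, 1.0]]
```

If most genes are not differentially expressed, each array's RLE box plot should be centred near 0 with a small spread; a shifted or wide box suggests a problem with that array.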
40
A word of acknowledgement
Some slides: Terry Speed, Francois Colin, Rafael
Irizarry