Title: A Routine Approach
1A Routine Approach to Quality Control
Peter Haberl 19. 11. 2001
2Content
The GDE Controller
- Workflow
- Gradients
- Distortions
- Local defects
- Condensing
Playing with negative AvgDiff values
3GDE Controller
Workflow
... is part of the GD Expressionist™ system
4GDE Controller
Workflow
... extends the conventional data flow
5GDE Controller
Workflow
login
options and thresholds
available chip layouts (.CDF files)
available experiments (.CEL files)
6GDE Controller
Workflow
7GDE Controller
Gradients
Gradients
incomplete washing? thermal effects? ... ?
8GDE Controller
Gradients
Idea *) (single chip version)
[Figure: plot of ln(counts) versus ln(intensity)]
*) developed in discussions with H. Seidel (Schering, Berlin)
9GDE Controller
Gradients
all sector histograms after first step
all sector histograms after third step
scale factor a(x,y) in first step
offset b(x,y) in first step
10GDE Controller
Gradients
11GDE Controller
Gradients
Result of Gradient Correction
original
corrected
heat map of the scale factor a(x,y)
12GDE Controller
Gradients
Further example of Gradient Correction
corrected
heat map of the scale factor
original
13GDE Controller
Distortions
Distortions
A log-log plot of coding features (i.e. PM and MM)
can show a nonlinear relationship when compared to
the features of a reference chip. One reason can be
that chips from different chip lots are combined
into a series. (Again, the reference chip can only
be constructed if enough similar chips are
available.)
14GDE Controller
Distortions
Idea
- divide the reference signal region into stripes containing the same number of points (red lines)
- in each stripe, determine the median of the experiment signals (or, equivalently, the point of maximum density)
- force this median line to be the diagonal of the new point cloud
- this determines the (intensity-dependent) transformation
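The stripe-and-median idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the GDE Controller implementation: the function name, the number of stripes, the use of log signals as input, and the linear interpolation between stripe medians are all hypothetical choices.

```python
import numpy as np

def correct_distortion(ref, exp, n_stripes=20):
    """Map experiment log signals so their median line follows the diagonal.

    ref, exp: 1-D arrays of log signals for the same features on the
    reference chip and the experiment chip. n_stripes is an assumed value.
    """
    ref = np.asarray(ref, dtype=float)
    exp = np.asarray(exp, dtype=float)
    # Sort by reference signal and cut into stripes with (nearly) equal point counts.
    order = np.argsort(ref)
    stripes = np.array_split(order, n_stripes)
    # One point of the median line per stripe.
    ref_med = np.array([np.median(ref[s]) for s in stripes])
    exp_med = np.array([np.median(exp[s]) for s in stripes])
    # np.interp expects increasing x values; enforcing a non-decreasing
    # median line is an assumption of this sketch.
    exp_med = np.maximum.accumulate(exp_med)
    # Intensity-dependent transformation: send each experiment median to the
    # corresponding reference median, interpolating linearly in between.
    return np.interp(exp, exp_med, ref_med)
```

Values outside the range of the stripe medians are clamped to the outermost medians by `np.interp`; a real implementation would need an explicit rule for the tails.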
15GDE Controller
Distortions
Result of Distortion Correction
impossible to correct
16GDE Controller
Reference Chip
Both gradient and distortion detection/correction
require the concept of a reference chip.
normalized set
reference chip
17GDE Controller
Local Defects
Local defects
There are local defects which are already visible
in a global chip view
view of outlier locations
Aim: Can we reliably detect smaller local
defects, automatically if possible?
18GDE Controller
Local Defects
Idea
19GDE Controller
Local Defects
actual defects
differential regulation
20GDE Controller
Local Defects
This method can identify defects which would be
hard to find ...
21GDE Controller
Local Defects
... or invisible, even in a zoomed view
22GDE Controller
Local Defects
differential regulation
For old (row-wise spotted) chips, there is the
danger that differentially expressed genes are
detected as chip artefacts. Application of
pattern search algorithms can solve this problem.
23GDE Controller
Local Defects
Further example of a local defect
24GDE Controller
Local Defects
Defects can have a certain spatial extent
25GDE Controller
Local Defects
Most frequent structures
26GDE Controller
Local Defects
... and others
27GDE Controller
Local Defects
An interactive chip viewer allows the user to
- view identified mask areas
- zoom and find out which genes are affected by masking
- manually edit the masked areas
28GDE Controller
Workflow
reporting
export to a database, into analysis software or as
.CEL files
choose between different condensing algorithms:
MAS4, MAS5, GeneData (trimmed mean of log(PM))
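As an illustration of the third option, a trimmed mean of log(PM) for a single probe set might look like the sketch below. The function name and the trimming fraction are assumptions; the slide does not state which fraction is trimmed.

```python
import numpy as np

def trimmed_mean_log_pm(pm, trim=0.2):
    """Condense one probe set to a single value: trimmed mean of log(PM).

    pm:   PM intensities of one probe set (positive numbers).
    trim: fraction dropped from EACH tail; 0.2 is an assumed value.
    """
    logs = np.sort(np.log(np.asarray(pm, dtype=float)))
    k = int(trim * logs.size)  # number of values dropped per tail
    # Fall back to the plain mean if trimming would leave nothing.
    kept = logs[k:logs.size - k] if logs.size > 2 * k else logs
    return kept.mean()
```

With five PM values and `trim=0.2`, the smallest and the largest log value are dropped, so a single saturated or defective probe does not move the result.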
29Playing with negative AvgDiff values
log-log plot
replicates
large differential expression
correlation of large values is visible
only positive values can be displayed
30Playing with negative AvgDiff values
linear-linear plot
replicates
large differential expression
negative values can be displayed
poor resolution for small values
large values appear scattered
31Playing with negative AvgDiff values
cube-root plot
replicates
display of positive and negative values
damping at large values
zero density regions (artefact)
32Playing with negative AvgDiff values
lin-log transformation
$y = \mathrm{sign}(x)\,\ln(1 + |x|)$
[Figure: curves $y = x$, $y = \mathrm{sign}(x)\,\ln(1 + |x|)$ and $y = \ln(x)$]
interpolates smoothly between linear (for small
values) and logarithmic (for large values)
behaviour
damping of high values
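A minimal sketch of the lin-log transformation, written here as sign(x) · ln(1 + |x|), shows the two limits numerically. The function name `linlog` is an assumption made for this example.

```python
import numpy as np

def linlog(x):
    """Lin-log transform y = sign(x) * ln(1 + |x|), defined for all real x."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))

# Small values are passed through almost unchanged (linear regime),
# large values are compressed logarithmically, and the sign is preserved.
small = linlog(1e-6)
large = linlog(1e6)
negative = linlog(-1e6)
```

`np.log1p` is used instead of `np.log(1 + ...)` so that the linear regime stays accurate for very small |x|.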
33Playing with negative AvgDiff values
lin-log plot
replicates
large differential expression
A good choice is x = AvgDiff / Target, i.e.
the target intensity sets the scale. Lines of
constant factors are shown in blue (2), red (5)
and green (10).
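The lines of constant factor can be computed directly from the transformation. The sketch below assumes the lin-log form sign(x) · ln(1 + |x|) and a hypothetical target intensity; the function names are illustrative only.

```python
import numpy as np

def linlog(x):
    """Lin-log transform y = sign(x) * ln(1 + |x|)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))

def constant_factor_line(factor, avgdiff, target):
    """Plot coordinates of the line AvgDiff_2 = factor * AvgDiff_1.

    avgdiff: AvgDiff values along the first axis.
    target:  target intensity that sets the scale (value chosen by the user).
    """
    x = np.asarray(avgdiff, dtype=float) / target
    return linlog(x), linlog(factor * x)

# Factor-2 line for a hypothetical target intensity of 1000.
u, v = constant_factor_line(2.0, np.array([1e-3, 1e9]), target=1000.0)
```

Far above the target, such a line runs parallel to the diagonal at a distance of ln(factor); close to zero it becomes a straight line through the origin with slope equal to the factor.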
34Playing with negative AvgDiff values
The lin-log plots allow us to look at positive and
negative AvgDiff values simultaneously. But why
would we want to look at the negatives at all?
Consider the following experiment: construct
fake .CEL files in which PM and MM are
interchanged in every pair, and condense them
with the old Affymetrix algorithm (ignoring AbsCall).
Amusing observation: if one ignores that the
scale factor becomes negative (MAS does not:
"Failed to analyze due to invalid Scale Factor"),
the old (MAS4) algorithm would be invariant
under PM ↔ MM!
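The invariance can be checked on synthetic numbers. The sketch below is a deliberately simplified stand-in for the old algorithm, not the MAS4 code: AvgDiff is taken as the plain mean of PM − MM per probe set, the scale factor as the target divided by the chip-wide mean AvgDiff, and all outlier handling is omitted. The data and the target value are made up.

```python
import numpy as np

def scaled_avgdiff(pm, mm, target=1000.0):
    """Simplified AvgDiff condensing and scaling (assumption: no trimming).

    pm, mm: arrays of shape (n_probesets, n_pairs).
    Returns the scaled AvgDiff per probe set and the scale factor.
    """
    avgdiff = (pm - mm).mean(axis=1)        # AvgDiff per probe set
    scale_factor = target / avgdiff.mean()  # bring the chip average to the target
    return scale_factor * avgdiff, scale_factor

rng = np.random.default_rng(0)
pm = rng.uniform(200.0, 2000.0, size=(50, 16))  # synthetic PM intensities
mm = rng.uniform(100.0, 1000.0, size=(50, 16))  # synthetic MM intensities

original, sf = scaled_avgdiff(pm, mm)
swapped, sf_swapped = scaled_avgdiff(mm, pm)    # PM and MM interchanged

print(sf > 0, sf_swapped < 0)           # the scale factor changes sign ...
print(np.allclose(original, swapped))   # ... the scaled values do not
```

Swapping PM and MM flips the sign of every AvgDiff and of the scale factor, so the two signs cancel in the scaled values.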
35Playing with negative AvgDiff values
Original data: the three-tissue dataset,
3 groups with 6 replicates each
perfect group separation
within replicate groups
across replicate groups
36Playing with negative AvgDiff values
PM ↔ MM data: These are log-log plots of
negative AvgDiffs. The good correlation at high
values indicates that these numbers
are reproducible. The difference
between replicate groups is not so obvious, but
...
37Playing with negative AvgDiff values
... clustering again results in a complete group
separation.
Take-home message: The mismatches carry
information which can be measured reproducibly
and can be used (at least) for pattern comparisons.