HWW Gene Expression Experiments: How? Why? What's the problem?


Provided by: yoseph3

1
HWW Gene Expression Experiments
How? Why? What's the problem?
2
High Throughput Experiments
Functional Genomics
Bioinformatics
3
DNA Hybridization

The principle: have two denatured DNA strands bond together, then measure the amount of double strand formed (fluorescent dye, radioactive label)
Traditional: Southern/Northern blots (the Western blot is the antibody-based analogue for proteins)
The great advance: microarray DNA chips, made possible by automation, materials engineering and computer-aided methods (including algorithmic solutions)

4
History

cDNA microarrays have evolved from Southern
blots, with clone libraries gridded out on nylon
membrane filters being an important and still
widely used intermediate. Things took off with
the introduction of non-porous solid supports,
such as glass - these permitted miniaturization -
and fluorescence based detection. Currently,
about 20,000 cDNAs can be spotted onto a
microscope slide. The other technology, from Affymetrix,
can produce arrays of 100,000
oligonucleotides on a silicon chip.

5
THE PROCESS
Building the Chip
MASSIVE PCR
PCR PURIFICATION and PREPARATION
PREPARING SLIDES
PRINTING
POST PROCESSING
Preparing RNA
CELL CULTURE AND HARVEST
RNA ISOLATION
cDNA PRODUCTION
Hybing the Chip
PROBE LABELING
ARRAY HYBRIDIZATION
DATA ANALYSIS
6
Building the Chip
MASSIVE PCR
Full yeast genome: 6,500 reactions
PCR PURIFICATION and PREPARATION
IPA precipitation, EtOH washes, 384-well format
PREPARING SLIDES
Polylysine coating for adhering PCR products to
glass slides
PRINTING
The arrayer: a high-precision spotting device
capable of printing 10,000 products in 14 hrs,
with a plate change every 25 mins
POST PROCESSING
Chemically converting the positively charged polylysine
surface to prevent non-specific hybridization
7
Preparing RNA
CELL CULTURE AND HARVEST
Designing experiments to profile
conditions/perturbations/mutations under carefully
controlled growth conditions
RNA ISOLATION
RNA yield and purity depend on the system.
PolyA isolation is preferable but total RNA is
usable. Two RNA samples are hybridized per chip.
cDNA PRODUCTION
Single strand synthesis or amplification of RNA
can be performed. cDNA production includes
incorporation of Aminoallyl-dUTP.
8
Hybing the Chip
PROBE LABELING
Two RNA samples are labelled with Cy3 or Cy5
monofunctional dyes via chemical coupling to
AA-dUTP. Samples are purified using a PCR
cleanup kit.
ARRAY HYBRIDIZATION
Cy3- and Cy5-labelled samples are simultaneously
hybridized to the chip. Hybs are performed for 5-12
hours and then chips are washed.
DATA ANALYSIS
Ratio measurements are determined by quantifying
the Cy3 and Cy5 fluorescence signals (scanned at
532 nm and 635 nm). Data are uploaded to the appropriate
database where statistical and other analyses can
then be performed.
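The ratio measurement mentioned on this slide can be sketched as follows. This is only a minimal illustration, assuming each spot already has foreground and background intensities for both channels; the function name `log2_ratio` and the `floor` parameter are hypothetical and not part of any scanner software.

```python
import math

def log2_ratio(cy5_fg, cy5_bg, cy3_fg, cy3_bg, floor=1.0):
    """Background-corrected log2(Cy5/Cy3) ratio for one spot.

    `floor` (an assumption of this sketch) keeps corrected intensities
    positive so the logarithm is always defined.
    """
    red = max(cy5_fg - cy5_bg, floor)
    green = max(cy3_fg - cy3_bg, floor)
    return math.log2(red / green)

# Example: the Cy5 signal is twice the Cy3 signal after background correction.
print(log2_ratio(1100, 100, 600, 100))
```

A log2 scale is a common choice because two-fold up- and down-regulation become symmetric (+1 and -1).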
9
Printing Microarrays

Print Head
Plate Handling
XYZ positioning
Repeatability and accuracy
Resolution
Environmental Control
Humidity
Dust
Instrument Control
Sample Tracking Software

10
Ngai Lab arrayer, UC Berkeley
11
Microarray Gridder
12
Printing Approaches

Non-contact
Piezoelectric dispenser
Syringe-solenoid ink-jet dispenser
Contact (using rigid pin tools, similar to filter
array)
Tweezer
Split pin
Micro spotting pin

13
Micro Spotting pin
15
Practical Problems

Surface chemistry: an uneven surface may lead to
high background.
Dipping the pin into a large volume -> pre-printing is needed
to drain off excess sample.
Spot variation can be due to mechanical
differences between pins. Pins can become clogged
during the printing process.
Spot size and density depend on surface and
solution properties.
Pins need good washing between samples to prevent
sample carryover.

16
Post Processing Arrays

Protocol for Post Processing Microarrays
Hydration/Heat Fixing
1. Pick out about 20-30 slides to be processed.
2. Determine the correct orientation of slide,
and if necessary, etch label on lower left corner
of array side
3. On back of slide, etch two lines above and
below center of array to designate array area
after processing
4. Pour 100 ml 1X SSC into hydration tray and
warm on slide warmer at medium setting
5. Set slide array side down and observe spots
until proper hydration is achieved.
6. Upon reaching proper hydration, immediately
snap dry slide
7. Place slides in rack.

17
Practical Problems 1

Comet Tails
Likely caused by insufficiently rapid immersion
of the slides in the succinic anhydride blocking
solution.

18
Practical Problems 2
19
Practical Problems 3

High Background
2 likely causes
Insufficient blocking.
Precipitation of the labeled probe.
Weak Signals

20
Practical Problems 4
Spot overlap. Likely cause: too much
rehydration during post-processing.
21
Practical Problems 5
Dust
22
Steps in Image Processing
1. Addressing: locate spot centers.
2. Segmentation: classification of pixels either
as signal or background (e.g. using seeded region
growing).
3. Information extraction: for each spot of the
array, calculate signal intensity pairs,
background and quality measures.
23
Steps in Image Processing
3. Information Extraction

Spot Intensities
mean (pixel intensities).
median (pixel intensities).
Pixel variation (IQR of log (pixel intensities)).
Background values
Local
Morphological opening
Constant (global)
None
Quality Information

Signal
Background
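The quantities listed on this slide can be computed per spot roughly as below. This is a sketch under stated assumptions: segmentation has already split the pixels into a signal list and a local background list, and the function name `summarize_spot` and the dictionary keys are hypothetical.

```python
import math
import statistics

def summarize_spot(signal_pixels, background_pixels):
    """Summary statistics for one spot in one channel.

    Assumes at least two positive signal pixels, so that quartiles of
    the log intensities can be computed.
    """
    logs = [math.log2(p) for p in signal_pixels if p > 0]
    q1, _, q3 = statistics.quantiles(logs, n=4)  # quartiles of log intensities
    return {
        "mean": statistics.mean(signal_pixels),
        "median": statistics.median(signal_pixels),
        "log_iqr": q3 - q1,  # pixel variation
        "local_background": statistics.median(background_pixels),
    }

spot = summarize_spot([100, 200, 300, 400, 1000], [10, 20, 30])
print(spot["mean"], spot["median"], spot["local_background"])
```

The median is less sensitive than the mean to a few saturated or dusty pixels, which is one reason both are usually reported.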
24
Addressing

This is the process of assigning coordinates
to each of the spots.
Automating this part of the procedure permits
high throughput analysis.

4 by 4 grids, 19 by 21 spots per grid
25
Addressing

Registration

Registration
26
Problems in automatic addressing

Misregistration of the red and green channels
Rotation of the array in the image
Skew in the array

Rotation
27
Segmentation methods

Fixed circles
Adaptive Circle
Adaptive Shape
Edge detection.
Seeded Region Growing (R. Adams and L. Bischof,
1994): regions grow outwards from the seed
points preferentially according to the difference
between a pixel's value and the running mean of
values in an adjoining region.
Histogram Methods
Adaptive threshold.
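To make the seeded region growing idea concrete, here is a simplified sketch, not the published algorithm in full. It assumes one seed per region, a 4-connected neighbourhood and string labels; candidate pixels are prioritised by their distance from the region mean at the time they are queued. The function name and data layout are hypothetical.

```python
import heapq

def seeded_region_growing(image, seeds):
    """image: list of rows of intensities; seeds: {label: (row, col)}.

    Returns a grid of labels with the same shape as `image`.
    """
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    total, count, heap = {}, {}, []

    def push_neighbours(r, c, label):
        mean = total[label] / count[label]  # running mean of the region
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] is None:
                delta = abs(image[nr][nc] - mean)
                heapq.heappush(heap, (delta, nr, nc, label))

    for label, (r, c) in seeds.items():
        labels[r][c] = label
        total[label], count[label] = image[r][c], 1
    for label, (r, c) in seeds.items():
        push_neighbours(r, c, label)

    while heap:
        _, r, c, label = heapq.heappop(heap)
        if labels[r][c] is not None:
            continue  # already claimed by a region
        labels[r][c] = label
        total[label] += image[r][c]
        count[label] += 1
        push_neighbours(r, c, label)
    return labels

image = [
    [10, 10, 10, 10, 10],
    [10, 90, 95, 10, 10],
    [10, 92, 100, 12, 10],
    [10, 10, 11, 10, 10],
    [10, 10, 10, 10, 10],
]
result = seeded_region_growing(image, {"signal": (2, 2), "background": (0, 0)})
for row in result:
    print(["S" if label == "signal" else "." for label in row])
```

Unlike a fixed circle, the grown region follows the actual shape of the spot, so small or non-circular spots are not padded with background pixels.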

28
Examples of algorithms and software implementation
29
Limitation of fixed circle method
SRG
Fixed Circle
30
Limitation of circular segmentation

Small spot
Not circular

Results from SRG
31
Information Extraction

Spot Intensities
mean (pixel intensities).
median (pixel intensities).
Background values
Local
Morphological opening
Constant (global)
None
Quality Information

Take the average
32
Local Backgrounds
33
Summary of analysis possibilities

Determine genes which are differentially
expressed (this task can take many forms
depending on replication, etc)
Connect differentially expressed genes to
sequence databases and perhaps carry out further
analyses, e.g. searching for common upstream
motifs
Overlay differentially expressed genes on pathway
diagrams
Relate expression levels to other information on
cells, e.g. known tumour types
Define subclasses (clusters) in sets of samples
(e.g. tumours)
Identify temporal or spatial trends in gene
expression
Seek roles for genes on the basis of patterns of
co-expression
... and much more
Many challenges: transcriptional regulation
involves redundancy, feedback, amplification, ...,
non-linearity

34
Microarray Life Cycle
Biological Question
Sample Preparation
Microarray Reaction
Microarray Detection
Data Analysis & Modeling
Taken from Schena & Davis
35
Oligonucleotide Arrays
36
Schadt et al., Journal of Cellular Biochemistry,
2000
37
Oligonucleotide Arrays Tech.

20 probes per gene, 25 bases each
Probe cell size: 24x24 micron (contains about 10^6 copies of
the probe)
Each probe is either a Perfect Match (PM) or a
Mismatch (MM)
MM
The mismatched base is usually at the center of the probe
Aims to give an estimate of random (non-specific) hybridization

38
Motivation

Data are noisy, with missing values.
Each array is scanned separately, with different
settings
-> To extract biologically meaningful results we
need:

Good expression estimates

Scaling/normalization across arrays

39
What we need

Image segmentation
Background/Gradient correction
Artifact detection
Allow array to array comparison (scale/normalize)
Assess gene presence (quantitative Measure)
Find differentially expressed genes

40
Why Isn't Normalization Easy?

No ability to read the mRNA level directly

Various noise factors -> hard to model exactly.

Variable biological settings, experiment
dependent.

Need to distinguish changes caused by
biological signal from noise artifacts.

41
Variability Sources

Real Biology
Biological noise
Biological Signal
Sample preparation related
Technical dependent

42
dChip MBEI

Based on several papers by Li & Wong (PNAS, 2001,
vol. 98, no. 1, and others)
Implemented in their freely available dChip
software
Model based: the estimation is based on a model
of how the probe intensity values respond to
changes in the expression level of the gene

43
dChip Model
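$MM_{ij} = \nu_j + \theta_i \alpha_j + \epsilon$
$PM_{ij} = \nu_j + \theta_i \alpha_j + \theta_i \phi_j + \epsilon$
$i$ is the array index, $j$ is the probe index
$\theta_i$ is the expression level of the gene in array $i$
$\nu_j$ is the baseline response of the probe due to non-specific
hybridization
$\alpha_j$ is the rate of increase of the MM response
$\phi_j$ is the additional rate of increase of the PM
response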
44
dChip Reduced Model
$y_{ij} = PM_{ij} - MM_{ij} = \theta_i \phi_j + \epsilon_{ij}$
Basic idea: least-squares parameter estimation,
iteratively fitting $\theta$ and $\phi$
45
dChip Reduced Model
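For one array, assume that the set $\{\phi_j\}$ has
been learned from a large number of arrays, and is
therefore known and fixed. Given this set, and writing
$y_j = PM_j - MM_j$ for probe $j$ on this array, the
linear least-squares estimate for theta is
$\hat{\theta} = \sum_j y_j \phi_j / \sum_j \phi_j^2$
An approximate std. can be computed for this
estimator: $\sqrt{\hat{\sigma}^2 / \sum_j \phi_j^2}$, where $\hat{\sigma}^2$ is the residual variance estimate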
46
dChip Reduced Model

Similarly, we regard the set $\{\theta_i\}$ as known, and
compute the std. for each phi
We use these estimated stds. to find outliers and
exclude them from the computation
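The iterative least-squares fitting described on the last three slides can be sketched as follows. This is a minimal illustration, not the dChip implementation: it assumes the reduced model has the form y[i][j] = theta[i] * phi[j] + noise with y = PM - MM, it fixes the scale with the constraint sum(phi^2) = number of probes, and it omits the outlier exclusion step. The function name `fit_reduced_model` is hypothetical.

```python
import math

def fit_reduced_model(y, iterations=50):
    """Alternating least squares for y[i][j] ~ theta[i] * phi[j].

    y: list of arrays, each a list of PM - MM differences per probe.
    Assumes y is not all zeros. Returns (theta, phi).
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes  # start with equal probe sensitivities
    theta = [0.0] * n_arrays
    for _ in range(iterations):
        # Fit theta with phi held fixed (one regression per array).
        denom = sum(p * p for p in phi)
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / denom
                 for i in range(n_arrays)]
        # Fit phi with theta held fixed (one regression per probe).
        denom = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / denom
               for j in range(n_probes)]
        # Rescale so that sum(phi^2) equals the number of probes.
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
        theta = [t / scale for t in theta]
    return theta, phi

true_theta = [100.0, 200.0, 400.0]
true_phi = [0.5, 1.0, 1.5, 2.0]
y = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_reduced_model(y)
print([round(t / theta[0], 3) for t in theta])  # relative expression levels
```

In a fuller version, probes or arrays whose estimated standard errors are unusually large would be dropped between iterations, as the slides describe.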

47
dChip: Array outlier detection
48
dChip: Probe outlier detection
49
Normalization/Scaling

We saw how to get the MBEI from dChip, i.e. the expression
measure
We still need to scale the different arrays
Arrays usually differ in overall image brightness
(they differ in time, place, experimental conditions)
Scaling is usually done PRIOR to computing the expression
measure (such as the dChip MBEI we
just described).

50
Global Normalization/Scaling

Suppose we have two arrays X, Y with values $x_1, \dots, x_M$
and $y_1, \dots, y_M$
Global normalization (MAS 5): find the constant
$a$ such that $\sum_i a x_i = \sum_i y_i$
Which means $a = \bar{y} / \bar{x}$
When we have multiple arrays, we choose Y to
be the average of all arrays, or compute $a$ such that
$\sum_i a x_i$ = constant

Better way: $a(x)$, i.e. make the scaling factor a
function of the expression level (as done by dChip)
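As a small illustration of global scaling, the sketch below assumes the constant is chosen so that the scaled array has the same mean intensity as the reference, and that for several arrays the reference is the average of all array means. The function names are hypothetical, and real tools such as MAS 5 add refinements (for example trimming extreme values) that are not shown.

```python
def global_scale_factor(x, y):
    """Constant a such that sum(a * x_i) equals sum(y_i)."""
    return sum(y) / sum(x)

def scale_to_common_mean(arrays):
    """Scale every array so its mean equals the average of all array means."""
    means = [sum(values) / len(values) for values in arrays]
    target = sum(means) / len(means)
    return [[v * target / m for v in values]
            for values, m in zip(arrays, means)]

print(global_scale_factor([1, 2, 3], [2, 4, 6]))
print(scale_to_common_mean([[1, 2, 3], [2, 4, 6]]))
```

A single constant cannot correct differences that depend on intensity, which is the motivation for the intensity-dependent curve mentioned on the slide.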
51
dChip Normalization/Scaling

Big question: which genes to use for this
scaling?
There are various ways to choose the set:
Housekeeping genes (Affy. chips)
Spiked controls added at various stages of the
experiment, in a range of concentrations
Both of the above are very good in theory but
(still) not in practice (esp. in Affy. chips)
The result: several approaches have been suggested for
using the set of genes tested in the experiment itself
We'll review dChip's solution: the invariant set

52
dChip Invariant Set

Main idea
1. Initialize the set of probes P = all probes
2. Order the probes in both arrays by their
expression values
3. Give each probe in each array an index according
to its relative expression order
4. Find a set of probes P' whose relative order is
similar in both arrays
5. Set P = P' and iterate from stage (2) until
convergence
6. Use the resulting P to compute a piecewise linear
running median line as the normalization curve
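The selection loop above can be sketched as follows. This is an illustration of the idea only: the fixed rank-difference threshold `max_rank_diff` is an assumption of this sketch (dChip's actual thresholds are not given on the slide), the function names are hypothetical, and the final running-median curve is not computed.

```python
def ranks(values):
    """Rank of each value (0 = smallest), ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0] * len(values)
    for position, index in enumerate(order):
        rank[index] = position
    return rank

def invariant_set(x, y, max_rank_diff=0.05, max_iter=20):
    """Indices of probes whose relative rank is similar in arrays x and y.

    max_rank_diff is the allowed rank difference as a fraction of the
    current set size (a hypothetical, fixed threshold).
    """
    selected = list(range(len(x)))
    for _ in range(max_iter):
        rx = ranks([x[i] for i in selected])
        ry = ranks([y[i] for i in selected])
        n = len(selected)
        kept = [selected[k] for k in range(n)
                if abs(rx[k] - ry[k]) <= max_rank_diff * n]
        if len(kept) == len(selected):
            break  # converged: no probe was removed
        selected = kept
    return selected

x = list(range(1, 11))
y = [2 * v for v in x]
y[2] = 100  # one probe changes its rank dramatically
print(invariant_set(x, y, max_rank_diff=0.2))
```

Probes that survive the iteration are presumed not to be differentially expressed, so the intensity relationship between the two arrays can be estimated from them.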

55
Normalization Tools: Current State

Commonly Used
RMA by Speed Lab
dChip by Li & Wong
GeneChip MAS5 (Affy. built-in tool)
The Future
New chip designs (both Affy. and cDNA) with better
probes, better built-in controls, etc.
New algorithms that take into account probe GC content
(gcRMA), location, etc.
New MAS tool (this year?) is also supposed to
incorporate RMA, dChip, etc.

56
How to Measure Performance?

Theoretical Validation: use some theoretical
assumptions and evaluate statistical
characteristics of the method at hand.
Experimental Validation
Use public data sets to measure different aspects
of performance
Evaluate relevant characteristics on your data
set. Design your data set accordingly (if
possible)

57
A Benchmark for Affy. Expression Measures

Main idea: define a universal test set and test
statistics
Based on 3 publicly available spike-in data sets
Tests for:
Variability across replicate arrays
Response of GE measures to change in abundance of
RNA
Sensitivity of fold change measures to amount of
actual RNA sample
Accuracy of fold change as a measure of relative
expression
Usefulness of raw fold change score to detect
differentially expressed genes

Cope et al., Bioinformatics, 03 (Speed's Lab)
58
MA Plot
$M = X_1 - X_2$, $A = (X_1 + X_2)/2$, where $X_i$ is the
log2 of the expression measure
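For reference, the coordinates of an MA plot can be computed as in this short sketch, which assumes M is the difference and A the average of the two log2 expression measures. The function name `ma_values` is hypothetical and the inputs are assumed to be positive, unlogged expression values.

```python
import math

def ma_values(expr1, expr2):
    """Return a list of (M, A) pairs for two arrays of expression values."""
    pairs = []
    for e1, e2 in zip(expr1, expr2):
        x1, x2 = math.log2(e1), math.log2(e2)
        pairs.append((x1 - x2, (x1 + x2) / 2))  # (log ratio, mean log intensity)
    return pairs

print(ma_values([8, 16], [2, 16]))
```

Plotting M against A makes intensity-dependent bias visible as a curve that departs from the horizontal line M = 0.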
59
Variance across replicates plot
Test Statistics:
1. Median std.
2. Avg. R^2 (squared corr. coef.) between two replicates
60
Observed Expression vs. Nominal Expression Plots
Test Statistics: fit a straight line and compute
1. Slope of the linear fit (should be 1)
2. R^2 of the linear fit
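The two test statistics for this plot can be obtained from an ordinary least-squares fit, as in the sketch below. It assumes both observed and nominal values are already on the log scale and that neither is constant; the function name `linear_fit` is hypothetical.

```python
def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

nominal = [0, 1, 2, 3]    # log2 nominal concentration
observed = [1, 2, 3, 4]   # log2 observed expression
print(linear_fit(nominal, observed))
```

A slope below 1 would indicate that the expression measure compresses true differences in abundance.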
61
ROC Curves

One of the chief uses of GE arrays is to identify
differentially expressed genes
ROC (Receiver Operating Characteristic): a
graphical representation of both Sens. and Spec.
as a function of the threshold value
X axis: FPR (1-Spec.)
Y axis: TPR (Sens.)
In this case: use fold change as the score,
knowing which probes are spiked in and which are not.
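A ROC curve for a fold-change score can be built as in this sketch. It assumes each probe has a score (for example the absolute log fold change) and a known spiked/not-spiked label, that both classes are present, and that tied scores may be processed in arbitrary order; the function names are hypothetical.

```python
def roc_points(scores, is_spiked):
    """Return (false positive rate, true positive rate) pairs.

    Probes are called positive in order of decreasing score, which is
    equivalent to sweeping the threshold from high to low.
    """
    ranked = sorted(zip(scores, is_spiked), key=lambda pair: -pair[0])
    positives = sum(1 for spiked in is_spiked if spiked)
    negatives = len(is_spiked) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, spiked in ranked:
        if spiked:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

scores = [3.0, 2.0, 1.0, 0.5]
spiked = [True, False, True, False]
print(area_under_curve(roc_points(scores, spiked)))
```

An area of 1.0 means every spiked probe scores above every non-spiked probe, while 0.5 corresponds to a score that carries no information.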

62
FC ROC Plots
Here the actual TP and FP counts are used for the
axes. Test statistic: AUC (area under the curve)
63
FC ROC Plots
Same as before, but only for the FC = 2 cases (harder)
64
The Benchmark Bottom Line

15 parameters used to test performance
3 synthetic spike-in data sets
Automatic submission and evaluation tool, with
comparative results at www.biostat.jhsph.edu

65
Other Tests

Evaluate normalization and expression-measure
techniques separately (as done by Huffman et al.,
Genome Biology, Vol. 3, 2002)
How do we evaluate performance on our own, very
specific, data? (Hint: see next class.)
