HWW Gene Expression Experiments: How? Why? What's the problem?

Slides: 66
Provided by: yoseph3
1
HWW Gene Expression Experiments
How? Why? What's the problem?
2
High Throughput Experiments
Functional Genomics
Bioinformatics
3
DNA Hybridization
  • The principle: have two denatured DNA strands
    bond together, then measure the amount of double
    strand (fluorescent dye, radioactive label)
  • Traditional: Southern/Northern/Western blot
  • The great advance: microarray DNA chips, made
    possible by automation, materials engineering and
    computer-aided methods (including algorithmic
    solutions)

4
History
  • cDNA microarrays have evolved from Southern
    blots, with clone libraries gridded out on nylon
    membrane filters being an important and still
    widely used intermediate. Things took off with
    the introduction of non-porous solid supports,
    such as glass - these permitted miniaturization -
    and fluorescence based detection. Currently,
    about 20,000 cDNAs can be spotted onto a
    microscope slide. The other technology, from
    Affymetrix, can produce arrays of 100,000
    oligonucleotides on a silicon chip.

5
THE PROCESS
Building the Chip
  • MASSIVE PCR
  • PCR PURIFICATION and PREPARATION
  • PREPARING SLIDES
  • PRINTING
  • POST PROCESSING
Preparing RNA
  • CELL CULTURE AND HARVEST
  • RNA ISOLATION
  • cDNA PRODUCTION
Hybing the Chip
  • PROBE LABELING
  • ARRAY HYBRIDIZATION
  • DATA ANALYSIS
6
Building the Chip
MASSIVE PCR
Full yeast genome: 6,500 reactions
PCR PURIFICATION and PREPARATION
IPA precipitation, EtOH washes, 384-well format
PREPARING SLIDES
Polylysine coating for adhering PCR products to
glass slides
PRINTING
The arrayer: a high-precision spotting device
capable of printing 10,000 products in 14 hrs,
with a plate change every 25 mins
POST PROCESSING
Chemically converting the positive polylysine
surface to prevent non-specific hybridization
7
Preparing RNA
CELL CULTURE AND HARVEST
Designing experiments to profile
conditions/perturbations/mutations under carefully
controlled growth conditions
RNA ISOLATION
RNA yield and purity are determined by the system.
PolyA isolation is preferable but total RNA is
usable. Two RNA samples are hybridized per chip.
cDNA PRODUCTION
Single strand synthesis or amplification of RNA
can be performed. cDNA production includes
incorporation of Aminoallyl-dUTP.
8
Hybing the Chip
PROBE LABELING
Two RNA samples are labelled with Cy3 or Cy5
monofunctional dyes via a chemical coupling to
AA-dUTP. Samples are purified using a PCR
cleanup kit.
ARRAY HYBRIDIZATION
Cy3 and Cy5 RNA samples are simultaneously
hybridized to the chip. Hybs are performed for 5-12
hours and then chips are washed.
DATA ANALYSIS
Ratio measurements are determined via
quantification of 532 nm and 635 nm emission
values. Data are uploaded to the appropriate
database where statistical and other analyses can
then be performed.
9
Printing Microarrays
  • Print Head
  • Plate Handling
  • XYZ positioning
      • Repeatability & Accuracy
      • Resolution
  • Environmental Control
      • Humidity
      • Dust
  • Instrument Control
  • Sample Tracking Software

10
Ngai Lab arrayer , UC Berkeley
11
Microarray Gridder
12
Printing Approaches
  • Non-contact
      • Piezoelectric dispenser
      • Syringe-solenoid ink-jet dispenser
  • Contact (using rigid pin tools, similar to filter
    array)
      • Tweezer
      • Split pin
      • Micro spotting pin

13
Micro Spotting pin
15
Practical Problems
  • Surface chemistry: an uneven surface may lead to
    high background.
  • Dipping the pin into a large volume -> pre-printing
    is needed to drain off excess sample.
  • Spot variation can be due to mechanical
    differences between pins. Pins can become clogged
    during the printing process.
  • Spot size and density depend on surface and
    solution properties.
  • Pins need good washing between samples to prevent
    sample carryover.

16
Post Processing Arrays
  • Protocol for Post Processing Microarrays:
    Hydration/Heat Fixing
    1. Pick out about 20-30 slides to be processed.
    2. Determine the correct orientation of the slide,
       and if necessary, etch a label on the lower left
       corner of the array side.
    3. On the back of the slide, etch two lines above and
       below the center of the array to designate the
       array area after processing.
    4. Pour 100 ml 1X SSC into the hydration tray and
       warm on a slide warmer at medium setting.
    5. Set the slide array side down and observe spots
       until proper hydration is achieved.
    6. Upon reaching proper hydration, immediately
       snap dry the slide.
    7. Place slides in rack.

17
Practical Problems 1
  • Comet Tails
  • Likely caused by insufficiently rapid immersion
    of the slides in the succinic anhydride blocking
    solution.

18
Practical Problems 2
19
Practical Problems 3
  • High Background: 2 likely causes
      • Insufficient blocking.
      • Precipitation of the labeled probe.
  • Weak Signals

20
Practical Problems 4
Spot overlap. Likely cause: too much
rehydration during post-processing.
21
Practical Problems 5
Dust
22
Steps in Image Processing
1. Addressing: locate spot centers.
2. Segmentation: classification of pixels as either
signal or background (e.g. using seeded region
growing).
3. Information extraction: for each spot of the
array, calculate signal intensity pairs,
background and quality measures.
23
Steps in Image Processing
3. Information Extraction
  • Spot Intensities
      • mean (pixel intensities).
      • median (pixel intensities).
      • Pixel variation (IQR of log (pixel intensities)).
  • Background values
      • Local
      • Morphological opening
      • Constant (global)
      • None
  • Quality Information

(Figure labels: Signal, Background)
24
Addressing
  • This is the process of assigning coordinates
    to each of the spots.
  • Automating this part of the procedure permits
    high throughput analysis.

4 by 4 grids, 19 by 21 spots per grid
25
Addressing
  • Registration

26
Problems in automatic addressing
  • Misregistration of the red and green channels
  • Rotation of the array in the image
  • Skew in the array

Rotation
27
Segmentation methods
  • Fixed circles
  • Adaptive Circle
  • Adaptive Shape
      • Edge detection.
      • Seeded Region Growing (R. Adams and L. Bischof,
        1994): regions grow outwards from the seed
        points, preferentially according to the difference
        between a pixel's value and the running mean of
        values in an adjoining region.
  • Histogram Methods
      • Adaptive threshold.
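
The seeded region growing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular microarray package: the function name `seeded_region_growing`, the 4-neighbour connectivity and the use of the region mean at the time a pixel is queued (rather than re-evaluated on every step) are all assumptions made for brevity.

```python
import heapq

def seeded_region_growing(image, seeds):
    """image: 2-D list of intensities.
    seeds: dict mapping a non-zero integer label to a list of (row, col) seeds.
    Returns a 2-D list of labels (0 never remains if every pixel is reachable)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    total, count = {}, {}      # running sum and pixel count per region
    heap = []                  # (difference to region mean, tie-breaker, r, c, label)
    tick = 0

    def push_neighbours(r, c, lab):
        nonlocal tick
        mean = total[lab] / count[lab]
        for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] == 0:
                heapq.heappush(heap, (abs(image[rr][cc] - mean), tick, rr, cc, lab))
                tick += 1

    # Initialise every region from its seed pixels.
    for lab, points in seeds.items():
        for r, c in points:
            labels[r][c] = lab
            total[lab] = total.get(lab, 0.0) + image[r][c]
            count[lab] = count.get(lab, 0) + 1
    for lab, points in seeds.items():
        for r, c in points:
            push_neighbours(r, c, lab)

    # Always absorb the queued pixel closest to the mean of its adjoining region.
    while heap:
        _, _, r, c, lab = heapq.heappop(heap)
        if labels[r][c] != 0:
            continue           # already claimed by some region
        labels[r][c] = lab
        total[lab] += image[r][c]
        count[lab] += 1
        push_neighbours(r, c, lab)
    return labels
```

With one seed inside a spot (label 1) and one seed in the background (label 2), the returned label image separates signal pixels from background pixels without assuming the spot is circular.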

28
Examples of algorithms and software implementation
29
Limitation of fixed circle method
(Figure panels: SRG, Fixed Circle)
30
Limitation of circular segmentation
  • Small spot
  • Not circular

Results from SRG
31
Information Extraction
  • Spot Intensities
      • mean (pixel intensities).
      • median (pixel intensities).
  • Background values
      • Local
      • Morphological opening
      • Constant (global)
      • None
  • Quality Information

Take the average
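
As a rough illustration of the information-extraction step, the sketch below summarises one spot from a small pixel patch and a signal/background mask produced by segmentation. The function name `summarize_spot`, the dictionary keys and the choice of "median of the non-signal pixels in the patch" as the local background are assumptions for the example, not the definition used by any specific image-analysis tool.

```python
import math
import statistics

def summarize_spot(pixels, mask):
    """pixels: 2-D list of positive intensities around one spot.
    mask: same shape, True where segmentation marked the pixel as signal."""
    fg = [v for row, mrow in zip(pixels, mask) for v, m in zip(row, mrow) if m]
    bg = [v for row, mrow in zip(pixels, mask) for v, m in zip(row, mrow) if not m]
    # Pixel variation: inter-quartile range of the log2 signal intensities.
    q1, _, q3 = statistics.quantiles([math.log2(v) for v in fg], n=4)
    local_background = statistics.median(bg)   # assumed local-background rule
    return {
        "mean": statistics.mean(fg),
        "median": statistics.median(fg),
        "log_iqr": q3 - q1,
        "local_background": local_background,
        "corrected": statistics.median(fg) - local_background,
    }
```

Running this once per channel (Cy3 and Cy5) gives the pair of background-corrected intensities from which a log ratio for the spot can be formed.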
32
Local Backgrounds
33
Summary of analysis possibilities
  • Determine genes which are differentially
    expressed (this task can take many forms
    depending on replication, etc)
  • Connect differentially expressed genes to
    sequence databases and perhaps carry out further
    analyses, e.g. searching for common upstream
    motifs
  • Overlay differentially expressed genes on pathway
    diagrams
  • Relate expression levels to other information on
    cells, e.g. known tumour types
  • Define subclasses (clusters) in sets of samples
    (e.g. tumours)
  • Identify temporal or spatial trends in gene
    expression
  • Seek roles for genes on the basis of patterns of
    co-expression
  • ...and much more
  • Many challenges: transcriptional regulation
    involves redundancy, feedback, amplification, ...
    non-linearity

34
Microarray Life Cycle
Biological Question -> Sample Preparation ->
Microarray Reaction -> Microarray Detection ->
Data Analysis & Modeling
Taken from Schena & Davis
35
Oligonucleotide Arrays
36
Schadt et al., Journal of Cellular Biochemistry,
2000
37
Oligonucleotide Arrays Tech.
  • 20 probes per gene, 25 bases each
  • Probe cell size 24x24 micron (contains about 10^6
    copies of the probe)
  • A probe is either a Perfect Match (PM) or a
    Mismatch (MM)
  • MM
      • the mismatched base is usually at the center of
        the probe
      • aims to give an estimate of random (non-specific)
        hybridization

38
Motivation
  • Data is noisy and has missing values.
  • Each array is scanned separately, with different
    settings
  • => To extract biologically meaningful results we
    need
      • Good expression estimates
      • Scaling/normalization across arrays

39
What we need
  • Image segmentation
  • Background/Gradient correction
  • Artifact detection
  • Allow array to array comparison (scale/normalize)
  • Assess gene presence (quantitative Measure)
  • Find differentially expressed genes

40
Why isn't Normalization Easy?
  • No ability to read mRNA level directly
  • Various noise factors => hard to model exactly.
  • Variable biological settings, experiment
    dependent.
  • Need to distinguish changes caused by
    biological signal from noise artifacts.

41
Variability Sources
  • Real Biology
  • Biological noise
  • Biological Signal
  • Sample preparation related
  • Technical dependent

42
dChip MBEI
  • Based on several papers by Li & Wong (PNAS, 2001
    vol 98 no.1 and others)
  • Implemented in their freely available dChip
    software
  • Model based: the estimation is based on a model
    of how the probe intensity values respond to
    changes in the expression level of the gene

43
dChip Model
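$MM_{ij} = \nu_j + \theta_i \alpha_j + \epsilon$
$PM_{ij} = \nu_j + \theta_i \alpha_j + \theta_i \phi_j + \epsilon$
i is the array index, j is the probe index
$\theta_i$ is the expression index of the gene in array i
$\nu_j$ is the baseline response of the probe due to
non-specific hybridization
$\alpha_j$ is the rate of increase of the MM response
$\phi_j$ is the additional rate of increase of the PM
response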
44
dChip Reduced Model
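$y_{ij} = PM_{ij} - MM_{ij} = \theta_i \phi_j + \epsilon_{ij}$, with $\sum_j \phi_j^2 = J$ (J = number of probes)
Basic idea: least squares parameter estimation,
iteratively fitting $\theta$ and $\phi$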
45
dChip Reduced Model
For one array, assume that the set $\{\phi_j\}$ has
been learned from a large number of arrays, and is
therefore known and fixed. Given this set, the
linear least squares estimate for theta is
$\hat{\theta} = \sum_j y_j \phi_j / \sum_j \phi_j^2$, where $y_j = PM_j - MM_j$
An approximate standard error can be computed for this
estimator
46
dChip Reduced Model
  • Similarly, we regard the set $\{\theta_i\}$ as known, and
    compute the standard error for each phi
  • We use these estimated standard errors to find
    outliers and exclude them from the computation
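
The iterative fitting idea can be illustrated with a short alternating least squares sketch for the reduced model, in which each PM minus MM difference is modelled as theta_i * phi_j. This is only a sketch under stated assumptions: the function names are hypothetical, the constraint sum_j phi_j^2 = J is taken from the published Li & Wong model, the standard-error formula sqrt(sigma^2 / J) is an assumed approximate form, and the outlier exclusion performed by dChip is omitted entirely.

```python
import math

def fit_reduced_model(y, n_iter=20):
    """y[i][j] = PM - MM difference for array i and probe j.
    Returns (theta, phi) with sum(phi_j ** 2) == number of probes."""
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes
    theta = [0.0] * n_arrays
    for _ in range(n_iter):
        # Treat phi as known: least squares estimate of each theta_i.
        phi_sq = sum(p * p for p in phi)
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / phi_sq
                 for i in range(n_arrays)]
        # Treat theta as known: least squares estimate of each phi_j.
        theta_sq = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / theta_sq
               for j in range(n_probes)]
        # Rescale so that sum_j phi_j^2 = J; the product theta * phi is unchanged.
        scale = math.sqrt(sum(p * p for p in phi) / n_probes)
        phi = [p / scale for p in phi]
        theta = [t * scale for t in theta]
    return theta, phi

def theta_std_error(y_row, theta_i, phi):
    """Assumed approximate standard error of theta_i from the fit residuals."""
    n_probes = len(phi)
    resid_sq = sum((y_row[j] - theta_i * phi[j]) ** 2 for j in range(n_probes))
    sigma_sq = resid_sq / (n_probes - 1)
    return math.sqrt(sigma_sq / n_probes)
```

An array whose standard error is unusually large compared with the other arrays would be a candidate array outlier; the analogous quantity computed per probe would flag probe outliers.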

47
dChip: array outlier detection
48
dChip: probe outlier detection
49
Normalization/Scaling
  • We saw how to get the MBEI from dChip, i.e. how
    to quantify the expression measure
  • We still need to scale the different arrays
  • Arrays usually differ in overall image brightness
    (they differ in time, place, experimental
    conditions)
  • This is usually done PRIOR to the expression
    quantification steps (such as dChip's MBEI, which
    we just described).

50
Global Normalization/Scaling
  • Suppose we have two arrays X, Y with values
    $x_1, \ldots, x_M$ and $y_1, \ldots, y_M$
  • Global normalization (MAS 5): find the constant
    a such that $\sum_i a x_i = \sum_i y_i$
  • Which means $a = \sum_i y_i / \sum_i x_i$
  • When we have multiple arrays, we choose Y to
    be the avg. of all arrays, or compute a such that
    $a \sum_i x_i = \text{constant}$

A better way: a(x), i.e. adapt the fit parameter as a
function of expression level (as done by dChip)
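
A minimal sketch of global scaling, assuming the constant is chosen so that the total (equivalently, the mean) intensity of the scaled array matches the reference. The function names are hypothetical, and the robust variants used by real tools (for example, trimming extreme values before averaging) are deliberately left out.

```python
def global_scale_factor(x, y):
    """Constant a such that sum(a * x_i) equals sum(y_i)."""
    return sum(y) / sum(x)

def scale_arrays_to_constant(arrays, target_sum):
    """Scale every array so that its total intensity equals target_sum."""
    return [[v * target_sum / sum(arr) for v in arr] for arr in arrays]
```

For example, `global_scale_factor([1, 2, 3], [2, 4, 6])` gives 2.0: the first array is uniformly half as bright as the second. A single constant cannot correct a brightness difference that depends on intensity, which is the motivation for an expression-dependent a(x).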
51
dChip Normalization/Scaling
  • Big question: which genes to use for this
    scaling?
  • There are various ways to choose the set
      • Housekeeping genes (Affy. chips)
      • Spiked controls added in various stages of the
        experiment, in a range of concentrations
  • Both of the above are very good in theory but
    (still) not in practice (esp. in Affy. chips)
  • The result: several approaches have been suggested
    on how to use the set of genes tested in the
    experiments
  • We'll review dChip's solution: the invariant set

52
dChip Invariant Set
  • Main idea
    1. Initialize the set of probes P = all probes
    2. Order the probes in both arrays by their
       expression values
    3. Give each probe in each array an index according
       to its relative expression order
    4. Find a set of probes P' whose relative order is
       similar in both arrays
    5. Set P = P' and iterate from step 2 until
       convergence
  • Use the resulting P to compute a piecewise linear
    running median line as the normalization curve
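
The iteration above can be sketched as follows. This is a simplified illustration rather than dChip's actual procedure: the function names are hypothetical, and "similar relative order" is taken to mean a rank difference of at most a fixed fraction (`max_rank_diff`) of the current set size, which is an assumed threshold rule. Fitting the running median curve through the selected probes is not shown.

```python
def ranks(values):
    """Rank of each value (0 = smallest) in the order the values were given."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result

def invariant_set(x, y, max_rank_diff=0.2, max_iter=50):
    """Indices of probes whose rank is similar in arrays x and y."""
    keep = list(range(len(x)))                 # start from all probes
    for _ in range(max_iter):
        rx = ranks([x[i] for i in keep])       # ranks within the current set
        ry = ranks([y[i] for i in keep])
        limit = max_rank_diff * len(keep)
        new_keep = [i for i, a, b in zip(keep, rx, ry) if abs(a - b) <= limit]
        if len(new_keep) == len(keep):         # nothing removed: converged
            break
        keep = new_keep
    return keep
```

Probes that are truly differentially expressed tend to change rank between the two arrays and drop out, so the surviving set behaves like a data-driven collection of housekeeping probes.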

55
Normalization Tools: Current State
  • Commonly Used
      • RMA by Speed Lab
      • dChip by Li & Wong
      • GeneChip MAS5 (Affy. built-in tool)
  • The Future
      • New chip designs (both Affy. and cDNA) with better
        probes, better built-in controls etc.
      • New algorithms exploiting probe GC content
        (gcRMA), location etc.
      • New MAS tool (this year?) is also supposed to
        incorporate RMA, dChip etc.

56
How to Measure Performance?
  • Theoretical validation: use some theoretical
    assumptions and evaluate statistical
    characteristics of the method at hand.
  • Experimental validation
      • Use public data sets to measure different aspects
        of performance
      • Evaluate relevant characteristics on your data
        set. Design your data set accordingly (if
        possible)

57
A Benchmark for Affy. Expression Measures
  • Main idea: define a universal test set & test
    statistics
  • Based on 3 publicly available spike-in data sets
  • Tests for
      • Variability across replicate arrays
      • Response of GE measures to change in abundance of
        RNA
      • Sensitivity of fold change measures to amount of
        actual RNA sample
      • Accuracy of fold change as a measure of relative
        expression
      • Usefulness of raw fold change score to detect
        differentially expressed genes

Cope et al., Bioinformatics, 03 (Speed Lab)
58
MA Plot
$M = X_1 - X_2$, $A = (X_1 + X_2)/2$, where $X_i$ is the
log2 of the expression measure
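
A small sketch of how the M and A coordinates could be computed from two vectors of (positive) expression measures; the function name `ma_values` is hypothetical.

```python
import math

def ma_values(expr1, expr2):
    """Return (M, A) lists: M = log ratio, A = average log intensity."""
    m, a = [], []
    for v1, v2 in zip(expr1, expr2):
        x1, x2 = math.log2(v1), math.log2(v2)
        m.append(x1 - x2)
        a.append((x1 + x2) / 2)
    return m, a
```

A probe measured at 8 on one array and 2 on the other lands at M = 2 (a four-fold difference) and A = 2; probes with no change lie on the M = 0 line.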
59
Variance across replicates plot
Test statistics: 1. Median std. 2. Avg. R^2
(squared corr. coef.) between two replicates
60
Observed Expression vs. Nominal Expression Plots
Test statistics: fit a linear curve and
compute 1. the linear fit slope (should be 1) 2. R^2
of the linear fit
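
The two test statistics can be computed with an ordinary least squares fit of observed against nominal (log) expression. The sketch below is a plain-Python illustration with a hypothetical function name, not the benchmark's own code.

```python
def linear_fit(x, y):
    """Ordinary least squares fit y = slope * x + intercept.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

A slope below 1 on the log scale would indicate that the expression measure compresses true differences in RNA abundance.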
61
ROC Curves
  • One of the chief uses of GE arrays is to identify
    differentially expressed genes
  • ROC (Receiver Operating Characteristic): a
    graphical representation of both Sens. and Spec.
    as a function of threshold value
      • X axis: FPR (1-Spec.)
      • Y axis: TPR (Sens.)
  • In this case: use fold change as the score,
    knowing which probes are spiked and which are not.
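
As an illustration of the ROC construction, the sketch below ranks probes by a score (for instance absolute log fold change), sweeps the threshold from high to low, and integrates the curve with the trapezoid rule. The function names are hypothetical, and points are returned in the usual (FPR, TPR) convention; it assumes at least one spiked and one non-spiked probe.

```python
def roc_curve(scores, is_spiked):
    """Return ROC points as (false positive rate, true positive rate)."""
    n_pos = sum(is_spiked)
    n_neg = len(is_spiked) - n_pos
    ranked = sorted(zip(scores, is_spiked), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for k, (score, spiked) in enumerate(ranked):
        if spiked:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last probe sharing this score (ties).
        if k + 1 == len(ranked) or ranked[k + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 1 means every spiked probe scores above every non-spiked probe, while 0.5 is what a random score would achieve.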

62
FC ROC Plots
Here the actual TP and FP numbers are used for the
axes. Test statistic: AUC (area under the graph)
63
FC ROC Plots
Same as before, but only for FC = 2 cases (harder)
64
The Benchmark Bottom Line
  • 15 parameters used to test performance
  • 3 synthetic spike-in data sets
  • Automatic submission and evaluation tool,
    comparative results at www.biostat.jhsph.edu

65
Other Tests
  • Evaluate normalization and expression measure
    techniques separately (as done by Huffman et al.,
    Genome Biology, Vol. 3, 2002)
  • How do we evaluate performance on our own, very
    specific, data? (Hint: see next class.)