Title: MICROARRAYS and FUNCTIONAL GENOMICS
1MICROARRAYS and FUNCTIONAL GENOMICS
SAMIR
ELLEN
HSUAN
MIKE
2Human Genome Project
- Carried out by the Dept of Energy and NIH, and by Celera Genomics
- Identified 30-40K protein-coding genes
- Thousands of genes and their activation levels can be observed with microarrays
3Functional Genomics
- Goal
- Only 50% of genes have known functions.
- Interplay between sequence, protein, and environment.
- Helps identify groups of genes that characterize a particular class of tumor by comparing differences in gene expression between normal and tumor cells.
4Biological Concepts
- What is a Gene?
- What is Gene Expression (GE)?
- Gene Expression level: a quantitative description of gene expression obtained by measuring intermediary molecules produced during protein synthesis
5Central Dogma
- The expression of the genetic information stored in the DNA molecule occurs in two stages: transcription (DNA to mRNA) and translation (mRNA to protein)
6Biology
- In simple words, GE involves copying the DNA code into an mRNA molecule. Thus, a measure of gene expression level is the abundance of mRNA produced.
7Techniques to measure GE
- 1) Northern and Southern Blots
- Southern blot developed in 1975, Northern blot in 1977
- Identify and locate mRNA and DNA sequences that are complementary to a segment of DNA
- Small number of genes examined at a time.
8Techniques to measure GE (cont.)
- 2) SAGE (Serial Analysis of Gene Expression)
- Velculescu et al. (1995)
- Short stretches of DNA (tags) uniquely identify the genes expressed in a particular cell.
- These tags are used to mark transcripts, and the number of transcripts counted for each gene gives the measure of GE.
- Slow
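The counting idea behind SAGE can be sketched in a few lines of Python. This is only an illustration: the 10-base tags below are made up, and a real analysis would first map each tag to a gene through a reference database.

```python
from collections import Counter


def count_tags(tags):
    """Count how many times each SAGE tag was observed."""
    return Counter(tags)


def relative_expression(tags):
    """Return each tag's share of all observed tags."""
    counts = count_tags(tags)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tags observed")
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical tags, invented for illustration only.
observed = ["AAACCCGGGT"] * 3 + ["TTTGGGCCCA"]
print(relative_expression(observed))
```

The more often a tag is seen, the higher the estimated expression of the gene it marks.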
9Techniques to measure GE
- 3) Microarrays
- Developed in 1995
- Simultaneous measurement of the relative expression levels of a large number of genes.
- Experimentation in parallel
- Two important concepts
- a) Reverse Transcription
- b) Hybridization
10How Microarrays Work
- Reverse Transcription
- Hybridization
- Heat.
- Strands melt at 65 °C.
- Cool down.
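A minimal Python sketch of the two concepts, assuming perfect base pairing only (no mismatches, no temperature or salt effects). The function names are illustrative and not taken from any library.

```python
# Base-pairing tables: DNA with DNA, and RNA copied into complementary DNA.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def reverse_complement(dna):
    """Return the strand that pairs with `dna`, written 5' to 3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def reverse_transcribe(mrna):
    """Return the cDNA strand complementary to an mRNA, written 5' to 3'."""
    return mrna.upper().translate(RNA_TO_CDNA)[::-1]


def can_hybridize(probe, target):
    """True if `target` contains a stretch perfectly complementary to `probe`."""
    return reverse_complement(probe) in target.upper()


mrna = "AUGC"                      # toy transcript
cdna = reverse_transcribe(mrna)    # what reverse transcription produces
print(cdna, can_hybridize("ATGC", cdna))
```

A probe carrying the same sequence as the transcript (in DNA letters) pairs with the cDNA made from it, which is what lets a spot on the array capture its target.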
11Hybridization
12Microarray Terminology
- Oligonucleotides (oligos)
- Probes
- Target
- Dyes
13Microarray Types
- cDNA Technology Microarray (Stanford; Schena et al., 1995)
- Synthetic Oligo-nucleotide Microarray (Affymetrix)
14cDNA Microarrays
15A flow chart of the process
16RGB Overlay of Cy3 and Cy5
17Q1 Who invented the first cDNA microarray?
18Who invented the first cDNA microarray?
- "Quantitative monitoring of gene expression patterns with a complementary DNA microarray" (Schena et al., Science, 1995)
- 45 Arabidopsis genes were analyzed, using two-color fluorescence hybridization.
19Q2 When should I use cDNA microarrays for my experiment?
- http://www.ncbi.nlm.nih.gov/About/primer/microarrays.html
20When should I use cDNA microarrays for my experiment?
21Example 1: search for predictive markers of papillary thyroid carcinoma (PTC)
- Yano et al., Clin Cancer Res. 2004 Mar 15;10(6):2035-43.
- Gene expression profiles for PTC tissue, normal thyroid tissue, and healthy peripheral blood cells were compared using a human 4000-gene cDNA microarray.
- Protein expression of the genes up-regulated in PTC was examined in thyroid tissues by immunohistochemistry.
22Example 1 (cont.)
- Basic fibroblast growth factor, which has been identified as a biomarker for PTC, was overexpressed in 54% of PTC cases, 67% of follicular thyroid carcinomas, and 36% of benign thyroid neoplasms.
- Platelet-derived growth factor was overexpressed in 81% of PTC cases and 100% of follicular carcinomas, but was immunonegative in normal thyroid tissues and benign thyroid neoplasms.
23Example 2: analysis of endothelial cells in response to green tea
- Sartippour et al., Int J Oncol. 2004 Jul;25(1):193-202.
- HUVEC (human umbilical vein endothelial cells) were exposed in vitro to green tea for either 6 or 48 h.
- Result: a global down-regulation of multiple genes involved in endothelial cell growth, signal transduction, and oxidation, together with up-regulation of several apoptotic genes.
24Q3 What probes are available?
- Actually you can put whatever you want!
25Probe selection
- For organisms whose genomes have been completely sequenced, it is possible to array genomic DNA from every known gene or suspected open reading frame (ORF) in the organism.
- The Pat Brown Lab has arrayed all known or suspected genes of S. cerevisiae (roughly 6100) on a single microarray.
- Amplify clone inserts from human cDNA libraries.
- Available arrays at U. Michigan
- http://www.umich.edu/caparray/arrays.html
- Available commercial cDNA clones
- http://www.openbiosystems.com/incyte_cdna_and_est_clones.php
26Probe Selection
- Usually selected from databases such as GenBank, dbEST, and UniGene.
- You can choose either full-length cDNAs or partially sequenced cDNAs (ESTs).
- To avoid redundancy, UniGene is preferred.
- UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location.
27Q4 How much will it cost to use cDNA microarrays in my experiment?
- Using cDNAs is relatively cheap!
28How much do cDNA microarrays cost?
- Several hundred dollars per slide, but cheaper than Affymetrix arrays.
- Pricing at the NINDS NIMH Microarray Consortium
- http://arrayconsortium.tgen.org/np2/public/servicesPricing.jsp
- Pricing at the U. Miami microarray core facility
- http://www.biomed.miami.edu/arrays/services_ma_pricing.html
29Q5 How small can my samples be?
- You often need more sample than you think.
30How small can my samples be?
- According to a 1999 paper in Nature, 50 to 200 µg of total RNA is required for one slide.
31How small can my samples be?
- mRNA accounts for only about 3% of all RNA in a cell. Usually, 10 µg of total RNA is the minimum requirement.
- (Quantities as small as 1 µg can be amplified first, but it is unreasonable to expect perfectly uniform amplification.)
- The requirement at U. Miami
- Tissue: 100 mg; cell culture: 5 million cells; total RNA: 100 µg, or mRNA: 400 ng
- http://www.biomed.miami.edu/arrays/faq.html3
- The amount of clinical sample is often very small.
- Tumor tissue may not be homogeneous.
- (These are my own experiences)
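As a rough check on these numbers, the mRNA content of a total-RNA sample can be estimated from the "about 3%" figure. The sketch below is plain arithmetic; the fraction is the approximation quoted on the slide, not a measured value, and the function name is illustrative.

```python
MRNA_FRACTION = 0.03  # "about 3%" of total RNA, as quoted above (approximate)


def expected_mrna_ng(total_rna_ug, mrna_fraction=MRNA_FRACTION):
    """Estimate the mRNA mass in ng contained in `total_rna_ug` µg of total RNA."""
    return total_rna_ug * mrna_fraction * 1000.0


for total_ug in (1, 10, 100):
    print(f"{total_ug} µg total RNA -> roughly {expected_mrna_ng(total_ug):.0f} ng mRNA")
```

Under this assumption, the 10 µg minimum corresponds to roughly 300 ng of mRNA, and a 1 µg sample to only about 30 ng, which is why amplification is needed for very small samples.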
32Q6 Where does the bias come from?
- Biological variation
- Non-biological sources of variation
- Labeling efficiencies
- Dye effects (checked by dye swapping)
- Emission intensity is proportional to RNA concentration (by calibration)
- Hybridization (cross-hybridization)
- Within array: spot size, global background, local background
- Between arrays: spot run, PCR batch
- The amounts of RNA from the 2 samples are assumed to be equal. (Difficult to check!)
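How a dye swap helps can be sketched with log ratios. The example assumes a simple model in which the dye effect adds a constant to the log2 ratio of a spot; real dye effects may depend on intensity and on the gene, so this is an illustration rather than a recommended normalization.

```python
import math


def log_ratio(cy5, cy3):
    """log2 of the Cy5/Cy3 intensity ratio for one spot."""
    return math.log2(cy5 / cy3)


def dye_swap_estimate(forward, swapped):
    """Separate expression change from dye bias using a dye-swapped pair.

    forward: (cy5, cy3) with sample A labeled Cy5 and sample B labeled Cy3
    swapped: (cy5, cy3) with sample B labeled Cy5 and sample A labeled Cy3
    Returns (log2 expression ratio A/B, log2 dye bias), assuming an additive
    dye effect on the log scale.
    """
    m_forward = log_ratio(*forward)   # log2(A/B) + bias
    m_swapped = log_ratio(*swapped)   # log2(B/A) + bias
    expression = (m_forward - m_swapped) / 2
    dye_bias = (m_forward + m_swapped) / 2
    return expression, dye_bias


# Toy intensities: A is 4x B, and Cy5 reads 2x too bright.
print(dye_swap_estimate(forward=(800, 100), swapped=(200, 400)))
```

Subtracting the two log ratios cancels the dye term, while adding them cancels the biological term, which is why the swap exposes the bias.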
33Q7 How are cDNA microarray experiments designed?
- A three-layer experimental design proposed by Churchill
34Churchill, Nature, 2002
35Conclusion of the features
- Low specificity, high sensitivity.
- Inter-experiment comparisons can be difficult unless done on exactly the same slide print batch.
- Can have up to 50,000 cDNAs per slide.
- Highly customizable.
- Dye swap doubles the cost of each data point.
- Some spots often drop out due to the printing process, and not all clones are sequence-verified, causing tracking issues.
36Useful web resources
- Microarray data analysis
- http://www.statsci.org/micrarra/
- The Stanford microarray resources
- http://genome-www5.stanford.edu/resources/
- A flash animation
- http://www.bio.davidson.edu/courses/genomics/chip/chip.html
- Comparison of features of different types of arrays
- http://arrayconsortium.tgen.org/np2/public/arrayPlatforms.jsp
37Oligonucleotide Microarrays
38Oligonucleotide Microarrays
- Oligo refers to short, synthetic DNA sequences
- Sequences are placed on silicon chips using photolithographic technology (same as microprocessor fabrication)
- Laser scanner produces image
- SINGLE fluorescence
39Manufacturers
- Patented Affymetrix GeneChip
- Agilent Printed Microarrays
- Amersham Codelink System
- Illumina, Nimblegen
- http://www.chem.agilent.com/Scripts/PDS.asp?lPage=3071
40Affymetrix GeneChip
- Premanufactured
- Range in size
- Lower: 17,000 (Murine Genome U74)
- Higher: 33,000 (Human Genome U133)
- Principle: using a set of gene fragments uniquely identifies a specific gene and reduces the chance of random cross-hybridization
41GeneChip details
- 25-mers: 25 bases per oligonucleotide probe
- Probe set: each gene is represented by a set of probe pairs (11-20 per gene)
- Probe pair
- PM: perfect match probe
- MM: mismatch probe (homomeric mismatch at the 13th base), used to estimate non-specific binding and background noise
- Each square probe cell contains millions of copies of a PM or MM probe
- Probes are distributed to prevent systematic bias
- Approximately 30,000 genes can be represented on 1 cm²
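The PM/MM relationship can be sketched in Python. The sketch assumes that the homomeric mismatch means swapping the middle (13th) base for its complement (A with T, G with C); the 25-mer used is arbitrary and chosen only for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm, position=13):
    """Build an MM probe from a PM probe by complementing one base.

    `position` is 1-based; 13 is the middle of a 25-mer.
    """
    pm = pm.upper()
    if not 1 <= position <= len(pm):
        raise ValueError("position is outside the probe")
    i = position - 1
    return pm[:i] + COMPLEMENT[pm[i]] + pm[i + 1:]


pm = "CGTATCGTAGAAGTCTATGCTAATG"   # illustrative 25-mer
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

Because the MM probe differs from the PM probe by a single central base, signal on the MM cell is taken as an estimate of binding that is not specific to the intended target.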
42Probe sets
43Probe Selection
- Good selection results in reliability, sensitivity, and specificity
- Computer models and experiments are used
- Hybridization conditions considered (pH, salinity, temperature)
- Accounts for splicing and polyadenylation variants
- Cross-hybridization potential considered
44Oligo vs. cDNA technology
- Pros
- Spot quality / consistency
- More information per gene (PM/MM, exon specificity)
- Requires less total RNA (5 µg) due to less cross-hybridization
- More dynamic range (detects 1 in 10^6)
- Less prone to systematic errors found in cDNA printing
- Cons
- Probe selection unavailable
- Expense ($400-600 / chip)
45Pre-Hybridization Procedures
- Total RNA is extracted
- mRNA is reverse transcribed to cDNA
- cDNA is made double stranded
- Double-stranded cDNA is transcribed in vitro to cRNA, which is fluorescently labeled
46Post hybridization procedures
- Chip is washed and scanned
- Laser scanner generates image
- Image is gridded to identify probes in cells
- Analysis extracts the signal intensity of each cell
- Information from each set of probes is combined
47Resulting Data
48Resulting Data
- Image: DAT file, 10^7 pixels or 50 MB
- Some groups are working on lossless compression (FDA regulations)
- Quantification: CEL file, with cell intensities and probe-level values
- CDF file: Chip Description File, contains the layout of the GeneChip
49Oligonucleotide Analysis
- Chip contains recognizable features to help alignment
- Important steps
- Quantifying the results: image processing
- Finding meaning: data analysis
- Most expression measures are based on PM-MM to correct for background and non-specific binding
- MM can also carry a signal
- Details on algorithms and analysis will be discussed at a later date.
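The simplest PM-MM based measure is the mean of the PM minus MM differences over a probe set. The sketch below shows only that basic idea; it is not the algorithm of any particular software package, and the intensities are invented.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) across the probe pairs of one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need the same non-zero number of PM and MM values")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Made-up cell intensities for a probe set with four probe pairs.
pm = [1200, 900, 1500, 400]
mm = [300, 350, 500, 450]
print(average_difference(pm, mm))
```

Note that the last pair has MM greater than PM, so its contribution is negative. This is one practical consequence of the MM probe carrying real signal.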
50Open Challenges & Recent News
51Open Challenges
- For Analysts
- Experimental Design and Probe Selection
- Quality Assessment, Normalization, and Metric Selection
- Validation
- Differential and Survival Analysis
- Does clustering provide the right answer?
- For Lab Scientists and Engineers
- Minimize Cost of Experiments
- Minimize Time of Experiments
- Minimize Cross-Hybridization
52Experimental Design
- Considerations
- Budget
- Number of Samples
- Number and kind of replicates
- Hypothesis Test or Generation
- Analysis Method
53Probe Selection
- cDNA Arrays
- Which genes to probe (UniGene)
- Synthesized oligo probes are gaining popularity (length, number per gene)
- Oligonucleotide Arrays
- Probe selection has been evolving with each new array design
- Goal is to improve specificity and sensitivity with the minimal number of probes used
- This was recently addressed in a paper by L. Zhang, M. Miles, and K. Aldape that appeared as a letter in Nature Biotechnology
54Probe Selection
- Paper Summary . . . In one slide . . .
- Hybridization depends on steric hindrance, probe-probe interaction, and RNA secondary structure interactions.
- Zhang et al. present a simple nearest-neighbor-based free energy model for the formation of RNA-DNA duplexes that assigns a nucleotide-position-specific weight (PDNN) and takes into account gene-specific binding (GSB) and non-specific binding (NSB)
- Conclusions
- Quantification of NSB is critical for interpreting array data
- Determination of GSB and NSB requires only PM probes
- Probe Selection Criteria
- i. Maximize GSB, minimize NSB
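A rough Python sketch of a position-weighted nearest-neighbor energy, written from the summary above and not taken from the paper. Every number in it (stacking energies, position weights, amounts) is a HYPOTHETICAL placeholder, and the saturating signal formula is an assumed general form that should be checked against Zhang et al. before any real use.

```python
import math

BASES = "ACGT"

# HYPOTHETICAL stacking energies for the 16 dinucleotides: each G or C makes
# the pair more stable (more negative). Real values are fitted to array data.
STACKING = {a + b: -0.5 - 0.25 * ((a in "GC") + (b in "GC"))
            for a in BASES for b in BASES}


def position_weights(n_pairs):
    """HYPOTHETICAL weights that are largest in the middle of the probe."""
    mid = (n_pairs - 1) / 2
    return [1.0 - abs(k - mid) / n_pairs for k in range(n_pairs)]


def binding_energy(probe, weights, stacking=STACKING):
    """Sum of position-weighted stacking energies over adjacent base pairs."""
    pairs = [probe[k:k + 2] for k in range(len(probe) - 1)]
    if len(weights) != len(pairs):
        raise ValueError("need one weight per adjacent base pair")
    return sum(w * stacking[p] for w, p in zip(weights, pairs))


def expected_signal(probe, n_gene, n_nonspecific, background, w_gsb, w_nsb):
    """Assumed form: GSB term + NSB term + background, each term saturating."""
    gsb = n_gene / (1 + math.exp(binding_energy(probe, w_gsb)))
    nsb = n_nonspecific / (1 + math.exp(binding_energy(probe, w_nsb)))
    return gsb + nsb + background


probe = "CGTATCGTAGAAGTCTATGCTAATG"
w = position_weights(len(probe) - 1)
print(expected_signal(probe, n_gene=1000, n_nonspecific=50, background=20,
                      w_gsb=w, w_nsb=w))
```

The point of the sketch is the structure: only the PM sequence is needed to compute both binding terms, which matches the conclusion that GSB and NSB can be determined from PM probes alone.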
55Quality Assessment & Normalization
Improved metrics, and standardization of quality metrics and normalization methods by experiment design
56Survival Analysis
Analysis methods exist for studies where two discrete conditions are being tested. How can one continuous variable be tested using microarrays? An example would be gene expression correlated to life expectancy of cancer patients.
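One very simple way to relate expression to a continuous outcome is a per-gene correlation coefficient. The sketch below computes a Pearson correlation in plain Python on made-up numbers. It deliberately ignores censoring (patients still alive at the end of a study), which proper survival methods have to handle, so it should be read as a starting point only.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Invented data: expression of one gene and survival time for four patients.
expression = [1.0, 2.0, 3.0, 4.0]
survival_months = [8.0, 6.0, 4.0, 2.0]
print(pearson(expression, survival_months))
```

Repeating this over thousands of genes raises a multiple-testing problem, which is part of why this remains an open challenge.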
57Analysis Methods
Does clustering give the right answers? What
alternatives to clustering exist?
58Recently in the News
- Variation Detection, SNP Genotyping, Haplotype Blocks and Tagging SNPs
- Diagnostic Arrays
- Protein Arrays
59Detection of DNA Variation By Using DNA Chips
60Single Nucleotide Polymorphisms (SNPs)
- Each variable base (SNP) results from a single error in DNA replication that occurred once in the history of mankind
- Each SNP is characterized by only two bases
- The more ancient the error, the more common the SNP
- SNPs result in functional differences by altering the quality and/or the quantity of cellular proteins
- DNA sequence comparison of any two copies of the human genome reveals only 0.1% sequence variability
61Detection of DNA Variation By Using DNA Chips
[Figure: a DNA sequence of interest (5' to 3') shown against DNA synthesized on the chip, with probe rows labeled A, C, G, T; labeled DNA is hybridized to the chip. Sequences shown: DNA 1 GAAAATCCATGTTCGTTGTCACGAG, DNA 2 AAAAATCCATGTTCGTTGTCACGAG]
62Genotype Determination of Individual Samples
[Figure: synthesized oligonucleotides (sense and antisense, 5' to 3') spanning the polymorphism]
63SNP Genotyping Array
[Figure: hybridization patterns on the probe grid for the three genotypes CC, GC, and GG]
- G/C polymorphism
- Reference allele (A): cgtatcgtagaaGtctatgctaatg
- Alternate allele (B): cgtatcgtagaaCtctatgctaatg
- SNP base position within the probe: -2, -1, 0, 1, 2
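A genotype call from such an array can be sketched as follows, assuming one intensity per probe for allele A (G) and one for allele B (C) at each of the five SNP positions. The relative-signal cutoffs of 0.3 and 0.7 are HYPOTHETICAL values picked for illustration; real genotyping software derives its decision boundaries from training data.

```python
def relative_allele_signal(a_signals, b_signals):
    """Average of A / (A + B) over the probes that interrogate one SNP."""
    if len(a_signals) != len(b_signals) or not a_signals:
        raise ValueError("need the same non-zero number of A and B signals")
    ratios = [a / (a + b) for a, b in zip(a_signals, b_signals)]
    return sum(ratios) / len(ratios)


def call_genotype(a_signals, b_signals, low=0.3, high=0.7):
    """Call GG, GC, or CC for the G/C polymorphism (allele A = G, allele B = C).

    `low` and `high` are hypothetical cutoffs, not validated thresholds.
    """
    ras = relative_allele_signal(a_signals, b_signals)
    if ras >= high:
        return "GG"
    if ras <= low:
        return "CC"
    return "GC"


# Invented intensities for probes at SNP positions -2, -1, 0, 1, 2.
print(call_genotype([900, 950, 1000, 940, 910], [100, 90, 80, 110, 120]))
print(call_genotype([500, 480, 520, 510, 490], [500, 530, 470, 495, 505]))
print(call_genotype([100, 90, 80, 110, 120], [900, 950, 1000, 940, 910]))
```

Using several probes per allele, with the SNP at different positions in the probe, makes the call less sensitive to any single poorly performing probe.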
64Pharmacogenomic studies
Drug response: responders vs. non-responders, side-effect vs. no side-effect
65Diagnostic Arrays
AmpliChip CYP450 (Roche and Affymetrix): the first
chip-based test approved for diagnostic use in
the European Union. The test detects genetic
variations in the Cytochrome P450 2D6 and 2C19
genes and provides the associated predictive
phenotype (poor, intermediate, extensive, or
ultra-rapid metabolizer). Results can be used by
physicians as an aid for selecting drugs and
individualizing treatment doses for drugs
primarily metabolized by the enzymes these genes
encode.
66Diagnostic Arrays
The AmpliChip CYP450 Test distinguishes 29 known
polymorphisms in the CYP2D6 gene, including gene
duplication and gene deletion, as well as two
major polymorphisms in the CYP2C19 gene.
Detection of these CYP2D6 polymorphisms results
in the identification of 33 unique alleles,
including seven CYP2D6 gene duplication alleles.
67Microarrays On Campus?
- Many labs on the USC campus use microarrays in their experiments, including
- Aparicio Lab: replication fork firing
- Arbeitman Lab: Drosophila sex differentiation
- Finkel Lab: GASP and DNA eating in E. coli
68Acknowledgements
69Questions?