Title: Microarrays
1Microarrays
- By
- Heather Matthews
- Tom Shi
- Shih-Hsuan Wu
2Agenda
- Introduction, History, Discoveries and Applications - Heather
- cDNA Microarray - Tom
- Synthetic Oligonucleotide Array - Shih-Hsuan
3Overview
- Biology background
- What do microarrays do?
- Why are microarrays important?
- Types of experiments done using microarrays
- Applications of microarrays
- History of microarrays
- Important experiments using microarrays
- The future of microarrays
4Biology Background
- DNA in somatic cells (all cells except the gametes) is the same regardless of tissue type or physiological state.
- Proteins in cells can differ depending on whether genes are expressed in the cells and at what level.
- Genes are expressed at different levels in cells depending on the
- -tissue type
- -time in the cell cycle
- -developmental stage
- -physiological condition
5What do microarrays do?
- Microarrays can measure the expression levels of thousands of genes at once.
- This is called doing experiments in parallel.
6Why would you want to know the expression level
of genes?
- It's important to know the level at which genes are expressed in certain cells. In a diseased cell, for example a cancer cell, genes that are over- or under-expressed could potentially be regulated back to the expression level of a normal cell, which could help in treating the disease.
- Drugs could be developed to inhibit proteins that are too abundant in diseased cells.
7Types of microarray experiments
- Compare/contrast expression levels
- -in disease vs. non-disease states, such as healthy cells vs. cancer cells.
- -in different types of tumor cells
- -in different types of tissue, such as muscle vs. nerve cells.
8Types of microarray experiments continued
- Compare/contrast expression levels
- -at different stages in the cell cycle
- -in organisms treated with a drug vs. a control group
- -of genes from the same type of cell in different environments
- -of genes from the same type of cell in different species
9Applications of microarrays
- Drug Discovery - Potential to identify genes that could serve as targets for drugs to help cure certain diseases.
- Diagnostics - Diagnosis of diseases for which a gene mutation has been identified.
- Custom Drug Selection - Alternative forms of genes can affect the action or metabolism of a drug. Specific drugs and/or dosages could be customized for each patient.
- Accelerate FDA approval
10History of microarrays
- Current microarray technology is very recent.
- One of the first articles using microarrays was published in October 1995 (Schena et al. 1995).
- However, quantitative hybridization-based methods for measuring gene expression levels have been around for more than 20 years.
11Other methods used to measure gene expression
levels involving hybridization
- Northern and Southern Blots
- -Introduced in the mid 1970s
- -Drawback: limited to examining a small number of genes at a time.
12Hybridization methods continued
- SAGE(Serial Analysis of Gene Expression)
- -Introduced in 1995.
- -Simultaneously measures expression levels of a large number of genes.
- -Uses short sequence tags to mark the transcripts of a gene and to count the transcripts generated by each gene.
- -Drawbacks: time consuming; involves multiple steps and extensive sequencing to identify appropriate tags.
13Important discoveries that led to current
microarray technology
- In the late 1980s robotic devices made it possible to spot a surface in a compact and regular pattern.
- 10,000 spots on a 22 x 22 cm surface
- In the mid-1990s, the number of genes assayed in an experiment was increased by further reducing the pitch (the center-to-center distance between the spots) to 200-400 µm.
- Up to 2,500 spots/cm2
- Sample required was reduced
- Now can do 3,000 spots/cm2
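The link between pitch and spot density on this slide can be checked with a little arithmetic. The sketch below assumes spots sit on a regular square grid; the function names are illustrative and not taken from any microarray software.

```python
def spots_per_cm2(pitch_um: float) -> float:
    """Spot density for a square grid with the given center-to-center pitch."""
    spots_per_cm = 10_000 / pitch_um  # 1 cm = 10,000 micrometres
    return spots_per_cm ** 2


def pitch_um_for(n_spots: int, side_cm: float) -> float:
    """Pitch (micrometres) of a square grid of n_spots filling a side_cm x side_cm area."""
    spots_per_side = n_spots ** 0.5
    return side_cm * 10_000 / spots_per_side


print(spots_per_cm2(200))        # pitch at the low end of the 200-400 µm range
print(spots_per_cm2(400))        # pitch at the high end of the range
print(pitch_um_for(10_000, 22))  # late-1980s layout: 10,000 spots on 22 x 22 cm
```

Under this assumption a 200 µm pitch gives the 2,500 spots/cm2 quoted above, and 3,000 spots/cm2 corresponds to a pitch of roughly 183 µm.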
14One of the first experiments involving microarrays
- Quantitative Monitoring of Gene Expression Patterns With a Complementary DNA Microarray. (Schena et al. 1995)
- Compared expression levels of genes in wild-type Arabidopsis thaliana and a HAT4 (transcription factor) transgenic line of A. thaliana.
- Transgenic means that foreign DNA has been transferred into the cells.
- Chose A. thaliana because it has the smallest known genome of any higher eukaryote.
15Schena article continued
- Results showed a 50-fold elevation of HAT4 mRNA in the transgenic line compared to the wild type.
- Expression of all other genes differed by less than a factor of 5 between the HAT4-transgenic and wild-type plants.
- Showed that the gene transfer of the HAT4 transcription factor to A. thaliana was successful and that the HAT4 gene caused phenotype changes in the transgenic line, such as earlier production of flowers, altered pigmentation, and poor germination.
16A second experiment done in the Schena article
- Compared the expression levels of genes in cells from the root and cells from the leaf.
- A comparison of the scans revealed widespread differences in gene expression between root and leaf tissue.
- mRNA from the light-regulated CAB1 gene was 500 times more abundant in the leaf.
- The expression of 26 other genes differed by a factor of 5 between the leaf and the root.
17A recent major discovery involving the
sub-classification of lung cancer cells
- Molecular Profiling of Non-Small Cell Lung Cancer (NSCLC) and Correlation With Disease-Free Survival (Wigle et al. June 2002)
- Idea - To use gene expression to develop molecular classifications of cancers and correlate gene expression with disease-free survival.
18Methods of the experiment
- Samples were collected from tumor cells of 39 patients with NSCLC and frozen in liquid nitrogen until used.
- Patients were given the same treatment for lung cancer.
- After treatment, patients were monitored with a minimum of 1 year of follow-up to see whether the cancer recurred.
19Methods continued
- 24 patients experienced relapse of their tumor.
- 15 patients remained disease-free based on both clinical and radiological testing.
- Previously frozen tumor cells from patients who relapsed were labeled with one dye, tumor cells from patients who did not relapse were labeled with another dye, and the gene expression levels were compared using microarrays.
21Results/Conclusions
- The patients were clustered hierarchically on the basis of 2899 genes.
- Two groups emerged that appeared to separate patients who relapsed from those who remained disease-free.
- This experiment provides evidence that relapse risk for NSCLC can be determined from gene expression data from cDNA microarrays.
- This could lead to improvements in prognosis and patient management for patients with NSCLC.
24cDNA Microarray
25Agenda
- Procedure and Technology
- Image Analysis
- Data Processing and Analysis
- Experimental Design
- Advantages and Disadvantages
26Procedure Overview
27Procedure Microarray Fabrication
- Select probes from EST databases such as GenBank.
- cDNA probes cloned in bacteria
- Amplify by PCR, purify the product (0.6 to 2.4 kb)
- Poly-L-Lysine Coating on the glass slide
- Slides coated with poly-L-lysine have a surface that is both hydrophobic and positively charged. The hydrophobic character of the surface minimizes spreading of the printed spots, and the charge appears to help position the DNA on the surface in a way that makes cross-linking more efficient.
- Printing
- DNA cross-linked to the substrate by ultraviolet radiation.
- Slide Blocking - Succinic anhydride reduces the positive charge, preventing the target from binding to the substrate.
28Printing Robot Albert
29Printing Technologies
- Photolithography
- Mechanical
- Microspotting
- Inkjets
30Procedure cDNA Targets Preparation
- Extract RNA from tissue, using oligo-dT primers.
- Reverse transcribe and fluorescently label the cDNA samples using Cy3-dUTP or Cy5-dUTP
- Hybridization to the slide
31Procedure Hybridization
- Far more probes in each spot on the array than there are targets that can hybridize to them, so there is no competition between targets
- A hybridization chamber is used to avoid evaporation
32Electron Micrograph picture
33Scanning Technology
- Confocal Laser
- Sensitive, high resolution
- Stimulate Cy3 and Cy5 one at a time or simultaneously using a filter.
- CCD camera-based (digital camera)
- Cheaper, works with many different fluors, collects signal over a long time, but less sensitive and lower resolution
34Confocal Laser Scan Principle
35Image analysis
- Gridding
- Segmentation
- Intensity Extraction
- Background correction
- Target detection
36Gridding
- Assign coordinates to the spots
- Can be automatic or manual
- Issues: rotation/skew of the array and overall shift
37Segmentation
- Define individual spot boundaries
- Types of Methods
- Fixed Circle
- Adaptive Circle
- Adaptive Shape
- Histogram
38Segmentation Continued
- Fixed Circle
- impose a boundary of constant diameter.
- problem: not all signals are circular and the same size.
- Adaptive Circle
- Estimate the diameter for each spot.
- problem: signals are not all circular.
39Segmentation Continued
- Adaptive Shape
- Different sizes and shapes possible
- Seeded Region Growing: assign a seed region, compare its value with neighboring regions, and use an algorithm to merge regions.
- Histogram
- Read in values over an area larger than the spot and use a histogram to plot the values
- Assign an adaptive threshold to determine whether a pixel belongs in the foreground or background.
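As a rough illustration of the histogram approach, the sketch below picks a threshold for the patch of pixels around one spot and labels each pixel as foreground or background. The slide does not say how the adaptive threshold is chosen; the iterative "intermeans" rule used here (midpoint between the mean of the dark pixels and the mean of the bright pixels) is an assumption chosen for simplicity, and the function names are hypothetical.

```python
def intermeans_threshold(pixels, tol=0.5):
    """Iteratively place the threshold midway between the two group means."""
    t = sum(pixels) / len(pixels)  # start from the overall mean
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            return t  # patch is uniform; nothing to separate
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:
            return new_t
        t = new_t


def segment(patch):
    """Return a mask of the same shape: True = foreground, False = background."""
    flat = [p for row in patch for p in row]
    t = intermeans_threshold(flat)
    return [[p > t for p in row] for row in patch]


# A 4x4 patch with a bright 2x2 spot in the middle.
patch = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 205, 198, 11],
    [12, 10, 11, 10],
]
mask = segment(patch)
```

Real image-analysis packages use more robust rules; this only shows the idea of classifying pixels from the distribution of their values.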
40Intensity Extraction
- Foreground and Background Intensity Extracted.
- Background correction: Target Intensity is derived by subtracting background intensity from foreground intensity.
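The subtraction described on this slide can be sketched as follows. Summarizing the foreground by its mean and the background by its median, and clipping negative results to zero, are assumptions made for this example; the slide only states that background is subtracted from foreground.

```python
from statistics import mean, median


def target_intensity(foreground_pixels, background_pixels):
    """Background-corrected spot intensity for one channel.

    Assumptions (not from the slides): mean of foreground pixels,
    median of background pixels, negative values clipped to 0.
    """
    fg = mean(foreground_pixels)
    bg = median(background_pixels)
    return max(fg - bg, 0.0)


spot = [100, 120, 110, 130]       # pixels inside the segmented spot
surround = [20, 22, 18, 20, 60]   # pixels in the local background region
print(target_intensity(spot, surround))  # 115 - 20 = 95
```

Using the median for the background keeps a single bright contaminant pixel (the 60 above) from inflating the estimate.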
41Background Correction
- Why? Spot intensity may include contributions from contamination and non-specific binding.
- Methods
- Local Background
- Morphological Opening
- Delineate a large area around the spot, remove all spots, and estimate background.
- Constant Background
42Methods
Local Background
Morphological Opening
43Effect of Background Correction
44Quality Measure
- Spot size or shape
- Ratio of background intensity to foreground
intensity
45Data Processing
- Use the ratio of Cy3 and Cy5 intensities, log-transformed, to produce normalized data.
- MA Plot: M = log2(R/G) vs. A = (1/2) log2(R*G)
- Normalization
- Correct systematic errors of dyes, pins and background.
- Use a set of housekeeping genes or all genes as non-changing controls
- Filtering: disregard extreme measurement values. Example: throw out the top 5% and bottom 5% of target intensity measurements.
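A minimal sketch of these processing steps is given below, using the standard MA-plot definitions M = log2(R/G) and A = (1/2) log2(R*G), where R and G are the background-corrected red (Cy5) and green (Cy3) intensities. The global median centering stands in for the "all genes as non-changing control" option and is an assumption, as is reading the filtering example as a percentage; the function names are hypothetical.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return (M, A) for one spot from background-corrected R and G."""
    m = math.log2(red / green)        # log ratio
    a = 0.5 * math.log2(red * green)  # average log intensity
    return m, a


def median_center(m_values):
    """Shift M values so that their median is zero (global normalization)."""
    centre = median(m_values)
    return [m - centre for m in m_values]


def trim_extremes(intensities, fraction=0.05):
    """Drop the lowest and highest `fraction` of the measurements."""
    ordered = sorted(intensities)
    k = int(len(ordered) * fraction)
    return ordered[k:len(ordered) - k]


print(ma_values(8, 2))                   # (2.0, 2.0)
print(median_center([1.0, 2.0, 3.0]))    # [-1.0, 0.0, 1.0]
```

Print-tip or intensity-dependent normalization would fit a curve through the MA plot instead of subtracting a single constant.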
46Normalization Examples
Normalization for print-tip group locations, shown before and after. Differently colored lines represent different print-tip groups.
Yellow: control genes. Cyan: mixed sample pool titration. Red: best-fit curve for the entire sample.
47Data Analysis
- Design a database to hold the data and perform analysis.
- Data Mining: the automated extraction of hidden predictive information from databases
- Supervised: uses predefined classes, e.g. support vector machines
- Unsupervised: hierarchical clustering, k-means clustering, self-organizing maps.
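To make the unsupervised case concrete, here is a small sketch of agglomerative hierarchical clustering on expression profiles. The choice of Euclidean distance and average linkage is an assumption for illustration (correlation-based distances are also common), and the toy profiles are made up.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(profiles[i], profiles[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)


def hierarchical_clusters(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of row indices into profiles.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], profiles),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Toy log-ratio profiles for four samples across three genes.
profiles = [
    [2.0, 2.1, 1.9],
    [1.8, 2.0, 2.2],
    [-1.0, -1.2, -0.9],
    [-1.1, -0.8, -1.0],
]
groups = hierarchical_clusters(profiles, 2)
```

This brute-force version recomputes all distances at every merge, which is fine for a handful of samples but not for thousands of genes.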
48Experiment Design
- Direct vs. Indirect
- Variance differs by a factor of 4
- Direct: more precise because the comparison is within slides
- Indirect: more feasible in some circumstances (comparison between 3 samples)
- Dye Swap: reduces systematic differences between dyes
- Replications
- Take multiple readings of the same experiment
- Repeat experiment (new RNA extraction)
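One way to arrive at the factor of 4 is sketched below. It assumes that each slide yields a log ratio with independent error variance \sigma^2 and that both designs use two slides to compare samples A and B; these assumptions are not stated on the slide, and R denotes a common reference sample.

```latex
% Assumption: one log ratio per slide, independent errors with variance \sigma^2,
% two slides per design.
\begin{align*}
\text{Direct (A vs. B on both slides, averaged):}\quad
  & \operatorname{Var}\!\left(\tfrac{1}{2}(y_1 + y_2)\right) = \frac{\sigma^2}{2} \\
\text{Indirect (A vs. R on one slide, B vs. R on the other):}\quad
  & \operatorname{Var}\!\left(y_{AR} - y_{BR}\right) = 2\sigma^2 \\
\text{Ratio of variances:}\quad
  & \frac{2\sigma^2}{\sigma^2 / 2} = 4
\end{align*}
```

With the same number of slides, the direct comparison is therefore the more precise estimate of the A/B log ratio under these assumptions.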
49Advantages and Disadvantages
- Advantages
- Flexible: probes can be custom made
- Higher specificity due to longer probes
- Simultaneous hybridizations in comparative studies (minimize experimental variation)
- Disadvantages
- Large amount of RNA required.
- Cross Hybridization Risk (false positives)
- Probes Multiple Constraints with Glass Possible
- Relatively fewer features than oligo approach
50Oligonucleotide Microarrays
- One Gene Representation on GeneChip
- The Photolithographic Construction of Microarrays
- Major Steps
- Intensity Calculation
- Data Format
- Advantages and Disadvantages
51Overview Affymetrix GeneChip Images
52Overview Affymetrix GeneChip Images
53Overview GeneChip Expression Analysis Process
54Overview Catalog GeneChip Expression Arrays
55Gene Representation
56Photolithographic Construction
57GeneChip Single Feature
A single feature on an Affymetrix GeneChip
microarray.
58GeneChip Hybridization
GeneChip Hybridization of tagged probes to
Affymetrix GeneChip microarray.
59Hybridized GeneChip Microarray
Hybridized GeneChip Microarray Cartoon
depicting scanning of tagged and un-tagged
probes on an Affymetrix GeneChip microarray.
60Major Steps
62Intensity Calculation
Each probe cell: 10x10 pixels.
Gridding: estimate location of probe cell centers.
Signal: remove the outer 36 pixels, leaving 8x8 pixels. The probe cell signal, PM or MM, is the 75th percentile of the 8x8 pixel values.
Background: the average of the lowest 2% of probe cell values is taken as the background value and subtracted.
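The calculation on this slide can be sketched in a few lines. The percentile below uses linear interpolation between ranks, which is one of several common definitions and may not match the vendor software exactly; reading "lowest 2" as the lowest 2% of probe cell values is also an assumption. Function names are hypothetical and do not come from the Affymetrix software.

```python
def percentile(values, q):
    """q-th percentile with linear interpolation between ranks (an assumption)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def probe_cell_signal(cell):
    """cell: 10x10 list of pixel rows. Drop the outer ring, keep the inner 8x8."""
    inner = [p for row in cell[1:-1] for p in row[1:-1]]  # 64 pixels
    return percentile(inner, 75)


def background_level(cell_signals, fraction=0.02):
    """Average of the lowest `fraction` of probe cell signals."""
    ordered = sorted(cell_signals)
    n = max(1, round(len(ordered) * fraction))
    return sum(ordered[:n]) / n


# Example cell: bright border pixels (discarded) around inner values 1..64.
cell = [[1000] * 10 for _ in range(10)]
value = 1
for r in range(1, 9):
    for c in range(1, 9):
        cell[r][c] = value
        value += 1

signal = probe_cell_signal(cell)
corrected = signal - background_level([signal, 5.0, 7.0, 300.0])
```

Discarding the outer ring of 36 pixels reduces the effect of small gridding errors and of signal bleeding in from neighboring probe cells.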
63Affymetrix Microarray Suite (MAS) 5.0
- Experiment Information Files (.EXP)
- - containing information on array type, sample information, fluidics settings and hybridization scanner settings.
- Image Data Files (.DAT)
- - the raw image file direct from the scanner with the analysis grid.
- Cell Intensity Files (.CEL)
- - containing measured intensities and locations for an array that has been hybridized.
64Affymetrix Microarray Suite (MAS) 5.0
- Analysis output File (.CHP)
- - the output generated by the analysis of a .DAT or .CEL file.
- Report File (.RPT)
- - summarizes background noise, housekeeping information, and spiked-in controls
- Chip Description Files (.CDF)
- - contains names and locations for each gene represented on the chip.
65Advantages
- High specificity
- Can use small amount of RNA
- Widely used, so annotation of probe sets is of
relatively high quality.
66Disadvantages
- Very expensive to design. (US$300,000)
- Expensive to perform experiments. (US$400, plus $300 for labeling/hybridization)
- Limited to the species for which there are chips available; sequence required.
- Single target hybridization, so a comparison always involves two experiments, and dye swaps are impossible.
67The future of microarrays
- Search for a higher-density whole-genome chip that could simultaneously measure the expression of all 30,000-40,000 human genes.
- If they are eventually made, they won't be cheap, and there could be problems with storage and analysis.
- Shift from chips made by research labs to chips manufactured by companies.
- Companies will make standard gene chips for different organisms and also custom chips.
68Future of microarrays continued
- Shift from fluorescent labeling to electrical detection
- Fluorescent labeling interferes with hybridization because of steric hindrance, and requires very expensive detection systems.
- Ideally we could quantify the amount of hybridization by measuring an electrical signal, without having to modify the sample before hybridization.
- Measuring an electrical signal could be done by monitoring changes in electrical properties upon hybridization or by weighing the extra mass of hybridized material.
69Future of microarrays continued
- Better interpretation of data.
- Development of better software for clustering and correlation.
- Make a standard data format.
- Expression data could be stored in a database.
70Limitations of Microarray Technology
- Does the mRNA expression profile accurately reflect protein abundance in cells?
- Gygi article: correlation of mRNA and protein abundance in yeast only 0.356
- Protein Microarrays
71References
- Campbell, N. A., & Reece, J. B. (2002). Biology. San Francisco: Benjamin Cummings.
- Schena, M. et al. (1995). Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray. Science, 270(5235), 467-470.
- Sebastiani, P. et al. (2003). Statistical Challenges in Functional Genomics. Statistical Science, 18(1), 33-70.
72References continued
- Strachan, T., & Read, A. P. (1999). Human Molecular Genetics. New York: John Wiley & Sons, Inc.
- Wigle, D. et al. (2002). Molecular Profiling of Non-small Cell Lung Cancer and Correlation with Disease-free Survival. Cancer Research, 62, 3005-3008.
- Website
- http://www.dna-arrays.com/index.html
73References continued
- Lectures
- Deonier, R., Microarray Analysis. 10/21/2002
- Calabrese, P., Microarrays/Gene Expression
Arrays. 3/11/2003