Title: BIOMEDICAL DISCOVERY USING MICROARRAYS
1. BIOMEDICAL DISCOVERY USING MICROARRAYS
Refresher course 2002: Recent Advances in Neuroscience
David Murphy, University of Bristol
MOLECULAR NEUROENDOCRINOLOGY RESEARCH GROUP
2. MICROARRAYS
- Genome information
  - Human, mouse, rat
  - 37,000 genes?
  - Genomic, cDNAs, ESTs
- Global expression patterns
  - What genes are expressed where?
  - How does this pattern change?
    - Physiological cues
    - Disease
3. ARRAYS
EARLY HISTORY
- Southern EM, Maskos U, Elder JK (1992) Analyzing and comparing nucleic acid sequences by hybridization to arrays of oligonucleotides: evaluation using experimental models. Genomics 13: 1008-17.
- Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467-70.
- Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci U S A 93: 10614-9.
Microarray papers in PubMed
- 1997: 9
- 1998: 35
- 1999: 117
- 2000: 348
- 2001: 1048
- 2002: 1950
4. ARRAYS
- Gene discovery
- Diagnosis
- Drug discovery
5. Sources of variability
Hadley and King (2001) JAMA 286: 2280
- Tissue source
- Tissue harvesting
- Microarray platforms
- Microarray variability
- Sample labelling
- Data analysis
- Human error!
To reduce variability:
- STANDARDISE
- POOL MULTIPLE SAMPLES -> AVERAGE
- LOTS OF REPLICATES
6. What is normal?
- Project normal: Pritchard et al (2001) PNAS USA 98: 13266
  - Compared expression profiles of 5406 genes in 6 C57Bl/6 mice
  - Variance: Kidney 3.3%, Testis 1.8%, Liver 0.8%
- Strain differences: Sandberg et al (2000) PNAS USA 97: 11038
  - Compared expression profiles of >10,000 genes in 6 brain regions of C57Bl/6 and 129SvEv mice
  - 1% of genes differentially expressed in at least one brain region
- Age
- Physiological state
- Disease
- Time of day
To reduce variability:
- STANDARDISE
- POOL MULTIPLE SAMPLES -> AVERAGE
- LOTS OF REPLICATES
7. Cellular heterogeneity
- Specific brain regions
- Single cell sampling
8. Cellular Heterogeneity
- Specific brain regions
- Single cell sampling
- Laser capture microdissection (www.arctur.com)
9. www.arctur.com
- Small tissue samples: not enough RNA -> linear amplification
- But amplification can bias transcript ratios
10. What type of array to use?
- cDNA microarrays
- Oligomer arrays
- Oligonucleotide arrays: Affymetrix
11. Fabrication of cDNA microarrays
- Mechanical micro-spotting
- Ink-jetting
12. Probing of cDNA microarrays
13. What type of array to use?
cDNA microarrays
- Advantages
  - Custom arrays
  - Cheap
- Disadvantages
  - Mix-ups
    - Between 1 and 5% of the clones in even the best-maintained sets do not contain the sequence that they are supposed to.
    - Worst case: Timothy Zacharewski, at Michigan State University in East Lansing, sequenced 1,189 cDNAs from Research Genetics. Only 62% of the stocks definitely represented a pure sample of the correct clone. Of the remainder, more than half seemed to contain the wrong cDNA, and the rest contained either a mix of different cDNAs or did not yield a readable sequence.
  - Variability (especially with lab-made chips)
14. What type of array to use?
- cDNA microarrays
- Oligomer arrays
- Oligonucleotide arrays: Affymetrix
15. Oligomer Arrays
- Agilent Technologies
  - In-situ synthesis of 60-mer oligonucleotide probes, base-by-base
  - Non-contact inkjet process uses standard phosphoramidite chemistry to deliver extremely small, accurate volumes (picoliters) of the chemicals to be spotted
- MWG Biotech
  - Collections of 50-mer oligonucleotides
  - Available as ready-made slides or can be printed by the user to generate custom chips
- Flexible
- Good quality control
- Increased hybridisation specificity and consistency
16. What type of array to use?
- cDNA microarrays
- Oligomer arrays
- Oligonucleotide arrays: Affymetrix
17. Affymetrix GeneChips
- Hundreds of thousands of different oligonucleotides on a glass surface
- Photolithography-directed combinatorial chemical synthesis
- Based on cDNA or EST sequence, up to 20 different independent 25-mer and minimally overlapping oligonucleotides are selected to serve as sensitive, unique, sequence detectors
- Mismatch control probes are identical to their perfect match partners except for a single base difference in a central position
www.affymetrix.com
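The perfect match (PM) / mismatch (MM) design can be illustrated with a small sketch. It assumes one simple summary, the mean PM - MM difference across the probe pairs of a gene; this is an illustration of the idea rather than the exact calculation used by any Affymetrix software. The function names and the intensity values are hypothetical.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of (PM - MM) across the probe pairs of one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("PM and MM must be non-empty and of equal length")
    return mean(p - m for p, m in zip(pm, mm))


def fraction_pm_above_mm(pm, mm):
    """Share of probe pairs whose perfect match is brighter than its mismatch."""
    return sum(p > m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for four probe pairs of one gene.
pm = [1200.0, 950.0, 1800.0, 400.0]
mm = [300.0, 250.0, 500.0, 450.0]

signal = average_difference(pm, mm)      # specific signal after MM subtraction
support = fraction_pm_above_mm(pm, mm)   # how many probe pairs agree
print(signal, support)
```

A probe pair in which MM is brighter than PM (the last pair here) suggests cross-hybridisation at that probe; having up to 20 pairs per gene makes the summary less sensitive to any single one.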
18. Fabrication of Oligonucleotide microarrays
21. www.affymetrix.com
22. What type of array to use?
Oligonucleotide microarrays
Advantages
- Designs based on sequence information
  - No need to prepare and verify physical intermediates such as bacterial clones, PCR products, cDNAs
  - Less possibility of mix-ups (but)
- Detect individual gene transcripts
  - Distinguish splice variants, and closely related members of a gene family
  - Distinguish sense and anti-sense transcripts
- The probe redundancy/mismatch strategy helps identify and minimise the effects of non-specific hybridisation and background signal
  - Allows the direct subtraction of cross-hybridisation signals, and discrimination between real and non-specific signals
23. What type of array to use?
Oligonucleotide microarrays
Disadvantages
- Need access to specialist equipment
- Cannot readily fabricate custom arrays
- EXPENSIVE!!
24. Sources of variability
- Slide heterogeneity
- Spotting variation
- Printing irregularities
To reduce variability:
- STANDARDISE
- POOL MULTIPLE SAMPLES -> AVERAGE
- LOTS OF REPLICATES
25. Logistical logjam
- A single microarray run can produce between 100,000 and a million data points
- A typical experiment may require tens or hundreds of runs
- Analysis
  - LOW-LEVEL ANALYSIS: background elimination, filtration and normalisation; removal of systematic variation between chips, enabling group comparisons
  - HIGH-LEVEL ANALYSIS ("data mining"): the uncovering of relevant patterns of interest in data from a particular problem domain
    - Statistics
    - Presentation, e.g. CLUSTERING
- Archiving
  - Relational and functional databases
- Integrate with other types of information
  - Search literature on genes and gene-gene relationships
  - Known regulatory circuits, chromosomal localisation, cellular localisation of gene product etc.
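As a rough illustration of the low-level analysis step, the sketch below subtracts a background estimate and then rescales each chip to a common median so that chips can be compared. Median scaling is only one simple normalisation choice, assumed here for illustration; the function names and numbers are hypothetical.

```python
from statistics import median


def subtract_background(intensities, background):
    """Remove a background estimate, flooring the result at zero."""
    return [max(x - background, 0.0) for x in intensities]


def scale_to_common_median(chips, target=None):
    """Rescale every chip so that all chips share the same median intensity.

    chips is a list of intensity lists, one per chip. If no target is given,
    the mean of the per-chip medians is used.
    """
    medians = [median(chip) for chip in chips]
    if any(m == 0 for m in medians):
        raise ValueError("a chip with zero median cannot be rescaled")
    if target is None:
        target = sum(medians) / len(medians)
    return [[x * target / m for x in chip] for chip, m in zip(chips, medians)]


# Two hypothetical chips measuring the same sample; the second was scanned
# twice as brightly, a systematic difference that normalisation should remove.
chip_a = [100.0, 200.0, 300.0]
chip_b = [200.0, 400.0, 600.0]
normalised = scale_to_common_median([chip_a, chip_b])
print(normalised)
```

After scaling, the two chips give identical values, so any remaining difference between groups of chips is more likely to be biological than technical.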
26. Clustering
Organisation of array data on the basis of similar expression profiles
- Intuitive visual assimilation
- Coincidental?
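A minimal sketch of clustering by similar expression profiles follows. It assumes average-linkage agglomerative clustering with 1 - Pearson correlation as the distance, which is a common choice but not necessarily the one behind any figure in these slides. Gene names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cluster(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        # Average linkage: mean (1 - r) over all pairs of genes across clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical expression profiles across three conditions.
profiles = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.5],
    "geneC": [5.0, 3.0, 1.0],
    "geneD": [9.0, 6.0, 2.0],
}
groups = cluster(profiles, 2)
print(groups)
```

Genes that rise together end up in one group and genes that fall together in the other. Grouping reflects similar profiles only; whether co-clustered genes are functionally related or merely coincidental still has to be tested.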
27. Confirmatory studies
Because of the statistical issues raised by microarray technology, it is very important that the findings be confirmed using an independent method, preferably with separate samples rather than retesting of the original mRNA. Because data resulting from a microarray are so extensive, it is impossible to retest all of the data. Nevertheless, it is incumbent upon investigators to evaluate a reasonable number of genes.
Gary S. Firestein and David S. Pisetsky (2002) DNA microarrays: boundless technology or bound by technology? Guidelines for studies using microarray technology. Arthritis & Rheumatism 46: 859-861
- RNA
  - Quantitative (real-time) PCR
  - Northern blotting
  - In situ hybridisation
- Protein
  - Western blotting
  - Immunocytochemistry
28. Quantitative (real-time) PCR
TaqMan (Applied Biosystems)
29. Present and future challenges
Hardware
- Quality
- Reproducibility
- Comparability
- Dynamic range
- Sensitivity
- Cross hybridisation
- Gene number: need not be complete but should be unbiased
  - Affymetrix Human Genome U133 GeneChip Set comprises two microarrays containing over 1,000,000 unique oligonucleotides corresponding to more than 39,000 transcript variants representing greater than 33,000 of the best characterized human genes
- COST!!
Software
- The search for a body of mathematics that will serve as a natural language for gene expression information. Young (2000) Cell 102: 9
  1. UNDERSTAND SOURCES OF NOISE AND VARIATION IN ORDER TO INCREASE THE BIOLOGICAL SIGNAL
  2. COMBINE EXPRESSION DATA WITH OTHER SOURCES OF INFORMATION TO IMPROVE THE RANGE AND QUALITY OF BIOLOGICAL CONCLUSIONS
  3. DEVELOP TECHNIQUES THAT ENABLE THE MODELLING OF GENE NETWORKS IN THE CONTEXT OF THE FUNCTIONING OF INTEGRATED BIOLOGICAL SYSTEMS
  4. HUMAN JUDGEMENT AND EXPERTISE WILL BE REPLACED BY ARTIFICIAL INTELLIGENCE
30. Present and future challenges
Gene function
- Experimental design: hypothesis generation
- Functional genomics: hypothesis testing
  -> BIOCHEMISTRY
  -> CELL BIOLOGY
  -> GENETICS
  -> STRUCTURAL BIOLOGY
  -> PROTEOMICS
  -> SYSTEMS BIOLOGY
- High throughput physiology
- Rapid and efficient gene transfer (germline/somatic)
Vukmirovic and Tilghman (2000) Nature 405: 820
31. ARRAYS
- Gene discovery
- Diagnosis
- Drug discovery
32. Biological Question: PLASTICITY IN VASOPRESSIN NEURONS
- Peptide hormone
  - 9 amino acids
  - Water homeostasis
- Gene expression
  - Cell-specific expression
  - Physiological regulation
  - Plasticity
33. Antidiuretic Action of Vasopressin
35. VASOPRESSIN GENE EXPRESSION
[Figure panels labelled SON, PVN, SCN and POSTERIOR PITUITARY; Control vs Dehydrated]
- CELL-SPECIFIC EXPRESSION: SON/PVN
- PHYSIOLOGICAL REGULATION: Control vs Dehydrated
- Plasticity: Biosynthesis, Secretory, Morphological, Electrogenic
36. FUNCTIONAL GENOMICS
- 30,000-60,000 GENES
- 20,000-40,000 EXPRESSED IN THE BRAIN?
- How many of these genes are utilised by vasopressin magnocellular neurons?
- How does the pattern of gene expression change following a physiological stimulus?
- In the past: opportunistic
  - Availability of probes
  - Intuition of researchers
- Unbiased global approach
  - Identify genes switched on/off in SON by dehydration
37. NIA NEUROARRAY
with Tanya Barret and Kevin Becker (NIA Baltimore)
- ON: Notch 2 (7.9), IL6 (2.6), Casp 4 (1.8)
- OFF: SMG1 (-2.4), RGS5 (-2.9)
[Figure: array panels labelled C, C, C, D]
38. AFFYMETRIX U34a GeneChip
- Represents a total of 8700 rat genes
- Interrogated with SON RNA
  - 0, 1 and 3 days of dehydration
- Performed in triplicate
- -> dChip
Affymetrix GeneChips
Bristol: Mohamed Ghorbel, Greig Sharman, Marie Leroux
Aarhus: Thomas Thykjaer, Torben Orntoft
www.affymetrix.com
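For a design like this one (three time points, each in triplicate), a first look at the data is often a log2 fold change of each time point against the day 0 baseline, using replicate means. The sketch below assumes that approach for illustration only; it is not the dChip calculation, and the expression values are hypothetical.

```python
from math import log2
from statistics import mean


def log2_fold_changes(replicates_by_day, baseline_day=0):
    """Log2 ratio of each day's replicate mean to the baseline day's mean."""
    base = mean(replicates_by_day[baseline_day])
    return {
        day: log2(mean(values) / base)
        for day, values in replicates_by_day.items()
        if day != baseline_day
    }


# Hypothetical expression values for one gene: day -> triplicate measurements.
son_gene = {
    0: [100.0, 110.0, 90.0],
    1: [190.0, 210.0, 200.0],
    3: [420.0, 380.0, 400.0],
}
changes = log2_fold_changes(son_gene)
print(changes)
```

A value of 1 means a doubling and 2 a four-fold increase relative to day 0. Fold change alone ignores replicate spread, which is why a statistical assessment across the triplicates is still needed.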
39. ANALYSIS: dChip
Wing Hung Wong and Cheng Li (www.dchip.org)
- USER FRIENDLY
- The programme pools data from multiple arrays to assess standard errors for expression indices and allows statistical assessment of results
- Comparative analysis
- Hierarchical analysis
Normalization: rank-selection method
- Selects a set of genes with the property that the rank of a gene in this set according to its expression measurement in one array is similar to its rank using values for the second array
- Genes selected this way tend to be non-differentially expressed
- These genes form a valid basis for the computation of a normalization relation
Model-based expression analysis of oligo-array data
- Proposes a statistical model for expression data at the probe level
- Probe-level analysis across multiple arrays
- Accounts for individual probe effects
- Allows automated probe selection
- Reduces errors caused by outliers, cross-hybridizing probes or image artefacts
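The probe-level model can be sketched as follows. It assumes the multiplicative form usually attributed to Li and Wong, in which the PM - MM difference for array i and probe j is approximated by theta[i] * phi[j] (an expression index per array times a sensitivity per probe), fitted by alternating least squares. This is a simplified illustration, not dChip's implementation: outlier detection and standard errors are omitted, and the function name and data are hypothetical.

```python
from math import sqrt


def fit_probe_model(y, n_iter=20):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    y[i][j] is the PM - MM difference for array i and probe pair j.
    phi is rescaled so that sum(phi**2) equals the number of probes.
    Assumes y is not all zeros (otherwise the updates divide by zero).
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes  # start with all probes equally sensitive

    def update_theta(phi):
        # Least-squares expression index for each array, given phi.
        denom = sum(p * p for p in phi)
        return [sum(row[j] * phi[j] for j in range(n_probes)) / denom for row in y]

    for _ in range(n_iter):
        theta = update_theta(phi)
        # Least-squares probe sensitivities, given theta.
        denom = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / denom
               for j in range(n_probes)]
        # Rescale phi so that the sum of squares equals the number of probes.
        scale = sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    return update_theta(phi), phi


# Hypothetical PM - MM differences: 3 arrays (rows) x 3 probe pairs (columns).
y = [
    [50.0, 100.0, 150.0],
    [100.0, 200.0, 300.0],
    [200.0, 400.0, 600.0],
]
theta, phi = fit_probe_model(y)
print(theta, phi)
```

Because the probe sensitivities are shared across arrays, a probe that deviates from the fitted pattern on one array stands out in the residuals, which is the basis for flagging outliers and cross-hybridising probes.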
40. AFFYMETRIX U34a GeneChip
43. ARRAYS
- Gene discovery
- Diagnosis
- Drug discovery
44. BREAST CANCER DIAGNOSIS
- Present
  - Highly subjective
    - Based on judgements of tumour histology by pathologists
  - Patients with same stage of disease
    - Different treatment responses and outcomes
  - Strongest predictors for metastasis
    - Lymph node status and histological grade
    - Fail to predict tumours according to clinical behaviour
  - Chemotherapy or hormone treatment
    - Reduce risk of metastases by one-third
    - 70-80% of patients would survive without it
45. BREAST CANCER DIAGNOSIS
- FUTURE
  - Array analysis: van 't Veer et al (2002) Nature 415: 530
    - Primary breast tumours of 117 young patients
    - Identified a gene expression signature predictive of poor prognosis (short interval to distant metastases) in patients without tumour cells in local lymph nodes at diagnosis
46. CLUSTER
47. ARRAY DIAGNOSIS
- Genes regulating
  - cell cycle
  - invasion
  - metastasis
  - angiogenesis
- Personalised medicine
- Targets for new therapies
48. ARRAYS
- Gene discovery
- Diagnosis
- Drug discovery
49. DRUG DISCOVERY: The Present
- Pharmacology
  - Action of chemical agents on living cells
- Biochemical pathways implicated in pathophysiological processes
  - Identify and study key enzymes
  - Optimise therapeutic behaviour of small molecules that bind to and alter the activity of specific targets
- Clinical trials
  - US$50-500 million
  - 90% of clinical trials fail
    - 50% lack efficacy
    - 25% uneconomic
    - 25% unsafe
50. DRUG DISCOVERY: The Future
51. ARRAYS: Redefining the scientific endeavour
- FISHING TRIPS??
- Discovery research
  - Not hypothesis driven, but should be just as rigorous
  - Unbiased data gathering
  - Accelerated answers to obvious questions
  - Hypothesis generation
- BUT: how good is the technique in supplying physiological answers worthy of sustained follow-up experiments?