Title: Genomics and the Environment: The Standardization Challenge of Gene Expression Profiling
1 Genomics and the Environment: The Standardization Challenge of Gene Expression Profiling
Ben Van Houten, Ph.D.
Laboratory of Molecular Genetics, DIR
Program Analysis Branch, DERT
vanhout1_at_niehs.nih.gov
2 Overview
- Gene expression profiling/microarray experiments: chemicals give a unique signature
- Need for standardization
- sources of variation
- technical
- biological
- Challenges/Promise
- Discussion Points
3 Genomic Approaches to Toxicology
[Diagram: DNA (single nucleotide polymorphisms, SNP) -> mRNA (gene expression profiling) -> proteins (protein-protein interactions, structure-function); the mRNA, with its 3' poly(A) tail, is translated into protein at the ribosome]
4 Simplified Overview of Gene Expression Analysis Using cDNA Microarrays
[Workflow diagram: treated population and control population -> RNA isolation -> reverse transcription with Cy3 and Cy5 labels -> mix cDNAs and apply to array -> hybridize under coverslip -> scan -> data analysis]
Red: induced; green: repressed; yellow: unchanged
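The color convention on this slide can be sketched as a classification on the log2 ratio of the two channel intensities. This is only an illustration: it assumes the treated sample carries Cy5 (red) and the control carries Cy3 (green), which the slide does not state, and it borrows the 2-fold cut-off discussed later in the talk as a placeholder threshold. The function name `classify_spot` and all intensities are made up.

```python
import math

def classify_spot(cy5, cy3, fold_cutoff=2.0):
    """Label one spot from its background-corrected channel intensities.

    Assumption (not from the slides): Cy5 (red) = treated, Cy3 (green) = control.
    """
    if cy5 <= 0 or cy3 <= 0:
        return "undetected"  # a log ratio needs signal in both channels
    log_ratio = math.log2(cy5 / cy3)
    cutoff = math.log2(fold_cutoff)
    if log_ratio >= cutoff:
        return "induced (red)"
    if log_ratio <= -cutoff:
        return "repressed (green)"
    return "unchanged (yellow)"

# Hypothetical intensities, for illustration only.
spots = {
    "gene_a": (4200.0, 1000.0),
    "gene_b": (300.0, 1500.0),
    "gene_c": (980.0, 1010.0),
}
labels = {name: classify_spot(cy5, cy3) for name, (cy5, cy3) in spots.items()}
print(labels)
```

A fixed cut-off is the simplest possible rule; the later slides on outlier determination discuss why a statistics-based threshold may be preferable.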
7 Differential Expression of 12,000 Human Genes: Keratinocytes vs. Mammary Cells
Hisham Hamadeh, NIEHS
8 Toxicant Identification and Classification Using cDNA Microarrays
[Diagram: raw data from known agents (peroxisome proliferators, polycyclic aromatic hydrocarbons, oxidant stressors) define the toxicant signatures of Group A, Group B, and Group C; the signature of a suspected toxicant is compared with each group: no match, no match, match]
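The matching step in this diagram could be sketched as a correlation of the suspected toxicant's profile against each group signature. This is a minimal sketch under stated assumptions, not the method used at NIEHS: the Pearson correlation, the `min_r` threshold of 0.8, the function names, and every number below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def match_signature(unknown, signatures, min_r=0.8):
    """Return (best_group, r); best_group is None if no group reaches min_r."""
    best_group, best_r = max(
        ((group, pearson(unknown, profile)) for group, profile in signatures.items()),
        key=lambda item: item[1],
    )
    return (best_group if best_r >= min_r else None), best_r

# Hypothetical log2 ratios for five genes per group signature.
signatures = {
    "Group A": [2.1, 1.8, -0.2, 0.1, -1.5],
    "Group B": [-0.3, 0.2, 1.9, 2.2, 0.0],
    "Group C": [0.1, -1.7, 0.3, -0.2, 2.0],
}
suspected = [0.0, -1.5, 0.4, 0.1, 1.8]
group, r = match_signature(suspected, signatures)
print(group, round(r, 2))
```

Returning `None` below the threshold mirrors the "no match" outcome in the diagram, so an agent unlike any known class is not forced into the nearest group.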
9 Examined gene expression profiles of liver from rats treated with one dose of Wyeth 14,643, clofibrate, gemfibrozil, or phenobarbital after 24 hr or two weeks. Found unique chemical signatures, low inter-animal variation, and time dependence of exposure versus tissue response.
10 Clustering analysis of NIEHS array data using the top 50 genes that were selected using the GA/KNN method. There are four clusters. Each corresponds to a compound: Wyeth, clofibrate, gemfibrozil, phenobarbital.
[Figure axes: animal/hyb, genes]
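GA/KNN pairs a genetic-algorithm search over gene subsets with k-nearest-neighbor classification. Only the k-nearest-neighbor half is sketched here, on made-up profiles, to show how a selected gene set would be scored; the genetic-algorithm search is omitted, and the function names, `k=3`, and all values are assumptions.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training profiles.

    train: list of (profile, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(samples, k=3):
    """Classify each sample using all the others; return the fraction correct."""
    correct = 0
    for i, (profile, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        correct += knn_predict(rest, profile, k) == label
    return correct / len(samples)

# Hypothetical log2 ratios for three selected genes, three animals per compound.
samples = [
    ([2.0, 0.1, 0.0], "Wyeth"), ([2.1, 0.0, 0.1], "Wyeth"), ([1.9, 0.2, -0.1], "Wyeth"),
    ([0.0, 2.0, 0.1], "clofibrate"), ([0.1, 2.1, 0.0], "clofibrate"), ([-0.1, 1.9, 0.2], "clofibrate"),
    ([0.1, 0.0, 2.0], "gemfibrozil"), ([0.0, 0.1, 2.1], "gemfibrozil"), ([0.2, -0.1, 1.9], "gemfibrozil"),
    ([-2.0, -0.1, 0.0], "phenobarbital"), ([-2.1, 0.0, -0.1], "phenobarbital"), ([-1.9, 0.1, 0.1], "phenobarbital"),
]
print(leave_one_out_accuracy(samples))
```

In a full GA/KNN run, a score like this leave-one-out accuracy would serve as the fitness of each candidate gene subset.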
11 FIG. 3. The Partek Pro 2000 software package was used for visual Principal Component Analysis of the data for genes that were altered in a statistically significant manner with any of the treatments used. Each colored point represents data from an individual animal treated with the respective agent for 24 h. From Hamadeh HK, Bushel PR, Jayadev S, Martin K, DiSorbo O, Sieber S, Bennett L, Tennant R, Stoll R, Barrett JC, Blanchard K, Paules RS, Afshari CA. Toxicol Sci 2002 Jun;67(2):219-31
12 Can these data be repeated by different groups at different sites, using different platforms?
FIG. 6. Illustration of transient versus delayed responses in gene expression. From Hamadeh HK, Bushel PR, Jayadev S, Martin K, DiSorbo O, Sieber S, Bennett L, Tennant R, Stoll R, Barrett JC, Blanchard K, Paules RS, Afshari CA. Toxicol Sci 2002 Jun;67(2):219-31
13 Importance of Standardization
- Many sources of variation in microarray experiments and application of bioinformatics tools
- Impact of variation on data interpretation unknown
- No standard protocols (best practices) for the field
- Currently very difficult to impossible to compare gene expression data across microarray platforms (centers/investigators)
- Needed to consolidate gene expression data in a centralized knowledge database.
- Lay foundation for experiments of molecular responses to environmental stressors and risk assessment
14 Sources of Variation
Technical
Scanning
15 Sources of Variation/technical (1)
RNA Extraction -> Labeling/Hybridization -> Scanning -> Analysis (Bioinformatics)
- RNA extraction methods: phenol, RNeasy Stabilization and Total RNA Isolation System (Qiagen), RNAlater (Ambion), RNA storage
- RNA quality: A260/A280, agarose gels, Bioanalyzer
- RNA quantity is a bottleneck: PCR amplification, not all cDNAs amplify the same
- Low abundance genes are the important genes
- RNA extraction from some tissues is very difficult; autolysis and RNA degradation are major concerns.
16 Sources of Variation/technical (2)
RNA Extraction -> Labeling/Hybridization -> Scanning -> Analysis (Bioinformatics)
- Chip design: cDNA/oligonucleotide spotting, lithography (Affymetrix) or ink-jet technology (Agilent).
- Array density
- Control spots
- Pre-hybridization and hybridization conditions
- Direct vs indirect labeling, dye-flips
- Amplification steps
- Number of hybs
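The dye-flip item in the list above can be illustrated with a small calculation. In a dye-flip pair, the same two samples are hybridized twice with the labels exchanged, so a gene-specific dye bias adds to both measured log ratios while the true expression change reverses sign. The sketch below assumes a simple additive model on the log2 scale; the function name and intensities are hypothetical.

```python
import math

def dye_flip_log_ratio(forward, flipped):
    """Combine a forward and a dye-flipped hybridization for one gene.

    forward: (cy5, cy3) intensities with the treated sample in Cy5.
    flipped: (cy5, cy3) intensities with the treated sample in Cy3.
    Returns (mean log2(treated/control), estimated dye bias).
    """
    m_forward = math.log2(forward[0] / forward[1])  # expression + bias
    m_flipped = math.log2(flipped[0] / flipped[1])  # -expression + bias
    expression = (m_forward - m_flipped) / 2        # bias cancels
    dye_bias = (m_forward + m_flipped) / 2          # expression cancels
    return expression, dye_bias

# Hypothetical intensities, for illustration only.
expression, dye_bias = dye_flip_log_ratio(forward=(4000.0, 1000.0), flipped=(1000.0, 2000.0))
print(expression, dye_bias)
```

A non-zero bias estimate flags genes whose apparent change in a single hybridization would partly reflect the label rather than the treatment.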
17 Sources of Variation/technical (3)
RNA Extraction -> Labeling/Hybridization -> Scanning -> Analysis (Bioinformatics)
- How to assess raw data: intensity of each spot for each detectable label
- Scanner design, number of scans
- Target detection, background subtraction
- Intensity/Size thresholds
- Outlier determination
- How many replicates are needed?
- Statistical analysis, data display
- Gene annotation
18 Outlier Determination
- 2-fold cut-off for induction or repression
- Simplistic
- Lacks statistical power and sensitivity
- Confidence Level
- Statistics-based analysis
- Hybridization-specific
- Lower confidence balanced by replicates
19 Determination of Outliers
Confidence level 99.5%: threshold 1.55-fold, 26 / 2000 outliers
Confidence level 95.0%: threshold 1.35-fold, 90 / 2000 outliers
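One way a confidence level can translate into a hybridization-specific fold threshold is sketched below. It assumes that log2 ratios of unchanged genes are roughly normally distributed with a spread estimated from that hybridization; the slides do not say which statistical model was actually used. The function name and the standard deviation of 0.225 are hypothetical.

```python
from statistics import NormalDist

def fold_threshold(sd_log2, confidence):
    """Two-sided fold-change threshold for a given confidence level.

    sd_log2: standard deviation of log2 ratios among genes assumed unchanged
             (estimated separately for each hybridization).
    confidence: e.g. 0.95 for a 95% confidence level.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 ** (z * sd_log2)

# Hypothetical spread for one hybridization.
sd = 0.225
t_995 = fold_threshold(sd, 0.995)
t_950 = fold_threshold(sd, 0.95)
print(round(t_995, 2), round(t_950, 2))
```

With this particular assumed spread the thresholds fall close to the 1.55-fold and 1.35-fold values on the slide, but that reflects the value chosen for the example, not a reconstruction of how the slide's numbers were derived. A noisier hybridization would yield higher thresholds at the same confidence level.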
20 Probabilities of False Outliers in Triplicate Analyses: 1700-Gene Chip, 95% Confidence
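The arithmetic behind this slide's title can be sketched with a binomial model. It assumes that each unchanged gene is falsely flagged with probability 1 minus the confidence level in each hybridization, independently across the three replicates; both assumptions are mine, and the figures on the original slide are not reproduced in this transcript.

```python
from math import comb

def false_outlier_counts(n_genes, confidence, n_replicates=3):
    """Expected number of genes falsely flagged in exactly k of n replicates.

    Assumes independent replicates and a per-hybridization false-positive
    probability of (1 - confidence) for every gene.
    """
    p = 1 - confidence
    return {
        k: n_genes * comb(n_replicates, k) * p**k * (1 - p) ** (n_replicates - k)
        for k in range(n_replicates + 1)
    }

expected = false_outlier_counts(n_genes=1700, confidence=0.95)
for k, count in expected.items():
    print(f"flagged in {k} of 3 hybridizations: {count:.2f} genes expected")
```

Under these assumptions, requiring a gene to be an outlier in all three hybridizations brings the expected number of false outliers on a 1700-gene chip to well below one, which is the sense in which lower confidence can be balanced by replicates.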
21 Gene expression changes with Bcl-2 overexpression in PC12 cells
How to name a gene?
22 Gene Annotations
[Diagram labels: genomic DNA; transcription; messenger RNA (mRNA); UniGene ESTs; UniGene clusters; gene product (protein)]
23 What is in a (Gene) name?
p21, CDKN1, WAF1, CIP1, SDI1, CDN1
Not p21ras!!
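The naming problem on this slide is usually handled by mapping every alias to one agreed identifier before data sets are compared. The sketch below uses only the names listed on the slide; treating `CDKN1` as the canonical key is an arbitrary choice for illustration, and the table and function names are hypothetical.

```python
# Aliases taken from the slide; the choice of canonical key is an assumption.
ALIAS_GROUPS = {
    "CDKN1": ["p21", "CDKN1", "WAF1", "CIP1", "SDI1", "CDN1"],
}

# Build a case-insensitive alias -> canonical lookup table.
ALIAS_TO_CANONICAL = {
    alias.upper(): canonical
    for canonical, aliases in ALIAS_GROUPS.items()
    for alias in aliases
}

def canonical_symbol(name):
    """Return the canonical symbol for a known alias, or None if unknown."""
    return ALIAS_TO_CANONICAL.get(name.strip().upper())

for name in ["WAF1", "cip1", "p21", "p21ras"]:
    print(name, "->", canonical_symbol(name))
```

Exact matching on the whole name, rather than prefix matching, is what keeps p21ras from being merged with p21.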
24
- enhance research in the broad area of environmental stress responses using microarray gene expression profiling
- development of standards and practices in the field for data inclusion into a public database
25 How do we compare data across platforms?
26 Standardization Experiments
RNA Extraction -> Labeling/Hybridization -> Scanning -> Analysis (Bioinformatics)
- Experiment 1: Determine variation in RNA labeling and hybridization and harmonize protocols across CRMs
- Experiment 2: Determine variation in data analysis (bioinformatics) across CRMs
- Experiment 3: Determine variation in RNA extraction (toxicant-challenged vs. unchallenged)
- Experiment 4: Determine sources of variation in animal husbandry (toxicant-challenged animals/tissue vs. non)
27 Experiment 1 & 2: Platform Standardization and Analysis
- CRM 1-6
- CRM supplied Stnd RNA-1 and Stnd RNA-2
- CRM supplied Stnd mouse chip and selects a resident chip
- CRM uses resident protocols for RNA labeling, hyb, scanning, gene annotation, raw data analysis
- Each CRM conducts data analysis for common genes on standard and resident chip
Experiment 3
28 TRC Data warehouse using GeneX: facilitating data exchange between Cooperative Research Members
- Data input
- Data curation
- Data processing (normalization and analysis)
- Web hosting and sharing
Srinivasa Nagalla, M.D.
Clinical Genomics & Proteomics Program, Center for Biomarker Discovery, Department of Pediatrics, OHSU
30 Data entry
- Web-based short MIAME (Minimum Information About a Microarray Experiment) sheet for experimental details <http://www.mged.org/Annotations-wg/>
- Output from scanners (raw data files)
- Array elements
33 Data curation
- Conversion of experimental details into respective data tables
- Gene ID annotation
- Quality assessment (blanks and standards)
- - data from different platforms are compared using Arabidopsis spike-in RNAs.
- Conversion of raw numbers into flat files
34 Quality assessment tools
36 Data analysis (Biostat)
- Standard regression models
- Reference/experimental
- Raw and processed numbers in data sheets
- Global comparisons (CRM specific and all TRC members)
37 Data sharing (web query)
- Login access to CRM
- Data analysis tools (a single visualization tool for clustering, etc.)
- CRM can download and analyze using their own tools
- Web based NIEHS progress reports
39 Sources of Variation/biological
- Chemical purity and vehicle controls.
- Treatment schedule: issues of dose and time
- Model considerations: cell culture versus in vivo whole animal studies
- Animal husbandry: source of animal, strain background, housing, bedding, food, water, treatment, etc.
- Single animal versus pooled tissue?
40 Application of Genomics/Proteomics to Mechanism-based Risk Assessment
ILSI Health and Environmental Sciences Institute
- Issues
- How can genomics and proteomics be applied to safety and risk assessment?
- Of what value will these emerging technologies be in providing a better understanding of mechanisms of toxicity?
- What information is necessary for evaluation of methodologies and interpretation of data generated by various techniques?
41 Current HESI Genomics Subcommittee Participants
- Industry (pharmaceutical, agrichemical, chemical, and consumer products)
- 31 corporate members
- Government laboratories: NIEHS, U.S. FDA and EPA, EU CPMP, Japan NIHS
- Academic advisors
42 Some Early Lessons from ILSI HESI
- Differences due to site of in-life study: interpretation of this difference was aided by analysis of clinical chemistry, pathology, etc.
- Single-dose intraperitoneal injections may lead to high animal variation due to missed dosing; analysis of individual animals is preferred to pooled animals.
- Low dose, early time points require very stringent statistical analysis to determine significant gene changes
- High dose, late time point profiling is confounded by secondary effects (toxicity).
- Lack of RNA exchange between groups: difficult to form a basis for comparison or to understand whether differences are biological or technical in origin.
- Comparison of gene expression platforms is complicated by non-uniform gene sets/annotation; where comparison was possible, agreement between platforms seems to be greater than 75%.
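The slides do not define how agreement between platforms was scored. One simple possibility, sketched here purely as an assumption, is the fraction of genes present on both platforms whose log ratios point in the same direction. The function name, gene identifiers, and values are hypothetical.

```python
def platform_agreement(platform_a, platform_b):
    """Directional agreement over the genes the two platforms share.

    platform_a, platform_b: dicts mapping gene identifier -> log2 ratio.
    Returns (fraction concordant, number of shared genes), or None if
    the platforms have no genes in common.
    """
    common = platform_a.keys() & platform_b.keys()
    if not common:
        return None
    concordant = sum(
        1 for gene in common if (platform_a[gene] > 0) == (platform_b[gene] > 0)
    )
    return concordant / len(common), len(common)

# Hypothetical log2 ratios; only g1-g4 are present on both platforms.
cdna_array = {"g1": 1.2, "g2": -0.8, "g3": 0.5, "g4": 2.0, "g5": -1.1}
oligo_array = {"g1": 0.9, "g2": -1.0, "g3": -0.3, "g4": 1.5, "g6": 0.7}
fraction, n_shared = platform_agreement(cdna_array, oligo_array)
print(fraction, n_shared)
```

Reporting the number of shared genes alongside the fraction matters because non-uniform gene sets and annotation can leave only a small overlap to compare.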
43 Number of Genes Induced or Repressed in Individual Animals
The number of genes changed in the pooled samples
and in animals 76, 77, 78 was similar. Many
fewer genes were altered in animals 79 and 80.
44 "... access to factual knowledge of all kinds is rising ... The answer is clear: synthesis. We are drowning in information, while starving for wisdom. The world henceforth will be run by synthesizers, people able to put together the right information at the right time, think critically about it, and make important choices wisely." (p. 269)
45 Challenges for implementation
- Will standardization be achievable with current platforms?
- What are the sources of technical and biological variation?
- What does the data mean, how to handle the huge amounts of data? Microarray data must be integrated with current databases.
- What is the best way to integrate these data? Need for phenotypic anchoring.
- Will current platforms be amenable to large scale efforts necessary for population studies?
- What are the intra- and inter-individual differences?
- Will peripheral lymphocytes be a good surrogate tissue?
46 Promise of gene expression profiling
- Toxicology assays: what genes/pathways are turned on/off following exposure to a toxic agent? Do agents have unique signatures of exposure or toxicity?
- Basic research: which genes/pathways are activated/suppressed during cell injury/recovery?
- Human studies
- - gene expression changes associated with acute and chronic exposure and/or polymorphisms.
- - gene expression changes associated with exposure leading to dysfunction and disease.
- Major tool in medicine, from diagnosis/prognosis to drug sensitivities and effectiveness.
- Personalized risk assessment.
47 Acknowledgments
Sam Wilson, Deputy Director, NIEHS
Ray Tennant, Director, NCT
http://www.niehs.nih.gov/nct/home.htm
NIEHS Microarray Center: Richard S. Paules, Hisham Hamadeh, Stella Sieber, Karla Martin, Rick Fanin, Jeff Tucker, Pierre Bushel, Lee Bennett
http://www.niehs.nih.gov/dert/programs/toxgenom.htm
Toxicogenomics Research Consortium: Bill Suk, Brenda Weis
http://medir.ohsu.edu/geneview/
Gene/Protein Expression Center: Nigel Walker, Alex Merrick, Ken Tomer, Christoph Borchers, Julie Foley
http://dir.niehs.nih.gov/microarray/
48 Discussion Points
- Who plays what roles in the standardization process? Academia, industry, government agencies
- How can we accelerate the process? Develop standards and practices, data sharing capabilities.
- What areas/applications may prove the most useful and commercially viable? Diagnosis/prognosis, adverse Rx screen, occupational exposure, exposure assessment, personalized risk assessment
- What downsides, unintended consequences may be lurking in the future? Squelch development, litigation, genetic profiling
- Who else needs to be brought into the discussion? Bioethicists, science writers/reporters, medical profession, regulators, public