Title: Epigenetic Analysis
1. Epigenetic Analysis
- BIOS 691-803
- Statistics for Systems Biology
- Spring 2008
2. Kinds of Questions
- Where are the epigenetic modifications?
- How do they co-vary?
- How do epigenetic changes affect expression of
genes?
3. Covariation of Epigenetic Measures
- Motivating questions
- How are epigenetic modifications related?
- What are the major determinants of epigenetic
state?
- Statistical techniques
- Covariance calculation
- Principal component analysis
- Linear models
4. Location and Covariance
- Question: do epigenetic modifiers act on specific
targets, or do they act on whole regions of DNA?
- Direct experimental evidence is contradictory
- Statistics may help
- Covariation patterns may be evidence
5. CalcA in NCI-60
- Calcitonin A gene
- Two CpG clusters plus 3 odd CpGs
- High correlation within clusters
6. CDH1 in NCI-60
7. Covariation in Methylation of 7 Genes
- Individual genes have multiple CpG sites
- Most variation is in overall methylation
Figure: Correlation map of 108 CpG sites in 6 genes
across 5 ECOG pilot samples (red = 1, white = 0,
blue < 0)
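A correlation map of this kind can be computed from a samples-by-sites methylation matrix. The following is a minimal sketch, not the analysis behind the figure: the data values are invented, and the helper names `pearson` and `correlation_map` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_map(samples):
    """samples: one row per sample, one column per CpG site."""
    sites = list(zip(*samples))  # transpose: one tuple per site
    return [[pearson(a, b) for b in sites] for a in sites]

# Invented toy data: 5 samples x 4 CpG sites.
# Sites 0-1 and sites 2-3 are built to behave like two clusters.
meth = [
    [0.10, 0.15, 0.80, 0.75],
    [0.30, 0.35, 0.60, 0.65],
    [0.50, 0.45, 0.40, 0.35],
    [0.70, 0.75, 0.20, 0.25],
    [0.90, 0.85, 0.10, 0.05],
]
corr = correlation_map(meth)
for row in corr:
    print(" ".join(f"{r:+.2f}" for r in row))
```

In the printed matrix, sites within a cluster show correlations near +1, which is the pattern the slides describe for CpG clusters within a gene.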
Epigenomic Analysis
8. Methylation and Expression
- Single gene (E-cadherin) results suggest overall
methylation is correlated with expression
9. Methylation and Expression
- HELP assay gives genome-wide sampling of
methylation sites at 15K genes
- If genes with S/N > 2 in both measures are selected,
then correlations with associated genes are
bimodal
10. What Causes Methylation?
- NCI-60 derived from various tissues
- Tissue: characteristic profile plus specific history
of cells
- Fit linear model to each methylation site
- 9 tissues for 60 observations
- 51 error df
- Overall, 41% of variance attributable to tissue
- What causes the remainder of methylation
differences?
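The per-site linear model with tissue as the only factor amounts to a one-way layout, so the share of variance attributable to tissue can be sketched as below. This assumes a plain one-way model with no other covariates, which the slide does not spell out; the function name `tissue_r_squared` and the toy values are invented for illustration.

```python
from collections import defaultdict

def tissue_r_squared(values, tissues):
    """Fraction of variance in `values` explained by a one-way tissue factor.

    Returns (r_squared, error_df), where error_df = observations - tissues.
    """
    n = len(values)
    grand_mean = sum(values) / n
    groups = defaultdict(list)
    for v, t in zip(values, tissues):
        groups[t].append(v)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Residual sum of squares: deviations from each tissue's own mean.
    ss_resid = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return 1.0 - ss_resid / ss_total, n - len(groups)

# Invented toy data: one methylation site measured in 6 cell lines.
tissues = ["lung", "lung", "colon", "colon", "breast", "breast"]
site = [0.2, 0.3, 0.6, 0.7, 0.4, 0.5]
r2, error_df = tissue_r_squared(site, tissues)
print(f"R^2 = {r2:.3f}, error df = {error_df}")
```

With 60 cell lines and 9 tissues the same calculation leaves 60 - 9 = 51 error degrees of freedom, matching the slide. Averaging the per-site fractions is one way to arrive at an overall figure, though the slide does not say how its overall value was pooled.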
11. PCA for Cell-specific Factors
- Residual variance has one strong PC
- Remainder are noise
- 1st PC is almost constant
- Reflects overall level of methylation
- Is this an artifact or is it real?
- Significantly correlated with expression of DNMT1
and DNMT3A
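To make the "one strong, almost constant PC" idea concrete, here is a small sketch that extracts the leading principal component of a residual matrix by power iteration. It is illustrative only: the residuals are invented, `first_pc` is a hypothetical helper, and in practice a library SVD routine would be the usual choice.

```python
import math

def first_pc(rows, iters=200):
    """Leading principal component of `rows` (cell lines x methylation sites).

    Returns (loadings, fraction_of_total_variance), using power iteration
    on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # start from a constant unit vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                 for i in range(p))
    total = sum(cov[i][i] for i in range(p))
    return v, eigval / total

# Invented residuals: each cell line is shifted up or down at all sites.
resid = [
    [0.30, 0.28, 0.32],
    [-0.20, -0.22, -0.18],
    [0.10, 0.12, 0.09],
    [-0.25, -0.21, -0.24],
    [0.05, 0.03, 0.01],
]
loadings, frac = first_pc(resid)
print([round(l, 3) for l in loadings], round(frac, 3))
```

Nearly equal loadings across sites are what "almost constant" means here: the component moves every site in the same direction, so a cell line's score on it reflects that line's overall methylation level.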
12. Relations Between Epigenetic Measures - III
13. Issue: Cancer Stem Cells?
- Hypothesis: cancers arise from stem cells rather
than differentiated epithelial cells
- How would you tell the difference between
partially differentiated stem cells and
de-differentiated epithelial cells?
- Proposal: compare characteristic epigenetic
modifications of stem cells with cancers
- Epigenetic modifications are distinct
- PRC2 (stem cells) vs. methylation (cancer)
14. Statistical Methodology
- Test of association: 2 x 2 table
- Fisher exact test: p ~ 10^-5
15. Statistical Methodology
- Test of association: 2 x 2 table
- Fisher exact test: p ~ 10^-5
- Alternatives
- t-test (predictor: PRC2)
- Linear model (predictor: methylation T N)
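For readers who want to see what the 2 x 2 test of association involves, the sketch below computes a two-sided Fisher exact p-value directly from the hypergeometric distribution. The counts are invented and are not the PRC2/methylation counts behind the slide; the row and column meanings in the comments are assumptions, and `fisher_exact_two_sided` is a hypothetical helper name.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over every table that is no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Invented counts. Assumed layout:
#                 methylated   not methylated
# PRC2 target         8              2
# not a target        1              5
p = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p:.4f}")
```

The small relative tolerance in the comparison guards against floating-point ties between equally likely tables.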
16. PRC2-Methylation Association
17. Are CIMPs Stem Cell Clones?
- Distinctive PRC2 sites appear preferentially
methylated in CIMP tumors
18. Correlations between epigenetic and expression
measures I
- Copy Number and Expression
19. Copy Number and Expression
- Large sections of DNA containing many genes are
often copied or deleted
- We think most control elements are copied or
deleted also
- If there are more (or fewer) copies of a gene, then,
ceteris paribus, there should be more (fewer) copies of
RNA
20. Integrative Studies of CGH and Gene Expression
- Expect to see strong correlation between copy
number and expression in data
- Previous studies report weak effects
- Average correlations range from 0.04 to 0.27
- NCI-60 study: average correlation 0.16
21. Why Not?
- H1: there really isn't much effect (biology)
- Somehow the cells are compensating
- In any case there shouldn't be any effect on
non-expressed genes
- H2: we may not be able to measure the effect that
is there (technical error)
- Probes may be insensitive/cross-hybridizing
- Signal/noise too low even when probes are
sensitive
22. Eliminating Uninformative Genes
- Genes which are silenced will not show effect of
copy number variation
- Mean signal is a rough proxy
- Keep only genes with mean signal above 6.3
- Only genes with significant copy number variation
(above measurement noise) will show effect
- Select genes with SD of copy number > 0.5
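The two filters can be sketched as a single selection step. This is a hedged illustration: it assumes that silenced genes are the ones with low mean expression signal, so it keeps genes whose mean signal exceeds the cutoff, and it uses the sample standard deviation for copy number. The gene names and values are invented, and the units of the measurements are not stated in the slides.

```python
from statistics import mean, stdev

MEAN_SIGNAL_CUTOFF = 6.3  # cutoff value quoted on the slide
COPY_SD_CUTOFF = 0.5      # cutoff value quoted on the slide

def informative_genes(genes):
    """genes: {name: (expression_values, copy_number_values)}.

    Keeps genes that look expressed (assumption: mean signal above the
    cutoff) and whose copy number varies by more than the SD cutoff.
    """
    return [
        name
        for name, (expr, copy) in genes.items()
        if mean(expr) > MEAN_SIGNAL_CUTOFF and stdev(copy) > COPY_SD_CUTOFF
    ]

# Invented toy data for three hypothetical genes across four cell lines.
genes = {
    "geneA": ([8.1, 7.9, 8.4, 8.0], [2.0, 3.2, 1.1, 2.5]),  # passes both
    "geneB": ([5.0, 5.2, 4.9, 5.1], [2.0, 3.2, 1.1, 2.5]),  # low signal
    "geneC": ([8.1, 7.9, 8.4, 8.0], [2.0, 2.1, 1.9, 2.0]),  # flat copy number
}
kept = informative_genes(genes)
print(kept)
```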
23. Correlations of Selected Measures
Figure legend: black = all correlations; red = reliably
measured correlations
24. Estimating True Correlations
- If measurement noise of SD 0.3 degrades
expression measures, then correlations of
measures will mostly be closer to 0 than true
correlations of variables
- Given a correlation and measured standard
deviations, what are most likely true standard
deviations and true correlation?
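The effect of noise on a correlation can be written down under one common assumption that the slides do not state explicitly: the noise is additive and independent of the true values and of the noise in the other measure. With true standard deviations $\sigma_x, \sigma_y$ and noise standard deviations $\varepsilon_x, \varepsilon_y$ (symbols chosen here for illustration), the covariance is unchanged by the noise while each variance grows, so:

```latex
\rho_{\text{measured}}
  = \rho_{\text{true}}
    \cdot \frac{\sigma_x}{\sqrt{\sigma_x^2 + \varepsilon_x^2}}
    \cdot \frac{\sigma_y}{\sqrt{\sigma_y^2 + \varepsilon_y^2}}
```

Each factor is at most 1, so the measured correlation is pulled toward 0, and the pull is strongest for genes whose true variation is small compared with the noise.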
25. MLE of Noisy Correlations
- Noise can be estimated from replicates
- If N large can estimate
- SD of originals can be estimated by ML
- Given s and e, the MLE of correlation can be
inferred
- For NCI-60, median MLE correlation is 0.65
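The slides do not give the likelihood used for the MLE, so the sketch below substitutes a simpler method-of-moments correction for attenuation, assuming independent additive noise with known SD in each measure. It is not the method behind the 0.65 figure. The function name `disattenuate` is hypothetical, and every input number in the example except the noise SD of 0.3 is invented.

```python
import math

def disattenuate(r_measured, sd_x, sd_y, noise_x, noise_y):
    """Moment-based estimate of the noise-free correlation.

    sd_x, sd_y:       measured SDs (true variation plus noise)
    noise_x, noise_y: noise SDs, e.g. estimated from replicates
    """
    true_var_x = sd_x ** 2 - noise_x ** 2
    true_var_y = sd_y ** 2 - noise_y ** 2
    if true_var_x <= 0 or true_var_y <= 0:
        raise ValueError("measured SD must exceed the noise SD")
    r = r_measured * sd_x * sd_y / math.sqrt(true_var_x * true_var_y)
    # Sampling error can push the estimate outside [-1, 1]; clip it.
    return max(-1.0, min(1.0, r))

# Invented example: expression with measured SD 0.35 and noise SD 0.3,
# copy number with measured SD 0.6 and noise SD 0.2.
r_true = disattenuate(0.16, sd_x=0.35, sd_y=0.6, noise_x=0.3, noise_y=0.2)
print(round(r_true, 3))
```

The correction grows quickly as the measured SD approaches the noise SD, which is one reason to restrict attention to reliably measured genes before interpreting corrected values.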
26. Correlations between epigenetic and expression
measures II
27. Do Epigenetic Marks Regulate Transcription?
- Several studies find only weak evidence by
correlation analysis
- Same technical issue: S/N ratio
- Questions
- Does methylation shut down most genes?
- Which histone marks indicate active
transcription?
28. Methylation and Expression
- HELP assay gives genome-wide sampling of
methylation sites at 15K genes
- Select genes with S/N > 2 in both measures
- Correlations with gene expression values are
bimodal
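The slides do not define S/N, so the following sketch adopts one plausible definition as an assumption: signal is the SD of per-sample means across samples, and noise is the pooled within-sample SD from replicates. The helper name `signal_to_noise` and the data are invented.

```python
from statistics import mean, stdev

def signal_to_noise(replicates_by_sample):
    """replicates_by_sample: one list of replicate values per sample.

    Assumed definition: SD of the sample means divided by the pooled
    within-sample (replicate) SD.
    """
    sample_means = [mean(r) for r in replicates_by_sample]
    signal = stdev(sample_means)
    ss_within = sum(
        (v - mean(r)) ** 2 for r in replicates_by_sample for v in r
    )
    dof = sum(len(r) - 1 for r in replicates_by_sample)
    noise = (ss_within / dof) ** 0.5
    return signal / noise

# Invented duplicate measurements on three samples for two genes.
clear_gene = [[1.0, 1.2], [3.0, 3.2], [5.0, 4.8]]  # samples differ clearly
noisy_gene = [[1.0, 2.0], [1.4, 2.2], [1.2, 1.8]]  # replicates disagree
for name, data in [("clear", clear_gene), ("noisy", noisy_gene)]:
    sn = signal_to_noise(data)
    print(name, round(sn, 2), "keep" if sn > 2 else "drop")
```

A gene would enter the correlation analysis only if both its methylation and its expression measurements pass the threshold.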
29. Interpretation of Methylation-Expression Correlations
- MLE of negative mode: -0.8
- 2/3 of genes under that hump
- Unclear whether positive hump is real or an
artifact of small sample size
- Possible explanations
- True induction by methylation
- Methylation of insulator
- Irrelevant CpG site
30. Acetylation and Expression
- Histones often acetylated during expression
- Histone 3 lysine 9 (H3K9) acetylation measured
- Measures corrupted by noise
- Blue: S/N > 2.5
- Red: S/N > 2
- Black: S/N > 1.5
31. Biological Prediction
- H3K9 acetylation tracks gene expression
- Is this real?
- Experimental test: find genes with high
acetylation variance and little expression
variance by microarray
- Results (7 genes)
- Confirm hypothesis
- Implies
- Expression arrays are not sensitive