Title: Understanding Gene Regulation: From Networks to Mechanisms
1Understanding Gene Regulation: From Networks to Mechanisms
- Daphne Koller
- Stanford University
2Gene Regulatory Networks
Controlled by diverse mechanisms
Modified by endogenous and exogenous perturbations
http://en.wikipedia.org/wiki/Gene_regulatory_network
3Goals
- Infer regulatory network and mechanisms that control gene expression
- Identify effect of perturbations on network
- Understand effect of gene regulation on phenotype
4Outline
- Regulatory networks for gene expression
- Individual genetic variation and gene regulation
- Cell differentiation and gene regulation
- Expression changes underlying phenotype
5Regulatory Network I
- mRNA level of regulator can indicate its activity level
- Target expression is predicted by expression of its regulators
- Use expression of regulatory genes as regulators: transcription factors, signal transduction proteins, mRNA processing factors, ...
Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006
6Regulatory Network II
- Co-regulated genes have similar regulation program
- Exploit modularity and predict expression of entire module
- Allows uncovering complex regulatory programs
[Figure: a gene module and its (unknown) regulatory program]
Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006
7Module Networks
- Learning quickly runs out of statistical power
- Poor regulator selection lower in the tree
- Many correct regulators not selected
- Arbitrary choice among correlated regulators
- Combinatorial search
- Multiple local optima
Segal et al., Nature Genetics 2003
8Regulation as Linear Regression
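- $\min_w \left(\sum_i w_i x_i - E_{\text{Targets}}\right)^2$
- But we often have hundreds or thousands of regulators
- and linear regression gives them all nonzero weight!
[Figure: regulators $x_1, \dots, x_N$ with parameters $w_1, \dots, w_N$ predicting $E_{\text{Targets}}$]
$E_{\text{Targets}} = w_1 x_1 + \dots + w_N x_N + \epsilon$
Problem: This objective learns too many regulators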
9Lasso (L1) Regression
- $\min_w \left(w_1 x_1 + \dots + w_N x_N - E_{\text{Targets}}\right)^2 + C \sum_i |w_i|$
- Induces sparsity in the solution $w$ (many $w_i$'s set to zero)
- Provably selects right features when many features are irrelevant
- Convex optimization problem
- Unique global optimum
- Efficient optimization
- But, arbitrary choice among correlated regulators
[Figure: regulators $x_1, \dots, x_N$ with parameters $w_1, \dots, w_N$ predicting $E_{\text{Targets}}$; L1 versus L2 penalty]
Tibshirani, 1996
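A minimal sketch of how the L1 penalty produces sparsity, in pure Python. It minimizes the slide's objective (squared error summed over samples, plus C times the sum of absolute weights) by coordinate descent with soft-thresholding. The solver choice, the function names, the toy data, and the value C = 60 are all assumptions made for illustration, not the method used in the cited papers.

```python
import random

def soft_threshold(a, t):
    """Shrink a toward zero by t; anything within [-t, t] becomes exactly 0."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0

def lasso_cd(X, y, C, n_sweeps=200):
    """Coordinate descent for: sum_n (sum_i w_i x_ni - y_n)^2 + C * sum_i |w_i|."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; equals y while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero regulator carries no information
            # Inner product of regulator j with the residual that ignores regulator j
            rho = sum(X[i][j] * (resid[i] + w[j] * X[i][j]) for i in range(n))
            new_wj = soft_threshold(rho, C / 2.0) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_wj
    return w

# Toy data (hypothetical): 60 samples, 20 candidate regulators,
# but the target depends only on regulators 0 and 1.
random.seed(0)
n_samples, n_regulators = 60, 20
X = [[random.gauss(0, 1) for _ in range(n_regulators)] for _ in range(n_samples)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.1) for row in X]

w_ols = lasso_cd(X, y, C=0.0)     # no penalty: plain least squares
w_lasso = lasso_cd(X, y, C=60.0)  # L1 penalty switches most weights off

print("nonzero weights, least squares:", sum(w != 0.0 for w in w_ols))
print("nonzero weights, lasso:        ", sum(w != 0.0 for w in w_lasso))
print("regulators kept by lasso:      ", [j for j, w in enumerate(w_lasso) if w != 0.0])
```

With C = 0 the same routine reduces to ordinary least squares, which is why every regulator ends up with some weight; the soft-threshold step is what sets weights exactly to zero once C is positive. Because the objective is not divided by the number of samples, a useful C grows with the sample count.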
10Elastic Net Regression
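- $\min_w \left(w_1 x_1 + \dots + w_N x_N - E_{\text{Targets}}\right)^2 + C \sum_i |w_i| + D \sum_i w_i^2$
- Induces sparsity
- But avoids arbitrary choices among relevant features
- Convex optimization problem
- Unique global optimum
- Efficient optimization algorithms
[Figure: regulators $x_1, \dots, x_N$ with parameters $w_1, \dots, w_N$ predicting $E_{\text{Targets}}$; L1 versus L2 penalty]
Zou & Hastie, 2005
Lee et al., PLOS Genetics 2009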
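A minimal sketch of the "arbitrary choice among correlated regulators" issue, using an extreme case: two regulators with identical profiles. The coordinate-descent solver, the function names, the toy data, and the values C = 10 and D = 30 are assumptions chosen for illustration. With D = 0 (lasso) whichever regulator is visited first absorbs essentially all the weight; with D > 0 (elastic net) the weight is shared.

```python
import random

def soft_threshold(a, t):
    """Shrink a toward zero by t; anything within [-t, t] becomes exactly 0."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0

def elastic_net_cd(X, y, C, D, n_sweeps=200):
    """Coordinate descent for:
    sum_n (sum_i w_i x_ni - y_n)^2 + C * sum_i |w_i| + D * sum_i w_i^2."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; equals y while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            denom = col_sq[j] + D  # the L2 term adds D to the curvature
            if denom == 0.0:
                continue
            rho = sum(X[i][j] * (resid[i] + w[j] * X[i][j]) for i in range(n))
            new_wj = soft_threshold(rho, C / 2.0) / denom
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_wj
    return w

# Toy data (hypothetical): two regulators with identical expression profiles.
random.seed(1)
shared = [random.gauss(0, 1) for _ in range(60)]
X = [[v, v] for v in shared]
y = [2.0 * v + random.gauss(0, 0.1) for v in shared]

w_lasso = elastic_net_cd(X, y, C=10.0, D=0.0)  # L1 only
w_enet = elastic_net_cd(X, y, C=10.0, D=30.0)  # L1 + L2

print("lasso weights:      ", w_lasso)
print("elastic net weights:", w_enet)
```

In the lasso run, swapping the column order would swap which regulator is kept, which is the arbitrariness the slide refers to. The quadratic term makes the objective strictly convex, so the two identical regulators must receive equal weight.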
11Learning Regulatory Network
- Cluster genes into modules
- Learn a regulatory program for each module
- This is a Bayesian network
- But multiple genes share same program
- Dependency model is linear regression
[Figure: example regulatory network over yeast genes ECM18, ASG7, MEC3, GPA1, UTH1, MFA1, TEC1, HAP1, PHO3, PHO5, PHO84, SGS1, RIM15, PHM6, PHO2, PHO4, SEC59, SPL2, SAS5, GIT1, VTC3]
Lee et al., PLoS Genet 2009
12Outline
- Regulatory networks for gene expression
- Individual genetic variation and gene regulation
- Effect of genotype on expression
- Regulatory potential
- Cell differentiation and gene regulation
- Expression changes underlying phenotype
13Genotype → phenotype
Different sequences
ACTCGGTTGGCCTAAATTCGGCCCGG
ACCCGGTAGGCCTTAATTCGGCCCGG
ACTCGGTAGGCCTATATTCGGCCGGG
Different phenotypes
14Genotype → Regulation
Different sequences
ACTCGGTTGGCCTAAATTCGGCCCGG
ACCCGGTAGGCCTTAATTCGGCCCGG
ACTCGGTAGGCCTATATTCGGCCGGG
- Goals
- Infer regulatory network that controls gene expression
- Identify mechanisms by which genetic variation affects gene expression
15eQTL Data [Brem et al. (2002) Science]
- Parental strains: BY and RM
- Genotype data: 112 individuals × 3000 markers (binary marker values, e.g. 0101100100011)
- Expression data: 112 individuals × 6000 genes
16Traditional Approach: Single Marker
- Expression quantitative trait loci (eQTL) mapping
- For each gene, find the marker that is most predictive of its expression level [Yvert G et al. (2003) Nat Genet.]
[Figure: genotype data (markers) matched against expression data (genes)]
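A minimal sketch of single-marker mapping on toy data. "Most predictive" is scored here by absolute Pearson correlation between a marker's 0/1 genotype and the gene's expression across individuals; that scoring rule is an assumption for illustration (published eQTL analyses use proper linkage statistics), and the marker names, gene names, and values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def best_marker_per_gene(genotypes, expression):
    """For each gene, return (marker, score) for the marker with the highest |correlation|."""
    result = {}
    for gene, levels in expression.items():
        scored = [(abs(pearson(alleles, levels)), marker)
                  for marker, alleles in genotypes.items()]
        score, marker = max(scored)
        result[gene] = (marker, score)
    return result

# Toy data (hypothetical): 8 individuals; 0 = BY allele, 1 = RM allele.
genotypes = {
    "m1": [0, 1, 0, 1, 0, 1, 0, 1],
    "m2": [0, 0, 0, 0, 1, 1, 1, 1],
    "m3": [0, 0, 1, 1, 0, 0, 1, 1],
}
expression = {
    "geneA": [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2],  # tracks m2
    "geneB": [0.5, 2.0, 0.4, 2.1, 0.6, 1.9, 0.5, 2.2],  # tracks m1
}

linked = best_marker_per_gene(genotypes, expression)
for gene, (marker, score) in linked.items():
    print(gene, "->", marker, round(score, 3))
```

Each gene is handled independently and gets exactly one marker, so this approach cannot share information across co-regulated genes or combine several regulators.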
17LirNet Regulatory network
- E-regulators: Activity (expression) of regulatory genes
- G-regulators: Genotype of genes
- Measured as values of chromosomal markers
[Figure: regulatory network with a marker and the genes ECM18, ASG7, MEC3, GPA1, UTH1, MFA1, TEC1, HAP1, PHO3, PHO5, PHO84, SGS1, RIM15, PHM6, PHO2, PHO4, SEC59, SPL2, SAS5, GIT1, VTC3]
Lee et al., PNAS 2006; Lee et al., PLoS Genetics 2009
18The Telomere Module
- 40/42 genes in telomeres
- Enriched for telomere maintenance ($p < 10^{-11}$) and helicase activity ($p < 10^{-18}$)
- Includes Rif2
- controls telomere length
- establishes telomeric silencing
- 6 coding and 8 promoter SNPs
- Binds to Rap1p C-terminus
- Enrichment for Rap1p targets (29/42; $p < 10^{-15}$)
Lee et al., PNAS 2006
19Some Chromatin Modules
- Locus containing Sir1
- 4 coding SNPs
- 4/5 consecutive genes; known Sir1 targets
- Locus containing uncharacterized Sir1 homologue
- 87(!) coding SNPs
- 5/7 consecutive genes
Lee et al., PNAS 2006
20Chromatin as Mechanism
Mechanism I
- 23 modules (out of 165) with chromosomal features
- 16 have chromatin regulators ($p < 10^{-7}$)
- Chromatin modification explains significant part of variation in gene expression between strains
Evolutionary strategy to make coordinated changes in gene expression by modifying small number of hubs
Lee et al., PNAS 2006
21The Puf3 Module
[Figure: regulator weights for HAP4, TOP2, KEM1, GCN1, GCN20, DHH1; PUF3 expression and genotype across 112 segregants (BY vs. RM)]
- PUF family
- Sequence specific mRNA binding proteins (3' UTR)
- Regulate degradation of mRNA and/or repress translation
- 147/153 genes ($P$ on the order of $10^{-130}$) are pulldown targets of mRNA binding protein Puf3
Lee et al., PLOS Genetics 2009
22P-Bodies
mRNAs stored in P-bodies are translationally repressed
- Dhh1 regulates mRNA decapping in P-bodies
But what regulates sequence-specific localization of mRNAs to P-bodies?
Sheth and Parker (2003) Science 300:805
Beliakova-Bethell et al. (2006) RNA 12:94
23Microscopy experiment
- Fluorescence microscopy: Puf3 localizes to P-bodies
- Supports hypothesis of Puf3 involvement in regulating mRNA degradation by P-bodies
[Figure: fluorescence microscopy images of Dhh1 and Puf3]
Joint work with David Drubin and Pam Silver
Lee et al., PLOS Genetics 2009
24What Regulates the P-bodies?
- A marker that covers a large region in Chr 14.
- Region contains 30 genes and 318 SNPs.
- Experiments for all 30 genes not feasible!
[Figure: marker ChrXIV:449639 (BY vs. RM) and the genes DHH1, BLM3, GCN20, KEM1, GCN1]
Lee et al., PLOS Genetics 2009
25Outline
- Regulatory networks for gene expression
- Individual genetic variation and gene regulation
- Effect of genotype on expression
- Regulatory potential
- Cell differentiation and gene regulation
- Expression changes underlying phenotype
26Motivation
- Not all SNPs are equally likely to be causal.
Regulatory features F:
1. Gene region?
2. Protein coding region?
3. Nonsynonymous?
4. Create a stop codon?
5. Strong conservation?
SNP 1: Conserved residue in a gene involved in RNA degradation
SNP 2: In nonconserved intergenic region
- Idea: Prioritize SNPs that have good regulatory features
- But how do we weight different features?
Lee et al., PLOS Genetics 2009
27Bayesian L1-Regularization
- higher prior variance
- weight can more easily deviate from 0
- regulator more likely to be selected
$P(w) \sim \mathrm{Laplacian}(0, C)$
$P(y_m \mid x; w) = \mathcal{N}\left(\sum_k w_{mk} x_{mk}, \epsilon^2\right)$
Lee et al., PLOS Genetics 2009
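A short derivation sketch of why a Laplacian prior corresponds to L1 regularization. Assumptions not stated on the slide: the Laplacian density is written with a scale parameter $b_{mk}$ (larger $b_{mk}$ means higher prior variance; how this relates to the slide's $C$ depends on a parameterization the slide does not spell out), samples are indexed by a superscript $(n)$, and the noise is independent across samples.

```latex
\begin{align*}
P(w_{mk}) &= \frac{1}{2 b_{mk}} \exp\!\left(-\frac{|w_{mk}|}{b_{mk}}\right),
\qquad
P(y_m \mid x; w) = \mathcal{N}\!\Big(\sum_k w_{mk} x_{mk},\ \epsilon^2\Big) \\[4pt]
-\log P(w \mid y, x)
  &= \frac{1}{2\epsilon^2} \sum_n \Big(y_m^{(n)} - \sum_k w_{mk} x_{mk}^{(n)}\Big)^2
   + \sum_k \frac{|w_{mk}|}{b_{mk}} + \text{const} \\[4pt]
\hat{w}_{\text{MAP}}
  &= \arg\min_w \sum_n \Big(y_m^{(n)} - \sum_k w_{mk} x_{mk}^{(n)}\Big)^2
   + \sum_k \frac{2\epsilon^2}{b_{mk}}\, |w_{mk}|
\end{align*}
```

Under this parameterization the penalty on regulator $k$ is $2\epsilon^2 / b_{mk}$, so a wider prior means a weaker penalty, which matches the slide's chain: higher prior variance, weight deviates from 0 more easily, regulator more likely to be selected.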
28Metaprior Model (Hierarchical Bayes)
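[Figure: plate diagram. Each module $m = 1, \dots, M$ has expression $y_m$ ($E_{\text{module } m}$) predicted from potential regulators $x_1, \dots, x_N$ with weights $w_{m1}, \dots, w_{mN}$]
- Regulatory features of regulator $k$ for module $m$: Inside a gene? Protein coding region? Strong conservation? TF binds to module genes? ...
- Regulatory potential = $\beta_1 \times$ Inside a gene? $+\ \beta_2 \times$ Protein coding region? $+\ \beta_3 \times$ Conserved? $+ \dots$
- $w_{mk} \sim \mathrm{Laplacian}(0, C_{mk})$
- $P(y_m \mid x; w) = \mathcal{N}\left(\sum_k w_{mk} x_{mk}, \epsilon^2\right)$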
Lee et al., PLOS Genetics 2009
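A minimal sketch of the regulatory potential as a weighted sum of yes/no features, applied to the two example SNPs from the Motivation slide. The feature names and the β values are hypothetical placeholders (in the model they are learned from data), and the mapping from regulatory potential to the prior parameter $C_{mk}$ is not given on the slide, so it is left out.

```python
def regulatory_potential(features, beta):
    """Weighted sum of yes/no regulatory features (True counts as 1, False as 0)."""
    return sum(beta[name] * (1.0 if present else 0.0)
               for name, present in features.items())

# Hypothetical feature weights; in the model these are learned, not hand-set.
beta = {"inside_gene": 0.2, "protein_coding": 0.3, "conserved": 0.5}

# SNP 1: conserved residue in a gene involved in RNA degradation.
snp1 = {"inside_gene": True, "protein_coding": True, "conserved": True}
# SNP 2: in a nonconserved intergenic region.
snp2 = {"inside_gene": False, "protein_coding": False, "conserved": False}

potential_1 = regulatory_potential(snp1, beta)
potential_2 = regulatory_potential(snp2, beta)
print("SNP 1 regulatory potential:", potential_1)
print("SNP 2 regulatory potential:", potential_2)
```

A SNP with a higher potential would receive a prior that lets its weight move away from zero more easily, so it is preferred when the expression data alone cannot separate nearby candidates.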
29Metaprior Method
- Empirical hierarchical Bayes
- Use point estimate of model parameters
- Learn priors from data to maximize joint posterior
1. Learn regulatory programs
Maximize $P(E, \beta, W \mid X)$, alternating between the regulatory programs $W$ and the prior parameters $\beta$
Lee et al., PLOS Genetics 2009
30Transfer Learning
- What do regulatory potentials do?
- They do not change selection of strong regulators: those where prediction of targets is clear
- They only help disambiguate between weak ones
- Strong regulators help teach us what to look for in other regulators
Transfer of knowledge between different prediction tasks
31Learned regulatory weights
[Figure: learned regulatory weights for human and for yeast, with regulatory features grouped as location, AA property change, gene function, and pairwise features]
Lee et al., PLOS Genetics 2009
32Statistical Evaluation
- PGV: Percent genetic variation explained by the predicted regulatory program for each gene
[Figure: number of genes with PGV > X (y-axis) versus PGV (%) (x-axis). Curve annotations:]
- Lirnet: 1650 genes
- Lirnet without regulatory prior: 1450 genes
- Geronemo: 850 genes
- Brem & Kruglyak: 250 genes
Lee et al., PLOS Genetics 2009; Lee et al., PNAS 2006
33Biological evaluation I
- How many predicted interactions have support in other data?
- Deletion/over-expression microarrays [Hughes et al. 2000; Chua et al. 2006]
- ChIP-chip binding experiments [Harbison et al. 2004]
- Transcription factor binding sites [MacIsaac et al. 2006]
- mRNA binding pull-down experiments [Gerber et al. 2004]
- Literature-curated signaling interactions
[Figure: supported interactions and supported modules, for Lirnet and for Lirnet without regulatory features; support may be via a cascade (Reg → TF → Module)]
Lee et al., PLOS Genetics 2009
34Biological Evaluation II
[Figure: significance of support for Lirnet, Zhu et al. (Nature Genet 2008), and random]
Lee et al., PLOS Genetics 2009
35What Regulates the P-Bodies?
- The regulatory potential over all 318 SNPs in the region
[Figure: regulatory potential (y-axis, ticks 0.4 to 0.7) across ChrXIV:415,000-495,000; MKT1 is labeled]
High-scoring regulatory features
Feature                  Weight
Conservation             0.230
Cis-regulation           0.208
Same GO proc             0.204
Non-syn                  0.158
Mass (Da)                0.066
pK1                      0.060
Polar                    0.057
pI                       0.050
Translation regulation   0.046
Lee et al., PLOS Genetics 2009
Saccharomyces Genome Database (SGD)
36Mkt1
- Mkt1 binds to mRNAs at 3' UTR
- BY has SNP at conserved residue in nuclease domain
[Figure: BY vs. RM, with mkt1Δ in RM]
Lee et al., PLOS Genetics 2009
37Predicting Causal Regulators
8 validated regulators in 7 regions
14 validated regulators in 11 regions
- Finding causal regulators for 13 chromosomal
hotspots
Region | Zhu et al., Nat Genet 2008 | Lirnet top 1 | Lirnet top 2 | Lirnet top 3
1 | None | SEC18 | RDH54 | SPT7
2 | TBS1, TOS1, ARA1, CSH1, SUP45, CNS1, AMN1 | AMN1 | CNS1 | TOS1
3 | None | TRS20 | ABD1 | PRP5
4 | LEU2, ILV6, NFS1, CIT2, MATALPHA1 | LEU2 | PGS1 | ILV6
5 | MATALPHA1 | MATALPHA1 | MATALPHA2 | RBK1
6 | URA3 | URA3 | NPP2 | PAC2
7 | GPA1 | STP2 | GPA1 | NEM1
8 | HAP1 | HAP1 | NEJ1 | GSY2
9 | YRF1-4, YRF1-5, YLR464W | SIR3 | HMG2 | ECM7
10 | None | ARG81 | TAF13 | CAC2
11 | SAL1, TOP2 | MKT1 | TOP2 | MSK1
12 | PHM7 | PHM7 | ATG19 | BRX1
13 | None | ADE2 | ORT1 | CAT5
(For Lirnet, the top 3 predictions per region are considered.)
Lee et al., PLOS Genetics 2009
38Learning Regulatory Priors
- Learns regulatory potentials that are specific to organism and even data set
- Can use any set of regulatory features
- Sequence features
- Functional features for relevant gene
- Features of regulator/target(s) pairs
- Applicable to any organism, including ones where
functional data may not be readily available
Lee et al., PLOS Genetics 2009
39Conclusion
- Framework for modeling gene regulation
- Use machine learning to identify regulatory program
- Hierarchical Bayesian techniques to capture regularities in effects of perturbations on network
- Uncovers diverse regulatory mechanisms
- Chromatin remodeling
- mRNA degradation
40Acknowledgements
- Dana Pe'er, Columbia University
- Aimee Dudley, Institute for Systems Biology
- Su-In Lee, University of Washington
- Nevan Krogan, UCSF
- Pam Silver, David Drubin, Harvard Medical School
National Science Foundation