Understanding Gene Regulation: From Networks to Mechanisms - PowerPoint PPT Presentation

Description:

Understanding Gene Regulation: From Networks to Mechanisms Daphne Koller Stanford University – PowerPoint PPT presentation

Slides: 41
Provided by: Era141
Learn more at: http://ai.stanford.edu

Transcript and Presenter's Notes



1
Understanding Gene Regulation: From Networks to Mechanisms
  • Daphne Koller
  • Stanford University

2
Gene Regulatory Networks
Controlled by diverse mechanisms
Modified by endogenous and exogenous perturbations
http://en.wikipedia.org/wiki/Gene_regulatory_network
3
Goals
  • Infer regulatory network and mechanisms that
    control gene expression
  • Identify effect of perturbations on network
  • Understand effect of gene regulation on phenotype

4
Outline
  • Regulatory networks for gene expression
  • Individual genetic variation and gene regulation
  • Cell differentiation and gene regulation
  • Expression changes underlying phenotype

5
Regulatory Network I
  • mRNA level of regulator can indicate its activity
    level
  • Target expression is predicted by expression of
    its regulators
  • Use expression of regulatory genes as regulators

Candidate regulators include transcription factors, signal transduction proteins, and mRNA processing factors
Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006
6
Regulatory Network II
  • Co-regulated genes have similar regulatory programs
  • Exploit modularity and predict expression of the entire module
  • Allows complex regulatory programs to be uncovered

[Figure: a module of co-regulated genes and its regulatory program]
Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006
7
Module Networks
  • Learning quickly runs out of statistical power
  • Poor regulator selection lower in the tree
  • Many correct regulators not selected
  • Arbitrary choice among correlated regulators
  • Combinatorial search
  • Multiple local optima

Segal et al., Nature Genetics 2003
8
Regulation as Linear Regression
  • $\min_w \left(\sum_i w_i x_i - E_{\text{Targets}}\right)^2$
  • But we often have hundreds or thousands of
    regulators
  • and linear regression gives them all nonzero
    weight!


[Figure: linear model $E_{\text{Targets}} = w_1 x_1 + \dots + w_N x_N + \epsilon$, with regulators $x_1, \dots, x_N$ and parameters $w_1, \dots, w_N$]
Problem: this objective learns too many regulators
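To make the problem concrete, here is a small sketch (not code from the talk) that fits the unpenalized objective by ordinary least squares. It sums the squared error over samples, which the slide leaves implicit. The data are hypothetical: 50 samples and 10 candidate regulators, with a target driven by regulator 0 alone. The code uses plain Python so it has no dependencies.

```python
import random


def ols(X, y):
    """Ordinary least squares: w minimizing sum_j (sum_i w_i * X[j][i] - y[j])**2.

    Solves the normal equations (X^T X) w = X^T y with Gaussian elimination
    and partial pivoting; assumes X^T X is invertible.
    """
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[k] for row in X) for k in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (A[i][n] - sum(A[i][k] * w[k] for k in range(i + 1, n))) / A[i][i]
    return w


# Hypothetical data: 50 samples, 10 candidate regulators; only regulator 0
# drives the target (true weight 2.0), plus small Gaussian noise.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(10)] for _ in range(50)]
y = [2.0 * row[0] + random.gauss(0, 0.1) for row in X]

w = ols(X, y)
print([round(v, 3) for v in w])  # w[0] is close to 2, but no weight is exactly 0
```

Every fitted weight comes out nonzero, even though nine of the ten regulators have nothing to do with the target.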
9
Lasso (L1) Regression
  • $\min_w \left(w_1 x_1 + \dots + w_N x_N - E_{\text{Targets}}\right)^2 + C \sum_i |w_i|$
  • Induces sparsity in the solution $w$ (many $w_i$ set to zero)
  • Provably selects the right features when many
    features are irrelevant
  • Convex optimization problem
  • Unique global optimum
  • Efficient optimization
  • But, arbitrary choice among correlated regulators


[Figure: L1 vs. L2 constraint regions in the parameter space of $w_1, \dots, w_N$]
Tibshirani, 1996
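The sketch below solves the L1 objective above by cyclic coordinate descent with soft-thresholding. This is one standard lasso solver; the slide does not say which optimizer was used. Squared error is summed over samples. For a single weight, minimizing $a w^2 - 2 b w + C|w|$ gives $w = S(b, C/2)/a$, where $S$ is the soft-threshold operator. Each coordinate update applies that closed form. The data are hypothetical, with one relevant regulator among ten, and C = 40 is an illustrative value.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward 0 by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso(X, y, C, n_iter=200):
    """Cyclic coordinate descent for
        sum_j (sum_i w_i X[j][i] - y[j])**2 + C * sum_i |w_i|.
    """
    n = len(X[0])
    w = [0.0] * n
    col_sq = [sum(row[i] ** 2 for row in X) for i in range(n)]
    resid = list(y)  # y - Xw, with w = 0 initially
    for _ in range(n_iter):
        for i in range(n):
            # Correlation of column i with the residual excluding w_i's own contribution.
            b = sum(row[i] * (r + row[i] * w[i]) for row, r in zip(X, resid))
            # Closed-form 1-D minimizer of col_sq[i]*w^2 - 2*b*w + C*|w|.
            new_w = soft_threshold(b, C / 2) / col_sq[i]
            delta = new_w - w[i]
            if delta != 0.0:
                for j, row in enumerate(X):
                    resid[j] -= row[i] * delta
                w[i] = new_w
    return w


random.seed(0)
X = [[random.gauss(0, 1) for _ in range(10)] for _ in range(50)]
y = [2.0 * row[0] + random.gauss(0, 0.1) for row in X]

dense = lasso(X, y, C=0.0)    # no penalty: behaves like least squares
sparse = lasso(X, y, C=40.0)  # L1 penalty zeroes out the irrelevant regulators
print([round(v, 3) for v in dense])
print([round(v, 3) for v in sparse])
```

With C = 0 the solution is dense, like ordinary least squares. With C = 40 the nine irrelevant regulators get weights of exactly 0.0. Regulator 0 keeps a weight shrunk somewhat below its true value of 2.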
10
Elastic Net Regression
  • $\min_w \left(w_1 x_1 + \dots + w_N x_N - E_{\text{Targets}}\right)^2 + C \sum_i |w_i| + D \sum_i w_i^2$
  • Induces sparsity
  • But avoids arbitrary choices among relevant
    features
  • Convex optimization problem
  • Unique global optimum
  • Efficient optimization algorithms


[Figure: L1 vs. L2 constraint regions in the parameter space of $w_1, \dots, w_N$]
Zou and Hastie, 2005
Lee et al., PLOS Genetics 2009
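The sketch below solves the elastic-net objective by cyclic coordinate descent with soft-thresholding. This solver is an assumption; the slide names none. With the extra $D \sum_i w_i^2$ term, the single-weight update becomes $w = S(b, C/2)/(a + D)$. The hypothetical data contain two perfectly correlated regulators (identical columns), which is exactly the case where the lasso's choice is arbitrary. C = 1 and D = 10 are illustrative values, not ones from the talk.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward 0 by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, C, D, n_iter=500):
    """Cyclic coordinate descent for
        sum_j (sum_i w_i X[j][i] - y[j])**2 + C * sum_i |w_i| + D * sum_i w_i**2.
    With D = 0 this reduces to the lasso.
    """
    n = len(X[0])
    w = [0.0] * n
    col_sq = [sum(row[i] ** 2 for row in X) for i in range(n)]
    resid = list(y)  # y - Xw, with w = 0 initially
    for _ in range(n_iter):
        for i in range(n):
            b = sum(row[i] * (r + row[i] * w[i]) for row, r in zip(X, resid))
            # 1-D minimizer of (col_sq[i] + D)*w^2 - 2*b*w + C*|w|.
            new_w = soft_threshold(b, C / 2) / (col_sq[i] + D)
            delta = new_w - w[i]
            if delta != 0.0:
                for j, row in enumerate(X):
                    resid[j] -= row[i] * delta
                w[i] = new_w
    return w


# Hypothetical data: two perfectly correlated regulators (identical columns).
random.seed(1)
x = [random.gauss(0, 1) for _ in range(50)]
X = [[v, v] for v in x]
y = [2.0 * v + random.gauss(0, 0.1) for v in x]

lasso_w = elastic_net(X, y, C=1.0, D=0.0)  # lasso: weight lands on the first regulator
enet_w = elastic_net(X, y, C=1.0, D=10.0)  # elastic net: weight shared equally
print([round(v, 3) for v in lasso_w])
print([round(v, 3) for v in enet_w])
```

The lasso run (D = 0) puts essentially all the weight on the first regulator and leaves the second at zero. This happens purely because of update order. The elastic-net run splits the weight equally between the two. The added quadratic term makes the objective strictly convex, so its unique minimizer is symmetric in the two identical regulators.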
11
Learning Regulatory Network
  • Cluster genes into modules
  • Learn a regulatory program for each module
  • This is a Bayesian network
  • But multiple genes share the same program
  • Dependency model is linear regression

[Figure: example module network over yeast genes (e.g., ECM18, ASG7, MEC3, GPA1, MFA1, TEC1, HAP1, PHO84, RIM15, PHO2, PHO4, SAS5, VTC3), each module's program a linear regression on its regulators]
Lee et al., PLoS Genet 2009
12
Outline
  • Regulatory networks for gene expression
  • Individual genetic variation and gene regulation
      • Effect of genotype on expression
      • Regulatory potential
  • Cell differentiation and gene regulation
  • Expression changes underlying phenotype

13
Genotype → phenotype
Different sequences
Different phenotypes
ACTCGGTTGGCCTAAATTCGGCCCGG
ACCCGGTAGGCCTTAATTCGGCCCGG
ACTCGGTAGGCCTATATTCGGCCGGG
14
Genotype → Regulation
Different sequences
ACTCGGTTGGCCTAAATTCGGCCCGG
ACCCGGTAGGCCTTAATTCGGCCCGG
ACTCGGTAGGCCTATATTCGGCCGGG
  • Goals
      • Infer regulatory network that controls gene expression
      • Identify mechanisms by which genetic variation affects gene expression

15
eQTL Data (Brem et al. (2002) Science)

[Figure: genotype data (3000 markers, alleles coded 0/1 by parental strain BY or RM) and expression data (6000 genes) for the same 112 individuals]
16
Traditional Approach: Single Marker
  • Expression quantitative trait loci (eQTL) mapping
  • For each gene, find the marker that is most
    predictive of its expression level (Yvert G et
    al. (2003) Nat Genet)

[Figure: genotype data (markers) and expression data (genes) across individuals]
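Here is a minimal sketch of single-marker mapping as described above. It assumes "most predictive" is measured by the absolute Pearson correlation between a marker's 0/1 genotype and a gene's expression across individuals. That is an assumption for illustration; eQTL studies typically use linkage or regression statistics. The function names and the toy markers and genes are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def best_marker_per_gene(genotypes, expression):
    """For each gene, the marker whose genotype is most correlated (in absolute
    value) with the gene's expression across individuals.

    genotypes:  {marker: [0/1 allele per individual]}  (e.g. 0 = BY, 1 = RM)
    expression: {gene: [expression level per individual]}
    """
    return {
        gene: max(genotypes, key=lambda m: abs(pearson(genotypes[m], expr)))
        for gene, expr in expression.items()
    }


# Hypothetical toy data: 6 individuals, 2 markers, 2 genes.
genotypes = {"m1": [0, 0, 1, 1, 0, 1], "m2": [1, 0, 1, 0, 1, 0]}
expression = {
    "geneA": [1.0, 1.1, 3.0, 3.2, 0.9, 2.9],  # tracks m1
    "geneB": [5.0, 1.0, 5.2, 0.8, 4.9, 1.1],  # tracks m2
}
print(best_marker_per_gene(genotypes, expression))  # {'geneA': 'm1', 'geneB': 'm2'}
```

Each gene gets exactly one marker, and regulators are never considered jointly. Those are the limitations the regression-based approaches in this talk address.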
17
LirNet: Regulatory Network
  • E-regulators: activity (expression) of regulatory genes
  • G-regulators: genotype of genes
      • Measured as values of chromosomal markers

[Figure: LirNet regulatory network combining a chromosomal marker (G-regulator) with expression regulators (E-regulators); genes shown include ECM18, ASG7, MEC3, GPA1, UTH1, MFA1, TEC1, HAP1, PHO3, PHO5, PHO84, SGS1, RIM15, PHM6, PHO2, PHO4, SEC59, SPL2, SAS5, GIT1, VTC3]
Lee et al., PNAS 2006; Lee et al., PLoS Genetics 2009
18
The Telomere Module
  • 40/42 genes in telomeres
  • Enriched for telomere maintenance (p < 10^-11)
    and helicase activity (p < 10^-18)
  • Includes Rif2
      • Controls telomere length
      • Establishes telomeric silencing
      • 6 coding and 8 promoter SNPs
      • Binds to Rap1p C-terminus

Enrichment for Rap1p targets (29/42, p < 10^-15)
Lee et al., PNAS 2006
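The slide does not say which test produced these p-values. A common choice for annotation enrichment is the one-sided hypergeometric test, sketched here as an assumption. For the Rap1p result you would call `enrichment_p_value(29, 42, K, N)` with the genome-wide background counts K and N, which the slide does not give. The function name is hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background; K: background genes with the annotation
    (e.g. known Rap1p targets); n: genes in the module; k: annotated genes
    observed in the module.
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy check: draw 3 of 10 genes, 5 of which are annotated.
# P(all 3 annotated) = C(5,3) / C(10,3) = 10 / 120.
print(enrichment_p_value(3, 3, 5, 10))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so only the final division is rounded.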
19
Some Chromatin Modules
  • Locus containing Sir1
      • 4 coding SNPs

4/5 consecutive genes; known Sir1 targets
  • Locus containing uncharacterized Sir1 homologue
      • 87(!) coding SNPs

5/7 consecutive genes
Lee et al., PNAS 2006
20
Chromatin as Mechanism
Mechanism I
  • 23 modules (out of 165) with chromosomal
    features
      • 16 have chromatin regulators (p < 10^-7)
  • Chromatin modification explains a significant part
    of the variation in gene expression between strains

Evolutionary strategy to make coordinated
changes in gene expression by modifying a small
number of hubs
Lee et al., PNAS 2006
21
The Puf3 Module
[Figure: learned weights for candidate regulators HAP4, TOP2, KEM1, GCN1, GCN20, DHH1]
  • PUF family
      • Sequence-specific mRNA-binding proteins (3′ UTR)
      • Regulate degradation of mRNA and/or repress
        translation

[Figure: module expression across 112 BY × RM segregants, shown with PUF3 expression and genotype]
147/153 genes (p-value 10^-130) are pull-down targets
of the mRNA-binding protein Puf3
Lee et al., PLOS Genetics 2009
22
P-Bodies
mRNAs stored in P-bodies are translationally
repressed
  • Dhh1 regulates mRNA decapping in P-bodies

But what regulates sequence-specific localization
of mRNAs to P-bodies?
Sheth and Parker (2003) Science 300:805
Beliakova-Bethell et al. (2006) RNA 12:94
23
Microscopy experiment
  • Fluorescence microscopy: Puf3 localizes to P-bodies
  • Supports the hypothesis that Puf3 is involved in
    regulating mRNA degradation by P-bodies

[Figure: fluorescence microscopy images of Dhh1 and Puf3]
Joint work with David Drubin and Pam Silver
Lee et al., PLOS Genetics 2009
24
What Regulates the P-bodies?
  • A marker that covers a large region in Chr 14.
  • Region contains 30 genes and 318 SNPs.
  • Experiments for all 30 genes not feasible!

[Figure: BY vs. RM genotype at marker ChrXIV:449639; genes in the region include DHH1, BLM3, GCN20, KEM1, GCN1]
Lee et al., PLOS Genetics 2009
25
Outline
  • Regulatory networks for gene expression
  • Individual genetic variation and gene regulation
      • Effect of genotype on expression
      • Regulatory potential
  • Cell differentiation and gene regulation
  • Expression changes underlying phenotype

26
Motivation
  • Not all SNPs are equally likely to be causal.

Regulatory features F:
  1. Gene region?
  2. Protein coding region?
  3. Nonsynonymous?
  4. Creates a stop codon?
  5. Strong conservation?

SNP 1: conserved residue in a gene involved in RNA degradation
SNP 2: in a nonconserved intergenic region
  • Idea: prioritize SNPs that have good regulatory
    features
  • But how do we weight different features?

Lee et al., PLOS Genetics 2009
27
Bayesian L1-Regularization
  • Higher prior variance
  • Weight can more easily deviate from 0
  • Regulator more likely to be selected

$P(w) \sim \text{Laplace}(0, C)$
$P(y_m \mid x, w) = \mathcal{N}\left(\sum_k w_{mk} x_{mk}, \epsilon^2\right)$
Lee et al., PLOS Genetics 2009
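The link between the Laplace prior and L1 regularization can be written out. Assume C is the Laplace scale parameter, so $P(w_{mk}) = \exp(-|w_{mk}|/C_{mk}) / (2C_{mk})$, with prior variance $2C_{mk}^2$. Let $j$ index samples; this index is our own notation. Taking the negative log of the posterior for module $m$ gives:

```latex
\begin{aligned}
-\log P(w_m \mid y_m, x)
  &= \frac{1}{2\epsilon^2} \sum_j \Big( y_{mj} - \sum_k w_{mk} x_{mkj} \Big)^2
     + \sum_k \frac{|w_{mk}|}{C_{mk}} + \text{const} \\
\hat{w}_m^{\text{MAP}}
  &= \arg\min_{w_m} \; \sum_j \Big( \sum_k w_{mk} x_{mkj} - y_{mj} \Big)^2
     + \sum_k \frac{2\epsilon^2}{C_{mk}} \, |w_{mk}|
\end{aligned}
```

The MAP estimate is therefore an L1-penalized regression whose penalty on each weight is $2\epsilon^2 / C_{mk}$. With a single shared C, this is the Lasso objective, with its penalty constant equal to $2\epsilon^2/C$. A larger prior variance means a smaller penalty, which is why that regulator is more likely to be selected.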
28
Metaprior Model (Hierarchical Bayes)
[Figure: modules 1, …, M, each with expression $E_{\text{module}\,m}$ predicted from potential regulators $x_1, \dots, x_N$ through weights $w_{m1}, \dots, w_{mN}$]
Regulatory features: Inside a gene? Protein coding region? Strong conservation? TF binds to module genes?
Regulatory potential of regulator k for module m: $\beta_1 \times$ Inside a gene? $+\ \beta_2 \times$ Protein coding region? $+\ \beta_3 \times$ Conserved?
$w_{mk} \sim \text{Laplace}(0, C_{mk})$
$P(y_m \mid x, w) = \mathcal{N}\left(\sum_k w_{mk} x_{mk}, \epsilon^2\right)$
Lee et al., PLOS Genetics 2009
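The sketch below shows one way regulatory features could set per-regulator penalties. The regulatory potential is the weighted feature sum shown on the slide. Several pieces are assumptions made for illustration: the logistic link from potential to the Laplace scale $C_{mk}$, the β values, and the feature vectors. Under a Laplace(0, $C_{mk}$) prior and Gaussian noise of variance $\epsilon^2$, the MAP objective penalizes $|w_{mk}|$ by $2\epsilon^2 / C_{mk}$. A higher potential therefore yields a lighter penalty. All function names are hypothetical.

```python
from math import exp


def regulatory_potential(features, beta):
    """Linear score beta_1*f_1 + beta_2*f_2 + ... for one candidate regulator."""
    return sum(b * f for b, f in zip(beta, features))


def prior_scale(features, beta):
    """ASSUMPTION: a logistic link maps the potential to the Laplace scale C_mk in (0, 1)."""
    return 1.0 / (1.0 + exp(-regulatory_potential(features, beta)))


def l1_penalties(candidates, beta, noise_var=1.0):
    """Per-regulator L1 penalty 2*noise_var / C_mk implied by a Laplace(0, C_mk)
    prior and a Gaussian likelihood with variance noise_var (MAP estimate)."""
    return [2.0 * noise_var / prior_scale(f, beta) for f in candidates]


# Hypothetical binary features: [inside a gene?, protein coding region?, conserved?]
beta = [1.0, 0.5, 2.0]  # hypothetical learned feature weights
candidates = [
    [1, 1, 1],  # coding SNP at a conserved position
    [0, 0, 0],  # SNP in a nonconserved intergenic region
]
print(l1_penalties(candidates, beta))  # first candidate gets the smaller penalty
```

These penalties would replace the single constant in an L1-penalized regression, so well-annotated candidates need less evidence from the data to enter a regulatory program.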
29
Metaprior Method
  • Empirical hierarchical Bayes
  • Use point estimate of model parameters
  • Learn priors from data to maximize joint posterior

1. Learn regulatory programs
Maximize $P(E, \beta, W \mid X)$
Lee et al., PLOS Genetics 2009
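One way to read "use point estimates" and "maximize the joint posterior" is coordinate ascent over two groups of unknowns: the regulatory weights $W$ and the feature weights $\beta$. The slide names only the first step, so the alternation and the iteration index $t$ below are assumptions for illustration:

```latex
\begin{aligned}
W^{(t+1)} &= \arg\max_{W}\; P\big(E, \beta^{(t)}, W \mid X\big)
  && \text{(learn regulatory programs)} \\
\beta^{(t+1)} &= \arg\max_{\beta}\; P\big(E, \beta, W^{(t+1)} \mid X\big)
  && \text{(learn regulatory-feature weights)}
\end{aligned}
```

Each step maximizes the same objective over one block while the other is held fixed. As a result, the joint posterior never decreases from one iteration to the next.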
30
Transfer Learning
  • What do regulatory potentials do?
  • They do not change the selection of strong
    regulators, those whose prediction of targets is
    clear
  • They only help disambiguate between weak ones
  • Strong regulators help teach us what to look for
    in other regulators

Transfer of knowledge between different
prediction tasks
31
Learned regulatory weights
[Figure: learned regulatory weights for human and yeast across regulatory feature categories: location, amino-acid (AA) property change, gene function, pairwise features]
Lee et al., PLOS Genetics 2009
32
Statistical Evaluation
  • PGV: percent genetic variation explained by the
    predicted regulatory program for each gene

[Figure: number of genes with PGV > X, plotted against PGV (%)]
1650 genes (Lirnet)
1450 genes (Lirnet without regulatory prior)
850 genes (Geronemo)
250 genes (Brem and Kruglyak)
Lee et al., PLOS Genetics 2009
Lee et al. PNAS 2006
33
Biological evaluation I
  • How many predicted interactions have support in
    other data?
  • Deletion/over-expression microarrays (Hughes et
    al. 2000; Chua et al. 2006)
  • ChIP-chip binding experiments (Harbison et al.
    2004)
  • Transcription factor binding sites (MacIsaac et
    al. 2006)
  • mRNA binding pull-down experiments (Gerber et al.
    2004)
  • Literature-curated signaling interactions

[Figure: supported interactions for Lirnet vs. Lirnet without regulatory features, broken down by regulator, TF, and module interactions and modules, including support via cascade]
Lee et al., PLOS Genetics 2009
34
Biological Evaluation II
[Figure: significance of support for Lirnet, Zhu et al. (Nature Genet 2008), and random predictions]
Lee et al., PLOS Genetics 2009
35
What Regulates the P-Bodies?
  • The regulatory potential over all 318 SNPs in the
    region

[Figure: regulatory potential (y-axis, 0.4 to 0.7) of SNPs across ChrXIV:415,000-495,000, with MKT1 labeled]
High-scoring regulatory features (weight):
  • Conservation 0.230
  • Cis-regulation 0.208
  • Same GO process 0.204
  • Non-synonymous 0.158
  • Mass (Da) 0.066
  • pK1 0.060
  • Polar 0.057
  • pI 0.050
  • Translation regulation 0.046
Lee et al., PLOS Genetics 2009
Saccharomyces Genome Database (SGD)
36
Mkt1
  • Mkt1 binds to mRNAs at the 3′ UTR
  • BY has a SNP at a conserved residue in the nuclease domain

[Figure: comparison of BY, RM, and mkt1Δ in RM]
Lee et al., PLOS Genetics 2009
37
Predicting Causal Regulators
8 validated regulators in 7 regions
14 validated regulators in 11 regions
  • Finding causal regulators for 13 chromosomal
    hotspots


Region | Zhu et al. (Nat Genet 08) | Lirnet (top 3 considered)
1 | None | SEC18, RDH54, SPT7
2 | TBS1, TOS1, ARA1, CSH1, SUP45, CNS1, AMN1 | AMN1, CNS1, TOS1
3 | None | TRS20, ABD1, PRP5
4 | LEU2, ILV6, NFS1, CIT2, MATALPHA1 | LEU2, PGS1, ILV6
5 | MATALPHA1 | MATALPHA1, MATALPHA2, RBK1
6 | URA3 | URA3, NPP2, PAC2
7 | GPA1 | STP2, GPA1, NEM1
8 | HAP1 | HAP1, NEJ1, GSY2
9 | YRF1-4, YRF1-5, YLR464W | SIR3, HMG2, ECM7
10 | None | ARG81, TAF13, CAC2
11 | SAL1, TOP2 | MKT1, TOP2, MSK1
12 | PHM7 | PHM7, ATG19, BRX1
13 | None | ADE2, ORT1, CAT5
Lee et al., PLOS Genetics 2009
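The "top 3 considered" rule in the table can be sketched as ranking the genes in a region by their best-scoring SNP. Scoring a gene by the maximum regulatory potential over its SNPs is an assumption for illustration. The function name, gene names, and scores below are hypothetical.

```python
def top_candidates(snp_scores, k=3):
    """Rank genes in a region by the highest regulatory potential among their SNPs.

    snp_scores: iterable of (gene, regulatory_potential) pairs, one per SNP.
    Returns the k genes with the highest scores, best first.
    """
    best = {}
    for gene, score in snp_scores:
        if score > best.get(gene, float("-inf")):
            best[gene] = score
    return sorted(best, key=best.get, reverse=True)[:k]


# Hypothetical region with five SNPs in four genes.
snps = [("GENE_A", 0.2), ("GENE_B", 0.7), ("GENE_A", 0.5),
        ("GENE_C", 0.4), ("GENE_D", 0.1)]
print(top_candidates(snps))  # ['GENE_B', 'GENE_A', 'GENE_C']
```

Taking the maximum rather than the sum keeps genes with many weak SNPs from outranking a gene with a single strong one.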
38
Learning Regulatory Priors
  • Learns regulatory potentials that are specific to
    the organism and even the data set
  • Can use any set of regulatory features
      • Sequence features
      • Functional features for relevant gene
      • Features of regulator/target(s) pairs
  • Applicable to any organism, including ones where
    functional data may not be readily available

Lee et al., PLOS Genetics 2009
39
Conclusion
  • Framework for modeling gene regulation
  • Use machine learning to identify regulatory
    program
  • Hierarchical Bayesian techniques to capture
    regularities in effects of perturbations on
    network
  • Uncovers diverse regulatory mechanisms
      • Chromatin remodeling
      • mRNA degradation

40
Acknowledgements
Dana Pe'er, Columbia University
Aimee Dudley, Institute for Systems Biology
Su-In Lee, University of Washington
  • Nevan Krogan, UCSF
  • Pam Silver, David Drubin, Harvard Medical School

National Science Foundation