Title: FUNCTIONAL ANNOTATION OF REGULATORY PATHWAYS
1FUNCTIONAL ANNOTATION OF REGULATORY PATHWAYS
- Jayesh Pandey, Mehmet Koyuturk, Wojciech Szpankowski, and Ananth Grama
- PURDUE UNIVERSITY
- DEPARTMENT OF COMPUTER SCIENCE
- Supported by the National Institutes of Health
2GENE REGULATION
- Gene expression is the process of synthesizing a functional protein coded by the corresponding gene
- Genes (and their products) regulate the extent of each other's expression
- Any step of gene expression can be modulated
  - Transcription, translation, post-transcriptional modification, RNA transport, mRNA degradation
Figure: Ligand-independent transcriptional regulation at chromatin level
3GENE REGULATORY NETWORKS
- Model the organization of regulatory interactions in the cell
- Genes are nodes, regulatory interactions are directed edges
- Boolean network model: edges are signed, indicating up-regulation (promotion) and down-regulation (suppression)
Figure: Flowering time in Arabidopsis
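As a minimal sketch of the signed, directed network model described on this slide, the block below stores each regulatory interaction as a (source, target) pair with a sign. The gene names gA, gB, gC and their signs are hypothetical and do not come from the Arabidopsis figure.

```python
# Hypothetical toy network: gene names and signs are made up for illustration.
# +1 marks up-regulation (promotion), -1 marks down-regulation (suppression).
SIGNED_EDGES = {
    ("gA", "gB"): +1,
    ("gA", "gC"): -1,
    ("gB", "gC"): +1,
}


def regulators_of(target, edges):
    """Return {regulator: sign} for every directed edge that ends at target."""
    return {src: sign for (src, dst), sign in edges.items() if dst == target}


def targets_of(source, edges):
    """Return {target: sign} for every directed edge that starts at source."""
    return {dst: sign for (src, dst), sign in edges.items() if src == source}


if __name__ == "__main__":
    print(regulators_of("gC", SIGNED_EDGES))
    print(targets_of("gA", SIGNED_EDGES))
```

Keying the dictionary on ordered pairs keeps the direction of regulation explicit, so (gA, gB) and (gB, gA) are different interactions.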
4MOLECULAR ANNOTATION
- Similar systems involving different molecules (genes, proteins) in different species
- Functional annotation of genes provides a unified understanding of the underlying principles
  - Molecular function: What is the role of a gene?
  - Biological process: In which processes is a gene involved?
  - Cellular component: Where is a gene's product localized?
- Gene Ontology provides a library of molecular annotation
- We refer to each annotation class as a functional attribute
5FROM MOLECULES TO SYSTEMS
- Networks are species-specific
- Annotation is at the molecular level
- Map networks from gene space to function space
- Can generate a library of annotated modular (sub-)networks
Figure: Network of Gene Ontology terms based on significance of pairwise interactions in the yeast synthetic gene array (SGA) network (Tong et al., Science, 2004)
6INDIRECT REGULATION
- Assessment of pairwise interactions is simple, but not adequate
Figure: two example gene networks on genes g1 to g5
7FUNCTIONAL ATTRIBUTE NETWORKS
- Multigraph model
- A gene is associated with multiple functional attributes
- A functional attribute is associated with multiple genes
- Functional attributes are represented by nodes
- Genes are represented by ports, reflecting context
Figure panels: Functional attribute network, Gene network
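The mapping from gene space to function space can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function name `to_attribute_multigraph`, the toy genes, and the attribute labels A to D are all hypothetical. Each attribute-level edge keeps the list of gene pairs (its "ports") that realize it, which is what makes the result a multigraph.

```python
from collections import defaultdict


def to_attribute_multigraph(gene_edges, annotation):
    """Map directed gene edges to attribute-level edges.

    Returns {(T_src, T_dst): [(g, h), ...]}: every gene edge g -> h
    contributes one parallel edge T_src -> T_dst for each attribute of g
    and each attribute of h, with (g, h) recorded as the port.
    """
    multigraph = defaultdict(list)
    for g, h in gene_edges:
        for t_src in annotation.get(g, ()):
            for t_dst in annotation.get(h, ()):
                multigraph[(t_src, t_dst)].append((g, h))
    return dict(multigraph)


# Hypothetical toy data: g4 carries two attributes, so g3 -> g4 maps to two edges.
gene_edges = [("g1", "g3"), ("g2", "g3"), ("g3", "g4")]
annotation = {"g1": {"A"}, "g2": {"A"}, "g3": {"B"}, "g4": {"C", "D"}}
attribute_net = to_attribute_multigraph(gene_edges, annotation)

if __name__ == "__main__":
    for edge in sorted(attribute_net):
        print(edge, attribute_net[edge])
```

Unannotated genes simply contribute no attribute-level edges in this sketch.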
8FREQUENCY OF A MULTIPATH
- A pathway of functional attributes occurs in various contexts in the gene network
- Multipath in the functional attribute network
Figure: the frequency of the multipath is 4 on the left and 0 on the right
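A small sketch of how such a frequency could be counted, assuming that an occurrence of the pathway T1 -> T2 -> ... -> Tk is a chain of gene edges g1 -> g2 -> ... -> gk in which gi carries attribute Ti. The toy networks and the name `multipath_frequency` are hypothetical and are not a reconstruction of the slide's figure. They illustrate how two networks with identical pairwise frequencies can have pathway frequencies of 4 and 0.

```python
from collections import defaultdict


def multipath_frequency(pathway, gene_edges, annotation):
    """Count gene-level chains that realize a pathway of functional attributes."""
    successors = defaultdict(set)
    for g, h in gene_edges:
        successors[g].add(h)
    # counts[g] = number of occurrences of the prefix processed so far ending at g
    counts = {g: 1 for g, attrs in annotation.items() if pathway[0] in attrs}
    for attr in pathway[1:]:
        extended = defaultdict(int)
        for g, c in counts.items():
            for h in successors[g]:
                if attr in annotation.get(h, ()):
                    extended[h] += c
        counts = extended
    return sum(counts.values())


# Hypothetical annotation: two A genes, two B genes, two C genes.
annotation = {"g1": {"A"}, "g2": {"A"}, "g3": {"B"}, "g3b": {"B"},
              "g4": {"C"}, "g5": {"C"}}
# All A -> B and B -> C edges pass through the same gene g3.
shared = [("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g3", "g5")]
# Same pairwise counts, but A -> B and B -> C use different B genes.
split = [("g1", "g3"), ("g2", "g3"), ("g3b", "g4"), ("g3b", "g5")]

if __name__ == "__main__":
    for name, edges in (("shared", shared), ("split", split)):
        print(name,
              multipath_frequency(("A", "B"), edges, annotation),
              multipath_frequency(("B", "C"), edges, annotation),
              multipath_frequency(("A", "B", "C"), edges, annotation))
```

The dynamic-programming update avoids enumerating every chain explicitly, which matters once pathways get longer.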
9SIGNIFICANCE OF A PATHWAY
- We want to identify multipaths with unusual frequency
- These might correspond to modular pathways
- Frequency alone is not a good measure of statistical significance
  - The distribution of functional attributes among genes is highly skewed
  - The degree distribution in the gene network is highly skewed
  - Pathways that contain common functional attributes have high frequency, but they are not necessarily interesting
10STATISTICAL INTERPRETABILITY
- Additional positive observation -> increased significance
- Additional negative observation -> decreased significance
Figure: two pairs of patterns A and B, one with P(B) < P(A) and one with P(B) > P(A)
Frequency is not statistically interpretable!
11MONOTONICITY
- Frequency is a monotonic measure
  - If a pathway is frequent, then all of its sub-paths are frequent
  - Algorithmic advantage: enumerate all frequent patterns in a bottom-up fashion
  - Commonly exploited in traditional data mining applications
- Statistically interpretable measures are not monotonic!
  - Statistical significance fluctuates in the search space
  - Existing data mining algorithms do not apply
- Significance of pathways is non-monotonic in two dimensions
12GO HIERARCHY
- Functional attributes are organized in a hierarchical manner
- "regulation of steroid biosynthetic process" is a "regulation of steroid metabolic process" and is part of "steroid biosynthetic process"
- Interpretable statistical measures are not monotonic with respect to the GO hierarchy
Figure: gene network on g1 to g5 with the comparison P( ) < P( ) < P( ); the pathway diagrams inside P( ) are not recoverable
13PATHWAY LENGTH
Figure: P( ) > P( ) in one example and P( ) < P( ) in another; the pathway diagrams inside P( ) are not recoverable
- Open problems
  - How can we effectively search in the pathway space, where significance fluctuates?
  - How can we find optimal resolution in functional attribute space?
14STATISTICAL MODEL
- Emphasize modularity of pathways
- Condition on frequency of building blocks!
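- We denote each frequency random variable by N and its realization by n
- Significance of pathway $\pi_{123}$:
  $p_{123} = P(N_{123} \ge n_{123} \mid N_{12} = n_{12}, N_{23} = n_{23}, N_1 = n_1, N_2 = n_2, N_3 = n_3)$
Figure: pathway $\pi_{123}$ labeled with the frequencies N1, N2, N3, N12, N23, N123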
15SIGNIFICANCE OF A PATHWAY
- Assume that regulatory interactions are independent
- There are $n_{12} n_{23}$ pairs of occurrences of $\pi_{12}$ and $\pi_{23}$
- The probability that a given pair goes through the same gene is $1/n_2$
- The probability that at least $n_{123}$ of the $n_{12} n_{23}$ pairs of edges go through the same gene can be bounded by
  $p_{123} \le \exp(n_{12} n_{23} H_q(t))$, where $q = 1/n_2$ and $t = n_{123} / (n_{12} n_{23})$
- $H_q(t) = t \log(q/t) + (1-t) \log((1-q)/(1-t))$ is the weighted entropy of $t$ with respect to $q$
- Can be generalized to pathways of arbitrary length
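The bound above can be evaluated numerically with a few lines of code. The sketch below assumes the slide's independence model, so the number of edge pairs that meet at the same gene is treated as binomial with n12*n23 trials and success probability q = 1/n2. The function names and the example counts are hypothetical. The exact binomial tail is included only as a sanity check that the bound sits above it.

```python
import math


def weighted_entropy(t, q):
    """H_q(t) = t log(q/t) + (1-t) log((1-q)/(1-t)); non-positive for 0 < q < 1."""
    if t >= 1.0:
        return math.log(q)  # limit of the second term as t -> 1 is zero
    return t * math.log(q / t) + (1 - t) * math.log((1 - q) / (1 - t))


def pathway_pvalue_bound(n123, n12, n23, n2):
    """Upper bound exp(n12 * n23 * H_q(t)) on the p-value of a length-3 pathway."""
    pairs = n12 * n23
    if pairs == 0:
        return 1.0
    q = 1.0 / n2
    t = n123 / pairs
    if t <= q:
        return 1.0  # the bound is only informative above the expected rate q
    return math.exp(pairs * weighted_entropy(t, q))


def exact_binomial_tail(k, n, q):
    """P(X >= k) for X ~ Binomial(n, q), used here only to check the bound."""
    return sum(math.comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical counts: n12 = 5, n23 = 4, n2 = 10 genes, n123 = 6 occurrences.
    print(pathway_pvalue_bound(6, 5, 4, 10))
    print(exact_binomial_tail(6, 20, 0.1))
```

Returning 1.0 when t <= q is a design choice of this sketch: below the expected rate the pathway is not over-represented, so no significance is claimed.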
16SIGNIFICANCE OF AN EDGE
- A single regulatory interaction is the shortest pathway
- Statistical significance is evaluated with respect to a baseline model
  - The number of edges leaving and entering each functional attribute is specified
  - Edges are assumed to be independent
- The frequency of a regulatory interaction is a hypergeometric random variable
- Can derive a similar bound for the p-value of a single regulatory interaction
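The slide does not spell out the parameters of the hypergeometric variable, so the following is only one plausible parameterization, stated as an assumption: out of `total_edges` edges, `out_i` leave attribute Ti and `in_j` enter attribute Tj, and the number of edges from Ti to Tj is the overlap of the two groups. The function name `edge_pvalue` and the example numbers are hypothetical.

```python
import math


def edge_pvalue(n_ij, out_i, in_j, total_edges):
    """P(N_ij >= n_ij) under an assumed hypergeometric baseline.

    Assumption (not taken from the slides): N_ij counts how many of the in_j
    edges entering T_j fall among the out_i edges leaving T_i, when the in_j
    edges are drawn without replacement from total_edges edges.
    """
    upper = min(out_i, in_j)
    denom = math.comb(total_edges, in_j)
    tail = sum(
        math.comb(out_i, k) * math.comb(total_edges - out_i, in_j - k)
        for k in range(n_ij, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Hypothetical counts: 10 edges in total, 3 leave T_i, 4 enter T_j, 2 observed.
    print(edge_pvalue(2, 3, 4, 10))
```

The exact tail is cheap at this scale. A bound in the style of the pathway bound would serve the same purpose for larger counts.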
17ALGORITHMIC ISSUES
- Significance is not monotonic
  - Need to enumerate all pathways?
- Strongly significant pathways
  - A pathway is strongly significant if all its building blocks are significant (defined recursively)
  - Allows pruning the search space effectively
- Shortcutting common functional attributes
  - Transcription factors, DNA binding genes, etc. are responsible for mediating regulation
  - Shortcut these terms and consider the regulatory effect of different processes on each other directly
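The pruning idea behind strongly significant pathways can be sketched as a bottom-up enumeration. This is an illustration under assumptions, not NARADA's algorithm. It takes the building blocks of T1 -> ... -> Tk to be its prefix T1 -> ... -> Tk-1 and its suffix T2 -> ... -> Tk. It also requires the pathway itself to pass a user-supplied `is_significant` test. That callback and the toy data are both hypothetical.

```python
def strongly_significant_pathways(edges, is_significant, max_length):
    """Enumerate pathways (tuples of attributes) level by level.

    A candidate of length k is generated only when its length k-1 prefix and
    suffix both survived the previous level, so the significance test is
    never run on pathways with a non-significant building block.
    """
    level = {tuple(e) for e in edges if is_significant(tuple(e))}
    found = set(level)
    for _ in range(max_length - 2):
        candidates = {p + q[-1:] for p in level for q in level if p[1:] == q[:-1]}
        level = {c for c in candidates if is_significant(c)}
        found |= level
    return found


# Hypothetical attribute-level edges and a toy significance test.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
not_significant = {("B", "C", "D")}


def toy_test(path):
    return path not in not_significant


result = strongly_significant_pathways(edges, toy_test, 4)

if __name__ == "__main__":
    print(sorted(result))
```

In the toy run, A -> B -> C -> D is never tested because its suffix B -> C -> D failed, which is the pruning effect the slide refers to.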
18NARADA http://www.cs.purdue.edu/homes/jpandey/narada/
- Software for identification of significant pathways
- Queries
  - Given functional attribute T, find all significant pathways that originate at T
  - Given functional attribute T, find all significant pathways that terminate at T
  - Given a sequence of functional attributes T1, T2, ..., Tk, find all occurrences of the corresponding pathway
- Identified pathways are displayed as a tree
- User can explore back and forth between the gene network and the functional attribute network
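To make the first two query types and the tree display concrete, here is a minimal sketch that filters a list of already identified pathways and merges them into a prefix tree. It does not reflect NARADA's actual interface. The function names and the pathways over T1 to T4 are hypothetical.

```python
def originating_at(term, pathways):
    """Pathways whose first functional attribute is term."""
    return [p for p in pathways if p[0] == term]


def terminating_at(term, pathways):
    """Pathways whose last functional attribute is term."""
    return [p for p in pathways if p[-1] == term]


def as_tree(pathways):
    """Merge pathways that share a prefix into a nested-dict tree."""
    tree = {}
    for path in pathways:
        node = tree
        for term in path:
            node = node.setdefault(term, {})
    return tree


# Hypothetical significant pathways over attributes T1..T4.
significant = [("T1", "T2"), ("T1", "T2", "T3"), ("T1", "T4"), ("T2", "T3")]

if __name__ == "__main__":
    print(as_tree(originating_at("T1", significant)))
    print(terminating_at("T3", significant))
```

For pathways that terminate at T, the same tree can be built on reversed tuples so that T becomes the root.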
19RESULTS
- E. coli transcription network obtained from RegulonDB
- 3159 regulatory interactions between 1364 genes
- Using Gene Ontology, 881 of these genes are mapped to 318 processes
Pathway length            2     3     4     5
All                     427   580  1401   942
Strongly significant    427   208   183   142
Common terms shortcut   184   119     3     1
20MOLYBDATE ION TRANSPORT
Figure: Significant regulatory pathways that originate at molybdate ion transport, and their occurrences in the gene network
21WHAT IS SIGNIFICANT?
- Molybdate ion transport regulates various processes directly
  - Mo-molybdopterin cofactor biosynthesis, oligopeptide transport, cytochrome complex assembly
- It regulates various other processes indirectly
  - Through DNA-dependent regulation of transcription, two-component signal transduction system, nitrate assimilation
  - Direct regulation of these mediator processes is not significant
- NARADA captures modularity of indirect regulation!
22CONCLUSION
- Mapping gene regulatory networks to functional attribute space demonstrates great potential
  - Abstract, unified understanding of regulatory systems
- Algorithmically, a wide range of new challenges
  - How can we bound interpretable statistical measures?
  - How can we handle hierarchy in functional attribute space?
- Discovering new information
  - How can we project identified canonical patterns onto other species to discover new regulatory relationships?