Title: Operon Functional module
1. Dividing genome into operons?
Tells us about the high-order organization of the
genome. Insights about gene function improve
automated function assignment.
Slides by Carl Kingsford
2. Classification problem
Which intergenic regions are at the ends of
operons?
Look for a sequence signal that marks the end of
an operon, called an intrinsic terminator.
3. Bacteria that cause ... (and a hundred more) rely on intrinsic terminators
4. Intrinsic Transcription Terminators
Short hairpin followed by a thymine (T)-rich sequence.
[Figure: in the DNA, a gene is followed by s, a loop, RC(s), and a T-rich stretch (T.T..T.T); folded, s and RC(s) form the hairpin, followed by the T-tail.]
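The structure on this slide (s, loop, RC(s), then thymines) can be illustrated with a small search. This is a minimal sketch, not TransTermHP's algorithm: it only accepts an exact stem, and the function names and all parameters (stem length, loop range, tail window, minimum T count) are assumptions chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(s: str) -> str:
    """Return RC(s), the reverse complement of the DNA string s."""
    return s.translate(COMPLEMENT)[::-1]

def find_simple_terminators(dna, stem_len=5, loop_range=(3, 8), tail_len=8, min_t=5):
    """Return (start, stem, loop, tail) for exact-stem hairpins followed by a T-rich tail.

    Illustrative only: real terminators tolerate mismatches and gaps in the stem.
    """
    hits = []
    min_loop, max_loop = loop_range
    for i in range(len(dna) - 2 * stem_len - min_loop - tail_len + 1):
        stem = dna[i:i + stem_len]
        rc = reverse_complement(stem)
        for loop_len in range(min_loop, max_loop + 1):
            j = i + stem_len + loop_len   # where RC(s) would have to start
            end = j + stem_len            # first base after the hairpin
            if end + tail_len > len(dna):
                break
            if dna[j:end] == rc:
                tail = dna[end:end + tail_len]
                if tail.count("T") >= min_t:
                    hits.append((i, stem, dna[i + stem_len:j], tail))
    return hits

# Hypothetical test sequence: stem GCCGC, loop TTCG, RC(stem) GCGGC, then a T-rich tail.
example = "AAAA" + "GCCGC" + "TTCG" + "GCGGC" + "TTTTATTT" + "GG"
example_hits = find_simple_terminators(example)
```

Each hit reports where the stem starts, so a later step could check whether it lies just downstream of a gene.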
5. How can we find intrinsic terminators?
6. Previous Approaches
- Only hairpin portion (Unniraman, 2002; Washio, 1998)
- Limited organisms (e.g. Lesnik, 2001)
- Slow, exhaustive search with scoring scheme that tended to overestimate significance (Ermolaeva, 2000)
- Simple decision rules (de Hoon, 2005; d'Aubenton Carafa, 1990)
7. Method Overview
- Find candidate terminators: find hairpins followed by thymines.
- Weak filtering: remove bad hairpins, tails, overlapping candidates.
- Score candidates: score each candidate by comparing to what's expected by random chance.
TransTermHP: C Kingsford, K Ayanbule, S Salzberg. Genome Biology 8 (2007)
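The "overlapping candidates" part of the weak filtering step can be sketched as a greedy selection. This is an assumption about one reasonable way to do it, not the published procedure; the `Candidate` type, its fields, and the "higher score is better" convention are hypothetical.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    start: int    # first position of the hairpin (inclusive)
    end: int      # last position of the tail (inclusive)
    score: float  # hypothetical quality score; higher is better in this sketch

def drop_overlapping(cands: List[Candidate]) -> List[Candidate]:
    """Greedily keep the best-scoring candidate among any that overlap."""
    kept: List[Candidate] = []
    for c in sorted(cands, key=lambda c: c.score, reverse=True):
        # Keep c only if it is disjoint from every candidate already kept.
        if all(c.end < k.start or c.start > k.end for k in kept):
            kept.append(c)
    return sorted(kept, key=lambda c: c.start)

# Hypothetical candidates: the first two overlap, the third stands alone.
filtered = drop_overlapping([
    Candidate(10, 40, 5.0),
    Candidate(30, 60, 7.0),
    Candidate(100, 130, 1.0),
])
```

Sorting by score first means a weak candidate can never displace a stronger one that it overlaps.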
8. Finding Candidate Terminators
- Find best hairpin anchored at ... (hairpin score H; hairpin_score after Ermolaeva, 2000)
- Evaluate 15 nt of tail, using heuristic by d'Aubenton Carafa et al. (tail score T)
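The slide does not spell out the tail heuristic, so the following is only a sketch of a geometric-decay tail score in the spirit of d'Aubenton Carafa et al. The decay constants 0.9 and 0.6, the starting value 1, and the sign convention (more negative is better) are assumptions recalled from the literature and should be checked against the papers before use.

```python
def tail_score(tail: str, n: int = 15, t_decay: float = 0.9, other_decay: float = 0.6) -> float:
    """Geometric-decay score over the first n nucleotides after the hairpin.

    A running weight x starts at 1 and is multiplied by t_decay on a T and by
    other_decay on any other base; the score is minus the sum of the running
    weights, so a more negative value means a more T-rich tail.
    The constants are assumptions, not values taken from the slides.
    """
    x = 1.0
    total = 0.0
    for base in tail[:n].upper():
        x *= t_decay if base == "T" else other_decay
        total += x
    return -total

# Thymines close to the hairpin count more than thymines further away.
early_t = tail_score("TTTAAA")
late_t = tail_score("AAATTT")
```

Because the weight decays multiplicatively, a non-T base early in the tail penalizes every later position as well.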
9. Scoring Candidates
[Figure: distribution of (H, T) scores in random sequence; axes: Tail Score (T), Hairpin Score (H)]
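One simple way to compare a candidate against random chance is an empirical tail probability over background (H, T) pairs. This is a swapped-in illustration, not the TransTermHP confidence formula; it assumes lower (more negative) scores are better for both H and T, and the background below is simulated placeholder data rather than scores from real random sequence.

```python
import random

def empirical_tail_probability(h, t, background):
    """Fraction of background (H, T) pairs at least as good as (h, t) in both scores.

    Assumes lower (more negative) scores are better for both H and T.
    """
    if not background:
        raise ValueError("background must not be empty")
    as_good = sum(1 for bh, bt in background if bh <= h and bt <= t)
    return as_good / len(background)

# Hypothetical background: uniformly drawn numbers stand in for (H, T) pairs
# that would be measured on random sequence.
rng = random.Random(0)
background = [(rng.uniform(-10, 0), rng.uniform(-6, 0)) for _ in range(10_000)]

p_strong = empirical_tail_probability(-9.0, -5.5, background)  # rarely matched by chance
p_weak = empirical_tail_probability(-1.0, -0.5, background)    # often matched by chance
```

A candidate whose (H, T) pair is rarely matched in random sequence is a more believable terminator.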
10. Performance of TransTermHP vs. de Hoon et al.: Training Error in B. subtilis
- Classification problem
- de Hoon et al. (2005) collect examples from the B. subtilis literature (458 of one class, 567 of the other).
- They fit a line separating the (H, T) pairs of the two classes of regions.
- Report their training error.
[Figure: ROC curve; axes: True positives, False positives]
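For readers unfamiliar with ROC curves, the following sketch shows how the points of such a curve are obtained from scored, labelled examples. The function name and the "higher score is more terminator-like" convention are assumptions of this example, and the scores are made up.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) points, one per distinct threshold.

    scores: higher means more terminator-like (an assumption of this sketch).
    labels: True for known terminators, False for known non-terminators.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    tp = fp = 0
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    for i, (score, is_pos) in enumerate(ranked):
        if is_pos:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example sharing this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

# Hypothetical scores: the two positives outrank the two negatives.
curve = roc_points([0.9, 0.8, 0.7, 0.6], [True, True, False, False])
```

A curve that reaches a true positive rate of 1 while the false positive rate is still 0, as here, corresponds to a perfect separation of the two classes.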
11. Running Time
- Runs on Bacillus subtilis (4.2 Mb) in 50 seconds.
- Previous state-of-the-art: 2 days (de Hoon et al.)
- 343 prokaryotic genomes in < 4 hours.
- (vs. 500 days for de Hoon et al.)
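The speedups implied by these figures can be worked out directly. The snippet only restates the numbers on the slide; since 4 hours is an upper bound, the batch figure is a lower bound on the speedup.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Single genome: 2 days (previous) vs. 50 seconds (TransTermHP).
single_speedup = (2 * SECONDS_PER_DAY) / 50

# 343 genomes: 500 days (previous) vs. at most 4 hours, both expressed in hours.
batch_speedup_at_least = (500 * 24) / 4

# Average time per genome in the batch run, in seconds (upper bound).
seconds_per_genome_at_most = (4 * 60 * 60) / 343
```

Both comparisons come out at roughly a three-thousand-fold speedup.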
12. Many ??? have terminators.
Few ??? have terminators.
Tree of (sequenced) prokaryotic life
- Vibrio, e.g. V. choleræ, V. fischeri, V. vulnificus
- Firmicutes, e.g. B. anthracis, Streptococcus pneumoniae, Staphylococcus aureus, M. pneumoniæ, M. genitalium
- Pasteurellaceæ, e.g. H. influenzæ, H. ducreyi, M. succiniciproducens
- Neisseria, e.g. N. meningitidis, N. gonorrhoeæ
13. DNA Uptake
- Some Neisseria and Pasteurellaceae can bring DNA from the environment into the cell
- DNA uptake sequence (DUS) lets organism recognize DNA from related organisms (Sisco and Smith, 1979)
- Anecdotal evidence that terminators often contain DUS signals (Goodman and Scocca, 1988; Kroll, 1992; Smith, 1995; Smith, 1999)
[Figure labels: DNA uptake sequence receptor, DUS motif, cell, exogenous DNA]
14. Many Terminators Contain DUS
[Figure: best, high-confidence terminators that contain a DUS]
Genus: known DUS motif
- Neisseria: GCCGTCTGAA
- Pasteurellaceæ: AAGTGCGGT
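Measuring how many terminators contain a DUS amounts to a motif search on both strands. The sketch below uses the Pasteurellaceæ motif AAGTGCGGT from this slide; the function names are illustrative and the three "terminator" sequences are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def contains_motif(seq: str, motif: str) -> bool:
    """True if motif or its reverse complement occurs anywhere in seq."""
    rc = motif.translate(COMPLEMENT)[::-1]
    seq = seq.upper()
    return motif in seq or rc in seq

def fraction_with_motif(terminators, motif):
    """Fraction of terminator sequences that contain the motif on either strand."""
    if not terminators:
        return 0.0
    return sum(contains_motif(t, motif) for t in terminators) / len(terminators)

# Hypothetical terminator sequences: forward hit, reverse-strand hit, no hit.
made_up_terminators = ["CCAAGTGCGGTTT", "GGACCGCACTTTT", "GGGGCCCCTTTT"]
frac = fraction_with_motif(made_up_terminators, "AAGTGCGGT")
```

Checking the reverse complement matters because a terminator can lie on either strand of the genome.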
15. Discovery of DUS Motif in H. ducreyi
In 16 of best, high-confidence terminators
Occurs 1371 times (expected 140)
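An observed-versus-expected comparison like the one above can be sketched as follows. The slide does not say how the expected count was computed; this example assumes a simple model in which bases are independent with the genome's own frequencies, and it counts both strands. The function names are illustrative.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

def observed_and_expected(genome: str, motif: str):
    """Return (observed, expected) motif counts over both strands.

    Expected counts assume independent bases drawn with the genome's own
    base frequencies (an assumption of this sketch, not the slides' method).
    """
    genome = genome.upper()
    n = len(genome)
    if n == 0:
        return 0, 0.0
    rc = motif.translate(COMPLEMENT)[::-1]
    motifs = {motif, rc}  # a motif equal to its reverse complement is counted once
    observed = sum(count_overlapping(genome, m) for m in motifs)
    freq = Counter(genome)
    positions = max(n - len(motif) + 1, 0)
    expected = 0.0
    for m in motifs:
        p = 1.0
        for base in m:
            p *= freq[base] / n
        expected += positions * p
    return observed, expected
```

A motif whose observed count is many times its expected count, as on this slide, is unlikely to be a product of base composition alone.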
16. TransTermHP
- Fast, accurate computational method for finding terminators
- Largest collection of terminator predictions available
- Quantify relationship between terminators and DNA uptake signals
- Assess importance of intrinsic terminators across many genomes
- Discovery of a new uptake signal in H. ducreyi.