Finding regulatory modules: A statistical approach

1
Finding regulatory modules: A statistical approach
  • Mikhail Velikanov
  • Linnaeus Centre for Bioinformatics

2
Introduction
  • Regulatory modules (RMs): sets of regulatory
    sites that work cooperatively
  • TF binding sites and promoter elements
  • Splicing enhancers and suppressors
  • Site clusters
  • site A AND (site B OR site C) AND (NOT site D)
  • Beads on a spring
  • site B is 20 ± 3 bp downstream of site A
  • Distance distributions have a short range and a
    well-defined peak

3
Searching for RMs: Setup of the Problem
Motifs
Annotations
  • Seq. length constant and small (0.5 kb)
  • Num. of sites 20
  • No overlapping sites
  • Sites characterized by
  • Identity
  • P-values (< $p_t$)
  • shown by width

Look for annotation patterns that occur
consistently in all or some of the sequences.
4
RMs as Annotation Alignments
  • Align sites by identity
  • Find sequences of 2 or more sites shared across a
    number of annotations (common annotations)
  • Conditions
  • Distances between sites are similar
  • P-values of aligned sites are similar
  • P-values of aligned sites are small

Need a function that measures how well conditions
(1-3) are satisfied (strength of common
annotation).
5
Strength of common annotation: site p-values
  • Assume a common annotation of S sites supported
    by N sequences
  • For the i-th site, let $p_i^{\min}$, $p_i^{\max}$ be the
    smallest and the largest of the N p-values
  • $p_i^{\max}$: measure of how small the p-values are
  • $R_i = p_i^{\max} / p_i^{\min}$: measure of similarity
  • Probability $p_i$ of observing p-values as similar
    and as small in N random annotations
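
The slide does not give the formula for this probability. As a rough illustration only, the sketch below assumes that p-values in random annotations are independent and uniform on (0, p_t), with p_t the scan threshold; that null model and the function name `prob_similar_and_small` are assumptions, not part of the presentation. Under this assumption, the probability that all N p-values are at most p_max and that their max/min ratio is at most R has a closed form.

```python
def prob_similar_and_small(p_max: float, ratio: float, n: int, p_t: float = 1.0) -> float:
    """P(max <= p_max and max/min <= ratio) for n i.i.d. Uniform(0, p_t) p-values.

    Assumed null model (not from the slides): p-values of random sites are
    independent and uniform below the threshold p_t.
    """
    if not (0.0 < p_max <= p_t) or ratio < 1.0 or n < 1:
        raise ValueError("need 0 < p_max <= p_t, ratio >= 1, n >= 1")
    # The maximum has density n / p_t at m when the other n - 1 values fall
    # in [m / ratio, m]; integrating over m in (0, p_max] gives this product.
    return (p_max / p_t) ** n * (1.0 - 1.0 / ratio) ** (n - 1)
```

For example, with two p-values, `p_max = 0.5`, and `ratio = 2`, the function returns 0.125: both values land below 0.5 with probability 0.25, and half of that square satisfies the ratio constraint.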

6
Strength of common annotation: distances between
sites
  • Account for no overlaps between sites
  • renormalization of $p_i$ for each site
  • $p_0 = 1 - \sum_{i=1}^{S} p_i$: positions between sites
  • Approximately compute the probability of the common
    annotation, $P_{CA}$, as a function of $p_0, p_1, \ldots, p_S$
  • Strength of common annotation
  • $Z = -\ln P_{CA}$

7
Searching for the strongest common annotations
  • Given an input set of annotations, define groups
    of annotations such that
  • each group has at least one common annotation
  • the strongest common annotation of each group is
    distinct
  • NB: Groups may fully or partially overlap!

Cannot use standard clustering algorithms.
8
Classification Algorithm
  • Find pairs of annotations with at least one
    common annotation
  • Each pair is a nucleus of a potential group
  • Each group grows by adding annotations one at a
    time
  • the group retains its strongest common annotation
    at each step
  • each addition maximizes the group strength
  • annotation added to one group remains available
    for addition to other groups
  • Where does the growth stop?

(strength = group strength, $Z_g$)
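
The growth procedure on this slide can be sketched as a greedy loop. The version below is a toy under stated assumptions: each annotation is reduced to a mapping from motif identity to p-value (positions and distances are ignored), the common annotation is the set of shared motifs, and `toy_strength` is a hypothetical stand-in for the real group strength. None of these names come from the presentation.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Iterable

Annotation = Dict[str, float]  # motif identity -> site p-value (toy model)

def shared_sites(annotations: Dict[str, Annotation], members: Iterable[str]) -> FrozenSet[str]:
    """Motifs present in every member annotation (toy common annotation)."""
    return frozenset(set.intersection(*(set(annotations[m]) for m in members)))

def toy_strength(annotations, members, sites) -> float:
    """Hypothetical strength: rewards shared sites whose largest p-value is small."""
    return sum(-math.log(max(annotations[m][s] for m in members)) for s in sites)

def grow_groups(annotations: Dict[str, Annotation], min_sites: int = 2,
                min_strength: float = 0.0) -> Dict[FrozenSet[str], FrozenSet[str]]:
    """Return {group members: common annotation}; groups may overlap."""
    groups = {}
    for pair in combinations(sorted(annotations), 2):
        core = shared_sites(annotations, pair)
        if len(core) < min_sites:
            continue  # this pair has no common annotation, so it seeds nothing
        members = set(pair)
        while True:
            # Only annotations that contain the whole core keep the group's
            # common annotation unchanged.
            candidates = [n for n in sorted(annotations)
                          if n not in members and core <= set(annotations[n])]
            if not candidates:
                break
            best = max(candidates,
                       key=lambda n: toy_strength(annotations, members | {n}, core))
            if toy_strength(annotations, members | {best}, core) < min_strength:
                break  # group strength would become too small
            members.add(best)  # 'best' stays available to other groups
        # Identical groups grown from different seed pairs collapse to one key.
        groups[frozenset(members)] = core
    return groups
```

With four toy annotations where `s3` carries both an A/B pattern (shared with `s1`, `s2`) and a C/D pattern (shared with `s4`), this returns two groups that overlap in `s3`, which is the situation standard clustering does not handle.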
9
Stopping criterion
  • No more annotations can be added
  • group contains all annotations in the input set
  • change in the strongest common annotation
  • Formed during growth of another group
  • ignore current group (pruning)
  • Group strength is too small
  • adding an unrelated annotation
  • group strength $Z_g$ is a score ($Z_g > 0$)
  • can be computed for groups of random annotations
  • by the extremal types theorem
  • $\lim_{n \to \infty} \mathrm{Prob}(Z_g^{\mathrm{rand}} > Z_g) = 1 - \exp\left[-(Z_g/b)^{-a}\right]$
  • threshold on $Z_g$
  • numerical calibration of a, b for all possible N,
    S

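
The limiting tail formula on this slide has the form of a Fréchet distribution, so a threshold on the group strength can be obtained by inverting it. The sketch below shows that inversion; the function names and the values of `a` and `b` used in the example are hypothetical, since the slides only state that the parameters are calibrated numerically for each N and S.

```python
import math

def random_exceedance_prob(z_g: float, a: float, b: float) -> float:
    """Limiting Prob(Z_g^rand > z_g) = 1 - exp(-(z_g / b) ** (-a))."""
    if z_g <= 0 or a <= 0 or b <= 0:
        raise ValueError("z_g, a and b must be positive")
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-((z_g / b) ** (-a)))

def strength_threshold(alpha: float, a: float, b: float) -> float:
    """Group strength whose random exceedance probability equals alpha."""
    if not 0.0 < alpha < 1.0 or a <= 0 or b <= 0:
        raise ValueError("need 0 < alpha < 1 and positive a, b")
    # Solve alpha = 1 - exp(-(z / b) ** (-a)) for z.
    return b * (-math.log1p(-alpha)) ** (-1.0 / a)
```

For instance, `strength_threshold(0.01, a=2.0, b=3.0)` gives the strength above which a group of random annotations would be accepted about 1% of the time under these placeholder parameters.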
10
From annotation groups to RMs
  • Need a way to
  • account for optional sites
  • search for homologous RMs

11
RMs as generalized HMMs
  • Generalized (duration) HMMs (gHMMs or dHMMs)
    consist of 2 types of states
  • motif states (PSSMs)
  • annotation sites
  • spacer states (distance distributions)
  • gaps between sites
  • States are connected according to certain
    topology
  • Transition probabilities depend only on the
    present state
  • Common annotations of groups are simple gHMMs
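
To make the two state types concrete, here is a minimal sketch of the simplest topology, a linear chain of motif states joined by spacer states. It assumes each site arrives with a precomputed log-likelihood under its PSSM and that the distance is measured between site start positions; the class names (`Site`, `SpacerState`, `LinearGHMM`) and these conventions are illustrative choices, not taken from the slides.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Site:
    motif: str           # identity of the motif (PSSM) that produced the site
    start: int           # start position in the sequence, in bp
    log_emission: float  # log-likelihood of the site under its PSSM (assumed given)

@dataclass
class SpacerState:
    """Gap between consecutive sites with a discrete distance distribution."""
    distance_probs: Dict[int, float]  # distance in bp -> probability

    def log_prob(self, distance: int) -> float:
        p = self.distance_probs.get(distance, 0.0)
        return math.log(p) if p > 0 else float("-inf")

@dataclass
class LinearGHMM:
    """Chain motif[0] -> spacer[0] -> motif[1] -> ... with no branching."""
    motifs: List[str]
    spacers: List[SpacerState]  # one fewer than motifs

    def log_score(self, sites: List[Site]) -> float:
        """Log-probability of an ordered list of sites under the chain."""
        if [s.motif for s in sites] != self.motifs:
            return float("-inf")  # wrong motif identities or order
        score = sum(s.log_emission for s in sites)
        for spacer, left, right in zip(self.spacers, sites, sites[1:]):
            score += spacer.log_prob(right.start - left.start)
        return score
```

A two-site common annotation then corresponds to `LinearGHMM(["A", "B"], [SpacerState({...})])`; richer topologies with optional sites would need explicit transition probabilities between states, which this sketch omits.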

12
RMs as generalized HMMs
Can make a single model because of the overlap!
  • Common annotations define gHMM states
  • Overlaps define topology and provide estimates of
    transition probabilities
  • Multiple matches to the model

13
From annotations to RMs
RMs
14
Testing the Method: Test 1
  • 25 random DNA sequences, 20 are seeded with an
    RM
  • 2 sites with low p-value (< $10^{-3}$) separated by
    20-25 bp
  • Scan sequences with unrelated motif subject to
    p-value threshold
  • 3rd site (random noise in annotations)

m0
m1
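
A data set like the one in Test 1 could be generated roughly as follows. This is a simplified sketch: it plants fixed words for the two sites instead of sampling sites with a given p-value, uses placeholder motif strings, takes the 500 bp length from the problem setup slide, and treats the 20-25 bp separation as the gap between the end of the first site and the start of the second. All of these are assumptions, as is the function name `make_test_set`.

```python
import random
from typing import List, Optional, Tuple

def make_test_set(n_seqs: int = 25, n_seeded: int = 20, length: int = 500,
                  m0: str = "TTGACA", m1: str = "TATAAT",  # placeholder motifs
                  min_gap: int = 20, max_gap: int = 25, seed: int = 0
                  ) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Return (sequence, start of m0, gap) per sequence; None for unseeded ones."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    data = []
    for i in range(n_seqs):
        seq = [rng.choice("ACGT") for _ in range(length)]
        if i >= n_seeded:
            data.append(("".join(seq), None, None))
            continue
        gap = rng.randint(min_gap, max_gap)  # inclusive on both ends
        span = len(m0) + gap + len(m1)
        start = rng.randrange(length - span + 1)  # keep the module inside the sequence
        seq[start:start + len(m0)] = m0
        second = start + len(m0) + gap
        seq[second:second + len(m1)] = m1
        data.append(("".join(seq), start, gap))
    return data
```

The third, unrelated motif from the slide is not planted here; it would come from scanning the finished sequences with another motif under a p-value threshold.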
15
Testing the Method: Test 2
  • 25 random DNA sequences, 2 non-overlapping groups
    of 10 and 11 sequences
  • each group is seeded with a distinct RM (2
    sites)
  • distance between sites is 20-25 bp or 52-55
    bp
  • Extra site added as before

16
Testing the Method: Test 3
  • 25 random DNA sequences, 2 overlapping groups of
    12 and 14 sequences
  • same RMs as in previous test
  • groups overlap by 5 sequences
  • Extra site added as before

17
Summary
  • A method for discovery of regulatory modules
    given a set of annotated sequences
  • Builds RMs from recurrent annotation patterns
  • Treats site p-values and distances in a consistent
    statistical framework
  • Can use prior information on RMs (Bayesian
    approach)
  • RMs are output as gHMMs
  • flexibility of RM structure (topology)
  • searching for homologous RMs

18
Future developments
  • Testing the method on real data
  • upstream regions of bacterial operons
  • bacterial Fe-regulons
  • other benchmark sets?
  • Algorithm improvements
  • better stopping criterion (use properties of
    distance distributions)
  • more precise computation of common annotation
    strength
  • better similarity measure for site p-values
    (reduce compensation)

19
Acknowledgements
  • Thanks to David Ardell (LCB, Uppsala) and
    Georgiy Sofronov (Univ. of Queensland, Brisbane)
    for many fruitful discussions