MOPAC: Motif-finding by Preprocessing and Agglomerative Clustering from Microarrays

1
MOPAC: Motif-finding by Preprocessing and
Agglomerative Clustering from Microarrays
  • Thomas R. Ioerger (1)
  • Ganesh Rajagopalan (1)
  • Debby Siegele (2)

(1) Department of Computer Science, (2) Department of
Biology, Texas A&M University
2
Analyzing Gene Expression Patterns
  • DNA microarrays
  • 4000 genes for E. coli, 6000 genes for yeast
  • Compare expression levels between conditions
  • Example: starvation response in E. coli
  • starve cells for nutrient sources
  • reintroduce -> recovery -> exponential growth
  • which genes show changes in response?

3
  • types of response
  • up-regulation
  • down-regulation
  • transient response (spike)
  • (arbitrary temporal patterns)
  • Problem: can cluster genes based on response
    pattern, but then what?
  • not all genes in cluster are regulated the same
    way

4
  • Couple with genomic analysis
  • search for common motifs in upstream regions
  • subsets of co-regulated genes within clusters
  • Assumptions
  • 1. regulation occurs by interaction of
    transcription factors with small motifs
    (10-20bp) within several hundred bp of
    transcription start site
  • 2. among many motifs, the ones of interest will
    be common to some genes in a cluster, but not
    found in any genes outside (with different
    responses)
  • 3. the motif does not have to be shared by all
    genes in the cluster, only a subset

5
Related Work
  • Many algorithms exist for motif finding
  • assume cluster (gene set) is already defined
  • word/string analysis models
  • probabilistic models
  • Gibbs sampling (AlignACE, MotifSampler)
  • Expectation Maximization (MEME)
  • HMMs
  • graph algorithms (e.g. clique)
  • Pevzner and Sze
  • what if the motif only appears in a subset of genes?
  • occurrence count is a parameter in MotifSampler, MEME

6
Overview: Our Approach
  • 1. Definition of regulation patterns
  • 2. Extraction of upstream sequences (for up-reg)
  • 3. Define control set (genes with no change)
  • 4. Make a list of all 12-mers in upstream regions
  • 5. Find motifs that occur (more than once) in
    up-regulated set, but not at all in control set
  • 6. Group the motifs using clustering, form
    consensus of patterns

7
Define Regulation Patterns
  • measured at 0, 5, and 15 min after recovery
  • discrete representation of changes in expression
    levels
  • relative to exp. growth phase conditions
  • 1: >2-fold increase
  • -1: >2-fold decrease
  • 0: otherwise (no significant change)
  • up-regulation patterns:
  • (0,1,1) (0,1,0) (0,0,1) (-1,1,1) (-1,1,0)
    (-1,0,1)
  • define control set: (0,0,0) (1,1,1) (-1,-1,-1)
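
The discretization on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes each measurement is a linear (not log) expression ratio relative to exponential growth, ordered as 0, 5, and 15 min.

```python
from typing import Sequence, Tuple

# Pattern sets copied from the slide.
UP_PATTERNS = {(0, 1, 1), (0, 1, 0), (0, 0, 1), (-1, 1, 1), (-1, 1, 0), (-1, 0, 1)}
CONTROL_PATTERNS = {(0, 0, 0), (1, 1, 1), (-1, -1, -1)}


def discretize(ratio: float, fold: float = 2.0) -> int:
    """Map a ratio (sample / exponential-growth reference) to 1, -1, or 0."""
    if ratio > fold:
        return 1            # more than 2-fold increase
    if ratio < 1.0 / fold:
        return -1           # more than 2-fold decrease
    return 0                # no significant change


def pattern(ratios: Sequence[float]) -> Tuple[int, ...]:
    """Discretize the ratios at the three time points."""
    return tuple(discretize(r) for r in ratios)


def classify(ratios: Sequence[float]) -> str:
    """Assign a gene to the up-regulated set, the control set, or neither."""
    p = pattern(ratios)
    if p in UP_PATTERNS:
        return "up"
    if p in CONTROL_PATTERNS:
        return "control"
    return "other"
```

Genes labelled "other" (for example, down-regulated patterns) are used neither as positives nor as controls in this sketch.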

8
Extraction of Upstream Sequences
  • nominally, 600 bp upstream of translation start
    site (i.e. ORF start, not transcription start)
  • If gene is a member of an operon:
  • take 300 bp upstream of gene
  • plus 300 bp upstream of translation start of first
    gene in operon
  • databases: K12 sequence, GOLD
  • operon relationships: E. coli Linkage Map (Berlyn
    et al.)
  • use reverse complement if transcribed in the
    reverse direction
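
A minimal sketch of the extraction rule, assuming the genome is available as a plain string and coordinates are 0-based positions on the forward strand. The function names and the coordinate convention are assumptions made for illustration; wrap-around on the circular chromosome is ignored, and the first gene of an operon is treated like a stand-alone gene.

```python
from typing import List, Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream(genome: str, start: int, strand: str, length: int) -> str:
    """Return up to `length` bases upstream of a translation start.

    For '+', `start` is the index of the first base of the start codon.
    For '-', `start` is the forward-strand index of the base pairing with
    the first base of the start codon (the gene's rightmost base).
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    # Reverse strand: upstream lies to the right in forward coordinates.
    return reverse_complement(genome[start + 1:start + 1 + length])


def upstream_regions(genome: str, gene_start: int, strand: str,
                     operon_first_start: Optional[int] = None,
                     full: int = 600, half: int = 300) -> List[str]:
    """600 bp for a stand-alone gene; 300 bp + 300 bp for an operon member."""
    if operon_first_start is None or operon_first_start == gene_start:
        return [upstream(genome, gene_start, strand, full)]
    return [upstream(genome, gene_start, strand, half),
            upstream(genome, operon_first_start, strand, half)]
```

Returning the two operon segments separately, rather than concatenated, avoids creating artificial 12-mers that span the junction between them.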

9
Pre-processing
  • extract all 12-mers (overlapping) from upstream
    regions of up-regulated genes
  • note: better than DFS
  • remove those that appear in the control set
  • remove those that are dissimilar to everything
    else (de-noising)
  • score = mean distance to all motifs not in same
    upstream region or operon
  • remove if score > 9/12 mismatches
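
The pre-processing steps above could look roughly like the following. This is a sketch, not the authors' code: the names are hypothetical, distance is taken to be Hamming distance, and "not in same upstream region or operon" is read as "occurring in at least one region where the motif itself does not occur".

```python
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int = 12) -> Set[str]:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_motifs(up_regions: Dict[str, str],
                     control_seqs: Iterable[str],
                     k: int = 12,
                     max_mean_mismatch: float = 9.0) -> Set[str]:
    """Extract k-mers, drop those seen in the control set, then de-noise.

    `up_regions` maps a region (or operon) id to its upstream sequence.
    """
    control: Set[str] = set()
    for seq in control_seqs:
        control |= kmers(seq, k)

    # Remember which region(s) each surviving k-mer came from.
    origin: Dict[str, Set[str]] = {}
    for region_id, seq in up_regions.items():
        for motif in kmers(seq, k) - control:
            origin.setdefault(motif, set()).add(region_id)

    kept: Set[str] = set()
    for motif, regions in origin.items():
        # Compare only against k-mers found in some other region.
        dists = [hamming(motif, other)
                 for other, other_regions in origin.items()
                 if other_regions - regions]
        # Score = mean distance; remove if it exceeds the threshold.
        if dists and sum(dists) / len(dists) <= max_mean_mismatch:
            kept.add(motif)
    return kept
```

The de-noising loop is quadratic in the number of candidate k-mers, which is tolerable for small up-regulated sets such as the one used in the experiments.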

10
Clustering
  • compute similarity matrix among motifs
  • repeatedly merge closest neighbors
  • minimum spanning tree
  • single-linkage clustering
  • Stop merging when dist > 3/12 mismatches
  • Form consensus: relax constraints on nucleotides
    at each position by disjunction
  • ACCATGGTATC
  • ACGATGGTATT
  • ACTATAGTATC
  • AC(CTG)AT(AG)GTAT(TC)
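
As an illustration of the clustering and consensus steps, the sketch below uses the fact that single-linkage clustering cut at a distance threshold yields the connected components of the graph linking motifs within that threshold, so a union-find structure is substituted for an explicit merge loop. Function names are hypothetical, and distance is assumed to be Hamming distance.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(motifs: List[str], max_dist: int = 3) -> List[List[str]]:
    """Group motifs so that any two within max_dist share a cluster."""
    parent = list(range(len(motifs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(motifs)):
        for j in range(i + 1, len(motifs)):
            if hamming(motifs[i], motifs[j]) <= max_dist:
                parent[find(i)] = find(j)   # merge the two clusters

    clusters: Dict[int, List[str]] = {}
    for i, motif in enumerate(motifs):
        clusters.setdefault(find(i), []).append(motif)
    return list(clusters.values())


def consensus(cluster: List[str]) -> str:
    """Per-position disjunction of the nucleotides seen in a cluster."""
    out = []
    for column in zip(*cluster):
        letters = sorted(set(column))
        out.append(letters[0] if len(letters) == 1
                   else "(" + "".join(letters) + ")")
    return "".join(out)
```

Applied to the three sequences on the slide, `consensus` gives `AC(CGT)AT(AG)GTAT(CT)`, the same pattern as above with the alternatives at each position listed alphabetically.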

11
Experiments
  • Starvation of E. coli for glucose in medium
  • 3 time-points: starved (0 min), 5 min, 15 min
  • Data collected in Siegele lab
  • up-regulated: 22 genes
  • control set: 1361 genes

12
Motifs Found
13
Sequence Logos
14
Distance to Transcription Start
15
Other Forms of Validation
  • Palindromicity: 11/13 motifs have index > 0.5
  • TRANSFAC database
  • e.g. motif 2 matches pattern for MetJ-MetF site
  • a number of other hits for known transcription
    factors
  • biological verification awaits...
  • role in regulation pathway for starvation
    response?
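
The slides do not define the palindromicity index. Purely as an assumed, illustrative definition, one could take the fraction of positions at which a motif agrees with its own reverse complement, so that a perfect reverse-complement palindrome scores 1.0. The function names below are hypothetical and the authors' index may differ.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def palindromicity(motif: str) -> float:
    """Assumed index: fraction of positions matching the reverse complement."""
    rc = reverse_complement(motif)
    return sum(a == b for a, b in zip(motif, rc)) / len(motif)
```

For example, the EcoRI site `GAATTC` is its own reverse complement and scores 1.0 under this definition.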

16
Conclusions
  • Augment cluster-analysis of expression patterns
    with motif analysis
  • Efficient method for generating candidates
  • from 12-mers in upstream regions
  • Efficient method for screening them
  • empirically, against a control set, rather than
    probabilistic background model
  • Advantage: pattern does not have to be in all the
    genes in a set
  • Challenges: defining appropriate upstream regions
    and the right control set (as filter)