1
Phylogenetic footprinting
Topics in Computational Biology
Ilkka Vaahtoranta, 4.3.2004
2
Phylogenetic footprinting
  • Introduction
  • Methods used
  • Substring Parsimony Problem (Torsten)
  • Results

3
Introduction
4
The Problem
  • A major challenge of current genomics is to
    understand how gene expression is regulated.
  • An important step towards this understanding is
    the ability to identify regulatory elements.

5
In a Nutshell
  • Phylogenetic footprinting is a method for the
    discovery of regulatory elements in a set of
    orthologous regulatory regions from multiple
    species. It does so by identifying the best
    conserved motifs in those orthologous regions.
  • The idea of phylogenetic footprinting was
    introduced as early as 1988 (Tagle, Koop,
    Goodman...)
  • At that time it was somewhat ahead of its time
  • Only a few sequences from related species were
    available

6
Orthologous vs. Analogous
  • Orthologous sequences have the same function in
    different species and are related (they descend
    from a common ancestral sequence)
  • Analogous sequences have the same kind of
    function but are not related
  • Phylogenetic footprinting uses orthologous
    sequences

7
Regulatory Elements
[Figure: gene structure diagram. Labels: 5' end, promoter sequence, REs, gene, exon, intron]
Promoter
  • Usually lies before (upstream of) the actual gene
  • Rarely after the gene
  • Approximately 600-1000 bp long
  • Holds regulatory elements

Regulatory elements
  • Relatively short sequences, from 5 to 25 bp long
  • May contain gaps
  • Appear in otherwise non-functional sequence

8
Multiple genes in single species
  • Single species
  • Related genes
  • This technique is used to find common regulatory
    factors
  • Only within a given organism
  • REs of a single gene are not found
  • This is not phylogenetic footprinting

9
Multiple species with orthologous regulatory
regions
What do we need to identify regulatory elements?
  • A set of orthologous non-coding DNA sequences
    from related species
  • For example, one might use the non-coding
    sequence of the insulin gene in ten different
    vertebrates
  • A well-conserved region is a possible RE
  • This is phylogenetic footprinting

10
Why examine non-coding sequences?
  • Functional sequences evolve at a slower rate than
    non-functional sequences because of selective
    pressure
  • A mutation in a coding sequence (gene) may
    change the whole function of the encoded protein
  • A mutation in a regulatory element (RE) may
    only change the expression frequency of a gene

11
Phylogenetic footprinting exploits the difference
in mutation rate between functional and
non-functional sequences
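
As a toy illustration of this idea, the sketch below scans a set of sequences for windows in which every column is identical across all species. It assumes the sequences are already aligned and of equal length, which real phylogenetic footprinting does not require; the function name `conserved_windows` and the three toy sequences are made up for this example.

```python
def conserved_windows(seqs, k):
    """Return start positions of length-k windows identical in all sequences.

    Assumes the sequences are pre-aligned and of equal length (a
    simplification made only for this illustration).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    # A column is conserved when all sequences carry the same nucleotide.
    conserved = [len({s[i] for s in seqs}) == 1 for i in range(length)]
    return [
        start
        for start in range(length - k + 1)
        if all(conserved[start:start + k])
    ]


# Hypothetical toy data: a shared 8 bp block inside diverged flanks.
toy = [
    "ACGTTGACGTCAGGA",
    "TCATTGACGTCACTA",
    "GCGATGACGTCATGC",
]

for start in conserved_windows(toy, 8):
    print(start, toy[0][start:start + 8])
```

The diverged flanks play the role of the faster-evolving non-functional sequence, and the shared block plays the role of a candidate regulatory element.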
12
Methods used
13
Global Multiple Alignment
  • ClustalW, a global multiple alignment (GMA) tool
  • Global multiple alignment drawbacks
  • It is an NP-hard problem.
  • Even if an optimal multiple alignment could
    identify all REs, we could not compute it.
  • Because REs are quite short (about 10 in 1000
    nucleotides), the noise of diverged
    non-functional sequence will overwhelm the
    short conserved signal.

14
Classical motif finding
  • Standard motif finding (MEME, AlignACE,
    ANN-Spec....)
  • Segment based motif finding (DIALIGN...)
  • Outperform global multiple alignment

All have an important shortcoming
  • They do not take phylogenetic relationships
    into account
  • Closely related sequences get too much weight
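
The weighting problem can be made concrete with a small sketch. It uses mean pairwise identity as a stand-in for a conservation score that ignores the phylogeny; this measure, the species labels, and the toy sites are assumptions chosen for illustration and are not taken from any of the tools named above.

```python
from itertools import combinations


def mean_pairwise_identity(sites):
    """Fraction of site pairs that are identical, ignoring the phylogeny."""
    pairs = list(combinations(sites, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical candidate sites from three distantly related species.
distant = [
    "TGACGTCA",  # human
    "TGACGTCA",  # mouse
    "TGATGTCA",  # chicken
]

# Add two close relatives of human that carry the human site unchanged.
with_close_relatives = distant + [
    "TGACGTCA",  # chimp
    "TGACGTCA",  # gorilla
]

print(mean_pairwise_identity(distant))
print(mean_pairwise_identity(with_close_relatives))
```

The score rises from 1/3 to 0.6 although the added sequences had little evolutionary time to diverge and so contribute almost no independent evidence. A tree-aware score would still count a single mutation in both cases.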

15
Substring Parsimony Problem, a motif finding
algorithm
  • Formalization of the phylogenetic footprinting
    (PF) idea
  • Also an NP-hard problem, but easy to tune to
    eliminate exponential behavior
  • Substring parsimony searches for the best
    alignments in a given sequence set.
  • The difference between substring parsimony and
    multiple alignment lies in the given
    phylogenetic tree.
  • Multiple alignment does not take the
    relationships of the given species into account.
    This leads to a situation where closely related
    sequences in the set get a relatively high
    weight in the solution.
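
A minimal brute-force sketch of the idea, under stated assumptions: pick one length-k substring per species, score the choice by Fitch parsimony on a fixed tree, and keep choices with score at most d. The nested-tuple tree encoding, the function names, and the toy sequences are hypothetical, and exhaustive enumeration is used only for clarity; this is not FootPrinter's algorithm.

```python
from itertools import product


def fitch(tree, leaf_chars):
    """Fitch small parsimony for one site: (state set, mutation count)."""
    if isinstance(tree, str):  # a leaf is a species name
        return {leaf_chars[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_chars)
    right_set, right_cost = fitch(tree[1], leaf_chars)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, kmers):
    """Sum of per-site Fitch scores for one k-mer per species."""
    k = len(next(iter(kmers.values())))
    return sum(
        fitch(tree, {sp: s[i] for sp, s in kmers.items()})[1]
        for i in range(k)
    )


def substring_parsimony(tree, seqs, k, d):
    """Enumerate every choice of one k-mer per species; keep scores <= d."""
    species = list(seqs)
    starts = [range(len(seqs[sp]) - k + 1) for sp in species]
    hits = []
    for choice in product(*starts):
        kmers = {sp: seqs[sp][i:i + k] for sp, i in zip(species, choice)}
        score = parsimony_score(tree, kmers)
        if score <= d:
            hits.append((score, kmers))
    return sorted(hits, key=lambda hit: hit[0])


# Hypothetical toy input: a binary tree as nested tuples, short sequences.
tree = (("human", "chimp"), ("mouse", "rat"))
seqs = {
    "human": "GGTGACAT",
    "chimp": "CATGACGG",
    "mouse": "TGACCCTA",
    "rat": "AATGTCTT",
}

for score, kmers in substring_parsimony(tree, seqs, k=4, d=1):
    print(score, kmers)
```

Because the score is computed on the tree, two identical sites in sister species cost nothing and add little weight, which is the difference from tree-blind scoring that the slide describes. The enumeration is exponential in the number of species and is only workable for toy inputs like this one.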

16
Substring Parsimony Problem
Your turn Torsten...
17
Results
18
The Footprinter
  • Available for free at
    http://bio.cs.washington.edu/software.html
  • Uses the substring parsimony method to identify
    possible motifs
  • Is under constant development
  • Example data at
    http://www.soe.ucsc.edu/blanchem/gh1/some_gh1.fa.main.html

19
Algorithm performance
  • FootPrinter program
  • Data set: c-myc proto-oncogene upstream sequences
  • k = 12 (motif length)
  • d = 3 (maximum parsimony score)
  • n = 10 (number of sequences)
  • L (sequence length) varies between 450 and 900
    nucleotides
  • Solution: 3 distinct conserved substrings
  • Computer: P3 550 MHz, 512 MB RAM
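
To get a feel for the size of this problem, the arithmetic below compares two quantities, reading k as the motif length, n as the number of sequences, and L as the sequence length (an interpretation of the slide's parameters). The first is the number of ways to pick one substring per sequence, which a naive enumeration would have to visit. The second is the number of distinct DNA k-mers, which bounds the table size of an approach that tabulates one score per possible k-mer. The function names are made up for this sketch.

```python
def naive_combinations(n, L, k):
    """Ways to choose one length-k substring from each of n length-L sequences."""
    return (L - k + 1) ** n


def distinct_kmers(k, alphabet_size=4):
    """Number of different strings of length k over the DNA alphabet."""
    return alphabet_size ** k


n, k = 10, 12
for L in (450, 900):
    print(f"L={L}: {naive_combinations(n, L, k):.3e} substring choices")
print(f"k={k}: {distinct_kmers(k)} distinct k-mers")
```

For L = 450 the naive count is 439 to the power 10, roughly 10^26, while there are only 4^12 = 16,777,216 distinct 12-mers.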

20
Algorithm performance

21
FP, Example results 1
  • Metallothionein gene family
  • Promoter sequences available for a wide variety
    of species
  • REs have been experimentally determined in
    several species
  • 4 major isoforms
  • FootPrinter
  • 590 bp upstream
  • k = 7, 8, 9, 10
  • 12 highly conserved regions, of which 4 have
    been confirmed
  • REs found are present in most of the isoforms

23
FP, Example results 2
  • Insulin gene family
  • Two rodents and a pig (two gene copies each)
  • Motifs with 0 mutations: k = 8
  • Motifs with 1 mutation: k = 9, 10
  • FootPrinter
  • Finds 4 verified motifs
  • Many were missed because they contained too many
    mutations
  • Searching for motifs of length 12 and 15 did not
    improve the results

24
FP, insulin gene family
  • Why were many known binding sites missed?
  • Five categories
  • No matches in other species
  • Conserved regions are shorter than the motif
    length searched for
  • Insertions and deletions are not allowed
  • Motifs do not meet statistical thresholds
  • Motifs with an internal variable sequence

25
Other computational methods
  • ClustalW
      • Tree based global multiple alignment tool
      • Good results with closely related sequences
      • Bad results if sequences are diverged
      • As fast as FootPrinter on this test set
  • DIALIGN
      • Segment based multiple alignment tool
      • Good results because it searches for short
        conserved regions
      • Same set of found motifs as FootPrinter on
        this test set
      • 10 times slower than FootPrinter on large
        datasets

26
Other computational methods
  • MEME
      • Motif finding tool
      • Searches for motifs with high information
        content
      • Motifs may appear in a different order in
        different sequences
      • Approximately the same set of found motifs as
        FootPrinter
      • As fast as FootPrinter on this test set
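
For readers unfamiliar with the term, the sketch below shows one common way to quantify the information content of a set of aligned motif sites: 2 bits minus the Shannon entropy of each column, assuming a uniform DNA background and no small-sample correction. It illustrates the measure only and is not MEME's actual model or code; the function names and toy sites are made up.

```python
import math
from collections import Counter


def column_information(column):
    """Information content of one column in bits (uniform DNA background)."""
    total = len(column)
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(column).values()
    )
    return 2.0 - entropy


def motif_information(sites):
    """Sum of per-column information over equal-length aligned sites."""
    return sum(column_information(col) for col in zip(*sites))


# Hypothetical aligned sites: three columns conserved, one variable.
sites = ["TGAC", "TGAC", "TGTC", "TGAC"]
print(motif_information(sites))
```

A fully conserved column contributes the maximum of 2 bits, and a column with all four nucleotides equally frequent contributes 0 bits, so well-conserved motifs score high.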

27
Future work
  • Lack of sequences
  • Inaccuracies in the phylogenetic tree
  • Need to know more about what to expect
  • More filtering accuracy
  • How unusually well conserved the region is
  • Predicting pairs and triplets

28
Questions