Sequence Evolution Modeling

1
Sequence Evolution Modeling
  • Programming Project
  • SoCalBSI 2006
  • Sarah Aerni

2
Evolution Theory
  • Organisms share common ancestor
  • At speciation, both genomes are identical
  • Mutations accumulate over multiple generations
    causing organismal genomes to look dissimilar
  • Some regions are conserved across organisms
    • Genes
    • Functional regulatory elements
    • Regions of unknown function
  • Distance between organisms can be measured by
    edit distance between genomes
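
As a minimal sketch of the edit-distance idea, the function below computes the Levenshtein distance (substitutions, insertions, and deletions each cost 1) between two sequences. The slides do not say which edit-distance variant was intended, so the unit costs and the name `edit_distance` are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (unit costs assumed)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        curr = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            curr.append(min(prev[j] + 1,          # delete base_a
                            curr[j - 1] + 1,      # insert base_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

This dynamic program takes time proportional to the product of the two lengths, so it suits short regions rather than whole genomes.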

3
Evolutionary Tree Reconstruction - History
  • Emile Zuckerkandl and Linus Pauling pioneered
    reconstruction using molecular sequence analysis
  • Use parsimony to determine the shortest distance
    between any two genomes
  • Used to study human populations
  • Out of Africa Theory
  • Modern humans did not evolve from Neanderthals
  • Based on mitochondrial data
  • Divergence between Neanderthals and humans is
    greater than expected from the number of
    generations estimated by carbon dating
  • Can be important in studying disease

4
Evolution of a Genome
  • At speciation both sequences are identical
  • Mutations occur in genomes separately
  • Natural selection determines which changes will
    be carried on to future generations
  • Gene duplication
    • Positive selection: increased fitness as a result
      of increased gene product (globin)
    • Negative selection: over-expression of genes
      causes defects (Down syndrome)
  • Regulatory mutations
    • Changes in transcription factor binding sites
      affect expression of genes

5
Simulating Sequence Evolution
  • Build a model (6th-order Markov) of the human genome
    • Create a dictionary of all 7-mers and record their
      corresponding frequencies in the genome
  • Use the model to create an initial sequence
    • Assume all bases are represented equally (25%
      chance of A, C, G, or T) and randomly create a 6-mer
    • Elongate the sequence using probabilities given
      by the model built from real sequence data
  • Assign regions of sequence to be of certain types
    • Assign conserved regions (pick a location, extend
      150-500 bp)
    • Determine where motifs will be located (regions
      will be considered conserved as long as the motif is
      not destroyed)
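
The model-building and elongation steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, and the uniform fallback for a 6-mer context that never appears in the training data is an assumption the slides do not address.

```python
import random
from collections import Counter, defaultdict

ORDER = 6  # each base depends on the 6 preceding bases, so 7-mers are counted

def count_kmers(genome: str, k: int = ORDER + 1) -> Counter:
    """Record how often each k-mer occurs in the training sequence."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

def build_model(kmer_counts: Counter) -> dict:
    """Map each 6-mer context to counts of the base that follows it."""
    model = defaultdict(Counter)
    for kmer, n in kmer_counts.items():
        model[kmer[:-1]][kmer[-1]] += n
    return model

def generate(model: dict, length: int, rng: random.Random) -> str:
    """Seed with a uniformly random 6-mer, then extend one base at a time."""
    seq = [rng.choice("ACGT") for _ in range(ORDER)]
    while len(seq) < length:
        followers = model.get("".join(seq[-ORDER:]))
        if followers:
            bases, weights = zip(*followers.items())
            seq.append(rng.choices(bases, weights=weights)[0])
        else:
            # Assumption: unseen context, so fall back to a uniform draw.
            seq.append(rng.choice("ACGT"))
    return "".join(seq)
```

For example, `generate(build_model(count_kmers(training)), 60_000, random.Random(0))` would produce a 60 kb starting sequence, where `training` stands for whatever real sequence data is available.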

6
Simulating Sequence Evolution
  • Permit rounds of mutations to occur
  • Mutation rate will be based on the profile
    input by the user
  • Multiple motifs are possible
  • Each location is required to have a nonzero
    probability of being seen and must meet input
    threshold

7
Simulating Sequence Evolution
  • Mutation types are variable
  • Point mutations and indels are represented
  • Insertions will be repeats of the 1-3 bases
    preceding or following the insertion location
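
A rough sketch of the three mutation types is given below. The insertion rule follows the slide (a repeat of the 1-3 bases preceding or following the insertion point). The deletion size and all function names are assumptions, since the slides do not describe how deletions are drawn.

```python
import random

BASES = "ACGT"

def point_mutation(seq: str, pos: int, rng: random.Random) -> str:
    """Replace the base at pos with a different base chosen at random."""
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]

def deletion(seq: str, pos: int, size: int) -> str:
    """Remove `size` bases starting at pos (size is an assumed parameter)."""
    return seq[:pos] + seq[pos + size:]

def insertion(seq: str, pos: int, size: int, use_preceding: bool) -> str:
    """Insert, at pos, a copy of the `size` bases before or after pos."""
    if use_preceding:
        repeat = seq[max(0, pos - size):pos]
    else:
        repeat = seq[pos:pos + size]
    return seq[:pos] + repeat + seq[pos:]
```

With `size` drawn from 1 to 3, `insertion` produces the short tandem repeats described above; near the ends of the sequence the copied segment may be shorter than requested.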

8
Simulating Sequence Evolution
  • Motif changes
    • Motif will be chosen based on a profile and
      locations will be identified
    • Mutation rates will be adjusted accordingly
    • Rates will vary by location in the motif
    • Loss and appearance of motifs will be adjusted
      dynamically (after each generation)
    • If all motifs in a sequence are destroyed, the
      organism is reset to the previous generation
  • Possible directions
    • Motif finders can be used to re-identify the
      motifs
    • Allow multiple motifs to be inserted
    • Spontaneously pick up any motifs that match a
      profile
    • Incorporate distance to start site?
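
The motif check and the reset rule can be sketched as below. The slides state only that every position must have a nonzero profile probability and that a site must meet an identity threshold. Measuring identity against the profile's consensus base is an assumption made here, and the names `is_motif`, `find_motifs`, and `next_generation` are hypothetical.

```python
from typing import Callable, Dict, List

Profile = List[Dict[str, float]]  # one base -> probability map per position

def is_motif(site: str, profile: Profile, threshold: float) -> bool:
    """True if every base is allowed and identity to consensus >= threshold."""
    if len(site) != len(profile):
        return False
    # Each position must have a nonzero probability under the profile.
    if any(col.get(base, 0.0) <= 0.0 for base, col in zip(site, profile)):
        return False
    # Assumption: identity is the fraction of positions equal to the consensus.
    consensus = [max(col, key=col.get) for col in profile]
    identity = sum(b == c for b, c in zip(site, consensus)) / len(profile)
    return identity >= threshold

def find_motifs(seq: str, profile: Profile, threshold: float) -> List[int]:
    """Start positions of every window that still counts as a motif."""
    width = len(profile)
    return [i for i in range(len(seq) - width + 1)
            if is_motif(seq[i:i + width], profile, threshold)]

def next_generation(seq: str, mutate: Callable[[str], str],
                    profile: Profile, threshold: float) -> str:
    """Apply one round of mutation; reset if no motif survives."""
    candidate = mutate(seq)
    if find_motifs(candidate, profile, threshold):
        return candidate
    return seq  # all motifs destroyed: keep the previous generation
```

Rescanning the whole sequence each generation is the simplest way to let motifs appear as well as disappear; a real run over 60 kb regions would likely rescan only near mutated positions.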

9
Human-Rodent precursor model
  • Peroxisome-proliferator-activated receptor gamma
  • Bound by heterodimer
  • Profile at right used for mutations in motifs

10
Human-Rodent precursor model
  • 10 million generations
  • 60 kb regions, 5% conserved initially
  • Requires a minimum of 1 motif per sequence
  • Threshold for a motif is 80% identity
  • Mutation rates
    • Conserved: 1 mutation/100 Mb/generation
    • Non-conserved: 1 mutation/60 Mb/generation
  • 50 simulations to collect results
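
To get a feel for these parameters, the expected number of mutations per sequence can be worked out directly. The calculation below assumes that the rates are per megabase per generation, that 5% of the 60 kb region is conserved, and that mutations rejected by the motif rule are ignored; none of these readings is confirmed by the slides beyond the numbers themselves.

```python
GENERATIONS = 10_000_000
REGION_BP = 60_000
CONSERVED_PERCENT = 5        # assumption: the slide's "5" means 5%
CONSERVED_RATE = 1 / 100e6   # assumption: mutations per base per generation
NONCONSERVED_RATE = 1 / 60e6 # same unit assumption as above

def expected_mutations(bases: int, rate_per_base: float, generations: int) -> float:
    """Expected mutation count = bases x per-base rate x generations."""
    return bases * rate_per_base * generations

conserved_bp = REGION_BP * CONSERVED_PERCENT // 100
nonconserved_bp = REGION_BP - conserved_bp
conserved_hits = expected_mutations(conserved_bp, CONSERVED_RATE, GENERATIONS)
nonconserved_hits = expected_mutations(nonconserved_bp, NONCONSERVED_RATE, GENERATIONS)
print(round(conserved_hits), round(nonconserved_hits))
```

Under these assumptions the expected counts come to about 300 mutations in the 3,000 conserved bases and about 9,500 in the 57,000 non-conserved bases, or roughly 0.10 and 0.17 mutations per base respectively.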

11
Human-Rodent results
  • Distribution of mutations
  • Average sequence size is 60,041 bp, 5.02%
    conserved after simulation
  • pipmatcher program was able to pick up
    conserved regions
  • 46 sequences had only one motif remaining, 3
    sequences with two, 1 sequence with three
  • Would incorporating a selective advantage for
    multiple TFBSs change the results?

12
Conservation results
  • Used pipmaker
  • Focus on exons still picks up conservation
  • Strong argument that conserved regions contain
    elements of interest

13
Conservation results

14
Motif Results
  • Most sequences experience at least one motif loss
  • Losses occur indiscriminately throughout the
    simulation

15
Motif Results
  • Most sequences do not evolve multiple binding
    sites
  • Death of a motif is more common
  • Adjust model for improved fitness?

16
Motif results
  • Identified motif locations were used to create a
    new logo
  • Logo is very similar to actual logo
  • If the threshold were increased, the profile would
    look stricter
  • Could this technique be used to determine motif
    finding thresholds?
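
One way to turn a set of recovered motif sites into the data behind a logo is sketched below: build a per-position frequency matrix, then compute each column's information content in bits, which is the quantity a sequence logo plots as column height. The function names are hypothetical, and the small-sample correction that logo tools may apply is left out.

```python
import math
from collections import Counter
from typing import Dict, List

def frequency_matrix(sites: List[str]) -> List[Dict[str, float]]:
    """Per-position base frequencies for equal-length motif sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({base: counts[base] / len(sites) for base in "ACGT"})
    return matrix

def information_content(column: Dict[str, float]) -> float:
    """Bits of information: 2 minus the Shannon entropy of the column."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy
```

A fully conserved position scores 2 bits and a uniform position scores 0, so a stricter identity threshold would be expected to push the recovered columns toward higher information content.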

Logo created from motifs in evolved sequences
Original logo

17
Further development?
  • Try to answer more complex evolutionary events
  • Gene duplication
  • Incorporate positive and negative selection for
    binding sites
  • Incorporate multiple motifs
  • Try to rediscover sites with motif finders
  • Model could easily be used as a source of
    synthetic data for training motif finders
  • Determine appropriate thresholds

18
Acknowledgments and References
  • Ali Mortazavi
  • SoCalBSI
  • Wold Lab
  • http://weblogo.berkeley.edu
  • http://pipmaker.bx.psu.edu
  • JASPAR database
  • Cistematic