Title: Efficient Exact p-Value Computation and Applications to Biosequence Analysis
1Milestones due today. Anything to report?
2Lecture 17
- Ultraconservation evolutionary data
- Finish early: come hear the talk with us?
3Sequence Conservation implies Function
Comparative Genomics of Distantly related species
functional region!
...CTTTGCGA-TGAGTAGCATCTACTATTT...
human
mammalian ancestor
...ACGTGGGACTGACTA-CATCGACTACGA...
mouse
- (but which function/s?...)
4Human Genome full of Conserved Non-Coding
Elements
Human Genome: 3×10^9 letters
1.5% known function
compare to other species
>50% junk
>5% of human genome functional
3x more functional DNA than known!
10^6 substrings do not code for protein
What do they do then?
5Conserved elements in the Human Genome
all human-mouse alignments
human-mouse ancestral repeats alignment
selection
Difference: 5% of Human Genome
85% id on average
Mouse consortium, Nature 2002
6Conserved elements in the Human Genome
all human-mouse alignments
human-mouse ancestral repeats alignment
Simple but Unexpected (the lure of Bioinformatics)
selection
Difference: 5% of Human Genome
Ultraconservation
85% id on average
Mouse consortium, Nature 2002
7Typical DNA Conservation levels
(dot = base identical to human)
Conserved elements between human and mouse are
on average 85% identical (Mouse consortium, 2002).
8Ultraconserved Elements
fish
481 elements perfectly conserved (100% id) over
200bp or more between human, mouse and rat.
Bejerano et al., Science 2004
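The definition above (100% id over 200bp or more in every species) can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length, with "-" marking gaps. The function name `ultraconserved_runs` and the toy sequences are hypothetical and are not taken from the original study's pipeline.

```python
def ultraconserved_runs(aligned, min_len=200):
    """Return half-open (start, end) column ranges in which every aligned
    sequence carries the same base, with no gap or N, for >= min_len columns."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None  # first column of the current identical run, if any
    for i in range(length + 1):  # one extra step closes a run at the end
        same = False
        if i < length:
            column = {s[i].upper() for s in aligned}
            same = len(column) == 1 and column.isdisjoint({"-", "N"})
        if same:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Toy example with a short threshold so the result is easy to check by eye.
human = "ACGTACGTAC"
mouse = "ACGAACGTAC"  # single mismatch at column 3
rat = "ACGTACGTAC"
print(ultraconserved_runs([human, mouse, rat], min_len=5))  # [(4, 10)]
```

With the default `min_len=200`, the same call would report only runs that meet the slide's length threshold.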
9Ultraconserved Elements Why?
Hundreds of long substrings identical between
amniotes? They must have rejected many different
changes. But... all functions we understand in
our genome are encoded using redundant codes.
CDS
ncRNA
TFBS
seq.
10Ultras are Functional
Back in 2004 we hypothesized:
481 ultraconserved elements
nonexonic subset: transcriptional regulators
exonic subset: post-transcriptional regulation
Pennacchio et al., Nature, 2006
Ni et al., Genes Dev.; Lareau et al., Nature,
2007
11Genomic Distribution of Ultraconserved Elements
Origins?
12UC.338 comes from an ancient repeat
ultraconserved exon
novel coelacanth repeat
enhancer
LF-SINE
Bejerano et al., Nature, 2006
13Ultras are Under Strong Human Selection
Mutational cold spots? NO. Rare (new) mutations
are introduced to the population. Fierce
purifying selection? YES. Very few of these get
anywhere near fixation.
A
A A G A
chimp
humans
NonSyn DAF
Ultra DAF
Katzman et al., Science, 2007
14Touch an Ultra And You - DIY
Ahituv et al., PLoS Biology, 2007
15What can't we measure in the lab?
Ne is the (effective) population size, s the
selective dis/advantage. Both are VERY wrong in
the lab.
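The formula this slide showed is not preserved in the transcript. As an assumption, the sketch below uses Kimura's standard diffusion approximation for the fixation probability of a new mutation in a diploid population, which depends on exactly these two quantities. The function name `fixation_probability` and the example values of Ne and s are hypothetical.

```python
import math


def fixation_probability(s, n_e):
    """Kimura's approximation for a new mutation (initial frequency 1/(2*Ne))
    with selection coefficient s in a diploid population of effective size Ne:
        P = (1 - exp(-2s)) / (1 - exp(-4*Ne*s))
    Assumption: census size equals effective size."""
    if s == 0:
        return 1.0 / (2 * n_e)  # neutral limit
    try:
        # expm1 keeps precision when s is tiny
        return math.expm1(-2 * s) / math.expm1(-4 * n_e * s)
    except OverflowError:
        # strongly deleterious mutation in a large population
        return 0.0


# A mildly deleterious mutation behaves almost neutrally in a small
# lab-sized population, but is essentially never fixed in a large one.
for n_e in (100, 10_000):
    print(n_e, fixation_probability(-0.001, n_e), 1.0 / (2 * n_e))
```

Under these assumptions, a small lab population can hide a selective disadvantage that is decisive over evolutionary time.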
16So it can happen but does it FIX?
DNA element
t
mouse
17Count Fraction Lost, Binned by % id
bin by % id
count_all
t
human macaque dog mouse rat
count_hole
100bp sliding window
dog
rat
mouse
human
macaque
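The counting scheme named on this slide (sliding window, bin by % id, count_all, count_hole) might be sketched as follows. Several things here are assumptions rather than details from the slide: that % id is measured across human, macaque and dog; that a "hole" means the whole window is gap in both mouse and rat; and the bin width. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def percent_identity(seqs, start, end):
    """Percent of columns in [start, end) where all sequences share one base."""
    same = sum(
        1
        for i in range(start, end)
        if seqs[0][i] != "-" and all(s[i] == seqs[0][i] for s in seqs)
    )
    return 100.0 * same / (end - start)


def fraction_lost_by_id(ingroup, outgroup, window=100, step=1, bin_width=5):
    """Slide a window over the alignment, bin each window by ingroup % id,
    and return {bin: count_hole / count_all}."""
    n = len(ingroup[0])
    count_all, count_hole = Counter(), Counter()
    for start in range(0, n - window + 1, step):
        end = start + window
        pid = percent_identity(ingroup, start, end)
        b = int(pid // bin_width) * bin_width  # lower edge of the % id bin
        count_all[b] += 1
        # assumed definition of a hole: window fully deleted in every outgroup
        if all(set(s[start:end]) == {"-"} for s in outgroup):
            count_hole[b] += 1
    return {b: count_hole[b] / count_all[b] for b in sorted(count_all)}


# Toy alignment with non-overlapping 4-column windows for readability.
human = "ACGTACGTACGT"
macaque = "ACGTACGTACGT"
dog = "ACGTACGTACGA"
mouse = "ACGT----ACGT"
rat = "ACGT----ACGT"
print(fraction_lost_by_id([human, macaque, dog], [mouse, rat], window=4, step=4))
# {75: 0.0, 100: 0.5}
```

In the toy data, two windows are 100% identical in the ingroup and one of them is a rodent hole, so the 100% bin has a lost fraction of 0.5.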
18Quite Some Time Later
19Pragmatic Genomics
define goal
run sensible approach
while (results full of artefacts)
    characterize artefact
    write handler into code
    rerun
bio
cs
bio
cs
e.g. sequencing errors, assembly errors,
contaminating sequence, ambiguous situations, etc.
20Ultras are Fiercely Retained through Evolution
No Apparent Phenotype
100% id primates-dog: 1,691,090bp; rodents
deleted: 1,447bp (0.086%)
Ultras are >300 fold more persistent than neutral
DNA
But Doomed ...
the genomic deletion is
(25 deleted)
21How special are the Ultras?
selection
Ultraconservation
22Adding More Species
Aha!!
23Adding More Species
More and more species
Few species
Hmmm.
24Most Non-Coding Elements likely work in cis
IRX1 is a member of the Iroquois homeobox gene
family. Members of this family appear to play
multiple roles during pattern formation of
vertebrate embryos.
gene deserts
regulatory jungles
9Mb
25 and Ultras are the tip of a functional iceberg
gene deserts
regulatory jungles
9Mb
This dense regulatory jungle contains a single
ultra