Title: Annotation and Alignment of the Drosophila Genomes
1 Annotation and Alignment of the Drosophila Genomes
Centro de Ciencias Genómicas, May 29, 2006.
3 Genes or Regulation?
- 10,516 putative orthologs have been identified as a core gene set conserved over the 25-55 million years (Myr) since the pseudoobscura/melanogaster divergence.
- Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible.
Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.
4 http://rana.lbl.gov/drosophila/wiki/
5 BP England, U Heberlein, R Tjian. Purified Drosophila transcription factor, Adh distal factor-1 (Adf-1), binds to sites in several Drosophila promoters and activates transcription, J Biol Chem 1990.
6 GeneMapper
S. Chatterji and L. Pachter, Reference based annotation with GeneMapper, in press.
http://bio.math.berkeley.edu/genemapper/
7 Genes or Regulatory Elements?
- 10,516 (revised to 10,867) putative orthologs have been identified as a core gene set conserved over the 25-55 million years (Myr) since the pseudoobscura/melanogaster divergence.
- Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible.
Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.
8 Alignment of coding sequence
DroAna_20041206_  GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroMel_4_         GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_20041206_  GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_         GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_20040829_  GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_20041029_  GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_         GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
Alignment of non-coding sequence
DroAna_20041206_  CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
DroMel_4_         CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroMoj_20041206_  CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
DroPse_1_         CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_20040829_  CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
DroVir_20041029_  CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
DroYak_1_         CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT

DroAna_20041206_  AATC-----ACTTAC
DroMel_4_         ATTCTATGGACTCAC
DroMoj_20041206_  ----TATTTACTCAC
DroPse_1_         ------TGTACTTAC
DroSim_20040829_  ATTCTATGGACTCAC
DroVir_20041029_  ----TATTTACTCAC
DroYak_1_         ATTTCATAAACTCAC
9
Alignment of non-coding sequence
droAna1.2448876      CTGAAGGAATTCTA--TATTAAAG-------------------------------
dm2.chr2L            CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
droMoj1.contig_2959  CTGGAATAGTTAATTTCATTGTAA---------CACATAAA--CGTTTTAAATTC
dp3.chr4_group3      CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
droSim1.chr2L        CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAAAGCGGG--TTATTC
droVir1.scaffold_6   CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA----TTCTCTAATTT
droYak1.chr2L        CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAATAGATCCT-TTATTT

droAna1.2448876      AAGATTTCTCATCATTGGTTGAATC---------------------ACTTAC
dm2.chr2L            -----------------------------------------TATGGACTCAC
droMoj1.contig_2959  -------------------------AAATATTT--------TATTGACTCAC
dp3.chr4_group3      -----------------------------------------TGT--ACTTAC
droSim1.chr2L        -----------------------------------------TATGGACTCAC
droVir1.scaffold_6   ---------------------------------AAATATTTGGTCCACTCAC
droYak1.chr2L        -----------------------------------------CATAAACTCAC
10 Example of a conserved microRNA target
11
Per-site analysis
Group 1 mean per-site identity              51.3    51.3    47.8
Group 2 mean per-site identity              47.8    42.9    42.9
Difference of means (group 1 - group 2)      3.6     8.4     4.9
Difference-of-means resampling p-value      0.05    0.003   1E-5
Distribution comparison KS p-value          0.026   0.0016  2E-6
Per-base analysis
Group 1 mean per-base identity              47.8    47.8    46.3
Group 2 mean per-base identity              46.3    42.4    42.4
Difference of means (group 1 - group 2)      1.5     5.4     3.9
Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.
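The slide does not say how the resampling p-value was obtained. One common choice is a permutation test on the difference of means; the sketch below assumes that procedure with a one-sided alternative (group 1 has the higher mean). The names `permutation_pvalue`, `n_resamples` and `seed` are hypothetical and only for illustration.

```python
import random
from statistics import fmean


def permutation_pvalue(group1, group2, n_resamples=10_000, seed=0):
    """One-sided permutation p-value for mean(group1) - mean(group2).

    Assumption: under the null hypothesis the group labels are exchangeable,
    so shuffling the pooled values yields the null distribution of the
    difference of means.
    """
    rng = random.Random(seed)
    observed = fmean(group1) - fmean(group2)
    pooled = list(group1) + list(group2)
    k = len(group1)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if fmean(pooled[:k]) - fmean(pooled[k:]) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    return (hits + 1) / (n_resamples + 1)
```

For the distribution-comparison row, SciPy's `scipy.stats.ks_2samp(data1, data2)` performs a two-sample Kolmogorov-Smirnov test.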
12 How is an alignment made from two sequences?
Given two sequences of lengths n, m
>dm2.chr2L
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC
>dp3.chr4_group3
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTACTTAC
n = 50
m = 62
?
dm2.chr2L        CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
dm2.chr2L        TATGGACTCAC
dp3.chr4_group3  TGT--ACTTAC
14
dm2.chr2L        CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
dm2.chr2L        TATGGACTCAC
dp3.chr4_group3  TGT--ACTTAC

DroMel_4_        CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroPse_1_        CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroMel_4_        ATTCTATGGACTCAC
DroPse_1_        ------TGTACTTAC

Each alignment can be summarized by counting the number of matches (M), mismatches (X), gaps (G), and spaces (S). Every match or mismatch column uses one character from each sequence and every space uses one character from a single sequence, so $2(M+X) + S = n + m = 112$. Hence X, G and S suffice to specify a summary.
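The bookkeeping above can be sketched in a few lines. This is a minimal illustration, not code from the talk: `summarize` and `check_identity` are hypothetical helpers, '-' is assumed to mark a space, and a gap is taken to be a maximal run of consecutive spaces in the same row.

```python
def summarize(row1: str, row2: str, space: str = "-") -> dict:
    """Return the counts M, X, G, S of a pairwise alignment.

    M: match columns, X: mismatch columns, S: space characters,
    G: gaps, i.e. maximal runs of consecutive spaces in the same row.
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    m = x = g = s = 0
    gap_row = None  # row (0 or 1) that held a space in the previous column
    for a, b in zip(row1, row2):
        if a == space and b == space:
            raise ValueError("a column cannot consist of two spaces")
        if a == space or b == space:
            s += 1
            row = 0 if a == space else 1
            if row != gap_row:  # a new gap opens in this column
                g += 1
            gap_row = row
        else:
            gap_row = None
            if a == b:
                m += 1
            else:
                x += 1
    return {"M": m, "X": x, "G": g, "S": s}


def check_identity(row1: str, row2: str, space: str = "-") -> bool:
    """Verify 2(M + X) + S = n + m, with n, m the ungapped sequence lengths."""
    c = summarize(row1, row2, space)
    n = len(row1.replace(space, ""))
    m_len = len(row2.replace(space, ""))
    return 2 * (c["M"] + c["X"]) + c["S"] == n + m_len
```

For example, `summarize("ACGT", "A-GT")` returns `{"M": 3, "X": 0, "G": 1, "S": 1}`, and on the dm2/dp3 alignment above it reports 3 gaps and 12 spaces.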
18 The summary of an alignment is a point (X, G, S) in 3-dimensional space. For example, the two alignments just shown correspond to the points (22, 3, 12) and (18, 3, 28). In the example of our two sequences there are 379,522,884,096,444,556,699,773,447,791,552,717,765,633 different alignments, but only 53,890 different summaries, so we don't need to plot that many points. But 53,890 is still quite a large number. Fortunately, there are only 69 vertices on the convex hull of the 53,890 points. These are the interesting ones, and we can even draw them.
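The gap between the number of alignments and the number of summaries can be made concrete. The sketch below counts global alignments under the assumption that an alignment is a lattice path from (0, 0) to (n, m) whose steps are a space in either sequence or a match/mismatch column; these counts are the Delannoy numbers. `count_alignments` is a hypothetical helper, and we have not checked that this convention reproduces the exact figure on the slide.

```python
def count_alignments(n: int, m: int) -> int:
    """Number of global alignments of sequences of lengths n and m.

    Assumption: steps (1, 0) and (0, 1) place a character opposite a space,
    step (1, 1) aligns two characters; the path counts are Delannoy numbers.
    Python integers are unbounded, so large values are exact.
    """
    row = [1] * (m + 1)  # i = 0: only horizontal steps reach (0, j)
    for _ in range(n):
        new = [1] * (m + 1)  # j = 0: only vertical steps reach (i, 0)
        for j in range(1, m + 1):
            # last step horizontal, vertical or diagonal
            new[j] = new[j - 1] + row[j] + row[j - 1]
        row = new
    return row[m]
```

`count_alignments(50, 62)` is a number with more than 40 digits, which is why one enumerates summaries rather than alignments.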
19 For the sequences
>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC
the alignment polytope is: [figure not transcribed]
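The polytope on this slide lives in three dimensions (X, G, S). As a simpler two-dimensional sketch, assume the linear-gap model, where a summary is just (x, s) = (mismatches, spaces). All summaries can then be collected by dynamic programming over prefixes, and the polytope's vertices are the vertices of their convex hull (computed here with Andrew's monotone-chain algorithm). This keeps every summary at every cell, so it is only practical for short sequences; keeping only hull vertices at each cell (polytope propagation) is the idea that scales. `summaries` and `convex_hull` are hypothetical names.

```python
def summaries(a: str, b: str) -> set[tuple[int, int]]:
    """All (x, s) summaries of alignments of a and b (linear-gap model).

    table[i][j] holds the summaries of all alignments of a[:i] and b[:j];
    storing summaries instead of alignments keeps the sets small.
    """
    n, m = len(a), len(b)
    table = [[set() for _ in range(m + 1)] for _ in range(n + 1)]
    table[0][0].add((0, 0))
    for i in range(n + 1):
        for j in range(m + 1):
            cell = table[i][j]
            if i > 0:  # a[i-1] opposite a space
                cell |= {(x, s + 1) for x, s in table[i - 1][j]}
            if j > 0:  # b[j-1] opposite a space
                cell |= {(x, s + 1) for x, s in table[i][j - 1]}
            if i > 0 and j > 0:  # a[i-1] aligned with b[j-1]
                d = 0 if a[i - 1] == b[j - 1] else 1
                cell |= {(x + d, s) for x, s in table[i - 1][j - 1]}
    return table[n][m]


def convex_hull(points):
    """Hull vertices in counter-clockwise order; collinear points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, p, q):
        return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```

For "AC" against "AC" the summaries are (0, 0), (0, 2), (1, 2) and (0, 4); the point (0, 2) lies on an edge, so the hull has three vertices.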
20
mel  CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel  CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel  CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel  CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel  CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel  CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel  CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel  CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC
21 Consensus at a vertex
mel  CTGCGGGATTAGGGGTCATTAGAGT------GCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC
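22 The vertices of the polytope have special significance. Given parameters for a model, e.g. the default parameters for MULTIZ ($M = 100$, $X = -100$, $S = -30$, $G = -400$), the summary of an optimal alignment is found by maximizing the linear form $-200X - 400G - 80S$ over the polytope. This form follows from substituting $M = (n+m-S)/2 - X$ into the score $100M - 100X - 30S - 400G$, which gives $50(n+m) - 200X - 400G - 80S$; the constant $50(n+m)$ does not change which summary is optimal. Thus, the vertices of the polytope correspond to optimal alignments.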
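The substitution can be checked numerically. This sketch assumes summaries are (X, G, S) triples, as in the points (22, 3, 12) and (18, 3, 28), with each space charged S and each gap charged G; `linear_form`, `full_score` and `best_summary` are hypothetical helpers, not part of MULTIZ.

```python
def linear_form(x, g, s, match=100, mismatch=-100, space=-30, gap=-400):
    """Score of a summary (X, G, S) up to the constant (match / 2) * (n + m).

    Obtained by substituting M = (n + m - S) / 2 - X into
    match*M + mismatch*X + space*S + gap*G.
    """
    return (mismatch - match) * x + gap * g + (space - match / 2) * s


def full_score(x, g, s, n, m, match=100, mismatch=-100, space=-30, gap=-400):
    """Full alignment score; n + m - s is even for any valid summary."""
    matches = (n + m - s) // 2 - x
    return match * matches + mismatch * x + space * s + gap * g


def best_summary(points, **params):
    """Summary maximizing the linear form (the first one wins on ties)."""
    return max(points, key=lambda p: linear_form(*p, **params))
```

With the default parameters `best_summary([(22, 3, 12), (18, 3, 28)])` picks (22, 3, 12); raising the mismatch penalty (`mismatch=-300`) makes (18, 3, 28) optimal instead, which is the kind of parameter dependence the polytope records.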
23 Needleman-Wunsch Alignment
What is usually done is that a single set of parameters is specified ($M = 100$, $X = -100$, $S = -30$, $G = -400$ is a standard default) and the optimal vertex is then identified using dynamic programming. An alignment that is optimal for that vertex is then selected. The running time of the algorithm is $O(nm)$ [Needleman-Wunsch, 1970; Smith-Waterman, 1981], and it requires only $O(n+m)$ space [Hirschberg, 1975]. Standard scoring schemes are:
Parameters          Model
M, X, S             Jukes-Cantor with linear gap penalty
M, X, S, G          Jukes-Cantor with affine gap penalty
M, XTS, XTV, S, G   Kimura 2-parameter with affine gap penalty (XTS, XTV: transition and transversion mismatches)
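The dynamic-programming step for the affine-gap scheme can be sketched with Gotoh's three-state recurrences, which run in O(nm) time. This sketch returns only the optimal score and keeps just one previous row, so its memory is linear; recovering the alignment itself in linear space needs Hirschberg's divide-and-conquer, which is not shown. The scoring convention (a gap of length L scores G + L·S) and the name `affine_nw_score` are assumptions for illustration.

```python
NEG_INF = float("-inf")


def affine_nw_score(a: str, b: str, match: int = 100, mismatch: int = -100,
                    space: int = -30, gap: int = -400) -> float:
    """Optimal global alignment score with affine gaps (Gotoh's recurrences).

    V: best score of any alignment of the current prefixes,
    F: best score ending with a[i-1] opposite a space,
    e: best score ending with b[j-1] opposite a space (current row only).
    """
    n, m = len(a), len(b)
    # Row 0: b[:j] is aligned entirely against a single gap.
    V = [0] + [gap + j * space for j in range(1, m + 1)]
    F = [NEG_INF] * (m + 1)
    for i in range(1, n + 1):
        new_v = [gap + i * space] + [NEG_INF] * m
        new_f = [gap + i * space] + [NEG_INF] * m
        e = NEG_INF
        for j in range(1, m + 1):
            # Extend a gap in a, or open one after any alignment of the prefixes.
            e = max(e + space, new_v[j - 1] + gap + space)
            # Extend a gap in b, or open one.
            new_f[j] = max(F[j] + space, V[j] + gap + space)
            diag = V[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            new_v[j] = max(diag, e, new_f[j])
        V, F = new_v, new_f
    return V[m]
```

For example, "AAGGTT" against "AATT" scores 4 × 100 − 400 − 2 × 30 = −60: one gap of length 2 is cheaper than two gaps of length 1.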
24 Building Drosophila whole genome multiple alignments
- MAVID
  - http://hanuman.math.berkeley.edu/kbrowser
- MULTIZ
  - http://genome.ucsc.edu/
  - (currently no D. erecta)
25
MAVID
N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research 14 (2004), pp. 693-699.
27
MULTIZ
Blanchette et al., Aligning multiple genomic sequences with the threaded blockset aligner, Genome Research 14 (2004), pp. 708-715.
28 One (possibly wrong) alignment is not enough: the history of parametric inference
- 1992 Waterman, M., Eggert, M. & Lander, E.
  - Parametric sequence comparisons, Proc. Natl. Acad. Sci. USA 89, 6090-6093.
- 1994 Gusfield, D., Balasubramanian, K. & Naor, D.
  - Parametric optimization of sequence alignment, Algorithmica 12, 312-326.
- 2003 Wang, L. & Zhao, J.
  - Parametric alignment of ordered trees, Bioinformatics 19, 2237-2245.
- 2004 Fernández-Baca, D., Seppäläinen, T. & Slutzki, G.
  - Parametric multiple sequence alignment and phylogeny construction, Journal of Discrete Algorithms 2, 271-287.
XPARAL by Kristian Stevens and Dan Gusfield
29 Whole Genome Parametric Alignment
Colin Dewey, Peter Huggins, Lior Pachter, Bernd Sturmfels and Kevin Woods
- Mathematics and Computer Science
  - Parametric alignment in higher dimensions.
  - Faster new algorithms.
  - Deeper understanding of alignment polytopes.
- Biology
  - Whole genome parametric alignment.
  - Biological implications of alignment parameters.
  - Alignment with biology rather than for biology.
32 Computational geometry
34 A Whole Genome Parametric Alignment of D. melanogaster and D. pseudoobscura
- Divided the genomes into 1,116,792 constrained and 877,982 unconstrained segment pairs.
  - This is an orthology map of the two genomes.
- 2d, 3d, 4d, and 5d alignment polytopes were constructed for each of the 877,802 unconstrained segment pairs.
  - For each segment pair, obtain all possible optimal summaries for all parameters in a Needleman-Wunsch scoring scheme.
- Computed the Minkowski sum of the 877,802 2d polytopes.
  - There are only 838 optimal alignments of the two Drosophila genomes if the same match, mismatch and gap parameters are used for all the segment pair alignments.
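Why a Minkowski sum: if every segment pair is aligned with the same parameters, the genome-wide summary is the sum of the segment summaries, so the genome-wide polytope is the Minkowski sum of the segment polytopes. A minimal 2D sketch, assuming each polygon is given by its vertex list; it forms all pairwise vertex sums and keeps their convex hull, which is simple but is not the linear-time edge-merging method one would want for hundreds of thousands of polygons. `minkowski_sum` and `sum_polygons` are hypothetical names.

```python
from functools import reduce


def convex_hull(points):
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, p, q):
        return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def minkowski_sum(P, Q):
    """Minkowski sum of two convex polygons given by their vertices."""
    return convex_hull([(p[0] + q[0], p[1] + q[1]) for p in P for q in Q])


def sum_polygons(polygons):
    """Minkowski sum of many polygons, folding pairwise from the origin."""
    return reduce(minkowski_sum, polygons, [(0, 0)])
```

Each vertex of the summed polygon is a sum of one vertex from every segment polygon, i.e. one consistent choice of optimal summaries across all segments.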
40
>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC
How do we build the polytope for these sequences?
41 Alignment polytopes are small
Theorem: The number of vertices of an alignment polytope for two sequences of lengths n and m is $O\big((n+m)^{d(d-1)/(d+1)}\big)$, where d is the number of free parameters in the scoring scheme.
Examples:
Parameters          Model                                  Vertices
M, X, S             Jukes-Cantor with linear gap penalty   $O((n+m)^{2/3})$
M, X, S, G          Jukes-Cantor with affine gap penalty   $O((n+m)^{3/2})$
M, XTS, XTV, S, G   K2P with affine gap penalty            $O((n+m)^{12/5})$
L. Pachter and B. Sturmfels, Parametric inference for biological sequence analysis, Proceedings of the National Academy of Sciences 101, no. 46 (2004), pp. 16138-16143.
L. Pachter and B. Sturmfels, Tropical geometry of statistical models, Proceedings of the National Academy of Sciences 101, no. 46 (2004), pp. 16132-16137.
L. Pachter and B. Sturmfels (eds.), Algebraic Statistics for Computational Biology, Cambridge University Press.
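The exponents in the table agree with the formula if d is one less than the number of listed parameters (the relation $2(M+X)+S = n+m$ removes one degree of freedom); that is our reading of the slide. A small check with exact fractions follows. `vertex_exponent` and `vertex_bound_order` are hypothetical names, and since the bound hides constants, `vertex_bound_order` only indicates growth, not an actual vertex count.

```python
from fractions import Fraction


def vertex_exponent(d: int) -> Fraction:
    """Exponent d(d-1)/(d+1) in the O((n+m)^(d(d-1)/(d+1))) vertex bound."""
    return Fraction(d * (d - 1), d + 1)


def vertex_bound_order(n: int, m: int, d: int) -> float:
    """(n+m)^(d(d-1)/(d+1)): growth rate of the bound, constant omitted."""
    return (n + m) ** float(vertex_exponent(d))


# Free parameters d for the three schemes in the table (assumed reading).
SCHEMES = {"M, X, S": 2, "M, X, S, G": 3, "M, XTS, XTV, S, G": 4}
```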
42 Back to Adf1
BP England, U Heberlein, R Tjian. Purified
Drosophila transcription factor, Adh distal
factor-1 (Adf-1), binds to sites in several
Drosophila promoters and activates transcription,
J Biol Chem 1990.
43 Drosophila DNase I Footprint Database (v2.0)
Target Gene    Chromosome Arm  Start     Stop      Transcription Factor  PubMed ID (PMID)  Footprint ID (FPID)  Footprint Alignment
ems (CG2988)   3R              9723806   9723816   Abd-B (CG11648)       9491376           003205               Abd-B->ems003205
ems (CG2988)   3R              9723843   9723853   Abd-B (CG11648)       9491376           003206               Abd-B->ems003206
ems (CG2988)   3R              9723998   9724008   Abd-B (CG11648)       9491376           003208               Abd-B->ems003208
ems (CG2988)   3R              9724091   9724102   Abd-B (CG11648)       9491376           003209               Abd-B->ems003209
ems (CG2988)   3R              9724526   9724536   Abd-B (CG11648)       9491376           003211               Abd-B->ems003211
ems (CG2988)   3R              9724557   9724567   Abd-B (CG11648)       9491376           003213               Abd-B->ems003213
ems (CG2988)   3R              9724614   9724624   Abd-B (CG11648)       9491376           003214               Abd-B->ems003214
dpp (CG9885)   2L              2454657   2454685   Adf1 (CG15845)        7791801           003665               Adf1->dpp003665
Adh (CG3481)   2L              14615472  14615509  Adf1 (CG15845)        2105454           005046               Adf1->Adh005046
Ddc (CG10697)  2L              19116303  19116321  Adf1 (CG15845)        2318884           005464               Adf1->Ddc005464
Antp (CG1028)  3R              2825018   2825059   Adf1 (CG15845)        2318884           006446               Adf1->Antp006446
Adh (CG3481)   2L              14616171  14616209  Adf1 (CG15845)        2105454           005059               Adf1->Adh005059
Antp (CG1028)  3R              2825117   2825144   Adf1 (CG15845)        2318884           006447               Adf1->Antp006447
Antp (CG1028)  3R              2825151   2825174   Adf1 (CG15845)        2318884           006448               Adf1->Antp006448
44 Back to Adf1
mel  TGTGCGTCAGCGTCGGCCGCAACAGCG
pse  TGT-----------------GACTGCG
BLASTZ alignment
46 Back to Adf1
mel  TGTGCGTCAGCGTCGGCCGCAACAGCG
pse  TGT-----------------GACTGCG

mel  TGTG----CGTCAGC--G----TCGGCC---GC-AACAG-CG
pse  TGTGACTGCG-CTGCCTGGTCCTCGGCCACAGCCAAC-GTCG

mel  TGTGCGTCAGC------GTCGGCCGCAACAGCG
pse  TGTGACTGCGCTGCCTGGTCCTCGGCCACAGC-
49
80.4
50
85.1
51
86.5
52
79.1
53 Applications
- Conservation of cis-regulatory elements
- Phylogenetics: branch length estimation
Jukes-Cantor correction [formula not transcribed]
This is the expected number of mutations per site in an alignment with summary (x, s).
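The formula itself is missing from the transcript. The standard Jukes-Cantor correction is d = -(3/4) ln(1 - 4p/3), where p is the observed fraction of mismatched sites. The sketch below assumes p is computed from a summary as x divided by the number of aligned (non-space) columns, (n + m - s)/2; that choice and the name `jukes_cantor_distance` are assumptions, not necessarily what the slide shows.

```python
import math


def jukes_cantor_distance(x: int, s: int, n: int, m: int) -> float:
    """Expected substitutions per site for an alignment with summary (x, s).

    Assumes p = x / (number of aligned columns), where the number of
    aligned columns is (n + m - s) / 2, and d = -3/4 * ln(1 - 4p/3).
    """
    aligned = (n + m - s) // 2
    if aligned <= 0:
        raise ValueError("alignment has no aligned columns")
    p = x / aligned
    if p >= 0.75:
        raise ValueError("p >= 3/4: the Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)
```

The correction always exceeds the raw mismatch fraction p (for p > 0), because multiple substitutions at one site are counted only once in the observed alignment.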
55
- Algebraic Statistics: a language for unifying and developing many of the algorithms for biological sequence analysis
  - The few inference functions theorem
  - Polytope propagation
  - Phylogenetic tree reconstruction
  - Evolutionary models
  - Maximum likelihood estimation
  - Mutagenic tree models