The Hunt for Genomic Dark Matter: Aligning Non-coding Functional DNA
Naila Mimouni, Gerton Lunter and Jotun Hein
Department of Statistics, Oxford University
- Motivation
- 5% of the human genome is under selection, but only 1.5% is genes.
- Characterising the remaining 3.5%.
- First step towards evolutionary analysis of a genome: alignment.
- Evaluation for coding DNA: protein structures, experimentally verified transcripts.
- Evaluation for non-coding DNA: no objective evaluation measure.
- Aim
- Propose an evaluation measure of alignment error: Gap Attraction.
- Build a probabilistic alignment algorithm.
Results
Figure: Distribution of ungaps (0-80 nucleotides) for the alignment of human chr 21 vs. the mouse genome; panels: Blastz, Clustalw.
Figure: Confidence intervals; panels: Blastz, Clustalw.
Gap Attraction: an objective measure of alignment error. It quantifies the extent of attraction between two gaps separated by an ungap.
- Blastz: 0.8653
- Clustalw: 0.8882
Fraction: 97.42
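The poster does not spell out how the Gap Attraction score is computed, so the sketch below uses a hypothetical proxy that only illustrates the underlying idea. Under the indel model below (uniform, independent indels) ungap lengths are geometric, so a geometric fitted to long ungaps can be extrapolated to short ones; a shortfall of observed short ungaps suggests that neighbouring gaps are being pulled together. The functions `fit_tail` and `short_ungap_ratio`, the cut-offs `min_len` and `short_max`, and the simulated data are all assumptions, not the authors' method or data.

```python
import math
import random
from collections import Counter


def sample_geometric(p, n, rng):
    """Draw n ungap lengths with P(L = length) = (1 - p) * p**(length - 1), length >= 1."""
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1], so log() is safe.
    return [1 + int(math.log(1.0 - rng.random()) / math.log(p)) for _ in range(n)]


def fit_tail(lengths, min_len):
    """Estimate p from ungaps of length >= min_len (hypothetical helper).

    By memorylessness, L - min_len + 1 is again geometric with the same p,
    so the maximum-likelihood estimate is p = 1 - 1 / mean(L - min_len + 1).
    """
    tail = [length for length in lengths if length >= min_len]
    mean_shifted = sum(length - min_len + 1 for length in tail) / len(tail)
    return 1.0 - 1.0 / mean_shifted, len(tail)


def short_ungap_ratio(lengths, short_max, min_len):
    """Observed / expected count of ungaps of length 1..short_max (hypothetical proxy).

    The expected count extrapolates the geometric fitted on the tail back to
    short lengths; a ratio well below 1 means short ungaps are missing.
    """
    p, n_tail = fit_tail(lengths, min_len)
    counts = Counter(lengths)
    observed = sum(counts[k] for k in range(1, short_max + 1))
    expected = sum(n_tail * (1 - p) * p ** (k - min_len) for k in range(1, short_max + 1))
    return observed / expected


rng = random.Random(1)
true_lengths = sample_geometric(0.9, 200_000, rng)

# Toy "aligned" data: every other ungap of length <= 5 has been absorbed.
aligned_lengths, short_seen = [], 0
for length in true_lengths:
    if length <= 5:
        short_seen += 1
        if short_seen % 2 == 0:
            continue
    aligned_lengths.append(length)

ratio_true = short_ungap_ratio(true_lengths, short_max=5, min_len=10)
ratio_aligned = short_ungap_ratio(aligned_lengths, short_max=5, min_len=10)
print(f"true: {ratio_true:.2f}  aligned: {ratio_aligned:.2f}")
```

On purely geometric data the ratio should sit close to 1, and removing half of the short ungaps should roughly halve it. The threshold `min_len` is a judgment call: it has to be long enough that alignment errors no longer distort the fitted tail.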
- Model
- Assumptions
- Indels rain down uniformly along a sequence.
- Indels happen independently of each other.
- Distribution of Ungaps
- Ungaps: maximal genomic regions in two species that are homologous and have undergone no insertion or deletion events since their most recent common ancestor.
- Consider the correct alignment.
- Alignment scoring instead joins the ungap (in red) to the ungapped region on its left.
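A back-of-the-envelope calculation suggests why alignment scoring can join a short ungap to the ungapped region on its left. It assumes a hypothetical affine scheme (a gap of length g costs gap_open + gap_extend * g, identical bases score `match`, other pairs `mismatch`) and that bases aligned against unrelated sequence match with probability 1/4, i.e. uniform base composition. In the correct alignment, two gaps are separated by k homologous columns; in the merged alignment, one gap covers both and the k bases are shifted left against unrelated sequence. The extension terms cancel, so merging wins in expectation whenever k * (match - s_random) < gap_open. The function names and scores are illustrative only, not the actual Blastz or ClustalW settings.

```python
def merge_gain(k, gap_open, match, mismatch, p_match_random=0.25):
    """Expected score of the merged alignment minus that of the correct one.

    Correct: two gaps separated by k identical columns (k * match, two gap opens).
    Merged: one gap spanning both, with the k bases aligned to unrelated sequence
    (k * expected random-column score, one gap open). With gap cost
    gap_open + gap_extend * length, the extension terms are equal in both
    alignments and cancel, so gap_extend does not appear.
    """
    s_random = p_match_random * match + (1 - p_match_random) * mismatch
    correct = k * match - 2 * gap_open
    merged = k * s_random - gap_open
    return merged - correct


def max_absorbed_length(gap_open, match, mismatch, p_match_random=0.25):
    """Largest ungap length k for which merging scores higher in expectation."""
    # Assumes match > mismatch and p_match_random < 1, so match > s_random and
    # the gain drops by a fixed amount for each extra column (the loop terminates).
    k = 0
    while merge_gain(k + 1, gap_open, match, mismatch, p_match_random) > 0:
        k += 1
    return k


print(max_absorbed_length(gap_open=5, match=1, mismatch=-1))  # → 3
```

With these toy scores, ungaps of up to 3 bases are expected to be absorbed. A real aligner can also pick the best-scoring placement for the shifted bases instead of an average one, which would tend to make the attraction stronger than this expectation suggests.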
Fig. 1: Indels raining down on the genome. Each arrow is an indel event.
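The two model assumptions can be checked by simulation. This sketch approximates indels raining down uniformly and independently, as in Fig. 1, by a hypothetical per-site indel probability q; the stretches between consecutive indels then have geometric lengths, P(L = l) = q (1 - q)^(l - 1), with mean 1/q. The genome length, the rate and the function name `simulate_ungaps` are illustrative assumptions.

```python
import random
from collections import Counter


def simulate_ungaps(genome_length, indel_rate, rng):
    """Drop an indel at each site independently with probability indel_rate and
    return the lengths of the indel-free stretches that end at an indel."""
    lengths, run = [], 0
    for _ in range(genome_length):
        if rng.random() < indel_rate:
            if run > 0:  # consecutive indels leave no ungap between them
                lengths.append(run)
            run = 0
        else:
            run += 1
    return lengths  # a trailing stretch with no closing indel is discarded


rng = random.Random(0)
q = 0.05  # hypothetical per-site indel rate
lengths = simulate_ungaps(1_000_000, q, rng)
counts = Counter(lengths)
n = len(lengths)

# Compare empirical frequencies with the geometric prediction q * (1 - q)**(length - 1).
for length in (1, 5, 20, 60):
    print(length, round(counts[length] / n, 4), round(q * (1 - q) ** (length - 1), 4))
print("mean ungap length:", round(sum(lengths) / n, 2), "expected:", 1 / q)
```

A per-site Bernoulli draw is the discrete counterpart of a uniform (Poisson) process of indel events; either way, independence is what makes the ungap lengths memoryless and hence geometric.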
Conclusions and Future Work
- Introduced Gap Attraction, an objective evaluation of alignment.
- Future work: other alignment algorithms (DIALIGN, LAGAN, T-Coffee), pair HMM alignment, data simulation, and estimating the amount of functional DNA.
[1] The International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409:860-921, Feb 2001.
[2] Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520-562, Dec 2002.