Title: A Large Family of Conserved Noncoding Elements
1 A Large Family of Conserved Non-coding Elements
- Michael Kamal
- Broad Institute
2 Key Points
- 1. Discovery: Some classes of ancient repetitive elements show significant sequence conservation at orthologous positions.
- 2. MER121 is the best conserved AR class.
- 1000 copies in human genome
- Largest CNE family found so far.
- 4. Orthologous copies are similar, paralogous are not!
- 5. Clues to possible biological function
3 Motivation: CNEs
- 5% of the human genome is under purifying selection
- 3% is CNEs (conserved non-coding elements)
- Generally do not cluster based on sequence similarity
- Bejerano (2004): 700K CNEs
- 96% of CNEs are unique in the human genome
- 250 groups with more than 10 CNEs
- Largest group: 800 CNEs, but each similar to only 2 other CNEs
- CNEs are better conserved than background Ancient Repeats
4 Ancient repetitive elements
- Ancient Repeats (ARs): transposon relics, predate mammalian radiation.
- 22% of human genome, 780 classes
- Mostly non-functional sequence: neutral evolution
- UCSC multiple alignments: human, dog, mouse, rat (HDMR)
- ARs often deleted
- ARs often mutated
- Probability that an AR orthologous base, present in all species, is perfectly conserved: 1/2
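The per-base probability above makes long perfectly conserved stretches very unlikely under neutral evolution. The following is a minimal sketch of that arithmetic, assuming each base is conserved independently with probability 1/2. The function names, the independence assumption, and the genome size of about 3 Gb are illustrative assumptions, not taken from the talk.

```python
def prob_perfect_word(p_base: float, length: int) -> float:
    """Probability that `length` consecutive aligned bases are all perfectly
    conserved, assuming each base is conserved independently with p_base."""
    return p_base ** length


def expected_perfect_words(n_start_positions: int, p_base: float, length: int) -> float:
    """Expected number of perfectly conserved words of the given length,
    treating every start position as an independent trial (a rough upper bound)."""
    return n_start_positions * prob_perfect_word(p_base, length)


if __name__ == "__main__":
    p = 0.5          # per-base probability of perfect conservation (from the slide)
    word = 50        # word length in bp
    # Assumption: ARs cover 22% of a genome of roughly 3 Gb; using every AR base
    # as a start position overestimates the number of four-way aligned positions.
    n = round(0.22 * 3.0e9)
    print("P(word perfectly conserved):", prob_perfect_word(p, word))
    print("Expected number of such words:", expected_perfect_words(n, p, word))
```

Under these assumptions the expected count is far below one, which is the sense in which a naive model predicts no long conserved words in ARs.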
5 Large conserved words in ARs
- Look for >50bp of perfect conservation across HDMR
- Naively, expect none.
- Find 116 instances across 22 classes.
- 26/115 lie within MER121, which makes up only 1/4000th of total AR bases
- Top 10 ranked by density (conserved words per Mb of sequence)
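The search for long perfectly conserved words can be sketched as a scan over alignment columns. This is an illustrative sketch only: it assumes the four-way alignment is available as equal-length gapped strings, and the function name `perfect_runs` is hypothetical rather than the tool used in the talk.

```python
from typing import List, Tuple


def perfect_runs(rows: List[str], min_len: int = 51) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column intervals in which every row carries
    the same nucleotide (no gaps) for at least `min_len` consecutive columns.
    The default of 51 corresponds to ">50bp"."""
    if not rows:
        return []
    n_cols = len(rows[0])
    if any(len(r) != n_cols for r in rows):
        raise ValueError("all alignment rows must have the same length")

    runs = []
    run_start = None
    # Iterate one column past the end so that a run reaching the last column is closed.
    for col in range(n_cols + 1):
        conserved = False
        if col < n_cols:
            bases = {r[col].upper() for r in rows}
            conserved = len(bases) == 1 and next(iter(bases)) in "ACGT"
        if conserved:
            if run_start is None:
                run_start = col
        else:
            if run_start is not None and col - run_start >= min_len:
                runs.append((run_start, col))
            run_start = None
    return runs


if __name__ == "__main__":
    toy = ["ACGTACGT-A", "ACGTACGTAA", "ACGTTCGTAA", "ACGTACGTAA"]
    # A short threshold is used here only so the toy alignment produces hits.
    print(perfect_runs(toy, min_len=3))
```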
6 What is MER121?
- Medium frequency reiterated repeat 121
- RepeatMasker: 900 copies in human, consensus 412bp
- Median divergence 26%, size 180bp
- Repbase: possible nonautonomous DNA transposon
- Not clearly related to any other known repeat
8 MER121 copies are rarely deleted
- Compare to MER119
- Unremarkable, known DNA transposon
- 1168 copies in human
- 23% diverged from consensus
- Consensus size 586bp
- 4-way retention rate
- How often does a human instance align to >50bp in HDMR?
- MER121: 82%
- MER119: 28%
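The retention rate described above is a simple fraction, sketched below. The input format (one number per human instance giving the count of bases aligned in all four species) and the function name are assumptions made for illustration.

```python
from typing import Iterable


def retention_rate(aligned_bp_per_instance: Iterable[int], min_bp: int = 50) -> float:
    """Fraction of human repeat instances that align to more than `min_bp`
    bases across all four species (HDMR)."""
    counts = list(aligned_bp_per_instance)
    if not counts:
        raise ValueError("need at least one instance")
    retained = sum(1 for bp in counts if bp > min_bp)
    return retained / len(counts)


if __name__ == "__main__":
    # Toy values, not real data: aligned four-way bases for five instances.
    print(retention_rate([0, 120, 51, 50, 300]))
```

Running the same function on the instances of two repeat classes gives directly comparable rates, which is the comparison the slide reports for MER121 and MER119.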
11 MER121 aligns at high identity
- Within retained copies, how often does a human base align to bases in all other species?
- MER121: 96%
- MER119: 73%
- How often is a four-way aligned base perfectly conserved?
- MER121: 72%
- MER119: 49%
- Exons: 78% (coding)
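Both rates on this slide can be computed column by column from a four-way alignment. The sketch below assumes the alignment of one retained copy is given as gapped strings of equal length, with `-` marking gaps; the function name and input layout are hypothetical.

```python
from typing import List, Tuple


def alignment_stats(human: str, others: List[str]) -> Tuple[float, float]:
    """Return (aligned_rate, perfect_rate) for one retained copy.

    aligned_rate: fraction of human bases aligned to a base in every other species.
    perfect_rate: fraction of those fully aligned columns where all species
                  carry the same base as human.
    """
    if any(len(o) != len(human) for o in others):
        raise ValueError("all alignment rows must have the same length")

    human_bases = 0
    fourway = 0
    perfect = 0
    for i, h in enumerate(human):
        if h == "-":
            continue  # column has no human base
        human_bases += 1
        column = [o[i] for o in others]
        if all(b != "-" for b in column):
            fourway += 1
            if all(b.upper() == h.upper() for b in column):
                perfect += 1

    aligned_rate = fourway / human_bases if human_bases else 0.0
    perfect_rate = perfect / fourway if fourway else 0.0
    return aligned_rate, perfect_rate


if __name__ == "__main__":
    # Toy alignment, not real data: human vs. three other species.
    print(alignment_stats("ACGT-ACG", ["ACGTTAC-", "ACCTTACG", "A-GTTACG"]))
```

Class-level percentages like those on the slide would come from pooling the counts over all retained copies of a class rather than averaging per-copy rates; which of the two the talk used is not stated.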
12 Orthologous conservation for top 75 instances
- Multiple alignment of the top 75 retained human instances most similar to the consensus
- Show how each human base is conserved in HDMR
[Figure panels: MER121, MER119]
13 MER121 in other species
- Opossum (Monodelphis domestica)
- 1000 copies.
- At least 600 have orthologous copies in human
- Chicken
- 2 small copies
- Map to same central 70bp within the consensus
- 1 appears to be orthologous to a human copy
- MER121 dispersal postdated split from chicken
14 Fine-scale conservation
Project HDMR 4-way and human-opossum (Monodelphis) pairwise aligned instances onto the repeat consensus.
Record how often an orthologous base present in all aligned species maps to a given position along the consensus.
[Figure label: position 10]
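The projection step can be sketched as a per-position tally along the consensus. This sketch assumes each instance has already been reduced to the list of 1-based consensus positions at which its orthologous base is present in all aligned species; that input format and the function name are illustrative assumptions.

```python
from typing import Iterable, List


def coverage_along_consensus(instances: Iterable[Iterable[int]],
                             consensus_len: int) -> List[int]:
    """For each consensus position (1-based), count how many instances have an
    orthologous base, present in all aligned species, mapping to that position."""
    counts = [0] * consensus_len
    for positions in instances:
        # Use a set so each instance contributes at most once per position.
        for pos in set(positions):
            if 1 <= pos <= consensus_len:
                counts[pos - 1] += 1
    return counts


if __name__ == "__main__":
    # Toy input, not real data: three instances projected onto a 10bp consensus.
    toy = [[1, 2, 3], [2, 3, 4], [3, 3, 10]]
    print(coverage_along_consensus(toy, consensus_len=10))
```

For MER121 the consensus length would be 412, the value given earlier in the talk.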
15 MER121 center is retained more than flanks
How often does an aligned column map to a given position along the consensus?
16 MER121 center is not better conserved
Rate of perfect four-way and pairwise identity along the consensus
[Figure label: Placement]
17 MER121 is not a protein coding gene
Size of indels in human-dog aligned regions
Multiples of size 3 are not favored for MER121
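The reasoning behind this test is that indels in protein-coding sequence tend to have lengths that are multiples of 3, since those preserve the reading frame. A minimal sketch of the comparison follows; the input (a list of indel lengths per repeat class) and the function name are assumptions for illustration.

```python
from typing import Iterable


def fraction_multiple_of_three(indel_sizes: Iterable[int]) -> float:
    """Fraction of indels whose length is a multiple of 3.
    Non-positive sizes are ignored as invalid."""
    sizes = [s for s in indel_sizes if s > 0]
    if not sizes:
        raise ValueError("need at least one indel with positive length")
    return sum(1 for s in sizes if s % 3 == 0) / len(sizes)


if __name__ == "__main__":
    # Toy indel lengths, not real data.
    print(fraction_multiple_of_three([1, 2, 3, 6, 4, 9, 1, 1]))
```

A clear excess of multiples of 3 over the fraction seen in a neutral background would suggest coding constraint; the slide reports no such preference for MER121.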
18 Little evidence of transcription
- MER121 consensus aligns significantly to 1 cDNA in large public databases
- Riken Fantom3, Mammalian Gene Collection databases
- 150K sequences, 246 Mb
- Significant overlap with only 19 human and mouse ESTs in Genbank
- (Genbank: 11 million ESTs, 5.53 Gb)
- (Require E-value of at most 1e-03, >50bp aligning)
19 Little evidence of RNA secondary structure
- Evofold predictions (Pedersen, et al)
- 51 cases of small overlap for 49K predictions
- Median overlap 24bp, max 49bp
- RNAz predictions (Washietl, et al, 2005)
- 1. No cases of significant overlap within strict set
- 36K predictions (P>0.9)
- 2. 2 cases of large overlap within permissive set
- 91K predictions in permissive set (P>0.5)
- Overlap size (261bp, 320bp)
20 MER121 generally found in gene-poor regions
- Compare to 241 classes of ARs with 500-4000 copies in the human genome
- Rank class by typical distance to an Ensembl Gene start
- MER121: 8th farthest (138Kb)
- MER119: 130th farthest (59Kb)
- Rank class by typical density of exons in 500Kb neighborhood
- MER121: 20th lowest
- MER119: 175th lowest, close to genome median
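The ranking of repeat classes by distance to genes can be sketched as follows. The talk does not say how "typical" distance is defined; this sketch assumes the median, and the function name and input layout (a mapping from class name to per-copy distances) are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple


def rank_by_typical_distance(
    distances_by_class: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Rank repeat classes from farthest to nearest, using the median distance
    of their copies to the nearest gene start (median is an assumption here)."""
    typical = {name: median(d) for name, d in distances_by_class.items() if d}
    return sorted(typical.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy distances in bp, not real data.
    toy = {"A": [10, 200, 30], "B": [500, 600], "C": [5, 5, 5]}
    for rank, (name, dist) in enumerate(rank_by_typical_distance(toy), start=1):
        print(rank, name, dist)
```

The same pattern would apply to the exon-density ranking by swapping in per-copy exon densities and sorting in ascending order.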
21 Clusters of nearby MER121 elements
Expected distance between copies, D: 3.3Mb
Cluster elements that lie within a region of size D/2
5 clusters with >8 elements
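One way to read the clustering rule is a greedy left-to-right grouping of sorted positions on a chromosome, closing a cluster once it would span more than D/2. This is a sketch of one possible interpretation, not necessarily the procedure used in the talk; the function names are hypothetical, and the genome size of about 3 Gb used for the expected spacing is an assumption.

```python
from typing import Iterable, List


def expected_spacing(genome_size: float, n_copies: int) -> float:
    """Expected distance between copies if they were spread uniformly."""
    return genome_size / n_copies


def cluster_positions(positions: Iterable[int], window: float) -> List[List[int]]:
    """Greedily group sorted positions (on one chromosome) so that each cluster
    spans at most `window` bases from its first to its last element."""
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        if current and pos - current[0] > window:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Assumption: ~3 Gb genome and the 900 copies reported by RepeatMasker.
    d = expected_spacing(3.0e9, 900)
    print("Expected spacing D (bp):", d)
    # Toy positions, not real data, clustered with a small window.
    print(cluster_positions([100, 150, 190, 1000, 1040, 5000], window=100))
```

With these assumed inputs the expected spacing comes out close to the 3.3Mb quoted on the slide. Large clusters are then the groups whose size exceeds a chosen threshold.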
22 Two large clusters
12 elements: Inhibin beta-A gene
8 elements: TBX3/TBX5 T-box transcription factors
23 Summary
- Some Ancient Repeats appear to be under purifying selection
- MER121 is the best conserved AR class, largest CNE family with 1000 copies
- Unusual conservation profile: Orthologous copies are similar, paralogous are not!
- Not a protein coding gene, unlikely to be an RNA gene family
- Perhaps cis-regulatory element mobilized by a transposon around the time of the mammalian radiation
24 Next steps: Experiments
- Is MER121 transcribed?
- Enhancer/Insulator assay.
- Chromatin modifications (methylation H3K9?).
- Do proteins bind?
25 Acknowledgments
- Co-authors
- Xiaohui Xie
- Eric S Lander
- Many colleagues