A Large Family of Conserved Noncoding Elements

1
A Large Family of Conserved Non-coding Elements
  • Michael Kamal
  • Broad Institute

2
Key Points
  • 1. Discovery: Some classes of ancient repetitive
    elements show significant sequence conservation
    at orthologous positions.
  • 2. MER121 is the best conserved AR class.
  • 3. ~1000 copies in human genome: largest CNE
    family found so far.
  • 4. Orthologous copies are similar, paralogous
    are not!
  • 5. Clues to possible biological function

3
Motivation: CNEs
  • 5% of the human genome is under purifying
    selection
  • 3% is CNEs (conserved non-coding elements)
  • CNEs generally do not cluster based on sequence
    similarity
  • Bejerano (2004): 700K CNEs
  • 96% of CNEs are unique in the human genome
  • 250 groups with more than 10 CNEs
  • Largest group: 800 CNEs, but each is similar to
    only 2 other CNEs
  • CNEs are better conserved than background
    Ancient Repeats

4
Ancient repetitive elements
  • Ancient Repeats (ARs): transposon relics that
    predate the mammalian radiation.
  • 22% of human genome, 780 classes
  • Mostly non-functional sequence: neutral evolution
  • UCSC multiple alignments: human, dog, mouse, rat
    (HDMR)
  • ARs often deleted
  • ARs often mutated
  • Probability that an AR orthologous base, present
    in all species, is perfectly conserved: 1/2

5
Large conserved words in ARs
  • Look for >50bp of perfect conservation across
    HDMR
  • Naively, expect none.
  • Find 116 instances across 22 classes.
  • 26/116 lie within MER121, which holds only
    1/4000th of total AR bases
  • Top 10 classes ranked by density (conserved
    words per Mb of sequence)
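
The "expect none" claim and the MER121 enrichment can be checked with a back-of-envelope calculation. The sketch below is not from the slides: it assumes a genome of roughly 3e9 bp, treats sites as independent, and uses a 50 bp window as a lower bound for ">50bp". It reuses the 22% AR fraction and the 1/2 per-base conservation probability stated above.

```python
# Back-of-envelope check; GENOME_BP and the independence of sites are assumptions.
GENOME_BP = 3.0e9      # assumed genome size in bp (not stated in the slides)
AR_FRACTION = 0.22     # ARs make up 22% of the human genome
P_CONSERVED = 0.5      # chance an orthologous AR base is perfectly conserved
WORD_LEN = 50          # lower bound for a ">50bp" conserved word


def expected_conserved_words(n_start_positions: float, p: float, k: int) -> float:
    """Expected number of length-k perfectly conserved windows under independence."""
    return n_start_positions * p ** k


def fold_enrichment(hits_in_class: int, hits_total: int, base_fraction: float) -> float:
    """Share of hits falling in a class divided by the class's share of bases."""
    return (hits_in_class / hits_total) / base_fraction


# Upper bound on window starts: every AR base, although many are not aligned.
ar_bases = GENOME_BP * AR_FRACTION
naive_expectation = expected_conserved_words(ar_bases, P_CONSERVED, WORD_LEN)
mer121_enrichment = fold_enrichment(26, 116, 1 / 4000)

print(f"naive expected number of conserved words: {naive_expectation:.2e}")
print(f"MER121 fold enrichment among conserved words: {mer121_enrichment:.0f}")
```

Under these assumptions the naive expectation is far below one word genome-wide, so 116 observed instances is a strong departure from neutrality, and MER121 holds several hundred times more of them than its share of AR bases would predict.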

6
What is MER121?
  • medium frequency reiterated repeat 121
  • RepeatMasker: 900 copies in human, consensus
    412bp
  • Median divergence 26%, size 180bp
  • Repbase: possible nonautonomous DNA transposon
  • Not clearly related to any other known repeat

8
MER121 copies are rarely deleted
  • Compare to MER119:
  • Unremarkable, known DNA transposon
  • 1168 copies in human
  • 23% diverged from consensus
  • Consensus size 586bp
  • 4-way retention rate:
  • How often does a human instance align to >50bp
    in HDMR?
  • MER121: 82%
  • MER119: 28%
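
The retention rate defined above can be sketched as a simple fraction. This is a minimal illustration, not the author's pipeline: the function name, the input format (one four-way aligned length per human instance) and the toy numbers are all hypothetical.

```python
from typing import Sequence


def retention_rate(aligned_bp_per_instance: Sequence[int], min_aligned_bp: int = 50) -> float:
    """Fraction of human instances with more than min_aligned_bp aligned in all four species."""
    if not aligned_bp_per_instance:
        raise ValueError("need at least one instance")
    retained = sum(1 for bp in aligned_bp_per_instance if bp > min_aligned_bp)
    return retained / len(aligned_bp_per_instance)


# Hypothetical toy data: four-way aligned length (bp) for each human instance.
well_retained_class = [180, 120, 60, 0, 200]
poorly_retained_class = [0, 30, 75, 10]

print(retention_rate(well_retained_class))
print(retention_rate(poorly_retained_class))
```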

11
MER121 aligns at high identity
  • Within retained copies, how often does a human
    base align to bases in all other species?
  • MER121: 96%
  • MER119: 73%
  • How often is a four-way aligned base perfectly
    conserved?
  • MER121: 72%
  • MER119: 49%
  • Exons: 78% (coding)
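
Both questions above reduce to counting alignment columns. The sketch below assumes a gapped four-way alignment given as equal-length strings with "-" for gaps; the function name, input format and toy sequences are hypothetical and only illustrate the two ratios.

```python
from typing import Dict, Tuple

GAP = "-"


def column_stats(alignment: Dict[str, str], reference: str = "human") -> Tuple[float, float]:
    """Return (share of reference bases aligned to a base in every species,
    share of those fully aligned columns that are identical in all species)."""
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all rows must have the same aligned length")
    ref_row = alignment[reference]
    ref_bases = full_columns = perfect_columns = 0
    for i in range(length):
        if ref_row[i] == GAP:
            continue  # only columns holding a reference base are counted
        ref_bases += 1
        column = [row[i].upper() for row in rows]
        if GAP in column:
            continue  # some species has no base here
        full_columns += 1
        if len(set(column)) == 1:
            perfect_columns += 1
    aligned_share = full_columns / ref_bases if ref_bases else 0.0
    perfect_share = perfect_columns / full_columns if full_columns else 0.0
    return aligned_share, perfect_share


# Hypothetical toy alignment of one retained copy.
toy = {
    "human": "ACGTAC-GT",
    "dog":   "ACGTACAGT",
    "mouse": "ACG-ACAGA",
    "rat":   "ACGTTCAGT",
}
aligned_share, perfect_share = column_stats(toy)
print(aligned_share, perfect_share)
```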

12
Paralogous similarity for top 75 instances
  • Multiple alignment of top 75 human retained
    instances most similar to consensus
  • Color-code human bases

[Figure: multiple alignment with human bases color-coded; legend: G, T, C, gap (no base); panels: MER121, MER119]

13
Orthologous conservation for top 75 instances
  • Multiple alignment of top 75 human retained
    instances most similar to consensus
  • Show how each human base is conserved in HDMR

[Figure panels: MER121, MER119]

15
MER121 in other species
  • Opossum (Monodelphis domestica)
  • ~1000 copies.
  • At least 600 have orthologous copies in human
  • Chicken
  • 2 small copies
  • Map to same central 70bp within the consensus
  • 1 appears to be orthologous to a human copy
  • MER121 dispersal postdated split from chicken

16
Fine-scale conservation
  • Project HDMR 4-way and human-opossum (Monodelphis)
    pairwise aligned instances onto the repeat
    consensus
  • Record how often an orthologous base present in
    all aligned species maps to a given position
    along the consensus
  • [Figure label: position 10]
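
The projection step can be sketched as two per-position tallies: how many aligned columns map to each consensus position (retention), and what share of them are perfectly conserved (identity). The input format below, one list of (consensus position, is perfectly conserved) pairs per instance with 1-based positions, is a hypothetical simplification rather than the format used in the talk.

```python
from typing import List, Optional, Sequence, Tuple


def profile_along_consensus(
    instances: Sequence[Sequence[Tuple[int, bool]]], consensus_len: int
) -> Tuple[List[int], List[Optional[float]]]:
    """Tally mapped columns per consensus position and the share that are perfectly conserved."""
    mapped = [0] * consensus_len
    perfect = [0] * consensus_len
    for columns in instances:
        for pos, is_perfect in columns:
            if not 1 <= pos <= consensus_len:
                raise ValueError(f"position {pos} is outside the consensus")
            mapped[pos - 1] += 1
            perfect[pos - 1] += int(is_perfect)
    # Identity is undefined (None) where no column maps.
    identity = [p / m if m else None for p, m in zip(perfect, mapped)]
    return mapped, identity


# Hypothetical toy input: three instances projected onto a 4 bp consensus.
toy_instances = [
    [(1, True), (2, True), (3, False)],
    [(2, True), (3, True)],
    [(2, False)],
]
mapped, identity = profile_along_consensus(toy_instances, consensus_len=4)
print(mapped)
print(identity)
```

Keeping the two tallies separate matters for the slides that follow, which contrast retention along the consensus with the identity rate along the consensus.
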
17
MER121 center is retained more than flanks
  • How often does an aligned column map to a given
    position along the consensus?

18
MER121 center is not better conserved
  • Rate of perfect four-way and pairwise identity
    along the consensus
  • [Figure panel: Retention]
19
MER121 is not a protein coding gene
  • Size of indels in human-dog aligned regions
  • Multiples of size 3 are not favored for MER121
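
The indel argument rests on coding sequence tolerating frame-preserving indels (sizes divisible by 3) far better than frame-shifting ones. A minimal sketch of the statistic follows; the function name and the toy indel sizes are hypothetical, and a real analysis would compare against a control class rather than a fixed threshold, since indel size distributions are skewed toward small sizes.

```python
from typing import Sequence


def fraction_multiple_of_three(indel_sizes: Sequence[int]) -> float:
    """Share of indels whose length in bp is a positive multiple of 3."""
    sizes = [s for s in indel_sizes if s > 0]
    if not sizes:
        raise ValueError("need at least one indel of positive size")
    return sum(1 for s in sizes if s % 3 == 0) / len(sizes)


# Hypothetical toy data: indel sizes (bp) from pairwise aligned regions.
coding_like = [3, 6, 3, 9, 1, 3, 12, 6]
noncoding_like = [1, 2, 1, 3, 4, 1, 2, 5, 1, 6]

print(fraction_multiple_of_three(coding_like))
print(fraction_multiple_of_three(noncoding_like))
```
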
20
Little evidence of transcription
  • MER121 consensus aligns significantly to only 1
    cDNA in large public databases
  • RIKEN FANTOM3 and Mammalian Gene Collection
    databases
  • 150K sequences, 246 Mb
  • Significant overlap with only 19 human and mouse
    ESTs in GenBank
  • (GenBank: 11 million ESTs, 5.53 Gb)
  • (Require E-value of 1e-03 or better, >50bp
    aligning)

21
Little evidence of RNA secondary structure
  • EvoFold predictions (Pedersen et al.)
  • 51 cases of small overlap for 49K predictions
  • Median overlap 24bp, max 49bp
  • RNAz predictions (Washietl et al., 2005)
  • 1. No cases of significant overlap within strict
    set
  • 36K predictions (P>0.9)
  • 2. Two cases of large overlap within permissive
    set
  • 91K predictions in permissive set (P>0.5)
  • Overlap sizes: 261bp, 320bp

22
MER121 generally found in gene-poor regions
  • Compare to 241 classes of ARs with 500-4000
    copies in the human genome
  • Rank class by typical distance to an Ensembl gene
    start
  • MER121: 8th farthest (138Kb)
  • MER119: 130th farthest (59Kb)
  • Rank class by typical exonic density in 500Kb
    neighborhood
  • MER121: 20th lowest
  • MER119: 175th lowest, close to genome median

23
Clusters of nearby MER121 elements
  • Expected distance between copies: D = 3.3Mb
  • Cluster elements that lie within a region of
    size D/2
  • 5 clusters with >8 elements
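
One way to read the clustering rule is a greedy pass over sorted positions on a chromosome, closing a cluster once the next element falls more than D/2 from the cluster's first element. This is only a sketch of that reading: the talk does not specify the exact procedure, the 3e9 bp genome size used to reproduce D is an assumption, and the positions are hypothetical.

```python
from typing import List, Sequence

GENOME_BP = 3.0e9              # assumed genome size (not stated in the slides)
COPIES = 900                   # RepeatMasker copy count for MER121
D = GENOME_BP / COPIES         # about 3.3 Mb, matching the expected spacing
WINDOW = D / 2


def cluster_positions(positions: Sequence[float], window: float) -> List[List[float]]:
    """Greedily group sorted positions so each cluster spans at most `window` bp."""
    clusters: List[List[float]] = []
    current: List[float] = []
    for pos in sorted(positions):
        if current and pos - current[0] > window:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return clusters


# Hypothetical element start coordinates on a single chromosome.
toy_positions = [5_200_000, 100_000, 900_000, 400_000, 1_500_000, 5_000_000, 9_000_000]
clusters = cluster_positions(toy_positions, WINDOW)
cluster_sizes = [len(c) for c in clusters]
print(cluster_sizes)
```

Large clusters would then be those whose size exceeds a chosen threshold; a greedy pass can split a dense region differently from a sliding-window scan, so the counts it yields are not guaranteed to match those reported here.
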
24
Two large clusters
  • 12 elements: Inhibin beta-A gene
  • 8 elements: TBX3/TBX5 T-box transcription factors

25
Summary
  • Some Ancient Repeat classes appear to be under
    purifying selection
  • MER121 is the best conserved AR class and the
    largest CNE family, with ~1000 copies
  • Unusual conservation profile: orthologous copies
    are similar, paralogous are not!
  • Not a protein coding gene, unlikely to be an RNA
    gene family
  • Perhaps a cis-regulatory element mobilized by a
    transposon around the time of the mammalian
    radiation

26
Next steps: Experiments
  • Is MER121 transcribed?
  • Enhancer/Insulator assay.
  • Chromatin modifications (methylation H3K9?).
  • Do proteins bind?

27
Acknowledgments
  • Co-authors
  • Xiaohui Xie
  • Eric S Lander
  • Many colleagues