Genomic organization and functional characterization

1
  • Genomic organization and functional
    characterization of regulatory elements in
    higher eukaryotes
  • Boris Lenhard
  • Computational Biology Unit
  • Bergen Center for Computational Science
  • University of Bergen, Norway

2
Genome comparison reveals unknown functional
elements
[Figure: identity plots of the actin gene compared between human and mouse.]
3
  • Ultraconserved non-coding regions (UCR) in
    vertebrate genomes
  • a.k.a. Conserved non-coding elements (CNE)
  • a.k.a. Conserved non-genic sequences (CNG)
  • a.k.a. Highly conserved non-coding regions (HCNR)

4
There exist unusually highly conserved noncoding
elements in vertebrate genomes
5
Ultraconserved regions (UCR) in vertebrate genomes
  • Definition of UCR
  • > 50 bp
  • human-mouse identity > 95%
  • no coding potential
  • 3583 human-mouse UCRs have detectable
    conservation in Fugu
  • A few dozen characterized, all as long-range
    enhancers
  • Many UCRs occur in clusters spanning hundreds of
    kilobases
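
The three criteria in the definition above can be read as a simple filter over human-mouse alignment blocks. The following is a minimal sketch under stated assumptions: the `AlignedBlock` record, the gap-free alignment, and the `overlaps_coding` flag (standing in for "no coding potential") are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class AlignedBlock:
    """A gap-free human-mouse alignment block (hypothetical record type)."""
    human: str             # human sequence of the block
    mouse: str             # aligned mouse sequence, same length
    overlaps_coding: bool  # assumed annotation flag for coding potential


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions carrying the same base."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def is_ucr(block: AlignedBlock, min_len: int = 50,
           min_identity: float = 95.0) -> bool:
    """Apply the slide's thresholds: > 50 bp, > 95% identity, non-coding."""
    return (len(block.human) > min_len
            and percent_identity(block.human, block.mouse) > min_identity
            and not block.overlaps_coding)
```

A real pipeline would start from whole-genome alignments and handle gaps; the sketch only shows how the thresholds combine.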

6
What genes are UCRs associated with?
Nr. | Nr. of UCRs | Gene symbol | Description | InterPro domains
1 | 84 | MEIS2 | Meis1, myeloid ecotropic viral integration site 1 homolog 2 (mouse) | Homeobox
2 | 81 | ZFHX1B | zinc finger homeobox 1b | Homeobox; Zn-finger, C2H2 type
3 | 80 | KIAA0390 | KIAA0390 gene product | Znf_C2H2; NLS_BP
4 | 79 | EBF-3 | COE3_HUMAN, Transcription factor COE3 (Early B-cell factor 3) (EBF-3) | COE
5 | 77 | ZNF503 | zinc finger protein 503 | Znf_PHD; Znf_C2H2; Eggshell
6 | 64 | IRX-3; IRX-5; IRX-6 | Iroquois-class protein IRX-3; Iroquois-class protein IRX-5; Iroquois-class protein IRX-6 | Homeobox; Homeobox; Homeobox
7 | 62 | PBX3 | pre-B-cell leukemia transcription factor 3 | PBX; Homeobox
8 | 62 | NR2F1 | nuclear receptor subfamily 2, group F, member 1 | Hormone_rec_lig; Stdhrmn_receptor; Str_ncl_receptor; Znf_C4steroid
9 | 60 | FOXP2 / TFEC | forkhead box P2 (immune tolerance development) / Similar to transcription factor EC | Involucrin_rpt; TF_Fork_head; Znf_C2H2 / HLH_basic
10 | 52 | DACH | dachshund homolog (Drosophila) | Transform_Ski
7
What genes are UCRs associated with?
  • 10 top UCR clusters

Sandelin A, Bailey P, Bruce S, Engstrom PG, Klos
JM, Wasserman WW, Ericson J, Lenhard B. (2004)
Arrays of ultraconserved non-coding regions span
the loci of key developmental genes in vertebrate
genomes. BMC Genomics 5:99.
8
11 | 52 | PAX2 | paired box gene 2 (kidney, differentiation, eyes, CNS) | Paired_box; Homeobox
12 | 52 | FOXP1 | forkhead box P1 (specification and differentiation of lung epithelium) | TF_Fork_head; Znf_C2H2
13 | 48 | BCL11A | B-cell lymphoma/leukemia 11A (B-cell CLL/lymphoma 11A) (COUP-TF interacting protein 1) (Ecotropic viral integration site 9 protein) (EVI-9) | Znf_C2H2
14 | 46 | IRX-4; IRX-2; IRX-1 | IRX-4; IRX-2; IRX-1 | Homeobox; Homeobox; Homeobox
15 | 46 | ATF-2 / EVX-2 / HOX-D | activating transcription factor 2 (brain) / HOMEOBOX EVEN-SKIPPED HOMOLOG PROTEIN 2 (EVX-2) / HOX-D cluster | Znf_C2H2; TF_bZIP / Homeobox; Antifreeze_1; HTH_lambrepressr; CytC_heme_bind / Homeobox; HTH_lambrepressr
16 | 41 | NR4A2 | nuclear receptor subfamily 4, group A, member 2 (brain) | Znf_C4steroid; hormone_rec_lig
17 | 39 | FOXD3 | forkhead, box D3, at chr1:63146833-63147169 | TF_Fork_head
18 | 38 | LMO4 / KIAA1221 | LIM domain only 4 / KIAA1221 (brain) | LIM / Znf_C2H2
19 | 38 | ZNF407 | zinc finger protein 407 | Znf_C2H2
20 | 35 | MEIS1 | Meis1, myeloid ecotropic viral integration site 1 homolog (mouse) | Homeobox
9
21 | 35 | ZFPM2 (FOG-2) | zinc finger protein, multitype 2 (Friend of GATA-2) (cardiogenesis, hematopoiesis) | Znf_C2H2
22 | 35 | TNRC9 | trinucleotide repeat containing 9 | Highmoblty_12; HMG-box; HMG_12_box
23 | 33 | ZFH4 | zinc finger homeodomain 4 | AMP-bind; Homeobox; Somatotropin; Znf_C2H2; Znf_U1
24 | 32 | SOX6 | SRY (sex determining region Y)-box 6 | HMG_12_box; ATP_GTP_A; NLS_BP
25 | 31 | FLJ20043 | Hypothetical protein FLJ20043 | CytC_heme_BS; Znf_C2H2
26 | 31 | OTP | orthopedia homolog (development of the neuroendocrine hypothalamus) | Homeobox; Homeo_OAR; HTH_lambrepressr
27 | 30 | TCF7L2 | transcription factor 7-like 2 (T-cell specific, HMG-box) | HMG_box
28 | 30 | SALL3 | Sal-like protein 3 (Zinc finger protein SALL3) (hSALL3) | Znf_C2H2
29 | 27 | BUB3 | Mitotic checkpoint protein BUB3 | WD40
30 | 26 | TFAP2A | Transcription factor AP-2 alpha (AP2-alpha) (Activating enhancer-binding protein 2 alpha) (AP-2 transcription factor) (Activator protein-2) (AP-2) | TF_AP2; TF_AP2_alpha
10
What genes are UCRs associated with?
  • Out of the 150 most prominent UCR clusters, at
    least 144 coincide with one or more genes for
    DNA-binding proteins (generally transcription
    factors)
  • Among them are most key regulators of animal
    development
  • HOX clusters, Iroquois genes, GSH1, GSH2, PPARg,
    LMO1
  • Many are associated with malignancies and
    recurring chromosomal breakpoints/rearrangement
    sites
  • MEIS2, PBX3, BCL11A, MEIS1, LMO4, BCL11B, EVI1...

11
Quantitative evidence I: Categories of genes in
the vicinity of UCRs
Sandelin A, Bailey P, Bruce S, Engstrom PG, Klos
JM, Wasserman WW, Ericson J, Lenhard B. (2004)
Arrays of ultraconserved non-coding regions span
the loci of key developmental genes in vertebrate
genomes. BMC Genomics 5:99.
  • 50% of homeobox-containing genes, 20% of
    forkheads, 20% of nuclear receptors and 8% of
    zinc finger proteins are within 200 kb of a UCR
  • Only 3% of random genes are within 200 kb of a
    UCR

Over-representation of protein domains in genes
flanking UCRs. Bonferroni-corrected and
uncorrected Fisher Exact Test p-values are shown
for the 16 most over-represented INTERPRO
domains. Typical transcription factor domains are
in bold.
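
The over-representation statistic named in the caption can be illustrated with a one-sided Fisher exact test followed by a Bonferroni correction. This is a minimal standard-library sketch, not the analysis code behind the slide: the function names are hypothetical, and the number of tests passed to the correction is an assumption the caller has to supply.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher exact P-value for over-representation.

    k: genes near UCRs that carry the domain
    n: all genes near UCRs
    K: genes carrying the domain in the whole gene set
    N: all genes in the whole gene set
    Returns P(X >= k) under the hypergeometric null.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p * n_tests)
```

For a domain with counts `k`, `n`, `K`, `N` taken from a gene annotation, `bonferroni(fisher_enrichment_p(k, n, K, N), n_tests)` gives the corrected value; `n_tests` would be the number of InterPro domains tested.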
12
What is the function of UCRs (contd)?
  • Most known ones are enhancers
  • A very small fraction are pre-microRNA genes
  • can be easily distinguished from putative
    enhancer elements
  • A distinct conservation pattern between mammals
    and fish
  • Different binding site pattern composition than
    most other UCRs

Pre-miRNA gene
13
Putative conserved regulatory elements show
distinct motif compositions
MOST UCRs CONTAIN A HIGH DENSITY OF BINDING SITES
FOR KEY DEVELOPMENTAL TRANSCRIPTION FACTORS.
14
Can we recognize the neural ultraconserved
enhancers?
  • Most UCRs show a high overrepresentation of a
    number of putative transcription factor binding
    site motifs
  • General homeobox motifs, Sox (SRY) and Oct (POU)
  • Sox2 and Oct3/4 are highly expressed in mouse ES
    cells (Nagano K et al. (2005) Proteomics
    5:1346-61)
  • Oct and Sox transcription factors control many
    different aspects of neural development and
    embryogenesis, often binding to adjacent sites on
    DNA

Williams, D. C. et al. (2004) J. Biol. Chem.
279:1449-1457
15
The SPH (Sox-Oct-Homeobox) model: A simple screen
to select UCRs governing neural expression
  • The model measures the combined probability of
    occurrence of Sox, Oct (POU) and core homeobox
    motifs in 400 bp regions centered on UCRs
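
The slide does not say how the combined probability is computed, so the following is only an illustrative sketch of one way such a screen could work. Everything in it is an assumption: the simplified IUPAC consensus strings stand in for the real binding-site matrices, the background is uniform base composition with a Poisson approximation, and the three motif P-values are combined as if independent.

```python
import math

# Simplified consensus motifs (assumptions, not the matrices used in the talk).
MOTIFS = {"Sox": "WWCAAWG", "Oct": "ATGCWAAT", "Homeobox": "TAAT"}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_matches(seq: str, motif: str) -> int:
    """Count motif matches on both strands, overlaps allowed."""
    n = 0
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for i in range(len(strand) - len(motif) + 1):
            if all(strand[i + j] in IUPAC[m] for j, m in enumerate(motif)):
                n += 1
    return n


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly from the upper tail."""
    if k == 0:
        return 1.0
    term = math.exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i          # after the loop: term == P(X == k)
    tail, i = 0.0, k
    while term > tail * 1e-17:   # stop once further terms no longer matter
        tail += term
        i += 1
        term *= lam / i
    return min(tail, 1.0)


def sph_score(window: str) -> float:
    """Sum of -log10 P-values for the three motif classes in one window."""
    window = window.upper()
    score = 0.0
    for motif in MOTIFS.values():
        p_site = math.prod(len(IUPAC[m]) / 4 for m in motif)
        lam = 2 * (len(window) - len(motif) + 1) * p_site  # both strands
        p = poisson_tail(count_matches(window, motif), lam)
        score += -math.log10(max(p, 1e-300))  # guard against log10(0)
    return score
```

A higher score means the window holds more Sox, Oct and homeobox sites than the uniform background predicts. Ranking 400 bp windows by such a score is one way to shortlist candidates, assuming the windows have already been extracted around each UCR midpoint.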

16
SPH-enriched UCRs around genes coding for known
neural patterning regulators
17
SPH-model detects genomic regions with neural
expression
18
UCRs common to all metazoan genomes?
Drosophila
Vertebrates
ETS
TIR
Homeobox
Paired
Cfc4
NHR ligand
von Willebrand factor type C domain
Laminin G
Immunoglobulin
Fibronectin type III
Cadherin
Cyclic nucleotide-binding domain
Neurotransmitter-gated ion-channel transmembrane region
Ligand-gated ion channel
Neurotransmitter-gated ion-channel ligand binding domain
BTB/POZ domain
19
UCRs in Drosophila
  • twist locus

20
  • Core promoters and responsiveness to long-range
    enhancers

21
A textbook-type core promoter
TATA
GC-box
CAAT
22
Large-scale mapping of transcription start sites
using CAGE (Cap Analysis of Gene Expression)
  • Like SAGE, but 5' ends of cDNAs (using RIKEN 5'
    GTP cap trapping technology)
  • Large-scale sequencing of 5' ends (CAGE tags of
    20-22 nucleotides) of mRNAs
  • 6.5 million mouse and 4 million human CAGE tags
    uniquely mapped to genome
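
Because each CAGE tag starts at the capped 5' end of a transcript, the first base of a mapped tag marks a transcription start site. The sketch below counts tags per start position; the tuple layout and the 0-based, half-open coordinates are assumptions made for illustration, not the format used by the project.

```python
from collections import Counter


def tss_counts(tags):
    """Count CAGE tags per transcription start position.

    tags: iterable of (chrom, start, end, strand) with 0-based,
    half-open coordinates (an assumed layout).
    Returns a Counter keyed by (chrom, strand, tss_position).
    """
    counts = Counter()
    for chrom, start, end, strand in tags:
        # The 5' end is the leftmost base on '+', the rightmost on '-'.
        tss = start if strand == "+" else end - 1
        counts[(chrom, strand, tss)] += 1
    return counts
```

Plotting these counts along a promoter gives the kind of tag distribution that the following slides show for individual genes.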

23
CAGE tags mapped to genome demarcate
transcription start sites
Myosin heavy chain 3 (Myh3), 1725 CAGE tags
TATA
Betaine-homocysteine methyltransferase (Bhmt),
1659 CAGE tags
TATA
24
CAGE tags mapped to genome demarcate
transcription start sites
Oxoglutarate dehydrogenase (Ogdh), 1496 CAGE tags
Adenylosuccinate lyase (Adsl), 278 CAGE tags
25
Single-peak (SP) vs. broad (BR) core promoters:
shape classes of core promoters
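
The slide gives no numeric criterion for the shape classes, so the rule below is purely illustrative: it calls a promoter single-peak when the middle half of its tags falls within a few base pairs, and broad otherwise. The function names and the 4 bp threshold are assumptions, and only two of the shape classes are modelled.

```python
def percentile_position(counts, q):
    """Smallest position at which the cumulative tag fraction reaches q.

    counts: dict mapping genomic position -> number of CAGE tags.
    """
    total = sum(counts.values())
    running = 0
    for pos in sorted(counts):
        running += counts[pos]
        if running >= q * total:
            return pos
    raise ValueError("no tags in promoter")


def shape_class(counts, max_sp_width=4):
    """Return 'SP' if the interquartile width is below the threshold, else 'BR'.

    max_sp_width is an assumed threshold in bp, not a value from the slides.
    """
    width = percentile_position(counts, 0.75) - percentile_position(counts, 0.25)
    return "SP" if width < max_sp_width else "BR"
```

For example, a promoter with nearly all tags at one position is labelled `SP`, while one with tags spread evenly over 50 bp is labelled `BR`.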
26
Association of shape classes with different core
promoter elements
Panel A | SP | BR | PB | MU
TATA (all) | 3.1e-73 | 1.9e-16 | 1.8e-10 | 2.4e-09
CCAAT (all) | 0.04 | 0.42 | 0.37 | 0.49
GC (all) | 1e-4 | 0.20 | 0.40 | 0.33
CpG (all) | 1.0e-137 | 1.4e-65 | 8.7e-06 | 0.02

Panel B | SP | BR | PB | MU
TATA (no CpG) | 2.6e-77 | 1.6e-16 | 2.8e-16 | 1.0e-09
CCAAT (no CpG) | 6.8e-23 | 9.2e-16 | 0.11 | 0.42
GC (no CpG) | 7.8e-25 | 5.9e-18 | 0.48 | 0.35
CpG (no TATA, CCAAT or GC) | 4.8e-45 | 4.7e-17 | 3.4e-05 | 0.87
SP (single peak) promoters are strongly associated
with TATA boxes.
BR (broad) promoters are strongly associated with
CpG islands and absence of a TATA box.
27
Association of shape classes with tissue
specificity
Tissue | SP | BR | PB | MU
adipose | 1.98 (P=0.14) | 0.27 (P=0.11) | 1.58 (P=0.29) | 0.44 (P=0.47)
cns | 1.02 (P=0.86) | 0.69 (P=0.0020) | 1.22 (P=0.10) | 1.23 (P=0.10)
embryo | 4.11 (P=1.21e-22) | 0.00 (P=6.22e-08) | 0.30 (P=0.0099) | 0.00 (P=8.096e-05)
liver | 2.15 (P=3.56e-21) | 0.41 (P=1.14e-14) | 0.71 (P=0.0053) | 1.07 (P=0.56)
lung | 2.41 (P=1.37e-10) | 0.23 (P=1.42e-08) | 1.11 (P=0.61) | 0.58 (P=0.049)
macrophage | 1.39 (P=0.024) | 0.64 (P=0.0041) | 0.89 (P=0.59) | 1.26 (P=0.14)
other | 3.59 (P=3.87e-19) | 0.11 (P=4.029e-07) | 0.33 (P=0.0049) | 0.36 (P=0.016)
testis | 4.36 (P=7.70e-06) | 0.00 (P=0.058) | 0.00 (P=0.21) | 0.00 (P=0.21)
SP (single peak) promoters (and, by association,
TATA-box promoters) are strongly associated with
tissue-specific genes (except brain).
BR (broad) promoters (and, by association, CpG
island overlapping TATA-less promoters) are
strongly associated with housekeeping genes (and
developmental regulatory genes).
Legend: P-value scale for overrepresented and
underrepresented classes: 1e-10, 1e-06, 0.0001,
0.01, 1.00
28
Conclusions
  • Key vertebrate (and most likely invertebrate)
    transcription factor genes are controlled by
    arrays of highly conserved regulatory elements;
    the arrays often span more than a megabase around
    their target genes.
  • Highly conserved regulatory elements contain
    clusters of putative transcription factor binding
    sites indicative of their function, enabling the
    building of predictive models.
  • There are fundamentally different classes of
    vertebrate core promoters, differing in mechanism
    of transcriptional initiation and choice of TSS,
    tissue specificity, evolutionary dynamics and
    responsiveness to long-range enhancers.

29
Acknowledgements
  • Lenhard Group at CGB, Karolinska Institutet (now
    at Bergen Center for Computational Science,
    University of Bergen)
  • Pär Engström (PhD student)
  • Ying Sheng (PhD student)
  • Albin Sandelin (Postdoc) now at RIKEN GSC
  • Sara Bruce (Project student) now at Dept. of
    Bioscience, Karolinska Institutet
  • Collaborators
  • RIKEN Genome Science Center
  • Piero Carninci and the members of FANTOM3
    Consortium
  • Wyeth Wasserman group (University of British
    Columbia)
  • Shannan Ho Sui, David Arenillas
  • Johan Ericson group (CMB, Karolinska Institutet)
  • Peter Bailey, Joanna Klos