Title: TraFaC
1TraFaC
- Detecting Cis-Regulatory Element Clusters in
Orthologous and Coordinately Expressed Genes
Anil Jegga, Pediatric Informatics, CHRF-CHMCC and UC
http://trafac.chmcc.org
2Outline..
- Basic Problem
- Background
- Methodology
- Some Applications, Some Examples
- Known Regulatory Regions
- Co-expressed Genes
- Potential Novel Regulatory Regions
- Some Bottlenecks
- Future: Imminent and Long-term
3Problem and the Approach..
- Our Approach
- Sequence Similarity: Phylogenetic Footprints
- Transcription Factor (TF) Binding Sites
- Binding Site Cluster Similarity (merge 1 and 2)
- Identifying potential regulatory regions
- Deciphering secret passwords for cell-type specific gene expression
- Searching for cis-modules in genomic DNA
4Courtesy: http://www.bios.co.uk
5Basic Structure of Transcription Regulatory Machinery
[Figure: Eukaryotic Promoter. Labels: DNA, Exons, Introns, mRNA, 3' UTR, Transcription factors, TBP]
6[Figure: Gene's regulatory region. Labels: RNA polymerase II, Repressor complex 1, Enhancer complex, RE, Transcription start site, Gene, DNA]
No. Because the DNA bends extensively.
7The Motivation
Comparative Analysis of Homologous Sequences
After >900 Myrs of divergence, essential functional elements remain conserved!
E.g. PAX6: Human vs. Puffer fish (Fugu rubripes)
http://bio.cse.psu.edu/Multipipmaker
8More Examples.
N-myc protein Oncogene, 2nd Intron
HMG-1 3' UTR
9Regulation of Gene Expression: An Overview
- Transcriptional
- Tissue-specific transcription factors
- Direct binding of hormones, growth factors, etc.
- Use of alternative promoters
- Post-transcriptional
- Alternative splicing
- Alternative polyadenylation
- Tissue-specific RNA editing
- Translational control mechanisms
- Epigenetic mechanisms: Chromatin structure
10Some Basic Questions.
- What is Gene Expression?
- It is the ability of a gene to produce a biologically active protein.
- What is a Promoter?
- A combination of short sequence elements to which RNA polymerase binds in order to initiate transcription.
- What is an Enhancer?
- A set of short sequence elements that stimulate transcription. They function independently of position or orientation.
- What is a Silencer/Repressor?
- A set of short sequence elements that suppress transcription.
11Points to Remember.
- Cis-regulators are on the same chromosome as the gene.
- Trans-regulators are transcription factors that act on many different genes, on different chromosomes.
- Transcription factors are proteins with DNA-binding domains, which bind to specific DNA sequences.
- Transcription factors usually do not work alone.
12Methodology.
Genomic DNA Sequences
PipMaker (To find conserved regions)
MatInspector/Match (To find putative TF binding sites)
TRAFAC
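The flow above can be read as a filtering step: putative TF binding sites are kept only where they fall inside conserved regions. The following is a minimal Python sketch of that idea. The data layouts (conserved regions as `(start, end)` intervals, site hits as `(tf_name, start, end)` tuples), the coordinates, and the function name `sites_in_conserved_regions` are illustrative assumptions; they are not the actual PipMaker, MatInspector/Match, or TraFaC formats.

```python
from typing import List, Tuple

Interval = Tuple[int, int]        # (start, end), inclusive coordinates
SiteHit = Tuple[str, int, int]    # (tf_name, start, end)


def sites_in_conserved_regions(conserved: List[Interval],
                               hits: List[SiteHit]) -> List[SiteHit]:
    """Keep only the hits that lie entirely inside a conserved interval."""
    kept = []
    for tf, start, end in hits:
        if any(c_start <= start and end <= c_end for c_start, c_end in conserved):
            kept.append((tf, start, end))
    return kept


# Hypothetical inputs: two conserved blocks and four putative sites.
conserved = [(100, 180), (400, 520)]
hits = [("MEF2", 120, 131), ("SP1", 200, 209),
        ("MYOD", 410, 421), ("GATA1", 515, 525)]

filtered = sites_in_conserved_regions(conserved, hits)
print(filtered)
```

Requiring full containment is a deliberate simplification; a site that only partly overlaps a conserved block (the hypothetical GATA1 hit here) is dropped.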
13Frequency Matrix for muscle gene MEF2 sites
Pos      1     2     3     4     5     6     7     8     9    10    11    12
A     10.0   0.0   0.0   0.0  22.0   0.0   6.0   2.0   3.0   4.0  22.0  10.0
C      0.0   2.0  12.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
G      9.0  20.0   2.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  10.0
T      3.0   0.0   8.0  22.0   0.0  22.0  16.0  20.0  19.0  18.0   0.0   2.0
Cons     a     g     c     t     a     t     t     t     t     t     a     a
Position Weight Matrix (PWM)
A     0.74 -2.51 -2.51 -2.51  1.80 -2.51  0.10 -1.07 -0.68 -0.37  1.80  0.74
C    -2.51 -1.07  0.98 -2.51 -2.51 -2.51 -2.51 -2.51 -2.51 -2.51 -2.51 -2.51
G     0.61  1.67 -1.07 -2.51 -2.51 -2.51 -2.51 -2.51 -2.51 -2.51 -2.51  0.74
T    -0.68 -2.51  0.46  1.80 -2.51  1.80  1.36  1.67  1.60  1.52 -2.51 -1.07
Frequency Matrix is generated by aligning the MEF2 sites:
>aldolase A enhancer, Hum (X06351 1956-2016 MEF2 at 30/31)
gaatgtcaggggcttcaggtttccctaaatataggtccctgccagaggatccgtggcggg
>aldolase A enhancer, Rat (X04260 421-480 MEF2 at 30/31)
gaatgtcaggggcctcaggtttcactaaatataggtccttgccgcgggattcgtggtggg
>desmin enh, myotube-spec region, Hum (M63391 2257-2316 MEF2 30/31)
tgtttcccagccatgcgttctcctctataaatacccgctctggtatttggggttggcagc
>desmin enh, myotube-spec region, Mouse (Z18892 89-148 MEF2 30/31)
cctagctgggcctttccttctcctctataaataccagctctggtatttcgccttggcagc
And so on, for 22 sequences.
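The step from the frequency matrix to the position weight matrix can be sketched in Python. The slides do not state the formula. The log2-odds form below, with a pseudocount of sqrt(N)/4 per base and a uniform background of 0.25, is an assumption chosen because it reproduces the PWM values shown above to two decimals for N = 22 sites. Reading the matrix rows as A, C, G, T is likewise inferred from the consensus line. The names `counts_to_pwm` and `score_site` are illustrative.

```python
import math

# Counts per position, read from the MEF2 frequency matrix above.
# Row order A, C, G, T is an assumption inferred from the consensus line.
COUNTS = {
    "A": [10, 0, 0, 0, 22, 0, 6, 2, 3, 4, 22, 10],
    "C": [0, 2, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "G": [9, 20, 2, 0, 0, 0, 0, 0, 0, 0, 0, 10],
    "T": [3, 0, 8, 22, 0, 22, 16, 20, 19, 18, 0, 2],
}


def counts_to_pwm(counts, background=0.25):
    """Convert base counts to log2-odds weights.

    Assumed formula (not stated in the slides):
        w = log2(((c + sqrt(N)/4) / (N + sqrt(N))) / background)
    where N is the number of aligned sites.
    """
    n_sites = sum(counts[base][0] for base in "ACGT")
    root = math.sqrt(n_sites)
    return {
        base: [math.log2((c + root / 4) / (n_sites + root) / background)
               for c in counts[base]]
        for base in "ACGT"
    }


def score_site(pwm, site):
    """Sum the weight of each base of `site` (must match the PWM width)."""
    return sum(pwm[base][i] for i, base in enumerate(site.upper()))


pwm = counts_to_pwm(COUNTS)
for base in "ACGT":
    print(base, " ".join(f"{w:6.2f}" for w in pwm[base]))
print("consensus score:", round(score_site(pwm, "agctatttttaa"), 2))
```

A candidate site is scored by summing one weight per position. Scanning a longer sequence would repeat `score_site` over every 12-base window on both strands, which is omitted here for brevity.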
17Functionalities of Trafac
18Cycle of Events in Identifying Potential
Regulatory Regions
mRNA ESTs
19ADA: Known Regulatory Regions
20Known Regulatory Regions
21Coordinately Expressed Genes (Bates et al., 2002)
22Constitutionally Similar Cis-elements Clusters in
the Upstream Region
23Coordinately Expressed Genes Tend to Share
Cis-elements
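One simple way to look for this tendency is to count, for each TF binding site type, how many genes in a co-expressed group carry it. The sketch below is an illustration only, not the TraFaC implementation. The gene identifiers, the site lists, and the function name `shared_elements` are hypothetical.

```python
from collections import Counter


def shared_elements(sites_by_gene, min_genes):
    """Return TF site types found in at least `min_genes` genes, sorted by name."""
    # Count each TF once per gene, however many sites that gene has for it.
    counts = Counter(tf for tfs in sites_by_gene.values() for tf in set(tfs))
    return sorted(tf for tf, n in counts.items() if n >= min_genes)


# Hypothetical co-expressed genes and their putative binding-site types.
sites_by_gene = {
    "gene_a": ["MEF2", "MYOD", "SRF"],
    "gene_b": ["MEF2", "SRF", "SP1"],
    "gene_c": ["MEF2", "MYOD", "MEF2"],
}

shared = shared_elements(sites_by_gene, min_genes=2)
print(shared)
```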
24Potential Novel Regulatory Regions
25Identification of Potential Regulatory Regions
26Tissue-specific Genes Share Cis-elements
27Road-Blocks.
- Unknown TF Binding Sites: Analysis based on Transfac Library
- Pre-Trafac Analysis
- Ortholog Genomic Sequences
- Input Files
28Conclusions.
- Phylogenetically conserved non-coding regions are good indicators of regions of gene regulatory functions.
- Combinations of TF binding sites in the same relative order and distance can be reliable indicators of potential regulatory regions.
- Coordinately expressed genes tend to share similar TF binding sites.
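The second conclusion, about binding sites in the same relative order and distance, can be illustrated with a small Python sketch that compares site pairs between two orthologous sequences. This is a simplified illustration under stated assumptions, not the TraFaC algorithm. Sites are reduced to `(tf_name, position)` tuples, the `max_span` and `tolerance` parameters are hypothetical, and the coordinates are invented.

```python
from itertools import combinations


def ordered_pairs(sites, max_span):
    """List (upstream_tf, downstream_tf, distance) for sites within max_span bp."""
    ordered = sorted(sites, key=lambda s: s[1])
    pairs = []
    for (tf1, p1), (tf2, p2) in combinations(ordered, 2):
        if p2 - p1 <= max_span:
            pairs.append((tf1, tf2, p2 - p1))
    return pairs


def conserved_pairs(sites_a, sites_b, max_span=200, tolerance=10):
    """Pairs present in both sequences in the same order with similar spacing."""
    pairs_b = ordered_pairs(sites_b, max_span)
    shared = []
    for tf1, tf2, d_a in ordered_pairs(sites_a, max_span):
        if any(tf1 == u and tf2 == v and abs(d_a - d_b) <= tolerance
               for u, v, d_b in pairs_b):
            shared.append((tf1, tf2))
    return shared


# Hypothetical site positions in two orthologous upstream regions.
human = [("MEF2", 30), ("MYOD", 75), ("SP1", 300)]
mouse = [("MEF2", 28), ("MYOD", 70), ("SP1", 150)]

conserved = conserved_pairs(human, mouse)
print(conserved)
```

Only the MEF2/MYOD pair survives here: it appears in the same order in both sequences with spacings of 45 and 42 bp, while the pairs involving SP1 are too far apart in the first sequence.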
29A Word of Caution.
- Trafac is NOT a solution-provider. It ASSISTS in finding solutions. It can be an effective filtration tool.
- Any computationally identified regulatory region has to be validated experimentally.
- Transcriptional regulation is NOT the only mode of control of gene expression.
30Present Status of Trafac Database
31What's Next?
- Addition of more groups of genes to the database: a continuous process
- Database of regulatory modules (Cis-Mols), based on
- Tissue Specificity
- Gene Ontology Functionalities
- Phenotype
- Gene Expression Pattern
- Multiple Comparison of Genomic DNA Sequences
- A loci comparison instead of individual genes
32The Team
Acknowledgements
Bruce Aronow, Shawn Sherwood, Jim Carman, Andrew Pinski, Anil Jegga
NIEHS U01 ES11038 Mouse Centers Genomics Consortium
Howard Hughes Medical Institute