Title: Computational Epigenetics
1Computational Epigenetics
Shen Jean Lim2, Tin Wee Tan2, Joo Chuan Tong1,2
1Data Mining Department, Institute for Infocomm Research
2Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore
2Epigenetics
- Study of mitotically and/or meiotically heritable changes in gene expression that are not encoded in the DNA sequence
- Mediated through chemical modifications of DNA and histones
- Alterations in chromatin structure that block or promote transcriptional initiation
Source: http://www.neb.com/nebecomm/tech_reference/epigenetics/overviews.asp
3Levels of Chromatin Packing
4Histones
Source: http://www.mun.ca/biology/scarr/Histone_Protein_Structure.html
- In eukaryotes, nuclear DNA is assembled into chromatin by histones.
- The octameric histone core is made up of two molecules each of histones H2A, H2B, H3 and H4.
- The core packages approximately 147 base pair segments of nuclear DNA into nucleosome core particles (NCP).
- Histone H1 further condenses the DNA by binding the linker segments between the nucleosome core particles.
5Histone Modifications
- Occur on the flexible N- and C-terminal tails of the histones or within their globular folds in the nucleosome core
- Histone modifications act individually or combinatorially
- Alter chromatin structure
- Affect transcription, repair, replication and chromatin condensation, and ultimately gene regulation
6Histone Modifications
Source: http://chemistry.gsu.edu/faculty/Zheng/pictures/nucleosome.jpg
- Enzymes involved in this process include DNA methyltransferases, histone deacetylases, histone acetylases, histone methyltransferases, histone demethylases, etc.
7Epigenetics Importance
- Epigenetic modulations are essential in many developmental processes
  - Tissue formation, organ formation and allele-specific gene expression
- Changes in normal epigenetic patterns can deregulate patterns of gene expression, resulting in adverse clinical outcomes
  - Psychiatric disorders, obesity, schizophrenia, Beckwith-Wiedemann syndrome, Alzheimer's disease
8Epigenetics as a research field
- Highly combinatorial in nature due to the array of diverse control elements
- The human genome contains roughly 23,000 genes that are active in specific cells at precise moments
- Post-translational modification may affect almost every solvent-accessible histone residue, allowing a high level of variability for signal transduction events
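To give a rough sense of the combinatorial scale described above, the following sketch counts the possible modification patterns under a deliberately simple assumption: each modifiable residue independently takes one of a fixed number of states. The function name and the site counts are illustrative only and are not taken from any measured data.

```python
def count_modification_states(n_sites: int, states_per_site: int = 2) -> int:
    """Count distinct modification patterns.

    Assumption (not from the slides): every site is independent and can take
    one of `states_per_site` states, e.g. 2 for modified/unmodified.
    """
    if n_sites < 0 or states_per_site < 1:
        raise ValueError("n_sites must be >= 0 and states_per_site >= 1")
    return states_per_site ** n_sites


if __name__ == "__main__":
    # Illustrative site counts only; real histone tails differ.
    for n in (5, 10, 20):
        print(f"{n} sites -> {count_modification_states(n)} patterns")
```

Even with only two states per site, the number of patterns doubles with every additional site, which is why exhaustive experimental coverage quickly becomes impractical.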
9Epigenetics as a research field
- Enormous combinatorial complexity requires a large number of experiments for systematic studies (e.g. DNA methylation profiling)
- Large-scale initiatives for the systematic mapping of epigenomic and related data
  - Alliance for the Human Epigenome and Disease (AHEAD) Task Force
  - The ENCyclopedia Of DNA Elements (ENCODE) Project Consortium
  - etc.
10Computational Epigenetics
- The huge quantity of experimental data generated requires appropriate bioinformatics infrastructure for meaningful analysis, modeling and prediction of DNA-protein interactions
- General and specialist databases
- Basic bioinformatics tools
- Sophisticated algorithms
13General databases
- A large amount of data relevant to epigenetic research is available in the scientific literature, molecular databases and case reports.
- PubMed - a primary source of data that provides high-level descriptions of biological entities and processes
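As an illustration of how literature in PubMed can be queried programmatically, the sketch below builds a search URL for the NCBI E-utilities ESearch endpoint. It only constructs the URL and does not contact the server; the helper name and the example search term are assumptions made for this example.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL for a PubMed search (hypothetical helper).

    No request is sent here; fetching and rate limiting are left to the caller.
    """
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example term chosen for illustration only.
    print(build_pubmed_query("histone deacetylase AND epigenetics", retmax=5))
```

Keeping URL construction separate from the network call makes the query easy to inspect and test before any request is made.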
14General databases
Databases described in the Nucleic Acids Research online Molecular Biology Database Collection (March 2009)
Total: 1,078 molecular biology databases
Galperin MY, Cochrane GR. Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009. Nucleic Acids Res 2009;37:D1-4
15General databases
- Major molecular databases
  - GenBank
  - DNA Data Bank of Japan
  - European Molecular Biology Laboratory
- Serve as worldwide repositories for nucleotide sequences of different origins
16General and specialist databases
- Databases for cell-, disease-, organism- and stage-specific gene expression
  - General: NCBI's Gene Expression Omnibus
  - Specialist: Gene Expression Nervous System Atlas, StemBase, etc.
- Allow for the identification of dynamic changes in gene expression in different cell types
17Epigenetics databases
- DNA methylation databases
  - For the study of methylation content data and methylation patterns
  - MethDB, MethPrimerDB
- Histone databases
  - Information on histones and histone fold-containing proteins
  - Important for research on the compaction and accessibility of eukaryotic and probably archaeal genomic DNA
  - National Human Genome Research Institute (NHGRI)'s Histone Database
- Cancer methylation databases
  - For analyzing irregular methylation patterns that are correlated with various cancers
  - PubMeth, MeInfoText
19Basic Bioinformatics Tools
- Traditional sequence analysis tools allow for the inference of functional, structural, or evolutionary relationships between DNA or protein sequences
  - E.g. ClustalW, the BLAST (Basic Local Alignment Search Tool) software suite, BLAT (BLAST-Like Alignment Tool) and TreeView
- Diverse applications involving
  - Homology searches of ortholog candidates for the KEGG/GENES database
  - Predicting the secondary structures of histone deacetylases
  - Homology modeling of DNA methyltransferases
  - Optimizing the activities of histone deacetylase inhibitors
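Tools such as BLAST use fast heuristics to find local similarities between sequences. The exact dynamic-programming formulation of local alignment is the Smith-Waterman algorithm, sketched below with a linear gap penalty. This is a minimal teaching example, not how BLAST or BLAT are implemented, and the scoring values are arbitrary assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Scoring parameters are illustrative defaults, not tuned values.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Only two rows of the matrix are kept because the sketch returns the score alone; recovering the aligned residues would require storing the full matrix and tracing back.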
21Sophisticated algorithms
- Computational models have been used extensively to support various epigenome mapping initiatives
  - Identification of ChIP enrichment sites (ChIPOTle, TileMap, Ringo)
  - Accurate mapping of short sequence reads generated by ChIP-seq to the reference genome (Blastn, BLAT)
  - Algorithms for short-read assembly (QPALMA, AMOScmp)
  - Data processing and quality assessment of bisulfite sequencing
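To make the bisulfite sequencing step more concrete: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. The sketch below estimates a per-position methylation level from reads that are assumed to be already aligned, gapless and on the forward strand. The function name and input layout are hypothetical, and real pipelines also handle the reverse strand, sequencing errors and incomplete conversion.

```python
from collections import defaultdict


def methylation_levels(reference: str, reads: list[tuple[int, str]]) -> dict[int, float]:
    """Fraction of reads showing C at each reference cytosine.

    `reads` is a list of (start, sequence) pairs; start is a 0-based,
    non-negative offset into `reference` (simplifying assumption).
    """
    methylated = defaultdict(int)
    covered = defaultdict(int)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue  # only reference cytosines are informative
            if base == "C":      # protected from conversion: methylated
                methylated[pos] += 1
                covered[pos] += 1
            elif base == "T":    # converted: unmethylated
                covered[pos] += 1
            # any other base is treated as an error and ignored
    return {pos: methylated[pos] / covered[pos] for pos in sorted(covered)}


if __name__ == "__main__":
    ref = "ACGTCGA"  # toy reference, not real data
    toy_reads = [(0, "ACGTTGA"), (0, "ACGTCGA"), (1, "TGTCG")]
    print(methylation_levels(ref, toy_reads))
```

Positions with no informative reads are omitted from the result, so a caller can distinguish "no coverage" from "0% methylation".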
22Major Research Areas in Computational Epigenetics
DNA Methylation
Histone Modifications
Cancer Informatics
Stem Cell Informatics
24Research area: DNA Methylation
- Modeling and prediction of DNA methylation patterns
- Prediction of methylation sites
  - Focused on arginine and lysine methylations
25Research area: DNA Methylation
- Epigenome prediction pipeline
  - Integrates DNA methylation, polymerase II preinitiation complex binding, histone H3K4 di- and trimethylation, histone H3K9/14 acetylation, DNase I hypersensitivity and SP1 binding
26Research area: DNA Methylation
- Limitation
  - Lack of publicly available experimental data for model construction
28Research area: Histone Modifications
- Analysis, modeling and prediction of histone modifications in DNA sequences
- Machine-learning algorithms for locating histone-occupied positions and acetylation, methylation and phosphorylation positions in DNA sequences
- Discovery of activating and repressive histone modifications
- Structure-based techniques for the design of epigenetic inhibitors
- Functional annotation of epigenetic factors
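Machine-learning approaches to sequence-based prediction need a numeric representation of each DNA segment. One common and simple choice is a vector of k-mer counts, sketched below. The slides do not say which features the cited methods use, so this encoding is an assumption chosen purely for illustration.

```python
from itertools import product


def kmer_counts(sequence: str, k: int = 2) -> list[int]:
    """Count every DNA k-mer in `sequence`, in lexicographic order (AA, AC, ...).

    K-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[kmer] for kmer in kmers]


if __name__ == "__main__":
    # Toy sequence; the resulting vector has 4**k entries.
    print(kmer_counts("ACGCG", k=2))
```

A fixed-length vector like this can be passed to any standard classifier; the choice of k trades off resolution against the number of features.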
29Research area: Histone Modifications
31Research area: Cancer Informatics
- Identify novel methylation patterns that correlate with progression to malignancy
- CancerDip Consortium
- Abnormal DNA methylation within CpG islands
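CpG islands are usually identified from sequence composition. A widely used rule of thumb (the Gardiner-Garden and Frommer criteria) asks for a minimum length, a GC content of at least 50% and an observed/expected CpG ratio of at least 0.6, where the ratio is (number of CpG × length) / (number of C × number of G). The sketch below applies these checks to a single sequence; the function names are hypothetical and the thresholds are exposed as parameters because different tools use different values.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides cannot overlap each other
    gc_content = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply the composition thresholds to one candidate region."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content >= min_gc and obs_exp >= min_oe


if __name__ == "__main__":
    # Artificial sequences used only to exercise the function.
    print(looks_like_cpg_island("CG" * 100), looks_like_cpg_island("AT" * 100))
```

In practice the test is run over sliding windows along a chromosome rather than on a single pre-cut region.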
32Research area: Cancer Informatics
- Classifying cancer subtypes based on epigenetic marks
34Research area: Stem Cell Informatics
- Study epigenetic marks in stem cells
- DNA methyltransferases and Polycomb/Trithorax group response elements (PRE/TRE) possess epigenetic signatures that are important for the differentiation of both human ES cells and germ line stem cells
- Stem cells are target cells for cancer
- Epigenetic changes may occur long before the cells are distinguishable as tumor cells
35Research area: Stem Cell Informatics
- Analyses of up- and down-regulated gene clusters
- Provide valuable information on the effect of exogenous control on the ES cell state in humans
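A first step in analysing up- and down-regulated genes is to label each gene by its change in expression between two conditions. The sketch below uses a log2 fold change with a pseudocount and a fixed threshold. The gene names, expression values and threshold are all hypothetical, and real analyses would add replicates and statistical testing before any clustering.

```python
import math


def classify_regulation(control: dict[str, float], treated: dict[str, float],
                        threshold: float = 1.0,
                        pseudocount: float = 1.0) -> dict[str, tuple[float, str]]:
    """Label genes as 'up', 'down' or 'unchanged' by log2 fold change.

    Only genes present in both conditions are considered. The pseudocount
    avoids division by zero for unexpressed genes.
    """
    result = {}
    for gene in control.keys() & treated.keys():
        lfc = math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        if lfc >= threshold:
            label = "up"
        elif lfc <= -threshold:
            label = "down"
        else:
            label = "unchanged"
        result[gene] = (lfc, label)
    return result


if __name__ == "__main__":
    # Made-up expression values for illustration only.
    control = {"GENE_A": 9, "GENE_B": 39, "GENE_C": 10}
    treated = {"GENE_A": 39, "GENE_B": 9, "GENE_C": 11}
    for gene, (lfc, label) in sorted(classify_regulation(control, treated).items()):
        print(gene, round(lfc, 3), label)
```

A threshold of 1.0 on the log2 scale corresponds to a two-fold change, which is a common but arbitrary cut-off.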
36Conclusion
- Realizing the full benefits of the informatics revolution will require significant advances in the efficiency with which new data is discovered, processed, interpreted and made accessible to researchers
37Conclusion
- Different bioinformatic and mathematical modeling approaches, in combination with advances in computational infrastructures, could lead to an improved understanding of posttranslational modifications at multiple levels of complexity, from the sub-cellular molecular level to the cellular and systems levels, and beyond