Title: Methylation predictors
1. Methylation predictors
2. Background
- Methylation is assumed to target the majority of CpG sites in the human genome.
- As a consequence of mutational processes, CpG dinucleotides are depleted across the human genome.
- However, there are regions that are rich in CpGs, and they tend to be less methylated (CpG islands).
- As usual in biology, not all CpG islands stay unmethylated, and it is not clear why.
3. Goal of computational methylation
- Predict from the DNA sequence which genomic regions are methylated and which are not. In particular, we wish to predict which CGIs are methylated and which are unmethylated in different tissues.
4. Two fundamental papers (1)
- Predicting aberrant CpG island methylation, Feltus et al., PNAS, 2003
- Proof of concept
- Methylated and unmethylated CGIs have different intrinsic DNA properties.
5. Two fundamental papers (2)
Methylator: http://bio.dfci.harvard.edu/Methylator/
6. The scheme
- Since there is no good model of the methylation status of CGIs, the common approach is statistical.
- Step 0: Classify regions according to your favorite CpG island definition.
- Step 1: Find a set of regions that are known to be methylated or unmethylated.
- Step 2: Build a classifier that distinguishes between methylated and unmethylated CpG islands using different genomic parameters.
- Step 3: After defining the discriminating parameters, use them to predict the methylation status.
- Step 4: Verify your best predictions in an experiment that shows you are right in at least 90% of the cases.
7. Step -1: Receive large-scale data sets, somehow
8. Step 0: Define CpG island criteria
- Gardiner-Garden
  - GC content above 50%
  - Ratio of observed to expected number of CpG dinucleotides above 0.6
  - More than 200 base pairs in length
- Takai and Jones
  - GC content above 55%
  - Ratio above 0.65
  - More than 500 base pairs in length
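
The two threshold sets above can be checked on a candidate sequence roughly as follows. This is a minimal sketch, not the published implementations: the function names and the `CRITERIA` table are made up for illustration, the observed/expected ratio is taken as (#CpG x length) / (#C x #G), and treating every threshold as a strict "greater than" is an assumption based on the wording of the slide.

```python
# Hypothetical helper names; thresholds are taken from the slide text.
CRITERIA = {
    # name: (min GC fraction, min obs/exp CpG ratio, min length in bp)
    "gardiner_garden": (0.50, 0.60, 200),
    "takai_jones": (0.55, 0.65, 500),
}


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def is_cpg_island(seq: str, criterion: str = "gardiner_garden") -> bool:
    """Apply one threshold set; strict inequalities are an assumption."""
    min_gc, min_ratio, min_len = CRITERIA[criterion]
    return (
        len(seq) > min_len
        and gc_content(seq) > min_gc
        and cpg_obs_exp(seq) > min_ratio
    )


if __name__ == "__main__":
    toy = "CG" * 150  # 300 bp toy sequence, not real genomic data
    print(is_cpg_island(toy, "gardiner_garden"))
    print(is_cpg_island(toy, "takai_jones"))
```

In practice these checks are applied in a sliding window along the genome; the sketch only scores one sequence at a time.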
9. Step 0: Define CpG island criteria
- The previous criteria are too sharp.
- Recently, new criteria for defining the strength of CpG islands have been suggested, which take continuous values.
- Bock et al. (PLoS Computational Biology, 2007): use other types of epigenetic information to classify CGIs. For example, histone modifications: some types of modifications are indicative of transcriptional activation, others of repression.
- Tanay et al. (PNAS, 2007): method based on conservation.
10. Step 1: Map methylated and unmethylated CGIs
- Bock et al. (March 2006, PLoS Genetics): lymphocytes, HEP (Rakyan et al., 2004)
- Fang et al. (July 2006, Bioinformatics): brain, 30 Mb DNA (Rollins et al., 2006), HEP, MethDB
- Das et al. (July 2006, PNAS): similar to Fang et al.
11. Step 2: Build classifier. 2.1 Choose your DNA attributes
- DNA sequence properties and patterns.
- Repeat frequency and distribution
- CpG island frequency and distribution
- Predicted DNA structure
- Gene and exon distribution
- Predicted transcription factor binding sites
- Evolutionary conservation
- Single nucleotide polymorphisms (SNPs)
Bock et al.
14. Step 2: Build classifier. 2.1 Choose your attributes
- Fang et al. used:
  - GC content
  - CpG ratio
  - TpG
  - Overlap of CGI with AluY
  - 74 TFBS, using TRANSFAC matrices
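
The purely sequence-derived attributes in this list can be gathered into one feature record per CGI, ready to pass to a classifier. The sketch below is illustrative only: `sequence_features` is a hypothetical name, and reading "TpG" as the fraction of dinucleotide positions occupied by TG is an assumption, since the slide does not define it. AluY overlap and TFBS hits need external annotation and are left out.

```python
from collections import Counter


def dinucleotide_counts(seq: str) -> Counter:
    """Count all overlapping dinucleotides in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def sequence_features(seq: str) -> dict:
    """GC content, CpG observed/expected ratio, and TpG frequency.

    The TpG definition (TG count / number of dinucleotide positions)
    is an assumption made for this sketch.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    di = dinucleotide_counts(seq)
    c, g = seq.count("C"), seq.count("G")
    return {
        "gc": (c + g) / n,
        "cpg_ratio": di["CG"] * n / (c * g) if c and g else 0.0,
        "tpg": di["TG"] / (n - 1),
    }


if __name__ == "__main__":
    print(sequence_features("TGCGTTACGCGA"))  # toy sequence
```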
15. The top 4 discriminating TFBS
Fang et al. (2006)
16. TFBS of known neural regulators are found 3-fold more often in unmethylated CGIs
- AP1: TF family that regulates gene expression in neural cells. Its members can bind only to unmethylated sites.
- KROX/Egr: regulates genes related to neural plasticity. Known to bind to unmethylated sites.
- ZF5: expressed in neural tissues. Its preference for methylated sites is not known.
- FOXM
17. Das et al.: 17 features are enough
18. Step 2: Build classifier. 2.2 Choose your method
- Bock et al.: SVM
- Fang et al.: compared different methods but chose SVM
- Das et al.: similar to Fang et al.
19. Step 3: Train and predict. How good is your prediction?
- TN: True Negative
- TP: True Positive
- FN: False Negative
- FP: False Positive
- SP: Specificity
- SE: Sensitivity
- ACC: Accuracy
- Correlation coefficient
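
The measures listed above can all be computed from the four confusion-matrix counts. The sketch below uses the standard definitions SE = TP/(TP+FN), SP = TN/(TN+FP), ACC = (TP+TN)/total. The slide does not say which correlation coefficient is meant; the Matthews correlation coefficient is assumed here, and `classification_metrics` is a hypothetical helper name.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, specificity, accuracy and correlation coefficient.

    CC is computed as the Matthews correlation coefficient, which is an
    assumption; undefined ratios (zero denominators) are reported as 0.0.
    """
    total = tp + tn + fp + fn
    se = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "CC": cc}


if __name__ == "__main__":
    # Toy counts for illustration, not results from any of the papers.
    print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```

The correlation coefficient is useful when methylated and unmethylated CGIs are unbalanced in the data set, because accuracy alone can look high for a classifier that always predicts the majority class.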
20. Step 3: Train and predict (Bock et al.)
21. Step 3: Train and predict (Fang et al.)
22. Comparison