Title: Analysis of DNA Replication
1Analysis of DNA Replication
- Dutta Lab
- University of Virginia
- July 13th, 2005
2Previous Work
- Proc Natl Acad Sci U S A. 2005 May 3;102(18):6419-24.
Epub 2005 Apr 21. Temporal profile of replication of
human chromosomes. Jeon Y, Bekiranov S, Karnani N,
Kapranov P, Ghosh S, MacAlpine D, Lee C, Hwang DS,
Gingeras TR, Dutta A.
- Chromosome 21/22 replication profile was studied
in the above work.
- The following analysis extends this work to the
ENCODE regions and develops new analysis
techniques
3Overview
- Temporal Profiling of DNA Replication
- Replication Tracks (Available on UCSC GB)
- TR50 calculation
- Segregation
- Correlations with
- Gencode
- AT content
- DnaseI
4Replication Tracks
- e.g. ENm005
- Generated with Affy GTrans utility
- Regions exceeding the 10^-4 p-value threshold.
- Direct correlations are difficult because of overlap
across different time points.
5TR50 Calculation
TR50: time at which 50% of the locus is replicated.
In the example below, probe A has a TR50 of 1.25 hr
(80% at 2 hr, 0% at 0 hr) and probe B has a TR50 of
6.33 hr (100% at 8 hr, 40% at 6 hr).
[Figure: example for probe A and probe B]
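The two TR50 values on this slide are consistent with linear interpolation between the adjacent time points that bracket 50%. The sketch below makes that assumption explicit; the function name `tr50` and its argument layout are illustrative and are not taken from the original analysis.

```python
def tr50(times, percent_replicated):
    """Return the time (hr) at which cumulative replication first reaches 50%.

    Assumption: replication increases linearly between adjacent time points.
    Returns None if the locus never reaches 50% in the sampled range.
    """
    points = list(zip(times, percent_replicated))
    # Already at or above 50% at the first sampled time point.
    if points and points[0][1] >= 50:
        return points[0][0]
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if p0 < 50 <= p1:
            # Linear interpolation inside the bracketing interval.
            return t0 + (50 - p0) / (p1 - p0) * (t1 - t0)
    return None


# The two probes from the slide.
probe_a = tr50([0, 2], [0, 80])    # 0% at 0 hr, 80% at 2 hr
probe_b = tr50([6, 8], [40, 100])  # 40% at 6 hr, 100% at 8 hr
```

With these inputs, `probe_a` is 1.25 hr and `probe_b` is about 6.33 hr, matching the values quoted above.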
6Specificity of Regions
- Some regions show a pan-S-phase (not time-point-specific)
pattern of replication
- To identify such regions we classify probes as
specific or non-specific
- A probe is specific iff at least 50% of the total
signal appears in a single time point.
- Utilizing this classification, we can identify
broad regions as specific or non-specific
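The probe-level rule above can be sketched as follows. This is a minimal illustration: the function name `is_specific`, the list-of-signals input, and the handling of probes with no signal are assumptions, not part of the original utility.

```python
def is_specific(signal_by_time_point, min_fraction=0.5):
    """Return True if a single time point carries at least `min_fraction`
    of the probe's total signal (50% by default, as on the slide).

    Assumption: signals are non-negative; a probe with no signal at all
    is treated as non-specific.
    """
    total = sum(signal_by_time_point)
    if total <= 0:
        return False
    return max(signal_by_time_point) / total >= min_fraction


# Hypothetical signals across four time points.
peaked = is_specific([10, 70, 10, 10])  # one time point holds 70% of the signal
flat = is_specific([25, 25, 25, 25])    # signal spread evenly across S-phase
```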
7Segregation
- Various patterns of replication are seen in the
DNA replication tracks
- We segregate regions into 4 classes
- Early: specific replication early in S-phase
- Mid: specific replication mid in S-phase
- Late: specific replication late in S-phase
- Pan-S: non-specific replication
8Region Classifier
- We pass a sliding window (10,000 bp) across each
region and accumulate within the window the ratio
of non-specific vs. specific probes and the
average TR50
- Once the ratio exceeds a threshold (e.g. 60%) a
Pan-S region is started
- In order to prevent thrashing, a tolerance (e.g.
10%) is used and the Pan-S region is not ended
until we drop below threshold minus tolerance (e.g. 50%)
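The start/stop rule above is a hysteresis scheme, and a minimal sketch of it follows. Several details are assumptions made for illustration: the "ratio" is taken to be the fraction of non-specific probes among all probes in the window, the window is trailing (it ends at the current probe), and the name `pan_s_regions` and the `(position, is_specific)` input are hypothetical. The average TR50 accumulated in the window is left out.

```python
from collections import deque


def pan_s_regions(probes, window=10_000, threshold=0.60, tolerance=0.10):
    """Return (start, end) probe positions of Pan-S regions.

    probes: list of (position_bp, is_specific) tuples sorted by position.
    A region starts when the non-specific fraction in the window exceeds
    `threshold` and ends only when it drops below `threshold - tolerance`.
    """
    in_window = deque()
    non_specific = 0
    regions = []
    start = None
    last_pos = None
    for pos, specific in probes:
        in_window.append((pos, specific))
        non_specific += int(not specific)
        # Drop probes that have fallen out of the trailing window.
        while in_window[0][0] <= pos - window:
            _, old_specific = in_window.popleft()
            non_specific -= int(not old_specific)
        fraction = non_specific / len(in_window)
        if start is None and fraction > threshold:
            start = pos  # open a Pan-S region
        elif start is not None and fraction < threshold - tolerance:
            regions.append((start, last_pos))  # close it at the previous probe
            start = None
        last_pos = pos
    if start is not None:
        regions.append((start, last_pos))
    return regions


# Hypothetical probes every 1,000 bp: specific, then non-specific, then specific.
example_probes = (
    [(i * 1000, True) for i in range(10)]
    + [(i * 1000, False) for i in range(10, 30)]
    + [(i * 1000, True) for i in range(30, 50)]
)
example_regions = pan_s_regions(example_probes)
```

Because the region closes only below threshold minus tolerance, a window that hovers around the 60% threshold does not repeatedly open and close regions.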
9Specific Classification
- Within a specific region, we classify sub-regions
as early, mid, or late based on the average TR50
- For ENCODE regions
- 23% Pan-S
- 77% Specific
  - 35% Early
  - 38% Mid
  - 27% Late
10Advantages of Segregation
- Much more suited to correlations with other data
than current replication tracks
- Automated identification of non-specific regions
- Provides an independent method to compare with
the current replication tracks
11Examples
ENm005
ENm012
12Correlations Overview
- Segregation utility was just completed
- Not part of data freeze
- Currently working on correlations
- We will show previously completed correlations
with TR50 data
- All ENCODE regions pooled together
13Gencode VS TR50
For each region we calculate Gene Density for
each base pair based on GENCODE annotations. Gene
Density was calculated using a sliding window of
50 kb. Lowess smoothing was done with Gene Density
on the x-axis and TR50 on the y-axis, and a
Spearman rank correlation was calculated
(-0.283). We also plot a density profile, which
shows the number of probes with a given Gene
Density.
[Figures: TR50 vs. Gene Density; Frequency (# of probes) vs. Gene Density]
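For reference, a Spearman rank correlation such as the one quoted on this slide can be computed as the Pearson correlation of the ranks, with tied values given their average rank. The sketch below uses only the standard library. The function names and the toy input are illustrative and do not come from the ENCODE data.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Assumes x and y have equal length and neither is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data: a perfectly decreasing relationship gives -1.
toy_rho = spearman([0.1, 0.2, 0.3, 0.4], [8.0, 6.5, 3.0, 1.0])
```

A negative value, as reported here for Gene Density vs. TR50, means that probes in gene-dense windows tend to have lower (earlier) TR50 values.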
14AT content vs TR50
For each region we calculate AT content for each
base pair based on hg16. AT content was
calculated using a sliding window of 50 bp. Lowess
smoothing was done with AT Content on the x-axis
and TR50 on the y-axis, and a Spearman rank
correlation was calculated (1.0). We also plot a
density profile, which shows the number of
probes with a given AT content.
[Figures: TR50 vs. AT Content; Frequency (# of probes) vs. AT Content]
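A sliding-window AT content calculation of the kind described on this slide can be sketched as below. The function name `at_content` is hypothetical, and the sketch reports one value per full window rather than one value per base pair. How the original analysis aligned windows to bases is not stated, so that choice is an assumption.

```python
def at_content(sequence, window=50):
    """Return the AT fraction of every full window of `window` bases.

    Element i of the result covers sequence[i : i + window].
    Assumption: any base other than A or T (including N) counts as non-AT.
    """
    seq = sequence.upper()
    if len(seq) < window:
        return []
    is_at = [base in ("A", "T") for base in seq]
    count = sum(is_at[:window])
    fractions = [count / window]
    for i in range(window, len(seq)):
        # Slide by one base: add the entering base, remove the leaving one.
        count += is_at[i] - is_at[i - window]
        fractions.append(count / window)
    return fractions


# Toy sequence with a 4 bp window (the slide uses 50 bp).
toy_at = at_content("AATTGGCC", window=4)
```

The running count keeps the cost linear in sequence length, which matters when the window is passed over whole ENCODE regions.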
15Correlation: DnaseI HS vs TR50
- For each DnaseI hypersensitive site identified, a
window is chosen around it and the average TR50 value
is calculated
- This data is pooled from all the regions and a
histogram of TR50s is plotted
- Also, as a background model, a histogram of TR50
from all the regions is plotted
- A Mann-Whitney test is performed on these two
distributions
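The comparison of the two TR50 distributions can be illustrated with a small standard-library version of the Mann-Whitney U test. This is a sketch under stated assumptions: it uses the normal approximation for the two-sided p-value with no tie or continuity correction, so it is only reasonable for fairly large samples. The function name and the toy samples are hypothetical; the original analysis does not say which implementation was used.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U statistic for sample_a, two-sided p-value).

    Assumptions: normal approximation, no tie correction, no continuity
    correction; tied values receive their average rank.
    """
    pooled = [(v, 0) for v in sample_a] + [(v, 1) for v in sample_b]
    pooled.sort(key=lambda pair: pair[0])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        members_of_a = sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        rank_sum_a += rank * members_of_a
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_two_sided


# Toy TR50 values (hr): windows around sites vs. a background sample.
toy_u, toy_p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A small U for the first sample means its values mostly rank below those of the second sample, which is the direction expected if hypersensitive sites replicate earlier than the background.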
16Correlation: DnaseI HS with TR50
- DnaseI hypersensitive sites are associated with
lower TR50 values as compared to the total
distribution of TR50s
- The medians of the two distributions are
significantly different with a p-value of 0.0075
17Acknowledgements
- Anindya Dutta, PI
- Neerja Karnani, PostDoc
- Patrick Boyle, PostDoc
- Chris Taylor, Doctoral Student
- Ankit Malhotra, Doctoral Student