CGH Data - PowerPoint PPT Presentation

Provided by: MarkRe80

1
CGH Data
  • BIOS 691-804

2
Chromosome Re-arrangements
3
Normal Human Variation
4
Array CGH Technology
5
Chromosome 8 (241 genes) in 10 cell lines and many tumor samples
6
Pre-processing aCGH Data
  • QA: same as for expression arrays
  • Normalization
    • Are values comparable across arrays?
    • Can noise be reduced?
  • Segmentation
    • Where do copy number aberrations start and stop?
    • Better estimates of how many copies

7
Normalization
  • Most copy numbers are 2
    • Centering necessary
  • Dynamic range varies
    • Mixtures of tumor with normal
  • Saturation not usually a problem
    • Few instances of 10X copy number
  • Dye bias sometimes strong
    • loess procedure unreliable

8
Centering
  • Where is the center (log ratio 0)?
  • Sometimes modal copy number is 3
  • Variability in labeling and tissue extraction
  • CGH can't give direct measures of counts
  • Most researchers set the modal copy number to a log-ratio of 0
  • Does it matter?
  • Take 3 as equivalent to 2 for comparison?
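
One way to picture the centering step is the minimal sketch below. It assumes the modal level can be approximated by the most populated histogram bin of the log-ratios. The function name `center_on_mode`, the `bin_width` value, and the sample data are hypothetical and not taken from the slides.

```python
from collections import Counter


def center_on_mode(log_ratios, bin_width=0.05):
    """Shift log-ratios so the most populated bin is centered at 0.

    Assumption: the modal copy number corresponds to the densest
    histogram bin; bin_width is an arbitrary illustrative choice.
    """
    # Assign every value to a bin and count the bin occupancy.
    bins = Counter(round(x / bin_width) for x in log_ratios)
    modal_bin = max(bins, key=bins.get)
    # Use the mean of the values in the modal bin as the offset.
    in_bin = [x for x in log_ratios if round(x / bin_width) == modal_bin]
    offset = sum(in_bin) / len(in_bin)
    return [x - offset for x in log_ratios]


# Hypothetical array whose modal level sits near 0.30 instead of 0.
raw = [0.30, 0.31, 0.29, 0.30, 0.90, -0.40]
shifted = center_on_mode(raw)
```

After the shift, the probes at the modal level sit near 0 and the other probes keep their distance from it. Whether the modal level really represents two copies is the open question the slide raises; the code cannot answer it.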

9
Dynamic Range
  • Ratios of signal are often less (sometimes much
    less) than actual ratios of copy numbers between
    samples

From Bilke et al., Bioinformatics, 2005
10
Fractional Copy Numbers
  • Often samples are mixtures of tumor and normal
  • Many tumors have two (or more) distinct clones
    with distinct karyotypes
  • Observed copy numbers may lie in between values
    corresponding to whole numbers
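
A small worked example may help show why mixtures give in-between values. The sketch assumes signal is proportional to the average copy number over all cells in the sample and that normal cells carry two copies. The function and parameter names are illustrative only.

```python
import math


def expected_log2_ratio(tumor_copies, tumor_fraction, normal_copies=2):
    """Expected log2 ratio for a tumor/normal mixture.

    Assumption: signal is proportional to the mean copy number
    across all cells, with no dye bias or range compression.
    """
    mean_copies = (tumor_fraction * tumor_copies
                   + (1.0 - tumor_fraction) * normal_copies)
    return math.log2(mean_copies / normal_copies)


pure_gain = expected_log2_ratio(3, tumor_fraction=1.0)   # log2(3/2)
mixed_gain = expected_log2_ratio(3, tumor_fraction=0.5)  # log2(2.5/2)
```

Under these assumptions a single-copy gain in a pure tumor gives log2(1.5), about 0.585, while the same gain in a 50% tumor sample gives log2(1.25), about 0.322, which corresponds to a fractional copy number of 2.5.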

11
Probe Bias
  • If errors are random, then a plot of self vs. self
    ratios from two arrays should show no pattern
  • Actual correlation > 60%
  • Clear bias!
  • Try to estimate it

12
Segmentation
  • Individual probe values are noisy
  • Most aberrations are segments
  • Most segments have many probes
  • Average neighboring probe values to better
    estimate the segment value. But how far?

13
Segmentation
  • Issues
  • How to identify where a segment starts or stops
  • How to find these points efficiently

14
Noise and Signal
15
How to Find Segments?
  • Could be a large copy number change over a short
    interval or a small change over a long one
  • Look for jumps in running averages
  • Distribution of jumps between probes
  • DNAcopy computes a maximum-likelihood estimate of
    change points, using all intervals
  • StepGram is an efficient computation of a subset of
    t-scores
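
The idea of looking for jumps in running averages can be sketched as follows. This is a naive illustration, not the DNAcopy or StepGram procedure: for each position it compares the mean of a fixed window of probes after the position with the mean of the window before it. The window width and the sample profile are assumptions.

```python
def window_jumps(values, width):
    """Mean of the `width` probes starting at each position minus
    the mean of the `width` probes just before it."""
    jumps = {}
    for i in range(width, len(values) - width + 1):
        before = sum(values[i - width:i]) / width
        after = sum(values[i:i + width]) / width
        jumps[i] = after - before
    return jumps


# Hypothetical noise-free profile with a gain starting at probe 8.
profile = [0.0] * 8 + [0.6] * 8
jumps = window_jumps(profile, width=4)
breakpoint_at = max(jumps, key=lambda i: abs(jumps[i]))
```

A fixed window makes the trade-off on the slide concrete: a wide window averages out noise but blurs short aberrations, while a narrow window does the opposite.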

16
Theory
  • Classical change-point test statistic
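  • Let $X_1, \dots, X_n$ be the values; let
    $S_i = X_1 + \dots + X_i$ be the partial sums
  • Set $Z_B = \max_{1 \le i < n} |Z_i|$, where
    $Z_i = \left( \frac{1}{i} + \frac{1}{n-i} \right)^{-1/2} \left( \frac{S_i}{i} - \frac{S_n - S_i}{n-i} \right)$
  • The $Z_i$ are the scaled differences in mean level before and after $i$
  • Now for segments in the middle
  • Let $Z_C = \max_{1 \le i < j \le n} |Z_{ij}|$, where
    $Z_{ij} = \left( \frac{1}{j-i} + \frac{1}{n-j+i} \right)^{-1/2} \left( \frac{S_j - S_i}{j-i} - \frac{S_n - S_j + S_i}{n-j+i} \right)$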
  • This is Circular Binary Segmentation
  • Implemented in DNAcopy
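
The two statistics on this slide can be sketched with partial sums. The code assumes the standard form of the statistics with unit variance: the difference between the mean level on the two sides of a split, or inside and outside an interval, divided by the square root of the sum of reciprocal group sizes. It is a brute-force illustration that checks every split or interval, not the DNAcopy implementation.

```python
import math
from itertools import accumulate


def max_single_change(x):
    """Best split i: compares mean of x[:i] with mean of x[i:]."""
    n = len(x)
    s = [0.0] + list(accumulate(x))  # s[k] = sum of the first k values
    best_i, best_z = None, 0.0
    for i in range(1, n):
        diff = s[i] / i - (s[n] - s[i]) / (n - i)
        z = diff / math.sqrt(1 / i + 1 / (n - i))
        if abs(z) > abs(best_z):
            best_i, best_z = i, z
    return best_i, best_z


def max_circular_change(x):
    """Best interval x[i:j]: compares its mean with the mean of the rest."""
    n = len(x)
    s = [0.0] + list(accumulate(x))
    best_i, best_j, best_z = None, None, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            m = j - i
            if m == n:  # the whole series has no complement
                continue
            inside = (s[j] - s[i]) / m
            outside = (s[n] - s[j] + s[i]) / (n - m)
            z = (inside - outside) / math.sqrt(1 / m + 1 / (n - m))
            if abs(z) > abs(best_z):
                best_i, best_j, best_z = i, j, z
    return best_i, best_j, best_z


# Hypothetical noise-free profiles.
step = [0.0] * 10 + [1.0] * 10
split, z_split = max_single_change(step)
bump = [0.0] * 10 + [1.0] * 5 + [0.0] * 10
start, stop, z_bump = max_circular_change(bump)
```

The interval search visits on the order of n² pairs, which is why the cost of evaluating all intervals becomes a concern for long chromosomes.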

17
DNAcopy
  • In Bioconductor
  • Does ML identification of segments recursively
  • Apply procedure within identified segments
  • Double-checks points near the boundary
  • Does permutation testing to estimate null
    distribution
  • Often data are not Normal

18
StepGram
  • DNAcopy is slow!
  • Could try to compute only a fraction of possible
    scores
  • StepGram tries to find a subset of most likely
    scores to compute
  • Much faster!
  • Some inaccuracies
  • Doesn't handle chromosome ends well

19
StepGram Method 1
  • Key idea
  • Don't compute all possible t-scores
  • Compute only those likely to show significant change
  • Bound the estimated future t-scores based on
    current t-scores

20
StepGram Algorithm 2