A segmentation algorithm for copy number data

1
A segmentation algorithm for copy number data.
  • Blissfully Short (we promise!)
  • Archisman Rudra and Raoul-Sam Daruwala

2
Biological Significance
  • Genomes in a population are polymorphic, giving
    rise to diversity and variation.
  • Parts of the genome are deleted (hemi- or
    homozygously) with a decrease in copy number, or
    amplified with an increase in copy number.
  • We assume there is a normal copy number value,
    which represents regions with no deletions or
    amplifications.

3
A Bayesian Approach.
  • Priors: deletions and amplifications
  • Data: prior structure plus noise
  • Goal: find the most plausible hypothesis of
    regional changes and their associated copy numbers

4
Prior Structure (I)
  • The prior is a probability distribution over the
    structure described below.
  • Given N probes, we assume the data is divided
    into k sub-intervals Ij = (µj, ij).
  • Each sub-interval has its own mean value µj, and
    ij is the index of the last probe in the interval.
  • µj may equal the normal (global) mean value.
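
One way to picture this structure in code is a list of (mean, last index) pairs. The sketch below is illustrative only: the names `Interval`, `expand`, and `GLOBAL_MEAN` are hypothetical, and the 0-based inclusive indexing and a global mean of 0 are assumptions, not details from the slides.

```python
from dataclasses import dataclass
from typing import List

GLOBAL_MEAN = 0.0  # assumed normal copy-number level (hypothetical value)


@dataclass
class Interval:
    mean: float  # mu_j, the mean value of the interval
    last: int    # i_j, index of the last probe (0-based, inclusive; an assumption)


def expand(intervals: List[Interval]) -> List[float]:
    """Return the per-probe mean implied by a segmentation."""
    means: List[float] = []
    start = 0
    for iv in intervals:
        means.extend([iv.mean] * (iv.last - start + 1))
        start = iv.last + 1
    return means


# Ten probes: normal, then a two-probe deletion, then normal again.
segmentation = [
    Interval(GLOBAL_MEAN, 3),
    Interval(-1.0, 5),
    Interval(GLOBAL_MEAN, 9),
]
per_probe = expand(segmentation)
```

Storing only the last index of each interval is enough, because each interval starts right after the previous one ends.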

5
Prior Structure (II)
  • The prior depends on two parameters pe and pb.
  • pe is the probability of a particular probe being
    normal.
  • pb is the average number of intervals per unit
    length.

6
Prior Structure (III)
  • We define (equation not in transcript) as the
    prior distribution, where #global is the number
    of probes with the global mean value and #local
    is the number of remaining probes.
  • The data is modeled by adding independent
    Gaussian noise to this prior structure. In each
    interval Ij, the data is modeled as (equation
    not in transcript).
7
Likelihood Function
  • The µ values of non-global probes are unknown.
  • We estimate these µ values using the sample mean
    for that interval.
  • Our Bayesian solution maximizes the likelihood
    function L to yield the optimal segmentation.
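
The scoring idea can be sketched in code. The formulas for the prior and for L are not in the transcript, so this sketch assumes one simple form consistent with the definitions of pe and pb: every interval pays a break cost governed by pb, every probe is labeled global or local with probability pe or 1 - pe, and the noise is Gaussian with a known common standard deviation. The names `log_score`, `sigma`, and `global_mean` are hypothetical.

```python
import math


def log_score(data, segments, p_e, p_b, sigma=1.0, global_mean=0.0):
    """Log of an ASSUMED prior-times-likelihood score for one segmentation.

    segments: list of (start, end, is_global) with end exclusive.
    """
    score = 0.0
    for start, end, is_global in segments:
        xs = data[start:end]
        m = len(xs)
        # Non-global intervals use the sample mean as their estimate of mu.
        mu = global_mean if is_global else sum(xs) / m
        # Assumed break prior: one interval start, then m - 1 non-breaks.
        score += math.log(p_b) + (m - 1) * math.log(1.0 - p_b)
        # Assumed per-probe prior: global with probability p_e, else local.
        score += m * math.log(p_e if is_global else 1.0 - p_e)
        # Independent Gaussian noise around the interval mean.
        score -= m * math.log(sigma * math.sqrt(2.0 * math.pi))
        score -= sum((x - mu) ** 2 for x in xs) / (2.0 * sigma ** 2)
    return score


data = [0.1, -0.1, 0.0, -1.1, -0.9, -1.0, 0.05, -0.05]
three_parts = [(0, 3, True), (3, 6, False), (6, 8, True)]
one_part = [(0, 8, True)]
score_three = log_score(data, three_parts, p_e=0.55, p_b=0.01, sigma=0.2)
score_one = log_score(data, one_part, p_e=0.55, p_b=0.01, sigma=0.2)
```

With a small assumed noise level, the three-interval hypothesis scores higher than the single normal interval, because the fit improvement outweighs the cost of the two extra breaks.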

8
A dynamic programming algorithm.
  • Extension: adds a new interval to the end.
  • The likelihood function can be computed
    incrementally.

9
Dynamic Programming (II)
Let Opt_i be the optimal segmentation of the first
i probes. Let WS_i be the working set for
computing Opt_i.
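
A minimal sketch of such a dynamic program follows. It is not the authors' implementation: the recurrence on the slide is not in the transcript, so the sketch assumes a score that is additive over intervals (a break cost from pb, a per-probe term from pe, Gaussian noise with known `sigma`) and takes the working set for probe i to be every earlier optimum extended by one new interval. All function and variable names are hypothetical.

```python
import math


def segment(data, p_e, p_b, sigma=1.0, global_mean=0.0):
    """Return (best_score, segments); segments are (start, end, is_global), end exclusive."""
    n = len(data)
    # Prefix sums give any interval's sum and sum of squares in O(1),
    # which is what makes the score incrementally computable.
    s1, s2 = [0.0], [0.0]
    for x in data:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def interval_score(j, i):
        """Best assumed score of a single interval covering probes j..i-1."""
        m = i - j
        tot, sq = s1[i] - s1[j], s2[i] - s2[j]
        rss_global = max(sq - 2 * global_mean * tot + m * global_mean ** 2, 0.0)
        rss_local = max(sq - tot * tot / m, 0.0)  # residuals about the sample mean
        base = (math.log(p_b) + (m - 1) * math.log(1 - p_b)
                - m * math.log(sigma * math.sqrt(2 * math.pi)))
        g = base + m * math.log(p_e) - rss_global / (2 * sigma ** 2)
        l = base + m * math.log(1 - p_e) - rss_local / (2 * sigma ** 2)
        return (g, True) if g >= l else (l, False)

    opt = [0.0] + [-math.inf] * n  # opt[i]: best score for the first i probes
    back = [None] * (n + 1)        # back[i]: (start of last interval, is_global)
    for i in range(1, n + 1):
        # Working set: each opt[j], j < i, extended by the interval j..i-1.
        for j in range(i):
            sc, is_global = interval_score(j, i)
            if opt[j] + sc > opt[i]:
                opt[i] = opt[j] + sc
                back[i] = (j, is_global)

    # Walk the back-pointers to recover the optimal segmentation.
    segments, i = [], n
    while i > 0:
        j, is_global = back[i]
        segments.append((j, i, is_global))
        i = j
    return opt[n], segments[::-1]


data = [0.1, -0.1, 0.0, -1.1, -0.9, -1.0, 0.05, -0.05]
best_score, segments = segment(data, p_e=0.55, p_b=0.01, sigma=0.2)
```

This version examines every split point, so it runs in O(N^2) time; a smaller working set, as the slide suggests, would reduce that cost.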
10
A reasonable choice of priors yields good
segmentation.
11
Raising the value of p_e causes more points to be
classified as part of a normal interval.
12
A reasonable choice of priors yields good
segmentation.
13
Raising p_b causes the points that aren't normal
to be segmented very aggressively.
14
Selection of Priors
  • The choice of pe and pb is critical in
    obtaining a reasonable result.
  • Two kinds of errors:
  • Poor goodness of fit: too few segments.
  • Over-fitting: more segments than necessary.
  • We either minimize the maximum likelihood value
    or test for over-fitting using a standard
    statistical test (the F-test).

15
Prior Selection: Minimax Criterion
  • Choose the values of pe and pb that minimize the
    maximum likelihood value.

16
Chromosome 8
(pe,pb) max at (0.55,0.01)
17
Prior Selection: F Criterion
  • We want to ensure that no break we introduce
    over-fits the data.
  • For each break we have a T² statistic and the
    appropriate tail probability (p-value) calculated
    from the distribution of the statistic. In this
    case, this is an F distribution.
  • For the whole segmentation we take the minimum
    p-value over all of its breaks.
  • The best (pe,pb) is the one that leads to the
    maximum min p-value.
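
The per-break test can be sketched as follows. For a single measurement per probe, the two-sample T² statistic is the squared pooled t statistic, which follows an F distribution with (1, n1 + n2 - 2) degrees of freedom. Comparing only the two intervals adjacent to each break is an assumption, the tail probability is approximated numerically with the standard library, and all function names are hypothetical. The selection step over (pe,pb) is left out.

```python
import math


def f_tail_1(stat, dof, steps=2000):
    """Approximate P(F(1, dof) > stat) by Simpson integration of the t density."""
    t = math.sqrt(stat)
    c = math.exp(math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)) / math.sqrt(dof * math.pi)

    def pdf(x):
        return c * (1 + x * x / dof) ** (-(dof + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    # F(1, dof) upper tail equals the two-sided tail of t with dof degrees of freedom.
    return max(0.0, 1.0 - 2.0 * area)


def break_p_value(left, right):
    """p-value for the break between two adjacent intervals (pooled variance)."""
    n1, n2 = len(left), len(right)
    m1, m2 = sum(left) / n1, sum(right) / n2
    dof = n1 + n2 - 2
    if dof <= 0:
        return 1.0  # too few probes to test the break
    ss = sum((x - m1) ** 2 for x in left) + sum((x - m2) ** 2 for x in right)
    if ss == 0.0:
        return 1.0 if m1 == m2 else 0.0
    t2 = (m1 - m2) ** 2 / ((ss / dof) * (1 / n1 + 1 / n2))
    return f_tail_1(t2, dof)


def min_break_p_value(data, segments):
    """Minimum p-value over all breaks; segments are (start, end), end exclusive."""
    return min(
        break_p_value(data[a:b], data[c:d])
        for (a, b), (c, d) in zip(segments, segments[1:])
    )


data = [0.1, -0.1, 0.0, -1.1, -0.9, -1.0, 0.05, -0.05]
min_p = min_break_p_value(data, [(0, 3), (3, 6), (6, 8)])
```

Repeating this for the segmentation produced by each candidate (pe,pb) gives the per-segmentation values that the criterion compares.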

18
Chromosome 8
(pe,pb) max at (0.55,0.01)