Algorithms for Smoothing Array CGH data

Slides: 31
Provided by: Kees5
Transcript and Presenter's Notes

Title: Algorithms for Smoothing Array CGH data


1
Algorithms for Smoothing Array CGH data
Kees Jong (VU, CS and Mathematics)
Elena Marchiori (VU, Computer Science)
Aad van der Vaart (VU, Mathematics)
Gerrit Meijer (VUMC)
Bauke Ylstra (VUMC)
Marjan Weiss (VUMC)
2
Tumor Cell
Chromosomes of tumor cell
3
CGH Data
[Plot: vertical axis "Copy", horizontal axis "Clones/Chromosomes"]
4
Naïve Smoothing
5
Discrete Smoothing
Copy numbers are integers
6
Why Smoothing?
  • Noise reduction
  • Detection of Loss, Normal, Gain, Amplification
  • Breakpoint analysis
  • Recurrent (over tumors) aberrations may indicate
    • an oncogene or
    • a tumor suppressor gene

7
Is Smoothing Easy?
  • Measurements are relative to a reference sample
  • Printing, labeling and hybridization may be
    uneven
  • Tumor sample is inhomogeneous
  • The vertical scale is relative
  • We do expect only a few levels

8
Smoothing example
9
Problem Formalization
  • A smoothing can be described by
    • a number of breakpoints
    • corresponding levels
  • A fitness function scores each smoothing
    according to fitness to the data
  • An algorithm finds the smoothing with the highest
    fitness score.
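
The description above (breakpoints plus corresponding levels) can be sketched as a small data structure. This is only an illustration: the `Smoothing` class, its `expand` method, and the convention that breakpoints are interior indices with one level per segment are assumptions made here, not something stated on the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Smoothing:
    """A piecewise-constant fit (hypothetical representation).

    breakpoints: sorted interior positions, each in 1..n-1
    levels: one level per segment, so len(levels) == len(breakpoints) + 1
    """
    breakpoints: List[int]
    levels: List[float]

    def expand(self, n: int) -> List[float]:
        """Return the smoothed value for each of the n clones."""
        edges = [0] + list(self.breakpoints) + [n]
        smoothed: List[float] = []
        for level, start, end in zip(self.levels, edges[:-1], edges[1:]):
            # Every clone in [start, end) gets the level of its segment.
            smoothed.extend([level] * (end - start))
        return smoothed
```

For example, `Smoothing([3], [0.0, 1.0]).expand(5)` assigns the first level to clones 0 to 2 and the second level to clones 3 and 4.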

10
Smoothing
[Figure: a smoothing annotated with its breakpoints, levels and variance]
11
Fitness Function
  • We assume that data are a realization of a
    Gaussian noise process and use the maximum
    likelihood criterion adjusted with a penalization
    term for taking into account model complexity

We could use better models given insight into
tumor pathogenesis
12
Fitness Function (2)
CGH values $x_1, \dots, x_n$
breakpoints $0 < y_1 < \dots < y_N < x_N$
levels $\mu_1, \dots, \mu_N$
error variances $\sigma_1^2, \dots, \sigma_N^2$
likelihood
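
The likelihood formula itself is not part of the transcript. Under the Gaussian noise assumption of the previous slide, one plausible form is sketched below. The convention $y_0 = 0$, the rule that segment $j$ covers the clones $y_{j-1} < i \le y_j$, and the independence of the measurements are assumptions made for this sketch and may differ from the authors' formulation.

```latex
L(\mu, \sigma^2 \mid x)
  = \prod_{j=1}^{N} \; \prod_{i = y_{j-1}+1}^{y_j}
    \frac{1}{\sqrt{2\pi\sigma_j^2}}
    \exp\!\left( -\frac{(x_i - \mu_j)^2}{2\sigma_j^2} \right)
```

Each segment contributes a product of normal densities with its own level $\mu_j$ and error variance $\sigma_j^2$.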
13
Fitness Function (3)
Maximum likelihood estimators of $\mu$ and $\sigma^2$
can be found explicitly
Need to add a penalty to the log likelihood
to control the number N of breakpoints
penalty
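
A minimal sketch of such a penalized fitness, assuming Gaussian noise per segment. For a segment of length m, the explicit estimators are the segment mean and the mean squared deviation, which gives a maximized log likelihood of -m/2 * (log(2*pi*var) + 1). The penalty formula is not in the transcript, so the linear penalty `penalty_weight * N`, the variance floor `min_var`, and all function names below are assumptions for illustration only.

```python
import math


def segment_edges(breakpoints, n):
    """Return (start, end) index pairs for the segments cut by the breakpoints."""
    edges = [0] + sorted(breakpoints) + [n]
    return list(zip(edges[:-1], edges[1:]))


def penalized_fitness(x, breakpoints, penalty_weight=2.0, min_var=1e-6):
    """Maximized Gaussian log likelihood minus a penalty on the breakpoint count.

    breakpoints must be distinct interior positions in 1..len(x)-1.
    penalty_weight and min_var are hypothetical parameters.
    """
    log_lik = 0.0
    for start, end in segment_edges(breakpoints, len(x)):
        seg = x[start:end]
        m = len(seg)
        mu = sum(seg) / m                          # explicit MLE of the level
        var = sum((v - mu) ** 2 for v in seg) / m  # explicit MLE of the variance
        var = max(var, min_var)                    # guard against log(0) on flat segments
        log_lik += -0.5 * m * (math.log(2 * math.pi * var) + 1.0)
    return log_lik - penalty_weight * len(breakpoints)
```

With the variance floor active the value is only approximate, which is acceptable for a sketch. A larger `penalty_weight` favors smoothings with fewer breakpoints.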
14
Algorithms
  • Maximizing fitness is computationally hard
  • Use genetic algorithm + local search to find
    an approximation to the optimum

15
Algorithms: Local Search
  • choose N breakpoints at random
  • while (improvement)
    - randomly select a breakpoint
    - move the breakpoint one position to the
      left or to the right
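
The loop above can be sketched as a simple hill climber. The slide does not say how "improvement" is tested, so this sketch assumes a move is kept only if it raises the fitness and that the search stops once no single one-position move helps. The unpenalized Gaussian fitness and the names `fitness` and `local_search` are illustrative assumptions.

```python
import math
import random


def fitness(x, breakpoints):
    """Maximized Gaussian log likelihood of a piecewise-constant fit (N is fixed, so no penalty)."""
    edges = [0] + sorted(breakpoints) + [len(x)]
    total = 0.0
    for start, end in zip(edges[:-1], edges[1:]):
        seg = x[start:end]
        mu = sum(seg) / len(seg)
        var = max(sum((v - mu) ** 2 for v in seg) / len(seg), 1e-6)
        total += -0.5 * len(seg) * (math.log(2 * math.pi * var) + 1.0)
    return total


def local_search(x, n_breakpoints, rng):
    """Hill-climb from a random start by shifting one breakpoint by one position."""
    n = len(x)
    current = set(rng.sample(range(1, n), n_breakpoints))
    best = fitness(x, current)
    improved = True
    while improved:
        improved = False
        # Try the one-position moves in random order; keep the first that improves.
        moves = [(b, step) for b in sorted(current) for step in (-1, 1)]
        rng.shuffle(moves)
        for b, step in moves:
            target = b + step
            if target < 1 or target > n - 1 or target in current:
                continue
            candidate = (current - {b}) | {target}
            score = fitness(x, candidate)
            if score > best:
                current, best = candidate, score
                improved = True
                break
    return sorted(current), best
```

The result is a local optimum with respect to one-position moves, which is why the slides combine local search with a genetic algorithm or multiple restarts.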

16
Genetic Algorithm
  • Given a population of candidate smoothings
  • create a new smoothing by
    - select two parents at random from the population
    - generate offspring by combining parents
      (e.g. uniform crossover or union)
    - apply mutation to each offspring
    - apply local search to each offspring
    - replace the two worst individuals with the
      offspring
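
The two ways of combining parents named above can be sketched on smoothings encoded as sets of breakpoint positions. The set encoding, the function names, and the bit-flip mutation with its `rate` parameter are assumptions for illustration; the slides do not specify the mutation operator.

```python
import random


def uniform_crossover(parent_a, parent_b, n, rng):
    """Each position 1..n-1 copies its breakpoint state from a randomly chosen parent."""
    child = set()
    for pos in range(1, n):
        donor = parent_a if rng.random() < 0.5 else parent_b
        if pos in donor:
            child.add(pos)
    return child


def union_crossover(parent_a, parent_b):
    """The child keeps every breakpoint that occurs in either parent."""
    return set(parent_a) | set(parent_b)


def mutate(child, n, rng, rate=0.05):
    """Hypothetical mutation: flip the breakpoint state of each position with probability rate."""
    return {pos for pos in range(1, n) if (pos in child) != (rng.random() < rate)}
```

A uniform-crossover child always contains the breakpoints shared by both parents and never contains a breakpoint that neither parent has, while the union can only add breakpoints.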

17
Experiments
  • Comparison of
    • GLS
    • GLSo
    • Multi Start Local Search (mLS)
    • Multi Start Simulated Annealing (mSA)
  • GLS is significantly better than the other
    algorithms.

18
Comparison to Expert
[Figure: smoothing found by the algorithm next to the smoothing made by the expert]
19
Relating to Gene Expression
20
Relating to Gene Expression
21
Algorithms for Smoothing Array CGH data
Kees Jong (VU, CS and Mathematics)
Elena Marchiori (VU, CS)
Aad van der Vaart (VU, Mathematics)
Gerrit Meijer (VUMC)
Bauke Ylstra (VUMC)
Marjan Weiss (VUMC)
22
(No Transcript)
23
Conclusion
  • Breakpoint identification as model fitting:
    search for the most likely model given the data
  • Genetic algorithms + local search perform well
  • Results are comparable to those produced by hand by
    the local expert
  • Future work
    • Analyse the relationship between chromosomal
      aberrations and gene expression

24
Example of a-CGH Tumor
[Plot: vertical axis "Value", horizontal axis "Clones/Chromosomes"]
25
a-CGH vs. Expression
  • a-CGH
    • DNA
    • In nucleus
    • Same for every cell
    • DNA on slide
    • Measures copy number variation
  • Expression
    • RNA
    • In cytoplasm
    • Different per cell
    • cDNA on slide
    • Measures gene expression

26
Breakpoint Detection
  • Identify possibly damaged genes
    • These genes will no longer be expressed
  • Identify recurrent breakpoint locations
    • Indicates fragile pieces of the chromosome
  • Accuracy is important
    • Important genes may be located in a region with
      (recurrent) breakpoints

27
Experiments
  • Both GAs are robust
    • Over different randomly initialized runs,
      breakpoints are (mostly) placed at the same
      locations
  • Both GAs converge
    • The individuals in the pool are very similar
  • The final result closely resembles the smoothing made
    by the local expert (mean error 0.0513)

28
Genetic Algorithm 1 (GLS)
  • initialize population of candidate solutions
    randomly
  • while (termination criterion not satisfied)
    - select two parents using roulette wheel
    - generate offspring using uniform crossover
    - apply mutation to each offspring
    - apply local search to each offspring
    - replace the two worst individuals with the
      offspring
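
The GLS loop above can be sketched as a generic skeleton with the operators passed in as functions. Several details are assumptions made for this sketch: the termination criterion is a fixed number of iterations, roulette-wheel weights are the scores shifted to be positive (log likelihoods can be negative), and the same parent may be drawn twice. The names `roulette_select` and `gls` and all parameters are hypothetical.

```python
import random


def roulette_select(population, scores, rng):
    """Draw two parents with probability proportional to their shifted fitness."""
    low = min(scores)
    weights = [s - low + 1e-9 for s in scores]  # shift so that all weights are positive
    return rng.choices(population, weights=weights, k=2)


def gls(fitness, init, crossover, mutate, local_search, pop_size, iterations, rng):
    """Genetic algorithm with local search; returns the fittest individual found."""
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(iterations):  # assumed termination criterion
        scores = [fitness(ind) for ind in population]
        mother, father = roulette_select(population, scores, rng)
        offspring = [
            local_search(mutate(crossover(mother, father, rng), rng))
            for _ in range(2)
        ]
        # Replace the two worst individuals with the two offspring.
        worst_first = sorted(range(pop_size), key=lambda i: scores[i])
        for idx, child in zip(worst_first[:2], offspring):
            population[idx] = child
    return max(population, key=fitness)
```

With a population of at least three, the best individual of a generation is never among the two replaced, so the best fitness in the pool cannot decrease.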

29
Genetic Algorithm 2 (GLSo)
  • initialize population of candidate solutions
    randomly
  • while (termination criterion not satisfied)
    - select 2 parents using roulette wheel
    - generate offspring using OR crossover
    - apply local search to offspring
    - apply join to offspring
    - replace worst individual with offspring

30
Fitness function (2)
CGH values $x_1, \dots, x_n$
breakpoints $0 < y_1 < \dots < y_N < x_N$
levels $\mu_1, \dots, \mu_N$
error variances $\sigma_1^2, \dots, \sigma_N^2$
likelihood