Title: Algorithms for Smoothing Array CGH data
1 Algorithms for Smoothing Array CGH data
Kees Jong (VU, CS and Mathematics), Elena Marchiori (VU, Computer Science), Aad van der Vaart (VU, Mathematics), Gerrit Meijer (VUMC), Bauke Ylstra (VUMC), Marjan Weiss (VUMC)
2 Tumor Cell
Chromosomes of a tumor cell
3 CGH Data
[Plot: copy number (vertical axis) vs. clones/chromosomes (horizontal axis)]
4 Naïve Smoothing
5 Discrete Smoothing
Copy numbers are integers
6 Why Smoothing?
- Noise reduction
- Detection of Loss, Normal, Gain, Amplification
- Breakpoint analysis
- Recurrent (over tumors) aberrations may indicate
  - an oncogene, or
  - a tumor suppressor gene
7 Is Smoothing Easy?
- Measurements are relative to a reference sample
- Printing, labeling and hybridization may be uneven
- Tumor sample is inhomogeneous
- The vertical scale is relative
- Do expect only a few levels
8 Smoothing example
9 Problem Formalization
- A smoothing can be described by (sketched in code below):
  - a number of breakpoints
  - the corresponding levels
- A fitness function scores each smoothing according to its fit to the data
- An algorithm finds the smoothing with the highest fitness score
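As a concrete illustration of this representation, a minimal Python sketch of how a candidate smoothing could be stored; the class name and fields are illustrative assumptions, not taken from the slides:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Smoothing:
    """A candidate smoothing: breakpoint positions plus one fitted level per segment."""
    breakpoints: List[int]   # indices 0 < y_1 < ... < y_N into the clone sequence
    levels: List[float]      # fitted level (mean log-ratio) of each segment

    def segments(self, n: int) -> List[Tuple[int, int]]:
        """(start, end) index pairs of the segments implied by the breakpoints,
        for a profile of n clones."""
        bounds = [0] + list(self.breakpoints) + [n]
        return list(zip(bounds[:-1], bounds[1:]))
```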
10 Smoothing
[Figure: a smoothed profile annotated with breakpoints, levels, and variance]
11 Fitness Function
- We assume the data are a realization of a Gaussian noise process and use the maximum likelihood criterion, adjusted with a penalization term to take model complexity into account
- We could use better models given more insight into tumor pathogenesis
12 Fitness Function (2)
CGH values: x_1, ..., x_n
breakpoints: 0 < y_1 < ... < y_N < n
levels: μ_1, ..., μ_N
error variances: σ_1², ..., σ_N²
likelihood: (formula lost in transcription; a sketch follows)
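The likelihood expression on the original slide did not survive transcription. A plausible reconstruction, assuming the values within each segment are independent Gaussian observations with that segment's level and variance (segment j being the clones between breakpoints y_{j-1} and y_j, with y_0 = 0; the exact indexing convention is an assumption):

```latex
% Hedged reconstruction of the segment-wise Gaussian likelihood
L(\mu_1,\dots,\mu_N,\sigma_1^2,\dots,\sigma_N^2)
  = \prod_{j=1}^{N} \; \prod_{i \in \mathrm{segment}\, j}
    \frac{1}{\sqrt{2\pi\sigma_j^2}}
    \exp\!\left( -\frac{(x_i - \mu_j)^2}{2\sigma_j^2} \right)
```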
13 Fitness Function (3)
- Maximum likelihood estimators of μ and σ² can be found explicitly
- Need to add a penalty to the log-likelihood to control the number N of breakpoints (penalty formula lost in transcription; a sketch follows)
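The penalty term on the slide is likewise missing. Under the Gaussian model above, the explicit per-segment maximum likelihood estimators, and one standard way to penalize the log-likelihood by the number of breakpoints (the linear form and the constant λ are assumptions, not given on the slide), would look like:

```latex
% Per-segment ML estimators (n_j = number of clones in segment j):
\hat\mu_j = \frac{1}{n_j} \sum_{i \in \mathrm{segment}\, j} x_i ,
\qquad
\hat\sigma_j^2 = \frac{1}{n_j} \sum_{i \in \mathrm{segment}\, j} (x_i - \hat\mu_j)^2

% Fitness = penalized log-likelihood (penalty form assumed):
\mathrm{fitness} = \log L(\hat\mu, \hat\sigma^2) - \lambda N
```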
14 Algorithms
- Maximizing the fitness is computationally hard
- Use a genetic algorithm combined with local search to find an approximation to the optimum
15 Algorithms: Local Search
- choose N breakpoints at random
- while (improvement)
  - randomly select a breakpoint
  - move the breakpoint one position to the left or to the right
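A minimal Python sketch of this local search, assuming a `fitness(x, breakpoints)` function that scores a candidate set of breakpoint indices on the CGH values `x`; the names and representation are assumptions, not the authors' implementation:

```python
import random

def local_search(x, breakpoints, fitness):
    """Hill-climb on breakpoint positions until no improving one-step move is found."""
    current = sorted(breakpoints)
    if not current:
        return current, fitness(x, current)
    best = fitness(x, current)
    improved = True
    while improved:
        improved = False
        i = random.randrange(len(current))      # randomly select a breakpoint
        for step in (-1, +1):                   # try moving it one position left or right
            candidate = current.copy()
            candidate[i] += step
            # keep breakpoints inside the profile and distinct from each other
            if 0 < candidate[i] < len(x) and len(set(candidate)) == len(candidate):
                score = fitness(x, sorted(candidate))
                if score > best:
                    best, current = score, sorted(candidate)
                    improved = True
    return current, best
```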
16 Genetic Algorithm
- Given a population of candidate smoothings, create a new smoothing by:
  - select two parents at random from the population
  - generate offspring by combining the parents (e.g. uniform crossover or union)
  - apply mutation to each offspring
  - apply local search to each offspring
  - replace the two worst individuals with the offspring
17 Experiments
- Comparison of:
  - GLS
  - GLSo
  - Multi-Start Local Search (mLS)
  - Multi-Start Simulated Annealing (mSA)
- GLS is significantly better than the other algorithms
18 Comparison to Expert
[Figure: smoothing produced by the algorithm vs. smoothing produced by the expert]
19 Relating to Gene Expression
20 Relating to Gene Expression
21 Algorithms for Smoothing Array CGH data
Kees Jong (VU, CS and Mathematics), Elena Marchiori (VU, CS), Aad van der Vaart (VU, Mathematics), Gerrit Meijer (VUMC), Bauke Ylstra (VUMC), Marjan Weiss (VUMC)
23 Conclusion
- Breakpoint identification as model fitting: search for the most likely model given the data
- Genetic algorithms with local search perform well
- Results are comparable to those produced by hand by the local expert
- Future work:
  - Analyse the relationship between chromosomal aberrations and gene expression
24 Example of a-CGH: Tumor
[Plot: value (vertical axis) vs. clones/chromosomes (horizontal axis)]
25 a-CGH vs. Expression
- a-CGH
  - DNA
  - In the nucleus
  - Same for every cell
  - DNA on slide
  - Measures copy number variation
- Expression
  - RNA
  - In the cytoplasm
  - Different per cell
  - cDNA on slide
  - Measures gene expression
26 Breakpoint Detection
- Identify possibly damaged genes
  - These genes will no longer be expressed
- Identify recurrent breakpoint locations
  - Indicates fragile pieces of the chromosome
- Accuracy is important
  - Important genes may be located in a region with (recurrent) breakpoints
27 Experiments
- Both GAs are robust
  - Over different randomly initialized runs, breakpoints are (mostly) placed at the same locations
- Both GAs converge
  - The individuals in the pool are very similar
- The final result looks very much like the smoothing conducted by the local expert (mean error 0.0513)
28 Genetic Algorithm 1 (GLS)
- initialize population of candidate solutions randomly
- while (termination criterion not satisfied)
  - select two parents using roulette wheel
  - generate offspring using uniform crossover
  - apply mutation to each offspring
  - apply local search to each offspring
  - replace the two worst individuals with the offspring
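A compact Python sketch of this GLS loop, reusing the `local_search` and `fitness` routines sketched after slide 15; the concrete forms of roulette-wheel selection, uniform crossover, and mutation below are plausible choices, not details given on the slide:

```python
import random

def uniform_crossover(p1, p2):
    """Assumed uniform crossover on breakpoint sets: each breakpoint appearing in
    either parent is inherited with probability 1/2."""
    return sorted(bp for bp in set(p1) | set(p2) if random.random() < 0.5)

def mutate(breakpoints, n, rate=0.05):
    """Assumed mutation: occasionally drop one breakpoint and/or add a random new one."""
    out = set(breakpoints)
    if out and random.random() < rate:
        out.discard(random.choice(sorted(out)))
    if random.random() < rate:
        out.add(random.randrange(1, n))
    return sorted(out)

def gls(x, fitness, pop_size=20, generations=200, n_breakpoints=10):
    """GLS loop: roulette-wheel parent selection, uniform crossover, mutation,
    local search, and replacement of the two worst individuals."""
    n = len(x)
    population = [sorted(random.sample(range(1, n), n_breakpoints))
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(x, ind) for ind in population]
        # roulette wheel: selection probability proportional to (shifted) fitness
        weights = [s - min(scores) + 1e-9 for s in scores]
        parents = random.choices(population, weights=weights, k=2)
        offspring = [mutate(uniform_crossover(*parents), n) for _ in range(2)]
        offspring = [local_search(x, child, fitness)[0] for child in offspring if child]
        # replace the two worst individuals with the offspring
        worst_first = sorted(range(pop_size), key=lambda i: scores[i])
        for idx, child in zip(worst_first, offspring):
            population[idx] = child
    scores = [fitness(x, ind) for ind in population]
    return population[scores.index(max(scores))]
```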
29 Genetic Algorithm 2 (GLSo)
- initialize population of candidate solutions randomly
- while (termination criterion not satisfied)
  - select two parents using roulette wheel
  - generate offspring using OR crossover
  - apply local search to the offspring
  - apply join to the offspring
  - replace the worst individual with the offspring
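The slide does not spell out the OR crossover or the join step. One plausible reading, with OR crossover taken as the union of the parents' breakpoint sets and join as a greedy removal of breakpoints that do not hurt the fitness, is sketched below; both interpretations are assumptions:

```python
def or_crossover(p1, p2):
    """OR crossover read here as the union of the parents' breakpoint sets (assumed)."""
    return sorted(set(p1) | set(p2))

def join(x, breakpoints, fitness):
    """Assumed reading of 'join': greedily remove breakpoints whose removal does not
    decrease the fitness, merging the adjacent segments."""
    current = sorted(breakpoints)
    improved = True
    while improved and current:
        improved = False
        for bp in list(current):
            candidate = [b for b in current if b != bp]
            if fitness(x, candidate) >= fitness(x, current):
                current = candidate
                improved = True
                break
    return current
```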
30 Fitness Function (2)
CGH values: x_1, ..., x_n
breakpoints: 0 < y_1 < ... < y_N < n
levels: μ_1, ..., μ_N
error variances: σ_1², ..., σ_N²
likelihood: (see the reconstruction after slide 12)