Title: DNA Copy Number Analysis
1 DNA Copy Number Analysis
- Qunyuan Zhang, Ph.D.
- Division of Statistical Genomics
- Department of Genetics & Center for Genome Sciences
- Washington University School of Medicine
- 03-25-2008
- GEMS Course M 21-621: Computational Statistical Genetics
2 Four Questions
- What is Copy Number?
- What can Copy Number tell us?
- How to measure/quantify Copy Number?
- How to analyze Copy Number?
3 What is Copy Number?
- Gene Copy Number
- The gene copy number (also "copy number variants" or CNVs) is the number of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells. For instance, the EGFR copy number can be higher than normal in non-small cell lung cancer. Elevating the copy number of a particular gene can increase the expression of the protein that it encodes.
- - From Wikipedia (www.wikipedia.org)
4 - DNA Copy Number
- A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is 1 kilobase or larger.
- - From Nature Reviews Genetics, Feuk et al. 2006
- DNA Copy Number ≠ DNA Tandem Repeat Number (e.g., microsatellites, < 10 bases)
- DNA Copy Number ≠ RNA Copy Number
- - RNA Copy Number = Gene Expression Level
- - DNA → (transcription) → mRNA
- Copy Number is the number of copies of a particular fragment of a nucleic acid molecule. It refers to DNA Copy Number in most publications.
5 What can Copy Number tell us?
- Genetic Diversity/Polymorphisms
- - restriction fragment length polymorphism (RFLP)
- - amplified fragment length polymorphism (AFLP)
- - random amplification of polymorphic DNA (RAPD)
- - variable number of tandem repeats (VNTR; e.g., mini- and microsatellites)
- - single nucleotide polymorphism (SNP)
- - presence/absence of transposable elements
- - structural alterations (e.g., deletions, duplications, inversions)
- - DNA copy number variant (CNV)
- Association with phenotypes/diseases: genes/genetic factors
6 Genetic Alterations in Tumor Cells (DNA Copy Number Changes)
7 How to measure/quantify Copy Number?
8 SNP Array: From Image to Copy Number
- Tumor: red intensity
- Normal: green intensity
- More DNA copies → more DNA hybridization → higher intensity
- Red < Green: Deletion (CN < 2)
- Red > Green: Amplification (CN > 2)
- Red = Green: No Alteration (CN = 2)
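The red/green comparison above can be sketched in a few lines. This is a minimal illustration, assuming that intensity is proportional to copy number and that the normal sample has CN = 2; the function names and the noise tolerance `tol` are hypothetical and not taken from the slides.

```python
import math

def raw_copy_number(tumor, normal, normal_cn=2.0):
    """Raw CN estimate, assuming intensity is proportional to copy number."""
    return normal_cn * tumor / normal

def classify(tumor, normal, tol=0.3):
    """Label one probe by its log2(tumor / normal) intensity ratio.

    `tol` is an assumed noise tolerance, not a value from the slides.
    """
    log_ratio = math.log2(tumor / normal)
    if log_ratio < -tol:
        return "deletion"          # red < green
    if log_ratio > tol:
        return "amplification"     # red > green
    return "no alteration"         # red roughly equal to green
```

In practice the call is rarely made per probe; the later slides smooth and test over windows of neighboring SNPs first.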
9 Array CGH: From Image to Copy Number
10 How to Analyze Copy Number?
11 - General Procedures for Copy Number Analysis
12 Background Adjustment/Correction
- Reduces unevenness of a single chip
- Makes intensities of different positions on a chip comparable
[Figure: before adjustment vs. after adjustment]
- Corrected Intensity (S') = Observed Intensity (S) - Background Intensity (B)
- For each region i, B(i) = mean of the lowest 2% of intensities in region i
- Method: Affymetrix MAS 5.0
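The region-wise subtraction can be sketched as follows. This is a simplified illustration only: it applies S' = S - B(i) with B(i) taken as the mean of the lowest 2% of intensities in each region, and it omits the further smoothing between regions that the full MAS 5.0 procedure performs. The function names and the `floor` parameter are assumptions made for this example.

```python
def region_background(intensities, fraction=0.02):
    """B(i): mean of the lowest `fraction` of intensities in one region."""
    ordered = sorted(intensities)
    k = max(1, int(len(ordered) * fraction))  # always use at least one value
    return sum(ordered[:k]) / k

def background_correct(regions, floor=1.0):
    """Subtract each region's own background: S' = S - B(i).

    `regions` maps a region id to its list of observed intensities.
    `floor` is an assumed lower bound that keeps S' positive so that
    log ratios can be taken later.
    """
    corrected = {}
    for region_id, values in regions.items():
        b = region_background(values)
        corrected[region_id] = [max(s - b, floor) for s in values]
    return corrected
```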
14 Normalization
- Reduces technical variation between chips
- Makes intensities from different chips comparable
[Figure: before normalization vs. after normalization]
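The slides do not name a normalization method. As one simple possibility, the sketch below rescales every chip to a common median intensity; median scaling is an assumption chosen for brevity, and other schemes (for example quantile normalization) are also widely used. The function name `median_scale` is hypothetical.

```python
from statistics import median

def median_scale(chips):
    """Rescale every chip so that all chips share the same median intensity.

    `chips` maps a chip name to a list of intensities.
    """
    medians = {name: median(values) for name, values in chips.items()}
    target = median(medians.values())  # common reference level
    return {
        name: [v * target / medians[name] for v in values]
        for name, values in chips.items()
    }
```

After scaling, a chip that was uniformly twice as bright as another ends up on the same scale, so intensity differences between chips reflect the samples rather than the hybridization run.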
16 Raw Copy Number Data
17 Individual Level Analysis
- Analysis for each individual sample (or each sample pair)
- Smoothing
- Significance test of CN amplification and deletion
- Boundary finding (smoothing and segmentation)
- CN estimation
18 Smoothing via Sliding Window
19 Smoothing (sliding window = 30 SNPs)
[Figure: smoothed CN vs. position (Mbp) on chromosome 7; panels: Affymetrix, Illumina]
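Sliding-window smoothing can be sketched as a moving average over neighboring SNPs, ordered by chromosomal position. The version below is a minimal illustration with a default window of 30 SNPs; truncating the window at the chromosome ends is an assumption, and the function name is hypothetical.

```python
def sliding_window_mean(values, window=30):
    """Moving average of raw CN values; windows are truncated at the ends."""
    n = len(values)
    smoothed = []
    for i in range(n):
        start = i - window // 2
        lo = max(0, start)
        hi = min(n, start + window)   # exclusive upper bound
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

A wider window suppresses more probe-level noise but blurs the boundaries of short amplifications and deletions.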
20 Significance Test of CN Changes: An Example
21 Sliding Window Smoothing
22 Normalization
23 P-value calculation
24 Calculate FDR for each window
25 Select windows (FDR < 0.05)
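The last three steps (p-value, FDR, selection) can be sketched as below. The slides do not say which test or FDR procedure was used, so this example assumes a two-sided z-test of the mean log ratio in each window and the Benjamini-Hochberg adjustment; all function names are hypothetical.

```python
import math
from statistics import mean, stdev

def window_p_value(log_ratios):
    """Two-sided z-test of H0: mean log ratio = 0 (normal approximation)."""
    n = len(log_ratios)
    se = stdev(log_ratios) / math.sqrt(n)
    if se == 0:                       # constant window: nothing to test against
        return 1.0 if mean(log_ratios) == 0 else 0.0
    z = mean(log_ratios) / se
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):      # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_windows(p_values, fdr=0.05):
    """Indices of windows whose adjusted p-value is below the FDR cutoff."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q < fdr]
```

Overlapping windows are not independent tests, so the resulting FDR should be read as approximate.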
26 Another Example: Intensities and Raw CNs, Chr. 1 (Pair 101)
Black = Normal, Red = Tumor, Green = Tumor - Normal
27 Significance Test for Copy Number Changes
-log(p) values, TSP data, chr. 1, pair 101
28 Segmentation (break chromosome into CN-homogeneous pieces)
- BioConductor R Packages (www.bioconductor.org)
- - GLAD package, adaptive weights smoothing (AWS) method
- - DNAcopy package, circular binary segmentation method
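To illustrate the idea of breaking a chromosome into pieces of constant CN, the sketch below uses plain recursive binary segmentation on squared error. This is a deliberately simpler stand-in: it is neither the circular binary segmentation of DNAcopy nor the AWS method of GLAD, and the thresholds `min_gain` and `min_size` are hypothetical tuning parameters.

```python
def best_split(values, min_size):
    """Return (gain, k) for the split point k that most reduces squared error."""
    n = len(values)
    total = sum(values)
    best_gain, best_k = 0.0, None
    left_sum = 0.0
    for k in range(1, n):
        left_sum += values[k - 1]
        if k < min_size or n - k < min_size:
            continue                  # both pieces must have enough SNPs
        right_sum = total - left_sum
        # Reduction in SSE from fitting two segment means instead of one.
        gain = left_sum**2 / k + right_sum**2 / (n - k) - total**2 / n
        if gain > best_gain:
            best_gain, best_k = gain, k
    return best_gain, best_k

def segment(values, start=0, min_gain=1.0, min_size=5):
    """Split recursively until no split reduces squared error by `min_gain`.

    Returns a list of (start, end, mean) tuples with `end` exclusive.
    """
    gain, k = best_split(values, min_size)
    if k is None or gain < min_gain:
        return [(start, start + len(values), sum(values) / len(values))]
    return (segment(values[:k], start, min_gain, min_size)
            + segment(values[k:], start + k, min_gain, min_size))
```

For example, `segment([0.0] * 20 + [1.0] * 20)` yields two segments that meet at index 20. Unlike this sketch, the packages listed above choose their breakpoints with statistical tests rather than a fixed error threshold.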
29 CN Estimation: Hidden Markov Model (HMM)
- Software: CNAT (www.affymetrix.com), dChip (www.dchip.org), CNAG (www.genome.umin.jp)
[Figure: HMM diagram along chromosomal position; hidden status (unknown CN); observed status (raw CN, log ratio of intensities)]
- CN estimation: finding a sequence of CN values that maximizes the likelihood of the observed raw CN.
- Algorithm: Viterbi algorithm (can be iterative)
- The information/assumptions below are needed:
- - Background probabilities: overall probabilities of possible CN values. P(CN = x), x = 0, 1, 2, 3, 4, ..., n (usually n < 10)
- - Transition probabilities: probabilities of the CN value of each SNP conditional on the previous one. P(CN_{i+1} = x_i | CN_i = x_j), x = 0, 1, 2, 3, 4, ..., or n
- - Emission probabilities: probabilities of the observed raw CN value of each SNP conditional on the hidden/unknown/true CN status. P(log ratio < x | CN = y) = f(x | CN = y), x = any real number, y = 0, 1, 2, 3, 4, ..., or n
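The three ingredients above (background, transition, and emission probabilities) plug into the Viterbi algorithm as sketched below. All numeric choices are illustrative assumptions rather than values from the slides or from CNAT, dChip, or CNAG: Gaussian emissions centered at log2(CN / 2) with a shared standard deviation `sd`, a probability `stay` of keeping the same CN between adjacent SNPs, uniform background probabilities, and CN states 1 to 4 (CN = 0 is left out because log2(0) is undefined).

```python
import math

def viterbi_cn(log_ratios, states=(1, 2, 3, 4), sd=0.2, stay=0.99, prior=None):
    """Most likely CN path for a non-empty list of log2(tumor/normal) ratios."""
    k = len(states)
    prior = prior or [1.0 / k] * k          # background probabilities
    move = (1.0 - stay) / (k - 1)           # probability of each CN change

    def log_emit(x, cn):
        """Log density of observing x when the true copy number is cn."""
        mu = math.log2(cn / 2.0)
        return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    def log_trans(a, b):
        return math.log(stay if a == b else move)

    # score[j]: best log-probability of any path that ends in state j.
    score = [math.log(prior[j]) + log_emit(log_ratios[0], states[j])
             for j in range(k)]
    back = []
    for x in log_ratios[1:]:
        pointers, new_score = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + log_trans(i, j))
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans(best_i, j)
                             + log_emit(x, states[j]))
        back.append(pointers)
        score = new_score

    # Trace back from the best final state to recover the whole path.
    j = max(range(k), key=lambda i: score[i])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return [states[j] for j in reversed(path)]
```

With a high `stay` probability, a single outlying SNP is absorbed into the surrounding state, while a run of shifted SNPs is called as a CN change.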
30 HMM Estimation of CN for Chr. 1 (Pair 101)
Black = Normal Intensities, Red = Tumor Intensities, Green = Tumor - Normal, Blue = HMM-estimated CNs in Tumor Tissue
31 Population Level Analysis
- Analysis for the whole group (or sub-group) of samples
- Overall significance test
- Amplification and deletion frequency summarization
- Common/concurrent region finding
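The frequency summarization step can be sketched as a per-position count across samples. This example assumes each sample already has an estimated CN at the same ordered positions and that the normal state is CN = 2; the function name is hypothetical.

```python
def alteration_frequencies(cn_matrix, normal_cn=2):
    """Per-position fraction of samples with CN above or below `normal_cn`.

    `cn_matrix` is a list of samples, each a list of estimated CN values
    at the same ordered positions.
    """
    n_samples = len(cn_matrix)
    amp, dele = [], []
    for column in zip(*cn_matrix):    # one column = one genomic position
        amp.append(sum(cn > normal_cn for cn in column) / n_samples)
        dele.append(sum(cn < normal_cn for cn in column) / n_samples)
    return amp, dele
```

Positions where either frequency stays high over a contiguous stretch are natural candidates for common or concurrent regions.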
32 Raw CN Changes of Chr. 14 (average over 400 pairs)
33 Genome-wide Raw Copy Number Changes (sliding window plot, averaged over 400 pairs)
34 Sliding Window Test of Significance of CN Changes: -log(p) values, based on 400 pairs
35 Visualization of Concurrent Regions of Chr. 14 (400 pairs)
[Figure axes: samples, positions]
36 Software
- Affymetrix Chips (www.affymetrix.com)
- Illumina Chips (www.illumina.com)
- CNAT (www.affymetrix.com)
- dChip (www.dchip.org)
- CNAG (www.genome.umin.jp)
- GenePattern (www.broad.mit.edu/cancer/software/genepattern/)
- BioConductor R Packages (www.bioconductor.org)
- - GLAD package, adaptive weights smoothing (AWS) method
- - DNAcopy package, circular binary segmentation method