Title: Cancer Genomics
1Cancer Genomics
Richard K. Wilson, Ph.D.
Washington University School of Medicine
rwilson_at_watson.wustl.edu
2Human Genome v1.0
Cancer Genomics
Ancillary genomes: mouse, chimp, etc.
Discovery
Technology, software tools, infrastructure
Cancer, other diseases
3PCR-based re-sequencing
list of candidate genes
large collection of patient samples
4EGFR mutations in NSCLC
(Figure: EGFR domain diagram with EGF ligand binding, TM, tyrosine kinase and autophos regions; kinase motifs GXGXXG, K, LREA, DFG; residue labels 718, 745, 776, 835, 858, Y869, 947, 964.)
Most TKI responders have EGFR mutations:
- Study 1: 8/9 (89%) vs. 0/7 controls
- Study 2: 5/5 (100%) vs. 0/4 controls
- Study 3: 19/24 (79%) vs. 0/20 controls
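The three comparisons above can be checked with simple arithmetic. The sketch below recomputes the mutation rates and then, as an illustrative addition that is not part of the slides, pools the three studies and applies a one-sided Fisher exact test using only the standard library. Pooling ignores differences between studies, and the function name `fisher_one_sided` is a hypothetical helper, so treat the p-value as a rough indication only.

```python
from math import comb

# (mutated responders, responders, mutated controls, controls), from the slide
studies = {
    "Study 1": (8, 9, 0, 7),
    "Study 2": (5, 5, 0, 4),
    "Study 3": (19, 24, 0, 20),
}

def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / total

for name, (mr, r, mc, c) in studies.items():
    print(f"{name}: {mr}/{r} = {100 * mr / r:.0f}% of responders vs. {mc}/{c} controls")

# Pooled table (assumption: studies are comparable enough to add together).
# Rows: responders / controls. Columns: mutated / not mutated.
a = sum(s[0] for s in studies.values())
b = sum(s[1] - s[0] for s in studies.values())
c = sum(s[2] for s in studies.values())
d = sum(s[3] - s[2] for s in studies.values())
p_value = fisher_one_sided(a, b, c, d)
print(f"Pooled: {a}/{a + b} responders vs. {c}/{c + d} controls, one-sided p = {p_value:.2e}")
```

An exact test is used here because the control counts are zero in every study, which makes large-sample approximations unreliable.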
5Tumor Sequencing Project
600 genes of interest
200 lung adenocarcinoma samples
- Sequencing Centers: BCM-HGSC, BI, WUGSC
- Cancer Centers: MSKCC, DFCI, SCC, MDA
6TSP Target List
- Too expensive to sequence the whole genome; therefore, focus on druggable targets.
- For lung adenocarcinoma: TSP 600 genes (exons only)
- Receptor tyrosine kinases (e.g. EGFR)
- Selected serine-threonine kinases
- Known oncogenes
- Known tumor suppressor genes
- EGFR pathway genes
- DNA repair genes
- Etc.
7SNP Arrays
8SNP Arrays
9DNA Chips/SNP Arrays
10Lung Adeno Genomic Events: SNP Array Analysis
Weir et al. Nature (2007)
11Lung Adeno Genomic Events
Weir et al. Nature (2007)
12Lung Adeno Genomic Events
Weir et al. Nature (2007)
13Lung Adenocarcinoma Amplifications
Weir et al. Nature (2007)
14Mutations in lung adenocarcinoma
- KRAS and TP53 are mutated in about 1/3 of tumor samples
- Indels have not been included in the analysis
15Mutations in TP53, ERBB3, and AKT3 appear to correlate with tumor grade
(Figure: mutations by tumor grade; group sizes N=24, N=85, N=71; axis label: Mutation.)
16Correlations between mutations and clinical features
- Mutations in PDGFRA, PTEN, NTRK1 and PRKDC show positive correlation with tumor stage.
- Mutations in LRP1B, PRKDC, TP53, and APC correlate with the solid tumor histological subtype of lung adenocarcinoma.
- High correlation of mutations in EGFR and MYO3B with never smokers, and of mutations in KRAS and LRP1B with smokers.
17EGFR mutations in glioblastoma
- Screen of kinase domains in glioblastoma: no recurrent mutations
- But
119 lung tumors: no EC mutations
270 HapMap normals: no EC mutations
18Genomic Studies of Cancer
- Hypothesis-driven (biased)
- Gene sets with related functions: kinome, phosphatome
- Genes mutated in other cancers
- Closely related genes
- Investigator-driven ideas
- Data-driven (unbiased)
- Use genomic platforms to identify loci with recurrent somatic alterations
- Array-based RNA profiling
- Array CGH
- Array-based SNP genotyping
© R.K.Wilson 2007
19Acute myelogenous leukemia
- Project initiated in 2002.
- Primary tumors, matched normal tissue (i.e., germline variants vs. somatic mutations)
- Discovery set (46 tumors), validation set (94 tumors)
- Initial target list: 450 genes
- Orthogonal technologies (CGH arrays, expression
profiling, etc.) for genome characterization and
to detect additional sequencing targets.
20Acute myelogenous leukemia
- FLT3: 29%
- NPM1: 25%
- NRAS: 9.6%
- PTPN11: 4%
- RUNX1: 4%
- GCSFR: 4%
- Others: 2-3%
21Is there a better approach?
- What are we missing outside of the exons?
- PCR-based re-sequencing
- Relatively expensive
- Diploid (at best) low coverage
22Solexa/Illumina 1G Analyzer
23Solexa/Illumina 1G Analyzer
Illumina flow cell
- Acts as the microfluidic conduit for cluster generation and sequencing reagents.
- 8-lane flow cell configuration.
- Separate libraries can be sequenced in each lane, or the same library in all.
- 60M clusters are sequenced per flow cell.
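As a back-of-the-envelope illustration of the flow cell figures above, the sketch below converts cluster counts into sequence yield. The read length is not given on this slide; 32 bp is assumed because the AML runs reported later in the talk used 32 bp reads. The calculation also assumes clusters are spread evenly across lanes and that every cluster gives one usable read, so it is an upper bound rather than a measured yield.

```python
LANES_PER_FLOW_CELL = 8               # from the slide
CLUSTERS_PER_FLOW_CELL = 60_000_000   # from the slide
READ_LENGTH_BP = 32                   # assumption for this sketch

def yield_bp(clusters: int, read_length_bp: int) -> int:
    """Upper-bound yield: one read of read_length_bp bases per cluster."""
    return clusters * read_length_bp

# Assumption: clusters are distributed evenly over the lanes.
clusters_per_lane = CLUSTERS_PER_FLOW_CELL // LANES_PER_FLOW_CELL
per_lane_bp = yield_bp(clusters_per_lane, READ_LENGTH_BP)
per_flow_cell_bp = yield_bp(CLUSTERS_PER_FLOW_CELL, READ_LENGTH_BP)

print(f"Clusters per lane: {clusters_per_lane:,}")
print(f"Yield per lane: {per_lane_bp / 1e6:.0f} Mbp")
print(f"Yield per flow cell: {per_flow_cell_bp / 1e9:.2f} Gbp")
```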
24Next Generation Sequencing Technologies
25AML Whole Genome Sequencing
Data types
- Whole genome sequence (tumor genome): Solexa
- FL cDNA normalized library: Solexa and 454
- Whole genome sequence (epidermal genome): Solexa
Analysis plans
- Compare sequence to previously identified mutations.
- Compare increasing coverage levels to heterozygous SNPs from Affy/Illumina arrays for coverage evaluation.
- Devise strategic approaches to find novel variants; validate and characterize.
26Case 933124
- 57 y/o Caucasian female
- De novo M1 AML
- 100% blasts in initial BM sample
- Relapsed and died at 11 months
- Normal cytogenetics
- No LOH on Affy 500K SNP array
- Informed consent for whole genome sequencing
27© R.K.Wilson 2007
28(No Transcript)
29AML Whole Genome Sequencing
- As of 1/28/08
- 75 Solexa runs completed (32 bp reads)
- 62 billion bp (22X haploid coverage)
- 2,123,143 sequence variants detected (Q30)
- 492,569 (23.2%) are previously undiscovered SNPs
- 46,320 heterozygous (informative) SNPs from Affy and Illumina SNP arrays.
- 77% of informative SNPs with both WT and variant alleles were detected in the genome sequence.
- 97.4% of informative SNPs of either allele were detected in the genome sequence.
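The figures above are linked by simple arithmetic, which the following sketch makes explicit. It derives the genome size implied by the coverage figure, the average yield per run, and the novel-variant fraction. The variable names are illustrative, the 77 and 97.4 values are read as percentages, and the implied genome size follows from the rounded slide values rather than being a number stated in the talk.

```python
total_bp = 62e9             # sequenced bases reported on the slide
haploid_coverage = 22       # reported haploid coverage (X)
runs = 75                   # completed Solexa runs
variants = 2_123_143        # Q30 sequence variants
novel = 492_569             # previously undiscovered SNPs
informative_snps = 46_320   # heterozygous SNPs from the arrays

# Coverage = total bases / genome size, so genome size = total bases / coverage.
implied_genome_bp = total_bp / haploid_coverage
bp_per_run = total_bp / runs
novel_pct = 100 * novel / variants

# Assumption: 77 and 97.4 on the slide are percentages of the informative SNPs.
both_alleles = round(0.77 * informative_snps)
either_allele = round(0.974 * informative_snps)

print(f"Implied haploid genome size: {implied_genome_bp / 1e9:.2f} Gbp")
print(f"Average yield per run: {bp_per_run / 1e6:.0f} Mbp")
print(f"Novel variants: {novel_pct:.1f}% of all variants")
print(f"Informative SNPs with both alleles seen: about {both_alleles:,}")
print(f"Informative SNPs with either allele seen: about {either_allele:,}")
```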
30AML Whole Genome Sequencing
933124 genome sequence: 2,123,143 variants
- dbSNP: 1,630,574
- Intergenic: 145,092
- Genic: 334,477 (Splice_site 99, Coding 5,056, Other 329,322)
- Coding: Synonymous 1,222, Missense 3,402, Nonsense 320, Nonstop 9
- Only reporting Q30 variants
- Genic region = gene boundary +/- 50 kb
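One way to sanity-check a breakdown like the one above is to verify that each group of subcategories adds up to its parent. The sketch below does this with the slide's numbers. The parent/child grouping is an assumption inferred from the category names, since the original figure layout did not survive, and `residuals` is a hypothetical helper.

```python
counts = {
    "variants": 2_123_143, "dbSNP": 1_630_574, "Intergenic": 145_092, "Genic": 334_477,
    "Splice_site": 99, "Other": 329_322, "Coding": 5_056,
    "Synonymous": 1_222, "Missense": 3_402, "Nonsense": 320, "Nonstop": 9,
}

# Assumed hierarchy (parent -> children); not stated explicitly in the source.
hierarchy = {
    "variants": ["dbSNP", "Intergenic", "Genic"],
    "Genic": ["Splice_site", "Other", "Coding"],
    "Coding": ["Synonymous", "Missense", "Nonsense", "Nonstop"],
}

def residuals(counts, hierarchy):
    """Return the parent count minus the sum of its children, for each parent."""
    return {
        parent: counts[parent] - sum(counts[child] for child in children)
        for parent, children in hierarchy.items()
    }

for parent, gap in residuals(counts, hierarchy).items():
    status = "consistent" if gap == 0 else f"{gap:,} unaccounted for"
    print(f"{parent}: {status}")
```

Under this assumed grouping the genic subcategories sum exactly to the genic total, while the top-level and coding groups leave a remainder, which may reflect categories that were not shown on the slide.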
31AML Transcriptome Sequencing
Various cDNA library construction procedures and normalization schemes
- 454 cDNA sequencing: 306,267 mapped cDNA reads
- Solexa cDNA sequencing: 47,153,784 mapped reads
32Expressed genes: variant/germline frequencies
AML Transcriptome Sequencing
- MYCBP2 1188345
- HSP90B1 6941347
- BCCIP 391394
- NCOR1 256268
- CHFR 23052
- DNAJ 2180
- PTPN11 1981
- NUMA1 1572
- CASPASE 7 145147
- HOX C6 1182
- PLEKHC1 11214
- NTRK3 11210
- CDC2 9682
33V194M (C to T) in FLT3
(Figure: CT variant shown in the cDNA sequence and in the tumor genome sequence.)
34AML Whole Genome Sequencing
- Currently using SXOligoSearchG (Synamatix) to detect small (1-2 bp) indels.
- Evaluating software tools for detection of larger indels.
35AML Current status
thirsty for knowledge?
36AML Current status
- Diploid coverage was obtained for 77% of an AML M1 tumor genome with 22x haploid coverage.
- 2.1M sequence variants found (similar to other whole genomes already finished).
- 495,000 novel variants: SNPs vs. somatic mutations
- 10x coverage of epidermis (normal) genome just completed; may identify >90% of variants as rare SNPs.
- Remaining 50,000 variants are being prioritized by detection in cDNA; should be <1,000
- Very rare somatic mutations in cDNA thus far (only 2 validated).
- No mutator (driver) phenotype is readily apparent for this AML case; passenger mutations appear to be rare.
- We continue to sift through the data
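The prioritization described above amounts to successive set filtering: remove known SNPs, remove variants also seen in the matched normal genome, then rank what remains by whether it is detected in cDNA. The sketch below illustrates that logic on made-up variants. The data, the `Variant` key format, and the `prioritize` function are hypothetical and are not the project's actual pipeline.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    alt: str

def prioritize(tumor, known_snps, normal, cdna):
    """Split tumor variants into filtering tiers using set operations."""
    novel = tumor - known_snps              # not in a SNP database
    candidate_somatic = novel - normal      # absent from the matched normal genome
    expressed = candidate_somatic & cdna    # also detected in cDNA reads
    return novel, candidate_somatic, expressed

# Toy data, invented for illustration only.
tumor = {
    Variant("chr1", 100, "T"), Variant("chr2", 200, "A"),
    Variant("chr3", 300, "G"), Variant("chr4", 400, "C"),
}
known_snps = {Variant("chr1", 100, "T")}
normal = {Variant("chr1", 100, "T"), Variant("chr2", 200, "A")}
cdna = {Variant("chr3", 300, "G")}

novel, candidate_somatic, expressed = prioritize(tumor, known_snps, normal, cdna)
print(f"tumor={len(tumor)} novel={len(novel)} "
      f"candidate_somatic={len(candidate_somatic)} expressed={len(expressed)}")
```

Keying each variant on chromosome, position and allele keeps the comparison exact. A real comparison would also need to account for coverage gaps in the normal genome, since a variant can be missing there simply because the site was not sequenced deeply enough.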
37Cancer Genomics
- Exon-targeted sequencing (TSP, glioblastoma) is revealing useful and interesting findings; expensive and slow!
- Next Gen sequencing is here and will have a substantial near-term impact on the study of cancer genomes!
- Ancillary genome-based technologies (expression profiling, SNP arrays, cDNA sequencing) are crucial for understanding the target genome before considering WGS.
- The dream is not hype: a comprehensive understanding of the cancer genome is probable, and will change the way that you diagnose and treat your patients.
38Acknowledgments
- WU Genome Sequencing Center
- Elaine Mardis, Li Ding, Dave Dooling, Tracy Miner, Mike McLellan, Ginger Fewell, Jim Eldred, Asif Chinwalla, Yumi Kasai, Lucinda Fulton, Vince Magrini, Matt Hickenbotham, Lisa Cook, Michael Wendl, Michael Province
- WU Siteman Cancer Center
- Tim Ley, Mark Watson, Matt Walter, Rhonda Ries, Jackie Payton, John DiPersio, Dan Link, Michael Tomasson, Tim Graubert, Sharon Heath
- TSP/TCGA Colleagues
- Baylor HGSC, Broad Institute, many others
- Funding sources
- NHGRI (Wilson), NCI (Ley), Alvin J. Siteman (AML WGS)
genome.wustl.edu