Title: Aspects of Genetics and Genomics in Cancer Research
1 Aspects of Genetics and Genomics in Cancer Research
- Li Hsu
- Biostatistics and Biomathematics Program
- Fred Hutchinson Cancer Research Center
2 Outline
- Cancer facts
- Linkage analysis of family studies
- Genome-wide association studies
4 Etiology of Cancer
- The etiology of cancer is multifactorial: genetic, environmental, medical, and lifestyle factors interact to produce a given malignancy.
- Breakthroughs in high-throughput genotyping technologies have made it possible to systematically identify genes responsible for disease occurrence.
5 BRCA1 and Breast Cancer
- BRCA1 (breast cancer 1) is a human gene that belongs to a class of genes known as tumor suppressors, which maintain genomic integrity to prevent uncontrolled cell proliferation. Variations in the gene have been implicated in a number of hereditary cancers, namely breast, ovarian, and prostate cancer. The BRCA1 gene is located on the long (q) arm of chromosome 17 at about 38 Mb.
6 Probability of developing breast cancer by age (Chen et al. 2009)
[Figure: probability curves for carriers and non-carriers]
7 Probability of Developing Breast Cancer for BRCA1 Carriers
By age   Average person, %   BRCA1 carrier, %
50       2.1 (1.7-2.7)       18.8 (8.2-2.3)
60       4.1 (3.4-5.0)       31.3 (14.3-61.2)
70       7.2 (6.0-9.0)       45.4 (22.7-74.3)
80       10.2 (8.4-12.5)     54.9 (30.4-81.4)
10 Linkage Analysis
11
- Assume the disease allele (D) is rare and fully penetrant.
16 Linkage Analysis (continued)
- The disease allele (D) was originally on the same chromosome as allele 3.
- How often does D co-segregate with allele 3 (non-recombinant)? 5 meioses
- How often is D separated from allele 3 (recombinant)? 1 meiosis
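17 Likelihood function
- Let $\theta$ be the recombination fraction, which measures the distance between allele 3 and D by how frequently they recombine.
- With 5 non-recombinant meioses and 1 recombinant meiosis, the likelihood function is $L(\theta) = (1-\theta)^5\,\theta$.
- The maximum likelihood estimate is $\hat{\theta} = 1/6$.
- $\mathrm{LOD} = \log_{10}\left[L(1/6)/L(1/2)\right] \approx 0.63$
- LOD scores add across independent families, so for 7 such families the LOD is $7 \times 0.63 = 4.41$.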
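The likelihood and LOD calculation above can be reproduced in a few lines of Python. This is a minimal sketch that assumes, as the slides do, that every meiosis can be scored unambiguously as recombinant or non-recombinant; `log10_likelihood` and `lod_score` are hypothetical helper names.

```python
import math


def log10_likelihood(theta: float, n_nonrec: int, n_rec: int) -> float:
    """log10 of L(theta) = (1 - theta)^n_nonrec * theta^n_rec."""
    value = 0.0
    if n_nonrec:
        value += n_nonrec * math.log10(1.0 - theta)
    if n_rec:
        value += n_rec * math.log10(theta)
    return value


def lod_score(n_nonrec: int, n_rec: int) -> tuple[float, float]:
    """Return (theta_hat, LOD) for one set of scored meioses.

    theta_hat is the MLE n_rec / n, capped at 0.5 (no linkage);
    the LOD compares L(theta_hat) with L(1/2).
    """
    theta_hat = min(n_rec / (n_nonrec + n_rec), 0.5)
    lod = log10_likelihood(theta_hat, n_nonrec, n_rec) - log10_likelihood(
        0.5, n_nonrec, n_rec
    )
    return theta_hat, lod


theta_hat, lod = lod_score(n_nonrec=5, n_rec=1)
print(f"theta_hat = {theta_hat:.4f}, LOD = {lod:.2f}")
# LOD scores add across independent families
print(f"LOD for 7 such families = {7 * lod:.2f}")
```

Running it prints theta_hat = 0.1667 and LOD = 0.63; the 7-family total prints as 4.42 rather than 4.41 because the sketch multiplies the unrounded single-family LOD (about 0.632) instead of the rounded 0.63.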
18 Issues
- Linkage analysis narrowed the location down to a region of about 1 Mb; however, it took another four years before the BRCA1 gene was identified.
- Reduced penetrance, phenocopies, and genetic heterogeneity are among the factors that limit the success of linkage analysis.
- The relevance of findings from high-risk families to the population at large may be limited.
19 Genome-Wide Association Studies (GWAS)
- The Human Genome Project began in 1990 and was completed in 2003.
20 Part of a sequence from chromosome 7
(A slash marks a single-nucleotide polymorphism (SNP), a position where chromosomes differ by one base, e.g. C/T.)
- AGACGGAGTTTCACTCTTGTTGCCAACCTGGAGTGCAGTGGCGTGATCTCAGCTCACTGCACACTCCGCTTTCC/TGG
- TTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGACTACAGTCACACACCACCACGCCCGGCTAATTTTTG
- TATTTTTAGTAGAGTTGGGGTTTCACCATGTTGGCCAGACTGGTCTCGAACTCCTGACCTTGTGATCCGCCAGCCTCT
- GCCTCCCAAAGAGCTGGGATTACAGGCGTGAGCCACCGCGCTCGGCCCTTTGCATCAATTTCTACAGCTTGTTTTCTT
- TGCCTGGACTTTACAAGTCTTACCTTGTTCTGCCTTCAGATATTTGTGTGGTCTCATTCTGGTGTGCCAGTAGCTAAAA
- ATCCATGATTTGCTCTCATCCCACTCCTGTTGTTCATCTCCTCTTATCTGGGGTCACA/CTATCTCTTCGTGATTGCATTC
- TGATCCCCAGTACTTAGCATGTGCGTAACAACTCTGCCTCTGCTTTCCCAGGCTGTTGATGGGGTGCTGTTCATGCCT
- CAGAAAAATGCATTGTAAGTTAAATTATTAAAGATTTTAAATATAGGAAAAAAGTAAGCAAACATAAGGAACAAAAAG
- GAAAGAACATGTATTCTAATCCATTATTTATTATACAATTAAGAAATTTGGAAACTTTAGATTACACTGCTTTTAGAGAT
- GGAGATGTAGTAAGTCTTTTACTCTTTACAAAATACATGTGTTAGCAATTTTGGGAAGAATAGTAACTCACCCGAACA
- GTGTAATGTGAATATGTCACTTACTAGAGGAAAGAAGGCACTTGAAAAACATCTCTAAACCGTATAAAAACAATTACA
- TCATAATGATGAAAACCCAAGGAATTTTTTTAGAAAACATTACCAGGGCTAATAACAAAGTAGAGCCACATGTCATTT
- ATCTTCCCTTTGTGTCTGTGTGAGAATTCTAGAGTTATATTTGTACATAGCATGGAAAAATGAGAGGCTAGTTTATCAA
- CTAGTTCATTTTTAAAAGTCTAACACATCCTAGGTATAGGTGAACTGTCCTCCTGCCAATGTATTGCACATTTGTGCCC
- AGATCCAGCATAGGGTATGTTTGCCATTTACAAACGTTTATGTCTTAAGAGAGGAAATATGAAGAGCAAAACAGTGCA
- TGCTGGAGAGAGAAAGCTGATACAAATATAAATGAAACAATAATTGGAAAAATTGAGAAACTACTCATTTTCTAAATT
- ACTCATGTATTTTCCTAGAATTTAAGTCTTTTAATTTTTGATAAATCCCAATGTGAGACAAGATAAGTATTAGTGATGGT
- ATGAGTAATTAATATCTGTTATATAATATTCATTTTCATAGTGGAAGAAATAAAATAAAGGTTGTGATGATTGTTGATTA
21 Genome-Wide Association Study
- 550,000 SNPs on an array
- 2000 diseased individuals (colon cancer cases) and 2000 healthy individuals (controls)
- Genotype all 4000 DNA samples for 550,000 SNPs
- That is 2.2 billion genotypes!
22 GWAS on Type 2 Diabetes (Steinthorsdottir et al., 2007, Nature Genetics)
Expected counts if the SNP is not associated with disease
Genotype   Cases   Controls   Total
AA           809       3049    3858
Aa           509       1917    2426
aa            81        304     385
Total       1398       5271    6669
Observed counts
Genotype   Cases   Controls   Total
AA           751       3107    3858
Aa           539       1887    2426
aa           108        277     385
Total       1398       5271    6669
- Expected count for cases if AA is not associated with the disease: first, calculate the frequency of the AA genotype in cases and controls combined, 3858/6669 = 57.85%.
- Of the 1398 cases, we therefore expect 1398 × 57.85% ≈ 809 individuals to have genotype AA.
- Every other expected count is obtained the same way: row total × column total / 6669.
23 GWAS on Type 2 Diabetes
- The chi-square statistic is calculated by taking the difference between the observed and expected counts in each cell, squaring it, dividing by the expected count, and summing the results: $\chi^2 = \sum (O - E)^2 / E = (751-809)^2/809 + (3107-3049)^2/3049 + \cdots$, where the sum runs over all six cells. Using unrounded expected counts, $\chi^2 \approx 19.20$.
- Compare this value with a chi-square distribution with (rows - 1)(columns - 1) = (3 - 1)(2 - 1) = 2 degrees of freedom.
- The p-value for this SNP is 6.772e-5.
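The expected-count and chi-square steps on the last two slides can be sketched in Python as follows. It is a minimal illustration, not the authors' code: `chi_square_test` is a hypothetical helper, and it computes the p-value from the closed-form chi-square tail for 2 degrees of freedom, exp(-x/2), so it only handles 3 × 2 genotype tables.

```python
import math

# Observed genotype counts from the type 2 diabetes example: (cases, controls)
observed = {
    "AA": (751, 3107),
    "Aa": (539, 1887),
    "aa": (108, 277),
}


def chi_square_test(table: dict[str, tuple[int, int]]) -> tuple[float, int, float]:
    """Pearson chi-square test of genotype vs case/control status.

    Returns (statistic, degrees of freedom, p-value). The p-value uses
    P(X > x) = exp(-x / 2), which holds only for 2 degrees of freedom.
    """
    rows = list(table.values())
    n_cases = sum(r[0] for r in rows)
    n_controls = sum(r[1] for r in rows)
    n_total = n_cases + n_controls

    statistic = 0.0
    for case_count, control_count in rows:
        row_total = case_count + control_count
        # Expected counts if genotype is independent of disease status
        expected_case = row_total * n_cases / n_total
        expected_control = row_total * n_controls / n_total
        statistic += (case_count - expected_case) ** 2 / expected_case
        statistic += (control_count - expected_control) ** 2 / expected_control

    dof = (len(rows) - 1) * (2 - 1)
    if dof != 2:
        raise ValueError("the closed-form p-value assumes 2 degrees of freedom")
    p_value = math.exp(-statistic / 2)
    return statistic, dof, p_value


stat, dof, p = chi_square_test(observed)
print(f"chi2 = {stat:.2f}, df = {dof}, p = {p:.3e}")
```

Because it uses unrounded expected counts, it prints chi2 = 19.20, df = 2 and p = 6.772e-05, matching the slide. For general tables, `scipy.stats.chi2_contingency` performs the same Pearson test.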
24 Issues
- Too many SNPs!
- Identifying gene-gene and gene-environment interactions is now possible.
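"Too many SNPs" can be read in more than one way; one standard concern is multiple testing. The sketch below assumes a Bonferroni correction, which is one common choice rather than a method stated in these slides, and borrows the 550,000-SNP array size from the colon cancer example purely for illustration; `bonferroni_threshold` is a hypothetical helper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_snps = 550_000  # SNPs on the array in the GWAS example (assumed for illustration)
alpha = 0.05

# Expected number of false positives if every SNP is null and each is tested at alpha
expected_false_positives = alpha * n_snps
threshold = bonferroni_threshold(alpha, n_snps)

p_example = 6.772e-5  # p-value of the type 2 diabetes SNP
print(f"expected false positives at alpha=0.05: {expected_false_positives:.0f}")
print(f"Bonferroni threshold: {threshold:.2e}")
print(f"example SNP passes: {p_example < threshold}")
```

With these numbers the threshold is about 9.09e-08, so a p-value of 6.772e-5 would not pass it. Many GWAS instead use the conventional genome-wide threshold of 5e-8, which is of the same order.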
25 Germline mutations account for only a small portion of cancer cases.
(Source: http://envirocancer.cornell.edu/FactSheet/General/fs48.inheritance.cfm)
26 Summary
- The amount of data generated has increased exponentially in the last few years.
- This creates great demand for efficient and valid computational and statistical methods and tools for finding the needles in a haystack.