Title: Perlegen At-A-Glance
1Perlegen At-A-Glance
- San Francisco Bay Area
- Spun off from Affymetrix, Inc. in March 2001
- 95 employees
- Approximately half in genetics/biology and half in bioinformatics
- Privately held
2Using the Human Genome
3 95% of one human genome is now publicly available
One copy of the human genome consists of 3
billion bases AGTCCTAGCCTGTGATATAGGGCCCTAGATCA.
4 One copy of the human genome cost $100 million to
obtain
Why were we willing to spend so much money?
5Variations in DNA sequence affect many aspects of
our lives
Inherited traits or phenotypes
6 Any two humans share 99.9% of the same DNA sequence
7Traits are influenced to different degrees by
genetics and environment
Environmental contribution
Genetic contribution
8Most common traits are believed to be about 50-50
[Figure: example traits placed on a scale running from Genetics to Environment: diabetes, skin color, rheumatoid arthritis, obesity, height, schizophrenia, osteoporosis, heart failure]
9 With knowledge of the genetic component of a trait
- Diagnostics
- Determine how a patient will respond to a particular drug treatment
- Targets for drug development
- More effective consumer products
- Evaluate the role of lifestyle and environment on the trait
10 DNA is double-stranded and connected by very specific pairing of the four bases A, G, T, C
DNA can be unwound to single-stranded form and then wound again to double-stranded form; this rewinding, which relies on the specificity of base pairing, is called hybridization
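The pairing rule can be sketched in a few lines of Python. This is a minimal illustration, not a model of real hybridization chemistry: it assumes the standard pairs (A with T, G with C), treats hybridization as all-or-nothing, and uses an illustrative sequence; the function names are hypothetical.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the strand that pairs with `strand`, read in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def hybridizes_perfectly(strand_a: str, strand_b: str) -> bool:
    """True when every base of strand_a pairs with strand_b (all-or-nothing rule)."""
    return strand_b == reverse_complement(strand_a)

reference = "AGCCTGTCACT"                # illustrative sequence
partner = reverse_complement(reference)  # the strand it would rewind with
variant = "AGCCTATCACT"                  # same sequence with one base changed

print(partner)
print(hybridizes_perfectly(reference, partner))
print(hybridizes_perfectly(variant, partner))
```

A single changed base is enough to break the perfect match, which is the property that DNA chips exploit to read sequence.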
11Human DNA variation results from errors in DNA
replication
12DNA variations come in different forms
- Single nucleotide polymorphism (SNP): AGCCTGTCACT vs. AGCCTATCACT
- Deletion: AGCCTGTCACT vs. AGCCTTCACT
- Insertion: AGCCTGTCACT vs. AGCCTGGTCACT
- Variable number tandem repeat (VNTR): CAGCAGCAG vs. CAGCAGCAGCAGCAG
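The four forms can be told apart by comparing a reference sequence with a variant. The sketch below uses the example sequences from this slide; the `classify` function and its `repeat_unit` parameter are hypothetical names, and the logic is a deliberately simple illustration rather than a real variant caller.

```python
def repeat_count(seq, unit):
    """Number of back-to-back copies of `unit` that make up `seq`, or None."""
    copies, remainder = divmod(len(seq), len(unit))
    return copies if remainder == 0 and unit * copies == seq else None

def classify(ref, alt, repeat_unit=None):
    """Label the difference between two short sequences (illustrative only)."""
    if repeat_unit:
        ref_n, alt_n = repeat_count(ref, repeat_unit), repeat_count(alt, repeat_unit)
        if ref_n is not None and alt_n is not None and ref_n != alt_n:
            return "VNTR"
    if len(ref) == len(alt):
        mismatches = sum(1 for a, b in zip(ref, alt) if a != b)
        if mismatches == 0:
            return "identical"
        return "SNP" if mismatches == 1 else "other"
    # Different lengths: strip the shared prefix and suffix to find an indel.
    shorter, longer = sorted((ref, alt), key=len)
    prefix = 0
    while prefix < len(shorter) and shorter[prefix] == longer[prefix]:
        prefix += 1
    suffix = 0
    while suffix < len(shorter) - prefix and shorter[-1 - suffix] == longer[-1 - suffix]:
        suffix += 1
    if prefix + suffix == len(shorter):
        return "deletion" if len(alt) < len(ref) else "insertion"
    return "other"

print(classify("AGCCTGTCACT", "AGCCTATCACT"))
print(classify("AGCCTGTCACT", "AGCCTTCACT"))
print(classify("AGCCTGTCACT", "AGCCTGGTCACT"))
print(classify("CAGCAGCAG", "CAGCAGCAGCAGCAG", repeat_unit="CAG"))
```

Without a `repeat_unit`, the VNTR example would be reported as an insertion, since a change in repeat count is also a length change.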
13 The genetic contribution to a trait may be due to variation in one gene: a Mendelian trait
14Before the Human Genome Project, genes
responsible for Mendelian traits were the only
genes we could find
but it still took a decade or more to find one
of these genes
15Once the results of the Human Genome Project
began to emerge, the number of Mendelian trait
genes discovered increased exponentially and the
time to discovery decreased
16There are currently 8309 genes whose variants are
associated with a disorder in OMIM
- Cystic Fibrosis
- Huntington's Disease
- Familial Breast Cancer
- Severe Combined Immunodeficiency Disorder
This knowledge is used for diagnostics and preventative therapies, drug development, and gene therapy
17But Mendelian traits are the minority and the
genetic variants responsible for Mendelian
disorders are rare in the general population
How many genetic variants have we found that are responsible for traits and disorders that affect millions of people?
Very few
18The vast majority of traits are not caused by
variation in a single gene and are called complex
traits
- Probably the result of 10-30 genetic changes spread across the genome
- Any single genetic variant may be responsible for only a small contribution to the trait
19You may not need to have all of the possible
genetic changes to get the disease
An example where 10 genes are involved in a
disease
Variants (green) in any 4 of the 10 genes cause disease.
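The 10-gene example can be written as a simple threshold rule. The sketch below assumes that each gene either carries a variant or does not, and that "any 4" means four or more; both are simplifying assumptions, and the names are hypothetical.

```python
from math import comb

N_GENES = 10     # genes involved in the example disease
THRESHOLD = 4    # assumed: variants in at least this many genes cause disease

def has_disease(genes_with_variants):
    """Threshold rule: disease once enough distinct genes carry a variant."""
    return len(set(genes_with_variants)) >= THRESHOLD

# Number of different 4-gene combinations that are each sufficient.
sufficient_sets = comb(N_GENES, THRESHOLD)

# Of all 2**10 carrier patterns, how many reach the threshold?
disease_patterns = sum(comb(N_GENES, k) for k in range(THRESHOLD, N_GENES + 1))

print(sufficient_sets, disease_patterns, 2 ** N_GENES)
print(has_disease({1, 3, 5}), has_disease({1, 3, 5, 8}))
```

Because many different combinations are sufficient, two affected people need not share the same set of variants, and any single variant also appears in healthy people.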
20What does all that mean?
- The genetic variants responsible for common disease are themselves common in the general population
- These genetic variants are found in both sick and healthy people
This makes associating these variants with the
disease extremely difficult and expensive
21Genetic Association Study
If a DNA variant is associated with a trait of
interest, affecteds will have a different
frequency of that variant than unaffecteds
22 Knowing, with statistical certainty, that a genetic variant with a small effect is associated with a trait requires looking at the DNA of large numbers of people
1,000 people
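Why sample size matters can be shown with a chi-square test on allele counts. The sketch below is an illustration under stated assumptions: the variant frequencies (35% in affecteds, 30% in unaffecteds) are invented for the example, each person contributes two alleles, and the test is a plain 2x2 chi-square with one degree of freedom. The slides do not say which statistic Perlegen used.

```python
from math import erfc, sqrt

def allele_chi_square(case_variant, case_total, control_variant, control_total):
    """Chi-square statistic and p-value (1 degree of freedom) for a 2x2 allele table."""
    table = [
        [case_variant, case_total - case_variant],
        [control_variant, control_total - control_variant],
    ]
    row_totals = [case_total, control_total]
    col_totals = [case_variant + control_variant,
                  (case_total - case_variant) + (control_total - control_variant)]
    grand_total = case_total + control_total
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For one degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical frequencies: 35% in affecteds vs 30% in unaffecteds.
# 100 people in total (50 per group, 100 alleles per group):
chi2_small, p_small = allele_chi_square(35, 100, 30, 100)
# 1,000 people in total (500 per group, 1,000 alleles per group):
chi2_large, p_large = allele_chi_square(350, 1000, 300, 1000)

print(f"100 people:   chi2={chi2_small:.2f}, p={p_small:.3f}")
print(f"1,000 people: chi2={chi2_large:.2f}, p={p_large:.3f}")
```

The frequency difference is identical in both runs; only the larger study pushes the p-value below the conventional 0.05 cutoff.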
23 At $100 million per genome, we certainly cannot
sequence the genomes for a thousand people for
each trait we are interested in finding the genes
for
we just need to look at the variants in the
genomes of the 1000 people
24Single Nucleotide Polymorphisms (SNPs)
- SNPs are a frequent form of DNA variation and are scattered randomly across the genome
- Each SNP is characterized by only two bases
25Genotyping calls the two variants that each
person carries at one base position in the genome
But to genotype, you need to know the two base
variants and genome position of the SNPs
26 How many SNPs do we need to find across the
genome and genotype in the 1000 people to find
the genes involved in complex traits?
The average cost of a single SNP genotype for one person is $0.50, or $500 for 1,000 people
27 There are 3 million SNPs between two people
$1.5 billion!
28Look only at SNPs in the known functional
sequences of the human genome because only
functional regions are likely to be associated
with a trait
Minimize the cost of finding the SNPs and
genotyping the SNPs in a Genetic Association Study
29 Look at a dense set of common SNPs across the whole genome
- Not all functional sequences have been
discovered
- The important changes in DNA may not lie in known functional sequences (which comprise less than 3% of the genome)
- Even if all important changes are in known
functional sequences, which do you select for
research? (You need to have the correct
hypothesis up-front)
30Discover all the common SNPs by looking at the
sequence of 25 copies of the genome from around
the world
but it takes 1 year to sequence one mammalian
genome, so that would take 25 years, not to
mention the cost!
31 Perlegen came up with a faster and cheaper way than sequencing to find the common SNPs, possible only because one copy of the human genome was already known and because of technology improvements
32Reading Human Genomic Sequence By Using
Affymetrix DNA Chips
33Take another copy of the human genome, label it
with a fluorophore, and hybridize it to the chip
34Detection of DNA Variation By Using DNA Chips
35How many chips do we have to process to discover
SNPs from the 25 genomes?
600,000 chips. At 200 chips processed per day, it
would take 8.4 years!
36What Perlegen was able to do successfully, that
had never been done before
Cover 15 million bases of genomic DNA on one
wafer!
5000 wafers to find the SNPs in 25 genomes
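The wafer count follows from figures given in the slides. The check below assumes the full 3 billion bases of each of the 25 genome copies are scanned; the bases-per-chip value is derived from the slide figures and is not stated in the deck.

```python
GENOME_BASES = 3_000_000_000   # bases in one copy of the human genome
GENOME_COPIES = 25             # copies scanned for SNP discovery
BASES_PER_WAFER = 15_000_000   # genomic DNA covered by one Perlegen wafer
CHIPS_NEEDED = 600_000         # chips needed with the earlier chip format
CHIPS_PER_DAY = 200            # chip processing rate

total_bases = GENOME_BASES * GENOME_COPIES           # bases to scan in total
wafers_needed = total_bases // BASES_PER_WAFER       # wafers for 25 genomes
implied_bases_per_chip = total_bases // CHIPS_NEEDED # derived, not from the slides
chip_processing_days = CHIPS_NEEDED // CHIPS_PER_DAY # days at 200 chips per day

print(f"{total_bases:,} bases to scan")
print(f"{wafers_needed:,} wafers")
print(f"{implied_bases_per_chip:,} bases per chip (implied)")
print(f"{chip_processing_days:,} days of chip processing")
```

Under these assumptions one wafer covers as much sequence as 120 chips, which is where the speed-up comes from.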
37 Perlegen's Technological Advantage
38Human Whole-Genome High-Density Oligonucleotide
Arrays
39Perlegen finished SNP discovery across the entire
human genome for 25 copies of the genome in under
2 years in August 2002
- 1,717,015 common SNPs discovered and confirmed
- Had all the assays developed and working to
genotype all the SNPs
Still, that would require $850 million for genotyping 1,000 people.
But we discovered something else
40SNPs
ATTGCAATCCGTGG...ATCGAGCCATACGATTGCACGCCG
ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG
ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG
ATTGCAATCCGTGG...ATCGAGCCATACGATTGCACGCCG
ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG
41SNP Space
ATTGCAATCCGTGG...ATCGAGCCATACGATTGCACGCCG
ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG
ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG
ATTGCAATCCGTGG...ATCGAGCCATACGATTGCACGCCG
ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG
42 There's something amazing about SNPs...
SNPs occur in blocks!
43Haplotype Pattern
ATTGCAATCCGTGG...ATCGAGCCATACGATTGCACGCCG
ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG
ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG
ATTGCAATCCGTGG...ATCGAGCCATACGATTGCACGCCG
ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG
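The block structure can be seen by comparing the five sequence rows on this slide column by column. The sketch below uses those rows with the line wraps removed; the "..." gap is kept as-is and simply counts as three unchanging columns. Variable names are hypothetical.

```python
from collections import Counter

# The five example rows from the slide, one genome copy per row.
rows = [
    "ATTGCAATCCGTGG...ATCGAGCCATACGATTGCACGCCG",
    "ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG",
    "ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG",
    "ATTGCAATCCGTGG...ATCGAGCCATACGATTGCACGCCG",
    "ATTGCAAGCCGTGG...ATCTAGCCATACGATTGCAAGCCG",
]

# A SNP is any column where the rows do not all show the same base.
snp_columns = [
    i for i in range(len(rows[0]))
    if len({row[i] for row in rows}) > 1
]

# A haplotype pattern is the combination of bases a row carries at those SNPs.
patterns = Counter("".join(row[i] for i in snp_columns) for row in rows)

print("SNP columns (0-based):", snp_columns)
for pattern, count in patterns.items():
    print(pattern, "seen in", count, "rows")
```

Three SNPs with two bases each could form eight patterns, but these rows show only two, so typing any one of the three SNPs predicts the other two.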
44The number of haplotype patterns is limited
Possible patterns: 26 SNPs x 2 bases = 2^26
Observed patterns: 7
The majority of the patterns fall into only 4 classes (1, 2, 3, 4), which can be distinguished from each other by only 2 SNPs
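The idea of telling pattern classes apart with a few SNPs can be sketched as a brute-force search. The four patterns below are invented for illustration (six SNPs, alleles coded 0 and 1) and are not the slide's data; `minimal_tag_snps` is a hypothetical helper, and exhaustive search is only practical for small examples like this one.

```python
from itertools import combinations

def minimal_tag_snps(patterns):
    """Smallest set of SNP columns whose alleles tell every pattern apart."""
    n_snps = len(patterns[0])
    n_distinct = len(set(patterns))
    for size in range(1, n_snps + 1):
        for columns in combinations(range(n_snps), size):
            signatures = {tuple(p[c] for c in columns) for p in patterns}
            if len(signatures) == n_distinct:
                return columns
    return tuple(range(n_snps))

# Hypothetical pattern classes over 6 SNPs (0 and 1 stand for the two bases).
pattern_classes = ["000000", "001101", "110010", "111111"]

possible_patterns = 2 ** 26          # 26 SNPs, two bases each
tag_snps = minimal_tag_snps(pattern_classes)

print(f"{possible_patterns:,} possible patterns for 26 SNPs")
print("SNP columns needed to separate the 4 classes:", tag_snps)
```

Two SNPs with two bases each give four combinations, which is exactly enough to label four classes.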
45 A SNP-Haplotype Map of the Human Genome
2.3 billion bases of genomic DNA sequence
is covered in 175,309 haplotype blocks
13,000 bases is the average haplotype block size
6.5 SNPs is the average number of SNPs per
haplotype block
210,937 SNPs uniquely define haplotypes representing the pattern of DNA variation spanning the human genome
46The haplotype structure of Chr.21 is available to
the public
http://genome-hg8.cse.ucsc.edu/cgi-bin/hgGateway?db=hg8
47Genotyping only haplotype-defining SNPs reduces
the number of bases to be looked at in each
individual
1.7 million genotypes/individual
210,000 genotypes/individual
48Whole Genome Scanning Approach
Looking across the entire genome in hundreds of
people
- Does not require a hypothesis up front
- Does not require placing bets on a few locations
- Will reveal many places in the genome that play a
role in the disease or trait
49Whole Genome Association Methodology
$105 million
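The dollar figures quoted across the slides can be reproduced from the $0.50 cost per genotype and the 1,000-person study size given earlier. The sketch below is a check of that arithmetic; the scenario labels are paraphrases, and the per-genotype cost is assumed to stay constant.

```python
COST_PER_GENOTYPE = 0.50   # dollars per SNP genotype per person (from the slides)
STUDY_SIZE = 1_000         # people genotyped per association study

def study_cost(n_snps):
    """Cost in dollars of genotyping n_snps in every person in the study."""
    return n_snps * COST_PER_GENOTYPE * STUDY_SIZE

scenarios = {
    "all SNPs between two people": 3_000_000,
    "common SNPs discovered by Perlegen": 1_717_015,
    "haplotype-defining SNPs (rounded)": 210_000,
}

for label, n_snps in scenarios.items():
    print(f"{label}: {n_snps:,} SNPs -> ${study_cost(n_snps):,.0f}")

reduction = scenarios["common SNPs discovered by Perlegen"] / scenarios["haplotype-defining SNPs (rounded)"]
print(f"Genotypes per person reduced about {reduction:.1f}-fold")
```

The exact count of 1,717,015 SNPs gives about $858.5 million; the slide's $850 million corresponds to rounding to 1.7 million SNPs.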
50Genetic Association Study
If a DNA variant is associated with a trait of
interest, affecteds will have a different
frequency of that variant than unaffecteds
51Genetic Association Analysis Using Pooled DNA
Samples
SNP 1
52Whole Genome Association Methodology
All SNP assays per association study using one DNA pool of affecteds and one DNA pool of unaffecteds
210,000 SNP assays per sample
420,000 SNP assays per association study
53Association Studies currently underway at Perlegen
- Genetics of drug response to a highly effective drug with GlaxoSmithKline
  - Small percent of patients have adverse reaction
- Genetics of Diabetes Type 2 with a large international consortium of researchers
  - Affects 15 million in the U.S. alone
- Genetics of common traits with Unilever
  - Improve effectiveness of beauty products