Personalized Structural Variation of Human Genomes

1
Personalized Structural Variation of Human
Genomes
  • Evan Eichler
  • University of Washington

Human Variome Project Meeting, Sept 27th, 2008
2
Goals of Human Genome Structural Variation
Sequencing Project
  • Sequence inversions, insertions and deletions at
    the single-base-pair level (>5 kbp) in order to
    develop genotype assays to assess phenotypic
    consequence
  • Copy-number status
  • Sequence content
  • Sequence organization (i.e. proximity with
    respect to functional promoter, alleles vs
    paralogues)

Color-Blindness in Humans: The Opsin Loci
Adapted from Deeb, SS (2005) Clin. Genet.
67:369-377
3
Sequence-Based Resolution of Structural Variation
Human Genomic DNA
Genomic Library (1 million clones)
Sequence ends of genomic inserts and map to human
genome
Dataset: 1,122,408 fosmid pairs preprocessed
(15.5X genome coverage); 639,204 BEST fosmid
pairs (8.8X genome coverage)
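
The coverage figures above can be roughly reproduced from the pair counts. The sketch below assumes an average fosmid insert of about 40 kbp and a genome size of about 2.9 Gbp; neither value is stated on this slide, and the function name is illustrative.

```python
# Physical clone coverage = (number of clone pairs * insert size) / genome size.
FOSMID_INSERT_BP = 40_000      # assumption: ~40 kbp fosmid inserts
GENOME_BP = 2_900_000_000      # assumption: ~2.9 Gbp genome


def physical_coverage(n_pairs, insert_bp=FOSMID_INSERT_BP, genome_bp=GENOME_BP):
    """Return the fold physical coverage provided by n_pairs clones."""
    return n_pairs * insert_bp / genome_bp


all_pairs = physical_coverage(1_122_408)   # preprocessed fosmid pairs
best_pairs = physical_coverage(639_204)    # BEST fosmid pairs
print(f"preprocessed pairs: {all_pairs:.1f}X")
print(f"BEST pairs: {best_pairs:.1f}X")
```

Under these assumptions the two values round to the 15.5X and 8.8X quoted on the slide.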
4
Structural Variation Sequencing Project
  • 8 HapMap genomes sequenced to 0.3X Sanger
    sequence coverage (10X physical fosmid clone coverage)
  • Identifies 1700 sites of structural variation
    and 525 novel insertions
  • Identifies 4 million SNPs and 795,000 indels
    (3.5% vs. 10% FP)
  • Additional 19 genomes underway (WashU): 6 CEU, 6
    ASN, 7 YRI
  • 40% map to duplications; 20% are complex
    structures

Kidd et al., Nature 2008
5
Structural Variation of the GSTM1 Locus
Japanese Sample
Yoruban Sample
  • 91% of human genome base pairs are covered by 4
    or more clones

Browser view (http://hgsv.washington.edu)
6
Sequenced Structural Variation of APOBEC3B
  • 24.5 kb deletion eliminates most of APOBEC3B but
    creates fusion gene
  • Complete sequence facilitates rapid genotyping.

7
World-Wide Distribution of APOBEC3B Deletion
  • Fusion APOBEC3A/3B: <1% frequency in Africans, 88%
    in Papua New Guineans
  • Analysis of 1269 human DNA samples. Fst places in
    top 0.169

Kidd et al., Hum. Mol. Genet., 2007
8
Structural Variation Map of the Human Genome
Kidd et al., Nature, 2008.
9
Genotyping
Probe coverage for sequenced deletions on
commercial SNP platforms
Cooper et al. (2008) Nature Genet. Sept 7 Epub.
10
Next-Generation ESP Technology: A More
Comprehensive Catalogue of Structural Variation?
  • Tuzun et al., 2005 (ESP, Sanger, 40 kbp fosmids)
    vs. Korbel et al., 2007 (454, 3 kbp plasmids),
    same genome
  • 297 sites (275 completely sequenced)
  • 102 deletions, 139 insertions, 56 inversions
  • 181 sites not detected by Korbel
  • 117/181 (64.6%) carried duplicated sequences
  • Of these, 42 deletions, 107 insertions, 32
    inversions
  • 116 intersected sites
  • 53/116 (45.6%) carried duplicated sequences
  • 60 deletions, 32 insertions, 24 inversions
  • 75% of insertions are missed, and there is a bias
    against SD events
  • 800 additional sites found (complementary
    approaches)
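
The counts in this comparison can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the slide; the variable names are illustrative.

```python
# Site counts by event type, taken from the slide.
missed_by_korbel = {"deletions": 42, "insertions": 107, "inversions": 32}
intersected = {"deletions": 60, "insertions": 32, "inversions": 24}

# Per-type totals across both groups of sites.
totals = {k: missed_by_korbel[k] + intersected[k] for k in missed_by_korbel}
n_missed = sum(missed_by_korbel.values())   # sites not detected by Korbel
n_shared = sum(intersected.values())        # sites found by both studies
n_sites = n_missed + n_shared

# Fractions quoted on the slide.
frac_ins_missed = missed_by_korbel["insertions"] / totals["insertions"]
frac_dup_missed = 117 / n_missed
frac_dup_shared = 53 / n_shared

print(totals, n_sites)
print(f"{frac_ins_missed:.1%} {frac_dup_missed:.1%} {frac_dup_shared:.1%}")
```

The per-type totals recover the 102 deletions, 139 insertions and 56 inversions (297 sites). The missed-insertion fraction comes out at 107/139, about 77%, close to the rounded figure on the slide, and 53/116 rounds to 45.7% where the slide gives 45.6.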

11
Depth-of-Coverage
  • Whole genome shotgun sequence detection of
    duplicated sequences (Bailey et al., 2002)
  • Establish benchmarks for depth of coverage based
    on X, autosome and duplications of known copy
    number (33 BACs); compute depth of coverage in 5
    kb windows; call regions where 6/7 windows exceed
    3 s.d. of depth of coverage
  • Map reads using the mrFAST algorithm to
    non-repeatmasked regions of the genome
  • 75 million 454 WGS reads (JDW) and
    200-400 million Solexa WGS reads per individual
    (CEPH trio)
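
The window-based calling rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes "exceed 3 s.d." means a window depth greater than the control mean plus three standard deviations, and the function name, parameters and toy data are all hypothetical.

```python
from statistics import mean, stdev


def call_high_depth_regions(depths, control_depths, run=7, min_hits=6, n_sd=3.0):
    """Return merged (first_window, last_window) index ranges in which at least
    min_hits of run consecutive windows exceed mean + n_sd * s.d. of controls."""
    threshold = mean(control_depths) + n_sd * stdev(control_depths)
    high = [d > threshold for d in depths]
    regions = []
    for start in range(len(depths) - run + 1):
        if sum(high[start:start + run]) >= min_hits:
            end = start + run - 1
            if regions and start <= regions[-1][1] + 1:
                regions[-1] = (regions[-1][0], end)   # extend the previous call
            else:
                regions.append((start, end))
    return regions


# Toy data: read counts per 5 kb window (hypothetical values).
control = [100, 95, 105, 98, 102, 101, 99, 97, 103, 100]
depths = control + [210, 220, 90, 215, 205, 230, 225] + control
print(call_high_depth_regions(depths, control))  # [(10, 16)]
```

Requiring 6 of 7 windows rather than all 7 tolerates one low window inside a duplicated region, such as the window with depth 90 in the toy data.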

Aksay, G and Alkan, C
12
454 WGS (JDW): Correlation with Copy Number
R² = 0.94
R² = 0.96
Solexa WGS (NA12878): Correlation with Copy Number
R² = 0.92
R² = 0.93
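
A correlation of this kind could be computed as sketched below. The depth-to-copy-number scaling (unique diploid sequence set to copy number 2) is an assumption, and the data are hypothetical values, not taken from the slides.

```python
def estimate_copy_number(depth, diploid_depth):
    """Scale read depth so that unique diploid sequence maps to copy number 2."""
    return 2.0 * depth / diploid_depth


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical benchmark regions: known copy number and observed read depth.
known_copy = [2, 2, 4, 6, 10]
depths = [1000, 1050, 1900, 3100, 4900]

estimates = [estimate_copy_number(d, diploid_depth=1000) for d in depths]
print(estimates)
print(round(r_squared(known_copy, depths), 3))
```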
13
Personalized Duplication or Copy-Number Variation
Maps
Venter (Sanger)
CNP1
Watson (454)
NA12878 (Solexa)
CNP2
NA12891 (Solexa)
NA12892 (Solexa)
  • Two known 70 kbp CNPs: the CNP1 duplication is
    absent in Venter but predicted in Watson and
    NA12878; CNP2 is present in the mother but in
    neither the father nor the child

14
Homozygous Deletion
Watson (454)
Venter (Sanger)
NA12878 (Solexa)
NA12878 NIM validated
NA12892 Agilent validated
15
Summary
  • Sequencing: 1700 sites of common structural
    variation discovered and being sequenced; 500
    structural variants per individual >5 kbp; human
    genome incomplete (15.6% minor allele, and 26.3% of
    sequence that is CNV is not in the reference genome)
  • Clone resource provides a means to sequence
    any complex region of interest
  • Genotyping: current commercial platforms cannot
    adequately detect >50% of common structural
    variants directly
  • Next-generation sequencing will increase the yield
    to several thousand sites per individual but will be
    biased to unique regions of the genome
  • +ve: copy number of duplications may be estimated
    by a depth-of-coverage approach
  • -ve: ESP bias against insertions and events
    mapping within duplicated regions; these require
    longer reads or clone reagents

16
Acknowledgements
UWGSC: Maynard Olson, Rajinder Kaul
Eichler Lab: Jeff Kidd, Greg Cooper, Andy Sharp, Heather Mefford, Andy Itsara, Can Alkan, Gozde Aksay, Fereydoun Homozdiari, Carl Baker, Eray Tuzun, Priscillia Siswara, Francesca Antonacci, Ze Cheng, Matthew Johnson, Zhaoshi Jiang, Xinwei She, Neil Shaffer, Maika Malig
UCSF: Dan Pinkel, Donna Albertson
WashU: Rick Wilson, Tina Graves
Oxford: Jonathan Flint, Samantha Knight
Agencourt: Doug Smith
U. of Pavia: Orsetta Zuffardi, Stefania Gimelli
UW: Joshua Smith, Debbie Nickerson, Troy Zerr
U. Nijmegen: Bert de Vries, Joris Veltman
Stanford: Rick Myers, Devin Absher, Jun Li
Epicure Consortium: Thomas Sander, Ingo Helbig
1000 Genomes Consortium
NIH: Andy Singleton
17
Properties of Normal Structural Variation
  • Common: 50.3% (866/1720) of events seen in 2 or more
    individuals (n = 9 individuals total)
  • Small: median is 7.8 kbp and average is 13.1 kbp,
    with an average of 500-600 events per individual
    (>5 kbp)
  • Gene family bias: 107 sequenced events directly
    affect gene structure; 87 of these belong to
    gene families
  • Recurrence: estimate that 18% of the same events
    occur on different SNP haplotypes
  • Human genome reference incomplete:
  • At 15.6% of sites, the reference genome is the
    minor allele
  • 26.3% of sites of structural variation correspond
    to sequence that is not represented once within
    the human genome

18
HERC2 Duplication
Watson (454)
Venter (Sanger)
NA12878 (Solexa)
NA12891 (Solexa)
NA12892 (Solexa)
No large differences!
Alkan, C.
19
Deletion Detection
  • For unique regions (no CNV detected)
  • avg 1672.62 reads/5 kbp
  • median 1640
  • stdev 423.17
  • For fosmid ESP deletion regions (validated by one
    orthogonal method)
  • avg 1273.96 reads/5 kbp
  • median 1143
  • stdev 663.96
  • Of the 164 deletions in NA12878, 73 (44.5%) show no
    evidence of depth-of-coverage depression.
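
A quick calculation suggests why many deletions are hard to see by depth alone. The sketch below uses only the summary statistics above and expresses the average depth in deletion regions relative to the spread of unique regions. The z-score framing is an illustrative choice, not something stated on the slide.

```python
# Reads per 5 kbp window, as reported on the slide.
unique = {"mean": 1672.62, "median": 1640, "sd": 423.17}
deleted = {"mean": 1273.96, "median": 1143, "sd": 663.96}

# Average depth in deletion regions relative to unique sequence.
ratio = deleted["mean"] / unique["mean"]
# The same difference in units of the unique-region standard deviation.
z = (deleted["mean"] - unique["mean"]) / unique["sd"]
# Fraction of validated deletions with no depth depression.
no_depression = 73 / 164

print(f"depth ratio {ratio:.2f}, z = {z:.2f}, no depression {no_depression:.1%}")
```

On average the deletion regions sit at about 0.76 of the unique-region depth, which is less than one standard deviation below the unique-region mean. That is consistent with a large share of deletions showing no clear depression.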

20
Hemizygous Deletion
21
ESP Analysis: NA12878 ESP Placement Stats
Max span: 1 million bp
Library 1: 71,848,232 pairs mapped; expected
insert size 100 bp (5X unmasked physical
coverage)
Library 2: 28,739,625 pairs mapped; expected
insert size 150 bp (3X unmasked physical
coverage)
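
As a consistency check, the two libraries can be used to back-calculate the size of the unmasked genome that the coverage figures imply. This is a rough sketch: the 5X and 3X values are rounded on the slide, so the result is only approximate, and the function name is illustrative.

```python
def implied_genome_bp(n_pairs, insert_bp, fold_coverage):
    """Genome size implied by coverage = n_pairs * insert_bp / genome size."""
    return n_pairs * insert_bp / fold_coverage


lib1 = implied_genome_bp(71_848_232, 100, 5)   # Library 1
lib2 = implied_genome_bp(28_739_625, 150, 3)   # Library 2
print(f"{lib1 / 1e9:.2f} Gbp, {lib2 / 1e9:.2f} Gbp")
```

Both libraries imply roughly the same unmasked target size of about 1.44 Gbp, so the two coverage figures are consistent with each other.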
22
ESP Analysis: NA12878 ESP Placement Stats
  • Map ESPs against the repeatmasked hg18 reference
    genome using mrFAST (Tuzun et al., 2005; Kidd et
    al., 2008)
  • Sites supported by >2 independent clones are
    considered; any clone can have multiple
    discordant mappings in the first pass
  • An algorithm based on set cover is implemented to
    find a subset of repetitive (and unique) mappings
    where the total number of sites is minimized (at
    the end, each clone is assigned to a single
    location)
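
The set-cover idea above can be illustrated with the standard greedy approximation. This is a sketch under stated assumptions, not the authors' algorithm. The function name, the input layout (clone id mapped to candidate site ids) and the toy data are hypothetical, and `min_support=3` reflects one reading of ">2 independent clones".

```python
def assign_clones(mappings, min_support=3):
    """Greedy set cover over candidate sites.

    mappings: dict of clone id -> set of candidate site ids.
    Repeatedly picks the site that explains the most unassigned clones and
    assigns those clones to it, so each clone ends up at a single site.
    Returns dict of site -> set of clones for sites with >= min_support clones.
    """
    unassigned = {clone for clone, sites in mappings.items() if sites}
    chosen = {}
    while unassigned:
        support = {}
        for clone in unassigned:
            for site in mappings[clone]:
                support.setdefault(site, set()).add(clone)
        # Iterate sites in sorted order so ties resolve deterministically.
        best = max(sorted(support), key=lambda s: len(support[s]))
        chosen[best] = support[best]
        unassigned -= support[best]
    return {site: clones for site, clones in chosen.items()
            if len(clones) >= min_support}


# Toy example: clones c1 and c4 have ambiguous (repetitive) placements.
mappings = {
    "c1": {"A", "B"}, "c2": {"A"}, "c3": {"A"},
    "c4": {"B", "C"}, "c5": {"C"}, "c6": {"C"},
}
result = assign_clones(mappings)
print(result)
```

In the toy example, candidate site B is never needed: sites A and C together explain all six clones, so the ambiguous clones c1 and c4 are resolved to A and C respectively.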

23
Comparison with Kidd 2008
  • Library 1 only
  • Insert size (100 bp) is too small to compare
    against longer insertions in the Kidd structural
    variation set
  • Smaller insertions and deletions may intersect
    with the 1-100 bp indel set