Freek T. Bakker - PowerPoint PPT Presentation

1
Optimising DNA barcode regions
  • Freek T. Bakker
  • Nationaal Herbarium Nederland, Wageningen
    University branch,
  • Biosystematics Group, Wageningen UR
  • The Netherlands

2
Structure of this talk
  1. DNA barcoding, CBOL, GenBank
  2. Non-COI protocol
  3. Models in DNA barcode matching

3
DNA barcoding
Using molecular data as species diagnostics
isn't new, but global standardization and scale
of implementation are
4
http://www.barcoding.si.edu/
5
CBOL Structure
Member Organizations
Executive Committee
Secretariat Office
Working Groups
Scientific Advisory Board
6
Uses of DNA Barcodes
  • Establish reference library of barcodes from
    identified voucher specimens
  • If necessary, revise species limits
  • Then:
  • Identify unknowns by searching against reference
    sequences
  • Look for matches (mismatches) against library on
    a chip
  • Before long: analyze relative abundance in
    multi-species samples
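
The "identify unknowns" step can be sketched as a nearest-match search. This is only an illustration, assuming pre-aligned sequences of equal length and an uncorrected p-distance; the function names `p_distance` and `best_match` and the toy library are hypothetical and are not taken from the talk or from any CBOL tool.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_match(query, reference_library):
    """Return (name, distance) for the reference barcode closest to the query.

    reference_library maps a species name to its reference barcode sequence.
    """
    scored = ((name, p_distance(query, seq))
              for name, seq in reference_library.items())
    return min(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy 10-nt "barcodes"; real reference records are far longer.
    library = {"Species A": "ACGTACGTAC", "Species B": "ACGTTTGTGC"}
    print(best_match("ACGTACGTAT", library))
```

A production search would more likely use BLAST or a similar indexed method, as the next slide notes; the linear scan here is only meant to make the matching idea concrete.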

7
Reference versus micro-Barcodes
  • BARCODE reference records:
  • Adhere to data standards
  • Bidirectional reads, 500 bp long
  • Linked to voucher, species name
  • Query barcode records:
  • Used in BLAST or other searches
  • Often single pass reads
  • Often very short 100 bp for good IDs
  • Can cost less than 2, take less than 6 hours

8
DNA barcode default
The Consortium for the Barcode of Life (CBOL) has
so far accepted the 648 base-pair Folmer region
of COI (mitochondrial encoded cytochrome oxidase
1) as the default DNA barcode region for
vertebrates and insects and promotes its use in
as many other clades as possible. The
International Nucleotide Sequence Database
Collaboration (INSDC, consisting of GenBank, the
European Molecular Biology Laboratory and the DNA
Data Bank of Japan) has adopted the data
standards proposed by CBOL for BARCODE data
records, and has empowered CBOL to decide which
gene regions can be given BARCODE status.
9
CBOL and GenBank
GenBank
New CO1 barcode
Data standards
CBoL
10
How many DNA barcodes do we need, or, what's
ahead?
  • 1.7 x 10^6 described species
  • 10 barcodes per species
  • 20 x 10^6 barcodes of 650 bp each
  • 10 x 10^6 more eukaryote species to go
  • 100 x 10^6 more barcodes of 650 bp each
  • In total this would be 65,000,000,000 bp
  • This is twice the total amount of bp currently in
    GenBank!
  • To be completed within the decade
  • (Hajibabaei et al., 2005)
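
The arithmetic on this slide can be checked with a few lines of Python. The sketch below only restates the slide's own numbers; the helper name `total_bp` is hypothetical. Under these numbers, the 65,000,000,000 bp total corresponds to the 100 x 10^6 barcodes for species still to be described, and the slide's "20 x 10^6" appears to be 1.7 x 10^6 species times 10 barcodes, rounded up.

```python
BARCODE_LENGTH_BP = 650      # approximate barcode length used on the slide
BARCODES_PER_SPECIES = 10


def total_bp(n_species, per_species=BARCODES_PER_SPECIES,
             length_bp=BARCODE_LENGTH_BP):
    """Total base pairs needed to barcode n_species."""
    return n_species * per_species * length_bp


described_species = 1_700_000      # 1.7 x 10^6 described species
undescribed_species = 10_000_000   # 10 x 10^6 more eukaryote species

described_barcodes = described_species * BARCODES_PER_SPECIES
undescribed_barcodes = undescribed_species * BARCODES_PER_SPECIES

if __name__ == "__main__":
    print(described_barcodes, total_bp(described_species))
    print(undescribed_barcodes, total_bp(undescribed_species))
```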

11
(No Transcript)
12
(No Transcript)
13
Optimal DNA barcodes
  • Barcoding gap: high inter-specific, low
    intra-specific sequence divergence
  • Universal amplification/sequencing with standard
    primers
  • Technically simple to sequence
  • Short enough to sequence in one reaction
  • Easily alignable (few insertions/deletions)
  • Readily recoverable from museum or herbarium
    samples and other degraded samples

14
CO1 divergence in eukaryotes
15
CBOL and GenBank
GenBank
Non-CO1 barcode
CBoL
16
non-COI barcode regions
  • COI alone will not do
  • mtDNA evolution too variable across major clades
  • NUMTs
  • Other faults (e.g. heteroplasmy, introgression,
    COI not present; e.g. Rubinoff et al. 2007)
  • rDNA: ITS, D3/D4; cpDNA: rpoC1, rpoB, matK
  • Multiple barcodes

17
CBOL's non-CO1 protocol
  • Protocol, to be used as guideline, available now
  • Reject CO1 as suitable region for clade of
    interest
  • Propose alternative region based on required
    evidence as documented
  • Barcode gap?
  • NJ tree
  • Multiple regions?

18
The DNA barcode gap
From Meyer et al., PLoS Biology 2004
19
DNA barcode gap
From Van Velzen et al., NEV 2007
20
DNA barcode gap
  • Discontinuity between minimum inter- and maximum
    intra-species divergence
  • However, in paraphyletically clustered
    barcodes intra > inter divergence!
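
One way to make the barcode gap concrete is to compute the largest intra-specific and the smallest inter-specific pairwise distance for a labelled set of sequences. The sketch below assumes aligned sequences and an uncorrected p-distance; `barcode_gap` and the toy records are hypothetical and are not the procedure used in the cited studies.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records):
    """Return (max_intra, min_inter, gap) for a list of (species, sequence) pairs.

    gap = min_inter - max_intra; a value <= 0 means there is no barcode gap.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need at least one intra- and one inter-specific pair")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter - max_intra


if __name__ == "__main__":
    toy = [("A", "ACGTACGTAC"), ("A", "ACGTACGTAT"),
           ("B", "ACGTTTGTGC"), ("B", "ACGTTTGTGT")]
    print(barcode_gap(toy))
```

In the paraphyletic case mentioned on the slide, the returned gap would be negative, because some intra-specific distance exceeds the smallest inter-specific one.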

21
Agave (Agavaceae) rpoB, Cowan et al.
22
Rejection of CO1
  • Reject CO1 as suitable region for clade of
    interest
  • Propose alternative region based on required
    evidence, i.e.
  • Pattern of intra- and interspecific variation
  • Resolving power
  • Universality
  • Document the number of primer pairs needed to
    successfully PCR amplify and identify species
    throughout the clade of interest

23
Implementation
  • Protocols will be adopted for a period of 6
    months during which CBOL is open to suggestions
    for their improvement from the community.
  • CBOL will normally expect publication of evidence
    for effectiveness of proposed non-COI barcode
    region(s) in a peer-reviewed publication prior to
    submission of a proposal
  • Prior peer review and publication will support
    the proposal's claims and will inform the
    community of the proposed barcode region(s)
  • Upon approval by CBOL's Executive Committee,
    INSDC will be informed immediately and BARCODE
    status can be given

24
Challenges
  • Is effectiveness of DNA barcode jeopardized by
    using parameter-poor models?
  • Is NJ too crude to provide correct matches
    between closely related barcodes?
  • How will non-coding DNA sequences perform when
    matching unknowns?
  • Do we need Bayesian matching for critical
    species? (PPs on match, Priors to express
    uncertainty on population parameters)
  • Is matching of multiple barcodes a special case?

25
DNA barcode matching
  • Character-based for closely related barcodes?
  • Phylogenetic clustering
  • Distance-based matching: what models?
  • Low divergence → few parameters (JC, K2P)
  • Codon models?
  • Composite barcodes → composite models?
  • Non-coding regions: length variation
  • Pragmatism: large reference libraries, speed
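
For reference, the two parameter-poor distance corrections named above can be written down directly. The sketch below uses the standard Jukes-Cantor and Kimura two-parameter formulas on a pair of aligned sequences; the function names are hypothetical, and sites with gaps or ambiguity codes are simply skipped, which is an assumption rather than something the talk specifies.

```python
import math

PURINES = {"A", "G"}


def count_changes(a, b):
    """Return (compared sites, transitions, transversions) for aligned sequences."""
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def jc_distance(a, b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    n, ts, tv = count_changes(a, b)
    p = (ts + tv) / n
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(a, b):
    """Kimura 2-parameter distance: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    n, ts, tv = count_changes(a, b)
    P, Q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

Both formulas become undefined (the logarithm's argument reaches zero or below) for highly divergent pairs, which is one reason they suit the low-divergence setting the slide describes.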

26
DNA barcode models
  • Simulate DNA barcodes using parameter-rich
    models, derived from insect COI and from cpDNA
    atpB data (GTR, c113)
  • 100 replicates of simulated data sets: 60 barcode
    sequences of 654 nt
  • Distance models: simple → complex
  • NJ clustering of resulting distances
  • Semistrict consensus of 100 NJ trees
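
The NJ clustering step can be illustrated with a compact version of the textbook neighbour-joining algorithm (Saitou and Nei). This is a sketch for orientation only: the talk used PAUP for this step, and the function name, the `nodeN` labels, and the returned list of joins are all hypothetical choices made here.

```python
def neighbor_joining(names, matrix):
    """Textbook neighbour joining on a symmetric distance matrix.

    Returns a list of joins (child_a, child_b, new_node, len_a, len_b).
    The final entry connects the last two nodes and has new_node = None.
    """
    # Nested dict of pairwise distances, without the diagonal.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names) if i != j}
         for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]  # Q criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        u = f"node{next_id}"
        next_id += 1
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
        joins.append((a, b, u, len_a, len_b))
    a, b = nodes
    joins.append((a, b, None, d[a][b], 0.0))
    return joins
```

In the simulation design described above, a routine like this would be run once per replicate on each distance matrix, with the resulting trees then summarized by a consensus method; the consensus step is not sketched here.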

27
DNA barcode models
NJ (poor model)
100 NJ trees
Semistrict consensus
NJatpB r/p
NJCOI r/p
28
DNA barcode models
  • Findmodel (Los Alamos National Lab.):
    best-fitting model for Lepidopteran COI data set
  • MrBayes/Tracer: model parameter values
  • Simulation tree: angiosperm species-level
    phylogenetic tree topology (not ultrametric)
  • Seq-Gen: simulate 100 reps., 654 nt x 60 seqs.
  • PAUP: NJ and consensus analysis
  • TreeView: tree interpretation

29
cpDNA atpB model
Relative subst. rates
Base composition
30
mtDNA COI model
Relative subst. rates
Base composition
31
atpB vs. COI models
Relative subst. rates
Base composition
atpB
COI
32
atpB
Model tree atpB/GTR
33
COI
Model tree atpB/GTR
34
Over-parametrization?
  • Parameter-rich models not efficient in
    reconstructing parameter-rich patterns?
  • Parameter-poor models do better
  • Artefact of pairwise comparison?
  • Various shapes branch lengths
  • Different base-composition across tree
  • Different omega rates across tree
  • Codon models?

35
Conclusions
  • Non-COI barcode regions will be needed and are
    proposed through CBOL protocol
  • CBOL approval → adoption by INSDC
  • NJ/K2P sufficient for performance testing of
    proposed region
  • Character-based DNA barcode matching needed for
    closely related barcodes
  • Multiple barcodes matched simultaneously