Peter Adams - PowerPoint PPT Presentation

1
Bioinformatics in action: using pure
mathematics, computer science and molecular
biology to sequence problematic regions of genomes
Peter Adams, Department of Mathematics, The
University of Queensland, pa_at_maths.uq.edu.au
2
Marketing versus science.
  • The International Human Genome Sequencing
    Consortium ... today announced the successful
    completion of the Human Genome Project more than
    two years ahead of schedule

... with the only remaining gaps corresponding to
regions whose sequence cannot be reliably
resolved with current technology.
3
DNA Sequencing
  • Sequence analysis is the process of determining,
    for a given organism, the correct order for the
    four bases C, G, A, T.
  • This is a major exercise
  • The human genome contains 3 billion base pairs
  • Plant genomes can be up to 100 times larger
  • Major focus of international scientific and
    commercial effort
  • Pharmaceutical companies have great interest!
  • Enormous progress has been made, but total
    international sequencing effort is still
    increasing

4
Impediments to current sequencing
  • Most techniques based on Sanger sequencing and
    shotgun sequencing
  • Many patterns of bases create difficulties for
    sequencing
  • For example:
  • Direct repeats cause problems with chemical
    processes
  • Inverted repeats cause hairpin and other complex
    structures
  • Repeated motifs create ambiguities in
    reconstruction.
  • Problematic regions are left till last because
    they are difficult
  • Alternative labour-intensive methods have to be
    used
  • There are still many gaps to be closed

5
SBH: An alternate technology
  • Sequencing by hybridization (SBH) was proposed as
    an alternate technology
  • Ideally, SBH involves:
  • Finding the SBH spectrum of the target fragment
    (that is, all substrings of given length p which
    occur in the target) and
  • Reconstructing the target from its SBH spectrum.
  • SBH probes:
  • One probe for each possible sequence of p bases.
  • Probe sites light up, revealing all subsequences
    in the target
  • p may be 10, 11 or more
  • Enabled via pixels on microarrays called SBH
    chips
  • Need 4^p distinct regions on the chip
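
As a minimal sketch of the idealised spectrum described above (no false positives or negatives, and no record of how often a substring occurs), the helper below collects every length-p substring of a target and reports how many probe sites a chip would need. The names `sbh_spectrum` and `chip_size` are illustrative and do not come from the presentation.

```python
def sbh_spectrum(target: str, p: int) -> set:
    """Return the set of all length-p substrings of target (an idealised SBH spectrum)."""
    return {target[i:i + p] for i in range(len(target) - p + 1)}


def chip_size(p: int) -> int:
    """Number of distinct probes of length p over the four bases A, C, G, T."""
    return 4 ** p


# A short fragment has five overlapping substrings of length 4.
print(sorted(sbh_spectrum("TCACCGTC", 4)))
# Probe count grows exponentially with p.
print(chip_size(10), chip_size(11))
```

Because the spectrum is a set, a substring that occurs twice in the target is recorded only once, which is one reason reconstruction can fail.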

6
An example with p = 4.
  • Reveals all subsequences of length 4 in the
    target fragment
  • May (probably won't) reveal repeated
    occurrences
  • May (probably won't) know the total length of
    the target
  • May (probably will) have false positives and
    false negatives
  • Certainly won't reveal the order of the
    subsequences
  • The target fragment is then reconstructed from
    its SBH spectrum, by aligning overlapping
    subsequences in the correct order.

7
Sequence reconstruction from an SBH spectrum with
p = 4.
TCACCGTCGCCACTGTCCT
TCAC
 CACC
  ACCG
   CCGT
    CGTC
     GTCG
      TCGC
       CGCC
        GCCA
         CCAC
          CACT
           ACTG
            CTGT
             TGTC
              GTCC
               TCCT
Try reconstructing this (without knowing the
order)
8
TCAC CACC ACCG CACC CCGT TCAC CCGT ACCG
9
Graphical representation of SBH
  • Can represent the SBH spectrum as a combinatorial
    graph
  • Vertices are subsequences of length (p-1) from
    the spectrum
  • Draw a directed edge from vertex u to vertex v
    whenever there is a subsequence in the SBH
    spectrum containing the corresponding vertex
    labels as its prefix and suffix
  • Can represent the SBH reconstruction problem
    graphically
  • Find a path passing through every edge in SBH
    graph exactly once
  • Reconstruct the sequence of the target by
    reading vertex labels in turn
  • A well-known problem in pure mathematics:
    Eulerian trails!
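
The graph formulation above can be sketched in a few lines. This is a brute-force illustration, not the presenter's implementation: it assumes an error-free spectrum with p >= 2, treats each probe as an edge from its (p-1)-prefix to its (p-1)-suffix, and enumerates every trail that uses each edge exactly once by backtracking. The function names are hypothetical.

```python
def sbh_spectrum(target, p):
    """Idealised SBH spectrum: the set of all length-p substrings of target."""
    return {target[i:i + p] for i in range(len(target) - p + 1)}


def reconstructions(spectrum):
    """Return every sequence that uses each probe in the spectrum exactly once.

    Each result corresponds to one Eulerian trail in the SBH graph.
    Backtracking search: fine for small examples, exponential in the worst case.
    """
    spectrum = frozenset(spectrum)
    if not spectrum:
        return set()
    p = len(next(iter(spectrum)))

    # Index the edges (probes) by the vertex they leave, i.e. their (p-1)-prefix.
    edges_from = {}
    for probe in spectrum:
        edges_from.setdefault(probe[:-1], []).append(probe)

    found = set()

    def extend(sequence, unused):
        if not unused:
            found.add(sequence)
            return
        last_vertex = sequence[-(p - 1):]
        for probe in edges_from.get(last_vertex, ()):
            if probe in unused:
                extend(sequence + probe[-1], unused - {probe})

    # Any probe may be the first edge of the trail.
    for first in spectrum:
        extend(first, spectrum - {first})
    return found


target = "TCACCGTCGCCACTGTCCT"
for candidate in sorted(reconstructions(sbh_spectrum(target, 4))):
    print(candidate, "(target)" if candidate == target else "(alternate)")
```

For the 19-base example used earlier, the search finds two sequences with the same spectrum: the target itself and one alternate in which the segments between the repeated 3-mers CAC and GTC are swapped.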

10
Previous example revisited
11
An alternate reconstruction
12
Evaluating the effectiveness of SBH
  • unambiguous reconstruction of the target is not
    always possible
  • two or more distinct DNA fragments can have
    identical subsequence constituents (SBH spectra)
  • repeated subsequences of length (p-1) or more
    may cause ambiguities
  • Important questions are
  • How effective is SBH?
  • What is the likelihood that an unknown target
    fragment of length L will have unambiguous
    reconstruction if probes of length p are used?
  • Can investigate this by simulating SBH
    reconstruction.

13
Simulating SBH reconstruction
  • Obtain a large database of previously sequenced
    genomic DNA
  • For various values of p and L
  • Select a DNA fragment of length L at random from
    database
  • determine all subsequences of length p which are
    present in the selected fragment
  • calculate the number of reconstructions of the
    fragment
  • repeat this process sufficiently many times to
    allow statistical predictions of the proportion
    of fragments of length L that have unique
    reconstruction from their subsequences of length
    p
  • Ideally suited to grid computing!
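
The simulation loop above might look roughly like the following sketch. It makes two simplifying assumptions that differ from the slide: fragments are drawn as uniformly random strings rather than from a database of sequenced genomic DNA (so real repeat structure is understated), and the spectrum is error-free. A fragment counts as unambiguous only if the single reconstruction found equals the fragment. All names and the default parameters are illustrative.

```python
import random
from itertools import islice


def trails(spectrum):
    """Lazily yield every sequence that uses each probe exactly once (p >= 2)."""
    spectrum = frozenset(spectrum)
    p = len(next(iter(spectrum)))
    edges_from = {}
    for probe in spectrum:
        edges_from.setdefault(probe[:-1], []).append(probe)

    def extend(sequence, unused):
        if not unused:
            yield sequence
            return
        for probe in edges_from.get(sequence[-(p - 1):], ()):
            if probe in unused:
                yield from extend(sequence + probe[-1], unused - {probe})

    for first in spectrum:
        yield from extend(first, spectrum - {first})


def is_unambiguous(fragment, p):
    """True if the only reconstruction from the length-p spectrum is the fragment."""
    spectrum = {fragment[i:i + p] for i in range(len(fragment) - p + 1)}
    # Stop after two reconstructions: that is enough to prove ambiguity.
    return list(islice(trails(spectrum), 2)) == [fragment]


def unique_fraction(length, p, trials=200, seed=0):
    """Estimate the proportion of random fragments with unique reconstruction."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        fragment = "".join(rng.choice("ACGT") for _ in range(length))
        hits += is_unambiguous(fragment, p)
    return hits / trials


for length in (20, 40, 80):
    print(length, unique_fraction(length, p=5))
```

Each (p, L) pair is independent of every other, which is why the workload splits naturally across many machines.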

14
Simulating SBH for various probe lengths
15
Failure of SBH
  • Reconstruction ambiguities are a serious and
    fundamental problem with SBH.
  • Only very short targets have high probability of
    unambiguous reconstruction
  • SBH has never become a competitive sequencing
    technology.

16
Can Bioinformatics help?
Coming from the viewpoint of mathematics and
information technology
  • We observed that the repeat structure of DNA
    causes problems for SBH.
  • Can we reduce the impact of repeats in the
    target fragment, by making the target more
    random?
  • Does this idea make any sense at all?
  • Using techniques from molecular biology,
    mathematics and information technology, the
    answer is yes!

17
Sequence Analysis via Mutagenesis (SAM)
  • Deliberately introduce random pointwise
    mutations into some number of copies of a target
    DNA (Biochemistry)
  • Hence (in some copies) disrupt the sequence
    structure which made standard technologies fail
  • Sequence (some) mutants using standard
    technologies (Molecular Biology)
  • Infer the original target from the mutants
    (Mathematics and
    Information Technology)
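
A toy version of these steps, under strong assumptions that are not taken from the presentation: mutations are independent single-base substitutions (no insertions or deletions, so the copies stay aligned), every mutant is sequenced without error, and the target is inferred by a simple column-wise majority vote. The functions `mutate` and `consensus` are hypothetical names.

```python
import random
from collections import Counter

BASES = "ACGT"


def mutate(target, rate, rng):
    """Copy target, replacing each base by a different base with probability rate."""
    copy = []
    for base in target:
        if rng.random() < rate:
            base = rng.choice([b for b in BASES if b != base])
        copy.append(base)
    return "".join(copy)


def consensus(copies):
    """Infer a sequence by majority vote in each column of equal-length copies."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )


rng = random.Random(0)
target = "TCACCGTCGCCACTGTCCT"
mutants = [mutate(target, 0.1, rng) for _ in range(9)]
for m in mutants:
    print(m)
print(consensus(mutants), "matches target:", consensus(mutants) == target)
```

The vote works because each position is left unmutated in most copies when the mutation rate is moderate, so the original base dominates its column.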

18
An overview of SAM
19
Key points
  • Mutation
  • Achieved via use of certain mutagenic chemicals
  • Achieve mutation rates of 1-30%
  • destroys problematic features (for example,
    repeat structures)
  • Information is not lost, but instead is
    distributed across multiple mutated variants
  • Algorithms
  • May require 5-10 (or more) mutant copies
  • reconstruct original sequence, even with high
    mutation rates
  • resolve assembly ambiguities, using similar total
    sequencing coverage to standard methods

20
Example: Stem-and-loop structures
21
Example: Improving reads from a sequencer
A genomic poly-A sequence ambiguity: a poly-A
region in a human genome clone is described as
having an undefined number of A bases as a result
of poor sequencing reads (GenBank AC006367).
We tried using the latest cycle sequencing
chemistry (first figure). The result was poor:
clones contained varying numbers of As and
downstream sequences were unreliable. The
experiment was repeated using SAM. The new
traces (second and third figures) clearly
demonstrate that the mutated fragments are more
readily sequenced.
23
Inferring the target
  • Three approaches to inferring target
  • Alignment approach
  • Minimum distance approach
  • Bayesian approach

24
Previous example revisited
A section of the multiple alignment of six
mutant copies (M1 to M6) is shown. Inferred
sequence is shown as Inf. Published sequence is
shown as Pub. Output from a standard sequencing
experiment is shown as Seq.
M1  ACTCTGTCTCAAAAACAAAAAAAAAAAAA-----------------GTGGACTTGGATGG
M2  ACTCTGTCTCAAAAAAAAAAAAAACAAAAA----------------GTGGACTTGGATTG
M3  ACGCTGTCTCAAAAACAAAAAAACAAAAAA----------------GTGGACGTGGATTG
M4  ACTCTGTCTCAAAAAAAAAAAAACAAAAAA----------------GTGGACTTGGATTG
M5  ACTCTGTCTCAACAAAAAAAACAAAACAAAA---------------GTGGACTTGGATTG
M6  ACTCTGTCTCAAAAAACAAAAAACACAAAC----------------GTGGCCTTGGATTG
Inf ACTCTGTCTCAAAAAAAAAAAAAAAAAAAA----------------GTGGACTTGGATTG
Pub ACTCTGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTGGACTTGGATTG
Seq ACTCTGTCTCAAAAAAAAAAAAAAAAAAANNANA------------GTGGACTTGGATTG
25
SAM and SBH.
Primary benefits of applying SAM with SBH
  • allows information from multiple variants to be
    combined in order to infer target
  • If an ambiguity is resolved in any mutant
    variant, this can be used to resolve that
    ambiguity in other variants and the target.
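
To illustrate why a mutant can resolve an ambiguity, the sketch below lists the repeated substrings of length p-1 (the source of ambiguity for probes of length p) in the earlier 19-base example and in a hand-made mutant. The mutant is hypothetical: it differs from the original by a single C to T substitution chosen for illustration, and `repeated_kmers` is an illustrative name.

```python
from collections import Counter


def repeated_kmers(sequence, k):
    """Return the set of length-k substrings that occur more than once."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}


original = "TCACCGTCGCCACTGTCCT"
mutant = "TCACCGTCGCCATTGTCCT"  # hypothetical mutant: one C -> T substitution

# With probes of length p = 4, repeats of length p - 1 = 3 matter.
print("original:", sorted(repeated_kmers(original, 3)))
print("mutant:  ", sorted(repeated_kmers(mutant, 3)))
```

The original has two repeated 3-mers (CAC and GTC), while the mutant keeps only GTC, so the branch point at CAC disappears in the mutant's SBH graph.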

26
An example.
Original fragment
Repeated subsequences
Mutant
Repeated subsequences
28
Performance
  • Simulations of SAM with SBH and 10 mutant
    variants, with reasonable mutation intensities.
  • Compare with standard SBH (results in parentheses)

29
Thank you!