BS961 - PowerPoint PPT Presentation

About This Presentation
Title:

BS961

Description:

BS961 – PowerPoint PPT presentation

Slides: 101
Provided by: stan150
Category:
Tags: aacn | bs961

Transcript and Presenter's Notes

Title: BS961


1
BS961
  • SESSION 2

2
Objectives
  • Describe how the genome sequence of specific
    microorganisms can be exploited in clinical
    practice.
  • Describe basic principles behind manual and
    automated DNA sequencing, and pyrosequencing.
  • Explain how nucleotide sequence databases are
    accessed and describe the different types of
    databases.
  • Discuss how genes are identified in nucleotide
    sequences.
  • Reading: Brown, Genomes 3, Chapter 4.

3
DNA-based methods for virus pathogens
  • Microarrays
  • PCR

4
Microarrays
  • Miller and Tang (2009) Clinical Microbiology
    Reviews 22, 611-633.

5
Microarrays
6
Microarrays
7
Microarrays- examples
  • Respiratory pathogens
  • Several systems available
  • e.g. ResPlex II assay (Qiagen): Flu-A, Flu-B,
    PIV-1, PIV-2, PIV-3, PIV-4, RSV-A, RSV-B, hMPV,
    RhV, EnV, and severe acute respiratory syndrome
    CoV
  • Multiplex RT-PCR

8
Microarrays- examples
  • For each pathogen, target-specific capture probes
    are covalently linked to a specific set of
    color-coded beads.
  • Labeled PCR products are captured by the
    bead-bound capture probes in a hybridization
    suspension.

9
Microarrays- examples
  • A microfluidics system delivers the suspension
    hybridization reaction mixture to a dual laser
    detection device.
  • A red laser identifies each bead (or pathogen) by
    its color-coding
  • A green laser detects the hybridization signal
    associated with each bead (indicating the
    presence or absence of a particular pathogen).

10
Microarrays examples
11
Real time PCR
12
Real time PCR example
  • Nix et al (2008) Journal of Clinical
    Microbiology, 46, 2519-2524.
  • Parechoviruses
  • Uses primers targeting regions present in all
    parechoviruses

13
Parechovirus
14
Multiplex PCR
  • Can multiplex using probes of different colour

15
Sequencing strategies
  • Sequencing is usually achieved by the
    dideoxynucleotide (chain termination) method.
  • This requires:
  • Template DNA to be sequenced, together with a
    primer and DNA polymerase.
  • Modified nucleotides (dideoxynucleotides), which
    lack the 3'-OH needed for chain extension in DNA
    synthesis. These are mixed with ordinary
    nucleotides, so at each position some chains are
    terminated and some are not. A range of
    fragments is generated, each ending with a
    specific dideoxynucleotide.
  • A gel system capable of separating DNA on the
    basis of size with a resolution of one
    nucleotide.
  • A detection method, usually dye-labelled
    dideoxynucleotides (each of A, G, C and T
    labelled with a dye of a different colour)
    detectable by laser.

16
Dideoxynucleotide sequencing
  • AAGCTAGCTGGCAAATGGCGTCTCAC (template)
  • TTCGATCG (primer)
  • TTCGATCGA
  • TTCGATCGAC
  • TTCGATCGACC
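
The fragment ladder on this slide can be reproduced with a short sketch. It assumes, as the slide does, that the new strand is written directly beneath the template so that each base pairs with the base above it. The function name `chain_terminated_fragments` is made up for illustration.

```python
# Minimal sketch of the fragment ladder produced by dideoxy sequencing.
# Assumption: template and new strand are written aligned, base under base,
# as on the slide (no reversal of the strand is applied).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def chain_terminated_fragments(template, primer):
    """Return every fragment that ends one base further than the last."""
    new_strand = template.translate(COMPLEMENT)
    if not new_strand.startswith(primer):
        raise ValueError("primer does not anneal to the start of the template")
    # One fragment per position after the primer, shortest first.
    return [new_strand[:end] for end in range(len(primer) + 1, len(new_strand) + 1)]


template = "AAGCTAGCTGGCAAATGGCGTCTCAC"
primer = "TTCGATCG"
fragments = chain_terminated_fragments(template, primer)

# Sorting by size (the gel) and reading the final base of each fragment
# (the dye colour) gives the sequence downstream of the primer.
read = "".join(fragment[-1] for fragment in fragments)
print(fragments[:3])
print(read)
```

Reading the last base of each fragment in order of length is what the gel and the laser detector do between them.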

17
Detection of bands
18
Output
19
Sequence assembly
  • In all sequencing projects the amount of sequence
    which can be obtained from one reaction is much
    less than that needed for the completion of the
    project- some kind of assembly of contiguous
    sequences (contigs) from several overlapping
    sequences is needed.

23
Strategies for small genomes
  • For small genomes, e.g. bacteria, an almost
    completely shot-gun approach is often the most
    efficient, with completion of gaps by more
    directed methods.
  • e.g. Haemophilus influenzae

24
Sequencing of Haemophilus influenzae
25-39
Assembling contigs
  • Sequence 1: enter into database.
    GATTCGTAGGCTTTAAGCTTCCGTCGACGCTGCGTAGC
  • Sequence 2: compare with database. Does it
    overlap with sequence 1? Sequence 2 is tested
    against sequence 1 at successive offsets:
    CGATCGTGCCCCGTACTGACTGCATGCTGACACAGTC
    GATTCGTAGGCTTTAAGCTTCCGTCGACGCTGCGTA
  • No. Enter sequence 2 into database.
  • Sequence 3: compare with database. Does it
    overlap with sequence 1 or sequence 2? No.
    Enter into database.
40-53
Assembling contigs
  • Sequence 4: compare with database. Does it
    overlap with sequences 1, 2 or 3? No. Enter
    into database.
  • Sequence 5: compare with database. Does it
    overlap with sequences 1, 2, 3 or 4? Yes, it
    overlaps sequence 2. Make contig and enter
    into database.
    Sequence 2: ACCGTCGCCCTGCCCGTAGCTG
    Sequence 5:             CCCGTAGCTGCCATTTTCGA
    Contig:     ACCGTCGCCCTGCCCGTAGCTGCCATTTTCGA
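
The compare-and-merge loop walked through on these slides can be sketched as follows. This is a simplified illustration, not the assembler used in any real project: it only looks for exact suffix/prefix matches, ignores sequencing errors and the reverse strand, and the minimum overlap of 5 bases is an arbitrary assumption. All function names are hypothetical.

```python
# Minimal sketch of incremental contig assembly by exact end overlaps.
def overlap_length(left, right, min_overlap=5):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge(left, right, min_overlap=5):
    """Join two sequences on their overlap, or return None if there is none."""
    k = overlap_length(left, right, min_overlap)
    return left + right[k:] if k else None


def add_read(database, read, min_overlap=5):
    """Compare a new read with the database; merge it or enter it as new."""
    for i, entry in enumerate(database):
        contig = merge(entry, read, min_overlap) or merge(read, entry, min_overlap)
        if contig:
            database[i] = contig  # replace the entry with the longer contig
            return database
    database.append(read)  # no overlap found: enter into database
    return database


database = []
for read in (
    "GATTCGTAGGCTTTAAGCTTCCGTCGACGCTGCGTAGC",  # sequence 1 from the slides
    "ACCGTCGCCCTGCCCGTAGCTG",                  # sequence 2 (slide 51)
    "CCCGTAGCTGCCATTTTCGA",                    # sequence 5 (slide 52)
):
    add_read(database, read)
print(database)
```

With these three reads the database ends up holding sequence 1 unchanged plus the contig formed from sequences 2 and 5.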
54
Joining contigs
  • Two large contigs
  • A sequence overlapping both is found
  • Contigs joined
55
Filling gaps
  • As the sequence accumulates, there are
    diminishing returns. New sequences become rarer
    and some areas are sequenced many times.
  • So there are gaps which need to be filled.
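
The diminishing returns can be put into rough numbers. The sketch below is not from the slides: it assumes reads land uniformly at random on the genome, in which case the chance that a given base is missed by every read is approximately e^(-c), where c is the average coverage (the Lander-Waterman style approximation). The read length and genome size are hypothetical round figures.

```python
# Rough sketch: fraction of a genome left unsequenced at a given coverage.
# Assumption: reads are placed uniformly at random (Poisson approximation).
import math


def coverage(n_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return n_reads * read_length / genome_size


def fraction_unsequenced(c):
    """Approximate probability that a base is covered by no read at all."""
    return math.exp(-c)


genome_size = 1_800_000  # hypothetical small bacterial genome, in bp
read_length = 500        # hypothetical read length, in bp
for n_reads in (3_600, 7_200, 18_000, 36_000):
    c = coverage(n_reads, read_length, genome_size)
    missing = fraction_unsequenced(c) * genome_size
    print(f"{n_reads:>6} reads  coverage {c:4.1f}x  about {missing:,.0f} bp missing")
```

Each doubling of effort recovers a smaller share of new sequence, which is why the last gaps are closed by directed methods rather than by more random reads.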

56
Gaps
  • Sequence gaps: where, by random chance, no
    sequence has been obtained.
  • Physical gaps: where the region has not been
    cloned at all, so no sequence can be obtained.

58
Success of this approach
  • Very many microbial genomes have been sequenced
    in this way

59
Sequencing large genomes
  • It has been argued that a clone-contig approach
    is best, particularly because organisms with
    larger genomes often contain a lot of repetitive
    sequences, and it is difficult to join these
    correctly if only short sequences are analysed.

60
Clone contig approach
  • Relies on cloning large fragments of DNA- e.g.
    300kb. These are then mapped onto the chromosome
    using physical maps.

61
Human genome
  • The draft human genome was published by two
    groups at the same time in 2001.
  • The International Human Genome Sequencing
    Consortium, a group of scientists funded by
    non-profit-making bodies, used the clone-contig
    procedure (Nature 409, 860-921).
  • A private company, Celera, used the shotgun
    approach, which was much faster, but did use
    scaffolding data already put into the public
    domain by the first group (Science 291,
    1304-1349).

62
Pyrosequencing
  • Recently a different method, pyrosequencing,
    which is more suited to ultra-high throughput,
    has been developed.
  • http://www.pyrosequencing.com/DynPage.aspx?id=7454

64
Step 1
  • The reaction contains a primer, template and DNA
    polymerase, but also a number of other
    components: the enzymes ATP sulfurylase,
    luciferase and apyrase, and the substrates
    adenosine 5' phosphosulfate (APS) and luciferin.

65
Step 2
  • The first of four dNTPs is added to the reaction.
  • DNA polymerase catalyzes the incorporation of the
    deoxynucleotide triphosphate into the DNA strand,
    if it is complementary to the base in the
    template strand.
  • Each incorporation event is accompanied by
    release of pyrophosphate (PPi) in a quantity
    equimolar to the amount of incorporated
    nucleotide.

66
Step 3
  • ATP sulfurylase quantitatively converts PPi to
    ATP in the presence of adenosine 5'
    phosphosulfate.
  • This ATP drives the luciferase-mediated
    conversion of luciferin to oxyluciferin, which
    generates visible light in amounts
    proportional to the amount of ATP.
  • The light produced in the luciferase-catalyzed
    reaction is detected by a charge-coupled device
    (CCD) camera and seen as a peak in a pyrogram.
  • Each light signal is proportional to the number
    of nucleotides incorporated. This gives a
    different sort of output from dideoxynucleotide
    sequencing.

68
Step 4
  • Apyrase, a nucleotide degrading enzyme,
    continuously degrades unincorporated dNTPs and
    excess ATP.
  • When degradation is complete, another dNTP is
    added.

70
Step 5
  • Addition of dNTPs is performed one at a time. As
    the process continues, the complementary DNA
    strand is built up and the nucleotide sequence is
    determined from the signal peaks in the pyrogram.
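
Steps 1 to 5 can be summarised in a toy simulation of the pyrogram. It is a sketch under idealised assumptions: every dispensed nucleotide is fully incorporated or fully degraded, and the light signal is exactly proportional to the number of bases added. The dispensation order "ACGT" and the function names are illustrative choices, not taken from the slides.

```python
# Toy pyrogram: one peak per dNTP dispensation, with the peak height equal
# to the number of bases incorporated in that step (0 means no light).
def pyrogram(new_strand, dispensation_order="ACGT"):
    """Return (nucleotide, peak height) pairs for the strand being built."""
    if set(new_strand) - set(dispensation_order):
        raise ValueError("strand contains a base that is never dispensed")
    peaks = []
    position = 0
    while position < len(new_strand):
        for base in dispensation_order:
            run = 0
            # Polymerase keeps adding this dNTP while it matches the template.
            while position < len(new_strand) and new_strand[position] == base:
                run += 1
                position += 1
            peaks.append((base, run))
            if position == len(new_strand):
                break
    return peaks


def read_sequence(peaks):
    """Decode the sequence from the peaks, as done from a pyrogram."""
    return "".join(base * height for base, height in peaks)


peaks = pyrogram("ACCGTTA")
print(peaks)
print(read_sequence(peaks))
```

A run of identical bases appears as a single taller peak rather than as several separate signals, which is the main difference from the dideoxy output.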

72
  • The method can be automated considerably. Random
    shearing of genomic DNA, PCR amplification and
    complex sample handling methods mean that around
    400,000 fragments can be sequenced at the same
    time, each sequence being 200-300 nucleotides.
  • These can be automatically assembled into
    contigs. The main problem is repetitive
    sequences, which are hard to assemble because
    the reads are short. This approach is called
    454 sequencing.

73
Pathogen detection
  • e.g. Briese et al (2009) PLOS Pathogens 5,
    e1000455
  • Lujo virus (Arenaviridae).
  • Case of haemorrhagic disease
  • RT-PCR with random amplification, ligation of
    specific linkers, 454 sequencing

74
Pathogen detection
  • Worked with 3 libraries from different tissues
  • 87,500-106,500 reads from each
  • Found 7 sequence fragments matching
    arenavirus
  • Completed gaps using conventional PCR

78
New respiratory viruses
79
Sequence databases
  • There are a number of databases of different
    types.
  • Nucleotide: GenBank and EMBL are the main ones
    for well-characterised sequences. htgs contains
    unfinished High Throughput Genomic Sequences
    (i.e. from genome projects) until they have been
    characterised further.

80
More databases
  • Protein: PIR and Swiss-Prot are the main ones.
  • Global: nr (non-redundant). This is a compilation
    of several databases.
  • ESTs: dbEST

81
ESTs (expressed sequence tags)
  • Short sequences obtained from total mRNA isolated
    from a tissue.
  • Derived by random cDNA cloning and sequencing
    without further purification.
  • Useful to show which genes are expressed in a
    tissue, as only these are represented in the RNA.
  • A collection of ESTs from different tissues gives
    an idea of the total number of genes in an
    organism.

82
Accessing databases
  • http://www.ncbi.nlm.nih.gov/
  • Simple search terms
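
Simple search terms can also be submitted programmatically. The sketch below only builds a query URL for NCBI's E-utilities ESearch service, which is an assumption on my part: the slides describe the web interface, not E-utilities. The example term, the `nuccore` database choice and the function name are illustrative, and no request is actually sent.

```python
# Sketch: turn a simple search term into an NCBI ESearch query URL.
# Assumption: the E-utilities ESearch endpoint is used instead of the
# web page shown in the lecture. Nothing is fetched here.
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, database="nuccore", max_results=5):
    """Return the URL that would search `database` for `term`."""
    params = {
        "db": database,         # e.g. nuccore (nucleotide) or protein
        "term": term,           # the simple search term
        "retmax": max_results,  # how many record IDs to return
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


url = build_search_url("Lujo virus")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching record identifiers.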

84
Identifying genes in nucleotide sequences
  • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?indexed=google&rid=sef.section.168
  • Initially the sequence generated from a genome
    project is largely featureless and needs to be
    interpreted. The most important things to find
    are the locations of the genes.

85
   1 tttgaaaggg gtctcctaga gagcttggcc gtcgggcctt acaccccgac ttgctgagtt
  61 tctctaggag agtccctttc ccagccagag gtggctggtc aaacaatacc aaacgtaact
 121 aaacatctaa gataacatag ccctatgcct ggtctccacc agttgaaggc atcttgcaat
 181 aaaatgggtg gattaagacg cttaaagcat ggagtcaatt atcttttcta actagtgatc
 241 ttcactgggt ggcagatggc gtgccataac tctattagtg ggataccacg ctcgtggatc
 301 ttatgcccac acagccatcc tctagtaagt ttgcaaggtg tctgatgagg cgtgggaact
 361 tattggaaat aattacttgc tgcgaagcat cctactgcca gcggatcaac acctggtaac
 421 aggtgcccct ggggccaaaa gccacggttt aacagaccct ttaggattgg ttaaaacctg
 481 agtaattatg gaagatactt agtacctacc aacttggtaa cagtgcaaac actagttgta
 541 aggcccacga aggatgccca gaaggtaccc gcaggtaaca agagacactg tggatctgat
 601 ctggggccac ctacctctat cctggtgagg tggttaaaaa acgtctagtg ggccaaaccc
 661 aggggggatc cctggtttcc ttattttagt gtaaatgtca ttatggagac aatcaagagc
 721 attgcagata tggcgaccgg tgtaactaaa accattgatg ccacaatcaa ttctgttaat
 781 gagatcatca ctaacacaga taatgcttca ggtggagata tattgactaa agttgctgat
 841 gatgcttcaa atattttagg gcccaactgt tatgcgacaa catctgagcc agaaaacaag
 901 gatgtggtgc aagcaaccac cactgtgaac accactaatc tgacacagca cccatcagca
 961 ccaacgttac catttacacc agacttttcg aatgttgaca cgtttcattc aatggcttat
1021 gatactacaa ctggtagtaa gaaccctaat aagttagtta ggttaacgac acatgcttgg
1081 gctagtaccc tacagagggg tcatcagatt gatcatgtta atctaccagt tgacttctgg
1141 gatgaacaga ggaaaccagc ttatggccat gctaaatatt ttgcagctgt tcggtgtgga
86
1201 tttcattttc aagtacaggt caatgtgaat cagggaactg ctgggagtgc tttggtagtg
1261 tatgaaccaa agccagtagt tgattatgat aaggatttgg aatttggagc atttaccaat
1321 ttaccacatg tgttaatgaa cttggccgag actacccagg ccgacttatg tatcccctat
1381 gttgcagata caaactatgt gaagactgat tcatctgact tagggcaatt gaaagtttat
1441 gtgtggactc cccttagcat tccatcaggc tcatctaacc aagtggacgt gactatattg
1501 ggtagcttat tacaattgga tttccaaaac ccaagggtgt atgggcaaaa tgttgacatt
1561 tacgatacag caccctctaa accaattcca ttgaggaaga ctaaatattt gactatgagc
1621 acaaaataca aatggacaag aaataaagta gacatagctg aaggtccagg ttcaatgaac
1681 atggcaaatg tacttagtac gacagcagca caatcagtag cattggttgg ggagagggct
1741 ttttatgatc ccaggactgc tggtagcaaa tctagatttg atgacttagt aaaaatctca
1801 cagttgtttt cagttatggc agattccacc actccatctg ccaatcatgg aatagaccaa
1861 aagggttatt tcaaatggtc tgccaattct gatccacagg caatagtgca tagaaactta
1921 gttcatttaa atctatttcc aaatttgaag gtctttgaaa acagttattc atacttcaga
1981 ggttctctta taatcaggtt aagtgtttat gctagtacat tcaacagagg ccgtttgaat
2041 gggttctttc caaattccag tacagatgaa acttctgaaa ttgataatgc catctacacc
2101 atatgtgata ttggatctga caatagtttt gagattacta tcccttattc attttccact
2161 tggatgagga agacacatgg taaacctatt ggcctattcc agattgaagt cctaaatagg
2221 ttaacataca attactccag tccaaatgag gtatactgca tagtgcaagg taaaatggga
2281 caagacgcca aatttttctg ccccactggg tctttagtaa ctttccagaa ttcatggggt
2341 tcccaaatgg acttgactga cccgctttgc atagaagatt cagtagaaga ttgtaagcaa
87
Prokaryotes and archaea
  • Genes are usually easily seen, as they contain no
    introns and the genome is very gene-rich with few
    spaces between genes.
  • A simple search for open reading frames (ORFs)
    can often identify the genes. So, translation of
    a DNA sequence in all six reading frames is
    performed using, for example, the Translate tool
    on the ExPASy server
    (http://www.expasy.org/tools/dna.html).

88
Why 6 reading frames?
89
Why 6 reading frames?
  • Ribosomes read an RNA sequence in triplets
  • GTC GCG ACT AGA ACT CGT GCT AAA
  • Val Ala Thr Arg Thr Arg etc
  • G TCG CGA CTA GAA CTC GTG CTA AA
  • Ser Arg Leu Glu Leu Val etc
  • GT CGC GAC TAG AAC TCG TGC TAA A
  • Arg Asp - Asn Ser Cys etc

90
Why 6 reading frames?
  • So 3 reading frames, but DNA is double stranded
  • Only one strand is usually shown to save space,
    but the other strand could be the one actually
    used
  • This makes a second set of 3 frames, so 6 in all

91
  • GTCGCGACTAGAACTCGTGCTAAA
  • CAGCGCTGATCTTGAGCACGATTT
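
The six frames for the example sequence can be worked out with a small script. This is a minimal sketch using the standard genetic code, with stop codons written as `*` (the slide writes `-`). It assumes the input contains only A, C, G and T, and the frame labels `+1` to `-3` are a naming choice of this example.

```python
# Sketch of six-frame translation using the standard genetic code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the other strand, read in its own 5' to 3' direction."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frame_translation(dna):
    """Translate three frames on each strand; keys are +1..+3 and -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            frames[f"{strand}{frame + 1}"] = translate(sequence, frame)
    return frames


frames = six_frame_translation("GTCGCGACTAGAACTCGTGCTAAA")
for name, protein in frames.items():
    print(name, protein)
```

Frames +1 to +3 correspond to the three readings listed on slide 89 in one-letter amino acid code, and frames -1 to -3 come from the second strand shown on slide 91.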

92
e.g. A section of the E. coli genome
93
  • Most genes have ORFs of at least 100 bp, and
    often the longest ORF in a region is the gene.
    This is not always the case, so other criteria
    are also employed to analyse the predicted genes:
  • The ORF may encode a protein similar to
    previously described ones.
  • The ORF may have a typical GC content, codon
    frequency, or oligonucleotide composition for
    known protein-coding genes from the same
    organism.
  • The ORF may be preceded by a typical
    ribosome-binding site.
  • The ORF may be preceded by a typical promoter (a
    region that controls gene expression).
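
The first criterion, ORF length, is easy to sketch in code. The example below is a simplification: it treats an ORF as an ATG followed in frame by a stop codon, ignores alternative start codons, and does not report ORFs that run off the end of the sequence. Coordinates are 0-based positions on the strand being scanned, and the function names are hypothetical.

```python
# Minimal ORF scan over both strands and all three frames.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_length=100):
    """Return (strand, frame, start, end) for each ATG...stop ORF.

    `start` and `end` are 0-based, end-exclusive positions on the scanned
    strand, and `min_length` is in nucleotides, including the stop codon.
    """
    orfs = []
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(sequence) - 2, 3):
                codon = sequence[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_length:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


toy_sequence = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy_sequence, min_length=15))
print(find_orfs(toy_sequence))  # default 100 nt cut-off filters it out
```

The other criteria on the slide (similarity to known proteins, codon usage, ribosome-binding sites, promoters) would then be applied to the candidates this scan returns.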

94
Some unicellular eukaryotes
  • The few introns and high gene density make gene
    prediction not as difficult as in higher
    eukaryotes- genes can be confirmed using similar
    methods to prokaryotes.
  • Some, however, do have genes with several introns
    and short ORFs.
  • Here ESTs can be very useful in identifying
    genes.
  • By definition an EST comes from an expressed
    region of DNA, hence a gene.

95
Most multicellular eukaryotes
  • Gene organization is so complex that gene
    identification is a major problem.
  • Here there are often large intergenic regions,
    and also the genes themselves contain numerous
    introns, many of them long.
  • An added complication is the fact that many
    proteins exist in different forms due to
    alternative splicing and it is important to
    identify these variants as they could be related
    to disease or to functions in different tissue
    types.

96
Most multicellular eukaryotes
  • Again ESTs are important in defining genes.
  • Exon-intron boundaries can be predicted: introns
    often begin with GT at the 5' end and finish
    with AG at the 3' end.
  • Similar sequences in other organisms are very
    useful.
  • Statistical analysis of GC content (differs in
    coding regions) and CpG islands (located close to
    genes).
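
The statistical measures mentioned here are simple to compute. The sketch below calculates GC content and the observed/expected CpG ratio over sliding windows. The window and step sizes are arbitrary illustrative values, and the observed/expected formula (CpG count times window length, divided by the product of the C and G counts) is a commonly used definition that the slides themselves do not spell out.

```python
# Sketch: GC content and CpG observed/expected ratio in sliding windows.
def gc_content(dna):
    """Fraction of bases that are G or C."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna) if dna else 0.0


def cpg_observed_expected(dna):
    """Observed CpG count divided by the count expected from C and G levels."""
    dna = dna.upper()
    c_count, g_count = dna.count("C"), dna.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return dna.count("CG") * len(dna) / (c_count * g_count)


def sliding_windows(dna, window=200, step=100):
    """Yield (start, GC content, CpG o/e) for each full window."""
    for start in range(0, len(dna) - window + 1, step):
        piece = dna[start:start + window]
        yield start, gc_content(piece), cpg_observed_expected(piece)


toy = "ATATATAT" * 5 + "CGCGGCGC" * 5  # AT-rich stretch, then a CpG-rich one
for start, gc, ratio in sliding_windows(toy, window=40, step=20):
    print(start, round(gc, 2), round(ratio, 2))
```

Windows with unusually high values on both measures are candidates for CpG islands and so hint at nearby genes.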

97
Organization of the human iduronate 2-sulfatase
gene
  • This gene is located in positions 152960-177995
    of the human X chromosome
  • Encodes a 550-aa protein
  • Mutations in this gene cause mucopolysaccharidosis
    type II, also known as Hunter's disease
  • Tissue deposits of chondroitin sulfate and
    heparan sulfate.
  • Symptoms of Hunter's disease include coarse
    facial features, hepatosplenomegaly,
    cardiovascular disorders, deafness, and, in some
    cases, progressive mental retardation.

98
  • The top line indicates the X chromosome and shows
    the location of the iduronate sulfatase gene
    (thick line in the middle).
  • Thin lines on the bottom indicate two alternative
    transcripts.
  • Exons are shown with small rectangles.
