BS961

Slides: 101

Provided by: stan150

1
BS961

SESSION 2

2
Objectives

Describe how the genome sequence of specific
microorganisms can be exploited in clinical
practice.
Describe basic principles behind manual and
automated DNA sequencing, and pyrosequencing.
Explain how nucleotide sequence databases are
accessed and describe the different types of
databases.
Discuss how genes are identified in nucleotide
sequences.
Reading: Brown, Genomes 3, Chapter 4.

3
DNA-based methods for detecting viral pathogens

Microarrays
PCR

4
Microarrays

Miller and Tang (2009) Clinical Microbiology
Reviews 22, 611-633.

5
Microarrays
6
Microarrays
7
Microarrays- examples

Respiratory pathogens
Several systems available
e.g. the ResPlex II assay (Qiagen) detects Flu-A, Flu-B,
PIV-1, PIV-2, PIV-3, PIV-4, RSV-A, RSV-B, hMPV,
RhV, EnV and severe acute respiratory syndrome CoV (SARS-CoV)
Based on multiplex RT-PCR

8
Microarrays- examples

For each pathogen, target-specific capture probes
are covalently linked to a specific set of
color-coded beads.
Labeled PCR products are captured by the
bead-bound capture probes in a hybridization
suspension.

9
Microarrays- examples

A microfluidics system delivers the suspension
hybridization reaction mixture to a dual laser
detection device.
A red laser identifies each bead (or pathogen) by
its color-coding
A green laser detects the hybridization signal
associated with each bead (indicating the
presence or absence of a particular pathogen).
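The two-laser readout amounts to a simple decoding step: the red channel says which bead (and so which pathogen) is being read, and the green channel says whether labelled PCR product has hybridised to it. A minimal Python sketch of that logic, assuming hypothetical bead codes, a hypothetical signal cut-off of 1000 units and a median-over-beads rule (none of which come from the slides or from any specific commercial assay):

```python
import statistics
from collections import defaultdict

# Hypothetical bead colour codes; a real assay defines its own bead sets
BEAD_CODES = {12: "Flu-A", 18: "Flu-B", 25: "RSV-A", 33: "hMPV"}

def call_pathogens(events, threshold=1000.0):
    """events: (bead_code, green_signal) pairs, one per bead passing the lasers."""
    signals = defaultdict(list)
    for code, green in events:
        if code in BEAD_CODES:            # red laser: identifies the bead type
            signals[code].append(green)   # green laser: hybridisation signal
    calls = {}
    for code, name in BEAD_CODES.items():
        values = signals.get(code, [])
        # Positive only if beads of this type were seen and their median
        # green signal reaches the (hypothetical) cut-off
        calls[name] = bool(values) and statistics.median(values) >= threshold
    return calls

events = [(12, 5000.0), (12, 4800.0), (18, 50.0), (25, 30.0), (99, 9000.0)]
print(call_pathogens(events))
```

Beads with an unknown code (99 here) are ignored, and a pathogen whose beads were never read is reported as negative rather than raising an error.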

10
Microarrays examples
11
Real-time PCR
12
Real-time PCR example

Nix et al (2008) Journal of Clinical
Microbiology, 46, 2519-2524.
Parechoviruses
Uses primers targeting regions conserved in all
parechoviruses

13
Parechovirus
14
Multiplex PCR

Several targets can be detected in one reaction (multiplexing) by using probes labelled with dyes of different colours

15
Sequencing strategies

Sequencing is usually achieved by the
dideoxynucleotide (chain-termination) method.
This requires:
Template DNA to be sequenced, together with a
primer and DNA polymerase.
Modified nucleotides (dideoxynucleotides) lacking the
3'-OH group needed for chain extension in DNA
synthesis. These are mixed with ordinary
nucleotides, so at each position some chains are
terminated and some are not; a range of
fragments is generated, each ending with a
specific dideoxynucleotide.
A gel system capable of separating DNA on the
basis of size with a resolution of one
nucleotide.
A detection method, usually dye-labelled
dideoxynucleotides (each of A, G, C and T labelled with a
dye of a different colour) detectable by laser.

16
Dideoxynucleotide sequencing

AAGCTAGCTGGCAAATGGCGTCTCAC
TTCGATCG (primer)
TTCGATCGA
TTCGATCGAC
TTCGATCGACC
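The chain-termination logic on this slide can be simulated in a few lines of Python. This is a toy sketch: like the slide, it writes template and new strand base-for-base rather than antiparallel, and it assumes every possible terminated fragment is produced. Sorting fragments by length stands in for the one-nucleotide-resolution gel, and the last base of each fragment stands in for its dye colour. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def chain_termination_fragments(template, primer):
    """Every fragment produced when a ddNTP terminates extension from the primer."""
    # Simplification (as on the slide): template and new strand are written
    # base-for-base, ignoring their antiparallel orientation.
    paired = "".join(COMPLEMENT[b] for b in template[:len(primer)])
    if paired != primer:
        raise ValueError("primer is not complementary to the start of the template")
    fragments, strand = [], primer
    for base in template[len(primer):]:
        strand += COMPLEMENT[base]   # next nucleotide added by DNA polymerase
        fragments.append(strand)     # in some molecules this was a ddNTP: chain stops
    return fragments

def read_gel(fragments):
    """Order fragments by size, then read the terminal (dye-labelled) base of each."""
    return "".join(f[-1] for f in sorted(fragments, key=len))

template = "AAGCTAGCTGGCAAATGGCGTCTCAC"
frags = chain_termination_fragments(template, "TTCGATCG")
print(frags[:3])                  # the three fragments shown on the slide
print(read_gel(reversed(frags)))  # input order does not matter: size sorting does
```

The first three fragments match the slide (TTCGATCGA, TTCGATCGAC, TTCGATCGACC), and reading the sorted fragments gives the newly synthesised sequence after the primer.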

17
Detection of bands
18
Output
19
Sequence assembly

In all sequencing projects the amount of sequence
that can be obtained from one reaction is much
less than that needed to complete the
project, so contiguous sequences (contigs) must be
assembled from several overlapping
sequences.

20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
Strategies for small genomes

For small genomes, e.g. bacteria, an almost
entirely shotgun approach is often the most
efficient, with gaps completed by more
directed methods.
e.g. Haemophilus influenzae

24
Sequencing of Haemophilus influenzae
25-53
Assembling contigs

Sequence 1
GATTCGTAGGCTTTAAGCTTCCGTCGACGCTGCGTAGC
Enter into database.

Sequence 2
CGATCGTGCCCCGTACTGACTGCATGCTGACACAGTC
Compare with database. Does it overlap with Sequence 1? Sequence 2 is tested against Sequence 1 at successive offsets. No. Enter into database.

Sequence 3
Compare with database. Does it overlap with Sequence 1 or Sequence 2? No. Enter into database.

Sequence 4
Compare with database. Does it overlap with Sequences 1, 2 or 3? No. Enter into database.

Sequence 5
Compare with database. Does it overlap with Sequences 1, 2, 3 or 4? Yes: it overlaps Sequence 2. Make a contig and enter it into the database.

2       ACCGTCGCCCTGCCCGTAGCTG
5                   CCCGTAGCTGCCATTTTCGA
CONTIG  ACCGTCGCCCTGCCCGTAGCTGCCATTTTCGA
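The compare-and-merge loop can be sketched as a greedy overlap assembler. This is an illustrative simplification, not the algorithm of any particular assembly program: it assumes error-free reads from one strand, treats the "database" as a plain list of contigs, and uses a hypothetical minimum overlap of 5 bases. Each round it merges the pair with the longest suffix-prefix overlap; when no pair overlaps, the remaining contigs are returned.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)   # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):          # the whole suffix of a matches b's prefix
            return len(a) - start
        start += 1

def greedy_assemble(reads, min_len=5):
    """Repeatedly merge the best-overlapping pair until no overlaps remain."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]   # "make contig"
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

seq1 = "GATTCGTAGGCTTTAAGCTTCCGTCGACGCTGCGTAGC"
seq2 = "ACCGTCGCCCTGCCCGTAGCTG"
seq5 = "CCCGTAGCTGCCATTTTCGA"
print(greedy_assemble([seq1, seq2, seq5]))
```

Run on Sequences 2 and 5 from the slides, it finds the 10-base overlap CCCGTAGCTG and produces the same contig, while Sequence 1 stays on its own. Real assemblers must also tolerate sequencing errors and reads from both strands, which this sketch ignores.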
54
Joining contigs
Two large contigs. A sequence overlapping both is
found. The contigs are joined.
55
Filling gaps

As sequence accumulates there are
diminishing returns: new sequences become rarer
and some areas are sequenced many times.
So gaps remain that need to be filled.
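The diminishing returns can be made quantitative with the Poisson coverage model of Lander and Waterman, which is brought in here for illustration and is not part of these slides. If reads land at random, a genome covered on average c times leaves an expected fraction e^(-c) of its bases unsequenced. A minimal Python sketch, assuming random, uniform read placement and ignoring minimum-overlap requirements; the function names are hypothetical:

```python
import math

def fraction_uncovered(coverage):
    """Expected fraction of bases hit by no read, for mean fold coverage c: e^-c."""
    return math.exp(-coverage)

def expected_uncovered_bases(genome_size, n_reads, read_length):
    coverage = n_reads * read_length / genome_size   # c = N * L / G
    return genome_size * fraction_uncovered(coverage)

for c in (1, 2, 4, 8):
    print(c, round(100 * fraction_uncovered(c), 2), "% unsequenced")
```

Doubling the effort from 4-fold to 8-fold coverage adds less than 2% of the genome as new sequence; most of the extra reads fall in regions that have already been sequenced.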

56
Gaps

Sequence gaps: regions where, by random chance, no sequence
has been obtained.
Physical gaps: regions that have not been
cloned at all, so no sequence can be obtained.

57
(No Transcript)
58
Success of this approach

Many microbial genomes have been sequenced
in this way.

59
Sequencing large genomes

It has been argued that a clone-contig approach is best,
particularly because organisms with larger
genomes often contain a lot of repetitive
sequence, and it is difficult to join these regions
correctly if only short sequences are analysed.
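Why repeats defeat short reads can be shown with a toy example. The sketch below uses hypothetical sequences and assumes error-free reads taken from every position. It builds two different genomes that share a 5-base repeat present three times, with the segments between the repeats swapped. Reads of 6 bases give exactly the same collection for both genomes, so no assembler could tell them apart; reads of 7 bases, long enough to span a whole repeat plus one flanking base on each side, do distinguish them.

```python
from collections import Counter

def read_spectrum(genome, k):
    """Multiset of all k-base reads (error-free, every start position sampled once)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

R = "ATTGC"                                    # hypothetical repeat, three copies
g1 = "CCC" + R + "GA" + R + "TG" + R + "AAC"
g2 = "CCC" + R + "TG" + R + "GA" + R + "AAC"   # middle segments swapped

print(g1 != g2)                                      # two genuinely different genomes
print(read_spectrum(g1, 6) == read_spectrum(g2, 6))  # identical reads: order unresolvable
print(read_spectrum(g1, 7) == read_spectrum(g2, 7))  # reads spanning the repeat differ
```

Longer reads, or mapped clones as in the clone-contig approach, supply the long-range information that short reads lack.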

60
Clone contig approach

Relies on cloning large fragments of DNA (e.g.
300 kb). These are then mapped onto the chromosome
using physical maps.

61
Human genome

The draft human genome sequence was published by two
groups at the same time in 2001:
The International Human Genome Sequencing Consortium, a
group of scientists funded by non-profit
bodies, used the clone-contig procedure (Nature
409, 860-921).
A private company, Celera, used the whole-genome shotgun
approach, which was much faster, but did use
scaffolding data already put into the public
domain by the first group (Science 291,
1304-1349).

62
Pyrosequencing

More recently a different method, pyrosequencing, more
suited to ultra-high throughput, has been
developed.
http://www.pyrosequencing.com/DynPage.aspx?id=7454

63
(No Transcript)
64
Step 1

The reaction contains a primer, template and DNA
polymerase, but also a number of other
components: ATP sulfurylase, luciferase and
apyrase, and the substrates adenosine 5'
phosphosulfate (APS) and luciferin.

65
Step 2

The first of four dNTPs is added to the reaction.
DNA polymerase catalyzes the incorporation of the
deoxynucleotide triphosphate into the DNA strand,
if it is complementary to the base in the
template strand.
Each incorporation event is accompanied by
release of pyrophosphate (PPi) in a quantity
equimolar to the amount of incorporated
nucleotide.

66
Step 3

ATP sulfurylase quantitatively converts PPi to
ATP in the presence of adenosine 5'
phosphosulfate.
This ATP drives the luciferase-mediated
conversion of luciferin to oxyluciferin, which
generates visible light in amounts that are
proportional to the amount of ATP.
The light produced in the luciferase-catalyzed
reaction is detected by a charge-coupled device
(CCD) camera and seen as a peak in a pyrogram.
Each light signal is proportional to the number
of nucleotides incorporated; this gives a
different sort of output from dideoxynucleotide
sequencing.

67
(No Transcript)
68
Step 4

Apyrase, a nucleotide degrading enzyme,
continuously degrades unincorporated dNTPs and
excess ATP.
When degradation is complete, another dNTP is
added.

69
(No Transcript)
70
Step 5

Addition of dNTPs is performed one at a time. As
the process continues, the complementary DNA
strand is built up and the nucleotide sequence is
determined from the signal peaks in the pyrogram.
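Steps 2 to 5 can be simulated to show how a pyrogram is read. In this idealised Python sketch (not instrument software), each dispensed dNTP gives a peak proportional to the number of bases it adds, so a run of identical bases gives a taller peak; base calling then rounds each peak to a whole number. The dispensation order (ACGT repeated), the template and the noisy peak heights are all hypothetical, and strand orientation is simplified.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def simulate_pyrogram(template, dispensation):
    """Ideal peak height for each dispensed dNTP = number of bases it adds."""
    # The new strand is complementary to the template (orientation simplified)
    target = "".join(COMPLEMENT[b] for b in template)
    pos, peaks = 0, []
    for dntp in dispensation:
        n = 0
        while pos < len(target) and target[pos] == dntp:
            n += 1      # polymerase extends through a run of identical bases
            pos += 1
        peaks.append(float(n))   # light proportional to the PPi released
    return peaks

def call_bases(dispensation, peaks):
    """Round each peak to a whole number of incorporated nucleotides."""
    return "".join(d * round(p) for d, p in zip(dispensation, peaks))

order = "ACGT" * 2
print(simulate_pyrogram("TTCAGG", order))   # [2.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0]
noisy = [1.9, 0.1, 1.05, 0.96, 0.0, 2.1, 0.05, 0.0]
print(call_bases(order, noisy))             # AAGTCC
```

The double-height peaks for AA and CC illustrate the different output from dideoxynucleotide sequencing: a homopolymer appears as one taller peak rather than as separate bands.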

71
(No Transcript)
72

The method can be highly automated. With random
shearing of genomic DNA, PCR amplification and
complex sample handling, around
400,000 fragments can be sequenced at the same
time, each read being 200-300 nucleotides long.
These reads can be assembled automatically into
contigs; the main problem is repetitive
sequences, because the reads generated are short.
This is called 454 sequencing.
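The throughput figures above translate into a simple yield calculation. A minimal Python sketch, assuming the 400,000 reads and 200-300 nucleotide read lengths quoted on the slide and a hypothetical 2 Mb bacterial genome (a round number, roughly the size of the H. influenzae genome):

```python
def run_yield(n_reads, read_length):
    """Total bases produced by one run."""
    return n_reads * read_length

def fold_coverage(total_bases, genome_size):
    """Average number of times each base of the genome is read."""
    return total_bases / genome_size

GENOME = 2_000_000   # hypothetical 2 Mb bacterial genome
for length in (200, 300):
    total = run_yield(400_000, length)
    print(length, total, fold_coverage(total, GENOME))
```

On these assumptions one run yields 80-120 million bases, i.e. roughly 40- to 60-fold coverage of a small bacterial genome, which is why automated assembly of a whole genome from a single run is feasible apart from repeats.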

73
Pathogen detection

e.g. Briese et al (2009) PLOS Pathogens 5,
e1000455
Lujo virus- Arenaviridae.
Case of haemorrhagic disease
RT-PCR (random amplification), ligation of
specific linkers, 454 sequencing

74
Pathogen detection

Worked with 3 libraries from different tissues
87,500-106,500 reads from each
Found 7 sequence fragments matching an
arenavirus
Completed gaps using conventional PCR

75
(No Transcript)
76
(No Transcript)
77
(No Transcript)
78
New respiratory viruses
79
Sequence databases

There are a number of databases of
different types.
Nucleotide: GenBank and EMBL are the main ones
for well-characterised sequences; htgs contains
unfinished High Throughput Genomic Sequences
(i.e. from genome projects) until they have been
characterised further.

80
More databases

Protein: PIR and Swiss-Prot are the main ones.
Global: nr (non-redundant), a compilation
of several databases.
ESTs: dbEST

81
ESTs (expressed sequence tags)

Short sequences obtained from total mRNA isolated
from a tissue.
Derived by random cDNA cloning and sequencing
without further purification.
Useful for showing which genes are expressed in a tissue,
as only these are represented in the RNA.
A collection of ESTs from different tissues gives
an idea of the total number of genes in an
organism.

82
Accessing databases

http://www.ncbi.nlm.nih.gov/
Simple search terms
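Simple searches can also be sent to NCBI programmatically through its Entrez E-utilities. The Python sketch below builds an esearch request and, if `search_ids` is called, returns the matching record IDs. The database name, example search term and retmax value are illustrative choices, only standard-library calls are used, and NCBI's usage guidelines (rate limits, identifying your tool) apply to real use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term, db="nucleotide", retmax=20):
    """Build an esearch query URL for a simple search term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "?" + urlencode(params)

def search_ids(term, db="nucleotide", retmax=20):
    """Network call: return the list of matching record IDs."""
    with urlopen(esearch_url(term, db, retmax)) as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]

print(esearch_url("Haemophilus influenzae"))
```

The same request can be pasted into a browser; the JSON reply contains the hit count and the ID list, which can then be passed to other E-utilities (such as efetch) to retrieve the records themselves.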

83
(No Transcript)
84
Identifying genes in nucleotide sequences

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?indexed
googleridsef.section.168
Initially the sequence generated from a genome
project is largely featureless and must be
interpreted; the most important task is to find
the locations of the genes.

85
   1 tttgaaaggg gtctcctaga gagcttggcc gtcgggcctt acaccccgac ttgctgagtt
  61 tctctaggag agtccctttc ccagccagag gtggctggtc aaacaatacc aaacgtaact
 121 aaacatctaa gataacatag ccctatgcct ggtctccacc agttgaaggc atcttgcaat
 181 aaaatgggtg gattaagacg cttaaagcat ggagtcaatt atcttttcta actagtgatc
 241 ttcactgggt ggcagatggc gtgccataac tctattagtg ggataccacg ctcgtggatc
 301 ttatgcccac acagccatcc tctagtaagt ttgcaaggtg tctgatgagg cgtgggaact
 361 tattggaaat aattacttgc tgcgaagcat cctactgcca gcggatcaac acctggtaac
 421 aggtgcccct ggggccaaaa gccacggttt aacagaccct ttaggattgg ttaaaacctg
 481 agtaattatg gaagatactt agtacctacc aacttggtaa cagtgcaaac actagttgta
 541 aggcccacga aggatgccca gaaggtaccc gcaggtaaca agagacactg tggatctgat
 601 ctggggccac ctacctctat cctggtgagg tggttaaaaa acgtctagtg ggccaaaccc
 661 aggggggatc cctggtttcc ttattttagt gtaaatgtca ttatggagac aatcaagagc
 721 attgcagata tggcgaccgg tgtaactaaa accattgatg ccacaatcaa ttctgttaat
 781 gagatcatca ctaacacaga taatgcttca ggtggagata tattgactaa agttgctgat
 841 gatgcttcaa atattttagg gcccaactgt tatgcgacaa catctgagcc agaaaacaag
 901 gatgtggtgc aagcaaccac cactgtgaac accactaatc tgacacagca cccatcagca
 961 ccaacgttac catttacacc agacttttcg aatgttgaca cgtttcattc aatggcttat
1021 gatactacaa ctggtagtaa gaaccctaat aagttagtta ggttaacgac acatgcttgg
1081 gctagtaccc tacagagggg tcatcagatt gatcatgtta atctaccagt tgacttctgg
1141 gatgaacaga ggaaaccagc ttatggccat gctaaatatt ttgcagctgt tcggtgtgga
86
1201 tttcattttc aagtacaggt caatgtgaat cagggaactg ctgggagtgc tttggtagtg
1261 tatgaaccaa agccagtagt tgattatgat aaggatttgg aatttggagc atttaccaat
1321 ttaccacatg tgttaatgaa cttggccgag actacccagg ccgacttatg tatcccctat
1381 gttgcagata caaactatgt gaagactgat tcatctgact tagggcaatt gaaagtttat
1441 gtgtggactc cccttagcat tccatcaggc tcatctaacc aagtggacgt gactatattg
1501 ggtagcttat tacaattgga tttccaaaac ccaagggtgt atgggcaaaa tgttgacatt
1561 tacgatacag caccctctaa accaattcca ttgaggaaga ctaaatattt gactatgagc
1621 acaaaataca aatggacaag aaataaagta gacatagctg aaggtccagg ttcaatgaac
1681 atggcaaatg tacttagtac gacagcagca caatcagtag cattggttgg ggagagggct
1741 ttttatgatc ccaggactgc tggtagcaaa tctagatttg atgacttagt aaaaatctca
1801 cagttgtttt cagttatggc agattccacc actccatctg ccaatcatgg aatagaccaa
1861 aagggttatt tcaaatggtc tgccaattct gatccacagg caatagtgca tagaaactta
1921 gttcatttaa atctatttcc aaatttgaag gtctttgaaa acagttattc atacttcaga
1981 ggttctctta taatcaggtt aagtgtttat gctagtacat tcaacagagg ccgtttgaat
2041 gggttctttc caaattccag tacagatgaa acttctgaaa ttgataatgc catctacacc
2101 atatgtgata ttggatctga caatagtttt gagattacta tcccttattc attttccact
2161 tggatgagga agacacatgg taaacctatt ggcctattcc agattgaagt cctaaatagg
2221 ttaacataca attactccag tccaaatgag gtatactgca tagtgcaagg taaaatggga
2281 caagacgcca aatttttctg ccccactggg tctttagtaa ctttccagaa ttcatggggt
2341 tcccaaatgg acttgactga cccgctttgc atagaagatt cagtagaaga ttgtaagcaa
87
Prokaryotes and archaea

Genes are usually easily seen as they contain no
introns and the genome is very gene-rich with few
spaces between genes.
A simple search for open reading frames (ORFs)
can often identify the genes. To do this, the DNA
sequence is translated in all six reading frames,
using, for example, the Translate tool on the
ExPASy server (http://www.expasy.org/tools/dna.html).

88
Why 6 reading frames?
89
Why 6 reading frames?

Ribosomes read an RNA sequence in triplets
GTC GCG ACT AGA ACT CGT GCT AAA
Val Ala Thr Arg Thr Arg etc
G TCG CGA CTA GAA CTC GTG CTA AA
Ser Arg Leu Glu Leu Val etc
GT CGC GAC TAG AAC TCG TGC TAA A
Arg Asp - Asn Ser Cys etc

90
Why 6 reading frames?

So there are 3 reading frames, but DNA is double-stranded.
Only one strand is usually shown to save space,
but the other strand, read in the opposite direction,
could be the one actually used.
This gives a second set of 3 frames, so 6 in all.

GTCGCGACTAGAACTCGTGCTAAA
CAGCGCTGATCTTGAGCACGATTT
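The six frames can be generated mechanically: translate the sequence starting at offsets 0, 1 and 2, then do the same for the reverse complement (the other strand, read 5' to 3'). A minimal Python sketch using the standard genetic code, with "*" marking stop codons (a common convention, standing in for the slide's "-"); the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate codon by codon; incomplete trailing bases are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]   # other strand, read 5' to 3'
    return [translate(s[f:]) for s in (seq, rev) for f in range(3)]

for frame in six_frames("GTCGCGACTAGAACTCGTGCTAAA"):
    print(frame)
```

The first three frames reproduce the slide (Val Ala Thr Arg..., Ser Arg Leu Glu Leu Val..., Arg Asp stop Asn Ser Cys...); the last three come from the complementary strand.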

92
e.g. A section of the E. coli genome
93

Most genes have ORFs of at least 100 bp, and
often the longest ORF in a region is the gene.
This is not always the case, so other criteria
are also used to assess the predicted genes:
The ORF may encode a protein similar to
previously described ones.
The ORF may have a GC content, codon
frequency or oligonucleotide composition typical
of known protein-coding genes from the same
organism.
The ORF may be preceded by a typical
ribosome-binding site.
The ORF may be preceded by a typical promoter (a
region that controls gene expression).
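A basic ORF scan of the kind described above can be sketched as follows. This is a minimal illustration, not the algorithm of any particular gene finder: it looks only for ATG-to-stop stretches in all six frames, keeps those of at least `min_nt` bases (defaulting to the 100 bp figure on the slide), and lists the longest first. Frame numbers on the "-" strand refer to the reverse complement. The other criteria in the list (similarity, composition, ribosome-binding sites, promoters) are not modelled.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_nt=100):
    """Return (strand, frame, orf) tuples, longest first; each orf runs ATG...stop."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first start codon opens an ORF
                elif start is not None and codon in STOPS:
                    orf = s[start:i + 3]           # include the stop codon
                    if len(orf) >= min_nt:
                        orfs.append((strand, frame, orf))
                    start = None
    return sorted(orfs, key=lambda t: len(t[2]), reverse=True)

print(find_orfs("CCATGAAATTTGGGTAACC", min_nt=9))
```

Run on the sequence slides above with the default threshold, the top of the list would be the candidate genes to check against the other criteria.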

94
Some unicellular eukaryotes

The few introns and high gene density make gene
prediction less difficult than in higher
eukaryotes; genes can be confirmed using methods
similar to those used for prokaryotes.
Some, however, do have genes with several introns
and short ORFs.
Here ESTs can be very useful in identifying
genes.
By definition an EST comes from an expressed
region of DNA, hence a gene.

95
Most multicellular eukaryotes

Gene organization is so complex that gene
identification is a major problem.
Here there are often large intergenic regions,
and also the genes themselves contain numerous
introns, many of them long.
An added complication is the fact that many
proteins exist in different forms due to
alternative splicing and it is important to
identify these variants as they could be related
to disease or to functions in different tissue
types.

96
Most multicellular eukaryotes

Again ESTs are important in defining genes.
Exon-intron boundaries can be predicted: introns
usually begin with GT at their 5' end and end with
AG at their 3' end.
Similar sequences in other organisms are very
useful.
Statistical analysis of GC content (which differs in
coding regions) and of CpG islands (located close to
genes).
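The composition statistics can be sketched for a single window of sequence. The thresholds below are the classic CpG-island criteria of Gardiner-Garden and Frommer (length of at least 200 bp, GC above 50%, observed/expected CpG ratio above 0.6); they are an assumption brought in for illustration, not part of these slides, and real tools scan sliding windows along a whole sequence. The function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")          # CG occurrences cannot overlap each other
    expected = c * g / len(seq)
    return observed / expected

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    return (len(seq) >= min_len and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)

print(gc_fraction("CGCGCGCG"), cpg_obs_exp("CGCGCGCG"))
print(looks_like_cpg_island("CG" * 150), looks_like_cpg_island("AT" * 150))
```

Outside CpG islands, CpG is typically under-represented in vertebrate DNA, which is why a high observed/expected ratio is a useful signal for a nearby gene.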

97
Organization of the human iduronate 2-sulfatase
gene

This gene is located in positions 152960177995
of the human X chromosome
Encodes a 550-aa protein
Mutations in this gene cause mucopolysaccharidosis
type II, also known as Hunter's disease
Tissue deposits of dermatan sulfate (formerly called
chondroitin sulfate B) and heparan sulfate.
Symptoms of Hunter's disease include coarse
facial features, hepatosplenomegaly,
cardiovascular disorders, deafness, and, in some
cases, progressive mental retardation.

The top line indicates the X chromosome and shows
the location of the iduronate sulfatase gene
(thick line in the middle).
Thin lines on the bottom indicate two alternative
transcripts.
Exons are shown with small rectangles.

99
(No Transcript)
100
(No Transcript)
