Microbial Genomes - PowerPoint PPT Presentation

1 / 66
About This Presentation
Title:

Microbial Genomes

Description:

Microbial Genomes. 1) Methods for Studying Microbial Genomes ... Bioinformatics. Databases. Literature. Sequence. Experimental. Genome Annotation ... – PowerPoint PPT presentation

Number of Views:535
Avg rating:3.0/5.0
Slides: 67
Provided by: benm7
Category:

less

Transcript and Presenter's Notes

Title: Microbial Genomes


1
Microbial Genomes
  • 1) Methods for Studying Microbial Genomes
  • 2) Analysis and Interpretation of Whole Genome
    Sequences

2
  • Methods for Studying Microbial Genomes

Why study genomes? History of genome
sequencing Methods, Principles Approaches
3
Why study microbial genomes?
  • until whole genome analysis became viable, life
    sciences have been based on a reductionist
    principle dissecting cell and systems into
    fundamental components for further study
  • studies on whole genomes and whole genome
    sequences in particular give us a complete
    genomic blueprint for an organism
  • we can now begin to examine how all of these
    parts operate cooperatively to influence the
    activities and behavior of an entire organism a
    complete understanding of the biology of an
    organism
  • microbes provide an excellent starting point for
    studies of this type as they have a relatively
    simple genomic structure compared to higher,
    multicellular organisms
  • studies on microbial genomes may provide crucial
    starting points for the understanding of the
    genomics of higher organisms

4
Why study microbial genomes?
  • analysis of whole microbial genomes also provides
    insight into microbial evolution and diversity
    beyond single protein or gene phylogenies
  • in practical terms analysis of whole microbial
    genomes is also a powerful tool in identifying
    new applications in for biotechnology and new
    approaches to the treatment and control of
    pathogenic organisms

5
History of microbial genome sequencing
  • 1977 - first complete genome to be sequenced was
    bacteriophage ?X174 - 5386 bp
  • first genome to be sequenced using random DNA
    fragments - Bacteriophage ? - 48502 bp
  • 1986 - mitochondrial (187 kb) and chloroplast
    (121 kb) genomes of Marchantia polymorpha
    sequenced
  • early 90s - cytomegalovirus (229 kb) and
    Vaccinia (192 kb) genomes sequenced
  • 1995 - first complete genome sequence from a free
    living organism - Haemophilus influenzae (1.83
    Mb)
  • late 1990s - many additional microbial genomes
    sequenced including Archaea (Methanococcus
    jannaschii - 1996) and Eukaryotes (Saccharomyces
    cerevisiae - 1996)

6
Microbial genomes sequenced to date
  • currently there are 32 complete, published
    microbial genomes 25 domain Bacteria, 5 Domain
    Archaea, 1 domain Eukarya (www.tigr.org)
  • around 130 additional microbial genome and
    chromosome sequencing projects underway

7
Laboratory tools for studying whole genomes
  • conventional techniques for analysing DNA are
    designed for the analysis of small regions of
    whole genomes such as individual genes or operons
  • many of the techniques used to study whole
    genomes are conventional molecular biology
    techniques adapted to operate effectively with
    DNA in a much larger size range

8
Pulsed Field Gel Electrophoresis
  • agarose gel electrophoresis is a fundamental
    technique in molecular biology but is generally
    unable to resolve fragments greater than 20
    kilobases in size (whole microbial genomes are
    usually greater than 1000 kilobases in size)
  • PFGE (pulsed field gel electrophoresis) is a
    adaptation of conventional agarose gel
    electrophoresis that allows extremely large DNA
    fragments to be resolved (up to megabase size
    fragments)
  • essential technique for estimating the sizes of
    whole genomes/chromosomes prior to sequencing and
    is necessary for preparing large DNA fragments
    for large insert DNA cloning and analysis of
    subsequent clones
  • also a commonly used and extremely powerful tool
    for genotyping and epidemiology studies for
    pathogenic microorganisms

9
Principle of PFGE
  • two factors influence DNA migration rates through
    conventional gels
  • - charge differences between DNA fragments
  • - molecular sieve effect of DNA pores
  • DNA fragments normally travel through agarose
    pores as spherical coils, fragments greater than
    20 kb in size form extended coils and therefore
    are not subjected to the molecular sieve effect
  • the charge effect is countered by the
    proportionally increased friction applied to the
    molecules and therefore fragments greater than 20
    kb do not resolve
  • PFGE works by periodically altering the electric
    field orientation
  • the large extended coil DNA fragments are forced
    to change orientation and size dependent
    separation is re-established because the time
    taken for the DNA to reorient is size dependent

10
Principle of PFGE
11
Principle of PFGE
  • the most important factor in PFGE resolution is
    switching time, longer switching times generally
    lead to increased size of DNA fragments which
    can be resolved
  • switching times are optimised for the expected
    size of the DNA being run on the PFGE gel
  • switch time ramping increases the region of the
    gel in which DNA separation is linear with
    respect to size
  • a number of different apparatus have been
    developed in order to generate this switching in
    electric fields however most commonly used in
    modern laboratories are FIGE (Field Inversion Gel
    Electrophoresis) and CHEF (Contour-Clamped
    Homogenous Electrophoresis)

12
CHEF
Switch Time
Electric Field 1
Electric Field 2
-
-
-
-
-
-
-
-








13
Preparation of DNA for PFGE
  • ideally a genomic DNA preparation that contains a
    high proportion of completely or almost
    completely intact genome copies would be suitable
    for PFGE
  • conventional means of DNA preparation are
    unsuitable for PFGE as mechanical shearing and
    low-level nuclease activity will result in
    fragmented DNA with an average size much smaller
    than an entire microbial genome (usually less
    than 200 kb in size)
  • the solution to this is to prepare genomic DNA
    from whole cells in a semisolid matrix (ie.
    agarose) that eliminates mechanical shearing
  • a very high concentration of EDTA is also used at
    all times in order to eliminate all nuclease
    activity

14
Preparation of DNA for PFGE
  • Procedure
  • 1) intact cells are mixed with molten LMT agarose
    and set in a mold forming agarose plugs
  • 2) enzymes and detergents diffuse into the plugs
    and lyse cells
  • 3) proteinase K diffuses into plugs and digests
    proteins
  • 4) if necessary restriction digests are performed
    in plugs (extensive washing or PMSF treatment is
    required to remove proteinase K activity)
  • 5) plugs are loaded directly onto PFGE and run

15
Preparation of DNA for PFGE
  • for restriction digests, conventional enzymes are
    unsuitable as they cut frequently on an entire
    genome sequence producing DNA fragments that are
    far too small
  • rare cutter restriction endonucleases cut
    genomic DNA with far less frequency than
    conventional restriction enzymes such as HindIII,
    BamHI etc.
  • many rare cutter REs have 6-bp (or longer)
    recognition sites eg. NotI GC?GGCCGC
  • in many cases the frequency of cutting is highly
    species dependent eg. BamHI will cut far less
    frequently on a low GC genome when compared to a
    intermediate or high GC content genome
  • suitable rare cutter enzymes therefore have to be
    determined experimentally for each new species
    being studied

16
Large insert cloning vectors BACs and PACs
  • DNA cloning is another technique fundamental to
    molecular biology that requires adaptation in
    order to be useful in studying DNA at a whole
    genome scale
  • conventional plasmid derived cloning vectors are
    only able to reliably maintain inserts less than
    20 kb in size
  • there are a number of approaches to generating
    clones with inserts in an intermediate size range
    (20 80 kB) such as cosmids, etc.
  • the most commonly used vectors for cloning
    extremely large DNA inserts are BACs (Bacterial
    Artificial Chromosomes) and PACs (P1-derived
    Artificial Chromosomes)
  • both BAC and PAC vectors are plasmid derived
    vectors distinguished from conventional vectors
    by extremely tightly controlled low copy numbers

17
Large insert cloning vectors BACs and PACs
  • these very low copy numbers help to limit the
    strain on host cellular resources generated by
    very large DNA inserts thus eliminating the
    rejection of large insert clones
  • low copy numbers also help to limit recombination
    events with host genomic DNA
  • BAC and PAC vectors both utilise E. coli as the
    host organism
  • BAC vectors are based on the E. coli single copy
    F-factor plasmid the F-factor origin of
    replication is very tightly controlled
  • PAC vectors are based on an identical principle
    but instead use a single copy origin of
    replication derived from P1 phage
  • PAC vectors also contain a pUC19 cassette for
    improved vector purification

18
(No Transcript)
19
Approaches to whole microbial genome sequencing
  • aim of microbial genome sequencing projects is to
    construct, from 500 800 bp sequencing reads
    containing about 1 mistakes, a genome sequence
    of several megabases with an error rate lower
    than 1 per 10000 nucleotides
  • with improving software, decreasing computation
    costs and advancements in automated DNA
    sequencing, an entire microbial genome project
    can be completed in a small laboratory in 1-2
    years
  • there are two main approaches to sequencing
    microbial genomes the ordered clone approach
    and direct shotgun sequencing
  • both require both large and small insert genomic
    DNA libraries in order to be effective

20
Approaches to whole microbial genome sequencing
21
Ordered Clone Approach
  • essentially this technique involves constructing
    a map of overlapping large insert clones covering
    the whole genome and then completely sequencing
    the minimum subset of these ordered clones
  • there are a number of methods used to order
    clones including restriction fingerprinting and
    hybridisation mapping
  • once an ordered large insert clone set is
    identified, a whole genome sequence is determined
    by either shotgun or partial primer walk
    sequencing of each insert
  • the ordered clone approach to DNA sequencing
    requires a large amount of characterisation prior
    to actual DNA sequencing and is therefore a
    relatively time consuming approach, however, it
    may be cheaper than shotgun sequencing an entire
    genome as less redundant sequencing is required
  • with rapid decreases in costs for computing power
    and sequencing this method is no longer
    considered viable for small (lt 5 Mb) genomes

22
(No Transcript)
23
Large DNA fragment
Digest and subclone
Whole Genome
Randomly sequence fragments
Fill gaps
Repeat for entire genome map
24
Random sequencing (shotgun) approach
  • this is the currently the most commonly used
    strategy for microbial whole genome sequencing
  • sequences from both ends of a large number of
    small and large insert clones are generated and
    overlapping sequences joined together to form a
    contig of the whole genome sequence (whole
    inserts not sequenced)
  • although this requires enormous amounts of DNA
    sequencing (often up to 10x genome coverage) and
    computational power for sequence assembly, it is
    a relatively rapid approach to whole genome
    sequencing
  • the first 90 95 of the genome sequence is
    relatively easy to generate by shotgun sequencing
    resulting in several hundred discrete contigs
  • filling the gaps to produce a single contig is
    the most difficult and time consuming phase of
    this process

25
Shear and subclone
Whole Genome
Randomly sequence fragments
Fill gaps
26
Random sequencing (shotgun) approach
  • There are a number of steps in the process -
  • 1) Random large and small insert library
    construction
  • 2) High throughput DNA sequencing
  • 3) Sequence assembly
  • 4) Ordering of contigs
  • 5) Primer walking to complete sequence
  • 6) Annotation

27
Library construction
  • Both conventional and large insert genomic DNA
    libraries should be constructed
  • the small insert library will be used for the
    bulk of the sequencing in order to generate
    suitable coverage of the complete genome
  • the large insert library (BAC, PAC, cosmid etc.)
    will be used as a scaffold during the sequence
    closure phase
  • it is crucial to ensure that both libraries are
    as random as possible - mechanical shearing is
    often used to generate small DNA fragments
  • it is also important that each clone contains
    only one DNA fragment and as such specialised
    methods for library construction must be used

28
DNA Sequencing
  • DNA sequences are generated using vector primers
    for both ends of inserts
  • at least 6X coverage of the genome is required
    although 9 to 10X coverage is often generated

29
Sequence assembly and gap closure
  • 4 major steps in sequence assembly and gap
    closure -
  • 1) random sequences initially interpreted using
    highly accurate base calling software and
    assembled to generate primary contigs using
    software such as PHRAPP
  • 2) computational and experimental techniques used
    to identify linking clones and order primary
    contigs
  • 3) primer walk sequencing of linking clones and
    PCR products to fill sequence gaps between
    contigs
  • 4) confirmation of contig order by PCR

30
Linking Clones
  • one of the most effective means of contig
    ordering and gap filling is linking clones
  • linking clones are those whose terminal sequences
    (from either end of the insert) belong to
    different contigs
  • if the orientation of the sequences and the
    distance to the end of the contig are compatible
    with with the size of the insert, the two contigs
    are likely to be linked
  • the larger the insert the more likely a clone
    will be a linking clone
  • this is why random sequencing is also performed
    on large insert clones - they are far more likely
    to form linking clones

31
Contig 1
Contig 2
Gap
Random Sequencing
Random Sequencing
32
Large Insert Linking Clone
Contig 1
Contig 2
REV
FWD
  • Once all possible linking clones are identified -
  • gaps are classified into two categories - those
    with linking clones (template available for
    sequencing) and physical gaps without linking
    clones ( no DNA template for the region)
  • for those gaps with suitable linking clones, the
    gaps confirmed by PCR and closed by primer walk
    sequencing

33
Large insert Linking Clone
Contig 1
Contig 2
REV
FWD
Primer Walking
34
Physical Gaps
  • Contigs separated with physical gaps (no linking
    clones) are usually spanned by PCR on genomic DNA
    using primers from each end of the contigs
  • the PCR products can then be sequenced to close
    the gaps
  • without linking clones other techniques to order
    contigs must be used in order to guide the
    selection of PCR products

35
For those contigs without linking clones, how do
you fill the gaps?
Linking clone
Supercontig 1
Supercontig 2
Supercontig 3
36
Physical Gaps
  • contigs can be ordered by -
  • peptide linking - contig ends having regions with
    homology to the same gene (or operon / gene
    cluster)
  • southern hybridisation of labelled contig
    terminal oligonucleotides against large
    restriction fragments

37
Linked by Southern Hybridisation
Contig 2
Contig 6
PCR Product
FWD
REV
Primer Walking
38
Gapped Microbial Genomes
  • considering the cost and difficulty in filling
    gaps between contigs some interest has been
    generated by the analysis of gapped microbial
    genomes
  • each gap is usually very small on average
    (approximately 75 bp for a 3.2x coverage library)
  • increasing bioinformatic resources available mean
    that these gaps have little influence on
    functional reconstruction
  • eg. Thiobacillus ferroxidans - all assigned amino
    acid biosynthesis genes (140 in total) identified
    from a gapped genome of 1912 contigs
  • error rates tend to be relatively high compared
    to genome sequences with greater coverage

39
Example - Haemophilus influenzae
  • first complete genome sequence of a free living
    organism (1995)
  • important pathogen
  • genome is around 1.83 megabases in size
  • random sequencing was done for both small insert
    and large insert (lambda) libraries
  • sequencing reactions performed by eight
    individuals using fourteen ABI 377 DNA sequencers
    per day over a three month period
  • in total around 33000 sequencing reactions were
    performed on 20000 templates
  • plasmid extraction performed in a 96 well format
  • 11 mb of sequence was intially used to generate
    140 contigs
  • gaps were closed by lambda linking clones (23),
    peptide links (2), Southern analysis (37) and PCR
    (42)

40
Annotation of Genome Sequences
  • a microbial genome sequence alone is only raw
    data it needs to be interpreted in order to be
    of any scientific significance
  • the process of predicting the location and
    function of all possible coding sequences (genes)
    in a genome sequence is known as annotation
  • although an annotated genome sequence provides a
    large amount of important information it is still
    merely a starting point for completely
    characterising an organism
  • Genome Databases
  • Listing of genomes Genomes online database
    (GOLD)
  • Comprenehsive Genome Databases GenBank, EMBL,
    DDBJ, JGI, TIGR, HAMAP, IMG, PEDANT (curation
    problematic, special tools for exploring /
    comparing / aligning genomes)
  • Taxon specific Genome Database EcoCyc
    (literature derived annotations)

41
Genome Annotation
  • The process after sequencing has been completed.
  • Use of many different tools required
  • Bioinformatics
  • Databases
  • Literature
  • Sequence
  • Experimental

42
Pipelines for the Annotation of Genomes Web
based
43
Pipeline for genome annotation
44
Identifying ORFs
  • most genomes will contain genes with very little
    or no homology to known genes of other organisms
  • for this reason all of the possible ORFs need to
    be identified without relying totally on homology
  • most efficient means for identifying potential
    genes in genome sequences is a three step process
  • 1) submit entire sequence as a 6-frame
    translation for BLAST analysis in order to
    identify some protein coding regions on the basis
    of high levels of homology
  • 2) use these initial coding regions to determine
    the sequence characteristics (GC content, codon
    bias etc.) that distinguish coding and non-coding
    regions of the genome (training the software

45
Identifying ORFs
  • 3) reanalyse the genome sequence using this data
    (plus potential ribosome binding sequences) in
    order to identify all the potential genes
  • using this process it has been experimentally
    shown that around 94 of genes can be accurately
    predicted
  • algorithms are also available to identify ORFs
    without using the training procedure with only
    slightly reduced accuracy
  • GLIMMER is a software for gene prediction and
    used by
  • BASYS- http//wishart.biology.ualberta.ca/basys/cg
    i/submit.pl
  • JCVI (formerly TIGR)- http//www.tigr.org/
  • SABIA- http//www.sabia.lncc.br/

46
Assigning function to ORFs
  • in order to assign function, all predicted ORFs
    are translated to amino acid sequence and
    analysed by homology searches against sequence
    databases (usually Genbank)
  • for each ORF there are three possible results -
  • i) clear sequence homology indicating function
  • ii) blocks of homology to defined functional
    motifs
  • - these should be confirmed experimentally
  • iii) no significant homology or homology to
    proteins of unknown function

47
ORFs of unidentified function
  • in most genome sequences many of the ORFs
    identified cannot be assigned a specific function
    based on homology
  • although the figure varies, usually between 40
    and 50 of ORFs fall into this category
  • clearly this represents a significant gap in our
    knowledge of microbial metabolism
  • these ORFs can be further divided into two
    categories
  • i) conserved hypothetical proteins ORFs with
    no homology to proteins of known function but
    with significant homology to unidentified ORFs
    of other species
  • these ORFs are therefore functionally conserved
    across numerous species and may represent
    important components of central metabolism that
    have not yet been identified

48
ORFs of unidentified function
  • the more universal the distribution of these
    ORFs the more likely they have a fundamental
    role in metabolism
  • ii) ORFs without homologues these are ORFs
    that have no homology to any known sequences
    these may represent genes encoding proteins
    related to more specific organism adaptations
  • eg. Deinococcus radiodurans is a radiation
    resistant organism that contains many ORFs
    without homologues many of these are thought to
    be involved in specialised processes of DNA
    repair

49
(No Transcript)
50
ORF identification and new amino acids
  • In addition to the 20 amino acids, two new but
    rare amino acids have now been identified
  • 21st selenocystine (Sec)
  • 22nd pyrolysin (Pyl)
  • The Sec and Pyl containing proteins are
    predominantly found in members of class
    d-proteobacteria, phylum Proteobacteria.
  • Metagenome analysis of the uncultured
    d-proteobacteria of the gutless mouthless worm,
    Olavius algarvensi, contains the highest
    proportion of Sec Pyl containing proteins to
    date suggesting that symbiosis promotes Sec Pyl
    genetic code
  • Olavius algarvensi, also contains the most wide
    use of the genetic code- 63 out of the 64
    possible codons
  • Sec
  • 99 genes, cluster into 30 protein families
  • present in domains Bacteria, Archaea Eucarya.
  • Sec coded by UGA (UGA also acts as a stop codon)

51
ORF identification and new amino acids
  • Reference
  • Zhang, Y. and Gladyshev, V. N. (2007). High
    content of proteins containing 21st and 22nd
    amino acids, selenocysteine and pyrrolysine, in a
    symbiotic deltaproteobacterium of gutless worm
    Olavius algarvensis. Nucleic Acids Research
  • 2007, 112

52
Structural genomics
  • in order to gain a complete understanding of an
    organism and fully exploit the potential offered
    by microbial genome sequencing, it is essential
    that these unidentified ORFs are assigned
    function
  • in most cases classical molecular biology tools
    will be necessary for this task, however, some
    suggestion of function for these ORFs would
    greatly improve the efficiency of this process
  • one possibility is structural genomics
  • this is the process of determining three
    dimensional structures of all the gene products
    encoded in a microbial genome (1000s of
    structures!!)
  • function can then be inferred on the basis of 3d
    structure comparisons to other proteins
  • this relies on the principle that structure
    determines functions and although two proteins
    with similar amino acid sequences can be assumed
    to have similar structures, two proteins with
    similar structure dont necessarily have the same
    aa sequence

53
Microarray hybridisation
  • a completely annotated microbial genome sequence,
    whilst a powerful scientific tool, still doesnt
    provide all of the information needed to
    understand the complete biology of an organism as
    it essentially a static picture of the genome
  • for truly complete characterisation, the dynamic
    nature of gene expression within a microbial cell
    needs to be determined
  • microarray technology allows whole organism gene
    expression to be investigated
  • PCR products of every gene from a complete genome
    sequence are bound in a high density array on a
    glass slide
  • these arrays are probed with fluorescently
    labelled cDNA prepared from whole RNA under
    specific environmental conditions
  • the level of cDNA for each ORF is then quantified
    using high resolution image scanners

54
Microarray hybridisation
  • example a microarray containing 97 of the
    predicted ORFs from Mycobacterium tuberculosis
    was used to investigate the response to the
    antituberculosis drug isoniazid (INH)
  • INH was found to induce several genes related to
    outer lipid envelope biosynthesis consistent
    with the drugs physiological mode of action
  • a number of additional genes were also induced
    which may provide potential drug targets in the
    future

55
INH untreated - green
INH treated - red
Overlay
Yellow Red Green (no change in expression)
Green only expressed without INH treatment
Red only expressed after INH treatment
56
Characteristics of sequenced genomes
  • the 32 complete genome sequences currently
    available cover a diverse range in terms of
    phylogeny and environments (eg. human pathogens,
    plant pathogens, extremophiles etc.)
  • what conclusions can be made by comparing the
    genomes of these organisms regarding specific
    adaptations to proliferation in remarkably
    different environments?
  • What conclusions can be made about evolutionary
    relationships between these organisms?

57
Horizontal gene transfer
  • before microbial genome sequences became
    available most of the focus of microbial
    evolution was on vertical transmission of
    genetic information mutation recombination and
    rearrangement within the clonal lineage of a
    single microbial population
  • genome sequences have demonstrated that
    horizontal transfer of genes (between different
    types of organisms) are widespread and may occur
    between phylogentically diverse organisms
  • generally speaking, essential genes (such as 16S
    rRNA) are unlikely to be transferred because the
    potential host most likely already contains genes
    of this type that have co-evolved with the rest
    of its cellular machinery and and cannot be
    displaced
  • genes encoding non-essential cellular processes
    of potential benefit to other organisms are far
    more likely to be transferred (eg. those involved
    in catabolic processes)

58
Horizontal gene transfer
  • clearly, lateral transfer of genomic information
    has enormous potential in improving an
    microorganisms ability to compete effectively -
    this may explain why horizontally transferred
    genes appear so frequently and ubiquitously in
    microbial genomes
  • an example of this is horizontally transferred
    genes between Archaeal and Bacterial
    hyperthermophiles -
  • Thermotoga maritima has 15 clusters of genes
    (4-20kb) most similar to equivalent Archaeal
    hyperthermophile gene regions

59
Whole genome phylogenetic analysis
  • most of the evolutionary relationships between
    microorganisms are inferred by comparison of
    single genes usually 16s rRNA genes
  • although extremely effective, single gene
    phylogenetic trees only provide limited
    information which can make determining broad
    relationships between major groups difficult
  • phylogenetic relationships can be determined by
    whole genome comparisons of the observed absence
    or presence of protein encoding gene families
  • in effect this is similar to using the
    distribution of morphological characteristics to
    determine phylogeny without the problem of
    convergent evolution
  • trees produced using this method are similar to
    16s rRNA trees, however, as more genome sequences
    become available more detailed conclusions can be
    drawn using this method

60
Archaeal Genomes
  • analysis of the 5 complete genome available for
    members of the domain Archaea has provided new
    insights into relationships between Archaea,
    Bacteria and Eukaryotes
  • around 35 of the Archaeal genes form a stable
    core conserved throughout the domain
  • most of these encode proteins involved in
    transcription, translation and DNA metabolism and
    some central metabolic pathways
  • the remainder of the genome is classified as a
    variable shell
  • a relatively high proportion of the variable
    shell genes are most homologous to their
    bacterial counterparts - this suggests horizontal
    gene transfer events
  • a relatively high proportion of the stable core
    genes are most similar to Eukaryotic genes

61
A - Stable core
B - Variable shell
62
Species and strain specific genetic diversity
  • although genome sequencing and analysis is very
    useful when comparing phylogenetically distant
    taxa, it is also of interest to examine the
    genomes of very closely related microorganisms
  • this allows a more quantitative approach for
    examining the relationships between genotype and
    phenotype
  • complete genome sequences have been determined
    for two species of the genus Chlamydia
    (pneumoniae and trachomatis)
  • although the overall genome structure was quite
    similar, C.pneumoniae contained an additional 214
    genes most of which have an unknown function
  • two strains of the bacterium Helicobacter pylori
    have been completely sequenced (26695 and J99)
  • overall the two strains were very similar
    genetically with only 6 of genes being specific
    to each strain

63
Case study - Deinococcus radiodurans
  • D. radiodurans R1 is an extremely radiation
    resistant bacterium
  • genome (total of 3.3 megabases) consists of two
    chromosomes (2.6 and 0.4 mb) a megaplasmid (177
    kb) and a small plasmid (44 kb)
  • considerable genetic redundancy was observed in
    both the chromosomal and plasmid sequences
  • numerous systems for DNA repair, DNA damage
    export were identified
  • a significant proportion of the ORFs identified
    had no database matches - these may be involved
    in unique cellular adaptations to radiation and
    stress response

64
Case study - Neisseria meningitits
  • N. meningititis causes bacterial meningitis and
    is therefore an important pathogen
  • genome is 2.2 megabases in size
  • 2121 ORFs were identified with many having
    extremely variable GC (recently acquired genes)
  • many of these recently acquired genes are
    identified as cell surface proteins
  • there is a remarkable abundance and diversity of
    repetitive DNA sequences
  • nearly 700 neisserial intergenic mosaic elements
    (NIMEs) - 50 to 150 bp repeat elements
  • these repeat elements may be involved in
    enhancing recombinase specific horizontal gene
    transfer

65
Case study - Borellia burgdorferi
  • B. burgdorferi is a spirochaete which causes Lyme
    disease
  • it has a 0.91 megabase linear genome and at least
    17 linear and circular plasmids which total 0.53
    megabases
  • 853 predicted ORFs identified - these encode a
    basic set of proteins for DNA replication,
    transcription, translation and energy metabolism
  • no genes encoding proteins involved in cellular
    biosynthetic reactions were identified - appears
    to have evolved via gene loss from a more
    metabolically competent precursor
  • there is significant amount of genetic redundancy
    in the plasmid sequences although a biological
    role has not been determined
  • it is possible the these plasmids undergo
    frequent homologous recombination in order to
    generate antigenic variation in surface proteins

66
Summary
  • Microbial genome sequencing and analysis is a
    rapidly expanding and increasingly important
    strand of microbiology
  • important information about the specific
    adaptations and evolution of an organism can be
    determined from genome sequencing
  • however, genome sequencing merely a strong
    starting point on road to completely
    understanding the biology of microorganisms
  • further characterisation of ORFs of unknown
    function, in combination with gene expression
    analysis and proteomics is required
Write a Comment
User Comments (0)
About PowerShow.com