Title: Predicting plasmid host range with dinucleotide analysis
Haruo Suzuki, Masahiro Sota, Celeste J. Brown,
and Eva M. Top
Department of Biological Sciences, University of
Idaho, Moscow, Idaho 83844
R-071
Contact Information: Department of Biological
Sciences, PO Box 443051, University of Idaho,
Moscow, ID 83844-3051. Phone: 1-208-885-8858.
Fax: 1-208-885-7905. E-mail: hsuzuki_at_uidaho.edu
- INTRODUCTION
- Bacterial plasmids are ubiquitous mobile elements
that serve as a pool of many host-beneficial
traits such as antibiotic resistance [1]. To
understand the role of plasmids in horizontal
gene transfer, we need to gain insight into the
life history of the plasmids, i.e., the range
of hosts in which they have evolved. Since
extensive data support the proposal that foreign
DNA acquires the host's nucleotide composition
during long-term residence (amelioration) [2],
comparing the nucleotide composition of plasmids
and chromosomes could shed light on a plasmid's
life history.
- The average absolute difference, δ, has been
commonly used to measure differences in
dinucleotide relative abundance, or 'genomic
signature', between bacterial chromosomes and
plasmids [3]. Here, we introduce the Mahalanobis
distance, D², which accounts for the
variance-covariance structure of the chromosomal
signatures.
Was virulence plasmid pO157 acquired from
Yersinia pestis?
Figure: Bacterial strains most similar in
dinucleotide composition to plasmid pO157 from
enterohemorrhagic Escherichia coli O157:H7 EDL933.
Mahalanobis distance identifies plausible hosts
of Bacillus anthracis virulence plasmid pXO1
Figure: Five highest-ranking bacterial strains
based on Mahalanobis distance D² (A) and average
absolute difference δ (B) for plasmid pXO1.
GOAL: To compare the performance of the Mahalanobis
distance and the commonly used δ-distance in
identifying plasmid hosts based on dinucleotide
composition similarity.
- Strains of Yersinia pestis, the agent of bubonic
plague, were identified as potential long-term
hosts of plasmid pO157.
Inferring potential hosts of a plasmid from an
unknown host
Figure: Bacterial strains most similar in
dinucleotide composition to broad-host-range
plasmid pB10, which was captured directly from
environmental samples [4].
- The D² value ranked plausible hosts (B. cereus
sensu lato) first, while the δ value ranked
unlikely hosts first.
- The species that make up the B. cereus sensu
lato group are largely defined by differences in
their plasmids, while their chromosomes are highly
similar.
- The known natural host of pXO1, B. anthracis,
ranked 5th based on the D² value, and only 10th
based on the δ value.
- METHODS
- Analyses were implemented using the G-language
Genome Analysis Environment, available at
http://www.g-language.org. Complete genome
sequences of 195 chromosomes and 412 plasmids of
Bacteria were retrieved from the NCBI ftp site
(ftp://ftp.ncbi.nih.gov/genomes/Bacteria).
- Dinucleotide relative abundance (x_ij) is defined
as
    x_{ij} = \frac{f_{ij}}{f_i f_j}
where f_i and f_j denote the frequencies of the
mononucleotides i and j, respectively
(i, j ∈ {A, C, G, T}), and f_ij the frequency of
the dinucleotide ij.
- We compared two distance measures to quantify the
difference in dinucleotide relative abundance
between DNA sequences.
- Average absolute difference, δ, used previously:
    \delta = \frac{1}{16} \sum_{i,j} | x_{ij} - y_{ij} |
where x_ij and y_ij denote the relative abundance
value of the dinucleotide ij for the plasmid and
the chromosomal segment, respectively.
- Mahalanobis distance, D², introduced in this
study
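The three quantities named in the Methods can be sketched in plain Python. This is only an illustration under stated assumptions, not the G-language code used for the poster. It assumes the usual forms x_ij = f_ij / (f_i f_j) and δ = mean of |x_ij - y_ij| over the 16 dinucleotides, and the textbook Mahalanobis form D^2 = (x - m)^T S^{-1} (x - m), where m and S are taken to be the mean vector and sample covariance matrix of the chromosomal-segment signatures. The function names, the single-strand counting, and the way segments are passed in are all choices made for this example.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AC, ..., TT


def relative_abundance(seq):
    """Return the 16 values x_ij = f_ij / (f_i * f_j) for one sequence.

    Assumption: one strand only, and every base occurs at least once.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    pairs = Counter(
        seq[k:k + 2] for k in range(len(seq) - 1)
        if seq[k] in BASES and seq[k + 1] in BASES
    )
    n_mono, n_pairs = sum(mono.values()), sum(pairs.values())
    x = []
    for d in DINUCS:
        f_i, f_j = mono[d[0]] / n_mono, mono[d[1]] / n_mono
        x.append((pairs[d] / n_pairs) / (f_i * f_j))
    return x


def delta_distance(x, y):
    """Average absolute difference between two signature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def mahalanobis_sq(x, segments):
    """D^2 of vector x from the mean of `segments`, scaled by their covariance."""
    n, p = len(segments), len(x)
    mean = [sum(s[k] for s in segments) / n for k in range(p)]
    cov = [[sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in segments) / (n - 1)
            for b in range(p)] for a in range(p)]
    diff = [x[k] - mean[k] for k in range(p)]
    return sum(d * z for d, z in zip(diff, solve(cov, diff)))
```

With 16-dimensional signatures, the covariance matrix has to be estimated from more than 16 chromosomal segments; otherwise it is singular and `solve` raises `ValueError`. Unlike δ, which weights all 16 dinucleotides equally, D² down-weights directions in which the chromosome's own segments already vary a lot.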
Mahalanobis distance performs better at identifying
plasmid hosts
Figure: Distributions of ranks of dinucleotide
relative abundance similarities between plasmids
and their known natural hosts, based on
Mahalanobis distance D² (A) and average absolute
difference δ (B).
- Members of the β-Proteobacteria were identified
as long-term evolutionary hosts of plasmid pB10,
which is in agreement with experimental evidence
of the plasmid's replication range.
- CONCLUSIONS
- The Mahalanobis distance performs better than the
commonly used δ-distance at identifying known
natural hosts of plasmids based on dinucleotide
composition similarity.
- Dinucleotide analysis with the Mahalanobis
distance can also be used to infer potential hosts
of plasmids, which can then be tested
experimentally.
Acknowledgements: We thank members of IBEST
(Initiative for Bioinformatics and Evolutionary
Studies) at the University of Idaho for useful
discussions. This work was supported by the
National Science Foundation (EF-0627988) and the
National Institutes of Health (P20RR016454,
P20RR16448).
References
[1] Frost LS et al. (2005) Nat Rev Microbiol, 3, 722-732.
[2] Lawrence JG and Ochman H (1997) J Mol Evol, 44, 383-397.
[3] Campbell A et al. (1999) Proc Natl Acad Sci USA, 96, 9184-9189.
[4] Schlüter A et al. (2003) Microbiology, 149, 3139-3153.
- The plasmid-host pairs tended to rank higher when
using the D² value than when using the δ value
(Wilcoxon signed-rank test, p < 10⁻¹⁵).
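The rank comparison reported above can be sketched as follows. This is a hedged illustration, not the analysis pipeline behind the poster: `host_rank` and `wilcoxon_signed_rank` are hypothetical helper names, and the p-value uses the large-sample normal approximation without tie or continuity correction, which is only reasonable when there are many plasmid-host pairs.

```python
import math


def host_rank(distances, known_host):
    """Rank (1 = most similar) of `known_host` among candidate chromosomes.

    `distances` maps a chromosome name to its distance from the plasmid.
    """
    ordered = sorted(distances, key=distances.get)
    return ordered.index(known_host) + 1


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test on paired values.

    Zero differences are dropped and tied |differences| share an average
    rank. Returns (W_plus, p_value) from the normal approximation.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    k = 0
    while k < n:
        j = k
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[k]]):
            j += 1
        avg = (k + j) / 2 + 1  # average of the 1-based ranks k+1 .. j+1
        for t in range(k, j + 1):
            ranks[order[t]] = avg
        k = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))
```

Here `a` would hold the rank of each plasmid's known host under D² and `b` the rank of the same host under δ. For real analyses, an established routine such as `scipy.stats.wilcoxon` is a safer choice than this hand-rolled approximation.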