Title: Repeated Sequences in Nostoc punctiforme and Other Cyanobacteria
Jeff Elhai (1,2), Michiko Kato (1), Cathy Burke (3),
José Luis Costa (4), Sarah Cousins (2), James Godde (5),
Lauren Hauser (6), Masahiko Ikeuchi (7), Minoru Kanehisa (8),
Toshiaki Katayama (7), Peter Lindblad (9), Jack Meeks (10),
Rei Narikawa (7), Shinobu Okamoto (8), Heather Satterlee (1)
(1) Center for the Study of Biological Complexity and
(2) Department of Biology, Virginia Commonwealth University;
(3) Chesterfield Technical Center; (4) Free University of Amsterdam;
(5) Monmouth College; (6) Oak Ridge National Laboratory;
(7) University of Tokyo; (8) Kyoto University;
(9) Uppsala University; (10) University of California at Davis
Bestiary of repeated sequences
Small dispersed repeats
Transposable elements
Most people interested in genomes focus on genes, the
proteins they encode, and the functional
implications for organisms bearing those proteins.
However, if it were possible to monitor a genome
over short evolutionary time, one would find much
of interest not in the gain or loss of genes but
rather in the ferment of other genomic units:
repeated sequences, which come and go rapidly
and play major roles in the structure of the
genome.
Number and characteristics of NIS elements
In examining the flanking sequences of NIS2b
sequences, we encountered what appeared to be a
transposable element, bounded by conserved
inverted repeats and flanked by direct repeats
that differed in sequence for each instance.
Surprisingly, the element contained little other
than NIS2b. The many copies of the element make
it clear that it is an active, miniature version
of a transposon whose full version contains the
gene encoding transposase NpR1652. All 50 copies
of NIS4 lie either between genes in parallel
orientation or downstream from convergent genes,
raising the possibility that utilization of the
inverted repeats of NIS2b as a transcriptional
termination signal has selected for the
preservation of random insertions of the
transposon that position NIS2b downstream from a
gene. We have observed many other minitransposons
within the genomes of Nostoc and Anabaena.
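The orientation argument above can be made concrete with a small sketch. It assumes that each intergenic region is described only by the strands ('+' or '-') of the genes to its left and right on the forward strand. The function names and this input convention are illustrative and are not part of the original analysis.

```python
def intergenic_context(left_strand: str, right_strand: str) -> str:
    """Classify an intergenic region by the strands of its two flanking genes.

    left_strand / right_strand: '+' or '-' for the gene lying to the left
    and to the right of the region (hypothetical input convention).
    """
    pair = (left_strand, right_strand)
    if pair in (("+", "+"), ("-", "-")):
        return "parallel"      # one gene ends at the region, the other starts
    if pair == ("+", "-"):
        return "convergent"    # both genes end at the region
    if pair == ("-", "+"):
        return "divergent"     # both genes start at the region
    raise ValueError(f"unexpected strands: {pair!r}")


def faces_a_gene_end(left_strand: str, right_strand: str) -> bool:
    """True if the 3' end of at least one flanking gene faces the region,
    the position in which a terminator-like element could act on that gene."""
    return intergenic_context(left_strand, right_strand) != "divergent"
```

Under this classification, the element copies described above would all fall in regions for which `faces_a_gene_end` returns True, and none in divergent regions.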
NIS (Nostoc Iterated Sequence) elements are small
repetitive sequences first discovered in the
Nostoc punctiforme genome (Costa et al., 2002). We
have characterized six NIS families, and there
are certainly more. Multicopy NIS1 elements
contain 4-nt palindromes of variable sequence
surrounding a conserved core sequence (two
versions shown below). Compensatory mutations
indicate that NIS1 functions as an
oligonucleotide with secondary structure,
presumably an RNA. NIS1, NIS2a, and NIS3 are
generally flanked by tandem heptameric repeats.
NIS elements that are not thus flanked are
generally found within a minitransposon (NIS2b)
or within composite NIS elements (NIS4, NIS5, and
NIS6).
Repeated units may occur in tandem, an arrangement
frequently observed in eukaryotic genomes but less
so in bacterial ones. The typical unit is one, two,
or three nucleotides. More often, repeated units are
dispersed more or less at random throughout the
genome. About 50% of our own genome consists
of transposable DNA units. In bacteria, small
dispersed elements (e.g., ERIC and REP) have been
observed. Recently, a third class of repeated
sequences has been recognized, consisting of
repeated units separated by short nonrepetitive
gaps.
Dispersed repeats
Transposable elements
Small repeated elements
Comparison amongst cyanobacteria
In all three cases, the questions of interest are
the same: Where do they come from? How do they
move? What are their functional roles?
Composite NIS repeat elements
Tandem repeats
Tandem repeats found in bacteria usually lie
within open reading frames. The size of the
repeat unit is therefore biased towards multiples
of three, since otherwise the repeat would disrupt
the open reading frame. The pattern seen generally
in bacteria is seen also in most cyanobacteria.
Heterocystous cyanobacteria are exceptional in the
number of their repeats with units of 7 nt.
Crocosphaera and Trichodesmium both have 7-nt
repeats, though not as many as Nostoc and
Anabaena PCC 7120.
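As an illustration of how tandem repeats of a given unit size might be located, the following is a minimal sketch using a regular-expression backreference. It is not the method used for this survey: the function name, the threshold of three copies, and the restriction to exact, uninterrupted copies are all assumptions made for the example.

```python
import re


def find_tandem_repeats(seq: str, unit_len: int, min_copies: int = 3):
    """Return (start, unit, copies) for each run of at least min_copies
    exact, adjacent copies of a unit of length unit_len.

    Units that are themselves repeats of a shorter unit (for example
    AAAAAAA, or ACACAC for unit_len=6) are skipped.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        # A string is a repeat of a shorter string exactly when it occurs
        # inside its own doubling with the first and last characters removed.
        if unit in (unit + unit)[1:-1]:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Toy sequence: four copies of a heptamer between two short flanks.
example = "GG" + "AATGACA" * 4 + "CC"
print(find_tandem_repeats(example, 7))  # [(2, 'AATGACA', 4)]
```

The unit is reported in whatever phase the scan first encounters, so AATGACA and its rotation ATGACAA would need to be merged in a later step before families are counted.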
Composite NIS elements, such as that shown above,
occur at a surprisingly high frequency. NIS1 is a
particularly common target for insertion by other
NIS elements.
Origin of NIS elements
Families of Tandem Repeats

Repeat family        Nostoc      Anabaena    Trichodesmium
AATGACh (STRR2)
  AATGACA            1 (111)     39 (7)      -
  AATGACT            2 (98)      7 (57)      (1)
  AATGACC            8 (49)      23 (16)     -
AATTCCC (STRR4)
  AATTCCC            6 (56)      74 (3)      41 (8)
  AATGCCC            7 (55)      -           -
AATTACG (STRR5)
  AATTACG            3 (66)      20 (17)     -
AdTCCCC (STRR1)
  ATTCCCC            5 (60)      32 (9)      (1)
  AATCCCC            16 (24)     3 (21)      -
  AGTCCCC            23 (15)     4 (69)      -
AGCAGGGG (STRR6)
  AGCAGGGG           4 (64)      24 (16)     -
AAAATTC (STRR7)
  AAAATTC            21 (17)     1 (116)     10 (22)

The first number is the rank of the repeat within that
genome; the second number (in parentheses) is the number
of instances.
The repeated sequences considered here appear and
disappear rapidly over evolutionary time. One
might expect that an organism's lifestyle would
determine the degree to which it acquires new
sequences; perhaps marine creatures would have
fewer opportunities for horizontal gene transfer.
However, the above table makes clear that the
presence of repeated sequences correlates better
with phylogeny than with environmental niche.
Apart from transposons, each type of repeated
sequence poses a mystery as to its propagation.
For the most part, NIS1 appears to propagate by
recombination mediated by its flanking tandem
repeats, and NIS2b appears to propagate by
transposition of the minitransposon that contains
it. But how were the initial insertions made?
NIS5, caught in the act of insertion, provides
important clues, the implications and generality
of which remain to be explored.
Unlike repeats whose units are multiples of 3 nt,
7-nt repeats lie predominantly in intergenic
regions and fall into a small number of sequence
families. Twenty families out of the 1170 possible
7-nt families account for over two-thirds of the
total heptameric repeats in Nostoc. Nostoc and
Anabaena have families in common as, to a lesser
extent, do Trichodesmium and Crocosphaera.
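The poster does not state how the figure of 1170 possible families was obtained. One reading that reproduces it is to treat two 7-nt units as the same family when one is a rotation of the other or of its reverse complement, and to exclude the four homopolymers (which are 1-nt repeats rather than true 7-nt repeats). The sketch below counts families under that assumption; the function names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical_unit(unit: str) -> str:
    """Alphabetically smallest string among all rotations of the unit
    and all rotations of its reverse complement."""
    candidates = []
    for s in (unit, reverse_complement(unit)):
        for i in range(len(s)):
            candidates.append(s[i:] + s[:i])
    return min(candidates)


def count_families(unit_len: int) -> int:
    """Number of repeat families of the given unit length, merging
    rotations and reverse complements (assumed definition of a family)."""
    families = set()
    for letters in product("ACGT", repeat=unit_len):
        unit = "".join(letters)
        if unit in (unit + unit)[1:-1]:
            continue  # unit is a repeat of a shorter unit, e.g. AAAAAAA
        families.add(canonical_unit(unit))
    return len(families)


print(count_families(7))  # 1170
```

The arithmetic agrees: of the 4^7 = 16384 heptamers, 16380 are not homopolymers, each has 7 distinct rotations (2340 classes), and because a sequence of odd length cannot be its own reverse complement, the classes pair off into 1170 families.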
NIS elements are sometimes found inside of genes.
A survey of instances where NIS5 was inserted
within genes of Anabaena PCC 7120 or Anabaena
variabilis revealed that in all three cases,
insertion occurred within a HIP1 site (GCGATCGC),
between the third and fourth nucleotides. All
multicopy sequence variants of NIS5 also possess
GCG at one end and ATCGC at the other, indicating
that HIP1 sites may play an essential role in the
mobility of NIS5, either by serving as
recognition sites or by making available free DNA
ends that might arise during recombination. It
has been suggested that HIP1 sites are
recombinogenic (Robinson et al., 1997).
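The geometry of the insertion site can be sketched as follows. The code locates HIP1 sites in a sequence and reports the position between the third and fourth nucleotides, where the insertions described above were found. The function names are illustrative, and the end check assumes GCG at the left end and ATCGC at the right end of the element as written, an orientation the text does not specify.

```python
HIP1 = "GCGATCGC"   # palindromic, so scanning one strand finds every site
CUT_OFFSET = 3      # between the third and fourth nucleotides: GCG | ATCGC


def hip1_insertion_points(seq: str):
    """0-based positions at which an insertion would split a HIP1 site
    into GCG on the left and ATCGC on the right."""
    points = []
    start = seq.find(HIP1)
    while start != -1:
        points.append(start + CUT_OFFSET)
        start = seq.find(HIP1, start + 1)
    return points


def split_halves(seq: str, point: int):
    """The 3 nt to the left and the 5 nt to the right of an insertion point."""
    return seq[point - CUT_OFFSET:point], seq[point:point + len(HIP1) - CUT_OFFSET]


def has_hip1_like_ends(element: str) -> bool:
    """True if the element begins with GCG and ends with ATCGC
    (assumed orientation; the source says only 'one end' and 'the other')."""
    return element.startswith("GCG") and element.endswith("ATCGC")


toy = "TTTGCGATCGCAAA"               # invented sequence with one HIP1 site
points = hip1_insertion_points(toy)
print(points)                        # [6]
print(split_halves(toy, points[0]))  # ('GCG', 'ATCGC')
```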
This work was supported in part by grant
EEC0234104 from the NSF/NIH Bioinformatics and
Bioengineering Summer Institute program.