Searching for structured motifs in the upstream regions of hsp70 genes in Tetrahymena thermophila

Roberto Marangoni, Antonietta La Terza, Nadia Pisanti, Sabrina Barchetta, Cristina Miceli

marangon_at_di.unipi.it
antonietta.laterza_at_unicam.it
pisanti_at_di.unipi.it
sabrina.barchetta_at_unicam.it
cristina.miceli_at_unicam.it

Dipartimento di Informatica, Università di Pisa
Dipartimento di Biologia M.C.A., Università di Camerino

2 - SMILE and the searching strategy used

To study structured motifs we used a software tool called SMILE (Structured Motifs Inference, Localization and Evaluation), which is based on an algorithm introduced by Marsan and Sagot (Marsan, L. and M.-F. Sagot, Algorithms for extracting structured motifs using a suffix tree with application to promoter and regulatory site consensus identification. J. of Comput. Biol. 7, 345-362).
SMILE works with an index (a suffix tree) of the set of sequences instead of working directly with the sequences. It takes as input a set of unaligned biological sequences and a list of parameters. The parameters correspond to the properties that the patterns sought must satisfy.
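To illustrate what searching through an index, rather than through the raw sequences, can look like, here is a minimal sketch. It uses a plain k-mer dictionary as a much simpler stand-in for a suffix tree, so it is not SMILE's data structure; the function name `build_kmer_index` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every k-mer to the (sequence index, start position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


# Toy sequences, invented for illustration only.
toy_sequences = ["TTGAAGATTC", "AGAACGTTCT"]
index = build_kmer_index(toy_sequences, k=3)

# Once the index is built, locating a box is a dictionary lookup, not a scan.
gaa_hits = index["GAA"]
ttc_hits = index["TTC"]
```

A real suffix tree indexes substrings of every length at once, which a fixed-k dictionary like this one cannot do.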
SMILE outputs all motifs in the input sequences that match these properties. The motifs SMILE can handle are complex, as they may be composed of any specified number of parts, or sub-motifs. We call these sub-motifs the boxes of the motif.
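A structured motif made of ordered boxes can be pictured with the following sketch. The `Box` class, the helper names and the example boxes are hypothetical, and the matching is a plain regular-expression scan, not the suffix-tree algorithm SMILE uses.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    sequence: str      # exact sequence of this box
    min_gap: int = 0   # minimum distance to the next box (unused for the last box)
    max_gap: int = 0   # maximum distance to the next box (unused for the last box)


def motif_to_regex(boxes):
    """Build a regex that matches the boxes in order, separated by bounded gaps."""
    parts = []
    for i, box in enumerate(boxes):
        parts.append(re.escape(box.sequence))
        if i < len(boxes) - 1:
            parts.append("[ACGT]{%d,%d}" % (box.min_gap, box.max_gap))
    # The lookahead lets occurrences that overlap each other be reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_occurrences(boxes, sequence):
    """Return (start, matched text) pairs, at most one per start position."""
    return [(m.start(), m.group(1)) for m in motif_to_regex(boxes).finditer(sequence)]


# Hypothetical two-box motif: AC, then 1 to 3 bases, then GT.
example = [Box("AC", 1, 3), Box("GT")]
hits = find_occurrences(example, "ACTGTACAAGT")
```

Here `hits` holds one occurrence starting at position 0 and one starting at position 5, each with a different gap length.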
One assumption is that the occurrences of the boxes of a motif always appear in the same relative order in the sequences. Each box of the structured motif has its own user-defined characteristics; other parameters describe characteristics of the whole motif.

We are mainly interested in the HSE structured motif, whose structure is very particular: it is composed of three boxes, the first and the last of which are identical, while the middle box is the reverse complement of the other two. This strongly suggests a particular conformation of the DNA segment, which may be responsible for the regulatory function. SMILE allows the parameters to be set so that the search focuses on all possible three-box motifs arranged in a general pattern of the type

XYZ_Z'Y'X'_XYZ

where X, Y, Z represent any DNA base and X', Y', Z' represent their complements. We also tried to investigate possible spatial correlations between the patterns found, and we ran searches for the GATA motif in order to assess its statistical relevance.

For the whole motif, our queries required that the motif occur exactly in all the input sequences and that it be composed of all three boxes. For each box we required a length of 3 and a distance to the next box ranging from 2 to 14.

All the motifs extracted according to the specified structural parameters are ranked according to their statistical significance. SMILE offers two ways of performing this evaluation. We used the one that compares the number of occurrences of the motifs found in the original sequences (and their distribution over the input sequences) with their occurrences in another set of related biological sequences that are not supposed to contain the motif. This control set is obtained by a random shuffling of the original sequences that maintains the distribution of fragments of length 3 (this number was chosen because it equals the length of the boxes).
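The search parameters described above (three boxes of length 3, the middle one the reverse complement of the outer two, gaps of 2 to 14, exact occurrence in every input sequence) can be sketched as a brute-force enumeration. This is only an illustration under those stated parameters: the function names and the toy sequences are hypothetical, and SMILE itself works on a suffix tree rather than on regular expressions.

```python
import re
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def hse_type_patterns(box_len=3):
    """Yield (first, middle, last) boxes of the form XYZ, revcomp(XYZ), XYZ."""
    for bases in product("ACGT", repeat=box_len):
        box = "".join(bases)
        yield box, reverse_complement(box), box


def occurs(boxes, sequence, min_gap=2, max_gap=14):
    """True if the boxes occur exactly, in order, with each gap in [min_gap, max_gap]."""
    gap = "[ACGT]{%d,%d}" % (min_gap, max_gap)
    return re.search(gap.join(boxes), sequence) is not None


def motifs_in_all(sequences, min_gap=2, max_gap=14):
    """Keep only the patterns that occur in every input sequence."""
    return [
        "_".join(boxes)
        for boxes in hse_type_patterns()
        if all(occurs(boxes, s, min_gap, max_gap) for s in sequences)
    ]


# Toy upstream sequences, invented for illustration only.
toy_sequences = ["GAACCTTCCCGAA", "AGAAGGGTTCAAAGAAT"]
shared = motifs_in_all(toy_sequences)
```

With box length 3 there are 64 candidate patterns, so exhaustive enumeration is cheap; the statistical evaluation against shuffled sequences is a separate step that this sketch does not cover.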
1 - BIOLOGICAL BACKGROUND

Gene regulation and structured motifs

Structured genetic motifs, functioning as regulatory elements, are short DNA sequences that determine the timing, location and level of gene expression. Although often only 5 to 20 bp in length, they are critical to understanding gene regulation. Experimental procedures for regulatory element discovery, such as electrophoretic mobility shift assays or in vivo analysis such as DNA transformation with reporter genes, are long procedures that typically verify one element at a time. Computational methods have therefore been developed to predict regulatory elements and their locations in a high-throughput manner.
Tetrahymena thermophila and heat shock protein 70 genes

Genes induced by stress are excellent models for identifying new genetic elements involved in the control of gene expression. Our attention is mainly focused on genes of the heat-shock protein family. The expression of the heat shock genes is known to be regulated mainly at the transcriptional level. The inducibility of the heat shock genes in response to various environmental stresses depends on the activation of the heat shock factors (HSF). HSF bind to highly evolutionarily conserved heat shock regulatory elements (HSE), which are composed of at least three adjacent, inverted repeats of the motif 5'-nGAAn-3'.

One inducible hsp70 gene was cloned from Tetrahymena thermophila and its promoter was characterized. It was shown to contain several HSE motifs with canonical and non-canonical sequences, and a new genetic element with repetitive GATA sequences that resembles the element specific for GATA-binding factors (Fig. 1). Electrophoretic mobility shift assays and mutational changes followed by in vivo analysis with a reporter gene revealed that the canonical HSE plays a determinant role in the induction of hsp70 gene transcription and that the repetitive GATA sequences are necessary for hsp70 expression. By searching the entire Tetrahymena genome, which has recently been completely sequenced, other genes of the same family (and also other stress genes) were identified. Their promoter sequences are the data we analyzed using SMILE.
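The canonical HSE described above, three adjacent inverted repeats of nGAAn, can be located with a short scan. The sketch below reads "adjacent" as contiguous 5-bp units in alternating orientation (nGAAn, nTTCn, nGAAn, or the mirror arrangement); that reading, the names used and the toy sequence are assumptions made for illustration.

```python
import re

HEAD = "[ACGT]GAA[ACGT]"   # nGAAn
TAIL = "[ACGT]TTC[ACGT]"   # nTTCn, the reverse complement of nGAAn

# Three contiguous units in alternating orientation, in either starting order.
# The lookahead lets overlapping cores inside a longer array be reported too.
HSE_CORE = re.compile("(?=(%s%s%s|%s%s%s))" % (HEAD, TAIL, HEAD, TAIL, HEAD, TAIL))


def find_hse_cores(sequence):
    """Return (start, matched text) for every three-unit HSE core."""
    return [(m.start(), m.group(1)) for m in HSE_CORE.finditer(sequence.upper())]


# Toy promoter fragment, invented for illustration only.
cores = find_hse_cores("CCAGAACGTTCTAGAAGCC")
```

Non-canonical HSEs, which the promoter also contains, would need a looser pattern than this one.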
3 - RESULTS

a) HSE motif and other similar motifs

The following table summarizes the results of the search for three-box structured motifs in the hsp70 genes of Tetrahymena thermophila.

Motif        Score
ACA_TGT_ACA  1.01
ATG_CAT_ATG  0.70
GTT_AAC_GTT  0.70
ATC_GAT_ATC  0.55
TGA_TCA_TGA  0.44
CTA_TAG_CTA  0.38
TAG_CTA_TAG  0.34
TTG_CAA_TTG  0.25
CAA_TTG_CAA  0.22
AGA_TCT_AGA  0.22
TCT_AGA_TCT  0.21
GAA_TTC_GAA  0.12
TTC_GAA_TTC  0.11
CTT_AAG_CTT  0.10

The score indicates the deviation from randomness: a score > 0 indicates that the pattern is statistically significant. The yellow box highlights the HSE pattern, which has been experimentally proven to be involved in gene regulation. No experimental evidence is available for the other motifs at the moment. Figure 2 shows a very schematic representation of the localization of the most significant motifs found, including the HSE motif. A preliminary correlation analysis has given no indication of possible cooperation of these motifs in gene regulation, but more work is necessary to address this problem.

b) GATA motif

The GATA motif is very frequent in the searched genes and highly repeated along the upstream sequences. This causes a low (but significant) score, and its abundance makes it very difficult to represent in a graph similar to that in Fig. 2. Correlation studies are in progress to investigate the possible association of several GATA boxes in a single functional motif.
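How an abundant motif such as GATA can be compared against control sequences may be sketched as follows. Two substitutions are made here and should be read as assumptions: the controls come from an order-2 Markov model fitted to the input, which preserves the 3-mer distribution only in expectation (SMILE shuffles the original sequences instead), and the z-score is a generic stand-in, since the formula behind SMILE's score is not given here. All names and toy sequences are hypothetical.

```python
import random
import re
from collections import defaultdict
from statistics import mean, pstdev


def count_overlapping(motif, sequence):
    """Count occurrences of motif, including overlapping ones."""
    return len(re.findall("(?=%s)" % re.escape(motif), sequence))


def markov2_control(sequences, rng):
    """Generate one control set from an order-2 Markov model of the input."""
    transitions = defaultdict(list)
    for seq in sequences:
        for i in range(len(seq) - 2):
            transitions[seq[i:i + 2]].append(seq[i + 2])
    control = []
    for seq in sequences:
        out = list(seq[:2])  # start from the original first two bases
        while len(out) < len(seq):
            # Fall back to a uniform base if this context has no known successor.
            choices = transitions.get("".join(out[-2:])) or "ACGT"
            out.append(rng.choice(choices))
        control.append("".join(out))
    return control


def motif_enrichment(motif, sequences, n_controls=200, seed=0):
    """Return (observed count, mean control count, z-score or None)."""
    rng = random.Random(seed)
    observed = sum(count_overlapping(motif, s) for s in sequences)
    control_counts = [
        sum(count_overlapping(motif, s) for s in markov2_control(sequences, rng))
        for _ in range(n_controls)
    ]
    spread = pstdev(control_counts)
    z = None if spread == 0 else (observed - mean(control_counts)) / spread
    return observed, mean(control_counts), z


# Toy upstream sequences, invented for illustration only.
toy_sequences = ["GATAGATATCGATA", "CCGATATT"]
observed, control_mean, z = motif_enrichment("GATA", toy_sequences)
```

On sequences this short the z-score carries no meaning; the point is only the shape of the comparison between observed and control counts.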