PROMoter SCanning/ANalysis tool

Provided by: Hul78

1
  • PROMoter SCanning/ANalysis tool

2
Goal
  • Creating a tool to analyse a set of putative
    promoter sequences and recognize known and
    unknown promoters, with a built-in scoring system

3
Sequences to be PromScAnned
  • Sequences from Sergei Denissov, Molecular Biology
    (NCMLS)
  • Obtained from the cloning of chromatin (U2-OS
    human cells) highly enriched through double
    immunoprecipitation with anti-TBP antibodies

4
Main database: BLAT
  • BLAT: BLAST-Like Alignment Tool
  • Aligns the input sequence to the Human Genome
  • Connected to several databases, such as:
  • mRNAs
  • ESTs
  • RepeatMasker
  • RefSeq
  • GenScan
  • TwinScan
  • UniGene
  • CpG Islands

5
BLAT Human Genome Browser
6
BLAT method (1)
  • Align sequence with BLAT, get alignment info
  • Per BLAT hit, pick up additional info from
    connected databases
  • mRNAs
  • ESTs
  • RepeatMasker
  • CpG Islands
  • RefSeq Genes

7
BLAT method (2)
  • Additional info is gathered for four different
    positions:
  • 1 kb to the left + query itself (close promoters)
  • 1 kb to the right + query itself (close promoters)
  • 20 kb to the left + query itself (distant promoters)
  • 20 kb to the right + query itself (distant promoters)
  • (1 kb and 20 kb can be adjusted through the interface)
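
The four positions can be sketched as coordinate windows around a BLAT hit. This is a minimal illustration, not the tool's actual code: the function name `flank_windows`, the 0-based half-open coordinates, and the choice to include the query itself in every window are all assumptions.

```python
from typing import NamedTuple


class Window(NamedTuple):
    label: str
    start: int
    end: int


def flank_windows(hit_start, hit_end, near=1_000, far=20_000):
    """Return four search windows around a BLAT hit.

    Assumed convention: 0-based, half-open coordinates; each window
    covers the query itself plus one flank. `near` and `far` mirror the
    adjustable 1 kb and 20 kb settings.
    """
    windows = []
    for name, size in (("close", near), ("distant", far)):
        # Left flank plus the query; clip at the chromosome start.
        windows.append(Window(f"{name}-left", max(0, hit_start - size), hit_end))
        # The query plus the right flank (chromosome end is not clipped here).
        windows.append(Window(f"{name}-right", hit_start, hit_end + size))
    return windows


for window in flank_windows(5_000, 5_400):
    print(window)
```

A real implementation would also clip the right-hand windows at the chromosome length, which is not known in this sketch.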
8
mRNAs
  • Genbank human mRNAs are aligned against the
    genome using the BLAT program. When a single mRNA
    aligns in multiple places, the alignment having
    the highest base identity is found. Only
    alignments that have a base identity level within
    1% of the best are kept. Alignments must also
    have at least 95% base identity to be kept.
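
The filtering rule above can be sketched as follows. This is an illustration only: `filter_alignments` is a hypothetical helper, and "within 1 of the best" is read here as one percentage point below the best identity, which is an assumption.

```python
def filter_alignments(alignments, min_identity=95.0, window=1.0):
    """Keep the alignments of one mRNA that pass both identity rules.

    alignments: list of (location, percent_identity) tuples.
    An alignment is kept if its identity is at least `min_identity`
    and no more than `window` percentage points below the best one.
    """
    if not alignments:
        return []
    best = max(identity for _, identity in alignments)
    return [
        (location, identity)
        for location, identity in alignments
        if identity >= min_identity and identity >= best - window
    ]


hits = [("chr1:100", 99.2), ("chr5:900", 98.5), ("chr9:40", 96.0), ("chr2:7", 94.0)]
print(filter_alignments(hits))
```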

9
ESTs
  • This track shows alignments between human
    Expressed Sequence Tags (ESTs) in Genbank and the
    genome.
  • Expressed sequence tags are single read
    (typically approximately 500 base) sequences
    which usually represent fragments of transcribed
    genes. Aligning regions (usually exons) are shown
    as black boxes connected by lines for gaps
    (usually spliced out introns).

10
RepeatMasker
  • Created by Arian Smit's RepeatMasker program,
    which uses the RepBase library of repeats from
    the Genetic Information Research Institute
  • RepBase is a database of repetitive DNA sequence
    elements found in a variety of eukaryotic
    organisms including mammals, fish, insects,
    nematodes, and plants.
  • Repeat classes: SINE, LINE, LTR, DNA, Simple,
    Low Complexity, Satellite, tRNA, other

11
CpG Islands
  • CpG = CG: a C immediately followed by a G
  • Particularly common near transcription start
    sites, and may be associated with promoter
    regions
  • Normally, in vertebrates: CG -> C is methylated
    -> methylated C is deaminated -> TG
  • CpGs are relatively rare, unless there is a
    selective pressure to keep them, or a region is
    not methylated for some reason, perhaps having
    to do with the regulation of gene expression.
  • CpG islands are regions where CpGs are present
    at significantly higher levels than is typical
    for the genome as a whole.
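
One widely used way to quantify "higher levels than typical" is the observed/expected CpG ratio: the number of CG dinucleotides divided by the number expected from the C and G counts alone. The sketch below only illustrates that measure. It is not taken from the slides and is not necessarily how the CpG Islands track was computed; `cpg_stats` is a hypothetical name.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a sequence.

    Expected CpG count = (#C * #G) / length, i.e. what random placement
    of the observed C and G bases would give.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio


print(cpg_stats("CGCGCGCG"))
print(cpg_stats("ATATAT"))
```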

12
RefSeq Genes
  • The RefSeq Genes track shows known protein coding
    genes taken from mRNA reference sequences
    compiled at LocusLink.
  • RefSeq mRNAs are aligned against the genome using
    the BLAT program. When a single mRNA aligns in
    multiple places only the best alignments are
    kept. The alignments must also have at least 98%
    sequence identity to be kept.

13
Scoring Method (1)
  • For each BLAT hit the Score is:
  • Σ (length(mRNA) / distance(mRNA)) × sw
  • + Σ (length(EST) / distance(EST)) × sw
  • + Σ (length(RMSK tRNA) / distance(RMSK tRNA)) × sw
  • + Σ (length(RMSK LTR) / distance(RMSK LTR)) × sw
  • + Σ (length(RMSK rest) / distance(RMSK rest)) × sw
  • + Σ (length(CpG) / distance(CpG)) × sw
  • + Σ (length(RefSeq Genes) / distance(RefSeq
    Genes)) × sw
  • (sw = scoring weight)

14
Scoring Method (2)
  • The scoring weight reflects the reliability of
    the analyzed data: how much evidence does it give
    that the sequence is a promoter?
  • Adjustable through the interface; defaults:
  • mRNAs: 4
  • ESTs: 3
  • RepeatMasker tRNA: 3
  • RepeatMasker LTR: 2
  • RepeatMasker rest: 1
  • CpG Islands: 2
  • RefSeq Genes: 0
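
The scoring method from the last two slides can be sketched as a weighted sum, reading each term as length divided by distance, multiplied by the scoring weight of its data source. The names `DEFAULT_WEIGHTS` and `blat_hit_score` are hypothetical, and treating a distance of 0 (a feature overlapping the hit) as 1 is an assumption made here to avoid division by zero; the slides do not say how that case is handled.

```python
# Default scoring weights as listed on the slide.
DEFAULT_WEIGHTS = {
    "mRNA": 4,
    "EST": 3,
    "RMSK tRNA": 3,
    "RMSK LTR": 2,
    "RMSK rest": 1,
    "CpG": 2,
    "RefSeq": 0,
}


def blat_hit_score(features, weights=DEFAULT_WEIGHTS):
    """Score one BLAT hit.

    features: iterable of (kind, length, distance) tuples, one per
    database feature found near the hit. Longer and closer features
    contribute more; the weight scales each data source.
    """
    score = 0.0
    for kind, length, distance in features:
        # Assumption: clamp distance to 1 so overlapping features do not
        # cause a division by zero.
        score += weights[kind] * length / max(distance, 1)
    return score


features = [("mRNA", 500, 100), ("CpG", 300, 150), ("RefSeq", 2000, 10)]
print(blat_hit_score(features))
```

With the default weights, the RefSeq feature contributes nothing, because its weight is 0.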

15
DBTSS (1)
  • Additional info from DBTSS: DataBase of
    Transcriptional Start Sites
  • Most cDNAs lack precise information about their
    5' termini.
  • Oligo-capping method -> full-length cDNAs.
  • Of about 284,687 5' end sequences obtained,
    155,304 correspond to cDNA sequences of known
    genes (8,996 genes) and are presented in the
    DBTSS

16
DBTSS (2)
  • Each sequence is mapped onto the human draft
    genome sequence to identify its transcriptional
    start site
  • Overall Score = BLAT Score + DBTSS Score

17
PromScan Query Interface
  • http://www.cmbi.kun.nl/timhulse/promscan

18
Output (1) Header
Excel format; plain text (tab-separated) is also
possible
19
Output (2) Sequence Report
20
Output (3) Overall Report
Multiple hits are sorted from high score to low
score; the higher the score, the more likely it is
that the input sequence is a promoter.
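
A sorted, tab-separated overall report could be produced along these lines. This is a sketch under assumptions: the overall score is taken to be the sum of the BLAT score and the DBTSS score, and the function name `overall_report` and the column names are hypothetical, not the tool's real output layout.

```python
import csv
import io


def overall_report(hits):
    """Return a tab-separated report with hits sorted by overall score.

    hits: list of dicts with keys 'query', 'location', 'blat_score'
    and 'dbtss_score' (hypothetical field names).
    """
    rows = []
    for hit in hits:
        # Assumed combination rule: overall = BLAT score + DBTSS score.
        total = hit["blat_score"] + hit["dbtss_score"]
        rows.append((hit["query"], hit["location"], total))
    # Highest score first: the most promoter-like hit leads the report.
    rows.sort(key=lambda row: row[2], reverse=True)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(("query", "location", "overall_score"))
    writer.writerows(rows)
    return out.getvalue()


hits = [
    {"query": "seq1", "location": "chr1:100-500", "blat_score": 4.0, "dbtss_score": 1.0},
    {"query": "seq1", "location": "chr7:9-409", "blat_score": 20.0, "dbtss_score": 0.0},
]
print(overall_report(hits))
```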
21
Suggestions please!