de novo Analysis of Sequences - PowerPoint PPT Presentation

Provided by: benjam63
1
de novo Analysis of Sequences
Slides by Jane Loveland and Dustin Schones
2
Introduction
  • A large number of sequence analysis tools are
    available on the web
  • Sequences submitted to public databases will
    probably be annotated and incorporated in Ensembl
    and NCBI databases.

HOWEVER
  • Sequences may also be analysed and annotated
    manually.
  • Some of the tools available are not used by
    Ensembl, NCBI etc. and may provide you with
    useful additional information

3
Lab work
  • de novo cDNA analysis
  • de novo genomic sequence analysis
  • de novo protein analysis
Confirms/disagrees with in silico predictions
Development of programs
Predictions
Sequence analysis tools
4
de novo analysis of sequence
  • BLAST for similarity searching
  • BLAT for rapid genome searching
  • PSI-BLAST to pull out similar proteins
  • ORF finder to highlight putative protein products
    of cDNA
  • BLASTP link from ORF finder to investigate what
    these potential protein products might be.
  • SPIDEY to align cDNA to genomic DNA
  • CLUSTALW to align similar sequences
  • View and edit alignments in JALVIEW and GENEDOC
    to produce a coloured and shaded alignment
  • InterProScan to search for protein domains

5
sequence alignment
  • sequence analysis → sequence alignment
  • what
  • why
  • similar sequence
  • infer homology
  • infer function

sequence → structure → function
6
pairwise alignments
multiple sequence alignments
7
Global vs. Local
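The difference between global and local alignment can be seen in how the two classic dynamic programming recurrences score the same pair of sequences. The sketch below is a minimal illustration, assuming a simple match/mismatch/linear-gap scoring scheme (the values +1, -1 and -2 are arbitrary choices, not taken from the slides): Needleman-Wunsch forces an end-to-end (global) alignment, while Smith-Waterman lets any cell restart at 0, so it reports the best-scoring pair of substrings (local).

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch (global): best score aligning a and b end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: b aligned against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)           # first column: a aligned against gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def sw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman (local): best score over all pairs of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets a local alignment start anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, "GGGGACGT" and "ACGTCCCC" share only the block ACGT. The local score picks out that block, and the global score is penalised by the unrelated flanks.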
8
BLAST
Basic Local Alignment Search Tool
  • idea: find high-scoring local alignments between
    the query sequence and the target database
  • assumption: true match alignments are very likely
    to contain within them short, very high-scoring
    matches
  • heuristic theme: search quickly for homologous
    regions, and only then do slow/exact
    alignments

9
BLAST family
10
BLAST Steps
  • For each word of length W in the query,
    generate a list of all possible words of length W
    (its neighborhood) that score at least the
    threshold T against it (scores are determined
    by the scoring matrix)
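The neighborhood step can be sketched by brute force: enumerate every word of length W and keep those scoring at least T against the query word. The code below is a minimal sketch, assuming a DNA alphabet and a toy match/mismatch score (+2/-1, chosen arbitrarily) in place of a real substitution matrix such as BLOSUM62. BLAST itself builds these lists far more efficiently.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA alphabet for illustration


def score(a: str, b: str, match: int = 2, mismatch: int = -1) -> int:
    """Toy substitution score; a stand-in for a real scoring matrix."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def neighborhood(word: str, T: int) -> set[str]:
    """All words of the same length that score at least T against `word`."""
    candidates = ("".join(w) for w in product(ALPHABET, repeat=len(word)))
    return {w for w in candidates if score(word, w) >= T}


def query_neighborhoods(query: str, W: int, T: int) -> dict[int, set[str]]:
    """Map each query offset to the neighborhood of the W-long word there."""
    return {i: neighborhood(query[i:i + W], T) for i in range(len(query) - W + 1)}
```

With these toy scores, T = 6 keeps only exact matches of a 3-letter word, while T = 3 also admits every word with one mismatch. Lowering T enlarges the neighborhood, which makes the search more sensitive but slower.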

11
Determine the locations of all common words
between the query and the database (word hits).
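Finding word hits amounts to a lookup table from words to query offsets, followed by a scan of the database sequence. The sketch below is illustrative and makes one simplification: it indexes only the exact query words. In BLAST, each query word's neighborhood words would be added to the same table. The function names are hypothetical.

```python
from collections import defaultdict


def build_lookup(query: str, W: int) -> dict[str, list[int]]:
    """Map each W-long query word to the query offsets where it occurs.
    Simplification: exact words only, no neighborhood words."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - W + 1):
        table[query[i:i + W]].append(i)
    return dict(table)


def word_hits(lookup: dict[str, list[int]], subject: str, W: int) -> list[tuple[int, int]]:
    """Scan every W-long word of the database sequence.
    Returns (query_pos, subject_pos) pairs."""
    hits = []
    for j in range(len(subject) - W + 1):
        for i in lookup.get(subject[j:j + W], []):
            hits.append((i, j))
    return hits
```

For example, `word_hits(build_lookup("ACGTAC", 3), "TTACGT", 3)` finds the shared words TAC, ACG and CGT. Each resulting pair becomes a seed for the extension step.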
12
(No Transcript)
13
BLAST Steps
  • use dynamic programming to extend hits until
    the score drops by a value X below the best score
    seen; this is expensive!! (about 90% of the time)
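The X-drop stopping rule can be illustrated with a simpler ungapped extension. Note that this is a substitute for the dynamic programming (gapped) extension named on the slide, not a replacement for it. The sketch extends a word hit to the right and to the left, keeps track of the best running score, and stops once the score has fallen X or more below that best. The +1/-1 scores are assumptions made for illustration.

```python
def xdrop_extend(q: str, s: str, i: int, j: int, W: int, X: int,
                 match: int = 1, mismatch: int = -1) -> tuple[int, int, int]:
    """Ungapped X-drop extension of the word hit q[i:i+W] / s[j:j+W].
    Returns (best_score, left_extension, right_extension)."""
    def sc(a: str, b: str) -> int:
        return match if a == b else mismatch

    seed = sum(sc(q[i + k], s[j + k]) for k in range(W))

    # Extend to the right of the word.
    best_r, run, right, k = 0, 0, 0, 0
    while i + W + k < len(q) and j + W + k < len(s):
        run += sc(q[i + W + k], s[j + W + k])
        k += 1
        if run > best_r:
            best_r, right = run, k
        elif best_r - run >= X:  # fell X or more below the best: stop
            break

    # Extend to the left of the word.
    best_l, run, left, k = 0, 0, 0, 1
    while i - k >= 0 and j - k >= 0:
        run += sc(q[i - k], s[j - k])
        if run > best_l:
            best_l, left = run, k
        elif best_l - run >= X:
            break
        k += 1

    return seed + best_r + best_l, left, right
```

The extension is trimmed back to the best-scoring point on each side. This explains why a larger X lets the extension cross short runs of mismatches, at the cost of more computation.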

14
Evaluates the statistical significance of
extended hits and reports only those above the
determined threshold.
15
(No Transcript)
16
BLAST statistical evaluation
  • for local, ungapped alignments
  • m: size of the query; n: size of the database
  • E: expected number of HSPs (high-scoring segment
    pairs) with score at least S
  • p: probability of finding at least one HSP with
    score at least S
  • good tutorial at
    http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
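For local, ungapped alignments, the standard Karlin-Altschul results connect these quantities: E = K·m·n·e^(−λS), and, treating chance HSPs as Poisson-distributed, p = 1 − e^(−E). The sketch below simply evaluates these two formulas. K and λ depend on the scoring system, so the values used in the usage example are arbitrary placeholders, not real BLAST parameters.

```python
import math


def evalue(S: float, m: int, n: int, K: float, lam: float) -> float:
    """Expected number of HSPs scoring at least S by chance: E = K m n exp(-lambda S)."""
    return K * m * n * math.exp(-lam * S)


def pvalue(E: float) -> float:
    """Probability of at least one such HSP (Poisson): p = 1 - exp(-E)."""
    return 1.0 - math.exp(-E)
```

With placeholder values K = 0.1 and λ = ln 2, a query of length 100 searched against a database of length 1000 with S = 10 gives E = 10000 / 1024 ≈ 9.77. Each extra unit of score divides E by e^λ. When E is small, p ≈ E.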

17
BLAT
  • Good for aligning mRNAs and ESTs to a genome
  • fast
  • aligns whole mRNA, not just exons
  • handles introns and splice-sites

18
BLAT
  • Steps for cDNA alignment
  • 1. break the cDNA into n-base chunks
  • 2. use an index to find regions in the genome
    similar to each chunk of cDNA
  • 3. do a detailed alignment between each genome
    region and cDNA chunk
  • 4. use dynamic programming to stitch together the
    detailed alignments of the chunks into an alignment
    of the whole

19
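  • genome: cacaattatcacgaccgc (K = 3 here; K = 8-13
    for a real genome)
  • K-mers: cac aat tat cac gac cgc
    (positions 0 3 6 9 12 15)
  • cDNA: aattctcac
  • 3-mers: aat att ttc tct ctc tca cac
    (positions 0 1 2 3 4 5 6)
  • hits (k-mer: cDNA position, genome position →
    diagonal = cDNA position − genome position)
    aat: 0, 3 → -3
    cac: 6, 0 → 6
    cac: 6, 9 → -3
  • clump (the two hits on diagonal -3):
    cacAATtatCACgaccgc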

example from Jim Kent
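The worked example above can be reproduced in a few lines. The sketch indexes the genome by non-overlapping K-mers, looks up every overlapping K-mer of the cDNA, and groups hits by diagonal. One simplification applies: hits are clumped only when their diagonals match exactly, whereas BLAT itself also merges hits on nearby diagonals. All function names are hypothetical.

```python
from collections import defaultdict


def index_genome(genome: str, k: int) -> dict[str, list[int]]:
    """Index non-overlapping k-mers at genome positions 0, k, 2k, ..."""
    idx: dict[str, list[int]] = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        idx[genome[pos:pos + k]].append(pos)
    return idx


def find_hits(cdna: str, idx: dict[str, list[int]], k: int) -> list[tuple[str, int, int, int]]:
    """Look up every overlapping k-mer of the cDNA.
    Returns (kmer, cdna_pos, genome_pos, diagonal), where diagonal = cdna_pos - genome_pos."""
    hits = []
    for qpos in range(len(cdna) - k + 1):
        kmer = cdna[qpos:qpos + k]
        for gpos in idx.get(kmer, []):
            hits.append((kmer, qpos, gpos, qpos - gpos))
    return hits


def clump(hits: list[tuple[str, int, int, int]]) -> dict[int, list[tuple[str, int, int, int]]]:
    """Group hits that share a diagonal (exact-diagonal simplification)."""
    groups: dict[int, list[tuple[str, int, int, int]]] = defaultdict(list)
    for h in hits:
        groups[h[3]].append(h)
    return dict(groups)


idx = index_genome("cacaattatcacgaccgc", 3)
hits = find_hits("aattctcac", idx, 3)
```

Running this produces the three hits listed on the slide. The clump on diagonal -3 covers genome positions 3 and 9, which are the upper-case AAT and CAC in cacAATtatCACgaccgc.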
20
PSI-BLAST
Position Specific Iterated-BLAST
  • database searches using position-specific scoring
    matrices are more powerful than searches using a
    single sequence
  • STEPS
  • collect all DB sequences that align with an
    E-value < T
  • align these to make a position-specific scoring
    matrix
  • use the scoring matrix to search for new hits
  • iterate
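The matrix-building and searching steps can be sketched as log-odds scores computed per alignment column, with the resulting matrix then slid along a sequence. This is a simplified version, assuming a uniform background, a flat pseudocount and gap characters that are simply ignored. Real PSI-BLAST also weights sequences and derives pseudocounts from a substitution matrix. The function names are hypothetical.

```python
import math
from collections import Counter


def build_pssm(alignment: list[str], alphabet: str,
               pseudocount: float = 1.0) -> list[dict[str, float]]:
    """One dict per column: log2(observed frequency / background frequency).
    Assumes equal-length aligned rows and a uniform background."""
    background = 1.0 / len(alphabet)
    pssm = []
    for col in zip(*alignment):
        counts = Counter(c for c in col if c != "-")  # ignore gap characters
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({a: math.log2((counts[a] + pseudocount) / total / background)
                     for a in alphabet})
    return pssm


def best_hit(pssm: list[dict[str, float]], seq: str) -> tuple[float, int]:
    """Slide the PSSM along seq and return (best_score, offset)."""
    L = len(pssm)
    return max((sum(col[c] for col, c in zip(pssm, seq[i:i + L])), i)
               for i in range(len(seq) - L + 1))
```

In a PSI-BLAST-style loop, new sequences that score well would be added to the alignment and the matrix rebuilt, which corresponds to the "iterate" step. Conserved columns produce large positive scores, so hits that match the conserved positions rank highest.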

21
PSI-Blast
22
ORF-finder
  • graphical analysis tool which finds all open
    reading frames in a sequence
  • looks for start and stop codons
  • assumes an upstream start and a downstream stop
    if the ORF is at least 100 amino acids long
  • ORFs can be selected to view as DNA sequence or
    amino acid sequence
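The core of an ORF scan is a walk through each reading frame that pairs a start codon with the next in-frame stop codon. The sketch below is minimal and rests on several assumptions. It scans only the three forward frames (a full ORF finder also scans the reverse complement), it treats ATG as the only start codon, and it reports only complete ATG...stop ORFs, without the open-ended ORFs at the sequence edges. The 100-amino-acid cutoff from the slide is exposed as the `min_aa` parameter.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for complete ATG...stop ORFs on the forward strand.
    `end` is exclusive and includes the stop codon; `min_aa` counts codons before the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens an ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_aa:
                    orfs.append((frame, start, i + 3))
                start = None                   # look for the next ATG
    return orfs
```

For instance, `find_orfs("CCATGAAATTTTAGCC", min_aa=3)` finds ATG AAA TTT TAG in frame 2. That ORF would translate to the 3-residue peptide MKF, so it falls below the default 100-residue cutoff.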

23
BLAST (Basic Local Alignment Search Tool) is a
set of similarity search programs designed to
explore all of the available sequence databases
regardless of whether the query is protein or
DNA. The BLAST programs have been designed for
speed, with a minimal sacrifice of sensitivity to
distant sequence relationships. The concept of
BLAST is shown below:
Database of sequences: seq1, seq2, ..., seq11, etc.
Sequence of interest:
Query   133 agcagccgtttcgactttggcattcggtaccgg
Subject 232 agcagccgtttcgactttggcattcggtaccgg
BLAST queries can be run against publicly available
databases, within defined data sets, or from the
command line against user-defined data sets.
24
PSI-BLAST Position Specific Iterative Blast
  • Pulls out similar proteins, creates an alignment
    of these proteins, and then produces a Position
    Specific Scoring Matrix (PSSM)
  • Blast parameters
  • A BLAST search is then performed again, looking
    for proteins which are similar at highly
    conserved regions of the PSSM.
  • Several iterations can be performed.

25
  • Graphical analysis tool which finds all ORFs in
    a sequence
  • Looks for start and stop codons
  • Assumes upstream start and downstream stop when
    the ORF is 100 aa or over
  • Graphical overview of all ORFs, which can be
    selected to view the DNA sequence and the
    corresponding aa sequence
  • Integral BLASTP
26
  • Aligns a set of spliced nucleotide sequences
    (ESTs, cDNAs or mRNAs) to an unspliced genomic
    DNA sequence, inserting introns of arbitrary
    length when needed.
  • Aligns the sequences against the genomic sequence
    using a high-stringency BLAST
  • Sorts the blast output by score and then uses
    splice matrices to assign intron/exon boundaries
  • Outputs a list of the exons and introns it has
    found, the alignments and a protein translation
    for each exon
  • May be used for interspecies alignments by
    selecting divergent species
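Once exons have been placed on the genomic sequence, the introns are the gaps between consecutive exons, and each intron's ends can be checked against splice-site signals. The sketch below swaps in a much simpler technique than Spidey's splice matrices: it only tests for the canonical GT...AG dinucleotides. It assumes that exons are given as half-open genomic intervals, and the function names are hypothetical.

```python
def introns_from_exons(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Exons as half-open genomic intervals; introns are the gaps between consecutive exons."""
    exons = sorted(exons)
    return [(end1, start2)
            for (_, end1), (start2, _) in zip(exons, exons[1:])
            if start2 > end1]


def canonical_splice(genome: str, intron: tuple[int, int]) -> bool:
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    s, e = intron
    return genome[s:s + 2] == "GT" and genome[e - 2:e] == "AG"
```

For a genome "AAAAGTCCCCAGTTTT" with exons (0, 4) and (12, 16), the single intron (4, 12) reads GTCCCCAG and passes the GT-AG check. The spliced transcript is AAAATTTT.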

27
ClustalW: DNA and protein alignments
  • Copy and paste sequences
  • Alignment may be viewed and edited in Jalview
28
(No Transcript)
29
GeneDoc Alignment Viewer and Editor
30
  • Integrated documentation resource for protein
    domains, families and sites
  • Integrated view of databases
  • Intuitive interface for text and sequence
    searches

31
Summary: de novo analysis of sequence
  • Similarity searching: BLAST, BLAT and PSI-BLAST
  • Find possible ORFs: ORF finder, BLASTP
  • Align cDNA to genomic DNA: Spidey and BLAT
  • Align similar sequences: ClustalW, Jalview, GeneDoc
  • Find protein domains: InterProScan