Title: Practice retrieving data and running standalone BLAST
1. Practice retrieving data and running standalone BLAST.
Step 1. Identify genes in the ABA biosynthesis pathway from the Arabidopsis Cyc database: http://www.arabidopsis.org/biocyc/index.jsp
Step 2. Identify the subject databases: Vitis vinifera (nucleotide) and Solanum pennellii (EST).
3. Query: select "Pathway by name", enter "Abscisic Acid", then Submit.
7. Now what?
8. Filter for unique sequences (Excel: Data, Filter, Advanced Filter).
9. Notepad: Edit, Line Operations, Join Lines. Then Search, Replace: replace each space with " OR " (space, OR, space). Paste the result into the Entrez Nucleotide search.
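The Excel and Notepad steps above can also be done on the command line. A minimal sketch, assuming the AraCyc gene IDs sit one per line in a plain-text file (the file name `ids.txt` and the function name are hypothetical):

```shell
#!/bin/sh
# Sketch: turn a list of gene IDs (one per line, duplicates allowed)
# into a single "ID1 OR ID2 OR ..." string for the Entrez search box.
# Reads the IDs from standard input.
build_or_query() {
    # sort -u       : keep one copy of each ID (the "Advanced Filter" step)
    # paste -sd' ' -: join all lines into one line, separated by spaces
    # sed           : replace each space with " OR " (the Notepad replace step)
    sort -u | paste -sd' ' - | sed 's/ / OR /g'
}

# Usage (ids.txt is a placeholder name):
#   build_or_query < ids.txt
```

This assumes the IDs themselves contain no spaces; the output can be pasted into the search box exactly as the Notepad result would be.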
11. Perl:
chomp;
next if /^\s/;   (skip if there is a space at the start of the line)
next if /^Gene/;   (skip if the line starts with "Gene")
my @temp = split /\t/;   (the data set is tab delimited)
$hash{$temp[0]} = 1;   (unique sequence ID; 0 is the first element of the array)
Then invoke BioPerl to query NCBI with the search string TAIRAT AND complete cds, where AT stands for the unique accession numbers from AraCyc and "complete cds" eliminates genomic sequence (e.g. the complete Ath chromosome 4). See the complete script on the class site.
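The filtering logic on this slide can be tried without Perl. The sketch below is an awk stand-in for the Perl lines only; it is not the class script and it does not call BioPerl or NCBI. It assumes a tab-delimited AraCyc export whose first column is the gene ID, and the function name is hypothetical:

```shell
#!/bin/sh
# Sketch: print each unique first-column ID from a tab-delimited file.
# Lines that start with whitespace or with "Gene" (the header) are skipped.
# Reads from standard input.
unique_ids() {
    awk -F '\t' '
        /^[[:space:]]/ { next }      # skip lines that start with a space or tab
        /^Gene/        { next }      # skip the header line
        $1 == ""       { next }      # skip lines with an empty first field
        !seen[$1]++    { print $1 }  # print an ID the first time it appears
    '
}

# Usage (the file name is a placeholder):
#   unique_ids < aracyc_export.txt
```

The `seen` array plays the same role as the Perl hash: an ID is a key, so a second occurrence of the same ID adds nothing new.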
13. Do we want this much sequence?
14. Use the push pin to highlight all boxes for mRNA (22 sequences) so we don't get chromosome 4 genomic sequences.
18. Try: use Unix to verify that the file contains all the sequences.
Q: What command would you use?
A: grep -c ">" filename
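Every FASTA record begins with a header line that starts with `>`, so counting those lines gives the number of sequences in the downloaded file. A small sketch (the function name and the file name in the usage line are hypothetical). The `>` must be quoted so the shell does not treat it as output redirection, and anchoring it with `^` counts only lines that begin with `>`:

```shell
#!/bin/sh
# Sketch: count the sequences in a FASTA file by counting header lines.
count_fasta_records() {
    # $1: path to a FASTA file
    grep -c '^>' "$1"
}

# Usage (placeholder file name):
#   count_fasta_records aba_genes.fasta
```

For the mRNA selection described earlier, the count would be expected to match the 22 sequences that were highlighted.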
21. (lycopersicum[ORGN] AND EST) AND "Solanum pennellii"[porgn:__txid28526]
26. Try: use Unix to verify that the file contains all the sequences.
27. Nucleotide: Vitis[ORGN] AND EST
29. Note the syntax of the Entrez search invoked by the organism tree link.
30. For class, I recommend downloading the smaller Nucleotide data set.
31. Try: use Unix to verify that the file contains all the sequences.
32. Now what? Which file needs to be formatted for BLAST (formatdb)? Which file will be the query file? What is the syntax for the BLAST command (including the PATH)?
34. Formatdb:
/path/formatdb -i /path/filename -p F
Run nucleotide BLAST (blastn):
/path/blastall -p blastn -d /path/filename -i /path/filename -o filename -e 0.01
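To tie the last slides together: the downloaded Vitis or Solanum file is the subject database, so it is the file that goes through formatdb, and the Arabidopsis ABA gene file is presumably the query. The sketch below only prints the two commands, so the syntax can be checked before anything is run. The function name, the directory, and the file names are hypothetical placeholders; the flags follow the slide.

```shell
#!/bin/sh
# Sketch: print (do not run) the formatdb and blastall commands.
# Arguments:
#   $1  directory holding the BLAST programs
#   $2  subject database FASTA file (nucleotide)
#   $3  query FASTA file
#   $4  output file
show_blast_commands() {
    # -p F tells formatdb the input is nucleotide, not protein
    echo "$1/formatdb -i $2 -p F"
    # -e 0.01 sets the E-value cutoff for reported hits
    echo "$1/blastall -p blastn -d $2 -i $3 -o $4 -e 0.01"
}

# Usage (all names are placeholders):
#   show_blast_commands /path/to/blast/bin vitis_est.fasta aba_genes.fasta aba_vs_vitis.out
```

Printing first is a deliberate choice for a class setting: a wrong path or a swapped database and query file is easier to spot in the echoed command than in a BLAST error message.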