Title: BioPerl
1BioPerl
- An Introduction to Perl by Seung-Yeop Lee
- XS extension by Sen Zhang
- BioPerl Introduction by Hairong Zhao
- BioPerl Script Examples by Tiequan Zhang
2Part I. An Introduction to Perl
3What is Perl?
- Perl is an interpreted programming language that resembles both a real programming language and a shell.
- A language for easily manipulating text, files, and processes.
- Provides a more concise and readable way to do jobs formerly accomplished using C or shells.
- Perl stands for Practical Extraction and Report Language.
- Author: Larry Wall (first released in 1987)
4Why use Perl?
- Easy to use
- Basic syntax is C-like
- Type-friendly (no need for explicit casting)
- Lazy memory management
- A small amount of code goes a long way
- Fast
- Perl has numerous built-in optimization features which make it run faster than other scripting languages.
- Portability
- One script version runs everywhere (unmodified).
5Why use Perl?
- Efficiency
- For programs that perform the same task (C and Perl), even a skilled C programmer would have to work harder to write code that
- Runs as fast as Perl code
- Is represented by fewer lines of code
- Correctness
- Perl fully parses and pre-compiles the script before execution.
- This efficiently eliminates the potential for runtime SYNTAX errors.
- Free to use
- Comes with source code
6Hello, world!
#!/usr/local/bin/perl
print "Hello, world\n";
7Basic Program Flow
- No main function
- Statements executed from start to end of file.
- Execution continues until
- End of file is reached.
- exit(int) is called.
- Fatal error occurs.
8Variables
- Data of any type may be stored within three basic types of variables
- Scalar
- List
- Associative array (hash table)
- Variables are always preceded by a dereferencing symbol.
- $ - Scalar variables
- @ - List variables
- % - Associative array variables
9Variables
- Notice that we did NOT have to
- Declare the variable before using it
- Define the variable's data type
- Allocate memory for new data values
10Scalar variables
- References to scalar values always begin with $ in both assignments and accesses
- For scalars
- $x = 1;
- $x = "Hello World!";
- $x = $y;
- For scalar arrays
- $a[1] = 0;
- $a[1] = $b[1];
11List variables
- Lists are prefaced by an @ symbol
- @count = (1, 2, 3, 4, 5);
- @count = ("apple", "bat", "cat");
- @count2 = @count;
- A list is simply an array of scalar values.
- Integer indexes can be used to reference elements of a list.
- To print an element of an array, do
- print $count[2];
12Associative Array variables
- Associative array variables are denoted by the % dereferencing symbol.
- Associative array variables are simply hash tables containing scalar values
- Example
- $fred{"a"} = "aaa";
- $fred{"b"} = "bbb";
- $fred{6} = "cc";
- $fred{1} = 2;
- To do this in one step
- %fred = ("a", "aaa", "b", "bbb", 6, "cc", 1, 2);
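The three variable types can be tried out together in a few lines. This is a minimal sketch: the variable names mirror the slides, while `use strict` and the `my` declarations are additions that modern Perl style assumes.

```perl
use strict;
use warnings;

# A scalar holds a single value (a number or a string).
my $x = "Hello World!";

# A list (array) holds an ordered sequence of scalars, indexed from 0.
my @count  = ("apple", "bat", "cat");
my $second = $count[1];      # a single element is a scalar, so it uses $

# An associative array (hash) maps keys to scalar values.
my %fred  = ("a", "aaa", "b", "bbb", 6, "cc", 1, 2);
my $value = $fred{"b"};      # a single hash element also uses $

print "$x\n";
print "second element: $second\n";
print "fred{b} = $value\n";
print "number of keys: ", scalar(keys %fred), "\n";
```

Note that the sigil follows the value being accessed, not the container: `$count[1]` and `$fred{"b"}` are both scalars.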
13Statements Input/Output
- Statements
- Contains all the usual if, for, while, and more
- Input/Output
- Any variable not starting with $, @ or % is assumed to be a filehandle.
- There are several predefined filehandles, including STDIN, STDOUT and STDERR.
14Subroutines
- We can reuse a segment of Perl code by placing it within a subroutine.
- The subroutine is defined using the sub keyword and a name.
- The subroutine body is defined by placing code statements within the { } code block symbols.
- sub MySubroutine {
- # Perl code goes here.
- }
15Subroutine call
- To call a subroutine, prepend the name with the & symbol
- &MySubroutine;
- Subroutines may be recursive (call themselves).
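Definition, call, and recursion can be seen together in one short example. The subroutine name `factorial` is purely illustrative; arguments arrive in the special array `@_`.

```perl
use strict;
use warnings;

# Arguments are passed to a subroutine in the special array @_.
sub factorial {
    my ($n) = @_;
    return 1 if $n <= 1;               # base case ends the recursion
    return $n * factorial($n - 1);     # the subroutine calls itself
}

my $with_ampersand    = &factorial(5); # call with the & prefix
my $without_ampersand = factorial(5);  # in Perl 5 the & is optional here

print "5! = $with_ampersand\n";
```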
16Pattern Matching
- Perl lets you compare a regular expression pattern against a target string to test for a possible match.
- The outcome of the test is a boolean result (TRUE or FALSE).
- The basic syntax of a pattern match is
- $myScalar =~ /PATTERN/
- Does $myScalar contain PATTERN?
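A small sketch of the match operator applied to a DNA string. The sequence and the restriction-site patterns are illustrative values, not taken from the slides.

```perl
use strict;
use warnings;

my $dna = "ATGGAATTCCGTTAA";

# =~ binds the string to the pattern; the result is true or false.
my $has_ecori = ($dna =~ /GAATTC/) ? 1 : 0;   # EcoRI recognition site
my $has_bamhi = ($dna =~ /GGATCC/) ? 1 : 0;   # BamHI recognition site

# Parentheses capture the matched text into $1.
my $start_codon = "";
if ($dna =~ /^(ATG)/) {
    $start_codon = $1;
}

print "EcoRI site present: $has_ecori\n";
print "BamHI site present: $has_bamhi\n";
print "start codon: $start_codon\n";
```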
17Functions
- Perl provides a rich set of built-in functions to help you perform common tasks.
- Several categories of useful built-in functions include
- Arithmetic functions (sqrt, sin, ...)
- List functions (push, chop, ...)
- String functions (length, substr, ...)
- Existence functions (defined, undef)
18Perl 5
- Introduces new features
- A new data type: the reference
- A new localization: the my keyword
- Tools to allow object oriented programming in Perl
- New shortcuts like qw and =>
- An object oriented library system focused around Modules
19References
- A reference is a scalar value which points to
any variable.
20Creating References
- References to variables are created by using the backslash (\) operator.
- $name = "bio perl";
- $reference = \$name;
- $array_reference = \@array_name;
- $hash_reference = \%hash_name;
- $subroutine_ref = \&sub_name;
21Dereferencing a Reference
- Use an extra $ and @ for scalars and arrays, and -> for hashes.
- print "$$scalar_reference\n",
- "@$array_reference\n",
- "$hash_reference->{'name'}\n";
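Creating and dereferencing each kind of reference can be sketched as follows. All variable names and the `greet` subroutine are illustrative.

```perl
use strict;
use warnings;

my $name   = "bio perl";
my @bases  = ("A", "C", "G", "T");
my %client = ("name", "Michael");
sub greet { return "hello " . $_[0]; }

# The backslash operator creates a reference to each kind of thing.
my $scalar_ref = \$name;
my $array_ref  = \@bases;
my $hash_ref   = \%client;
my $code_ref   = \&greet;

my $copy     = $$scalar_ref;          # extra $ dereferences a scalar
my @all      = @$array_ref;           # extra @ dereferences an array
my $third    = $array_ref->[2];       # -> reaches one array element
my $who      = $hash_ref->{name};     # -> reaches one hash value
my $greeting = $code_ref->("world");  # -> calls the subroutine

# A reference points at the original, so writing through it changes $name.
$$scalar_ref = "BioPerl";

print "$copy, $third, $who, $greeting, $name\n";
```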
22Variable Localization
- The local keyword is used to limit the scope of a variable to within its enclosing brackets.
- The variable is visible not only from within the enclosing brackets but also in all subroutines called within those brackets.
$a = 1;
sub mySub {
    local $a = 2;
    mySub1($a);
}
sub mySub1 {
    print "a is $a\n";
}
23Variable Localization contd
- The my keyword hides the variable from the outside world completely.
- Totally hidden
$a = 1;
sub mySub {
    my $a = 2;
    mySub1($a);
}
sub mySub1 {
    print "a is $a\n";
}
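The difference between local and my can be checked directly. This sketch uses a package variable declared with `our` (needed under `use strict`); the names `$x`, `show_x`, `with_local` and `with_my` are illustrative.

```perl
use strict;
use warnings;

our $x = 1;                   # package (global) variable

sub show_x { return $x; }     # always reads the package variable $x

sub with_local {
    local $x = 2;             # temporary value, seen by called subroutines
    return show_x();
}

sub with_my {
    my $x = 2;                # new lexical variable, invisible to show_x
    return show_x();
}

my $seen_with_local = with_local();
my $seen_with_my    = with_my();
my $afterwards      = $x;     # the local value is restored on exit

print "local: $seen_with_local, my: $seen_with_my, after: $afterwards\n";
```

The called subroutine sees the value 2 only in the local case; with my it still sees the outer value 1.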
24Object Oriented Programming in Perl (1)
- Defining a class
- A class is simply a package with subroutines that
function as methods.
#!/usr/local/bin/perl
package Cat;
sub new { ... }
sub meow { ... }
25Object Oriented Programming in Perl (2)
- Perl Object
- To instantiate an object from a class, call the class's new method.
$new_object = new ClassName;
- Using Method
- To use the methods of an object, use the -> operator.
$cat->meow();
26Object Oriented Programming in Perl (3)
- Inheritance
- Declare a class array called @ISA.
- This array stores the name(s) of the parent class(es) of the new species.
package NorthAmericanCat;
@NorthAmericanCat::ISA = ("Cat");
sub new { ... }
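A complete, runnable version of the Cat classes might look like the sketch below. The method bodies, the `name` attribute and the `habitat` method are assumptions made for illustration; the slides only give the class and method names.

```perl
use strict;
use warnings;

package Cat;

sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} || "cat" };
    return bless $self, $class;      # bless ties the hash to the class
}

sub meow {
    my ($self) = @_;
    return "$self->{name} says meow";
}

package NorthAmericanCat;

our @ISA = ("Cat");                  # inherit new() and meow() from Cat

sub habitat { return "North America"; }

package main;

my $cat   = Cat->new(name => "Tom");
my $nacat = NorthAmericanCat->new(name => "Bob");

print $cat->meow(), "\n";
print $nacat->meow(), " in ", $nacat->habitat(), "\n";
```

Because new() blesses into whatever class name it was called with, the inherited constructor produces a NorthAmericanCat object without being rewritten.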
27Miscellaneous Constructs
- qw
- The qw keyword is used to bypass the quote and comma characters in list array definitions.
@name = ("Tom", "Mary", "Michael");
@name = qw(Tom Mary Michael);    # equivalent
28Miscellaneous Constructs
- =>
- The => operator is used to make hash definitions more readable.
%client = ("name" => "Michael", "phone" => "123-3456", "email" => 'mich@nj.net');
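Both shortcuts are pure notation, which a short comparison makes clear. The sketch builds the same list and the same hash twice, once with each spelling; the variable names are illustrative.

```perl
use strict;
use warnings;

# qw splits on whitespace and quotes each word.
my @quoted = ("Tom", "Mary", "Michael");
my @name   = qw(Tom Mary Michael);

# => behaves like a comma that also quotes the bare word on its left.
my %plain  = ("name", "Michael", "phone", "123-3456", "email", 'mich@nj.net');
my %client = (name => "Michael", phone => "123-3456", email => 'mich@nj.net');

print "same list\n" if "@quoted" eq "@name";
print "$_ => $client{$_}\n" for sort keys %client;
```

The e-mail address is single-quoted so that Perl does not try to interpolate an array named @nj.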
29Perl Modules
- A Perl module is a reusable package defined in a library file whose name is the same as the name of the package.
- Similar to a C link library or a C++ class
- package Foo;
- sub bar { print "Hello $_[0]\n" }
- sub blat { print "World $_[0]\n" }
- 1;
30Names
- Each Perl module has a unique name.
- To minimize name space collisions, Perl provides a hierarchical name space for modules.
- Components of a module name are separated by double colons (::).
- For example,
- Math::Complex
- Math::Approx
- String::BitCount
- String::Approx
31Module files
- Each module is contained in a single file.
- Module files are stored in a subdirectory hierarchy that parallels the module name hierarchy.
- All module files have an extension of .pm.
Module            Is stored in
Config            Config.pm
Math::Complex     Math/Complex.pm
String::Approx    String/Approx.pm
32Module libraries
- The Perl interpreter has a list of directories in which it searches for modules.
- Global array @INC
- > perl -V
- @INC:
- /usr/local/lib/perl5/5.00503/sun4-solaris
- /usr/local/lib/perl5/5.00503
- /usr/local/lib/perl5/site-perl/5.005/sun4-solaris
- /usr/local/lib/perl5/site-perl/5.005
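The mapping from module name to file, and the search through @INC, can be sketched in a few lines. The helper names `module_to_path` and `find_module` are hypothetical; Perl performs this lookup internally when a module is loaded.

```perl
use strict;
use warnings;

# Convert a module name such as Math::Complex to its relative file path.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the first location under @INC holding the module file, or undef.
sub find_module {
    my ($module) = @_;
    my $rel = module_to_path($module);
    for my $dir (@INC) {
        next if ref $dir;                  # skip code-reference hooks in @INC
        return "$dir/$rel" if -e "$dir/$rel";
    }
    return undef;
}

print module_to_path("Math::Complex"), "\n";
my $where = find_module("strict");
print "strict.pm: ", (defined $where ? $where : "(not found)"), "\n";
```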
33Creating Modules
- To create a new Perl module
- ../development> h2xs -X -n Foo::Bar
- Writing Foo/Bar/Bar.pm
- Writing Foo/Bar/Makefile.PL
- Writing Foo/Bar/test.pl
- Writing Foo/Bar/Changes
- Writing Foo/Bar/MANIFEST
- ../development>
34Building Modules
- To build a Perl module
- perl Makefile.PL
- make
- make test
- make install
-
35Using Modules
- A module can be loaded by calling the use function.
- use Foo;
- bar( "a" );
- blat( "b" );
- use calls the eval function to process the module's code.
- The 1; at the end of the module causes eval to evaluate to TRUE.
36End of Part I.
37Part II. XS (eXternal Subroutine) extension
38XS
- XS is an acronym for eXternal Subroutine.
- With XS, we can call C subroutines directly from
Perl code, as if they were Perl subroutines.
39Perl is not good at
- very CPU-intensive things, like numerical integration.
- very memory-intensive things. Perl programs that create more than 10,000 hashes run slowly.
- system software, like device drivers.
- things that have already been written in other languages.
40Usually
- These things are done by other highly efficient system programming languages such as C/C++.
41Can we call C subroutine from Perl?
42When perl talks with C subroutine using perl C
API
- two things must happen
- control flow - control must pass from Perl to C (and back)
- C program execution
- Perl program execution
- data flow - data must pass from Perl to C (and back)
- C data representation
- Perl data representation
43In order to use perl C API
- What Perl's internal data structures are.
- How the Perl stack works, and how a C subroutine gets access to it.
- How C subroutines get linked into the Perl executable.
- The data paths through the DynaLoader module that associate the name of a Perl subroutine with the entry point of a C subroutine.
44If you do code directly to the Perl C API
- You will find you keep writing the same little bits of code
- to move parameters on and off the Perl stack
- to convert data from Perl's internal representation to C variables
- to check for null pointers and other Bad Things.
- When you make a mistake, you don't get bad output; you crash the interpreter.
- It is difficult, error-prone, tedious, and repetitive.
45Pain killer is
46What is XS?
- Narrowly, XS is the name of the glue language.
- More broadly, XS comprises a system of programs and facilities that work together
- MakeMaker,
- Xsub glue routine,
- XS language itself,
- xsubpp,
- h2xs,
- DynaLoader.
47MakeMaker -tool
- Perl's MakeMaker facility can be used to provide
a Makefile to easily install your Perl modules
and scripts.
50Xsub
- The Perl interpreter calls a glue routine, known as an xsub.
- Rather than drag the Perl C API into all our C code, we usually write glue routines. (We'll refer to an existing C subroutine as a target routine.)
51Xsub- control flow
- The glue routine converts the Perl parameters to C data values, and then calls the target routine, passing it the C data values as parameters on the processor stack.
- When the target routine returns, the glue routine creates a Perl data object to represent its return value, and pushes a pointer to that object onto the Perl stack. Finally, the glue routine returns control to the Perl interpreter.
52Xsub-data flow
- Something has to convert between Perl and C data representations.
- The Perl interpreter doesn't, so the xsub has to.
- Typically, the xsub uses facilities in the Perl C API to get parameters from the Perl stack and convert them to C data values.
- To return a value, the xsub creates a Perl data object and leaves a pointer to it on the Perl stack.
55XS - language
- Glue routines provide some structure for the data flow and control flow, but they are still hard to write. So we don't.
- Instead, we write XS code. XS is, more or less, a macro language. It allows us to declare target routines, and specify the correspondence between Perl and C data types.
- XS is a collection of macros; although the Perl docs refer to XS as a language, it is a macro language.
58Xsubpp-tool
- xsubpp is the XS language processor: the program that translates XS code to C code.
- xsubpp compiles XS code into C code by embedding the constructs necessary to let C functions manipulate Perl values, and creates the glue necessary to let Perl access those functions.
- xsubpp expands XS macros into the bits of C code (xsub glue routines) necessary to connect the Perl interpreter to your C-language subroutines.
- We write XS code so that xsubpp will do the right thing.
61H2xs - tool
- h2xs was originally written to generate XS interfaces for existing C libraries.
- h2xs is a utility that reads a .h file and generates an outline for an XS interface to the C code.
64DynaLoader-module
- In order for a C subroutine to become an xsub, three things must happen
- Loading: the subroutine has to be loaded into memory
- Linking: the Perl interpreter has to find its entry point
- Installation: the interpreter has to set the xsub pointer in a code reference to the entry point of the subroutine
65DynaLoader.
- Fortunately, all this is done for us by a Perl module called the DynaLoader.
- When we write an XS module, our module inherits from DynaLoader.
- When our module loads, it makes a single call to the DynaLoader::bootstrap method. bootstrap locates our link libraries, loads them, finds our entry points, and makes the appropriate calls.
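66Development time and running time
[Diagram of the XS workflow at development time and at running time. Labels: pure Perl code, some manual change, .c, .h, h2xs, compiler and linker, XS code, Perl module, xsub (glue subroutine), library, DynaLoader, Perl C API, input, Perl interpreter, output.]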
67An Example- Needleman-Wunsch(NW)
- Sequence alignment is an important problem in the bleeding-edge field of genomics.
- Sequence alignment is a combinatorial problem, and naive algorithms run in exponential time. The Needleman-Wunsch algorithm runs in (more or less) O(n^3) time.
- It is a dynamic programming algorithm for global optimal sequence alignment.
68Algorithm
69Score matrix
70Complexity analysis
- The O(n^3) step in the NW algorithm is filling in the score matrix; everything else runs in linear time. We want to
- use the C implementation to fill in the score matrix,
- use the Perl implementation for everything else,
- and use XS to call from one to the other.
71Our approach
- Implement the algorithm as a straight Perl module
- Analyze (or benchmark) the code for performance
- Reimplement performance-critical methods (score matrix filling) in C
- Write XS to connect the C routines to the Perl module
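The first step, a straight Perl implementation, centres on the score matrix fill. A minimal sketch follows, assuming the common linear-gap-penalty formulation of Needleman-Wunsch, which fills the matrix in O(mn) time and is not necessarily the variant benchmarked in these slides. The subroutine name `nw_score` and the scoring values are illustrative assumptions.

```perl
use strict;
use warnings;

# Fill the Needleman-Wunsch score matrix and return the optimal global
# alignment score (the bottom-right cell).
sub nw_score {
    my ($s, $t, $match, $mismatch, $gap) = @_;
    my @a = split //, $s;
    my @b = split //, $t;
    my ($n, $m) = (scalar @a, scalar @b);
    my @score;

    # First column and row: a prefix aligned against nothing but gaps.
    $score[$_][0] = $_ * $gap for 0 .. $n;
    $score[0][$_] = $_ * $gap for 0 .. $m;

    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $diag = $score[$i-1][$j-1]
                     + ($a[$i-1] eq $b[$j-1] ? $match : $mismatch);
            my $up   = $score[$i-1][$j] + $gap;   # gap in the second sequence
            my $left = $score[$i][$j-1] + $gap;   # gap in the first sequence
            my $best = $diag;
            $best = $up   if $up   > $best;
            $best = $left if $left > $best;
            $score[$i][$j] = $best;
        }
    }
    return $score[$n][$m];
}

# Assumed scoring: match +1, mismatch -1, gap -1.
print nw_score("GATTACA", "GCATGCU", 1, -1, -1), "\n";
```

The doubly nested loop is the hot spot that the approach above proposes moving into C, leaving the surrounding bookkeeping in Perl.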
72Performance comparison
- A straight Perl implementation of the NW algorithm aligns two 200-character sequences in 300 seconds.
- The XS version runs the same 200x200 benchmark alignment in 3 seconds.
- The XS version is about 100 times faster than the Perl implementation.
73Bio::Tools::pSW - pairwise Smith Waterman object
- The Bioperl project has a pSW implementation.
- pSW is an Alignment Factory. It builds pairwise alignments using the Smith-Waterman algorithm.
- The alignment algorithm is implemented in C and added in using an XS extension.
- The Smith-Waterman algorithm needs O(n^2) time to find the highest scoring cell in the matrix.
74The end of Part II
75Bioperl Introduction
76What's Bioperl?
- Bioperl is not a new language
- It is a collection of Perl modules that
facilitate the development of Perl scripts for
bio-informatics applications.
78Why Bioperl for Bio-informatics?
- Perl is good at file manipulation and text processing, which make up a large part of the routine tasks in bio-informatics.
- The Perl language, documentation and many Perl packages are freely available.
- Perl is easy to get started in, to write small and medium-sized programs.
79Bioperl Project
- It is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research
- Started in 1995 by a group of scientists tired of rewriting BLAST and sequence parsers for various formats
- Now there are 45 registered developers, 10-15 main developers, 5 core coordinating developers
- Project website: http://bioperl.org
- Project FTP server: bioperl.org
80How many people use Bioperl?
- Bioperl has been used worldwide since 1998, from small academic labs through to enterprise level computing in large pharmaceutical companies
- Bioperl Usage Survey
- http://www.bioperl.org/survey.html
81The current status of Bioperl
- The latest mature and stable version, 1.0, was released in March 2002.
- This new version contains 832 files. The test suite contains 93 scripts which collectively perform 3042 functionality tests.
- This new version is "feature complete" for sequence handling, the most common task in bioinformatics; it adds some new features and improves some existing features.
82The future of Bioperl
- It is far from mature
- Except for sequence handling, the other modules are not complete.
- Portability is not very good; not all modules will work on all platforms.
83Bioperl resources
- www.bioperl.org
- http://www.bioperl.org/Core/bptutorial.html
- Example code, in the scripts/ and examples/ directories.
- Online course written at the Pasteur Institute. See http://www.pasteur.fr/recherche/unites/sis/formation/bioperl.
84Biopython, biojava
- Similar goals implemented in different languages
- Most effort to date has been to port Bioperl functionality to Biopython and Biojava, so the differences are fairly peripheral
- In the future, some bio-informatics tasks may prove to be more effectively implemented in java or python, so interoperability between them is necessary
- CORBA is one such framework for interlanguage support, and the Biocorba project is currently implementing a CORBA interface for bioperl
85Bioperl-Object Oriented
- Bioperl takes advantage of OO design to create a consistent, well documented object model for interacting with biological data in the life sciences.
- Bioperl name space
- The Bioperl package installs everything in the Bio:: namespace.
86Bioperl Objects
- Sequence handling objects
- Sequence objects
- Alignment objects
- Location objects
- Other Objects
- 3D structure objects, tree objects and
phylogenetic trees, map objects, bibliographic
objects and graphics objects
87Sequence handling
- Typical sequence handling tasks
- Access the sequence
- Format the sequence
- Sequence alignment and comparison
- Search for similar sequences
- Pairwise comparisons
- Multiple alignment
88Sequence Objects
- Sequence objects: Seq, RichSeq, SeqWithQuality, PrimarySeq, LocatableSeq, LiveSeq, LargeSeq, SeqI
- Seq is the central sequence object in bioperl; you can use it to describe a DNA, RNA or protein sequence.
- Most common sequence manipulations can be performed with Seq.
89Sequence Annotation
- Bio::SeqFeature: a Sequence object can have multiple sequence feature (SeqFeature) objects - e.g. Gene, Exon, Promoter objects - associated with it.
- Bio::Annotation: a Seq object can also have an Annotation object (used to store database links, literature references and comments) associated with it
90Sequence Input/Output
- The Bio::SeqIO system was designed to make getting and storing sequences to and from the myriad of formats as easy as possible.
91Diagram of Objects and Interfaces for Sequence
Analysis
92Accessing sequence data
- Bioperl supports accessing remote databases as well as local databases.
- Bioperl currently supports sequence data retrieval from the GenBank, GenPept, RefSeq, SwissProt, and EMBL databases
93Format the sequences
- A SeqIO object can read a stream of sequences in one format (Fasta, EMBL, GenBank, Swissprot, PIR, GCG, SCF, phd/phred, Ace, or raw (plain sequence)), then write to another file in another format
use Bio::SeqIO;
# Open the input file for reading in Fasta format.
$in = Bio::SeqIO->new('-file' => "inputfilename",
                      '-format' => 'Fasta');
# The leading > opens the output file for writing.
$out = Bio::SeqIO->new('-file' => ">outputfilename",
                       '-format' => 'EMBL');
while ( my $seq = $in->next_seq() ) {
    $out->write_seq($seq);
}
94Manipulating sequence data
- $seqobj->display_id(): the human-readable id of the sequence
- $seqobj->subseq(5,10): part of the sequence as a string
- $seqobj->desc(): a description of the sequence
- $seqobj->trunc(5,10): truncation from 5 to 10 as a new object
- $seqobj->revcom: reverse complements the sequence
- $seqobj->translate: translation of the sequence
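To make the coordinate convention concrete, here is a pure-Perl sketch of what subseq and revcom compute on a plain DNA string. It is not BioPerl's implementation: the helper names `subseq_1based` and `revcom_dna` are hypothetical, and it assumes 1-based inclusive coordinates and unambiguous A/C/G/T bases only.

```perl
use strict;
use warnings;

# Residues $start..$end of the sequence, counting from 1, both ends included.
sub subseq_1based {
    my ($seq, $start, $end) = @_;
    return substr($seq, $start - 1, $end - $start + 1);
}

# Reverse the string, then swap each base for its complement.
sub revcom_dna {
    my ($seq) = @_;
    my $rc = reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $dna = "ATGGAATTCCGTTAA";
print subseq_1based($dna, 5, 10), "\n";
print revcom_dna($dna), "\n";
```

With a real Seq object the same operations are method calls, and trunc and revcom return new sequence objects rather than strings.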
95Alignment
- Searching for "similar" sequences: Bioperl can run BLAST locally or remotely, and then parse the result.
- Aligning 2 sequences with Smith-Waterman (pSW) or blast
- The SW algorithm itself is implemented in C and incorporated into bioperl using an XS extension.
- Aligning multiple sequences (Clustalw.pm, TCoffee.pm)
- bioperl offers a perl interface to the bioinformatics-standard clustalw and tcoffee programs.
- Bioperl does not currently provide a perl interface for running HMMER. However, bioperl does provide a HMMER report parser.
96Alignment Objects
- Early versions used UnivAln, SimpleAlign
- Ver. 1.0 only supports SimpleAlign. It allows the user to
- convert between alignment formats
- extract specific regions of the alignment
- generate consensus sequences.
98Location Objects
- Bio::Locations: a collection of rather complicated objects
- A Location object is designed to be associated with a Sequence Feature object to indicate where on a larger structure (e.g. a chromosome or contig) the feature can be found.
99Conclusion
- Bioperl is
- Powerful
- Easy
- Waiting for you (biologist) to use
100Scripts Examples by Using Bioperl
101SimpleAlign module
- Description
- It handles multiple alignments of sequences
- Lightweight display/formatting and minimal
manipulation
102Method
- new
- Usage: my $aln = new Bio::SimpleAlign();
- Function: Creates a new simple align object
- Returns: Bio::SimpleAlign
- Args: -source => string representing the source program where this alignment came from
- each_seq
- Usage: foreach $seq ( $align->each_seq() )
- Function: Gets an array of Seq objects from the alignment
- Returns: an array
- length()
- Usage: $len = $ali->length()
- Function: Returns the maximum length of the alignment. To be sure the alignment is a block, use is_flush
103- consensus_string
- Usage: $str = $ali->consensus_string($threshold_percent)
- Function: Makes a strict consensus
- Args: Optional threshold ranging from 0 to 100.
- The consensus residue has to appear in at least threshold % of the sequences at a given location, otherwise a '?' character will be placed at that location. (Default value = 0%)
- is_flush
- Usage: if( $ali->is_flush() )
- Function: Tells you whether the alignment is flush, i.e. all of the same length
- Returns: 1 or 0
- percentage_identity
- Usage: $id = $align->percentage_identity
- Function: The function calculates the average percentage identity
- Returns: The average percentage identity
- no_sequences
- Usage: $depth = $ali->no_sequences
- Function: number of sequences in the sequence alignment
- Returns: integer
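The threshold rule for the consensus can be illustrated without BioPerl. The sketch below is an assumption-laden stand-in, not the SimpleAlign code: the subroutine name `consensus` is hypothetical, the rows must all have the same length, ties are broken alphabetically, and gap characters are counted like any other residue.

```perl
use strict;
use warnings;

# Build a consensus from equal-length aligned rows. A column contributes its
# most frequent residue if that residue reaches $threshold percent of the
# rows; otherwise it contributes '?'.
sub consensus {
    my ($threshold, @rows) = @_;
    my $len = length $rows[0];
    my $out = "";
    for my $col (0 .. $len - 1) {
        my %count;
        $count{ substr($_, $col, 1) }++ for @rows;
        # Most frequent residue first; ties broken alphabetically.
        my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        my $percent = 100 * $count{$best} / scalar(@rows);
        $out .= ($percent >= $threshold) ? $best : "?";
    }
    return $out;
}

my @rows = ("ACGT", "ACGA", "ATGC");
print consensus(50, @rows), "\n";
```

Raising the threshold turns more columns into '?', which is why the 50% consensus in the example run contains question marks at the variable positions.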
104testaln.pfam
1433_LYCES/9-246 REENVYMAKLADRAESDEEMVEFMEKVSNSLGS.EELTVEERNLLSVAYKNVIGARRAS
1434_LYCES/6-243 REENVYLAKLAEQAERYEEMIEFMEKVAKTADV.EELTVEERNLLSVAYKNVIGARRAS
143R_ARATH/7-245 RDQYVYMAKLAEQAERYEEMVQFMEQLVTGATPAEELTVEERNLLSVAYKNVIGSLRAA
143B_VICFA/7-242 RENFVYIAKLAEQAERYEEMVDSMKNVANLDV...ELTIEERNLLSVGYKNVIGARRAS
143E_HUMAN/4-239 REDLVYQAKLAEQAERYDEMVESMKKVAGMDV...ELTVEERNLLSVAYKNVIGARRAS
BMH1_YEAST/4-240 REDSVYLAKLAEQAERYEEMVENMKTVASSGQ...ELSVEERNLLSVAYKNVIGARRAS
RA24_SCHPO/6-241 REDAVYLAKLAEQAERYEGMVENMKSVASTDQ...ELTVEERNLLSVAYKNVIGARRAS
RA25_SCHPO/5-240 RENSVYLAKLAEQAERYEEMVENMKKVACSND...KLSVEERNLLSVAYKNIIGARRAS
1431_ENTHI/4-239 REDCVYTAKLAEQSERYDEMVQCMKQVAEMEA...ELSIEERNLLSVAYKNVIGAKRAS
105Script
use Bio::AlignIO;
$str = Bio::AlignIO->new('-file' => 'testaln.pfam');
$aln = $str->next_aln();
print $aln->length, "\n";
print $aln->no_residues, "\n";
print $aln->is_flush, "\n";
print $aln->no_sequences, "\n";
print $aln->percentage_identity, "\n";
print $aln->consensus_string(50), "\n";
$pos = $aln->column_from_residue_number('1433_LYCES', 14);  # = 6
# Count the residues found in that column across all sequences.
foreach $seq ($aln->each_seq) {
    $res = $seq->subseq($pos, $pos);
    $count{$res}++;
}
foreach $res (keys %count) {
    printf "Res: %s  Count: %2d\n", $res, $count{$res};
}
106Result
argerich-54 bio> perl align.pl
242
103
1
16
66.9052451661147
RE??VY?AKLAEQAERYEEMV??MK?VAE??????ELSVEERNLLSVAYKNVIGARRASWRIISSIEQKEE??G?N?????LIKEYR?KIE?EL??IC?DVL?LLD??LIP?A?????ESKVFYLKMKGDYYRYLAEFA?G??RKE?AD?SL?AYK?A?DIA?AEL?PTHPIRLGLALNFSVFYYEILNSPD?AC?LAKQAFDEAIAELDTL?EESYKDSTLIMQLLRDNLTLWTSD?????
Res: Q  Count:  5
Res: Y  Count: 10
Res: .  Count:  1
argerich-55 bio>
107SwissProt, Seq and SeqIO modules
- Description
- SwissProt is a curated database of proteins managed by the Swiss Institute of Bioinformatics. This is in contrast to EMBL/GenBank/DDBJ, which are archives of sequence information.
- The Bio::DB::SwissProt module allows the dynamic retrieval of Sequence objects (Bio::Seq)
108- SeqIO can be used to convert between different formats
- Fasta: FASTA format
- EMBL: EMBL format
- GenBank: GenBank format
- swiss: Swissprot format
- SCF: SCF tracefile format
- PIR: Protein Information Resource format
- GCG: GCG format
- raw: Raw format
- ace: ACeDB sequence format
109Objective
- Loading a sequence from a remote server
- Create a sequence object for the BACR_HALHA SwissProt entry
- Print its accession number and description
- Display the sequence in FASTA format
112- Scripts
#!/usr/bin/perl
use strict;
use Bio::DB::SwissProt;
use Bio::Seq;
use Bio::SeqIO;

# Fetch the entry from the remote SwissProt server.
my $database = new Bio::DB::SwissProt;
my $seq = $database->get_Seq_by_id('BACR_HALHA');

print "Seq: ", $seq->accession_number(), " -- ", $seq->desc(), "\n\n";

# Write the sequence to standard output in FASTA format.
my $out = Bio::SeqIO->newFh( -fh => \*STDOUT, -format => 'fasta');
print $out $seq;
113Result
argerich-47 bio> perl protein.pl
Seq: P02945 -- BACTERIORHODOPSIN PRECURSOR (BR).

>BACR_HALHA BACTERIORHODOPSIN PRECURSOR (BR).
MLELLPTAVEGVSQAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITT
LVPAIAFTMYLSMLLGYGLTMVPFGGEQNPIYWARYADWLFTTPLLLLDLALLVDADQGT
ILALVGADGIMIGTGLVGALTKVYSYRFVWWAISTAAMLYILYVLFFGFTSKAESMRPEV
ASTFKVLRNVTVVLWSAYPVVWLIGSEGAGIVPLNIETLLFMVLDVSAKVGFGLILLRSR
AIFGEAEAPEPSAGDGAAATSD
argerich-48 bio>
114Summary
- Perl language and modules
- Perl XS
- Bioperl
- Example scripts